linux
Dasha Tsiklauri, 2018-09-12 01:51:10

Why does fdatasync sometimes take so long?

There is an application that uses boltdb as its key-value storage. The problem is that writes are sometimes (every few minutes) very slow (100ms+), while the usual time is 3-5ms. Each transaction writes a small amount of data (~100 bytes), and the same records are constantly overwritten: a counter is incremented and saved to disk (that is, the file does not grow, it only changes in place). Writes are synchronized to disk; looking into the library code, you can see the fdatasync call.
A peculiarity: this behavior shows up at low load (1-2 writes per 5 seconds); as the load grows (to 1-3 writes per second), the times stabilize.
As for disk quality: everything runs on a CX-series virtual machine at Hetzner.
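For context, the write path looks roughly like this (a minimal sketch against the boltdb/bolt API; the bucket name, key, record layout, and sleep interval are illustrative assumptions, not the real application code):

package main

import (
	"encoding/binary"
	"log"
	"time"

	"github.com/boltdb/bolt"
)

func main() {
	// Default options: every committed transaction is flushed
	// with fdatasync before Update returns.
	db, err := bolt.Open("counter.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	for {
		start := time.Now()
		err := db.Update(func(tx *bolt.Tx) error {
			b, err := tx.CreateBucketIfNotExists([]byte("stats"))
			if err != nil {
				return err
			}
			// Overwrite the same small record in place: the file
			// does not grow, the same pages are rewritten each time.
			var n uint64
			if v := b.Get([]byte("counter")); v != nil {
				n = binary.BigEndian.Uint64(v)
			}
			buf := make([]byte, 8)
			binary.BigEndian.PutUint64(buf, n+1)
			return b.Put([]byte("counter"), buf)
		})
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("db_time: %s", time.Since(start))
		time.Sleep(3 * time.Second) // roughly 1-2 writes per 5 seconds
	}
}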
I am attaching a sample log (when the time goes beyond 100ms, I print tx.Stats(), the bolt engine's transaction statistics, which make the bottleneck clear; a sketch of how these stats can be captured follows the log):

2018/09/11 22:46:32 write op [1] -> db_time: 1.903964ms; all_time: 1.929144ms; count: 31102
2018/09/11 22:46:33 write op [1] -> db_time: 1.581177ms; all_time: 1.604657ms; count: 31103
2018/09/11 22:46:35 write op [1] -> db_time: 1.994612ms; all_time: 2.019831ms; count: 31104
2018/09/11 22:46:36 write op [1] -> db_time: 1.465624ms; all_time: 1.486556ms; count: 31105
2018/09/11 22:46:36 write op [1] -> db_time: 1.591314ms; all_time: 1.610906ms; count: 31106
2018/09/11 22:46:37 write op [1] -> db_time: 2.026614ms; all_time: 2.058671ms; count: 31107
2018/09/11 22:46:38 write op [1] -> db_time: 2.002381ms; all_time: 2.030729ms; count: 31108
2018/09/11 22:46:39 write op [1] -> db_time: 148.475589ms; all_time: 148.523364ms; count: 31109
2018/09/11 22:46:39 {PageCount:4 PageAlloc:16384 CursorCount:4 NodeCount:3 NodeDeref:0 Rebalance:0 RebalanceTime:0s Split:0 Spill:3 SpillTime:47.935µs Write:5 WriteTime:148.3063ms}
2018/09/11 22:46:40 write op [1] -> db_time: 2.154875ms; all_time: 2.180598ms; count: 31110
2018/09/11 22:46:42 write op [1] -> db_time: 1.889926ms; all_time: 1.919976ms; count: 31111
2018/09/11 22:46:42 write op [1] -> db_time: 1.620444ms; all_time: 1.64762ms; count: 31112
2018/09/11 22:46:42 write op [1] -> db_time: 2.231889ms; all_time: 2.266515ms; count: 31113
2018/09/11 22:46:45 write op [1] -> db_time: 1.964315ms; all_time: 1.982548ms; count: 31114
2018/09/11 22:46:45 write op [1] -> db_time: 2.211583ms; all_time: 2.227578ms; count: 31115
2018/09/11 22:46:47 write op [1] -> db_time: 2.087202ms; all_time: 2.141793ms; count: 31116
2018/09/11 22:46:48 write op [1] -> db_time: 1.777217ms; all_time: 1.791075ms; count: 31117
2018/09/11 22:46:49 write op [1] -> db_time: 1.604663ms; all_time: 1.620596ms; count: 31118
2018/09/11 22:46:49 write op [1] -> db_time: 1.929698ms; all_time: 1.953727ms; count: 31119
2018/09/11 22:46:50 write op [1] -> db_time: 2.098869ms; all_time: 2.123076ms; count: 31120
2018/09/11 22:46:52 write op [1] -> db_time: 2.335013ms; all_time: 2.355466ms; count: 31121
2018/09/11 22:46:53 write op [1] -> db_time: 1.846938ms; all_time: 1.876342ms; count: 31122
2018/09/11 22:46:54 write op [1] -> db_time: 1.897064ms; all_time: 1.922749ms; count: 31123
2018/09/11 22:46:55 write op [1] -> db_time: 1.804152ms; all_time: 1.829208ms; count: 31124
2018/09/11 22:46:57 write op [1] -> db_time: 2.354736ms; all_time: 2.380525ms; count: 31125
2018/09/11 22:46:59 write op [1] -> db_time: 1.975578ms; all_time: 2.043906ms; count: 31126
2018/09/11 22:46:59 write op [1] -> db_time: 1.639791ms; all_time: 1.665766ms; count: 31127
2018/09/11 22:47:00 write op [1] -> db_time: 35.765551ms; all_time: 35.799343ms; count: 31128
2018/09/11 22:47:02 write op [1] -> db_time: 1.789419ms; all_time: 1.815851ms; count: 31129
2018/09/11 22:47:02 write op [1] -> db_time: 1.684525ms; all_time: 1.711484ms; count: 31130
2018/09/11 22:47:03 write op [1] -> db_time: 1.828242ms; all_time: 1.847136ms; count: 31131
2018/09/11 22:47:05 write op [1] -> db_time: 2.129164ms; all_time: 2.159227ms; count: 31132
2018/09/11 22:47:07 write op [1] -> db_time: 2.292547ms; all_time: 2.324302ms; count: 31133
2018/09/11 22:47:08 write op [1] -> db_time: 1.750997ms; all_time: 1.776809ms; count: 31134
2018/09/11 22:47:09 write op [1] -> db_time: 2.146918ms; all_time: 2.168435ms; count: 31135
2018/09/11 22:47:10 write op [1] -> db_time: 2.037713ms; all_time: 2.064122ms; count: 31136
2018/09/11 22:47:11 write op [1] -> db_time: 1.936771ms; all_time: 1.962748ms; count: 31137
2018/09/11 22:47:11 write op [1] -> db_time: 1.70052ms; all_time: 1.72535ms; count: 31138
2018/09/11 22:47:11 write op [1] -> db_time: 2.138566ms; all_time: 2.163997ms; count: 31139
2018/09/11 22:47:12 write op [1] -> db_time: 1.735974ms; all_time: 1.769459ms; count: 31140
2018/09/11 22:47:14 write op [1] -> db_time: 2.271941ms; all_time: 2.287771ms; count: 31141
2018/09/11 22:47:14 write op [1] -> db_time: 1.961868ms; all_time: 2.012049ms; count: 31142
2018/09/11 22:47:14 write op [1] -> db_time: 3.009883ms; all_time: 3.044088ms; count: 31143
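For reference, the {PageCount:...} line above comes from tx.Stats(). One way to capture it after the commit has finished (a sketch; keeping the *bolt.Tx reference just to read its stats afterwards is my approach, not necessarily the application's):

var lastTx *bolt.Tx
start := time.Now()
err := db.Update(func(tx *bolt.Tx) error {
	lastTx = tx // Stats() stays readable after the commit completes
	// ... the Put from the sketch above ...
	return nil
})
if err == nil && time.Since(start) > 100*time.Millisecond {
	// WriteTime covers the page writes plus the fdatasync calls,
	// which is exactly where the 100ms+ spikes land in the log.
	log.Printf("%+v", lastTx.Stats())
}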


1 answer
Dasha Tsiklauri, 2018-09-20
@dasha_programmist

As Puma Thailand suggested, we ran tests on real hardware: after 30k requests over 6-7 hours there are no problems, and the write time is stable.
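If anyone wants to reproduce the comparison, one way to take boltdb out of the picture is to time raw fdatasync calls on the same machine (a sketch; the file name, record size, threshold, and pacing are illustrative):

package main

import (
	"log"
	"os"
	"syscall"
	"time"
)

func main() {
	f, err := os.OpenFile("sync_probe.dat", os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	buf := make([]byte, 100) // about the size of one record
	for i := 0; i < 1000; i++ {
		if _, err := f.WriteAt(buf, 0); err != nil {
			log.Fatal(err)
		}
		start := time.Now()
		if err := syscall.Fdatasync(int(f.Fd())); err != nil {
			log.Fatal(err)
		}
		if d := time.Since(start); d > 100*time.Millisecond {
			log.Printf("slow fdatasync: %s", d)
		}
		time.Sleep(2 * time.Second) // mimic the low write rate
	}
}

If the same spikes show up here, the bottleneck is the host's storage layer rather than anything bolt does.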
