ZFS
Dmitry, 2017-05-13 13:09:25

What write speed can be squeezed out of a budget storage system?

Good day.
Given:
Intel R1304BTLSFANR
Intel Pentium G2020
32 GB ECC RAM
4 × 2 TB WD Red
1 × SSD Intel DC S3500 Series SSDSC2BB120G401
Intel X520-DA2
I need storage with an NFS share that can read and write at more than 2-3 Gbit/s in a single stream over a 10 Gbit network. Naturally, I'd like it all for free (i.e., for nothing).
So far the plan is ZFS RAID 10 with the ZIL on the SSD.
Questions:
1. Are there other options? Which?
2. What is the best way to build the proposed scheme? Will there be big losses with Debian? FreeNAS?
3. What is the theoretically possible speed of the proposed solution, given this hardware?
4. Will adding a second SSD or replacing the processor speed things up?
Thanks.
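For reference, the target figures converted into disk-throughput units (a sketch; these are plain unit conversions using decimal units and ignoring protocol overhead, not measurements):

```shell
# Convert the requested 2-3 Gbit/s single-stream rate into MB/s.
# 1 Gbit/s = 1000 Mbit/s / 8 = 125 MB/s.
for gbit in 2 3; do
  echo "$gbit Gbit/s = $(( gbit * 1000 / 8 )) MB/s"
done
```

So the request amounts to sustaining roughly 250-375 MB/s from the disk subsystem.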

3 answers
Artem @Jump, 2017-05-13

which could write and read at a speed of more than 2-3 Gbit/s in a single stream

If we are talking about a single stream, RAID 0 or RAID 10, even in a software implementation, will get close to the specified speed on most modern SATA disks, provided of course there is no heavy fragmentation.
The 120 GB SSD is clearly a superfluous element here: it is not needed at all, and not only will it provide no advantage, it will seriously hurt performance. It is better to drop it from the configuration.
An SSD is good for random reads, but not for streaming.
What is the best way to build the proposed scheme? Will there be big losses when using Debian? FreeNAS?
Depends on the protocol by which the storage system will work.
What is the theoretically possible speed of the proposed solution, given this hardware?
200-400 MB/s.
Will adding a second SSD or replacing the CPU help speed things up?
No.
It is better to add more HDDs to the RAID 10 to get more stripes, or to switch to RAID 0.
Note that "single stream" means one process continuously reading sequential data or writing a data stream.
If more than one process reads at the same time, or if the workload is not streaming but random access, everything said above is reversed.
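A rough sanity check of the 200-400 MB/s estimate (a sketch; the 100-200 MB/s per-disk sequential rate is an assumed typical figure for 2 TB SATA drives, not a measurement from the question):

```shell
# RAID 10 over 4 disks = 2 mirrored pairs = 2 stripes.
# Sequential throughput scales roughly with the number of stripes.
stripes=2
for per_disk in 100 200; do
  echo "$(( stripes * per_disk )) MB/s"
done
```

Two stripes at 100-200 MB/s per disk gives 200-400 MB/s, which is right at the lower edge of the 250-375 MB/s that 2-3 Gbit/s demands.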

athacker, 2017-05-13

There are several things to understand clearly here.
1) You can't fool physics. Whatever the tricks, under really high write intensity they help little. Write caching and what is called coalescing can help a bit: the system groups blocks before writing so that (roughly speaking) they can be written in a single pass of the head over the surface. But the main load still falls on the disks, and there is exactly one way to accelerate writes: as many spindles as possible and disks as fast as possible (15k RPM instead of 7200 RPM), or even a switch to SSD/NVMe. So it is better to take a pack of smaller disks in a larger quantity, and the write speed will be higher.
2) The ZIL is not a write cache. It is just a log that provides synchronous writes and keeps the file system in a consistent state across failures. With synchronous writes, ZFS in effect writes the data twice: first the data goes to the ZIL, and then it is rewritten to the main disks. If no separate device is allocated for the ZIL, it is placed on the main disks. Of course, placing the ZIL on an SSD speeds up writes. But there is a subtle point: it speeds them up not in absolute terms, but relative to the case where the ZIL sits on the main (HDD) disks.
Given that you need NFS, which uses synchronous writes, you have two options: place the ZIL on the SSD, or force asynchronous writes in the properties of the ZFS dataset that will be exported over NFS. The choice depends on what exactly you want. If data consistency matters to you, leave sync enabled and place the ZIL on the SSD. But understand that async writes will be somewhat faster in any case, since the ZIL write is still an extra delay and overhead: even with the ZIL on an SSD, the access time and the write itself are not zero.
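The two options above can be sketched as follows (a sketch only: the pool name `tank`, dataset `tank/nfs`, and device `/dev/ada4` are placeholders for illustration, not taken from the question):

```shell
# Option 1: keep synchronous writes, but move the ZIL to a dedicated
# SSD by adding it as a separate log vdev (a SLOG).
zpool add tank log /dev/ada4

# Option 2: force asynchronous writes on the dataset exported over NFS,
# trading durability of in-flight writes on crash for speed.
zfs set sync=disabled tank/nfs
```

Note that `sync=disabled` means an NFS client's COMMIT is acknowledged before the data is actually on stable storage.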
3) Cache in RAM (ARC) really does help. Given that you will also have reads in parallel, increasing RAM makes sense: every block served from ARC is a block that was not read from the disks, i.e. the disks did not have to make extra seeks and platter reads for it, and can spend more time on writes. Whether it is worth adding an L2ARC is a philosophical question that depends heavily on the workload. In my observation, in a scenario of hosting virtual machines on ZFS with 128 GB of RAM on board, almost nothing reached the L2ARC. Comparative numbers: ARC hits/misses averaged about 80k/15k per minute, while the L2ARC figures were around 15/800, i.e. 15 hits per 800 L2ARC requests. In the end we decided not to waste an SSD on L2ARC at all.
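Turning those counters into hit rates (a sketch using only the 80k/15k and 15/800 figures quoted above):

```shell
# ARC: ~80k hits vs ~15k misses per minute.
awk 'BEGIN { printf "ARC hit rate: %.1f%%\n", 80000 / (80000 + 15000) * 100 }'
# L2ARC: 15 hits per 800 requests.
awk 'BEGIN { printf "L2ARC hit rate: %.1f%%\n", 15 / 800 * 100 }'
```

That is roughly an 84% ARC hit rate versus under 2% for L2ARC, which is why the SSD was not worth spending on it.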
One more thing: if you do not expect sequential reads and the access will be purely random (for example, when hosting virtual machines), you should disable ZFS prefetch. In synthetic tests, disabling it increased IOPS by about 15-20% with a 20%/80% read/write mix and fully random access.
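On FreeBSD this looks roughly as follows (a sketch for the FreeBSD/ZFS versions of that era; the tunable name may differ on newer OpenZFS releases):

```shell
# Disable ZFS file-level prefetch at runtime:
sysctl vfs.zfs.prefetch_disable=1

# Make it persistent across reboots:
echo 'vfs.zfs.prefetch_disable="1"' >> /boot/loader.conf
```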
In general, you need to deploy the system, put it under monitoring, and watch what happens and how. FreeBSD gives out enough information to draw conclusions: whether you need prefetch, whether L2ARC is doing anything, how effective ARC is, and so on. If needed, contact me and I will send you Zabbix templates and configs for monitoring FreeBSD + ZFS storage. I plan to write an article about this, but I haven't gotten around to it yet... :-)

Dmitry @bogidaich, 2017-05-14

FFS is just an example. Could LVM be faster?
Maybe someone has other options for this hardware that would give more speed?
