Video
alexdora, 2017-10-24 19:10:20

Mac Pro or a self-built Windows 10 machine in this case?

Given: 3 cameras feeding a RAW video signal to the computer over HDMI/SDI, and these 3 streams need to be encoded, mind you, 6-9 times in [email protected]. The encoding is built on ffmpeg/OBS in various combinations.
The scheme looks something like this:
cam1 => HDMI => ffmpeg 1080p/30 15 Mbit (light color correction + compression) => RTMP (acting as a splitter) =>
stream 1 is sent as-is to another RTMP server for publishing
stream 2 is picked up by OBS, which overlays dynamic WebGL graphics and sends the result to another RTMP server
And so on, 3 times over, plus one more encode may be added (a rough sketch of one such chain is below).
Right away: I understand that picking the stream up in OBS and transcoding it again means a loss of quality, but that is not critical here, and at such a bitrate it will not even be noticeable.
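To make the per-camera chain concrete, here is a minimal sketch of what it could look like with ffmpeg, assuming a DeckLink-style capture device (on Windows the input would be -f dshow instead), a local nginx-rtmp instance as the splitter, and the eq filter standing in for the "light color correction"; the device name and URLs are placeholders:

# cam1: capture from the HDMI/SDI card, apply light color correction,
# encode to ~15 Mbit h.264 and push to a local RTMP relay that acts as the splitter.
# 'DeckLink Mini Recorder' and the rtmp:// URLs are made-up names.
ffmpeg -f decklink -i 'DeckLink Mini Recorder' \
       -vf "eq=contrast=1.02:saturation=1.05" \
       -c:v libx264 -preset veryfast -b:v 15M -r 30 \
       -c:a aac -b:a 160k \
       -f flv rtmp://127.0.0.1/live/cam1
# From the relay, stream 1 is restreamed as-is to the publishing RTMP server,
# and stream 2 is added in OBS as a media source for the WebGL overlay.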
So the choice is:
- Buy a Mac Pro "trashcan" with 8-12 cores; with all the adapters (including the HDMI capture devices) it comes to 280k rubles. Yes, that is expensive for the hardware, but there are several advantages:
1. It's a stable system. The requirement here is that the whole thing has to run 24/7 and will be operated by people who are not always computer-savvy. A macOS base has proven itself in several companies and gives reliability, whatever anyone says. And the user himself runs macOS on his laptop, so everything there is familiar.
2. If it ever needs to be sold, it sells with minimal loss of money.
- Build a slightly more powerful computer on Windows; it would come out about 15 percent more powerful (by eye) and about 100k rubles cheaper. We considered a top i9 or a Xeon E5 + 16 GB DDR4 ECC RAM + one video card.
There are two main downsides. First, Windows, which needs periodic maintenance, whatever anyone says. I have specialists who have worked exclusively with Windows for years and who hint that everything can be rebuilt on a server edition of Windows and it works well; but the regular desktop edition needs maintaining and configuring, and it is very easy to break with clumsy hands even if a professional set it up. Second, it is not clear where to sell this hardware if it comes to that, and the loss in price will be bigger.
This choice did not arise because of money per se. That is, there is a turnkey hardware solution that would cost about 1 million rubles, but there is also the option described above. The main trouble is the doubt that the Mac Pro will cope with all of this (its CPU, although it still sits near the top of the benchmarks, dates back to 2013). My MacBook Pro Retina late 2012 with an i5 chokes on a single stream coming in over Thunderbolt, and here there will be 6 of them; even with a proper server CPU there are doubts. On top of that, this machine will also be used for other work while the streams are running, and without a power reserve ffmpeg will start overflowing its buffers and dropping packets.
For all of this to work, everything described above should load the processor by at most 30-40%; the remaining headroom has to stay free for working on the computer itself.
Fresh thoughts are needed; it is too hard to calculate in advance how much one stream will consume.



1 answer
alexdora, 2017-10-25

How the matter ended
It was built on two Xeon E5-2690v4 CPUs (28 cores, 56 threads with HT, at 2.6 GHz) + an Nvidia Quadro P2000 + a Blackmagic Duo 2 card with 4 SDI inputs, with RDP access to 4 machines. While everything was being set up, we concluded that we would pull the 2690s out and use them elsewhere, and put in 2 lower-tier 2630v4 CPUs instead (20 cores, 40 threads with HT, at 2.2 GHz).
At first everything was fine, but we ran into an unpleasant problem. When connecting over RDP, the user chooses how audio is handled: either play it on the device he is connecting from, or play it on the remote machine. If you choose the connecting device, the audio devices do not show up in the broadcasting software, i.e. you only get something like a generic remote audio device. If you choose the remote machine, the trouble starts with the fact that you can accidentally break everything. We tried a lot of things, edited the registry and searched for a solution, but in the end we dropped it, because we decided it was already a crutch and, on top of that, it was unclear how stable it would be at all.
While all of this was being tested, I ran an encoding stress test. The main question was whether it would cope and how many streams it could handle. It was actually after this stress test that we decided to put in the lower-tier CPUs. Results:
One [email protected] h.264 stream (preset slow, profile High, keyframe interval 1) on one Xeon 2630v4 gives 8% load
One [email protected] NVENC stream (profile High, keyframe interval 1) on the Quadro P2000 gives 10% load
Answering the question from the very beginning, whether the Mac Pro would cope: yes, it would. Note that I deliberately used the slow preset; normally we encode everything with veryfast. In other words, to put it plainly, ONE Xeon 2630v4 can chew through 10 [email protected] streams on the slow preset. On veryfast the load is, God forbid, 1-2% per stream. And for this task we generally need 30 frames per second, not footage that has to be processed on the slow preset.
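For reference, the kind of commands such a stress test could be run with; this is only a sketch, test_source.mp4 is a placeholder for the real footage, and I assume the percentages were read off top and the GPU utilization counter while the encodes ran:

# CPU path: libx264, preset slow, High profile, output discarded.
ffmpeg -i test_source.mp4 -c:v libx264 -preset slow -profile:v high -b:v 15M -f null -

# GPU path: the same source through NVENC (h264_nvenc also has a slow preset).
ffmpeg -i test_source.mp4 -c:v h264_nvenc -preset slow -profile:v high -b:v 15M -f null -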
Now further. We went with an approach many would consider inadequate, but for us it turned out to be the deciding one. We installed VMware ESXi 6.5 and passed the video card through to a Windows VM in VGA passthrough mode. I did it myself and was surprised how easy it is. I'm not a techie, after all, yet I set everything up in an hour following the instructions.
At this point I want to explain what drove the choice of the Quadro P2000 card (as in, couldn't we have gotten by with a GeForce?).
Firstly, it was about running more than one NVENC stream; there is a reference table for that: -decode-g...
The cheapest card with "Unlimited" NVENC sessions and a more or less recent architecture was chosen.
Secondly, we liked that it occupies a single slot. There are plenty of crutches for bypassing the session limit on GeForce gaming cards, but none of those crutches belong in production (a simple way to check the limit is sketched below).
Thirdly, there is the question of power consumption. We pay for electricity. The same thing on GeForce would have cost us extra money for the power supply plus about 42 thousand rubles a year in electricity. Yes, we sat down at our leisure and did the math ;)
Taking into account the changed scheme, namely installing a hypervisor and slicing the machine into 4 parts, each with its own video card, the only thing we lost is that we could have gotten by with the cheaper P400 model. Oh well.
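The session-limit check mentioned above can be done crudely by launching several NVENC encodes in parallel; this is just a sketch, the count N and the synthetic test source are arbitrary:

# Launch N parallel NVENC test encodes. On a session-limited GeForce the extra
# processes fail to open an encoder session; on a Quadro marked "Unlimited" they all run.
N=6
for i in $(seq 1 "$N"); do
  ffmpeg -loglevel error -f lavfi -i testsrc2=size=1920x1080:rate=60 \
         -c:v h264_nvenc -b:v 15M -t 30 -f null - &
done
wait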
Let's sum up how everything looks and what came of it.
Server: dual Xeon E5-2630v4 + 64 GB ECC Reg RAM + Blackmagic Duo 2 + 3x Nvidia Quadro P400 + 1x P2000
VMware ESXi 6.5 as the base operating system
4x Windows 10 Pro with VGA passthrough; each one has its own video card in exclusive access.
1x Ubuntu 16.04 with the Blackmagic card passed through. That OS was given the Blackmagic card entirely; ffmpeg is built there and converts the SDI ports to NDI (a fairly recent standard for transmitting video over the network without quality loss). The Windows 10 Pro machines then pick up the NDI streams and process them further. Linux, because building ffmpeg with NDI under Windows involves dancing with a tambourine; on Linux it is done with one command, on Windows you have to study the manual. It is very convenient that the OSes sit on one virtual network interface, so there is practically no delay.
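A rough sketch of that SDI-to-NDI relay on the Ubuntu VM, assuming an ffmpeg of that era built with --enable-decklink and --enable-libndi_newtek (NDI output was later removed from upstream ffmpeg); the device and source names here are placeholders:

# List the DeckLink inputs the VM actually sees:
ffmpeg -sources decklink

# Relay one SDI port of the Duo 2 onto the network as an NDI source named "CAM1".
# Video is passed as raw UYVY, so nothing is re-encoded on this hop.
ffmpeg -f decklink -i 'DeckLink Duo (1)' \
       -pix_fmt uyvy422 -c:v rawvideo -an \
       -f libndi_newtek 'CAM1'
# One such process per SDI input; the Windows VMs pick the sources up over NDI.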
All of this cost about 200k rubles in total (including drives and the case). As a result we have 4 full-fledged machines in one, or rather 3 broadcasting workstations that could handle 4K60 if desired. With encoding running on all of them, at about 25% load on both CPUs, the whole thing draws 520 watts.
Now we are running the final stress tests before launching into production, although it has already passed most of them. I was pleased that the hypervisor reboots without the PCIe video cards hanging. A real lifesaver.
I'll write up an answer. A night of sitting down with people bore fruit and in principle we found solutions; these thoughts may be useful to someone dealing with the same topic.
1. We tried to pick components for a machine with two Xeons. It turned out that this is not such a trivial task. We needed a board for two CPUs, plus an ATX case (because the video card has to fit), plus a certain number of PCIe x4 slots. There are few such boards suitable for v3/v4 processors. v3/v4 were chosen because memory for the older generations would simply have to be hunted down at flea markets; memory at such low frequencies is no longer easy to buy new. We found only 3 boards matching the requirements and sat staring at the pictures to make sure the video card and the other boards would definitely fit. An additional complication is that some PCIe slots change their operating mode under certain conditions. In general, it is all a pain. We ended up with a dual-CPU machine for 189 thousand rubles with a total benchmark score of about 26k. That is, there was a dialogue about this above and I was right. But I'm still glad the dialogue took place, because...
2. Today or tomorrow we will run a test on a weak i5 + Nvidia (Pascal) test machine. The most resource-intensive part, the one that maxes out any CPU, is the initial encoding of the RAW HDMI/SDI signal from the capture card into something like h.264 with all the trimmings. OBS specifically cannot do this on the video card the way we need, while ffmpeg happily uses NVENC. Here a nuance immediately arises: how many streams a particular card can handle. With ffmpeg everything is fairly simple (worst case, you can explicitly specify which video card to use), but the original task was real-time processing (graphics/scaling), and in fact we need to end up with 2 different content streams: one clean, the other processed.
Here is a rough diagram:
capture card => ffmpeg (nvidia) => RTMP (as a splitter) =>
1. ffmpeg sends the stream directly to the server without further compression, from where everything is distributed through the services with whatever compression is needed
2. We pick the stream up with vMix, apply effects/scaling and send it on
vMix can also work with nvidia on the same server (it essentially does the same thing as ffmpeg). The software is paid, but for these tasks it does not seem that expensive, and the flexibility is off the charts.
As a result we get 2 clean encodes on the machine the camera is connected to (a sketch of this split is below).
And yes, everything is based on Windows.
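A sketch of how that split could be done with ffmpeg, NVENC and an explicit GPU choice; the -gpu index, the capture device name and the RTMP URLs are placeholders:

# Encode the capture once on a chosen Nvidia card (-gpu selects the NVENC device)
# and use the tee muxer to feed both the clean publishing path and the local
# RTMP endpoint that vMix picks up for graphics/scaling.
ffmpeg -f dshow -i video="Capture Card" \
       -map 0:v -c:v h264_nvenc -gpu 0 -b:v 15M -r 30 \
       -f tee "[f=flv]rtmp://publish.example/live/cam1|[f=flv]rtmp://127.0.0.1/live/cam1"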
One question remains: vMix, as far as I understand, can choose which video card to use, but that is the last mile. The thing is, you cannot run more than one vMix instance on one machine. So the problem is either to build low-powered machines with 1 video card each, or to rig up a crutch of the form "3 video cards in one box plus a capture card" and somehow miraculously steer it all. It would also be possible to get by with 2 more machines, each with two video cards. Then again, according to the same site, vMix on a plain i7 + an Nvidia 1080 handles 4 incoming streams.
3. And I can't help but mention the Mac Pro again. The same scheme via ffmpeg and Nvidia cards would cost a bit more, but it would work: two video cards in a Thunderbolt enclosure (1000 bucks), and you could also pick up a cheaper Mac Pro for this.
For now we are weighing up which option ends up less cluttered hardware-wise. The Mac Pro won't have vMix, of course, but OBS will definitely work; we only need 3 instances. The stock video cards will handle the WebGL, and the CPU will handle the stream already compressed by ffmpeg. In general, for now we will ride on Windows and think about how to live.
