ruby
Vayladion Gognazdiak, 2020-03-30 19:39:05

How to quickly output a LARGE amount of data to STDOUT in Ruby?

Good day.
The challenge is to quickly print hundreds of millions of lines to STDOUT.
The code runs in a Docker container with 2 cores and 4 GB of RAM.
Ruby 2.7.0

More specifically (in simplified form):
There are several files, each 2,000,000 to 3,000,000 lines long. They are read line by line, and each line is printed between 10,000 and 150,000 times.

def print_gen(line)
  rand(10000..150000).times do
    $stdout.write(line)
  end
end

['1.txt', '2.txt', '3.txt'].each do |file|
  fork do
    File.readlines(file).each do |line|
      print_gen(line)
    end
  end
end
Process.waitall


Results:
time ruby start.rb > test.txt
real	1m13,236s
user	1m12,338s
sys	0m1,011s

wc -l test.txt
119245 test.txt

The profiler clearly points to IO.wait as the bottleneck.
The picture is similar when using threads instead of fork.

How can I speed this up?


1 answer(s)
Roman Mirilaczvili, 2020-03-30
@2ord

What does the output to stdout have to do with reading the logs?
$stdout.write
The test is meaningless.
File#each_line (or File.foreach) is more memory-efficient than File.readlines, which loads the whole file into memory.
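
To illustrate the memory point, here is a minimal sketch that streams each file with File.foreach instead of File.readlines. It also batches repeated copies of a line into one larger string before writing. The batching is an assumption on my part, not something measured against the setup in the question, and the names write_repeated, stream_file and the batch size of 1,000 are hypothetical.

```ruby
# Hypothetical helper: writes `line` to `io` `count` times, but groups
# the copies into chunks of `batch` lines so fewer write calls are made.
def write_repeated(io, line, count, batch: 1_000)
  chunk = line * batch                 # one string holding `batch` copies
  full, rest = count.divmod(batch)     # number of full chunks and leftover copies
  full.times { io.write(chunk) }
  io.write(line * rest) if rest.positive?
end

# Hypothetical helper: streams `path` line by line. File.foreach yields one
# line at a time, so the whole file is never held in memory at once.
def stream_file(path, io: $stdout, range: 10_000..150_000)
  File.foreach(path) do |line|
    write_repeated(io, line, rand(range), batch: 1_000)
  end
end
```

In the original script, the body of each fork block could then become stream_file(file). Whether this actually reduces the time spent in IO.wait would have to be checked with the profiler on the real data.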
