Programming languages
boom47, 2018-06-26 19:18:11

Which programming language can quickly process large amounts of data?

The gist is this:
There is a main list of email addresses with 300 million lines (one address per line). There is a second list of email addresses with 10 thousand lines (one address per line).
I need to check the 10k list against the 300-million list and find the addresses in the 10k list that are not in the main list.
What is the best language for writing such a program?
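
Whatever language is chosen, the check itself is small. Below is a minimal sketch in Python, assuming exact, case-sensitive matching of whole lines. The function name is made up for illustration. The 10k list is held in memory as a set, and the 300-million list is only streamed, never loaded whole.

```python
from typing import Iterable, Set


def missing_from_main(small: Iterable[str], main: Iterable[str]) -> Set[str]:
    """Return addresses from `small` that never appear in `main`."""
    # The small list fits comfortably in memory, so keep it as a set.
    candidates = {line.strip() for line in small if line.strip()}
    # Stream the large list one line at a time.
    for line in main:
        candidates.discard(line.strip())
        if not candidates:  # every address was found, stop early
            break
    return candidates
```

With real files, two open file objects can be passed in directly, since iterating over a file yields one line at a time.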

5 answers
wegres, 2018-06-26
@wegres

A speed comparison of the mawk, nawk, and gawk utilities against Java, Python, Perl, C++, and Ruby:
Don't MAWK AWK – the fastest and most elegant big data munging language!
brenocon.com/blog/2009/09/dont-mawk-awk-the-fastes...

sim3x, 2018-06-26
@sim3x

SQL
grep
*(any)

Dimonchik, 2018-06-26
@dimonchik2013

pingvinus.ru/note/compare-files-diff-in-linux

Mikhail Potanin, 2018-07-05
@potan

kdb

Alexander, 2018-07-22
@Crysdd

If this is a one-off task, Unix command-line utilities are the best choice.
If you need to match whole lines as fixed strings, fgrep is faster than grep.
For comparing large lists, I don't know of anything better than comm (it expects both files to be sorted).
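
To illustrate why the comm approach scales to large lists, here is a rough Python sketch of the idea behind comm -23 (lines found only in the first file). It is not a reimplementation of comm. It assumes both inputs are already sorted in the same order (plain Python string comparison here) and have their newlines stripped. The function name is hypothetical. Each input is read once, front to back, so memory use stays constant.

```python
from typing import Iterable, Iterator


def only_in_first(first: Iterable[str], second: Iterable[str]) -> Iterator[str]:
    """Yield items of sorted `first` that are absent from sorted `second`."""
    second_iter = iter(second)
    current = next(second_iter, None)
    for item in first:
        # Advance the second stream until it catches up with `item`.
        while current is not None and current < item:
            current = next(second_iter, None)
        # No match at this position means `item` is missing from `second`.
        if current is None or current != item:
            yield item
```

Unlike comm, this sketch does not pair up duplicate lines one to one. An address repeated in the first list is dropped entirely if it appears at least once in the second.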
