Which programming language can quickly process large amounts of information?

Programming languages

boom47, 2018-06-26 19:18:11

The gist is this:
There is a main list of email addresses with 300 million lines (one address per line). There is a second list of email addresses with 10 thousand lines (one address per line).
I need to check the 10k list against the 300M list and find the addresses in the 10k list that are not in the main 300M list.
What is the best language for developing such a program?
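
To make the task concrete: it amounts to a set difference in which only the small list needs to be held in memory, while the large list is read once from start to finish. Below is a minimal Python sketch of that idea, not a benchmarked solution. The function name and file paths are hypothetical, and it assumes one address per line, UTF-8 text, and exact (case-sensitive) matching.

```python
def missing_addresses(small_path, big_path):
    """Return the addresses from small_path that never occur in big_path."""
    # The 10k list fits easily in memory; a set gives constant-time lookups.
    with open(small_path, encoding="utf-8") as small_file:
        candidates = {line.strip() for line in small_file if line.strip()}

    # Stream the large file line by line so it is never loaded as a whole.
    with open(big_path, encoding="utf-8") as big_file:
        for line in big_file:
            candidates.discard(line.strip())
            if not candidates:
                # Every address from the small list was found; stop early.
                break

    return candidates
```

Calling `missing_addresses("small.txt", "big.txt")` (both file names are placeholders) returns a set of the addresses that appear only in the small file. With this approach the running time is dominated by reading the 300M-line file once, so the choice of language matters less than avoiding repeated passes over it.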

5 answers

wegres, 2018-06-26
@wegres

A speed comparison of the mawk, nawk, and gawk system utilities with Java, Python, Perl, C++, and Ruby:
Don't MAWK AWK – the fastest and most elegant big data munging language!
brenocon.com/blog/2009/09/dont-mawk-awk-the-fastes...

sim3x, 2018-06-26
@sim3x

SQL
grep
*(any)

Dimonchik, 2018-06-26
@dimonchik2013

pingvinus.ru/note/compare-files-diff-in-linux

Mikhail Potanin, 2018-07-05
@potan

kdb

Alexander, 2018-07-22
@Crysdd

If the task is a one-off, it is best to use Unix command-line utilities.
If you need to match whole strings literally rather than patterns, fgrep is faster than grep.
For comparing large lists, I don't know of anything better than comm.
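
For readers unfamiliar with comm: it expects both inputs to be sorted and walks them in parallel, which is why it copes with large lists without holding them in memory. The Python sketch below illustrates that merge-style walk for the "only in the first list" case (roughly what `comm -23` prints). It is an illustration of the idea, not comm's actual implementation. The function name is hypothetical, both inputs are assumed to be sorted in Python's default string order, and duplicate lines are handled more simply than comm handles them.

```python
def only_in_first(sorted_a, sorted_b):
    """Yield items of sorted_a that do not occur in sorted_b.

    Both inputs must already be sorted in ascending order.
    """
    b_iter = iter(sorted_b)
    b = next(b_iter, None)
    for a in sorted_a:
        # Advance the second input until it catches up with the current item.
        while b is not None and b < a:
            b = next(b_iter, None)
        # If the second input is exhausted or has moved past a, a is unique.
        if b is None or a != b:
            yield a
```

Because both arguments are consumed lazily, they can be open file objects (with lines compared including their trailing newlines) as well as in-memory lists.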