linux
Chvalov, 2015-07-27 21:59:03

Which is better for removing duplicate lines from a large file on Linux: awk or uniq?

There is a 107 GB txt file sitting on a 109 GB hard drive.
What is the best tool to use to quickly get rid of the duplicate lines in this text file?
I tried the command awk '!seen[$0]++' text.txt. Everything started off beautifully and very quickly, but after 15-17 hours I could see it was still crawling through the file line by line, and it really started to bog down the computer. Now I am considering uniq text.txt > text_new.txt, but I don't know whether it will be any better than the previous command. Who can advise what to do?
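For reference when comparing the two: uniq on its own only collapses identical lines that are adjacent, so on an unsorted file it is usually paired with sort. A rough sketch of that pattern (the output filename here is just an example):

# uniq alone only removes adjacent duplicate lines, so sort first
sort text.txt | uniq > text_new.txt
# or let sort deduplicate by itself
sort -u text.txt > text_new.txt

The awk one-liner, by contrast, drops repeated lines wherever they appear, without sorting.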



1 answer
Ruslan Fedoseev, 2015-07-27
@Chvalov

Don't print the result to the screen and the speed will pleasantly surprise you ;)
awk and uniq are about the same in speed
I work with database dumps using sed and awk; a 250 GB text file takes about 5 minutes in total after the job is started...
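A minimal sketch of what the answer suggests, reusing the commands and filenames from the question: redirect the output to a file instead of letting it scroll in the terminal.

# same awk one-liner as in the question, but with the result
# written to a file rather than printed to the screen
awk '!seen[$0]++' text.txt > text_new.txt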
