linux
akimdi, 2017-08-12 01:43:54

How do I remove lines in a text file that duplicate another line except for trailing text?

I have a text file, user.js:
https://gist.githubusercontent.com/anonymous/8b1e7...
Many of its lines repeat the same setting.
For example, line 1189 is
"user_pref("geo.wifi.uri", ""); // comments;"
and line 12 is
"user_pref("geo.wifi.uri", "");"
To a text editor these are different lines, but logically they are the same setting.
There are many such pairs in the file.
How can I remove these duplicate lines?

1 answer(s)
longclaps, 2017-08-12
@akimdi

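#!/usr/bin/python3
import re

# Match a user_pref(...) call at the start of a line; anything after ");"
# (for example a trailing // comment) is not part of the match.
ptrn = re.compile(r'^\s*user_pref\((.+?)\);').search
unic = set()  # arguments of the user_pref calls seen so far
with open("user.js", "r") as fi, open("user_nodup.js", "w") as fo:
    for s in fi:
        m = ptrn(s)
        if m:
            # Compare only the arguments, so leading indentation does not matter.
            data = m.group(1).strip()
            if data in unic:
                print(s, end="")  # duplicate: show it and do not write it
                continue
            unic.add(data)
        # First occurrence, or a line that is not a user_pref call.
        fo.write(s)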
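
The same idea can be wrapped in a function that takes a list of lines, which makes it easy to try on a few sample lines before touching the real file. This is only a sketch: the names dedupe_prefs and PREF_RE are made up for this example, and it assumes every user_pref(...); call fits on a single line.

```python
import re

# Hypothetical helper pattern: a user_pref(...) call at the start of a line.
# Text after the closing ");" (such as a trailing // comment) is ignored.
PREF_RE = re.compile(r'^\s*user_pref\((.+?)\);')


def dedupe_prefs(lines):
    """Split lines into (kept, dropped), dropping repeated user_pref calls."""
    seen = set()
    kept, dropped = [], []
    for line in lines:
        m = PREF_RE.match(line)
        if m:
            key = m.group(1).strip()  # the arguments only, without comments
            if key in seen:
                dropped.append(line)  # same call was already kept earlier
                continue
            seen.add(key)
        kept.append(line)  # first occurrence or a non-pref line
    return kept, dropped


if __name__ == "__main__":
    sample = [
        'user_pref("geo.wifi.uri", "");\n',
        '// a plain comment line\n',
        '  user_pref("geo.wifi.uri", ""); // comments;\n',
    ]
    kept, dropped = dedupe_prefs(sample)
    print("kept:", len(kept), "dropped:", len(dropped))
```

Note that the first occurrence wins, so if the commented variant should be the one that survives, the order of the lines (or the choice of which line to keep) would have to be changed.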
