Notepad++
serj37, 2019-09-26 09:49:33

How to count duplicates in a text file?

I have a text file with one word per line. The file is 2 GB.
I need to find every duplicated word and print how many times each one repeats, both as exact matches and as matches that differ only in letter case.
Has anyone solved a similar problem?

1 answer(s)
Nick Sdk, 2019-09-26
@serj37

A PHP script:

<?php
$fileName = 'fileName.txt';
$doubles = [
    'withRegister' => [],
    'withoutRegister' => [],
];

$fileHandle = fopen($fileName, "r");
if ($fileHandle) {
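    while (($line = fgets($fileHandle)) !== false) {
        // fgets() keeps the trailing line break; strip it so that a final
        // line without "\n" and lines ending in "\r\n" are counted correctly.
        $line = rtrim($line, "\r\n");
        $lineWithoutRegister = mb_strtolower($line);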
        if (!isset($doubles['withRegister'][$line])) {
            $doubles['withRegister'][$line] = 0;
        }
        if (!isset($doubles['withoutRegister'][$lineWithoutRegister])) {
            $doubles['withoutRegister'][$lineWithoutRegister] = 0;
        }
        $doubles['withRegister'][$line]++;
        $doubles['withoutRegister'][$lineWithoutRegister]++;

    }
    fclose($fileHandle);
} else {
    throw new Exception('Failed to open the file for reading.');
}
echo "\nDuplicates (case-sensitive):\n";
foreach ($doubles['withRegister'] as $line => $count) {
    if ($count > 1) {
        echo "{$count} occurrences:\n{$line}\n";
    }
}
echo "\nDuplicates (case-insensitive):\n";
foreach ($doubles['withoutRegister'] as $line => $count) {
    if ($count > 1) {
        echo "{$count} occurrences:\n{$line}\n";
    }
}
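
The same counting idea can also be wrapped in reusable functions. This is only a sketch under a few assumptions: PHP 7.4+ with the mbstring extension, UTF-8 input, and enough memory to hold every distinct word (for a 2 GB file you may need to raise memory_limit). The names readLines and countDuplicates are hypothetical and not part of the script above.

```php
<?php
// Hypothetical helpers for illustration only.

/** Yield each line of a file without its trailing line break. */
function readLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        fclose($handle);
    }
}

/** Count repeated lines, exact and case-insensitive; keep only counts > 1. */
function countDuplicates(iterable $lines): array
{
    $exact = [];
    $folded = [];
    foreach ($lines as $line) {
        $lower = mb_strtolower($line, 'UTF-8');
        $exact[$line] = ($exact[$line] ?? 0) + 1;
        $folded[$lower] = ($folded[$lower] ?? 0) + 1;
    }
    $onlyRepeated = fn(int $count): bool => $count > 1;
    $exact = array_filter($exact, $onlyRepeated);
    $folded = array_filter($folded, $onlyRepeated);
    arsort($exact);   // most frequent first
    arsort($folded);
    return ['caseSensitive' => $exact, 'caseInsensitive' => $folded];
}

// Usage with an in-memory list; for a file, pass readLines('fileName.txt').
$result = countDuplicates(['Apple', 'apple', 'Apple', 'pear']);
// $result['caseSensitive']   is ['Apple' => 2]
// $result['caseInsensitive'] is ['apple' => 3]
```

Reading through a generator keeps only one line in memory at a time, so the arrays of counters are the only structures that grow with the input.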
