How to remove duplicate rows from a CSV file by the value in one column (PHP)?


PHP

Gangg, 2016-11-23 13:42:14

Please help me solve this problem.
I have a CSV file with the following content:
12/17/15;01:11:57;Name1;Surname1;RU;176.70.69.242;1;123;
12/18/15;05:45:43;Name2;Surname2;RU;456.8.39.432;1;123;
12/18/15;09:24:32;Name3;Surname3;RU;176.70.69.242;1;123;
The 6th column contains duplicates (here, 176.70.69.242 in the first and third rows).
How can I remove such duplicates from the CSV file?
In other words: if two or more rows have the same value in the 6th column, only one of them should remain in the file.
Thanks in advance for your help!
P.S. I'm looking for a PHP solution.


3 answer(s)

ukoHka, 2016-11-23
@ukoHka

$ips = array(); // unique values seen so far
... // processing code goes here; the rest runs inside the loop over rows
$row_items = explode(";", $row); // split the row into an array (split() was removed in PHP 7)
if (in_array($row_items[5], $ips)) { // the value has already been seen
    // drop the row
} else {
    $ips[] = $row_items[5]; // remember the value
}

Eugene Volf, 2016-11-23
@Wolfnsex

I think in this scenario (based on the comments above), I would do something like this:

$uniq_string = []; // values already seen in the unique column
$uniq_column = 5; // zero-based index of the unique column (the 6th column)
$data = file('file.csv'); // read the file into an array of lines

$f = fopen('new_file.csv', 'w');
for ($i = 0; $i < count($data); $i++) {
    $row_array = explode(';', $data[$i]);
    if (!in_array($row_array[$uniq_column], $uniq_string)) {
        $uniq_string[] = $row_array[$uniq_column];
        fwrite($f, $data[$i]);
    }
}
fclose($f);

Something like this. The code is approximate and untested, but I think the idea is clear: we write a new file that keeps only the rows with unique values.
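
For larger files, the same idea can be sketched without loading everything into memory and without the linear in_array() search. This is only a sketch under assumptions: dedupeCsvByColumn() is a hypothetical helper name, not something from the answers here, and it trims the compared value so that " 176.70.69.242 " and "176.70.69.242" count as the same.

```php
<?php
// Hypothetical helper: copy $src to $dst, keeping only the first row
// for each distinct value in column $col (zero-based index).
// Returns the number of rows written.
function dedupeCsvByColumn(string $src, string $dst, int $col, string $sep = ';'): int
{
    $in = fopen($src, 'r');
    $out = fopen($dst, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    $seen = [];  // values already written, stored as array keys for fast lookup
    $kept = 0;

    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip empty lines
        }
        $fields = explode($sep, $line);
        // Assumption: rows that lack the column all share the empty key
        $key = trim($fields[$col] ?? '');
        if (isset($seen[$key])) {
            continue; // duplicate value, drop the row
        }
        $seen[$key] = true;
        fwrite($out, $line . "\n");
        $kept++;
    }

    fclose($in);
    fclose($out);
    return $kept;
}
```

A call such as dedupeCsvByColumn('file.csv', 'new_file.csv', 5) would then filter on the 6th column. Writing to a separate file and renaming it afterwards avoids truncating the source while it is still being read.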

Gangg, 2016-11-23
@Gangg

Thanks, everyone. I have already solved the problem with my own hack, though) As always: I ask a question on the forum, and half an hour later the insight comes)
Here is the code, in case it comes in handy for someone)

$baseCSV = file('base.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES); // read the CSV file into an array of lines

$arrUniqFinish = [];
foreach ($baseCSV as $itemBaseCSV) {
    $arrLineCsv = explode(";", $itemBaseCSV); // split the line into fields on ";"
    // Use the whole line (its first 8 fields) as the key and the field we deduplicate on as the value
    $arrUniqFinish[implode(";", array_slice($arrLineCsv, 0, 8))] = $arrLineCsv[5];
}

$arrUniqFinish = array_unique($arrUniqFinish); // drop duplicate values; the first key for each value is kept

$finishSavedCsv = [];
foreach ($arrUniqFinish as $keyArr => $valueArr) {
    $finishSavedCsv[] = $keyArr; // collect the keys (lines), now free of duplicates in column 5 (counting from 0)
}

file_put_contents('base.csv', implode("\n", $finishSavedCsv)); // overwrite the CSV file with the unique lines
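
The approach above relies on array_unique() preserving keys and keeping the first key for each repeated value. A minimal illustration with made-up two-field rows (the data is hypothetical and only demonstrates that behaviour):

```php
<?php
// Keys are whole lines, values are the column being deduplicated.
// Sample rows are invented for illustration only.
$byLine = [
    'a;1.1.1.1' => '1.1.1.1',
    'b;2.2.2.2' => '2.2.2.2',
    'c;1.1.1.1' => '1.1.1.1', // same value as the first row
];

// array_unique() keeps the first key for each distinct value,
// so array_keys() returns the lines that survive.
$uniqueLines = array_keys(array_unique($byLine));

print_r($uniqueLines);
```

Only the lines starting with "a" and "b" remain. Because lines are used as array keys, two fully identical lines also collapse into one before array_unique() runs, which is harmless here since they would be duplicates anyway.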