Debian
MrQwerty, 2017-02-09 16:29:55

How to sort a large number of photos?

Good afternoon. I decided to put together a copy of our family archive. Having collected everything from all sources, I ended up with a large nested structure of folders with photos and video content (about 100 gigabytes). The photos need to be sorted into an ordered structure (say, Camera/YYYY/MM/DD). My idea was to write a small program for the processing (the machine runs Debian). At the moment the algorithm is as follows:
1. Find all jpg files in the selected folder.
2. For each jpg, check the headers for integrity (as far as I know, JPEG files have them; you never know: the file may be damaged, or may not be a photo at all, just a file with a .jpg extension).
3. Write information about each file (name and relative path, hash, EXIF tags) to a CSV file (or something similar).
4. Find duplicates and apply filters (for example, all EXIF tags must be present).
5. Copy or move the remaining files into the specified folder structure.
What would be the best way to implement this, and can the algorithm be optimized?


2 answers
Edward Tibet, 2017-02-09
@MrQwerty

From the series "how to do everything properly in the console":
1. find
2. This needs clarifying: check what, exactly? Do you mean EXIF? There are many corner cases here (a photo saved without EXIF data, and so on). The simplest tool is exiv2.
3. For the file information: exiv2 again.
4. For duplicates: jdupes (it is better than fdupes).
5. find with -exec mv.
Wrap it all in a shell script and debug; a rough sketch follows below.
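A rough end-to-end sketch of that pipeline. It is a sketch only, not a finished tool: the paths are placeholders, and the EXIF keys used (Exif.Photo.DateTimeOriginal for the date, Exif.Image.Model for the camera) are an assumption about which tags the photos actually carry.

#!/bin/sh
# Placeholder paths: point these at the real archive and target.
SRC=/path/to/archive
DST=/path/to/sorted

# Step 4 first: drop byte-identical duplicates (-d delete, -N no prompt).
jdupes -r -d -N "$SRC"

# Steps 1, 2, 3, 5: walk the tree, read EXIF, move into Camera/YYYY/MM/DD.
find "$SRC" -type f -iname '*.jpg' | while read -r f; do
    # exiv2 fails on files it cannot parse, which weeds out damaged files
    # and non-photos that merely have a .jpg extension.
    date=$(exiv2 -g Exif.Photo.DateTimeOriginal -Pv "$f" 2>/dev/null) || continue
    model=$(exiv2 -g Exif.Image.Model -Pv "$f" 2>/dev/null)
    [ -n "$date" ] || continue        # photo saved without EXIF: skip it
    # "2017:02:09 16:29:55" -> "2017 02 09"
    set -- $(echo "$date" | cut -d' ' -f1 | tr ':' ' ')
    dir="$DST/${model:-unknown}/$1/$2/$3"
    mkdir -p "$dir"
    mv -n "$f" "$dir"/                # -n: never overwrite an existing file
done

The read loop breaks on filenames that contain newlines; for a one-off pass over a family archive that is usually acceptable.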

abcd0x00, 2017-02-13

What would be the best way to implement this, and can the algorithm be optimized?

It is better to write several separate scripts. Each script should take a directory and do its own job on it. Then one more script is written that connects these independent scripts into a single system.
For example, here is the first script: it finds all jpg files in a given folder and transfers them to a new folder intended for further processing (what that processing will be, it does not know and should not know).
The script takes the folder name, the desired file extension, and the name of the new folder to save into.
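A minimal sketch of such a script (the name collect.sh and the argument order are illustrative, not fixed):

#!/bin/sh
# collect.sh SRC EXT DST -- move every *.EXT file found under SRC into DST.
src=$1 ext=$2 dst=$3
mkdir -p "$dst"
# -iname matches the extension case-insensitively (.jpg, .JPG, ...);
# mv -n refuses to overwrite, so name collisions are skipped rather than lost.
find "$src" -type f -iname "*.$ext" -exec mv -n {} "$dst"/ \;

Called as, say: ./collect.sh /archive jpg /work/incoming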
2. For each jpg, check the headers for integrity (as far as I know, JPEG files have them; you never know: the file may be damaged, or may not be a photo at all, just a file with a .jpg extension).
The second script likewise takes a folder and checks the files in it for integrity. What is the result of its work? For example, moving the corrupt files into a separate folder. Again, it does not know what will later be done with the intact or the corrupt files.
The script takes the folder name, the desired file extension, and the name of the new folder to save into.
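One way to sketch the check; djpeg (Debian package libjpeg-turbo-progs) is an assumption here, and any decoder that reports failure through its exit status would do:

#!/bin/sh
# check.sh SRC EXT BAD -- move files that fail to decode as JPEG into BAD.
src=$1 ext=$2 bad=$3
mkdir -p "$bad"
find "$src" -type f -iname "*.$ext" | while read -r f; do
    # djpeg exits non-zero if the file is not a decodable JPEG, which
    # catches both damaged photos and non-photos with a .jpg extension.
    djpeg "$f" >/dev/null 2>&1 || mv -n "$f" "$bad"/
done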
3. Write information about each file (name and relative path, hash, EXIF tags) to a CSV file (or something similar).
The third script takes a folder and writes information about the files in it to a CSV file.
The script takes the folder name, the file extension to look for, and the name of the file to save to.
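A sketch; the column set and the EXIF key are illustrative, and semicolons are used as separators because EXIF values may themselves contain commas:

#!/bin/sh
# index.sh SRC EXT OUT -- write path, md5 hash and EXIF date of every
# matching file under SRC into the CSV file OUT.
src=$1 ext=$2 out=$3
printf 'path;md5;datetime\n' > "$out"
find "$src" -type f -iname "*.$ext" | while read -r f; do
    hash=$(md5sum "$f" | cut -d' ' -f1)
    date=$(exiv2 -g Exif.Photo.DateTimeOriginal -Pv "$f" 2>/dev/null)
    printf '%s;%s;%s\n' "$f" "$hash" "$date" >> "$out"
done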
4. Find duplicates and apply filters (for example, all EXIF tags must be present).
The fourth script takes a folder, looks for duplicates in it and deletes them.
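Duplicate detection is easiest to delegate; a sketch using jdupes, which the other answer also recommends:

#!/bin/sh
# dedup.sh SRC -- delete byte-identical duplicates under SRC,
# keeping the first copy jdupes finds in each duplicate set.
jdupes -r -d -N "$1"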
The fifth script takes a folder and applies the filters to the files in it.
(I hope it is clear that the fourth and fifth cannot be combined into one script.)
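A sketch of such a filter, assuming (as in the question's example) that the required EXIF tags are the shooting date and the camera model:

#!/bin/sh
# filter.sh SRC BAD -- move photos that lack the required EXIF tags into BAD.
src=$1 bad=$2
mkdir -p "$bad"
find "$src" -type f -iname '*.jpg' | while read -r f; do
    d=$(exiv2 -g Exif.Photo.DateTimeOriginal -Pv "$f" 2>/dev/null)
    m=$(exiv2 -g Exif.Image.Model -Pv "$f" 2>/dev/null)
    [ -n "$d" ] && [ -n "$m" ] || mv -n "$f" "$bad"/
done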
The sixth script takes a folder and moves the files from it into the target folder structure.
The script takes the folder name and the name of the new folder to save into.
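A sketch, assuming the earlier scripts have already moved aside the files without EXIF data:

#!/bin/sh
# arrange.sh SRC DST -- file photos into DST/Camera/YYYY/MM/DD by EXIF date.
src=$1 dst=$2
find "$src" -type f -iname '*.jpg' | while read -r f; do
    date=$(exiv2 -g Exif.Photo.DateTimeOriginal -Pv "$f" 2>/dev/null)
    model=$(exiv2 -g Exif.Image.Model -Pv "$f" 2>/dev/null)
    [ -n "$date" ] || continue        # defensive: skip files without a date
    # "2017:02:09 16:29:55" -> "2017 02 09"
    set -- $(echo "$date" | cut -d' ' -f1 | tr ':' ' ')
    dir="$dst/${model:-unknown}/$1/$2/$3"
    mkdir -p "$dir"
    mv -n "$f" "$dir"/
done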
Finally, you write the seventh script, the one that manages all of these: it alone knows which folder to hand to which script at each stage.
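Sketched with the illustrative script names used above; all paths are placeholders:

#!/bin/sh
# run.sh -- the driver: it alone knows which folder goes to which stage.
set -e    # stop at the first stage that fails
./collect.sh /archive jpg /work/incoming
./check.sh   /work/incoming jpg /work/broken
./index.sh   /work/incoming jpg /work/index.csv
./dedup.sh   /work/incoming
./filter.sh  /work/incoming /work/noexif
./arrange.sh /work/incoming /work/sorted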
