Search for the desired column in csv. How to implement it algorithmically?

Vadim Apenko, 2018-03-20 13:16:18

Python

Good afternoon, Toster dwellers.
At work I had to write a simple Python script that selects photos by geotag.
That is where the problem came up.
The setup:
A directory with a bunch of photos.
A CSV file with several columns; the first one is always the photo's file name.
Somewhere among the remaining columns are the lat and lon coordinates (always as an adjacent pair, lat first, then lon).
The trouble is that the column order varies between files. For example, one CSV has
photoname.jpg,lat,lon,,,,
and in another
photoname.jpg,angleX,angleY,lat,lon,,,
So, any ideas on how to make the CSV parsing step find the coordinate columns by itself?
As output I want a list like

[[photoname,lat,lon],[photoname,lat,lon],[photoname,lat,lon],....]

The coordinates always look like XX.XXXXXX.. (the number of digits after the dot can vary, but is never less than 6).
The problem is that the other columns also contain numbers, such as "-43.0056".
This can probably be done with regular expressions, but I don't really understand them yet... I'm not good at them.

Vladimir Varlamov, 2018-03-20
@k4m454k

For each line, get lat,lon via findall() with a regexp of the form ,(-?\d{1,2}\.\d{6,},-?\d{1,2}\.\d{6,}), (https://regex101.com/r/dUucUj/2), which relies only on there being 2 consecutive numbers, each with at least 6 digits after the decimal point.
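
A minimal sketch of how that regexp could be applied line by line. The pattern is the one from the answer; the function name extract_coords and the sample rows are made up for illustration, and the code assumes every coordinate pair is followed by a comma, as in the rows shown in the question.

```python
import re

# Pattern from the answer: two adjacent fields, each with 1-2 integer digits
# and at least 6 digits after the decimal point, with a comma on both sides.
COORD_PAIR = re.compile(r",(-?\d{1,2}\.\d{6,},-?\d{1,2}\.\d{6,}),")


def extract_coords(lines):
    """Return [[photoname, lat, lon], ...] for lines containing a coordinate pair."""
    result = []
    for line in lines:
        line = line.strip()
        pairs = COORD_PAIR.findall(line)  # list of "lat,lon" strings
        if not pairs:
            continue  # no coordinate pair found on this line
        lat, lon = pairs[0].split(",")
        photoname = line.split(",", 1)[0]  # first column is the file name
        result.append([photoname, float(lat), float(lon)])
    return result


# Made-up sample rows in the two layouts from the question.
sample = [
    "a.jpg,55.751244,37.618423,,,,",
    "b.jpg,-43.0056,12.5,59.934280,30.335099,,,",
]
rows = extract_coords(sample)
```

Note that \d{1,2} only allows two digits before the dot, which matches the XX.XXXXXX form from the question but would miss longitudes of 100 degrees or more; the pattern would also miss a pair sitting at the very end of a line with no trailing comma.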

Vladimir Olohtonov, 2018-03-20
@sgjurano

I recommend using pandas instead of reinventing the wheel :)
If your CSV has headers, it will detect them and you can access the data directly by column name.
PS: or, at worst, the csv module.
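
A rough sketch of the header-based idea using the standard csv module. It only applies if the files really have a header row, which the question does not state; the column names "photo", "lat" and "lon" and the sample data are assumptions for illustration.

```python
import csv
import io

# Hypothetical file contents: the header names are assumed, not taken
# from the question. io.StringIO stands in for an opened CSV file.
data = io.StringIO(
    "photo,angleX,angleY,lat,lon\n"
    "a.jpg,-43.0056,12.5,55.751244,37.618423\n"
    "b.jpg,1.25,-7.5,59.934280,30.335099\n"
)

reader = csv.DictReader(data)  # takes column names from the first row
# Columns are looked up by name, so their position in the file does not matter.
rows = [[r["photo"], float(r["lat"]), float(r["lon"])] for r in reader]
```

With pandas, pandas.read_csv followed by selecting the same columns by name would play the same role.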

LODIII, 2018-03-21
@LODIII

Many smart books say a regular expression is appropriate only when the code has been working for a long time and you want to improve it once, refactor it, and forget about it.
The key condition is that no other programmer will ever have to read this code again.
I suggest instead splitting each line into a list as soon as you read it, stroka1 = str.split(',') (where str is the line),
then checking how many characters each field has after the decimal point and labeling those columns accordingly,
like this:

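line = 'photoname.jpg,12.5,-43.0056,55.751244,37.618423'  # made-up sample row
stroka1 = line.split(',')
for st in stroka1:
    arr = st.split('.')
    if len(arr) > 1:  # the field contains a dot
        if len(arr[1]) > 5:  # at least 6 characters after the dot
            print('lat or lon')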
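
One possible way to extend this split-based check into the [photoname, lat, lon] rows the question asks for. This is a sketch, not the answer author's code: the helper names looks_like_coord and find_lat_lon are made up, and it assumes lat and lon are the first two adjacent fields that both have at least 6 digits after the dot.

```python
def looks_like_coord(field):
    """True if the field is a decimal number with at least 6 digits after the dot."""
    whole, dot, frac = field.partition(".")
    if whole.startswith("-"):
        whole = whole[1:]  # allow a single leading minus sign
    return dot == "." and whole.isdecimal() and frac.isdecimal() and len(frac) >= 6


def find_lat_lon(line):
    """Return [photoname, lat, lon], or None if no adjacent coordinate pair exists."""
    fields = line.strip().split(",")
    # Skip field 0 (the file name) and look for two coordinate-like neighbours.
    for i in range(1, len(fields) - 1):
        if looks_like_coord(fields[i]) and looks_like_coord(fields[i + 1]):
            return [fields[0], float(fields[i]), float(fields[i + 1])]
    return None
```

Calling find_lat_lon on every line of the file and dropping the None results gives the list of rows. Requiring two neighbouring matches is what keeps a lone value such as "-43.0056" from being picked up.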