AI algorithm for processing text and extracting columns of data
I have structured text laid out as a table, like this (some rows have empty cells, and a long item name can wrap onto a second line):
Item Code C1 C2 C3
Cats 1000 20 30 45
Dogs 2000 13 49 -40
Parrots 3000 45 -90
green-feathered
Pigs 4000 10
Hamsters 5000 67
AI has nothing to do with it. What you need here is an algorithm, a plain ordinary algorithm.
I would try to solve the problem like this:
- The first column (the item name) is always present; everything after it is data.
- For each line, find the character positions of all its values. In the parrots line, for example, the first value after the code sits much further from the previous value than usual, which means a value is missing before it. In the same way, the distance between values lets you infer which column each value belongs to.
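A minimal Python sketch of this idea, under stated assumptions: the table's original spacing was lost, so the sample uses a hypothetical layout with values left-aligned under their headers; item names are single words; and a line holding a single token is treated as a wrapped item name. Turning "distance" into a column choice by comparing each value's position with the header positions is one possible interpretation, not necessarily the answer author's exact method. `token_spans` and `parse_by_position` are illustrative names.

```python
import re

# Hypothetical sample: the question's original spacing was lost, so this
# alignment (values left-aligned under their headers) is assumed.
SAMPLE = """\
Item            Code  C1    C2    C3
Cats            1000  20    30    45
Dogs            2000  13    49    -40
Parrots         3000        45    -90
green-feathered
Pigs            4000  10
Hamsters        5000              67
"""


def token_spans(line):
    """Return (start, end, text) for every whitespace-separated token."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", line)]


def parse_by_position(text):
    lines = text.splitlines()
    header = token_spans(lines[0])
    first = header[0][2]                      # name of the first column
    names = [tok for _, _, tok in header[1:]]
    # Horizontal centre of every data header (all columns after the first).
    centres = [(s + e) / 2 for s, e, _ in header[1:]]
    rows = []
    for line in lines[1:]:
        tokens = token_spans(line)
        if not tokens:
            continue
        # Assumption: a line with a single token continues the previous item name.
        if len(tokens) == 1 and rows:
            rows[-1][first] += " " + tokens[0][2]
            continue
        row = {first: tokens[0][2], **{n: None for n in names}}
        for s, e, tok in tokens[1:]:
            centre = (s + e) / 2
            # The value goes to the header whose centre is horizontally nearest.
            nearest = min(range(len(centres)), key=lambda i: abs(centres[i] - centre))
            row[names[nearest]] = tok
        rows.append(row)
    return rows


for row in parse_by_position(SAMPLE):
    print(row)
```

Each value is matched to the nearest header independently, so heavy sideways drift could send two values to the same column; the sketch does not detect that case.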
AI has nothing to do with it.
If the columns, as you go down the table, never drift sideways so far that a value lands where the neighboring column belongs, then you only need to find the character positions that hold a space in every line of the file, from the very top to the very bottom ("whitespace columns"). Then merge adjacent whitespace columns, split each line at those positions, and check whether each resulting piece contains a number or is empty (the number is missing). This algorithm is deterministic and has no parameters (there is nothing to configure in it).
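The whitespace-column algorithm can be sketched in Python as below. The sample layout is assumed (the question's spacing did not survive), tabs are expanded to spaces before scanning, and `field_bounds` and `split_fixed_width` are hypothetical names. As the answer says, there is nothing to configure.

```python
# Hypothetical sample: the spacing of the question's table was lost, so this
# fixed-width layout (values left-aligned under their headers) is assumed.
SAMPLE = """\
Item            Code  C1    C2    C3
Cats            1000  20    30    45
Dogs            2000  13    49    -40
Parrots         3000        45    -90
green-feathered
Pigs            4000  10
Hamsters        5000              67
"""


def field_bounds(lines):
    """Find whitespace columns, merge adjacent ones, return (start, end) per field."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    # A position is a whitespace column if it is blank in every line.
    blank = [all(line[i] == " " for line in padded) for i in range(width)]
    bounds, start = [], None
    for i, is_blank in enumerate(blank + [True]):  # sentinel closes the last field
        if not is_blank and start is None:
            start = i  # first non-blank position after a blank run
        elif is_blank and start is not None:
            bounds.append((start, i))  # a run of adjacent blanks acts as one gap
            start = None
    return bounds


def split_fixed_width(text):
    """Cut every line at the same positions; an empty slice means a missing value."""
    lines = [line.expandtabs() for line in text.splitlines() if line.strip()]
    bounds = field_bounds(lines)
    return [[line[s:e].strip() or None for s, e in bounds] for line in lines]


for cells in split_fixed_width(SAMPLE):
    print(cells)
```

The wrapped "green-feathered" line comes out as a row whose data cells are all `None`; attaching it to the previous item is a separate step. In this assumed layout the item column and the Code column are separated by a single blank position, so one stray character there would merge the two fields.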
If that condition does not hold and the columns drift a lot, you can run the same algorithm locally instead of over the whole file, for example on 3, 4, or 5 neighboring lines. This matches how a person reads the table: within 5 lines, a column cannot move into its neighbor's place. In the local version you may have to search for suitable parameters (the number of consecutive lines to look at, the maximum sideways shift, etc.).
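A sketch of the local variant, assuming a window of `window` consecutive lines centered on the line being split (clamped at the top and bottom of the text). The drifting sample is made up to show the difference: its data columns shift one character to the right on every line, so over the whole text no blank position separates C1 from C2, while every 3-line window still has one. The window size is the only parameter here; a maximum-shift limit is not implemented, and all names are hypothetical.

```python
# Hypothetical sample (assumed): data columns drift one character right per
# line, so no whitespace column runs through the whole text between C1 and C2.
DRIFTING = """\
Item C1   C2
Cats  20   30
Dogs   13   49
Pigs    10
Hens          67
"""


def field_bounds(lines):
    """(start, end) of each field, separated by runs of all-blank positions."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    blank = [all(line[i] == " " for line in padded) for i in range(width)]
    bounds, start = [], None
    for i, is_blank in enumerate(blank + [True]):  # sentinel closes the last field
        if not is_blank and start is None:
            start = i
        elif is_blank and start is not None:
            bounds.append((start, i))
            start = None
    return bounds


def split_local(text, window=3):
    """Split each line using whitespace columns found only in nearby lines."""
    lines = [line.expandtabs() for line in text.splitlines() if line.strip()]
    rows = []
    for i, line in enumerate(lines):
        # `window` consecutive lines around line i, clamped to the text.
        lo = max(0, i - window // 2)
        hi = min(len(lines), lo + window)
        lo = max(0, hi - window)
        bounds = field_bounds(lines[lo:hi])
        rows.append([line[s:e].strip() or None for s, e in bounds])
    return rows


print(field_bounds(DRIFTING.splitlines()))  # [(0, 4), (5, 16)]: C1 and C2 merge
for cells in split_local(DRIFTING):
    print(cells)
```

A window can lose a column entirely when every line in it leaves that column empty, so the number of cells per line is not guaranteed to be constant; mapping local fields back to the header names would be an extra step.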