Algorithms
Vladimir Korotenko, 2020-06-18 21:25:17

I'm asking the hive mind for help with parsing addresses: which option is better?

The input is a loosely structured address. In theory, the format is as follows:

Postal code, Country, Region, City, Street, house number, and all sorts of extras.

Here is what I have run into:

  1. Missing commas
  2. Omitted parts
  3. Missing postal codes
  4. Missing commas
  5. Omissions in any combination, with no separator at all
  6. Everything in reverse order; for some reason this mostly affects Chechnya and, unexpectedly, Dagestan (why????)
  7. Junk made of mixed spaces and \t
  8. Spaces used in place of commas (Nizhny Novgorod or Novo Voronezh, Nizhny Ustyug)
  9. Spelling variations (Nizhny-Novgorod, Nizhny - Novgorod, Nizhny Novgorod)
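The whitespace junk and the hyphen variations from the list above can probably be cleaned up before any real parsing. A minimal sketch, assuming simple regex rules are enough; the function names `normalize_address` and `name_key` are made up for illustration, and the sample strings are invented:

```python
import re

def normalize_address(raw: str) -> str:
    """Collapse whitespace noise and unify spacing around hyphens and commas."""
    text = re.sub(r"[ \t\r\n]+", " ", raw)   # tabs and repeated spaces -> one space
    text = re.sub(r"\s*-\s*", "-", text)     # "Nizhny - Novgorod" -> "Nizhny-Novgorod"
    text = re.sub(r"\s*,\s*", ", ", text)    # uniform spacing after commas
    return text.strip(" ,")

def name_key(name: str) -> str:
    """Comparison key that ignores spaces, hyphens, and letter case."""
    return re.sub(r"[\s-]+", "", name).lower()

print(normalize_address("Nizhny \t-  Novgorod ,\tLenina  5"))
print(name_key("Nizhny-Novgorod") == name_key("Nizhny Novgorod"))
```

The key only helps when comparing a candidate against a list of known names; on its own it cannot tell where a two-word city name ends in a string without commas.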


The solutions seem to lie on the surface. More precisely, here is what I have gone through:
1. Parse everything against logical formats: if the leading postal code is longer than 6 digits or does not parse as an integer, then the string is garbage and not an address.
2. If there are no commas, split on spaces; but then a street like "Revolution of 1905" is in trouble.
3. Take FIAS and match each address against it; alas, not great.
4. Take DaData and parse with it; also a so-so option.
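Options 1 and 2 above can be sketched roughly like this. It is only an illustration: the helper names `split_address` and `extract_postal_code` are hypothetical, the sample addresses are invented, and the six-digit rule is taken from option 1. Since the postal code is sometimes missing, the sketch returns `None` for it instead of rejecting the whole string.

```python
import re
from typing import List, Optional, Tuple

def split_address(raw: str) -> List[str]:
    """Split on commas when there are any, otherwise fall back to whitespace."""
    parts = raw.split(",") if "," in raw else raw.split()
    return [p.strip() for p in parts if p.strip()]

def extract_postal_code(parts: List[str]) -> Tuple[Optional[str], List[str]]:
    """Treat the first part as a postal code only if it is exactly six digits."""
    if parts and re.fullmatch(r"[0-9]{6}", parts[0]):
        return parts[0], parts[1:]
    return None, parts

print(extract_postal_code(split_address("603000, Nizhny Novgorod, Lenina 5")))
# The whitespace fallback breaks multi-word street names into separate tokens:
print(split_address("603000 Revolution of 1905 street 7"))
```

The second call shows the weakness of option 2: without commas, "Revolution of 1905" becomes three separate tokens, and the year looks just like a house number.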

So, I'm opening this up for discussion.

Perhaps someone has thoughts on how to bring this chaos under the banner of the Emperor!


2 answer(s)
d-stream, 2020-06-18
@firedragon

ParseRussianAddressV3 ? )
I think splitting the string into the presumed entities is the smallest and simplest task... but after that, once you have something like 8-9 field-like pieces, you go through the possible assignments looking for the best match...
But... "105037, Parkovaya street 3- I, Moscow" may suddenly turn out to be "105037_3rd Parkovaya"...
P.S. Did the address that breaks the Diadoc parsers come through?
Here is another one from the same place: https://github.com/diadoc/diadocsdk-csharp/issues/227
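The "go through the options for the best match" idea might look roughly like the brute-force sketch below. Everything in it is an assumption for illustration: the `KNOWN` gazetteer is a toy stand-in for a real reference, the scoring rules are deliberately crude, and the names `score` and `best_assignment` are made up.

```python
from itertools import permutations

# Hypothetical toy gazetteer; a real one would come from an address reference.
KNOWN = {
    "city": {"moscow"},
    "street": {"parkovaya", "3rd parkovaya"},
}
FIELDS = ("postal_code", "city", "street", "house")

def score(field: str, value: str) -> int:
    """Return 1 if the piece looks plausible for the field, else 0."""
    v = value.lower()
    if field == "postal_code":
        return 1 if len(v) == 6 and v.isdigit() else 0
    if field == "house":
        return 1 if v[:1].isdigit() and len(v) <= 5 else 0
    return 1 if v in KNOWN.get(field, ()) else 0

def best_assignment(pieces):
    """Try every assignment of pieces to distinct fields and keep the best one."""
    best, best_score = {}, -1
    # Yields nothing when there are more pieces than fields.
    for fields in permutations(FIELDS, len(pieces)):
        total = sum(score(f, p) for f, p in zip(fields, pieces))
        if total > best_score:
            best, best_score = dict(zip(fields, pieces)), total
    return best, best_score

print(best_assignment(["105037", "Parkovaya", "12", "Moscow"]))
```

This handles shuffled or reversed field order, but the search grows factorially with the number of fields, and it does nothing for the "Parkovaya street 3- I" ambiguity, where the same piece is a plausible house number and a plausible part of the street name.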

freeExec, 2020-06-18
@freeExec

https://github.com/openvenues/libpostal
