How to identify a language pair from 2 documents?


Text Processing Automation

swcalc, 2015-12-03 16:11:51

Hello. Given two documents, how (in theory, without code =) ) can you work out that 2 words in the English document correspond to 3 words in the Russian one?
I need at least the beginnings of an idea.
How should they be compared? Character by character is nonsense. By compiling a database beforehand?
For example, hello = (hello||hello||hello), and build on that?


1 answer(s)

Vladimir Sergeev, 2015-12-07
@swcalc

If you need a good commercial solution, you need a neural network fed with hundreds of thousands of language pairs, in the form of well-written, complete, topic-specific texts together with their translations into the target language, plus a long training period. Over time it should learn to recognize the most frequent semantic equivalents in context at the level of sentences and, probably, typical phrases. But I think the fact that even companies like Google, which have their own translation web service and access to petabytes of language pairs, still cannot produce anything comparable to the work of even an amateur translator is a good indicator of how hard the task is.
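A neural network does not fit into a forum answer, but the underlying idea of learning frequent equivalents from aligned text can be shown with plain co-occurrence counting. The sketch below is a stand-in, not a neural model: it scores word pairs with the Dice coefficient over a made-up toy corpus of already aligned sentences. The names `dice_table` and `best_equivalent` are hypothetical, and real data would need far more pairs and proper handling of word forms.

```python
from collections import Counter
from itertools import product


def dice_table(sentence_pairs):
    """Score (source word, target word) pairs by co-occurrence in aligned sentences."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word is a candidate match for every target word
        # in the same aligned sentence pair.
        joint.update(product(src_words, tgt_words))
    # Dice coefficient: 2 * joint / (count_src + count_tgt), in the range (0, 1].
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_equivalent(table, word):
    """Return the highest-scoring target word for `word`, or None if unseen."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Toy, invented parallel corpus (assumption: sentences are already aligned).
pairs = [
    ("good morning", "доброе утро"),
    ("good evening", "добрый вечер"),
    ("early morning", "раннее утро"),
    ("late evening", "поздний вечер"),
]
table = dice_table(pairs)
print(best_equivalent(table, "morning"))  # prints: утро
```

Even in this toy corpus "good" has no single best match, because it appears as both "доброе" and "добрый". That is a small taste of why counting words alone does not get you far.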
If you don't need anything that good, you can try to cobble together a simple translator from open dictionaries, which already contain most of the commonly used language pairs at the level of words and phrases. But this has already been implemented by many people (remember "Promt"?), and it is absolutely useless for translating large amounts of text.
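As a rough illustration of the dictionary route, here is a minimal sketch that checks which known phrase pairs occur in both documents. This is how a 2-word English phrase can be matched to a 3-word Russian one. The `PHRASES` dictionary is a tiny hypothetical stand-in for a real open dictionary, and `tokenize`, `contains` and `find_phrase_pairs` are names invented for this example.

```python
# Hypothetical toy dictionary: English phrase -> possible Russian equivalents.
PHRASES = {
    "hello": {"привет", "здравствуйте"},
    "thank you": {"спасибо", "благодарю вас"},
    "of course": {"конечно", "само собой разумеется"},
}

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase the text and split it into words without edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def contains(tokens, phrase_tokens):
    """Check whether phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def find_phrase_pairs(en_text, ru_text, dictionary=PHRASES):
    """Return (English phrase, Russian phrase) pairs present in both texts."""
    en_tokens, ru_tokens = tokenize(en_text), tokenize(ru_text)
    found = []
    for en_phrase, ru_variants in dictionary.items():
        if not contains(en_tokens, en_phrase.split()):
            continue
        for ru_phrase in sorted(ru_variants):
            if contains(ru_tokens, ru_phrase.split()):
                found.append((en_phrase, ru_phrase))
    return found


pairs = find_phrase_pairs("Of course, thank you!",
                          "Само собой разумеется, благодарю вас!")
print(pairs)
```

This only does exact matching, so any inflected or reordered form that is not listed in the dictionary is missed.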
Of course, you will never be able to translate accurately word by word, just as you can't transplant a deer's leg onto a cow and hope that it will run faster as a result.