Where can I find out how many words start with a given letter in each language?
In other words, I need to distribute words across tables by their first letter (to keep each table small), and I need to decide which letters get a separate table and which can be merged into a shared one (because few words start with them).
Thank you.
Alternatively:
# sort -u is needed because uniq alone only drops adjacent duplicates
tr -s '[:space:]' '\n' < big_dict_en.txt | sort -u | grep -c '^a'
First, there is a very large volume of text. Second, the search in the database will be organized like Google's: each word is stored with its offset in the article, and the search then matches by words and prefers the articles where the distance between the query words is minimal, and so on (there are many criteria, not only morphological ones). At large volumes these tables are supposed to be stored on different servers, and the search will run in parallel. In general, a lot of things (almost a Google of my own, ha).
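A minimal sketch of the word-offset idea, assuming a simple in-memory structure: each word maps to the positions where it occurs in each article, and results are ordered by how close together the query words appear. The names build_index, min_span and search are hypothetical, and the brute-force distance calculation is only meant for small examples, not for the production design described above.

```python
from collections import defaultdict
from itertools import product


def build_index(docs):
    """Map each word to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def min_span(index, words, doc_id):
    """Smallest distance between the first and last query word, or None."""
    position_lists = [index.get(w, {}).get(doc_id) for w in words]
    if not all(position_lists):
        return None
    # Brute force over all position combinations; fine for a sketch only.
    return min(max(combo) - min(combo) for combo in product(*position_lists))


def search(index, query):
    """Return ids of documents containing every query word, closest match first."""
    words = query.lower().split()
    if not words:
        return []
    candidates = set.intersection(*(set(index.get(w, {})) for w in words))
    return sorted(candidates, key=lambda d: min_span(index, words, d))


docs = {
    1: "the quick brown fox",
    2: "quick thinking saves the lazy brown dog",
}
index = build_index(docs)
results = search(index, "quick brown")
```

Here document 1 ranks first because "quick" and "brown" are adjacent in it, while in document 2 they are five positions apart.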
As for the question, you can, of course, just look at how many pages each letter takes up in a dictionary, but I don't have all the necessary dictionaries.
Also, I would like to find a synonym database for these languages somewhere, so that I can tie it into all this.
Unfortunately, I can't hand everything over to Google: the information to be searched is not public.
And the search engines I have seen are not optimized for distributing the load and the database across multiple servers.
(If someone can point me to one, I won't need to reinvent the wheel.)
That won't work: there are also requirements for integration with existing systems, and it is easier to write my own than to adapt Sphinx. I've looked at it too.
I bought a book on it; I'll read it again this evening. I might really try it.
Thank you. I just got back from the library, where I simply surrounded myself with dictionaries and counted the number of pages each letter occupies.
1. A dictionary usually does not contain every word, and it does not list inflected word forms. So the result may not be exactly what you need.
2. You don't need the exact counts; the ratio is enough. To get it, take a modest number of documents for each language (for example, from Wikipedia) and calculate the distribution in those documents.
3. In fact, you can avoid all of this if you don't split by first letter at all, but instead compute a hash of the word and take the remainder of dividing it by the desired number of tables.
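Points 2 and 3 can be sketched roughly like this. The function names first_letter_shares and table_for are made up for illustration, the sample text is a stand-in for real documents, and CRC32 is just one convenient stable hash (Python's built-in hash() for strings changes between runs, so it is not suitable for choosing a table).

```python
import re
import zlib
from collections import Counter


def first_letter_shares(text):
    """Fraction of distinct words starting with each letter in a sample text."""
    # [^\W\d_]+ matches runs of letters in any alphabet.
    words = set(re.findall(r"[^\W\d_]+", text.lower()))
    counts = Counter(word[0] for word in words)
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}


def table_for(word, table_count):
    """Pick a table number by hashing the word instead of using its first letter."""
    return zlib.crc32(word.lower().encode("utf-8")) % table_count


shares = first_letter_shares("apple avocado banana apple Cherry")
# shares == {"a": 0.5, "b": 0.25, "c": 0.25} (key order may differ)

table = table_for("apple", 4)  # some number from 0 to 3, the same on every run
```

With hashing, the tables come out roughly equal in size regardless of language, but all words starting with the same letter no longer end up in one table, which only matters if you need prefix lookups.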
Suppose you have queries like these:
SELECT op.parcel_cn, op.id
FROM objects_process AS op
WHERE op.status IS NULL
SELECT op.parcel_cn, op.id, o.area_value
FROM objects_process AS op
JOIN objects_copy AS o
ON op.parcel_cn = o.parcel_cn
WHERE op.status IS NULL
If the first query returns rows and the second returns none, then there are no records that satisfy both conditions at once.
Create a couple of tables with 2-3 rows each for which the query should definitely work. If it doesn't work, cut pieces off the query until it becomes clear what exactly is wrong.
Admittedly, I have no experience with PostgreSQL, but from the point of view of plain SQL you can write it like this:
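A rough sketch of such a test, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The column types and all sample rows are invented for illustration; only the table and column names come from the queries above.

```python
import sqlite3

# In-memory database with tiny test tables (types and data are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects_process (id INTEGER, parcel_cn TEXT, status TEXT);
    CREATE TABLE objects_copy (parcel_cn TEXT, area_value REAL);
    INSERT INTO objects_process VALUES
        (1, 'A-1', NULL), (2, 'B-2', 'done'), (3, 'C-3', NULL);
    INSERT INTO objects_copy VALUES ('A-1', 10.5), ('B-2', 20.0);
""")

# Step 1: the filter alone.
status_only = conn.execute("""
    SELECT op.parcel_cn, op.id
    FROM objects_process AS op
    WHERE op.status IS NULL
    ORDER BY op.id
""").fetchall()

# Step 2: the filter plus the join.
joined = conn.execute("""
    SELECT op.parcel_cn, op.id, o.area_value
    FROM objects_process AS op
    JOIN objects_copy AS o
    ON op.parcel_cn = o.parcel_cn
    WHERE op.status IS NULL
""").fetchall()

print(status_only)  # [('A-1', 1), ('C-3', 3)]
print(joined)       # [('A-1', 1, 10.5)]
```

Row 'C-3' has a NULL status but no partner in objects_copy, so it passes the first query and drops out of the second. If the real data behaves the same way for every row, comparing the parcel_cn values in the two tables (spaces, letter case, formatting) is the next thing to check.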
SELECT
op.parcel_cn,
op.id,
o.area_value
FROM
objects_process op,
objects_copy o
WHERE
op.parcel_cn = o.parcel_cn AND
op.status IS NULL