Algorithms
joger, 2012-03-12 17:42:41

Algorithm for finding names in text

There are several texts of a couple of thousand words each, and a list of a couple of hundred first names (Alexander, Ivan, etc.).
If a name is followed by a word starting with a capital letter, we treat the pair as a first name and a surname.
Is there a more efficient way to find the first and last names than brute force?

It doesn't really matter, but the language is PHP.


4 answer(s)
Vitaly Peretyatko, 2012-03-12
@viperet

Of course there are more efficient algorithms, but I would not spend time on optimization if a few texts of a couple of thousand words each are all that needs to be processed. Writing an optimal program will quite likely take more time than a suboptimal one takes to run.
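To make the trade-off concrete, here is a minimal Python sketch (Python only because the asker says the language doesn't matter; `tokenize`, `find_names_brute` and `find_names_set` are hypothetical names, and treating a "word" as a run of letters is an assumption). The brute-force version compares every word with every name, so a couple of thousand words times a couple of hundred names means a few hundred thousand string comparisons per text, which is still small. The set version does one hash lookup per word on average.

```python
import re

# Assumption: a "word" is a run of letters; [^\W\d_] means "word character
# that is not a digit or underscore", i.e. a letter (Unicode-aware, so Cyrillic works too).
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Split text into words, dropping punctuation and digits."""
    return WORD_RE.findall(text)


def find_names_brute(text, names):
    """Brute force: compare each word against every name (about words * names comparisons)."""
    found = []
    for word in tokenize(text):
        for name in names:
            if word == name:
                found.append(word)
                break
    return found


def find_names_set(text, names):
    """Same result, but each word costs one average O(1) hash-set lookup."""
    name_set = set(names)
    return [word for word in tokenize(text) if word in name_set]
```

Both functions return the names in the order they occur in the text, so either can be used; the set version simply scales better if the name list grows.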

@sledopit, 2012-03-12

I didn't understand what the surname has to do with it, or whether it is needed at all. My PHP knowledge stops at <?php echo "Hello world" ?>. But I know bash well, and this is what I would do:
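for FILE in *.txt; do
    # one word per line (GNU sed turns \n into a newline), sorted and deduplicated;
    # comm -12 then prints only the lines present in both sorted lists
    comm -12 <(sed 's/[!?., ]/\n/g' "$FILE" | sort -u) <(sort -u FILE_WITH_NAME_LIST)
done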

Since the question asks about the algorithm, here is what it does: it turns each file into a sequence of lines with exactly one word per line. The file with names is assumed to be in the same format (one name per line), and then comm finds the matches by walking the two sorted lists in parallel once, rather than comparing every word with every name.
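For readers who don't use bash, here is a rough Python analogue of the same sort-and-merge idea (a sketch only; `unique_sorted_words` and `comm_12` are hypothetical names, and dropping empty tokens is a small simplification compared with the sed pipeline). Once both lists are sorted, a single merge pass finds the common entries in time proportional to the total length of the two lists.

```python
import re


def unique_sorted_words(text):
    # Roughly mimics: sed 's/[!?., ]/\n/g' FILE | sort -u
    # (splits on those characters plus newlines, drops empty tokens)
    return sorted(set(w for w in re.split(r"[!?., \n]", text) if w))


def comm_12(a, b):
    """Lines common to two sorted lists, found in one merge pass like `comm -12`."""
    i, j, common = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot appear later in b, skip it
        else:
            j += 1  # b[j] cannot appear later in a, skip it
    return common
```

Python sorts strings by code point, while `sort` uses the locale's collation; for comm (and for this merge) what matters is only that both lists are sorted the same way.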
Unfortunately, I can't compare this approach with PHP in terms of either speed or development effort.

Anton Buichik, 2012-03-12
@RoAChik

Maybe I didn't understand exactly what the author wants, but the selection could be implemented like this: split the lines into separate words with explode (assuming, of course, that the words are separated by, say, only spaces or some other single character). Then split the word that follows a name into an array of characters and check whether its first character is a capital letter.
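A minimal Python sketch of that split-and-check approach (the function name `find_full_names` is hypothetical; splitting on whitespace stands in for explode, and stripping surrounding punctuation from each piece is an added assumption). For every known name it looks at the next word and treats it as a surname if its first character is uppercase.

```python
def find_full_names(text, names):
    """Return (first_name, surname_or_None) for each known name found in text."""
    name_set = set(names)
    # explode-style split on whitespace, then strip punctuation around each piece
    words = [w.strip(".,!?;:\"'()") for w in text.split()]
    results = []
    for i, word in enumerate(words):
        if word not in name_set:
            continue
        nxt = words[i + 1] if i + 1 < len(words) else ""
        # "check the first character for a capital letter" (empty string -> False)
        surname = nxt if nxt[:1].isupper() else None
        results.append((word, surname))
    return results
```

For example, `find_full_names("Ivan Petrov met Alexander. then left", ["Ivan", "Alexander"])` gives `[("Ivan", "Petrov"), ("Alexander", None)]`. Note that a capitalized word starting the next sentence ("Ivan. Then ...") also passes the check, so the rule from the question will produce some false surnames.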

winolog, 2012-03-12
@winolog

array_uintersect (computes the intersection of arrays, comparing values with a callback function)
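As a rough Python analogue of intersecting with a comparison callback (a sketch for two lists; `uintersect` and `casefold_compare` are hypothetical names, and this makes no claim about how PHP implements array_uintersect internally): the callback returns a negative number, zero or a positive number, and a value is kept when the callback returns zero for some name. The case-insensitive policy is just one example of why a custom comparison can be useful, e.g. for names written in capitals.

```python
def uintersect(values, others, compare):
    # Keep each item of `values` that compares equal (compare(...) == 0) to at
    # least one item of `others`. Plain nested loop; returns a list, no PHP-style keys.
    return [v for v in values if any(compare(v, o) == 0 for o in others)]


def casefold_compare(a, b):
    # Three-way comparison like a PHP callback: negative, zero or positive.
    # Example policy (an assumption): names match regardless of letter case.
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)
```

Usage: `uintersect(["IVAN", "met", "alexander"], ["Alexander", "Ivan"], casefold_compare)` keeps `["IVAN", "alexander"]`.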
