Programming
Oleg Savvateev, 2012-10-18 08:39:06

How to determine the stress in a word?

I run a website that lets you search for rhymes to a word. The search currently relies on Zaliznyak's dictionary, which I would like to extend. With the morphological module from AOT I can generate the word forms of a word, and the module can also produce forms heuristically for words that are not in the dictionary (google, googles, googled, etc.). The problem is that in this heuristic case the module does not determine the stress, and stress is essential for finding rhymes. Can anyone tell me how to determine the stress? Is it possible at all?

4 answer(s)
ixSci, 2012-10-18
@ixSci

The word forms have the same stress as the source word, so just reuse it.
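
A minimal sketch of this idea, assuming the stress of the base word is stored as the zero-based index of its stressed vowel. The names `StressTransfer` and `transferStress` are made up for illustration. It only reuses the position when the form matches the base word up to and including the stressed vowel, and returns -1 otherwise (рука / руки). Even then the result is only a guess, because Russian stress can move to the ending while the stem stays the same (стол / стола).

```java
final class StressTransfer {
    private static final String VOWELS = "аеёиоуыэюя";

    /**
     * Tries to reuse the stress position of a base word for one of its forms.
     * stressIndex is the zero-based index of the stressed vowel in base
     * (an assumed storage format). Returns the same index when the form
     * matches the base word up to and including the stressed vowel,
     * or -1 when the position cannot be reused.
     */
    static int transferStress(String base, int stressIndex, String form) {
        String b = base.toLowerCase();
        String f = form.toLowerCase();
        if (stressIndex < 0 || stressIndex >= b.length() || stressIndex >= f.length()) {
            return -1;
        }
        // The stored index must point at a vowel.
        if (VOWELS.indexOf(b.charAt(stressIndex)) < 0) {
            return -1;
        }
        // The form must share everything up to the stressed vowel with the base word.
        if (!f.startsWith(b.substring(0, stressIndex + 1))) {
            return -1;
        }
        return stressIndex;
    }

    public static void main(String[] args) {
        System.out.println(transferStress("гуглить", 1, "гуглил")); // prints 1
        System.out.println(transferStress("рука", 3, "руки"));      // prints -1
    }
}
```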

ivanra, 2012-10-18
@ivanra

I can give you a hint.
There used to be a popular library, padeg.dll, that could decline full names, job titles, and organization names. For declining names it used an internal algorithm that converts the source text into a sonority string.
Perhaps that is what you need.
I once ported this library from Delphi to Java; here is a piece of code from it:

strToSonic
  /**
   * Allowed characters.
   */
  private static final String legalChar = "абвгдежзийклмнопрстуфхцчшщъыьэюя";

  /**
   * Sonority class of a character:
   * 3 - vowels
   * 2 - sonorants
   * 1 - obstruents
   * ъ and ь have no sonority and are marked with ^
   *
   * @param ch a lowercase letter in the range а..я
   */
  private static char sonic(char ch) {
    //      абвгдежзийклмнопрстуфхцчшщъыьэюя
    return "31111311321222312113111111^3^333".charAt(ch - 'а');
  }

  /**
   * Builds the sonority string that corresponds to a string of characters.
   * @param value source text
   * @return one sonority digit per letter; other characters, ъ and ь are skipped
   */
  public static String strToSonic(String value) {
    StringBuilder result = new StringBuilder();
    // replace ё with е (ё lies outside the contiguous а..я range)
    value = value.toLowerCase().replace("ё", "е");
    // for every character
    for (int i = 0; i < value.length(); i++) {
      char ch = value.charAt(i);
      // if the character is allowed (а..я is a contiguous range)
      if (legalChar.charAt(0) <= ch && ch <= legalChar.charAt(legalChar.length() - 1)) {
        char test = sonic(ch);
        // and it has a sonority
        if (test != '^') {
          // append its sonority in place of the character
          result.append(test);
        }
      }
    }
    return result.toString();
  }
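
For reference, here is a compact, self-contained version of the same conversion with a usage example. The class name `SonorityDemo` and the helper `countSyllables` are made up for illustration and are not part of padeg.dll. Counting the 3s gives the number of vowels, that is, the number of positions where the stress could fall. The sonority string by itself does not say which of them is stressed.

```java
final class SonorityDemo {
    // Sonority classes for а..я: 3 = vowel, 2 = sonorant, 1 = obstruent, ^ = none (ъ, ь).
    private static final String SONORITY = "31111311321222312113111111^3^333";

    /** Same conversion as in the snippet above, written compactly. */
    static String strToSonic(String value) {
        StringBuilder result = new StringBuilder();
        for (char ch : value.toLowerCase().replace('ё', 'е').toCharArray()) {
            if (ch >= 'а' && ch <= 'я') {
                char s = SONORITY.charAt(ch - 'а');
                if (s != '^') {
                    result.append(s);
                }
            }
        }
        return result.toString();
    }

    /** Hypothetical helper: every vowel (sonority 3) is one syllable nucleus. */
    static int countSyllables(String word) {
        int count = 0;
        for (char s : strToSonic(word).toCharArray()) {
            if (s == '3') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(strToSonic("молоко"));      // prints 232313
        System.out.println(strToSonic("гуглить"));     // prints 131231 (ь is skipped)
        System.out.println(countSyllables("гуглить")); // prints 2
    }
}
```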

Konstantin Kanaev, 2012-10-19
@Yoh_Asakura

In my opinion, you have only a couple of options:
1) Use a dictionary in which the stress is marked. To make the work easier, you could put up a small online form so that people can help you collect the data (how to attract users is a separate question). You would need to account for how literate the contributors are, for example by having them pass a short test before they are allowed to contribute.
2) Give up on the idea.
I don't see a third option.
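
To illustrate option 1, here is a rough sketch of a stress dictionary used for rhyme search. Everything in it is an assumption made for illustration: the class `StressDictionary`, storing the stress as a zero-based index of the stressed vowel, and treating two words as rhyming when they are spelled the same from the stressed vowel onward.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class StressDictionary {
    // word -> zero-based index of the stressed vowel (assumed storage format)
    private final Map<String, Integer> stress = new HashMap<>();

    void put(String word, int stressedIndex) {
        stress.put(word.toLowerCase(), stressedIndex);
    }

    /** Returns the part of the word from the stressed vowel onward, if the stress is known. */
    Optional<String> rhymeKey(String word) {
        String w = word.toLowerCase();
        Integer index = stress.get(w);
        if (index == null || index < 0 || index >= w.length()) {
            return Optional.empty();
        }
        return Optional.of(w.substring(index));
    }

    /** Simplified test: both stresses are known and the stressed endings are spelled alike. */
    boolean rhymes(String a, String b) {
        Optional<String> keyA = rhymeKey(a);
        Optional<String> keyB = rhymeKey(b);
        return keyA.isPresent() && keyA.equals(keyB);
    }

    public static void main(String[] args) {
        StressDictionary dict = new StressDictionary();
        dict.put("корова", 3); // stress on the second о
        dict.put("основа", 3); // stress on the second о
        dict.put("голова", 5); // stress on the final а
        System.out.println(dict.rhymes("корова", "основа")); // prints true
        System.out.println(dict.rhymes("корова", "голова")); // prints false
    }
}
```

The last pair shows why the stress matters for this task: корова and голова end in the same letters, but the stress falls on different vowels.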

linguist, 2020-11-13
@linguist

A program that places stress marks:
morpher.ru/accentizer
API: morpher.ru/ws3/#addstressmarks
