How to break text into sentences without using regular expressions?
I was given a task at university: split a Russian text in UTF-8 into sentences, without using regular expressions.
The program must take into account that the first word of a sentence begins with an uppercase letter.
It would seem easy:
text = text.replace('. ', '.|').replace('! ', '!|').replace('? ', '?|')
sentences = text.split('|')
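But the text contains passages where the punctuation is followed by a lowercase word, and those should stay in one sentence:
Try it, come on! wow what! here's a big chick! you think I won't find a trial for you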
i = 0
while i < len(sentences) - 1:
    if not sentences[i + 1].istitle():
        sentences[i] += sentences[i + 1]
        sentences.pop(i + 1)
    i += 1
ch1 = 'Б'
print ord(ch1)
TypeError: ord() expected a character, but string of length 2 found
ch1 = 'Б'
ch2 = 'б'
if ch1.istitle():
    print ("Верхний")  # "Upper"
else:
    print ("Нижний")  # "Lower"
if ch2.istitle():
    print ("Верхний")
else:
    print ("Нижний")
"Look how brave you are!" said Chub, left alone in the street. "Try it, come on! wow what! here's a big chick! you think I won't find a trial for you. No, my dear, I'll go, and I'll go straight to the commissioner. You will know me. I will not see that you are a blacksmith and painter. However, look at the back and shoulders: I think there are blue spots. The son of the enemy must have beaten him painfully! it's a pity that it's cold and you don't want to throw off the casing! Wait, you demonic blacksmith, so that the devil beats both you and your forge, you will dance with me! you see, damned shibenik! however, because now he is not at home. Solokha, I think, is sitting alone. Hm... it's not far from here; would go! The time is now such that no one will catch us. Maybe even that will be possible... see how painfully the damned blacksmith beat him!'
"Squeeze" your own solution
by adding a check for capital letters and slightly adjusting the arguments to replace():
>>> letters = 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЭЮЯ'
>>> text = '"Смотри, как расхрабрился!" говорил Чуб, оставшись один на улице..' # and so on through the text
>>> for letter in letters:
...     if letter in text:
...         text = text.replace('. '+letter, '.|'+letter).replace('. "'+letter, '.|"'+letter).replace('! '+letter, '!|'+letter)
...
>>> for sentence in text.split('|'):
...     print(sentence)
...
"Смотри, как расхрабрился!" говорил Чуб, оставшись один на улице.
"Попробуй, подойди! вишь какой! вот большая цяца! ты думаешь, я на тебя суда не найду.
Нет, голубчик, я пойду, и пойду прямо к комиссару.
Ты у меня будешь знать.
....
and so on.
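The same idea can also be written as a single pass over the text instead of a chain of replace() calls. This is only a sketch, not the method from the answer: the function name split_sentences is hypothetical, it relies on str.isupper() rather than an explicit list of letters, and it assumes Python 3, where strings are Unicode.

```python
def split_sentences(text):
    """Split after '.', '!' or '?' when the next word starts with an uppercase letter."""
    sentences = []
    start = 0
    i = 0
    while i < len(text) - 2:
        if text[i] in '.!?' and text[i + 1] == ' ':
            j = i + 2
            # Skip an opening quote so that '. "Word' also counts as a boundary.
            if j < len(text) and text[j] in '"«':
                j += 1
            if j < len(text) and text[j].isupper():
                sentences.append(text[start:i + 1])
                start = i + 2
                i = start
                continue
        i += 1
    # Whatever is left after the last boundary is the final sentence.
    tail = text[start:]
    if tail:
        sentences.append(tail)
    return sentences


text = 'Ты у меня будешь знать. Попробуй, подойди! вишь какой! Нет, голубчик.'
for sentence in split_sentences(text):
    print(sentence)
```

Here "подойди! вишь какой!" stays inside one sentence because the word after the exclamation mark is lowercase.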
ch1 = u'Б'
ch2 = u'б'
if ch1.istitle():
    print ("Верхний")  # "Upper"
else:
    print ("Нижний")  # "Lower"
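The u prefix matters because in Python 2 it makes the literal a Unicode string, so the letter is one character rather than two UTF-8 bytes. A minimal sketch of the difference, written for Python 3 (an assumption, since the snippets in this thread are Python 2), where every str is already Unicode:

```python
ch1 = 'Б'

# A Python 3 str holds code points, so this is a single character.
code_point = ord(ch1)

# Encoded as UTF-8 the same letter takes two bytes; this is the
# "string of length 2" that ord() complained about for a byte string.
utf8_bytes = ch1.encode('utf-8')

print(code_point)       # 1041
print(len(utf8_bytes))  # 2
print(ch1.istitle())    # True
```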
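i = 0
while i < len(sentences) - 1:
    # ord() accepts a single character, so look only at the first one
    first = ord(sentences[i + 1][0])
    # 1040..1071 covers А..Я (Ё, code 1025, lies outside this range)
    if 1040 <= first <= 1071:
        print (sentences[i + 1])
    i += 1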
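A code-point check like this is easier to test when it is wrapped in a small helper. The sketch below assumes Python 3, and the name is_cyrillic_upper is hypothetical. It uses the Unicode positions of the Russian uppercase letters: А is 1040, Я is 1071, and Ё sits apart at 1025.

```python
def is_cyrillic_upper(ch):
    """Return True for the uppercase Russian letters А..Я and Ё."""
    code = ord(ch)
    # А is U+0410 (1040), Я is U+042F (1071); Ё is U+0401 (1025).
    return 1040 <= code <= 1071 or code == 1025


for ch in 'АЯЁбA':
    print(ch, is_cyrillic_upper(ch))
```

For plain "is this letter uppercase" checks, ch.isupper() on a Unicode string gives the same answer for these letters without hard-coded numbers.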
бла бла. "Ой..."