How to iterate over bigrams?

Python

Timebird, 2017-10-08 02:33:33

I get bigrams from a text with the following code:

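import nltk
from nltk.util import ngrams

token = nltk.word_tokenize(train_words)
bigrams = ngrams(token, 2)  # lazy, one-shot iterator, not a list
print(list(bigrams))  # list() consumes the iterator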

Output (abbreviated):

[('Вот', 'дом'), ('дом', 'Который'), ('Который', 'построил'), ('построил', 'Джек'), ('Джек', 'А'), ('А', 'это'), ('это', 'пшеница'), ('пшеница', 'Которая'), ('Которая', 'в'), ('в', 'тёмном'), ('тёмном', 'чулане'), ('чулане', 'хранится'), ('хранится', 'В'), ('В', 'доме'), ('доме', 'Который'), ('Который', 'построил'), ('построил', 'Джек'), ('Джек', 'А'), ('А', 'это'), ('это', 'весёлая'), ('весёлая', 'птица-синица'), ('птица-синица', 'Которая'), ('Которая', 'часто'), ('часто', 'ворует'), ('ворует', 'пшеницу'), ('пшеницу', 'Которая'), ('Которая', 'в'), ..., ]

Question: I need to iterate over the bigrams in the simplest way, for example to print the first bigram in the list and then the first word of that bigram. How do I do that?
I tried the obvious approach:

for bigram in bigrams:
    print(bigram)

But Jupyter prints nothing at all. What is going on?

1 answer(s)

Alexey S., 2017-10-08
@Timebird

I don't know why, but the print(list(bigrams)) call breaks the for loop. If you comment it out, the loop works fine. Alternatively, you can do something like this:

lst = list(bigrams)  # materialize the bigrams once
print(lst)
bigrams = iter(lst)  # fresh iterator over the saved list
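
A likely explanation for the behavior in the question: ngrams() returns a lazy, one-shot iterator rather than a list, so the list(bigrams) inside print() uses it up and the later for loop has nothing left to iterate over. Here is a minimal sketch of that effect and of the fix. It avoids NLTK so it runs anywhere; bigram_iter is a hypothetical stand-in for ngrams(token, 2), and the short token list is just a sample taken from the output above.

```python
def bigram_iter(tokens):
    """Hypothetical stand-in for nltk.util.ngrams(tokens, 2): a one-shot iterator of pairs."""
    return zip(tokens, tokens[1:])


tokens = ["Вот", "дом", "Который", "построил", "Джек"]

# Reproduce the problem: the first list() call exhausts the iterator.
bigrams = bigram_iter(tokens)
first_pass = list(bigrams)   # all four bigrams
second_pass = list(bigrams)  # → [] because nothing is left

# Fix: materialize the bigrams once and keep working with the list.
bigram_list = list(bigram_iter(tokens))
first_bigram = bigram_list[0]  # first bigram in the list
first_word = first_bigram[0]   # first word of that bigram

print(first_pass)
print(second_pass)
print(first_bigram, first_word)
```

Once the bigrams are in a list, you can loop over them as many times as you like, and indexing such as bigram_list[0][0] gives the first word of the first bigram directly.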