Encoding when reading files in Ruby
My task is to count the words in a file. Please tell me how to set the encoding in File.read() correctly so that Russian text is read properly.
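Here is a minimal sketch of how the `encoding:` option of `File.read` behaves. It assumes the text is stored in Windows-1251, which is the encoding used in the follow-up in this thread. A single encoding name only tags the bytes as Windows-1251. The `"external:internal"` form also transcodes them to UTF-8 while reading. The file name `sample_cp1251.txt` is hypothetical, and the script creates and deletes the file itself.

```ruby
# Create a small Windows-1251 file to read back (hypothetical file name).
path = "sample_cp1251.txt"
File.binwrite(path, "Привет, мир".encode("Windows-1251"))

# 1) A single name only tags the bytes as Windows-1251; nothing is converted.
tagged = File.read(path, encoding: "Windows-1251")
puts tagged.encoding          # Windows-1251

# 2) "external:internal" reads Windows-1251 bytes and transcodes them to UTF-8.
utf8 = File.read(path, encoding: "Windows-1251:UTF-8")
puts utf8.encoding            # UTF-8
puts utf8 == "Привет, мир"    # true

File.delete(path)
```

Converting to UTF-8 on read means later `downcase` and regex calls operate on ordinary UTF-8 strings, the same as the program's string literals.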
The code:
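```ruby
def words_from_string(string)
  # [[:word:]] is Unicode-aware; \w matches only ASCII [a-zA-Z0-9_]
  string.downcase.scan(/[[:word:]']+/)
end

def count_frequency(word_list)
  count = Hash.new(0)
  word_list.each { |word| count[word] += 1 }
  count
end

# Assumes text.txt is saved in Windows-1251: read it and transcode to UTF-8.
# Use encoding: "UTF-8" instead if the file is already UTF-8.
raw_text = File.read("text.txt", encoding: "Windows-1251:UTF-8")
p raw_text
word_list = words_from_string(raw_text)
p word_list
counts = count_frequency(word_list)
p counts
sorted = counts.sort_by { |word, count| -count }
p sorted
top_five = sorted.first(5) # sorted is descending, so the most frequent words come first
p top_five
top_five.each { |word, count| puts "#{word} #{count}" }
```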
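Even with the right encoding, `scan(/[\w']+/)` finds no Russian words, because in Ruby regular expressions `\w` means ASCII `[a-zA-Z0-9_]` only. The short sketch below compares it with the Unicode-aware `[[:word:]]` bracket expression. It assumes Ruby 2.4 or later, where `downcase` also lowercases Cyrillic, and UTF-8 strings. The sample sentence is made up for illustration.

```ruby
text = "Здравствуйте, уважаемые читатели. Hello, readers!"

# \w is ASCII-only in Ruby, so the Cyrillic words are skipped entirely.
ascii_words = text.downcase.scan(/[\w']+/)
p ascii_words              # ["hello", "readers"]

# [[:word:]] uses Unicode properties for UTF-8 strings, so Cyrillic matches too.
unicode_words = text.downcase.scan(/[[:word:]']+/)
puts unicode_words.size    # 5
```

`/[\p{L}']+/` is a similar alternative that matches letters only, leaving out digits and underscores.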
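Great, I'm one step closer! :)
There is a difference between p and puts: p prints the string's inspect representation, while puts prints the text itself. With puts, the Russian text is displayed correctly: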
```ruby
File.open("text_ascii.txt", "r:windows-1251") do |f|
  raw_text = f.gets
  puts raw_text.encoding
  puts raw_text
end
```

Output:

```
Windows-1251
Здравствуйте, уважаемые читатели. Я продолжаю свою серию постов про распределенную систему контроля версий Mercurial.
```
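The small sketch below shows why `p` and `puts` differ. It assumes the default external encoding is not Windows-1251, for example UTF-8 on a typical Linux system. `p` prints `str.inspect`. `inspect` escapes non-ASCII characters when the string's encoding differs from the default external encoding. `puts` writes the string's bytes unchanged. Transcoding the string to UTF-8 with `encode` avoids the escapes in a UTF-8 terminal.

```ruby
cp1251 = "Здравствуйте".encode("Windows-1251")

# p calls inspect, which escapes the Windows-1251 characters because they are
# not in the default external encoding (assumed UTF-8 here): "\xC7\xE4\xF0...
shown_by_p = cp1251.inspect
puts shown_by_p

# After transcoding to UTF-8, inspect keeps the Cyrillic letters readable
# when the default external encoding is UTF-8.
utf8 = cp1251.encode("UTF-8")
p utf8
```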