How to properly get all the content in the body using Nokogiri and convert it to text?
<body>
<p>Content</p>
...Content...
</body>
new_content = nokogiri_content.at('body').children.text
Unfortunately, Nokogiri has no built-in function for collapsing whitespace, but you can strip the leading spaces from each line with a regular expression:
new_content.gsub(/^ +/, "")
In general, though, this is not a good approach: the text of the whole body contains not only extra spaces but also content that is not normally treated as text, such as the contents of <script> and <style> tags. Processing HTML with Nokogiri usually involves more targeted actions, such as selecting the tags you need and extracting the text from them:
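require 'open-uri'
require 'nokogiri'
url = 'https://ru.wikipedia.org/wiki/Ruby'
# URI.open is provided by open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open(url))
# Collect the text of paragraphs and top-level headings only
text = ''
doc.css('p,h1').each do |e|
  text << e.content
end
puts text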
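If you still want to tidy up the whitespace in the extracted text, as with the regular expression mentioned earlier, a small helper can do it in plain Ruby. This is only a sketch: `squeeze_text` is a hypothetical name, not a Nokogiri method, and it assumes you want one trimmed, non-empty line per line of output.

```ruby
# Hypothetical helper (not part of Nokogiri): tidy up text extracted from a document.
def squeeze_text(raw)
  raw.lines
     .map(&:strip)      # drop leading and trailing whitespace on every line
     .reject(&:empty?)  # drop lines that contained only whitespace
     .join("\n")
end

# Sample input standing in for the result of nokogiri_content.at('body').text
sample = "\n    Content\n\n   ...Content...  \n"
puts squeeze_text(sample)
```

Unlike `gsub(/^ +/, "")`, this also removes tabs, trailing spaces, and blank lines, which may or may not be what you want.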