ruby
Anton, 2016-08-19 15:27:24

How to parse an HTML string?

I have this HTML ("Название" is Russian for "Title"):

<span class="title">Название:</span> Rising Water - James Vincent McMorrow<!-- Several more spans follow immediately after -->

I'm trying to parse it:
puts /<span class="title">Название:<\/span>(.*)-(.*)/.match(line)

But it doesn't work: the match contains the name plus all the spans that follow it.
Please help.
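The over-matching comes from greedy quantifiers: (.*) grabs as much as it can, so the second group runs to the end of the line and swallows the trailing spans. Below is a minimal sketch of a tighter regex, assuming the title and artist never contain "<" and are separated by " - ". The extra span in the sample string is hypothetical, standing in for the "several more spans" from the question. This only illustrates why the original pattern misbehaves; the answers recommend a real HTML parser instead.

```ruby
# Line from the question; the trailing <span class="extra"> is a hypothetical
# stand-in for the "several more spans" that follow the name.
line = '<span class="title">Название:</span> Rising Water - James Vincent McMorrow<span class="extra">...</span>'

# Original pattern: the greedy (.*) groups run to the end of the line,
# so the second group also captures the trailing span markup.
greedy = /<span class="title">Название:<\/span>(.*)-(.*)/.match(line)
p greedy[2] # => " James Vincent McMorrow<span class=\"extra\">...</span>"

# Tighter pattern: [^<] can never cross into the next tag, and " - "
# (whitespace, hyphen, whitespace) separates the title from the artist.
fixed = %r{<span class="title">Название:</span>\s*([^<]*?)\s+-\s+([^<]*)}.match(line)
title, artist = fixed.captures

puts title  # => Rising Water
puts artist # => James Vincent McMorrow
```

This still breaks as soon as the markup changes (attribute order, nested tags, an entity like &amp;), which is why a parser is the safer choice.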


2 answer(s)
Alexander Shvaikin, 2016-08-19
@hummingbird

It's not worth parsing this with a regular expression; the approach mentioned above will do. Rubyists usually parse web pages with the Nokogiri library.
It's quite convenient, for example because you can reach elements with CSS selectors.
https://habrahabr.ru/post/52680/
Example:

require 'nokogiri'

doc = Nokogiri::HTML('<body><span class="title">Название:</span> Rising Water - James Vincent McMorrow<span></span></body>')

# Take the first text node that follows the "title" span.
name = doc.xpath('//span[@class="title"]/following-sibling::text()[1]').text

p name # => " Rising Water - James Vincent McMorrow"

HTML IS NOT PARSED WITH REGULAR EXPRESSIONS. REGEX IS NOT A SOPHISTICATED ENOUGH TOOL FOR PARSING HTML.
DON'T EVEN TRY TO PARSE IT OUT
