Ruby on Rails
Maxim Valerievich, 2015-05-20 18:55:01

Does anyone know how to parse this page with Nokogiri?

Hello. I have a problem and can't work out what causes it. There is a page with a Spotify player. It loads perfectly in any browser, and curl downloads it without problems with an ordinary GET request. But in Rails, when I try to parse it with Nokogiri, only one line is loaded:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

If Nokogiri parses a file downloaded with curl from this address, everything is fine. What could be wrong?
UPD.
The problem was the User-Agent header. Here is the solution:
require 'nokogiri'
require 'open-uri'
source = 'https://embed.spotify.com/?uri=spotify:user:128386105:playlist:39BkANk6cQDivVkymDRQTL'
user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.854.0 Safari/535.2"
# URI.open needs Ruby 2.5+; on older Rubies call open(...) instead (plain open stopped accepting URLs in Ruby 3.0)
page = Nokogiri::HTML(URI.open(source, 'User-Agent' => user_agent), nil, "UTF-8")


3 answer(s)
Viktor Vsk, 2015-05-20
@S-anches

Perhaps the server responds differently to different user agents, or possibly it redirects.
For example, Typhoeus (https://github.com/typhoeus/typhoeus) opened the page without additional settings.
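
One way to check both guesses (user agent and redirects) is to send the request with Ruby's standard net/http, which does not follow redirects on its own, and look at the status code and the Location header. This is only a minimal sketch: build_request and describe_response are hypothetical helper names, and the network call is defined here but not executed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request that carries a chosen User-Agent.
def build_request(url, user_agent)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = user_agent # overrides the default "Ruby" value
  [uri, request]
end

# Hypothetical helper: send the request and summarize what came back.
# Net::HTTP does not follow redirects, so a 3xx status and its Location
# header stay visible instead of being hidden by an automatic redirect.
def describe_response(uri, request)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  {
    code: response.code,                      # status as a String, e.g. "200" or "302"
    location: response['Location'],           # nil unless the server redirects
    body_bytes: response.body.to_s.bytesize   # 0 would explain an empty parse
  }
end

# Possible usage (performs a real request, so it is left commented out):
# uri, request = build_request(source, user_agent)
# p describe_response(uri, request)
```

Comparing the summary for the default user agent with the summary for a browser-like one should show whether the server treats them differently.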

Maxim Valerievich, 2015-05-20
@S-anches

I also tried loading the page through rest-client; the result is exactly the same.

kkrieger, 2015-05-21
@kkrieger

Run curl with the -v option: it will print the headers it sends. Repeat all of those headers in your code.
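
As a rough sketch of that advice: curl -v prefixes every request line it sends with "> ", so those lines can be turned into a hash and passed along with the request. The helper name headers_from_curl_verbose and the sample log below are made up for illustration; real curl output will have different values.

```ruby
# Hypothetical helper: collect the request headers shown by `curl -v`.
# Lines sent by curl start with "> "; everything else is ignored.
def headers_from_curl_verbose(verbose_output)
  verbose_output.each_line.with_object({}) do |line, headers|
    match = line.strip.match(/\A> ([\w-]+): (.*)\z/)
    next unless match                          # skips "> GET ... HTTP/1.1" and non-request lines
    name, value = match[1], match[2]
    next if name.casecmp('host').zero?         # the HTTP client sets Host itself
    headers[name] = value
  end
end

# Illustrative sample only, not captured from the real server.
sample = <<~LOG
  * Connected to embed.spotify.com port 443
  > GET /?uri=spotify:user:128386105:playlist:39BkANk6cQDivVkymDRQTL HTTP/1.1
  > Host: embed.spotify.com
  > User-Agent: curl/7.37.1
  > Accept: */*
  >
  < HTTP/1.1 200 OK
LOG

headers = headers_from_curl_verbose(sample)
```

The resulting hash can then be passed as the options argument to open-uri, or copied onto a Net::HTTP request one header at a time.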
