How to parse dynamic content?


Node.js

qbr-code, 2019-06-17 19:54:21

Good afternoon!
I am writing a Node.js web scraper for the page https://zachtronics.bandcamp.com/album/shenzhen-i-...
Of the available modules, I use request and cheerio.
My task is to get a link that sits in one of several script tags (the link itself looks like this: https://t4.bcbits.com/stream/b60bed46407ad20cf804c...).
Problem:
request returns only the HTML, and I need what is inside a script tag, i.e. dynamic content. As I understand it, the only way out is to use webdriver, puppeteer, or headless Chrome? But launching a whole browser just to reach a script tag and take the link from it is resource-intensive. Are there any other ways?


2 answer(s)


DanKud, 2019-06-17
@qbr-code

Nothing is loaded dynamically there. All page content, including the scripts you need, arrives immediately on load. You don't even need cheerio here, and it wouldn't help.
Here is an example of how you can get the value you need:

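const request = require('request');

request('https://zachtronics.bandcamp.com/album/shenzhen-i-o-ost', (error, response, body) => {
  if (error) throw error; // the request itself failed (network, DNS, ...)
  // Capture the JSON array that follows `trackinfo:` in the inline script
  const json = JSON.parse(body.match(/trackinfo:.*(\[.*?\])/)[1]);
  const mp3 = json[0]['file']['mp3-128']; // stream URL of the first track
  console.log(mp3);
});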
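A slightly more defensive take on the same idea, as a rough sketch: it returns null instead of throwing when the page has no trackinfo or the captured text is not valid JSON. The helper name extractFirstMp3 and the sample body are made up for illustration; the sketch assumes the trackinfo array sits on a single line and is valid JSON, which may not hold for the real page.

```javascript
// Hypothetical helper: pulls the first track's MP3 URL out of a page body.
// Assumption: `trackinfo: [...]` sits on one line and the array is valid JSON.
function extractFirstMp3(body) {
  const match = body.match(/trackinfo\s*:\s*(\[.*\])\s*,?\s*$/m);
  if (!match) return null; // no trackinfo found (layout changed, wrong page)

  let tracks;
  try {
    tracks = JSON.parse(match[1]);
  } catch (err) {
    return null; // the captured text was not valid JSON
  }

  const first = tracks[0];
  return first && first.file ? first.file['mp3-128'] || null : null;
}

// Made-up sample standing in for the real page body.
const sampleBody = [
  '<script>',
  'var data = {',
  '    trackinfo: [{"title":"Track 1","file":{"mp3-128":"https://example.com/stream/abc"}}],',
  '    url: "https://example.com/album/demo"',
  '};',
  '</script>',
].join('\n');

console.log(extractFirstMp3(sampleBody));
console.log(extractFirstMp3('<html></html>'));
```

In the request callback, the same function could be called as extractFirstMp3(body).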


hzzzzl, 2019-06-17
@hzzzzl

Are you sure request doesn't return the entire HTML of the page, scripts included?
If I understand correctly, you are interested in the trackinfo: [...] array from the main page. It can be pulled out without cheerio, with a regular expression applied to the result of request.get('bandcamp.com/...'), and then parsed like an ordinary string via JSON.parse.
UPD
By the way, it turns out I'll work out the regex and write my own Bandcamp grabber as well.

request('https://zachtronics.bandcamp.com/album/shenzhen-i-o-ost', function (error, response, body) {
  res.send(body.match(/(?<=trackinfo:)(.*)(?=,)/gi))
  // crooked regex, it doesn't capture everything it should
});
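One possible reason the regex above falls short: the lookahead (?=,) only anchors the match to the last comma on the line, so if the array is not followed by a comma, the match ends somewhere inside the array. Below is a rough sketch of a tighter lookaround pattern, run against a made-up sample rather than the real page. The names TRACKINFO_RE and extractTracks are hypothetical, and the sketch assumes the array sits on one line.

```javascript
// Hypothetical variant of the lookaround regex from the snippet above.
// Assumption: the whole trackinfo array sits on a single line.
// Lookbehind: starts right after "trackinfo:" plus optional whitespace.
// Lookahead: the closing "]" must be the last thing on the line,
// apart from an optional trailing comma.
const TRACKINFO_RE = /(?<=trackinfo:\s*)\[.*\](?=\s*,?\s*$)/m;

function extractTracks(body) {
  const match = body.match(TRACKINFO_RE);
  return match ? JSON.parse(match[0]) : [];
}

// Made-up sample; note there is no trailing comma after the array.
const sample = [
  'var data = {',
  '    trackinfo: [{"title":"A","file":{"mp3-128":"https://example.com/a"}},{"title":"B","file":{"mp3-128":"https://example.com/b"}}]',
  '};',
].join('\n');

for (const track of extractTracks(sample)) {
  console.log(track.title, track.file['mp3-128']);
}
```

Lookbehind assertions need a reasonably recent Node.js; on older versions a plain capture group does the same job.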