How to parse dynamic content?
Good afternoon!
I am writing a Node.js web scraper for the page https://zachtronics.bandcamp.com/album/shenzhen-i-...
The modules I use are request and cheerio.
My task is to get a link that sits in one of several script tags (the link itself looks like this: https://t4.bcbits.com/stream/b60bed46407ad20cf804c...).
Problem:
request returns only the HTML, and what I need is inside a script tag, i.e. dynamic content. Do I understand correctly that the only way out is WebDriver, Puppeteer, or headless Chrome? That is resource-intensive: launching a whole browser just to reach a script tag and take the link from it. Are there any other ways?
Nothing is loaded dynamically there. All of the page content, including the scripts you need, arrives with the initial page load. You don't even need cheerio here, and it won't help.
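To illustrate that no browser is needed, here is a minimal sketch (not part of the original answer) that reads inline script text straight out of the HTML string that request hands back. The function name inlineScripts and the sample markup are hypothetical; a regex like this assumes ordinary, well-behaved markup and is not a full HTML parser.

```javascript
// Sketch: collect the text of every inline <script> element in raw HTML.
// Assumes ordinary markup; this regex is not a general HTML parser.
function inlineScripts(html) {
  const bodies = [];
  const re = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    // Skip external scripts such as <script src="..."></script>, whose body is empty.
    if (m[1].trim() !== '') bodies.push(m[1]);
  }
  return bodies;
}

// Hypothetical sample standing in for the page body returned by request.
const sample = [
  '<html><head>',
  '<script src="/bundle.js"></script>',
  '<script>var TralbumData = { trackinfo: [{"title":"Track 1"}], };</script>',
  '</head><body></body></html>',
].join('\n');

// Keep only the scripts that mention trackinfo.
const withTracks = inlineScripts(sample).filter((s) => s.includes('trackinfo:'));
console.log(withTracks.length); // 1
```

The script text is plain data inside the response; only code that the page later executes (for example, requests made by JavaScript at runtime) would require a real browser.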
Here is an example of how you can get the value you need:
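const request = require('request');
request('https://zachtronics.bandcamp.com/album/shenzhen-i-o-ost', (error, response, body) => {
  if (error) return console.error(error);
  // The trackinfo: [...] array is embedded in an inline <script> in the raw HTML
  const match = body.match(/trackinfo:.*(\[.*?\])/);
  if (!match) return console.error('trackinfo not found');
  const json = JSON.parse(match[1]);
  const mp3 = json[0]['file']['mp3-128'];
  console.log(mp3);
});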
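A possible variation, sketched under a few assumptions: the extraction is split into a pure function so it can be tried on a string without a network call, and the download uses the fetch built into Node.js 18 and later, since the request package has been deprecated. The regex is the one from the answer; the names extractMp3 and fetchMp3 and the sample body are hypothetical, and the real page layout may differ from the sample.

```javascript
// Pure extraction step using the same regex as the answer.
// Returns null if the pattern or the expected fields are missing.
function extractMp3(body) {
  const match = body.match(/trackinfo:.*(\[.*?\])/);
  if (!match) return null;
  const tracks = JSON.parse(match[1]);
  return tracks[0]?.file?.['mp3-128'] ?? null;
}

// Download variant for Node.js 18+ (global fetch); not called at load time.
async function fetchMp3(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return extractMp3(await res.text());
}

// Hypothetical fragment of a page body, for illustration only.
const body = [
  'var TralbumData = {',
  '    trackinfo: [{"title":"Track 1","file":{"mp3-128":"https://example.com/track1.mp3"}}],',
  '    playing_from: "album page",',
  '};',
].join('\n');

console.log(extractMp3(body)); // https://example.com/track1.mp3
```

Usage against the live page would be `fetchMp3('https://zachtronics.bandcamp.com/album/shenzhen-i-o-ost').then(console.log)`, with the caveat that it only works while the page still embeds trackinfo in this form.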
So request really does return the entire HTML file of the page, scripts included?
If I understand correctly, what I'm interested in is the trackinfo: [...] array from the main page. It can be pulled out of the request.get('bandcamp.com/...') response with a regular expression, without cheerio, and then parsed as an ordinary string with JSON.parse.
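The regex-plus-JSON.parse idea works as long as the captured text is exactly the array. A lazy \[.*?\] stops at the first ], so it would truncate an array that contains a nested array or a ] inside a string. One alternative, swapped in here as an illustration rather than taken from the thread, is to find the key with indexOf and count brackets until the array closes, ignoring brackets inside double-quoted strings. The helper name extractArrayAfter and the sample text are hypothetical, and the sketch assumes the first [ after the key opens the value and that the value is valid JSON (which JSON.parse needs anyway).

```javascript
// Hypothetical helper: return the balanced [...] that follows `key:` in text,
// skipping brackets inside double-quoted strings. Returns null if not found.
function extractArrayAfter(text, key) {
  const start = text.indexOf(key + ':');
  if (start === -1) return null;
  const open = text.indexOf('[', start);
  if (open === -1) return null;
  let depth = 0;
  let inString = false;
  for (let i = open; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === '\\') i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '[') {
      depth++;
    } else if (ch === ']') {
      depth--;
      if (depth === 0) return text.slice(open, i + 1);
    }
  }
  return null; // the array never closed
}

// Hypothetical sample with a nested array and a ] inside a string.
const text = 'trackinfo: [{"title":"a]b","tags":[1,2]},{"title":"c"}], playing_from: "album"';
const tracks = JSON.parse(extractArrayAfter(text, 'trackinfo'));
console.log(tracks.length); // 2
```

Regular expressions cannot match arbitrarily nested brackets, which is why a small counting loop tends to hold up better than tweaking the pattern.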
UPD
And as it turns out, I'll brush up on regular expressions along the way and write my own Bandcamp grabber too.
request('https://zachtronics.bandcamp.com/album/shenzhen-i-o-ost', function (error, response, body) {
res.send(body.match(/(?<=trackinfo:)(.*)(?=,)/gi))
// clumsy regex, doesn't grab exactly what's needed
});
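A short illustration of why a pattern of the form (?<=trackinfo:)(.*)(?=,) misbehaves, run on a hypothetical one-line sample where more keys follow the array on the same line. Greedy .* backtracks only as far as the last comma on the line, so it overshoots; switching to lazy .*? stops at the first comma, which is inside the array, so it undershoots.

```javascript
// Hypothetical one-line sample: the array is followed by more keys on the same line.
const line = 'trackinfo: [{"a":1},{"b":2}], x: 1, y: 2';

// Greedy (.*) runs to the LAST comma on the line, so it overshoots the array.
const greedy = line.match(/(?<=trackinfo:)(.*)(?=,)/)[1];

// Lazy (.*?) stops at the FIRST comma, which here is inside the array.
const lazy = line.match(/(?<=trackinfo:)(.*?)(?=,)/)[1];

console.log(JSON.stringify(greedy)); // " [{\"a\":1},{\"b\":2}], x: 1"
console.log(JSON.stringify(lazy));   // " [{\"a\":1}"
```

Neither result is the array by itself: a comma is not a reliable terminator because the array contains commas of its own, so the end of the match has to be tied to the closing ] rather than to the next comma.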