JavaScript
Dmitry, 2018-04-14 20:54:20

PhantomJS: how to save the HTML of multiple pages in a loop?

Here is a JS script that saves the HTML of a page to the file text.html:

var url_name= "https://www.myscore.ru/match/bNYBihO3/#odds-comparison;over-under;full-time";

var page = require('webpage').create();

var fs = require('fs');
var path = 'text.html'

page.open(url_name, function (status) {
  var content = page.content;
  fs.write(path, content, 'w')
  phantom.exit();
});

Everything works fine. But when I create an array of URLs and go through it with a for loop, something goes wrong:
var mass = ["https://www.myscore.ru/match/bNYBihO3/#odds-comparison;over-under;full-time", "https://www.myscore.ru/match/nRJYS03A/#odds-comparison;over-under;full-time"];
var page, fs;
for(var i=1; i<mass.length; i++){
  
  page = require('webpage').create();

  fs = require('fs');
  
  page.open(mass[0]);
  
  fs.write('text'+i+'.html', page.content, 'w');
   
}

phantom.exit();

As I understand it, the for loop does not wait for the first page to load before it moves on to the second one. As a result, only the second file is created, with the following content:
<html><head></head><body></body></html>

Please help me solve this problem. Thank you in advance.


3 answer(s)
Dmitry, 2018-04-15
@s-dimyan

No, that doesn't work either. Two HTML files are created, but they contain only empty tags instead of the page.

The problem is solved! I created a parser folder on localhost with two files, started.php and pars_page.js. PHP launches PhantomJS with the script once for each URL.

started.php file:

// array of links
  $mass_url = [
    "https://www.myscore.ru/match/bNYBihO3/#odds-comparison;over-under;full-time", 
    "https://www.myscore.ru/match/nRJYS03A/#odds-comparison;over-under;full-time"
  ];
  // iterate over the links
  for($i=0; $i<count($mass_url); $i++){
    // launch phantomjs
    // and run the JS script
    // first argument: the page URL
    // second argument: the name of the file to save the page content to
    exec("start C:\phantomjs\bin\phantomjs pars_page.js ".$mass_url[$i]." text".$i.".html");
  }

pars_page.js file:
var page = require('webpage').create(); // create the browser window the page will be loaded into
var system = require('system'); // load the system module
var address = system.args[1]; // page URL (first argument after the script name)
var path = system.args[2]; // name of the file to create with the page content (second argument)
var fs = require('fs'); // load the fs module for writing to a file

page.open(address, function (status) { // open the page in the browser window
  fs.write(path, page.content, 'w'); // write the page content to the file
  phantom.exit(); // close phantomjs
});

Everything works great for me!

ZaurK, 2018-04-15
@ZaurK

Maybe this will work?

for(var i=0; i<mass.length; i++){
  page = require('webpage').create();
  fs = require('fs');
  page.open(mass[i]);
  fs.write('text'+i+'.html', page.content, 'w');
}

Anton fon Faust, 2018-07-24
@bubandos

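var mass = ["https://www.myscore.ru/match/bNYBihO3/#odds-comparison;over-under;full-time", "https://www.myscore.ru/match/nRJYS03A/#odds-comparison;over-under;full-time"];
var fs = require('fs');

// Open the pages one at a time: the next page is requested only after
// the previous one has finished loading and has been written to disk.
function savePage(i) {
    if (i >= mass.length) {
        phantom.exit(); // exit only once every page has been saved
        return;
    }
    var page = require('webpage').create();
    page.onLoadFinished = function() {
        fs.write('text'+i+'.html', page.content, 'w');
        savePage(i + 1);
    };
    page.open(mass[i]);
}

savePage(0);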

Everything is described in the documentation.
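
The underlying pattern can also be sketched without PhantomJS, which may make it easier to see why a plain for loop runs ahead of the page loads. This is only a rough illustration: `fakeOpen` is a hypothetical stand-in that simulates an asynchronous load with `setTimeout` (it is not part of PhantomJS), and `eachInSequence` is an illustrative helper name, not a library function.

```javascript
// Hypothetical stand-in for page.open(url, callback): it only simulates
// an asynchronous page load and is NOT part of PhantomJS.
function fakeOpen(url, callback) {
  setTimeout(function () { callback('success'); }, 0);
}

// Calls worker(item, index, next) for each item in turn. The next item
// is started only after the current worker calls next().
function eachInSequence(items, worker, done) {
  function step(i) {
    if (i >= items.length) {
      done();
      return;
    }
    worker(items[i], i, function () { step(i + 1); });
  }
  step(0);
}

// Placeholder names instead of real URLs.
var urls = ['page-a', 'page-b'];

eachInSequence(urls, function (url, i, next) {
  fakeOpen(url, function (status) {
    // In PhantomJS this is where fs.write('text' + i + '.html', ...) would go.
    console.log('text' + i + '.html <- ' + url + ' (' + status + ')');
    next();
  });
}, function () {
  // In PhantomJS this is where phantom.exit() would go.
  console.log('all pages saved');
});
```

Because each step receives its own `i` as a function argument, the file index stays correct inside the callback, and the final callback runs only after the last item has been handled.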
