Parsing
askold2013, 2018-06-01 21:12:42

Error: Socket hang up. How to handle an error when parsing pages?

Hello! I can't figure out what the problem is: while parsing, I get a "socket hang up" error, after which the companiesList array is empty for some reason. Can anyone tell me what is wrong, or at least where to look?
My code:

const tress = require('tress')
const needle = require('needle');
const cheerio = require('cheerio');
const config = require('../config.json')
const resolveURL = require('url').resolve;

module.exports = function (vacancyUrls) {

  // find links to vacancies in vacancies list by url
  const findCompanyLink = (vacancyUrl) => new Promise ((resolve, reject) => {

    let companiesList = []

    const q = tress((url, callback) => {
      needle.get(url, function(err, res) {
        if (err) console.log(err, companiesList.length, url)
        if (res && res.body) {
          const $ = cheerio.load(res.body);
          companiesList.push($('dd>a').attr('href'))
        }
        callback()
      })
    }, 10000)

    q.drain = function(){
      resolve(companiesList)
    }

    q.push(vacancyUrl)
    
  })
  
  let promises = []

  vacancyUrls.forEach(vacancyUrl => {
    promises.push(findCompanyLink(config.baseUrl + vacancyUrl))
  })

  return Promise.all(promises)

}

How it works: the module accepts an array of links to vacancies. It downloads each page with the findCompanyLink function, which pushes the link to the company that posted the vacancy into companiesList.
Sometimes err contains the error I described above. So the question is: how do I skip such a page without stopping the whole process and without corrupting the companiesList array?
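
For illustration, here is a minimal sketch of one way to skip failing pages. It is not a confirmed fix for the code above. It assumes the requests can be wrapped in Promises, and it limits how many run at once, on the unverified guess that too many simultaneous requests contribute to the "socket hang up" errors. The names collectCompanyLinks, fetchPage and extractLink are hypothetical. fetchPage could wrap the needle.get call from the question, and extractLink could hold the cheerio lookup.

```javascript
// Sketch: process URLs with limited concurrency and skip pages that fail.
// fetchPage and extractLink are hypothetical helpers passed in by the caller:
//   fetchPage(url)    -> Promise resolving to the page HTML, rejecting on network errors
//   extractLink(html) -> the company link found in the page, or undefined
function collectCompanyLinks(urls, fetchPage, extractLink, concurrency = 5) {
  // One slot per URL; null marks a page that failed or had no link.
  const results = new Array(urls.length).fill(null);
  let next = 0;

  // Each worker keeps taking the next unprocessed URL until none are left.
  const worker = async () => {
    while (next < urls.length) {
      const index = next++;
      try {
        const html = await fetchPage(urls[index]);
        results[index] = extractLink(html) || null;
      } catch (err) {
        // A failed page (for example "socket hang up") is logged and skipped.
        console.log('Skipping', urls[index], err.message);
      }
    }
  };

  const workers = Array.from({ length: Math.min(concurrency, urls.length) }, worker);

  // Drop the null placeholders so only real links are returned.
  return Promise.all(workers).then(() => results.filter(link => link !== null));
}
```

Because every error is caught inside the worker, one bad page cannot reject the whole batch, and the returned array contains only the links that were actually found, in the same order as the input URLs.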
