Node.js
Kiril, 2016-04-08 22:40:06

How to handle large amounts of data in NodeJS?

I am new to Node.js and immediately ran into trouble processing a large amount of data. I am writing a parser for several sites (12 of them at the moment). A request is sent to the server via sockets, and every selected site is processed there.
Each site is processed in roughly the same way: the first search page is parsed, then each result block on it is processed; from the blocks we get links to the full description, load the description page, and from there go to the author's page. All of this is collected and sent to the user. The search may also span more than one page, so the same steps are repeated for the following pages. And so on for all 12 sites.
Looks like this:

selected_sites.forEach(function(site_name) {
  var site = new sites[site_name]();

  site.on('found', function(data) {
    socket.emit('found', data);
  });

  site.on('not_found', function() {
    socket.emit('not_found', 'Nothing found at ' + site.getSiteName());
    site = null; // `delete` cannot remove a variable; drop the reference instead
  });

  site.search(socket_data.params);
});
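The per-site flow described above can be sketched as a promise chain. The helper names below (fetchSearchPage, fetchDescription, fetchAuthor) are hypothetical stand-ins for the real request-and-parse steps; the point is that every step yields to the event loop, so one site's pages do not block another user's request:

```javascript
// Stub helpers; each stands in for a real HTTP request + parse step.
function fetchSearchPage(params, pageNo) {
  // pretend: load search page `pageNo`, return result blocks + next-page flag
  return Promise.resolve({
    blocks: [{ link: 'item-' + pageNo }],
    hasNext: pageNo < 2
  });
}
function fetchDescription(link) {
  return Promise.resolve({ link: link, author: 'author-of-' + link });
}
function fetchAuthor(desc) {
  return Promise.resolve({ item: desc.link, author: desc.author });
}

// Walk every search page, then every block on it; each finished item is
// emitted to the user as soon as it is ready.
function processSite(params, emit, pageNo) {
  pageNo = pageNo || 1;
  return fetchSearchPage(params, pageNo).then(function (page) {
    var items = page.blocks.map(function (block) {
      return fetchDescription(block.link)
        .then(fetchAuthor)
        .then(emit); // send each finished item to the user
    });
    return Promise.all(items).then(function () {
      if (page.hasNext) return processSite(params, emit, pageNo + 1);
    });
  });
}
```

With real, non-blocking HTTP calls in the stubs, several such chains for different sites (or different users) interleave freely on the event loop.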

So if I open several tabs and send different requests, each request has to wait until the earlier ones have finished parsing.
Hence the question: how can I speed this whole process up so that other users do not have to wait?


1 answer(s)
Alastor, 2016-04-08
@Alastor

If the pages are static or only change every 3-12 hours, you can store the HTML in a database, or parse it once and store just the data you need.
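The caching idea could be sketched with a simple in-memory TTL cache as a stand-in for a real database; `getCached` and `fetchPage` are hypothetical names, and `fetchPage` would do the actual HTTP request in the real app:

```javascript
// Minimal TTL cache sketch: fetched HTML (or parsed data) keyed by URL.
const cache = new Map(); // url -> { data, expires }

function getCached(url, fetchPage, ttlMs) {
  const hit = cache.get(url);
  if (hit && hit.expires > Date.now()) {
    return Promise.resolve(hit.data); // fresh copy, skip the network
  }
  return fetchPage(url).then(function (data) {
    cache.set(url, { data: data, expires: Date.now() + ttlMs });
    return data;
  });
}
```

With a TTL of 3-12 hours, as suggested above, repeat requests for the same site are served from the cache instead of re-parsing; in production the Map would be replaced by a database so the cache survives restarts.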
You could also keep a queue of the requests currently being processed, so the same URL is never re-run while it is already in flight.
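A minimal sketch of such a queue, assuming a promise-returning `loadAndParse` step (a hypothetical name): while a URL is in flight, later callers reuse the same pending promise instead of starting a second parse.

```javascript
// De-duplicate in-flight requests: url -> pending Promise.
const inFlight = new Map();

function enqueue(url, loadAndParse) {
  if (inFlight.has(url)) {
    return inFlight.get(url); // someone is already parsing this URL
  }
  const p = loadAndParse(url).then(
    function (result) { inFlight.delete(url); return result; },
    function (err) { inFlight.delete(url); throw err; }
  );
  inFlight.set(url, p);
  return p;
}
```

Combined with the cache above, this means a burst of identical requests from several tabs costs a single fetch-and-parse rather than one per tab.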
If I understand the problem correctly.
