How to handle large amounts of data in NodeJS?
I'm new to Node.js and immediately ran into trouble processing a large amount of data. I'm writing a parser for several sites (12 of them at the moment). A request is sent to the server via sockets, and each site is processed there.
Each site is handled in roughly the same way: the first page of the search results is parsed, the result blocks are processed to extract links to the full descriptions, each description page is loaded, and from it we go to the author's page. All of this is collected and sent to the user. Keep in mind that the search may return more than one page, so the operation is repeated for the following pages. And so on for all 12 sites.
Looks like this:
selected_sites.forEach(function(site_name) {
    var site = new sites[site_name]();

    site.on('found', function(data) {
        socket.emit('found', data);
    });

    site.on('not_found', function() {
        socket.emit('not_found', 'Nothing found at ' + site.getSiteName());
        // `delete` only works on object properties, not on variables;
        // dropping the reference is enough to let the GC reclaim it
        site = null;
    });

    site.search(socket_data.params);
});
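Inside each site, search() does roughly the following (a simplified sketch rather than the real code: fetchPage, buildSearchUrl and the parse* functions are placeholders for the actual HTTP and parsing helpers, and it is written with async/await for brevity):

// Placeholder helper: fetchPage(url) resolves to the page HTML,
// or to null when the page does not exist.
// Assumes `this` is the site object, which is an EventEmitter.
async function search(params) {
    var page = 1;
    var found = false;
    var html;

    // Walk the search pages one at a time, so only one page
    // of results is held in memory at any moment.
    while ((html = await fetchPage(buildSearchUrl(params, page))) !== null) {
        var results = parseSearchResults(html); // blocks on the search page
        if (results.length === 0) break;

        for (var i = 0; i < results.length; i++) {
            // Follow the link to the full description, then to the author page.
            var description = parseDescription(await fetchPage(results[i].url));
            description.author = parseAuthor(await fetchPage(description.authorUrl));
            this.emit('found', description); // stream each item to the user
            found = true;
        }
        page++;
    }
    if (!found) this.emit('not_found');
}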
If the pages are static or only update every 3-12 hours, you can store the HTML in a database, or parse it once and store just the data you need.
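Roughly like this, as a sketch: an in-memory Map stands in for the database here, the 3-hour TTL is just an example, and fetchPage is a placeholder for the actual HTTP helper:

var cache = new Map();            // url -> { html, fetchedAt }
var TTL = 3 * 60 * 60 * 1000;     // 3 hours, as an example

// Return cached HTML while it is fresh enough, otherwise refetch and store it.
async function getHtml(url) {
    var entry = cache.get(url);
    if (entry && Date.now() - entry.fetchedAt < TTL) {
        return entry.html;        // cache hit, no network round trip
    }
    var html = await fetchPage(url);
    cache.set(url, { html: html, fetchedAt: Date.now() });
    return html;
}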
It's also possible to keep a queue of the requests currently being processed, so the same URL is never fetched twice. That is, if I understand the problem correctly.
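For the queue, something like this: requests that are already in flight share one promise per URL, so a second request for the same URL just waits for the first instead of hitting the site again (fetchPage is again a placeholder):

var inFlight = new Map();         // url -> promise of the page HTML

function fetchOnce(url) {
    if (inFlight.has(url)) {
        return inFlight.get(url); // reuse the request already running
    }
    var promise = fetchPage(url)
        .finally(function() { inFlight.delete(url); }); // free the slot when settled
    inFlight.set(url, promise);
    return promise;
}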