Application architecture for parsing a large number of pages
Good afternoon.
Please help me with the following question:
I need to check prices for ~10 million items every day.
I have never processed this volume before (especially within fixed time windows), so I have doubts about how to implement it.
How do I estimate sufficient server capacity (or should it be several servers?), required throughput, and so on? Which database is better to use, and perhaps even which programming language? How many threads should I run?
What would you use for a task like this? Page size is ~100 KB, response time is ~2 s plus ~2 s through the proxy.
Thanks
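As a rough back-of-envelope check (a sketch only, not a sizing recommendation; the ~4 s per request simply combines the ~2 s response and ~2 s proxy latency from the question), the numbers work out like this:

```python
# Rough capacity estimate from the figures in the question:
# ~10,000,000 pages/day, ~100 KB per page, ~4 s total latency per request
# (2 s response + 2 s via proxy). Illustrative only.

PAGES_PER_DAY = 10_000_000
PAGE_SIZE_KB = 100
LATENCY_S = 4.0          # 2 s response + 2 s through the proxy
SECONDS_PER_DAY = 86_400

required_rps = PAGES_PER_DAY / SECONDS_PER_DAY           # ~116 requests/second
concurrent_requests = required_rps * LATENCY_S            # Little's law: ~460 in flight
bandwidth_mbit = required_rps * PAGE_SIZE_KB * 8 / 1000   # ~90 Mbit/s sustained
traffic_gb_per_day = PAGES_PER_DAY * PAGE_SIZE_KB / 1e6   # ~1000 GB downloaded per day

print(f"requests/second:     {required_rps:.0f}")
print(f"concurrent requests: {concurrent_requests:.0f}")
print(f"bandwidth, Mbit/s:   {bandwidth_mbit:.0f}")
print(f"traffic, GB/day:     {traffic_gb_per_day:.0f}")
```

So the workload is on the order of a hundred requests per second, a few hundred requests in flight at any moment, and about a terabyte of traffic per day, which is a useful starting point for choosing the number of workers and the size of the machine(s).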
Experiment is the criterion of truth. This is a non-issue: parsing 100 KB is trivial. At work I parse 2 MB of JS on the client, with complex DOM-rebuilding logic on top, and I made it all asynchronous so the browser doesn't hang.
In your case an ordinary regular expression will parse everything quickly in one line, giving you an array as output. Or a DOM selector.
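A minimal sketch of the regex approach described above; the HTML fragment, the "price" class name, and the pattern are made up for illustration and would need to match the real page markup:

```python
import re

# Hypothetical page fragment; the markup and the "price" class are assumptions.
html = """
<div class="item"><span class="price">1 299.00</span></div>
<div class="item"><span class="price">249.50</span></div>
"""

# One regex pass over the whole document, returning an array of price strings.
prices = re.findall(r'class="price">([\d\s.,]+)<', html)
print(prices)  # ['1 299.00', '249.50']
```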