JavaScript
John Smith, 2017-07-25 12:12:20

What is the best way to store data in a database? Which technology stack should I choose?

I'll briefly describe the essence of the project.
There is a distributor who sells goods (say, 300 items) to fifty other small wholesalers, each of which has its own website. The distributor wants to regularly (every day) scrape these 50 sites, with the following goals:

  1. Price dumping control.
  2. The ability to see how the prices of the distributor's products change over time.
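As a rough illustration of the first goal, a dumping check can be a plain comparison of scraped prices against the distributor's reference prices. This is only a sketch: the function name `findDumpingOffers`, the field names, the site names, and the sample prices are all made up for the example.

```javascript
// Sketch only: data shapes, names, and prices are hypothetical.
// referencePrices: Map of product name -> lowest acceptable price.
// offers: array of { site, product, price } produced by the scraper.
function findDumpingOffers(referencePrices, offers) {
  return offers.filter((offer) => {
    const floor = referencePrices.get(offer.product);
    // Products missing from the reference list are skipped, not flagged.
    return floor !== undefined && offer.price < floor;
  });
}

const referencePrices = new Map([
  ['Widget A', 100],
  ['Widget B', 250],
]);

const scrapedOffers = [
  { site: 'firstsite.com', product: 'Widget A', price: 105 },
  { site: 'secondsite.com', product: 'Widget A', price: 92 },
  { site: 'secondsite.com', product: 'Widget B', price: 250 },
  { site: 'thirdsite.com', product: 'Unknown item', price: 1 },
];

const dumpingOffers = findDumpingOffers(referencePrices, scrapedOffers);
console.log(dumpingOffers);
```

Only the second offer is below its reference price, so it is the only one reported.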

A little about myself: I have one year of experience writing web scrapers in Node.js, and some experience with the NoSQL database RethinkDB (one project).
The project will consist of:
  1. The web scraper itself.
  2. A small admin panel on localhost.

The admin panel will consist of:
  1. The main page, from which the scraper is controlled: "start", "pause", "stop", and "upload new product data" (upload a .csv file with a list of exact product names and their prices; by default, the last uploaded file is used). The main page also shows the scraping progress as a loader (I implement this through socket.io) and, once scraping finishes, a brief summary of the results (on which sites the goods are sold below cost).
  2. A page with a list of all products in the following form: on the left is a product, on the right is a graph (chart.js) of the average price for this product over the past month.
  3. You can click on a product and go to the page for that product, which contains price charts for each of the 50 sites (lazy loading) for the last N days.
  4. You can click on any of the sites and go to the page with the relevant information: on the left - products, on the right - price charts for goods on this site for the last N days.
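For the average-price graph on the product list page, the data could be prepared roughly like this. It is a sketch under assumptions: the record shape (`date` as an ISO string, `price` as a number) and the function name `averagePricePerDay` are invented here, and the actual chart.js wiring is left out.

```javascript
// Sketch only: the record shape { date, price } is an assumption.
// Groups one product's price records by calendar day and averages
// them across all sites.
function averagePricePerDay(records) {
  const totals = new Map(); // 'YYYY-MM-DD' -> { sum, count }
  for (const { date, price } of records) {
    const day = date.slice(0, 10); // date part of an ISO timestamp
    const entry = totals.get(day) || { sum: 0, count: 0 };
    entry.sum += price;
    entry.count += 1;
    totals.set(day, entry);
  }
  return [...totals.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // oldest day first
    .map(([day, { sum, count }]) => ({ day, average: sum / count }));
}

const sampleRecords = [
  { date: '2017-07-25T12:00:00Z', price: 90 },
  { date: '2017-07-24T12:00:00Z', price: 100 },
  { date: '2017-07-25T12:05:00Z', price: 100 },
  { date: '2017-07-24T12:05:00Z', price: 110 },
  { date: '2017-07-25T12:10:00Z', price: 110 },
];

const dailyAverages = averagePricePerDay(sampleRecords);
console.log(dailyAverages);
```

The `day` values can then serve as the chart labels and the `average` values as the data points.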

I plan to implement the entire backend as follows: server.js will host the admin panel and launch the parser manager manager.js (via child_process.fork()), which in turn will fork 50 parsers (firstsitecom.js, secondsitecom.js, ...). Why so many forks? Why not just use module.exports? So that an error in one of the parsers does not bring the whole system down. If one of the modules fails, I will be able to simply restart it, or ignore it and carry on parsing.
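The manager described above might look something like this minimal sketch. The function name `runParsers`, the `maxRestarts` option, and the injectable `spawn` parameter are assumptions made for the example (the last one only exists so the logic can be tried without real parser files); `fork` and the `exit` and `error` events are the standard child_process API.

```javascript
const { fork } = require('child_process');

// Sketch only: runs every parser script in its own process, restarts a
// failed parser up to maxRestarts times, then gives up on that parser
// and lets the others continue.
function runParsers(scripts, { maxRestarts = 1, spawn = fork } = {}) {
  const results = new Map(); // script path -> 'ok' | 'failed'
  return new Promise((resolve) => {
    let pending = scripts.length;
    if (pending === 0) {
      resolve(results);
      return;
    }
    const finish = (script, status) => {
      results.set(script, status);
      pending -= 1;
      if (pending === 0) resolve(results);
    };
    const start = (script, attempt) => {
      const child = spawn(script);
      let settled = false; // 'error' and 'exit' can both fire for one child
      const onEnd = (ok) => {
        if (settled) return;
        settled = true;
        if (ok) finish(script, 'ok');
        else if (attempt < maxRestarts) start(script, attempt + 1);
        else finish(script, 'failed');
      };
      child.on('error', () => onEnd(false));
      // A non-zero exit code, or null after a kill signal, counts as failure.
      child.on('exit', (code) => onEnd(code === 0));
    };
    scripts.forEach((script) => start(script, 0));
  });
}
```

In manager.js this could be called as `runParsers(['./firstsitecom.js', './secondsitecom.js']).then((results) => console.log(results))`, assuming each parser exits with code 0 on success.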
And finally, the question itself, more precisely, two:
  1. What is the best way to store data in a database? Maybe create a table for each of the fifty sites, or is it better to create a table for each product? I would be very grateful for any detailed answer.
  2. What is the best technology stack? For now, I have settled on MEAN: Node.js, Express, Angular 2 (I went through the PhoneCat and Hero tutorials on the official site and everything seems clear, but its sheer size and TypeScript put me off), and MongoDB (similar to RethinkDB). If you have any tips on choosing a stack, I'll take a look at them.

I understand that, with my fairly limited knowledge, all this may take me a lot of time and effort, but I am ready for that and will spend as much effort as necessary.
Perhaps you have something to say, or some useful links come to mind. I would be glad of any answer. Thank you.


1 answer(s)
4X_Pro, 2017-07-25
@XXXXPro

Why do this in NoSQL? This is where relational databases come in handy.
I would generally limit myself to three tables:
1) site
2) product in general (in fact, only its id and name are stored there)
3) product on a specific site (product id, site id, price, parsing date are stored here).
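To make the three-table idea concrete, here is a sketch in plain JavaScript where arrays stand in for the relational tables. All names, column names, and sample rows are illustrative assumptions; in a real relational database these would be tables linked by foreign keys and queried with SQL.

```javascript
// Sketch only: arrays stand in for tables; every name and row is made up.
const sites = [
  { id: 1, name: 'firstsite.com' },
  { id: 2, name: 'secondsite.com' },
];

const products = [
  { id: 1, name: 'Widget A' },
  { id: 2, name: 'Widget B' },
];

// One row per (product, site, parsing date).
const siteProducts = [
  { productId: 1, siteId: 1, price: 105, parsedAt: '2017-07-25' },
  { productId: 1, siteId: 2, price: 92, parsedAt: '2017-07-24' },
  { productId: 1, siteId: 1, price: 103, parsedAt: '2017-07-24' },
  { productId: 2, siteId: 1, price: 250, parsedAt: '2017-07-25' },
];

// Price history of one product on one site, oldest first
// (the data behind a single chart on the product page).
function priceHistory(productId, siteId) {
  return siteProducts
    .filter((row) => row.productId === productId && row.siteId === siteId)
    .sort((a, b) => (a.parsedAt < b.parsedAt ? -1 : a.parsedAt > b.parsedAt ? 1 : 0))
    .map(({ parsedAt, price }) => ({ parsedAt, price }));
}

// Names of all products seen on one site (the left column of the site page).
function productNamesOnSite(siteId) {
  const ids = new Set(
    siteProducts.filter((row) => row.siteId === siteId).map((row) => row.productId)
  );
  return products.filter((p) => ids.has(p.id)).map((p) => p.name);
}

console.log(priceHistory(1, 1));
console.log(productNamesOnSite(1));
```

Both the per-product and the per-site pages read from the same third table, so neither a table per site nor a table per product is needed.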
