How to collect information about tours from several operators in your database?
Hello! The task is to build a tour search that shows tours matching the given parameters from several tour operators. There is a list of operators; some provide an API, some do not.
I looked at similar questions on Toster and realized that I need to parse the data from these operators' sites and store it in my database. Let's say this will run every day at 2:00.
In this regard, I have questions:
I can answer the first question: never count on being served a page with valid HTML. We used plain regular expressions to extract the information. It is laborious and slow, but reliable. There may be a PHP library that can build a partial DOM from invalid HTML. For .NET such a library exists, and it is the preferred option.
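To illustrate the regular-expression approach on markup that is not valid HTML, here is a minimal sketch in Python (the question is about PHP, but the idea carries over). The page fragment, the field names and the pattern are all made up for the example; a real operator page will need its own pattern.

```python
import re

# Hypothetical fragment of an operator page. Note the unclosed tags and
# the unquoted attribute: a strict HTML parser could choke on this.
html = """
<div class=tour><b>Antalya<td>7 nights<span class="price">450 USD
<div class=tour><b>Hurghada<td>10 nights<span class="price">620 USD
"""

# Hypothetical pattern: the name follows <b>, the nights follow <td>,
# the price and currency follow the price span.
TOUR_RE = re.compile(
    r'<b>\s*(?P<name>[^<]+?)\s*<td>\s*(?P<nights>\d+)\s+nights\s*'
    r'<span class="price">\s*(?P<price>\d+)\s+(?P<currency>[A-Z]{3})',
    re.IGNORECASE,
)


def extract_tours(page: str) -> list[dict]:
    """Return one dict per tour found in the page text."""
    tours = []
    for m in TOUR_RE.finditer(page):
        tours.append({
            "name": m.group("name"),
            "nights": int(m.group("nights")),
            "price": int(m.group("price")),
            "currency": m.group("currency"),
        })
    return tours


for tour in extract_tours(html):
    print(tour)
```

The pattern only depends on the few tags around the data, so it keeps working when the rest of the page is broken. The price is that it has to be revisited whenever the operator changes the layout.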
There are a few things to think about when it comes to storage format:
1. Google the tourism industry's data representation standards. They are useful for getting your head around this industry, but in practice no one in our country supports them.
2. Analyze the data structure of the operators you have chosen. It will not be difficult to single out the general structure.
3. Immediately think about the parameters by which you will search for data. This is your main feature, so it needs to be given the most attention. Think about data storage technology: SQL, NoSQL, a hybrid solution (for example, SQL stores normalized source data, and NoSQL generates denormalized views tailored for fast search).
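As a rough illustration of the hybrid idea from point 3, the sketch below keeps normalized source data in SQL and flattens it into one document per tour, which is the shape a NoSQL store or search index would hold. SQLite stands in for the SQL side, plain dicts stand in for the NoSQL side, and all table names, column names and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us turn rows into dicts by column name

# Normalized source data (hypothetical schema).
conn.executescript("""
CREATE TABLE operator (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE hotel (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    country TEXT NOT NULL, stars INTEGER
);
CREATE TABLE tour (
    id INTEGER PRIMARY KEY,
    operator_id INTEGER NOT NULL REFERENCES operator(id),
    hotel_id INTEGER NOT NULL REFERENCES hotel(id),
    departure_date TEXT NOT NULL,
    nights INTEGER NOT NULL,
    price INTEGER NOT NULL
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO operator VALUES (1, 'Operator A')")
conn.execute("INSERT INTO hotel VALUES (1, 'Sea View', 'Turkey', 4)")
conn.executemany("INSERT INTO tour VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, "2024-07-01", 7, 450),
    (2, 1, 1, "2024-07-08", 10, 620),
])


def build_search_documents(db: sqlite3.Connection) -> list[dict]:
    """Join the normalized tables into one flat document per tour."""
    rows = db.execute("""
        SELECT t.id AS tour_id, o.name AS operator, h.name AS hotel,
               h.country, h.stars, t.departure_date, t.nights, t.price
        FROM tour t
        JOIN operator o ON o.id = t.operator_id
        JOIN hotel h ON h.id = t.hotel_id
        ORDER BY t.id
    """).fetchall()
    return [dict(row) for row in rows]


documents = build_search_documents(conn)
print(documents[0])
```

Each document carries every field the search needs, so a query by country, stars or price touches a single record instead of joining three tables. The documents would be rebuilt after each nightly import.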
I can't say anything more on this topic, because I have not worked in this industry for 5 years. A lot of new technologies have appeared in that time, and the approach to data storage may now be completely different.
1. Parsing: Datacol
2. Storage: trees based on ID: id, parent_id, param1,....,paramN
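The id / parent_id layout from point 2 is usually called an adjacency list. The answer does not say what the tree levels are, so the sketch below assumes a country, resort, hotel hierarchy purely for illustration; the table name, the single `name` column (standing in for param1 ... paramN) and the rows are hypothetical. SQLite is used because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),  -- NULL for a root
        name TEXT NOT NULL                      -- stands in for param1..paramN
    )
""")
# Made-up hierarchy: country -> resort -> hotel.
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, None, "Turkey"),
    (2, 1, "Antalya"),
    (3, 2, "Sea View Hotel"),
    (4, 1, "Bodrum"),
])


def subtree(db: sqlite3.Connection, root_id: int) -> list[tuple]:
    """Return the root and all of its descendants as (id, name) tuples."""
    return db.execute("""
        WITH RECURSIVE sub(id, name) AS (
            SELECT id, name FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.name FROM node n JOIN sub s ON n.parent_id = s.id
        )
        SELECT id, name FROM sub ORDER BY id
    """, (root_id,)).fetchall()


print(subtree(conn, 2))
```

Adding a node is a single insert, and the recursive query walks any depth. For very deep or very hot trees other layouts may read faster, but this one is the simplest to keep in sync with a nightly import.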
As for children and adults, that is combinatorics.
The data is stored in 3-bit groups: a type bit (0 = child, 1 = adult) followed by a 2-bit count, 00 to 11, that is, a number from 0 to 3.
Then:
000000 - no places
000001 - 1 child
000101 - 1 adult
101001 - 1 adult and 1 child
110001 - 2 adults and 1 child
110010 - 2 adults and 2 children
111011 - 3 adults and 3 children
With a fixed field order (for example, adults first, then children) the two type bits become redundant, so you can drop them and store everything in 4 bits.
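The scheme above can be sketched in a few lines of Python. This is my reading of the table, not a reference implementation: each 3-bit group is a type bit plus a 2-bit count, non-empty groups are written adults first and right-aligned, and the function names are invented for the example.

```python
def _check(count: int) -> None:
    if not 0 <= count <= 3:
        raise ValueError("each count must fit in 2 bits (0 to 3)")


def encode(adults: int, children: int) -> str:
    """Pack the counts into two 3-bit groups: type bit + 2-bit count."""
    _check(adults)
    _check(children)
    value = 0
    # Adults first, then children; empty groups are skipped, so the
    # remaining group ends up right-aligned, as in the table.
    for kind, count in ((1, adults), (0, children)):
        if count:
            value = (value << 3) | (kind << 2) | count
    return format(value, "06b")


def decode(bits: str) -> tuple[int, int]:
    """Return (adults, children) from a 6-bit string."""
    value = int(bits, 2)
    adults = children = 0
    for group in (value >> 3, value & 0b111):
        count = group & 0b11
        if group >> 2:       # type bit 1 = adult
            adults += count
        else:                # type bit 0 = child
            children += count
    return adults, children


def encode_fixed(adults: int, children: int) -> str:
    """Fixed order (adults, children): no type bits, 4 bits in total."""
    _check(adults)
    _check(children)
    return format((adults << 2) | children, "04b")


print(encode(2, 1), decode("110001"), encode_fixed(2, 1))
```

Because every group carries its own type bit, `decode` does not care which group comes first. The 4-bit variant gives that up in exchange for a smaller value.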
1) Better without any tools, just plain regular expressions.
2) Each separately
3) Haven't decided.