PHP
dake1231, 2014-11-27 19:34:39

How do I collect tour information from several operators into my own database?

Hello! The task is to build a tour search that shows tours matching the given parameters from several tour operators. There is a list of operators; some provide an API, some do not.
I looked at similar questions on Toster and realized that I need to parse the data from these operators' sites and store it in a database. Let's say this will run every day at 2:00.
This raises a few questions:

  1. What tools should be used for parsing? So far I am considering XPath or this option.
  2. How is this data generally collected, in what form, and how should it be organized so that I can search it locally? The input parameters can vary, for example 1 adult and 1 child, or 2 adults and 2 children, and so on. I really have no idea and will not insist on it, but I assume one option is to take the maximum number of free places and store that in the database; the other is to store each combination separately.
  3. Users ilBEastli, ThePretender, and advertise have solved these problems. I would like to look at your examples or ask you questions, for example over Skype.

Thanks everyone for the replies!


3 answer(s)
ThePretender, 2014-11-27
@ThePretender

I can answer the first question: never rely on being served a page with valid HTML. We used plain regular expressions to extract the information. It is tedious and slow, but reliable. Perhaps there is a library for PHP that can build a partial DOM from invalid HTML. For .NET such a library exists, and it is the preferred option.
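On the PHP side, the bundled DOM extension is one candidate for this: `DOMDocument::loadHTML()` is tolerant of broken markup, and `DOMXPath` can then query the repaired tree. A minimal sketch, assuming made-up markup; the `tour` class, the `data-price` attribute, and the `parseTours` helper are hypothetical and not taken from any real operator site:

```php
<?php
// Hypothetical helper: extracts tours from possibly invalid HTML.
function parseTours(string $html): array
{
    // Collect libxml warnings about broken markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $tours = [];
    // The class and attribute names below are assumptions for illustration.
    foreach ($xpath->query('//li[@class="tour"]') as $node) {
        $tours[] = [
            'name'  => trim($node->textContent),
            'price' => (int) $node->getAttribute('data-price'),
        ];
    }
    return $tours;
}

// Deliberately broken input: the <li> elements are never closed.
$html = '<ul><li class="tour" data-price="1200">Phuket'
      . '<li class="tour" data-price="900">Antalya</ul>';

foreach (parseTours($html) as $tour) {
    echo $tour['name'], ': ', $tour['price'], PHP_EOL;
}
```

Whether this holds up against a particular operator's pages has to be checked per site; regular expressions remain the fallback when the markup is too damaged for the parser to repair sensibly.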
There are a few things to think about when it comes to the storage format:
1. Google the data representation standards of the tourism industry. This is useful for getting your head around the industry, but in practice nobody in our country supports these standards.
2. Analyze the data structures of the operators you have chosen. Singling out the common structure will not be difficult.
3. Think right away about the parameters you will search by. This is your main feature, so it deserves the most attention. Think about the data storage technology: SQL, NoSQL, or a hybrid solution (for example, SQL stores the normalized source data, and NoSQL holds denormalized views generated from it and tailored for fast search).
I cannot say anything more on this topic, because I have not worked in this industry for 5 years. A bunch of new technologies have appeared during that time, and the approach to data storage may be completely different now.
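The hybrid idea from point 3 can be sketched without committing to any particular database. In the sketch below, plain PHP arrays stand in for both stores: normalized tours and per-occupancy offers play the role of the SQL side, and an index keyed by the search parameters plays the role of the denormalized view. All field names, operator names, and prices are invented for illustration.

```php
<?php
// Normalized source data (hypothetical fields and values).
$tours = [
    1 => ['operator' => 'OperatorA', 'name' => 'Phuket'],
    2 => ['operator' => 'OperatorB', 'name' => 'Antalya'],
];
// One row per occupancy option, i.e. each combination is stored separately.
$offers = [
    ['tour_id' => 1, 'adults' => 2, 'children' => 0, 'price' => 1500],
    ['tour_id' => 1, 'adults' => 2, 'children' => 1, 'price' => 1900],
    ['tour_id' => 2, 'adults' => 2, 'children' => 1, 'price' => 1400],
];

// Builds a denormalized view keyed by "adults:children", sorted by price.
function buildSearchIndex(array $tours, array $offers): array
{
    $index = [];
    foreach ($offers as $offer) {
        $key  = $offer['adults'] . ':' . $offer['children'];
        $tour = $tours[$offer['tour_id']];
        $index[$key][] = [
            'operator' => $tour['operator'],
            'name'     => $tour['name'],
            'price'    => $offer['price'],
        ];
    }
    foreach ($index as &$rows) {
        usort($rows, fn($a, $b) => $a['price'] <=> $b['price']);
    }
    unset($rows); // break the reference left over from the loop
    return $index;
}

$index   = buildSearchIndex($tours, $offers);
$results = $index['2:1'] ?? []; // search: 2 adults and 1 child
foreach ($results as $row) {
    echo $row['operator'], ' / ', $row['name'], ': ', $row['price'], PHP_EOL;
}
```

The point of the split is that the nightly import only rewrites the normalized data, and the search view is rebuilt from it afterwards, so a search never has to join or aggregate at request time.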

xmoonlight, 2014-11-27
@xmoonlight

1. Parsing: Datacol
2. Storage: trees based on ID: id, parent_id, param1,....,paramN
As for children and adults, it is combinatorics.
Each group takes 3 bits: a type bit (0 = child, 1 = adult) followed by a 2-bit count (00 to 11, i.e. a number from 0 to 3).
Then:
000000 - no places
000001 - 1 child
000101 - 1 adult
101001 - 1 adult and 1 child
110001 - 2 adults and 1 child
110010 - 2 adults and 2 children
111011 - 3 adults and 3 children
With a fixed field order (adults first, then children), the 2 type bits become redundant and can be dropped.
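The fixed-order variant can be sketched as follows: with the type bits dropped, the high 2 bits hold the number of adults and the low 2 bits the number of children. The function names are hypothetical, and the adults-first order is an assumption read off the examples above.

```php
<?php
// Packs occupancy into 4 bits: high 2 bits = adults, low 2 bits = children.
function packOccupancy(int $adults, int $children): int
{
    if ($adults < 0 || $adults > 3 || $children < 0 || $children > 3) {
        throw new InvalidArgumentException('Each count must fit in 2 bits (0-3).');
    }
    return ($adults << 2) | $children;
}

// Reverses packOccupancy().
function unpackOccupancy(int $code): array
{
    return [
        'adults'   => ($code >> 2) & 0b11,
        'children' => $code & 0b11,
    ];
}

$code = packOccupancy(2, 1);
printf("%04b\n", $code); // prints 1001
```

Here "2 adults and 1 child" becomes 1001, which is the 110001 from the list above with the leading type bit of each group removed.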

Puma Thailand, 2014-11-28
@opium

1) Better without any tools, just plain regular expressions.
2) Store each option separately.
3) I have not solved this myself.
