Sergey Eremin, 2014-10-17 04:23:52
Parsing

Ethical question: parse the site or try to negotiate?

For research, a publication, or a project (underline as appropriate), I need data from open sources: the cards of all buildings, with their energy-efficiency figures, number of storeys, area, degree of wear, and so on. The source is a state website (supported by the Housing and Utilities Reform Assistance Fund, the Ministry of Construction and Housing, and a bunch of other organizations, committees, and departments). Unfortunately, it has no API for taking this data honestly (or at least there is no information about one). To my second regret, the structure of the card index is a "black box". So brute-force parsing means checking 10 million cards, most of which are empty (Russia has not built that many houses yet). Of course, a load like that will be noticeable on their side, and that worries me.
So I wrote a polite letter to their support: your site is beautiful... you are super... but I need some data, a lot of data... And my need for it is great, because I'm doing such-and-such, which will also be partly useful to the state... I need these fields (list). I said: you are an open, state-owned source, so I could take the data myself, but I'm afraid of creating problems with unnecessary load. If you can, send me the files; if that is not possible, tell me what load, in requests per second, would be acceptable to you...
In response, silence.
Question: wait, or start parsing? And in general, how do my intentions look from an ethical standpoint?
P.S. I could also try other channels. I once wrote for Expert and Kommersant, and I still have both press credentials and corporate e-mail addresses. I could arrange an official media request (probably even on a letterhead). But that is already bordering on forgery. That is, if necessary I could easily write an article based on the results of the analysis, but I have no editorial assignment for it. Besides, there is no guarantee the data would be handed over, since the media really need consolidated reports, not information on every object. And that is logical and fair.


1 answer
Denis, 2014-10-17
@Sergei_Erjemin

We had a similar situation, only we started parsing right away. After a while, the admins of the resource phoned us and offered a ready-made database :-)
So parsing has its upside. Besides, you can parse carefully, though it will take time.
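
To put a number on "it will take time": below is a rough Python sketch, not anyone's actual crawler. It estimates how long 10 million card requests (the figure from the question) would take at a fixed rate, and shows one simple way to pace a loop. The names `crawl_days` and `throttled` are made up for illustration, and the request rates are assumptions, since the site never said what load is acceptable.

```python
import time

TOTAL_CARDS = 10_000_000  # card count mentioned in the question


def crawl_days(total_requests, requests_per_second):
    """Days needed to issue total_requests at a fixed request rate."""
    return total_requests / requests_per_second / 86_400


def throttled(items, requests_per_second, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than requests_per_second.

    clock and sleep are injectable so the pacing can be checked
    without actually waiting.
    """
    min_interval = 1.0 / requests_per_second
    next_allowed = clock()
    for item in items:
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)
        # Schedule the next item one interval after this one was released.
        next_allowed = max(clock(), next_allowed) + min_interval
        yield item


# Assumed rates; the real acceptable load is unknown.
for rate in (0.5, 1, 5):
    print(f"{rate} req/s -> {crawl_days(TOTAL_CARDS, rate):.1f} days")
```

At one request per second, the full sweep comes to roughly 116 days, so a "careful" crawl is measured in months. A real loop would wrap the card IDs in `throttled(...)` and fetch one page per yielded ID.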
