How to parse a public procurement website?

Alexey Maltsev, 2013-11-27 15:00:44

.NET

Good afternoon. There is a public procurement website, zakupki.gov.ru.
Management has asked me to write a .NET application that, roughly speaking, shows in its window a list of lots for a given date range, by default from yesterday to today.
The user should then be able to scroll through this list, mark the lots of interest, and get a list of links to them with one button.
Say we open this link and get a list of applications.
Looking at the page source, we see an interesting line there:
It is followed by many even more interesting lines: , each of which is, of course, a separate lot in the list.
Now the question: which direction should I dig in and what should I read, given that I have no idea how to parse this and know nothing about .NET? Please help.
I'm going to do it in C#, since I came across it a couple of years ago and wrote calculators with tags in it for fun and experience.

7 answer(s)

Alexander, 2015-02-09
@Aleserche

The answer is very late, but still.
A more correct option (if not the most correct one) is to use the data from the open part of the site. It is hosted on ftp.zakupki.gov.ru. Login and password: free and free. All the data is there as XML.
This link contains the documents needed to work with the OOS (schemas, descriptions, etc.).
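A minimal sketch of listing a directory on that FTP server from C#, using .NET's FtpWebRequest and the free/free credentials from this answer. The server and credentials may have changed since 2015, and FtpWebRequest is marked obsolete in recent .NET versions (it still compiles, with a warning), so treat this as a starting point only. The XML files themselves could then be read with System.Xml.Linq; their element names come from the schemas mentioned above and are not assumed here.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class OpenDataFtp
{
    // Builds (but does not send) a directory-listing request.
    // Credentials are the ones quoted in the 2015 answer; they may no longer be valid.
    public static FtpWebRequest CreateListRequest(string path)
    {
        var request = (FtpWebRequest)WebRequest.Create("ftp://ftp.zakupki.gov.ru" + path);
        request.Method = WebRequestMethods.Ftp.ListDirectory;
        request.Credentials = new NetworkCredential("free", "free");
        return request;
    }

    // Sends the request and returns the entry names, one per line of the response.
    // This performs a real network call.
    public static List<string> ListDirectory(string path)
    {
        var names = new List<string>();
        using (var response = (FtpWebResponse)CreateListRequest(path).GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                names.Add(line);
        }
        return names;
    }
}
```

Usage: `OpenDataFtp.ListDirectory("/")` returns the top-level entries, which you can then walk down to the XML archives you need.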

Vit, 2013-11-27
@fornit1917

> see in the application window a list of lots for certain dates
Put a WebView in the window and display the procurement site in it :)
But seriously: take the System.Net.Http.HttpClient class, use it to download the page source, and pick the ones you need out of the "interesting lines".
The "interesting lines" are the HTML markup of what you see in your browser window when you visit the site. All the information you see with your eyes can be found in these very "interesting lines". Just pick what you need and save it locally or display it, depending on the requirements.
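A minimal sketch of that approach, assuming .NET's built-in HttpClient: download the page as a string, then keep only the lines that contain some marker text. The marker (for example `notificationId`, the parameter used in another answer in this thread) is an assumption about what the lot lines contain; check it against the real page source.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageFetcher
{
    // A single shared HttpClient is the usual recommendation;
    // creating one per request can exhaust sockets.
    private static readonly HttpClient Client = new HttpClient();

    // Downloads the raw HTML ("source code") of the page.
    public static Task<string> DownloadHtmlAsync(string url) => Client.GetStringAsync(url);

    // Keeps only the trimmed lines of the HTML that contain the marker text.
    public static List<string> LinesContaining(string html, string marker) =>
        html.Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Contains(marker))
            .ToList();
}
```

Usage: `var html = await PageFetcher.DownloadHtmlAsync(url);` followed by `PageFetcher.LinesContaining(html, "notificationId")`. Filtering by line is crude and breaks if the site changes its formatting, which is why an HTML parser is often preferred.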

Gregory, 2013-11-27
@gvas_ru

Do you just need a list of links? Could you open a WebView, hook the link-click event, and save each clicked link?

Alexey, 2013-11-27
@ScorpLeX

Well, the algorithm is quite simple:

Load the HTML page, substituting the dates into the URL: orderPublishDateFrom=11/25/2013&orderPublishDateTo=11/27/2013
Collect the IDs of the new orders with a regexp, for example like this: /common_info\/show\?source=epz&notificationId=(.*?)"/
Build the links and output/save them.
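A minimal C# sketch of those three steps. The search-page address was not preserved in the question, so `baseUrl` and `linkPrefix` are hypothetical parameters; the MM/dd/yyyy date format only mirrors the example query string; and accepting `&amp;` in place of `&` is an assumption about how the ampersand appears in the raw HTML. The default range (yesterday to today) comes from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class LotLinks
{
    // The regexp from the answer with its escaping fixed: "?" is escaped and the ID
    // is everything up to the closing quote. "&amp;" is also accepted (assumption).
    private static readonly Regex IdPattern =
        new Regex(@"common_info/show\?source=epz&(?:amp;)?notificationId=([^""]+)""");

    // Default range from the question: from yesterday to today.
    public static (DateTime From, DateTime To) DefaultRange() =>
        (DateTime.Today.AddDays(-1), DateTime.Today);

    // Step 1: substitute the dates into the query string (baseUrl is hypothetical).
    public static string BuildSearchUrl(string baseUrl, DateTime from, DateTime to)
    {
        string f = from.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture);
        string t = to.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture);
        string sep = baseUrl.Contains("?") ? "&" : "?";
        return baseUrl + sep + "orderPublishDateFrom=" + f + "&orderPublishDateTo=" + t;
    }

    // Step 2: collect the distinct notification IDs found in the page HTML.
    public static List<string> ExtractIds(string html) =>
        IdPattern.Matches(html).Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Distinct()
            .ToList();

    // Step 3: turn IDs into links; linkPrefix (hypothetical) is the lot URL up to "notificationId=".
    public static List<string> BuildLinks(IEnumerable<string> ids, string linkPrefix) =>
        ids.Select(id => linkPrefix + id).ToList();
}
```

In a real application the HTML for step 2 would be downloaded from the step 1 URL (for example with HttpClient.GetStringAsync), and the links from step 3 shown in the list the user marks up.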

leclecovich, 2013-11-28
@leclecovich

First of all, @fornit1917 is right about fetching the web page. But parsing HTML with regular expressions is not the best idea. Take a look at CsQuery, a handy library that is easy to install via NuGet.
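A short sketch of the CsQuery approach, assuming the CsQuery NuGet package is installed. The CSS selector `a[href*='notificationId']` is an assumption based on the link format quoted in another answer in this thread; adjust it to the real markup.

```csharp
using System.Collections.Generic;
using System.Linq;
using CsQuery; // NuGet package "CsQuery"

public static class CsQueryLots
{
    // Parses the HTML, selects every <a> whose href contains "notificationId",
    // and returns the href values (entities such as &amp; are decoded by the parser).
    public static List<string> LotHrefs(string html)
    {
        CQ dom = CQ.Create(html);
        return dom["a[href*='notificationId']"]
            .Select(a => a.GetAttribute("href"))
            .ToList();
    }
}
```

Compared with a regexp, the selector keeps working if attribute order, quoting, or whitespace in the markup changes.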

Dmitry, 2013-11-28
@mezastel

If you have no idea how to do it, I advise you to hire those who can. It will be very expensive, true, but you won’t have to suffer. Data acquisition is an art.

fragolino, 2018-08-19
@fragolino

Here is a ready-made parser for sale: zakupki-online.site