HTML
glodev, 2017-06-08 12:54:13

What are some tips for implementing a program that reads a website and searches it for information in C++ (a parser)?

The idea is this: we enter an address, read the page's code, and select information from it, say the links, so that, roughly speaking, we build a map of the site. For example, you start from one site and collect all of its links and pictures; then for each of those links you collect their links and pictures in turn, and so on until we say stop.
I'm interested in how to implement reading the HTML, and in advice on extracting data from it (some links may point to scripts or CSS, so they could be filtered by their ending). Any ideas on how to implement all of this, including multithreading, are also welcome.
The goal is a universal parser that starts from a set of links and spreads outward, collecting information into txt files by mask.



1 answer
nirvimel, 2017-06-08
@glodev

  1. Download pages from the web using libcurl.
    There is no problem with this part.
  2. Select links from the HTML with XPath: //a will only select real links to pages that can be navigated.
