HTML
Ilya, 2019-02-06 09:44:21

How to collect the text inside the links from the HTML code of the page?

Hello!
There is a tutor aggregator site, and I need to collect the names of all the tutors from it.
There are 300+ pages with 10 tutors per page.
Each tutor's name is inside a link with the class teacher-name.
Example:

<a href="/repetitor.aspx?id=4350" class="teacher-name"> Полина Игоревна</a>

Is it possible to collect the content of these links with some tool instead of manually?
Please explain, I don't know much about this topic.
Thanks in advance!


2 answers
2cha.headz, 2019-02-06
@glagolew059

You can use simple_html_dom.php, a library that parses HTML pages.
The list of pages can then be taken from sitemap.xml (assuming the site has a proper one).
Code example (errors are possible, I wrote it without checking the syntax):

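<?php
require_once($_SERVER["DOCUMENT_ROOT"] . "/parser/simple_html_dom.php");

// Load the sitemap to get the list of page URLs
$sitemap = "http://example.ru/sitemap.xml";
$xmlstring = file_get_contents($sitemap);
if ($xmlstring === false) {
    die("Could not download the sitemap\n");
}

$xml = simplexml_load_string($xmlstring);
if ($xml === false) {
    die("Could not parse the sitemap\n");
}

// Iterate over the <url> entries directly; this also works when there is only one
foreach ($xml->url as $link) {
    $url = (string) $link->loc;
    $html = file_get_contents($url);
    if ($html === false) {
        continue; // skip pages that fail to download
    }

    $data = str_get_html($html);
    if (!$data) {
        continue; // skip pages the parser could not read
    }

    // Array of links that have the teacher-name class
    $teacherArray = $data->find('.teacher-name');
    foreach ($teacherArray as $a) {
        echo $a->href . "\t" . trim($a->plaintext) . "\n";
    }

    $data->clear(); // free parser memory before loading the next page
}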

Vladimir, 2019-02-06
@djQuery

curl plus an HTML DOM parser will do the job. But if you are not comfortable with programming, it is better to hire a specialist.
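
To make the "curl + DOM parser" route a bit more concrete, here is a rough sketch that assumes PHP with the curl and DOM extensions enabled. The helper names fetchPage and extractTeacherNames are made up for illustration and do not come from any library, and the class name teacher-name is taken from the example link in the question. The sample at the end runs on that example link and makes no network requests.

```php
<?php
// Hypothetical helper: download one page with the curl extension.
// Returns the response body, or null if the request failed.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // give up after 15 seconds
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: return the trimmed text of every <a> element
// whose class attribute contains "teacher-name".
function extractTeacherNames(string $html): array
{
    $dom = new DOMDocument();
    // "@" silences warnings about imperfect real-world markup;
    // the XML encoding hint asks libxml to read the input as UTF-8.
    @$dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    $xpath = new DOMXPath($dom);
    // Match "teacher-name" as a whole word inside the class attribute.
    $query = '//a[contains(concat(" ", normalize-space(@class), " "), " teacher-name ")]';

    $names = [];
    foreach ($xpath->query($query) as $link) {
        $names[] = trim($link->textContent);
    }
    return $names;
}

// Try it on the link from the question.
$sample = '<a href="/repetitor.aspx?id=4350" class="teacher-name"> Полина Игоревна</a>';
print_r(extractTeacherNames($sample));
```

For a real page you would pass the result of fetchPage($url) to extractTeacherNames, after checking that it is not null, and repeat that for each of the 300+ page URLs.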
