How to collect the text inside the links from the HTML code of the page?
Hello!
There is a tutor aggregator site, and I need to collect the names of all the tutors from it.
There are 300+ pages, with 10 tutors per page.
Each tutor's name is inside a link with the class teacher-name.
Example:
<a href="/repetitor.aspx?id=4350" class="teacher-name"> Полина Игоревна</a>
You can use simple_html_dom.php (it parses HTML pages).
You can then get the list of pages from sitemap.xml (I hope the site has a proper one).
Code example (errors are possible, I am writing this without checking the syntax):
require_once($_SERVER["DOCUMENT_ROOT"] . "/parser/simple_html_dom.php");

$sitemap = "http://example.ru/sitemap.xml";
$xml = simplexml_load_string(file_get_contents($sitemap));

// Each <url> entry in the sitemap holds one page address in <loc>
foreach ($xml->url as $entry) {
    $url = (string) $entry->loc;
    $html = file_get_contents($url);
    if ($html === false) {
        continue; // the page could not be downloaded, skip it
    }
    $data = str_get_html($html);
    if (!$data) {
        continue; // the page could not be parsed, skip it
    }
    // array of links with the class teacher-name
    foreach ($data->find('.teacher-name') as $a) {
        echo $a->href . ' ' . trim($a->plaintext) . "\n";
    }
    $data->clear(); // free memory before the next page
}
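If you would rather not pull in a third-party library, the same extraction can probably be done with PHP's built-in DOM extension. Below is a minimal sketch, not a tested solution for your site: the function name extractTeacherNames is made up for illustration, the input is assumed to be UTF-8, and the sample string is just the link from the question.

```php
<?php
// Hypothetical helper: returns the trimmed text of every
// <a class="teacher-name"> link found in an HTML string.
function extractTeacherNames(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog makes DOMDocument read the input as UTF-8 (assumption).
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "teacher-name" as a whole class token, not as a substring.
    $query = '//a[contains(concat(" ", normalize-space(@class), " "), " teacher-name ")]';

    $names = [];
    foreach ($xpath->query($query) as $link) {
        $names[] = trim($link->textContent);
    }
    return $names;
}

// The link from the question, used as sample input.
$sample = '<a href="/repetitor.aspx?id=4350" class="teacher-name"> Полина Игоревна</a>';
print_r(extractTeacherNames($sample));
```

In the loop over the sitemap you would pass the downloaded page into this function instead of calling str_get_html, and merge the returned arrays into one list of names.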