PHP parser. How to make a function to extract text from a given link?

PHP

Roma Kozubiak, 2017-06-07 16:03:59

I am writing a PHP parser that copies all publications from a site and displays them on my own site (this is not content theft; I have the site owner's permission).
I have already written the code that copies the list of publications from the main page (title, photo, and short text). Now I need to parse the full content of each publication, so I have started collecting the links to all publications from the site's main page. Next, I need a function that follows each of these links and parses the content of the post. Please show, with an example, how to parse the text of the page behind each link.

2 answers

Alexander Shapoval, 2017-06-07
@sanek_os9

There is a good library for this, simple_html_dom. Read its documentation; it covers a lot.
An example from my code:

            // Excerpt from a larger script: $file and the replace() helper are defined elsewhere.
            $html = new simple_html_dom();
            $html->load_file($_GET['go']); // URL of the page to parse
            $name = $html->find('h2');
            $description = $html->find('div');
            $video = $html->find('iframe[width=770]');
            // Convert the embedded player URL into a regular YouTube link, if an iframe was found
            $video = isset($video[0]) ? preg_replace('/.*\/embed\/(.+)/i', 'https://www.youtube.com/watch?v=$1', $video[0]->src) : '';
            $description = replace($description[$_GET['id']]->xmltext);
            // Drop everything from the first <!...> marker onward
            $description = preg_replace('/<!.*>.*/is', '', $description);
            $spoilers = $html->find('div.uSpoilerText');
            $spoiler = '';
            foreach ($spoilers as $post) {
                $text = replace($post->xmltext);
                // The spoiler title is stored in a <!--usn(=...)--> comment
                preg_match_all("/<!--usn\(\=(.*)\)-->/i", $text, $title);
                $text = preg_replace('/<!--ust-->/i', '', $text);
                $text = preg_replace('/<!--usn\(\=(.*)\)-->/i', '', $text);
                $text = preg_replace('/<!--\/ust-->/i', '[/spoiler]', $text);
                $spoilerTitle = isset($title[1][0]) ? $title[1][0] : '';
                $spoiler .= '[spoiler title="' . $spoilerTitle . '"]' . $text . "\n\n";
            }

            $description = '[b]Title:[/b] [u]' . $name[0]->plaintext . '[/u] download torrent' . $description . $spoiler . $video;
            //$file->meta_description = $title . ' download via torrent for free in good quality';
            //$file->runame = $name[0]->plaintext;
            $file->description = $description;
        } // closes a block opened earlier in the original script (not shown)
        $groups = groups::load_ini(); // load the array of groups

        $form = new form(new url);
        $form->text('name', __('File name') . ' *', isset($name[0]->plaintext) ? $name[0]->plaintext : $file->runame);
        $form->text('link_name', __('Available at'), $file->name);
        $form->textarea('description', __('Description'), $file->description);
        $form->textarea('description_small', __('Short description'), $file->description_small);

You can ignore the $form part.
An example of a page that I parsed: https://manytorrents.pro/load/films/boeviki/chudo_...
I have not used this script for a long time, so this particular example may no longer work for that page.
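
If you would rather not pull in a third-party library, roughly the same idea can be sketched with PHP's bundled DOM extension. This is only a minimal sketch under assumptions: the function names extractPostText() and fetchPostText() are made up for illustration, and the default XPath '//div[@class="post-content"]' is a hypothetical container that you would replace with whatever element actually wraps the post text on the target site.

```php
<?php
// Sketch: extract the readable text of one publication from its HTML.
// Assumption: the post body sits in the element matched by $xpath
// (the default selector below is hypothetical, adjust it to the real site).

function extractPostText(string $html, string $xpath = '//div[@class="post-content"]'): string
{
    if (trim($html) === '') {
        return '';
    }

    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common trick to make libxml read the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return ''; // container not found (or invalid XPath)
    }

    // textContent drops the tags; collapse the whitespace runs left behind.
    return trim(preg_replace('/\s+/u', ' ', $nodes->item(0)->textContent));
}

// Downloading is kept separate so the extractor can be tried on any HTML string.
function fetchPostText(string $url, string $xpath = '//div[@class="post-content"]'): string
{
    $html = @file_get_contents($url); // needs allow_url_fopen to be enabled
    return $html === false ? '' : extractPostText($html, $xpath);
}
```

Calling fetchPostText() with a publication URL and the right XPath would then return the plain text of that post, or an empty string if the download fails or the container is not found.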

Kirill Gorelov, 2017-06-07
@Kirill-Gorelov

Well, that is all fine.
First, get the list of links that you want to parse and put them into an array.
Then loop over the array and, for each link, parse what you need.
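
The two steps above could look roughly like this. It is a sketch, not a finished parser: collectLinks() and parseAllPosts() are names invented for the example, the default XPath '//a[@href]' grabs every link on the page (you would narrow it to the publication links), and URL resolution is deliberately naive, handling only root-relative paths.

```php
<?php
// Step 1: collect the links from the listing page into an array.
// Assumption: publication links are <a href> elements matched by $linkXpath.
function collectLinks(string $listingHtml, string $baseUrl, string $linkXpath = '//a[@href]'): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $dom->loadHTML('<?xml encoding="UTF-8">' . $listingHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query($linkXpath);
    if ($nodes === false) {
        return []; // invalid XPath expression
    }

    $links = [];
    foreach ($nodes as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // skip empty and in-page anchors
        }
        if ($href[0] === '/') {
            // Naive resolution: only root-relative paths are handled here.
            $href = rtrim($baseUrl, '/') . $href;
        }
        $links[] = $href;
    }
    return array_values(array_unique($links)); // drop duplicates, keep order
}

// Step 2: go through the array and parse each page.
// $fetch downloads a URL; $parsePost turns the page HTML into whatever you need.
function parseAllPosts(array $links, callable $fetch, callable $parsePost): array
{
    $posts = [];
    foreach ($links as $url) {
        $html = $fetch($url);
        if (!is_string($html) || $html === '') {
            continue; // skip pages that failed to download
        }
        $posts[$url] = $parsePost($html);
    }
    return $posts;
}

// Possible usage (the URL is a placeholder):
// $links = collectLinks(file_get_contents('https://example.com/'), 'https://example.com');
// $posts = parseAllPosts($links, fn ($url) => @file_get_contents($url), 'strip_tags');
```

Passing the downloader and the per-post parser in as callables keeps the loop itself trivial and lets you try it on saved HTML before pointing it at the live site.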