How to get only the text from an HTML page?
Good evening.
I need to extract all the text from a page of a site, without relying on tag attributes.
The contents of script and iframe elements must be excluded.
The result should be written to a text file, with line breaks.
I started with the following:
$str = file_get_contents('http://site.com');
$doc = new DOMDocument();
@$doc->loadHTML($str);
$body = $doc->getElementsByTagName('body');
...
Heading
Subheading
Text
Menu
Text
Text
etc.
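// Sample input: visible text mixed with inline tags, an iframe and scripts.
$text=<<<t
hello <b>test</b> <iframe src=javascript>browser is bad!</iframe>
<script>alert('hi');</script>
test2<br>
<script>alert('hi');</script>
test3
t;
// Remove iframe and script elements together with their contents.
// The "s" flag lets "." match newlines, "i" ignores tag-name case.
$text=preg_replace('#<iframe\b.*?</iframe>#isu',"",$text);
$text=preg_replace('#<script\b.*?</script>#isu',"",$text);
// Strip all remaining tags.
$text=preg_replace('#</?[^>]+>#u',"",$text);
// Collapse runs of line breaks (and the whitespace around them) into one.
$text=preg_replace('#\s*\R\s*#u',"\n",$text);
echo "<pre>".$text."</pre>";
/*
hello test
test2
test3
*/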
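As an alternative to regular expressions, the DOMDocument approach started in the question could probably be completed along these lines. This is only a sketch: the function name extractText is made up for illustration, it puts every text node on its own line (so inline tags such as <b> also produce line breaks), and it assumes the page declares its encoding, since loadHTML may otherwise misread non-Latin text.

```php
<?php
// Hypothetical helper (not from the original posts): returns the text of an
// HTML string, one line per text node, skipping <script> and <iframe> content.
function extractText(string $html): string
{
    // Collect parser warnings internally instead of printing them;
    // real-world markup is often malformed.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Text nodes under <body> that have no <script> or <iframe> ancestor.
    $nodes = $xpath->query(
        '//body//text()[not(ancestor::script) and not(ancestor::iframe)]'
    );

    $lines = [];
    foreach ($nodes as $node) {
        // Normalise internal whitespace and drop whitespace-only nodes.
        $line = trim(preg_replace('/\s+/u', ' ', $node->nodeValue));
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    return implode("\n", $lines);
}
```

With this in place, something like file_put_contents('page.txt', extractText(file_get_contents('http://site.com'))); should write the result to a text file (page.txt is an assumed file name). If <style> blocks also need to be dropped, another not(ancestor::style) condition can be added to the XPath expression.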