How to break a web document into semantic blocks?
Good day!
Has anyone dealt with a similar task: how can you use PHP to split an HTML page into blocks (the main content area, the menu, the footer, etc.) without knowing the DOM structure in advance?
I am mostly interested in identifying the main content of the page.
There are materials on this topic online, but I did not quite understand the implementation algorithm.
For example:
habrahabr.ru/post/210824
www.vestnik.vsu.ru/pdf/analiz/2008/02/2008_02_20.pdf
I did exactly this two years ago.
It is actually simple: do a logical subtraction of two pages of the same type that have different content,
for example two different articles or two products (i.e. "leaf" elements of the site's tree structure).
After the subtraction you are left with only the markup that differs between the two pages.
Then sort the resulting blocks by the amount of text in descending order; the first element of the list is the block that contains the content.
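
To make the subtraction idea concrete, here is a rough sketch of how I would read it. It is written in Python with only the standard library to keep it short; the same steps should map onto PHP's DOMDocument. Everything named here (`Node`, `TreeBuilder`, `subtract`, `main_content`, the sample pages) is made up for illustration, and the rule "keep a block whole when nothing inside it also appears on the other page" is my assumption about how to cut the blocks, not something stated in the answer.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
SKIP = {"script", "style"}  # their contents are not page text


class Node:
    def __init__(self, tag, attrs=(), parent=None):
        self.tag, self.attrs, self.parent = tag, tuple(attrs), parent
        self.children = []  # child Node objects and plain text chunks

    def elements(self):
        return [c for c in self.children if isinstance(c, Node)]

    def text(self):
        parts = (c if isinstance(c, str) else c.text() for c in self.children)
        return " ".join(" ".join(parts).split())

    def fingerprint(self):
        # Tag + attributes + full content; equal fingerprints = identical blocks.
        inner = "".join(c if isinstance(c, str) else c.fingerprint()
                        for c in self.children)
        return f"<{self.tag} {self.attrs!r}>{inner}</{self.tag}>"


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.cur
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.cur = node.parent

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self.cur.tag not in SKIP:
            self.cur.children.append(text)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def walk(node):
    yield node
    for child in node.elements():
        yield from walk(child)


def subtract(page_a, page_b):
    """Blocks of page_a that have no identical counterpart in page_b."""
    seen_in_b = {n.fingerprint() for n in walk(parse(page_b))}

    def is_shared(n):
        return bool(n.text()) and n.fingerprint() in seen_in_b

    def changed(node):
        if node.fingerprint() in seen_in_b:
            return []        # identical on both pages: template, drop it
        if not any(is_shared(d) for d in walk(node)):
            return [node]    # nothing inside is shared: keep the whole block
        return [b for child in node.elements() for b in changed(child)]

    return changed(parse(page_a))


def main_content(page_a, page_b):
    # Sort the remaining blocks by amount of text, descending; take the first.
    blocks = sorted(subtract(page_a, page_b),
                    key=lambda n: len(n.text()), reverse=True)
    return blocks[0] if blocks else None


TEMPLATE = """<html><head><title>{title}</title></head><body>
<div id="menu"><a href="/">Home</a><a href="/about">About</a></div>
<div id="content"><h1>{title} article</h1><p>{body}</p></div>
<div id="footer">Copyright</div></body></html>"""
PAGE_A = TEMPLATE.format(title="First", body="Long text of the first article goes here.")
PAGE_B = TEMPLATE.format(title="Second", body="A different text for the second one.")

best = main_content(PAGE_A, PAGE_B)
print(best.tag, dict(best.attrs), best.text())
```

Keep in mind that this only works when both pages really share the same template. A menu that highlights the current page, or a sidebar with per-page recommendations, will also survive the subtraction, which is why the final sort by text length is needed. The fingerprints are recomputed on every call, so for large pages you would want to cache them.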