Python
Daniel Reed, 2016-02-03 11:32:08

How to automatically validate HTML?

I need to validate a large number of HTML files: fix the indentation (or at least not remove the existing indentation), close unclosed tags, and so on.
Among the online services, I found none that close tags; they only fix indentation.
Among programs, there is Tidy, but it behaves strangely and does something completely different: text turns into text.
I also tried html5lib for Python, but nothing good came of that either.

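import html5lib
from html5lib.serializer import HTMLSerializer


def pars(html):
    # Parse the fragment into a DOM tree; the parser closes unclosed tags.
    parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
    dom_tree = parser.parseFragment(html)
    # Walk the tree so it can be serialized back to markup.
    walker = html5lib.getTreeWalker("dom")
    stream = walker(dom_tree)

    # omit_optional_tags=False keeps optional closing tags such as </p> and </li>.
    s = HTMLSerializer(omit_optional_tags=False)
    return "".join(s.serialize(stream))


res = pars("html code")
print(res)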

Are there ready-made solutions for this kind of validation, or at least for closing unclosed tags?
P.S. Chrome seems to close unclosed tags on its own. Can other browsers (Firefox, Opera, IE) also tweak HTML slightly like Chrome does, closing unclosed tags and so on?
Thanks for the replies.

2 answers
Alex, 2016-02-03
@streetflush

Use Jade

sim3x, 2016-02-03
@sim3x

lxml.de/parsing.html#parsing-html + stackoverflow.com/a/9050454
Why lxml.etree.HTMLParser()? Because the markup is broken. Here's a secret: HTMLParser() is just a parser with recover=True.
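
To make the lxml suggestion concrete, here is a minimal sketch of how it might be used to close unclosed tags. It assumes lxml is installed; the function name close_tags is made up for illustration, and the handling of the wrapper element is my own guess at what the asker wants. lxml wraps a fragment in html and body elements, so the sketch serializes only the children of body.

```python
from lxml import etree


def close_tags(html):
    """Re-serialize an HTML fragment so that unclosed tags get closed.

    Hypothetical helper, not part of lxml itself.
    """
    # recover=True tells the parser to keep going on broken markup.
    parser = etree.HTMLParser(recover=True)
    root = etree.fromstring(html, parser)
    if root is None:
        return ""

    # The parser wraps fragments in <html><body>; unwrap them again.
    body = root.find("body")
    if body is None:
        return ""

    # Text that sits directly inside <body>, before the first element.
    parts = [body.text or ""]
    for child in body:
        # tostring() includes the text that follows each element (its tail).
        parts.append(etree.tostring(child, method="html", encoding="unicode"))
    return "".join(parts)


print(close_tags("<p>Hello <b>world"))
print(close_tags("<ul><li>one<li>two</ul>"))
```

If indentation matters too, etree.tostring also accepts pretty_print=True, though how well it indents mixed text and markup may vary, so check the output on a few real files first.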
