How do I split an HTML file into multiple files?
Please help me parse a static HTML file.
The file is rather unusual and, as far as standards go, violates quite a few of them. But that is beside the point for now.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict/EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Parse Me</title>
</head>
<body>
<div id="my_id">
<!-- Something -->
</div>
<div id="my_id">
<!-- Something -->
</div>
<!-- Many <div id="my_id"> -->
<div id="my_id">
<!-- Something -->
</div>
</body>
</html>
I need to pull out every block like this one:
<div id="my_id">
<!-- Something -->
</div>
and then create a new, separate HTML file from each of them:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict/EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Parse Me</title>
</head>
<body>
<div id="my_id">
<!-- Something -->
</div>
</body>
</html>
bs4 (BeautifulSoup) tends to be looked down on around here, but I still like it.
from bs4 import BeautifulSoup
src = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict/EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Parse Me</title>
</head>
<body>
<div id="my_id">
<!-- Something -->
</div>
<div id="my_id">
<!-- Something -->
</div>
<!-- Many <div id="my_id"> -->
<div id="my_id">
<!-- Something -->
</div>
</body>
</html>
"""
template = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict/EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Parse Me</title>
</head>
<body>
{}
</body>
</html>
"""
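# Name the parser explicitly; bs4 warns when it has to guess one.
bs = BeautifulSoup(src, "html.parser")
# An id is supposed to be unique, but find_all still returns every matching div.
divs = bs.find_all("div", {"id": "my_id"})
for div in divs:
    # Wrap each div in the page template; this prints one full document per div.
    print(template.format(div))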
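The snippet above only prints the generated pages, while the question asks for separate files. Here is a minimal sketch of that last step, assuming the same template idea. The names `PAGE`, `split_divs` and the `part_N.html` file naming are hypothetical and not from the original answer, so adjust them to taste.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

from bs4 import BeautifulSoup

# Page skeleton copied from the question; the single {} receives one div.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict/EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Parse Me</title>
</head>
<body>
{}
</body>
</html>
"""


def split_divs(html, out_dir, div_id="my_id"):
    """Write one HTML file per matching div and return the written paths."""
    soup = BeautifulSoup(html, "html.parser")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, div in enumerate(soup.find_all("div", {"id": div_id}), start=1):
        # The part_N.html naming scheme is an assumption for illustration.
        path = out_dir / f"part_{n}.html"
        path.write_text(PAGE.format(div), encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    sample = '<body><div id="my_id">one</div><div id="my_id">two</div></body>'
    with TemporaryDirectory() as tmp:
        for p in split_divs(sample, tmp):
            print(p.name)
```

Note that if one `my_id` div is nested inside another, `find_all` returns both, so the inner one would end up in two files. For the flat layout shown in the question this does not matter.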