Is it possible to save changes in BeautifulSoup?
Given:
A saved copy of a book (in HTML format), downloaded 3-4 years ago. None of the links in the files work (for example, there is a table-of-contents file, and every link in it leads to a site that returns 404). However, the pages themselves are saved, along with all their content.
Needed:
Replace all the broken links with links to the local files. Doing it by hand would take a long time and, to be honest, I'm too lazy. I decided to try parsing the pages and changing the wrong links to the right ones along the way.
I wrote this code in Python 3.5:
from bs4 import BeautifulSoup

file = open(path_to_file, "r+")
soup = BeautifulSoup(file, "lxml")
links = soup.find_all("a")
images = soup.find_all("img")
print(path_to_file)
for link in links:
    # .get() returns None instead of raising KeyError when href is missing
    print(link.get('href'))
for image in images:
    if image.get('src') == "wrong_link":
        image['src'] = "Changed"  # only the in-memory tree is changed
    print(image.get('src'))
file.close()
Found a way to save the changes to the file. I hope it comes in handy for someone.
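from bs4 import BeautifulSoup

# Parse the page, then close the input file before reopening it for writing
with open(path_to_file, "r") as file:
    soup = BeautifulSoup(file, "lxml")
links = soup.find_all("a")
images = soup.find_all("img")
# change link['href'] / image['src'] here, as in the question

# Write the modified tree back over the original file
with open(path_to_file, "w", encoding='utf-8') as file2:
    file2.write(soup.prettify())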
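For the link fixing itself, here is a minimal sketch of one way it could be done. It assumes all saved pages and images sit in one folder next to the page being fixed. The function name `localize_links`, the `local_names` set and the example.com URLs are made up for illustration. It uses the built-in "html.parser" so that it runs with only bs4 installed; "lxml" as in the code above should work the same way, except that it wraps fragments in html and body tags.

```python
import posixpath
from urllib.parse import urlsplit

from bs4 import BeautifulSoup


def localize_links(html, local_names):
    """Point <a href> and <img src> at local files when a saved copy exists.

    `local_names` is a set of file names assumed to sit in the same folder
    as the page being fixed (hypothetical layout, adjust to your copy).
    """
    soup = BeautifulSoup(html, "html.parser")
    for tag_name, attr in (("a", "href"), ("img", "src")):
        for tag in soup.find_all(tag_name):
            url = tag.get(attr)  # None when the attribute is missing
            if not url:
                continue
            parts = urlsplit(url)
            name = posixpath.basename(parts.path)
            # Only touch absolute links whose target was saved locally
            if parts.netloc and name in local_names:
                # Keep "#fragment" so in-page anchors still work
                suffix = "#" + parts.fragment if parts.fragment else ""
                tag[attr] = name + suffix
    # str(soup) serializes without the extra whitespace prettify() adds
    return str(soup)


sample = (
    '<p><a href="http://example.com/book/ch1.html#s2">Chapter 1</a>'
    '<img src="http://example.com/img/cover.png"/>'
    '<a href="http://example.com/other.html">Other</a></p>'
)
fixed = localize_links(sample, {"ch1.html", "cover.png"})
print(fixed)
```

To apply it to a real page, read the file, pass its text through `localize_links`, and write the result back with `open(path_to_file, "w", encoding="utf-8")`. Writing `str(soup)` instead of `soup.prettify()` is a deliberate choice here: `prettify()` inserts newlines and indentation, which can change how inline text is rendered.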