Python
Maxim Britvin, 2016-08-04 13:46:54

Is it possible to save changes in BeautifulSoup?

Given:
A saved copy of a book in HTML format, downloaded 3-4 years ago. None of the links in the files work (for example, there is a table-of-contents file, and every link in it leads to a site that returns 404). However, the pages those links point to were saved along with all their content.
Needed:
Fix all the broken links so that they point to the local files. Doing it by hand would take a long time and, to be honest, I'm too lazy. So I decided to parse these pages and replace the wrong links with the right ones along the way.
I wrote this code in Python 3.5:

from bs4 import BeautifulSoup

file = open(path_to_file, "r+")
soup = BeautifulSoup(file,"lxml")
links = soup.find_all("a")
images = soup.find_all("img")
print(path_to_file)
for link in links:
    print(link['href'])
for image in images:
    if image['src'] == "wrong_link":
        image['src'] = "Changed"
        print(image['src'])
file.close()

When I print to the Python console, everything looks fine: the values do change. However, when I open the file, it is unchanged. Is there a way to save the changes to the same file?
If it matters, I'm on Windows 7.


2 answer(s)
Maxim Britvin, 2016-08-04
@DarkwingDuck48

Found a way to save the changes to the file. I hope someone finds it useful.

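from bs4 import BeautifulSoup

# Read in binary mode so BeautifulSoup can detect the encoding;
# the with-block closes the file before it is overwritten below
with open(path_to_file, "rb") as file:
    soup = BeautifulSoup(file, "lxml")
links = soup.find_all("a")
images = soup.find_all("img")
# ...modify the links/images here...
# Write the modified tree back to the same file as UTF-8
with open(path_to_file, "w", encoding="utf-8") as file2:
    file2.write(soup.prettify())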
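Why the code in the question left the file untouched: BeautifulSoup parses its input into an in-memory tree, and assigning to `image['src']` changes only that tree. Nothing goes back to disk unless you serialize the soup (with `str(soup)` or `soup.prettify()`) and write it out yourself, as above. A small illustration, assuming an inline HTML string instead of a file and the built-in `html.parser` so lxml is not required:

```python
from bs4 import BeautifulSoup

original = '<p><img src="wrong_link"></p>'
soup = BeautifulSoup(original, "html.parser")

# Changing an attribute modifies only the parsed tree held in memory
soup.img["src"] = "Changed"

print(original)   # the source text still contains "wrong_link"
print(str(soup))  # serializing the tree shows the new value
```

The BeautifulSoup documentation warns that `prettify()` adds newlines and indentation, which can change how inline text renders. If you want the saved pages to keep their original layout, writing `str(soup)` is the safer choice.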

sim3x, 2016-08-04
@sim3x

No. You need to write the changes to a new file.
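Following this suggestion, here is a minimal sketch that fixes every page in a folder and writes the results to a separate folder, leaving the originals untouched. Several assumptions apply. `DEAD_PREFIX` is a hypothetical placeholder for the dead site's URL. The local copy is assumed to mirror the site's folder layout, so stripping the prefix gives a valid relative path from the pages. The pages are assumed to be `*.html` files in a single folder. The helpers `to_local`, `fix_file` and `fix_folder` are illustrative names, not part of BeautifulSoup.

```python
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical placeholder: URL prefix of the dead site; replace with the real one
DEAD_PREFIX = "http://example.com/book/"


def to_local(url):
    """Turn a dead absolute URL into a relative local path.

    Assumes the saved copy mirrors the site's folder layout, so stripping
    the prefix yields a valid relative path. Other URLs are returned as-is.
    """
    if url.startswith(DEAD_PREFIX) and len(url) > len(DEAD_PREFIX):
        return url[len(DEAD_PREFIX):]
    return url


def fix_file(src_path, dst_path):
    # Binary mode lets BeautifulSoup detect the page's encoding itself
    with open(src_path, "rb") as f:
        soup = BeautifulSoup(f, "html.parser")
    # Rewrite link targets and image sources that carry the attribute
    for tag, attr in (("a", "href"), ("img", "src")):
        for el in soup.find_all(tag, attrs={attr: True}):
            el[attr] = to_local(el[attr])
    # Write the modified tree to a separate file; the original stays intact
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(str(soup))


def fix_folder(src_dir, dst_dir):
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    # Saved pages may also use .htm; adjust the pattern if needed
    for path in src_dir.glob("*.html"):
        fix_file(path, dst_dir / path.name)
```

For example, `fix_folder("book", "book_fixed")` writes corrected copies into `book_fixed`. Once you have checked a few pages in a browser, you can replace the originals with them.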
