Python
ekzotika, 2020-12-11 17:52:04

How do I find and remove an entire link in a string?

I need to find a particular link in a string and cut it out. Currently I find links like this:

import re

pattern = r'<a rel="(.+?)">'
s = re.findall(pattern, item.content)

Then I loop over the matches:

for string in s:
    ...

But then, if a certain condition is true, I need to remove that particular link from item.content while keeping the link text that is visible on the page. How can I do this?


1 answer(s)
devdb, 2020-12-18

One option (not the only possible solution):

import re

# The pattern has three capturing groups, so re.split keeps each group's text in the result.
pattern = r'(<a rel=")(.+?)(">)'
splitted = re.split(pattern, html_str)
# splitted == ['<html>...', '<a rel="', 'http://site.com/image1.jpg', '">', '<div>...', '<a rel="', 'http://site.com/image2.jpg', '">', ...]
urls = splitted[2::4]
# urls == ['http://site.com/image1.jpg', 'http://site.com/image2.jpg', ...]

Then loop over splitted with a step of 4. If a URL does not satisfy the condition, remove its three items, e.g. ['<a rel="', 'http://site.com/image2.jpg', '">'], from the splitted list or replace them with something (for example, the link name). Replacing rather than deleting keeps the indices stable while you loop.
After cleaning, join the pieces back together:
cleaned_html_str = ''.join(splitted)
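
For reference, here is a minimal end-to-end sketch of the split, filter and join steps. The sample html_str and the should_remove() check are hypothetical stand-ins for item.content and for the asker's real condition. Stripping the closing </a> is also an assumption about the markup, since the pattern only matches the opening tag.

```python
import re

# Hypothetical sample input; the real markup in item.content may differ.
html_str = (
    '<p>intro</p>'
    '<a rel="http://site.com/image1.jpg">first</a>'
    '<p>middle</p>'
    '<a rel="http://site.com/image2.jpg">second</a>'
)


def should_remove(url):
    # Hypothetical condition, standing in for the asker's real check.
    return url.endswith('image2.jpg')


pattern = r'(<a rel=")(.+?)(">)'
splitted = re.split(pattern, html_str)

# Layout: [text, '<a rel="', url, '">', text, '<a rel="', url, '">', text]
for i in range(1, len(splitted) - 1, 4):
    url = splitted[i + 1]
    if should_remove(url):
        # Blank the three tag pieces in place so later indices do not shift.
        splitted[i:i + 3] = ['', '', '']
        # Assumption: the link is closed by the first </a> in the following text.
        splitted[i + 3] = splitted[i + 3].replace('</a>', '', 1)

cleaned_html_str = ''.join(splitted)
print(cleaned_html_str)
```

The second link's tags are blanked out while its visible text ("second") stays in place. For anything beyond simple, regular markup, an HTML parser is usually more robust than a regular expression.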
