Regular Expressions
Vyacheslav, 2015-09-21 11:13:56

How to find sentences containing a URL on a web page?

The task: find, in HTML markup, every sentence that contains at least one URL-like substring.
A URL can have the form aaa.bbb....(/dir/page/?asdf); an expression such as (\S*?\.([a-z.])+(/.*?\s)?) is suitable for matching them.
Whether a match is an actual link or not does not matter, and sentences may contain tags, etc.
I want to understand whether this algorithm can be implemented with regular expressions alone (without additional code in the host language):
I find a URL, for example with the pattern above, then search backward to the nearest "dot + whitespace" combination and forward to the next such combination, and take everything between those two positions as the result.
P.S. I'm using Python, but any compatible engine will do.

1 answer(s)
lyeskin, 2015-09-21
@lyeskin

Write a regular expression for "everything in between", wrap it in parentheses (a capturing group), and place it between the two "dot + whitespace" delimiters:
/\.\s(%your regular expression)\.\s/
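
A minimal Python sketch of this idea, under a few assumptions that are not part of the original posts: the URL pattern below is a simplified stand-in rather than the one from the question, the function name is hypothetical, and the delimiters are written as lookarounds so that the first and last sentences of the text, and sentences that directly follow one another, can also match. Because the literal pattern above consumes the closing "dot + whitespace", a neighbouring sentence could not reuse it as its opening delimiter.

```python
import re

# Assumed URL-like pattern (a simplification, not the asker's original):
# host labels, a dot, a lowercase TLD of 2+ letters, and an optional path.
# The path may not swallow a dot that ends the sentence.
URL = r"[\w-]+(?:\.[\w-]+)*\.[a-z]{2,}(?:/(?:(?!\.(?:\s|$))\S)*)?"

# Any single character that does not start a "dot + whitespace" sentence break.
NO_BREAK = r"(?:(?!\.\s).)"

SENTENCE_WITH_URL = re.compile(
    r"(?:^|(?<=\.\s))"                              # start of text or right after ". "
    r"(" + NO_BREAK + r"*?" + URL + NO_BREAK + r"*?)"  # the sentence, captured
    r"(?:\.(?=\s)|\.?$)",                           # closing dot before whitespace, or end of text
    re.DOTALL,
)


def find_sentences_with_url(text):
    """Return every sentence (split on dot + whitespace) that contains a URL-like substring."""
    return [m.group(1) for m in SENTENCE_WITH_URL.finditer(text)]


if __name__ == "__main__":
    sample = (
        "Intro text here. Visit example.com/docs/?q=1 for details. "
        "Nothing to see here. A mirror lives at <b>site.org</b>."
    )
    for sentence in find_sentences_with_url(sample):
        print(sentence)
```

The whole search happens inside one pattern, as the question asks; the function only collects the captured groups. Abbreviations such as "e.g. " still count as sentence breaks under this delimiter rule.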