Sergey Karbivnichy, 2020-05-09 16:50:05
Regular Expressions

Regex: how do I extract a website address from a string?

There are many lines like this:

www.site.ru      [email protected] +7 926 33-2222-11123 Москва

I need to extract the website address from these lines. It can appear in any of these forms:
https://www.site.ru
http://site.ru (Habr's parser mangles links like this one a little)
www.site.ru
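For comparison, here is a minimal regex sketch that covers all three forms, with both the scheme and the `www.` prefix optional. The pattern, the names `SITE_RE` and `find_sites`, and the e-mail address `info@site.ru` in the example are my own assumptions, not from the thread. It only recognizes domains ending in an alphabetic TLD and ignores any path after the domain.

```python
import re

# Assumed pattern (hypothetical, not from the thread):
# optional http:// or https://, optional "www.", then dot-separated
# labels ending in an alphabetic TLD of at least two letters.
SITE_RE = re.compile(
    r"(?<![@\w.-])"                # don't start inside an e-mail address or another token
    r"(?:https?://)?"              # optional scheme
    r"(?:www\.)?"                  # optional www. prefix
    r"(?:[a-z0-9-]+\.)+[a-z]{2,}"  # domain labels followed by the TLD
    r"(?![@\w-])",                 # don't stop in the middle of a longer token
    re.IGNORECASE,
)


def find_sites(line: str) -> list[str]:
    """Return every website-like substring in the line, in order."""
    return SITE_RE.findall(line)


if __name__ == "__main__":
    # info@site.ru is a hypothetical e-mail address used only for this example.
    print(find_sites("www.site.ru      info@site.ru +7 926 33-2222-11123 Москва"))
    # ['www.site.ru']
```

The lookbehind is what stops the domain part of the e-mail address (`site.ru` after the `@`) from being reported as a site.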

I searched Google in both Russian and English and went through five pages of results; beyond that there is no point. Half of the regexes I found don't work, and 90% of the ones that do only match addresses that start with https or http.

PS: The fields do not always appear in the same order in the file.
PS: I also came up with this idea:
split the string on spaces, call strip() on each piece, then check each element of the list for a dot and the absence of an @ sign. What do you think? In principle I only need to do this once, since the data has already been downloaded and I'm just processing it now. Or is a regex better?
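The split-based idea from the PS can be sketched as a small function. This is a minimal sketch under a few assumptions of mine: that brackets, commas or semicolons may stick to a token (the character set passed to `str.strip` is a guess), and that the first dotted token without an `@` is the site. The names `extract_site`, `extract_sites` and `JUNK`, and the e-mail address `info@site.ru`, are hypothetical.

```python
from typing import Iterable, Optional

# Characters assumed (not stated in the thread) to stick to tokens.
JUNK = "()[],;"


def extract_site(line: str) -> Optional[str]:
    """Return the first token that looks like a site, or None."""
    for token in line.split():      # split() with no argument handles runs of spaces
        token = token.strip(JUNK)   # drop brackets/commas around the token
        # Heuristic from the question: a dot is present, no '@' (which marks an e-mail).
        if "." in token and "@" not in token:
            return token
    return None


def extract_sites(lines: Iterable[str]) -> list[Optional[str]]:
    """Apply extract_site to every line, keeping None where nothing was found."""
    return [extract_site(line) for line in lines]


if __name__ == "__main__":
    sample = [
        "www.site.ru      info@site.ru +7 926 33-2222-11123 Москва",
        "Москва (https://www.site.ru), +7 926 33-2222-11123",
        "+7 926 33-2222-11123 Москва",
    ]
    print(extract_sites(sample))
    # ['www.site.ru', 'https://www.site.ru', None]
```

The weak spot of this heuristic is that any dotted token passes, for example a decimal number like `3.5` or a file name, so it relies on the data looking like the sample line.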

1 answer
Sergey Karbivnichy, 2020-05-09
@hottabxp

I spent 2.5 hours on a regex and got nowhere. Then I thought about it logically and threw this together in a minute:

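line = 'www.site.ru      [email protected] +7 926 33-2222-11123 Москва'

contacts = line.split()

www = None  # stays None if no token looks like a site
for contact in contacts:
    # A site contains a dot but, unlike an e-mail address, no '@'
    if '.' in contact and '@' not in contact:
        www = contact
        break  # stop at the first match
print(www)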

It ran on more than 1,000 lines without a single miss! Yes, it's just like in that joke about regexes: now you have two problems.
