What Python tools can parse an array of strings and extract the most frequently used mask?

Python

If-so-girl1, 2014-10-10 13:21:58

I need to parse an array of URLs and somehow extract the most frequently used link mask. For example, I have the following URLs:
lenta.ru/articles/2014/10/08/mosclassicgp
lenta.ru/photo/2014/10/07/longway
lenta.ru/photo/2014/10/03/misstuning
lenta.ru/photo/2014/08/27/nivajpg
lenta.ru/photo/2014/02/18/dynamic
lenta.ru/news/2014/10/08/nsxprice
lenta.ru/autosport
Looking at them by eye, the most frequently used mask is lenta.ru/photo.
I would like to get the same result automatically. Maybe there are libraries for this or, failing that, some kind of algorithm?

1 answer(s)

Valentine, 2014-10-10
@vvpoloskin

You are unlikely to find specific libraries, but the algorithm is extremely simple:

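# Criteria
def isdigits(s):
    # True if the segment is non-empty and consists only of digits
    return s.isdigit()

def istext(s):
    # Some logic (left open in this answer). Placeholder assumption:
    # any non-empty segment that is not purely numeric counts as text.
    return bool(s) and not isdigits(s)

# Shape of a token: (type, value, length of value)
token = ("type_of_token", "value_of_token", len("value_of_token"))

def process_link(link):
    tokenlist = []
    # strip() drops stray whitespace around the link before splitting
    for i in link.strip().split('/'):
        if isdigits(i):
            tokenlist.append(("digit", i, len(i)))
        elif istext(i):
            tokenlist.append(("text", i, len(i)))
    return tokenlist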

All that remains is to build the token list for each link and count how often similar token patterns occur.
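The counting step could look roughly like the sketch below. It is only one possible reading of "similar", not a definitive implementation. The helper names (tokenize, mask_of, most_common_mask), the literal_depth parameter, and the <digit:N>/<text> placeholder notation are all assumptions made for illustration. The first two segments (host and section) are kept verbatim, and every later segment is replaced by its token type, so links that differ only in date and slug collapse into one mask.

```python
from collections import Counter

def tokenize(link):
    """Split a link into (type, value, length) tokens, one per path segment."""
    tokens = []
    for segment in link.strip().split("/"):
        segment = segment.strip()
        if not segment:
            continue  # skip empty segments, e.g. from a trailing slash
        kind = "digit" if segment.isdigit() else "text"
        tokens.append((kind, segment, len(segment)))
    return tokens

def mask_of(link, literal_depth=2):
    """Keep the first `literal_depth` segments verbatim (an assumption),
    and replace each later segment by a placeholder for its token type."""
    parts = []
    for position, (kind, value, length) in enumerate(tokenize(link)):
        if position < literal_depth:
            parts.append(value)
        elif kind == "digit":
            parts.append("<digit:%d>" % length)
        else:
            parts.append("<text>")
    return "/".join(parts)

def most_common_mask(links, literal_depth=2):
    """Return (mask, count) for the mask shared by the most links."""
    counts = Counter(mask_of(link, literal_depth) for link in links)
    return counts.most_common(1)[0]

links = [
    "lenta.ru/articles/2014/10/08/mosclassicgp",
    "lenta.ru/photo/2014/10/07/longway",
    "lenta.ru/photo/2014/10/03/misstuning",
    "lenta.ru/photo/2014/08/27/nivajpg",
    "lenta.ru/photo/2014/02/18/dynamic",
    "lenta.ru/news/2014/10/08/nsxprice",
    "lenta.ru/autosport",
]

mask, count = most_common_mask(links)
print(mask, count)
# prints: lenta.ru/photo/<digit:4>/<digit:2>/<digit:2>/<text> 4
```

Changing literal_depth changes how coarse the mask is: with literal_depth=1 only the host is kept verbatim, so the articles, photo and news links would all fall under the same mask.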