Python
Albion26, 2020-06-22 00:19:20

How to extract UTM tags from a URL when they are not always present?

The essence of the problem: I have a large list of URLs.
Example:

list_url = [{'id': '7a8809acc2b249b7a868a49b89793cc9',
'url': 'https://mysite.com/utm_source=facebook&utm_medium=cpc'},
{'id': '7a8809acc2b249b7a868a49b89793cc4',
'url': 'https://mysite.com/contacts'}]

I iterate over the list and, for each entry, write id, url, source, and medium into a separate dictionary if UTM tags are specified. I extract the tags with regular expressions:
'url': list_url['url'],
'source' : re.findall('(?<=utm_source=).*(?=&utm_medium)',list_url['url']),
'medium' : re.findall('(?<=utm_medium=).*(?=&utm_campaign)',list_url['url'])

Because findall is used for source and medium, a list is written for each of them. To get the value I can take the element at index 0, but then every URL that has no source or medium value is discarded.
How can I write a condition that replaces empty list values with None?
As a result, I should get the following:
[{'id': '7a8809acc2b249b7a868a49b89793cc9',
'url': 'https://mysite.com/utm_source=facebook&utm_medium=cpc',
'source': 'facebook',
'medium' : 'cpc'},

{'id': '7a8809acc2b249b7a868a49b89793cc4',
'url': 'https://mysite.com/contacts',
'source': None,
'medium': None}]


1 answer(s)
soremix, 2020-06-22
@Albion26

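import re

def parse(mark, url):
    # Match utm_<mark>=<value>, stopping at the next '&' or at the end of the URL.
    result = re.search(r'utm_{}=(.+?)(&|$)'.format(mark), url)
    if result:
        return result.group(1)
    return None  # the tag is missing from this URL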

'source' : parse('source', list_url['url'])
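
A minimal sketch of how the helper could be applied to the whole list from the question. The loop variable `item` and the name `result` are my own and do not appear in the question; the data is the two-entry example from the question.

```python
import re

def parse(mark, url):
    # Same helper as above: return the utm_<mark> value, or None if it is absent.
    match = re.search(r'utm_{}=(.+?)(&|$)'.format(mark), url)
    return match.group(1) if match else None

list_url = [
    {'id': '7a8809acc2b249b7a868a49b89793cc9',
     'url': 'https://mysite.com/utm_source=facebook&utm_medium=cpc'},
    {'id': '7a8809acc2b249b7a868a49b89793cc4',
     'url': 'https://mysite.com/contacts'},
]

# Build one dictionary per entry; missing tags become None instead of
# an empty list, so no entry is dropped.
result = [
    {
        'id': item['id'],
        'url': item['url'],
        'source': parse('source', item['url']),
        'medium': parse('medium', item['url']),
    }
    for item in list_url
]
```

Unlike the patterns in the question, this one does not require another tag to follow the value, so utm_medium is found even when it is the last parameter. If the URLs had a regular query string after a `?`, `urllib.parse.parse_qs` would probably be the cleaner choice, but the example URL has no `?`, so a regex search is used here.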
