A
A
Andrew2021-11-18 21:52:02
Python
Andrew, 2021-11-18 21:52:02

How to filter a list by a given condition?

There is a list of elements (language-country).

['en-us', 'en-mc', 'en-gb', 'en-im', 'en-je', 'en-vg', 'en-ie', 'en-lu', 'sv-se', 'en-by', 'en-md', 'en-al', 'en-xk', 'en-me', 'fr-fr', 'fr-bl', 'fr-ch', 'es-es', 'it-it', 'it-sm', 'pt-pt', 'de-de', 'de-at', 'de-li', 'de-ch', 'nl-nl', 'nl-be', 'en-no', 'en-sj', 'en-fi', 'en-ax', 'en-dk', 'en-gl', 'en-is', 'ru-ru', 'pl-pl', 'bg-bg', 'cs-cz', 'el-gr', 'hu-hu', 'lt-lt', 'ro-ro', 'sk-sk', 'uk-ua', 'en-lv', 'en-rs', 'en-si', 'en-ba', 'en-cy', 'en-ee', 'en-hr', 'en-mk', 'en-mt', 'en-ph', 'en-mm', 'en-kh', 'en-mn', 'en-kz', 'en-la', 'en-za', 'en-ck', 'fr-ca', 'en-au', 'en-nz', 'es-ar', 'es-gt', 'es-do', 'es-hn', 'es-ni', 'es-pa', 'es-ec', 'es-py', 'es-ve', 'en-ae', 'en-lb', 'en-il', 'en-pk', 'id-id', 'tr-tr', 'ko-kr', 'th-th', 'en-ca', 'es-co', 'en-sg', 'zh-hk', 'zh-cn', 'en-in', 'en-bd', 'en-lk', 'en-np', 'en-mv', 'pt-br', 'es-pe', 'en-hk', 'ar', 'es-mx', 'ja-jp', 'en-my', 'vi-vn', 'zh-tw', 'en-se']


How to elegantly filter it by language, leaving unique language pairs. Suppose there are many elements in the list with the English language (en-us, en-lu, en-ca, etc.), it is necessary to leave only one en (it doesn’t matter which country, let’s say us) and so on, get rid of the rest of the duplicate languages ​​(for example de)...

Answer the question

In order to leave comments, you need to log in

2 answer(s)
V
Vladimir Kuts, 2021-11-19
@fox_12

langs = ['en-us', 'en-mc', 'en-gb', 'en-im', 'en-je', 'en-vg', 'en-ie', 'en-lu', 'sv-se', 'en-by', 'en-md', 'en-al', 'en-xk', 'en-me', 'fr-fr', 'fr-bl', 'fr-ch', 'es-es', 'it-it', 'it-sm', 'pt-pt', 'de-de', 'de-at', 'de-li', 'de-ch', 'nl-nl', 'nl-be', 'en-no', 'en-sj', 'en-fi', 'en-ax', 'en-dk', 'en-gl', 'en-is', 'ru-ru', 'pl-pl', 'bg-bg', 'cs-cz', 'el-gr', 'hu-hu', 'lt-lt', 'ro-ro', 'sk-sk', 'uk-ua', 'en-lv', 'en-rs', 'en-si', 'en-ba', 'en-cy', 'en-ee', 'en-hr', 'en-mk', 'en-mt', 'en-ph', 'en-mm', 'en-kh', 'en-mn', 'en-kz', 'en-la', 'en-za', 'en-ck', 'fr-ca', 'en-au', 'en-nz', 'es-ar', 'es-gt', 'es-do', 'es-hn', 'es-ni', 'es-pa', 'es-ec', 'es-py', 'es-ve', 'en-ae', 'en-lb', 'en-il', 'en-pk', 'id-id', 'tr-tr', 'ko-kr', 'th-th', 'en-ca', 'es-co', 'en-sg', 'zh-hk', 'zh-cn', 'en-in', 'en-bd', 'en-lk', 'en-np', 'en-mv', 'pt-br', 'es-pe', 'en-hk', 'ar', 'es-mx', 'ja-jp', 'en-my', 'vi-vn', 'zh-tw', 'en-se']

out_langs = []
for lang in langs:
    if not lang.split('-')[0] in [x.split('-')[0] for x in out_langs]:
        out_langs.append(lang)
print(out_langs)
# ['en-us', 'sv-se', 'fr-fr', 'es-es', 'it-it', 'pt-pt', 'de-de', 'nl-nl', 'ru-ru', 'pl-pl', 'bg-bg', 'cs-cz', 'el-gr', 'hu-hu', 'lt-lt', 'ro-ro', 'sk-sk', 'uk-ua', 'id-id', 'tr-tr', 'ko-kr', 'th-th', 'zh-hk', 'ar', 'ja-jp', 'vi-vn']

or so a bit more optimal:
langs = ...

out_langs = []
current_lang = None
for lang in sorted(langs):
    if current_lang != lang.split('-')[0]:
        out_langs.append(lang)
        current_lang = lang.split('-')[0]
print(out_langs)
# ['ar', 'bg-bg', 'cs-cz', 'de-at', 'el-gr', 'en-ae', 'es-ar', 'fr-bl', 'hu-hu', 'id-id', 'it-it', 'ja-jp', 'ko-kr', 'lt-lt', 'nl-be', 'pl-pl', 'pt-br', 'ro-ro', 'ru-ru', 'sk-sk', 'sv-se', 'th-th', 'tr-tr', 'uk-ua', 'vi-vn', 'zh-cn']

V
Vindicar, 2021-11-18
@Vindicar

langs = ['en-us', 'en-mc', 'en-gb', 'en-im', 'en-je', 'en-vg', 'en-ie', 'en-lu', 'sv-se', 'en-by', 'en-md', 'en-al', 'en-xk', 'en-me', 'fr-fr', 'fr-bl', 'fr-ch', 'es-es', 'it-it', 'it-sm', 'pt-pt', 'de-de', 'de-at', 'de-li', 'de-ch', 'nl-nl', 'nl-be', 'en-no', 'en-sj', 'en-fi', 'en-ax', 'en-dk', 'en-gl', 'en-is', 'ru-ru', 'pl-pl', 'bg-bg', 'cs-cz', 'el-gr', 'hu-hu', 'lt-lt', 'ro-ro', 'sk-sk', 'uk-ua', 'en-lv', 'en-rs', 'en-si', 'en-ba', 'en-cy', 'en-ee', 'en-hr', 'en-mk', 'en-mt', 'en-ph', 'en-mm', 'en-kh', 'en-mn', 'en-kz', 'en-la', 'en-za', 'en-ck', 'fr-ca', 'en-au', 'en-nz', 'es-ar', 'es-gt', 'es-do', 'es-hn', 'es-ni', 'es-pa', 'es-ec', 'es-py', 'es-ve', 'en-ae', 'en-lb', 'en-il', 'en-pk', 'id-id', 'tr-tr', 'ko-kr', 'th-th', 'en-ca', 'es-co', 'en-sg', 'zh-hk', 'zh-cn', 'en-in', 'en-bd', 'en-lk', 'en-np', 'en-mv', 'pt-br', 'es-pe', 'en-hk', 'ar', 'es-mx', 'ja-jp', 'en-my', 'vi-vn', 'zh-tw', 'en-se']

lang_pairs = { l.partition('-')[0]:l for l in langs }

This code will leave the last occurrence of the main language in langs. If you need otherwise, sort the langs list, or redefine the desired entries separately.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question