How to sort different phone numbers?
Hello!
I have a question: there is a .txt file with more than 10,000 lines.
The file contains data in the form:
First Name Last Name : 899999999- phone number
I can't suggest a library for parsing numerals written as words in Russian (or in any other language), but have you considered making the parser simpler?
Something like:
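import re

# Russian digit words mapped to the digits they stand for.
REPLACEMENT = {
    'ноль': '0',
    'один': '1',
    'два': '2',
    'три': '3',
    'четыре': '4',
    'пять': '5',
    'шесть': '6',
    'семь': '7',
    'восемь': '8',
    'девять': '9',
}

# An optional leading "+" followed by 10 or 11 digits, and nothing else.
PHONE_REGEX = re.compile(r'\+?\d{10,11}')


def parse_phones(file_path):
    parsed = []
    unparsed = []
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            parts = line.split(':')
            # A line without a colon has no phone field to parse.
            if len(parts) < 2:
                unparsed.append(line)
                continue
            name = parts[0].strip()
            phone = parts[1].strip().lower()
            # Turn digits written as words into digit characters.
            for key, value in REPLACEMENT.items():
                phone = phone.replace(key, value)
            # Keep only digits and "+" (drops spaces, brackets, dashes, text).
            phone = re.sub(r'[^\d+]', '', phone)
            # fullmatch rejects values with extra digits or a misplaced "+".
            if PHONE_REGEX.fullmatch(phone):
                # Normalize to +7XXXXXXXXXX: keep the last 10 digits,
                # which drops a leading 7 or 8 if there is one.
                phone = '+7' + phone.lstrip('+')[-10:]
                parsed.append((name, phone))
            else:
                unparsed.append(line)
    return parsed, unparsed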
1. Filter out everything that is definitely not a phone number (text such as "no phone", etc.).
2. Convert digits written as words into digits.
3. Keep only digits and "+" in the text.
If the dataset guarantees that digits appear only in phone numbers (there are no IP addresses, postal codes, passport data, etc.), then it should work.
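To see the three steps on their own, here is a minimal sketch that works on a list of lines instead of a file. The names DIGIT_WORDS, normalize_phone and sort_lines are made up for this example, and it assumes Russian numbers that should end up as +7 followed by 10 digits; the sample lines in the usage note are invented too.

```python
import re

# Hypothetical table for this sketch: Russian digit words and their digits.
DIGIT_WORDS = {
    'ноль': '0', 'один': '1', 'два': '2', 'три': '3', 'четыре': '4',
    'пять': '5', 'шесть': '6', 'семь': '7', 'восемь': '8', 'девять': '9',
}


def normalize_phone(text):
    """Return the phone as '+7XXXXXXXXXX', or None if it is not a phone."""
    text = text.lower()
    # Step 2: turn digits written as words into digit characters.
    for word, digit in DIGIT_WORDS.items():
        text = text.replace(word, digit)
    # Step 3: keep only ASCII digits and "+".
    cleaned = re.sub(r'[^0-9+]', '', text)
    # Step 1: whatever is not 10-11 digits with an optional "+" is rejected.
    if not re.fullmatch(r'\+?[0-9]{10,11}', cleaned):
        return None
    # Assumption: a leading 7 or 8 is the country/trunk prefix, so only
    # the last 10 digits are kept.
    return '+7' + cleaned.lstrip('+')[-10:]


def sort_lines(lines):
    """Split 'Name : phone' lines into sorted (name, phone) pairs and rejects."""
    parsed, unparsed = [], []
    for line in lines:
        name, _, rest = line.partition(':')
        phone = normalize_phone(rest)
        if phone is None:
            unparsed.append(line)
        else:
            parsed.append((name.strip(), phone))
    # Normalized numbers have the same length, so string order is enough.
    parsed.sort(key=lambda item: item[1])
    return parsed, unparsed
```

For example, sort_lines(['Ivan Petrov : 8 (999) 123-45-67', 'Anna Sidorova : no phone']) puts the first line into the parsed list as ('Ivan Petrov', '+79991234567') and leaves the second one in the unparsed list, which you can then review by hand.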