How to implement a loop for deleting words from a string in a dataframe?
Good afternoon.
I'm trying to write a loop that removes certain words from a string; perhaps this can be done much more simply.
I have a list of settlements with inconsistencies: "е" is written instead of the letter "ё", or "village" instead of "city". I'm trying to create a column that contains only the name, with all the 'extra' words deleted. As a result, nothing changes.
wordlist = ['поселок','посёлок','городской','городского','типа','деревня']
import re
def locality_id(row):
    name_id = row['locality_name']
    if name_id in wordlist:
        name_id = re.sub('(' + '|'.join(wordlist) + ')','',name_id)
        return name_id
    else:
        return name_id
1. How is the locality written in locality_name? If it is written as, say, 'village Bearish', the function will not process it, because the if checks whether the whole string matches a list element, not whether it contains the word 'village'. IMHO it's easier to skip the additional check and process everything at once.
2. You need to remove the space(s) left after the deleted word(s): either strip them with lstrip or add a space after the words in the regexp.
3. Add the capitalized variants as well: Village, City, Township.
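To illustrate point 1, here is a small sketch of the difference between list membership and a per-word containment check. The sample value 'посёлок Медвежий' is only an assumed example of how a cell might look; the real data may differ.

```python
wordlist = ['поселок', 'посёлок', 'городской', 'городского', 'типа', 'деревня']
name = 'посёлок Медвежий'  # hypothetical sample value, not taken from the real data

# List membership compares the whole string with each list element,
# so a string that merely contains one of the words does not qualify.
whole_match = name in wordlist

# A containment check looks for each listed word inside the string.
contains_word = any(word in name for word in wordlist)

print(whole_match, contains_word)
```

This is why the original function always falls into the else branch and returns the string unchanged.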
import re
wordlist = ['Посёлок','Поселок','поселок','посёлок','городской','городского','типа','деревня','Деревня']
def locality_id(row):
    name_id = row['locality_name']
    # remove every listed word, then strip the leading space(s) left behind
    name_id = re.sub('(' + '|'.join(wordlist) + ')','',name_id).lstrip()
    return name_id
for idx, row in df1.iterrows():
    print('cell=', df1.loc[idx, 'locality_name'])
    new_cell = locality_id(row)  # compute the cleaned value before assigning it
    df1.loc[idx, 'locality_name'] = new_cell
    print('new_cell=', df1.loc[idx, 'locality_name'])
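As a possible variation on the code above, the capitalized variants and the leftover spaces can be handled in the pattern itself. This is only a sketch: the name strip_type_words and the sample strings are hypothetical, and the word boundaries (\b) are an added assumption so that a listed word is not removed from the middle of a longer word.

```python
import re

# Words to remove; taken from the question, extend as needed.
wordlist = ['поселок', 'посёлок', 'городской', 'городского', 'типа', 'деревня']

# \b restricts matches to whole words; IGNORECASE also covers 'Посёлок', 'Деревня', etc.
pattern = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, wordlist)) + r')\b',
    flags=re.IGNORECASE,
)

def strip_type_words(name):
    """Remove the listed words and collapse the whitespace they leave behind."""
    cleaned = pattern.sub(' ', name)
    return ' '.join(cleaned.split())

# Hypothetical sample values
print(strip_type_words('посёлок городского типа Янино-1'))
print(strip_type_words('Деревня Кудрово'))
```

Assuming the same df1 as above, the function could then be applied to the whole column without an explicit loop, for example df1['locality_name'].apply(strip_type_words), with the result assigned to a new column so the original values are kept.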