Answer the question
In order to leave comments, you need to log in
What is the correct way to use the and (&) operator in pandas?
I have a lot of data frames (in the example there are several), I need to check each of them for the presence of keywords in the first column. To get the desired date set, I have to get two (or more) matches, in a normal situation this is solved through the 'and' operator, but it does not work in pandas (I know that in pandas the 'and' operator is denoted as '&').
At the same time, the 'or' operator ('|') works great. But the problem is that '|' at least one match is enough to fulfill the condition, but I need several.
I understand that with the '&' operator both conditions will be checked, which will be applied to each row in the data frame, I would like to know if it is possible to apply the conditions to the whole column and not row by row ?
import pandas as pd
import numpy as np
import re
df_16 = pd.DataFrame({"number": ["onE: ", " Two:", np.nan, "fIve", "[email protected]", "tHree", " one", "tWo", "tWo SIx", "one"], 'phone': [11, 22, 33, 44, 55, 300, 77, 1000, 1, 5],
"item": ["debt", "phone", "room", "go", "hoot", "aaple", "orange", "phone", "tool", 'hoooo'],
"seria": [1050.34344343, 150, 544, 245, 230, 50, 300, 70, 33, 100]},
index=['q', 'e', 'r', 't', 'u', 'y', 'j', 'v', 'm', 'g'])
df_17 = pd.DataFrame({"number": ["onE#", "tWo:", "six ", " fiVE ", "four", "thRee:", np.nan, "twO", "two Six", "Two fOr"], 'phone': [11, 22, 33, 44, 55, 300, 77, 1000, 1, 5],
"item": ["debt", "phone", "room", "go", "hoot", "aaple", "orange", "phone", "tool", 'hoooo'],
"seria": [1050.34344343, 160, 544, 245, 230, 50, 300, 70, 33, 100]},
index=['q', 'e', 'r', 't', 'u', 'y', 'j', 'v', 'm', 'g'])
df_18 = pd.DataFrame({"number": ["gggg", "TTt", "HHHH ", " HJkl ", "green", "redd", np.nan, "twO", "nuul", "Null"], 'phone': [11, 22, 33, 44, 55, 300, 77, 1000, 1, 5],
"item": ["debt", "phone", "room", "go", "hoot", "aaple", "orange", "phone", "tool", 'hoooo'],
"seria": [1050.34344343, 160, 544, 245, 230, 50, 300, 70, 33, 100]},
index=['q', 'e', 'r', 't', 'u', 'y', 'j', 'v', 'm', 'g'])
list_data = [df_16, df_17]
for i in list_data:
i['number'] = i['number'].fillna('empty') # заполняет nan значения
i['number'] = i['number'].agg([lambda do: re.sub(r'[#@:]', ' ', do)], axis=0) # убирает ненужные символы
i['number'] = i['number'].agg([lambda do: re.sub(r'\s+', ' ', do)], axis=0) # убирает двойные (или больше) пробелы
# переводит строк в нижний регистр, убирает пробелы (которые остались), поиск совпадений
check = (i['number'].astype(str).str.lower().str.strip().str.fullmatch('two (six|for)') & i['number'].astype(str).str.lower().str.strip().str.fullmatch('six'))
if check.any():
getting = i
print(getting)
print(check)
import pandas as pd
import numpy as np
import re
df_16 = pd.DataFrame({"number": ["onE: ", " Two:", np.nan, "fIve", "[email protected]", "tHree", " one", "tWo", "tWo SIx", "one"], 'phone': [11, 22, 33, 44, 55, 300, 77, 1000, 1, 5],
"item": ["debt", "phone", "room", "go", "hoot", "aaple", "orange", "phone", "tool", 'hoooo'],
"seria": [1050.34344343, 150, 544, 245, 230, 50, 300, 70, 33, 100]},
index=['q', 'e', 'r', 't', 'u', 'y', 'j', 'v', 'm', 'g'])
df_17 = pd.DataFrame({"number": ["onE#", "tWo:", "six ", " fiVE ", "four", "thRee:", np.nan, "twO", "two Six", "Two fOr"], 'phone': [11, 22, 33, 44, 55, 300, 77, 1000, 1, 5],
"item": ["debt", "phone", "room", "go", "hoot", "aaple", "orange", "phone", "tool", 'hoooo'],
"seria": [1050.34344343, 160, 544, 245, 230, 50, 300, 70, 33, 100]},
index=['q', 'e', 'r', 't', 'u', 'y', 'j', 'v', 'm', 'g'])
df_18 = pd.DataFrame({"number": ["gggg", "TTt", "HHHH ", " HJkl ", "green", "redd", np.nan, "twO", "nuul", "Null"], 'phone': [11, 22, 33, 44, 55, 300, 77, 1000, 1, 5],
"item": ["debt", "phone", "room", "go", "hoot", "aaple", "orange", "phone", "tool", 'hoooo'],
"seria": [1050.34344343, 160, 544, 245, 230, 50, 300, 70, 33, 100]},
index=['q', 'e', 'r', 't', 'u', 'y', 'j', 'v', 'm', 'g'])
list_data = [df_16, df_17]
for i in list_data:
i['number'] = i['number'].fillna('empty') # заполняет nan значения
i['number'] = i['number'].agg([lambda do: re.sub(r'[#@:]', ' ', do)], axis=0) # убирает ненужные символы
i['number'] = i['number'].agg([lambda do: re.sub(r'\s+', ' ', do)], axis=0) # убирает двойные (или больше) пробелы
check = i['number'].astype(str).str.lower().str.strip().str.fullmatch('two (six|for)') # Проверка 1
check_1 = i['number'].astype(str).str.lower().str.strip().str.fullmatch('six') # Проверка 2
if check.any() and check_1.any():
print(check)
print(check_1)
getting = i
print(getting)
Answer the question
In order to leave comments, you need to log in
It is not entirely clear why the checks have different conditions, if the condition required to find more than 1 match of the condition, and not several conditions?
In general, check returns an array of boolean values, i.e. the number of matches can be found simply through sum()
check = (i['number'].astype(str).str.lower().str.strip().str.fullmatch('two (six|for)')
print('совпадений:', sum(check))
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question