Python
pintel, 2020-06-21 14:57:03

pandas: how to find values in a column of lists?

There are two DataFrames:

import pandas as pd

df1 = pd.DataFrame({'id': [24, 75, 32, 89]})
df2 = pd.DataFrame({'id': [4, 87, 145, 99, 175],
                    'lst': [[24, 56, 78], [24, 32, 89, 54, 127], [67], [78, 89, 34], [12, 45]]})

How can I find all rows of df2 whose lst list contains at least one value from df1['id']?

A solution with loops works, but it is too slow:
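# DataFrame.append() was removed in pandas 2.0, so collect the rows in a list first
rows = []
for i in df2.itertuples():
    for j in df1.itertuples():
        if j[1] in i[2]:  # i[1] is df2's id, i[2] its lst; j[1] is the id from df1
            rows.append({'id': i[1], 'lst': i[2]})
            break  # one match is enough; otherwise row 87 would be added three times
df3 = pd.DataFrame(rows)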
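
A sketch of a faster variant (not from the original post): most of the cost comes from the nested Python loop and from building the result row by row. Assuming all you need is a filter, the ids can go into a Python set once, and each list is checked with set.isdisjoint, which stops at the first common value. The names ids and mask are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [24, 75, 32, 89]})
df2 = pd.DataFrame({'id': [4, 87, 145, 99, 175],
                    'lst': [[24, 56, 78], [24, 32, 89, 54, 127], [67], [78, 89, 34], [12, 45]]})

# Build the lookup set once; membership tests on a set are O(1) on average.
ids = set(df1['id'].tolist())

# True for each row whose list shares at least one value with df1['id'].
mask = df2['lst'].map(lambda values: not ids.isdisjoint(values))

# Keep the matching rows and renumber the index 0..k-1, as in the desired result.
df3 = df2[mask].reset_index(drop=True)
print(df3)
```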


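The isin() method does not work in this case:
for j in df1.itertuples():
    # fails: isin() needs a list-like, not a single value (TypeError),
    # and it checks the 'id' column rather than the lists in 'lst';
    # df3 would also be overwritten on every iteration
    df3 = df2[df2['id'].isin(j[1])]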
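
isin() does become usable once every list element sits in its own row. A minimal sketch, assuming pandas 0.25 or newer (where Series.explode exists): explode lst, call isin on the scalar elements, then collapse back to one flag per original row with a groupby on the repeated index.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [24, 75, 32, 89]})
df2 = pd.DataFrame({'id': [4, 87, 145, 99, 175],
                    'lst': [[24, 56, 78], [24, 32, 89, 54, 127], [67], [78, 89, 34], [12, 45]]})

# explode() puts each list element on its own row; the original index is repeated,
# so every element still knows which df2 row it came from.
exploded = df2['lst'].explode()

# Now isin() compares scalars against the ids.
hits = exploded.isin(df1['id'].tolist())

# Collapse back to one flag per original row: True if any element matched.
mask = hits.groupby(level=0).any()

df3 = df2[mask].reset_index(drop=True)
print(df3)
```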


I need this result:
   id                    lst
0   4           [24, 56, 78]
1  87  [24, 32, 89, 54, 127]
2  99           [78, 89, 34]

Alan Gibizov, 2020-06-21
@pintel

Here's a quick back-of-the-napkin sketch:

import pandas as pd

df1 = pd.DataFrame({'id': [24, 75, 32, 89]})
df2 = pd.DataFrame({'id': [4, 87, 145, 99, 175],
                    'lst': [[24, 56, 78], [24, 32, 89, 54, 127], [67], [78, 89, 34], [12, 45]]})

f1 = set(df1['id'])          # the values we are looking for
df22 = df2.set_index('id')

for ident in df22.index:
    f2 = set(df22.loc[ident, 'lst'])
    result = sorted(f1 & f2)  # values common to this row's list and df1['id']
    if result:
        print(ident, result)
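
The same set-intersection idea can produce the requested DataFrame instead of printed lines. In this sketch the extra found column is a hypothetical addition that keeps the matched ids, mirroring what the loop above prints; drop it if only id and lst are needed.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [24, 75, 32, 89]})
df2 = pd.DataFrame({'id': [4, 87, 145, 99, 175],
                    'lst': [[24, 56, 78], [24, 32, 89, 54, 127], [67], [78, 89, 34], [12, 45]]})

ids = set(df1['id'].tolist())

# For every row, the sorted list of values it shares with df1['id'] (empty if none).
found = df2['lst'].map(lambda values: sorted(ids.intersection(values)))

# Keep only rows with a non-empty intersection; 'found' is the hypothetical extra column.
matched = df2.assign(found=found)[found.map(len) > 0]
df3 = matched.reset_index(drop=True)
print(df3)
```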

Dmitry, 2020-06-21
@LazyTalent

However you slice it, any solution that compares every id against every list element is O(n^2), so to make this as fast as possible you need to do everything with pandas' own operations. With df2 structured like this (lists inside the cells), that is unlikely to work, so think about how to transform the original dataframes.
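
One way to read the suggestion to transform the dataframes (an interpretation, not code from the answer): reshape df2 into long format with one row per list element, after which an ordinary inner merge with df1 does the matching. This sketch assumes df2['id'] is unique and that no list is empty (an empty list explodes to NaN, which would break the astype('int64') cast).

```python
import pandas as pd

df1 = pd.DataFrame({'id': [24, 75, 32, 89]})
df2 = pd.DataFrame({'id': [4, 87, 145, 99, 175],
                    'lst': [[24, 56, 78], [24, 32, 89, 54, 127], [67], [78, 89, 34], [12, 45]]})

# Long format: one row per (id, list element) pair.
long = df2.explode('lst').rename(columns={'lst': 'value'})

# explode() leaves an object column; cast it so the merge key matches df1's int64 ids.
long['value'] = long['value'].astype('int64')

# Inner join keeps only the pairs whose element occurs in df1['id'].
pairs = long.merge(df1.rename(columns={'id': 'value'}), on='value')

# Select the original rows (in their original order) whose id produced at least one pair.
df3 = df2[df2['id'].isin(pairs['id'])].reset_index(drop=True)
print(df3)
```

Filtering df2 by the matched ids, rather than deduplicating pairs, keeps each row once even when it matches several values (row 87 matches 24, 32 and 89) and does not depend on the row order the merge returns.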
