Sampling from a Data Frame in Pandas. How to do it?


Python

gadzhi15, 2016-02-09 00:57:56

I have a data frame with the columns First and Last Name, Age, and Gender. I need to find out which first name is most common among women. I created a new data frame and put into it only the names from rows that have F in the Gender field. The resulting data frame looks like this:
Nasser, Mrs. Nicholas (Adele Achem)
Sandstrom, Miss. Marguerite Ruth
Bonnell, Miss. Elizabeth
Vestrom, Miss. Hulda Amanda Adolfina
As I understand it, the first name comes after the word Miss or Mrs. Now I have a problem I can't solve: how do I delete the words and characters that come before the first name in each row of the "First and Last Name" column? I tried str.lstrip and str.rstrip, but they don't work.
P.S. This is a task from a course on Coursera.


2 answer(s)

Vladimir Olohtonov, 2016-02-09
@gadzhi15

Not quite. The first name is the first word after "Miss" or "Mrs" only if there are no parentheses in the string.
In general, the easiest option for this task is to discard everything up to the period and count which words occur most often, without working out whether each word is a first name or not.
It is enough to assume that the most common first name occurs more often than the most common last name :)
Something like:

female_names = ['Nasser, Mrs. Nicholas (Adele Achem)', 'Sandstrom, Miss. Marguerite Ruth', 'Bonnell, Miss. Elizabeth']
names = {}  # word -> number of occurrences
for name in female_names:
    # Drop everything up to the first period, strip parentheses, split into words
    for word in name.split('.')[1].replace('(', '').replace(')', '').split():
        names[word] = names.get(word, 0) + 1
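
The same counting idea can also be sketched with collections.Counter from the standard library, which makes it easy to pull out the most frequent words. This is only an illustration under the same assumption as above (everything after the first period is treated as candidate name words); the helper name count_words is made up for this example, and the sample rows are the ones from the question.

```python
from collections import Counter

# Sample rows taken from the question
female_names = [
    'Nasser, Mrs. Nicholas (Adele Achem)',
    'Sandstrom, Miss. Marguerite Ruth',
    'Bonnell, Miss. Elizabeth',
    'Vestrom, Miss. Hulda Amanda Adolfina',
]


def count_words(rows):
    """Count every word that appears after the first period in each row."""
    counts = Counter()
    for row in rows:
        # Keep only the text after the title's period ("Mrs.", "Miss.")
        after_title = row.split('.', 1)[1]
        # Turn parentheses into spaces so "(Adele" and "Achem)" become plain words
        words = after_title.replace('(', ' ').replace(')', ' ').split()
        counts.update(words)
    return counts


word_counts = count_words(female_names)
# The most frequent words; on the full data set the top one should be a first name
print(word_counts.most_common(3))
```

On four rows every word occurs once, so the result only becomes meaningful on the full data set.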

Mark Adams, 2016-03-16
@ilyakmet

Shitty code, but it works.

import pandas

data = pandas.read_csv('titanic.csv', index_col='PassengerId')
data2 = data[data.Sex == 'female']['Name']


C = []  # extracted first names
for i in data2:
  if '(' in i:
    # Parenthesized form: take the first word inside the parentheses,
    # dropping a trailing ')' if the parentheses hold a single word
    C.append(i.split('(')[1].split(' ')[0].split(')')[0])
  else:
    # Otherwise take the first word after the title ("Miss. ", "Mrs. ")
    C.append(i.split('. ')[1].split(' ')[0])

print(pandas.Series(C).value_counts())
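
As for the original question: str.lstrip and str.rstrip remove any of the given characters from the ends of a string, not a prefix or suffix, so they are the wrong tool for cutting off "Nasser, Mrs. ". A rough vectorized sketch of the same logic as the loop above, using Series.str.extract with regular expressions, could look like this. It assumes the first period in each string belongs to the title and that a name in parentheses, when present, is the one to count; the sample rows are the ones from the question and the variable names are only illustrative.

```python
import pandas as pd

# Sample rows taken from the question
names = pd.Series([
    'Nasser, Mrs. Nicholas (Adele Achem)',
    'Sandstrom, Miss. Marguerite Ruth',
    'Bonnell, Miss. Elizabeth',
    'Vestrom, Miss. Hulda Amanda Adolfina',
])

# First word right after an opening parenthesis (NaN when there are no parentheses)
in_parens = names.str.extract(r'\((\w+)', expand=False)

# First word after the first period, i.e. after the title
after_title = names.str.extract(r'\.\s*(\w+)', expand=False)

# Prefer the parenthesized name, fall back to the word after the title
first_names = in_parens.fillna(after_title)

print(first_names.value_counts())
```

With the real data, names would be the Name column filtered by Sex == 'female', as in the code above.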