data mining
frosty7777777, 2015-09-19 12:43:59

How to remove all duplicates from a data frame?

I have a data frame with roughly the following content:
2 a ...
3 b ...
1 a ...
5 b ...
6 c ...
How can I remove all rows whose second column contains a duplicate value, so that for each repeated value only the row with the smallest value of the first variable remains? That is, the result should be:
3 b ...
1 a ...
6 c ...


3 answer(s)
frosty7777777, 2015-09-19
@frosty7777777

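I found the following solution myself:

# Sample data: a grouping column and a numeric column
A <- data.frame(name = c('A', 'A', 'B', 'C', 'C', 'A', 'C', 'B', 'A'),
                number = c(3, 1, 7, 6, 5, 4, 2, 9, 8))

# Sort by name, then by number, so the smallest number comes first within each name
A <- A[order(A$name, A$number), ]
# duplicated() flags every occurrence of a name after the first; drop those rows
A <- A[!duplicated(A$name), ]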
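
One thing to keep in mind: sorting changes the row order, while the expected result in the question lists the surviving rows in their original order. If that matters, here is a minimal base R sketch that avoids sorting. It assumes the same two-column data frame; the names `group_min` and `kept` are just illustrative.

```r
A <- data.frame(
  name   = c('A', 'A', 'B', 'C', 'C', 'A', 'C', 'B', 'A'),
  number = c(3, 1, 7, 6, 5, 4, 2, 9, 8)
)

# ave() returns, for every row, the minimum of `number` within that row's `name` group
group_min <- ave(A$number, A$name, FUN = min)

# Keep the rows that hold their group's minimum; the original row order is preserved
kept <- A[A$number == group_min, ]

# If the minimum can occur more than once per name, keep only its first occurrence
kept <- kept[!duplicated(kept$name), ]
kept
```

Unlike the sorted variant, this leaves the remaining rows where they were in the original data frame.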

redmode, 2015-09-24
@redmode

A variant using dplyr:

library(dplyr)
A <- data.frame(name = c('A', 'A', 'B', 'C', 'C', 'A', 'C', 'B', 'A'),
                number = c(1:9))
> A

  name number
1    A      1
2    A      2
3    B      3
4    C      4
5    C      5
6    A      6
7    C      7
8    B      8
9    A      9

B <- A %>%
  group_by(name) %>%
  summarise(number = min(number))
> B

Source: local data frame [3 x 2]

    name number
  (fctr)  (int)
1      A      1
2      B      3
3      C      4
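
Note that summarise() returns only the grouping column and the summarised value, so any other columns (the "..." in the question) are dropped. If the whole row should survive, a possible sketch is to filter within each group instead. The `extra` column here is hypothetical and only stands in for the remaining columns.

```r
library(dplyr)

A <- data.frame(
  name   = c('A', 'A', 'B', 'C', 'C', 'A', 'C', 'B', 'A'),
  number = 1:9,
  extra  = letters[1:9]  # hypothetical extra column, stands in for the "..." columns
)

B <- A %>%
  group_by(name) %>%
  # Within each name, keep the row(s) whose number equals the group minimum
  filter(number == min(number)) %>%
  ungroup()
B
```

If the minimum occurs several times within a group, all of those rows are kept, so a further deduplication step may be needed.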

protven, 2015-09-19
@protven

I'm no R expert; I'm still learning. So the only thing I can suggest offhand is the sqldf package, which lets you query a data.frame as if it were a table in a relational database.

A <- data.frame(name = c('A', 'A', 'B', 'C', 'C', 'A', 'C', 'B', 'A'),
                number = 1:9)
install.packages('sqldf')  # only needed once
library(sqldf)
> A
  name number
1    A      1
2    A      2
3    B      3
4    C      4
5    C      5
6    A      6
7    C      7
8    B      8
9    A      9
> sqldf("SELECT a1.name,a1.number from A a1 where a1.number=(SELECT min(a2.number) from A a2 where a2.name=a1.name)")
  name number
1    A      1
2    B      3
3    C      4
>

I would be glad to see a better, more elegant version.
