How do I remove all duplicates from a data frame?
I have a data frame with roughly the following content:
2 a ...
3 b ...
1 a ...
5 b ...
6 c ...
How can I remove all rows containing duplicates, with the condition that when several rows repeat the same value, only the row with the smallest value of the first variable is kept? I.e., the result should be:
3 b ...
1 a ...
6 c ...
I found the following solution for myself:
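# Sample data: several rows per name
A <- data.frame(name = c('A','A','B','C','C','A','C','B','A'),
                number = c(3,1,7,6,5,4,2,9,8))
# Sort by name, then by number, so the smallest number comes first within each name
A <- A[order(A$name, A$number),]
# Keep only the first row for each name
A <- A[!duplicated(A$name),]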
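For illustration, here is the same sort-then-deduplicate idea applied to data shaped like the example in the question. This is only a sketch: the column names `id`, `key` and `extra` are my own assumptions (the question does not name its columns), and `extra` stands in for the "..." columns.

```r
# Hypothetical data modelled on the question:
# 'id' is the first variable, 'key' is the column checked for duplicates,
# 'extra' stands in for the remaining "..." columns.
df <- data.frame(
  id    = c(2, 3, 1, 5, 6),
  key   = c("a", "b", "a", "b", "c"),
  extra = c("p", "q", "r", "s", "t"),
  stringsAsFactors = FALSE
)

# Sort so that, within each key, the smallest id comes first.
sorted <- df[order(df$key, df$id), ]

# duplicated() flags every occurrence after the first, so negating it
# keeps exactly one row per key: the one with the smallest id.
result <- sorted[!duplicated(sorted$key), ]
print(result)
```

Because whole rows are subset, the other columns travel along with the kept row. Note that the result comes out sorted by `key` rather than in the original row order.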
Variant using 'dplyr':
library(dplyr)
A <- data.frame(name = c('A', 'A', 'B', 'C', 'C', 'A', 'C', 'B', 'A'),
number = c(1:9))
> A
name number
1 A 1
2 A 2
3 B 3
4 C 4
5 C 5
6 A 6
7 C 7
8 B 8
9 A 9
B <- A %>%
group_by(name) %>%
summarise(number = min(number))
> B
Source: local data frame [3 x 2]
name number
(fctr) (int)
1 A 1
2 B 3
3 C 4
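One thing to keep in mind: summarise() returns only the grouping column and the summarised column, so any other columns in the data frame are dropped. If whole rows are needed, one possible variant is slice_min(), assuming dplyr 1.0.0 or newer. The `extra` column below is hypothetical and is there only to show that it survives.

```r
library(dplyr)

A <- data.frame(
  name   = c("A", "A", "B", "C", "C", "A", "C", "B", "A"),
  number = 1:9,
  extra  = letters[1:9],   # hypothetical extra column, not in the original data
  stringsAsFactors = FALSE
)

B <- A %>%
  group_by(name) %>%
  # keep the single row with the smallest 'number' in each group
  slice_min(number, n = 1, with_ties = FALSE) %>%
  ungroup()

print(B)
```

With `with_ties = FALSE`, exactly one row per group is kept even if the minimum occurs more than once.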
I'm no R expert; I'm still learning. So the only thing I can suggest offhand is the sqldf package, which lets you query a data.frame as if it were a table in a relational database.
A <-data.frame(c('A','A','B','C','C','A','C','B','A'),c(1:9))
names(A) <-c('name','number')
install.packages('sqldf')
library(sqldf)
> A
name number
1 A 1
2 A 2
3 B 3
4 C 4
5 C 5
6 A 6
7 C 7
8 B 8
9 A 9
> sqldf("SELECT a1.name,a1.number from A a1 where a1.number=(SELECT min(a2.number) from A a2 where a2.name=a1.name)")
name number
1 A 1
2 B 3
3 C 4
>
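If only the name and the minimum number are needed, the same result can probably be had with a plain GROUP BY instead of a correlated subquery. A minimal sketch, assuming sqldf is installed and uses its default SQLite backend:

```r
library(sqldf)

A <- data.frame(
  name   = c("A", "A", "B", "C", "C", "A", "C", "B", "A"),
  number = 1:9,
  stringsAsFactors = FALSE
)

# One output row per name, carrying the smallest number in that group
res <- sqldf("SELECT name, MIN(number) AS number FROM A GROUP BY name ORDER BY name")
print(res)
```

The correlated subquery is still the better fit when the other columns of the matching row have to be returned as well.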