Using any() and all() for string columns in grouped data

Using any() and all() for string columns in grouped data

Problem Description:

I have a grouped data and I would like to create a new variable based on the values of the each of the rows.

> df <- data.frame(Group = c("A","A","A","B","B","B", "C", "C"), Gender=c("M","M","F","F","F","F", "M", "M"))
>   df
  Group Gender
1     A      M
2     A      M
3     A      F
4     B      F
5     B      F
6     B      F
7     C      M
8     C      M 

In this example, I would like to now if groups of A, B and C are

  • Male Only: all members of the group are male
  • Female Only: all members of the group are female
  • Mixed gender: there are males and females in the group

So the desired outcome is:

  Group Gender  gender_mix
1     A      M       Mixed
2     A      M       Mixed
3     A      F       Mixed
4     B      F Female Only
5     B      F Female Only
6     B      F Female Only
7     C      M   Male Only
8     C      M   Male Only

I tried using any() and all() with no luck:

>   df%>%
+     group_by(Group)%>%
+     mutate(gender_mix=case_when(all(Gender)=="M"~"Male Only",
+                                 all(Gender)=="F"~"FemAle Only",
+                                 any(Gender)=="M"&any(Gender)=="F"~"Mixed",
+                                 TRUE~NA_character_))
# A tibble: 8 × 3
# Groups:   Group [3]
  Group Gender gender_mix
  <chr> <chr>  <chr>     
1 A     M      NA        
2 A     M      NA        
3 A     F      NA        
4 B     F      NA        
5 B     F      NA        
6 B     F      NA        
7 C     M      NA        
8 C     M      NA        
There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
2: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
3: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
4: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
5: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
6: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
7: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
8: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
9: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
10: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
11: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
12: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".

Another issues is that my data is rather large (10M rows) and any() and all() seems to be very slow.

Any help would be much appreciated.

Solution – 1

You should put the entire conditions between parenthesis:

df %>% 
  group_by(Group) %>% 
  mutate(gender_mix = case_when(all(Gender == "M") ~ "Male only",
                                all(Gender == "F") ~ "Female only",
                                any(Gender == "F") & any(Gender == "M") ~ "Mixed",
                                TRUE ~ NA_character_))
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject