# R: Selecting Multiple Columns Based on Conditions

## R: Selecting Multiple Columns Based on Conditions

Contents

Problem Description:

I have the following data in R:

``````id = 1:5
col1 = c("john", "henry", "adam", "jenna", "Phone: 222 2222")
col2 = c("river B8C 9L4", "Field U9H 5E2 PP", "NA", "ocean A1B 5H1 dd", "dave")
col3 = c("Phone: 111 1111 111", "steve", "forest K0Y 1U9 hu2", "NA", "NA")
col4 = c("matt", "peter", "Phone: 333 333 1113", "Phone: 444 111 1153", "kevin")
my_data = data.frame(id, col1, col2, col3, col4)

id            col1             col2                col3                col4
1  1            john    river B8C 9L4 Phone: 111 1111 111                matt
2  2           henry Field U9H 5E2 PP               steve               peter
3  3            adam               NA  forest K0Y 1U9 hu2 Phone: 333 333 1113
4  4           jenna ocean A1B 5H1 dd                  NA Phone: 444 111 1153
5  5 Phone: 222 2222             dave                  NA               kevin
``````

I am trying to accomplish the following task – I would like to create a new dataset with the following columns. For each row:

• Step 1: id (trivial, this is always the first column)
• Step 2: A column with the phone number
• Step 3: A column that satisfies the following condition `'(([A-Z] ?[0-9]){3})|.', '\1'`
• Step 4: Once Step 1 – Step 3 has been completed, I would like to combine all names into a single column

Here is a sample of the desired output:

``````  id              name             address                phone
1  1         john matt       river B8C 9L4  Phone: 111 1111 111
2  2 henry steve peter    Field U9H 5E2 PP                   NA
3  3              adam  forest K0Y 1U9 hu2  Phone: 333 333 1113
4  4             jenna  ocean A1B 5H1 dd    Phone: 444 111 1153
5  5        dave kevin                  NA      Phone: 222 2222
``````

Here is the code I have written:

``````my_data\$col1[grep("Phone", my_data\$col1)]
my_data\$col2[grep("Phone", my_data\$col2)]
my_data\$col3[grep("Phone", my_data\$col3)]
my_data\$col4[grep("Phone", my_data\$col4)]

my_data\$col1[grep( '(([A-Z] ?[0-9]){3})|.', '\1' , my_data\$col1)]
my_data\$col2[grep('(([A-Z] ?[0-9]){3})|.', '\1', my_data\$col2)]
my_data\$col3[grep('(([A-Z] ?[0-9]){3})|.', '\1', my_data\$col3)]
my_data\$col4[grep('(([A-Z] ?[0-9]){3})|.', '\1', my_data\$col4)]
``````

Based on the above code, I was thinking on identifying which of the columns meet the condition in each step, and then using the COLASCE statement in dplyr to create the final dataset. But I think this might be a very long way of accomplishing this problem.

Can someone please suggest a faster way to solve this problem?

Thanks!

## Solution – 1

Try this:

``````tmp <- apply(my_data[,-1], 1, function(z) { z <- z[!is.na(z) & z != "NA"]; ind <- dplyr::case_when(grepl("^Phone:", z) ~ "phone", grepl("(([A-Z] ?[0-9]){3})", z) ~ "address", TRUE ~ "name"); data.frame(lapply(split(z, ind), paste, collapse = " ")); })
tmp
# [[1]]
# 1 river B8C 9L4 john matt Phone: 111 1111 111
# [[2]]
# 1 Field U9H 5E2 PP henry steve peter
# [[3]]
# 1 forest K0Y 1U9 hu2 adam Phone: 333 333 1113
# [[4]]
# 1 ocean A1B 5H1 dd jenna Phone: 444 111 1153
# [[5]]
#         name           phone
# 1 dave kevin Phone: 222 2222
``````

With this, we can combine them (I prefer `dplyr::bind_rows` or `data.table::rbindlist`).

``````dplyr::bind_rows(tmp)
# 1      river B8C 9L4         john matt Phone: 111 1111 111
# 2   Field U9H 5E2 PP henry steve peter                <NA>
# 3 forest K0Y 1U9 hu2              adam Phone: 333 333 1113
# 4   ocean A1B 5H1 dd             jenna Phone: 444 111 1153
# 5               <NA>        dave kevin     Phone: 222 2222
cbind(my_data[,1,drop=FALSE], dplyr::bind_rows(tmp))
# 1  1      river B8C 9L4         john matt Phone: 111 1111 111
# 2  2   Field U9H 5E2 PP henry steve peter                <NA>
# 3  3 forest K0Y 1U9 hu2              adam Phone: 333 333 1113
# 4  4   ocean A1B 5H1 dd             jenna Phone: 444 111 1153
# 5  5               <NA>        dave kevin     Phone: 222 2222
``````

## Solution – 2

One solution based on `grep`

``````setNames(data.frame(my_data\$id,
sapply(c("^[a-z]+[a-z]+\$", "^[^P]+.*[[:digit:]]+", "Phone"), function(srch)
apply(my_data[,-1], 1, function(x)
paste(grep(srch, x, value=T), collapse=" ")))),