How to extract row features, multiply respective rows and add single feature as column?

How to extract row features, multiply respective rows and add single feature as column?

Problem Description:

I have a dataset that looks something like this:

  id      col1    col2  col3  col4
1  1    12 ABC   Henry  Alex 13 AB
2  2       123      12 David   344
3  3      John     567  Luke  Huh8
4  4 123344567 abc 123  Paul    98
5  5  1345677.     Sam  17df   Tom
    

Goal: For each row, I would like to take every cell that does not contain a numerical value, and create new columns from the existing values of that row:

   Name      col1    col2 col3  col4
1 Henry    12 ABC    <NA> <NA> 13 AB
2  Alex    12 ABC    <NA> <NA> 13 AB
3 David       123      12 <NA>   344
4  John      <NA>     567 <NA>  Huh8
5  Luke      <NA>     567 <NA>  Huh8
6  Paul 123344567 abc 123 <NA>    98
7   Sam   1345677    <NA> 17df  <NA>
8   Tom   1345677    <NA> 17df  <NA>

Based on the nature of this question, I think the two following concepts can be used:

  • To determine if a column contains a numerical value, the following code can be used: grepl("\d", my_data$col1)

  • I think some form of "pivot_wider" and "pivot_longer" might be applicable, but I am not sure exactly how to do this.

Can someone please show me how to do this?

Data

my_data <- structure(list(id = 1:5, col1 = c("12 ABC", "123", "John", "123344567", 
"1345677."), col2 = c("Henry", "12", "567", "abc 123", "Sam"), 
    col3 = c("Alex", "David", "Luke", "Paul", "17df"), col4 = c("13 AB", 
    "344", "Huh8", "98", "Tom")), class = "data.frame", row.names = c(NA, 
-5L))

Solution – 1

It’s pretty messy and I think there must be more simple way, but you may try

library(tidyverse)
Name <- unlist(t(my_data), use.names = F)[!grepl("\d", unlist(t(my_data)))]
key <- unname(sapply(Name, function(x) {c(1:nrow(my_data))[apply(my_data, 1, function(y) any(y %in% x))]}))
cbind(Name, my_data[key,]) %>%
  mutate(across(-Name, ~ifelse(grepl("\d", .), ., NA))) %>%
  select(-id) %>%
  `rownames<-`(1:length(key))

   Name      col1    col2 col3  col4
1 Henry    12 ABC    <NA> <NA> 13 AB
2  Alex    12 ABC    <NA> <NA> 13 AB
3 David       123      12 <NA>   344
4  John      <NA>     567 <NA>  Huh8
5  Luke      <NA>     567 <NA>  Huh8
6  Paul 123344567 abc 123 <NA>    98
7   Sam  1345677.    <NA> 17df  <NA>
8   Tom  1345677.    <NA> 17df  <NA>

Solution – 2

Making a boolean matrix bo using grepl in apply. Then in Map cbind the name identified by which and replace with NA.

bo <- apply(my_data, 1:2, (x) !grepl('\d', x))

Map((x, y, z) {
  lapply(y, (i) cbind(Name=my_data[x, i], replace(my_data[x, ], y, NA))) |>
    do.call(what=rbind)
}, 
seq_len(nrow(my_data)), 
apply(bo, 1, which)) |>
  c(make.row.names=FALSE) |>
  do.call(what=rbind)
#    Name id      col1    col2 col3  col4
# 1 Henry  1    12 ABC    <NA> <NA> 13 AB
# 2  Alex  1    12 ABC    <NA> <NA> 13 AB
# 3 David  2       123      12 <NA>   344
# 4  John  3      <NA>     567 <NA>  Huh8
# 5  Luke  3      <NA>     567 <NA>  Huh8
# 6  Paul  4 123344567 abc 123 <NA>    98
# 7   Sam  5  1345677.    <NA> 17df  <NA>
# 8   Tom  5  1345677.    <NA> 17df  <NA>

Solution – 3

A little inefficient, but simpler to understand,

new_data <- cbind(Names="", my_data)
new_data <- new_data[0,]

for (row in 1:nrow(my_data)) {
    temp_row <- my_data[row,]
    names <- vector()
    for (val in 1:ncol(temp)) {
        if (!grepl("\d", temp_row[val])) {
            names <- append(temp_row[val], names)
            temp_row[val] <- NA
        }
    }
    for (name in 1:length(names)) {
        new_data[nrow(new_data)+1,] <- temp_row
        new_data[nrow(new_data),]["Names"] <- names[name]
    }
}
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject