R data.table Vectorising Column Multiplication

R data.table Vectorising Column Multiplication

Problem Description:

Suppose I have a data.table like this (imagine it has many columns like "a1, …, a100, …" and similarly "b1, …, b100, …")

dt <- data.table(id = 1:5, a1 = runif(5), a2 = runif(5), b1 = runif(5), b2 = runif(5))

so that the output looks like this:

   id         a1         a2         b1         b2
1:  1 0.94431156 0.34668771 0.54899478 0.91512664
2:  2 0.32730005 0.87924651 0.88777763 0.90167832
3:  3 0.07438915 0.53728539 0.21463741 0.11291512
4:  4 0.23025893 0.08528074 0.68454936 0.45441690
5:  5 0.86105462 0.49976703 0.07362091 0.08834252

I want to create new columns c1, c2 such that effectively I have

dt[, c('c1', 'c2') := .(a1*b1, a2*b2)]

with output

   id         a1         a2         b1         b2         c1         c2
1:  1 0.94431156 0.34668771 0.54899478 0.91512664 0.51842212 0.31726316
2:  2 0.32730005 0.87924651 0.88777763 0.90167832 0.29056966 0.79279752
3:  3 0.07438915 0.53728539 0.21463741 0.11291512 0.01596669 0.06066765
4:  4 0.23025893 0.08528074 0.68454936 0.45441690 0.15762361 0.03875301
5:  5 0.86105462 0.49976703 0.07362091 0.08834252 0.06339162 0.04415068

How can this be achieved without the use of slow loops?

Solution – 1

Since matrix multiplication is element-wise in R, you can do the following:

dt <- data.frame(id = 1:5, a1 = runif(5), a2 = runif(5), b1 = runif(5), b2 = runif(5))
a = dt[,2:3]
b = dt[,4:5]
c = a * b
names(c) = c("c1", "c2")
cbind(dt, c)

Output

  id          a1        a2        b1        b2          c1         c2
1  1 0.082863389 0.7108292 0.8952547 0.4530363 0.074183837 0.32203140
2  2 0.125423227 0.8957771 0.2231827 0.1042432 0.027992292 0.09337865
3  3 0.278592590 0.9317453 0.7910442 0.3729406 0.220379066 0.34748565
4  4 0.004518196 0.3890797 0.5323291 0.7997701 0.002405167 0.31117430
5  5 0.784290484 0.5499781 0.7429104 0.8106772 0.582657582 0.44585471

Solution – 2

Taking @GoldenGateBridge ‘s answer into account, and assuming you don’t know how big your table can be, you could generalize in the form:

set.seed(10)
dt <- data.frame(id = 1:5, 
                 a1 = runif(5),
                 a2 = runif(5),
                 a3 = runif(5),
                 a4 = runif(5),
                 a5 = runif(5),
                 # ....
                 b1 = runif(5), 
                 b2 = runif(5),
                 b3 = runif(5),
                 b4 = runif(5),
                 b5 = runif(5)
                 # ....
)

n <- 1:((ncol(x)-1)/2)
ind_a <- paste0("a", n)
ind_b <- paste0("b", n)
ind_c <- paste0("c", n)

producto <- dt[ind_a]*dt[ind_b]
names(producto) <- ind_c
dt <- cbind(dt, producto)

I hope it is useful.

Solution – 3

We can try split.defult to split dt by column names and conduct element-wise multiplication in turn

dt[, c(
  dt,
  lapply(
    split.default(.SD, paste0("c", gsub("\D+", "", names(.SD)))),
    function(v) do.call(`*`, v)
  )
),
.SDcols = patterns("\d$")
]
Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject