Taking powers of a matrix in R

Consider a number, say 4. If we want to multiply three copies of this number together, we can write this either as \(4 \times 4 \times 4\) or, more compactly, as \(4^3\). This operation is known as exponentiation and follows some well-known algebraic laws and conventions. For example, for any non-zero number \(x\) we have:

  • \(x^k = \overbrace{x \times x \times \dots \times x}^{k \text{ times}}\)
  • \(x^1 = x\)
  • \(x^0 = 1\)
  • \(x^{-1} = \frac{1}{x}\)
  • \(x^{-k} = (\frac{1}{x})^{k}\)

This is all surely familiar to the reader. What may not be directly obvious, however, is that we can meaningfully define analogous operations on invertible matrices and use the same notation:

  • \(A^k = \overbrace{AA \dots A}^{k \text{ times}}\): Exponentiation by positive integer is repeated multiplication.
  • \(A^1 = A\): Exponentiation by one returns the original matrix.
  • \(A^0 = I\): Exponentiation by zero returns the identity matrix.
  • \(A^{-1} = \text{inv}(A)\): Exponentiation by -1 returns the inverse of the original matrix.
  • \(A^{-k} = \text{inv}(A)^k\): Exponentiation by a negative integer is repeated multiplication of the inverse.

R does not have built-in support for these operations: for a matrix A, A^2 does not multiply A by itself, but rather squares every entry of A (see the short demonstration after the list below). In this post, therefore, we will implement this functionality ourselves. In doing so, we will touch on a number of topics:

  • Defining test cases to make clear exactly what goal we’re trying to achieve, and test if our proposed solution achieves that goal
  • Higher-order functions
  • Using purrr::reduce to avoid loops
  • Diagonalizing a matrix (which involves an eigendecomposition)
  • Computational complexity
  • Benchmarking code
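To make the difference concrete, here is a quick demonstration with a small throwaway matrix M (not the A used in the rest of the post):

(M <- matrix(1:4, 2, 2))
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
M ^ 2   # elementwise: squares each entry
##      [,1] [,2]
## [1,]    1    9
## [2,]    4   16
M %*% M # the matrix product we actually want
##      [,1] [,2]
## [1,]    7   15
## [2,]   10   22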

Defining test cases

We start by loading some necessary packages.

library(tidyverse)
library(assertthat)
library(microbenchmark)

Next, we define a sample \(3 \times 3\) matrix that we use throughout the post.

set.seed(1)
n <- 3
(A <- matrix(sample(10, n ^ 2, TRUE), n, n))
##      [,1] [,2] [,3]
## [1,]    3   10   10
## [2,]    4    3    7
## [3,]    6    9    7

We now define a function test_properties that takes as input a function f (meant to take powers of a matrix) and a matrix A. It then applies f to A for various exponents and checks whether it gives the desired result. For example, for any implementation of a power function f we would expect that f(A, 0) (i.e. \(A\) raised to the zero power) returns the identity matrix (i.e. diag(1, nrow(A))).

test_properties <- function(f, A) {
  list(
    "A^4 == AAAA"      = are_equal(f(A,  4), A %*% A %*% A %*% A),
    "A^1 == A"         = are_equal(f(A,  1), A),  
    "A^0 == I"         = are_equal(f(A,  0), diag(1, nrow(A))),
    "A^-1 == inv(A)"   = are_equal(f(A, -1), solve(A)),
    "A^-4 == inv(A)^4" = are_equal(f(A, -4), solve(A) %^% 4)
  )
}

Since test_properties is a function that takes another function as input, we would call this a higher-order function (or a functional to be more precise).
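As a minimal illustration of the concept (a throwaway helper, not used later in this post), here is another higher-order function, one that applies its function argument twice:

apply_twice <- function(f, x) f(f(x))
apply_twice(sqrt, 16)
## [1] 2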

Strategy 1: Naive repeated multiplication

We can try out test_properties on a trivial implementation of a power function. (If the reader is unfamiliar with the percent-sign notation: any function whose name is wrapped in percent signs can be called as an infix operator in R.)
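For example, here is a hypothetical infix operator, unrelated to the rest of the post, that pastes two strings together:

`%paste%` <- function(a, b) paste(a, b)
"hello" %paste% "world"
## [1] "hello world"

With that notation in hand, the trivial implementation: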

`%^%` <- function(A, k) {
  A  # deliberately wrong: ignore k, return A unchanged
}

So %^% takes an A and a k as input, throws away the k and returns the A unchanged. Let’s see what tests it passes.

str(test_properties(`%^%`, A))
## List of 5
##  $ A^4 == AAAA     : logi FALSE
##  $ A^1 == A        : logi TRUE
##  $ A^0 == I        : logi FALSE
##  $ A^-1 == inv(A)  : logi FALSE
##  $ A^-4 == inv(A)^4: logi FALSE

In order to pass all the other tests we need an implementation that actually does something. The next implementation starts by defining the identity matrix I of the same dimensions as A and initializes the matrix Ak to be this matrix (this is a bit roundabout, but useful for didactic purposes). It then enters a while-loop that decrements k while multiplying Ak by A. Once k hits zero, it returns the final matrix.

`%^%` <- function(A, k) {
  I <- diag(1, nrow(A))
  Ak <- I
  while(k > 0) {
    Ak <- Ak %*% A
    k <- k - 1
  }
  Ak
}

A %*% A
##      [,1] [,2] [,3]
## [1,]  109  150  170
## [2,]   66  112  110
## [3,]   96  150  172
A %^% 2
##      [,1] [,2] [,3]
## [1,]  109  150  170
## [2,]   66  112  110
## [3,]   96  150  172

Let’s see if this implementation is any better.

str(test_properties(`%^%`, A))
## List of 5
##  $ A^4 == AAAA     : logi TRUE
##  $ A^1 == A        : logi TRUE
##  $ A^0 == I        : logi TRUE
##  $ A^-1 == inv(A)  : logi FALSE
##  $ A^-4 == inv(A)^4: logi FALSE

Ok, so it works for positive, but not negative, exponents. Let’s fix that.

`%^%` <- function(A, k) {
  I <- diag(1, nrow(A))
  if(k < 0) {
    A <- solve(A)
    k <- k * -1
  }
  Ak <- I
  while(k > 0) {
    Ak <- Ak %*% A
    k <- k - 1
  }
  Ak
}

Here we just check if k is negative, and if it is, replace A with its inverse. Everything else remains unchanged.

str(test_properties(`%^%`, A))
## List of 5
##  $ A^4 == AAAA     : logi TRUE
##  $ A^1 == A        : logi TRUE
##  $ A^0 == I        : logi TRUE
##  $ A^-1 == inv(A)  : logi TRUE
##  $ A^-4 == inv(A)^4: logi TRUE

It now passes all tests. Beautiful. As good functional programmers, however, we feel a bit queasy about using loops, so let’s use purrr::reduce to get rid of it.

`%^%` <- function(A, k) {
  I <- diag(1, nrow(A))
  if(k < 0) {
    A <- solve(A)
    k <- k * -1
  }
  reduce(seq_len(k), ~. %*% A, .init = I)
}

The first part here is identical to the earlier function, but instead of the while loop we call reduce on a dummy vector of length k with the identity matrix as the starting value. The machinery of reduce then takes care of all the details of accumulating the matrix products.
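If the mechanics of reduce are unfamiliar, a scalar analogue may help: here we compute \(5^3\) by folding multiplication-by-5 over a dummy vector of length 3, starting from 1 (the scalar counterpart of the identity matrix).

reduce(seq_len(3), ~ . * 5, .init = 1)
## [1] 125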

str(test_properties(`%^%`, A))
## List of 5
##  $ A^4 == AAAA     : logi TRUE
##  $ A^1 == A        : logi TRUE
##  $ A^0 == I        : logi TRUE
##  $ A^-1 == inv(A)  : logi TRUE
##  $ A^-4 == inv(A)^4: logi TRUE

Indeed, the implementation with reduce also passes our test cases.

If we stop to consider how this approach scales with larger values of k, however, we notice that we always need to perform exactly k matrix multiplications, so the cost grows linearly in k. Since matrix multiplication is itself computationally demanding – between \(\mathcal{O}(n^{2.373})\) and \(\mathcal{O}(n^3)\) depending on the algorithm – this approach will only work for fairly small matrices and small values of k. Fortunately, there is a very clever alternative that involves diagonalizing A and scales much better in k.

Strategy 2: Diagonalization

It’s a remarkable fact that under the right circumstances, a matrix \(A\) can be decomposed as \(A = PDP^{-1}\), where the columns of \(P\) are the eigenvectors of \(A\) and \(D\) is a diagonal matrix with the eigenvalues of \(A\) along the main diagonal.

Consider our matrix A. Let’s find its eigenvectors and eigenvalues.

eigen(A)
## eigen() decomposition
## $values
## [1] 19.317668+0.000000i -3.158834+0.253746i -3.158834-0.253746i
## 
## $vectors
##               [,1]                  [,2]                  [,3]
## [1,] -0.6483318+0i -0.8600864+0.0000000i -0.8600864+0.0000000i
## [2,] -0.4288083+0i  0.2414274-0.2544460i  0.2414274+0.2544460i
## [3,] -0.6291179+0i  0.2882856+0.2326217i  0.2882856-0.2326217i

So \(A\) has three eigenvalues (which happen to be complex numbers) and three associated eigenvectors. If what we said above about the eigendecomposition is true, we should thus be able to do the following:

eig <- eigen(A)
P <- eig$vectors
P_inv <- solve(P)
D <- diag(eig$values)

A
##      [,1] [,2] [,3]
## [1,]    3   10   10
## [2,]    4    3    7
## [3,]    6    9    7
P %*% D %*% P_inv
##      [,1]  [,2]  [,3]
## [1,] 3-0i 10+0i 10-0i
## [2,] 4-0i  3+0i  7-0i
## [3,] 6-0i  9+0i  7-0i
Re(P %*% D %*% P_inv)
##      [,1] [,2] [,3]
## [1,]    3   10   10
## [2,]    4    3    7
## [3,]    6    9    7

So we indeed have \(A = PDP^{-1}\). Since the eigenvalues are in \(\mathbb{C}\), the product is computed with complex arithmetic, and we take the real part with the Re function; the imaginary parts are zero up to rounding.

Now, not all matrices can be decomposed like this, and the exact conditions for when a matrix is diagonalizable are a bit involved (we would have to find both the algebraic and geometric multiplicity of each eigenvalue and check that they are equal). However, there is a sufficient (but not necessary) condition for diagonalizability, viz. that an \(n \times n\) matrix is always diagonalizable if it has \(n\) distinct eigenvalues. Checking for this condition should deal with most situations. (We will ignore issues of numerical stability.)
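For our sample matrix this condition holds, since the three eigenvalues found above are all distinct:

length(unique(eig$values)) == nrow(A)
## [1] TRUE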

But how does all this help us with the problem of taking powers of a matrix?

Observe:

\[ A^2 = AA = PDP^{-1}PDP^{-1} = PD(P^{-1}P)DP^{-1} = PDIDP^{-1} = PDDP^{-1} = PD^2P^{-1} \]

More generally, \(A^k = PD^kP^{-1}\). And since \(D\) is diagonal, we can get \(D^k\) by simply raising the elements on the diagonal of \(D\) to the \(k\)th power, which is a very cheap operation. No messy repeated matrix multiplication!
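To see this concretely, consider a small throwaway diagonal matrix: repeated matrix multiplication and simply cubing the diagonal entries give the same result.

D_small <- diag(c(2, 3))
D_small %*% D_small %*% D_small  # repeated matrix multiplication
##      [,1] [,2]
## [1,]    8    0
## [2,]    0   27
diag(diag(D_small) ^ 3)          # just raise the diagonal entries to the third power
##      [,1] [,2]
## [1,]    8    0
## [2,]    0   27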

Let’s write an implementation.

`%^^%` <- function(A, k) {
  eig <- eigen(A)
  
  # sufficient condition for diagonalizability: n distinct eigenvalues
  stopifnot(length(unique(eig$values)) == nrow(A))
  
  P <- eig$vectors
  D <- diag(eig$values ^ k)  # power the eigenvalues elementwise
  Ak <- P %*% D %*% solve(P)
  Re(Ak)  # drop the (numerically zero) imaginary parts
}

Let’s see if it works.

str(test_properties(`%^^%`, A))
## List of 5
##  $ A^4 == AAAA     : logi TRUE
##  $ A^1 == A        : logi TRUE
##  $ A^0 == I        : logi TRUE
##  $ A^-1 == inv(A)  : logi TRUE
##  $ A^-4 == inv(A)^4: logi TRUE

Let’s also check it on another matrix. Let \(B\) be the matrix that rotates the plane 90 degrees counterclockwise. We would then expect \(B^4\) to be a full 360-degree turn, which is equivalent to doing nothing (i.e. the identity matrix).

(B <- matrix(c(0, 1, -1, 0), 2))
##      [,1] [,2]
## [1,]    0   -1
## [2,]    1    0
B %^^% 4
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
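As a quick sanity check on the rotation interpretation, multiplying B by the unit vector along the x-axis should rotate it onto the y-axis:

B %*% c(1, 0)
##      [,1]
## [1,]    0
## [2,]    1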
str(test_properties(`%^^%`, B))
## List of 5
##  $ A^4 == AAAA     : logi TRUE
##  $ A^1 == A        : logi TRUE
##  $ A^0 == I        : logi TRUE
##  $ A^-1 == inv(A)  : logi TRUE
##  $ A^-4 == inv(A)^4: logi TRUE

Comparing the two strategies

We would expect the run time of the first approach to be quite fast for small values of \(k\), since we don’t have to do any eigendecomposition, but to grow more or less linearly as \(k\) increases. Conversely, the second approach should be relatively slow for small values of \(k\), since we have to find the eigenvalues and eigenvectors even when \(k\) is very small. But once that decomposition is found, the magnitude of \(k\) should have little effect on how long the computation takes.

We set up a vector of k running from 1 to 200, and for each value we take A to that power. We repeat that 50 times for each value of k and record how long each iteration takes. Finally, we filter out some outliers that otherwise make the results difficult to visualize.

times <- 1:200 %>%
  map_df(~
        microbenchmark(
          "repeated multiplication" = A %^% .,
          "diagonalization" = A %^^% .,
          times = 50
        ) %>%
        as_tibble() %>%
        mutate(k = .x)
  ) %>%
  filter(time < quantile(time, .99))

To figure out for what value of \(k\) the diagonalization approach starts being superior, we regress time on k, the method, and their interaction, and find where the two fitted lines intersect: that happens where the method’s intercept offset plus \(k\) times its slope offset equals zero, i.e. at \(k = -\beta_{\text{expr}} / \beta_{k:\text{expr}}\).

m1 <- lm(time ~ k*expr, data = times)

df_intersect <-
  data_frame("expr" = "diagonalization", 
             "k"    = -1 * coef(m1)["exprdiagonalization"] / 
                           coef(m1)["k:exprdiagonalization"]) %>%
  mutate(time = predict(m1, .))

df_intersect
## # A tibble: 1 x 3
##              expr        k     time
##             <chr>    <dbl>    <dbl>
## 1 diagonalization 82.48014 220569.9

The interpretation is that, for matrices of this particular size, powers below roughly 82 are computed fastest by repeated multiplication, and higher powers by diagonalization.

Finally, we plot the results.

ggplot(times, aes(x = k, y = time / 1e3, group = expr)) +
  geom_jitter(aes(color = expr), alpha = 0.1) +
  geom_smooth(method = "lm", color = "black", size = 0.1) +
  geom_point(data = df_intersect) +
  geom_text(aes(label = sprintf("(%0.0f, %0.0f)", k, time)),
            data = df_intersect, 
            vjust = 2, hjust = 0, size = 3) +
  guides(color = guide_legend(override.aes = list(alpha = 1))) +
  theme_light() +
  theme(legend.position = c(1, 0), 
        legend.justification = c(1, 0),
        legend.background = element_rect(color = "grey70")) +
  labs(y = "Time (microseconds)", color = NULL,
       title = "Powers of a matrix",
       subtitle = paste0(
         "Time taken to raise a 3x3 matrix to the", 
         " kth power using different approaches")
       )