10  Introduction to Cache

Author

Eliot J. B. McIntire

Published

June 10, 2026

See the Barebones R script section (10.9) for the code shown in this chapter.

The objective of a reproducible workflow is that the entire workflow, from raw data to publication, decision support, report writing, presentation building, etc., can be rebuilt and reproduced anywhere, on any computer and operating system, with any starting conditions, on demand.

As part of a reproducible workflow, caching of function calls, code chunks, and other elements of a project can be very valuable. Caching allows a code writer to run all code regularly without “secretly” skipping certain lines because they take too long to run. This has two benefits:

  1. Each line gets run regularly so when failures crop up, they are detected quickly and can be fixed when they are introduced.

  2. Running code “somewhere else” (different machine, person, operating system) will be more likely to work on an ongoing basis, i.e., it will help maintain a “reproducible” state.

These benefits keep the code in a working state from start to finish, which lowers the effort needed “at the end” to make all the work reproducible.

The reproducible::Cache() function is built to work with many R functions, including some that are used for their side effects or that use pointers (e.g., a terra SpatRaster) instead of regular R objects.

Code
repos <- c("https://predictiveecology.r-universe.dev", getOption("repos"))
options(repos = repos)
if (!require("pak")) install.packages("pak")
pak::pak(c("Require", "SpaDES.project"), ask = FALSE)

library(SpaDES.project)
out <- setupProject(
  packages = c("reproducible", "terra"),
  options = list(repos = repos),
  paths = list(projectPath = "~/SpaDES_book/Caching")
)

10.1 How to use Cache

The Cache function can be used with any function. A user can wrap it around another function call, use the base pipe operator |>, or pass the function and its arguments as arguments to Cache. The following three calls are equivalent, so the second and third return the same output as the first:

Code
library(reproducible)
reproducible::Cache(rnorm(1))
[1] 0.4757227
attr(,".Cache")
attr(,".Cache")$newCache
[1] FALSE

attr(,"tags")
[1] "cacheId:ca275879d5116967"
attr(,"callInCache")
[1] ""
Code
rnorm(1) |>
  reproducible::Cache()
[1] 0.4757227
attr(,".Cache")
attr(,".Cache")$newCache
[1] FALSE

attr(,"tags")
[1] "cacheId:ca275879d5116967"
attr(,"callInCache")
[1] ""
Code
reproducible::Cache(rnorm,
                    n = 1)
[1] 0.4757227
attr(,".Cache")
attr(,".Cache")$newCache
[1] FALSE

attr(,"tags")
[1] "cacheId:ca275879d5116967"
attr(,"callInCache")
[1] ""

See ?reproducible::Cache for many more examples.
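The equivalence of the three call forms can be checked directly. The following is a minimal sketch, assuming a throwaway cache folder (the name demoCache is hypothetical and not part of the chapter's setup) so that the demo does not touch a real project cache:

```r
library(reproducible)

# Hypothetical throwaway cache folder, unique to this demo
demoCache <- tempfile("cache_demo_forms_")

v1 <- Cache(rnorm(1), cachePath = demoCache)       # wrap the call
v2 <- rnorm(1) |> Cache(cachePath = demoCache)     # base pipe
v3 <- Cache(rnorm, n = 1, cachePath = demoCache)   # function plus arguments

# Drop the attributes that Cache attaches, then compare the values themselves
sameValue <- identical(as.numeric(v1), as.numeric(v2)) &&
  identical(as.numeric(v1), as.numeric(v3))
sameValue
```

If the three forms share one cache entry, as described above, only the first call draws a random number and the other two recover it.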

10.2 When to use Cache

The most obvious case to use Cache is when the calculation is expensive. For example, GIS operations are commonly time consuming. In the example below, we will use terra::project three times, with timings.

10.2.1 From disk

Code
# Data setup
library(terra)
tmpDir <- file.path(tempdir(), "reproducible_examples", "Cache")
dir.create(tmpDir, recursive = TRUE)

x <- y <- 2001
ras <- terra::rast(terra::ext(0, x, 0, y), vals = sample(1:(x*y)), res = 1)
terra::crs(ras) <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +datum=WGS84"
newCRS <- "+init=epsg:4326" # A longlat crs

# Call slow operation: project
# No Cache
system.time(map1 <- terra::project(ras, newCRS)) # Warnings due to new PROJ
   user  system elapsed 
  0.789   0.014   0.805 
Code
# With Cache -- slower the first time because the result is also saved to disk
system.time(map2 <- terra::project(ras, newCRS) |> Cache())
   user  system elapsed 
  5.973   0.150   6.129 
Code
# faster the second time; improvement depends on size of object and time to run function
system.time(map3 <- terra::project(ras, newCRS) |> Cache())
   user  system elapsed 
  0.346   0.005   0.362 

In this example, the output of terra::project is cached on the first Cache call (map2), together with digests of the function and its arguments (ras and newCRS); the second call (map3) recovers that output from the cache. If either the function or the supplied arguments change, Cache repeats the operation and stores the output in a new cache entry.
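The effect of changing an argument can be seen without a slow GIS operation. This is a small sketch, assuming a hypothetical throwaway cache folder (demoCache) and relying on the .Cache attribute shown in the printed output earlier in the chapter:

```r
library(reproducible)

# Hypothetical throwaway cache folder, unique to this demo
demoCache <- tempfile("cache_demo_args_")

a1 <- Cache(rnorm, 1, cachePath = demoCache)              # first run: computed and stored
a2 <- Cache(rnorm, 1, cachePath = demoCache)              # same call: recovered from cache
a3 <- Cache(rnorm, 1, mean = 100, cachePath = demoCache)  # changed argument: computed again

# newCache reports whether the call created a new cache entry
recoveredSecond <- attr(a2, ".Cache")$newCache
recomputedThird <- attr(a3, ".Cache")$newCache
```

Here a1 and a2 hold the same number, while a3 is a fresh draw stored under a different cache entry.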

10.2.2 From disk and memory

We can set an option so that objects are saved to disk as usual and are also kept in RAM (“memoising”).

Code
optOrig <- options("reproducible.useMemoise" = TRUE)
system.time(map4 <- terra::project(ras, newCRS) |> Cache())
   user  system elapsed 
  0.489   0.030   0.527 
Code
system.time(map5 <- terra::project(ras, newCRS) |> Cache())
   user  system elapsed 
  0.284   0.042   0.334 
Code
options(optOrig)

10.3 Where does the cache live?

By default, the cache lives in a temporary folder that does not persist between R sessions. To see where this folder is, run:

Code
options("reproducible.cachePath")
$reproducible.cachePath
[1] "/home/emcintir/SpaDES_book/Caching/cache"

For a persistent cache, we should change to a permanent folder path. This can be done in two ways:

Code
Cache(rnorm(1), 
      cachePath = "~/SpaDES_book/cache")
options("reproducible.cachePath")   ## still the temporary directory


options("reproducible.cachePath" = "~/SpaDES_book/cache")
Cache(rnorm(1))

The first sets the cache path for that single call only. The second, using options, sets the cachePath for all subsequent Cache calls.

In a SpaDES workflow context, the cache directory can be set by passing list(..., cachePath = <a_path>) [1] to the paths argument of setupProject, simInit or simInitAndSpaDES.
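When the option is changed only for part of a script, it can be restored afterwards, in the same way the chapter restores reproducible.useMemoise. A minimal sketch, assuming a hypothetical throwaway folder (demoCache) in place of a real project path:

```r
library(reproducible)

# Hypothetical throwaway folder standing in for a permanent cache path
demoCache <- tempfile("cache_demo_path_")

# options() returns the previous value, so it can be restored later
old <- options(reproducible.cachePath = demoCache)

val <- Cache(rnorm, 1)            # no cachePath argument: the option is used
entries <- showCache(demoCache)   # list what was written to that folder

options(old)                      # restore the previous cache path
```

Restoring the option at the end keeps later Cache calls pointed at the original location.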

10.4 Caching examples

10.4.1 Basic cache use with tags

We can add tags to identify the Cache call.

Code
ranNumsA <- Cache(rnorm, 4, userTags = c("random number generator"))

showCache(userTags = "random number generator")
             cacheId              tagKey                  tagValue
              <char>              <char>                    <char>
 1: adf21923cd1e50d0            function                     rnorm
 2: adf21923cd1e50d0            userTags   random number generator
 3: adf21923cd1e50d0            accessed 2026-05-26 14:23:33.99418
 4: adf21923cd1e50d0             inCloud                     FALSE
 5: adf21923cd1e50d0   elapsedTimeDigest          0.002181053 secs
 6: adf21923cd1e50d0           preDigest     .FUN:4f604aa46882b368
 7: adf21923cd1e50d0           preDigest     mean:c40c00762a0dac94
 8: adf21923cd1e50d0           preDigest        n:7eef4eae85fd9229
 9: adf21923cd1e50d0           preDigest       sd:853b1797f54b229c
10: adf21923cd1e50d0               class                   numeric
11: adf21923cd1e50d0         object.size                        80
12: adf21923cd1e50d0            fromDisk                     FALSE
13: adf21923cd1e50d0          resultHash                          
14: adf21923cd1e50d0 elapsedTimeFirstRun          0.001419544 secs
15: adf21923cd1e50d0            accessed 2026-05-26 14:25:12.21082
16: adf21923cd1e50d0            accessed 2026-05-26 15:14:20.18162
17: adf21923cd1e50d0            accessed 2026-05-26 15:43:55.72938
18: adf21923cd1e50d0            accessed 2026-05-26 16:29:38.38299
19: adf21923cd1e50d0            accessed 2026-05-26 16:54:02.87339
20: adf21923cd1e50d0            accessed 2026-06-10 13:58:57.45605
             cacheId              tagKey                  tagValue
              <char>              <char>                    <char>
                   createdDate
                        <char>
 1: 2026-05-26 14:23:33.996704
 2: 2026-05-26 14:23:33.996704
 3: 2026-05-26 14:23:33.996704
 4: 2026-05-26 14:23:33.996704
 5: 2026-05-26 14:23:33.996704
 6: 2026-05-26 14:23:33.996704
 7: 2026-05-26 14:23:33.996704
 8: 2026-05-26 14:23:33.996704
 9: 2026-05-26 14:23:33.996704
10: 2026-05-26 14:23:33.996704
11: 2026-05-26 14:23:33.996704
12: 2026-05-26 14:23:33.996704
13: 2026-05-26 14:23:33.996704
14: 2026-05-26 14:23:33.996704
15: 2026-05-26 14:25:12.210919
16: 2026-05-26 15:14:20.181741
17: 2026-05-26 15:43:55.729499
18: 2026-05-26 16:29:38.383091
19: 2026-05-26 16:54:02.873481
20: 2026-06-10 13:58:57.456166
                   createdDate
                        <char>
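Tags are mainly useful for picking one entry out of a cache that holds many. The following sketch assumes a hypothetical throwaway cache folder (demoCache) and a made-up tag (demoTag); it stores two entries and then queries only the tagged one:

```r
library(reproducible)

# Hypothetical throwaway cache folder, unique to this demo
demoCache <- tempfile("cache_demo_tags_")

tagged   <- Cache(rnorm, 4, userTags = "demoTag", cachePath = demoCache)
untagged <- Cache(runif, 4, cachePath = demoCache)

# Query by tag: only rows belonging to the tagged entry are returned
hits <- showCache(demoCache, userTags = "demoTag")
taggedIds <- unique(hits$cacheId)

# Without a tag filter, both entries are listed
everything <- showCache(demoCache)
allIds <- unique(everything$cacheId)
```

As in the output above, each cache entry spans several rows (one per tagKey), so counting unique cacheId values is a more reliable way to count entries than counting rows.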

10.5 Clean up cache

We can use either clearCache, keepCache, or cc to remove things from the Cache database. clearCache removes everything that matches the query. keepCache keeps everything that matches the query. cc removes the most recent entry (i.e., it is a shorthand for a commonly used option).

Code
# Two different functions
a <- rnorm(1) |> Cache()
b <- runif(1) |> Cache()

# Clear only the first one
clearCache(userTags = "rnorm", ask = FALSE)
a2 <- rnorm(1) |> Cache()
b2 <- runif(1) |> Cache()

# b2 and b are still identical; a and a2 are not
a == a2 # FALSE
b == b2 # TRUE

# This time keep ONLY the rnorm
keepCache(userTags = "rnorm", ask = FALSE)
a3 <- rnorm(1) |> Cache()
b3 <- runif(1) |> Cache()

# a2 and a3 are still identical; b2 and b3 are not
a2 == a3 # TRUE
b2 == b3 # FALSE

10.6 Nested Caching

Nested caching is when a function is cached inside an outer function that is itself cached. This is a critical element of working within a reproducible workflow. Ideally, at all points in a development cycle, it should be possible to get to any line of code, starting from the very first steps and running through everything up to that point, in less than a few seconds. If the workflow can be kept this fast, it can be rerun often and is therefore very likely to work at any point it is tested.

In the example here, we run an outer function that calls an inner function. If we decide to change the outer function along the way, and the inner function is unaffected, then we can still recover the cached version of the inner call.

Warning: this will not necessarily work the other way. If inner is changed, the cached result of outer is still returned, so we won’t notice the change until we clearCache and rerun.

Code
# Make 2 functions
inner <- function(mean) {
  d <- 1
  Cache(rnorm(3, mean = mean))
}
outer <- function(n) {
  Cache(inner(0.1))
}

# Call outer function
Cache(outer(n = 2))
[1] -0.08168648 -1.27280341  2.19653370
attr(,".Cache")
attr(,".Cache")$newCache
[1] FALSE

attr(,"tags")
[1] "cacheId:2d26a68d5154433e"
attr(,"callInCache")
[1] ""
Code
# Change outer function
outer <- function(n) {
  a <- 0.1
  Cache(inner(a))
}

# Still recovers inner 
Cache(outer(n = 2))
[1] -0.08168648 -1.27280341  2.19653370
attr(,".Cache")
attr(,".Cache")$newCache
[1] FALSE

attr(,"tags")
[1] "cacheId:554ba414000bf37f"
attr(,"callInCache")
[1] ""
Code
# BUT if we change inner, the change is not detected: the cached outer result is returned
inner <- function(mean) {
  d <- 2                        # Changed d
  Cache(rnorm(3, mean = mean))
}
Cache(outer(n = 2))
[1] -0.08168648 -1.27280341  2.19653370
attr(,".Cache")
attr(,".Cache")$newCache
[1] FALSE

attr(,"tags")
[1] "cacheId:554ba414000bf37f"
attr(,"callInCache")
[1] ""
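The consequence of that warning, and the usual remedy, can be sketched with two deterministic functions so that a stale result is easy to spot. This is an illustration only: the functions, the throwaway cache folder (demoCache) and the variable names are hypothetical, and whether the second call returns the stale value may depend on the version of reproducible.

```r
library(reproducible)

# Hypothetical throwaway cache folder, unique to this demo
demoCache <- tempfile("cache_demo_nested_")

inner <- function(x) x + 1
outer <- function(x) inner(x) * 2

r1 <- Cache(outer, 1, cachePath = demoCache)   # computes (1 + 1) * 2

# Change only inner; the cached outer entry may be returned unchanged
inner <- function(x) x + 10
r2 <- Cache(outer, 1, cachePath = demoCache)

# Clearing the demo cache forces a full rerun that uses the new inner
clearCache(demoCache, ask = FALSE)
r3 <- Cache(outer, 1, cachePath = demoCache)   # computes (1 + 10) * 2

results <- c(first = as.numeric(r1),
             afterChange = as.numeric(r2),
             afterClear = as.numeric(r3))
results
```

Comparing afterChange with afterClear shows whether the change to inner went unnoticed before the cache was cleared.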

10.7 Best practices

In general, we have found the use of Cache to be beneficial when used as follows:

  1. Slow functions get cached. “Slow” can be “slower than using Cache”.

  2. Regularly run clearCache(ask = FALSE), e.g., at the end of a day or week of work, then let the whole workflow rerun.

Some of our team regularly add:

  1. If using SpaDES, use cache at the event level, if the event is non-stochastic.

  2. Don’t cache a simInit call; instead, implement internal caching in the modules and use event caching. See vignette(topic = "iii-cache", package = "SpaDES.core").

10.8 See also

SpaDES.core vignette on caching

10.9 Barebones R script

Code
repos <- c("https://predictiveecology.r-universe.dev", getOption("repos"))
options(repos = repos)
if (!require("pak")) install.packages("pak")
pak::pak(c("Require", "SpaDES.project"), ask = FALSE)

library(SpaDES.project)
out <- setupProject(
  packages = c("reproducible", "terra"),
  options = list(repos = repos),
  paths = list(projectPath = "~/SpaDES_book/Caching")
)

library(reproducible)
reproducible::Cache(rnorm(1))

rnorm(1) |>
  reproducible::Cache()

reproducible::Cache(rnorm,
                    n = 1)

# Data setup
library(terra)
tmpDir <- file.path(tempdir(), "reproducible_examples", "Cache")
dir.create(tmpDir, recursive = TRUE)

x <- y <- 2001
ras <- terra::rast(terra::ext(0, x, 0, y), vals = sample(1:(x*y)), res = 1)
terra::crs(ras) <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +datum=WGS84"
newCRS <- "+init=epsg:4326" # A longlat crs

# Call slow operation: project
# No Cache
system.time(map1 <- terra::project(ras, newCRS)) # Warnings due to new PROJ

# With Cache -- slower the first time because the result is also saved to disk
system.time(map2 <- terra::project(ras, newCRS) |> Cache())

# faster the second time; improvement depends on size of object and time to run function
system.time(map3 <- terra::project(ras, newCRS) |> Cache())

optOrig <- options("reproducible.useMemoise" = TRUE)
system.time(map4 <- terra::project(ras, newCRS) |> Cache())

system.time(map5 <- terra::project(ras, newCRS) |> Cache())
options(optOrig)

options("reproducible.cachePath")



ranNumsA <- Cache(rnorm, 4, userTags = c("random number generator"))

showCache(userTags = "random number generator")

# # Two different functions
# a <- rnorm(1) |> Cache()
# b <- runif(1) |> Cache()
# 
# # Clear only the first one
# clearCache(userTags = "rnorm", ask = FALSE)
# a2 <- rnorm(1) |> Cache()
# b2 <- runif(1) |> Cache()
# 
# # b2 and b are still identical; a and a2 are not
# a == a2 # FALSE
# b == b2 # TRUE
# 
# # This time keep ONLY the rnorm
# keepCache(userTags = "rnorm", ask = FALSE)
# a3 <- rnorm(1) |> Cache()
# b3 <- runif(1) |> Cache()
# 
# # a2 and a3 are still identical; b2 and b3 are not
# a2 == a3 # TRUE
# b2 == b3 # FALSE

# Make 2 functions
inner <- function(mean) {
  d <- 1
  Cache(rnorm(3, mean = mean))
}
outer <- function(n) {
  Cache(inner(0.1))
}

# Call outer function
Cache(outer(n = 2))

# Change outer function
outer <- function(n) {
  a <- 0.1
  Cache(inner(a))
}

# Still recovers inner 
Cache(outer(n = 2))

# BUT if we change inner, the change is not detected: the cached outer result is returned
inner <- function(mean) {
  d <- 2                        # Changed d
  Cache(rnorm(3, mean = mean))
}
Cache(outer(n = 2))

  1. where ... are other paths, like modulePath.