14  setupProject for nimble workflows

Author

Eliot McIntire

Published

June 12, 2026

See the Barebones R script (Section 14.6) for the code shown in this chapter.

Most SpaDES workflows begin life as an ordinary R script: load some packages, set some paths, download a few things, read a shapefile, initialize a simulation, run it. That script works – on the machine where it was written. Moving it to another computer (a colleague’s, a cluster, the version of your own machine six months from now) is where the trouble starts: a missing package, a hard-coded path that doesn’t exist, a data file that was never in the repository.

This chapter takes one such script and translates it, step by step, into the equivalent SpaDES.project::setupProject() call. The simulation it runs is identical. What changes is that the nimble version is reproducible and portable by construction: everything it needs is described declaratively, packages are reconciled in one place, and every input starts from a URL, not a local file path.

14.1 The starting point: a hand-rolled script

Here is an actual example that was written by a student. It is a “works on my machine” setup for a small LandR Biomass run. It is not wrong – it is just doing a lot of bookkeeping by hand, and most of that bookkeeping is exactly what breaks elsewhere.

Code
## 1) Packages -------------------------------------------------------
## to be portable, we must INSTALL what is missing ...
if (!requireNamespace("SpaDES.core",    quietly = TRUE)) install.packages("SpaDES.core")
if (!requireNamespace("SpaDES.project", quietly = TRUE)) install.packages("SpaDES.project")
if (!requireNamespace("terra",          quietly = TRUE)) install.packages("terra")
if (!requireNamespace("sf",             quietly = TRUE)) install.packages("sf")

## ... and then still LOAD it (installing does not attach a package)
library(SpaDES.core)
library(SpaDES.project)
library(terra)
library(sf)

## 2) Paths ----------------------------------------------------------
setPaths(
  modulePath  = "~",
  inputPath   = "~",
  outputPath  = "~",
  cachePath   = "~",
  scratchPath = "~"
)

## clean cache to avoid previous mismatches
unlink(getPaths()$cachePath, recursive = TRUE)
dir.create(getPaths()$cachePath, showWarnings = FALSE)

## 3) Download modules ----------------------------------------------
SpaDES.project::getModule(
  modules = c(
    "PredictiveEcology/Biomass_borealDataPrep",
    "PredictiveEcology/Biomass_speciesData",
    "PredictiveEcology/Biomass_core"
  ),
  modulePath = getPaths()$modulePath,
  overwrite  = TRUE
)

## 4) Download study area from Google Drive -------------------------
shapeZipID <- "1gycTXyZzgXIUGM6dyAJbmFxq8TOAm-Ik"
shapeDir   <- file.path(getPaths()$inputPath, "studyArea")
dir.create(shapeDir, recursive = TRUE, showWarnings = FALSE)
shapeZip   <- file.path(shapeDir, "studyArea.zip")

if (!file.exists(shapeZip)) {
  download.file(
    url      = paste0("https://drive.google.com/uc?export=download&id=", shapeZipID),
    destfile = shapeZip,
    mode     = "wb"
  )
}

## unzip ------------------------------------------------------------
if (length(list.files(shapeDir, pattern = "\\.shp$", recursive = TRUE)) == 0) {
  unzip(shapeZip, exdir = shapeDir)
}

## read shapefile ---------------------------------------------------
shpFile <- list.files(shapeDir, pattern = "\\.shp$",
                      recursive = TRUE, full.names = TRUE)
stopifnot(length(shpFile) == 1)

studyArea <- terra::vect(shpFile)
studyArea <- terra::project(studyArea, "EPSG:3978")

## 5) Simulation settings -------------------------------------------
times   <- list(start = 1, end = 1)
modules <- c("Biomass_borealDataPrep", "Biomass_speciesData", "Biomass_core")
objects <- list(studyArea = studyArea, studyAreaLarge = studyArea)

## 6) Initialize and run --------------------------------------------
sim <- simInit(times = times, modules = modules,
               objects = objects, paths = getPaths())
sim <- spades(sim)

14.2 The same workflow with setupProject

The same run, expressed declaratively:

Code
if (!require("pak")) install.packages("pak")
pak::pak(c("PredictiveEcology/Require@development",
           "PredictiveEcology/SpaDES.project@development"), ask = FALSE)

out <- SpaDES.project::setupProject(
  modules = c(
    "PredictiveEcology/Biomass_borealDataPrep",
    "PredictiveEcology/Biomass_speciesData",
    "PredictiveEcology/Biomass_core"
  ),
  overwrite = TRUE,
  times = list(start = 1, end = 1),
  packages = c("PredictiveEcology/LandR@development (>= 1.2.0.9002)"),
  shapeZipID = "1gycTXyZzgXIUGM6dyAJbmFxq8TOAm-Ik",
  studyArea = reproducible::prepInputs(url = shapeZipID, fun = terra::vect) |>
    terra::project("EPSG:3978"),
  studyAreaLarge = studyArea
)

finalOut <- SpaDES.core::simInitAndSpades2(out)

Roughly seventy lines become about twenty, but the point is not brevity for its own sake – it is what each line now does.

14.3 What changed, step by step

14.3.1 Packages: installing what’s missing, not just loading

A bare library(SpaDES.core) is not portable: it fails on any machine where the package isn’t already installed. So for the hand-rolled script to actually run for someone else, it cannot simply library() – it has to install what is missing and then load it:

Code
## install if missing ...
if (!requireNamespace("SpaDES.core",    quietly = TRUE)) install.packages("SpaDES.core")
if (!requireNamespace("SpaDES.project", quietly = TRUE)) install.packages("SpaDES.project")
if (!requireNamespace("terra",          quietly = TRUE)) install.packages("terra")
if (!requireNamespace("sf",             quietly = TRUE)) install.packages("sf")
## ... and still load, because installing does not attach
library(SpaDES.core)
library(SpaDES.project)
library(terra)
library(sf)

Note that the install and the load are separate steps: installing a package does not attach it. This is why the popular one-liner if (!require(x)) install.packages(x) is subtly broken on a fresh machine – require() only loads x when it is already installed, so on the very run where it installs x, it never loads it. Getting even this small dance right, for every package, is its own little chore.
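The install-then-load dance can be wrapped in a small helper. The sketch below is illustrative only: ensure_attached() is a hypothetical name, not a function from any package, and it is demonstrated on packages that ship with R so that nothing is actually installed.

```r
## require() returns FALSE (with a warning) when a package is not installed;
## it neither installs nor attaches anything in that case.
ok <- suppressWarnings(
  require("aPackageThatDoesNotExist", character.only = TRUE, quietly = TRUE)
)

## Hypothetical helper: install each package if it is missing, then attach it.
ensure_attached <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    ## runs whether or not an install was needed
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

## Demonstrated on base packages, so the install branch is never taken here
ensure_attached(c("stats", "tools"))
attached <- c("package:stats", "package:tools") %in% search()
```

Because the library() call sits outside the if, the package is attached on the first run as well as on every later one, which is exactly what the one-liner misses.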

That aside, the guard-and-load approach has three deeper weaknesses, and they grow with the project:

  1. Repetitive – two lines per package, by hand, kept in sync forever.
  2. Wrong source – install.packages() uses whatever repositories happen to be configured, so the PredictiveEcology packages (which live on the r-universe repo) and any GitHub-only or version-pinned packages won’t be found without extra setup.
  3. Incomplete – it only installs your packages. It has no idea which packages the modules need, or at what versions, so those failures surface one at a time, deep into a run.
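For the second weakness, the usual manual workaround is to add the extra repository to the repos option before installing. The sketch below assumes that the PredictiveEcology r-universe is served at https://predictiveecology.r-universe.dev and uses a generic CRAN mirror; both URLs are assumptions to check for your own setup, and the snippet itself installs nothing.

```r
## Remember the current setting so it can be restored afterwards
old_repos <- getOption("repos")

## Assumed repository URLs; adjust to whatever your project actually uses
options(repos = c(
  PE   = "https://predictiveecology.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
))
repos_now <- getOption("repos")

## install.packages("SpaDES.core")  # would now search both repositories

## Put the original setting back
options(repos = old_repos)
```

This only changes where packages are looked up. It still says nothing about GitHub branches, version pins, or what the modules themselves need.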

The nimble version replaces all of this with the packages argument to setupProject(), which installs and loads the requested packages into a project-specific library. It still needs a tiny bootstrap – the same if (!require(...)) idiom – but only to install the one tool that does the installing. The idiom’s flaw does not bite here, because pak is then called with its namespace prefix (pak::pak()) and never needs to be attached:

Code
if (!require("pak")) install.packages("pak")
pak::pak(c("PredictiveEcology/Require@development",
           "PredictiveEcology/SpaDES.project@development"), ask = FALSE)

After that, everything is declarative. And the important part is where the resolution happens: because setupProject() also knows the modules, it reads each module’s own package requirements (declared in their metadata) and reconciles them together with the packages you ask for – in one step, before anything runs. So a module that needs a particular version of LandR (note the @development (>= 1.2.0.9002) pin) and your own analysis needs are satisfied at once, rather than discovered one failed library() call at a time. We only list LandR explicitly; SpaDES.core, terra and the rest arrive as dependencies of the requested modules and packages.

14.3.2 Paths: from setPaths() + manual cache cleanup to defaults

The script sets five paths by hand and then deletes and recreates the cache to “avoid previous mismatches”. setupProject() establishes a sensible project directory layout for you (override it with a paths argument when you need to). And deleting the cache to avoid mismatches is treating a symptom: Cache keys on its inputs, so a correct cache doesn’t go stale (see Introduction to Cache 10) – there is nothing to clear.
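To see why a cache keyed on its inputs has nothing to clear, consider this toy memoiser. It is a stand-in written for illustration, not how reproducible::Cache is implemented; make_cached() and slow_square() are hypothetical names.

```r
## Toy input-keyed cache: the key is built from the arguments of the call
make_cached <- function(fun) {
  store <- list()
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (is.null(store[[key]])) {
      store[[key]] <<- fun(...)   # first time with these inputs: compute
    }
    store[[key]]                  # otherwise: reuse the stored result
  }
}

calls <- 0L
slow_square <- function(x) {
  calls <<- calls + 1L            # count how often real work is done
  x^2
}
cached_square <- make_cached(slow_square)

r1 <- cached_square(4)  # computed
r2 <- cached_square(4)  # same input, reused
r3 <- cached_square(5)  # new input, new key, computed
```

A changed input produces a different key, so the old entry is simply never looked up again. Deleting the cache by hand only throws away results that were still valid.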

14.3.3 Getting the modules

getModule() becomes the modules argument. setupProject() downloads the listed modules from their repositories as part of setup. (How modules let you compose a workflow is the subject of the other chapters in this section; here we are only moving the download step.)

14.3.4 Data: from download.file() + unzip() + vect() to prepInputs()

This is the heart of the translation. The manual version spends some twenty-five lines turning a Google Drive id into an R object: build a URL, create a directory, conditionally download, conditionally unzip, glob for the .shp, assert there is exactly one, read it, reproject it. Every one of those steps is a place to special-case, to forget, or to get subtly wrong on a different machine.

reproducible::prepInputs() collapses all of it:

Code
studyArea = reproducible::prepInputs(url = shapeZipID, fun = terra::vect) |>
  terra::project("EPSG:3978")

prepInputs() downloads from the url, unzips if needed, locates and reads the file with fun, and – because it is built on Cache – does the download and processing once, reusing the result on every subsequent run (see prepInputs 12). The decisive shift is that the study area now starts from a URL, not a local file path. Nothing in the workflow assumes a file already exists on disk, so it runs the same way on any machine, for anyone, the first time.
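The "download once, then reuse" behaviour can be sketched in a few lines of base R. This is a deliberately reduced illustration, not the prepInputs() implementation (which also handles archives and other details): fetch_once() is a hypothetical helper, and a temporary CSV served through a file:// URL stands in for the remote shapefile so that the example needs no network. The file:// construction assumes a Unix-like path.

```r
## Hypothetical helper: download only if the local copy is missing, then read
fetch_once <- function(url, destfile, fun, ...) {
  if (!file.exists(destfile)) {
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
  }
  fun(destfile, ...)
}

## Stand-in for a remote file: a small CSV reachable through a file:// URL
src <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, area = c(10, 20, 30)), src, row.names = FALSE)
src_url <- paste0("file://", normalizePath(src, winslash = "/"))

dest   <- file.path(tempfile("inputs"), "studyArea.csv")
first  <- fetch_once(src_url, dest, read.csv)  # downloads, then reads
unlink(src)                                    # the "remote" copy disappears
second <- fetch_once(src_url, dest, read.csv)  # served from the local copy
```

The caller only ever names the URL and the reader function. Whether the file is already on disk is the helper's concern, which is the same division of labour prepInputs() offers.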

Two smaller things worth noticing:

  • shapeZipID is passed as an argument to setupProject() and then referenced by studyArea. Arguments are evaluated in order, and each becomes available to those after it – which is also why studyAreaLarge = studyArea works. The whole project description stays in one self-contained call.
  • The reprojection (terra::project(..., "EPSG:3978")) is piped directly onto the loaded object, so “the study area” is defined as “this remote file, read and reprojected” – a recipe, not a file.
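The in-order evaluation described in the first bullet can be imitated in base R. The sketch below is only an illustration of the idea, not how setupProject() is implemented; eval_in_order() is a hypothetical function and "abc123" is a placeholder id.

```r
## Evaluate named arguments one at a time, making each result visible
## to the expressions that come after it.
eval_in_order <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated argument expressions
  env   <- new.env(parent = parent.frame())     # accumulates results so far
  for (nm in names(exprs)) {
    assign(nm, eval(exprs[[nm]], envir = env), envir = env)
  }
  mget(names(exprs), envir = env)
}

res <- eval_in_order(
  shapeZipID = "abc123",
  label      = paste0("area-", shapeZipID),  # uses the argument above
  labelLarge = label                         # uses the one before it
)
```

Here res$label and res$labelLarge both end up as "area-abc123", mirroring how studyAreaLarge = studyArea can refer to an argument defined earlier in the same call.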

14.3.5 Settings and objects

times is passed straight through. The objects list disappears: studyArea and studyAreaLarge are simply named arguments now, and setupProject() routes named objects, times, paths, params, modules and packages to the right place for you.

14.3.6 Running: from simInit() + spades() to simInitAndSpades2()

setupProject() returns a list of fully-prepared arguments. The 2 variant of simInitAndSpades() accepts that list directly, so the two-step simInit() then spades() becomes a single simInitAndSpades2(out).

14.3.7 Why not just put everything in the .GlobalEnv?

The hand-rolled script creates studyArea, times, objects and the rest as ordinary variables in your global workspace, the .GlobalEnv. For a single, short script that is harmless. But the .GlobalEnv is special in R: it sits at the top of the search path, so everything can see it – any function, any module, anywhere, can reach an object that lives there.

That sounds convenient, and that is exactly the problem. When a module goes looking for an object and doesn’t find it where it should – in the simList – R does not stop with an error. It keeps searching outward and, if a variable of the same name happens to be sitting in the .GlobalEnv, it quietly uses that one instead and carries on. The run appears to “work”, but for the wrong reason: a genuinely missing or misnamed input has been silently papered over by a leftover global.

This has bitten us so many times that we treat it as a rule of thumb: the .GlobalEnv is fine for small, simple projects, but as soon as a project grows past a single script it becomes a common source of confusing, hard-to-reproduce bugs.

setupProject() sidesteps the trap by design. The objects you pass (studyArea, studyAreaLarge, …) are routed into the simList rather than scattered across the global workspace, so a module that asks for an input it wasn’t given fails loudly, where you can see and fix it, instead of silently borrowing whatever the .GlobalEnv happened to contain.
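The silent fallback is ordinary R scoping and can be reproduced without SpaDES at all. In this minimal sketch, plain functions stand in for modules and a list stands in for the simList; lenient_module() and strict_module() are hypothetical names.

```r
## A leftover object sitting in the workspace
studyArea <- "leftover global"

## Stand-in for a module that forgets to take its input from what it was given:
## `studyArea` is a free variable, so R looks it up in the enclosing environments.
lenient_module <- function(inputs) {
  studyArea
}
silent <- lenient_module(list())   # no error: quietly picks up the leftover

## Stand-in for a module that only accepts inputs it was explicitly handed
strict_module <- function(inputs) {
  if (is.null(inputs[["studyArea"]])) {
    stop("input 'studyArea' was not supplied")
  }
  inputs[["studyArea"]]
}
loud <- tryCatch(
  strict_module(list()),
  error = function(e) conditionMessage(e)
)
```

The first call "works" with the wrong object; the second stops with a message that names the missing input, which is the behaviour you want once a project has more than one script.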

14.4 Why is this considered “nimble”?

The translated script is nimble because it can move:

  • Portable – every input starts from a URL or repository, so there are no machine-specific file paths to fix up.
  • Self-installing – packages (yours and the modules’) are resolved and installed together, up front.
  • Reproducible and fast on re-run – prepInputs() and Cache avoid re-downloading and re-computing unchanged steps.
  • Declarative – the entire project is one call describing what is needed, not a sequence of imperative bookkeeping that must succeed in order.
  • Idempotent (rerun-tolerant) – running the same code again gives the same answer, only faster.

These same properties are what let the richer projects in the following chapters chain many modules together without the setup collapsing under its own weight.

14.5 See also

Workflows with setupProject

reproducible::prepInputs for Data 12

Introduction to Cache 10

?SpaDES.project::setupProject

14.6 Barebones R script

Code
## 1) Packages -------------------------------------------------------
## to be portable, we must INSTALL what is missing ...
if (!requireNamespace("SpaDES.core",    quietly = TRUE)) install.packages("SpaDES.core")
if (!requireNamespace("SpaDES.project", quietly = TRUE)) install.packages("SpaDES.project")
if (!requireNamespace("terra",          quietly = TRUE)) install.packages("terra")
if (!requireNamespace("sf",             quietly = TRUE)) install.packages("sf")

## ... and then still LOAD it (installing does not attach a package)
library(SpaDES.core)
library(SpaDES.project)
library(terra)
library(sf)

## 2) Paths ----------------------------------------------------------
setPaths(
  modulePath  = "~",
  inputPath   = "~",
  outputPath  = "~",
  cachePath   = "~",
  scratchPath = "~"
)

## clean cache to avoid previous mismatches
unlink(getPaths()$cachePath, recursive = TRUE)
dir.create(getPaths()$cachePath, showWarnings = FALSE)

## 3) Download modules ----------------------------------------------
SpaDES.project::getModule(
  modules = c(
    "PredictiveEcology/Biomass_borealDataPrep",
    "PredictiveEcology/Biomass_speciesData",
    "PredictiveEcology/Biomass_core"
  ),
  modulePath = getPaths()$modulePath,
  overwrite  = TRUE
)

## 4) Download study area from Google Drive -------------------------
shapeZipID <- "1gycTXyZzgXIUGM6dyAJbmFxq8TOAm-Ik"
shapeDir   <- file.path(getPaths()$inputPath, "studyArea")
dir.create(shapeDir, recursive = TRUE, showWarnings = FALSE)
shapeZip   <- file.path(shapeDir, "studyArea.zip")

if (!file.exists(shapeZip)) {
  download.file(
    url      = paste0("https://drive.google.com/uc?export=download&id=", shapeZipID),
    destfile = shapeZip,
    mode     = "wb"
  )
}

## unzip ------------------------------------------------------------
if (length(list.files(shapeDir, pattern = "\\.shp$", recursive = TRUE)) == 0) {
  unzip(shapeZip, exdir = shapeDir)
}

## read shapefile ---------------------------------------------------
shpFile <- list.files(shapeDir, pattern = "\\.shp$",
                      recursive = TRUE, full.names = TRUE)
stopifnot(length(shpFile) == 1)

studyArea <- terra::vect(shpFile)
studyArea <- terra::project(studyArea, "EPSG:3978")

## 5) Simulation settings -------------------------------------------
times   <- list(start = 1, end = 1)
modules <- c("Biomass_borealDataPrep", "Biomass_speciesData", "Biomass_core")
objects <- list(studyArea = studyArea, studyAreaLarge = studyArea)

## 6) Initialize and run --------------------------------------------
sim <- simInit(times = times, modules = modules,
               objects = objects, paths = getPaths())
sim <- spades(sim)

if (!require("pak")) install.packages("pak")
pak::pak(c("PredictiveEcology/Require@development",
           "PredictiveEcology/SpaDES.project@development"), ask = FALSE)

out <- SpaDES.project::setupProject(
  modules = c(
    "PredictiveEcology/Biomass_borealDataPrep",
    "PredictiveEcology/Biomass_speciesData",
    "PredictiveEcology/Biomass_core"
  ),
  overwrite = TRUE,
  times = list(start = 1, end = 1),
  packages = c("PredictiveEcology/LandR@development (>= 1.2.0.9002)"),
  shapeZipID = "1gycTXyZzgXIUGM6dyAJbmFxq8TOAm-Ik",
  studyArea = reproducible::prepInputs(url = shapeZipID, fun = terra::vect) |>
    terra::project("EPSG:3978"),
  studyAreaLarge = studyArea
)

finalOut <- SpaDES.core::simInitAndSpades2(out)

## install if missing ...
if (!requireNamespace("SpaDES.core",    quietly = TRUE)) install.packages("SpaDES.core")
if (!requireNamespace("SpaDES.project", quietly = TRUE)) install.packages("SpaDES.project")
if (!requireNamespace("terra",          quietly = TRUE)) install.packages("terra")
if (!requireNamespace("sf",             quietly = TRUE)) install.packages("sf")
## ... and still load, because installing does not attach
library(SpaDES.core)
library(SpaDES.project)
library(terra)
library(sf)

if (!require("pak")) install.packages("pak")
pak::pak(c("PredictiveEcology/Require@development",
           "PredictiveEcology/SpaDES.project@development"), ask = FALSE)

studyArea = reproducible::prepInputs(url = shapeZipID, fun = terra::vect) |>
  terra::project("EPSG:3978")