8 Modules, Events and Functions

Author

Eliot McIntire

Published

July 5, 2024

See Barebones R script for the code shown in this chapter

A module is a collection of R scripts structured in a particular way. Its structure is very similar to that of an R package, in the sense that it defines a collection of functions to be used, so it will look familiar if you know what packages look like.

Events are ways to name chunks of code that can be run in a particular sequence or at a particular time.

Functions are the basic building blocks in R and other languages. We work with all three of these to make robust and reusable workflows.

Modules encapsulate events, which encapsulate functions.

8.1 Modules

Modules include the following elements:

a function call to defineModule that defines the metadata (mandatory)
a function definition for doEvent.moduleName (mandatory)

There are many optional pieces too. The default template produces many of these optional pieces, which means it is “noisy”. This also means we can ignore most of it for now.

Optional pieces include:

other functions in the R folder or the main moduleName.R file
a documentation file (moduleName.Rmd)

These are all contained within a file structure like this, with other optional files:

/moduleRepository
  |_ moduleName/
      |_ R/                     # contains additional/optional .R (helper) files
      |_ data/                  # directory for all included data
          |_ CHECKSUMS.txt      # contains checksums for data files
      |_ tests/                 # contains (optional) unit tests for module code
      |_ citation.bib           # bibtex citation for the module
      |_ LICENSE.txt            # describes module's legal usage
      |_ moduleName.R           # module code file (incl. metadata)
      |_ moduleName.Rmd         # documentation, usage info, etc.
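
To see what each piece of this layout is, it can help to build the skeleton by hand. The sketch below does so with base R in a temporary directory. The module name `My_linear_model` is borrowed from the example later in this chapter, and every file is created empty. In practice, `SpaDES.core::newModule()` generates this structure for you with template content, so this is only an illustration.

```r
# Build an empty module skeleton by hand (illustration only).
moduleRepository <- file.path(tempdir(), "moduleRepository")
moduleName <- "My_linear_model"  # assumed name, taken from the example later on
modulePath <- file.path(moduleRepository, moduleName)

# Sub-directories shown in the tree above
for (d in c("R", "data", "tests")) {
  dir.create(file.path(modulePath, d), recursive = TRUE, showWarnings = FALSE)
}

# Files shown in the tree above; only <moduleName>.R is mandatory
moduleFiles <- file.path(
  modulePath,
  c(file.path("data", "CHECKSUMS.txt"),
    "citation.bib",
    "LICENSE.txt",
    paste0(moduleName, ".R"),
    paste0(moduleName, ".Rmd"))
)
created <- file.create(moduleFiles)

# List what was created, relative to the module folder
moduleContents <- list.files(modulePath, recursive = TRUE, include.dirs = TRUE)
print(moduleContents)
```

Only `My_linear_model.R` needs real content for the module to work; the remaining files and folders can be filled in as the module grows.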

To make a new module, see: Chapter 4.

How to decide whether to make a module, or expand an existing one?

There’s no fixed rule about when it is best to create a module vs. expand another existing module. This decision will depend on various factors, but at least two questions should be considered when making a decision:

Is the additional code a very small operation, or something that will never be disassociated from a current module? If the answer is “yes”, then perhaps expanding the existing module is the way to go.
Can we foresee that swapping between algorithms/approaches will be desired in the future? E.g., knowing that an ordinary least squares regression could, in the future, be swapped for a mixed effects model to compare model performance (i.e., both approaches are valid and we eventually want to have both). If the answer is “yes”, then creating a new module will allow greater flexibility in the future.

We usually see “data preparation” and “model calibration” steps as semi-independent from “prediction/simulation steps” and have typically created “data preparation”/“calibration” modules to keep these steps separate, and to allow swapping between different approaches to preparing inputs for a model.

8.2 Events

Events are named chunks of code that can be scheduled to be run once or many times. These are scheduled with scheduleEvent(). There are several commonly occurring module types that can be grouped based on the events that they contain (see Section 8.4).
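
The difference between an event that runs once and one that runs many times can be shown with a toy event queue in base R. This is a conceptual sketch only, not how SpaDES.core is implemented: the helper `scheduleToy()`, the event names `init` and `grow`, and the end time are all hypothetical.

```r
# Toy event queue (illustration only; not how SpaDES.core is implemented).
queue <- data.frame(time = numeric(), event = character(),
                    stringsAsFactors = FALSE)

# Hypothetical helper: add an event and keep the queue sorted by time
scheduleToy <- function(queue, time, event) {
  queue <- rbind(queue, data.frame(time = time, event = event,
                                   stringsAsFactors = FALSE))
  queue[order(queue$time), , drop = FALSE]
}

endTime <- 3
queue <- scheduleToy(queue, 0, "init")
completed <- character()

while (nrow(queue) > 0) {
  current <- queue[1, ]                 # next event in time order
  queue <- queue[-1, , drop = FALSE]    # remove it from the queue
  completed <- c(completed, paste0(current$event, "@", current$time))

  if (current$event == "init") {
    # "init" runs once and schedules the first "grow" event
    queue <- scheduleToy(queue, current$time + 1, "grow")
  } else if (current$event == "grow" && current$time < endTime) {
    # "grow" reschedules itself, so it runs many times
    queue <- scheduleToy(queue, current$time + 1, "grow")
  }
}

print(completed)
```

In a real module the rescheduling step would be a call such as `sim <- scheduleEvent(sim, time(sim) + 1, "moduleName", "eventType")` inside the event code, and SpaDES would manage the queue itself.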

To see how to schedule events, see: Chapter 9.

8.3 Functions

Essentially, everything in R is or uses functions. Modules define events (doEvent.moduleName), events call specific functions (we sometimes call the main function that an event calls an “event function”, though other functions can be nested within it), and functions can be defined within modules, R packages, or user code contained in .R scripts inside the module’s R/ folder.

We do not define these functions in the .GlobalEnv.

8.3.1 Adding functions

Any function can be written and defined in any .R file in the R folder, just like an R package. They can also be placed in the main module script (the one named <moduleName>.R). The default template includes several functions.

In this example, instead of writing the code that does what we want inside the doEvent function itself, let’s use the Init function. We move the code from the “init” event into a function, then call that function.

doEvent.My_linear_model.init <- function(sim, eventTime, eventType, priority) {
  sim <- Init(sim)
  return(invisible(sim))
}

Init <- function(sim) {
  y <- sim$x + rnorm(10)
  sim$model <- lm(y ~ sim$x)
  return(invisible(sim))
}

We can extend this to any number of functions. Notice that functions can have any arguments; they don’t have to take sim. The critical point to retain is that the doEvent function must return sim. We can also write documentation using roxygen2 tags (though converting this to normal R documentation is still experimental within a SpaDES module).

Copy these three functions to your module, replacing the existing functions with these ones (for doEvent.My_linear_model.init and Init). generateY is completely new, so it won’t replace anything.

doEvent.My_linear_model.init <- function(sim, eventTime, eventType, priority) {
  sim <- Init(sim)
  return(invisible(sim))
}

# Use this inside the `doEvent.My_linear_model.init` function
Init <- function(sim) {
  y <- generateY(sim$x)
  sim$model <- lm(y ~ sim$x)
  return(invisible(sim))
}

#' A function that generates random error around an `x`
#' 
#' @param x Any numeric vector of length 10 
generateY <- function(x) {
  x + rnorm(10)
}
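
Because `Init` and `generateY` are ordinary R functions, they can be tried out before running them inside a module. The sketch below repeats the two definitions so that it is self-contained, and uses a plain environment as a stand-in for `sim`. That stand-in, the input `1:10`, and the seed are assumptions made only for this quick check; in a real workflow `sim` is created by SpaDES.

```r
# Definitions repeated from the module code above
generateY <- function(x) {
  x + rnorm(10)
}

Init <- function(sim) {
  y <- generateY(sim$x)
  sim$model <- lm(y ~ sim$x)
  return(invisible(sim))
}

# Stand-in for `sim`: a plain environment (an assumption, for testing only)
sim <- new.env()
sim$x <- 1:10   # hypothetical input of length 10, as generateY expects

set.seed(123)   # make the random error reproducible
sim <- Init(sim)

# The fitted model has two coefficients: an intercept and a slope for x
print(coef(sim$model))
```

Checking functions in isolation like this is one reason to keep the doEvent function thin and put the real work in named functions.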

8.4 Module types

SpaDES doesn’t have explicit module types. Rather, by convention, we associate different modules with the generic things they do. A fairly comprehensive list of the module types that we create is below. We will revisit this later in [Part 3](ContinuousWorkflows.qmd).

8.4.1 Static

Static modules can be defined as modules that “run once”. This means that they may have only one event, or a sequence of events that occur one after the other with no rescheduling.

These could include:

Data preparation modules - one (maybe just the “init” event) or few events and their primary goal is to get and deal with data;
GIS modules that do a number of GIS operations to get data into the necessary formats;
Data Visualization modules that specialize in creating a set of visuals from a known set of inputs.

8.4.2 Dynamic

Dynamic modules are modules that have events that recur. There are at least two types of such modules: those that have cyclic dependencies, i.e., their outputs are also their inputs (possibly with other modules in between), and those that do not. Examples include:

Landscape simulation modules (e.g., wildfire, vegetation change)
Wildlife population modules with Markov dependency (e.g., population matrix models)
Wildlife population modules without Markov-dependency (e.g., population models that only depend on habitat covariates)
Data visualization modules that are run repeatedly, e.g., annually after other modules.
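
The contrast between population modules with and without Markov dependency can be sketched in a few lines of base R. All names and numbers here (`leslie`, `habitatQuality`, the initial abundances, the factor of 50) are hypothetical and chosen only to make the arithmetic easy to follow.

```r
years <- 5

# With Markov dependency: next year's state is computed from this year's state.
# Hypothetical two-stage projection matrix (juveniles, adults).
leslie <- matrix(c(0.0, 1.5,
                   0.5, 0.8), nrow = 2, byrow = TRUE)
N <- matrix(NA_real_, nrow = 2, ncol = years + 1)
N[, 1] <- c(10, 10)                     # initial abundances
for (t in seq_len(years)) {
  N[, t + 1] <- leslie %*% N[, t]       # output at t becomes input at t + 1
}

# Without Markov dependency: each year's abundance depends only on that
# year's habitat covariate, not on the previous year's abundance.
habitatQuality <- c(0.2, 0.4, 0.6, 0.8, 1.0)  # hypothetical covariate per year
abundance <- 50 * habitatQuality              # hypothetical linear relationship

print(round(N, 2))
print(abundance)
```

In the first case the years must be computed in order, which is the cyclic dependency described above; in the second, each year could be computed independently of the others.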

In Barros et al. (2023) we classified modules with respect to what the modules try to accomplish:

“data/calibration modules” prepare model inputs and parameters
“prediction/simulation modules” generate predictions using either static or dynamic mechanisms
“validation modules” evaluate predictions against independent data.

There are no strict rules to classify a SpaDES module, just as there are no strict rules to classify an R script.

8.5 Where modules “live”

Modules can live only locally (e.g., on a single computer), but in the spirit of the PERFICT principles (see Section 2.1 in the Introduction) we like our modules to be shareable.

The best way to share modules is to host them on GitHub repositories, as SpaDES tools have been developed to download modules from GitHub.

Most SpaDES modules known to us are hosted in their own GitHub repositories (preferred), or bundled in GitHub repositories with several modules (e.g. Castor modules and SCFM modules).