5  Module Files and Metadata

Author

Eliot McIntire

Published

June 20, 2024

See Barebones R script for the code shown in this chapter

5.1 Continue example – a linear model

We will start by thinking about metadata: What is metadata?

Slightly modifying the example, we remove the line with x <- rnorm(10). The code chunk no longer works, because the next line needs x in order to run. Let us examine the following code chunk and first ask: what are the inputs and the outputs?

Code
# create some data
y <- x + rnorm(10)
# fit a linear model
model <- lm(y ~ x)

5.2 Input Expectations, Output Creations and Required Packages

5.2.1 Inputs and Outputs

We use the terms expectsInput and createsOutput to describe the inputs and outputs in the metadata. These terms make it clear that the metadata do not specify where the module will get its inputs; it does not matter where they come from. They could come from one of three sources: a user, another module, or defaults that the developer sets up. Likewise, the module specifies which outputs it creates, without specifying “for what other module”.

The inputs to this chunk are just one: the object x. This code will not work (i.e., it will cause an error) if x is not defined. We can say that this code chunk “expects” x as an input.

The outputs are y and model. We can say that this code chunk “creates” y and model as outputs. However, we had said in Chapter 4 that we would only be interested in keeping model. So, we will continue with only one output, model.
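The bookkeeping above can also be checked with base R. The following is a minimal sketch, not part of SpaDES: it lists the variable names the chunk mentions with all.vars(), runs the chunk in a scratch environment where only x is supplied, and then reports what was created. The names chunk, varsUsed, scratch, created and expected are chosen here for illustration only.

```r
# the code chunk from above, stored unevaluated
chunk <- quote({
  y <- x + rnorm(10)
  model <- lm(y ~ x)
})

# every variable name (not function name) that the chunk mentions
varsUsed <- all.vars(chunk)

# a scratch environment in which only `x` is supplied
scratch <- new.env()
scratch$x <- rnorm(10)
eval(chunk, envir = scratch)

# objects the chunk created, and objects it needed but did not create
created <- setdiff(ls(scratch), "x")
expected <- setdiff(varsUsed, created)

created   # "model" "y"
expected  # "x"
```

Here created corresponds to what the chunk “creates” and expected to what it “expects”; deciding which created objects to keep (only model, in our case) remains a choice for the developer.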

5.2.2 Required Packages

Next, what are the package dependencies? We call this reqdPkgs in the SpaDES metadata. Across our example we use three functions: rnorm, lm and plot (plot appears in the visualization module later in this chapter). If we don’t know what packages they are in, we can find out by typing their names at the R prompt. The last line of the printed function says that rnorm is in the stats package. Fortunately for us, this is a default (“base”) package in R and it is always pre-loaded. So, nothing to do here.

Code
> rnorm
function (n, mean = 0, sd = 1) 
...
<environment: namespace:stats>
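The same lookup can be done without reading the printed function. This is a small base R sketch; pkgsAttached is a name introduced here for illustration.

```r
# namespace in which each function is defined
environmentName(environment(rnorm))  # "stats"
environmentName(environment(lm))     # "stats"

# packages attached in the current session
pkgsAttached <- sub("^package:", "", grep("^package:", search(), value = TRUE))
"stats" %in% pkgsAttached            # TRUE in a default R session
```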

So, our expectations, creations and dependencies are:

  • Inputs: x
  • Outputs: model
  • Package dependencies: Base packages only

We will next put them into the correct places in the new SpaDES module.

5.3 Module files

Make the module again (see Chapter 4). This time we will add sim$ in front of x, because x may now come from outside this module.

Code
Require::Require(c("reproducible", "SpaDES.core"), repos = c("https://predictiveecology.r-universe.dev", getOption("repos")))

# make a module
modulePath <- "~/SpaDES_book/NewModuleIntro/NewModule"
SpaDES.core::newModule(name = "My_linear_model", path = modulePath, open = FALSE,
                       events = list(
                         init = {
                           y <- sim$x + rnorm(10)      # <--------- add sim$ here
                           # fit a linear model
                           sim$model <- lm(y ~ sim$x)  # <--------- add sim$ here
                         }
                       ))

newModule() is not part of the “workflow”

Be aware that every time newModule() is run with the same name and path arguments, it will overwrite your module folder/files.

As such, it is not meant to be part of a workflow. It is meant to provide the user with a tool to create module templates (once), which will then be changed as the module code develops.
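One way to avoid overwriting by accident is to check for the module file before calling newModule(). The sketch below is only an illustration under stated assumptions: moduleFileExists() is a hypothetical helper written here (it is not part of SpaDES.core), it relies on the <modulePath>/<moduleName>/<moduleName>.R naming described later in this chapter, and demoPath is a throwaway folder so that no real module is touched.

```r
# hypothetical helper (not part of SpaDES.core): does the main module file exist?
moduleFileExists <- function(name, path) {
  file.exists(file.path(path, name, paste0(name, ".R")))
}

# a throwaway location, so that this sketch does not touch real modules
demoPath <- file.path(tempdir(), "NewModuleDemo")

# create the template only once; later runs leave existing files alone
if (!moduleFileExists("My_linear_model", demoPath) &&
    requireNamespace("SpaDES.core", quietly = TRUE)) {
  SpaDES.core::newModule(name = "My_linear_model", path = demoPath, open = FALSE)
}
```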

Where is this module code? In the previous chapter, we didn’t look or care where the module code was.

newModule actually creates a new folder, with the name as provided by the argument, in the folder specified with path. This folder has several files in it. See ?newModule for details. For now, run the above and open the My_linear_model.R script that it creates.

When we make a module, we get a message stating where the module code is. From here, open the file, e.g., by copy-pasting the file path (pick the .R file NOT the .Rmd file for now).

You can also set newModule(..., open = TRUE) to have RStudio open the .R and .Rmd files automatically.

Opening a module file

From here onward, we will need to manually open the module code file.

Every SpaDES module is defined by at least one file, named <modulePath>/<moduleName>/<moduleName>.R. Even though we can do a lot with newModule(), we will need to get used to opening, examining and changing the code in the module code file.

We will look at a few elements of the module R script in this chapter.

5.4 inputObjects

Scroll down to inputObjects and expectsInput(). This is where we will put the inputs that we noticed in our code chunk. We will declare x as an “input” by putting it there, like this:

Code
inputObjects = bindrows(
  expectsInput(objectName = "x", objectClass = "numeric", 
               desc = "The inputs for the linear model", sourceURL = NA)
)

5.5 outputObjects

Next, scroll down to outputObjects and createsOutput(). We will declare model as an “output” by putting it there. Don’t forget the comma at the end of each createsOutput() as each is an argument to bindrows (unless it is the last one).

Code
outputObjects = bindrows(
  createsOutput(objectName = "model", objectClass = "lm", 
                desc = "A linear model object from the equation (y ~ x)")
)

Note that each input and output object gets an expectsInput or createsOutput entry, respectively.

5.6 Default Values

Recall, we don’t have a value for x, unlike in the previous chapter where we had x defined in the init event. This means that if you run the following, you will get an error.

Code
out <- simInit(modules = "My_linear_model", paths = list(modulePath = modulePath))
out <- spades(out)

# or 
out <- simInitAndSpades(modules = "My_linear_model", paths = list(modulePath = modulePath))

out$model

Just like functions in R, we can supply default values for our module inputs. We put these in a function at the bottom called .inputObjects(). See Chapter 6 for a more detailed explanation of module inputs and how to deal with them.

Copy this to the module, replacing the template .inputObjects() function.

Code
.inputObjects <- function(sim) {
  if (!suppliedElsewhere("x", sim))
    sim$x <- rnorm(10, mean = 20, sd = 2)
  return(invisible(sim))
}

!suppliedElsewhere("x", sim) checks whether x has already been supplied (for example, it is already in sim); if it has not, the subsequent line of code runs (see ?suppliedElsewhere).
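The “default only if missing” logic can be tried in plain R. The sketch below is a simplified stand-in, not SpaDES code: sim is an ordinary list rather than a simList, defaultInputs() is a hypothetical name, and is.null() replaces !suppliedElsewhere(), which performs a more thorough check.

```r
# stand-in for .inputObjects(): `sim` is a plain list here, not a real simList,
# and is.null() is a simplified substitute for !suppliedElsewhere("x", sim)
defaultInputs <- function(sim) {
  if (is.null(sim$x))
    sim$x <- rnorm(10, mean = 20, sd = 2)
  return(invisible(sim))
}

# nothing supplied: the default is created
simA <- defaultInputs(list())
length(simA$x)           # 10

# x supplied by the "user": the default is skipped
simB <- defaultInputs(list(x = 1:10))
identical(simB$x, 1:10)  # TRUE
```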

After saving the module R script, we can run the module and inspect the output:

Code
out <- simInitAndSpades(modules = "My_linear_model", paths = list(modulePath = modulePath))

out$model

5.7 Sharing inputs and outputs

You may have noticed that the init event is now placed into a function called doEvent.My_linear_model.init and it has an argument sim. If you are familiar with making functions in R, this is just a named argument "sim". This means that we can use sim inside the function. We don’t know what sim is yet, and we don’t know how to use it fully yet. But we do know that we have added sim$ to the init event. We also see that there is a return(sim) added at the bottom of the event function.

An event occurs in a special function that starts with doEvent.. But, since it is “in a function”, each event has all the features of a function:

  • arguments – specifically sim can be used

  • something returned – in this case, the simList. return(sim) must always be present - it is by default, but don’t delete it!

  • it can be run – but don’t worry about running it; SpaDES runs it when it is time.

The simList

The simList is the data structure that is the foundation of SpaDES. It can be used like a list; accessing objects can be done with $ or [["object_name"]], for example, and objects are manipulated as they would normally – e.g. if DF is a data.frame one would use sim$DF[1, 3] to extract the value on the first row and third column.

In any function that has an argument sim, we pass the simList, so we can access it with sim$ from inside such a function.

See Chapter 7 for more details about the simList.
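The access patterns described above look like this in practice. In this sketch a plain list stands in for the simList (an assumption made so the example runs without SpaDES), and the data.frame contents are invented for illustration.

```r
# a plain list standing in for a simList (assumption for illustration only)
sim <- list(DF = data.frame(a = 1:2, b = 3:4, c = c(10.5, 20.5)))

sim$DF[1, 3]       # first row, third column: 10.5
sim[["DF"]][1, 3]  # same value, using [[ ]]
```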

To share objects between modules, we must assign them to the sim and make sure that return(sim) is at the end of the function.
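Why return(sim) matters can be seen with ordinary R functions. The sketch below uses a plain list as a stand-in for the simList, and goodEvent() and badEvent() are hypothetical names; a real simList is a richer object, so the exact symptom in SpaDES may differ, but the principle is the same.

```r
# `sim` is a plain list here, standing in for a simList (an assumption)
goodEvent <- function(sim) {
  y <- sim$x + rnorm(10)
  sim$model <- lm(y ~ sim$x)
  return(invisible(sim))      # hand the updated `sim` back
}

badEvent <- function(sim) {
  y <- sim$x + rnorm(10)
  sim$model <- lm(y ~ sim$x)  # last expression: its value is what gets returned
}

sim <- list(x = rnorm(10, mean = 20, sd = 2))

simGood <- goodEvent(sim)
class(simGood$model)  # "lm"

simBad <- badEvent(sim)
class(simBad)         # "lm": the model came back instead of `sim`
```

In the second case the caller receives the fitted model where it expected sim, which is the kind of “changed into something unexpected” problem listed under Common mistakes.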

Now we have a module that creates one object, model, and puts it inside sim. This all happens in the event called init.

Next: add the visualization module with metadata

5.8 Adding a new module: visualization module

We remake the second module from the last chapter. But this time we will look at and update the metadata.

Code
newModule("visualize", path = modulePath, open = FALSE,
          events = list(
            init = {
              plot(sim$model)
            }
          )
)

5.8.1 Outputs of one module are Inputs of another

Here we start to see the “shared” objects. The module we just made above has a createsOutput entry for model. This new visualization module will have an expectsInput entry for model. So, we can copy the same description if it is the same.

Code
inputObjects = bindrows(
  expectsInput(objectName = "model", objectClass = "lm", 
               desc = "A linear model object from the equation (y ~ x)", sourceURL = NA)
)
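To see how declared outputs and inputs line up across modules, the metadata can be thought of as small tables with one row per object. The sketch below is an analogy in base R only: the data frames are written by hand (they are not produced by expectsInput() or createsOutput()), and all object names in the code are chosen here for illustration.

```r
# base R data frames standing in for module metadata (illustration only;
# these are not created by SpaDES functions)
inputsLinearModel <- data.frame(
  objectName = "x", objectClass = "numeric",
  desc = "The inputs for the linear model"
)
outputsLinearModel <- data.frame(
  objectName = "model", objectClass = "lm",
  desc = "A linear model object from the equation (y ~ x)"
)
inputsVisualize <- data.frame(
  objectName = "model", objectClass = "lm",
  desc = "A linear model object from the equation (y ~ x)"
)

allInputs <- c(inputsLinearModel$objectName, inputsVisualize$objectName)
allOutputs <- outputsLinearModel$objectName

# inputs that another module creates
intersect(allInputs, allOutputs)  # "model"

# inputs that must come from a user or from defaults
setdiff(allInputs, allOutputs)    # "x"
```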

5.9 Run the new module

Now, we have inputs and outputs defined, our code has been placed in 2 spots (events), and we have a default value for x.

Code
simInitAndSpades(modules = c("My_linear_model", "visualize"), 
                 paths = list(modulePath = modulePath))

We now have a SpaDES module that has metadata, generates random starting data (if the user doesn’t supply an alternative), fits a linear model, outputs that model, and plots the fit.

5.10 Questions

  1. What are some things we “gained” from putting our simple 3 lines of code into a module?

    • We can turn off plotting easily. Set .plotInitialTime = NA in the simInitAndSpades call.
  2. What are some things we “lost”?

    • More complicated. (overkill for these 3 lines?)
  3. What if we used an R package that wasn’t in the base packages list?

    • See ?defineModule for all the metadata items. Specifically, see reqdPkgs.
  4. What is the sim? See ?'.simList-class'

5.11 Try on your own

  1. Fill in the metadata from the Challenges you did in the previous chapter.

  2. Look at the other elements of the metadata and cross reference them with ?defineModule

  3. Look at the Rmd file of one of the modules that has been built (recall the message after you call newModule), where you have filled in the metadata. Try to build it and look at the automatic tables that get built from the metadata.

5.12 Common mistakes

Some common mistakes/bugs that module developers encounter:

  • Object doesn’t exist/is NULL. Errors like the ones below are usually the result of not providing default values for an input in the simList (via .inputObjects()), the user/another module not providing values for that object, or forgetting to assign an object to the simList (i.e. sim$y <- <...>). See Chapter 6.

    Error in model.frame.default(formula = y ~ sim$x, drop.unused.levels = TRUE) : 
    invalid type (NULL) for variable 'sim$x'
    
    Error in eval(predvars, data, env) : object 'y' not found
  • Parsing errors. These happen when simInit has issues reading the module R script (or other associated scripts) and are usually easy to debug by reading the error message. For instance, the one below points to lines 43 and 44, where a comma is missing after the “)”.

    Error in parse(filename) : 
      C:<modulePath>/My_linear_model/My_linear_model.R:44:3: unexpected symbol
    43: )
    44: outputObjects
        ^

  • Environment- and simList-related errors. If the simList (sim) is not returned at the end of a function that takes it as an argument (e.g. an event function or the .inputObjects() function), it will either be lost or changed into something unexpected. Below are two examples of errors that may announce this problem.

    Error in <...>
    sim must be a simList
    
    Error in as.environment(pos) : invalid 'pos' argument
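The first mistake in the list above can be reproduced outside SpaDES. In this sketch a plain empty list stands in for a simList in which x was never supplied, and checkInput() is a hypothetical helper (not a SpaDES function) showing how an explicit check can give a clearer message than the one raised inside lm().

```r
# `sim` is a plain list standing in for a simList; `x` was never supplied
sim <- list()

msg <- tryCatch({
  y <- sim$x + rnorm(10)  # NULL + numeric gives a zero-length vector
  lm(y ~ sim$x)           # fails because sim$x is NULL
  "no error"
}, error = function(e) conditionMessage(e))
msg

# hypothetical helper: fail early with a message that names the missing input
checkInput <- function(sim, name) {
  if (is.null(sim[[name]]))
    stop("input '", name, "' is missing: supply it or set a default in .inputObjects()")
  invisible(TRUE)
}

msg2 <- tryCatch(checkInput(sim, "x"), error = function(e) conditionMessage(e))
msg2
```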

5.13 See also

Chapter 8 on Modules, Events and Functions

Chapter 6 on how to provide Module Inputs

Chapter 7 on the simList

?defineModule describes all the metadata entries.

?expectsInput

?createsOutput

?simInitAndSpades

?newModule

?SpaDES.project::setupProject

?SpaDES.core::simInit

?SpaDES.core::simInitAndSpades

5.14 Barebones R script

Code
# create some data
y <- x + rnorm(10)
# fit a linear model
model <- lm(y ~ x)



Require::Require(c("reproducible", "SpaDES.core"), repos = c("https://predictiveecology.r-universe.dev", getOption("repos")))

# make a module
modulePath <- "~/SpaDES_book/NewModuleIntro/NewModule"
SpaDES.core::newModule(name = "My_linear_model", path = modulePath, open = FALSE,
                       events = list(
                         init = {
                           y <- sim$x + rnorm(10)      # <--------- add sim$ here
                           # fit a linear model
                           sim$model <- lm(y ~ sim$x)  # <--------- add sim$ here
                         }
                       ))





out <- simInit(modules = "My_linear_model", paths = list(modulePath = modulePath))
out <- spades(out)

# or 
out <- simInitAndSpades(modules = "My_linear_model", paths = list(modulePath = modulePath))

out$model



out <- simInitAndSpades(modules = "My_linear_model", paths = list(modulePath = modulePath))

out$model

newModule("visualize", path = modulePath, open = FALSE,
          events = list(
            init = {
              plot(sim$model)
            }
          )
)



simInitAndSpades(modules = c("My_linear_model", "visualize"), 
                 paths = list(modulePath = modulePath))