Contributing to Model Parameters Pipeline
Thank you for your interest in contributing to the Model Parameters Pipeline! This document outlines the guidelines for contributing to this project.
Code of Conduct
Please be respectful and constructive in all interactions. We aim to maintain a welcoming environment for contributors of all backgrounds and experience levels.
How to Contribute
Reporting Bugs
If you find a bug, please open an issue on GitHub with:
- A clear description of the problem
- A minimal reproducible example
- Your package version (packageVersion("model.parameters.pipeline"))
- Your session information from the sessioninfo::session_info() function (this includes your R version, OS, system architecture, and additional information):
install.packages("sessioninfo")
sessioninfo::session_info()
Suggesting Features
Feature requests are welcome. Open an issue describing:
- The use case you are trying to address
- How the proposed feature would work
- Any relevant references to the Model Parameters specification
Submitting Changes
- Fork the repository and create a branch from main
- Make your changes, following the code style guidelines below
- Add or update tests as needed
- Run devtools::test() to ensure all tests pass
- Run devtools::check() to ensure the package passes R CMD check
- Submit a pull request with a clear description of your changes
Code Style
This project uses lintr to enforce consistent code style. Before submitting a pull request, check your code with:
lintr::lint_package()
Key style conventions:
- Use snake_case for variable and function names
- Use snake_case for file names, with unit test source files starting with test- (e.g. test-model_parameters_pipeline.R) and Model Parameters step files starting with step- and named after the step name as found in the Model Parameters specification (e.g. step-logistic-regression.R)
- Internal (non-exported) functions are prefixed with . (e.g., .run_step_center)
- Keep lines to a maximum of 80 characters where practical
- Document all exported functions using roxygen2 comments
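As an illustration of these conventions, the sketch below shows a roxygen2-documented exported function next to a dot-prefixed internal helper. Both rescale_values and .check_numeric are hypothetical names invented for this example; they are not part of the package.

```r
#' Rescale a numeric vector
#'
#' Hypothetical example function, not part of the package.
#'
#' @param x Numeric vector to rescale.
#' @param divisor Single non-zero number to divide by.
#' @return Numeric vector the same length as \code{x}.
#' @export
rescale_values <- function(x, divisor = 1) {
  .check_numeric(x)
  x / divisor
}

# Internal helper: leading dot, no @export tag
.check_numeric <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  invisible(TRUE)
}

rescaled <- rescale_values(c(10, 20), divisor = 10)
```

Running lintr::lint_package() on code written this way should raise no naming complaints under a snake_case configuration, although the exact linters applied depend on the project's lintr settings.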
Running Tests
devtools::test()
To regenerate expected test outputs after changing a step’s behaviour:
source("tests/testthat/generate_step_tests_expected.R")
generate_step_tests_expected()
Package Structure
- R/ - Source files. One file per transformation step (step-{stepname}.R), plus core pipeline files
- tests/testthat/ - Unit tests and test data
- man/ - Auto-generated documentation (do not edit directly; regenerate with devtools::document())
- vignettes/ - Long-form documentation
Adding a New Transformation Step
This guide explains how to add support for a new transformation step to the Model Parameters Pipeline, as defined by the Model Parameters repository.
Overview
Adding a new transformation step requires three main tasks:
- Update run_model_pipeline in R/model_parameters_pipeline.R - Add a new conditional block to recognize the step
- Create a new step source file - Implement .run_step_{stepname} in R/step-{stepname}.R to execute the transformation
- Add unit tests - Create test files to verify correct behavior
Step 1: Update run_model_pipeline
The run_model_pipeline function in R/model_parameters_pipeline.R processes each step defined in the model steps specification. You need to add a new conditional block for your step.
Location
Find the if-else chain in run_model_pipeline where each step function gets called. The if-else chain can be found by searching for the string [add-step-here] in R/model_parameters_pipeline.R. Its general format is shown below:
# [add-step-here]
if (step_name == "center") {
# Execute "center" step
} else if (step_name == "dummy") {
# Execute "dummy" step
} # ...
Add Your Step
Add a new else if block for your step before the final else clause:
} else if (step_name == "your-step-name") {
res <- .run_step_your_step_name(mod, file_path)
} else {
# Handle unknown step name
}
Important:
- Replace "your-step-name" with the exact step name as it appears in the Model Parameters specification
- Replace your_step_name with the same name written with underscores in place of hyphens, so that it forms a valid function name
- The step name must match what users will specify in their model-steps.csv file
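The naming rule above can be sketched as a small helper. This is only an illustration of the hyphen-to-underscore convention: step_function_name is a hypothetical function, and the package itself dispatches through the explicit if-else chain rather than by building names.

```r
# Hypothetical helper: derive the internal function name for a step name
step_function_name <- function(step_name) {
  paste0(".run_step_", gsub("-", "_", step_name, fixed = TRUE))
}

single_word <- step_function_name("center")
hyphenated <- step_function_name("logistic-regression")
```

Here "center" maps to .run_step_center and "logistic-regression" maps to .run_step_logistic_regression, while the string compared against step_name in the if-else chain keeps its hyphens.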
Step 2: Create the Step Function
Create a new source file R/step-{stepname}.R containing a function named .run_step_{stepname} that implements the transformation logic.
Create the Source File
Create a new file at R/step-{stepname}.R (replace {stepname} with your step name).
Function Template
Use this template as a starting point:
#' Run {Step Name} Step
#'
#' {Brief description of what this step does and its purpose}.
#' Implements the '{stepname}' transformation step from the Model
#' Parameters pipeline.
#'
#' @param mod Model object containing input data in \code{mod$data}
#' @param file Path to {stepname} step specification file
#' @return A list containing: \code{mod} (the updated model object with
#' {description of added data} added to \code{mod$data}), and
#' \code{output_columns} (character vector of output columns
#' of this step)
#' @keywords internal
.run_step_{stepname} <- function(mod, file) {
# Load the step specification file
mod <- .add_file(mod, file)
step_data <- .get_file(mod, file)
# Verify required columns exist in the specification file
.verify_columns(
step_data,
c("column1", "column2", "column3"),
"{stepname} step file",
file
)
# Track which columns are produced by this step
output_columns <- c()
# Process each row in the step specification
for (i in seq_len(nrow(step_data))) {
info <- step_data[i, ]
# Extract parameters from the specification
param1 <- info[["column1"]]
param2 <- info[["column2"]]
param3 <- info[["column3"]]
# Implement your transformation logic here
# Example: mod$data[new_column] <- transformation(mod$data[existing_column])
output_columns <- c(output_columns, new_column)
}
# Return the updated model object and output column names
list(
mod = mod,
output_columns = output_columns
)
}
Key Components Explained
- File Location: Create your step function in R/step-{stepname}.R
- Function Signature:
  - Always takes mod (model object) and file (path to specification file)
  - Input data is accessed and modified via mod$data
  - Always returns a named list with mod and output_columns
  - Function name is .run_step_{stepname} (with leading dot, making it internal)
- Load Specification File: Call mod <- .add_file(mod, file) followed by step_data <- .get_file(mod, file), as in the template. Always use these helper functions to load files; never read files directly (e.g. with read.csv). .add_file and .get_file ensure that any file path is validated against the sandbox path, if one was specified via prepare_model_pipeline. This prevents steps from reading files outside the permitted directory.
- Verify Columns: .verify_columns(step_data, c("column1", "column2", "column3"), "{stepname} step file", file) validates that the specification file contains all required columns. Update the column list to match your step’s requirements from the Model Parameters documentation.
- Process Each Row: The for loop processes each row in the specification file. Each row typically defines one transformation to apply.
- Access and Write Data:
  - Read data: mod$data[column_name] or mod$data[[column_name]]
  - Write data: mod$data[new_column] <- transformed_values
- Track Output Columns: Append each new column name to output_columns so the pipeline knows which columns this step produced. This may include columns that already existed in the data (e.g. when an existing column has been overwritten).
- Return a List: Always return a named list with mod and output_columns so the pipeline can chain steps together.
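The two read forms differ in what they return, which matters when writing transformation logic. The sketch below uses a toy mod list built by hand (an assumption made for illustration; real model objects are created by the pipeline) to show single-bracket and double-bracket access, a write, and output tracking.

```r
# Toy model object; in the package, mod is created by the pipeline
mod <- list(data = data.frame(age = c(30, 45, 60)))

as_df <- mod$data["age"]      # single bracket: one-column data frame
as_vec <- mod$data[["age"]]   # double bracket: plain numeric vector

# Write a new column and record it as an output of the step
output_columns <- c()
mod$data["age_months"] <- as_vec * 12
output_columns <- c(output_columns, "age_months")
```

Single-bracket access keeps the data frame structure, which is convenient for arithmetic that should stay column-shaped, while double-bracket access is the usual choice when a function expects a vector.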
Example: Center Step
Here’s a real example from the existing codebase (R/step-center.R):
.run_step_center <- function(mod, file) {
# Load the step specification file
mod <- .add_file(mod, file)
step_data <- .get_file(mod, file)
# Verify required columns exist in the specification file
.verify_columns(
step_data,
c("origVariable", "centerValue", "centeredVariable"),
"center step file",
file
)
# Track which columns are produced by this step
output_columns <- c()
# Process each row in the step specification
for (i in seq_len(nrow(step_data))) {
info <- step_data[i, ]
orig_variable <- info[["origVariable"]]
center_value <- info[["centerValue"]]
centered_variable <- info[["centeredVariable"]]
# Make sure origVariable exists
if (!orig_variable %in% colnames(mod$data)) {
stop(
"Variable \"",
orig_variable,
"\" specified as origVariable does not exist in data ",
"when performing center step in ",
basename(file)
)
}
# Center the variable
mod$data[centered_variable] <- mod$data[orig_variable] - center_value
output_columns <- c(output_columns, centered_variable)
}
# Return the updated model object and output column names
list(
mod = mod,
output_columns = output_columns
)
}
This function:
- Is defined in its own source file, R/step-center.R
- Accepts mod and file; data is accessed via mod$data
- Loads the center specification file
- Verifies it has the required columns (origVariable, centerValue, centeredVariable)
- For each row, checks that the column named under origVariable exists in mod$data, and stops with an error if it does not
- For each row, creates a new centered variable by subtracting centerValue from the original variable in mod$data
- Returns a list with the updated model object and the names of the new columns
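To see the centering loop in action without the package helpers, the sketch below runs the same row-by-row logic on toy inputs. The data values and the hand-built step_data frame are assumptions made for illustration; in the package, step_data comes from .get_file rather than being constructed inline.

```r
# Toy inputs standing in for mod and the loaded specification file
mod <- list(data = data.frame(age = c(30, 45, 60), bmi = c(22, 27, 31)))
step_data <- data.frame(
  origVariable = c("age", "bmi"),
  centerValue = c(45, 25),
  centeredVariable = c("age_c", "bmi_c"),
  stringsAsFactors = FALSE
)

output_columns <- c()
for (i in seq_len(nrow(step_data))) {
  info <- step_data[i, ]
  orig_variable <- info[["origVariable"]]
  center_value <- info[["centerValue"]]
  centered_variable <- info[["centeredVariable"]]

  # Same existence check as the center step, with a shorter message
  if (!orig_variable %in% colnames(mod$data)) {
    stop("Variable \"", orig_variable, "\" does not exist in data")
  }

  # Subtract the center value and store under the new column name
  mod$data[centered_variable] <- mod$data[orig_variable] - center_value
  output_columns <- c(output_columns, centered_variable)
}

result <- list(mod = mod, output_columns = output_columns)
```

After the loop, result$mod$data holds the original age and bmi columns plus age_c and bmi_c, and result$output_columns lists the two new names in the order they were created.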
Step 3: Add Unit Tests
Unit tests ensure your transformation step works correctly. The testing framework automatically discovers and runs tests based on directory structure.
Quick Reference
See the detailed guide Model Parameters Step Tests for complete instructions.
Reference Documentation
For detailed information about Model Parameters transformation steps and their required file formats, see the Model Parameters documentation.
Common Patterns
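The helpers named in the subsections below are internal to the package, and their implementations are not shown here. As a rough, self-contained illustration of the same ideas, the base-R sketch below splits a delimited string, converts the parts to numbers, picks a column name that is not yet in use, and adds two columns in one assignment. split_parts is a hypothetical stand-in and the use of make.unique is an assumption; neither is claimed to match the behaviour of .get_string_parts or .get_unused_column.

```r
# Hypothetical stand-in for splitting a ";"-delimited parameter value
split_parts <- function(x, sep = ";") {
  trimws(strsplit(x, sep, fixed = TRUE)[[1]])
}

parts <- split_parts("var1;var2;var3")
knots <- as.double(split_parts("1.5; 2.5; 4"))

# One possible way to find a column name not already present (assumption)
existing <- c("age", "tmp")
candidate <- make.unique(c(existing, "tmp"))[length(existing) + 1]

# Add several columns at once through data frame assignment
mod <- list(data = data.frame(x = c(1, 2, 3)))
new_cols <- data.frame(x_sq = mod$data$x^2, x_cu = mod$data$x^3)
mod$data[c("x_sq", "x_cu")] <- new_cols
```

Inside a real step, prefer the package helpers described below, since they may handle edge cases that this sketch ignores.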
Parsing Delimited Strings
Some steps use delimited strings (e.g., “var1;var2;var3”) in their parameters file. Use the helper function:
parts <- .get_string_parts(info[["columnName"]])
Working with Numeric Values
Convert string values to numeric when needed:
numeric_values <- as.double(.get_string_parts(info[["knots"]]))
Creating New Columns Safely
To avoid column name conflicts:
new_col <- .get_unused_column(colnames(mod$data), "prefix")
Adding Multiple Columns
You can add multiple columns at once using data frame assignment:
# Create a data frame with new columns
new_cols <- data.frame(
col1 = values1,
col2 = values2
)
mod$data[c("col1", "col2")] <- new_cols
Getting Help
- For Model Parameters specification questions, refer to the Model Parameters documentation
- For existing step implementation examples, see source files like R/step-center.R, R/step-dummy.R, etc.
- For testing questions, see tests/testthat/testdata/steps/README.md