Adding a New Transformation Step
This guide explains how to add support for a new transformation step to the Model Parameters Pipeline, as defined by the Model Parameters repository.
Overview
Adding a new transformation step requires three main tasks:
Register the step in the
_STEP_DISPATCHdictionary inpipeline.pyCreate a new step module – Implement
run_step_{stepname}insrc/model_parameters_pipeline/steps/{stepname}.pyAdd unit tests – Create test files to verify correct behavior
Step 1: Register the Step in _STEP_DISPATCH
The pipeline.py module contains a _STEP_DISPATCH dictionary that maps
step names to their implementation functions. The run method uses this
dictionary to dispatch each step.
Location
Find the _STEP_DISPATCH dictionary near the top of pipeline.py:
_STEP_DISPATCH: dict[str, Callable[[ModelPipeline, str | Path], list[str]]] = {
"center": run_step_center,
"dummy": run_step_dummy,
"interaction": run_step_interaction,
"logistic-regression": run_step_logistic_regression,
"rcs": run_step_rcs,
}
Add Your Step
Import your new step function at the top of the file:
from model_parameters_pipeline.steps.your_step_name import run_step_your_step_name
Add a new entry to
_STEP_DISPATCH:_STEP_DISPATCH: dict[str, Callable[[ModelPipeline, str | Path], list[str]]] = { "center": run_step_center, "dummy": run_step_dummy, "interaction": run_step_interaction, "logistic-regression": run_step_logistic_regression, "rcs": run_step_rcs, "your-step-name": run_step_your_step_name, }
Important
The dictionary key (
"your-step-name") must match the exact step name as it appears in the Model Parameters specification and in users’model-steps.csvfiles.The function name (
run_step_your_step_name) uses underscores instead of hyphens.
Step 2: Create the Step Function
Create a new module src/model_parameters_pipeline/steps/{stepname}.py
containing a function named run_step_{stepname} that implements the
transformation logic.
Create the Module
Create a new file at src/model_parameters_pipeline/steps/{stepname}.py
(replace {stepname} with your step name, using underscores).
Function Template
Use this template as a starting point:
"""{Step Name} transformation step.
{Brief description of what this step does and its purpose}.
"""
from __future__ import annotations
from pathlib import Path
from typing import TYPE_CHECKING
from model_parameters_pipeline._utils import verify_columns
if TYPE_CHECKING:
from model_parameters_pipeline.pipeline import ModelPipeline
def run_step_{stepname}(mod: ModelPipeline, file: str | Path) -> list[str]:
"""Run {step name} transformation step.
Args:
mod: ModelPipeline instance (mutated in place).
file: Path to {step name} step specification CSV.
Returns:
List of output column names created by this step.
"""
mod._add_file(file)
step_data = mod._get_file(file)
verify_columns(
step_data,
["column1", "column2", "column3"],
"{stepname} step file",
file,
)
output_columns: list[str] = []
for _, row in step_data.iterrows():
# Extract parameters from the specification
param1 = row["column1"]
param2 = row["column2"]
param3 = row["column3"]
# Implement your transformation logic here
# Example: mod.data[new_column] = mod.data[existing_column] * param
output_columns.append(new_column)
return output_columns
Key Components Explained
File Location: Create your step function in
src/model_parameters_pipeline/steps/{stepname}.py.Function Signature:
Always takes
mod(ModelPipelineinstance) andfile(path to specification file)Input data is accessed and modified via
mod.data(a pandas DataFrame)Always returns a
list[str]of output column namesThe
ModelPipelineimport is underTYPE_CHECKINGto avoid circular imports
Load Specification File:
mod._add_file(file) step_data = mod._get_file(file)
Always use these
ModelPipelinemethods to load files – never read files directly (e.g. withpd.read_csv)._add_fileand_get_fileensure that any file path is validated against the sandbox path, if one was specified when constructing theModelPipeline. This prevents steps from reading files outside the permitted directory.Verify Columns:
verify_columns( step_data, ["column1", "column2", "column3"], "{stepname} step file", file, )
Validates that the specification file contains all required columns. Update the column list to match your step’s requirements from the Model Parameters documentation. Import
verify_columnsfrommodel_parameters_pipeline._utils.Process Each Row: The
for _, row in step_data.iterrows()loop processes each row in the specification file. Each row typically defines one transformation to apply.Access and Write Data:
Read a column:
mod.data[column_name](returns a pandas Series)Write a column:
mod.data[new_column] = transformed_values
Track Output Columns: Append each new column name to
output_columnsso the pipeline knows which columns this step produced.
Example: Center Step
Here’s a real example from the existing codebase (steps/center.py):
def run_step_center(mod: ModelPipeline, file: str | Path) -> list[str]:
"""Run center transformation step.
Args:
mod: ModelPipeline instance (mutated in place).
file: Path to center step specification CSV.
Returns:
List of output column names created by this step.
"""
mod._add_file(file)
step_data = mod._get_file(file)
verify_columns(
step_data,
["origVariable", "centerValue", "centeredVariable"],
"center step file",
file,
)
output_columns: list[str] = []
for _, row in step_data.iterrows():
orig_variable = row["origVariable"]
center_value = row["centerValue"]
centered_variable = row["centeredVariable"]
mod.data[centered_variable] = mod.data[orig_variable] - center_value
output_columns.append(centered_variable)
return output_columns
This function:
Is defined in its own module
steps/center.pyAccepts
mod(aModelPipelineinstance) andfile; data is accessed viamod.dataLoads the center specification file using
mod._add_file/mod._get_fileVerifies it has the required columns (
origVariable,centerValue,centeredVariable)For each row, creates a new centered variable by subtracting
centerValuefrom the original variable inmod.dataReturns a list of the new column names
Step 3: Add Unit Tests
Unit tests ensure your transformation step works correctly. The testing framework automatically discovers and runs tests based on directory structure.
See Model Parameters Step Tests for complete instructions.
Reference Documentation
For detailed information about Model Parameters transformation steps and their required file formats, see:
Checklist
Use this checklist when adding a new transformation step:
Create new module:
src/model_parameters_pipeline/steps/{stepname}.pyImplement
run_step_{stepname}function with type hints and docstringImport and add entry to
_STEP_DISPATCHinpipeline.pyVerify column names match the Model Parameters specification
Create test directory:
tests/testdata/step-tests/test-{stepname}/Create
test-model-export.csvin test directoryCreate
test-model-steps.csvin test directoryCreate
test-{stepname}.csvwith test parameters in test directoryGenerate expected output using
generate_step_tests_expected()Run
pytestto verify tests passReview and commit all changes including
test-expected.csv
Common Patterns
Parsing Delimited Strings
Some steps use delimited strings (e.g., “var1;var2;var3”) in their parameters file. Use the helper function:
from model_parameters_pipeline._utils import get_string_parts
parts = get_string_parts(row["columnName"])
Working with Numeric Values
Convert string values to numeric when needed:
from model_parameters_pipeline._utils import get_string_parts
numeric_values = [float(v) for v in get_string_parts(row["knots"])]
Creating New Columns Safely
To avoid column name conflicts:
from model_parameters_pipeline._utils import get_unused_column
new_col = get_unused_column(mod.data, "prefix_")
Adding Multiple Columns
You can add multiple columns at once using numpy array assignment:
import numpy as np
# vals is a 2D numpy array with one column per variable
vals = some_computation(mod.data[variable])
for col_idx, var_name in enumerate(variable_names):
mod.data[var_name] = vals[:, col_idx]
Getting Help
For Model Parameters specification questions, refer to the Model Parameters documentation
For existing step implementation examples, see modules like
steps/center.py,steps/dummy.py, etc.For testing questions, see
tests/testdata/step-tests/README.md