Introduction
There are two types of derived variables in the CHMS surveys. Both are supported in chmsflow.
- Variable mapping – mapping two or more variables into a single variable.
- Computed variables – variables derived using mathematical equations or clinical logic.
chmsflow computes derived variables using functions referenced in variable-details.csv. The recEnd column uses the prefix Func:: to name the R function, and the variableStart column uses the prefix DerivedVar:: to list the input variables.
For example, GFR (gfr_ml_min) has:
- recEnd: Func::calculate_gfr
- variableStart: DerivedVar::[lab_bcre, pgdcgt, clc_sex, clc_age]
This tells rec_with_table() to call calculate_gfr() with the four input variables.
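To make the two prefixes concrete, the following base R sketch pulls the function name and the input list out of a metadata row like the one above. It is an illustration of the naming convention only, not the parsing code used inside rec_with_table(). The one-row data frame is a hypothetical excerpt of variable-details.csv, and the `variable` column name is an assumption.

```r
# Hypothetical one-row excerpt of variable-details.csv for gfr_ml_min.
# Only recEnd and variableStart come from the text above; the `variable`
# column name is an assumption made for this sketch.
details <- data.frame(
  variable      = "gfr_ml_min",
  variableStart = "DerivedVar::[lab_bcre, pgdcgt, clc_sex, clc_age]",
  recEnd        = "Func::calculate_gfr",
  stringsAsFactors = FALSE
)

# Strip the "Func::" prefix to recover the name of the R function.
func_name <- sub("^Func::", "", details$recEnd)

# Strip "DerivedVar::[" and the closing "]", then split on commas
# to recover the input variable names.
inputs <- sub("^DerivedVar::\\[(.*)\\]$", "\\1", details$variableStart)
inputs <- trimws(strsplit(inputs, ",", fixed = TRUE)[[1]])

func_name
inputs
```

Here func_name holds the function to call and inputs holds the four variables passed to it, in the order listed in variableStart.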
How to use derived variables
Since derived variables depend on their input variables, you must list both the derived variable and its inputs when calling rec_with_table():
cycle2_gfr <- recodeflow::rec_with_table(
  cycle2,
  variables = c("lab_bcre", "pgdcgt", "clc_sex", "clc_age", "gfr_ml_min"),
  variable_details = variable_details,
  log = TRUE
)

For variables that depend on medication status (e.g., hypertension, diabetes), use recode_after_meds() instead of rec_with_table(). See Recoding medications and Analysis walkthrough for the full workflow.
Creating a derived variable
To add a new derived variable to chmsflow, you need to create a harmonized set of input variables and an R function that computes the derived value. See How to add variables for step-by-step instructions.
For details on the metadata schema, see Variable schema reference.
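As a rough sketch of the two pieces described in this section, the example below pairs a derived-variable function with a matching metadata row. Everything in it is hypothetical: the function calculate_bmi_example(), the variable names bmi_example, height_cm and weight_kg, and the `variable` column name are invented for illustration and are not part of chmsflow. It also uses plain NA rather than tagged_na() codes to keep the example in base R.

```r
# Hypothetical derived-variable function: body mass index from height and
# weight. Names and rules are illustrative only, not taken from chmsflow.
calculate_bmi_example <- function(height_cm, weight_kg) {
  # Flag rows where either input is missing or not a positive measurement.
  invalid <- is.na(height_cm) | is.na(weight_kg) |
    height_cm <= 0 | weight_kg <= 0

  # BMI = weight (kg) / height (m)^2, computed for all rows at once.
  bmi <- weight_kg / (height_cm / 100)^2

  # Return NA for rows whose inputs were flagged above.
  bmi[invalid] <- NA_real_
  bmi
}

# Hypothetical metadata row that would point at the function above,
# following the Func:: and DerivedVar:: conventions.
new_row <- data.frame(
  variable      = "bmi_example",
  variableStart = "DerivedVar::[height_cm, weight_kg]",
  recEnd        = "Func::calculate_bmi_example",
  stringsAsFactors = FALSE
)

# Quick check of the function on three made-up respondents.
bmi_check <- calculate_bmi_example(c(170, NA, 180), c(65, 70, 81))
bmi_check
```

Testing the function directly on small vectors like this, before wiring it into the metadata, makes it easier to confirm how missing inputs are handled.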
Next steps
- See derived variables in a full analysis – The Analysis walkthrough demonstrates deriving hypertension status from CHMS cycle 3 data.
- Handle missing data – Learn how tagged_na() codes propagate through derived variable functions in Missing data (tagged_na).
- Understand the methodology – For the design rationale behind the rules-as-data approach, see Methodology.