Introduction

There are two types of derived variables in the CHMS surveys, and chmsflow supports both.

  • Variable mapping - mapping two or more variables into a single variable.
  • Variables that are derived using math equations.

chmsflow calculates these more complex derived variables using custom functions. Each function is referenced in variable_details.csv in the RecTo section with the prefix ‘Func::’, and the variables passed to the function are listed in the variableStart section with the prefix ‘DerivedVar::’. For example, GFR (gfr) has Func::calculate_GFR in the RecTo section and DerivedVar::[lab_bcre, pgdcgt, clc_sex, clc_age] in the variableStart section, which identifies its four starting variables (lab_bcre, pgdcgt, clc_sex, clc_age).
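To illustrate the two prefixes, the sketch below builds a one-row stand-in for the gfr entry. It is an assumption-laden excerpt: the column names `variable`, `variableStart`, and `recTo` follow the wording of this article and may not match the actual headers in variable_details.csv, and the real file contains more rows and columns.

```r
# Hypothetical excerpt of variable_details.csv for the gfr variable.
# Column names are assumed from the text above; check the real file.
gfr_details <- data.frame(
  variable         = "gfr",
  variableStart    = "DerivedVar::[lab_bcre, pgdcgt, clc_sex, clc_age]",
  recTo            = "Func::calculate_GFR",
  stringsAsFactors = FALSE
)

# Rows that describe a function-based derived variable carry the Func:: prefix
is_derived <- startsWith(gfr_details$recTo, "Func::")

# Strip the prefix to recover the name of the custom function to load
function_name <- sub("^Func::", "", gfr_details$recTo[is_derived])
print(function_name)
```

Reading the entry this way shows which custom function has to be available in your R session before the derived variable can be calculated.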

Example - Glomerular filtration rate (GFR)

A derived variable for GFR has been created in chmsflow. It uses the harmonized blood creatinine (lab_bcre - in µmol/L), ethnicity (pgdcgt - 13 categories), sex (clc_sex - 2 categories), and age (clc_age - in years) variables, which are available across all CHMS cycles.
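As a worked example of the arithmetic, the sketch below applies the equation used by calculate_GFR() in this article to a single respondent. The input values are made up for illustration and are not CHMS data, and the variable names are hypothetical.

```r
# Illustrative inputs (invented for this sketch, not taken from CHMS data)
creatinine_umol <- 80   # blood creatinine in µmol/L
age_years       <- 45
is_female       <- TRUE
is_black        <- FALSE

# Convert µmol/L to mg/dL, the same conversion calculate_GFR() applies
serum_creat <- creatinine_umol / 88.4

# Base term shared by every branch of the equation
gfr <- 175 * serum_creat^(-1.154) * age_years^(-0.203)

# Sex and ethnicity multipliers
if (is_female) gfr <- gfr * 0.742
if (is_black) gfr <- gfr * 1.210

print(round(gfr, 1))
```

Writing the multipliers as separate steps gives the same result as the four explicit branches in the function, because each branch is the base term times the applicable factors.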

Using rec_with_table(), you can transform the derived GFR variable across multiple CHMS cycles and create a transformed dataset.

In order to derive variables, you must first load the existing custom function associated with the derived variable:

# Function for derived GFR variable
calculate_GFR <- function(LAB_BCRE, PGDCGT, CLC_SEX, CLC_AGE) {
  GFR <- 0
  serumcreat <- 0

  if (any(!LAB_BCRE %in% 14:785) || (any(!CLC_SEX %in% c(1, 2)) || any(!PGDCGT %in% 1:13)) || any(!CLC_AGE %in% 3:79)) {
    GFR <- haven::tagged_na("b") # GFR is NA if any non-responses found
  } else {
    serumcreat <- LAB_BCRE / 88.4 # Proceeds without non-responses

    if (!is.na(CLC_SEX) && !is.na(PGDCGT) && serumcreat != 0) {
      if (CLC_SEX == 2 && PGDCGT == 2) {
        GFR <- 175 * ((serumcreat)^(-1.154)) * ((CLC_AGE)^(-0.203)) * (0.742) * (1.210) # female and black
      } else if (CLC_SEX == 2 && PGDCGT != 2) {
        GFR <- 175 * ((serumcreat)^(-1.154)) * ((CLC_AGE)^(-0.203)) * (0.742) # female and not black
      } else if (CLC_SEX == 1 && PGDCGT == 2) {
        GFR <- 175 * ((serumcreat)^(-1.154)) * ((CLC_AGE)^(-0.203)) * (1.210) # male and black
      } else if (CLC_SEX == 1 && PGDCGT != 2) {
        GFR <- 175 * ((serumcreat)^(-1.154)) * ((CLC_AGE)^(-0.203)) # male and not black
      }
    } else {
      GFR <- haven::tagged_na("b") # Handle case where CLC_SEX or PGDCGT is NA or serumcreat is 0
    }
  }

  return(GFR)
}

# Transform GFR for cycle 2, passing the derived variable and its base variables
cycle2_gfr <- recodeflow:::rec_with_table(
  cycle2,
  variables = c("gfr", "lab_bcre", "pgdcgt", "clc_sex", "clc_age"),
  variable_details = variable_details,
  log = TRUE
)

Derived variables are calculated from previously transformed variables. If you want to transform only a derived variable, you must therefore also specify its base CHMS variables in rec_with_table(), as shown above. For the derived GFR variable, this means also specifying the blood creatinine (lab_bcre), ethnicity (pgdcgt), sex (clc_sex), and age (clc_age) variables.
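If you would rather not type the base variables by hand, they can be read from the variableStart entry. The helper below is a minimal sketch and is hypothetical: derived_var_inputs() is not part of chmsflow or recodeflow, and it assumes the entry has exactly the ‘DerivedVar::[...]’ form shown earlier.

```r
# Hypothetical helper (not part of chmsflow or recodeflow): extract the base
# variable names from a "DerivedVar::[...]" entry in variableStart.
derived_var_inputs <- function(variable_start) {
  # Drop the prefix and the surrounding square brackets
  inner <- sub("^DerivedVar::\\[(.*)\\]$", "\\1", variable_start)
  # Split on commas and trim whitespace around each name
  trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
}

base_vars <- derived_var_inputs("DerivedVar::[lab_bcre, pgdcgt, clc_sex, clc_age]")

# The derived variable plus its base variables, ready to pass as `variables`
gfr_variables <- c("gfr", base_vars)
print(gfr_variables)
```

The resulting character vector can be supplied as the variables argument of rec_with_table() in place of the hand-written list.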

Creating a derived variable

Creating a derived variable requires harmonized versions of the existing CHMS variables it depends on and a custom function that uses those harmonized variables. For more information on how to create a derived variable, see here.