Introduction

There are two types of derived variables in the CHMS surveys. Both types are supported in chmsflow.

  • Variable mapping - mapping two or more variables into a single variable.
  • Variables that are derived using mathematical equations.

chmsflow calculates the more complex, equation-based derived variables using custom functions that are referenced in variable_details.csv. The function is named in the RecTo section with the prefix ‘Func::’, and the variables passed to the function are listed in the variableStart section with the prefix ‘DerivedVar::’. For example, GFR (gfr) has Func::calculate_GFR in the RecTo section and DerivedVar::[lab_bcre, pgdcgt, clc_sex, clc_age] in the variableStart section, which identifies its four starting variables (lab_bcre, pgdcgt, clc_sex, clc_age).
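As a rough illustration of how these two prefixes can be read, the sketch below builds a one-row stand-in for the gfr entry and extracts the function name and the starting variables with base R string functions. The data frame is a hypothetical excerpt (the real variable_details.csv has more columns), and this is not how chmsflow or recodeflow parses the file internally.

```r
# Hypothetical one-row excerpt of variable_details.csv for the gfr variable.
# Only the three columns discussed above are included.
details <- data.frame(
  variable      = "gfr",
  variableStart = "DerivedVar::[lab_bcre, pgdcgt, clc_sex, clc_age]",
  RecTo         = "Func::calculate_GFR",
  stringsAsFactors = FALSE
)

# Strip the 'Func::' prefix to get the name of the custom function.
func_name <- sub("^Func::", "", details$RecTo)

# Strip 'DerivedVar::[' and the closing ']', then split on commas
# to get the starting variables in the order they are listed.
inner <- sub("^DerivedVar::\\[(.*)\\]$", "\\1", details$variableStart)
input_vars <- trimws(strsplit(inner, ",", fixed = TRUE)[[1]])

print(func_name)
print(input_vars)
```

The order of the names inside the brackets matches the order of the arguments in calculate_GFR as shown later in this article.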

Example - Glomerular filtration rate (GFR)

A derived variable for GFR has been created in chmsflow. It uses the harmonized blood creatinine (lab_bcre - in µmol/L), ethnicity (pgdcgt - 13 categories), sex (clc_sex - 2 categories), and age (clc_age - in years) variables, which are available across all CHMS cycles.
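To make the arithmetic concrete, here is a minimal sketch of the calculation using the same coefficients as the calculate_GFR function shown later in this article. The helper name mdrd_gfr_sketch and its logical arguments are hypothetical and are not part of chmsflow; unlike calculate_GFR, the sketch does no range checking or missing-value handling.

```r
# Hypothetical helper for illustration only; not part of chmsflow.
# creat_umol: blood creatinine in umol/L; age: age in years.
mdrd_gfr_sketch <- function(creat_umol, age, female = FALSE, black = FALSE) {
  serumcreat <- creat_umol / 88.4                  # convert umol/L to mg/dL
  gfr <- 175 * serumcreat^(-1.154) * age^(-0.203)  # base term (male, not black)
  if (female) gfr <- gfr * 0.742                   # sex multiplier
  if (black)  gfr <- gfr * 1.210                   # ethnicity multiplier
  gfr
}

# A 50-year-old with a creatinine of 80 umol/L.
gfr_male   <- mdrd_gfr_sketch(80, 50)
gfr_female <- mdrd_gfr_sketch(80, 50, female = TRUE)

print(round(gfr_male, 1))
print(round(gfr_female, 1))
```

With these inputs the male value comes out at roughly 89 and the female value at roughly 66, the latter being the male value scaled by 0.742.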

Using rec_with_table(), you can transform the derived GFR variable across multiple CHMS cycles and create a transformed dataset.

To derive a variable, you must first load the custom function associated with it:

# Function for derived GFR variable
calculate_GFR <- function(LAB_BCRE, PGDCGT, CLC_SEX, CLC_AGE) {
  GFR <- 0
  serumcreat <- 0

  if (any(!LAB_BCRE %in% 14:785) || (any(!CLC_SEX %in% c(1, 2)) || any(!PGDCGT %in% 1:13)) || any(!CLC_AGE %in% 3:79)) {
    GFR <- haven::tagged_na("b") # GFR is NA if any non-responses found
  } else {
    serumcreat <- LAB_BCRE / 88.4 # Proceeds without non-responses

    if (!is.na(CLC_SEX) && !is.na(PGDCGT) && serumcreat != 0) {
      if (CLC_SEX == 2 && PGDCGT == 2) {
        GFR <- 175 * ((serumcreat)^(-1.154)) * ((CLC_AGE)^(-0.203)) * (0.742) * (1.210) # female and black
      } else if (CLC_SEX == 2 && PGDCGT != 2) {
        GFR <- 175 * ((serumcreat)^(-1.154)) * ((CLC_AGE)^(-0.203)) * (0.742) # female and not black
      } else if (CLC_SEX == 1 && PGDCGT == 2) {
        GFR <- 175 * ((serumcreat)^(-1.154)) * ((CLC_AGE)^(-0.203)) * (1.210) # male and black
      } else if (CLC_SEX == 1 && PGDCGT != 2) {
        GFR <- 175 * ((serumcreat)^(-1.154)) * ((CLC_AGE)^(-0.203)) # male and not black
      }
    } else {
      GFR <- haven::tagged_na("b") # Handle case where CLC_SEX or PGDCGT is NA or serumcreat is 0
    }
  }

  return(GFR)
}

# Transform the derived GFR variable together with its four base variables
cycle2_gfr <- recodeflow:::rec_with_table(
  cycle2,
  variables = c("gfr", "lab_bcre", "pgdcgt", "clc_sex", "clc_age"),
  variable_details = variable_details,
  log = TRUE
)
Using the passed data variable name as database_name
The variable clc_age was recoded into clc_age for the database cycle2 the following recodes were made: 
  value_to       From rows_recoded
1     copy    [3, 80]            4
2    NA::a        996            0
3    NA::b [997, 999]            0
4     <NA>       else            0
The variable clc_sex was recoded into clc_sex for the database cycle2 the following recodes were made: 
  value_to   From rows_recoded
1        1      1            2
2        2      2            2
3    NA::a      6            0
4    NA::b [7, 9]            0
5    NA(b)   else            0
The variable lab_bcre was recoded into lab_bcre for the database cycle2 the following recodes were made: 
  value_to         From rows_recoded
1     copy    [14, 785]            4
2    NA::a [9994, 9996]            0
3    NA::b [9997, 9999]            0
4     <NA>         else            0
NOTE for pgdcgt: Respondents who respond as indigenous to previous question are identified as 'not applicable' in this question. Recode to "other", as per OCAP.
The variable sdcdcgt was recoded into pgdcgt for the database cycle2 the following recodes were made: 
   value_to     From rows_recoded
1         1        1            3
2         2        2            1
3         3        3            0
4         4        4            0
5         5        5            0
6         6        6            0
7         7        7            0
8         8        8            0
9         9        9            0
10       10       10            0
11       11       11            0
12       12       12            0
13       13       13            0
14       12       96            0
15    NA::b [97, 99]            0
16    NA(b)     else            0

Derived variables are calculated from previously transformed variables. Even if you only want to transform the derived variable, you must therefore also specify its base CHMS variables in rec_with_table(), as shown above. For the derived GFR variable, this means also specifying the blood creatinine (lab_bcre), ethnicity (pgdcgt), sex (clc_sex), and age (clc_age) variables.
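One simple way to guard against a forgotten base variable is to compare the names you plan to request with the names listed in the DerivedVar:: entry before calling rec_with_table(). The sketch below does this with base R set operations; the object names are illustrative only.

```r
# Base variables listed in the DerivedVar:: entry for gfr (from this article).
required <- c("lab_bcre", "pgdcgt", "clc_sex", "clc_age")

# A request that names the derived variable but only one of its inputs.
requested <- c("gfr", "lab_bcre")

# setdiff() returns the base variables that still need to be added.
missing_vars <- setdiff(required, requested)
print(missing_vars)

# union() gives the complete vector to pass as the `variables` argument.
variables <- union(requested, required)
print(variables)
```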

Creating a derived variable

Creating a derived variable requires two things: harmonized versions of the existing CHMS variables it depends on, and a custom function that uses those harmonized variables. For more information on how to create a derived variable, see here.
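As a minimal sketch of what such a custom function might look like, the example below follows the general shape of calculate_GFR: check the inputs, return a missing value when they are unusable, and otherwise apply the equation. The function name, argument names, and validity rule are all placeholders, and plain NA is used instead of haven::tagged_na("b") only to keep the sketch free of package dependencies.

```r
# Hypothetical custom function for a derived ratio of two harmonized variables.
# VAR_A and VAR_B are placeholders, not real CHMS variable names.
calculate_example_ratio <- function(VAR_A, VAR_B) {
  # Return NA when either input is missing or not a positive number.
  # (calculate_GFR returns haven::tagged_na("b") in the equivalent branch.)
  if (is.na(VAR_A) || is.na(VAR_B) || VAR_A <= 0 || VAR_B <= 0) {
    return(NA_real_)
  }
  VAR_A / VAR_B
}

# The matching variable_details.csv entries would then look something like:
#   variableStart: DerivedVar::[var_a, var_b]
#   RecTo:         Func::calculate_example_ratio

ratio_ok <- calculate_example_ratio(90, 100)
ratio_na <- calculate_example_ratio(NA, 100)
print(ratio_ok)
print(ratio_na)
```

Like calculate_GFR, this sketch expects one value per argument rather than whole columns.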