Introduction

There are two types of derived variables in the CCHS surveys, and cchsflow supports both.

  • Variable mapping - mapping two or more variables into a single variable.
  • Variables that are derived using math equations - BMI is an example, where BMI = weight / (height × height), with weight in kilograms and height in meters.
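As a quick check of the formula, the following R sketch computes BMI for a height of 1.651 m and a weight of 81 kg (values that also appear in the combined table later in this section). The names height_m and weight_kg are illustrative only and are not cchsflow variables.

```r
# Illustrative values only; these names are not part of cchsflow
height_m  <- 1.651   # height in meters
weight_kg <- 81      # weight in kilograms

# BMI = weight / (height * height)
bmi <- weight_kg / (height_m * height_m)
print(round(bmi, 2))
```

Note that the denominator is the square of the height, so the parentheses around height * height matter.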

cchsflow calculates these more complex derived variables using functions that are referenced in the RecTo section of variable_details.csv with the prefix ‘Func::’. The variables passed to the function are listed in the variableStart section with the prefix ‘DerivedVar::’. For example, BMI (HWTGBMI_der) has Func::bmi_fun in the RecTo section and DerivedVar::[HWTGHTM, HWTGWTK] in the variableStart section, which identifies the two starting variables (HWTGHTM and HWTGWTK).
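To make the two prefixes concrete, here is a minimal sketch of how such entries could be resolved into a function call. This is not cchsflow's internal code: parse_func(), parse_derived_vars(), the toy data frame, and the simplified bmi_fun stand-in are all hypothetical and exist only for illustration.

```r
# Sketch only: NOT cchsflow's implementation. It shows one way the
# 'Func::' and 'DerivedVar::' entries could be turned into a function call.

# Entries as described in the text for HWTGBMI_der
rec_to         <- "Func::bmi_fun"
variable_start <- "DerivedVar::[HWTGHTM, HWTGWTK]"

# Simplified stand-in for the derived-variable function
bmi_fun <- function(HWTGHTM, HWTGWTK) {
  HWTGWTK / (HWTGHTM * HWTGHTM)
}

# Hypothetical helper: strip the prefix to get the function name
parse_func <- function(x) sub("^Func::", "", x)

# Hypothetical helper: strip the prefix and brackets, then split on commas
parse_derived_vars <- function(x) {
  inner <- sub("^DerivedVar::\\[(.*)\\]$", "\\1", x)
  trimws(strsplit(inner, ",")[[1]])
}

func_name  <- parse_func(rec_to)                  # "bmi_fun"
start_vars <- parse_derived_vars(variable_start)  # c("HWTGHTM", "HWTGWTK")

# Toy data standing in for already-harmonized height and weight
toy <- data.frame(HWTGHTM = c(1.651, 1.753), HWTGWTK = c(81, 81))

# Call the named function, passing the starting variables in order
toy$HWTGBMI_der <- do.call(func_name, unname(as.list(toy[start_vars])))
print(toy)
```

The point of the sketch is the division of labour: the ‘Func::’ entry names the function, and the ‘DerivedVar::’ entry names its inputs in the order the function expects them.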

Example - Body Mass Index (BMI)

BMI is calculated in all CCHS cycles, but the method used to calculate it varies between cycles, which can introduce misclassification error that might affect your study. For this reason, cchsflow provides a derived BMI variable that uses harmonized height (HWTGHTM) and weight (HWTGWTK) variables across all CCHS cycles.

Using rec_with_table() you can transform the derived BMI variable across multiple CCHS cycles and create a transformed dataset.

To derive variables, you must first load the existing custom function associated with the derived variable:

# Custom ifelse for evaluating NA
if_else2 <- function(x, a, b) {
  falseifNA <- function(x) {
    ifelse(is.na(x), FALSE, x)
  }
  ifelse(falseifNA(x), a, b)
}

# BMI derived variable
# HWTGHTM: height (in meters)
# HWTGWTK: weight (in kilograms)
# Returns weight / height^2, or NA when either input is missing
bmi_fun <- function(HWTGHTM, HWTGWTK) {
  if_else2(
    (!is.na(HWTGHTM)) & (!is.na(HWTGWTK)),
    HWTGWTK / (HWTGHTM * HWTGHTM),
    NA
  )
}
bmi2003 <- rec_with_table(cchs2003_p, variables = c("HWTGHTM", "HWTGWTK", 
            "HWTGBMI_der"), log = TRUE)
## No variable_details detected.
##               Loading cchsflow variable_details
## Using the passed data variable name as database_name
## NOTE for HWTGHTM: 2001 and 2003 CCHS use inches, values converted to meters to 3 decimal points
## NOTE for HWTGHTM: 74+ inches converted to 76 inches
## The variable HWTCGHT was recoded into HWTGHTM for the database cchs2003_p the following recodes were made:
##    value_to From rows_recoded
## 1     1.118    1            0
## 2     1.143    2            0
## 3     1.168    3            0
## 4     1.194    4            0
## 5     1.219    5            0
## 6     1.245    6            0
## 7      1.27    7            0
## 8     1.295    8            0
## 9     1.321    9            0
## 10    1.346   10            0
## 11    1.372   11            0
## 12    1.397   12            0
## 13    1.422   13            0
## 14    1.448   14            1
## 15    1.473   15            0
## 16    1.499   16            2
## 17    1.524   17            5
## 18    1.549   18            8
## 19    1.575   19           14
## 20      1.6   20           16
## 21    1.626   21           17
## 22    1.651   22           19
## 23    1.676   23           17
## 24    1.702   24           25
## 25    1.727   25           16
## 26    1.753   26           10
## 27    1.778   27           13
## 28    1.803   28           14
## 29    1.829   29           10
## 30    1.854   30            6
## 31     1.93   31            6
## 32    NA::a   96            0
## 33    NA::b   99            1
## The variable HWTCGWTK was recoded into HWTGWTK for the database cchs2003_p the following recodes were made:
##   value_to         From rows_recoded
## 1     copy [27.0,135.0]          192
## 2    NA::a          996            0
## 3    NA::b    [997,999]            8
bmi2010 <- rec_with_table(cchs2010_p, variables = c("HWTGHTM", "HWTGWTK",
            "HWTGBMI_der"), log = TRUE)
## No variable_details detected.
##               Loading cchsflow variable_details
## Using the passed data variable name as database_name
## NOTE for HWTGHTM: Height is a reported in meters from 2005 CCHS onwards
## The variable HWTGHTM was recoded into HWTGHTM for the database cchs2010_p the following recodes were made:
##   value_to          From rows_recoded
## 1     copy [0.914,2.134]          190
## 2    NA::a         9.996            2
## 3    NA::b [9.997,9.999]            8
## The variable HWTGWTK was recoded into HWTGWTK for the database cchs2010_p the following recodes were made:
##   value_to            From rows_recoded
## 1     copy    [27.0,135.0]          186
## 2    NA::a          999.96            0
## 3    NA::b [999.97,999.99]           14

Because derived variables are calculated from previously transformed variables, you must specify the base CCHS variables in rec_with_table() even if you only want the derived variable, as shown above. For the derived BMI variable, this means also specifying the height (HWTGHTM) and weight (HWTGWTK) variables.

Using bind_rows(), you can then combine your transformed datasets.

HWTGHTM  HWTGWTK  HWTGBMI_der
  1.651       81     29.71604
  1.753       81     26.35853
  1.575       77     31.04056
  1.829      106     31.68681
  1.727       72     24.14059
  1.549       81     33.75843
  1.727       81     27.15816
  1.651       NA           NA
  1.727       77     25.81702
  1.676       68     24.20811
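The combining step can be sketched as follows. The two small data frames are toy stand-ins for bmi2003 and bmi2010: a few rows are borrowed from the table above and split between the two arbitrarily, so they are not real per-cycle records. bind_rows() comes from the dplyr package, which is assumed to be installed.

```r
library(dplyr)

# Toy stand-ins for the transformed datasets (not real CCHS records)
bmi2003_toy <- data.frame(
  HWTGHTM     = c(1.651, 1.753),
  HWTGWTK     = c(81, 81),
  HWTGBMI_der = c(29.71604, 26.35853)
)
bmi2010_toy <- data.frame(
  HWTGHTM     = c(1.651, 1.727),
  HWTGWTK     = c(NA, 77),
  HWTGBMI_der = c(NA, 25.81702)
)

# Stack the rows of both datasets; columns are matched by name
combined_bmi <- bind_rows(bmi2003_toy, bmi2010_toy)
print(combined_bmi)
```

Because the harmonized variables share the same names in every cycle, the rows stack without any renaming.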

Creating a derived variable

Creating a derived variable requires two things: harmonized versions of the existing CCHS variables it depends on, and a custom function that uses those harmonized variables. For more information on how to create a derived variable, see the cchsflow package documentation.