There are two types of derived variables in the CCHS surveys. Both types of derived variables are supported in cchsflow.
cchsflow calculates these more complex derived variables using functions that are referenced in variable_details.csv within the RecTo section with the prefix ‘Func::’. The variables used in the function are referenced in the variableStart section with the prefix ‘DerivedVar::’. For example, BMI (HWTGBMI_der) includes Func::bmi_fun in the RecTo section and DerivedVar::[HWTGHTM, HWTGWTK] in the variableStart section, which indicates the two starting variables (HWTGHTM, HWTGWTK).
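To make the notation concrete, the following sketch shows one way the two prefixed entries could be read and applied in base R. It is an illustration only and not how cchsflow parses variable_details.csv internally; the object names (rec_to, variable_start, toy) and the simplified bmi_fun stand-in are assumptions made for this example.

```r
# Illustration only: these lines are NOT cchsflow internals.
rec_to <- "Func::bmi_fun"
variable_start <- "DerivedVar::[HWTGHTM, HWTGWTK]"

# Drop the "Func::" prefix to recover the function name.
fun_name <- sub("^Func::", "", rec_to)

# Drop "DerivedVar::[" and the closing "]", then split on commas.
inner <- sub("^DerivedVar::\\[(.*)\\]$", "\\1", variable_start)
start_vars <- trimws(strsplit(inner, ",", fixed = TRUE)[[1]])

# Simplified stand-in for bmi_fun: weight (kg) / height (m) squared.
bmi_fun <- function(HWTGHTM, HWTGWTK) HWTGWTK / (HWTGHTM * HWTGHTM)

# Look the function up by name and pass the starting variables to it.
toy <- data.frame(HWTGHTM = 1.651, HWTGWTK = 81)
toy$HWTGBMI_der <- do.call(get(fun_name, mode = "function"),
                           as.list(toy[start_vars]))
print(toy)
```

Because the starting variables are passed as a named list, they are matched to the function arguments by name, which is why the function arguments carry the same names as the harmonized variables.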
While BMI is calculated across all CCHS cycles, the method used to calculate it varies between cycles, which can lead to misclassification error that might affect your study. As such, cchsflow provides a derived BMI variable that uses harmonized height (HWTGHTM) and weight (HWTGWTK) variables across all CCHS cycles.
Using rec_with_table(), you can transform the derived BMI variable across multiple CCHS cycles and create a transformed dataset.
To derive variables, you must first load the existing custom function associated with the derived variable:
# Custom ifelse that treats an NA condition as FALSE
if_else2 <- function(x, a, b) {
  falseifNA <- function(x) {
    ifelse(is.na(x), FALSE, x)
  }
  ifelse(falseifNA(x), a, b)
}
# BMI derived variable
# HWTGHTM: height (in meters)
# HWTGWTK: weight (in kilograms)
# Returns weight / height^2, or NA if either input is missing
bmi_fun <-
  function(HWTGHTM,
           HWTGWTK) {
    if_else2((!is.na(HWTGHTM)) & (!is.na(HWTGWTK)),
             (HWTGWTK / (HWTGHTM * HWTGHTM)), NA)
  }
bmi2003 <- rec_with_table(cchs2003_p, variables = c("HWTGHTM", "HWTGWTK",
"HWTGBMI_der"), log = TRUE)
## No variable_details detected.
## Loading cchsflow variable_details
## Using the passed data variable name as database_name
## NOTE for HWTGHTM: 2001 and 2003 CCHS use inches, values converted to meters to 3 decimal points
## NOTE for HWTGHTM: 74+ inches converted to 76 inches
## The variable HWTCGHT was recoded into HWTGHTM for the database cchs2003_p the following recodes were made:
## value_to From rows_recoded
## 1 1.118 1 0
## 2 1.143 2 0
## 3 1.168 3 0
## 4 1.194 4 0
## 5 1.219 5 0
## 6 1.245 6 0
## 7 1.27 7 0
## 8 1.295 8 0
## 9 1.321 9 0
## 10 1.346 10 0
## 11 1.372 11 0
## 12 1.397 12 0
## 13 1.422 13 0
## 14 1.448 14 1
## 15 1.473 15 0
## 16 1.499 16 2
## 17 1.524 17 5
## 18 1.549 18 8
## 19 1.575 19 14
## 20 1.6 20 16
## 21 1.626 21 17
## 22 1.651 22 19
## 23 1.676 23 17
## 24 1.702 24 25
## 25 1.727 25 16
## 26 1.753 26 10
## 27 1.778 27 13
## 28 1.803 28 14
## 29 1.829 29 10
## 30 1.854 30 6
## 31 1.93 31 6
## 32 NA::a 96 0
## 33 NA::b 99 1
## The variable HWTCGWTK was recoded into HWTGWTK for the database cchs2003_p the following recodes were made:
## value_to From rows_recoded
## 1 copy [27.0,135.0] 192
## 2 NA::a 996 0
## 3 NA::b [997,999] 8
bmi2010 <- rec_with_table(cchs2010_p, variables = c("HWTGHTM", "HWTGWTK",
"HWTGBMI_der"), log = TRUE)
## No variable_details detected.
## Loading cchsflow variable_details
## Using the passed data variable name as database_name
## NOTE for HWTGHTM: Height is a reported in meters from 2005 CCHS onwards
## The variable HWTGHTM was recoded into HWTGHTM for the database cchs2010_p the following recodes were made:
## value_to From rows_recoded
## 1 copy [0.914,2.134] 190
## 2 NA::a 9.996 2
## 3 NA::b [9.997,9.999] 8
## The variable HWTGWTK was recoded into HWTGWTK for the database cchs2010_p the following recodes were made:
## value_to From rows_recoded
## 1 copy [27.0,135.0] 186
## 2 NA::a 999.96 0
## 3 NA::b [999.97,999.99] 14
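The 2003 log notes that heights recorded in inches are converted to meters to three decimal places. The logged values are consistent with multiplying by 0.0254 and rounding, as the sketch below checks for a few categories. The mapping from category code to inches (code k corresponding to 43 + k inches, with the top category set to 76 inches) is inferred from the log output and is an assumption, not something taken from the cchsflow source; inches_to_m is a hypothetical helper.

```r
# Hypothetical helper: 0.0254 m per inch, rounded to 3 decimals as in the log.
inches_to_m <- function(inches) round(inches * 0.0254, 3)

# Assumption inferred from the log: category code k corresponds to
# (43 + k) inches, and the top category (74+ inches) is set to 76 inches.
codes <- c(1, 22, 30, 31)
inches <- ifelse(codes == 31, 76, 43 + codes)

conversion <- data.frame(code = codes,
                         inches = inches,
                         meters = inches_to_m(inches))
print(conversion)
```

The resulting meter values can be compared against the value_to column of the HWTGHTM recode log for the same codes.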
Since derived variables are based on previously transformed variables, you must also specify the base CCHS variables in rec_with_table(), as shown above, even if you only want to transform the derived variable. For the derived BMI variable, this means also specifying the height (HWTGHTM) and weight (HWTGWTK) variables.
Using bind_rows(), you can then combine your transformed datasets.
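A minimal sketch of the combining step, assuming the dplyr package is installed. The two small data frames are made-up stand-ins for the transformed datasets (bmi2003 and bmi2010), and their values and cycle labels are illustrative only. Naming the inputs and setting .id is optional; it adds a column recording which dataset each row came from.

```r
library(dplyr)

# Made-up stand-ins for the transformed data frames (bmi2003, bmi2010).
cycle_2003 <- data.frame(HWTGHTM = c(1.651, 1.753), HWTGWTK = c(81, 81))
cycle_2010 <- data.frame(HWTGHTM = c(1.575, 1.829), HWTGWTK = c(77, 106))

# Stack the rows; the argument names become the values of the "cycle" column.
combined <- bind_rows(cchs2003 = cycle_2003,
                      cchs2010 = cycle_2010,
                      .id = "cycle")
print(combined)
```

With the real transformed datasets, the same call would take bmi2003 and bmi2010 in place of the stand-ins.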
HWTGHTM | HWTGWTK | HWTGBMI_der |
---|---|---|
1.651 | 81 | 29.71604 |
1.753 | 81 | 26.35853 |
1.575 | 77 | 31.04056 |
1.829 | 106 | 31.68681 |
1.727 | 72 | 24.14059 |
1.549 | 81 | 33.75843 |
1.727 | 81 | 27.15816 |
1.651 | NA | NA |
1.727 | 77 | 25.81702 |
1.676 | 68 | 24.20811 |
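The derived values in the table can be checked by hand, since BMI is weight in kilograms divided by the square of height in meters. The sketch below recomputes each row in base R using plain division rather than bmi_fun; a missing weight propagates to a missing BMI either way. The tolerance of 1e-4 is an assumption chosen to allow for rounding in the printed table.

```r
# Values copied from the table above.
height_m  <- c(1.651, 1.753, 1.575, 1.829, 1.727,
               1.549, 1.727, 1.651, 1.727, 1.676)
weight_kg <- c(81, 81, 77, 106, 72, 81, 81, NA, 77, 68)
listed    <- c(29.71604, 26.35853, 31.04056, 31.68681, 24.14059,
               33.75843, 27.15816, NA, 25.81702, 24.20811)

# BMI = weight (kg) / height (m)^2; NA in either input yields NA.
computed <- weight_kg / (height_m * height_m)

# A row agrees if both values are NA or if they differ by less than 1e-4.
check <- data.frame(
  listed   = listed,
  computed = round(computed, 5),
  agrees   = abs(computed - listed) < 1e-4 |
    (is.na(computed) & is.na(listed))
)
print(check)
```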
Creating a derived variable requires the harmonization of existing CCHS variables and a custom function that uses those harmonized variables. For more information on how to create a derived variable, see here.