4.1 Customize Data

Other years of CCHS can be used but only if the required variables are transformed to the 2013/2014 CCHS format.

Before using the Project Big Life Planning Tool, you may want to transform your data set, for example to create customized filters or stratifications, or to transform variables from other population health surveys to the 2013/2014 Canadian Community Health Survey (CCHS) format. Any variable that you create can be selected within the planning tool in the filter and/or stratify options.

Data manipulation can be done in any statistical software (R, SAS, Stata, etc.), as long as you output your data set as a ‘.csv’ file.
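As a minimal sketch of the export step in R, the example below writes a small data frame to a ‘.csv’ file and reads it back. The toy values and the file name `cchs_custom.csv` are assumptions for illustration only; substitute your own data set and path.

```r
# Toy data set standing in for your transformed survey data (hypothetical values).
data <- data.frame(
  DHHGAGE = c(5, 9, 12),
  HWTGBMI = c(22.68, 26.99, 34.44)
)

# Write the data set to a .csv file.
# row.names = FALSE avoids adding an extra index column to the file.
out_file <- file.path(tempdir(), "cchs_custom.csv")
write.csv(data, out_file, row.names = FALSE)

# Read the file back to confirm the columns survived the round trip.
check <- read.csv(out_file)
print(names(check))
```

Writing to `tempdir()` keeps the sketch self-contained; in practice you would write to a location you can upload from.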

4.1.1 Customize filter and/or stratification

An example of customizing your data set is converting Body Mass Index (CCHS 2013/2014 variable HWTGBMI) from a continuous variable into four distinct categories:

  • Underweight: BMI less than 18.5
  • Normal or Healthy Weight: BMI of 18.5 to 24.9
  • Overweight: BMI of 25.0 to 29.9
  • Obese: BMI greater than or equal to 30.0

Steps

The following steps show the R code that would be used to create these strata:

  1. Convert “Not stated” observations in the BMI column from 999.99 to NA
    data$HWTGBMI[data$HWTGBMI == 999.99] <- NA
  2. Load the R package dplyr. This package is used for data manipulation.
    library(dplyr)
  3. Create a new column that contains four categories for BMI. Setting right = FALSE closes each interval on the left, so a BMI of exactly 18.5, 25.0 or 30.0 falls in the higher category, matching the definitions above.
    data$newcolumn <- cut(data$HWTGBMI, breaks = c(0, 18.5, 25, 30, Inf), labels = c("Underweight", "Healthy", "Overweight", "Obese"), right = FALSE)
  4. The output will be your data set plus a new column with the corresponding category (“Underweight”, “Healthy”, “Overweight”, “Obese”) for each individual.
    HWTGBMI   newcolumn
  1   22.68     Healthy
  2   26.99  Overweight
  3      NA        <NA>
  4   34.44       Obese
  5   23.77     Healthy
  6   17.23 Underweight

This new column can now be used with the Project Big Life Planning Tool for filtering or stratification.
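For reference, the steps can be put together as one self-contained sketch. The toy BMI values are hypothetical and include the boundary values 18.5, 25 and 30 to show where they land. The sketch assumes the category definitions listed earlier, so it passes `right = FALSE` to `cut()`; with the default `right = TRUE`, a BMI of exactly 18.5 would be labelled "Underweight".

```r
# Toy data (hypothetical values); 999.99 is the "Not stated" code for HWTGBMI.
data <- data.frame(
  HWTGBMI = c(22.68, 26.99, 999.99, 34.44, 18.5, 25, 30, 17.23)
)

# Recode "Not stated" to NA in the BMI column only.
data$HWTGBMI[data$HWTGBMI == 999.99] <- NA

# right = FALSE makes each interval closed on the left:
# [0, 18.5), [18.5, 25), [25, 30), [30, Inf).
data$newcolumn <- cut(
  data$HWTGBMI,
  breaks = c(0, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Healthy", "Overweight", "Obese"),
  right = FALSE
)

# Count individuals per category, keeping missing values visible.
print(table(data$newcolumn, useNA = "ifany"))
```

Tabulating the new column before exporting is a quick way to confirm that no category is unexpectedly empty and that missing values were handled as intended.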

4.1.2 Transforming variables to 2013/2014 CCHS format

Population health surveys other than the 2013/2014 CCHS PUMF can be used if the required variables (sex, age and survey weight) are transformed to the 2013/2014 CCHS format.

For example, the 2005/2006 CCHS data can be used if the 2005/2006 CCHS variable for age is renamed from DHHEGAGE to DHHGAGE (2013/2014 format). Although the variable name for age differs between the 2005/2006 and 2013/2014 surveys, both variables capture identical information.

library(dplyr)

CCHSdata.2005.2006 <- CCHSdata.2005.2006 %>%
  rename(DHHGAGE = DHHEGAGE)
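The same rename can also be sketched in base R, followed by a check that the required columns are present. The toy data frame is hypothetical, and the names `DHH_SEX` and `WTS_M` for sex and survey weight are assumptions used for illustration; confirm the exact variable names against the data dictionary for your survey cycle.

```r
# Toy stand-in for a 2005/2006 extract (hypothetical values and column set).
CCHSdata.2005.2006 <- data.frame(
  DHHEGAGE = c(3, 7, 11),
  DHH_SEX  = c(1, 2, 2),
  WTS_M    = c(150.2, 98.7, 210.4)
)

# Base R equivalent of rename(DHHGAGE = DHHEGAGE).
names(CCHSdata.2005.2006)[names(CCHSdata.2005.2006) == "DHHEGAGE"] <- "DHHGAGE"

# Assumed 2013/2014 names for the required variables; verify before relying on them.
required <- c("DHH_SEX", "DHHGAGE", "WTS_M")

# Stop early with a clear message if any required variable is still missing.
missing_vars <- setdiff(required, names(CCHSdata.2005.2006))
if (length(missing_vars) > 0) {
  stop("Missing required variables: ", paste(missing_vars, collapse = ", "))
}
```

Failing early on a missing column is usually easier to diagnose than an error raised later, after the file has been exported.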

All required variables must be transformed to the 2013/2014 CCHS format. It is also recommended that you transform the recommended variables to increase the accuracy of the prediction(s).

When transforming required or recommended variables, pay attention to how the data were captured. There may be differences that limit your ability to use the data set or that affect your results. For instance, in the 2001-2004 CCHS age is captured in 15 categories, whereas in the 2005-2013 CCHS it is captured in 16 categories.
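One way to spot such differences is to compare the category codes used in each cycle before transforming anything. The sketch below uses made-up age-group codes, not real CCHS values, and the object names are hypothetical.

```r
# Hypothetical age-group codes from two survey cycles (toy values, not real CCHS data).
age_2003 <- c(1, 2, 15, 7, 15)
age_2013 <- c(1, 2, 16, 7, 15)

# List the distinct codes used in each cycle.
codes_2003 <- sort(unique(age_2003))
codes_2013 <- sort(unique(age_2013))

# Codes that appear in one cycle but not the other need a documented recode.
only_in_2013 <- setdiff(codes_2013, codes_2003)
print(only_in_2013)
```

Matching codes do not guarantee matching definitions: the same code can cover a different age range in another cycle, so the comparison should always be checked against each cycle's data dictionary.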