Missing data (NA –> tagged_na)

Base R supports only one type of NA (‘not available’) to represent missing values. The CCHS, however, has several types of missing data or incomplete category responses. cchsflow uses the haven package tagged_na() to allow multiple missing data categories. tagged_na() adds an addition character to the NA value, thereby allowing users to define additional missing data types. tagged_na() applies only for numeric values, as character base values can use any string to represent NA or missing data.

CCHS approach to coding missing data

CCHS category values can change across variables and survey cycles, but the most common values are:

CCHS category value Label
6 not applicable
7 don’t know
8 refusal
9 not stated

cchsflow recodes the four CCHS category missing data categories values into two NA values that are commonly used for most studies. NA(a) = ‘not applicable’ and NA(b) = ‘missing’. Furthermore, variables may be entirely missing from a specific cycle, which results in ‘not asked’ missing data that is not included or coded in CCHS surveys, recoded to NA(a)

Summary of tagged_na values and their corresponding CCHS category values.

cchsflow tagged_na CCHS category value Label
NA(a) 6 not applicable
NA(b) 7 don’t know
NA(b) 8 refusal
NA(b) 9 not stated
NA(c) question not asked in the survey cycle

Example haven::tagged_na()

library(haven)
x <- c(1:5, tagged_na("a"), tagged_na("b"))

# Is used to read the tagged NA in most other functions they are still viewed as NA
na_tag(x)
## [1] NA  NA  NA  NA  NA  "a" "b"
# Is used to print the na as well as their tag
print_tagged_na(x)
## [1]     1     2     3     4     5 NA(a) NA(b)
# Tagged NA's work identically to regular NAs
x
## [1]  1  2  3  4  5 NA NA
## [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE