Missing data (NA –> tagged_na)

Base R supports only one type of NA (‘not available’) to represent missing values. However, your data may include several types of missing data or incomplete category responses.

To work around this single NA, we use tagged_na() from the haven package.

tagged_na() attaches a single-letter tag to an NA value, allowing users to define additional missing-data types. tagged_na() works only with numeric (double) vectors; character vectors can use any string to represent NA or missing data.
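The numeric versus character distinction can be sketched as follows. This is an illustration only: the variable names and the sentinel strings ("not applicable", "missing") are assumptions made for the example, not values prescribed by haven or recodeflow.

```r
library(haven)

# Numeric variable: missing-data types are stored as tagged NAs in a double vector
age <- c(25, 40, tagged_na("a"), tagged_na("b"))
typeof(age)   # the vector is still an ordinary double vector
is.na(age)    # both tagged values count as missing

# Character variable: tagged_na() does not apply, so plain strings stand in
# for the missing-data types (hypothetical labels chosen for this sketch)
smoker <- c("yes", "no", "not applicable", "missing")
missing_labels <- c("not applicable", "missing")
smoker %in% missing_labels   # flags the entries treated as missing
```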

recodeflow approach to coding missing data

recodeflow recodes missing-data category values into three tagged NA values that are commonly used across studies:

  • NA(a) = ‘not applicable’
  • NA(b) = ‘missing’
  • NA(c) = ‘not asked’, for a question that was not asked in a given survey cycle.

Summary of tagged_na values and their corresponding category values:

recodeflow tagged_na    category value
NA(a)                   6
NA(b)                   7
NA(b)                   8
NA(b)                   9
NA(c)                   question not asked in the survey cycle
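The mapping in the summary above can be sketched by hand as shown below. This is a minimal illustration, not recodeflow's own implementation: `recode_missing()` and the `raw` vector are hypothetical names introduced for the example, and it assumes the category codes 6 to 9 appear literally in the raw data. NA(c) is not handled because it depends on whether the question was asked in a survey cycle, not on a response code.

```r
library(haven)

# Hypothetical raw responses: 1 to 3 are valid answers, 6 to 9 are missing-data codes
raw <- c(1, 2, 6, 7, 3, 8, 9)

# Hypothetical helper (not part of recodeflow): map category codes to tagged NAs
recode_missing <- function(values) {
  out <- as.double(values)                        # tagged NAs require a double vector
  out[values %in% 6] <- tagged_na("a")            # 6       -> NA(a), not applicable
  out[values %in% c(7, 8, 9)] <- tagged_na("b")   # 7, 8, 9 -> NA(b), missing
  out
}

recoded <- recode_missing(raw)

# Show the recoded vector with its tags, then the tags alone
print_tagged_na(recoded)
na_tag(recoded)
```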

Example haven::tagged_na()

library(haven)
x <- c(1:5, tagged_na("a"), tagged_na("b"))

# na_tag() reads the tag of each tagged NA; most other functions still treat tagged NAs as plain NA
na_tag(x)
## [1] NA  NA  NA  NA  NA  "a" "b"
# print_tagged_na() prints each NA together with its tag
print_tagged_na(x)
## [1]     1     2     3     4     5 NA(a) NA(b)
# Tagged NAs behave identically to regular NAs
x
## [1]  1  2  3  4  5 NA NA
is.na(x)
## [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
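Because tagged NAs look like ordinary NAs to base R, telling the missing-data types apart requires haven's own helpers. The following sketch rebuilds the same vector and uses is_tagged_na() and na_tag() to separate the tags; the summary steps are just one possible way to use them.

```r
library(haven)

x <- c(1:5, tagged_na("a"), tagged_na("b"))

# TRUE for any tagged NA, regardless of its tag
is_tagged_na(x)

# TRUE only for NAs carrying the tag "a"
is_tagged_na(x, "a")

# Base functions ignore the tag: both tagged values are dropped as ordinary NAs
mean(x, na.rm = TRUE)

# Count how many values carry each tag
table(na_tag(x))
```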