Recode with Table is responsible for recoding values of a dataset based on the specifications in variable_details.

rec_with_table(
  data,
  variables = NULL,
  database_name = NULL,
  variable_details = NULL,
  else_value = NA,
  append_to_data = FALSE,
  log = FALSE,
  notes = TRUE,
  var_labels = NULL,
  custom_function_path = NULL,
  attach_data_name = FALSE
)

Arguments

data

A dataframe containing the variables to be recoded. Can also be a list of dataframes

variables

Character vector containing the names of the variables to recode, or a variables csv containing additional variable information.

database_name

String, the name of the dataset containing the variables to be recoded. Can also be a vector of strings if data is a list

variable_details

A dataframe containing the specifications (rules) for recoding.

else_value

Value (string, number, integer, logical or NA) used to replace any values that fall outside the specified ranges, i.e. values that no recoding rule covers.

append_to_data

Logical, if TRUE, recoded variables will be appended to the data. The default is FALSE.

log

Logical, if FALSE (default), a log of recoding will not be printed.

notes

Logical, if TRUE (default), prints the content of the `Note` column for the variable being recoded.

var_labels

Vector of labels to attach to the variables named in variables.

custom_function_path

path to location of the function to load

attach_data_name

Logical, whether to attach the name of the database to the resulting table.

Value

a dataframe that is recoded according to rules in variable_details.

Details

The variable_details dataframe needs the following variables to function:

variable

name of new (mutated) variable that is recoded

toType

type the variable is being recoded to: cat = categorical, cont = continuous

databaseStart

name of dataframe with original variables to be recoded

variableStart

name of variable to be recoded

fromType

variable type of the start variable: cat = categorical or factor variable, cont = continuous variable (real number or integer)

recTo

Value to recode to

recFrom

Value/range being recoded from
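
As a rough illustration of the columns listed above, a variable_details table could be built by hand as follows. The names used here (age_cat, AGE, survey2001) are hypothetical and not taken from any real dataset, and tables shipped with a package may carry additional columns.

```r
# Hypothetical variable_details table: recode a continuous AGE variable
# from a dataframe called survey2001 into a categorical variable age_cat.
variable_details <- data.frame(
  variable      = c("age_cat", "age_cat", "age_cat"),
  toType        = c("cat", "cat", "cat"),
  databaseStart = c("survey2001", "survey2001", "survey2001"),
  variableStart = c("AGE", "AGE", "AGE"),
  fromType      = c("cont", "cont", "cont"),
  recTo         = c("1", "2", "NA"),
  recFrom       = c("min:39", "40:max", "else"),
  stringsAsFactors = FALSE
)

# One row per category of the new variable
print(variable_details[, c("variable", "recFrom", "recTo")])
```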

Each row in variable_details comprises one category of a newly transformed variable. The rule for each category of the new variable is a string in recFrom and a value in recTo. These recode pairs use the same syntax as sjmisc::rec(), except that in sjmisc::rec() the pairs are passed as a single string to the rec argument, with the old and new values separated by '='. For example, in rec_with_table(), variable_details$recFrom = 2; variable_details$recTo = 4 is the same as sjmisc::rec(rec = "2=4"). The pairs are obtained from the recFrom and recTo columns.
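
To make the correspondence concrete, the sketch below pastes hypothetical recFrom and recTo columns into the single string form that sjmisc::rec() takes in its rec argument. It uses base R only and does not call either package; the values are made up for illustration.

```r
# Hypothetical recFrom/recTo columns for one new variable
details <- data.frame(
  recFrom = c("2", "1,3", "else"),
  recTo   = c("4", "1", "NA"),
  stringsAsFactors = FALSE
)

# Join each row into "from=to", then join the rows with ";",
# which is how sjmisc::rec() separates recode pairs
rec_string <- paste0(details$recFrom, "=", details$recTo, collapse = "; ")
print(rec_string)
#> [1] "2=4; 1,3=1; else=NA"
```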

recode pairs

each recode pair is a row. See the example above or PBC-variableDetails.csv

multiple values

multiple old values that should be recoded into a single new value may be separated with a comma, e.g. recFrom = "1,2"; recTo = 1

value range

a value range is indicated by a colon, e.g. recFrom = "1:4"; recTo = 1 (recodes all values from 1 to 4 into 1)

value range for doubles

for double vectors (with fractional part), all values within the specified range are recoded; e.g. recFrom = "1:2.5"; recTo = 1 recodes 1 to 2.5 into 1, but 2.55 would not be recoded (since it's not included in the specified range)

"min" and "max"

minimum and maximum values are indicated by min (or lo) and max (or hi), e.g. recFrom = "min:4"; recTo = 1 (recodes all values from the minimum value of x to 4 into 1)

"else"

all other values, which have not been specified yet, are indicated by else, e.g. recFrom = "else"; recTo = NA (recodes all other values, not specified in other rows, to NA)

"copy"

the "else"-token can be combined with copy, indicating that all remaining, not yet recoded values should stay the same (are copied from the original value), e.g. recFrom = "else"; recTo = "copy"

NA's

NA values are allowed both as old and new value, e.g. recFrom = "NA"; recTo = 1, or recFrom = "3:5"; recTo = "NA" (recodes all NA into 1, and all values from 3 to 5 into NA in the new variable)
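
The rule tokens described above can be sketched in a few lines of base R. This is an illustrative toy interpreter for numeric vectors only, not the package's implementation: the helpers matches_rule() and recode_sketch() are hypothetical, and applying rules in row order so that the first matching rule wins is an assumption made for this sketch.

```r
# Hypothetical helper: TRUE where x matches a single recFrom rule
# (comma lists, ranges with ":", min/lo and max/hi, and "NA").
matches_rule <- function(x, rec_from) {
  if (rec_from == "NA") return(is.na(x))
  hit <- rep(FALSE, length(x))
  for (part in trimws(strsplit(rec_from, ",", fixed = TRUE)[[1]])) {
    if (grepl(":", part, fixed = TRUE)) {
      bounds <- strsplit(part, ":", fixed = TRUE)[[1]]
      lo <- if (bounds[1] %in% c("min", "lo")) min(x, na.rm = TRUE) else as.numeric(bounds[1])
      hi <- if (bounds[2] %in% c("max", "hi")) max(x, na.rm = TRUE) else as.numeric(bounds[2])
      hit <- hit | (!is.na(x) & x >= lo & x <= hi)   # inclusive range
    } else {
      hit <- hit | (!is.na(x) & x == as.numeric(part))
    }
  }
  hit
}

# Hypothetical helper: apply recFrom/recTo pairs in row order.
# Assumption: a value is recoded by the first rule that matches it.
recode_sketch <- function(x, rec_from, rec_to) {
  out  <- rep(NA_real_, length(x))
  done <- rep(FALSE, length(x))
  for (i in seq_along(rec_from)) {
    target <- if (rec_from[i] == "else") !done else matches_rule(x, rec_from[i]) & !done
    if (rec_to[i] == "copy") {
      out[target] <- x[target]            # keep the original value
    } else if (rec_to[i] == "NA") {
      out[target] <- NA_real_
    } else {
      out[target] <- as.numeric(rec_to[i])
    }
    done <- done | target
  }
  out
}

x <- c(1, 2, 2.5, 2.55, 4, 7, NA)
result <- recode_sketch(
  x,
  rec_from = c("1:2.5", "4", "NA", "else"),
  rec_to   = c("1", "2", "9", "copy")
)
print(result)
```

In this run 1, 2 and 2.5 fall inside "1:2.5" and become 1, the value 4 becomes 2, and NA becomes 9. The values 2.55 and 7 match no earlier rule, so the else/copy row passes them through unchanged.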

Examples

library(cchsflow)
library(dplyr) # provides bind_rows(), used below

bmi2001 <- rec_with_table(
  data = cchs2001_p,
  variables = c("HWTGHTM", "HWTGWTK", "HWTGBMI_der")
)
#> No variable_details detected.
#> Loading cchsflow variable_details
#> Using the passed data variable name as database_name
#> NOTE for HWTGHTM: 2001 and 2003 CCHS use inches, values converted to meters to 3 decimal points
#> NOTE for HWTGHTM: 74+ inches converted to 76 inches

head(bmi2001)
#>   HWTGHTM HWTGWTK HWTGBMI_der
#> 1   1.422   56.25    27.81784
#> 2   1.549   51.75    21.56788
#> 3   1.803   78.75    24.22474
#> 4   1.575   60.75    24.48980
#> 5   1.727   63.00    21.12301
#> 6   1.829   91.80    27.44197

bmi2011_2012 <- rec_with_table(
  data = cchs2011_2012_p,
  variables = c("HWTGHTM", "HWTGWTK", "HWTGBMI_der")
)
#> No variable_details detected.
#> Loading cchsflow variable_details
#> Using the passed data variable name as database_name
#> NOTE for HWTGHTM: Height is a reported in meters from 2005 CCHS onwards

tail(bmi2011_2012)
#>     HWTGHTM HWTGWTK HWTGBMI_der
#> 195   1.600   64.35    25.13672
#> 196   1.880   80.10    22.66297
#> 197   1.753   78.75    25.62635
#> 198   1.651   83.70    30.70657
#> 199   1.930   74.25    19.93342
#> 200   1.575   76.50    30.83900

combined_bmi <- bind_rows(bmi2001, bmi2011_2012)

head(combined_bmi)
#>   HWTGHTM HWTGWTK HWTGBMI_der
#> 1   1.422   56.25    27.81784
#> 2   1.549   51.75    21.56788
#> 3   1.803   78.75    24.22474
#> 4   1.575   60.75    24.48980
#> 5   1.727   63.00    21.12301
#> 6   1.829   91.80    27.44197

tail(combined_bmi)
#>     HWTGHTM HWTGWTK HWTGBMI_der
#> 395   1.600   64.35    25.13672
#> 396   1.880   80.10    22.66297
#> 397   1.753   78.75    25.62635
#> 398   1.651   83.70    30.70657
#> 399   1.930   74.25    19.93342
#> 400   1.575   76.50    30.83900