recodeflow supports the use of derived variables. A derived variable can be computed by any custom function, as long as the value can be calculated on a per-row basis. Functions that require operations across rows or on the full data set are not supported.
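To make the per-row requirement concrete, the sketch below contrasts two hypothetical functions, `bmi_fun` and `z_score_fun`. Neither is part of recodeflow. The first computes each row's value from that row alone. The second needs the mean and standard deviation of the whole column, which makes it the kind of function recodeflow does not support.

```r
# Hypothetical example functions, for illustration only.

# Supported pattern: the result for row i depends only on row i's inputs.
bmi_fun <- function(height, weight) {
  as.numeric(weight) / as.numeric(height)^2
}

# Unsupported pattern: the result for row i depends on the whole column
# through mean() and sd(), so it cannot be computed row by row.
z_score_fun <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

height <- c(1.60, 1.75, 1.80)
weight <- c(60, 80, 90)

# Row-wise: computing row 1 alone gives the same value as computing it
# together with the other rows.
all_rows <- bmi_fun(height, weight)
row_one  <- bmi_fun(height[1], weight[1])
isTRUE(all.equal(all_rows[1], row_one))  # TRUE

# Not row-wise: the value for row 1 depends on the other rows, and on a
# single row sd() is NA, so the result is NA.
z_score_fun(weight)[1]
z_score_fun(weight[1])  # NA
```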
The two most common uses for derived variables are:
To create derived variables, you need to complete two steps:
We’ll walk through an example of creating a derived variable with our example data.
Our customized derived function multiplies the blood concentration of cholesterol (chol) by the blood concentration of bilirubin (bili).
Create the custom function. Here is the customized function for our derived variable (chol*bili):
#' example_der_fun calculates chol*bili
#' @param chol the row value for chol
#' @param bili the row value for bili
#' @return the product of chol and bili, or NA if either input is NA
#' @export
example_der_fun <- function(chol, bili){
  # as.character() before as.numeric() coerces factor (categorical) inputs by
  # their labels; as.numeric() alone would return the factor level codes.
  # Warning: if either chol or bili is NA, the result is NA.
  example_der <- as.numeric(as.character(chol)) * as.numeric(as.character(bili))
  return(example_der)
}
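The coercion comment in example_der_fun matters when an input arrives as a factor, which is a common way to store categorical variables in R. The short sketch below uses made-up values. It shows three things: as.numeric() on a factor returns level codes, converting through as.character() first recovers the values, and an NA in either input propagates to the product.

```r
# Made-up values for illustration; a factor stands in for a categorical input.
chol_factor <- factor(c("300", "150", NA))

# as.numeric() on a factor returns the internal level codes, not the labels.
# Levels are sorted as "150", "300", so "300" becomes 2 and "150" becomes 1.
codes <- as.numeric(chol_factor)
codes   # 2 1 NA

# Converting to character first recovers the numeric values themselves.
values <- as.numeric(as.character(chol_factor))
values  # 300 150 NA

# Row-wise product, as in example_der_fun: an NA in either input gives NA.
bili <- c(1.2, 0.8, 2.0)
values * bili  # 360 120 NA
```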
Note: You must document custom functions with roxygen2; otherwise, the function cannot be attached to a package. See the roxygen2 documentation for how to format and document your function.
Load the custom function into your R environment. Load the customized function by either:
rec_with_table
parameter to pass the path to your function R script. If you don't load the customized function, you cannot create the derived variable.
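If you load the function yourself, base R's source() is one way to do it. The sketch below writes a hypothetical script to a temporary file so that it runs anywhere. In practice you would source the path to your own script instead.

```r
# Hypothetical script containing the custom function; a temporary file is
# used here only so the example is self-contained.
fun_path <- tempfile(fileext = ".R")
writeLines(c(
  "example_der_fun <- function(chol, bili) {",
  "  as.numeric(as.character(chol)) * as.numeric(as.character(bili))",
  "}"
), fun_path)

# source() evaluates the script, defining example_der_fun in the global
# environment so it is available to later calls.
source(fun_path)

exists("example_der_fun")  # TRUE
example_der_fun(200, 1.5)  # 300
```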
Add the derived variable to the variable_details and variables worksheets.
Add the derived variable to the variables worksheet. You'll use the same nomenclature as for any other variable. See the article variables_sheet for nomenclature rules.
Add the derived variable to the variable_details worksheet. See the article variable_details for nomenclature rules.
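As an illustration only, the code below sketches what a variable_details entry for example_der might look like as an R data frame. Several details are assumptions based on recodeflow's worksheet conventions: the DerivedVar::[...] syntax for listing the input variables, the Func:: prefix for naming the custom function, and the subset of columns shown. The variable_details article is the authority on the exact format.

```r
# Illustrative only: column names and the DerivedVar::/Func:: conventions are
# assumptions; consult the variable_details article for the exact format.
derived_details <- data.frame(
  variable      = "example_der",               # name of the derived variable
  typeEnd       = "cont",                      # continuous result
  databaseStart = "tester1",                   # database the row applies to
  variableStart = "DerivedVar::[chol, bili]",  # input variables
  recEnd        = "Func::example_der_fun",     # custom function to call
  stringsAsFactors = FALSE
)

derived_details
```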
Use the function rec_with_table to recode your derived variable.
# Load the package
library(recodeflow)
# Recode chol and bili and the derived variable (example_der).
derived1 <- rec_with_table(data = tester1,
variables = c("chol", "bili","example_der"),
variable_details = variable_details,
log = TRUE)
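As a sanity check on the output, the derived column should equal the row-wise product of its inputs and be NA wherever either input is NA. The sketch below wraps that check in a hypothetical helper, check_product_column, and runs it on a small made-up data frame that stands in for derived1. With real output you would call it as check_product_column(derived1, "example_der", "chol", "bili"). This assumes those column names appear in the result.

```r
# Hypothetical helper: TRUE if df[[derived]] equals df[[a]] * df[[b]] row by
# row, with NA in the same rows (all.equal treats matching NAs as equal).
check_product_column <- function(df, derived, a, b) {
  to_num <- function(x) as.numeric(as.character(x))
  expected <- to_num(df[[a]]) * to_num(df[[b]])
  isTRUE(all.equal(to_num(df[[derived]]), expected))
}

# Made-up stand-in for recoded output (use derived1 in practice).
derived_mock <- data.frame(
  chol        = c(261, 302, NA),
  bili        = c(14.5, 1.1, 1.4),
  example_der = c(261 * 14.5, 302 * 1.1, NA)
)

check_product_column(derived_mock, "example_der", "chol", "bili")  # TRUE
```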
## Using the passed data variable name as database_name
## NOTE for bili: This is sample survival pbc data
## The variable bili was recoded into bili for the database tester1 the following recodes were made:
## value_to From rows_recoded
## 1 copy [0,28] 209
## 2 <NA> else 0
## NOTE for chol: This is sample survival pbc data
## NOTE for chol: This is sample survival pbc data
## The variable chol was recoded into chol for the database tester1 the following recodes were made:
## value_to From rows_recoded
## 1 copy [120, 1775] 186
## 2 Na::a NA 0
## 3 <NA> else 23