The examples below demonstrate how to recode variables with the recodeflow function rec_with_table().

Our examples use the following packages:

Package recodeflow

Steps on how to install recodeflow are in how to install.

#Load the package
library(recodeflow)

Package dplyr to combine datasets (function: bind_rows).

Our examples use example data

Our examples use the dataset pbc from the package survival. We’ve split this dataset in two (tester1 and tester2) to mimic real data, e.g., the same survey performed in separate years. For our examples, we’ve also added an age group column (agegrp) to each dataset: 5-year age groups in tester1 and 10-year age groups in tester2.

test1 <- survival::pbc[1:209,]
test2 <- survival::pbc[210:418,]

#Adapt the data for the how-to examples: break the continuous age variable into 5-year (tester1) and 10-year (tester2) age groups.
agegrp <- cut(test1$age, breaks = c(25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80), right = FALSE)
agegrp <- as.numeric(agegrp)
tester1 <- cbind(test1, agegrp)

agegrp <- cut(test2$age, breaks = c(20, 30, 40, 50, 60, 70, 80), right = FALSE)
agegrp <- as.numeric(agegrp)
tester2 <- cbind(test2, agegrp)
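
As a quick check on the setup above, the sketch below rebuilds the two age groupings and prints the interval behind each numeric agegrp code. It assumes only that the survival package is installed; the names breaks5, breaks10, groups5, and groups10 are illustrative and are not part of recodeflow.

```r
# Illustrative check of the example data (names here are hypothetical).
test1 <- survival::pbc[1:209, ]
test2 <- survival::pbc[210:418, ]

breaks5  <- seq(25, 80, by = 5)   # same breaks as used for tester1
breaks10 <- seq(20, 80, by = 10)  # same breaks as used for tester2

# right = FALSE makes each interval closed on the left, e.g. [25,30)
groups5  <- cut(test1$age, breaks = breaks5,  right = FALSE)
groups10 <- cut(test2$age, breaks = breaks10, right = FALSE)

# Interval labels, listed in the order of the numeric codes 1, 2, 3, ...
levels(groups5)
levels(groups10)

# Number of rows in each group
table(groups5)
table(groups10)
```

Calling as.numeric() on these factors, as in the setup code, replaces each interval with its position in levels().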

Example 1. Recode a single variable from a single dataset

In our example datasets, the variable sex contains the values: m for males and f for females.

Using dataset tester1, we’ll recode the variable sex into a harmonized sex variable. As the recode log shows, the harmonized sex variable keeps the values m for males and f for females.

  1. Recode the sex variable in tester1.
sex_1 <- rec_with_table(data = tester1, 
                        variables = "sex", 
                        variable_details = recodeflow::tester_variable_details,
                        log = TRUE,
                        var_labels = c(sex = "sex"),
                        database_name = 'tester1'
                        )
#> The variable sex was recoded into sex for the database tester1 the following recodes were made:
#> # A tibble: 4 × 3
#>   value_to From  rows_recoded
#>   <chr>    <chr>        <int>
#> 1 m        m               27
#> 2 f        f              182
#> 3 NA::a    9                0
#> 4 NA(b)    else             0
head(sex_1)
#>   sex
#> 1   f
#> 2   f
#> 3   m
#> 4   f
#> 5   f
#> 6   f
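
The row counts in the log can be cross-checked against the raw data. The sketch below tabulates sex in the first 209 rows of survival::pbc, which is the slice used for tester1. It assumes only that the survival package is installed; raw_sex and sex_counts are illustrative names.

```r
# Tabulate sex in the rows that make up tester1.
raw_sex <- survival::pbc$sex[1:209]

# useNA = "ifany" adds a separate count for missing values, if there are any
sex_counts <- table(raw_sex, useNA = "ifany")
sex_counts
```

The counts for m and f should match the rows_recoded column of the log above.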

Example 2. Recode a single variable across multiple datasets

We’ll recode and combine the variable sex for our two datasets.

  1. Recode the sex variable in tester1 and tester2.
sex_1 <- rec_with_table(data = tester1, 
                        variables = "sex", 
                        variable_details = recodeflow::tester_variable_details,
                        log = TRUE,
                        var_labels = c(sex = "Sex"),
                        database_name = 'tester1'
                        )
#> The variable sex was recoded into sex for the database tester1 the following recodes were made:
#> # A tibble: 4 × 3
#>   value_to From  rows_recoded
#>   <chr>    <chr>        <int>
#> 1 m        m               27
#> 2 f        f              182
#> 3 NA::a    9                0
#> 4 NA(b)    else             0
head(sex_1)
#>   sex
#> 1   f
#> 2   f
#> 3   m
#> 4   f
#> 5   f
#> 6   f

sex_2 <- rec_with_table(data = tester2, 
                        variables = "sex", 
                        variable_details = recodeflow::tester_variable_details,
                        log = TRUE,
                        var_labels = c(sex = "Sex"),
                        database_name = 'tester2'
                        )
#> The variable sex was recoded into sex for the database tester2 the following recodes were made:
#> # A tibble: 4 × 3
#>   value_to From  rows_recoded
#>   <chr>    <chr>        <int>
#> 1 m        m               17
#> 2 f        f              192
#> 3 NA::a    9                0
#> 4 NA(b)    else             0
tail(sex_2)
#>     sex
#> 413   f
#> 414   f
#> 415   f
#> 416   f
#> 417   f
#> 418   f
  2. Combine the harmonized sex variable from tester1 with the harmonized sex variable from tester2.
sex_combined <- bind_rows(sex_1, sex_2)
head(sex_combined)
#>   sex
#> 1   f
#> 2   f
#> 3   m
#> 4   f
#> 5   f
#> 6   f
tail(sex_combined)
#>     sex
#> 413   f
#> 414   f
#> 415   f
#> 416   f
#> 417   f
#> 418   f
  3. Set labels

Labels are lost when the datasets are combined.

Use set_data_labels() to label the variables in your final dataset. set_data_labels() sets the labels with the original information in variables and variable_details.

labeled_sex_combined <- set_data_labels(
  data_to_label = sex_combined,
  variable_details = recodeflow::tester_variable_details,
  variables_sheet = recodeflow::tester_variables
)

Example 3. Recode a single variable, with different categories, from multiple datasets

You could have a situation where a variable is the same across datasets but its categories change.

In our example data the variable agegrp is different in tester1 and tester2.

  • In tester1 the agegrp variable is 5-year age groups: 25-29, 30-34, 35-39, etc.
  • In tester2 the agegrp variable is 10-year age groups: 20-29, 30-39, 40-49, etc.

There are three options to facilitate the use of variables with inconsistent categories across datasets.

Option 1: recode the categorical agegrp variable into a common variable only for datasets with the same categories

Recode the agegrp variable into a common variable only in datasets where the categories are the same. If the categories differ between datasets, separate columns are created.

The categories of agegrp in tester1 differ from the categories of agegrp in tester2. Therefore, it is not possible to have the same agegrp categories across our example datasets.

  1. Recode agegrp5 in tester1 and recode agegrp10 in tester2.
agegrp_1 <- rec_with_table(data = tester1, 
                           variables = "agegrp5", 
                           variable_details = recodeflow::tester_variable_details,
                           log = TRUE,
                           database_name = 'tester1'
                           )
#> The variable agegrp was recoded into agegrp5 for the database tester1 the following recodes were made:
#> # A tibble: 12 × 3
#>    value_to From  rows_recoded
#>    <chr>    <chr>        <int>
#>  1 1        1                2
#>  2 2        2               12
#>  3 3        3               15
#>  4 4        4               37
#>  5 5        5               37
#>  6 6        6               40
#>  7 7        7               28
#>  8 8        8               21
#>  9 9        9                9
#> 10 10       10               6
#> 11 11       11               2
#> 12 Na::b    else             0
head(agegrp_1)
#>   agegrp5
#> 1       7
#> 2       7
#> 3      10
#> 4       6
#> 5       3
#> 6       9


agegrp_2 <- rec_with_table(data = tester2, 
                             variables = "agegrp10", 
                             variable_details = recodeflow::tester_variable_details,
                             log = TRUE,
                           database_name = 'tester2')
#> The variable agegrp was recoded into agegrp10 for the database tester2 the following recodes were made:
#> # A tibble: 7 × 3
#>   value_to From  rows_recoded
#>   <chr>    <chr>        <int>
#> 1 1        1                1
#> 2 2        2               39
#> 3 3        3               52
#> 4 4        4               67
#> 5 5        5               45
#> 6 6        6                5
#> 7 NA(b)    else             0
head(agegrp_2)
#>     agegrp10
#> 210        3
#> 211        4
#> 212        3
#> 213        4
#> 214        5
#> 215        3
  2. Combine the harmonized variable agegrp5 in tester1 with the harmonized agegrp10 in tester2.
agegrp_combined <- bind_rows(agegrp_1, agegrp_2)
head(agegrp_combined)
#>   agegrp5 agegrp10
#> 1       7     <NA>
#> 2       7     <NA>
#> 3      10     <NA>
#> 4       6     <NA>
#> 5       3     <NA>
#> 6       9     <NA>
tail(agegrp_combined)
#>     agegrp5 agegrp10
#> 413    <NA>        2
#> 414    <NA>        5
#> 415    <NA>        2
#> 416    <NA>        4
#> 417    <NA>        4
#> 418    <NA>        4
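
The NA values appear because bind_rows() matches columns by name and fills any column missing from one input with NA. A minimal sketch with two made-up data frames (toy_5yr, toy_10yr, and toy_combined are illustrative names, not objects from this article):

```r
library(dplyr)

# Two toy data frames whose single columns have different names
toy_5yr  <- data.frame(agegrp5  = c(7, 10, 6))
toy_10yr <- data.frame(agegrp10 = c(3, 4))

# Columns are matched by name; a column absent from one input is filled with NA
toy_combined <- bind_rows(toy_5yr, toy_10yr)
toy_combined
```

Each row has a value in only one of the two columns, which is the same pattern as in agegrp_combined.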

Option 2: recode the categorical agegrp variable into a continuous age_cont variable

Recode categorical variable agegrp into a single harmonized continuous variable age_cont.

age_cont takes the midpoint age of each agegrp category in each dataset. With this option, the categorical variable agegrp from each dataset can be combined into a single column.

  1. Recode variable agegrp in tester1 and agegrp in tester2 to the harmonized continuous variable age_cont.
agegrp_1_cont <- rec_with_table(data = tester1, 
                             variables = "age_cont", 
                             variable_details = recodeflow::tester_variable_details,
                             log = TRUE,
                           database_name = 'tester1')
#> The variable agegrp was recoded into age_cont for the database tester1 the following recodes were made:
#> # A tibble: 12 × 3
#>    value_to From  rows_recoded
#>    <chr>    <chr>        <int>
#>  1 27       1                2
#>  2 32       2               12
#>  3 37       3               15
#>  4 42       4               37
#>  5 47       5               37
#>  6 52       6               40
#>  7 57       7               28
#>  8 62       8               21
#>  9 67       9                9
#> 10 72       10               6
#> 11 77       11               2
#> 12 NA       else             0
head(agegrp_1_cont)
#>   age_cont
#> 1       57
#> 2       57
#> 3       72
#> 4       52
#> 5       37
#> 6       67


agegrp_2_cont <- rec_with_table(data = tester2, 
                             variables = "age_cont", 
                             variable_details = recodeflow::tester_variable_details,
                             log = TRUE,
                           database_name = 'tester2')
#> The variable agegrp was recoded into age_cont for the database tester2 the following recodes were made:
#> # A tibble: 7 × 3
#>   value_to From  rows_recoded
#>   <chr>    <chr>        <int>
#> 1 25       1                1
#> 2 35       2               39
#> 3 45       3               52
#> 4 55       4               67
#> 5 65       5               45
#> 6 75       6                5
#> 7 NA       else             0
head(agegrp_2_cont)
#>     age_cont
#> 210       45
#> 211       55
#> 212       45
#> 213       55
#> 214       65
#> 215       45
  2. Combine the harmonized continuous variable age_cont from tester1 and tester2.
agegrp_cont_combined <- bind_rows(agegrp_1_cont, agegrp_2_cont)
head(agegrp_cont_combined)
#>   age_cont
#> 1       57
#> 2       57
#> 3       72
#> 4       52
#> 5       37
#> 6       67
tail(agegrp_cont_combined)
#>     age_cont
#> 413       35
#> 414       65
#> 415       35
#> 416       55
#> 417       55
#> 418       55
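
The category-to-midpoint mapping in the tester1 log can be pictured as a lookup vector. The base R sketch below only illustrates that mapping and is not how rec_with_table() is implemented. The midpoints are copied from the value_to column of the log, and midpoints_5yr, codes, and age_cont_sketch are illustrative names.

```r
# Midpoints copied from the value_to column of the tester1 log above;
# position 1 holds the midpoint for code 1, position 2 for code 2, and so on
midpoints_5yr <- c(27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77)

# First six agegrp5 codes in tester1, as printed by head(agegrp_1)
codes <- c(7, 7, 10, 6, 3, 9)

# Indexing the lookup vector by code returns the midpoint for each row
age_cont_sketch <- midpoints_5yr[codes]
age_cont_sketch
#> [1] 57 57 72 52 37 67
```

The result matches the first six rows of agegrp_1_cont.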

Option 3: recode the categorical agegrp variable into a harmonized categorical variable

Dataset tester1 has 5-year age groups (e.g., 30-34, 35-39), and tester2 has 10-year age groups (e.g., 30-39). Therefore, we can collapse the 5-year age groups in dataset tester1 to the same 10-year age groups in dataset tester2.

  1. Recode variable agegrp in tester1 into agegrp10. Recode variable agegrp in tester2 into agegrp10.
agegrp10_1 <- rec_with_table(data = tester1, 
                             variables = "agegrp10", 
                             variable_details = recodeflow::tester_variable_details,
                             log = TRUE,
                           database_name = 'tester1')
#> The variable agegrp was recoded into agegrp10 for the database tester1 the following recodes were made:
#> # A tibble: 12 × 3
#>    value_to From  rows_recoded
#>    <chr>    <chr>        <int>
#>  1 1        1                2
#>  2 2        2               12
#>  3 2        3               15
#>  4 3        4               37
#>  5 3        5               37
#>  6 4        6               40
#>  7 4        7               28
#>  8 5        8               21
#>  9 5        9                9
#> 10 6        10               6
#> 11 6        11               2
#> 12 NA(b)    else             0
head(agegrp10_1)
#>   agegrp10
#> 1        4
#> 2        4
#> 3        6
#> 4        4
#> 5        2
#> 6        5


agegrp10_2 <- rec_with_table(data = tester2, 
                             variables = "agegrp10", 
                             variable_details = recodeflow::tester_variable_details,
                             log = TRUE,
                            database_name = 'tester2')
#> The variable agegrp was recoded into agegrp10 for the database tester2 the following recodes were made:
#> # A tibble: 7 × 3
#>   value_to From  rows_recoded
#>   <chr>    <chr>        <int>
#> 1 1        1                1
#> 2 2        2               39
#> 3 3        3               52
#> 4 4        4               67
#> 5 5        5               45
#> 6 6        6                5
#> 7 NA(b)    else             0
head(agegrp10_2)
#>     agegrp10
#> 210        3
#> 211        4
#> 212        3
#> 213        4
#> 214        5
#> 215        3
  2. Combine the harmonized categorical variable agegrp10 from tester1 and tester2.
agegrp10_combined <- bind_rows(agegrp10_1, agegrp10_2)
head(agegrp10_combined)
#>   agegrp10
#> 1        4
#> 2        4
#> 3        6
#> 4        4
#> 5        2
#> 6        5
tail(agegrp10_combined)
#>     agegrp10
#> 413        2
#> 414        5
#> 415        2
#> 416        4
#> 417        4
#> 418        4
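
The collapse of 5-year codes into 10-year codes shown in the tester1 log can also be pictured as a lookup vector. This base R sketch only illustrates the mapping and is not the recodeflow implementation. The values are copied from the log, and collapse_5_to_10, codes, and agegrp10_sketch are illustrative names.

```r
# 5-year code -> 10-year code, copied from the tester1 log above:
# code 1 (25-29) stays in group 1, codes 2 and 3 (30-39) become group 2, etc.
collapse_5_to_10 <- c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)

# First six agegrp5 codes in tester1, as printed by head(agegrp_1)
codes <- c(7, 7, 10, 6, 3, 9)

# Look up the 10-year group for each 5-year code
agegrp10_sketch <- collapse_5_to_10[codes]
agegrp10_sketch
#> [1] 4 4 6 4 2 5
```

The result matches the first six rows of agegrp10_1.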

Example 4. Recode multiple variables from multiple datasets

The variables argument in rec_with_table() allows multiple variables to be recoded from a dataset.

In this example, the age and sex variables from the tester1 and tester2 datasets will be recoded and labeled using rec_with_table().

We’ll then combine the two recoded datasets into a single dataset and label it using set_data_labels().

  1. Recode age and sex in dataset tester1 and tester2
age_sex_1 <- rec_with_table(data = tester1, 
                            variables = c("age", "sex"), 
                            variable_details = recodeflow::tester_variable_details,
                            log = TRUE, 
                            var_labels = c(age = "Age", sex = "Sex"),
                            database_name = 'tester1')
#> The variable age was recoded into age for the database tester1 the following recodes were made:
#> # A tibble: 3 × 3
#>   value_to From    rows_recoded
#>   <chr>    <chr>          <int>
#> 1 copy     [20,80]          209
#> 2 NA::a    999                0
#> 3 NA       else               0
#> The variable sex was recoded into sex for the database tester1 the following recodes were made:
#> # A tibble: 4 × 3
#>   value_to From  rows_recoded
#>   <chr>    <chr>        <int>
#> 1 m        m               27
#> 2 f        f              182
#> 3 NA::a    9                0
#> 4 NA(b)    else             0
head(age_sex_1)
#>        age sex
#> 1 58.76523   f
#> 2 56.44627   f
#> 3 70.07255   m
#> 4 54.74059   f
#> 5 38.10541   f
#> 6 66.25873   f

age_sex_2 <- rec_with_table(data = tester2, 
                            variables = c("age", "sex"), 
                            variable_details = recodeflow::tester_variable_details,
                            log = TRUE, 
                            var_labels = c(age = "Age", sex = "Sex"),
                            database_name = 'tester2'
                            )
#> The variable age was recoded into age for the database tester2 the following recodes were made:
#> # A tibble: 3 × 3
#>   value_to From    rows_recoded
#>   <chr>    <chr>          <int>
#> 1 copy     [20,80]          209
#> 2 NA::a    999                0
#> 3 NA       else               0
#> The variable sex was recoded into sex for the database tester2 the following recodes were made:
#> # A tibble: 4 × 3
#>   value_to From  rows_recoded
#>   <chr>    <chr>        <int>
#> 1 m        m               17
#> 2 f        f              192
#> 3 NA::a    9                0
#> 4 NA(b)    else             0
head(age_sex_2)
#>          age sex
#> 210 49.76318   m
#> 211 52.91444   f
#> 212 47.26352   f
#> 213 50.20397   f
#> 214 69.34702   f
#> 215 41.16906   f
  2. Combine the harmonized variables age and sex from tester1 and tester2.
combined_age_sex <- bind_rows(age_sex_1, age_sex_2)
head(combined_age_sex)
#>        age sex
#> 1 58.76523   f
#> 2 56.44627   f
#> 3 70.07255   m
#> 4 54.74059   f
#> 5 38.10541   f
#> 6 66.25873   f
  3. Set labels

Use set_data_labels() to label the variables in your final dataset. set_data_labels() sets the labels with the original information in variables and variable_details.

var_labels can be used to label all the variables in variables.csv or a subset of them.

labeled_combined_age_sex <- 
  set_data_labels(
      data_to_label = combined_age_sex,
      variable_details = recodeflow::tester_variable_details,
      variables_sheet = recodeflow::tester_variables
      )

You can check if labels have been added to your recoded dataset by using get_label().

library(sjlabelled) 
#> 
#> Attaching package: 'sjlabelled'
#> The following object is masked from 'package:dplyr':
#> 
#>     as_label
get_label(labeled_combined_age_sex)
#>   age   sex 
#> "age" "sex"

For more information on get_label() and other label helper functions, please refer to the sjlabelled package.
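
As a rough illustration of what get_label() reads: to our understanding, sjlabelled keeps each variable label in a column attribute named label. The sketch below sets and reads that attribute with base R on a made-up data frame (toy and toy_labels are illustrative names). Treat the attribute name as an assumption and check the sjlabelled documentation before relying on it.

```r
# A made-up data frame standing in for a recoded dataset
toy <- data.frame(age = c(58.8, 56.4), sex = c("f", "f"))

# Assumption: variable labels live in a column attribute called "label"
attr(toy$age, "label") <- "Age"
attr(toy$sex, "label") <- "Sex"

# Collect the label of every column into a named character vector
toy_labels <- vapply(toy, function(column) attr(column, "label"), character(1))
toy_labels
```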

Example 5. Recode all variables in the variables worksheet

All the variables listed in the variables worksheet can be recoded with rec_with_table().

In this example, all variables specified in the variables worksheet will be recoded and combined for the datasets tester1 and tester2.

  1. Recode all variables listed in the variables worksheet, for dataset tester1 and dataset tester2
recoded1 <- rec_with_table(data = tester1,
                           variables = recodeflow::tester_variables,
                           variable_details = recodeflow::tester_variable_details,
                          log = TRUE,
                           database_name = 'tester1'
                          )
#> The variable age was recoded into age for the database tester1 the following recodes were made:
#>   value_to    From rows_recoded
#> 1     copy [20,80]          209
#> 2    NA::a     999            0
#> 3     <NA>    else            0
#> The variable agegrp was recoded into age_cont for the database tester1 the following recodes were made:
#>    value_to From rows_recoded
#> 1        52    6           40
#> 2        27    1            2
#> 3        32    2           12
#> 4        37    3           15
#> 5        42    4           37
#> 6        47    5           37
#> 7        57    7           28
#> 8        62    8           21
#> 9        67    9            9
#> 10       72   10            6
#> 11       77   11            2
#> 12     <NA> else            0
#> The variable agegrp was recoded into agegrp10 for the database tester1 the following recodes were made:
#>    value_to From rows_recoded
#> 1         6   10            6
#> 2         6   11            2
#> 3         2    3           15
#> 4         3    4           37
#> 5         1    1            2
#> 6         2    2           12
#> 7         4    7           28
#> 8         5    8           21
#> 9         3    5           37
#> 10        4    6           40
#> 11        5    9            9
#> 12    NA(b) else            0
#> The variable agegrp was recoded into agegrp5 for the database tester1 the following recodes were made:
#>    value_to From rows_recoded
#> 1         4    4           37
#> 2         5    5           37
#> 3         1    1            2
#> 4         2    2           12
#> 5         3    3           15
#> 6         9    9            9
#> 7         6    6           40
#> 8         7    7           28
#> 9         8    8           21
#> 10       10   10            6
#> 11       11   11            2
#> 12    Na::b else            0
#> The variable albumin was recoded into albumin for the database tester1 the following recodes were made:
#>   value_to  From rows_recoded
#> 1     copy [1,5]          209
#> 2    NA::a    99            0
#> 3     <NA>  else            0
#> The variable alk.phos was recoded into alk.phos for the database tester1 the following recodes were made:
#>   value_to        From rows_recoded
#> 1     copy [200,15000]          209
#> 2    NA::a       99999            0
#> 3     <NA>        else            0
#> The variable ascites was recoded into ascites for the database tester1 the following recodes were made:
#>   value_to From rows_recoded
#> 1        1    1           18
#> 2    NA::a    9            0
#> 3        0    0          191
#> 4    NA(b) else            0
#> The variable ast was recoded into ast for the database tester1 the following recodes were made:
#>   value_to     From rows_recoded
#> 1     copy [20,500]          209
#> 2    NA::a     9999            0
#> 3     <NA>     else            0
#> The variable bili was recoded into bili for the database tester1 the following recodes were made:
#>   value_to    From rows_recoded
#> 1     copy [0,100]          209
#> 2     <NA>    else            0
#> The variable chol was recoded into chol for the database tester1 the following recodes were made:
#>   value_to       From rows_recoded
#> 1     copy [100,2000]          186
#> 2    NA::a       9999            0
#> 3     <NA>       else           23
#> The variable copper was recoded into copper for the database tester1 the following recodes were made:
#>   value_to     From rows_recoded
#> 1     copy [0,1000]          208
#> 2    NA::a     9999            0
#> 3     <NA>     else            1
#> The variable edema was recoded into edema for the database tester1 the following recodes were made:
#>   value_to From rows_recoded
#> 1      0.5  0.5           22
#> 2        1    1           16
#> 3        0    0          171
#> 4    NA::a    9            0
#> 5    NA(b) else            0
#> The variable hepato was recoded into hepato for the database tester1 the following recodes were made:
#>   value_to From rows_recoded
#> 1        0    0          102
#> 2        1    1          107
#> 3    NA::a    9            0
#> 4    NA(b) else            0
#> The variable platelet was recoded into platelet for the database tester1 the following recodes were made:
#>   value_to     From rows_recoded
#> 1     copy [0,1000]          205
#> 2    NA::a     9999            0
#> 3     <NA>     else            4
#> The variable protime was recoded into protime for the database tester1 the following recodes were made:
#>   value_to    From rows_recoded
#> 1     copy [5, 30]          209
#> 2    NA::a      99            0
#> 3     <NA>    else            0
#> The variable sex was recoded into sex for the database tester1 the following recodes were made:
#>   value_to From rows_recoded
#> 1        m    m           27
#> 2    NA::a    9            0
#> 3        f    f          182
#> 4    NA(b) else            0
#> The variable spiders was recoded into spiders for the database tester1 the following recodes were made:
#>   value_to From rows_recoded
#> 1        0    0          145
#> 2    NA::a    9            0
#> 3        1    1           64
#> 4    NA(b) else            0
#> The variable stage was recoded into stage for the database tester1 the following recodes were made:
#>   value_to From rows_recoded
#> 1        1    1           12
#> 2        2    2           46
#> 3        4    4           74
#> 4    NA::a    9            0
#> 5        3    3           77
#> 6    NA(b) else            0
#> The variable status was recoded into status for the database tester1 the following recodes were made:
#>   value_to From rows_recoded
#> 1        1    1            7
#> 2        2    2          108
#> 3        0    0           94
#> 4    NA::a    9            0
#> 5    NA(b) else            0
#> The variable time was recoded into time for the database tester1 the following recodes were made:
#>   value_to     From rows_recoded
#> 1     copy [0,5000]          209
#> 2    NA::a     9999            0
#> 3     <NA>     else            0
#> The variable trig was recoded into trig for the database tester1 the following recodes were made:
#>   value_to     From rows_recoded
#> 1     copy [0,1000]          185
#> 2    NA::a     9999            0
#> 3     <NA>     else           24
#> The variable trt was recoded into trt for the database tester1 the following recodes were made:
#>   value_to From rows_recoded
#> 1        2    2          103
#> 2    NA::a    9            0
#> 3        1    1          106
#> 4    NA(b) else            0

recoded2 <- rec_with_table(data = tester2,
                           variables = recodeflow::tester_variables,
                           variable_details = recodeflow::tester_variable_details,
                          log = TRUE,
                           database_name = 'tester2'
                          )
#> The variable age was recoded into age for the database tester2 the following recodes were made:
#>   value_to    From rows_recoded
#> 1     copy [20,80]          209
#> 2    NA::a     999            0
#> 3     <NA>    else            0
#> The variable agegrp was recoded into age_cont for the database tester2 the following recodes were made:
#>   value_to From rows_recoded
#> 1       25    1            1
#> 2       35    2           39
#> 3       45    3           52
#> 4       55    4           67
#> 5       65    5           45
#> 6       75    6            5
#> 7     <NA> else            0
#> The variable agegrp was recoded into agegrp10 for the database tester2 the following recodes were made:
#>   value_to From rows_recoded
#> 1        1    1            1
#> 2        2    2           39
#> 3        3    3           52
#> 4        4    4           67
#> 5        5    5           45
#> 6        6    6            5
#> 7    NA(b) else            0
#> The variable albumin was recoded into albumin for the database tester2 the following recodes were made:
#>   value_to  From rows_recoded
#> 1     copy [1,5]          209
#> 2    NA::a    99            0
#> 3     <NA>  else            0
#> The variable alk.phos was recoded into alk.phos for the database tester2 the following recodes were made:
#>   value_to        From rows_recoded
#> 1     copy [200,15000]          103
#> 2    NA::a       99999            0
#> 3     <NA>        else          106
#> The variable ascites was recoded into ascites for the database tester2 the following recodes were made:
#>   value_to From rows_recoded
#> 1        1    1            6
#> 2    NA::a    9            0
#> 3        0    0           97
#> 4    NA(b) else          106
#> The variable ast was recoded into ast for the database tester2 the following recodes were made:
#>   value_to     From rows_recoded
#> 1     copy [20,500]          103
#> 2    NA::a     9999            0
#> 3     <NA>     else          106
#> The variable bili was recoded into bili for the database tester2 the following recodes were made:
#>   value_to    From rows_recoded
#> 1     copy [0,100]          209
#> 2     <NA>    else            0
#> The variable chol was recoded into chol for the database tester2 the following recodes were made:
#>   value_to       From rows_recoded
#> 1     copy [100,2000]           98
#> 2    NA::a       9999            0
#> 3     <NA>       else          111
#> The variable copper was recoded into copper for the database tester2 the following recodes were made:
#>   value_to     From rows_recoded
#> 1     copy [0,1000]          102
#> 2    NA::a     9999            0
#> 3     <NA>     else          107
#> The variable edema was recoded into edema for the database tester2 the following recodes were made:
#>   value_to From rows_recoded
#> 1      0.5  0.5           22
#> 2        1    1            4
#> 3        0    0          183
#> 4    NA::a    9            0
#> 5    NA(b) else            0
#> The variable hepato was recoded into hepato for the database tester2 the following recodes were made:
#>   value_to From rows_recoded
#> 1        0    0           50
#> 2        1    1           53
#> 3    NA::a    9            0
#> 4    NA(b) else          106
#> The variable platelet was recoded into platelet for the database tester2 the following recodes were made:
#>   value_to     From rows_recoded
#> 1     copy [0,1000]          202
#> 2    NA::a     9999            0
#> 3     <NA>     else            7
#> The variable protime was recoded into protime for the database tester2 the following recodes were made:
#>   value_to    From rows_recoded
#> 1     copy [5, 30]          207
#> 2    NA::a      99            0
#> 3     <NA>    else            2
#> The variable sex was recoded into sex for the database tester2 the following recodes were made:
#>   value_to From rows_recoded
#> 1        m    m           17
#> 2    NA::a    9            0
#> 3        f    f          192
#> 4    NA(b) else            0
#> The variable spiders was recoded into spiders for the database tester2 the following recodes were made:
#>   value_to From rows_recoded
#> 1        0    0           77
#> 2    NA::a    9            0
#> 3        1    1           26
#> 4    NA(b) else          106
#> The variable stage was recoded into stage for the database tester2 the following recodes were made:
#>   value_to From rows_recoded
#> 1        1    1            9
#> 2        2    2           46
#> 3        4    4           70
#> 4    NA::a    9            0
#> 5        3    3           78
#> 6    NA(b) else            6
#> The variable status was recoded into status for the database tester2 the following recodes were made:
#>   value_to From rows_recoded
#> 1        1    1           18
#> 2        2    2           53
#> 3        0    0          138
#> 4    NA::a    9            0
#> 5    NA(b) else            0
#> The variable time was recoded into time for the database tester2 the following recodes were made:
#>   value_to     From rows_recoded
#> 1     copy [0,5000]          209
#> 2    NA::a     9999            0
#> 3     <NA>     else            0
#> The variable trig was recoded into trig for the database tester2 the following recodes were made:
#>   value_to     From rows_recoded
#> 1     copy [0,1000]           97
#> 2    NA::a     9999            0
#> 3     <NA>     else          112
#> The variable trt was recoded into trt for the database tester2 the following recodes were made:
#>   value_to From rows_recoded
#> 1        2    2           51
#> 2    NA::a    9            0
#> 3        1    1           52
#> 4    NA(b) else          106
  2. Combine recoded datasets
combined_dataset <- bind_rows(recoded1, recoded2)
  3. Set labels for the combined recoded dataset
labeled_combined <- set_data_labels(data_to_label = combined_dataset,
                                    variable_details = recodeflow::tester_variable_details,
                                    variables_sheet = recodeflow::tester_variables
                                    )
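
The logs above list agegrp5 for tester1 but not for tester2, which suggests the two recoded datasets do not have identical columns. Before combining, it can help to compare column names so that NA-filled columns do not come as a surprise. A base R sketch on made-up data frames (toy_recoded1, toy_recoded2, only_in_first, and only_in_second are illustrative names):

```r
# Made-up stand-ins for two recoded datasets with different columns
toy_recoded1 <- data.frame(age = c(58, 56), agegrp5 = c(7, 7), agegrp10 = c(4, 4))
toy_recoded2 <- data.frame(age = c(49, 52), agegrp10 = c(3, 4))

# Columns present in one dataset but not the other
only_in_first  <- setdiff(names(toy_recoded1), names(toy_recoded2))
only_in_second <- setdiff(names(toy_recoded2), names(toy_recoded1))

only_in_first
only_in_second
```

Any column listed in only_in_first or only_in_second will be NA for the rows that come from the other dataset after combining.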

Example 6. Add the data origin in combined datasets

To track the origin of each row, use the rec_with_table() argument attach_data_name. When attach_data_name is set to TRUE, rec_with_table() adds a column (data_name) containing the name of the dataset each row comes from.

  1. Recode variables age and sex and attach dataset name for tester1 and tester2.
age_sex_1 <- rec_with_table(data = tester1,
                            variables = c("age", "sex"), 
                            variable_details = recodeflow::tester_variable_details,
                            var_labels = c(age = "Age", sex = "Sex"),
                            log = TRUE,
                            attach_data_name = TRUE,
                           database_name = 'tester1'
                            )
#> The variable age was recoded into age for the database tester1 the following recodes were made:
#> # A tibble: 3 × 3
#>   value_to From    rows_recoded
#>   <chr>    <chr>          <int>
#> 1 copy     [20,80]          209
#> 2 NA::a    999                0
#> 3 NA       else               0
#> The variable sex was recoded into sex for the database tester1 the following recodes were made:
#> # A tibble: 4 × 3
#>   value_to From  rows_recoded
#>   <chr>    <chr>        <int>
#> 1 m        m               27
#> 2 f        f              182
#> 3 NA::a    9                0
#> 4 NA(b)    else             0

age_sex_2 <- rec_with_table(data = tester2,
                            variables = c("age", "sex"), 
                            variable_details = recodeflow::tester_variable_details,
                            var_labels = c(age = "Age", sex = "Sex"),
                            log = TRUE,
                            attach_data_name = TRUE,
                            database_name = 'tester2'
                            )
#> The variable age was recoded into age for the database tester2 the following recodes were made:
#> # A tibble: 3 × 3
#>   value_to From    rows_recoded
#>   <chr>    <chr>          <int>
#> 1 copy     [20,80]          209
#> 2 NA::a    999                0
#> 3 NA       else               0
#> The variable sex was recoded into sex for the database tester2 the following recodes were made:
#> # A tibble: 4 × 3
#>   value_to From  rows_recoded
#>   <chr>    <chr>        <int>
#> 1 m        m               17
#> 2 f        f              192
#> 3 NA::a    9                0
#> 4 NA(b)    else             0
  2. Combine the harmonized datasets.
combined_age_sex <- bind_rows(age_sex_1, age_sex_2)

head(combined_age_sex)
#>        age sex data_name
#> 1 58.76523   f   tester1
#> 2 56.44627   f   tester1
#> 3 70.07255   m   tester1
#> 4 54.74059   f   tester1
#> 5 38.10541   f   tester1
#> 6 66.25873   f   tester1
tail(combined_age_sex)
#>          age sex data_name
#> 413 35.00068   f   tester2
#> 414 67.00068   f   tester2
#> 415 39.00068   f   tester2
#> 416 56.99932   f   tester2
#> 417 58.00137   f   tester2
#> 418 52.99932   f   tester2
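
Once the datasets are combined, the data_name column makes it easy to confirm how many rows came from each source. The following is a minimal sketch, not part of recodeflow: toy_1 and toy_2 are hypothetical stand-ins holding made-up values (not pbc data), and base R's rbind() is used in place of bind_rows() so the sketch runs without dplyr. With the real objects, the same table() call could be applied to combined_age_sex$data_name.

```r
# Toy stand-ins for age_sex_1 and age_sex_2 (hypothetical, made-up values)
toy_1 <- data.frame(age = c(58.8, 56.4, 70.1),
                    sex = c("f", "f", "m"),
                    data_name = "tester1")
toy_2 <- data.frame(age = c(35.0, 67.0),
                    sex = c("f", "f"),
                    data_name = "tester2")

# Stack the two datasets; rbind() stands in for dplyr::bind_rows() here
toy_combined <- rbind(toy_1, toy_2)

# Count rows per data origin
origin_counts <- table(toy_combined$data_name)
print(origin_counts)

# Summarise a variable separately for each origin
mean_age_by_origin <- tapply(toy_combined$age, toy_combined$data_name, mean)
print(mean_age_by_origin)
```

Keeping the origin as an ordinary column means any grouping tool (table(), tapply(), or dplyr verbs) can split results by dataset after the rows are combined.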

Example 7. Recode derived variables

Derived variables are variables that are not in the original dataset; rather, they are created from variables in the original dataset.

Descriptions of derived functions are in the article derived functions.

To recode a derived variable, you must:

  • create a customized function,
  • define the derived variable in the worksheets variables and variable_details,
  • recode the variables that make up the derived variable.

Our example derived variable, example_der, equals chol times bili.
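
As an illustration of the first requirement, a customized function for the product of two variables could be sketched as below. The function name multiply_chol_bili is hypothetical and is not necessarily the function that recodeflow uses for example_der; the input values are made up. The sketch only shows the arithmetic and how missing values carry through.

```r
# Hypothetical custom function for a derived variable (the name is an assumption).
# Returns the element-wise product of chol and bili; NA in either input gives NA.
multiply_chol_bili <- function(chol, bili) {
  chol * bili
}

# Made-up input values for illustration only
chol_values <- c(200, 300, NA)
bili_values <- c(1.5, 2, 0.8)

derived_values <- multiply_chol_bili(chol_values, bili_values)
print(derived_values)
```

Relying on R's vectorised arithmetic keeps the function free of loops and lets NA propagate by default, so a missing input yields a missing derived value.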

  1. Recode the underlying variables: chol and bili and the derived variable example_der for tester1 and tester2.
derived1 <- rec_with_table(data = tester1,
                           variables = c("chol", "bili", "example_der"),
                           variable_details = recodeflow::tester_variable_details,
                           log = TRUE,
                           database_name = 'tester1')
#> The variable bili was recoded into bili for the database tester1 the following recodes were made:
#> # A tibble: 2 × 3
#>   value_to From    rows_recoded
#>   <chr>    <chr>          <int>
#> 1 copy     [0,100]          209
#> 2 NA       else               0
#> The variable chol was recoded into chol for the database tester1 the following recodes were made:
#> # A tibble: 3 × 3
#>   value_to From       rows_recoded
#>   <chr>    <chr>             <int>
#> 1 copy     [100,2000]          186
#> 2 NA::a    9999                  0
#> 3 NA       else                 23

derived2 <- rec_with_table(data = tester2,
                           variables = c("chol", "bili", "example_der"),
                           variable_details = recodeflow::tester_variable_details,
                           log = TRUE,
                           database_name = 'tester2')
#> The variable bili was recoded into bili for the database tester2 the following recodes were made:
#> # A tibble: 2 × 3
#>   value_to From    rows_recoded
#>   <chr>    <chr>          <int>
#> 1 copy     [0,100]          209
#> 2 NA       else               0
#> The variable chol was recoded into chol for the database tester2 the following recodes were made:
#> # A tibble: 3 × 3
#>   value_to From       rows_recoded
#>   <chr>    <chr>             <int>
#> 1 copy     [100,2000]           98
#> 2 NA::a    9999                  0
#> 3 NA       else                111
  2. Combine the harmonized variables: chol, bili, and example_der.
combined_der <- bind_rows(derived1, derived2)