
Applies the garbage model to introduce realistic data quality issues by replacing some valid values with implausible ones (garbage_low, garbage_high, garbage_future, etc.).

Usage

make_garbage(values, details_subset, variable_type, seed = NULL)

Arguments

values

Vector. Generated values, already containing both valid values and missing codes.

details_subset

Data frame. Rows from details, including the garbage_* rows.

variable_type

Character. One of "categorical", "continuous", "date", or "survival".

seed

Integer. Optional random seed.

Value

Vector with garbage applied.

Details

The garbage model proceeds in four steps:

  1. Identify valid value indices (not missing codes)

  2. Sample from valid indices based on garbage proportions

  3. Replace with garbage values

  4. Ensure no overlap (use setdiff for sequential garbage application)
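The steps above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper `apply_garbage_sketch()`, the missing code 999, the proportions, and the replacement values are all assumptions made for the example.

```r
# Hypothetical helper illustrating the four steps; not part of the package.
apply_garbage_sketch <- function(values, missing_codes, garbage, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # Step 1: indices of valid values (not NA and not a missing code)
  available <- which(!is.na(values) & !(values %in% missing_codes))
  n_valid <- length(available)

  for (g in garbage) {
    # Step 2: sample indices in proportion to the number of valid values
    n_pick <- min(round(g$prop * n_valid), length(available))
    if (n_pick == 0) next
    picked <- available[sample.int(length(available), n_pick)]

    # Step 3: replace the sampled values with the garbage value
    values[picked] <- g$value

    # Step 4: drop used indices so later garbage types cannot overlap
    available <- setdiff(available, picked)
  }
  values
}

# Assumed inputs: 999 acts as a missing code, leaving nine valid values
values <- c(23.5, 45.2, 999, 30.1, 18.9, 25.6, 27.3, 31.8, 22.4, 29.0, 999)
garbage <- list(
  list(prop = 0.2, value = -5),   # assumed "garbage_low" replacement
  list(prop = 0.1, value = 500)   # assumed "garbage_high" replacement
)
result <- apply_garbage_sketch(values, missing_codes = 999,
                               garbage = garbage, seed = 123)
```

Indexing with `sample.int()` rather than calling `sample()` on the index vector avoids the case where a single remaining index would be treated as a range `1:n`.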

Garbage types:

  • garbage_low: Values below valid range (continuous, integer)

  • garbage_high: Values above valid range (continuous, integer)

  • garbage_future: Future dates (date, survival)
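To make the garbage types concrete, the sketch below draws implausible values for each one. The valid range, the offsets, the reference date, and the helper names are hypothetical; in the package, the garbage values are expected to come from the garbage_* rows of details_subset.

```r
# Hypothetical generators; range and offsets are assumptions for illustration.
valid_min <- 0
valid_max <- 120

# garbage_low: values strictly below the valid range
draw_low <- function(n) valid_min - runif(n, min = 1, max = 50)

# garbage_high: values strictly above the valid range
draw_high <- function(n) valid_max + runif(n, min = 1, max = 50)

# garbage_future: dates strictly after a reference date (up to ~10 years ahead)
draw_future <- function(n, ref = Sys.Date()) {
  ref + sample.int(3650, n, replace = TRUE)
}

set.seed(1)
low <- draw_low(3)
high <- draw_high(3)
future <- draw_future(3, ref = as.Date("2024-01-01"))
```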

Examples

if (FALSE) { # \dontrun{
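# Generated values for a continuous variable
values <- c(23.5, 45.2, 7, 30.1, 9, 18.9, 25.6)
# details_subset is assumed to exist already: the rows of details for this
# variable, including its garbage_* rows
result <- make_garbage(values, details_subset, "continuous", seed = 123)
# Some valid values in result are now replaced with implausible values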
} # }