Main orchestrator function that generates complete mock datasets from configuration files. Reads metadata, filters for enabled variables, dispatches to type-specific create_* functions, and assembles results into a complete data frame.
Usage
create_mock_data(
databaseStart,
variables,
variable_details = NULL,
n = 1000,
seed = NULL,
validate = TRUE,
verbose = FALSE
)
Arguments
- databaseStart
Character. The database identifier (e.g., "cchs2001_p", "minimal-example"). Used to filter variables to those available in the specified database.
- variables
data.frame or character. Variable-level metadata containing:
  - variable: Variable names
  - variableType: Variable type (Categorical/Continuous/Date)
  - role: Role tags (enabled, predictor, outcome, etc.)
  - position: Display order (optional)
  - database: Database filter (optional)
Can also be a file path (character) to variables.csv.
- variable_details
data.frame or character. Detail-level metadata containing:
  - variable: Variable name (for joining)
  - recStart: Category code/range or date interval
  - recEnd: Classification (numeric code, "NA::a", "NA::b")
  - proportion: Category proportion (for categorical)
  - catLabel: Category label/description
Can also be a file path (character) to variable_details.csv. If NULL (the default), simple fallback generators are used (see Details).
- n
Integer. Number of observations to generate (default 1000).
- seed
Integer. Optional random seed for reproducibility.
- validate
Logical. Whether to use strict generation checks (default TRUE). When TRUE, unsupported variable types and generator errors stop generation. When FALSE, those errors are converted to warnings and the affected variable is skipped.
- verbose
Logical. Whether to print progress messages (default FALSE).
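As a sketch of the documented schemas, the metadata can also be built in memory and passed as data frames instead of file paths. The variable names, codes, labels, and proportions below are hypothetical, and the comma-separated role string is an assumption about how multiple role tags are written.

```r
# Hypothetical in-memory metadata following the documented column names.
# Assumption: multiple role tags are comma-separated ("enabled,predictor").
variables <- data.frame(
  variable     = c("age", "smoker", "interview_date"),
  variableType = c("Continuous", "Categorical", "Date"),
  role         = c("enabled", "enabled,predictor", "enabled"),
  position     = c(1L, 2L, 3L),
  database     = "minimal-example",   # recycled to every row
  stringsAsFactors = FALSE
)

# Detail rows for the categorical variable only (illustrative values;
# proportions chosen here to sum to 1).
variable_details <- data.frame(
  variable   = c("smoker", "smoker", "smoker"),
  recStart   = c("1", "2", "9"),
  recEnd     = c("1", "2", "NA::b"),
  proportion = c(0.25, 0.70, 0.05),
  catLabel   = c("Current smoker", "Non-smoker", "Not stated"),
  stringsAsFactors = FALSE
)
```

Objects like these would then be supplied as variables = variables and variable_details = variable_details.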
Details
v0.3.0 API: This function follows the "recodeflow pattern": it passes the full metadata data frames to the create_* functions, which filter them internally.
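To illustrate the pattern, the sketch below shows a generator that receives the full metadata and selects its own rows by variable name. sketch_cat_var() is a hypothetical stand-in written for this example, not MockData's create_cat_var(), and sampling recEnd codes by proportion is an assumed simplification.

```r
# Hypothetical generator in the "recodeflow pattern" (not create_cat_var()).
# It is handed the FULL metadata and does its own filtering.
sketch_cat_var <- function(var, variables, variable_details, n) {
  # Internal filtering: keep only the detail rows for this variable
  details <- variable_details[variable_details$variable == var, , drop = FALSE]
  # Draw codes according to the stated proportions
  values <- sample(details$recEnd, size = n, replace = TRUE,
                   prob = details$proportion)
  # Return a one-column data frame named after the variable
  out <- data.frame(values, stringsAsFactors = FALSE)
  names(out) <- var
  out
}

# Full details table covering two variables; the caller does not pre-filter
variable_details <- data.frame(
  variable   = c("smoker", "smoker", "sex", "sex"),
  recEnd     = c("1", "2", "1", "2"),
  proportion = c(0.3, 0.7, 0.5, 0.5),
  stringsAsFactors = FALSE
)

set.seed(1)
smoker <- sketch_cat_var("smoker", variables = NULL, variable_details, n = 10)
```

Keeping the filtering inside each generator means the orchestrator never needs to know how a given type slices its metadata.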
Generation process:
1. Load metadata from file paths, or accept data frames directly.
2. Filter for enabled variables (role has an exact "enabled" token).
3. Set the global seed (if provided).
4. Loop through variables in position order:
   - Dispatch to create_cat_var(), create_con_var(), or create_date_var().
   - Pass the full metadata data frames (the functions filter internally).
   - Merge the result into the output data frame.
5. Return the complete dataset.
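The steps above can be outlined in base R roughly as follows. This is an illustrative sketch, not MockData's source: has_token(), sketch_mock_data(), and the generators list are hypothetical, and comma-separated role tags are an assumption. It shows why an exact token match matters ("disabled" does not count as "enabled") and how position ordering and a single global seed fit together.

```r
# Assumption: role tags are comma-separated, e.g. "predictor, enabled".
has_token <- function(role, token) {
  vapply(strsplit(role, ",", fixed = TRUE),
         function(tags) token %in% trimws(tags),
         logical(1))
}

sketch_mock_data <- function(variables, n, seed = NULL, generators) {
  # Keep rows whose role contains the exact token "enabled"
  enabled <- variables[has_token(variables$role, "enabled"), , drop = FALSE]
  # Set the global seed once, before any generator runs
  if (!is.null(seed)) set.seed(seed)
  # Visit variables in position order when a position column exists
  if ("position" %in% names(enabled)) {
    enabled <- enabled[order(enabled$position), , drop = FALSE]
  }
  # Start from an n-row, zero-column data frame and add one column per variable
  result <- data.frame(row.names = seq_len(n))
  for (i in seq_len(nrow(enabled))) {
    gen <- generators[[enabled$variableType[i]]]
    result[[enabled$variable[i]]] <- gen(n)
  }
  result
}

variables <- data.frame(
  variable     = c("b", "a", "c"),
  variableType = c("Continuous", "Categorical", "Continuous"),
  role         = c("enabled", "predictor,enabled", "disabled"),
  position     = c(2, 1, 3),
  stringsAsFactors = FALSE
)
generators <- list(
  Categorical = function(n) sample(c("1", "2"), n, replace = TRUE),
  Continuous  = function(n) runif(n, 0, 100)
)

mock <- sketch_mock_data(variables, n = 5, seed = 123, generators = generators)
names(mock)  # "a" "b"  ("c" is excluded because "disabled" is not "enabled")
```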
Fallback mode: If variable_details = NULL, simple default generators are used for enabled variables: two-category values for categorical variables, continuous values in the range [0, 100], and dates between 2000-01-01 and 2025-12-31.
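A minimal sketch of generators matching this description, assuming uniform draws (as the "Fallback mode" example comment suggests). The function names and the category codes "1"/"2" are hypothetical choices for illustration, not the package's internals.

```r
# Hypothetical fallback generators, assuming uniform draws.
fallback_categorical <- function(n) sample(c("1", "2"), n, replace = TRUE)

fallback_continuous <- function(n) runif(n, min = 0, max = 100)

fallback_date <- function(n,
                          start = as.Date("2000-01-01"),
                          end   = as.Date("2025-12-31")) {
  # Pick whole-day offsets uniformly within the inclusive date range
  offsets <- sample(0:as.integer(end - start), n, replace = TRUE)
  start + offsets
}

set.seed(42)
dates <- fallback_date(1000)
range(dates)  # both ends fall inside 2000-01-01 .. 2025-12-31
```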
Variable types supported:
- Categorical: create_cat_var()
- Continuous: create_con_var()
- Date: create_date_var()
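The type-to-generator dispatch, combined with the documented validate behaviour, can be sketched as below. dispatch_variable() and the stub generators are hypothetical; the sketch only mirrors the rule that unsupported types and generator errors stop generation when validate = TRUE, and become a warning plus a skipped variable when validate = FALSE.

```r
# Hypothetical dispatcher mirroring the documented `validate` semantics.
dispatch_variable <- function(var, type, n, generators, validate = TRUE) {
  handle <- function(msg) {
    if (validate) stop(msg, call. = FALSE)        # strict: abort
    warning(msg, " Skipping '", var, "'.", call. = FALSE)
    NULL                                          # lenient: skip variable
  }
  gen <- generators[[type]]                       # NULL if type is unknown
  if (is.null(gen)) {
    return(handle(paste0("Unsupported variableType '", type, "'.")))
  }
  tryCatch(gen(n), error = function(e) handle(conditionMessage(e)))
}

# Stub generators standing in for create_cat_var()/create_con_var()/create_date_var()
generators <- list(
  Categorical = function(n) sample(c("1", "2"), n, replace = TRUE),
  Continuous  = function(n) runif(n, 0, 100),
  Date        = function(n) as.Date("2000-01-01") + sample(0:9, n, replace = TRUE)
)

res <- suppressWarnings(
  dispatch_variable("x", "Ordinal", 5, generators, validate = FALSE)
)
is.null(res)  # TRUE: the unsupported variable is skipped
```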
Configuration schema: For complete documentation of all configuration columns, see vignette("reference-config", package = "MockData").
See also
Other generators:
create_cat_var(),
create_con_var(),
create_date_var(),
create_survival_dates(),
create_wide_survival_data()
Examples
if (FALSE) { # \dontrun{
# Basic usage with file paths
mock_data <- create_mock_data(
databaseStart = "minimal-example",
variables = system.file("extdata", "minimal-example", "variables.csv", package = "MockData"),
variable_details = system.file("extdata", "minimal-example", "variable_details.csv", package = "MockData"),
n = 1000,
seed = 123
)
# With data frames instead of file paths
variables <- read.csv(
system.file("extdata", "minimal-example", "variables.csv", package = "MockData"),
stringsAsFactors = FALSE)
variable_details <- read.csv(
system.file("extdata", "minimal-example", "variable_details.csv", package = "MockData"),
stringsAsFactors = FALSE)
mock_data <- create_mock_data(
databaseStart = "minimal-example",
variables = variables,
variable_details = variable_details,
n = 1000,
seed = 123
)
# Fallback mode (uniform distributions, no variable_details)
mock_data <- create_mock_data(
databaseStart = "minimal-example",
variables = system.file("extdata", "minimal-example", "variables.csv", package = "MockData"),
variable_details = NULL,
n = 500
)
# View structure
str(mock_data)
head(mock_data)
} # }