About this vignette: This tutorial introduces the v0.4 mock_spec workflow. All code is executed when the vignette builds, so this page also serves as a user-flow test for the public API.
The v0.4 workflow
MockData v0.4 separates data generation into three steps:
- Create a mock_spec
- Generate baseline valid values
- Apply missing-code and garbage-value post-processing
This separation makes it easier to inspect what was requested (the spec) and what post-processing changed after generation.
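The three-step separation can be illustrated with a toy, base-R sketch. The names `toy_spec`, `generate_baseline()`, `apply_missing()` and the `toy_diagnostics` attribute are hypothetical and are not MockData functions. The sketch only shows the idea: baseline generation draws valid values, and a separate step overwrites some of them while recording which rows it touched.

```r
# Toy illustration of the three-step separation.
# toy_spec, generate_baseline(), apply_missing() and "toy_diagnostics"
# are hypothetical names for this sketch; they are not part of MockData.

# Step 1: a plain list stands in for a mock_spec.
toy_spec <- list(
  smoking = list(
    levels = c("never", "former", "current"),
    proportions = c(0.5, 0.3, 0.2),
    missing_code = "unknown",
    missing_proportion = 0.05
  )
)

# Step 2: draw only valid levels, using the declared proportions.
generate_baseline <- function(spec, n) {
  v <- spec$smoking
  data.frame(
    smoking = sample(v$levels, n, replace = TRUE, prob = v$proportions),
    stringsAsFactors = FALSE
  )
}

# Step 3: overwrite round(n * missing_proportion) randomly chosen rows
# with the missing code and record which rows were changed.
apply_missing <- function(data, spec) {
  v <- spec$smoking
  n <- nrow(data)
  idx <- sample.int(n, round(n * v$missing_proportion))
  data$smoking[idx] <- v$missing_code
  attr(data, "toy_diagnostics") <- list(assigned_missing_indices = idx)
  data
}

set.seed(1)
baseline <- generate_baseline(toy_spec, n = 100)
final <- apply_missing(baseline, toy_spec)
attr(final, "toy_diagnostics")
```

Because the baseline is kept separate, comparing `baseline` with `final` shows exactly which values post-processing introduced.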
Specify variables directly
Use the direct helpers when you want a small mock dataset without creating CSV metadata files.
spec <- mock_spec(
  mock_spec_continuous(
    "age",
    range = c(18, 85),
    distribution = "normal",
    mean = 50,
    sd = 12,
    rtype = "integer"
  ),
  mock_spec_categorical(
    "smoking",
    levels = c("never", "former", "current"),
    proportions = c(0.5, 0.3, 0.2),
    rtype = "character",
    missing_codes = "unknown",
    missing_proportions = 0.05
  )
)
validate_mock_spec(spec)
MockData mock_spec validation result: valid
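To make the age parameters concrete, here is one way a normal(50, 12) draw restricted to 18 to 85 and returned as integers could be produced in base R. The helper `draw_truncated_normal_int()` is hypothetical, and rejection resampling is an assumption. MockData's own handling of draws outside `range` may differ.

```r
# Hypothetical sketch: integer draws from a normal distribution restricted
# to the closed interval [lower, upper] by rejection resampling.
# This is an illustration, not MockData's implementation.
draw_truncated_normal_int <- function(n, mean, sd, lower, upper) {
  out <- numeric(0)
  while (length(out) < n) {
    # round first so the kept values are integers already inside the bounds
    x <- round(rnorm(n, mean = mean, sd = sd))
    out <- c(out, x[x >= lower & x <= upper])
  }
  as.integer(out[seq_len(n)])
}

set.seed(42)
age <- draw_truncated_normal_int(100, mean = 50, sd = 12, lower = 18, upper = 85)
summary(age)
```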
Generate baseline values first. Baseline values lie within the valid space defined by the spec, such as the 18 to 85 range for age and the declared levels for smoking.
age smoking
1 46 former
2 57 former
3 42 never
4 53 never
5 54 current
6 64 never
Then apply the missing-code and garbage-value rules. This step returns the data frame with a mockdata_diagnostics attribute that records what post-processing changed.
age smoking
1 46 former
2 57 former
3 42 never
4 53 never
5 54 current
6 64 never
These first six rows match the baseline output because none of the five rows assigned a missing code (rows 8, 33, 46, 84, and 87 in the diagnostics below) falls within them.
diagnostics <- attr(mock_data, "mockdata_diagnostics")
diagnostics$variables$smoking
$n
[1] 100
$preexisting_missing_code_indices
integer(0)
$assigned_missing_indices
[1] 87 46 33 84 8
$assigned_missing_codes
[1] "unknown" "unknown" "unknown" "unknown" "unknown"
$assigned_garbage_indices
named list()
$assigned_garbage_values
named list()
The diagnostics distinguish values assigned by post-processing (assigned_missing_indices and assigned_garbage_indices) from missing codes that were already present after baseline generation (preexisting_missing_code_indices).
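As a sketch of how these fields might be used, the block below labels each row of a toy column by the origin of its value. The data and the `diag` list are invented for illustration. Only the field names mirror the printout above, and the reading of `preexisting_missing_code_indices` as codes already present after baseline generation is an interpretation of the field name.

```r
# Toy values plus a diagnostics list using the same field names as the
# printout above; the numbers are invented for this sketch.
smoking <- c("never", "unknown", "former", "unknown", "current", "never")
diag <- list(
  n = 6L,
  preexisting_missing_code_indices = 2L,
  assigned_missing_indices = 4L,
  assigned_missing_codes = "unknown"
)

# Label every row by where its value came from.
origin <- rep("baseline valid value", diag$n)
origin[diag$preexisting_missing_code_indices] <- "missing code from baseline"
origin[diag$assigned_missing_indices] <- "missing code from post-processing"

data.frame(smoking, origin)
```

With real output, the same labelling could be applied to `mock_data$smoking` using `diagnostics$variables$smoking` in place of `diag`.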
If you already have recodeflow-style variables and variable_details metadata, use mock_spec_from_recodeflow() to adapt those tables into the same mock_spec structure.
variables <- data.frame(
  variable = c("age", "smoking"),
  variableType = c("Continuous", "Categorical"),
  rType = c("integer", "character"),
  role = c("enabled", "enabled"),
  stringsAsFactors = FALSE
)
variable_details <- data.frame(
  variable = c("age", "smoking", "smoking", "smoking"),
  recStart = c("[18, 85]", "1", "2", "97"),
  recEnd = c("copy", "copy", "copy", "NA::b"),
  proportion = c(1, 0.6, 0.3, 0.1),
  stringsAsFactors = FALSE
)
spec_from_metadata <- mock_spec_from_recodeflow(variables, variable_details)
names(spec_from_metadata$variables)
The adapter preserves recodeflow semantics: valid ranges, categorical proportions, missing-code rows identified by recEnd (such as the NA::b row for code 97), and garbage settings.
age smoking
1 59 1
2 29 1
3 42 1
4 21 1
5 60 1
6 27 2
metadata_diag <- attr(metadata_mock, "mockdata_diagnostics")
metadata_diag$variables$smoking$assigned_missing_indices[1:5]
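To illustrate what the adapter has to interpret, the sketch below reads the example variable_details table in base R. It parses the `[18, 85]` range and splits the smoking rows into valid categories and missing-code rows by checking whether recEnd starts with `NA::`. The helper `parse_closed_interval()` is hypothetical, handles only closed intervals, and is not how mock_spec_from_recodeflow() is implemented.

```r
# Rebuild the example variable_details table so this block runs on its own.
variable_details <- data.frame(
  variable = c("age", "smoking", "smoking", "smoking"),
  recStart = c("[18, 85]", "1", "2", "97"),
  recEnd = c("copy", "copy", "copy", "NA::b"),
  proportion = c(1, 0.6, 0.3, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse a closed interval string such as "[18, 85]".
parse_closed_interval <- function(x) {
  inner <- gsub("^\\[|\\]$", "", x)              # drop the square brackets
  as.numeric(trimws(strsplit(inner, ",")[[1]]))  # c(lower, upper)
}

age_range <- parse_closed_interval(
  variable_details$recStart[variable_details$variable == "age"]
)

# Rows whose recEnd starts with "NA::" are treated as missing-code rows.
smoking_rows <- variable_details[variable_details$variable == "smoking", ]
is_missing_row <- startsWith(smoking_rows$recEnd, "NA::")
valid_levels <- smoking_rows$recStart[!is_missing_row]
missing_codes <- smoking_rows$recStart[is_missing_row]

age_range
valid_levels
missing_codes
```

The smoking proportions (0.6, 0.3, 0.1) sum to 1 across valid and missing-code rows, so code 97 is expected in roughly 10% of generated rows.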
Use the compatibility wrapper
create_mock_data() remains available for v0.3-style workflows. In strict mode, it attempts the v0.4 pipeline for supported metadata and falls back to the legacy create_* dispatch path for features that are not yet supported by the v0.4 native backend.
wrapped <- create_mock_data(
  databaseStart = "example",
  variables = variables,
  variable_details = variable_details,
  n = 100,
  seed = 301
)
head(wrapped)
age smoking
1 58 2
2 27 2
3 18 97
4 69 1
5 59 1
6 47 2
When the v0.4 path is used, create_mock_data() returns a data frame with the mockdata_diagnostics attribute. Legacy fallback paths return plain data frames without that attribute.
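Because the attribute is present only on the v0.4 path, code that consumes create_mock_data() output can check for it before reading diagnostics. The following minimal sketch uses a hypothetical helper, `has_mockdata_diagnostics()`. Only the attribute name comes from this vignette, and the two data frames are stand-ins built by hand, not real create_mock_data() output.

```r
# Hypothetical helper: TRUE when a data frame carries v0.4 diagnostics.
has_mockdata_diagnostics <- function(df) {
  !is.null(attr(df, "mockdata_diagnostics", exact = TRUE))
}

# Hand-built stand-ins for a legacy result and a v0.4 result.
legacy_result <- data.frame(age = c(58L, 27L), smoking = c("2", "2"))
v04_result <- legacy_result
attr(v04_result, "mockdata_diagnostics") <- list(variables = list())

has_mockdata_diagnostics(legacy_result)
has_mockdata_diagnostics(v04_result)
```

With the real result, `if (has_mockdata_diagnostics(wrapped)) attr(wrapped, "mockdata_diagnostics")` reads diagnostics only when they exist.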
Choosing the next function