Changelog
Source: NEWS.md
MockData 0.4.0 (2026-06-10)
Breaking changes
- `create_cat_var()`, `create_con_var()`, and `create_date_var()` now stop with the error `Variable '<name>' not found in variables metadata` when the requested variable is absent, instead of warning and returning `NULL`. This affects direct generator calls and `create_wide_survival_data()` (a misspelled date-variable name now errors instead of being skipped with a warning). `create_mock_data()` itself derives variable names from the `variables` metadata, so it cannot trigger this error; its `validate = FALSE` flag continues to convert any generator error to warn-and-skip on the legacy path. Duplicate `variables` rows for the same variable now produce a warning in the legacy `create_*` path before the first row is used (the v0.4 `mock_spec` path already errors on duplicate names).
- `create_mock_data()` error messages for missing metadata files changed from `Configuration file does not exist:` / `Details file does not exist:` to `variables file does not exist:` / `variable_details file does not exist:`.
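The stricter lookup described above can be sketched in base R. This is only an illustration of the stop-versus-warn behaviour, not MockData's implementation: `find_variable_row()` is a hypothetical helper, and the wording of the duplicate-row warning is an assumption. Only the error text and the `variables$variable` column come from this changelog.

```r
# Hypothetical helper (not part of MockData): look up one variable's metadata row.
find_variable_row <- function(var, variables) {
  rows <- variables[variables$variable == var, , drop = FALSE]
  if (nrow(rows) == 0) {
    # Error text as quoted in the changelog entry.
    stop("Variable '", var, "' not found in variables metadata", call. = FALSE)
  }
  if (nrow(rows) > 1) {
    # Assumed wording; the changelog only says a warning is raised.
    warning("Duplicate rows for variable '", var, "'; using the first", call. = FALSE)
  }
  rows[1, , drop = FALSE]
}

variables <- data.frame(
  variable = c("age", "smoking"),
  rType = c("integer", "factor"),
  stringsAsFactors = FALSE
)

# A misspelled name raises an error that callers can catch explicitly.
result <- tryCatch(
  find_variable_row("agee", variables),
  error = function(e) conditionMessage(e)
)
print(result)
```

Code that previously checked for a `NULL` return would need a `tryCatch()` like the one above, or a name check before calling the generator.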
New features
- Started the v0.4 production refactor around a normalized `mock_spec` architecture.
- Added `mock_spec()`, `mock_spec_continuous()`, `mock_spec_categorical()`, `mock_spec_date()`, `is_mock_spec()`, and `validate_mock_spec()`.
- Added direct specification helpers `mock_continuous()`, `mock_categorical()`, and `mock_date()` for simple use without recodeflow-style metadata tables.
- Added `mock_spec_from_recodeflow()` to adapt recodeflow-style `variables` and `variable_details` metadata into validated `mock_spec` objects while preserving role/database filtering, categorical proportions, `recEnd` missing-code semantics, valid ranges, garbage rules, date ranges, and survival/date fields.
- Added `generate_mock_data_native()` to generate baseline valid mock data from `mock_spec` objects with the native R backend.
- Added `postprocess_mock_data()` to apply `mock_spec` missing-code and garbage-value rules after baseline generation, with diagnostics that distinguish assigned missing/garbage rows from naturally drawn values.
- Post-processing diagnostics now protect naturally drawn missing-code collisions from later garbage assignment, apply garbage rules in canonical `low` -> `high` -> other order, and stop on repeated post-processing. This prevents silent diagnostic drift when a naturally drawn missing-code value would otherwise be overwritten by garbage assignment.
- Added `generate_mock_data_simstudy()` as a soft-gated optional backend for baseline categorical and uniform continuous generation when `simstudy` is installed, with native generation retained for MockData-specific semantics.
- The optional `simstudy` backend is kept in `Suggests`, requires `simstudy >= 0.8.1`, and validates categorical labels before converting generated values back into MockData’s `mock_spec` levels.
- The optional `simstudy` backend now rejects variables named `id`, which conflicts with `simstudy`’s generated row identifier, and normalizes categorical output through an explicit label-or-index validation path.
- `create_mock_data()` now attempts the v0.4 `mock_spec` pipeline in strict mode for supported recodeflow metadata, while retaining the legacy `create_*` dispatch path for unsupported v0.4 backend features and lenient generation. The v0.4 path attaches `mockdata_diagnostics` and uses `seed` for baseline generation plus `seed + 1` for post-processing, so exact seeded output may differ from v0.3.x even when the public seed is unchanged. Verbose mode now reports whether the v0.4 or legacy path was chosen.
- Added forward-compatible specification fields: `spec_version`, `provenance`, and `model_hint`.
- Existing v0.3 generator APIs remain available while v0.4 internals are built.
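The `seed` / `seed + 1` split mentioned above can be illustrated with a minimal base R sketch. The function names (`generate_baseline()`, `postprocess()`, `run()`), the missing code `999`, and the `diagnostics` attribute are all hypothetical stand-ins. The only detail taken from the changelog is that baseline generation and post-processing use separate, deterministic seeds.

```r
# Hypothetical two-stage pipeline, illustrating the seed / seed + 1 split.
generate_baseline <- function(n, seed) {
  set.seed(seed)  # baseline stage uses the public seed
  data.frame(age = sample(18:90, n, replace = TRUE))
}

postprocess <- function(df, seed, missing_code = 999L, prop_missing = 0.1) {
  set.seed(seed + 1)  # post-processing stage uses its own stream
  idx <- sample(nrow(df), size = floor(prop_missing * nrow(df)))
  df$age[idx] <- missing_code
  # Record which rows were assigned a missing code (assumed diagnostic shape).
  attr(df, "diagnostics") <- list(assigned_missing = sort(idx))
  df
}

run <- function(n, seed) postprocess(generate_baseline(n, seed), seed)

a <- run(100, seed = 42)
b <- run(100, seed = 42)
print(identical(a, b))
```

Two runs with the same public seed stay reproducible, but the draws differ from a pipeline that uses one random stream for both stages. That is consistent with the note that seeded output may differ from v0.3.x.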
MockData 0.3.0
Breaking changes
New function API - All generator functions now accept full metadata data frames instead of pre-filtered subsets:
# Before (v0.2.x)
var_row <- variables[variables$variable == "age", ]
details_subset <- variable_details[variable_details$variable == "age", ]
result <- create_con_var(var_row, details_subset, n = 1000)

# After (v0.3.0)
result <- create_con_var(
  var = "age",
  databaseStart = "minimal-example",
  variables = variables,
  variable_details = variable_details,
  n = 1000
)

Affected functions: create_cat_var(), create_con_var(), create_date_var(), create_wide_survival_data(), create_mock_data()
Deprecated: prop_garbage parameter in create_wide_survival_data(). Use garbage parameters in metadata instead:
# Old way (no longer supported)
surv <- create_wide_survival_data(..., prop_garbage = 0.03)

# New way
vars_with_garbage <- add_garbage(variables, "death_date",
  garbage_high_prop = 0.03, garbage_high_range = "[2025-01-01, 2099-12-31]")
surv <- create_wide_survival_data(..., variables = vars_with_garbage)

New features
Unified garbage generation across all variable types (categorical, continuous, date, survival):
- `garbage_low_prop` + `garbage_low_range` for values below valid range
- `garbage_high_prop` + `garbage_high_range` for values above valid range
- New helper function `add_garbage()` for easy garbage specification
- Categorical garbage now supported (treats codes as ordinal to generate out-of-range values)
# Add garbage to any variable type
vars_with_garbage <- add_garbage(variables, "smoking",
  garbage_low_prop = 0.02, garbage_low_range = "[-2, 0]")

# Pipe-friendly
vars_with_garbage <- variables %>%
  add_garbage("age", garbage_high_prop = 0.03, garbage_high_range = "[150, 200]") %>%
  add_garbage("smoking", garbage_low_prop = 0.02, garbage_low_range = "[-2, 0]")

Derived variable identification:
- `identify_derived_vars()` - Identifies derived variables using `DerivedVar::` and `Func::` patterns
- `get_raw_var_dependencies()` - Extracts raw variable dependencies
- Compatible with recodeflow patterns
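As a rough illustration of the pattern matching described for derived variables, the base R sketch below detects the `DerivedVar::` and `Func::` prefixes and pulls out the raw dependencies. The column layout (`variableStart`, `recEnd`), the example values, and the helper names `is_derived()` and `raw_dependencies()` are assumptions based on recodeflow-style metadata. They are not the signatures of `identify_derived_vars()` or `get_raw_var_dependencies()`.

```r
# Assumed recodeflow-style layout: derived variables carry
# "DerivedVar::[a, b]" in variableStart and "Func::fn_name" in recEnd.
variable_details <- data.frame(
  variable = c("age", "bmi"),
  variableStart = c("[age]", "DerivedVar::[height, weight]"),
  recEnd = c("copy", "Func::bmi_fun"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag rows that match either derived-variable pattern.
is_derived <- function(details) {
  grepl("^DerivedVar::", details$variableStart) | grepl("^Func::", details$recEnd)
}

# Hypothetical helper: extract raw variable names from "DerivedVar::[a, b]".
raw_dependencies <- function(variable_start) {
  inner <- sub("^DerivedVar::\\[(.*)\\]$", "\\1", variable_start)
  trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
}

derived <- variable_details$variable[is_derived(variable_details)]
deps <- raw_dependencies(variable_details$variableStart[2])
print(derived)
print(deps)
```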
Bug fixes
- Fixed categorical garbage factor level bug - garbage values were being converted to NA during factor creation
- Fixed `recEnd` column requirement - now optional for simple configurations
- Fixed derived variable generation in `create_mock_data()` - derived variables now correctly excluded
- Fixed `create_mock_data()` error handling: strict mode is now the default, so unsupported `rType` values and generator errors stop generation instead of silently dropping columns. Use `validate = FALSE` to opt into warning-and-skip behavior.
- Fixed role matching so `disabled` no longer matches `enabled`.
- Fixed missing-code classification to use `recEnd` metadata instead of numeric-code heuristics, so valid codes such as 7, 17, and 27 are not misclassified as missing.
- Fixed `variable_details = NULL` fallback mode for categorical, continuous, and date variables.
- Fixed `rType` normalization so `date`/`Date` handling is consistent across validation, defaults, and generation.
- Added migration warnings for legacy `corrupt_*` garbage fields and recStart values, rewriting them to canonical `garbage_*` names.
- Added validation that `garbage_low_prop + garbage_high_prop` does not exceed 1, preventing silent truncation of the second garbage pass.
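Two of the fixes above lend themselves to a short base R sketch: classifying missing codes from `recEnd` rather than from the numeric code, and rejecting garbage proportions that sum above 1. The `NA::` prefix convention, the example rows, and the helper `check_garbage_props()` with its message text are assumptions for illustration. MockData's actual checks may differ.

```r
# Assumed recodeflow-style convention: recEnd values starting with "NA::"
# mark missing codes, regardless of what the numeric code looks like.
details <- data.frame(
  recStart = c("1", "7", "17", "97", "99"),
  recEnd = c("1", "7", "17", "NA::b", "NA::b"),
  stringsAsFactors = FALSE
)
is_missing_code <- grepl("^NA::", details$recEnd)
missing_codes <- details$recStart[is_missing_code]
print(missing_codes)  # codes 7 and 17 stay valid

# Hypothetical validator: the two garbage proportions share one pool of rows,
# so together they cannot exceed 1.
check_garbage_props <- function(low = 0, high = 0) {
  if (low < 0 || high < 0 || low + high > 1) {
    stop("garbage_low_prop + garbage_high_prop must be in [0, 1]", call. = FALSE)
  }
  invisible(TRUE)
}

ok <- check_garbage_props(low = 0.02, high = 0.03)
too_much <- tryCatch(
  check_garbage_props(low = 0.7, high = 0.6),
  error = function(e) "rejected"
)
print(too_much)
```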
Documentation
Restructured using Divio framework:
- Removed 6 vignettes (cchs-example, chms-example, demport-example, dates, schema-change-dates, tutorial-config-files)
- Added 2 new vignettes (tutorial-categorical-continuous, tutorial-survival-data)
- Massively expanded reference-config (2,028 lines of comprehensive metadata schema documentation)
- All vignettes updated to v0.3.0 API
- All examples now use `inst/extdata/minimal-example/` only
Final structure (9 vignettes):
- Tutorials (6): getting-started, tutorial-categorical-continuous, tutorial-dates, tutorial-survival-data, tutorial-missing-data, tutorial-garbage-data
- How-to guides (1): for-recodeflow-users
- Explanation (1): advanced-topics
- Reference (1): reference-config
Metadata simplification:
- Removed `inst/extdata/cchs/`, `inst/extdata/chms/`, `inst/extdata/demport/`
- Only `inst/extdata/minimal-example/` remains as canonical reference
Migration guide
Update function calls:
- Pass variable name as string (not pre-filtered row)
- Pass full metadata data frames (not subsets)
- Add `databaseStart` parameter
- Remove manual filtering
Update garbage specification:
- Remove `prop_garbage` from `create_wide_survival_data()` calls
- Add garbage to metadata using `add_garbage()` helper
MockData 0.2.0
Major changes
New configuration format (v0.2)
- Breaking change: New configuration schema with `uid`/`uid_detail` system
- Replaces v0.1 `cat`/`catLabel` columns with unified metadata structure
- Adds `rType` field for explicit R type coercion (factor, integer, double, Date)
- Adds `proportion` field for direct distribution control
- Adds date-specific fields: `date_start`, `date_end`, `distribution`
Backward compatibility: v0.1 format still supported via dual interface. Both formats work side-by-side.
Date variable generation
- New `create_date_var()` function for date variables
- Multiple distribution options: uniform, gompertz, exponential
- Support for survival analysis patterns
- SAS date format parsing
- Three source formats: analysis (R Date), csv (ISO strings), sas (numeric)
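The three source formats listed above can be illustrated in base R. This sketch assumes that the `sas` format means a day count from the SAS epoch (1960-01-01), which is how SAS stores dates. How `create_date_var()` exposes the formats is not shown in this changelog, so no MockData call appears here.

```r
set.seed(1)
start <- as.Date("2001-01-01")
end <- as.Date("2005-12-31")

# Uniform draw over the inclusive date range, in "analysis" form (R Date).
analysis_dates <- start + sample(0:as.integer(end - start), 5, replace = TRUE)

# "csv" form: ISO 8601 strings.
csv_dates <- format(analysis_dates, "%Y-%m-%d")

# "sas" form (assumed): numeric days since the SAS epoch, 1960-01-01.
sas_dates <- as.numeric(analysis_dates - as.Date("1960-01-01"))

# Round trip from SAS day counts back to R Dates.
back <- as.Date(sas_dates, origin = "1960-01-01")
print(all(back == analysis_dates))
```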
Survival analysis support
- New `create_wide_survival_data()` function for cohort studies
- Generates paired entry and event dates with guaranteed temporal ordering
- Supports censoring and multiple event distributions
- Note: Must be called manually (not compatible with `create_mock_data()` batch generation)
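One way to guarantee that an event date never precedes its entry date is to draw a strictly positive waiting time and add it to the entry date. The base R sketch below does this with an exponential waiting time and administrative censoring. The study window, rate, and variable names are illustrative assumptions, and this is not the implementation of `create_wide_survival_data()`.

```r
set.seed(2024)
n <- 200
study_start <- as.Date("2010-01-01")
study_end <- as.Date("2020-12-31")

# Entry dates: uniform over the first five years of the study window.
entry_date <- study_start + sample(0:(5 * 365), n, replace = TRUE)

# Waiting time in whole days, forced to be at least 1,
# so the event always falls strictly after entry.
days_to_event <- pmax(ceiling(rexp(n, rate = 1 / 2000)), 1)
event_date <- entry_date + days_to_event

# Administrative censoring: events after study end are not observed.
censored <- event_date > study_end
event_date[censored] <- NA

print(table(censored))
```

Building the event date from the entry date, rather than drawing both independently and sorting afterwards, keeps the ordering guarantee independent of the chosen distribution.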
Data quality testing (garbage data)
- New `prop_invalid` parameter across all generators
- Generates intentionally invalid data for testing validation pipelines
- Supports garbage types: `corrupt_future`, `corrupt_past`, `corrupt_range`
- Critical for testing data cleaning workflows
Batch generation
- New `create_mock_data()` function for batch generation from CSV configuration
- New `read_mock_data_config()` and `read_mock_data_config_details()` readers
- Processes multiple variables in single call
- Fallback mode when details not provided
New functions
- `create_date_var()` - Date variable generation
- `create_wide_survival_data()` - Paired survival dates with temporal ordering
- `create_mock_data()` - Batch generation orchestrator
- `read_mock_data_config()` - Configuration file reader
- `read_mock_data_config_details()` - Details file reader
- `determine_proportions()` - Unified proportion determination
- `import_from_recodeflow()` - Helper to adapt recodeflow metadata
Function updates
- `create_cat_var()`: Add rType support, proportion parameter, uid-based filtering
- `create_con_var()`: Add rType support, proportion parameter for missing codes
- Consolidate helpers in `mockdata_helpers.R`, `config_helpers.R`, `scalar_helpers.R`
Documentation
New vignettes
- `getting-started.qmd` - Comprehensive introduction
- `tutorial-dates.qmd` - Date configuration patterns
- `tutorial-config-files.qmd` - Batch generation workflow
- `reference-config.qmd` - Complete v0.2 schema documentation
- `advanced-topics.qmd` - Technical implementation details
Package infrastructure
- Added `_pkgdown.yml` for documentation website
- Updated NAMESPACE with new imports (stats::rexp, utils::read.csv, etc.)
- Updated DESCRIPTION with new dependencies
Breaking changes
Configuration format changes:
- Variable details now require `uid` and `uid_detail` columns
- `rType` field required for proper type coercion
- New date fields: `date_start`, `date_end`, `distribution`
Migration path:
- v0.1 format still works (backward compatibility maintained)
- Dual interface auto-detects format based on parameters
- v0.2 recommended for new projects
File changes:
- Renamed `R/mockdata-helpers.R` → `R/mockdata_helpers.R`
- ICES metadata removed (maintained in recodeflow package)
Bug fixes
- Fixed ‘else’ handling in `recEnd` rules (issue #5)
- Fixed create_wide_survival_data() compatibility with create_mock_data()
- Fixed Roxygen documentation link syntax errors
Known issues
- Survival variable type must be generated manually with `create_wide_survival_data()`
- Cannot be used in `create_mock_data()` batch generation (requires paired variables)