Validate MockData configuration details against schema requirements including proportion sums and parameter completeness
Source: R/read_mock_data_config_details.R
validate_mock_data_config_details.Rd
Validates a mock_data_config_details data frame against schema requirements. It checks for required columns, valid proportions, proportion sums, and distribution parameter requirements, and optionally validates links to a config file.
Details
Validation checks:
Required columns:
variable, recStart (always required)
recEnd (conditionally required when using missing data codes 6-9, 96-99)
uid, uid_detail (optional for simple examples)
Conditional recEnd requirement:
recEnd column required when recStart contains missing codes (6-9, 96-99)
Enables classification: NA::a (skip), NA::b (missing), numeric (valid)
Without recEnd, missing vs. valid codes cannot be distinguished
Uniqueness:
uid_detail values must be unique (if column present)
Proportion validation:
Values must be in range [0, 1]
Population proportions (valid + missing codes) must sum to 1.0 ±0.001 per variable
Contamination proportions (corrupt_*) are excluded from sum
Auto-normalizes with warning if sum ≠ 1.0
Parameter validation:
Distribution-specific requirements:
normal → mean + sd
gompertz → rate + shape
exponential → rate
poisson → rate
Link validation (if config provided):
All uid values must exist in config$uid
Flexible recEnd validation:
Warns but doesn't error on unknown recEnd values
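The proportion checks listed above can be pictured with a minimal sketch. This is not the package's implementation: the helper name check_proportion_sums, the proportion column name, and the idea that contamination rows are identified by a recEnd value starting with corrupt_ are all assumptions made for illustration. It also assumes the recEnd column is present.

```r
# Hypothetical helper, not part of the package API.
# Assumes a `proportion` column and that contamination rows carry a
# recEnd value starting with "corrupt_"; both are illustrative guesses.
check_proportion_sums <- function(details, tol = 0.001) {
  # Drop contamination rows and rows without a proportion
  is_corrupt <- grepl("^corrupt_", details$recEnd)
  pop <- details[!is_corrupt & !is.na(details$proportion), , drop = FALSE]

  # Range check: every remaining proportion must lie in [0, 1]
  out_of_range <- pop$proportion < 0 | pop$proportion > 1
  if (any(out_of_range)) {
    stop("Proportions outside [0, 1] for: ",
         paste(unique(pop$variable[out_of_range]), collapse = ", "))
  }

  # Sum per variable and return the variables whose sum is outside 1.0 +/- tol
  sums <- tapply(pop$proportion, pop$variable, sum)
  off <- abs(as.vector(sums) - 1) > tol
  names(sums)[off]
}

# Made-up example data
details <- data.frame(
  variable   = c("smoking", "smoking", "smoking", "age", "age"),
  recStart   = c("1", "2", "7", "[18,100]", "999"),
  recEnd     = c("1", "2", "NA::b", "copy", "corrupt_high"),
  proportion = c(0.70, 0.25, 0.05, 0.90, 0.02),
  stringsAsFactors = FALSE
)

bad <- check_proportion_sums(details)
print(bad)
```

In this made-up data the smoking rows sum to 1.0, while age sums to 0.90 once its contamination row is excluded, so only age is flagged. The sketch only reports the offending variables; the documented function additionally auto-normalizes with a warning.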
Examples
if (FALSE) { # \dontrun{
  # Validate details
  details <- read.csv("mock_data_config_details.csv", stringsAsFactors = FALSE)
  validate_mock_data_config_details(details)

  # Validate with cross-check against config
  config <- read.csv("mock_data_config.csv", stringsAsFactors = FALSE)
  validate_mock_data_config_details(details, config = config)
} # }
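The distribution-specific parameter requirements listed under Details (normal → mean + sd, gompertz → rate + shape, exponential → rate, poisson → rate) can be sketched as a lookup table plus a row-wise check. The helper find_missing_params, the distribution column name, and the assumption that parameters are stored in columns named mean, sd, rate, and shape are hypothetical; the actual column layout may differ.

```r
# Required parameters per distribution, taken from the Details section.
required_params <- list(
  normal      = c("mean", "sd"),
  gompertz    = c("rate", "shape"),
  exponential = "rate",
  poisson     = "rate"
)

# Hypothetical helper: returns one message per row with missing parameters.
# The `distribution` column name is an assumption for illustration.
find_missing_params <- function(details, dist_col = "distribution") {
  problems <- character(0)
  for (i in seq_len(nrow(details))) {
    dist <- details[[dist_col]][i]
    # Skip rows with no distribution or one not covered by the table
    if (is.na(dist) || !(dist %in% names(required_params))) next
    needed <- required_params[[dist]]
    # A parameter is missing if its column is absent or its value is NA
    is_missing <- vapply(needed, function(p) {
      !(p %in% names(details)) || is.na(details[[p]][i])
    }, logical(1))
    if (any(is_missing)) {
      problems <- c(problems, sprintf(
        "row %d (%s): missing %s",
        i, dist, paste(needed[is_missing], collapse = ", ")
      ))
    }
  }
  problems
}

# Made-up example data
details <- data.frame(
  variable     = c("bmi", "survival", "visits"),
  recStart     = c("[15,50]", "[0,120]", "[0,30]"),
  distribution = c("normal", "gompertz", "poisson"),
  mean         = c(27, NA, NA),
  sd           = c(NA, NA, NA),
  rate         = c(NA, 0.01, 3),
  shape        = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

problems <- find_missing_params(details)
print(problems)
```

Here the normal row lacks sd and the gompertz row lacks shape, so two messages are returned; the poisson row has its rate and passes.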