Reference

variables

Contains all the variables needed for algorithm development and implementation

Algorithm type(s): all

Columns

Column Name Description Type Category Values
variable The name of the variable string
role The different processes that the variable was part of during the algorithm development string
variableType The statistical type of the variable category Categorical: Categorical type
Continuous: Continuous type
databaseStart The databases that the variable can be harmonized from string
variableStart The names of the variable for each starting database string
units The units for the variable string
label A short description of the variable string
labelLong A long description of the variable string
description Should contain any issues that need to be taken into account before using this variable string

variable-details

Contains the details for the variables defined in a variables sheet

Algorithm type(s): all

Columns

Column Name Description Type Category Values
variable The name of the variable string
dummyVariable The name of the dummy variable. Not valid if the variable is continuous. string
typeEnd The type of the variable category cat: Categorical type
cont: Continuous type
databaseStart The starting databases whose harmonization details this row contains string
variableStart The names of the variable for each starting database string
typeStart The type of the starting variables category cat: Categorical type
cont: Continuous type
N/A: Used mainly for derived variables when there are many start variables
recEnd The value that the variable will be harmonized to. Can be a number or an interval. string
catLabel The description for the category. string
catLabelLong A longer description for the category. string
numValidCat The number of valid categories for the variable. number
units The units for the variable. Only for continuous variables. string
recStart The value of the starting variable to harmonize from. Can be a single value or an interval. string
catStartLabel The description for the category of the start variable. string
variableStartShortLabel A short description of the starting variables string
variableStartLabel A long description of the starting variables string
notes Any issues/problems that users of this variable should consider. Can also include any problems encountered during harmonization. string
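
The recStart and recEnd columns together describe a recoding rule: a starting value, or any value inside a starting interval, maps to the harmonized value. A minimal sketch of how such rows could be applied is shown here. The row contents, the `matches` and `recode` helpers, and the exact interval syntax (`[18,40)` style) are assumptions for illustration, not part of the specification.

```python
import re

# Hypothetical variable-details rows; only the columns used by the sketch.
DETAIL_ROWS = [
    {"variable": "ageGroup", "recStart": "[18,40)", "recEnd": "1"},
    {"variable": "ageGroup", "recStart": "[40,65)", "recEnd": "2"},
    {"variable": "ageGroup", "recStart": "[65,120]", "recEnd": "3"},
    {"variable": "sex", "recStart": "1", "recEnd": "0"},
]

# Assumed interval syntax: opening bracket, low, comma, high, closing bracket.
INTERVAL = re.compile(r"^\s*([\[\(])\s*([^,]+),\s*([^,]+?)\s*([\]\)])\s*$")


def matches(rec_start, value):
    """Return True if value equals rec_start or lies inside its interval."""
    found = INTERVAL.match(rec_start)
    if found is None:
        # Not an interval, so treat recStart as a single numeric value.
        return float(rec_start) == value
    left, low, high, right = found.groups()
    low, high = float(low), float(high)
    above = value >= low if left == "[" else value > low
    below = value <= high if right == "]" else value < high
    return above and below


def recode(variable, value, rows=DETAIL_ROWS):
    """Return the recEnd of the first matching row, or None if nothing matches."""
    for row in rows:
        if row["variable"] == variable and matches(row["recStart"], value):
            return row["recEnd"]
    return None
```

Calling `recode("ageGroup", 40)` picks the second row, because the first interval excludes its upper bound.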

lookup

Contains the score range that each bin belongs to

Algorithm type(s): all

Columns

Column Name Description Type Category Values
catValue The bin number number
range The range of score values for this bin. Should use the mathematical range notation. string
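
As an illustration of how a lookup file might be used, the sketch below finds the bin whose range contains a given score. The rows and the `in_range` and `find_bin` helpers are hypothetical, and the sketch assumes ranges are written like `[0,0.1)`, with square brackets for inclusive bounds and round brackets for exclusive ones.

```python
# Hypothetical lookup rows: catValue is the bin number, range the score interval.
LOOKUP_ROWS = [
    {"catValue": 1, "range": "[0,0.1)"},
    {"catValue": 2, "range": "[0.1,0.5)"},
    {"catValue": 3, "range": "[0.5,1]"},
]


def in_range(range_text, score):
    """Check whether score lies in an interval such as '[0,0.1)'."""
    text = range_text.strip()
    low_text, high_text = text[1:-1].split(",")
    low, high = float(low_text), float(high_text)
    above = score >= low if text[0] == "[" else score > low
    below = score <= high if text[-1] == "]" else score < high
    return above and below


def find_bin(score, rows=LOOKUP_ROWS):
    """Return the catValue of the first bin containing score, or None."""
    for row in rows:
        if in_range(row["range"], score):
            return row["catValue"]
    return None
```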

descriptive

Contains descriptive information, for example mean, median etc. for variables

Algorithm type(s): all

Columns

Column Name Description Type Category Values
variable The variable whose descriptive statistics the row contains string
catValue The value of the category whose descriptive statistics the row contains. N/A for continuous variables. number
n The number of individuals used to calculate the row’s statistics number
proportion The proportion of all the individuals in the study used to calculate this row’s statistics number
median The median statistic number

model-export

Contains the list of all files that are part of the export for an algorithm

Algorithm type(s): all

Columns

Column Name Description Type Category Values
fileType The type of file category variables: A variables file
variable-details: A variable details file
descriptive: A descriptive lookup file
descriptive-bins: A descriptive bins file
model-export: A model export file
model-steps: A model steps file
dummy: A dummy file
center: A centering file
rcs: An RCS file
interaction: An interaction file
fine-and-gray: A fine and gray file
cox: A cox file
survival_bins: File with survival data for the different bins
lookup: Contains the range of score values for each bin in a survival algorithm
validate: Contains the validation rules for variables in the algorithm
tables: Defines a tables model export file
filePath The path to the file relative to the model export file string
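
Because filePath is relative to the model export file, a reader has to resolve it against that file's directory. The sketch below shows one way to do this. The file names, the CSV extension, and the `resolve_export_paths` helper are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Hypothetical model-export rows; the file names are made up for the example.
EXPORT_ROWS = [
    {"fileType": "variables", "filePath": "variables.csv"},
    {"fileType": "model-steps", "filePath": "steps/model-steps.csv"},
]


def resolve_export_paths(model_export_path, rows=EXPORT_ROWS):
    """Resolve each filePath against the directory holding the model export file."""
    base_dir = PurePosixPath(model_export_path).parent
    return [(row["fileType"], str(base_dir / row["filePath"])) for row in rows]


resolved = resolve_export_paths("models/demo/model-export.csv")
```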

model-steps

Contains the steps to score an algorithm

Algorithm type(s): all

Columns

Column Name Description Type Category Values
step The type of step to perform category dummy: Step used to create dummy variables
center: Step used to create centered variables
rcs: Step used to create spline terms
interaction: Step used to create interaction terms
fine-and-gray: Step used to calculate the outcome of a fine and gray model
cox: Step used to calculate the outcome of a cox proportional hazards model
simple-model: Step used to calculate the outcome of a simple model
logistic-regression: Step used to calculate the outcome of a logistic regression model
fileType The type of file referenced in the file path column category N/A: Missing since certain steps don’t need to specify the file type
beta-coefficients: A beta coefficients file for a fine and gray model or a cox model
baseline-hazards: The baseline hazards for a fine and gray model or a cox model
filePath The path to the file relative to the location of the model steps file string
notes Any notes for the future string
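
One possible way to consume a model steps file is to run its rows in order, dispatching each one to a handler chosen by its step value. The sketch below assumes this sequential reading. The `run_steps` function and the toy handlers are hypothetical, and real handlers would load the file named in filePath instead of using hard-coded values.

```python
def run_steps(step_rows, handlers, record):
    """Apply each model-steps row in order using the handler for its step."""
    for row in step_rows:
        step = row["step"]
        if step not in handlers:
            raise ValueError(f"No handler registered for step '{step}'")
        record = handlers[step](record, row)
    return record


# Toy handlers for illustration only; they ignore row["filePath"].
HANDLERS = {
    "center": lambda record, row: {**record, "age_c": record["age"] - 50},
    "simple-model": lambda record, row: {**record, "score": record["age_c"] * 2},
}

# Hypothetical model-steps rows.
STEP_ROWS = [
    {"step": "center", "fileType": "N/A", "filePath": "center.csv"},
    {"step": "simple-model", "fileType": "N/A", "filePath": "simple-model.csv"},
]

result = run_steps(STEP_ROWS, HANDLERS, {"age": 60})
```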

dummy

Contains the variables to convert to dummy variables

Algorithm type(s): all

Columns

Column Name Description Type Category Values
origVariable The name of the variable to dummy string
catValue The category value in the original variable which the dummy variable represents string
dummyVariable The name of the dummy variable string
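
A dummy file row can be read as "set dummyVariable to 1 when origVariable equals catValue, otherwise 0". The sketch below applies that reading to a single record. The rows, variable names, and the `add_dummies` helper are hypothetical.

```python
# Hypothetical dummy rows for a three-level variable with level 0 as reference.
DUMMY_ROWS = [
    {"origVariable": "smoker", "catValue": "1", "dummyVariable": "smoker_cat1"},
    {"origVariable": "smoker", "catValue": "2", "dummyVariable": "smoker_cat2"},
]


def add_dummies(record, rows=DUMMY_ROWS):
    """Return a copy of record with one 0/1 entry per dummy row."""
    out = dict(record)
    for row in rows:
        # catValue is a string column, so compare on the string form.
        is_match = str(record[row["origVariable"]]) == row["catValue"]
        out[row["dummyVariable"]] = int(is_match)
    return out
```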

center

Contains the variables to create centered variables

Algorithm type(s): all

Columns

Column Name Description Type Category Values
origVariable The name of the variable to center string
centerValue The value to center with string
centeredVariable The name of the new centered variable string
centeredVariableType The type of the new centered variable category cat: Categorical type
cont: Continuous type
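
The sketch below shows one reading of a center file, assuming that centering means subtracting centerValue from the original variable. The rows and the `add_centered` helper are hypothetical.

```python
# Hypothetical center row; centerValue is stored as a string column.
CENTER_ROWS = [
    {
        "origVariable": "age",
        "centerValue": "50",
        "centeredVariable": "age_c",
        "centeredVariableType": "cont",
    },
]


def add_centered(record, rows=CENTER_ROWS):
    """Return a copy of record with each centered variable added."""
    out = dict(record)
    for row in rows:
        # Assumption: centering subtracts centerValue from the original value.
        centered = record[row["origVariable"]] - float(row["centerValue"])
        out[row["centeredVariable"]] = centered
    return out
```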

interaction

Contains the interaction variables for an algorithm

Algorithm type(s): all

Columns

Column Name Description Type Category Values
interactionVariable The name of the interaction variable string
interactingVariables The names of the variables that are part of this interaction variable string
interactionVariableType The statistical type of the interaction variable category cat: Categorical type
cont: Continuous type
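
An interaction term is commonly the product of its component variables, and the sketch below assumes that definition. It also assumes that interactingVariables lists names separated by semicolons, which this table does not state. The rows and the `add_interactions` helper are hypothetical.

```python
import math

# Hypothetical interaction row; the ";" delimiter is an assumption.
INTERACTION_ROWS = [
    {
        "interactionVariable": "age_cXsmoker_cat2",
        "interactingVariables": "age_c;smoker_cat2",
        "interactionVariableType": "cont",
    },
]


def add_interactions(record, rows=INTERACTION_ROWS, delimiter=";"):
    """Return a copy of record with each interaction variable added."""
    out = dict(record)
    for row in rows:
        names = [name.strip() for name in row["interactingVariables"].split(delimiter)]
        # Assumption: the interaction value is the product of its components.
        out[row["interactionVariable"]] = math.prod(out[name] for name in names)
    return out
```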

rcs

Contains the RCS variables for an algorithm

Algorithm type(s): all

Columns

Column Name Description Type Category Values
variable The name of the variable which will be splined string
rcsVariables The names of the spline variables in correct order string
knots The knot values to use, separated by semicolons string

validate

Contains the validation rules for variables in the algorithm

Algorithm type(s): all

Columns

Column Name Description Type Category Values
variable The name of the variable to validate string
rule The type of validation rule to apply to a variable category type: Validate the data type of the variable
range: Validate that the variable is within a range of values. Meant for continuous variables.
allowed: Validate that the variable contains a value from a set of valid values. Meant for categorical variables.
nullable: Whether missing values are allowed for a variable
value The value to use when applying the validation string
error_handle How to handle failed validations category error: Failed validations for the variable should throw an error, stopping the scoring process
warning: Failed validations for the variable should log a warning and continue the scoring process
truncate: Failed validations for the variable should log a warning, truncate the failed variable value, and continue the scoring process
error_replace Value to replace variables that fail validations whose error_handle value is warning string
location Which step in the scoring process the validation should be used string
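
The sketch below illustrates how a range rule might interact with the three error_handle values. It assumes the value column holds a closed interval such as `[0,120]`, which this table does not specify, and the `validate_range` helper is hypothetical.

```python
import logging


def validate_range(name, value, rule_value, error_handle, error_replace=None):
    """Apply a range rule and return the value to continue scoring with.

    Assumption: rule_value is a closed interval written as '[low,high]'.
    """
    low, high = (float(part) for part in rule_value.strip("[] ").split(","))
    if low <= value <= high:
        return value
    message = f"{name}={value} is outside [{low}, {high}]"
    if error_handle == "error":
        # Stop the scoring process.
        raise ValueError(message)
    logging.warning(message)
    if error_handle == "truncate":
        # Clamp the value to the nearest bound.
        return min(max(value, low), high)
    # error_handle == "warning": substitute error_replace when it is given.
    return value if error_replace is None else float(error_replace)
```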

beta-coefficients

Contains the coefficients for regression models

Algorithm type(s): cox, fine-and-gray, logistic-regression

Columns

Column Name Description Type Category Values
variable The name of the variable whose beta coefficient the row contains. If this is the coefficient for the intercept then use the name Intercept string
coefficient The beta coefficient number
type The statistical type of the variable category cat: Categorical type
cont: Continuous type
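
In a regression model the coefficients are typically combined into a linear predictor: the intercept, if present, plus each coefficient times the value of its variable. The sketch below assumes that usage. The rows, variable names, and the `linear_predictor` helper are hypothetical.

```python
# Hypothetical beta-coefficients rows.
BETA_ROWS = [
    {"variable": "Intercept", "coefficient": -2.0, "type": "cont"},
    {"variable": "age_c", "coefficient": 0.5, "type": "cont"},
    {"variable": "smoker_cat2", "coefficient": 1.0, "type": "cat"},
]


def linear_predictor(record, rows=BETA_ROWS):
    """Sum coefficient * value over all rows; the Intercept row is added as-is."""
    total = 0.0
    for row in rows:
        if row["variable"] == "Intercept":
            total += row["coefficient"]
        else:
            total += row["coefficient"] * record[row["variable"]]
    return total
```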

baseline-hazards

Contains the baseline hazards to use with a cox proportional hazards model or a fine and gray model

Algorithm type(s): cox, fine-and-gray

Columns

Column Name Description Type Category Values
time The time up to which the baseline hazard should be used number
baselineHazard The baseline hazard value number
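
The sketch below shows one interpretation of this table: for a requested time, use the row with the smallest time that is at least as large. It then applies the usual Cox risk formula, 1 - exp(-H0 * exp(lp)), which is only valid if baselineHazard stores a cumulative baseline hazard. That is an assumption, as are the rows and both helper functions.

```python
import math

# Hypothetical baseline-hazards rows.
BASELINE_ROWS = [
    {"time": 1, "baselineHazard": 0.01},
    {"time": 5, "baselineHazard": 0.05},
]


def baseline_hazard_at(t, rows=BASELINE_ROWS):
    """Return the hazard of the first row whose time is at least t."""
    for row in sorted(rows, key=lambda r: r["time"]):
        if t <= row["time"]:
            return row["baselineHazard"]
    raise ValueError(f"No baseline hazard covers time {t}")


def cox_risk(t, lp, rows=BASELINE_ROWS):
    """Risk by time t for linear predictor lp.

    Assumption: baselineHazard is a cumulative baseline hazard.
    """
    return 1.0 - math.exp(-baseline_hazard_at(t, rows) * math.exp(lp))
```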

survival_bins

Contains the survival data for the different bins in a survival algorithm

Algorithm type(s): survival

Columns

Column Name Description Type Category Values
catValue The bin number number

lookup

Contains the range of score values for each bin in a survival algorithm

Algorithm type(s): survival

Columns

Column Name Description Type Category Values
catValue The bin number number
range The range of score values for this bin. Should use the mathematical range notation. string

tables

Contains the list of tables referenced in the variable and variable detail files

Algorithm type(s): all

Columns

Column Name Description Type Category Values
tableName The name of the table string
tablePath The path to the table relative to this file string

simple-model

Contains the metadata for a simple model

Algorithm type(s): simple-model

Columns

Column Name Description Type Category Values
name The name of the metadata category outputVariableName: The name of the output variable for the model. Should be defined in the variables and variable details sheets.
value The value of the metadata string

logistic-regression

Contains the beta coefficients for a logistic regression model

Algorithm type(s): logistic-regression

Columns

Column Name Description Type Category Values
variable The name of the variable whose beta coefficient the row contains. If this is the coefficient for the intercept then use the name Intercept string
coefficient The beta coefficient number
type The statistical type of the variable category cat: Categorical type
cont: Continuous type
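
For a logistic regression model, the predicted probability is the logistic function applied to the intercept plus the weighted sum of the variables. The sketch below shows that calculation. The rows and the `predict_probability` helper are hypothetical.

```python
import math

# Hypothetical logistic-regression rows.
LOGISTIC_ROWS = [
    {"variable": "Intercept", "coefficient": -1.0, "type": "cont"},
    {"variable": "age_c", "coefficient": 0.5, "type": "cont"},
]


def predict_probability(record, rows=LOGISTIC_ROWS):
    """Return 1 / (1 + exp(-lp)), where lp is the linear predictor."""
    lp = 0.0
    for row in rows:
        # The Intercept row contributes its coefficient directly.
        value = 1.0 if row["variable"] == "Intercept" else record[row["variable"]]
        lp += row["coefficient"] * value
    return 1.0 / (1.0 + math.exp(-lp))
```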