User Tools

Site Tools


documentation:cdm:cdm:data_model_conventions

This is an old revision of the document!


Data Model Conventions

There are a number of implicit and explicit conventions that have been adopted in the CDM. Developers of methods that run methods against the CDM need to understand these conventions.

General conventions of data tables

The CDM is platform-independent. Data types are defined generically using ANSI SQL data types (VARCHAR, INTEGER, FLOAT, DATE, TIME, CLOB). Precision is provided only for VARCHAR. It reflects the minimal required string length and can be expanded within a CDM instantiation. The CDM does not prescribe the date and time format. Standard queries against CDM may vary for local instantiations and date/time configurations.

In most cases, the first field in each table ends in “_id”, containing a record identifier that can be used as a foreign key in another table.

General conventions of fields

Variable names across all tables follow one convention:

NotationDescription
<entity>_SOURCE_VALUEVerbatim information from the source data, typically used in ETL to map to CONCEPT_ID, and not to be used by any standard analytics. For example, condition_source_value = ‘787.02’ was the ICD-9 code captured as a diagnosis from the administrative claim
<entity>_IDUnique identifiers for key entities, which can serve as foreign keys to establish relationships across entities For example, person_id uniquely identifies each individual. visit_occurrence_id uniquely identifies a PERSON encounter at a point of care.
<entity>_CONCEPT_IDForeign key into the Standardized Vocabularies (i.e. the standard_concept attribute for the corresponding term is true), which serves as the primary basis for all standardized analytics For example, condition_concept_id = 31967 contains reference value for SNOMED concept of ‘Nausea’
<entity>_SOURCE_CONCEPT_IDForeign key into the Standardized Vocabularies representing the concept and terminology used in the source data, when applicable For example, condition_source_concept_id = 35708202 denotes the concept of ‘Nausea’ in the MedDRA terminology; the analogous condition_concept_id might be 31967, since SNOMED-CT is the Standardized Vocabularies for most clinical diagnoses and findings.
<entity>_TYPE_CONCEPT_IDDelineates the origin of the source information, standardized within the Standardized Vocabularies For example, drug_type_concept_id can allow analysts to discriminate between ‘Pharmacy dispensing’ and ‘Prescription written’
documentation/cdm/cdm/data_model_conventions.1415971227.txt.gz · Last modified: 2014/11/14 13:20 by cgreich