A number of implicit and explicit conventions have been adopted in the CDM. Developers of methods that run against the CDM need to understand these conventions.
The CDM is platform-independent. Data types are defined generically using ANSI SQL data types (VARCHAR, INTEGER, FLOAT, DATE, TIME, CLOB). Precision is provided only for VARCHAR. It reflects the minimal required string length and can be expanded within a CDM instantiation. The CDM does not prescribe the date and time format. Standard queries against the CDM may therefore vary between local instantiations and their date/time configurations.
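
Because the VARCHAR precision is a minimum rather than a fixed size, a local instantiation can be checked against it mechanically. The following is a minimal sketch of such a check, not part of the CDM itself: the function name `find_too_short` and the lengths in the example are assumptions made for illustration, and real minimum lengths should be taken from the CDM specification.

```python
def find_too_short(cdm_min_lengths, local_lengths):
    """Return fields whose local VARCHAR length is below the CDM minimum.

    Both arguments map field names to VARCHAR lengths. A field that is
    missing locally is reported with a local length of None.
    """
    problems = {}
    for field, minimum in cdm_min_lengths.items():
        local = local_lengths.get(field)
        if local is None or local < minimum:
            problems[field] = (minimum, local)
    return problems


# Illustrative values only (assumed, not taken from the specification).
cdm_min = {"condition_source_value": 50, "gender_source_value": 50}
local = {"condition_source_value": 255, "gender_source_value": 20}

too_short = find_too_short(cdm_min, local)
```

In this example only `gender_source_value` is reported, since expanding a column beyond the minimum (255 instead of 50) is allowed.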
In most cases, the first field in each table ends in “_id” and contains a record identifier that can be used as a foreign key in another table.
Variable names across all tables follow one convention:
| Notation | Description |
|---|---|
| <entity>_SOURCE_VALUE | Verbatim information from the source data, typically used in ETL to map to CONCEPT_ID, and not to be used by any standard analytics. For example, condition_source_value = ‘787.02’ is the ICD-9 code captured as a diagnosis on the administrative claim. |
| <entity>_ID | Unique identifiers for key entities, which can serve as foreign keys to establish relationships across entities. For example, person_id uniquely identifies each individual, and visit_occurrence_id uniquely identifies a PERSON encounter at a point of care. |
| <entity>_CONCEPT_ID | Foreign key to a standard concept in the Standardized Vocabularies (i.e. the standard_concept attribute for the corresponding term is true), which serves as the primary basis for all standardized analytics. For example, condition_concept_id = 31967 references the SNOMED concept ‘Nausea’. |
| <entity>_SOURCE_CONCEPT_ID | Foreign key into the Standardized Vocabularies representing the concept and terminology used in the source data, when applicable. For example, condition_source_concept_id = 35708202 denotes the concept ‘Nausea’ in the MedDRA terminology; the analogous condition_concept_id might be 31967, since SNOMED-CT is the standard vocabulary for most clinical diagnoses and findings. |
| <entity>_TYPE_CONCEPT_ID | Delineates the origin of the source information, standardized within the Standardized Vocabularies. For example, drug_type_concept_id allows analysts to discriminate between ‘Pharmacy dispensing’ and ‘Prescription written’. |
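
The suffix notation above lends itself to a simple programmatic check. The following is a minimal sketch, assuming field names follow the notation exactly; the function name `classify_field` and the role labels are hypothetical and not part of the CDM. Suffixes are tested from most to least specific, because every `_SOURCE_CONCEPT_ID` field also ends in `_CONCEPT_ID` and in `_ID`.

```python
# Ordered from most specific to least specific suffix, so that
# "_source_concept_id" is matched before "_concept_id" and "_id".
SUFFIX_ROLES = [
    ("_source_concept_id", "source concept"),
    ("_type_concept_id", "type concept"),
    ("_source_value", "source value"),
    ("_concept_id", "standard concept"),
    ("_id", "identifier"),
]


def classify_field(name):
    """Return the role implied by a field name's suffix, or None."""
    lowered = name.lower()
    for suffix, role in SUFFIX_ROLES:
        # Require a non-empty <entity> part in front of the suffix.
        if lowered.endswith(suffix) and len(lowered) > len(suffix):
            return role
    return None


roles = {
    field: classify_field(field)
    for field in (
        "person_id",
        "condition_concept_id",
        "condition_source_concept_id",
        "condition_source_value",
        "drug_type_concept_id",
    )
}
```

A field that matches none of the suffixes, such as a date column, yields `None`, which makes it easy to separate convention-governed fields from the rest when inspecting a table.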