====Metadata====

===Proposals are now tracked as GitHub issues===

[[https://github.com/OHDSI/CommonDataModel/issues/79|link to GitHub issue]]

Proposing person: Vojtech Huser

**Discussion link:** http://forums.ohdsi.org/t/metadata-extension-to-cdm/1746/1

The existing [[documentation:cdm:cdm_source|CDM_SOURCE]] table provides metadata (http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:cdm_source).

====== Use case ======

  - Display metadata within Atlas-Achilles Web (when reviewing data characterization plots and tables).
  - Give organizations with multiple OMOP CDM datasets a mechanism to store dataset metadata (analysis of this use will provide input for phase 2 of metadata standardization).
  - Run certain data quality checks only when they are appropriate for the dataset (e.g., a general population dataset; this use case depends on proper concept-level standardization).

====== CDM changes ======

The proposal adds a single table to the CDM specification. In phase 1, we are trying to provide a mechanism for sites to capture metadata. Concept-level standardization is planned for phase 2.

===== New METADATA table =====

Table name: METADATA

This table relies on concept_ids that exist for CDM tables. In Atlas, find them using advanced search and selecting Metadata.

^ Column ^ Description ^ Data_type ^
| METADATA_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the information you wish to track (e.g. 8 for metadata about a Visit) | INT |
| METADATA_TYPE_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the type of information you wish to track (e.g. 1 for metadata about Domains such as a Visit) | INT |
| NAME | Name of the CONCEPT_ID stored in METADATA_CONCEPT_ID; if there is no applicable CONCEPT_ID, NAME can be used to represent the data stored (e.g. CDM_BUILDER VERSION) | VARCHAR(250) |
| VALUE | The metadata value you wish to capture | NVARCHAR |

**Modified proposal**

^ Column ^ Description ^ Data_type ^ Required ^
| METADATA_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the information you wish to track (e.g. 8 for metadata about a Visit) | INT | |
| METADATA_TYPE_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the type of information you wish to track (e.g. 1 for metadata about Domains such as a Visit) | INT | |
| NAME | Name of the CONCEPT_ID stored in METADATA_CONCEPT_ID; if there is no applicable CONCEPT_ID, NAME can be used to represent the data stored (e.g. CDM_BUILDER VERSION) | VARCHAR(250) | |
| VALUE_AS_STRING | The metadata value (string) | NVARCHAR | |
| VALUE_AS_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that reflects the metadata value | INT | No |
| METADATA_DATETIME | The date and time associated with the metadata | DATETIME | No |
| METADATA_DATE | The date associated with the metadata | DATE | No |

**Example records:**

^ METADATA_CONCEPT_ID ^ METADATA_TYPE_CONCEPT_ID ^ NAME ^ VALUE ^
| 51 | 1 | PERSON | Person information is pulled from insurance enrollment data where the individual has both medical and prescription benefits. The month of birth is not provided; however, for enrollees who start their enrollment in the year they are born, we extrapolate their month of birth from the month in which their enrollment starts. For the majority of patients only the year of birth is available. Persons who change gender over their enrollments or change year of birth are excluded. |
| 0 | 1 | OBSERVATION PERIOD | An observation period represents when a patient was enrolled in a health insurance plan and had prescription benefits. Periods of continuous enrollment are consolidated by combining monthly records as long as the time between the end of one enrollment period and the start of the next is 32 days or less. |
| 57 | 1 | CARE SITE | There is no clear care site information in this source, so no data will be captured in this table. |
| 8 | 1 | VISIT | For outpatient visits, all activity recorded on a single day for a person is considered to have occurred during one visit, with the visit start and end dates corresponding to that date. |
| 55 | 1 | PROVIDER | Unique list of health care providers (physicians). Truven does provide some provider information; however, some of the providers listed by Truven may also be considered care sites or organizations. Since there is no clear way to distinguish among all items identified as providers by Truven, they will all be added to this table, regardless of whether they are truly organizations or care sites. |
| 0 | 1 | DEATH | Death in Truven can be captured at discharge from an inpatient visit or, in some cases, by diagnosis code. The death data in this source should not be considered complete; for example, if a patient left a hospital and later died at home, that would not be captured. Additionally, if a death was recorded but the patient continues to have service charges more than 30 days after the death date, we assume the death data was faulty. |
| 19 | 1 | CONDITION | Condition records are primarily recorded as codified claims data (e.g. ICD9 or ICD10 codes submitted in association with a service). Additional condition information comes from patients who also have Health Risk Assessment data from Truven. |
| 13 | 1 | DRUG | Drug exposure records are primarily recorded as codified claims data (e.g. an NDC code or a procedure code that includes a drug). If the OMOP Vocabulary deems a code from a vocabulary that is not traditionally drug-centric to be a drug exposure, the record will move to this table (e.g. CPT4 90690 “Typhoid vaccine, live, oral” maps to a drug concept in the OMOP Vocabularies, so the CDM_BUILDER will move the record to the DRUG_EXPOSURE table instead of the procedure table). Additional drug exposure information comes from patients who also have Health Risk Assessment data from Truven. |
| 10 | 1 | PROCEDURE | Procedure occurrence records are recorded as codified claims data (e.g. a CPT4 code or ICD9 procedure code). If the OMOP Vocabulary deems a procedure code to belong to another domain, the record will move to that domain's table (e.g. CPT4 90690 “Typhoid vaccine, live, oral” maps to a drug concept in the OMOP Vocabularies, so the CDM_BUILDER will move the record to the DRUG_EXPOSURE table instead of the procedure table); however, a primary procedure code will always write a record to this table in order to maintain cost data. |
| 21 | 1 | MEASUREMENT | Measurement data traditionally comes from lab data supplied by laboratory service vendors; however, data vendors such as Truven do not have 100% coverage of lab results (e.g. they only receive lab data from vendors they have contracted with, such as Quest Diagnostics). If the OMOP Vocabulary deems a code from a vocabulary that is not traditionally measurement-centric to be a measurement, the record will move to this table (e.g. ICD9 V85.22 “Body Mass Index 26.0-26.9, adult”, usually thought of as a diagnosis code, maps to a measurement concept in the OMOP Vocabularies, so the CDM_BUILDER will move the record to the MEASUREMENT table). Additional measurement information comes from patients who also have Health Risk Assessment data from Truven. |
| 27 | 1 | OBSERVATION | Codified data or Health Risk Assessment data that is not a diagnosis, drug exposure, procedure, or measurement will become an observation. |
| 0 | 0 | CDM_BUILDER VERSION | 1.8.0.9 |
| 0 | 0 | DATASET_TYPE | Clinical Trial Data |

The proposal encourages all CDM adopters to fully populate and use the existing CDM_SOURCE table.

===== END OF PROPOSAL =====

----

The text below consists only of historical notes related to the proposal above.
== Details 1 ==

Proposing persons: Patrick Ryan, Martijn Schuemie, Ajit Londhe, & Erica Voss (may need to be updated)

Additionally, we would like the CDM_SOURCE table to store metadata about each of the domains. Our idea is to implement this by adding a column to the CDM_SOURCE table for each domain in the CDM (i.e. CDM_SOURCE.VISIT_OCCURRENCE, CDM_SOURCE.PERSON, etc.). This would allow us to display information about a specific domain on an ACHILLES report. For example, the VISIT_OCCURRENCE logic in PREMIER is fairly complex, and displaying a description of that logic at the point where someone is reviewing the data in ACHILLES would be beneficial.

Here is some example text for JMDC:

== Database as a whole ==

(already has a column)

//The JMDC database consists of data from 60 Society-Managed Health Insurances covering workers aged 18 to 65 and their dependents (children younger than 18 years old and elderly people older than 65 years old). Older people (particularly those aged 66 or older) are less representative compared with the whole population of the nation. When estimated among people younger than 66 years old, the proportion of children younger than 18 years old in JMDC is approximately the same as the proportion in the whole nation. JMDC data includes data on the membership status of insured people and claims data provided by insurers under contract. Claims data are derived from monthly claims issued by clinics, hospitals and community pharmacies.//

== Person ==

//JMDC covers workers aged 18 to 65 and their dependents (children younger than 18 years old and elderly people older than 65 years old). Older people (particularly those aged 66 or older) are less representative compared with the whole population of the nation. When estimated among people younger than 66 years old, the proportion of children younger than 18 years old in JMDC is approximately the same as the proportion in the whole nation. Only the year of birth is available; the day and month are not.//

== Observation_period ==

//The observation period is defined as the time of enrollment in the health insurance. If the member is a dependent, the enrollment depends on the enrollment of the main beneficiary.//

== Care_site ==

//Care sites in JMDC are institutions where care is provided, typically a department in a hospital.//

----

== Details 2 ==

Debate about the CDM_SOURCE table

===== CDM_SOURCE table =====

Improve the guidance for this table (superseded by the inclusion of the information below in the METADATA table).

  * Capture DATASET_TYPE_CONCEPT_ID. Definition: Reference to a concept_id in the OHDSI/OMOP terminology (class = "Dataset Type") that indicates what type of data is in the dataset. Set to NULL if none of the concepts correctly characterizes the data. For large samples of a population defined by insurance (e.g., US Medicaid), use the general population concepts.
  * Values are: General population EHR data, General population claims data, General population EHR + claims data, Clinical trial data

Advanced data quality checks (inside Achilles Heel) would take advantage of the information in this new column.

== DATASET_TYPE_CONCEPT_ID ==

  * If you don't want to (or can't) declare the type of data, use concept 0 (*)
  * Clinical trial data (dataset type) (*)
  * Multiple sources (dataset type)
  * Registry data (dataset type)
  * Predominantly Electronic Health Record data (dataset type)
  * Predominantly Administrative/Claims data (dataset type)
  * Predominantly Health Information Exchange data (dataset type)
  * Data limited to a single medical specialty/clinical domain, not covering the general population (dataset type)

(*) "Predominantly" means that at least 51% of significant records come from a given source.

Inpatient vs. outpatient data can be determined from visit types and does not need to be classified above.

----

^ Column ^ Description ^ Data type ^
| DATASET_TYPE_CONCEPT_ID | Type of dataset. Reference to OMOP Concept that provides dataset type classification. | integer |

== Details 3 ==

Proposing persons: Ajit Londhe & Erica Voss

We would like to propose the following table to hold metadata:

Table name: METADATA

^ Column ^ Description ^ Data_type ^
| METADATA_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the information you wish to track (e.g. 8 for metadata about a Visit) | INT |
| METADATA_TYPE_CONCEPT_ID | OMOP Vocabulary CONCEPT_ID that identifies the type of information you wish to track (e.g. 1 for metadata about Domains such as a Visit) | INT |
| NAME | Name of the CONCEPT_ID stored in METADATA_CONCEPT_ID; if there is no applicable CONCEPT_ID, NAME can be used to represent the data stored (e.g. CDM_BUILDER VERSION) | VARCHAR(250) |
| VALUE | The metadata value you wish to capture | NVARCHAR |

Example records:

^ METADATA_CONCEPT_ID ^ METADATA_TYPE_CONCEPT_ID ^ NAME ^ VALUE ^
| 8 | 1 | VISIT | For outpatient visits, all activity recorded on a single day for a person is considered to have occurred during one visit, with the visit start and end dates corresponding to that date. |
| 0 | 0 | CDM_BUILDER VERSION | 1.8.0.9 |

NOTES: the original table was

^ Column ^ Description ^ Data type ^
| DATASET_TYPE_CONCEPT_ID | Type of dataset. Reference to OMOP Concept that provides dataset type classification. | integer |
| PERSON | | text |
| OBSERVATION_PERIOD | | text |
| VISIT_OCCURRENCE | Description of the logic used to populate the table (column name indicates the table). | text |
| PROCEDURE_OCCURRENCE | Description of the logic used to populate the table (column name indicates the table). | text |
| CONDITION_OCCURRENCE | Description of the logic used to populate the table (column name indicates the table). | text |
| DRUG_EXPOSURE | Description of the logic used to populate the table (column name indicates the table). | text |
| MEASUREMENT | Description of the logic used to populate the table (column name indicates the table). | text |
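The difference between the row-based METADATA table and the original one-text-column-per-table layout in the NOTES can be illustrated with a small, hypothetical sketch using Python's built-in sqlite3 module. It assumes the column names of the "Modified proposal" table, uses SQLite TEXT in place of VARCHAR(250)/NVARCHAR/DATETIME, abbreviates the example descriptions, and the helper names build_metadata_db, domain_descriptions and value_by_name are illustrative only, not part of any OHDSI tool.

```python
import sqlite3
from typing import Dict, Optional

# Hypothetical SQLite rendering of the "Modified proposal" METADATA table.
# SQLite does not enforce VARCHAR lengths or NVARCHAR/DATETIME types, so TEXT
# stands in for them; no NOT NULL constraints are assumed.
DDL = """
CREATE TABLE metadata (
    metadata_concept_id      INTEGER,
    metadata_type_concept_id INTEGER,
    name                     TEXT,
    value_as_string          TEXT,
    value_as_concept_id      INTEGER,
    metadata_date            TEXT,
    metadata_datetime        TEXT
)
"""

# A few example records from this page (long descriptions abbreviated).
EXAMPLE_ROWS = [
    (8, 1, "VISIT", "For outpatient visits, all activity recorded on a single day ..."),
    (0, 1, "OBSERVATION PERIOD", "An observation period represents when a patient was enrolled ..."),
    (0, 0, "CDM_BUILDER VERSION", "1.8.0.9"),
    (0, 0, "DATASET_TYPE", "Clinical Trial Data"),
]


def build_metadata_db() -> sqlite3.Connection:
    """Create an in-memory METADATA table and load the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany(
        "INSERT INTO metadata "
        "(metadata_concept_id, metadata_type_concept_id, name, value_as_string) "
        "VALUES (?, ?, ?, ?)",
        EXAMPLE_ROWS,
    )
    conn.commit()
    return conn


def domain_descriptions(conn: sqlite3.Connection) -> Dict[str, str]:
    """Return {NAME: description} for domain-level rows (type concept 1),
    e.g. for display next to data characterization reports."""
    rows = conn.execute(
        "SELECT name, value_as_string FROM metadata "
        "WHERE metadata_type_concept_id = 1 ORDER BY name"
    ).fetchall()
    return dict(rows)


def value_by_name(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Look up a value by NAME, useful when METADATA_CONCEPT_ID is 0."""
    row = conn.execute(
        "SELECT value_as_string FROM metadata WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row is not None else None


if __name__ == "__main__":
    conn = build_metadata_db()
    print(value_by_name(conn, "CDM_BUILDER VERSION"))  # 1.8.0.9
    print(sorted(domain_descriptions(conn)))  # ['OBSERVATION PERIOD', 'VISIT']
```

In this layout, describing one more domain is an INSERT of a new row rather than an ALTER TABLE adding a new column, which is the main practical difference from the original wide design.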