The availability of very large-scale healthcare databases in electronic form has opened the possibility of generating systematic, large-scale evidence and insights about the application of healthcare to patients. This discipline is called Observational Outcome Research. It uses longitudinal patient-level clinical data to describe and understand the pathogenesis of disease and the effect of other clinical events, as well as treatment interventions, on the progression of the disease. This research constitutes secondary use of the data, which are usually collected for purposes other than research: administrative data such as insurance reimbursement claims, and Electronic Health or Medical Records (EHR, EMR).
Because the data are collected for a primary use, their format and representation follow that primary use, which also introduces artifacts and bias into the data. In addition, all source datasets differ from each other in format and content representation. Since healthcare systems differ between countries, the problem becomes even harder for research carried out internationally. All this makes robust, reproducible and automated research a significant challenge.
The solution is to standardize both the format of the data and the representation of its content. This allows methods and tools to operate on data of disparate origin, freeing the analyst from having to dissect the idiosyncrasies of a particular dataset and manipulate the data to make it fit for research. It also allows analytical methods to be developed on one dataset and then applied to any other dataset in Common Data Model (CDM) format.
The OMOP CDM and Standardized Vocabularies provide such a framework for systematic research. The framework consists of the following components and mechanisms:
It is important to note that these components are constructed strictly for the purpose of supporting observational research. In that regard, the Standardized Vocabularies differ from large collections with equivalence mapping of concepts, such as the UMLS. UMLS resources have been used heavily as a basis for constructing many of the Standardized Vocabulary components, but significant additional efforts have been made to serve the purpose of this resource:
The availability of very large-scale healthcare databases in electronic form, such as administrative claims and electronic health record data, has opened the possibility of generating systematic, large-scale evidence and insights about the application of healthcare to patients, among them the effectiveness and risks of treatment interventions. However, because of a lack of standardization, clinical terminologies may differ across databases. One approach to fully harvest the value of multiple data sources and ensure that the output is comparable is to standardize source codes into a common terminology.
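The idea of standardizing source codes into a common terminology can be illustrated with a minimal Python sketch. It is not the actual OMOP mapping mechanism: the lookup table, the vocabulary labels, the function name `to_standard_concept`, and the concept identifiers (9000001, 9000002) are all hypothetical and chosen only for illustration. Only the source codes themselves are real ICD-9-CM and ICD-10-CM codes.

```python
from typing import NamedTuple, Optional


class SourceCode(NamedTuple):
    """A code as it appears in a source database."""
    vocabulary: str  # hypothetical label for the source coding system
    code: str        # the code as recorded in the source data


# Hypothetical lookup table. The concept identifiers are invented
# placeholders, NOT real identifiers from any published vocabulary.
SOURCE_TO_STANDARD = {
    # Type 2 diabetes mellitus without complications
    SourceCode("ICD9CM", "250.00"): 9000001,
    SourceCode("ICD10CM", "E11.9"): 9000001,
    # Essential hypertension
    SourceCode("ICD9CM", "401.9"): 9000002,
    SourceCode("ICD10CM", "I10"): 9000002,
}


def to_standard_concept(vocabulary: str, code: str) -> Optional[int]:
    """Return the common concept identifier for a source code.

    Returns None when the source code has no entry in the table.
    """
    return SOURCE_TO_STANDARD.get(SourceCode(vocabulary, code))


# Records from two databases that use different coding systems
records = [("ICD9CM", "250.00"), ("ICD10CM", "E11.9"), ("ICD10CM", "Z99.9")]
standardized = [to_standard_concept(v, c) for v, c in records]
```

In this sketch the first two records resolve to the same identifier even though they come from different coding systems, so an analysis written against the common identifier can run unchanged on both databases. The third record has no entry in the table and is returned as `None` rather than being guessed.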
In the US, diagnosis codes in medical claims are generally based on the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) coding system. The Health Insurance Portability and Accountability Act (HIPAA) sets rules for the adoption of transaction standards for electronic healthcare data interchange by covered entities, among them the use of ICD-9-CM.[4] From October 2013, ICD-10-CM, the successor to ICD-9-CM, must be used on all HIPAA transactions.[5] For inpatient hospital procedure coding, the International Classification of Diseases, Tenth Revision, Procedure Coding System (ICD-10-PCS) will be used.[6]