Proof of concept study for externally validating existing stroke models in the OHDSI data network

Study: Link to GitHub

Objective: This study aims to demonstrate the power of using the OHDSI network to perform efficient external validation. Five well-known existing stroke models (ATRIA, CHADS2, CHA2DS2-VASc, Framingham and Q-Stroke) were replicated using the OHDSI Patient Level Prediction framework, and researchers across the network will be asked to apply the models to their datasets over a one-month period. The prediction problem is to predict stroke in a target population of older female patients newly diagnosed with atrial fibrillation and with no history of stroke. This target population was chosen because it is the intersection of the target populations used to develop the different models. The study will investigate how well the five models perform when applied to datasets held by OHDSI network collaborators.
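To make concrete what replicating an existing model involves, the following is a minimal Python sketch of the published CHADS2 point scheme, the simplest of the five models. It is not the study's implementation, which uses the Patient Level Prediction framework. The `Patient` class and its field names are hypothetical and chosen only for illustration; in practice each flag would be derived from concept sets in the CDM.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    age: int
    congestive_heart_failure: bool
    hypertension: bool
    diabetes: bool
    prior_stroke_or_tia: bool


def chads2_score(p: Patient) -> int:
    """Return the CHADS2 score (0 to 6) using the published point scheme."""
    score = 0
    score += 1 if p.congestive_heart_failure else 0  # C: congestive heart failure
    score += 1 if p.hypertension else 0              # H: hypertension
    score += 1 if p.age >= 75 else 0                 # A: age 75 or older
    score += 1 if p.diabetes else 0                  # D: diabetes mellitus
    score += 2 if p.prior_stroke_or_tia else 0       # S2: prior stroke or TIA
    return score


# Example: an 80-year-old with heart failure and hypertension.
example = Patient(age=80, congestive_heart_failure=True, hypertension=True,
                  diabetes=False, prior_stroke_or_tia=False)
print(chads2_score(example))
```

Because the target population excludes patients with a history of stroke, the two-point component would rarely apply in this study, so scores would mostly fall in the lower part of the range.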

Rationale: Observational medical datasets, such as electronic healthcare records and insurance claims databases, present the opportunity to learn about disease progression and develop patient-level prediction models. Many prediction models have been developed [1-2] but the majority fail to make a clinical impact [3]. One of the main obstacles preventing model uptake is a lack of knowledge about how transportable a model is. For example, a model developed using US claims data may perform well in the US population but may not transport to Europe or Asia. In general, the population used to develop the model may not be representative of the general population, so the true performance on a wider population may differ. The type of data the model was trained on (e.g., the variables available) may also limit the model’s transportability to other datasets that lack an important variable. In addition, a model may obtain optimistic performance on the data used to develop it when poor model development practices are followed (e.g., the performance may be optimistic if some data are not held out for internal validation). To address these concerns it is important to validate models on new datasets and gain insight into how well the model generalizes (how well it performs on similar data, e.g., validating a model trained on one US adult EMR by applying it to a different US adult EMR) and transports (how well it performs on different data, e.g., validating a model trained on one US adult EMR by applying it to a European adult EMR or US claims data) across populations. It has been shown that validating a model is often a slow process [4], and when independent researchers implement an existing model, they may make mistakes when processing the data or implement the model incorrectly. A collaborative approach to external model validation has been proposed to overcome some of these issues [5].
The Observational Health Data Sciences and Informatics (OHDSI) community is an open group of researchers aiming to develop tools and best practices for analyzing observational healthcare data. The OHDSI network consists of a large number of researchers with access to diverse datasets from across the world. The community has developed a standardized format, known as the Observational Medical Outcomes Partnership (OMOP) common data model (CDM), that any observational healthcare dataset can be mapped to. Because the data are standardized, analysis code can be shared directly between researchers, streamlining the analysis process. Patient-level prediction is one area of OHDSI's focus, and standardized processes and tools have been developed for it [6]. This presents the opportunity to validate models efficiently across the OHDSI network.

Project Lead: Jenna Reps

Coordinating Institution(s): Janssen

Additional Participants (currently seeking additional collaborators):

Full Protocol: External Validation of Stroke Models

Initial Proposal Date: 2018-04-01

Launch Date: 2018-05-14

Receive Results for Analysis Date: TBD

Study Closure Date: TBD

Results Submission: Via the OHDSI Sharing module embedded in the study.

Requirements

CDM: V5 only

Tables Accessed: person, condition_era, drug_exposure, observation

Database Dialects: SQL Server, Postgres, Oracle

Software: R (>= 3.2.2, RTools), Java, Python (3)

Hardware: Recommended 8 cores, 64GB memory, 250GB free space

Datasets Run

Datasets Running