2017 OHDSI Collaborator Face-to-Face


Date(s) - 03/17/2017 - 03/18/2017
8:00 am - 6:00 pm

Georgia Tech: GTRI Conference Center


The OHDSI face-to-face meeting is an annual event which aims to give active OHDSI collaborators the opportunity to openly discuss challenges facing our research community and work towards shared solutions. The 2017 face-to-face will be a two day event consisting of community discussions, working group break-out sessions, and hack-a-thon activities.

This is a working meeting. All in attendance are expected to participate in discussions and contribute their knowledge and skills to each session they attend.


Day 1 – March 17th

Time Description
8:30 – 9:00am Registration
9:00 – 9:30am Welcome Session
9:30 – 10:30am Working Group Breakout Session – Part I

  • Common data model and vocabulary – Proposal review
  • Population-level estimation
  • Hadoop
  • Vocabulary Visualization
  • Orientation for newcomers
10:30 – 11:00am Break
11:00 – 12:00pm Working Group Breakout Session – Part II

  • Common data model and vocabulary – Proposal review
  • Architecture
  • Patient-level prediction
  • Natural language processing
  • Orientation for newcomers
12:00 – 1:00pm Reconvene the community

  • Summary of key points from each work group
  • Hack-a-thon presentation: Framing of target problems
1:00 – 1:30pm Lunch
1:30 – End of day Hack-a-thon – Three possible tracks

  • Phenotyping and cohort building
  • Large scale statistical computing
  • Design session: UI experience and information dissemination

Day 2 – March 18th

Time Description
8:00 – 12:30pm Continue hack-a-thon activities across three tracks:

  • Phenotyping and cohort building
  • Large scale statistical computing
  • Design session: UI experience and information dissemination
12:30 – 1:30pm Lunch
1:30 – 3:00pm Reconvene the community

  • Review outcomes from each hack-a-thon track
3:00 – 3:30pm Break
3:30 – 5:30pm Open community discussion

  • Next steps for following through on group projects
  • Other priorities for collaborative projects
  • Other ways to engage the community and make contributions
5:30pm Wrap-up


Hack-A-Thon Tracks

Phenotyping and Cohort Building

One of the first steps in using EHR data for research is to reliably identify a cohort of patients that have a condition of interest, or phenotype. Typically, methods for identifying patients with a given phenotype have relied on rule-based definitions. Given the heterogeneity of data models in commercial EHR systems, missing data values, and differences in standardization, such rule-based definitions are difficult to port across EHR systems and institutions. Developing such definitions against common data models is one way to create reusable phenotype definitions.

Recently, statistical learning approaches have also been employed for electronic phenotyping. Here the rate-limiting step is the manual creation of training sets. In the OHDSI community, we have demonstrated that by using semi-automatically assigned, possibly noisy, labels in training data, it is possible to build phenotype models that are comparable to rule-based phenotype definitions. The key intuition is that the large volume of training data that can be collected using an automated labeling process can compensate for the inaccuracy of the labels.
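
The noisy-label intuition can be illustrated with a minimal, self-contained sketch (in Python for brevity, although OHDSI tooling is largely R-based). Everything here is hypothetical: a single feature standing in for a patient record, a one-parameter threshold "model", and a 20% label-flip rate for the automated labeler.

```python
import random

random.seed(42)

# Hypothetical setup: each "patient" is a single feature value (e.g. a count
# of relevant condition codes); the true phenotype is feature > 5.0.
def true_label(x):
    return x > 5.0

def noisy_label(x, flip_rate=0.2):
    # Automated labeler: correct most of the time, wrong with prob. flip_rate.
    y = true_label(x)
    return (not y) if random.random() < flip_rate else y

# Large, cheaply labeled training set (no manual chart review needed).
train_x = [random.uniform(0, 10) for _ in range(5000)]
train_y = [noisy_label(x) for x in train_x]

# "Train" the one-parameter model: pick the threshold that best fits the
# noisy labels.
def acc(threshold, xs, ys):
    return sum((x > threshold) == y for x, y in zip(xs, ys)) / len(xs)

candidates = [i / 10 for i in range(101)]
best = max(candidates, key=lambda t: acc(t, train_x, train_y))

# Evaluate against clean (true) labels: the volume of noisy training data
# compensates for the label noise, recovering a threshold near 5.0.
test_x = [random.uniform(0, 10) for _ in range(2000)]
clean_acc = acc(best, test_x, [true_label(x) for x in test_x])
print(round(best, 1), round(clean_acc, 3))
```

Because the label noise is roughly symmetric around the true decision boundary, fitting a large noisy sample lands close to the boundary a small hand-labeled sample would give.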

At the hackathon, we intend to build a library of 100 phenotype definitions for shared use in the OHDSI community. We will create both rule-based and statistical-model-based phenotypes and evaluate their validity by examining the patient profiles found by applying those phenotype definitions. We will also compare agreement with existing definitions reported in the literature.

Participants will be grouped together to perform the following tasks:

  • Develop and implement rule-based heuristic phenotypes
    • Skills required: ability to perform literature search, knowledge of standardized vocabularies, ability to define inclusion criteria in ATLAS
  • Evaluate the performance of phenotypes
    • Skills required: clinical knowledge to adjudicate electronic patient profiles, ability to define ‘noisy labeled’ reference set, SQL programming to evaluate phenotype instantiation against reference set (requires access to patient-level data in CDM)
  • Learn and apply model-based phenotype approach
    • Skills required: R programming, access to CDM
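
For the evaluation task, the core computation reduces to counting overlaps between the phenotype's cohort and the reference set. A sketch with invented patient IDs (a real evaluation would run SQL against patient-level data in the CDM):

```python
# Hypothetical evaluation of a phenotype cohort against a 'noisy labeled'
# reference set; cohorts and reference sets are sets of patient IDs.
def evaluate(cohort, reference_pos, reference_neg):
    """Return (sensitivity, specificity, PPV) of a phenotype cohort."""
    tp = len(cohort & reference_pos)   # in cohort, labeled positive
    fn = len(reference_pos - cohort)   # missed positives
    fp = len(cohort & reference_neg)   # in cohort, labeled negative
    tn = len(reference_neg - cohort)   # correctly excluded negatives
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sens, spec, ppv

# Toy data: patients 1-10 are reference positives, 11-20 reference negatives;
# the phenotype found patients 1-8 plus two false positives.
cohort = set(range(1, 9)) | {11, 12}
sens, spec, ppv = evaluate(cohort, set(range(1, 11)), set(range(11, 21)))
print(sens, spec, ppv)  # → 0.8 0.8 0.8
```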

Large-Scale Statistical Computing (Patient-Level Prediction)

The aim of this track is to enable large-scale patient-level prediction, i.e. to scale up the proof-of-concept study discussed at the OHDSI Symposium (predicting 22 different outcomes in patients with pharmaceutically treated depression). We would like to bound this problem by cherry-picking cohorts and outcomes of interest that show promise. In other words, the aim is not to run all experiments on all pairs if doing so will not result in useful models; we need to find ways to reduce (bound) our search space in a smart way.

There are multiple ways to achieve this goal, and we would like to work on concrete implementations of these ideas prior to and during the Hack-a-thon:

  • Become familiar with the codebase 
    • Gain familiarity with the estimation and prediction codebase while providing much-needed documentation and simple examples. Learn and practice open-source development standards using GitHub. Write unit tests and see continuous integration with Travis-CI in action.
  • Reducing the search space
    • We could try to reduce dataset size (search space) without losing power for our cherry-picking goal. For example, we could reduce the lookback period and eventually expand that automatically if needed, start with only conditions and add other variables later, or we could reduce the whole dataset and grow until a plateau is reached in performance (learning curves). The idea is to work on modifications of the code to enable these kinds of experiments.
  • Model training optimization
    • An option is to look at sequential learning, i.e. start with one algorithm (Lasso) and only run the others if the results look promising. We could automate this. Moreover, the size of the hyper-parameter grid has a high impact on the computational burden. Part of the team could work on adaptive hyper-parameter training, where some heuristic guides the search based on past performance results.
  • Model evaluation optimization
    • We can clearly gain speed by optimizing the calculation of the evaluation measures. A clear task for the group is to focus on individual measures and optimize the R code, port it to C++, and so on.
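
To illustrate the last point with one concrete measure: the pairwise definition of the AUC costs O(n²) comparisons, while the equivalent rank-sum (Mann-Whitney) formulation costs O(n log n). A Python sketch of that equivalence, with simulated scores and labels (the actual optimization work would target the R/C++ codebase):

```python
import random

def auc_pairwise(scores, labels):
    # Naive O(n^2): fraction of (positive, negative) pairs ranked correctly,
    # counting ties as half-correct.
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    hits = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return hits / (len(pos) * len(neg))

def auc_ranks(scores, labels):
    # O(n log n) via the rank-sum (Mann-Whitney U) formulation.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):  # assign average ranks over tied scores
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Simulated predictions: positives get slightly higher scores on average.
random.seed(1)
labels = [random.random() < 0.3 for _ in range(500)]
scores = [random.random() + (0.3 if y else 0.0) for y in labels]
print(abs(auc_pairwise(scores, labels) - auc_ranks(scores, labels)) < 1e-9)
```

The same trick applies to other rank-based measures; the point is that a faster formulation, not just a faster language, is often the bigger win.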

It will be crucial to have a preparatory teleconference with the team members to divide tasks and get everyone up to speed with the current code base.

Design Session: UI Experience and Information Dissemination
Do you live to crunch numbers? If so, you’ll love the large-scale statistical computing track!

Do you stay up late at night wondering how to accurately define clinical concepts? Go to the phenotyping track!

But if you find yourself inspired by this important question: “How do we actually translate the great work of OHDSI into better health decisions and better care?”, then this is the track for you! It is also the track for you if you love UI design, visualization, APIs, peanut butter, and Alexa.

The F2F Hackathon Dissemination track is where the rubber meets the road in terms of bringing the collective work of the OHDSI community to the broader community of stakeholders. We will be building three layers of dissemination tools during the hackathon, each designed to communicate the risk (whether personalized or at the population level) of clinical outcomes associated with drug exposures.

Here’s what we’ll be tackling together to Disseminate Evidence on Three Levels:

  • Website for Providers and Policymakers
    • Goal: To disseminate OHDSI evidence to decision-makers who have domain understanding but not necessarily deep scientific / statistical expertise.
    • Hackathon tasks: Site flow → mockups (paper, wireframes) → website (Bootstrap, JS, Node, D3, etc.) deployed to AWS
  • OHDSI Resource for Researchers, Regulatory Scientists, and other Experts
    • Goal: To disseminate OHDSI in-depth analysis methods and results to experts in the field for purposes of transparency, reproducibility, and peer review.
    • Hackathon tasks: Determine the optimal location (e.g. ohdsi.org), library design, R Shiny build-outs, and a convention for disseminating alongside publications
  • OHDSI Integration Tools
    • Goal: To support integration of OHDSI evidence in healthcare environments
    • Hackathon tasks: Build and deploy OHDSI FHIR-based APIs for delivery of PLP-based predictive models at the point of care.


This meeting will take place at the GTRI conference center at the Georgia Institute of Technology in Atlanta, GA. Details about the conference center and its location on campus can be found here:

Conference Center

Directions to the campus and nearby accommodation can be found here:

Travel & Accommodation