User Tools

Site Tools


welcome:overview:cdm:cdm_conversion_best_practices

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
welcome:overview:cdm:cdm_conversion_best_practices [2017/06/28 21:14]
bchristian created
welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 18:47] (current)
bchristian Updates from afternoon discussion
Line 1: Line 1:
 What do I have to have to do an OMOP conversion? What do I have to have to do an OMOP conversion?
 +
 +Separate the process into modules.
  
 Pre-analysis Pre-analysis
Line 7: Line 9:
     - OMOP vocabulary (reference codes)     - OMOP vocabulary (reference codes)
     - are any fields masked or modified for privacy and if so, what is the algorithm?     - are any fields masked or modified for privacy and if so, what is the algorithm?
 +  - Use min and max to measure range for observations and for each field.
 +    - Define how to handle exceptions
   - Identify subject matter expert for source.   - Identify subject matter expert for source.
   - Identify OMOP subject matter expert.   - Identify OMOP subject matter expert.
Line 15: Line 19:
     - Is the conversion feasible?     - Is the conversion feasible?
   - Draft ETL Specs.   - Draft ETL Specs.
 +  - Privacy requirements
 +  - Definition of visit
 +  - Specific codes that are of interest. ​ Ensure these are mapped.
 +  - Business rules for handling conflicting information (e.g. visits after patient death).
 +  - Specific metrics that are of interest.
  
  
Line 20: Line 29:
  
 Post-analysis Post-analysis
 +  - Revised Data dictionary
   - Initial ETL Spec including:   - Initial ETL Spec including:
-    - Business rules for mapping in a detailed specification,​ preferably in a computer readable format, like White Rabbit. +    - Business rules for mapping in a detailed specification,​ preferably in a computer readable format, like [[https://​www.ohdsi.org/​analytic-tools/​whiterabbit-for-etl-design/?​ModPagespeed=noscript|White Rabbit]]
-    - View of mapping, preferably in a computer readable format, like rabbit in hat.+    - View of mapping, preferably in a computer readable format, like [[https://​www.ohdsi.org/​analytic-tools/​whiterabbit-for-etl-design/?​ModPagespeed=noscript|Rabbit-In-a-Hat]].
   - Identify any additional mapping needed:   - Identify any additional mapping needed:
     - custom or local mapping of organizational codes     - custom or local mapping of organizational codes
     - country codes not already in OHDSI vocabulary like OHIP codes     - country codes not already in OHDSI vocabulary like OHIP codes
 +  - What to do with old codes that have been deactivated or invalidated but have no replacement codes
 +    - add new values to be published in next version of vocabulary
   - Preferred development methodology   - Preferred development methodology
  
Line 37: Line 49:
     - Dimension table first     - Dimension table first
     - GRA table always last because derived     - GRA table always last because derived
 +  - Define range of data to convert
 +    - Observation period
 +      - first/last?
 +    - EMR vs Claims
 +    - Use min and max to measure range for observations and for each field.
 +      - exclude reversals for claims
 +      - exclude invalid dates
 +      - identify broken processes that generate invalid dates to help remove invalid dates
 +    - Use start and end date of vocabulary items
 +    - Document what to do with records that are missing required fields
 +      - maybe you have a medical coder who can code from a description field
 +    - Document what to do with records that have fields with invalid values
 +  - Software lifecycle
 +    - How do you develop, test, and accept for production?
 +      - Jenkins for automated build
 +      - SVN for source code control
 +    - How do you manage effort and cost to convert millions of patient records in TB of data?
 +      - Use a sample subset of the total data based on number of patients, amount of data, or processing time
 +      - develop and test using a sample subset of entire data (150 thousand patients)
 +      - business acceptance test using a large sample subset of entire data (1 million patients)
 +      - production run using entire data (millions of patients)
 +    - Define destination location(s)
 +    - Always get the latest vocabularies before each refresh (development,​ test, or production run)
 +    - Where do you get the most recent list of codes?
 +    - Frequency or schedule of reviews
 +  - How do you become aware of updates to CDM?
 +  - How do you become aware of updates to vocabularies?​
 +  - Partitioning for parallelism to optimize performance
 +  - Guidelines for incremental update
 +  - Reusable code/Tables
 +  - Intermediate model?
 +
 +Quality Assurance (QA)
 +  - How do we ensure ETL is good?
 +    - metrics for success
 +    - for each source and destination table
 +      - top mapped codes with count of records
 +      - top unmapped codes with count of records
 +      - variance between previous and current run
 +      - count of records
 +      - % of records with mapped codes
 +      - % unique codes that are mapped
 +      - for select fields (demographics),​ show histogram of values
 +    - compare actual to expected results
 +    - ensure referential integrity on platforms that do not enforce it
 +    - it would be awesome to compare histogram of values for source with equivalent destination
 +    - it would be awesome to show improvements between runs due to better mapping and coding
 +    - it would be nice to show average condition per visit
 +    - some deidentification processes introduce variance in dates or id values
 +  - How do we get business units to participate?​
 +  - How do we get approval from business units?
 +  - Validate destination data with use cases and compare against source data with use cases. Investigate or accept variance.
 +  - Standard model checks that are independent of data or volume
 +  - automatic vs manual checks
 +  - Frequency or schedule of reviews
 +  - Tools
 +    - Achilles
 +    - Autosys
 +    - Oozie
  
 +Operation
 +  - Guidance for archive
 +  - Tools
 +    - monitoring
 +    - kibana
welcome/overview/cdm/cdm_conversion_best_practices.1498684463.txt.gz · Last modified: 2017/06/28 21:14 by bchristian