welcome:overview:cdm:cdm_conversion_best_practices

Differences

This shows you the differences between two versions of the page.

welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 15:12]
bchristian Results from morning of second day of discussions
welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 18:47] (current)
bchristian Updates from afternoon discussion
Line 1: Line 1:
 What do I have to have to do an OMOP conversion?
+
+Separate the process into modules.
 
 Pre-analysis
Line 19: Line 21:
   - Privacy requirements
   - Definition of visit
+  - Specific codes that are of interest. Ensure these are mapped.
 +  - Business rules for handling conflicting information (e.g. visits after patient death).
 +  - Specific metrics that are of interest.
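A business rule for conflicting information, such as visits dated after patient death, can be written down as a small testable function. The sketch below is only an illustration: the `Visit` record, the `split_visits_by_death` helper, and the choice to flag rather than drop such visits are all assumptions, not something the page prescribes. The actual handling (drop, keep, allow a grace window) is a decision for the site.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Visit:
    # Hypothetical minimal visit record; real source tables will have more fields.
    person_id: int
    visit_start: date


def split_visits_by_death(
    visits: Iterable[Visit],
    death_dates: Dict[int, Optional[date]],
) -> Tuple[List[Visit], List[Visit]]:
    """Return (accepted, flagged) visits.

    A visit is flagged when the patient has a recorded death date and the
    visit starts strictly after it. Patients without a death date are
    never flagged.
    """
    accepted: List[Visit] = []
    flagged: List[Visit] = []
    for visit in visits:
        death = death_dates.get(visit.person_id)
        if death is not None and visit.visit_start > death:
            flagged.append(visit)
        else:
            accepted.append(visit)
    return accepted, flagged


visits = [
    Visit(1, date(2016, 5, 1)),
    Visit(1, date(2017, 2, 1)),  # after the recorded death of patient 1
    Visit(2, date(2017, 2, 1)),  # patient 2 has no death record
]
accepted, flagged = split_visits_by_death(visits, {1: date(2016, 12, 31)})
```

Keeping the flagged records in a separate list lets the business unit review them instead of losing them silently.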
  
  
Line 26: Line 31:
   - Revised Data dictionary
   - Initial ETL Spec including:
-    - Business rules for mapping in a detailed specification, preferably in a computer readable format, like White Rabbit.
+    - Business rules for mapping in a detailed specification, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|White Rabbit]]
-    - View of mapping, preferably in a computer readable format, like rabbit in hat.
+    - View of mapping, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|Rabbit-In-a-Hat]].
   - Identify any additional mapping needed:
     - custom or local mapping of organizational codes
Line 54: Line 59:
     - Use start and end date of vocabulary items
     - Document what to do with records that are missing required fields
+      - maybe you have a medical coder who can code from a description field
     - Document what to do with records that have fields with invalid values
   - Software lifecycle
     - How do you develop, test, and accept for production?
-    - How do you manage effort and cost to convert millions of patient records in TB of data? 
-      - develop and test using a sample subset of entire data (150 thousand patients) 
-      - business acceptance test using a large sample subset of entire data () 
-      - production run using entire data 
       - Jenkins for automated build
       - SVN for source code control
 +    - How do you manage effort and cost to convert millions of patient records in TB of data?
 +      - Use a sample subset of the total data based on number of patients, amount of data, or processing time
 +      - develop and test using a sample subset of entire data (150 thousand patients)
 +      - business acceptance test using a large sample subset of entire data (1 million patients)
 +      - production run using entire data (millions of patients)
     - Define destination location(s)
     - Always get the latest vocabularies before each refresh (development, test, or production run)
     - Where do you get the most recent list of codes?
+    - Frequency or schedule of reviews
   - How do you become aware of updates to CDM?
   - How do you become aware of updates to vocabularies?
   - Partitioning for parallelism to optimize performance
 +  - Guidelines for incremental update
 +  - Reusable code/Tables
 +  - Intermediate model?
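One way to draw the patient-based sample subsets mentioned above (development, business acceptance, production) is to hash each patient identifier into a stable bucket. This is a sketch under assumptions: the function name `in_sample`, the salt value, and the use of SHA-256 are illustrative choices, not part of the recorded discussion.

```python
import hashlib


def in_sample(patient_id: str, fraction: float, salt: str = "dev-sample") -> bool:
    """Return True if the patient falls in a stable pseudo-random sample.

    The same patient_id, fraction, and salt always give the same answer,
    so repeated runs select the same patients without storing a list.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


# Hypothetical patient identifiers, for illustration only.
patient_ids = [str(i) for i in range(1000)]
dev_sample = [p for p in patient_ids if in_sample(p, 0.10)]
uat_sample = [p for p in patient_ids if in_sample(p, 0.50)]
```

With a fixed salt the samples are nested: every patient in the smaller sample is also in the larger one, so results from the development subset stay comparable when moving to the acceptance subset.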
  
-QA
+Quality Assurance (QA)
   - How do we ensure ETL is good?
     - metrics for success
Line 78: Line 89:
       - variance between previous and current run
       - count of records
-      - % mapped codes
+      - % of records with mapped codes
+      - % unique codes that are mapped
+      - for select fields (demographics), show histogram of values
     - compare actual to expected results
     - ensure referential integrity on platforms that do not enforce it
 +    - it would be awesome to compare histogram of values for source with equivalent destination
 +    - it would be awesome to show improvements between runs due to better mapping and coding
 +    - it would be nice to show average condition per visit
 +    - some deidentification processes introduce variance in dates or id values
+  - How do we get business units to participate?
 +  - How do we get approval from business units?
 +  - Validate destination data with use cases and compare against source data with use cases. Investigate or accept variance.
 +  - Standard model checks that are independent of data or volume
 +  - automatic vs manual checks
 +  - Frequency or schedule of reviews
 +  - Tools
 +    - Achilles
 +    - Autosys
 +    - Oozie
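The two mapping metrics listed above (% of records with mapped codes and % unique codes that are mapped) can differ a lot, because a few frequent codes may cover most records. A minimal sketch of computing both follows. It assumes each record is a `(source_code, concept_id)` pair and treats a `concept_id` of 0 or `None` as unmapped; the function name and the sample values are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def mapping_coverage(
    records: Iterable[Tuple[str, Optional[int]]],
) -> Tuple[float, float]:
    """Return (pct_records_mapped, pct_unique_codes_mapped).

    A record counts as mapped when its concept_id is neither 0 nor None.
    A source code counts as mapped when at least one of its records is mapped.
    """
    total = 0
    mapped = 0
    code_is_mapped: Dict[str, bool] = {}
    for source_code, concept_id in records:
        is_mapped = concept_id not in (0, None)
        total += 1
        mapped += int(is_mapped)
        code_is_mapped[source_code] = code_is_mapped.get(source_code, False) or is_mapped
    if total == 0:
        return 0.0, 0.0
    pct_records = 100.0 * mapped / total
    pct_codes = 100.0 * sum(code_is_mapped.values()) / len(code_is_mapped)
    return pct_records, pct_codes


# Illustrative data: one frequent mapped code and one unmapped local code.
sample = [("A01", 1001), ("A01", 1001), ("A01", 1001), ("LOCAL-7", 0)]
pct_records, pct_codes = mapping_coverage(sample)
# pct_records is 75.0 and pct_codes is 50.0
```

Tracking both numbers per run also gives a simple way to show improvement between runs as mapping and coding get better.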
  
 +Operation
 +  - Guidance for archive
 +  - Tools
 +    - monitoring
 +    - kibana
welcome/overview/cdm/cdm_conversion_best_practices.1498749178.txt.gz · Last modified: 2017/06/29 15:12 by bchristian