Differences

This shows you the differences between two versions of the page.

welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 16:00]
bchristian Discussion about QA
welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 18:47] (current)
bchristian Updates from afternoon discussion
Line 1: Line 1:
 What do I have to have to do an OMOP conversion?
+
+Separate the process into modules.
 
 Pre-analysis
Line 29: Line 31:
   - Revised Data dictionary
   - Initial ETL Spec including:
-    - Business rules for mapping in a detailed specification, preferably in a computer readable format, like White Rabbit.
+    - Business rules for mapping in a detailed specification, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|White Rabbit]]
-    - View of mapping, preferably in a computer readable format, like rabbit in hat.
+    - View of mapping, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|Rabbit-In-a-Hat]].
   - Identify any additional mapping needed:
     - custom or local mapping of organizational codes
Line 57: Line 59:
     - Use start and end date of vocabulary items
     - Document what to do with records that are missing required fields
+      - maybe you have a medical coder who can code from a description field
     - Document what to do with records that have fields with invalid values
   - Software lifecycle
     - How do you develop, test, and accept for production?
-    - How do you manage effort and cost to convert millions of patient records in TB of data?
-      - develop and test using a sample subset of entire data (150 thousand patients)
-      - business acceptance test using a large sample subset of entire data ()
-      - production run using entire data
       - Jenkins for automated build
       - SVN for source code control
+    - How do you manage effort and cost to convert millions of patient records in TB of data?
+      - Use a sample subset of the total data based on number of patients, amount of data, or processing time
+      - develop and test using a sample subset of entire data (150 thousand patients)
+      - business acceptance test using a large sample subset of entire data (1 million patients)
+      - production run using entire data (millions of patients)
     - Define destination location(s)
     - Always get the latest vocabularies before each refresh (development, test, or production run)
     - Where do you get the most recent list of codes?
+    - Frequency or schedule of reviews
   - How do you become aware of updates to CDM?
   - How do you become aware of updates to vocabularies?
   - Partitioning for parallelism to optimize performance
+  - Guidelines for incremental update
+  - Reusable code/Tables
+  - Intermediate model?
 
-QA
+Quality Assurance (QA)
   - How do we ensure ETL is good?
     - metrics for success
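
The software lifecycle items in the hunk above describe three tiers: a development and test subset (150 thousand patients), a larger business acceptance subset (1 million patients), and a production run on all patients. One possible way to pick those subsets, sketched here as an assumption and not taken from the page, is to hash each patient ID into a stable bucket so that every run selects the same patients and each smaller tier is contained in the larger one. The function names `patient_bucket` and `in_sample` and the `buckets` parameter are hypothetical.

```python
import hashlib


def patient_bucket(patient_id: str, buckets: int = 10_000) -> int:
    """Map a patient ID to a stable bucket in [0, buckets).

    SHA-256 is used only to spread IDs evenly; it is an illustrative
    choice, not something the page prescribes.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def in_sample(patient_id: str, target_patients: int, total_patients: int,
              buckets: int = 10_000) -> bool:
    """Return True if the patient belongs to a subset of roughly
    target_patients out of total_patients.

    A larger target gives a larger bucket threshold, so a smaller
    subset is always contained in a larger one.
    """
    if target_patients >= total_patients:
        return True  # production run: every patient is included
    threshold = buckets * target_patients // total_patients
    return patient_bucket(patient_id, buckets) < threshold
```

With this approach the development subset is a strict subset of the acceptance subset, so problems found in development can be re-checked on the same patients at acceptance time. Sampling by amount of data or by processing time, which the page also mentions, would need a different selection rule.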
Line 89: Line 97:
     - it would be awesome to show improvements between runs due to better mapping and coding
     - it would be nice to show average condition per visit
+    - some deidentification processes introduce variance in dates or id values
+  - How do we get business units to participate?
+  - How do we get approval from business units?
+  - Validate destination data with use cases and compare against source data with use cases. Investigate or accept variance.
+  - Standard model checks that are independent of data or volume
+  - automatic vs manual checks
+  - Frequency or schedule of reviews
+  - Tools
+    - Achilles
+    - Autosys
+    - Oozie
 
+Operation
+  - Guidance for archive
+  - Tools
+    - monitoring
+    - kibana
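
The QA item "Validate destination data with use cases and compare against source data with use cases. Investigate or accept variance." can be read as running the same use-case query against source and destination and classifying the difference. The sketch below is one hedged interpretation: the `CheckResult` type, the `compare_use_case` function, the `tolerance` parameter, and the three status labels are all hypothetical, and how the two counts are obtained is left out.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    source: float
    destination: float
    relative_diff: float
    status: str  # "match", "accepted", or "investigate"


def compare_use_case(name: str, source: float, destination: float,
                     tolerance: float = 0.0) -> CheckResult:
    """Compare one use-case result between source and destination.

    tolerance is the relative variance the business has agreed to
    accept, for example because deidentification shifted dates.
    """
    if source == 0:
        # Avoid dividing by zero: any non-zero destination is a full mismatch.
        rel = 0.0 if destination == 0 else float("inf")
    else:
        rel = abs(destination - source) / abs(source)

    if rel == 0:
        status = "match"
    elif rel <= tolerance:
        status = "accepted"
    else:
        status = "investigate"
    return CheckResult(name, source, destination, rel, status)


# Illustrative, made-up counts; real values would come from the
# use-case queries run against each database.
results = [
    compare_use_case("patients with any visit", 1000, 1000),
    compare_use_case("visits in 2016", 2000, 1990, tolerance=0.01),
    compare_use_case("diabetes diagnoses", 500, 450, tolerance=0.01),
]
for r in results:
    print(f"{r.name}: {r.status} ({r.relative_diff:.1%})")
```

Keeping the tolerance per use case makes the accepted variance explicit and reviewable, which fits the page's questions about getting approval from business units.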
welcome/overview/cdm/cdm_conversion_best_practices.1498752029.txt.gz · Last modified: 2017/06/29 16:00 by bchristian