Differences

This shows you the differences between two versions of the page.

--- welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 15:12]
bchristian Results from morning of second day of discussions
+++ welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 18:47] (current)
bchristian Updates from afternoon discussion
@@ Line 1: / Line 1: @@
 What do I have to have to do an OMOP conversion?
+Separate the process into modules.
 Pre-analysis
@@ Line 19: / Line 21: @@
   - Privacy requirements
   - Definition of visit
+  - Specific codes that are of interest.  Ensure these are mapped.
+  - Business rules for handling conflicting information (e.g. visits after patient death).
+  - Specific metrics that are of interest.
@@ Line 26: / Line 31: @@
   - Revised Data dictionary
   - Initial ETL Spec including:
-    - Business rules for mapping in a detailed specification, preferably in a computer readable format, like White Rabbit.
+    - Business rules for mapping in a detailed specification, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|White Rabbit]].
-    - View of mapping, preferably in a computer readable format, like rabbit in a hat.
+    - View of mapping, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|Rabbit-In-a-Hat]].
   - Identify any additional mapping needed:
     - custom or local mapping of organizational codes
@@ Line 54: / Line 59: @@
     - Use start and end date of vocabulary items
     - Document what to do with records that are missing required fields
+      - maybe you have a medical coder who can code from a description field
     - Document what to do with records that have fields with invalid values
   - Software lifecycle
     - How do you develop, test, and accept for production?
-    - How do you manage effort and cost to convert millions of patient records in TB of data?
-      - develop and test using a sample subset of entire data (150 thousand patients)
-      - business acceptance test using a large sample subset of entire data ()
-      - production run using entire data
       - Jenkins for automated build
       - SVN for source code control
+    - How do you manage effort and cost to convert millions of patient records in TB of data?
+      - Use a sample subset of the total data based on number of patients, amount of data, or processing time
+      - develop and test using a sample subset of entire data (150 thousand patients)
+      - business acceptance test using a large sample subset of entire data (1 million patients)
+      - production run using entire data (millions of patients)
     - Define destination location(s)
     - Always get the latest vocabularies before each refresh (development, test, or production run)
     - Where do you get the most recent list of codes?
+    - Frequency or schedule of reviews
   - How do you become aware of updates to CDM?
   - How do you become aware of updates to vocabularies?
   - Partitioning for parallelism to optimize performance
+  - Guidelines for incremental update
+  - Reusable code/Tables
+  - Intermediate model?
-QA
+Quality Assurance (QA)
   - How do we ensure ETL is good?
     - metrics for success
@@ Line 78: / Line 89: @@
       - variance between previous and current run
       - count of records
-      - % mapped codes
+      - % of records with mapped codes
+      - % unique codes that are mapped
+      - for select fields (demographics), show histogram of values
     - compare actual to expected results
     - ensure referential integrity on platforms that do not enforce it
+    - it would be awesome to compare histogram of values for source with equivalent destination
+    - it would be awesome to show improvements between runs due to better mapping and coding
+    - it would be nice to show the average number of conditions per visit
+    - some deidentification processes introduce variance in dates or id values
+  - How do we get business units to participate?
+  - How do we get approval from business units?
+  - Validate destination data with use cases and compare against source data with use cases. Investigate or accept variance.
+  - Standard model checks that are independent of data or volume
+  - automatic vs manual checks
+  - Frequency or schedule of reviews
+  - Tools
+    - Achilles
+    - Autosys
+    - Oozie
+Operation
+  - Guidance for archive
+  - Tools
+    - monitoring
+    - Kibana
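
A few of the QA metrics listed above (% of records with mapped codes, % unique codes that are mapped, and referential integrity on platforms that do not enforce it) can be sketched in Python. This is a minimal illustration only, not part of the discussion notes: the `Record` shape, the function names, and the sample rows are all hypothetical, and it assumes an unmapped code is stored with a concept id of 0.

```python
from collections import namedtuple

# Hypothetical row shape for a converted clinical event table.
# Assumption: concept_id == 0 means the source code could not be mapped.
Record = namedtuple("Record", ["person_id", "source_code", "concept_id"])


def pct_records_mapped(records):
    """Percentage of records whose source code was mapped to a concept."""
    if not records:
        return 0.0
    mapped = sum(1 for r in records if r.concept_id != 0)
    return 100.0 * mapped / len(records)


def pct_unique_codes_mapped(records):
    """Percentage of distinct source codes that have at least one mapping."""
    all_codes = {r.source_code for r in records}
    if not all_codes:
        return 0.0
    mapped_codes = {r.source_code for r in records if r.concept_id != 0}
    return 100.0 * len(mapped_codes) / len(all_codes)


def orphaned_person_ids(records, person_ids):
    """Person ids referenced by records but missing from the person table.

    A stand-in for a foreign-key check on platforms that do not enforce one.
    """
    return sorted({r.person_id for r in records} - set(person_ids))


# Made-up sample data for illustration only.
records = [
    Record(1, "A", 100),
    Record(1, "A", 100),
    Record(2, "B", 0),    # unmapped code
    Record(3, "C", 200),  # person 3 is not in the person table below
]
known_persons = {1, 2}

print(pct_records_mapped(records))
print(round(pct_unique_codes_mapped(records), 1))
print(orphaned_person_ids(records, known_persons))
```

Tracking the two percentages separately is useful because a handful of frequently used unmapped codes can lower record-level coverage sharply while unique-code coverage stays high, and the reverse can also happen. Storing these numbers for each run also gives a basis for the "variance between previous and current run" check.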

Observational Health Data Sciences and Informatics
