This shows you the differences between two versions of the page.
welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 15:12] bchristian Results from morning of second day of discussions |
welcome:overview:cdm:cdm_conversion_best_practices [2017/06/29 18:47] (current) bchristian Updates from afternoon discussion |
||
---|---|---|---|
Line 1: | Line 1: | ||
What do I need in order to do an OMOP conversion? | What do I need in order to do an OMOP conversion? | ||
+ | |||
+ | Separate the process into modules. | ||
Pre-analysis | Pre-analysis | ||
Line 19: | Line 21: | ||
- Privacy requirements | - Privacy requirements | ||
- Definition of visit | - Definition of visit | ||
+ | - Specific codes that are of interest. Ensure these are mapped. | ||
+ | - Business rules for handling conflicting information (e.g. visits after patient death). | ||
+ | - Specific metrics that are of interest. | ||
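A business rule for conflicting information, such as visits recorded after a patient's death, can be written down as a small executable check. The following is only a sketch: the field names `person_id` and `visit_start_date` and the helper `flag_visits_after_death` are hypothetical, and whether flagged visits are dropped, kept, or sent for review is a decision for the business owners, not something this code settles.

```python
from datetime import date


def flag_visits_after_death(visits, death_dates):
    """Return the visits that start after the patient's recorded death date.

    visits: iterable of dicts with hypothetical keys 'person_id' and
            'visit_start_date' (a datetime.date).
    death_dates: dict mapping person_id -> date of death.
    """
    flagged = []
    for visit in visits:
        death = death_dates.get(visit["person_id"])
        # Patients without a death record can never conflict.
        if death is not None and visit["visit_start_date"] > death:
            flagged.append(visit)
    return flagged


# Illustrative data only.
visits = [
    {"person_id": 1, "visit_start_date": date(2016, 3, 1)},
    {"person_id": 1, "visit_start_date": date(2017, 1, 15)},
    {"person_id": 2, "visit_start_date": date(2015, 7, 4)},
]
death_dates = {1: date(2016, 12, 31)}

conflicts = flag_visits_after_death(visits, death_dates)
```

Counting the flagged records per run gives one of the "specific metrics of interest" that can be tracked alongside the rule itself.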
Line 26: | Line 31: | ||
- Revised Data dictionary | - Revised Data dictionary | ||
- Initial ETL Spec including: | - Initial ETL Spec including: | ||
- | - Business rules for mapping in a detailed specification, preferably in a computer readable format, like White Rabbit. | + | - Business rules for mapping in a detailed specification, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|White Rabbit]]. |
- | - View of mapping, preferably in a computer readable format, like rabbit in a hat. | + | - View of mapping, preferably in a computer readable format, like [[https://www.ohdsi.org/analytic-tools/whiterabbit-for-etl-design/?ModPagespeed=noscript|Rabbit-In-a-Hat]]. |
- Identify any additional mapping needed: | - Identify any additional mapping needed: | ||
- custom or local mapping of organizational codes | - custom or local mapping of organizational codes | ||
Line 54: | Line 59: | ||
- Use start and end date of vocabulary items | - Use start and end date of vocabulary items | ||
- Document what to do with records that are missing required fields | - Document what to do with records that are missing required fields | ||
+ | - For example, a medical coder may be able to assign codes from a description field | ||
- Document what to do with records that have fields with invalid values | - Document what to do with records that have fields with invalid values | ||
- Software lifecycle | - Software lifecycle | ||
- How do you develop, test, and accept for production? | - How do you develop, test, and accept for production? | ||
- | - How do you manage effort and cost to convert millions of patient records in TB of data? | ||
- | - develop and test using a sample subset of entire data (150 thousand patients) | ||
- | - business acceptance test using a large sample subset of entire data () | ||
- | - production run using entire data | ||
- Jenkins for automated build | - Jenkins for automated build | ||
- SVN for source code control | - SVN for source code control | ||
+ | - How do you manage effort and cost to convert millions of patient records in TB of data? | ||
+ | - Use a sample subset of the total data based on number of patients, amount of data, or processing time | ||
+ | - develop and test using a sample subset of entire data (150 thousand patients) | ||
+ | - business acceptance test using a large sample subset of entire data (1 million patients) | ||
+ | - production run using entire data (millions of patients) | ||
- Define destination location(s) | - Define destination location(s) | ||
- Always get the latest vocabularies before each refresh (development, test, or production run) | - Always get the latest vocabularies before each refresh (development, test, or production run) | ||
- Where do you get the most recent list of codes? | - Where do you get the most recent list of codes? | ||
+ | - Frequency or schedule of reviews | ||
- How do you become aware of updates to CDM? | - How do you become aware of updates to CDM? | ||
- How do you become aware of updates to vocabularies? | - How do you become aware of updates to vocabularies? | ||
- Partitioning for parallelism to optimize performance | - Partitioning for parallelism to optimize performance | ||
+ | - Guidelines for incremental update | ||
+ | - Reusable code/Tables | ||
+ | - Intermediate model? | ||
- | QA | + | Quality Assurance (QA) |
- How do we ensure ETL is good? | - How do we ensure ETL is good? | ||
- metrics for success | - metrics for success | ||
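One way to build the development and acceptance subsets mentioned above is to sample by patient rather than by record, so that every record for a chosen patient comes along. This is a minimal sketch under assumptions: the helper names `in_sample` and `sample_patients` are hypothetical, and it covers only sampling by number of patients, not by data volume or processing time.

```python
import hashlib


def in_sample(person_id, fraction):
    """Deterministically decide whether a patient belongs to the sample.

    The patient id is hashed to a value in [0, 1); patients whose value
    falls below `fraction` are selected. The same id always gets the
    same answer, so reruns select the same patients.
    """
    digest = hashlib.sha256(str(person_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


def sample_patients(person_ids, fraction):
    """Return the patient ids selected for the given sampling fraction."""
    return [pid for pid in person_ids if in_sample(pid, fraction)]


# Illustrative ids only: a small sample for development, a larger one
# for business acceptance testing.
all_ids = range(1, 10001)
dev_sample = sample_patients(all_ids, 0.05)
acceptance_sample = sample_patients(all_ids, 0.5)
```

Because selection depends only on the hashed id, the development sample is always contained in the larger acceptance sample, which makes it easier to compare results across the stages.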
Line 78: | Line 89: | ||
- variance between previous and current run | - variance between previous and current run | ||
- count of records | - count of records | ||
- | - % mapped codes | + | - % of records with mapped codes |
+ | - % unique codes that are mapped | ||
+ | - for select fields (demographics), show histogram of values | ||
- compare actual to expected results | - compare actual to expected results | ||
- ensure referential integrity on platforms that do not enforce it | - ensure referential integrity on platforms that do not enforce it | ||
+ | - it would be awesome to compare histogram of values for source with equivalent destination | ||
+ | - it would be awesome to show improvements between runs due to better mapping and coding | ||
+ | - it would be nice to show average number of conditions per visit | ||
+ | - some deidentification processes introduce variance in dates or id values | ||
+ | - How do we get business units to participate? | ||
+ | - How do we get approval from business units? | ||
+ | - Validate destination data with use cases and compare against source data with use cases. Investigate or accept variance. | ||
+ | - Standard model checks that are independent of data or volume | ||
+ | - automatic vs manual checks | ||
+ | - Frequency or schedule of reviews | ||
+ | - Tools | ||
+ | - Achilles | ||
+ | - Autosys | ||
+ | - Oozie | ||
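The two mapping metrics listed above (% of records with mapped codes and % of unique codes that are mapped) can differ widely, because a few frequent codes may account for most records. The sketch below illustrates the difference. It assumes the OMOP convention that an unmapped record carries concept id 0; the function name `mapping_coverage` and the sample codes and concept ids are hypothetical.

```python
def mapping_coverage(records):
    """Compute record-level and code-level mapping coverage.

    records: list of (source_code, concept_id) pairs, where a
             concept_id of 0 means the source code was not mapped.
    Returns percentages in the range 0-100.
    """
    total = len(records)
    if total == 0:
        return {"pct_records_mapped": 0.0, "pct_unique_codes_mapped": 0.0}

    mapped_records = sum(1 for _, concept_id in records if concept_id != 0)
    unique_codes = {code for code, _ in records}
    mapped_codes = {code for code, concept_id in records if concept_id != 0}

    return {
        "pct_records_mapped": 100.0 * mapped_records / total,
        "pct_unique_codes_mapped": 100.0 * len(mapped_codes) / len(unique_codes),
    }


# Illustrative data only: one frequent mapped code, one rare unmapped code.
records = [("A", 1001), ("A", 1001), ("A", 1001), ("LOCAL-1", 0)]
coverage = mapping_coverage(records)
```

Here three of four records are mapped but only one of two distinct codes is, so reporting both numbers per run shows whether remaining gaps are in common or rare codes.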
+ | Operation | ||
+ | - Guidance for archive | ||
+ | - Tools | ||
+ | - monitoring | ||
+ | - Kibana |