Differences

This shows you the differences between two versions of the page.

--- documentation:athena:import_snomed [2015/04/01 08:45]
gleb_malikov created
+++ documentation:athena:import_snomed [2015/04/02 04:30] (current)
cgreich
@@ Line 1: / Line 1: @@
+I would build the logic slightly differently:
+. Concepts.
+- We have one authoritative source: SNOMED International in combination with SNOMED UK. Other components might follow later (DM+D, other country-specific versions).
+- We get a stream of concepts from them:
+  - Attributes of existing concepts are overwritten by the new concepts
+  - New concepts are added
+  - Missing concepts are deprecated
+  - Explicitly deprecated (inactivated) concepts are deprecated
+  - We do domain assignments for all of them. This is done by building the entire hierarchical tree and defining "peaks", of which all children inherit their domain.
+  - We define standard_concepts depending on their deprecation status and domain
+- We get a stream of concept-to-concept relationships
+  - New ones get added
+  - Missing ones: if the concepts are deprecated, we leave the relationships alone; if the concepts are active, we deprecate the relationships
+  - Explicitly deprecated ones are deprecated
+- We get a stream of update (inactive to active) relationships (only one per deprecated concept must exist)
+  - New ones get added
+  - Existing identical ones get left alone
+  - An existing update relationship to a different concept gets deprecated and the new one is added
+Makes sense?
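The peak-based domain assignment described in the list can be sketched as follows. This is only an illustration under stated assumptions: the `(child, parent)` edge list, the ordered `peaks` list, and the rule that a later peak overwrites an earlier one are all hypothetical choices made here, not something the notes above specify.

```python
from collections import defaultdict, deque


def assign_domains(is_a_edges, peaks):
    """Propagate domains from peak concepts down the hierarchy.

    is_a_edges: iterable of (child_id, parent_id) pairs (assumed input shape).
    peaks: ordered list of (concept_id, domain) pairs. Assumption: peaks are
           applied in order, so a later peak overwrites an earlier one for
           any descendants they share.
    Returns a dict mapping concept_id -> domain.
    """
    children = defaultdict(list)
    for child, parent in is_a_edges:
        children[parent].append(child)

    domain_of = {}
    for peak_id, domain in peaks:
        # Breadth-first walk over the peak and all of its descendants.
        queue = deque([peak_id])
        seen = {peak_id}
        while queue:
            node = queue.popleft()
            domain_of[node] = domain
            for child in children[node]:
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
    return domain_of


# Toy hierarchy with made-up identifiers: B and D are children of A, C is a child of B.
edges = [("B", "A"), ("C", "B"), ("D", "A")]
domains = assign_domains(edges, [("A", "Observation"), ("B", "Condition")])
```

In this toy run the more specific peak `B` is listed last, so `B` and `C` end up in its domain while `A` and `D` keep the domain of the broader peak.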
+I am not sure we need UMLS for them. UMLS is really only a reformatting of SNOMED. There isn't much going on. Unless you found something in http://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/SNOMEDCT_US. Take a look.
 ====== Import data from SNOMED vocabulary. ======
@@ Line 21: / Line 44: @@
   - Search OMOP by the "CONCEPT_CODE"
   - Query the UMLS by web API.
+After these checks we will have the following data:
+|Records processed| **X** |
+|Records recognized only by OMOP| **Y** |
+|Records recognized only by UMLS| **Z** |
+|Records recognized by OMOP and UMLS| **N**|
+|Records not recognized| **M**|
+From this table we can say that:
+  * **N**: stable records, recognized by both systems; most likely they are valid.
+  * **Z**: missing records that should be added to OMOP. We can use UMLS data for validation purposes.
+  * **Y**: these records should be inspected. There might be invalid records, or we may be importing a newer version of SNOMED than the one included in UMLS.
+  * **M**: new records that were just added in the new version of SNOMED. We need to validate them using the source description.
+Also, we must be able to view each of these subsets as a table or export it to a file at the user's request.
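The bucketing into N, Z, Y and M could look roughly like the sketch below. It assumes the two lookups (OMOP search by CONCEPT_CODE and the UMLS query) have already been reduced to plain sets of recognized codes; the function names and category labels are hypothetical.

```python
import csv
from collections import Counter


def classify_records(source_codes, omop_codes, umls_codes):
    """Label each source code by which system recognizes it.

    omop_codes / umls_codes: sets of codes found in each system
    (assumed to be collected beforehand by the two lookups).
    """
    classified = {}
    for code in source_codes:
        in_omop = code in omop_codes
        in_umls = code in umls_codes
        if in_omop and in_umls:
            classified[code] = "both"          # N
        elif in_omop:
            classified[code] = "omop_only"     # Y
        elif in_umls:
            classified[code] = "umls_only"     # Z
        else:
            classified[code] = "unrecognized"  # M
    return classified


def summarize(classified):
    """Count records per category; X is the total number processed."""
    counts = Counter(classified.values())
    counts["processed"] = len(classified)
    return counts


def export_subset(classified, category, stream):
    """Write one category as CSV to any text stream (file or StringIO)."""
    writer = csv.writer(stream)
    writer.writerow(["concept_code", "category"])
    for code in sorted(c for c, cat in classified.items() if cat == category):
        writer.writerow([code, category])
```

Passing an open file to `export_subset` covers the export-to-file case, while the dictionary returned by `classify_records` can be filtered the same way for on-screen display.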
+==== Validation ====
+This process allows us to ensure that OMOP describes the Concept exactly as the Source vocabulary does. We can also use the UMLS API for additional checks.
+First we should determine the Concept's type. It can be:
+  * Domain
+  * Relationship
+  * Standard Concept
+  * Classification Concept
+  * Vocabulary
+After the type of the Concept has been determined, we can perform the additional checks that are specific to each type.
+=== Domain ===
+If the current concept is a Domain, we can verify that:
+  * There is a Domain entity connected with this Concept.
+  * The string description of the Source Concept is equal to both CONCEPT_NAME and DOMAIN_NAME. If they are not equal, there must be at least one connected Concept Synonym with a matching CONCEPT_SYNONYM_NAME.
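As a rough sketch, the two Domain checks can be expressed as a function that returns a list of problems. The function name, its arguments, and the use of exact string comparison (no case or whitespace normalization) are assumptions made for illustration.

```python
def validate_domain_concept(source_name, concept_name, domain_name, synonym_names):
    """Return a list of problems; an empty list means both checks passed.

    domain_name is None when no Domain entity is connected to the Concept.
    synonym_names holds the CONCEPT_SYNONYM_NAME values linked to the Concept.
    """
    problems = []
    if domain_name is None:
        problems.append("no Domain entity connected to this Concept")

    # Exact comparison is an assumption; real data may need normalization.
    names_match = source_name == concept_name and source_name == domain_name
    if not names_match and source_name not in synonym_names:
        problems.append("source description matches neither the names nor any synonym")
    return problems
```

Returning every problem rather than stopping at the first one makes it easier to show a complete report per Concept.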
+=== Relationship ===
+If the current Source Concept is a Relationship, we should compare it with the Relationship Mapping.
+=== Classification Concept ===
+  * Must belong to the same Domain as the Source Concept.
+  * Must be part of the same Vocabulary as the one being imported.
+  * Must have the same number of siblings as the Source Concept.
+  * Must have an equal CONCEPT_NAME or CONCEPT_SYNONYM.
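The four Classification Concept rules might be checked along these lines. The `ConceptView` record and its fields are hypothetical stand-ins for whatever the importer actually loads from the source file and from OMOP.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptView:
    """Minimal, hypothetical view of a concept used only for comparison."""
    name: str
    domain: str
    vocabulary: str
    sibling_count: int
    synonyms: set = field(default_factory=set)


def check_classification_concept(source, omop, importing_vocabulary):
    """Return the names of the rules that the OMOP concept fails."""
    failures = []
    if omop.domain != source.domain:
        failures.append("domain")
    if omop.vocabulary != importing_vocabulary:
        failures.append("vocabulary")
    if omop.sibling_count != source.sibling_count:
        failures.append("sibling_count")
    # The source name must equal the OMOP name or one of its synonyms.
    if source.name != omop.name and source.name not in omop.synonyms:
        failures.append("name")
    return failures
```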
+=== Standard Concept ===
+  * Must belong to the same Domain.
+  * Must be part of the same Vocabulary.
+  * Must have the same Relations within the vocabulary being imported.
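For the last rule, one possible way to compare Relations is a plain set difference. The sketch assumes each relation is represented as a `(relationship_id, target_concept_code)` tuple and that both sides are already restricted to the vocabulary being imported; both points are assumptions.

```python
def diff_relationships(source_relations, omop_relations):
    """Compare two collections of (relationship_id, target_code) tuples.

    Returns the relations present on only one side; two empty lists mean
    the Standard Concept has the same Relations in both places.
    """
    source_set = set(source_relations)
    omop_set = set(omop_relations)
    return {
        "missing_in_omop": sorted(source_set - omop_set),
        "unexpected_in_omop": sorted(omop_set - source_set),
    }
```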
