documentation:athena:import_snomed [2015/04/01 08:45] gleb_malikov created
documentation:athena:import_snomed [2015/04/02 04:30] (current) cgreich

I would build the logic slightly differently:

  - Concepts.
    * We have one authoritative source: SNOMED International in combination with SNOMED UK. Other components might follow later (DM+D, other country-specific versions).
    * We get a stream of concepts from them:
      * Attributes of existing concepts are overwritten by the new concepts.
      * New concepts are added.
      * Missing concepts are deprecated.
      * Explicitly deprecated (inactivated) concepts are deprecated.
    * We do domain assignments for all of them. This is done by building the entire hierarchical tree and defining "peaks", from which all children inherit their domain.
    * We set standard_concept depending on each concept's deprecation status and domain.
    * We get a stream of concept-to-concept relationships:
      * New ones are added.
      * Missing ones: if the concepts are deprecated, we leave the relationship alone; if the concepts are active, we deprecate the relationship.
      * Explicitly deprecated ones are deprecated.
    * We get a stream of update (inactive-to-active) relationships (only one may exist per deprecated concept):
      * New ones are added.
      * Existing identical ones are left alone.
      * An existing update relationship to a different concept is deprecated, and the new one is added.
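A minimal Python sketch of the concept-stream rules, assuming a hypothetical Concept record keyed by SNOMED concept code, with an active flag taken from the release and our own deprecated flag. This is not the OMOP CONCEPT table layout, and the in-memory dictionaries only stand in for the vocabulary tables.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable


@dataclass(frozen=True)
class Concept:
    # Hypothetical, simplified concept record (not the OMOP CONCEPT layout).
    code: str
    name: str
    active: bool = True       # False when the release marks the concept inactive
    deprecated: bool = False  # deprecation status on our side


def merge_concepts(existing: Dict[str, Concept], release: Iterable[Concept]) -> Dict[str, Concept]:
    """Apply one SNOMED release to the existing concepts, keyed by code."""
    merged: Dict[str, Concept] = {}
    for new in release:
        # New concepts are added and existing ones are overwritten with the
        # incoming attributes; explicitly inactivated concepts are deprecated.
        merged[new.code] = replace(new, deprecated=not new.active)
    for code, old in existing.items():
        if code not in merged:
            # Concepts missing from the release are deprecated.
            merged[code] = replace(old, deprecated=True)
    return merged
```

If a code appears twice in one release, the last occurrence wins in this sketch.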
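The domain assignment by "peaks" can be sketched as a multi-source breadth-first search: it starts at every peak at once and pushes each peak's domain down the "Is a" hierarchy, so a concept takes the domain of its nearest peak and a nested peak keeps its own. The child map, the peak table and the rule for which domains yield a standard concept are hypothetical placeholders. Because SNOMED is a polyhierarchy, a concept under two equally near peaks simply keeps whichever reaches it first here.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical rule: which domains produce standard concepts.
STANDARD_DOMAINS = {"Condition", "Procedure", "Observation"}


def assign_domains(children: Dict[str, List[str]], peaks: Dict[str, str]) -> Dict[str, str]:
    """Propagate each peak's domain to its descendants.

    children maps a concept code to its direct children. A peak always keeps
    its own domain; other concepts take the domain of the nearest peak above
    them. Concepts not below any peak get no entry.
    """
    domain = dict(peaks)
    queue = deque(peaks)  # breadth-first search seeded with every peak at once
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in domain:  # skip peaks and already-reached concepts
                domain[child] = domain[node]
                queue.append(child)
    return domain


def standard_concept(deprecated: bool, domain: Optional[str]) -> Optional[str]:
    """Return 'S' for a standard concept, None otherwise (hypothetical rule)."""
    if deprecated or domain is None:
        return None
    return "S" if domain in STANDARD_DOMAINS else None
```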
Makes sense?

I am not sure we need UMLS for this. UMLS is really only a reformatting of SNOMED; there isn't much going on there, unless you find something in http://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/SNOMEDCT_US. Take a look.
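The two relationship streams from the list above can be merged with similar bookkeeping. This sketch uses a hypothetical Rel record and a set of deprecated concept codes. It reads "if the concepts are deprecated" as "if either endpoint is deprecated", and the relationship_id used for update links ("Concept replaced by") is an assumption; both should be adjusted to the actual rules.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Set, Tuple

UPDATE_REL = "Concept replaced by"  # assumed relationship_id for update links


@dataclass(frozen=True)
class Rel:
    # Hypothetical relationship record between two concept codes.
    source: str
    target: str
    rel_id: str
    deprecated: bool = False


def _key(r: Rel) -> Tuple[str, str, str]:
    return (r.source, r.target, r.rel_id)


def merge_relationships(existing: Iterable[Rel], release: Iterable[Rel],
                        deprecated_concepts: Set[str]) -> List[Rel]:
    """Apply a stream of concept-to-concept relationships."""
    result: Dict[Tuple[str, str, str], Rel] = {_key(r): r for r in existing}
    seen = set()
    for r in release:
        # New ones are added; ones the release marks deprecated stay deprecated.
        seen.add(_key(r))
        result[_key(r)] = r
    for k, r in list(result.items()):
        if k in seen:
            continue
        # Missing ones: deprecate only if both concepts are still active.
        if r.source not in deprecated_concepts and r.target not in deprecated_concepts:
            result[k] = replace(r, deprecated=True)
    return list(result.values())


def merge_updates(existing: Iterable[Rel], updates: Dict[str, str]) -> List[Rel]:
    """Apply update links; existing holds only update links, and updates maps
    each deprecated concept to its single replacement."""
    result: List[Rel] = []
    for r in existing:
        new_target = updates.get(r.source)
        if not r.deprecated and new_target is not None and new_target != r.target:
            result.append(replace(r, deprecated=True))  # link to a different concept
        else:
            result.append(r)  # identical or unaffected links are left alone
    active = {(r.source, r.target) for r in result if not r.deprecated}
    for source, target in updates.items():
        if (source, target) not in active:
            result.append(Rel(source, target, UPDATE_REL))  # new link added
    return result
```

Keying the update stream by the deprecated concept makes the "only one per deprecated concept" rule hold by construction.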
====== Import data from the SNOMED vocabulary ======
  - Search OMOP by CONCEPT_CODE.
  - Query UMLS through its web API.

After these checks, we get the following counts:

^ Records ^ Count ^
| Processed | **X** |
| Recognized only by OMOP | **Y** |
| Recognized only by UMLS | **Z** |
| Recognized by both OMOP and UMLS | **N** |
| Not recognized | **M** |

From this table we can conclude:

**N**: stable records, recognized by both systems; they are most likely valid.

**Z**: missing records that should be added to OMOP. We can use the UMLS data to validate them.

**Y**: records that should be inspected. They may be invalid, or we may be importing a newer version of SNOMED than the one included in UMLS.

**M**: new records that were just added in the new version of SNOMED. We need to validate them against the source description.

Users must also be able to view each of these subsets as a table, or export it to a file, on request.
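A sketch of how the subsets in the table could be computed and exported, assuming two hypothetical lookup functions: in_omop (for example, a search of CONCEPT by CONCEPT_CODE) and in_umls (a wrapper around the UMLS web API, which is not called here). The bucket keys reuse the letters from the table; X holds every processed record.

```python
import csv
from typing import Callable, Dict, Iterable, List, TextIO


def classify_records(codes: Iterable[str],
                     in_omop: Callable[[str], bool],
                     in_umls: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Split SNOMED concept codes into the subsets X, N, Y, Z and M."""
    buckets: Dict[str, List[str]] = {"X": [], "N": [], "Y": [], "Z": [], "M": []}
    for code in codes:
        buckets["X"].append(code)  # every processed record
        omop, umls = in_omop(code), in_umls(code)
        if omop and umls:
            buckets["N"].append(code)  # stable: recognized by both
        elif omop:
            buckets["Y"].append(code)  # only OMOP: inspect
        elif umls:
            buckets["Z"].append(code)  # only UMLS: candidate to add to OMOP
        else:
            buckets["M"].append(code)  # neither: validate against the source
    return buckets


def export_subset(out: TextIO, codes: Iterable[str]) -> int:
    """Write one subset as a one-column CSV; returns the number of data rows."""
    writer = csv.writer(out)
    writer.writerow(["concept_code"])
    count = 0
    for code in codes:
        writer.writerow([code])
        count += 1
    return count
```

To export, for example, the unrecognized records, open the target file with newline="" (as the csv module documentation recommends) and pass buckets["M"] to export_subset.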
==== Validation ====

This process ensures that OMOP describes each Concept exactly as the source vocabulary does. We can also use the UMLS API for additional checks.
First, we determine the Concept's type. It can be:
  * Domain
  * Relationship
  * Standard Concept
  * Classification Concept
  * Vocabulary
Once the type is determined, we can perform the additional checks specific to that type.
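One way to organize the type-specific checks is a small registry keyed by concept type. The ConceptType enum, the registry and the dictionary-based concept record are hypothetical; how the type itself is determined is left open, as in the text.

```python
from enum import Enum
from typing import Callable, Dict, List


class ConceptType(Enum):
    DOMAIN = "Domain"
    RELATIONSHIP = "Relationship"
    STANDARD = "Standard Concept"
    CLASSIFICATION = "Classification Concept"
    VOCABULARY = "Vocabulary"


# A check takes a concept record and returns a list of problem descriptions.
Check = Callable[[dict], List[str]]
CHECKS: Dict[ConceptType, List[Check]] = {t: [] for t in ConceptType}


def check(concept_type: ConceptType) -> Callable[[Check], Check]:
    """Decorator that registers a check for one concept type."""
    def register(fn: Check) -> Check:
        CHECKS[concept_type].append(fn)
        return fn
    return register


def validate(concept: dict, concept_type: ConceptType) -> List[str]:
    """Run every check registered for the given type and collect the problems."""
    problems: List[str] = []
    for fn in CHECKS[concept_type]:
        problems.extend(fn(concept))
    return problems


@check(ConceptType.DOMAIN)
def has_domain(concept: dict) -> List[str]:
    # Example check: a Domain concept must have a connected Domain entity.
    return [] if concept.get("domain_id") else ["no Domain entity connected"]
```

The per-type checks described in the following sections could each be registered the same way.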

=== Domain ===

If the current Concept is a Domain, we verify that:
  * A Domain entity is connected to this Concept.
  * The string description of the Source Concept equals both CONCEPT_NAME and DOMAIN_NAME. If it does not, at least one connected Concept Synonym must have a matching CONCEPT_SYNONYM_NAME.
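The two Domain checks could look like this, assuming the OMOP side has been loaded into a hypothetical DomainConcept record holding CONCEPT_NAME, the DOMAIN_NAME of the connected Domain (if any) and the CONCEPT_SYNONYM_NAME values. Comparison is exact string equality here; whether case or whitespace should be normalized is left open.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DomainConcept:
    # Hypothetical OMOP-side view of a Domain concept.
    concept_name: str
    domain_name: Optional[str]  # DOMAIN_NAME of the connected Domain, None if missing
    synonyms: List[str] = field(default_factory=list)  # CONCEPT_SYNONYM_NAME values


def check_domain_concept(source_description: str, c: DomainConcept) -> List[str]:
    """Return the problems found for one Domain concept (empty if it passes)."""
    problems: List[str] = []
    if c.domain_name is None:
        problems.append("no Domain entity connected")
    names_match = (source_description == c.concept_name
                   and source_description == c.domain_name)
    # Fall back to the synonyms only when the names do not match.
    if not names_match and source_description not in c.synonyms:
        problems.append("no matching name or synonym")
    return problems
```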

=== Relationship ===

If the current Source Concept is a Relationship, we compare it with the Relationship Mapping.

=== Classification Concept ===
  * Must belong to the same Domain as the Source Concept.
  * Must be part of the same Vocabulary as the one being imported.
  * Must have the same number of siblings as the Source Concept.
  * Must have a CONCEPT_NAME or CONCEPT_SYNONYM equal to the Source Concept's description.
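A sketch of the Classification Concept checks, assuming hypothetical parent and child maps for the hierarchy and a simple Node record. Siblings are counted as the other children of all of a concept's parents, which is one possible reading of "same number of siblings" in a polyhierarchy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    # Hypothetical view of a concept for validation.
    name: str
    domain_id: str
    vocabulary_id: str
    synonyms: List[str] = field(default_factory=list)


def count_siblings(code: str, parents: Dict[str, Set[str]],
                   children: Dict[str, Set[str]]) -> int:
    """Count the other children of every parent of code."""
    siblings: Set[str] = set()
    for parent in parents.get(code, set()):
        siblings |= children.get(parent, set())
    siblings.discard(code)
    return len(siblings)


def check_classification(source: Node, omop: Node, importing_vocabulary: str,
                         source_siblings: int, omop_siblings: int) -> List[str]:
    """Return the problems found for one Classification Concept."""
    problems: List[str] = []
    if omop.domain_id != source.domain_id:
        problems.append("domain differs")
    if omop.vocabulary_id != importing_vocabulary:
        problems.append("vocabulary differs")
    if omop_siblings != source_siblings:
        problems.append("sibling count differs")
    if source.name != omop.name and source.name not in omop.synonyms:
        problems.append("no matching CONCEPT_NAME or synonym")
    return problems
```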
=== Standard Concept ===
  * Must belong to the same Domain.
  * Must be part of the same Vocabulary.
  * Must have the same relationships within the vocabulary being imported.
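The Standard Concept checks can be sketched as set comparisons, assuming a hypothetical ConceptView record whose relations are (relationship_id, target code) pairs and a hypothetical vocab_of map from concept code to vocabulary. Only relations whose target lies in the vocabulary being imported are compared, and relations to other vocabularies are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str]  # (relationship_id, target concept code)


@dataclass
class ConceptView:
    # Hypothetical view of a concept with its outgoing relationships.
    domain_id: str
    vocabulary_id: str
    relations: Set[Relation]


def _within(relations: Set[Relation], vocab_of: Dict[str, str], vocabulary: str) -> Set[Relation]:
    """Keep only relations whose target belongs to the given vocabulary."""
    return {r for r in relations if vocab_of.get(r[1]) == vocabulary}


def check_standard(source: ConceptView, omop: ConceptView, importing_vocabulary: str,
                   vocab_of: Dict[str, str]) -> List[str]:
    """Return the problems found for one Standard Concept."""
    problems: List[str] = []
    if omop.domain_id != source.domain_id:
        problems.append("domain differs")
    if omop.vocabulary_id != importing_vocabulary:
        problems.append("vocabulary differs")
    expected = _within(source.relations, vocab_of, importing_vocabulary)
    actual = _within(omop.relations, vocab_of, importing_vocabulary)
    if expected - actual:
        problems.append(f"missing relations: {sorted(expected - actual)}")
    if actual - expected:
        problems.append(f"extra relations: {sorted(actual - expected)}")
    return problems
```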