Attributes of existing concepts are overwritten by the new concepts
New concepts are added
Missing concepts are deprecated
Explicitly deprecated (inactivated) concepts are deprecated
We assign domains to all of them. This is done by building the entire hierarchical tree and defining “peaks”, from which all descendants inherit their domain.
We define standard_concepts depending on their deprecation status and domain
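
The domain assignment by “peaks” can be sketched roughly as follows. This is only an illustration under assumptions: the hierarchy is given as a parent-to-children mapping, the peaks as a hand-maintained dictionary, and a concept below several peaks takes the domain of the nearest one. The function name `assign_domains` and the sample codes are hypothetical.

```python
from collections import deque


def assign_domains(children, peaks):
    """Propagate each peak's domain down to all of its descendants.

    children: dict mapping a concept code to a list of its direct children.
    peaks: dict mapping a peak concept code to its domain name.
    Assumption: if a concept sits below several peaks, the nearest peak
    (fewest hierarchy steps) wins; ties go to the peak processed first.
    """
    domain = {}
    distance = {}
    for peak, peak_domain in peaks.items():
        queue = deque([(peak, 0)])
        seen = {peak}
        while queue:
            code, dist = queue.popleft()
            # Overwrite only when this peak is closer than any earlier one.
            if code not in distance or dist < distance[code]:
                distance[code] = dist
                domain[code] = peak_domain
            for child in children.get(code, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, dist + 1))
    return domain


# Hypothetical toy hierarchy, not real SNOMED codes.
children = {
    "root": ["finding", "procedure"],
    "finding": ["disease"],
    "disease": ["flu"],
    "procedure": ["surgery"],
}
peaks = {"root": "Observation", "finding": "Condition", "procedure": "Procedure"}
domains = assign_domains(children, peaks)
```

In this toy example "flu" ends up in the Condition domain because the "finding" peak is closer to it than the "root" peak.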

- We get a stream of concept-to-concept relationships

New ones get added
Missing ones: if the concepts are deprecated, we leave them alone; if the concepts are active, we deprecate them
Explicitly deprecated ones are deprecated
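
The three rules for the relationship stream could look roughly like this. It is a sketch, assuming a relationship is keyed by (concept code 1, relationship id, concept code 2) with an active flag, and that “the concepts are active” means both ends are active. The function name `merge_relationships` and the data shapes are hypothetical.

```python
def merge_relationships(existing, incoming, active_concepts):
    """Merge a new relationship stream into the stored relationships.

    existing, incoming: dict {(code_1, relationship_id, code_2): is_active}.
    active_concepts: set of concept codes that are active after the
    concept update has been applied.
    """
    merged = dict(existing)

    # New relationships are added; explicit deprecations (is_active False)
    # coming from the stream overwrite the stored flag.
    for key, is_active in incoming.items():
        merged[key] = is_active

    # Relationships missing from the stream are deprecated only when both
    # of their concepts are still active (assumption); otherwise untouched.
    for key in existing:
        if key in incoming:
            continue
        code_1, _, code_2 = key
        if code_1 in active_concepts and code_2 in active_concepts:
            merged[key] = False
    return merged


# Hypothetical sample data.
existing = {
    ("A", "Is a", "B"): True,
    ("C", "Is a", "D"): True,
    ("E", "Is a", "F"): True,
}
incoming = {("A", "Is a", "B"): False, ("G", "Is a", "A"): True}
merged = merge_relationships(existing, incoming, {"A", "B", "C", "D", "G"})
```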

- We get a stream of update (inactive to active) relationships (only one per deprecated concept must exist)

New ones get added
Existing identical ones get left alone
An existing update relationship to a different concept gets deprecated and the new one is added
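
A small sketch of these update rules, assuming the active update relationships are held as a dictionary from the deprecated concept code to its replacement, which also enforces one relationship per deprecated concept. The name `merge_updates` and the sample codes are hypothetical.

```python
def merge_updates(existing, incoming):
    """Apply a stream of update (inactive to active) relationships.

    existing, incoming: dict {deprecated_code: replacement_code}.
    Returns (active, retired): the resulting active links and the list of
    (deprecated_code, old_replacement) links that were deprecated.
    """
    active = dict(existing)
    retired = []
    for old_code, replacement in incoming.items():
        current = active.get(old_code)
        if current == replacement:
            continue  # identical link already exists: leave it alone
        if current is not None:
            retired.append((old_code, current))  # points elsewhere: deprecate
        active[old_code] = replacement  # add the new link
    return active, retired


# Hypothetical sample data.
active, retired = merge_updates(
    {"A": "B", "C": "D"},
    {"A": "B", "C": "E", "F": "G"},
)
```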

Makes sense?

I am not sure we need UMLS for them. UMLS is really only a reformatting of SNOMED. There isn't much going on. Unless you found something in http://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/SNOMEDCT_US. Take a look.

Import data from SNOMED vocabulary.

The source SNOMED vocabulary can be acquired from SNOMED. SNOMED is also included in UMLS. We suggest using both of these resources in the import process.

A local copy of the vocabulary will be used to extract the concepts, and the UMLS web API will be used for additional concept analysis. The advantages of this approach are:

If the SNOMED vocabulary has been updated, we can process it without waiting for the next UMLS update.
Less load on the web API.
We can still use UMLS knowledge about the SNOMED vocabulary, especially for cross-vocabulary relations.

We'll start with the basic import process, which will give us additional knowledge about the process itself.

The import process

Each concept in the source dictionary can be:

Identified
Validated

In the current scope, identification means that OMOP and UMLS already have information about the current Concept. Once the Concept is identified, it can be validated. Each Concept is described by its type, a set of attributes, and its relations with other Concepts. During validation, we must compare the Source and UMLS descriptions of the Concept with the OMOP one. If the translation can be performed in both directions without violating data integrity or validity, we can say that the Concept is valid.

Identification

To identify the Concept we must:

Search OMOP by the “CONCEPT_CODE”
Query the UMLS by web API.

After these checks we will have the following data:

Records processed	X
Records recognized only by OMOP	Y
Records recognized only by UMLS	Z
Records recognized by OMOP and UMLS	N
Records not recognized	M

From this table we can say that:

N - stable records, recognized by both systems; most likely they are valid.

Z - missing records that should be added to OMOP. We can use UMLS data for validation purposes.

Y - these records should be inspected. Some of them might be invalid, or we might be importing a newer version of SNOMED than the one included in UMLS.

M - new records that have just been added in the new version of SNOMED. We need to validate them using the source description.
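
The split into the four groups above can be sketched as follows. To keep the example self-contained, both lookups are replaced by plain sets of known codes; in practice the OMOP side would be a search by CONCEPT_CODE and the UMLS side a web API query. The function names and sample codes are hypothetical.

```python
def classify(in_omop, in_umls):
    """Name the group a record falls into, given the two lookup results."""
    if in_omop and in_umls:
        return "both"          # N: stable records
    if in_omop:
        return "omop_only"     # Y: to be inspected
    if in_umls:
        return "umls_only"     # Z: missing in OMOP
    return "unrecognized"      # M: new records


def identification_report(source_codes, omop_codes, umls_codes):
    """Group source concept codes by which system recognizes them."""
    groups = {"both": [], "omop_only": [], "umls_only": [], "unrecognized": []}
    for code in source_codes:
        groups[classify(code in omop_codes, code in umls_codes)].append(code)
    return groups


# Hypothetical codes standing in for real lookups.
report = identification_report(
    ["1", "2", "3", "4", "5"],
    omop_codes={"1", "2"},
    umls_codes={"2", "3"},
)
counts = {name: len(codes) for name, codes in report.items()}
```

Keeping the full code lists per group, rather than only the counts, makes it straightforward to show or export each subset later.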

We must also be able to view each of these subsets as a table, or export it to a file, at the user's request.

Validation

This process lets us ensure that OMOP describes the Concept exactly as the Source vocabulary does. We can also use the UMLS API for additional checks. First we should determine the Concept's type. It can be:

Domain
Relationship
Standard Concept
Classification Concept
Vocabulary

After the type of the Concept has been determined, we can perform additional checks that are specific to each type.

Domain

If the current Concept is a Domain, we can verify that:

There is a Domain entity connected with this Concept.
The string description of the Source Concept is equal to CONCEPT_NAME and DOMAIN_NAME. If they are not equal, there must be at least one connected Concept Synonym with an equal CONCEPT_SYNONYM_NAME.

Relationship

If the current Source Concept is a Relationship, we should compare it with the Relationship Mapping.

Classification Concept

Must belong to the same Domain as the Source Concept.
Must be part of the same Vocabulary as the one being imported.
Must have the same number of siblings as the Source Concept.
Must have an equal CONCEPT_NAME or CONCEPT_SYNONYM.
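
These four checks could be expressed along the following lines. This is a sketch only: `ConceptView` is a hypothetical flattened record used for both the Source and the OMOP side, not an existing OMOP structure, and the function returns a list of problems rather than a single yes/no answer.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptView:
    """Hypothetical flattened view of a concept (Source or OMOP side)."""
    name: str
    domain: str
    vocabulary: str
    sibling_count: int
    synonyms: set = field(default_factory=set)


def check_classification_concept(source, omop, importing_vocabulary):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if omop.domain != source.domain:
        problems.append("domain differs")
    if omop.vocabulary != importing_vocabulary:
        problems.append("wrong vocabulary")
    if omop.sibling_count != source.sibling_count:
        problems.append("sibling count differs")
    # The source name must match CONCEPT_NAME or one of the synonyms.
    if source.name != omop.name and source.name not in omop.synonyms:
        problems.append("name matches neither CONCEPT_NAME nor a synonym")
    return problems


# Hypothetical sample records.
source = ConceptView("Clinical finding", "Condition", "SNOMED", 3)
omop_ok = ConceptView("Finding", "Condition", "SNOMED", 3, {"Clinical finding"})
omop_bad = ConceptView("Finding", "Observation", "SNOMED", 2)
```

Returning every failed check at once, instead of stopping at the first one, makes the result easier to review in bulk.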

Standard Concept

Must belong to the same Domain.
Must be part of the same Vocabulary.
Must have equal Relations within the vocabulary being imported.
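
The comparison of Relations can be sketched as a set difference. The assumptions here are that a relation is represented as a (relationship id, target concept code) pair and that only relations whose target belongs to the vocabulary being imported are compared. The function name `compare_relations` and the sample values are hypothetical.

```python
def compare_relations(source_relations, omop_relations, vocabulary_codes):
    """Report relations that differ between the Source and OMOP.

    source_relations, omop_relations: sets of (relationship_id, target_code).
    vocabulary_codes: codes belonging to the vocabulary being imported;
    relations pointing outside of it are ignored.
    """
    def within(relations):
        return {rel for rel in relations if rel[1] in vocabulary_codes}

    source_side = within(source_relations)
    omop_side = within(omop_relations)
    return {
        "missing_in_omop": source_side - omop_side,
        "extra_in_omop": omop_side - source_side,
    }


# Hypothetical sample data; "X9" lies outside the imported vocabulary.
diff = compare_relations(
    {("Is a", "100"), ("Has finding site", "200")},
    {("Is a", "100"), ("Maps to", "X9"), ("Is a", "300")},
    vocabulary_codes={"100", "200", "300"},
)
```

The Relations are equal when both sets in the result are empty.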

Observational Health Data Sciences and Informatics

Automated Terminology Harmonization, Extraction and Normalization for Analytics - ATHENA
