There was extensive interest from the OHDSI community in the text-processing aspect. During the meeting, the group received suggestions for improving the current projects.
IRB for use of clinical text
IRB language pertaining to the textual part of the record is being compiled from multiple sources.
Anu will collect this language and generate a generic document for use as an example.
Once the contributors approve the document, it will be posted online for use by the OHDSI community.
Clinical text data storage and representation schema
Alex (Columbia) will generate a minimum set of modifiers for all clinical entities that supports the use of rules to derive clinical concepts.
To classify notes for the representation schema, note metadata will be generated, with each note type defined in detail and mapped to LOINC codes.
Note types from different institutions will be collected: George will share hierarchical note-type metadata, and we will also collect note-type metadata from Josh Denny at Vanderbilt. Karthik will aggregate all the collected material.
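The aggregation step can be pictured with a small Python sketch. It assumes a hypothetical record layout (contributing site, local note-type name, parent type in that site's hierarchy, LOINC code) and groups local note types by their LOINC mapping while flagging the ones still unmapped. The site and note-type names are invented; the two LOINC codes (18842-5, Discharge summary; 11506-3, Progress note) are shown only for illustration and should be checked against the LOINC release actually in use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NoteTypeRecord:
    """One note type as reported by a contributing site (hypothetical layout)."""
    source: str                # contributing institution
    local_name: str            # site-specific note-type label
    parent: Optional[str]      # parent type in the site's hierarchy, kept for later merging
    loinc_code: Optional[str]  # LOINC document code, or None if not yet mapped


def aggregate_note_types(records):
    """Group site-specific note types by LOINC code.

    Returns (by_loinc, unmapped): by_loinc maps each LOINC code to a sorted list
    of (source, local_name) pairs; unmapped lists records still needing a code.
    """
    by_loinc = defaultdict(list)
    unmapped = []
    for rec in records:
        if rec.loinc_code is None:
            unmapped.append(rec)
        else:
            by_loinc[rec.loinc_code].append((rec.source, rec.local_name))
    return {code: sorted(pairs) for code, pairs in by_loinc.items()}, unmapped


if __name__ == "__main__":
    records = [
        NoteTypeRecord("site_a", "Discharge Summary", None, "18842-5"),
        NoteTypeRecord("site_b", "DC Summary - Medicine", "Summaries", "18842-5"),
        NoteTypeRecord("site_b", "Daily Progress", "Progress Notes", "11506-3"),
        NoteTypeRecord("site_a", "Nursing Shift Note", "Nursing", None),
    ]
    by_loinc, unmapped = aggregate_note_types(records)
    for code, pairs in sorted(by_loinc.items()):
        print(code, pairs)
    print("unmapped:", [r.local_name for r in unmapped])
```

Keeping unmapped records in a separate list makes it easy to see which local note types still need a LOINC assignment before the metadata is shared.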
NLP tools/pipelines for ETL
The plan is to develop a set of wrappers for multiple NLP tools (currently cTAKES and MetaMap) that convert each tool's output to the OHDSI textual data schema.
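A minimal sketch of the wrapper idea in Python: each tool gets an adapter class with a common `convert` method, and every adapter emits the same tool-independent record. The `TextConcept` fields, the class names, and the simplified input dicts are all hypothetical; they do not reproduce the real cTAKES or MetaMap output formats or the final OHDSI schema. The only tool-specific assumption is that cTAKES marks negated mentions with a polarity of -1.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class TextConcept:
    """Tool-independent record for one extracted concept (hypothetical schema)."""
    note_id: int
    start: int           # character offset where the mention begins
    end: int             # character offset where the mention ends
    concept_code: str    # e.g. a UMLS CUI
    negated: bool
    nlp_system: str      # which tool produced the record


class NlpWrapper(ABC):
    """Converts one NLP tool's native output into TextConcept records."""
    system_name = "unknown"

    @abstractmethod
    def convert(self, note_id, raw_output):
        ...


class CtakesWrapper(NlpWrapper):
    system_name = "cTAKES"

    def convert(self, note_id, raw_output):
        # Assumes a simplified dict export; polarity -1 marks a negated mention.
        return [
            TextConcept(note_id, m["begin"], m["end"], m["cui"],
                        m["polarity"] == -1, self.system_name)
            for m in raw_output
        ]


class MetaMapWrapper(NlpWrapper):
    system_name = "MetaMap"

    def convert(self, note_id, raw_output):
        # Assumes a simplified dict export with start position, length and a negation flag.
        return [
            TextConcept(note_id, m["pos"], m["pos"] + m["len"], m["cui"],
                        bool(m["negated"]), self.system_name)
            for m in raw_output
        ]


if __name__ == "__main__":
    # Placeholder concept codes; real runs would carry UMLS CUIs.
    ctakes_out = [{"begin": 4, "end": 12, "cui": "CUI_A", "polarity": -1}]
    metamap_out = [{"pos": 20, "len": 5, "cui": "CUI_B", "negated": 0}]
    rows = CtakesWrapper().convert(1, ctakes_out) + MetaMapWrapper().convert(1, metamap_out)
    for row in rows:
        print(row)
```

Adding another tool then means writing one more subclass, while downstream code only ever sees `TextConcept` rows.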
To get an overview of recent updates to cTAKES, the group will invite Guergana Savova to present and demo cTAKES during the January call.
To prioritize the work, the group will focus first on positive (affirmed) concepts, aiming for high-confidence named entity recognition (NER) from text.
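As a rough Python sketch of "positive concepts first", the filter below keeps only mentions that are not negated, not hedged, and above a confidence cutoff. The `Mention` fields, the treatment of uncertain mentions as non-positive, and the 0.8 default threshold are all assumptions for illustration, not agreed group choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A recognized entity plus modifiers a tool attached to it (hypothetical fields)."""
    text: str
    concept_code: str
    negated: bool
    uncertain: bool     # e.g. hedged mentions such as "possible pneumonia"
    confidence: float   # tool-reported score in [0, 1]


def positive_high_confidence(mentions, min_confidence=0.8):
    """Keep only affirmed, certain mentions at or above the confidence cutoff."""
    return [
        m for m in mentions
        if not m.negated and not m.uncertain and m.confidence >= min_confidence
    ]


if __name__ == "__main__":
    mentions = [
        Mention("pneumonia", "CUI_A", negated=False, uncertain=False, confidence=0.95),
        Mention("no fever", "CUI_B", negated=True, uncertain=False, confidence=0.90),
        Mention("possible sepsis", "CUI_C", negated=False, uncertain=True, confidence=0.85),
        Mention("cough", "CUI_D", negated=False, uncertain=False, confidence=0.40),
    ]
    print([m.text for m in positive_high_confidence(mentions)])  # ['pneumonia']
```

Negated and uncertain mentions are not discarded from storage here; they are simply left out of this first high-confidence pass.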
Use cases, e.g., phenotyping for cohort selection using NLP outputs
To define the syntax for storing phenotypes, two aspects can be considered:
the set of data elements or features on which an algorithm operates
the formulation of the phenotype definition
For representing NLP output, the group will focus first on query-based phenotyping.
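The two aspects above map naturally onto a small data structure. The Python sketch below is one hypothetical way to store a query-based phenotype: a declared set of features (the data elements) plus a rule over them (the formulation), applied to each patient's set of positive NLP concepts. The feature names, the example rule, and the patient data are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Phenotype:
    """A query-based phenotype: the features it reads plus the rule combining them."""
    name: str
    features: FrozenSet[str]                 # data elements the definition depends on
    rule: Callable[[Dict[str, bool]], bool]  # formulation over those features

    def matches(self, patient_concepts):
        # Reduce the patient's concept set to the declared features, then apply the rule.
        present = {f: f in patient_concepts for f in self.features}
        return self.rule(present)


def select_cohort(phenotype, patients):
    """Return the sorted ids of patients whose concepts satisfy the phenotype."""
    return sorted(pid for pid, concepts in patients.items() if phenotype.matches(concepts))


if __name__ == "__main__":
    # Hypothetical feature names standing in for concepts extracted from notes.
    example = Phenotype(
        name="example_phenotype",
        features=frozenset({"condition_x", "medication_y", "exclusion_z"}),
        rule=lambda f: (f["condition_x"] or f["medication_y"]) and not f["exclusion_z"],
    )
    patients = {
        101: {"condition_x"},
        102: {"medication_y", "exclusion_z"},
        103: {"other_finding"},
    }
    print(select_cohort(example, patients))  # [101]
```

Keeping the feature list separate from the rule means the same declared features could later be handed to a machine-learning algorithm instead of a hand-written query.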
For machine-learning-based algorithms, the NLP output will be accessed outside the CDM.
Is Elasticsearch (ES) a good first step in this area? ES should be considered here more as a tool for cohort building and selection than for phenotyping; for that purpose, it is a good starting point.
Finding patients for clinical trials will be used as a use case here. ES could also serve as an explorer for feature selection in the phenotyping process.
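For a sense of what a cohort-selection search could look like, the Python sketch below only builds an Elasticsearch query body; it does not contact a cluster. It uses the standard `bool` query with `must` and `must_not` clauses of `match_phrase`, plus a `terms` aggregation to list distinct patients. The index field names `note_text` and `person_id` are hypothetical and depend on how notes are actually indexed.

```python
import json


def build_cohort_query(include_terms, exclude_terms,
                       text_field="note_text", patient_field="person_id",
                       max_patients=100):
    """Build an Elasticsearch query body for simple cohort selection.

    Matching notes must contain every include phrase and none of the exclude
    phrases; the terms aggregation returns the distinct patients behind them.
    """
    return {
        "size": 0,  # only the patient aggregation is needed, not the note hits
        "query": {
            "bool": {
                "must": [{"match_phrase": {text_field: t}} for t in include_terms],
                "must_not": [{"match_phrase": {text_field: t}} for t in exclude_terms],
            }
        },
        "aggs": {"patients": {"terms": {"field": patient_field, "size": max_patients}}},
    }


if __name__ == "__main__":
    body = build_cohort_query(["type 2 diabetes"], ["gestational diabetes"])
    print(json.dumps(body, indent=2))
```

The resulting body could be sent to an index's `_search` endpoint. Note that `must_not` excludes any note mentioning a phrase at all, so this is a text-level filter rather than a negation-aware one.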
Action item: Min will set up a simple search over the MT samples by the next meeting.
Use MIMIC-II and MIMIC-III as demo datasets for the tools being developed by the group.
Discussion
Action Items
Generate a general IRB document for use of clinical text, obtain approval from all contributors, and post it online: Anu
Collect a minimum set of modifiers for all clinical entities that supports the use of rules to derive clinical concepts: Alex
Aggregate and share note-type metadata from various sources: Karthik