Hua Xu, Jon Duke, George Hripcsak, Karthik Natarajan, Anupama Gururaj, Mark Khayter, Min Jiang, Alexandre Yahi, Noemie Elhadad, Juan M Banda, Olga Patterson, Lian Hu
The model is based on the SHARE-N model and adapted to the current data structure. It incorporates other semantic types, and not all of the modifiers are available in cTAKES yet.
The notes came from the eMERGE cohort at Columbia: about 60,000 notes covering 1,700 patients. The original cohort had 3,200 patients.
In theory, a set containing the combination of minimal modifiers can be generated. In practice, the question is whether we can trust the data enough to add it to the OHDSI tables. Only the highest-confidence data (with maximum PPV, i.e. positive predictive value) should be added to the tables.
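As a rough illustration of the "highest PPV only" idea, the sketch below computes PPV (true positives divided by all extracted mentions) per group of NLP output from manually reviewed samples and keeps only the groups that clear a threshold. The function names, the grouping by modifier, the 0.95 cutoff, and the sample data are all assumptions made up for this example; none of them were agreed on in the meeting.

```python
from collections import defaultdict


def ppv_by_group(reviewed):
    """Compute PPV per group from (group, is_correct) review results.

    `reviewed` is a hypothetical list of manually checked NLP mentions:
    each entry says which group the mention belongs to and whether the
    reviewer judged the extraction correct.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for group, is_correct in reviewed:
        counts[group][1] += 1
        if is_correct:
            counts[group][0] += 1
    return {group: correct / total for group, (correct, total) in counts.items()}


def groups_to_load(reviewed, min_ppv=0.95):
    """Return the groups whose PPV meets the (assumed) threshold."""
    scores = ppv_by_group(reviewed)
    return sorted(group for group, ppv in scores.items() if ppv >= min_ppv)


# Made-up review results, for illustration only.
reviewed = [
    ("affirmed", True), ("affirmed", True), ("affirmed", True), ("affirmed", True),
    ("negated", True), ("negated", False), ("negated", False),
]

print(ppv_by_group(reviewed))
print(groups_to_load(reviewed))
```

With a filter of this kind, only mentions from the groups returned by `groups_to_load` would be written to the tables, and the rest would be held back until their error sources are understood.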
Next steps:
Look at the note sections to determine the errors.
Work with Sunny to generate the NLP outputs for the phenotyping data
Evaluate by comparisons with structured data
Make the system more robust
Generate a protocol and/or annotation guidelines
Share the data as a gold standard with manually annotated CUIs
Try Alex's script on different datasets and evaluate it on notes from different institutions
Identify a minimal set of notes to work with when making recommendations to the OHDSI community
Identify sets of concepts that are not reliable; negation is a very good example
Continue discussion of NLP system evaluation across different sites
The NLP-WG will meet on the second Wednesday of every month
Action Items
Note-type mapping presentation - Karthik
Share existing ontologies from Vanderbilt (Hua) and Regenstrief (Jon)
Share strategies for combining data from different searches - Jon
Report on WG for commenting - Hua
Wrappers for cTAKES and MetaMap - Min
Improvements to search engine setup using MT samples - Min
Textual Data Representation - Discussion
NLP system evaluation across different sites - Discussion