Minutes_Meeting_10072015

Attendees

Hua Xu, Jon Duke, Noemie Elhadad, Anupama Gururaj, Alexandre Yahi, Thomas Ginter, Olga Patterson, George Hripcsak, Vojtech Huser

Agenda

  1. IRB for use of clinical text
  2. Clinical text data storage and representation schema
    1. Presentation by Dr. Noemie Elhadad, Title: NLP schemas and clinical NLP tools in ShARe
    2. Discussion – Next steps
  3. NLP tools/pipelines for ETL
  4. Use cases, e.g., phenotyping for cohort selection using NLP outputs
    1. Presentation by Dr. Jon Duke, Title: Regenstrief NLP platform and approach to validation of phenotypes
    2. Discussion – Next steps
  5. Discussion

Minutes

  1. IRB for use of clinical text
  2. Clinical text data storage and representation schema
    1. Presentation by Dr. Noemie Elhadad, Title: NLP schemas and clinical NLP tools in ShARe
      1. Unstructured text can be converted into structured data, a bag of words, or word embeddings. Structured data and bag of words are the most useful in the current context.
      2. The ShARe schema for structured output combines several initiatives, such as SHARP and THYME.
    2. Discussion – Next steps
      1. Table structure for storing concept level NLP outputs to be determined
      2. It is sufficient to start with structured output
      3. A concept table should be generated, with a concept ID in each row along with the note IDs
      4. The OMOP vocabulary is to be used to aggregate concepts to a higher level, in order to manage and condense the number of concepts
      5. The next step is to go through all the columns exhaustively for all attributes, merge them, and then decide which attributes should be used in the table
  3. NLP tools/pipelines for ETL
  4. Use cases, e.g., phenotyping for cohort selection using NLP outputs
    1. Presentation by Dr. Jon Duke, Title: Regenstrief NLP platform and approach to validation of phenotypes
      1. The NLP platform is composed of a state machine with a regex-based system
      2. The NLP data analysis tool is currently being used for data analytics at Regenstrief
      3. The tool has text search capabilities and was demonstrated at the meeting
    2. Discussion – Next steps
      1. Need to determine whether the API for keyword search, based on Solr or Elasticsearch, can be shared
  5. Discussion
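
Illustration for item 2 (Discussion – Next steps): the sketch below shows one possible shape for the concept table, with one entry per concept ID and its note IDs, and how concepts might then be rolled up to a higher level. It is only a sketch under stated assumptions, not an agreed design. The table structure was left open at the meeting, all IDs are made up, and the `rollup` map is a hypothetical stand-in for whatever hierarchy the OMOP vocabulary would supply.

```python
from collections import defaultdict

# Hypothetical concept-level NLP output: one row per (note, concept) mention.
# All IDs below are made up for illustration; they are not real OMOP concept IDs.
nlp_rows = [
    {"note_id": 1, "concept_id": 101},
    {"note_id": 1, "concept_id": 102},
    {"note_id": 2, "concept_id": 101},
    {"note_id": 3, "concept_id": 103},
]

# Hypothetical roll-up map (lower-level concept -> higher-level concept),
# standing in for the hierarchy that the OMOP vocabulary would provide.
rollup = {101: 10, 102: 10, 103: 20}


def build_concept_table(rows):
    """Return {concept_id: sorted list of note_ids}, one entry per concept."""
    table = defaultdict(set)
    for row in rows:
        table[row["concept_id"]].add(row["note_id"])
    return {cid: sorted(notes) for cid, notes in table.items()}


def aggregate(concept_table, mapping):
    """Merge concepts that share a higher-level concept in `mapping`."""
    merged = defaultdict(set)
    for cid, notes in concept_table.items():
        # Concepts without a mapping are kept under their own ID.
        merged[mapping.get(cid, cid)].update(notes)
    return {cid: sorted(notes) for cid, notes in merged.items()}


concept_table = build_concept_table(nlp_rows)
aggregated = aggregate(concept_table, rollup)
print(concept_table)
print(aggregated)
```

In this toy data, three extracted concepts condense to two higher-level concepts after the roll-up, which is the kind of reduction the aggregation step is meant to achieve. Attribute columns are left out because the set of attributes is still to be decided.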