Natural language processing in biomedicine: A unified system architecture overview

Son Doan, Mike Conway, Tu Minh Phuong, Lucila Ohno-Machado

Research output: Contribution to journal › Article › peer-review

56 Scopus citations


In contemporary electronic medical records, much of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) are not provided in structured data fields but rather are encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of unlocking this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view. The general architecture of an NLP system consists of two main components: background knowledge, which includes biomedical knowledge resources, and a framework that integrates NLP tools to process text. Systems differ in both components, which we review briefly. Additionally, the challenges facing current research efforts in biomedical NLP include the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaborative work between researchers in clinical NLP are starting to emerge.
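The two-component view described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the chapter or from any system the chapter reviews: the class names `BackgroundKnowledge` and `Framework`, the toy lexicon, and the concept labels are all hypothetical, and dictionary lookup over regex tokens stands in for the much richer tools that real biomedical NLP systems integrate.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class BackgroundKnowledge:
    """Component 1: a knowledge resource (here, a hypothetical toy lexicon)."""
    lexicon: Dict[str, str]  # lowercase surface term -> illustrative concept label

    def lookup(self, term: str) -> Optional[str]:
        # Case-insensitive exact match; returns None for unknown terms.
        return self.lexicon.get(term.lower())


@dataclass
class Framework:
    """Component 2: a framework that runs NLP steps over text."""
    knowledge: BackgroundKnowledge

    def tokenize(self, text: str) -> List[str]:
        # Simplistic tokenizer (assumption): runs of ASCII letters only.
        return re.findall(r"[A-Za-z]+", text)

    def process(self, text: str) -> List[Tuple[str, str]]:
        # Map each token to a concept label when the knowledge component knows it.
        concepts = []
        for token in self.tokenize(text):
            label = self.knowledge.lookup(token)
            if label is not None:
                concepts.append((token, label))
        return concepts


# Hypothetical lexicon entries and labels, for illustration only.
knowledge = BackgroundKnowledge(
    lexicon={
        "fever": "sign_or_symptom",
        "cough": "sign_or_symptom",
        "pneumonia": "disease",
    }
)
framework = Framework(knowledge=knowledge)
note = "Patient reports fever and cough; pneumonia suspected."
result = framework.process(note)
print(result)
```

Keeping the knowledge resource separate from the processing framework mirrors the abstract's observation that systems differ in both components: in this sketch, either one could be swapped without touching the other.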

Original language: English
Pages (from-to): 275-294
Number of pages: 20
Journal: Methods in Molecular Biology
State: Published - 2014
Externally published: Yes


  • Biomedicine
  • Electronic medical record
  • Machine learning method
  • Natural language processing
  • Rule-based learning method
  • System architecture
  • Unified Medical Language System

