Classifying disease outbreak reports using n-grams and semantic features

Mike Conway, Son Doan, Ai Kawazoe, Nigel Collier

Research output: Contribution to conference › Paper › peer-review

8 Scopus citations

Abstract

This paper explores the benefits of using n-grams and semantic features for classifying disease outbreak reports in the context of BioCaster, a text mining system that identifies and tracks emerging infectious disease outbreaks from online news. We show that combining bag-of-words features, n-grams and semantic features, in conjunction with feature selection, improves classification accuracy at a statistically significant level compared with previous work. A novel aspect of the work reported in this paper is the use of a semantic tagger, the USAS tagger, to generate features.
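As a rough illustration of the feature types the abstract names, the sketch below builds bag-of-words, word n-gram and semantic-category features for a document, then ranks them with a chi-square score. It is not the BioCaster or paper implementation. Several things are assumptions: the tiny `TOY_SEMANTIC_TAGS` dictionary is a hypothetical stand-in for real USAS tagger output, chi-square is only one possible feature-selection method (the abstract does not say which one was used), and all function names are illustrative.

```python
from collections import Counter

# Hypothetical stand-in for semantic tagger output: a tiny word -> category map.
# The real USAS tagger assigns its own semantic field tags; these labels are
# illustrative only.
TOY_SEMANTIC_TAGS = {
    "outbreak": "DISEASE_EVENT",
    "cholera": "DISEASE",
    "influenza": "DISEASE",
    "patients": "MEDICAL_PERSON",
    "hospital": "MEDICAL_PLACE",
}

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def extract_features(text, n=2, tags=TOY_SEMANTIC_TAGS):
    """Combine bag-of-words, word n-gram and semantic-category counts."""
    tokens = tokenize(text)
    features = Counter()
    # Bag-of-words (unigram) features.
    for tok in tokens:
        features["w=" + tok] += 1
    # Contiguous word n-grams of length 2..n.
    for size in range(2, n + 1):
        for i in range(len(tokens) - size + 1):
            features["ng=" + "_".join(tokens[i:i + size])] += 1
    # Semantic features: count each category assigned to a token.
    for tok in tokens:
        if tok in tags:
            features["sem=" + tags[tok]] += 1
    return features


def chi_square_select(docs, labels, k):
    """Return the k features with the highest chi-square score (binary presence).

    docs: list of feature Counters; labels: list of bools (True = outbreak report).
    Ties are broken alphabetically so the result is deterministic.
    """
    n = len(docs)
    n_pos = sum(labels)
    vocab = set()
    for d in docs:
        vocab.update(d)
    scores = {}
    for feat in vocab:
        a = sum(1 for d, y in zip(docs, labels) if feat in d and y)      # present, positive
        b = sum(1 for d, y in zip(docs, labels) if feat in d and not y)  # present, negative
        c = n_pos - a                                                    # absent, positive
        dd = (n - n_pos) - b                                             # absent, negative
        denom = (a + c) * (b + dd) * (a + b) * (c + dd)
        scores[feat] = n * (a * dd - b * c) ** 2 / denom if denom else 0.0
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Usage on four made-up sentences (two outbreak-like, two not).
docs = [extract_features(t) for t in [
    "Cholera outbreak reported; 40 patients in hospital.",
    "Influenza outbreak spreads to a second hospital.",
    "Stock markets rallied on Monday.",
    "The football match ended in a draw.",
]]
labels = [True, True, False, False]
print(chi_square_select(docs, labels, k=3))
# prints ['sem=DISEASE', 'sem=DISEASE_EVENT', 'sem=MEDICAL_PLACE']
```

In this toy example the semantic categories generalise across different disease names ("cholera", "influenza"), which is the kind of benefit semantic features are meant to provide over surface words alone.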

Original language: English
Pages: 29-36
Number of pages: 8
State: Published - 2008
Externally published: Yes
Event: 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Turku, Finland
Duration: Sep 1, 2008 to Sep 3, 2008

Conference

Conference: 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008
Country/Territory: Finland
City: Turku
Period: 09/1/08 to 09/3/08
