Abstract
This paper explores the benefits of using n-grams and semantic features for classifying disease outbreak reports, in the context of BioCaster, a text mining system that identifies and tracks emerging infectious disease outbreaks from online news. We show that combining bag-of-words features, n-grams and semantic features with feature selection improves classification accuracy at a statistically significant level compared to previous work. A novel aspect of the work reported in this paper is the use of a semantic tagger, the USAS tagger, to generate features.
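As a rough illustration of the kind of feature pipeline the abstract describes, the sketch below builds bag-of-words and word n-gram features and then applies a simple feature selection step. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy sentences, and the document-frequency threshold are all hypothetical, and the abstract does not say which feature selection method was used. Semantic features are left out because the sketch assumes no particular interface to the USAS tagger.

```python
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return lowercase word n-grams for n = 1..n_max (n = 1 is bag-of-words)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def select_features(documents, n_max=2, min_doc_freq=2):
    """Keep n-grams that occur in at least min_doc_freq documents.

    Document-frequency thresholding is a stand-in chosen for brevity; the
    abstract does not state which feature selection method the paper uses.
    """
    doc_freq = Counter()
    for doc in documents:
        # Count each n-gram at most once per document.
        doc_freq.update(set(word_ngrams(doc, n_max)))
    return sorted(g for g, df in doc_freq.items() if df >= min_doc_freq)


def vectorize(text, vocabulary, n_max=2):
    """Map a document to a count vector over the selected vocabulary."""
    counts = Counter(word_ngrams(text, n_max))
    return [counts[g] for g in vocabulary]


# Toy, invented sentences (not data from the paper).
docs = [
    "new cholera outbreak reported in the region",
    "cholera outbreak spreads to a second city",
    "health ministry opens a new vaccine factory",
]
vocab = select_features(docs)
vector = vectorize(docs[0], vocab)
```

The resulting count vectors could be passed to any standard text classifier; tags produced by a semantic tagger could be appended as additional tokens before the n-gram step, although that wiring is an assumption rather than something the abstract specifies.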
Original language | English |
---|---|
Pages | 29-36 |
Number of pages | 8 |
State | Published - 2008 |
Externally published | Yes |
Event | 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 - Turku, Finland. Duration: Sep 1 2008 → Sep 3 2008 |
Conference
Conference | 3rd International Symposium on Semantic Mining in Biomedicine, SMBM 2008 |
---|---|
Country/Territory | Finland |
City | Turku |
Period | Sep 1 2008 → Sep 3 2008 |