TY - JOUR
T1 - Towards role-based filtering of disease outbreak reports
AU - Doan, Son
AU - Kawazoe, Ai
AU - Conway, Mike
AU - Collier, Nigel
N1 - Funding Information:
The authors thank Mika Shigematsu and Kiyosu Taniguchi at the National Institute of Infectious Diseases for useful discussions. This work was supported by Grants-in-Aid from the Japan Society for the Promotion of Science (Grant No. 18049071) and a Transdisciplinary Integration grant from ROIS.
PY - 2009/10
Y1 - 2009/10
N2 - This paper explores the role of named entities (NEs) in the classification of disease outbreak reports. In the annotation schema of BioCaster, a text mining system for public health protection, important concepts that reflect information about infectious diseases were conceptually analyzed with a formal ontological methodology and classified into types and roles. Types are specified as NE classes and roles are integrated into NEs as attributes such as a chemical and whether it is being used as a therapy for some infectious disease. We focus on the roles of NEs and explore different ways to extract, combine and use them as features in a text classifier. In addition, we investigate the combination of roles with semantic categories of disease-related nouns and verbs. Experimental results using naïve Bayes and Support Vector Machine (SVM) algorithms show that: (1) roles in combination with NEs improve performance in text classification, and (2) roles in combination with semantic categories of noun and verb features contribute substantially to the improvement of text classification. Both these results were statistically significant compared to the baseline "raw text" representation. We discuss in detail the effects of roles on each NE and on semantic categories of noun and verb features in terms of accuracy, precision/recall and F-score measures for the text classification task.
AB - This paper explores the role of named entities (NEs) in the classification of disease outbreak reports. In the annotation schema of BioCaster, a text mining system for public health protection, important concepts that reflect information about infectious diseases were conceptually analyzed with a formal ontological methodology and classified into types and roles. Types are specified as NE classes and roles are integrated into NEs as attributes such as a chemical and whether it is being used as a therapy for some infectious disease. We focus on the roles of NEs and explore different ways to extract, combine and use them as features in a text classifier. In addition, we investigate the combination of roles with semantic categories of disease-related nouns and verbs. Experimental results using naïve Bayes and Support Vector Machine (SVM) algorithms show that: (1) roles in combination with NEs improve performance in text classification, and (2) roles in combination with semantic categories of noun and verb features contribute substantially to the improvement of text classification. Both these results were statistically significant compared to the baseline "raw text" representation. We discuss in detail the effects of roles on each NE and on semantic categories of noun and verb features in terms of accuracy, precision/recall and F-score measures for the text classification task.
KW - Information extraction
KW - Named entities
KW - Semantic roles
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=66249107733&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2008.12.009
DO - 10.1016/j.jbi.2008.12.009
M3 - Article
C2 - 19171201
AN - SCOPUS:66249107733
SN - 1532-0464
VL - 42
SP - 773
EP - 780
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 5
ER -