TY - GEN
T1 - Document classification efficiency of phrase-based techniques
AU - Kapalavayi, Nagesh
AU - Murthy, S. N. Jayaram
AU - Hu, Gongzhu
PY - 2009
Y1 - 2009
N2 - Due to the exponential growth of available text documents in digital form, it is of great importance to develop techniques for automatic document classification based on the textual contents. Earlier document classification techniques have used keyword-based features and related statistics to achieve good results when applied to certain datasets. More recently, some of these techniques have been extended to include phrase-based and concept-based features to achieve better results. Since the characteristics of data sets used by each of these research groups are remarkably different, it is not possible to compare the efficiency of these methods. In this paper, we present a study that uses the same data set to compare efficiency of a phrase-based technique with key-word based techniques. Results prove conclusively that use of phrase-based features is very effective in document classification.
AB - Due to the exponential growth of available text documents in digital form, it is of great importance to develop techniques for automatic document classification based on the textual contents. Earlier document classification techniques have used keyword-based features and related statistics to achieve good results when applied to certain datasets. More recently, some of these techniques have been extended to include phrase-based and concept-based features to achieve better results. Since the characteristics of data sets used by each of these research groups are remarkably different, it is not possible to compare the efficiency of these methods. In this paper, we present a study that uses the same data set to compare efficiency of a phrase-based technique with key-word based techniques. Results prove conclusively that use of phrase-based features is very effective in document classification.
KW - Document classification
KW - Keyword-based and phrase-based features
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=70349907724&partnerID=8YFLogxK
U2 - 10.1109/AICCSA.2009.5069321
DO - 10.1109/AICCSA.2009.5069321
M3 - Conference contribution
AN - SCOPUS:70349907724
SN - 9781424438068
T3 - 2009 IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2009
SP - 174
EP - 178
BT - 2009 IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2009
T2 - 7th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA-2009
Y2 - 10 May 2009 through 13 May 2009
ER -