Document classification efficiency of phrase-based techniques

Nagesh Kapalavayi, S. N. Jayaram Murthy, Gongzhu Hu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

3 Scopus citations

Abstract

Due to the exponential growth of available text documents in digital form, it is of great importance to develop techniques for automatic document classification based on the textual contents. Earlier document classification techniques have used keyword-based features and related statistics to achieve good results when applied to certain data sets. More recently, some of these techniques have been extended to include phrase-based and concept-based features to achieve better results. Since the characteristics of the data sets used by each of these research groups are remarkably different, it is not possible to compare the efficiency of these methods. In this paper, we present a study that uses the same data set to compare the efficiency of a phrase-based technique with keyword-based techniques. Results prove conclusively that the use of phrase-based features is very effective in document classification.

Original language: English
Title of host publication: 2009 IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2009
Pages: 174-178
Number of pages: 5
State: Published - 2009
Event: 7th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA-2009 - Rabat, Morocco
Duration: May 10 2009 to May 13 2009

Publication series

Name: 2009 IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2009

Conference

Conference: 7th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA-2009
Country/Territory: Morocco
City: Rabat
Period: 05/10/09 to 05/13/09

Keywords

  • Document classification
  • Keyword-based and phrase-based features
  • Text mining
