Text Document Clustering: The Application of Cluster Analysis to Textual Document

Venkata Srikanth Reddy, Patrick Kinnicutt, Roger Lee

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

7 Scopus citations

Abstract

Gathering the data most relevant to one's needs from the huge collection of data on the internet is a very difficult task. To make it easier, we propose an application called text clustering, which automatically groups text documents into clusters so that documents within a cluster are similar to one another but dissimilar to documents in other clusters. Most existing text clustering algorithms use the traditional vector space model, which treats a document as a group of words and ignores word sequences, even though the meaning of natural language depends strongly on them. Our first objective is to implement, in Java, a clustering algorithm named Clustering based on Frequent Word Sequences. Frequent word sequences can provide compact and valuable information about the text documents. Our second objective is to use an association rule miner [13] to find the frequent two-word sets that satisfy the minimum support, using the Apriori algorithm [2,5]. Our results show that the final compact documents are more accurate and precise than documents produced by the regular method.
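The second objective, finding frequent two-word sets that meet a minimum support, can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' Java implementation. The function name `frequent_two_word_sets`, the whitespace tokenizer, the sample documents, and the definition of support as the fraction of documents containing both words are all assumptions made for illustration. The sketch covers only the first two Apriori passes: it counts single words, then counts only those pairs whose members are both frequent.

```python
from collections import Counter
from itertools import combinations


def frequent_two_word_sets(documents, min_support):
    """Return {frozenset({w1, w2}): support} for word pairs meeting min_support.

    Support is assumed here to be the fraction of documents containing both words.
    """
    n_docs = len(documents)
    if n_docs == 0:
        return {}

    # Treat each document as a set of lowercase tokens (naive whitespace split).
    doc_words = [set(doc.lower().split()) for doc in documents]

    # Pass 1: count single words and keep those that meet the minimum support.
    word_counts = Counter(w for words in doc_words for w in words)
    frequent_words = {w for w, c in word_counts.items() if c / n_docs >= min_support}

    # Pass 2: Apriori pruning, since a pair can be frequent only if both words are.
    pair_counts = Counter()
    for words in doc_words:
        candidates = sorted(words & frequent_words)
        for pair in combinations(candidates, 2):
            pair_counts[frozenset(pair)] += 1

    return {p: c / n_docs for p, c in pair_counts.items() if c / n_docs >= min_support}


# Hypothetical sample documents, for illustration only.
docs = [
    "text clustering groups similar documents",
    "document clustering uses word sequences",
    "frequent word sequences describe documents",
    "text clustering uses frequent word sequences",
]
result = frequent_two_word_sets(docs, min_support=0.5)
for pair, support in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), support)
```

Because the pairs are unordered sets, this sketch ignores word order; the frequent word sequences named in the first objective would need an order-preserving representation instead.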

Original language: English
Title of host publication: Proceedings - 2016 International Conference on Computational Science and Computational Intelligence, CSCI 2016
Editors: Mary Yang, Hamid R. Arabnia, Leonidas Deligiannidis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1174-1179
Number of pages: 6
ISBN (Electronic): 9781509055104
State: Published - Mar 17 2017
Event: 2016 International Conference on Computational Science and Computational Intelligence, CSCI 2016 - Las Vegas, United States
Duration: Dec 15 2016 to Dec 17 2016

Publication series

Name: Proceedings - 2016 International Conference on Computational Science and Computational Intelligence, CSCI 2016

Conference

Conference: 2016 International Conference on Computational Science and Computational Intelligence, CSCI 2016
Country/Territory: United States
City: Las Vegas
Period: 12/15/16 to 12/17/16

Keywords

  • apriori algorithm
  • clustering
  • efficiency
  • group of words
  • space model
  • text
  • word sequence
