κ-means-: A unified approach to clustering and outlier detection

Sanjay Chawla, Aristides Gionisy

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

136 Scopus citations

Abstract

We present a unified approach for simultaneously clustering and discovering outliers in data. Our approach is formalized as a generalization of the k-means problem. We prove that the problem is NP-hard and then present a practical polynomial time algorithm, which is guaranteed to converge to a local optimum. Furthermore we extend our approach to all distance measures that can be expressed in the form of a Bregman divergence. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and the utility of carrying out both clustering and outlier detection in a concurrent manner. In particular on the famous KDD cup network-intrusion dataset, we were able to increase the precision of the outlier detection task by nearly 100% compared to the classical nearest-neighbor approach.

Original languageEnglish
Title of host publicationProceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013
EditorsJoydeep Ghosh, Zoran Obradovic, Jennifer Dy, Zhi-Hua Zhou, Chandrika Kamath, Srinivasan Parthasarathy
PublisherSiam Society
Pages189-197
Number of pages9
ISBN (Electronic)9781611972627
DOIs
StatePublished - 2013
Externally publishedYes
EventSIAM International Conference on Data Mining, SDM 2013 - Austin, United States
Duration: May 2 2013May 4 2013

Publication series

NameProceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013

Conference

ConferenceSIAM International Conference on Data Mining, SDM 2013
Country/TerritoryUnited States
CityAustin
Period05/2/1305/4/13

Fingerprint

Dive into the research topics of 'κ-means-: A unified approach to clustering and outlier detection'. Together they form a unique fingerprint.

Cite this