TY - GEN
T1 - Class confidence weighted kNN algorithms for imbalanced data sets
AU - Liu, Wei
AU - Chawla, Sanjay
PY - 2011
Y1 - 2011
AB - In this paper, a novel k-nearest neighbors (kNN) weighting strategy is proposed for handling the problem of class imbalance. When dealing with highly imbalanced data, a salient drawback of existing kNN algorithms is that the class with more frequent samples tends to dominate the neighborhood of a test instance regardless of the distance measure, which leads to suboptimal classification performance on the minority class. To solve this problem, we propose CCW (class confidence weights), which uses the probability of attribute values given class labels to weight prototypes in kNN. The main advantage of CCW is that it is able to correct the inherent bias toward the majority class in existing kNN algorithms under any distance measure. Theoretical analysis and comprehensive experiments confirm our claims.
UR - http://www.scopus.com/inward/record.url?scp=79957967238&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-20847-8_29
DO - 10.1007/978-3-642-20847-8_29
M3 - Conference contribution
AN - SCOPUS:79957967238
SN - 9783642208461
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 345
EP - 356
BT - Advances in Knowledge Discovery and Data Mining - 15th Pacific-Asia Conference, PAKDD 2011, Proceedings
PB - Springer Verlag
ER -