Ensemble-based wrapper methods for feature selection and class imbalance learning

Pengyi Yang, Wei Liu, Bing B. Zhou, Sanjay Chawla, Albert Y. Zomaya

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

50 Scopus citations

Abstract

The wrapper feature selection approach is useful in identifying informative feature subsets from high-dimensional datasets. Typically, an inductive algorithm "wrapped" in a search algorithm is used to evaluate the merit of the selected features. However, significant bias may be introduced when dealing with highly imbalanced datasets. That is, the selected features may favour one class while being less useful to the other class. In this paper, we propose an ensemble-based wrapper approach for feature selection from data with highly imbalanced class distribution. The key idea is to create multiple balanced datasets from the original imbalanced dataset via sampling, and subsequently evaluate feature subsets using an ensemble of base classifiers, each trained on a balanced dataset. The proposed approach provides a unified framework that incorporates ensemble feature selection and multiple sampling in a mutually beneficial way. The experimental results indicate that, overall, features selected by the ensemble-based wrapper are significantly better than those selected by wrappers with a single inductive algorithm in imbalanced data classification.
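The key idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: random undersampling of the majority class as the sampling step, a nearest-centroid rule as the base classifier, greedy forward selection as the search algorithm, balanced accuracy on the training rows as the merit score, and all function names and toy data.

```python
import random
from statistics import mean

def balanced_samples(y, n_sets, rng):
    """Build n_sets balanced index lists by undersampling the majority class."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return [minority + rng.sample(majority, len(minority)) for _ in range(n_sets)]

def fit_centroids(X, y, idx, feats):
    """Hypothetical base classifier: per-class centroids over the chosen features."""
    cents = {}
    for c in (0, 1):
        rows = [X[i] for i in idx if y[i] == c]
        cents[c] = [mean(r[f] for r in rows) for f in feats]
    return cents

def predict(cents, row, feats):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, cents[c]))
    return min((0, 1), key=dist)

def ensemble_score(X, y, feats, samples):
    """Balanced accuracy of a majority-vote ensemble, one model per balanced sample."""
    models = [fit_centroids(X, y, idx, feats) for idx in samples]
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for row, label in zip(X, y):
        votes = sum(predict(m, row, feats) for m in models)
        pred = 1 if 2 * votes > len(models) else 0
        totals[label] += 1
        hits[label] += pred == label
    return mean(hits[c] / totals[c] for c in (0, 1))

def forward_select(X, y, n_sets=5, seed=0):
    """Greedy forward search; stops when no candidate strictly improves the score."""
    rng = random.Random(seed)
    samples = balanced_samples(y, n_sets, rng)
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(ensemble_score(X, y, selected + [f], samples), f) for f in remaining]
        score, f = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

# Toy data: 10 minority rows (label 1) and 90 majority rows (label 0).
# Feature 0 separates the classes; feature 1 is class-independent noise.
y = [1] * 10 + [0] * 90
X = [[(5.0 if label else 0.0) + (i % 5) * 0.1, (i * 37 % 11) / 10]
     for i, label in enumerate(y)]
selected, score = forward_select(X, y)
print(selected, score)  # prints: [0] 1.0
```

Scoring on the training rows keeps the sketch short; a practical wrapper would more likely score each candidate subset with cross-validation or a held-out split.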

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 17th Pacific-Asia Conference, PAKDD 2013, Proceedings
Pages: 544-555
Number of pages: 12
Edition: PART 1
DOIs
State: Published - 2013
Externally published: Yes
Event: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013 - Gold Coast, QLD, Australia
Duration: Apr 14 2013 to Apr 17 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 7818 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
Country/Territory: Australia
City: Gold Coast, QLD
Period: 04/14/13 to 04/17/13
