"A Novel Dataset-Similarity-Aware Approach for Evaluating Stability of " by Taghi M. Khoshgoftaar, Huanjing Wang et al.

Computer Science Faculty Publications

Title

A Novel Dataset-Similarity-Aware Approach for Evaluating Stability of Software Metric Selection Techniques

Authors

Taghi M. Khoshgoftaar, Florida Atlantic University
Huanjing Wang, Western Kentucky University
Randall Wald, Florida Atlantic University
Amri Napolitano

Abstract

Software metric (feature) selection is an important preprocessing step before building software defect prediction models. Although much research has analyzed the classification performance of feature selection methods, fewer works have focused on their stability (robustness). Stability is important because feature selection methods that reliably produce the same results despite changes to the data are more trustworthy. Of the papers studying stability, most either compare the features chosen from different random subsamples of the dataset or compare each random subsample with the original dataset. The first approach leaves the degree of overlap between the subsamples unknown, and the second compares datasets of different sizes. In this work, we propose a fixed-overlap partition algorithm which generates a pair of subsets with the same number of instances and a specified degree of overlap. We empirically evaluate the stability of 19 feature selection methods in terms of degree of overlap and feature subset size using 16 real software metrics datasets. The consistency index is used as the stability measure, and we show that RF is the most stable filter. Results also show that degree of overlap and feature subset size do affect the stability of feature selection methods.
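
The two ideas named in the abstract, a pair of equal-sized subsets with a specified overlap and a consistency index over the selected feature subsets, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the parameters (`subset_size`, `overlap`, `seed`), and the choice to define overlap as the fraction of each subset that is shared are assumptions made for illustration. The index is assumed to take the commonly used form (r·n − k²) / (k·(n − k)), where r is the number of features the two selections share, k is the size of each selection, and n is the total number of features.

```python
import random


def fixed_overlap_pair(n_instances, subset_size, overlap, seed=None):
    """Draw two index lists of equal size that share a fixed number of indices.

    Hypothetical helper: `overlap` is taken to be the fraction of each
    subset that also appears in the other subset (an assumption).
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must lie in [0, 1]")
    shared = round(overlap * subset_size)   # instances common to both subsets
    unique = subset_size - shared           # instances exclusive to each subset
    needed = shared + 2 * unique            # distinct instances required in total
    if needed > n_instances:
        raise ValueError("dataset too small for this size and overlap")
    rng = random.Random(seed)
    picked = rng.sample(range(n_instances), needed)  # sampled without replacement
    common = picked[:shared]
    first = common + picked[shared:shared + unique]
    second = common + picked[shared + unique:]
    return first, second


def consistency_index(selected_a, selected_b, n_features):
    """Consistency index between two equal-sized feature subsets.

    Assumed form: (r * n - k^2) / (k * (n - k)).
    """
    a, b = set(selected_a), set(selected_b)
    k = len(a)
    if len(b) != k:
        raise ValueError("both feature subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)                          # features chosen in both runs
    return (r * n_features - k * k) / (k * (n_features - k))


if __name__ == "__main__":
    first, second = fixed_overlap_pair(100, 40, 0.5, seed=0)
    print(len(first), len(second), len(set(first) & set(second)))
    print(consistency_index({1, 2, 3}, {1, 2, 4}, n_features=10))
```

In a stability experiment along these lines, a feature selection method would be run on each of the two instance subsets, and the resulting feature subsets would be compared with `consistency_index`. Identical selections score 1, and lower values indicate less agreement.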

Disciplines

Computer Engineering | Computer Sciences | Engineering | Physical Sciences and Mathematics

Recommended Repository Citation

Khoshgoftaar, Taghi M.; Wang, Huanjing; Wald, Randall; and Napolitano, Amri. (2012). A Novel Dataset-Similarity-Aware Approach for Evaluating Stability of Software Metric Selection Techniques. Proceedings of the 13th IEEE International Conference on Information Reuse and Integration.
Available at: https://digitalcommons.wku.edu/comp_sci/5

Included in

Computer Engineering Commons, Computer Sciences Commons
