Abstract

Feature selection has been applied in many domains, such as text mining and software engineering. Ideally, a feature selection technique should produce consistent outputs regardless of minor variations in the input data. Researchers have recently begun to examine the stability (robustness) of feature selection techniques. The stability of a feature selection method is defined as the degree of agreement between the outputs it produces on randomly selected subsets of the same input data. This study evaluated the stability of 11 threshold-based feature ranking techniques (rankers) when applied to 16 real-world software measurement datasets of different sizes. Experimental results demonstrate that AUC (Area Under the Receiver Operating Characteristic Curve) and PRC (Area Under the Precision-Recall Curve) performed best among the 11 rankers.
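The stability notion described above can be sketched as follows: draw several random subsets of the data, run the ranker on each, and average the pairwise agreement of the resulting top-k feature lists. This is a minimal illustration, not the paper's exact protocol; the Jaccard overlap used here as the agreement measure, the subset fraction, and the `ranker` callable are all assumptions for the sketch.

```python
import random

def top_k_jaccard(rank_a, rank_b, k):
    """Jaccard overlap of the top-k features from two rankings (1.0 = identical sets)."""
    a, b = set(rank_a[:k]), set(rank_b[:k])
    return len(a & b) / len(a | b)

def stability(ranker, data, n_subsets=10, frac=0.8, k=3, seed=0):
    """Average pairwise top-k agreement of a ranker across random subsets of the data.

    ranker: callable taking a list of instances and returning a ranked
            list of feature names (a hypothetical interface for this sketch).
    """
    rng = random.Random(seed)
    rankings = [
        ranker(rng.sample(data, int(frac * len(data))))
        for _ in range(n_subsets)
    ]
    pairs = [
        (i, j)
        for i in range(n_subsets)
        for j in range(i + 1, n_subsets)
    ]
    return sum(top_k_jaccard(rankings[i], rankings[j], k) for i, j in pairs) / len(pairs)
```

A perfectly stable ranker (one whose output never changes across subsets) scores 1.0 under this measure; rankers whose top-k lists vary from subset to subset score lower.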

Disciplines

Computer Engineering | Computer Sciences | Engineering | Physical Sciences and Mathematics
