The presence of irrelevant or redundant information in the training data set is one factor that affects the success of machine learning. Filter-based feature ranking techniques (rankers) rank features according to their relevance to the target attribute, and the most relevant features are then selected to build classification models. A common way to evaluate the effectiveness of different feature ranking techniques is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Because a single performance metric usually captures only one specific aspect of classification performance, it may not reflect how the models behave from other perspectives. Moreover, there is no general consensus among researchers and practitioners regarding which performance metrics should be used to evaluate classification performance. In this study, we investigated six filter-based feature ranking techniques and built classification models using five different classifiers. The models were evaluated using eight different performance metrics. All experiments were conducted on four imbalanced data sets from a telecommunications software system. The experimental results demonstrate that the choice of performance metric may significantly influence the conclusions drawn from the evaluation. For example, one ranker may outperform another under a given performance metric, while under a different performance metric the ordering may be reversed. We found five distinct patterns when using the eight performance metrics to order the six feature ranking techniques.
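
The reversal described above can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and are not results from this study, and the three metrics shown (accuracy, F-measure, geometric mean) are chosen only for illustration; the names `ranker_A`, `ranker_B`, `metrics`, and `order_by` are likewise assumptions. On an imbalanced data set, a model that rarely predicts the minority class can score higher on accuracy while scoring lower on metrics that weigh minority-class performance.

```python
from math import sqrt

def metrics(tp, fn, fp, tn):
    """Compute three common metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    g_mean = sqrt(recall * specificity)
    return {"accuracy": accuracy, "f_measure": f_measure, "g_mean": g_mean}

# Hypothetical outcomes for models built on the features chosen by two
# rankers: 1000 instances, of which 50 (5%) belong to the positive class.
results = {
    "ranker_A": metrics(tp=10, fn=40, fp=5, tn=945),   # conservative model
    "ranker_B": metrics(tp=40, fn=10, fp=80, tn=870),  # finds more positives
}

def order_by(metric):
    """Return ranker names sorted from best to worst on the given metric."""
    return sorted(results, key=lambda name: results[name][metric], reverse=True)

for metric in ("accuracy", "f_measure", "g_mean"):
    print(metric, order_by(metric))
```

With these made-up counts, `ranker_A` comes first on accuracy, whereas `ranker_B` comes first on both F-measure and geometric mean, so the conclusion about which ranker is better depends on the metric used.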


Artificial Intelligence and Robotics | Databases and Information Systems | Other Computer Sciences