Software metrics collected during project development play a critical role in software quality assurance. Software practitioners are keen to learn which metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of them. Minimal attention has been devoted to finding the minimum set of software metrics that has the same predictive capability as a larger set of metrics; we strive to address that problem in this paper. We present a comprehensive comparison between seven commonly used filter-based feature ranking techniques (FRT) and our proposed hybrid feature selection (HFS) technique. Our case study consists of a very high-dimensional (42 software attributes) software measurement data set obtained from a large telecommunications system. The empirical analysis indicates that HFS performs better than FRT; however, the Kolmogorov-Smirnov feature ranking technique demonstrates competitive performance. For the telecommunications system, it is found that only 10% of the software attributes are sufficient for effective software quality prediction.
Artificial Intelligence and Robotics | Databases and Information Systems | Other Computer Sciences
Recommended Repository Citation
Wang, Huanjing; Khoshgoftaar, Taghi M.; and Gao, Kehan. (2009). High-Dimensional Software Engineering Data and Feature Selection. 2009 21st IEEE International Conference on Tools with Artificial Intelligence, 83-90.
Available at: http://digitalcommons.wku.edu/comp_sci/3