ABSTRACT
Trade in falsified drugs has grown substantially in recent years, driven by easier access to the technologies needed to copy authentic pharmaceutical products. Attenuated Total Reflectance Fourier Transform Infrared (ATR-FTIR) spectroscopy has been successfully employed as an analytical tool to identify falsified products and to support law enforcement agents in disrupting illegal operations. ATR-FTIR spectroscopy typically yields datasets comprising hundreds of highly correlated wavenumbers, which may compromise the performance of classical multivariate techniques used for sample classification. In this paper we propose a new wavenumber interval selection method that identifies the regions of the spectra that best discriminate samples of seized drugs into two classes, authentic or falsified. The discriminative power of each spectral region is represented by an Interval Importance Index (III) based on the two-sample Kolmogorov-Smirnov test statistic; the III is a novel contribution of this paper. The III guides an iterative forward approach for wavenumber selection, and different data mining techniques are used for sample classification. In 100 replications using the best combination of classification technique and wavenumber intervals, we obtained an average of 99.87% correct classifications on a Cialis® dataset while retaining 12.5% of the original wavenumbers, and an average of 99.43% correct classifications on a Viagra® dataset while retaining 23.75% of the original wavenumbers. Our proposition was compared with alternative approaches for individual and interval wavenumber selection available in the literature; in every comparison it yielded more consistent and more easily interpreted results.
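The interval ranking and forward selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the III is taken here as the mean two-sample Kolmogorov-Smirnov statistic over the wavenumbers in an interval, intervals are contiguous and of equal width, and the classifier is abstracted as a user-supplied callback. The names `ks_statistic`, `interval_importance`, `rank_intervals`, `forward_select`, and `accuracy_of` are hypothetical, and the spectra are toy values.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def interval_importance(authentic, falsified, start, stop):
    """Assumed III: mean KS statistic over wavenumber columns start..stop-1."""
    scores = [
        ks_statistic([row[j] for row in authentic], [row[j] for row in falsified])
        for j in range(start, stop)
    ]
    return sum(scores) / len(scores)


def rank_intervals(authentic, falsified, n_intervals):
    """Split the spectrum into contiguous intervals, sorted by decreasing III."""
    n_vars = len(authentic[0])
    bounds = [round(i * n_vars / n_intervals) for i in range(n_intervals + 1)]
    intervals = list(zip(bounds[:-1], bounds[1:]))
    return sorted(
        intervals,
        key=lambda iv: interval_importance(authentic, falsified, *iv),
        reverse=True,
    )


def forward_select(ranked, accuracy_of):
    """Add intervals in III order; keep the subset with the best accuracy.

    accuracy_of is a hypothetical callback that trains and evaluates any
    classifier on the given list of (start, stop) intervals.
    """
    best_subset, best_acc, current = [], -1.0, []
    for interval in ranked:
        current = current + [interval]
        acc = accuracy_of(current)
        if acc > best_acc:
            best_subset, best_acc = list(current), acc
    return best_subset, best_acc


# Toy spectra (rows = samples, columns = wavenumbers); values are made up.
authentic = [[0.10, 0.20, 1.0, 1.1], [0.20, 0.10, 1.2, 1.0], [0.15, 0.25, 1.1, 1.2]]
falsified = [[0.12, 0.22, 2.0, 2.1], [0.18, 0.12, 2.2, 2.0], [0.16, 0.20, 2.1, 2.2]]

ranked = rank_intervals(authentic, falsified, n_intervals=2)
# Stand-in for a real classifier: pretends columns 2-3 suffice for perfect accuracy.
subset, acc = forward_select(ranked, lambda s: 1.0 if (2, 4) in s else 0.5)
```

In the toy data only the last two columns separate the classes, so the interval covering them ranks first and the forward pass stops improving after it is added. In practice `accuracy_of` would wrap a cross-validated classifier, and the number of intervals would itself be tuned.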