ABSTRACT
Diagnosing cancer by using microarray analysis to study differential gene expression has recently been a focus of intense research. Although several sophisticated analysis tools have been developed with this aim in mind, it remains a challenge to keep these methods free of parametric adjustments while keeping them transparent to the end user. Nonparametric methods are generally associated with these two characteristics, which makes them attractive tools for microarray analysis in cancer research. In particular, diagnosing cancer via microarray analysis is an exercise in which tissue is characterized according to the differential expression levels of its genes. In this manuscript, two novel nonparametric methods for cancer diagnosis using microarray data are described, and their performance is assessed against a baseline approach that uses the Mann-Whitney test for median differences. Both methods show promising results with respect to their potential use in diagnosis.
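The Mann-Whitney baseline mentioned above can be sketched as a per-gene two-sample test that compares tumor and normal samples and then ranks genes by p-value. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the large-sample normal approximation with the standard tie correction and no continuity correction (exact p-values are preferable for very small groups), the helper `rank_genes` and the gene names in the demo are hypothetical, and it does not reproduce the two novel methods, which the abstract does not describe.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample x, two-sided p-value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U, reduced by the standard correction for tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


def rank_genes(
    expression: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> List[Tuple[str, float, float]]:
    """Hypothetical helper: test every gene and sort by ascending p-value."""
    results = [(gene, *mann_whitney_u(tumor, normal))
               for gene, (tumor, normal) in expression.items()]
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy, made-up expression values (not data from the paper).
    toy = {
        "GENE_A": ([8.1, 7.9, 8.4, 8.8], [5.2, 5.0, 4.9, 5.5]),
        "GENE_B": ([6.0, 6.3, 5.8, 6.1], [6.2, 5.9, 6.0, 6.4]),
    }
    for gene, u, p in rank_genes(toy):
        print(f"{gene}: U={u:.1f}, p={p:.3f}")
```

Strictly, the Mann-Whitney test detects a shift in location between the two groups; that shift corresponds to a median difference when both distributions have the same shape. In the toy data, GENE_A separates the groups completely and therefore ranks ahead of GENE_B.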
Subjects
Neoplasms/diagnosis, Humans, Microarray Analysis/methods, Microarray Analysis/statistics & numerical data, Neoplasms/genetics, Statistics, Nonparametric
ABSTRACT
This paper develops an algorithm that uses genetic programming (GP) and fuzzy logic to extract explanatory rules from microarray data, which we treat as time series. Reverse Polish notation (RPN) is used to describe the rules and to facilitate the GP approach. The algorithm also allows prior knowledge to be inserted, making it possible to find sets of rules that include already-known relationships between genes. The proposed algorithm is applied to problems arising in the construction of gene regulatory networks, using two different sets of real data from biological experiments on the cold response of Arabidopsis thaliana and on the rat central nervous system. The results show that the proposed technique can fit the data to a predefined precision even when the data set has thousands of features but only a limited number of time points, a situation in which traditional statistical alternatives struggle because of the scarcity of time points.
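The RPN rule representation described above can be illustrated with a minimal sketch. RPN suits GP because a rule is a flat list of tokens that a stack evaluates without parentheses, so crossover and mutation can operate on token lists. The sketch assumes that each gene token stands for a membership degree in [0, 1] (for example, the degree to which the gene's expression is "high") and that rules use the standard fuzzy operators AND (minimum), OR (maximum) and NOT (complement, 1 - a); the abstract does not state the paper's actual operator set. The fitness function `rule_error` (mean absolute error of predicting a target gene one time step ahead) and the gene names are hypothetical, and the GP search itself is not shown.

```python
from typing import Callable, Dict, List, Sequence

# Assumed fuzzy operators (min/max/complement); the paper's own operator set
# is not stated in the abstract.
BINARY: Dict[str, Callable[[float, float], float]] = {"AND": min, "OR": max}
UNARY: Dict[str, Callable[[float], float]] = {"NOT": lambda a: 1.0 - a}


def eval_rpn(tokens: Sequence[str], membership: Dict[str, float]) -> float:
    """Evaluate a rule written in RPN. Any non-operator token is a gene name
    whose value is a membership degree in [0, 1]."""
    stack: List[float] = []
    for tok in tokens:
        if tok in BINARY:
            if len(stack) < 2:
                raise ValueError(f"'{tok}' needs two operands")
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY[tok](a, b))
        elif tok in UNARY:
            if not stack:
                raise ValueError(f"'{tok}' needs one operand")
            stack.append(UNARY[tok](stack.pop()))
        else:
            stack.append(membership[tok])  # KeyError if the gene is unknown
    if len(stack) != 1:
        raise ValueError("malformed rule: leftover operands")
    return stack[0]


def rule_error(tokens: Sequence[str], series: Dict[str, Sequence[float]],
               target: str) -> float:
    """Hypothetical fitness: mean absolute error when the rule, evaluated on
    memberships at time t, predicts the target gene's membership at t + 1."""
    steps = len(series[target]) - 1
    if steps < 1:
        raise ValueError("need at least two time points")
    total = 0.0
    for t in range(steps):
        at_t = {gene: values[t] for gene, values in series.items()}
        total += abs(eval_rpn(tokens, at_t) - series[target][t + 1])
    return total / steps


if __name__ == "__main__":
    # "geneA AND NOT geneB" in RPN; gene names and values are made up.
    rule = ["geneA", "geneB", "NOT", "AND"]
    print(eval_rpn(rule, {"geneA": 0.8, "geneB": 0.3}))
```

A GP loop would generate and recombine such token lists and keep those whose error falls below the required precision; prior knowledge could be injected by seeding the population with rules that encode known gene relationships.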