ABSTRACT
A new mathematical approach to defining molecular descriptors (MDs) is proposed, based on concepts from information theory. The approach stems from a new matrix representation of a molecular graph (G), derived by generalizing the incidence matrix so that its rows correspond to connected subgraphs of G. From this matrix we calculate Shannon's entropy, the negentropy and the standardized information content and, for the first time, mutual, conditional and joint entropy-based MDs associated with G. We also define strategies that generalize the definition of global or local invariants from atomic contributions (local vertex invariants, LOVIs), introducing related metrics (norms), means and statistical invariants. These invariants are applied to a vector whose components express the atomic information content calculated using the Shannon, mutual, conditional and joint entropy-based atomic information indices. The novel information indices (IFIs) are implemented in the program TOMOCOMD-CARDD. A principal component analysis reveals that the novel IFIs capture structural information not codified by the IFIs implemented in the software DRAGON. A comparative study of the parameters used to define these IFIs (e.g. subgraph orders and/or types, invariants and classes of MDs) reveals several interesting results. Among the classes of indices investigated in this study, the mutual entropy-based indices give the best correlations in modeling a physicochemical property, namely the partition coefficient of 34 derivatives of 2-furylethylenes. A comparison with classical MDs demonstrates that the new IFIs give good results for various QSPR models.
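As an illustration of the idea summarized above, the following minimal Python sketch builds a generalized incidence matrix for a toy graph and derives entropy-type indices from it. The abstract does not give the formulas, so everything specific here is an assumption: rows are taken to be connected atom subsets up to a chosen size, the probability of each atom is taken to be its share of all subgraph memberships, negentropy is taken as the maximum entropy minus the observed entropy, and the standardized information content as the observed entropy divided by the maximum. The function names (`is_connected`, `generalized_incidence`, `shannon_indices`) are hypothetical and are not part of TOMOCOMD-CARDD.

```python
import math
from itertools import combinations


def is_connected(atoms, bonds):
    """Return True if the atom subset induces a connected subgraph."""
    atoms = set(atoms)
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in bonds:
            # Treat each bond as undirected.
            for v, w in ((a, b), (b, a)):
                if v == u and w in atoms and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return seen == atoms


def generalized_incidence(n_atoms, bonds, max_size):
    """Rows: connected atom subsets of size 1..max_size; columns: atoms (0/1)."""
    rows = []
    for k in range(1, max_size + 1):
        for subset in combinations(range(n_atoms), k):
            if is_connected(subset, bonds):
                rows.append([1 if i in subset else 0 for i in range(n_atoms)])
    return rows


def shannon_indices(Q):
    """Entropy-type indices from column sums of Q (assumed definitions)."""
    n = len(Q[0])
    counts = [sum(row[i] for row in Q) for i in range(n)]
    total = sum(counts)
    probs = [c / total for c in counts]
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    h_max = math.log2(n)  # entropy of a uniform distribution over n atoms
    return {
        "entropy": h,
        "negentropy": h_max - h,      # assumption: H_max - H
        "standardized": h / h_max,    # assumption: H / H_max
    }


# Toy example: a three-atom chain 0-1-2 (hypothetical, not from the study).
Q = generalized_incidence(3, [(0, 1), (1, 2)], max_size=3)
indices = shannon_indices(Q)
```

In this toy chain the central atom takes part in more connected subgraphs than the terminal atoms, so the membership distribution is not uniform and the entropy falls slightly below its maximum.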
Subjects
Entropy, Ethylenes/chemistry, Pharmaceutical Preparations/chemistry, Drug Design, Chemical Models, Principal Component Analysis, Quantitative Structure-Activity Relationship
ABSTRACT
In this report, we present a new mathematical approach for describing the chemical structures of organic molecules at the atomic-molecular level, proposing for the first time the use of the derivative ([Formula: see text]) of a molecular graph (MG) with respect to a given event (E) to obtain a new family of molecular descriptors (MDs). For this purpose, a new matrix representation of the MG is introduced, which generalizes the traditional incidence matrix of graph theory. This matrix, termed the generalized incidence matrix, Q, arises from the Boolean representation of the molecular sub-graphs that participate in the formation of the molecular skeleton MG. It can be complete (representing all possible connected sub-graphs), or it can be restricted to sub-graphs of given orders or types, or to a combination of these. The Q matrix is non-square and unsymmetrical; its columns (n) are the conditions (letters) and its rows (m) are the collections of conditions (words) with which the event occurs. By algebraic manipulation, this matrix is transformed into a square, symmetric matrix known as the relations frequency matrix, F, which characterizes the intensity with which the conditions (letters) participate in the events (words). With F, we calculate the derivative over a pair of atomic nuclei. The local index for atomic nucleus i, Δ(i), can therefore be obtained as a linear combination of the pair derivatives of nucleus i with all the other atomic nuclei j. We also define new strategies that generalize the current way of obtaining global or local (group or atom-type) invariants from atomic contributions (local vertex invariants, LOVIs). To this end, metrics (norms), means and statistical invariants are introduced. These invariants are applied to a vector whose components are the Δ(i) values for the atomic nuclei of the molecule or its fragments.
Moreover, to differentiate among atoms of different types, an atomic weighting scheme (atom-type labels) is used either in the formation of the matrix Q or at the LOVI level. The resulting indices were used to describe the partition coefficient (Log P) and the reactivity index (Log K) of 34 derivatives of 2-furylethylenes. In all cases, our MDs showed better statistical results than those previously obtained with some of the most widely used families of MDs in chemometric practice. It has therefore been demonstrated that the proposed MDs are useful in molecular design and yield simpler and more robust mathematical models than the majority of those reported in the literature. This range of possibilities opens the door to the creation of a new family of MDs based on the graph derivative, and provides a new tool for QSAR/QSPR and molecular diversity/similarity studies.
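The passage from Q to F and the aggregation of local values into global invariants can be sketched as follows. The abstract does not specify the algebraic manipulation or the derivative formula, so this sketch makes two labeled assumptions: F is taken as the product of the transpose of Q with Q (so that each entry counts the sub-graphs in which two atoms co-occur), and the row sums of F are used only as stand-in local values, not as the authors' Δ(i). The function names `relations_frequency` and `aggregate` are hypothetical.

```python
import math
import statistics


def relations_frequency(Q):
    """Assumed form F = Q^T Q: F[i][j] counts rows of Q containing atoms i and j."""
    n = len(Q[0])
    return [
        [sum(row[i] * row[j] for row in Q) for j in range(n)]
        for i in range(n)
    ]


def aggregate(lovis):
    """Collapse a vector of local values into a few global invariants."""
    return {
        "manhattan_norm": sum(abs(x) for x in lovis),
        "euclidean_norm": math.sqrt(sum(x * x for x in lovis)),
        "arithmetic_mean": statistics.mean(lovis),
        "variance": statistics.pvariance(lovis),
    }


# Rows: all connected atom subsets of a three-atom chain 0-1-2 (toy example).
Q = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]
F = relations_frequency(Q)

# Stand-in local values (row sums of F); NOT the derivative-based index of the source.
lovis = [sum(row) for row in F]
summary = aggregate(lovis)
```

Under this assumption F is square and symmetric by construction, which matches the properties stated for the relations frequency matrix. Any other vector of atomic contributions could be passed to `aggregate` in the same way.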