ABSTRACT
Descriptors calculated from a specific representation scheme encode only part of the chemical information. For this reason, novel graphical representations of proteins and novel protein descriptors are needed to provide new information about protein structure. Here, a new set of protein descriptors based on the computation of bilinear maps is presented. This novel approach to biomacromolecular design is relevant for QSPR studies on proteins. Protein bilinear indices are calculated from the kth power of nonstochastic and stochastic graph-theoretic electronic-contact matrices, $M_m^k$ and ${}^{s}M_m^k$, respectively. That is to say, the kth nonstochastic and stochastic protein bilinear indices are calculated using $M_m^k$ and ${}^{s}M_m^k$ as matrix operators of bilinear transformations. Moreover, biochemical information is codified by using different pair combinations of amino acid properties as weightings. Classification models based on a protein bilinear descriptor were developed to discriminate Arc mutants with near wild-type stability from those with reduced stability. These equations correctly classified more than 90% of the mutants in both the training and test sets. To predict $t_m$ and $\Delta\Delta G_f^{\circ}$ values for Arc mutants, multiple linear regression and piecewise linear regression models were developed. The multiple linear regression models accounted for 83% of the variance of the experimental $t_m$. Statistics from internal and external validation procedures demonstrated the robustness, stability and adequate predictive power of all models. The results demonstrate the ability of protein bilinear indices to encode biochemical information related to those structural changes that significantly influence Arc repressor stability when point mutations are introduced.
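As a rough illustration of how a kth bilinear index might be computed, the sketch below evaluates the bilinear form x^T M^k y for a toy contact matrix. Everything in it is hypothetical: the 3-residue matrix, the two property vectors `x` and `y`, and the function names are made up for illustration. It also assumes that the stochastic matrix is obtained by dividing each row of the kth power matrix by its row sum, which the abstract does not state explicitly.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, k):
    """kth power of a square matrix (k >= 0) by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m)
    return result


def row_stochastic(m):
    """Divide each row by its sum (assumed normalization; zero rows stay zero)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def bilinear_index(m, x, y):
    """Bilinear form: sum over i, j of x[i] * m[i][j] * y[j]."""
    return sum(x[i] * m[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


# Hypothetical 3-residue chain: residue 2 contacts residues 1 and 3.
M = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
x = [1.0, 2.0, 3.0]   # made-up values of a first amino acid property
y = [0.5, 1.0, 1.5]   # made-up values of a second amino acid property

M2 = mat_pow(M, 2)
b2 = bilinear_index(M2, x, y)                   # nonstochastic index, k = 2
sb2 = bilinear_index(row_stochastic(M2), x, y)  # stochastic index, k = 2
print(b2, sb2)  # 12.0 6.0
```

Using two different property vectors for `x` and `y` mirrors the "pair combinations of amino acid properties" mentioned in the abstract; passing the same vector twice would reduce the bilinear form to a quadratic one.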
Subjects
Theoretical Models, Proteins/chemistry, Alanine, Amino Acids, Computational Biology/methods, Missense Mutation, Protein Stability, Quantitative Structure-Activity Relationship
ABSTRACT
A new set of nucleotide-based bio-macromolecular descriptors is presented. This novel approach to bio-macromolecular design from a linear algebra point of view is relevant to quantitative structure-activity relationship (QSAR) studies of nucleic acids. These bio-macromolecular indices are based on the calculation of bilinear maps on $\mathbb{R}^n$, $b_{mk}(x_m, y_m): \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, in the canonical basis. Nucleic acid bilinear indices are calculated from the kth power of non-stochastic and stochastic nucleotide graph-theoretic electronic-contact matrices, $M_m^k$ and ${}^{s}M_m^k$, respectively. That is to say, the kth non-stochastic and stochastic nucleic acid bilinear indices are calculated using $M_m^k$ and ${}^{s}M_m^k$ as matrix operators of bilinear transformations. Moreover, biochemical information is codified by using different pair combinations of nucleotide-base properties as weightings: the experimental molar absorption coefficient $\epsilon_{260}$ at 260 nm and pH = 7.0, the first ($\Delta E_1$) and second ($\Delta E_2$) single excitation energies in eV, and the first ($f_1$) and second ($f_2$) oscillator strength values (of the first singlet excitation energies) of the nucleotide DNA-RNA bases. As an example of this approach, an interaction study of the antibiotic paromomycin with the packaging region of the HIV-1 Psi-RNA was performed, and several linear models were obtained to predict the interaction strength. The best linear model obtained using non-stochastic bilinear indices explains about 91% of the variance of the experimental Log K ($R = 0.95$ and $s = 0.08 \times 10^{-4}\ \mathrm{M}^{-1}$), whereas the best equation based on stochastic bilinear indices accounts for 93% of the Log K variance ($R = 0.97$ and $s = 0.07 \times 10^{-4}\ \mathrm{M}^{-1}$). The leave-one-out (LOO) PRESS statistics indicated high predictive ability for both models ($q^2 = 0.86$ and $s_{cv} = 0.09 \times 10^{-4}\ \mathrm{M}^{-1}$ for non-stochastic and $q^2 = 0.91$ and $s_{cv} = 0.08 \times 10^{-4}\ \mathrm{M}^{-1}$ for stochastic bilinear indices).
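For readers unfamiliar with the leave-one-out statistic quoted above, the following minimal sketch shows one common way to compute q^2 from PRESS: each observation is predicted by a model fitted to all the others, and q^2 = 1 - PRESS / SS_tot. It assumes a single-descriptor least-squares line rather than the multi-descriptor models of the study, and the function names and data points are invented for illustration, not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out q^2 = 1 - PRESS / SS_tot for a one-descriptor model."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Made-up descriptor values and responses, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(loo_q2(descriptor, response), 3))
```

Because every prediction comes from a model that never saw the predicted point, q^2 is normally lower than the fitted R^2, which is consistent with the pattern in the abstract (0.86 versus about 0.91, and 0.91 versus 0.93).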
The models based on nucleic acid bilinear indices compare favorably with other nucleic acid index-based approaches reported to date. These models also permit interpretation of the driving forces of the interaction process. In this sense, the developed equations involve short-reaching (k
Subjects
Computational Biology, HIV-1/genetics, Paromomycin/metabolism, Viral RNA/metabolism, Base Sequence, DNA Footprinting, DNA Packaging/genetics, Viral DNA/genetics, HIV-1/metabolism, Molecular Models, Molecular Sequence Data, Quantitative Structure-Activity Relationship, Viral RNA/genetics, Stochastic Processes