ABSTRACT
Desirable pharmacological properties and a broad range of therapeutic activities have made peptides promising alternatives to small organic molecules and antibody drugs. Nevertheless, toxic effects, such as hemolysis, have hampered the development of these drugs. Hence, a reliable computational tool to predict peptide hemolytic toxicity before synthesis and experimental evaluation is enormously useful. Currently, four web servers that predict hemolytic activity using machine learning (ML) algorithms are available; however, they exhibit some limitations, such as the need for a reliable negative set and a limited application domain. We therefore developed a robust model based on a novel theoretical approach that combines network science and a multiquery similarity searching (MQSS) method. A total of 1152 initial models were constructed from 144 scaffolds generated in a previous report. These were evaluated on external data sets, and the best models were fused and improved. Our best MQSS model, I1, outperformed all state-of-the-art ML-based models and was used to characterize the prevalence of hemolytic toxicity among therapeutic peptides. Based on our model's estimation, the number of hemolytic peptides might be 3.9-fold higher than reported.
Subjects
Hemolysis , Peptides , Humans , Amino Acid Sequence , Peptides/pharmacology , Peptides/chemistry , Algorithms , Machine Learning
ABSTRACT
The pandemic caused by the coronavirus SARS-CoV-2 has become the biggest challenge that the world is facing today. It has created a devastating global crisis, causing countless deaths and great panic. The search for an effective treatment remains a global challenge owing to controversies related to available vaccines. A great research effort (clinical, experimental, and computational) has emerged in response to this pandemic, and more than 125,000 research reports have been published in relation to COVID-19. The majority of them focus on the discovery of novel drug candidates or the repurposing of existing drugs through computational approaches that significantly speed up drug discovery. Among the targets used, the SARS-CoV-2 main protease (Mpro), which plays an essential role in coronavirus replication, has become the preferred target for computational studies. In this review, we examine a representative set of computational studies that use Mpro as a target for the discovery of small-molecule inhibitors against COVID-19. The studies are divided into two main groups, structure-based and ligand-based methods, and each group is subdivided according to the strategies used in the research. From our point of view, the use of combined strategies could enhance the chances of success, permitting the development of more rigorous computational studies in efforts to combat current and future pandemics.
Subjects
Antiviral Agents , COVID-19 , Coronavirus 3C Proteases , Coronavirus Protease Inhibitors , Drug Discovery , Humans , Antiviral Agents/pharmacology , Molecular Docking Simulation , SARS-CoV-2 , Coronavirus 3C Proteases/antagonists & inhibitors , Coronavirus Protease Inhibitors/pharmacology
ABSTRACT
Leishmaniasis is a poverty-related disease endemic in 98 countries worldwide, with morbidity and mortality increasing daily. All currently used first-line and second-line drugs for the treatment of leishmaniasis exhibit several drawbacks, including toxicity, high cost, and route of administration. Consequently, the development of new treatments for leishmaniasis is a priority in the field of neglected tropical diseases. The aim of this work is to develop computational models that allow the identification of new chemical compounds with potential anti-leishmanial activity. A data set of 116 organic chemicals, assayed against promastigotes of Leishmania amazonensis, is used to develop the theoretical models. The cutoff value to consider a compound as active was IC50 ≤ 1.5 µM. For this study, we employed Dragon software to calculate the molecular descriptors and WEKA to obtain machine learning (ML) models. All ML models showed accuracy values between 82% and 91% for the training set. The models developed with k-nearest neighbors and classification trees showed sensitivity values of 97% and 100%, respectively, while the models developed with artificial neural networks and support vector machines showed specificity values of 94% and 92%, respectively. To validate our models, an external test set was evaluated, with good behavior for all models. A virtual screening was performed, and 156 compounds were identified as potential anti-leishmanial agents by all the ML models. This investigation highlights the merits of ML-based techniques as an alternative to more traditional methods for finding new chemical compounds with anti-leishmanial activity.
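The activity cutoff and the accuracy, sensitivity, and specificity figures quoted in this abstract can be illustrated with a minimal Python sketch. The IC50 values and model predictions below are hypothetical and are not taken from the study; only the 1.5 µM cutoff comes from the abstract.

```python
def label_active(ic50_um, cutoff_um=1.5):
    """Return True when a compound counts as active (IC50 <= cutoff, in µM)."""
    return ic50_um <= cutoff_um


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def classification_report(y_true, y_pred):
    """Accuracy, sensitivity (recall on actives), specificity (recall on inactives)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical IC50 values (µM) and hypothetical classifier outputs.
ic50_values = [0.4, 1.5, 2.0, 10.0, 0.9, 3.2]
y_true = [label_active(v) for v in ic50_values]
y_pred = [True, True, True, False, False, False]
report = classification_report(y_true, y_pred)
print(report)
```

A model with high sensitivity misses few actives, while one with high specificity raises few false alarms, which is presumably why the abstract reports the two measures for different learners.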
Subjects
Antiprotozoal Agents/pharmacology , Leishmania/drug effects , Machine Learning , Antiprotozoal Agents/chemistry , Preclinical Drug Evaluation , Molecular Models , Parasitic Sensitivity Tests , Software
ABSTRACT
The features and theoretical background of IMMAN (an acronym for Information theory-based CheMoMetrics ANalysis), a new and free computational program for chemometric analysis, are presented. This multi-platform software is developed in the Java programming language and offers a user-friendly graphical interface for computing a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches each. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. In addition, a generalization scheme for the previously defined differential Shannon's entropy is discussed, and the Jeffreys information measure is introduced for supervised feature selection. Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty, are incorporated into the IMMAN software ( http://mobiosd-hub.com/imman-soft/ ), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing-value processing, dataset partitioning, and browsing. Single-parameter or ensemble (multi-criteria) ranking options are also provided. Consequently, this software is suitable for tasks such as dimensionality reduction, feature ranking, and comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between the IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior. In addition, the use of IMMAN unsupervised feature selection methods is shown to improve the performance of both IMMAN and WEKA supervised algorithms.
[Graphical abstract: Graphic representation for Shannon's distribution of MD calculating software.]
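As an illustration of the supervised parameters named in this abstract, the following Python sketch computes information gain, gain ratio, and symmetrical uncertainty after equal-interval discretization. It uses the textbook definitions of these quantities; it is not IMMAN's code, and the function names, bin count, and data are assumptions made for the example.

```python
import math
from collections import Counter


def equal_interval_bins(values, n_bins):
    """Discretize numeric values into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_bins, classes):
    """H(class) minus the entropy of the class conditioned on the feature bin."""
    n = len(classes)
    conditional = 0.0
    for b in set(feature_bins):
        subset = [c for f, c in zip(feature_bins, classes) if f == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(classes) - conditional


def gain_ratio(feature_bins, classes):
    """Information gain normalized by the entropy of the feature itself."""
    split = entropy(feature_bins)
    return information_gain(feature_bins, classes) / split if split > 0 else 0.0


def symmetrical_uncertainty(feature_bins, classes):
    """2 * IG / (H(feature) + H(class)), bounded between 0 and 1."""
    denom = entropy(feature_bins) + entropy(classes)
    return 2 * information_gain(feature_bins, classes) / denom if denom > 0 else 0.0


# Hypothetical data: one informative feature and one uninformative feature.
classes = [0, 0, 0, 1, 1, 1]
informative = equal_interval_bins([0.1, 0.2, 0.3, 0.8, 0.9, 1.0], n_bins=2)
noisy = equal_interval_bins([0.1, 0.9, 0.2, 0.8, 0.3, 1.0], n_bins=2)

ranking = sorted(
    [("informative", information_gain(informative, classes)),
     ("noisy", information_gain(noisy, classes))],
    key=lambda item: item[1],
    reverse=True,
)
print(ranking)
```

Sorting features by any of the three scores gives a rank-based selection of the kind the abstract describes; the gain ratio and symmetrical uncertainty additionally penalize features that split the data into many bins.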
Subjects
Theoretical Models , Software , Algorithms
ABSTRACT
Protozoan parasites have been one of the most significant public health problems for centuries, and several human infections caused by them have a massive global impact. Most of the current drugs used to treat these illnesses have been in use for decades and have many limitations, such as the emergence of drug resistance, severe side effects, low-to-medium drug efficacy, administration routes, and cost. These drugs have been largely neglected as models for drug development because they are mainly used in countries with limited resources and, consequently, limited market potential. There is now a pressing need to identify and develop new drug-based antiprotozoan therapies. To address this problem, the main purpose of this study is to develop a QSAR-based ensemble classifier for antiprotozoan drug-like entities from a heterogeneous collection of compounds. Here, we use some of the TOMOCOMD-CARDD molecular descriptors and linear discriminant analysis (LDA) to derive individual linear classification functions that discriminate between antiprotozoan and non-antiprotozoan compounds, enabling the computational screening of virtual combinatorial datasets and/or already approved drugs. First, we construct a wide-spectrum benchmark database comprising 680 organic chemicals with great structural variability (254 of them antiprotozoan agents and 426 drugs having other clinical uses). This series of compounds was processed by k-means cluster analysis in order to design training and prediction sets. In total, seven discriminant functions were obtained by using the whole set of atom-based linear indices. All the LDA-based QSAR models show accuracies above 85% in the training set, with Matthews correlation coefficients (C) ranging from 0.70 to 0.86. The external validation set shows fairly good global classifications of around 80% (92.05% for the best equation). Later, we developed a multi-agent QSAR classification system, in which the individual QSAR outputs are the inputs of the aforementioned ensemble (fusion) approach. Finally, the fusion model was used to identify a novel generation of lead-like antiprotozoan compounds through ligand-based virtual screening of 'available' small molecules (with synthetic feasibility) in our 'in-house' library. A new molecular subsystem (quinoxalinones) was then theoretically selected as a promising lead series, and its derivatives were subsequently synthesized, structurally characterized, and experimentally assayed by in vitro screening that took into consideration a battery of five parasite-based assays. The chemicals 11(12) and 16 are the most active (hits) against apicomplexa (sporozoa) and mastigophora (flagellata) subphylum parasites, respectively. Both compounds showed good activity in every protozoan in vitro panel and did not show unspecific cytotoxicity on the host cells. The described technical framework seems to be a promising QSAR-classifier tool for the molecular discovery and development of novel classes of broad-antiprotozoan-spectrum drugs, which may meet the dual challenges posed by drug-resistant parasites and the rapid progression of protozoan illnesses.
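The idea of feeding individual QSAR outputs into a fusion step can be sketched in a few lines of Python. The abstract does not state which fusion rule was used, so the example below substitutes a plain majority vote over the signs of seven hypothetical discriminant scores; the scores, the zero threshold, and the function names are all assumptions made for illustration.

```python
def majority_vote(votes):
    """Return True when more than half of the individual models vote 'active'."""
    positives = sum(1 for v in votes if v)
    return positives * 2 > len(votes)


def fuse(model_scores, threshold=0.0):
    """Fuse per-model discriminant scores into one call per compound.

    model_scores is a list with one entry per model, each entry holding one
    score per compound. A score above the threshold counts as a vote for
    'active' (assumed convention, not taken from the source).
    """
    per_compound = zip(*model_scores)  # regroup scores compound by compound
    return [majority_vote([s > threshold for s in scores]) for scores in per_compound]


# Hypothetical outputs of seven discriminant functions for three compounds.
model_scores = [
    [1.2, -0.3, 0.4],
    [0.8, -1.1, -0.2],
    [2.0, 0.5, -0.6],
    [0.1, -0.7, 0.9],
    [-0.4, -0.2, 0.3],
    [1.5, -0.9, -0.1],
    [0.6, 0.2, 0.7],
]
fused = fuse(model_scores)
print(fused)
```

An odd number of models, such as the seven functions mentioned in the abstract, avoids ties under majority voting; weighted or score-averaging schemes are common alternatives.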
Subjects
Antiprotozoal Agents/pharmacology , Quinoxalines/chemical synthesis , Cyclization , Molecular Structure , Quantitative Structure-Activity Relationship , Quinoxalines/chemistry
ABSTRACT
A new mathematical approach is proposed for the definition of molecular descriptors (MDs) based on the application of information theory concepts. This approach stems from a new matrix representation of a molecular graph (G), derived from the generalization of an incidence matrix whose row entries correspond to connected subgraphs of a given G, and from the calculation of Shannon's entropy, the negentropy, and the standardized information content, plus, for the first time, the mutual, conditional, and joint entropy-based MDs associated with G. We also define strategies that generalize the definition of global or local invariants from atomic contributions (local vertex invariants, LOVIs), introducing related metrics (norms), means, and statistical invariants. These invariants are applied to a vector whose components express the atomic information content calculated using Shannon's, mutual, conditional, and joint entropy-based atomic information indices. The novel information indices (IFIs) are implemented in the program TOMOCOMD-CARDD. A principal component analysis reveals that the novel IFIs are capable of capturing structural information not codified by the IFIs implemented in the software DRAGON. A comparative study of the different parameters (e.g., subgraph orders and/or types, invariants, and class of MDs) used in the definition of these IFIs reveals several interesting results. Among the classes of indices investigated in this study, the mutual entropy-based indices give the best correlation results in modeling a physicochemical property, namely the partition coefficient of 34 derivatives of 2-furylethylenes. In a comparison with classical MDs, it is demonstrated that the new IFIs give good results for various QSPR models.
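For reference, the entropy measures named in this abstract have the following standard definitions for discrete random variables X and Y. How the probabilities are derived from the subgraph-based incidence matrix is not stated in the abstract, so the distributions p(x), p(y), and p(x, y) are left generic here.

```latex
\begin{align}
H(X)   &= -\sum_{x} p(x)\,\log_2 p(x)            && \text{(Shannon's entropy)} \\
H(X,Y) &= -\sum_{x}\sum_{y} p(x,y)\,\log_2 p(x,y) && \text{(joint entropy)} \\
H(X \mid Y) &= H(X,Y) - H(Y)                      && \text{(conditional entropy)} \\
I(X;Y) &= H(X) + H(Y) - H(X,Y)                    && \text{(mutual information)}
\end{align}
```

These identities imply that I(X;Y) = H(X) - H(X | Y), so the mutual measure quantifies how much knowing one variable reduces the uncertainty about the other.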
Subjects
Entropy , Ethylenes/chemistry , Pharmaceutical Preparations/chemistry , Drug Design , Chemical Models , Principal Component Analysis , Quantitative Structure-Activity Relationship
ABSTRACT
Two-dimensional bond-based linear indices and linear discriminant analysis are used in this report to perform a quantitative structure-activity relationship study to identify new trypanosomicidal compounds. A database of 143 anti-trypanosomal compounds and 297 compounds having other clinical uses is used to develop the theoretical models. The best discriminant models computed using bond-based linear indices provide accuracies greater than 90% for both training and test sets. Our models identify as anti-trypanosomals five out of nine compounds in a set of already-synthesized substances. The in vitro anti-trypanosomal activity of this set against epimastigote forms of Trypanosoma cruzi is assayed. Both models show perfect agreement between theoretical predictions and experimental results. The compounds identified as active show more than 98% anti-epimastigote elimination (AE) at a concentration of 100 µg/mL. In addition, three compounds show more than 70% AE at a concentration of 10 µg/mL. Finally, compounds with the best "activity against epimastigote forms/unspecific cytotoxicity" ratio are evaluated using an amastigote susceptibility assay. Notably, compound Va7-71 exhibits 100% intracellular amastigote elimination and shows activity similar to that of a standard trypanosomicidal such as nifurtimox. The present algorithm constitutes a step forward in the search for efficient ways of discovering new anti-trypanosomal compounds.
Subjects
Cell Survival/drug effects , Drug Discovery/methods , Life Cycle Stages/drug effects , Trypanocidal Agents/chemistry , Trypanosoma cruzi/drug effects , Algorithms , Animals , Chagas Disease/drug therapy , Chagas Disease/parasitology , Factual Databases , Discriminant Analysis , Fibroblasts/parasitology , High-Throughput Screening Assays , Humans , Ligands , Theoretical Models , Quantitative Structure-Activity Relationship , Software , Trypanocidal Agents/pharmacology , Trypanosoma cruzi/growth & development
ABSTRACT
Descriptors calculated from a specific representation scheme encode only one part of the chemical information. For this reason, there is a need to construct novel graphical representations of proteins and novel protein descriptors that can provide new information about protein structure. Here, a new set of protein descriptors based on the computation of bilinear maps is presented. This novel approach to biomacromolecular design is relevant for QSPR studies on proteins. Protein bilinear indices are calculated from the kth power of nonstochastic and stochastic graph-theoretic electronic-contact matrices, $M_m^k$ and ${}^{s}M_m^k$, respectively. That is to say, the kth nonstochastic and stochastic protein bilinear indices are calculated using $M_m^k$ and ${}^{s}M_m^k$ as matrix operators of bilinear transformations. Moreover, biochemical information is codified by using different pair combinations of amino acid properties as weightings. Classification models based on a protein bilinear descriptor were developed to discriminate between Arc mutants with stability similar to that of the wild-type form and those with lower stability. These equations permitted the correct classification of more than 90% of the mutants in both the training and test sets. To predict $t_m$ and $\Delta\Delta G_f^{\circ}$ values for Arc mutants, multiple linear regression and piecewise linear regression models were developed. The multiple linear regression models obtained accounted for 83% of the variance of the experimental $t_m$. Statistics calculated from internal and external validation procedures demonstrated robustness, stability, and suitable predictive power for all models. The results demonstrate the ability of protein bilinear indices to encode biochemical information related to the structural changes that significantly influence Arc repressor stability when point mutations are induced.
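The use of a matrix power as the operator of a bilinear transformation can be made concrete with a small Python sketch that evaluates x^T M^k y for two property vectors x and y. The three-residue contact matrix and the property values are hypothetical, and obtaining the stochastic matrix by dividing each row of M^k by its row sum is an assumption about the normalization, since the abstract does not spell it out.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, k):
    """Return M^k, with M^0 defined as the identity matrix."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m)
    return result


def row_stochastic(m):
    """Scale each row to sum to 1 (assumed definition of the stochastic matrix)."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in m]


def bilinear_index(x, m, k, y, stochastic=False):
    """Evaluate the kth bilinear form x^T M^k y (or its stochastic variant)."""
    mk = mat_power(m, k)
    if stochastic:
        mk = row_stochastic(mk)
    n = len(x)
    return sum(x[i] * mk[i][j] * y[j] for i in range(n) for j in range(n))


# Hypothetical contact matrix for a chain of three residues (1-2 and 2-3 in contact).
contact = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
# Hypothetical amino acid property vectors used as weightings.
x = [1.0, 2.0, 3.0]
y = [0.5, 1.0, 1.5]

for k in range(3):
    print(k, bilinear_index(x, contact, k, y))
print("stochastic k=1:", bilinear_index(x, contact, 1, y, stochastic=True))
```

Using a different property for x than for y is what distinguishes a bilinear index from a quadratic one, where the same vector appears on both sides of the matrix.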
Subjects
Theoretical Models , Proteins/chemistry , Alanine , Amino Acids , Computational Biology/methods , Missense Mutation , Protein Stability , Quantitative Structure-Activity Relationship
ABSTRACT
Herein we present the results of a quantitative structure-activity relationship (QSAR) study to classify and design, in a rational way, new antitrypanosomal compounds by using non-stochastic and stochastic bond-based quadratic indices. A data set of 440 organic chemicals, 143 with antitrypanosomal activity and 297 having other clinical uses, is used to develop QSAR models based on linear discriminant analysis (LDA). The non-stochastic model correctly classifies more than 93% and 95% of the chemicals in the training and external prediction groups, respectively. The stochastic model, in turn, shows an accuracy of about 87% for both series. As an experiment in virtual lead generation, the present approach is then applied satisfactorily to the virtual evaluation of 9 already synthesized in-house compounds. The in vitro antitrypanosomal activity of this series against epimastigote forms of Trypanosoma cruzi is assayed. The model correctly predicts the behaviour of the majority of these compounds. Four compounds (FER16, FER32, FER33 and FER132) showed more than 70% epimastigote inhibition at a concentration of 100 µg/mL (86.74%, 78.12%, 88.85% and 72.10%, respectively), and two of these chemicals, FER16 (78.22% AE) and FER33 (81.31% AE), also showed good activity at a concentration of 10 µg/mL. At the same concentration, compound FER16 showed low cytotoxicity (15.44%), and compound FER33 showed very low cytotoxicity (1.37%). Taking all these results into account, we can say that these three compounds can be optimized in forthcoming works, but we consider compound FER33 the best candidate. Even though none of them proved more active than nifurtimox, the current results constitute a step forward in the search for efficient ways to discover new lead antitrypanosomals.
Subjects
Computer-Aided Design , Drug Discovery/methods , Trypanocidal Agents/chemistry , Trypanocidal Agents/pharmacology , Cell Survival/drug effects , Cultured Cells , Discriminant Analysis , Statistical Models , Molecular Structure , Quantitative Structure-Activity Relationship , Trypanosoma cruzi/drug effects
ABSTRACT
A new set of nucleotide-based bio-macromolecular descriptors is presented. This novel approach to bio-macromolecular design from a linear algebra point of view is relevant to nucleic acid quantitative structure-activity relationship (QSAR) studies. These bio-macromolecular indices are based on the calculus of bilinear maps on $\mathbb{R}^n$ [$b_{mk}(x_m, y_m): \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$] in the canonical basis. Nucleic acid bilinear indices are calculated from the kth power of non-stochastic and stochastic nucleotide graph-theoretic electronic-contact matrices, $M_m^k$ and ${}^{s}M_m^k$, respectively. That is to say, the kth non-stochastic and stochastic nucleic acid bilinear indices are calculated using $M_m^k$ and ${}^{s}M_m^k$ as matrix operators of bilinear transformations. Moreover, biochemical information is codified by using different pair combinations of nucleotide-base properties as weightings (experimental molar absorption coefficient $\epsilon_{260}$ at 260 nm and pH = 7.0, first ($\Delta E_1$) and second ($\Delta E_2$) single excitation energies in eV, and first ($f_1$) and second ($f_2$) oscillator strength values (of the first singlet excitation energies) of the nucleotide DNA-RNA bases). As an example of this approach, an interaction study of the antibiotic paromomycin with the packaging region of the HIV-1 Psi-RNA has been performed, and several linear models have been obtained in order to predict the interaction strength. The best linear model obtained by using non-stochastic bilinear indices explains about 91% of the variance of the experimental Log K ($R = 0.95$ and $s = 0.08 \times 10^{-4}\ \mathrm{M}^{-1}$), whereas the best stochastic bilinear indices-based equation accounts for 93% of the Log K variance ($R = 0.97$ and $s = 0.07 \times 10^{-4}\ \mathrm{M}^{-1}$). The leave-one-out (LOO) press statistics evidenced the high predictive ability of both models ($q^2 = 0.86$ and $s_{cv} = 0.09 \times 10^{-4}\ \mathrm{M}^{-1}$ for non-stochastic and $q^2 = 0.91$ and $s_{cv} = 0.08 \times 10^{-4}\ \mathrm{M}^{-1}$ for stochastic bilinear indices). The nucleic acid bilinear indices-based models compared favorably with other nucleic acid indices-based approaches reported to date. These models also permit the interpretation of the driving forces of the interaction process. In this sense, the developed equations involve short-reaching (k
Subjects
Computational Biology , HIV-1/genetics , Paromomycin/metabolism , Viral RNA/metabolism , Base Sequence , DNA Footprinting , DNA Packaging/genetics , Viral DNA/genetics , HIV-1/metabolism , Molecular Models , Molecular Sequence Data , Quantitative Structure-Activity Relationship , Viral RNA/genetics , Stochastic Processes
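The leave-one-out (LOO) statistics quoted in the paromomycin abstract above can be illustrated with a short Python sketch that computes q² as 1 - PRESS/SS_tot. For brevity it fits a single-descriptor least-squares line rather than the multi-descriptor models of the study, and the descriptor and response values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without observation i, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and responses (not data from the study).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1]
q2 = loo_q2(descriptor, response)
print(round(q2, 3))
```

Because each prediction comes from a model that never saw the held-out point, q² is normally lower than the fitted R², which matches the pattern of the values reported in the abstract.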
ABSTRACT
Two-dimensional atom- and bond-based TOMOCOMD-CARDD descriptors and linear discriminant analysis (LDA) are used in this report to perform a quantitative structure-activity relationship (QSAR) study of tyrosinase-inhibitory activity. A database of inhibitors of the enzyme is collected for this study, comprising 246 highly dissimilar molecules with antityrosinase activity. In total, 7 discriminant functions are obtained by using the whole set of atom- and bond-based 2D indices. All the LDA-based QSAR models show accuracies above 90% in the training set and values of the Matthews correlation coefficient (C) varying from 0.85 to 0.90. The external validation set shows globally good classifications, between 89% and 91%, and C values ranging from 0.75 to 0.81. Finally, the QSAR models are used in the selection/identification of a subset of 20 new dicoumarins to search for tyrosinase inhibitory activity. Theoretical and experimental results show good correspondence with one another. Notably, most compounds in this series exhibit more potent inhibitory activity against the mushroom tyrosinase enzyme than the reference compound, kojic acid (IC50 = 16.67 µM), resulting in a novel base nucleus (lead) with antityrosinase activity that could serve as a starting point for the discovery of novel tyrosinase inhibitor lead compounds. (Journal of Biomolecular Screening 2008:1014-1024).
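The Matthews correlation coefficient (C) reported for these classification models can be computed from a confusion matrix as in the following Python sketch. The formula is the standard one; the counts in the example are hypothetical and do not reproduce the study's data.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect classification). Returns 0.0 when the denominator vanishes.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: 50 actives (45 found) and 50 inactives (48 rejected).
c_value = matthews_corrcoef(tp=45, tn=48, fp=2, fn=5)
print(round(c_value, 3))
```

Unlike plain accuracy, C stays informative when the active and inactive classes differ greatly in size, which is a common reason for reporting both figures side by side.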