Results 1 - 20 of 53
1.
J Mol Biol ; 436(22): 168764, 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39197652

ABSTRACT

Classification of protein domains based on homology and structural similarity serves as a fundamental tool to gain biological insights into protein function. Recent advancements in protein structure prediction, exemplified by AlphaFold, have revolutionized the availability of protein structural data. We focus on classifying about 9000 Pfam families into ECOD (Evolutionary Classification of Domains) by using predicted AlphaFold models and the DPAM (Domain Parser for AlphaFold Models) tool. Our results offer insights into their homologous relationships and domain boundaries. More than half of these Pfam families contain DPAM domains that can be confidently assigned to the ECOD hierarchy. Most assigned domains belong to highly populated folds such as Immunoglobulin-like (IgL), Armadillo (ARM), helix-turn-helix (HTH), and Src homology 3 (SH3). A large fraction of DPAM domains, however, cannot be confidently assigned to ECOD homologous groups. These unassigned domains exhibit statistically different characteristics, including shorter average length, fewer secondary structure elements, and more abundant transmembrane segments. They could potentially define novel families remotely related to domains with known structures or novel superfamilies and folds. Manual scrutiny of a subset of these domains revealed an abundance of internal duplications and recurring structural motifs. Exploring sequence and structural features such as disulfide bond patterns, metal-binding sites, and enzyme active sites helped uncover novel structural folds as well as remote evolutionary relationships. By bridging the gap between sequence-based Pfam and structure-based ECOD domain classifications, our study contributes to a more comprehensive understanding of the protein universe by providing structural and functional insights into previously uncharacterized proteins.

2.
Comput Biol Med ; 176: 108534, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38754217

ABSTRACT

Antifreeze proteins have wide applications in the medical and food industries. In this study, we propose a stacking-based classifier that can effectively identify antifreeze proteins. First, features were extracted from three aspects: amino acid reduction properties, scalable pseudo amino acid composition, and physicochemical properties, and these were combined into a hybrid feature set. Next, base classifiers were trained on the training set with the LightGBM, XGBoost, and RandomForest algorithms, and their outputs were passed to a logistic regression model that combines them, which establishes the stacking algorithm. The proposed algorithm was evaluated on the test set and on an independent validation set, achieving a recognition accuracy of 98.3% on the test set and 98.5% on the validation set. Lastly, we analyzed from multiple aspects why the numerical features achieved high recognition capability. Dimensionality reduction and two- and three-dimensional visualizations revealed separability between positive and negative samples, and protein three-dimensional structures further showed significant differences in the relevant features between the two classes. Analysis of the classifier revealed that Hr*Hr, HrHr, and Sc-PseAAC_1, 188D(152,116,57,183) were among the seven most important numerical features affecting recognition. For Hr*Hr and HrHr, sequence-level evidence supporting the reduction dictionary was found through conservation-region analysis, multiple sequence alignment, and conservative amino acid substitution. Moreover, a comparison of feature importance before and after reduction showed that the reduction dictionary is effective at increasing feature importance.
A decision tree model was used to distinguish dipeptides associated with the physical and chemical properties of His (H), Ile (I), Leu (L), and Lys (K) from other dipeptides. Finally, we analyzed the other seven important features, and the analysis confirmed that hydrophobicity, secondary structure, charge, van der Waals forces, and solvent accessibility also affect the antifreeze capability of proteins.


Subject(s)
Algorithms, Antifreeze Proteins, Antifreeze Proteins/chemistry, Amino Acids/chemistry, Databases, Protein, Computational Biology/methods
3.
Cell ; 186(15): 3182-3195.e14, 2023 07 20.
Article in English | MEDLINE | ID: mdl-37379837

ABSTRACT

The elucidation of protein function and its exploitation in bioengineering have greatly advanced the life sciences. Protein mining efforts generally rely on amino acid sequences rather than protein structures. Here we describe the use of AlphaFold2 to predict the structures of an entire protein family and then cluster the family based on predicted structural similarity. We selected deaminase proteins for analysis and identified many previously unknown properties. We were surprised to find that most proteins in the DddA-like clade were not double-stranded DNA deaminases. We engineered the smallest single-strand-specific cytidine deaminase, enabling an efficient cytosine base editor (CBE) to be packaged into a single adeno-associated virus (AAV). Importantly, we profiled a deaminase from this clade that edits robustly in soybean plants, which were previously inaccessible to CBEs. These deaminases, discovered through AI-assisted structural predictions, greatly expand the utility of base editors for therapeutic and agricultural applications.


Subject(s)
Gene Editing, Proteins, Proteins/metabolism, Cytidine Deaminase/genetics, Cytidine Deaminase/metabolism, DNA, CRISPR-Cas Systems, Cytosine/metabolism
4.
Biophys Chem ; 295: 106971, 2023 04.
Article in English | MEDLINE | ID: mdl-36801589

ABSTRACT

Structures can now be predicted for any protein using programs like AlphaFold and Rosetta, which rely on a foundation of experimentally determined structures of architecturally diverse proteins. The accuracy of such artificial intelligence and machine learning (AI/ML) approaches benefits from the specification of restraints, which assist in navigating the universe of folds to converge on the models most representative of a given protein's physiological structure. This is especially pertinent for membrane proteins, whose structures and functions depend on their presence in lipid bilayers. Structures of proteins in their membrane environments could conceivably be predicted by AI/ML approaches using user-specified parameters that describe each element of a membrane protein's architecture together with its lipid environment. We propose the Classification Of Membrane Proteins based On Structures Engaging Lipids (COMPOSEL), which builds on existing nomenclature types for monotopic, bitopic, polytopic, and peripheral membrane proteins as well as lipids. Functional and regulatory elements are also defined in the scripts, as shown with membrane-fusing synaptotagmins, multidomain PDZD8 and Protrudin proteins that recognize phosphoinositide (PI) lipids, the intrinsically disordered MARCKS protein, caveolins, the β-barrel assembly machine (BAM), an adhesion G-protein coupled receptor (aGPCR), and two lipid-modifying enzymes, diacylglycerol kinase DGKε and fatty aldehyde dehydrogenase FALDH. This demonstrates how COMPOSEL communicates lipid interactivity, as well as signaling mechanisms and the binding of metabolites, drug molecules, polypeptides, or nucleic acids, to describe the operations of any protein. Moreover, COMPOSEL can be scaled to express how genomes encode membrane structures and how our organs are infiltrated by pathogens such as SARS-CoV-2.


Subject(s)
COVID-19, Membrane Proteins, Humans, Membrane Proteins/chemistry, Membrane Lipids, Artificial Intelligence, Models, Molecular, SARS-CoV-2/metabolism, Lipid Bilayers/chemistry, Adaptor Proteins, Signal Transducing/metabolism
5.
BMC Bioinformatics ; 23(1): 517, 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36456900

ABSTRACT

BACKGROUND: This research aims to increase our knowledge of amyloidoses. These disorders involve incorrect protein folding, which affects protein structure and function. Fibrillar deposits underlie some well-known diseases, such as Alzheimer's disease, Creutzfeldt-Jakob disease, and type II diabetes. For many of these amyloid proteins, the corresponding precursors are known. Discovering new protein precursors involved in forming amyloid fibril deposits would improve our understanding of the pathological processes of amyloidoses. RESULTS: A new classifier, called ENTAIL, was developed using more than 4000 molecular descriptors. ENTAIL is based on a Naive Bayes classifier with unbounded support and a Gaussian kernel type. On the test set of a balanced dataset, it achieved an accuracy of 81.80%, a sensitivity (SN) of 100%, a specificity (SP) of 63.63%, and a Matthews correlation coefficient (MCC) of 0.683. CONCLUSIONS: The analysis showed that, across the various test configurations, performance is superior on a balanced dataset.
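The reported figures are internally consistent. On a balanced test set with SN = 1, accuracy is the mean of SN and SP, (1 + 0.6363)/2 ≈ 81.8%, and MCC reduces to sqrt(SP/(2 - SP)) ≈ 0.683. The following minimal sketch of the standard formulas uses hypothetical counts (11 positives and 11 negatives, not taken from the paper) chosen to reproduce the reported values.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (SN), specificity (SP) and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sn = tp / (tp + fn)  # fraction of positives correctly recognized
    sp = tn / (tn + fp)  # fraction of negatives correctly recognized
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sn, sp, mcc


# Hypothetical balanced test set (not from the paper): 11 positives, 11 negatives.
acc, sn, sp, mcc = binary_metrics(tp=11, fn=0, tn=7, fp=4)
print(f"ACC={acc:.4f} SN={sn:.4f} SP={sp:.4f} MCC={mcc:.3f}")
# prints: ACC=0.8182 SN=1.0000 SP=0.6364 MCC=0.683
```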


Subject(s)
Amyloidosis, Diabetes Mellitus, Type 2, Humans, Amyloid, Bayes Theorem, Protein Folding
6.
BMC Bioinformatics ; 23(1): 461, 2022 Nov 04.
Article in English | MEDLINE | ID: mdl-36333658

ABSTRACT

BACKGROUND: Adaptor proteins play a key role in intercellular signal transduction, and dysfunctional adaptor proteins cause disease. Understanding their structure is the first step toward tackling the associated conditions, which has spurred ongoing interest in studying adaptor proteins with bioinformatics and computational biology. Our study introduces a small, new, and superior model for protein classification, pushing the boundaries with new machine learning algorithms. RESULTS: We propose a novel transformer-based model that includes a convolutional block and a fully connected layer. We take protein sequences from a database, extract PSSM features, and then process them with our deep learning model. The proposed model is efficient and highly compact, and it achieves state-of-the-art performance in terms of the area under the receiver operating characteristic curve and the Matthews correlation coefficient. Despite having merely 20 hidden nodes, approximately 1% of the complexity of the previous best-known methods, the proposed model is still superior in both results and computational efficiency. CONCLUSIONS: The proposed model, which takes PSSM profiles as input and comprises convolutional blocks, a transformer, and fully connected layers, is the first transformer model used for recognizing adaptor proteins and outperforms all existing methods.


Subject(s)
Machine Learning, Neural Networks, Computer, Algorithms, Computational Biology/methods, Adaptor Proteins, Signal Transducing
7.
Front Microbiol ; 13: 932661, 2022.
Article in English | MEDLINE | ID: mdl-35910662

ABSTRACT

Phages recognize their hosts with high specificity. As natural enemies of bacteria, phages have been used many times to treat superbugs. Identifying phage proteins from the raw sequence is very important for understanding the relationship between phages and their host bacteria and for developing new antimicrobial agents. However, traditional experimental methods are both expensive and time-consuming. In this study, an ensemble learning-based feature selection method is proposed to find important features for phage protein identification. The method uses four types of protein sequence-derived features, quantifies the importance of each feature by perturbing it and measuring the effect on the results, and finally concatenates the important features from the four feature types. In addition, we analyzed the selected features and their biological significance.
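The abstract does not specify how the features are perturbed. One common scheme is permutation importance, which shuffles one feature across samples and measures the resulting drop in accuracy. The sketch below assumes that scheme and a generic `predict` callable. Both are illustrative assumptions, not the authors' method, which may perturb features differently (for example, by adding noise) and within an ensemble.

```python
import random


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop when each feature column is shuffled (assumed perturbation scheme).

    predict: callable mapping a list of feature rows to a list of labels.
    X: list of feature rows (each a list); y: list of true labels.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        preds = predict(rows)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            perturbed = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(perturbed))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Features whose shuffling barely changes accuracy get scores near zero and would be candidates for removal before the important features are concatenated.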

8.
Front Bioeng Biotechnol ; 10: 788300, 2022.
Article in English | MEDLINE | ID: mdl-35875501

ABSTRACT

Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures concerning the workhorses of the cell and proteome homeostasis, which are essential for making life possible. This opens epistemic horizons thanks to the coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a data space on the order of gigabytes of life data. Many tasks, such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein aggregation mechanisms, the pathogenesis of protein misfolding and disease, and proteostasis networks, are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to help bridge the gap between what we call the binomial of artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake this task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including: What kinds of tasks have already been explored by ML approaches to protein science? What are the most common ML algorithms and databases used? What is the situational diagnosis of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions, such as: Is it possible to discover the rules of protein evolution with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds?
What are the minimal nuclear protein structures? How do protein aggregates form, and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science) and yielded 144 research articles. After three rounds of quality screening, 93 articles were selected for further analysis. Our findings are summarized as follows. AI applications fall into four main types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. Among ML algorithms, supervised learning was the most common approach (85%). Among the databases used for the ML models, PDB and UniProtKB/Swiss-Prot were the most common (21% and 8%, respectively). Moreover, approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps to guide efforts to close gaps in our knowledge of the AI-PS binomial. All research efforts to collect and integrate multidimensional data features, and then analyze and validate them, are so far uncoordinated and scattered throughout the scientific literature, without a clear epistemic goal or connection between the studies.
Therefore, our main contribution to the scientific literature is a road map to help solve problems in drug design, protein structure, protein design, and function prediction, together with the "state of the art" of research in the AI-PS binomial up to February 2021. We thus pave the way toward future advances in the synthetic redesign of novel proteins, protein networks, and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of these novel proteins and metabolic pathways do not currently exist in nature, nor are they used in the chemical industry or the biomedical field.

9.
Comput Struct Biotechnol J ; 20: 3503-3510, 2022.
Article in English | MEDLINE | ID: mdl-35860409

ABSTRACT

Proteins are the executors of cellular physiological activities, and accurate elucidation of their structure and function is crucial for the refined mapping of proteins. As a feature engineering method, reducing the amino acid alphabet is not only an important approach to protein structure and function analysis but also opens a broad horizon for the complex field of machine learning. Representing sequences with fewer amino acid types greatly reduces the dimensionality and noise of traditional feature engineering and yields more interpretable predictive models in which machine learning can capture key features. In this paper, we systematically review strategies and methods for constructing reduced amino acid (RAA) alphabets and summarize their main applications in protein sequence alignment, functional classification, and prediction of structural properties. Finally, we give a comprehensive analysis of 672 RAA alphabets derived from 74 reduction methods.
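To illustrate the dimensionality argument, the sketch below maps the 20 standard amino acids onto 5 groups. With this grouping, a dipeptide-composition feature shrinks from 20 × 20 = 400 dimensions to 5 × 5 = 25. The 5-group grouping is a hypothetical example chosen only for clarity. It is not one of the 672 alphabets analyzed in the paper.

```python
from itertools import product

# Hypothetical 5-group reduced alphabet (illustrative only, not from the paper).
REDUCTION = {
    **dict.fromkeys("AVLIMC", "h"),  # aliphatic / hydrophobic
    **dict.fromkeys("FWYH", "a"),    # aromatic (H grouped here for simplicity)
    **dict.fromkeys("STNQ", "p"),    # polar, uncharged
    **dict.fromkeys("KR", "b"),      # basic
    **dict.fromkeys("DEGP", "o"),    # acidic plus the special residues G and P
}
GROUPS = sorted(set(REDUCTION.values()))


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(REDUCTION[aa] for aa in seq.upper())


def dipeptide_composition(reduced):
    """Relative frequency of each ordered pair of reduced letters (len(GROUPS)**2 features)."""
    pairs = ["".join(p) for p in product(GROUPS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(reduced) - 1):
        counts[reduced[i:i + 2]] += 1
    total = max(len(reduced) - 1, 1)
    return {pair: n / total for pair, n in counts.items()}


print(reduce_sequence("MKVLD"))  # hbhho
```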

10.
Curr Res Struct Biol ; 4: 134-145, 2022.
Article in English | MEDLINE | ID: mdl-35586857

ABSTRACT

Proteins perform their function by accessing a suitable conformer from the ensemble of available conformations. The conformational diversity of a chosen protein structure can be obtained by experimental methods under different conditions. A key issue is the accurate comparison of different conformations. The gold standard for such a comparison is the root mean square deviation (RMSD) between the two structures. While extensive refinements of RMSD evaluation at the backbone level are available, a comprehensive framework that includes side-chain interactions is not well established. Here we employ the protein structure network (PSN) formalism, with non-covalent side-chain interactions treated explicitly. The resulting PSNs are compared with a graph spectral method, which provides a comparison at both the local and the global structural level. In this work, PSNs of multiple crystal conformers of single-chain, single-domain proteins are subjected to pairwise analysis to examine the dissimilarity of their network topologies and thereby determine the conformational diversity of their native structures. This information is used to classify the structural domains of proteins into different categories. We observe that proteins typically tend to retain structure and interactions at the backbone level. However, some also show variability in their overall structure, in their inter-residue connectivity at the side-chain level, or in both. The variability of sub-networks defined by solvent accessibility and secondary structure is also studied. Specific interaction types are found to contribute differently to structural variability. An ensemble analysis, which computes the variance of edge weights across multiple conformers, provided information on how much each PSN edge contributes to the overall variability.
Interactions that are highly variable are identified and their impact on structure variability has been discussed with the help of a case study. The classification based on the present side-chain network-based studies provides a framework to correlate the structure-function relationships in protein structures.
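As a reference point for the RMSD baseline mentioned above, RMSD = sqrt((1/N) Σ ||a_i - b_i||²) over N paired atoms. The minimal sketch below assumes the two structures are already superimposed and their atoms paired one-to-one (for example, matching C-alpha atoms). A real comparison would first apply an optimal superposition such as the Kabsch algorithm, which this sketch omits.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and atoms are paired
    one-to-one; no optimal fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # same shape shifted by 1 along z
print(rmsd(a, b))  # 1.0
```

Because RMSD only sees atom positions, two conformers with the same backbone but rewired side-chain contacts can score as nearly identical. That limitation is what motivates the network-based comparison.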

11.
Front Genet ; 13: 875112, 2022.
Article in English | MEDLINE | ID: mdl-35547252

ABSTRACT

The major histocompatibility complex (MHC) is a large locus on vertebrate DNA that contains a tightly linked set of polymorphic genes encoding cell-surface proteins essential for the adaptive immune system. Because the proteins encoded in the MHC play an important role in adaptive immunity, accurate identification of the MHC is necessary to understand its role. In this study, an effective predictor called PredMHC is established to identify the MHC from protein sequences. First, PredMHC encodes a protein sequence with mixed features, including 188D, APAAC, KSCTriad, CKSAAGP, and PAAC. Second, three classifiers (SGD, SMO, and random forest) are trained on these mixed features. Finally, the prediction is obtained by voting among the three classifiers. In 10-fold cross-validation on the training dataset, PredMHC achieved 91.69% accuracy. Comparisons with other features, classifiers, and existing methods showed the effectiveness of PredMHC in predicting the MHC.
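The final voting step can be sketched as follows, assuming simple hard (majority) voting over per-sample labels. The abstract says only "voting", so soft voting on probabilities is also possible. The three prediction lists are hypothetical placeholders for the outputs of the SGD, SMO, and random forest models.

```python
from collections import Counter


def majority_vote(*predictions):
    """Combine per-sample labels from several classifiers by hard majority voting.

    Each argument is one classifier's list of predicted labels. With three
    classifiers and binary labels, a tie is impossible.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical outputs of the three base classifiers on four sequences.
sgd = [1, 0, 1, 1]
smo = [1, 1, 0, 1]
rf = [0, 0, 1, 1]
print(majority_vote(sgd, smo, rf))  # [1, 0, 1, 1]
```

An odd number of voters is a convenient design choice for binary tasks because it guarantees a strict majority for every sample.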

12.
Comput Biol Chem ; 98: 107680, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35421797

ABSTRACT

Membrane protein classification is key to inferring the function of uncharacterized membrane proteins. To avoid time-consuming and expensive wet-lab biochemical experiments, much research has focused on developing fast and reliable bioinformatics or computer modeling methods for membrane protein prediction. However, most studies tend to incorporate as many types of protein data as possible, whereas in many cases the number of accessible protein data types is quite limited. To address this challenge, we developed a channel-attention-adapted deep learning model that takes the position-specific scoring matrix (PSSM) as its only input, together with a simplified version without channel attention. They are named SE-BLTCNN and BLTCNN, respectively (abbreviations for "SE embedded BiLSTM-TextCNN" and "plain BiLSTM-TextCNN"). The basic idea is to embed the Squeeze-and-Excitation (SE) block into the architecture of the text convolutional neural network (TextCNN) and combine the resulting architecture with a bidirectional long short-term memory (BiLSTM) layer. An ablation experiment is also conducted to verify the effectiveness of using BiLSTM to extract high-level features from the PSSM. On the benchmark sample set, BLTCNN achieves an average precision as high as 96.2% and, to the best of our knowledge, is the state-of-the-art membrane protein type predictor based solely on PSSM data. SE-BLTCNN is second-best among all compared methods, with a slightly lower average precision of 95.7%. In addition, empirical analysis pinpointed excessive zero-padding in the training examples as the major cause of SE-BLTCNN's performance loss, and experiments on an adjusted sample set confirmed that SE-BLTCNN outperforms BLTCNN once this cause is suppressed. The core code and dataset are available at https://github.com/Raymond-2017/membrane-protein-classifiction.


Subject(s)
Deep Learning, Attention, Membrane Proteins, Neural Networks, Computer, Position-Specific Scoring Matrices
13.
BMC Bioinformatics ; 23(1): 148, 2022 Apr 24.
Article in English | MEDLINE | ID: mdl-35462533

ABSTRACT

BACKGROUND: SNARE proteins play an important role in various biological functions. This study investigates how a new class of molecular descriptors (called SNARER), related to the chemical-physical properties of proteins, contributes to the performance of binary classifiers for SNARE proteins. RESULTS: We constructed a balanced SNARE protein dataset, D128, and an unbalanced one, DUNI. On these datasets, we tested and compared the performance of the new descriptors presented here in combination with feature sets already present in the literature (GAAC, CTDT, CKSAAP, and 188D). The machine learning algorithms used were Random Forest, k-Nearest Neighbors, and AdaBoost, and oversampling and subsampling techniques were applied to the unbalanced dataset. Adding the SNARER descriptors increases precision for all the ML algorithms considered. In particular, on the unbalanced DUNI dataset, accuracy increases together with sensitivity. On the balanced D128 dataset, accuracy increases relative to the counterpart without SNARER descriptors, with a strong improvement in specificity. Our best result combines our SNARER descriptors with the CKSAAP feature set on dataset D128 using the RF algorithm, with 92.3% accuracy, 90.1% sensitivity, and 95% specificity. CONCLUSIONS: The analysis showed that introducing molecular descriptors linked to the chemical-physical and structural characteristics of proteins can improve classification performance. It also showed that performance can change depending on whether a balanced or unbalanced dataset is used, and that training on a balanced dataset can significantly improve prediction accuracy.
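The abstract does not say which oversampling technique was applied to DUNI. The simplest option is random oversampling, which duplicates randomly chosen minority-class samples until every class is as frequent as the largest one. The sketch below assumes that technique. SMOTE-style synthetic sampling is a common alternative.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))  # sample with replacement from this class
            out_y.append(cls)
    return out_x, out_y
```

Oversampling should be applied only to the training portion of each split. Otherwise, duplicated samples can leak into the evaluation data and inflate the reported scores.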


Subject(s)
Machine Learning, SNARE Proteins, Algorithms, Cluster Analysis
14.
Mol Biol Evol ; 39(4)2022 04 10.
Article in English | MEDLINE | ID: mdl-35353898

ABSTRACT

Functional classification of proteins from sequences alone has become a critical bottleneck in understanding the myriad of protein sequences that accumulate in our databases. The great diversity of homologous sequences hides, in many cases, a variety of functional activities that cannot be anticipated. Their identification appears critical for a fundamental understanding of the evolution of living organisms and for biotechnological applications. ProfileView is a sequence-based computational method, designed to functionally classify sets of homologous sequences. It relies on two main ideas: the use of multiple profile models whose construction explores evolutionary information in available databases, and a novel definition of a representation space in which to analyze sequences with multiple profile models combined together. ProfileView classifies protein families by enriching known functional groups with new sequences and discovering new groups and subgroups. We validate ProfileView on seven classes of widespread proteins involved in the interaction with nucleic acids, amino acids and small molecules, and in a large variety of functions and enzymatic reactions. ProfileView agrees with the large set of functional data collected for these proteins from the literature regarding the organization into functional subgroups and residues that characterize the functions. In addition, ProfileView resolves undefined functional classifications and extracts the molecular determinants underlying protein functional diversity, showing its potential to select sequences towards accurate experimental design and discovery of novel biological functions. On protein families with complex domain architecture, ProfileView functional classification reconciles domain combinations, unlike phylogenetic reconstruction. 
ProfileView outperforms the functional classification approach PANTHER, the two k-mer-based methods CUPP and eCAMI, and a neural network approach based on Restricted Boltzmann Machines, and it overcomes the time-complexity limitations of the latter.


Subject(s)
Evolution, Molecular, Proteins, Amino Acid Sequence, Databases, Protein, Phylogeny, Plant Extracts, Proteins/chemistry, Proteins/genetics
15.
Front Genet ; 12: 797641, 2021.
Article in English | MEDLINE | ID: mdl-34887905

ABSTRACT

Hormone-binding proteins (HBPs) are soluble carrier proteins that interact selectively with different types of hormones and have various effects on the body's life activities. HBPs play an important role in the growth of organisms, but their specific role is still unclear. Correctly identifying HBPs is therefore the first step toward understanding and studying their biological function. However, traditional biochemical experiments are costly and time-consuming, which makes it difficult to identify HBPs among an increasing number of proteins, so characterizing HBPs has become a challenging task for researchers. An accurate and reliable prediction model for identifying HBPs is therefore desirable. In this paper, we construct the prediction model HBP_NB. First, HBP data were collected from the UniProt database and a dataset was established. Second, the k-mer (k = 3) feature representation method was used to extract features from this high-quality dataset. Third, a feature selection algorithm was used to reduce the dimensionality of the extracted features and select an optimal feature set. Finally, the selected features were input into Naive Bayes to construct the prediction model, which was evaluated with 10-fold cross-validation. The final results were 95.45% accuracy, 94.17% sensitivity, and 96.73% specificity, which indicates that the model is feasible and effective.
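The k-mer (k = 3) step can be sketched as a normalized count vector over all 20³ = 8000 amino-acid trigrams. This is a generic k-mer encoding under the assumption that frequencies are normalized by the number of valid k-mers. The paper's exact normalization and its subsequent feature-selection step are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(seq, k=3):
    """Normalized counts of all 20**k amino-acid k-mers (8000 features for k = 3).

    k-mers containing non-standard residues (e.g. X or U) are skipped.
    """
    seq = seq.upper()
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for short sequences
    return [counts[kmer] / total for kmer in vocab]
```

The resulting vectors are sparse and high-dimensional, which is why a feature-selection step before the classifier, as described above, is useful.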

16.
Math Biosci Eng ; 18(5): 5943-5958, 2021 06 30.
Article in English | MEDLINE | ID: mdl-34517517

ABSTRACT

A neurotoxin is a protein that acts mainly on the nervous system. It has a selective toxic effect on the central nervous system and neuromuscular junctions, can cause muscle and respiratory paralysis, and is highly lethal. According to their mode of action, neurotoxins are divided into presynaptic and postsynaptic neurotoxins. Correctly identifying presynaptic and postsynaptic neurotoxins provides important clues for future drug development and the discovery of drug targets. We therefore constructed a predictive model, Neu_LR. The monoMonokGap method was used to extract frequency features from presynaptic and postsynaptic neurotoxin sequences, followed by feature selection. The prediction model Neu_LR was then built with a logistic regression algorithm on the important features retained after dimensionality reduction, and it was validated with ten-fold cross-validation and an independent test set. The final accuracy rates were 99.6078% and 94.1176%, respectively, showing that the Neu_LR model has good predictive performance and robustness and can meet the requirements for predicting presynaptic and postsynaptic neurotoxins. The data and source code of the model can be freely downloaded from https://github.com/gyx123681/.
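Ten-fold cross-validation partitions the data into ten folds. Each fold serves once as the test set while the other nine are used for training. The minimal sketch below generates the index splits, assuming a single random shuffle. Stratified splitting, which preserves class proportions per fold, is a common refinement that this sketch does not implement.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once and dealt into k folds whose sizes differ by at
    most one; every sample appears in exactly one test fold.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

The reported cross-validation accuracy is then the accuracy pooled or averaged over the ten test folds, which is distinct from the separate independent-test-set figure.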


Subject(s)
Algorithms, Neurotoxins, Neurotoxins/toxicity
17.
Comput Struct Biotechnol J ; 18: 1904-1913, 2020.
Article in English | MEDLINE | ID: mdl-32774785

ABSTRACT

Chaos Game Representation (CGR) was first proposed as an image representation method for DNA and has since been extended to other biological macromolecules. In CGR images of DNA, sequences are converted into a series of points in the unit square. By comparison, existing CGR images of proteins are less elegant geometrically, and the meaning of the distribution of points in the image is less obvious. In this study, by naturally distributing the twenty amino acids on the vertices of a regular dodecahedron, we introduce a novel three-dimensional image representation of protein sequences with the CGR method. We also associate each CGR image with a vector in high-dimensional Euclidean space, called the extended natural vector (ENV), in order to analyze the information contained in the CGR images. Based on the results of protein classification and phylogenetic analysis, our method could serve as a precise method for discovering biological relationships between proteins.
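For reference, the classical DNA case can be sketched in a few lines. Each nucleotide is assigned a corner of the unit square, and each new point lies halfway between the previous point and the corner of the current nucleotide. The corner assignment below is one common convention and is an assumption here. The paper's protein version replaces the square with the 20 vertices of a regular dodecahedron, which this sketch does not implement.

```python
# One common corner convention (an assumption; other papers permute the corners).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def dna_cgr(seq, start=(0.5, 0.5)):
    """Chaos Game Representation of a DNA sequence as points in the unit square."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the base's corner
        points.append((x, y))
    return points


print(dna_cgr("AC"))  # [(0.25, 0.25), (0.125, 0.625)]
```

Because each step halves the distance to a corner, the position of a point encodes the recent suffix of the sequence. Sequences sharing a suffix therefore land in the same sub-square, which is the geometric property the dodecahedral protein version aims to carry into three dimensions.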

18.
Foods ; 9(9)2020 Aug 21.
Article in English | MEDLINE | ID: mdl-32825591

ABSTRACT

In the 21st century, we face a troubling trilemma of expanding populations, planetary wellbeing and public wellbeing. Given this, shifts from animal to plant food protein are gaining momentum and are an important part of reducing carbon emissions and consumptive water use. However, as this fast pace of change sets in and begins to embed itself firmly within food-based dietary guidelines (FBDG) and food policies, we must raise an important question: is now an opportune time to include other novel, nutritious and sustainable proteins within FBDG? The current paper describes how food proteins are typically categorised within FBDG and discusses how these categories could evolve further. Presently, food proteins tend to fall under the umbrella of being 'animal-derived' or 'plant-based', whilst other valuable proteins, e.g., fungal-derived proteins, appear to be comparatively overlooked. A PubMed search of systematic reviews and meta-analyses published over the last 5 years shows an established body of evidence for animal-derived proteins (although some findings were less favourable) and plant-based proteins, and an expanding body of science for mycelium/fungal-derived proteins. Given this, along with elevated demand for alternative proteins, there appears to be scope to introduce a 'third' protein category when compiling FBDG. This could fall under the potential heading of 'fungal' protein, with scope to include mycelium-derived proteins such as mycoprotein, for which the evidence base is accruing.

19.
Proteins ; : e25993, 2020 Aug 11.
Article in English | MEDLINE | ID: mdl-32779779

ABSTRACT

This article reports the results of research aimed at translating biometric 3D face recognition concepts and algorithms into protein biophysics in order to classify the morphological features of protein surfaces precisely and rapidly. Both human faces and protein surfaces are free-form surfaces, so descriptors from differential geometry can be used to describe them by applying the feature-extraction principles developed for computer vision and pattern recognition. The first part of this study focused on building the protein dataset using a simulation tool and performing feature extraction with novel geometrical descriptors. The second part tested the method on two examples: the first involved classifying tubulin isotypes, and the second compared tubulin with FtsZ, its bacterial analog. An additional test involved several unrelated proteins. Two classification methodologies were used: a classic supervised approach with a support vector machine (SVM) classifier and unsupervised learning with k-means. The best result was obtained with an SVM using the radial basis function (RBF) kernel. The results are significant and competitive with state-of-the-art protein classification methods, opening a new methodological direction in protein structure analysis.
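The abstract does not list its novel descriptors. As an illustration only, the sketch computes two classic differential-geometry descriptors often used in 3D face recognition, the shape index (in one common sign convention) and curvedness, from principal curvatures k1 and k2 that are assumed to be already estimated at each surface point. It also shows the Gaussian RBF kernel used by the best-performing SVM. Function names and the default `gamma` are hypothetical, and none of this is presented as the authors' descriptor set.

```python
import math


def shape_index(k1, k2):
    """Shape index in [-1, 1] from principal curvatures (one common
    convention). Undefined for flat points (k1 == k2 == 0); atan2 then
    returns 0.0, so this function returns 0.0 there."""
    k1, k2 = max(k1, k2), min(k1, k2)  # enforce k1 >= k2
    return (2 / math.pi) * math.atan2(k1 + k2, k1 - k2)


def curvedness(k1, k2):
    """Magnitude of curvature, independent of the local shape type."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2)


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2), as used by kernel SVMs."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    print(shape_index(1.0, 1.0))   # umbilic point: 1.0 in this convention
    print(shape_index(1.0, -1.0))  # symmetric saddle: 0.0
    print(curvedness(1.0, -1.0))   # 1.0
```

The shape index separates the type of local shape (cap, ridge, saddle, rut, cup) from its scale, which curvedness carries; whether +1 denotes a cap or a cup depends on the orientation of the surface normal.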

20.
BMC Genomics ; 21(1): 463, 2020 Jul 06.
Article in English | MEDLINE | ID: mdl-32631258

ABSTRACT

BACKGROUND: We performed an in-depth analysis of the ABC gene family in Aedes aegypti (Diptera: Culicidae), an important vector of arthropod-borne viral infections such as chikungunya, dengue, and Zika. Despite its importance, previous studies of the arthropod ABC family have not focused on this species. Reports of insecticide resistance among pests and vectors indicate that some of these ATP-dependent efflux pumps are involved in compound traffic and multidrug resistance phenotypes. RESULTS: We identified 53 classic, complete ABC proteins annotated in the A. aegypti genome. A phylogenetic analysis of A. aegypti ABC proteins was carried out to assign the novel proteins to ABC subfamilies. We also determined 9 full-length sequences of DNA repair (MutS, RAD50) and structural maintenance of chromosomes (SMC) proteins that contain the ABC signature. CONCLUSIONS: After including the putative ABC proteins in the evolutionary tree of the gene family, we classified A. aegypti ABC proteins into the established subfamilies (A to H). However, the phylogenetic positioning of MutS, RAD50 and SMC proteins among the ABC subfamilies, as well as the highly supported grouping of RAD50 and SMC, prompted us to name a new J subfamily of A. aegypti ABC proteins.


Subject(s)
ATP-Binding Cassette Transporters/classification, Aedes/genetics, Insect Proteins/classification, ATP-Binding Cassette Transporters/genetics, Animals, Insect Proteins/genetics, Multigene Family, Phylogeny
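The abstract above does not describe how proteins containing the ABC signature were detected. A minimal first-pass screen, assuming the canonical ABC signature motif LSGGQ, could parse a FASTA file and report where the motif occurs. The function names are hypothetical. Because real signatures can deviate from the canonical sequence, an exact match will miss divergent members; profile-based searches are the usual remedy.

```python
import re

# Canonical ABC signature motif. Family members can deviate from it, so an
# exact pattern is only a first-pass filter.
ABC_SIGNATURE = re.compile("LSGGQ")


def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}, using the first
    whitespace-separated token of each header as the identifier."""
    records, name, chunks = {}, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def scan_signature(records, pattern=ABC_SIGNATURE):
    """Map each identifier to the 1-based start positions of pattern
    matches, keeping only sequences with at least one hit."""
    hits = {}
    for name, seq in records.items():
        starts = [m.start() + 1 for m in pattern.finditer(seq)]
        if starts:
            hits[name] = starts
    return hits


if __name__ == "__main__":
    demo = ">seq1 example\nMKLSGG\nQAA\n>seq2\nMKKK\n"
    print(scan_signature(parse_fasta(demo)))  # {'seq1': [3]}
```

Sequences are joined before scanning, so a motif split across FASTA line breaks is still found.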