Search | BVS Regional Portal

Parallel convolutional contrastive learning method for enzyme function prediction.

Yu, Xindi; Zhou, Shusen; Zang, Mujun; Wang, Qingjun; Liu, Chanjuan; Liu, Tong.

IEEE/ACM Trans Comput Biol Bioinform ; PP, 2024 Aug 21.

Article in English | MEDLINE | ID: mdl-39167509

ABSTRACT

Enzyme function annotation has wide application in medicine, industrial biology, and other fields. Scientists define enzyme categories by Enzyme Commission (EC) numbers. Although some tools for enzyme function prediction exist, their performance has not yet reached the level needed for practical application. To improve the precision of enzyme function prediction, we propose a parallel convolutional contrastive learning (PCCL) method. First, we use the protein language model ESM-2 to preprocess the protein sequences. Second, PCCL combines convolutional neural networks (CNNs) and contrastive learning to improve the prediction precision for multifunctional enzymes. Contrastive learning helps the model deal with class imbalance. Finally, the deep learning framework is mainly composed of three parallel CNNs that extract sample features. We compare PCCL with state-of-the-art enzyme function prediction methods on three evaluation metrics. Our model improves performance on both test sets. In particular, on the smaller test set, PCCL improves the AUC by 2.57%. The source code can be downloaded from https://github.com/biomg/PCCL.
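
The abstract does not state which contrastive objective PCCL uses. As a rough illustration only, the sketch below implements a generic supervised contrastive loss in plain Python: embeddings that share a label are pulled together and all others are pushed apart. The function names, the cosine similarity, the temperature value, and the single-label simplification (real enzymes can carry several EC numbers) are assumptions, not details from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (hypothetical stand-in for PCCL's loss).

    Averages over anchors that have at least one same-label partner.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a positive contributes nothing
        # Similarity of the anchor to every other sample, scaled by temperature.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Negative mean log-probability of picking a positive.
        loss_i = -sum(logits[p] - log_denom for p in positives) / len(positives)
        losses.append(loss_i)
    return sum(losses) / len(losses) if losses else 0.0

# Toy 2-D embeddings, invented for illustration only.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
well_separated = supervised_contrastive_loss(toy, ["a", "a", "b", "b"])
badly_mixed = supervised_contrastive_loss(toy, ["a", "b", "a", "b"])
```

With these toy vectors, the loss is small when same-label embeddings sit close together and large when they do not, which is the behavior a contrastive objective is meant to reward.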

TSVM: Transfer Support Vector Machine for Predicting MPRA Validated Regulatory Variants.

Li, Minglie; Zhou, Shusen; Liu, Tong; Liu, Chanjuan; Zang, Mujun; Wang, Qingjun.

IEEE/ACM Trans Comput Biol Bioinform ; 21(3): 472-479, 2024.

Article in English | MEDLINE | ID: mdl-38451770

ABSTRACT

Genome-wide association studies have shown that common genetic variants associated with complex diseases are mostly located in non-coding regions, and these variants may not be causal. In addition, the limited number of validated non-coding functional variants makes it difficult to develop an effective supervised learning model. Therefore, improving the accuracy of predicting non-coding causal variants has become critical. This study aims to build a transfer learning-based machine learning method for predicting regulatory variants to overcome the problem of limited sample size. This paper presents a supervised learning method, transfer support vector machine (TSVM), for predicting regulatory variants validated by massively parallel reporter assays (MPRA). First, TSVM uses a convolutional neural network to extract features with transfer learning. Second, the extracted features are selected by a random forest method. Third, the selected features are used to train a support vector machine for classification. We performed scale sensitivity experiments on the MPRA dataset and validated the effectiveness of transfer learning. The model achieves an MCC of 0.326 and an AUC of 0.720, both higher than those of the state-of-the-art method.

Subject(s)

Computational Biology; Support Vector Machine; Computational Biology/methods; Humans; Genetic Variation/genetics; Algorithms; Genome-Wide Association Study/methods
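
The TSVM abstract reports results as MCC and AUC. For readers unfamiliar with these metrics, the sketch below computes both in plain Python. The function names and toy inputs are illustrative assumptions. The paper's own evaluation code is not shown in the abstract.

```python
import math

def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Requires at least one positive and one negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy values, invented for illustration only.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(roc_auc(toy_scores, toy_labels))            # 3 of 4 pairs ordered correctly -> 0.75
print(matthews_corrcoef(tp=5, fp=0, tn=5, fn=0))  # perfect classifier -> 1.0
```

MCC ranges from -1 to 1 and stays informative when classes are imbalanced, which is relevant when validated functional variants are scarce.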

WVDL: Weighted Voting Deep Learning Model for Predicting RNA-Protein Binding Sites.

Pan, Zhengsen; Zhou, Shusen; Liu, Tong; Liu, Chanjuan; Zang, Mujun; Wang, Qingjun.

IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3322-3328, 2023.

Article in English | MEDLINE | ID: mdl-37028092

ABSTRACT

RNA-binding proteins are important for cellular life activities. High-throughput experimental techniques for discovering RNA-protein binding sites are time-consuming and expensive. Deep learning is an effective approach for predicting RNA-protein binding sites. Using a weighted voting method to integrate multiple base classifier models can improve model performance. Thus, in our study, we propose a weighted voting deep learning model (WVDL), which uses weighted voting to combine a convolutional neural network (CNN), a long short-term memory network (LSTM), and a residual network (ResNet). First, the final prediction of WVDL outperforms the base classifier models and other ensemble strategies. Second, WVDL can extract more effective features by using weighted voting to find the best weighted combination. In addition, the CNN model can draw the predicted motif pictures. Third, WVDL achieves competitive experimental results on the public RBP-24 datasets compared with other state-of-the-art methods. The source code of our proposed WVDL can be found at https://github.com/biomg/WVDL.

Subject(s)

Deep Learning; RNA; Protein Binding; RNA/chemistry; Binding Sites; RNA-Binding Proteins/chemistry
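
The WVDL abstract describes weighted voting over three base classifiers and a search for the best weight combination, but it does not say how that search is done. The sketch below shows one plausible reading: a coarse grid search over weights that sum to 1, scored by validation accuracy. The function names, the grid resolution, the accuracy criterion, and the toy probabilities are all assumptions.

```python
from itertools import product

def weighted_vote(prob_lists, weights):
    """Weighted average of per-model positive-class probabilities."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, probs)) / total
            for probs in zip(*prob_lists)]

def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)

def search_weights(prob_lists, labels, steps=10):
    """Grid search over non-negative weights summing to 1 (hypothetical procedure)."""
    best_weights, best_acc = None, -1.0
    for combo in product(range(steps + 1), repeat=len(prob_lists)):
        if sum(combo) != steps:
            continue  # keep only points on the simplex
        weights = [c / steps for c in combo]
        acc = accuracy(weighted_vote(prob_lists, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc

# Toy validation outputs standing in for CNN, LSTM, and ResNet (invented values).
cnn = [0.9, 0.4, 0.2, 0.6]
lstm = [0.6, 0.7, 0.4, 0.3]
resnet = [0.7, 0.6, 0.1, 0.45]
labels = [1, 1, 0, 0]
best_weights, best_acc = search_weights([cnn, lstm, resnet], labels)
```

Because the grid includes the corner points where a single model gets all the weight, the selected ensemble can never score below the best individual model on the validation data used for the search.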

CRMSNet: A deep learning model that uses convolution and residual multi-head self-attention block to predict RBPs for RNA sequence.

Pan, Zhengsen; Zhou, Shusen; Zou, Hailin; Liu, Chanjuan; Zang, Mujun; Liu, Tong; Wang, Qingjun.

Proteins ; 91(8): 1032-1041, 2023 08.

Article in English | MEDLINE | ID: mdl-36935548

ABSTRACT

RNA-binding proteins (RBPs) play significant roles in many biological activities, and many algorithms and tools have been proposed to predict RBPs for studying the biological mechanisms of RNA-protein binding sites. Deep learning algorithms based on traditional machine learning achieve better results for predicting RBPs. Recently, deep learning methods fused with attention mechanisms have attracted great attention in many fields and achieved competitive results. Thus, an attention mechanism module may also improve model performance for predicting RNA-protein binding sites. In this study, we propose the convolutional residual multi-head self-attention network (CRMSNet), which combines a convolutional neural network (CNN), ResNet, and multi-head self-attention blocks to find RBPs for RNA sequences. First, CRMSNet incorporates convolutional neural networks, recurrent neural networks, and a multi-head self-attention block. Second, CRMSNet can draw binding motif pictures from the convolutional layer parameters. Third, the attention mechanism module combines local and global RNA sequence information to capture long-sequence features. CRMSNet achieves competitive AUC (area under the receiver operating characteristic [ROC] curve) results on the large-scale dataset RBP-24. The experimental results of CRMSNet are also compared with those of other state-of-the-art methods. The source code of our proposed CRMSNet method can be found at https://github.com/biomg/CRMSNet.

Subject(s)

Deep Learning; Base Sequence; Neural Networks, Computer; RNA/chemistry; RNA-Binding Proteins/chemistry
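
The CRMSNet abstract names a residual multi-head self-attention block without giving its layout. The sketch below shows the generic mechanism only: scaled dot-product self-attention applied per head, followed by a residual addition. It omits the learned query, key, value, and output projections that a trained model would have, and every function name here is an assumption rather than the authors' code.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention; q, k, v are lists of row vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

def multi_head_self_attention(x, num_heads):
    """Split features into heads, attend within each head, then concatenate.

    Simplification: no learned projections, so each head attends over
    its raw slice of the input features.
    """
    d = len(x[0])
    assert d % num_heads == 0, "feature size must divide evenly into heads"
    head_dim = d // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * head_dim:(h + 1) * head_dim] for row in x]
        heads.append(attention(part, part, part))
    return [sum((heads[h][i] for h in range(num_heads)), [])
            for i in range(len(x))]

def residual_block(x, num_heads):
    """Residual connection: input plus its self-attention output."""
    attended = multi_head_self_attention(x, num_heads)
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, attended)]

# Toy sequence of two positions with four features each (invented values).
toy_x = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
toy_out = residual_block(toy_x, num_heads=2)
```

Each position's output mixes information from every other position, which is how attention lets a model relate distant parts of a sequence. The residual addition keeps the original signal available to later layers.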

MCNN: Multiple Convolutional Neural Networks for RNA-Protein Binding Sites Prediction.

Pan, Zhengsen; Zhou, Shusen; Zou, Hailin; Liu, Chanjuan; Zang, Mujun; Liu, Tong; Wang, Qingjun.

IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1180-1187, 2023.

Article in English | MEDLINE | ID: mdl-35471886

ABSTRACT

Computational prediction of RBP binding sites using features learned from existing annotation knowledge is an effective approach because high-throughput experiments are complex, expensive, and time-consuming. Many methods have been proposed to predict RNA-protein binding sites. However, the partial information of RNA sequences is not fully used. In this study, we propose the multiple convolutional neural networks (MCNN) method, which predicts RNA-protein binding sites by integrating multiple convolutional neural networks built on RNA sequence information extracted from windows of different lengths. First, MCNN trains multiple CNNs based on RNA sequences extracted with different window lengths. Second, MCNN can extract more binding patterns of RBPs by combining these previously trained CNNs. Third, MCNN uses only RNA base sequence information for RNA-protein binding site prediction, extracting sequence binding features and predicting the result with the same architecture. This avoids the information loss of a separate feature extraction step. Our proposed MCNN demonstrates competitive performance compared with other methods on a large-scale dataset derived from CLIP-seq, indicating that it is an effective method for RNA-protein binding site prediction. The source code of our proposed MCNN method can be found at https://github.com/biomg/MCNN.

Subject(s)

RNA-Binding Proteins; RNA; Protein Binding/genetics; RNA/chemistry; RNA-Binding Proteins/chemistry; Binding Sites; Neural Networks, Computer
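
The MCNN abstract says each CNN is trained on RNA subsequences extracted with a different window length. The sketch below illustrates only that input preparation step: one-hot encoding plus sliding windows at several lengths. The window lengths, the stride, the handling of unknown bases, and the function names are assumptions. The abstract does not give the values the authors used.

```python
# One-hot codes for the four RNA bases.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "U": [0, 0, 0, 1],
}

def one_hot(seq):
    """Encode a sequence; T is read as U, unknown bases become a uniform vector.

    The uniform vector for unknown bases is an assumed convention.
    """
    return [ONE_HOT.get(base, [0.25] * 4)
            for base in seq.upper().replace("T", "U")]

def windows(seq, length, stride):
    """Overlapping windows of a fixed length; a trailing remainder is dropped."""
    return [seq[i:i + length]
            for i in range(0, len(seq) - length + 1, stride)]

def multi_window_inputs(seq, lengths=(5, 8), stride=2):
    """One list of encoded windows per window length, one per hypothetical CNN."""
    return {length: [one_hot(w) for w in windows(seq, length, stride)]
            for length in lengths}

# Toy sequence, invented for illustration only.
inputs = multi_window_inputs("ACGUACGUAC")
for length, encoded in inputs.items():
    print(length, len(encoded))  # window length and number of windows
```

Each entry of the returned dictionary would feed a separate CNN, so that short windows expose local motifs and longer windows expose wider sequence context.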

Active semi-supervised learning method with hybrid deep belief networks.

Zhou, Shusen; Chen, Qingcai; Wang, Xiaolong.

PLoS One ; 9(9): e107122, 2014.

Article in English | MEDLINE | ID: mdl-25208128

ABSTRACT

In this paper, we develop a novel semi-supervised learning algorithm called active hybrid deep belief networks (AHD) to address the semi-supervised sentiment classification problem with deep learning. First, we construct the first several hidden layers using restricted Boltzmann machines (RBMs), which can reduce the dimension and abstract the information of the reviews quickly. Second, we construct the following hidden layers using convolutional restricted Boltzmann machines (CRBMs), which can abstract the information of reviews effectively. Third, the constructed deep architecture is fine-tuned by gradient-descent-based supervised learning with an exponential loss function. Finally, an active learning method is combined with the proposed deep architecture. We conducted several experiments on five sentiment classification datasets and show that AHD is competitive with previous semi-supervised learning algorithms. Experiments are also conducted to verify the effectiveness of our proposed method with different numbers of labeled and unlabeled reviews.

Subject(s)

Algorithms; Data Mining/statistics & numerical data; Machine Learning; Culture; Internet; Pattern Recognition, Automated
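
The AHD abstract mentions fine-tuning with an exponential loss and an active learning step, without specifying either in detail. The sketch below shows the textbook form of the exponential loss for labels in {-1, +1} and a simple uncertainty-based selection rule. Treating the classifier output as a single real-valued score, and choosing the samples nearest the decision boundary, are assumptions. The paper's actual selection criterion may differ.

```python
import math

def exponential_loss(scores, labels):
    """Mean of exp(-y * f(x)) with labels y in {-1, +1}."""
    return sum(math.exp(-y * s) for s, y in zip(scores, labels)) / len(scores)

def select_most_uncertain(scores, k):
    """Indices of the k unlabeled samples whose scores lie closest to 0.

    Hypothetical active learning rule: query the reviews the current
    model is least sure about.
    """
    return sorted(range(len(scores)), key=lambda i: abs(scores[i]))[:k]

# Toy classifier scores for unlabeled reviews (invented values).
unlabeled_scores = [2.0, -0.1, 0.5, -3.0]
to_label = select_most_uncertain(unlabeled_scores, k=2)
print(to_label)  # indices of the two scores nearest zero -> [1, 2]
```

The exponential loss penalizes confident mistakes far more heavily than confident correct predictions, and the selected indices mark the reviews a human annotator would be asked to label next.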
