Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 26
Filtrar
Más filtros











Base de datos
Intervalo de año de publicación
1.
Adv Sci (Weinh) ; 11(32): e2404845, 2024 Aug.
Artículo en Inglés | MEDLINE | ID: mdl-39031820

RESUMEN

Constructing discriminative representations of molecules lies at the core of a number of domains such as drug discovery, chemistry, and medicine. State-of-the-art methods employ graph neural networks and self-supervised learning (SSL) to learn unlabeled data for structural representations, which can then be fine-tuned for downstream tasks. Albeit powerful, these methods are pre-trained solely on molecular structures and thus often struggle with tasks involved in intricate biological processes. Here, it is proposed to assist the learning of molecular representation by using the perturbed high-content cell microscopy images at the phenotypic level. To incorporate the cross-modal pre-training, a unified framework is constructed to align them through multiple types of contrastive loss functions, which is proven effective in the formulated novel tasks to retrieve the molecules and corresponding images mutually. More importantly, the model can infer functional molecules according to cellular images generated by genetic perturbations. In parallel, the proposed model can transfer non-trivially to molecular property predictions, and has shown great improvement over clinical outcome predictions. These results suggest that such cross-modality learning can bridge molecules and phenotype to play important roles in drug discovery.


Asunto(s)
Redes Neurales de la Computación , Humanos , Procesamiento de Imagen Asistido por Computador/métodos , Microscopía/métodos , Algoritmos , Aprendizaje Automático
2.
JACS Au ; 4(7): 2492-2502, 2024 Jul 22.
Artículo en Inglés | MEDLINE | ID: mdl-39055138

RESUMEN

Illuminating synthetic pathways is essential for producing valuable chemicals, such as bioactive molecules. Chemical and biological syntheses are crucial, and their integration often leads to more efficient and sustainable pathways. Despite the rapid development of retrosynthesis models, few of them consider both chemical and biological syntheses, hindering the pathway design for high-value chemicals. Here, we propose BioNavi by innovating multitask learning and reaction templates into the deep learning-driven model to design hybrid synthesis pathways in a more interpretable manner. BioNavi outperforms existing approaches on different data sets, achieving a 75% hit rate in replicating reported biosynthetic pathways and displaying superior ability in designing hybrid synthesis pathways. Additional case studies further illustrate the potential application of BioNavi in a de novo pathway design. The enhanced web server (http://biopathnavi.qmclab.com/bionavi/) simplifies input operations and implements step-by-step exploration according to user experience. We show that BioNavi is a handy navigator for designing synthetic pathways for various chemicals.

3.
Nat Commun ; 15(1): 4476, 2024 May 25.
Artículo en Inglés | MEDLINE | ID: mdl-38796523

RESUMEN

Protein functions are characterized by interactions with proteins, drugs, and other biomolecules. Understanding these interactions is essential for deciphering the molecular mechanisms underlying biological processes and developing new therapeutic strategies. Current computational methods mostly predict interactions based on either molecular network or structural information, without integrating them within a unified multi-scale framework. While a few multi-view learning methods are devoted to fusing the multi-scale information, these methods tend to rely intensively on a single scale and under-fitting the others, likely attributed to the imbalanced nature and inherent greediness of multi-scale learning. To alleviate the optimization imbalance, we present MUSE, a multi-scale representation learning framework based on a variant expectation maximization to optimize different scales in an alternating procedure over multiple iterations. This strategy efficiently fuses multi-scale information between atomic structure and molecular network scale through mutual supervision and iterative optimization. MUSE outperforms the current state-of-the-art models not only in molecular interaction (protein-protein, drug-protein, and drug-drug) tasks but also in protein interface prediction at the atomic structure scale. More importantly, the multi-scale learning framework shows potential for extension to other scales of computational drug discovery.


Asunto(s)
Biología Computacional , Proteínas , Proteínas/química , Proteínas/metabolismo , Biología Computacional/métodos , Algoritmos , Preparaciones Farmacéuticas/química , Preparaciones Farmacéuticas/metabolismo , Aprendizaje Automático , Interacciones Farmacológicas , Humanos , Unión Proteica
4.
J Chem Inf Model ; 64(6): 1945-1954, 2024 03 25.
Artículo en Inglés | MEDLINE | ID: mdl-38484468

RESUMEN

Self-supervised molecular representation learning has demonstrated great promise in bridging machine learning and chemical science to accelerate the development of new drugs. Due to the limited reaction data, existing methods are mostly pretrained by augmenting the intrinsic topology of molecules without effectively incorporating chemical reaction prior information, which makes them difficult to generalize to chemical reaction-related tasks. To address this issue, we propose ReaKE, a reaction knowledge embedding framework, which formulates chemical reactions as a knowledge graph. Specifically, we constructed a chemical synthesis knowledge graph with reactants and products as nodes and reaction rules as the edges. Based on the knowledge graph, we further proposed novel contrastive learning at both molecule and reaction levels to capture the reaction-related functional group information within and between molecules. Extensive experiments demonstrate the effectiveness of ReaKE compared with state-of-the-art methods on several downstream tasks, including reaction classification, product prediction, and yield prediction.


Asunto(s)
Aprendizaje Automático , Reconocimiento de Normas Patrones Automatizadas
5.
Nat Commun ; 15(1): 1071, 2024 Feb 05.
Artículo en Inglés | MEDLINE | ID: mdl-38316797

RESUMEN

While significant advances have been made in predicting static protein structures, the inherent dynamics of proteins, modulated by ligands, are crucial for understanding protein function and facilitating drug discovery. Traditional docking methods, frequently used in studying protein-ligand interactions, typically treat proteins as rigid. While molecular dynamics simulations can propose appropriate protein conformations, they're computationally demanding due to rare transitions between biologically relevant equilibrium states. In this study, we present DynamicBind, a deep learning method that employs equivariant geometric diffusion networks to construct a smooth energy landscape, promoting efficient transitions between different equilibrium states. DynamicBind accurately recovers ligand-specific conformations from unbound protein structures without the need for holo-structures or extensive sampling. Remarkably, it demonstrates state-of-the-art performance in docking and virtual screening benchmarks. Our experiments reveal that DynamicBind can accommodate a wide range of large protein conformational changes and identify cryptic pockets in unseen protein targets. As a result, DynamicBind shows potential in accelerating the development of small molecules for previously undruggable targets and expanding the horizons of computational drug discovery.


Asunto(s)
Simulación de Dinámica Molecular , Proteínas , Ligandos , Proteínas/metabolismo , Conformación Proteica , Descubrimiento de Drogas , Unión Proteica , Simulación del Acoplamiento Molecular
6.
Patterns (N Y) ; 3(12): 100653, 2022 Dec 09.
Artículo en Inglés | MEDLINE | ID: mdl-36569549

RESUMEN

Jiahua Rao and Shuangjia Zheng are Ph.D. students in Prof. Yang's lab (Supercomputing And AI for Life science, SAIL Lab) at Sun Yat-sen University. They recently developed an interpretable framework to quantitatively assess the interpretability of Graph Neural Network (GNN) and made comparison with medicinal chemists. Their meaningful benchmarking and rigorous framework would greatly benefit development of new interpretable methods in GNNs.

7.
Patterns (N Y) ; 3(12): 100628, 2022 Dec 09.
Artículo en Inglés | MEDLINE | ID: mdl-36569553

RESUMEN

Graph neural networks (GNNs) have received increasing attention because of their expressive power on topological data, but they are still criticized for their lack of interpretability. To interpret GNN models, explainable artificial intelligence (XAI) methods have been developed. However, these methods are limited to qualitative analyses without quantitative assessments from the real-world datasets due to a lack of ground truths. In this study, we have established five XAI-specific molecular property benchmarks, including two synthetic and three experimental datasets. Through the datasets, we quantitatively assessed six XAI methods on four GNN models and made comparisons with seven medicinal chemists of different experience levels. The results demonstrated that XAI methods could deliver reliable and informative answers for medicinal chemists in identifying the key substructures. Moreover, the identified substructures were shown to complement existing classical fingerprints to improve molecular property predictions, and the improvements increased with the growth of training data.

8.
J Chem Inf Model ; 62(23): 5907-5917, 2022 Dec 12.
Artículo en Inglés | MEDLINE | ID: mdl-36404642

RESUMEN

Fragment-based drug discovery is a widely used strategy for drug design in both academic and pharmaceutical industries. Although fragments can be linked to generate candidate compounds by the latest deep generative models, generating linkers with specified attributes remains underdeveloped. In this study, we presented a novel framework, DRlinker, to control fragment linking toward compounds with given attributes through reinforcement learning. The method has been shown to be effective for many tasks from controlling the linker length and log P, optimizing predicted bioactivity of compounds, to various multiobjective tasks. Specifically, our model successfully generated 91.0% and 93.9% of compounds complying with the desired linker length and log P and improved the 7.5 pChEMBL value in bioactivity optimization. Finally, a quasi-scaffold-hopping study revealed that DRlinker could generate nearly 30% molecules with high 3D similarity but low 2D similarity to the lead inhibitor, demonstrating the benefits and applicability of DRlinker in actual fragment-based drug design.


Asunto(s)
Diseño de Fármacos , Descubrimiento de Drogas
9.
Nat Commun ; 13(1): 3342, 2022 06 10.
Artículo en Inglés | MEDLINE | ID: mdl-35688826

RESUMEN

The complete biosynthetic pathways are unknown for most natural products (NPs), it is thus valuable to make computer-aided bio-retrosynthesis predictions. Here, a navigable and user-friendly toolkit, BioNavi-NP, is developed to predict the biosynthetic pathways for both NPs and NP-like compounds. First, a single-step bio-retrosynthesis prediction model is trained using both general organic and biosynthetic reactions through end-to-end transformer neural networks. Based on this model, plausible biosynthetic pathways can be efficiently sampled through an AND-OR tree-based planning algorithm from iterative multi-step bio-retrosynthetic routes. Extensive evaluations reveal that BioNavi-NP can identify biosynthetic pathways for 90.2% of 368 test compounds and recover the reported building blocks as in the test set for 72.8%, 1.7 times more accurate than existing conventional rule-based approaches. The model is further shown to identify biologically plausible pathways for complex NPs collected from the recent literature. The toolkit as well as the curated datasets and learned models are freely available to facilitate the elucidation and reconstruction of the biosynthetic pathways for NPs.


Asunto(s)
Productos Biológicos , Aprendizaje Profundo , Algoritmos , Vías Biosintéticas , Redes Neurales de la Computación
10.
J Chem Inf Model ; 62(5): 1308-1317, 2022 03 14.
Artículo en Inglés | MEDLINE | ID: mdl-35200015

RESUMEN

Identifying drug-protein interactions (DPIs) is crucial in drug discovery, and a number of machine learning methods have been developed to predict DPIs. Existing methods usually use unrealistic data sets with hidden bias, which will limit the accuracy of virtual screening methods. Meanwhile, most DPI prediction methods pay more attention to molecular representation but lack effective research on protein representation and high-level associations between different instances. To this end, we present the novel structure-aware multimodal deep DPI prediction model, STAMP-DPI, which was trained on a curated industry-scale benchmark data set. We built a high-quality benchmark data set named GalaxyDB for DPI prediction. This industry-scale data set along with an unbiased training procedure resulted in a more robust benchmark study. For informative protein representation, we constructed a structure-aware graph neural network method from the protein sequence by combining predicted contact maps and graph neural networks. Through further integration of structure-based representation and high-level pretrained embeddings for molecules and proteins, our model effectively captures the feature representation of the interactions between them. As a result, STAMP-DPI outperformed state-of-the-art DPI prediction methods by decreasing 7.00% mean square error (MSE) in the Davis data set and improving 8.89% area under the curve (AUC) in the GalaxyDB data set. Moreover, our model is an interpretable model with the transformer-based interaction mechanism, which can accurately reveal the binding sites between molecules and proteins.


Asunto(s)
Aprendizaje Profundo , Secuencia de Aminoácidos , Aprendizaje Automático , Redes Neurales de la Computación , Proteínas/química
11.
Nat Biomed Eng ; 6(1): 76-93, 2022 01.
Artículo en Inglés | MEDLINE | ID: mdl-34992270

RESUMEN

A reduced removal of dysfunctional mitochondria is common to aging and age-related neurodegenerative pathologies such as Alzheimer's disease (AD). Strategies for treating such impaired mitophagy would benefit from the identification of mitophagy modulators. Here we report the combined use of unsupervised machine learning (involving vector representations of molecular structures, pharmacophore fingerprinting and conformer fingerprinting) and a cross-species approach for the screening and experimental validation of new mitophagy-inducing compounds. From a library of naturally occurring compounds, the workflow allowed us to identify 18 small molecules, and among them two potent mitophagy inducers (Kaempferol and Rhapontigenin). In nematode and rodent models of AD, we show that both mitophagy inducers increased the survival and functionality of glutamatergic and cholinergic neurons, abrogated amyloid-ß and tau pathologies, and improved the animals' memory. Our findings suggest the existence of a conserved mechanism of memory loss across the AD models, this mechanism being mediated by defective mitophagy. The computational-experimental screening and validation workflow might help uncover potent mitophagy modulators that stimulate neuronal health and brain homeostasis.


Asunto(s)
Enfermedad de Alzheimer , Mitofagia , Enfermedad de Alzheimer/tratamiento farmacológico , Enfermedad de Alzheimer/patología , Péptidos beta-Amiloides , Animales , Aprendizaje Automático , Mitofagia/fisiología , Flujo de Trabajo
12.
Brief Bioinform ; 23(2)2022 03 10.
Artículo en Inglés | MEDLINE | ID: mdl-35039821

RESUMEN

Protein-DNA interactions play crucial roles in the biological systems, and identifying protein-DNA binding sites is the first step for mechanistic understanding of various biological activities (such as transcription and repair) and designing novel drugs. How to accurately identify DNA-binding residues from only protein sequence remains a challenging task. Currently, most existing sequence-based methods only consider contextual features of the sequential neighbors, which are limited to capture spatial information. Based on the recent breakthrough in protein structure prediction by AlphaFold2, we propose an accurate predictor, GraphSite, for identifying DNA-binding residues based on the structural models predicted by AlphaFold2. Here, we convert the binding site prediction problem into a graph node classification task and employ a transformer-based variant model to take the protein structural information into account. By leveraging predicted protein structures and graph transformer, GraphSite substantially improves over the latest sequence-based and structure-based methods. The algorithm is further confirmed on the independent test set of 181 proteins, where GraphSite surpasses the state-of-the-art structure-based method by 16.4% in area under the precision-recall curve and 11.2% in Matthews correlation coefficient, respectively. We provide the datasets, the predicted structures and the source codes along with the pre-trained models of GraphSite at https://github.com/biomed-AI/GraphSite. The GraphSite web server is freely available at https://biomed.nscc-gz.cn/apps/GraphSite.


Asunto(s)
Algoritmos , Proteínas , Sitios de Unión , ADN/metabolismo , Unión Proteica , Dominios Proteicos , Proteínas/química
13.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3735-3743, 2022.
Artículo en Inglés | MEDLINE | ID: mdl-34637380

RESUMEN

MOTIVATION: The interactions of proteins with DNA, RNA, peptide, and carbohydrate play key roles in various biological processes. The studies of uncharacterized protein-molecules interactions could be aided by accurate predictions of residues that bind with partner molecules. However, the existing methods for predicting binding residues on proteins remain of relatively low accuracies due to the limited number of complex structures in databases. As different types of molecules partially share chemical mechanisms, the predictions for each molecular type should benefit from the binding information with other molecule types. RESULTS: In this study, we employed a multiple task deep learning strategy to develop a new sequence-based method for simultaneously predicting binding residues/sites with multiple important molecule types named MTDsite. By combining four training sets for DNA, RNA, peptide, and carbohydrate-binding proteins, our method yielded accurate and robust predictions with AUC values of 0.852, 0836, 0.758, and 0.776 on their respective independent test sets, which are 0.52 to 6.6% better than other state-of-the-art methods. To my best knowledge, this is the first method using multi-task framework to predict multiple molecular binding sites simultaneously.


Asunto(s)
Péptidos , ARN , ARN/química , Péptidos/química , Redes Neurales de la Computación , Proteínas/química , Sitios de Unión , Carbohidratos , ADN/genética , ADN/metabolismo , Unión Proteica
14.
J Cheminform ; 13(1): 87, 2021 Nov 13.
Artículo en Inglés | MEDLINE | ID: mdl-34774103

RESUMEN

Scaffold hopping is a central task of modern medicinal chemistry for rational drug design, which aims to design molecules of novel scaffolds sharing similar target biological activities toward known hit molecules. Traditionally, scaffolding hopping depends on searching databases of available compounds that can't exploit vast chemical space. In this study, we have re-formulated this task as a supervised molecule-to-molecule translation to generate hopped molecules novel in 2D structure but similar in 3D structure, as inspired by the fact that candidate compounds bind with their targets through 3D conformations. To efficiently train the model, we curated over 50 thousand pairs of molecules with increased bioactivity, similar 3D structure, but different 2D structure from public bioactivity database, which spanned 40 kinases commonly investigated by medicinal chemists. Moreover, we have designed a multimodal molecular transformer architecture by integrating molecular 3D conformer through a spatial graph neural network and protein sequence information through Transformer. The trained DeepHop model was shown able to generate around 70% molecules having improved bioactivity together with high 3D similarity but low 2D scaffold similarity to the template molecules. This ratio was 1.9 times higher than other state-of-the-art deep learning methods and rule- and virtual screening-based methods. Furthermore, we demonstrated that the model could generalize to new target proteins through fine-tuning with a small set of active compounds. Case studies have also shown the advantages and usefulness of DeepHop in practical scaffold hopping scenarios.

15.
J Chem Inf Model ; 61(10): 4900-4912, 2021 10 25.
Artículo en Inglés | MEDLINE | ID: mdl-34586824

RESUMEN

The protein kinase family contains many promising drug targets. Many kinase inhibitors target the ATP-binding pocket, leading to approved drugs in past decades. Scaffold hopping is an effective approach for drug design. The kinase ATP-binding pocket is highly conserved, crossing the whole kinase family. This provides an opportunity to develop a scaffold hopping approach to explore diversified scaffolds among various kinase inhibitors. In this work, we report the SyntaLinker-Hybrid scheme for kinase inhibitor scaffold hopping. With this scheme, we replace molecular fragments bound at the conserved kinase hinge region with deep generative models. Thus, we are able to generate new kinase-inhibitor-like structures hybridizing the privileged fragments against the hinge region. We demonstrate that this scheme allows generation of kinase-inhibitor-like molecules with novel scaffolds, while retaining the binding features of existing kinase inhibitors. This work can be employed in lead identification against kinase targets.


Asunto(s)
Aprendizaje Profundo , Diseño de Fármacos , Unión Proteica , Inhibidores de Proteínas Quinasas/farmacología , Proteínas Quinasas
16.
Bioinformatics ; 38(1): 94-98, 2021 12 22.
Artículo en Inglés | MEDLINE | ID: mdl-34450651

RESUMEN

MOTIVATION: The solvent accessible surface is an essential structural property measure related to the protein structure and protein function. Relative solvent accessible area (RSA) is a standard measure to describe the degree of residue exposure in the protein surface or inside of protein. However, this computation will fail when the residues information is missing. RESULTS: In this article, we proposed a novel method for estimation RSA using the Cα atom distance matrix with the deep learning method (EAGERER). The new method, EAGERER, achieves Pearson correlation coefficients of 0.921-0.928 on two independent test datasets. We empirically demonstrate that EAGERER can yield better Pearson correlation coefficients than existing RSA estimators, such as coordination number, half sphere exposure and SphereCon. To the best of our knowledge, EAGERER represents the first method to estimate the solvent accessible area using limited information with a deep learning model. It could be useful to the protein structure and protein function prediction. AVAILABILITYAND IMPLEMENTATION: The method is free available at https://github.com/cliffgao/EAGERER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Aprendizaje Profundo , Proteínas de la Membrana , Solventes/química
17.
J Chem Inf Model ; 61(4): 1627-1636, 2021 04 26.
Artículo en Inglés | MEDLINE | ID: mdl-33729779

RESUMEN

The goal of molecular optimization (MO) is to discover molecules that acquire improved pharmaceutical properties over a known starting molecule. Despite many recent successes of new approaches for MO, these methods were typically developed for particular properties with rich annotated training examples. Thus, these approaches are difficult to implement in real scenes where only a small amount of pharmaceutical data is usually available due to the expense and significant effort required for the data collection. Here, we propose a new approach, Meta-MO, for molecular optimization with a handful of training samples based on the well-recognized first-order meta-learning algorithms. By using a set of meta tasks with rich training samples, Meta-MO trains a meta model through the meta-learning optimization and adapts the learned model to new low-resource MO tasks. Meta-MO was shown to consistently outperform several pretraining and multitask training procedures, providing an average improvement in the success rate of 4.3% on a large-scale bioactivity data set with diverse target variations. We also observed that Meta-MO resulted in the best performing models across fine-tuning sets with only dozens of samples. To the best of our knowledge, this is the first study to apply meta learning to MO tasks. More importantly, such a strategy could be further extended to many low-resource scenarios in real-world drug design.


Asunto(s)
Algoritmos
18.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2775-2780, 2021.
Artículo en Inglés | MEDLINE | ID: mdl-33705321

RESUMEN

A novel coronavirus (COVID-19) recently emerged as an acute respiratory syndrome, and has caused a pneumonia outbreak world-widely. As the COVID-19 continues to spread rapidly across the world, computed tomography (CT) has become essentially important for fast diagnoses. Thus, it is urgent to develop an accurate computer-aided method to assist clinicians to identify COVID-19-infected patients by CT images. Here, we have collected chest CT scans of 88 patients diagnosed with COVID-19 from hospitals of two provinces in China, 100 patients infected with bacteria pneumonia, and 86 healthy persons for comparison and modeling. Based on the data, a deep learning-based CT diagnosis system was developed to identify patients with COVID-19. The experimental results showed that our model could accurately discriminate the COVID-19 patients from the bacteria pneumonia patients with an AUC of 0.95, recall (sensitivity) of 0.96, and precision of 0.79. When integrating three types of CT images, our model achieved a recall of 0.93 with precision of 0.86 for discriminating COVID-19 patients from others. Moreover, our model could extract main lesion features, especially the ground-glass opacity (GGO), which are visually helpful for assisted diagnoses by doctors. An online server is available for online diagnoses with CT images by our server (http://biomed.nscc-gz.cn/model.php). Source codes and datasets are available at our GitHub (https://github.com/SY575/COVID19-CT).


Asunto(s)
COVID-19/diagnóstico por imagen , COVID-19/diagnóstico , Aprendizaje Profundo , Diagnóstico por Computador/estadística & datos numéricos , Tomografía Computarizada por Rayos X/estadística & datos numéricos , Estudios de Casos y Controles , China , Biología Computacional , Diagnóstico Diferencial , Humanos , Modelos Estadísticos , Neumonía Bacteriana/diagnóstico , Neumonía Bacteriana/diagnóstico por imagen , SARS-CoV-2
19.
J Cheminform ; 13(1): 7, 2021 Feb 08.
Artículo en Inglés | MEDLINE | ID: mdl-33557952

RESUMEN

Protein solubility is significant in producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. Therefore, a computational model is highly desired to accurately predict protein solubility from the amino acid sequence. Many methods have been developed, but they are mostly based on the one-dimensional embedding of amino acids that is limited to catch spatially structural information. In this study, we have developed a new structure-aware method GraphSol to predict protein solubility by attentive graph convolutional network (GCN), where the protein topology attribute graph was constructed through predicted contact maps only from the sequence. GraphSol was shown to substantially outperform other sequence-based methods. The model was proven to be stable by consistent [Formula: see text] of 0.48 in both the cross-validation and independent test of the eSOL dataset. To our best knowledge, this is the first study to utilize the GCN for sequence-based protein solubility predictions. More importantly, this architecture could be easily extended to other protein prediction tasks requiring a raw protein sequence.

20.
Brief Bioinform ; 22(4)2021 07 20.
Artículo en Inglés | MEDLINE | ID: mdl-33341877

RESUMEN

Biomedical knowledge graphs (KGs), which can help with the understanding of complex biological systems and pathologies, have begun to play a critical role in medical practice and research. However, challenges remain in their embedding and use due to their complex nature and the specific demands of their construction. Existing studies often suffer from problems such as sparse and noisy datasets, insufficient modeling methods and non-uniform evaluation metrics. In this work, we established a comprehensive KG system for the biomedical field in an attempt to bridge the gap. Here, we introduced PharmKG, a multi-relational, attributed biomedical KG, composed of more than 500 000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities. Each entity in PharmKG is attached with heterogeneous, domain-specific information obtained from multi-omics data, i.e. gene expression, chemical structure and disease word embedding, while preserving the semantic and biomedical features. For baselines, we offered nine state-of-the-art KG embedding (KGE) approaches and a new biological, intuitive, graph neural network-based KGE method that uses a combination of both global network structure and heterogeneous domain features. Based on the proposed benchmark, we conducted extensive experiments to assess these KGE models using multiple evaluation metrics. Finally, we discussed our observations across various downstream biological tasks and provide insights and guidelines for how to use a KG in biomedicine. We hope that the unprecedented quality and diversity of PharmKG will lead to advances in biomedical KG construction, embedding and application.


Asunto(s)
Investigación Biomédica , Minería de Datos , Redes Neurales de la Computación , Semántica , Programas Informáticos , Benchmarking , Humanos
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA