Results 1 - 20 of 21
1.
Anal Bioanal Chem ; 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39212696

ABSTRACT

Integration of glycan-related databases across different research fields is essential in glycoscience, and because most glycans exist as glycoconjugates, it requires knowledge spanning many disciplines. Between chemistry and biology in particular, glycan data have been difficult to integrate owing to the wide variety of glycan structure representations. We developed WURCS (Web 3.0 Unique Representation of Carbohydrate Structures) as a notation that represents every glycan structure uniquely, for the purpose of integrating data across scientific data resources. While the integration of glycan data in biology has advanced considerably, progress in chemistry has been hampered by the lack of appropriate rules for extracting sugars from chemical structures. We therefore developed an algorithm that determines which substructures of a compound's structural formula may be considered sugars, and software that extracts those sugars in WURCS format according to this algorithm. In this manuscript, we show that our algorithm can extract sugars from glycoconjugate molecules represented at the molecular level and can distinguish them from other biomolecules, such as amino acids, nucleic acids, and lipids. The software, MolWURCS, is freely available for download ( https://gitlab.com/glycoinfo/molwurcs ).

2.
Proteomics ; 24(14): e2300431, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38468111

ABSTRACT

SWATH is a data acquisition strategy acclaimed for generating quantitatively accurate and consistent measurements of proteins across multiple samples. Its utility for proteomics studies in nonlaboratory animals, however, is currently limited by the lack of sufficiently comprehensive and reliable public libraries, either experimental or predicted, and of platforms that support their sharing and use in an intuitive manner. Here we describe the development of the Veterinary Proteome Browser, VPBrowse (http://browser.proteo.cloud/), an online platform for genome-based representation of the Bos taurus proteome, equipped with an interactive database and tools for searching, visualization, and building quantitative mass spectrometry assays. In its current version (VPBrowse 1.0), it contains high-quality fragmentation spectra acquired on a QToF instrument for over 36,000 proteotypic peptides, providing experimental evidence for over 10,000 proteins. Data can be downloaded in different formats for analysis with popular SWATH data-processing software packages, while normalization to the iRT scale ensures compatibility with diverse chromatography systems. When applied to a published blood plasma dataset from a biomarker discovery study, the resource supported label-free quantification of additional proteins not previously reported by the authors, including PSMA4, a tissue leakage protein and a promising candidate biomarker of the animal's response to dehorning-related injury.


Subject(s)
Proteome , Proteomics , Software , Tandem Mass Spectrometry , Cattle , Animals , Tandem Mass Spectrometry/methods , Proteomics/methods , Proteome/analysis , Databases, Protein , Genome/genetics
3.
J Integr Bioinform ; 20(3)2023 Sep 01.
Article in English | MEDLINE | ID: mdl-38073025

ABSTRACT

Applications of Artificial Intelligence (AI) in medical informatics, including risk-sharing solutions, have social value. At a time of ever-increasing cost of providing medicines to citizens, there is a need to restrain the growth of health care costs, and the search for computer technologies that can stop or slow that growth takes on new importance. We discuss two information technologies in pharmacotherapy, risk-sharing agreements and Machine Learning, and the possibility of combining them, which has been made possible by the development of AI. Neural networks could be used to predict treatment outcomes and thereby reduce risk factors, and AI-based data processing automation could also be used to automate risk-sharing agreements.


Subject(s)
Artificial Intelligence , Machine Learning , Neural Networks, Computer , Automation , Risk Management
4.
BMC Bioinformatics ; 24(1): 475, 2023 Dec 14.
Article in English | MEDLINE | ID: mdl-38097955

ABSTRACT

BACKGROUND: The standardization of biological data using unique identifiers is vital for seamless data integration, comprehensive interpretation, and reproducibility of research findings, contributing to advances in bioinformatics and systems biology. Although widely accepted as universal identifiers, scientific names for biological species have inherent limitations, including a lack of stability, uniqueness, and convertibility, that hinder their effective use as identifiers in databases. This is particularly true of natural product (NP) occurrence databases, and it poses a substantial obstacle to using these valuable data in large-scale research applications. RESULTS: To address these challenges and facilitate high-throughput analysis of biological data involving scientific names, we developed PhyloSophos, a Python package that considers the properties of scientific names and taxonomic systems to accurately map name inputs to entries within a chosen reference database. Using NP occurrence databases as an example, we illustrate the importance of assessing multiple taxonomic databases and of taxonomic syntax-based pre-processing, with the ultimate goal of integrating heterogeneous information into a single, unified dataset. CONCLUSIONS: We anticipate that PhyloSophos will significantly aid the systematic processing of poorly digitized and curated biological data, such as biodiversity information and ethnopharmacological resources, enabling full-scale bioinformatics analysis of these valuable data resources.
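The taxonomic syntax-based pre-processing described above can be illustrated with a minimal sketch. This is not PhyloSophos itself: the normalization rules are simplified, and the reference taxonomy and identifiers below are invented for illustration.

```python
import re

def normalize_name(raw):
    """Reduce a raw scientific name to a canonical 'Genus epithet' form:
    drop parenthesized author citations and trailing tokens, fix case."""
    name = re.sub(r"\s*\(.*?\)", "", raw.strip())  # remove "(L.)"-style citations
    tokens = name.split()
    if len(tokens) >= 2:
        return tokens[0].capitalize() + " " + tokens[1].lower()
    return name.capitalize()

# Hypothetical reference taxonomy: canonical name -> stable identifier
REFERENCE_DB = {
    "Panax ginseng": "NCBI:txid4054",
    "Glycyrrhiza uralensis": "NCBI:txid74613",
}

def map_to_reference(raw_name, reference=REFERENCE_DB):
    """Return (canonical name, identifier or None) for a raw name input."""
    canonical = normalize_name(raw_name)
    return canonical, reference.get(canonical)
```

A real mapper must additionally handle synonyms, misspellings, and disagreements between taxonomic systems, which is exactly why the authors stress assessing multiple reference databases.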


Subject(s)
Biological Products , Reproducibility of Results , Algorithms , Databases, Factual , Computational Biology
5.
JMIR Med Inform ; 11: e46725, 2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38153801

ABSTRACT

Background: In recent years, many researchers have focused on the use of legacy data, such as pooled analyses that collect and reanalyze data from multiple studies. However, no established methodology exists for integrating preexisting databases whose data were collected for different purposes. Previously, we developed a tool to efficiently generate Study Data Tabulation Model (SDTM) data from hypothetical clinical trial data using the Clinical Data Interchange Standards Consortium (CDISC) SDTM. Objective: This study aimed to design a practical model for integrating preexisting databases using the CDISC SDTM. Methods: Data integration was performed in three phases: (1) confirmation of the variables, (2) SDTM mapping, and (3) generation of the SDTM data. In phase 1, the detailed definitions of the variables were confirmed, and the data sets were converted to a vertical structure. In phase 2, the items derived from the SDTM format were set as mapping items, and three types of metadata (domain name, variable name, and test code), based on the CDISC SDTM, were embedded in the Research Electronic Data Capture (REDCap) field annotation. In phase 3, the data dictionary, including the SDTM metadata, was output in the Operational Data Model (ODM) format, and the mapped SDTM data were generated using REDCap2SDTM version 2. Results: SDTM data were generated as a comma-separated values file for each of the 7 domains defined in the metadata. A total of 17 items were commonly mapped across the 3 databases. Because the SDTM metadata were set correctly in each database, we were able to integrate the 3 independently preexisting databases into 1 database in the CDISC SDTM format. Conclusions: Our project suggests that the CDISC SDTM is useful for integrating multiple preexisting databases.
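The phase 1 conversion to a vertical structure keyed by SDTM-style metadata can be sketched roughly as follows. This is a simplified illustration, not the actual REDCap2SDTM logic: the source field names and the domain/variable/test-code mapping are hypothetical.

```python
# Hypothetical mapping from source fields to SDTM metadata
# (domain name, variable name, test code), as embedded in field annotations
FIELD_MAP = {
    "sbp": ("VS", "VSORRES", "SYSBP"),
    "dbp": ("VS", "VSORRES", "DIABP"),
    "alt": ("LB", "LBORRES", "ALT"),
}

def to_vertical(subject_id, record, field_map=FIELD_MAP):
    """Pivot one wide subject record into vertical SDTM-style rows,
    skipping unmapped fields and missing values."""
    rows = []
    for field, value in record.items():
        if field not in field_map or value is None:
            continue
        domain, variable, testcd = field_map[field]
        rows.append({"USUBJID": subject_id, "DOMAIN": domain,
                     "VARIABLE": variable, "TESTCD": testcd, "VALUE": value})
    return rows

rows = to_vertical("S001", {"sbp": 120, "dbp": 80, "alt": None})
```

Once every source database emits rows in this shared vertical shape, merging them into one SDTM-format dataset reduces to concatenating rows per domain.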

6.
Biophys Rev ; 15(5): 807-809, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37974980

ABSTRACT

We present commentaries on the "Biophysical education" section of the VII Congress of Russian Biophysicists. The presentations are briefly introduced, along with current problems in biophysical education and the development of educational approaches. We also discuss an educational course on bioinformatics based on the integration of online databases and the use of internet platforms for functional annotation of genes and proteins.

7.
J Stroke Cerebrovasc Dis ; 31(3): 106236, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34954597

ABSTRACT

OBJECTIVE: Administrative databases seldom include detailed clinical variables and vital status, limiting the scope of population-based studies. We demonstrate a comprehensive process for integrating 3 databases (all-payor inpatient hospitalizations, a clinical acute stroke registry, and vital statistics) into a single statewide ischemic stroke database. MATERIALS AND METHODS: The 3 Massachusetts databases spanned 2007-2017. Our integration process comprised 3 phases: 1) hospitalizations-registry linkage, 2) hospitalizations-vital linkage, and 3) final integration of all 3 databases. Following a data uniqueness assessment, rule-based deterministic linkage on indirect identifiers was applied in the first two phases. In the absence of a gold-standard database crosswalk, we validated the linkages by comparing additional patient variables not used in the linkage process. RESULTS: During the overlapping period from 1/1/2008 to 9/30/2015, there were 47,713 stroke admissions in the hospitalizations database and 43,487 admissions in the registry. We linked 38,493 (80.7%) of cases, 95% of which were validated. There were 391,176 deaths reported in Massachusetts between 1/1/2010 and 3/6/2017 in the vital database. Of the 38,493 encounters in the hospitalizations-registry linked data, 10,660 (27.7%) were linked to deaths, reflecting the cumulative mortality over the 7-year period among all registry-linked ischemic stroke hospitalization records. CONCLUSION: We demonstrate that a high-quality integration of statewide hospitalizations, clinical registry, and vital statistics databases is achievable by leveraging indirect identifiers. This data integration framework takes advantage of the rich clinical data in registries and the long-term outcomes in hospitalization and vital records, and may have value for larger-scale outcomes research.
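Rule-based deterministic linkage on indirect identifiers can be sketched as an exact-match join on a composite key, accepting only unambiguous 1:1 matches. The identifiers and acceptance rule below are illustrative assumptions, not the authors' actual linkage rules.

```python
def link_key(rec):
    """Composite key built from indirect identifiers only (no names or SSNs)."""
    return (rec["dob"], rec["sex"], rec["zip"], rec["admit_date"])

def deterministic_link(hosp_records, registry_records):
    """Rule-based deterministic linkage: exact agreement on all key fields,
    keeping only unambiguous (single-candidate) matches."""
    index = {}
    for r in registry_records:
        index.setdefault(link_key(r), []).append(r)
    links = []
    for h in hosp_records:
        candidates = index.get(link_key(h), [])
        if len(candidates) == 1:  # reject ambiguous multi-candidate keys
            links.append((h["hosp_id"], candidates[0]["reg_id"]))
    return links

hosp = [{"hosp_id": "H1", "dob": "1950-03-02", "sex": "F", "zip": "02114", "admit_date": "2012-06-01"},
        {"hosp_id": "H2", "dob": "1946-11-20", "sex": "M", "zip": "01854", "admit_date": "2013-01-15"}]
reg = [{"reg_id": "R9", "dob": "1950-03-02", "sex": "F", "zip": "02114", "admit_date": "2012-06-01"}]
pairs = deterministic_link(hosp, reg)
```

In practice such rules are applied in passes of decreasing strictness, and the uniqueness assessment mentioned above determines which identifier combinations are safe to use as keys.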


Subject(s)
Databases, Factual , Ischemic Stroke , Hospitalization , Humans , Ischemic Stroke/epidemiology , Ischemic Stroke/therapy , Massachusetts/epidemiology , Registries , Vital Statistics
8.
J Comput Biol ; 28(6): 619-628, 2021 06.
Article in English | MEDLINE | ID: mdl-34081565

ABSTRACT

Biomedical Entity Explorer (BEE) is a web server that can search a database of six biomedical entity types (gene, miRNA, drug, disease, single nucleotide polymorphism [SNP], and pathway) and their gene associations. Search results can be explored using intersections, unions, and negations. BEE integrates biomedical entities from 16 databases (Ensembl, PharmGKB, Genetics Home Reference, TarBase, miRBase, NCI Thesaurus, DisGeNET, Linked Life Data, UMLS, GSEA MSigDB, Reactome, KEGG, Gene Ontology, HGVD, SNPedia, and dbSNP) based on their gene associations, in a database that also stores their synonyms, descriptions, and links to individual details. Users can enter keywords for one or more entities, select the entity type for which they want to find relationships, and navigate the search results more precisely using set operations such as union, negation, and intersection. We believe that BEE will not only be useful for biologists querying complex associations between entities but can also be a good starting point for general users searching for biomedical entities. BEE is accessible at http://bike-bee.snu.ac.kr.
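The set operations BEE exposes map directly onto operations over gene-association sets. A minimal sketch with made-up gene sets (not BEE's actual data):

```python
# Hypothetical gene-association sets for a disease entity and a drug entity
genes_disease = {"TP53", "BRCA1", "EGFR", "KRAS"}
genes_drug = {"EGFR", "KRAS", "ALK"}

intersection = genes_disease & genes_drug   # genes associated with both entities
union = genes_disease | genes_drug          # genes associated with either entity
negation = genes_disease - genes_drug       # disease-associated genes not hit by the drug
```

Chaining these operations over several entities is what lets a user narrow a query like "diseases sharing genes with drug X but not with pathway Y".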


Subject(s)
Computational Biology/methods , Software , Search Engine , Sequence Analysis/methods
9.
Front Microbiol ; 12: 593979, 2021.
Article in English | MEDLINE | ID: mdl-33552037

ABSTRACT

Synthetic biology seeks to create new biological parts, devices, and systems, and to reconfigure existing natural biological systems for custom-designed purposes. Standardized BioBrick parts are the foundation of synthetic biology, but their incomplete and flawed metadata are a major obstacle to designing genetic circuits easily, quickly, and accurately. Here, we developed a database termed BioMaster (http://www.biomaster-uestc.cn) to extensively complement information about BioBrick parts. It covers 47,934 BioBrick parts from the International Genetically Engineered Machine (iGEM) Registry, with more comprehensive information integrated from 10 databases, providing corresponding information about functions, activities, interactions, and related literature. Moreover, BioMaster is a user-friendly platform for retrieval and analysis of relevant information on BioBrick parts.

10.
J Pak Med Assoc ; 70(9): 1572-1576, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33040111

ABSTRACT

OBJECTIVE: To highlight clinical scenarios and healthcare practitioners' difficulties where computer applications can help in multimorbidity management. METHODS: The cross-sectional study was conducted from December 2017 to January 2019 in the twin cities of Rawalpindi and Islamabad, Pakistan, and comprised local physicians/practitioners. Data were collected using a self-generated questionnaire distributed among the subjects. It identified four problems as the most commonly faced: treatment/dose management, time management, forgetting to ask necessary questions about the disease, and 'others', such as handwriting errors and ethical issues. Data were analysed using SPSS 17. RESULTS: Of the 53 subjects, 33 (62%) marked problems related to treatment management, 35 (66%) marked problems related to shortage of time, 34 (64%) marked those related to difficulty in asking relevant questions about the disease, and 15 (28%) marked the 'other' option. CONCLUSIONS: Computer technologies can significantly help in managing the problems of treating multimorbidity through the adoption of standard databases.


Subject(s)
Delivery of Health Care , Multimorbidity , Computers , Cross-Sectional Studies , Humans , Pakistan
11.
BMC Bioinformatics ; 20(1): 243, 2019 May 15.
Article in English | MEDLINE | ID: mdl-31092193

ABSTRACT

BACKGROUND: The complexity of representing biological systems is compounded by an ever-expanding body of knowledge emerging from multi-omics experiments. A number of pathway databases have facilitated pathway-centric approaches that assist in the interpretation of molecular signatures yielded by these experiments. However, the lack of interoperability between pathway databases has hindered the ability to harmonize these resources and to exploit their consolidated knowledge. Such a unification of pathway knowledge is imperative in enhancing the comprehension and modeling of biological abstractions. RESULTS: Here, we present PathMe, a Python package that transforms pathway knowledge from three major pathway databases into a unified abstraction using Biological Expression Language as the pivotal, integrative schema. PathMe is complemented by a novel web application (freely available at https://pathme.scai.fraunhofer.de/ ) which allows users to comprehensively explore pathway crosstalk and compare areas of consensus and discrepancies. CONCLUSIONS: This work has harmonized three major pathway databases and transformed them into a unified schema in order to gain a holistic picture of pathway knowledge. We demonstrate the utility of the PathMe framework in: i) integrating pathway landscapes at the database level, ii) comparing the degree of consensus at the pathway level, and iii) exploring pathway crosstalk and investigating consensus at the molecular level.


Subject(s)
Signal Transduction , Software , Computational Biology , Databases as Topic , Databases, Factual , Humans , TOR Serine-Threonine Kinases/metabolism
12.
J Integr Bioinform ; 16(1)2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30808160

ABSTRACT

Metabolism has been a major field of study in recent years, mainly because of its importance in understanding cell physiology and the disease phenotypes caused by its deregulation. Genome-scale metabolic models (GSMMs) have become important tools for achieving a better understanding of human metabolism. Advances in systems biology and bioinformatics have allowed the reconstruction of several human GSMMs, although limitations and challenges remain, such as the lack of external identifiers for metabolites and reactions. We developed a pipeline to integrate multiple GSMMs, starting by retrieving information from the main human GSMMs and evaluating the presence of external database identifiers and annotations for both metabolites and reactions. Metabolite information was loaded into a graph database together with omics data repositories, allowing metabolites to be clustered by their similarity in database cross-referencing. The metabolite annotation of several older GSMMs was enriched, allowing the identification and integration of common entities. Using this information, as well as other metrics, we successfully integrated the reactions from these models. These methods can be leveraged towards the creation of a unified consensus model of human metabolism.
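Clustering metabolites by similarity of their database cross-references can be sketched with a Jaccard measure over identifier sets. This is a toy greedy single-link version with invented identifiers, not the authors' graph-database pipeline.

```python
def jaccard(a, b):
    """Jaccard similarity of two cross-reference sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_xrefs(metabolites, threshold=0.5):
    """Greedy single-link clustering: a metabolite joins the first cluster
    containing a member whose cross-references are similar enough."""
    clusters = []
    for name, xrefs in metabolites.items():
        for cluster in clusters:
            if any(jaccard(xrefs, metabolites[m]) >= threshold for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical cross-references for the same pyruvate entry in two models,
# plus an unrelated ATP entry
mets = {
    "pyr_model1": {"CHEBI:15361", "KEGG:C00022"},
    "pyr_model2": {"CHEBI:15361", "KEGG:C00022", "HMDB:HMDB0000243"},
    "atp_model1": {"CHEBI:30616", "KEGG:C00002"},
}
clusters = cluster_by_xrefs(mets)
```

Clusters of this kind give the common entities across models; reaction integration can then compare reactions whose participants fall in the same clusters.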


Subject(s)
Computational Biology/methods , Genome, Human , Metabolic Networks and Pathways , Models, Statistical , Databases, Factual , Humans , Molecular Sequence Annotation , Transcription, Genetic
13.
BMC Bioinformatics ; 18(1): 93, 2017 Feb 08.
Article in English | MEDLINE | ID: mdl-28178937

ABSTRACT

BACKGROUND: Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. RESULTS: We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. CONCLUSIONS: SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .
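Generating a typical SPARQL query from a few arguments, as SPANG does, can be sketched as simple template assembly. This toy builder is an illustration of the idea, not SPANG's actual interface; the class and predicate URIs are made up.

```python
def build_query(subject_class, predicate, limit=10):
    """Assemble a basic triple-pattern SPARQL SELECT query from arguments."""
    return (
        "SELECT ?s ?o WHERE {\n"
        f"  ?s a <{subject_class}> ;\n"
        f"     <{predicate}> ?o .\n"
        f"}} LIMIT {limit}"
    )

# Hypothetical usage: list proteins and their encoding genes
q = build_query("http://example.org/Protein", "http://example.org/encodedBy", limit=5)
```

The generated string would then be sent to a SPARQL endpoint over HTTP; SPANG's combinatorial execution amounts to dispatching such queries to several endpoints and combining the results.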


Subject(s)
Computer Communication Networks , Databases, Factual , Internet
14.
J Struct Funct Genomics ; 17(4): 69-81, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28012137

ABSTRACT

Life science research now relies heavily on databases of genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes, and so forth. The knowledge accumulated by omics research is so vast that a computer-aided search of the data is now a prerequisite for starting a new study, and a combinatory search across these databases has a chance of extracting new ideas and hypotheses that can be examined by wet-lab experiments. By virtually integrating related databases on the Internet, we have built a new web application that helps life science researchers retrieve expert knowledge stored in the databases and build new hypotheses about their research targets. This web application, named VaProS, emphasizes the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effects of gene mutations. In this manuscript, we present the concept of VaProS, the databases and tools that can be accessed without any knowledge of database locations or data formats, and the power of its search, exemplified in a study of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/ .


Subject(s)
Computational Biology , Databases, Genetic , Genome , Internet , Software , Animals , Humans , Mice , Protein Conformation , Rats , Sequence Analysis, DNA
15.
Oncotarget ; 7(32): 51619-51625, 2016 Aug 09.
Article in English | MEDLINE | ID: mdl-27322211

ABSTRACT

The consistency of in vitro drug sensitivity data is of key importance for cancer pharmacogenomics. Previous attempts to correlate drug sensitivities from the large pharmacogenomics databases, such as the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC), have produced discordant results. We developed a new drug sensitivity metric, the area under the dose response curve adjusted for the range of tested drug concentrations, which allows integration of heterogeneous drug sensitivity data from the CCLE, the GDSC, and the Cancer Therapeutics Response Portal (CTRP). We show that there is moderate to good agreement of drug sensitivity data for many targeted therapies, particularly kinase inhibitors. The results of this largest cancer cell line drug sensitivity data analysis to date are accessible through the online portal, which serves as a platform for high power pharmacogenomics analysis.
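The metric described above, an area under the dose-response curve adjusted for the range of tested concentrations, can be sketched numerically. This is a plausible reading of the metric, not the authors' exact formula: trapezoidal AUC of viability over log10 concentration, divided by the tested log-concentration range, so that curves measured over different dose ranges become comparable. The assay values are invented.

```python
import math

def normalized_auc(concs, viability):
    """Trapezoidal AUC of viability over log10(concentration), divided by the
    log-range of tested concentrations. 1.0 = fully resistant at all doses,
    values near 0 = strongly sensitive."""
    x = [math.log10(c) for c in concs]
    auc = sum((x[i + 1] - x[i]) * (viability[i + 1] + viability[i]) / 2
              for i in range(len(x) - 1))
    return auc / (x[-1] - x[0])

# Hypothetical assay: viability fractions at four doses (in uM)
score = normalized_auc([0.01, 0.1, 1.0, 10.0], [1.0, 0.8, 0.4, 0.1])
```

Because the normalization divides by the tested range, a cell line screened over 0.01-10 uM and one screened over 0.1-100 uM yield scores on the same 0-1 scale, which is what makes cross-database integration possible.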


Subject(s)
Antineoplastic Agents/therapeutic use , Data Collection/methods , Databases, Genetic , Drug Resistance, Neoplasm/genetics , Neoplasms/drug therapy , Neoplasms/genetics , Pharmacogenomic Testing , Cell Line, Tumor , Genomics/methods , Humans , Information Storage and Retrieval , Pharmacogenetics , User-Computer Interface
16.
J Struct Biol ; 194(2): 231-4, 2016 May.
Article in English | MEDLINE | ID: mdl-26873783

ABSTRACT

With the advent of high-throughput techniques like next-generation sequencing, the amount of biological information for genes and proteins is growing faster than ever. Structural information is also growing rapidly, especially in the cryo-electron microscopy area. In many cases, however, the proteomic and genomic data are spread across multiple databases with no simple connection to structural information. In this work we present a new web platform that integrates EMDB/PDB structures and UniProt sequences with different sources of protein annotations. The application provides an interactive interface linking sequence and structure, including EM maps, and presents the different sources of information at both the sequence and structural level. The web application is available at http://3dbionotes.cnb.csic.es.


Subject(s)
Proteomics/statistics & numerical data , Software , Adenomatous Polyposis Coli Protein/chemistry , Adenomatous Polyposis Coli Protein/genetics , Adenomatous Polyposis Coli Protein/metabolism , Amino Acid Sequence , Antigens, CD , Base Sequence , Cadherins/chemistry , Cadherins/genetics , Cadherins/metabolism , Cell Cycle Proteins/chemistry , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Databases, Protein , F-Box Proteins/chemistry , F-Box Proteins/genetics , F-Box Proteins/metabolism , Gene Expression , Humans , Internet , Models, Molecular , Protein Conformation , Proto-Oncogene Proteins p21(ras)/chemistry , Proto-Oncogene Proteins p21(ras)/genetics , Proto-Oncogene Proteins p21(ras)/metabolism , Structure-Activity Relationship
17.
Res. Biomed. Eng. (Online) ; 31(3): 196-207, July-Sept. 2015. tab, graf
Article in English | LILACS | ID: biblio-829435

ABSTRACT

Introduction: This paper's aim is to develop a data warehouse from the integration of the files of three Brazilian health information systems concerned with the production of ambulatory and hospital procedures for cancer care, and with cancer mortality. These systems do not have a unique patient identification, which makes their integration difficult, even within a single system. Methods: Data from the Brazilian Public Hospital Information System (SIH-SUS), the Oncology Module for the Outpatient Information System (APAC-ONCO), and the Mortality Information System (SIM) for the State of Rio de Janeiro, for the period from January 2000 to December 2004, were used. Each of the systems has its monthly data production compiled in dBase (.dbf) files. All the files pertaining to the same system were read into a corresponding table in MySQL Server 5.1. The SIH-SUS and APAC-ONCO tables were linked internally and with one another through record linkage methods, and the APAC-ONCO table was linked to the SIM table. Afterwards, a data warehouse was built using Pentaho and the MySQL database management system. Results: The sensitivities and specificities of the linkage processes were above 95% and close to 100%, respectively. The data warehouse provides several analytical views that are accessed through the Pentaho Schema Workbench. Conclusion: This study presented a proposal for the integration of Brazilian health systems to support the building of data warehouses and to provide information beyond that currently available from the individual systems.

18.
Methods Inf Med ; 54(1): 50-5, 2015.
Article in English | MEDLINE | ID: mdl-24777240

ABSTRACT

INTRODUCTION: This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". BACKGROUND: The need for complementary access to multiple RDF databases has fostered new lines of research, but also entailed new challenges due to data representation disparities. While several approaches for RDF-based database integration have been proposed, those focused on schema alignment have become the most widely adopted. All state-of-the-art solutions for aligning RDF-based sources resort to a simple technique inherited from legacy relational database integration methods. This technique - known as element-to-element (e2e) mappings - is based on establishing 1:1 mappings between single primitive elements - e.g. concepts, attributes, relationships, etc. - belonging to the source and target schemas. However, due to the intrinsic nature of RDF - a representation language based on defining tuples < subject, predicate, object > -, one may find RDF elements whose semantics vary dramatically when combined into a view involving other RDF elements - i.e. they depend on their context. The latter cannot be adequately represented in the target schema by resorting to the traditional e2e approach. These approaches fail to properly address this issue without explicitly modifying the target ontology, thus lacking the required expressiveness for properly reflecting the intended semantics in the alignment information. OBJECTIVES: To enhance existing RDF schema alignment techniques by providing a mechanism to properly represent elements with context-dependent semantics, thus enabling users to perform more expressive alignments, including scenarios that cannot be adequately addressed by the existing approaches. METHODS: Instead of establishing 1:1 correspondences between single primitive elements of the schemas, we propose adopting a view-based approach. 
The latter is targeted at establishing mapping relationships between RDF subgraphs - that can be regarded as the equivalent of views in traditional databases -, rather than between single schema elements. This approach enables users to represent scenarios defined by context-dependent RDF elements that cannot be properly represented when adopting the currently existing approaches. RESULTS: We developed a software tool implementing our view-based strategy. Our tool is currently being used in the context of the European Commission funded p-medicine project, targeted at creating a technological framework to integrate clinical and genomic data to facilitate the development of personalized drugs and therapies for cancer, based on the genetic profile of the patient. We used our tool to integrate different RDF-based databases - including different repositories of clinical trials and DICOM images - using the Health Data Ontology Trunk (HDOT) ontology as the target schema. CONCLUSIONS: The importance of database integration methods and tools in the context of biomedical research has been widely recognized. Modern research in this area - e.g. identification of disease biomarkers, or design of personalized therapies - heavily relies on the availability of a technical framework to enable researchers to uniformly access disparate repositories. We present a method and a tool that implement a novel alignment method specifically designed to support and enhance the integration of RDF-based data sources at schema (metadata) level. This approach provides an increased level of expressiveness compared to other existing solutions, and allows solving heterogeneity scenarios that cannot be properly represented using other state-of-the-art techniques.
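The contrast between element-to-element mappings and view-based mappings over RDF subgraphs can be sketched with a naive triple-pattern matcher: a "view" is a set of patterns that must hold together, which is exactly what captures context-dependent semantics. The schema, predicate names, and matcher are invented for illustration; they are not the p-medicine tool.

```python
def match_view(view, triples):
    """Return variable bindings under which every pattern in the view
    (strings starting with '?' are variables) is satisfied by the triples.
    Naive exhaustive matcher, fine for tiny examples."""
    def unify(pattern, triple, binding):
        b = dict(binding)
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if b.get(p, t) != t:
                    return None  # variable already bound to something else
                b[p] = t
            elif p != t:
                return None      # constant mismatch
        return b

    bindings = [{}]
    for pattern in view:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := unify(pattern, t, b)) is not None]
    return bindings

# Hypothetical context-dependent element: 'ex:hasValue' denotes a tumor stage
# only when its subject is also typed as an ex:StagingFinding
triples = [
    ("f1", "rdf:type", "ex:StagingFinding"),
    ("f1", "ex:hasValue", "III"),
    ("f2", "ex:hasValue", "7.2"),
]
view = [("?f", "rdf:type", "ex:StagingFinding"), ("?f", "ex:hasValue", "?v")]
hits = match_view(view, triples)
```

A 1:1 e2e mapping of `ex:hasValue` alone could not distinguish the staging value from the unrelated `f2` measurement; the two-pattern view does.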


Subject(s)
Access to Information , Biological Ontologies , Biomedical Research , Databases as Topic , Software , Natural Language Processing , Semantics , Systems Integration
19.
Plant Cell Physiol ; 55(1): e8, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24363285

ABSTRACT

The Plant Genome DataBase Japan (PGDBj, http://pgdbj.jp/?ln=en) is a portal website that aims to integrate plant genome-related information from databases (DBs) and the literature. The PGDBj comprises three component DBs and a cross-search engine, which provides a seamless search over the contents of the DBs. The three DBs are as follows. (i) The Ortholog DB, providing gene cluster information based on amino acid sequence similarity. Over 500,000 amino acid sequences of 20 Viridiplantae species were subjected to reciprocal BLAST searches and clustered. Sequences from plant genome DBs (e.g. TAIR10 and RAP-DB) were also included in the clusters, with a direct link to the original DB. (ii) The Plant Resource DB, integrating the SABRE DB, which provides cDNA and genome sequence resources accumulated and maintained in the RIKEN BioResource Center and National BioResource Projects. (iii) The DNA Marker DB, providing manually or automatically curated information on DNA markers, quantitative trait loci, and related linkage maps, from the literature and external DBs. As the PGDBj targets various plant species, including model plants, algae, and crops important as food, fodder, and biofuel, researchers in basic biology as well as a wide range of agronomic fields are encouraged to perform searches using DNA sequences, gene names, traits, and phenotypes of interest. The PGDBj will return search results from the component DBs and various types of linked external DBs.
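The reciprocal BLAST clustering behind an ortholog DB is commonly built from reciprocal best hits. Below is a minimal sketch of that step with made-up sequence identifiers and bit scores; the real PGDBj pipeline over 500,000 sequences is of course far more involved.

```python
def reciprocal_best_hits(hits_ab, hits_ba):
    """Pair sequences that are each other's best BLAST hit.
    hits_* map query id -> list of (subject id, bit score)."""
    def best(hits):
        # best-scoring subject for each query
        return {q: max(subjects, key=lambda s: s[1])[0]
                for q, subjects in hits.items() if subjects}
    best_ab, best_ba = best(hits_ab), best(hits_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical bit scores between Arabidopsis (At*) and rice (Os*) proteins
ab = {"At1": [("Os1", 900.0), ("Os2", 300.0)], "At2": [("Os2", 850.0)]}
ba = {"Os1": [("At1", 880.0)], "Os2": [("At1", 310.0), ("At2", 840.0)]}
pairs = reciprocal_best_hits(ab, ba)
```

Clustering then amounts to taking connected components over such pairs computed for all species pairs.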


Subject(s)
Databases, Genetic , Genome, Plant/genetics , Internet , Chromosome Mapping , Genetic Markers , Japan , Quantitative Trait Loci/genetics , Sequence Homology, Amino Acid
20.
Cancer Inform ; 2: 277-87, 2007 Feb 20.
Article in English | MEDLINE | ID: mdl-19458771

ABSTRACT

Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.).
