Benchmarking database systems for Genomic Selection implementation.

Nti-Addae, Yaw; Matthews, Dave; Ulat, Victor Jun; Syed, Raza; Sempéré, Guilhem; Pétel, Adrien; Renner, Jon; Larmande, Pierre; Guignon, Valentin; Jones, Elizabeth; Robbins, Kelly

Affiliation

Nti-Addae Y; Institute of Biotechnology, Cornell University.
Matthews D; Boyce Thompson Institute.
Ulat VJ; Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT).
Syed R; Institute of Biotechnology, Cornell University.
Sempéré G; INTERTRYP, Univ Montpellier, CIRAD, IRD.
Pétel A; UMR PVBMT, CIRAD.
Renner J; University of Minnesota.
Larmande P; UMR DIADE, IRD, University of Montpellier.
Guignon V; Bioversity International.
Jones E; Institute of Biotechnology, Cornell University.
Robbins K; Section of Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University.

Database (Oxford); 2019; 2019 01 01.

Article in English | MEDLINE | ID: mdl-31508797

ABSTRACT

MOTIVATION: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. Using this information effectively requires DNA extraction and marker production facilities that can efficiently deploy the desired set of markers across samples, with a turnaround time rapid enough to allow selection before crosses need to be made. In practice, by the time breeders have collected all their phenotyping data and received the corresponding genotyping data, they often have only a short window in which to make decisions. This makes it challenging to organize the information and use it in downstream analyses that support breeders' decisions. Implementing genomic selection routinely as part of breeding programs therefore requires an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems.

RESULTS: We found that data extraction times are greatly influenced by the orientation in which genotype data are stored in a system. HDF5 consistently performed best, in part because it can work more efficiently with both orientations of the allele matrix.

AVAILABILITY: http://gobiin1.bti.cornell.edu:6083/projects/GBM/repos/benchmarking/browse.
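
The orientation effect described under RESULTS can be illustrated with a toy sketch. It is not taken from the benchmark: the 4 x 3 matrix, the flat-list layouts, and the helper names are all hypothetical, and real storage systems add chunking, compression, and indexing on top. The sketch only shows that reading one sample from a marker-major layout touches scattered offsets, while the same read from a sample-major layout touches adjacent ones.

```python
# Hypothetical toy allele matrix: 4 markers (rows) x 3 samples (columns).
matrix = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 0],
    [0, 2, 1],
]
n_markers, n_samples = len(matrix), len(matrix[0])

# Marker-major layout: all calls for marker 0, then marker 1, and so on.
marker_major = [call for row in matrix for call in row]

# Sample-major layout: all calls for sample 0, then sample 1, and so on.
sample_major = [matrix[m][s] for s in range(n_samples) for m in range(n_markers)]


def sample_offsets_marker_major(sample):
    """Flat offsets needed to read one sample from the marker-major layout."""
    return [m * n_samples + sample for m in range(n_markers)]


def sample_offsets_sample_major(sample):
    """Flat offsets needed to read one sample from the sample-major layout."""
    return [sample * n_markers + m for m in range(n_markers)]


def read(flat, offsets):
    """Fetch the values stored at the given flat offsets."""
    return [flat[i] for i in offsets]


scattered = sample_offsets_marker_major(1)  # one offset per marker row
adjacent = sample_offsets_sample_major(1)   # one consecutive run

sample_from_marker_major = read(marker_major, scattered)
sample_from_sample_major = read(sample_major, adjacent)

print("marker-major offsets:", scattered)
print("sample-major offsets:", adjacent)
print("same calls:", sample_from_marker_major == sample_from_sample_major)
```

Extracting a single marker reverses the picture: the marker-major layout yields the consecutive run and the sample-major layout yields the scattered offsets. This is consistent with the abstract's point that a system able to serve both orientations efficiently has an advantage.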

Subject(s)

Databases, Genetic; Genomics; Genotype; Genotyping Techniques; Information Storage and Retrieval; Software

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Software / Information Storage and Retrieval / Genomics / Databases, Genetic / Genotyping Techniques / Genotype | Language: En | Journal: Database (Oxford) | Year: 2019 | Document type: Article | Country of publication: United Kingdom
