Search | VHL Regional Portal

elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling.

Herzeel, Charlotte; Costanza, Pascal; Decap, Dries; Fostier, Jan; Reumers, Joke.

PLoS One ; 10(7): e0132868, 2015.

Article in English | MEDLINE | ID: mdl-26182406

ABSTRACT

elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost.
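
As a rough illustration of the single-pass idea described in this abstract (not elPrep's actual implementation), the Python sketch below applies every per-read step to a record while it is held in memory and sorts once at the end, instead of writing an intermediate file after each step. The record layout, the function names (drop_unmapped, add_read_group, run_single_pass) and the read-group value are hypothetical; the only standard detail is the SAM convention that flag bit 0x4 marks an unmapped read.

```python
from typing import Callable, Iterable, List, Optional

Record = dict  # hypothetical in-memory stand-in for one SAM alignment line
Step = Callable[[Record], Optional[Record]]  # returns None to drop the read


def drop_unmapped(rec: Record) -> Optional[Record]:
    """Filter step: discard reads whose SAM flag has the 0x4 (unmapped) bit set."""
    return None if rec["flag"] & 0x4 else rec


def add_read_group(rec: Record) -> Optional[Record]:
    """Annotation step: attach a (hypothetical) read-group tag."""
    return {**rec, "rg": "group1"}


def run_single_pass(records: Iterable[Record], steps: List[Step]) -> List[Record]:
    """Apply all steps to each record in one traversal, then sort in memory."""
    kept = []
    for rec in records:
        for step in steps:
            rec = step(rec)
            if rec is None:  # a filter removed this read; skip remaining steps
                break
        if rec is not None:
            kept.append(rec)
    # Coordinate sort happens once, on the already filtered data.
    kept.sort(key=lambda r: (r["contig"], r["pos"]))
    return kept


reads = [
    {"name": "r1", "flag": 0, "contig": "chr1", "pos": 200},
    {"name": "r2", "flag": 4, "contig": "chr1", "pos": 50},
    {"name": "r3", "flag": 0, "contig": "chr1", "pos": 100},
]
prepared = run_single_pass(reads, [drop_unmapped, add_read_group])
```

Adding another per-read step to the list does not add another traversal of the data, which is the property the abstract emphasises.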

Subject(s)

Algorithms , Exome , Human Genome , Sequence Alignment/economics , Software , Benchmarking , Contig Mapping , High-Throughput Nucleotide Sequencing , Humans , Single Nucleotide Polymorphism , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data

ARYANA: Aligning Reads by Yet Another Approach.

Gholami, Milad; Arbabi, Aryan; Sharifi-Zarchi, Ali; Chitsaz, Hamidreza; Sadeghi, Mehdi.

BMC Bioinformatics ; 15 Suppl 9: S12, 2014.

Article in English | MEDLINE | ID: mdl-25252881

ABSTRACT

MOTIVATION: Although there are many different algorithms and software tools for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $10^6 prize for the Innocentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of next-generation sequencing long reads requires fast overlap-layout-consensus algorithms which depend on fast and accurate alignment. CONTRIBUTION: We introduce ARYANA, a fast gapped read aligner, developed on the basis of the BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners: Bowtie2, BWA and SeqAlto, with comparable generality and accuracy. Instead of the time-consuming backtracking procedures for handling mismatches, ARYANA comes with the seed-and-extend algorithmic framework and a significantly improved efficiency by integrating novel algorithmic techniques including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As the read length increases, ARYANA's superiority in terms of speed and alignment rate becomes more evident. This is in perfect harmony with the read length trend as the sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using the ARYANA engine. AVAILABILITY: ARYANA with complete source code can be obtained from http://github.com/aryana-aligner.
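
The seed-and-extend framework named in this abstract can be sketched in a few lines of Python. This is a deliberately simplified illustration and not ARYANA's engine: it uses a plain k-mer hash table rather than the BWA index, fixed-length seeds rather than dynamic seed selection, and an ungapped comparison of the whole read instead of gap-filling dynamic programming. The function names and the parameters k and max_mismatches are assumptions made for the example.

```python
from collections import defaultdict


def build_index(ref: str, k: int) -> dict:
    """Map every k-mer of the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def align(read: str, ref: str, index: dict, k: int, max_mismatches: int = 2):
    """Return (reference_start, mismatches) of the best ungapped placement, or None.

    Each k-mer of the read is used as a seed. A seed hit fixes a candidate
    placement, which is then checked to the left and right of the seed by
    comparing the full read against the reference at that placement.
    """
    best = None
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for hit in index.get(seed, ()):
            start = hit - offset  # where the read would begin on the reference
            if start < 0 or start + len(read) > len(ref):
                continue
            window = ref[start:start + len(read)]
            mismatches = sum(1 for a, b in zip(read, window) if a != b)
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


reference = "ACGTACGTTAGCCGATAGGCTA"
kmer_index = build_index(reference, 4)
exact_hit = align("TAGCCGATAG", reference, kmer_index, 4)
one_mismatch_hit = align("TAGCTGATAG", reference, kmer_index, 4)
```

Because a mismatch only disables the seeds that overlap it, the second read is still located through its other seeds, which is why no backtracking over mismatches is needed in this scheme.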

Subject(s)

Sequence Alignment/methods , DNA Sequence Analysis/methods , Software , Algorithms , Human Genome , High-Throughput Nucleotide Sequencing/economics , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Alignment/economics , DNA Sequence Analysis/economics

A faster algorithm for simultaneous alignment and folding of RNA.

Ziv-Ukelson, Michal; Gat-Viks, Irit; Wexler, Ydo; Shamir, Ron.

J Comput Biol ; 17(8): 1051-65, 2010 Aug.

Article in English | MEDLINE | ID: mdl-20649420

ABSTRACT

The current pairwise RNA (secondary) structural alignment algorithms are based on Sankoff's dynamic programming algorithm from 1985. Sankoff's algorithm requires O(N^6) time and O(N^4) space, where N denotes the length of the compared sequences, and thus its applicability is very limited. The current literature offers many heuristics for speeding up Sankoff's alignment process, some making restrictive assumptions on the length or the shape of the RNA substructures. We show how to speed up Sankoff's algorithm in practice via non-heuristic methods, without compromising optimality. Our analysis shows that the expected time complexity of the new algorithm is O(N^4 σ(N)), where σ(N) converges to O(N), assuming a standard polymer folding model which was supported by experimental analysis. Hence, our algorithm speeds up Sankoff's algorithm by a linear factor on average. In simulations, our algorithm speeds up computation by a factor of 3-12 for sequences of length 25-250. Code and data sets are available, upon request.
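
The "linear factor" claim follows from the two bounds quoted in this abstract. The short derivation below spells it out informally, treating the stated bounds as representative running times and taking the abstract's statement about sigma(N) as the assumption sigma(N) = O(N); the names T_Sankoff and T_new are introduced here only for the illustration.

```latex
\begin{align*}
T_{\text{Sankoff}}(N) &= O\!\left(N^{6}\right), \\
T_{\text{new}}(N) &= O\!\left(N^{4}\,\sigma(N)\right),
  \qquad \sigma(N) = O(N)
  \;\Longrightarrow\; T_{\text{new}}(N) = O\!\left(N^{5}\right), \\
\frac{N^{6}}{N^{4}\,\sigma(N)} &= \frac{N^{2}}{\sigma(N)} = \Omega(N).
\end{align*}
```

In words: replacing one factor of N^2 by sigma(N), which grows at most linearly, saves at least one factor of N in expectation.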

Subject(s)

Algorithms , RNA/chemistry , Sequence Alignment/methods , Animals , Base Sequence , Caenorhabditis elegans/genetics , DNA/chemistry , Nucleic Acid Conformation , Sequence Alignment/economics

De novo assembly of human genomes with massively parallel short read sequencing.

Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun.

Genome Res ; 20(2): 265-72, 2010 Feb.

Article in English | MEDLINE | ID: mdl-20019144

ABSTRACT

Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
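
The N50 statistic quoted in this abstract is commonly defined as the length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembled length. A minimal Python sketch of that common definition follows; the function name n50 is an assumption, and the abstract does not state which exact variant of the definition the authors used.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are visited from longest to shortest; the N50 is the length of
    the contig at which the running sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")
```

For example, n50([2, 3, 4, 5, 6, 7, 8, 9, 10]) returns 8: the total is 54, and the three longest contigs (10, 9, 8) already sum to 27.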

Subject(s)

Human Genome , Human Genome Project , Sequence Alignment/methods , DNA Sequence Analysis/methods , Asian People/genetics , Black People/genetics , Humans , Oligonucleotide Array Sequence Analysis/economics , Oligonucleotide Array Sequence Analysis/methods , Sequence Alignment/economics , DNA Sequence Analysis/economics

High-throughput sequence alignment using Graphics Processing Units.

Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh.

BMC Bioinformatics ; 8: 474, 2007 Dec 10.

Article in English | MEDLINE | ID: mdl-18070356

ABSTRACT

BACKGROUND: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. RESULTS: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. CONCLUSION: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

Subject(s)

Computer Graphics/instrumentation , Database Management Systems , Sequence Alignment/economics , Sequence Alignment/instrumentation , Animals , Bacillus anthracis/genetics , Base Sequence , Caenorhabditis/genetics , Computer Graphics/economics , Computers/economics , Contig Mapping/economics , Contig Mapping/instrumentation , DNA/ultrastructure , Genetic Databases , Genomic Library , Listeria monocytogenes/genetics , Sequence Alignment/methods , DNA Sequence Analysis/economics , DNA Sequence Analysis/instrumentation , DNA Sequence Analysis/methods , Streptococcus suis/genetics , Time Factors , Work Simplification

Improved gapped alignment in BLAST.

Cameron, Michael; Williams, Hugh E; Cannane, Adam.

IEEE/ACM Trans Comput Biol Bioinform ; 1(3): 116-29, 2004.

Article in English | MEDLINE | ID: mdl-17048387

ABSTRACT

Homology search is a key tool for understanding the role, structure, and biochemical function of genomic sequences. The most popular technique for rapid homology search is BLAST, which has been in widespread use within universities, research centers, and commercial enterprises since the early 1990s. In this paper, we propose a new step in the BLAST algorithm to reduce the computational cost of searching with negligible effect on accuracy. This new step, semigapped alignment, compromises between the efficiency of ungapped alignment and the accuracy of gapped alignment, allowing BLAST to accurately filter sequences with lower computational cost. In addition, we propose a heuristic, restricted insertion alignment, that avoids unlikely evolutionary paths with the aim of reducing gapped alignment cost with negligible effect on accuracy. Together, after including an optimization of the local alignment recursion, our two techniques more than double the speed of the gapped alignment stages in BLAST. We conclude that our techniques are an important improvement to the BLAST algorithm. Source code for the alignment algorithms is available for download at http://www.bsg.rmit.edu.au/iga/.
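
For orientation, the gapped local alignment recursion that this abstract sets out to make cheaper can be written as the textbook Smith-Waterman score computation shown below in Python. This is the standard baseline only, not the semigapped or restricted insertion alignment proposed by the authors, and it uses a simple linear gap penalty for brevity. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gaps).

    Every cell takes the maximum of: restarting at 0, extending diagonally
    with a match or mismatch, or opening/extending a gap in either sequence.
    Only two rows of the table are kept, since just the score is needed.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With the default scores, local_alignment_score("ACGTACGT", "ACGTCGT") returns 12: seven matches (14) minus one gap (2). An ungapped comparison cannot reach that score, which illustrates the accuracy that gapped alignment buys at a higher cost per cell.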

Subject(s)

Algorithms , Computational Biology/methods , Sequence Alignment/methods , Amino Acid Sequence , Genetic Databases , Internet , Molecular Sequence Data , Reproducibility of Results , Sequence Alignment/economics , Amino Acid Sequence Homology

Multiple sequence alignment using simulated annealing.

Kim, J; Pramanik, S; Chung, M J.

Comput Appl Biosci ; 10(4): 419-26, 1994 Jul.

Article in English | MEDLINE | ID: mdl-7804875

ABSTRACT

Multiple sequence alignment is a useful technique for studying molecular evolution and analyzing structure-sequence relationships. Dynamic programming of multiple sequence alignment has been widely used to find an optimal alignment. However, dynamic programming does not allow for certain types of gap costs, and it limits the number of sequences that can be aligned due to its high computational complexity. The focus of this paper is to use simulated annealing as the basis for developing an efficient multiple sequence alignment algorithm. An algorithm called Multiple Sequence Alignment using Simulated Annealing (MSASA) has been developed. The computational complexity of MSASA is significantly reduced by replacing the high-temperature phase of the annealing process by a fast heuristic algorithm. This heuristic algorithm helps minimize the solution set of the low-temperature phase of the annealing process. Compared to the dynamic programming approach, MSASA can (i) use natural gap costs which can generate a better solution, (ii) align more sequences and (iii) take less computation time.
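
To make the annealing idea concrete, the Python sketch below runs a generic simulated-annealing loop over gapped alignments. It is not MSASA: it starts from a naive left-justified alignment instead of the fast heuristic that the abstract says replaces the high-temperature phase, and it uses a simple sum-of-pairs score rather than the natural gap costs mentioned there. All function names, the scoring values, the move (relocating a single gap) and the geometric cooling schedule are assumptions made for the illustration.

```python
import math
import random


def sum_of_pairs(rows, match=1, mismatch=-1, gap=-1):
    """Score an alignment by summing pairwise scores in every column."""
    score = 0
    for col in zip(*rows):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                x, y = col[i], col[j]
                if x == "-" and y == "-":
                    continue  # gap against gap is not scored
                if x == "-" or y == "-":
                    score += gap
                else:
                    score += match if x == y else mismatch
    return score


def propose(rows, rng):
    """Neighbour move: relocate one gap character within one random row."""
    new_rows = list(rows)
    r = rng.randrange(len(rows))
    row = list(rows[r])
    gaps = [i for i, c in enumerate(row) if c == "-"]
    if not gaps:
        return new_rows
    row.pop(rng.choice(gaps))
    row.insert(rng.randrange(len(row) + 1), "-")
    new_rows[r] = "".join(row)
    return new_rows


def anneal(seqs, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Return (best_alignment, best_score) found by simulated annealing."""
    rng = random.Random(seed)
    width = max(len(s) for s in seqs) + 2  # room for gaps in every row
    current = [s + "-" * (width - len(s)) for s in seqs]
    cur_score = sum_of_pairs(current)
    best, best_score = current, cur_score
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        cand = propose(current, rng)
        delta = sum_of_pairs(cand) - cur_score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = cand, cur_score + delta
            if cur_score > best_score:
                best, best_score = current, cur_score
    return best, best_score


sequences = ["ACGT", "AGT", "ACT"]
alignment, score = anneal(sequences)
```

Each move only shifts a gap, so removing the gaps from any row of the result gives back the original sequence; the search changes where the gaps sit, never the residues.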

Subject(s)

Algorithms , Sequence Alignment/methods , Amino Acid Sequence , Animals , Costs and Cost Analysis , Evaluation Studies as Topic , Humans , Molecular Sequence Data , Proteins/genetics , Sequence Alignment/economics , Sequence Alignment/statistics & numerical data , Amino Acid Sequence Homology , Software , Time Factors
