Bacterial genomes lacking long-range correlations may not be modeled by low-order Markov chains: the role of mixing statistics and frame shift of neighboring genes.
Cocho, Germinal; Miramontes, Pedro; Mansilla, Ricardo; Li, Wentian.
Affiliation
  • Cocho G; Departamento de Sistemas Complejos, Instituto de Física, Universidad Nacional Autonoma de Mexico, Ciudad Universitaria, Mexico 04510, DF, Mexico.
  • Miramontes P; Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad Universitaria, México 04510, DF, Mexico. Electronic address: pmv@ciencias.unam.mx.
  • Mansilla R; Centro de Investigaciones Interdisciplinarias en Ciencias y Humanidades, Universidad Nacional Autónoma de México, Ciudad Universitaria, Mexico 04510, DF, Mexico.
  • Li W; The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA. Electronic address: wtli2012@gmail.com.
Comput Biol Chem ; 53 Pt A: 15-25, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25257406
We examine in detail the relationship between exponential correlation functions and Markov models in a bacterial genome. Although Markov models are well known to generate sequences whose correlation functions decay exponentially, simply constructed Markov models that treat the DNA sequence as homogeneous, based on nearest-neighbor dimers (first-order), trimers (second-order), and up to hexamers (fifth-order), all fail to predict the value of the exponential decay rate. Even reading-frame-specific Markov models (both first- and fifth-order) could not explain the fact that the exponential decay is very slow. Starting with the in-phase coding DNA sequences (CDSs), we investigated correlation within a fixed-codon-position subsequence and in artificially constructed sequences made by packing CDSs with out-of-phase spacers, as well as by altering the CDS length distribution through an imposed upper limit. From these targeted analyses, we conclude that the correlation in the bacterial genomic sequence is mainly due to a mixing of heterogeneous statistics at different codon positions, and that the decay of correlation is due to the possible out-of-phase arrangement of neighboring CDSs. There are also small contributions to the correlation from bases at the same codon position, as well as from non-coding sequences. These results show that the seemingly simple exponential correlation functions in a bacterial genome hide a complexity in correlation structure that is not suited to modeling by a Markov chain in a homogeneous sequence. Other results include the use of the second largest eigenvalue (in absolute value) to represent the 16 correlation functions and the prediction of a 10-11 base periodicity from the hexamer frequencies.
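The homogeneous first-order case mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes NumPy is available, and the function names (`transition_matrix`, `second_eigenvalue_modulus`, `predicted_decay_length`, `correlation`) are hypothetical. It estimates a dimer-based transition matrix, takes the second largest eigenvalue in absolute value as the predicted decay factor per base, and computes one of the 16 empirical base-base correlation functions for comparison.

```python
import numpy as np

BASES = "ACGT"

def transition_matrix(seq):
    """Estimate a first-order (dimer-based) transition matrix from a DNA string."""
    idx = {b: i for i, b in enumerate(BASES)}
    counts = np.zeros((4, 4))
    for a, b in zip(seq, seq[1:]):
        if a in idx and b in idx:  # skip ambiguous symbols such as N
            counts[idx[a], idx[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid dividing by zero for unseen bases
    return counts / row_sums

def second_eigenvalue_modulus(P):
    """Second largest absolute eigenvalue of a transition matrix P."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(moduli[1])

def predicted_decay_length(P):
    """Length d0 such that the model's correlations scale roughly as exp(-d/d0)."""
    lam = second_eigenvalue_modulus(P)
    if lam >= 1.0:
        return float("inf")  # no decay (e.g. a perfectly periodic chain)
    if lam == 0.0:
        return 0.0  # correlations vanish after one step
    return -1.0 / np.log(lam)

def correlation(seq, a, b, d):
    """Empirical correlation: P(a at i, b at i+d) minus P(a) * P(b)."""
    n = len(seq) - d
    joint = sum(1 for i in range(n) if seq[i] == a and seq[i + d] == b) / n
    return joint - (seq.count(a) / len(seq)) * (seq.count(b) / len(seq))

if __name__ == "__main__":
    toy = "ATGGCGAAATTTGCGCGTTAACCGGATGCATGCCGTA" * 20  # toy data, not a genome
    P = transition_matrix(toy)
    print("second eigenvalue modulus:", second_eigenvalue_modulus(P))
    print("predicted decay length (bases):", predicted_decay_length(P))
    print("empirical A-A correlation at d=3:", correlation(toy, "A", "A", 3))
```

Comparing the decay length predicted by such a model with the decay fitted to the empirical correlation functions is the kind of test that, according to the abstract, the homogeneous models fail on the real genome.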
Full text: 1 Collections: 01-internacional Database: MEDLINE Main subject: DNA, Bacterial / Genome, Bacterial / Sequence Analysis, DNA / Enteropathogenic Escherichia coli / Mycobacterium tuberculosis Type of study: Health_economic_evaluation / Prognostic_studies Language: En Journal: Comput Biol Chem Journal subject: BIOLOGY / MEDICAL INFORMATICS / CHEMISTRY Year of publication: 2014 Document type: Article Country of affiliation: Mexico Country of publication: United Kingdom