A general framework for integrative analysis of incomplete multiomics data.
Genet Epidemiol; 44(7): 646-664, 2020 Oct.
Article | En | MEDLINE | ID: mdl-32691502
There is currently tremendous interest in measuring multiple types of omics features (e.g., DNA sequences, RNA expression, methylation profiles, metabolic profiles, protein expression) on a large number of subjects. Although genotypes are typically available for all study subjects, other data types may be measured only on a subset of subjects because of cost or other constraints. In addition, quantitative omics measurements, such as metabolite levels and protein expression, are subject to detection limits: measurements below (or above) certain thresholds are not detectable. In this article, we propose a rigorous and powerful approach to handling missing values and detection limits in integrative analysis of multiomics data. We relate quantitative omics variables to genetic variants and other variables through linear regression models, and we relate phenotypes to quantitative omics variables and other variables through generalized linear models. We derive the joint likelihood for the two sets of models, allowing arbitrary patterns of missing values and detection limits for the quantitative omics variables. We carry out maximum-likelihood estimation using computationally fast and stable algorithms. The resulting estimators are approximately unbiased and statistically efficient. An application to a major study on chronic obstructive lung disease yielded new biological insights.
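To make the role of detection limits and missing values concrete, the following is a minimal Python sketch of a censored-normal log-likelihood for a single quantitative omics variable regressed on one genotype. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the `"below"` marker, the single lower detection limit, and the one-covariate linear model are all hypothetical, and the phenotype (generalized linear model) part of the joint likelihood is omitted entirely.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def omics_loglik(records, intercept, slope, sd, lower_limit):
    """Log-likelihood of the omics regression alone (hypothetical helper).

    records: list of (genotype, value) pairs, where value is
      - a float   : the omics level was measured,
      - "below"   : the level fell below the lower detection limit,
      - None      : the omics variable was not measured for this subject.
    """
    total = 0.0
    for genotype, value in records:
        mean = intercept + slope * genotype
        if value is None:
            # Not measured: no contribution to this omics-only piece.
            continue
        if value == "below":
            # Censored: only P(level < lower_limit) is known.
            prob = normal_cdf(lower_limit, mean, sd)
            total += math.log(max(prob, 1e-300))  # floor avoids log(0)
        else:
            # Fully observed: ordinary normal density.
            total += normal_logpdf(value, mean, sd)
    return total


# Three subjects: one observed, one below the limit, one unmeasured.
example = [(0, 1.0), (1, "below"), (2, None)]
loglik = omics_loglik(example, intercept=1.0, slope=0.0, sd=1.0, lower_limit=1.0)
```

In this simplified setting an unmeasured value drops out of the omics-only term. In a joint likelihood that also includes a phenotype model, as the abstract describes, such a subject would still contribute through the phenotype, with the unobserved omics value integrated out.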
Keywords
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Algorithms / Genomics / Proteomics / Data Analysis
Type of study: Prognostic_studies
Limits: Humans
Language: En
Journal: Genet Epidemiol
Journal subject: EPIDEMIOLOGY / MEDICAL GENETICS
Year: 2020
Document type: Article
Country of publication: United States