High-Dimensional Overdispersed Generalized Factor Model With Application to Single-Cell Sequencing Data Analysis.
Nie, Jinyu; Qin, Zhilong; Liu, Wei.
Affiliation
  • Nie J; Center of Statistical Research and School of Statistics, Southwestern University of Finance and Economics, Chengdu, China.
  • Qin Z; Institute of Western China Economic Research, Southwestern University of Finance and Economics, Chengdu, China.
  • Liu W; School of Mathematics, Sichuan University, Chengdu, China.
Stat Med ; 43(25): 4836-4849, 2024 Nov 10.
Article in English | MEDLINE | ID: mdl-39237124
ABSTRACT
The current high-dimensional linear factor models fail to account for the different types of variables, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. However, overdispersion is prevalent in practical applications, particularly in fields like biomedical and genomics studies. To address this practical demand, we propose an overdispersed generalized factor model (OverGFM) for performing high-dimensional nonlinear factor analysis on overdispersed mixed-type data. Our approach incorporates an additional error term to capture the overdispersion that cannot be accounted for by factors alone. However, this introduces significant computational challenges due to the involvement of two high-dimensional latent random matrices in the nonlinear model. To overcome these challenges, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. This algorithm provides iterative explicit solutions for the complex variational parameters and is proven to possess excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the optimal number of factors. Numerical results demonstrate the effectiveness of this criterion. Through comprehensive simulation studies, we show that OverGFM outperforms state-of-the-art methods in terms of estimation accuracy and computational efficiency. Furthermore, we demonstrate the practical merit of our method through its application to two datasets from genomics. To facilitate its usage, we have integrated the implementation of OverGFM into the R package GFM.
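The abstract mentions a criterion based on the singular value ratio for choosing the number of factors but does not give its exact form. As a rough illustration only, the sketch below applies a generic singular value ratio rule to a simulated low-rank-plus-noise matrix: it picks the index at which the ratio of consecutive singular values is largest. The function name `singular_value_ratio`, the parameter `q_max`, and the simulated data are assumptions made for this example; this is not the OverGFM implementation and not the API of the R package GFM.

```python
import numpy as np


def singular_value_ratio(matrix, q_max):
    """Pick the number of factors by the largest ratio of consecutive singular values.

    Hypothetical helper for illustration; the paper's criterion may be applied
    to a different matrix (for example, an estimated loading matrix).
    """
    # Singular values are returned in descending order.
    s = np.linalg.svd(matrix, compute_uv=False)
    s = s[: q_max + 1]
    # ratios[k - 1] = s_k / s_{k+1} for k = 1, ..., q_max (1-based k).
    ratios = s[:-1] / s[1:]
    q_hat = int(np.argmax(ratios)) + 1
    return q_hat, ratios


# Simulated data (assumed for this sketch): 3 strong factors plus small noise.
rng = np.random.default_rng(0)
n, p, q_true = 200, 50, 3
factors = rng.normal(size=(n, q_true))
loadings = rng.normal(size=(p, q_true))
noise = 0.1 * rng.normal(size=(n, p))
low_rank_plus_noise = factors @ loadings.T + noise

q_hat, ratios = singular_value_ratio(low_rank_plus_noise, q_max=10)
print("estimated number of factors:", q_hat)
```

The idea is that singular values tied to genuine factors are much larger than those produced by noise, so the ratio peaks at the boundary between the two groups. With weaker factors or heavy overdispersion the gap can be less pronounced, which is presumably why the authors evaluate their criterion numerically.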

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Algorithms / Computer Simulation / Models, Statistical / Single-Cell Analysis Limit: Humans Language: English Journal: Stat Med Year: 2024 Document type: Article Affiliation country: China Publication country: United Kingdom