Búsqueda | Portal Regional de la BVS

1.

Machine Learning-Based Risk Prediction of Discharge Status for Sepsis.

Cai, Kaida; Lou, Yuqing; Wang, Zhengyan; Yang, Xiaofang; Zhao, Xin.

Entropy (Basel) ; 26(8)2024 Jul 25.

Artículo en Inglés | MEDLINE | ID: mdl-39202095

RESUMEN

As a severe inflammatory response syndrome, sepsis presents complex challenges in predicting patient outcomes due to its unclear pathogenesis and the unstable discharge status of affected individuals. In this study, we develop a machine learning-based method for predicting the discharge status of sepsis patients, aiming to improve treatment decisions. To enhance the robustness of our analysis against outliers, we incorporate robust statistical methods, specifically the minimum covariance determinant technique. We utilize the random forest imputation method to effectively manage and impute missing data. For feature selection, we employ Lasso penalized logistic regression, which efficiently identifies significant predictors and reduces model complexity, setting the stage for the application of more complex predictive methods. Our predictive analysis incorporates multiple machine learning methods, including random forest, support vector machine, and XGBoost. We compare the prediction performance of these methods with Lasso penalized logistic regression to identify the most effective approach. Each method's performance is rigorously evaluated through ten iterations of 10-fold cross-validation to ensure robust and reliable results. Our comparative analysis reveals that XGBoost surpasses the other models, demonstrating its exceptional capability to navigate the complexities of sepsis data effectively.

2.

Missing Values in Longitudinal Proteome Dynamics Studies: Making a Case for Data Multiple Imputation.

Yan, Yu; Sankar, Baradwaj Simha; Mirza, Bilal; Ng, Dominic C M; Pelletier, Alexander R; Huang, Sarah D; Wang, Wei; Watson, Karol; Wang, Ding; Ping, Peipei.

J Proteome Res ; 23(9): 4151-4162, 2024 Sep 06.

Artículo en Inglés | MEDLINE | ID: mdl-39189460

RESUMEN

Temporal proteomics data sets are often confounded by the challenges of missing values. These missing data points, in a time-series context, can lead to fluctuations in measurements or the omission of critical events, thus hindering the ability to fully comprehend the underlying biomedical processes. We introduce a Data Multiple Imputation (DMI) pipeline designed to address this challenge in temporal data set turnover rate quantifications, enabling robust downstream analysis to gain novel discoveries. To demonstrate its utility and generalizability, we applied this pipeline to two use cases: a murine cardiac temporal proteomics data set and a human plasma temporal proteomics data set, both aimed at examining protein turnover rates. This DMI pipeline significantly enhanced the detection of protein turnover rate in both data sets, and furthermore, the imputed data sets captured new representation of proteins, leading to an augmented view of biological pathways, protein complex dynamics, as well as biomarker-disease associations. Importantly, DMI exhibited superior performance in benchmark data sets compared to single imputation methods (DSI). In summary, we have demonstrated that this DMI pipeline is effective at overcoming challenges introduced by missing values in temporal proteome dynamics studies.

Asunto(s)

Proteoma , Proteómica , Humanos , Proteoma/análisis , Proteoma/metabolismo , Proteómica/métodos , Animales , Ratones , Estudios Longitudinales , Interpretación Estadística de Datos

3.

A comparison of machine learning methods for recovering noisy and missing 4D flow MRI data.

Csala, Hunor; Amili, Omid; D'Souza, Roshan M; Arzani, Amirhossein.

Int J Numer Method Biomed Eng ; : e3858, 2024 Aug 28.

Artículo en Inglés | MEDLINE | ID: mdl-39196308

RESUMEN

Experimental blood flow measurement techniques are invaluable for a better understanding of cardiovascular disease formation, progression, and treatment. One of the emerging methods is time-resolved three-dimensional phase-contrast magnetic resonance imaging (4D flow MRI), which enables noninvasive time-dependent velocity measurements within large vessels. However, several limitations hinder the usability of 4D flow MRI and other experimental methods for quantitative hemodynamics analysis. These mainly include measurement noise, corrupt or missing data, low spatiotemporal resolution, and other artifacts. Traditional filtering is routinely applied for denoising experimental blood flow data without any detailed discussion on why it is preferred over other methods. In this study, filtering is compared to different singular value decomposition (SVD)-based machine learning and autoencoder-type deep learning methods for denoising and filling in missing data (imputation). An artificially corrupted and voxelized computational fluid dynamics (CFD) simulation as well as in vitro 4D flow MRI data are used to test the methods. SVD-based algorithms achieve excellent results for the idealized case but severely struggle when applied to in vitro data. The autoencoders are shown to be versatile and applicable to all investigated cases. For denoising, the in vitro 4D flow MRI data, the denoising autoencoder (DAE), and the Noise2Noise (N2N) autoencoder produced better reconstructions than filtering both qualitatively and quantitatively. Deep learning methods such as N2N can result in noise-free velocity fields even though they did not use clean data during training. This work presents one of the first comprehensive assessments and comparisons of various classical and modern machine-learning methods for enhancing corrupt cardiovascular flow data in diseased arteries for both synthetic and experimental test cases.

4.

A multi-source heterogeneous medical data enhancement framework based on lakehouse.

Sheng, Ming; Wang, Shuliang; Zhang, Yong; Hao, Rui; Liang, Ye; Luo, Yi; Yang, Wenhan; Wang, Jincheng; Li, Yinan; Zheng, Wenkui; Li, Wenyao.

Health Inf Sci Syst ; 12(1): 37, 2024 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-38974364

RESUMEN

Obtaining high-quality data sets from raw data is a key step before data exploration and analysis. Nowadays, in the medical domain, a large amount of data is in need of quality improvement before being used to analyze the health condition of patients. There have been many researches in data extraction, data cleaning and data imputation, respectively. However, there are seldom frameworks integrating with these three techniques, making the dataset suffer in accuracy, consistency and integrity. In this paper, a multi-source heterogeneous data enhancement framework based on a lakehouse MHDP is proposed, which includes three steps of data extraction, data cleaning and data imputation. In the data extraction step, a data fusion technique is offered to handle multi-modal and multi-source heterogeneous data. In the data cleaning step, we propose HoloCleanX, which provides a convenient interactive procedure. In the data imputation step, multiple imputation (MI) and the SOTA algorithm SAITS, are applied for different situations. We evaluate our framework via three tasks: clustering, classification and strategy prediction. The experimental results prove the effectiveness of our data enhancement framework.

5.

Missing data imputation using classification and regression trees.

Chen, Cheng-Yang; Chang, Yu-Wei.

PeerJ Comput Sci ; 10: e2119, 2024.

Artículo en Inglés | MEDLINE | ID: mdl-38983189

RESUMEN

Background: Missing data are common when analyzing real data. One popular solution is to impute missing data so that one complete dataset can be obtained for subsequent data analysis. In the present study, we focus on missing data imputation using classification and regression trees (CART). Methods: We consider a new perspective on missing data in a CART imputation problem and realize the perspective through some resampling algorithms. Several existing missing data imputation methods using CART are compared through simulation studies, and we aim to investigate the methods with better imputation accuracy under various conditions. Some systematic findings are demonstrated and presented. These imputation methods are further applied to two real datasets: Hepatitis data and Credit approval data for illustration. Results: The method that performs the best strongly depends on the correlation between variables. For imputing missing ordinal categorical variables, the rpart package with surrogate variables is recommended under correlations larger than 0 with missing completely at random (MCAR) and missing at random (MAR) conditions. Under missing not at random (MNAR), chi-squared test methods and the rpart package with surrogate variables are suggested. For imputing missing quantitative variables, the iterative imputation method is most recommended under moderate correlation conditions.

6.

Cross-modal missing time-series imputation using dense spatio-temporal transformer nets.

Qian, Xusheng; Zhang, Teng; Miao, Meng; Xu, Gaojun; Zhang, Xuancheng; Yu, Wenwu; Chen, Duxin.

Math Biosci Eng ; 21(4): 4989-5006, 2024 Mar 01.

Artículo en Inglés | MEDLINE | ID: mdl-38872523

RESUMEN

Due to irregular sampling or device failure, the data collected from sensor network has missing value, that is, missing time-series data occurs. To address this issue, many methods have been proposed to impute random or non-random missing data. However, the imputation accuracy of these methods are not accurate enough to be applied, especially in the case of complete data missing (CDM). Thus, we propose a cross-modal method to impute time-series missing data by dense spatio-temporal transformer nets (DSTTN). This model embeds spatial modal data into time-series data by stacked spatio-temporal transformer blocks and deployment of dense connections. It adopts cross-modal constraints, a graph Laplacian regularization term, to optimize model parameters. When the model is trained, it recovers missing data finally by an end-to-end imputation pipeline. Various baseline models are compared by sufficient experiments. Based on the experimental results, it is verified that DSTTN achieves state-of-the-art imputation performance in the cases of random and non-random missing. Especially, the proposed method provides a new solution to the CDM problem.

7.

Comparing preprocessing strategies for 3D-Gene microarray data of extracellular vesicle-derived miRNAs.

Takemoto, Yuto; Ito, Daisuke; Komori, Shota; Kishimoto, Yoshiyuki; Yamada, Shinichiro; Hashizume, Atsushi; Katsuno, Masahisa; Nakatochi, Masahiro.

BMC Bioinformatics ; 25(1): 221, 2024 Jun 20.

Artículo en Inglés | MEDLINE | ID: mdl-38902629

RESUMEN

BACKGROUND: Extracellular vesicle-derived (EV)-miRNAs have potential to serve as biomarkers for the diagnosis of various diseases. miRNA microarrays are widely used to quantify circulating EV-miRNA levels, and the preprocessing of miRNA microarray data is critical for analytical accuracy and reliability. Thus, although microarray data have been used in various studies, the effects of preprocessing have not been studied for Toray's 3D-Gene chip, a widely used measurement method. We aimed to evaluate batch effect, missing value imputation accuracy, and the influence of preprocessing on measured values in 18 different preprocessing pipelines for EV-miRNA microarray data from two cohorts with amyotrophic lateral sclerosis using 3D-Gene technology. RESULTS: Eighteen different pipelines with different types and orders of missing value completion and normalization were used to preprocess the 3D-Gene microarray EV-miRNA data. Notable results were suppressed in the batch effects in all pipelines using the batch effect correction method ComBat. Furthermore, pipelines utilizing missForest for missing value imputation showed high agreement with measured values. In contrast, imputation using constant values for missing data exhibited low agreement. CONCLUSIONS: This study highlights the importance of selecting the appropriate preprocessing strategy for EV-miRNA microarray data when using 3D-Gene technology. These findings emphasize the importance of validating preprocessing approaches, particularly in the context of batch effect correction and missing value imputation, for reliably analyzing data in biomarker discovery and disease research.

Asunto(s)

Vesículas Extracelulares , MicroARNs , Análisis de Secuencia por Matrices de Oligonucleótidos , Vesículas Extracelulares/metabolismo , Vesículas Extracelulares/genética , MicroARNs/genética , MicroARNs/metabolismo , Humanos , Análisis de Secuencia por Matrices de Oligonucleótidos/métodos , Esclerosis Amiotrófica Lateral/genética , Esclerosis Amiotrófica Lateral/metabolismo , Perfilación de la Expresión Génica/métodos

8.

A Deep Auto-Optimized Collaborative Learning (DACL) model for disease prognosis using AI-IoMT systems.

Nandagopal, Malarvizhi; Seerangan, Koteeswaran; Govindaraju, Tamilmani; Abi, Neeba Eralil; Balusamy, Balamurugan; Selvarajan, Shitharth.

Sci Rep ; 14(1): 10280, 2024 05 04.

Artículo en Inglés | MEDLINE | ID: mdl-38704423

RESUMEN

In modern healthcare, integrating Artificial Intelligence (AI) and Internet of Medical Things (IoMT) is highly beneficial and has made it possible to effectively control disease using networks of interconnected sensors worn by individuals. The purpose of this work is to develop an AI-IoMT framework for identifying several of chronic diseases form the patients' medical record. For that, the Deep Auto-Optimized Collaborative Learning (DACL) Model, a brand-new AI-IoMT framework, has been developed for rapid diagnosis of chronic diseases like heart disease, diabetes, and stroke. Then, a Deep Auto-Encoder Model (DAEM) is used in the proposed framework to formulate the imputed and preprocessed data by determining the fields of characteristics or information that are lacking. To speed up classification training and testing, the Golden Flower Search (GFS) approach is then utilized to choose the best features from the imputed data. In addition, the cutting-edge Collaborative Bias Integrated GAN (ColBGaN) model has been created for precisely recognizing and classifying the types of chronic diseases from the medical records of patients. The loss function is optimally estimated during classification using the Water Drop Optimization (WDO) technique, reducing the classifier's error rate. Using some of the well-known benchmarking datasets and performance measures, the proposed DACL's effectiveness and efficiency in identifying diseases is evaluated and compared.

Asunto(s)

Inteligencia Artificial , Internet de las Cosas , Humanos , Pronóstico , Aprendizaje Profundo , Enfermedad Crónica , Algoritmos

9.

A novel machine learning-based imputation strategy for missing data in step-stress accelerated degradation test.

Li, Yaqiu; Zhou, Qijie; Fan, Ye; Pan, Guangze; Dai, Zongbei; Lei, Baimao.

Heliyon ; 10(4): e26429, 2024 Feb 29.

Artículo en Inglés | MEDLINE | ID: mdl-38434061

RESUMEN

The presence of missing data is a significant data quality issue that negatively impacts the accuracy and reliability of data analysis. This issue is especially relevant in the context of accelerated tests, particularly for step-stress accelerated degradation tests. While missing data can occur due to objective factors or human error, high missing rate is an inevitable pattern of missing data that will occur during the conversion process of accelerated test data. This type of missing data manifests as a degradation dataset with unequal measuring intervals. Therefore, developing a more appropriate imputation method for accelerated test data is essential. In this study, we propose a novel hybrid imputation method that combines the LSSVM and RBF models to address missing data problems. A comparison is conducted between the proposed model and various traditional and machine learning imputation methods using simulation data, to justify the advantages of the proposed model over the existing methods. Finally, the proposed model is implemented on real degradation datasets of the super-luminescent diode (SLD) to validate its performance and effectiveness in dealing with missing data in step-stress accelerated degradation test. Additionally, due to the generalizability of the proposed method, it is expected to be applicable in other scenarios with high missing data rates.

10.

Predicting the Climate Impact of Healthcare Facilities Using Gradient Boosting Machines.

Yin, Hao; Sharma, Bhavna; Hu, Howard; Liu, Fei; Kaur, Mehak; Cohen, Gary; McConnell, Rob; Eckel, Sandrah P.

Clean Environ Syst ; 122024 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-38444563

RESUMEN

Health care accounts for 9-10% of greenhouse gas (GHG) emissions in the United States. Strategies for monitoring these emissions at the hospital level are needed to decarbonize the sector. However, data collection to estimate emissions is challenging, especially for smaller hospitals. We explored the potential of gradient boosting machines (GBM) to impute missing data on resource consumption in the 2020 survey of a consortium of 283 hospitals participating in Practice Greenhealth. GBM imputed missing values for selected variables in order to predict electricity use and beef consumption (R2=0.82) and anesthetic gas desflurane use (R2=0.51), using administrative data readily available for most hospitals. After imputing missing consumption data, estimated GHG emissions associated with these three examples totaled over 3 million metric tons of CO2 equivalent emissions (MTCO2e). Specifically, electricity consumption had the largest total carbon footprint (2.4 MTCO2e), followed by beef (0.6 million MTCO2e) and desflurane consumption (0.03 million MTCO2e) across the 283 hospitals. The approach should be applicable to other sources of hospital GHGs in order to estimate total emissions of individual hospitals and to refine survey questions to help develop better intervention strategies.

11.

Enhancing environmental data imputation: A physically-constrained machine learning framework.

Pastorini, Marcos; Rodríguez, Rafael; Etcheverry, Lorena; Castro, Alberto; Gorgoglione, Angela.

Sci Total Environ ; 926: 171773, 2024 May 20.

Artículo en Inglés | MEDLINE | ID: mdl-38522546

RESUMEN

In water resources management, new computational capabilities have made it possible to develop integrated models to jointly analyze climatic conditions and water quantity/quality of the entire watershed system. Although the value of this integrated approach has been demonstrated so far, the limited availability of field data may hinder its applicability by causing high uncertainty in the model response. In this context, before collecting additional data, it is recommended first to recognize what improvement in model performance would occur if all available records could be well exploited. This work proposes a novel machine learning framework with physical constraints capable of successfully imputing a high percentage of missing data belonging to several environmental domains (meteorology, water quantity, water quality), yielding satisfactory results. In particular, the minimum NSE computed for meteorologic variables is 0.72. For hydrometric variables, NSE is always >0.97. More than 78 % of the physical-water-quality variables is characterized by NSE > 0.45, and >66 % of the chemical-water quality variables reaches NSE > 0.35. This work's results demonstrate the proposed framework's effectiveness as a data augmentation tool to improve the performance of integrated environmental modeling.

12.

Learning the structure of the mTOR protein signaling pathway from protein phosphorylation data.

Salam, Abdul; Grzegorczyk, Marco.

J Appl Stat ; 51(5): 845-865, 2024.

Artículo en Inglés | MEDLINE | ID: mdl-38524794

RESUMEN

Statistical learning of the structures of cellular networks, such as protein signaling pathways, is a topical research field in computational systems biology. To get the most information out of experimental data, it is often required to develop a tailored statistical approach rather than applying one of the off-the-shelf network reconstruction methods. The focus of this paper is on learning the structure of the mTOR protein signaling pathway from immunoblotting protein phosphorylation data. Under two experimental conditions eleven phosphorylation sites of eight key proteins of the mTOR pathway were measured at ten non-equidistant time points. For the statistical analysis we propose a new advanced hierarchically coupled non-homogeneous dynamic Bayesian network (NH-DBN) model, and we consider various data imputation methods for dealing with non-equidistant temporal observations. Because of the absence of a true gold standard network, we propose to use predictive probabilities in combination with a leave-one-out cross validation strategy to objectively cross-compare the accuracies of different NH-DBN models and data imputation methods. Finally, we employ the best combination of model and data imputation method for predicting the structure of the mTOR protein signaling pathway.

13.

Self-supervised deep learning of gene-gene interactions for improved gene expression recovery.

Wei, Qingyue; Islam, Md Tauhidul; Zhou, Yuyin; Xing, Lei.

Brief Bioinform ; 25(2)2024 Jan 22.

Artículo en Inglés | MEDLINE | ID: mdl-38349062

RESUMEN

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool to gain biological insights at the cellular level. However, due to technical limitations of the existing sequencing technologies, low gene expression values are often omitted, leading to inaccurate gene counts. Existing methods, including advanced deep learning techniques, struggle to reliably impute gene expressions due to a lack of mechanisms that explicitly consider the underlying biological knowledge of the system. In reality, it has long been recognized that gene-gene interactions may serve as reflective indicators of underlying biology processes, presenting discriminative signatures of the cells. A genomic data analysis framework that is capable of leveraging the underlying gene-gene interactions is thus highly desirable and could allow for more reliable identification of distinctive patterns of the genomic data through extraction and integration of intricate biological characteristics of the genomic data. Here we tackle the problem in two steps to exploit the gene-gene interactions of the system. We first reposition the genes into a 2D grid such that their spatial configuration reflects their interactive relationships. To alleviate the need for labeled ground truth gene expression datasets, a self-supervised 2D convolutional neural network is employed to extract the contextual features of the interactions from the spatially configured genes and impute the omitted values. Extensive experiments with both simulated and experimental scRNA-seq datasets are carried out to demonstrate the superior performance of the proposed strategy against the existing imputation methods.

Asunto(s)

Aprendizaje Profundo , Epistasis Genética , Análisis de Datos , Genómica , Expresión Génica , Perfilación de la Expresión Génica , Análisis de Secuencia de ARN

14.

Predicting lung cancer survival prognosis based on the conditional survival bayesian network.

Zhong, Lu; Yang, Fan; Sun, Shanshan; Wang, Lijie; Yu, Hong; Nie, Xiushan; Liu, Ailing; Xu, Ning; Zhang, Lanfang; Zhang, Mingjuan; Qi, Yue; Ji, Huaijun; Liu, Guiyuan; Zhao, Huan; Jiang, Yinan; Li, Jingyi; Song, Chengcun; Yu, Xin; Yang, Liu; Yu, Jinchao; Feng, Hu; Guo, Xiaolei; Yang, Fujun; Xue, Fuzhong.

BMC Med Res Methodol ; 24(1): 16, 2024 Jan 22.

Artículo en Inglés | MEDLINE | ID: mdl-38254038

RESUMEN

Lung cancer is a leading cause of cancer deaths and imposes an enormous economic burden on patients. It is important to develop an accurate risk assessment model to determine the appropriate treatment for patients after an initial lung cancer diagnosis. The Cox proportional hazards model is mainly employed in survival analysis. However, real-world medical data are usually incomplete, posing a great challenge to the application of this model. Commonly used imputation methods cannot achieve sufficient accuracy when data are missing, so we investigated novel methods for the development of clinical prediction models. In this article, we present a novel model for survival prediction in missing scenarios. We collected data from 5,240 patients diagnosed with lung cancer at the Weihai Municipal Hospital, China. Then, we applied a joint model that combined a BN and a Cox model to predict mortality risk in individual patients with lung cancer. The established prognostic model achieved good predictive performance in discrimination and calibration. We showed that combining the BN with the Cox proportional hazards model is highly beneficial and provides a more efficient tool for risk prediction.

Asunto(s)

Neoplasias Pulmonares , Humanos , Neoplasias Pulmonares/diagnóstico , Teorema de Bayes , Pronóstico , Calibración , China/epidemiología

15.

Multiple augmented reduced rank regression for pan-cancer analysis.

Wang, Jiuzhou; Lock, Eric F.

Biometrics ; 80(1)2024 Jan 29.

Artículo en Inglés | MEDLINE | ID: mdl-38281771

RESUMEN

Statistical approaches that successfully combine multiple datasets are more powerful, efficient, and scientifically informative than separate analyses. To address variation architectures correctly and comprehensively for high-dimensional data across multiple sample sets (ie, cohorts), we propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method to concurrently learn both covariate-driven and auxiliary structured variations. We consider a structured nuclear norm objective that is motivated by random matrix theory, in which the regression or factorization terms may be shared or specific to any number of cohorts. Our framework subsumes several existing methods, such as reduced rank regression and unsupervised multimatrix factorization approaches, and includes a promising novel approach to regression and factorization of a single dataset (aRRR) as a special case. Simulations demonstrate substantial gains in power from combining multiple datasets, and from parsimoniously accounting for all structured variations. We apply maRRR to gene expression data from multiple cancer types (ie, pan-cancer) from The Cancer Genome Atlas, with somatic mutations as covariates. The method performs well with respect to prediction and imputation of held-out data, and provides new insights into mutation-driven and auxiliary variations that are shared or specific to certain cancer types.

Asunto(s)

Neoplasias , Humanos , Análisis Multivariante , Neoplasias/genética

16.

Imputing missing sleep data from wearables with neural networks in real-world settings.

Lee, Minki P; Hoang, Kien; Park, Sungkyu; Song, Yun Min; Joo, Eun Yeon; Chang, Won; Kim, Jee Hyun; Kim, Jae Kyoung.

Sleep ; 47(1)2024 01 11.

Artículo en Inglés | MEDLINE | ID: mdl-37819273

RESUMEN

Sleep is a critical component of health and well-being but collecting and analyzing accurate longitudinal sleep data can be challenging, especially outside of laboratory settings. We propose a simple neural network model titled SOMNI (Sleep data restOration using Machine learning and Non-negative matrix factorIzation [NMF]) for imputing missing rest-activity data from actigraphy, which can enable clinicians to better handle missing data and monitor sleep-wake cycles of individuals with highly irregular sleep-wake patterns. The model consists of two hidden layers and uses NMF to capture hidden longitudinal sleep-wake patterns of individuals with disturbed sleep-wake cycles. Based on this, we develop two approaches: the individual approach imputes missing data based on the data from only one participant, while the global approach imputes missing data based on the data across multiple participants. Our models are tested with shift and non-shift workers' data from three independent hospitals. Both approaches can accurately impute missing data up to 24 hours of long dataset (>50 days) even for shift workers with extremely irregular sleep-wake patterns (AUCâ>â0.86). On the other hand, for short dataset (~15 days), only the global model is accurate (AUCâ>â0.77). Our approach can be used to help clinicians monitor sleep-wake cycles of patients with sleep disorders outside of laboratory settings without relying on sleep diaries, ultimately improving sleep health outcomes.

Asunto(s)

Trastornos del Sueño del Ritmo Circadiano , Dispositivos Electrónicos Vestibles , Humanos , Sueño , Redes Neurales de la Computación , Algoritmos , Descanso , Actigrafía

17.

Nonparametric tensor ring decomposition with scalable amortized inference.

Tao, Zerui; Tanaka, Toshihisa; Zhao, Qibin.

Neural Netw ; 169: 431-441, 2024 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-37931474

RESUMEN

Multi-dimensional data are common in many applications, such as videos and multi-variate time series. While tensor decomposition (TD) provides promising tools for analyzing such data, there still remains several limitations. First, traditional TDs assume multi-linear structures of the latent embeddings, which greatly limits their expressive power. Second, TDs cannot be straightforwardly applied to datasets with massive samples. To address these issues, we propose a nonparametric TD with amortized inference networks. Specifically, we establish a non-linear extension of tensor ring decomposition, using neural networks, to model complex latent structures. To jointly model the cross-sample correlations and physical structures, a matrix Gaussian process (GP) prior is imposed over the core tensors. From learning perspective, we develop a VAE-like amortized inference network to infer the posterior of core tensors corresponding to new tensor data, which enables TDs to be applied to large datasets. Our model can be also viewed as a kind of decomposition of VAE, which can additionally capture hidden tensor structure and enhance the expressiveness power. Finally, we derive an evidence lower bound such that a scalable optimization algorithm is developed. The advantages of our method have been evaluated extensively by data imputation on the Healing MNIST dataset and four multi-variate time series data.

Asunto(s)

Algoritmos , Aprendizaje , Redes Neurales de la Computación , Distribución Normal , Factores de Tiempo

18.

A 10-year Metocean dataset for Laguna Madre, Texas, including for the Study of Extreme Cold Events.

White, Miranda C; Vicens-Miquel, Marina; Tissot, Philippe; Krell, Evan.

Data Brief ; 52: 109828, 2024 Feb.

Artículo en Inglés | MEDLINE | ID: mdl-38105904

RESUMEN

Coastal observations along the Texas coast are valuable for many stakeholders in diverse domains. However, the management of the collected data has been limited, creating gaps in hydrological and atmospheric datasets. Among these, water and air temperature measurements are particularly crucial for water temperature predictions, especially during freeze events. These events can pose a serious threat to endangered sea turtles and economically valuable fish, which can succumb to hypothermic stunning, making them vulnerable to cold-related illness or death. Reliable and complete water and air temperature measurements are needed to provide accurate predictions of when cold-stunning events occur. To address these concerns, the focus of this paper is to describe the method used to create a complete 10-year dataset that is representative of the upper Laguna Madre, TX using multiple stations and various gap-filling methods. The raw datasets consist of a decade's worth of air and water temperature measurements within the Upper Laguna Madre from 2012 to 2022 extracted from the archives of the Texas Coastal Ocean Observation Network and the National Park Service. Large portions of data from the multiple stations were missing from the raw datasets, therefore a systematic gap-filling approach was designed and applied to create a near-continuous dataset. The proposed imputation method consists of three steps, starting with a short gap interpolation method, followed by a long gap-filling process using nearby stations, and finalized by a second short gap interpolation method. This systematic data imputation approach was evaluated by creating random artificial gaps within the original datasets, filling them using the proposed data imputation method, and assessing the viability of the proposed methods using various performance metrics. The evaluation results help to ensure the reliability of the newly imputed dataset and the effectiveness of the data imputation method. The newly created dataset is a valuable resource that transcends the local cold-stunning issue, offering viable utility for analyzing temporal variability of air and water temperatures, exploring temperature interdependencies, reducing forecasting uncertainties, and refining natural resource and weather advisory decision-making processes. The cleaned dataset with minimal gaps (<2%) is ready and convenient for artificial intelligence and machine learning applications.

19.

A Missing Traffic Data Imputation Method Based on a Diffusion Convolutional Neural Network-Generative Adversarial Network.

Zhang, Chenchen; Zhou, Lei; Xiao, Xuemei; Xu, Dongwei.

Sensors (Basel) ; 23(23)2023 Dec 04.

Artículo en Inglés | MEDLINE | ID: mdl-38067974

RESUMEN

Traffic state data are key to the proper operation of intelligent transportation systems (ITS). However, traffic detectors often receive environmental factors that cause missing values in the collected traffic state data. Therefore, aiming at the above problem, a method for imputing missing traffic state data based on a Diffusion Convolutional Neural Network-Generative Adversarial Network (DCNN-GAN) is proposed in this paper. The proposed method uses a graph embedding algorithm to construct a road network structure based on spatial correlation instead of the original road network structure; through the use of a GAN for confrontation training, it is possible to generate missing traffic state data based on the known data of the road network. In the generator, the spatiotemporal features of the reconstructed road network are extracted by the DCNN to realize the imputation. Two real traffic datasets were used to verify the effectiveness of this method, with the results of the proposed model proving better than those of the other models used for comparison.

20.

Smoother: a unified and modular framework for incorporating structural dependency in spatial omics data.

Su, Jiayu; Reynier, Jean-Baptiste; Fu, Xi; Zhong, Guojie; Jiang, Jiahao; Escalante, Rydberg Supo; Wang, Yiping; Aparicio, Luis; Izar, Benjamin; Knowles, David A; Rabadan, Raul.

Genome Biol ; 24(1): 291, 2023 12 18.

Artículo en Inglés | MEDLINE | ID: mdl-38110959

RESUMEN

Spatial omics technologies can help identify spatially organized biological processes, but existing computational approaches often overlook structural dependencies in the data. Here, we introduce Smoother, a unified framework that integrates positional information into non-spatial models via modular priors and losses. In simulated and real datasets, Smoother enables accurate data imputation, cell-type deconvolution, and dimensionality reduction with remarkable efficiency. In colorectal cancer, Smoother-guided deconvolution reveals plasma cell and fibroblast subtype localizations linked to tumor microenvironment restructuring. Additionally, joint modeling of spatial and single-cell human prostate data with Smoother allows for spatial mapping of reference populations with significantly reduced ambiguity.

Asunto(s)

Fibroblastos , Próstata , Humanos , Masculino , Microambiente Tumoral

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA