Results 1 - 7 of 7
1.
Patterns (N Y) ; 5(8): 101024, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39233696

ABSTRACT

In the rapidly evolving field of bioimaging, the integration and orchestration of findable, accessible, interoperable, and reusable (FAIR) image analysis workflows remain a challenge. We introduce BIOMERO (bioimage analysis in OMERO), a bridge connecting OMERO, a renowned bioimaging data management platform; FAIR workflows; and high-performance computing (HPC) environments. BIOMERO facilitates the seamless execution of FAIR workflows, particularly for large datasets from high-content or high-throughput screening. BIOMERO empowers researchers by eliminating the need for specialized knowledge, enabling scalable image processing directly from OMERO. Notably, BIOMERO supports the sharing and utilization of FAIR workflows between OMERO, Cytomine/BIAFLOWS, and other bioimaging communities. BIOMERO will promote the widespread adoption of FAIR workflows, emphasizing reusability, across the realm of bioimaging research. Its user-friendly interface allows users, including those without technical expertise, to seamlessly apply these workflows to their datasets, democratizing the use of AI by the broader research community.

2.
Rep Pract Oncol Radiother ; 27(5): 832-841, 2022.
Article in English | MEDLINE | ID: mdl-36523798

ABSTRACT

Background: Monte Carlo simulation is widely appreciated as an extraordinary technique for investigating particle physics processes and interactions in nuclear medicine and radiation therapy. The present work validates a new Monte Carlo simulation methodology based on multithreading to reduce the CPU time needed to simulate the 6 MV photon beam of the Elekta Synergy MLCi2 medical linear accelerator treatment head, using the TOPAS version 3.6 Monte Carlo software and the Slurm Marwan cluster. Materials and methods: The simulation includes the major components of the linear accelerator (LINAC). Calculations are performed for the photon beam with several treatment field sizes varying from 3 × 3 to 10 × 10 cm² at a distance of 100 cm from the source to the surface of the IBA dosimetry water box. The simulation was validated by comparison with experimental distributions. To evaluate simulation accuracy, the gamma index formalism for the (2%/2 mm) and (3%/2 mm) criteria, the distance to agreement (DTA), and the standard error estimators ɛ and ɛmax are used. Results: Good agreement between simulations and measurements was observed for both depth doses and lateral dose profiles. The gamma index comparisons also highlighted this agreement; more than 97% of the points in all simulations satisfy the (2%/2 mm) quality assurance criterion. Regarding calculation performance, the event processing speed is faster in GATE-[mp] than in TOPAS-[mt] mode when running the identical simulation code in both. Conclusions: According to the achieved results, the proposed methodology provides a first validation of TOPAS for radiotherapy linac simulations and a reduction in calculation time while preserving simulation accuracy as far as possible. For this reason, this software is recommended as serviceable for Treatment Planning System (TPS) purposes.
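
For context, the gamma index criterion cited above combines a dose-difference tolerance ΔD (e.g., 2%) with a distance-to-agreement tolerance Δd (e.g., 2 mm), and a point passes when γ ≤ 1. The standard formulation (following Low et al.; not restated in the abstract) is

    \[
      \Gamma(\mathbf{r}_m,\mathbf{r}_c)
        = \sqrt{\frac{\lVert \mathbf{r}_c-\mathbf{r}_m\rVert^{2}}{\Delta d^{2}}
              + \frac{\bigl(D_c(\mathbf{r}_c)-D_m(\mathbf{r}_m)\bigr)^{2}}{\Delta D^{2}}},
      \qquad
      \gamma(\mathbf{r}_m) = \min_{\mathbf{r}_c} \Gamma(\mathbf{r}_m,\mathbf{r}_c)
    \]

where D_m and D_c are the measured and calculated dose distributions evaluated at points r_m and r_c, respectively.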

3.
J Proteome Res ; 21(11): 2810-2814, 2022 11 04.
Article in English | MEDLINE | ID: mdl-36201825

ABSTRACT

Combining robust proteomics instrumentation with high-throughput-enabling liquid chromatography (LC) systems (e.g., the timsTOF Pro and the Evosep One system, respectively) has enabled mapping the proteomes of thousands of samples. FragPipe is one of the few computational protein identification and quantification frameworks that allows time-efficient analysis of such large data sets. However, it requires large amounts of computational power and data storage space, leaving even state-of-the-art workstations underpowered for the analysis of proteomics data sets with thousands of LC mass spectrometry runs. To address this issue, we developed and optimized a FragPipe-based analysis strategy for a high-performance computing environment and analyzed 3348 plasma samples (6.4 TB) that were longitudinally collected from hospitalized COVID-19 patients under the auspices of the Immunophenotyping Assessment in a COVID-19 Cohort (IMPACC) study. Our parallelization strategy reduced the total runtime by ∼90%, from 116 (theoretical) days to just 9 days in the high-performance computing environment. All code is open source and can be deployed in any Simple Linux Utility for Resource Management (SLURM) high-performance computing environment, enabling the analysis of large-scale high-throughput proteomics studies.


Subject(s)
COVID-19, Humans, Chromatography, Liquid/methods, Proteomics/methods, Mass Spectrometry/methods, Proteome/analysis
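
As a rough sketch of how per-run analyses can be fanned out on SLURM (this is not the study's released code; the input layout and the analyze_run.sh wrapper are hypothetical placeholders), a Python driver could generate and submit a job array that processes one LC-MS run per array task:

    # Hypothetical sketch: one Slurm array task per mass-spec run.
    import subprocess
    from pathlib import Path

    runs = sorted(Path("/data/raw").glob("*.d"))   # assumed input layout
    Path("runs.txt").write_text("\n".join(str(r) for r in runs))

    script = f"""#!/bin/bash
    #SBATCH --job-name=proteomics-array
    #SBATCH --array=1-{len(runs)}
    #SBATCH --cpus-per-task=16
    #SBATCH --mem=64G
    #SBATCH --time=08:00:00
    RUN=$(sed -n "${{SLURM_ARRAY_TASK_ID}}p" runs.txt)
    # placeholder for the per-run search/quantification command
    ./analyze_run.sh "$RUN"
    """
    Path("array.sbatch").write_text(script)
    subprocess.run(["sbatch", "array.sbatch"], check=True)

The job-array pattern keeps each run independent, so the scheduler can pack tasks onto whatever nodes are free.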
4.
Proc IEEE Int Conf Big Data ; 2021: 4113-4118, 2021 Dec.
Article in English | MEDLINE | ID: mdl-36745144

ABSTRACT

This paper presents a novel use case of Graph Convolutional Network (GCN) representation learning for predictive data mining, specifically from user/task data in the domain of high-performance computing (HPC). It outlines an approach based on a coalesced data set: logs from the Slurm workload manager joined with user experience survey data from computational cluster users. We introduce a new method of constructing a heterogeneous, unweighted HPC graph consisting of multiple typed nodes after revealing the manifold relations between the nodes. The GCN structure used here supports two tasks: i) determining whether a job will complete or fail, and ii) predicting memory and CPU requirements, by training a semi-supervised GCN classification model and regression models on the generated graph. The graph is partitioned using graph clustering. We conducted classification and regression experiments using the proposed framework on our HPC log dataset and evaluated the predictions of our trained models against baselines using the test score, F1-score, precision, and recall for classification, and the R1 score for regression, showing that our framework achieves significant improvements.
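
As a rough illustration of the node-classification half of this setup (a sketch under simplifying assumptions: a small homogeneous graph with toy features and labels, rather than the paper's heterogeneous, cluster-partitioned graph), a two-layer GCN in PyTorch Geometric could look like this:

    # Minimal sketch; node features, edges, and labels are toy placeholders.
    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv

    class JobGCN(torch.nn.Module):
        def __init__(self, num_features, hidden, num_classes):
            super().__init__()
            self.conv1 = GCNConv(num_features, hidden)
            self.conv2 = GCNConv(hidden, num_classes)

        def forward(self, x, edge_index):
            x = F.relu(self.conv1(x, edge_index))
            return self.conv2(x, edge_index)      # logits: completed vs. failed

    x = torch.randn(4, 8)                                      # 4 nodes, 8 features each
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])    # bidirectional edges
    y = torch.tensor([0, 1, 0, 1])                             # 0 = completed, 1 = failed
    model = JobGCN(8, 16, 2)
    loss = F.cross_entropy(model(x, edge_index), y)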

5.
Article in English | MEDLINE | ID: mdl-36760802

ABSTRACT

Determining resource allocations (memory and time) for submitted jobs in High-Performance Computing (HPC) systems is a challenging process, even for computer scientists. HPC users are strongly encouraged to overestimate the resource allocations for their submitted jobs so that their jobs are not killed due to insufficient resources. This overestimation occurs because of the wide variety of HPC applications and environment configuration options and the lack of knowledge of the complex structure of HPC systems. It wastes HPC resources, decreases the utilization of HPC systems, and increases waiting and turnaround times for submitted jobs. In this paper, we introduce our first fully offline, fully automated, stand-alone, open-source Machine Learning (ML) tool to help users predict the memory and time requirements of jobs submitted to the cluster. Our tool implements six discriminative ML models from scikit-learn and Microsoft LightGBM, applied to historical data (sacct data) from the Simple Linux Utility for Resource Management (Slurm). We tested our tool using historical data (sacct data) from the HPC resources of Kansas State University (Beocat), covering January 2019 to March 2021 and containing around 17.6 million jobs. Our results show that our tool achieves high predictive accuracy R² (0.72 using LightGBM for predicting memory and 0.74 using Random Forest for predicting time), helps dramatically reduce the average waiting and turnaround times for submitted jobs, and increases the utilization of HPC resources. Hence, our tool also decreases the power consumption of the HPC resources.
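
A minimal sketch of this kind of pipeline (not the released tool itself; the file name and feature columns below are assumptions standing in for sacct-derived features) might look like:

    # Train one LightGBM model for memory and one Random Forest for runtime,
    # then score both with R^2 on a held-out split.
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score
    from lightgbm import LGBMRegressor

    df = pd.read_csv("sacct_history.csv")   # hypothetical export of sacct data
    X = df[["req_cpus", "req_mem_mb", "partition_id", "prior_jobs"]]
    X_tr, X_te, mem_tr, mem_te, t_tr, t_te = train_test_split(
        X, df["max_rss_mb"], df["elapsed_s"], test_size=0.2, random_state=0)

    mem_model = LGBMRegressor().fit(X_tr, mem_tr)            # memory predictor
    time_model = RandomForestRegressor().fit(X_tr, t_tr)     # runtime predictor

    print("memory R^2:", r2_score(mem_te, mem_model.predict(X_te)))
    print("time   R^2:", r2_score(t_te, time_model.predict(X_te)))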

6.
Article in English | MEDLINE | ID: mdl-35373221

ABSTRACT

In this paper, we present a novel methodology for predicting job resources (memory and time) for submitted jobs on HPC systems. Our methodology is based on historical job data (sacct data) provided by the Slurm workload manager, using supervised machine learning. This Machine Learning (ML) prediction model is effective and useful for both HPC administrators and HPC users. Moreover, our ML model increases the efficiency and utilization of HPC systems, thus reducing power consumption as well. Our approach applies several supervised discriminative machine learning models from the scikit-learn machine learning library and LightGBM to historical data from Slurm. Our model helps HPC users determine the required amount of resources for their submitted jobs and makes it easier for them to use HPC resources efficiently. This work provides the second step toward implementing our general open-source tool for HPC service providers. For this work, our machine learning model was implemented and tested using two HPC providers: an XSEDE service provider, the University of Colorado Boulder (RMACC Summit), and Kansas State University (Beocat). We used more than two hundred thousand jobs, one hundred thousand from SUMMIT and one hundred thousand from Beocat, to model and assess our ML model's performance. In particular, we measured the improvement in running time, turnaround time, and average waiting time for submitted jobs, and measured the utilization of the HPC clusters. Our model achieved up to 86% accuracy in predicting the amount of time and the amount of memory for both the SUMMIT and Beocat HPC resources. Our results show that our model helps dramatically reduce the average waiting time (from 380 hours to 4 hours on RMACC Summit and from 662 hours to 28 hours on Beocat), reduces turnaround time (from 403 hours to 6 hours on RMACC Summit and from 673 hours to 35 hours on Beocat), and achieves up to 100% utilization for both HPC resources.
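
For illustration, historical accounting data of this kind could be harvested with Slurm's sacct command roughly as follows; the date range and field list are illustrative rather than the authors' exact extraction:

    # Pull completed-job accounting records into a DataFrame for model training.
    import io
    import subprocess
    import pandas as pd

    fields = "JobID,User,Partition,ReqCPUS,ReqMem,Timelimit,Elapsed,MaxRSS,State"
    out = subprocess.run(
        ["sacct", "--allusers", "--starttime=2019-01-01", "--endtime=2021-03-31",
         f"--format={fields}", "--parsable2", "--noheader"],
        capture_output=True, text=True, check=True).stdout

    df = pd.read_csv(io.StringIO(out), sep="|", names=fields.split(","))
    df = df[df["State"].isin(["COMPLETED", "FAILED", "TIMEOUT"])]   # keep finished jobs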

7.
PEARC19 (2019); 2019 Jul.
Article in English | MEDLINE | ID: mdl-35308798

ABSTRACT

High-Performance Computing (HPC) systems are resources utilized for data capture, sharing, and analysis. The majority of our HPC users come from disciplines other than computer science. HPC users, including computer scientists, have difficulty deciding, and do not feel proficient enough to decide, the required amount of resources for their jobs submitted to the cluster. Consequently, users are encouraged to overestimate the resources for their submitted jobs so that their jobs will not be killed due to insufficient resources. This practice wastes HPC resources and leads to inefficient cluster utilization. We created a supervised machine learning model and integrated it into the Slurm resource manager simulator to predict the amount of memory required and the time required to run the computation. Our model uses different machine learning algorithms. Our goal is to integrate and test the proposed supervised machine learning model in Slurm. We used over 10,000 tasks selected from our HPC log files to evaluate the performance and accuracy of our integrated model. The purpose of our work is to increase the performance of Slurm by predicting the memory and time required for each particular job, in order to improve the utilization of the HPC system using our integrated supervised machine learning model. Our results indicate that, for larger jobs, our model helps dramatically reduce computational turnaround time (from five days to ten hours for large jobs), substantially increases the utilization of the HPC system, and decreases the average waiting time for submitted jobs.
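
A hypothetical helper of the kind such an integration might expose (not part of the paper's Slurm-simulator code) could turn predicted memory and time into suggested sbatch directives, with a safety margin to absorb prediction error:

    # Convert model predictions into suggested sbatch directives.
    import math

    def suggest_directives(pred_mem_mb: float, pred_seconds: float, margin: float = 1.2) -> str:
        mem_mb = math.ceil(pred_mem_mb * margin)                 # pad memory by 20%
        secs = int(pred_seconds * margin)                        # pad runtime by 20%
        h, m, s = secs // 3600, (secs % 3600) // 60, secs % 60
        return f"#SBATCH --mem={mem_mb}M\n#SBATCH --time={h:02d}:{m:02d}:{s:02d}"

    print(suggest_directives(pred_mem_mb=3500, pred_seconds=5400))
    # -> #SBATCH --mem=4200M
    #    #SBATCH --time=01:48:00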
