Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 142
Filtrar
1.
Sci Rep ; 14(1): 21277, 2024 Sep 11.
Artículo en Inglés | MEDLINE | ID: mdl-39261633

RESUMEN

The wild horse optimizer (WHO) is a novel metaheuristic algorithm, which has been successfully applied to solving continuous engineering problems. Considering the characteristics of the wild horse optimizer, a discrete version of the algorithm, named discrete wild horse optimizer (DWHO), is proposed to solve the capacitated vehicle routing problem (CVRP). By incorporating three local search strategies-swap operation, reverse operation, and insertion operation-along with the introduction of the largest-order-value (LOV) decoding technique, the precision and quality of the solutions have been enhanced. Experimental results conducted on 44 benchmark instances indicate that, in most test cases, the solving capability of discrete wild horse optimizer surpasses that of basic wild horse optimizer (BWHO), hybrid firefly algorithm, dynamic space reduction ant colony optimization (DSRACO), and discrete artificial ecosystem-based optimization (DAEO). The discrete wild horse optimizer provides a novel approach for solving the capacitated vehicle routing problem and also offers a new perspective for addressing other discrete problems.

2.
MethodsX ; 13: 102928, 2024 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-39286437

RESUMEN

Two-stage stratified sampling is a complex design that involves nested sampling units and stratification. This complexity increases when the strata have too few sampled units for variance estimation, necessitating the use of collapsed strata, where multiple strata are combined to ensure an adequate sample size. When collapsing strata, two cases can be distinguished depending on whether a size variable associated with the variable of interest is available at the stratum level.•We present computer-implementable formulas for total, mean, and ratio estimators, along with their corresponding sampling variance estimators, for stratified two-stage simple random sampling without replacement, and we provide ready-to-use algorithms.•We introduce two methods for grouping strata: (1) a deterministic approach that uses stratum codes to define an ordinal variable, which orders the strata, and (2) a stochastic method that aims to minimize within-group inertia, which measures the heterogeneity within the newly formed groups of strata.•We emphasize that, unlike the correlation between a size variable and the variable of interest at the stratum level, the bias of the sampling variance estimator for the collapsed strata technique is not invariant to linear transformations. It follows that a high correlation does not ensure a low-bias estimator of the sampling variance.

3.
Neural Netw ; 180: 106649, 2024 Aug 31.
Artículo en Inglés | MEDLINE | ID: mdl-39236410

RESUMEN

Selecting a set of initial users from a social network in order to maximize the envisaged number of influenced users is known as influence maximization (IM). Researchers have achieved significant advancements in the theoretical design and performance gain of several classical approaches, but these advances are almost reaching their pinnacle. Learning-based IM approaches have emerged recently with a higher generalization to unknown graphs than conventional methods. The development of learning-based IM methods is still constrained by a number of fundamental hardships, including (1) solving the objective function efficiently, (2) struggling to characterize the diverse underlying diffusion patterns, and (3) adapting the solution to different node-centrality-constrained IM variants. To address the aforementioned issues, we design a novel framework DeepIM for generatively characterizing the latent representation of seed sets, as well as learning the diversified information diffusion pattern in a data-driven and end-to-end way. Subsequently, we design a novel objective function to infer optimal seed sets under flexible node-centrality-based budget constraints. Extensive analyses are conducted over both synthetic and real-world datasets to demonstrate the overall performance of DeepIM.

4.
Adv Mater ; : e2410191, 2024 Aug 28.
Artículo en Inglés | MEDLINE | ID: mdl-39194394

RESUMEN

Due to its area and energy efficiency, a memristive crossbar array (CBA) has been extensively studied for various combinatorial optimization applications, from network problems to circuit design. However, conventional approaches include heavily burdening software fine-tuning for the annealing process. Instead, this study introduces the "in-materia annealing" method, where the inter-layer interference of vertically stacked memristive CBA is utilized as an annealing method. When mapping combinatorial optimization problems into the configuration layer of the CBA, exponentially decaying annealing profiles are generated in nearby noise layers. Moreover, in-materia annealing profiles can be controlled by changing compliance current, read voltage, and read pulse width. Therefore, the annealing profiles can be arbitrarily controlled and generated individually for each cell, providing rich noise sources to solve the problem efficiently. Consequently, the experimental and simulation of Max-Cut and weighted Max-Cut problems achieve notable results with the minimum software burden.

5.
Sci Rep ; 14(1): 19768, 2024 Aug 26.
Artículo en Inglés | MEDLINE | ID: mdl-39187613

RESUMEN

Many emerging commercial services are based on the sharing or pooling of resources for common use with the aim of reducing costs. Businesses such as delivery-, mobility-, or transport-as-a-service have become standard in many parts of the world, fulfilling on-demand requests for customers in live settings. However, it is known that many of these problems are NP-hard, and therefore both modeling and solving them accurately is a challenge. Here we focus on one such routing problem, the Ride Pooling Problem (RPP), where multiple customers can request on-demand pickups and drop-offs from shared vehicles within a fleet. The combinatorial optimization task is to optimally pool customer requests using the limited set of vehicles, akin to a small-scale flexible bus route. In this work, we propose a quadratic unconstrained binary optimization (QUBO) program and introduce efficient formulation methods for the RPP to be solved using metaheuristics, and specifically emerging quantum optimization algorithms.

6.
Nanotechnology ; 35(46)2024 Aug 28.
Artículo en Inglés | MEDLINE | ID: mdl-39142322

RESUMEN

Solving certain combinatorial optimization problems like Max-Cut becomes challenging once the graph size and edge connectivity increase beyond a threshold, with brute-force algorithms which solve such problems exactly on conventional digital computers having the bottleneck of exponential time complexity. Hence currently, such problems are instead solved approximately using algorithms like Goemans-Williamson (GW) algorithm, run on conventional computers with polynomial time complexity. Phase binarized oscillators (PBOs), also often known as oscillator Ising machines, have been proposed as an alternative to solve such problems. In this paper, restricting ourselves to the combinatorial optimization problem Max-Cut solved on three kinds of graphs (Mobius Ladder, random cubic, Erdös Rényi) up to 100 nodes, we empirically show that computation time/time to solution (TTS) for PBOs (captured through Kuramoto model) grows at a much lower rate (logarithmically:O(log(N)), with respect to graph sizeN) compared to GW algorithm, for which TTS increases as square of graph size (O(N2)). However, Kuramoto model being a physics-agnostic mathematical model, this time complexity/ TTS trend for PBOs is a general trend and is device-physics agnostic. So for more specific results, we choose spintronic oscillators, known for their high operating frequency (in GHz), and model them through Slavin's model which captures the physics of their coupled magnetization oscillation dynamics. We thereby empirically show that TTS of spintronic oscillators also grows logarithmically with graph size (O(log(N)), while their accuracy is comparable to that of GW. So spintronic oscillators have improved time complexity over GW algorithm. For large graphs, they are expected to compute Max-Cut values much faster than GW algorithm, as well as other oscillators operating at lower frequencies, while maintaining the same level of accuracy.

7.
Math Program ; 207(1-2): 515-549, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-39132467

RESUMEN

The Connectivity Augmentation Problem (CAP) together with a well-known special case thereof known as the Tree Augmentation Problem (TAP) are among the most basic Network Design problems. There has been a surge of interest recently to find approximation algorithms with guarantees below 2 for both TAP and CAP, culminating in the currently best approximation factor for both problems of 1.393 through quite sophisticated techniques. We present a new and arguably simple matching-based method for the well-known special case of leaf-to-leaf instances. Combining our work with prior techniques, we readily obtain a ( 4 / 3 + ε ) -approximation for Leaf-to-Leaf CAP by returning the better of our solution and one of an existing method. Prior to our work, a 4 / 3 -guarantee was only known for Leaf-to-Leaf TAP instances on trees of height 2. Moreover, when combining our technique with a recently introduced stack analysis approach, which is part of the above-mentioned 1.393-approximation, we can further improve the approximation factor to 1.29, obtaining for the first time a factor below 4 3 for a nontrivial class of TAP/CAP instances.

8.
ISA Trans ; 153: 57-69, 2024 Oct.
Artículo en Inglés | MEDLINE | ID: mdl-39127555

RESUMEN

The increasing role of unmanned aerial vehicle (UAV) swarms in modern warfare poses a significant challenge to ground and air defense systems. Considering complex terrain environments and multi-sensor resources including radar and photoelectric systems constraints, a novel multi-sensor dynamic scheduling algorithm is proposed in this paper. Firstly, a transmission model with Fresnel zone under complex terrain and sensor models for radar/photoelectric systems are established. Considering the constraints of 6 factors, such as pitch angle, array scanning angle and threat levels, a detection model is developed subsequently. Secondly, to meet the real-time requirements of ground and air defense systems, a fast calculation method for Fresnel zone clearance using adaptive buffer is achieved. Thirdly, an improved Hungarian algorithm is proposed to solve the combinatorial optimization problem of sensor scheduling. Finally, simulation experiments are conducted to evaluate the algorithm performance under different conditions. The results demonstrate that the proposed approach significantly reduces the sensor switching rate while achieving a high sensor-UAV matching rate and high-threat matching rate. Furthermore, the simulation results verify the effectiveness of the proposed algorithm when applied to multi-sensor scheduling for defending UAV swarms.

9.
J Environ Manage ; 366: 121914, 2024 Aug.
Artículo en Inglés | MEDLINE | ID: mdl-39043090

RESUMEN

Food Supply Chains (FSCs) have become increasingly complex with the average distance between producers and consumers rising considerably in the past two decades. Consequently, FSCs are a major source of carbon emissions and reducing transportation costs a major challenge for businesses. To address this, we present a mathematical model to promote the three core dimensions of sustainability (economic, environmental, and social), based on the Mixed-Integer Linear Programming (MILP) method. The model addresses the environmental dimension by intending to decrease the carbon emissions of different transport modes involved in the logistics network. Several supply chain network characteristics are incorporated and evaluated, with a consideration of social sustainability (job generation from operating various facilities). The mathematical model's robustness is demonstrated by testing and deploying it to a variety of problem instances. A real-life case study (Norwegian salmon supply chain) helps to comprehend the model's applicability. To understand the importance of optimizing food supply networks holistically, the paper investigates the impact of multiple supply chain permutations on total cost, demand fluctuations and carbon emissions. To address fluctuations in retail demand, we undertook sensitivity analysis for variations in demand, enabling the proposed model to revamp Norway's salmon supply chain network. Subsequently, the results are thoroughly examined to identify managerial implications.


Asunto(s)
Abastecimiento de Alimentos , Salmón , Animales , Noruega , Modelos Teóricos , Conservación de los Recursos Naturales
10.
N Biotechnol ; 83: 26-35, 2024 Nov 25.
Artículo en Inglés | MEDLINE | ID: mdl-38936658

RESUMEN

D-1,2,4-butanetriol (BT) is a widely used fine chemical that can be manufactured by engineered Escherichia coli expressing heterologous pathways and using xylose as a substrate. The current study developed a glucose-xylose dual metabolic channel system in an engineered E. coli and Combinatorially optimized it using multiple strategies to promote BT production. The carbon catabolite repression effects were alleviated by deleting the gene ptsG that encodes the major glucose transporter IICBGlc and mutating the gene crp that encodes the catabolite repressor protein, thereby allowing C-fluxes of both glucose and xylose into their respective metabolic channels separately and simultaneously, which increased BT production by 33% compared with that of the original MJ133K-1 strain. Then, the branch metabolic pathways of intermediates in the BT channel were investigated, the transaminase HisC, the ketoreductases DlD, OLD, and IlvC, and the aldolase MhpE and YfaU were identified as the enzymes for the branched metabolism of 2-keto-3-deoxy-xylonate, deletion of the gene hisC increased BT titer by 21.7%. Furthermore, the relationship between BT synthesis and the intracellular NADPH level was examined, and deletion of the gene pntAB that encodes a transhydrogenase resulted in an 18.1% increase in BT production. The combination of the above approaches to optimize the metabolic network increased BT production by 47.5%, resulting in 2.67 g/L BT in 24 deep-well plates. This study provides insights into the BT biosynthesis pathway and demonstrates effective strategies to increase BT production, which will promote the industrialization of the biosynthesis of BT.


Asunto(s)
Escherichia coli , Glucosa , Ingeniería Metabólica , Xilosa , Escherichia coli/metabolismo , Escherichia coli/genética , Xilosa/metabolismo , Glucosa/metabolismo , Butanoles/metabolismo , Proteínas de Escherichia coli/metabolismo , Proteínas de Escherichia coli/genética
11.
Adv Sci (Weinh) ; 11(26): e2310096, 2024 Jul.
Artículo en Inglés | MEDLINE | ID: mdl-38696663

RESUMEN

Combinatorial optimization (CO) has a broad range of applications in various fields, including operations research, computer science, and artificial intelligence. However, many of these problems are classified as nondeterministic polynomial-time (NP)-complete or NP-hard problems, which are known for their computational complexity and cannot be solved in polynomial time on traditional digital computers. To address this challenge, continuous-time Ising machine solvers have been developed, utilizing different physical principles to map CO problems to ground state finding. However, most Ising machine prototypes operate at speeds comparable to digital hardware and rely on binarizing node states, resulting in increased system complexity and further limiting operating speed. To tackle these issues, a novel device-algorithm co-design method is proposed for fast sub-optimal solution finding with low hardware complexity. On the device side, a piezoelectric lithium niobate (LiNbO3) microelectromechanical system (MEMS) oscillator network-based Ising machine without second-harmonic injection locking (SHIL) is devised to solve Max-cut and graph coloring problems. The LiNbO3 oscillator operates at speeds greater than 9 GHz, making it one of the fastest oscillatory Ising machines. System-wise, an innovative grouping method is used that achieves a performance guarantee of 0.878 for Max-cut and 0.658 for graph coloring problems, which is comparable to Ising machines that utilize binarization.

12.
Entropy (Basel) ; 26(5)2024 Apr 30.
Artículo en Inglés | MEDLINE | ID: mdl-38785647

RESUMEN

Protein-ligand docking plays a significant role in structure-based drug discovery. This methodology aims to estimate the binding mode and binding free energy between the drug-targeted protein and candidate chemical compounds, utilizing protein tertiary structure information. Reformulation of this docking as a quadratic unconstrained binary optimization (QUBO) problem to obtain solutions via quantum annealing has been attempted. However, previous studies did not consider the internal degrees of freedom of the compound that is mandatory and essential. In this study, we formulated fragment-based protein-ligand flexible docking, considering the internal degrees of freedom of the compound by focusing on fragments (rigid chemical substructures of compounds) as a QUBO problem. We introduced four factors essential for fragment-based docking in the Hamiltonian: (1) interaction energy between the target protein and each fragment, (2) clashes between fragments, (3) covalent bonds between fragments, and (4) the constraint that each fragment of the compound is selected for a single placement. We also implemented a proof-of-concept system and conducted redocking for the protein-compound complex structure of Aldose reductase (a drug target protein) using SQBM+, which is a simulated quantum annealer. The predicted binding pose reconstructed from the best solution was near-native (RMSD = 1.26 Å), which can be further improved (RMSD = 0.27 Å) using conventional energy minimization. The results indicate the validity of our QUBO problem formulation.

13.
Heliyon ; 10(10): e31297, 2024 May 30.
Artículo en Inglés | MEDLINE | ID: mdl-38818174

RESUMEN

The current best-known performance guarantees for the extensively studied Traveling Salesman Problem (TSP) of determinate approximation algorithms is 32, achieved by Christofides' algorithm 47 years ago. This paper investigates a new generalization problem of the TSP, termed the Minimum-Cost Bounded Degree Connected Subgraph (MBDCS) problem. In the MBDCS problem, the goal is to identify a minimum-cost connected subgraph containing n=|V| edges from an input graph G=(V,E) with degree upper bounds for particular vertices. We show that for certain special cases of MBDCS, the aim is equivalent to finding a minimum-cost Hamiltonian cycle for the input graph, same as the TSP. To appropriately solve MBDCS, we initially present an integer programming formulation for the problem. Subsequently, we propose an algorithm to approximate the optimal solution by applying the iterative rounding technique to solution of the integer programming relaxation. We demonstrate that the returned subgraph of our proposed algorithm is one of the best guarantees for the MBDCS problem in polynomial time, assuming P≠NP. This study views the optimization of TSP as finding a minimum-cost connected subgraph containing n edges with degree upper bounds for certain vertices, and it may provide new insights into optimizing the TSP in future research.

14.
ACS Nano ; 18(16): 10758-10767, 2024 Apr 23.
Artículo en Inglés | MEDLINE | ID: mdl-38598699

RESUMEN

Neural networks are increasingly used to solve optimization problems in various fields, including operations research, design automation, and gene sequencing. However, these networks face challenges due to the nondeterministic polynomial time (NP)-hard issue, which results in exponentially increasing computational complexity as the problem size grows. Conventional digital hardware struggles with the von Neumann bottleneck, the slowdown of Moore's law, and the complexity arising from heterogeneous system design. Two-dimensional (2D) memristors offer a potential solution to these hardware challenges, with their in-memory computing, decent scalability, and rich dynamic behaviors. In this study, we explore the use of nonvolatile 2D memristors to emulate synapses in a discrete-time Hopfield neural network, enabling the network to solve continuous optimization problems, like finding the minimum value of a quadratic polynomial, and tackle combinatorial optimization problems like Max-Cut. Additionally, we coupled volatile memristor-based oscillators with nonvolatile memristor synapses to create an oscillatory neural network-based Ising machine, a continuous-time analog dynamic system capable of solving combinatorial optimization problems including Max-Cut and map coloring through phase synchronization. Our findings demonstrate that 2D memristors have the potential to significantly enhance the efficiency, compactness, and homogeneity of integrated Ising machines, which is useful for future advances in neural networks for optimization problems.

15.
Biometrika ; 111(1): 171-193, 2024 Mar.
Artículo en Inglés | MEDLINE | ID: mdl-38352626

RESUMEN

Rooted and ranked phylogenetic trees are mathematical objects that are useful in modelling hierarchical data and evolutionary relationships with applications to many fields such as evolutionary biology and genetic epidemiology. Bayesian phylogenetic inference usually explores the posterior distribution of trees via Markov chain Monte Carlo methods. However, assessing uncertainty and summarizing distributions remains challenging for these types of structures. While labelled phylogenetic trees have been extensively studied, relatively less literature exists for unlabelled trees that are increasingly useful, for example when one seeks to summarize samples of trees obtained with different methods, or from different samples and environments, and wishes to assess the stability and generalizability of these summaries. In our paper, we exploit recently proposed distance metrics of unlabelled ranked binary trees and unlabelled ranked genealogies, or trees equipped with branch lengths, to define the Fréchet mean, variance and interquartile sets as summaries of these tree distributions. We provide an efficient combinatorial optimization algorithm for computing the Fréchet mean of a sample or of distributions on unlabelled ranked tree shapes and unlabelled ranked genealogies. We show the applicability of our summary statistics for studying popular tree distributions and for comparing the SARS-CoV-2 evolutionary trees across different locations during the COVID-19 epidemic in 2020. Our current implementations are publicly available at https://github.com/RSamyak/fmatrix.

16.
EPJ Quantum Technol ; 11(1): 6, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-38261853

RESUMEN

In recent years, variational quantum algorithms such as the Quantum Approximation Optimization Algorithm (QAOA) have gained popularity as they provide the hope of using NISQ devices to tackle hard combinatorial optimization problems. It is, however, known that at low depth, certain locality constraints of QAOA limit its performance. To go beyond these limitations, a non-local variant of QAOA, namely recursive QAOA (RQAOA), was proposed to improve the quality of approximate solutions. The RQAOA has been studied comparatively less than QAOA, and it is less understood, for instance, for what family of instances it may fail to provide high-quality solutions. However, as we are tackling NP-hard problems (specifically, the Ising spin model), it is expected that RQAOA does fail, raising the question of designing even better quantum algorithms for combinatorial optimization. In this spirit, we identify and analyze cases where (depth-1) RQAOA fails and, based on this, propose a reinforcement learning enhanced RQAOA variant (RL-RQAOA) that improves upon RQAOA. We show that the performance of RL-RQAOA improves over RQAOA: RL-RQAOA is strictly better on these identified instances where RQAOA underperforms and is similarly performing on instances where RQAOA is near-optimal. Our work exemplifies the potentially beneficial synergy between reinforcement learning and quantum (inspired) optimization in the design of new, even better heuristics for complex problems.

17.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38261343

RESUMEN

Cryo-Electron Microscopy (cryo-EM) is a widely used and effective method for determining the three-dimensional (3D) structure of biological molecules. For ab-initio Cryo-EM 3D reconstruction using single particle analysis (SPA), estimating the projection direction of the projection image is a crucial step. However, the existing SPA methods based on common lines are sensitive to noise. The error in common line detection will lead to a poor estimation of the projection directions and thus may greatly affect the final reconstruction results. To improve the reconstruction results, multiple candidate common lines are estimated for each pair of projection images. The key problem then becomes a combination optimization problem of selecting consistent common lines from multiple candidates. To solve the problem efficiently, a physics-inspired method based on a kinetic model is proposed in this work. More specifically, hypothetical attractive forces between each pair of candidate common lines are used to calculate a hypothetical torque exerted on each projection image in the 3D reconstruction space, and the rotation under the hypothetical torque is used to optimize the projection direction estimation of the projection image. This way, the consistent common lines along with the projection directions can be found directly without enumeration of all the combinations of the multiple candidate common lines. Compared with the traditional methods, the proposed method is shown to be able to produce more accurate 3D reconstruction results from high noise projection images. Besides the practical value, the proposed method also serves as a good reference for solving similar combinatorial optimization problems.


Asunto(s)
Imagenología Tridimensional , Microscopía por Crioelectrón , Cinética
18.
Cell Syst ; 14(12): 1122-1130.e3, 2023 12 20.
Artículo en Inglés | MEDLINE | ID: mdl-38128484

RESUMEN

The efficacy of epitope vaccines depends on the included epitopes as well as the probability that the selected epitopes are presented by the major histocompatibility complex (MHC) proteins of a vaccinated individual. Designing vaccines that effectively immunize a high proportion of the population is challenging because of high MHC polymorphism, diverging MHC-peptide binding affinities, and physical constraints on epitope vaccine constructs. Here, we present HOGVAX, a combinatorial optimization approach for epitope vaccine design. To optimize population coverage within the constraint of limited vaccine construct space, HOGVAX employs a hierarchical overlap graph (HOG) to identify and exploit overlaps between selected peptides and explicitly models the structure of linkage disequilibrium in the MHC. In a SARS-CoV-2 case study, we demonstrate that HOGVAX-designed vaccines contain substantially more epitopes than vaccines built from concatenated peptides and predict vaccine efficacy in over 98% of the population with high numbers of presented peptides in vaccinated individuals.


Asunto(s)
COVID-19 , Vacunas , Humanos , SARS-CoV-2 , COVID-19/prevención & control , Epítopos de Linfocito T , Péptidos
19.
Cell Syst ; 14(12): 1113-1121.e9, 2023 12 20.
Artículo en Inglés | MEDLINE | ID: mdl-38128483

RESUMEN

CRISPR-Cas9-based genome editing combined with single-cell sequencing enables the tracing of the history of cell divisions, or cellular lineage, in tissues and whole organisms. Although standard phylogenetic approaches may be applied to reconstruct cellular lineage trees from this data, the unique features of the CRISPR-Cas9 editing process motivate the development of specialized models that describe the evolution of CRISPR-Cas9-induced mutations. Here, we introduce the "star homoplasy" evolutionary model that constrains a phylogenetic character to mutate at most once along a lineage, capturing the "non-modifiability" property of CRISPR-Cas9 mutations. We derive a combinatorial characterization of star homoplasy phylogenies and use this characterization to develop an algorithm, "Startle", that computes a maximum parsimony star homoplasy phylogeny. We demonstrate that Startle infers more accurate phylogenies on simulated lineage tracing data compared with existing methods and finds parsimonious phylogenies with fewer metastatic migrations on lineage tracing data from mouse metastatic lung adenocarcinoma.


Asunto(s)
Sistemas CRISPR-Cas , Edición Génica , Animales , Ratones , Sistemas CRISPR-Cas/genética , Filogenia , Edición Génica/métodos , Linaje de la Célula/genética , Mutación
20.
BMC Bioinformatics ; 24(1): 431, 2023 Nov 14.
Artículo en Inglés | MEDLINE | ID: mdl-37964228

RESUMEN

BACKGROUND: Liquid chromatography-mass spectrometry is widely used in untargeted metabolomics for composition profiling. In multi-run analysis scenarios, features of each run are aligned into consensus features by feature alignment algorithms to observe the intensity variations across runs. However, most of the existing feature alignment methods focus more on accurate retention time correction, while underestimating the importance of feature matching. None of the existing methods can comprehensively consider feature correspondences among all runs and achieve optimal matching. RESULTS: To comprehensively analyze feature correspondences among runs, we propose G-Aligner, a graph-based feature alignment method for untargeted LC-MS data. In the feature matching stage, G-Aligner treats features and potential correspondences as nodes and edges in a multipartite graph, considers the multi-run feature matching problem an unbalanced multidimensional assignment problem, and provides three combinatorial optimization algorithms to find optimal matching solutions. In comparison with the feature alignment methods in OpenMS, MZmine2 and XCMS on three public metabolomics benchmark datasets, G-Aligner achieved the best feature alignment performance on all the three datasets with up to 9.8% and 26.6% increase in accurately aligned features and analytes, and helped all comparison software obtain more accurate results on their self-extracted features by integrating G-Aligner to their analysis workflow. G-Aligner is open-source and freely available at https://github.com/CSi-Studio/G-Aligner under a permissive license. Benchmark datasets, manual annotation results, evaluation methods and results are available at https://doi.org/10.5281/zenodo.8313034 CONCLUSIONS: In this study, we proposed G-Aligner to improve feature matching accuracy for untargeted metabolomics LC-MS data. G-Aligner comprehensively considered potential feature correspondences between all runs, converting the feature matching problem as a multidimensional assignment problem (MAP). In evaluations on three public metabolomics benchmark datasets, G-Aligner achieved the highest alignment accuracy on manual annotated and popular software extracted features, proving the effectiveness and robustness of the algorithm.


Asunto(s)
Programas Informáticos , Espectrometría de Masas en Tándem , Cromatografía Liquida/métodos , Espectrometría de Masas en Tándem/métodos , Algoritmos , Metabolómica/métodos
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA