Search | VHL Regional Portal

1.

PelviNet: A Collaborative Multi-agent Convolutional Network for Enhanced Pelvic Image Registration.

Zakaria, Rguibi; Abdelmajid, Hajami; Dya, Zitouni; Hakim, Allali.

J Imaging Inform Med ; 2024 Sep 09.

Article in English | MEDLINE | ID: mdl-39249582

ABSTRACT

PelviNet introduces a groundbreaking multi-agent convolutional network architecture tailored for enhancing pelvic image registration. This innovative framework leverages shared convolutional layers, enabling synchronized learning among agents and ensuring an exhaustive analysis of intricate 3D pelvic structures. The architecture combines max pooling, parametric ReLU activations, and agent-specific layers to optimize both individual and collective decision-making processes. A communication mechanism efficiently aggregates outputs from these shared layers, enabling agents to make well-informed decisions by harnessing combined intelligence. PelviNet's evaluation centers on both quantitative accuracy metrics and visual representations to elucidate agents' performance in pinpointing optimal landmarks. Empirical results demonstrate PelviNet's superiority over traditional methods, achieving an average image-wise error of 2.8 mm, a subject-wise error of 3.2 mm, and a mean Euclidean distance error of 3.0 mm. These quantitative results highlight the model's efficiency and precision in landmark identification, crucial for medical contexts such as radiation therapy, where exact landmark identification significantly influences treatment outcomes. By reliably identifying critical structures, PelviNet advances pelvic image analysis and offers potential enhancements for broader medical imaging applications, marking a significant step forward in computational healthcare.
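The mean Euclidean distance error reported above can be illustrated with a minimal Python sketch. It assumes landmarks are given as paired 3D coordinates in millimetres; the function name `mean_euclidean_error` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math

def mean_euclidean_error(predicted, reference):
    """Average Euclidean distance between paired 3D landmarks (same units as input)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty landmark lists of equal length")
    return sum(math.dist(p, r) for p, r in zip(predicted, reference)) / len(predicted)

# Hypothetical landmark coordinates in millimetres (illustration only, not study data).
predicted = [(10.0, 20.0, 30.0), (43.0, 54.0, 60.0)]
reference = [(10.0, 20.0, 33.0), (40.0, 50.0, 60.0)]

error_mm = mean_euclidean_error(predicted, reference)
print(f"mean Euclidean distance error: {error_mm:.1f} mm")
```

The two sample landmarks are 3 mm and 5 mm away from their references, so the sketch reports their average.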

2.

Adaptive dynamic programming for containment control with robustness analysis to iterative error: A global Nash equilibrium solution.

Chen, Zitao; Chen, Kairui; Wang, Jianhui.

ISA Trans ; : 1-15, 2024 Aug 28.

Article in English | MEDLINE | ID: mdl-39261266

ABSTRACT

Global Nash equilibrium is an optimal solution for each player in a graphical game. This paper proposes an iterative adaptive dynamic programming-based algorithm to compute the global Nash equilibrium solution of the optimal containment control problem, with a robustness analysis of the iterative error. The containment control problem is transformed into a graphical game formulation. Sufficient conditions are given to decouple the Hamilton-Jacobi equations, which guarantee the solvability of the global Nash equilibrium solution. The iterative algorithm is designed to obtain the solution without any knowledge of system dynamics. Conditions on the iterative error for global stability are given with rigorous proof. Compared with existing works, the design procedures of control gain and coupling strength are separated, which avoids trivial cases in the design procedure. The robustness analysis exactly quantifies the effect of the iterative error caused by various sources in engineering practice. The theoretical results are validated by two numerical examples with marginally stable and unstable dynamics of the leader.

3.

Secure impulsive tracking of multi-agent systems with directed hypergraph topologies against hybrid deception attacks.

Yang, Zonglin; Ling, Guang; Ge, Ming-Feng.

Neural Netw ; 180: 106691, 2024 Sep 02.

Article in English | MEDLINE | ID: mdl-39255635

ABSTRACT

This research delves into the challenges of achieving secure consensus tracking within multi-agent systems characterized by directed hypergraph topologies, in the face of hybrid deception attacks. The hybrid discrete and continuous deception attacks are targeted at the controller communication channels and the hyperedges, respectively. To overcome these threats, an impulsive control mechanism based on hypergraph theory is introduced, and sufficient conditions are established, under which consensus can be maintained in a mean-square bounded sense, supported by rigorous mathematical proofs. Furthermore, the investigation quantifies the relationship between the mean-square bounded consensus of the multi-agent system and the intensity of the deception attacks, delineating a specific range for this error metric. The robustness and effectiveness of the proposed control method are verified through comprehensive simulation experiments, demonstrating its applicability in varied scenarios influenced by these sophisticated attacks. This study underscores the potential of hypergraph-based strategies in enhancing system resilience against complex hybrid attacks.

4.

Data-driven optimal cooperative tracking control for heterogeneous multi-agent systems.

Ma, Yong-Sheng; Xu, Yong; Sun, Jian; Dou, Li-Hua.

ISA Trans ; : 1-9, 2024 Sep 03.

Article in English | MEDLINE | ID: mdl-39266336

ABSTRACT

This paper presents a novel hierarchical control scheme for solving the data-driven optimal cooperative tracking control problem of heterogeneous multi-agent systems. Considering that followers cannot communicate with the leader, a prescribed-time fully distributed observer is devised to estimate the leader's state for each follower. Then, the data-driven decentralized controller is designed to ensure that the follower's output can track the leader's one. Compared with the existing results, the advantages of the designed distributed observer are that the prescribed convergence time is completely predetermined by the designer, and the design of the observer gain is independent of the global topology information. Besides, the advantages of the designed decentralized controller are that neither the follower's system model nor a known initial stabilizing control policy is required. Finally, simulation results exemplify the advantage of the proposed method.

5.

Distracted Walking: Does it impact pedestrian-vehicle interaction behavior?

Alsharif, Tala; Lanzaro, Gabriel; Sayed, Tarek.

Accid Anal Prev ; 208: 107789, 2024 Sep 18.

Article in English | MEDLINE | ID: mdl-39299179

ABSTRACT

Several studies have developed pedestrian-vehicle interaction models. However, these studies failed to consider pedestrian distraction, which considerably influences the safety of these interactions. Utilizing data from two intersections in Vancouver, Canada, this research uses the Multi-agent Adversarial Inverse Reinforcement Learning (MA-AIRL) framework to make inferences about the behavioral dynamics of distracted and non-distracted pedestrians while interacting with vehicles. Results showed that distracted pedestrians maintained closer proximity to vehicles, moved at reduced speeds, and rarely yielded to oncoming vehicles. In addition, they rarely changed their interaction angles regardless of lateral proximity to vehicles, indicating that they mostly remain unaware of the surrounding environment and have decreased navigational efficiency. Conversely, non-distracted pedestrians executed safer maneuvers, kept greater distances from vehicles, yielded more frequently, and adjusted their speeds accordingly. For example, non-distracted pedestrian-vehicle interactions showed a 46.5% decrease in traffic conflict severity (as measured by the average Time-to-Collision (TTC) values) and an average 30.2% increase in minimum distances when compared to distracted pedestrian-vehicle interactions. Vehicle drivers also demonstrated different behaviors in response to distracted pedestrians. They often opted to decelerate around distracted pedestrians, indicating recognition of potential risks. Furthermore, the MA-AIRL framework provided different results depending on the type of interactions. The performance of the distracted vehicle-pedestrian model was lower than the non-distracted model, suggesting that predicting non-distracted behavior might be relatively easier. These findings emphasize the importance of refining pedestrian simulation models to include the unique behavioral patterns arising from pedestrian distraction. This should assist in further examining the safety impacts of pedestrian distraction on the road environment.
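Time-to-Collision can be illustrated with a short Python sketch. The abstract does not give the exact formula used in the study, so the sketch assumes a common simplified definition (remaining gap divided by closing speed). The helper names and sample values are hypothetical and do not reproduce the reported percentages.

```python
def time_to_collision(gap_m, closing_speed_mps):
    """Return TTC in seconds, or None when the two road users are not closing in."""
    if closing_speed_mps <= 0:
        return None
    return gap_m / closing_speed_mps

def percent_change(baseline, value):
    """Signed percentage change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline

# Hypothetical interaction: 12 m gap closing at 4 m/s (assumed values, not study data).
ttc = time_to_collision(12.0, 4.0)
print(f"TTC = {ttc:.1f} s")

# Hypothetical mean TTC values for two groups of interactions.
print(f"change in mean TTC: {percent_change(2.0, 2.5):+.1f}%")
```

A larger TTC leaves more time to react, which is why average TTC is often used as a severity indicator for traffic conflicts.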

6.

Finite-time hybrid impulsive formation tracking control of multi-agent systems via aperiodic intermittent communication.

Liang, Zhanlue; Liu, Xinzhi.

ISA Trans ; : 1-14, 2024 Sep 16.

Article in English | MEDLINE | ID: mdl-39299846

ABSTRACT

This article studies the problem of formation tracking control in multi-agent systems, achieved in finite time, under challenging conditions such as strong nonlinearity, aperiodic intermittent communication, and time-delay effects, all within a hybrid impulsive framework. The impulses are categorized as either stabilizing control impulses or disruptive impulses. Furthermore, by integrating Lyapunov-based stability theory, graph theory, and the linear matrix inequality (LMI) method, new stability criteria are established. These criteria ensure finite-time intermittent formation tracking while considering weak Lyapunov inequality conditions, intermittent communication rates, and time-varying gain strengths. Additionally, the approach manages an indefinite number of impulsive moments and adjusts the control domain's width based on the average impulsive interval and state-dependent control width. Numerical simulations are provided to validate the applicability and effectiveness of the proposed formation tracking control protocols.

7.

Utility-Driven End-to-End Network Slicing for Diverse IoT Users in MEC: A Multi-Agent Deep Reinforcement Learning Approach.

Ejaz, Muhammad Asim; Wu, Guowei; Ahmed, Adeel; Iftikhar, Saman; Bawazeer, Shaikhan.

Sensors (Basel) ; 24(17), 2024 Aug 28.

Article in English | MEDLINE | ID: mdl-39275469

ABSTRACT

Mobile Edge Computing (MEC) is crucial for reducing latency by bringing computational resources closer to the network edge, thereby enhancing the quality of services (QoS). However, the broad deployment of cloudlets poses challenges in efficient network slicing, particularly when traffic distribution is uneven. Therefore, these challenges include managing diverse resource requirements across widely distributed cloudlets, minimizing resource conflicts and delays, and maintaining service quality amid fluctuating request rates. Addressing this requires intelligent strategies to predict request types (common or urgent), assess resource needs, and allocate resources efficiently. Emerging technologies like edge computing and 5G with network slicing can handle delay-sensitive IoT requests rapidly, but a robust mechanism for real-time resource and utility optimization remains necessary. To address these challenges, we designed an end-to-end network slicing approach that predicts common and urgent user requests through T distribution. We formulated our problem as a multi-agent Markov decision process (MDP) and introduced a multi-agent soft actor-critic (MAgSAC) algorithm. This algorithm prevents the wastage of scarce resources by intelligently activating and deactivating virtual network function (VNF) instances, thereby balancing the allocation process. Our approach aims to optimize overall utility, balancing trade-offs between revenue, energy consumption costs, and latency. We evaluated our method, MAgSAC, through simulations, comparing it with the following six benchmark schemes: MAA3C, SACT, DDPG, S2Vec, Random, and Greedy. The results demonstrate that our approach, MAgSAC, optimizes utility by 30%, minimizes energy consumption costs by 12.4%, and reduces execution time by 21.7% compared to the closest related multi-agent approach named MAA3C.

8.

A Cascaded Multi-Agent Reinforcement Learning-Based Resource Allocation for Cellular-V2X Vehicular Platooning Networks.

Narayanasamy, Iswarya; Rajamanickam, Venkateswari.

Sensors (Basel) ; 24(17), 2024 Aug 30.

Article in English | MEDLINE | ID: mdl-39275567

ABSTRACT

The platooning of cars and trucks is a pertinent approach for autonomous driving due to the effective utilization of roadways. The decreased gas consumption levels are an added merit owing to sustainability. Conventional platooning depended on Dedicated Short-Range Communication (DSRC)-based vehicle-to-vehicle communications. The computations were executed by the platoon members with their constrained capabilities. The advent of 5G has favored Intelligent Transportation Systems (ITS) to adopt Multi-access Edge Computing (MEC) in platooning paradigms by offloading the computational tasks to the edge server. In this research, vital parameters in vehicular platooning systems, viz. latency-sensitive radio resource management schemes and Age of Information (AoI), are investigated. In addition, the delivery rates of Cooperative Awareness Messages (CAM) that ensure expeditious reception of safety-critical messages at the roadside units (RSU) are also examined. However, for latency-sensitive applications like vehicular networks, it is essential to address multiple and correlated objectives. To solve such objectives effectively and simultaneously, the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) framework necessitates a better and more sophisticated model to enhance its ability. In this paper, a novel Cascaded MADDPG framework, CMADDPG, is proposed to train cascaded target critics, which aims at achieving expected rewards through the collaborative conduct of agents. The estimation bias phenomenon, which hinders a system's overall performance, is effectively circumvented in this cascaded algorithm. Finally, experimental analysis also demonstrates the potential of the proposed algorithm by evaluating the convergence factor, which stabilizes quickly with minimum distortions, and reliable CAM message dissemination with 99% probability. The average AoI quantity is maintained within the 5-10 ms range, guaranteeing better QoS. This technique has proven its robustness in decentralized resource allocation against channel uncertainties caused by higher mobility in the environment. Most importantly, the performance of the proposed algorithm remains unaffected by increasing platoon size and leading channel uncertainties.

9.

Optimizing control efficiency in discrete-time multi-agent systems via event-triggered containment techniques combining disturbance handling and input delay management.

Louati, Hanen; Niazi, Azmat Ullah Khan; Dalam, Mhassen E E; Hassan, Waqar Ul; Khan, Khawer Hameed; Alhagyan, Mohammed.

Heliyon ; 10(14): e33975, 2024 Jul 30.

Article in English | MEDLINE | ID: mdl-39108846

ABSTRACT

The goal of this paper is to mitigate disturbances and input delays while optimizing controller actuation updates for discrete-time multi-agent systems through the use of an event-triggered containment control scheme, especially in resource-constrained scenarios. Under this approach, every follower in the system updates its state at specific times determined by a proposed event-triggered condition. The containment control problem in the presence of disturbances and input delays is tackled using both decentralized and centralized event-triggered control schemes. Using matrix theory and the Lyapunov technique, convergence analysis is conducted to show that the proposed strategy stays free of Zeno behavior. Numerical examples are used to further illustrate the theoretical results.

10.

Collective predictive coding hypothesis: symbol emergence as decentralized Bayesian inference.

Taniguchi, Tadahiro.

Front Robot AI ; 11: 1353870, 2024.

Article in English | MEDLINE | ID: mdl-39109321

ABSTRACT

Understanding the emergence of symbol systems, especially language, requires the construction of a computational model that reproduces both the developmental learning process in everyday life and the evolutionary dynamics of symbol emergence throughout history. This study introduces the collective predictive coding (CPC) hypothesis, which emphasizes and models the interdependence between forming internal representations through physical interactions with the environment and sharing and utilizing meanings through social semiotic interactions within a symbol emergence system. The total system dynamics is theorized from the perspective of predictive coding. The hypothesis draws inspiration from computational studies grounded in probabilistic generative models and language games, including the Metropolis-Hastings naming game. Thus, playing such games among agents in a distributed manner can be interpreted as a decentralized Bayesian inference of representations shared by a multi-agent system. Moreover, this study explores the potential link between the CPC hypothesis and the free-energy principle, positing that symbol emergence adheres to the society-wide free-energy principle. Furthermore, this paper provides a new explanation for why large language models appear to possess knowledge about the world based on experience, even though they have neither sensory organs nor bodies. This paper reviews past approaches to symbol emergence systems, offers a comprehensive survey of related prior studies, and presents a discussion on CPC-based generalizations. Future challenges and potential cross-disciplinary research avenues are highlighted.

11.

How to Formalize Different Types of Norms in Multi-agent Systems: A Methodology Focused on the T-Norm Model.

Roshankish, Soheil; Fornara, Nicoletta.

SN Comput Sci ; 5(6): 749, 2024.

Article in English | MEDLINE | ID: mdl-39100973

ABSTRACT

In a world where many activities are carried out digitally, it is increasingly urgent to be able to formally represent the norms, policies, and contracts that regulate these activities in order to make them understandable and processable by machines. In multi-agent systems, the process to be followed by a person to choose a formal model of norms and transform a norm written in a natural language into a formal one by using the selected model is a demanding task. In this paper, we introduce a methodology to be followed by people to understand the fundamental elements that they should consider for this transformation. We will focus mainly on a methodology for formalizing norms using the T-Norm model because it allows us to express a rich set of different types of norms. Nevertheless, the proposed methodology is general enough to also be used, in some of its steps, to formalize norms using other formal languages. In the definition of the methodology, we will explicitly state which types of norms can be expressed with a given model and which cannot. Since there is not yet a set of different types of norms that is sufficiently expressive and is recognized as valid by the Normative Multiagent Systems (NorMAS) community, another goal of this paper is to propose and discuss a rich set of norm types that could be used to study the expressive power of different formal models of norms, to compare them, and to translate norms formalized with one language into norms written in another language.

12.

Encryption-decryption-based distributed fault-tolerant consensus tracking control for multi-agent systems.

Liu, Yiliu; Liu, Chun; Wang, Xiaofan; Lan, Jianglin; Ren, Xiaoqiang.

ISA Trans ; : 1-9, 2024 Aug 28.

Article in English | MEDLINE | ID: mdl-39214754

ABSTRACT

This study investigates fault-tolerant consensus tracking for discrete-time multi-agent systems (MASs) subject to external eavesdropping threats and additive actuator faults. First, actuator faults are modeled by difference equations, and decentralized observers are constructed to estimate actuator faults as well as system states. To offset fault-induced effects, ensure secure communication, and alleviate communication congestion, neighboring encrypted state information based on the encryption-decryption strategy (EDS) and estimated fault are integrated into a distributed active fault-tolerant consensus tracking control (FCTC) protocol. Through the properties of compatible norms, criteria for the controller, observer, and dynamic encryption key in EDS are derived to achieve leader-following consensus (LFC) of MASs with bias and drift actuator faults. Simulation results confirm the validity of the encryption-decryption-based distributed FCTC strategy.

13.

A novel development of advanced control approach for battery-fed electric vehicle systems.

Bhargavi, K M; Ashwini Kumari, P; Hussain Basha, C H; Girija Kanaka Jothi, S; Prashanth, V; Shetty, Nayana.

Sci Rep ; 14(1): 20194, 2024 Aug 30.

Article in English | MEDLINE | ID: mdl-39215148

ABSTRACT

In today's context, there is a clear preference for DC microgrids over AC microgrids due to their better compatibility with generating sources, loads, and battery energy storage systems (BESS). However, the intermittent nature of renewable resources disrupts the balance between power generation and load demand. It raises concerns regarding power management and quality in the power system. Control strategies are essential to address these challenges. This article focuses on developing a novel control strategy to ensure stability in microgrid systems. The proposed control structure utilizes a second-order multi-agent system (MAS) to enhance the power-sharing and coordination in the microgrid network. For effective control of battery energy storage units, a Voltage-Power (V-P) reference-based droop control and leader-follower consensus method is employed. The control approach consists of primary and secondary control layers. The primary layer uses a V-P reference-based droop control strategy to allocate load components to storage units. The secondary control layer aims to restore DC bus voltage using a MAS-based consensus protocol. The MAS approach offers greater flexibility and requires less computational power than other strategies such as Model Predictive Control (MPC). The enhanced control structure incorporates a current ratio modification loop to adjust the current ratio between the converters, thereby modifying gain and improving the voltage profile. This novel control optimizes the reliability and stability of the proposed DC microgrid system. The effectiveness of the enhanced consensus-based secondary control strategy is demonstrated using the MATLAB/Simulink platform.
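The primary droop layer described above can be sketched in Python. This is a minimal illustration under idealized assumptions: a single common DC bus, no line resistance, and each storage unit following V = V_nom - k_i * P_i. The function name `droop_share`, the droop gains, and the load value are hypothetical; the paper's actual V-P reference-based controller and consensus protocol are not reproduced here.

```python
def droop_share(load_w, v_nom, droop_gains):
    """Split a load among storage units that each follow V = v_nom - k_i * P_i.

    Returns the resulting bus voltage and the power supplied by each unit.
    Droop gains are in volts per watt (assumed units).
    """
    inv_sum = sum(1.0 / k for k in droop_gains)
    v_bus = v_nom - load_w / inv_sum            # common bus voltage under droop
    powers = [(v_nom - v_bus) / k for k in droop_gains]
    return v_bus, powers

# Hypothetical 400 V microgrid with two storage units and a 3 kW load.
v_nom = 400.0
v_bus, powers = droop_share(load_w=3000.0, v_nom=v_nom, droop_gains=[0.01, 0.02])
print(f"bus voltage: {v_bus:.1f} V, unit powers: {powers}")

# A secondary layer would shift the voltage reference by this amount to restore
# the bus to its nominal value (here computed directly instead of by consensus).
secondary_shift = v_nom - v_bus
print(f"required reference shift: {secondary_shift:.1f} V")
```

In this idealized setting the unit with the smaller droop gain takes the larger share of the load, and the droop action leaves a steady-state voltage deviation that a secondary layer has to remove.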

14.

Asynchronous iterative Q-learning based tracking control for nonlinear discrete-time multi-agent systems.

Shen, Ziwen; Dong, Tao; Huang, Tingwen.

Neural Netw ; 180: 106667, 2024 Aug 26.

Article in English | MEDLINE | ID: mdl-39216294

ABSTRACT

This paper addresses the tracking control problem of nonlinear discrete-time multi-agent systems (MASs). First, a local neighborhood error system (LNES) is constructed. Then, a novel tracking algorithm based on asynchronous iterative Q-learning (AIQL) is developed, which can transform the tracking problem into the optimal regulation of LNES. The AIQL-based algorithm has two Q values QiA and QiB for each agent i, where QiA is used for improving the control policy and QiB is used for evaluating the value of the control policy. Moreover, the convergence of LNES is given. It is shown that the LNES converges to 0 and the tracking problem is solved. A neural network-based actor-critic framework is used to implement AIQL. The critic network of AIQL is composed of two neural networks, which are used for approximating QiA and QiB respectively. Finally, simulation results are given to verify the performance of the developed algorithm. It is shown that the AIQL-based tracking algorithm has a lower cost value and faster convergence speed than the IQL-based tracking algorithm.
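The idea of keeping two Q tables per agent, one for improving the policy and one for evaluating it, can be illustrated with a small single-agent Python sketch. It is not the paper's AIQL algorithm: it is plain policy iteration on Q tables for a hypothetical toy problem with a known model, where the state is the size of the tracking error and the goal is to drive it to 0. All names, costs, and dynamics are assumptions made for illustration.

```python
ACTIONS = (-1, 0, 1)
N_STATES = 5      # state = size of the tracking error; 0 means "on target"
GAMMA = 0.9       # discount factor

def step(s, a):
    """Toy deterministic dynamics: move the error by a, clamped to the state range."""
    return min(max(s + a, 0), N_STATES - 1)

def cost(s, a):
    """Toy stage cost: squared error plus a small control effort penalty."""
    return s * s + 0.1 * a * a

def greedy(q, s):
    """Lowest-cost action in state s according to table q."""
    return min(ACTIONS, key=lambda a: q[(s, a)])

def two_table_policy_iteration(rounds=5, eval_sweeps=200):
    keys = [(s, a) for s in range(N_STATES) for a in ACTIONS]
    q_a = {k: 0.0 for k in keys}   # used only to improve (select) the policy
    q_b = {k: 0.0 for k in keys}   # used only to evaluate the selected policy
    for _ in range(rounds):
        policy = {s: greedy(q_a, s) for s in range(N_STATES)}
        for _ in range(eval_sweeps):
            # Evaluate the fixed policy: one synchronous Bellman backup per sweep.
            q_b = {(s, a): cost(s, a) + GAMMA * q_b[(step(s, a), policy[step(s, a)])]
                   for (s, a) in keys}
        q_a = dict(q_b)            # hand the evaluation back to the improvement table
    return {s: greedy(q_a, s) for s in range(N_STATES)}, q_a

policy, q = two_table_policy_iteration()
print(policy)
```

In this toy setting the resulting policy reduces the error whenever it is nonzero and holds still once the error reaches 0, which mirrors the regulation-to-zero behavior described for the LNES.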

15.

Evolutionary optimization for risk-aware heterogeneous multi-agent path planning in uncertain environments.

Rekabi Bana, Fatemeh; Krajník, Tomás; Arvin, Farshad.

Front Robot AI ; 11: 1375393, 2024.

Article in English | MEDLINE | ID: mdl-39193080

ABSTRACT

Cooperative multi-agent systems make it possible to employ miniature robots to perform different experiments, ranging from data collection in wide open areas to physical interactions with test subjects in confined environments such as a hive. This paper proposes a new multi-agent path-planning approach to determine a set of trajectories where the agents do not collide with each other or any obstacle. The proposed algorithm leverages a risk-aware probabilistic roadmap algorithm to generate a map, employs node classification to delineate exploration regions, and incorporates a customized genetic framework to address the combinatorial optimization, with the ultimate goal of computing safe trajectories for the team. Furthermore, the proposed planning algorithm makes the agents explore all subdomains in the workspace together as a formation to allow the team to perform different tasks or collect multiple datasets for reliable localization or hazard detection. The objective function for minimization includes two major parts: the traveling distance of all the agents in the entire mission and the probability of collisions between agents or between agents and obstacles. A sampling method is used to determine the objective function considering the agents' dynamic behavior influenced by environmental disturbances and uncertainties. The algorithm's performance is evaluated for different group sizes by using a simulation environment, and two different benchmark scenarios are introduced to compare the exploration behavior. The proposed optimization method shows stable and convergent behavior regardless of the group size.

16.

Joint computation offloading and resource allocation for end-edge collaboration in internet of vehicles via multi-agent reinforcement learning.

Wang, Cong; Wang, Yaoming; Yuan, Ying; Peng, Sancheng; Li, Guorui; Yin, Pengfei.

Neural Netw ; 179: 106621, 2024 Nov.

Article in English | MEDLINE | ID: mdl-39153402

ABSTRACT

Vehicular edge computing (VEC), a promising paradigm for the development of emerging intelligent transportation systems, can provide lower service latency for vehicular applications. However, it is still a challenge to fulfill the requirements of such applications with stringent latency requirements in the VEC system with limited resources. In addition, existing methods focus on handling the offloading task in a certain time slot with statically allocated resources, but ignore the heterogeneous tasks' different resource requirements, resulting in resource wastage. To solve the real-time task offloading and heterogeneous resource allocation problem in VEC system, we propose a decentralized solution based on the attention mechanism and recurrent neural networks (RNN) with a multi-agent distributed deep deterministic policy gradient (AR-MAD4PG). First, to address the partial observability of agents, we construct a shared agent graph and propose a periodic communication mechanism that enables edge nodes to aggregate information from other edge nodes. Second, to help agents better understand the current system state, we design an RNN-based feature extraction network to capture the historical state and resource allocation information of the VEC system. Thirdly, to tackle the challenges of excessive joint observation-action space and ineffective information interference, we adopt the multi-head attention mechanism to compress the dimension of the observation-action space of agents. Finally, we build a simulation model based on the actual vehicle trajectories, and the experimental results show that our proposed method outperforms the existing approaches.

Subject(s)

Neural Networks, Computer ; Resource Allocation ; Reinforcement, Psychology ; Internet ; Transportation ; Algorithms ; Computer Simulation ; Deep Learning

17.

Walking representation and simulation based on multi-source image fusion and multi-agent reinforcement learning for gait rehabilitation.

Zhu, Yean; Xiao, Meirong; Robbins, Dan; Wu, Xiaoying; Lu, Wei; Hou, Wensheng.

Artif Intell Med ; 156: 102945, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39178622

ABSTRACT

In the formulation of strategies for walking rehabilitation, achieving precise identification of the current state and making rational predictions about the future state are crucial but often unrealized. To tackle this challenge, our study introduces a unified framework that integrates a novel 3D walking motion capture method using multi-source image fusion and a walking rehabilitation simulation approach based on multi-agent reinforcement learning. We found that (i) the proposed method achieves accurate 3D walking motion capture and outperforms other advanced methods. Experimental evidence indicates that, compared to similar visual skeleton tracking methods, the proposed approach yields results with higher Pearson correlation (r=0.93), intra-class correlation coefficient (ICC(2,1)=0.91), and narrower confidence intervals ([0.90,0.95] for r, [0.88,0.94] for ICC(2,1)) when compared to standard results. The outcomes of the proposed approach also exhibit commendable correlation and concurrence with those obtained through the IMU-based skeleton tracking method in the assessment of gait parameters ([0.85,0.89] for r, [0.75,0.81] for ICC(2,1)); (ii) multi-agent reinforcement learning has the potential to be used to solve the simulation task of gait rehabilitation. In a mimicry experiment, our proposed simulation method for gait rehabilitation not only enables the intelligent agent to converge from the initial state to the target state, but also yields evolutionary patterns similar to those observed in clinical practice through motor state resolution. This study offers valuable contributions to walking rehabilitation, enabling precise assessment and simulation-based interventions, with potential implications for clinical practice and patient outcomes.

Subject(s)

Gait ; Walking ; Humans ; Walking/physiology ; Gait/physiology ; Computer Simulation ; Reinforcement, Psychology ; Imaging, Three-Dimensional/methods ; Machine Learning

18.

Generative subgoal oriented multi-agent reinforcement learning through potential field.

Li, Shengze; Jiang, Hao; Liu, Yuntao; Zhang, Jieyuan; Xu, Xinhai; Liu, Donghong.

Neural Netw ; 179: 106552, 2024 Nov.

Article in English | MEDLINE | ID: mdl-39089154

ABSTRACT

Multi-agent reinforcement learning (MARL) effectively improves the learning speed of agents in sparse reward tasks with the guidance of subgoals. However, existing works sever the consistency of the learning objectives of the subgoal generation and subgoal reached stages, thereby significantly inhibiting the effectiveness of subgoal learning. To address this problem, we propose a novel Potential field Subgoal-based Multi-Agent reinforcement learning (PSMA) method, which introduces the potential field (PF) to unify the two-stage learning objectives. Specifically, we design a state-to-PF representation model that describes agents' states as potential fields, allowing easy measurement of the interaction effect for both allied and enemy agents. With the PF representation, a subgoal selector is designed to automatically generate multiple subgoals for each agent, drawn from the experience replay buffer that contains both individual and total PF values. Based on the determined subgoals, we define an intrinsic reward function to guide each agent to reach its respective subgoals while maximizing the joint action-value. Experimental results show that our method outperforms the state-of-the-art MARL method on both StarCraft II micro-management (SMAC) and Google Research Football (GRF) tasks with sparse reward settings.

Subject(s)

Reinforcement, Psychology ; Reward ; Neural Networks, Computer ; Humans ; Algorithms ; Machine Learning

19.

Optimal synchronization with L²-gain performance: An adaptive dynamic programming approach.

Chen, Zitao; Chen, Kairui; Tang, Ruizhi.

Neural Netw ; 179: 106566, 2024 Nov.

Article in English | MEDLINE | ID: mdl-39089157

ABSTRACT

This paper studies an optimal synchronous control protocol design for nonlinear multi-agent systems under partially known dynamics and uncertain external disturbance. Under some mild assumptions, the Hamilton-Jacobi-Isaacs equation is derived from the performance index function and system dynamics, which serves as an equivalent formulation. Distributed policy iteration adaptive dynamic programming is developed to obtain the numerical solution to the Hamilton-Jacobi-Isaacs equation. Three theoretical results are given about the proposed algorithm. First, the iterative variables are proved to converge to the solution of the Hamilton-Jacobi-Isaacs equation. Second, the L²-gain performance of the closed-loop system is achieved. As a special case, the origin of the nominal system is asymptotically stable. Third, the obtained control protocol constitutes a Nash equilibrium solution. A neural network-based implementation is designed following the main results. Finally, two numerical examples are provided to verify the effectiveness of the proposed method.

Subject(s)

Algorithms ; Neural Networks, Computer ; Nonlinear Dynamics ; Computer Simulation

20.

MuDE: Multi-agent decomposed reward-based exploration.

Yoo, Byunghyun; Yi, Sungwon; Kim, Hyunwoo; Shin, Younghwan; Han, Ran; Seo, Seungwoo; Song, Hwa Jeon; Chung, Euisok; Yang, Jeongmin.

Neural Netw ; 179: 106565, 2024 Nov.

Article in English | MEDLINE | ID: mdl-39111159

ABSTRACT

In cooperative multi-agent reinforcement learning, agents jointly optimize a centralized value function based on the rewards shared by all agents and learn decentralized policies through value function decomposition. Although such a learning framework is considered effective, estimating individual contribution from the rewards, which is essential for learning highly cooperative behaviors, is difficult. In addition, it becomes more challenging when reinforcement and punishment, which help increase or decrease specific behaviors of agents, coexist, because the processes of maximizing reinforcement and minimizing punishment can often conflict in practice. This study proposes a novel exploration scheme called multi-agent decomposed reward-based exploration (MuDE), which preferentially explores the action spaces associated with positive sub-rewards based on a modified reward decomposition scheme, thus effectively exploring action spaces not reachable by existing exploration schemes. We evaluate MuDE with a challenging set of StarCraft II micromanagement and modified predator-prey tasks extended to include reinforcement and punishment. The results show that MuDE accurately estimates sub-rewards and outperforms state-of-the-art approaches in both convergence speed and win rates.

Subject(s)

Punishment ; Reinforcement, Psychology ; Reward ; Neural Networks, Computer ; Cooperative Behavior ; Humans ; Algorithms
