Results 1 - 20 of 52
1.
J Imaging Inform Med ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39249582

ABSTRACT

PelviNet introduces a groundbreaking multi-agent convolutional network architecture tailored for enhancing pelvic image registration. This innovative framework leverages shared convolutional layers, enabling synchronized learning among agents and ensuring an exhaustive analysis of intricate 3D pelvic structures. The architecture combines max pooling, parametric ReLU activations, and agent-specific layers to optimize both individual and collective decision-making processes. A communication mechanism efficiently aggregates outputs from these shared layers, enabling agents to make well-informed decisions by harnessing combined intelligence. PelviNet's evaluation centers on both quantitative accuracy metrics and visual representations to elucidate agents' performance in pinpointing optimal landmarks. Empirical results demonstrate PelviNet's superiority over traditional methods, achieving an average image-wise error of 2.8 mm, a subject-wise error of 3.2 mm, and a mean Euclidean distance error of 3.0 mm. These quantitative results highlight the model's efficiency and precision in landmark identification, crucial for medical contexts such as radiation therapy, where exact landmark identification significantly influences treatment outcomes. By reliably identifying critical structures, PelviNet advances pelvic image analysis and offers potential enhancements for broader medical imaging applications, marking a significant step forward in computational healthcare.
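The shared-backbone-plus-agent-heads design described above can be sketched compactly. The following is a minimal illustration assuming PyTorch; the channel sizes, number of agents, and the mean-based aggregation standing in for the communication mechanism are our own assumptions, not the published PelviNet configuration.

```python
# Hedged sketch of a PelviNet-style multi-agent 3D CNN (assumed layout).
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """3D convolutional trunk shared by all agents."""
    def __init__(self, in_ch=1, feat=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 16, kernel_size=3, padding=1), nn.PReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, feat, kernel_size=3, padding=1), nn.PReLU(),
            nn.MaxPool3d(2),
        )

    def forward(self, x):
        return self.net(x)

class AgentHead(nn.Module):
    """Agent-specific layers predicting a 3D landmark estimate."""
    def __init__(self, feat=32):
        super().__init__()
        self.head = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                                  nn.Linear(feat, 64), nn.PReLU(), nn.Linear(64, 3))

    def forward(self, f):
        return self.head(f)

class MultiAgentRegistrationNet(nn.Module):
    def __init__(self, n_agents=4):
        super().__init__()
        self.backbone = SharedBackbone()
        self.heads = nn.ModuleList(AgentHead() for _ in range(n_agents))

    def forward(self, volume):
        shared = self.backbone(volume)            # representation shared by all agents
        preds = [h(shared) for h in self.heads]   # per-agent landmark estimates
        consensus = torch.stack(preds).mean(0)    # simple aggregation of agent outputs
        return preds, consensus

net = MultiAgentRegistrationNet()
_, consensus = net(torch.randn(1, 1, 32, 32, 32))  # toy 3D pelvic volume
print(consensus.shape)                              # torch.Size([1, 3])
```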

2.
Sensors (Basel) ; 24(17)2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39275567

ABSTRACT

The platooning of cars and trucks is a pertinent approach for autonomous driving because it makes effective use of roadways, and the reduced fuel consumption is an added merit from a sustainability standpoint. Conventional platooning relied on Dedicated Short-Range Communication (DSRC)-based vehicle-to-vehicle communication, with computations executed by the platoon members using their constrained capabilities. The advent of 5G has enabled Intelligent Transportation Systems (ITS) to adopt Multi-access Edge Computing (MEC) in platooning paradigms by offloading computational tasks to the edge server. In this research, vital aspects of vehicular platooning systems, namely latency-sensitive radio resource management schemes and the Age of Information (AoI), are investigated. In addition, the delivery rates of Cooperative Awareness Messages (CAMs), which ensure expeditious reception of safety-critical messages at the roadside units (RSUs), are examined. For latency-sensitive applications such as vehicular networks, multiple correlated objectives must be addressed simultaneously, and the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) framework requires a more sophisticated model to solve them effectively. In this paper, a novel Cascaded MADDPG framework, CMADDPG, is proposed to train cascaded target critics, aiming to achieve the expected rewards through the collaborative conduct of agents. The estimation bias phenomenon, which hinders overall system performance, is circumvented by this cascaded algorithm. Experimental analysis demonstrates the potential of the proposed algorithm: the convergence factor stabilizes quickly with minimal distortion, CAMs are disseminated reliably with 99% probability, and the average AoI is maintained within the 5-10 ms range, guaranteeing better QoS. The technique proves robust in decentralized resource allocation against channel uncertainties caused by high mobility in the environment. Most importantly, the performance of the proposed algorithm remains unaffected by increasing platoon size and growing channel uncertainties.
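To make the cascaded-critic idea concrete, here is a hedged sketch of how target critics can be combined conservatively to damp estimation bias in a MADDPG-style temporal-difference target. The two-critic minimum below is an assumption standing in for the paper's cascade, and all dimensions are illustrative.

```python
# Sketch: conservative TD target from multiple target critics (assumed form).
import torch
import torch.nn as nn

critic_a = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
critic_b = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def cascaded_td_target(reward, next_joint_input, gamma=0.99):
    """reward: (B, 1); next_joint_input: joint next state-action features (B, 8)."""
    with torch.no_grad():
        q_a = critic_a(next_joint_input)
        q_b = critic_b(next_joint_input)
        q_next = torch.min(q_a, q_b)      # conservative combination to curb overestimation
        return reward + gamma * q_next

batch = torch.randn(32, 8)
reward = torch.randn(32, 1)
print(cascaded_td_target(reward, batch).shape)   # torch.Size([32, 1])
```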

3.
Neural Netw ; 179: 106552, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39089154

ABSTRACT

Multi-agent reinforcement learning (MARL) can substantially improve the learning speed of agents in sparse-reward tasks with the guidance of subgoals. However, existing works sever the consistency of the learning objectives between the subgoal-generation and subgoal-reaching stages, thereby significantly inhibiting the effectiveness of subgoal learning. To address this problem, we propose a novel Potential field Subgoal-based Multi-Agent reinforcement learning (PSMA) method, which introduces the potential field (PF) to unify the two-stage learning objectives. Specifically, we design a state-to-PF representation model that describes agents' states as potential fields, allowing easy measurement of the interaction effect for both allied and enemy agents. With the PF representation, a subgoal selector is designed to automatically generate multiple subgoals for each agent, drawn from the experience replay buffer that contains both individual and total PF values. Based on the selected subgoals, we define an intrinsic reward function that guides each agent to reach its respective subgoal while maximizing the joint action-value. Experimental results show that our method outperforms state-of-the-art MARL methods on both StarCraft II micro-management (SMAC) and Google Research Football (GRF) tasks with sparse reward settings.


Subject(s)
Reinforcement, Psychology; Reward; Neural Networks, Computer; Humans; Algorithms; Machine Learning
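A minimal sketch of the potential-field idea from the abstract above, assuming Gaussian attractive/repulsive potentials for allied and enemy agents and a distance-style intrinsic reward toward a subgoal's PF value; the functional forms and coefficients are assumptions rather than the PSMA model.

```python
# Sketch: potential-field state value and a subgoal-tracking intrinsic reward.
import numpy as np

def potential_field(agent_pos, ally_pos, enemy_pos, sigma=2.0):
    """Scalar PF value at agent_pos: allies attract, enemies repel (assumed form)."""
    def gauss(p, q):
        return np.exp(-np.sum((p - q) ** 2) / (2 * sigma ** 2))
    attract = sum(gauss(agent_pos, a) for a in ally_pos)
    repel = sum(gauss(agent_pos, e) for e in enemy_pos)
    return attract - repel

def intrinsic_reward(pf_now, pf_subgoal, scale=1.0):
    """Reward progress of the agent's PF value toward the selected subgoal PF."""
    return -scale * abs(pf_subgoal - pf_now)

agent = np.array([0.0, 0.0])
allies = [np.array([1.0, 0.5]), np.array([-0.5, 1.0])]
enemies = [np.array([3.0, 3.0])]
pf = potential_field(agent, allies, enemies)
print(pf, intrinsic_reward(pf, pf_subgoal=1.2))
```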
4.
Neural Netw ; 179: 106565, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39111159

ABSTRACT

In cooperative multi-agent reinforcement learning, agents jointly optimize a centralized value function based on the rewards shared by all agents and learn decentralized policies through value function decomposition. Although such a learning framework is considered effective, estimating individual contributions from the shared rewards, which is essential for learning highly cooperative behaviors, is difficult. The problem becomes more challenging when reinforcement and punishment, which respectively increase and decrease specific behaviors of agents, coexist, because maximizing reinforcement and minimizing punishment often conflict in practice. This study proposes a novel exploration scheme called multi-agent decomposed reward-based exploration (MuDE), which preferentially explores the action spaces associated with positive sub-rewards based on a modified reward decomposition scheme, thus effectively reaching action spaces not covered by existing exploration schemes. We evaluate MuDE on a challenging set of StarCraft II micromanagement and modified predator-prey tasks extended to include both reinforcement and punishment. The results show that MuDE accurately estimates sub-rewards and outperforms state-of-the-art approaches in both convergence speed and win rate.


Subject(s)
Punishment; Reinforcement, Psychology; Reward; Neural Networks, Computer; Cooperative Behavior; Humans; Algorithms
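The decomposition-driven exploration described above can be illustrated with a toy split of the scalar reward into reinforcement and punishment parts, plus a softmax bias toward actions whose estimated positive sub-reward is high. MuDE learns its decomposition; the fixed split and temperature below are assumptions.

```python
# Sketch: reward decomposition and positive-sub-reward-biased exploration.
import numpy as np

def decompose(reward):
    """Split a scalar reward into (reinforcement, punishment) parts (assumed split)."""
    return max(reward, 0.0), min(reward, 0.0)

def exploration_probs(q_pos, temperature=1.0):
    """Softmax over estimated positive sub-rewards per action."""
    z = np.array(q_pos) / temperature
    z -= z.max()                       # numerical stability
    e = np.exp(z)
    return e / e.sum()

print(decompose(2.5), decompose(-1.0))
print(exploration_probs([0.2, 1.5, -0.3]).round(3))
```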
5.
Artif Intell Med ; 156: 102945, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39178622

ABSTRACT

In the formulation of strategies for walking rehabilitation, precise identification of the current state and rational prediction of the future state are crucial but often unrealized. To tackle this challenge, our study introduces a unified framework that integrates a novel 3D walking motion capture method using multi-source image fusion and a walking rehabilitation simulation approach based on multi-agent reinforcement learning. We found that (i) the proposed approach achieves accurate 3D walking motion capture and outperforms other advanced methods. Compared with similar visual skeleton tracking methods, it yields results with higher Pearson correlation (r=0.93), a higher intra-class correlation coefficient (ICC(2,1)=0.91), and narrower confidence intervals ([0.90,0.95] for r, [0.88,0.94] for ICC(2,1)) against reference results. Its outcomes also show commendable correlation and agreement with the IMU-based skeleton tracking method in the assessment of gait parameters ([0.85,0.89] for r, [0.75,0.81] for ICC(2,1)); (ii) multi-agent reinforcement learning has the potential to solve the simulation task of gait rehabilitation. In the mimicry experiment, our simulation method for gait rehabilitation not only enables the intelligent agent to converge from the initial state to the target state, but also reproduces evolutionary patterns similar to those observed in clinical practice through motor state resolution. This study offers valuable contributions to walking rehabilitation, enabling precise assessment and simulation-based interventions, with potential implications for clinical practice and patient outcomes.


Subject(s)
Gait; Walking; Humans; Walking/physiology; Gait/physiology; Computer Simulation; Reinforcement, Psychology; Imaging, Three-Dimensional/methods; Machine Learning
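The agreement metrics reported above (Pearson r and ICC(2,1)) can be computed as follows. The sketch uses the Shrout and Fleiss two-way random-effects, single-measurement, absolute-agreement formulation; the toy data are invented purely to exercise the formulas.

```python
# Sketch: Pearson r and ICC(2,1) for method-agreement analysis.
import numpy as np

def icc_2_1(ratings):
    """ratings: (n_subjects, k_raters), e.g. one gait parameter per subject per method."""
    n, k = ratings.shape
    grand = ratings.mean()
    ss_subj = k * np.sum((ratings.mean(axis=1) - grand) ** 2)
    ss_rater = n * np.sum((ratings.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((ratings - grand) ** 2) - ss_subj - ss_rater
    bms = ss_subj / (n - 1)                 # between-subjects mean square
    jms = ss_rater / (k - 1)                # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

rng = np.random.default_rng(0)
truth = rng.normal(1.2, 0.2, size=30)               # invented reference stride lengths (m)
estimate = truth + rng.normal(0.0, 0.05, size=30)   # invented estimates from a tracking method
r = np.corrcoef(truth, estimate)[0, 1]
print(round(r, 3), round(icc_2_1(np.column_stack([truth, estimate])), 3))
```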
6.
Sensors (Basel) ; 24(16)2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39204838

ABSTRACT

Device-to-device (D2D) is a pivotal technology in the next generation of communication, allowing for direct task offloading between mobile devices (MDs) to improve the efficient utilization of idle resources. This paper proposes a novel algorithm for dynamic task offloading between the active MDs and the idle MDs in a D2D-MEC (mobile edge computing) system by deploying multi-agent deep reinforcement learning (DRL) to minimize the long-term average delay of delay-sensitive tasks under deadline constraints. Our core innovation is a dynamic partitioning scheme for idle and active devices in the D2D-MEC system, accounting for stochastic task arrivals and multi-time-slot task execution, which has been insufficiently explored in the existing literature. We adopt a queue-based system to formulate a dynamic task offloading optimization problem. To address the challenges of large action space and the coupling of actions across time slots, we model the problem as a Markov decision process (MDP) and perform multi-agent DRL through multi-agent proximal policy optimization (MAPPO). We employ a centralized training with decentralized execution (CTDE) framework to enable each MD to make offloading decisions solely based on its local system state. Extensive simulations demonstrate the efficiency and fast convergence of our algorithm. In comparison to the existing sub-optimal results deploying single-agent DRL, our algorithm reduces the average task completion delay by 11.0% and the ratio of dropped tasks by 17.0%. Our proposed algorithm is particularly pertinent to sensor networks, where mobile devices equipped with sensors generate a substantial volume of data that requires timely processing to ensure quality of experience (QoE) and meet the service-level agreements (SLAs) of delay-sensitive applications.
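A skeletal view of the centralized-training, decentralized-execution split used by the MAPPO-based scheme: per-device actors act on local state only, while a centralized critic scores the joint state during training. Dimensions and the three-way action set (process locally, offload to an idle peer via D2D, offload to the edge) are illustrative assumptions.

```python
# Sketch: CTDE skeleton with decentralized actors and a centralized critic.
import torch
import torch.nn as nn
from torch.distributions import Categorical

N_DEVICES, LOCAL_DIM, N_ACTIONS = 4, 6, 3   # actions: local, D2D peer, edge (assumed)

actors = nn.ModuleList(
    nn.Sequential(nn.Linear(LOCAL_DIM, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))
    for _ in range(N_DEVICES)
)
critic = nn.Sequential(nn.Linear(N_DEVICES * LOCAL_DIM, 128), nn.Tanh(), nn.Linear(128, 1))

local_states = torch.randn(N_DEVICES, LOCAL_DIM)    # e.g. queue length, channel, energy
# Decentralized execution: each actor uses only its own observation.
dists = [Categorical(logits=a(s)) for a, s in zip(actors, local_states)]
actions = torch.stack([d.sample() for d in dists])
# Centralized training: the critic evaluates the concatenated global state.
value = critic(local_states.flatten())
print(actions.tolist(), value.item())
```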

7.
Neural Netw ; 178: 106544, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39053197

ABSTRACT

In multi-agent partially observable sequential decision problems with general-sum rewards, it is necessary to account for the egoism (individual rewards), utilitarianism (social welfare), and egalitarianism (fairness) criteria simultaneously. However, achieving a balance between these criteria poses a challenge for current multi-agent reinforcement learning methods. Specifically, fully decentralized methods without global information of all agents' rewards, observations and actions fail to learn a balanced policy, while agents in centralized training (with decentralized execution) methods are reluctant to share private information due to concerns of exploitation by others. To address these issues, this paper proposes a Decentralized and Federated (D&F) paradigm, where decentralized agents train egoistic policies utilizing solely local information to attain self-interest, and the federation controller primarily considers utilitarianism and egalitarianism. Meanwhile, the parameters of decentralized and federated policies are optimized with discrepancy constraints mutually, akin to a server and client pattern, which ensures the balance between egoism, utilitarianism, and egalitarianism. Furthermore, theoretical evidence demonstrates that the federated model, as well as the discrepancy between decentralized egoistic policies and federated utilitarian policies, obtains an O(1/T) convergence rate. Extensive experiments show that our D&F approach outperforms multiple baselines, in terms of both utilitarianism and egalitarianism.


Subject(s)
Reinforcement, Psychology; Humans; Reward; Decision Making/physiology
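The server/client interplay described above, in which decentralized egoistic updates are tied to a federated utilitarian model through a discrepancy constraint, resembles a proximal federated update. The sketch below uses a FedProx-style penalty and plain averaging as assumptions; the paper's objectives and constraints are more elaborate.

```python
# Sketch: decentralized egoistic updates pulled toward a federated model.
import numpy as np

def local_update(theta_local, theta_fed, grad_egoistic, lr=0.1, mu=0.5):
    """Gradient step on the agent's own objective plus a discrepancy penalty."""
    return theta_local - lr * (grad_egoistic + mu * (theta_local - theta_fed))

def federated_update(thetas):
    """Utilitarian aggregation of the decentralized parameters (simple mean)."""
    return np.mean(thetas, axis=0)

rng = np.random.default_rng(0)
theta_fed = np.zeros(4)
locals_ = [rng.normal(size=4) for _ in range(3)]
for _ in range(10):
    locals_ = [local_update(t, theta_fed, grad_egoistic=rng.normal(size=4))
               for t in locals_]
    theta_fed = federated_update(locals_)
print(theta_fed.round(3))
```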
8.
Neural Netw ; 179: 106547, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39068677

ABSTRACT

Centralized Training with Decentralized Execution (CTDE) is a prevalent paradigm in fully cooperative Multi-Agent Reinforcement Learning (MARL). Existing algorithms often encounter two major problems: independent strategies tend to underestimate the potential value of actions, leading to convergence to sub-optimal Nash Equilibria (NE), and some communication paradigms add complexity to the learning process, making it harder to focus on the essential elements of the messages. To address these challenges, we propose a novel method called Optimistic Sequential Soft Actor Critic with Motivational Communication (OSSMC). The key idea of OSSMC is to use a greedy-driven approach to explore the potential value of individual policies, yielding optimistic Q-values that serve as an upper bound for the Q-value of the current policy. We then integrate a sequential update mechanism with optimistic Q-values for agents, aiming to ensure monotonic improvement in the joint policy optimization process. Moreover, we establish a motivational communication module for each agent to disseminate motivational messages that promote cooperative behaviors. Finally, we employ a value regularization strategy from the Soft Actor Critic (SAC) method to maximize entropy and improve exploration. OSSMC was rigorously evaluated against a series of challenging benchmarks. Empirical results demonstrate that OSSMC not only surpasses current baseline algorithms but also converges more rapidly.


Subject(s)
Algorithms; Motivation; Reinforcement, Psychology; Communication; Humans; Neural Networks, Computer; Cooperative Behavior
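A hedged sketch of two ingredients named above: an "optimistic" Q-value acting as an upper bound on the current policy's value, here approximated by an ensemble mean plus a scaled standard deviation, and a SAC-style entropy-regularized objective. The exact OSSMC construction may differ; the coefficients are assumptions.

```python
# Sketch: optimistic Q-values and an entropy-regularized (SAC-style) objective.
import torch

def optimistic_q(ensemble_q, beta=1.0):
    """ensemble_q: (K, B) Q-estimates from K critics; mean + beta*std as an optimistic bound."""
    return ensemble_q.mean(0) + beta * ensemble_q.std(0)

def soft_objective(q_values, log_probs, alpha=0.2):
    """Maximize Q plus policy entropy (-alpha * log pi); returned as a scalar to maximize."""
    return (q_values - alpha * log_probs).mean()

q = torch.randn(5, 32)                    # 5 critics, batch of 32 state-action pairs
logp = torch.randn(32).clamp(max=0.0)     # toy log-probabilities
print(soft_objective(optimistic_q(q), logp))
```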
9.
Neural Netw ; 178: 106432, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38901092

ABSTRACT

In the realm of fully cooperative multi-agent reinforcement learning (MARL), effective communication can induce implicit cooperation among agents and improve overall performance. In current communication strategies, agents are allowed to exchange local observations or latent embeddings, which can augment individual local policy inputs and mitigate uncertainty in local decision-making processes. Unfortunately, under previous communication schemes, agents may receive irrelevant information, which increases training difficulty and leads to poor performance in complex settings. Furthermore, most existing works do not consider the impact of small coalitions formed by agents within the multi-agent system. To address these challenges, we propose HyperComm, a novel framework that uses a hypergraph to model the multi-agent system, improving the accuracy and specificity of communication among agents. This approach is the first to bring the concept of hypergraphs to multi-agent communication in MARL. Within this framework, each agent communicates more effectively with the other agents in the same hyperedge, leading to better cooperation in environments with many agents. Compared with state-of-the-art communication-based approaches, HyperComm demonstrates remarkable performance in scenarios involving large numbers of agents.


Subject(s)
Communication; Reinforcement, Psychology; Humans; Decision Making/physiology; Neural Networks, Computer; Computer Simulation; Algorithms
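Hyperedge-restricted message passing can be sketched with a binary incidence matrix: each agent aggregates features only from agents that share a hyperedge with it. This generic aggregation is an assumption for illustration, not the exact HyperComm operator.

```python
# Sketch: mean message passing restricted to hyperedge neighbours.
import numpy as np

H = np.array([[1, 0],      # agent 0 in hyperedge 0
              [1, 0],      # agent 1 in hyperedge 0
              [1, 1],      # agent 2 in both hyperedges
              [0, 1]])     # agent 3 in hyperedge 1
features = np.random.randn(4, 8)             # per-agent latent embeddings

adjacency = (H @ H.T > 0).astype(float)      # agents linked via a shared hyperedge
np.fill_diagonal(adjacency, 0.0)             # exclude self-messages
degree = adjacency.sum(1, keepdims=True).clip(min=1.0)
messages = adjacency @ features / degree     # mean over hyperedge neighbours
print(messages.shape)                        # (4, 8)
```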
10.
Sensors (Basel) ; 24(11)2024 May 27.
Article in English | MEDLINE | ID: mdl-38894236

ABSTRACT

Frequency agility refers to the rapid variation of the carrier frequency of adjacent pulses, which is an effective radar active antijamming method against frequency spot jamming. Variation patterns of traditional pseudo-random frequency hopping methods are susceptible to analysis and decryption, rendering them ineffective against increasingly sophisticated jamming strategies. Although existing reinforcement learning-based methods can adaptively optimize frequency hopping strategies, they are limited in adapting to the diversity and dynamics of jamming strategies, resulting in poor performance in the face of complex unknown jamming strategies. This paper proposes an AK-MADDPG (Adaptive K-th order history-based Multi-Agent Deep Deterministic Policy Gradient) method for designing frequency hopping strategies in frequency agile radar. Signal pulses within a coherent processing interval are treated as agents, learning to optimize their hopping strategies in the case of unknown jamming strategies. Agents dynamically adjust their carrier frequencies to evade jamming and collaborate with others to enhance antijamming efficacy. This approach exploits cooperative relationships among the pulses, providing additional information for optimized frequency hopping strategies. In addition, an adaptive K-th order history method has been introduced into the algorithm to capture long-term dependencies in sequential data. Simulation results demonstrate the superior performance of the proposed method.
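The K-th order history idea amounts to conditioning each pulse-agent's hopping decision on its last K observations. A minimal buffer sketch follows; K, the observation layout, and the deque-based implementation are illustrative assumptions.

```python
# Sketch: K-th order observation history as the actor input of a pulse-agent.
from collections import deque
import numpy as np

K, OBS_DIM, N_FREQS = 4, 8, 16

class HistoryBuffer:
    """Keeps the last K observations and exposes them as one flat vector."""
    def __init__(self, k=K, obs_dim=OBS_DIM):
        self.buf = deque([np.zeros(obs_dim)] * k, maxlen=k)

    def push(self, obs):
        self.buf.append(obs)

    def state(self):
        return np.concatenate(self.buf)     # shape (K * OBS_DIM,)

hist = HistoryBuffer()
for _ in range(6):                          # stream of sensed spectrum observations
    hist.push(np.random.rand(OBS_DIM))
actor_input = hist.state()                  # a policy net would map this to one of N_FREQS carriers
print(actor_input.shape)                    # (32,)
```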

11.
Front Robot AI ; 11: 1229026, 2024.
Article in English | MEDLINE | ID: mdl-38690119

ABSTRACT

Introduction: Multi-agent systems are an interdisciplinary research field concerned with multiple decision-making individuals interacting with a usually partially observable environment. Given recent advances in single-agent reinforcement learning (RL), multi-agent RL (MARL) has gained tremendous interest in recent years. Most research studies apply a fully centralized learning scheme to ease the transfer from the single-agent domain to multi-agent systems. Methods: In contrast, we claim that a decentralized learning scheme is preferable for applications in real-world scenarios, as this allows deploying a learning algorithm on an individual robot rather than on a complete fleet of robots. Therefore, this article outlines a novel actor-critic (AC) approach tailored to cooperative MARL problems in sparsely rewarded domains. Our approach decouples the MARL problem into a set of distributed agents that model the other agents as responsive entities. In particular, we propose using two separate critics per agent to distinguish between the joint task reward and agent-based costs, as commonly applied within multi-robot planning. On the one hand, the agent-based critic intends to decrease agent-specific costs; on the other hand, each agent intends to optimize the joint team reward based on the joint task critic. As this critic still depends on the joint action of all agents, we outline two suitable behavior models based on Stackelberg games: a game against nature and a dyadic game against each agent. Following these behavior models, our algorithm allows fully decentralized execution and training. Results and Discussion: We evaluate the presented method using the proposed behavior models within a sparsely rewarded simulated multi-agent environment. Although our approach already outperforms state-of-the-art learners, we conclude this article by outlining possible extensions of our algorithm that future research may build upon.
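The two-critic arrangement described above, one critic for the joint task reward and one for agent-specific costs, can be sketched as a weighted actor objective. Network sizes, the stand-in for the modeled behaviour of the other agents, and the cost weight are assumptions.

```python
# Sketch: actor objective combining a joint task critic and an agent-cost critic.
import torch
import torch.nn as nn

obs_dim, act_dim, others_dim = 6, 2, 16
task_critic = nn.Sequential(nn.Linear(others_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
cost_critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

obs = torch.randn(32, obs_dim)
others = torch.randn(32, others_dim)          # stands in for the modeled responses of other agents
action = torch.tanh(actor(obs))
actor_loss = -(task_critic(torch.cat([others, action], dim=-1)).mean()
               - 0.5 * cost_critic(torch.cat([obs, action], dim=-1)).mean())
actor_loss.backward()                         # maximize task value, minimize agent cost
print(actor_loss.item())
```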

12.
Elife ; 13: 2024 May 07.
Article in English | MEDLINE | ID: mdl-38711355

ABSTRACT

Collaborative hunting, in which predators play different and complementary roles to capture prey, has been traditionally believed to be an advanced hunting strategy requiring large brains that involve high-level cognition. However, recent findings that collaborative hunting has also been documented in smaller-brained vertebrates have placed this previous belief under strain. Here, using computational multi-agent simulations based on deep reinforcement learning, we demonstrate that decisions underlying collaborative hunts do not necessarily rely on sophisticated cognitive processes. We found that apparently elaborate coordination can be achieved through a relatively simple decision process of mapping between states and actions related to distance-dependent internal representations formed by prior experience. Furthermore, we confirmed that this decision rule of predators is robust against unknown prey controlled by humans. Our computational ecological results emphasize that collaborative hunting can emerge in various intra- and inter-specific interactions in nature, and provide insights into the evolution of sociality.


From wolves to ants, many animals are known to hunt as a team. This strategy can yield several advantages: going after bigger prey together, for example, often lets individuals spend less energy and access larger food portions than hunting alone. However, it remains unclear whether this behavior relies on complex cognitive processes, such as the ability of an animal to represent and anticipate the actions of its teammates. It is often thought that 'collaborative hunting' may require such skills, as this form of group hunting involves animals taking on distinct, tightly coordinated roles, as opposed to simply engaging in the same actions simultaneously. To better understand whether high-level cognitive skills are required for collaborative hunting, Tsutsui et al. used a type of artificial intelligence known as deep reinforcement learning. This allowed them to develop a computational model in which a small number of 'agents' had the opportunity to 'learn' whether and how to work together to catch a 'prey' under various conditions. The agents were only equipped with the ability to link distinct stimuli together, such as an event and a reward; this is similar to associative learning, a cognitive process that is widespread among animal species. The model showed that the difficulty of capturing the prey when hunting alone and the reward of sharing food after a successful hunt drove the agents to learn how to work together, with previous experiences shaping decisions made during subsequent hunts. Importantly, the predators started to exhibit the ability to take on distinct, complementary roles reminiscent of those observed during collaborative hunting, such as one agent chasing the prey while another ambushes it. Overall, the work by Tsutsui et al. challenges the traditional view that only organisms equipped with high-level cognitive processes can show refined collaborative approaches to hunting, opening the possibility that these behaviors are more widespread than originally thought, including between animals of different species.


Subject(s)
Deep Learning; Predatory Behavior; Reinforcement, Psychology; Animals; Cooperative Behavior; Humans; Computer Simulation; Decision Making
13.
Sensors (Basel) ; 24(9)2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38732885

ABSTRACT

Delay-sensitive task offloading in a device-to-device assisted mobile edge computing (D2D-MEC) system with energy harvesting devices is a critical challenge due to the dynamic load level at edge nodes and the variability in harvested energy. In this paper, we propose a joint dynamic task offloading and CPU frequency control scheme for delay-sensitive tasks in a D2D-MEC system, taking into account the intricacies of multi-slot tasks, characterized by diverse processing speeds and data transmission rates. Our methodology involves meticulous modeling of task arrival and service processes using queuing systems, coupled with the strategic utilization of D2D communication to alleviate edge server load and prevent network congestion effectively. Central to our solution is the formulation of average task delay optimization as a challenging nonlinear integer programming problem, requiring intelligent decision making regarding task offloading for each generated task at active mobile devices and CPU frequency adjustments at discrete time slots. To navigate the intricate landscape of the extensive discrete action space, we design an efficient multi-agent DRL learning algorithm named MAOC, which is based on MAPPO, to minimize the average task delay by dynamically determining task-offloading decisions and CPU frequencies. MAOC operates within a centralized training with decentralized execution (CTDE) framework, empowering individual mobile devices to make decisions autonomously based on their unique system states. Experimental results demonstrate its swift convergence and operational efficiency, and it outperforms other baseline algorithms.
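The queueing view underlying the delay objective can be illustrated with a toy discrete-time simulation: tasks arrive stochastically, each needs several slots of CPU work at the chosen frequency, and delay is counted from arrival to completion. The arrival probability, task size, and frequency value are invented for illustration and are not the paper's parameters.

```python
# Sketch: multi-slot task execution in a discrete-time queue with a CPU-frequency knob.
import random

random.seed(1)
SLOTS, ARRIVAL_P, CYCLES_PER_TASK = 200, 0.3, 6
freq = 2                                  # cycles processed per slot (the control variable)

queue, delays = [], []                    # queue holds [remaining_cycles, arrival_slot]
for t in range(SLOTS):
    if random.random() < ARRIVAL_P:      # stochastic task arrival this slot
        queue.append([CYCLES_PER_TASK, t])
    budget = freq
    while queue and budget > 0:           # head-of-line processing, possibly across slots
        work = min(budget, queue[0][0])
        queue[0][0] -= work
        budget -= work
        if queue[0][0] == 0:
            delays.append(t - queue[0][1] + 1)
            queue.pop(0)

print(sum(delays) / len(delays))          # average completion delay in slots
```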

14.
Neural Netw ; 172: 106149, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38306786

ABSTRACT

In this study, a novel exploration method for centralized training and decentralized execution (CTDE)-based multi-agent reinforcement learning (MARL) is introduced. The method uses the concept of strangeness, determined by evaluating (1) the level of unfamiliarity of the observations an agent encounters and (2) the level of unfamiliarity of the entire state the agents visit. An exploration bonus derived from this strangeness is combined with the extrinsic reward obtained from the environment to form a mixed reward, which is then used for training CTDE-based MARL algorithms. Additionally, a separate action-value function is proposed to prevent the high exploration bonus from overwhelming sensitivity to extrinsic rewards during MARL training; this separate function is used to design the behavioral policy for generating transitions. The proposed method is little affected by the stochastic transitions commonly observed in MARL tasks and improves the stability of CTDE-based MARL algorithms when used with an exploration method. Through didactic examples and a demonstration of substantial performance improvements in CTDE-based MARL algorithms, we illustrate the advantages of our approach. These evaluations show that our method outperforms state-of-the-art MARL baselines on challenging tasks within the StarCraft II micromanagement benchmark, underscoring its effectiveness in improving MARL.


Subject(s)
Learning; Reinforcement, Psychology; Reward; Algorithms; Benchmarking
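One way to realize a strangeness-style bonus is to score observations by the reconstruction error of a small autoencoder and mix that bonus into the extrinsic reward. The paper's actual strangeness measure and mixing weight are not specified here, so the sketch below is an assumption.

```python
# Sketch: reconstruction-error "strangeness" bonus mixed with the extrinsic reward.
import torch
import torch.nn as nn

obs_dim = 16
autoencoder = nn.Sequential(nn.Linear(obs_dim, 8), nn.ReLU(), nn.Linear(8, obs_dim))

def strangeness(obs):
    """High for observations the autoencoder has not learned to reconstruct."""
    with torch.no_grad():
        return ((autoencoder(obs) - obs) ** 2).mean(dim=-1)

def mixed_reward(extrinsic, obs, weight=0.1):
    return extrinsic + weight * strangeness(obs)

obs = torch.randn(32, obs_dim)
r_ext = torch.randn(32)
print(mixed_reward(r_ext, obs).shape)     # torch.Size([32])
```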
15.
Sensors (Basel) ; 24(2)2024 Jan 14.
Article in English | MEDLINE | ID: mdl-38257602

ABSTRACT

As a promising paradigm, mobile crowdsensing (MCS) takes advantage of sensing abilities and cooperates with multi-agent reinforcement learning technologies to provide services for users in large sensing areas, such as smart transportation, environment monitoring, etc. In most cases, strategy training for multi-agent reinforcement learning requires substantial interaction with the sensing environment, which results in unaffordable costs. Thus, environment reconstruction via extraction of the causal effect model from past data is an effective way to smoothly accomplish environment monitoring. However, the sensing environment is often so complex that the observable and unobservable data collected are sparse and heterogeneous, affecting the accuracy of the reconstruction. In this paper, we focus on developing a robust multi-agent environment monitoring framework, called self-interested coalitional crowdsensing for multi-agent interactive environment monitoring (SCC-MIE), including environment reconstruction and worker selection. In SCC-MIE, we start from a multi-agent generative adversarial imitation learning framework to introduce a new self-interested coalitional learning strategy, which forges cooperation between a reconstructor and a discriminator to learn the sensing environment together with the hidden confounder while providing interpretability on the results of environment monitoring. Based on this, we utilize the secretary problem to select suitable workers to collect data for accurate environment monitoring in a real-time manner. It is shown that SCC-MIE realizes a significant performance improvement in environment monitoring compared to the existing models.
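Worker selection via the secretary problem follows the classic observe-then-commit rule: pass on roughly the first n/e candidates, then hire the first one who beats the best seen so far. Treating worker quality as a single scalar stream is a simplification of the SCC-MIE setting.

```python
# Sketch: 1/e-rule secretary selection over a stream of worker qualities.
import math
import random

def select_worker(qualities):
    n = len(qualities)
    cutoff = max(1, int(n / math.e))          # observation phase length
    best_seen = max(qualities[:cutoff])
    for q in qualities[cutoff:]:
        if q > best_seen:                     # first candidate beating the benchmark
            return q
    return qualities[-1]                      # fall back to the last candidate

random.seed(0)
stream = [random.random() for _ in range(50)]
print(select_worker(stream), max(stream))
```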

16.
Neural Netw ; 170: 610-621, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38056408

ABSTRACT

Multi-agent reinforcement learning (MARL) algorithms based on trust regions (TR) have achieved significant success in numerous cooperative multi-agent tasks. These algorithms restrain the Kullback-Leibler (KL) divergence (i.e., TR constraint) between the current and new policies to avoid aggressive update steps and improve learning performance. However, the majority of existing TR-based MARL algorithms are on-policy, meaning that they require new data sampled by current policies for training and cannot utilize off-policy (or historical) data, leading to low sample efficiency. This study aims to enhance the data efficiency of TR-based learning methods. To achieve this, an approximation of the original objective function is designed. In addition, it is proven that as long as the update size of the policy (measured by the KL divergence) is restricted, optimizing the designed objective function using historical data can guarantee the monotonic improvement of the original target. Building on the designed objective, a practical off-policy multi-agent stochastic policy gradient algorithm is proposed within the framework of centralized training with decentralized execution (CTDE). Additionally, policy entropy is integrated into the reward to promote exploration, and consequently, improve stability. Comprehensive experiments are conducted on a representative benchmark for multi-agent MuJoCo (MAMuJoCo), which offers a range of challenging tasks in cooperative continuous multi-agent control. The results demonstrate that the proposed algorithm outperforms all other existing algorithms by a significant margin.


Subject(s)
Algorithms; Learning; Benchmarking; Entropy; Policies
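The off-policy surrogate with a bounded update size can be sketched as an importance-weighted advantage plus a KL penalty between the new policy and the behaviour policy that generated the historical data. The penalty form and coefficient below are assumptions; the paper states the constraint abstractly.

```python
# Sketch: KL-penalized off-policy surrogate loss with importance weighting.
import torch

def surrogate_loss(logp_new, logp_old, advantage, kl_coef=0.5):
    ratio = torch.exp(logp_new - logp_old)     # pi_new / pi_behaviour
    approx_kl = (logp_old - logp_new).mean()   # sample-based KL estimate
    return -(ratio * advantage).mean() + kl_coef * approx_kl

logp_new = torch.randn(64, requires_grad=True)
logp_old = torch.randn(64)
adv = torch.randn(64)
loss = surrogate_loss(logp_new, logp_old, adv)
loss.backward()
print(loss.item())
```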
17.
Biomimetics (Basel) ; 8(8)2023 Dec 16.
Article in English | MEDLINE | ID: mdl-38132555

ABSTRACT

Achieving omnidirectional walking for bipedal robots is considered one of the most challenging tasks in robotics. Reinforcement learning (RL) methods have proved effective in bipedal walking tasks. However, most existing methods use state machines to switch between multiple policies to achieve an omnidirectional gait, which results in shaking during policy switches. To achieve a seamless transition between omnidirectional gait and transient motion for full-size bipedal robots, we propose a novel multi-agent RL method. Firstly, a multi-agent RL algorithm based on the actor-critic framework is designed, and policy entropy is introduced to improve exploration efficiency. By training agents with parallel initial state distributions, we minimize reliance on the effectiveness of the gait planner in the Robot Operating System (ROS). Additionally, we design a novel heterogeneous policy experience replay mechanism based on Euclidean distance. Secondly, considering the periodicity of bipedal walking, we develop a new periodic gait function; including periodic objectives in the policy accelerates the convergence of training. Finally, to enhance the robustness of the policy, we construct a novel curriculum learning method by discretizing a Gaussian distribution and incorporate it into the robot's training task. Our method is validated in a simulation environment, and the results show that it can achieve multiple gaits through a single policy network with smooth transitions between them.
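A periodic gait function can be as simple as phase-shifted sinusoidal joint references over the gait cycle, with a tracking term rewarding closeness to the reference. The frequency, amplitudes, and Gaussian reward below are illustrative assumptions, not the paper's function.

```python
# Sketch: periodic joint-angle reference and a Gaussian tracking reward.
import numpy as np

def gait_reference(t, period=1.0, amplitudes=(0.4, 0.6), phases=(0.0, np.pi)):
    """Target angles for (hip, knee) at time t within an assumed walking cycle."""
    w = 2.0 * np.pi / period
    return np.array([a * np.sin(w * t + p) for a, p in zip(amplitudes, phases)])

def tracking_reward(joint_angles, t, sigma=0.1):
    """Reward near 1 when the joints track the periodic reference closely."""
    err = joint_angles - gait_reference(t)
    return float(np.exp(-np.sum(err ** 2) / (2 * sigma ** 2)))

print(tracking_reward(np.array([0.1, -0.2]), t=0.25))
```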

18.
Front Neurorobot ; 17: 1243174, 2023.
Article in English | MEDLINE | ID: mdl-37811355

ABSTRACT

Unmanned Aerial Vehicles (UAVs) have gained popularity due to their low lifecycle cost and minimal human risk, resulting in their widespread use in recent years. In the UAV swarm cooperative decision domain, multi-agent deep reinforcement learning has significant potential. However, current approaches are challenged by the multivariate mission environment and mission time constraints. In light of this, the present study proposes a meta-learning based multi-agent deep reinforcement learning approach that provides a viable solution to this problem. This paper presents an improved MAML-based multi-agent deep deterministic policy gradient (MADDPG) algorithm that achieves an unbiased initialization network by automatically assigning weights to meta-learning trajectories. In addition, a Reward-TD prioritized experience replay technique is introduced, which takes into account immediate reward and TD-error to improve the resilience and sample utilization of the algorithm. Experiment results show that the proposed approach effectively accomplishes the task in the new scenario, with significantly improved task success rate, average reward, and robustness compared to existing methods.
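Reward-TD prioritized replay can be sketched by giving each stored transition a priority that grows with both its immediate reward and the magnitude of its TD-error, then sampling proportionally. The mixing weights and exponent are assumptions.

```python
# Sketch: sampling priorities combining immediate reward and TD-error.
import numpy as np

def priorities(rewards, td_errors, w_r=0.5, w_td=0.5, alpha=0.6, eps=1e-3):
    """Return a sampling distribution over buffer entries."""
    score = w_r * np.abs(rewards) + w_td * np.abs(td_errors) + eps
    p = score ** alpha
    return p / p.sum()

rewards = np.array([0.0, 1.0, 0.2, 5.0])
td_err = np.array([0.5, 0.1, 2.0, 0.3])
probs = priorities(rewards, td_err)
batch_idx = np.random.choice(len(rewards), size=2, p=probs, replace=False)
print(probs.round(3), batch_idx)
```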

19.
Top Cogn Sci ; 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37902444

ABSTRACT

Artificial intelligence (AI) is often used to predict human behavior, thus potentially posing limitations to individuals' and collectives' freedom to act. AI's most controversial and contested applications range from targeted advertisements to crime prevention, including the suppression of civil disorder. Scholars and civil society watchdogs are discussing the oppressive dangers of AI being used by centralized institutions, like governments or private corporations. Some suggest that AI gives asymmetrical power to governments, compared to their citizens. On the other hand, civil protests often rely on distributed networks of activists without centralized leadership or planning. Civil protests create an adversarial tension between centralized and decentralized intelligence, opening the question of how distributed human networks can collectively adapt and outperform a hostile centralized AI trying to anticipate and control their activities. This paper leverages multi-agent reinforcement learning to simulate dynamics within a human-machine hybrid society. We ask how decentralized intelligent agents can collectively adapt when competing with a centralized predictive algorithm, wherein prediction involves suppressing coordination. In particular, we investigate an adversarial game between a collective of individual learners and a central predictive algorithm, each trained through deep Q-learning. We compare different predictive architectures and showcase conditions in which the adversarial nature of this dynamic pushes each intelligence to increase its behavioral complexity to outperform its counterpart. We further show that a shared predictive algorithm drives decentralized agents to align their behavior. This work sheds light on the totalitarian danger posed by AI and provides evidence that decentrally organized humans can overcome its risks by developing increasingly complex coordination strategies.
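The adversarial structure of the game can be illustrated with a toy version: decentralized bandit-style Q-learners are rewarded for coordinating on a common action but penalized whenever a central predictor anticipates their choice. The paper trains both sides with deep Q-learning; the frequency-based predictor below is only a structural stand-in.

```python
# Toy sketch: decentralized learners coordinating against a central predictor.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_ACTIONS, EPISODES = 4, 3, 2000
Q = np.zeros((N_AGENTS, N_ACTIONS))
pred_counts = np.ones(N_ACTIONS)                    # predictor's statistics of past majorities

for _ in range(EPISODES):
    eps = 0.1
    acts = np.array([rng.integers(N_ACTIONS) if rng.random() < eps
                     else int(np.argmax(Q[i])) for i in range(N_AGENTS)])
    prediction = int(np.argmax(pred_counts))        # predictor anticipates the usual choice
    majority = np.bincount(acts, minlength=N_ACTIONS).argmax()
    for i in range(N_AGENTS):
        r = float(acts[i] == majority)              # coordination reward
        if acts[i] == prediction:
            r -= 1.0                                # suppressed when predicted
        Q[i, acts[i]] += 0.1 * (r - Q[i, acts[i]])  # bandit-style Q update
    pred_counts[majority] += 1

print(Q.round(2))
```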

20.
Cogn Sci ; 47(8): e13315, 2023 08.
Article in English | MEDLINE | ID: mdl-37555649

ABSTRACT

In developing artificial intelligence (AI), researchers often benchmark against human performance as a measure of progress. Is this kind of comparison possible for moral cognition? Given that human moral judgment often hinges on intangible properties like "intention" which may have no natural analog in artificial agents, it may prove difficult to design a "like-for-like" comparison between the moral behavior of artificial and human agents. What would a measure of moral behavior for both humans and AI look like? We unravel the complexity of this question by discussing examples within reinforcement learning and generative AI, and we examine how the puzzle of evaluating artificial agents' moral cognition remains open for further investigation within cognitive science.


Subject(s)
Artificial Intelligence; Cognition; Humans; Morals; Judgment; Learning