Results 1 - 20 of 79
1.
Sensors (Basel) ; 24(15)2024 Jul 27.
Article in English | MEDLINE | ID: mdl-39123922

ABSTRACT

Interest in deploying deep reinforcement learning (DRL) models on low-power edge devices, such as Autonomous Mobile Robots (AMRs) and Internet of Things (IoT) devices, has risen significantly because on-device, real-time inference eliminates the latency and reliability issues incurred by wireless communication and brings the privacy benefits of processing data locally. Deploying such energy-intensive models on power-constrained devices is not always feasible, however, which has led to the development of model compression techniques that can reduce the size and computational complexity of DRL policies. Policy distillation, the most popular of these methods, can be used to first lower the number of network parameters by transferring the behavior of a large teacher network to a smaller student model before deploying these students at the edge. This works well for deterministic policies that operate over discrete actions. However, many power-constrained real-world tasks, such as those in robotics, are formulated with continuous action spaces, which standard policy distillation does not support. In this work, we extend the policy distillation method to support the compression of DRL models designed to solve these continuous control tasks, with an emphasis on maintaining the stochastic nature of continuous DRL algorithms. Experiments show that our methods can be used to compress such policies by up to 750% while maintaining or even exceeding their teacher's performance by up to 41% on two popular continuous control tasks.
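
A minimal sketch, not the authors' code, of how distillation for a stochastic continuous-control policy might look: both teacher and student are assumed to output diagonal Gaussian action distributions, and the student minimizes the KL divergence to the teacher, which preserves the stochasticity the abstract emphasizes rather than matching only the mean action. Network shapes and the clamping range are illustrative assumptions.

    import torch
    import torch.nn as nn

    def gaussian_kl(mu_t, std_t, mu_s, std_s):
        """KL(teacher || student) for diagonal Gaussians, summed over action dims."""
        var_t, var_s = std_t.pow(2), std_s.pow(2)
        kl = torch.log(std_s / std_t) + (var_t + (mu_t - mu_s).pow(2)) / (2 * var_s) - 0.5
        return kl.sum(dim=-1).mean()

    class SmallGaussianPolicy(nn.Module):
        """Compact student network producing a mean and std per action dimension."""
        def __init__(self, obs_dim, act_dim, hidden=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.mu = nn.Linear(hidden, act_dim)
            self.log_std = nn.Linear(hidden, act_dim)

        def forward(self, obs):
            h = self.body(obs)
            return self.mu(h), self.log_std(h).clamp(-5, 2).exp()

    # One distillation step over states visited by the teacher (hypothetical names):
    # loss = gaussian_kl(teacher_mu, teacher_std, *student(states))
    # loss.backward(); optimizer.step()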

2.
PeerJ Comput Sci ; 10: e2161, 2024.
Article in English | MEDLINE | ID: mdl-38983226

ABSTRACT

In the dynamic field of deep reinforcement learning, the self-attention mechanism has been increasingly recognized. Nevertheless, its application in discrete problem domains has been relatively limited, presenting complex optimization challenges. This article introduces a pioneering deep reinforcement learning algorithm, termed Attention-based Actor-Critic with Priority Experience Replay (A2CPER). A2CPER combines the strengths of self-attention mechanisms with the Actor-Critic framework and prioritized experience replay to enhance policy formulation for discrete problems. The algorithm's architecture features dual networks within the Actor-Critic model: the Actor formulates action policies and the Critic evaluates state values to judge the quality of policies. The incorporation of target networks aids in stabilizing network optimization. Moreover, the addition of self-attention mechanisms bolsters the policy network's capability to focus on critical information, while priority experience replay promotes training stability and reduces correlation among training samples. Empirical experiments on discrete action problems validate A2CPER's adeptness at policy optimization, marking significant performance improvements across tasks. In summary, A2CPER highlights the viability of self-attention mechanisms in reinforcement learning, presenting a robust framework for discrete problem-solving and potential applicability in complex decision-making scenarios.
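
For readers unfamiliar with the prioritized experience replay component that A2CPER builds on, the following is an illustrative proportional-prioritization buffer, not the article's implementation; the hyperparameters alpha, beta, and eps are assumptions.

    import numpy as np

    class PrioritizedReplay:
        def __init__(self, capacity, alpha=0.6, eps=1e-6):
            self.capacity, self.alpha, self.eps = capacity, alpha, eps
            self.data, self.priorities, self.pos = [], [], 0

        def add(self, transition, td_error):
            p = (abs(td_error) + self.eps) ** self.alpha
            if len(self.data) < self.capacity:
                self.data.append(transition)
                self.priorities.append(p)
            else:
                self.data[self.pos] = transition
                self.priorities[self.pos] = p
            self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size, beta=0.4):
            probs = np.array(self.priorities)
            probs /= probs.sum()
            idx = np.random.choice(len(self.data), batch_size, p=probs)
            # Importance-sampling weights correct the bias from non-uniform sampling.
            weights = (len(self.data) * probs[idx]) ** (-beta)
            weights /= weights.max()
            return [self.data[i] for i in idx], idx, weights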

3.
Sci Rep ; 14(1): 14127, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38898063

ABSTRACT

Because conventional PID (Proportional-Integral-Derivative) controllers can hardly stabilize a robot for constant-force grinding under changing environmental conditions, a compensation term must be added to the conventional PID controller. An optimal parameter-finding algorithm based on SAC (Soft Actor-Critic) is proposed to address the difficulty of obtaining the compensation-term parameters; it includes state-action normalization preprocessing for training, reward function design, and a targeted deep neural network design. The algorithm finds the optimal controller compensation-term parameters, which are applied to the PID controller to complete the compensation through the robot's inverse kinematics and achieve constant-force grinding control. To verify the algorithm's feasibility, a simulation model of a grinding robot with force sensing is established, and the simulation results show that a controller trained with the algorithm can achieve constant-force grinding. Finally, an experimental platform for robot constant-force grinding is built for testing, which verifies the control effect of the optimal parameter-finding algorithm on robot constant-force grinding and demonstrates a degree of environmental adaptability.
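
The control structure described above can be pictured with a hedged sketch: a conventional PID force loop plus a compensation term whose gain is the kind of parameter an SAC-based search would tune offline. The specific compensation form and the numbers below are assumptions for illustration only.

    class CompensatedPID:
        def __init__(self, kp, ki, kd, comp_gain, dt):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.comp_gain = comp_gain          # parameter an SAC-based search would tune
            self.dt, self.integral, self.prev_err = dt, 0.0, 0.0

        def step(self, force_ref, force_meas, env_disturbance_estimate):
            err = force_ref - force_meas
            self.integral += err * self.dt
            deriv = (err - self.prev_err) / self.dt
            self.prev_err = err
            pid_out = self.kp * err + self.ki * self.integral + self.kd * deriv
            # Compensation term counteracts the estimated environmental variation.
            return pid_out + self.comp_gain * env_disturbance_estimate

    # Hypothetical usage:
    # ctrl = CompensatedPID(kp=2.0, ki=0.5, kd=0.05, comp_gain=0.8, dt=0.001)
    # u = ctrl.step(force_ref=20.0, force_meas=18.5, env_disturbance_estimate=0.3)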

4.
Math Biosci Eng ; 21(5): 6077-6096, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38872570

ABSTRACT

Due to the complexity of the driving environment and the dynamic behavior of traffic participants, self-driving in dense traffic flow is very challenging. Traditional methods usually rely on predefined rules, which are difficult to adapt to diverse driving scenarios. Deep reinforcement learning (DRL) shows advantages over rule-based methods in complex self-driving environments, demonstrating great potential for intelligent decision-making. However, one problem with DRL is exploration inefficiency: it typically requires a great deal of trial and error to learn the optimal policy, which slows learning and makes it difficult for the agent to learn well-performing decision-making policies in self-driving scenarios. Inspired by the strong performance of supervised learning in classification tasks, we propose a self-driving intelligent control method that combines human driving experience with an adaptive-sampling supervised actor-critic algorithm. Unlike traditional DRL, we modify the learning process of the policy network by combining supervised learning with DRL and adding human driving experience to the learning samples, so that human driving experience and real-time human guidance better steer the self-driving vehicle toward the optimal policy. To make the agent learn more efficiently, real-time human guidance is introduced into the learning process, and an adaptive balanced sampling method is designed to improve sampling performance. We also design the reward function in detail for different evaluation indexes, such as traffic efficiency, which further guides the agent to learn the self-driving intelligent control policy. The experimental results show that the method can control vehicles in complex traffic environments for self-driving tasks and performs better than other DRL methods.

5.
Front Robot AI ; 11: 1229026, 2024.
Article in English | MEDLINE | ID: mdl-38690119

ABSTRACT

Introduction: Multi-agent systems are an interdisciplinary research field that describes the concept of multiple decision-making individuals interacting with a usually partially observable environment. Given the recent advances in single-agent reinforcement learning, multi-agent reinforcement learning (MARL) has gained tremendous interest in recent years. Most research studies apply a fully centralized learning scheme to ease the transfer from the single-agent domain to multi-agent systems. Methods: In contrast, we claim that a decentralized learning scheme is preferable for applications in real-world scenarios, as it allows deploying a learning algorithm on an individual robot rather than on a complete fleet of robots. Therefore, this article outlines a novel actor-critic (AC) approach tailored to cooperative MARL problems in sparsely rewarded domains. Our approach decouples the MARL problem into a set of distributed agents that model the other agents as responsive entities. In particular, we propose using two separate critics per agent to distinguish between the joint task reward and agent-based costs, as commonly applied within multi-robot planning. On the one hand, the agent-based critic intends to decrease agent-specific costs. On the other hand, each agent intends to optimize the joint team reward based on the joint task critic. As this critic still depends on the joint action of all agents, we outline two suitable behavior models based on Stackelberg games: a game against nature and a dyadic game against each agent. Following these behavior models, our algorithm allows fully decentralized execution and training. Results and Discussion: We evaluate our presented method using the proposed behavior models within a sparsely rewarded simulated multi-agent environment. Although our approach already outperforms state-of-the-art learners, we conclude this article by outlining possible extensions of our algorithm that future research may build upon.
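
A hedged sketch of the dual-critic idea, assuming PyTorch and illustrative network sizes: each agent keeps one critic for the shared task reward and another for its own costs, and its actor update trades the two off. The cost_weight balance is an assumption, not the article's exact formulation.

    import torch
    import torch.nn as nn

    class DualCriticAgent(nn.Module):
        def __init__(self, obs_dim, act_dim, hidden=128, cost_weight=0.5):
            super().__init__()
            self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, act_dim), nn.Tanh())
            self.task_critic = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden),
                                             nn.ReLU(), nn.Linear(hidden, 1))
            self.cost_critic = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden),
                                             nn.ReLU(), nn.Linear(hidden, 1))
            self.cost_weight = cost_weight

        def actor_loss(self, obs):
            act = self.actor(obs)
            sa = torch.cat([obs, act], dim=-1)
            # Maximize the joint task value while penalizing agent-specific costs.
            return -(self.task_critic(sa) - self.cost_weight * self.cost_critic(sa)).mean()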

6.
Front Neurorobot ; 18: 1338189, 2024.
Article in English | MEDLINE | ID: mdl-38566892

ABSTRACT

In real-world scenarios, making navigation decisions for autonomous driving involves a sequential set of steps. These judgments are made based on partial observations of the environment, while the underlying model of the environment remains unknown. A prevalent method for resolving such issues is reinforcement learning, in which the agent acquires knowledge through a succession of rewards in addition to fragmentary and noisy observations. This study introduces an algorithm named deep reinforcement learning navigation via decision transformer (DRLNDT) to address the challenge of enhancing the decision-making capabilities of autonomous vehicles operating in partially observable urban environments. The DRLNDT framework is built around the Soft Actor-Critic (SAC) algorithm and utilizes Transformer neural networks to model the temporal dependencies in observations and actions, which helps mitigate judgment errors that may arise from sensor noise or occlusion within a given state. Latent vectors are extracted from high-quality images using a variational autoencoder (VAE), which effectively reduces the dimensionality of the state space and improves training efficiency. The multimodal state space consists of vector states, including velocity and position, which the vehicle's intrinsic sensors can readily obtain, along with the latent vectors derived from high-quality images, which help the agent assess the present trajectory. Experiments demonstrate that DRLNDT achieves a superior policy without prior knowledge of the environment, detailed maps, or routing assistance, surpassing the baseline technique and other policy methods that lack historical data.

7.
Sensors (Basel) ; 24(7)2024 Mar 22.
Article in English | MEDLINE | ID: mdl-38610247

ABSTRACT

This paper introduces a model-free optimization method based on reinforcement learning (RL) aimed at resolving the active power and frequency oscillations present in a traditional virtual synchronous generator (VSG). The RL agent utilizes the active power and frequency response of the VSG as state inputs and generates actions that adjust the virtual inertia and damping coefficients for an optimal response. Distinctively, this study incorporates a settling-time term into the reward function design, alongside power and frequency deviations, to avoid prolonged system transients due to over-optimization. The soft actor-critic (SAC) algorithm is utilized to determine the optimal strategy. SAC, being model-free with fast convergence, avoids policy overestimation bias and thus achieves superior convergence results. Finally, the proposed method is validated through MATLAB/Simulink simulation. Compared with other approaches, this method more effectively suppresses oscillations in active power and frequency and significantly reduces the settling time.
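
One plausible way to realize a reward with a settling-time term, shown here as an assumption rather than the paper's exact design: deviations in power and frequency are penalized, and an additional penalty grows with elapsed time until the response settles, discouraging long transients.

    def vsg_reward(p_dev, f_dev, elapsed, settled, w_p=1.0, w_f=10.0, w_t=0.05):
        """p_dev: active-power deviation (p.u.), f_dev: frequency deviation (Hz),
        elapsed: time since the disturbance (s), settled: True once both deviations
        stay inside their tolerance bands. Weights are illustrative assumptions."""
        r = -(w_p * abs(p_dev) + w_f * abs(f_dev))
        if not settled:
            r -= w_t * elapsed   # settling-time term keeps growing until the response settles
        return r

    # Example: 0.02 p.u. power error, 0.05 Hz error, 1.2 s after the step, not yet settled
    # print(vsg_reward(0.02, 0.05, 1.2, False))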

8.
Biomimetics (Basel) ; 9(4)2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38667207

ABSTRACT

This paper introduces a reinforcement learning method that leverages task decomposition and a task-specific reward system to address complex high-level tasks, such as door opening, block stacking, and nut assembly. These tasks are decomposed into various subtasks, with the grasping and putting tasks executed through single joint and gripper actions, while other tasks are trained using the SAC algorithm alongside the task-specific reward system. The task-specific reward system aims to increase the learning speed, enhance the success rate, and enable more efficient task execution. The experimental results demonstrate the efficacy of the proposed method, achieving success rates of 99.9% for door opening, 95.25% for block stacking, 80.8% for square-nut assembly, and 90.9% for round-nut assembly. Overall, this method presents a promising solution to address the challenges associated with complex tasks, offering improvements over the traditional end-to-end approach.

9.
Sci Rep ; 14(1): 6014, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38472457

ABSTRACT

Experience replay has been instrumental in achieving significant advancements in reinforcement learning by increasing the utilization of data. To further improve sampling efficiency, prioritized experience replay (PER) was proposed. This algorithm prioritizes experiences based on the temporal difference error (TD error), enabling the agent to learn from the more valuable experiences stored in the experience pool. While various prioritized algorithms have been proposed, they ignore the dynamic changes in experience value during training, merely combining different priority criteria in a fixed or linear manner. In this paper, we present a novel prioritized experience replay algorithm called PERDP, which employs a dynamic priority adjustment framework. PERDP adaptively adjusts the weight of each criterion based on the average priority level of the experience pool and evaluates the value of experiences according to the current network. We apply this algorithm to the SAC model and conduct experiments in the OpenAI Gym environment. The experimental results demonstrate that PERDP converges faster than PER.
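
A hedged illustration of dynamic priority adjustment: a transition's priority mixes two criteria (here TD error and recency, chosen only as examples), and the mixing weight drifts toward whichever criterion currently dominates the pool's average priority level. The criteria and the update rule are assumptions, not PERDP's exact mechanism.

    import numpy as np

    def mixed_priority(td_error, age, w_td, eps=1e-6):
        """Priority mixes a TD-error criterion and a recency criterion with weight w_td."""
        return w_td * (abs(td_error) + eps) + (1.0 - w_td) / (1.0 + age)

    def adapt_weight(td_priorities, recency_priorities, w_td, lr=0.05):
        """Shift weight toward whichever criterion currently dominates the pool's
        average priority level; the recency weight is implicitly 1 - w_td."""
        drift = lr * (float(np.mean(td_priorities)) - float(np.mean(recency_priorities)))
        return min(max(w_td + drift, 0.0), 1.0)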

10.
Math Biosci Eng ; 21(1): 1445-1471, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38303472

ABSTRACT

With the rise of Industry 4.0, manufacturing is shifting towards customization and flexibility, presenting new challenges in meeting rapidly evolving market and customer needs. To address these challenges, this paper proposes a novel reinforcement learning (RL) approach to flexible job shop scheduling problems (FJSPs). The method utilizes an actor-critic architecture that merges value-based and policy-based approaches: the actor generates deterministic policies, while the critic evaluates them and guides the actor toward the optimal policy. To construct the Markov decision process, a comprehensive feature set was utilized to accurately represent the system's state, and eight sets of actions were designed, inspired by traditional scheduling rules. The formulation of rewards indirectly measures the effectiveness of actions, promoting strategies that minimize job completion times and enhance adherence to scheduling constraints. The experimental evaluation conducted a thorough assessment of the proposed reinforcement learning framework through simulations on standard FJSP benchmarks, comparing the proposed method against several well-known heuristic scheduling rules, related RL algorithms, and intelligent algorithms. The results indicate that the proposed method consistently outperforms traditional approaches and exhibits exceptional adaptability and efficiency, particularly on large-scale datasets.

11.
Sensors (Basel) ; 24(2)2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38257654

ABSTRACT

Autonomous mobile robots have become integral to daily life, providing crucial services across diverse domains. This paper focuses on path following, a fundamental technology and critical element in achieving autonomous mobility. Existing methods predominantly address tracking through steering control, neglecting velocity control or relying on path-specific reference velocities, thereby constraining their generality. In this paper, we propose a novel approach that integrates the conventional pure pursuit algorithm with deep reinforcement learning for a nonholonomic mobile robot. Our methodology employs pure pursuit for steering control and utilizes the soft actor-critic algorithm to train a velocity control strategy within randomly generated path environments. Through simulation and experimental validation, our approach exhibits notable advancements in path convergence and adaptive velocity adjustments to accommodate paths with varying curvatures. Furthermore, this method holds the potential for broader applicability to vehicles adhering to nonholonomic constraints beyond the specific model examined in this paper. In summary, our study contributes to the progression of autonomous mobility by harmonizing conventional algorithms with cutting-edge deep reinforcement learning techniques, enhancing the robustness of path following.
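
For reference, the pure pursuit steering law used for the steering half of such a scheme can be written in a few lines; the velocity would then come from the trained soft actor-critic policy. Frame conventions and the fallback for a goal behind the robot are assumptions.

    import math

    def pure_pursuit_steering(x, y, yaw, goal_x, goal_y, wheelbase, lookahead):
        """Return the steering angle that drives the robot toward a lookahead point."""
        # Transform the lookahead goal into the robot frame.
        dx, dy = goal_x - x, goal_y - y
        local_x = math.cos(yaw) * dx + math.sin(yaw) * dy
        local_y = -math.sin(yaw) * dx + math.cos(yaw) * dy
        if local_x <= 0:                      # goal behind the robot: turn hard
            return math.copysign(math.pi / 4, local_y)
        curvature = 2.0 * local_y / (lookahead ** 2)
        return math.atan(curvature * wheelbase)

    # The linear velocity would come from the trained SAC policy, e.g. v = policy(observation)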

12.
Artif Intell Med ; 147: 102736, 2024 01.
Article in English | MEDLINE | ID: mdl-38184360

ABSTRACT

Deep Brain Stimulation (DBS) is an implantable medical device used for electrical stimulation to treat neurological disorders. Traditional DBS devices provide fixed-frequency pulses, but personalized adjustment of stimulation parameters is crucial for optimal treatment. This paper introduces a Basal Ganglia inspired Reinforcement Learning (BGRL) approach, incorporating a closed-loop feedback mechanism to suppress neural synchrony during neurological fluctuations. The BGRL approach leverages the resemblance between the basal ganglia region of the brain and the actor-critic architecture of reinforcement learning (RL). Simulation results demonstrate that BGRL significantly reduces synchronous electrical pulses compared with other standard RL algorithms. The BGRL algorithm outperforms existing RL methods in terms of suppression capability and energy consumption, validated through comparisons using ensemble oscillators. Results in the paper show that BGRL suppressed synchronous electrical pulses across three signaling regimes, namely regular, chaotic, and bursting, by 40%, 146%, and 40%, respectively, compared with the soft actor-critic model. BGRL shows promise in effectively suppressing neural synchrony in DBS therapy, providing an efficient alternative to open-loop methodologies.


Subject(s)
Learning; Reinforcement, Psychology; Basal Ganglia; Brain; Algorithms
13.
Neural Netw ; 169: 764-777, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37981458

ABSTRACT

Actor-critic methods are leading approaches in many challenging continuous control tasks. Advantage estimators, the most common critics in the actor-critic framework, combine state values from bootstrapped value functions with sample returns. Different combinations balance the bias introduced by state values against the variance introduced by sample returns to reduce estimation errors. The bias and variance fluctuate constantly throughout training, leading to different optimal combinations. However, existing advantage estimators usually use fixed combinations that fail to account for the trade-off between minimizing bias and variance to find the optimal estimate. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and offered two indicators. This paper further explores the relationship between the indicators and their optimal combination through typical numerical experiments. These analyses develop a general form of adaptive combinations of state values and sample returns that achieves low estimation errors. Empirical results on simulated robotic locomotion tasks show that our proposed estimators achieve similar or superior performance compared with previous generalized advantage estimators (GAE).
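
As a fixed-combination baseline for the adaptive estimators discussed above, generalized advantage estimation (GAE) can be computed as follows; lambda is the knob that fixes the bias/variance trade-off these adaptive methods aim to tune automatically.

    import numpy as np

    def gae(rewards, values, dones, gamma=0.99, lam=0.95):
        """values carries one extra bootstrap entry: len(values) == len(rewards) + 1."""
        advantages = np.zeros(len(rewards))
        last_adv = 0.0
        for t in reversed(range(len(rewards))):
            nonterminal = 1.0 - float(dones[t])
            delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
            last_adv = delta + gamma * lam * nonterminal * last_adv
            advantages[t] = last_adv
        return advantages

    # lam = 0 keeps only one-step TD errors (low variance, more bias);
    # lam = 1 recovers Monte Carlo returns minus the baseline (high variance, low bias).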


Subject(s)
Algorithms; Robotics; Bias
14.
Cell Rep ; 42(11): 113422, 2023 11 28.
Article in English | MEDLINE | ID: mdl-37950871

ABSTRACT

The medial frontal cortex (MFC) plays an important but disputed role in speed-accuracy trade-off (SAT). We sampled neural spiking in the supplementary eye field (SEF) of the MFC simultaneously with the visuomotor frontal eye field and superior colliculus in macaques performing visual search under instructed SAT. During accuracy emphasis, most SEF neurons discharge less from before stimulus presentation until response generation. Discharge rates adjust immediately and simultaneously across structures upon SAT cue changes. SEF neurons signal choice errors with stronger and earlier activity during accuracy emphasis. Other neurons signal timing errors, covarying with adjusting response time. Spike correlations between neurons in the SEF and visuomotor areas did not appear, disappear, or change sign across SAT conditions or trial outcomes. These results clarify findings with noninvasive measures, complement previous neurophysiological findings, and endorse the role of the MFC as a critic for the actor instantiated in visuomotor structures.


Subject(s)
Executive Function; Visual Fields; Animals; Macaca; Frontal Lobe/physiology; Neurons/physiology; Saccades
15.
Biomimetics (Basel) ; 8(6)2023 Oct 10.
Article in English | MEDLINE | ID: mdl-37887612

ABSTRACT

The path planning problem has gained more attention due to the gradual popularization of mobile robots. The utilization of reinforcement learning techniques facilitates the ability of mobile robots to successfully navigate through an environment containing obstacles and effectively plan their path. This is achieved by the robots' interaction with the environment, even in situations when the environment is unfamiliar. Consequently, we provide a refined deep reinforcement learning algorithm that builds upon the soft actor-critic (SAC) algorithm, incorporating the concept of maximum entropy for the purpose of path planning. The objective of this strategy is to mitigate the constraints inherent in conventional reinforcement learning, enhance the efficacy of the learning process, and accommodate intricate situations. In the context of reinforcement learning, two significant issues arise: inadequate incentives and inefficient sample use during the training phase. To address these challenges, the hindsight experience replay (HER) mechanism has been presented as a potential solution. The HER mechanism aims to enhance algorithm performance by effectively reusing past experiences. Through the utilization of simulation studies, it can be demonstrated that the enhanced algorithm exhibits superior performance in comparison with the pre-existing method.
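
A minimal sketch of the hindsight experience replay (HER) relabeling step, assuming the "final" goal-selection strategy and a sparse distance-based reward; the helper names are placeholders, not the paper's API.

    import numpy as np

    def relabel_with_final_goal(episode, reward_fn, achieved_goal_fn):
        """episode: list of (obs, action, reward, next_obs, goal) tuples."""
        new_goal = achieved_goal_fn(episode[-1][3])     # where the agent actually ended up
        relabeled = []
        for obs, action, _, next_obs, _ in episode:
            r = reward_fn(achieved_goal_fn(next_obs), new_goal)
            relabeled.append((obs, action, r, next_obs, new_goal))
        return relabeled

    def sparse_reward(achieved, goal, tol=0.05):
        """Sparse reward: 0 when within tolerance of the goal, -1 otherwise."""
        return 0.0 if np.linalg.norm(np.asarray(achieved) - np.asarray(goal)) < tol else -1.0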

16.
Sensors (Basel) ; 23(20)2023 Oct 23.
Article in English | MEDLINE | ID: mdl-37896743

ABSTRACT

An end-to-end approach to autonomous navigation that is based on deep reinforcement learning (DRL) with a survival penalty function is proposed in this paper. Two actor-critic (AC) frameworks, namely, deep deterministic policy gradient (DDPG) and twin-delayed DDPG (TD3), are employed to enable a nonholonomic wheeled mobile robot (WMR) to perform navigation in dynamic environments containing obstacles and for which no maps are available. A comprehensive reward based on the survival penalty function is introduced; this approach effectively solves the sparse reward problem and enables the WMR to move toward its target. Consecutive episodes are connected to increase the cumulative penalty for scenarios involving obstacles; this method prevents training failure and enables the WMR to plan a collision-free path. Simulations are conducted for four scenarios-movement in an obstacle-free space, in a parking lot, at an intersection without and with a central obstacle, and in a multiple obstacle space-to demonstrate the efficiency and operational safety of our method. For the same navigation environment, compared with the DDPG algorithm, the TD3 algorithm exhibits faster numerical convergence and higher stability in the training phase, as well as a higher task execution success rate in the evaluation phase.
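
A hedged example of what a survival-penalty-style reward might look like: a small per-step cost, a progress term toward the target, and large terminal terms for collisions and goal arrival. The coefficients are illustrative assumptions, not the paper's values.

    def survival_penalty_reward(dist_prev, dist_now, collided, reached,
                                step_penalty=0.05, progress_gain=5.0,
                                collision_penalty=100.0, goal_bonus=100.0):
        if collided:
            return -collision_penalty       # terminal penalty discourages unsafe paths
        if reached:
            return goal_bonus               # terminal bonus for reaching the target
        # Reward progress toward the goal; the per-step cost penalizes idling.
        return progress_gain * (dist_prev - dist_now) - step_penalty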

17.
Sensors (Basel) ; 23(18)2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37765915

ABSTRACT

To accommodate the requirements of extensive coverage and ubiquitous connectivity in 6G communications, satellites play an increasingly significant role. As users and devices grow explosively, new multiple access technologies are called for. Among the new candidates, rate splitting multiple access (RSMA) shows great potential. Since satellites are power-limited, this paper investigates energy-efficient resource allocation in an integrated satellite-terrestrial network (ISTN) adopting the RSMA scheme. However, this non-convex problem is challenging to solve using conventional model-based methods. Because the optimization task has a quality of service (QoS) requirement and a continuous action/state space, we propose to use constrained soft actor-critic (SAC) to tackle it. This policy-gradient algorithm incorporates the Lagrangian relaxation technique to convert the original constrained problem into a penalized unconstrained one. The reward is maximized while the requirements are satisfied. Moreover, the learning process is time-consuming and unnecessary when the network changes little, so an on-off mechanism is introduced to avoid this situation. By calculating the difference between the current state and the last one, the system decides whether to learn a new action or reuse the previous one. The simulation results show that the proposed algorithm outperforms other benchmark algorithms in terms of energy efficiency while satisfying the QoS constraint. In addition, time consumption is lowered because of the on-off design.
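
The Lagrangian relaxation mentioned above can be sketched as follows, under the assumption of a single minimum-rate QoS constraint: the constraint violation is folded into the SAC reward as a penalty, and the multiplier is raised by dual ascent while the constraint is violated. Symbols and the update rule are illustrative assumptions.

    def penalized_reward(energy_efficiency, qos_rate, qos_min, lam):
        """Unconstrained surrogate reward: objective minus a weighted QoS violation."""
        return energy_efficiency - lam * max(0.0, qos_min - qos_rate)

    def update_multiplier(lam, qos_rate, qos_min, lr=0.01):
        """Dual ascent: raise lambda while the QoS constraint is violated, keep it >= 0."""
        return max(0.0, lam + lr * (qos_min - qos_rate))

    # Hypothetical training loop fragment:
    # lam = 0.0
    # r = penalized_reward(ee, rate, rate_min, lam)   # used as the SAC reward
    # lam = update_multiplier(lam, rate, rate_min)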

18.
Sensors (Basel) ; 23(14)2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37514846

ABSTRACT

A proactive mobile network (PMN) is a novel architecture enabling extremely low-latency communication. This architecture employs an open-loop transmission mode that prohibits all real-time control feedback processes and uses virtual cell technology to allocate resources non-exclusively to users. However, such a design also introduces significant potential user interference and worsens the communication's reliability. In this paper, we propose introducing multi-reconfigurable intelligent surface (RIS) technology into the downlink process of the PMN to improve the network's capacity in the presence of interference. Since the PMN environment is complex and time varying, and accurate channel state information cannot be acquired in real time, it is challenging to manage RISs to serve the PMN effectively. We begin by formulating an optimization problem for the RIS phase shifts and reflection coefficients. Furthermore, motivated by recent developments in deep reinforcement learning (DRL), we propose an asynchronous advantage actor-critic (A3C)-based method for solving the problem by appropriately designing the action space, state space, and reward function. Simulation results indicate that deploying RISs within a region can significantly facilitate interference suppression. The proposed A3C-based scheme can achieve a higher capacity than baseline schemes and approaches the upper limit as the number of RISs increases.

19.
Biomimetics (Basel) ; 8(2)2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37366835

ABSTRACT

This paper proposes a task decomposition and dedicated reward-system-based reinforcement learning algorithm for the Pick-and-Place task, one of the high-level tasks for robot manipulators. The proposed method decomposes the Pick-and-Place task into three subtasks: two reaching tasks and one grasping task. One of the two reaching tasks is approaching the object, and the other is reaching the place position. These two reaching tasks are carried out using the optimal policies of agents trained with Soft Actor-Critic (SAC). Unlike the two reaching tasks, the grasping task is implemented via simple logic that is easy to design but may result in improper gripping. To assist the grasping task, a dedicated reward system for approaching the object is designed using individual axis-based weights. To verify the validity of the proposed method, we carry out various experiments in the MuJoCo physics engine with the Robosuite framework. According to the simulation results of four trials, the robot manipulator picked up and released the object at the goal position with an average success rate of 93.2%.

20.
Diagnostics (Basel) ; 13(8)2023 Apr 20.
Article in English | MEDLINE | ID: mdl-37189591

ABSTRACT

While the world works quietly to repair the damage caused by COVID-19's widespread transmission, the monkeypox virus threatens to become a global pandemic. Several nations report new monkeypox cases daily, despite the virus being less deadly and contagious than COVID-19. Monkeypox disease may be detected using artificial intelligence techniques. This paper suggests two strategies for improving the precision of monkeypox image classification, based on reinforcement learning and parameter optimization for multi-layer neural networks and built on feature extraction and classification: the Q-learning algorithm determines the rate at which an action occurs in a particular state, and Malneural networks are binary hybrid algorithms that improve the parameters of neural networks. The algorithms are evaluated using an openly available dataset. Interpretation criteria were utilized to analyze the proposed optimization-based feature selection for monkeypox classification, and a series of numerical tests were conducted to evaluate the efficiency, significance, and robustness of the suggested algorithms. The method achieved 95% precision, 95% recall, and a 96% F1 score for monkeypox disease, a higher accuracy than traditional learning methods. The overall macro average was around 0.95, and the overall weighted average was around 0.96. Compared with the benchmark algorithms DDQN, Policy Gradient, and Actor-Critic, the Malneural network had the highest accuracy (around 0.985). The proposed methods were found to be more effective than traditional methods. Clinicians can use this proposal to treat monkeypox patients, and administrative agencies can use it to monitor the origin and current status of the disease.
