Results 1 - 20 of 25
1.
Sensors (Basel) ; 24(15)2024 Jul 27.
Article in English | MEDLINE | ID: mdl-39123922

ABSTRACT

Interest in deploying deep reinforcement learning (DRL) models on low-power edge devices, such as Autonomous Mobile Robots (AMRs) and Internet of Things (IoT) devices, has risen significantly due to the potential of performing real-time inference by eliminating the latency and reliability issues incurred by wireless communication, as well as the privacy benefits of processing data locally. Deploying such energy-intensive models on power-constrained devices is not always feasible, however, which has led to the development of model compression techniques that reduce the size and computational complexity of DRL policies. Policy distillation, the most popular of these methods, can be used to lower the number of network parameters by transferring the behavior of a large teacher network to a smaller student model before deploying the student at the edge. This works well with deterministic policies that operate using discrete actions. However, many power-constrained real-world tasks, such as those in robotics, are formulated using continuous action spaces, which standard policy distillation does not support. In this work, we improve the policy distillation method to support the compression of DRL models designed to solve these continuous control tasks, with an emphasis on maintaining the stochastic nature of continuous DRL algorithms. Experiments show that our methods can effectively compress such policies by up to 750% while maintaining or even exceeding their teacher's performance by up to 41% in solving two popular continuous control tasks.
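The abstract does not spell out the distillation loss, but a common way to distill a stochastic continuous-control teacher into a smaller student is to match the student's Gaussian action distribution to the teacher's with a KL term. The following minimal PyTorch sketch illustrates that idea; the network sizes, loss choice, and random stand-in states are assumptions, not the paper's exact method.

import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def dist(self, obs):
        h = self.body(obs)
        return Normal(self.mu(h), self.log_std(h).clamp(-5, 2).exp())

obs_dim, act_dim = 8, 2
teacher = GaussianPolicy(obs_dim, act_dim, hidden=256)   # large teacher network
student = GaussianPolicy(obs_dim, act_dim, hidden=32)    # compressed student network
opt = torch.optim.Adam(student.parameters(), lr=3e-4)

for step in range(1000):
    obs = torch.randn(64, obs_dim)            # stand-in for states collected by the teacher
    with torch.no_grad():
        t_dist = teacher.dist(obs)
    s_dist = student.dist(obs)
    # match the student's action distribution to the teacher's
    loss = kl_divergence(t_dist, s_dist).sum(dim=-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()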

2.
Sci Rep ; 14(1): 14127, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38898063

ABSTRACT

Since conventional PID (proportional-integral-derivative) controllers can hardly stabilize a robot for constant-force grinding under changing environmental conditions, a compensation term must be added to the conventional PID controller. An optimal parameter-finding algorithm based on SAC (soft actor-critic) is proposed to solve the problem that the compensation-term parameters are difficult to obtain; it includes state-action and normalization preprocessing for training, reward function design, and a targeted deep neural network design. The algorithm is used to find the optimal compensation-term parameters, which are applied to the PID controller and realized through the robot's inverse kinematics to achieve constant-force grinding control. To verify the algorithm's feasibility, a simulation model of a grinding robot with force sensing is established, and the simulation results show that the controller trained with the algorithm achieves constant-force grinding. Finally, an experimental platform for robot constant-force grinding is built for testing, which verifies the control effect of the optimal parameter-finding algorithm and shows a degree of environmental adaptability.
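As a rough sketch of the control structure described above, the snippet below adds a simple first-order compensation term to a force-tracking PID loop; the compensation gains kc and tau stand in for the parameters the SAC-based search would tune. The compensation form and all values are illustrative assumptions, not the paper's formulation.

class CompensatedPID:
    def __init__(self, kp, ki, kd, kc, tau, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.kc, self.tau, self.dt = kc, tau, dt
        self.integral = 0.0
        self.prev_err = 0.0
        self.comp_state = 0.0

    def step(self, force_ref, force_meas):
        err = force_ref - force_meas
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        # first-order compensation term driven by the force error
        self.comp_state += self.dt * (self.kc * err - self.comp_state) / self.tau
        return self.kp * err + self.ki * self.integral + self.kd * deriv + self.comp_state

ctrl = CompensatedPID(kp=0.8, ki=0.2, kd=0.05, kc=0.5, tau=0.1, dt=0.001)
u = ctrl.step(force_ref=20.0, force_meas=18.5)   # commanded correction, e.g., normal displacement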

3.
Sensors (Basel) ; 24(7)2024 Mar 22.
Article in English | MEDLINE | ID: mdl-38610247

ABSTRACT

This paper introduces a model-free optimization method based on reinforcement learning (RL) aimed at resolving the active power and frequency oscillations present in a traditional virtual synchronous generator (VSG). The RL agent uses the active power and frequency response of the VSG as state inputs and generates actions that adjust the virtual inertia and damping coefficients for an optimal response. Distinctively, this study incorporates a settling-time term into the reward function, alongside power and frequency deviations, to avoid prolonged system transients due to over-optimization. The soft actor-critic (SAC) algorithm is used to determine the optimal strategy. SAC is model-free, converges quickly, and avoids policy overestimation bias, thus achieving superior convergence results. Finally, the proposed method is validated through MATLAB/Simulink simulation. Compared to other approaches, this method more effectively suppresses oscillations in active power and frequency and significantly reduces the settling time.
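A minimal sketch of a reward of the kind described above, combining power and frequency deviation penalties with a settling-time term; the weights, the settling band, and the function shape are illustrative assumptions rather than the paper's exact design.

def vsg_reward(p_dev, f_dev, t, settled_threshold=0.01, w_p=1.0, w_f=10.0, w_t=0.1):
    """p_dev: active-power deviation (p.u.); f_dev: frequency deviation (Hz);
    t: elapsed time since the disturbance (s)."""
    deviation_cost = w_p * abs(p_dev) + w_f * abs(f_dev)
    # keep charging a time penalty until both deviations fall inside the band,
    # discouraging policies that damp oscillations at the cost of long transients
    settling_penalty = w_t * t if max(abs(p_dev), abs(f_dev)) > settled_threshold else 0.0
    return -(deviation_cost + settling_penalty)

print(vsg_reward(p_dev=0.05, f_dev=0.2, t=1.5))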

4.
Biomimetics (Basel) ; 9(4)2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38667207

ABSTRACT

This paper introduces a reinforcement learning method that leverages task decomposition and a task-specific reward system to address complex high-level tasks, such as door opening, block stacking, and nut assembly. These tasks are decomposed into various subtasks, with the grasping and putting tasks executed through single joint and gripper actions, while other tasks are trained using the SAC algorithm alongside the task-specific reward system. The task-specific reward system aims to increase the learning speed, enhance the success rate, and enable more efficient task execution. The experimental results demonstrate the efficacy of the proposed method, achieving success rates of 99.9% for door opening, 95.25% for block stacking, 80.8% for square-nut assembly, and 90.9% for round-nut assembly. Overall, this method presents a promising solution to address the challenges associated with complex tasks, offering improvements over the traditional end-to-end approach.

5.
Front Neurorobot ; 18: 1338189, 2024.
Article in English | MEDLINE | ID: mdl-38566892

ABSTRACT

In real-world scenarios, making navigation decisions for autonomous driving involves a sequential set of steps. These judgments are made based on partial observations of the environment, while the underlying model of the environment remains unknown. A prevalent method for resolving such issues is reinforcement learning, in which the agent acquires knowledge through a succession of rewards in addition to fragmentary and noisy observations. This study introduces an algorithm named deep reinforcement learning navigation via decision transformer (DRLNDT) to address the challenge of enhancing the decision-making capabilities of autonomous vehicles operating in partially observable urban environments. The DRLNDT framework is built around the Soft Actor-Critic (SAC) algorithm and uses Transformer neural networks to model the temporal dependencies in observations and actions, which helps mitigate judgment errors arising from sensor noise or occlusion in a given state. Latent vectors are extracted from high-quality images with a variational autoencoder (VAE), which effectively reduces the dimensionality of the state space and improves training efficiency. The multimodal state space consists of vector states, including velocity and position, which the vehicle's intrinsic sensors can readily obtain, together with the latent vectors derived from the images, to facilitate the agent's assessment of the current trajectory. Experiments demonstrate that DRLNDT achieves a superior policy without prior knowledge of the environment, detailed maps, or routing assistance, surpassing the baseline technique and other policy methods that lack historical data.
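The sketch below illustrates, in generic terms, how such a multimodal state can be composed: a VAE-style encoder maps a camera frame to a latent vector that is concatenated with the low-dimensional vehicle measurements. The encoder architecture, image size, and vector contents are assumptions for illustration, not DRLNDT's actual configuration.

import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten())
        self.mu = nn.LazyLinear(latent_dim)
        self.logvar = nn.LazyLinear(latent_dim)

    def forward(self, img):
        h = self.conv(img)
        mu, logvar = self.mu(h), self.logvar(h)
        # reparameterized latent vector, as in a standard VAE encoder
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

encoder = ConvEncoder()
img = torch.randn(1, 3, 64, 64)              # stand-in camera frame
vec = torch.tensor([[12.3, 4.1, 0.0, 1.2]])  # e.g., speed and position from vehicle sensors
state = torch.cat([encoder(img), vec], dim=-1)  # multimodal state fed to the SAC agent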

6.
Sci Rep ; 14(1): 6014, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38472457

ABSTRACT

Experience replay has been instrumental in achieving significant advancements in reinforcement learning by increasing the utilization of data. To further improve sampling efficiency, prioritized experience replay (PER) was proposed. This algorithm prioritizes experiences based on the temporal-difference error (TD error), enabling the agent to learn from the more valuable experiences stored in the experience pool. While various prioritized algorithms have been proposed, they ignore the dynamic change of experience value during the training process, merely combining different priority criteria in a fixed or linear manner. In this paper, we present a novel prioritized experience replay algorithm called PERDP, which employs a dynamic priority-adjustment framework. PERDP adaptively adjusts the weight of each criterion based on the average priority level of the experience pool and evaluates an experience's value according to the current network. We apply this algorithm to the SAC model and conduct experiments in the OpenAI Gym environment. The results demonstrate that PERDP exhibits faster convergence than PER.
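To make the dynamic-adjustment idea concrete, the sketch below samples from a replay buffer using a mix of two criteria (TD error and recency) whose mixing weight is nudged according to the pool's average priority level. The specific criteria, update rule, and constants are illustrative assumptions, not the PERDP algorithm itself.

import random
import numpy as np

class DynamicPriorityBuffer:
    def __init__(self, capacity=10000):
        self.data, self.td, self.age = [], [], []
        self.capacity = capacity
        self.w = 0.5          # weight on TD error vs. recency, adapted online

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.td.pop(0); self.age.pop(0)
        self.data.append(transition); self.td.append(abs(td_error)); self.age.append(0)

    def sample(self, batch_size):
        td = np.asarray(self.td)
        recency = 1.0 / (1.0 + np.asarray(self.age, dtype=float))
        prio = self.w * td / (td.mean() + 1e-8) + (1 - self.w) * recency
        # nudge the mixing weight toward TD error when the pool's average TD priority is high
        target = td.mean() / (td.mean() + recency.mean() + 1e-8)
        self.w = float(np.clip(0.9 * self.w + 0.1 * target, 0.1, 0.9))
        probs = prio / prio.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        self.age = [a + 1 for a in self.age]
        return [self.data[i] for i in idx]

buf = DynamicPriorityBuffer()
for _ in range(200):
    buf.add(("s", "a", random.random(), "s2"), td_error=random.gauss(0, 1))
batch = buf.sample(32)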

7.
Sensors (Basel) ; 24(2)2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38257654

ABSTRACT

Autonomous mobile robots have become integral to daily life, providing crucial services across diverse domains. This paper focuses on path following, a fundamental technology and critical element in achieving autonomous mobility. Existing methods predominantly address tracking through steering control, neglecting velocity control or relying on path-specific reference velocities, thereby constraining their generality. In this paper, we propose a novel approach that integrates the conventional pure pursuit algorithm with deep reinforcement learning for a nonholonomic mobile robot. Our methodology employs pure pursuit for steering control and utilizes the soft actor-critic algorithm to train a velocity control strategy within randomly generated path environments. Through simulation and experimental validation, our approach exhibits notable advancements in path convergence and adaptive velocity adjustments to accommodate paths with varying curvatures. Furthermore, this method holds the potential for broader applicability to vehicles adhering to nonholonomic constraints beyond the specific model examined in this paper. In summary, our study contributes to the progression of autonomous mobility by harmonizing conventional algorithms with cutting-edge deep reinforcement learning techniques, enhancing the robustness of path following.
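As a rough illustration of the hybrid scheme above, the sketch below pairs the standard geometric pure pursuit steering law with a stand-in for the learned velocity policy. The lookahead point, wheelbase, and the velocity_policy stub are illustrative assumptions, not the paper's implementation.

import math

def pure_pursuit_steering(pose, lookahead_point, wheelbase=0.3):
    """pose = (x, y, heading); returns a steering angle for a bicycle-model robot."""
    x, y, yaw = pose
    dx = lookahead_point[0] - x
    dy = lookahead_point[1] - y
    # transform the lookahead point into the robot frame
    local_x = math.cos(-yaw) * dx - math.sin(-yaw) * dy
    local_y = math.sin(-yaw) * dx + math.cos(-yaw) * dy
    ld = math.hypot(local_x, local_y)
    curvature = 2.0 * local_y / (ld ** 2 + 1e-9)
    return math.atan(wheelbase * curvature)

def velocity_policy(observation):
    # stand-in for the SAC-trained velocity controller
    cross_track_error, path_curvature = observation
    return max(0.1, 1.0 - 2.0 * abs(cross_track_error) - 0.5 * abs(path_curvature))

steer = pure_pursuit_steering((0.0, 0.0, 0.0), lookahead_point=(1.0, 0.2))
speed = velocity_policy((0.05, 0.3))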

8.
Biomimetics (Basel) ; 8(6)2023 Oct 10.
Article in English | MEDLINE | ID: mdl-37887612

ABSTRACT

The path planning problem has gained more attention due to the gradual popularization of mobile robots. Reinforcement learning enables mobile robots to navigate through an environment containing obstacles and plan their paths effectively by interacting with the environment, even when that environment is unfamiliar. Consequently, we provide a refined deep reinforcement learning algorithm that builds upon the soft actor-critic (SAC) algorithm and its maximum-entropy formulation for path planning. The objective of this strategy is to mitigate the constraints inherent in conventional reinforcement learning, enhance the efficacy of the learning process, and accommodate intricate situations. Two significant issues arise in reinforcement learning: inadequate incentives and inefficient sample use during training. To address these challenges, the hindsight experience replay (HER) mechanism is introduced, which enhances algorithm performance by effectively reusing past experiences. Simulation studies demonstrate that the enhanced algorithm exhibits superior performance compared with the existing method.
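A minimal sketch of the HER idea mentioned above: transitions from a failed episode are relabeled with the goal the agent actually reached, so sparse-reward experiences still carry a learning signal. The transition layout and reward rule are illustrative assumptions, not the paper's exact setup.

import numpy as np

def her_relabel(episode, reward_fn):
    """episode: list of (state, action, next_state, goal) tuples.
    Returns extra transitions whose goal is replaced by the state actually reached."""
    achieved_goal = episode[-1][2]              # where the agent ended up
    relabeled = []
    for state, action, next_state, _ in episode:
        reward = reward_fn(next_state, achieved_goal)
        relabeled.append((state, action, reward, next_state, achieved_goal))
    return relabeled

def sparse_reward(state, goal, tol=0.05):
    return 0.0 if np.linalg.norm(np.asarray(state) - np.asarray(goal)) < tol else -1.0

episode = [((0.0, 0.0), (0.1, 0.0), (0.1, 0.0), (1.0, 1.0)),
           ((0.1, 0.0), (0.1, 0.1), (0.2, 0.1), (1.0, 1.0))]
extra = her_relabel(episode, sparse_reward)   # added to the SAC replay buffer alongside the originals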

9.
Sensors (Basel) ; 23(18)2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37765915

ABSTRACT

To accommodate the requirements of extensive coverage and ubiquitous connectivity in 6G communications, satellites play an increasingly significant role. As the number of users and devices grows explosively, new multiple access technologies are called for; among the candidates, rate splitting multiple access (RSMA) shows great potential. Since satellites are power-limited, in this paper we investigate energy-efficient resource allocation in an integrated satellite terrestrial network (ISTN) adopting the RSMA scheme. This non-convex problem is challenging to solve using conventional model-based methods. Because the optimization task has a quality of service (QoS) requirement and continuous action/state spaces, we propose to use constrained soft actor-critic (SAC) to tackle it. This policy-gradient algorithm incorporates the Lagrangian relaxation technique to convert the original constrained problem into a penalized unconstrained one, so the reward is maximized while the requirements are satisfied. Moreover, the learning process is time-consuming and unnecessary when little changes in the network, so an on-off mechanism is introduced to avoid this situation. By calculating the difference between the current state and the previous one, the system decides whether to learn a new action or reuse the previous one. Simulation results show that the proposed algorithm outperforms other benchmark algorithms in terms of energy efficiency while satisfying the QoS constraint. In addition, time consumption is lowered because of the on-off design.
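A hedged sketch of the two mechanisms described above: a Lagrangian-relaxed reward that penalizes QoS violations, and an on-off check that reuses the last action when the state has barely changed. The QoS form, threshold, and multiplier are illustrative assumptions.

import numpy as np

def penalized_reward(energy_efficiency, qos_rate, qos_min, lagrange_multiplier):
    # constrained objective turned into an unconstrained, penalized one
    violation = max(0.0, qos_min - qos_rate)
    return energy_efficiency - lagrange_multiplier * violation

def should_reuse_last_action(state, last_state, threshold=1e-2):
    # if the network state moved less than the threshold, skip learning a new action
    return np.linalg.norm(np.asarray(state) - np.asarray(last_state)) < threshold

state, last_state = np.array([0.52, 0.31]), np.array([0.515, 0.309])
if should_reuse_last_action(state, last_state):
    action = "reuse previous power/rate-splitting allocation"
else:
    action = "query the SAC policy for a new allocation"
print(penalized_reward(energy_efficiency=3.2, qos_rate=0.8, qos_min=1.0, lagrange_multiplier=5.0))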

10.
Biomimetics (Basel) ; 8(2)2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37366835

ABSTRACT

This paper proposes a task-decomposition and dedicated-reward-system-based reinforcement learning algorithm for the Pick-and-Place task, one of the high-level tasks of robot manipulators. The proposed method decomposes the Pick-and-Place task into three subtasks: two reaching tasks and one grasping task. One of the two reaching tasks is approaching the object, and the other is reaching the place position. These two reaching tasks are carried out using the optimal policies of agents trained with Soft Actor-Critic (SAC). Unlike the reaching tasks, grasping is implemented via simple logic that is easy to design but may result in improper gripping. To support the grasping task, a dedicated reward system for approaching the object is designed using individual axis-based weights. To verify the validity of the proposed method, we carry out various experiments in the MuJoCo physics engine with the Robosuite framework. According to the simulation results of four trials, the robot manipulator picked up and released the object in the goal position with an average success rate of 93.2%.
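The sketch below illustrates one plausible reading of "individual axis-based weights": a reaching reward that weights the position error differently along each axis so the gripper approaches the object in a grasp-friendly way. The weights and shaping are assumptions, not the paper's values.

import numpy as np

def axis_weighted_reach_reward(gripper_pos, object_pos, weights=(1.0, 1.0, 2.0)):
    """Per-axis weighted distance penalty; a heavier z-axis weight emphasizes vertical alignment."""
    err = np.abs(np.asarray(gripper_pos) - np.asarray(object_pos))
    return -float(np.dot(weights, err))

print(axis_weighted_reach_reward([0.40, 0.10, 0.30], [0.42, 0.12, 0.05]))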

11.
Materials (Basel) ; 16(4)2023 Feb 06.
Article in English | MEDLINE | ID: mdl-36836996

ABSTRACT

Electromagnetic (EM) waves that bypass obstacles to achieve focus at arbitrary positions are of immense significance to communication and radar technologies. Small-sized and low-cost metasurfaces enable the accomplishment of this function. However, the magnitude-phase characteristics are challenging to analyze when there are obstacles between the metasurface and the EM wave. In this study, we creatively combined the deep reinforcement learning algorithm soft actor-critic (SAC) with a reconfigurable metasurface to construct an SAC-driven metasurface architecture that realizes focusing at any position under obstacles using real-time simulation data. The agent learns the optimal policy to achieve focus while interacting with a complex environment, and the framework proves to be effective even in complex scenes with multiple objects. Driven by real-time reinforcement learning, the knowledge learned from one environment can be flexibly transferred to another environment to maximize information utilization and save considerable iteration time. In the context of future 6G communications development, the proposed method may significantly reduce the path loss of users in an occluded state, thereby solving the open challenge of poor signal penetration. Our study may also inspire the implementation of other intelligent devices.

12.
Neural Netw ; 157: 288-304, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36375347

ABSTRACT

This paper proposes, implements, and evaluates a reinforcement learning (RL)-based computational framework for automatic mesh generation. Mesh generation plays a fundamental role in numerical simulations in the area of computer aided design and engineering (CAD/E). It is identified as one of the critical issues in the NASA CFD Vision 2030 Study. Existing mesh generation methods suffer from high computational complexity, low mesh quality in complex geometries, and speed limitations. These methods and tools, including commercial software packages, are typically semiautomatic and they need inputs or help from human experts. By formulating the mesh generation as a Markov decision process (MDP) problem, we are able to use a state-of-the-art reinforcement learning (RL) algorithm called "soft actor-critic" to automatically learn from trials the policy of actions for mesh generation. The implementation of this RL algorithm for mesh generation allows us to build a fully automatic mesh generation system without human intervention and any extra clean-up operations, which fills the gap in the existing mesh generation tools. In the experiments to compare with two representative commercial software packages, our system demonstrates promising performance with respect to scalability, generalizability, and effectiveness.


Subject(s)
Algorithms , Reinforcement, Psychology , Humans
13.
Sensors (Basel) ; 22(18)2022 Sep 10.
Article in English | MEDLINE | ID: mdl-36146202

ABSTRACT

In low Earth orbit (LEO) satellite-based applications (e.g., remote sensing and surveillance), it is important to efficiently transmit collected data to ground stations (GS). However, LEO satellites' high mobility and the resultant insufficient time for downloading make this challenging. In this paper, we propose a deep-reinforcement-learning (DRL)-based cooperative downloading scheme, which utilizes inter-satellite communication links (ISLs) to fully exploit satellites' downloading capabilities. To this end, we formulate a Markov decision process (MDP) with the objective of maximizing the amount of downloaded data. To learn the optimal approach to the formulated problem, we adopt a soft-actor-critic (SAC)-based DRL algorithm in discretized action spaces. Moreover, we design a novel neural network consisting of a graph attention network (GAT) layer to extract latent features from the satellite network and parallel fully connected (FC) layers to control the individual satellites of the network. Evaluation results demonstrate that the proposed DRL-based cooperative downloading scheme can enhance the average utilization of contact time by up to 17.8% compared with independent downloading and random offloading schemes.


Subject(s)
Algorithms , Neural Networks, Computer , Earth, Planet , Satellite Communications , Telemetry
14.
Sensors (Basel) ; 22(16)2022 Aug 14.
Article in English | MEDLINE | ID: mdl-36015832

ABSTRACT

The marine environment is a hostile setting for robotics. It is strongly unstructured, uncertain, and includes many external disturbances that cannot be easily predicted or modeled. In this work, we attempt to control an autonomous underwater vehicle (AUV) to perform a waypoint tracking task using a machine-learning-based controller. There has been great progress in machine learning (in many different domains) in recent years; in the subfield of deep reinforcement learning, several algorithms suitable for the continuous control of dynamical systems have been designed. We implemented the soft actor-critic (SAC) algorithm, an entropy-regularized deep reinforcement learning algorithm that fulfills a learning task while simultaneously encouraging exploration of the environment. We compared a SAC-based controller with a proportional-integral-derivative (PID) controller on a waypoint tracking task using specific performance metrics. All tests were simulated via the UUV simulator. We applied these two controllers to the RexROV 2, a six-degrees-of-freedom cube-shaped remotely operated vehicle (ROV) converted into an AUV. These tests yield several interesting contributions, such as having SAC handle both control and guidance of the AUV simultaneously, outperforming the PID controller in terms of energy saving, and reducing the amount of information needed as input by the SAC algorithm. Moreover, our implementation of this controller facilitates transfer towards real-world robots. The code corresponding to this work is available on GitHub.

15.
Accid Anal Prev ; 174: 106729, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35700685

ABSTRACT

Car-following is a common driving behavior. Under a vehicle-to-vehicle transportation system, it is necessary to consider the following vehicle in the car-following model of an autonomous vehicle (AV). In this study, a safe velocity control method for AVs based on reinforcement learning that considers the following vehicle is proposed. First, a mixed driving environment of AVs and human-driven vehicles is constructed, and the trajectories of the leading and following vehicles are extracted from the naturalistic highD driving dataset. Next, the soft actor-critic (SAC) algorithm is used as the velocity control algorithm, in which the agent is the AV, the action is acceleration, and the state is the relative distance and relative speed between the AV and the leading and following vehicles. Then, a reward function based on the state and the corresponding action is designed to guide the AV to choose accelerations that avoid collisions with the leading and following vehicles. After training the model, the AV gradually learns to avoid collisions with both the leading and following vehicles. The test result of the trained model shows that the SAC agent achieves complete collision avoidance, resulting in zero collisions. Finally, the driving performance of the SAC agent and that of human driving are compared and analyzed for safety and efficiency. The results of this study are expected to improve the safety of the car-following process.
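The sketch below illustrates the general state-action-reward structure described above: the state carries gaps and relative speeds to both neighbors, the action is the AV's acceleration, and unsafe gaps in either direction are penalized. The gap thresholds and penalty values are illustrative assumptions, not the paper's reward design.

def car_following_reward(gap_lead, rel_speed_lead, gap_follow, rel_speed_follow,
                         accel, safe_gap=5.0):
    reward = -0.1 * accel ** 2                       # mild comfort/efficiency term
    if gap_lead < safe_gap:                          # too close to the leading vehicle
        reward -= 10.0 * (safe_gap - gap_lead)
    if gap_follow < safe_gap:                        # squeezing the following vehicle
        reward -= 10.0 * (safe_gap - gap_follow)
    if rel_speed_lead < 0 and gap_lead < 2 * safe_gap:
        reward -= abs(rel_speed_lead)                # closing fast on the leader
    if rel_speed_follow > 0 and gap_follow < 2 * safe_gap:
        reward -= abs(rel_speed_follow)              # follower closing in from behind
    if gap_lead <= 0.0 or gap_follow <= 0.0:         # collision with either vehicle
        reward -= 100.0
    return reward

state = dict(gap_lead=12.0, rel_speed_lead=-1.5, gap_follow=8.0, rel_speed_follow=0.7)
print(car_following_reward(**state, accel=0.8))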


Subject(s)
Accidents, Traffic , Automobile Driving , Acceleration , Accidents, Traffic/prevention & control , Algorithms , Automobiles , Autonomous Vehicles , Humans
16.
Front Neurorobot ; 16: 829437, 2022.
Article in English | MEDLINE | ID: mdl-35308311

ABSTRACT

We propose a vision-proprioception model for planar object pushing that efficiently integrates all necessary information from the environment. A Variational Autoencoder (VAE) is used to extract compact representations from the task-relevant part of the image. With the real-time robot state obtained easily from the hardware system, we fuse the latent representations from the VAE and the robot end-effector position together as the state of a Markov decision process. We use Soft Actor-Critic to train the robot to push different objects from random initial poses to target positions in simulation. Hindsight Experience Replay is applied during the training process to improve sample efficiency. Experiments demonstrate that our algorithm achieves pushing performance superior to a state-based baseline model that cannot be generalized to a different object and outperforms state-of-the-art policies that operate on raw image observations. Finally, we verify that our trained model generalizes well to unseen objects in the real world.

17.
Entropy (Basel) ; 24(10)2022 Oct 10.
Article in English | MEDLINE | ID: mdl-37420461

ABSTRACT

With the development of artificial intelligence, intelligent communication jamming decision-making is an important research direction in cognitive electronic warfare. In this paper, we consider a complex intelligent jamming decision scenario in which both communicating parties adjust physical-layer parameters to avoid jamming in a non-cooperative setting, and the jammer achieves accurate jamming by interacting with the environment. However, when the scenarios become complex and numerous, traditional reinforcement learning fails to converge and requires a large number of interactions, which is fatal and unrealistic in a real warfare environment. To solve this problem, we propose a deep reinforcement learning approach based on the maximum-entropy soft actor-critic (SAC) algorithm. In the proposed algorithm, we add an improved Wolpertinger architecture to the original SAC algorithm in order to reduce the number of interactions and improve accuracy. The results show that the proposed algorithm performs excellently in various jamming scenarios and achieves accurate, fast, and continuous jamming of both communicating parties.

18.
Sensors (Basel) ; 21(19)2021 Sep 29.
Article in English | MEDLINE | ID: mdl-34640820

ABSTRACT

Computation offloading technology extends cloud computing to the edge of the access network close to users, bringing many benefits to terminal devices with limited battery and computational resources. Nevertheless, the existing computation offloading approaches are challenging to apply to specific scenarios, such as the dense distribution of end-users and the sparse distribution of network infrastructure. The technological revolution in the unmanned aerial vehicle (UAV) and chip industry has granted UAVs more computing resources and promoted the emergence of UAV-assisted mobile edge computing (MEC) technology, which could be applied to those scenarios. However, in the MEC system with multiple users and multiple servers, making reasonable offloading decisions and allocating system resources is still a severe challenge. This paper studies the offloading decision and resource allocation problem in the UAV-assisted MEC environment with multiple users and servers. To ensure the quality of service for end-users, we set the weighted total cost of delay, energy consumption, and the size of discarded tasks as our optimization objective. We further formulate the joint optimization problem as a Markov decision process and apply the soft actor-critic (SAC) deep reinforcement learning algorithm to optimize the offloading policy. Numerical simulation results show that the offloading policy optimized by our proposed SAC-based dynamic computing offloading (SACDCO) algorithm effectively reduces the delay, energy consumption, and size of discarded tasks for the UAV-assisted MEC system. Compared with the fixed local-UAV scheme in the specific simulation setting, our proposed approach reduces system delay and energy consumption by approximately 50% and 200%, respectively.
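A brief sketch of the weighted total-cost objective mentioned above (delay, energy consumption, and size of discarded tasks); the weights and units are illustrative assumptions rather than the paper's exact formulation.

def offloading_cost(delay_s, energy_j, dropped_bits, w_delay=1.0, w_energy=0.5, w_drop=2.0):
    # weighted total cost to be minimized by the offloading/resource-allocation policy
    return w_delay * delay_s + w_energy * energy_j + w_drop * dropped_bits

# the SAC agent would choose offloading decisions and resource allocations that
# minimize this cost, equivalently maximizing its negative as the reward
reward = -offloading_cost(delay_s=0.12, energy_j=3.4, dropped_bits=0.0)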

19.
Sensors (Basel) ; 21(17)2021 Sep 01.
Article in English | MEDLINE | ID: mdl-34502781

ABSTRACT

This study aims to solve the problems of poor exploration ability, single strategy, and high training cost in autonomous underwater vehicle (AUV) motion planning tasks and to overcome certain difficulties, such as multiple constraints and a sparse reward environment. In this research, an end-to-end motion planning system based on deep reinforcement learning is proposed to solve the motion planning problem of an underactuated AUV. The system directly maps the state information of the AUV and the environment into the control instructions of the AUV. The system is based on the soft actor-critic (SAC) algorithm, which enhances the exploration ability and robustness to the AUV environment. We also use the method of generative adversarial imitation learning (GAIL) to assist its training to overcome the problem that learning a policy for the first time is difficult and time-consuming in reinforcement learning. A comprehensive external reward function is then designed to help the AUV smoothly reach the target point, and the distance and time are optimized as much as possible. Finally, the end-to-end motion planning algorithm proposed in this research is tested and compared on the basis of the Unity simulation platform. Results show that the algorithm has an optimal decision-making ability during navigation, a shorter route, less time consumption, and a smoother trajectory. Moreover, GAIL can speed up the AUV training speed and minimize the training time without affecting the planning effect of the SAC algorithm.


Subject(s)
Algorithms , Learning , Computer Simulation , Motion (Physics)
20.
Sensors (Basel) ; 21(17)2021 Sep 02.
Article in English | MEDLINE | ID: mdl-34502796

ABSTRACT

External disturbance poses the primary threat to robot balance in dynamic environments. This paper provides a learning-based control architecture for quadrupedal self-balancing that is adaptable to multiple unpredictable scenarios of external continuous disturbance. Different from conventional methods, which construct analytical models that explicitly reason about the balancing process, our work utilizes reinforcement learning and artificial neural networks to avoid incomprehensible mathematical modeling. The control policy is composed of a neural network and a Tanh Gaussian policy, which implicitly establishes a fuzzy mapping from proprioceptive signals to action commands. During the training process, the maximum-entropy method (soft actor-critic algorithm) is employed to endow the policy with powerful exploration and generalization ability. The trained policy is validated in both simulations and realistic experiments with a customized quadruped robot. The results demonstrate that the policy can be transferred to the real world without elaborate configuration. Moreover, although the policy is trained under merely one specific vibration condition, it remains robust under conditions never encountered during training.
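A compact sketch of a Tanh Gaussian policy head of the kind used in SAC-style training: a squashed Gaussian maps proprioceptive inputs to bounded action commands, with the usual change-of-variables correction on the log-probability. Layer sizes and dimensions are illustrative assumptions.

import torch
import torch.nn as nn
from torch.distributions import Normal

class TanhGaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        std = self.log_std(h).clamp(-5, 2).exp()
        dist = Normal(self.mu(h), std)
        pre_tanh = dist.rsample()                     # reparameterized sample for gradient flow
        action = torch.tanh(pre_tanh)                 # bounded action command
        # log-prob with the tanh change-of-variables correction, as in SAC
        log_prob = dist.log_prob(pre_tanh) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1)

policy = TanhGaussianPolicy(obs_dim=24, act_dim=12)
act, logp = policy(torch.randn(1, 24))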


Subject(s)
Neural Networks, Computer , Reinforcement, Psychology , Algorithms , Entropy , Learning