Boosting Weak-to-Strong Agents in Multiagent Reinforcement Learning via Balanced PPO.
Article in English | MEDLINE | ID: mdl-39141463
ABSTRACT
Multiagent policy gradient (MAPG) methods, an essential branch of reinforcement learning (RL), have made great progress in both industry and academia. However, existing models overlook the inadequate training of individual policies, which limits overall performance. We verify the existence of imbalanced training in multiagent tasks and formally define it as the imbalance between policies (IBP). To address the IBP issue, we propose a dynamic policy balance (DPB) model that balances the learning of each policy by dynamically reweighting the training samples. In addition, current methods pursue better performance by strengthening the exploration of all policies uniformly, which disregards training differences within the team and reduces learning efficiency. To overcome this drawback, we derive a technique named weighted entropy regularization (WER), a team-level exploration scheme with additional incentives for individuals that exceed the team average. DPB and WER are evaluated on homogeneous and heterogeneous tasks, where they effectively alleviate the imbalanced training problem and improve exploration efficiency. Furthermore, the experimental results show that our models outperform state-of-the-art MAPG methods, with an average performance gain of over 12.1%.
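The abstract does not give the exact form of DPB's sample reweighting or WER's exploration incentive. The sketch below is one minimal PyTorch reading of the idea, not the authors' implementation: the function name `balanced_ppo_loss`, the softmax-based agent weights, and the margin-based entropy weights are all illustrative assumptions layered on a standard clipped PPO surrogate.

```python
import torch

def balanced_ppo_loss(log_probs, old_log_probs, advantages, entropies,
                      agent_returns, clip_eps=0.2, ent_coef=0.01):
    """Hypothetical PPO loss with DPB-style reweighting and WER-style entropy.

    Shapes: log_probs, old_log_probs, advantages, entropies are
    (n_agents, batch); agent_returns is (n_agents,) and holds each
    agent's recent mean return.
    """
    n_agents = agent_returns.shape[0]
    team_mean = agent_returns.mean()

    # DPB-style reweighting (assumed form): agents whose recent return lags
    # the team mean get larger sample weights, so under-trained policies
    # contribute more to the gradient. Scaling by n_agents keeps the
    # weights averaging to one.
    dpb_w = torch.softmax(team_mean - agent_returns, dim=0) * n_agents

    # Standard clipped PPO surrogate, weighted per agent.
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -(dpb_w.unsqueeze(1) * torch.min(unclipped, clipped)).mean()

    # WER-style exploration bonus (assumed form): every agent receives the
    # team-level entropy coefficient, and agents exceeding the team mean
    # get an additional incentive proportional to their margin.
    margin = torch.relu(agent_returns - team_mean)
    wer_w = 1.0 + margin / (margin.max() + 1e-8)
    entropy_term = (ent_coef * wer_w.unsqueeze(1) * entropies).mean()

    return policy_loss - entropy_term

if __name__ == "__main__":
    torch.manual_seed(0)
    n_agents, batch = 3, 8
    loss = balanced_ppo_loss(
        log_probs=torch.randn(n_agents, batch),
        old_log_probs=torch.randn(n_agents, batch),
        advantages=torch.randn(n_agents, batch),
        entropies=torch.rand(n_agents, batch),
        agent_returns=torch.tensor([1.0, 0.2, 0.7]),
    )
    print(f"balanced PPO loss: {loss.item():.4f}")
```

In this reading, the softmax keeps the per-agent weights positive and averaging to one, so reweighting shifts gradient mass toward lagging agents without changing the overall update scale; the paper's actual weighting and incentive schemes may differ.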

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: IEEE Trans Neural Netw Learn Syst Year: 2024 Document type: Article Country of publication: United States
