Egoism, utilitarianism and egalitarianism in multi-agent reinforcement learning.
Dong, Shaokang; Li, Chao; Yang, Shangdong; An, Bo; Li, Wenbin; Gao, Yang.
Affiliations
  • Dong S; State Key Laboratory for Novel Software Technology, Nanjing University, China. Electronic address: shaokangdong@smail.nju.edu.cn.
  • Li C; State Key Laboratory for Novel Software Technology, Nanjing University, China. Electronic address: chaoli1996@smail.nju.edu.cn.
  • Yang S; School of Computer Science, Nanjing University of Posts and Telecommunications, China. Electronic address: sdyang@njupt.edu.cn.
  • An B; School of Computer Science and Engineering, Nanyang Technological University, Singapore. Electronic address: boan@ntu.edu.sg.
  • Li W; State Key Laboratory for Novel Software Technology, Nanjing University, China; Shenzhen Research Institute of Nanjing University, China. Electronic address: liwenbin@nju.edu.cn.
  • Gao Y; State Key Laboratory for Novel Software Technology, Nanjing University, China. Electronic address: gaoy@nju.edu.cn.
Neural Netw; 178: 106544, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39053197
ABSTRACT
In multi-agent partially observable sequential decision problems with general-sum rewards, it is necessary to account for the egoism (individual rewards), utilitarianism (social welfare), and egalitarianism (fairness) criteria simultaneously. However, balancing these criteria is a challenge for current multi-agent reinforcement learning methods. Specifically, fully decentralized methods, which lack global information about all agents' rewards, observations and actions, fail to learn a balanced policy, while agents in centralized training (with decentralized execution) methods are reluctant to share private information because of concerns about exploitation by others. To address these issues, this paper proposes a Decentralized and Federated (D&F) paradigm, in which decentralized agents train egoistic policies using solely local information to pursue self-interest, and the federation controller primarily considers utilitarianism and egalitarianism. Meanwhile, the parameters of the decentralized and federated policies are mutually optimized under discrepancy constraints, akin to a server and client pattern, which ensures the balance between egoism, utilitarianism, and egalitarianism. Furthermore, theoretical analysis shows that the federated model, as well as the discrepancy between decentralized egoistic policies and federated utilitarian policies, attains an O(1/T) convergence rate. Extensive experiments show that the D&F approach outperforms multiple baselines in terms of both utilitarianism and egalitarianism.
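The abstract does not give the objective, the update rule, or the evaluation metrics, so the following is only a minimal sketch of the server and client pattern it describes, not the authors' algorithm. It assumes a toy quadratic "egoistic" loss per agent, a proximal penalty (in the style of FedProx) standing in for the discrepancy constraint, and plain parameter averaging for the federation controller. The metric definitions (sum of returns for utilitarianism, worst-off agent's return for egalitarianism) and all function names and parameters (`local_step`, `federated_step`, `mu`, `lr`) are hypothetical.

```python
import numpy as np


def utilitarian_welfare(returns):
    """Social welfare as the sum of per-agent returns (assumed metric)."""
    return float(np.sum(returns))


def egalitarian_welfare(returns):
    """Fairness as the worst-off agent's return (assumed metric)."""
    return float(np.min(returns))


def local_step(theta_i, theta_fed, target_i, lr=0.1, mu=0.5):
    """One gradient step for a single agent, using only its own target.

    Toy loss (an assumption, not taken from the paper):
        0.5 * ||theta_i - target_i||^2          egoistic term
      + 0.5 * mu * ||theta_i - theta_fed||^2    discrepancy penalty
    """
    grad = (theta_i - target_i) + mu * (theta_i - theta_fed)
    return theta_i - lr * grad


def federated_step(thetas):
    """Hypothetical controller update: average the agents' parameters."""
    return np.mean(thetas, axis=0)


def run(targets, rounds=200, lr=0.1, mu=0.5):
    """Alternate local (client) and federated (server) updates."""
    targets = np.asarray(targets, dtype=float)
    thetas = np.zeros_like(targets)          # one parameter row per agent
    theta_fed = np.zeros(targets.shape[1])   # federated parameters
    for _ in range(rounds):
        thetas = np.stack([
            local_step(theta, theta_fed, target, lr, mu)
            for theta, target in zip(thetas, targets)
        ])
        theta_fed = federated_step(thetas)
    return thetas, theta_fed


if __name__ == "__main__":
    thetas, theta_fed = run([[2.0, 0.0], [0.0, 2.0]])
    print("agent parameters:", thetas)
    print("federated parameters:", theta_fed)
```

In this toy setting, a larger `mu` pulls each agent's parameters closer to the federated ones, while `mu = 0` recovers fully independent egoistic updates; that trade-off is the only aspect of the D&F idea the sketch is meant to illustrate.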

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Reinforcement, Psychology | Limits: Humans | Language: En | Journal: Neural Netw | Journal subject: NEUROLOGY | Year: 2024 | Document type: Article | Country of publication: United States
