Egoism, utilitarianism and egalitarianism in multi-agent reinforcement learning.

Dong, Shaokang; Li, Chao; Yang, Shangdong; An, Bo; Li, Wenbin; Gao, Yang

Affiliation

Dong S; State Key Laboratory for Novel Software Technology, Nanjing University, China. Electronic address: shaokangdong@smail.nju.edu.cn.
Li C; State Key Laboratory for Novel Software Technology, Nanjing University, China. Electronic address: chaoli1996@smail.nju.edu.cn.
Yang S; School of Computer Science, Nanjing University of Posts and Telecommunications, China. Electronic address: sdyang@njupt.edu.cn.
An B; School of Computer Science and Engineering, Nanyang Technological University, Singapore. Electronic address: boan@ntu.edu.sg.
Li W; State Key Laboratory for Novel Software Technology, Nanjing University, China; Shenzhen Research Institute of Nanjing University, China. Electronic address: liwenbin@nju.edu.cn.
Gao Y; State Key Laboratory for Novel Software Technology, Nanjing University, China. Electronic address: gaoy@nju.edu.cn.

Neural Netw; 178: 106544, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39053197

ABSTRACT

In multi-agent partially observable sequential decision problems with general-sum rewards, it is necessary to account for the egoism (individual rewards), utilitarianism (social welfare), and egalitarianism (fairness) criteria simultaneously. However, balancing these criteria is a challenge for current multi-agent reinforcement learning methods. Specifically, fully decentralized methods, which lack global information about all agents' rewards, observations, and actions, fail to learn a balanced policy, while agents in centralized training with decentralized execution methods are reluctant to share private information for fear of being exploited by others. To address these issues, this paper proposes a Decentralized and Federated (D&F) paradigm, in which decentralized agents train egoistic policies using only local information to pursue self-interest, and a federation controller primarily considers utilitarianism and egalitarianism. The parameters of the decentralized and federated policies are optimized jointly under mutual discrepancy constraints, akin to a server-client pattern, which maintains the balance between egoism, utilitarianism, and egalitarianism. Furthermore, theoretical analysis shows that the federated model, as well as the discrepancy between the decentralized egoistic policies and the federated utilitarian policies, attains an O(1/T) convergence rate. Extensive experiments show that the D&F approach outperforms multiple baselines in terms of both utilitarianism and egalitarianism.
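The tension between the three criteria named in the abstract can be illustrated with a small sketch. The abstract does not give formulas for them, so the choices below are assumptions made only for illustration: social welfare is taken as the sum of agents' returns, and fairness is approximated by the return of the worst-off agent. The function names and the episode returns are hypothetical and are not taken from the paper.

```python
def egoism(returns):
    """Per-agent view: each agent looks only at its own return."""
    return list(returns)


def utilitarian_welfare(returns):
    """Social welfare, assumed here to be the sum of all agents' returns."""
    return sum(returns)


def egalitarian_welfare(returns):
    """Fairness proxy (an assumption): the return of the worst-off agent."""
    return min(returns)


# Hypothetical episode returns for three agents under two joint policies.
policy_a = [9.0, 1.0, 2.0]  # high total, very uneven
policy_b = [4.0, 3.0, 4.0]  # slightly lower total, much more even

for name, returns in (("A", policy_a), ("B", policy_b)):
    print(
        name,
        "individual:", egoism(returns),
        "utilitarian:", utilitarian_welfare(returns),
        "egalitarian:", egalitarian_welfare(returns),
    )
```

Under these assumed definitions, policy A scores higher on the utilitarian measure while policy B scores higher on the egalitarian one, and the first agent individually prefers A. A method that optimizes only one criterion can therefore do poorly on the others.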

Subject(s)

Reinforcement, Psychology; Humans; Reward; Decision Making/physiology

Keywords

Egalitarianism; Egoism; General-sum rewards; Multi-agent reinforcement learning; Utilitarianism

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Reinforcement, Psychology Limits: Humans Language: En Journal: Neural Netw Journal subject: NEUROLOGY Year: 2024 Document type: Article Country of publication: United States
