Optimal coordination in Minority Game: A solution from reinforcement
learning
- URL: http://arxiv.org/abs/2312.14970v1
- Date: Wed, 20 Dec 2023 00:47:45 GMT
- Title: Optimal coordination in Minority Game: A solution from reinforcement
learning
- Authors: Guozhong Zheng, Weiran Cai, Guanxiao Qi, Jiqiang Zhang, and Li Chen
- Abstract summary: The Minority Game is perhaps the simplest model that provides insights into how humans coordinate to maximize resource utilization.
Here, we turn to the paradigm of reinforcement learning, where individuals' strategies evolve by evaluating both past experience and future rewards.
We reveal that the population is able to reach the optimal allocation when individuals weigh both past experience and future rewards.
- Score: 6.0413802011767705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient allocation is important in nature and human society, where
individuals often compete for finite resources. The Minority Game is perhaps
the simplest model that provides deep insights into how humans coordinate to
maximize resource utilization. However, this model assumes static strategies
that are provided a priori, failing to capture their adaptive nature. Here, we
turn to the paradigm of reinforcement learning, where individuals' strategies
evolve by evaluating both past experience and future rewards. Specifically, we
adopt the Q-learning algorithm, in which each player is endowed with a Q-table
that guides their decision-making. We reveal that the population is able to
reach the optimal allocation when individuals weigh both past experience and
future rewards, and when they balance exploitation of their Q-tables with
exploration through random actions. The optimal allocation is ruined when
individuals rely on either exploitation only or exploration only, in which case
only partial coordination, or even anti-coordination, is observed. Mechanism
analysis reveals that a moderate level of exploration allows the population to
escape the local minima of metastable periodic states and reach the optimal
coordination as the global minimum. Interestingly, the optimal coordination is
underpinned by a symmetry breaking of action preferences, in which nearly half
of the population chooses one side while the other half prefers the other. The
emergence of optimal coordination is robust to the population size and other
game parameters. Our work therefore provides a natural solution to the Minority
Game and sheds light on the resource allocation problem in general. Moreover,
it demonstrates the potential of the reinforcement learning paradigm for
deciphering many puzzles in the socio-economic context.
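The setup described in the abstract can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a two-action Q-table per agent indexed by the previous winning (minority) side, epsilon-greedy action selection to balance exploitation and exploration, and a reward of 1 for landing on the minority side. The parameter values (alpha, gamma, eps) are illustrative choices, not those of the paper.

```python
import random

def minority_game(n_agents=101, rounds=2000, alpha=0.1, gamma=0.9, eps=0.05, seed=0):
    """Q-learning agents play the Minority Game (a hypothetical sketch).

    Each agent keeps a Q-table indexed by the previous winning side
    (state 0 or 1) and its own action (0 or 1). With probability eps an
    agent explores by acting randomly; otherwise it exploits its Q-table.
    Returns the attendance (number of agents choosing side 1) per round.
    """
    rng = random.Random(seed)
    # Q[agent][state][action], initialized to zero
    Q = [[[0.0, 0.0] for _ in range(2)] for _ in range(n_agents)]
    state = rng.randrange(2)  # previous minority (winning) side
    attendance = []
    for _ in range(rounds):
        actions = []
        for q in Q:
            if rng.random() < eps:
                a = rng.randrange(2)  # exploration: random action
            else:
                a = 0 if q[state][0] >= q[state][1] else 1  # exploitation
            actions.append(a)
        n_one = sum(actions)
        winner = 1 if n_one < n_agents - n_one else 0  # minority side wins
        for q, a in zip(Q, actions):
            reward = 1.0 if a == winner else 0.0
            # standard Q-learning update toward reward plus discounted future value
            best_next = max(q[winner])
            q[state][a] += alpha * (reward + gamma * best_next - q[state][a])
        state = winner
        attendance.append(n_one)
    return attendance
```

With a moderate eps, the paper's finding suggests attendance should settle near n_agents / 2, reflecting the symmetry-broken split in which roughly half the population prefers each side; setting eps to 0 (exploitation only) or 1 (exploration only) should degrade this coordination.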
Related papers
- Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - Multi-turn Reinforcement Learning from Preference Human Feedback [41.327438095745315]
Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models with human preferences.
Existing methods work by emulating the preferences at the single decision (turn) level.
We develop novel methods for Reinforcement Learning from preference feedback between two full multi-turn conversations.
arXiv Detail & Related papers (2024-05-23T14:53:54Z) - Non-linear Welfare-Aware Strategic Learning [10.448052192725168]
This paper studies algorithmic decision-making in the presence of strategic individual behaviors.
We first generalize the agent best response model in previous works to the non-linear setting.
We show that the three welfare measures can attain the optimum simultaneously only under restrictive conditions.
arXiv Detail & Related papers (2024-05-03T01:50:03Z) - RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning [8.389454219309837]
Multimodal optimization problems (MMOPs) require finding all optimal solutions, which is challenging under limited function evaluations.
We propose RLEMMO, a Meta-Black-Box Optimization framework, which maintains a population of solutions and incorporates a reinforcement learning agent.
With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm.
arXiv Detail & Related papers (2024-04-12T05:02:49Z) - MaxMin-RLHF: Alignment with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z) - WARM: On the Benefits of Weight Averaged Reward Models [63.08179139233774]
We propose Weight Averaged Reward Models (WARM) to mitigate reward hacking.
Experiments on summarization tasks, using best-of-N and RL methods, show that WARM improves the overall quality and alignment of LLM predictions.
arXiv Detail & Related papers (2024-01-22T18:27:08Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Unsupervised Resource Allocation with Graph Neural Networks [0.0]
We present an approach for maximizing a global utility function by learning how to allocate resources in an unsupervised way.
We propose to learn the reward structure for near-optimal allocation policies with a GNN.
arXiv Detail & Related papers (2021-06-17T18:44:04Z) - Resource Allocation via Model-Free Deep Learning in Free Space Optical
Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z) - Decentralized Reinforcement Learning: Global Decision-Making via Local
Economic Transactions [80.49176924360499]
We establish a framework for directing a society of simple, specialized, self-interested agents to solve sequential decision problems.
We derive a class of decentralized reinforcement learning algorithms.
We demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
arXiv Detail & Related papers (2020-07-05T16:41:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.