Optimal coordination in Minority Game: A solution from reinforcement
learning
- URL: http://arxiv.org/abs/2312.14970v1
- Date: Wed, 20 Dec 2023 00:47:45 GMT
- Title: Optimal coordination in Minority Game: A solution from reinforcement
learning
- Authors: Guozhong Zheng, Weiran Cai, Guanxiao Qi, Jiqiang Zhang, and Li Chen
- Abstract summary: The Minority Game is perhaps the simplest model that provides insights into how humans coordinate to maximize resource utilization.
Here, we turn to the paradigm of reinforcement learning, where individuals' strategies evolve by evaluating both past experience and future rewards.
We reveal that the population is able to reach the optimal allocation when individuals weigh both past experience and future rewards.
- Score: 6.0413802011767705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient allocation is important in nature and human society, where
individuals often compete for finite resources. The Minority Game is perhaps
the simplest model that provides deep insights into how humans coordinate to
maximize resource utilization. However, this model assumes static strategies
that are provided a priori, failing to capture their adaptive nature. Here, we
turn to the paradigm of reinforcement learning, where individuals' strategies
evolve by evaluating both past experience and future rewards. Specifically, we
adopt the Q-learning algorithm, in which each player is endowed with a Q-table
that guides their decision-making. We reveal that the population is able to
reach the optimal allocation when individuals weigh both past experience and
future rewards, and when they balance exploitation of their Q-tables with
exploration through random actions. The optimal allocation is ruined when
individuals rely on either exploitation only or exploration only, in which case
only partial coordination, or even anti-coordination, is observed. Mechanism
analysis reveals that a moderate level of exploration allows the population to
escape the local minima of metastable periodic states and reach the optimal
coordination as the global minimum. Interestingly, the optimal coordination is
underpinned by a symmetry breaking of action preferences, in which nearly half
of the population chooses one side while the other half prefers the other. The
emergence of optimal coordination is robust to the population size and other
game parameters. Our work therefore provides a natural solution to the Minority
Game and sheds light on the resource allocation problem in general. Moreover,
it demonstrates the potential of the reinforcement learning paradigm for
deciphering many puzzles in the socio-economic context.
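The setup described in the abstract can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a two-action Q-table per agent indexed by the previous winning (minority) side, epsilon-greedy action selection to balance exploitation and exploration, and a reward of 1 for landing on the minority side. The parameter values (alpha, gamma, eps) are illustrative choices, not those of the paper.

```python
import random

def minority_game(n_agents=101, rounds=2000, alpha=0.1, gamma=0.9, eps=0.05, seed=0):
    """Q-learning agents play the Minority Game (a hypothetical sketch).

    Each agent keeps a Q-table indexed by the previous winning side
    (state 0 or 1) and its own action (0 or 1). With probability eps an
    agent explores by acting randomly; otherwise it exploits its Q-table.
    Returns the attendance (number of agents choosing side 1) per round.
    """
    rng = random.Random(seed)
    # Q[agent][state][action], initialized to zero
    Q = [[[0.0, 0.0] for _ in range(2)] for _ in range(n_agents)]
    state = rng.randrange(2)  # previous minority (winning) side
    attendance = []
    for _ in range(rounds):
        actions = []
        for q in Q:
            if rng.random() < eps:
                a = rng.randrange(2)  # exploration: random action
            else:
                a = 0 if q[state][0] >= q[state][1] else 1  # exploitation
            actions.append(a)
        n_one = sum(actions)
        winner = 1 if n_one < n_agents - n_one else 0  # minority side wins
        for q, a in zip(Q, actions):
            reward = 1.0 if a == winner else 0.0
            # standard Q-learning update toward reward plus discounted future value
            best_next = max(q[winner])
            q[state][a] += alpha * (reward + gamma * best_next - q[state][a])
        state = winner
        attendance.append(n_one)
    return attendance
```

With a moderate eps, the paper's finding suggests attendance should settle near n_agents / 2, reflecting the symmetry-broken split in which roughly half the population prefers each side; setting eps to 0 (exploitation only) or 1 (exploration only) should degrade this coordination.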
Related papers
- Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - Multi-turn Reinforcement Learning from Preference Human Feedback [41.327438095745315]
Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models with human preferences.
Existing methods work by emulating the preferences at the single decision (turn) level.
We develop novel methods for Reinforcement Learning from preference feedback between two full multi-turn conversations.
arXiv Detail & Related papers (2024-05-23T14:53:54Z) - Non-linear Welfare-Aware Strategic Learning [10.448052192725168]
This paper studies algorithmic decision-making in the presence of strategic individual behaviors.
We first generalize the agent best response model in previous works to the non-linear setting.
We show that the three welfare measures can attain the optimum simultaneously only under restrictive conditions.
arXiv Detail & Related papers (2024-05-03T01:50:03Z) - RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning [8.389454219309837]
Multimodal optimization problems (MMOPs) require finding all optimal solutions, which is challenging under limited function evaluations.
We propose RLEMMO, a Meta-Black-Box Optimization framework, which maintains a population of solutions and incorporates a reinforcement learning agent.
With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm.
arXiv Detail & Related papers (2024-04-12T05:02:49Z) - MaxMin-RLHF: Alignment with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z) - WARM: On the Benefits of Weight Averaged Reward Models [63.08179139233774]
We propose Weight Averaged Reward Models (WARM) to mitigate reward hacking.
Experiments on summarization tasks, using best-of-N and RL methods, show that WARM improves the overall quality and alignment of LLM predictions.
arXiv Detail & Related papers (2024-01-22T18:27:08Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Unsupervised Resource Allocation with Graph Neural Networks [0.0]
We present an approach for maximizing a global utility function by learning how to allocate resources in an unsupervised way.
We propose to learn the reward structure for near-optimal allocation policies with a GNN.
arXiv Detail & Related papers (2021-06-17T18:44:04Z) - Resource Allocation via Model-Free Deep Learning in Free Space Optical
Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z) - Decentralized Reinforcement Learning: Global Decision-Making via Local
Economic Transactions [80.49176924360499]
We establish a framework for directing a society of simple, specialized, self-interested agents to solve sequential decision problems.
We derive a class of decentralized reinforcement learning algorithms.
We demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
arXiv Detail & Related papers (2020-07-05T16:41:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.