Optimal coordination of resources: A solution from reinforcement learning
- URL: http://arxiv.org/abs/2312.14970v2
- Date: Thu, 20 Feb 2025 11:42:23 GMT
- Title: Optimal coordination of resources: A solution from reinforcement learning
- Authors: Guozhong Zheng, Weiran Cai, Guanxiao Qi, Jiqiang Zhang, Li Chen
- Abstract summary: The Minority Game (MG) is perhaps the simplest toy model to address this issue. We introduce the reinforcement learning paradigm to MG, where individuals adjust decisions based on accumulated experience. We find that this RL framework achieves optimal resource coordination when individuals balance the exploitation of experience with random exploration.
- Score: 6.0413802011767705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient allocation is important in nature and human society, where individuals frequently compete for limited resources. The Minority Game (MG) is perhaps the simplest toy model to address this issue. However, most previous solutions assume that the strategies are provided a priori and static, failing to capture their adaptive nature. Here, we introduce the reinforcement learning (RL) paradigm to MG, where individuals adjust decisions based on accumulated experience and expected rewards dynamically. We find that this RL framework achieves optimal resource coordination when individuals balance the exploitation of experience with random exploration. Yet, the imbalanced strategies of the two lead to suboptimal partial coordination or even anti-coordination. Our mechanistic analysis reveals a symmetry-breaking in action preferences at the optimum, offering a fresh solution to the MG and new insights into the resource allocation problem.
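The abstract does not spell out the exact learning rule, so the following is a minimal sketch, assuming epsilon-greedy Q-learning agents and illustrative parameter values; it is meant only to make the exploitation-versus-exploration trade-off concrete, not to reproduce the paper's implementation.

```python
import numpy as np

# Minimal Minority Game with RL agents (illustrative sketch, not the paper's
# exact algorithm). N agents repeatedly choose one of two sides; agents on the
# minority side are rewarded. Each agent keeps Q-values for the two actions
# (its accumulated experience) and acts epsilon-greedily, balancing
# exploitation of experience with random exploration.

rng = np.random.default_rng(0)
N = 201            # odd number of agents, so a strict minority always exists
alpha = 0.1        # learning rate (assumed value)
epsilon = 0.05     # exploration probability (assumed value)
rounds = 5000

Q = rng.normal(0.0, 0.01, (N, 2))   # small random init to break ties
attendance = []

for t in range(rounds):
    explore = rng.random(N) < epsilon
    greedy = Q.argmax(axis=1)
    actions = np.where(explore, rng.integers(0, 2, N), greedy)

    counts = np.bincount(actions, minlength=2)
    minority = counts.argmin()                   # the less-crowded side wins
    rewards = (actions == minority).astype(float)

    # Move each agent's estimate for the chosen action toward its reward.
    Q[np.arange(N), actions] += alpha * (rewards - Q[np.arange(N), actions])
    attendance.append(counts[0])

# Near-optimal coordination shows up as low variance of attendance around N/2.
print("attendance variance (last 1000 rounds):", np.var(attendance[-1000:]))
```

With a balanced choice of learning rate and exploration, the attendance fluctuations shrink toward the coordinated regime described in the abstract; pushing epsilon to 0 or 1 reproduces the imbalanced, suboptimal behavior it mentions.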
Related papers
- Learning to Lead: Incentivizing Strategic Agents in the Dark [50.93875404941184]
We study an online learning version of the generalized principal-agent model. We develop the first provably sample-efficient algorithm for this challenging setting. We establish a near-optimal $\tilde{O}(\sqrt{T})$ regret bound for learning the principal's optimal policy.
arXiv Detail & Related papers (2025-06-10T04:25:04Z) - Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs [9.369330148791201]
Game-theoretic resource allocation on graphs (GRAG) is a problem modeled as a multi-step Colonel Blotto Game (MCBG). We formulate the MCBG as a Markov Decision Process (MDP) and apply Reinforcement Learning (RL) methods, specifically Deep Q-Network (DQN) and Proximal Policy Optimization (PPO). We evaluate RL performance across a variety of graph structures and initial resource distributions, comparing against random, greedy, and learned RL policies.
arXiv Detail & Related papers (2025-05-08T21:12:34Z) - Learning to Assist Humans without Inferring Rewards [65.28156318196397]
We build upon prior work that studies assistance through the lens of empowerment.
An assistive agent aims to maximize the influence of the human's actions.
We prove that these representations estimate a similar notion of empowerment to that studied by prior work.
arXiv Detail & Related papers (2024-11-04T21:31:04Z) - Learning in Multi-Objective Public Goods Games with Non-Linear Utilities [8.243788683895376]
We study learning in a novel multi-objective version of the Public Goods Game where agents have different risk preferences.
We study the interplay between such preference modelling and environmental uncertainty on the incentive alignment level in the game.
arXiv Detail & Related papers (2024-08-01T16:24:37Z) - Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - Multi-turn Reinforcement Learning from Preference Human Feedback [41.327438095745315]
Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models with human preferences.
Existing methods work by emulating the preferences at the single decision (turn) level.
We develop novel methods for Reinforcement Learning from preference feedback between two full multi-turn conversations.
arXiv Detail & Related papers (2024-05-23T14:53:54Z) - Non-linear Welfare-Aware Strategic Learning [10.448052192725168]
This paper studies algorithmic decision-making in the presence of strategic individual behaviors.
We first generalize the agent best response model in previous works to the non-linear setting.
We show that the three welfare measures can attain the optimum simultaneously only under restrictive conditions.
arXiv Detail & Related papers (2024-05-03T01:50:03Z) - RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning [8.389454219309837]
Multimodal optimization problems (MMOPs) require finding all optimal solutions, which is challenging under limited function evaluations.
We propose RLEMMO, a Meta-Black-Box Optimization framework, which maintains a population of solutions and incorporates a reinforcement learning agent.
With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm.
arXiv Detail & Related papers (2024-04-12T05:02:49Z) - MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with
Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z) - WARM: On the Benefits of Weight Averaged Reward Models [63.08179139233774]
We propose Weight Averaged Reward Models (WARM) to mitigate reward hacking.
Experiments on summarization tasks, using best-of-N and RL methods, show that WARM improves the overall quality and alignment of LLM predictions. A minimal weight-averaging sketch is given after this list.
arXiv Detail & Related papers (2024-01-22T18:27:08Z) - A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Unsupervised Resource Allocation with Graph Neural Networks [0.0]
We present an approach for maximizing a global utility function by learning how to allocate resources in an unsupervised way.
We propose to learn the reward structure for near-optimal allocation policies with a GNN.
arXiv Detail & Related papers (2021-06-17T18:44:04Z) - Learning Strategies in Decentralized Matching Markets under Uncertain
Preferences [91.3755431537592]
We study the problem of decision-making in the setting of a scarcity of shared resources when the preferences of agents are unknown a priori.
Our approach is based on the representation of preferences in a reproducing kernel Hilbert space.
We derive optimal strategies that maximize agents' expected payoffs.
arXiv Detail & Related papers (2020-10-29T03:08:22Z) - On Information Asymmetry in Competitive Multi-Agent Reinforcement
Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting, non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z) - Resource Allocation via Model-Free Deep Learning in Free Space Optical
Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z) - Decentralized Reinforcement Learning: Global Decision-Making via Local
Economic Transactions [80.49176924360499]
We establish a framework for directing a society of simple, specialized, self-interested agents to solve sequential decision problems.
We derive a class of decentralized reinforcement learning algorithms.
We demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
arXiv Detail & Related papers (2020-07-05T16:41:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.