Contractual Reinforcement Learning: Pulling Arms with Invisible Hands
- URL: http://arxiv.org/abs/2407.01458v2
- Date: Tue, 2 Jul 2024 15:17:50 GMT
- Title: Contractual Reinforcement Learning: Pulling Arms with Invisible Hands
- Authors: Jibang Wu, Siyu Chen, Mengdi Wang, Huazheng Wang, Haifeng Xu
- Abstract summary: We propose a theoretical framework for aligning the economic interests of different stakeholders in online learning problems through contract design.
For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against a far-sighted agent.
For the learning problem, we introduce a generic design of no-regret learning algorithms that untangles the challenges ranging from the robust design of contracts to the balance of exploration and exploitation.
- Score: 68.77645200579181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The agency problem emerges in today's large-scale machine learning tasks, where the learners are unable to direct content creation or enforce data collection. In this work, we propose a theoretical framework for aligning the economic interests of different stakeholders in online learning problems through contract design. The problem, termed "contractual reinforcement learning", naturally arises from the classic model of Markov decision processes, where a learning principal seeks to optimally influence the agent's action policy for their common interests through a set of payment rules contingent on the realization of the next state. For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against a far-sighted agent. For the learning problem, we introduce a generic design of no-regret learning algorithms that untangles the challenges ranging from the robust design of contracts to the balance of exploration and exploitation, reducing the complexity analysis to the construction of efficient search algorithms. For several natural classes of problems, we design tailored search algorithms that provably achieve $\tilde{O}(\sqrt{T})$ regret. We also present an algorithm with $\tilde{O}(T^{2/3})$ regret for the general problem, which improves the existing analysis in online contract design under mild technical assumptions.
Related papers
- On the Necessity of Collaboration for Online Model Selection with Decentralized Data [53.244188985271606]
We consider online model selection with decentralized data over $M$ clients, and study the necessity of collaboration among clients.
Our results show (i) collaboration is unnecessary in the absence of computational constraints on clients; (ii) collaboration is necessary if the computational cost on each client is limited to $o(K)$, where $K$ is the number of candidate hypothesis spaces.
arXiv Detail & Related papers (2024-04-15T06:32:28Z) - New Perspectives in Online Contract Design [2.296475290901356]
This work studies the repeated principal-agent problem from an online learning perspective.
The principal's goal is to learn the optimal contract that maximizes her utility through repeated interactions.
arXiv Detail & Related papers (2024-03-11T20:28:23Z) - Neural Algorithmic Reasoning Without Intermediate Supervision [21.852775399735005]
We focus on learning neural algorithmic reasoning only from the input-output pairs without appealing to the intermediate supervision.
We build a self-supervised objective that can regularise intermediate computations of the model without access to the algorithm trajectory.
We demonstrate that our approach is competitive with its trajectory-supervised counterpart on tasks from the CLRS Algorithmic Reasoning Benchmark.
arXiv Detail & Related papers (2023-06-23T09:57:44Z) - Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent's actions are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z) - Minimalistic Predictions to Schedule Jobs with Online Precedence Constraints [117.8317521974783]
We consider non-clairvoyant scheduling with online precedence constraints.
An algorithm is oblivious to any job dependencies and learns about a job only if all of its predecessors have been completed.
arXiv Detail & Related papers (2023-01-30T13:17:15Z) - Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z) - On Efficient Online Imitation Learning via Classification [17.416831207557603]
We study classification-based online imitation learning (abbrev. COIL) and the fundamental feasibility of designing oracle-efficient regret-minimization algorithms.
Our work puts classification-based online imitation learning, an important IL setup, on a firmer foundation.
arXiv Detail & Related papers (2022-09-26T17:34:36Z) - Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z) - Learning Robust Algorithms for Online Allocation Problems Using Adversarial Training [10.14260510961573]
We address the challenge of finding algorithms for online allocation (i.e. bipartite matching) using a machine learning approach.
In this paper, we focus on the AdWords problem, which is a classical online budgeted matching problem of both theoretical and practical significance.
arXiv Detail & Related papers (2020-10-16T14:33:11Z) - Reinforcement Learning for Combinatorial Optimization: A Survey [12.323976053967066]
Many traditional algorithms for solving optimization problems involve using hand-crafted heuristics that sequentially construct a solution.
Reinforcement learning (RL) proposes a good alternative to automate the search for these heuristics by training an agent in a supervised or self-supervised manner.
arXiv Detail & Related papers (2020-03-07T16:19:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.