Multi-User Reinforcement Learning with Low Rank Rewards
- URL: http://arxiv.org/abs/2210.05355v2
- Date: Mon, 22 May 2023 13:09:52 GMT
- Title: Multi-User Reinforcement Learning with Low Rank Rewards
- Authors: Naman Agarwal, Prateek Jain, Suhas Kowshik, Dheeraj Nagaraj and
Praneeth Netrapalli
- Abstract summary: Our main contribution is an algorithm which explores rewards collaboratively across the $N$ user-specific MDPs.
When $N$ is large and the rank is constant, the sample complexity per MDP depends logarithmically on the size of the state space.
- Score: 41.15103860230677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we consider the problem of collaborative multi-user
reinforcement learning. In this setting there are multiple users with the same
state-action space and transition probabilities but with different rewards.
Under the assumption that the reward matrix of the $N$ users has a low-rank
structure -- a standard and practically successful assumption in the offline
collaborative filtering setting -- the question is whether we can design
algorithms with significantly lower sample complexity than algorithms that
learn the MDP individually for each user. Our main contribution is an algorithm
which explores rewards collaboratively across the $N$ user-specific MDPs and
can learn rewards efficiently in two key settings: tabular MDPs and linear
MDPs. When $N$ is large and the rank is constant, the sample complexity per MDP
depends logarithmically on the size of the state space, which represents an
exponential reduction (in the state-space size) compared to standard
"non-collaborative" algorithms.
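To make the low-rank reward assumption concrete, the sketch below (an illustration only, not the paper's algorithm) shows how an $N \times S$ reward matrix of rank $r$ can be approximately recovered from a small random subset of (user, state) reward observations by rescaling the observed entries and truncating an SVD; the dimensions, sampling rate, and recovery procedure are all assumptions made for the example.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): if the N x S reward matrix
# R has rank r << min(N, S), observing only a small subset of (user, state)
# rewards can suffice to approximately recover all of R.
rng = np.random.default_rng(0)

N, S, r = 200, 500, 3                      # users, states, assumed reward rank
U = rng.normal(size=(N, r))                # hypothetical user factors
V = rng.normal(size=(S, r))                # hypothetical state factors
R = U @ V.T                                # true low-rank reward matrix

# Each user observes rewards for only a small random subset of states.
mask = rng.random((N, S)) < 0.1
R_obs = np.where(mask, R, 0.0)

# Naive collaborative estimate: rescale observed entries by the sampling rate
# and truncate to rank r. (Real matrix-completion methods are more careful;
# this is only a sketch of the low-rank recovery idea.)
R_rescaled = R_obs / mask.mean()
u, s, vt = np.linalg.svd(R_rescaled, full_matrices=False)
R_hat = (u[:, :r] * s[:r]) @ vt[:r]

rel_err = np.linalg.norm(R_hat - R) / np.linalg.norm(R)
print(f"relative recovery error: {rel_err:.3f}")
```

The point of the sketch is that when $N$ is large, each user only needs to observe a small fraction of states for the pooled observations to pin down the shared low-rank structure, which is the intuition behind the logarithmic per-MDP sample complexity claimed in the abstract.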
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning [15.46907000938726]
We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL).
We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC.
We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., $N$-chain), a video game, and a real-world problem in energy systems.
arXiv Detail & Related papers (2024-04-16T17:01:38Z) - User-Level Differential Privacy With Few Examples Per User [73.81862394073308]
We consider the example-scarce regime, where each user has only a few examples, and obtain the following results.
For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm.
We present a simple technique for adapting the exponential mechanism [McSherry, Talwar FOCS 2007] to the user-level setting.
arXiv Detail & Related papers (2023-09-21T21:51:55Z) - Improved Sample Complexity for Reward-free Reinforcement Learning under
Low-rank MDPs [43.53286390357673]
This paper focuses on reward-free reinforcement learning under low-rank MDP models.
We first provide the first known sample complexity lower bound for any algorithm under low-rank MDPs.
We then propose a novel model-based algorithm, coined RAFFLE, and show it can both find an $\epsilon$-optimal policy and achieve an $\epsilon$-accurate system identification.
arXiv Detail & Related papers (2023-03-20T04:39:39Z) - (Private) Kernelized Bandits with Distributed Biased Feedback [13.312928989951505]
We study kernelized bandits with distributed biased feedback.
A new distributed phase-then-batch-based elimination (DPBE) algorithm is proposed.
We show that DPBE achieves a sublinear regret of $\tilde{O}(T^{1-\alpha/2}+\sqrt{\gamma_T T})$, where $\alpha \in (0,1)$ is the user-sampling parameter one can tune.
arXiv Detail & Related papers (2023-01-28T02:30:15Z) - The Minority Matters: A Diversity-Promoting Collaborative Metric
Learning Algorithm [154.47590401735323]
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems.
This paper focuses on a challenging scenario where a user has multiple categories of interests.
We propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML).
arXiv Detail & Related papers (2022-09-30T08:02:18Z) - Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov
Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that this separation does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z) - New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods.
arXiv Detail & Related papers (2021-12-09T07:29:57Z)