Efficient Model-Based Concave Utility Reinforcement Learning through
Greedy Mirror Descent
- URL: http://arxiv.org/abs/2311.18346v1
- Date: Thu, 30 Nov 2023 08:32:50 GMT
- Title: Efficient Model-Based Concave Utility Reinforcement Learning through
Greedy Mirror Descent
- Authors: Bianca Marin Moreno (Thoth), Margaux Brégère (EDF R&D, LPSM, SU),
Pierre Gaillard (Thoth), Nadia Oudjane (EDF R&D)
- Abstract summary: Concave Utility Reinforcement Learning problem invalidates classical Bellman equations.
We introduce MD-CURL, a new algorithm for CURL in a finite horizon Markov decision process.
We present Greedy MD-CURL, a new method adapting MD-CURL to an online, episode-based setting.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many machine learning tasks can be solved by minimizing a convex
function of an occupancy measure over the set of policies that generate it.
These include reinforcement learning and imitation learning, among others. This more general
paradigm is called the Concave Utility Reinforcement Learning problem (CURL).
Since CURL invalidates classical Bellman equations, it requires new algorithms.
We introduce MD-CURL, a new algorithm for CURL in a finite horizon Markov
decision process. MD-CURL is inspired by mirror descent and uses a non-standard
regularization to achieve convergence guarantees and a simple closed-form
solution, eliminating the need for computationally expensive projection steps
typically found in mirror descent approaches. We then extend CURL to an online
learning scenario and present Greedy MD-CURL, a new method adapting MD-CURL to
an online, episode-based setting with partially unknown dynamics. Like MD-CURL,
Greedy MD-CURL benefits from low computational complexity, while guaranteeing
sub-linear, or even logarithmic, regret depending on the level of information
available about the underlying dynamics.
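To make the mirror descent machinery concrete, here is a minimal Python sketch (not the authors' implementation; the toy objective, step size, and single-distribution setup are illustrative assumptions) of minimizing a convex function over the probability simplex. With a KL-divergence regularizer, each mirror descent step has a closed-form multiplicative update, so no explicit Euclidean projection onto the simplex is needed; MD-CURL applies this kind of closed-form update, with a non-standard regularization, to occupancy measures of a finite-horizon MDP.

```python
import numpy as np

def mirror_descent_step(mu, grad, eta):
    """Closed-form KL mirror descent step on the simplex:
    argmin_p  <grad, p> + (1/eta) * KL(p || mu).
    The update is multiplicative; renormalization replaces projection."""
    logits = np.log(mu) - eta * grad
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Toy CURL-like objective (an assumption for illustration): drive the
# distribution toward a fixed target (distribution matching), a goal a
# linear reward cannot express but a convex function of mu can.
target = np.array([0.5, 0.3, 0.2])

def grad_F(mu):
    # gradient of F(mu) = 0.5 * ||mu - target||^2, which is convex in mu
    return mu - target

mu = np.ones(3) / 3                 # uniform initialization
for _ in range(200):
    mu = mirror_descent_step(mu, grad_F(mu), eta=0.5)

print(mu)                           # approaches `target`, with no projection step
```

In the full finite-horizon algorithm, the analogous closed-form update is applied per state and time step to the policy, and Greedy MD-CURL additionally estimates the unknown part of the dynamics from the episodes observed so far.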
Related papers
- Quantum Algorithm for Sparse Online Learning with Truncated Gradient Descent [2.148134736383802]
Logistic regression, the support vector machine (SVM), and least squares are well-studied methods in the statistics and computer science communities.
We develop a quantum sparse online learning algorithm for logistic regression, the SVM, and least squares.
arXiv Detail & Related papers (2024-11-06T13:57:50Z)
- MetaCURL: Non-stationary Concave Utility Reinforcement Learning [8.230945193151399]
We explore online learning in episodic loop-free Markov decision processes in non-stationary environments.
This paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs.
arXiv Detail & Related papers (2024-05-30T08:17:00Z)
- Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory [17.62475351325657]
We consider inverse reinforcement learning problems with concave utilities.
We show that most of the standard IRL results do not apply to CURL in general, since CURL invalidates the classical Bellman equations.
We propose a new definition of the feasible rewards for I-CURL by proving that this problem is equivalent to an inverse game-theory problem in a subclass of mean-field games.
arXiv Detail & Related papers (2024-05-29T12:07:17Z)
- Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- Online Reinforcement Learning in Markov Decision Process Using Linear Programming [1.0878040851638]
We consider online reinforcement learning in episodic Markov decision process (MDP) with unknown transition function and rewards drawn from some fixed but unknown distribution.
We devise a simple and efficient model-based algorithm that achieves $\widetilde{O}(LX\sqrt{TA})$ regret with high probability.
arXiv Detail & Related papers (2023-03-31T22:21:41Z)
- APP: Anytime Progressive Pruning [104.36308667437397]
We propose a novel way of training a neural network with a target sparsity in a particular case of online learning: the anytime learning at macroscale paradigm (ALMA).
The proposed approach significantly outperforms the baseline dense and Anytime OSP models across multiple architectures and datasets under short, moderate, and long-sequence training.
arXiv Detail & Related papers (2022-04-04T16:38:55Z)
- Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
arXiv Detail & Related papers (2021-12-29T02:42:59Z)
- Online Sub-Sampling for Reinforcement Learning with General Function Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z)
- Masked Contrastive Representation Learning for Reinforcement Learning [202.8261654227565]
CURL is an efficient algorithm that uses contrastive learning to extract high-level features from the raw pixels of individual video frames.
We propose a new algorithm, masked contrastive representation learning for RL, that takes the correlation among consecutive inputs into consideration.
Our method achieves consistent improvements over CURL on $14$ out of $16$ environments from the DMControl suite and $21$ out of $26$ environments from the Atari 2600 games.
arXiv Detail & Related papers (2020-10-15T02:00:10Z)
- CURL: Contrastive Unsupervised Representations for Reinforcement Learning [93.57637441080603]
CURL extracts high-level features from raw pixels using contrastive learning.
On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample-efficiency of methods that use state-based features.
arXiv Detail & Related papers (2020-04-08T17:40:43Z)
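The contrastive objective referenced in the two CURL representation-learning entries above can be made concrete with a short sketch. The bilinear similarity and batch-of-negatives setup below follow the standard InfoNCE construction; the array shapes, the weight matrix `W`, and the function name are illustrative assumptions rather than those papers' exact code.

```python
import numpy as np

def infonce_loss(q, k, W):
    """InfoNCE-style contrastive loss with bilinear similarity q^T W k.
    Row i of `q` (an encoded anchor) should match row i of `k` (its
    positive key); all other rows in the batch act as negatives."""
    logits = q @ W @ k.T                                  # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # positives sit on the diagonal

# Usage with random stand-ins for encoder outputs:
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 32))   # e.g., features of augmented frame crops
k = rng.normal(size=(8, 32))   # features of differently augmented crops of the same frames
W = np.eye(32)                 # simple fixed choice here; learned in practice
print(infonce_loss(q, k, W))
```

Minimizing this loss pulls each anchor toward its own key and away from the other keys in the batch, which is how high-level features can be learned from raw pixels without reward supervision.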