Replicable Reinforcement Learning with Linear Function Approximation
- URL: http://arxiv.org/abs/2509.08660v2
- Date: Wed, 01 Oct 2025 14:47:11 GMT
- Title: Replicable Reinforcement Learning with Linear Function Approximation
- Authors: Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell
- Abstract summary: We introduce two efficient algorithms for replicable random design regression and uncentered covariance estimation. We then leverage these tools to provide the first provably efficient replicable RL algorithms for linear Markov decision processes. We evaluate our algorithms experimentally and show how they can inspire more consistent neural policies.
- Score: 21.370247743205056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Replication of experimental results has been a challenge faced by many scientific disciplines, including the field of machine learning. Recent work on the theory of machine learning has formalized replicability as the demand that an algorithm produce identical outcomes when executed twice on different samples from the same distribution. Provably replicable algorithms are especially interesting for reinforcement learning (RL), where algorithms are known to be unstable in practice. While replicable algorithms exist for tabular RL settings, extending these guarantees to more practical function approximation settings has remained an open problem. In this work, we make progress by developing replicable methods for linear function approximation in RL. We first introduce two efficient algorithms for replicable random design regression and uncentered covariance estimation, each of independent interest. We then leverage these tools to provide the first provably efficient replicable RL algorithms for linear Markov decision processes in both the generative model and episodic settings. Finally, we evaluate our algorithms experimentally and show how they can inspire more consistent neural policies.
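The replicability notion above is concrete enough to sketch. The toy below, assuming nothing from the paper beyond the definition, illustrates the randomized-rounding trick that underlies many replicable estimators: a statistic is snapped to a random grid generated from shared internal randomness, so two runs on fresh samples return bit-identical outputs with high probability. This is a minimal scalar-mean sketch, not the paper's regression or covariance routines.

```python
import numpy as np

def replicable_mean(samples, eps, rng):
    """Replicable mean estimation via randomized rounding.

    Rounds the empirical mean onto a grid of width ~eps whose offset is
    drawn from shared internal randomness (`rng`). Two runs on different
    samples from the same distribution land in the same grid cell with
    high probability, so they return the *identical* estimate.
    """
    offset = rng.uniform(0, eps)    # shared randomness: same offset in both runs
    empirical = np.mean(samples)
    # Snap to the nearest shared grid point.
    return np.round((empirical - offset) / eps) * eps + offset

# Two executions sharing the same internal random seed but with fresh data:
master_seed = 0
run1 = replicable_mean(np.random.default_rng(1).normal(0.5, 1.0, 100_000),
                       eps=0.05, rng=np.random.default_rng(master_seed))
run2 = replicable_mean(np.random.default_rng(2).normal(0.5, 1.0, 100_000),
                       eps=0.05, rng=np.random.default_rng(master_seed))
print(run1 == run2)  # True with high probability
```

The grid width trades accuracy for replicability: a coarser grid makes the two empirical means more likely to fall in the same cell, at the cost of a larger rounding error.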
Related papers
- Auto-exploration for online reinforcement learning [1.2788899058467404]
The exploration-exploitation dilemma in reinforcement learning is a fundamental challenge for efficient RL algorithms. Existing algorithms for finite state and action discounted RL problems address this by assuming sufficient exploration over both state and action spaces. We introduce a new class of methods with auto-exploration, i.e., methods that automatically explore both state and action spaces in a parameter-free way.
arXiv Detail & Related papers (2025-12-06T02:04:50Z) - The Cost of Replicability in Active Learning [1.349553980170339]
Active learning aims to reduce the number of labels required by machine learning algorithms by selectively querying the labels of initially unlabeled data points. Ensuring the replicability of results, where an algorithm consistently produces the same outcome across different runs, is essential for the reliability of machine learning models. This report investigates the cost of replicability in active learning using the CAL algorithm, a classical disagreement-based active learning method.
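For intuition, here is a minimal sketch of the classical CAL loop for the toy class of one-dimensional thresholds; the report's replicable variant and its cost analysis are not reproduced. The version space of consistent thresholds is an interval, and labels are purchased only inside the disagreement region.

```python
import numpy as np

def cal_threshold_1d(xs, oracle, lo=0.0, hi=1.0):
    """Minimal CAL sketch for 1-D threshold classifiers h_t(x) = 1[x >= t].

    The version space is an interval [lo, hi] of consistent thresholds.
    A point's label is queried only if it falls in the disagreement
    region (lo, hi); points outside it are labeled by consensus for free.
    """
    queries = 0
    for x in xs:
        if lo < x < hi:              # hypotheses in the version space disagree here
            y = oracle(x)            # pay for a label
            queries += 1
            if y == 1:
                hi = min(hi, x)      # true threshold must be <= x
            else:
                lo = max(lo, x)      # true threshold must be > x
    return (lo + hi) / 2, queries

rng = np.random.default_rng(0)
xs = rng.uniform(0, 1, 1000)
t_hat, n_queries = cal_threshold_1d(xs, oracle=lambda x: int(x >= 0.37))
print(t_hat, n_queries)   # close to 0.37, with far fewer than 1000 label queries
```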
arXiv Detail & Related papers (2024-12-12T19:03:31Z) - Towards model-free RL algorithms that scale well with unstructured data [1.3799571823220778]
We provide an algorithm that constructs reward-relevant general value function questions to find and exploit predictive structure directly from the experience stream.
The proposed algorithm reliably outperforms a conventional deep RL algorithm on these scaling problems.
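As background for the summary above: a general value function (GVF) is a prediction of the discounted sum of an arbitrary "cumulant" signal, learned with the same TD machinery as ordinary values. The sketch below shows a single linear TD(0) update for one GVF question; how the paper constructs reward-relevant questions automatically is not shown.

```python
import numpy as np

def gvf_td0_update(w, phi, phi_next, cumulant, gamma, lr=0.05):
    """One linear TD(0) update for a general value function (GVF).

    A GVF answers a predictive question: the expected discounted sum of an
    arbitrary cumulant signal (not necessarily the reward), given features.
    """
    td_error = cumulant + gamma * phi_next @ w - phi @ w
    return w + lr * td_error * phi

# Toy stream: learn to predict discounted future values of feature 0.
rng = np.random.default_rng(0)
w = np.zeros(4)
phi = rng.random(4)
for _ in range(5000):
    phi_next = rng.random(4)
    w = gvf_td0_update(w, phi, phi_next, cumulant=phi[0], gamma=0.9)
    phi = phi_next
print(w)
```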
arXiv Detail & Related papers (2023-11-03T20:03:54Z) - Replicable Reinforcement Learning [15.857503103543308]
We provide a provably replicable algorithm for parallel value iteration, and a provably replicable version of R-max in the episodic setting.
These are the first formal replicability results for control problems, which present different challenges for replication than batch learning settings.
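A loose sketch of how rounding can make a control algorithm replicable, assuming only the shared-randomness idea from the replicability literature: after each Bellman backup, state values are snapped to a per-iteration random grid, so two runs with slightly different estimated models tend to emit identical value functions and hence identical policies. The paper's parallel value iteration manages the statistical error budget far more carefully.

```python
import numpy as np

def replicable_value_iteration(P_hat, R, gamma, eps, rng, iters=100):
    """Sketch of value iteration made replicable by randomized rounding.

    After each Bellman backup, every state value is snapped to a random
    grid whose offsets come from shared internal randomness (`rng`).
    Two runs with slightly different estimated models P_hat then tend to
    produce bit-identical value functions and identical greedy policies.
    """
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P_hat @ V           # (S, A) Bellman backups
        V = Q.max(axis=1)
        offsets = rng.uniform(0, eps, S)    # shared randomness, fresh per iteration
        V = np.round((V - offsets) / eps) * eps + offsets
    return (R + gamma * P_hat @ V).argmax(axis=1)   # greedy policy

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=(4, 2))   # 4 states, 2 actions
R = rng.random((4, 2))
print(replicable_value_iteration(P, R, gamma=0.9, eps=0.01,
                                 rng=np.random.default_rng(7)))
```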
arXiv Detail & Related papers (2023-05-24T16:05:15Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
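The core objective is easy to state: predict the action a_t from the observation pair (x_t, x_{t+k}), for several horizons k, so that the encoder must retain control-relevant state. The sketch below writes that loss with a hypothetical `encoder` and a linear action classifier; MusIK's systematic exploration loop is omitted.

```python
import numpy as np

def inverse_kinematics_loss(encoder, clf_w, obs_t, obs_tk, action_t):
    """Multi-step inverse kinematics objective (sketch).

    Learn features by predicting the action a_t taken at time t from the
    pair (x_t, x_{t+k}); solving this for several horizons k forces the
    encoder to keep control-relevant state information.
    """
    z = np.concatenate([encoder(obs_t), encoder(obs_tk)])
    logits = clf_w @ z                       # one logit per action
    logits -= logits.max()                   # numerically stable softmax
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[action_t]              # cross-entropy on the true action

enc = lambda obs: obs[:3]                    # hypothetical encoder
W = np.zeros((4, 6))                         # 4 actions, concatenated feature dim 6
print(inverse_kinematics_loss(enc, W, np.ones(5), np.zeros(5), action_t=2))
```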
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
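For reference, the basic semi-gradient update that such protocols build on looks as follows; the paper's stabilizing exploration machinery is not reproduced here.

```python
import numpy as np

def linear_q_update(w, phi_sa, reward, phi_next_all, gamma=0.99, lr=0.05):
    """One Q-learning step with linear function approximation (sketch).

    Q(s, a) = w . phi(s, a). The target bootstraps from the max over the
    next state's action features (`phi_next_all`, one feature vector per
    action); the paper's exploration variant adds machinery, not shown,
    to keep exactly this kind of update stable.
    """
    q_sa = phi_sa @ w
    target = reward + gamma * max(phi @ w for phi in phi_next_all)
    return w + lr * (target - q_sa) * phi_sa   # semi-gradient step
```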
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Ordering-Based Causal Discovery with Reinforcement Learning [31.358145789333825]
We propose a novel RL-based approach for causal discovery, by incorporating RL into the ordering-based paradigm.
We analyze the consistency and computational complexity of the proposed method, and empirically show that a pretrained model can be exploited to accelerate training.
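The ordering-based paradigm itself is simple to sketch: score an ordering by how well each variable is explained by its predecessors, then search the space of orderings. Below, exhaustive enumeration over a three-variable toy chain stands in for the paper's RL search strategy.

```python
import itertools
import numpy as np

def ordering_score(X, order):
    """Score a causal ordering: total residual variance when each variable
    is linearly regressed on its predecessors (lower is better)."""
    score = 0.0
    for i, v in enumerate(order):
        if i == 0:
            score += X[:, v].var()
        else:
            preds = X[:, order[:i]]
            coef, *_ = np.linalg.lstsq(preds, X[:, v], rcond=None)
            score += ((X[:, v] - preds @ coef) ** 2).mean()
    return score

# Toy chain x0 -> x1 -> x2; exhaustive search stands in for the RL search.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = 2 * x0 + 0.1 * rng.normal(size=2000)
x2 = -x1 + 0.1 * rng.normal(size=2000)
X = np.stack([x0, x1, x2], axis=1)
best = min(itertools.permutations(range(3)),
           key=lambda o: ordering_score(X, list(o)))
print(best)   # expected: (0, 1, 2)
```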
arXiv Detail & Related papers (2021-05-14T03:49:59Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimate by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
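For context, the random-sampling baseline that ZO-RL improves on is the standard Gaussian-perturbation gradient estimator sketched below; the learned sampling policy itself is not reproduced.

```python
import numpy as np

def zo_gradient(f, x, n_dirs=20, mu=1e-3, rng=None):
    """Zeroth-order gradient estimate via random Gaussian perturbations.

    This is the plain random-sampling baseline; ZO-RL's idea is to *learn*
    the distribution these perturbations are drawn from instead.
    """
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.normal(size=x.shape)
        g += (f(x + mu * u) - f(x)) / mu * u    # forward-difference estimator
    return g / n_dirs

# Sanity check on f(x) = ||x||^2, whose true gradient is 2x.
x = np.array([1.0, -2.0, 3.0])
print(zo_gradient(lambda v: v @ v, x, n_dirs=5000,
                  rng=np.random.default_rng(0)))
```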
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - A Two-stage Framework and Reinforcement Learning-based Optimization
Algorithms for Complex Scheduling Problems [54.61091936472494]
We develop a two-stage framework in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined.
The scheduling problem is solved in two stages: a finite Markov decision process (MDP) followed by a mixed-integer programming process.
Results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems.
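A toy skeleton of the two-stage decomposition, with loud stand-ins: a supplied value estimate plays the role of the trained RL policy in stage one, and brute-force ordering replaces the mixed-integer program in stage two.

```python
import itertools

def total_tardiness(order):
    """Total tardiness of jobs given as (duration, due date) pairs."""
    t, tardy = 0, 0
    for duration, due in order:
        t += duration
        tardy += max(0, t - due)
    return tardy

def two_stage_schedule(jobs, capacity, value_fn):
    """Skeleton of the two-stage decomposition (sketch).

    Stage 1 (MDP-like): sequentially decide which jobs to accept using a
    learned value estimate (`value_fn` stands in for the trained RL
    policy). Stage 2 (exact): order the accepted jobs optimally; brute
    force stands in for the paper's mixed-integer program.
    """
    accepted = []
    for job in jobs:                                   # stage 1: accept/reject
        if len(accepted) < capacity and value_fn(job, accepted) > 0:
            accepted.append(job)
    best = min(itertools.permutations(accepted),       # stage 2: exact ordering
               key=total_tardiness, default=())
    return list(best)

jobs = [(3, 4), (2, 2), (4, 10), (1, 3)]               # (duration, due date)
print(two_stage_schedule(jobs, capacity=3,
                         value_fn=lambda j, acc: j[1] - j[0]))
```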
arXiv Detail & Related papers (2021-03-10T03:16:12Z) - Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
The discovered rule covers both 'what to predict' (e.g., value functions) and 'how to learn from it', and is found by interacting with a set of environments.
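A caricature of the idea, assuming nothing from the paper's architecture: parameterize the agent's update rule, evaluate agents that use it, and keep the parameters that yield high return. Here random search over a two-parameter rule on a bandit stands in for the paper's meta-gradient training across environments.

```python
import numpy as np

def inner_train(eta, episodes=200, lr=0.1, seed=0):
    """Train on a two-armed bandit with a *parameterized* update rule:
    target = eta[0] * reward + eta[1] * max Q. Ordinary Q-learning on a
    bandit corresponds to eta = (1, 0)."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)
    total = 0.0
    for _ in range(episodes):
        a = rng.integers(2) if rng.random() < 0.2 else int(np.argmax(q))
        r = rng.normal(1.0 if a == 1 else 0.5, 0.1)   # arm 1 pays more
        q[a] += lr * (eta[0] * r + eta[1] * q.max() - q[a])
        total += r
    return total

# Outer loop: 'discover' the update rule by evaluating agents that use it;
# random search stands in for the paper's meta-gradient across environments.
rng = np.random.default_rng(1)
best_eta = max((rng.uniform(-1, 1, size=2) for _ in range(50)),
               key=inner_train)
print(best_eta)
```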
arXiv Detail & Related papers (2020-07-17T07:38:39Z) - Efficient Model-Based Reinforcement Learning through Optimistic Policy
Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
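As a tabular stand-in for optimism, the sketch below adds a count-based bonus that shrinks like 1/sqrt(visits), so rarely tried state-action pairs keep looking attractive; the paper instead injects optimism into state-of-the-art model-based deep RL, which this does not reproduce.

```python
import numpy as np

def optimistic_q_backup(Q, counts, s, a, r, s_next,
                        gamma=0.95, lr=0.1, beta=1.0):
    """Tabular stand-in for optimistic exploration (sketch).

    An exploration bonus shrinking like 1/sqrt(visits) is added to the
    reward, so rarely tried state-actions look better than they are and
    get revisited; model-based variants apply the same optimism to the
    learned dynamics instead.
    """
    counts[s, a] += 1
    bonus = beta / np.sqrt(counts[s, a])
    target = r + bonus + gamma * Q[s_next].max()
    Q[s, a] += lr * (target - Q[s, a])
```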
arXiv Detail & Related papers (2020-06-15T18:37:38Z)