Task-Optimal Exploration in Linear Dynamical Systems
- URL: http://arxiv.org/abs/2102.05214v1
- Date: Wed, 10 Feb 2021 01:42:22 GMT
- Title: Task-Optimal Exploration in Linear Dynamical Systems
- Authors: Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson
- Abstract summary: We study task-guided exploration and determine what precisely an agent must learn about its environment in order to complete a task.
We provide instance- and task-dependent lower bounds which explicitly quantify the difficulty of completing a task of interest.
Motivated by these bounds, we propose an exploration algorithm and show that it optimally explores the environment, collecting precisely the information needed to complete the task, with finite-time bounds guaranteeing that it achieves the instance- and task-optimal sample complexity.
- Score: 29.552894877883883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration in unknown environments is a fundamental problem in reinforcement
learning and control. In this work, we study task-guided exploration and
determine what precisely an agent must learn about its environment in order
to complete a particular task. Formally, we study a broad class of
decision-making problems in the setting of linear dynamical systems, a class
that includes the linear quadratic regulator problem. We provide instance- and
task-dependent lower bounds which explicitly quantify the difficulty of
completing a task of interest. Motivated by our lower bound, we propose a
computationally efficient experiment-design based exploration algorithm. We
show that it optimally explores the environment, collecting precisely the
information needed to complete the task, and provide finite-time bounds
guaranteeing that it achieves the instance- and task-optimal sample complexity,
up to constant factors. Through several examples of the LQR problem, we show
that performing task-guided exploration provably improves on exploration
schemes which do not take into account the task of interest. Along the way, we
establish that certainty equivalence decision making is instance- and
task-optimal, and obtain the first algorithm for the linear quadratic regulator
problem which is instance-optimal. We conclude with several experiments
illustrating the effectiveness of our approach in practice.
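For intuition about the certainty-equivalence pipeline the abstract refers to, the following is a minimal sketch, not the paper's task-guided experiment design: it excites a hypothetical linear system with task-agnostic random inputs, fits (A, B) by least squares, and solves the LQR problem on the estimate. The system matrices, horizon, and noise scale are illustrative assumptions.

```python
# A minimal certainty-equivalence LQR sketch (not the paper's task-guided
# algorithm): explore with random inputs, estimate (A, B) by least squares,
# then plan as if the estimate were the truth. All constants are assumptions.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# Hypothetical unknown system x_{t+1} = A x_t + B u_t + w_t.
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
n, m = A_true.shape[0], B_true.shape[1]

# --- Exploration: isotropic random inputs (a task-agnostic baseline). ---
T = 500
X = np.zeros((T + 1, n))
U = rng.normal(size=(T, m))
for t in range(T):
    w = 0.01 * rng.normal(size=n)                    # process noise
    X[t + 1] = A_true @ X[t] + B_true @ U[t] + w

# --- System identification: least squares from [x_t, u_t] to x_{t+1}. ---
Z = np.hstack([X[:-1], U])                           # regressors, (T, n+m)
Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)    # shape (n+m, n)
A_hat, B_hat = Theta[:n].T, Theta[n:].T

# --- Certainty equivalence: solve LQR on the estimated model. ---
Q, R = np.eye(n), np.eye(m)
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
print("||A_hat - A||:", np.linalg.norm(A_hat - A_true))
print("CE feedback gain K (control law u_t = -K x_t):\n", K)
```

The paper's contribution, by contrast, is to shape the exploration inputs themselves via task-dependent experiment design, concentrating estimation accuracy where the task is most sensitive to model error; the isotropic inputs above are exactly the kind of task-agnostic scheme it provably improves on.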
Related papers
- Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration [13.053013407015628]
This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics.
We propose an accelerated RL algorithm that can learn control policies significantly faster than competitive approaches.
arXiv Detail & Related papers (2024-10-16T00:53:41Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training (a minimal sketch of this step appears after this list).
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Multi-task Representation Learning for Pure Exploration in Linear Bandits [34.67303292713379]
We study multi-task representation learning for best arm identification in linear bandits (RepBAI-LB) and best policy identification in contextual linear bandits (RepBPI-CLB).
In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.
We show that by learning the common representation among tasks, our sample complexity is significantly better than that of the naive approach which solves tasks independently.
arXiv Detail & Related papers (2023-02-09T05:14:48Z)
- Reinforcement Learning Approach for Multi-Agent Flexible Scheduling Problems [0.0]
This research presents a reinforcement learning approach for scheduling problems.
In particular, this study delivers an OpenAI Gym environment with search-space reduction for Job Shop Scheduling Problems.
arXiv Detail & Related papers (2022-10-07T16:31:01Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Active Exploration via Experiment Design in Markov Chains [86.41407938210193]
A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest.
We propose an algorithm that efficiently selects policies whose measurement allocation converges to the optimal one.
In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.
arXiv Detail & Related papers (2022-06-29T00:04:40Z)
- Relevance-guided Unsupervised Discovery of Abilities with Quality-Diversity Algorithms [1.827510863075184]
We introduce Relevance-guided Unsupervised Discovery of Abilities, a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand.
We evaluate our approach on a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data.
arXiv Detail & Related papers (2022-04-21T00:29:38Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
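To make the Data-CUBE entry's instance-level step concrete, here is a hypothetical sketch of easy-to-difficult mini-batching; the difficulty measure (a caller-supplied score, sentence length in the example) is an assumption, not the paper's exact criterion.

```python
# A hypothetical sketch of an instance-level curriculum: score each instance's
# difficulty, sort from easy to difficult, and chunk into mini-batches.
# The difficulty function is an assumption; Data-CUBE defines its own measure.
from typing import Callable, List, Sequence

def curriculum_batches(
    instances: Sequence[str],
    difficulty: Callable[[str], float],
    batch_size: int,
) -> List[List[str]]:
    """Order instances from easy to difficult, then chunk into mini-batches."""
    ordered = sorted(instances, key=difficulty)
    return [list(ordered[i:i + batch_size])
            for i in range(0, len(ordered), batch_size)]

# Example: treat sentence length as a stand-in difficulty score.
data = ["a b", "a", "a b c d", "a b c"]
print(curriculum_batches(data, difficulty=lambda s: len(s.split()), batch_size=2))
# [['a', 'a b'], ['a b c', 'a b c d']]
```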
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.