Task-Optimal Exploration in Linear Dynamical Systems
- URL: http://arxiv.org/abs/2102.05214v1
- Date: Wed, 10 Feb 2021 01:42:22 GMT
- Title: Task-Optimal Exploration in Linear Dynamical Systems
- Authors: Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson
- Abstract summary: We study task-guided exploration and determine what precisely an agent must learn about its environment in order to complete a task.
We provide instance- and task-dependent lower bounds which explicitly quantify the difficulty of completing a task of interest.
Motivated by these bounds, we propose an exploration algorithm and show that it optimally explores the environment, collecting precisely the information needed to complete the task, with finite-time bounds guaranteeing that it achieves the instance- and task-optimal sample complexity.
- Score: 29.552894877883883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration in unknown environments is a fundamental problem in reinforcement
learning and control. In this work, we study task-guided exploration and
determine what precisely an agent must learn about its environment in order
to complete a particular task. Formally, we study a broad class of
decision-making problems in the setting of linear dynamical systems, a class
that includes the linear quadratic regulator problem. We provide instance- and
task-dependent lower bounds which explicitly quantify the difficulty of
completing a task of interest. Motivated by our lower bound, we propose a
computationally efficient experiment-design based exploration algorithm. We
show that it optimally explores the environment, collecting precisely the
information needed to complete the task, and provide finite-time bounds
guaranteeing that it achieves the instance- and task-optimal sample complexity,
up to constant factors. Through several examples of the LQR problem, we show
that performing task-guided exploration provably improves on exploration
schemes which do not take into account the task of interest. Along the way, we
establish that certainty equivalence decision making is instance- and
task-optimal, and obtain the first algorithm for the linear quadratic regulator
problem which is instance-optimal. We conclude with several experiments
illustrating the effectiveness of our approach in practice.
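For intuition about the certainty-equivalence pipeline the abstract refers to, the following is a minimal sketch, not the paper's task-guided experiment design: it excites a hypothetical linear system with task-agnostic random inputs, fits (A, B) by least squares, and solves the LQR problem on the estimate. The system matrices, horizon, and noise scale are illustrative assumptions.

```python
# A minimal certainty-equivalence LQR sketch (not the paper's task-guided
# algorithm): explore with random inputs, estimate (A, B) by least squares,
# then plan as if the estimate were the truth. All constants are assumptions.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# Hypothetical unknown system x_{t+1} = A x_t + B u_t + w_t.
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
n, m = A_true.shape[0], B_true.shape[1]

# --- Exploration: isotropic random inputs (a task-agnostic baseline). ---
T = 500
X = np.zeros((T + 1, n))
U = rng.normal(size=(T, m))
for t in range(T):
    w = 0.01 * rng.normal(size=n)                    # process noise
    X[t + 1] = A_true @ X[t] + B_true @ U[t] + w

# --- System identification: least squares from [x_t, u_t] to x_{t+1}. ---
Z = np.hstack([X[:-1], U])                           # regressors, (T, n+m)
Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)    # shape (n+m, n)
A_hat, B_hat = Theta[:n].T, Theta[n:].T

# --- Certainty equivalence: solve LQR on the estimated model. ---
Q, R = np.eye(n), np.eye(m)
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
print("||A_hat - A||:", np.linalg.norm(A_hat - A_true))
print("CE feedback gain K (control law u_t = -K x_t):\n", K)
```

The paper's contribution, by contrast, is to shape the exploration inputs themselves via task-dependent experiment design, concentrating estimation accuracy where the task is most sensitive to model error; the isotropic inputs above are exactly the kind of task-agnostic scheme it provably improves on.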
Related papers
- Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration [13.053013407015628]
This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics.
We propose an accelerated RL algorithm that can learn control policies significantly faster than competitive approaches.
arXiv Detail & Related papers (2024-10-16T00:53:41Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training (a minimal sketch of this step appears after this list).
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Multi-task Representation Learning for Pure Exploration in Linear Bandits [34.67303292713379]
We study multi-task representation learning for best arm identification in linear bandits (RepBAI-LB) and best policy identification in contextual linear bandits (RepBPI-CLB).
In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.
We show that by learning the common representation among tasks, our sample complexity is significantly better than that of the naive approach which solves tasks independently.
arXiv Detail & Related papers (2023-02-09T05:14:48Z)
- Reinforcement Learning Approach for Multi-Agent Flexible Scheduling Problems [0.0]
This research presents a reinforcement learning approach for scheduling problems.
In particular, this study delivers an OpenAI Gym environment with search-space reduction for Job Shop Scheduling Problems.
arXiv Detail & Related papers (2022-10-07T16:31:01Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Active Exploration via Experiment Design in Markov Chains [86.41407938210193]
A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest.
We propose an algorithm that efficiently selects policies whose measurement allocation converges to the optimal one.
In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.
arXiv Detail & Related papers (2022-06-29T00:04:40Z)
- Relevance-guided Unsupervised Discovery of Abilities with Quality-Diversity Algorithms [1.827510863075184]
We introduce Relevance-guided Unsupervised Discovery of Abilities, a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand.
We evaluate our approach on a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data.
arXiv Detail & Related papers (2022-04-21T00:29:38Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
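To make the Data-CUBE entry's instance-level step concrete, here is a hypothetical sketch of easy-to-difficult mini-batching; the difficulty measure (a caller-supplied score, sentence length in the example) is an assumption, not the paper's exact criterion.

```python
# A hypothetical sketch of an instance-level curriculum: score each instance's
# difficulty, sort from easy to difficult, and chunk into mini-batches.
# The difficulty function is an assumption; Data-CUBE defines its own measure.
from typing import Callable, List, Sequence

def curriculum_batches(
    instances: Sequence[str],
    difficulty: Callable[[str], float],
    batch_size: int,
) -> List[List[str]]:
    """Order instances from easy to difficult, then chunk into mini-batches."""
    ordered = sorted(instances, key=difficulty)
    return [list(ordered[i:i + batch_size])
            for i in range(0, len(ordered), batch_size)]

# Example: treat sentence length as a stand-in difficulty score.
data = ["a b", "a", "a b c d", "a b c"]
print(curriculum_batches(data, difficulty=lambda s: len(s.split()), batch_size=2))
# [['a', 'a b'], ['a b c', 'a b c d']]
```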
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.