Counting Reward Automata: Sample Efficient Reinforcement Learning
Through the Exploitation of Reward Function Structure
- URL: http://arxiv.org/abs/2312.11364v2
- Date: Fri, 16 Feb 2024 19:19:37 GMT
- Title: Counting Reward Automata: Sample Efficient Reinforcement Learning
Through the Exploitation of Reward Function Structure
- Authors: Tristan Bester, Benjamin Rosman, Steven James, Geraud Nangue Tasse
- Abstract summary: We present counting reward automata-a finite state machine variant capable of modelling any reward function expressible as a formal language.
We prove that an agent equipped with such an abstract machine is able to solve a larger set of tasks than those utilising current approaches.
- Score: 13.231546105751015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present counting reward automata-a finite state machine variant capable of
modelling any reward function expressible as a formal language. Unlike previous
approaches, which are limited to the expression of tasks as regular languages,
our framework allows for tasks described by unrestricted grammars. We prove
that an agent equipped with such an abstract machine is able to solve a larger
set of tasks than those utilising current approaches. We show that this
increase in expressive power does not come at the cost of increased automaton
complexity. A selection of learning algorithms are presented which exploit
automaton structure to improve sample efficiency. We show that the state
machines required in our formulation can be specified from natural language
task descriptions using large language models. Empirical results demonstrate
that our method outperforms competing approaches in terms of sample efficiency,
automaton complexity, and task completion.
Related papers
- Learning Quantitative Automata Modulo Theories [17.33092604696224]
We present QUINTIC, an active learning algorithm, wherein the learner infers a valid automaton through deductive reasoning.
Our evaluations utilize theory of rationals in order to learn summation, discounted summation, product, and classification quantitative automata.
arXiv Detail & Related papers (2024-11-15T21:51:14Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z) - Lemur: Integrating Large Language Models in Automated Program Verification [10.221822902660458]
We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification.
We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks.
arXiv Detail & Related papers (2023-10-07T16:44:53Z) - Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
arXiv Detail & Related papers (2023-07-10T17:32:13Z) - OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z) - Inverse Reinforcement Learning of Autonomous Behaviors Encoded as
Weighted Finite Automata [18.972270182221262]
This paper presents a method for learning logical task specifications and cost functions from demonstrations.
We employ a spectral learning approach to extract a weighted finite automaton (WFA), approximating the unknown logic structure of the task.
We define a product between the WFA for high-level task guidance and a Labeled Markov decision process (L-MDP) for low-level control and optimize a cost function that matches the demonstrator's behavior.
arXiv Detail & Related papers (2021-03-10T06:42:10Z) - AutoPrompt: Eliciting Knowledge from Language Models with Automatically
Generated Prompts [46.03503882865222]
AutoPrompt is an automated method to create prompts for a diverse set of tasks based on a gradient-guided search.
We show that masked language models (MLMs) have an inherent capability to perform sentiment analysis and natural language inference without additional parameters or finetuning.
arXiv Detail & Related papers (2020-10-29T22:54:00Z) - Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z) - Induction and Exploitation of Subgoal Automata for Reinforcement
Learning [75.55324974788475]
We present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks.
ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals.
A subgoal automaton also consists of two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding.
arXiv Detail & Related papers (2020-09-08T16:42:55Z) - A Composable Specification Language for Reinforcement Learning Tasks [23.08652058034537]
We propose a language for specifying complex control tasks, along with an algorithm that compiles specifications in our language into a reward function and automatically performs reward shaping.
We implement our approach in a tool called SPECTRL, and show that it outperforms several state-of-the-art baselines.
arXiv Detail & Related papers (2020-08-21T03:40:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.