On The Expressivity of Objective-Specification Formalisms in
Reinforcement Learning
- URL: http://arxiv.org/abs/2310.11840v2
- Date: Sat, 17 Feb 2024 14:21:40 GMT
- Title: On The Expressivity of Objective-Specification Formalisms in
Reinforcement Learning
- Authors: Rohan Subramani and Marcus Williams and Max Heitmann and Halfdan Holm
and Charlie Griffin and Joar Skalse
- Abstract summary: We compare objective-specification formalisms in reinforcement learning.
No formalism is both dominantly expressive and straightforward to optimise with current techniques.
Results highlight the need for future research which adapts reward learning to work with a greater variety of formalisms.
- Score: 4.998202587873575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most algorithms in reinforcement learning (RL) require that the objective is
formalised with a Markovian reward function. However, it is well-known that
certain tasks cannot be expressed by means of an objective in the Markov
rewards formalism, motivating the study of alternative objective-specification
formalisms in RL such as Linear Temporal Logic and Multi-Objective
Reinforcement Learning. To date, there has not yet been any thorough analysis
of how these formalisms relate to each other in terms of their expressivity. We
fill this gap in the existing literature by providing a comprehensive
comparison of 17 salient objective-specification formalisms. We place these
formalisms in a preorder based on their expressive power, and present this
preorder as a Hasse diagram. We find a variety of limitations for the different
formalisms, and argue that no formalism is both dominantly expressive and
straightforward to optimise with current techniques. For example, we prove that
each of Regularised RL, (Outer) Nonlinear Markov Rewards, Reward Machines,
Linear Temporal Logic, and Limit Average Rewards can express a task that the
others cannot. The significance of our results is twofold. First, we identify
important expressivity limitations to consider when specifying objectives for
policy optimisation. Second, our results highlight the need for future research
which adapts reward learning to work with a greater variety of formalisms,
since many existing reward learning methods assume that the desired objective
takes a Markovian form. Our work contributes towards a more cohesive
understanding of the costs and benefits of different RL objective-specification
formalisms.
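Worked illustration (ours, not from the paper): the classic non-Markovian task "first visit A, then visit B" cannot be expressed by a reward that depends only on the current environment state, but it is straightforward to specify with a Reward Machine, one of the formalisms compared above. The class, labelling convention, and trajectory below are hypothetical; the sketch simply follows the standard Reward Machine idea of attaching rewards to the transitions of an automaton over proposition labels.

```python
# Minimal sketch (illustrative, not from the paper): a Reward Machine for the
# non-Markovian task "visit A, then visit B". The reward depends on the history
# of observed labels, which a Markovian reward over environment states alone
# cannot reproduce without augmenting the state.

class RewardMachine:
    def __init__(self):
        # Machine states: "u0" = A not yet visited, "u1" = A visited, "acc" = done.
        self.state = "u0"

    def step(self, labels):
        """Advance on the set of propositions true in the current environment
        state and return the reward attached to the transition taken."""
        if self.state == "u0" and "A" in labels:
            self.state = "u1"
            return 0.0
        if self.state == "u1" and "B" in labels:
            self.state = "acc"
            return 1.0  # reward only for reaching B after A
        return 0.0


# Hypothetical usage: a labelling function maps each environment state to the
# set of propositions that hold there; here we feed the labels directly.
rm = RewardMachine()
trajectory_labels = [set(), {"B"}, {"A"}, set(), {"B"}]
rewards = [rm.step(labels) for labels in trajectory_labels]
print(rewards)  # [0.0, 0.0, 0.0, 0.0, 1.0] -- visiting B before A earns nothing
```

Because the reward is only emitted on the transition out of u1, the same environment state (standing on B) yields different rewards depending on the history, which is exactly the history dependence that motivates the alternative formalisms studied in the paper.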
Related papers
- Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective [59.7140089198992]
We develop a mathematical framework that defines abstract reasoning as the ability to extract essential patterns.
We introduce two novel complementary metrics: Gamma measures basic reasoning accuracy, while Delta quantifies a model's reliance on specific symbols.
arXiv Detail & Related papers (2025-05-28T09:02:45Z) - Rethinking Multi-Objective Learning through Goal-Conditioned Supervised Learning [8.593384839118658]
Multi-objective learning aims to optimize multiple objectives simultaneously with a single model.
It suffers from the difficulty of formalizing and conducting the exact learning process.
We propose a general framework for automatically learning to achieve multiple objectives based on the existing sequential data.
arXiv Detail & Related papers (2024-12-12T03:47:40Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning [48.59516337905877]
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents.
Recent work has developed theoretical insights into these algorithms.
We take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective.
arXiv Detail & Related papers (2024-06-04T07:22:12Z) - Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture.
With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner.
For robust and interpretable vision tasks at larger scales, hierarchical invariant representations can be considered an effective alternative to traditional CNNs and invariants.
arXiv Detail & Related papers (2024-02-23T16:50:07Z) - Defining Replicability of Prediction Rules [2.4366811507669124]
I propose an approach for defining replicability for prediction rules.
I focus specifically on the meaning of "consistent results" in typical utilization contexts.
arXiv Detail & Related papers (2023-04-30T13:27:55Z) - Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z) - DAReN: A Collaborative Approach Towards Reasoning And Disentangling [27.50150027974947]
We propose an end-to-end joint representation-reasoning learning framework, which leverages a weak form of inductive bias to improve both tasks together.
We accomplish this using a novel learning framework, Disentangling based Abstract Reasoning Network (DAReN), based on the principles of GM-RPM.
arXiv Detail & Related papers (2021-09-27T16:10:30Z) - Variational Empowerment as Representation Learning for Goal-Based
Reinforcement Learning [114.07623388322048]
We discuss how the standard goal-conditioned RL (GCRL) is encapsulated by the objective of variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z) - Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
arXiv Detail & Related papers (2021-05-27T17:51:34Z)