Interpretable Preference-based Reinforcement Learning with
Tree-Structured Reward Functions
- URL: http://arxiv.org/abs/2112.11230v1
- Date: Mon, 20 Dec 2021 09:53:23 GMT
- Title: Interpretable Preference-based Reinforcement Learning with
Tree-Structured Reward Functions
- Authors: Tom Bewley, Freddy Lecue
- Abstract summary: We propose an online, active preference learning algorithm that constructs reward functions with the intrinsically interpretable, compositional structure of a tree.
We demonstrate sample-efficient learning of tree-structured reward functions in several environments, then harness the enhanced interpretability to explore and debug for alignment.
- Score: 2.741266294612776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The potential of reinforcement learning (RL) to deliver aligned and
performant agents is partially bottlenecked by the reward engineering problem.
One alternative to heuristic trial-and-error is preference-based RL (PbRL),
where a reward function is inferred from sparse human feedback. However, prior
PbRL methods lack interpretability of the learned reward structure, which
hampers the ability to assess robustness and alignment. We propose an online,
active preference learning algorithm that constructs reward functions with the
intrinsically interpretable, compositional structure of a tree. Using both
synthetic and human-provided feedback, we demonstrate sample-efficient learning
of tree-structured reward functions in several environments, then harness the
enhanced interpretability to explore and debug for alignment.
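As a rough illustration of the idea only (this abstract does not specify the authors' online, active tree-growing procedure), the sketch below fits a Bradley-Terry preference model to synthetic trajectory pairs and then distils the learned per-step reward into a shallow decision tree. The feature dimensions, synthetic data, and the two-stage fit-then-distil shortcut are all assumptions made for illustration, not the paper's method.

```python
# Minimal sketch (assumed setup, not the paper's algorithm): learn a reward
# model from pairwise trajectory preferences via a Bradley-Terry objective,
# then distil it into an interpretable decision tree over per-step features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
d = 4                                     # toy feature dimension
true_w = np.array([1.0, -0.5, 0.0, 2.0])  # hidden "true" reward weights

def sample_trajectory(T=20):
    """A trajectory is a (T, d) array of state-action features (toy data)."""
    return rng.normal(size=(T, d))

# Synthetic preferences: label 1 if trajectory a has the higher true return.
pairs, labels = [], []
for _ in range(500):
    a, b = sample_trajectory(), sample_trajectory()
    pairs.append((a, b))
    labels.append(int(a.sum(0) @ true_w > b.sum(0) @ true_w))

# Bradley-Terry reduces to logistic regression on return-feature differences:
# P(a preferred over b) = sigmoid(w . (phi(a) - phi(b))), phi = summed features.
X = np.stack([a.sum(0) - b.sum(0) for a, b in pairs])
y = np.array(labels)
bt = LogisticRegression(fit_intercept=False).fit(X, y)
w_hat = bt.coef_.ravel()

# Distil the learned per-step reward r(s) = w_hat . s into a shallow tree,
# giving a compositional, human-readable reward structure.
states = np.concatenate([np.concatenate(p) for p in pairs])
reward_tree = DecisionTreeRegressor(max_depth=3).fit(states, states @ w_hat)
print(export_text(reward_tree, feature_names=[f"f{i}" for i in range(d)]))
```

The printed tree (axis-aligned splits on named features with a constant reward at each leaf) is the kind of intrinsically interpretable structure the abstract refers to; the paper itself grows such a tree directly from actively queried preferences rather than by distillation.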
Related papers
- Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a new framework based on large language models (LLMs) and decision tree reasoning (OCTree).
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback [6.578074497549894]
Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning.
This paper introduces a novel linear programming (LP) framework tailored for offline reward learning.
arXiv Detail & Related papers (2024-05-20T23:59:26Z) - Deep Reinforcement Learning from Hierarchical Preference Design [99.46415116087259]
This paper shows that, by exploiting certain structures, one can ease the reward design process.
We propose a hierarchical reward modeling framework, HERON, for two scenarios: (I) the feedback signals naturally present a hierarchy; (II) the reward is sparse, but less important surrogate feedback is available to help policy learning.
arXiv Detail & Related papers (2023-09-06T00:44:29Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pairwise preference feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Learning Interpretable Models of Aircraft Handling Behaviour by
Reinforcement Learning from Human Feedback [12.858982225307809]
We use pairwise preferences over simulated flight trajectories to learn an interpretable rule-based model called a reward tree.
We train an RL agent to execute high-quality handling behaviour by using the reward tree as the objective.
arXiv Detail & Related papers (2023-05-26T13:37:59Z) - Reward Learning with Trees: Methods and Evaluation [10.473362152378979]
We propose a method for learning reward trees from preference labels.
We show it to be broadly competitive with neural networks on challenging high-dimensional tasks.
Having found that reward tree learning can be done effectively in complex settings, we then consider why it should be used.
arXiv Detail & Related papers (2022-10-03T15:17:25Z) - Offline Reinforcement Learning with Differentiable Function
Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims to optimize decision-making strategies using historical data, has been extensively applied in real-life applications.
We take a step forward by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show that offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty via uncertainty in the learned reward.
Our experiments show that this uncertainty-based exploration bonus improves both the feedback-efficiency and sample-efficiency of preference-based RL algorithms.
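A minimal sketch of that idea, under the assumption that the learned reward is represented by a small bootstrap ensemble whose disagreement serves as the bonus (the summary does not specify the paper's exact reward models or scaling):

```python
# Rough sketch (assumed details): add the disagreement of an ensemble of
# learned reward models to the mean learned reward as an exploration bonus.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
d = 6

# Hypothetical reward-model ensemble, each member fit on a bootstrap resample
# of (state-feature, reward-label) data from some preference-learning step.
X = rng.normal(size=(200, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=200)
ensemble = []
for _ in range(5):
    idx = rng.integers(0, len(X), len(X))
    ensemble.append(Ridge(alpha=1.0).fit(X[idx], y[idx]))

def shaped_reward(state, beta=0.5):
    """Mean learned reward plus an uncertainty bonus (ensemble std)."""
    preds = np.array([m.predict(state[None])[0] for m in ensemble])
    return preds.mean() + beta * preds.std()

print(shaped_reward(rng.normal(size=d)))
```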
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - Provable Hierarchy-Based Meta-Reinforcement Learning [50.17896588738377]
We analyze HRL in the meta-RL setting, where the learner learns a latent hierarchical structure during meta-training for use in a downstream task.
We provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy.
Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
arXiv Detail & Related papers (2021-10-18T17:56:02Z) - Measure Inducing Classification and Regression Trees for Functional Data [0.0]
We propose a tree-based algorithm for classification and regression problems in the context of functional data analysis.
This is achieved by learning a weighted functional $L^2$ space by means of constrained convex optimization.
arXiv Detail & Related papers (2020-10-30T18:49:53Z)