Reward Learning with Trees: Methods and Evaluation
- URL: http://arxiv.org/abs/2210.01007v1
- Date: Mon, 3 Oct 2022 15:17:25 GMT
- Title: Reward Learning with Trees: Methods and Evaluation
- Authors: Tom Bewley, Jonathan Lawry, Arthur Richards, Rachel Craddock, Ian
Henderson
- Abstract summary: We propose a method for learning reward trees from preference labels.
We show it to be broadly competitive with neural networks on challenging high-dimensional tasks.
Having found that reward tree learning can be done effectively in complex settings, we then consider why it should be used.
- Score: 10.473362152378979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent efforts to learn reward functions from human feedback have tended to
use deep neural networks, whose lack of transparency hampers our ability to
explain agent behaviour or verify alignment. We explore the merits of learning
intrinsically interpretable tree models instead. We develop a recently proposed
method for learning reward trees from preference labels, and show it to be
broadly competitive with neural networks on challenging high-dimensional tasks,
with good robustness to limited or corrupted data. Having found that reward
tree learning can be done effectively in complex settings, we then consider why
it should be used, demonstrating that the interpretable reward structure gives
significant scope for traceability, verification and explanation.
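The underlying setting — fitting a reward model so that preferred trajectories receive higher return — is commonly formalized with a Bradley-Terry style preference model over trajectory returns. Below is a minimal sketch in that spirit: the hand-written `reward_tree` (features, thresholds, and leaf values all hypothetical) stands in for a learned reward tree, and is not the paper's actual induction procedure.

```python
import math

def reward_tree(state):
    # Hand-written stand-in for a learned reward tree: each internal node
    # tests one feature, each leaf holds a scalar reward. Thresholds and
    # leaf values here are made up for illustration.
    x, v = state  # e.g. position and velocity
    if x < 0.0:
        return -1.0 if v < 0.0 else 0.2
    else:
        return 0.5 if v < 1.0 else 1.0

def trajectory_return(traj):
    # Return of a trajectory = sum of per-state rewards.
    return sum(reward_tree(s) for s in traj)

def preference_prob(traj_a, traj_b):
    # Bradley-Terry / logistic model over returns:
    # P(A preferred to B) = sigmoid(R(A) - R(B)).
    return 1.0 / (1.0 + math.exp(trajectory_return(traj_b) - trajectory_return(traj_a)))

traj_a = [(0.5, 0.0), (0.8, 1.2)]     # return: 0.5 + 1.0 = 1.5
traj_b = [(-0.5, -0.1), (-0.2, 0.3)]  # return: -1.0 + 0.2 = -0.8
p = preference_prob(traj_a, traj_b)   # sigmoid(2.3), close to 1: A clearly preferred
```

A learner would adjust the tree's splits and leaf values so that `preference_prob` matches the human labels; because the model is a shallow tree rather than a deep network, each split can be read off directly, which is the interpretability advantage the abstract claims.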
Related papers
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, explanation consistency, to adaptively reweight the training samples during learning.
The framework then improves learning by paying closer attention to training samples whose explanations differ most.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
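The "adaptive smoother" view can be made concrete: a forest's prediction is a weighted average of training labels, with weights determined by how often each training point shares a leaf with the query. The sketch below uses depth-1 threshold "trees" as stand-ins for fitted trees; the data, thresholds, and function names are all hypothetical.

```python
def smoother_weights(x, X_train, thresholds):
    # Returns weights w_i(x) such that the forest prediction is
    # sum_i w_i(x) * y_i: each "tree" is a single threshold split,
    # and x averages the labels of training points in its leaf.
    n = len(X_train)
    w = [0.0] * n
    for t in thresholds:  # one threshold = one depth-1 "tree"
        side = x >= t
        leaf = [i for i in range(n) if (X_train[i] >= t) == side]
        for i in leaf:
            w[i] += 1.0 / (len(thresholds) * len(leaf))
    return w

def forest_predict(x, X_train, y_train, thresholds):
    w = smoother_weights(x, X_train, thresholds)
    return sum(wi * yi for wi, yi in zip(w, y_train))

X_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 1.0, 2.0, 3.0]
thresholds = [0.5, 1.5, 2.5]
w = smoother_weights(2.2, X_train, thresholds)            # weights sum to 1
pred = forest_predict(2.2, X_train, y_train, thresholds)  # weighted label average
```

Because the weights always sum to one and concentrate on training points near the query, averaging over trees acts like a data-adaptive kernel smoother, which is the reframing the abstract argues for.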
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- NSOTree: Neural Survival Oblique Tree [0.21756081703275998]
Survival analysis is a statistical method for modelling the time until an event of interest occurs.
Deep learning-based methods have dominated this field due to their representational capacity and state-of-the-art performance.
In this paper, we leverage the strengths of both neural networks and tree-based methods, capitalizing on their ability to approximate intricate functions while maintaining interpretability.
arXiv Detail & Related papers (2023-09-25T02:14:15Z)
- Minimizing Control for Credit Assignment with Strong Feedback [65.59995261310529]
Current methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals.
We combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization.
We show that the use of strong feedback in DFC allows learning forward and feedback connections simultaneously, using a learning rule fully local in space and time.
arXiv Detail & Related papers (2022-04-14T22:06:21Z)
- Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions [2.741266294612776]
We propose an online, active preference learning algorithm that constructs reward functions with the intrinsically interpretable, compositional structure of a tree.
We demonstrate sample-efficient learning of tree-structured reward functions in several environments, then harness the enhanced interpretability to explore and debug for alignment.
arXiv Detail & Related papers (2021-12-20T09:53:23Z)
- Backprop-Free Reinforcement Learning with Active Neural Generative Coding [84.11376568625353]
We propose a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments.
We develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference.
The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior.
arXiv Detail & Related papers (2021-07-10T19:02:27Z)
- Sparse Oblique Decision Trees: A Tool to Understand and Manipulate Neural Net Features [3.222802562733787]
We focus on understanding which of the internal features computed by the neural net are responsible for a particular class.
We show that the neural net's features can easily be manipulated to make the net predict, or not predict, a given class, demonstrating that adversarial attacks are possible at the level of the features.
arXiv Detail & Related papers (2021-04-07T05:31:08Z)
- Learning Intrinsic Symbolic Rewards in Reinforcement Learning [7.101885582663675]
We present a method that discovers dense rewards in the form of low-dimensional symbolic trees.
We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks.
arXiv Detail & Related papers (2020-10-08T00:02:46Z)
- Reward Propagation Using Graph Convolutional Networks [61.32891095232801]
We propose a new framework for learning potential functions by leveraging ideas from graph representation learning.
Our approach relies on Graph Convolutional Networks which we use as a key ingredient in combination with the probabilistic inference view of reinforcement learning.
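Learned potential functions are typically applied via potential-based reward shaping, which is known to preserve optimal policies. A minimal sketch of that mechanism follows; the hand-written potential table is a hypothetical stand-in for the potentials this paper learns with a Graph Convolutional Network.

```python
def shaped_reward(r, phi_s, phi_s_next, gamma=0.99):
    # Potential-based reward shaping (Ng et al., 1999): adding the term
    # gamma * Phi(s') - Phi(s) to the reward leaves the optimal policy
    # unchanged while densifying the learning signal.
    return r + gamma * phi_s_next - phi_s

# Hypothetical potentials over a toy state graph; the paper instead
# learns Phi with a Graph Convolutional Network.
phi = {"start": 0.0, "mid": 0.5, "goal": 1.0}

# A zero-reward transition toward the goal now yields a positive signal.
r_shaped = shaped_reward(0.0, phi["start"], phi["mid"])  # 0.99 * 0.5 - 0.0
```

Transitions that climb the potential (toward the goal) are rewarded and transitions that descend it are penalized, so the agent gets dense guidance without the shaping changing which policy is optimal.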
arXiv Detail & Related papers (2020-10-06T04:38:16Z)
- Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction [11.427019313283997]
We propose a novel formulation of interpretable deep neural networks for the attribution task.
Using masked weights, hidden features can be deeply attributed, split into several input-restricted sub-networks and trained as a boosted mixture of experts.
arXiv Detail & Related papers (2020-08-26T06:46:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.