Learning Interpretable Models of Aircraft Handling Behaviour by
Reinforcement Learning from Human Feedback
- URL: http://arxiv.org/abs/2305.16924v1
- Date: Fri, 26 May 2023 13:37:59 GMT
- Title: Learning Interpretable Models of Aircraft Handling Behaviour by
Reinforcement Learning from Human Feedback
- Authors: Tom Bewley, Jonathan Lawry, Arthur Richards
- Abstract summary: We use pairwise preferences over simulated flight trajectories to learn an interpretable rule-based model called a reward tree.
We train an RL agent to execute high-quality handling behaviour by using the reward tree as the objective.
- Score: 12.858982225307809
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method to capture the handling abilities of fast jet pilots in a
software model via reinforcement learning (RL) from human preference feedback.
We use pairwise preferences over simulated flight trajectories to learn an
interpretable rule-based model called a reward tree, which enables the
automated scoring of trajectories alongside an explanatory rationale. We train
an RL agent to execute high-quality handling behaviour by using the reward tree
as the objective, and thereby generate data for iterative preference collection
and further refinement of both tree and agent. Experiments with synthetic
preferences show reward trees to be competitive with uninterpretable neural
network reward models on quantitative and qualitative evaluations.
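To make the loop described in the abstract concrete, here is a minimal, self-contained sketch of one preference-to-reward-tree iteration in Python: sample trajectory pairs, obtain (here synthetic) preference labels, fit an interpretable tree that scores trajectories, and read off its decision rules as the explanatory rationale. The tree induction below is an ordinary regression tree on per-trajectory win rates, standing in for the paper's reward-tree algorithm; the trajectory features, the scikit-learn dependency, and all names are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch only: a regression tree fit to pairwise-preference win
# rates, standing in for the paper's reward-tree induction and RL loop.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Toy "trajectories", each summarised by interpretable features
# (hypothetical examples: mean altitude error, mean roll rate).
n_traj, n_features = 200, 2
features = rng.normal(size=(n_traj, n_features))

# Hidden scoring function standing in for the human assessor
# (the paper's experiments likewise use synthetic preferences).
hidden_score = -np.abs(features[:, 0]) - 0.5 * np.abs(features[:, 1])

# Pairwise preference collection over sampled trajectory pairs.
pairs = rng.choice(n_traj, size=(500, 2))
prefs = (hidden_score[pairs[:, 0]] > hidden_score[pairs[:, 1]]).astype(float)

# Simple surrogate target: each trajectory's win rate across its comparisons.
wins, counts = np.zeros(n_traj), np.zeros(n_traj)
for (i, j), p in zip(pairs, prefs):
    wins[i] += p
    wins[j] += 1.0 - p
    counts[i] += 1
    counts[j] += 1
targets = np.divide(wins, counts, out=np.full(n_traj, 0.5), where=counts > 0)

# A shallow tree keeps the learned reward interpretable: it can score new
# trajectories (e.g. as an RL objective) and expose its rules as a rationale.
reward_tree = DecisionTreeRegressor(max_depth=3).fit(features, targets)
print(export_text(reward_tree, feature_names=["altitude_error", "roll_rate"]))
print("predicted reward:", reward_tree.predict(rng.normal(size=(1, n_features)))[0])
```

In the paper itself, the tree is then used as the objective for an RL agent whose rollouts seed the next round of preference collection; the uninterpretable neural-network reward models it is compared against are typically trained with a pairwise loss of the kind sketched after the related-papers list.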
Related papers
- Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems [17.10762463903638]
We train evaluation models to approximate human evaluation, achieving high agreement.
We propose a weak-to-strong supervision method that uses a fraction of the annotated data to train an evaluation model.
arXiv Detail & Related papers (2024-06-26T10:48:14Z)
- Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment [69.33930972652594]
We propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models.
The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers.
We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy.
arXiv Detail & Related papers (2024-03-28T15:22:29Z)
- RewardBench: Evaluating Reward Models for Language Modeling [100.28366840977966]
We present RewardBench, a benchmark dataset and code-base for evaluation of reward models.
The dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety.
On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods.
arXiv Detail & Related papers (2024-03-20T17:49:54Z)
- Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble [67.4269821365504]
Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values.
However, RLHF relies on a reward model that is trained with a limited amount of human preference data.
We contribute a reward ensemble method that allows the reward model to make more accurate predictions.
arXiv Detail & Related papers (2024-01-30T00:17:37Z)
- Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF [79.98542868281471]
Reinforcement Learning from Human Feedback (RLHF) is a technique that aligns language models closely with human-centric values.
It is observed that the performance of the reward model degrades after one epoch of training, and optimizing too much against the learned reward model eventually hinders the true objective.
This paper delves into these issues, leveraging theoretical insights to design an improved reward learning algorithm termed 'Iterative Data Smoothing' (IDS).
arXiv Detail & Related papers (2024-01-29T17:43:42Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty via uncertainty in the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback [8.409764908043396]
We apply preference modeling and reinforcement learning from human feedback to finetune language models to act as helpful assistants.
We find this alignment training improves performance on almost all NLP evaluations.
We explore an iterated online mode of training, where preference models and RL policies are updated on a weekly cadence with fresh human feedback data.
arXiv Detail & Related papers (2022-04-12T15:02:38Z)
- Learning Reward Models for Cooperative Trajectory Planning with Inverse Reinforcement Learning and Monte Carlo Tree Search [2.658812114255374]
This work employs feature-based Maximum Entropy Inverse Reinforcement Learning to learn reward models that maximize the likelihood of recorded cooperative expert trajectories.
The evaluation demonstrates that the approach is capable of recovering a reasonable reward model that mimics the expert and performs similarly to a manually tuned baseline reward model.
arXiv Detail & Related papers (2022-02-14T00:33:08Z)
- Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions [2.741266294612776]
We propose an online, active preference learning algorithm that constructs reward functions with the intrinsically interpretable, compositional structure of a tree.
We demonstrate sample-efficient learning of tree-structured reward functions in several environments, then harness the enhanced interpretability to explore and debug for alignment.
arXiv Detail & Related papers (2021-12-20T09:53:23Z)
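Several of the related entries above (RewardBench, the reward-model ensemble, and Secrets of RLHF Part II) revolve around reward models trained on prompt-chosen-rejected data. For reference, below is a minimal sketch of the standard pairwise (Bradley-Terry) reward-modelling objective those works build on; the toy scoring network, embeddings, and hyperparameters are illustrative assumptions, not any of those papers' implementations.

```python
# Minimal sketch of the pairwise (Bradley-Terry) reward-modelling loss:
# the reward model is trained so chosen responses outscore rejected ones.
# All components here are toy placeholders, not any listed paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy reward model mapping a fixed-size (prompt + response) embedding to a scalar.
reward_model = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Placeholder embeddings; in practice these come from annotated preference data.
chosen_emb, rejected_emb = torch.randn(64, 16), torch.randn(64, 16)

for step in range(100):
    r_chosen = reward_model(chosen_emb).squeeze(-1)
    r_rejected = reward_model(rejected_emb).squeeze(-1)
    # Minimise -log sigmoid(r_chosen - r_rejected): the negative log-likelihood
    # of the observed preference under the Bradley-Terry model.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("final pairwise loss:", loss.item())
```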
This list is automatically generated from the titles and abstracts of the papers on this site.