Value Driven Representation for Human-in-the-Loop Reinforcement Learning
- URL: http://arxiv.org/abs/2004.01223v1
- Date: Thu, 2 Apr 2020 18:45:45 GMT
- Title: Value Driven Representation for Human-in-the-Loop Reinforcement Learning
- Authors: Ramtin Keramati, Emma Brunskill
- Abstract summary: We focus on the algorithmic foundations of how to help the system designer choose the set of sensors or features that define the observation space used by a reinforcement learning agent.
We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent.
We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
- Score: 33.79501890330252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive adaptive systems powered by Reinforcement Learning (RL) have many
potential applications, such as intelligent tutoring systems. In such systems,
there is typically an external human system designer who creates, monitors,
and modifies the interactive adaptive system, trying to improve its
performance on the target outcomes. In this paper, we focus on the algorithmic
foundations of how to help the system designer choose the set of sensors or
features that define the observation space used by a reinforcement learning agent.
We present an algorithm, value driven representation (VDR), that can
iteratively and adaptively augment the observation space of a reinforcement
learning agent so that it is sufficient to capture a (near-)optimal policy. To do
so, we introduce a new method to optimistically estimate the value of a policy
using offline simulated Monte Carlo rollouts. We evaluate the performance of
our approach on standard RL benchmarks with simulated humans and demonstrate
significant improvement over prior baselines.
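The abstract names two ingredients: an optimistic estimate of a policy's value computed from simulated Monte Carlo rollouts, and an outer loop that grows the observation space until it can support a (near-)optimal policy. The sketch below illustrates both under our own assumptions, not the paper's exact method: optimism comes from a Hoeffding upper-confidence bonus on the mean rollout return, and a greedy rule that adds the candidate feature with the largest estimated gain stands in for the human designer's choice. The names `optimistic_mc_value`, `augment_observation_space`, `rollout`, `estimate_value`, `return_range`, `delta`, and `tol` are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def optimistic_mc_value(
    rollout: Callable[[random.Random], float],
    n_rollouts: int,
    return_range: float,
    delta: float = 0.05,
    seed: int = 0,
) -> float:
    """Optimistic (upper-confidence) Monte Carlo estimate of a policy's value.

    `rollout` is a hypothetical simulator call: it plays one episode of the
    policy (e.g. in a model fitted to offline data) and returns its return,
    assumed to lie in an interval of width `return_range`.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    returns = [rollout(rng) for _ in range(n_rollouts)]
    mean = sum(returns) / n_rollouts
    # One-sided Hoeffding bound for i.i.d. returns: with probability >= 1 - delta,
    # the expected return under the simulator is <= mean + bonus.
    bonus = return_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n_rollouts))
    return mean + bonus


def augment_observation_space(
    candidate_features: Sequence[str],
    estimate_value: Callable[[List[str]], float],
    tol: float = 1e-6,
) -> List[str]:
    """Greedily grow the feature set while the estimated value keeps improving.

    `estimate_value` is hypothetical: it maps a feature set to a value estimate
    (for example, optimistic_mc_value for a policy restricted to those features).
    The greedy choice stands in for the human designer's decision.
    """
    selected: List[str] = []
    best = estimate_value(selected)
    remaining = list(candidate_features)
    while remaining:
        scores: Dict[str, float] = {f: estimate_value(selected + [f]) for f in remaining}
        f_best = max(scores, key=scores.__getitem__)
        if scores[f_best] <= best + tol:
            break  # no remaining feature raises the estimate: stop augmenting
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected


if __name__ == "__main__":
    # Toy value function: each feature adds a fixed amount of value.
    weights = {"a": 1.0, "b": 0.5, "c": 0.0}
    chosen = augment_observation_space(["a", "b", "c"], lambda fs: sum(weights[f] for f in fs))
    print(chosen)  # prints ['a', 'b']
    # Constant-return rollouts: the estimate is 1.0 plus the Hoeffding bonus.
    print(optimistic_mc_value(lambda rng: 1.0, n_rollouts=100, return_range=1.0))
```

With the toy weights, feature `c` adds no value, so the loop stops after selecting `a` and `b`. The `tol` threshold is a design choice that keeps floating-point noise from admitting features that do not help; a wider confidence bonus (smaller `delta` or fewer rollouts) makes the value estimate more optimistic.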

Related papers
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv (2024-06-11T01:20:53Z)
- Multi-turn Reinforcement Learning from Preference Human Feedback [41.327438095745315]
Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models with human preferences.
Existing methods work by emulating the preferences at the single decision (turn) level.
We develop novel methods for Reinforcement Learning from preference feedback between two full multi-turn conversations.
arXiv (2024-05-23T14:53:54Z)
- Human-centric Reward Optimization for Reinforcement Learning-based Automated Driving using Large Language Models [15.11759379703718]
One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively.
This paper introduces an innovative approach that uses large language models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way.
arXiv (2024-05-07T09:04:52Z)
- REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv (2023-12-22T04:56:37Z)
- A Bayesian Approach to Robust Inverse Reinforcement Learning [54.24816623644148]
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL).
The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics.
Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed to have a highly accurate model of the environment.
arXiv (2023-09-15T17:37:09Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv (2023-03-12T05:08:03Z)
- Weakly Supervised Disentangled Representation for Goal-conditioned Reinforcement Learning [15.698612710580447]
We propose a skill learning framework DR-GRL that aims to improve the sample efficiency and policy generalization.
In a weakly supervised manner, we propose a Spatial Transform AutoEncoder (STAE) to learn an interpretable and controllable representation.
We empirically demonstrate that DR-GRL significantly outperforms the previous methods in sample efficiency and policy generalization.
arXiv (2022-02-28T09:05:14Z)
- Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model can automatically learn the rewards from users' actions based on a discriminative actor-critic network and a Wasserstein GAN.
arXiv (2021-05-03T13:14:25Z)
- Control-Aware Representations for Model-based Reinforcement Learning [36.221391601609255]
A major challenge in modern reinforcement learning (RL) is efficient control of dynamical systems from high-dimensional sensory observations.
Learning controllable embedding (LCE) is a promising approach that addresses this challenge by embedding the observations into a lower-dimensional latent space.
Two important questions in this area are how to learn a representation that is amenable to the control problem at hand, and how to achieve an end-to-end framework for representation learning and control.
arXiv (2020-06-24T01:00:32Z)
- Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv (2020-06-10T11:18:57Z)
- Gradient Monitored Reinforcement Learning [0.0]
We focus on the enhancement of training and evaluation performance in reinforcement learning algorithms.
We propose an approach to steer the learning in the weight parameters of a neural network based on the dynamic development and feedback from the training process itself.
arXiv (2020-05-25T13:45:47Z)
- Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver.
We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming.
We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv (2020-05-25T01:42:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.