Sample-Efficient Model-based Actor-Critic for an Interactive Dialogue Task
- URL: http://arxiv.org/abs/2004.13657v1
- Date: Tue, 28 Apr 2020 17:00:59 GMT
- Title: Sample-Efficient Model-based Actor-Critic for an Interactive Dialogue Task
- Authors: Katya Kudashkina, Valliappa Chockalingam, Graham W. Taylor, Michael Bowling
- Abstract summary: We present a model-based reinforcement learning algorithm for an interactive dialogue task.
We build on commonly used actor-critic methods, adding an environment model and planner that augment a learning agent to learn a model of the environment dynamics.
Our results show that, on a simulation that mimics the interactive task, our algorithm requires 70 times fewer samples than a commonly used model-free baseline.
- Score: 27.896714528986855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-computer interactive systems that rely on machine learning are becoming
paramount to the lives of millions of people who use digital assistants on a
daily basis. Yet, further advances are limited by the availability of data and
the cost of acquiring new samples. One way to address this problem is by
improving the sample efficiency of current approaches. As a solution path, we
present a model-based reinforcement learning algorithm for an interactive
dialogue task. We build on commonly used actor-critic methods, adding an
environment model and planner that augment a learning agent to learn a model
of the environment dynamics. Our results show that, on a simulation that mimics
the interactive task, our algorithm requires 70 times fewer samples than a
commonly used model-free baseline algorithm, and demonstrates 2 times better
performance asymptotically. Moreover, we introduce a novel contribution:
computing a soft planner policy and further updating a model-free policy with
it, yielding a less computationally expensive model-free agent that is as good
as the model-based one. This model-based architecture serves as a foundation
that can be extended to other human-computer interactive tasks, allowing
further advances in this direction.
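As a concrete illustration of the architecture described above, here is a minimal, self-contained sketch (our own construction, not the authors' code): a tabular environment model is learned from real experience, a one-step planner computes a soft (Boltzmann) policy from simulated outcomes, and that planner policy is distilled into a cheap model-free policy, mirroring the soft-planner-distillation idea. The toy chain environment, learning rates, and temperature are all illustrative assumptions.

```python
import numpy as np

n_states, n_actions, gamma, tau = 5, 2, 0.9, 0.5
rng = np.random.default_rng(0)

# Environment model learned from real experience: transition counts and
# running reward averages (a tabular stand-in for the learned model).
trans_counts = np.ones((n_states, n_actions, n_states))
reward_sum = np.zeros((n_states, n_actions))
reward_cnt = np.ones((n_states, n_actions))

value = np.zeros(n_states)                       # critic
policy_logits = np.zeros((n_states, n_actions))  # model-free actor

def plan_soft_policy(s):
    """One-step lookahead with the learned model; Boltzmann ("soft") policy."""
    p = trans_counts[s] / trans_counts[s].sum(axis=1, keepdims=True)
    r = reward_sum[s] / reward_cnt[s]
    q = r + gamma * (p @ value)
    z = np.exp((q - q.max()) / tau)
    return z / z.sum()

def env_step(s, a):
    """Toy deterministic chain standing in for the dialogue simulation."""
    s2 = min(n_states - 1, s + 1) if a == 1 else max(0, s - 1)
    return s2, float(s2 == n_states - 1)

s = 0
for _ in range(2000):
    pi_plan = plan_soft_policy(s)                # planner's soft policy
    a = rng.choice(n_actions, p=pi_plan)
    s2, r = env_step(s, a)

    trans_counts[s, a, s2] += 1                  # update the environment model
    reward_sum[s, a] += r
    reward_cnt[s, a] += 1

    value[s] += 0.1 * (r + gamma * value[s2] - value[s])   # TD(0) critic

    # Distill the planner policy into the model-free actor (cross-entropy step).
    pi_free = np.exp(policy_logits[s]) / np.exp(policy_logits[s]).sum()
    policy_logits[s] += 0.1 * (pi_plan - pi_free)

    s = 0 if r > 0 else s2                       # episodic reset at the goal
```

Once distilled, the model-free policy can act without invoking the planner at all, which is what makes the deployed agent computationally cheap while matching the model-based one.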
Related papers
- Learning Low-Dimensional Strain Models of Soft Robots by Looking at the Evolution of Their Shape with Application to Model-Based Control [2.058941610795796]
This paper introduces a streamlined method for learning low-dimensional, physics-based models.
We validate our approach through simulations with various planar soft manipulators.
Because the method generates physically compatible models, the learned models can be combined directly with model-based control policies.
arXiv Detail & Related papers (2024-10-31T18:37:22Z)
- Model-based Policy Optimization using Symbolic World Model [46.42871544295734]
The application of learning-based control methods in robotics presents significant challenges.
One is that model-free reinforcement learning algorithms use observation data with low sample efficiency.
We suggest approximating transition dynamics with symbolic expressions, which are generated via symbolic regression.
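A hedged sketch of approximating transition dynamics with symbolic expressions, using sparse regression over a hand-chosen library of candidate terms (a SINDy-style stand-in for full symbolic regression; the toy dynamics and term library are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(500, 2))            # states (x0, x1)
u = rng.uniform(-1, 1, size=(500, 1))            # actions
# Ground-truth toy dynamics the regression should recover.
x_next = np.stack([x[:, 0] + 0.1 * x[:, 1],
                   x[:, 1] + 0.1 * u[:, 0] - 0.05 * np.sin(x[:, 0])], axis=1)

def library(x, u):
    """Candidate symbolic terms the regression can select from."""
    return np.column_stack([np.ones(len(x)), x, u, np.sin(x), x**2])

theta = library(x, u)
coef, *_ = np.linalg.lstsq(theta, x_next, rcond=None)
coef[np.abs(coef) < 1e-3] = 0.0                  # hard-threshold for sparsity
print("learned symbolic coefficients:\n", coef.round(3))
```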
arXiv Detail & Related papers (2024-07-18T13:49:21Z)
- Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation [8.940998315746684]
We propose a model-based reinforcement learning (RL) approach for robotic arm end-tasks.
We employ Bayesian neural network models to represent, in a probabilistic way, both the belief and information encoded in the dynamic model during exploration.
Our experiments show the advantages of our Bayesian model-based RL approach, while achieving results of similar quality to relevant alternatives.
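One common way to realize a Bayesian neural dynamics model is Monte Carlo dropout; the sketch below uses it as an illustrative stand-in (the paper's exact model and dimensions are not reproduced here). Keeping dropout active at prediction time yields a predictive mean and an epistemic-uncertainty estimate that can drive exploration toward poorly-modelled regions.

```python
import torch
import torch.nn as nn

class BayesianDynamics(nn.Module):
    def __init__(self, state_dim=4, action_dim=2, hidden=64, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

    @torch.no_grad()
    def predict_with_uncertainty(self, s, a, samples=32):
        self.train()                               # keep dropout on (MC dropout)
        preds = torch.stack([self(s, a) for _ in range(samples)])
        return preds.mean(0), preds.std(0)         # mean and epistemic spread

model = BayesianDynamics()
s, a = torch.randn(8, 4), torch.randn(8, 2)
mu, sigma = model.predict_with_uncertainty(s, a)
# Exploration bonus: prefer (s, a) pairs where sigma is large.
```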
arXiv Detail & Related papers (2024-04-02T11:44:37Z)
- STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
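A rough sketch of what a transformer-based world model can look like (module names and sizes are our assumptions, not the published STORM architecture): a causal transformer maps a history of latent states and actions to predicted next latents, which is what enables imagined rollouts for policy learning.

```python
import torch
import torch.nn as nn

class TransformerWorldModel(nn.Module):
    def __init__(self, latent_dim=32, action_dim=4, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(latent_dim + action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, latent_dim)

    def forward(self, latents, actions):
        # latents: (B, T, latent_dim), actions: (B, T, action_dim)
        x = self.embed(torch.cat([latents, actions], dim=-1))
        T = x.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.encoder(x, mask=causal)             # causal self-attention
        return self.head(h)                          # predicted next latents

model = TransformerWorldModel()
z = torch.randn(2, 16, 32)
a = torch.randn(2, 16, 4)
z_next_pred = model(z, a)                            # shape (2, 16, 32)
```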
arXiv Detail & Related papers (2023-10-14T16:42:02Z)
- Data efficient surrogate modeling for engineering design: Ensemble-free batch mode deep active learning for regression [0.6021787236982659]
We propose a simple and scalable approach for active learning that works in a student-teacher manner to train a surrogate model.
By using this proposed approach, we achieve the same level of surrogate accuracy as baselines such as DBAL and Monte Carlo sampling.
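A loose, ensemble-free illustration of student-teacher batch-mode active learning for a regression surrogate (the selection rule below is our assumption, not necessarily the paper's exact procedure): a teacher fits the labeled data, a smaller student is distilled from it, and the next batch of simulator queries is taken where the two disagree most.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
def simulator(x):                       # stand-in for an expensive simulation
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

X_pool = rng.uniform(-2, 2, size=(400, 2))
labeled = list(range(10))               # small initial design

for _ in range(5):
    X_l = X_pool[labeled]
    teacher = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                           random_state=0).fit(X_l, simulator(X_l))
    student = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                           random_state=0).fit(X_l, teacher.predict(X_l))
    disagreement = np.abs(teacher.predict(X_pool) - student.predict(X_pool))
    disagreement[labeled] = -np.inf     # never re-query labeled points
    batch = np.argsort(disagreement)[-8:]
    labeled.extend(batch.tolist())      # batch-mode acquisition
```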
arXiv Detail & Related papers (2022-11-16T02:31:57Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
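A toy rendering of deriving the prior model from the reward function (our simplified formulation, not the paper's exact equations): preferences over states are a softmax of rewards, and candidate actions are scored by an expected-free-energy-style trade-off between reaching preferred states and predictive ambiguity.

```python
import numpy as np

rewards = np.array([0.0, 0.0, 0.0, 1.0])          # sparse reward over 4 states
prior = np.exp(rewards / 0.1)
prior /= prior.sum()                               # preferences p(s) ∝ exp(r/τ)

# Predicted state distributions for two candidate actions.
q_a = {"left": np.array([0.7, 0.2, 0.1, 0.0]),
       "right": np.array([0.0, 0.1, 0.2, 0.7])}

def expected_free_energy(q):
    risk = np.sum(q * (np.log(q + 1e-12) - np.log(prior)))  # KL(q || prior)
    ambiguity = -np.sum(q * np.log(q + 1e-12))               # entropy of q
    return risk + ambiguity

best = min(q_a, key=lambda a: expected_free_energy(q_a[a]))
print("selected action:", best)                              # -> "right"
```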
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
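For context, a minimal sketch of the adversarial IfO ingredient these methods share (illustrative only; DEALIO's data efficiency comes from pairing this with model-based RL, which is omitted here): a discriminator over state transitions separates expert from agent behavior, and its confusion is used as the agent's reward.

```python
import torch
import torch.nn as nn

state_dim = 6
disc = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def disc_step(expert_pairs, agent_pairs):
    """One adversarial update; each row is a concatenated (s, s') pair."""
    logits_e, logits_a = disc(expert_pairs), disc(agent_pairs)
    loss = bce(logits_e, torch.ones_like(logits_e)) + \
           bce(logits_a, torch.zeros_like(logits_a))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def imitation_reward(s, s_next):
    """Reward the agent when the discriminator mistakes it for the expert."""
    with torch.no_grad():
        d = torch.sigmoid(disc(torch.cat([s, s_next], dim=-1)))
    return -torch.log(1.0 - d + 1e-8)

expert = torch.randn(32, 2 * state_dim)   # placeholder demonstration batch
agent = torch.randn(32, 2 * state_dim)    # placeholder agent batch
disc_step(expert, agent)
```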
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models [40.08137765886609]
We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics.
Our approach is able to obtain high returns, while allowing fast execution during deployment by avoiding test time policy gradient optimization.
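A very rough toy of a graph-structured dynamics surrogate (our construction, not the GSSM architecture): state dimensions are graph nodes, and one round of message passing over a fixed adjacency feeds a per-node residual update that predicts the next state.

```python
import numpy as np

rng = np.random.default_rng(3)
n_nodes, feat = 4, 3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)           # chain-structured graph

W_msg = rng.normal(scale=0.1, size=(feat, feat))      # message network
W_upd = rng.normal(scale=0.1, size=(2 * feat, feat))  # node update network

def predict_next(node_states):
    """node_states: (n_nodes, feat) -> predicted next node states."""
    messages = adj @ (node_states @ W_msg)            # sum of neighbor messages
    h = np.concatenate([node_states, messages], axis=1)
    return node_states + np.tanh(h @ W_upd)           # residual dynamics update

x = rng.normal(size=(n_nodes, feat))
print(predict_next(x).shape)                          # (4, 3)
```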
arXiv Detail & Related papers (2021-02-16T17:21:55Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
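One simple instantiation of directing prediction toward task-relevant information (an assumption, not the paper's exact objective): weight the dynamics loss by each state dimension's relevance to the current goal, so gradients vanish on irrelevant quantities.

```python
import torch

def goal_aware_loss(pred_next, true_next, goal_mask):
    """goal_mask: per-dimension relevance weights in [0, 1]."""
    return (goal_mask * (pred_next - true_next) ** 2).mean()

pred = torch.randn(16, 8, requires_grad=True)
true = torch.randn(16, 8)
mask = torch.tensor([1., 1., 1., 0., 0., 0., 0., 0.])  # only 3 dims matter
loss = goal_aware_loss(pred, true, mask)
loss.backward()   # gradients vanish on task-irrelevant dimensions
```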
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
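A compact sketch of the core trick (shapes and networks are illustrative): because the learned model is differentiable, the imagined return can be backpropagated through short model rollouts into the policy parameters, rather than treating the model as a black-box simulator.

```python
import torch
import torch.nn as nn

state_dim, action_dim, horizon, gamma = 4, 2, 5, 0.99

# Learned, differentiable model components (in practice fit to real data).
dynamics = nn.Linear(state_dim + action_dim, state_dim)
reward = nn.Linear(state_dim + action_dim, 1)
policy = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(),
                       nn.Linear(32, action_dim), nn.Tanh())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

s = torch.randn(64, state_dim)          # batch of imagined start states
opt.zero_grad()
ret = torch.zeros(())
for t in range(horizon):                # short imagined rollout
    a = policy(s)
    sa = torch.cat([s, a], dim=-1)
    ret = ret + (gamma ** t) * reward(sa).mean()
    s = dynamics(sa)                    # the path stays differentiable

(-ret).backward()                       # gradient flows through the model
opt.step()
```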
arXiv Detail & Related papers (2020-05-16T19:18:10Z)