MBB: Model-Based Baseline for Global Guidance of Model-Free Reinforcement Learning via Lower-Dimensional Solutions
- URL: http://arxiv.org/abs/2011.02073v4
- Date: Sat, 23 Oct 2021 00:28:56 GMT
- Title: MBB: Model-Based Baseline for Global Guidance of Model-Free Reinforcement Learning via Lower-Dimensional Solutions
- Authors: Xubo Lyu, Site Li, Seth Siriya, Ye Pu, Mo Chen
- Abstract summary: We show how to solve complex robotic tasks with high-dimensional (hi-dim) state spaces.
First, we compute a low-dimensional (lo-dim) value function for the lo-dim version of the problem.
Then, the lo-dim value function is used as a baseline function to warm-start the model-free RL process.
- Score: 8.6216807235051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One spectrum on which robotic control paradigms lie is the degree to which a
model of the environment is involved, from methods that are completely
model-free such as model-free RL, to methods that require a known model such as
optimal control, with other methods such as model-based RL somewhere in the
middle. On one end of the spectrum, model-free RL can learn control policies
for high-dimensional (hi-dim), complex robotic tasks through trial-and-error
without knowledge of a model of the environment, but tends to require a large
amount of data. On the other end, "classical methods" such as optimal control
generate solutions without collecting data, but assume that an accurate model
of the system and environment is known and are mostly limited to problems with
low-dimensional (lo-dim) state spaces. In this paper, we bring the two ends of
the spectrum together. Although models of hi-dim systems and environments may
not exist, lo-dim approximations of these systems and environments are widely
available, especially in robotics. Therefore, we propose to solve hi-dim,
complex robotic tasks in two stages. First, assuming a coarse model of the
hi-dim system, we compute a lo-dim value function for the lo-dim version of the
problem using classical methods (e.g., value iteration and optimal control).
Then, the lo-dim value function is used as a baseline function to warm-start
the model-free RL process that learns hi-dim policies. The lo-dim value
function provides global guidance for model-free RL, alleviating the data
inefficiency of model-free RL. We demonstrate our approach on two robot
learning tasks with hi-dim state spaces and observe significant improvement in
policy performance and learning efficiency. We also give an empirical analysis
of our method with a third task.
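As a concrete illustration of the two-stage recipe, here is a minimal sketch in Python/NumPy: value iteration on a toy lo-dim grid model, whose value function then serves as the baseline in a model-free policy-gradient update. The grid model, the projection from hi-dim to lo-dim states, and all names are illustrative assumptions of this sketch, not the paper's code.
```python
import numpy as np

# --- Stage 1: value iteration on a coarse lo-dim model (a 1-D grid world) ---
N_CELLS, GOAL, GAMMA = 11, 10, 0.95
ACTIONS = (-1, +1)  # move left or right one cell

def value_iteration(tol=1e-6):
    V = np.zeros(N_CELLS)
    while True:
        V_new = np.empty_like(V)
        for s in range(N_CELLS):
            if s == GOAL:
                V_new[s] = 0.0  # absorbing goal state
                continue
            # reward of -1 per step drives the agent toward the goal
            V_new[s] = max(-1.0 + GAMMA * V[int(np.clip(s + a, 0, N_CELLS - 1))]
                           for a in ACTIONS)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

V_lo = value_iteration()

# --- Stage 2: use the lo-dim value function as a baseline for model-free RL ---
def project(hi_dim_state):
    """Map a hi-dim state to its lo-dim abstraction (here: a grid cell).
    This projection is an assumption of the sketch; the paper's tasks use
    task-specific lo-dim versions of the state."""
    return int(np.clip(round(hi_dim_state[0]), 0, N_CELLS - 1))

def advantage(returns, hi_dim_states):
    """Monte-Carlo returns minus the lo-dim baseline V_lo(project(s)).
    Subtracting a state-dependent baseline keeps the policy gradient
    unbiased while the lo-dim values give global guidance from episode one."""
    baselines = np.array([V_lo[project(s)] for s in hi_dim_states])
    return np.asarray(returns) - baselines
```
Because only the baseline term changes, this drops into any advantage-based policy-gradient learner; the warm start comes from the baseline being informative before any hi-dim data is collected.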
Related papers
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
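Model-based value expansion, as named in this summary, can be sketched as a k-step rollout under the learned dynamics model, bootstrapped with a critic. The function names and signatures below are assumptions of this sketch, not MOTO's implementation.
```python
# Illustrative k-step model-based value expansion target.
# model(s, a) -> (next_state, reward), policy(s) -> action, critic(s) -> value
# are assumed callables; none of these names come from the paper.
def value_expansion_target(state, model, policy, critic, k=5, gamma=0.99):
    ret, discount, s = 0.0, 1.0, state
    for _ in range(k):
        a = policy(s)                  # act with the current policy
        s, r = model(s, a)             # learned model predicts next state, reward
        ret += discount * r
        discount *= gamma
    return ret + discount * critic(s)  # bootstrap beyond the model horizon
```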
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Pretty darn good control: when are approximate solutions better than approximate models [0.0]
We show that DRL algorithms can successfully approximate solutions in a non-linear three-variable model for a fishery.
We show that the policy obtained with DRL is both more profitable and more sustainable than any constant mortality policy.
arXiv Detail & Related papers (2023-08-25T19:58:17Z)
- Learning Environment Models with Continuous Stochastic Dynamics [0.0]
We aim to provide insights into the decisions faced by an agent by learning an automaton model of the environment's behavior under the agent's control.
In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics.
We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot.
arXiv Detail & Related papers (2023-06-29T12:47:28Z)
- Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models [13.077993395762185]
Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences.
We study the benefits and challenges of using a learned dynamics model when performing PbRL.
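A common ingredient of PbRL, though not necessarily this paper's exact formulation, is a reward model trained on pairwise trajectory preferences with a Bradley-Terry style loss; a sketch in PyTorch with assumed names and shapes:
```python
import torch

def preference_loss(reward_model, seg_a, seg_b, prefer_a):
    """Bradley-Terry preference loss over two trajectory segments.
    seg_a, seg_b: tensors of shape (T, state_dim); prefer_a: True if the
    human preferred segment A. Illustrative, not this paper's code."""
    r_a = reward_model(seg_a).sum()  # total predicted reward of segment A
    r_b = reward_model(seg_b).sum()  # total predicted reward of segment B
    # P(A preferred) = exp(r_a) / (exp(r_a) + exp(r_b)); cross-entropy vs. label
    logits = torch.stack([r_a, r_b]).unsqueeze(0)
    target = torch.tensor([0 if prefer_a else 1])
    return torch.nn.functional.cross_entropy(logits, target)
```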
arXiv Detail & Related papers (2023-01-11T22:22:54Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
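Merging in parameter space can be illustrated with the simplest baseline, element-wise averaging of weights across models that share an architecture; the paper's actual merging method is more refined than this sketch.
```python
import torch

def average_merge(state_dicts):
    """Element-wise average of PyTorch state dicts from models with one shared
    architecture. Plain averaging is only the baseline a dataless-fusion
    method would improve upon; shown here to make 'parameter space' concrete."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = torch.stack(
            [sd[name].float() for sd in state_dicts]).mean(dim=0)
    return merged
```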
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- CostNet: An End-to-End Framework for Goal-Directed Reinforcement Learning [9.432068833600884]
Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment.
Two approaches, model-based and model-free reinforcement learning, have shown concrete results in several disciplines.
This paper introduces a novel reinforcement learning algorithm for predicting the distance between two states in a Markov Decision Process.
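One way to realize such a state-pair distance predictor is a small network regressing observed step counts between states from rollouts; the architecture and training signal below are assumptions of this sketch, not CostNet's design.
```python
import torch
import torch.nn as nn

class DistanceNet(nn.Module):
    """Predict the expected number of steps from s_from to s_to.
    Purely illustrative; not the architecture from the paper."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s_from, s_to):
        return self.net(torch.cat([s_from, s_to], dim=-1)).squeeze(-1)

def train_step(net, opt, s_from, s_to, steps_observed):
    # regress step counts observed between state pairs in collected rollouts
    loss = nn.functional.mse_loss(net(s_from, s_to), steps_observed)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```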
arXiv Detail & Related papers (2022-10-03T21:16:14Z)
- An Experimental Design Perspective on Model-Based Reinforcement Learning [73.37942845983417]
In practical applications of RL, it is expensive to observe state transitions from the environment.
We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process.
arXiv Detail & Related papers (2021-12-09T23:13:57Z)
- Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow [14.422129911404472]
Bellman aims to fill this gap and introduces the first thoroughly designed and tested model-based RL toolbox.
Our modular approach makes it possible to combine a wide range of environment models with generic model-based agent classes that recover state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-26T11:32:27Z)
- Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching.
Our approach learns entirely using offline, unlabeled data.
We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
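The penalty described here has a simple form, replacing the model's reward r(s, a) with r(s, a) - lambda * u(s, a), where u estimates the dynamics model's uncertainty. A sketch using ensemble disagreement as u, which is one common estimator; names and shapes are assumptions, not MOPO's code.
```python
import numpy as np

def penalized_reward(reward, ensemble_next_states, lam=1.0):
    """MOPO-style pessimistic reward.
    reward: scalar reward predicted by the learned model.
    ensemble_next_states: (n_models, state_dim) predictions for one (s, a).
    Larger disagreement between ensemble members => larger penalty."""
    disagreement = np.linalg.norm(
        ensemble_next_states - ensemble_next_states.mean(axis=0), axis=1).max()
    return reward - lam * disagreement
```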
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)