Should Models Be Accurate?
- URL: http://arxiv.org/abs/2205.10736v1
- Date: Sun, 22 May 2022 04:23:54 GMT
- Title: Should Models Be Accurate?
- Authors: Esra'a Saleh, John D. Martin, Anna Koop, Arash Pourzarabi, Michael
Bowling
- Abstract summary: We focus our investigations on Dyna-style planning in a prediction setting.
We introduce a meta-learning algorithm for training models with a focus on their usefulness to the learner instead of their accuracy in modelling the environment.
Our experiments show that our algorithm enables faster learning than even using an accurate model built with domain-specific knowledge of the non-stationarity.
- Score: 14.044354912031864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based Reinforcement Learning (MBRL) holds promise for data-efficiency
by planning with model-generated experience in addition to learning with
experience from the environment. However, in complex or changing environments,
models in MBRL will inevitably be imperfect, and their detrimental effects on
learning can be difficult to mitigate. In this work, we question whether the
objective of these models should be the accurate simulation of environment
dynamics at all. We focus our investigations on Dyna-style planning in a
prediction setting. First, we highlight and support three motivating points: a
perfectly accurate model of environment dynamics is not practically achievable,
is not necessary, and is not always the most useful anyways. Second, we
introduce a meta-learning algorithm for training models with a focus on their
usefulness to the learner instead of their accuracy in modelling the
environment. Our experiments show that in a simple non-stationary environment,
our algorithm enables faster learning than even using an accurate model built
with domain-specific knowledge of the non-stationarity.
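To make the planning setup concrete, below is a minimal, hypothetical sketch of Dyna-style planning for prediction in which the model is scored by its usefulness to the learner (post-planning TD error on real transitions) rather than by how accurately it reproduces the environment. The tabular chain, the step sizes, and the random-search model update are assumptions for illustration; the paper instead meta-learns the model, and this stand-in only conveys the shape of the usefulness objective.
```python
import numpy as np

# Illustrative sketch only: a tabular Dyna-style prediction learner.
# The "environment" is a small deterministic chain; the learner updates state
# values with TD(0) from real experience and from model-generated experience.
# The model is judged by its *usefulness* to the learner (post-planning TD
# error on real transitions), not by how accurately it matches the dynamics.

rng = np.random.default_rng(0)
n_states, gamma, alpha = 5, 0.9, 0.1

P_env = rng.integers(n_states, size=n_states)    # true next state per state
R_env = rng.normal(size=n_states)                # true reward per state

v = np.zeros(n_states)                           # learner's value estimates
model_next = np.zeros(n_states, dtype=int)       # model: predicted next state
model_reward = np.zeros(n_states)                # model: predicted reward

def td_update(v, s, r, s_next):
    """One TD(0) prediction update."""
    v = v.copy()
    v[s] += alpha * (r + gamma * v[s_next] - v[s])
    return v

def plan(v, n_planning=5):
    """Dyna-style planning: TD updates on model-generated transitions."""
    for s in rng.integers(n_states, size=n_planning):
        v = td_update(v, s, model_reward[s], model_next[s])
    return v

def usefulness(v, real_transitions):
    """Lower post-planning TD error on *real* transitions means planning
    with the current model actually helped the learner."""
    v_planned = plan(v)
    return np.mean([(r + gamma * v_planned[s2] - v_planned[s]) ** 2
                    for s, r, s2 in real_transitions])

buffer, s = [], 0
for step in range(200):
    s2, r = P_env[s], R_env[s]
    buffer.append((s, r, s2))
    v = td_update(v, s, r, s2)

    # Placeholder model update: among a few perturbed candidate models, keep
    # the one the learner finds most useful. The paper instead meta-learns the
    # model against a usefulness objective of this flavour.
    candidates = [(model_next.copy(), model_reward.copy())]
    for _ in range(3):
        cn, cr = model_next.copy(), model_reward.copy()
        i = rng.integers(n_states)
        cn[i], cr[i] = rng.integers(n_states), rng.normal()
        candidates.append((cn, cr))
    scores = []
    for cn, cr in candidates:
        model_next, model_reward = cn, cr
        scores.append(usefulness(v, buffer[-20:]))
    model_next, model_reward = candidates[int(np.argmin(scores))]

    v = plan(v)
    s = s2
```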
Related papers
- HarmonyDream: Task Harmonization Inside World Models [93.07314830304193]
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning.
We propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization.
arXiv Detail & Related papers (2023-09-30T11:38:13Z)
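The automatic loss-coefficient adjustment mentioned above can be pictured with a small, hedged sketch: each world-model loss term is rescaled by the inverse of its running average so that no single term (e.g., observation reconstruction) dominates training. This is a generic auto-balancing heuristic with assumed names and scales, not necessarily HarmonyDream's exact scheme.
```python
import numpy as np

# Illustrative sketch: automatically rebalancing a world model's loss terms.
# Here the raw observation loss is ~100x the reward loss, and each term is
# rescaled by the inverse of its running average so both contribute on a
# comparable scale. Generic heuristic, not HarmonyDream's published method.

rng = np.random.default_rng(0)
ema = {"obs": 1.0, "reward": 1.0}     # running averages of each raw loss
decay = 0.99

def harmonized_loss(raw_losses):
    """Rescale each term by 1 / EMA(term) so all terms contribute ~equally."""
    total = 0.0
    for name, value in raw_losses.items():
        ema[name] = decay * ema[name] + (1 - decay) * value
        total += value / (ema[name] + 1e-8)
    return total

for step in range(1000):
    # Stand-ins for the model's raw losses at this training step.
    raw = {"obs": 100.0 * rng.uniform(0.5, 1.5), "reward": rng.uniform(0.5, 1.5)}
    loss = harmonized_loss(raw)   # both terms now carry similar weight
```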
- Learning Environment Models with Continuous Stochastic Dynamics [0.0]
We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of an agent.
In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics.
We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot.
arXiv Detail & Related papers (2023-06-29T12:47:28Z)
- Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning [56.50123642237106]
Common practice in model-based reinforcement learning is to learn models that model every aspect of the agent's environment.
We argue that such models are not particularly well-suited for performing scalable and robust planning in lifelong reinforcement learning scenarios.
We propose new kinds of models that only model the relevant aspects of the environment, which we call "minimal value-equivalent partial models".
arXiv Detail & Related papers (2023-01-24T16:40:01Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, we are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Physics-informed Dyna-Style Model-Based Deep Reinforcement Learning for Dynamic Control [1.8275108630751844]
We propose to leverage prior knowledge of the underlying physics of the environment, where the governing laws are (partially) known.
By incorporating the prior information of the environment, the quality of the learned model can be notably improved.
arXiv Detail & Related papers (2021-07-31T02:19:36Z)
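One common way to realize the physics prior described above is a hybrid model: a (partially) known physics step plus a learned residual correction fit to real transitions. The sketch below illustrates that pattern with assumed dynamics, names, and data; it is not claimed to be the cited paper's architecture.
```python
import numpy as np

# Illustrative sketch: a hybrid dynamics model for Dyna-style planning that
# combines a (partially) known physics step with a learned residual
# correction. All quantities here are assumptions made for illustration.

rng = np.random.default_rng(0)
dt = 0.05

def physics_step(state, action):
    """Known (simplified) physics: a point mass with position and velocity."""
    pos, vel = state
    return np.array([pos + vel * dt, vel + action * dt])

def fit_residual(states, actions, next_states):
    """Fit a linear correction on top of the physics prediction by least
    squares (a stand-in for a neural residual model)."""
    X = np.column_stack([states, actions])
    Y = next_states - np.array([physics_step(s, a) for s, a in zip(states, actions)])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def hybrid_model(state, action, W):
    """Model used for planning: physics prior plus learned correction."""
    x = np.concatenate([state, [action]])
    return physics_step(state, action) + x @ W

# Fake real data with unmodelled friction, so the residual has something to learn.
states = rng.normal(size=(200, 2))
actions = rng.normal(size=200)
next_states = np.array([physics_step(s, a) for s, a in zip(states, actions)])
next_states[:, 1] += -0.1 * states[:, 1] * dt   # friction unknown to the prior
W = fit_residual(states, actions, next_states)
```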
- Model-Advantage Optimization for Model-Based Reinforcement Learning [41.13567626667456]
Model-based Reinforcement Learning (MBRL) algorithms have been traditionally designed with the goal of learning accurate dynamics of the environment.
Value-aware model learning, an alternative model-learning paradigm to maximum likelihood, proposes to inform model-learning through the value function of the learnt policy.
We propose a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models.
arXiv Detail & Related papers (2021-06-26T20:01:28Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Objective Mismatch in Model-based Reinforcement Learning [14.92062504466269]
Model-based reinforcement learning (MBRL) has been shown to be a powerful framework for data-efficiently learning control of continuous tasks.
We identify a fundamental issue of the standard MBRL framework -- what we call the objective mismatch issue.
We propose an initial method to mitigate the mismatch issue by re-weighting dynamics model training.
arXiv Detail & Related papers (2020-02-11T16:26:07Z)
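Several entries above (value-aware model learning, objective mismatch, goal-aware prediction) share one mechanism: shifting the model's training signal toward the experience that matters to the learner. The sketch below illustrates that idea with a value-weighted model loss; the linear setup, TD-error-based weights, and hyperparameters are illustrative assumptions, not any of these papers' published objectives.
```python
import numpy as np

# Illustrative sketch: re-weighting a dynamics model's training loss toward
# transitions that matter under the learner's current value estimates, instead
# of fitting every transition equally (maximum likelihood).

rng = np.random.default_rng(1)
n_features, n_samples, gamma, lr = 4, 256, 0.9, 0.05

# Fake batch of transitions in feature space: states S, successors S_next, rewards R.
A_true = 0.5 * rng.normal(size=(n_features, n_features))
S = rng.normal(size=(n_samples, n_features))
S_next = S @ A_true + 0.1 * rng.normal(size=(n_samples, n_features))
R = rng.normal(size=n_samples)

w_value = rng.normal(size=n_features)          # current value function v(s) = s @ w_value
W_model = np.zeros((n_features, n_features))   # linear dynamics model: s' ~ s @ W_model

# Weight each transition by the magnitude of its TD signal under the current
# value function: transitions the learner cares about get fit more carefully.
td_signal = np.abs(R + gamma * (S_next @ w_value) - S @ w_value)
weights = td_signal / td_signal.sum()

for _ in range(500):
    err = S @ W_model - S_next                   # per-transition model error
    grad = S.T @ (err * weights[:, None])        # value-weighted squared-error gradient
    W_model -= lr * grad

# Uniform weights (np.full(n_samples, 1 / n_samples)) would recover an ordinary
# accuracy-only fit, making the contrast with value-aware training explicit.
```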
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.