Offline Learning for Planning: A Summary
- URL: http://arxiv.org/abs/2010.01931v1
- Date: Mon, 5 Oct 2020 11:41:11 GMT
- Title: Offline Learning for Planning: A Summary
- Authors: Giorgio Angelotti, Nicolas Drougard, Caroline Ponzoni Carvalho Chanel
- Abstract summary: Training of autonomous agents often requires expensive and unsafe trial-and-error interactions with the environment.
Data sets containing recorded experiences of intelligent agents performing various tasks are accessible on the internet.
In this paper we adumbrate the ideas motivating the development of the state-of-the-art offline learning baselines.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The training of autonomous agents often requires expensive and unsafe
trial-and-error interactions with the environment. Nowadays, several data sets
containing recorded experiences of intelligent agents performing various tasks,
spanning from the control of unmanned vehicles to human-robot interaction and
medical applications, are accessible on the internet. With the intention of
limiting the costs of the learning procedure, it is convenient to exploit the
information that is already available rather than collecting new data.
Nevertheless, the inability to augment the batch can lead the autonomous
agents to develop far-from-optimal behaviours when the sampled experiences do
not allow for a good estimate of the true distribution of the environment.
Offline learning is the area of machine learning concerned with efficiently
obtaining an optimal policy with a batch of previously collected experiences
without further interaction with the environment. In this paper we adumbrate
the ideas motivating the development of the state-of-the-art offline learning
baselines. The listed methods consist of the introduction of
epistemic-uncertainty-dependent constraints during the classical resolution of a
Markov Decision Process, with and without function approximators, which aim to
alleviate the adverse effects of the distributional mismatch between the
available samples and the real world. We provide comments on the practical
utility of the theoretical bounds that justify the application of these
algorithms and suggest the use of Generative Adversarial Networks to estimate the
distributional shift that affects all of the proposed model-free and
model-based approaches.
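To make the flavour of these constraints concrete, the sketch below applies a count-based epistemic-uncertainty penalty during value iteration on a small tabular MDP estimated from the batch. It is a minimal illustration of the general idea under stated assumptions, not an implementation of any specific baseline from the paper; the state and action sizes, the penalty coefficient `BETA`, and the synthetic batch are all illustrative choices.

```python
import numpy as np

# Hypothetical tabular sizes and penalty weight (illustrative, not from the paper).
N_STATES, N_ACTIONS, GAMMA, BETA = 5, 2, 0.95, 1.0

def pessimistic_value_iteration(batch, n_iters=200):
    """Value iteration on the MLE model with a count-based epistemic penalty.

    batch: list of (s, a, r, s_next) tuples collected offline.
    """
    counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    rewards = np.zeros((N_STATES, N_ACTIONS))
    for s, a, r, s_next in batch:
        counts[s, a, s_next] += 1
        rewards[s, a] += r

    n_sa = counts.sum(axis=2)                      # visitation counts N(s, a)
    visited = n_sa > 0
    p_hat = np.zeros_like(counts)                  # MLE transition model
    p_hat[visited] = counts[visited] / n_sa[visited, None]
    r_hat = np.zeros_like(rewards)                 # MLE reward model
    r_hat[visited] = rewards[visited] / n_sa[visited]

    # Penalty ~ 1/sqrt(N(s,a)); unvisited pairs get the largest penalty.
    penalty = BETA / np.sqrt(np.maximum(n_sa, 1.0))
    penalty[~visited] = BETA

    v = np.zeros(N_STATES)
    for _ in range(n_iters):
        q = r_hat - penalty + GAMMA * p_hat @ v    # pessimistic Bellman backup
        v = q.max(axis=1)
    return q.argmax(axis=1), v

# Tiny synthetic batch, just to show the call signature.
rng = np.random.default_rng(0)
batch = [(int(rng.integers(N_STATES)), int(rng.integers(N_ACTIONS)),
          float(rng.normal()), int(rng.integers(N_STATES))) for _ in range(500)]
policy, values = pessimistic_value_iteration(batch)
print(policy, values)
```

Under-visited state-action pairs receive the largest penalty, so the resulting policy is biased towards the regions that the batch actually covers.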
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- Causal Action Influence Aware Counterfactual Data Augmentation [23.949113120847507]
We propose CAIAC, a data augmentation method that can create synthetic transitions from a fixed dataset without having access to online environment interactions.
By utilizing principled methods for quantifying causal influence, we are able to perform counterfactual reasoning by swapping $\it{action}$-unaffected parts of the state-space.
This leads to a substantial increase in robustness of offline learning algorithms against distributional shift.
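A minimal sketch of the counterfactual-swap idea described in this entry, under simplifying assumptions: the state is a plain feature vector, and the set of action-unaffected components is given by a fixed mask rather than estimated from a causal-influence measure as CAIAC does. All data, dimensions, and names are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical factored transitions: each row is (state, action, next_state),
# with a 4-dimensional state. Stand-in for a real offline dataset.
n, state_dim = 64, 4
states = rng.normal(size=(n, state_dim))
actions = rng.integers(0, 2, size=n)
next_states = states + rng.normal(scale=0.1, size=(n, state_dim))

# Assumed mask of state components the action does NOT influence.
# CAIAC estimates this from a causal-influence measure; here it is fixed.
action_unaffected = np.array([False, False, True, True])

def counterfactual_augment(states, actions, next_states, mask, n_aug):
    """Create synthetic transitions by swapping action-unaffected state factors
    between randomly paired transitions from the batch."""
    i = rng.integers(0, len(states), size=n_aug)   # provides the action-affected part
    j = rng.integers(0, len(states), size=n_aug)   # donates the action-unaffected part
    new_s, new_ns = states[i].copy(), next_states[i].copy()
    new_s[:, mask] = states[j][:, mask]
    new_ns[:, mask] = next_states[j][:, mask]
    return new_s, actions[i], new_ns

aug_s, aug_a, aug_ns = counterfactual_augment(states, actions, next_states,
                                              action_unaffected, n_aug=128)
print(aug_s.shape, aug_a.shape, aug_ns.shape)
```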
arXiv Detail & Related papers (2024-05-29T09:19:50Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- Model-based Offline Policy Optimization with Adversarial Network [0.36868085124383626]
We propose a novel Model-based Offline policy optimization framework with Adversarial Network (MOAN).
The key idea is to use adversarial learning to build a transition model with better generalization.
Our approach outperforms existing state-of-the-art baselines on widely studied offline RL benchmarks.
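The sketch below illustrates the adversarial ingredient in a stripped-down form: a logistic discriminator trained to separate batch transitions from model-generated ones, whose logit serves as a rough estimate of the distributional shift (the quantity the surveyed paper suggests estimating with GANs). It is not MOAN's architecture; the feature dimension, the synthetic data, and the plain gradient-descent training loop are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 6  # feature dimension of a flattened (s, a, s') transition, an illustrative choice

# Stand-ins: transitions from the offline batch vs. rollouts of a learned model.
real = rng.normal(loc=0.0, size=(256, dim))
fake = rng.normal(loc=0.5, size=(256, dim))   # model rollouts drift away from the data

def train_discriminator(real, fake, lr=0.1, epochs=500):
    """Logistic discriminator D(x) ~ P(x is a real batch transition)."""
    x = np.vstack([real, fake])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # sigmoid output
        grad = p - y                              # gradient of binary cross-entropy
        w -= lr * x.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def shift_estimate(x, w, b):
    """log D/(1-D): an estimated log density ratio of batch vs. model transitions."""
    return x @ w + b

w, b = train_discriminator(real, fake)
print(shift_estimate(fake[:5], w, b))  # negative values flag out-of-distribution rollouts
```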
arXiv Detail & Related papers (2023-09-05T11:49:33Z)
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent.
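A minimal sketch of the triggering mechanism, under assumed details: epistemic uncertainty is measured as the standard deviation across an ensemble of Q-value estimates, and minibatch elements whose uncertainty exceeds a hypothetical threshold are replaced with expert demonstrations. The ensemble values, buffers, and threshold are stand-ins, not the paper's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble of Q-value estimates for a minibatch of state-action pairs:
# shape (ensemble_size, batch_size). In practice these would come from independently
# trained Q-networks; here they are random stand-ins.
q_ensemble = rng.normal(size=(5, 32))
UNCERTAINTY_THRESHOLD = 0.8  # illustrative value, not from the paper

def needs_expert_data(q_ensemble, threshold):
    """Flag minibatch elements whose epistemic uncertainty (ensemble std) is high."""
    return q_ensemble.std(axis=0) > threshold

def mix_batch(offline_batch, expert_buffer, flags, rng):
    """Replace the flagged fraction of the offline minibatch with expert demonstrations."""
    n_expert = int(flags.sum())
    if n_expert == 0:
        return offline_batch
    idx = rng.integers(0, len(expert_buffer), size=n_expert)
    mixed = offline_batch.copy()
    mixed[np.flatnonzero(flags)] = expert_buffer[idx]
    return mixed

offline_batch = rng.normal(size=(32, 4))   # stand-in transitions
expert_buffer = rng.normal(size=(100, 4))  # stand-in human demonstrations
flags = needs_expert_data(q_ensemble, UNCERTAINTY_THRESHOLD)
print(mix_batch(offline_batch, expert_buffer, flags, rng).shape)
```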
arXiv Detail & Related papers (2022-12-16T01:41:59Z)
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with the finite-sample suboptimality guarantee of finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Causal Reinforcement Learning using Observational and Interventional Data [14.856472820492364]
Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs.
We consider a scenario where the learning agent has the ability to collect online experiences through direct interactions with the environment.
We then ask the following question: can the online and offline experiences be safely combined for learning a causal model?
arXiv Detail & Related papers (2021-06-28T06:58:20Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.