An Offline Time-aware Apprenticeship Learning Framework for Evolving Reward Functions
- URL: http://arxiv.org/abs/2305.09070v1
- Date: Mon, 15 May 2023 23:51:07 GMT
- Title: An Offline Time-aware Apprenticeship Learning Framework for Evolving Reward Functions
- Authors: Xi Yang, Ge Gao, Min Chi
- Abstract summary: Apprenticeship learning (AL) is a process of inducing effective decision-making policies via observing and imitating experts' demonstrations.
Most existing AL approaches are not designed to cope with the evolving reward functions commonly found in human-centric tasks such as healthcare.
We propose an offline Time-aware Hierarchical EM Energy-based Sub-trajectory (THEMES) AL framework to tackle the evolving reward functions in such tasks.
- Score: 19.63724590121946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Apprenticeship learning (AL) is a process of inducing effective
decision-making policies via observing and imitating experts' demonstrations.
Most existing AL approaches, however, are not designed to cope with the
evolving reward functions commonly found in human-centric tasks such as
healthcare, where offline learning is required. In this paper, we propose an
offline Time-aware Hierarchical EM Energy-based Sub-trajectory (THEMES) AL
framework to tackle the evolving reward functions in such tasks. The
effectiveness of THEMES is evaluated via a challenging task -- sepsis
treatment. The experimental results demonstrate that THEMES can significantly
outperform competitive state-of-the-art baselines.
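To make the problem setting concrete, the following is a minimal, hypothetical sketch of apprenticeship learning with an evolving reward: expert sub-trajectories are softly assigned to latent reward phases by an EM-style loop over a linear reward model r_k(s) = w_k . phi(s). The function names, the featurization, and the update rule here are illustrative assumptions; this is not the THEMES algorithm.

```python
import numpy as np

# Hypothetical illustration only: cluster expert sub-trajectories into latent
# "reward phases" with an EM-style loop, modeling an evolving linear reward
# r_k(s) = w_k . phi(s). This is NOT the THEMES algorithm, just a sketch of
# the general idea of reward functions that change across phases.

rng = np.random.default_rng(0)

def subtrajectory_features(subtraj):
    # phi: mean state features over a sub-trajectory (assumed featurization)
    return np.asarray(subtraj).mean(axis=0)

def em_reward_phases(subtrajs, n_phases=2, n_iters=20):
    feats = np.stack([subtrajectory_features(t) for t in subtrajs])  # (N, d)
    d = feats.shape[1]
    weights = rng.normal(size=(n_phases, d))           # reward weights per phase

    for _ in range(n_iters):
        # E-step: softly assign each sub-trajectory to the phase whose reward
        # weights score its features highest (softmax over w_k . phi).
        scores = feats @ weights.T                      # (N, K)
        scores -= scores.max(axis=1, keepdims=True)
        resp = np.exp(scores)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: move each phase's weights toward the responsibility-weighted
        # mean feature direction (a crude feature-matching update).
        for k in range(n_phases):
            mean_feat = (resp[:, k:k + 1] * feats).sum(axis=0) / (resp[:, k].sum() + 1e-8)
            weights[k] = mean_feat / (np.linalg.norm(mean_feat) + 1e-8)

    return weights, resp

# Toy usage: 10 sub-trajectories of 5 steps with 3-dimensional state features.
demo_subtrajs = [rng.normal(size=(5, 3)) for _ in range(10)]
phase_weights, assignments = em_reward_phases(demo_subtrajs)
print(phase_weights.shape, assignments.shape)  # (2, 3) (10, 2)
```

Per its name, the actual framework couples hierarchical EM with energy-based sub-trajectory modeling; the toy loop above only conveys the underlying idea of a reward that evolves across phases of a demonstration.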
Related papers
- GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via Stationary Distribution Correction Estimation [1.4703485217797363]
GO-DICE is an offline IL technique for goal-conditioned long-horizon sequential tasks.
Inspired by the expansive DICE family of techniques, policy learning at both levels takes place within the space of stationary distributions.
Experimental results substantiate that GO-DICE outperforms recent baselines.
arXiv Detail & Related papers (2023-12-17T19:47:49Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- CRISP: Curriculum Inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning [25.84621883831624]
We present CRISP, a novel HRL algorithm that generates a curriculum of achievable subgoals for evolving lower-level primitives.
CRISP uses the lower-level primitive to periodically perform data relabeling on a handful of expert demonstrations.
We show that CRISP demonstrates impressive generalization in real-world scenarios.
arXiv Detail & Related papers (2023-04-07T08:22:50Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping [94.89128390954572]
We propose a novel self-supervised learning phase on the pre-collected dataset to understand the structure and the dynamics of the model.
We evaluate our method on three continuous control tasks, and show that our model significantly outperforms existing approaches.
arXiv Detail & Related papers (2023-01-05T15:07:10Z)
- Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate, a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EfficientImitate (EI) achieves state-of-the-art results in both performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z)
- Delayed Reinforcement Learning by Imitation [31.932677462399468]
We present DIDA, a novel algorithm that learns how to act in a delayed environment from undelayed demonstrations.
We show that DIDA achieves high performance with remarkable sample efficiency on a variety of tasks.
arXiv Detail & Related papers (2022-05-11T15:27:33Z)
- Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework that leverages expert demonstrations of multiple auxiliary tasks in addition to a main task.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z)
- Addressing practical challenges in Active Learning via a hybrid query strategy [1.607440473560015]
We present a hybrid query strategy-based AL framework that addresses three practical challenges simultaneously: cold start, oracle uncertainty, and performance evaluation of the Active Learner.
The robustness of the proposed framework is evaluated across three different environments and industrial settings.
arXiv Detail & Related papers (2021-10-07T20:38:14Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning, resulting in an imitation gap.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration (a minimal loss-mixing sketch in this spirit appears after this list).
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
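As a rough illustration of the loss-mixing idea summarized in the ADVISOR entry above, here is a minimal, hypothetical sketch in which an imitation loss and an RL loss are combined per sample with a dynamic weight. The weighting rule, the function name, and the agreement signal are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of dynamically weighting an imitation loss and a
# reward-based RL loss per sample, in the spirit of the ADVISOR summary above.
# The weighting rule and all names are illustrative assumptions.

def mixed_loss(imitation_losses, rl_losses, expert_agreement):
    """Combine per-sample losses with a dynamic weight.

    imitation_losses : per-sample imitation losses (e.g. cross-entropy to expert)
    rl_losses        : per-sample reward-based RL losses
    expert_agreement : values in [0, 1]; how reliable imitating the expert is
                       believed to be for each state (given here, estimated in practice)
    """
    w = np.clip(expert_agreement, 0.0, 1.0)
    # High agreement -> lean on imitation; low agreement -> lean on RL/exploration.
    per_sample = w * imitation_losses + (1.0 - w) * rl_losses
    return per_sample.mean()

# Toy usage with random per-sample losses and agreement scores.
rng = np.random.default_rng(1)
loss = mixed_loss(rng.random(8), rng.random(8), rng.random(8))
print(float(loss))
```

Per the summary above, the weight is adjusted during training in the original method; here it is simply passed in to keep the example self-contained.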