TubeDAgger: Reducing the Number of Expert Interventions with Stochastic Reach-Tubes
- URL: http://arxiv.org/abs/2510.00906v1
- Date: Wed, 01 Oct 2025 13:45:16 GMT
- Title: TubeDAgger: Reducing the Number of Expert Interventions with Stochastic Reach-Tubes
- Authors: Julian Lemmel, Manuel Kranzl, Adam Lamine, Philipp Neubauer, Radu Grosu, Sophie A. Neubauer
- Abstract summary: The DAgger algorithm trains a robust novice policy by alternating between interacting with the environment and retraining the network. We propose the use of stochastic reach-tubes as a novel method for estimating the necessity of expert intervention.
- Score: 8.555610126960728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive Imitation Learning deals with training a novice policy from expert demonstrations in an online fashion. The established DAgger algorithm trains a robust novice policy by alternating between interacting with the environment and retraining the network. Many variants of DAgger exist that differ in how they decide whether to let the novice act or to return control to the expert. We propose the use of stochastic reach-tubes - common in the verification of dynamical systems - as a novel method for estimating the necessity of expert intervention. Our approach does not require fine-tuning of decision thresholds per environment and effectively reduces the number of expert interventions, especially compared with related approaches that rely on a doubt classification model.
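To make the intervention criterion concrete, here is a minimal sketch of a gating check driven by a sampled reach-tube. The names `reach_tube_radius` and `needs_expert`, and the Monte-Carlo tube estimate itself, are illustrative assumptions - the paper uses stochastic reach-tube techniques from dynamical-systems verification, not this naive sampling scheme:

```python
import numpy as np

def reach_tube_radius(dynamics, state, horizon, n_samples=100, noise_std=0.05, rng=None):
    """Estimate a stochastic reach-tube radius by forward-sampling noisy rollouts.

    Illustrative only: the tube radius at each step is taken as the maximum
    deviation of the sampled trajectories from the nominal (noise-free) rollout.
    """
    rng = rng or np.random.default_rng(0)
    nominal = [np.asarray(state, dtype=float)]
    for _ in range(horizon):
        nominal.append(dynamics(nominal[-1]))
    radii = np.zeros(horizon + 1)
    for _ in range(n_samples):
        # Perturb the initial state, then roll out with process noise.
        x = np.asarray(state, dtype=float) + rng.normal(0, noise_std, size=np.shape(state))
        for t in range(horizon + 1):
            radii[t] = max(radii[t], np.linalg.norm(x - nominal[t]))
            if t < horizon:
                x = dynamics(x) + rng.normal(0, noise_std, size=np.shape(state))
    return radii

def needs_expert(radii, safe_radius):
    """Return control to the expert if the tube ever leaves the safe region."""
    return bool(np.any(radii > safe_radius))
```

A stable system with small noise yields a narrow tube, so the novice keeps control; only when the tube exceeds the safe radius is the expert queried.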
Related papers
- From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation [59.27094165576015]
We propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces.
By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process.
We introduce specialized optimization strategies to decouple task-specific parameters and rebalance training dynamics, effectively resolving interference between diverse objectives in multi-task learning.
arXiv Detail & Related papers (2026-01-28T09:29:40Z) - Value Function Decomposition in Markov Recommendation Process [19.082512423102855]
We propose an online reinforcement learning framework to improve recommender performance.
We show that these two factors can be separately approximated by decomposing the original temporal difference loss.
The disentangled learning framework can achieve a more accurate estimation with faster learning and improved robustness against action exploration.
arXiv Detail & Related papers (2025-01-29T04:22:29Z) - Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective [23.79259400522239]
We propose a novel approach to address catastrophic forgetting in Continual Relation Extraction.
Our approach employs a prompt pool for each task, capturing variations within each task while enhancing cross-task variances.
arXiv Detail & Related papers (2024-12-11T11:00:33Z) - Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Experiments show significant performance improvements with our method compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
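The core relabeling idea described in the RLIF abstract - treating expert interventions themselves as a (negative) reward signal for off-policy RL - can be sketched as follows. The function names are hypothetical, not the paper's API:

```python
def intervention_reward(intervened: bool) -> float:
    """RLIF-style reward: an expert intervention is a negative-reward event;
    all other steps yield zero reward. (Sketch of the idea in the abstract,
    not the paper's exact formulation.)"""
    return -1.0 if intervened else 0.0

def relabel_trajectory(intervention_flags):
    """Convert per-step intervention flags into an RL reward sequence."""
    return [intervention_reward(f) for f in intervention_flags]
```

An off-policy learner trained on these relabeled rewards is pushed away from states that trigger interventions, without assuming the expert's corrective actions are near-optimal.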
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z) - Interactive Graph Convolutional Filtering [79.34979767405979]
Interactive Recommender Systems (IRS) have been increasingly used in various domains, including personalized article recommendation, social media, and online advertising.
These problems are exacerbated by the cold start problem and data sparsity problem.
Existing Multi-Armed Bandit methods, despite their carefully designed exploration strategies, often struggle to provide satisfactory results in the early stages.
Our proposed method extends interactive collaborative filtering into the graph model to enhance the performance of collaborative filtering between users and items.
arXiv Detail & Related papers (2023-09-04T09:02:31Z) - MEGA-DAgger: Imitation Learning with Multiple Imperfect Experts [7.4506213369860195]
MEGA-DAgger is a new DAgger variant that is suitable for interactive learning with multiple imperfect experts.
We demonstrate that the policy learned using MEGA-DAgger can outperform both the experts and policies learned using state-of-the-art interactive imitation learning algorithms.
arXiv Detail & Related papers (2023-03-01T16:40:54Z) - Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
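The sampling algorithm for unknown learning dynamics can be sketched roughly as follows: draw points from a candidate box and check that one update step keeps each of them inside. This is an illustrative simplification with hypothetical names, not the paper's actual verification procedure:

```python
import numpy as np

def is_trapping_region_sampled(step, low, high, n_samples=1000, rng=None):
    """Sampling-based check that the box [low, high] is a candidate trapping
    region: every sampled point must map back inside the box after one
    application of the (unknown) learning-dynamics update `step`."""
    rng = rng or np.random.default_rng(0)
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    pts = rng.uniform(low, high, size=(n_samples, low.size))
    for p in pts:
        q = step(p)
        if np.any(q < low) or np.any(q > high):
            return False  # a sampled point escaped the box
    return True
```

A contraction mapping keeps the box invariant, so the check passes; an expanding map sends sampled points outside and the candidate is rejected.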
arXiv Detail & Related papers (2023-02-27T14:47:52Z) - Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems [13.705147776518421]
State-of-the-art systems use a model-based approach to enable natural conversations.
We propose a scalable self-learning approach to explore routing alternatives.
arXiv Detail & Related papers (2022-04-14T17:46:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.