Sample-Efficient Expert Query Control in Active Imitation Learning via Conformal Prediction
- URL: http://arxiv.org/abs/2512.00453v1
- Date: Sat, 29 Nov 2025 11:58:21 GMT
- Title: Sample-Efficient Expert Query Control in Active Imitation Learning via Conformal Prediction
- Authors: Arad Firouzkouhi, Omid Mirzaeedodangeh, Lars Lindemann
- Abstract summary: We present Conformalized Rejection Sampling for Active Imitation Learning (CRSAIL). CRSAIL scores state novelty by the distance to the $K$-th nearest expert state. It reduces total expert queries by up to 96% vs. DAgger and up to 65% vs. prior AIL methods.
- Score: 2.344992278528697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active imitation learning (AIL) combats covariate shift by querying an expert during training. However, expert action labeling often dominates the cost, especially in GPU-intensive simulators, human-in-the-loop settings, and robot fleets that revisit near-duplicate states. We present Conformalized Rejection Sampling for Active Imitation Learning (CRSAIL), a querying rule that requests an expert action only when the visited state is under-represented in the expert-labeled dataset. CRSAIL scores state novelty by the distance to the $K$-th nearest expert state and sets a single global threshold via conformal prediction. This threshold is the empirical $(1-\alpha)$ quantile of on-policy calibration scores, providing a distribution-free calibration rule that links $\alpha$ to the expected query rate and makes $\alpha$ a task-agnostic tuning knob. This state-space querying strategy is robust to outliers and, unlike safety-gate-based AIL, can be run without real-time expert takeovers: we roll out full trajectories (episodes) with the learner and only afterward query the expert on a subset of visited states. Evaluated on MuJoCo robotics tasks, CRSAIL matches or exceeds expert-level reward while reducing total expert queries by up to 96% vs. DAgger and up to 65% vs. prior AIL methods, with empirical robustness to $\alpha$ and $K$, easing deployment on novel systems with unknown dynamics.
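The querying rule described in the abstract can be sketched in a few lines: score a visited state by its distance to the $K$-th nearest expert-labeled state, calibrate one global threshold as the empirical $(1-\alpha)$ quantile of on-policy calibration scores, and query the expert only when a state's score exceeds the threshold. The sketch below is illustrative only, reconstructed from the abstract rather than from the authors' code; the function names, the Euclidean metric, and the exact conformal quantile convention are assumptions.

```python
import numpy as np

def knn_novelty(state, expert_states, k):
    """Novelty score: distance from `state` to its k-th nearest expert state."""
    dists = np.linalg.norm(expert_states - state, axis=1)
    return np.sort(dists)[k - 1]

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: empirical (1 - alpha) quantile of the
    on-policy calibration scores, with the standard (n + 1) finite-sample
    correction."""
    n = len(cal_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, q, method="higher")

def should_query(state, expert_states, threshold, k):
    """Query the expert only when the visited state is under-represented
    in the expert-labeled dataset."""
    return knn_novelty(state, expert_states, k) > threshold
```

Because the threshold is a quantile of on-policy scores, $\alpha$ directly controls the expected fraction of visited states that trigger a query, which is what makes it a task-agnostic tuning knob.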
Related papers
- Matching Multiple Experts: On the Exploitability of Multi-Agent Imitation Learning [51.77462571479799]
Multi-agent imitation learning (MA-IL) aims to learn optimal policies from expert demonstrations of interactions in multi-agent interactive domains. Despite existing guarantees on the performance of the resulting learned policies, characterizations of how far the learned policies are from a Nash equilibrium are missing for offline MA-IL.
arXiv Detail & Related papers (2026-02-24T15:38:11Z) - UCB-type Algorithm for Budget-Constrained Expert Learning [71.67657715154034]
M-LCB is a UCB-style meta-algorithm that provides *anytime* regret guarantees. We show how M-LCB extends the classical bandit paradigm to the more realistic scenario of coordinating stateful, self-learning experts under limited resources.
arXiv Detail & Related papers (2025-10-26T12:36:17Z) - No Need for Learning to Defer? A Training Free Deferral Framework to Multiple Experts through Conformal Prediction [3.746889836344766]
We propose a training-free, model- and expert-agnostic framework for expert deferral based on conformal prediction. Our method consistently outperforms both the standalone model and the strongest expert.
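As a rough illustration of how conformal prediction can drive a training-free deferral rule (the paper's exact construction may differ; the nonconformity score and the singleton-set criterion below are assumptions), one can calibrate a split-conformal threshold on held-out data and defer to an expert whenever the classifier's prediction set is not a single label:

```python
import numpy as np

def conformal_defer(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal deferral sketch. Nonconformity score: 1 minus the
    softmax probability of the true class. Defer to an expert whenever the
    resulting conformal prediction set is not a singleton."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    tau = np.quantile(scores, q, method="higher")
    pred_sets = (1.0 - test_probs) <= tau   # one boolean row per test point
    return pred_sets.sum(axis=1) != 1       # True -> defer to the expert
```

No retraining is involved: only the base model's calibration-set probabilities are needed, which is what makes this style of deferral model- and expert-agnostic.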
arXiv Detail & Related papers (2025-09-16T02:01:21Z) - Offline Imitation Learning with Model-based Reverse Augmentation [48.64791438847236]
We propose a novel model-based framework, called offline Imitation Learning with Self-paced Reverse Augmentation.
Specifically, we build a reverse dynamic model from the offline demonstrations, which can efficiently generate trajectories leading to the expert-observed states.
We use the subsequent reinforcement learning method to learn from the augmented trajectories and transit from expert-unobserved states to expert-observed states.
arXiv Detail & Related papers (2024-06-18T12:27:02Z) - A Simple Solution for Offline Imitation from Observations and Examples
with Possibly Incomplete Trajectories [122.11358440078581]
Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable.
We propose Trajectory-Aware Learning from Observations (TAILO) to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available.
arXiv Detail & Related papers (2023-11-02T15:41:09Z) - Industrial Anomaly Detection and Localization Using Weakly-Supervised Residual Transformers [44.344548601242444]
We introduce a novel framework, Weakly-supervised RESidual Transformer (WeakREST), to achieve high anomaly detection accuracy. We reformulate the pixel-wise anomaly localization task into a block-wise classification problem. We develop a novel ResMixMatch algorithm, capable of handling the interplay between weak labels and residual-based representations.
arXiv Detail & Related papers (2023-06-06T08:19:30Z) - Active Ranking of Experts Based on their Performances in Many Tasks [72.96112117037465]
We consider the problem of ranking n experts based on their performances on d tasks.
We make a monotonicity assumption stating that for each pair of experts, one outperforms the other on all tasks.
arXiv Detail & Related papers (2023-06-05T06:55:39Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - DADAgger: Disagreement-Augmented Dataset Aggregation [0.0]
DAgger is an imitation learning algorithm that aggregates its training dataset by querying the expert on all samples encountered during training.
We propose a modification to DAgger, known as DADAgger, which only queries the expert for state-action pairs that are out of distribution.
arXiv Detail & Related papers (2023-01-03T20:44:14Z) - Skill-Based Reinforcement Learning with Intrinsic Reward Matching [77.34726150561087]
We present Intrinsic Reward Matching (IRM), which unifies task-agnostic skill pretraining and task-aware finetuning.
IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods.
arXiv Detail & Related papers (2022-10-14T00:04:49Z) - Fast rates for prediction with limited expert advice [0.0]
We investigate the problem of minimizing the excess generalization error with respect to the best expert prediction in a finite family under limited access to information.
We show that if we are allowed to see the advice of only one expert per round for $T$ rounds in the training phase, the worst-case excess risk is $\Omega(1/\sqrt{T})$ with probability lower bounded by a constant.
We design novel algorithms achieving this rate in this setting, and give precise instance-dependent bounds on the number of training rounds and queries needed to achieve a given generalization error precision.
arXiv Detail & Related papers (2021-10-27T14:57:36Z) - Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes [36.104981594178525]
We study a variant of the contextual bandit problem where an agent can intervene through a set of expert policies.
We propose the Divergence-based Upper Confidence Bound (D-UCB) algorithm that uses importance sampling to share information across experts.
We also provide the Empirical D-UCB (ED-UCB) algorithm that can function with only approximate knowledge of expert distributions.
arXiv Detail & Related papers (2021-07-07T14:58:14Z) - Meta-AAD: Active Anomaly Detection with Deep Reinforcement Learning [56.65934079419417]
High false-positive rate is a long-standing challenge for anomaly detection algorithms.
We propose Active Anomaly Detection with Meta-Policy (Meta-AAD), a novel framework that learns a meta-policy for query selection.
arXiv Detail & Related papers (2020-09-16T01:47:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.