Steering Robots with Inference-Time Interactions
- URL: http://arxiv.org/abs/2506.14287v1
- Date: Tue, 17 Jun 2025 07:59:07 GMT
- Title: Steering Robots with Inference-Time Interactions
- Authors: Yanwei Wang
- Abstract summary: When a pretrained policy makes errors during deployment, there are limited mechanisms for users to correct its behavior. My research proposes an alternative: keeping pretrained policies frozen as a fixed skill repertoire while allowing user interactions to guide behavior generation at inference time. Specifically, I propose (1) inference-time steering, which leverages user interactions to switch between discrete skills, and (2) task and motion imitation, which enables user interactions to edit continuous motions while satisfying task constraints defined by discrete symbolic plans.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Imitation learning has driven the development of generalist policies capable of autonomously solving multiple tasks. However, when a pretrained policy makes errors during deployment, there are limited mechanisms for users to correct its behavior. While collecting additional data for finetuning can address such issues, doing so for each downstream use case is inefficient at deployment time. My research proposes an alternative: keeping pretrained policies frozen as a fixed skill repertoire while allowing user interactions to guide behavior generation toward user preferences at inference time. By making pretrained policies steerable, users can help correct policy errors when the model struggles to generalize, without needing to finetune the policy. Specifically, I propose (1) inference-time steering, which leverages user interactions to switch between discrete skills, and (2) task and motion imitation, which enables user interactions to edit continuous motions while satisfying task constraints defined by discrete symbolic plans. These frameworks correct misaligned policy predictions without requiring additional training, maximizing the utility of pretrained models while achieving inference-time user objectives.
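To make the first framework concrete, here is a minimal sketch of inference-time steering under illustrative assumptions: the pretrained skills stay frozen, and a hypothetical 2-D point-and-click interaction selects which discrete skill generates the next action. All names (`InferenceTimeSteering`, `make_skill`, `user_point`) and the distance-based scoring rule are mine, not the thesis's implementation.

```python
import numpy as np

class InferenceTimeSteering:
    """Frozen pretrained skills; a user interaction picks which skill
    generates behavior at inference time, with no finetuning."""

    def __init__(self, skills):
        self.skills = skills  # name -> frozen policy: obs -> action

    def steer(self, obs, user_point):
        # Score each skill by how close its predicted motion lands to the
        # user's point-and-click hint, then commit to the best match.
        def score(policy):
            return -np.linalg.norm(policy(obs)[:2] - user_point)
        best = max(self.skills.values(), key=score)
        return best(obs)

# Toy frozen "skills": each drives the end-effector toward a fixed goal.
def make_skill(goal):
    return lambda obs: np.asarray(goal) - obs

steerable = InferenceTimeSteering({
    "reach_left":  make_skill([-1.0, 0.0]),
    "reach_right": make_skill([+1.0, 0.0]),
})
obs = np.zeros(2)
print(steerable.steer(obs, user_point=np.array([0.9, 0.1])))  # picks reach_right
```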
Related papers
- Customize Multi-modal RAI Guardrails with Precedent-based predictions [55.63757336900865]
A multi-modal guardrail must effectively filter image content based on user-defined policies. Existing fine-tuning methods typically condition predictions on pre-defined policies. We propose to condition the model's judgment on "precedents", which are the reasoning processes of prior data points similar to the given input.
arXiv Detail & Related papers (2025-07-28T03:45:34Z)
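A hypothetical nearest-neighbour reading of the precedent idea, assuming a toy embedding and a majority vote (the actual model would consume the retrieved reasoning traces as context rather than vote on them):

```python
import numpy as np

def judge_with_precedents(embed, precedents, x, k=3):
    """Condition a guardrail decision on 'precedents': the k stored cases
    most similar to the input, each carrying a reasoning trace and label."""
    e = embed(x)
    sims = [(float(e @ embed(p["content"])), p) for p in precedents]
    top = sorted(sims, key=lambda t: -t[0])[:k]
    votes = sum(p["flagged"] for _, p in top)
    return votes > k // 2, [p["reasoning"] for _, p in top]

# Toy usage with a bag-of-characters "embedding".
embed = lambda s: np.array([s.count(c) for c in "abcdefghijklmnopqrstuvwxyz"], float)
precedents = [
    {"content": "weapon photo", "flagged": True,  "reasoning": "violence policy"},
    {"content": "beach photo",  "flagged": False, "reasoning": "benign scenery"},
    {"content": "knife image",  "flagged": True,  "reasoning": "weapon close-up"},
]
print(judge_with_precedents(embed, precedents, "weapon image"))
```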
FDPP learns a reward function through preference-based learning. This reward is then used to fine-tune the pre-trained policy with reinforcement learning. Experiments demonstrate that FDPP effectively customizes policy behavior without compromising performance.
arXiv Detail & Related papers (2025-01-14T17:15:27Z)
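FDPP's first stage, learning a reward from human preferences, is typically a Bradley-Terry objective; here is a sketch under that assumption (`reward_fn` and its parameters are placeholders, not the paper's model):

```python
import numpy as np

def preference_loss(reward_fn, theta, traj_a, traj_b, prefer_a):
    """Bradley-Terry style loss for preference-based reward learning: the
    preferred trajectory should accumulate higher predicted reward."""
    ra, rb = reward_fn(theta, traj_a), reward_fn(theta, traj_b)
    # P(a preferred) = exp(ra) / (exp(ra) + exp(rb)); minimize its NLL.
    logp_a = ra - np.logaddexp(ra, rb)
    logp_b = rb - np.logaddexp(ra, rb)
    return -(logp_a if prefer_a else logp_b)

# Toy usage: linear reward on summed trajectory features.
reward = lambda theta, traj: float(theta @ traj.sum(axis=0))
loss = preference_loss(reward, np.array([1.0, -0.5]),
                       np.ones((5, 2)), np.zeros((5, 2)), prefer_a=True)
```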
- Inference-Time Policy Steering through Human Interactions [54.02655062969934]
During inference, humans are often removed from the policy execution loop. We propose an Inference-Time Policy Steering framework that leverages human interactions to bias the generative sampling process. Our proposed sampling strategy achieves the best trade-off between alignment and distribution shift.
arXiv Detail & Related papers (2024-11-25T18:03:50Z)
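One simple instantiation of biasing generative sampling is a ranking strategy over candidates (ITPS itself compares several strategies; names here are illustrative):

```python
import numpy as np

def steer_sampling(sample_traj, log_likelihood, user_hint, n=16, alpha=0.5):
    """Bias a frozen generative policy at inference time: draw candidate
    trajectories, then keep the one that best trades off user alignment
    against staying in-distribution."""
    candidates = [sample_traj() for _ in range(n)]
    def objective(traj):
        # Alignment: negative distance of the closest waypoint to the hint.
        align = -min(np.linalg.norm(p - user_hint) for p in traj)
        return alpha * align + (1 - alpha) * log_likelihood(traj)
    return max(candidates, key=objective)

# Toy usage: random-walk "trajectories" with a smoothness likelihood proxy.
rng = np.random.default_rng(0)
sample = lambda: rng.normal(size=(8, 2)).cumsum(axis=0)
loglik = lambda traj: -float(np.sum(np.diff(traj, axis=0) ** 2))
best = steer_sampling(sample, loglik, user_hint=np.array([2.0, 2.0]))
```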
- Dynamic Policy Fusion for User Alignment Without Re-Interaction [14.948610521764415]
Deep reinforcement learning policies may not align with the personal preferences of human users. We propose a more practical approach: adapting the already-trained policy to user-specific needs with the help of human feedback. We empirically demonstrate in a number of environments that our proposed dynamic policy fusion approach consistently achieves the intended task.
arXiv Detail & Related papers (2024-09-30T07:23:47Z)
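A generic reading of policy fusion as a weighted product of a frozen task policy and a feedback-derived intent policy; the paper adapts the fusion weight dynamically, which this toy version does not:

```python
import numpy as np

def fused_action(task_logits, intent_logits, w, rng):
    """Sample from a fusion of two discrete policies: a frozen task policy
    and an intent policy inferred from human feedback."""
    logits = (1 - w) * task_logits + w * intent_logits
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
a = fused_action(np.array([2.0, 0.1, 0.0]),   # task policy prefers action 0
                 np.array([0.0, 0.0, 2.0]),   # user feedback prefers action 2
                 w=0.5, rng=rng)
```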
- Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach [1.7205106391379026]
Foundation models are a promising path toward general-purpose and user-friendly robots.
In particular, the lack of modularity between tasks means that when model weights are updated, the behavior in other, unrelated tasks may be affected.
We present an alternative approach to the design of robot foundation models, which generates stand-alone, task-specific policies.
arXiv Detail & Related papers (2024-07-10T21:55:44Z)
- Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation [20.266695694005943]
Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments.
Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation.
We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts.
arXiv Detail & Related papers (2023-07-12T17:55:08Z)
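A simplified, feature-vector sketch of the augmentation step, assuming the user's feedback arrives as indices of task-irrelevant observation features (the paper works with richer concepts):

```python
import numpy as np

def augment_irrelevant(obs, irrelevant_idx, rng, n_copies=4, scale=1.0):
    """Once a user flags observation features as task-irrelevant, make the
    policy invariant to them by randomizing exactly those features in the
    training copies."""
    copies = np.repeat(obs[None, :], n_copies, axis=0)
    copies[:, irrelevant_idx] += rng.normal(scale=scale,
                                            size=(n_copies, len(irrelevant_idx)))
    return copies

rng = np.random.default_rng(0)
obs = np.array([0.5, -0.2, 1.0, 0.0])
batch = augment_irrelevant(obs, irrelevant_idx=[2, 3], rng=rng)  # user flagged 2, 3
```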
- Residual Q-Learning: Offline and Online Policy Customization without Value [53.47311900133564]
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations.
We formulate a new problem setting called policy customization.
We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy.
arXiv Detail & Related papers (2023-06-15T22:01:19Z)
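A tabular sketch of the residual recursion under a max-entropy prior, with illustrative names; see the paper for the actual derivation and guarantees:

```python
import numpy as np

def residual_q_update(q_res, log_prior, s, a, r_add, s2, gamma=0.99, lr=0.1):
    """One soft-Q update for a residual Q-table that customizes a frozen
    max-entropy prior policy with an add-on reward r_add."""
    # Soft value of s2 under the combined policy prior(a|s) * exp(q_res).
    v2 = np.log(np.sum(np.exp(log_prior[s2] + q_res[s2])))
    q_res[s, a] += lr * (r_add + gamma * v2 - q_res[s, a])

def customized_action(q_res, log_prior, s, rng):
    # The customized policy reshapes the prior by the learned residual.
    logits = log_prior[s] + q_res[s]
    p = np.exp(logits - logits.max())
    return int(rng.choice(len(p), p=p / p.sum()))

# Toy usage: 3 states, 2 actions, uniform prior.
q_res = np.zeros((3, 2)); log_prior = np.log(np.full((3, 2), 0.5))
residual_q_update(q_res, log_prior, s=0, a=1, r_add=1.0, s2=2)
```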
- To the Noise and Back: Diffusion for Shared Autonomy [2.341116149201203]
We present a new approach to shared autonomy that employs a modulation of the forward and reverse diffusion process of diffusion models.
Our framework learns a distribution over a space of desired behaviors.
It then employs a diffusion model to translate the user's actions to a sample from this distribution.
arXiv Detail & Related papers (2023-02-23T18:58:36Z)
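A sketch of the forward-then-reverse modulation, assuming a standard DDPM-style noise schedule; `denoise_step` stands in for the learned reverse model:

```python
import numpy as np

def shared_autonomy(user_action, denoise_step, k, n_steps=50, rng=None):
    """Partially run the forward diffusion on the user's action (k of
    n_steps), then denoise with a model trained on desired behaviors.
    Larger k pulls the action toward the desired distribution; smaller k
    preserves user intent."""
    rng = rng or np.random.default_rng()
    betas = np.linspace(1e-4, 0.02, n_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    # Forward process in closed form: jump straight to noise level k.
    x = (np.sqrt(alpha_bar[k]) * user_action
         + np.sqrt(1.0 - alpha_bar[k]) * rng.normal(size=user_action.shape))
    for t in range(k, 0, -1):          # Reverse process back to a clean action.
        x = denoise_step(x, t)
    return x
```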
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
arXiv Detail & Related papers (2022-11-20T21:57:10Z)
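A toy stand-in for the latent-action construction: an invertible one-layer affine map in place of a trained Normalizing Flow (real flows stack invertible coupling layers fit to the dataset's actions):

```python
import numpy as np

class AffineFlowDecoder:
    """Minimal invertible map standing in for a Normalizing Flow that
    decodes latent actions into environment actions; the agent picks
    latents z, and decoding keeps actions near the data manifold."""
    def __init__(self, scale, shift):
        self.scale, self.shift = np.asarray(scale), np.asarray(shift)
    def forward(self, z):               # latent -> action
        return z * self.scale + self.shift
    def inverse(self, a):               # action -> latent (exact inverse)
        return (a - self.shift) / self.scale

flow = AffineFlowDecoder(scale=[0.2, 0.2], shift=[0.0, 0.5])
z = np.tanh(np.array([0.3, -1.2]))      # bounded latent from a conservative policy
action = flow.forward(z)
```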
- PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive leaRning [5.046831208137847]
We present the PARTNR algorithm that can detect ambiguities in the trained policy by analyzing multiple modalities in the pick and place poses.
PARTNR employs an adaptive, sensitivity-based, gating function that decides if additional user demonstrations are required.
We demonstrate the performance of PARTNR in a table-top pick and place task.
arXiv Detail & Related papers (2022-11-15T17:07:40Z)
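A sketch of the sensitivity-based gate, assuming ambiguity is measured as the spread of sampled pose predictions; the threshold is an illustrative parameter:

```python
import numpy as np

def needs_demonstration(pose_samples, threshold=0.1):
    """Sample multiple pick/place pose predictions; if they spread across
    distant modes, the policy is ambiguous and the robot should request
    an additional user demonstration."""
    poses = np.asarray(pose_samples)            # (n_samples, pose_dim)
    spread = np.mean(np.linalg.norm(poses - poses.mean(axis=0), axis=1))
    return spread > threshold

samples = np.array([[0.10, 0.20], [0.72, 0.21], [0.11, 0.19]])  # two modes
print(needs_demonstration(samples))  # True -> ask for a demonstration
```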
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
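An illustrative decomposition of acting with imagined subgoals (the paper trains the high-level policy jointly with the low-level policy and its critic; these toy controllers only show the control flow):

```python
import numpy as np

def act_with_subgoal(high_policy, low_policy, state, goal):
    """Hierarchical step: a high-level policy 'imagines' an intermediate
    subgoal between the state and the final goal, and the low-level
    policy is conditioned on that subgoal instead of the distant goal."""
    subgoal = high_policy(state, goal)
    return low_policy(state, subgoal)

# Toy policies: midpoint subgoal, proportional controller toward it.
high = lambda s, g: (s + g) / 2.0
low = lambda s, sg: 0.5 * (sg - s)
state, goal = np.zeros(2), np.array([4.0, 0.0])
action = act_with_subgoal(high, low, state, goal)  # steps toward the midpoint
```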
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.