Mechanistic interpretability for steering vision-language-action models
- URL: http://arxiv.org/abs/2509.00328v1
- Date: Sat, 30 Aug 2025 03:01:57 GMT
- Title: Mechanistic interpretability for steering vision-language-action models
- Authors: Bear Häon, Kaylene Stocking, Ian Chuang, Claire Tomlin
- Abstract summary: Vision-Language-Action (VLA) models are a promising path to realizing generalist embodied agents. We introduce the first framework for interpreting and steering VLAs via their internal representations. We also introduce a general-purpose activation steering method that modulates behavior in real time, without fine-tuning, reward signals, or environment interaction.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-Language-Action (VLA) models are a promising path to realizing generalist embodied agents that can quickly adapt to new tasks, modalities, and environments. However, methods for interpreting and steering VLAs fall far short of classical robotics pipelines, which are grounded in explicit models of kinematics, dynamics, and control. This lack of mechanistic insight is a central challenge for deploying learned policies in real-world robotics, where robustness and explainability are critical. Motivated by advances in mechanistic interpretability for large language models, we introduce the first framework for interpreting and steering VLAs via their internal representations, enabling direct intervention in model behavior at inference time. We project feedforward activations within transformer layers onto the token embedding basis, identifying sparse semantic directions - such as speed and direction - that are causally linked to action selection. Leveraging these findings, we introduce a general-purpose activation steering method that modulates behavior in real time, without fine-tuning, reward signals, or environment interaction. We evaluate this method on two recent open-source VLAs, Pi0 and OpenVLA, and demonstrate zero-shot behavioral control in simulation (LIBERO) and on a physical robot (UR5). This work demonstrates that interpretable components of embodied VLAs can be systematically harnessed for control - establishing a new paradigm for transparent and steerable foundation models in robotics.
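The two mechanisms the abstract describes, projecting feedforward activations onto the token embedding basis and adding a semantic direction back into the activation at inference time, can be sketched as follows. This is an illustrative toy, not the paper's implementation; the vocabulary, embeddings, and scaling factor `alpha` are all assumptions.

```python
# Toy sketch of (1) "logit lens"-style projection of an activation onto a
# token-embedding basis, and (2) activation steering by adding a scaled
# semantic direction. All names and values here are illustrative.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto_vocab(activation, embedding_matrix, vocab):
    """Score each vocabulary token by its alignment with the activation."""
    scores = [dot(activation, embedding_matrix[i]) for i in range(len(vocab))]
    # Top-ranked tokens reveal the sparse semantic directions the paper exploits.
    return sorted(zip(vocab, scores), key=lambda p: -p[1])

def steer(activation, direction, alpha=2.0):
    """Add a scaled steering direction to the activation (no fine-tuning)."""
    return [a + alpha * d for a, d in zip(activation, direction)]

# Toy 3-d example: one token embedding aligns with a "fast" direction.
vocab = ["fast", "slow", "left"]
E = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
h = [0.8, 0.1, 0.0]
top_token, _ = project_onto_vocab(h, E, vocab)[0]  # "fast"
h_steered = steer(h, E[0], alpha=0.5)              # push toward "fast"
```

In this picture, steering is a single vector addition in the residual stream, which is why it needs no reward signal or environment interaction.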
Related papers
- SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models [4.506695482619111]
This work introduces SteerVLM, a lightweight steering module for Vision-Language Models (VLMs). Our approach learns from the latent embeddings of paired prompts encoding target and converse behaviors to dynamically adjust activations connecting the language modality with image context. Our steering module requires learning parameters equal to 0.14% of the original VLM's size.
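SteerVLM learns a small module for this adjustment; a common non-learned baseline for deriving a steering direction from paired prompts, shown here purely for illustration, is the difference of mean activations between the "target" and "converse" prompt sets. The activations below are made up.

```python
# Illustrative baseline (not SteerVLM's learned module): derive a steering
# direction as the difference of mean activations over paired prompt sets.

def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def steering_direction(target_acts, converse_acts):
    """Direction pointing from converse behavior toward target behavior."""
    mu_t = mean_vector(target_acts)
    mu_c = mean_vector(converse_acts)
    return [t - c for t, c in zip(mu_t, mu_c)]

# Toy activations: target prompts cluster high on dim 0, converse low.
target = [[1.0, 0.2], [0.8, 0.0]]
converse = [[-0.9, 0.1], [-1.1, 0.1]]
direction = steering_direction(target, converse)  # ≈ [1.9, 0.0]
```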
arXiv Detail & Related papers (2025-10-30T17:52:39Z)
- Bridging Embodiment Gaps: Deploying Vision-Language-Action Models on Soft Robots [5.993870098970107]
Vision-Language-Action (VLA) models have been proposed as a language-guided generalized control framework for real robots. We present the deployment of a VLA model on a soft continuum manipulator to demonstrate autonomous safe human-robot interaction.
arXiv Detail & Related papers (2025-10-20T10:06:39Z)
- Exploring Conditions for Diffusion models in Robotic Control [70.27711404291573]
We explore leveraging pre-trained text-to-image diffusion models to obtain task-adaptive visual representations for robotic control. We find that naively applying textual conditions yields minimal or even negative gains in control tasks. We propose ORCA, which introduces learnable task prompts that adapt to the control environment and visual prompts that capture fine-grained, frame-specific details.
arXiv Detail & Related papers (2025-10-17T10:24:14Z)
- Executable Analytic Concepts as the Missing Link Between VLM Insight and Precise Manipulation [70.8381970762877]
Vision-Language Models (VLMs) have demonstrated remarkable capabilities in semantic reasoning and task planning. We introduce GRACE, a novel framework that grounds VLM-based reasoning through executable analytic concepts. GRACE provides a unified and interpretable interface between high-level instruction understanding and low-level robot control.
arXiv Detail & Related papers (2025-10-09T09:08:33Z)
- FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation [50.39748673817223]
We introduce two training-free, inference-time techniques that fully exploit explicit action parameters in robot video generation. First, action-scaled classifier-free guidance dynamically modulates guidance strength in proportion to action magnitude, enhancing controllability over motion intensity. Second, action-scaled noise truncation adjusts the distribution of initially sampled noise to better align with the desired motion dynamics.
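The first technique admits a compact sketch: standard classifier-free guidance blends unconditional and conditional predictions, and here the blend weight is made proportional to the action magnitude. The function names, the exact scaling law, and the toy inputs are assumptions for illustration.

```python
# Hedged sketch of action-scaled classifier-free guidance: the guidance
# weight grows with the norm of the action parameters. The linear scaling
# law and all names here are illustrative assumptions.
import math

def action_scaled_cfg(eps_uncond, eps_cond, action, base_scale=2.0):
    """Classifier-free guidance with strength proportional to ||action||."""
    magnitude = math.sqrt(sum(a * a for a in action))
    w = base_scale * magnitude
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Larger actions pull the prediction harder toward the conditional branch.
small = action_scaled_cfg([0.0, 0.0], [1.0, 1.0], action=[0.1, 0.0])
large = action_scaled_cfg([0.0, 0.0], [1.0, 1.0], action=[1.0, 0.0])
```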
arXiv Detail & Related papers (2025-09-29T03:30:40Z)
- Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient computation. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z)
- Constrained Decoding for Robotics Foundation Models [13.414495236464488]
Recent advances in the development of robotic foundation models have led to promising end-to-end and general-purpose capabilities in robotic systems. We introduce a constrained decoding framework for robotics foundation models that enforces logical constraints on action trajectories in dynamical systems.
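The core idea of constrained decoding can be illustrated in miniature: before selecting an action token, candidates whose actions would violate a constraint are masked out. The constraint, candidate set, and scores below are made up for demonstration and are not the paper's formulation.

```python
# Illustrative sketch of constrained decoding for an action model: mask
# candidate action tokens that violate a logical constraint, then pick
# the best remaining one. All values here are hypothetical.

def constrained_argmax(logits, actions, is_valid):
    """Pick the highest-logit action whose value satisfies is_valid."""
    best = None
    for logit, act in zip(logits, actions):
        if is_valid(act) and (best is None or logit > best[0]):
            best = (logit, act)
    if best is None:
        raise ValueError("no action satisfies the constraint")
    return best[1]

# Toy example: forbid speeds above 0.5, even if they score highest.
actions = [0.9, 0.4, 0.1]   # candidate speed commands
logits = [3.0, 2.0, 1.0]    # the unconstrained model prefers 0.9
choice = constrained_argmax(logits, actions, lambda a: a <= 0.5)  # 0.4
```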
arXiv Detail & Related papers (2025-09-01T19:17:40Z)
- Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction [10.38090975412416]
Building a generalizable self-correction system is crucial for robots to recover from failures. We build the Phoenix framework, which leverages motion instruction as a bridge to connect high-level semantic reflection with low-level robotic action correction. Experiments conducted in both the RoboMimic simulation and real-world scenarios demonstrate the superior generalization and robustness of our framework.
arXiv Detail & Related papers (2025-04-20T12:30:43Z)
- LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation [94.84458417662404]
LangTraj is a language-conditioned scene-diffusion model that simulates the joint behavior of all agents in traffic scenarios. By conditioning on natural language inputs, LangTraj provides flexible and intuitive control over interactive behaviors. LangTraj demonstrates strong performance in realism, language controllability, and language-conditioned safety-critical simulation.
arXiv Detail & Related papers (2025-04-15T17:14:06Z)
- CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models [89.44024245194315]
We introduce a method that incorporates explicit visual chain-of-thought (CoT) reasoning into vision-language-action models (VLAs). We introduce CoT-VLA, a state-of-the-art 7B VLA that can understand and generate visual and action tokens. Our experimental results demonstrate that CoT-VLA achieves strong performance, outperforming the state-of-the-art VLA model by 17% in real-world manipulation tasks and 6% in simulation benchmarks.
arXiv Detail & Related papers (2025-03-27T22:23:04Z)
- GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation [22.968763141077375]
We propose a novel closed-loop vision-language-action (VLA) method to enhance the robustness of robot visual manipulation. The text-guided video generation model in GEVRM can generate highly expressive future visual planning goals. The proposed GEVRM achieves state-of-the-art performance on both standard and CALVIN benchmarks.
arXiv Detail & Related papers (2025-02-13T12:29:50Z)
- Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve robustness and generalization ability. We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
arXiv Detail & Related papers (2024-07-11T17:31:01Z)
- Exploring Latent Pathways: Enhancing the Interpretability of Autonomous Driving with a Variational Autoencoder [79.70947339175572]
A bio-inspired neural circuit policy model has emerged as an innovative control module.
We take a leap forward by integrating a variational autoencoder with the neural circuit policy controller.
In addition to the architectural shift toward a variational autoencoder, this study introduces the automatic latent perturbation tool.
arXiv Detail & Related papers (2024-04-02T09:05:47Z)
- Interactive Character Control with Auto-Regressive Motion Diffusion Models [18.727066177880708]
We propose A-MDM (Auto-regressive Motion Diffusion Model) for real-time motion synthesis.
Our conditional diffusion model takes an initial pose as input and auto-regressively generates successive motion frames conditioned on the previous frame.
We introduce a suite of techniques for incorporating interactive controls into A-MDM, such as task-oriented sampling, in-painting, and hierarchical reinforcement learning.
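The auto-regressive generation loop described above can be sketched schematically: each new frame is produced conditioned on the previous one. The per-frame diffusion sampler is stood in for by a trivial placeholder here; `denoise_step`, `rollout`, and all values are hypothetical, not A-MDM's implementation.

```python
# Schematic auto-regressive rollout: generate motion frame by frame,
# each conditioned on the previous frame. `denoise_step` is a placeholder
# for a per-frame diffusion sampler, replaced by a trivial smoothing rule.

def denoise_step(prev_frame, control):
    # Placeholder for one diffusion-based frame prediction.
    return [0.9 * p + 0.1 * c for p, c in zip(prev_frame, control)]

def rollout(initial_pose, control, num_frames):
    """Auto-regressively generate a motion sequence frame by frame."""
    frames = [initial_pose]
    for _ in range(num_frames):
        frames.append(denoise_step(frames[-1], control))
    return frames

# The pose drifts toward the control target over successive frames.
motion = rollout([0.0, 0.0], control=[1.0, 1.0], num_frames=5)
```

The frame-by-frame structure is what lets interactive controls (task-oriented sampling, in-painting) intervene at every step rather than only once per sequence.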
arXiv Detail & Related papers (2023-06-01T07:48:34Z)
- An Adaptable Approach to Learn Realistic Legged Locomotion without Examples [38.81854337592694]
This work proposes a generic approach for ensuring realism in locomotion by guiding the learning process with the spring-loaded inverted pendulum model as a reference.
We present experimental results showing that even in a model-free setup, the learned policies can generate realistic and energy-efficient locomotion gaits for a bipedal and a quadrupedal robot.
arXiv Detail & Related papers (2021-10-28T10:14:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.