Towards Automated Semantic Interpretability in Reinforcement Learning via Vision-Language Models
- URL: http://arxiv.org/abs/2503.16724v1
- Date: Thu, 20 Mar 2025 21:53:19 GMT
- Title: Towards Automated Semantic Interpretability in Reinforcement Learning via Vision-Language Models
- Authors: Zhaoxin Li, Zhang Xi-Jia, Batuhan Altundas, Letian Chen, Rohan Paleja, Matthew Gombolay
- Abstract summary: We introduce Semantically Interpretable Reinforcement Learning with Vision-Language Models Empowered Automation (SILVA), an automated framework that leverages pre-trained vision-language models (VLMs) for semantic feature extraction and interpretable tree-based models for policy optimization.
- Score: 1.8032335403003321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic Interpretability in Reinforcement Learning (RL) enables transparency, accountability, and safer deployment by making the agent's decisions understandable and verifiable. Achieving this, however, requires a feature space composed of human-understandable concepts, which traditionally rely on human specification and fail to generalize to unseen environments. In this work, we introduce Semantically Interpretable Reinforcement Learning with Vision-Language Models Empowered Automation (SILVA), an automated framework that leverages pre-trained vision-language models (VLMs) for semantic feature extraction and interpretable tree-based models for policy optimization. SILVA first queries a VLM to identify relevant semantic features for an unseen environment, then extracts these features from the environment. Finally, it trains an Interpretable Control Tree via RL, mapping the extracted features to actions in a transparent and interpretable manner. To address the computational inefficiency of extracting features directly with VLMs, we develop a feature extraction pipeline that generates a dataset for training a lightweight convolutional network, which is subsequently used during RL. By leveraging VLMs to automate tree-based RL, SILVA removes the reliance on human annotation previously required by interpretable models while also overcoming the inability of VLMs alone to generate valid robot policies, enabling semantically interpretable reinforcement learning without a human in the loop.
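To make the pipeline concrete, the sketch below walks through the three stages the abstract describes: a VLM proposes semantic features, a lightweight CNN is distilled from VLM-extracted labels so the slow VLM does not sit inside the RL loop, and an interpretable tree maps features to actions. This is a minimal illustration, not the authors' code: `query_vlm`, the random placeholder frames and labels, and the supervised regression tree (standing in for the Interpretable Control Tree trained via RL) are all assumptions.

```python
# Minimal sketch of a SILVA-style pipeline (illustrative, not the authors' code).
import numpy as np
import torch
import torch.nn as nn
from sklearn.tree import DecisionTreeRegressor

def query_vlm(prompt: str) -> list[str]:
    """Hypothetical VLM call: ask which semantic features matter for acting
    in this environment. A real system would query a pre-trained VLM here;
    we stub the answer purely for illustration."""
    return ["agent_x", "agent_y", "goal_distance", "obstacle_ahead"]

class FeatureCNN(nn.Module):
    """Lightweight CNN distilled from VLM feature labels, so RL training
    does not have to query the (slow) VLM at every step."""
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_features),
        )
    def forward(self, x):
        return self.net(x)

# Stage 1: the VLM proposes semantic features for the unseen environment.
features = query_vlm("Which semantic features matter in this environment?")

# Stage 2: distill VLM-extracted labels into the lightweight CNN.
cnn = FeatureCNN(len(features))
opt = torch.optim.Adam(cnn.parameters(), lr=1e-3)
frames = torch.randn(256, 3, 64, 64)          # placeholder observation frames
vlm_labels = torch.randn(256, len(features))  # placeholder VLM feature outputs
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.mse_loss(cnn(frames), vlm_labels)
    loss.backward()
    opt.step()

# Stage 3: fit an interpretable tree mapping features -> actions.
# (The paper trains an Interpretable Control Tree via RL; a supervised
# regression tree stands in here only to show the feature->action mapping.)
with torch.no_grad():
    X = cnn(frames).numpy()
y = np.random.randn(256)  # placeholder action targets from an RL signal
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)
print(tree.tree_.node_count, "nodes in the interpretable policy tree")
```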
Related papers
- LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation [94.84458417662404]
LangTraj is a language-conditioned scene-diffusion model that simulates the joint behavior of all agents in traffic scenarios.
By conditioning on natural language inputs, LangTraj provides flexible and intuitive control over interactive behaviors.
LangTraj demonstrates strong performance in realism, language controllability, and language-conditioned safety-critical simulation.
arXiv Detail & Related papers (2025-04-15T17:14:06Z)
- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models [50.587868616659826]
Sparse Autoencoders (SAEs) have been shown to enhance interpretability and steerability in Large Language Models (LLMs).
In this work, we extend the application of SAEs to Vision-Language Models (VLMs), such as CLIP, and introduce a comprehensive framework for evaluating monosemanticity in vision representations.
arXiv Detail & Related papers (2025-04-03T17:58:35Z)
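A minimal sketch of the sparse-autoencoder setup described in the entry above, under assumed dimensions (a CLIP-sized 512-d embedding, a 4096-wide dictionary) and an arbitrary L1 coefficient; the paper's actual architecture and monosemanticity evaluation are not reproduced here.

```python
# Illustrative SAE on VLM embeddings: reconstruction loss plus an L1
# sparsity penalty pushes hidden codes toward sparse, individually
# interpretable (monosemantic) features.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_in, d_hidden)
        self.dec = nn.Linear(d_hidden, d_in)
    def forward(self, x):
        z = torch.relu(self.enc(x))  # non-negative, sparse codes
        return self.dec(z), z

sae = SparseAutoencoder(d_in=512, d_hidden=4096)  # e.g. CLIP-sized input
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
emb = torch.randn(1024, 512)  # placeholder CLIP image embeddings

for _ in range(100):
    opt.zero_grad()
    recon, z = sae(emb)
    loss = nn.functional.mse_loss(recon, emb) + 1e-3 * z.abs().mean()
    loss.backward()
    opt.step()
```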
- Proposition of Affordance-Driven Environment Recognition Framework Using Symbol Networks in Large Language Models [1.2430809884830318]
This study proposes a method for automatic affordance acquisition by leveraging large language models (LLMs).
Experiments using "apple" as an example demonstrated the method's ability to extract context-dependent affordances with high explainability.
arXiv Detail & Related papers (2025-04-02T11:48:44Z)
- Sparse Autoencoder Features for Classifications and Transferability [11.2185030332009]
We analyze Sparse Autoencoders (SAEs) for interpretable feature extraction from Large Language Models (LLMs).
Our framework evaluates (1) model-layer selection and scaling properties, (2) SAE architectural configurations, including width and pooling strategies, and (3) the effect of binarizing continuous SAE activations.
arXiv Detail & Related papers (2025-02-17T02:30:45Z)
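Point (3) of the entry above, binarizing continuous SAE activations, together with two pooling strategies from point (2), fits in a few lines; the tensor shapes here are illustrative assumptions.

```python
# Turning continuous, token-level SAE activations into classifier features.
import torch

acts = torch.relu(torch.randn(32, 128, 4096))  # (batch, tokens, SAE features)

mean_pooled = acts.mean(dim=1)        # one pooling strategy
max_pooled = acts.max(dim=1).values   # another pooling strategy
binarized = (max_pooled > 0).float()  # binarize: did the feature fire at all?
# `binarized` (or a pooled variant) then feeds a linear probe / classifier.
```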
- RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception [20.01853641155509]
Vision-language model (VLM) fine-tuning for application-specific visual grounding based on natural language instructions has become one of the most popular approaches for learning-enabled autonomous systems. We propose a new generalizable framework to improve VLM fine-tuning by integrating it with a reinforcement learning (RL) agent.
arXiv Detail & Related papers (2025-01-31T04:30:42Z)
- Verbalized Machine Learning: Revisiting Machine Learning with Language Models [63.10391314749408]
We introduce the framework of verbalized machine learning (VML). VML constrains the parameter space to be human-interpretable natural language. We empirically verify the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability.
arXiv Detail & Related papers (2024-06-06T17:59:56Z)
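A conceptual sketch of the VML idea described above: the learnable "parameter" is a natural-language rule, and both prediction and the update step are LLM calls. `call_llm` is a hypothetical stand-in left unimplemented, since the framework is model-agnostic.

```python
# VML-style loop: the model parameter theta is plain text, and "learning"
# means asking an LLM to rewrite that text from feedback.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

theta = "Predict the label by checking whether the number is even."

def predict(theta: str, x: str) -> str:
    return call_llm(f"Rule: {theta}\nInput: {x}\nOutput:")

def update(theta: str, x: str, y_true: str, y_pred: str) -> str:
    # The "optimizer" is also verbalized: the LLM rewrites the rule.
    return call_llm(
        f"Rule: {theta}\nInput {x} gave {y_pred}, expected {y_true}. "
        "Rewrite the rule so it handles this case. Return only the rule."
    )
```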
- Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach [55.613461060997004]
Large Language Models (LLMs) have catalyzed transformative advances across a spectrum of natural language processing tasks.
We propose an innovative metacognitive approach, dubbed CLEAR, to equip LLMs with capabilities for self-aware error identification and correction.
arXiv Detail & Related papers (2024-03-08T19:18:53Z)
- AS-XAI: Self-supervised Automatic Semantic Interpretation for CNN [5.42467030980398]
We propose AS-XAI, a self-supervised explainable-AI framework for automatic semantic interpretation of model decisions.
It utilizes transparent embedding semantic extraction spaces and row-centered principal component analysis (PCA) for global semantic interpretation of model decisions.
The proposed approach offers broad fine-grained practical applications, including shared semantic interpretation under out-of-distribution categories.
arXiv Detail & Related papers (2023-12-02T10:06:54Z)
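One plausible reading of the "row-centered PCA" mentioned in the entry above, sketched as an assumption rather than the paper's definition: each sample (row) is centered before extracting principal directions of the semantic embedding space.

```python
# Row-centered PCA via SVD (interpretation ours, not the paper's code).
import numpy as np

E = np.random.randn(200, 512)             # semantic embeddings (rows = samples)
E_rc = E - E.mean(axis=1, keepdims=True)  # center each row, not each column
U, S, Vt = np.linalg.svd(E_rc, full_matrices=False)
components = Vt[:5]           # top-5 global semantic directions
scores = E_rc @ components.T  # per-sample coordinates along those directions
```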
- SELF: Self-Evolution with Language Feedback [68.6673019284853]
SELF (Self-Evolution with Language Feedback) is a novel approach to advance large language models.
It enables LLMs to self-improve through self-reflection, akin to human learning processes.
Our experiments in mathematics and general tasks demonstrate that SELF can enhance the capabilities of LLMs without human intervention.
arXiv Detail & Related papers (2023-10-01T00:52:24Z)
- Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning [16.902924543372713]
State-of-the-art methods achieve impressive performance by pre-training on large-scale datasets.
We propose an efficient framework for multimodal alignment by introducing a novel visual semantic module.
Experiments show that the proposed ASH-Nets achieve competitive results.
arXiv Detail & Related papers (2023-08-18T10:40:25Z)
- Planning for Learning Object Properties [117.27898922118946]
We formalize the problem of automatically training a neural network to recognize object properties as a symbolic planning problem.
We use planning techniques to produce a strategy for automating the training dataset creation and the learning process.
We provide an experimental evaluation in both a simulated and a real environment.
arXiv Detail & Related papers (2023-01-15T09:37:55Z)
- PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models [127.17675443137064]
We introduce PEVL, which enhances the pre-training and prompt tuning of vision-language models with explicit object position modeling.
PEVL reformulates discretized object positions and language in a unified language modeling framework.
We show that PEVL enables state-of-the-art performance on position-sensitive tasks such as referring expression comprehension and phrase grounding.
arXiv Detail & Related papers (2022-05-23T10:17:53Z)
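A sketch of what the PEVL entry's "discretized object positions ... in a unified language modeling framework" can look like: a bounding box is quantized into position tokens that live in the same sequence as the caption. The bin count, token format, and helper function are illustrative assumptions, not PEVL's actual tokenization.

```python
# Quantize (x1, y1, x2, y2) pixel coordinates into discrete position tokens.
def box_to_tokens(box, image_size=512, n_bins=32):
    bin_size = image_size / n_bins
    return [f"<pos_{int(c // bin_size)}>" for c in box]

caption = "the dog on the left"
tokens = caption.split() + box_to_tokens((20, 120, 180, 400))
print(tokens)  # [..., '<pos_1>', '<pos_7>', '<pos_11>', '<pos_25>']
```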