ConceptViz: A Visual Analytics Approach for Exploring Concepts in Large Language Models
- URL: http://arxiv.org/abs/2509.20376v1
- Date: Sat, 20 Sep 2025 04:57:20 GMT
- Title: ConceptViz: A Visual Analytics Approach for Exploring Concepts in Large Language Models
- Authors: Haoxuan Li, Zhen Wen, Qiqi Jiang, Chenxiao Li, Yuwei Wu, Yuchen Yang, Yiyao Wang, Xiuqi Huang, Minfeng Zhu, Wei Chen,
- Abstract summary: ConceptViz is a visual analytics system designed to explore concepts in large language models (LLMs). Our results show that ConceptViz enhances interpretability research by streamlining the discovery and validation of meaningful concept representations in LLMs.
- Score: 18.456737929856125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have achieved remarkable performance across a wide range of natural language tasks. Understanding how LLMs internally represent knowledge remains a significant challenge. Although Sparse Autoencoders (SAEs) have emerged as a promising technique for extracting interpretable features from LLMs, SAE features do not inherently align with human-understandable concepts, making their interpretation cumbersome and labor-intensive. To bridge the gap between SAE features and human concepts, we present ConceptViz, a visual analytics system designed for exploring concepts in LLMs. ConceptViz implements a novel Identification => Interpretation => Validation pipeline, enabling users to query SAEs using concepts of interest, interactively explore concept-to-feature alignments, and validate the correspondences through model behavior verification. We demonstrate the effectiveness of ConceptViz through two usage scenarios and a user study. Our results show that ConceptViz enhances interpretability research by streamlining the discovery and validation of meaningful concept representations in LLMs, ultimately aiding researchers in building more accurate mental models of LLM features. Our code and user guide are publicly available at https://github.com/Happy-Hippo209/ConceptViz.
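The Identification step of the pipeline described above, querying SAE features with a concept of interest, can be pictured as a similarity search over the SAE's decoder directions. The sketch below is illustrative only and is not ConceptViz's actual implementation: the dimensions, the random weights, and the `query_features_by_concept` helper are all invented for the example, and the concept embedding is a stand-in for whatever encoder a real system would use.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_features = 64, 512  # hypothetical LLM hidden size / SAE width
# SAE decoder: one direction in activation space per learned feature,
# normalized so dot products below are cosine similarities.
W_dec = rng.normal(size=(n_features, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)

def query_features_by_concept(concept_vec, top_k=5):
    """Identification: rank SAE features by the cosine similarity of
    their decoder direction to an embedded concept query."""
    c = concept_vec / np.linalg.norm(concept_vec)
    sims = W_dec @ c                     # rows are unit-norm -> cosine
    return np.argsort(-sims)[:top_k]     # best-aligned features first

concept = rng.normal(size=d_model)       # stand-in for an embedded concept
top = query_features_by_concept(concept)
print(top)                               # indices of candidate SAE features
```

A user would then inspect the returned features (e.g. via their top-activating examples) in the Interpretation and Validation steps.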
Related papers
- Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs [51.378834857406325]
Mechanistic interpretability seeks to mitigate these issues by extracting interpretable structure from large language models. Sparse autoencoders (SAEs) have emerged as a popular approach for extracting interpretable and monosemantic concepts. We show that SAEs suffer from a fundamental theoretical ambiguity: a well-defined correspondence between LLM representations and human-interpretable concepts remains unclear.
arXiv Detail & Related papers (2026-01-28T09:27:05Z) - Probe by Gaming: A Game-based Benchmark for Assessing Conceptual Knowledge in LLMs [17.753896112412942]
CK-Arena is a multi-agent interaction game built upon the Undercover game. It is designed to evaluate the capacity of Large Language Models to reason with concepts in interactive settings. CK-Arena provides a scalable and realistic benchmark for assessing conceptual reasoning in dynamic environments.
arXiv Detail & Related papers (2025-05-23T06:06:28Z) - Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models [50.587868616659826]
We introduce a comprehensive framework for evaluating monosemanticity at the neuron level in vision representations. Our experimental results reveal that SAEs trained on Vision-Language Models significantly enhance the monosemanticity of individual neurons.
arXiv Detail & Related papers (2025-04-03T17:58:35Z) - Self-Evolving Visual Concept Library using Vision-Language Critics [38.15146001218907]
Building effective visual concept libraries is challenging, as manual definition is labor-intensive. Our approach, ESCHER, takes a library learning perspective to iteratively discover and improve visual concepts. We empirically demonstrate the ability of ESCHER to learn a concept library for zero-shot, few-shot, and fine-tuning visual classification tasks.
arXiv Detail & Related papers (2025-03-31T19:47:55Z) - Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders [29.356200147371275]
Large language models (LLMs) excel at handling human queries, but they can occasionally generate flawed or unexpected responses. We propose using a fixed vocabulary set for feature interpretations and designing a mutual information-based objective. We propose two runtime steering strategies that adjust the learned feature activations based on their corresponding explanations.
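The runtime steering idea above can be sketched, under simplified assumptions, as adjusting one SAE feature's activation and pushing the change back into the hidden state along that feature's decoder direction. Everything here is hypothetical for illustration: the random weights, the `steer` helper, and the linear-update rule are a generic sketch of activation steering, not the specific method of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_features = 64, 512            # hypothetical sizes
W_enc = rng.normal(size=(d_model, n_features)) * 0.1  # SAE encoder weights
W_dec = rng.normal(size=(n_features, d_model)) * 0.1  # SAE decoder weights

def steer(activation, feature_idx, target_value):
    """Set one SAE feature's activation to a target value, applying the
    change along that feature's decoder direction in activation space."""
    h = np.maximum(0.0, activation @ W_enc)   # sparse feature codes (ReLU)
    delta = target_value - h[feature_idx]     # how far to move the feature
    return activation + delta * W_dec[feature_idx]

x = rng.normal(size=d_model)                  # stand-in hidden state
x_steered = steer(x, feature_idx=7, target_value=3.0)
```

In a real system the steered activation would replace the original at that layer, shifting the model's output toward (or away from) the concept the feature represents.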
arXiv Detail & Related papers (2025-02-21T16:36:42Z) - Disentangling Dense Embeddings with Sparse Autoencoders [0.0]
Sparse autoencoders (SAEs) have shown promise in extracting interpretable features from complex neural networks.
We present one of the first applications of SAEs to dense text embeddings from large language models.
We show that the resulting sparse representations maintain semantic fidelity while offering interpretability.
arXiv Detail & Related papers (2024-08-01T15:46:22Z) - MyVLM: Personalizing VLMs for User-Specific Queries [78.33252556805931]
We take a first step toward the personalization of vision-language models, enabling them to learn and reason over user-provided concepts.
To effectively recognize a variety of user-specific concepts, we augment the VLM with external concept heads that function as toggles for the model.
Having recognized the concept, we learn a new concept embedding in the intermediate feature space of the VLM.
This embedding is tasked with guiding the language model to naturally integrate the target concept in its generated response.
arXiv Detail & Related papers (2024-03-21T17:51:01Z) - Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Towards Concept-Aware Large Language Models [56.48016300758356]
Concepts play a pivotal role in various human cognitive functions, including learning, reasoning and communication.
However, there is very little work on endowing machines with the ability to form and reason with concepts.
In this work, we analyze how well contemporary large language models (LLMs) capture human concepts and their structure.
arXiv Detail & Related papers (2023-11-03T12:19:22Z)