Related papers: LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions

LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions

URL: http://arxiv.org/abs/2406.08572v1
Date: Wed, 12 Jun 2024 18:19:37 GMT
Title: LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions
Authors: Nhat Hoang-Xuan, Minh Vu, My T. Thai,
Abstract summary: Prior works have associated concepts with neurons based on examples of concepts or a pre-defined set of concepts. We propose to leverage multimodal large language models for automatic and open-ended concept discovery. We validate each concept by generating examples and counterexamples and evaluating the neuron's response on this new set of images.
Score: 15.381209058506078
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Providing textual concept-based explanations for neurons in deep neural networks (DNNs) is of importance in understanding how a DNN model works. Prior works have associated concepts with neurons based on examples of concepts or a pre-defined set of concepts, thus limiting possible explanations to what the user expects, especially in discovering new concepts. Furthermore, defining the set of concepts requires manual work from the user, either by directly specifying them or collecting examples. To overcome these, we propose to leverage multimodal large language models for automatic and open-ended concept discovery. We show that, without a restricted set of pre-defined concepts, our method gives rise to novel interpretable concepts that are more faithful to the model's behavior. To quantify this, we validate each concept by generating examples and counterexamples and evaluating the neuron's response on this new set of images. Collectively, our method can discover concepts and simultaneously validate them, providing a credible automated tool to explain deep neural networks.

Related papers

FaCT: Faithful Concept Traces for Explaining Neural Network Decisions [56.796533084868884]
Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge.<n>We put emphasis on the faithfulness of concept-based explanations and propose a new model with model-inherent mechanistic concept-explanations.<n>Our concepts are shared across classes and, from any layer, their contribution to the logit and their input-visualization can be faithfully traced.
arXiv Detail & Related papers (2025-10-29T13:35:46Z)
Concept-Guided Interpretability via Neural Chunking [54.73787666584143]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data.<n>We propose three methods to extract these emerging entities, complementing each other based on label availability and dimensionality.<n>Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z)
Neuro-Symbolic Concepts [72.94541757514396]
This article presents a concept-centric paradigm for building agents that can learn continually and reason flexibly.<n>The concept-centric agent utilizes a vocabulary of neuro-symbolic concepts.<n>This framework offers several advantages, including data efficiency, compositional generalization, continual learning, and zero-shot transfer.
arXiv Detail & Related papers (2025-05-09T17:02:51Z)
Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
Revealing emergent human-like conceptual representations from language prediction [90.73285317321312]
Large language models (LLMs) trained solely through next-token prediction on text exhibit strikingly human-like behaviors.<n>Are these models developing concepts akin to those of humans?<n>We found that LLMs can flexibly derive concepts from linguistic descriptions in relation to contextual cues about other concepts.
arXiv Detail & Related papers (2025-01-21T23:54:17Z)
OmniPrism: Learning Disentangled Visual Concept for Image Generation [57.21097864811521]
Creative visual concept generation often draws inspiration from specific concepts in a reference image to produce relevant outcomes. We propose OmniPrism, a visual concept disentangling approach for creative image generation. Our method learns disentangled concept representations guided by natural language and trains a diffusion model to incorporate these concepts.
arXiv Detail & Related papers (2024-12-16T18:59:52Z)
Exploiting Interpretable Capabilities with Concept-Enhanced Diffusion and Prototype Networks [8.391254800873599]
We create concept-enriched models that incorporate concept information into existing architectures. In particular, we propose Concept-Guided Diffusion Conditional, which can generate visual representations of concepts, and Concept-Guided Prototype Networks, which can create a concept prototype dataset and leverage it to perform interpretable concept prediction. These results open up new lines of research by exploiting pre-existing information in the quest for rendering machine learning more human-understandable.
arXiv Detail & Related papers (2024-10-24T13:07:56Z)
Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations [7.736445799116692]
Concept-based methods have become a popular choice for explaining deep neural networks post-hoc.<n>We devise a reinforcement learning-based preference optimization algorithm that fine-tunes a vision-language generative model.<n>We demonstrate our method's ability to efficiently and reliably articulate diverse concepts.
arXiv Detail & Related papers (2024-08-24T02:26:42Z)
Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks. We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm. Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
A survey on Concept-based Approaches For Model Improvement [2.1516043775965565]
Concepts are known to be the thinking ground of humans. We provide a systematic review and taxonomy of various concept representations and their discovery algorithms in Deep Neural Networks (DNNs) We also provide details on concept-based model improvement literature marking the first comprehensive survey of these methods.
arXiv Detail & Related papers (2024-03-21T17:09:20Z)
Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process. We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. The lack of interpretability due to their black-box'' nature poses challenges for responsible implementation. We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning [49.12350554270196]
We show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination. Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects without using a domain-specific language.
arXiv Detail & Related papers (2023-10-28T20:12:58Z)
Formal Conceptual Views in Neural Networks [0.0]
We introduce two notions for conceptual views of a neural network, specifically a many-valued and a symbolic view. We test the conceptual expressivity of our novel views through different experiments on the ImageNet and Fruit-360 data sets. We demonstrate how conceptual views can be applied for abductive learning of human comprehensible rules from neurons.
arXiv Detail & Related papers (2022-09-27T16:38:24Z)
Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations including the class of Concept Activation Vectors (CAV) We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats. Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
Cause and Effect: Concept-based Explanation of Neural Networks [3.883460584034766]
We take a step in the interpretability of neural networks by examining their internal representation or neuron's activations against concepts. We propose a framework to check the existence of a causal relationship between a concept (or its negation) and task classes.
arXiv Detail & Related papers (2021-05-14T18:54:17Z)
Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts. We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.