Related papers: Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model

URL: http://arxiv.org/abs/2405.11837v2
Date: Mon, 24 Jun 2024 19:28:08 GMT
Title: Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model
Authors: Mounes Zaval, Sedat Ozer
Abstract summary: The Explain Any Concept (EAC) model is a flexible method for explaining decisions. The EAC model is based on using a surrogate model which has one trainable linear layer to simulate the target model. We show that by introducing an additional nonlinear layer to the original surrogate model, we can improve the performance of the EAC model.
Score: 4.6040036610482655
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In the evolving field of Explainable AI (XAI), interpreting the decisions of deep neural networks (DNNs) in computer vision tasks is an important process. While pixel-based XAI methods focus on identifying significant pixels, existing concept-based XAI methods use pre-defined or human-annotated concepts. The recently proposed Segment Anything Model (SAM) was a significant step forward in preparing automatic concept sets via comprehensive instance segmentation. Building upon this, the Explain Any Concept (EAC) model emerged as a flexible method for explaining DNN decisions. The EAC model is based on using a surrogate model which has one trainable linear layer to simulate the target model. In this paper, we show that introducing an additional nonlinear layer to the original surrogate model improves the performance of the EAC model. We compare our proposed approach to the original EAC model and report improvements obtained on both the ImageNet and MS COCO datasets.
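To make the linear-versus-nonlinear surrogate distinction concrete, here is a minimal pure-Python sketch. It assumes, for illustration only, that a surrogate maps a binary concept-presence mask (one entry per SAM concept) to the target model's score; the toy `target_model`, the tanh hidden layer, its width, and all training hyperparameters are hypothetical choices, not the paper's architecture or settings. The toy target rewards two concepts only when they appear together, an interaction that a single linear layer cannot represent exactly.

```python
import itertools
import math
import random

N_CONCEPTS = 3


def target_model(mask):
    # Hypothetical black box: scores 1 only when concepts 0 AND 1 are both present.
    return 1.0 if mask[0] and mask[1] else 0.0


# Training set: every on/off combination of the concepts, with the target's score.
MASKS = [list(m) for m in itertools.product([0, 1], repeat=N_CONCEPTS)]
TARGETS = [target_model(m) for m in MASKS]


def mse(predict):
    return sum((predict(m) - t) ** 2 for m, t in zip(MASKS, TARGETS)) / len(MASKS)


def train_linear(epochs=5000, lr=0.1):
    """One trainable linear layer: y = w . z + b, fit by full-batch gradient descent."""
    w, b, n = [0.0] * N_CONCEPTS, 0.0, len(MASKS)
    for _ in range(epochs):
        gw, gb = [0.0] * N_CONCEPTS, 0.0
        for z, t in zip(MASKS, TARGETS):
            err = sum(wi * zi for wi, zi in zip(w, z)) + b - t
            gw = [g + 2 * err * zi / n for g, zi in zip(gw, z)]
            gb += 2 * err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return lambda z: sum(wi * zi for wi, zi in zip(w, z)) + b


def train_nonlinear(hidden=4, epochs=10000, lr=0.05, seed=0):
    """Adds a tanh hidden layer: y = v . tanh(W z + c) + d."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(N_CONCEPTS)] for _ in range(hidden)]
    c = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    d, n = 0.0, len(MASKS)

    def forward(z):
        h = [math.tanh(sum(wji * zi for wji, zi in zip(W[j], z)) + c[j]) for j in range(hidden)]
        return h, sum(vj * hj for vj, hj in zip(v, h)) + d

    for _ in range(epochs):
        gW = [[0.0] * N_CONCEPTS for _ in range(hidden)]
        gc, gv, gd = [0.0] * hidden, [0.0] * hidden, 0.0
        for z, t in zip(MASKS, TARGETS):
            h, y = forward(z)
            dy = 2 * (y - t) / n  # d(MSE)/dy for this sample
            gd += dy
            for j in range(hidden):
                gv[j] += dy * h[j]
                dpre = dy * v[j] * (1 - h[j] ** 2)  # chain rule through tanh
                gc[j] += dpre
                for i in range(N_CONCEPTS):
                    gW[j][i] += dpre * z[i]
        for j in range(hidden):
            v[j] -= lr * gv[j]
            c[j] -= lr * gc[j]
            for i in range(N_CONCEPTS):
                W[j][i] -= lr * gW[j][i]
        d -= lr * gd
    return lambda z: forward(z)[1]


lin_mse = mse(train_linear())
nl_mse = mse(train_nonlinear())
print(f"linear surrogate MSE:    {lin_mse:.4f}")
print(f"nonlinear surrogate MSE: {nl_mse:.4f}")
```

On this toy target the best possible linear fit leaves a mean squared error of 0.0625, while the extra hidden layer can model the interaction and fit the masks more closely. Whether a comparable gain appears for real DNNs is what the paper's ImageNet and MS COCO experiments address.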

Related papers

Interpretable Reward Modeling with Active Concept Bottlenecks [54.00085739303773]
We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning. Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts. We formalize an active learning strategy that dynamically acquires the most informative concept labels.
arXiv Detail & Related papers (2025-07-07T06:26:04Z)
Sparse autoencoders reveal selective remapping of visual concepts during adaptation [54.82630842681845]
Adapting foundation models for specific purposes has become a standard approach to build machine learning systems. We develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts.
arXiv Detail & Related papers (2024-12-06T18:59:51Z)
Decompose the model: Mechanistic interpretability in image models with Generalized Integrated Gradients (GIG) [24.02036048242832]
This paper introduces a novel approach to trace the entire pathway from input through all intermediate layers to the final output within the whole dataset. We utilize Pointwise Feature Vectors (PFVs) and Effective Receptive Fields (ERFs) to decompose model embeddings into interpretable Concept Vectors. Then, we calculate the relevance between concept vectors with our Generalized Integrated Gradients (GIG) enabling a comprehensive, dataset-wide analysis of model behavior.
arXiv Detail & Related papers (2024-09-03T05:19:35Z)
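The name suggests that GIG builds on Integrated Gradients (IG). As background only, the sketch below implements standard IG, not GIG, on a hypothetical two-variable function with a hand-written gradient: each attribution is (x_i - x'_i) times the average gradient along the straight path from a baseline x' to the input x, approximated here with a midpoint Riemann sum. The function `f`, the baseline, and the step count are illustrative assumptions.

```python
def f(x):
    # Hypothetical model output: nonlinear in x[0], linear in x[1].
    return x[0] ** 2 + 3.0 * x[1]


def grad_f(x):
    # Analytic gradient of f.
    return [2.0 * x[0], 3.0]


def integrated_gradients(grad, x, baseline, steps=50):
    """Standard IG: (x - x') * average gradient on the path x' + alpha * (x - x')."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of each sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad(point)):
            total[i] += g
    return [(xi - b) * s / steps for xi, b, s in zip(x, baseline, total)]


x, baseline = [2.0, 1.0], [0.0, 0.0]
ig = integrated_gradients(grad_f, x, baseline)
print(ig, sum(ig), f(x) - f(baseline))
```

Summing the attributions recovers f(x) - f(x'), the completeness property of IG; the midpoint rule is exact here because the gradient of this particular `f` changes linearly along the path.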
Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Existing approaches often require numerous human interventions per image to achieve strong performances. We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z)
Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment [69.33930972652594]
We propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models. The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers. We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy.
arXiv Detail & Related papers (2024-03-28T15:22:29Z)
SETA: Semantic-Aware Token Augmentation for Domain Generalization [27.301312891532277]
Domain generalization (DG) aims to make models robust to domain shifts without accessing target domains. Prior CNN-based augmentation methods are suboptimal when applied to token-based models because they do not incentivize the model to learn holistic shape information. We propose the SEmantic-aware Token Augmentation (SETA) method, which transforms features by perturbing local edge cues while preserving global shape features.
arXiv Detail & Related papers (2024-03-18T13:50:35Z)
Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process. We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations [13.60538902487872]
We present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets.
arXiv Detail & Related papers (2023-11-28T10:53:26Z)
Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation [79.22678026708134]
In this paper, we propose an inherently interpretable method, named Transferable Conceptual Prototype Learning (TCPL). To achieve this goal, we design a hierarchically prototypical module that transfers categorical basic concepts from the source domain to the target domain and learns domain-shared prototypes for explaining the underlying reasoning process. Comprehensive experiments show that the proposed method can not only provide effective and intuitive explanations but also outperform previous state-of-the-art methods.
arXiv Detail & Related papers (2023-10-12T06:36:41Z)
Explain Any Concept: Segment Anything Meets Concept-Based Explanation [11.433807960637685]
The Segment Anything Model (SAM) has been demonstrated as a powerful framework for performing precise and comprehensive instance segmentation. We offer an effective and flexible concept-based explanation method, namely Explain Any Concept (EAC). We thus propose a lightweight per-input equivalent (PIE) scheme, enabling efficient explanation with a surrogate model.
arXiv Detail & Related papers (2023-05-17T15:26:51Z)
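As we understand the EAC paper, it scores each SAM-derived concept by its Shapley value, and the PIE surrogate exists so that the many masked evaluations this requires are cheap. The following is a minimal sketch of generic permutation-sampling (Monte Carlo) Shapley estimation; the `surrogate` function is a hypothetical stand-in for a trained PIE surrogate, and the sampling scheme is the textbook estimator rather than EAC's exact procedure.

```python
import random


def surrogate(mask):
    # Hypothetical cheap surrogate over 3 concepts: two additive effects plus an
    # interaction that pays off only when concepts 0 and 2 are both kept.
    return 0.6 * mask[0] + 0.3 * mask[1] + 0.4 * mask[0] * mask[2]


def shapley_monte_carlo(f, n_concepts, samples=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n_concepts
    for _ in range(samples):
        order = list(range(n_concepts))
        rng.shuffle(order)
        mask = [0] * n_concepts  # start with every concept removed
        prev = f(mask)
        for i in order:  # add concepts back one at a time
            mask[i] = 1
            cur = f(mask)
            phi[i] += cur - prev  # marginal contribution of concept i
            prev = cur
    return [p / samples for p in phi]


phi = shapley_monte_carlo(surrogate, n_concepts=3)
print([round(p, 3) for p in phi], round(sum(phi), 6))
```

For this surrogate the exact Shapley values are 0.8, 0.3 and 0.2 (concepts 0 and 2 split their 0.4 interaction bonus equally), and the estimates always sum to f(all concepts) - f(no concepts) = 1.3 because the marginal contributions in each sampled ordering telescope.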
Optimizing Explanations by Network Canonization and Hyperparameter Search [74.76732413972005]
Rule-based and modified backpropagation XAI approaches often face challenges when being applied to modern model architectures. Model canonization is the process of re-structuring the model to disregard problematic components without changing the underlying function. In this work, we propose canonizations for currently relevant model blocks applicable to popular deep neural network architectures.
arXiv Detail & Related papers (2022-11-30T17:17:55Z)
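A common example of a function-preserving restructuring is folding an inference-mode BatchNorm into the preceding linear layer. The summary does not list the paper's specific canonizations, so treat this pure-Python sketch as an illustration of the general idea only; the helper names (`linear`, `batchnorm`, `fold_batchnorm`) are hypothetical. It folds the BatchNorm scale gamma / sqrt(var + eps) and shift into new weights and biases, then checks that both versions produce the same outputs.

```python
import math
import random


def linear(W, b, x):
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def batchnorm(y, mean, var, gamma, beta, eps=1e-5):
    # Inference-mode BatchNorm with fixed running statistics.
    return [g * (yi - m) / math.sqrt(v + eps) + be
            for yi, m, v, g, be in zip(y, mean, var, gamma, beta)]


def fold_batchnorm(W, b, mean, var, gamma, beta, eps=1e-5):
    """Return (W', b') such that linear(W', b', x) equals batchnorm(linear(W, b, x))."""
    W_f, b_f = [], []
    for row, bi, m, v, g, be in zip(W, b, mean, var, gamma, beta):
        scale = g / math.sqrt(v + eps)
        W_f.append([scale * w for w in row])
        b_f.append(scale * (bi - m) + be)
    return W_f, b_f


rng = random.Random(0)
n_in, n_out = 4, 3
W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
b = [rng.uniform(-1, 1) for _ in range(n_out)]
mean = [rng.uniform(-1, 1) for _ in range(n_out)]
var = [rng.uniform(0.5, 2.0) for _ in range(n_out)]
gamma = [rng.uniform(0.5, 1.5) for _ in range(n_out)]
beta = [rng.uniform(-1, 1) for _ in range(n_out)]

W_f, b_f = fold_batchnorm(W, b, mean, var, gamma, beta)
x = [rng.uniform(-1, 1) for _ in range(n_in)]
original = batchnorm(linear(W, b, x), mean, var, gamma, beta)
folded = linear(W_f, b_f, x)
max_diff = max(abs(a - c) for a, c in zip(original, folded))
print("max abs difference:", max_diff)
```

The function is unchanged, but an attribution method now sees one linear layer instead of a linear layer followed by a normalization layer, which is the kind of structural simplification canonization aims for.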
Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction. Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image. Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence [14.071950294953005]
Concept Activation Vectors (CAVs) have emerged as a popular tool for modeling human-understandable concepts in the latent space. In this paper, we show that computing CAVs with a separability-oriented objective, i.e. training a linear classifier to separate samples with and without the concept, can lead to solutions that diverge from the actual goal of precisely modeling the concept direction. We introduce pattern-based CAVs, which focus solely on concept signals, thereby providing more accurate concept directions.
arXiv Detail & Related papers (2022-02-07T19:40:20Z)
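The gap between a separability-oriented (filter) direction and a pattern direction can be seen in a two-dimensional toy example. This is a minimal sketch of the idea as we understand it, not the paper's code: a least-squares regression stands in for the usual linear classifier, the pattern direction is taken as the slope of regressing each activation on the binary concept label, Cov(x, t) / Var(t), and the synthetic "activations" are hypothetical. Activation x1 carries the concept plus a distractor d, and x2 carries only the distractor.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))


rng = random.Random(0)
n = 5000
t = [rng.randint(0, 1) for _ in range(n)]    # binary concept label per sample
d = [rng.gauss(0.0, 1.0) for _ in range(n)]  # distractor unrelated to the concept
X = [[ti + di for ti, di in zip(t, d)],      # x1: concept signal + distractor
     d[:]]                                   # x2: distractor only

# Filter-style direction: least-squares weights w = Cov(x)^-1 Cov(x, t),
# i.e. the direction that best predicts (separates) the concept.
C = [[cov(a, b) for b in X] for a in X]
c = [cov(a, t) for a in X]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
w = [(C[1][1] * c[0] - C[0][1] * c[1]) / det,
     (C[0][0] * c[1] - C[1][0] * c[0]) / det]

# Pattern-style direction: how each activation changes with the concept,
# Cov(x, t) / Var(t) (for 0/1 labels, the difference of class means).
p = [ci / cov(t, t) for ci in c]

concept_dir = [1.0, 0.0]
print("filter :", [round(v, 3) for v in w], "cos:", round(cosine(w, concept_dir), 3))
print("pattern:", [round(v, 3) for v in p], "cos:", round(cosine(p, concept_dir), 3))
```

The filter comes out as (1, -1) because subtracting x2 cancels the distractor and best predicts the label, yet its cosine with the concept direction (1, 0) is only about 0.71; the pattern direction stays close to (1, 0).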

This list is automatically generated from the titles and abstracts of the papers on this site.