Related papers: Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation

Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation

URL: http://arxiv.org/abs/2310.11991v2
Date: Mon, 22 Jul 2024 21:40:55 GMT
Title: Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
Authors: Floris Holstege, Bram Wouters, Noud van Giersbergen, Cees Diks
Abstract summary: Out-of-distribution generalization in neural networks is often hampered by spurious correlations. Existing concept-removal methods tend to be overzealous by inadvertently eliminating features associated with the main task of the model. We propose an iterative algorithm that separates spurious from main-task concepts by jointly identifying two low-dimensional subspaces in the neural network representation.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Out-of-distribution generalization in neural networks is often hampered by spurious correlations. A common strategy is to mitigate this by removing spurious concepts from the neural network representation of the data. Existing concept-removal methods tend to be overzealous by inadvertently eliminating features associated with the main task of the model, thereby harming model performance. We propose an iterative algorithm that separates spurious from main-task concepts by jointly identifying two low-dimensional orthogonal subspaces in the neural network representation. We evaluate the algorithm on benchmark datasets for computer vision (Waterbirds, CelebA) and natural language processing (MultiNLI), and show that it outperforms existing concept removal methods.
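
Concept removal of this kind rests on a linear projection: a concept is represented as a direction (or subspace) in representation space, and every representation is projected onto its orthogonal complement. The sketch below, which assumes NumPy, uses hypothetical synthetic data and a crude class-mean-difference estimate of each concept direction (both assumptions; this is not the paper's iterative algorithm), to illustrate why removal can be "overzealous": when the spurious label is correlated with the main-task label, the estimated spurious direction overlaps the main-task direction, so projecting it out also erases main-task signal. Restricting the removed direction to be orthogonal to the main-task direction avoids that. The helper names mean_diff_direction, project_out and main_gap are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic "representations": 2000 points in 5 dimensions.
n, d = 2000, 5
y = rng.integers(0, 2, size=n)                 # main-task label
s = np.where(rng.random(n) < 0.9, y, 1 - y)    # spurious label, agrees with y 90% of the time
Z = rng.normal(scale=0.5, size=(n, d))
Z[:, 0] += y                                   # dimension 0 carries the main-task signal
Z[:, 1] += s                                   # dimension 1 carries the spurious signal


def mean_diff_direction(Z, labels):
    """Unit vector from the class-0 mean to the class-1 mean: a crude linear concept direction."""
    v = Z[labels == 1].mean(axis=0) - Z[labels == 0].mean(axis=0)
    return v / np.linalg.norm(v)


def project_out(Z, v):
    """Remove the component of every row of Z along the unit vector v."""
    return Z - np.outer(Z @ v, v)


v_main = mean_diff_direction(Z, y)
v_spur = mean_diff_direction(Z, s)

# Naive removal: because s and y are correlated, v_spur overlaps with v_main,
# so projecting it out also removes much of the main-task signal.
Z_naive = project_out(Z, v_spur)

# Orthogonal variant: one Gram-Schmidt step keeps only the part of v_spur that is
# orthogonal to v_main, so projections onto v_main are left unchanged.
w = v_spur - (v_spur @ v_main) * v_main
w /= np.linalg.norm(w)
Z_orth = project_out(Z, w)


def main_gap(Zp):
    """Difference of the main-task class means along v_main."""
    proj = Zp @ v_main
    return proj[y == 1].mean() - proj[y == 0].mean()


print("overlap |v_spur . v_main|:", abs(v_spur @ v_main))
print("main-task gap, original:  ", main_gap(Z))
print("main-task gap, naive:     ", main_gap(Z_naive))
print("main-task gap, orthogonal:", main_gap(Z_orth))
```

The orthogonal variant leaves the main-task projection exactly unchanged, but it also keeps whatever spurious signal lies along v_main. Here the two directions are estimated independently and only one is then adjusted, which is much cruder than identifying the two subspaces jointly, as the abstract describes.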

Related papers

Concept Probing: Where to Find Human-Defined Concepts (Extended Version)
We propose a method to automatically identify which layer's representations in a neural network model should be considered when probing for a given human-defined concept of interest. We validate our findings through an exhaustive empirical analysis over different neural network models and datasets.
arXiv (2025-07-24T16:30:10Z)
Concept-Guided Interpretability via Neural Chunking
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract these emerging entities, complementing each other based on label availability and dimensionality. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv (2025-05-16T13:49:43Z)
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
We present a unified framework for transforming any vision neural network into a spatially and conceptually interpretable model. We name this method "Spatially-Aware and Label-Free Concept Bottleneck Model" (SALF-CBM).
arXiv (2025-02-27T14:27:55Z)
Discovering Chunks in Neural Embeddings for Interpretability
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv (2025-02-03T20:30:46Z)
Towards Scalable and Versatile Weight Space Learning
This paper introduces the SANE approach to weight-space learning. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv (2024-06-14T13:12:07Z)
Automatic Discovery of Visual Circuits
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv (2024-04-22T17:00:57Z)
Graph Neural Networks for Learning Equivariant Representations of Neural Networks
We propose to represent neural networks as computational graphs of parameters. Our approach enables a single model to encode neural computational graphs with diverse architectures. We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv (2024-03-18T18:01:01Z)
Manipulating Feature Visualizations with Gradient Slingshots
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process. We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv (2024-01-11T18:57:17Z)
Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons. Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts. The method can be used to identify unlabeled subclasses within data and to detect the causes of misclassifications.
arXiv (2023-12-28T07:33:51Z)
Deep Concept Removal
We address the problem of concept removal in deep neural networks. We propose a novel method based on adversarial linear classifiers trained on a concept dataset. We also introduce an implicit gradient-based technique to tackle the challenges associated with adversarial training.
arXiv (2023-10-09T14:31:03Z)
Spiking Generative Adversarial Network with Attention Scoring Decoding
Spiking neural networks offer a closer approximation to brain-like processing. We build a spiking generative adversarial network capable of handling complex images.
arXiv (2023-05-17T14:35:45Z)
Fitting Low-rank Models on Egocentrically Sampled Partial Networks
We propose an approach to fit general low-rank models for egocentrically sampled networks. This method offers the first theoretical guarantee for egocentric partial network estimation. We evaluate the technique on several synthetic and real-world networks and show that it delivers competitive performance in link prediction tasks.
arXiv (2023-03-09T03:20:44Z)
SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images
We introduce the concept of semantic objectness to exploit the geometric relationship between these two tasks (monocular depth estimation and semantic segmentation). We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption. To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv (2021-01-19T02:41:03Z)
Generate and Verify: Semantically Meaningful Formal Analysis of Neural Network Perception Systems
Testing remains the primary method to evaluate the accuracy of neural network perception systems. We employ neural network verification to prove that a model will always produce estimates within some error bound of the ground truth.
arXiv (2020-12-16T23:09:53Z)
Neural Networks with Recurrent Generative Feedback
We instantiate this recurrent generative feedback design on convolutional neural networks (CNNs), yielding CNN-F. In the experiments, CNN-F shows considerably improved adversarial robustness over conventional feedforward CNNs on standard benchmarks.
arXiv (2020-07-17T19:32:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.