Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
- URL: http://arxiv.org/abs/2310.11991v2
- Date: Mon, 22 Jul 2024 21:40:55 GMT
- Title: Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation
- Authors: Floris Holstege, Bram Wouters, Noud van Giersbergen, Cees Diks,
- Abstract summary: Out-of-distribution generalization in neural networks is often hampered by spurious correlations.
Existing concept-removal methods tend to be overzealous by inadvertently eliminating features associated with the main task of the model.
We propose an iterative algorithm that separates spurious from main-task concepts by jointly identifying two low-dimensional subspaces in the neural network representation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Out-of-distribution generalization in neural networks is often hampered by spurious correlations. A common strategy is to mitigate this by removing spurious concepts from the neural network representation of the data. Existing concept-removal methods tend to be overzealous by inadvertently eliminating features associated with the main task of the model, thereby harming model performance. We propose an iterative algorithm that separates spurious from main-task concepts by jointly identifying two low-dimensional orthogonal subspaces in the neural network representation. We evaluate the algorithm on benchmark datasets for computer vision (Waterbirds, CelebA) and natural language processing (MultiNLI), and show that it outperforms existing concept removal methods
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Automatic Discovery of Visual Circuits [66.99553804855931]
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept.
We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Understanding Distributed Representations of Concepts in Deep Neural
Networks without Supervision [25.449397570387802]
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts.
It can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
arXiv Detail & Related papers (2023-12-28T07:33:51Z) - Deep Concept Removal [29.65899467379793]
We address the problem of concept removal in deep neural networks.
We propose a novel method based on adversarial linear classifiers trained on a concept dataset.
We also introduce an implicit gradient-based technique to tackle the challenges associated with adversarial training.
arXiv Detail & Related papers (2023-10-09T14:31:03Z) - Spiking Generative Adversarial Network with Attention Scoring Decoding [4.5727987473456055]
Spiking neural networks offer a closer approximation to brain-like processing.
We build a spiking generative adversarial network capable of handling complex images.
arXiv Detail & Related papers (2023-05-17T14:35:45Z) - Fitting Low-rank Models on Egocentrically Sampled Partial Networks [4.111899441919165]
We propose an approach to fit general low-rank models for egocentrically sampled networks.
This method offers the first theoretical guarantee for egocentric partial network estimation.
We evaluate the technique on several synthetic and real-world networks and show that it delivers competitive performance in link prediction tasks.
arXiv Detail & Related papers (2023-03-09T03:20:44Z) - SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from
Monocular images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z) - Generate and Verify: Semantically Meaningful Formal Analysis of Neural
Network Perception Systems [2.2559617939136505]
Testing remains to evaluate accuracy of neural network perception systems.
We employ neural network verification to prove that a model will always produce estimates within some error bound to the ground truth.
arXiv Detail & Related papers (2020-12-16T23:09:53Z) - Neural Networks with Recurrent Generative Feedback [61.90658210112138]
We instantiate this design on convolutional neural networks (CNNs)
In the experiments, CNN-F shows considerably improved adversarial robustness over conventional feedforward CNNs on standard benchmarks.
arXiv Detail & Related papers (2020-07-17T19:32:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.