Monitoring Shortcut Learning using Mutual Information
- URL: http://arxiv.org/abs/2206.13034v1
- Date: Mon, 27 Jun 2022 03:55:23 GMT
- Title: Monitoring Shortcut Learning using Mutual Information
- Authors: Mohammed Adnan, Yani Ioannou, Chuan-Yung Tsai, Angus Galloway, H.R.
Tizhoosh, Graham W. Taylor
- Abstract summary: Shortcut learning is often only exposed when models are
evaluated on real-world data that does not contain the same spurious
correlations. Experiments demonstrate that mutual information (MI) can serve
as a domain-agnostic metric for monitoring shortcut learning.
- Score: 16.17600110257266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The failure of deep neural networks to generalize to out-of-distribution data
is a well-known problem and raises concerns about the deployment of trained
networks in safety-critical domains such as healthcare, finance and autonomous
vehicles. We study a particular kind of distribution shift:
shortcuts or spurious correlations in the training data. Shortcut learning is
often only exposed when models are evaluated on real-world data that does not
contain the same spurious correlations, posing a serious dilemma for AI
practitioners to properly assess the effectiveness of a trained model for
real-world applications. In this work, we propose to use the mutual information
(MI) between the learned representation and the input as a metric to find where
in training the network latches onto shortcuts. Experiments demonstrate that
MI can be used as a domain-agnostic metric for monitoring shortcut learning.
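The proposed monitor needs a tractable estimate of the MI between a high-dimensional input and its learned representation. The listing below is a minimal sketch of one standard way to obtain such an estimate, a MINE-style Donsker-Varadhan lower bound trained alongside the network; it is not the authors' implementation, and the encoder, dimensions, and hyperparameters are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class MINE(nn.Module):
    """Neural estimator of a Donsker-Varadhan lower bound on I(X; Z)."""
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        # Joint samples (x_i, z_i) vs. marginal samples (x_i, z_perm(i)).
        t_joint = self.net(torch.cat([x, z], dim=1)).mean()
        z_marg = z[torch.randperm(z.size(0))]
        t_marg = self.net(torch.cat([x, z_marg], dim=1))
        # DV bound: E_joint[T] - log E_marg[exp(T)].
        return t_joint - (torch.logsumexp(t_marg, dim=0) - math.log(x.size(0))).squeeze()

# Stand-in for the network being monitored (an assumption, not the paper's model).
encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 64))

mine = MINE(x_dim=784, z_dim=64)
opt = torch.optim.Adam(mine.parameters(), lr=1e-4)
for step in range(200):
    x = torch.randn(256, 784)   # stand-in for a training batch
    z = encoder(x).detach()     # representation, frozen w.r.t. the estimator
    loss = -mine(x, z)          # gradient ascent on the lower bound
    opt.zero_grad(); loss.backward(); opt.step()

x = torch.randn(256, 784)
with torch.no_grad():
    print(f"I(X; Z) lower bound ~ {mine(x, encoder(x)).item():.3f}")
```

In use, one would log this bound at regular intervals during training and look for the characteristic change the paper associates with the network latching onto a shortcut.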
Related papers
- SoK: Verifiable Cross-Silo FL [0.0]
We present a systematization of knowledge on verifiable cross-silo FL.
We analyze various protocols, fit them in a taxonomy, and compare their efficiency and threat models.
arXiv Detail & Related papers (2024-10-11T07:39:35Z)
- How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning [20.486639064376014]
Shortcut learning, or the Clever Hans effect, refers to situations where a learning agent learns spurious correlations present in the data, resulting in biased models.
We focus on finding shortcuts in deep learning based spoofing countermeasures (CMs) that predict whether a given utterance is spoofed or not.
arXiv Detail & Related papers (2023-05-31T15:58:37Z)
- Scalable Infomin Learning [39.77171117174905]
Infomin learning aims to learn a representation with high utility while remaining uninformative about a specified target.
Recent works on infomin learning mainly use adversarial training, which involves training a neural network to estimate mutual information.
We propose a new infomin learning approach, which uses a novel proxy metric for mutual information.
arXiv Detail & Related papers (2023-02-21T14:40:25Z)
- Shortcut Detection with Variational Autoencoders [1.3174512123890016]
We present a novel approach to detect shortcuts in image and audio datasets by leveraging variational autoencoders (VAEs).
The disentanglement of features in the latent space of VAEs allows us to discover feature-target correlations in datasets and semi-automatically evaluate them for ML shortcuts.
We demonstrate the applicability of our method on several real-world datasets and identify shortcuts that have not been discovered before.
arXiv Detail & Related papers (2023-02-08T18:26:10Z)
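For the VAE-based detector summarized above, the core operation is scanning the latent space for dimensions that correlate suspiciously well with the prediction target. The sketch below renders that scan under our own assumptions; the latent means `mu` and labels `y` are stand-ins, and the paper's semi-automatic evaluation step (inspecting the suspect dimensions) is not shown.

```python
import torch

def latent_target_correlations(mu, y):
    """Pearson correlation between each VAE latent dimension and the target."""
    mu_c = mu - mu.mean(dim=0)
    y_c = (y.float() - y.float().mean()).unsqueeze(1)
    cov = (mu_c * y_c).mean(dim=0)
    return cov / (mu_c.pow(2).mean(dim=0).sqrt() * y_c.pow(2).mean().sqrt() + 1e-8)

mu = torch.randn(1000, 32)        # stand-in for encoder means over a dataset
y = torch.randint(0, 2, (1000,))  # stand-in binary labels
corr = latent_target_correlations(mu, y)
suspects = torch.argsort(corr.abs(), descending=True)[:5]
print("latent dims most correlated with the target:", suspects.tolist())
```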
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
Only later in training do they exploit higher-order statistics.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
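A common way to probe the claim in the previous entry is to compare a network's accuracy on real data against "clones" that preserve only the first two moments. Below is our own illustration of generating such a clone, not necessarily the paper's exact protocol.

```python
import torch

def gaussian_clone(x):
    """Resample x from a Gaussian matching its mean and covariance only."""
    mu = x.mean(dim=0)
    xc = x - mu
    cov = xc.T @ xc / (x.size(0) - 1)
    # Small jitter keeps the Cholesky factorization numerically stable.
    L = torch.linalg.cholesky(cov + 1e-4 * torch.eye(x.size(1)))
    return mu + torch.randn_like(x) @ L.T

x = torch.randn(2000, 16) ** 3   # deliberately non-Gaussian data
clone = gaussian_clone(x)
# First and second moments match up to sampling noise; higher-order
# structure (e.g., the cube's heavy tails) is destroyed.
print((clone.mean(0) - x.mean(0)).abs().max().item())
```

If a partially trained classifier scores similarly on the real batch and its clone, it is, at that point in training, relying only on low-order statistics.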
- Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization [114.43504951058796]
Outlier detection plays a critical role in AI safety.
Deep neural network classifiers often assign out-of-distribution (OOD) inputs to in-distribution classes with high confidence.
We propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for the OOD detection tasks.
arXiv Detail & Related papers (2022-09-26T15:59:55Z)
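HSIC itself is a standard kernel measure of statistical dependence; the entry above builds an OOD objective on top of it. For reference, here is a minimal biased HSIC estimator with Gaussian kernels; this is our sketch, not the paper's training objective, and the bandwidth `sigma` is an illustrative choice.

```python
import torch

def rbf_gram(x, sigma=1.0):
    """Gaussian-kernel Gram matrix over the rows of x."""
    d2 = torch.cdist(x, x).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate: tr(KHLH) / (n-1)^2, with H = I - 11^T/n."""
    n = x.size(0)
    h = torch.eye(n) - torch.ones(n, n) / n
    k, l = rbf_gram(x, sigma), rbf_gram(y, sigma)
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2

a = torch.randn(256, 8)
print(hsic(a, a).item())                    # clearly nonzero: dependent
print(hsic(a, torch.randn(256, 8)).item())  # near zero: independent
```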
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where the task is not purely opaque: side information about the underlying system is available to the learner.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
- MUSCLE: Strengthening Semi-Supervised Learning Via Concurrent Unsupervised Learning Using Mutual Information Maximization [29.368950377171995]
We introduce Mutual-information-based Unsupervised & Semi-supervised Concurrent LEarning (MUSCLE) to combine both unsupervised and semi-supervised learning.
MUSCLE can be used as a stand-alone training scheme for neural networks, and can also be incorporated into other learning approaches.
We show that the proposed hybrid model outperforms the state of the art on several standard benchmarks, including CIFAR-10, CIFAR-100, and Mini-ImageNet.
arXiv Detail & Related papers (2020-11-30T23:01:04Z)
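The MUSCLE entry describes the general recipe of pairing a supervised loss on labeled data with an MI-maximization term on unlabeled data. The sketch below renders that recipe with an InfoNCE-style bound; it is our own minimal illustration with made-up shapes and weights, not the authors' exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 64))
head = nn.Linear(64, 10)
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

def info_nce(z1, z2, tau=0.1):
    """InfoNCE lower bound on MI between two views of the same batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau
    labels = torch.arange(z1.size(0))   # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

x_lab = torch.randn(64, 784); y_lab = torch.randint(0, 10, (64,))
x_unl = torch.randn(256, 784)
for _ in range(10):
    sup = F.cross_entropy(head(encoder(x_lab)), y_lab)
    # Two "views" via simple noise; real pipelines use stronger augmentations.
    unsup = info_nce(encoder(x_unl + 0.1 * torch.randn_like(x_unl)),
                     encoder(x_unl + 0.1 * torch.randn_like(x_unl)))
    loss = sup + 0.5 * unsup   # 0.5: illustrative weighting
    opt.zero_grad(); loss.backward(); opt.step()
```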
This list is automatically generated from the titles and abstracts of the papers in this site.