Related papers: Gated Information Bottleneck for Generalization in Sequential Environments

Gated Information Bottleneck for Generalization in Sequential Environments

URL: http://arxiv.org/abs/2110.06057v1
Date: Tue, 12 Oct 2021 14:58:38 GMT
Title: Gated Information Bottleneck for Generalization in Sequential Environments
Authors: Francesco Alesiani, Shujian Yu, Xi Yu
Abstract summary: Deep neural networks suffer from poor generalization to unseen environments when the underlying data distribution is different from that in the training set. We propose a new neural network-based IB approach, termed gated information bottleneck (GIB) We empirically demonstrate the superiority of GIB over other popular neural network-based IB approaches in adversarial robustness and out-of-distribution detection.
Score: 13.795129636387623
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks suffer from poor generalization to unseen environments when the underlying data distribution is different from that in the training set. By learning minimum sufficient representations from training data, the information bottleneck (IB) approach has demonstrated its effectiveness to improve generalization in different AI applications. In this work, we propose a new neural network-based IB approach, termed gated information bottleneck (GIB), that dynamically drops spurious correlations and progressively selects the most task-relevant features across different environments by a trainable soft mask (on raw features). GIB enjoys a simple and tractable objective, without any variational approximation or distributional assumption. We empirically demonstrate the superiority of GIB over other popular neural network-based IB approaches in adversarial robustness and out-of-distribution (OOD) detection. Meanwhile, we also establish the connection between IB theory and invariant causal representation learning, and observed that GIB demonstrates appealing performance when different environments arrive sequentially, a more practical scenario where invariant risk minimization (IRM) fails. Code of GIB is available at https://github.com/falesiani/GIB

Related papers

IENE: Identifying and Extrapolating the Node Environment for Out-of-Distribution Generalization on Graphs [10.087216264788097]
We propose IENE, an OOD generalization method on graphs based on node-level environmental identification and extrapolation techniques. It strengthens the model's ability to extract invariance from two granularities simultaneously, leading to improved generalization.
arXiv Detail & Related papers (2024-06-02T14:43:56Z)
IB-AdCSCNet:Adaptive Convolutional Sparse Coding Network Driven by Information Bottleneck [4.523653503622693]
We introduce IB-AdCSCNet, a deep learning model grounded in information bottleneck theory. IB-AdCSCNet seamlessly integrates the information bottleneck trade-off strategy into deep networks. Experimental results on CIFAR-10 and CIFAR-100 datasets demonstrate that IB-AdCSCNet not only matches the performance of deep residual convolutional networks but also outperforms them when handling corrupted data.
arXiv Detail & Related papers (2024-05-23T05:35:57Z)
Graph Out-of-Distribution Generalization via Causal Intervention [69.70137479660113]
We introduce a conceptually simple yet principled approach for training robust graph neural networks (GNNs) under node-level distribution shifts. Our method resorts to a new learning objective derived from causal inference that coordinates an environment estimator and a mixture-of-expert GNN predictor. Our model can effectively enhance generalization with various types of distribution shifts and yield up to 27.4% accuracy improvement over state-of-the-arts on graph OOD generalization benchmarks.
arXiv Detail & Related papers (2024-02-18T07:49:22Z)
DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design [11.922951794283168]
In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce data-regularised environment design (D
arXiv Detail & Related papers (2024-02-05T19:47:45Z)
Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe. GNNSafe achieves up to $17.0%$ AUROC improvement over state-of-the-arts and it could serve as simple yet strong baselines in such an under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics. We then exploit higher-order statistics only later during training. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
Handling Distribution Shifts on Graphs: An Invariance Perspective [78.31180235269035]
We formulate the OOD problem on graphs and develop a new invariant learning approach, Explore-to-Extrapolate Risk Minimization (EERM) EERM resorts to multiple context explorers that are adversarially trained to maximize the variance of risks from multiple virtual environments. We prove the validity of our method by theoretically showing its guarantee of a valid OOD solution.
arXiv Detail & Related papers (2022-02-05T02:31:01Z)
Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation. In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which together with a discriminator, forms a GAN. Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
arXiv Detail & Related papers (2021-12-15T09:50:25Z)
Learning a Domain-Agnostic Visual Representation for Autonomous Driving via Contrastive Loss [25.798361683744684]
Domain-Agnostic Contrastive Learning (DACL) is a two-stage unsupervised domain adaptation framework with cyclic adversarial training and contrastive loss. Our proposed approach achieves better performance in the monocular depth estimation task compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-10T07:06:03Z)
The Pitfalls of Simplicity Bias in Neural Networks [39.29115335858375]
Simplicity Bias (SB) is the tendency of standard training procedures to find simple models. We propose datasets that incorporate a precise notion of simplicity, multiple predictive features, and capture the non-robustness of neural networks trained on real data.
arXiv Detail & Related papers (2020-06-13T20:15:26Z)
Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data. Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.