A Too-Good-to-be-True Prior to Reduce Shortcut Reliance
- URL: http://arxiv.org/abs/2102.06406v1
- Date: Fri, 12 Feb 2021 09:17:24 GMT
- Title: A Too-Good-to-be-True Prior to Reduce Shortcut Reliance
- Authors: Nikolay Dagaev, Brett D. Roads, Xiaoliang Luo, Daniel N. Barry,
Kaustubh R. Patil, Bradley C. Love
- Abstract summary: Deep convolutional neural networks (DCNNs) often fail to generalize to out-of-distribution (o.o.d.) samples.
One cause for this shortcoming is that modern architectures tend to rely on "shortcuts"
We implement this inductive bias in a two-stage approach that uses predictions from a low-capacity network to inform the training of a high-capacity network.
- Score: 0.19573380763700707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their impressive performance in object recognition and other tasks
under standard testing conditions, deep convolutional neural networks (DCNNs)
often fail to generalize to out-of-distribution (o.o.d.) samples. One cause for
this shortcoming is that modern architectures tend to rely on "shortcuts" -
superficial features that correlate with categories without capturing deeper
invariants that hold across contexts. Real-world concepts often possess a
complex structure that can vary superficially across contexts, which can make
the most intuitive and promising solutions in one context not generalize to
others. One potential way to improve o.o.d. generalization is to assume simple
solutions are unlikely to be valid across contexts and downweight them, which
we refer to as the too-good-to-be-true prior. We implement this inductive bias
in a two-stage approach that uses predictions from a low-capacity network (LCN)
to inform the training of a high-capacity network (HCN). Since the shallow
architecture of the LCN can only learn surface relationships, which includes
shortcuts, we downweight training items for the HCN that the LCN can master,
thereby encouraging the HCN to rely on deeper invariant features that should
generalize broadly. Using a modified version of the CIFAR-10 dataset in which
we introduced shortcuts, we found that the two-stage LCN-HCN approach reduced
reliance on shortcuts and facilitated o.o.d. generalization.
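The two-stage idea can be illustrated with a toy sketch. This is not the paper's implementation: the data, the logistic-regression "LCN", and the weighting rule (weight = 1 minus the LCN's probability of the correct class) are all illustrative assumptions chosen to show the mechanism on a synthetic shortcut.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary task: feature 0 is a "shortcut" perfectly correlated
# with the label in-distribution; feature 1 carries a weaker, deeper signal.
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 3))
X[:, 0] = 2.0 * (y - 0.5)      # shortcut: trivially separable
X[:, 1] += 0.5 * (y - 0.5)     # weak but invariant signal

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stage 1: the "LCN" — a low-capacity model (here, logistic regression
# on the shortcut feature alone) trained by gradient descent.
w_lcn = 0.0
for _ in range(500):
    p = sigmoid(w_lcn * X[:, 0])
    w_lcn -= 0.1 * np.mean((p - y) * X[:, 0])

# Per-item weights for the HCN: downweight items the LCN masters
# (the "too-good-to-be-true" prior).
p_shortcut = sigmoid(w_lcn * X[:, 0])
p_correct = np.where(y == 1, p_shortcut, 1.0 - p_shortcut)
item_weights = 1.0 - p_correct

# Stage 2 (not shown): the HCN minimises a weighted loss, e.g.
#   L = mean(item_weights * cross_entropy(hcn(X), y))
print(round(float(item_weights.mean()), 3))
```

Because the shortcut is perfectly predictive here, the LCN masters every item and all weights end up near zero; in a realistic mix of shortcut and non-shortcut items, only the shortcut-solvable items would be downweighted for the HCN.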
Related papers
- Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency.
Online reasoning is performed to guide the training process through two mechanisms.
We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z) - On Measuring Localization of Shortcuts in Deep Networks [10.928881579403907]
Shortcuts, spurious rules that perform well during training but fail to generalize, present a major challenge to the reliability of deep networks.
We study shortcuts on CIFAR-10, Waterbirds, and CelebA datasets across VGG, ResNet, DeiT, and ConvNeXt architectures.
We find that shortcut learning is not localized in specific layers but distributed throughout the network.
arXiv Detail & Related papers (2025-10-30T14:51:03Z) - Auto-Compressing Networks [59.83547898874152]
We introduce Auto-Compressing Networks (ACNs), an architectural variant in which additive long feedforward connections from each layer replace traditional short residual connections.
ACNs showcase a unique property we coin "auto-compression": the ability of a network to organically compress information during training.
We find that ACNs exhibit enhanced noise robustness compared to residual networks, superior performance in low-data settings, and mitigate catastrophic forgetting.
arXiv Detail & Related papers (2025-06-11T13:26:09Z) - Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization [52.16435732772263]
Second-order optimization has been shown to accelerate the training of deep neural networks in many applications.
However, generalization properties of second-order methods are still being debated.
We show for the first time that exact Gauss-Newton (GN) updates take on a tractable form in a class of deep architectures.
arXiv Detail & Related papers (2024-11-12T17:58:40Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow one to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - On skip connections and normalisation layers in deep optimisation [32.51139594406463]
We introduce a general theoretical framework for the study of optimisation of deep neural networks.
Our framework determines the curvature and regularity properties of multilayer loss landscapes.
We identify a novel causal mechanism by which skip connections accelerate training.
arXiv Detail & Related papers (2022-10-10T06:22:46Z) - Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in
Neural Networks [66.76034024335833]
We investigate why diverse/complex features are learned by the backbone, and show that their brittleness is due to the linear classification head relying primarily on the simplest features.
We propose Feature Reconstruction Regularizer (FRR) to ensure that the learned features can be reconstructed back from the logits.
We demonstrate up to 15% gains in OOD accuracy on the recently introduced semi-synthetic datasets with extreme distribution shifts.
arXiv Detail & Related papers (2022-10-04T04:01:15Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
arXiv Detail & Related papers (2021-12-15T09:50:25Z) - Sparsity Aware Normalization for GANs [32.76828505875087]
Generative adversarial networks (GANs) are known to benefit from regularization or normalization of their critic (discriminator) network during training.
In this paper, we analyze the popular spectral normalization scheme, find a significant drawback and introduce sparsity aware normalization (SAN), a new alternative approach for stabilizing GAN training.
arXiv Detail & Related papers (2021-03-03T15:05:18Z) - Faster Convergence in Deep-Predictive-Coding Networks to Learn Deeper
Representations [12.716429755564821]
Deep-predictive-coding networks (DPCNs) are hierarchical, generative models that rely on feed-forward and feed-back connections.
A crucial element of DPCNs is a forward-backward inference procedure to uncover sparse states of a dynamic model.
We propose an optimization strategy, with better empirical and theoretical convergence, based on accelerated proximal gradients.
arXiv Detail & Related papers (2021-01-18T02:30:13Z) - Disentangling Trainability and Generalization in Deep Neural Networks [45.15453323967438]
We analyze the spectrum of the Neural Tangent Kernel (NTK) for trainability and generalization across a range of networks.
We find that CNNs without global average pooling behave almost identically to FCNs, but that CNNs with pooling have markedly different and often better generalization performance.
arXiv Detail & Related papers (2019-12-30T18:53:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.