Generalization by design: Shortcuts to Generalization in Deep Learning
- URL: http://arxiv.org/abs/2107.02253v1
- Date: Mon, 5 Jul 2021 20:01:23 GMT
- Title: Generalization by design: Shortcuts to Generalization in Deep Learning
- Authors: Petr Taborsky, Lars Kai Hansen
- Abstract summary: We show that good generalization may be instigated by bounded spectral products over layers, leading to a novel geometric regularizer.
Backed up by theory, we further demonstrate that "generalization by design" is practically possible and that good generalization may be encoded into the structure of the network.
- Score: 7.751691910877239
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We take a geometrical viewpoint and present a unifying view of supervised deep learning with the Bregman divergence loss function, which covers common classification and prediction tasks. Motivated by simulations, we suggest that, in principle, vanilla stochastic gradient descent training of deep models carries no implicit bias towards "simpler" functions. Instead, we show that good generalization may be instigated by bounded spectral products over layers, leading to a novel geometric regularizer. It is revealed that in deep enough models such a regularizer enables both extreme accuracy and good generalization to be reached. We associate popular regularization techniques such as weight decay, dropout, batch normalization, and early stopping with this perspective. Backed up by theory, we further demonstrate that "generalization by design" is practically possible and that good generalization may be encoded into the structure of the network. We design two such easy-to-use structural regularizers that insert an additional "generalization layer" into a model architecture, one with a skip connection and one with dropout. We verify our theoretical results in experiments on various feedforward and convolutional architectures, including ResNets, and on several datasets (MNIST, CIFAR10, synthetic data). We believe this work opens up new avenues of research towards better-generalizing architectures.
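For a concrete reading of the abstract, the sketch below shows what an inserted "generalization layer" could look like, with one variant built around a skip connection and one around dropout, together with a helper that monitors the product of per-layer spectral norms. This is a minimal illustration only: the abstract does not specify the actual construction, so the module names, layer widths, dropout rate, and placement are all assumptions.

```python
# Illustrative sketch only: the paper's abstract does not specify the
# construction of its "generalization layer", so the two modules below
# (a skip-connection variant and a dropout variant), their widths, the
# dropout rate, and the spectral-norm helper are assumptions made for
# the sake of a concrete example.
import torch
import torch.nn as nn


class SkipGeneralizationLayer(nn.Module):
    """Hypothetical width-preserving layer regularized by an identity skip connection."""

    def __init__(self, width: int):
        super().__init__()
        self.linear = nn.Linear(width, width)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection keeps the block close to the identity map.
        return x + torch.relu(self.linear(x))


class DropoutGeneralizationLayer(nn.Module):
    """Hypothetical width-preserving layer regularized by dropout."""

    def __init__(self, width: int, p_drop: float = 0.5):
        super().__init__()
        self.linear = nn.Linear(width, width)
        self.dropout = nn.Dropout(p_drop)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dropout(torch.relu(self.linear(x)))


def spectral_norm_product(model: nn.Module) -> torch.Tensor:
    """Product of the largest singular values of all linear weight matrices.

    The abstract links good generalization to bounded spectral products over
    layers; this helper only monitors that quantity, it does not enforce it.
    """
    prod = torch.tensor(1.0)
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prod = prod * torch.linalg.matrix_norm(module.weight, ord=2)
    return prod


# Example placement in a small MNIST-style feedforward classifier.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),
    SkipGeneralizationLayer(256),  # or DropoutGeneralizationLayer(256)
    nn.Linear(256, 10),
)
print(float(spectral_norm_product(model)))
```

The placement above is only one possible choice; the paper evaluates its structural regularizers on various feedforward and convolutional architectures, including ResNets.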
Related papers
- What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z) - A Margin-based Multiclass Generalization Bound via Geometric Complexity [6.554326244334867]
We investigate margin-based multiclass generalization bounds for neural networks.
We derive a new upper bound on the generalization error which scales with the margin-normalized geometric complexity of the network.
arXiv Detail & Related papers (2024-05-28T21:08:58Z) - GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image [94.56927147492738]
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes from single images.
We show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage.
We propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions.
arXiv Detail & Related papers (2024-03-18T17:50:41Z) - Rotation Equivariant Proximal Operator for Deep Unfolding Methods in
Image Restoration [68.18203605110719]
We propose a high-accuracy rotation equivariant proximal network that embeds rotation symmetry priors into the deep unfolding framework.
arXiv Detail & Related papers (2023-12-25T11:53:06Z) - Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer
with Mixture-of-View-Experts [88.23732496104667]
Cross-scene generalizable NeRF models have become a new spotlight of the NeRF field.
We bridge "neuralized" architectures with the powerful Mixture-of-Experts (MoE) idea from large language models.
Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has experimentally shown state-of-the-art results when transferring to unseen scenes.
arXiv Detail & Related papers (2023-08-22T21:18:54Z) - Sparsity-aware generalization theory for deep neural networks [12.525959293825318]
We present a new approach to analyzing generalization for deep feed-forward ReLU networks.
We show fundamental trade-offs between sparsity and generalization.
arXiv Detail & Related papers (2023-07-01T20:59:05Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - On skip connections and normalisation layers in deep optimisation [32.51139594406463]
We introduce a general theoretical framework for the study of optimisation of deep neural networks.
Our framework determines the curvature and regularity properties of multilayer loss landscapes.
We identify a novel causal mechanism by which skip connections accelerate training.
arXiv Detail & Related papers (2022-10-10T06:22:46Z) - Generalization Through The Lens Of Leave-One-Out Error [22.188535244056016]
We show that the leave-one-out error provides a tractable way to estimate the generalization ability of deep neural networks in the kernel regime.
arXiv Detail & Related papers (2022-03-07T14:56:00Z) - An Optimization and Generalization Analysis for Max-Pooling Networks [34.58092926599547]
Max-Pooling operations are a core component of deep learning architectures.
We perform a theoretical analysis of a convolutional max-pooling architecture.
We empirically validate that CNNs significantly outperform fully connected networks in our setting.
arXiv Detail & Related papers (2020-02-22T22:26:26Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)