Generalization by design: Shortcuts to Generalization in Deep Learning
- URL: http://arxiv.org/abs/2107.02253v1
- Date: Mon, 5 Jul 2021 20:01:23 GMT
- Title: Generalization by design: Shortcuts to Generalization in Deep Learning
- Authors: Petr Taborsky, Lars Kai Hansen
- Abstract summary: We show that good generalization may be instigated by bounded spectral products over layers, leading to a novel geometric regularizer.
Backed up by theory, we further demonstrate that "generalization by design" is practically possible and that good generalization may be encoded into the structure of the network.
- Score: 7.751691910877239
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We take a geometrical viewpoint and present a unifying view of supervised deep learning with the Bregman divergence loss function, which covers common classification and prediction tasks. Motivated by simulations, we suggest that, in principle, vanilla stochastic gradient descent training of deep models carries no implicit bias towards "simpler" functions. Instead, we show that good generalization may be instigated by bounded spectral products over layers, leading to a novel geometric regularizer. It is revealed that in deep enough models such a regularizer enables both extreme accuracy and good generalization to be reached. We associate popular regularization techniques such as weight decay, dropout, batch normalization, and early stopping with this perspective. Backed up by theory, we further demonstrate that "generalization by design" is practically possible and that good generalization may be encoded into the structure of the network. We design two such easy-to-use structural regularizers that insert an additional "generalization layer" into a model architecture, one with a skip connection and one with dropout. We verify our theoretical results in experiments on various feedforward and convolutional architectures, including ResNets, and on several datasets (MNIST, CIFAR10, synthetic data). We believe this work opens up new avenues of research towards better-generalizing architectures.
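For a concrete reading of the abstract, the sketch below shows what an inserted "generalization layer" could look like, with one variant built around a skip connection and one around dropout, together with a helper that monitors the product of per-layer spectral norms. This is a minimal illustration only: the abstract does not specify the actual construction, so the module names, layer widths, dropout rate, and placement are all assumptions.

```python
# Illustrative sketch only: the paper's abstract does not specify the
# construction of its "generalization layer", so the two modules below
# (a skip-connection variant and a dropout variant), their widths, the
# dropout rate, and the spectral-norm helper are assumptions made for
# the sake of a concrete example.
import torch
import torch.nn as nn


class SkipGeneralizationLayer(nn.Module):
    """Hypothetical width-preserving layer regularized by an identity skip connection."""

    def __init__(self, width: int):
        super().__init__()
        self.linear = nn.Linear(width, width)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection keeps the block close to the identity map.
        return x + torch.relu(self.linear(x))


class DropoutGeneralizationLayer(nn.Module):
    """Hypothetical width-preserving layer regularized by dropout."""

    def __init__(self, width: int, p_drop: float = 0.5):
        super().__init__()
        self.linear = nn.Linear(width, width)
        self.dropout = nn.Dropout(p_drop)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dropout(torch.relu(self.linear(x)))


def spectral_norm_product(model: nn.Module) -> torch.Tensor:
    """Product of the largest singular values of all linear weight matrices.

    The abstract links good generalization to bounded spectral products over
    layers; this helper only monitors that quantity, it does not enforce it.
    """
    prod = torch.tensor(1.0)
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prod = prod * torch.linalg.matrix_norm(module.weight, ord=2)
    return prod


# Example placement in a small MNIST-style feedforward classifier.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),
    SkipGeneralizationLayer(256),  # or DropoutGeneralizationLayer(256)
    nn.Linear(256, 10),
)
print(float(spectral_norm_product(model)))
```

The placement above is only one possible choice; the paper evaluates its structural regularizers on various feedforward and convolutional architectures, including ResNets.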
Related papers
- What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z) - A Margin-based Multiclass Generalization Bound via Geometric Complexity [6.554326244334867]
We investigate margin-based multiclass generalization bounds for neural networks.
We derive a new upper bound on the generalization error which scales with the margin-normalized geometric complexity of the network.
arXiv Detail & Related papers (2024-05-28T21:08:58Z) - GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image [94.56927147492738]
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes from single images.
We show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage.
We propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions.
arXiv Detail & Related papers (2024-03-18T17:50:41Z) - Rotation Equivariant Proximal Operator for Deep Unfolding Methods in
Image Restoration [68.18203605110719]
We propose a high-accuracy rotation equivariant proximal network that embeds rotation symmetry priors into the deep unfolding framework.
arXiv Detail & Related papers (2023-12-25T11:53:06Z) - Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer
with Mixture-of-View-Experts [88.23732496104667]
Cross-scene generalizable NeRF models have become a new spotlight of the NeRF field.
We bridge "neuralized" architectures with the powerful Mixture-of-Experts (MoE) idea from large language models.
Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has experimentally shown state-of-the-art results when transferring to unseen scenes.
arXiv Detail & Related papers (2023-08-22T21:18:54Z) - Sparsity-aware generalization theory for deep neural networks [12.525959293825318]
We present a new approach to analyzing generalization for deep feed-forward ReLU networks.
We show fundamental trade-offs between sparsity and generalization.
arXiv Detail & Related papers (2023-07-01T20:59:05Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - On skip connections and normalisation layers in deep optimisation [32.51139594406463]
We introduce a general theoretical framework for the study of optimisation of deep neural networks.
Our framework determines the curvature and regularity properties of multilayer loss landscapes.
We identify a novel causal mechanism by which skip connections accelerate training.
arXiv Detail & Related papers (2022-10-10T06:22:46Z) - Generalization Through The Lens Of Leave-One-Out Error [22.188535244056016]
We show that the leave-one-out error provides a tractable way to estimate the generalization ability of deep neural networks in the kernel regime.
arXiv Detail & Related papers (2022-03-07T14:56:00Z) - An Optimization and Generalization Analysis for Max-Pooling Networks [34.58092926599547]
Max-Pooling operations are a core component of deep learning architectures.
We perform a theoretical analysis of a convolutional max-pooling architecture.
We empirically validate that CNNs significantly outperform fully connected networks in our setting.
arXiv Detail & Related papers (2020-02-22T22:26:26Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)