How Does Overparameterization Affect Features?
- URL: http://arxiv.org/abs/2407.00968v1
- Date: Mon, 1 Jul 2024 05:01:03 GMT
- Title: How Does Overparameterization Affect Features?
- Authors: Ahmet Cagri Duzgun, Samy Jelassi, Yuanzhi Li
- Abstract summary: We first examine the expressivity of the features of these models, and show that the feature space of overparameterized networks cannot be spanned by concatenating many underparameterized features.
We then evaluate the performance of these models, and find that overparameterized networks outperform underparameterized networks.
We propose a toy setting to explain how overparameterized networks can learn some important features that the underparameterized networks cannot learn.
- Score: 42.99771787546585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Overparameterization, the condition where models have more parameters than necessary to fit their training loss, is a crucial factor for the success of deep learning. However, the characteristics of the features learned by overparameterized networks are not well understood. In this work, we explore this question by comparing models with the same architecture but different widths. We first examine the expressivity of the features of these models, and show that the feature space of overparameterized networks cannot be spanned by concatenating many underparameterized features, and vice versa. This reveals that both overparameterized and underparameterized networks acquire some distinctive features. We then evaluate the performance of these models, and find that overparameterized networks outperform underparameterized networks, even when many of the latter are concatenated. We corroborate these findings using a VGG-16 and ResNet18 on CIFAR-10 and a Transformer on the MNLI classification dataset. Finally, we propose a toy setting to explain how overparameterized networks can learn some important features that the underparameterized networks cannot learn.
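The spanning test described in the abstract can be sketched as a least-squares regression: fit the concatenated small-network features to the large-network features and inspect the residual. The sketch below uses random matrices as stand-ins for real activations; in the paper's setup these would be features extracted from trained networks (e.g. VGG-16 on CIFAR-10), and all names and dimensions here are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for real network activations.
n = 1000                      # number of examples
d_over = 512                  # feature width of the overparameterized network
d_under, k = 64, 8            # width of each small network, number concatenated

F_over = rng.standard_normal((n, d_over))
F_cat = np.concatenate(
    [rng.standard_normal((n, d_under)) for _ in range(k)], axis=1
)  # concatenated underparameterized features, shape (n, k * d_under)

# Can the concatenated features linearly span the large network's features?
# Fit F_cat @ W ~ F_over by least squares and measure the relative residual.
W, *_ = np.linalg.lstsq(F_cat, F_over, rcond=None)
residual = np.linalg.norm(F_cat @ W - F_over) / np.linalg.norm(F_over)
print(f"relative residual: {residual:.3f}")  # near 0 would mean "spanned"
```

A large residual on real activations would indicate, as the paper argues, that the overparameterized features are not contained in the span of the concatenated small-network features.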
Related papers
- On Learnable Parameters of Optimal and Suboptimal Deep Learning Models [2.889799048595314]
We study the structural and operational aspects of deep learning models.
Our research focuses on the nuances of learnable parameters (weight) statistics, distribution, node interaction, and visualization.
arXiv Detail & Related papers (2024-08-21T15:50:37Z) - Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change when the networks are trained with better-suited hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Conditionally Parameterized, Discretization-Aware Neural Networks for Mesh-Based Modeling of Physical Systems [0.0]
We generalize the idea of conditional parametrization -- using trainable functions of input parameters.
We show that conditionally parameterized networks provide superior performance compared to their traditional counterparts.
A network architecture named CP-GNet is also proposed as the first deep learning model capable of standalone prediction of reacting flows on meshes.
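The idea of conditional parametrization, layer weights that are trainable functions of input parameters, can be sketched minimally as below. The mixing scheme, shapes, and names are our own illustration, not the CP-GNet architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# A minimal sketch: instead of a fixed weight matrix, the layer's weights
# are a (trainable) function of a conditioning vector c, e.g. physical
# parameters of the simulated system. All shapes are placeholders.
d_in, d_out, n_basis, d_cond = 4, 3, 5, 2
basis = rng.standard_normal((n_basis, d_in, d_out))   # trainable weight bank
mix = rng.standard_normal((n_basis, d_cond))          # trainable mixing map

def conditional_linear(x, c):
    """Apply a linear layer whose weights depend on the condition c."""
    coeffs = mix @ c                          # (n_basis,) mixing coefficients
    W = np.tensordot(coeffs, basis, axes=1)   # condition-dependent (d_in, d_out)
    return x @ W

x = rng.standard_normal((8, d_in))   # batch of mesh-node features
c = np.array([0.7, -0.2])            # conditioning parameters
y = conditional_linear(x, c)
print(y.shape)  # (8, 3)
```

The design choice here is a linear mixture of a small weight bank; any trainable function of `c` could play the same role.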
arXiv Detail & Related papers (2021-09-15T20:21:13Z) - Network insensitivity to parameter noise via adversarial regularization [0.0]
We present a new adversarial network optimisation algorithm that attacks network parameters during training.
We show that our approach produces models that are more robust to targeted parameter variation.
Our work provides an approach to deploy neural network architectures to inference devices that suffer from computational non-idealities.
arXiv Detail & Related papers (2021-06-09T12:11:55Z) - Exploring the parameter reusability of CNN [12.654187477646449]
We propose a solution that can judge whether a given network is reusable or not based on the performance of reusing convolution kernels.
We define that the success of a CNN's parameter reuse depends upon two conditions: first, the network is a reusable network; and second, the RMSE between the convolution kernels from the source domain and target domain is small enough.
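The second condition above, a small RMSE between source- and target-domain convolution kernels, can be sketched as a direct computation. The kernel shapes and the reuse threshold below are placeholders, not values from the paper.

```python
import numpy as np

def kernel_rmse(k_src, k_tgt):
    """Root-mean-square error between two equally shaped kernel tensors."""
    return float(np.sqrt(np.mean((k_src - k_tgt) ** 2)))

rng = np.random.default_rng(2)
# Illustrative kernels: e.g. a first conv layer with 64 output channels,
# 3 input channels, and 3x3 spatial extent.
k_src = rng.standard_normal((64, 3, 3, 3))
k_tgt = k_src + 0.01 * rng.standard_normal((64, 3, 3, 3))  # nearby weights

rmse = kernel_rmse(k_src, k_tgt)
print(f"kernel RMSE: {rmse:.4f}")
reusable = rmse < 0.05   # placeholder threshold, not the paper's criterion
```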
arXiv Detail & Related papers (2020-08-08T01:23:22Z) - Neural Parameter Allocation Search [57.190693718951316]
Training neural networks requires increasing amounts of memory.
Existing methods assume networks have many identical layers and utilize hand-crafted sharing strategies that fail to generalize.
We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.
NPAS covers both low-budget regimes, which produce compact networks, as well as a novel high-budget regime, where additional capacity can be added to boost performance without increasing inference FLOPs.
arXiv Detail & Related papers (2020-06-18T15:01:00Z) - When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.