Related papers: No Free Lunch From Random Feature Ensembles

URL: http://arxiv.org/abs/2412.05418v1
Date: Fri, 06 Dec 2024 20:55:27 GMT
Title: No Free Lunch From Random Feature Ensembles
Authors: Benjamin S. Ruben, William L. Tong, Hamza Tahir Chaudhry, Cengiz Pehlevan
Abstract summary: Given a budget on total model size, one must decide whether to train a single, large neural network or to combine the predictions of many smaller networks. We prove that when a fixed number of trainable parameters are partitioned among $K$ independently trained models, $K=1$ achieves optimal performance. We identify conditions on the kernel and task eigenstructure under which ensembles can achieve near-optimal scaling laws.
Score: 23.661623767100384
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Given a budget on total model size, one must decide whether to train a single, large neural network or to combine the predictions of many smaller networks. We study this trade-off for ensembles of random-feature ridge regression models. We prove that when a fixed number of trainable parameters are partitioned among $K$ independently trained models, $K=1$ achieves optimal performance, provided the ridge parameter is optimally tuned. We then derive scaling laws which describe how the test risk of an ensemble of regression models decays with its total size. We identify conditions on the kernel and task eigenstructure under which ensembles can achieve near-optimal scaling laws. Training ensembles of deep convolutional neural networks on CIFAR-10 and a transformer architecture on C4, we find that a single large network outperforms any ensemble of networks with the same total number of parameters, provided the weight decay and feature-learning strength are tuned to their optimal values.
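
The setting in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' code: the ReLU feature map, the linear synthetic target, the dimensions, and the function names (`fit_rf_ridge`, `predict_ensemble`, `ensemble_test_mse`) are all assumptions made for the example. It splits a fixed budget of random features among $K$ independently trained ridge regressors and averages their predictions.

```python
import numpy as np

def make_data(n, d, rng, w_true, noise=0.1):
    # Synthetic linear task (an assumption for this sketch).
    X = rng.standard_normal((n, d))
    y = X @ w_true + noise * rng.standard_normal(n)
    return X, y

def fit_rf_ridge(X, y, n_features, ridge, rng):
    # Fixed random projection followed by ReLU; only the readout `a` is trained.
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / np.sqrt(d)
    Phi = np.maximum(X @ W, 0.0)
    A = Phi.T @ Phi + ridge * np.eye(n_features)
    a = np.linalg.solve(A, Phi.T @ y)  # closed-form ridge solution
    return W, a

def predict_ensemble(models, X):
    # Ensemble prediction is the plain average over members.
    preds = [np.maximum(X @ W, 0.0) @ a for W, a in models]
    return np.mean(preds, axis=0)

def ensemble_test_mse(K, total_features, ridge, seed=0):
    # Partition the feature budget evenly among K independently drawn models.
    rng = np.random.default_rng(seed)
    d = 20
    w_true = rng.standard_normal(d) / np.sqrt(d)
    X_tr, y_tr = make_data(200, d, rng, w_true)
    X_te, y_te = make_data(1000, d, rng, w_true)
    per_model = total_features // K
    models = [fit_rf_ridge(X_tr, y_tr, per_model, ridge, rng) for _ in range(K)]
    return float(np.mean((predict_ensemble(models, X_te) - y_te) ** 2))

if __name__ == "__main__":
    ridges = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
    for K in (1, 2, 4, 8):
        # Tune the ridge separately for each K, as the theorem assumes.
        best = min(ensemble_test_mse(K, 240, r) for r in ridges)
        print(f"K={K}: best test MSE over ridge grid = {best:.4f}")
```

The ridge parameter is tuned per value of $K$ because the paper's optimality claim for $K=1$ is conditional on optimal tuning. A single finite-sample run like this one is noisy and does not by itself confirm or refute the theorem.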

Related papers

MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning [7.262751938473306]
Pruning is a well-established technique that reduces the size of neural networks while mathematically guaranteeing accuracy preservation. We develop a new pruning algorithm, MPruner, that leverages mutual information through vector similarity. MPruner achieved up to a 50% reduction in parameters and memory usage for CNN and transformer-based models, with minimal to no loss in accuracy.
arXiv Detail & Related papers (2024-08-24T05:54:47Z)
Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters. In practice, however, we only find solutions via our training procedure, including the gradient and regularizers, limiting flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC). We show they are exact for deep linear networks with weight-sharing in their respective setting. We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
arXiv Detail & Related papers (2023-11-01T16:37:00Z)
Transfer-Once-For-All: AI Model Optimization for Edge [0.0]
We propose Transfer-Once-For-All (TOFA) for supernet-style training on small data sets with constant computational training cost. To overcome the challenges arising from small data, TOFA utilizes a unified semi-supervised training loss to simultaneously train all subnets within the supernet.
arXiv Detail & Related papers (2023-03-27T04:14:30Z)
Autoselection of the Ensemble of Convolutional Neural Networks with Second-Order Cone Programming [0.8029049649310213]
This study proposes a mathematical model which prunes the ensemble of Convolutional Neural Networks (CNN). The proposed model is tested on CIFAR-10, CIFAR-100 and MNIST data sets.
arXiv Detail & Related papers (2023-02-12T16:18:06Z)
Robust Binary Models by Pruning Randomly-initialized Networks [57.03100916030444]
We propose ways to obtain robust models against adversarial attacks from randomly-initialized binary networks. We learn the structure of the robust model by pruning a randomly-initialized binary network. Our method confirms the strong lottery ticket hypothesis in the presence of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T00:05:08Z)
AutoDEUQ: Automated Deep Ensemble with Uncertainty Quantification [0.9449650062296824]
We propose AutoDEUQ, an automated approach for generating an ensemble of deep neural networks. We show that AutoDEUQ outperforms probabilistic backpropagation, Monte Carlo dropout, deep ensemble, distribution-free ensembles, and hyper ensemble methods on a number of regression benchmarks.
arXiv Detail & Related papers (2021-10-26T09:12:23Z)
A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell. This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN. We show that CNNs now maintain performance with dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network. PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks. We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)