Deep Regression Ensembles
- URL: http://arxiv.org/abs/2203.05417v1
- Date: Thu, 10 Mar 2022 15:13:46 GMT
- Title: Deep Regression Ensembles
- Authors: Antoine Didisheim, Bryan Kelly, Semyon Malamud
- Abstract summary: We introduce a methodology for designing and training deep neural networks (DNNs) that we call "Deep Regression Ensembles" (DRE).
It bridges the gap between DNNs and two-layer neural networks trained with random feature regression.
Our experiments show that a single DRE architecture is on par with or exceeds state-of-the-art DNNs on many data sets.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We introduce a methodology for designing and training deep neural networks
(DNNs) that we call "Deep Regression Ensembles" (DRE). It bridges the gap
between DNNs and two-layer neural networks trained with random feature
regression. Each layer of DRE has two components: randomly drawn input weights
and output weights trained myopically (as if they were the final output layer) using
linear ridge regression. Within a layer, each neuron uses a different subset of
inputs and a different ridge penalty, constituting an ensemble of random
feature ridge regressions. Our experiments show that a single DRE architecture
is on par with or exceeds state-of-the-art DNNs on many data sets. Yet, because
DRE neural weights are either known in closed form or randomly drawn, its
computational cost is orders of magnitude smaller than that of a DNN.
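To make the construction concrete, here is a minimal NumPy sketch of one DRE-style layer. The activation (ReLU), the subset size, the number of random features per neuron, and the ridge-penalty grid are illustrative assumptions, not the configuration used in the paper.
```python
import numpy as np

def dre_layer(X, y, n_neurons=32, subset_size=None, n_random_features=128,
              penalties=(0.01, 0.1, 1.0, 10.0), seed=0):
    """One DRE-style layer (illustrative sketch).

    Each "neuron" is a random-feature ridge regression: it sees a random
    subset of the inputs, expands it with fixed random weights, and fits
    its output weights in closed form with its own ridge penalty. The
    neuron's prediction becomes one output feature of the layer.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    subset_size = subset_size or max(1, d // 2)                   # assumed default
    outputs = []
    for i in range(n_neurons):
        cols = rng.choice(d, size=subset_size, replace=False)      # random input subset
        W = rng.standard_normal((subset_size, n_random_features))  # random input weights, never trained
        Z = np.maximum(X[:, cols] @ W, 0.0)                        # random ReLU features (assumed activation)
        lam = penalties[i % len(penalties)]                        # per-neuron ridge penalty
        # Myopic output weights via closed-form ridge regression on the final target y:
        # beta = (Z'Z + lam * I)^{-1} Z'y
        beta = np.linalg.solve(Z.T @ Z + lam * np.eye(n_random_features), Z.T @ y)
        outputs.append(Z @ beta)
    return np.column_stack(outputs)    # layer output: one prediction per neuron

# Stacking (illustrative): the ensemble of per-neuron predictions from one
# layer becomes the input of the next; no gradient-based training is used.
# H1 = dre_layer(X_train, y_train)
# H2 = dre_layer(H1, y_train)
```
Because the output weights have a closed-form ridge solution and the input weights are drawn rather than trained, no gradient-based optimization is needed, which is the source of the computational savings described in the abstract.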
Related papers
- Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse [32.06666853127924]
Deep neural networks (DNNs) at convergence consistently represent the training data in the last layer via a symmetric geometric structure referred to as neural collapse.
In prior analyses, the features of the penultimate layer are treated as free variables, which makes the model data-agnostic and, hence, puts into question its ability to capture the training data.
We first prove generic guarantees on neural collapse that assume (i) low training error and balancedness of the linear layers, and (ii) bounded conditioning of the features before the linear part.
arXiv Detail & Related papers (2024-10-07T10:16:40Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth-2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the "NTK regime".
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Function Regression using Spiking DeepONet [2.935661780430872]
We present a spiking neural network (SNN)-based method to perform regression, which has been a challenge due to the inherent difficulty in representing a function's input domain and continuous output values as spikes.
We use a DeepONet, a neural network designed to learn operators, to learn the behavior of spikes.
We propose several methods to use a DeepONet in the spiking framework, and present accuracy and training time for different benchmarks.
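For context, a DeepONet pairs a "branch" network, which encodes the input function sampled at fixed sensor points, with a "trunk" network, which encodes the query location; the operator value is the inner product of the two embeddings. The sketch below shows that plain (non-spiking) structure with untrained toy weights; the sensor count, widths, and tanh activations are assumptions, and none of the paper's spiking machinery is represented.
```python
import numpy as np

def mlp(sizes, seed):
    """Tiny fully connected tanh network with randomly drawn (untrained) weights."""
    rng = np.random.default_rng(seed)
    params = [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
              for m, n in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for i, (W, b) in enumerate(params):
            x = x @ W + b
            if i < len(params) - 1:
                x = np.tanh(x)
        return x
    return forward

m, p = 50, 32                      # sensor points and embedding width (assumed)
branch = mlp([m, 64, p], seed=1)   # encodes the input function u(x_1), ..., u(x_m)
trunk  = mlp([1, 64, p], seed=2)   # encodes a query coordinate y

def deeponet(u_samples, y_query):
    """Evaluate G(u)(y) for each query point y as <branch(u), trunk(y)>."""
    b = branch(np.asarray(u_samples, dtype=float)[None, :])        # (1, p)
    t = trunk(np.asarray(y_query, dtype=float).reshape(-1, 1))     # (q, p)
    return (t @ b.T).ravel()                                       # (q,)

# Example: one input function sampled at m points, evaluated at 3 locations.
# u = np.sin(np.linspace(0, 1, m)); print(deeponet(u, [0.1, 0.5, 0.9]))
```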
arXiv Detail & Related papers (2022-05-17T15:22:22Z)
- Saving RNN Computations with a Neuron-Level Fuzzy Memoization Scheme [0.0]
Recurrent Neural Networks (RNNs) are a key technology for applications such as automatic speech recognition or machine translation.
We build a neuron-level fuzzy memoization scheme, which dynamically caches each neuron's output and reuses it whenever it is predicted that the current output will be similar to a previously computed result.
We show that our technique avoids more than 26.7% of computations, resulting in 21% energy savings and 1.4x speedup on average.
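As a rough illustration of what neuron-level fuzzy memoization means, the sketch below caches a neuron's last result and reuses it when a cheap quantized key of the new input matches the cached key. The quantization-based similarity test and single-entry cache are assumptions made for this example; the paper's actual similarity predictor and caching policy differ.
```python
import numpy as np

class FuzzyMemoNeuron:
    """Illustrative neuron-level fuzzy memoization (not the paper's exact scheme)."""

    def __init__(self, weights, bias, quant_step=0.25):
        self.w, self.b = weights, bias
        self.step = quant_step          # coarseness of the similarity test (assumed)
        self.cached_key = None          # key of the last computed input
        self.cached_out = None          # output computed for it
        self.reused = 0                 # how many evaluations were skipped

    def _key(self, x):
        # Cheap similarity proxy: round the inputs to a coarse grid.
        return tuple(np.round(np.asarray(x) / self.step).astype(int))

    def __call__(self, x):
        key = self._key(x)
        if key == self.cached_key:
            self.reused += 1            # fuzzy hit: skip the dot product and activation
            return self.cached_out
        out = np.tanh(self.w @ x + self.b)   # full ("expensive") evaluation
        self.cached_key, self.cached_out = key, out
        return out
```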
arXiv Detail & Related papers (2022-02-14T09:02:03Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Twin Neural Network Regression [0.802904964931021]
We introduce twin neural network (TNN) regression.
This method predicts differences between the target values of two different data points rather than the targets themselves.
We show that TNNs are able to compete with, or yield more accurate predictions than, other state-of-the-art methods on different data sets.
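The pairing logic behind this idea can be sketched as follows. Here `diff_model` stands for any trained regressor that maps a concatenated pair (a, b) to an estimate of y(a) - y(b); it is a hypothetical placeholder, and details such as ensembling over pair orderings are omitted.
```python
import numpy as np

def make_pair_dataset(X, y):
    """Training pairs for twin-network regression: the target is the
    difference y_i - y_j between two data points, not the raw target."""
    idx_i, idx_j = np.meshgrid(np.arange(len(X)), np.arange(len(X)), indexing="ij")
    pairs = np.hstack([X[idx_i.ravel()], X[idx_j.ravel()]])   # (n*n, 2d)
    diffs = y[idx_i.ravel()] - y[idx_j.ravel()]                # (n*n,)
    return pairs, diffs

def tnn_predict(diff_model, x_new, X_anchor, y_anchor):
    """Estimate y(x_new) by averaging predicted differences against anchor
    points whose targets are known: y(x_new) ~ mean_j [F(x_new, x_j) + y_j]."""
    pairs = np.hstack([np.tile(x_new, (len(X_anchor), 1)), X_anchor])
    predicted_diffs = diff_model(pairs)       # assumed to return shape (n_anchor,)
    return float(np.mean(predicted_diffs + y_anchor))
```
Averaging over many anchor points turns a single trained network into an implicit ensemble of predictions for the same target.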
arXiv Detail & Related papers (2020-12-29T17:52:31Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.