Investigating the Relationship Between Dropout Regularization and Model
Complexity in Neural Networks
- URL: http://arxiv.org/abs/2108.06628v1
- Date: Sat, 14 Aug 2021 23:49:33 GMT
- Title: Investigating the Relationship Between Dropout Regularization and Model
Complexity in Neural Networks
- Authors: Christopher Sun, Jai Sharma, and Milind Maiti
- Abstract summary: Dropout Regularization serves to reduce variance in Deep Learning models.
We explore the relationship between the dropout rate and model complexity by training 2,000 neural networks.
We build neural networks that predict the optimal dropout rate given the number of hidden units in each dense layer.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dropout Regularization, serving to reduce variance, is nearly ubiquitous in
Deep Learning models. We explore the relationship between the dropout rate and
model complexity by training 2,000 neural networks configured with random
combinations of the dropout rate and the number of hidden units in each dense
layer, on each of the three data sets we selected. The generated figures, with
binary cross entropy loss and binary accuracy on the z-axis, question the
common assumption that adding hidden units to a dense layer while increasing the
dropout rate will certainly enhance performance. We also discover a complex
correlation between the two hyperparameters that we proceed to quantify by
building additional machine learning and Deep Learning models which predict the
optimal dropout rate given the number of hidden units in each dense layer. Linear
regression and polynomial logistic regression require the use of arbitrary
thresholds to select the cost data points included in the regression and to
assign the cost data points a binary classification, respectively. These
machine learning models have mediocre performance because their simplicity
prevents them from modeling complex decision boundaries. Turning to Deep Learning
models, we build neural networks that predict the optimal dropout rate given
the number of hidden units in each dense layer, the desired cost, and the
desired accuracy of the model. However, this attempt runs into a mathematical
error attributable to the failure of the vertical line test: the same inputs can
correspond to more than one dropout rate, so the relation is not a function. The
ultimate Deep Learning model is a neural network whose decision boundary
represents the 2,000 previously generated data points. This final model leads
us to devise a promising method for tuning hyperparameters to minimize
computational expense yet maximize performance. The strategy can be applied to
any model hyperparameters, with the prospect of more efficient tuning in
industrial models.
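
The sampling procedure described in the abstract (random combinations of dropout rate and hidden units, then a search for the best dropout rate at a given layer size) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `placeholder_cost` is a synthetic stand-in for training a network and reading off its validation loss, and the ranges, bucket size, and all function names are assumptions made for the example.

```python
import random


def sample_configs(n, max_units=512, seed=0):
    """Draw n random (dropout_rate, hidden_units) pairs.

    The ranges here are assumptions; the abstract does not state them.
    """
    rng = random.Random(seed)
    return [(rng.uniform(0.0, 0.9), rng.randint(1, max_units)) for _ in range(n)]


def placeholder_cost(dropout_rate, hidden_units):
    """Synthetic stand-in for 'train a network, return validation loss'.

    NOT from the paper: it only gives the sketch something to minimize.
    In practice this would build and train a model with these settings.
    """
    preferred = min(0.8, hidden_units / 640)
    return 0.3 + (dropout_rate - preferred) ** 2 + 5.0 / (hidden_units + 10)


def best_dropout_per_bucket(results, bucket_size=64):
    """For each bucket of hidden-unit counts, keep the lowest-cost dropout rate."""
    best = {}
    for dropout, units, cost in results:
        bucket = (units - 1) // bucket_size
        if bucket not in best or cost < best[bucket][1]:
            best[bucket] = (dropout, cost)
    return {b: pair[0] for b, pair in sorted(best.items())}


# 2,000 random configurations, mirroring the count in the abstract.
configs = sample_configs(2000)
results = [(d, u, placeholder_cost(d, u)) for d, u in configs]
best = best_dropout_per_bucket(results)

for bucket, dropout in best.items():
    low, high = bucket * 64 + 1, (bucket + 1) * 64
    print(f"{low:>3}-{high:<3} hidden units: best sampled dropout = {dropout:.2f}")
```

Bucketing by layer size is only one simple way to read an "optimal dropout rate per number of hidden units" off the sampled points; the abstract describes fitting regression models and neural networks to the same data instead.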
Related papers
- Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel feed-forward neural network constructing method based on pruning and transfer learning.
Our approach can compress the number of parameters by more than 70%.
We also evaluate the level of transfer learning by comparing the refined model with the original one, in which a neural network is trained from scratch.
arXiv Detail & Related papers (2023-12-16T23:23:16Z)
- Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z)
- Expressive variational quantum circuits provide inherent privacy in federated learning [2.3255115473995134]
Federated learning has emerged as a viable solution to train machine learning models without the need to share data with the central aggregator.
Standard neural network-based federated learning models have been shown to be susceptible to data leakage from the gradients shared with the server.
We show that expressive maps lead to inherent privacy against gradient inversion attacks.
arXiv Detail & Related papers (2023-09-22T17:04:50Z)
- A Deep Dive into the Connections Between the Renormalization Group and Deep Learning in the Ising Model [0.0]
Renormalization group (RG) is an essential technique in statistical physics and quantum field theory.
We develop extensive renormalization techniques for the 1D and 2D Ising model to provide a baseline for comparison.
For the 2D Ising model, we successfully generated Ising model samples using the Wolff algorithm, and performed the group flow using a quasi-deterministic method.
arXiv Detail & Related papers (2023-08-21T22:50:54Z)
- Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
arXiv Detail & Related papers (2023-07-13T09:39:10Z)
- Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
Complex models tend to memorize the training data, which results in poor generalization performance on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way to achieve this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.