Related papers: Investigating the Relationship Between Dropout Regularization and Model Complexity in Neural Networks

URL: http://arxiv.org/abs/2108.06628v1
Date: Sat, 14 Aug 2021 23:49:33 GMT
Title: Investigating the Relationship Between Dropout Regularization and Model Complexity in Neural Networks
Authors: Christopher Sun, Jai Sharma, and Milind Maiti
Abstract summary: Dropout Regularization serves to reduce variance in Deep Learning models. We explore the relationship between the dropout rate and model complexity by training 2,000 neural networks. We build neural networks that predict the optimal dropout rate given the number of hidden units in each dense layer.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dropout Regularization, serving to reduce variance, is nearly ubiquitous in Deep Learning models. We explore the relationship between the dropout rate and model complexity by training 2,000 neural networks configured with random combinations of the dropout rate and the number of hidden units in each dense layer, on each of the three data sets we selected. The generated figures, with binary cross entropy loss and binary accuracy on the z-axis, question the common assumption that adding depth to a dense layer while increasing the dropout rate will certainly enhance performance. We also discover a complex correlation between the two hyperparameters that we proceed to quantify by building additional machine learning and Deep Learning models which predict the optimal dropout rate given some hidden units in each dense layer. Linear regression and polynomial logistic regression require the use of arbitrary thresholds to select the cost data points included in the regression and to assign the cost data points a binary classification, respectively. These machine learning models have mediocre performance because their naive nature prevented the modeling of complex decision boundaries. Turning to Deep Learning models, we build neural networks that predict the optimal dropout rate given the number of hidden units in each dense layer, the desired cost, and the desired accuracy of the model. Though, this attempt encounters a mathematical error that can be attributed to the failure of the vertical line test. The ultimate Deep Learning model is a neural network whose decision boundary represents the 2,000 previously generated data points. This final model leads us to devise a promising method for tuning hyperparameters to minimize computational expense yet maximize performance. The strategy can be applied to any model hyperparameters, with the prospect of more efficient tuning in industrial models.

Related papers

Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel method for constructing feed-forward neural networks based on pruning and transfer learning. Our approach can reduce the number of parameters by more than 70%. We also evaluate the level of transfer learning by comparing the refined model with the original network trained from scratch.
arXiv Detail & Related papers (2023-12-16T23:23:16Z)
Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation. We use the score-based diffusion model to generate labeled data. Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z)
Expressive variational quantum circuits provide inherent privacy in federated learning [2.3255115473995134]
Federated learning has emerged as a viable solution to train machine learning models without the need to share data with the central aggregator. Standard neural network-based federated learning models have been shown to be susceptible to data leakage from the gradients shared with the server. We show that expressive maps lead to inherent privacy against gradient inversion attacks.
arXiv Detail & Related papers (2023-09-22T17:04:50Z)
A Deep Dive into the Connections Between the Renormalization Group and Deep Learning in the Ising Model [0.0]
Renormalization group (RG) is an essential technique in statistical physics and quantum field theory. We develop extensive renormalization techniques for the 1D and 2D Ising model to provide a baseline for comparison. For the 2D Ising model, we successfully generated Ising model samples using the Wolff algorithm, and performed the group flow using a quasi-deterministic method.
arXiv Detail & Related papers (2023-08-21T22:50:54Z)
Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models. It is most prominently used in federated learning. We analyse the performance of the models that result from averaging single layers, or groups of layers.
arXiv Detail & Related papers (2023-07-13T09:39:10Z)
Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data. The complex models tend to memorize the training data, which results in poor regularization performance on test data. We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z)
Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters. We find that our approach successfully generates parameters for a wide range of loss prompts. We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning. To take the power of both worlds, we propose a novel X-model. X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss. We examine how these benign overfitting phenomena occur in a two-layer neural network setting. We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents. One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis. We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
Training Deep Neural Networks with Constrained Learning Parameters [4.917317902787792]
A significant portion of deep learning tasks would run on edge computing systems. We propose the Combinatorial Neural Network Training Algorithm (CoNNTrA). CoNNTrA trains deep learning models with ternary learning parameters on the MNIST, Iris and ImageNet data sets. Our results indicate that CoNNTrA models use 32x less memory and have errors at par with the Backpropagation models.
arXiv Detail & Related papers (2020-09-01T16:20:11Z)
Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs). The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.