Dropout: Explicit Forms and Capacity Control
- URL: http://arxiv.org/abs/2003.03397v1
- Date: Fri, 6 Mar 2020 19:10:15 GMT
- Title: Dropout: Explicit Forms and Capacity Control
- Authors: Raman Arora, Peter Bartlett, Poorya Mianjy, Nathan Srebro
- Abstract summary: We investigate capacity control provided by dropout in various machine learning problems.
In deep learning, we show that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks.
We evaluate our theoretical findings on real-world datasets, including MovieLens, MNIST, and Fashion-MNIST.
- Score: 57.36692251815882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the capacity control provided by dropout in various machine learning problems. First, we study dropout for matrix completion, where it induces a data-dependent regularizer that, in expectation, equals the weighted trace-norm of the product of the factors. In deep learning, we show that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks. These developments enable us to give concrete generalization error bounds for the dropout algorithm both in matrix completion and in training deep neural networks. We evaluate our theoretical findings on real-world datasets, including MovieLens, MNIST, and Fashion-MNIST.
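
A minimal sketch of the explicit form in the factorized linear case (not the authors' code; the dimensions, seed, and single Gaussian input/target below are illustrative assumptions). For a model W = UV with dropout applied to the r hidden units at keep probability q, the expected dropout objective decomposes exactly into the plain squared loss plus a data-dependent penalty on the factors; the Monte Carlo estimate below checks that identity numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, r, d_in, q = 3, 5, 4, 0.8       # q = keep probability of each hidden unit
U = rng.standard_normal((d_out, r))    # left factor
V = rng.standard_normal((r, d_in))     # right factor
x = rng.standard_normal(d_in)          # one illustrative input
y = rng.standard_normal(d_out)         # one illustrative target

# Monte Carlo estimate of the expected dropout objective
# E_b || y - U diag(b/q) V x ||^2,  with b_i ~ Bernoulli(q) i.i.d.
n = 200_000
h = V @ x
losses = np.empty(n)
for t in range(n):
    b = (rng.random(r) < q).astype(float)
    losses[t] = np.sum((y - U @ ((b / q) * h)) ** 2)

# Explicit form: plain squared loss plus the data-dependent penalty
# ((1 - q) / q) * sum_i ||u_i||^2 (v_i^T x)^2 on columns of U / rows of V.
plain = np.sum((y - U @ h) ** 2)
penalty = (1 - q) / q * np.sum(np.sum(U ** 2, axis=0) * h ** 2)

print(losses.mean())    # Monte Carlo estimate of the dropout objective
print(plain + penalty)  # closed form; agrees up to sampling noise
```

Averaged over the input distribution, the penalty weights each ||u_i||^2 by E[(v_i^T x)^2], which is how a weighted trace-norm-style control on the product UV emerges in the matrix completion setting the abstract describes.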
Related papers
- Average gradient outer product as a mechanism for deep neural collapse [26.939895223897572]
Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs).
In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP).
We show that the right singular vectors and values of the weights can be responsible for the majority of within-class variability collapse for neural networks trained in the feature learning regime.
arXiv Detail & Related papers (2024-02-21T11:40:27Z)
- Learning Universal Predictors [23.18743879588599]
We explore the potential of amortizing the most powerful universal predictor, namely Solomonoff Induction (SI), into neural networks by leveraging meta-learning to its limits.
We use Universal Turing Machines (UTMs) to generate training data used to expose networks to a broad range of patterns.
Our results suggest that UTM data is a valuable resource for meta-learning, and that it can be used to train neural networks capable of learning universal prediction strategies.
arXiv Detail & Related papers (2024-01-26T15:37:16Z)
- Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark [62.997667081978825]
The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident.
The dataset is created by aggregating publicly available datasets from the UK Department for Transport.
arXiv Detail & Related papers (2022-05-20T21:15:26Z)
- Investigating Compounding Prediction Errors in Learned Dynamics Models [7.237751303770201]
Accurately predicting the consequences of agents' actions is a key prerequisite for planning in robotic control.
Deep model-based RL (MBRL) has become a popular candidate, using a neural network to learn a dynamics model that predicts the next state from high-dimensional states and actions.
These "one-step" predictions are known to become inaccurate over longer horizons of composed prediction.
arXiv Detail & Related papers (2022-03-17T22:24:38Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- On the Robustness and Generalization of Deep Learning Driven Full Waveform Inversion [2.5382095320488665]
Full Waveform Inversion (FWI) is commonly cast as an image-to-image translation task.
Despite being trained on synthetic data, deep-learning-driven FWI is expected to perform well when evaluated on sufficient real-world data.
We study such properties by asking: how robust are these deep neural networks and how do they generalize?
arXiv Detail & Related papers (2021-11-28T19:27:59Z)
- On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning [69.48387059607387]
We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning.
We analyze the limitations of learning from confounded expert data with and without external reward.
We validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
arXiv Detail & Related papers (2021-10-13T07:31:31Z)
- Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning methods.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- Statistical Mechanics of Deep Linear Neural Networks: The Back-Propagating Renormalization Group [4.56877715768796]
We study the statistical mechanics of learning in Deep Linear Neural Networks (DLNNs) in which the input-output function of an individual unit is linear.
We solve exactly for the network properties following supervised learning, using an equilibrium Gibbs distribution in weight space.
Our numerical simulations reveal that despite the nonlinearity, the predictions of our theory are largely shared by ReLU networks with modest depth.
arXiv Detail & Related papers (2020-12-07T20:08:31Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.