Contextual Dropout: An Efficient Sample-Dependent Dropout Module
- URL: http://arxiv.org/abs/2103.04181v1
- Date: Sat, 6 Mar 2021 19:30:32 GMT
- Title: Contextual Dropout: An Efficient Sample-Dependent Dropout Module
- Authors: Xinjie Fan, Shujian Zhang, Korawat Tanwisuth, Xiaoning Qian, Mingyuan
Zhou
- Abstract summary: Dropout has been demonstrated as a simple and effective module to regularize the training process of deep neural networks.
We propose contextual dropout with an efficient structural design as a simple and scalable sample-dependent dropout module.
Our experimental results show that the proposed method outperforms baseline methods in terms of both accuracy and quality of uncertainty estimation.
- Score: 60.63525456640462
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dropout has been demonstrated as a simple and effective module to not only
regularize the training process of deep neural networks, but also provide the
uncertainty estimation for prediction. However, the quality of uncertainty
estimation is highly dependent on the dropout probabilities. Most current
models use the same dropout distribution across all data samples for
simplicity. Despite the potential gains in the flexibility of modeling
uncertainty, sample-dependent dropout, on the other hand, is less explored as
it often encounters scalability issues or involves non-trivial model changes.
In this paper, we propose contextual dropout with an efficient structural
design as a simple and scalable sample-dependent dropout module, which can be
applied to a wide range of models at the expense of only slightly increased
memory and computational cost. We learn the dropout probabilities with a
variational objective, compatible with both Bernoulli dropout and Gaussian
dropout. We apply the contextual dropout module to various models with
applications to image classification and visual question answering and
demonstrate the scalability of the method with large-scale datasets, such as
ImageNet and VQA 2.0. Our experimental results show that the proposed method
outperforms baseline methods in terms of both accuracy and quality of
uncertainty estimation.
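The core idea, a small context network that maps each input sample to its own dropout probabilities, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the single linear context layer (`W_ctx`, `b_ctx`) and the Bernoulli variant shown here are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def contextual_bernoulli_dropout(x, W_ctx, b_ctx, training=True):
    """Sample-dependent Bernoulli dropout (illustrative sketch).

    A lightweight 'context' layer maps each sample to its own
    keep-probabilities, so different samples are regularized
    differently. W_ctx and b_ctx are hypothetical parameters of
    the context network, learned jointly with the main model.
    """
    # Per-sample, per-unit keep probabilities in (0, 1) via a sigmoid.
    logits = x @ W_ctx + b_ctx
    keep_prob = 1.0 / (1.0 + np.exp(-logits))
    if not training:
        return x  # at test time, deterministic forward pass
    # Sample a Bernoulli mask and use inverted-dropout scaling so the
    # activations stay unbiased in expectation.
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

# Toy usage: 4 samples, 8 features; the context net is one linear map.
x = rng.standard_normal((4, 8))
W_ctx = 0.1 * rng.standard_normal((8, 8))
b_ctx = np.full(8, 2.0)  # bias toward high keep-probability
y = contextual_bernoulli_dropout(x, W_ctx, b_ctx)
print(y.shape)  # (4, 8)
```

Because the dropout probabilities are a function of the input, the mask distribution differs per sample, which is what distinguishes this from standard dropout with a single global rate.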
Related papers
- Evidence Networks: simple losses for fast, amortized, neural Bayesian
model comparison [0.0]
Evidence Networks can enable Bayesian model comparison when state-of-the-art methods fail.
We introduce the leaky parity-odd power transform, leading to the novel "l-POP-Exponential" loss function.
We show that Evidence Networks are explicitly independent of dimensionality of the parameter space and scale mildly with the complexity of the posterior probability density function.
arXiv Detail & Related papers (2023-05-18T18:14:53Z)
- Decision-based iterative fragile watermarking for model integrity verification [33.42076236847454]
Foundation models are typically hosted on cloud servers to meet the high demand for their services.
This exposes them to security risks, as attackers can modify them after uploading to the cloud or transferring from a local system.
We propose an iterative decision-based fragile watermarking algorithm that transforms normal training samples into fragile samples that are sensitive to model changes.
arXiv Detail & Related papers (2023-05-13T10:36:11Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply the total variation distance (TVD) to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Learning Multivariate CDFs and Copulas using Tensor Factorization [39.24470798045442]
Learning the multivariate distribution of data is a core challenge in statistics and machine learning.
In this work, we aim to learn multivariate cumulative distribution functions (CDFs), as they can handle mixed random variables.
We show that any grid sampled version of a joint CDF of mixed random variables admits a universal representation as a naive Bayes model.
We demonstrate the superior performance of the proposed model in several synthetic and real datasets and applications including regression, sampling and data imputation.
arXiv Detail & Related papers (2022-10-13T16:18:46Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Real-world unlabeled data is commonly imbalanced, following a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowledge of the classes.
Our experiments show that SDCLR significantly improves not only overall accuracy but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion (CMI), where data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Know Where To Drop Your Weights: Towards Faster Uncertainty Estimation [7.605814048051737]
Estimating uncertainty of models used in low-latency applications is a challenge due to the computationally demanding nature of uncertainty estimation techniques.
We propose Select-DC, which uses a subset of layers in a neural network to model uncertainty with Monte Carlo DropConnect (MCDC).
We show a significant reduction in the GFLOPS required to model uncertainty, compared to full Monte Carlo DropConnect, with a marginal trade-off in performance.
arXiv Detail & Related papers (2020-10-27T02:56:27Z)
- Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization [62.8384110757689]
Overfitting is ubiquitous in real-world applications of deep neural networks (DNNs).
The advanced dropout technique applies a model-free, easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
arXiv Detail & Related papers (2020-10-11T13:19:58Z)
- Dropout Strikes Back: Improved Uncertainty Estimation via Diversity Sampling [3.077929914199468]
We show that modifying the sampling distributions for dropout layers in neural networks improves the quality of uncertainty estimation.
Our approach consists of two steps: computing data-driven correlations between neurons, and generating samples that include maximally diverse neurons.
arXiv Detail & Related papers (2020-03-06T15:20:04Z)
- Learnable Bernoulli Dropout for Bayesian Deep Learning [53.79615543862426]
Learnable Bernoulli dropout (LBD) is a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation.
arXiv Detail & Related papers (2020-02-12T18:57:14Z)
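Several of the papers above build on the same Monte Carlo dropout recipe for uncertainty estimation: keep dropout active at test time, run several stochastic forward passes, and read uncertainty off the spread of the predictions. A minimal NumPy sketch of that shared recipe, assuming a hypothetical single linear layer with weight `W`:

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_dropout_predict(x, W, p_drop=0.5, n_samples=100):
    """Monte Carlo dropout (illustrative sketch).

    Dropout stays active at prediction time; averaging over stochastic
    forward passes gives a mean prediction, and the per-output standard
    deviation serves as a simple uncertainty estimate. W is a
    hypothetical weight matrix of a single linear layer.
    """
    preds = []
    for _ in range(n_samples):
        # Sample a fresh dropout mask on the inputs each pass,
        # with inverted-dropout rescaling.
        mask = rng.random(x.shape) >= p_drop
        x_drop = np.where(mask, x / (1.0 - p_drop), 0.0)
        preds.append(x_drop @ W)
    preds = np.stack(preds)  # (n_samples, batch, outputs)
    return preds.mean(axis=0), preds.std(axis=0)

# Toy usage: 2 samples, 5 features, 3 outputs.
x = rng.standard_normal((2, 5))
W = rng.standard_normal((5, 3))
mean, std = mc_dropout_predict(x, W)
print(mean.shape, std.shape)  # (2, 3) (2, 3)
```

The methods surveyed above vary what is fixed in this loop: contextual dropout and LBD learn where `p_drop` comes from, while Select-DC restricts which layers the stochastic masks are applied to.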
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.