Contextual Dropout: An Efficient Sample-Dependent Dropout Module
- URL: http://arxiv.org/abs/2103.04181v1
- Date: Sat, 6 Mar 2021 19:30:32 GMT
- Title: Contextual Dropout: An Efficient Sample-Dependent Dropout Module
- Authors: Xinjie Fan, Shujian Zhang, Korawat Tanwisuth, Xiaoning Qian, Mingyuan
Zhou
- Abstract summary: Dropout has been demonstrated as a simple and effective module to regularize the training process of deep neural networks.
We propose contextual dropout with an efficient structural design as a simple and scalable sample-dependent dropout module.
Our experimental results show that the proposed method outperforms baseline methods in terms of both accuracy and quality of uncertainty estimation.
- Score: 60.63525456640462
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dropout has been demonstrated as a simple and effective module to not only
regularize the training process of deep neural networks, but also provide the
uncertainty estimation for prediction. However, the quality of uncertainty
estimation is highly dependent on the dropout probabilities. Most current
models use the same dropout distribution across all data samples for
simplicity. Despite the potential gains in the flexibility of modeling
uncertainty, sample-dependent dropout, on the other hand, is less explored as
it often encounters scalability issues or involves non-trivial model changes.
In this paper, we propose contextual dropout with an efficient structural
design as a simple and scalable sample-dependent dropout module, which can be
applied to a wide range of models at the expense of only slightly increased
memory and computational cost. We learn the dropout probabilities with a
variational objective, compatible with both Bernoulli dropout and Gaussian
dropout. We apply the contextual dropout module to various models with
applications to image classification and visual question answering and
demonstrate the scalability of the method with large-scale datasets, such as
ImageNet and VQA 2.0. Our experimental results show that the proposed method
outperforms baseline methods in terms of both accuracy and quality of
uncertainty estimation.
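The core idea, a small context network that maps each input sample to its own dropout probabilities, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the single linear context layer (`W_ctx`, `b_ctx`) and the Bernoulli variant shown here are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def contextual_bernoulli_dropout(x, W_ctx, b_ctx, training=True):
    """Sample-dependent Bernoulli dropout (illustrative sketch).

    A lightweight 'context' layer maps each sample to its own
    keep-probabilities, so different samples are regularized
    differently. W_ctx and b_ctx are hypothetical parameters of
    the context network, learned jointly with the main model.
    """
    # Per-sample, per-unit keep probabilities in (0, 1) via a sigmoid.
    logits = x @ W_ctx + b_ctx
    keep_prob = 1.0 / (1.0 + np.exp(-logits))
    if not training:
        return x  # at test time, deterministic forward pass
    # Sample a Bernoulli mask and use inverted-dropout scaling so the
    # activations stay unbiased in expectation.
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

# Toy usage: 4 samples, 8 features; the context net is one linear map.
x = rng.standard_normal((4, 8))
W_ctx = 0.1 * rng.standard_normal((8, 8))
b_ctx = np.full(8, 2.0)  # bias toward high keep-probability
y = contextual_bernoulli_dropout(x, W_ctx, b_ctx)
print(y.shape)  # (4, 8)
```

Because the dropout probabilities are a function of the input, the mask distribution differs per sample, which is what distinguishes this from standard dropout with a single global rate.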
Related papers
- Evidence Networks: simple losses for fast, amortized, neural Bayesian
model comparison [0.0]
Evidence Networks can enable Bayesian model comparison when state-of-the-art methods fail.
We introduce the leaky parity-odd power transform, leading to the novel "l-POP-Exponential" loss function.
We show that Evidence Networks are explicitly independent of dimensionality of the parameter space and scale mildly with the complexity of the posterior probability density function.
arXiv Detail & Related papers (2023-05-18T18:14:53Z)
- Decision-based iterative fragile watermarking for model integrity verification [33.42076236847454]
Foundation models are typically hosted on cloud servers to meet the high demand for their services.
This exposes them to security risks, as attackers can modify them after uploading to the cloud or transferring from a local system.
We propose an iterative decision-based fragile watermarking algorithm that transforms normal training samples into fragile samples that are sensitive to model changes.
arXiv Detail & Related papers (2023-05-13T10:36:11Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply the total variation distance (TVD) to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Learning Multivariate CDFs and Copulas using Tensor Factorization [39.24470798045442]
Learning the multivariate distribution of data is a core challenge in statistics and machine learning.
In this work, we aim to learn multivariate cumulative distribution functions (CDFs), as they can handle mixed random variables.
We show that any grid sampled version of a joint CDF of mixed random variables admits a universal representation as a naive Bayes model.
We demonstrate the superior performance of the proposed model in several synthetic and real datasets and applications including regression, sampling and data imputation.
arXiv Detail & Related papers (2022-10-13T16:18:46Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Real-world unlabeled data is commonly imbalanced, following a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowledge of the classes.
Our experiments show that SDCLR significantly improves not only overall accuracy but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion (CMI), where data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Know Where To Drop Your Weights: Towards Faster Uncertainty Estimation [7.605814048051737]
Estimating uncertainty of models used in low-latency applications is a challenge due to the computationally demanding nature of uncertainty estimation techniques.
We propose Select-DC, which uses a subset of layers in a neural network to model uncertainty with Monte Carlo DropConnect (MCDC).
We show a significant reduction in the GFLOPS required to model uncertainty, compared to full Monte Carlo DropConnect, with a marginal trade-off in performance.
arXiv Detail & Related papers (2020-10-27T02:56:27Z)
- Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization [62.8384110757689]
Overfitting is ubiquitous in real-world applications of deep neural networks (DNNs).
The advanced dropout technique applies a model-free, easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
arXiv Detail & Related papers (2020-10-11T13:19:58Z)
- Dropout Strikes Back: Improved Uncertainty Estimation via Diversity Sampling [3.077929914199468]
We show that modifying the sampling distributions for dropout layers in neural networks improves the quality of uncertainty estimation.
Our approach consists of two steps: computing data-driven correlations between neurons, and generating samples that include maximally diverse neurons.
arXiv Detail & Related papers (2020-03-06T15:20:04Z)
- Learnable Bernoulli Dropout for Bayesian Deep Learning [53.79615543862426]
Learnable Bernoulli dropout (LBD) is a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation.
arXiv Detail & Related papers (2020-02-12T18:57:14Z)
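Several of the papers above build on the same Monte Carlo dropout recipe for uncertainty estimation: keep dropout active at test time, run several stochastic forward passes, and read uncertainty off the spread of the predictions. A minimal NumPy sketch of that shared recipe, assuming a hypothetical single linear layer with weight `W`:

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_dropout_predict(x, W, p_drop=0.5, n_samples=100):
    """Monte Carlo dropout (illustrative sketch).

    Dropout stays active at prediction time; averaging over stochastic
    forward passes gives a mean prediction, and the per-output standard
    deviation serves as a simple uncertainty estimate. W is a
    hypothetical weight matrix of a single linear layer.
    """
    preds = []
    for _ in range(n_samples):
        # Sample a fresh dropout mask on the inputs each pass,
        # with inverted-dropout rescaling.
        mask = rng.random(x.shape) >= p_drop
        x_drop = np.where(mask, x / (1.0 - p_drop), 0.0)
        preds.append(x_drop @ W)
    preds = np.stack(preds)  # (n_samples, batch, outputs)
    return preds.mean(axis=0), preds.std(axis=0)

# Toy usage: 2 samples, 5 features, 3 outputs.
x = rng.standard_normal((2, 5))
W = rng.standard_normal((5, 3))
mean, std = mc_dropout_predict(x, W)
print(mean.shape, std.shape)  # (2, 3) (2, 3)
```

The methods surveyed above vary what is fixed in this loop: contextual dropout and LBD learn where `p_drop` comes from, while Select-DC restricts which layers the stochastic masks are applied to.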
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.