R-Block: Regularized Block of Dropout for convolutional networks
- URL: http://arxiv.org/abs/2307.15150v1
- Date: Thu, 27 Jul 2023 18:53:14 GMT
- Title: R-Block: Regularized Block of Dropout for convolutional networks
- Authors: Liqi Wang, Qiya Hu
- Abstract summary: Dropout as a regularization technique is widely used in fully connected layers but is less effective in convolutional layers.
In this paper, we apply a mutual learning training strategy for convolutional layer regularization, namely R-Block.
We show that R-Block achieves better performance than other existing structured dropout variants.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dropout as a regularization technique is widely used in fully connected
layers but is less effective in convolutional layers. Therefore, more
structured forms of dropout have been proposed to regularize convolutional
networks. The disadvantage of these methods is that the randomness they
introduce causes inconsistency between training and inference. In this paper,
we apply a mutual learning training strategy for convolutional layer
regularization, namely R-Block, which forces the outputs of two generated
difference-maximizing sub-models to be consistent with each other. Concretely,
R-Block minimizes the losses between the output distributions of two sub-models
with different drop regions for each sample in the training dataset. We design
two approaches to construct such sub-models. Our experiments demonstrate that
R-Block achieves better performance than other existing structured dropout
variants. We also demonstrate that our approaches to constructing sub-models
outperform others.
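A minimal sketch of the kind of consistency objective the abstract describes, assuming a PyTorch model whose structured dropout masks are resampled on every forward pass, so two passes over the same batch act as two different sub-models. The symmetric KL consistency term and the weighting factor `alpha` are illustrative assumptions, not the paper's exact loss or sub-model construction:

```python
import torch
import torch.nn.functional as F

def r_block_style_loss(model, x, y, alpha=1.0):
    # model must be in training mode so its structured dropout draws
    # different drop regions on each of the two forward passes.
    logits1 = model(x)  # sub-model 1: one random set of drop regions
    logits2 = model(x)  # sub-model 2: a different random set of drop regions

    # Task loss for both sub-models.
    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)

    # Consistency term: push the two output distributions toward each other
    # with a symmetric KL divergence (an assumed choice of consistency loss).
    log_p1 = F.log_softmax(logits1, dim=-1)
    log_p2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(log_p1, log_p2, log_target=True, reduction="batchmean")
                + F.kl_div(log_p2, log_p1, log_target=True, reduction="batchmean"))

    return ce + alpha * kl
```

Under this reading, forcing the two randomly dropped sub-models to agree on each sample is what mitigates the training/inference inconsistency that the abstract attributes to structured dropout.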
Related papers
- Training Implicit Generative Models via an Invariant Statistical Loss [3.139474253994318]
Implicit generative models have the capability to learn arbitrarily complex data distributions.
On the downside, training requires telling real data apart from artificially generated data using adversarial discriminators.
We develop a discriminator-free method for training one-dimensional (1D) generative implicit models.
arXiv Detail & Related papers (2024-02-26T09:32:28Z) - Layer-wise Regularized Dropout for Neural Language Models [57.422407462430186]
Layer-wise Regularized Dropout (LR-Drop) is specially designed for Transformer-based language models.
We show that LR-Drop achieves superior performance, including state-of-the-art results.
arXiv Detail & Related papers (2024-02-26T07:31:35Z) - Lightweight Diffusion Models with Distillation-Based Block Neural
Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with a Diffusion Distillation-based Block-wise Neural Architecture Search (DiffNAS).
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
arXiv Detail & Related papers (2023-11-08T12:56:59Z) - Phantom Embeddings: Using Embedding Space for Model Regularization in
Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
Complex models tend to memorize the training data, which results in poor generalization performance on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z) - Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected stochastic differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Learning to Solve Routing Problems via Distributionally Robust
Optimization [14.506553345693536]
Recent deep models for solving routing problems assume a single distribution of nodes for training, which severely impairs their cross-distribution generalization ability.
We exploit group distributionally robust optimization (group DRO) to tackle this issue, jointly optimizing the weights for different groups of distributions and the parameters of the deep model in an interleaved manner during training (a minimal sketch of such an interleaved update appears after this list).
We also design a module based on a convolutional neural network, which allows the deep model to learn more informative latent patterns among the nodes.
arXiv Detail & Related papers (2022-02-15T08:06:44Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)