R-Block: Regularized Block of Dropout for convolutional networks
- URL: http://arxiv.org/abs/2307.15150v1
- Date: Thu, 27 Jul 2023 18:53:14 GMT
- Title: R-Block: Regularized Block of Dropout for convolutional networks
- Authors: Liqi Wang, Qiya Hu
- Abstract summary: Dropout as a regularization technique is widely used in fully connected layers but is less effective in convolutional layers.
In this paper, we apply a mutual learning training strategy for convolutional layer regularization, namely R-Block.
We show that R-Block achieves better performance than other existing structured dropout variants.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dropout as a regularization technique is widely used in fully connected
layers but is less effective in convolutional layers. Therefore, more
structured forms of dropout have been proposed to regularize convolutional
networks. The disadvantage of these methods is that the randomness they
introduce causes inconsistency between training and inference. In this paper,
we apply a mutual learning training strategy for convolutional layer
regularization, namely R-Block, which forces the outputs of two generated
difference-maximizing sub-models to be consistent with each other. Concretely,
R-Block minimizes the losses between the output distributions of two sub-models
with different drop regions for each sample in the training dataset. We design
two approaches to construct such sub-models. Our experiments demonstrate that
R-Block achieves better performance than other existing structured dropout
variants. We also demonstrate that our approaches to constructing sub-models
outperform others.
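A minimal sketch of the kind of consistency objective the abstract describes, assuming a PyTorch model whose structured dropout masks are resampled on every forward pass, so two passes over the same batch act as two different sub-models. The symmetric KL consistency term and the weighting factor `alpha` are illustrative assumptions, not the paper's exact loss or sub-model construction:

```python
import torch
import torch.nn.functional as F

def r_block_style_loss(model, x, y, alpha=1.0):
    # model must be in training mode so its structured dropout draws
    # different drop regions on each of the two forward passes.
    logits1 = model(x)  # sub-model 1: one random set of drop regions
    logits2 = model(x)  # sub-model 2: a different random set of drop regions

    # Task loss for both sub-models.
    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)

    # Consistency term: push the two output distributions toward each other
    # with a symmetric KL divergence (an assumed choice of consistency loss).
    log_p1 = F.log_softmax(logits1, dim=-1)
    log_p2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(log_p1, log_p2, log_target=True, reduction="batchmean")
                + F.kl_div(log_p2, log_p1, log_target=True, reduction="batchmean"))

    return ce + alpha * kl
```

Under this reading, forcing the two randomly dropped sub-models to agree on each sample is what mitigates the training/inference inconsistency that the abstract attributes to structured dropout.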
Related papers
- Training Implicit Generative Models via an Invariant Statistical Loss [3.139474253994318]
Implicit generative models have the capability to learn arbitrarily complex data distributions.
On the downside, training requires telling real data apart from artificially generated data using adversarial discriminators.
We develop a discriminator-free method for training one-dimensional (1D) generative implicit models.
arXiv Detail & Related papers (2024-02-26T09:32:28Z) - Layer-wise Regularized Dropout for Neural Language Models [57.422407462430186]
Layer-wise Regularized Dropout (LR-Drop) is specially designed for Transformer-based language models.
We show that LR-Drop achieves superior performance, including state-of-the-art results.
arXiv Detail & Related papers (2024-02-26T07:31:35Z) - Lightweight Diffusion Models with Distillation-Based Block Neural
Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with a Diffusion Distillation-based Block-wise Neural Architecture Search (DiffNAS).
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
arXiv Detail & Related papers (2023-11-08T12:56:59Z) - Phantom Embeddings: Using Embedding Space for Model Regularization in
Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
Complex models tend to memorize the training data, which results in poor generalization performance on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z) - Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected stochastic differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Learning to Solve Routing Problems via Distributionally Robust
Optimization [14.506553345693536]
Recent deep models for solving routing problems assume a single distribution of nodes for training, which severely impairs their cross-distribution generalization ability.
We exploit group distributionally robust optimization (group DRO) to tackle this issue, jointly optimizing the weights for different groups of distributions and the parameters of the deep model in an interleaved manner during training (a minimal sketch of such an interleaved update appears after this list).
We also design a module based on a convolutional neural network, which allows the deep model to learn more informative latent patterns among the nodes.
arXiv Detail & Related papers (2022-02-15T08:06:44Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)