DropCluster: A structured dropout for convolutional networks
- URL: http://arxiv.org/abs/2002.02997v1
- Date: Fri, 7 Feb 2020 20:02:47 GMT
- Title: DropCluster: A structured dropout for convolutional networks
- Authors: Liyan Chen, Philip Gautier, Sergul Aydore
- Abstract summary: Dropout as a regularizer in deep neural networks has been less effective in convolutional layers than in fully connected layers.
We introduce a novel structured regularization for convolutional layers, which we call DropCluster.
Our approach achieves better performance than DropBlock or other existing structured dropout variants.
- Score: 0.7489179288638513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dropout as a regularizer in deep neural networks has been less effective in
convolutional layers than in fully connected layers. This is due to the fact
that dropout drops features randomly. When features are spatially correlated as
in the case of convolutional layers, information about the dropped pixels can
still propagate to the next layers via neighboring pixels. In order to address
this problem, more structured forms of dropout have been proposed. A drawback
of these methods is that they do not adapt to the data. In this work, we
introduce a novel structured regularization for convolutional layers, which we
call DropCluster. Our regularizer relies on data-driven structure. It finds
clusters of correlated features in convolutional layer outputs and drops the
clusters randomly at each iteration. The clusters are learned and updated
during model training so that they adapt both to the data and to the model
weights. Our experiments on the ResNet-50 architecture demonstrate that our
approach achieves better performance than DropBlock or other existing
structured dropout variants. We also demonstrate the robustness of our approach
when the size of training data is limited and when there is corruption in the
data at test time.
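To make the dropping step concrete, here is a minimal PyTorch-style sketch, not the authors' implementation. It assumes a precomputed cluster assignment over feature-map channels; in the paper the clusters of correlated features are learned from the convolutional outputs and updated during training, which is not reproduced here, and the function name and signature are illustrative.

```python
import torch


def drop_clusters(features, clusters, drop_prob=0.1, training=True):
    """Randomly drop whole clusters of correlated features (sketch only).

    features: (N, C, H, W) output of a convolutional layer.
    clusters: LongTensor of length C assigning each channel to a cluster id,
              assumed to be re-estimated periodically during training.
    """
    if not training or drop_prob == 0.0:
        return features

    clusters = clusters.to(features.device)
    num_clusters = int(clusters.max().item()) + 1
    # Decide per cluster, not per feature, whether it is dropped this iteration.
    keep = (torch.rand(num_clusters, device=features.device) > drop_prob).float()
    # Broadcast the cluster-level decision to a per-channel mask.
    mask = keep[clusters].view(1, -1, 1, 1)
    # Inverted-dropout rescaling keeps the expected activation magnitude unchanged.
    return features * mask / (1.0 - drop_prob)
```

Unlike DropBlock's fixed square blocks, the dropped regions in such a scheme follow whatever correlation structure the clustering step has found.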
Related papers
- R-Block: Regularized Block of Dropout for convolutional networks [0.0]
Dropout as a regularization technique is widely used in fully connected layers but is less effective in convolutional layers.
In this paper, we apply a mutual learning training strategy for convolutional layer regularization, namely R-Block.
We show that R-Block achieves better performance than other existing structured dropout variants.
arXiv Detail & Related papers (2023-07-27T18:53:14Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach a collapsed solution in which the encoder maps all inputs to the same point and every sample is assigned to a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, a fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
Our results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
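One way to make this setting concrete is the following illustrative formulation (our notation, not necessarily the paper's): a one-hidden-layer ReLU network trained on Gaussian inputs with a squared loss.

```latex
% Illustrative formulation; notation is ours, not necessarily the paper's.
f(x; W, a) = \sum_{j=1}^{m} a_j \,\mathrm{ReLU}\!\left(w_j^{\top} x\right),
\qquad x_i \sim \mathcal{N}(0, I_d),
\qquad
L(W, a) = \frac{1}{2n} \sum_{i=1}^{n} \bigl( f(x_i; W, a) - y_i \bigr)^2 .
```

The labels y_i may be chosen adversarially, and gradient descent on L is shown to reach a global minimum in this regime.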
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Revisiting Structured Dropout [11.011268090482577]
ProbDropBlock drops contiguous blocks from feature maps with a probability given by the normalized feature salience values.
We find that, with a simple scheduling strategy, the proposed approach to structured dropout consistently improves model performance compared to baselines.
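A hedged sketch of salience-weighted block dropping follows; the salience definition (normalized mean absolute activation per location), block size, and rescaling are our assumptions, and the paper's exact formulation and scheduling are not reproduced.

```python
import torch
import torch.nn.functional as F


def prob_dropblock(features, block_size=3, drop_rate=0.1, training=True):
    """Drop contiguous blocks with probability weighted by feature salience.

    features: (N, C, H, W) feature maps; block_size is assumed odd.
    """
    if not training or drop_rate == 0.0:
        return features

    _, _, h, w = features.shape
    # Per-location salience, normalized to sum to 1 over each feature map.
    salience = features.abs().mean(dim=1, keepdim=True)                  # (N, 1, H, W)
    salience = salience / salience.sum(dim=(2, 3), keepdim=True).clamp_min(1e-12)

    # Sample block centers: more salient locations are dropped more often.
    center_prob = (salience * drop_rate * h * w).clamp(max=1.0)
    centers = torch.bernoulli(center_prob)                               # (N, 1, H, W)

    # Expand each sampled center into a block_size x block_size dropped block.
    dropped = F.max_pool2d(centers, kernel_size=block_size, stride=1,
                           padding=block_size // 2)
    keep_mask = 1.0 - dropped

    # Rescale so the expected activation magnitude stays roughly unchanged.
    scale = keep_mask.numel() / keep_mask.sum().clamp_min(1.0)
    return features * keep_mask * scale
```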
arXiv Detail & Related papers (2022-10-05T21:26:57Z)
- Linear Connectivity Reveals Generalization Strategies [54.947772002394736]
Some pairs of finetuned models have large barriers of increasing loss on the linear paths between them.
We find distinct clusters of models which are linearly connected on the test loss surface, but are disconnected from models outside the cluster.
Our work demonstrates how the geometry of the loss surface can guide models towards different functions.
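For reference, a common way to quantify such a barrier along the linear path between two fine-tuned parameter vectors is given below (our notation; the paper's exact definition may differ).

```latex
% Illustrative definition; notation is ours, not necessarily the paper's.
\theta(\alpha) = (1-\alpha)\,\theta_1 + \alpha\,\theta_2, \qquad \alpha \in [0,1],
\qquad
\mathrm{barrier}(\theta_1, \theta_2)
  = \max_{\alpha \in [0,1]}
    \Bigl[ L\bigl(\theta(\alpha)\bigr)
           - \bigl( (1-\alpha)\,L(\theta_1) + \alpha\,L(\theta_2) \bigr) \Bigr].
```

Models in the same cluster would then have near-zero barriers between them, while paths to models outside the cluster cross regions of high loss.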
arXiv Detail & Related papers (2022-05-24T23:43:02Z)
- Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize the seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow (GSMFlow) framework for synthesizing unseen data efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z)
- Robustness to Missing Features using Hierarchical Clustering with Split Neural Networks [39.29536042476913]
We propose a simple yet effective approach that clusters similar input features together using hierarchical clustering.
We evaluate this approach on a series of benchmark datasets and show promising improvements even with simple imputation techniques.
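A minimal sketch of the feature-clustering step is shown below; the distance choice (1 - |correlation|) and the linkage method are our assumptions, and the split sub-networks and imputation are not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_input_features(X, num_clusters=4):
    """Group correlated input features via hierarchical clustering.

    X: (n_samples, n_features) data matrix. Returns an array of length
    n_features mapping each feature to a cluster id; each cluster could
    then feed its own sub-network.
    """
    # Distance between features: 1 - |correlation|, so strongly
    # (anti-)correlated features end up in the same cluster.
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    # Condensed upper-triangular distance vector expected by linkage().
    iu = np.triu_indices_from(dist, k=1)
    Z = linkage(dist[iu], method="average")
    return fcluster(Z, t=num_clusters, criterion="maxclust")
```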
arXiv Detail & Related papers (2020-11-19T00:35:08Z)
- Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization [62.8384110757689]
Overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs).
The advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate.
We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets.
arXiv Detail & Related papers (2020-10-11T13:19:58Z)
- Online Deep Clustering for Unsupervised Representation Learning [108.33534231219464]
Online Deep Clustering (ODC) performs clustering and network updates simultaneously rather than alternately.
We design and maintain two dynamic memory modules: a samples memory to store sample labels and features, and a centroids memory for centroid evolution.
In this way, the labels and the network evolve shoulder-to-shoulder rather than alternately.
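A toy sketch of the two memory modules is given below, under our own simplifications (EMA feature updates, nearest-centroid relabeling); it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F


class OnlineClusteringMemory:
    """Simplified samples memory plus centroids memory (sketch only)."""

    def __init__(self, num_samples, feat_dim, num_clusters, momentum=0.5):
        self.features = torch.zeros(num_samples, feat_dim)    # samples memory
        self.labels = torch.zeros(num_samples, dtype=torch.long)
        self.centroids = torch.randn(num_clusters, feat_dim)  # centroids memory
        self.momentum = momentum

    def update(self, indices, batch_features):
        # Samples memory: exponential moving average of per-sample features.
        old = self.features[indices]
        new = self.momentum * old + (1.0 - self.momentum) * batch_features
        self.features[indices] = F.normalize(new, dim=1)
        # Relabel the batch by its nearest centroid (new pseudo-labels).
        dists = torch.cdist(self.features[indices], self.centroids)
        batch_labels = dists.argmin(dim=1)
        self.labels[indices] = batch_labels
        # Centroids memory: nudge each assigned centroid toward its batch members.
        for k in batch_labels.unique():
            members = self.features[indices][batch_labels == k]
            self.centroids[k] = 0.9 * self.centroids[k] + 0.1 * members.mean(dim=0)
        return batch_labels
```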
arXiv Detail & Related papers (2020-06-18T16:15:46Z)
- Reusing Trained Layers of Convolutional Neural Networks to Shorten Hyperparameters Tuning Time [1.160208922584163]
This paper describes a proposal to reuse the weights of hidden (convolutional) layers across different trainings in order to shorten hyperparameter tuning.
The experiments compare the training time and the validation loss when reusing and not reusing convolutional layers.
They confirm that this strategy reduces training time while even increasing the accuracy of the resulting neural network.
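A hedged sketch of the reuse step follows; the name-based matching rule and the optional freezing are our assumptions, not necessarily the paper's recipe.

```python
import torch


def reuse_conv_layers(prev_state_dict, new_model, freeze=True):
    """Copy convolutional weights from a previous run into a new model."""
    own_state = new_model.state_dict()
    # Reuse only convolution tensors that exist in both models with equal shapes.
    reused = {
        name: tensor
        for name, tensor in prev_state_dict.items()
        if "conv" in name and name in own_state and own_state[name].shape == tensor.shape
    }
    own_state.update(reused)
    new_model.load_state_dict(own_state)
    if freeze:
        # Optionally freeze the reused layers so only the remaining ones are trained.
        for name, param in new_model.named_parameters():
            if name in reused:
                param.requires_grad = False
    return new_model


# Illustrative usage:
#   prev_state_dict = torch.load("earlier_run_checkpoint.pt", map_location="cpu")
#   model = reuse_conv_layers(prev_state_dict, model)
```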
arXiv Detail & Related papers (2020-06-16T11:39:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.