Redesigning the classification layer by randomizing the class
representation vectors
- URL: http://arxiv.org/abs/2011.08704v2
- Date: Sun, 29 Nov 2020 08:32:23 GMT
- Title: Redesigning the classification layer by randomizing the class
representation vectors
- Authors: Gabi Shalev and Gal-Lev Shalev and Joseph Keshet
- Abstract summary: We analyze how simple design choices for the classification layer affect the learning dynamics.
We show that the standard cross-entropy training implicitly captures visual similarities between different classes.
We propose to draw the class vectors randomly and set them as fixed during training, thus invalidating the visual similarities encoded in these vectors.
- Score: 12.953517767147998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural image classification models typically consist of two components. The
first is an image encoder, which is responsible for encoding a given raw image
into a representative vector. The second is the classification component, which
is often implemented by projecting the representative vector onto target class
vectors. The target class vectors, along with the rest of the model parameters,
are estimated so as to minimize the loss function. In this paper, we analyze
how simple design choices for the classification layer affect the learning
dynamics. We show that the standard cross-entropy training implicitly captures
visual similarities between different classes, which might degrade accuracy
or even prevent some models from converging. We propose to draw the class
vectors randomly and set them as fixed during training, thus invalidating the
visual similarities encoded in these vectors. We analyze the effects of keeping
the class vectors fixed and show that it can increase the inter-class
separability, intra-class compactness, and the overall model accuracy, while
maintaining the robustness to image corruptions and the generalization of the
learned concepts.
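The core idea can be sketched in a few lines of NumPy: draw the class vectors once at random, keep them fixed, and train only the encoder against them. This is a minimal illustration of the abstract's description, not the authors' implementation; the unit normalization of the class vectors and all dimensions here are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, feat_dim = 10, 64

# Class vectors are drawn once at random and then frozen: they receive no
# gradient updates during training (unit normalization is an assumption).
class_vectors = rng.standard_normal((num_classes, feat_dim))
class_vectors /= np.linalg.norm(class_vectors, axis=1, keepdims=True)

def logits(features):
    # Project the encoder's representative vectors onto the fixed class vectors.
    return features @ class_vectors.T

def cross_entropy(features, labels):
    z = logits(features)
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Stand-in for encoder outputs on a mini-batch of four images; in practice
# only the encoder producing `features` would be optimized.
features = rng.standard_normal((4, feat_dim))
labels = np.array([0, 1, 2, 3])
loss = cross_entropy(features, labels)
```

Because the class vectors never move, any visual similarity between classes cannot be encoded in them, which is exactly the effect the paper sets out to study.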
Related papers
- Knowledge Composition using Task Vectors with Learned Anisotropic Scaling [51.4661186662329]
We introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level.
We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters.
We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives.
arXiv Detail & Related papers (2024-07-03T07:54:08Z)
- Simple-Sampling and Hard-Mixup with Prototypes to Rebalance Contrastive Learning for Text Classification [11.072083437769093]
We propose a novel model named SharpReCL for imbalanced text classification tasks.
Our model even outperforms popular large language models across several datasets.
arXiv Detail & Related papers (2024-05-19T11:33:49Z)
- Ablation Study to Clarify the Mechanism of Object Segmentation in Multi-Object Representation Learning [3.921076451326107]
Multi-object representation learning aims to represent complex real-world visual input using the composition of multiple objects.
It is not clear how previous methods have achieved the appropriate segmentation of individual objects.
Most of the previous methods regularize the latent vectors using a Variational Autoencoder (VAE)
arXiv Detail & Related papers (2023-10-05T02:59:48Z)
- Unicom: Universal and Compact Representation Learning for Image Retrieval [65.96296089560421]
We cluster the large-scale LAION400M into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model.
To alleviate the inter-class conflict that arises within these pseudo classes, we randomly select partial inter-class prototypes to construct the margin-based softmax loss.
Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks.
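One plausible reading of "randomly select partial inter-class prototypes" is a softmax loss computed over the positive class prototype plus a random subset of negative prototypes. The sketch below illustrates that reading only; the sampling ratio, margin, and scale values are assumptions, not taken from the Unicom paper.

```python
import numpy as np

rng = np.random.default_rng(1)
num_classes, feat_dim, sample_ratio = 1000, 32, 0.1

# Hypothetical class prototypes, unit-normalized so dot products are cosines.
prototypes = rng.standard_normal((num_classes, feat_dim))
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)

def partial_margin_softmax_loss(feature, label, margin=0.3, scale=16.0):
    # Keep the positive prototype and a random subset of the negatives,
    # and apply an additive cosine margin to the positive logit.
    feature = feature / np.linalg.norm(feature)
    negatives = np.delete(np.arange(num_classes), label)
    keep = rng.choice(negatives, int(sample_ratio * num_classes), replace=False)
    cos_pos = prototypes[label] @ feature
    cos_neg = prototypes[keep] @ feature
    z = scale * np.concatenate([[cos_pos - margin], cos_neg])
    z -= z.max()  # shift for numerical stability
    return -(z[0] - np.log(np.exp(z).sum()))

loss = partial_margin_softmax_loss(rng.standard_normal(feat_dim), label=3)
```

Sampling only a fraction of the negatives keeps the loss tractable when the number of pseudo classes runs into the millions.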
arXiv Detail & Related papers (2023-04-12T14:25:52Z)
- Neural Representations Reveal Distinct Modes of Class Fitting in Residual Convolutional Networks [5.1271832547387115]
We leverage probabilistic models of neural representations to investigate how residual networks fit classes.
We find that classes in the investigated models are not fitted in a uniform way.
We show that the uncovered structure in neural representations correlates with the robustness of training examples and with adversarial memorization.
arXiv Detail & Related papers (2022-12-01T18:55:58Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- On the rate of convergence of a classifier based on a Transformer encoder [55.41148606254641]
The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed.
It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model.
arXiv Detail & Related papers (2021-11-29T14:58:29Z)
- GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot Action Recognition [33.23662792742078]
We propose a two-stage deep neural network for zero-shot action recognition.
In the sampling stage, we utilize a generative adversarial network (GAN) trained on action features and word vectors of seen classes.
In the classification stage, we construct a knowledge graph based on the relationship between word vectors of action classes and related objects.
arXiv Detail & Related papers (2021-05-25T09:34:42Z)
- Counterfactual Generative Networks [59.080843365828756]
We propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision.
By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background.
We show that the counterfactual images can improve out-of-distribution robustness with only a marginal drop in performance on the original classification task.
arXiv Detail & Related papers (2021-01-15T10:23:12Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.