Regular Polytope Networks
- URL: http://arxiv.org/abs/2103.15632v1
- Date: Mon, 29 Mar 2021 14:11:32 GMT
- Title: Regular Polytope Networks
- Authors: Federico Pernici and Matteo Bruni and Claudio Baecchi and Alberto Del Bimbo
- Abstract summary: We argue that the final classifier transformation can be fixed with no loss of accuracy and with a reduction in memory usage.
It can also be used to learn stationary and maximally separated embeddings.
We show that the stationarity of the embedding and its maximally separated representation can be theoretically justified.
- Score: 29.44144177954405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks are widely used as a model for classification in a large
variety of tasks. Typically, a learnable transformation (i.e. the classifier)
is placed at the end of such models returning a value for each class used for
classification. This transformation plays an important role in determining how
the generated features change during the learning process. In this work, we
argue that this transformation not only can be fixed (i.e. set as
non-trainable) with no loss of accuracy and with a reduction in memory usage,
but it can also be used to learn stationary and maximally separated embeddings.
We show that the stationarity of the embedding and its maximally separated
representation can be theoretically justified by setting the weights of the
fixed classifier to values taken from the coordinate vertices of the three
regular polytopes available in $\mathbb{R}^d$, namely: the $d$-Simplex, the
$d$-Cube and the $d$-Orthoplex. These regular polytopes have the maximal amount
of symmetry that can be exploited to generate stationary features angularly
centered around their corresponding fixed weights. Our approach improves and
broadens the concept of a fixed classifier, recently proposed in
\cite{hoffer2018fix}, to a larger class of fixed classifier models.
Experimental results confirm the theoretical analysis, the generalization
capability, the faster convergence and the improved performance of the proposed
method. Code will be publicly available.
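A minimal sketch of the fixed-classifier idea for the d-Orthoplex case (assuming a PyTorch setup; the function and module names below are illustrative and not taken from the authors' released code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def orthoplex_weights(num_classes: int) -> torch.Tensor:
    """Vertices of the d-Orthoplex in R^d with d = num_classes // 2.

    The 2d vertices are the positive and negative standard basis vectors,
    giving equidistant, maximally separated class prototypes. The d-Cube
    ({-1, +1}^d, 2^d classes) and d-Simplex (d+1 classes) are built analogously.
    """
    assert num_classes % 2 == 0, "the d-Orthoplex has an even number of vertices (2d)"
    d = num_classes // 2
    eye = torch.eye(d)
    return torch.cat([eye, -eye], dim=0)  # shape: (num_classes, d)

class FixedOrthoplexClassifier(nn.Module):
    """Final linear layer whose weights are frozen at the polytope vertices."""

    def __init__(self, num_classes: int):
        super().__init__()
        # registered as a buffer: no gradient, no optimizer state, less memory
        self.register_buffer("weight", orthoplex_weights(num_classes))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # logits are dot products between the features and the fixed prototypes
        return F.linear(features, self.weight)

# Usage sketch: the backbone must output features of dimension num_classes // 2.
# head = FixedOrthoplexClassifier(num_classes=10)   # feature dimension d = 5
# logits = head(backbone(images))
```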
Related papers
- Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning [120.53458753007851]
Few-shot class-incremental learning (FSCIL) has been a challenging problem as only a few training samples are accessible for each novel class in the new sessions.
We deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse.
We propose a neural collapse inspired framework for FSCIL. Experiments on the miniImageNet, CUB-200, and CIFAR-100 datasets demonstrate that our proposed framework outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T18:39:40Z)
- Maximally Compact and Separated Features with Regular Polytope Networks [22.376196701232388]
We show how to extract from CNN features the properties of maximum inter-class separability and maximum intra-class compactness.
We obtain features similar to those produced by the well-known center loss \cite{wen2016discriminative} and other similar approaches.
arXiv Detail & Related papers (2023-01-15T15:20:57Z)
- Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined canonicalization functions.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
- Prediction Calibration for Generalized Few-shot Semantic Segmentation [101.69940565204816]
Generalized Few-shot Semantic Segmentation (GFSS) aims to segment each image pixel into either base classes with abundant training examples or novel classes with only a handful of (e.g., 1-5) training images per class.
We build a cross-attention module that guides the classifier's final prediction using the fused multi-level features.
Our PCN outperforms the state-of-the-art alternatives by large margins.
arXiv Detail & Related papers (2022-10-15T13:30:12Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as a simplex equiangular tight frame (ETF) and fixed during training.
Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
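As a rough illustration of the fixed-ETF classifier, one standard construction of a random simplex equiangular tight frame is sketched below (generic PyTorch code under stated assumptions, not that paper's implementation):

```python
import torch

def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Random simplex ETF of num_classes vectors in R^feat_dim.

    Rows have unit norm and pairwise cosine similarity -1/(K-1), i.e. the
    maximally separated configuration; assumes feat_dim >= num_classes.
    """
    K, d = num_classes, feat_dim
    assert d >= K, "this construction assumes feat_dim >= num_classes"
    # random matrix with orthonormal columns (d x K)
    u, _ = torch.linalg.qr(torch.randn(d, K))
    # center and rescale the columns to obtain the ETF
    etf = u @ (torch.eye(K) - torch.ones(K, K) / K)
    etf = etf * (K / (K - 1)) ** 0.5
    return etf.t()  # shape: (num_classes, feat_dim), one fixed weight per class
```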
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- Exploring Category-correlated Feature for Few-shot Image Classification [27.13708881431794]
We present a simple yet effective feature rectification method by exploring the category correlation between novel and base classes as the prior knowledge.
The proposed approach consistently obtains considerable performance gains on three widely used benchmarks.
arXiv Detail & Related papers (2021-12-14T08:25:24Z)
- More Is More -- Narrowing the Generalization Gap by Adding Classification Heads [8.883733362171032]
We introduce an architecture enhancement for existing neural network models based on input transformations, termed 'TransNet'.
Our model can be employed during training time only and then pruned for prediction, resulting in an equivalent architecture to the base model.
arXiv Detail & Related papers (2021-02-09T16:30:33Z)
- A Multiple Classifier Approach for Concatenate-Designed Neural Networks [13.017053017670467]
We give the design of the classifiers, which collect the features produced between the network sets.
We use L2 normalization to obtain the classification score instead of a Softmax dense layer.
As a result, the proposed classifiers are able to improve the accuracy in the experimental cases.
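The L2-normalized scoring idea can be sketched as a cosine-similarity classification head (a hedged PyTorch example; the class name and scale value are assumptions for illustration, not the paper's exact design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Scores classes by cosine similarity instead of an unnormalized dense layer."""

    def __init__(self, feat_dim: int, num_classes: int, scale: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale  # temperature, since cosine scores lie in [-1, 1]

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # L2-normalize both features and class weights so only the angle matters
        f = F.normalize(features, dim=-1)
        w = F.normalize(self.weight, dim=-1)
        return self.scale * F.linear(f, w)  # (batch, num_classes) cosine logits
```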
arXiv Detail & Related papers (2021-01-14T04:32:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.