Preserving Fine-Grain Feature Information in Classification via Entropic
Regularization
- URL: http://arxiv.org/abs/2208.03684v1
- Date: Sun, 7 Aug 2022 09:25:57 GMT
- Title: Preserving Fine-Grain Feature Information in Classification via Entropic
Regularization
- Authors: Raphael Baena, Lucas Drumetz, Vincent Gripon
- Abstract summary: We show that standard cross-entropy can lead to overfitting to coarse-related features.
We introduce an entropy-based regularization to promote more diversity in the feature space of trained models.
- Score: 10.358087436626391
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Labeling a classification dataset implies defining classes and associated
coarse labels, which may only approximate a smoother and more complicated ground
truth. For example, natural images may contain multiple objects, only one of
which is labeled in many vision datasets, or classes may result from the
discretization of a regression problem. Using cross-entropy to train
classification models on such coarse labels is likely to cut roughly through
the feature space, potentially disregarding the most meaningful features and,
in particular, losing information about the underlying fine-grain task. In this
paper we are interested in the problem of solving fine-grain classification or
regression using a model trained on coarse-grain labels only. We show that
standard cross-entropy can lead to overfitting to coarse-related features. We
introduce an entropy-based regularization to promote more diversity in the
feature space of trained models, and empirically demonstrate the efficacy of
this methodology in reaching better performance on the fine-grain problems. Our
results are supported by theoretical developments and empirical validation.
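The abstract only states that an entropy-based regularizer is added to the coarse-label cross-entropy to promote feature-space diversity; it does not spell out the exact formulation. The following is a minimal PyTorch sketch of that general idea, assuming a Gaussian entropy proxy (the log-determinant of the batch feature covariance) as a stand-in for the paper's regularizer; the names FeatureClassifier, gaussian_entropy_proxy, and lambda_reg are illustrative and not taken from the paper.

```python
# Hedged sketch: coarse-label cross-entropy plus an entropy-style diversity
# bonus on the penultimate features. The log-determinant of the batch feature
# covariance is used here as a simple, differentiable entropy proxy; it is an
# assumption, not the regularizer actually proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureClassifier(nn.Module):
    """Toy backbone exposing penultimate features and coarse-class logits."""

    def __init__(self, in_dim: int, feat_dim: int, num_coarse_classes: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, num_coarse_classes)

    def forward(self, x):
        feats = self.backbone(x)   # features that may carry fine-grain information
        logits = self.head(feats)  # coarse-label predictions
        return logits, feats


def gaussian_entropy_proxy(feats: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Differentiable diversity measure: 0.5 * logdet of the feature covariance.

    Larger values mean the batch features occupy more of the feature space,
    i.e. they have not collapsed onto coarse-label-only directions.
    """
    centered = feats - feats.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / max(feats.shape[0] - 1, 1)
    cov = cov + eps * torch.eye(cov.shape[0], device=feats.device)
    return 0.5 * torch.logdet(cov)


def regularized_loss(logits, feats, coarse_targets, lambda_reg: float = 0.1):
    """Cross-entropy on coarse labels minus a weighted feature-diversity bonus."""
    ce = F.cross_entropy(logits, coarse_targets)
    return ce - lambda_reg * gaussian_entropy_proxy(feats)


if __name__ == "__main__":
    model = FeatureClassifier(in_dim=32, feat_dim=16, num_coarse_classes=4)
    x = torch.randn(64, 32)
    y_coarse = torch.randint(0, 4, (64,))
    logits, feats = model(x)
    loss = regularized_loss(logits, feats, y_coarse)
    loss.backward()
    print(f"loss = {loss.item():.4f}")
```

Subtracting the diversity term from the coarse-label cross-entropy rewards feature configurations that stay spread out rather than collapsing onto the coarse decision boundary; the weight lambda_reg trades off coarse accuracy against feature diversity and would need tuning in practice.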
Related papers
- Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization [23.78498670529746]
We introduce a regularization technique to ensure that the magnitudes of the extracted features are evenly distributed.
Despite its apparent simplicity, our approach has demonstrated significant performance improvements across various fine-grained visual recognition datasets.
arXiv Detail & Related papers (2024-09-03T07:32:46Z)
- For Better or For Worse? Learning Minimum Variance Features With Label Augmentation [7.183341902583164]
In this work, we analyze the role played by the label augmentation aspect of data augmentation methods.
We first prove that linear models on binary classification data trained with label augmentation learn only the minimum variance features in the data.
We then use our techniques to show that even for nonlinear models and general data distributions, the label smoothing and Mixup losses are lower bounded by a function of the model output variance.
arXiv Detail & Related papers (2024-02-10T01:36:39Z)
- Constructing Balance from Imbalance for Long-tailed Image Recognition [50.6210415377178]
The imbalance between majority (head) classes and minority (tail) classes severely skews data-driven deep neural networks.
Previous methods tackle data imbalance from the viewpoints of data distribution, feature space, and model design.
We propose a concise paradigm by progressively adjusting label space and dividing the head classes and tail classes.
Our proposed model also provides a feature evaluation method and paves the way for long-tailed feature learning.
arXiv Detail & Related papers (2022-08-04T10:22:24Z)
- Graph Attention Transformer Network for Multi-Label Image Classification [50.0297353509294]
We propose a general framework for multi-label image classification that can effectively mine complex inter-label relationships.
Our proposed methods can achieve state-of-the-art performance on three datasets.
arXiv Detail & Related papers (2022-03-08T12:39:05Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Model-Change Active Learning in Graph-Based Semi-Supervised Learning [5.174023161939957]
"Model Change" active learning quantifies the resulting change by introducing the additional label(s)
We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution.
arXiv Detail & Related papers (2021-10-14T21:47:10Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Towards Robust Classification Model by Counterfactual and Invariant Data Generation [7.488317734152585]
Spuriousness occurs when some features correlate with labels but are not causal.
We propose two data generation processes to reduce spuriousness.
Our data generation processes outperform state-of-the-art methods in accuracy when spurious correlations break.
arXiv Detail & Related papers (2021-06-02T12:48:29Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.