Balanced Product of Calibrated Experts for Long-Tailed Recognition
- URL: http://arxiv.org/abs/2206.05260v3
- Date: Wed, 7 Jun 2023 17:52:01 GMT
- Title: Balanced Product of Calibrated Experts for Long-Tailed Recognition
- Authors: Emanuel Sanchez Aimar, Arvi Jonnarth, Michael Felsberg, Marco Kuhlmann
- Abstract summary: Many real-world recognition problems are characterized by long-tailed label distributions.
In this work, we take an analytical approach and extend the notion of logit adjustment to ensembles to form a Balanced Product of Experts (BalPoE).
We show how to properly define these distributions and combine the experts in order to achieve unbiased predictions.
- Score: 13.194151879344487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world recognition problems are characterized by long-tailed label
distributions. These distributions make representation learning highly
challenging due to limited generalization over the tail classes. If the test
distribution differs from the training distribution, e.g. uniform versus
long-tailed, the problem of distribution shift needs to be addressed. A
recent line of work proposes learning multiple diverse experts to tackle this
issue. Ensemble diversity is encouraged by various techniques, e.g. by
specializing different experts in the head and the tail classes. In this work,
we take an analytical approach and extend the notion of logit adjustment to
ensembles to form a Balanced Product of Experts (BalPoE). BalPoE combines a
family of experts with different test-time target distributions, generalizing
several previous approaches. We show how to properly define these distributions
and combine the experts in order to achieve unbiased predictions, by proving
that the ensemble is Fisher-consistent for minimizing the balanced error. Our
theoretical analysis shows that our balanced ensemble requires calibrated
experts, which we achieve in practice using mixup. We conduct extensive
experiments and our method obtains new state-of-the-art results on three
long-tailed datasets: CIFAR-100-LT, ImageNet-LT, and iNaturalist-2018. Our code
is available at https://github.com/emasa/BalPoE-CalibratedLT.
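As an informal illustration of the idea described in the abstract (not the authors' implementation, which lives in the linked repository), the sketch below adjusts each expert's logits toward a different test-time target distribution and averages the adjusted logits; the function names, the choice of tau values, and the exact adjustment form are assumptions made for illustration.

import numpy as np

def adjusted_logits(logits, class_prior, tau):
    # Shift raw logits by -tau * log(prior): tau = 0 keeps the long-tailed
    # training bias, tau = 1 targets a roughly uniform test distribution,
    # and tau > 1 over-compensates toward the tail classes.
    return logits - tau * np.log(class_prior)

def balpoe_predict(expert_logits, class_prior, taus):
    # expert_logits: (num_experts, num_classes) raw logits, one row per expert.
    # Each expert gets its own adjustment strength; averaging the adjusted
    # logits combines the family of experts into a single prediction.
    adjusted = np.stack([adjusted_logits(z, class_prior, t)
                         for z, t in zip(expert_logits, taus)])
    return int(np.argmax(adjusted.mean(axis=0)))

# Toy usage: 3 experts, 4 classes with a long-tailed training prior.
prior = np.array([0.6, 0.25, 0.1, 0.05])
logits = np.random.randn(3, 4)
print(balpoe_predict(logits, prior, taus=[0.0, 1.0, 2.0]))

Per the abstract, the key requirement for such an ensemble to yield unbiased, balanced predictions is that the individual experts are well calibrated, which the authors achieve in practice with mixup.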
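For reference, a generic mixup step is sketched below; this is the standard formulation (convex combinations of inputs and one-hot labels with a Beta-distributed mixing coefficient), and the parameter values and variable names are illustrative rather than taken from the paper.

import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    # Standard mixup: convex-combine random pairs of inputs and their
    # one-hot labels with a Beta(alpha, alpha)-distributed coefficient.
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed

# Toy usage: a batch of 4 feature vectors with 3 classes.
x = np.random.randn(4, 8)
y = np.eye(3)[np.array([0, 1, 2, 0])]
x_mixed, y_mixed = mixup_batch(x, y)
print(x_mixed.shape, y_mixed.shape)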
Related papers
- Generalizing to any diverse distribution: uniformity, gentle finetuning and rebalancing [55.791818510796645]
We aim to develop models that generalize well to any diverse test distribution, even if the latter deviates significantly from the training data.
Various approaches like domain adaptation, domain generalization, and robust optimization attempt to address the out-of-distribution challenge.
We adopt a more conservative perspective by accounting for the worst-case error across all sufficiently diverse test distributions within a known domain.
arXiv Detail & Related papers (2024-10-08T12:26:48Z)
- Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition [114.96385572118042]
We argue that the variation in test label distributions can be broken down hierarchically into global and local levels.
We propose a new MoE strategy, $\mathsf{DirMixE}$, which assigns experts to different Dirichlet meta-distributions of the label distribution.
We show that our proposed objective benefits from enhanced generalization by virtue of the variance-based regularization.
arXiv Detail & Related papers (2024-05-13T14:24:56Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Divide and not forget: Ensemble of selectively trained experts in Continual Learning [0.2886273197127056]
Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know.
A trend in this area is to use a mixture-of-expert technique, where different models work together to solve the task.
SEED selects only the single most suitable expert for a given task, and uses data from this task to fine-tune only this expert.
arXiv Detail & Related papers (2024-01-18T18:25:29Z)
- Relieving Long-tailed Instance Segmentation via Pairwise Class Balance [85.53585498649252]
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes.
This imbalance causes severe bias of the head classes (with the majority of samples) against the tail classes.
We propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix that is updated during training to accumulate the ongoing prediction preferences (a rough sketch of this bookkeeping follows after this list).
arXiv Detail & Related papers (2022-01-08T07:48:36Z)
- Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts [32.43213645631101]
Training large-scale mixture of experts models efficiently requires assigning datapoints in a batch to different experts, each with a limited capacity.
Recently proposed assignment procedures lack a probabilistic interpretation and use biased estimators for training.
We propose two unbiased estimators based on principled assignment procedures: one that skips datapoints which exceed expert capacity, and one that samples perfectly balanced assignments.
arXiv Detail & Related papers (2021-09-24T09:02:12Z)
- Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision [85.07855130048951]
We study a more practical task setting, called test-agnostic long-tailed recognition, where the training class distribution is long-tailed while the test class distribution is unknown.
We propose a new method, called Test-time Aggregating Diverse Experts (TADE), that trains diverse experts to excel at handling different test distributions.
We theoretically show that our method has provable ability to simulate unknown test class distributions.
arXiv Detail & Related papers (2021-07-20T04:10:31Z)
- Long-tailed Recognition by Routing Diverse Distribution-Aware Experts [64.71102030006422]
We propose a new long-tailed classifier called RoutIng Diverse Experts (RIDE).
It reduces model variance with multiple experts, reduces model bias with a distribution-aware diversity loss, and reduces computational cost with a dynamic expert routing module.
RIDE outperforms the state-of-the-art by 5% to 7% on CIFAR100-LT, ImageNet-LT and iNaturalist 2018 benchmarks.
arXiv Detail & Related papers (2020-10-05T06:53:44Z)
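Referring back to the Pairwise Class Balance entry above, the following is a rough sketch of confusion-matrix bookkeeping that accumulates prediction preferences during training; the momentum update rule and all names here are assumptions made for illustration, not the paper's exact formulation.

import numpy as np

class RunningConfusionMatrix:
    # Running estimate of where each class's predictions go, i.e. the
    # "ongoing prediction preferences" accumulated during training.
    def __init__(self, num_classes, momentum=0.99):
        self.matrix = np.full((num_classes, num_classes), 1.0 / num_classes)
        self.momentum = momentum

    def update(self, labels, probs):
        # labels: (batch,) ground-truth class ids.
        # probs: (batch, num_classes) predicted class probabilities.
        for y, p in zip(labels, probs):
            self.matrix[y] = self.momentum * self.matrix[y] + (1.0 - self.momentum) * p

    def preference(self, true_class, predicted_class):
        # Fraction of `true_class` samples currently pulled toward `predicted_class`.
        row = self.matrix[true_class]
        return row[predicted_class] / row.sum()

# Toy usage: tail class 2 whose samples are often predicted as head class 0.
cm = RunningConfusionMatrix(num_classes=3)
cm.update(np.array([2, 2]), np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]))
print(cm.preference(2, 0))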