ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training
- URL: http://arxiv.org/abs/2205.06265v3
- Date: Sat, 20 Apr 2024 23:40:53 GMT
- Title: ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training
- Authors: Yue Zhao, Yantao Shen, Yuanjun Xiong, Shuo Yang, Wei Xia, Zhuowen Tu, Bernt Schiele, Stefano Soatto
- Abstract summary: Existing methods to reduce the negative flip rate (NFR) either do so at the expense of overall accuracy by forcing a new model to imitate the old models, or use ensembles.
We analyze the role of ensembles in reducing NFR and observe that they remove negative flips that are typically not close to the decision boundary, but often exhibit large deviations in the distance among their logits.
We present a method, called Ensemble Logit Difference Inhibition (ELODI), to train a classification system that achieves paragon performance in both error rate and NFR.
- Score: 110.52785254565518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Negative flips are errors introduced in a classification system when a legacy model is updated. Existing methods to reduce the negative flip rate (NFR) either do so at the expense of overall accuracy by forcing a new model to imitate the old models, or use ensembles, which multiply inference cost prohibitively. We analyze the role of ensembles in reducing NFR and observe that they remove negative flips that are typically not close to the decision boundary, but often exhibit large deviations in the distance among their logits. Based on the observation, we present a method, called Ensemble Logit Difference Inhibition (ELODI), to train a classification system that achieves paragon performance in both error rate and NFR, at the inference cost of a single model. The method distills a homogeneous ensemble to a single student model which is used to update the classification system. ELODI also introduces a generalized distillation objective, Logit Difference Inhibition (LDI), which only penalizes the logit difference of a subset of classes with the highest logit values. On multiple image classification benchmarks, model updates with ELODI demonstrate superior accuracy retention and NFR reduction.
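The paper itself is not reproduced here, but the two central quantities in the abstract can be illustrated concretely. Below is a minimal, hypothetical PyTorch sketch (all function and variable names are our own, and k is an assumed hyperparameter, not a value from the paper) of (1) the negative flip rate between an old and a new model and (2) an LDI-style distillation loss that averages the logits of a homogeneous ensemble and penalizes the student-teacher logit difference only on the classes with the highest teacher logits.
```python
# Hypothetical sketch (not the authors' implementation): negative flip rate
# and an LDI-style top-k logit-difference loss, following the abstract.
import torch


def negative_flip_rate(old_logits, new_logits, labels):
    """Fraction of samples the old model classifies correctly but the new model gets wrong."""
    old_correct = old_logits.argmax(dim=1) == labels
    new_wrong = new_logits.argmax(dim=1) != labels
    return (old_correct & new_wrong).float().mean()


def ldi_loss(student_logits, ensemble_logits, k=10):
    """Logit Difference Inhibition (sketch).

    ensemble_logits: (num_members, batch, num_classes) logits from a
    homogeneous ensemble used as the teacher. The penalty is applied only
    on the k classes with the highest teacher logits for each sample, as
    described in the abstract; k is an assumed hyperparameter.
    """
    teacher = ensemble_logits.mean(dim=0)              # average the ensemble logits
    _, top_idx = teacher.topk(k, dim=1)                # highest-logit classes per sample
    diff = student_logits.gather(1, top_idx) - teacher.gather(1, top_idx)
    return diff.pow(2).mean()
```
In an update, the student trained with such a loss (together with the usual cross-entropy on the new training data) replaces the legacy model, so inference still costs a single forward pass.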
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Bias Mitigating Few-Shot Class-Incremental Learning [17.185744533050116]
Few-shot class-incremental learning aims at recognizing novel classes continually with limited novel class samples.
Recent methods somewhat alleviate the accuracy imbalance between base and incremental classes by fine-tuning the feature extractor in the incremental sessions.
We propose a novel method to mitigate model bias in FSCIL during both training and inference.
arXiv Detail & Related papers (2024-02-01T10:37:41Z)
- Regularizing with Pseudo-Negatives for Continual Self-Supervised Learning [62.40718385934608]
We introduce a novel Pseudo-Negative Regularization (PNR) framework for effective continual self-supervised learning (CSSL).
Our PNR leverages pseudo-negatives obtained through model-based augmentation so that newly learned representations do not contradict what has been learned in the past.
arXiv Detail & Related papers (2023-06-08T10:59:35Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in the real world is commonly imbalanced and follows a long-tail distribution.
This paper proposes a principled framework, Self-Damaging Contrastive Learning (SDCLR), to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low-churn training compared with a number of recent baselines.
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- Positive-Congruent Training: Towards Regression-Free Model Updates [87.25247195148187]
In image classification, sample-wise inconsistencies appear as "negative flips": a new model incorrectly predicts the output for a test sample that was correctly classified by the old (reference) model.
We propose a simple approach for positive-congruent (PC) training, Focal Distillation, which enforces congruence with the reference model (a generic sketch of this style of reference-model distillation appears after this list).
arXiv Detail & Related papers (2020-11-18T09:00:44Z)
- Counterfactual fairness: removing direct effects through regularization [0.0]
We propose a new definition of fairness that incorporates causality through the Controlled Direct Effect (CDE).
We develop regularizations to tackle classical fairness measures and present a causal regularization that satisfies our new fairness definition.
Our regularizations were found to mitigate unfairness in the predictions with only small reductions in model performance.
arXiv Detail & Related papers (2020-02-25T10:13:55Z)
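The distillation-based entries above (Churn Reduction via Distillation, and Positive-Congruent Training with Focal Distillation) both constrain a new model against an old reference model. As a rough illustration only, and not any single paper's exact objective, the sketch below weights a logit-matching distillation term more heavily on samples the reference model already classifies correctly, so the new model stays congruent on them; the loss form, names, and weights are assumptions.
```python
# Illustrative sketch of reference-model distillation for low-churn / positive-
# congruent updates; loss form and weights are assumptions, not a paper's exact recipe.
import torch


def reference_distillation_loss(new_logits, old_logits, labels,
                                base_weight=1.0, focal_weight=4.0):
    """Per-sample weighted logit matching toward the old (reference) model."""
    old_correct = (old_logits.argmax(dim=1) == labels).float()
    weights = base_weight + focal_weight * old_correct      # extra weight where the old model is right
    per_sample = (new_logits - old_logits.detach()).pow(2).mean(dim=1)
    return (weights * per_sample).mean()
```
Such a term would typically be added to the standard cross-entropy loss when training the replacement model.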
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.