COBias and Debias: Balancing Class Accuracies for Language Models in Inference Time via Nonlinear Integer Programming
- URL: http://arxiv.org/abs/2405.07623v5
- Date: Wed, 29 Jan 2025 07:07:54 GMT
- Title: COBias and Debias: Balancing Class Accuracies for Language Models in Inference Time via Nonlinear Integer Programming
- Authors: Ruixi Lin, Yang You
- Abstract summary: This paper investigates a fundamental inference-time problem in language models: imbalanced class accuracies. We find that what underlies the issue is a tendency to over-predict some classes while under-predicting others. We show it can be effectively mitigated via inference-time optimization.
- Score: 12.287692969438169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are good knowledge bases but struggle to perform equally well for all classes in text classification tasks. This paper investigates a fundamental inference-time problem in language models: imbalanced class accuracies. We find that what underlies the issue is a tendency to over-predict some classes while under-predicting others. This class accuracy imbalance is difficult to solve at the root via better pre-training or fine-tuning strategies, but we show it can be effectively mitigated via inference-time combinatorial optimization. To this end, we conceptualize and quantify the over- and under-prediction issue as the Contextual Oddity Bias (COBias), and propose the Debiasing as Nonlinear Integer Programming (DNIP) model to correct in-context learned class probabilities by minimizing COBias and maximizing overall accuracy, without updating LLM parameters. Because the DNIP model implicitly contains non-differentiable elements, we solve it with simulated annealing. Extensive evaluations on three LLMs across seven NLP classification tasks in different prompting settings show that DNIP simultaneously achieves significant COBias reduction (-27%) and accuracy improvement (+12%) over the conventional ICL approach, suggesting that inference-time mitigation of class accuracy imbalance is a promising direction for pushing forward LLM performance.
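The abstract names the key ingredients (per-class correction of ICL probabilities, a COBias penalty, simulated annealing over integer variables) without implementation detail. Below is a minimal, hypothetical Python sketch of that general recipe, not the authors' DNIP formulation: the objective, the integer weight range, and the cooling schedule are all illustrative assumptions.

```python
import numpy as np

def per_class_accuracy(preds, labels, n_classes):
    """Accuracy computed separately for each class."""
    return np.array([(preds[labels == c] == c).mean() if (labels == c).any() else 0.0
                     for c in range(n_classes)])

def cobias_penalty(class_accs):
    """Mean absolute pairwise gap between class accuracies (assumed COBias proxy)."""
    gaps = np.abs(class_accs[:, None] - class_accs[None, :])
    k = len(class_accs)
    return gaps.sum() / (k * (k - 1))

def objective(weights, probs, labels, lam=0.5):
    """Overall accuracy minus a COBias penalty, under integer per-class weights."""
    preds = (probs * weights).argmax(axis=1)
    acc = (preds == labels).mean()
    return acc - lam * cobias_penalty(per_class_accuracy(preds, labels, probs.shape[1]))

def anneal_weights(probs, labels, steps=5000, t0=1.0, seed=0):
    """Simulated annealing over integer class weights applied to ICL probabilities."""
    rng = np.random.default_rng(seed)
    k = probs.shape[1]
    w = np.ones(k, dtype=int)                 # start from unweighted ICL probabilities
    f = objective(w, probs, labels)
    best_w, best_f = w.copy(), f
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-8  # linear cooling schedule
        cand = w.copy()
        cand[rng.integers(k)] = rng.integers(1, 11)  # flip one weight to 1..10
        f_cand = objective(cand, probs, labels)
        # accept improvements always, worse moves with Boltzmann probability
        if f_cand >= f or rng.random() < np.exp((f_cand - f) / t):
            w, f = cand, f_cand
            if f > best_f:
                best_w, best_f = w.copy(), f
    return best_w
```

In this sketch, `probs` would be ICL class probabilities collected on a held-out development set; the returned integer weights are then applied once per test example at inference, leaving LLM parameters untouched.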
Related papers
- Let the Fuzzy Rule Speak: Enhancing In-context Learning Debiasing with Interpretability [12.287692969438169]
Large language models (LLMs) often struggle with balanced class accuracy in text classification tasks that use in-context learning (ICL).
This paper delves deeper into the class accuracy imbalance issue, identifying that it arises because certain classes consistently receive disproportionately high ICL probabilities.
We introduce FuRud, a method for sample-level class probability correction.
arXiv Detail & Related papers (2024-12-26T01:56:42Z)
- Covariance-corrected Whitening Alleviates Network Degeneration on Imbalanced Classification [6.197116272789107]
Class imbalance is a critical issue in image classification that significantly affects the performance of deep recognition models.
We propose a novel framework called Whitening-Net to mitigate the degenerate solutions.
In scenarios with extreme class imbalance, the batch covariance statistic exhibits significant fluctuations, impeding the convergence of the whitening operation.
arXiv Detail & Related papers (2024-08-30T10:49:33Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
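The summary describes UAL only at the level of "adaptive label smoothing by sample uncertainty"; the following is a hedged PyTorch sketch of that general idea, with the uncertainty-to-smoothing mapping chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def uncertainty_smoothed_ce(logits, targets, uncertainty, max_smooth=0.2):
    """Cross-entropy with a per-sample label smoothing value scaled by uncertainty.

    `uncertainty` is assumed to lie in [0, 1] (e.g., normalized predictive
    entropy); confident samples get near one-hot targets, uncertain samples
    get heavier smoothing.
    """
    n_classes = logits.size(-1)
    eps = max_smooth * uncertainty.unsqueeze(-1)    # (batch, 1)
    one_hot = F.one_hot(targets, n_classes).float()
    soft = one_hot * (1.0 - eps) + eps / n_classes  # per-sample smoothed targets
    return -(soft * F.log_softmax(logits, dim=-1)).sum(-1).mean()
```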
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Preference Learning Algorithms Do Not Learn Preference Rankings [62.335733662381884]
We study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs.
We find that most state-of-the-art preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets.
arXiv Detail & Related papers (2024-05-29T21:29:44Z)
- Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models [39.82130327284791]
Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities and versatility in NLP tasks.
They sometimes fail to maintain crucial invariances for specific tasks.
Permutation debiasing requires averaging predictions over many permuted inputs, which is costly; this paper addresses this inefficiency at inference time.
arXiv Detail & Related papers (2024-03-20T13:38:07Z)
- A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation [121.0693322732454]
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity.
Recent research has focused on developing efficient fine-tuning methods to enhance CLIP's performance in downstream tasks.
We revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP.
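GDA itself is a classical closed-form classifier, so a training-free sketch over precomputed CLIP features is easy to write down; everything below (shared covariance, ridge regularization, variable names) is a standard textbook construction, not necessarily the paper's exact estimator.

```python
import numpy as np

def fit_gda(feats, labels, n_classes, reg=1e-4):
    """Fit class means, a shared (ridge-regularized) covariance, and log priors."""
    d = feats.shape[1]
    means = np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])
    centered = feats - means[labels]
    cov = centered.T @ centered / len(feats) + reg * np.eye(d)
    log_priors = np.log(np.bincount(labels, minlength=n_classes) / len(labels))
    return means, np.linalg.inv(cov), log_priors

def gda_predict(feats, means, cov_inv, log_priors):
    """Linear discriminant rule: argmax_c  x'W mu_c - 0.5 mu_c'W mu_c + log p(c)."""
    scores = feats @ cov_inv @ means.T                           # (n, n_classes)
    scores -= 0.5 * np.einsum('cd,de,ce->c', means, cov_inv, means)
    scores += log_priors
    return scores.argmax(axis=1)
```

With CLIP, `feats` would plausibly be L2-normalized image embeddings and `labels` few-shot annotations; no gradient-based fine-tuning is involved.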
arXiv Detail & Related papers (2024-02-06T15:45:27Z)
- Understanding the Detrimental Class-level Effects of Data Augmentation [63.1733767714073]
Achieving optimal average accuracy with data augmentation (DA) can come at the cost of significantly hurting individual class accuracy, by as much as 20% on ImageNet.
We present a framework for understanding how DA interacts with class-level learning dynamics.
We show that simple class-conditional augmentation strategies improve performance on the negatively affected classes.
arXiv Detail & Related papers (2023-12-07T18:37:43Z)
- Online Continual Learning via Logit Adjusted Softmax [24.327176079085703]
Inter-class imbalance during training has been identified as a major cause of forgetting.
We show that a simple adjustment of model logits during training can effectively resist prior class bias.
Our proposed method, Logit Adjusted Softmax, can mitigate the impact of inter-class imbalance not only in class-incremental but also in realistic general setups.
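Logit adjustment is a well-known long-tail recipe, so a minimal sketch is easy to give: offset the training logits by scaled log class priors before the softmax cross-entropy. This illustrates the general mechanism, not necessarily the paper's online continual-learning variant.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_ce(logits, targets, class_counts, tau=1.0):
    """Cross-entropy on prior-adjusted logits: logits + tau * log p(class).

    Frequent classes receive a larger additive offset during training, so rare
    classes must win by a margin; at test time the raw logits are used.
    """
    priors = class_counts.float() / class_counts.sum()
    return F.cross_entropy(logits + tau * torch.log(priors + 1e-12), targets)
```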
arXiv Detail & Related papers (2023-11-11T03:03:33Z)
- Fine-tune Language Models to Approximate Unbiased In-context Learning [8.609157988755896]
We introduce a reweighted algorithm called RICL (Reweighted In-context Learning).
This algorithm fine-tunes language models using an unbiased validation set to determine the optimal weight for each input-output example.
We also introduce LARICL, a low-cost linear approximation of the optimal weights.
arXiv Detail & Related papers (2023-10-05T06:16:01Z)
- Semi-Supervised Learning with Multiple Imputations on Non-Random Missing Labels [0.0]
Semi-Supervised Learning (SSL) trains algorithms on both labeled and unlabeled data.
This paper proposes two new methods of combining multiple imputation models to achieve higher accuracy and less bias.
arXiv Detail & Related papers (2023-08-15T04:09:53Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC).
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z)
- Oracle Inequalities for Model Selection in Offline Reinforcement Learning [105.74139523696284]
We study the problem of model selection in offline RL with value function approximation.
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
arXiv Detail & Related papers (2022-11-03T17:32:34Z)
- Fairly Accurate: Learning Optimal Accuracy vs. Fairness Tradeoffs for Hate Speech Detection [8.841221697099687]
We introduce a differentiable measure that enables direct optimization of group fairness in model training.
We evaluate our methods on the specific task of hate speech detection.
Empirical results across convolutional, sequential, and transformer-based neural architectures show superior empirical accuracy vs. fairness trade-offs over prior work.
arXiv Detail & Related papers (2022-04-15T22:11:25Z)
- A Gating Model for Bias Calibration in Generalized Zero-shot Learning [18.32369721322249]
Generalized zero-shot learning (GZSL) aims at training a model that can generalize to unseen class data by only using auxiliary information.
One of the main challenges in GZSL is a biased model prediction toward seen classes caused by overfitting on only available seen class data during training.
We propose a two-stream autoencoder-based gating model for GZSL.
arXiv Detail & Related papers (2022-03-08T16:41:06Z)
- The Interplay between Distribution Parameters and the Accuracy-Robustness Tradeoff in Classification [0.0]
Adversarial training tends to result in models that are less accurate on natural (unperturbed) examples than standard models.
This can be attributed to either an algorithmic shortcoming or a fundamental property of the training data distribution.
In this work, we focus on the latter case under a binary Gaussian mixture classification problem.
arXiv Detail & Related papers (2021-07-01T06:57:50Z)
- PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes the ratio between positive and negative labels of each class during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in long-tailed recognition scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
- Generalized Zero-Shot Learning Via Over-Complete Distribution [79.5140590952889]
We propose to generate an Over-Complete Distribution (OCD) using a Conditional Variational Autoencoder (CVAE) of both seen and unseen classes.
The effectiveness of the framework is evaluated using both Zero-Shot Learning and Generalized Zero-Shot Learning protocols.
arXiv Detail & Related papers (2020-04-01T19:05:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.