Leveraging Biases in Large Language Models: "bias-kNN" for Effective
  Few-Shot Learning
- URL: http://arxiv.org/abs/2401.09783v1
- Date: Thu, 18 Jan 2024 08:05:45 GMT
- Title: Leveraging Biases in Large Language Models: "bias-kNN" for Effective
  Few-Shot Learning
- Authors: Yong Zhang, Hanzhang Li, Zhitao Li, Ning Cheng, Ming Li, Jing Xiao,
  Jianzong Wang
- Abstract summary: This study introduces a novel methodology named "bias-kNN".
This approach capitalizes on the biased outputs, harnessing them as primary features for kNN and supplementing with gold labels.
Our comprehensive evaluations, spanning diverse domain text classification datasets and different GPT-2 model sizes, indicate the adaptability and efficacy of the "bias-kNN" method.
- Score: 36.739829839357995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown significant promise in various
applications, including zero-shot and few-shot learning. However, their
performance can be hampered by inherent biases. Instead of traditionally sought
methods that aim to minimize or correct these biases, this study introduces a
novel methodology named "bias-kNN". This approach capitalizes on the biased
outputs, harnessing them as primary features for kNN and supplementing with
gold labels. Our comprehensive evaluations, spanning diverse domain text
classification datasets and different GPT-2 model sizes, indicate the
adaptability and efficacy of the "bias-kNN" method. Remarkably, this approach
not only outperforms conventional in-context learning in few-shot scenarios but
also demonstrates robustness across a spectrum of samples, templates and
verbalizers. This study, therefore, presents a unique perspective on harnessing
biases, transforming them into assets for enhanced model performance.
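
The idea in the abstract (biased model outputs used as kNN features, with gold labels attached to the few-shot examples) can be illustrated with a minimal sketch. This is not the authors' implementation: the probability vectors below are made-up stand-ins for the label-word probabilities a language model might produce, and the helper `knn_predict`, the Euclidean distance, the majority vote, and `k=3` are all assumptions chosen for illustration.

```python
import numpy as np


def knn_predict(train_feats, train_labels, query_feats, k=3):
    """Majority-vote kNN with Euclidean distance (hypothetical helper)."""
    train_feats = np.asarray(train_feats, dtype=float)
    train_labels = np.asarray(train_labels, dtype=int)
    query_feats = np.atleast_2d(np.asarray(query_feats, dtype=float))
    # Pairwise distances, shape (n_query, n_train).
    diffs = query_feats[:, None, :] - train_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Indices of the k closest few-shot examples for each query.
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]
    preds = [int(np.argmax(np.bincount(train_labels[row]))) for row in nearest]
    return np.array(preds)


# Made-up label-word probabilities [P(class 0), P(class 1)] from a model
# that is biased toward class 0: class 0 always gets the higher score,
# but the margin is smaller when the true class is 1.
train_feats = [
    [0.90, 0.10], [0.88, 0.12], [0.92, 0.08],  # gold label 0
    [0.70, 0.30], [0.65, 0.35], [0.72, 0.28],  # gold label 1
]
train_labels = [0, 0, 0, 1, 1, 1]  # gold labels of the few-shot examples

query_feats = np.array([[0.68, 0.32], [0.91, 0.09]])

# Taking the argmax of the biased output always answers class 0.
argmax_preds = query_feats.argmax(axis=1)
# kNN over the same biased outputs recovers the class from the margin.
knn_preds = knn_predict(train_feats, train_labels, query_feats, k=3)

print(argmax_preds.tolist())  # [0, 0]
print(knn_preds.tolist())     # [1, 0]
```

In this toy setting the bias makes the raw argmax useless for class 1, yet the biased scores still cluster by true class, which is what lets a nearest-neighbour vote over gold-labelled examples separate them.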
 
      
Related papers
        - Detecting Prefix Bias in LLM-based Reward Models [4.596249232904721]
 We introduce novel methods to detect and evaluate prefix bias in reward models trained on preference datasets.
We leverage these metrics to reveal significant biases in preference models across racial and gender dimensions.
Our findings highlight the critical need for bias-aware dataset design and evaluation in developing fair and reliable reward models.
 arXiv  Detail & Related papers  (2025-05-13T21:50:03Z)
- Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing [15.214861534330236]
 We introduce Diffusing DeBias (DDB) as a plug-in for common methods of unsupervised model debiasing.
Specifically, our approach adopts conditional diffusion models to generate synthetic bias-aligned images.
By tackling the fundamental issue of bias-conflicting training samples in learning auxiliary models, our proposed method beats the current state of the art on multiple benchmark datasets.
 arXiv  Detail & Related papers  (2025-02-13T18:17:03Z)
- Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models [1.787433808079955]
 Large language models (LLMs) have been observed to perpetuate unwanted biases in training data.
In this paper, we mitigate bias by leveraging small biased and anti-biased expert models to obtain a debiasing signal.
 Experiments on mitigating gender, race, and religion biases show a reduction in bias on several local and global bias metrics.
 arXiv  Detail & Related papers  (2024-12-02T16:56:08Z)
- Scalable Influence and Fact Tracing for Large Language Model Pretraining [14.598556308631018]
 Training data attribution (TDA) methods aim to attribute model outputs back to specific training examples.
This paper refines existing gradient-based methods to work effectively at scale.
 arXiv  Detail & Related papers  (2024-10-22T20:39:21Z)
- GUS-Net: Social Bias Classification in Text with Generalizations, Unfairness, and Stereotypes [2.2162879952427343]
 This paper introduces GUS-Net, an innovative approach to bias detection.
GUS-Net focuses on three key types of biases: (G)eneralizations, (U)nfairness, and (S)tereotypes.
Our methodology enhances traditional bias detection methods by incorporating the contextual encodings of pre-trained models.
 arXiv  Detail & Related papers  (2024-10-10T21:51:22Z)
- REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning [18.064064773660174]
 We introduce REFINE-LM, a debiasing method that uses reinforcement learning to handle different types of biases without any fine-tuning.
By training a simple model on top of the word probability distribution of a LM, our bias reinforcement learning method enables model debiasing without human annotations.
Experiments conducted on a wide range of models, including several LMs, show that our method significantly reduces stereotypical biases while preserving LM performance.
 arXiv  Detail & Related papers  (2024-08-18T14:08:31Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
 We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
 arXiv  Detail & Related papers  (2023-01-31T20:09:33Z)
- Feature-Level Debiased Natural Language Understanding [86.8751772146264]
 Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features and account for the dynamic nature of bias.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
 arXiv  Detail & Related papers  (2022-12-11T06:16:14Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text
  Classifiers [66.36045164286854]
 We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
 arXiv  Detail & Related papers  (2022-10-28T17:52:10Z)
- General Greedy De-bias Learning [163.65789778416172]
 We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
 GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
 arXiv  Detail & Related papers  (2021-12-20T14:47:32Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling
  them [111.17078939377313]
 State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
 arXiv  Detail & Related papers  (2020-12-02T16:10:54Z)
- An Empirical Study on Model-agnostic Debiasing Strategies for Robust
  Natural Language Inference [37.420864237437804]
 We focus on the model-agnostic debiasing strategies and explore how to make the NLI models robust to multiple adversarial attacks.
We first benchmark prevailing neural NLI models including pretrained ones on various adversarial datasets.
We then try to combat distinct known biases by modifying a mixture of experts (MoE) ensemble method.
 arXiv  Detail & Related papers  (2020-10-08T05:40:45Z)
- Towards Debiasing NLU Models from Unknown Biases [70.31427277842239]
 NLU models often exploit biases to achieve high dataset-specific performance without properly learning the intended task.
We present a self-debiasing framework that prevents models from mainly utilizing biases without knowing them in advance.
 arXiv  Detail & Related papers  (2020-09-25T15:49:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     