Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration
- URL: http://arxiv.org/abs/2402.14296v4
- Date: Sun, 09 Feb 2025 13:34:33 GMT
- Title: Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration
- Authors: Ang Li, Jingqian Zhao, Bin Liang, Lin Gui, Hui Wang, Xi Zeng, Xingwei Liang, Kam-Fai Wong, Ruifeng Xu
- Abstract summary: Large language models (LLMs) have demonstrated significant advancements across various natural language processing tasks, including stance detection. Their performance in stance detection is nevertheless limited by biases and spurious correlations inherent in their data-driven nature. We propose a Counterfactual Augmented Calibration Network (FACTUAL), in which a novel calibration network is devised to calibrate potential bias in the stance predictions of LLMs.
- Score: 43.02857908228108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stance detection is critical for understanding the underlying position or attitude expressed toward a topic. Large language models (LLMs) have demonstrated significant advancements across various natural language processing tasks, including stance detection; however, their performance in stance detection is limited by biases and spurious correlations inherent in their data-driven nature. Our statistical experiment reveals that LLMs are prone to generating biased stances due to sentiment-stance spurious correlations and a preference towards certain individuals and topics. Furthermore, the results demonstrate a strong negative correlation between stance bias and stance detection performance, underscoring the importance of mitigating bias to enhance the utility of LLMs in stance detection. Therefore, in this paper, we propose a Counterfactual Augmented Calibration Network (FACTUAL), in which a novel calibration network is devised to calibrate potential bias in the stance predictions of LLMs. Further, to address the challenge of effectively learning bias representations and the difficulty of generalizing debiasing, we construct counterfactual augmented data. This approach enhances the calibration network, facilitating debiasing and out-of-domain generalization. Experimental results on in-target and zero-shot stance detection tasks show that the proposed FACTUAL can effectively mitigate biases of LLMs, achieving state-of-the-art results.
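FACTUAL itself trains a dedicated calibration network on counterfactual augmented data, so the paper's method cannot be reduced to a closed-form rule. As a rough, training-free illustration of the underlying idea only: compare the model's stance distribution on the original text with its distributions on counterfactual rewrites that alter only spurious cues (e.g., sentiment wording or the target's name) while preserving the true stance, treat the resulting shift as a bias estimate, and subtract part of it. The function names, the stance label set, and the damping factor below are all hypothetical, not from the paper.

```python
import numpy as np

STANCES = ["favor", "against", "neutral"]  # illustrative label set

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def calibrate(orig_logits, cf_logits_list, damping=0.5):
    """Debias a stance prediction using counterfactual variants.

    orig_logits     -- LLM stance logits for the original input.
    cf_logits_list  -- logits for counterfactual rewrites that change
                       only spurious cues, so the true stance label
                       should be unchanged.
    damping         -- fraction of the estimated bias to remove.
    """
    p_orig = softmax(orig_logits)
    p_cf = np.mean([softmax(l) for l in cf_logits_list], axis=0)
    # Bias estimate: how far the distribution moves when only
    # spurious cues are edited (zero if the model is cue-invariant).
    log_bias = np.log(p_orig + 1e-9) - np.log(p_cf + 1e-9)
    calibrated = np.log(p_orig + 1e-9) - damping * log_bias
    return softmax(calibrated)
```

With `damping=0.5` this amounts to a renormalized geometric mean of the original and counterfactual distributions: cue-driven shifts partially cancel, while predictions that are stable across counterfactuals pass through unchanged.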
Related papers
- Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training [57.03005244917803]
Large language models (LLMs) often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training. Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT). Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks.
arXiv Detail & Related papers (2025-06-11T06:30:28Z)
- Metamorphic Testing for Fairness Evaluation in Large Language Models: Identifying Intersectional Bias in LLaMA and GPT [2.380039717474099]
Large Language Models (LLMs) have made significant strides in Natural Language Processing but remain vulnerable to fairness-related issues.
This paper introduces a metamorphic testing approach to systematically identify fairness bugs in LLMs.
arXiv Detail & Related papers (2025-04-04T21:04:14Z)
- Fine-Grained Bias Detection in LLM: Enhancing detection mechanisms for nuanced biases [0.0]
This study presents a detection framework to identify nuanced biases in Large Language Models (LLMs).
The approach integrates contextual analysis, interpretability via attention mechanisms, and counterfactual data augmentation to capture hidden biases.
Results show improvements in detecting subtle biases compared to conventional methods.
arXiv Detail & Related papers (2025-03-08T04:43:01Z)
- Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment [30.605500809158986]
We propose a novel causal reward modeling approach that integrates causal inference to mitigate spurious correlations.
Our approach mitigates various types of spurious correlations effectively, resulting in more reliable and fair alignment of LLMs with human preferences.
arXiv Detail & Related papers (2025-01-16T16:00:37Z)
- LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation [28.61326111959728]
Large language models (LLMs) exhibit significant biases in evaluation tasks, particularly in preferentially rating and favoring self-generated content.
Our study addresses this knowledge gap by simulating two critical phases of the retrieval-augmented generation (RAG) framework.
Contrary to previous findings, our results reveal no significant self-preference effect in RAG frameworks.
arXiv Detail & Related papers (2024-10-28T08:32:09Z)
- Uncovering Biases with Reflective Large Language Models [2.5200794639628032]
Biases and errors in human-labeled data present significant challenges for machine learning.
We present the Reflective LLM Dialogue Framework RLDF, which leverages structured adversarial dialogues to uncover diverse perspectives.
Experiments show RLDF successfully identifies potential biases in public content while exposing limitations in human-labeled data.
arXiv Detail & Related papers (2024-08-24T04:48:32Z)
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness [10.081447621656523]
The impact on language modeling ability can be alleviated given a high-quality and long-contextualized debiasing corpus.
The effectiveness of task-agnostic debiasing hinges on the quantitative bias level of both the task-specific data used for downstream applications and the debiased model.
We propose a novel framework which can Propagate Socially-fair Debiasing to Downstream Fine-tuning, ProSocialTuning.
arXiv Detail & Related papers (2024-06-06T15:11:11Z)
- UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation [12.04811490937078]
We investigate how feedforward neural networks (FFNs) and attention heads result in the bias of large language models (LLMs).
To mitigate these biases, we introduce UniBias, an inference-only method that effectively identifies and eliminates biased FFN vectors and attention heads.
arXiv Detail & Related papers (2024-05-31T03:59:15Z)
- Beyond Performance: Quantifying and Mitigating Label Bias in LLMs [8.77694178599322]
We evaluate different approaches to quantifying label bias in a model's predictions.
Our investigation reveals substantial label bias in models both before and after debiasing attempts.
We propose a novel label bias calibration method tailored for few-shot prompting.
arXiv Detail & Related papers (2024-05-04T19:53:03Z)
- Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the prior of the underlying Large Language Model (LLM) rather than by the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
- Information-Theoretic Bias Reduction via Causal View of Spurious Correlation [71.9123886505321]
We propose an information-theoretic bias measurement technique through a causal interpretation of spurious correlation.
We present a novel debiasing framework against the algorithmic bias, which incorporates a bias regularization loss.
The proposed bias measurement and debiasing approaches are validated in diverse realistic scenarios.
arXiv Detail & Related papers (2022-01-10T01:19:31Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.