HANS, are you clever? Clever Hans Effect Analysis of Neural Systems
- URL: http://arxiv.org/abs/2309.12481v2
- Date: Thu, 2 May 2024 06:36:26 GMT
- Title: HANS, are you clever? Clever Hans Effect Analysis of Neural Systems
- Authors: Leonardo Ranaldi, Fabio Massimo Zanzotto
- Abstract summary: Instruction-tuned Large Language Models (It-LLMs) have been exhibiting outstanding abilities to reason about the cognitive states, intentions, and reactions of the people involved, letting humans guide and comprehend day-to-day social interactions effectively.
Several multiple-choice question (MCQ) benchmarks have been proposed to construct solid assessments of the models' abilities.
However, earlier works have demonstrated an inherent "order bias" in It-LLMs, which poses challenges to appropriate evaluation.
- Score: 1.6267479602370545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction-tuned Large Language Models (It-LLMs) have been exhibiting outstanding abilities to reason about the cognitive states, intentions, and reactions of the people involved, letting humans guide and comprehend day-to-day social interactions effectively. In fact, several multiple-choice question (MCQ) benchmarks have been proposed to construct solid assessments of the models' abilities. However, earlier works have demonstrated an inherent "order bias" in It-LLMs, which poses challenges to appropriate evaluation. In this paper, we investigate the resilience of It-LLMs against a series of probing tests using four MCQ benchmarks. Introducing adversarial examples, we show a significant performance gap, mainly when varying the order of the choices, which reveals a selection bias and calls the models' reasoning abilities into question. Observing a correlation between first positions and model choices due to positional bias, we hypothesize the presence of structural heuristics in the decision-making process of It-LLMs, strengthened by the inclusion of significant examples in few-shot scenarios. Finally, by using the Chain-of-Thought (CoT) technique, we elicit the models to reason, mitigating the bias and obtaining more robust models.
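The probing protocol described in the abstract can be illustrated concretely: shuffle the answer options of each MCQ item, query the model on both the original and the permuted version, and compare accuracy together with how often the model picks whatever option occupies the first position. The sketch below is a minimal illustration of that idea rather than the paper's exact setup; `ask_model` is a hypothetical function assumed to return only the final answer letter (with any CoT reasoning already stripped), and `benchmark` is assumed to be a list of `(question, options, gold_index)` tuples.

```python
import random
from collections import Counter

LABELS = "ABCDEFGH"

def permute_options(options, gold_index, seed=0):
    """Shuffle the answer options and track where the gold answer lands."""
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(gold_index)

def format_prompt(question, options, cot=False):
    """Build an MCQ prompt; optionally ask for step-by-step (CoT) reasoning."""
    lines = [question]
    lines += [f"{LABELS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Let's think step by step, then answer with a single letter."
                 if cot else "Answer with a single letter.")
    return "\n".join(lines)

def probe_order_bias(benchmark, ask_model, cot=False, seed=0):
    """Compare accuracy on the original vs. shuffled option order and count
    how often the model selects whatever option sits in position A."""
    stats = Counter()
    for question, options, gold_index in benchmark:
        stats["items"] += 1
        variants = {
            "original": (options, gold_index),
            "shuffled": permute_options(options, gold_index, seed=seed),
        }
        for tag, (opts, gold) in variants.items():
            reply = ask_model(format_prompt(question, opts, cot=cot))
            pred = LABELS.index(reply.strip()[0].upper())
            stats[f"{tag}_correct"] += int(pred == gold)
            stats[f"{tag}_picked_first"] += int(pred == 0)
    return dict(stats)
```

A large gap between the original and shuffled accuracies, or a "picked_first" rate well above chance, would indicate the selection and positional bias reported above; re-running the probe with `cot=True` corresponds to the CoT mitigation step.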
Related papers
- Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes.
We find that the majority of disagreements are in opposition to standard reward modeling approaches.
We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
arXiv Detail & Related papers (2024-10-18T17:32:22Z)
- Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning [0.0]
We show that differences in learned regularities across answer options are predictive of model preferences and mirror human test-taking strategies.
We introduce two novel methods: Counterfactual Prompting with Chain of Thought (CoT) and Counterfactual Prompting with Agnostically Primed CoT (APriCoT).
Our results suggest that mitigating bias requires a "System-2" like process and that CoT reasoning is susceptible to confirmation bias under some prompting methodologies.
arXiv Detail & Related papers (2024-08-16T10:34:50Z)
- Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning [14.569770617709073]
We present a detailed analysis of which design choices cause instabilities and inconsistencies in task predictions.
We show that spurious correlations between input distributions and labels pose only a minor problem for prompted models.
We statistically analyse the results to show which factors are the most influential, interactive or stable.
arXiv Detail & Related papers (2023-10-20T13:25:24Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Cross Feature Selection to Eliminate Spurious Interactions and Single Feature Dominance Explainable Boosting Machines [0.0]
Interpretability is essential for legal, ethical, and practical reasons.
High-performance models can suffer from spurious interactions with redundant features and single-feature dominance.
In this paper, we explore novel approaches to address these issues by utilizing alternate cross-feature selection, ensemble features, and model configuration alteration techniques.
arXiv Detail & Related papers (2023-07-17T13:47:41Z)
- Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models [3.9052860539161918]
We propose a simple method for measuring the scale of a model's reliance on any identified spurious feature.
We assess the robustness towards a large set of known and newly found prediction biases for various pre-trained models and debiasing methods in Question Answering (QA).
We find that while existing debiasing methods can mitigate reliance on a chosen spurious feature, the OOD performance gains of these methods cannot be explained by mitigated reliance on biased features.
arXiv Detail & Related papers (2023-05-11T14:35:00Z)
- Knowledge-Based Counterfactual Queries for Visual Question Answering [0.0]
We propose a systematic method for explaining the behavior and investigating the robustness of VQA models through counterfactual perturbations.
To this end, we exploit structured knowledge bases to perform deterministic, optimal, and controllable word-level replacements targeting the linguistic modality.
We then evaluate the model's response against such counterfactual inputs.
arXiv Detail & Related papers (2023-03-05T08:00:30Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
- Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- Counterfactual Variable Control for Robust and Interpretable Question Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect such spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.