Related papers: Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference

URL: http://arxiv.org/abs/2410.08996v1
Date: Fri, 11 Oct 2024 17:09:22 GMT
Title: Hypothesis-only Biases in Large Language Model-Elicited Natural Language Inference
Authors: Grace Proebsting, Adam Poliak
Abstract summary: We recreate a portion of the Stanford NLI corpus using GPT-4, Llama-2 and Mistral 7b. We train hypothesis-only classifiers to determine whether LLM-elicited hypotheses contain annotation artifacts. Our analysis provides empirical evidence that well-attested biases in NLI can persist in LLM-generated data.
Score: 3.0804372027733202
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We test whether replacing crowdsource workers with LLMs to write Natural Language Inference (NLI) hypotheses similarly results in annotation artifacts. We recreate a portion of the Stanford NLI corpus using GPT-4, Llama-2 and Mistral 7b, and train hypothesis-only classifiers to determine whether LLM-elicited hypotheses contain annotation artifacts. On our LLM-elicited NLI datasets, BERT-based hypothesis-only classifiers achieve between 86% and 96% accuracy, indicating these datasets contain hypothesis-only artifacts. We also find frequent "give-aways" in LLM-generated hypotheses, e.g. the phrase "swimming in a pool" appears in more than 10,000 contradictions generated by GPT-4. Our analysis provides empirical evidence that well-attested biases in NLI can persist in LLM-generated data.
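The "give-away" analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy examples, the whitespace tokenizer, and the function names `ngrams` and `giveaway_phrases` are all assumptions made for illustration. It counts word n-grams in hypotheses and reports how strongly each one is tied to a single label, without ever looking at a premise.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (hypothesis, label) pairs. Not taken from the paper.
TOY_EXAMPLES = [
    ("A man is swimming in a pool.", "contradiction"),
    ("The dog is swimming in a pool.", "contradiction"),
    ("A person is outdoors.", "entailment"),
    ("A person is outside.", "entailment"),
    ("The man is waiting for a friend.", "neutral"),
    ("The children are swimming in a pool.", "contradiction"),
]


def ngrams(text, n):
    """Lowercase, drop a trailing period, split on whitespace, return word n-grams."""
    tokens = text.lower().strip(".").split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def giveaway_phrases(examples, n=4, min_count=2):
    """Return (phrase, majority_label, total_count, purity) tuples.

    purity is the fraction of hypotheses containing the phrase that carry
    the majority label. Rows are sorted by purity, then by total count.
    """
    by_phrase = defaultdict(Counter)
    for hypothesis, label in examples:
        # set() so a phrase is counted once per hypothesis
        for phrase in set(ngrams(hypothesis, n)):
            by_phrase[phrase][label] += 1

    rows = []
    for phrase, label_counts in by_phrase.items():
        total = sum(label_counts.values())
        if total < min_count:
            continue
        label, count = label_counts.most_common(1)[0]
        rows.append((phrase, label, total, count / total))

    rows.sort(key=lambda r: (-r[3], -r[2], r[0]))
    return rows


if __name__ == "__main__":
    for phrase, label, total, purity in giveaway_phrases(TOY_EXAMPLES):
        print(f"{phrase!r}: {label} in {purity:.0%} of {total} hypotheses")
```

A phrase with high purity and a high count is a candidate artifact: a classifier that sees only the hypothesis could use it to guess the label.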

Related papers

Neutralizing Bias in LLM Reasoning using Entailment Graphs [13.5088417466172]
LLMs are often claimed to be capable of Natural Language Inference (NLI), which is widely regarded as a cornerstone of more complex forms of reasoning. We design an unsupervised framework to construct counterfactual reasoning data and fine-tune LLMs to reduce attestation bias. Our framework consistently improves inferential performance on both original and bias-neutralized NLI datasets.
arXiv Detail & Related papers (2025-03-14T17:33:30Z)
Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference [3.0804372027733202]
We test whether NLP datasets created with Large Language Models (LLMs) contain annotation artifacts and social biases. We recreate a portion of the Stanford Natural Language Inference corpus using GPT-4, Llama-2 70b for Chat, and Mistral 7b Instruct.
arXiv Detail & Related papers (2025-03-06T23:49:30Z)
On Reference (In-)Determinacy in Natural Language Inference [62.904689974282334]
We revisit the reference determinacy (RD) assumption in the task of natural language inference (NLI). We observe that current NLI models fail in downstream applications such as fact verification, where the input premise and hypothesis may refer to different contexts. We introduce RefNLI, a diagnostic benchmark for identifying reference ambiguity in NLI examples.
arXiv Detail & Related papers (2025-02-09T06:58:13Z)
CSS: Contrastive Semantic Similarity for Uncertainty Quantification of LLMs [1.515687944002438]
We propose Contrastive Semantic Similarity, a module to obtain similarity features for measuring uncertainty for text pairs. We conduct extensive experiments with three large language models (LLMs) on several benchmark question-answering datasets. Results show that our proposed method performs better in estimating reliable responses of LLMs than comparable baselines.
arXiv Detail & Related papers (2024-06-05T11:35:44Z)
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs). We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
Native Language Identification with Large Language Models [60.80452362519818]
We show that GPT models are proficient at native language identification (NLI) classification, with GPT-4 setting a new performance record of 91.7% on the benchmark TOEFL11 test set in a zero-shot setting. We also show that, unlike previous fully-supervised settings, LLMs can perform NLI without being limited to a set of known classes.
arXiv Detail & Related papers (2023-12-13T00:52:15Z)
Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment [82.60594940370919]
We propose the FlipFlop experiment to study the multi-turn behavior of Large Language Models (LLMs). We show that models flip their answers on average 46% of the time and that all models see a deterioration of accuracy between their first and final prediction, with an average drop of 17% (the FlipFlop effect). We conduct finetuning experiments on an open-source LLM and find that finetuning on synthetically created data can mitigate sycophantic behavior, reducing performance deterioration by 60%, but not resolve it entirely.
arXiv Detail & Related papers (2023-11-14T23:40:22Z)
Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks. How do we evaluate the capabilities of LLMs to consistently produce factually correct answers? We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z)
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models [37.63939774027709]
Large language models (LLMs) specializing in natural language generation (NLG) have recently started exhibiting promising capabilities. We propose and compare several confidence/uncertainty measures, applying them to selective NLG where unreliable results could either be ignored or yielded for further assessment. Results reveal that a simple measure for the semantic dispersion can be a reliable predictor of the quality of LLM responses.
arXiv Detail & Related papers (2023-05-30T16:31:26Z)
Sources of Hallucination by Large Language Models on Inference Tasks [16.644096408742325]
Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI). We present a series of behavioral studies on several LLM families which probe their behavior using controlled experiments.
arXiv Detail & Related papers (2023-05-23T22:24:44Z)
Statistical Knowledge Assessment for Large Language Models [79.07989821512128]
Given varying prompts regarding a factoid question, can a large language model (LLM) reliably generate factually correct answers? We propose KaRR, a statistical approach to assess factual knowledge for LLMs. Our results reveal that the knowledge in LLMs with the same backbone architecture adheres to the scaling law, while tuning on instruction-following data sometimes compromises the model's capability to generate factually correct text reliably.
arXiv Detail & Related papers (2023-05-17T18:54:37Z)
The Internal State of an LLM Knows When It's Lying [18.886091925252174]
Large Language Models (LLMs) have shown exceptional performance in various tasks. One of their most prominent drawbacks is generating inaccurate or false information with a confident tone. We provide evidence that the LLM's internal state can be used to reveal the truthfulness of statements.
arXiv Detail & Related papers (2023-04-26T02:49:38Z)
Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence [45.9949173746044]
We show that large-size pre-trained language models (PLMs) do not satisfy the logical negation property (LNP). We propose a novel intermediate training task, named meaning-matching, designed to directly learn a meaning-text correspondence. We find that the task enables PLMs to learn lexical semantic information.
arXiv Detail & Related papers (2022-05-08T08:37:36Z)
Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets. Interventions and additional rounds of labeling can be performed to ameliorate the semantic bias of the hypothesis distribution of a dataset.
arXiv Detail & Related papers (2021-12-16T22:49:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.