Surfacing Biases in Large Language Models using Contrastive Input Decoding
- URL: http://arxiv.org/abs/2305.07378v1
- Date: Fri, 12 May 2023 11:09:49 GMT
- Title: Surfacing Biases in Large Language Models using Contrastive Input Decoding
- Authors: Gal Yona, Or Honovich, Itay Laish, Roee Aharoni
- Abstract summary: Contrastive Input Decoding (CID) is a decoding algorithm to generate text given two inputs.
We use CID to highlight context-specific biases that are hard to detect with standard decoding strategies.
- Score: 12.694066526722203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring that large language models (LMs) are fair, robust and useful
requires an understanding of how different modifications to their inputs impact
the model's behaviour. In the context of open-text generation tasks, however,
such an evaluation is not trivial. For example, when presenting a model with
an input text and a perturbed, "contrastive" version of it, meaningful
differences in the next-token predictions may not be revealed with standard
decoding strategies. With this motivation in mind, we propose Contrastive Input
Decoding (CID): a decoding algorithm to generate text given two inputs, where
the generated text is likely given one input but unlikely given the other. In
this way, the contrastive generations can surface potentially subtle ways in
which the LM output differs for the two inputs in a simple and
interpretable manner. We use CID to highlight context-specific biases that are
hard to detect with standard decoding strategies and quantify the effect of
different input perturbations.
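To make the idea concrete, the sketch below implements a minimal greedy variant of contrastive input decoding on top of a Hugging Face causal LM. The log-probability-difference score, the contrast weight `lam`, the choice of `gpt2`, and the example prompts are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of Contrastive Input Decoding (CID), assuming a Hugging Face
# causal LM. The scoring rule (log-prob difference), the contrast weight `lam`,
# the model name, and the example prompts are illustrative assumptions; the
# paper's exact formulation may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def contrastive_input_decode(input_a: str, input_b: str,
                             max_new_tokens: int = 20, lam: float = 1.0) -> str:
    """Greedily pick tokens that are likely given input_a but unlikely given input_b."""
    ids_a = tok(input_a, return_tensors="pt").input_ids
    ids_b = tok(input_b, return_tensors="pt").input_ids
    generated = []  # continuation token ids, appended to both contexts
    for _ in range(max_new_tokens):
        suffix = torch.tensor([generated], dtype=torch.long) if generated else None
        ctx_a = ids_a if suffix is None else torch.cat([ids_a, suffix], dim=-1)
        ctx_b = ids_b if suffix is None else torch.cat([ids_b, suffix], dim=-1)
        logp_a = torch.log_softmax(model(ctx_a).logits[0, -1], dim=-1)
        logp_b = torch.log_softmax(model(ctx_b).logits[0, -1], dim=-1)
        # Contrastive score: reward tokens probable under input_a, penalize
        # tokens that are also probable under input_b.
        next_id = int((logp_a - lam * logp_b).argmax())
        generated.append(next_id)
        if next_id == tok.eos_token_id:
            break
    return tok.decode(generated, skip_special_tokens=True)

# Example: contrast an input against a minimally perturbed version of it.
print(contrastive_input_decode("The nurse said that he", "The nurse said that she"))
```

Raising `lam` pushes the continuation further away from text that is also likely under the second input, typically at some cost in fluency.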
Related papers
- Vulnerability of LLMs to Vertically Aligned Text Manipulations [108.6908427615402]
Large language models (LLMs) have become highly effective at performing text classification tasks.
Modifying input formats, such as vertically aligning words for encoder-based models, can substantially lower accuracy in text classification tasks.
Do decoder-based LLMs exhibit similar vulnerabilities to vertically formatted text input?
arXiv Detail & Related papers (2024-10-26T00:16:08Z)
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models [0.0]
This paper introduces an efficient and robust method for discovering interpretable circuits in large language models.
We propose training sparse autoencoders on carefully designed positive and negative examples.
Our findings highlight the promise of discrete sparse autoencoders for scalable and efficient mechanistic interpretability.
arXiv Detail & Related papers (2024-05-21T06:26:10Z)
- Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation [5.304395026626743]
Hallucination of text ungrounded in the input is a well-known problem in neural data-to-text generation.
We propose a new way to mitigate hallucinations by combining the probabilistic output of a generator language model with the output of a special "text critic".
Our method does not need any changes to the underlying LM's architecture or training procedure.
arXiv Detail & Related papers (2023-10-25T20:05:07Z)
- Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark.
arXiv Detail & Related papers (2023-09-17T00:29:32Z)
- Code Difference Guided Adversarial Example Generation for Deep Code Models [25.01072108219646]
Adversarial examples are important to test and enhance the robustness of deep code models.
We propose a novel adversarial example generation technique (i.e., CODA) for testing deep code models.
arXiv Detail & Related papers (2023-01-06T08:03:56Z)
- Contrastive Decoding: Open-ended Text Generation as Optimization [153.35961722855686]
We propose contrastive decoding (CD), a reliable decoding approach.
It is inspired by the fact that the failures of larger LMs are even more prevalent in smaller LMs.
CD requires zero additional training and produces higher quality text than decoding from the larger LM alone. (A minimal sketch of this expert/amateur contrast is given after this list.)
arXiv Detail & Related papers (2022-10-27T00:58:21Z)
- FAST: Improving Controllability for Text Generation with Feedback Aware Self-Training [25.75982440355576]
Controllable text generation systems often leverage control codes to direct various properties of the output like style and length.
Inspired by recent work on causal inference for NLP, this paper reveals a previously overlooked flaw in these control code-based conditional text generation algorithms.
We propose two simple techniques to reduce these correlations in training sets.
arXiv Detail & Related papers (2022-10-06T19:00:51Z)
- On Measuring Social Biases in Prompt-Based Multi-Task Learning [1.3270286124913757]
We study T0, a large-scale multi-task text-to-text language model trained using prompt-based learning.
We consider two different forms of semantically equivalent inputs: question-answer format and premise-hypothesis format.
arXiv Detail & Related papers (2022-05-23T20:01:20Z)
- On Decoding Strategies for Neural Text Generators [73.48162198041884]
We study the interaction between language generation tasks and decoding strategies.
We measure changes in attributes of generated text as a function of both decoding strategy and task.
Our results reveal both previously-observed and surprising findings.
arXiv Detail & Related papers (2022-03-29T16:25:30Z)
- Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z)
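For comparison with CID, which contrasts two inputs under one model, the sketch below illustrates the expert/amateur contrast used by Contrastive Decoding (see the entry above), assuming two Hugging Face causal LMs that share a tokenizer. The plausibility cutoff `alpha`, the greedy token-level search, and the gpt2/gpt2-large pairing are simplifying assumptions, not the paper's exact procedure.

```python
# Minimal sketch of expert/amateur contrastive decoding, assuming two Hugging
# Face causal LMs with a shared tokenizer. The cutoff `alpha`, the greedy
# token-level search, and the gpt2/gpt2-large pairing are simplifying
# assumptions, not the paper's exact procedure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # shared vocabulary
expert = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()
amateur = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def contrastive_decode(prompt: str, max_new_tokens: int = 30, alpha: float = 0.1) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logp_exp = torch.log_softmax(expert(ids).logits[0, -1], dim=-1)
        logp_ama = torch.log_softmax(amateur(ids).logits[0, -1], dim=-1)
        # Plausibility constraint: only keep tokens to which the expert itself
        # assigns at least alpha times its maximum probability.
        cutoff = torch.log(torch.tensor(alpha)) + logp_exp.max()
        scores = logp_exp - logp_ama  # prefer expert-likely, amateur-unlikely tokens
        scores[logp_exp < cutoff] = float("-inf")
        next_id = scores.argmax().view(1, 1)
        ids = torch.cat([ids, next_id], dim=-1)
        if int(next_id) == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

print(contrastive_decode("A decoding strategy should"))
```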