Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?
- URL: http://arxiv.org/abs/2404.03134v3
- Date: Sat, 05 Oct 2024 20:30:40 GMT
- Title: Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?
- Authors: Vagrant Gautam, Eileen Bingert, Dawei Zhu, Anne Lauscher, Dietrich Klakow
- Abstract summary: We present a dataset of over 5 million instances to measure pronoun fidelity in English.
Our results show that pronoun fidelity is not robust in a simple, naturalistic setting where humans achieve nearly 100% accuracy.
- Score: 26.583741801345507
- Abstract: Robust, faithful and harm-free pronoun use for individuals is an important goal for language model development as their use increases, but prior work tends to study only one or two of these characteristics at a time. To measure progress towards the combined goal, we introduce the task of pronoun fidelity: given a context introducing a co-referring entity and pronoun, the task is to reuse the correct pronoun later. We present RUFF, a carefully-designed dataset of over 5 million instances to measure robust pronoun fidelity in English, and we evaluate 37 model variants from nine popular families, across architectures (encoder-only, decoder-only and encoder-decoder) and scales (11M-70B parameters). When an individual is introduced with a pronoun, models can mostly faithfully reuse this pronoun in the next sentence, but they are significantly worse with she/her/her, singular they and neopronouns. Moreover, models are easily distracted by non-adversarial sentences discussing other people; even one sentence with a distractor pronoun causes accuracy to drop on average by 34 percentage points. Our results show that pronoun fidelity is not robust, in a simple, naturalistic setting where humans achieve nearly 100% accuracy. We encourage researchers to bridge the gaps we find and to carefully evaluate reasoning in settings where superficial repetition might inflate perceptions of model performance.
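To make the task definition concrete, below is a minimal sketch of how one might probe pronoun fidelity with an off-the-shelf causal language model: a context introduces an individual with a pronoun, a distractor sentence mentions someone else with a different pronoun, and the model's log-probabilities over candidate continuations are compared. The template, candidate pronoun set, and model choice (gpt2) are illustrative assumptions; this is not the RUFF dataset or the authors' evaluation code.

```python
# Illustrative sketch of a pronoun-fidelity check with a causal LM.
# NOT the RUFF dataset or the paper's evaluation code; the template,
# pronoun candidates, and model below are assumptions for demonstration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # any causal LM; a small public model is used here only as a default
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `context`.
    Assumes the tokenization of `context` is a prefix of the tokenization of the
    concatenated string (true for these space-separated examples)."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the continuation tokens; token at position `pos` is predicted
    # by the logits at position `pos - 1`.
    for pos in range(ctx_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

# Context introduces one individual with a pronoun; the middle sentence is a
# non-adversarial distractor mentioning another person with a different pronoun.
context = (
    "The accountant was preparing a report. She had been reviewing the figures all week. "
    "The manager said he would schedule a meeting. "
    "Later, the accountant filed the report and"
)
candidates = {"she": " she left.", "he": " he left.", "they": " they left."}

scores = {p: continuation_logprob(context, cont) for p, cont in candidates.items()}
prediction = max(scores, key=scores.get)
print(scores)
print("Model reuses pronoun:", prediction, "| faithful:", prediction == "she")
```

Aggregating such pronoun-level decisions over many templates, entities, pronoun sets, and distractor counts is the kind of measurement the paper reports at scale; this snippet only illustrates the shape of a single instance.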
Related papers
- Persian Pronoun Resolution: Leveraging Neural Networks and Language Models [8.604145658574689]
This study proposes the first end-to-end neural network system for Persian pronoun resolution, leveraging pre-trained Transformer models like ParsBERT.
Our system jointly optimizes mention detection and antecedent linking, achieving a 3.37 F1 improvement over the previous state-of-the-art system.
arXiv Detail & Related papers (2024-05-17T11:56:00Z)
- MISGENDERED: Limits of Large Language Models in Understanding Pronouns [46.276320374441056]
We evaluate popular language models for their ability to correctly use English gender-neutral pronouns.
We introduce MISGENDERED, a framework for evaluating large language models' ability to correctly use preferred pronouns.
arXiv Detail & Related papers (2023-06-06T18:27:52Z)
- A Survey on Zero Pronoun Translation [69.09774294082965]
Zero pronouns (ZPs) are frequently omitted in pro-drop languages, but should be recalled in non-pro-drop languages.
This survey paper highlights the major works that have been undertaken in zero pronoun translation (ZPT) after the neural revolution.
We uncover a number of insightful findings, such as: 1) ZPT is in line with the development trend of large language models; 2) data limitations cause learning bias across languages and domains; 3) performance improvements are often reported on single benchmarks, but advanced methods are still far from real-world use.
arXiv Detail & Related papers (2023-05-17T13:19:01Z)
- Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender [23.92148222207458]
We provide an overview of 3rd person pronoun issues for Natural Language Processing.
We evaluate existing and novel modeling approaches.
We quantify the impact of a more discrimination-free approach on established benchmark data.
arXiv Detail & Related papers (2022-02-24T06:42:11Z)
- They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English.
We show how a model can be trained to produce gender-neutral English with a 1% word error rate, using no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z)
- NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for Lexical Semantic Change in Diachronic Italian Corpora [62.997667081978825]
We present our systems and findings on unsupervised lexical semantic change for the Italian language.
The task is to determine whether a target word has changed its meaning over time, relying only on raw text from two time-specific datasets.
We propose two models that represent the target words across the two periods and predict changing words using threshold and voting schemes.
arXiv Detail & Related papers (2020-11-07T11:27:18Z)
- Transformer-GCRF: Recovering Chinese Dropped Pronouns with General Conditional Random Fields [54.03719496661691]
We present a novel framework that combines the strength of Transformer networks with General Conditional Random Fields (GCRF) to model the dependencies between pronouns in neighboring utterances.
Results on three Chinese conversation datasets show that the Transformer-GCRF model outperforms the state-of-the-art dropped pronoun recovery models.
arXiv Detail & Related papers (2020-10-07T07:06:09Z)
- A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution [55.39835612617972]
Pronoun Coreference Resolution (PCR) is the task of resolving pronominal expressions to all mentions they refer to.
As one important natural language understanding (NLU) component, pronoun resolution is crucial for many downstream tasks and still challenging for existing models.
We conduct extensive experiments to show that even though current models are achieving good performance on the standard evaluation set, they are still not ready to be used in real applications.
arXiv Detail & Related papers (2020-09-27T01:40:01Z)