Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models
- URL: http://arxiv.org/abs/2502.19160v1
- Date: Wed, 26 Feb 2025 14:15:28 GMT
- Title: Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models
- Authors: Rebekka Görge, Michael Mock, Héctor Allende-Cid,
- Abstract summary: Social categories and stereotypes are embedded in language and can introduce data bias into Large Language Models.<n>We propose a new approach that detects and quantifies the linguistic indicators of stereotypes in a sentence.
- Score: 0.9285295512807729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social categories and stereotypes are embedded in language and can introduce data bias into Large Language Models (LLMs). Despite safeguards, these biases often persist in model behavior, potentially leading to representational harm in outputs. While sociolinguistic research provides valuable insights into the formation of stereotypes, NLP approaches for stereotype detection rarely draw on this foundation and often lack objectivity, precision, and interpretability. To fill this gap, in this work we propose a new approach that detects and quantifies the linguistic indicators of stereotypes in a sentence. We derive linguistic indicators from the Social Category and Stereotype Communication (SCSC) framework which indicate strong social category formulation and stereotyping in language, and use them to build a categorization scheme. To automate this approach, we instruct different LLMs using in-context learning to apply the approach to a sentence, where the LLM examines the linguistic properties and provides a basis for a fine-grained assessment. Based on an empirical evaluation of the importance of different linguistic indicators, we learn a scoring function that measures the linguistic indicators of a stereotype. Our annotations of stereotyped sentences show that these indicators are present in these sentences and explain the strength of a stereotype. In terms of model performance, our results show that the models generally perform well in detecting and classifying linguistic indicators of category labels used to denote a category, but sometimes struggle to correctly evaluate the associated behaviors and characteristics. Using more few-shot examples within the prompts, significantly improves performance. Model performance increases with size, as Llama-3.3-70B-Instruct and GPT-4 achieve comparable results that surpass those of Mixtral-8x7B-Instruct, GPT-4-mini and Llama-3.1-8B-Instruct.
Related papers
- Scaling for Fairness? Analyzing Model Size, Data Composition, and Multilinguality in Vision-Language Bias [14.632649933582648]
We investigate how dataset composition, model size, and multilingual training affect gender and racial bias in a popular VLM, CLIP, and its open source variants.<n>To assess social perception bias, we measure the zero-shot performance on face images featuring socially charged terms.
arXiv Detail & Related papers (2025-01-22T21:08:30Z) - Large Language Models as Neurolinguistic Subjects: Discrepancy in Performance and Competence for Form and Meaning [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning)<n>We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers.<n>We found: (1) Psycholinguistic and neurolinguistic methods reveal that language performance and competence are distinct; (2) Direct probability measurement may not accurately assess linguistic competence; and (3) Instruction tuning won't change much competence but improve performance.
arXiv Detail & Related papers (2024-11-12T04:16:44Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs)
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess language models (LMs) linguistic competence.
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z) - Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
We show that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z) - On The Role of Reasoning in the Identification of Subtle Stereotypes in Natural Language [0.03749861135832073]
Large language models (LLMs) are trained on vast, uncurated datasets that contain various forms of biases and language reinforcing harmful stereotypes.
It is essential to examine and address biases in language models, integrating fairness into their development to ensure that these models do not perpetuate social biases.
This work firmly establishes reasoning as a critical component in automatic stereotype detection and is a first step towards stronger stereotype mitigation pipelines for LLMs.
arXiv Detail & Related papers (2023-07-24T15:12:13Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Counteracts: Testing Stereotypical Representation in Pre-trained
Language Models [4.211128681972148]
We use counterexamples to examine the internal stereotypical knowledge in pre-trained language models (PLMs)
We evaluate 7 PLMs on 9 types of cloze-style prompt with different information and base knowledge.
arXiv Detail & Related papers (2023-01-11T07:52:59Z) - Automated Speech Scoring System Under The Lens: Evaluating and
interpreting the linguistic cues for language proficiency [26.70127591966917]
We utilize classical machine learning models to formulate a speech scoring task as both a classification and a regression problem.
First, we extract linguist features under five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses.
In comparison, we find that the regression-based models perform equivalent to or better than the classification approach.
arXiv Detail & Related papers (2021-11-30T06:28:58Z) - Did the Cat Drink the Coffee? Challenging Transformers with Generalized
Event Knowledge [59.22170796793179]
Transformers Language Models (TLMs) were tested on a benchmark for the textitdynamic estimation of thematic fit
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.