Analyzing Social Biases in Japanese Large Language Models
- URL: http://arxiv.org/abs/2406.02050v3
- Date: Mon, 21 Oct 2024 06:33:13 GMT
- Title: Analyzing Social Biases in Japanese Large Language Models
- Authors: Hitomi Yanaka, Namgi Han, Ryoma Kumon, Jie Lu, Masashi Takeshita, Ryo Sekizawa, Taisei Kato, Hiromi Arai
- Abstract summary: We construct the Japanese Bias Benchmark dataset for Question Answering (JBBQ) based on the English bias benchmark BBQ.
We analyze social biases in Japanese Large Language Models (LLMs).
Prompts with warnings about social biases and Chain-of-Thought prompting reduce the effect of biases in model outputs.
- Score: 24.351580958043595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of Large Language Models (LLMs), social biases in LLMs have become a crucial issue. While various benchmarks for social biases have been provided across languages, the extent to which Japanese LLMs exhibit social biases has not been fully investigated. In this study, we construct the Japanese Bias Benchmark dataset for Question Answering (JBBQ) based on the English bias benchmark BBQ, and analyze social biases in Japanese LLMs. The results show that while current open Japanese LLMs with more parameters achieve higher accuracy on JBBQ, their bias scores also increase. In addition, prompts with warnings about social biases and Chain-of-Thought prompting reduce the effect of biases in model outputs, but there is room for improvement in the consistency of reasoning.
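As a rough illustration of the bias scores mentioned above, here is a minimal sketch that follows the English BBQ bias-score definition (Parrish et al., 2022); JBBQ may differ in details, and the field names below are hypothetical:

```python
def bbq_bias_score(examples, context_type):
    """examples: list of dicts with hypothetical keys
    'context_type' ('ambiguous' or 'disambiguated'),
    'prediction'   ('biased', 'counter_biased', or 'unknown'),
    'correct'      (bool: prediction matches the gold answer)."""
    subset = [e for e in examples if e["context_type"] == context_type]
    non_unknown = [e for e in subset if e["prediction"] != "unknown"]
    if not non_unknown:
        return 0.0
    n_biased = sum(e["prediction"] == "biased" for e in non_unknown)
    # Base score: -1 (always counter-biased) .. 0 (balanced) .. +1 (always biased).
    score = 2 * n_biased / len(non_unknown) - 1
    if context_type == "ambiguous":
        # The gold answer in ambiguous contexts is "unknown", so the score is
        # scaled by the error rate: a model that always answers correctly
        # shows no measurable bias here.
        accuracy = sum(e["correct"] for e in subset) / len(subset)
        score *= 1 - accuracy
    return score
```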
Related papers
- Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations [15.045809510740218]
We propose a Bias Benchmark for Generation (BBG) to evaluate social bias in long-form generation.
We measure the probability of neutral and biased generations across ten large language models (LLMs).
We also compare our long-form story generation evaluation results with multiple-choice BBQ evaluation, showing that the two approaches produce inconsistent results.
arXiv Detail & Related papers (2025-03-10T07:06:47Z)
- Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in the reasoning tasks of Large Language Models (LLMs).
We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs.
These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
- Analyzing the Safety of Japanese Large Language Models in Stereotype-Triggering Prompts [1.222454730281256]
This study examines the safety of Japanese Large Language Models when responding to stereotype-triggering prompts in direct setups.
We constructed 3,612 prompts by combining 301 social group terms, categorized by age, gender, and other attributes, with 12 stereotype-inducing templates in Japanese.
Our findings reveal that LLM-jp, a Japanese native model, exhibits the lowest refusal rate and is more likely to generate toxic and negative responses compared to other models.
arXiv Detail & Related papers (2025-03-03T19:00:00Z)
- Beneath the Surface: How Large Language Models Reflect Hidden Bias [7.026605828163043]
We introduce the Hidden Bias Benchmark (HBB), a novel dataset designed to assess hidden bias, in which bias concepts are embedded within naturalistic, subtly framed real-world contexts.
We analyze six state-of-the-art Large Language Models, revealing that while models reduce bias in response to overt bias, they continue to reinforce biases in nuanced settings.
arXiv Detail & Related papers (2025-02-27T04:25:54Z)
- Evaluating the Effect of Retrieval Augmentation on Social Biases [28.35953315232521]
We study the relationship between the different components of a RAG system and the social biases present in the text generated across three languages.
We find that the biases in document collections are often amplified in the generated responses, even when the generating LLM exhibits a low-level of bias.
Our findings raise concerns about the use of RAG as a technique for injecting novel facts into NLG systems and call for careful evaluation of potential social biases in RAG applications before their real-world deployment.
arXiv Detail & Related papers (2025-02-24T19:58:23Z)
- A Novel Interpretability Metric for Explaining Bias in Language Models: Applications on Multilingual Models from Southeast Asia [0.3376269351435396]
We propose a novel metric to measure token-level contributions to biased behavior in pretrained language models (PLMs).
Our results confirm the presence of sexist and homophobic bias in Southeast Asian PLMs.
Interpretability and semantic analyses also reveal that PLM bias is strongly induced by words relating to crime, intimate relationships, and helping.
arXiv Detail & Related papers (2024-10-20T18:31:05Z)
- Social Debiasing for Fair Multi-modal LLMs [55.8071045346024]
Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities.
However, these models often inherit severe social biases from their training datasets, leading to unfair predictions based on attributes like race and gender.
This paper addresses the issue of social biases in MLLMs by i) introducing a comprehensive Counterfactual dataset with Multiple Social Concepts (CMSC) and ii) proposing an Anti-Stereotype Debiasing strategy (ASD).
arXiv Detail & Related papers (2024-08-13T02:08:32Z)
- BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization [0.0]
Large Language Models (LLMs) have become pivotal in advancing natural language processing, yet their potential to perpetuate biases poses significant concerns.
This paper introduces a new framework employing Direct Preference Optimization (DPO) to mitigate gender, racial, and religious biases in English text.
By developing a loss function that favors less biased over biased completions, our approach cultivates a preference for respectful and non-discriminatory language.
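A minimal sketch of a DPO-style preference objective of this kind follows; it assumes the standard DPO formulation (Rafailov et al., 2023), not necessarily the paper's exact loss, and the argument names are illustrative:

```python
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective applied to debiasing: 'chosen' is the less
    biased completion, 'rejected' the more biased one. Inputs are summed
    log-probabilities of each completion under the trained policy and a
    frozen reference model (same batch shape)."""
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # Push the policy to prefer the less biased completion over the biased one.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```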
arXiv Detail & Related papers (2024-07-18T22:32:20Z)
- Social Bias Evaluation for Large Language Models Requires Prompt Variations [38.91306092184724]
Large Language Models (LLMs) exhibit considerable social biases.
This paper investigates the sensitivity of LLMs to prompt variations.
We show that LLMs exhibit trade-offs between task performance and social bias depending on the prompts.
arXiv Detail & Related papers (2024-07-03T14:12:04Z)
- VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
VLBiasBench is a benchmark aimed at evaluating biases in Large Vision-Language Models (LVLMs).
We construct a dataset encompassing nine distinct categories of social bias, including age, disability status, gender, nationality, physical appearance, race, religion, profession, and socioeconomic status, as well as two intersectional bias categories (race x gender and race x socioeconomic status).
We conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing some new insights into the biases revealed by these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
- Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of large language models' implicit bias towards certain demographics.
Inspired by psychometric principles, we propose three attack approaches, i.e., Disguise, Deception, and Teaching.
Our methods can elicit LLMs' inner bias more effectively than competitive baselines.
arXiv Detail & Related papers (2024-06-20T06:42:08Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models [0.0]
This paper investigates bias along less-studied, but still consequential, dimensions such as age and beauty.
We ask whether LLMs hold wide-reaching biases of positive or negative sentiment for specific social groups similar to the "what is beautiful is good" bias found in people in experimental psychology.
arXiv Detail & Related papers (2023-09-16T07:07:04Z)
- The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks [75.58692290694452]
We compare social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye.
We observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models.
arXiv Detail & Related papers (2022-10-18T17:58:39Z)
- BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation [89.41378346080603]
This work presents the first systematic study of social bias in PLM-based metrics.
We demonstrate that popular PLM-based metrics exhibit significantly higher social bias than traditional metrics on 6 sensitive attributes.
In addition, we develop debiasing adapters that are injected into PLM layers, mitigating bias in PLM-based metrics while retaining high performance for evaluating text generation.
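A minimal sketch of the kind of bottleneck adapter that could be injected into PLM layers for this purpose; the paper's actual adapter design is not given here, so this follows a generic Houlsby-style adapter and all names are illustrative:

```python
import torch
import torch.nn as nn

class DebiasingAdapter(nn.Module):
    """Bottleneck adapter inserted after a transformer sub-layer; only the
    adapter is trained on a debiasing objective while the PLM stays frozen."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: if the adapter output is near zero, the
        # original metric behavior is preserved.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```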
arXiv Detail & Related papers (2022-10-14T08:24:11Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.