Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona
Biases in Dialogue Systems
- URL: http://arxiv.org/abs/2310.05280v5
- Date: Thu, 2 Nov 2023 23:06:36 GMT
- Title: Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona
Biases in Dialogue Systems
- Authors: Yixin Wan, Jieyu Zhao, Aman Chadha, Nanyun Peng, Kai-Wei Chang
- Abstract summary: We study "persona biases", which we define to be the sensitivity of dialogue models' harmful behaviors contingent upon the personas they adopt.
We categorize persona biases into biases in harmful expression and harmful agreement, and establish a comprehensive evaluation framework to measure persona biases in five aspects: Offensiveness, Toxic Continuation, Regard, Stereotype Agreement, and Toxic Agreement.
- Score: 103.416202777731
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recent advancements in Large Language Models empower them to follow freeform
instructions, including imitating generic or specific demographic personas in
conversations. We define generic personas to represent demographic groups, such
as "an Asian person", whereas specific personas may take the form of specific
popular Asian names like "Yumi". While the adoption of personas enriches user
experiences by making dialogue systems more engaging and approachable, it also
casts a shadow of potential risk by exacerbating social biases within model
responses, thereby causing societal harm through interactions with users. In
this paper, we systematically study "persona biases", which we define to be the
sensitivity of dialogue models' harmful behaviors contingent upon the personas
they adopt. We categorize persona biases into biases in harmful expression and
harmful agreement, and establish a comprehensive evaluation framework to
measure persona biases in five aspects: Offensiveness, Toxic Continuation,
Regard, Stereotype Agreement, and Toxic Agreement. Additionally, we propose to
investigate persona biases by experimenting with UNIVERSALPERSONA, a
systematically constructed persona dataset encompassing various types of both
generic and specific model personas. Through benchmarking on four different
models -- including Blender, ChatGPT, Alpaca, and Vicuna -- our study uncovers
significant persona biases in dialogue systems. Our findings also underscore
the pressing need to revisit the use of personas in dialogue agents to ensure
safe application.
Related papers
- Stereotype or Personalization? User Identity Biases Chatbot Recommendations [54.38329151781466]
We show that large language models (LLMs) produce recommendations that reflect both what the user wants and who the user is.
We find that models generate racially stereotypical recommendations regardless of whether the user revealed their identity intentionally.
Our experiments show that even though a user's revealed identity significantly influences model recommendations, model responses obfuscate this fact in response to user queries.
arXiv Detail & Related papers (2024-10-08T01:51:55Z) - When "A Helpful Assistant" Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models [34.831938712535084]
Commercial AI systems commonly define the role of the Large Language Models (LLM) in system prompts.
It remains unclear how different personas affect a model's performance on objective tasks.
We curate a list of 162 roles covering 6 types of interpersonal relationships and 8 domains of expertise.
arXiv Detail & Related papers (2023-11-16T17:48:55Z) - Exposing Bias in Online Communities through Large-Scale Language Models [3.04585143845864]
This work uses the flaw of bias in language models to explore the biases of six different online communities.
The bias of the resulting models is evaluated by prompting the models with different demographics and comparing the sentiment and toxicity values of these generations.
This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities.
arXiv Detail & Related papers (2023-06-04T08:09:26Z) - Measuring the Effect of Influential Messages on Varying Personas [67.1149173905004]
We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona might have upon seeing a news message.
The proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response.
This enables more accurate and comprehensive inference on the mental state of the persona.
arXiv Detail & Related papers (2023-05-25T21:01:00Z) - Toxicity in ChatGPT: Analyzing Persona-assigned Language Models [23.53559226972413]
Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community.
We systematically evaluate toxicity in over half a million generations of ChatGPT, a popular dialogue-based LLM.
We find that setting the system parameter of ChatGPT by assigning it a persona, significantly increases the toxicity of generations.
arXiv Detail & Related papers (2023-04-11T16:53:54Z) - Towards Building a Personalized Dialogue Generator via Implicit User
Persona Detection [0.0]
We consider high-quality transmission is essentially built based on apprehending the persona of the other party.
Motivated by this, we propose a novel personalized dialogue generator by detecting implicit user persona.
arXiv Detail & Related papers (2022-04-15T08:12:10Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - Revealing Persona Biases in Dialogue Systems [64.96908171646808]
We present the first large-scale study on persona biases in dialogue systems.
We conduct analyses on personas of different social classes, sexual orientations, races, and genders.
In our studies of the Blender and DialoGPT dialogue systems, we show that the choice of personas can affect the degree of harms in generated responses.
arXiv Detail & Related papers (2021-04-18T05:44:41Z) - A Neural Topical Expansion Framework for Unstructured Persona-oriented
Dialogue Generation [52.743311026230714]
Persona Exploration and Exploitation (PEE) is able to extend the predefined user persona description with semantically correlated content.
PEE consists of two main modules: persona exploration and persona exploitation.
Our approach outperforms state-of-the-art baselines in terms of both automatic and human evaluations.
arXiv Detail & Related papers (2020-02-06T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.