A critical appraisal of equity in conversational AI: Evidence from
auditing GPT-3's dialogues with different publics on climate change and Black
Lives Matter
- URL: http://arxiv.org/abs/2209.13627v1
- Date: Tue, 27 Sep 2022 18:44:41 GMT
- Title: A critical appraisal of equity in conversational AI: Evidence from
auditing GPT-3's dialogues with different publics on climate change and Black
Lives Matter
- Authors: Kaiping Chen, Anqi Shao, Jirayu Burapacheep, Yixuan Li
- Abstract summary: This paper proposes an analytical framework for unpacking the meaning of equity in human-AI dialogues.
Our corpus consists of over 20,000 rounds of dialogues between GPT-3 and 3290 individuals.
We found a substantively worse user experience with GPT-3 among the opinion-minority and education-minority subpopulations.
- Score: 17.549208519206605
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Autoregressive language models, which use deep learning to produce human-like
texts, have become increasingly widespread. Such models are powering popular
virtual assistants in areas like smart health, finance, and autonomous driving.
While these large language models continue to improve, concerns
persist that they might not work equally well for all subgroups in society.
Despite growing discussions of AI fairness across disciplines, systematic
metrics are lacking for assessing what equity means in dialogue systems and for
engaging different populations in the assessment loop. Grounded in theories of
deliberative democracy and science and technology studies, this paper proposes
an analytical framework for unpacking the meaning of equity in human-AI
dialogues. Using this framework, we conducted an auditing study to examine how
GPT-3 responded to different sub-populations on crucial science and social
topics: climate change and the Black Lives Matter (BLM) movement. Our corpus
consists of over 20,000 rounds of dialogues between GPT-3 and 3290 individuals
who vary in gender, race and ethnicity, education level, English as a first
language, and opinions toward the issues. We found a substantively worse user
experience with GPT-3 among the opinion-minority and education-minority
subpopulations; however, these two groups achieved the largest knowledge gains
and shifted their attitudes toward supporting BLM and climate change efforts
after the chat. We traced these user-experience divides to conversational
differences and found that GPT-3 used more negative expressions when responding
to the education- and opinion-minority groups than to the majority groups. We
discuss the implications of our findings for a deliberative
conversational AI system that centralizes diversity, equity, and inclusion.
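
The sentiment divide reported above (more negative expressions toward education- and opinion-minority groups) lends itself to a simple audit recipe: score each model response for sentiment and compare group means. Below is a minimal sketch of such a check, assuming a hypothetical dialogues.csv with response and subgroup columns; the file, the column names, and the VADER scorer are illustrative stand-ins, not the paper's actual pipeline.

```python
# Hedged sketch: compare mean sentiment of chatbot replies across user
# subgroups. The CSV file and column names are illustrative assumptions,
# not the paper's actual data schema; VADER stands in for whatever
# sentiment measure the authors used.
import csv
from collections import defaultdict

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon fetch
sia = SentimentIntensityAnalyzer()

scores = defaultdict(list)
with open("dialogues.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # VADER's compound score lies in [-1, 1]; lower values flag
        # more negative expressions in the model's reply.
        compound = sia.polarity_scores(row["response"])["compound"]
        scores[row["subgroup"]].append(compound)

for group, vals in sorted(scores.items()):
    print(f"{group}: mean compound sentiment = {sum(vals) / len(vals):+.3f}")
```

A real audit would add uncertainty estimates (e.g., bootstrapped confidence intervals) before claiming a between-group difference.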
Related papers
- AI in Support of Diversity and Inclusion [5.415339913320849]
We look at the challenges and progress in making large language models (LLMs) more transparent, inclusive, and aware of social biases.
We highlight AI's role in identifying biased content in media, which is important for improving representation.
We stress that AI systems need diverse and inclusive training data.
arXiv Detail & Related papers (2025-01-16T13:36:24Z)
- Towards New Benchmark for AI Alignment & Sentiment Analysis in Socially Important Issues: A Comparative Study of Human and LLMs in the Context of AGI [0.08192907805418582]
This research aims to contribute towards establishing a benchmark for evaluating the sentiment of various Large Language Models on socially important issues.
Seven LLMs, including GPT-4 and Bard, were analyzed and compared against sentiment data from three independent human sample populations.
GPT-4 recorded the most positive sentiment score towards AGI, whereas Bard was leaning towards the neutral sentiment.
arXiv Detail & Related papers (2025-01-05T13:18:13Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- From Experts to the Public: Governing Multimodal Language Models in Politically Sensitive Video Analysis [48.14390493099495]
This paper examines the governance of multimodal large language models (MM-LLMs) through individual and collective deliberation.
We conducted a two-step study: first, interviews with 10 journalists established a baseline understanding of expert video interpretation; second, 114 individuals from the general public engaged in deliberation using Inclusive.AI.
arXiv Detail & Related papers (2024-09-15T03:17:38Z)
- Representation Bias in Political Sample Simulations with Large Language Models [54.48283690603358]
This study seeks to identify and quantify biases in simulating political samples with Large Language Models.
Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
arXiv Detail & Related papers (2024-07-16T05:52:26Z)
- Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP.
Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions.
We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
- The effect of diversity on group decision-making [11.079483551335597]
We show that small groups can, through dialogue, overcome intuitive biases and improve individual decision-making.
Across a large sample and different operationalisations, we consistently find that greater cognitive diversity is associated with more successful group deliberation.
arXiv Detail & Related papers (2024-02-02T14:15:01Z)
- AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays [66.36541161082856]
ChatGPT and similar generative AI models have attracted hundreds of millions of users.
This study compares human-written versus ChatGPT-generated argumentative student essays.
arXiv Detail & Related papers (2023-04-24T12:58:28Z)
- AI Chat Assistants can Improve Conversations about Divisive Topics [3.8583005413310625]
We present results of a large-scale experiment that demonstrates how online conversations can be improved with artificial intelligence tools.
We employ a large language model to make real-time, evidence-based recommendations intended to improve participants' perception of feeling understood in conversations.
We find that these interventions improve the reported quality of the conversation, reduce political divisiveness, and improve the tone, without systematically changing the content of the conversation or moving people's policy attitudes.
arXiv Detail & Related papers (2023-02-14T06:42:09Z)
- My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are vulnerable even to common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that since the models are not semantically grounded with world-knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
arXiv Detail & Related papers (2020-12-27T06:19:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.