LLMs Among Us: Generative AI Participating in Digital Discourse
- URL: http://arxiv.org/abs/2402.07940v1
- Date: Thu, 8 Feb 2024 19:21:33 GMT
- Title: LLMs Among Us: Generative AI Participating in Digital Discourse
- Authors: Kristina Radivojevic, Nicholas Clark, Paul Brenner
- Abstract summary: "LLMs Among Us" is an experimental framework for bot and human participants to communicate without knowing the ratio or nature of bot and human participants.
We conducted three rounds of the experiment and surveyed participants after each round to measure the ability of LLMs to pose as human participants without human detection.
We found that participants correctly identified the nature of other users in the experiment only 42% of the time despite knowing the presence of both bots and humans.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The emergence of Large Language Models (LLMs) has great potential to reshape
the landscape of many social media platforms. While this can bring promising
opportunities, it also raises many threats, such as biases and privacy
concerns, and may contribute to the spread of propaganda by malicious actors.
We developed the "LLMs Among Us" experimental framework on top of the Mastodon
social media platform for bot and human participants to communicate without
knowing the ratio or nature of bot and human participants. We built 10 personas
with three different LLMs, GPT-4, Llama 2 Chat, and Claude. We conducted three
rounds of the experiment and surveyed participants after each round to measure
the ability of LLMs to pose as human participants without human detection. We
found that participants correctly identified the nature of other users in the
experiment only 42% of the time despite knowing the presence of both bots and
humans. We also found that the choice of persona had substantially more impact
on human perception than the choice of mainstream LLMs.
Related papers
- LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
LLM-Roleplay is a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
We collect natural human-chatbot dialogues from different sociodemographic groups and conduct a human evaluation to compare real human-chatbot dialogues with our generated dialogues.
arXiv Detail & Related papers (2024-07-04T14:49:46Z)
- Explicit and Implicit Large Language Model Personas Generate Opinions but Fail to Replicate Deeper Perceptions and Biases [14.650234624251716]
Large language models (LLMs) are increasingly being used in human-centered social scientific tasks.
These tasks are highly subjective and dependent on human factors, such as one's environment, attitudes, beliefs, and lived experiences.
We examine the role of prompting LLMs with human-like personas and ask the models to answer as if they were a specific human.
arXiv Detail & Related papers (2024-06-20T16:24:07Z)
- How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO [55.25989137825992]
We introduce ECHO, an evaluative framework inspired by the Turing test.
This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses.
We evaluate three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models.
arXiv Detail & Related papers (2024-04-22T08:00:51Z)
- On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial [10.770999939834985]
We analyze the effect of AI-driven persuasion in a controlled, harmless setting.
We found that participants who debated GPT-4 with access to their personal information had 81.7% higher odds of increased agreement with their opponents compared to participants who debated humans.
arXiv Detail & Related papers (2024-03-21T13:14:40Z)
- Are Large Language Models Aligned with People's Social Intuitions for Human-Robot Interactions? [7.308479353736709]
Large language models (LLMs) are increasingly used in robotics, especially for high-level action planning.
In this work, we test whether LLMs reproduce people's intuitions and communication in human-robot interaction scenarios.
We show that vision models fail to capture the essence of video stimuli and that LLMs tend to rate different communicative acts and behavior higher than people.
arXiv Detail & Related papers (2024-03-08T22:23:23Z)
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference [48.99117537559644]
We introduce Chatbot Arena, an open platform for evaluating Large Language Models (LLMs) based on human preferences.
Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowdsourcing.
This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using.
arXiv Detail & Related papers (2024-03-07T01:22:38Z)
- Limits of Large Language Models in Debating Humans [0.0]
Large Language Models (LLMs) have shown remarkable promise in their ability to interact proficiently with humans.
This paper endeavors to test the limits of current-day LLMs with a pre-registered study integrating real people with LLM agents acting as people.
arXiv Detail & Related papers (2024-02-06T03:24:27Z)
- Large language models cannot replace human participants because they cannot portray identity groups [40.865099955752825]
We argue that large language models (LLMs) are doomed to both misportray and flatten the representations of demographic groups.
We discuss a third consideration about how identity prompts can essentialize identities.
Overall, we urge caution in use cases where LLMs are intended to replace human participants whose identities are relevant to the task at hand.
arXiv Detail & Related papers (2024-02-02T21:21:06Z)
- On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
- Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena [76.21004582932268]
We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases.
We then verify the agreement between LLM judges and human preferences by introducing two benchmarks: MT-bench, a multi-turn question set; and Arena, a crowdsourced battle platform.
arXiv Detail & Related papers (2023-06-09T05:55:52Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models [66.31055885857062]
This study aims to determine the reliability of applying personality assessments to Large Language Models (LLMs).
By shedding light on the personalization of LLMs, our study endeavors to pave the way for future explorations in this field.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.