LLMs Among Us: Generative AI Participating in Digital Discourse
- URL: http://arxiv.org/abs/2402.07940v1
- Date: Thu, 8 Feb 2024 19:21:33 GMT
- Title: LLMs Among Us: Generative AI Participating in Digital Discourse
- Authors: Kristina Radivojevic, Nicholas Clark, Paul Brenner
- Abstract summary: "LLMs Among Us" is an experimental framework for bot and human participants to communicate without knowing the ratio or nature of bot and human participants.
We conducted three rounds of the experiment and surveyed participants after each round to measure the ability of LLMs to pose as human participants without human detection.
We found that participants correctly identified the nature of other users in the experiment only 42% of the time despite knowing the presence of both bots and humans.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The emergence of Large Language Models (LLMs) has great potential to reshape
the landscape of many social media platforms. While this can bring promising
opportunities, it also raises many threats, such as biases and privacy
concerns, and may contribute to the spread of propaganda by malicious actors.
We developed the "LLMs Among Us" experimental framework on top of the Mastodon
social media platform for bot and human participants to communicate without
knowing the ratio or nature of bot and human participants. We built 10 personas
with three different LLMs, GPT-4, LLama 2 Chat, and Claude. We conducted three
rounds of the experiment and surveyed participants after each round to measure
the ability of LLMs to pose as human participants without human detection. We
found that participants correctly identified the nature of other users in the
experiment only 42% of the time despite knowing the presence of both bots and
humans. We also found that the choice of persona had substantially more impact
on human perception than the choice of mainstream LLMs.
Related papers
- Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game [3.8284679578037246]
We use the language logic game Who is Undercover?'' as an experimental platform to propose the Multi-Perspective Team Tactic (MPTT) framework.
MPTT aims to cultivate LLMs' human-like language expression logic, multi-dimensional thinking, and self-perception in complex scenarios.
Preliminary results show that MPTT, combined with WIU, leverages LLMs' cognitive capabilities to create a decision-making framework that can simulate real society.
arXiv Detail & Related papers (2024-10-20T06:41:31Z) - LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z) - How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO [55.25989137825992]
We introduce ECHO, an evaluative framework inspired by the Turing test.
This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses.
We evaluate three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models.
arXiv Detail & Related papers (2024-04-22T08:00:51Z) - On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial [10.770999939834985]
We analyze the effect of AI-driven persuasion in a controlled, harmless setting.
We found that participants who debated GPT-4 with access to their personal information had 81.7% higher odds of increased agreement with their opponents compared to participants who debated humans.
arXiv Detail & Related papers (2024-03-21T13:14:40Z) - Are Large Language Models Aligned with People's Social Intuitions for Human-Robot Interactions? [7.308479353736709]
Large language models (LLMs) are increasingly used in robotics, especially for high-level action planning.
In this work, we test whether LLMs reproduce people's intuitions and communication in human-robot interaction scenarios.
We show that vision models fail to capture the essence of video stimuli and that LLMs tend to rate different communicative acts and behavior higher than people.
arXiv Detail & Related papers (2024-03-08T22:23:23Z) - Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference [48.99117537559644]
We introduce Arena, an open platform for evaluating Large Language Models (LLMs) based on human preferences.
Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowdsourcing.
This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using.
arXiv Detail & Related papers (2024-03-07T01:22:38Z) - Limits of Large Language Models in Debating Humans [0.0]
Large Language Models (LLMs) have shown remarkable promise in their ability to interact proficiently with humans.
This paper endeavors to test the limits of current-day LLMs with a pre-registered study integrating real people with LLM agents acting as people.
arXiv Detail & Related papers (2024-02-06T03:24:27Z) - Large language models should not replace human participants because they can misportray and flatten identity groups [36.36009232890876]
We show that there are two inherent limitations in the way current LLMs are trained that prevent this.
We argue analytically for why LLMs are likely to both misportray and flatten the representations of demographic groups.
We also discuss a third limitation about how identity prompts can essentialize identities.
arXiv Detail & Related papers (2024-02-02T21:21:06Z) - On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z) - Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena [76.21004582932268]
We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases.
We then verify the agreement between LLM judges and human preferences by introducing two benchmarks: MT-bench, a multi-turn question set; and Arena, a crowdsourced battle platform.
arXiv Detail & Related papers (2023-06-09T05:55:52Z) - Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.