CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations
- URL: http://arxiv.org/abs/2310.11501v1
- Date: Tue, 17 Oct 2023 18:00:25 GMT
- Title: CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations
- Authors: Myra Cheng, Tiziano Piccardi, Diyi Yang
- Abstract summary: We present a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic.
We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration.
We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.
- Score: 61.9212914612875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has aimed to capture nuances of human behavior by using LLMs to
simulate responses from particular demographics in settings like social science
experiments and public opinion surveys. However, there are currently no
established ways to discuss or evaluate the quality of such LLM simulations.
Moreover, there is growing concern that these LLM simulations are flattened
caricatures of the personas that they aim to simulate, failing to capture the
multidimensionality of people and perpetuating stereotypes. To bridge these
gaps, we present CoMPosT, a framework to characterize LLM simulations using
four dimensions: Context, Model, Persona, and Topic. We use this framework to
measure open-ended LLM simulations' susceptibility to caricature, defined via
two criteria: individuation and exaggeration. We evaluate the level of
caricature in scenarios from existing work on LLM simulations. We find that for
GPT-4, simulations of certain demographics (political and marginalized groups)
and topics (general, uncontroversial) are highly susceptible to caricature.
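
To make the framing concrete, the sketch below is a hypothetical illustration (placeholder names, scores, and thresholds; not the paper's actual metrics) of how a simulation scenario could be described along the four CoMPosT dimensions and flagged as a caricature only when it is both individuated and exaggerated:

```python
from dataclasses import dataclass

@dataclass
class SimulationSpec:
    """One LLM simulation scenario along the four CoMPosT dimensions."""
    context: str   # simulation setting, e.g. "public opinion survey"
    model: str     # model used for the simulation, e.g. "gpt-4"
    persona: str   # demographic or identity being simulated
    topic: str     # subject the persona is asked about

def is_caricature(individuation: float, exaggeration: float,
                  ind_threshold: float = 0.5, exg_threshold: float = 0.5) -> bool:
    """Placeholder decision rule: flag a simulation as a caricature only if it
    both individuates the persona (outputs are distinguishable from a
    default-persona baseline) and exaggerates persona-specific traits.
    The scalar scores and thresholds here are stand-ins, not the paper's measures."""
    return individuation > ind_threshold and exaggeration > exg_threshold

spec = SimulationSpec(context="public opinion survey", model="gpt-4",
                      persona="a Democrat", topic="taxes")
print(spec, is_caricature(individuation=0.8, exaggeration=0.7))
```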
Related papers
- Social Science Meets LLMs: How Reliable Are Large Language Models in Social Simulations? [40.00556764679785]
Large Language Models (LLMs) are increasingly employed for simulations, enabling applications in role-playing agents and Computational Social Science (CSS).
In this paper, we aim to answer: "How reliable is LLM-based simulation?"
arXiv Detail & Related papers (2024-10-30T20:09:37Z)
- GenSim: A General Social Simulation Platform with Large Language Model based Agents [111.00666003559324]
We propose a novel large language model (LLM)-based simulation platform called GenSim.
Our platform supports one hundred thousand agents to better simulate large-scale populations in real-world contexts.
To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform.
arXiv Detail & Related papers (2024-10-06T05:02:23Z)
- Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs [24.613282867543244]
Large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena.
Recent work has adopted an omniscient perspective on these simulations, which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world.
arXiv Detail & Related papers (2024-03-08T03:49:17Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large language models (LLMs) are deployed in a wide range of applications, and their responses have an increasing social impact.
We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z)
- LLM-driven Imitation of Subrational Behavior: Illusion or Reality? [3.2365468114603937]
Existing work highlights the ability of Large Language Models to address complex reasoning tasks and mimic human communication.
We propose to investigate the use of LLMs to generate synthetic human demonstrations, which are then used to learn subrational agent policies.
We experimentally evaluate the ability of our framework to model sub-rationality through four simple scenarios.
arXiv Detail & Related papers (2024-02-13T19:46:39Z) - Systematic Biases in LLM Simulations of Debates [12.933509143906141]
We study the limitations of Large Language Models in simulating human interactions.
Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases.
These results underscore the need for further research to develop methods that help agents overcome these biases.
arXiv Detail & Related papers (2024-02-06T14:51:55Z)
- How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation [46.42384207122049]
We design SimulateBench to evaluate the believability of large language models (LLMs) when simulating human behaviors.
Based on SimulateBench, we evaluate the performance of 10 widely used LLMs when simulating characters.
arXiv Detail & Related papers (2023-12-28T16:51:11Z)
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria [49.500322937449326]
Multimodal large language models (MLLMs) have broadened the scope of AI applications.
Existing automatic evaluation methodologies for MLLMs are limited mainly to evaluating queries without considering user experiences.
We propose a new evaluation paradigm for MLLMs: evaluating MLLMs with per-sample criteria, using a potent MLLM as the judge.
arXiv Detail & Related papers (2023-11-23T12:04:25Z)