Breaking the Assistant Mold: Modeling Behavioral Variation in LLM Based Procedural Character Generation
- URL: http://arxiv.org/abs/2601.03396v1
- Date: Tue, 06 Jan 2026 20:18:01 GMT
- Title: Breaking the Assistant Mold: Modeling Behavioral Variation in LLM Based Procedural Character Generation
- Authors: Maan Qraitem, Kate Saenko, Bryan A. Plummer
- Abstract summary: Procedural content generation has enabled vast virtual worlds through levels, maps, and quests, but large-scale character generation remains underexplored. We identify two alignment-induced biases in existing methods. We introduce PersonaWeaver, a framework that disentangles world-building from behavioral-building.
- Score: 62.54606886226136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Procedural content generation has enabled vast virtual worlds through levels, maps, and quests, but large-scale character generation remains underexplored. We identify two alignment-induced biases in existing methods: a positive moral bias, where characters uniformly adopt agreeable stances (e.g. always saying lying is bad), and a helpful assistant bias, where characters invariably answer questions directly (e.g. never refusing or deflecting). While such tendencies suit instruction-following systems, they suppress dramatic tension and yield predictable characters, stemming from maximum likelihood training and assistant fine-tuning. To address this, we introduce PersonaWeaver, a framework that disentangles world-building (roles, demographics) from behavioral-building (moral stances, interactional styles), yielding characters with more diverse reactions and moral stances, as well as second-order diversity in stylistic markers like length, tone, and punctuation. Code: https://github.com/mqraitem/Persona-Weaver
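The core idea in the abstract, sampling world-building attributes (roles, demographics) independently of behavioral attributes (moral stances, interactional styles), can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's implementation; the attribute lists and the `sample_character` function are illustrative assumptions.

```python
import random

# Hypothetical attribute pools. World-building traits are drawn
# independently of behavioral traits, so agreeable, always-helpful
# defaults are not baked into every character.
WORLD = {
    "role": ["blacksmith", "smuggler", "healer", "noble"],
    "age": ["young", "middle-aged", "elderly"],
}

BEHAVIOR = {
    "moral_stance": ["principled", "pragmatic", "self-serving", "deceptive"],
    "interaction_style": ["direct", "evasive", "terse", "refuses readily"],
}


def sample_character(rng: random.Random) -> dict:
    """Draw world-building and behavioral traits independently."""
    character = {k: rng.choice(v) for k, v in WORLD.items()}
    character.update({k: rng.choice(v) for k, v in BEHAVIOR.items()})
    return character


rng = random.Random(0)
print(sample_character(rng))
```

Because the two attribute groups are sampled from separate pools, a character's moral stance is uncorrelated with its role, which is one way to obtain the behavioral diversity the abstract describes.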
Related papers
- MM-SCALE: Grounded Multimodal Moral Reasoning via Scalar Judgment and Listwise Alignment [48.39756797294967]
We present MM-SCALE, a dataset for aligning Vision-Language Models with human moral preferences. Each image-scenario pair is annotated with moral acceptability scores and grounded reasoning labels by humans. Our framework provides richer alignment signals and finer calibration of multimodal moral reasoning.
arXiv Detail & Related papers (2026-02-03T15:48:00Z) - Fame Fades, Nature Remains: Disentangling the Character Identity of Role-Playing Agents [13.029517493304505]
We propose a multidimensional construct that disentangles a character into two distinct layers: (1) Parametric Identity, referring to character-specific knowledge encoded from the LLM's pre-training, and (2) Attributive Identity, capturing fine-grained behavioral properties such as personality traits and moral values. Our findings pinpoint negative social natures as the primary bottleneck in RPA fidelity, guiding future character construction and evaluation.
arXiv Detail & Related papers (2026-01-08T08:33:40Z) - Too Good to be Bad: On the Failure of LLMs to Role-Play Villains [69.0500092126915]
Large Language Models (LLMs) are increasingly tasked with creative generation, including the simulation of fictional characters. We hypothesize that the safety alignment of modern LLMs creates a fundamental conflict with the task of authentically role-playing morally ambiguous or villainous characters. We introduce the Moral RolePlay benchmark, a new dataset featuring a four-level moral alignment scale and a balanced test set for rigorous evaluation. Our large-scale evaluation reveals a consistent, monotonic decline in role-playing fidelity as character morality decreases.
arXiv Detail & Related papers (2025-11-07T03:50:52Z) - Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts [2.2816872489992135]
We introduce Beyond One World, a benchmark for character-grounded roleplay spanning 30 iconic heroes and 90 canon-specific versions. We score responses for canonical accuracy and reasoning fidelity. We propose Think-Act Matching, a metric that quantifies alignment between reasons and actions.
arXiv Detail & Related papers (2025-10-16T06:39:27Z) - CharaConsist: Fine-Grained Consistent Character Generation [93.08900337098302]
CharaConsist is the first consistent generation method tailored for text-to-image DiT models. CharaConsist enables fine-grained consistency for both foreground and background. Its ability to maintain fine-grained consistency, combined with the larger capacity of the latest base models, enables it to produce high-quality visual outputs.
arXiv Detail & Related papers (2025-07-15T17:58:08Z) - Can LLM Agents Maintain a Persona in Discourse? [3.286711575862228]
Large Language Models (LLMs) are widely used as conversational agents, exploiting their capabilities in various sectors such as education, law, medicine, and more. LLMs are often subjected to context-shifting behaviour, resulting in a lack of consistent and interpretable personality-aligned interactions. We show that while LLMs can be guided toward personality-driven dialogue, their ability to maintain personality traits varies significantly depending on the combination of models and discourse settings.
arXiv Detail & Related papers (2025-02-17T14:36:39Z) - CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds [74.02480671181685]
Role-playing is a crucial capability of Large Language Models (LLMs). Current evaluation methods fall short of adequately capturing the nuanced character traits and behaviors essential for authentic role-playing. We propose CharacterBox, a simulation sandbox designed to generate situational fine-grained character behavior trajectories.
arXiv Detail & Related papers (2024-12-07T12:09:35Z) - CHIRON: Rich Character Representations in Long-Form Narratives [98.273323001781]
We propose CHIRON, a new 'character sheet'-based representation that organizes and filters textual information about characters. We validate CHIRON via the downstream task of masked-character prediction, where our experiments show CHIRON is better and more flexible than comparable summary-based baselines. We show that metrics derived from CHIRON can be used to automatically infer character-centricity in stories, and that these metrics align with human judgments.
arXiv Detail & Related papers (2024-06-14T17:23:57Z) - CharacterGPT: A Persona Reconstruction Framework for Role-Playing Agents [6.220415006158471]
We introduce CharacterGPT, a framework designed to dynamically reconstruct character personas through Character Persona Training (CPT). This approach incrementally updates personas by extracting traits from chapter-wise novel summaries, reflecting the progression of the narrative. Our framework is evaluated through Big Five personality evaluations and creative tasks, in which characters generate original narratives.
arXiv Detail & Related papers (2024-05-30T07:44:16Z) - MoralBERT: A Fine-Tuned Language Model for Capturing Moral Values in Social Discussions [4.747987317906765]
Moral values play a fundamental role in how we evaluate information, make decisions, and form judgements around important social issues.
Recent advances in Natural Language Processing (NLP) show that moral values can be gauged in human-generated textual content.
This paper introduces MoralBERT, a range of language representation models fine-tuned to capture moral sentiment in social discourse.
arXiv Detail & Related papers (2024-03-12T14:12:59Z)