Narrative-Guided Reinforcement Learning: A Platform for Studying Language Model Influence on Decision Making
- URL: http://arxiv.org/abs/2509.08785v1
- Date: Wed, 10 Sep 2025 17:14:12 GMT
- Title: Narrative-Guided Reinforcement Learning: A Platform for Studying Language Model Influence on Decision Making
- Authors: Anup Tuladhar, Araz Minhas, Adam Kirton, Eli Kinney-Lang
- Abstract summary: We present a preliminary platform that explores how narrative elements might shape AI decision-making. The system comprises a reinforcement learning policy that suggests actions based on past experience, and a language model that processes these suggestions through different narrative frameworks to guide decisions.
- Score: 0.20999222360659608
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present a preliminary experimental platform that explores how narrative elements might shape AI decision-making by combining reinforcement learning (RL) with language model reasoning. While AI systems can now both make decisions and engage in narrative reasoning, these capabilities have mostly been studied separately. Our platform attempts to bridge this gap using a dual-system architecture to examine how narrative frameworks could influence reward-based learning. The system comprises a reinforcement learning policy that suggests actions based on past experience, and a language model that processes these suggestions through different narrative frameworks to guide decisions. This setup enables initial experimentation with narrative elements while maintaining consistent environment and reward structures. We implement this architecture in a configurable gridworld environment, where agents receive both policy suggestions and information about their surroundings. The platform's modular design facilitates controlled testing of environmental complexity, narrative parameters, and the interaction between reinforcement learning and narrative-based decisions. Our logging system captures basic decision metrics, from RL policy values to language model reasoning to action selection patterns. While preliminary, this implementation provides a foundation for studying how different narrative frameworks might affect reward-based decisions and exploring potential interactions between optimization-based learning and symbolic reasoning in AI systems.
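The dual-system loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid size, reward values, tabular Q-learning policy, narrative names ("cautious", "neutral"), and every identifier below are hypothetical, and the language model is replaced by a hand-written rule so the example runs without any external service.

```python
import random
from dataclasses import dataclass

ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass(frozen=True)
class GridWorld:
    """Tiny configurable gridworld (sizes and rewards are made up)."""
    width: int = 4
    height: int = 4
    goal: tuple = (3, 3)
    hazards: frozenset = frozenset({(1, 0)})

    def step(self, pos, action):
        """Return (next_position, reward, done); moves are clipped at the border."""
        dx, dy = ACTIONS[action]
        nxt = (min(max(pos[0] + dx, 0), self.width - 1),
               min(max(pos[1] + dy, 0), self.height - 1))
        if nxt == self.goal:
            return nxt, 1.0, True
        if nxt in self.hazards:
            return nxt, -1.0, True
        return nxt, -0.01, False

    def surroundings(self, pos):
        """Describe what each action would lead to from `pos`."""
        view = {}
        for action in ACTIONS:
            nxt, _, _ = self.step(pos, action)
            if nxt == pos:
                view[action] = "wall"
            elif nxt == self.goal:
                view[action] = "goal"
            elif nxt in self.hazards:
                view[action] = "hazard"
            else:
                view[action] = "open"
        return view


def q_values(q_table, pos):
    """Look up the RL policy's value estimate for every action (default 0.0)."""
    return {a: q_table.get((pos, a), 0.0) for a in ACTIONS}


def narrative_decide(narrative, suggestion, view):
    """Stand-in for the language model: one hand-written rule per narrative."""
    if narrative == "cautious" and view[suggestion] == "hazard":
        safe = [a for a in ACTIONS if view[a] in ("open", "goal")]
        if safe:
            return safe[0], f"'{suggestion}' leads to a hazard; choosing '{safe[0]}'"
    return suggestion, f"following policy suggestion '{suggestion}'"


def run_episode(env, q_table, narrative, rng,
                alpha=0.5, gamma=0.9, eps=0.2, max_steps=50):
    """Run one episode and return a per-step decision log."""
    pos, log = (0, 0), []
    for _ in range(max_steps):
        qs = q_values(q_table, pos)
        # The RL policy suggests an action (epsilon-greedy over Q-values).
        suggestion = (rng.choice(list(ACTIONS)) if rng.random() < eps
                      else max(qs, key=qs.get))
        # The narrative layer may accept or override the suggestion.
        action, reasoning = narrative_decide(
            narrative, suggestion, env.surroundings(pos))
        nxt, reward, done = env.step(pos, action)
        # Q-learning update on the action actually taken (an assumption).
        future = 0.0 if done else gamma * max(q_values(q_table, nxt).values())
        q_table[(pos, action)] = qs[action] + alpha * (reward + future - qs[action])
        log.append({"pos": pos, "q_values": qs, "suggestion": suggestion,
                    "narrative": narrative, "reasoning": reasoning,
                    "action": action, "reward": reward})
        pos = nxt
        if done:
            break
    return log
```

Calling `run_episode(GridWorld(), {}, "cautious", random.Random(0))` returns a list of per-step records holding the policy values, the stand-in reasoning string, and the selected action, loosely mirroring the decision metrics the abstract says the logging system captures. Because the environment and rewards stay fixed, swapping the `narrative` argument is the only thing that changes between runs.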
Related papers
- Fuzzy, Symbolic, and Contextual: Enhancing LLM Instruction via Cognitive Scaffolding [3.553493344868413]
We study how architectural inductive biases influence the cognitive behavior of large language models (LLMs) in instructional dialogue. We introduce a symbolic scaffolding mechanism paired with a short-term memory schema designed to promote adaptive, structured reasoning in Socratic tutoring.
arXiv Detail & Related papers (2025-08-28T20:46:13Z)
- Matching Game Preferences Through Dialogical Large Language Models: A Perspective [0.6827423171182154]
This paper explores the future potential of "conversational intelligence" by examining how Large Language Models (LLMs) could be combined with GRAPHYP's network system. We propose a conceptual framework that could make AI reasoning transparent and traceable. The goal of this perspective is to envision AI systems that would not only provide answers but also show users how those answers were reached.
arXiv Detail & Related papers (2025-07-26T16:40:17Z)
- Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why [50.191655141020505]
This survey provides a comparative analysis of feature-based and GAN-based approaches to learning from demonstrations. We argue that the dichotomy between feature-based and GAN-based methods is increasingly nuanced.
arXiv Detail & Related papers (2025-07-08T11:45:51Z)
- Playpen: An Environment for Exploring Learning Through Conversational Interaction [81.67330926729015]
We investigate whether Dialogue Games can also serve as a source of feedback signals for learning. We introduce Playpen, an environment for off- and online learning through Dialogue Game self-play. We find that imitation learning through SFT improves performance on unseen instances, but negatively impacts other skills.
arXiv Detail & Related papers (2025-04-11T14:49:33Z)
- Training a Generally Curious Agent [86.84089201249104]
Paprika is a fine-tuning approach that enables language models to develop general decision-making capabilities. Paprika teaches models to explore and adapt their behavior on a new task based on environment feedback in-context without more gradient updates. Results suggest a promising path towards AI systems that can autonomously solve sequential decision-making problems.
arXiv Detail & Related papers (2025-02-24T18:56:58Z)
- What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition [20.742004197901576]
We propose a novel framework for zero-shot module composition, which encompasses existing and some novel variations for selecting, weighting, and combining parameter modules.
We conduct the first comprehensive benchmarking study of various zero-shot knowledge composition strategies.
Our results highlight the efficacy of ensembling but also hint at the power of simple though often-ignored weighting methods.
arXiv Detail & Related papers (2024-01-23T13:35:47Z)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning [53.83063435640911]
Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
We introduce a novel framework, JoTR, to generate flexible dialogue actions.
Unlike traditional methods, JoTR formulates a word-level policy that allows for a more dynamic and adaptable dialogue action generation.
arXiv Detail & Related papers (2023-09-01T03:19:53Z)
- Feature Interactions Reveal Linguistic Structure in Language Models [2.0178765779788495]
We study feature interactions in the context of feature attribution methods for post-hoc interpretability.
We work out a grey box methodology, in which we train models to perfection on a formal language classification task.
We show that under specific configurations, some methods are indeed able to uncover the grammatical rules acquired by a model.
arXiv Detail & Related papers (2023-06-21T11:24:41Z)
- Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference [82.28542500317445]
We present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues.
Unlike other methods, SPI does not require the inference network or assume a simple geometry of the posterior distribution.
arXiv Detail & Related papers (2023-06-01T21:23:13Z)
- Frugal Prompting for Dialog Models [17.048111072193933]
This study examines different approaches for building dialog systems using large language models (LLMs).
As part of prompt tuning, we experiment with various ways of providing instructions, exemplars, current query and additional context.
The research also analyzes the representations of dialog history that have the optimal usable-information density.
arXiv Detail & Related papers (2023-05-24T09:06:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.