Related papers: A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

URL: http://arxiv.org/abs/2602.08964v1
Date: Mon, 09 Feb 2026 18:00:28 GMT
Title: A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents
Authors: Raghu Arghal, Fade Chen, Niall Dalton, Evgenii Kortukov, Calum McNamara, Angelos Nalmpantis, Moksh Nirvaan, Gabriele Sarti, Mario Giulianelli,
Abstract summary: We propose a framework for evaluating goal-directedness that integrates behavioural evaluation with interpretability-based analyses of models' internal representations.<n>We evaluate an agent against an optimal policy across varying grid sizes, obstacle densities, and goal structures.<n>We then use probing methods to decode the agent's internal representations of the environment state and its multi-step action plans.
Score: 8.007212170802807
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding an agent's goals helps explain and predict its behaviour, yet there is no established methodology for reliably attributing goals to agentic systems. We propose a framework for evaluating goal-directedness that integrates behavioural evaluation with interpretability-based analyses of models' internal representations. As a case study, we examine an LLM agent navigating a 2D grid world toward a goal state. Behaviourally, we evaluate the agent against an optimal policy across varying grid sizes, obstacle densities, and goal structures, finding that performance scales with task difficulty while remaining robust to difficulty-preserving transformations and complex goal structures. We then use probing methods to decode the agent's internal representations of the environment state and its multi-step action plans. We find that the LLM agent non-linearly encodes a coarse spatial map of the environment, preserving approximate task-relevant cues about its position and the goal location; that its actions are broadly consistent with these internal representations; and that reasoning reorganises them, shifting from broader environment structural cues toward information supporting immediate action selection. Our findings support the view that introspective examination is required beyond behavioural evaluations to characterise how agents represent and pursue their objectives.

Related papers

Agentic Reasoning for Large Language Models [122.81018455095999]
Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making.<n>Large language models (LLMs) demonstrate strong reasoning capabilities in closed-world settings, but struggle in open-ended and dynamic environments.<n>Agentic reasoning marks a paradigm shift by reframing LLMs as autonomous agents that plan, act, and learn through continual interaction.
arXiv Detail & Related papers (2026-01-18T18:58:23Z)
What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding [50.35012849818872]
Large language model (LLM) agents have demonstrated remarkable capabilities in complex decision-making and tool-use tasks.<n>We propose Task-to-Quiz (T2Q), a deterministic and automated evaluation paradigm designed to decouple task execution from world-state understanding.<n>Our experiments reveal that task success is often a poor proxy for environment understanding, and that current memory machanism can not effectively help agents acquire a grounded model of the environment.
arXiv Detail & Related papers (2026-01-14T14:09:11Z)
PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention [92.85371254435074]
PosA-VLA framework anchors visual attention via pose-conditioned supervision, consistently guiding the model's perception toward task-relevant regions.<n>We show that our method executes embodied tasks with precise and time-efficient behavior across diverse robotic manipulation benchmarks.
arXiv Detail & Related papers (2025-12-03T12:14:29Z)
Scaling Environments for LLM Agents in the Era of Learning from Interaction: A Survey [30.673419015614233]
A growing consensus is that agents should interact directly with environments and learn from experience through reinforcement learning.<n>We formalize this iterative process as the Generation-Execution-Feedback (GEF) loop, where environments generate tasks to challenge agents, return observations in response to agents' actions during task execution, and provide evaluative feedback on rollouts for subsequent learning.<n>Under this paradigm, environments function as indispensable producers of experiential data, highlighting the need to scale them toward greater complexity, realism, and interactivity.
arXiv Detail & Related papers (2025-11-12T12:56:25Z)
Decentralized Multi-Agent Goal Assignment for Path Planning using Large Language Models [7.94408712915778]
This work addresses the problem of decentralized goal assignment for multi-agent path planning.<n>Agents independently generate ranked preferences over goals based on structured representations of the environment.<n>We compare greedys, optimal assignment, and large language model (LLM)-based agents in fully observable grid-world settings.
arXiv Detail & Related papers (2025-10-27T20:05:56Z)
Goal-Directedness is in the Eye of the Beholder [48.937781898861815]
Probing for goal-directed behavior comes in two flavors: Behavioral and mechanistic.<n>We identify technical and conceptual problems that arise from formalizing goals in agent systems.<n>We outline new directions for modeling goal-directedness as an emergent property of dynamic, multi-agent systems.
arXiv Detail & Related papers (2025-08-18T11:04:18Z)
Adaptive Target Localization under Uncertainty using Multi-Agent Deep Reinforcement Learning with Knowledge Transfer [15.605693371392212]
This work proposes a novel MADRL-based method for target localization in uncertain environments.<n>The observations of the agents are designed in an optimized manner to capture essential information in the environment.<n>A Deep Learning model builds on the knowledge from the MADRL model to accurately estimating the target location if it is unreachable.
arXiv Detail & Related papers (2025-01-19T02:58:22Z)
Improving Zero-Shot ObjectNav with Generative Communication [60.84730028539513]
We propose a new method for improving zero-shot ObjectNav. Our approach takes into account that the ground agent may have limited and sometimes obstructed view.
arXiv Detail & Related papers (2024-08-03T22:55:26Z)
Goal Recognition Design for General Behavioral Agents using Machine Learning [18.19703033857065]
Goal recognition design (GRD) aims to make limited modifications to decision-making environments to make it easier to infer the goals of agents acting within those environments.<n>We leverage machine learning methods for goal recognition design that can both improve run-time efficiency and account for agents with general behavioral models.
arXiv Detail & Related papers (2024-04-03T20:38:22Z)
Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups. We experimentally prove the expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.