Goal-Directedness is in the Eye of the Beholder
- URL: http://arxiv.org/abs/2508.13247v1
- Date: Mon, 18 Aug 2025 11:04:18 GMT
- Title: Goal-Directedness is in the Eye of the Beholder
- Authors: Nina Rajcic, Anders Søgaard
- Abstract summary: Probing for goal-directed behavior comes in two flavors: behavioral and mechanistic. We identify technical and conceptual problems that arise from formalizing goals in agent systems. We outline new directions for modeling goal-directedness as an emergent property of dynamic, multi-agent systems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our ability to predict the behavior of complex agents turns on the attribution of goals. Probing for goal-directed behavior comes in two flavors: Behavioral and mechanistic. The former proposes that goal-directedness can be estimated through behavioral observation, whereas the latter attempts to probe for goals in internal model states. We work through the assumptions behind both approaches, identifying technical and conceptual problems that arise from formalizing goals in agent systems. We arrive at the perhaps surprising position that goal-directedness cannot be measured objectively. We outline new directions for modeling goal-directedness as an emergent property of dynamic, multi-agent systems.
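The behavioral approach the abstract questions can be made concrete with a toy sketch. The score below, its names, and the normalization are illustrative assumptions, not the paper's formalism: a policy is rated by how near-optimal it looks under the best-explaining candidate reward function.

```python
import numpy as np

def goal_directedness_score(policy_returns, optimal_returns, random_returns):
    """Toy behavioral probe (our naming): rate a policy by how close it
    comes to optimal under the best-explaining candidate reward.

    Each argument is an array indexed by candidate reward function, giving
    the expected return of the evaluated policy, an optimal policy, and a
    uniform-random baseline policy under that reward."""
    policy_returns = np.asarray(policy_returns, dtype=float)
    optimal_returns = np.asarray(optimal_returns, dtype=float)
    random_returns = np.asarray(random_returns, dtype=float)
    # Normalize each reward so the random baseline maps to 0 and optimal to 1.
    span = optimal_returns - random_returns
    normalized = (policy_returns - random_returns) / np.where(span == 0, 1, span)
    # The policy counts as goal-directed to the extent that *some*
    # candidate reward models it as near-optimal.
    return float(np.max(normalized))
```

A score near 1 means some candidate reward function models the policy as near-optimal; note that the result depends entirely on which candidate rewards the observer supplies, which is exactly the kind of observer-dependence the abstract argues makes goal-directedness impossible to measure objectively.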
Related papers
- A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents [8.007212170802807]
We propose a framework for evaluating goal-directedness that integrates behavioural evaluation with interpretability-based analyses of models' internal representations. We evaluate an agent against an optimal policy across varying grid sizes, obstacle densities, and goal structures. We then use probing methods to decode the agent's internal representations of the environment state and its multi-step action plans.
arXiv Detail & Related papers (2026-02-09T18:00:28Z)
- Rejecting Hallucinated State Targets during Planning [84.179112256683]
In planning processes, generative or predictive models are often used to propose "targets" representing sets of expected or desirable states. Unfortunately, learned models inevitably hallucinate infeasible targets that can cause delusional behaviors and safety concerns. We devise a strategy to identify and reject infeasible targets by learning a target feasibility evaluator.
arXiv Detail & Related papers (2024-10-09T17:35:25Z)
- Towards Measuring Goal-Directedness in AI Systems [0.0]
A key prerequisite for AI systems to pursue unintended goals is that they behave in a coherent and goal-directed manner.
We propose a new family of definitions of the goal-directedness of a policy that analyze whether it is well-modeled as near-optimal for many reward functions.
Our contribution is a simpler, more easily computable definition of goal-directedness, aimed at the question of whether AI systems could pursue dangerous goals.
arXiv Detail & Related papers (2024-10-07T01:34:42Z)
- Habits and goals in synergy: a variational Bayesian framework for behavior [22.461524318820672]
How to behave efficiently and flexibly is a central problem for understanding biological agents and creating intelligent embodied AI.
It is well known that behavior falls into two types: reward-maximizing habitual behavior, which is fast but inflexible, and goal-directed behavior, which is flexible but slow.
We propose to bridge the gap between the two behaviors, drawing on the principles of variational Bayesian theory.
arXiv Detail & Related papers (2023-04-11T06:28:14Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Deceptive Decision-Making Under Uncertainty [25.197098169762356]
We study the design of autonomous agents that are capable of deceiving outside observers about their intentions while carrying out tasks.
By modeling the agent's behavior as a Markov decision process, we consider a setting where the agent aims to reach one of multiple potential goals.
We propose a novel approach to model observer predictions based on the principle of maximum entropy and to efficiently generate deceptive strategies.
arXiv Detail & Related papers (2021-09-14T14:56:23Z)
- Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation [1.20855096102517]
We present Goal-GAN, an interpretable and end-to-end trainable model for human trajectory prediction.
Inspired by human navigation, we model the task of trajectory prediction as an intuitive two-stage process.
arXiv Detail & Related papers (2020-10-02T17:17:45Z)
- Tracking Emotions: Intrinsic Motivation Grounded on Multi-Level Prediction Error Dynamics [68.8204255655161]
We discuss how emotions arise when differences between expected and actual rates of progress towards a goal are experienced.
We present an intrinsic motivation architecture that generates behaviors towards self-generated and dynamic goals.
arXiv Detail & Related papers (2020-07-29T06:53:13Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- Intrinsic Motivation for Encouraging Synergistic Behavior [55.10275467562764]
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions that affect the world in ways that would not be achieved if the agents were acting on their own.
arXiv Detail & Related papers (2020-02-12T19:34:51Z)
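Several entries above turn on how an outside observer attributes goals from behavior. The maximum-entropy observer in the deceptive decision-making entry can be sketched roughly as follows; the cost-reduction formulation and all names here are our simplifying assumptions, not that paper's exact model:

```python
import math

def observer_goal_belief(cost_to_go_before, cost_to_go_after, beta=1.0):
    """Sketch of a maximum-entropy observer (our formulation): the
    observer believes more strongly in goals the agent has made
    progress toward.

    cost_to_go_before / cost_to_go_after: dicts mapping each candidate
    goal to the remaining cost before and after the observed behavior."""
    # Progress toward each goal = reduction in remaining cost-to-go.
    progress = {g: cost_to_go_before[g] - cost_to_go_after[g]
                for g in cost_to_go_before}
    # Maximum-entropy (softmax) belief over goals given observed progress;
    # beta controls how sharply the observer commits to the best explanation.
    weights = {g: math.exp(beta * p) for g, p in progress.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}
```

Under this sketch, a deceptive agent would choose actions that keep the observer's belief spread across goals while still making progress toward its true goal, which illustrates why behavioral observation alone can leave the attributed goal underdetermined.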
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.