Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2511.20591v2
- Date: Thu, 27 Nov 2025 08:35:55 GMT
- Title: Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning
- Authors: Charlotte Beylier, Hannah Selder, Arthur Fleig, Simon M. Hofmann, Nico Scherf
- Abstract summary: We introduce a scientific methodology for analyzing the learning process through quantitative analysis of saliency. This approach aggregates saliency information at the object and modality level into hierarchical attention profiles. This methodology uncovers algorithm-specific attention biases, reveals unintended reward-driven strategies, and diagnoses overfitting to redundant sensory channels.
- Score: 4.662814261930481
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While deep reinforcement learning agents demonstrate high performance across domains, their internal decision processes remain difficult to interpret when evaluated only through performance metrics. In particular, it is poorly understood which input features agents rely on, how these dependencies evolve during training, and how they relate to behavior. We introduce a scientific methodology for analyzing the learning process through quantitative analysis of saliency. This approach aggregates saliency information at the object and modality level into hierarchical attention profiles, quantifying how agents allocate attention over time, thereby forming attention trajectories throughout training. Applied to Atari benchmarks, custom Pong environments, and muscle-actuated biomechanical user simulations in visuomotor interactive tasks, this methodology uncovers algorithm-specific attention biases, reveals unintended reward-driven strategies, and diagnoses overfitting to redundant sensory channels. These patterns correspond to measurable behavioral differences, demonstrating empirical links between attention profiles, learning dynamics, and agent behavior. To assess robustness of the attention profiles, we validate our findings across multiple saliency methods and environments. The results establish attention trajectories as a promising diagnostic axis for tracing how feature reliance develops during training and for identifying biases and vulnerabilities invisible to performance metrics alone.
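The core aggregation step described in the abstract — collapsing a pixel-level saliency map into an object-level attention profile, then tracking that profile across training checkpoints to form a trajectory — can be sketched as follows. This is a minimal illustration, not the paper's actual implementation; the function and object names are hypothetical, and the paper's method additionally aggregates at the modality level and validates across multiple saliency estimators.

```python
import numpy as np

def attention_profile(saliency, object_masks):
    """Aggregate a pixel-level saliency map into an object-level
    attention profile (non-negative fractions summing to 1)."""
    totals = {name: float(saliency[mask].sum())
              for name, mask in object_masks.items()}
    norm = sum(totals.values()) or 1.0
    return {name: s / norm for name, s in totals.items()}

# Toy example: a 4x4 saliency map over a Pong-like frame with
# two hypothetical objects, "ball" and "paddle".
saliency = np.zeros((4, 4))
saliency[0, 0] = 3.0   # concentrated saliency on the ball
saliency[3, :] = 1.0   # diffuse saliency along the paddle row

masks = {"ball": np.zeros((4, 4), dtype=bool),
         "paddle": np.zeros((4, 4), dtype=bool)}
masks["ball"][0, 0] = True
masks["paddle"][3, :] = True

profile = attention_profile(saliency, masks)
print(profile)  # ball: 3/7, paddle: 4/7

# An attention trajectory is this profile recomputed at each
# training checkpoint and stacked along the time axis.
```

Stacking such profiles over checkpoints yields a time series per object, which is what the paper compares across algorithms and environments to detect biases and shortcut strategies.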
Related papers
- GuideAI: A Real-time Personalized Learning Solution with Adaptive Interventions [0.5833117322405447]
Large Language Models (LLMs) have emerged as powerful learning tools, but they lack awareness of learners' cognitive and physiological states. We introduce GuideAI, a multi-modal framework that enhances LLM-driven learning by integrating real-time biosensory feedback.
arXiv Detail & Related papers (2026-01-28T09:06:45Z) - AI-Driven Evaluation of Surgical Skill via Action Recognition [4.92174988745803]
We propose an AI-driven framework for the automated assessment of microanastomosis performance. Performance is evaluated along five aspects of microanastomosis skill, including overall action execution, motion quality during procedure-critical actions, and general instrument handling.
arXiv Detail & Related papers (2025-12-30T18:45:34Z) - Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
Active Feature Acquisition is an instance-wise, sequential decision making problem. The aim is to dynamically select which feature to measure based on current observations, independently for each test instance. Common approaches either use Reinforcement Learning, which experiences training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which is myopic. We introduce a latent variable model, trained in a supervised manner. Acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
arXiv Detail & Related papers (2025-08-03T23:48:46Z) - Can you see how I learn? Human observers' inferences about Reinforcement Learning agents' learning processes [1.6874375111244329]
Reinforcement Learning (RL) agents often exhibit learning behaviors that are not intuitively interpretable by human observers. This work provides a data-driven account of the factors shaping human observers' understanding of the agent's learning process.
arXiv Detail & Related papers (2025-06-16T15:04:27Z) - Truly Self-Improving Agents Require Intrinsic Metacognitive Learning [59.60803539959191]
Self-improving agents aim to continuously acquire new capabilities with minimal supervision. Current approaches face two key limitations: their self-improvement processes are often rigid and fail to generalize across task domains, and they struggle to scale with increasing agent capabilities. We argue that effective self-improvement requires intrinsic metacognitive learning, defined as an agent's intrinsic ability to actively evaluate, reflect on, and adapt its own learning processes.
arXiv Detail & Related papers (2025-06-05T14:53:35Z) - Dynamic Programming Techniques for Enhancing Cognitive Representation in Knowledge Tracing [125.75923987618977]
We propose the Cognitive Representation Dynamic Programming based Knowledge Tracing (CRDP-KT) model. It uses a dynamic programming algorithm to optimize cognitive representations based on the difficulty of the questions and the performance intervals between them. It provides more accurate and systematic input features for subsequent model training, thereby minimizing distortion in the simulation of cognitive states.
arXiv Detail & Related papers (2025-06-03T14:44:48Z) - Interpretable Learning Dynamics in Unsupervised Reinforcement Learning [0.10832949790701804]
We present an interpretability framework for unsupervised reinforcement learning (URL) agents. We analyze five agents (DQN, RND, ICM, PPO, and a Transformer-RND variant) trained on procedurally generated environments.
arXiv Detail & Related papers (2025-05-06T19:57:09Z) - Revealing the Learning Process in Reinforcement Learning Agents Through Attention-Oriented Metrics [0.0]
We introduce attention-oriented metrics (ATOMs) to investigate the development of an RL agent's attention during training. ATOMs successfully delineate the attention patterns of an agent trained on each game variation, and these differences in attention patterns translate into differences in the agent's behaviour.
arXiv Detail & Related papers (2024-06-20T13:56:05Z) - Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z) - A Matter of Annotation: An Empirical Study on In Situ and Self-Recall Activity Annotations from Wearable Sensors [56.554277096170246]
We present an empirical study that evaluates and contrasts four commonly employed annotation methods in user studies focused on in-the-wild data collection.
For both the user-driven, in situ annotations, where participants annotate their activities during the actual recording process, and the recall methods, where participants retrospectively annotate their data at the end of each day, the participants had the flexibility to select their own set of activity classes and corresponding labels.
arXiv Detail & Related papers (2023-05-15T16:02:56Z) - Discovering Behavioral Predispositions in Data to Improve Human Activity Recognition [1.2961180148172198]
We propose to improve the recognition performance by making use of the observation that patients tend to show specific behaviors at certain times of the day or week.
All time segments within a cluster then consist of similar behaviors and thus indicate a behavioral predisposition (BPD).
Empirically, we demonstrate that when the BPD per time segment is known, activity recognition performance can be substantially improved.
arXiv Detail & Related papers (2022-07-18T10:07:15Z) - Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis [14.656957226255628]
We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains.
Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or models, and can be trained using entirely offline observational data.
arXiv Detail & Related papers (2022-06-17T23:07:33Z) - Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z) - Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks.
arXiv Detail & Related papers (2021-08-19T14:53:40Z) - What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents are agents that employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z) - Joint Attention for Multi-Agent Coordination and Social Learning [108.31232213078597]
We show that joint attention can be useful as a mechanism for improving multi-agent coordination and social learning.
Joint attention leads to higher performance than a competitive centralized critic baseline across multiple environments.
Taken together, these findings suggest that joint attention may be a useful inductive bias for multi-agent learning.
arXiv Detail & Related papers (2021-04-15T20:14:19Z) - Unsupervised Behaviour Analysis and Magnification (uBAM) using Deep Learning [5.101123537955207]
Motor behaviour analysis provides a non-invasive strategy for identifying motor impairment and its change caused by interventions.
We introduce unsupervised behaviour analysis and magnification (uBAM), an automatic deep learning algorithm for analysing behaviour by discovering and magnifying deviations.
A central aspect is unsupervised learning of posture and behaviour representations to enable an objective comparison of movement.
arXiv Detail & Related papers (2020-12-16T20:07:36Z) - Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - Untangling tradeoffs between recurrence and self-attention in neural networks [81.30894993852813]
We present a formal analysis of how self-attention affects gradient propagation in recurrent networks.
We prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies.
We propose a relevancy screening mechanism that allows for a scalable use of sparse self-attention with recurrence.
arXiv Detail & Related papers (2020-06-16T19:24:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.