Related papers: Surrogate Fitness Metrics for Interpretable Reinforcement Learning

Surrogate Fitness Metrics for Interpretable Reinforcement Learning

URL: http://arxiv.org/abs/2504.14645v1
Date: Sun, 20 Apr 2025 15:01:19 GMT
Title: Surrogate Fitness Metrics for Interpretable Reinforcement Learning
Authors: Philipp Altmann, Céline Davignon, Maximilian Zorn, Fabian Ritz, Claudia Linnhoff-Popien, Thomas Gabor,
Abstract summary: We employ an evolutionary optimization framework that perturbs initial states to generate informative and diverse policy demonstrations.<n>A joint surrogate fitness function guides the optimization by combining local diversity, behavioral certainty, and global population diversity.<n>By refining and systematically analyzing surrogate fitness functions, this study advances the interpretability of RL models.
Score: 7.889696505137217
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We employ an evolutionary optimization framework that perturbs initial states to generate informative and diverse policy demonstrations. A joint surrogate fitness function guides the optimization by combining local diversity, behavioral certainty, and global population diversity. To assess demonstration quality, we apply a set of evaluation metrics, including the reward-based optimality gap, fidelity interquartile means (IQMs), fitness composition analysis, and trajectory visualizations. Hyperparameter sensitivity is also examined to better understand the dynamics of trajectory optimization. Our findings demonstrate that optimizing trajectory selection via surrogate fitness metrics significantly improves interpretability of RL policies in both discrete and continuous environments. In gridworld domains, evaluations reveal significantly enhanced demonstration fidelities compared to random and ablated baselines. In continuous control, the proposed framework offers valuable insights, particularly for early-stage policies, while fidelity-based optimization proves more effective for mature policies. By refining and systematically analyzing surrogate fitness functions, this study advances the interpretability of RL models. The proposed improvements provide deeper insights into RL decision-making, benefiting applications in safety-critical and explainability-focused domains.

Related papers

Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance.<n>We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
Preference-Guided Diffusion for Multi-Objective Offline Optimization [64.08326521234228]
We propose a preference-guided diffusion model for offline multi-objective optimization. Our guidance is a preference model trained to predict the probability that one design dominates another. Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions.
arXiv Detail & Related papers (2025-03-21T16:49:38Z)
A Survey of Direct Preference Optimization [103.59317151002693]
Large Language Models (LLMs) have demonstrated unprecedented generative capabilities.<n>Their alignment with human values remains critical for ensuring helpful and harmless deployments.<n>Direct Preference Optimization (DPO) has recently gained prominence as a streamlined alternative.
arXiv Detail & Related papers (2025-03-12T08:45:15Z)
Self-Consistent Model-based Adaptation for Visual Reinforcement Learning [27.701421196547674]
Visual reinforcement learning agents face serious performance declines in real-world applications caused by visual distractions. Existing methods rely on fine-tuning the policy's representations with hand-crafted augmentations. We propose Self-Consistent Model-based Adaptation (SCMA), a novel method that fosters robust adaptation without modifying the policy.
arXiv Detail & Related papers (2025-02-14T05:23:56Z)
Learning Dynamic Representations via An Optimally-Weighted Maximum Mean Discrepancy Optimization Framework for Continual Learning [16.10753846850319]
Continual learning allows models to persistently acquire and retain information.<n> catastrophic forgetting can severely impair model performance.<n>We introduce a novel framework termed Optimally-Weighted Mean Discrepancy (OWMMD), which imposes penalties on representation alterations.
arXiv Detail & Related papers (2025-01-21T13:33:45Z)
REACT: Revealing Evolutionary Action Consequence Trajectories for Interpretable Reinforcement Learning [7.889696505137217]
We propose Revealing Evolutionary Action Consequence Trajectories (REACT) to enhance the interpretability of Reinforcement Learning (RL) In contrast to the prevalent practice of RL models based on their optimal behavior learned during training, we posit that considering a range of edge-case trajectories provides a more comprehensive understanding of their inherent behavior. Our results highlight its effectiveness in revealing nuanced aspects of RL models' behavior beyond optimal performance, thereby contributing to improved interpretability.
arXiv Detail & Related papers (2024-04-04T10:56:30Z)
Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [73.80728148866906]
Quasimetric Reinforcement Learning (QRL) is a new RL method that utilizes quasimetric models to learn optimal value functions. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance.
arXiv Detail & Related papers (2023-04-03T17:59:58Z)
When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of expertise in executing a task has applications in safety-sensitive applications such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control [0.0]
We show that the principal AlphaZero/TDGammon ideas of approximation in value space and rollout apply very broadly to deterministic and optimal control problems. These ideas can be effectively integrated with other important methodologies such as model control, adaptive control, decentralized control, and neural network-based value and policy approximations.
arXiv Detail & Related papers (2021-08-20T19:17:35Z)
Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
empowerment principle enables unsupervised stabilization of dynamical systems at upright positions. We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel. We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
Value Driven Representation for Human-in-the-Loop Reinforcement Learning [33.79501890330252]
We focus on algorithmic foundation of how to help the system designer choose the set of sensors or features to define the observation space used by reinforcement learning agent. We present an algorithm, value driven representation (VDR) that can iteratively and adaptively augment the observation space of a reinforcement learning agent. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
arXiv Detail & Related papers (2020-04-02T18:45:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.