Related papers: On the Structural Non-Preservation of Epistemic Behaviour under Policy Transformation

URL: http://arxiv.org/abs/2602.21424v1
Date: Tue, 24 Feb 2026 22:55:21 GMT
Title: On the Structural Non-Preservation of Epistemic Behaviour under Policy Transformation
Authors: Alexander Galozy
Abstract summary: We formalise such information-conditioned interaction patterns as behavioural dependency. This induces a probe-relative notion of $ε$-behavioural equivalence and a within-policy behavioural distance. Results identify structural conditions under which probe-conditioned behavioural separation is not preserved under common policy transformations.
Score: 51.56484100374058
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) agents under partial observability often condition actions on internally accumulated information such as memory or inferred latent context. We formalise such information-conditioned interaction patterns as behavioural dependency: variation in action selection with respect to internal information under fixed observations. This induces a probe-relative notion of $ε$-behavioural equivalence and a within-policy behavioural distance that quantifies probe sensitivity. We establish three structural results. First, the set of policies exhibiting non-trivial behavioural dependency is not closed under convex aggregation. Second, behavioural distance contracts under convex combination. Third, we prove a sufficient local condition under which gradient ascent on a skewed mixture objective decreases behavioural distance when a dominant-mode gradient aligns with the direction of steepest contraction. Minimal bandit and partially observable gridworld experiments provide controlled witnesses of these mechanisms. In the examined settings, behavioural distance decreases under convex aggregation and under continued optimisation with skewed latent priors, and in these experiments it precedes degradation under latent prior shift. These results identify structural conditions under which probe-conditioned behavioural separation is not preserved under common policy transformations.
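
The first two structural results can be illustrated with a toy two-armed bandit. The sketch below is not the paper's construction: it assumes total variation distance between the action distributions under two internal states (with the observation held fixed) as a stand-in for behavioural distance, and the names tv_distance, behavioural_distance, mix, policy_a and policy_b are hypothetical. Under that assumption, two policies that each depend strongly on the internal state can average to a policy with no dependency at all, and the distance of any mixture is bounded by the weighted average of the component distances.

```python
# Illustrative sketch only. Total variation distance is an ASSUMED stand-in
# for the paper's behavioural distance; all names here are hypothetical.

def tv_distance(p, q):
    """Total variation distance between two discrete action distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def behavioural_distance(policy):
    """Distance between the action distributions under internal states 0 and 1,
    with the observation held fixed (there is only one observation here)."""
    return tv_distance(policy[0], policy[1])


def mix(policy_a, policy_b, weight):
    """Convex combination weight * policy_a + (1 - weight) * policy_b,
    applied separately for each internal state."""
    return {
        z: [weight * pa + (1.0 - weight) * pb
            for pa, pb in zip(policy_a[z], policy_b[z])]
        for z in policy_a
    }


# Two-armed bandit with a binary internal state z in {0, 1}.
# Each policy maps z to a distribution over the two arms.
policy_a = {0: [0.9, 0.1], 1: [0.1, 0.9]}  # prefers arm 0 when z = 0
policy_b = {0: [0.1, 0.9], 1: [0.9, 0.1]}  # the opposite dependency

d_a = behavioural_distance(policy_a)
d_b = behavioural_distance(policy_b)

# Equal-weight aggregation: both internal states now give the same
# action distribution, so the dependency disappears.
mixture = mix(policy_a, policy_b, 0.5)
d_mix = behavioural_distance(mixture)

print("distance of policy_a:", d_a)
print("distance of policy_b:", d_b)
print("distance of 50/50 mixture:", d_mix)
```

The bound on the mixture follows from the convexity of total variation distance, so it holds for any mixing weight under this assumed metric; whether it matches the paper's exact definition is not established by the abstract.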

Related papers

Partial Causal Structure Learning for Valid Selective Conformal Inference under Interventions [0.0]
In genomics experiments, exchangeability often holds only within subsets of interventions that leave a target variable "unaffected". Our contributions are: (i) a contamination-robust conformal coverage theorem that quantifies how misclassification of "unaffected" calibration examples degrades coverage via an explicit function $g$ of the contamination fraction and calibration set size $n$; and (ii) a task-driven partial causal learning formulation that estimates only the binary descendant indicators $Z_{a,i}=\mathbf{1}\{i\in\mathrm{desc}(a)\}$ needed
arXiv Detail & Related papers (2026-03-02T18:58:22Z)
Simulated Adoption: Decoupling Magnitude and Direction in LLM In-Context Conflict Resolution [3.0242762196828448]
Large Language Models (LLMs) frequently prioritize conflicting in-context information over pre-existing parametric memory. We show that models do not "unlearn" or suppress the magnitude of internal truths but rather employ a mechanism of geometric displacement.
arXiv Detail & Related papers (2026-02-04T06:13:11Z)
Causal Imitation Learning Under Measurement Error and Distribution Shift [6.038778620145853]
We study offline imitation learning (IL) when part of the decision-relevant state is observed only through noisy measurements. We propose a general framework for IL under measurement error, inspired by explicitly modeling the causal relationships among the variables.
arXiv Detail & Related papers (2026-01-29T18:06:53Z)
Causal Discovery with Mixed Latent Confounding via Precision Decomposition [0.0]
Differentiable and score-based DAG learners can misinterpret global latent effects as causal edges, while latent-variable graphical models recover only undirected structure. We propose DCL-DECOR, a modular, precision-led pipeline that separates these roles.
arXiv Detail & Related papers (2025-12-31T08:03:41Z)
Variational Learning of Disentangled Representations [2.3713407563738063]
Disentangled representations enable models to separate factors of variation that are shared across experimental conditions from those that are condition-specific. We introduce DISCoVeR, a new variational framework that explicitly separates condition-invariant and condition-specific factors. We show that DISCoVeR achieves improved disentanglement on synthetic datasets, natural images, and single-cell RNA-seq data.
arXiv Detail & Related papers (2025-06-20T17:36:12Z)
Unifying Perplexing Behaviors in Modified BP Attributions through Alignment Perspective [61.5509267439999]
We present a unified theoretical framework for methods like GBP, RectGrad, LRP, and DTD. We demonstrate that they achieve input alignment by combining the weights of activated neurons. This alignment improves the visualization quality and reduces sensitivity to weight randomization.
arXiv Detail & Related papers (2025-03-14T07:58:26Z)
Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance. We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics. We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
arXiv Detail & Related papers (2023-03-02T08:57:35Z)
Score-based Causal Representation Learning with Interventions [54.735484409244386]
This paper studies the causal representation learning problem when latent causal variables are observed indirectly. The objectives are: (i) recovering the unknown linear transformation (up to scaling) and (ii) determining the directed acyclic graph (DAG) underlying the latent variables.
arXiv Detail & Related papers (2023-01-19T18:39:48Z)
Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning. GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients. This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
arXiv Detail & Related papers (2022-06-08T21:32:50Z)
Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables. We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph. Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning applications such as education and healthcare. We develop an approach that estimates the bounds of a given policy. We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.