Related papers: Active Inference with Reusable State-Dependent Value Profiles

Active Inference with Reusable State-Dependent Value Profiles

URL: http://arxiv.org/abs/2512.11829v1
Date: Wed, 03 Dec 2025 04:11:57 GMT
Title: Active Inference with Reusable State-Dependent Value Profiles
Authors: Jacob Poschl,
Abstract summary: We introduce value profiles: a small set of reusable bundles of value-related parameters assigned to hidden states in a generative model.<n>We evaluate this framework in probabilistic reversal learning, comparing static-precision, entropy-coupled dynamic-precision, and profile-based models.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adaptive behavior in volatile environments requires agents to switch among value-control regimes across latent contexts, but maintaining separate preferences, policy biases, and action-confidence parameters for every situation is intractable. We introduce value profiles: a small set of reusable bundles of value-related parameters (outcome preferences, policy priors, and policy precision) assigned to hidden states in a generative model. As posterior beliefs over states evolve trial by trial, effective control parameters arise via belief-weighted mixing, enabling state-conditional strategy recruitment without requiring independent parameters for each context. We evaluate this framework in probabilistic reversal learning, comparing static-precision, entropy-coupled dynamic-precision, and profile-based models using cross-validated log-likelihood and information criteria. Model comparison favors the profile-based model over simpler alternatives (about 100-point AIC differences), and parameter-recovery analyses support structural identifiability even when context must be inferred from noisy observations. Model-based inference further suggests that adaptive control in this task is driven primarily by modulation of policy priors rather than policy precision, with gradual belief-dependent profile recruitment consistent with state-conditional (not purely uncertainty-driven) control. Overall, reusable value profiles provide a tractable computational account of belief-conditioned value control in volatile environments and yield testable signatures of belief-dependent control and behavioral flexibility.

Related papers

How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics [65.67654005892469]
We show that proper instance-dependent sampling can yield stronger ranking guarantees, while skewed on-policy sampling can induce excessive concentration under structured preferences.<n>We then analyze iterative alignment dynamics in which the learned policy feeds back into future sampling and reference policies.<n>Our theoretical insights extend to Direct Preference Optimization, indicating the phenomena we captured are common to a broader class of preference-alignment methods.
arXiv Detail & Related papers (2026-02-12T17:11:08Z)
Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates.<n>SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence.<n> Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z)
Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.<n>The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.<n>The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
Automatically Adaptive Conformal Risk Control [49.95190019041905]
We propose a methodology for achieving approximate conditional control of statistical risks by adapting to the difficulty of test samples.<n>Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning.
arXiv Detail & Related papers (2024-06-25T08:29:32Z)
Belief-State Query Policies for User-Aligned POMDPs [18.821166966365315]
We present a novel framework for expressing users' constraints and preferences about agent behavior in a partially observable setting.<n>We present the first formal analysis of such constraints and prove that while the expected cost function of a parameterized BSQ policy w.r.t its parameters is not convex, it is piecewise constant.<n>This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior with guaranteed user alignment.
arXiv Detail & Related papers (2024-05-24T20:04:51Z)
Learning the Uncertainty Sets for Control Dynamics via Set Membership: A Non-Asymptotic Analysis [18.110158316883403]
This paper focuses on set membership estimation (SME) for unknown linear systems. We provide the first convergence rate bounds for SME and discuss variations of SME under relaxed assumptions. We also provide numerical results demonstrating SME's practical promise.
arXiv Detail & Related papers (2023-09-26T03:58:06Z)
$K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control [0.6906005491572401]
We propose a novel $K$-nearest neighbor reparametric procedure for estimating the performance of a policy from historical data. Our analysis allows for the sampling of entire episodes, as is common practice in most applications. Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics.
arXiv Detail & Related papers (2023-06-07T23:55:12Z)
Hallucinated Adversarial Control for Conservative Offline Policy Evaluation [64.94009515033984]
We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, we seek to obtain a (tight) lower bound on a policy's performance. We introduce HAMBO, which builds on an uncertainty-aware learned model of the transition dynamics. We prove that the resulting COPE estimates are valid lower bounds, and, under regularity conditions, show their convergence to the true expected return.
arXiv Detail & Related papers (2023-03-02T08:57:35Z)
Probabilities Are Not Enough: Formal Controller Synthesis for Stochastic Dynamical Models with Epistemic Uncertainty [68.00748155945047]
Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers. Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability. Our contribution is a novel abstraction-based controller method for continuous-state models with noise, uncertain parameters, and external disturbances.
arXiv Detail & Related papers (2022-10-12T07:57:03Z)
Model-Value Inconsistency as a Signal for Epistemic Uncertainty [22.492926703232015]
Self-inconsistency is a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a model. We show that, unlike prior work, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms.
arXiv Detail & Related papers (2021-12-08T07:53:41Z)
Employing an Adjusted Stability Measure for Multi-Criteria Model Fitting on Data Sets with Similar Features [0.1127980896956825]
We show that our approach achieves the same or better predictive performance compared to the two established approaches. Our approach succeeds at selecting the relevant features while avoiding irrelevant or redundant features. For data sets with many similar features, the feature selection stability must be evaluated with an adjusted stability measure.
arXiv Detail & Related papers (2021-06-15T12:48:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.