Dynamic Decision-Making under Model Misspecification: A Stochastic Stability Approach
- URL: http://arxiv.org/abs/2602.17086v1
- Date: Thu, 19 Feb 2026 05:14:09 GMT
- Title: Dynamic Decision-Making under Model Misspecification: A Stochastic Stability Approach
- Authors: Xinyu Dai, Daniel Chen, Yian Qian,
- Abstract summary: We study the behavior of one of the most commonly used Bayesian reinforcement learning algorithms, Thompson Sampling, when the model class is misspecified.<n>We first provide a complete dynamic classification of posterior evolution in a misspecified two-armed Gaussian bandit.<n>We then extend the analysis to a general finite model class and develop a unified Markov framework.
- Score: 17.087471640760885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic decision-making under model uncertainty is central to many economic environments, yet existing bandit and reinforcement learning algorithms rely on the assumption of correct model specification. This paper studies the behavior and performance of one of the most commonly used Bayesian reinforcement learning algorithms, Thompson Sampling (TS), when the model class is misspecified. We first provide a complete dynamic classification of posterior evolution in a misspecified two-armed Gaussian bandit, identifying distinct regimes: correct model concentration, incorrect model concentration, and persistent belief mixing, characterized by the direction of statistical evidence and the model-action mapping. These regimes yield sharp predictions for limiting beliefs, action frequencies, and asymptotic regret. We then extend the analysis to a general finite model class and develop a unified stochastic stability framework that represents posterior evolution as a Markov process on the belief simplex. This approach characterizes two sufficient conditions to classify the ergodic and transient behaviors and provides inductive dimensional reductions of the posterior dynamics. Our results offer the first qualitative and geometric classification of TS under misspecification, bridging Bayesian learning with evolutionary dynamics, and also build the foundations of robust decision-making in structured bandits.
Related papers
- On Forgetting and Stability of Score-based Generative models [6.259598237089842]
Understanding the stability and long-time behavior of generative models is a fundamental problem in modern machine learning.<n>This paper provides quantitative bounds on the sampling error of score-based generative models by leveraging stability and forgetting properties of the Markov chain associated with the reverse-time dynamics.
arXiv Detail & Related papers (2026-01-29T15:37:50Z) - ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models [102.4511331368587]
ARISE (Adaptive Resolution-aware Scaling Evaluation) is a novel metric designed to assess the test-time scaling effectiveness of large reasoning models.<n>We conduct comprehensive experiments evaluating state-of-the-art reasoning models across diverse domains.
arXiv Detail & Related papers (2025-10-07T15:10:51Z) - Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce textbf sycophancy Mitigation through Adaptive Reasoning Trajectories.<n>We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z) - Stochastic dynamics learning with state-space systems [5.248564173595025]
This work advances the theoretical foundations of reservoir computing (RC) by providing a unified treatment of fading memory and the echo state property (ESP)<n>We investigate state-space systems, a central model class in time series learning, and establish that fading memory and solution stability hold generically -- even in the absence of the ESP.
arXiv Detail & Related papers (2025-08-11T11:49:01Z) - Pre-Trained AI Model Assisted Online Decision-Making under Missing Covariates: A Theoretical Perspective [12.160708336715489]
"Model elasticity" provides a unified way to characterize the regret incurred due to model imputation.<n>We show that under the missing at random (MAR) setting, it is possible to sequentially calibrate the pre-trained model.<n>Our analysis highlights the practical value of having an accurate pre-trained model in sequential decision-making tasks.
arXiv Detail & Related papers (2025-07-10T15:33:27Z) - Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z) - Dynamic ensemble selection based on Deep Neural Network Uncertainty
Estimation for Adversarial Robustness [7.158144011836533]
This work explore the dynamic attributes in model level through dynamic ensemble selection technology.
In training phase the Dirichlet distribution is apply as prior of sub-models' predictive distribution, and the diversity constraint in parameter space is introduced.
In test phase, the certain sub-models are dynamically selected based on their rank of uncertainty value for the final prediction.
arXiv Detail & Related papers (2023-08-01T07:41:41Z) - Distributed Bayesian Learning of Dynamic States [65.7870637855531]
The proposed algorithm is a distributed Bayesian filtering task for finite-state hidden Markov models.
It can be used for sequential state estimation, as well as for modeling opinion formation over social networks under dynamic environments.
arXiv Detail & Related papers (2022-12-05T19:40:17Z) - Uncertainty estimation under model misspecification in neural network
regression [3.2622301272834524]
We study the effect of the model choice on uncertainty estimation.
We highlight that under model misspecification, aleatoric uncertainty is not properly captured.
arXiv Detail & Related papers (2021-11-23T10:18:41Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian
Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Anomaly Detection of Time Series with Smoothness-Inducing Sequential
Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
empirical optimization is central to modern machine learning, but its role in its success is still unclear.
We show that it commonly arises in parameters of discrete multiplicative noise due to variance.
A detailed analysis is conducted in which we describe on key factors, including recent step size, and data, all exhibit similar results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.