Quantizer Design for Finite Model Approximations, Model Learning, and Quantized Q-Learning for MDPs with Unbounded Spaces
- URL: http://arxiv.org/abs/2510.04355v2
- Date: Tue, 14 Oct 2025 16:14:06 GMT
- Title: Quantizer Design for Finite Model Approximations, Model Learning, and Quantized Q-Learning for MDPs with Unbounded Spaces
- Authors: Osman Bicer, Ali D. Kara, Serdar Yuksel
- Abstract summary: We present refinements of the upper bounds of [Kara et al., JMLR'23] on finite model approximation errors. We also consider implications for quantizer design in quantized Q-learning and empirical model learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, for Markov decision processes (MDPs) with unbounded state spaces, we present refinements of the upper bounds of [Kara et al., JMLR'23] on finite model approximation errors, obtained by optimizing the quantizers used for the finite model approximations. We also consider the implications for quantizer design in quantized Q-learning and empirical model learning, and for the performance of policies obtained via Q-learning where the quantized state is treated as the state itself. We highlight the distinctions between planning, where approximating MDPs can be designed independently, and learning (via either Q-learning or empirical model learning), where approximating MDPs are restricted to those defined by invariant measures of Markov chains under exploration policies, leading to significant subtleties in quantizer design performance, even though asymptotic near optimality can be established under both setups. In particular, under Lyapunov growth conditions, we obtain explicit upper bounds which decay to zero as the number of bins approaches infinity.
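To make the learning setup concrete, the following is a minimal Python sketch of quantized Q-learning, in which a uniform quantizer maps a continuous state to a bin index and the bin index is treated as the state itself. The toy controlled AR(1) dynamics, the truncation range, the quadratic cost, and the uniform exploration policy are all illustrative assumptions, not the paper's model.

```python
import numpy as np

def quantize(x, n_bins, lo=-10.0, hi=10.0):
    """Uniform quantizer: map a continuous state to a bin index over [lo, hi]."""
    x = min(max(x, lo), hi)
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)

def quantized_q_learning(n_bins=20, n_actions=2, n_steps=20000,
                         gamma=0.95, seed=0):
    """Q-learning where the quantized state is treated as the state itself.

    Toy controlled AR(1) process x' = 0.9*x + u + noise with u in {-1, +1}
    and stage cost x^2 -- all model choices here are illustrative.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_bins, n_actions))
    visits = np.zeros((n_bins, n_actions))
    x = 0.0
    for _ in range(n_steps):
        s = quantize(x, n_bins)
        a = int(rng.integers(n_actions))       # uniform exploration policy
        u = -1.0 if a == 0 else 1.0
        cost = x ** 2
        x_next = 0.9 * x + u + rng.normal(scale=0.5)
        s_next = quantize(x_next, n_bins)
        visits[s, a] += 1.0
        alpha = 1.0 / visits[s, a]             # decaying learning rate
        Q[s, a] += alpha * (cost + gamma * Q[s_next].min() - Q[s, a])
        x = x_next
    return Q
```

Note that the states visited (and hence the effective approximating model) depend on the exploration policy's invariant measure, which is the learning-side subtlety the abstract contrasts with planning.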
Related papers
- Reinforcement Learning with Function Approximation for Non-Markov Processes [2.0136462287587675]
We study reinforcement learning methods with linear function approximation under non-Markov state and cost processes.
We show that the algorithm converges under suitable ergodicity conditions on the underlying non-Markov processes.
We derive explicit error bounds for the limits of the resulting learning algorithms.
arXiv Detail & Related papers (2026-01-01T00:56:18Z) - Near-optimal Prediction Error Estimation for Quantum Machine Learning Models [20.38743409927907]
Quantum machine learning (QML) models can be significantly affected by the limited access to the underlying data set.
Previous studies have focused on proving generalization error bounds for any QML models trained on a finite training set.
We focus on the optimal QML models obtained by training them on a finite training set and establish a tight prediction error bound in terms of the number of trainable gates and the size of training sets.
arXiv Detail & Related papers (2025-10-21T01:22:05Z) - MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z) - Boost Post-Training Quantization via Null Space Optimization for Large Language Models [28.57705976553512]
Existing post-training quantization methods for large language models (LLMs) have achieved remarkable success.
However, the increasingly marginal performance gains suggest that existing quantization strategies are insufficient to support the development of more compressed models.
We argue that the quantization error can be effectively alleviated by constraining the post-quantization weight to lie within the null space of input activations.
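A toy sketch of the null-space idea described above: after naive rounding, keep only the component of the quantization error that lies in the null space of the input activations, so the layer outputs on those activations are preserved exactly. This is an illustrative simplification (the helper names and the SVD-based projector are assumptions, not the paper's method), and the compensated weight is no longer exactly on the quantization grid.

```python
import numpy as np

def nullspace_projector(X, tol=1e-8):
    """Orthogonal projector onto the right null space of X (rows = samples)."""
    _, s, Vt = np.linalg.svd(X, full_matrices=True)
    rank = int((s > tol * s.max()).sum())
    V_null = Vt[rank:].T                       # basis of null(X): X @ V_null == 0
    return V_null @ V_null.T

def nullspace_compensate(W, W_q, X):
    """Move the quantization error of W_q into null(X).

    The returned weight W_c satisfies X @ (W_c - W).T == 0, so the layer
    output X @ W_c.T matches the full-precision output X @ W.T exactly.
    """
    P = nullspace_projector(X)
    E = W_q - W                                # quantization error
    return W + E @ P                           # keep only the invisible part
```

Because `X @ P == 0`, the residual error `E @ P` is invisible to the calibration activations, which is the sense in which the null-space constraint removes quantization error.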
arXiv Detail & Related papers (2025-05-21T14:07:07Z) - Q-Learning for Stochastic Control under General Information Structures
and Non-Markovian Environments [1.90365714903665]
We present a convergence theorem for stochastic iterations, and in particular for Q-learning iterates, under a general, possibly non-Markovian, environment.
We discuss the implications and applications of this theorem to a variety of control problems with non-Markovian environments.
arXiv Detail & Related papers (2023-10-31T19:53:16Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs)
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - A kernel-based quantum random forest for improved classification [0.0]
Quantum Machine Learning (QML) to enhance traditional classical learning methods has seen various limitations to its realisation.
We extend the linear quantum support vector machine (QSVM) with kernel function computed through quantum kernel estimation (QKE)
To limit overfitting, we further extend the model to employ a low-rank Nyström approximation to the kernel matrix.
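The Nyström approximation mentioned above replaces the full n x n kernel matrix with a low-rank factorization K ~ C pinv(W) C^T built from m landmark points. The following sketch uses an assumed RBF kernel and uniformly random landmarks; it illustrates the standard classical construction, not the paper's quantum kernel estimation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """RBF (Gaussian) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_approx(X, m, gamma=0.5, seed=0):
    """Rank-m Nystrom approximation K ~ C @ pinv(W) @ C.T
    using m randomly sampled landmark points."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)           # n x m cross-kernel
    W = rbf_kernel(X[idx], X[idx], gamma)      # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T
```

With m equal to the number of samples the approximation is exact; smaller m trades accuracy for an effective rank bound, which is what limits overfitting.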
arXiv Detail & Related papers (2022-10-05T15:57:31Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with polynomially scaling sample complexity.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - Q-Learning for MDPs with General Spaces: Convergence and Near Optimality
via Quantization under Weak Continuity [2.685668802278156]
We show that Q-learning for standard Borel MDPs via quantization of states and actions converges to a limit.
Our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
arXiv Detail & Related papers (2021-11-12T15:47:10Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP)
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning [63.64636047748605]
We develop a new theoretical framework to provide convergence guarantee for the general multi-step MAML algorithm.
In particular, our results suggest that the inner-stage step size needs to be chosen inversely proportional to the number $N$ of inner-stage steps in order for $N$-step MAML to have guaranteed convergence.
arXiv Detail & Related papers (2020-02-18T19:17:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information above and is not responsible for any consequences of its use.