Related papers: A Frequency-Domain Analysis of the Multi-Armed Bandit Problem: A New Perspective on the Exploration-Exploitation Trade-off

A Frequency-Domain Analysis of the Multi-Armed Bandit Problem: A New Perspective on the Exploration-Exploitation Trade-off

URL: http://arxiv.org/abs/2510.08908v1
Date: Fri, 10 Oct 2025 01:45:14 GMT
Title: A Frequency-Domain Analysis of the Multi-Armed Bandit Problem: A New Perspective on the Exploration-Exploitation Trade-off
Authors: Di Zhang,
Abstract summary: The multi-armed bandit (MAB) problem is one of the most fundamental models in sequential decision-making.<n>This paper proposes a novel frequency-domain analysis framework, reformulating the bandit process as a signal processing problem.
Score: 8.201374511929538
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The stochastic multi-armed bandit (MAB) problem is one of the most fundamental models in sequential decision-making, with the core challenge being the trade-off between exploration and exploitation. Although algorithms such as Upper Confidence Bound (UCB) and Thompson Sampling, along with their regret theories, are well-established, existing analyses primarily operate from a time-domain and cumulative regret perspective, struggling to characterize the dynamic nature of the learning process. This paper proposes a novel frequency-domain analysis framework, reformulating the bandit process as a signal processing problem. Within this framework, the reward estimate of each arm is viewed as a spectral component, with its uncertainty corresponding to the component's frequency, and the bandit algorithm is interpreted as an adaptive filter. We construct a formal Frequency-Domain Bandit Model and prove the main theorem: the confidence bound term in the UCB algorithm is equivalent in the frequency domain to a time-varying gain applied to uncertain spectral components, a gain inversely proportional to the square root of the visit count. Based on this, we further derive finite-time dynamic bounds concerning the exploration rate decay. This theory not only provides a novel and intuitive physical interpretation for classical algorithms but also lays a rigorous theoretical foundation for designing next-generation algorithms with adaptive parameter adjustment.

Related papers

Efficient Convolutional Forward Model for Passive Acoustic Mapping and Temporal Monitoring [4.219150964619931]
We introduce a PAM beamforming framework based on a convolutional formulation in the time domain.<n>In this framework, PAM is formulated as an inverse problem in which the forward operator maps cavitation activity to recorded radio-frequency signals.<n>We then formulate a regularized inversion that incorporates prior knowledge on cavitation activity.
arXiv Detail & Related papers (2026-01-12T09:32:26Z)
The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB)<n>Our analysis reveals a fundamental paradigm paradox: the more deterministic and structured the time series, the more severe the bias by point-wise loss function.<n>We present a concrete solution that simultaneously achieves both principles via DFT or DWT.
arXiv Detail & Related papers (2025-12-21T06:08:22Z)
A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers.<n>We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series.<n>Fire consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
arXiv Detail & Related papers (2025-10-11T09:59:25Z)
Impact of Temporally Correlated Dephasing Noise on the Fidelity of the 2-Qubit Deutsch-Jozsa Algorithm [0.76146285961466]
Noise in quantum systems often exhibits temporal correlations, leading to non-Markovian dynamics.<n>This paper investigates the impact of temporally correlated noise on the fidelity of the 2-qubit Deutsch-Jozsa algorithm.
arXiv Detail & Related papers (2025-06-05T18:48:50Z)
Optimal sparse phase retrieval via a quasi-Bayesian approach [0.0]
A signal need to be reconstructed using only the magnitude of its transformation while phase information remains inaccessible.<n>We introduce a novel sparse quasi-Bayesian approach and provide the first theoretical guarantees for such an approach.<n>Our results establish that the proposed Bayesian estimator achieves minimax-optimal convergence rates under sub-exponential noise.
arXiv Detail & Related papers (2025-04-13T10:21:35Z)
MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies.<n>We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis.<n> Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z)
Benchmarking Diffusion Annealing-Based Bayesian Inverse Problem Solvers [4.429516572803286]
This work introduces three benchmark problems for evaluating the performance of diffusion model based samplers.<n>The benchmark problems are inspired by problems in image inpainting, x-ray tomography, and phase retrieval.<n>In this setting, approximate ground-truth posterior samples can be obtained, enabling principled evaluation of the performance of posterior sampling algorithms.
arXiv Detail & Related papers (2025-03-04T21:07:15Z)
A Deep Unrolling Model with Hybrid Optimization Structure for Hyperspectral Image Deconvolution [50.13564338607482]
We propose a novel optimization framework for the hyperspectral deconvolution problem, called DeepMix.<n>It consists of three distinct modules, namely, a data consistency module, a module that enforces the effect of the handcrafted regularizers, and a denoising module.<n>This work proposes a context aware denoising module designed to sustain the advancements achieved by the cooperative efforts of the other modules.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
Fitting time-dependent Markovian dynamics to noisy quantum channels [0.0]
Understanding how to characterise and mitigate errors is a key challenge in developing reliable quantum architecture for near-term applications. Recent work provides an efficient set of algorithms for analysing unknown noise processes. We present an extension of the scheme now able to analyse noisy dynamics with time-dependent generators from a sequence of snapshots.
arXiv Detail & Related papers (2023-03-15T21:05:13Z)
Transformer Meets Boundary Value Inverse Problems [4.165221477234755]
Transformer-based deep direct sampling method is proposed for solving a class of boundary value inverse problem. A real-time reconstruction is achieved by evaluating the learned inverse operator between carefully designed data and reconstructed images.
arXiv Detail & Related papers (2022-09-29T17:45:25Z)
Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy. A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings. An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z)
A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem. Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals. We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring. We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $mathrmO(log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z)
Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP) We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.