A Frequency-Domain Analysis of the Multi-Armed Bandit Problem: A New Perspective on the Exploration-Exploitation Trade-off
- URL: http://arxiv.org/abs/2510.08908v1
- Date: Fri, 10 Oct 2025 01:45:14 GMT
- Title: A Frequency-Domain Analysis of the Multi-Armed Bandit Problem: A New Perspective on the Exploration-Exploitation Trade-off
- Authors: Di Zhang,
- Abstract summary: The multi-armed bandit (MAB) problem is one of the most fundamental models in sequential decision-making.<n>This paper proposes a novel frequency-domain analysis framework, reformulating the bandit process as a signal processing problem.
- Score: 8.201374511929538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The stochastic multi-armed bandit (MAB) problem is one of the most fundamental models in sequential decision-making, with the core challenge being the trade-off between exploration and exploitation. Although algorithms such as Upper Confidence Bound (UCB) and Thompson Sampling, along with their regret theories, are well-established, existing analyses primarily operate from a time-domain and cumulative regret perspective, struggling to characterize the dynamic nature of the learning process. This paper proposes a novel frequency-domain analysis framework, reformulating the bandit process as a signal processing problem. Within this framework, the reward estimate of each arm is viewed as a spectral component, with its uncertainty corresponding to the component's frequency, and the bandit algorithm is interpreted as an adaptive filter. We construct a formal Frequency-Domain Bandit Model and prove the main theorem: the confidence bound term in the UCB algorithm is equivalent in the frequency domain to a time-varying gain applied to uncertain spectral components, a gain inversely proportional to the square root of the visit count. Based on this, we further derive finite-time dynamic bounds concerning the exploration rate decay. This theory not only provides a novel and intuitive physical interpretation for classical algorithms but also lays a rigorous theoretical foundation for designing next-generation algorithms with adaptive parameter adjustment.
Related papers
- Efficient Convolutional Forward Model for Passive Acoustic Mapping and Temporal Monitoring [4.219150964619931]
We introduce a PAM beamforming framework based on a convolutional formulation in the time domain.<n>In this framework, PAM is formulated as an inverse problem in which the forward operator maps cavitation activity to recorded radio-frequency signals.<n>We then formulate a regularized inversion that incorporates prior knowledge on cavitation activity.
arXiv Detail & Related papers (2026-01-12T09:32:26Z) - The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB)<n>Our analysis reveals a fundamental paradigm paradox: the more deterministic and structured the time series, the more severe the bias by point-wise loss function.<n>We present a concrete solution that simultaneously achieves both principles via DFT or DWT.
arXiv Detail & Related papers (2025-12-21T06:08:22Z) - A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers.<n>We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series.<n>Fire consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
arXiv Detail & Related papers (2025-10-11T09:59:25Z) - Impact of Temporally Correlated Dephasing Noise on the Fidelity of the 2-Qubit Deutsch-Jozsa Algorithm [0.76146285961466]
Noise in quantum systems often exhibits temporal correlations, leading to non-Markovian dynamics.<n>This paper investigates the impact of temporally correlated noise on the fidelity of the 2-qubit Deutsch-Jozsa algorithm.
arXiv Detail & Related papers (2025-06-05T18:48:50Z) - Optimal sparse phase retrieval via a quasi-Bayesian approach [0.0]
A signal need to be reconstructed using only the magnitude of its transformation while phase information remains inaccessible.<n>We introduce a novel sparse quasi-Bayesian approach and provide the first theoretical guarantees for such an approach.<n>Our results establish that the proposed Bayesian estimator achieves minimax-optimal convergence rates under sub-exponential noise.
arXiv Detail & Related papers (2025-04-13T10:21:35Z) - MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies.<n>We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis.<n> Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z) - Benchmarking Diffusion Annealing-Based Bayesian Inverse Problem Solvers [4.429516572803286]
This work introduces three benchmark problems for evaluating the performance of diffusion model based samplers.<n>The benchmark problems are inspired by problems in image inpainting, x-ray tomography, and phase retrieval.<n>In this setting, approximate ground-truth posterior samples can be obtained, enabling principled evaluation of the performance of posterior sampling algorithms.
arXiv Detail & Related papers (2025-03-04T21:07:15Z) - A Deep Unrolling Model with Hybrid Optimization Structure for Hyperspectral Image Deconvolution [50.13564338607482]
We propose a novel optimization framework for the hyperspectral deconvolution problem, called DeepMix.<n>It consists of three distinct modules, namely, a data consistency module, a module that enforces the effect of the handcrafted regularizers, and a denoising module.<n>This work proposes a context aware denoising module designed to sustain the advancements achieved by the cooperative efforts of the other modules.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Fitting time-dependent Markovian dynamics to noisy quantum channels [0.0]
Understanding how to characterise and mitigate errors is a key challenge in developing reliable quantum architecture for near-term applications.
Recent work provides an efficient set of algorithms for analysing unknown noise processes.
We present an extension of the scheme now able to analyse noisy dynamics with time-dependent generators from a sequence of snapshots.
arXiv Detail & Related papers (2023-03-15T21:05:13Z) - Transformer Meets Boundary Value Inverse Problems [4.165221477234755]
Transformer-based deep direct sampling method is proposed for solving a class of boundary value inverse problem.
A real-time reconstruction is achieved by evaluating the learned inverse operator between carefully designed data and reconstructed images.
arXiv Detail & Related papers (2022-09-29T17:45:25Z) - Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy.
A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings.
An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse
Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - Analysis and Design of Thompson Sampling for Stochastic Partial
Monitoring [91.22679787578438]
We present a novel Thompson-sampling-based algorithm for partial monitoring.
We prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $mathrmO(log T)$ for a linearized variant of the problem with local observability.
arXiv Detail & Related papers (2020-06-17T05:48:33Z) - Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP)
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.