Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application
to Joint Communications and Sensing
- URL: http://arxiv.org/abs/2302.05257v2
- Date: Mon, 13 Feb 2023 09:42:37 GMT
- Title: Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application
to Joint Communications and Sensing
- Authors: Amir Rezaei Balef and Setareh Maghsudi
- Abstract summary: We propose a generic upper confidence bound (UCB)-based algorithm with change detection to solve this problem.
We also formulate an energy-efficient waveform design problem in an integrated communication and sensing system as a toy example.
- Score: 7.0997346625024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a multi-objective multi-armed bandit problem in a dynamic
environment. The problem portrays a decision-maker that sequentially selects an
arm from a given set. If selected, each action produces a reward vector, where
every element follows a piecewise-stationary Bernoulli distribution. The agent
aims at choosing an arm among the Pareto optimal set of arms to minimize its
regret. We propose a Pareto generic upper confidence bound (UCB)-based
algorithm with change detection to solve this problem. By developing the
essential inequalities for multi-dimensional spaces, we establish that our
proposal guarantees a regret bound in the order of $\gamma_T\log(T/{\gamma_T})$
when the number of breakpoints $\gamma_T$ is known. Without this assumption,
the regret bound of our algorithm is $\gamma_T\log(T)$. Finally, we formulate
an energy-efficient waveform design problem in an integrated communication and
sensing system as a toy example. Numerical experiments on the toy example and
synthetic and real-world datasets demonstrate the efficiency of our policy
compared to the current methods.
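The abstract combines two ingredients: Pareto-UCB indices (pull an arm whose upper-confidence vector is not dominated by any other arm's) and a change detector that restarts the statistics when a breakpoint is suspected. The sketch below illustrates that combination in miniature; it is not the authors' exact algorithm. The Bernoulli reward model follows the abstract, but the naive two-window change detector, the uniform tie-breaking over the Pareto front, and all parameter values (`window`, `threshold`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def pareto_front(ucb):
    """Indices of arms whose UCB vectors are not dominated by any other arm."""
    n = ucb.shape[0]
    front = []
    for i in range(n):
        dominated = any(
            np.all(ucb[j] >= ucb[i]) and np.any(ucb[j] > ucb[i])
            for j in range(n) if j != i
        )
        if not dominated:
            front.append(i)
    return front

def run(means_before, means_after, breakpoint, horizon, window=50, threshold=0.3):
    """Pareto-UCB with a naive two-window change detector (illustration only)."""
    n_arms, n_obj = means_before.shape
    counts = np.zeros(n_arms)
    sums = np.zeros((n_arms, n_obj))
    history = [[] for _ in range(n_arms)]  # recent per-arm averaged rewards
    for t in range(1, horizon + 1):
        means = means_before if t < breakpoint else means_after
        if np.any(counts == 0):                        # play each arm once
            arm = int(np.argmin(counts))
        else:
            # UCB vector per arm: empirical mean + exploration bonus per objective
            ucb = sums / counts[:, None] + np.sqrt(2 * np.log(t) / counts)[:, None]
            arm = int(rng.choice(pareto_front(ucb)))   # uniform over Pareto set
        reward = rng.binomial(1, means[arm])           # Bernoulli reward vector
        counts[arm] += 1
        sums[arm] += reward
        h = history[arm]
        h.append(reward.mean())
        if len(h) >= 2 * window:
            recent = np.mean(h[-window:])
            old = np.mean(h[-2 * window:-window])
            if abs(recent - old) > threshold:          # change suspected: restart
                counts[:] = 0
                sums[:] = 0
                history = [[] for _ in range(n_arms)]
    return counts
```

Restarting all statistics on detection is the simplest reset policy; the regret bounds quoted above hinge on the detector firing rarely when no breakpoint occurred and quickly when one did.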
Related papers
- Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present a $\textbf{MA-OSMA}$ algorithm to transform the discrete submodular problem into a continuous optimization.
We also introduce a projection-free $\textbf{MA-OSEA}$ algorithm, which effectively utilizes the KL divergence by mixing in a uniform distribution.
Our algorithms significantly improve upon the $(\frac{1}{1+c})$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z) - Breaking the $\log(1/Δ_2)$ Barrier: Better Batched Best Arm Identification with Adaptive Grids [28.547030766096956]
We introduce an algorithm that achieves near-optimal sample complexity and features an instance-sensitive batch complexity.
We extend this framework to the problem of batched best arm identification in linear bandits and achieve similar improvements.
arXiv Detail & Related papers (2025-01-29T01:40:36Z) - Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.
The end goal is to identify the best arm for every objective in the shortest (expected) time, subject to an upper bound on the probability of error.
We propose an algorithm that uses the novel idea of surrogate proportions to sample the arms at each time step, eliminating the need to solve a max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z) - Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget.
The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$.
We provide a theoretical upper bound on the probability of mis-identifying the support of the best mixed arm and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Distributed Consensus Algorithm for Decision-Making in Multi-agent
Multi-armed Bandit [7.708904950194129]
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment.
A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points.
The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step.
arXiv Detail & Related papers (2023-06-09T16:10:26Z) - Contextual Combinatorial Bandits with Probabilistically Triggered Arms [55.9237004478033]
We study contextual bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions.
Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and derive a regret bound of $\tilde{O}(d\sqrt{T})$.
arXiv Detail & Related papers (2023-03-30T02:51:00Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of the noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K} \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - ProtoBandit: Efficient Prototype Selection via Multi-Armed Bandits [9.333087475006003]
ProtoBandit is a multi-armed bandit-based framework for identifying a compact set of informative data instances from a source dataset.
Our algorithm reduces the number of similarity computation calls by several orders of magnitude (100-1000 times) while obtaining solutions similar in quality to those from state-of-the-art approaches.
arXiv Detail & Related papers (2022-10-04T19:03:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.