Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application
to Joint Communications and Sensing
- URL: http://arxiv.org/abs/2302.05257v2
- Date: Mon, 13 Feb 2023 09:42:37 GMT
- Title: Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application
to Joint Communications and Sensing
- Authors: Amir Rezaei Balef and Setareh Maghsudi
- Abstract summary: We propose a generic upper confidence bound (UCB)-based algorithm with change detection to solve this problem.
We also formulate an energy-efficient waveform design problem in an integrated communication and sensing system as a toy example.
- Score: 7.0997346625024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a multi-objective multi-armed bandit problem in a dynamic
environment. The problem portrays a decision-maker that sequentially selects an
arm from a given set. Each selected arm produces a reward vector whose
elements follow piecewise-stationary Bernoulli distributions. The agent
aims to choose an arm from the Pareto-optimal set of arms to minimize its
regret. We propose a Pareto generic upper confidence bound (UCB)-based
algorithm with change detection to solve this problem. By developing the
essential inequalities for multi-dimensional spaces, we establish that our
proposal guarantees a regret bound on the order of $\gamma_T\log(T/\gamma_T)$
when the number of breakpoints $\gamma_T$ is known. Without this assumption,
the regret bound of our algorithm is $\gamma_T\log(T)$. Finally, we formulate
an energy-efficient waveform design problem in an integrated communication and
sensing system as a toy example. Numerical experiments on the toy example and
synthetic and real-world datasets demonstrate the efficiency of our policy
compared to the current methods.
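For concreteness, below is a minimal sketch (in Python) of how such a policy can be organized: componentwise UCB indices, uniform play over the resulting Pareto front, and a per-arm change detector that restarts the estimates. This is our illustrative reading of the abstract, not the authors' implementation; the class name, the windowed two-half mean test (a stand-in for CUSUM/GLR-style detectors), and the window/threshold values are all assumptions.

```python
import math
import random


class ParetoUCBWithChangeDetection:
    """Sketch of a Pareto-UCB policy with a simple windowed change detector."""

    def __init__(self, n_arms, n_objectives, window=100, threshold=0.35):
        self.n_arms = n_arms
        self.n_objectives = n_objectives
        self.window = window        # recent samples kept per arm (illustrative)
        self.threshold = threshold  # mean shift that triggers a restart (illustrative)
        self._restart()

    def _restart(self):
        self.t = 0
        self.counts = [0] * self.n_arms
        self.sums = [[0.0] * self.n_objectives for _ in range(self.n_arms)]
        self.recent = [[] for _ in range(self.n_arms)]  # recent reward vectors

    def _ucb_vector(self, arm):
        # Componentwise UCB index: empirical mean plus exploration bonus.
        bonus = math.sqrt(2.0 * math.log(max(self.t, 2)) / self.counts[arm])
        return [s / self.counts[arm] + bonus for s in self.sums[arm]]

    @staticmethod
    def _dominates(v, u):
        # v Pareto-dominates u: >= on every objective and > on at least one.
        return all(a >= b for a, b in zip(v, u)) and any(a > b for a, b in zip(v, u))

    def select_arm(self):
        self.t += 1
        for arm in range(self.n_arms):   # play every arm once first
            if self.counts[arm] == 0:
                return arm
        ucbs = [self._ucb_vector(a) for a in range(self.n_arms)]
        front = [a for a in range(self.n_arms)
                 if not any(self._dominates(ucbs[b], ucbs[a])
                            for b in range(self.n_arms) if b != a)]
        return random.choice(front)      # any Pareto-optimal arm is acceptable

    def update(self, arm, reward_vector):
        self.counts[arm] += 1
        for i, r in enumerate(reward_vector):
            self.sums[arm][i] += r
        self.recent[arm].append(list(reward_vector))
        if len(self.recent[arm]) > self.window:
            self.recent[arm].pop(0)
        if self._change_detected(arm):
            self._restart()              # forget the pre-change samples

    def _change_detected(self, arm):
        # Compare the two halves of the recent window on every objective.
        h = self.recent[arm]
        if len(h) < self.window:
            return False
        half = len(h) // 2
        for i in range(self.n_objectives):
            m1 = sum(r[i] for r in h[:half]) / half
            m2 = sum(r[i] for r in h[half:]) / (len(h) - half)
            if abs(m1 - m2) > self.threshold:
                return True
        return False
```

A toy run (continuing from the imports above), with Bernoulli means that jump at a single breakpoint:

```python
# Arm 2's means jump at t = 1500, making it the sole Pareto-optimal arm.
means = [[0.8, 0.2], [0.2, 0.8], [0.5, 0.5]]
policy = ParetoUCBWithChangeDetection(n_arms=3, n_objectives=2)
for t in range(3000):
    if t == 1500:
        means[2] = [0.95, 0.95]
    arm = policy.select_arm()
    reward = [float(random.random() < m) for m in means[arm]]
    policy.update(arm, reward)
```

The restart-on-detection step is what targets the piecewise-stationary setting: once a breakpoint is flagged, pre-change samples would bias every mean estimate, so discarding them and re-exploring is the standard remedy.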
Related papers
- A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget.
The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$.
We provide a theoretical upper bound on the probability of mis-identifying the support of the best mixed arm and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel Stochastic-Greedy Bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Distributed Consensus Algorithm for Decision-Making in Multi-agent
Multi-armed Bandit [7.708904950194129]
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment.
A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points.
The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step.
arXiv Detail & Related papers (2023-06-09T16:10:26Z) - Contextual Combinatorial Bandits with Probabilistically Triggered Arms [55.9237004478033]
We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions.
Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and derive a regret bound of $\tilde{O}(d\sqrt{T})$.
arXiv Detail & Related papers (2023-03-30T02:51:00Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K} \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - ProtoBandit: Efficient Prototype Selection via Multi-Armed Bandits [9.333087475006003]
ProtoBandit is a multi-armed bandit-based framework for identifying a compact set of informative data instances from a source dataset.
Our algorithm reduces the number of similarity computation calls by several orders of magnitude ($100$-$1000\times$) while obtaining solutions similar in quality to those from state-of-the-art approaches.
arXiv Detail & Related papers (2022-10-04T19:03:47Z) - A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits [11.918230810566945]
We study the non-stationary multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning.
We propose a method that achieves, in $K$-armed bandit problems, a near-optimal $\widetilde{O}(\sqrt{K N(S+1)})$ dynamic regret.
arXiv Detail & Related papers (2022-01-17T17:23:56Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $\epsilon$ (formalized briefly after this list).
arXiv Detail & Related papers (2020-08-10T08:30:52Z)
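As referenced above, a concrete reading of the lenient-regret criterion (our hedged formalization, with $\Delta_{a_t}$ the suboptimality gap of the arm played at round $t$): the hard-threshold instance charges only gaps exceeding $\epsilon$, i.e. $R_\epsilon(T) = \sum_{t=1}^{T} \Delta_{a_t}\,\mathbf{1}\{\Delta_{a_t} > \epsilon\}$, so near-optimal arms whose gap is at most $\epsilon$ incur no regret at all.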