Bandit Optimal Transport
- URL: http://arxiv.org/abs/2502.07397v1
- Date: Tue, 11 Feb 2025 09:24:25 GMT
- Title: Bandit Optimal Transport
- Authors: Lorenzo Croissant,
- Abstract summary: This article considers the bandit problem of learning to solve generic Kantorovich and entropic OT problems from repeated interactions.
We provide $tildemathcal O(sqrtT)$ regret algorithms for both problems by extending linear bandits on Hilbert spaces.
- Score: 1.0878040851638
- License:
- Abstract: Despite the impressive progress in statistical Optimal Transport (OT) in recent years, there has been little interest in the study of the \emph{sequential learning} of OT. Surprisingly so, as this problem is both practically motivated and a challenging extension of existing settings such as linear bandits. This article considers (for the first time) the stochastic bandit problem of learning to solve generic Kantorovich and entropic OT problems from repeated interactions when the marginals are known but the cost is unknown. We provide $\tilde{\mathcal O}(\sqrt{T})$ regret algorithms for both problems by extending linear bandits on Hilbert spaces. These results provide a reduction to infinite-dimensional linear bandits. To deal with the dimension, we provide a method to exploit the intrinsic regularity of the cost to learn, yielding corresponding regret bounds which interpolate between $\tilde{\mathcal O}(\sqrt{T})$ and $\tilde{\mathcal O}(T)$.
Related papers
- On the Optimal Regret of Locally Private Linear Contextual Bandit [18.300225068036642]
We show that it is possible to achieve an $tilde O(sqrtT)$ regret upper bound for locally private linear contextual bandit.
Our solution relies on several new algorithmic and analytical ideas.
arXiv Detail & Related papers (2024-04-15T02:00:24Z) - Chained Information-Theoretic bounds and Tight Regret Rate for Linear
Bandit Problems [37.82763068378491]
We study the regret of a variant of the Thompson-Sampling algorithm for bandit problems.
Under suitable continuity assumption of the rewards, our bound offers a tight rate of $O(dsqrtT)$ for $d$-dimensional linear bandit problems.
arXiv Detail & Related papers (2024-03-05T23:08:18Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in citelu2021low under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrain of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve the $tildeO(sqrt(d_1+d_2)3/2Mr3/2T)$ bound of regret while G-ESTS can achineve the $tildeO
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
We consider the setting where we play $M$ linear bandits with dimension $d$, each for $T$ rounds, and these $M$ bandit tasks share a common $k(ll d)$ dimensional linear representation.
We come up with novel algorithms that achieve $widetildeOleft(dsqrtkMT + kMsqrtTright)$ regret bounds, which matches the known minimax regret lower bound up to logarithmic factors.
arXiv Detail & Related papers (2022-03-29T15:27:13Z) - Learning Contextual Bandits Through Perturbed Rewards [107.6210145983805]
We show that a $tildeO(tildedsqrtT)$ regret upper bound is still achievable under standard regularity conditions.
We perturb the rewards when updating the neural network to eliminate the need of explicit exploration.
arXiv Detail & Related papers (2022-01-24T19:10:22Z) - Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce emphanti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $widetildeO(TsqrtkN + sqrtdkNT)$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB)
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Simultaneously Learning Stochastic and Adversarial Episodic MDPs with
Known Transition [38.28925339231888]
We develop the first algorithm with a best-of-both-worlds'' guarantee.
It achieves $mathcalO(log T)$ regret when the losses are adversarial.
More generally, it achieves $tildemathcalO(sqrtC)$ regret in an intermediate setting.
arXiv Detail & Related papers (2020-06-10T01:59:34Z) - Low-Rank Generalized Linear Bandit Problems [19.052901817304743]
In a low-rank linear bandit problem, the reward of an action is the inner product between the action and an unknown low-rank matrix $Theta*$.
We propose an algorithm based on a novel combination of online-to-confidence-set conversioncitepabbasi2012online and the exponentially weighted average forecaster constructed by a covering of low-rank matrices.
arXiv Detail & Related papers (2020-06-04T15:30:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.