Damped Online Newton Step for Portfolio Selection
- URL: http://arxiv.org/abs/2202.07574v1
- Date: Tue, 15 Feb 2022 17:01:55 GMT
- Title: Damped Online Newton Step for Portfolio Selection
- Authors: Zakaria Mhammedi and Alexander Rakhlin
- Abstract summary: We revisit the classic online portfolio selection problem, where at each round a learner selects a distribution over a set of portfolios to allocate its wealth.
Existing algorithms that achieve a logarithmic regret for this problem have per-round time and space complexities that scale with the total number of rounds.
We present the first practical online portfolio selection algorithm with a logarithmic regret and whose per-round time and space complexities depend only logarithmically on the horizon.
- Score: 96.0297968061824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the classic online portfolio selection problem, where at each
round a learner selects a distribution over a set of portfolios to allocate its
wealth. It is known that for this problem a logarithmic regret with respect to
Cover's loss is achievable using the Universal Portfolio Selection algorithm,
for example. However, all existing algorithms that achieve a logarithmic regret
for this problem have per-round time and space complexities that scale
polynomially with the total number of rounds, making them impractical. In this
paper, we build on the recent work by Luo et al. (2018) and present the first
practical online portfolio selection algorithm with a logarithmic regret and
whose per-round time and space complexities depend only logarithmically on the
horizon. Behind our approach are two key technical novelties of independent
interest. We first show that Damped Online Newton steps can approximate
mirror descent iterates well, even when dealing with time-varying regularizers.
Second, we present a new meta-algorithm that achieves an adaptive logarithmic
regret (i.e. a logarithmic regret on any sub-interval) for mixable losses.
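To ground the discussion, here is a minimal sketch of the classical (undamped) Online Newton Step for portfolio selection in the style of Agarwal et al. (2006), the family of updates the damped variant builds on. It is a baseline illustration only, not the paper's algorithm; the step size `gamma`, the regularizer `eps`, and the SLSQP-based generalized projection are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def ons_portfolio(returns, gamma=0.5, eps=1.0):
    """Vanilla Online Newton Step for log-loss portfolio selection,
    in the style of Agarwal et al. (2006). A baseline sketch only,
    NOT the damped variant of the paper above. `returns` is a (T, d)
    array of positive price relatives r_t."""
    T, d = returns.shape
    x = np.full(d, 1.0 / d)          # start from the uniform portfolio
    A = eps * np.eye(d)              # regularized curvature matrix
    wealth = 1.0
    for t in range(T):
        r = returns[t]
        gain = float(x @ r)          # one-round wealth factor <x_t, r_t>
        wealth *= gain
        g = -r / gain                # gradient of Cover's loss -log<x, r>
        A += np.outer(g, g)
        y = x - np.linalg.solve(A, g) / gamma   # Newton-style step
        # Generalized projection onto the simplex in the A-norm
        # (a small QP; SLSQP is an illustrative choice, not the paper's).
        res = minimize(
            lambda z: 0.5 * (z - y) @ A @ (z - y),
            x0=x,
            bounds=[(0.0, 1.0)] * d,
            constraints=[{"type": "eq", "fun": lambda z: z.sum() - 1.0}],
            method="SLSQP",
        )
        x = res.x
    return wealth
```

Vanilla ONS is cheap per round, but its logarithmic-regret guarantee breaks down for Cover's loss because the gradients above are unbounded as <x, r> approaches zero; keeping logarithmic regret while paying only a logarithmic per-round cost in the horizon is exactly what the damped Newton steps and the adaptive meta-algorithm described in the abstract target.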
Related papers
- The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z)
- Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in the smoothed online learning setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
- Pushing the Efficiency-Regret Pareto Frontier for Online Learning of Portfolios and Quantum States [33.073320699506326]
We revisit the classical online portfolio selection problem.
We present the first efficient algorithm, BISONS, that obtains polylogarithmic regret.
We extend our algorithm and analysis to a more general problem, online learning of quantum states with log loss.
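For context, a standard way to state the quantum extension mentioned here (notation is illustrative, not quoted from the paper): the portfolio vector on the simplex is replaced by a density matrix, and Cover's loss by a log loss over measurement outcomes.

```latex
% Portfolio selection: x in the probability simplex, price relatives
% r_t \ge 0, with per-round loss
%   \ell_t(x) = -\log \langle x, r_t \rangle.
% Quantum analogue: a density matrix \rho (PSD, unit trace) and an
% outcome matrix E_t \succeq 0, with per-round loss
%   \ell_t(\rho) = -\log \operatorname{tr}(E_t \rho),
% which recovers the portfolio loss when all matrices are diagonal.
```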
arXiv Detail & Related papers (2022-02-06T12:15:55Z)
- Near-Linear Time Algorithm with Near-Logarithmic Regret Per Switch for Mixable/Exp-Concave Losses [0.0]
We study the online optimization of mixable loss functions with logarithmic static regret in a dynamic environment.
For the first time in the literature, we show that it is also possible to achieve near-logarithmic regret per switch with per-round complexity that is sub-polynomial in the horizon.
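Since mixability is the property both this entry and the abstract above rely on, here is the standard definition (stated for context; notation is not taken from either paper):

```latex
% A loss \ell is \eta-mixable (\eta > 0) if for every probability
% measure \pi over predictions there is a single prediction
% \gamma_\pi such that, for every outcome y,
%   \ell(\gamma_\pi, y) \le -\tfrac{1}{\eta}
%       \log \int e^{-\eta\,\ell(\gamma, y)}\, d\pi(\gamma).
```

The log loss is 1-mixable, which is why Cover's portfolio loss falls into this class and why meta-algorithms for mixable losses apply to portfolio selection.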
arXiv Detail & Related papers (2021-09-28T15:06:31Z)
- Gap-Dependent Bounds for Two-Player Markov Games [122.20541282562363]
We analyze the cumulative regret of a Nash-learning algorithm on 2-player turn-based stochastic games (2-TBSG) and derive a gap-dependent logarithmic regret bound.
This bound matches the theoretical lower bound up to a logarithmic term.
We extend the conclusion to the discounted-game setting with an infinite horizon and propose a similar gap-dependent logarithmic regret bound.
arXiv Detail & Related papers (2021-07-01T18:25:07Z)
- On Limited-Memory Subsampling Strategies for Bandits [0.0]
We show that a simple deterministic subsampling rule, proposed in the recent work of Baudry et al. (2020), is optimal in one-dimensional exponential families.
We prove that these guarantees also hold when the algorithm's memory is limited to a polylogarithmic function of the time horizon.
We propose a variant of the algorithm in which only the most recent observations are used for subsampling, achieving optimal regret guarantees under the assumption of a known number of abrupt changes.
arXiv Detail & Related papers (2021-06-21T09:11:22Z)
- Regret Bound Balancing and Elimination for Model Selection in Bandits and RL [34.15978290083909]
We propose a simple model selection approach for algorithms in bandit and reinforcement learning problems.
We prove that the total regret of this approach is bounded by the best valid candidate regret bound times a multiplicative factor.
Unlike recent efforts in model selection for linear bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment.
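The balancing-and-elimination principle can be sketched as follows. This is a schematic reconstruction of the general idea with hypothetical interfaces (`bounds`, `play`), not the paper's exact tests or constants.

```python
import math

def balance_and_eliminate(num_learners, bounds, play, T, c=1.0):
    """Schematic regret-bound balancing and elimination (a sketch of
    the principle; hypothetical interface, not the paper's procedure).
    bounds[i](n): learner i's claimed regret bound after n of its plays.
    play(i): run one round of base learner i, returning a reward in [0, 1].
    """
    active = set(range(num_learners))
    n = [0] * num_learners           # rounds allotted to each learner
    reward = [0.0] * num_learners    # cumulative reward per learner
    for _ in range(T):
        # Balancing: give the round to the active learner whose claimed
        # regret so far is smallest, keeping putative regrets roughly equal.
        i = min(active, key=lambda j: bounds[j](n[j]))
        reward[i] += play(i)
        n[i] += 1
        # Elimination: a learner whose empirical average, even after being
        # credited its claimed bound plus a confidence slack, falls below
        # another learner's lower confidence bound has an invalid bound.
        played = [k for k in active if n[k] > 0]
        lcb = max(reward[k] / n[k] - c * math.sqrt(math.log(max(T, 2)) / n[k])
                  for k in played)
        for j in played:
            ucb = (reward[j] + bounds[j](n[j])) / n[j] \
                  + c * math.sqrt(math.log(max(T, 2)) / n[j])
            if ucb < lcb:
                active.discard(j)
    return active
```

The total regret then tracks the best valid candidate bound up to a multiplicative factor, since learners with invalid bounds are eventually caught by the test above.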
arXiv Detail & Related papers (2020-12-24T00:53:42Z)
- Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design and has been widely applied in economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings because they require either an intolerable running time or too much computational resource.
We propose NeuSearcher, which leverages knowledge learned from previously solved instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
- Efficient improper learning for online logistic regression [68.8204255655161]
It is known that any proper algorithm with logarithmic regret in the number of samples (denoted n) necessarily suffers an exponential multiplicative constant in B, the bound on the norm of the comparator.
In this work, we design an efficient improper algorithm that avoids this exponential constant while preserving a logarithmic regret.
Our new algorithm, based on regularized empirical risk minimization with surrogate losses, satisfies a regret scaling as O(B log(Bn)) with a per-round time complexity of order O(d^2).
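To make the proper/improper distinction concrete (a standard formulation, stated for context; B and n as in the summary above):

```latex
% Online logistic regression: features x_t, labels y_t \in \{-1, +1\},
% and the logistic loss
%   \ell_t(\theta) = \log\bigl(1 + e^{-y_t \langle \theta, x_t \rangle}\bigr),
%   \|\theta\| \le B.
% A proper learner must predict with some \theta_t in the B-ball;
% an improper learner may output any probability forecast.
% The exponential-in-B lower bound applies to proper learners only.
```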
arXiv Detail & Related papers (2020-03-18T09:16:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.