Online Nonsubmodular Minimization with Delayed Costs: From Full
Information to Bandit Feedback
- URL: http://arxiv.org/abs/2205.07217v1
- Date: Sun, 15 May 2022 08:27:12 GMT
- Title: Online Nonsubmodular Minimization with Delayed Costs: From Full
Information to Bandit Feedback
- Authors: Tianyi Lin and Aldo Pacchiano and Yaodong Yu and Michael I. Jordan
- Abstract summary: We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
- Score: 98.7678704343537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by applications to online learning in sparse estimation and
Bayesian optimization, we consider the problem of online unconstrained
nonsubmodular minimization with delayed costs in both full information and
bandit feedback settings. In contrast to previous works on online unconstrained
submodular minimization, we focus on a class of nonsubmodular functions with
special structure, and prove regret guarantees for several variants of the
online and approximate online bandit gradient descent algorithms in static and
delayed scenarios. We derive bounds for the agent's regret in the full
information and bandit feedback setting, even if the delay between choosing a
decision and receiving the incurred cost is unbounded. Key to our approach is
the notion of $(\alpha, \beta)$-regret and the extension of the generic convex
relaxation model from~\citet{El-2020-Optimal}, the analysis of which is of
independent interest. We conduct and showcase several simulation studies to
demonstrate the efficacy of our algorithms.
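The abstract centers on variants of online (bandit) gradient descent that must cope with delayed cost feedback. As a rough, hedged illustration of that setup only (this is not the authors' algorithm: the paper works with a convex relaxation of structured nonsubmodular functions, while the quadratic losses, the hypothetical `delayed_ogd` helper, and the delay distribution below are all made up for the sketch), the following Python snippet plays a decision each round and applies a round's gradient only once its delayed feedback arrives:

```python
import numpy as np


def delayed_ogd(grad_oracles, delays, T, dim, eta=0.1, radius=1.0):
    """Online gradient descent with delayed feedback (illustrative sketch only).

    grad_oracles[t](x) returns the gradient of the round-t cost at x; it is
    assumed to become available only at round t + delays[t].
    """
    x = np.zeros(dim)
    decisions = []

    # arrivals[s] lists the rounds whose feedback becomes available at round s
    arrivals = {}
    for t in range(T):
        arrivals.setdefault(t + delays[t], []).append(t)

    for t in range(T):
        decisions.append(x.copy())
        # apply every gradient whose delayed feedback arrives now,
        # evaluated at the decision that was actually played in that round
        for s in arrivals.get(t, []):
            x = x - eta * grad_oracles[s](decisions[s])
        # project back onto an l2 ball to keep the iterates bounded
        norm = np.linalg.norm(x)
        if norm > radius:
            x = x * (radius / norm)
    return decisions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, dim = 200, 3
    targets = rng.normal(size=(T, dim))           # placeholder costs f_t(x) = ||x - a_t||^2
    grad_oracles = [(lambda x, a=a: 2.0 * (x - a)) for a in targets]
    delays = rng.integers(0, 20, size=T)          # arbitrary (possibly large) delays

    decisions = delayed_ogd(grad_oracles, delays, T, dim, eta=0.05)

    costs = [np.sum((x - a) ** 2) for x, a in zip(decisions, targets)]
    best_fixed = targets.mean(axis=0)             # minimizer of the cumulative quadratic cost
    best_costs = [np.sum((best_fixed - a) ** 2) for a in targets]
    print("cumulative regret:", sum(costs) - sum(best_costs))
```

The only point of the sketch is the bookkeeping: each gradient is evaluated at the decision that was actually played in its round, and feedback that never arrives within the horizon is simply dropped, which is loosely the regime that the paper's unbounded-delay guarantees speak to.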
Related papers
- Contextual Linear Optimization with Bandit Feedback [35.692428244561626]
Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients.
We study a class of offline learning algorithms for CLO with bandit feedback.
We show a fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of the optimization estimate.
arXiv Detail & Related papers (2024-05-26T13:27:27Z) - Improved Regret for Efficient Online Reinforcement Learning with Linear
Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z) - Distributed Online Non-convex Optimization with Composite Regret [31.53784277195043]
We propose a novel composite regret with a new network regret-based metric for distributed online optimization with general nonconvex losses.
To our knowledge, this is the first regret bound for distributed online nonconvex learning.
arXiv Detail & Related papers (2022-09-21T04:16:33Z) - Augmented Lagrangian Methods for Time-varying Constrained Online Convex
Optimization [1.662966122370634]
We consider online convex optimization (OCO) with time-varying loss and constraint functions.
We first develop a class of model-based augmented Lagrangian methods (MALM) for time-varying functional constrained OCO.
Numerical results for several examples of constrained OCO are presented to demonstrate the efficiency of the proposed algorithms.
arXiv Detail & Related papers (2022-05-19T14:03:25Z) - Smoothed Online Convex Optimization Based on Discounted-Normal-Predictor [68.17855675511602]
We investigate an online prediction strategy named Discounted-Normal-Predictor (Kapralov and Panigrahy, 2010) for smoothed online convex optimization (SOCO).
We show that the proposed algorithm can minimize the adaptive regret with switching cost in every interval.
arXiv Detail & Related papers (2022-05-02T08:48:22Z) - Smoothed Online Combinatorial Optimization Using Imperfect Predictions [27.201074566335222]
We study smoothed online optimization problems when an imperfect predictive model is available.
We show that using predictions to plan for a finite time horizon leads to regret dependent on the total predictive uncertainty and an additional switching cost.
Our algorithm shows a significant improvement in cumulative regret compared to other baselines in synthetic online distributed streaming problems.
arXiv Detail & Related papers (2022-04-23T02:30:39Z) - Pessimistic Q-Learning for Offline Reinforcement Learning: Towards
Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z) - Online Allocation with Two-sided Resource Constraints [44.5635910908944]
We consider an online allocation problem subject to lower and upper resource constraints, where the requests arrive sequentially.
We propose a new algorithm that obtains a $1-O(\frac{\epsilon}{\alpha-\epsilon})$-competitive ratio against the offline benchmark that knows all requests ahead of time.
arXiv Detail & Related papers (2021-12-28T02:21:06Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.