Online Stackelberg Optimization via Nonlinear Control
- URL: http://arxiv.org/abs/2406.18805v1
- Date: Thu, 27 Jun 2024 00:42:33 GMT
- Title: Online Stackelberg Optimization via Nonlinear Control
- Authors: William Brown, Christos Papadimitriou, Tim Roughgarden,
- Abstract summary: In repeated interaction problems with adaptive agents, our objective often requires anticipating and optimizing over the space of possible agent responses.
We show that many problems of this form can be cast as instances of online (nonlinear) control which satisfy textitlocal controllability, with convex losses over a bounded state space.
We introduce a unified algorithmic framework for tractable regret minimization in such cases.
- Score: 11.220642401065495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In repeated interaction problems with adaptive agents, our objective often requires anticipating and optimizing over the space of possible agent responses. We show that many problems of this form can be cast as instances of online (nonlinear) control which satisfy \textit{local controllability}, with convex losses over a bounded state space which encodes agent behavior, and we introduce a unified algorithmic framework for tractable regret minimization in such cases. When the instance dynamics are known but otherwise arbitrary, we obtain oracle-efficient $O(\sqrt{T})$ regret by reduction to online convex optimization, which can be made computationally efficient if dynamics are locally \textit{action-linear}. In the presence of adversarial disturbances to the state, we give tight bounds in terms of either the cumulative or per-round disturbance magnitude (for \textit{strongly} or \textit{weakly} locally controllable dynamics, respectively). Additionally, we give sublinear regret results for the cases of unknown locally action-linear dynamics as well as for the bandit feedback setting. Finally, we demonstrate applications of our framework to well-studied problems including performative prediction, recommendations for adaptive agents, adaptive pricing of real-valued goods, and repeated gameplay against no-regret learners, directly yielding extensions beyond prior results in each case.
Related papers
- Controlled Learning of Pointwise Nonlinearities in Neural-Network-Like Architectures [14.93489065234423]
We present a general variational framework for the training of freeform nonlinearities in layered computational architectures.
The slope constraints allow us to impose properties such as 1-Lipschitz stability, firm non-expansiveness, and monotonicity/invertibility.
We show how to solve the numerically function-optimization problem by representing the nonlinearities in a suitable (nonuniform) B-spline basis.
arXiv Detail & Related papers (2024-08-23T14:39:27Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - On Convex Data-Driven Inverse Optimal Control for Nonlinear, Non-stationary and Stochastic Systems [0.7240153598817866]
This paper is concerned with a finite-horizon inverse control problem, which has the goal of reconstructing, from observations, the possibly non-nonstationary cost driving the actions of an agent.
We present a result enabling cost optimization by solving an problem that is convex non-stationary agent cost.
All experiments confirm the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-24T10:25:53Z) - Adaptive Decision-Making with Constraints and Dependent Losses:
Performance Guarantees and Applications to Online and Nonlinear
Identification [5.787117733071415]
We consider adaptive decision-making problems where an agent optimize a cumulative performance objective by repeatedly choosing among a finite set of options.
Our algorithm and our analysis is instance dependent, that is, suboptimal choices of the environment are exploited and reflected in our regret bounds.
The performance of the resulting algorithms is highlighted in two numerical examples.
arXiv Detail & Related papers (2023-04-06T18:32:26Z) - Online Nonsubmodular Minimization with Delayed Costs: From Full
Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z) - Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
arXiv Detail & Related papers (2021-12-29T02:42:59Z) - Attribute-Guided Adversarial Training for Robustness to Natural
Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z) - On the Stability Properties and the Optimization Landscape of Training
Problems with Squared Loss for Neural Networks and General Nonlinear Conic
Approximation Schemes [0.0]
We study the optimization landscape and the stability properties of training problems with squared loss for neural networks and general nonlinear conic approximation schemes.
We prove that the same effects that are responsible for these instability properties are also the reason for the emergence of saddle points and spurious local minima.
arXiv Detail & Related papers (2020-11-06T11:34:59Z) - Global Optimization of Objective Functions Represented by ReLU Networks [77.55969359556032]
Neural networks can learn complex, non- adversarial functions, and it is challenging to guarantee their correct behavior in safety-critical contexts.
Many approaches exist to find failures in networks (e.g., adversarial examples), but these cannot guarantee the absence of failures.
We propose an approach that integrates the optimization process into the verification procedure, achieving better performance than the naive approach.
arXiv Detail & Related papers (2020-10-07T08:19:48Z) - Adaptive Regret for Control of Time-Varying Dynamics [31.319502238224334]
We introduce the metric of it adaptive regret to the field of control.
Our main contribution is a novel efficient meta-algorithm.
The main technical innovation is the first adaptive regret bound for the more general framework of online convex optimization with memory.
arXiv Detail & Related papers (2020-07-08T19:40:34Z) - Dynamic Regret of Convex and Smooth Functions [93.71361250701075]
We investigate online convex optimization in non-stationary environments.
We choose the dynamic regret as the performance measure.
We show that it is possible to further enhance the dynamic regret by exploiting the smoothness condition.
arXiv Detail & Related papers (2020-07-07T14:10:57Z) - Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning [56.23358327635815]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.<n>A central theme here is the adaptive discretization of the action space, which gradually zooms in'' on the more promising regions.<n>We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.