Robustified Learning for Online Optimization with Memory Costs
- URL: http://arxiv.org/abs/2305.00677v1
- Date: Mon, 1 May 2023 06:12:01 GMT
- Title: Robustified Learning for Online Optimization with Memory Costs
- Authors: Pengfei Li, Jianyi Yang, Shaolei Ren
- Abstract summary: We propose a novel expert-robustified learning (ERL) approach, achieving both good average performance and robustness.
For any $\lambda\geq1$, ERL achieves $\lambda$-competitiveness against the expert algorithm and $\lambda\cdot C$-competitiveness against the optimal offline algorithm.
- Score: 28.737193318136725
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Online optimization with memory costs has many real-world applications, where
sequential actions are made without knowing the future input. Nonetheless, the
memory cost couples the actions over time, adding substantial challenges.
Conventionally, this problem has been approached by various expert-designed
online algorithms with the goal of achieving bounded worst-case competitive
ratios, but the resulting average performance is often unsatisfactory. On the
other hand, emerging machine learning (ML) based optimizers can improve the
average performance, but suffer from the lack of worst-case performance
robustness. In this paper, we propose a novel expert-robustified learning (ERL)
approach, achieving both good average performance and robustness. More
concretely, for robustness, ERL introduces a novel projection operator that
robustifies ML actions by utilizing an expert online algorithm; for average
performance, ERL trains the ML optimizer based on a recurrent architecture by
explicitly considering downstream expert robustification. We prove that, for
any $\lambda\geq1$, ERL achieves $\lambda$-competitiveness against the expert
algorithm and $\lambda\cdot C$-competitiveness against the optimal offline
algorithm (where $C$ is the expert's competitive ratio). Additionally, we
extend our analysis to a novel setting of multi-step memory costs. Finally, our
analysis is supported by empirical experiments for an energy scheduling
application.
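To make the robustification idea concrete, here is a minimal sketch in Python. It assumes scalar actions and a `step_cost(x)` callback returning the current round's cost of playing `x`; the greedy bisection toward the expert action is an illustrative stand-in for the paper's constrained projection operator, not its exact construction.

```python
def erl_robustify(ml_action, expert_action, ml_cum_cost, expert_cum_cost,
                  step_cost, lam=1.5):
    """Keep the robustified trajectory within lam times the expert's cost.

    A sketch under simplifying assumptions: actions are scalars and
    step_cost(x) returns this round's cost of playing x (hitting plus
    memory/switching terms). The paper's actual operator is a
    constrained projection; this greedy interpolation only illustrates
    falling back toward the expert when the ML action is unsafe.
    """
    budget = lam * (expert_cum_cost + step_cost(expert_action))
    if ml_cum_cost + step_cost(ml_action) <= budget:
        return ml_action                      # ML action is already safe
    # Bisect for the furthest point toward the ML action that stays
    # within budget (valid when step_cost is convex).
    lo, hi = 0.0, 1.0                         # 0 = expert action, 1 = ML action
    for _ in range(30):
        mid = 0.5 * (lo + hi)
        x = expert_action + mid * (ml_action - expert_action)
        if ml_cum_cost + step_cost(x) <= budget:
            lo = mid
        else:
            hi = mid
    return expert_action + lo * (ml_action - expert_action)
```

Intuitively, $\lambda$ trades the two goals off: a smaller $\lambda$ keeps the robustified actions close to the expert (stronger worst-case guarantees), while a larger $\lambda$ leaves the ML optimizer more room to improve average performance.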
Related papers
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
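As a rough illustration of adaptively blending logistic and exponential losses, consider the sketch below; the sigmoid gate and this exact mixing rule are assumptions, not the published DiscoPOP loss.

```python
import math

def blended_preference_loss(margin):
    """Adaptively blend logistic and exponential preference losses.

    `margin` stands for the scaled log-ratio difference between the
    chosen and rejected responses (a DPO-style quantity); the gating
    and mixing below are illustrative assumptions.
    """
    logistic = math.log1p(math.exp(-margin))   # logistic (DPO-style) loss
    exponential = math.exp(-margin)            # exponential loss
    gate = 1.0 / (1.0 + math.exp(-margin))     # adaptive mixing weight
    return gate * logistic + (1.0 - gate) * exponential
```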
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
- Cost-Sensitive Multi-Fidelity Bayesian Optimization with Transfer of Learning Curve Extrapolation [55.75188191403343]
We introduce a utility function, predefined by each user, that describes the trade-off between the cost and performance of BO.
We validate our algorithm on various learning-curve (LC) datasets and find that it outperforms all previous multi-fidelity BO and transfer-BO baselines we consider.
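For example, a user-predefined utility could be as simple as a linear trade-off; the form and the weight `lam` below are purely illustrative.

```python
def utility(performance, cost, lam=0.1):
    """An example user-predefined utility: reward performance, penalize
    cost. The linear form and weight lam are illustrative assumptions;
    each user supplies their own trade-off function."""
    return performance - lam * cost
```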
arXiv Detail & Related papers (2024-05-28T07:38:39Z)
- $i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization [12.266207199002604]
Large Language Models (LLM) can sometimes produce outputs that deviate from human expectations.
We propose a novel framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization.
We show that $i$REPO effectively achieves self-alignment using soft-label, self-generated responses and the logit of empirical AI annotators.
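A hedged sketch of regressing an implicit-reward pairwise difference toward an empirical soft label follows; the DPO-style implicit reward and the signed-target mapping are assumptions, not the paper's exact objective.

```python
def irepo_pair_loss(logp_pi_a, logp_ref_a, logp_pi_b, logp_ref_b,
                    soft_label, beta=0.1):
    """Least-squares regression of an implicit-reward pairwise difference.

    The implicit reward r = beta * (log pi - log pi_ref) follows the
    DPO convention; mapping the empirical soft label (probability that
    response `a` is preferred) to a signed target is an assumption.
    """
    r_a = beta * (logp_pi_a - logp_ref_a)     # implicit reward of response a
    r_b = beta * (logp_pi_b - logp_ref_b)     # implicit reward of response b
    target = soft_label - 0.5                 # assumed signed regression target
    return (r_a - r_b - target) ** 2
```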
arXiv Detail & Related papers (2024-05-24T05:42:11Z)
- Best of Both Worlds Guarantees for Smoothed Online Quadratic Optimization [9.449153668916098]
We study the smoothed online quadratic optimization (SOQO) problem where, at each round $t$, a player plays an action $x_t$ in response to a quadratic hitting cost and an additional squared $\ell_2$-norm cost for switching actions.
This problem class has strong connections to a wide range of application domains including smart grid management, adaptive control, and data center management.
We present a best-of-both-worlds algorithm that obtains a robust adversarial performance while simultaneously achieving a near-optimal performance.
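The per-round cost structure can be written down directly; in the sketch below, the quadratic hitting cost's parameterization (curvature `a`, minimizer `v_t`) is an assumption, while the squared $\ell_2$ switching cost follows the summary above.

```python
import numpy as np

def soqo_round_cost(x_t, x_prev, v_t, a=1.0):
    """Per-round cost in the SOQO setting sketched above.

    Hitting cost: a quadratic a * ||x_t - v_t||^2 centered at this
    round's minimizer v_t (this parameterization is an assumption);
    switching cost: the squared l2-norm ||x_t - x_prev||^2.
    """
    hitting = a * float(np.sum((np.asarray(x_t) - np.asarray(v_t)) ** 2))
    switching = float(np.sum((np.asarray(x_t) - np.asarray(x_prev)) ** 2))
    return hitting + switching
```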
arXiv Detail & Related papers (2023-10-31T22:59:23Z)
- Robust Learning for Smoothed Online Convex Optimization with Feedback Delay [43.85262428603507]
We propose a novel machine learning (ML) augmented online algorithm, Robustness-Constrained Learning (RCL).
RCL combines untrusted ML predictions with a trusted expert online algorithm via constrained projection to robustify the ML prediction.
RCL is the first ML-augmented algorithm with a provable robustness guarantee in the case of multi-step switching cost and feedback delay.
arXiv Detail & Related papers (2023-10-31T00:22:55Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
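A minimal sketch of a single objective fusing planning (value) and estimation (model fit) might look as follows; the exact MEX objective and the role of the temperature `eta` are assumptions based on this summary.

```python
def mex_objective(value_estimate, neg_log_likelihood, eta=1.0):
    """Single scalar objective over candidate hypotheses (a sketch).

    Maximizing estimated value minus an estimation penalty trades off
    exploitation (high value) against hypotheses poorly supported by
    data; the exact form is an assumption, not the paper's objective.
    """
    return value_estimate - eta * neg_log_likelihood
```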
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters [35.17151863463472]
We take a renewed look at how ensembles of $Q$-functions can be leveraged as the primary source of pessimism for offline reinforcement learning (RL).
We propose MSG, a practical offline RL algorithm that trains an ensemble of $Q$-functions with independently computed targets based on completely separate networks.
Our experiments on the popular D4RL and RL Unplugged offline RL benchmarks demonstrate that MSG with deep ensembles surpasses highly well-tuned state-of-the-art methods by a wide margin.
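The sketch below illustrates the key design choice: each ensemble member bootstraps from its own independently computed target rather than a shared aggregate, and the policy is scored pessimistically. Array shapes, names, and the lower-confidence-bound form are illustrative assumptions.

```python
import numpy as np

def msg_targets(rewards, next_q_per_member, gamma=0.99):
    """Independent TD targets, one per ensemble member (a sketch).

    Each Q-function bootstraps only from its *own* target network
    (rows of next_q_per_member), never from a shared aggregate.
    rewards: (batch,); next_q_per_member: (n_members, batch).
    """
    return np.asarray(rewards)[None, :] + gamma * np.asarray(next_q_per_member)

def pessimistic_value(q_per_member, beta=2.0):
    """Score actions by a lower confidence bound over the ensemble:
    mean minus beta times the standard deviation."""
    q = np.asarray(q_per_member)
    return q.mean(axis=0) - beta * q.std(axis=0)
```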
arXiv Detail & Related papers (2022-05-27T01:30:12Z)
- Expert-Calibrated Learning for Online Optimization with Switching Costs [28.737193318136725]
We study online convex optimization with switching costs.
By tapping into the power of machine learning (ML) based optimizers, ML-augmented online algorithms have been emerging as the state of the art.
We propose EC-L2O (expert-calibrated learning to optimize), which trains an ML-based algorithm by explicitly taking into account the expert calibrator.
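A minimal sketch of training through a differentiable calibrator, assuming PyTorch: the clamp-based interpolation below is a soft stand-in for the paper's expert calibrator, and all names are illustrative assumptions.

```python
import torch

def train_step(model, cost_fn, expert_action, context, optimizer, lam=1.5):
    """One training step through a differentiable calibrator (a sketch).

    The point is that gradients flow through the calibration, so the
    ML model learns with the downstream calibrator explicitly taken
    into account; the calibration rule itself is an assumption.
    """
    raw = model(context)                                   # untrusted ML action
    safe = expert_action + torch.clamp(raw - expert_action, -lam, lam)
    loss = cost_fn(safe, context)                          # cost on the final action
    optimizer.zero_grad()
    loss.backward()                                        # gradients pass through calibration
    optimizer.step()
    return loss.item()
```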
arXiv Detail & Related papers (2022-04-18T21:54:33Z)
- Learning to Optimize: A Primer and A Benchmark [94.29436694770953]
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods.
This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization.
arXiv Detail & Related papers (2021-03-23T20:46:20Z)
- Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the Online Apprenticeship Learning (OAL) problem can be effectively solved by combining two mirror-descent-based no-regret algorithms.
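A toy sketch of interleaving two mirror-descent (no-regret) updates, one for an adversarial cost player and one for the policy player, is shown below; the finite action set, linear costs, and learning rates are assumptions for illustration.

```python
import numpy as np

def eg_step(w, grad, lr=0.1):
    """Mirror descent on the simplex, i.e., an exponentiated-gradient
    update followed by renormalization."""
    w = w * np.exp(lr * grad)
    return w / w.sum()

def oal_round(cost_w, policy, features, expert_features, lr=0.1):
    """One round of two interleaved no-regret updates (a toy sketch).

    features[a] is the feature vector of action a; costs are linear in
    features. The cost player raises weight where the learner lags the
    expert; the policy player shifts mass toward low-cost actions.
    """
    learner_features = policy @ features                   # expected features
    cost_w = eg_step(cost_w, learner_features - expert_features, lr)
    action_costs = features @ cost_w                       # per-action cost
    policy = eg_step(policy, -action_costs, lr)            # descend on cost
    return cost_w, policy
```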
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
- Cost-Efficient Online Hyperparameter Optimization [94.60924644778558]
We propose an online HPO algorithm that reaches human expert-level performance within a single run of the experiment, while incurring only modest computational overhead compared to regular training.
arXiv Detail & Related papers (2021-01-17T04:55:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.