Toward Optimal Adversarial Policies in the Multiplicative Learning
System with a Malicious Expert
- URL: http://arxiv.org/abs/2001.00543v2
- Date: Fri, 18 Sep 2020 02:43:53 GMT
- Title: Toward Optimal Adversarial Policies in the Multiplicative Learning
System with a Malicious Expert
- Authors: S. Rasoul Etesami, Negar Kiyavash, Vincent Leon, H. Vincent Poor
- Abstract summary: We consider a learning system that combines experts' advice to predict a sequence of true outcomes.
It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system.
We show that a simple greedy policy of always reporting the false prediction is asymptotically optimal with an approximation ratio of $1+O(\sqrt{\frac{\ln N}{N}})$.
For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in $O(N^3)$.
- Score: 87.12201611818698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a learning system based on the conventional multiplicative weight
(MW) rule that combines experts' advice to predict a sequence of true outcomes.
It is assumed that one of the experts is malicious and aims to impose the
maximum loss on the system. The loss of the system is naturally defined to be
the aggregate absolute difference between the sequence of predicted outcomes
and the true outcomes. We consider this problem under both offline and online
settings. In the offline setting where the malicious expert must choose its
entire sequence of decisions a priori, we show somewhat surprisingly that a
simple greedy policy of always reporting the false prediction is asymptotically
optimal with an approximation ratio of $1+O(\sqrt{\frac{\ln N}{N}})$, where $N$
is the total number of prediction stages. In particular, we describe a policy
that closely resembles the structure of the optimal offline policy. For the
online setting where the malicious expert can adaptively make its decisions, we
show that the optimal online policy can be efficiently computed by solving a
dynamic program in $O(N^3)$. Our results provide a new direction for
assessing the vulnerability of commonly used learning algorithms to adversarial
attacks in which the threat is an integral part of the system.
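To make the setting concrete, the following is a minimal, hypothetical simulation of an MW forecaster facing one malicious expert that always reports the false prediction (the greedy policy described in the abstract). The learning rate, number of honest experts, and outcome sequence are illustrative assumptions, not values from the paper:

```python
import math

def mw_with_malicious_expert(outcomes, n_honest=4, eta=0.5):
    """The forecaster predicts the weighted average of expert advice in [0, 1].
    Honest experts report the true outcome; the malicious expert (last slot)
    always reports the opposite."""
    weights = [1.0] * (n_honest + 1)
    total_loss = 0.0
    for y in outcomes:  # y in {0, 1}
        advice = [y] * n_honest + [1 - y]  # malicious expert flips the outcome
        w_sum = sum(weights)
        p = sum(w * a for w, a in zip(weights, advice)) / w_sum
        total_loss += abs(p - y)  # aggregate absolute loss, as in the abstract
        # multiplicative update: shrink each weight by its absolute error
        weights = [w * math.exp(-eta * abs(a - y))
                   for w, a in zip(weights, advice)]
    return total_loss

loss = mw_with_malicious_expert([0, 1, 1, 0, 1, 0, 1, 1])
```

In this toy run the malicious expert's weight decays every round, so its per-round contribution to the loss shrinks; the paper's analysis quantifies how close this greedy attack comes to the optimal offline policy.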
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Online Policy Learning and Inference by Matrix Completion [12.527541242185404]
We formulate the problem as a matrix completion bandit (MCB)
We explore the $\epsilon$-greedy bandit and online gradient descent.
A faster decaying exploration yields smaller regret but learns the optimal policy less accurately.
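The exploration/accuracy trade-off noted above can be sketched with a generic $\epsilon$-greedy bandit using a decaying exploration rate. The arm means, noise level, and decay schedule below are illustrative assumptions, not details from that paper:

```python
import random

def epsilon_greedy(arm_means, horizon, decay=0.5, seed=0):
    """Generic epsilon-greedy bandit: explore with probability t^(-decay),
    otherwise exploit the arm with the highest running-mean estimate."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    estimates = [0.0] * k
    total_reward = 0.0
    for t in range(1, horizon + 1):
        eps = min(1.0, t ** (-decay))  # decaying exploration probability
        if rng.random() < eps:
            arm = rng.randrange(k)  # explore uniformly at random
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit
        r = arm_means[arm] + rng.gauss(0, 0.1)  # noisy reward
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]  # running mean
        total_reward += r
    return total_reward, estimates

total, est = epsilon_greedy([0.2, 0.8], horizon=500)
```

A larger `decay` cuts exploration sooner, which lowers short-term regret but leaves the estimates for rarely pulled arms less accurate, matching the trade-off described above.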
arXiv Detail & Related papers (2024-04-26T13:19:27Z)
- No-Regret Online Prediction with Strategic Experts [16.54912614895861]
We study a generalization of the online binary prediction with expert advice framework where at each round, the learner is allowed to pick $m \geq 1$ experts from a pool of $K$ experts.
We focus on the setting in which experts act strategically and aim to maximize their influence on the algorithm's predictions by potentially misreporting their beliefs.
Our goal is to design algorithms that satisfy the following two requirements: 1) $\textit{Incentive-compatible}$: incentivize the experts to report their beliefs truthfully, and 2) $\textit{No-regret}$: achieve sublinear regret.
arXiv Detail & Related papers (2023-05-24T16:43:21Z)
- Streaming Algorithms for Learning with Experts: Deterministic Versus Robust [62.98860182111096]
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times).
The goal is to make a prediction with the minimum cost, specifically compared to the best expert in the set.
We show a space lower bound of $\widetilde{\Omega}\left(\frac{nM}{RT}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes.
arXiv Detail & Related papers (2023-03-03T04:39:53Z)
- Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning [23.222448307481073]
We propose a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage.
Our algorithm combines the marginalized importance sampling framework with the actor-critic paradigm.
We provide both theoretical analysis and experimental results to validate the effectiveness of our proposed algorithm.
arXiv Detail & Related papers (2023-01-30T07:53:53Z)
- Optimal Tracking in Prediction with Expert Advice [0.0]
We study the prediction with expert advice setting, where the aim is to produce a decision by combining the decisions generated by a set of experts.
We achieve the min-max optimal dynamic regret under the prediction with expert advice setting.
Our algorithms are the first to produce such universally optimal, adaptive and truly online guarantees with no prior knowledge.
arXiv Detail & Related papers (2022-08-07T12:29:54Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a $\textit{policy-dependent}$ linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- Online Adversarial Attacks [57.448101834579624]
We formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases.
We first rigorously analyze a deterministic variant of the online threat model.
We then propose a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ over the current best single-threshold algorithm.
arXiv Detail & Related papers (2021-03-02T20:36:04Z)
- Malicious Experts versus the multiplicative weights algorithm in online prediction [85.62472761361107]
We consider a prediction problem with two experts and a forecaster.
We assume that one of the experts is honest and makes the correct prediction with probability $\mu$ at each round.
The other is malicious: it knows the true outcome at each round and makes predictions in order to maximize the loss of the forecaster.
arXiv Detail & Related papers (2020-03-18T20:12:08Z)
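The two-expert setting in the entry above can be sketched with a small simulation: one honest expert is correct with probability $\mu$, one malicious expert always reports the false outcome, and an MW forecaster averages their advice. All parameters ($\mu$, the learning rate, the horizon) are illustrative assumptions, not values from that paper:

```python
import math
import random

def two_expert_mw(mu=0.8, eta=0.3, horizon=200, seed=1):
    """MW forecaster with one honest expert (correct w.p. mu) and one
    malicious expert that knows the true outcome and reports its opposite."""
    rng = random.Random(seed)
    w_honest, w_mal = 1.0, 1.0
    loss = 0.0
    for _ in range(horizon):
        y = rng.randrange(2)  # true binary outcome
        honest = y if rng.random() < mu else 1 - y  # correct w.p. mu
        malicious = 1 - y  # adversary knows y and flips it
        # forecaster's prediction: weighted average of the two reports
        p = (w_honest * honest + w_mal * malicious) / (w_honest + w_mal)
        loss += abs(p - y)
        # multiplicative update penalizing each expert's absolute error
        w_honest *= math.exp(-eta * abs(honest - y))
        w_mal *= math.exp(-eta * abs(malicious - y))
    return loss

total_loss = two_expert_mw()
```

Because the honest expert is sometimes wrong, the malicious expert's weight decays more slowly than in the fully honest case, which is what makes the adversary's problem in that paper non-trivial.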
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.