Toward Optimal Adversarial Policies in the Multiplicative Learning
System with a Malicious Expert
- URL: http://arxiv.org/abs/2001.00543v2
- Date: Fri, 18 Sep 2020 02:43:53 GMT
- Title: Toward Optimal Adversarial Policies in the Multiplicative Learning
System with a Malicious Expert
- Authors: S. Rasoul Etesami, Negar Kiyavash, Vincent Leon, H. Vincent Poor
- Abstract summary: We consider a learning system that combines experts' advice to predict a sequence of true outcomes.
It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system.
We show that a simple greedy policy of always reporting the false prediction is asymptotically optimal with an approximation ratio of $1+O(\sqrt{\frac{\ln N}{N}})$.
For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in $O(N^3)$.
- Score: 87.12201611818698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a learning system based on the conventional multiplicative weight
(MW) rule that combines experts' advice to predict a sequence of true outcomes.
It is assumed that one of the experts is malicious and aims to impose the
maximum loss on the system. The loss of the system is naturally defined to be
the aggregate absolute difference between the sequence of predicted outcomes
and the true outcomes. We consider this problem under both offline and online
settings. In the offline setting where the malicious expert must choose its
entire sequence of decisions a priori, we show somewhat surprisingly that a
simple greedy policy of always reporting the false prediction is asymptotically
optimal with an approximation ratio of $1+O(\sqrt{\frac{\ln N}{N}})$, where $N$
is the total number of prediction stages. In particular, we describe a policy
that closely resembles the structure of the optimal offline policy. For the
online setting where the malicious expert can adaptively make its decisions, we
show that the optimal online policy can be efficiently computed by solving a
dynamic program in $O(N^3)$. Our results provide a new direction for
assessing the vulnerability of commonly used learning algorithms to adversarial
attacks in which the threat is an integral part of the system.
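To make the setting concrete, the following is a minimal, hypothetical simulation of an MW forecaster facing one malicious expert that always reports the false prediction (the greedy policy described in the abstract). The learning rate, number of honest experts, and outcome sequence are illustrative assumptions, not values from the paper:

```python
import math

def mw_with_malicious_expert(outcomes, n_honest=4, eta=0.5):
    """The forecaster predicts the weighted average of expert advice in [0, 1].
    Honest experts report the true outcome; the malicious expert (last slot)
    always reports the opposite."""
    weights = [1.0] * (n_honest + 1)
    total_loss = 0.0
    for y in outcomes:  # y in {0, 1}
        advice = [y] * n_honest + [1 - y]  # malicious expert flips the outcome
        w_sum = sum(weights)
        p = sum(w * a for w, a in zip(weights, advice)) / w_sum
        total_loss += abs(p - y)  # aggregate absolute loss, as in the abstract
        # multiplicative update: shrink each weight by its absolute error
        weights = [w * math.exp(-eta * abs(a - y))
                   for w, a in zip(weights, advice)]
    return total_loss

loss = mw_with_malicious_expert([0, 1, 1, 0, 1, 0, 1, 1])
```

In this toy run the malicious expert's weight decays every round, so its per-round contribution to the loss shrinks; the paper's analysis quantifies how close this greedy attack comes to the optimal offline policy.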
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Online Policy Learning and Inference by Matrix Completion [12.527541242185404]
We formulate the problem as a matrix completion bandit (MCB)
We explore the $\epsilon$-greedy bandit and online gradient descent.
A faster decaying exploration yields smaller regret but learns the optimal policy less accurately.
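The exploration/accuracy trade-off noted above can be sketched with a generic $\epsilon$-greedy bandit using a decaying exploration rate. The arm means, noise level, and decay schedule below are illustrative assumptions, not details from that paper:

```python
import random

def epsilon_greedy(arm_means, horizon, decay=0.5, seed=0):
    """Generic epsilon-greedy bandit: explore with probability t^(-decay),
    otherwise exploit the arm with the highest running-mean estimate."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    estimates = [0.0] * k
    total_reward = 0.0
    for t in range(1, horizon + 1):
        eps = min(1.0, t ** (-decay))  # decaying exploration probability
        if rng.random() < eps:
            arm = rng.randrange(k)  # explore uniformly at random
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit
        r = arm_means[arm] + rng.gauss(0, 0.1)  # noisy reward
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]  # running mean
        total_reward += r
    return total_reward, estimates

total, est = epsilon_greedy([0.2, 0.8], horizon=500)
```

A larger `decay` cuts exploration sooner, which lowers short-term regret but leaves the estimates for rarely pulled arms less accurate, matching the trade-off described above.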
arXiv Detail & Related papers (2024-04-26T13:19:27Z)
- No-Regret Online Prediction with Strategic Experts [16.54912614895861]
We study a generalization of the online binary prediction with expert advice framework where at each round, the learner is allowed to pick $m \geq 1$ experts from a pool of $K$ experts.
We focus on the setting in which experts act strategically and aim to maximize their influence on the algorithm's predictions by potentially misreporting their beliefs.
Our goal is to design algorithms that satisfy the following two requirements: 1) $\textit{Incentive-compatible}$: incentivize the experts to report their beliefs truthfully, and 2) $\textit{No-regret}$: achieve sublinear regret.
arXiv Detail & Related papers (2023-05-24T16:43:21Z)
- Streaming Algorithms for Learning with Experts: Deterministic Versus Robust [62.98860182111096]
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times).
The goal is to make a prediction with the minimum cost, specifically compared to the best expert in the set.
We show a space lower bound of $\widetilde{\Omega}\left(\frac{nM}{RT}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes.
arXiv Detail & Related papers (2023-03-03T04:39:53Z)
- Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning [23.222448307481073]
We propose a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage.
Our algorithm combines the marginalized importance sampling framework with the actor-critic paradigm.
We provide both theoretical analysis and experimental results to validate the effectiveness of our proposed algorithm.
arXiv Detail & Related papers (2023-01-30T07:53:53Z)
- Optimal Tracking in Prediction with Expert Advice [0.0]
We study the prediction with expert advice setting, where the aim is to produce a decision by combining the decisions generated by a set of experts.
We achieve the min-max optimal dynamic regret under the prediction with expert advice setting.
Our algorithms are the first to produce such universally optimal, adaptive and truly online guarantees with no prior knowledge.
arXiv Detail & Related papers (2022-08-07T12:29:54Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a $\textit{policy-dependent}$ linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- Online Adversarial Attacks [57.448101834579624]
We formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases.
We first rigorously analyze a deterministic variant of the online threat model.
We then propose a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ over the current best single-threshold algorithm.
arXiv Detail & Related papers (2021-03-02T20:36:04Z)
- Malicious Experts versus the multiplicative weights algorithm in online prediction [85.62472761361107]
We consider a prediction problem with two experts and a forecaster.
We assume that one of the experts is honest and makes the correct prediction with probability $\mu$ at each round.
The other is malicious: it knows the true outcome at each round and makes predictions in order to maximize the loss of the forecaster.
arXiv Detail & Related papers (2020-03-18T20:12:08Z)
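The two-expert setting in the entry above can be sketched with a small simulation: one honest expert is correct with probability $\mu$, one malicious expert always reports the false outcome, and an MW forecaster averages their advice. All parameters ($\mu$, the learning rate, the horizon) are illustrative assumptions, not values from that paper:

```python
import math
import random

def two_expert_mw(mu=0.8, eta=0.3, horizon=200, seed=1):
    """MW forecaster with one honest expert (correct w.p. mu) and one
    malicious expert that knows the true outcome and reports its opposite."""
    rng = random.Random(seed)
    w_honest, w_mal = 1.0, 1.0
    loss = 0.0
    for _ in range(horizon):
        y = rng.randrange(2)  # true binary outcome
        honest = y if rng.random() < mu else 1 - y  # correct w.p. mu
        malicious = 1 - y  # adversary knows y and flips it
        # forecaster's prediction: weighted average of the two reports
        p = (w_honest * honest + w_mal * malicious) / (w_honest + w_mal)
        loss += abs(p - y)
        # multiplicative update penalizing each expert's absolute error
        w_honest *= math.exp(-eta * abs(honest - y))
        w_mal *= math.exp(-eta * abs(malicious - y))
    return loss

total_loss = two_expert_mw()
```

Because the honest expert is sometimes wrong, the malicious expert's weight decays more slowly than in the fully honest case, which is what makes the adversary's problem in that paper non-trivial.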
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.