Reinforcement Learning for Linear Quadratic Control is Vulnerable Under
Cost Manipulation
- URL: http://arxiv.org/abs/2203.05774v1
- Date: Fri, 11 Mar 2022 06:59:42 GMT
- Title: Reinforcement Learning for Linear Quadratic Control is Vulnerable Under
Cost Manipulation
- Authors: Yunhan Huang and Quanyan Zhu
- Abstract summary: We study the deception of a Linear-Quadratic-Gaussian (LQG) agent by manipulating the cost signals.
We show that a small falsification of the cost parameters will only lead to a bounded change in the optimal policy.
- Score: 22.755411056179813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study the deception of a Linear-Quadratic-Gaussian (LQG)
agent by manipulating the cost signals. We show that a small falsification of
the cost parameters will only lead to a bounded change in the optimal policy,
and the bound is linear in the amount of falsification the attacker can apply
to the cost parameters. We propose an attack model in which the attacker's goal
is to mislead the agent into learning a 'nefarious' policy through intended
falsification of the cost parameters. We formulate the attacker's problem as an
optimization problem, which we prove to be convex, and we develop necessary and
sufficient conditions to check the achievability of the attacker's goal.
We showcase the adversarial manipulation on two types of LQG learners: the
batch RL learner and the adaptive dynamic programming (ADP) learner. Our
results demonstrate that with only 2.296% falsification of the cost data, the
attacker misleads the batch RL learner into learning the 'nefarious' policy
that leads the vehicle to a dangerous position. The attacker can also
gradually trick the ADP learner into learning the same 'nefarious' policy by
consistently feeding the learner a falsified cost signal that stays close to
the true cost signal. The aim of the paper is to raise awareness of the
security threats faced by RL-enabled control systems.
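
The bounded-change claim can be made concrete with a short numerical sketch. The snippet below is illustrative only and not taken from the paper: the system matrices, the perturbation direction `Delta`, and the step sizes are assumptions chosen to show how the LQR-optimal gain moves roughly in proportion to a small falsification of the state-cost matrix.

```python
# Illustrative sketch (not the paper's experiment): how a small perturbation of
# the LQR cost matrices shifts the optimal feedback gain. The system (A, B),
# costs (Q, R), and perturbation sizes below are made-up examples.
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Optimal gain K (u = -K x) for x_{t+1} = A x + B u with cost x'Qx + u'Ru."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Toy double-integrator-like system (assumed for illustration).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K_true = lqr_gain(A, B, Q, R)

# Falsify the state-cost matrix by increasingly large (but still small) amounts
# and watch how far the resulting optimal gain drifts from the true one.
Delta = np.array([[0.0, 0.0], [0.0, 1.0]])  # assumed falsification direction
for eps in (0.01, 0.05, 0.10):
    K_falsified = lqr_gain(A, B, Q + eps * Delta, R)
    drift = np.linalg.norm(K_falsified - K_true)
    print(f"eps = {eps:4.2f}  ||K_falsified - K_true|| = {drift:.4f}")
```

Running this shows the gain deviation shrinking and growing roughly in proportion to `eps`, which is the qualitative behavior the linear bound describes.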
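The paper's exact convex attack formulation is not reproduced in this listing, so the following is only a hedged sketch of one standard way to pose cost falsification as a convex program, via inverse-LQR optimality conditions expressed as a semidefinite program: find the smallest perturbation (dQ, dR) under which a chosen target gain `K_target` satisfies the discrete-time LQR optimality conditions. All names, the objective, and the SDP framing are assumptions, not the authors' model; it uses `cvxpy` with an SDP-capable solver such as SCS.

```python
# Hedged sketch of a convex cost-falsification attack (assumed formulation,
# not the paper's): find the smallest perturbation (dQ, dR) of the cost
# matrices under which a chosen 'nefarious' gain K_target is LQR-optimal.
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K_target = np.array([[2.0, 3.0]])   # attacker's desired feedback gain (u = -K x)

n, m = B.shape
Acl = A - B @ K_target              # closed-loop matrix under the target policy

dQ = cp.Variable((n, n), symmetric=True)
dR = cp.Variable((m, m), symmetric=True)
P = cp.Variable((n, n), symmetric=True)

constraints = [
    # Bellman/Lyapunov equation: P is the cost-to-go of the target policy
    # under the falsified costs.
    Q + dQ + K_target.T @ (R + dR) @ K_target + Acl.T @ P @ Acl - P == 0,
    # Stationarity: K_target is the greedy gain with respect to that P.
    (R + dR + B.T @ P @ B) @ K_target == B.T @ P @ A,
    # Falsified costs must remain a legitimate LQR problem.
    Q + dQ >> 0,
    R + dR >> 1e-6 * np.eye(m),
    P >> 0,
]

# Minimize the amount of falsification the attacker has to apply.
problem = cp.Problem(cp.Minimize(cp.norm(dQ, "fro") + cp.norm(dR, "fro")),
                     constraints)
problem.solve(solver=cp.SCS)

# An infeasible status means this particular target gain cannot be induced by
# any PSD cost falsification, loosely mirroring the role of the paper's
# achievability conditions. (For strict optimality one should also check that
# Acl is stable, i.e. P is the stabilizing Riccati solution.)
print("status:", problem.status)
print("dQ =\n", dQ.value)
print("dR =\n", dR.value)
```

All constraints are affine in (dQ, dR, P) plus semidefinite cone constraints, so the program is convex, which is consistent with (though not identical to) the abstract's claim that the attacker's problem can be posed as a convex optimization.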
Related papers
- Provably Efficient Action-Manipulation Attack Against Continuous Reinforcement Learning [49.48615590763914]
We propose a black-box attack algorithm named LCBT, which uses the Monte Carlo tree search method for efficient action searching and manipulation.
We conduct the proposed attacks on three algorithms in continuous settings, DDPG, PPO, and TD3, and observe promising attack performance.
arXiv Detail & Related papers (2024-11-20T08:20:29Z) - AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement
Learning [3.4806267677524896]
We propose AutoCost, a framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance.
We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with baseline agents that use the same policy learners but with only extrinsic costs.
arXiv Detail & Related papers (2023-01-24T22:51:29Z) - FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z) - The Power and Limitation of Pretraining-Finetuning for Linear Regression
under Covariate Shift [127.21287240963859]
We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data.
For a large class of linear regression instances, transfer learning with $O(N^2)$ source data is as effective as supervised learning with $N$ target data.
arXiv Detail & Related papers (2022-08-03T05:59:49Z) - Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation
and Complexity Analysis [20.11993437283895]
This paper provides a game-theoretical underpinning for understanding this type of security risk.
We define the sampling attack model as a Stackelberg game between the attacker and the agent, which yields a minimax formulation.
We observe that a minor effort of the attacker can significantly deteriorate the learning performance.
arXiv Detail & Related papers (2022-07-29T21:29:29Z) - Reward Poisoning Attacks on Offline Multi-Agent Reinforcement Learning [17.80728511507729]
An attacker can modify the reward vectors to different learners in an offline data set while incurring a poisoning cost.
We show how the attacker can formulate a linear program to minimize its poisoning cost.
Our work shows the need for robust MARL against adversarial attacks.
arXiv Detail & Related papers (2022-06-04T03:15:57Z) - Projective Ranking-based GNN Evasion Attacks [52.85890533994233]
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks.
GNNs are at risk of adversarial attacks.
arXiv Detail & Related papers (2022-02-25T21:52:09Z) - Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards.
arXiv Detail & Related papers (2021-02-12T15:53:48Z) - Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z) - Manipulating Reinforcement Learning: Poisoning Attacks on Cost Signals [22.755411056179813]
This chapter studies emerging cyber-attacks on reinforcement learning (RL).
We analyze the performance degradation of TD($\lambda$) and $Q$-learning algorithms under the manipulation.
A case study of TD($\lambda$) learning is provided to corroborate the results.
arXiv Detail & Related papers (2020-02-07T15:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.