Adversarial Counterfactual Environment Model Learning
- URL: http://arxiv.org/abs/2206.04890v2
- Date: Mon, 9 Oct 2023 02:23:27 GMT
- Title: Adversarial Counterfactual Environment Model Learning
- Authors: Xiong-Hui Chen, Yang Yu, Zheng-Mao Zhu, Zhihua Yu, Zhenjun Chen,
Chenghe Wang, Yinan Wu, Hongqiu Wu, Rong-Jun Qin, Ruijin Ding, Fangsheng
Huang
- Abstract summary: We introduce counterfactual-query risk minimization (CQRM) in model learning for generalizing to a counterfactual dataset queried by a specific target policy.
Since target policies can be diverse and unknown during policy learning, we propose an adversarial CQRM objective in which the model learns on counterfactual data queried by adversarial policies.
We apply GALILEO in synthetic tasks and a real-world application.
- Score: 14.47625059952581
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A good model for action-effect prediction, known as an environment
model, is important for sample-efficient policy learning in many domains, such
as robot control, recommender systems, and patient treatment selection. With
such a model, we can take unlimited trials to identify appropriate actions, so
that the cost of queries in the real world is saved. This requires the model
to handle unseen data, also called counterfactual data, correctly. However,
standard data-fitting techniques do not automatically achieve such
generalization and commonly yield unreliable models. In this work, we
introduce counterfactual-query risk minimization (CQRM), a model-learning
objective for generalizing to a counterfactual dataset queried by a specific
target policy. Since target policies can be diverse and unknown during policy
learning, we propose an adversarial CQRM objective in which the model learns
on counterfactual data queried by adversarial policies, and we derive a
tractable solution, GALILEO. We also find that adversarial CQRM is closely
related to adversarial model learning, which explains the effectiveness of the
latter. We apply GALILEO to synthetic tasks and a real-world application. The
results show that GALILEO makes accurate predictions on counterfactual data
and thus significantly improves policies in real-world testing.
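Concretely, adversarial CQRM is a min-max problem: the model minimizes its prediction risk under a data reweighting chosen adversarially to mimic worst-case query policies. The Python sketch below shows only the shape of that objective on a toy linear system; it is our illustration under simplifying assumptions, not the authors' GALILEO implementation, and every name and constant in it is hypothetical.

```python
# Toy min-max sketch of adversarial CQRM (illustrative only, not GALILEO).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged transitions: 1-D states/actions, linear true dynamics.
n = 512
s = rng.normal(size=n)
a = rng.normal(size=n)
s_next = 0.9 * s + 0.5 * a + 0.05 * rng.normal(size=n)

theta = np.zeros(2)   # environment model: s' ~= theta[0]*s + theta[1]*a
log_w = np.zeros(n)   # adversary's log-weights over the logged transitions

for _ in range(300):
    pred = theta[0] * s + theta[1] * a
    err = (pred - s_next) ** 2

    # Inner max: the "adversarial query policy" upweights the transitions
    # where the current model errs most.
    log_w += 0.1 * (err - err.mean())
    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    # Outer min: gradient step on the adversarially weighted risk.
    grad = 2 * np.array([np.sum(w * (pred - s_next) * s),
                         np.sum(w * (pred - s_next) * a)])
    theta -= 0.1 * grad

print("recovered dynamics coefficients:", theta)  # should approach [0.9, 0.5]
```

Minimizing the reweighted risk keeps the model accurate even on transitions an adversarial policy would query, which is the generalization property the abstract targets.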
Related papers
- Distributional Successor Features Enable Zero-Shot Policy Optimization [36.53356539916603]
This work proposes a novel class of models, Distributional Successor Features for Zero-Shot Policy Optimization (DiSPOs).
DiSPOs learn a distribution of successor features of a stationary dataset's behavior policy, along with a policy that acts to realize different successor features achievable within the dataset.
By directly modeling long-term outcomes in the dataset, DiSPOs avoid compounding error while enabling a simple scheme for zero-shot policy optimization across reward functions.
arXiv Detail & Related papers (2024-03-10T22:27:21Z)
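The zero-shot property claimed for DiSPOs above rests on the standard successor-feature identity: if rewards are linear in features, r(s) = phi(s)^T w, then long-term return factorizes as psi^T w, so a brand-new reward vector can be scored without replanning. Below is a minimal, hedged illustration of that identity with made-up numbers; it is not the DiSPOs training procedure.

```python
# Successor-feature identity behind zero-shot scoring (illustrative numbers).
import numpy as np

# Suppose three achievable "outcomes" have been distilled from the dataset,
# each summarized by its discounted feature expectations psi_k.
psi = np.array([[1.0, 0.2],   # outcome 0: mostly feature 0
                [0.1, 0.9],   # outcome 1: mostly feature 1
                [0.5, 0.5]])  # outcome 2: balanced

w_new = np.array([0.0, 1.0])  # a reward weighting seen only at test time

returns = psi @ w_new         # expected return of each outcome under w_new
best = int(np.argmax(returns))
print(f"zero-shot choice: outcome {best}, return {returns[best]:.2f}")
```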
- SCME: A Self-Contrastive Method for Data-free and Query-Limited Model Extraction Attack [18.998300969035885]
Model extraction attacks fool the target model by generating adversarial examples on a substitute model.
We propose a novel data-free model extraction method named SCME, which considers both the inter- and intra-class diversity in synthesizing fake data.
arXiv Detail & Related papers (2023-10-15T10:41:45Z)
- Dual policy as self-model for planning [71.73710074424511]
We refer to the model used to simulate one's decisions as the agent's self-model.
Inspired by current reinforcement learning approaches and neuroscience, we explore the benefits and limitations of using a distilled policy network as the self-model.
arXiv Detail & Related papers (2023-06-07T13:58:45Z)
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z)
- Suppressing Poisoning Attacks on Federated Learning for Medical Imaging [4.433842217026879]
We propose a robust aggregation rule called Distance-based Outlier Suppression (DOS) that is resilient to Byzantine failures.
The proposed method computes the distance between local parameter updates of different clients and obtains an outlier score for each client.
The resulting outlier scores are converted into normalized weights using a softmax function, and a weighted average of the local parameters is used for updating the global model.
arXiv Detail & Related papers (2022-07-15T00:43:34Z)
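The DOS description above is procedural enough to sketch. Below is our reading in plain NumPy, with a simple mean-distance score standing in for the paper's actual outlier detector; the function and variable names are ours:

```python
# Sketch of distance-based outlier suppression for federated aggregation.
import numpy as np

def dos_aggregate(updates: np.ndarray) -> np.ndarray:
    """updates: (n_clients, n_params) array of local parameter updates."""
    # Pairwise Euclidean distances between client updates.
    diff = updates[:, None, :] - updates[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)   # (n_clients, n_clients)

    # Outlier score per client: mean distance to every other client
    # (a simple stand-in for the paper's outlier detector).
    score = dist.mean(axis=1)

    # Softmax over negative scores: outliers receive near-zero weight.
    z = -score
    w = np.exp(z - z.max())
    w /= w.sum()

    return w @ updates                     # weighted average of local updates

rng = np.random.default_rng(1)
clients = rng.normal(size=(9, 4))
clients[0] += 25.0                         # one poisoned client
print(dos_aggregate(clients))              # poisoned update is suppressed
```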
- Fully Decentralized Model-based Policy Optimization for Networked Systems [23.46407780093797]
This work aims to improve data efficiency of multi-agent control by model-based learning.
We consider networked systems where agents are cooperative and communicate only locally with their neighbors.
In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions to its neighbors; the policies are then trained on the model rollouts.
arXiv Detail & Related papers (2022-07-13T23:52:14Z)
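To make the communication pattern above concrete, here is a deliberately tiny sketch of agents on a ring exchanging model predictions with their neighbors. It omits the policy-learning step entirely and the dynamics are a hand-coded placeholder, so treat it only as an illustration of local broadcast:

```python
# Toy local-broadcast rollout on a ring of agents (policy learning omitted).
import numpy as np

n_agents = 4
rng = np.random.default_rng(3)
states = rng.normal(size=n_agents)

def local_model(own, neighbor_preds):
    # Placeholder for each agent's learned dynamics model.
    return 0.8 * own + 0.2 * np.mean(neighbor_preds)

for _ in range(3):
    preds = states.copy()  # each agent predicts and broadcasts its next state
    # Each agent updates using only its two ring neighbors' broadcasts.
    states = np.array([
        local_model(states[i],
                    [preds[(i - 1) % n_agents], preds[(i + 1) % n_agents]])
        for i in range(n_agents)
    ])
print("states after 3 model-rollout steps:", states)
```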
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
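A common instantiation of the kind of planner evaluated and distilled above is random-shooting MPC over a learned model. The toy sketch below uses a hand-coded stand-in for the learned dynamics and is only meant to show the planning loop, not the paper's setup:

```python
# Random-shooting MPC over a (placeholder) learned dynamics model.
import numpy as np

rng = np.random.default_rng(2)

def model(s, a):          # stand-in for a learned model: s' = f(s, a)
    return 0.95 * s + a

def reward(s):            # task: drive the state toward the origin
    return -s ** 2

def mpc_action(s, horizon=10, n_samples=256):
    # Sample action sequences, roll them through the model,
    # and return the first action of the best-scoring sequence.
    seqs = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))
    returns = np.zeros(n_samples)
    states = np.full(n_samples, s)
    for t in range(horizon):
        states = model(states, seqs[:, t])
        returns += reward(states)
    return seqs[np.argmax(returns), 0]

s = 3.0
for _ in range(5):
    s = model(s, mpc_action(s))
print("state after 5 MPC steps:", s)   # should be near 0
```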
- Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z)
- Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, as measured by both public health and economic outcomes, compared to past outcomes.
arXiv Detail & Related papers (2021-08-06T01:30:41Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
In contrast, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)