Object-Aware Regularization for Addressing Causal Confusion in Imitation
Learning
- URL: http://arxiv.org/abs/2110.14118v1
- Date: Wed, 27 Oct 2021 01:56:23 GMT
- Title: Object-Aware Regularization for Addressing Causal Confusion in Imitation
Learning
- Authors: Jongjin Park, Younggyo Seo, Chang Liu, Li Zhao, Tao Qin, Jinwoo Shin,
Tie-Yan Liu
- Abstract summary: This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
- Score: 131.1852444489217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Behavioral cloning has proven to be effective for learning sequential
decision-making policies from expert demonstrations. However, behavioral
cloning often suffers from the causal confusion problem, where a policy relies
on the noticeable effects of expert actions, which are strongly correlated with
those actions, rather than on the true causes we desire. This paper presents Object-aware REgularizatiOn
(OREO), a simple technique that regularizes an imitation policy in an
object-aware manner. Our main idea is to encourage a policy to uniformly attend
to all semantic objects, in order to prevent the policy from exploiting
nuisance variables strongly correlated with expert actions. To this end, we
introduce a two-stage approach: (a) we extract semantic objects from images by
utilizing discrete codes from a vector-quantized variational autoencoder, and
(b) we randomly drop the units that share the same discrete code together,
i.e., masking out semantic objects. Our experiments demonstrate that OREO
significantly improves the performance of behavioral cloning, outperforming
various other regularization and causality-based methods on a variety of Atari
environments and a self-driving CARLA environment. We also show that our method
even outperforms inverse reinforcement learning methods trained with a
considerable amount of environment interaction.
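To make the two-stage approach concrete, here is a minimal sketch (not the authors' code) of stage (b), assuming a pre-trained VQ-VAE encoder has already mapped an observation to a grid of discrete codes aligned with a convolutional feature map; all spatial units sharing a code are kept or dropped together, which approximates masking out whole semantic objects:

```python
# Hypothetical sketch of object-aware dropout, assuming `codes` is the [H, W]
# grid of discrete VQ-VAE code indices for one observation and `features` is
# the matching [C, H, W] feature map fed to the policy network.
import torch

def object_aware_dropout(features: torch.Tensor,
                         codes: torch.Tensor,
                         drop_prob: float = 0.5) -> torch.Tensor:
    """Zero out all spatial units that share a randomly dropped discrete code."""
    unique_codes = codes.unique()
    # One keep/drop decision per semantic code, not per spatial unit.
    keep_code = torch.rand(len(unique_codes)) > drop_prob
    keep_mask = torch.ones_like(codes, dtype=features.dtype)
    for code, keep in zip(unique_codes, keep_code):
        if not keep:
            keep_mask[codes == code] = 0.0
    # Rescale kept units as in standard (inverted) dropout.
    keep_mask = keep_mask / max(1.0 - drop_prob, 1e-8)
    return features * keep_mask.unsqueeze(0)  # broadcast over channels

# Example: a 64-channel feature map over an 8x8 grid of 16 possible codes.
feats = torch.randn(64, 8, 8)
codes = torch.randint(0, 16, (8, 8))
masked = object_aware_dropout(feats, codes)
```

Unlike standard dropout, the keep/drop decision is sampled per discrete code rather than per unit, so an entire semantic object disappears at once instead of being thinned out unit by unit.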
Related papers
- ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
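As a rough, hedged illustration of what a causality-aware entropy term can look like (the paper's exact formulation may differ), the sketch below weights the per-dimension entropy of a diagonal Gaussian policy by assumed causal-importance weights; the weights are hypothetical inputs that would come from a separate causal analysis of how each action dimension affects reward:

```python
# Illustrative only: a causality-weighted entropy bonus for a diagonal Gaussian
# policy. `causal_weights[i]` is an assumed measure of how strongly action
# dimension i influences reward; it is not computed here.
import torch
from torch.distributions import Normal

def causality_aware_entropy(mean, log_std, causal_weights):
    dist = Normal(mean, log_std.exp())
    per_dim_entropy = dist.entropy()                    # [batch, action_dim]
    return (causal_weights * per_dim_entropy).sum(-1).mean()

mean, log_std = torch.zeros(32, 4), torch.zeros(32, 4)
weights = torch.tensor([0.4, 0.1, 0.3, 0.2])            # hypothetical weights
entropy_bonus = causality_aware_entropy(mean, log_std, weights)
```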
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Value function interference and greedy action selection in value-based
multi-objective reinforcement learning [1.4206639868377509]
Multi-objective reinforcement learning (MORL) algorithms extend conventional reinforcement learning (RL) to problems with multiple, potentially conflicting objectives.
We show that, if the user's utility function maps widely varying vector values to similar levels of utility, this can lead to interference.
We demonstrate empirically that avoiding the use of random tie-breaking when identifying greedy actions can ameliorate, but not fully overcome, the problems caused by value function interference.
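A toy sketch of the tie-breaking point, under the assumption of a user-supplied utility function applied to vector-valued action estimates: when different value vectors map to (near-)identical utilities, random tie-breaking makes the greedy choice unstable across calls, while a fixed rule at least keeps it consistent.

```python
# Toy illustration (not from the paper): greedy action selection over
# vector-valued Q estimates with a scalar utility function.
import numpy as np

def greedy_action(q_vectors, utility, deterministic_ties=True):
    """q_vectors: [n_actions, n_objectives] array of vector values."""
    utilities = np.array([utility(q) for q in q_vectors])
    best = np.flatnonzero(np.isclose(utilities, utilities.max()))
    if deterministic_ties:
        return int(best[0])             # fixed rule: lowest-index action
    return int(np.random.choice(best))  # random tie-breaking

# Two actions with very different vector values but identical utility.
q = np.array([[10.0, 0.0], [0.0, 10.0]])
utility = lambda v: min(v[0], v[1]) + 0.1 * sum(v)  # hypothetical utility
print(greedy_action(q, utility))
```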
arXiv Detail & Related papers (2024-02-09T09:28:01Z)
- Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z)
- Data augmentation for efficient learning from parametric experts [88.33380893179697]
We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert to inform the behavior of a student policy.
Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories.
We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degree-of-freedom control problems.
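A minimal sketch of the general recipe implied above, with hypothetical names and a Gaussian perturbation scheme chosen purely for illustration: sample synthetic states around the expert trajectory, label them by querying the parametric expert, and add the pairs to the cloning dataset.

```python
# Hypothetical sketch: augment a behavioral-cloning dataset with synthetic
# states sampled around expert trajectory states, labeled by querying a
# parametric expert policy. Names and noise scale are assumptions.
import numpy as np

def augment_with_synthetic_states(states, expert_policy,
                                  n_samples=4, noise_scale=0.05,
                                  rng=np.random.default_rng(0)):
    aug_states, aug_actions = [], []
    for s in states:
        for _ in range(n_samples):
            s_tilde = s + rng.normal(scale=noise_scale, size=s.shape)
            aug_states.append(s_tilde)
            aug_actions.append(expert_policy(s_tilde))  # expert query
    return np.array(aug_states), np.array(aug_actions)

# Example with a linear toy expert.
expert = lambda s: -0.5 * s
traj = np.random.randn(100, 8)
X, y = augment_with_synthetic_states(traj, expert)
```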
arXiv Detail & Related papers (2022-05-23T16:37:16Z)
- Deterministic and Discriminative Imitation (D2-Imitation): Revisiting
Adversarial Imitation for Sample Efficiency [61.03922379081648]
We propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization.
Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extensions of adversarial imitation.
arXiv Detail & Related papers (2021-12-11T19:36:19Z)
- Addressing Action Oscillations through Learning Policy Inertia [26.171039226334504]
Policy Inertia Controller (PIC) serves as a generic plug-in framework for off-the-shelf DRL algorithms.
We propose Nested Policy Iteration as a general training algorithm for PIC-augmented policies.
We derive a practical DRL algorithm, namely Nested Soft Actor-Critic.
arXiv Detail & Related papers (2021-03-03T09:59:43Z)
- Scalable Reinforcement Learning Policies for Multi-Agent Control [29.42370205354368]
We develop a Multi-Agent Reinforcement Learning (MARL) method to learn scalable control policies for target tracking.
We show results for tasks consisting of up to 1000 pursuers tracking 1000 targets.
arXiv Detail & Related papers (2020-11-16T16:11:12Z)
- Non-Adversarial Imitation Learning and its Connections to Adversarial
Methods [21.89749623434729]
We present a framework for non-adversarial imitation learning.
The resulting algorithms are similar to their adversarial counterparts.
We also show that our non-adversarial formulation can be used to derive novel algorithms.
arXiv Detail & Related papers (2020-08-08T13:43:06Z)
- Connecting the Dots: Detecting Adversarial Perturbations Using Context
Inconsistency [25.039201331256372]
We augment the deep neural network with a system that learns context consistency rules during training and checks for violations of these rules at test time.
Our approach builds a set of auto-encoders, one for each object class, trained so that the discrepancy between input and output becomes large when an added adversarial perturbation violates the context consistency rules.
Experiments on PASCAL VOC and MS COCO show that our method effectively detects various adversarial attacks and achieves high ROC-AUC (over 0.95 in most cases).
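A simplified sketch of the detection rule described above, with an illustrative architecture and threshold that are assumptions rather than the paper's configuration: one autoencoder per object class, where a large class-specific reconstruction discrepancy flags a context-inconsistent (possibly adversarially perturbed) input.

```python
# Illustrative sketch: per-class autoencoders whose reconstruction discrepancy
# flags context-inconsistent (possibly adversarial) inputs. The architecture
# and threshold are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, dim=256, bottleneck=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.dec = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

def is_adversarial(features, predicted_class, autoencoders, threshold=0.5):
    """Flag the input if the class-specific reconstruction error is large."""
    ae = autoencoders[predicted_class]
    with torch.no_grad():
        recon = ae(features)
        discrepancy = torch.mean((recon - features) ** 2).item()
    return discrepancy > threshold

# One autoencoder per class (e.g., the 20 PASCAL VOC classes).
autoencoders = {c: TinyAutoencoder() for c in range(20)}
feats = torch.randn(1, 256)
print(is_adversarial(feats, predicted_class=3, autoencoders=autoencoders))
```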
arXiv Detail & Related papers (2020-07-19T19:46:45Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
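One standard way to realize the variance-control idea described above (not necessarily the paper's exact estimator) is to use the critic's values for all discrete actions as a baseline, so the policy gradient is driven by an advantage rather than a raw action value:

```python
# Generic sketch of a variance-reduced on-policy gradient for discrete actions:
# the critic's action values over *all* actions form a baseline. This
# illustrates the idea; the paper's estimator may differ.
import torch
import torch.nn.functional as F

def policy_gradient_loss(logits, q_values, actions):
    """logits, q_values: [batch, n_actions]; actions: [batch] sampled actions."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Baseline: expected action value under the current policy.
    baseline = (probs * q_values).sum(dim=-1, keepdim=True)
    advantage = (q_values - baseline).gather(1, actions.unsqueeze(1)).squeeze(1)
    chosen_log_prob = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(chosen_log_prob * advantage.detach()).mean()

logits = torch.randn(16, 6, requires_grad=True)
q_values = torch.randn(16, 6)            # critic-estimated action values
actions = torch.randint(0, 6, (16,))
loss = policy_gradient_loss(logits, q_values, actions)
loss.backward()
```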
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.