Object-Aware Regularization for Addressing Causal Confusion in Imitation
Learning
- URL: http://arxiv.org/abs/2110.14118v1
- Date: Wed, 27 Oct 2021 01:56:23 GMT
- Title: Object-Aware Regularization for Addressing Causal Confusion in Imitation
Learning
- Authors: Jongjin Park, Younggyo Seo, Chang Liu, Li Zhao, Tao Qin, Jinwoo Shin,
Tie-Yan Liu
- Abstract summary: This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
- Score: 131.1852444489217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Behavioral cloning has proven to be effective for learning sequential
decision-making policies from expert demonstrations. However, behavioral
cloning often suffers from the causal confusion problem, where a policy relies
on the noticeable effects of expert actions, which are strongly correlated with
those actions, rather than on the true causes we desire. This paper presents Object-aware REgularizatiOn
(OREO), a simple technique that regularizes an imitation policy in an
object-aware manner. Our main idea is to encourage a policy to uniformly attend
to all semantic objects, in order to prevent the policy from exploiting
nuisance variables strongly correlated with expert actions. To this end, we
introduce a two-stage approach: (a) we extract semantic objects from images by
utilizing discrete codes from a vector-quantized variational autoencoder, and
(b) we randomly drop the units that share the same discrete code together,
i.e., masking out semantic objects. Our experiments demonstrate that OREO
significantly improves the performance of behavioral cloning, outperforming
various other regularization and causality-based methods on a variety of Atari
environments and a self-driving CARLA environment. We also show that our method
even outperforms inverse reinforcement learning methods trained with a
considerable amount of environment interaction.
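To make the two-stage approach concrete, here is a minimal sketch (not the authors' code) of stage (b), assuming a pre-trained VQ-VAE encoder has already mapped an observation to a grid of discrete codes aligned with a convolutional feature map; all spatial units sharing a code are kept or dropped together, which approximates masking out whole semantic objects:

```python
# Hypothetical sketch of object-aware dropout, assuming `codes` is the [H, W]
# grid of discrete VQ-VAE code indices for one observation and `features` is
# the matching [C, H, W] feature map fed to the policy network.
import torch

def object_aware_dropout(features: torch.Tensor,
                         codes: torch.Tensor,
                         drop_prob: float = 0.5) -> torch.Tensor:
    """Zero out all spatial units that share a randomly dropped discrete code."""
    unique_codes = codes.unique()
    # One keep/drop decision per semantic code, not per spatial unit.
    keep_code = torch.rand(len(unique_codes)) > drop_prob
    keep_mask = torch.ones_like(codes, dtype=features.dtype)
    for code, keep in zip(unique_codes, keep_code):
        if not keep:
            keep_mask[codes == code] = 0.0
    # Rescale kept units as in standard (inverted) dropout.
    keep_mask = keep_mask / max(1.0 - drop_prob, 1e-8)
    return features * keep_mask.unsqueeze(0)  # broadcast over channels

# Example: a 64-channel feature map over an 8x8 grid of 16 possible codes.
feats = torch.randn(64, 8, 8)
codes = torch.randint(0, 16, (8, 8))
masked = object_aware_dropout(feats, codes)
```

Unlike standard dropout, the keep/drop decision is sampled per discrete code rather than per unit, so an entire semantic object disappears at once instead of being thinned out unit by unit.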
Related papers
- ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
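As a rough, hedged illustration of what a causality-aware entropy term can look like (the paper's exact formulation may differ), the sketch below weights the per-dimension entropy of a diagonal Gaussian policy by assumed causal-importance weights; the weights are hypothetical inputs that would come from a separate causal analysis of how each action dimension affects reward:

```python
# Illustrative only: a causality-weighted entropy bonus for a diagonal Gaussian
# policy. `causal_weights[i]` is an assumed measure of how strongly action
# dimension i influences reward; it is not computed here.
import torch
from torch.distributions import Normal

def causality_aware_entropy(mean, log_std, causal_weights):
    dist = Normal(mean, log_std.exp())
    per_dim_entropy = dist.entropy()                    # [batch, action_dim]
    return (causal_weights * per_dim_entropy).sum(-1).mean()

mean, log_std = torch.zeros(32, 4), torch.zeros(32, 4)
weights = torch.tensor([0.4, 0.1, 0.3, 0.2])            # hypothetical weights
entropy_bonus = causality_aware_entropy(mean, log_std, weights)
```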
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Value function interference and greedy action selection in value-based
multi-objective reinforcement learning [1.4206639868377509]
Multi-objective reinforcement learning (MORL) algorithms extend conventional reinforcement learning (RL) to problems with multiple, potentially conflicting objectives.
We show that, if the user's utility function maps widely varying vector values to similar levels of utility, this can lead to interference.
We demonstrate empirically that avoiding the use of random tie-breaking when identifying greedy actions can ameliorate, but not fully overcome, the problems caused by value function interference.
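A toy sketch of the tie-breaking point, under the assumption of a user-supplied utility function applied to vector-valued action estimates: when different value vectors map to (near-)identical utilities, random tie-breaking makes the greedy choice unstable across calls, while a fixed rule at least keeps it consistent.

```python
# Toy illustration (not from the paper): greedy action selection over
# vector-valued Q estimates with a scalar utility function.
import numpy as np

def greedy_action(q_vectors, utility, deterministic_ties=True):
    """q_vectors: [n_actions, n_objectives] array of vector values."""
    utilities = np.array([utility(q) for q in q_vectors])
    best = np.flatnonzero(np.isclose(utilities, utilities.max()))
    if deterministic_ties:
        return int(best[0])             # fixed rule: lowest-index action
    return int(np.random.choice(best))  # random tie-breaking

# Two actions with very different vector values but identical utility.
q = np.array([[10.0, 0.0], [0.0, 10.0]])
utility = lambda v: min(v[0], v[1]) + 0.1 * sum(v)  # hypothetical utility
print(greedy_action(q, utility))
```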
arXiv Detail & Related papers (2024-02-09T09:28:01Z)
- Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z)
- Data augmentation for efficient learning from parametric experts [88.33380893179697]
We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert to inform the behavior of a student policy.
Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories.
We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degree-of-freedom control problems.
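A minimal sketch of the general recipe implied above, with hypothetical names and a Gaussian perturbation scheme chosen purely for illustration: sample synthetic states around the expert trajectory, label them by querying the parametric expert, and add the pairs to the cloning dataset.

```python
# Hypothetical sketch: augment a behavioral-cloning dataset with synthetic
# states sampled around expert trajectory states, labeled by querying a
# parametric expert policy. Names and noise scale are assumptions.
import numpy as np

def augment_with_synthetic_states(states, expert_policy,
                                  n_samples=4, noise_scale=0.05,
                                  rng=np.random.default_rng(0)):
    aug_states, aug_actions = [], []
    for s in states:
        for _ in range(n_samples):
            s_tilde = s + rng.normal(scale=noise_scale, size=s.shape)
            aug_states.append(s_tilde)
            aug_actions.append(expert_policy(s_tilde))  # expert query
    return np.array(aug_states), np.array(aug_actions)

# Example with a linear toy expert.
expert = lambda s: -0.5 * s
traj = np.random.randn(100, 8)
X, y = augment_with_synthetic_states(traj, expert)
```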
arXiv Detail & Related papers (2022-05-23T16:37:16Z)
- Deterministic and Discriminative Imitation (D2-Imitation): Revisiting
Adversarial Imitation for Sample Efficiency [61.03922379081648]
We propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization.
Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extensions of adversarial imitation.
arXiv Detail & Related papers (2021-12-11T19:36:19Z)
- Addressing Action Oscillations through Learning Policy Inertia [26.171039226334504]
Policy Inertia Controller (PIC) serves as a generic plug-in framework for off-the-shelf DRL algorithms.
We propose Nested Policy Iteration as a general training algorithm for PIC-augmented policies.
We derive a practical DRL algorithm, namely Nested Soft Actor-Critic.
arXiv Detail & Related papers (2021-03-03T09:59:43Z)
- Scalable Reinforcement Learning Policies for Multi-Agent Control [29.42370205354368]
We develop a Multi-Agent Reinforcement Learning (MARL) method to learn scalable control policies for target tracking.
We show results for tasks consisting of up to 1000 pursuers tracking 1000 targets.
arXiv Detail & Related papers (2020-11-16T16:11:12Z)
- Non-Adversarial Imitation Learning and its Connections to Adversarial
Methods [21.89749623434729]
We present a framework for non-adversarial imitation learning.
The resulting algorithms are similar to their adversarial counterparts.
We also show that our non-adversarial formulation can be used to derive novel algorithms.
arXiv Detail & Related papers (2020-08-08T13:43:06Z)
- Connecting the Dots: Detecting Adversarial Perturbations Using Context
Inconsistency [25.039201331256372]
We augment the deep neural network with a system that learns context consistency rules during training and checks for violations of these rules at test time.
Our approach builds a set of auto-encoders, one for each object class, trained so that the discrepancy between input and output becomes large when an added adversarial perturbation violates the context consistency rules.
Experiments on PASCAL VOC and MS COCO show that our method effectively detects various adversarial attacks and achieves high ROC-AUC (over 0.95 in most cases).
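A simplified sketch of the detection rule described above, with an illustrative architecture and threshold that are assumptions rather than the paper's configuration: one autoencoder per object class, where a large class-specific reconstruction discrepancy flags a context-inconsistent (possibly adversarially perturbed) input.

```python
# Illustrative sketch: per-class autoencoders whose reconstruction discrepancy
# flags context-inconsistent (possibly adversarial) inputs. The architecture
# and threshold are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, dim=256, bottleneck=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.dec = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

def is_adversarial(features, predicted_class, autoencoders, threshold=0.5):
    """Flag the input if the class-specific reconstruction error is large."""
    ae = autoencoders[predicted_class]
    with torch.no_grad():
        recon = ae(features)
        discrepancy = torch.mean((recon - features) ** 2).item()
    return discrepancy > threshold

# One autoencoder per class (e.g., the 20 PASCAL VOC classes).
autoencoders = {c: TinyAutoencoder() for c in range(20)}
feats = torch.randn(1, 256)
print(is_adversarial(feats, predicted_class=3, autoencoders=autoencoders))
```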
arXiv Detail & Related papers (2020-07-19T19:46:45Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
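One standard way to realize the variance-control idea described above (not necessarily the paper's exact estimator) is to use the critic's values for all discrete actions as a baseline, so the policy gradient is driven by an advantage rather than a raw action value:

```python
# Generic sketch of a variance-reduced on-policy gradient for discrete actions:
# the critic's action values over *all* actions form a baseline. This
# illustrates the idea; the paper's estimator may differ.
import torch
import torch.nn.functional as F

def policy_gradient_loss(logits, q_values, actions):
    """logits, q_values: [batch, n_actions]; actions: [batch] sampled actions."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Baseline: expected action value under the current policy.
    baseline = (probs * q_values).sum(dim=-1, keepdim=True)
    advantage = (q_values - baseline).gather(1, actions.unsqueeze(1)).squeeze(1)
    chosen_log_prob = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(chosen_log_prob * advantage.detach()).mean()

logits = torch.randn(16, 6, requires_grad=True)
q_values = torch.randn(16, 6)            # critic-estimated action values
actions = torch.randint(0, 6, (16,))
loss = policy_gradient_loss(logits, q_values, actions)
loss.backward()
```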
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.