TarGF: Learning Target Gradient Field for Object Rearrangement
- URL: http://arxiv.org/abs/2209.00853v1
- Date: Fri, 2 Sep 2022 07:20:34 GMT
- Title: TarGF: Learning Target Gradient Field for Object Rearrangement
- Authors: Mingdong Wu, Fangwei Zhong, Yulong Xia, Hao Dong
- Abstract summary: We focus on a more practical setting in object rearrangement, i.e., rearranging objects from shuffled layouts to a normative target distribution.
It is hard to describe the target distribution (goal specification) for reward engineering or collect expert trajectories as demonstrations.
We employ the score-matching objective to train a Target Gradient Field (TarGF), indicating a direction on each object to increase the likelihood of the target distribution.
- Score: 8.49306925839127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object rearrangement is the task of moving objects from an initial
state to a goal state. Here, we focus on a more practical setting in object rearrangement,
i.e., rearranging objects from shuffled layouts to a normative target
distribution without explicit goal specification. However, it remains
challenging for AI agents, as it is hard to describe the target distribution
(goal specification) for reward engineering or collect expert trajectories as
demonstrations. Hence, it is infeasible to directly employ reinforcement
learning or imitation learning algorithms to address the task. This paper aims
to learn a policy from only a set of examples drawn from the target distribution,
rather than from a handcrafted reward function. We employ the score-matching
objective to train a Target Gradient Field (TarGF), indicating a direction on
each object to increase the likelihood of the target distribution. For object
rearrangement, the TarGF can be used in two ways: 1) For model-based planning,
we can cast the target gradient into a reference control and output actions
with a distributed path planner; 2) For model-free reinforcement learning, the
TarGF is not only used for estimating the likelihood-change as a reward but
also provides suggested actions in residual policy learning. Experimental
results in ball rearrangement and room rearrangement demonstrate that our
method significantly outperforms the state-of-the-art methods in the quality of
the terminal state, the efficiency of the control process, and scalability. The
code and demo videos are on our project website.
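As a rough illustration of the pipeline the abstract describes, the sketch below (not the authors' released code) trains a target gradient field on example layouts with denoising score matching and then queries it for a reference action and a likelihood-change-style reward. The network architecture, noise scale `sigma`, step size, and the random placeholder layouts are all assumptions made for the example.

```python
# Minimal sketch, assuming a simple MLP score model over flattened 2D layouts.
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Maps a flattened layout (n_objects x dim positions) to a vector of the
    same shape: the estimated gradient of the target log-density (the score)."""
    def __init__(self, n_objects: int, dim: int = 2, hidden: int = 256):
        super().__init__()
        d = n_objects * dim
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )

    def forward(self, x):              # x: (batch, n_objects * dim)
        return self.net(x)

def dsm_loss(score_net, x, sigma=0.1):
    """Denoising score matching: perturb examples with Gaussian noise and
    regress the predicted score toward -noise / sigma**2."""
    noise = torch.randn_like(x) * sigma
    pred = score_net(x + noise)
    target = -noise / sigma ** 2
    return ((pred - target) ** 2).sum(dim=-1).mean()

n_objects, dim = 10, 2
score_net = ScoreNet(n_objects, dim)
optimizer = torch.optim.Adam(score_net.parameters(), lr=1e-3)
example_layouts = torch.rand(1024, n_objects * dim)   # stand-in for target examples

for step in range(2000):
    batch = example_layouts[torch.randint(0, len(example_layouts), (128,))]
    loss = dsm_loss(score_net, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Query the learned field on a shuffled layout.
state = torch.rand(1, n_objects * dim)
with torch.no_grad():
    grad = score_net(state)            # per-object direction that raises target likelihood

# (a) Model-based use: treat the gradient as a reference velocity for a planner.
reference_velocity = grad.view(n_objects, dim)

# (b) Model-free use: a first-order estimate of the log-likelihood change after
#     executing a displacement (here, a small step along the field itself).
next_state = state + 0.05 * grad
reward = (grad * (next_state - state)).sum().item()
```

In the paper's model-based variant the gradient feeds a distributed path planner as a reference control, and in the model-free variant the likelihood change serves as a reward alongside suggested actions for residual policy learning; the inner-product reward above is only one natural way to approximate that likelihood change.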
Related papers
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z) - Self-training through Classifier Disagreement for Cross-Domain Opinion
Target Extraction [62.41511766918932]
Opinion target extraction (OTE) or aspect extraction (AE) is a fundamental task in opinion mining.
Recent work focuses on cross-domain OTE, which is typically encountered in real-world scenarios.
We propose a new SSL approach that selects target samples on which the outputs of a domain-specific teacher network and a student network disagree on the unlabelled target data.
arXiv Detail & Related papers (2023-02-28T16:31:17Z) - ReorientDiff: Diffusion Model based Reorientation for Object
Manipulation [18.95498618397922]
The ability to manipulate objects into desired configurations is a fundamental requirement for robots to complete various practical applications.
We propose a reorientation planning method, ReorientDiff, that utilizes a diffusion model-based approach.
The proposed method is evaluated using a set of YCB-objects and a suction gripper, demonstrating a success rate of 95.2% in simulation.
arXiv Detail & Related papers (2023-02-28T00:08:38Z) - Discrete Factorial Representations as an Abstraction for Goal
Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z) - Generative multitask learning mitigates target-causing confounding [61.21582323566118]
We propose a simple and scalable approach to causal representation learning for multitask learning.
The improvement comes from mitigating unobserved confounders that cause the targets, but not the input.
Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
arXiv Detail & Related papers (2022-02-08T20:42:14Z) - Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
arXiv Detail & Related papers (2021-05-27T17:51:34Z) - Provable Representation Learning for Imitation with Contrastive Fourier
Features [27.74988221252854]
We consider using offline experience datasets to learn low-dimensional state representations.
A central challenge is that the unknown target policy itself may not exhibit low-dimensional behavior.
We derive a representation learning objective which provides an upper bound on the performance difference between the target policy and a low-dimensional policy trained with max-likelihood.
arXiv Detail & Related papers (2021-05-26T00:31:30Z) - Follow the Object: Curriculum Learning for Manipulation Tasks with
Imagined Goals [8.98526174345299]
This paper introduces a notion of imaginary object goals.
For a given manipulation task, the object of interest is first trained to reach a desired target position on its own.
The object policy is then leveraged to build a predictive model of plausible object trajectories.
The proposed algorithm, Follow the Object, has been evaluated on 7 MuJoCo environments.
arXiv Detail & Related papers (2020-08-05T12:19:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.