Foldsformer: Learning Sequential Multi-Step Cloth Manipulation With
Space-Time Attention
- URL: http://arxiv.org/abs/2301.03003v1
- Date: Sun, 8 Jan 2023 09:15:45 GMT
- Title: Foldsformer: Learning Sequential Multi-Step Cloth Manipulation With
Space-Time Attention
- Authors: Kai Mo, Chongkun Xia, Xueqian Wang, Yuhong Deng, Xuehai Gao, Bin Liang
- Abstract summary: We present a novel multi-step cloth manipulation planning framework named Foldsformer.
We experimentally evaluate Foldsformer on four representative sequential multi-step manipulation tasks.
Our approach can be transferred from simulation to the real world without additional training or domain randomization.
- Score: 4.2940878152791555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential multi-step cloth manipulation is a challenging problem in robotic
manipulation, requiring a robot to perceive the cloth state and plan a sequence
of chained actions leading to the desired state. Most previous works address
this problem in a goal-conditioned way, and goal observation must be given for
each specific task and cloth configuration, which is neither practical nor
efficient. Thus, we present a novel multi-step cloth manipulation planning
framework named Foldsformer. Foldsformer can complete similar tasks with only a
general demonstration and utilize a space-time attention mechanism to capture
the instruction information behind this demonstration. We experimentally
evaluate Foldsformer on four representative sequential multi-step manipulation
tasks and show that Foldsformer significantly outperforms state-of-the-art
approaches in simulation. Foldsformer can complete multi-step cloth manipulation
tasks even when configurations of the cloth (e.g., size and pose) vary from
configurations in the general demonstrations. Furthermore, our approach can be
transferred from simulation to the real world without additional training or
domain randomization. Despite training on rectangular cloths, we also show
that our approach can generalize to unseen cloth shapes (T-shirts and shorts).
Videos and source code are available at:
https://sites.google.com/view/foldsformer.
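The abstract's key ingredients are a single general demonstration and a space-time attention mechanism that extracts the instruction information behind it. Below is a minimal, illustrative PyTorch sketch of that idea, not the authors' implementation: demonstration frames and the current observation are tokenized, passed through factorized spatial and temporal attention, and pooled into a pick-and-place action. The module names, patch size, network depth, and the regression head are assumptions made for illustration; the real model is available at the project site above.
```python
# Illustrative sketch only: demonstration-conditioned planning with factorized
# space-time attention. Dimensions, patch size, and the action head are assumed.
import torch
import torch.nn as nn


class SpaceTimeBlock(nn.Module):
    """Spatial attention within each frame, then temporal attention across frames."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, patches, dim)
        b, t, p, d = x.shape
        # Spatial attention: attend over patches within each frame.
        xs = x.reshape(b * t, p, d)
        xs = xs + self.spatial_attn(self.norm1(xs), self.norm1(xs), self.norm1(xs))[0]
        x = xs.reshape(b, t, p, d)
        # Temporal attention: attend over frames at each patch location.
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt = xt + self.temporal_attn(self.norm2(xt), self.norm2(xt), self.norm2(xt))[0]
        x = xt.reshape(b, p, t, d).permute(0, 2, 1, 3)
        return x + self.mlp(self.norm3(x))


class DemoConditionedPlanner(nn.Module):
    """Encodes demonstration frames with the current observation and regresses
    a pick point and a place point in normalized image coordinates."""

    def __init__(self, dim: int = 128, depth: int = 2):
        super().__init__()
        self.patch_proj = nn.Linear(3 * 8 * 8, dim)  # assumes flattened 8x8 RGB patches
        self.blocks = nn.ModuleList(SpaceTimeBlock(dim) for _ in range(depth))
        self.action_head = nn.Linear(dim, 4)  # (pick_x, pick_y, place_x, place_y)

    def forward(self, demo_frames: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        # demo_frames: (batch, T, patches, 3*8*8), obs: (batch, 1, patches, 3*8*8)
        tokens = self.patch_proj(torch.cat([demo_frames, obs], dim=1))
        for blk in self.blocks:
            tokens = blk(tokens)
        # Pool the tokens of the current observation (the last frame) into an action.
        return self.action_head(tokens[:, -1].mean(dim=1))


if __name__ == "__main__":
    planner = DemoConditionedPlanner()
    demo = torch.randn(1, 4, 64, 3 * 8 * 8)  # 4 demonstration sub-goal frames
    obs = torch.randn(1, 1, 64, 3 * 8 * 8)   # current cloth observation
    print(planner(demo, obs).shape)          # torch.Size([1, 4])
```
In a planning loop, one would re-encode the new observation with the same demonstration after each pick-and-place, stepping through the demonstrated sub-goals until the task is complete.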
Related papers
- SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation [82.61572106180705]
This paper presents a unified approach using vision-language models (VLMs) to improve keypoint prediction across various garment categories.
We created a large-scale synthetic dataset using advanced simulation techniques, allowing scalable training without extensive real-world data.
Experimental results indicate that the VLM-based method significantly enhances keypoint detection accuracy and task success rates.
arXiv Detail & Related papers (2024-09-26T17:26:16Z)
- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation [65.46610405509338]
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation.
Our framework, Track2Act, predicts tracks of how points in an image should move in future time-steps based on a goal.
We show that this approach of combining scalably learned track prediction with a residual policy enables diverse generalizable robot manipulation.
arXiv Detail & Related papers (2024-05-02T17:56:55Z)
- Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools [14.069149456110676]
We introduce a demonstration-free hierarchical planning approach capable of tackling intricate long-horizon tasks.
We employ large language models (LLMs) to articulate a high-level, stage-by-stage plan corresponding to a specified task.
We further substantiate our approach with experimental trials on real-world robotic platforms.
arXiv Detail & Related papers (2023-11-05T22:43:29Z)
- Learning to Act from Actionless Videos through Dense Correspondences [87.1243107115642]
We present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments.
Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals.
We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks.
arXiv Detail & Related papers (2023-10-12T17:59:23Z)
- DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics [97.75188532559952]
We propose a principled framework that abstracts dexterous manipulation skills from human demonstration.
We then train a skill model using demonstrations for planning over action abstractions in imagination.
To evaluate the effectiveness of our approach, we introduce a suite of six challenging dexterous deformable object manipulation tasks.
arXiv Detail & Related papers (2023-03-27T17:59:49Z)
- Learning Fabric Manipulation in the Real World with Human Videos [10.608723220309678]
Fabric manipulation is a long-standing challenge in robotics due to the enormous state space and complex dynamics.
Most prior methods rely heavily on simulation, which is still limited by the large sim-to-real gap of deformable objects.
A promising alternative is to learn fabric manipulation directly from watching humans perform the task.
arXiv Detail & Related papers (2022-11-05T07:09:15Z)
- A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation [63.1610540170754]
We focus on the problem of visual non-prehensile planar manipulation.
We propose a novel architecture that combines video decoding neural models with priors from contact mechanics.
We find that our modular and fully differentiable architecture performs better than learning-only methods on unseen objects and motions.
arXiv Detail & Related papers (2021-11-09T18:39:45Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Encoding cloth manipulations using a graph of states and transitions [8.778914180886835]
We propose a generic, compact and simplified representation of the states of cloth manipulation.
We also define a Cloth Manipulation Graph that encodes all the strategies to accomplish a task.
arXiv Detail & Related papers (2020-09-30T13:56:13Z)
- Learning Dense Visual Correspondences in Simulation to Smooth and Fold Real Fabrics [35.84249614544505]
We learn visual correspondences for deformable fabrics across different configurations in simulation.
The learned correspondences can be used to compute geometrically equivalent actions in a new fabric configuration.
Results also suggest generalization to fabrics of various colors, sizes, and shapes.
arXiv Detail & Related papers (2020-03-28T04:06:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.