One-shot Video Imitation via Parameterized Symbolic Abstraction Graphs
- URL: http://arxiv.org/abs/2408.12674v2
- Date: Sun, 22 Sep 2024 22:04:02 GMT
- Title: One-shot Video Imitation via Parameterized Symbolic Abstraction Graphs
- Authors: Jianren Wang, Kangni Liu, Dingkun Guo, Xian Zhou, Christopher G Atkeson
- Abstract summary: We propose to interpret video demonstrations through Parameterized Symbolic Abstraction Graphs (PSAG).
We further ground geometric constraints through simulation to estimate non-geometric, visually imperceptible attributes.
Our approach has been validated across a range of tasks, such as Cutting Avocado, Cutting Vegetable, Pouring Liquid, Rolling Dough, and Slicing Pizza.
- Score: 8.872100864022675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning to manipulate dynamic and deformable objects from a single demonstration video holds great promise in terms of scalability. Previous approaches have predominantly focused on either replaying object relationships or actor trajectories. The former often struggles to generalize across diverse tasks, while the latter suffers from data inefficiency. Moreover, both methodologies encounter challenges in capturing invisible physical attributes, such as forces. In this paper, we propose to interpret video demonstrations through Parameterized Symbolic Abstraction Graphs (PSAG), where nodes represent objects and edges denote relationships between objects. We further ground geometric constraints through simulation to estimate non-geometric, visually imperceptible attributes. The augmented PSAG is then applied in real robot experiments. Our approach has been validated across a range of tasks, such as Cutting Avocado, Cutting Vegetable, Pouring Liquid, Rolling Dough, and Slicing Pizza. We demonstrate successful generalization to novel objects with distinct visual and physical properties.
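To make the abstraction concrete, below is a minimal sketch of how a PSAG could be represented in code, assuming a simple Python data structure: nodes carry per-object parameters (geometric attributes observable in the video, plus non-geometric ones such as stiffness or contact force estimated in simulation), and edges carry parameterized symbolic relations. The class and field names are illustrative assumptions, not the authors' implementation.
```python
from dataclasses import dataclass, field

# Hypothetical sketch of a Parameterized Symbolic Abstraction Graph (PSAG):
# nodes are objects, edges are symbolic relations between objects, and both
# carry parameters. Names and fields are illustrative assumptions only.

@dataclass
class ObjectNode:
    name: str                                           # e.g. "knife", "avocado"
    geometric: dict = field(default_factory=dict)        # pose, size (observable in the video)
    non_geometric: dict = field(default_factory=dict)    # e.g. stiffness, friction (estimated in simulation)

@dataclass
class RelationEdge:
    src: str        # acting object
    dst: str        # acted-upon object
    relation: str   # symbolic relation, e.g. "in_contact", "above"
    params: dict = field(default_factory=dict)  # relation parameters, e.g. contact force

@dataclass
class PSAG:
    nodes: dict[str, ObjectNode] = field(default_factory=dict)
    edges: list[RelationEdge] = field(default_factory=list)

    def add_object(self, node: ObjectNode) -> None:
        self.nodes[node.name] = node

    def relate(self, src: str, dst: str, relation: str, **params) -> None:
        self.edges.append(RelationEdge(src, dst, relation, params))

# Example: one frame of a hypothetical "Cutting Avocado" demonstration.
g = PSAG()
g.add_object(ObjectNode("knife", geometric={"pose": [0.1, 0.0, 0.3]}))
g.add_object(ObjectNode("avocado", non_geometric={"stiffness": 0.4}))  # estimated, not observed
g.relate("knife", "avocado", "in_contact", force=5.0)  # augmented with a simulated force estimate
```
In this reading, the simulation-estimated, visually imperceptible attributes are simply extra parameters attached to nodes and edges, so the augmented graph can be handed to the real-robot execution stage.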
Related papers
- GAMMA: Generalizable Articulation Modeling and Manipulation for
Articulated Objects [53.965581080954905]
We propose a novel framework of Generalizable Articulation Modeling and Manipulation for Articulated Objects (GAMMA).
GAMMA learns both articulation modeling and grasp pose affordance from diverse articulated objects with different categories.
Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms on unseen and cross-category articulated objects.
arXiv Detail & Related papers (2023-09-28T08:57:14Z)
- UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with
Geometric Topology Guidance [6.577227592760559]
UnsMOT is a novel framework that combines appearance and motion features of objects with geometric information to provide more accurate tracking.
Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2023-09-03T04:58:12Z)
- DisPositioNet: Disentangled Pose and Identity in Semantic Image
Manipulation [83.51882381294357]
DisPositioNet is a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs.
Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph.
arXiv Detail & Related papers (2022-11-10T11:47:37Z)
- Revealing Occlusions with 4D Neural Fields [19.71277637485384]
For computer vision systems to operate in dynamic situations, they need to be able to represent and reason about object permanence.
We introduce a framework for learning to estimate 4D visual representations from monocular RGB-D video.
arXiv Detail & Related papers (2022-04-22T20:14:42Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Generalizable task representation learning from human demonstration
videos: a geometric approach [4.640835690336654]
We study the problem of generalizable task learning from human demonstration videos without extra training on the robot or pre-recorded robot motions.
We propose CoVGS-IL, which uses a graph-structured task function to learn task representations under structural constraints.
arXiv Detail & Related papers (2022-02-28T08:25:57Z)
- Exploring Motion and Appearance Information for Temporal Sentence
Grounding [52.01687915910648]
We propose a Motion-Appearance Reasoning Network (MARN) to solve temporal sentence grounding.
We develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations.
Our proposed MARN outperforms previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-01-03T02:44:18Z)
- Attribute-Based Robotic Grasping with One-Grasp Adaptation [9.255994599301712]
We introduce an end-to-end learning method of attribute-based robotic grasping with one-grasp adaptation capability.
Our approach fuses the embeddings of a workspace image and a query text using a gated-attention mechanism and learns to predict instance grasping affordances.
Experimental results in both simulation and the real world demonstrate that our approach achieves over 80% instance grasping success rate on unknown objects.
arXiv Detail & Related papers (2021-04-06T03:40:46Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate our approach on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation.
This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
- Learning Rope Manipulation Policies Using Dense Object Descriptors
Trained on Synthetic Depth Data [32.936908766549344]
We present an approach that learns point-pair correspondences between initial and goal rope configurations.
In 50 trials of a knot-tying task with the ABB YuMi Robot, the system achieves a 66% knot-tying success rate from previously unseen configurations.
arXiv Detail & Related papers (2020-03-03T23:43:05Z)