EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow
- URL: http://arxiv.org/abs/2507.06224v1
- Date: Tue, 08 Jul 2025 17:57:03 GMT
- Title: EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow
- Authors: Yixiang Chen, Peiyan Li, Yan Huang, Jiabing Yang, Kehan Chen, Liang Wang,
- Abstract summary: Embodiment-Centric Flow (EC-Flow) is a framework that learns manipulation from action-unlabeled videos. Our key insight is that incorporating the embodiment's inherent kinematics significantly enhances generalization to versatile manipulation scenarios. Translating EC-Flow to executable robot actions only requires a standard robot URDF file to specify kinematic constraints.
- Score: 10.674192015199996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current language-guided robotic manipulation systems often require low-level action-labeled datasets for imitation learning. While object-centric flow prediction methods mitigate this issue, they remain limited to scenarios involving rigid objects with clear displacement and minimal occlusion. In this work, we present Embodiment-Centric Flow (EC-Flow), a framework that directly learns manipulation from action-unlabeled videos by predicting embodiment-centric flow. Our key insight is that incorporating the embodiment's inherent kinematics significantly enhances generalization to versatile manipulation scenarios, including deformable object handling, occlusions, and non-object-displacement tasks. To connect EC-Flow with language instructions and object interactions, we further introduce a goal-alignment module that jointly optimizes movement consistency and goal-image prediction. Moreover, translating EC-Flow to executable robot actions only requires a standard robot URDF (Unified Robot Description Format) file to specify kinematic constraints across joints, which makes it easy to use in practice. We validate EC-Flow on both simulation (Meta-World) and real-world tasks, demonstrating state-of-the-art performance with improvements over prior object-centric flow methods of 62% in occluded object handling, 45% in deformable object manipulation, and 80% in non-object-displacement tasks. For more information, see our project website at https://ec-flow1.github.io .
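As a rough illustration of the last step, converting a predicted embodiment-centric flow into joint commands can be posed as a small inverse-kinematics problem constrained by the robot's URDF. The sketch below is an assumption-laden toy, not the authors' implementation: the Jacobian stub, number of tracked points, and damped least-squares solver are all placeholders.

```python
# Illustrative sketch: turning a predicted embodiment-centric flow into
# joint-space actions with a damped least-squares step under URDF
# kinematic constraints. Not the paper's code.
import numpy as np

def jacobian(q):
    """Placeholder for the kinematic Jacobian of the tracked embodiment
    points w.r.t. joint angles, as derived from the robot's URDF.
    Shape: (3 * n_points, n_joints)."""
    n_points, n_joints = 2, 7
    rng = np.random.default_rng(0)
    return rng.normal(size=(3 * n_points, n_joints))

def flow_to_joint_delta(q, point_flow, damping=1e-2):
    """Map predicted 3D flow of embodiment points to a joint update.

    q          : current joint angles, shape (n_joints,)
    point_flow : predicted displacement of tracked points, shape (n_points, 3)
    """
    J = jacobian(q)                       # (3P, J) from URDF kinematics
    dx = point_flow.reshape(-1)           # stack point displacements
    # Damped least squares: dq = J^T (J J^T + lambda * I)^-1 dx
    JJt = J @ J.T
    dq = J.T @ np.linalg.solve(JJt + damping * np.eye(JJt.shape[0]), dx)
    return dq

q = np.zeros(7)
flow = np.array([[0.01, 0.0, 0.02],      # predicted motion of point 1
                 [0.01, 0.0, 0.02]])     # predicted motion of point 2
print(flow_to_joint_delta(q, flow))
```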
Related papers
- ActionSink: Toward Precise Robot Manipulation with Dynamic Integration of Action Flow [93.00917887667234]
This paper introduces a novel robot manipulation framework, ActionSink, to pave the way toward precise action estimation. As the name suggests, ActionSink reformulates robot actions as action-caused optical flows extracted from videos, called "action flow". The framework outperformed the prior SOTA on the LIBERO benchmark by 7.9% in success rate and obtained nearly an 8% accuracy gain on the challenging long-horizon visual task suite LIBERO-Long.
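One loose way to picture the "dynamic integration" of flow-derived action estimates is a confidence-weighted fusion of several candidates. The weighting rule and names below are assumptions for illustration, not ActionSink's actual mechanism.

```python
# Hedged illustration: fusing multiple flow-derived action estimates
# with confidence weights (the weighting rule here is assumed).
import numpy as np

def integrate_action_estimates(estimates, confidences):
    """Confidence-weighted fusion of action estimates.

    estimates   : (K, action_dim) candidate actions recovered from flow
    confidences : (K,) non-negative scores, e.g. flow quality
    """
    w = np.asarray(confidences, dtype=float)
    w = w / (w.sum() + 1e-8)
    return (w[:, None] * np.asarray(estimates)).sum(axis=0)

candidates = np.array([[0.02, 0.00, 0.01],
                       [0.03, -0.01, 0.01],
                       [0.02, 0.00, 0.02]])
scores = np.array([0.9, 0.4, 0.7])
print(integrate_action_estimates(candidates, scores))
```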
arXiv Detail & Related papers (2025-08-05T08:46:17Z) - ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow [4.2766838326810355]
We present ViSA-Flow, a framework that learns a transferable skill representation from large-scale, action-unlabeled video data. First, a semantic action flow representation is automatically extracted from large-scale human-object interaction video data. Second, this prior is efficiently adapted to a target robot by fine-tuning on a small set of robot demonstrations processed through the same semantic abstraction pipeline.
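The two-stage recipe (pre-train a flow representation on large unlabeled video data, then adapt it with a handful of robot demonstrations) can be sketched as the toy skeleton below. The linear "model" and all names are placeholders, not ViSA-Flow's implementation.

```python
# Toy two-stage skeleton: pre-train a flow predictor on a large
# unlabeled video corpus, then fine-tune an action head on a few
# robot demonstrations. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: fit video features -> semantic flow on "human video" data.
video_feats = rng.normal(size=(1000, 16))                # large unlabeled corpus
semantic_flow = video_feats @ rng.normal(size=(16, 4))   # auto-extracted targets
W_flow, *_ = np.linalg.lstsq(video_feats, semantic_flow, rcond=None)

# Stage 2: fit flow -> action on a small set of robot demonstrations.
robot_feats = rng.normal(size=(20, 16))                  # only 20 demos
robot_actions = rng.normal(size=(20, 7))
robot_flow = robot_feats @ W_flow                        # same abstraction pipeline
W_act, *_ = np.linalg.lstsq(robot_flow, robot_actions, rcond=None)

# Inference: features -> semantic flow -> robot action.
action = rng.normal(size=(1, 16)) @ W_flow @ W_act
print(action.shape)   # (1, 7)
```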
arXiv Detail & Related papers (2025-05-02T14:03:06Z) - G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation [65.86819811007157]
We present G3Flow, a novel framework that constructs real-time semantic flow, a dynamic, object-centric 3D representation, by leveraging foundation models. Our approach uniquely combines 3D generative models for digital twin creation, vision foundation models for semantic feature extraction, and robust pose tracking for continuous semantic flow updates. Our results demonstrate the effectiveness of G3Flow in enhancing real-time dynamic semantic feature understanding for robotic manipulation policies.
arXiv Detail & Related papers (2024-11-27T14:17:43Z) - GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation [10.48817934871207]
We propose a global motion-guided recurrent flow estimation method called GMFlow for pose estimation.
We leverage the object's structural information to extend the motion of visible parts of the rigid body to its invisible regions.
Our method outperforms existing techniques in accuracy while maintaining competitive computational efficiency.
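The core idea of extending visible-part motion to occluded regions of a rigid body can be illustrated with a least-squares rigid fit: estimate the rigid transform from the flow of visible points, then apply it to the hidden points. This is a sketch of the general principle (Kabsch alignment), not GMFlow's code.

```python
# Sketch: fit a rigid transform (Kabsch) to the flow of visible points
# of a rigid body and propagate it to occluded points.
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src -> dst."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

rng = np.random.default_rng(1)
visible = rng.normal(size=(50, 3))            # visible points before motion
t_true = np.array([0.1, 0.0, 0.05])
visible_moved = visible + t_true              # observed motion of visible parts

R, t = fit_rigid_transform(visible, visible_moved)
hidden = rng.normal(size=(10, 3))             # occluded points, motion unknown
hidden_flow = (hidden @ R.T + t) - hidden     # propagate the rigid motion
print(hidden_flow[:2])
```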
arXiv Detail & Related papers (2024-11-26T07:28:48Z) - Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [57.942404069484134]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered. Previous research employed interactive perception for manipulating articulated objects, but these typically open-loop approaches often overlook the interaction dynamics. We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
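As a hedged sketch of the axis-estimation step, a revolute joint's direction can be recovered from two segmented point clouds of the moving part by fitting the rigid rotation between them and reading off its axis. The recipe and names below are illustrative, not the paper's pipeline, and they recover only the axis direction, not its position in space.

```python
# Illustrative sketch: estimate a revolute articulation axis direction
# from two segmented point clouds of the moving part.
import numpy as np

def estimate_axis(part_before, part_after):
    c0, c1 = part_before.mean(axis=0), part_after.mean(axis=0)
    H = (part_before - c0).T @ (part_after - c1)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T        # rotation between the scans
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1, 1))
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return axis / (2 * np.sin(angle)), angle

# Toy example: a door-like part rotated 10 degrees about the z-axis.
theta = np.deg2rad(10)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
pts = np.random.default_rng(2).normal(size=(40, 3))
axis, angle = estimate_axis(pts, pts @ Rz.T)
print(axis, np.rad2deg(angle))                     # ~[0, 0, 1], ~10 degrees
```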
arXiv Detail & Related papers (2024-09-24T17:59:56Z) - ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching [20.20511152176522]
ActionFlow is a policy class that integrates spatial symmetry inductive biases.
On the representation level, ActionFlow introduces an SE(3) Invariant Transformer architecture.
For action generation, ActionFlow leverages Flow Matching, a state-of-the-art deep generative model.
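The flow-matching objective used for action generation can be illustrated numerically with generic conditional flow matching on straight interpolation paths; the toy linear velocity network and details below are assumptions, not ActionFlow's architecture.

```python
# Minimal numerical sketch of a flow-matching loss for action generation.
import numpy as np

rng = np.random.default_rng(0)
action_dim = 7

def velocity_net(x_t, t, theta):
    """Toy linear velocity field v_theta(x_t, t); a stand-in for the
    transformer used in practice."""
    feats = np.concatenate([x_t, [t]])
    return theta @ feats                       # (action_dim,)

def flow_matching_loss(theta, expert_actions, n_samples=256):
    loss = 0.0
    for _ in range(n_samples):
        x1 = expert_actions[rng.integers(len(expert_actions))]  # data sample
        x0 = rng.normal(size=action_dim)                        # noise sample
        t = rng.uniform()
        x_t = (1 - t) * x0 + t * x1            # point on the straight path
        target_v = x1 - x0                     # path velocity
        pred_v = velocity_net(x_t, t, theta)
        loss += np.sum((pred_v - target_v) ** 2)
    return loss / n_samples

expert = rng.normal(size=(100, action_dim))
theta = np.zeros((action_dim, action_dim + 1))
print(flow_matching_loss(theta, expert))
```

At inference time, actions would be generated by integrating the learned velocity field from a noise sample at t=0 to t=1.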
arXiv Detail & Related papers (2024-09-06T19:30:36Z) - Flow as the Cross-Domain Manipulation Interface [73.15952395641136]
Im2Flow2Act enables robots to acquire real-world manipulation skills without the need for real-world robot training data.
Im2Flow2Act comprises two components: a flow generation network and a flow-conditioned policy.
We demonstrate Im2Flow2Act's capabilities in a variety of real-world tasks, including the manipulation of rigid, articulated, and deformable objects.
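To picture the flow-as-interface idea, one crude reduction for a grasped rigid object is to average the predicted flow over the grasped region and command that displacement to the end-effector. Im2Flow2Act learns a flow-conditioned policy; the hand-coded function below only illustrates the interface and its names are assumptions.

```python
# Hedged sketch: derive an end-effector motion command from a predicted
# per-pixel object flow and a mask of the grasped region.
import numpy as np

def flow_to_ee_delta(flow, grasp_mask, pixels_per_meter=500.0):
    """flow: (H, W, 2) predicted pixel displacements.
    grasp_mask: (H, W) bool mask of pixels on the grasped part."""
    mean_pix_flow = flow[grasp_mask].mean(axis=0)       # average (du, dv)
    return mean_pix_flow / pixels_per_meter             # crude metric motion

H, W = 64, 64
flow = np.zeros((H, W, 2))
flow[20:40, 20:40] = [5.0, -2.0]                        # object moves right/up
mask = np.zeros((H, W), dtype=bool)
mask[25:35, 25:35] = True
print(flow_to_ee_delta(flow, mask))                     # ~[0.01, -0.004]
```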
arXiv Detail & Related papers (2024-07-21T16:15:02Z) - SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving [18.88208422580103]
Scene flow estimation predicts the 3D motion at each point in successive LiDAR scans.
Current state-of-the-art methods require annotated data to train scene flow networks.
We propose SeFlow, a self-supervised method that integrates efficient dynamic classification into a learning-based scene flow pipeline.
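A minimal self-supervised scene-flow loss in this spirit warps the source scan by the predicted flow and penalizes the distance to the nearest point in the next scan, weighting dynamic points more heavily. The weighting rule and names below are assumptions, not SeFlow's exact formulation.

```python
# Sketch of a self-supervised scene-flow loss with a dynamic/static split.
import numpy as np

def nn_distance(points_a, points_b):
    """Brute-force nearest-neighbor distance from each point in a to b."""
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=-1)
    return d.min(axis=1)

def self_supervised_flow_loss(src, dst, pred_flow, is_dynamic, dyn_weight=5.0):
    warped = src + pred_flow                  # warp source scan by predicted flow
    dists = nn_distance(warped, dst)          # distance to the next scan
    weights = np.where(is_dynamic, dyn_weight, 1.0)
    return float((weights * dists).mean())

rng = np.random.default_rng(3)
src = rng.normal(size=(200, 3))
true_flow = np.tile([0.2, 0.0, 0.0], (200, 1))
dst = src + true_flow
is_dynamic = rng.random(200) < 0.3
print(self_supervised_flow_loss(src, dst, np.zeros_like(src), is_dynamic))
print(self_supervised_flow_loss(src, dst, true_flow, is_dynamic))  # ~0
```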
arXiv Detail & Related papers (2024-07-01T18:22:54Z) - Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z) - IFOR: Iterative Flow Minimization for Robotic Object Rearrangement [92.97142696891727]
IFOR, Iterative Flow Minimization for Robotic Object Rearrangement, is an end-to-end method for the problem of rearranging unknown objects.
We show that our method applies to cluttered scenes and to the real world while training only on synthetic data.
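The iterative-minimization idea can be conveyed with a toy loop: repeatedly estimate the flow between the current and goal scenes, move the object with the largest residual flow, and stop when all flows are small. Object-level 2D positions stand in here for the image-based flow estimates used by the actual method.

```python
# Toy illustration of iterative flow minimization for rearrangement.
import numpy as np

def rearrange(current, goal, threshold=0.01, max_steps=20):
    current = current.copy()
    for step in range(max_steps):
        flow = goal - current                       # per-object "flow"
        magnitudes = np.linalg.norm(flow, axis=1)
        if magnitudes.max() < threshold:            # flow minimized: done
            break
        worst = int(np.argmax(magnitudes))
        current[worst] += flow[worst]               # pick-and-place that object
        print(f"step {step}: moved object {worst}")
    return current

start = np.array([[0.0, 0.0], [0.3, 0.1], [0.5, 0.5]])
target = np.array([[0.2, 0.0], [0.3, 0.4], [0.5, 0.5]])
print(rearrange(start, target))
```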
arXiv Detail & Related papers (2022-02-01T20:03:56Z)