ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching
- URL: http://arxiv.org/abs/2409.04576v1
- Date: Fri, 6 Sep 2024 19:30:36 GMT
- Title: ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching
- Authors: Niklas Funk, Julen Urain, Joao Carvalho, Vignesh Prasad, Georgia Chalvatzaki, Jan Peters
- Abstract summary: ActionFlow is a policy class that integrates spatial symmetry inductive biases.
On the representation level, ActionFlow introduces an SE(3) Invariant Transformer architecture.
For action generation, ActionFlow leverages Flow Matching, a state-of-the-art deep generative model.
- Score: 20.20511152176522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial understanding is a critical aspect of most robotic tasks, particularly when generalization is important. Despite the impressive results of deep generative models in complex manipulation tasks, the absence of a representation that encodes intricate spatial relationships between observations and actions often limits spatial generalization, necessitating large amounts of demonstrations. To tackle this problem, we introduce a novel policy class, ActionFlow. ActionFlow integrates spatial symmetry inductive biases while generating expressive action sequences. On the representation level, ActionFlow introduces an SE(3) Invariant Transformer architecture, which enables informed spatial reasoning based on the relative SE(3) poses between observations and actions. For action generation, ActionFlow leverages Flow Matching, a state-of-the-art deep generative model known for generating high-quality samples with fast inference - an essential property for feedback control. In combination, ActionFlow policies exhibit strong spatial and locality biases and SE(3)-equivariant action generation. Our experiments demonstrate the effectiveness of ActionFlow and its two main components on several simulated and real-world robotic manipulation tasks and confirm that we can obtain equivariant, accurate, and efficient policies with spatially symmetric flow matching. Project website: https://flowbasedpolicies.github.io/
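The two components named in the abstract are standard enough to sketch concretely. Below is a minimal, hedged illustration, not the authors' implementation: `relative_pose` shows why reasoning over relative SE(3) poses is invariant to a common world-frame transform, and `flow_matching_loss` / `sample_actions` show generic conditional flow matching with linear probability paths and few-step Euler integration, the fast-inference property the abstract highlights. All names here (`velocity_net`, `obs_emb`) are hypothetical placeholders, not the paper's API.

```python
# Hedged sketch of the two ingredients the abstract names; not the authors' code.
import torch

def relative_pose(T_a: torch.Tensor, T_b: torch.Tensor) -> torch.Tensor:
    """Relative SE(3) pose T_a^{-1} @ T_b between 4x4 homogeneous transforms.

    Features built from relative poses are invariant to a shared world-frame
    transform g, since (g T_a)^{-1} (g T_b) = T_a^{-1} T_b.
    """
    R_a, t_a = T_a[..., :3, :3], T_a[..., :3, 3]
    R_b, t_b = T_b[..., :3, :3], T_b[..., :3, 3]
    R_rel = R_a.transpose(-1, -2) @ R_b
    t_rel = (R_a.transpose(-1, -2) @ (t_b - t_a).unsqueeze(-1)).squeeze(-1)
    T_rel = torch.eye(4).expand(*R_rel.shape[:-2], 4, 4).clone()
    T_rel[..., :3, :3] = R_rel
    T_rel[..., :3, 3] = t_rel
    return T_rel

def flow_matching_loss(velocity_net, actions: torch.Tensor, obs_emb: torch.Tensor):
    """Conditional flow matching with a linear path x_t = (1-t) x0 + t x1.

    x0 ~ N(0, I), x1 is the expert action sequence; the regression target
    for the velocity field is the constant path velocity x1 - x0.
    """
    x1 = actions
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))  # one t per sample
    x_t = (1 - t) * x0 + t * x1
    v_pred = velocity_net(x_t, t.flatten(), obs_emb)  # hypothetical network
    return ((v_pred - (x1 - x0)) ** 2).mean()

@torch.no_grad()
def sample_actions(velocity_net, obs_emb, action_shape, steps: int = 10):
    """Few-step Euler integration of dx/dt = v(x, t) from noise to actions."""
    x = torch.randn(*action_shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((action_shape[0],), i * dt)
        x = x + dt * velocity_net(x, t, obs_emb)
    return x
```

Because the learned velocity field is regressed against straight-line paths, inference can use few Euler steps, which is what makes flow-matching policies attractive for closed-loop feedback control compared with many-step diffusion samplers.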
Related papers
- Rethinking Traffic Flow Forecasting: From Transition to Generation [0.0]
We propose an Effective Multi-Branch Similarity Transformer for Traffic Flow Prediction, namely EMBSFormer.
We find that the factors affecting traffic flow include node-level traffic generation and graph-level traffic transition, which describe the multi-periodicity and interaction pattern of nodes, respectively.
For traffic transition, we employ a temporal and spatial self-attention mechanism to maintain global node interactions, and use GNNs and temporal convolutions to model local node interactions.
arXiv Detail & Related papers (2025-04-19T09:52:39Z) - SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive LiDAR-camera pairs.
We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions.
With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z) - Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy [56.424032454461695]
We present Dita, a scalable framework that leverages Transformer architectures to directly denoise continuous action sequences.
Dita employs in-context conditioning -- enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations.
Dita effectively integrates cross-embodiment datasets across diverse camera perspectives, observation scenes, tasks, and action spaces.
arXiv Detail & Related papers (2025-03-25T15:19:56Z) - MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems [25.039200070508603]
Multimodal Large Language Models (MLLMs) have yet to fully demonstrate their potential in visual mathematical problem-solving.
We develop FlowVerse, a benchmark that categorizes all information used during problem-solving into four components.
We introduce MathFlow, a modular problem-solving pipeline that decouples perception and inference into distinct stages.
arXiv Detail & Related papers (2025-03-19T11:46:19Z) - G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation [65.86819811007157]
We present G3Flow, a novel framework that constructs real-time semantic flow, a dynamic, object-centric 3D representation, by leveraging foundation models.
Our approach uniquely combines 3D generative models for digital twin creation, vision foundation models for semantic feature extraction, and robust pose tracking for continuous semantic flow updates.
Our results demonstrate the effectiveness of G3Flow in enhancing real-time dynamic semantic feature understanding for robotic manipulation policies.
arXiv Detail & Related papers (2024-11-27T14:17:43Z) - ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video [26.01796507893086]
This paper proposes a 3D motion perception method called ScaleFlow++ that is easy to generalize.
With just a pair of RGB images, ScaleFlow++ can robustly estimate optical flow and motion-in-depth (MID).
On KITTI, ScaleFlow++ achieved the best monocular scene flow estimation performance, reducing SF-all from 6.21 to 5.79.
arXiv Detail & Related papers (2024-09-16T11:59:27Z) - SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving [18.88208422580103]
Scene flow estimation predicts the 3D motion at each point in successive LiDAR scans.
Current state-of-the-art methods require annotated data to train scene flow networks.
We propose SeFlow, a self-supervised method that integrates efficient dynamic classification into a learning-based scene flow pipeline.
arXiv Detail & Related papers (2024-07-01T18:22:54Z) - Baking Symmetry into GFlowNets [58.932776403471635]
GFlowNets have exhibited promising performance in generating diverse candidates with high rewards.
This study aims to integrate symmetries into GFlowNets by identifying equivalent actions during the generation process.
arXiv Detail & Related papers (2024-06-08T10:11:10Z) - MaskFlow: Object-Aware Motion Estimation [0.45646200630189254]
We introduce a novel motion estimation method, MaskFlow, that is capable of estimating accurate motion fields.
In addition to the lower-level features used in other deep neural network (DNN)-based motion estimation methods, MaskFlow draws on object-level features and segmentations.
arXiv Detail & Related papers (2023-11-21T09:37:49Z) - TransFlow: Transformer as Flow Learner [22.727953339383344]
We propose TransFlow, a pure transformer architecture for optical flow estimation.
It provides more accurate correlation and trustworthy matching in flow estimation.
It recovers more compromised information in flow estimation through long-range temporal association in dynamic scenes.
arXiv Detail & Related papers (2023-04-23T03:11:23Z) - PointFlowHop: Green and Interpretable Scene Flow Estimation from Consecutive Point Clouds [49.7285297470392]
An efficient 3D scene flow estimation method called PointFlowHop is proposed in this work.
PointFlowHop takes two consecutive point clouds and determines the 3D flow vectors for every point in the first point cloud.
It decomposes the scene flow estimation task into a set of subtasks, including ego-motion compensation, object association and object-wise motion estimation.
arXiv Detail & Related papers (2023-02-27T23:06:01Z) - D$^3$FlowSLAM: Self-Supervised Dynamic SLAM with Flow Motion Decomposition and DINO Guidance [61.14088096348959]
We introduce a self-supervised deep SLAM method that robustly operates in dynamic scenes while accurately identifying dynamic components.
We propose a dynamic update module based on this representation and develop a dense SLAM system that excels in dynamic scenarios.
arXiv Detail & Related papers (2022-07-18T17:47:39Z) - LocATe: End-to-end Localization of Actions in 3D with Transformers [91.28982770522329]
LocATe is an end-to-end approach that jointly localizes and recognizes actions in a 3D sequence.
Unlike transformer-based object-detection and classification models which consider image or patch features as input, LocATe's transformer model is capable of capturing long-term correlations between actions in a sequence.
We introduce a new, challenging, and more realistic benchmark dataset, BABEL-TAL-20 (BT20), where the performance of state-of-the-art methods is significantly worse.
arXiv Detail & Related papers (2022-03-21T03:35:32Z) - GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose a GMFlow framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms 32-iteration RAFT on the challenging Sintel benchmark.
arXiv Detail & Related papers (2021-11-26T18:59:56Z) - Towards High-Quality Temporal Action Detection with Sparse Proposals [14.923321325749196]
Temporal Action Detection aims to localize the temporal segments containing human action instances and predict the action categories.
We introduce Sparse Proposals to interact with the hierarchical features.
Experiments demonstrate the effectiveness of our method, especially under high tIoU thresholds.
arXiv Detail & Related papers (2021-09-18T06:15:19Z) - Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation [110.09855163856326]
This paper is about the problem of learning a policy for generating an object from a sequence of actions.
We propose GFlowNet, based on a view of the generative process as a flow network.
We prove that any global minimum of the proposed objectives yields a policy which samples from the desired distribution.
arXiv Detail & Related papers (2021-06-08T14:21:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.