Related papers: FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation

FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation

URL: http://arxiv.org/abs/2506.16201v1
Date: Thu, 19 Jun 2025 10:30:02 GMT
Title: FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation
Authors: Sen Wang, Le Wang, Sanping Zhou, Jingyi Tian, Jiayi Li, Haowen Sun, Wei Tang,
Abstract summary: FlowRAM is a novel framework that leverages generative models to achieve region-aware perception.<n>FlowRAM is able to generate physically plausible actions for a variety of real-world tasks in less than 4 time steps.
Score: 34.045199714747596
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Robotic manipulation in high-precision tasks is essential for numerous industrial and real-world applications where accuracy and speed are required. Yet current diffusion-based policy learning methods generally suffer from low computational efficiency due to the iterative denoising process during inference. Moreover, these methods do not fully explore the potential of generative models for enhancing information exploration in 3D environments. In response, we propose FlowRAM, a novel framework that leverages generative models to achieve region-aware perception, enabling efficient multimodal information processing. Specifically, we devise a Dynamic Radius Schedule, which allows adaptive perception, facilitating transitions from global scene comprehension to fine-grained geometric details. Furthermore, we integrate state space models to integrate multimodal information, while preserving linear computational complexity. In addition, we employ conditional flow matching to learn action poses by regressing deterministic vector fields, simplifying the learning process while maintaining performance. We verify the effectiveness of the FlowRAM in the RLBench, an established manipulation benchmark, and achieve state-of-the-art performance. The results demonstrate that FlowRAM achieves a remarkable improvement, particularly in high-precision tasks, where it outperforms previous methods by 12.0% in average success rate. Additionally, FlowRAM is able to generate physically plausible actions for a variety of real-world tasks in less than 4 time steps, significantly increasing inference speed.

Related papers

ActionSink: Toward Precise Robot Manipulation with Dynamic Integration of Action Flow [93.00917887667234]
This paper introduces a novel robot manipulation framework, i.e., ActionSink, to pave the way toward precise action estimations.<n>As the name suggests, ActionSink reformulates the actions of robots as action-caused optical flows from videos, called "action flow"<n>Our framework outperformed prior SOTA on the LIBERO benchmark by a 7.9% success rate, and obtained nearly an 8% accuracy gain on the challenging long-horizon visual task LIBERO-Long.
arXiv Detail & Related papers (2025-08-05T08:46:17Z)
SSSUMO: Real-Time Semi-Supervised Submovement Decomposition [0.6499759302108926]
Submovement analysis offers valuable insights into motor control.<n>Existing methods struggle with reconstruction accuracy, computational cost, and validation.<n>We address these challenges using a semi-supervised learning framework.
arXiv Detail & Related papers (2025-07-08T21:26:25Z)
Active Learning for Manifold Gaussian Process Regression [5.618322163107168]
This paper introduces an active learning framework for manifold Gaussian Process (GP) regression.<n>It combines manifold learning with strategic data selection to improve accuracy in high-dimensional spaces.
arXiv Detail & Related papers (2025-06-26T01:25:39Z)
Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems [49.819436680336786]
We propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems.<n>Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics.<n>Our ETGPSSM outperforms existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.
arXiv Detail & Related papers (2025-03-24T03:19:45Z)
IMLE Policy: Fast and Sample Efficient Visuomotor Policy Learning via Implicit Maximum Likelihood Estimation [3.7584322469996896]
IMLE Policy is a novel behaviour cloning approach based on Implicit Maximum Likelihood Estimation (IMLE)<n>It excels in low-data regimes, effectively learning from minimal demonstrations and requiring 38% less data on average to match the performance of baseline methods in learning complex multi-modal behaviours.<n>We validate our approach across diverse manipulation tasks in simulated and real-world environments, showcasing its ability to capture complex behaviours under data constraints.
arXiv Detail & Related papers (2025-02-17T23:22:49Z)
SMamba: Sparse Mamba for Event-based Object Detection [17.141967728323714]
Transformer-based methods have achieved remarkable performance in event-based object detection, owing to the global modeling ability.<n>To mitigate cost, some researchers propose window attention based sparsification strategies to discard unimportant regions.<n>We propose Sparse Mamba, which performs adaptive sparsification to reduce computational effort while maintaining global modeling ability.
arXiv Detail & Related papers (2025-01-21T08:33:32Z)
Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation [22.653014803666668]
We propose a Faster LiDAR 3D object detection framework, called FASD, which implements heterogeneous model distillation by adaptively uniform cross-model voxel features. We aim to distill the transformer's capacity for high-performance sequence modeling into Mamba models with low FLOPs, achieving a significant improvement in accuracy through knowledge transfer. We evaluated the framework on datasets and nuScenes, achieving a 4x reduction in resource consumption and a 1-2% performance improvement over the current SoTA methods.
arXiv Detail & Related papers (2024-09-17T09:30:43Z)
PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning [13.564676246832544]
We introduce PLANRL, a framework that chooses when the robot should use classical motion planning and when it should learn a policy. PLANRL switches between two modes of operation: reaching a waypoint using classical techniques when away from the objects and fine-grained manipulation control when about to interact with objects. We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior performance in terms of adaptability, efficiency, and generalization compared to existing methods.
arXiv Detail & Related papers (2024-08-07T19:30:08Z)
Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement Learning Approach [58.911515417156174]
We propose a new definition of Age of Information (AoI) and, based on the redefined AoI, we formulate an online AoI problem for MEC systems. We introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics. We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness.
arXiv Detail & Related papers (2023-12-01T01:30:49Z)
Generalizing Event-Based Motion Deblurring in Real-World Scenarios [62.995994797897424]
Event-based motion deblurring has shown promising results by exploiting low-latency events. We propose a scale-aware network that allows flexible input spatial scales and enables learning from different temporal scales of motion blur. A two-stage self-supervised learning scheme is then developed to fit real-world data distribution.
arXiv Detail & Related papers (2023-08-11T04:27:29Z)
Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment. We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes. Our design is based on a novel bilevel optimization formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment. We propose to build the notion of continual learning into the modeling process of learning wireless systems. Our design is based on a novel min-max formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression. Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice. A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.