RFM-Pose:Reinforcement-Guided Flow Matching for Fast Category-Level 6D Pose Estimation
- URL: http://arxiv.org/abs/2602.05257v1
- Date: Thu, 05 Feb 2026 03:26:15 GMT
- Title: RFM-Pose:Reinforcement-Guided Flow Matching for Fast Category-Level 6D Pose Estimation
- Authors: Diya He, Qingchen Liu, Cong Zhang, Jiahu Qin,
- Abstract summary: We propose a new framework, RFM-Pose, that accelerates category-level 6D object pose generation while actively evaluating sampled hypotheses.<n> Experiments on the REAL275 benchmark demonstrate that RFM-Pose achieves favorable performance while significantly reducing computational cost.
- Score: 8.3336796041978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object pose estimation is a fundamental problem in computer vision and plays a critical role in virtual reality and embodied intelligence, where agents must understand and interact with objects in 3D space. Recently, score based generative models have to some extent solved the rotational symmetry ambiguity problem in category level pose estimation, but their efficiency remains limited by the high sampling cost of score-based diffusion. In this work, we propose a new framework, RFM-Pose, that accelerates category-level 6D object pose generation while actively evaluating sampled hypotheses. To improve sampling efficiency, we adopt a flow-matching generative model and generate pose candidates along an optimal transport path from a simple prior to the pose distribution. To further refine these candidates, we cast the flow-matching sampling process as a Markov decision process and apply proximal policy optimization to fine-tune the sampling policy. In particular, we interpret the flow field as a learnable policy and map an estimator to a value network, enabling joint optimization of pose generation and hypothesis scoring within a reinforcement learning framework. Experiments on the REAL275 benchmark demonstrate that RFM-Pose achieves favorable performance while significantly reducing computational cost. Moreover, similar to prior work, our approach can be readily adapted to object pose tracking and attains competitive results in this setting.
Related papers
- Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization [60.87651283510059]
Group Relative Policy Optimization (GRPO) effectively scales LLM reasoning but incurs prohibitive computational costs.<n>We propose Dynamic Pruning Policy Optimization (DPPO), a framework that enables dynamic pruning while preserving unbiased gradient estimation.<n>To mitigate the data sparsity induced by pruning, we introduce Dense Prompt Packing, a window-based greedy strategy.
arXiv Detail & Related papers (2026-03-04T14:48:53Z) - UniTac2Pose: A Unified Approach Learned in Simulation for Category-level Visuotactile In-hand Pose Estimation [19.042061670329733]
We propose a novel three-stage framework for in-hand pose estimation.<n>The first stage involves sampling and pre-ranking pose candidates, followed by iterative refinement of these candidates.<n>In the final stage, post-ranking is applied to identify the most likely pose candidates.
arXiv Detail & Related papers (2025-09-19T12:39:31Z) - Aligning Latent Spaces with Flow Priors [72.24305287508474]
This paper presents a novel framework for aligning learnable latent spaces to arbitrary target distributions by leveraging flow-based generative models as priors.<n> Notably, the proposed method eliminates computationally expensive likelihood evaluations and avoids ODE solving during optimization.
arXiv Detail & Related papers (2025-06-05T16:59:53Z) - Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling [84.00480999255628]
Reinforcement Learning algorithms for safety alignment of Large Language Models (LLMs) encounter the challenge of distribution shift.<n>Current approaches typically address this issue through online sampling from the target policy.<n>We propose a new framework that leverages the model's intrinsic safety judgment capability to extract reward signals.
arXiv Detail & Related papers (2025-03-13T06:40:34Z) - Can foundation models actively gather information in interactive environments to test hypotheses? [43.42688356541211]
Foundation models excel at single-turn reasoning but struggle with multi-turn exploration in dynamic environments.<n>We evaluated these models on their ability to learn from experience, adapt, and gather information.
arXiv Detail & Related papers (2024-12-09T12:27:21Z) - OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z) - GenPose: Generative Category-level Object Pose Estimation via Diffusion
Models [5.1998359768382905]
We propose a novel solution by reframing categorylevel object pose estimation as conditional generative modeling.
Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on strict 5d2cm and 5d5cm metrics.
arXiv Detail & Related papers (2023-06-18T11:45:42Z) - CPPF++: Uncertainty-Aware Sim2Real Object Pose Estimation by Vote Aggregation [67.12857074801731]
We introduce a novel method, CPPF++, designed for sim-to-real pose estimation.
To address the challenge posed by vote collision, we propose a novel approach that involves modeling the voting uncertainty.
We incorporate several innovative modules, including noisy pair filtering, online alignment optimization, and a feature ensemble.
arXiv Detail & Related papers (2022-11-24T03:27:00Z) - VL4Pose: Active Learning Through Out-Of-Distribution Detection For Pose
Estimation [79.50280069412847]
We introduce VL4Pose, a first principles approach for active learning through out-of-distribution detection.
Our solution involves modelling the pose through a simple parametric Bayesian network trained via maximum likelihood estimation.
We perform qualitative and quantitative experiments on three datasets: MPII, LSP and ICVL, spanning human and hand pose estimation.
arXiv Detail & Related papers (2022-10-12T09:03:55Z) - Robust Ego and Object 6-DoF Motion Estimation and Tracking [5.162070820801102]
This paper proposes a robust solution to achieve accurate estimation and consistent track-ability for dynamic multi-body visual odometry.
A compact and effective framework is proposed leveraging recent advances in semantic instance-level segmentation and accurate optical flow estimation.
A novel formulation, jointly optimizing SE(3) motion and optical flow is introduced that improves the quality of the tracked points and the motion estimation accuracy.
arXiv Detail & Related papers (2020-07-28T05:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.