Color-Pair Guided Robust Zero-Shot 6D Pose Estimation and Tracking of Cluttered Objects on Edge Devices
- URL: http://arxiv.org/abs/2509.23647v1
- Date: Sun, 28 Sep 2025 05:07:49 GMT
- Title: Color-Pair Guided Robust Zero-Shot 6D Pose Estimation and Tracking of Cluttered Objects on Edge Devices
- Authors: Xingjian Yang, Ashis G. Banerjee
- Abstract summary: We present a unified framework explicitly designed for efficient execution on edge devices. Key to our approach is a shared, lighting-invariant color-pair feature representation. For initial estimation, this feature facilitates robust registration between the live RGB-D view and the object's 3D mesh. For tracking, the same feature logic validates temporal correspondences, enabling a lightweight model to reliably regress the object's motion.
- Score: 4.261261166281339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust 6D pose estimation of novel objects under challenging illumination remains a significant challenge, often requiring a trade-off between accurate initial pose estimation and efficient real-time tracking. We present a unified framework explicitly designed for efficient execution on edge devices, which synergizes a robust initial estimation module with a fast motion-based tracker. The key to our approach is a shared, lighting-invariant color-pair feature representation that forms a consistent foundation for both stages. For initial estimation, this feature facilitates robust registration between the live RGB-D view and the object's 3D mesh. For tracking, the same feature logic validates temporal correspondences, enabling a lightweight model to reliably regress the object's motion. Extensive experiments on benchmark datasets demonstrate that our integrated approach is both effective and robust, providing competitive pose estimation accuracy while maintaining high-fidelity tracking even through abrupt pose changes.
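The abstract does not spell out how the color-pair feature is built, so the following is only a minimal sketch of the general idea, not the authors' method. It swaps in a standard technique, normalized chromaticity, which is unchanged when a light source scales all RGB channels by the same factor. The function names `chromaticity` and `color_pair_feature`, and the choice to combine point distance with two chromaticities, are assumptions made for illustration.

```python
import math


def chromaticity(rgb):
    """Map an (R, G, B) triple to (r, g) chromaticity.

    Dividing by R + G + B cancels any uniform intensity scaling,
    which is the only kind of lighting change this sketch handles.
    """
    r, g, b = (float(c) for c in rgb)
    total = r + g + b
    if total <= 0.0:
        # Pure black carries no chromatic information; treat it as neutral.
        return (1.0 / 3.0, 1.0 / 3.0)
    return (r / total, g / total)


def color_pair_feature(p1, c1, p2, c2):
    """Hypothetical descriptor for a pair of colored 3D points.

    Returns (distance, r1, g1, r2, g2): the Euclidean distance between
    the two points followed by the chromaticity of each point's color.
    """
    dist = math.dist(p1, p2)
    return (dist, *chromaticity(c1), *chromaticity(c2))


# The same two surface points observed under bright and dim lighting.
bright = color_pair_feature((0, 0, 0), (200, 100, 50), (3, 4, 0), (20, 40, 60))
dim = color_pair_feature((0, 0, 0), (100, 50, 25), (3, 4, 0), (10, 20, 30))
```

Here `bright` and `dim` agree up to floating-point error, because halving every channel leaves the chromaticities untouched while the geometric distance is unaffected by color. A descriptor of this kind could in principle be matched between a live RGB-D frame and a textured mesh, but colored illumination, shadows, and specular highlights would break the simple scaling assumption.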
Related papers
- GeoMotion: Rethinking Motion Segmentation via Latent 4D Geometry [61.24189040578178]
We propose a fully learning-based approach that directly infers moving objects from latent feature representations via attention mechanisms. Our key insight is to bypass explicit correspondence estimation and instead let the model learn to implicitly disentangle object and camera motion. Our approach achieves state-of-the-art motion segmentation performance with high efficiency.
arXiv Detail & Related papers (2026-02-25T11:36:33Z) - Delving into Dynamic Scene Cue-Consistency for Robust 3D Multi-Object Tracking [16.366398265001422]
3D multi-object tracking is a critical and challenging task in the field of autonomous driving. We introduce the Dynamic Scene Cue-Consistency Tracker (DSC-Track) to implement this principle.
arXiv Detail & Related papers (2025-08-15T08:48:13Z) - 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting [7.7145084897748974]
We present 6DOPE-GS, a novel method for online 6D object pose estimation & tracking with a single RGB-D camera. We show that 6DOPE-GS matches the performance of state-of-the-art baselines for model-free simultaneous 6D pose tracking and reconstruction. We also demonstrate the method's suitability for live, dynamic object tracking and reconstruction in a real-world setting.
arXiv Detail & Related papers (2024-12-02T14:32:19Z) - MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking [58.719310295870024]
This paper presents an event-based framework for tracking any point. To resolve ambiguities caused by event sparsity, a motion-guidance module incorporates kinematic vectors into the local matching process. The method improves the $Survival_{50}$ metric by 17.9% over the event-only tracking-any-point baseline.
arXiv Detail & Related papers (2024-12-02T09:13:29Z) - TrackAgent: 6D Object Tracking via Reinforcement Learning [24.621588217873395]
We propose to simplify object tracking to a reinforced point cloud (depth only) alignment task.
This allows us to train a streamlined approach from scratch with limited amounts of sparse 3D point clouds.
We also show that the RL agent's uncertainty and a rendering-based mask propagation are effective reinitialization triggers.
arXiv Detail & Related papers (2023-07-28T17:03:00Z) - Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z) - Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos [50.74218823358754]
We develop a transformer-based framework to exploit temporal information for robust estimation.
We build a network hierarchy with two cascaded transformer encoders, where the first one exploits the short-term temporal cue for hand pose estimation.
Our approach achieves competitive results on two first-person hand action benchmarks, namely FPHA and H2O.
arXiv Detail & Related papers (2022-09-20T05:52:54Z) - Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z) - RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement.
The problem is formulated as a non-linear least squares problem based on the estimated correspondence field.
The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z) - SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation [98.83762558394345]
SO-Pose is a framework for regressing all 6 degrees-of-freedom (6DoF) for the object pose in a cluttered environment from a single RGB image.
We introduce a novel reasoning about self-occlusion, in order to establish a two-layer representation for 3D objects.
By utilizing cross-layer consistencies that align correspondences, self-occlusion and 6D pose, we can further improve accuracy and robustness.
arXiv Detail & Related papers (2021-08-18T19:49:29Z) - se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains [12.71983073907091]
This work proposes a data-driven optimization approach for long-term, 6D pose tracking.
It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model.
The proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images.
arXiv Detail & Related papers (2020-07-27T21:09:36Z)