Related papers: Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset

Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset

URL: http://arxiv.org/abs/2309.13570v4
Date: Mon, 17 Jun 2024 22:08:05 GMT
Title: Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset
Authors: Zixun Huang, Keling Yao, Seth Z. Zhao, Chuanyu Pan, Chenfeng Xu, Kathy Zhuang, Tianjian Xu, Weiyu Feng, Allen Y. Yang,
Abstract summary: 6DoF pose estimation with mobile devices is the foundation for applications in robotics, augmented reality, and digital twin localization. We present a transformer-based 6DoF pose estimation approach called DTTDNet, featuring a novel geometric feature filtering module and a Chamfer distance loss for training. Our approach exhibits superior robustness to varying levels of measurement noise, setting a new benchmark for the robustness to noise measurements.
Score: 5.424654352786403
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Robust 6DoF pose estimation with mobile devices is the foundation for applications in robotics, augmented reality, and digital twin localization. In this paper, we extensively investigate the robustness of existing RGBD-based 6DoF pose estimation methods against varying levels of depth sensor noise. We highlight that existing 6DoF pose estimation methods suffer significant performance discrepancies due to depth measurement inaccuracies. In response to the robustness issue, we present a simple and effective transformer-based 6DoF pose estimation approach called DTTDNet, featuring a novel geometric feature filtering module and a Chamfer distance loss for training. Moreover, we advance the field of robust 6DoF pose estimation and introduce a new dataset -- Digital Twin Tracking Dataset Mobile (DTTD-Mobile), tailored for digital twin object tracking with noisy depth data from the mobile RGBD sensor suite of the Apple iPhone 14 Pro. Extensive experiments demonstrate that DTTDNet significantly outperforms state-of-the-art methods at least 4.32, up to 60.74 points in ADD metrics on the DTTD-Mobile. More importantly, our approach exhibits superior robustness to varying levels of measurement noise, setting a new benchmark for the robustness to noise measurements. Code and dataset are made publicly available at: https://github.com/augcog/DTTD2

Related papers

Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping [26.7709114619056]
We propose a novel framework, Monocular One-shot Metric-depth Alignment (MOMA), to recover metric depth from a single RGB image.<n>MOMA performs scale-rotation-shift alignments during camera calibration, guided by sparse ground-truth depth points.<n>Real-world experiments on tabletop 2-finger grasping and suction-based bin-picking applications show MOMA achieves high success rates in diverse tasks.
arXiv Detail & Related papers (2025-06-20T16:11:20Z)
TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion [54.46664104437454]
We propose TacoDepth, an efficient and accurate Radar-Camera depth estimation model with one-stage fusion.<n>Specifically, the graph-based Radar structure extractor and the pyramid-based Radar fusion module are designed.<n>Compared with the previous state-of-the-art approach, TacoDepth improves depth accuracy and processing speed by 12.8% and 91.8%.
arXiv Detail & Related papers (2025-04-16T05:25:04Z)
DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image [8.588871458005114]
We propose a novel completion-based method, named DEPTHOR, for depth enhancement in computer vision. First, we simulate real-world dToF data from the accurate ground truth in synthetic datasets to enable noise-robust training. Second, we design a novel network that incorporates monocular depth estimation (MDE), leveraging global depth relationships and contextual information to improve prediction in challenging regions.
arXiv Detail & Related papers (2025-04-02T11:02:21Z)
Bayesian Approximation-Based Trajectory Prediction and Tracking with 4D Radar [13.438311878715536]
3D multi-object tracking (MOT) is vital for autonomous vehicles, yet LiDAR and camera-based methods degrade in adverse weather. We propose Bayes-4DRTrack, a 4D Radar-based MOT framework that adopts a transformer-based motion prediction network.
arXiv Detail & Related papers (2025-02-03T13:49:21Z)
Robust Depth Enhancement via Polarization Prompt Fusion Tuning [112.88371907047396]
We present a framework that leverages polarization imaging to improve inaccurate depth measurements from various depth sensors. Our method first adopts a learning-based strategy where a neural network is trained to estimate a dense and complete depth map from polarization data and a sensor depth map from different sensors. To further improve the performance, we propose a Polarization Prompt Fusion Tuning (PPFT) strategy to effectively utilize RGB-based models pre-trained on large-scale datasets.
arXiv Detail & Related papers (2024-04-05T17:55:33Z)
RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations. Specifically, we leverage a pre-trained monocular estimator to extract local geometric information. A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement [5.482532589225552]
We propose TransPose, an improved Transformer-based 6D pose estimation with a depth refinement module. The architecture takes in only an RGB image as input with no additional supplementing modalities such as depth or thermal images. A novel depth refinement module is then used alongside the predicted centers, 6D poses and depth patches to refine the accuracy of the estimated 6D pose.
arXiv Detail & Related papers (2023-07-09T17:33:13Z)
Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot, Generalizable Approach using RGB Images [60.0898989456276]
We present a new framework named Cas6D for few-shot 6DoF pose estimation that is generalizable and uses only RGB images. To address the false positives of target object detection in the extreme few-shot setting, our framework utilizes a self-supervised pre-trained ViT to learn robust feature representations. Experimental results on the LINEMOD and GenMOP datasets demonstrate that Cas6D outperforms state-of-the-art methods by 9.2% and 3.8% accuracy (Proj-5) under the 32-shot setting.
arXiv Detail & Related papers (2023-06-13T07:45:42Z)
Deep Metric Learning for Unsupervised Remote Sensing Change Detection [60.89777029184023]
Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs) The performance of existing RS-CD methods is attributed to training on large annotated datasets. This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues.
arXiv Detail & Related papers (2023-03-16T17:52:45Z)
Consistent Direct Time-of-Flight Video Depth Super-Resolution [9.173767380836852]
Direct time-of-flight (dToF) sensors are promising for next-generation on-device 3D sensing. We propose the first multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the low-resolution dToF imaging. We introduce DyDToF, the first synthetic RGB-dToF video dataset that features dynamic objects and a realistic dToF simulator.
arXiv Detail & Related papers (2022-11-16T04:16:20Z)
Det6D: A Ground-Aware Full-Pose 3D Object Detector for Improving Terrain Robustness [1.4620086904601473]
We propose Det6D, the first full-degree-of-freedom 3D object detector without spatial and postural limitations. To predict full-degree poses, including pitch and roll, we design a ground-aware orientation branch. Experiments on various datasets demonstrate the effectiveness and robustness of our method in different terrains.
arXiv Detail & Related papers (2022-07-19T17:12:48Z)
FS6D: Few-Shot 6D Pose Estimation of Novel Objects [116.34922994123973]
6D object pose estimation networks are limited in their capability to scale to large numbers of object instances. In this work, we study a new open set problem; the few-shot 6D object poses estimation: estimating the 6D pose of an unknown object by a few support views without extra training.
arXiv Detail & Related papers (2022-03-28T10:31:29Z)
M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information. In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection. We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space. We propose a model that unifies these two tasks in the same metric space. Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
3D Point-to-Keypoint Voting Network for 6D Pose Estimation [8.801404171357916]
We propose a framework for 6D pose estimation from RGB-D data based on spatial structure characteristics of 3D keypoints. The proposed method is verified on two benchmark datasets, LINEMOD and OCCLUSION LINEMOD.
arXiv Detail & Related papers (2020-12-22T11:43:15Z)
Uncertainty-Aware Deep Calibrated Salient Object Detection [74.58153220370527]
Existing deep neural network based salient object detection (SOD) methods mainly focus on pursuing high network accuracy. These methods overlook the gap between network accuracy and prediction confidence, known as the confidence uncalibration problem. We introduce an uncertaintyaware deep SOD network, and propose two strategies to prevent deep SOD networks from being overconfident.
arXiv Detail & Related papers (2020-12-10T23:28:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.