Reprojection Errors as Prompts for Efficient Scene Coordinate Regression
- URL: http://arxiv.org/abs/2409.04178v1
- Date: Fri, 6 Sep 2024 10:43:34 GMT
- Title: Reprojection Errors as Prompts for Efficient Scene Coordinate Regression
- Authors: Ting-Ru Liu, Hsuan-Kung Yang, Jou-Min Liu, Chun-Wei Huang, Tsung-Chih Chiang, Quan Kong, Norimasa Kobori, Chun-Yi Lee
- Abstract summary: Scene coordinate regression (SCR) methods have emerged as a promising area of research due to their potential for accurate visual localization.
Many existing SCR approaches train on samples from all image regions, including dynamic objects and texture-less areas.
We introduce an error-guided feature selection (EGFS) mechanism, used in tandem with the Segment Anything Model (SAM).
This mechanism seeds low-reprojection-error areas as prompts, expands them into error-guided masks, and then uses these masks to sample points and filter out problematic areas in an iterative manner.
- Score: 9.039259735902625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene coordinate regression (SCR) methods have emerged as a promising area of research due to their potential for accurate visual localization. However, many existing SCR approaches train on samples from all image regions, including dynamic objects and texture-less areas. Utilizing these areas for optimization during training can potentially hamper the overall performance and efficiency of the model. In this study, we first perform an in-depth analysis to validate the adverse impacts of these areas. Drawing inspiration from our analysis, we then introduce an error-guided feature selection (EGFS) mechanism, in tandem with the use of the Segment Anything Model (SAM). This mechanism seeds low reprojection areas as prompts and expands them into error-guided masks, and then utilizes these masks to sample points and filter out problematic areas in an iterative manner. The experiments demonstrate that our method outperforms existing SCR approaches that do not rely on 3D information on the Cambridge Landmarks and Indoor6 datasets.
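The core loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-pixel reprojection error uses a standard pinhole camera model, and `expand_fn` is a hypothetical stand-in for prompting SAM with the seed points to obtain an error-guided mask.

```python
import numpy as np

def reprojection_error(scene_coords, pixels, K, R, t):
    """Per-point reprojection error: project predicted 3D scene
    coordinates through pose (R, t) and intrinsics K, then measure
    the pixel distance to the observed 2D locations."""
    cam = R @ scene_coords.T + t[:, None]     # (3, N) points in camera frame
    proj = K @ cam                            # (3, N) homogeneous image points
    proj = proj[:2] / proj[2:3]               # perspective divide -> (2, N)
    return np.linalg.norm(proj.T - pixels, axis=1)  # (N,) errors in pixels

def egfs_iteration(errors, coords_2d, mask_shape, seed_threshold, expand_fn):
    """One error-guided feature selection step: pixels whose reprojection
    error falls below `seed_threshold` become prompts, and `expand_fn`
    (a stand-in for SAM prompting) grows them into a sampling mask.
    Subsequent training would sample points only inside the mask."""
    prompts = coords_2d[errors < seed_threshold]
    mask = expand_fn(prompts, mask_shape)
    return prompts, mask
```

Iterating this step lets the sampling mask track regions where the regressor is already consistent, progressively excluding dynamic objects and texture-less areas from the training signal.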
Related papers
- Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - Multi-Resolution Planar Region Extraction for Uneven Terrains [6.482137641059034]
This paper studies the problem of extracting planar regions in uneven terrains from unordered point cloud measurements.
We propose a multi-resolution planar region extraction strategy that balances the accuracy in boundaries and computational efficiency.
arXiv Detail & Related papers (2023-11-21T12:17:51Z) - Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, many of the rendered data are polluted by artifacts or only contain minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z) - Domain Reduction Strategy for Non Line of Sight Imaging [20.473142941237015]
In non-line-of-sight (NLOS) imaging, the visible surfaces of the target objects are notably sparse.
We design our method to render the transients through partial propagations from a continuously sampled set of points from the hidden space.
Our method is capable of accurately and efficiently modeling the view-dependent reflectance using surface normals.
arXiv Detail & Related papers (2023-08-20T14:00:33Z) - LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference [66.45326873274908]
We propose a novel method, LoLep, which regresses Locally-Learned planes from a single RGB image to represent scenes accurately.
Compared to MINE, our approach has an LPIPS reduction of 4.8%-9.0% and an RV reduction of 73.9%-83.5%.
arXiv Detail & Related papers (2023-07-23T03:38:55Z) - Subspace Perturbation Analysis for Data-Driven Radar Target Localization [20.34399283905663]
We use subspace analysis to benchmark radar target localization accuracy across mismatched scenarios.
We generate comprehensive datasets by randomly placing targets of variable strengths in mismatched constrained areas.
We estimate target locations from these heatmap tensors using a convolutional neural network.
arXiv Detail & Related papers (2023-03-14T21:22:26Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework on the KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.