PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers
- URL: http://arxiv.org/abs/2503.02388v1
- Date: Tue, 04 Mar 2025 08:24:08 GMT
- Title: PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers
- Authors: Wooju Lee, Juhye Park, Dasol Hong, Changki Sung, Youngwoo Seo, Dongwan Kang, Hyun Myung
- Abstract summary: PIDLoc is a novel cross-view pose optimization approach inspired by the proportional-integral-derivative (PID) controller. PIDLoc achieves state-of-the-art performance in cross-view pose estimation on the KITTI dataset, reducing position error by $37.8\%$ compared with the previous state-of-the-art.
- Score: 7.582581416640314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate localization is essential for autonomous driving, but GNSS-based methods struggle in challenging environments such as urban canyons. Cross-view pose optimization offers an effective solution by directly estimating vehicle pose using satellite-view images. However, existing methods primarily rely on cross-view features at a given pose, neglecting fine-grained contexts for precision and global contexts for robustness against large initial pose errors. To overcome these limitations, we propose PIDLoc, a novel cross-view pose optimization approach inspired by the proportional-integral-derivative (PID) controller. Using RGB images and LiDAR, the PIDLoc comprises the PID branches to model cross-view feature relationships and the spatially aware pose estimator (SPE) to estimate the pose from these relationships. The PID branches leverage feature differences for local context (P), aggregated feature differences for global context (I), and gradients of feature differences for precise pose adjustment (D) to enhance localization accuracy under large initial pose errors. Integrated with the PID branches, the SPE captures spatial relationships within the PID-branch features for consistent localization. Experimental results demonstrate that the PIDLoc achieves state-of-the-art performance in cross-view pose estimation for the KITTI dataset, reducing position error by $37.8\%$ compared with the previous state-of-the-art.
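The abstract's P/I/D analogy can be made concrete with a minimal sketch: treat the cross-view feature difference at the current pose as the "error" signal, and combine its raw value (P), its running aggregate (I), and its finite-difference gradient (D) into a pose correction. Everything below (function names, gains, the least-squares combination) is a hypothetical illustration of the analogy, not the paper's network:

```python
import numpy as np

def pid_pose_update(residual_fn, pose, history, kp=1.0, ki=0.1, kd=0.5, eps=1e-3):
    """One illustrative PID-style pose-update step.

    residual_fn(pose) -> feature-difference vector between ground and
    satellite features at the given pose (the "error" signal).
    history: list of past residuals, feeding the integral term.
    """
    r = residual_fn(pose)                  # P: local feature difference
    history.append(r)
    integral = np.mean(history, axis=0)    # I: aggregated differences (global context)
    # D: finite-difference gradient of the residual w.r.t. each pose dimension
    grad = np.stack([
        (residual_fn(pose + eps * e) - r) / eps
        for e in np.eye(len(pose))
    ], axis=1)                             # shape: (len(r), len(pose))
    # Combine the branches: least-squares solve maps the weighted residual
    # back into pose space through the gradient (D) term.
    combined = kp * r + ki * integral
    delta, *_ = np.linalg.lstsq(grad, -kd * combined, rcond=None)
    return pose + delta
```

Iterating this step on a residual that vanishes at the true pose drives the estimate toward it, with the integral term supplying context from earlier iterations, loosely mirroring how the paper's I branch adds robustness to large initial pose errors.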
Related papers
- Multi-Modality Driven LoRA for Adverse Condition Depth Estimation [61.525312117638116]
We propose Multi-Modality Driven LoRA (MMD-LoRA) for Adverse Condition Depth Estimation. It consists of two core components: Prompt Driven Domain Alignment (PDDA) and Visual-Text Consistent Contrastive Learning (VTCCL). It achieves state-of-the-art performance on the nuScenes and Oxford RobotCar datasets.
arXiv Detail & Related papers (2024-12-28T14:23:58Z) - UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image [86.7128543480229]
We present a novel approach and benchmark, termed UNOPose, for unseen one-reference-based object pose estimation.
Building upon a coarse-to-fine paradigm, UNOPose constructs an SE(3)-invariant reference frame to standardize object representation.
We recalibrate the weight of each correspondence based on its predicted likelihood of being within the overlapping region.
arXiv Detail & Related papers (2024-11-25T05:36:00Z) - Reducing Semantic Ambiguity In Domain Adaptive Semantic Segmentation Via Probabilistic Prototypical Pixel Contrast [7.092718945468069]
Domain adaptation aims to reduce the model degradation on the target domain caused by the domain shift between the source and target domains.
Probabilistic prototypical pixel contrast (PPPC) is a universal adaptation framework that models each pixel embedding as a probability distribution.
PPPC not only addresses ambiguity at the pixel level, yielding discriminative representations, but also delivers significant improvements in both synthetic-to-real and day-to-night adaptation tasks.
arXiv Detail & Related papers (2024-09-27T08:25:03Z) - SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning [17.99904937160487]
We introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning.
SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for the camera pose estimation task on the KITTI Odometry dataset.
arXiv Detail & Related papers (2024-07-07T06:52:51Z) - iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching [14.737266480464156]
We present a method named iComMa to address the 6D camera pose estimation problem in computer vision.
We propose an efficient method for accurate camera pose estimation by inverting 3D Gaussian Splatting (3DGS).
arXiv Detail & Related papers (2023-12-14T15:31:33Z) - Poses as Queries: Image-to-LiDAR Map Localization with Transformers [5.704968411509063]
High-precision vehicle localization with commercial setups is a crucial technique for high-level autonomous driving tasks.
Estimating pose by finding correspondences between such cross-modal sensor data is challenging.
We propose a novel Transformer-based neural network to register 2D images to a 3D LiDAR map in an end-to-end manner.
arXiv Detail & Related papers (2023-05-07T14:57:58Z) - Relation Matters: Foreground-aware Graph-based Relational Reasoning for Domain Adaptive Object Detection [81.07378219410182]
We propose a new and general framework for domain adaptive object detection (DomainD), named Foreground-aware Graph-based Relational Reasoning (FGRR).
FGRR incorporates graph structures into the detection pipeline to explicitly model the intra- and inter-domain foreground object relations.
Empirical results demonstrate that the proposed FGRR exceeds the state-of-the-art on four DomainD benchmarks.
arXiv Detail & Related papers (2022-06-06T05:12:48Z) - Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z) - FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose FasterPose, a design paradigm for a cost-effective network with low-resolution (LR) representation for efficient pose estimation.
We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence.
Compared with the previously dominant network for pose estimation, our method reduces FLOPs by 58% while simultaneously gaining a 1.3% improvement in accuracy.
arXiv Detail & Related papers (2021-07-07T13:39:08Z) - Improving the generalization of network based relative pose regression: dimension reduction as a regularizer [16.63174637692875]
State-of-the-art visual localization methods perform pose estimation using a geometry-based solver within the RANSAC framework.
End-to-end learning based regression networks provide a solution to circumvent the requirement for precise pixel-level correspondences.
In this paper, we explicitly add a learnable matching layer within the network to isolate the pose regression solver from the absolute image feature values.
We implement this dimension regularization strategy within a two-layer pyramid based framework to regress the localization results from coarse to fine.
arXiv Detail & Related papers (2020-10-24T06:20:46Z) - Light Field Spatial Super-resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization [99.96632216070718]
Light field (LF) images acquired by hand-held devices usually suffer from low spatial resolution.
The high-dimensional spatial characteristics and complex geometrical structure of LF images make the problem more challenging than traditional single-image SR.
We propose a novel learning-based LF framework, in which each view of an LF image is first individually super-resolved.
arXiv Detail & Related papers (2020-04-05T14:39:57Z) - Deep Semantic Matching with Foreground Detection and Cycle-Consistency [103.22976097225457]
We address weakly supervised semantic matching based on a deep network.
We explicitly estimate the foreground regions to suppress the effect of background clutter.
We develop cycle-consistent losses to enforce the predicted transformations across multiple images to be geometrically plausible and consistent.
arXiv Detail & Related papers (2020-03-31T22:38:09Z)
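The cycle-consistency idea in the last entry can be illustrated with a minimal sketch on homogeneous 2D transforms: if the transformations predicted between three images are geometrically consistent, composing them around the cycle should return the identity. The function name and matrix parameterization below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def cycle_consistency_loss(T_ab, T_bc, T_ca):
    """Penalize deviation of the composed cycle A->B->C->A from identity.

    Each argument is a 3x3 homogeneous 2D transformation matrix. For
    geometrically consistent predictions, T_ca @ T_bc @ T_ab is (close to)
    the identity, so the Frobenius-norm residual is near zero.
    """
    cycle = T_ca @ T_bc @ T_ab  # transform A -> B, then B -> C, then C -> A
    return np.linalg.norm(cycle - np.eye(3), ord="fro")
```

In training, such a residual would be minimized jointly with the matching loss so that the predicted transformations across multiple images stay mutually consistent.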
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.