TransCamP: Graph Transformer for 6-DoF Camera Pose Estimation
- URL: http://arxiv.org/abs/2105.14065v1
- Date: Fri, 28 May 2021 19:08:43 GMT
- Title: TransCamP: Graph Transformer for 6-DoF Camera Pose Estimation
- Authors: Xinyi Li, Haibin Ling
- Abstract summary: We propose a neural network approach with a graph transformer backbone, namely TransCamP, to address the camera relocalization problem.
TransCamP effectively fuses the image features, camera pose information and inter-frame relative camera motions into encoded graph attributes.
- Score: 77.09542018140823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   Camera pose estimation or camera relocalization is the centerpiece in
numerous computer vision tasks such as visual odometry, structure from motion
(SfM) and SLAM. In this paper we propose a neural network approach with a graph
transformer backbone, namely TransCamP, to address the camera relocalization
problem. In contrast with prior work, where pose regression is mainly guided
by photometric consistency, TransCamP effectively fuses the image features,
camera pose information and inter-frame relative camera motions into encoded
graph attributes and is instead trained towards graph consistency and accuracy,
yielding significantly higher computational efficiency. By leveraging
graph transformer layers with edge features and enabling a tensorized adjacency
matrix, TransCamP dynamically captures global attention and thus endows the
pose graph with evolving structures to achieve improved robustness and
accuracy. In addition, optional temporal transformer layers enhance
the spatiotemporal inter-frame relation for sequential inputs. Evaluation of
the proposed network on various public benchmarks demonstrates that TransCamP
outperforms state-of-the-art approaches.
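
To make the idea of attention over a pose graph with edge features more concrete, the following is a minimal sketch and not the TransCamP architecture itself. It assumes one attention head, scalar edge scores standing in for encoded relative camera motion, and a 0/1 adjacency mask; the function name `edge_aware_attention` and all inputs are hypothetical, chosen only for illustration.

```python
import math


def edge_aware_attention(nodes, edges, adjacency):
    """One illustrative attention head over a small pose graph.

    nodes:     list of N feature vectors (lists of floats), one per frame.
    edges:     N x N nested list of scalar edge scores (a hypothetical
               stand-in for encoded relative camera motion).
    adjacency: N x N nested list of 0/1 flags; 0 masks the pair out.
               Every row needs at least one 1 (for example a self-loop).
    Returns (updated node features, attention weights).
    """
    n = len(nodes)
    dim = len(nodes[0])
    scale = math.sqrt(dim)
    updated, weights = [], []
    for i in range(n):
        # Logit = scaled dot product of node features plus an edge bias.
        logits = []
        for j in range(n):
            if adjacency[i][j]:
                dot = sum(a * b for a, b in zip(nodes[i], nodes[j]))
                logits.append(dot / scale + edges[i][j])
            else:
                logits.append(-math.inf)
        # Numerically stable softmax; masked pairs get weight 0.
        peak = max(logits)
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Each node becomes the attention-weighted sum of its neighbours.
        updated.append(
            [sum(row[j] * nodes[j][k] for j in range(n)) for k in range(dim)]
        )
    return updated, weights


# Three frames in a chain (0-1-2) with self-loops; all values are made up.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.2], [0.0, 0.2, 0.0]]
adjacency = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
updated, weights = edge_aware_attention(nodes, edges, adjacency)
```

In this sketch the edge score only biases the attention logits and the adjacency mask is fixed. The abstract describes a tensorized adjacency matrix and an evolving graph structure, which this toy example does not attempt to reproduce.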
 
      
Related papers
- Exploring Kernel Transformations for Implicit Neural Representations [57.2225355625268]
 Implicit neural representations (INRs) leverage neural networks to represent signals by mapping coordinates to their corresponding attributes.
This work pioneers the exploration of the effect of kernel transformation of input/output while keeping the model itself unchanged.
A byproduct of our findings is a simple yet effective method that combines scale and shift to significantly boost INR with negligible overhead.
 arXiv  Detail & Related papers  (2025-04-07T04:43:50Z)
- Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
 Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes.
In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers.
 arXiv  Detail & Related papers  (2025-01-28T06:42:37Z)
- ESVO2: Direct Visual-Inertial Odometry with Stereo Event Cameras [33.81592783496106]
 Event-based visual odometry aims at solving tracking and mapping sub-problems in parallel.
We build an event-based stereo visual-inertial odometry system on top of our previous direct pipeline Event-based Stereo Visual Odometry.
 arXiv  Detail & Related papers  (2024-10-12T05:35:27Z)
- GTransPDM: A Graph-embedded Transformer with Positional Decoupling for Pedestrian Crossing Intention Prediction [6.327758022051579]
 GTransPDM was developed for pedestrian crossing intention prediction by leveraging multi-modal features.
It achieves 92% accuracy on the PIE dataset and 87% accuracy on the JAAD dataset, with a processing speed of 0.05ms.
 arXiv  Detail & Related papers  (2024-09-30T12:02:17Z)
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
 We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
 arXiv  Detail & Related papers  (2024-03-25T17:47:03Z)
- Automated Camera Calibration via Homography Estimation with GNNs [8.786192891436686]
 Governments and local administrations are increasingly relying on the data collected from cameras to enhance road safety and optimize traffic conditions.
It is imperative to ensure accurate and automated calibration of the involved cameras.
This paper proposes a novel approach to address this challenge by leveraging the topological structure of intersections.
 arXiv  Detail & Related papers  (2023-11-05T08:45:26Z)
- Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
 High dynamic range imaging aims to retrieve information from multiple low-dynamic range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
 arXiv  Detail & Related papers  (2023-05-29T15:03:23Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in the low illumination indoor scene.
 arXiv  Detail & Related papers  (2022-03-20T02:59:51Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
 This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
 arXiv  Detail & Related papers  (2021-12-31T04:37:11Z)
- Homography Decomposition Networks for Planar Object Tracking [11.558401177707312]
 Planar object tracking plays an important role in AI applications, such as robotics, visual servoing, and visual SLAM.
We propose a novel Homography Decomposition Networks (HDN) approach that drastically reduces and stabilizes the condition number by decomposing the homography transformation into two groups.
 arXiv  Detail & Related papers  (2021-12-15T06:13:32Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
 We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
 arXiv  Detail & Related papers  (2021-06-17T17:33:35Z)
       
     