SOFI: Multi-Scale Deformable Transformer for Camera Calibration with Enhanced Line Queries
- URL: http://arxiv.org/abs/2409.15553v1
- Date: Mon, 23 Sep 2024 21:17:38 GMT
- Title: SOFI: Multi-Scale Deformable Transformer for Camera Calibration with Enhanced Line Queries
- Authors: Sebastian Janampa, Marios Pattichis,
- Abstract summary: We introduce textitmulti-Scale defOrmable transFormer for camera calibratIon for enhanced line queries, SOFI.
SOFI improves the line queries used in MSC-C and MSCC by using both line content and line geometric features.
It outperforms existing methods on the textit Google Street View, textit Horizon Line in the Wild, and textit Holicity datasets while keeping a competitive inference speed.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera calibration consists of estimating camera parameters such as the zenith vanishing point and horizon line. Estimating the camera parameters allows other tasks like 3D rendering, artificial reality effects, and object insertion in an image. Transformer-based models have provided promising results; however, they lack cross-scale interaction. In this work, we introduce \textit{multi-Scale defOrmable transFormer for camera calibratIon with enhanced line queries}, SOFI. SOFI improves the line queries used in CTRL-C and MSCC by using both line content and line geometric features. Moreover, SOFI's line queries allow transformer models to adopt the multi-scale deformable attention mechanism to promote cross-scale interaction between the feature maps produced by the backbone. SOFI outperforms existing methods on the \textit {Google Street View}, \textit {Horizon Line in the Wild}, and \textit {Holicity} datasets while keeping a competitive inference speed.
Related papers
- LiFCal: Online Light Field Camera Calibration via Bundle Adjustment [38.2887165481751]
LiFCal is an online calibration pipeline for MLA-based light field cameras.
It accurately determines model parameters from a moving camera sequence without precise calibration targets.
It can be applied in a target-free scene, and it is implemented online in a complete and continuous pipeline.
arXiv Detail & Related papers (2024-08-21T15:04:49Z) - Self-Supervised Pre-Training for Table Structure Recognition Transformer [25.04573593082671]
We propose a self-supervised pre-training (SSP) method for table structure recognition transformers.
We discover that the performance gap between the linear projection transformer and the hybrid CNN-transformer can be mitigated by SSP of the visual encoder in the TSR model.
arXiv Detail & Related papers (2024-02-23T19:34:06Z) - Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval [68.61855682218298]
Cross-modal retrieval methods employ two-stream encoders with different architectures for images and texts.
Inspired by recent advances of Transformers in vision tasks, we propose to unify the encoder architectures with Transformers for both modalities.
We design a cross-modal retrieval framework purely based on two-stream Transformers, dubbed textbfHierarchical Alignment Transformers (HAT), which consists of an image Transformer, a text Transformer, and a hierarchical alignment module.
arXiv Detail & Related papers (2023-08-08T15:43:59Z) - Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based
Transformer Network for Remote Sensing Image Super-Resolution [13.894645293832044]
Transformer-based models have shown competitive performance in remote sensing image super-resolution (RSISR)
We propose a novel transformer architecture called Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) for RSISR.
Our proposed model effectively enhances global cognition and understanding of the entire image, facilitating efficient integration of features cross-stages.
arXiv Detail & Related papers (2023-07-06T13:19:06Z) - Xformer: Hybrid X-Shaped Transformer for Image Denoising [114.37510775636811]
We present a hybrid X-shaped vision Transformer, named Xformer, which performs notably on image denoising tasks.
Xformer achieves state-of-the-art performance on the synthetic and real-world image denoising tasks.
arXiv Detail & Related papers (2023-03-11T16:32:09Z) - CTRL-C: Camera calibration TRansformer with Line-Classification [22.092637979495358]
We propose Camera calibration TRansformer with Line-Classification (CTRL-C), an end-to-end neural network-based approach to single image camera calibration.
Our experiments demonstrate that benchmark-C outperforms the previous state-of-the-art methods on the Google Street View and SUN360 datasets.
arXiv Detail & Related papers (2021-09-06T06:30:38Z) - XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
arXiv Detail & Related papers (2021-06-17T17:33:35Z) - TransCamP: Graph Transformer for 6-DoF Camera Pose Estimation [77.09542018140823]
We propose a neural network approach with a graph transformer backbone, namely TransCamP, to address the camera relocalization problem.
TransCamP effectively fuses the image features, camera pose information and inter-frame relative camera motions into encoded graph attributes.
arXiv Detail & Related papers (2021-05-28T19:08:43Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD)
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - Infrastructure-based Multi-Camera Calibration using Radial Projections [117.22654577367246]
Pattern-based calibration techniques can be used to calibrate the intrinsics of the cameras individually.
Infrastucture-based calibration techniques are able to estimate the extrinsics using 3D maps pre-built via SLAM or Structure-from-Motion.
We propose to fully calibrate a multi-camera system from scratch using an infrastructure-based approach.
arXiv Detail & Related papers (2020-07-30T09:21:04Z) - Multi-camera calibration with pattern rigs, including for non-overlapping cameras: CALICO [1.8874301050354767]
This paper describes CALICO, a method for multi-camera calibration suitable for challenging contexts.
CalICO is a pattern-based approach, where the multi-calibration problem is formulated using rigidity constraints between patterns and cameras.
arXiv Detail & Related papers (2019-03-15T21:35:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.