Estimating Extreme 3D Image Rotation with Transformer Cross-Attention
- URL: http://arxiv.org/abs/2303.02615v2
- Date: Fri, 8 Mar 2024 19:29:10 GMT
- Title: Estimating Extreme 3D Image Rotation with Transformer Cross-Attention
- Authors: Shay Dekel, Yosi Keller, Martin Cadik
- Abstract summary: We propose a cross-attention-based approach that utilizes CNN feature maps and a Transformer-Encoder to compute the cross-attention between the activation maps of the image pairs.
It is experimentally shown to outperform contemporary state-of-the-art schemes when applied to commonly used image rotation datasets and benchmarks.
- Score: 13.82735766201496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The estimation of large and extreme image rotation plays a key role
in multiple computer vision domains where the rotated images are related by a
limited or non-overlapping field of view. Contemporary approaches apply
convolutional neural networks to compute a 4D correlation volume that is used
to estimate the relative rotation between image pairs. In this work, we propose
a cross-attention-based approach that uses CNN feature maps and a
Transformer-Encoder to compute the cross-attention between the activation maps
of the image pairs, which is shown to be an improved equivalent of the 4D
correlation volume used in previous works. In the suggested approach, higher
attention scores are associated with image regions that encode visual cues of
rotation. Our approach is end-to-end trainable and optimizes a simple
regression loss. It is experimentally shown to outperform contemporary
state-of-the-art schemes on commonly used image rotation datasets and
benchmarks, establishing a new state-of-the-art accuracy on these datasets. We
make our code publicly available.
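A minimal sketch of the described pipeline, assuming a small convolutional backbone, a single nn.MultiheadAttention layer for the cross-attention, and a quaternion regression head; the layer sizes, pooling, and rotation parametrization are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch of cross-attention rotation regression (not the authors' exact model).
import torch
import torch.nn as nn

class CrossAttnRotationNet(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Assumed CNN backbone: any extractor producing (B, dim, H, W) feature maps.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Cross-attention: tokens of image A attend to tokens of image B.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 4)  # assumed quaternion output

    def tokens(self, img):
        f = self.backbone(img)                   # (B, C, H, W)
        return f.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence

    def forward(self, img_a, img_b):
        ta, tb = self.tokens(img_a), self.tokens(img_b)
        # All-pairs attention scores play the role of a 4D correlation volume.
        attended, _ = self.cross_attn(query=ta, key=tb, value=tb)
        q = self.head(attended.mean(dim=1))      # pool tokens, regress rotation
        return q / q.norm(dim=-1, keepdim=True)  # unit quaternion

# Usage: a simple regression loss between predicted and ground-truth quaternions.
model = CrossAttnRotationNet()
a, b = torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128)
q_gt = torch.randn(2, 4); q_gt = q_gt / q_gt.norm(dim=-1, keepdim=True)
loss = (model(a, b) - q_gt).pow(2).mean()
```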
Related papers
- 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction [50.07071392673984]
Existing methods learn 3D rotations parametrized in the spatial domain using angles or quaternions.
We propose a frequency-domain approach that directly predicts Wigner-D coefficients for 3D rotation regression.
Our method achieves state-of-the-art results on benchmarks such as ModelNet10-SO(3) and PASCAL3D+.
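As an illustrative formulation (assumed, not quoted from that paper): the network predicts the Wigner-D matrices, the irreducible representations of SO(3), up to some band limit L, and a regression loss compares them with those of the ground-truth rotation.

```latex
% Assumed illustration of frequency-domain rotation regression:
% \hat{D}^{\ell}(x) are the predicted Wigner-D coefficients for input x,
% D^{\ell}(R^{*}) those of the ground-truth rotation R^{*}.
\mathcal{L}(x) = \sum_{\ell=0}^{L}
  \left\| \hat{D}^{\ell}(x) - D^{\ell}\!\left(R^{*}\right) \right\|_{F}^{2}
```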
arXiv Detail & Related papers (2024-11-01T12:50:38Z)
- Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction
4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an extremely ill-posed inverse problem.
Existing approaches assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images.
We propose to perform a 4D time-space reconstruction using a distributed implicit neural representation network that is trained using a novel distributed training algorithm.
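A minimal sketch of the core ingredient, a coordinate MLP that maps a space-time point to an attenuation value; the Fourier-feature encoding and layer sizes are assumptions, and the paper's distributed training algorithm is not reproduced here.

```python
# Hypothetical coordinate-MLP sketch for 4D (x, y, z, t) reconstruction.
import torch
import torch.nn as nn

class SpaceTimeINR(nn.Module):
    def __init__(self, hidden=256, n_freqs=8):
        super().__init__()
        self.n_freqs = n_freqs
        in_dim = 4 * 2 * n_freqs  # sin/cos Fourier features of (x, y, z, t)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # attenuation at the queried point
        )

    def forward(self, coords):  # coords: (N, 4) in [-1, 1]
        freqs = 2.0 ** torch.arange(self.n_freqs, device=coords.device,
                                    dtype=coords.dtype)
        ang = coords[..., None] * freqs               # (N, 4, n_freqs)
        enc = torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(1)
        return self.mlp(enc)

# In practice the network would be supervised by comparing simulated X-ray
# projections of its output against measurements (CT forward model omitted).
pts = torch.rand(1024, 4) * 2 - 1
density = SpaceTimeINR()(pts)                         # (1024, 1)
```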
arXiv Detail & Related papers (2024-04-29T19:41:51Z)
- Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers [50.576354045312115]
Direct image-to-graph transformation is a challenging task that solves object detection and relationship prediction in a single model.
We introduce a set of methods enabling cross-domain and cross-dimension transfer learning for image-to-graph transformers.
We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we pretrain our models on 2D satellite images before applying them to vastly different target domains in 2D and 3D.
arXiv Detail & Related papers (2024-03-11T10:48:56Z)
- Plug-and-Play Regularization on Magnitude with Deep Priors for 3D Near-Field MIMO Imaging [0.0]
Near-field radar imaging systems are used in a wide range of applications such as concealed weapon detection and medical diagnosis.
We consider the problem of reconstructing the three-dimensional (3D) complex-valued reflectivity by enforcing regularization on its magnitude.
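A hedged sketch of the plug-and-play idea as described: within an iterative solver, a learned denoiser acts on the magnitude of the complex reflectivity while the phase is carried through unchanged. The operators, denoiser, and step size below are placeholders, not the paper's implementation.

```python
# Hypothetical plug-and-play iteration regularizing only the magnitude
# of a complex-valued reflectivity x (placeholder operators throughout).
import torch

def pnp_magnitude_step(x, y, forward_op, adjoint_op, denoiser, step=0.5):
    """One gradient step on the data term, then a prior step on |x|."""
    # Data fidelity: gradient of 0.5 * ||A x - y||^2 for a linear A.
    grad = adjoint_op(forward_op(x) - y)
    x = x - step * grad
    # Prior: denoise the magnitude, keep the phase.
    mag, phase = x.abs(), torch.angle(x)
    mag = denoiser(mag)                      # plugged-in deep prior on |x|
    return mag * torch.exp(1j * phase)

# Example with trivial placeholder operators:
A = lambda x: x; At = lambda r: r; D = lambda m: m.clamp(min=0)
x = torch.randn(8, 8, dtype=torch.cfloat)
y = A(x) + 0.1 * torch.randn(8, 8, dtype=torch.cfloat)
x = pnp_magnitude_step(x, y, A, At, D)
```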
arXiv Detail & Related papers (2023-12-26T12:25:09Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
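A minimal sketch of the hypernetwork idea, assuming a conditioning code derived from the input image generates the weights of a small SDF MLP; the dimensions and the two-layer SDF net are deliberate simplifications, not the paper's architecture.

```python
# Hypothetical hypernetwork sketch: an image code z generates the weights
# of a tiny SDF network evaluated at 3D query points.
import torch
import torch.nn as nn

class HyperSDF(nn.Module):
    def __init__(self, z_dim=128, hidden=64):
        super().__init__()
        self.hidden = hidden
        # Hypernetwork: maps the image code to all SDF-MLP parameters.
        n_params = 3 * hidden + hidden + hidden + 1   # W1, b1, W2, b2
        self.hyper = nn.Linear(z_dim, n_params)

    def forward(self, z, pts):                        # z: (z_dim,), pts: (N, 3)
        h = self.hidden
        p = self.hyper(z)
        W1 = p[: 3 * h].view(h, 3)
        b1 = p[3 * h : 4 * h]
        W2 = p[4 * h : 5 * h].view(1, h)
        b2 = p[5 * h :]
        # Generated two-layer MLP: signed distance for each query point.
        return torch.relu(pts @ W1.T + b1) @ W2.T + b2

sdf = HyperSDF()
dist = sdf(torch.randn(128), torch.rand(1000, 3))     # (1000, 1)
```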
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- Explicit Correspondence Matching for Generalizable Neural Radiance Fields [49.49773108695526]
We present a new NeRF method that is able to generalize to new unseen scenarios and perform novel view synthesis with as few as two source views.
The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views.
Our method achieves state-of-the-art results on different evaluation settings, with the experiments showing a strong correlation between our learned cosine feature similarity and volume density.
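A hedged sketch of the matching term: features from two views, sampled at the 2D projections of the same 3D points, are compared with cosine similarity. The camera projection machinery is stubbed out with precomputed pixel coordinates.

```python
# Hypothetical sketch: cosine similarity between features of two views
# sampled at the 2D projections of shared 3D points.
import torch
import torch.nn.functional as F

def matching_score(feat_a, feat_b, uv_a, uv_b):
    """feat_*: (1, C, H, W) feature maps; uv_*: (1, N, 2) projections in [-1, 1]."""
    # grid_sample expects a (1, N, 1, 2) grid; output is (1, C, N, 1).
    fa = F.grid_sample(feat_a, uv_a.unsqueeze(2), align_corners=True).squeeze(-1)
    fb = F.grid_sample(feat_b, uv_b.unsqueeze(2), align_corners=True).squeeze(-1)
    # Cosine similarity per 3D point, along the channel dimension.
    return F.cosine_similarity(fa, fb, dim=1)        # (1, N)

feat_a, feat_b = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
uv = torch.rand(1, 100, 2) * 2 - 1
sim = matching_score(feat_a, feat_b, uv, uv)         # higher => likely surface
```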
arXiv Detail & Related papers (2023-04-24T17:46:01Z)
- Transformer-based Image Generation from Scene Graphs [11.443097632746763]
Graph-structured scene descriptions can be efficiently used in generative models to control the composition of the generated image.
Previous approaches are based on the combination of graph convolutional networks and adversarial methods for layout prediction and image generation.
We show how employing multi-head attention to encode the graph information can improve the quality of the sampled data.
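A minimal sketch of encoding scene-graph information with multi-head self-attention, assuming node (object) and edge (relationship) embeddings are concatenated into one token sequence; the vocabulary sizes are placeholders and the image generator that consumes the encoding is omitted.

```python
# Hypothetical sketch: multi-head self-attention over scene-graph tokens.
import torch
import torch.nn as nn

dim, heads = 256, 8
obj_emb = nn.Embedding(200, dim)      # assumed object-category vocabulary
rel_emb = nn.Embedding(50, dim)       # assumed relationship vocabulary
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, heads, batch_first=True), num_layers=4
)

# Toy graph: 3 objects, 2 relationships, flattened into one token sequence.
objs = obj_emb(torch.tensor([[5, 17, 42]]))          # (1, 3, dim)
rels = rel_emb(torch.tensor([[3, 7]]))               # (1, 2, dim)
tokens = torch.cat([objs, rels], dim=1)              # (1, 5, dim)
graph_code = encoder(tokens)                         # attention mixes graph info
# graph_code would then condition an image decoder (not shown).
```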
arXiv Detail & Related papers (2023-03-08T14:54:51Z)
- Extreme Rotation Estimation using Dense Correlation Volumes [73.35119461422153]
We present a technique for estimating the relative 3D rotation of an RGB image pair in an extreme setting.
We observe that, even when images do not overlap, there may be rich hidden cues as to their geometric relationship.
We propose a network design that can automatically learn such implicit cues by comparing all pairs of points between the two input images.
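A minimal sketch of the all-pairs comparison: a dot product between every feature of image A and every feature of image B yields a 4D correlation volume. Shapes and the scaling factor are illustrative.

```python
# Hypothetical sketch of a 4D correlation volume over all point pairs.
import torch

fa = torch.randn(1, 128, 16, 16)   # features of image A: (B, C, H, W)
fb = torch.randn(1, 128, 16, 16)   # features of image B

# Dot product between every location (i, j) in A and every (k, l) in B.
corr = torch.einsum("bcij,bckl->bijkl", fa, fb)    # (B, H, W, H, W)
corr = corr / fa.shape[1] ** 0.5                   # scale by sqrt(C)

# Each 4D entry scores how well two locations match across the pair;
# a downstream network can pool this volume into a rotation estimate.
print(corr.shape)   # torch.Size([1, 16, 16, 16, 16])
```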
arXiv Detail & Related papers (2021-04-28T02:00:04Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
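A hedged sketch of a displacement-invariant cost: the same 2D matching network is applied at every candidate disparity, so no 4D feature volume is ever materialized. The shift-and-concatenate scheme and the tiny conv net are assumptions for illustration.

```python
# Hypothetical sketch: per-disparity 2D cost computation with shared weights.
import torch
import torch.nn as nn

match_net = nn.Sequential(              # same 2D network reused at every disparity
    nn.Conv2d(2 * 32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

def cost_volume(feat_l, feat_r, max_disp=8):
    """feat_*: (B, C, H, W). Returns per-pixel costs (B, max_disp, H, W)."""
    costs = []
    for d in range(max_disp):
        shifted = torch.roll(feat_r, shifts=d, dims=3)   # crude disparity shift
        pair = torch.cat([feat_l, shifted], dim=1)
        costs.append(match_net(pair))    # purely 2D: no 4D volume is built
    return torch.cat(costs, dim=1)

costs = cost_volume(torch.randn(1, 32, 24, 48), torch.randn(1, 32, 24, 48))
```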
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
- Fast Distance-based Anomaly Detection in Images Using an Inception-like Autoencoder [16.157879279661362]
A convolutional autoencoder (CAE) is trained to extract a low-dimensional representation of the images.
We employ a distance-based anomaly detector in the low-dimensional space of the learned image representation.
We find that our approach results in improved predictive performance.
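A minimal sketch of the detector stage, assuming encodings produced by a trained CAE: the anomaly score of a test image is its mean distance to the k nearest training encodings. The value of k, the Euclidean distance, and the threshold are illustrative choices.

```python
# Hypothetical sketch: k-NN distance anomaly scores in a learned latent space.
import torch

def anomaly_scores(train_z, test_z, k=5):
    """train_z: (N, D) encodings of normal images; test_z: (M, D)."""
    d = torch.cdist(test_z, train_z)            # (M, N) pairwise Euclidean
    knn, _ = d.topk(k, dim=1, largest=False)    # k smallest distances per item
    return knn.mean(dim=1)                      # higher score => more anomalous

train_z = torch.randn(1000, 32)   # encodings from the trained autoencoder
test_z = torch.randn(10, 32)
scores = anomaly_scores(train_z, test_z)
flagged = scores > scores.mean() + 2 * scores.std()   # simple threshold
```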
arXiv Detail & Related papers (2020-03-12T16:10:53Z)