Steerers: A framework for rotation equivariant keypoint descriptors
- URL: http://arxiv.org/abs/2312.02152v2
- Date: Tue, 2 Apr 2024 09:40:33 GMT
- Title: Steerers: A framework for rotation equivariant keypoint descriptors
- Authors: Georg Bökman, Johan Edstedt, Michael Felsberg, Fredrik Kahl
- Abstract summary: Keypoint descriptions that are discriminative and matchable over large changes in viewpoint are vital for 3D reconstruction.
We learn a linear transform in description space that encodes rotations of the input image.
We obtain state-of-the-art results on the rotation invariant image matching benchmarks AIMS and Roto-360.
- Score: 26.31402935889126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image keypoint descriptions that are discriminative and matchable over large changes in viewpoint are vital for 3D reconstruction. However, descriptions output by learned descriptors are typically not robust to camera rotation. While they can be made more robust by, e.g., data augmentation, this degrades performance on upright images. Another approach is test-time augmentation, which incurs a significant increase in runtime. Instead, we learn a linear transform in description space that encodes rotations of the input image. We call this linear transform a steerer since it allows us to transform the descriptions as if the image was rotated. From representation theory, we know all possible steerers for the rotation group. Steerers can be optimized (A) given a fixed descriptor, (B) jointly with a descriptor or (C) we can optimize a descriptor given a fixed steerer. We perform experiments in these three settings and obtain state-of-the-art results on the rotation invariant image matching benchmarks AIMS and Roto-360. We publish code and model weights at https://github.com/georg-bn/rotation-steerers.
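To make the steerer idea concrete, here is a minimal numpy sketch (my own illustration, not the published code): for the group $C_4$ of quarter-turn rotations, any matrix $S$ with $S^4 = I$ is a candidate steerer, and matching against a rotated image reduces to trying the four steered copies of each description instead of re-extracting features.

```python
import numpy as np

def make_c4_steerer(dim: int) -> np.ndarray:
    """Block-diagonal steerer S with S^4 = I.

    Each 2x2 block is a rotation by 90 degrees, so applying S to a
    description emulates rotating the input image by a quarter turn.
    (Illustrative choice; actual steerers are learned or derived
    per descriptor, as in settings (A)-(C) of the abstract.)
    """
    assert dim % 2 == 0
    block = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation
    S = np.zeros((dim, dim))
    for i in range(0, dim, 2):
        S[i:i + 2, i:i + 2] = block
    return S

def steered_match_scores(desc_a, desc_b, S):
    """Best cosine similarity over the four steered copies of desc_a.

    desc_a, desc_b: (N, dim) L2-normalised descriptions of two images.
    Emulates matching image A against a rotated image B without
    re-running the descriptor, at the cost of three extra matmuls.
    """
    best = -np.inf * np.ones((desc_a.shape[0], desc_b.shape[0]))
    d = desc_a
    for _ in range(4):          # k = 0, 1, 2, 3 quarter turns
        best = np.maximum(best, d @ desc_b.T)
        d = d @ S.T             # steer: emulate one more 90-degree turn
    return best

S = make_c4_steerer(8)
assert np.allclose(np.linalg.matrix_power(S, 4), np.eye(8))
```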
Related papers
- Higher order PCA-like rotation-invariant features for detailed shape descriptors modulo rotation [0.2320648715016106]
PCA can be used for rotation-invariant features, describing a shape by its covariance $p_{ab}=E[(x_a-E[x_a])(x_b-E[x_b])]$, i.e. approximating the shape by an ellipsoid. Real shapes are usually much more complicated, hence an extension to higher-order moments is proposed, e.g. $p_{abc}=E[(x_a-E[x_a])(x_b-E[x_b])(x_c-E[x_c])]$.
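A hedged sketch of the second-order case: the eigenvalues of the covariance matrix $p_{ab}$ are unchanged by rotating the shape, which is what makes them usable as invariant features.

```python
import numpy as np

def second_moment_invariants(points: np.ndarray) -> np.ndarray:
    """Rotation-invariant shape features from the covariance p_ab.

    points: (N, d) array of shape samples x.
    The eigenvalues of E[(x_a - E[x_a])(x_b - E[x_b])] describe the
    best-fitting ellipsoid and do not change under rotation.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)  # p_ab
    return np.sort(np.linalg.eigvalsh(cov))   # rotation-invariant spectrum

rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.5])
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
assert np.allclose(second_moment_invariants(pts),
                   second_moment_invariants(pts @ R.T), atol=1e-8)
```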
arXiv Detail & Related papers (2026-01-06T15:24:20Z) - Eff-GRot: Efficient and Generalizable Rotation Estimation with Transformers [35.57122848273358]
We introduce Eff-GRot, an approach for efficient and generalizable rotation estimation from RGB images. Given a query image and a set of reference images with known orientations, our method directly predicts the object's rotation in a single forward pass.
arXiv Detail & Related papers (2025-12-21T15:57:13Z) - Selective Rotary Position Embedding [84.22998043041198]
We introduce Selective RoPE, an input-dependent rotary embedding mechanism. We show that softmax attention already performs a hidden form of these rotations on query-key pairs. We validate our method by equipping gated transformers with Selective RoPE, demonstrating that its input-dependent rotations improve performance in language modeling.
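For context, a minimal sketch of standard, input-independent RoPE, the mechanism the abstract says Selective RoPE makes input-dependent (my own illustration): each 2D pair of query/key channels is rotated by an angle proportional to the token position, so dot products depend only on relative position.

```python
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0):
    """Rotary position embedding.

    x: (seq, dim) queries or keys, dim even. Rotates channel pairs
    (x[2i], x[2i+1]) by position * base**(-2i/dim). Standard RoPE uses
    these fixed position-only angles; Selective RoPE (per the abstract)
    makes the rotation input-dependent instead.
    """
    seq, dim = x.shape
    freqs = base ** (-np.arange(0, dim, 2) / dim)  # (dim/2,)
    angles = positions[:, None] * freqs[None, :]   # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Relative-position property: <rope(q, m), rope(k, n)> depends on m - n only.
q = np.random.default_rng(1).normal(size=(1, 8))
k = np.random.default_rng(2).normal(size=(1, 8))
p = np.arange(1)
assert np.allclose(rope(q, p + 5) @ rope(k, p + 3).T,
                   rope(q, p + 9) @ rope(k, p + 7).T)
```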
arXiv Detail & Related papers (2025-11-21T16:50:00Z) - RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation [59.830657530592255]
RotBench evaluates whether Multimodal Large Language Models (MLLMs) can accurately identify the orientation of input images rotated 0°, 90°, 180°, and 270°. The task demands robust visual reasoning to detect rotational cues and contextualize spatial relationships within images, regardless of their orientation. We show that several state-of-the-art open and proprietary MLLMs, including GPT-5, o3, and Gemini-2.5-Pro, do not reliably identify rotation in input images.
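A sketch of the evaluation protocol as I read it from the summary; `query_mllm` is a hypothetical stub standing in for whatever model API is actually used.

```python
from PIL import Image

ROTATIONS = [0, 90, 180, 270]  # degrees, counter-clockwise

def query_mllm(image: Image.Image) -> int:
    """Hypothetical stub: ask an MLLM which rotation was applied.

    Stands in for a real API call (e.g. to GPT-5 or Gemini-2.5-Pro);
    must return one of 0, 90, 180, 270.
    """
    raise NotImplementedError

def rotation_accuracy(images):
    """Fraction of (image, rotation) pairs the model identifies correctly."""
    correct = total = 0
    for img in images:
        for deg in ROTATIONS:
            rotated = img.rotate(deg, expand=True)  # PIL rotates CCW
            if query_mllm(rotated) == deg:
                correct += 1
            total += 1
    return correct / total
```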
arXiv Detail & Related papers (2025-08-19T15:58:25Z) - Affine steerers for structured keypoint description [26.31402935889126]
We propose a way to train deep learning-based keypoint descriptors that makes them approximately equivariant for locally affine transformations of the image plane.
We demonstrate the potential of using this control for image matching.
arXiv Detail & Related papers (2024-08-26T11:22:52Z) - Learning with 3D rotations, a hitchhiker's guide to SO(3) [17.802455837461125]
This paper acts as a survey and guide through rotation representations.
By consolidating insights from rotation-based learning, we provide a comprehensive overview of learning functions with rotation representations.
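One representation such guides typically recommend is the continuous 6D one: two 3-vectors orthonormalised by Gram-Schmidt. A minimal sketch (my illustration, not code from the paper):

```python
import numpy as np

def rotation_from_6d(v: np.ndarray) -> np.ndarray:
    """Map a 6D vector to a rotation matrix via Gram-Schmidt.

    This continuous representation avoids the discontinuities of Euler
    angles and quaternions, which is why it is often recommended when
    networks must regress rotations. v: (6,) = two stacked 3-vectors.
    """
    a, b = v[:3], v[3:]
    x = a / np.linalg.norm(a)   # first column
    b = b - (x @ b) * x         # remove component along x
    y = b / np.linalg.norm(b)   # second column
    z = np.cross(x, y)          # third column completes a right-handed frame
    return np.stack([x, y, z], axis=1)

R = rotation_from_6d(np.array([1.0, 0.2, -0.3, 0.0, 1.0, 0.5]))
assert np.allclose(R.T @ R, np.eye(3), atol=1e-12)
assert np.isclose(np.linalg.det(R), 1.0)
```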
arXiv Detail & Related papers (2024-04-17T20:37:29Z) - Rotation Invariant Transformer for Recognizing Object in UAVs [66.1564328237299]
We propose a novel rotation-invariant vision transformer (RotTrans) for recognizing targets of interest from UAVs.
RotTrans greatly outperforms the current state of the art, with mAP and Rank1 that are 5.9% and 4.8% higher than the previous best, respectively.
Our solution won first place in the UAV-based person re-identification track of the Multi-Modal Video Reasoning and Analyzing Competition.
arXiv Detail & Related papers (2023-11-05T03:55:08Z) - Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present the Adaptive Rotated Convolution (ARC) module to handle the rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
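A hedged sketch of the core operation (rotating the kernel, not the feature map, before convolving); in ARC the angle is predicted per image by a routing network, which is simplified away here:

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.signal import convolve2d

def rotated_conv2d(image: np.ndarray, kernel: np.ndarray, angle_deg: float):
    """Convolve with a kernel rotated by an image-dependent angle.

    Sketch of the idea behind adaptive rotated convolution: the kernel
    is resampled at the target orientation, so the same weights respond
    to objects at varying orientations. Here the angle is passed in;
    in ARC it is predicted from the input.
    """
    rotated_kernel = rotate(kernel, angle_deg, reshape=False, order=1)
    return convolve2d(image, rotated_kernel, mode="same")

img = np.random.default_rng(0).normal(size=(32, 32))
k = np.zeros((5, 5))
k[2, :] = 1.0                                 # horizontal line kernel
out = rotated_conv2d(img, k, angle_deg=45.0)  # now responds to 45-deg structure
```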
arXiv Detail & Related papers (2023-03-14T11:53:12Z) - PaRot: Patch-Wise Rotation-Invariant Network via Feature Disentanglement and Pose Restoration [16.75367717130046]
State-of-the-art models are not robust to rotations, which remain an unknown prior in real applications.
We introduce a novel Patch-wise Rotation-invariant network (PaRot).
Our disentanglement module extracts high-quality rotation-robust features and the proposed lightweight model achieves competitive results.
arXiv Detail & Related papers (2023-02-06T02:13:51Z) - Rethinking Rotation Invariance with Point Cloud Registration [18.829454172955202]
We propose an effective framework for rotation invariance learning via three sequential stages, namely rotation-invariant shape encoding, aligned feature integration, and deep feature registration.
Experimental results on 3D shape classification, part segmentation, and retrieval tasks prove the feasibility of our work.
arXiv Detail & Related papers (2022-12-31T08:17:09Z) - Category-Level 6D Object Pose Estimation with Flexible Vector-Based Rotation Representation [51.67545893892129]
We propose a novel 3D graph convolution based pipeline for category-level 6D pose and size estimation from monocular RGB-D images.
We first design an orientation-aware autoencoder with 3D graph convolution for latent feature learning.
Then, to efficiently decode the rotation information from the latent feature, we design a novel flexible vector-based decomposable rotation representation.
arXiv Detail & Related papers (2022-12-09T02:13:43Z) - Orthonormal Convolutions for the Rotation Based Iterative Gaussianization [64.44661342486434]
This paper elaborates an extension of rotation-based iterative Gaussianization (RBIG) that makes Gaussianization of entire images possible.
Previously, its application to images was restricted to small patches or isolated pixels, because the rotation in RBIG is based on principal or independent component analysis.
We present Convolutional RBIG, an extension that alleviates this issue by imposing that the rotation in RBIG is a convolution.
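For reference, one iteration of plain RBIG (marginal Gaussianization followed by a PCA rotation), a minimal sketch assuming the standard formulation; the paper's contribution is to replace the dense rotation with a convolution so that whole images fit.

```python
import numpy as np
from scipy.stats import norm

def rbig_step(x: np.ndarray) -> np.ndarray:
    """One rotation-based iterative Gaussianization (RBIG) step.

    x: (n_samples, n_dims). Step 1: Gaussianize each marginal through
    its empirical CDF. Step 2: rotate with PCA. Iterating pushes the
    data toward a multivariate Gaussian. This is plain RBIG; scaling
    to full images needs the convolutional rotation of the paper.
    """
    n = x.shape[0]
    # Empirical ranks -> uniform in (0, 1) -> standard normal marginals.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    g = norm.ppf((ranks + 0.5) / n)
    # PCA rotation: eigenvectors of the covariance matrix.
    _, vecs = np.linalg.eigh(np.cov(g, rowvar=False))
    return g @ vecs

rng = np.random.default_rng(0)
x = rng.exponential(size=(2000, 2))  # clearly non-Gaussian input
for _ in range(10):
    x = rbig_step(x)
```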
arXiv Detail & Related papers (2022-06-08T12:56:34Z) - Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes [86.2129580231191]
Adjoint Rigid Transform (ART) Network is a neural module which can be integrated with a variety of 3D networks.
ART learns to rotate input shapes to a learned canonical orientation, which is crucial for many tasks.
We will release our code and pre-trained models for further research.
arXiv Detail & Related papers (2021-02-01T20:58:45Z) - Learning Feature Descriptors using Camera Pose Supervision [101.56783569070221]
We propose a novel weakly-supervised framework that can learn feature descriptors solely from relative camera poses between images.
Because we no longer need pixel-level ground-truth correspondences, our framework opens up the possibility of training on much larger and more diverse datasets for better and unbiased descriptors.
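A sketch of how relative pose can supervise matching without pixel-level ground truth: a putative correspondence is penalised by its distance to the epipolar line induced by the pose. This is my instantiation of the idea, assuming known intrinsics `K1`, `K2`; the paper's actual loss may differ.

```python
import numpy as np

def fundamental_from_pose(K1, K2, R, t):
    """F such that x2^T F x1 = 0 for true correspondences.

    (R, t): relative camera pose from view 1 to view 2;
    K1, K2: (3, 3) camera intrinsics.
    """
    tx = np.array([[0, -t[2], t[1]],
                   [t[2], 0, -t[0]],
                   [-t[1], t[0], 0]])  # cross-product matrix [t]_x
    E = tx @ R                         # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_loss(x1, x2, F):
    """Mean distance of matched points x2 to the epipolar lines F @ x1.

    x1, x2: (N, 2) pixel coordinates of putative matches. A small loss
    means the matches are consistent with the relative camera pose,
    which is the only supervision signal available.
    """
    h1 = np.hstack([x1, np.ones((len(x1), 1))])  # homogeneous coordinates
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    lines = h1 @ F.T                             # (N, 3) lines in image 2
    num = np.abs(np.sum(lines * h2, axis=1))
    den = np.linalg.norm(lines[:, :2], axis=1)
    return np.mean(num / den)
```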
arXiv Detail & Related papers (2020-04-28T06:35:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.