RoMa: Robust Dense Feature Matching
- URL: http://arxiv.org/abs/2305.15404v2
- Date: Mon, 11 Dec 2023 13:20:50 GMT
- Title: RoMa: Robust Dense Feature Matching
- Authors: Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, Michael Felsberg
- Abstract summary: Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene.
We propose such a model, leveraging frozen pretrained features from the foundation model DINOv2.
To further improve robustness, we propose a tailored transformer match decoder.
- Score: 17.015362716393216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature matching is an important computer vision task that involves
estimating correspondences between two images of a 3D scene, and dense methods
estimate all such correspondences. The aim is to learn a robust model, i.e., a
model able to match under challenging real-world changes. In this work, we
propose such a model, leveraging frozen pretrained features from the foundation
model DINOv2. Although these features are significantly more robust than local
features trained from scratch, they are inherently coarse. We therefore combine
them with specialized ConvNet fine features, creating a precisely localizable
feature pyramid. To further improve robustness, we propose a tailored
transformer match decoder that predicts anchor probabilities, which enables it
to express multimodality. Finally, we propose an improved loss formulation
through regression-by-classification with subsequent robust regression. We
conduct a comprehensive set of experiments that show that our method, RoMa,
achieves significant gains, setting a new state-of-the-art. In particular, we
achieve a 36% improvement on the extremely challenging WxBS benchmark. Code is
provided at https://github.com/Parskatt/RoMa
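The two architectural ideas the abstract names, a coarse/fine feature pyramid and match prediction as classification over anchors, can be illustrated with a short sketch. This is a minimal illustration, not the released RoMa code: the frozen DINOv2 encoder is stubbed out with a frozen convolution, and all module sizes and shapes are assumptions.
```python
# Minimal sketch (not the authors' code) of a coarse/fine feature pyramid
# and regression-by-classification over anchor locations.
import torch
import torch.nn as nn

class CoarseFinePyramid(nn.Module):
    def __init__(self, coarse_dim=768, fine_dim=64):
        super().__init__()
        # Stand-in for frozen DINOv2 patch features (robust but coarse).
        self.frozen_coarse = nn.Conv2d(3, coarse_dim, kernel_size=14, stride=14)
        for p in self.frozen_coarse.parameters():
            p.requires_grad = False
        # Specialized ConvNet for precisely localizable fine features.
        self.fine = nn.Sequential(
            nn.Conv2d(3, fine_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(fine_dim, fine_dim, 3, padding=1),
        )

    def forward(self, img):
        return self.frozen_coarse(img), self.fine(img)

def anchor_probabilities(coarse_a, coarse_b):
    """For each coarse location in image A, predict a probability
    distribution over a grid of anchor locations in image B."""
    B, C, H, W = coarse_a.shape
    fa = coarse_a.flatten(2).transpose(1, 2)   # (B, HW, C)
    fb = coarse_b.flatten(2)                   # (B, C, HW)
    logits = fa @ fb / C ** 0.5                # (B, HW, HW) similarities
    return logits.softmax(dim=-1)              # anchor probabilities

imgs = torch.randn(2, 3, 224, 224)
net = CoarseFinePyramid()
ca, _ = net(imgs)
cb, _ = net(imgs.flip(-1))
probs = anchor_probabilities(ca, cb)           # match distribution per location
```
Classifying over anchors lets the predicted distribution stay multimodal, whereas direct coordinate regression would be forced to average competing hypotheses; this is the property the abstract points to.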
Related papers
- Grounding Image Matching in 3D with MASt3R [8.14650201701567]
We propose to cast matching as a 3D task with DUSt3R, a powerful 3D reconstruction framework based on Transformers.
We propose to augment the DUSt3R network with a new head that outputs dense local features, trained with an additional matching loss.
Our approach, coined MASt3R, significantly outperforms the state of the art on multiple matching tasks.
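As a rough illustration of training a dense local-feature head with a matching loss, the sketch below uses an InfoNCE-style contrastive objective over descriptors at known corresponding pixels. The actual MASt3R head and objective may differ, and the temperature value is an assumption.
```python
# Hedged sketch of a dense-descriptor matching loss, not MASt3R's exact loss.
import torch
import torch.nn.functional as F

def matching_loss(desc_a, desc_b, corr):
    """desc_a, desc_b: (N, D) descriptors at corresponding pixels;
    corr[i] pairs desc_a[i] with desc_b[corr[i]]. True pairs are pulled
    together against all other candidates."""
    desc_a = F.normalize(desc_a, dim=-1)
    desc_b = F.normalize(desc_b, dim=-1)
    logits = desc_a @ desc_b.t() / 0.07   # temperature is an assumption
    return F.cross_entropy(logits, corr)

a, b = torch.randn(512, 128), torch.randn(512, 128)
loss = matching_loss(a, b, torch.arange(512))
```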
arXiv Detail & Related papers (2024-06-14T06:46:30Z) - DiffComplete: Diffusion-based Generative 3D Shape Completion [114.43353365917015]
We introduce a new diffusion-based approach for shape completion on 3D range scans.
We strike a balance between realism, multi-modality, and high fidelity.
DiffComplete sets a new SOTA performance on two large-scale 3D shape completion benchmarks.
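To make "diffusion-based shape completion" concrete, here is a toy conditional DDPM reverse step on an occupancy grid, where the denoiser sees the partial scan via channel concatenation. The grid size, noise schedule, and stand-in denoiser are illustrative assumptions, not DiffComplete's architecture.
```python
# Toy conditional diffusion reverse step for shape completion (assumptions
# throughout); a real model would use a 3D UNet, not a single conv.
import torch
import torch.nn as nn

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

denoiser = nn.Conv3d(2, 1, 3, padding=1)   # stand-in for a real denoiser

@torch.no_grad()
def reverse_step(x_t, partial, t):
    """One DDPM reverse step, conditioned on the partial scan by concat."""
    eps = denoiser(torch.cat([x_t, partial], dim=1))
    mean = (x_t - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
    if t == 0:
        return mean
    return mean + betas[t].sqrt() * torch.randn_like(x_t)

partial = torch.zeros(1, 1, 16, 16, 16)    # toy occupancy grid of a range scan
x = torch.randn(1, 1, 16, 16, 16)
for t in reversed(range(T)):
    x = reverse_step(x, partial, t)
```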
arXiv Detail & Related papers (2023-06-28T16:07:36Z)
- Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
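The unification is easy to state in code: once a dense correspondence field is predicted, optical flow, stereo disparity, and depth are just different readouts of it. The camera constants below are illustrative assumptions.
```python
# One dense correspondence field, three readouts (sketch, not the paper's code).
import torch

def readouts(corr, grid, focal=720.0, baseline=0.54):
    """corr: (H, W, 2) matched coordinates in the other image;
    grid: (H, W, 2) pixel coordinates in this image."""
    flow = corr - grid                    # optical flow: full 2D displacement
    disparity = (grid - corr)[..., 0]     # stereo: horizontal shift only
    depth = focal * baseline / disparity.clamp(min=1e-6)
    return flow, disparity, depth

H, W = 4, 5
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).float()
corr = grid - torch.tensor([2.0, 0.0])    # toy: uniform 2-pixel left shift
flow, disp, depth = readouts(corr, grid)
```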
arXiv Detail & Related papers (2022-11-10T18:59:54Z)
- Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and classify them.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
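A minimal sketch of the described design, assuming nothing beyond the summary: a part segmentation network whose pooled part evidence feeds a tiny classifier, with both heads trainable end-to-end.
```python
# Sketch of a part-based classifier; architecture details are assumptions.
import torch
import torch.nn as nn

class PartBasedClassifier(nn.Module):
    def __init__(self, n_parts=8, n_classes=10):
        super().__init__()
        self.segmenter = nn.Sequential(          # per-pixel part predictions
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_parts, 1),
        )
        self.classifier = nn.Linear(n_parts, n_classes)   # "tiny" head

    def forward(self, x):
        part_logits = self.segmenter(x)                    # (B, P, H, W)
        part_masses = part_logits.softmax(1).mean((2, 3))  # pooled part evidence
        return self.classifier(part_masses), part_logits

model = PartBasedClassifier()
class_logits, seg = model(torch.randn(2, 3, 64, 64))
# Training would combine a classification loss on class_logits with a
# segmentation loss on seg, so both objectives shape the shared features.
```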
arXiv Detail & Related papers (2022-09-15T15:41:47Z)
- DFC: Deep Feature Consistency for Robust Point Cloud Registration [0.4724825031148411]
We present a novel learning-based alignment network for complex alignment scenes.
We validate our approach on the 3DMatch dataset and the KITTI odometry dataset.
arXiv Detail & Related papers (2021-11-15T08:27:21Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach that efficiently tackles this challenging task by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to achieve accuracy as good as models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
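The two modules can be sketched directly from their descriptions; the shapes and the mean-pooling used to summarize the input are assumptions, not the paper's exact design.
```python
# Hedged sketches of DQInit (input-conditioned query init) and QAMem
# (a separate memory value table per query instead of a shared one).
import torch
import torch.nn as nn

class DynamicQueries(nn.Module):
    def __init__(self, n_queries=68, dim=128):
        super().__init__()
        self.to_queries = nn.Linear(dim, n_queries * dim)
        self.n_queries, self.dim = n_queries, dim

    def forward(self, feats):                   # feats: (B, HW, dim)
        pooled = feats.mean(1)                  # summarize the input
        return self.to_queries(pooled).view(-1, self.n_queries, self.dim)

class QueryAwareMemory(nn.Module):
    def __init__(self, n_queries=68, n_tokens=256, dim=128):
        super().__init__()
        # One learned value table per query, not a shared one.
        self.values = nn.Parameter(torch.randn(n_queries, n_tokens, dim))

    def forward(self, attn):                    # attn: (B, Q, n_tokens)
        return torch.einsum("bqt,qtd->bqd", attn, self.values)

feats = torch.randn(2, 256, 128)
queries = DynamicQueries()(feats)               # (2, 68, 128)
attn = torch.softmax(torch.randn(2, 68, 256), dim=-1)
out = QueryAwareMemory()(attn)                  # (2, 68, 128)
```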
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
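The first variant, pairwise self-attention over detector features, amounts to a standard full self-attention layer applied to whatever feature set the detector produces; the sketch below abstracts the detector away, and the shapes are assumptions.
```python
# Pairwise self-attention over pooled detector features (illustrative only).
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
feats = torch.randn(2, 500, 128)        # e.g. 500 voxel/point/BEV features
ctx, _ = attn(feats, feats, feats)      # every feature attends to every other
feats = feats + ctx                     # residual context-aware refinement
```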
arXiv Detail & Related papers (2021-01-07T18:30:32Z)
- Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference [119.19779637025444]
Deep networks have recently been suggested to face a trade-off between accuracy (on clean natural images) and robustness (on adversarially perturbed images).
This paper studies multi-exit networks associated with input-adaptive inference, showing their strong promise in achieving a "sweet point" in co-optimizing model accuracy, robustness, and efficiency.
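A toy version of input-adaptive, multi-exit inference, assuming a simple confidence-threshold exit rule (the paper's exact exit policy may differ): easy inputs leave at an early confident head, hard inputs pay for the full depth.
```python
# Toy multi-exit network with confidence-based early exiting (assumptions only).
import torch
import torch.nn as nn

class MultiExitNet(nn.Module):
    def __init__(self, dim=32, n_classes=10, n_blocks=3, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_blocks))
        self.exits = nn.ModuleList(
            nn.Linear(dim, n_classes) for _ in range(n_blocks))
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):                      # x: (dim,) single example
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            probs = exit_head(x).softmax(-1)
            if probs.max() >= self.threshold:  # confident enough: stop early
                return probs
        return probs                           # fall through to the last exit

out = MultiExitNet()(torch.randn(32))
```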
arXiv Detail & Related papers (2020-02-24T00:40:22Z)