RANSAC-Flow: generic two-stage image alignment
- URL: http://arxiv.org/abs/2004.01526v2
- Date: Fri, 17 Jul 2020 13:51:18 GMT
- Title: RANSAC-Flow: generic two-stage image alignment
- Authors: Xi Shen, Fran\c{c}ois Darmon, Alexei A. Efros, Mathieu Aubry
- Abstract summary: We show that a simple unsupervised approach performs surprisingly well across a range of tasks.
Despite its simplicity, our method shows competitive results on a range of tasks and datasets.
- Score: 53.11926395028508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers the generic problem of dense alignment between two
images, whether they be two frames of a video, two widely different views of a
scene, two paintings depicting similar content, etc. Whereas each such task is
typically addressed with a domain-specific solution, we show that a simple
unsupervised approach performs surprisingly well across a range of tasks. Our
main insight is that parametric and non-parametric alignment methods have
complementary strengths. We propose a two-stage process: first, a feature-based
parametric coarse alignment using one or more homographies, followed by
non-parametric fine pixel-wise alignment. Coarse alignment is performed using
RANSAC on off-the-shelf deep features. Fine alignment is learned in an
unsupervised way by a deep network which optimizes a standard structural
similarity metric (SSIM) between the two images, plus cycle-consistency.
Despite its simplicity, our method shows competitive results on a range of
tasks and datasets, including unsupervised optical flow on KITTI, dense
correspondences on HPatches, two-view geometry estimation on YFCC100M,
localization on Aachen Day-Night, and, for the first time, fine alignment of
artworks on the Brueghel dataset. Our code and data are available at
http://imagine.enpc.fr/~shenx/RANSAC-Flow/
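The coarse stage described in the abstract (RANSAC fitting a homography to feature correspondences) can be sketched in plain NumPy. This is an illustrative sketch, not the authors' implementation: the paper matches off-the-shelf deep features, whereas here the putative correspondences `src`/`dst` are assumed to be given, and the helper names (`fit_homography`, `ransac_homography`) are invented for this example.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct Linear Transform: solve for the 3x3 homography H mapping
    src -> dst, where src and dst are (N, 2) point arrays with N >= 4."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two homogeneous constraints on H.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary homogeneous scale

def transfer_error(H, src, dst):
    """Per-point reprojection error (in pixels) of src mapped by H vs. dst."""
    n = len(src)
    p = np.hstack([src, np.ones((n, 1))]) @ H.T
    p = p[:, :2] / p[:, 2:3]  # back from homogeneous coordinates
    return np.linalg.norm(p - dst, axis=1)

def ransac_homography(src, dst, iters=500, thresh=3.0, rng=None):
    """Classic RANSAC loop: repeatedly fit H to a minimal 4-point sample and
    keep the model with the largest consensus set of inliers."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        inliers = transfer_error(H, src, dst) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 4:
        raise RuntimeError("RANSAC failed to find a consensus set")
    # Refit on the full inlier set of the best model (standard refinement).
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

The refit on the best model's full inlier set is the usual RANSAC refinement step. The fine stage of the paper, a deep network trained without supervision on SSIM plus cycle-consistency, operates on the coarsely aligned images and is not covered by this sketch.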
Related papers
- Skeleton-Guided Instance Separation for Fine-Grained Segmentation in Microscopy [23.848474219551818]
  One of the fundamental challenges in microscopy (MS) image analysis is instance segmentation (IS).
  We propose a novel one-stage framework named A2B-IS to address this challenge and enhance the accuracy of IS in MS images.
  Our method has been thoroughly validated on two large-scale MS datasets.
  arXiv Detail & Related papers (2024-01-18T11:14:32Z)
- RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments [55.864869961717424]
  It is typically challenging for visual or visual-inertial odometry systems to handle the problems of dynamic scenes and pure rotation.
  We design a novel visual-inertial odometry (VIO) system called RD-VIO to handle both of these problems.
  arXiv Detail & Related papers (2023-10-23T16:30:39Z)
- Asymmetric Cross-Scale Alignment for Text-Based Person Search [15.618984100653348]
  Text-based person search (TBPS) is of significant importance in intelligent surveillance; it aims to retrieve pedestrian images with high semantic relevance to a given text description.
  To implement this task, one needs to extract multi-scale features from both the image and text domains, and then perform cross-modal alignment.
  We present a transformer-based model to extract multi-scale representations and perform Asymmetric Cross-Scale Alignment (ACSA) to precisely align the two modalities.
  arXiv Detail & Related papers (2022-11-26T08:34:35Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
  We propose to adopt graph propagation to capture the observed spatial contexts.
  We then apply an attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
  Finally, we introduce a symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
  Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves state-of-the-art performance on two benchmarks.
  arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders [14.634046503477979]
  We present a novel approach called the Transformer Reasoning and Alignment Network (TERAN).
  TERAN enforces a fine-grained match between the underlying components of images and sentences.
  On the MS-COCO 1K test set, we obtain improvements of 5.7% and 3.5% on the image and sentence retrieval tasks, respectively.
  arXiv Detail & Related papers (2020-08-12T11:02:40Z)
- Graph Optimal Transport for Cross-Domain Alignment [121.80313648519203]
  Cross-domain alignment is fundamental to computer vision and natural language processing.
  We propose Graph Optimal Transport (GOT), a principled framework that germinates from recent advances in Optimal Transport (OT).
  Experiments show consistent outperformance of GOT over baselines across a wide range of tasks.
  arXiv Detail & Related papers (2020-06-26T01:14:23Z)
- Coarse-to-Fine Gaze Redirection with Numerical and Pictorial Guidance [74.27389895574422]
  We propose a novel gaze redirection framework which exploits both numerical and pictorial direction guidance.
  The proposed method outperforms state-of-the-art approaches in terms of both image quality and redirection precision.
  arXiv Detail & Related papers (2020-04-07T01:17:27Z)
- iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
  iFAN aims to precisely align feature distributions at both the image and instance levels.
  It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
  arXiv Detail & Related papers (2020-03-09T13:27:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.