DMESA: Densely Matching Everything by Segmenting Anything
- URL: http://arxiv.org/abs/2408.00279v1
- Date: Thu, 1 Aug 2024 04:39:36 GMT
- Title: DMESA: Densely Matching Everything by Segmenting Anything
- Authors: Yesheng Zhang, Xu Zhao
- Abstract summary: We propose MESA and DMESA as novel feature matching methods.
MESA establishes implicit-semantic area matching prior to point matching, based on advanced image understanding of SAM.
With less repetitive computation, DMESA showcases a speed improvement of nearly five times compared to MESA.
- Score: 16.16319526547664
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose MESA and DMESA as novel feature matching methods, which utilize the Segment Anything Model (SAM) to effectively mitigate matching redundancy. The key insight of our methods is to establish implicit-semantic area matching prior to point matching, based on the advanced image understanding of SAM. Then, informative area matches with consistent internal semantics are able to undergo dense feature comparison, facilitating precise inside-area point matching. Specifically, MESA adopts a sparse matching framework and first obtains candidate areas from SAM results through a novel Area Graph (AG). Then, area matching among the candidates is formulated as graph energy minimization and solved by graphical models derived from the AG. To address the efficiency issue of MESA, we further propose DMESA as its dense counterpart, applying a dense matching framework. After candidate areas are identified by the AG, DMESA establishes area matches by generating dense matching distributions. The distributions are produced from off-the-shelf patch matching utilizing a Gaussian Mixture Model and refined via Expectation Maximization. With less repetitive computation, DMESA showcases a speed improvement of nearly five times compared to MESA, while maintaining competitive accuracy. Our methods are extensively evaluated on five datasets encompassing indoor and outdoor scenes. The results illustrate consistent performance improvements from our methods for five distinct point matching baselines across all datasets. Furthermore, our methods exhibit promising generalization and improved robustness against image resolution variations. The code is publicly available at https://github.com/Easonyesheng/A2PM-MESA.
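To make the area-to-point idea concrete, here is a minimal, hypothetical Python sketch of the two-stage pipeline: candidate areas are matched first, and point matching then runs only inside matched area pairs. Every helper here (`segment_areas`, `area_descriptor`, `match_areas`, `match_points`, `a2pm`) is a stand-in of our own invention; the paper's actual SAM-based area extraction, graph-energy area matching, and GMM/EM distribution refinement are far richer than these placeholders.

```python
# Hypothetical sketch of area-to-point matching (A2PM), not the released code.
import numpy as np

def segment_areas(image):
    """Placeholder for SAM + Area Graph: just tile the image into quadrants
    so the sketch runs end to end."""
    h, w = image.shape[:2]
    return [(x, y, x + w // 2, y + h // 2)
            for y in (0, h // 2) for x in (0, w // 2)]

def area_descriptor(image, box):
    """Crude area descriptor: mean intensity of the crop (illustration only)."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1].mean(axis=(0, 1))

def match_areas(img0, img1):
    """Greedy area matching by descriptor similarity, standing in for the
    graph-energy minimization used by MESA."""
    areas0, areas1 = segment_areas(img0), segment_areas(img1)
    pairs = []
    for a in areas0:
        dists = [np.linalg.norm(area_descriptor(img0, a) - area_descriptor(img1, b))
                 for b in areas1]
        pairs.append((a, areas1[int(np.argmin(dists))]))
    return pairs

def match_points(crop0, crop1):
    """Placeholder for an off-the-shelf point matcher; returns no matches."""
    return np.empty((0, 2)), np.empty((0, 2))

def a2pm(img0, img1):
    """Area matching first, then point matching restricted to matched areas."""
    matches = []
    for (x0, y0, x1, y1), (u0, v0, u1, v1) in match_areas(img0, img1):
        p0, p1 = match_points(img0[y0:y1, x0:x1], img1[v0:v1, u0:u1])
        # Map crop coordinates back to full-image coordinates.
        matches.append((p0 + [x0, y0], p1 + [u0, v0]))
    return matches

img0, img1 = np.random.rand(128, 128), np.random.rand(128, 128)
print(len(a2pm(img0, img1)), "area pairs processed")
```

The payoff of this structure is that dense feature comparison is confined to semantically consistent regions, which is what mitigates matching redundancy.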
Related papers
- Multiway Point Cloud Mosaicking with Diffusion and Global Optimization [74.3802812773891]
We introduce Wednesday, a novel framework for multiway point cloud mosaicking.
At the core of our approach is ODIN, a learned pairwise registration algorithm that identifies overlaps and refines attention scores.
Tested on four diverse, large-scale datasets, our method achieves state-of-the-art pairwise and rotation registration results by a large margin on all benchmarks.
arXiv Detail & Related papers (2024-03-30T17:29:13Z)
- Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap [11.96884248631201]
We tackle the multimodal version of the unsupervised domain generalization problem.
Our framework relies on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space.
We show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization.
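The recall problem can be reproduced in a few lines of numpy. This is a toy illustration under our own assumptions (random embeddings, a constant-offset "gap"), not the paper's analysis or experiment: image embeddings are coarsely quantized into centroid lists, and queries shifted by the gap are searched by probing only the nearest few lists.

```python
# Toy illustration: a constant query offset degrades coarse-quantization recall.
import numpy as np

rng = np.random.default_rng(0)
d, n_img, n_lists, n_probe = 64, 2000, 50, 5

# Image embeddings and a coarse quantizer over them (centroids picked from
# the data for brevity; k-means would be used in practice).
images = rng.normal(size=(n_img, d))
centroids = images[rng.choice(n_img, n_lists, replace=False)]
assign = ((images[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)

gap = 4.0 * np.ones(d) / np.sqrt(d)  # constant offset standing in for the modality gap

def recall(offset, n_query=200):
    hits = 0
    for _ in range(n_query):
        target = rng.integers(n_img)
        # A "text" query: the target image's embedding shifted by the offset.
        query = images[target] + offset + 0.1 * rng.normal(size=d)
        # Probe only the n_probe centroid lists closest to the query.
        probed = np.argsort(((centroids - query) ** 2).sum(-1))[:n_probe]
        hits += assign[target] in probed
    return hits / n_query

print(f"recall@{n_probe} lists, no gap:   {recall(np.zeros(d)):.2f}")
print(f"recall@{n_probe} lists, with gap: {recall(gap):.2f}")
```

Because the offset perturbs the query-to-centroid ranking for every query, probing a few nearest lists can miss the list that actually holds the target image.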
arXiv Detail & Related papers (2024-02-06T21:29:37Z)
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
- OAMatcher: An Overlapping Areas-based Network for Accurate Local Feature Matching [9.006654114778073]
We propose OAMatcher, a detector-free method that imitates human behavior to generate dense and accurate matches.
OAMatcher predicts overlapping areas to promote effective and clean global context aggregation.
Comprehensive experiments demonstrate that OAMatcher outperforms the state-of-the-art methods on several benchmarks.
arXiv Detail & Related papers (2023-02-12T03:32:45Z)
- Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation [87.4854250338374]
We explore relations among affinities in two aspects: single-scale intrinsic correlations and multi-scale relations.
Our experiments demonstrate that the proposed method performs favorably against state-of-the-art VSS methods.
arXiv Detail & Related papers (2022-07-21T12:12:36Z)
- Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection [86.98573522894961]
We propose a framework that generalizes the components commonly used by Unsupervised Domain Adaptation (UDA) algorithms for detection.
Specifically, we propose a novel UDA algorithm, ViSGA, that leverages the best design choices and introduces a simple but effective method to aggregate features at the instance level.
We show that both similarity-based grouping and adversarial training allow our model to focus on coarsely aligning feature groups, without being forced to match all instances across loosely aligned domains.
arXiv Detail & Related papers (2021-10-04T13:09:56Z)
- Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models.
Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence).
We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
- Structure-Consistent Weakly Supervised Salient Object Detection with Local Saliency Coherence [14.79639149658596]
We propose a one-round end-to-end training approach for weakly supervised salient object detection via scribble annotations.
Our method achieves a new state-of-the-art performance on six benchmarks.
arXiv Detail & Related papers (2020-12-08T12:49:40Z)
- Learning Independent Instance Maps for Crowd Localization [44.6430092887941]
We propose an end-to-end and straightforward framework for crowd localization, named Independent Instance Map segmentation (IIM).
IIM segments crowds into independent connected components, from which instance positions and crowd counts are obtained.
To improve segmentation quality across regions of different density, we present a differentiable Binarization Module (BM).
BM brings two advantages to localization models: 1) it adaptively learns a threshold map for each image, detecting instances more accurately; 2) it allows the model to be trained directly with a loss on binary predictions and labels.
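For illustration, here is a hedged PyTorch sketch of a differentiable binarization step in the spirit of BM. The steep-sigmoid surrogate for the hard step and the 1x1-conv threshold head are our own simplifications; the exact formulation in the paper may differ.

```python
# Hedged sketch: learnable per-pixel threshold + steep sigmoid so that a
# loss on (soft-)binary predictions can back-propagate. Not the paper's code.
import torch
import torch.nn as nn

class DifferentiableBinarization(nn.Module):
    def __init__(self, in_channels: int, k: float = 50.0):
        super().__init__()
        self.k = k  # slope of the surrogate sigmoid; larger = closer to a hard step
        # Predict an adaptive, image-dependent threshold map from the features.
        self.threshold_head = nn.Sequential(
            nn.Conv2d(in_channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, features: torch.Tensor, confidence: torch.Tensor):
        """confidence: (B, 1, H, W) map in [0, 1]; returns a soft binary map."""
        threshold = self.threshold_head(features)
        soft_binary = torch.sigmoid(self.k * (confidence - threshold))
        return soft_binary, threshold

# Usage: a BCE loss on the soft binary map against binary instance labels
# trains the threshold map (and, in a full model, the confidence branch)
# end to end. The labels below are synthetic stand-ins.
feats = torch.randn(2, 16, 64, 64)
conf = torch.rand(2, 1, 64, 64)
binarizer = DifferentiableBinarization(16)
soft_binary, thr = binarizer(feats, conf)
loss = nn.functional.binary_cross_entropy(soft_binary, (conf > 0.5).float())
loss.backward()
```

The design point is that the threshold is no longer a fixed hyperparameter: it is a map predicted per image, so dense and sparse regions can binarize differently.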
arXiv Detail & Related papers (2020-12-08T02:17:19Z)
- Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
- Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting [16.816382549827214]
We introduce a new target, named the local counting map (LCM), to obtain more accurate results than density-map-based approaches.
We also propose an adaptive mixture regression framework with three modules in a coarse-to-fine manner to further improve the precision of the crowd estimation.
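As a toy illustration of the LCM target (our own construction; the patch size and layout are not the paper's settings), each cell of the map stores the head count of one non-overlapping patch, i.e. the density map summed inside it:

```python
# Build a local counting map (LCM) target from a ground-truth density map.
import numpy as np

def density_to_lcm(density: np.ndarray, patch: int = 64) -> np.ndarray:
    """Sum-pool a (H, W) density map into an (H//patch, W//patch) LCM."""
    h, w = density.shape
    h, w = h - h % patch, w - w % patch          # crop to a multiple of patch
    blocks = density[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return blocks.sum(axis=(1, 3))

density = np.zeros((256, 256))
density[30, 40] = 1.0     # one annotated head (a real map would be Gaussian-blurred)
density[200, 220] = 1.0
lcm = density_to_lcm(density)
assert lcm.sum() == density.sum()  # total count is preserved by construction
print(lcm)                         # 4x4 grid of per-patch counts
```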
arXiv Detail & Related papers (2020-05-12T13:54:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.