CoMatcher: Multi-View Collaborative Feature Matching
- URL: http://arxiv.org/abs/2504.01872v1
- Date: Wed, 02 Apr 2025 16:27:44 GMT
- Title: CoMatcher: Multi-View Collaborative Feature Matching
- Authors: Jintao Zhang, Zimin Xia, Mingyue Dong, Shuhan Shen, Linwei Yue, Xianwei Zheng
- Abstract summary: We introduce CoMatcher, a deep multi-view matcher that leverages complementary context cues from different views to form a holistic 3D scene understanding. Building on CoMatcher, we develop a groupwise framework that fully exploits cross-view relationships for large-scale matching tasks.
- Score: 10.432708461699578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a multi-view collaborative matching strategy for reliable track construction in complex scenarios. We observe that the pairwise matching paradigms applied to image set matching often result in ambiguous estimation when the selected independent pairs exhibit significant occlusions or extreme viewpoint changes. This challenge primarily stems from the inherent uncertainty in interpreting intricate 3D structures based on limited two-view observations, as the 3D-to-2D projection leads to significant information loss. To address this, we introduce CoMatcher, a deep multi-view matcher to (i) leverage complementary context cues from different views to form a holistic 3D scene understanding and (ii) utilize cross-view projection consistency to infer a reliable global solution. Building on CoMatcher, we develop a groupwise framework that fully exploits cross-view relationships for large-scale matching tasks. Extensive experiments on various complex scenarios demonstrate the superiority of our method over the mainstream two-view matching paradigm.
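As a rough, self-contained illustration of the cross-view projection consistency idea (not the authors' learned matcher), the sketch below triangulates a tentative correspondence from two views and accepts it only if the resulting 3D point reprojects consistently into every other view. The function names, the linear (DLT) triangulation, and the 2-pixel threshold are illustrative assumptions.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    # Linear (DLT) triangulation of one correspondence from two views.
    # P1, P2: 3x4 camera projection matrices; x1, x2: 2D pixel observations.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # null vector of A, homogeneous coordinates
    return X[:3] / X[3]

def reprojection_error(P, X, x):
    # Pixel distance between the projection of 3D point X and observation x.
    p = P @ np.append(X, 1.0)
    return np.linalg.norm(p[:2] / p[2] - x)

def cross_view_consistent(projs, obs, thresh=2.0):
    # Accept a track only if the point triangulated from the first two views
    # reprojects within `thresh` pixels in all remaining views.
    X = triangulate(projs[0], projs[1], obs[0], obs[1])
    return all(reprojection_error(P, X, x) <= thresh
               for P, x in zip(projs[2:], obs[2:]))
```

A purely two-view matcher has no counterpart of the final check: with a single image pair there is no third view to veto an ambiguous correspondence, which is exactly the failure mode the abstract describes.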
Related papers
- Deep Incomplete Multi-view Clustering with Distribution Dual-Consistency Recovery Guidance [69.58609684008964]
We propose BURG, a novel method for incomplete multi-view clustering with distriBution dUal-consistency Recovery Guidance. We treat each sample as a distinct category and perform cross-view distribution transfer to predict the distribution space of missing views. To compensate for the lack of reliable category information, we design a dual-consistency guided recovery strategy that includes intra-view alignment guided by neighbor-aware consistency and cross-view alignment guided by prototypical consistency.
arXiv Detail & Related papers (2025-03-14T02:27:45Z)
- Multi-View Factorizing and Disentangling: A Novel Framework for Incomplete Multi-View Multi-Label Classification [9.905528765058541]
We propose a novel framework for incomplete multi-view multi-label classification (iMvMLC).
Our method factorizes multi-view representations into two independent sets of factors: view-consistent and view-specific.
Our framework innovatively decomposes consistent representation learning into three key sub-objectives.
arXiv Detail & Related papers (2025-01-11T12:19:20Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition [80.21796574234287]
M$^3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning [19.432034725468217]
HaMuCo is a self-supervised learning framework that learns a single-view hand pose estimator from multi-view pseudo 2D labels.
We introduce a cross-view interaction network that distills the single-view estimator by utilizing the cross-view correlated features.
Our method can achieve state-of-the-art performance on multi-view self-supervised hand pose estimation.
arXiv Detail & Related papers (2023-02-02T10:13:04Z)
- A Clustering-guided Contrastive Fusion for Multi-view Representation Learning [7.630965478083513]
We propose a deep fusion network to fuse view-specific representations into the view-common representation.
We also design an asymmetrical contrastive strategy that aligns the view-common representation with each view-specific representation (a generic sketch of this style of contrastive alignment appears after this list).
In the incomplete-view scenario, our proposed method resists noise interference better than its competitors.
arXiv Detail & Related papers (2022-12-28T07:21:05Z)
- MORI-RAN: Multi-view Robust Representation Learning via Hybrid Contrastive Fusion [4.36488705757229]
Multi-view representation learning is essential for many multi-view tasks, such as clustering and classification.
We propose a hybrid contrastive fusion algorithm to extract robust view-common representation from unlabeled data.
Experimental results demonstrate that the proposed method outperforms 12 competitive multi-view methods on four real-world datasets.
arXiv Detail & Related papers (2022-08-26T09:58:37Z)
- Global-and-Local Collaborative Learning for Co-Salient Object Detection [162.62642867056385]
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images.
We propose a global-and-local collaborative learning architecture, which includes a global correspondence modeling (GCM) module and a local correspondence modeling (LCM) module.
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on much larger datasets (about 8k-200k images).
arXiv Detail & Related papers (2022-04-19T14:32:41Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
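Several of the fusion entries above (A Clustering-guided Contrastive Fusion, MORI-RAN) align a view-common representation with view-specific representations through a contrastive objective. As a minimal sketch of how such an alignment loss is commonly written, the snippet below uses a standard InfoNCE objective in PyTorch; the function name, temperature value, and in-batch negative scheme are assumptions for illustration, not details taken from those papers.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment(common, specific, temperature=0.1):
    # common, specific: (batch, dim) embeddings of the same samples from the
    # view-common encoder and one view-specific encoder (hypothetical names).
    z1 = F.normalize(common, dim=1)
    z2 = F.normalize(specific, dim=1)
    logits = z1 @ z2.t() / temperature                    # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    # Each sample's view-common embedding should be closest to its own
    # view-specific embedding; the other samples in the batch act as negatives.
    return F.cross_entropy(logits, targets)
```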
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.