Related papers: LLHA-Net: A Hierarchical Attention Network for Two-View Correspondence Learning

LLHA-Net: A Hierarchical Attention Network for Two-View Correspondence Learning

URL: http://arxiv.org/abs/2512.24620v1
Date: Wed, 31 Dec 2025 04:25:53 GMT
Title: LLHA-Net: A Hierarchical Attention Network for Two-View Correspondence Learning
Authors: Shuyuan Lin, Yu Guo, Xiao Chen, Yanjie Liang, Guobao Xiao, Feiran Huang,
Abstract summary: We propose a novel method called Layer-by-Layer Hierarchical Attention Network.<n>It enhances the precision of feature point matching in computer vision by addressing the issue of outliers.<n>Our method incorporates stage fusion, hierarchical extraction, and an attention mechanism to improve the network's representation capability.
Score: 33.76961965760301
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Establishing the correct correspondence of feature points is a fundamental task in computer vision. However, the presence of numerous outliers among the feature points can significantly affect the matching results, reducing the accuracy and robustness of the process. Furthermore, a challenge arises when dealing with a large proportion of outliers: how to ensure the extraction of high-quality information while reducing errors caused by negative samples. To address these issues, in this paper, we propose a novel method called Layer-by-Layer Hierarchical Attention Network, which enhances the precision of feature point matching in computer vision by addressing the issue of outliers. Our method incorporates stage fusion, hierarchical extraction, and an attention mechanism to improve the network's representation capability by emphasizing the rich semantic information of feature points. Specifically, we introduce a layer-by-layer channel fusion module, which preserves the feature semantic information from each stage and achieves overall fusion, thereby enhancing the representation capability of the feature points. Additionally, we design a hierarchical attention module that adaptively captures and fuses global perception and structural semantic information using an attention mechanism. Finally, we propose two architectures to extract and integrate features, thereby improving the adaptability of our network. We conduct experiments on two public datasets, namely YFCC100M and SUN3D, and the results demonstrate that our proposed method outperforms several state-of-the-art techniques in both outlier removal and camera pose estimation. Source code is available at http://www.linshuyuan.com.

Related papers

DAGLFNet:Deep Attention-Guided Global-Local Feature Fusion for Pseudo-Image Point Cloud Segmentation [6.418552842518015]
We propose DAGLFNet, a pseudo-image-based representation method to extract discriminative features from point clouds.<n>The method balances high performance with real-time capability, demonstrating great potential for LiDAR-based real-time applications.
arXiv Detail & Related papers (2025-10-12T06:35:03Z)
ContextFusion and Bootstrap: An Effective Approach to Improve Slot Attention-Based Object-Centric Learning [53.19029595226767]
Slot attention-based framework has emerged as a leading approach in object-centric learning.<n>Current methods require a stable feature space throughout training to enable reconstruction from slots.<n>We propose a novel ContextFusion stage and a Bootstrap Branch, both of which can be seamlessly integrated into existing slot attention models.
arXiv Detail & Related papers (2025-09-02T07:19:25Z)
Point Cloud Understanding via Attention-Driven Contrastive Learning [64.65145700121442]
Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms. PointACL is an attention-driven contrastive learning framework designed to address these limitations. Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions.
arXiv Detail & Related papers (2024-11-22T05:41:00Z)
PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN) PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features. Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
Learning to Reduce Information Bottleneck for Object Detection in Aerial Images [5.4547979989237225]
We first analyse the importance of the neck network in object detection frameworks from the theory of information bottleneck. We propose a global semantic network, which acts as a bridge from the backbone to the head network in a bidirectional global convolution manner. Compared to the existing neck networks, our method has advantages of capturing rich detailed information and less computational costs.
arXiv Detail & Related papers (2022-04-05T07:46:37Z)
LC3Net: Ladder context correlation complementary network for salient object detection [0.32116198597240836]
We propose a novel ladder context correlation complementary network (LC3Net) FCB is a filterable convolution block to assist the automatic collection of information on the diversity of initial features. DCM is a dense cross module to facilitate the intimate aggregation of different levels of features. BCD is a bidirectional compression decoder to help the progressive shrinkage of multi-scale features.
arXiv Detail & Related papers (2021-10-21T03:12:32Z)
PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving. Current approaches suffer from sparse and partial point clouds of distant and occluded objects. In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)
Interpretable Detail-Fidelity Attention Network for Single Image Super-Resolution [89.1947690981471]
We propose a purposeful and interpretable detail-fidelity attention network to progressively process smoothes and details in divide-and-conquer manner. Particularly, we propose a Hessian filtering for interpretable feature representation which is high-profile for detail inference. Experiments demonstrate that the proposed methods achieve superior performances over the state-of-the-art methods.
arXiv Detail & Related papers (2020-09-28T08:31:23Z)
Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning [21.89909688056478]
We propose a new two-level joint idea to augment the generative network with an inference network during training. This provides strong cross-modal interaction for effective transfer of knowledge between visual and semantic domains. We evaluate our approach on four benchmark datasets against several state-of-the-art methods, and show its performance.
arXiv Detail & Related papers (2020-07-15T15:34:09Z)
Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features. We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.