Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration
- URL: http://arxiv.org/abs/2411.09604v1
- Date: Thu, 14 Nov 2024 17:22:16 GMT
- Title: Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration
- Authors: Yifan Shao,
- Abstract summary: Local-Global Attention is designed to better integrate both local and global contextual features.
We have thoroughly evaluated the Local-Global Attention mechanism on several widely used object detection and classification datasets.
- Score: 0.9790236766474198
- License:
- Abstract: In recent years, attention mechanisms have significantly enhanced the performance of object detection by focusing on key feature information. However, prevalent methods still encounter difficulties in effectively balancing local and global features. This imbalance hampers their ability to capture both fine-grained details and broader contextual information-two critical elements for achieving accurate object detection.To address these challenges, we propose a novel attention mechanism, termed Local-Global Attention, which is designed to better integrate both local and global contextual features. Specifically, our approach combines multi-scale convolutions with positional encoding, enabling the model to focus on local details while concurrently considering the broader global context. Additionally, we introduce a learnable parameters, which allow the model to dynamically adjust the relative importance of local and global attention, depending on the specific requirements of the task, thereby optimizing feature representations across multiple scales.We have thoroughly evaluated the Local-Global Attention mechanism on several widely used object detection and classification datasets. Our experimental results demonstrate that this approach significantly enhances the detection of objects at various scales, with particularly strong performance on multi-class and small object detection tasks. In comparison to existing attention mechanisms, Local-Global Attention consistently outperforms them across several key metrics, all while maintaining computational efficiency.
Related papers
- Point Cloud Understanding via Attention-Driven Contrastive Learning [64.65145700121442]
Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms.
PointACL is an attention-driven contrastive learning framework designed to address these limitations.
Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions.
arXiv Detail & Related papers (2024-11-22T05:41:00Z) - Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information [68.10033984296247]
This paper explores the domain of active localization, emphasizing the importance of viewpoint selection to enhance localization accuracy.
Our contributions involve using a data-driven approach with a simple architecture designed for real-time operation, a self-supervised data training method, and the capability to consistently integrate our map into a planning framework tailored for real-world robotics applications.
arXiv Detail & Related papers (2024-07-22T12:32:09Z) - DistFormer: Enhancing Local and Global Features for Monocular Per-Object
Distance Estimation [35.6022448037063]
Per-object distance estimation is crucial in safety-critical applications such as autonomous driving, surveillance, and robotics.
Existing approaches rely on two scales: local information (i.e., the bounding box proportions) or global information.
Our work aims to strengthen both local and global cues.
arXiv Detail & Related papers (2024-01-06T10:56:36Z) - Global Feature Pyramid Network [1.2473780585666772]
The visual feature pyramid has proven its effectiveness and efficiency in target detection tasks.
Current methodologies tend to overly emphasize inter-layer feature interaction, neglecting the crucial aspect of intra-layer feature adjustment.
arXiv Detail & Related papers (2023-12-18T14:30:41Z) - GlobalMind: Global Multi-head Interactive Self-attention Network for
Hyperspectral Change Detection [22.22495802857453]
High spectral resolution imagery of the Earth's surface enables users to monitor changes over time in fine-grained scale.
Most current algorithms are still confined to describing local features and fail to incorporate a global perspective.
We propose a Global Multi-head INteractive self-attention change Detection network (GlobalMind) to explore the implicit correlation between different surface objects and variant land cover transformations.
arXiv Detail & Related papers (2023-04-18T01:43:17Z) - Global Meets Local: Effective Multi-Label Image Classification via
Category-Aware Weak Supervision [37.761378069277676]
This paper builds a unified framework to perform effective noisy-proposal suppression.
We develop a cross-granularity attention module to explore the complementary information between global and local features.
Our framework achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2022-11-23T05:39:17Z) - Centralized Feature Pyramid for Object Detection [53.501796194901964]
Visual feature pyramid has shown its superiority in both effectiveness and efficiency in a wide range of applications.
In this paper, we propose a OLO Feature Pyramid for object detection, which is based on a globally explicit centralized feature regulation.
arXiv Detail & Related papers (2022-10-05T08:32:54Z) - Adaptive Local-Component-aware Graph Convolutional Network for One-shot
Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z) - Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any models with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-10-25T00:54:57Z) - Video Salient Object Detection via Adaptive Local-Global Refinement [7.723369608197167]
Video salient object detection (VSOD) is an important task in many vision applications.
We propose an adaptive local-global refinement framework for VSOD.
We show that our weighting methodology can further exploit the feature correlations, thus driving the network to learn more discriminative feature representation.
arXiv Detail & Related papers (2021-04-29T14:14:11Z) - Global Context-Aware Progressive Aggregation Network for Salient Object
Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.