Unifying Nonlocal Blocks for Neural Networks
- URL: http://arxiv.org/abs/2108.02451v1
- Date: Thu, 5 Aug 2021 08:34:12 GMT
- Title: Unifying Nonlocal Blocks for Neural Networks
- Authors: Lei Zhu, Qi She, Duo Li, Yanye Lu, Xuejing Kang, Jie Hu, Changhu Wang
- Abstract summary: Nonlocal-based blocks are designed to capture long-range spatial-temporal dependencies in computer vision tasks.
We provide a new perspective for interpreting them: we view them as a set of graph filters generated on a fully-connected graph.
We propose an efficient and robust spectral nonlocal block, which captures long-range dependencies more robustly and flexibly.
- Score: 43.107708207022526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nonlocal-based blocks are designed to capture long-range
spatial-temporal dependencies in computer vision tasks. Although they have shown
excellent performance, they still lack a mechanism to encode the rich,
structured information among elements in an image or video. In this paper, to
theoretically analyze the properties of these nonlocal-based blocks, we provide
a new perspective for interpreting them: we view them as a set of graph filters
generated on a fully-connected graph. Specifically, when choosing the Chebyshev
graph filter, a unified formulation can be derived that explains and analyzes
the existing nonlocal-based blocks (e.g., nonlocal block, nonlocal stage,
double attention block). Furthermore, by considering the spectral properties,
we propose an efficient and robust spectral nonlocal block, which captures
long-range dependencies more robustly and flexibly than the existing nonlocal
blocks when inserted into deep neural networks. Experimental results
demonstrate clear-cut improvements and the practical applicability of our
method on image classification, action recognition, semantic segmentation, and
person re-identification tasks.
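To make the graph-filter reading of the abstract concrete, below is a minimal NumPy sketch of the classic embedded-Gaussian nonlocal block, whose row-stochastic affinity matrix can be read as the adjacency of a fully-connected graph over all positions. The names `w_theta`, `w_phi`, `w_g`, the assumption that the embedding dimension equals the channel dimension, and the omission of the output projection are simplifications for illustration, not the paper's exact formulation:

```python
import numpy as np

def nonlocal_block(x, w_theta, w_phi, w_g):
    """Minimal nonlocal block over N flattened positions.

    x: (N, C) feature vectors; w_theta, w_phi, w_g: (C, C) projections.
    The row-stochastic matrix `a` is the adjacency of a fully-connected
    graph over the N positions, so `a @ g` acts as a first-order graph
    filter applied to the projected features.
    """
    theta = x @ w_theta        # "query" embeddings, (N, C)
    phi = x @ w_phi            # "key" embeddings,   (N, C)
    g = x @ w_g                # "value" embeddings, (N, C)
    logits = theta @ phi.T     # pairwise affinities, (N, N)
    # Numerically stable row-wise softmax -> fully-connected graph weights.
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    return x + a @ g           # residual connection, as in standard nonlocal blocks

# Example: 16 positions with 8 channels each.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
w_theta, w_phi, w_g = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
y = nonlocal_block(x, w_theta, w_phi, w_g)
```

In this view, swapping the affinity matrix for Chebyshev polynomials of a graph Laplacian yields the unified formulation the paper uses to cover the nonlocal block, nonlocal stage, and double attention block as special cases.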
Related papers
- Multi-View Subgraph Neural Networks: Self-Supervised Learning with Scarce Labeled Data [24.628203785306233]
We present a novel learning framework called multi-view subgraph neural networks (Muse) for handling long-range dependencies.
By fusing two views of subgraphs, the learned representations can preserve the topological properties of the graph at large.
Experimental results show that Muse outperforms the alternative methods on node classification tasks with limited labeled data.
arXiv Detail & Related papers (2024-04-19T01:36:50Z)
- Closing the Loop: Graph Networks to Unify Semantic Objects and Visual Features for Multi-object Scenes [2.236663830879273]
Loop Closure Detection (LCD) is essential to minimize drift when recognizing previously visited places.
Visual Bag-of-Words (vBoW) has been an LCD algorithm of choice for many state-of-the-art SLAM systems.
This paper proposes SymbioLCD2, which creates a unified graph structure to integrate semantic objects and visual features symbiotically.
arXiv Detail & Related papers (2022-09-24T00:42:33Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Structural block driven - enhanced convolutional neural representation for relation extraction [11.617819771034927]
We propose a novel lightweight relation extraction approach based on structural-block-driven convolutional neural learning.
Through dependency analysis, we detect the essential sequential tokens associated with entities, which we name structural blocks.
We encode only these blocks, at both block-wise and inter-block-wise representation levels, using multi-scale CNNs.
arXiv Detail & Related papers (2021-03-21T10:23:44Z)
- Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
arXiv Detail & Related papers (2020-11-06T12:17:01Z)
- LoCo: Local Contrastive Representation Learning [93.98029899866866]
We show that by overlapping local blocks stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
arXiv Detail & Related papers (2020-08-04T05:41:29Z)
- Disentangled Non-Local Neural Networks [68.92293183542131]
We study the non-local block in depth, where we find that its attention can be split into two terms.
We present the disentangled non-local block, where the two terms are decoupled to facilitate learning for both terms.
arXiv Detail & Related papers (2020-06-11T17:59:22Z)
- Structural Temporal Graph Neural Networks for Anomaly Detection in Dynamic Graphs [54.13919050090926]
We propose an end-to-end structural temporal Graph Neural Network model for detecting anomalous edges in dynamic graphs.
In particular, we first extract the $h$-hop enclosing subgraph centered on the target edge and propose the node labeling function to identify the role of each node in the subgraph.
Based on the extracted features, we utilize Gated recurrent units (GRUs) to capture the temporal information for anomaly detection.
arXiv Detail & Related papers (2020-05-15T09:17:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.