Unifying Nonlocal Blocks for Neural Networks
- URL: http://arxiv.org/abs/2108.02451v1
- Date: Thu, 5 Aug 2021 08:34:12 GMT
- Title: Unifying Nonlocal Blocks for Neural Networks
- Authors: Lei Zhu, Qi She, Duo Li, Yanye Lu, Xuejing Kang, Jie Hu, Changhu Wang
- Abstract summary: Nonlocal-based blocks are designed to capture long-range spatial-temporal dependencies in computer vision tasks.
We provide a new perspective for interpreting them: we view them as a set of graph filters generated on a fully-connected graph.
We propose an efficient and robust spectral nonlocal block, which captures long-range dependencies more robustly and flexibly.
- Score: 43.107708207022526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nonlocal-based blocks are designed to capture long-range
spatial-temporal dependencies in computer vision tasks. Although they have shown
excellent performance, they still lack a mechanism to encode the rich,
structured information among elements in an image or video. In this paper, to
theoretically analyze the properties of these nonlocal-based blocks, we provide
a new perspective for interpreting them: we view them as a set of graph filters
generated on a fully-connected graph. Specifically, when choosing the Chebyshev
graph filter, a unified formulation can be derived that explains and analyzes
the existing nonlocal-based blocks (e.g., nonlocal block, nonlocal stage,
double attention block). Furthermore, by considering the spectral properties,
we propose an efficient and robust spectral nonlocal block, which captures
long-range dependencies more robustly and flexibly than the existing nonlocal
blocks when inserted into deep neural networks. Experimental results
demonstrate clear-cut improvements and the practical applicability of our
method on image classification, action recognition, semantic segmentation, and
person re-identification tasks.
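To make the graph-filter reading of the abstract concrete, below is a minimal NumPy sketch of the classic embedded-Gaussian nonlocal block, whose row-stochastic affinity matrix can be read as the adjacency of a fully-connected graph over all positions. The names `w_theta`, `w_phi`, `w_g`, the assumption that the embedding dimension equals the channel dimension, and the omission of the output projection are simplifications for illustration, not the paper's exact formulation:

```python
import numpy as np

def nonlocal_block(x, w_theta, w_phi, w_g):
    """Minimal nonlocal block over N flattened positions.

    x: (N, C) feature vectors; w_theta, w_phi, w_g: (C, C) projections.
    The row-stochastic matrix `a` is the adjacency of a fully-connected
    graph over the N positions, so `a @ g` acts as a first-order graph
    filter applied to the projected features.
    """
    theta = x @ w_theta        # "query" embeddings, (N, C)
    phi = x @ w_phi            # "key" embeddings,   (N, C)
    g = x @ w_g                # "value" embeddings, (N, C)
    logits = theta @ phi.T     # pairwise affinities, (N, N)
    # Numerically stable row-wise softmax -> fully-connected graph weights.
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    return x + a @ g           # residual connection, as in standard nonlocal blocks

# Example: 16 positions with 8 channels each.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
w_theta, w_phi, w_g = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
y = nonlocal_block(x, w_theta, w_phi, w_g)
```

In this view, swapping the affinity matrix for Chebyshev polynomials of a graph Laplacian yields the unified formulation the paper uses to cover the nonlocal block, nonlocal stage, and double attention block as special cases.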
Related papers
- Multi-View Subgraph Neural Networks: Self-Supervised Learning with Scarce Labeled Data [24.628203785306233]
We present a novel learning framework called multi-view subgraph neural networks (Muse) for handling long-range dependencies.
By fusing two views of subgraphs, the learned representations can preserve the topological properties of the graph at large.
Experimental results show that Muse outperforms the alternative methods on node classification tasks with limited labeled data.
arXiv Detail & Related papers (2024-04-19T01:36:50Z)
- Closing the Loop: Graph Networks to Unify Semantic Objects and Visual Features for Multi-object Scenes [2.236663830879273]
Loop Closure Detection (LCD) is essential to minimize drift when recognizing previously visited places.
Visual Bag-of-Words (vBoW) has been an LCD algorithm of choice for many state-of-the-art SLAM systems.
This paper proposes SymbioLCD2, which creates a unified graph structure to integrate semantic objects and visual features symbiotically.
arXiv Detail & Related papers (2022-09-24T00:42:33Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Structural block driven - enhanced convolutional neural representation for relation extraction [11.617819771034927]
We propose a novel lightweight relation extraction approach based on structural-block-driven convolutional neural learning.
Through dependency analysis, we detect the essential sequential tokens associated with entities, which we name structural blocks.
We encode only these blocks, at both block-wise and inter-block-wise representation levels, using multi-scale CNNs.
arXiv Detail & Related papers (2021-03-21T10:23:44Z)
- Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
arXiv Detail & Related papers (2020-11-06T12:17:01Z)
- LoCo: Local Contrastive Representation Learning [93.98029899866866]
We show that by overlapping local blocks stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
arXiv Detail & Related papers (2020-08-04T05:41:29Z)
- Disentangled Non-Local Neural Networks [68.92293183542131]
We study the non-local block in depth, where we find that its attention can be split into two terms.
We present the disentangled non-local block, where the two terms are decoupled to facilitate learning for both terms.
arXiv Detail & Related papers (2020-06-11T17:59:22Z)
- Structural Temporal Graph Neural Networks for Anomaly Detection in Dynamic Graphs [54.13919050090926]
We propose an end-to-end structural temporal Graph Neural Network model for detecting anomalous edges in dynamic graphs.
In particular, we first extract the $h$-hop enclosing subgraph centered on the target edge and propose the node labeling function to identify the role of each node in the subgraph.
Based on the extracted features, we utilize Gated recurrent units (GRUs) to capture the temporal information for anomaly detection.
arXiv Detail & Related papers (2020-05-15T09:17:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.