Poly-NL: Linear Complexity Non-local Layers with Polynomials
- URL: http://arxiv.org/abs/2107.02859v1
- Date: Tue, 6 Jul 2021 19:51:37 GMT
- Title: Poly-NL: Linear Complexity Non-local Layers with Polynomials
- Authors: Francesca Babiloni, Ioannis Marras, Filippos Kokkinos, Jiankang Deng,
Grigorios Chrysos, Stefanos Zafeiriou
- Abstract summary: We formulate novel fast Non-Local blocks, capable of reducing complexity from quadratic to linear with no loss in performance.
The proposed method, which we dub "Poly-NL", is competitive with the state of the art across image recognition, instance segmentation, and face detection tasks.
- Score: 76.21832434001759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial self-attention layers, in the form of Non-Local blocks, introduce
long-range dependencies in Convolutional Neural Networks by computing pairwise
similarities among all possible positions. Such pairwise functions underpin the
effectiveness of non-local layers, but also determine a complexity that scales
quadratically with respect to the input size both in space and time. This is a
severely limiting factor that practically hinders the applicability of
non-local blocks to even moderately sized inputs. Previous works focused on
reducing the complexity by modifying the underlying matrix operations; in this
work, however, we aim to retain the full expressiveness of non-local layers while
keeping complexity linear. We overcome the efficiency limitation of non-local
blocks by framing them as special cases of 3rd order polynomial functions. This
fact enables us to formulate novel fast Non-Local blocks, capable of reducing
the complexity from quadratic to linear with no loss in performance, by
replacing any direct computation of pairwise similarities with element-wise
multiplications. The proposed method, which we dub "Poly-NL", is competitive
with the state of the art across image recognition, instance segmentation,
and face detection tasks, while having considerably less
computational overhead.
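To make the abstract's contrast concrete, below is a minimal PyTorch sketch (an illustration, not the paper's exact Poly-NL layer): the standard non-local block materializes an $N \times N$ pairwise similarity map over the $N$ spatial positions, whereas a third-order polynomial built from element-wise multiplications and a global average never does. Note that, ignoring the softmax, the standard block's output $(XW_\theta)(XW_\phi)^\top(XW_g)$ is itself a third-order polynomial in the entries of $X$, which is the observation the abstract builds on. The parameter names, shapes, and the specific polynomial form below are illustrative assumptions.

```python
# Minimal sketch contrasting a quadratic-cost pairwise non-local block with a
# linear-cost, element-wise third-order alternative in the spirit of the abstract.
# Parameter names (w_theta, w_phi, w_g, w1, w2, w3) and the global-average
# "context" term are assumptions for illustration, not the paper's formulation.
import torch


def nonlocal_pairwise(x, w_theta, w_phi, w_g):
    """Standard non-local core on x of shape (N, C), N = H*W positions.

    Forms an explicit (N, N) similarity map, so time and memory scale
    quadratically with the number of positions N.
    """
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g  # each (N, C)
    sim = torch.softmax(theta @ phi.T, dim=-1)       # (N, N) pairwise similarities
    return sim @ g                                   # (N, C)


def nonlocal_third_order(x, w1, w2, w3):
    """Hypothetical linear-complexity variant: a third-order polynomial in x
    built from element-wise (Hadamard) products and a global average,
    never materializing an (N, N) matrix. Cost is O(N * C^2).
    """
    context = (x @ w1).mean(dim=0, keepdim=True)     # (1, C): degree-1 global statistics
    return ((x @ w2) * context * x) @ w3             # (N, C): each entry is degree 3 in x


if __name__ == "__main__":
    N, C = 56 * 56, 64                               # moderately sized feature map
    x = torch.randn(N, C)
    ws = [torch.randn(C, C) / C ** 0.5 for _ in range(3)]
    y_quadratic = nonlocal_pairwise(x, *ws)          # builds a 3136 x 3136 map
    y_linear = nonlocal_third_order(x, *ws)          # only ever holds (N, C) and (1, C) tensors
    print(y_quadratic.shape, y_linear.shape)         # torch.Size([3136, 64]) for both
```

Both functions return an (N, C) output, but only the pairwise version allocates memory quadratic in the number of positions; the element-wise version keeps every intermediate linear in N.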
Related papers
- PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning [8.791770352147989]
We propose a novel approach to training DNNs for FPGA deployment using multivariate polynomials as the basic building block.
Our method takes advantage of the flexibility offered by soft logic, hiding the evaluation inside the LUTs with minimal overhead.
We demonstrate the effectiveness of PolyLUT on three tasks: network intrusion detection, jet identification at the Large Hadron Collider, and MNIST.
arXiv Detail & Related papers (2025-01-14T11:51:57Z)
- Element-wise Attention Is All You Need [0.0]
The self-attention mechanism delivers superior performance across various domains, yet suffers from high computational complexity during both training and inference.
We propose a novel element-wise attention mechanism, which uses the element-wise squared Euclidean distance, instead of the dot-product operation, to compute similarity.
During inference, it can be reformulated as a recurrent neural network, achieving an inference complexity of $\mathcal{O}(tD)$.
arXiv Detail & Related papers (2025-01-10T05:54:04Z)
- Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective [17.5796206457693]
We study nonsmooth optimization problems under bounded local subgradient variation.
The resulting class of objective functions encapsulates the classes of objectives traditionally studied in optimization.
arXiv Detail & Related papers (2024-03-24T22:42:40Z)
- Polyhedral Complex Extraction from ReLU Networks using Edge Subdivision [0.0]
A neural network consists of piecewise affine building blocks, such as fully-connected layers and ReLU activations.
The resulting polyhedral complex, which partitions the input space into linear regions, has been previously studied to characterize theoretical properties of neural networks.
We propose to subdivide the regions via intersections with hyperplanes induced by each neuron.
arXiv Detail & Related papers (2023-06-12T16:17:04Z)
- Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning [58.91954425047425]
This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning.
We propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generalizes most existing gradient coding schemes.
arXiv Detail & Related papers (2022-06-06T09:25:40Z)
- Hybrid Trilinear and Bilinear Programming for Aligning Partially Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformation, as in the robust point matching (RPM) algorithm.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomials to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z)
- Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
arXiv Detail & Related papers (2020-11-14T09:51:51Z)
- LoCo: Local Contrastive Representation Learning [93.98029899866866]
We show that by overlapping local blocks that are stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
arXiv Detail & Related papers (2020-08-04T05:41:29Z)
- PNL: Efficient Long-Range Dependencies Extraction with Pyramid Non-Local Module for Action Recognition [19.010874017607247]
The non-local block, inspired by non-local means, was designed to address the challenge of capturing long-range dependencies.
However, the non-local block brings a significant increase in computation cost to the original network.
It also lacks the ability to model regional correlation in videos.
We propose the Pyramid Non-Local (PNL) module, which extends the non-local block by incorporating regional correlation through a structured pyramid.
arXiv Detail & Related papers (2020-06-09T07:40:23Z)
- Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)