Poly-NL: Linear Complexity Non-local Layers with Polynomials
- URL: http://arxiv.org/abs/2107.02859v1
- Date: Tue, 6 Jul 2021 19:51:37 GMT
- Title: Poly-NL: Linear Complexity Non-local Layers with Polynomials
- Authors: Francesca Babiloni, Ioannis Marras, Filippos Kokkinos, Jiankang Deng,
Grigorios Chrysos, Stefanos Zafeiriou
- Abstract summary: We formulate novel fast Non-Local blocks, capable of reducing complexity from quadratic to linear with no loss in performance.
The proposed method, which we dub "Poly-NL", is competitive with the state of the art across image recognition, instance segmentation, and face detection tasks.
- Score: 76.21832434001759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial self-attention layers, in the form of Non-Local blocks, introduce
long-range dependencies in Convolutional Neural Networks by computing pairwise
similarities among all possible positions. Such pairwise functions underpin the
effectiveness of non-local layers, but also determine a complexity that scales
quadratically with respect to the input size both in space and time. This is a
severely limiting factor that practically hinders the applicability of
non-local blocks to even moderately sized inputs. Previous works focused on
reducing the complexity by modifying the underlying matrix operations, however
in this work we aim to retain full expressiveness of non-local layers while
keeping complexity linear. We overcome the efficiency limitation of non-local
blocks by framing them as special cases of 3rd order polynomial functions. This
fact enables us to formulate novel fast Non-Local blocks, capable of reducing
the complexity from quadratic to linear with no loss in performance, by
replacing any direct computation of pairwise similarities with element-wise
multiplications. The proposed method, which we dub "Poly-NL", is competitive
with the state of the art across image recognition, instance
segmentation, and face detection tasks, while having considerably less
computational overhead.
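To make the complexity argument concrete: dropping the softmax, a non-local block computes Y = (X X^T) X W, a third-order polynomial in the input X. The NumPy sketch below is illustrative only. Its first part uses plain matrix associativity to obtain the quadratic-to-linear reduction; its second part is a hypothetical element-wise block in the spirit of the abstract, where the weights W1, W2, W3 and the pooling choice are assumptions, not the paper's exact Poly-NL parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 2048, 64                      # N spatial positions, d channels
X = rng.standard_normal((N, d))
W = rng.standard_normal((d, d))

# Non-local block without softmax: Y = (X X^T) X W.
# Forming the N x N pairwise-similarity matrix costs O(N^2 d) time
# and O(N^2) memory.
Y_quad = (X @ X.T) @ (X @ W)

# The same third-order polynomial evaluated with associativity,
# X (X^T (X W)), never builds an N x N matrix: O(N d^2) time,
# i.e. linear in the number of positions N.
Y_lin = X @ (X.T @ (X @ W))
assert np.allclose(Y_quad, Y_lin)    # identical outputs, linear cost

# Hypothetical element-wise third-order block (NOT the paper's exact
# parameterization): a Hadamard product of two projections, pooled into
# a global context vector, keeps third-order interactions at O(N d^2).
W1, W2, W3 = (rng.standard_normal((d, d)) for _ in range(3))
context = ((X @ W1) * (X @ W2)).mean(axis=0, keepdims=True)  # (1, d)
Y_poly = (X * context) @ W3                                  # (N, d), linear in N
```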
Related papers
- Verifying Properties of Binary Neural Networks Using Sparse Polynomial Optimization [8.323690755070123]
This paper explores methods for verifying the properties of Binary Neural Networks (BNNs).
BNNs, like their full-precision counterparts, are also sensitive to input perturbations.
We introduce an alternative approach using Semidefinite Programming relaxations derived from sparse Polynomial Optimization.
arXiv Detail & Related papers (2024-05-27T11:03:48Z)
- Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective [17.5796206457693]
We study nonsmooth optimization problems under bounded local subgradient variation.
The resulting class of objectives encapsulates the classes of objectives traditionally studied in the optimization literature.
arXiv Detail & Related papers (2024-03-24T22:42:40Z)
- Polyhedral Complex Extraction from ReLU Networks using Edge Subdivision [0.0]
A neural network consists of piecewise affine building blocks, such as fully-connected layers and ReLU activations, which partition the input space into a polyhedral complex.
This complex has been previously studied to characterize theoretical properties of neural networks.
We propose to subdivide the regions via intersections with hyperplanes induced by each neuron.
arXiv Detail & Related papers (2023-06-12T16:17:04Z)
- Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning [58.91954425047425]
This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning.
We propose a gradient coordinate coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generalizes most existing gradient coding schemes.
arXiv Detail & Related papers (2022-06-06T09:25:40Z)
- Hybrid Trilinear and Bilinear Programming for Aligning Partially Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformations.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomials to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z)
- Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient with learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$; a minimal sketch of this substitution appears after this list.
We show experimentally that such models are remarkably stable and optimize to data likelihood values similar to those of their exact gradient counterparts.
arXiv Detail & Related papers (2020-11-14T09:51:51Z)
- Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions [2.3513645401551333]
We introduce the Concurrent ALS algorithm and library, which offers an interface to Matlab.
We show how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity.
Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.
arXiv Detail & Related papers (2020-10-09T16:55:46Z)
- LoCo: Local Contrastive Representation Learning [93.98029899866866]
We show that by overlapping local blocks stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
arXiv Detail & Related papers (2020-08-04T05:41:29Z)
- PNL: Efficient Long-Range Dependencies Extraction with Pyramid Non-Local Module for Action Recognition [19.010874017607247]
The non-local block, inspired by non-local means, is designed to address this challenge.
However, it brings a significant increase in computational cost to the original network.
It also lacks the ability to model regional correlation in videos.
We propose the Pyramid Non-Local (PNL) module, which extends the non-local block by incorporating regional correlation at multiple scales through a pyramid-structured module; a minimal sketch of this idea appears after this list.
arXiv Detail & Related papers (2020-06-09T07:40:23Z)
- Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
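As referenced in the Self Normalizing Flows entry above, here is a minimal sketch of the approximate-inverse trick for a single linear flow layer under a standard-normal base distribution. The names and hyperparameters (D, B, lr, the step count) are made-up illustration choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, B, lr = 8, 64, 5e-3
W = np.eye(D) + 0.1 * rng.standard_normal((D, D))  # forward weight, z = W x
R = W.T.copy()                                     # learned approximate inverse, R ~ W^{-1}

for step in range(3000):
    x = rng.standard_normal((B, D))
    z = x @ W.T                                    # forward pass

    # Negative log-likelihood with a standard-normal base:
    #   0.5 * ||z||^2 - log|det W|.
    # Its exact gradient needs W^{-T} (an O(D^3) inverse); the
    # self-normalizing trick substitutes the learned R^T, an O(D^2) read.
    grad_W = z.T @ x / B - R.T
    W -= lr * grad_W

    # Keep R close to W^{-1} via the reconstruction loss ||R z - x||^2.
    x_hat = z @ R.T
    grad_R = 2.0 * (x_hat - x).T @ z / B
    R -= lr * grad_R

# If R has tracked W^{-1}, this residual should be small.
print(np.abs(R @ W - np.eye(D)).max())
```

The point of the sketch is only the substitution of R^T for W^{-T}; the paper develops the idea for full flow architectures.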
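And for the PNL entry, a minimal sketch of attending from every position to pyramid-pooled keys and values, one hypothetical reading of "regional correlation at multiple scales". The pooling scales, the use of raw features as keys/values, and the absence of learned projections are illustrative assumptions, not the paper's exact module.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool2d(x, s):
    """Average x of shape (H, W, C) over non-overlapping s x s regions."""
    H, W, C = x.shape
    return x.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))

def pyramid_non_local(x, scales=(2, 4, 8)):
    """Attend from every position to pyramid-pooled keys/values.

    Pooling shrinks the key set from H*W to (H/s)*(W/s) per scale, so
    the pairwise cost drops sharply, while coarser scales contribute
    regional rather than purely pointwise correlation.
    """
    H, W, C = x.shape
    q = x.reshape(-1, C)                                   # (HW, C) queries
    kv = np.concatenate(
        [avg_pool2d(x, s).reshape(-1, C) for s in scales], axis=0)
    attn = softmax(q @ kv.T / np.sqrt(C), axis=-1)         # (HW, K), K = total pooled size
    return x + (attn @ kv).reshape(H, W, C)                # residual connection

y = pyramid_non_local(np.random.default_rng(0).standard_normal((16, 16, 8)))
print(y.shape)                                             # (16, 16, 8)
```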