Poly-NL: Linear Complexity Non-local Layers with Polynomials
- URL: http://arxiv.org/abs/2107.02859v1
- Date: Tue, 6 Jul 2021 19:51:37 GMT
- Title: Poly-NL: Linear Complexity Non-local Layers with Polynomials
- Authors: Francesca Babiloni, Ioannis Marras, Filippos Kokkinos, Jiankang Deng,
Grigorios Chrysos, Stefanos Zafeiriou
- Abstract summary: We formulate novel fast Non-Local blocks, capable of reducing complexity from quadratic to linear with no loss in performance.
The proposed method, which we dub "Poly-NL", is competitive with the state of the art across image recognition, instance segmentation, and face detection tasks.
- Score: 76.21832434001759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial self-attention layers, in the form of Non-Local blocks, introduce
long-range dependencies in Convolutional Neural Networks by computing pairwise
similarities among all possible positions. Such pairwise functions underpin the
effectiveness of non-local layers, but also determine a complexity that scales
quadratically with respect to the input size both in space and time. This is a
severely limiting factor that practically hinders the applicability of
non-local blocks to even moderately sized inputs. Previous works focused on
reducing the complexity by modifying the underlying matrix operations; in this
work, however, we aim to retain the full expressiveness of non-local layers while
keeping complexity linear. We overcome the efficiency limitation of non-local
blocks by framing them as special cases of 3rd order polynomial functions. This
fact enables us to formulate novel fast Non-Local blocks, capable of reducing
the complexity from quadratic to linear with no loss in performance, by
replacing any direct computation of pairwise similarities with element-wise
multiplications. The proposed method, which we dub "Poly-NL", is competitive
with the state of the art across image recognition, instance segmentation,
and face detection tasks, while having considerably less
computational overhead.
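To make the abstract's contrast concrete, below is a minimal PyTorch sketch (an illustration, not the paper's exact Poly-NL layer): the standard non-local block materializes an $N \times N$ pairwise similarity map over the $N$ spatial positions, whereas a third-order polynomial built from element-wise multiplications and a global average never does. Note that, ignoring the softmax, the standard block's output $(XW_\theta)(XW_\phi)^\top(XW_g)$ is itself a third-order polynomial in the entries of $X$, which is the observation the abstract builds on. The parameter names, shapes, and the specific polynomial form below are illustrative assumptions.

```python
# Minimal sketch contrasting a quadratic-cost pairwise non-local block with a
# linear-cost, element-wise third-order alternative in the spirit of the abstract.
# Parameter names (w_theta, w_phi, w_g, w1, w2, w3) and the global-average
# "context" term are assumptions for illustration, not the paper's formulation.
import torch


def nonlocal_pairwise(x, w_theta, w_phi, w_g):
    """Standard non-local core on x of shape (N, C), N = H*W positions.

    Forms an explicit (N, N) similarity map, so time and memory scale
    quadratically with the number of positions N.
    """
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g  # each (N, C)
    sim = torch.softmax(theta @ phi.T, dim=-1)       # (N, N) pairwise similarities
    return sim @ g                                   # (N, C)


def nonlocal_third_order(x, w1, w2, w3):
    """Hypothetical linear-complexity variant: a third-order polynomial in x
    built from element-wise (Hadamard) products and a global average,
    never materializing an (N, N) matrix. Cost is O(N * C^2).
    """
    context = (x @ w1).mean(dim=0, keepdim=True)     # (1, C): degree-1 global statistics
    return ((x @ w2) * context * x) @ w3             # (N, C): each entry is degree 3 in x


if __name__ == "__main__":
    N, C = 56 * 56, 64                               # moderately sized feature map
    x = torch.randn(N, C)
    ws = [torch.randn(C, C) / C ** 0.5 for _ in range(3)]
    y_quadratic = nonlocal_pairwise(x, *ws)          # builds a 3136 x 3136 map
    y_linear = nonlocal_third_order(x, *ws)          # only ever holds (N, C) and (1, C) tensors
    print(y_quadratic.shape, y_linear.shape)         # torch.Size([3136, 64]) for both
```

Both functions return an (N, C) output, but only the pairwise version allocates memory quadratic in the number of positions; the element-wise version keeps every intermediate linear in N.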
Related papers
- PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning [8.791770352147989]
We propose a novel approach to training DNNs for FPGA deployment using multivariate polynomials as the basic building block.
Our method takes advantage of the flexibility offered by soft logic, hiding the evaluation inside the LUTs with minimal overhead.
We demonstrate the effectiveness of PolyLUT on three tasks: network intrusion detection, jet identification at the Large Hadron Collider, and MNIST.
arXiv Detail & Related papers (2025-01-14T11:51:57Z)
- Element-wise Attention Is All You Need [0.0]
The self-attention mechanism delivers superior performance across various domains, yet suffers from high computational complexity during both training and inference.
We propose a novel element-wise attention mechanism, which uses the element-wise squared Euclidean distance, instead of the dot-product operation, to compute similarity.
During inference, it can be reformulated as a recurrent neural network, achieving an inference complexity of $\mathcal{O}(tD)$.
arXiv Detail & Related papers (2025-01-10T05:54:04Z)
- Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective [17.5796206457693]
We study nonsmooth optimization problems under bounded local subgradient variation.
The resulting class of objective functions encapsulates the classes of objectives traditionally studied in optimization.
arXiv Detail & Related papers (2024-03-24T22:42:40Z)
- Polyhedral Complex Extraction from ReLU Networks using Edge Subdivision [0.0]
A neural network consists of piecewise affine building blocks, such as fully-connected layers and ReLU activations.
The resulting polyhedral complex, which partitions the input space into linear regions, has been previously studied to characterize theoretical properties of neural networks.
We propose to subdivide the regions via intersections with hyperplanes induced by each neuron.
arXiv Detail & Related papers (2023-06-12T16:17:04Z)
- Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning [58.91954425047425]
This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning.
We propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generalizes most existing gradient coding schemes.
arXiv Detail & Related papers (2022-06-06T09:25:40Z)
- Hybrid Trilinear and Bilinear Programming for Aligning Partially Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformation, as in the robust point matching (RPM) algorithm.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomials to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z)
- Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
arXiv Detail & Related papers (2020-11-14T09:51:51Z)
- LoCo: Local Contrastive Representation Learning [93.98029899866866]
We show that by overlapping local blocks that are stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
arXiv Detail & Related papers (2020-08-04T05:41:29Z)
- PNL: Efficient Long-Range Dependencies Extraction with Pyramid Non-Local Module for Action Recognition [19.010874017607247]
The non-local block, inspired by non-local means, was designed to address the challenge of capturing long-range dependencies.
However, the non-local block brings a significant increase in computation cost to the original network.
It also lacks the ability to model regional correlation in videos.
We propose the Pyramid Non-Local (PNL) module, which extends the non-local block by incorporating regional correlation through a structured pyramid.
arXiv Detail & Related papers (2020-06-09T07:40:23Z)
- Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)