LambdaNetworks: Modeling Long-Range Interactions Without Attention
- URL: http://arxiv.org/abs/2102.08602v1
- Date: Wed, 17 Feb 2021 06:33:47 GMT
- Title: LambdaNetworks: Modeling Long-Range Interactions Without Attention
- Authors: Irwan Bello
- Abstract summary: We present lambda layers -- an alternative framework to self-attention -- for capturing long-range interactions between an input and structured contextual information.
Lambda layers capture such interactions by transforming available contexts into linear functions, termed lambdas.
They model both content and position-based interactions which enables their application to large structured inputs such as images.
- Score: 3.459216990820884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present lambda layers -- an alternative framework to self-attention -- for
capturing long-range interactions between an input and structured contextual
information (e.g. a pixel surrounded by other pixels). Lambda layers capture
such interactions by transforming available contexts into linear functions,
termed lambdas, and applying these linear functions to each input separately.
Similar to linear attention, lambda layers bypass expensive attention maps, but
in contrast, they model both content and position-based interactions which
enables their application to large structured inputs such as images. The
resulting neural network architectures, LambdaNetworks, significantly
outperform their convolutional and attentional counterparts on ImageNet
classification, COCO object detection and COCO instance segmentation, while
being more computationally efficient. Additionally, we design LambdaResNets, a
family of hybrid architectures across different scales, that considerably
improves the speed-accuracy tradeoff of image classification models.
LambdaResNets reach excellent accuracies on ImageNet while being 3.2 - 4.4x
faster than the popular EfficientNets on modern machine learning accelerators.
When training with an additional 130M pseudo-labeled images, LambdaResNets
achieve up to a 9.5x speed-up over the corresponding EfficientNet checkpoints.
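The mechanism described in the abstract can be sketched for the content-based path. This is a minimal illustration, not the paper's code: it ignores position lambdas, batching, normalization, and multi-query heads, and all function and variable names here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def content_lambda_layer(x, context, Wq, Wk, Wv):
    # x: (n, d) inputs; context: (m, d) context elements
    q = x @ Wq                             # (n, k) queries
    kbar = softmax(context @ Wk, axis=0)   # (m, k) keys, normalized over context positions
    v = context @ Wv                       # (m, v) values
    lam = kbar.T @ v                       # (k, v) content lambda: a single linear map
    return q @ lam                         # apply the lambda to every query: (n, v)
```

Because the context is first summarized into the small (k, v) matrix `lam`, no (n, m) attention map is ever materialized, which is what makes the layer cheaper than standard attention on large structured inputs such as images.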
Related papers
- InterFormer: Real-time Interactive Image Segmentation [80.45763765116175]
Interactive image segmentation enables annotators to efficiently perform pixel-level annotation for segmentation tasks.
The existing interactive segmentation pipeline suffers from inefficient computations of interactive models.
We propose a method named InterFormer that follows a new pipeline to address these issues.
arXiv Detail & Related papers (2023-04-06T08:57:00Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Efficient Representation Learning via Adaptive Context Pooling [15.673260849127695]
Self-attention mechanisms assume a fixed attention granularity defined by the individual tokens, which may not be optimal for modeling complex dependencies at higher levels.
We propose ContextPool to address this problem by adapting the attention granularity for each token.
We show that ContextPool makes attention models more expressive, achieving strong performance often with fewer layers and thus significantly reduced cost.
arXiv Detail & Related papers (2022-07-05T07:10:31Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
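The core operation summarized here swaps self-attention for a filter applied in the frequency domain: an element-wise product after a 2D FFT mixes information across all spatial positions at log-linear cost. A minimal numpy sketch of that idea, assuming a fixed complex-valued filter `filt` (the actual model learns such filters and uses real-input FFTs; the names below are illustrative):

```python
import numpy as np

def global_filter(x, filt):
    # x: (h, w, d) feature map; filt: (h, w, 1) or (h, w, d) complex filter.
    # The element-wise product in frequency space mixes every spatial
    # position with every other at O(hw log hw) cost.
    X = np.fft.fft2(x, axes=(0, 1))              # 2D FFT over spatial dims
    return np.fft.ifft2(X * filt, axes=(0, 1)).real
```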
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation [68.45737688496654]
We establish correspondences directly between frames without re-encoding the mask features for every object.
With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion.
We validated that every memory node now has a chance to contribute, and experimentally showed that such diversified voting is beneficial to both memory efficiency and inference accuracy.
arXiv Detail & Related papers (2021-06-09T16:50:57Z)
- RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [123.59890802196797]
We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition.
We construct convolutional layers inside a RepMLP during training and merge them into the FC for inference.
By inserting RepMLP in traditional CNN, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% for face recognition, and 2.3% mIoU on Cityscapes with lower FLOPs.
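The train-time-conv to inference-time-FC merge described here rests on convolution being linear: a 'same'-padded convolution over an h×w single-channel image is exactly an (hw)×(hw) matrix, which can be absorbed into an FC layer. A minimal sketch of that equivalence (single channel, no bias, odd kernel size; the helper names are illustrative, not RepMLP's code):

```python
import numpy as np

def conv2d_same(img, kernel):
    # 'same'-padded 2D cross-correlation of a single-channel image
    kh, kw = kernel.shape
    p = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + kh, j:j + kw] * kernel)
    return out

def conv_to_fc(kernel, h, w):
    # Column i of W is the convolution of the i-th basis image, so by
    # linearity W @ img.ravel() == conv2d_same(img, kernel).ravel().
    W = np.empty((h * w, h * w))
    for i in range(h * w):
        basis = np.zeros(h * w)
        basis[i] = 1.0
        W[:, i] = conv2d_same(basis.reshape(h, w), kernel).ravel()
    return W
```

Doing this merge once at inference time means the convolutional branches used during training add no cost at deployment.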
arXiv Detail & Related papers (2021-05-05T06:17:40Z) - Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z) - Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in
Image Classification [46.885260723836865]
Deep convolutional neural networks (CNNs) generally improve when fueled with high resolution images.
Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification.
Our framework is general and flexible as it is compatible with most of the state-of-the-art light-weighted CNNs.
arXiv Detail & Related papers (2020-10-11T17:55:06Z) - Attentional Feature Fusion [4.265244011052538]
We propose a uniform and general scheme, namely attentional feature fusion.
We show that our models outperform state-of-the-art networks on both CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-09-29T15:10:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.