Focus Your Attention (with Adaptive IIR Filters)
- URL: http://arxiv.org/abs/2305.14952v2
- Date: Wed, 18 Oct 2023 11:24:31 GMT
- Title: Focus Your Attention (with Adaptive IIR Filters)
- Authors: Shahar Lutati, Itamar Zimerman, Lior Wolf
- Abstract summary: We present a new layer in which dynamic (i.e., input-dependent) Infinite Impulse Response (IIR) filters of order two are used to process the input sequence.
Despite their relatively low order, the causal adaptive filters are shown to focus attention on the relevant sequence elements.
- Score: 62.80628327613344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new layer in which dynamic (i.e., input-dependent) Infinite
Impulse Response (IIR) filters of order two are used to process the input
sequence prior to applying conventional attention. The input is split into
chunks, and the coefficients of these filters are determined based on previous
chunks to maintain causality. Despite their relatively low order, the causal
adaptive filters are shown to focus attention on the relevant sequence
elements. The new layer is grounded in control theory, and is shown to
generalize diagonal state-space layers. The layer performs on-par with
state-of-the-art networks, with a fraction of their parameters and with time
complexity that is sub-quadratic with input size. The obtained layer compares
favorably to layers such as Hyena, GPT2, and Mega, both with respect to the
number of parameters and the level of performance obtained on multiple
long-range sequence problems.
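A minimal sketch of the mechanism, assuming a hand-rolled coefficient predictor in place of the learned one (`predict_coeffs`, `adaptive_iir`, and the chunk size are illustrative, not the paper's implementation): the input is processed chunk by chunk, and each chunk's second-order IIR coefficients are computed from the previous chunk only, so the filter stays causal.

```python
import numpy as np

def predict_coeffs(prev_chunk):
    # Hypothetical stand-in for the learned, input-dependent coefficient
    # predictor: maps a summary of the previous chunk to (a1, a2, b0).
    s = np.tanh(prev_chunk.mean())       # bounded statistic of the past chunk
    return 0.5 * s, -0.1 * s, 1.0        # small values keep the filter stable

def adaptive_iir(x, chunk_size=64):
    # Chunkwise second-order (all-pole) IIR filtering of a 1-D sequence.
    # Coefficients for chunk k depend only on chunk k-1, preserving causality.
    y = np.zeros_like(x)
    y1 = y2 = 0.0                        # filter state carried across chunks
    a1, a2, b0 = 0.0, 0.0, 1.0           # identity filter for the first chunk
    for start in range(0, len(x), chunk_size):
        if start > 0:
            a1, a2, b0 = predict_coeffs(x[start - chunk_size:start])
        for t in range(start, min(start + chunk_size, len(x))):
            y[t] = b0 * x[t] - a1 * y1 - a2 * y2   # order-two recursion
            y1, y2 = y[t], y1
    return y  # filtered sequence; attention is applied to this afterwards

filtered = adaptive_iir(np.random.randn(256))
```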
Related papers
- FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction [11.146015814220858]
FiRST is an algorithm that reduces inference latency by using layer-specific routers to adaptively select a subset of transformer layers for each input sequence.
Our approach reveals that input adaptivity is critical: different task-specific middle layers play a crucial role in evolving hidden representations, depending on the task.
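A minimal sketch of the routing idea, assuming one tiny linear gate per layer (the router architecture, pooling, and threshold here are illustrative, not the paper's):

```python
import numpy as np

D, N_LAYERS = 16, 4
rng = np.random.default_rng(0)
ROUTER_WEIGHTS = rng.standard_normal((N_LAYERS, D))       # one gate per layer
LAYERS = [lambda x, W=0.1 * rng.standard_normal((D, D)): x + x @ W
          for _ in range(N_LAYERS)]                       # stand-in residual layers

def forward_with_routing(x, tau=0.0):
    # Execute only the layers whose router votes "keep" for this input.
    for i in range(N_LAYERS):
        if float(x.mean(axis=0) @ ROUTER_WEIGHTS[i]) > tau:   # input-adaptive gate
            x = LAYERS[i](x)
        # else: skip the layer entirely, saving its compute
    return x

out = forward_with_routing(rng.standard_normal((8, D)))   # (seq_len, dim) input
```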
arXiv Detail & Related papers (2024-10-16T12:45:35Z)
- Scene Prior Filtering for Depth Super-Resolution [97.30137398361823]
We introduce a Scene Prior Filtering network, SPFNet, to mitigate texture interference and edge inaccuracy.
Our SPFNet has been extensively evaluated on both real and synthetic datasets, achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-02-21T15:35:59Z) - Filter Pruning via Filters Similarity in Consecutive Layers [20.29555787754269]
Filter pruning is widely adopted to compress and accelerate convolutional neural networks (CNNs).
We propose a novel pruning method that explicitly leverages Filters Similarity in Consecutive Layers (FSCL).
Experiments demonstrate the effectiveness of FSCL: it yields remarkable improvements over the state of the art in accuracy, FLOPs, and parameter reduction.
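A hedged sketch of the idea: score each filter by how similar it is to its siblings in the same layer and to the next layer's slices that consume its output channel (the exact FSCL criterion differs; this pairing is an assumption for illustration):

```python
import numpy as np

def cosine_sim_matrix(F):
    # Pairwise cosine similarity between the rows of F (flattened filters).
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-8)
    return F @ F.T

def redundancy_scores(W_l, W_next):
    # W_l:    (out_ch, in_ch, k, k) filters of layer l
    # W_next: (out2, out_ch, k, k) filters of layer l+1
    S_l = cosine_sim_matrix(W_l.reshape(W_l.shape[0], -1))
    # one row per output channel of layer l: the next-layer slices that read it
    slices = W_next.transpose(1, 0, 2, 3).reshape(W_l.shape[0], -1)
    S_next = cosine_sim_matrix(slices)
    return (np.abs(S_l).mean(axis=1) + np.abs(S_next).mean(axis=1)) / 2

W1, W2 = np.random.randn(32, 16, 3, 3), np.random.randn(64, 32, 3, 3)
prune_first = np.argsort(-redundancy_scores(W1, W2))  # most redundant filters first
```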
arXiv Detail & Related papers (2023-04-26T09:18:38Z)
- A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices [13.49645012479288]
A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper.
The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-filter.
Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs).
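As a hedged illustration of the post-filter stage only, a textbook single-channel Wiener-style gain driven by PSD estimates (in the paper these come from DNNs; random stand-ins are used here):

```python
import numpy as np

def wiener_postfilter(X_stft, psd_target, psd_reverb, gain_floor=0.1):
    # Wiener-style spectral gain from target and late-reverberation PSDs.
    gain = psd_target / (psd_target + psd_reverb + 1e-12)
    gain = np.maximum(gain, gain_floor)   # gain floor limits musical artifacts
    return gain * X_stft

F, T = 257, 100                               # frequency bins x time frames
X = np.random.randn(F, T) + 1j * np.random.randn(F, T)
psd_s = np.abs(np.random.randn(F, T)) + 0.1   # stand-in DNN PSD estimates
psd_r = np.abs(np.random.randn(F, T)) + 0.1
enhanced = wiener_postfilter(X, psd_s, psd_r)
```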
arXiv Detail & Related papers (2022-04-06T11:08:28Z)
- The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks with inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to obtain uniform convergence guarantees.
However, we analyze two important settings where mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z)
- Unsharp Mask Guided Filtering [53.14430987860308]
The goal of this paper is guided image filtering, which emphasizes the importance of structure transfer during filtering.
We propose a new and simplified formulation of the guided filter inspired by unsharp masking.
Our formulation enjoys a filtering prior from a low-pass filter and enables explicit structure transfer by estimating a single coefficient.
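A minimal sketch of the unsharp-masking view of guided filtering, assuming a box blur as the low-pass prior and a fixed coefficient in place of the estimated one:

```python
import numpy as np

def box_blur(img, k=5):
    # Simple k x k box low-pass filter with edge padding.
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def unsharp_guided_filter(target, guide, c=0.8):
    # Low-pass the target, then transfer the guide's high-frequency
    # structure, scaled by a single coefficient c (estimated in the paper).
    return box_blur(target) + c * (guide - box_blur(guide))

t = np.random.rand(64, 64)   # e.g. a noisy depth map
g = np.random.rand(64, 64)   # e.g. intensity-derived guidance
out = unsharp_guided_filter(t, g)
```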
arXiv Detail & Related papers (2021-06-02T19:15:34Z)
- Layer-adaptive sparsity for the Magnitude-based Pruning [88.37510230946478]
We propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score.
LAMP consistently outperforms popular existing schemes for layerwise sparsity selection.
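A minimal sketch of the LAMP score as defined in the paper: each squared weight is normalized by the sum of squared weights in the same layer that are at least as large, and pruning removes the globally smallest scores:

```python
import numpy as np

def lamp_scores(w):
    # score(u) = w[u]^2 / sum over v with |w[v]| >= |w[u]| of w[v]^2
    flat = w.flatten() ** 2
    order = np.argsort(flat)                       # ascending magnitude
    suffix = np.cumsum(flat[order][::-1])[::-1]    # suffix sums of sorted squares
    scores = np.empty_like(flat)
    scores[order] = flat[order] / suffix
    return scores.reshape(w.shape)

def global_prune_masks(layers, sparsity=0.9):
    # Prune the globally smallest LAMP scores across all layers.
    all_scores = np.concatenate([lamp_scores(w).ravel() for w in layers])
    threshold = np.quantile(all_scores, sparsity)
    return [(lamp_scores(w) > threshold).astype(w.dtype) for w in layers]

masks = global_prune_masks([np.random.randn(64, 32), np.random.randn(128, 64)])
```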
arXiv Detail & Related papers (2020-10-15T09:14:02Z)
- Novel Adaptive Binary Search Strategy-First Hybrid Pyramid- and Clustering-Based CNN Filter Pruning Method without Parameters Setting [3.7468898363447654]
Pruning redundant filters in CNN models has received growing attention.
We propose an adaptive binary search-first hybrid pyramid- and clustering-based (ABS HPC) method for pruning filters automatically.
Thorough experiments on practical datasets and CNN models demonstrate that the proposed filter pruning method achieves significant reductions in parameters and floating-point operations while maintaining higher accuracy.
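A generic sketch of the binary-search component alone, assuming a hypothetical `evaluate(ratio)` routine that prunes at the given ratio (e.g. via the hybrid pyramid and clustering stages, omitted here) and returns validation accuracy:

```python
def binary_search_prune_ratio(evaluate, tol=0.01, lo=0.0, hi=1.0, iters=10):
    # Find the largest pruning ratio whose accuracy drop stays within tol.
    baseline = evaluate(0.0)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if baseline - evaluate(mid) <= tol:
            lo = mid        # acceptable accuracy: try pruning more
        else:
            hi = mid        # too aggressive: back off
    return lo

# toy stand-in: accuracy decays smoothly as the pruning ratio grows
ratio = binary_search_prune_ratio(lambda r: 0.92 - 0.15 * r ** 2)
```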
arXiv Detail & Related papers (2020-06-08T10:09:43Z)
- Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
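A hedged sketch of dynamically controlled sparsity regularization, using a simple proportional update on the penalty coefficient (the paper's actual control law may differ):

```python
def update_penalty(lam, current_sparsity, target_sparsity, rate=0.1):
    # Raise the sparsity penalty when below target, relax it when above.
    return max(0.0, lam + rate * (target_sparsity - current_sparsity))

lam = 1e-4
for step in range(100):
    # stand-in for one training step with an L1-style penalty weighted by lam,
    # after which the fraction of near-zero filters would be measured:
    current = min(1.0, 0.005 * step)
    lam = update_penalty(lam, current, target_sparsity=0.5)
```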
arXiv Detail & Related papers (2020-05-06T07:41:22Z)