Dynamic Query Selection for Fast Visual Perceiver
- URL: http://arxiv.org/abs/2205.10873v2
- Date: Tue, 21 Mar 2023 10:53:32 GMT
- Title: Dynamic Query Selection for Fast Visual Perceiver
- Authors: Corentin Dancette and Matthieu Cord
- Abstract summary: We show how to make Perceivers even more efficient, by reducing the number of queries Q during inference while limiting the accuracy drop.
- Score: 42.07082299370995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have recently been matching deep convolutional networks as vision
architectures. Most work focuses on getting the best results
on large-scale benchmarks, and scaling laws seem to be the most successful
strategy: bigger models, more data, and longer training result in higher
performance. However, the reduction of network complexity and inference time
remains under-explored. The Perceiver model offers a solution to this problem:
by first performing a cross-attention with a fixed number Q of latent query
tokens, the complexity of the L-layer Transformer network that follows is
bounded by O(LQ^2). In this work, we explore how to make Perceivers even more
efficient, by reducing the number of queries Q during inference while limiting
the accuracy drop.
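To make the complexity argument concrete, here is a minimal sketch of a Perceiver-style encoder (a hypothetical PyTorch illustration, not the authors' implementation; the module, its hyperparameters, and the naive "keep the first q latents" slice are assumptions): Q learned latents cross-attend once to the N input tokens, and the L-layer Transformer that follows operates only on the latents, so its cost stays bounded by O(LQ^2). Using fewer queries at inference shrinks that quadratic term directly; the paper's question is how to choose which queries to keep so that accuracy is preserved.

```python
import torch
import torch.nn as nn

class MiniPerceiver(nn.Module):
    """Minimal Perceiver-style encoder: one cross-attention from Q learned
    latent queries to the N input tokens, then L self-attention layers that
    operate only on the Q latents, so the latent stack costs O(L * Q^2)."""

    def __init__(self, dim=256, num_queries=128, num_layers=6, num_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.latent_transformer = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x, num_active_queries=None):
        # x: (batch, N, dim) input tokens, e.g. image patch embeddings.
        q = self.latents
        if num_active_queries is not None:
            # Inference-time query reduction: keep only a subset of the learned
            # latents (naively, the first ones), shrinking the latent stack.
            q = q[:num_active_queries]
        q = q.unsqueeze(0).expand(x.size(0), -1, -1)
        latents, _ = self.cross_attn(q, x, x)     # cross-attention: O(Q * N)
        return self.latent_transformer(latents)   # latent layers: O(L * Q^2)

model = MiniPerceiver()
tokens = torch.randn(2, 196, 256)             # e.g. 14x14 patch tokens
full = model(tokens)                          # (2, 128, 256)
fast = model(tokens, num_active_queries=32)   # (2, 32, 256), cheaper latent stack
```

Halving Q roughly quarters the attention cost inside the latent Transformer, which is where the inference-time savings of query reduction come from.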
Related papers
- Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers [14.756988176469365]
An effective approach to reduce computational requirements and increase efficiency is to prune unnecessary components of Deep Neural Networks.
Previous work has shown that attribution methods from the field of eXplainable AI serve as effective means to extract and prune the least relevant network components in a few-shot fashion.
arXiv Detail & Related papers (2024-08-22T17:35:18Z)
- Fast networked data selection via distributed smoothed quantile estimation [6.002041236376175]
We establish a connection between selecting the most informative data and finding the top-$k$ elements of a multiset.
The top-$k$ selection in a network can be formulated as a distributed nonsmooth convex optimization problem known as quantile estimation.
We characterize the complexity required to achieve top-$k$ selection, a challenging task due to the lack of strong convexity (a minimal sketch of the quantile-based view of top-$k$ selection appears after this list).
arXiv Detail & Related papers (2024-06-04T03:26:15Z)
- RDRN: Recursively Defined Residual Network for Image Super-Resolution [58.64907136562178]
Deep convolutional neural networks (CNNs) have obtained remarkable performance in single image super-resolution.
We propose a novel network architecture which utilizes attention blocks efficiently.
arXiv Detail & Related papers (2022-11-17T11:06:29Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training scheme, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time and that the likelihood ratio loss with interarrival time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
- RAN-GNNs: breaking the capacity limits of graph neural networks [43.66682619000099]
Graph neural networks have become a staple for learning and analyzing data defined over graphs.
Recent works attribute their capacity limits to the need to consider multiple neighborhood sizes at the same time and adaptively tune them.
We show that employing a randomly-wired architecture can be a more effective way to increase the capacity of the network and obtain richer representations.
arXiv Detail & Related papers (2021-03-29T12:34:36Z)
- Delaying Interaction Layers in Transformer-based Encoders for Efficient Open Domain Question Answering [3.111078740559015]
Open Domain Question Answering (ODQA) on a large-scale corpus of documents is a key challenge in computer science.
We propose a more direct and complementary solution, which consists of applying a generic change to the architecture of transformer-based models.
The resulting variants are competitive with the original models on the extractive task and allow, on the ODQA setting, a significant speedup and even a performance improvement in many cases.
arXiv Detail & Related papers (2020-10-16T14:36:38Z)
- A Partial Regularization Method for Network Compression [0.0]
We propose partial regularization, which penalizes only a subset of the parameters rather than all of them (full regularization), to conduct model compression at a higher speed.
Experimental results show that, as expected, computational cost is reduced, with lower running time observed in almost all situations.
Surprisingly, it helps to improve some important metrics such as regression fitting results and classification accuracy in both training and test phases on multiple datasets.
arXiv Detail & Related papers (2020-09-03T00:38:27Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
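The "Fast networked data selection via distributed smoothed quantile estimation" entry above rests on a simple observation: the top-$k$ of n values are exactly those lying above the (n-k)/n quantile, and that quantile minimizes a pinball loss, which can be smoothed so that plain gradient descent applies. Below is a minimal, centralized sketch of that view (the paper's setting is distributed over a network; this function, its parameters, and the sigmoid smoothing are illustrative assumptions, not the authors' algorithm).

```python
import numpy as np

def smoothed_quantile_topk(values, k, temp=0.05, lr=0.5, steps=500):
    """Illustrative centralized sketch: estimate the threshold separating the
    top-k elements as the (n-k)/n quantile by gradient descent on a
    sigmoid-smoothed pinball (quantile) loss, then keep the values above it."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    tau = (n - k) / n               # target quantile level
    theta = float(x.mean())         # threshold estimate
    for _ in range(steps):
        # Smoothed indicator of x_i < theta; as temp -> 0 this recovers the
        # pinball subgradient sum_i (1{x_i < theta} - tau) w.r.t. theta.
        z = np.clip((x - theta) / temp, -50.0, 50.0)
        ind = 1.0 / (1.0 + np.exp(z))
        theta -= lr * np.sum(ind - tau) / n
    return np.where(x > theta)[0], theta

# Example: indices of the 3 largest values in a small multiset.
vals = np.array([0.1, 2.3, 0.7, 1.9, 0.2, 2.1, 0.4])
idx, thr = smoothed_quantile_topk(vals, k=3)
print(idx, thr)   # indices of 2.3, 1.9, 2.1; threshold between 0.7 and 1.9
```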
This list is automatically generated from the titles and abstracts of the papers on this site.