Element-Wise Attention Layers: an option for optimization
- URL: http://arxiv.org/abs/2302.05488v1
- Date: Fri, 10 Feb 2023 19:50:34 GMT
- Title: Element-Wise Attention Layers: an option for optimization
- Authors: Giovanni Araujo Bacochina, Rodrigo Clemente Thom de Souza
- Abstract summary: A new attention mechanism is proposed that adapts Dot-Product Attention to become element-wise through the use of array multiplications.
The results show that this mechanism reaches 92% of the accuracy of its VGG-like counterpart on the Fashion MNIST dataset while reducing the number of parameters by 97%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of Attention Layers has become a trend since the popularization of Transformer-based models, being the key element of many state-of-the-art models developed in recent years. However, one of the biggest obstacles to implementing these architectures - as well as many others in the Deep Learning field - is the enormous number of trainable parameters they possess, which makes their use conditional on the availability of robust hardware. This paper proposes a new attention mechanism that adapts Dot-Product Attention, which uses matrix multiplications, to become element-wise through the use of array multiplications. To test the effectiveness of this approach, two models (one with a VGG-like architecture and one with the proposed method) were trained on a classification task using the Fashion MNIST and CIFAR10 datasets. Each model was trained for 10 epochs on a single Tesla T4 GPU from Google Colaboratory. The results show that this mechanism reaches 92% of the accuracy of the VGG-like counterpart on the Fashion MNIST dataset while reducing the number of parameters by 97%. For CIFAR10, the accuracy is equivalent to 60% of the VGG-like counterpart while using 50% fewer parameters.
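To make the parameter savings concrete, here is a minimal NumPy sketch contrasting standard Dot-Product Attention with an element-wise variant in the spirit of the paper: the d x d projection matrices and matrix products are replaced by d-dimensional weight arrays and Hadamard (element-wise) products. The exact formulation in the paper may differ; the function names, softmax placement, and scaling below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(x, Wq, Wk, Wv):
    # Standard attention: three d x d projection matrices (3 * d * d parameters).
    q, k, v = x @ Wq, x @ Wk, x @ Wv                  # matrix multiplications
    scores = softmax(q @ k.T / np.sqrt(x.shape[-1]))  # (n, n) attention map
    return scores @ v

def element_wise_attention(x, wq, wk, wv):
    # Element-wise variant: three d-dimensional weight arrays (3 * d parameters).
    q, k, v = x * wq, x * wk, x * wv                  # array (Hadamard) multiplications
    scores = softmax(q * k / np.sqrt(x.shape[-1]))    # per-element attention weights
    return scores * v

rng = np.random.default_rng(0)
n, d = 8, 64                                          # sequence length, feature dimension
x = rng.standard_normal((n, d))
out_dot = dot_product_attention(x, *(rng.standard_normal((d, d)) for _ in range(3)))
out_ew = element_wise_attention(x, *(rng.standard_normal(d) for _ in range(3)))
print(out_dot.shape, out_ew.shape)                    # both (8, 64)
print("parameter ratio:", (3 * d) / (3 * d * d))      # 1/d of the dot-product layer
```

In this toy setup the element-wise layer needs only 1/d of the dot-product layer's projection parameters, which illustrates how reductions of the order reported above (97% on Fashion MNIST) become possible.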
Related papers
- SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors [80.6043267994434]
We propose SVFT, a simple approach that fundamentally differs from existing methods.
SVFT updates W as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations; a minimal sketch of this idea appears after this list.
Experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006% to 0.25% of the parameters.
arXiv Detail & Related papers (2024-05-30T01:27:43Z)
- SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation [53.675725490807615]
We introduce SDPose, a new self-distillation method for improving the performance of small transformer-based models.
SDPose-T obtains 69.7% mAP with 4.4M parameters and 1.8 GFLOPs, while SDPose-S-V2 obtains 73.5% mAP on the MSCOCO validation dataset.
arXiv Detail & Related papers (2024-04-04T15:23:14Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to narrow the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
- Model Composition: Can Multiple Neural Networks Be Combined into a Single Network Using Only Unlabeled Data? [6.0945220518329855]
This paper investigates the idea of combining multiple trained neural networks using unlabeled data.
To this end, the proposed method makes use of generation, filtering, and aggregation of reliable pseudo-labels collected from unlabeled data.
Our method supports using an arbitrary number of input models with arbitrary architectures and categories.
arXiv Detail & Related papers (2021-10-20T04:17:25Z)
- ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, that searches without enforcing the gradient approximation on which DARTS relies.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
- NSGANetV2: Evolutionary Multi-Objective Surrogate-Assisted Neural Architecture Search [22.848528877480796]
We propose an efficient NAS algorithm for generating task-specific models that are competitive under multiple competing objectives.
It comprises two surrogates: one at the architecture level to improve sample efficiency and one at the weights level, through a supernet, to improve gradient descent training efficiency.
We demonstrate the effectiveness and versatility of the proposed method on six diverse non-standard datasets.
arXiv Detail & Related papers (2020-07-20T18:30:11Z)
- CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs [41.43273142203345]
We harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions.
With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB.
Our model reaches 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters, 20.9x smaller yet 10% more accurate than Tiny-YOLO.
arXiv Detail & Related papers (2020-06-12T17:56:47Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves performance comparable to large models on popular salient object detection benchmarks while using only about 0.2% (100k) of their parameters.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
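As referenced in the SVFT entry above, the following is a minimal NumPy sketch of the plain (diagonal) variant suggested by that summary: the singular vectors of a frozen pretrained weight W stay fixed, and only one coefficient per singular direction is trained. Variable names are illustrative, and sparser or denser coefficient patterns than the diagonal shown here are possible; this is an assumption-laden reading of the one-sentence summary, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 32, 16
W = rng.standard_normal((d_out, d_in))            # frozen pretrained weight

# The singular vectors of W are computed once and kept fixed.
U, S, Vt = np.linalg.svd(W, full_matrices=False)  # U: (32, 16), Vt: (16, 16)

# Diagonal variant: one trainable scale per singular direction, i.e.
# min(d_out, d_in) trainable parameters instead of d_out * d_in.
s = np.zeros_like(S)                              # trainable coefficients, init to 0
s[3] = 0.1                                        # stand-in for a trained coefficient

delta_W = U @ np.diag(s) @ Vt                     # sparse combination of outer products
W_adapted = W + delta_W
print(W_adapted.shape, "trainable:", s.size, "of", W.size)  # (32, 16) trainable: 16 of 512
```

With s initialized to zero, W_adapted starts exactly at the pretrained W, and training moves only 16 of 512 values here, consistent with the sub-1% trainable-parameter fractions the entry reports.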