Augmentations: An Insight into their Effectiveness on Convolution Neural
Networks
- URL: http://arxiv.org/abs/2205.04064v1
- Date: Mon, 9 May 2022 06:36:40 GMT
- Title: Augmentations: An Insight into their Effectiveness on Convolution Neural
Networks
- Authors: Sabeesh Ethiraj, Bharath Kumar Bolla
- Abstract summary: The ability to boost a model's robustness depends on two factors, namely the model architecture and the type of augmentations.
This paper evaluates the effect of parameters using 3x3 and depth-wise separable convolutions on different augmentation techniques.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Augmentations are the key factor in determining the performance of any neural
network as they provide a model with a critical edge in boosting its
performance. Their ability to boost a model's robustness depends on two
factors, namely the model architecture and the type of augmentations.
Augmentations are highly dataset-specific, and not every kind of augmentation
will necessarily have a positive effect on a model's performance. Hence there
is a need to identify augmentations that perform consistently well across a
variety of datasets and also remain invariant to the type of architecture,
convolutions, and the number of parameters used. This paper evaluates the
effect of parameters using 3x3 and depth-wise separable convolutions on
different augmentation techniques on MNIST, FMNIST, and CIFAR10 datasets.
Statistical evidence shows that techniques such as Cutout and random
horizontal flip were consistent on both parametrically low and high
architectures. Depth-wise separable convolutions outperformed 3x3 convolutions
at higher parameter counts due to their ability to create deeper networks.
Augmentations resulted in bridging the accuracy gap between the 3x3 and
depth-wise separable convolutions, thus establishing their role in model
generalization. At higher parameter counts, augmentations did not produce a significant
change in performance. The synergistic effect of multiple augmentations at
higher parameters, with antagonistic effect at lower parameters, was also
evaluated. The work proves that a delicate balance between architectural
supremacy and augmentations needs to be achieved to enhance a model's
performance in any given deep learning task.
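To make the two ingredients being compared concrete, the sketch below (a minimal illustration, not the authors' code) builds a Cutout plus random-horizontal-flip augmentation pipeline and contrasts a standard 3x3 convolution block with its depth-wise separable counterpart. The use of PyTorch/torchvision, the cutout patch size, and the block layout are assumptions made for illustration only.

```python
# Illustrative sketch, assuming PyTorch/torchvision; patch size and block
# widths are arbitrary choices, not the paper's exact configuration.
import torch
import torch.nn as nn
import torchvision.transforms as T


class Cutout:
    """Zero out a random square patch of an image tensor (the Cutout augmentation)."""

    def __init__(self, size=8):
        self.size = size

    def __call__(self, img):
        _, h, w = img.shape
        y = torch.randint(0, h, (1,)).item()
        x = torch.randint(0, w, (1,)).item()
        y1, y2 = max(0, y - self.size // 2), min(h, y + self.size // 2)
        x1, x2 = max(0, x - self.size // 2), min(w, x + self.size // 2)
        img = img.clone()
        img[:, y1:y2, x1:x2] = 0.0
        return img


# Augmentations reported as consistent across low and high parameter counts.
train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    Cutout(size=8),
])


def conv3x3_block(in_ch, out_ch):
    """Standard 3x3 convolution block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


def depthwise_separable_block(in_ch, out_ch):
    """Depth-wise separable block: per-channel 3x3 conv followed by a 1x1 point-wise conv."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),  # depth-wise
        nn.Conv2d(in_ch, out_ch, kernel_size=1),                          # point-wise
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```

For in_ch = out_ch = C, the 3x3 block uses roughly 9*C^2 convolution weights versus about 9*C + C^2 for the depth-wise separable block, which is why the latter can afford deeper networks at the same parameter budget.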
Related papers
- Geometric Algebra Planes: Convex Implicit Neural Volumes [70.12234371845445]
We show that GA-Planes is equivalent to a sparse low-rank factor plus low-resolution matrix.
We also show that GA-Planes can be adapted for many existing representations.
arXiv Detail & Related papers (2024-11-20T18:21:58Z) - Data Augmentation for Image Classification using Generative AI [8.74488498507946]
Data augmentation is a promising solution to expanding the dataset size.
Recent approaches use generative AI models to improve dataset diversity.
We propose the Automated Generative Data Augmentation (AGA).
arXiv Detail & Related papers (2024-08-31T21:16:43Z) - Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision [52.80792724919329]
We introduce a novel framework named Adapter-X to improve fine-tuning in 2D image and 3D point cloud modalities.
It is the first to outperform full fine-tuning in both 2D image and 3D point cloud modalities with significantly fewer parameters, i.e., only 0.20% and 1.88% of original trainable parameters for 2D and 3D classification tasks.
arXiv Detail & Related papers (2024-06-05T08:26:44Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease
detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Revisiting Data Augmentation in Model Compression: An Empirical and
Comprehensive Study [17.970216875558638]
In this paper, we revisit the usage of data augmentation in model compression.
We show that models in different sizes prefer data augmentation with different magnitudes.
The prediction of a pre-trained large model can be utilized to measure the difficulty of data augmentation.
arXiv Detail & Related papers (2023-05-22T17:05:06Z) - Tunable Convolutions with Parametric Multi-Loss Optimization [5.658123802733283]
The behavior of neural networks is irremediably determined by the specific loss and data used during training.
It is often desirable to tune the model at inference time based on external factors such as preferences of the user or dynamic characteristics of the data.
This is especially important to balance the perception-distortion trade-off of ill-posed image-to-image translation tasks.
arXiv Detail & Related papers (2023-04-03T11:36:10Z) - Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained
Language Models [68.9288651177564]
We present a novel MoE architecture based on matrix product operators (MPO) from quantum many-body physics.
With the decomposed MPO structure, we can reduce the parameters of the original MoE architecture.
Experiments on three well-known downstream natural language datasets based on GPT2 show improved performance and efficiency when increasing model capacity.
arXiv Detail & Related papers (2022-03-02T13:44:49Z) - Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Generative Adversarial Networks (GANs) are trained by solving non-concave min-max optimization problems.
Theory has shown the importance of gradient descent (GD) in reaching globally optimal solutions.
We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, gradient descent-ascent (GDA) converges to a global saddle point of the underlying non-concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z)