Network Augmentation for Tiny Deep Learning
- URL: http://arxiv.org/abs/2110.08890v1
- Date: Sun, 17 Oct 2021 18:48:41 GMT
- Title: Network Augmentation for Tiny Deep Learning
- Authors: Han Cai, Chuang Gan, Ji Lin, Song Han
- Abstract summary: We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.
We demonstrate the effectiveness of NetAug on image classification and object detection.
- Score: 73.57192520534585
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce Network Augmentation (NetAug), a new training method for
improving the performance of tiny neural networks. Existing regularization
techniques (e.g., data augmentation, dropout) have shown much success on large
neural networks (e.g., ResNet50) by adding noise to overcome over-fitting.
However, we found these techniques hurt the performance of tiny neural
networks. We argue that training tiny models is different from training large models:
rather than augmenting the data, we should augment the model, since tiny models
tend to suffer from under-fitting rather than over-fitting due to limited
capacity. To alleviate this issue, NetAug augments the network (reverse
dropout) instead of inserting noise into the dataset or the network. It puts
the tiny model into larger models and encourages it to work as a sub-model of
larger models to get extra supervision, in addition to functioning as an
independent model. At test time, only the tiny model is used for inference,
incurring zero inference overhead. We demonstrate the effectiveness of NetAug
on image classification and object detection. NetAug consistently improves the
performance of tiny models, achieving up to 2.1% accuracy improvement on
ImageNet, and 4.3% on Cars. On Pascal VOC, NetAug provides 2.96% mAP
improvement with the same computational cost.
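The training recipe described above can be illustrated with a minimal sketch: the tiny model's weights are a slice of a wider, weight-shared network, an auxiliary loss from the widened forward pass supplies the extra supervision, and inference uses only the tiny slice. The layer widths, module names, and the auxiliary weight `alpha` below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the NetAug idea (reverse dropout) on a toy one-layer model.
# Widths, names, and `alpha` are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugmentedLinear(nn.Module):
    """Linear layer whose first `base_out` units form the tiny model;
    the full width acts as the augmented (larger) model."""
    def __init__(self, in_features, base_out, aug_out):
        super().__init__()
        assert aug_out >= base_out
        self.base_out = base_out
        self.weight = nn.Parameter(torch.randn(aug_out, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(aug_out))

    def forward(self, x, augmented=False):
        if augmented:                      # tiny model embedded in the larger one
            return F.linear(x, self.weight, self.bias)
        w = self.weight[: self.base_out]   # tiny model: a slice of the shared weights
        b = self.bias[: self.base_out]
        return F.linear(x, w, b)

# Toy setup: 32-dim inputs, 10 classes, an augmented layer twice as wide.
feat_dim, num_classes = 32, 10
layer = AugmentedLinear(feat_dim, base_out=64, aug_out=128)
head_tiny = nn.Linear(64, num_classes)
head_aug = nn.Linear(128, num_classes)   # auxiliary head, discarded after training
alpha = 0.5                              # weight of the auxiliary loss (assumed value)

opt = torch.optim.SGD(
    list(layer.parameters()) + list(head_tiny.parameters()) + list(head_aug.parameters()),
    lr=0.1,
)

x = torch.randn(8, feat_dim)
y = torch.randint(0, num_classes, (8,))

# Training: the tiny model is supervised directly and, in addition,
# as a sub-model of the augmented network.
loss_tiny = F.cross_entropy(head_tiny(F.relu(layer(x))), y)
loss_aug = F.cross_entropy(head_aug(F.relu(layer(x, augmented=True))), y)
(loss_tiny + alpha * loss_aug).backward()
opt.step()

# Inference: only the tiny slice is used, so there is no extra cost.
with torch.no_grad():
    preds = head_tiny(F.relu(layer(x))).argmax(dim=1)
```

After training, the augmented weights and the auxiliary head are simply dropped, which is why the tiny model incurs zero inference overhead.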
Related papers
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Small-footprint slimmable networks for keyword spotting [3.0825815617887415]
We show that slimmable neural networks allow us to create super-nets from Convolutional Neural Networks and Transformers.
We demonstrate the usefulness of these models on in-house Alexa data and Google Speech Commands, and focus our efforts on models for the on-device use case.
arXiv Detail & Related papers (2023-04-21T12:59:37Z)
- Establishing a stronger baseline for lightweight contrastive models [10.63129923292905]
Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks.
A common practice is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher.
In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model.
arXiv Detail & Related papers (2022-12-14T11:20:24Z)
- Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and classify them.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z)
- T-RECX: Tiny-Resource Efficient Convolutional neural networks with early-eXit [0.0]
We show how a tiny-CNN can be enhanced by the addition of an early exit intermediate classifier.
Our technique is optimized specifically for tiny-CNN sized models.
Our results show that T-RecX 1) improves the accuracy of the baseline network, and 2) achieves a 31.58% average reduction in FLOPS at the cost of one percent accuracy across all evaluated models.
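As a rough illustration of the early-exit mechanism this entry relies on (not the T-RECX architecture itself), the sketch below attaches an intermediate classifier to a small CNN and lets confident samples skip the remaining layers at inference time; the backbone layout, confidence threshold, and head design are assumptions.

```python
# Illustrative early-exit sketch: an intermediate classifier on a small CNN.
# Layer layout, threshold, and loss handling are assumptions, not T-RECX itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitCNN(nn.Module):
    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.threshold = threshold
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))
        self.early_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(16, num_classes))
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.final_head = nn.Linear(32, num_classes)

    def forward(self, x):
        feat = self.stage1(x)
        early_logits = self.early_head(feat)
        if not self.training:
            conf, _ = F.softmax(early_logits, dim=1).max(dim=1)
            if bool((conf >= self.threshold).all()):
                return early_logits          # exit early, skipping stage2 FLOPs
        final_logits = self.final_head(self.stage2(feat))
        # During training both classifiers are supervised; the caller can
        # combine the two losses (e.g. an equally weighted sum).
        return early_logits, final_logits

model = EarlyExitCNN()
x = torch.randn(4, 3, 32, 32)
early, final = model(x)                      # training mode returns both heads
```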
arXiv Detail & Related papers (2022-07-14T02:05:43Z)
- LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification [36.651329027209634]
LilNetX is an end-to-end trainable technique for neural networks.
It enables learning models with a specified accuracy-rate-computation trade-off.
arXiv Detail & Related papers (2022-04-06T17:59:10Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Learnable Expansion-and-Compression Network for Few-shot Class-Incremental Learning [87.94561000910707]
We propose a learnable expansion-and-compression network (LEC-Net) to solve catastrophic forgetting and model over-fitting problems.
LEC-Net enlarges the representation capacity of features, alleviating feature drift of the old network from the perspective of model regularization.
Experiments on the CUB/CIFAR-100 datasets show that LEC-Net improves the baseline by 57% while outperforming the state-of-the-art by 56%.
arXiv Detail & Related papers (2021-04-06T04:34:21Z)
- Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets [65.28292822614418]
The giant formula for simultaneously enlarging the resolution, depth, and width provides us with a Rubik's cube for neural networks.
This paper aims to explore the twisting rules for obtaining deep neural networks with minimum model sizes and computational costs.
arXiv Detail & Related papers (2020-10-28T08:49:45Z)