Adaptive Test-Time Augmentation for Low-Power CPU
- URL: http://arxiv.org/abs/2105.06183v1
- Date: Thu, 13 May 2021 10:50:13 GMT
- Title: Adaptive Test-Time Augmentation for Low-Power CPU
- Authors: Luca Mocerino, Roberto G. Rizzo, Valentino Peluso, Andrea Calimera,
Enrico Macii
- Abstract summary: Test-Time Augmentation (TTA) techniques aim to alleviate such accuracy loss at inference time.
We propose AdapTTA, an adaptive implementation of TTA that controls the number of feed-forward passes dynamically.
Experimental results on state-of-the-art ConvNets for image classification deployed on a commercial ARM Cortex-A CPU demonstrate that AdapTTA achieves substantial latency savings.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (ConvNets) are trained offline on the
limited data available and may therefore suffer substantial accuracy loss when
deployed in the field, where unseen input patterns received under unpredictable
external conditions can mislead the model. Test-Time Augmentation (TTA)
techniques aim to alleviate this common side effect at inference time: first,
multiple feed-forward passes are run on a set of altered versions of the same
input sample; then, the final outcome is computed as a consensus of the
aggregated predictions. Unfortunately, implementing TTA on embedded CPUs
introduces latency penalties that limit its adoption in edge applications. To
tackle this issue, we propose AdapTTA, an adaptive implementation of TTA that
controls the number of feed-forward passes dynamically, depending on the
complexity of the input. Experimental results on state-of-the-art ConvNets for
image classification deployed on a commercial ARM Cortex-A CPU demonstrate
that AdapTTA achieves substantial latency savings, from 1.49x to 2.21x, and
hence a higher frame rate than static TTA, while preserving the same accuracy
gain.
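To make the mechanism concrete, here is a minimal Python sketch of the adaptive early-exit idea: augmented forward passes are aggregated one at a time, and inference stops as soon as the running consensus is confident enough. The max-softmax stopping rule, the threshold value, and every name in the snippet (`adaptive_tta`, `conf_threshold`, `toy_model`) are illustrative assumptions, not the paper's exact policy.

```python
import numpy as np

def adaptive_tta(model, augmentations, x, conf_threshold=0.9):
    """Adaptive TTA sketch: run augmented forward passes one at a time and
    stop early once the averaged prediction is confident enough."""
    aggregated = None
    for n, augment in enumerate(augmentations, start=1):
        probs = model(augment(x))              # one feed-forward pass (softmax output)
        aggregated = probs if aggregated is None else aggregated + probs
        consensus = aggregated / n             # consensus of the predictions so far
        if consensus.max() >= conf_threshold:  # "easy" input: exit early, save latency
            break
    return int(np.argmax(consensus)), n        # predicted class and passes actually used

# Toy usage with a fake 3-class classifier and identity "augmentations";
# in a real deployment the augmentations would be flips, crops, etc.
rng = np.random.default_rng(0)

def toy_model(x):
    z = rng.normal(size=3) + np.array([3.0, 0.0, 0.0])  # logits biased toward class 0
    return np.exp(z) / np.exp(z).sum()                   # softmax

label, passes = adaptive_tta(toy_model, [lambda v: v] * 5, x=np.zeros(4))
print(f"class {label} after {passes} pass(es)")
```

Easy inputs exit after a single pass while ambiguous ones consume the full augmentation budget, which is exactly where the frame-rate advantage over static (fixed-pass) TTA comes from.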
Related papers
- Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on a quantized 8-bit ViT, outperforms gradient-based TENT on a full-precision 32-bit ViT, and achieves up to a 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z)
- E2USD: Efficient-yet-effective Unsupervised State Detection for Multivariate Time Series [18.02694168117277]
We propose E2USD, which enables efficient-yet-accurate unsupervised state detection.
E2USD exploits a Fast Fourier Transform-based Time Series Compressor and a Decomposed Dual-view Embedding Module.
We also propose a False Negative Cancellation Contrastive Learning method to counteract the effects of false negatives.
arXiv Detail & Related papers (2024-02-21T10:16:57Z)
- Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition [30.350005654271868]
Test-Time Adaptation aims to utilize the test stream to adjust predictions during real-time inference.
High computational cost makes it intractable to run on resource-constrained edge devices.
We propose an Optimization-Free Test-Time Adaptation framework for sensor-based human activity recognition (HAR).
arXiv Detail & Related papers (2023-10-28T02:20:33Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which instead optimizes task-specific decoder networks on the output side.
With gradient-based optimization, DecT can be trained within several seconds and requires only one PLM query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing [85.74517957717363]
HALP accelerates inference by orchestrating seamless collaboration among edge devices (EDs) in edge computing.
Experiments show that distributed inference with HALP achieves a 1.7x inference acceleration for VGG-16.
Model selection combined with distributed inference under HALP is also shown to significantly improve service reliability.
arXiv Detail & Related papers (2022-11-24T19:48:30Z)
- Robust Continual Test-time Adaptation: Instance-aware BN and Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly twofold: (a) Instance-Aware Batch Normalization (IABN), which corrects normalization for out-of-distribution samples, and (b) Prediction-Balanced Reservoir Sampling (PBRS), which simulates an i.i.d. data stream from a non-i.i.d. stream in a class-balanced manner (see the first sketch after this list).
arXiv Detail & Related papers (2022-08-10T03:05:46Z)
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models of increasing complexity and associate each with a corresponding HEC layer.
Then, we design an adaptive model selection scheme, formulated as a contextual-bandit problem and solved with a reinforcement learning policy network (see the second sketch after this list).
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to eliminate floating-point computation.
Our AQD achieves comparable or even better performance than its full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
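As referenced above, here is a simplified Python sketch of the prediction-balanced memory from the Robust Continual Test-time Adaptation entry: a per-class reservoir keyed on predicted labels. The paper's PBRS additionally balances eviction across classes, so the class layout and eviction details below are assumptions of this sketch, not the authors' exact algorithm.

```python
import random
from collections import defaultdict

class PredictionBalancedMemory:
    """Fixed-capacity buffer kept balanced across *predicted* labels, so a
    non-i.i.d. test stream can be replayed as a roughly i.i.d. batch."""

    def __init__(self, capacity: int, num_classes: int, seed: int = 0):
        self.per_class = capacity // num_classes  # equal slots per predicted class
        self.buckets = defaultdict(list)          # predicted label -> stored samples
        self.seen = defaultdict(int)              # stream count per predicted label
        self.rng = random.Random(seed)

    def add(self, sample, pred_label):
        self.seen[pred_label] += 1
        bucket = self.buckets[pred_label]
        if len(bucket) < self.per_class:
            bucket.append(sample)                 # free slot for this class
        else:
            # classic reservoir step: every sample of this class seen so far
            # keeps an equal probability of residing in the buffer
            j = self.rng.randrange(self.seen[pred_label])
            if j < self.per_class:
                bucket[j] = sample

    def replay_batch(self):
        return [s for bucket in self.buckets.values() for s in bucket]
```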
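Likewise, for the hierarchical edge computing entry, a minimal stand-in for the adaptive model selection: an epsilon-greedy contextual bandit choosing among detectors of increasing complexity, one per HEC layer. The paper solves the bandit with a reinforcement learning policy network, so this tabular version (and names like `EpsilonGreedySelector`) is only an assumption-laden sketch.

```python
import random
from collections import defaultdict

class EpsilonGreedySelector:
    """Contextual-bandit model selection: each arm is a detection model of
    increasing complexity hosted on one HEC layer."""

    def __init__(self, num_models: int, epsilon: float = 0.1, seed: int = 0):
        self.num_models = num_models
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context_bucket, arm) -> pulls
        self.values = defaultdict(float)  # (context_bucket, arm) -> mean reward

    def select(self, context_bucket):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_models)  # explore a random layer
        # exploit: the model with the best estimated reward for this context
        return max(range(self.num_models),
                   key=lambda arm: self.values[(context_bucket, arm)])

    def update(self, context_bucket, arm, reward):
        # the reward should trade off detection accuracy against the latency
        # of escalating to a more complex model on a higher HEC layer
        key = (context_bucket, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```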