TinyTNAS: GPU-Free, Time-Bound, Hardware-Aware Neural Architecture Search for TinyML Time Series Classification
- URL: http://arxiv.org/abs/2408.16535v1
- Date: Thu, 29 Aug 2024 13:50:08 GMT
- Title: TinyTNAS: GPU-Free, Time-Bound, Hardware-Aware Neural Architecture Search for TinyML Time Series Classification
- Authors: Bidyut Saha, Riya Samanta, Soumya K. Ghosh, Ram Babu Roy
- Abstract summary: We present TinyTNAS, a novel hardware-aware multi-objective Neural Architecture Search (NAS) tool specifically designed for TinyML time series classification.
Unlike traditional NAS methods that rely on GPU capabilities, TinyTNAS operates efficiently on CPUs, making it accessible for a broader range of applications.
TinyTNAS demonstrates state-of-the-art accuracy with significant reductions in RAM, FLASH, MAC usage, and latency.
- Score: 6.9604565273682955
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we present TinyTNAS, a novel hardware-aware multi-objective Neural Architecture Search (NAS) tool specifically designed for TinyML time series classification. Unlike traditional NAS methods that rely on GPU capabilities, TinyTNAS operates efficiently on CPUs, making it accessible for a broader range of applications. Users can define constraints on RAM, FLASH, and MAC operations to discover optimal neural network architectures within these parameters. Additionally, the tool allows for time-bound searches, ensuring the best possible model is found within a user-specified duration. Experiments on the benchmark datasets UCI HAR, PAMAP2, WISDM, MIT-BIH, and the PTB Diagnostic ECG Database show that TinyTNAS achieves state-of-the-art accuracy with significant reductions in RAM, FLASH, MAC usage, and latency. For example, on the UCI HAR dataset, TinyTNAS achieves a 12x reduction in RAM usage, a 144x reduction in MAC operations, and a 78x reduction in FLASH memory while maintaining superior accuracy and reducing latency by 149x. Similarly, on the PAMAP2 and WISDM datasets, it achieves a 6x reduction in RAM usage, a 40x reduction in MAC operations, an 83x reduction in FLASH, and a 67x reduction in latency, all while maintaining superior accuracy. Notably, the search process completes within 10 minutes in a CPU environment. These results highlight TinyTNAS's capability to optimize neural network architectures effectively for resource-constrained TinyML applications, ensuring both efficiency and high performance. The code for TinyTNAS is available at https://github.com/BidyutSaha/TinyTNAS.git.
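The constraint-driven, time-bound search described in the abstract can be pictured as a simple loop that rejects candidates exceeding the hardware budgets and stops at the user-specified deadline. The sketch below is a minimal illustration in plain Python, not the actual TinyTNAS API; `sample_architecture`, `estimate_ram`, `estimate_flash`, `estimate_macs`, and `train_and_score` are hypothetical placeholders for the tool's search space and its cost/accuracy estimators.

```python
import random
import time

def search(constraints, time_budget_s, sample_architecture,
           estimate_ram, estimate_flash, estimate_macs, train_and_score):
    """Hypothetical time-bound, hardware-aware NAS loop (illustrative only, not the TinyTNAS API)."""
    best_arch, best_score = None, -1.0
    deadline = time.time() + time_budget_s
    while time.time() < deadline:
        arch = sample_architecture()                       # draw a candidate from the search space
        # Discard candidates that violate the user-defined RAM / FLASH / MAC budgets.
        if (estimate_ram(arch) > constraints["ram_bytes"]
                or estimate_flash(arch) > constraints["flash_bytes"]
                or estimate_macs(arch) > constraints["macs"]):
            continue
        score = train_and_score(arch)                      # short proxy training / evaluation
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

if __name__ == "__main__":
    # Toy demo: an "architecture" is just a layer width; estimators are crude stand-ins.
    budgets = {"ram_bytes": 20_000, "flash_bytes": 64_000, "macs": 1_000_000}
    best, score = search(
        budgets, time_budget_s=1.0,
        sample_architecture=lambda: random.choice([8, 16, 32, 64]),
        estimate_ram=lambda w: w * 400,
        estimate_flash=lambda w: w * 900,
        estimate_macs=lambda w: w * w * 300,
        train_and_score=lambda w: random.random(),
    )
    print("best architecture:", best, "proxy score:", round(score, 3))
```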
Related papers
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
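The following toy example illustrates why patch-by-patch scheduling lowers peak activation memory; it is not the MCUNetV2 scheduler. The stages are contrived (a 1x1-conv-like expansion followed by 4x4 pooling) so patches need no halo, and the small accumulated outputs are ignored in the peak accounting.

```python
import numpy as np

def stage1(x):
    """Memory-hungry early stage: per-pixel channel expansion (1x1-conv-like), same resolution."""
    return np.concatenate([x, x * 0.5, x * 2.0], axis=-1)   # 3x the channels

def stage2(x):
    """Downsampling stage: 4x4 average pooling."""
    h, w, c = x.shape
    return x.reshape(h // 4, 4, w // 4, 4, c).mean(axis=(1, 3))

def layer_by_layer(img):
    mid = stage1(img)                         # the full-resolution intermediate map is materialized
    peak = img.nbytes + mid.nbytes            # peak is dominated by that intermediate
    return stage2(mid), peak

def patch_by_patch(img, splits=4):
    """Fuse stage1 and stage2 per spatial patch so only a patch-sized intermediate is ever live."""
    h, w, _ = img.shape
    rows, peak = [], 0
    for i in range(splits):
        row = []
        for j in range(splits):
            patch = img[i * h // splits:(i + 1) * h // splits,
                        j * w // splits:(j + 1) * w // splits]
            mid = stage1(patch)               # only this patch's intermediate exists right now
            row.append(stage2(mid))
            peak = max(peak, img.nbytes + mid.nbytes)
        rows.append(np.concatenate(row, axis=1))
    return np.concatenate(rows, axis=0), peak

img = np.random.rand(64, 64, 4).astype(np.float32)
out_a, peak_a = layer_by_layer(img)
out_b, peak_b = patch_by_patch(img)
assert np.allclose(out_a, out_b)              # same result, very different peak memory
print(f"peak activation bytes: layer-by-layer {peak_a}, patch-by-patch {peak_b}")
```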
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- NAS-FCOS: Efficient Search for Object Detection Architectures [113.47766862146389]
We propose an efficient method to obtain better object detectors by searching for the feature pyramid network (FPN) and the prediction head of a simple anchor-free object detector.
With carefully designed search space, search algorithms, and strategies for evaluating network quality, we are able to find top-performing detection architectures within 4 days using 8 V100 GPUs.
arXiv Detail & Related papers (2021-10-24T12:20:04Z)
- A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays [66.62377866022221]
Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle.
We introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power processor.
Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory.
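A minimal numpy sketch of the latent-replay idea with 8-bit quantized storage is given below; it is a toy buffer, not the paper's HW/SW platform, and the class name and per-sample symmetric quantization scheme are illustrative choices.

```python
import numpy as np

class QuantizedLatentReplayBuffer:
    """Toy replay buffer: stores frozen-front-end activations ("latents") as int8 to save RAM."""
    def __init__(self, capacity, dim):
        self.codes = np.zeros((capacity, dim), dtype=np.int8)
        self.scales = np.zeros(capacity, dtype=np.float32)
        self.labels = np.zeros(capacity, dtype=np.int64)
        self.size, self.capacity = 0, capacity

    def add(self, latent, label):
        # Per-sample symmetric 8-bit quantization; overwrite a random slot once the buffer is full.
        scale = max(float(np.abs(latent).max()), 1e-8) / 127.0
        idx = self.size if self.size < self.capacity else np.random.randint(self.capacity)
        self.codes[idx] = np.clip(np.round(latent / scale), -127, 127).astype(np.int8)
        self.scales[idx], self.labels[idx] = scale, label
        self.size = min(self.size + 1, self.capacity)

    def sample(self, n):
        idx = np.random.randint(0, self.size, size=n)
        return self.codes[idx].astype(np.float32) * self.scales[idx, None], self.labels[idx]

# Usage: fill the buffer with latents from past data, then mix replayed latents with
# new-task latents when updating only the layers above the frozen feature extractor.
buf = QuantizedLatentReplayBuffer(capacity=256, dim=64)
for _ in range(300):
    buf.add(np.random.randn(64).astype(np.float32), label=np.random.randint(5))
replay_x, replay_y = buf.sample(32)
print(replay_x.shape, replay_y.shape, f"code storage ~ {buf.codes.nbytes} bytes")
```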
arXiv Detail & Related papers (2021-10-20T11:01:23Z)
- FP-NAS: Fast Probabilistic Neural Architecture Search [49.21560787752714]
Probabilistic NAS, such as PARSEC, learns a distribution over high-performing architectures, and uses only as much memory as needed to train a single model.
We propose a sampling method adaptive to the distribution entropy, drawing more samples to encourage explorations at the beginning, and reducing samples as learning proceeds.
We show that Fast Probabilistic NAS (FP-NAS) can sample 64% fewer architectures and search 2.1x faster than PARSEC.
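A toy numpy sketch of entropy-adaptive sampling in the spirit of FP-NAS follows; it is not the paper's implementation, and `true_acc` and the update rule are made-up stand-ins for measured accuracy and the learned architecture distribution.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def num_samples(probs, scale=4.0, min_s=1, max_s=16):
    """More architecture samples while the distribution is flat (exploration), fewer as it sharpens."""
    return int(np.clip(np.ceil(scale * entropy(probs)), min_s, max_s))

rng = np.random.default_rng(0)
true_acc = np.array([0.50, 0.55, 0.52, 0.90, 0.48, 0.53, 0.51, 0.56])  # made-up candidate qualities
logits = np.zeros(8)                           # start from a uniform belief over 8 candidates
for step in range(10):
    probs = np.exp(logits) / np.exp(logits).sum()
    k = num_samples(probs)
    archs = rng.choice(len(probs), size=k, p=probs)            # draw k architectures to evaluate
    rewards = true_acc[archs] + 0.01 * rng.standard_normal(k)  # stand-in for measured accuracy
    for a, r in zip(archs, rewards):           # toy update: push probability toward better candidates
        logits[a] += 4.0 * (r - rewards.mean())
    print(f"step {step}: entropy={entropy(probs):.2f}  samples drawn={k}")
```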
arXiv Detail & Related papers (2020-11-22T06:10:05Z)
- $\mu$NAS: Constrained Neural Architecture Search for Microcontrollers [15.517404770022633]
IoT devices are powered by microcontroller units (MCUs), which are extremely resource-scarce.
We build a neural architecture search (NAS) system, called $\mu$NAS, to automate the design of such small-yet-powerful MCU-level networks.
$\mu$NAS is able to (a) improve top-1 classification accuracy by up to 4.8%, or (b) reduce memory footprint by 4--13x, or (c) reduce the number of multiply-accumulate operations by at least 2x.
arXiv Detail & Related papers (2020-10-27T12:42:53Z)
- MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers [18.662026553041937]
Machine learning on resource-constrained microcontrollers (MCUs) promises to drastically expand the application space of the Internet of Things (IoT).
TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget.
Neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency, and energy constraints.
arXiv Detail & Related papers (2020-10-21T19:39:39Z)
- Binarized Neural Architecture Search for Efficient Object Recognition [120.23378346337311]
Binarized neural architecture search (BNAS) produces extremely compressed models to reduce huge computational cost on embedded devices for edge computing.
An accuracy of 96.53% vs. 97.22% is achieved on the CIFAR-10 dataset, but with a significantly compressed model and a 40% faster search than the state-of-the-art PC-DARTS.
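The compression comes from the binarized weights used by the searched models; the sketch below shows a basic XNOR-Net-style weight binarization (a scale factor times the sign of each weight) as a rough illustration of that primitive, not BNAS's search procedure.

```python
import numpy as np

def binarize(w):
    """XNOR-Net-style weight binarization: w is approximated by alpha * sign(w)."""
    alpha = float(np.abs(w).mean())
    return alpha, np.where(w >= 0, 1, -1).astype(np.int8)

w = np.random.randn(128, 64).astype(np.float32)    # full-precision weights of a toy layer
alpha, b = binarize(w)
x = np.random.randn(64).astype(np.float32)
y_full = w @ x                                     # full-precision output
y_bin = alpha * (b.astype(np.float32) @ x)         # output with binarized weights
packed = np.packbits(b > 0)                        # 1 bit per weight when stored packed
cos = float(y_full @ y_bin / (np.linalg.norm(y_full) * np.linalg.norm(y_bin)))
print(f"float32 weights: {w.nbytes} B, packed 1-bit weights: {packed.nbytes} B "
      f"(~{w.nbytes // packed.nbytes}x smaller); output cosine similarity: {cos:.2f}")
```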
arXiv Detail & Related papers (2020-09-08T15:51:23Z)
- S3NAS: Fast NPU-aware Neural Architecture Search Methodology [2.607400740040335]
We present a fast NPU-aware NAS methodology, called S3NAS, to find a CNN architecture with higher accuracy than the existing ones.
We are able to find a network in 3 hours using TPUv3, which shows 82.72% top-1 accuracy on ImageNet with 11.66 ms latency.
arXiv Detail & Related papers (2020-09-04T04:45:50Z)
- MCUNet: Tiny Deep Learning on IoT Devices [62.752899523628066]
We propose a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine).
TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints and then specializes the network architecture within the optimized search space.
TinyEngine adapts memory scheduling to the overall network topology rather than optimizing layer by layer, reducing memory usage by 4.8x.
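The two-stage idea, first tuning the search space to the hardware budget and then specializing an architecture inside it, can be sketched as follows. The cost model and random search are stand-ins for TinyNAS's profiler and evolutionary search; all constants are made up.

```python
import random

def estimate(width_mult, resolution, depth):
    """Crude stand-in cost model for a toy CNN (made-up constants, not MCUNet's profiler)."""
    macs = int(20e6 * width_mult ** 2 * (resolution / 96) ** 2 * depth / 12)
    ram = int(250_000 * width_mult * (resolution / 96) ** 2)
    return macs, ram

def tune_search_space(budgets, n_probe=200):
    """Stage 1: pick the (width multiplier, resolution) space whose feasible random samples have
    the largest mean MACs, using MACs as a cheap proxy for achievable accuracy."""
    best_space, best_proxy = None, -1.0
    for wm in (0.35, 0.5, 0.75, 1.0):
        for res in (64, 96, 128, 160):
            feasible = [m for m, r in (estimate(wm, res, random.randint(8, 20)) for _ in range(n_probe))
                        if m <= budgets["macs"] and r <= budgets["ram"]]
            if feasible and sum(feasible) / len(feasible) > best_proxy:
                best_space, best_proxy = (wm, res), sum(feasible) / len(feasible)
    return best_space

def search_in_space(space, budgets, trials=50):
    """Stage 2: specialize an architecture inside the tuned space (random search as a stand-in)."""
    wm, res = space
    best, best_acc = None, -1.0
    for _ in range(trials):
        depth = random.randint(8, 20)
        macs, ram = estimate(wm, res, depth)
        if macs <= budgets["macs"] and ram <= budgets["ram"]:
            acc = random.random()                  # stand-in for a trained candidate's accuracy
            if acc > best_acc:
                best, best_acc = (wm, res, depth), acc
    return best, best_acc

budgets = {"macs": 60e6, "ram": 320_000}
space = tune_search_space(budgets)
arch, proxy = search_in_space(space, budgets)
print("tuned space (width_mult, resolution):", space, " best (wm, res, depth):", arch)
```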
arXiv Detail & Related papers (2020-07-20T17:59:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.