STResNet & STYOLO : A New Family of Compact Classification and Object Detection Models for MCUs
- URL: http://arxiv.org/abs/2601.05364v1
- Date: Thu, 08 Jan 2026 20:39:50 GMT
- Title: STResNet & STYOLO : A New Family of Compact Classification and Object Detection Models for MCUs
- Authors: Sudhakar Sah, Ravish Kumar,
- Abstract summary: STResNet for image classification and STYOLO for object detection jointly optimized for accuracy, efficiency, and memory footprint on resource constrained platforms.<n> STResNetMilli attains 70.0 percent Top 1 accuracy with only three million parameters, outperforming MobileNetV1 and ShuffleNetV2 at comparable computational complexity.
- Score: 3.9048771791853816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in lightweight neural networks have significantly improved the efficiency of deploying deep learning models on edge hardware. However, most existing architectures still trade accuracy for latency, which limits their applicability on microcontroller and neural processing unit based devices. In this work, we introduce two new model families, STResNet for image classification and STYOLO for object detection, jointly optimized for accuracy, efficiency, and memory footprint on resource constrained platforms. The proposed STResNet series, ranging from Nano to Tiny variants, achieves competitive ImageNet 1K accuracy within a four million parameter budget. Specifically, STResNetMilli attains 70.0 percent Top 1 accuracy with only three million parameters, outperforming MobileNetV1 and ShuffleNetV2 at comparable computational complexity. For object detection, STYOLOMicro and STYOLOMilli achieve 30.5 percent and 33.6 percent mean average precision, respectively, on the MS COCO dataset, surpassing YOLOv5n and YOLOX Nano in both accuracy and efficiency. Furthermore, when STResNetMilli is used as a backbone with the Ultralytics training environment.
Related papers
- Comparative Analysis of Lightweight Deep Learning Models for Memory-Constrained Devices [0.0]
Five state-of-the-art architectures are benchmarked across three diverse datasets: CIFAR-10, CIFAR-100, and Tiny ImageNet.<n>The models are assessed using four key performance metrics: classification accuracy, inference time, floating-point operations (FLOPs), and model size.
arXiv Detail & Related papers (2025-05-06T08:36:01Z) - A Light Perspective for 3D Object Detection [46.23578780480946]
This paper introduces a novel approach that incorporates cutting-edge Deep Learning techniques into the feature extraction process.<n>Our model, NextBEV, surpasses established feature extractors like ResNet50 and MobileNetV3.<n>By fusing these lightweight proposals, we have enhanced the accuracy of the VoxelNet-based model by 2.93% and improved the F1-score of the PointPillar-based model by approximately 20%.
arXiv Detail & Related papers (2025-03-10T10:03:23Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - LeYOLO, New Embedded Architecture for Object Detection [0.0]
We introduce two key contributions to object detection models using MSCOCO as a base validation set.<n>First, we propose LeNeck, a general-purpose detection framework that maintains inference speed comparable to SSDLite.<n>Second, we present LeYOLO, an efficient object detection model designed to enhance computational efficiency in YOLO-based architectures.
arXiv Detail & Related papers (2024-06-20T12:08:24Z) - EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for
Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z) - Edge Inference with Fully Differentiable Quantized Mixed Precision
Neural Networks [1.131071436917293]
Quantizing parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference.
This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge-computing.
arXiv Detail & Related papers (2022-06-15T18:11:37Z) - Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on
Edge Devices [19.335535517714703]
The prediction accuracy of the deep neural networks (DNNs) after deployment at the edge can suffer with time due to shifts in the distribution of the new data.
Recent prediction-time unsupervised DNN adaptation techniques have been introduced that improve prediction accuracy of the models for noisy data by re-tuning the batch normalization parameters.
This paper, for the first time, performs a comprehensive measurement study of such techniques to quantify their performance and energy on various edge devices.
arXiv Detail & Related papers (2022-03-21T19:10:40Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro- kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 makes up a family of state-of-the-art compact neural networks that outperform both automatically and manually-designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z) - Lightweight Residual Densely Connected Convolutional Neural Network [18.310331378001397]
The lightweight residual densely connected blocks are proposed to guaranty the deep supervision, efficient gradient flow, and feature reuse abilities of convolutional neural network.
The proposed method decreases the cost of training and inference processes without using any special hardware-software equipment.
arXiv Detail & Related papers (2020-01-02T17:15:32Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.