Quantized Distillation: Optimizing Driver Activity Recognition Models
for Resource-Constrained Environments
- URL: http://arxiv.org/abs/2311.05970v1
- Date: Fri, 10 Nov 2023 10:07:07 GMT
- Title: Quantized Distillation: Optimizing Driver Activity Recognition Models
for Resource-Constrained Environments
- Authors: Calvin Tanama, Kunyu Peng, Zdravko Marinov, Rainer Stiefelhagen, and
Alina Roitberg
- Abstract summary: This paper introduces a lightweight framework for resource-efficient driver activity recognition.
The framework enhances 3D MobileNet, a neural architecture optimized for speed in video classification.
It achieves a threefold reduction in model size and a 1.4-fold improvement in inference time.
- Score: 34.80538284957094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning-based models are at the forefront of most driver observation
benchmarks due to their remarkable accuracies but are also associated with high
computational costs. This is challenging, as resources are often limited in
real-world driving scenarios. This paper introduces a lightweight framework for
resource-efficient driver activity recognition. The framework enhances 3D
MobileNet, a neural architecture optimized for speed in video classification,
by incorporating knowledge distillation and model quantization to balance model
accuracy and computational efficiency. Knowledge distillation helps maintain
accuracy while reducing the model size by leveraging soft labels from a larger
teacher model (I3D), instead of relying solely on original ground truth data.
Model quantization significantly lowers memory and computation demands by using
lower precision integers for model weights and activations. Extensive testing
on a public dataset for in-vehicle monitoring during autonomous driving
demonstrates that this new framework achieves a threefold reduction in model
size and a 1.4-fold improvement in inference time, compared to an already
optimized architecture. The code for this study is available at
https://github.com/calvintanama/qd-driver-activity-reco.
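The two techniques named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation (see their repository for that). The function names, the temperature of 4.0, the weighting `alpha` of 0.5, and the symmetric per-tensor int8 scheme are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with a soft-label KL term.

    temperature and alpha are illustrative defaults, not values from the paper.
    """
    # Hard-label term: cross-entropy against the ground-truth class index.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft term's scale comparable across temperatures.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [x * scale for x in q]
```

When the student's logits match the teacher's, the soft term vanishes and only the weighted hard-label loss remains. For quantization, each restored value differs from the original by at most about half of `scale`, which is the precision traded for storing one byte per weight instead of four.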
Related papers
- Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance [15.877771709013743]
In this work, we propose a novel finetuning framework to ternary-weight embedding models.
To apply ternarization to pre-trained embedding models, we introduce self-taught knowledge distillation to finalize the ternary-weights of the linear layers.
With extensive experiments on public text and vision datasets, we demonstrate that the ternarized model uses little memory without sacrificing effectiveness.
arXiv Detail & Related papers (2024-11-23T03:44:56Z) - Open-Source High-Speed Flight Surrogate Modeling Framework [0.0]
High-speed flight vehicles, which travel much faster than the speed of sound, are crucial for national defense and space exploration.
Accurately predicting their behavior under numerous, varied flight conditions is challenging and often expensive.
The proposed approach involves creating smarter, more efficient machine learning models.
arXiv Detail & Related papers (2024-11-06T01:34:06Z) - Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices [5.696239274365031]
We propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy.
We obtained a model that maintains real-time performance while delivering high accuracy on edge devices.
arXiv Detail & Related papers (2024-05-20T06:03:55Z) - Asymmetric Masked Distillation for Pre-Training Small Foundation Models [52.56257450614992]
Self-supervised foundation models have shown great potential in computer vision thanks to the pre-training paradigm of masked autoencoding.
This paper focuses on pre-training relatively small vision transformer models that could be efficiently adapted to downstream tasks.
We propose a new asymmetric masked distillation (AMD) framework for pre-training relatively small models with autoencoding.
arXiv Detail & Related papers (2023-11-06T14:44:34Z) - Rethinking Voxelization and Classification for 3D Object Detection [68.8204255655161]
The main challenge in 3D object detection from LiDAR point clouds is achieving real-time performance without affecting the reliability of the network.
We present a solution to improve network inference speed and precision at the same time by implementing a fast dynamic voxelizer.
In addition, we propose a lightweight detection sub-head model for classifying predicted objects and filtering out falsely detected objects.
arXiv Detail & Related papers (2023-01-10T16:22:04Z) - Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z) - When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to match the accuracy of models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - BEVDetNet: Bird's Eye View LiDAR Point Cloud based Real-time 3D Object Detection for Autonomous Driving [6.389322215324224]
We propose a novel semantic segmentation architecture as a single unified model for object center detection using key points, box predictions and orientation prediction.
The proposed architecture can be trivially extended to include semantic segmentation classes like road without any additional computation.
The model is 5X faster than other top accuracy models with a minimal accuracy degradation of 2% in Average Precision at IoU=0.5 on KITTI dataset.
arXiv Detail & Related papers (2021-04-21T22:06:39Z) - PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z) - Computation on Sparse Neural Networks: an Inspiration for Future Hardware [20.131626638342706]
We describe the current status of the research on the computation of sparse neural networks.
We discuss the model accuracy influenced by the number of weight parameters and the structure of the model.
We show that for practically complicated problems, it is more beneficial to search large and sparse models in the weight dominated region.
arXiv Detail & Related papers (2020-04-24T19:13:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.