Related papers: Q-YOLO: Efficient Inference for Real-time Object Detection

URL: http://arxiv.org/abs/2307.04816v1
Date: Sat, 1 Jul 2023 03:50:32 GMT
Title: Q-YOLO: Efficient Inference for Real-time Object Detection
Authors: Mingze Wang, Huixin Sun, Jun Shi, Xuhui Liu, Baochang Zhang, Xianbin Cao
Abstract summary: Real-time object detection plays a vital role in various computer vision applications. However, deploying real-time object detectors on resource-constrained platforms poses challenges due to high computational and memory requirements. This paper describes a low-bit quantization method to build a highly efficient one-stage detector, dubbed Q-YOLO.
Score: 29.51643492051404
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-time object detection plays a vital role in various computer vision applications. However, deploying real-time object detectors on resource-constrained platforms poses challenges due to high computational and memory requirements. This paper describes a low-bit quantization method to build a highly efficient one-stage detector, dubbed as Q-YOLO, which can effectively address the performance degradation problem caused by activation distribution imbalance in traditional quantized YOLO models. Q-YOLO introduces a fully end-to-end Post-Training Quantization (PTQ) pipeline with a well-designed Unilateral Histogram-based (UH) activation quantization scheme, which determines the maximum truncation values through histogram analysis by minimizing the Mean Squared Error (MSE) quantization errors. Extensive experiments on the COCO dataset demonstrate the effectiveness of Q-YOLO, outperforming other PTQ methods while achieving a more favorable balance between accuracy and computational cost. This research contributes to advancing the efficient deployment of object detection models on resource-limited edge devices, enabling real-time detection with reduced computational and memory overhead.
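The abstract describes the UH activation quantization scheme only at a high level: build a histogram of activations, then pick the maximum truncation value that minimizes the MSE quantization error. The sketch below is one plausible reading of that idea, not the authors' implementation. It assumes the lower bound is fixed at the observed minimum (the "unilateral" part), that candidate maxima are taken from histogram bin edges, and that quantization is uniform. The names `quantize`, `uh_search_max`, `num_bins`, and `num_candidates` are hypothetical.

```python
import numpy as np


def quantize(x, lo, hi, bits=8):
    """Uniform quantize-dequantize of x after clipping to [lo, hi]."""
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    q = np.round((np.clip(x, lo, hi) - lo) / scale)
    return q * scale + lo


def uh_search_max(acts, bits=8, num_bins=2048, num_candidates=100):
    """Search for the upper truncation value that minimizes histogram-weighted MSE.

    Assumption: only the upper bound is searched; the lower bound stays at
    the observed minimum of the activations.
    """
    acts = np.asarray(acts, dtype=np.float64).ravel()
    lo = acts.min()
    hist, edges = np.histogram(acts, bins=num_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])

    best_hi, best_err = edges[-1], np.inf
    # Candidate maxima are evenly spaced bin edges, ending at the full range.
    start = max(1, num_bins // num_candidates)
    for idx in np.linspace(start, num_bins, num_candidates, dtype=int):
        hi = edges[idx]
        if hi <= lo:
            continue
        # Approximate the MSE by treating every sample as its bin center.
        sq_err = (quantize(centers, lo, hi, bits) - centers) ** 2
        err = np.sum(hist * sq_err) / acts.size
        if err < best_err:
            best_err, best_hi = err, hi
    return lo, best_hi, best_err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.exponential(size=100_000)  # synthetic long-tailed activations
    lo, hi, err = uh_search_max(acts, bits=4)
    print(f"clip range: [{lo:.4f}, {hi:.4f}], histogram MSE: {err:.6f}")
```

On long-tailed inputs like the synthetic data above, the selected maximum tends to fall below the observed maximum. The search gives up a small clipping error on rare large values in exchange for finer resolution on the bulk of the distribution.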

Related papers

Generative QoE Modeling: A Lightweight Approach for Telecom Networks [6.473372512447993]
This study introduces a lightweight generative modeling framework that balances computational efficiency, interpretability, and predictive accuracy. By validating the use of Vector Quantization (VQ) as a preprocessing technique, continuous network features are effectively transformed into discrete categorical symbols. This VQ-HMM pipeline enhances the model's capacity to capture dynamic QoE patterns while supporting probabilistic inference on new and unseen data.
arXiv Detail & Related papers (2025-04-30T06:19:37Z)
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [53.571195477043496]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE). RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers. Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z)
P-YOLOv8: Efficient and Accurate Real-Time Detection of Distracted Driving [0.0]
Distracted driving is a critical safety issue that leads to numerous fatalities and injuries worldwide. This study addresses the need for efficient and real-time machine learning models to detect distracted driving behaviors. A real-time object detection system is introduced, optimized for both speed and accuracy.
arXiv Detail & Related papers (2024-10-21T02:56:44Z)
Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework for large vision-language models (LVLMs) for efficient multi-modal inference. We mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into the optimal quantization strategy. Experimental results demonstrate that our method compresses the memory by 2.78x and increases generation speed by 1.44x on the 13B LLaVA model without performance degradation.
arXiv Detail & Related papers (2024-10-10T17:02:48Z)
Reducing the Side-Effects of Oscillations in Training of Quantized YOLO Networks [5.036532914308394]
We show that it is difficult to achieve extremely low precision (4-bit and lower) for efficient YOLO models, even with SOTA QAT methods, due to the oscillation issue. We propose a simple QAT correction method, namely QC, that takes only a single epoch of training after the standard QAT procedure to correct the error.
arXiv Detail & Related papers (2023-11-09T02:53:21Z)
ELUQuant: Event-Level Uncertainty Quantification in Deep Inelastic Scattering [0.0]
We introduce a physics-informed Bayesian Neural Network (BNN) with flow approximated posteriors for detailed uncertainty quantification (UQ) at the physics event-level. Applied to Deep Inelastic Scattering (DIS) events, our model effectively extracts the kinematic variables $x$, $Q^2$, and $y$. This detailed description of the underlying uncertainty proves invaluable for decision-making, especially in tasks like event filtering.
arXiv Detail & Related papers (2023-10-04T15:50:05Z)
Drastic Circuit Depth Reductions with Preserved Adversarial Robustness by Approximate Encoding for Quantum Machine Learning [0.5181797490530444]
We implement methods for the efficient preparation of quantum states representing encoded image data using variational, genetic and matrix product state based algorithms. Results show that these methods can approximately prepare states to a level suitable for QML using circuits two orders of magnitude shallower than a standard state preparation implementation.
arXiv Detail & Related papers (2023-09-18T01:49:36Z)
Potential and limitations of quantum extreme learning machines [55.41644538483948]
We present a framework to model QRCs and QELMs, showing that they can be concisely described via single effective measurements. Our analysis paves the way to a more thorough understanding of the capabilities and limitations of both QELMs and QRCs.
arXiv Detail & Related papers (2022-10-03T09:32:28Z)
Towards Balanced Learning for Instance Recognition [149.76724446376977]
We propose Libra R-CNN, a framework towards balanced learning for instance recognition. It integrates IoU-balanced sampling, balanced feature pyramid, and objective re-weighting, respectively for reducing the imbalance at sample, feature, and objective level.
arXiv Detail & Related papers (2021-08-23T13:40:45Z)
FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose a design paradigm for a cost-effective network with LR representation for efficient pose estimation, named FasterPose. We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence. Compared with the previously dominant network of pose estimation, our method reduces the FLOPs by 58% and simultaneously gains a 1.3% improvement in accuracy.
arXiv Detail & Related papers (2021-07-07T13:39:08Z)
High Dimensional Level Set Estimation with Bayesian Neural Network [58.684954492439424]
This paper proposes novel methods to solve high-dimensional Level Set Estimation problems using Bayesian Neural Networks. For each problem, we derive the corresponding information-theoretic acquisition function to sample the data points. Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-17T23:21:53Z)
AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation. Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
EfficientPose: Scalable single-person pose estimation [3.325625311163864]
We propose a novel convolutional neural network architecture, called EfficientPose, for single-person pose estimation. Our top-performing model achieves state-of-the-art accuracy on single-person MPII, with low-complexity ConvNets. Due to its low complexity and efficiency, EfficientPose enables real-world applications on edge devices by limiting the memory footprint and computational cost.
arXiv Detail & Related papers (2020-04-25T16:50:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.