Related papers: PowerYOLO: Mixed Precision Model for Hardware Efficient Object Detection with Event Data

URL: http://arxiv.org/abs/2407.08272v1
Date: Thu, 11 Jul 2024 08:17:35 GMT
Title: PowerYOLO: Mixed Precision Model for Hardware Efficient Object Detection with Event Data
Authors: Dominika Przewlocka-Rus, Tomasz Kryjak, Marek Gorgon
Abstract summary: PowerYOLO is a mixed precision solution to the problem of fitting algorithms of high memory and computational complexity into small low-power devices. First, we propose a system based on a Dynamic Vision Sensor (DVS), a novel sensor that offers low power requirements. Second, to ensure high accuracy and low memory and computational complexity, we propose to use 4-bit width Powers-of-Two (PoT) quantisation. Third, we replace multiplication with bit-shifting to increase the efficiency of hardware acceleration of such a solution.
Score: 0.5461938536945721
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The performance of object detection systems in automotive solutions must be as high as possible, with minimal response time and, due to the often battery-powered operation, low energy consumption. When designing such solutions, we therefore face challenges typical of embedded vision systems: the problem of fitting algorithms of high memory and computational complexity into small low-power devices. In this paper we propose PowerYOLO - a mixed precision solution, which targets three essential elements of such an application. First, we propose a system based on a Dynamic Vision Sensor (DVS), a novel sensor that offers low power requirements and operates well in conditions with variable illumination. It is these features that may make event cameras a preferred choice over frame cameras in some applications. Second, to ensure high accuracy and low memory and computational complexity, we propose to use 4-bit width Powers-of-Two (PoT) quantisation for convolution weights of the YOLO detector, with all other parameters quantised linearly. Finally, we take advantage of the PoT scheme and replace multiplication with bit-shifting to increase the efficiency of hardware acceleration of such a solution, with a special convolution-batch normalisation fusion scheme. The use of a specific sensor with PoT quantisation and special batch normalisation fusion leads to a unique system with almost 8x reduction in memory complexity and vast computational simplifications, relative to a standard approach. This efficient system achieves high accuracy of mAP 0.301 on the GEN1 DVS dataset, marking a new state of the art for such a compressed model.
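The idea of Powers-of-Two quantisation and shift-based multiplication mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the 4-bit layout (1 sign bit plus a 3-bit exponent), the weight range of [-1, 1], and the function names `pot_quantise` and `shift_multiply` are assumptions made purely for illustration.

```python
import math
from typing import Tuple

# Assumption: 4 bits per weight = 1 sign bit + 3 exponent bits.
EXP_BITS = 3
MAX_SHIFT = (1 << EXP_BITS) - 1  # largest exponent code, here 7


def pot_quantise(w: float) -> Tuple[int, int]:
    """Map a weight to (sign, shift) such that w is approximated by sign * 2**-shift."""
    sign = -1 if w < 0 else 1
    mag = abs(w)
    if mag == 0.0:
        # This sketch has no exact zero; use the smallest representable magnitude.
        return sign, MAX_SHIFT
    # Round the exponent to the nearest integer, then clamp to the 3-bit range.
    shift = round(-math.log2(mag))
    shift = min(max(shift, 0), MAX_SHIFT)
    return sign, shift


def shift_multiply(activation: int, sign: int, shift: int) -> int:
    """Multiply an integer activation by sign * 2**-shift using a right shift only."""
    return sign * (activation >> shift)


if __name__ == "__main__":
    for w in (0.25, -0.5, 0.3):
        sign, shift = pot_quantise(w)
        print(w, "->", sign * 2.0 ** -shift, "| 64 * w ~", shift_multiply(64, sign, shift))
```

Under these assumptions, each weight needs 4 bits instead of a 32-bit float, which would be an 8x reduction for the weights alone and is in line with the "almost 8x" figure quoted in the abstract; the multiplier in the convolution is replaced by a shift and a sign flip.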

Related papers

Gaussian Based Adaptive Multi-Modal 3D Semantic Occupancy Prediction [0.0]
This research work enhances a novel adaptive camera-LiDAR multimodal 3D occupancy prediction model. It seamlessly bridges the semantic strengths of camera modality with the geometric strengths of LiDAR modality.
arXiv Detail & Related papers (2026-01-20T20:11:09Z)
Deep Fusion of Ultra-Low-Resolution Thermal Camera and Gyroscope Data for Lighting-Robust and Compute-Efficient Rotational Odometry [1.1838866556981258]
This study introduces thermal-gyro fusion, a novel sensor fusion approach that integrates ultra-low-resolution thermal imaging with gyroscope readings for rotational odometry. Our analysis demonstrates that thermal-gyro fusion enables a significant reduction in thermal camera resolution without significantly compromising accuracy. These advantages make our approach well-suited for real-time deployment in resource-constrained robotic systems.
arXiv Detail & Related papers (2025-06-14T15:23:40Z)
Multi-modal Multi-platform Person Re-Identification: Benchmark and Method [58.59888754340054]
MP-ReID is a novel dataset designed specifically for multi-modality and multi-platform ReID. This benchmark compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging. We introduce Uni-Prompt ReID, a framework with specifically designed prompts, tailored for cross-modality and cross-platform scenarios.
arXiv Detail & Related papers (2025-03-21T12:27:49Z)
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design [1.3589914205911104]
We make a comprehensive analysis of the general quantization principles and their effect on the triangle of accuracy, memory consumption and system efficiency. We propose MixLLM that explores the new optimization space of mixed-precision quantization between output features. We present the sweet spot of quantization configuration of algorithm-system co-design that leads to high accuracy and system efficiency.
arXiv Detail & Related papers (2024-12-19T07:15:15Z)
PACE: Pacing Operator Learning to Accurate Optical Field Simulation for Complicated Photonic Devices [14.671301859745453]
Existing SOTA approaches, such as NeurOLight, struggle with predicting high-fidelity fields for real-world complicated photonic devices. We propose a novel cross-axis factorized PACE operator with a strong long-distance modeling capacity. Inspired by human learning, we divide the simulation task for extremely hard cases into two progressively easier tasks.
arXiv Detail & Related papers (2024-11-05T22:03:14Z)
SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation [74.07836010698801]
We propose an SMPL-based Transformer framework (SMPLer) to address this issue. SMPLer incorporates two key ingredients: a decoupled attention operation and an SMPL-based target representation. Extensive experiments demonstrate the effectiveness of SMPLer against existing 3D human shape and pose estimation methods.
arXiv Detail & Related papers (2024-04-23T17:59:59Z)
LEMDA: A Novel Feature Engineering Method for Intrusion Detection in IoT Systems [3.5323691899538137]
Intrusion detection systems (IDS) for the Internet of Things (IoT) systems can use AI-based models to ensure secure communications. Complex models have notorious problems such as overfitting, low interpretability, and high computational complexity. This paper proposes a new feature engineering method called LEMDA (Light feature Engineering based on the Mean Decrease in Accuracy).
arXiv Detail & Related papers (2024-04-20T11:11:47Z)
Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, a random resistive memory-based deep extreme point learning machine (DEPLM). Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
Match and Locate: low-frequency monocular odometry based on deep feature matching [0.65268245109828]
We introduce a novel approach for robotic odometry which only requires a single camera. The approach is based on matching image features between the consecutive frames of the video stream using deep feature matching models. We evaluate the performance of the approach in the AISG-SLA Visual Localisation Challenge and find that, while being computationally efficient and easy to implement, our method shows competitive results.
arXiv Detail & Related papers (2023-11-16T17:32:58Z)
M3ICRO: Machine Learning-Enabled Compact Photonic Tensor Core based on PRogrammable Multi-Operand Multimode Interference [18.0155410476884]
Photonic tensor core (PTC) designs based on standard optical components hinder scalability and compute density due to their large spatial footprint. We propose an ultra-compact PTC using customized programmable multi-operand multimode interference (MOMMI) devices, named M3ICRO. M3ICRO achieves a 3.4-9.6x smaller footprint, 1.6-4.4x higher speed, 10.6-42x higher compute density, 3.7-12x higher system throughput, and superior noise robustness.
arXiv Detail & Related papers (2023-05-31T02:34:36Z)
Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks. In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
Improved Transformer for High-Resolution GANs [69.42469272015481]
We introduce two key ingredients to Transformer to address this challenge. We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 31.87 and 2.95 on unconditional ImageNet $128 \times 128$ and FFHQ $256 \times 256$, respectively.
arXiv Detail & Related papers (2021-06-14T17:39:49Z)
Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy. We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR. Our FQSR using low bits quantization can achieve on par performance compared with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
MIMC-VINS: A Versatile and Resilient Multi-IMU Multi-Camera Visual-Inertial Navigation System [44.76768683036822]
We propose a real-time consistent multi-IMU multi-camera (MIMC)-VINS estimator for visual-inertial navigation systems. Within an efficient multi-state constraint filter, the proposed MIMC-VINS algorithm optimally fuses asynchronous measurements from all sensors. The proposed MIMC-VINS is validated in both Monte-Carlo simulations and real-world experiments.
arXiv Detail & Related papers (2020-06-28T20:16:08Z)
ASFD: Automatic and Scalable Face Detector [129.82350993748258]
We propose a novel Automatic and Scalable Face Detector (ASFD). ASFD is based on a combination of neural architecture search techniques as well as a new loss design. Our ASFD-D6 outperforms the prior strong competitors, and our lightweight ASFD-D0 runs at more than 120 FPS with Mobilenet for VGA-resolution images.
arXiv Detail & Related papers (2020-03-25T06:00:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.