Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems
- URL: http://arxiv.org/abs/2404.11488v1
- Date: Wed, 17 Apr 2024 15:45:49 GMT
- Title: Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems
- Authors: Luca Bompani, Manuele Rusci, Daniele Palossi, Francesco Conti, Luca Benini
- Abstract summary: Multi-Resolution Rescored Byte-Track (MR2-ByteTrack) is a novel video object detection framework for ultra-low-power embedded processors.
MR2-ByteTrack reduces the average compute load of an off-the-shelf Deep Neural Network based object detector by up to 2.25$\times$.
We demonstrate an average accuracy increase of 2.16% and a latency reduction of 43% on the GAP9 microcontroller.
- Score: 13.225654514930595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors. This method reduces the average compute load of an off-the-shelf Deep Neural Network (DNN) based object detector by up to 2.25$\times$ by alternating the processing of high-resolution images (320$\times$320 pixels) with multiple down-sized frames (192$\times$192 pixels). To tackle the accuracy degradation due to the reduced image input size, MR2-ByteTrack correlates the output detections over time using the ByteTrack tracker and corrects potential misclassification using a novel probabilistic Rescore algorithm. By interleaving two down-sized images for every high-resolution one as the input of different state-of-the-art DNN object detectors with our MR2-ByteTrack, we demonstrate an average accuracy increase of 2.16% and a latency reduction of 43% on the GAP9 microcontroller compared to a baseline frame-by-frame inference scheme using exclusively full-resolution images. Code available at: https://github.com/Bomps4/Multi_Resolution_Rescored_ByteTrack
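The interleaving-plus-rescoring idea from the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the detector and tracker are stubbed callables, the `SCHEDULE` encodes the paper's one-high-resolution-to-two-low-resolution ratio, and the `Rescorer` here is a simple per-track moving average of class confidences standing in for the paper's probabilistic Rescore algorithm.

```python
from collections import defaultdict, deque

# Interleaving schedule from the paper: one 320x320 frame followed by
# two 192x192 frames (illustrative sketch; detector/tracker are stubs).
SCHEDULE = [320, 192, 192]

class Rescorer:
    """Accumulates per-track class scores over a temporal window and
    reassigns each track the class with the highest averaged confidence,
    correcting one-off misclassifications on down-sized frames."""
    def __init__(self, window=5):
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, track_id, class_scores):
        self.history[track_id].append(class_scores)
        n = len(self.history[track_id])
        avg = {}
        for scores in self.history[track_id]:
            for cls, s in scores.items():
                avg[cls] = avg.get(cls, 0.0) + s / n
        return max(avg, key=avg.get)

def run(frames, detector, tracker, rescorer):
    """Alternate input resolutions per SCHEDULE, track detections over
    time, and return the rescored class label per track id."""
    labels = {}
    for i, frame in enumerate(frames):
        size = SCHEDULE[i % len(SCHEDULE)]
        detections = detector(frame, size)   # list of (box, class_scores)
        tracks = tracker(detections)         # list of (track_id, class_scores)
        for tid, scores in tracks:
            labels[tid] = rescorer.update(tid, scores)
    return labels
```

A track detected as "car" twice but misclassified once keeps the "car" label, since the averaged confidence dominates the single wrong frame.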
Related papers
- Learning to Make Keypoints Sub-Pixel Accurate [80.55676599677824]
This work addresses the challenge of sub-pixel accuracy in detecting 2D local features.
We propose a novel network that enhances any detector with sub-pixel precision by learning an offset vector for detected features.
arXiv Detail & Related papers (2024-07-16T12:39:56Z) - LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking [12.670730236928353]
We propose the Low-Resolution Transformer Tracker (LoReTrack).
LoReTrack with a 256x256 resolution consistently improves baseline with the same resolution, and shows competitive or even better results compared to 384x384 high-resolution Transformer tracker.
With a 128x128 resolution, it runs 25 fps on a CPU with 64.9%/46.4% SUC scores on LaSOT/LaSOText, surpassing all other CPU real-time trackers.
arXiv Detail & Related papers (2024-05-27T21:19:04Z) - Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing [0.7305342793164903]
We propose a model simplification method for two-stage object detectors.
Our method reduces computation costs by up to 61.2% with an accuracy loss within 2.1% on the DOTAv1.5 dataset.
arXiv Detail & Related papers (2024-04-11T00:45:10Z) - Practical cross-sensor color constancy using a dual-mapping strategy [0.0]
The proposed method uses a dual-mapping strategy and only requires a simple white point from a test sensor under a D65 condition.
In the second mapping phase, we transform the re-constructed image data into sparse features, which are then optimized with a lightweight multi-layer perceptron (MLP) model.
This approach effectively reduces sensor discrepancies and delivers performance on par with leading cross-sensor methods.
arXiv Detail & Related papers (2023-11-20T13:58:59Z) - ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box [81.45219802386444]
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects across video frames.
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes.
In 3D scenarios, it is much easier for the tracker to predict object velocities in world coordinates.
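The hierarchical association described in this summary can be sketched as a two-pass matcher. This is an illustrative sketch of the ByteTrack-style idea, not the paper's code: greedy IoU matching stands in for the Hungarian assignment used in practice, and `score_split` is a hypothetical threshold.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, thresh=0.5):
    """Greedy IoU matching; returns matched pairs and leftover tracks."""
    matches, free_tracks = [], list(tracks)
    for det in detections:
        best = max(free_tracks, key=lambda t: iou(t, det), default=None)
        if best is not None and iou(best, det) >= thresh:
            matches.append((best, det))
            free_tracks.remove(best)
    return matches, free_tracks

def hierarchical_association(tracks, boxes, score_split=0.6):
    # First pass: match high-confidence detections to existing tracks.
    high = [b for b, s in boxes if s >= score_split]
    low = [b for b, s in boxes if s < score_split]
    matched, remaining = associate(tracks, high)
    # Second pass: mine low-score boxes for the leftover tracks,
    # recovering occluded objects instead of discarding them.
    matched_low, _ = associate(remaining, low)
    return matched + matched_low
```

The second pass is the key point of the blurb: low-score boxes are not thrown away, but matched against tracks left unmatched after the first pass.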
arXiv Detail & Related papers (2023-03-27T15:35:21Z) - SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z) - Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation [93.80710126516405]
We propose a novel lightweight ORSI-SOD solution, named CorrNet, to address these issues.
By reducing the parameters and computations of each component, CorrNet ends up having only 4.09M parameters and running with 21.09G FLOPs.
Experimental results on two public datasets demonstrate that our lightweight CorrNet achieves competitive or even better performance compared with 26 state-of-the-art methods.
arXiv Detail & Related papers (2022-01-20T08:28:01Z) - DPNET: Dual-Path Network for Efficient Object Detection with Lightweight Self-Attention [16.13989397708127]
DPNet is a dual path network for efficient object detection with lightweight self-attention.
It achieves 29.0% AP on COCO dataset, with only 1.14 GFLOPs and 2.27M model size for a 320x320 image.
arXiv Detail & Related papers (2021-10-31T13:38:16Z) - You Better Look Twice: a new perspective for designing accurate detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computations by separating objects from the background using a very lightweight first stage.
Resulting image proposals are then processed in the second-stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z) - PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection [100.60209139039472]
We propose Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for accurate 3D detection from point clouds.
Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.