A 96pJ/Frame/Pixel and 61pJ/Event Anti-UAV System with Hybrid Object Tracking Modes
- URL: http://arxiv.org/abs/2512.17939v1
- Date: Fri, 12 Dec 2025 13:53:38 GMT
- Title: A 96pJ/Frame/Pixel and 61pJ/Event Anti-UAV System with Hybrid Object Tracking Modes
- Authors: Yuncheng Lu, Yucen Shi, Aobo Li, Zehao Li, Junying Li, Bo Wang, Tony Tae-Hyoung Kim
- Abstract summary: We present an energy-efficient anti-UAV system that integrates frame-based and event-driven object tracking. The 2 mm^2 chip achieves 96 pJ per frame per pixel and 61 pJ per event at 0.8 V, and reaches 98.2 percent recognition accuracy on public UAV datasets.
- Score: 5.593237736175593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an energy-efficient anti-UAV system that integrates frame-based and event-driven object tracking to enable reliable detection of small and fast-moving drones. The system reconstructs binary event frames using run-length encoding, generates region proposals, and adaptively switches between frame mode and event mode based on object size and velocity. A Fast Object Tracking Unit improves robustness for high-speed targets through adaptive thresholding and trajectory-based classification. The neural processing unit supports both grayscale-patch and trajectory inference with a custom instruction set and a zero-skipping MAC architecture, reducing redundant neural computations by more than 97 percent. Implemented in 40 nm CMOS technology, the 2 mm^2 chip achieves 96 pJ per frame per pixel and 61 pJ per event at 0.8 V, and reaches 98.2 percent recognition accuracy on public UAV datasets at ranges of 50 to 400 m and target speeds of 5 to 80 pixels per second. The results demonstrate state-of-the-art end-to-end energy efficiency for anti-UAV systems.
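The pipeline above is implemented in dedicated hardware, but its logic can be illustrated in software. The sketch below shows a run-length decode of one binary event-frame row, a size/velocity mode switch, and a zero-skipping multiply-accumulate; the RLE convention, threshold values, and function names are illustrative assumptions, not the chip's documented design.

```python
import numpy as np

def decode_rle_row(runs, width):
    """Expand one run-length-encoded row into a binary event row.
    Assumes alternating zero/one run lengths starting with zeros
    (a common RLE convention; the chip's actual format may differ)."""
    row = np.zeros(width, dtype=np.uint8)
    pos, value = 0, 0
    for length in runs:
        if value:
            row[pos:pos + length] = 1
        pos += length
        value ^= 1
    return row

def select_mode(bbox_wh, speed_px_s, size_thresh=24, speed_thresh=40):
    """Frame mode for large, slow targets; event mode for small, fast
    ones. Threshold values are placeholders, not the paper's."""
    w, h = bbox_wh
    if max(w, h) >= size_thresh and speed_px_s <= speed_thresh:
        return "frame"
    return "event"

def zero_skip_mac(activations, weights):
    """Accumulate products only where the activation is nonzero,
    a software analogue of the zero-skipping MAC."""
    nz = np.flatnonzero(activations)
    return float(activations[nz] @ weights[nz])

row = decode_rle_row([5, 3, 120, 2, 126], width=256)   # 5+3+120+2+126 = 256
mode = select_mode(bbox_wh=(8, 6), speed_px_s=65.0)    # small and fast -> "event"
```

On sparse event frames most activations are zero, which is what makes the claimed reduction of more than 97 percent in redundant MAC operations plausible.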
Related papers
- Commercial Vehicle Braking Optimization: A Robust SIFT-Trajectory Approach [6.751326589596112]
A vision-based trajectory analysis solution is proposed to address the "zero-speed braking" issue caused by inaccurate Controller Area Network (CAN) signals.
The algorithm runs on the NVIDIA Jetson AGX Xavier platform and processes sequential video frames from a blind-spot camera.
On-site deployment shows an 89% reduction in false braking events, a 100% success rate in emergency braking, and a fault rate below 5%.
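The blurb does not spell out the trajectory analysis, but the title's SIFT-trajectory idea can be sketched: match SIFT keypoints between consecutive frames and use their median displacement as an ego-motion proxy to cross-check a zero-speed CAN reading. Everything here (the OpenCV pipeline, ratio test, and threshold) is an illustrative reconstruction, not the paper's implementation.

```python
import cv2
import numpy as np

def median_pixel_motion(prev_gray, curr_gray, ratio=0.75):
    """Median displacement of ratio-test-filtered SIFT matches between
    two grayscale frames, a rough proxy for vehicle ego-motion."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(prev_gray, None)
    kp2, des2 = sift.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return None
    disp = []
    for pair in cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            p1 = np.asarray(kp1[pair[0].queryIdx].pt)
            p2 = np.asarray(kp2[pair[0].trainIdx].pt)
            disp.append(np.linalg.norm(p2 - p1))
    return float(np.median(disp)) if disp else None

def zero_speed_is_plausible(can_speed, motion_px, still_thresh=0.5):
    """Distrust a zero CAN speed if the camera still sees clear motion
    (threshold in pixels per frame is illustrative)."""
    return not (can_speed == 0 and motion_px is not None and motion_px > still_thresh)
```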
arXiv Detail & Related papers (2025-12-21T05:06:16Z)
- Event-Based Visual Teach-and-Repeat via Fast Fourier-Domain Cross-Correlation [52.46888249268445]
We present the first event-camera-based visual teach-and-repeat system.
We develop a frequency-domain cross-correlation framework that turns the event-stream matching problem into computationally efficient Fourier-space multiplications.
Experiments using a Prophesee EVK4 HD event camera mounted on an AgileX Scout Mini robot demonstrate successful autonomous navigation.
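The core trick, correlating teach-pass and repeat-pass event frames by multiplying in the Fourier domain, is standard signal processing and easy to sketch. The snippet below assumes event streams already reconstructed into equally sized 2-D frames; it is not the authors' code.

```python
import numpy as np

def fft_cross_correlate(a, b):
    """2-D cross-correlation via the convolution theorem: a dense
    correlation over all shifts costs one FFT pair plus an element-wise
    product, O(N log N) instead of O(N^2)."""
    assert a.shape == b.shape
    return np.fft.irfft2(np.fft.rfft2(a) * np.conj(np.fft.rfft2(b)), s=a.shape)

def best_shift(teach_frame, repeat_frame):
    """Shift (dy, dx) maximizing the correlation; the circular FFT wraps
    indices, so map them back to signed shifts."""
    corr = fft_cross_correlate(teach_frame, repeat_frame)
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    return (dy - h if dy > h // 2 else dy, dx - w if dx > w // 2 else dx)
```

The recovered lateral shift is the kind of signal a teach-and-repeat controller would feed into its steering correction.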
arXiv Detail & Related papers (2025-09-21T23:53:31Z)
- Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking [49.07982079554859]
Transformer-based visual trackers have demonstrated significant advancements due to their powerful modeling capabilities.
However, their practicality on resource-constrained devices is limited by slow processing speeds.
We present HiT, a novel family of efficient tracking models that achieve high performance while maintaining fast operation across various devices.
arXiv Detail & Related papers (2025-06-25T12:46:46Z)
- Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving [3.617580194719686]
This paper introduces Fast-COS, a novel single-stage object detection framework crafted specifically for driving scenes.
Its RAViT backbone achieves 81.4% Top-1 accuracy on the ImageNet-1K dataset.
Fast-COS surpasses leading models in efficiency, delivering up to 75.9% faster GPU inference and 1.38x higher throughput on edge devices.
arXiv Detail & Related papers (2025-02-11T09:54:09Z)
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy on computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- Gesture Recognition for FMCW Radar on the Edge [0.0]
We show that gestures can be characterized efficiently by a set of five features.
A recurrent neural network (RNN) based architecture exploits these features to jointly detect and classify five different gestures.
The proposed system recognizes gestures with an F1 score of 98.4% on our hold-out test dataset.
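A joint detect-and-classify RNN over a handful of per-frame features is compact enough to sketch. Below, the five input features, hidden size, and the extra "no gesture" class are illustrative assumptions expressed in PyTorch, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GestureRNN(nn.Module):
    """GRU over per-frame radar features (e.g. range, Doppler, azimuth,
    elevation, magnitude; placeholder names), with one extra output
    class for 'no gesture' so detection and classification are joint."""
    def __init__(self, n_features=5, hidden=32, n_gestures=5):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_gestures + 1)

    def forward(self, x):             # x: (batch, time, n_features)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])  # logits: 5 gestures + background

logits = GestureRNN()(torch.randn(2, 64, 5))  # 2 sequences of 64 frames
```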
arXiv Detail & Related papers (2023-10-13T06:03:07Z)
- ColibriUAV: An Ultra-Fast, Energy-Efficient Neuromorphic Edge Processing UAV-Platform with Event-Based and Frame-Based Cameras [14.24529561007139]
ColibriUAV is a UAV platform with both frame-based and event-based camera interfaces.
Kraken is capable of efficiently processing both event data from a DVS camera and frame data from an RGB camera.
This paper benchmarks the end-to-end latency and power efficiency of the neuromorphic and event-based UAV subsystem.
arXiv Detail & Related papers (2023-05-27T23:08:22Z)
- Ultra-low Power Deep Learning-based Monocular Relative Localization Onboard Nano-quadrotors [64.68349896377629]
This work presents a novel autonomous end-to-end system that addresses monocular relative localization of two peer nano-drones through deep neural networks (DNNs).
To cope with the ultra-constrained nano-drone platform, we propose a vertically-integrated framework including dataset augmentation, quantization, and system optimizations.
Experimental results show that our DNN can precisely localize a 10 cm target nano-drone at distances of up to 2 m using only low-resolution monochrome images.
arXiv Detail & Related papers (2023-03-03T14:14:08Z)
- Fast Motion Understanding with Spatiotemporal Neural Networks and Dynamic Vision Sensors [99.94079901071163]
This paper presents a Dynamic Vision Sensor (DVS) based system for reasoning about high speed motion.
We consider the case of a robot at rest reacting to a small, fast-approaching object at speeds higher than 15 m/s.
For a toy dart moving at 23.4 m/s, our system achieves a 24.73° error in $\theta$, an 18.4 mm average discretized-radius prediction error, and a 25.03% median time-to-collision prediction error.
arXiv Detail & Related papers (2020-11-18T17:55:07Z)
- CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs [41.43273142203345]
We harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions.
With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB.
Our model achieves 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters, 20.9x smaller yet 10% more accurate than Tiny-YOLO.
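The deformable convolutions at the heart of this pipeline can be exercised with torchvision's stock operator. CoDeNet modifies the operation for FPGA friendliness, so this is only the baseline building block, with illustrative channel sizes.

```python
import torch
from torchvision.ops import deform_conv2d

# A 3x3 deformable convolution: a small side branch predicts per-location
# (dy, dx) sampling offsets for each of the 9 kernel taps.
x = torch.randn(1, 16, 32, 32)
offset_branch = torch.nn.Conv2d(16, 2 * 3 * 3, kernel_size=3, padding=1)
weight = torch.randn(32, 16, 3, 3)            # out_ch, in_ch, kh, kw

offset = offset_branch(x)                     # (1, 18, 32, 32)
y = deform_conv2d(x, offset, weight, padding=1)
print(y.shape)                                # torch.Size([1, 32, 32, 32])
```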
arXiv Detail & Related papers (2020-06-12T17:56:47Z)
- Near-chip Dynamic Vision Filtering for Low-Bandwidth Pedestrian Detection [99.94079901071163]
This paper presents a novel end-to-end system for pedestrian detection using Dynamic Vision Sensors (DVSs).
We target applications where multiple sensors transmit data to a local processing unit, which executes a detection algorithm.
Our detector is able to perform a detection every 450 ms, with an overall testing F1 score of 83%.
arXiv Detail & Related papers (2020-04-03T17:36:26Z)