Related papers: Deep Learning Computer Vision Algorithms for Real-time UAVs On-board Camera Image Processing

URL: http://arxiv.org/abs/2211.01037v1
Date: Wed, 2 Nov 2022 11:10:42 GMT
Title: Deep Learning Computer Vision Algorithms for Real-time UAVs On-board Camera Image Processing
Authors: Alessandro Palmas, Pietro Andronico
Abstract summary: This paper describes how advanced deep learning based computer vision algorithms are applied to enable real-time on-board sensor processing for small UAVs. All algorithms have been developed using state-of-the-art image processing methods based on deep neural networks.
Score: 77.34726150561087
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper describes how advanced deep learning based computer vision algorithms are applied to enable real-time on-board sensor processing for small UAVs. Four use cases are considered: target detection, classification and localization; road segmentation for autonomous navigation in GNSS-denied zones; human body segmentation; and human action recognition. All algorithms have been developed using state-of-the-art image processing methods based on deep neural networks. Acquisition campaigns have been carried out to collect custom datasets reflecting typical operational scenarios, where the peculiar point of view of a multi-rotor UAV is replicated. Algorithm architectures and trained model performance are reported, showing high levels of both accuracy and inference speed. Output examples and on-field videos are presented, demonstrating model operation when deployed on a GPU-powered commercial embedded device (NVIDIA Jetson Xavier) mounted on board a custom quad-rotor, paving the way to high-level autonomy.

Related papers

Real-Time Navigation for Autonomous Aerial Vehicles Using Video [11.414350041043326]
We introduce a novel Markov Decision Process (MDP) framework to reduce the workload of Computer Vision (CV) algorithms. We apply our proposed framework to both feature-based and neural-network-based object-detection tasks. These holistic tests show significant benefits in energy consumption and speed with only a limited loss in accuracy.
arXiv Detail & Related papers (2025-04-01T01:14:42Z)
Toward a Low-Cost Perception System in Autonomous Vehicles: A Spectrum Learning Approach [19.23732332126651]
We introduce a novel pixel positional encoding algorithm inspired by Bartlett's spatial spectrum estimation technique. Our method effectively leverages high-resolution camera images to train radar depth map generative models. Our results demonstrate that our approach also outperforms the state-of-the-art (SOTA) by 27.95% in terms of Unidirectional Chamfer Distance (UCD).
arXiv Detail & Related papers (2025-02-04T02:20:52Z)
Semantic Segmentation of Unmanned Aerial Vehicle Remote Sensing Images using SegFormer [0.14999444543328289]
This paper evaluates the effectiveness and efficiency of SegFormer, a semantic segmentation framework, for the semantic segmentation of UAV images. SegFormer variants, ranging from real-time (B0) to high-performance (B5) models, are assessed using the UAVid dataset tailored for semantic segmentation tasks. Experimental results showcase the model's performance on the benchmark dataset, highlighting its ability to accurately delineate objects and land cover features in diverse UAV scenarios.
arXiv Detail & Related papers (2024-10-01T21:40:15Z)
TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes [58.180556221044235]
We present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception. Our formulation is designed for dynamic scenes, consisting of small moving objects or human actions. We evaluate its performance on challenging datasets, including Okutama Action and UG2.
arXiv Detail & Related papers (2024-05-04T21:55:33Z)
High Speed Human Action Recognition using a Photonic Reservoir Computer [1.7403133838762443]
We introduce a new training method for the reservoir computer, based on "Timesteps Of Interest". We solve the task with high accuracy and speed, to the point of allowing for processing multiple video streams in real time.
arXiv Detail & Related papers (2023-05-24T16:04:42Z)
Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner. Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping. Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos. Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras. We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
VideoPose: Estimating 6D object pose from videos [14.210010379733017]
We introduce a simple yet effective algorithm that uses convolutional neural networks to directly estimate object poses from videos. Our proposed network takes a pre-trained 2D object detector as input, and aggregates visual features through a recurrent neural network to make predictions at each frame. Experimental evaluation on the YCB-Video dataset shows that our approach is on par with the state-of-the-art algorithms.
arXiv Detail & Related papers (2021-11-20T20:57:45Z)
Polyline Based Generative Navigable Space Segmentation for Autonomous Visual Navigation [57.3062528453841]
We propose a representation-learning-based framework to enable robots to learn the navigable space segmentation in an unsupervised manner. We show that the proposed PSV-Nets can learn the visual navigable space with high accuracy, even without any single label.
arXiv Detail & Related papers (2021-10-29T19:50:48Z)
Deep Direct Volume Rendering: Learning Visual Feature Mappings From Exemplary Images [57.253447453301796]
We introduce Deep Direct Volume Rendering (DeepDVR), a generalization of Direct Volume Rendering (DVR) that allows for the integration of deep neural networks into the DVR algorithm. We conceptualize the rendering in a latent color space, thus enabling the use of deep architectures to learn implicit mappings for feature extraction and classification. Our generalization serves to derive novel volume rendering architectures that can be trained end-to-end directly from examples in image space.
arXiv Detail & Related papers (2021-06-09T23:03:00Z)
On Deep Learning Techniques to Boost Monocular Depth Estimation for Autonomous Navigation [1.9007546108571112]
Inferring the depth of images is a fundamental inverse problem within the field of Computer Vision. We propose a new lightweight and fast supervised CNN architecture combined with novel feature extraction models. We also introduce an efficient surface normals module, jointly with a simple geometric 2.5D loss function, to solve SIDE problems.
arXiv Detail & Related papers (2020-10-13T18:37:38Z)
Deep Learning Based Vehicle Tracking System Using License Plate Detection And Recognition [0.0]
The proposed system takes a novel approach to vehicle tracking based on vehicle license plate detection and recognition (OCR). Results were obtained at a speed of 30 frames per second with accuracy close to that of a human.
arXiv Detail & Related papers (2020-05-10T14:03:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.