Newton-PnP: Real-time Visual Navigation for Autonomous Toy-Drones
- URL: http://arxiv.org/abs/2203.02686v1
- Date: Sat, 5 Mar 2022 09:00:50 GMT
- Title: Newton-PnP: Real-time Visual Navigation for Autonomous Toy-Drones
- Authors: Ibrahim Jubran, Fares Fares, Yuval Alfassi, Firas Ayoub, Dan Feldman
- Abstract summary: The Perspective-n-Point problem aims to estimate the relative pose between a calibrated monocular camera and a known 3D model.
We suggest an algorithm that runs on weak IoT devices in real-time but still provides provable guarantees for both running time and correctness.
Our main motivation was to turn the popular DJI's Tello Drone into an autonomous drone that navigates in an indoor environment with no external human/laptop/sensor.
- Score: 15.075691719756877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Perspective-n-Point problem aims to estimate the relative pose between a
calibrated monocular camera and a known 3D model, by aligning pairs of 2D
captured image points to their corresponding 3D points in the model. We suggest
an algorithm that runs on weak IoT devices in real-time but still provides
provable theoretical guarantees for both running time and correctness. Existing
solvers provide only one of these requirements. Our main motivation was to turn
the popular DJI's Tello Drone (<90gr, <$100) into an autonomous drone that
navigates in an indoor environment with no external human/laptop/sensor, by
simply attaching a Raspberry PI Zero (<9gr, <$25) to it. This tiny
micro-processor takes as input a real-time video from a tiny RGB camera, and
runs our PnP solver on-board. Extensive experimental results, open source code,
and a demonstration video are included.
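For context, the sketch below illustrates the plain PnP setup described in the abstract: given 2D-3D correspondences and camera intrinsics, recover the rotation and translation that minimize reprojection error. It is a minimal illustration using OpenCV's generic solvePnP solver, not the paper's Newton-PnP algorithm, and all point coordinates and intrinsics are invented placeholders.

```python
# Minimal PnP sketch (OpenCV's generic solver, NOT the paper's Newton-PnP):
# recover the camera pose (R, t) from known 3D model points and their
# detected 2D projections. All numeric values below are placeholders.
import numpy as np
import cv2

# 3D model points in the world frame (e.g., landmarks measured in meters).
object_points = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.0],
    [0.25, 0.25, 0.3],
    [0.1, 0.4, 0.2],
], dtype=np.float64)

# Matching 2D detections in the current camera frame, in pixels.
image_points = np.array([
    [320.0, 240.0],
    [400.0, 238.0],
    [402.0, 310.0],
    [322.0, 312.0],
    [361.0, 270.0],
    [335.0, 295.0],
], dtype=np.float64)

# Intrinsics of the calibrated monocular camera (placeholder focal length/center).
K = np.array([[450.0,   0.0, 320.0],
              [  0.0, 450.0, 240.0],
              [  0.0,   0.0,   1.0]], dtype=np.float64)
dist_coeffs = np.zeros(5)  # assume the image is already undistorted

# Estimate the pose that minimizes the 2D reprojection error.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs,
                              flags=cv2.SOLVEPNP_ITERATIVE)
if ok:
    R, _ = cv2.Rodrigues(rvec)   # rotation matrix (model frame -> camera frame)
    print("R =\n", R)
    print("t =", tvec.ravel())   # translation of the model origin in camera frame
```

In the paper's setting, a loop of this kind would run on-board the Raspberry Pi over each incoming video frame, with the Newton-based solver in place of solvePnP.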
Related papers
- EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams [59.77837807004765]
This paper introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens.
Event streams have high temporal resolution and provide reliable cues for 3D human motion capture under high-speed human motions and rapidly changing illumination.
Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions while supporting real-time 3D pose update rates of 140 Hz.
arXiv Detail & Related papers (2024-04-12T17:59:47Z)
- BAA-NGP: Bundle-Adjusting Accelerated Neural Graphics Primitives [6.431806897364565]
Implicit neural representations have become pivotal in robotic perception, enabling robots to comprehend 3D environments from 2D images.
We propose a framework called bundle-adjusting accelerated neural graphics primitives (BAA-NGP).
Results demonstrate a 10-20x speed improvement compared to other bundle-adjusting neural radiance field methods.
arXiv Detail & Related papers (2023-06-07T05:36:45Z)
- External Camera-based Mobile Robot Pose Estimation for Collaborative Perception with Smart Edge Sensors [22.5939915003931]
We present an approach for estimating a mobile robot's pose w.r.t. the allocentric coordinates of a network of static cameras using multi-view RGB images.
The images are processed online, locally on smart edge sensors by deep neural networks to detect the robot.
With the robot's pose precisely estimated, its observations can be fused into the allocentric scene model.
arXiv Detail & Related papers (2023-03-07T11:03:33Z)
- TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos [57.92385818430939]
Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.
Existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices.
We propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency.
arXiv Detail & Related papers (2022-10-16T03:05:13Z)
- Deep Learning on Home Drone: Searching for the Optimal Architecture [54.535788447839884]
We suggest the first system that runs real-time semantic segmentation via deep learning on a weak micro-computer such as the Raspberry Pi Zero v2 attached to a toy-drone.
In particular, since the Raspberry Pi weighs less than $16$ grams and is half the size of a credit card, we could easily attach it to the common commercial DJI Tello toy-drone.
The result is an autonomous drone that can detect and classify objects in real-time from a video stream of an on-board monocular RGB camera.
arXiv Detail & Related papers (2022-09-21T11:41:45Z)
- Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
- Fast Autofocusing using Tiny Networks for Digital Holographic Microscopy [0.5057148335041798]
A deep learning (DL) solution is proposed that casts autofocusing as a regression problem; it is tested on both experimental and simulated holograms.
Experiments show that the predicted focusing distance $Z_R^{\mathrm{Pred}}$ is inferred with an accuracy of 1.2 $\mu$m.
Models reach state-of-the-art inference time on CPU, less than 25 ms per inference.
arXiv Detail & Related papers (2022-03-15T10:52:58Z)
- AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation [51.17610485589701]
We present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments.
AirPose estimates human pose and shape using images captured by multiple uncalibrated flying cameras.
AirPose itself calibrates the cameras relative to the person instead of relying on any pre-calibration.
arXiv Detail & Related papers (2022-01-20T09:46:20Z)
- FLEX: Parameter-free Multi-view 3D Human Motion Reconstruction [70.09086274139504]
Multi-view algorithms strongly depend on camera parameters, in particular, the relative positions among the cameras.
We introduce FLEX, an end-to-end parameter-free multi-view model.
We demonstrate results on the Human3.6M and KTH Multi-view Football II datasets.
arXiv Detail & Related papers (2021-05-05T09:08:12Z)