Newton-PnP: Real-time Visual Navigation for Autonomous Toy-Drones
- URL: http://arxiv.org/abs/2203.02686v1
- Date: Sat, 5 Mar 2022 09:00:50 GMT
- Title: Newton-PnP: Real-time Visual Navigation for Autonomous Toy-Drones
- Authors: Ibrahim Jubran, Fares Fares, Yuval Alfassi, Firas Ayoub, Dan Feldman
- Abstract summary: The Perspective-n-Point problem aims to estimate the relative pose between a calibrated monocular camera and a known 3D model.
We suggest an algorithm that runs on weak IoT devices in real-time but still provides provable guarantees for both running time and correctness.
Our main motivation was to turn the popular DJI's Tello Drone into an autonomous drone that navigates in an indoor environment with no external human/laptop/sensor.
- Score: 15.075691719756877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Perspective-n-Point problem aims to estimate the relative pose between a
calibrated monocular camera and a known 3D model, by aligning pairs of 2D
captured image points to their corresponding 3D points in the model. We suggest
an algorithm that runs on weak IoT devices in real-time but still provides
provable theoretical guarantees for both running time and correctness. Existing
solvers provide only one of these requirements. Our main motivation was to turn
the popular DJI's Tello Drone (<90gr, <$100) into an autonomous drone that
navigates in an indoor environment with no external human/laptop/sensor, by
simply attaching a Raspberry PI Zero (<9gr, <$25) to it. This tiny
micro-processor takes as input a real-time video from a tiny RGB camera, and
runs our PnP solver on-board. Extensive experimental results, open source code,
and a demonstration video are included.
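For context, the sketch below illustrates the plain PnP setup described in the abstract: given 2D-3D correspondences and camera intrinsics, recover the rotation and translation that minimize reprojection error. It is a minimal illustration using OpenCV's generic solvePnP solver, not the paper's Newton-PnP algorithm, and all point coordinates and intrinsics are invented placeholders.

```python
# Minimal PnP sketch (OpenCV's generic solver, NOT the paper's Newton-PnP):
# recover the camera pose (R, t) from known 3D model points and their
# detected 2D projections. All numeric values below are placeholders.
import numpy as np
import cv2

# 3D model points in the world frame (e.g., landmarks measured in meters).
object_points = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.0],
    [0.25, 0.25, 0.3],
    [0.1, 0.4, 0.2],
], dtype=np.float64)

# Matching 2D detections in the current camera frame, in pixels.
image_points = np.array([
    [320.0, 240.0],
    [400.0, 238.0],
    [402.0, 310.0],
    [322.0, 312.0],
    [361.0, 270.0],
    [335.0, 295.0],
], dtype=np.float64)

# Intrinsics of the calibrated monocular camera (placeholder focal length/center).
K = np.array([[450.0,   0.0, 320.0],
              [  0.0, 450.0, 240.0],
              [  0.0,   0.0,   1.0]], dtype=np.float64)
dist_coeffs = np.zeros(5)  # assume the image is already undistorted

# Estimate the pose that minimizes the 2D reprojection error.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs,
                              flags=cv2.SOLVEPNP_ITERATIVE)
if ok:
    R, _ = cv2.Rodrigues(rvec)   # rotation matrix (model frame -> camera frame)
    print("R =\n", R)
    print("t =", tvec.ravel())   # translation of the model origin in camera frame
```

In the paper's setting, a loop of this kind would run on-board the Raspberry Pi over each incoming video frame, with the Newton-based solver in place of solvePnP.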
Related papers
- EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams [59.77837807004765]
This paper introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens.
Event streams have high temporal resolution and provide reliable cues for 3D human motion capture under high-speed human motions and rapidly changing illumination.
Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions while supporting real-time 3D pose update rates of 140 Hz.
arXiv Detail & Related papers (2024-04-12T17:59:47Z)
- BAA-NGP: Bundle-Adjusting Accelerated Neural Graphics Primitives [6.431806897364565]
Implicit neural representations have become pivotal in robotic perception, enabling robots to comprehend 3D environments from 2D images.
We propose a framework called bundle-adjusting accelerated neural graphics primitives (BAA-NGP).
Results demonstrate a 10-20x speed improvement compared to other bundle-adjusting neural radiance field methods.
arXiv Detail & Related papers (2023-06-07T05:36:45Z)
- External Camera-based Mobile Robot Pose Estimation for Collaborative Perception with Smart Edge Sensors [22.5939915003931]
We present an approach for estimating a mobile robot's pose w.r.t. the allocentric coordinates of a network of static cameras using multi-view RGB images.
The images are processed online, locally on smart edge sensors by deep neural networks to detect the robot.
With the robot's pose precisely estimated, its observations can be fused into the allocentric scene model.
arXiv Detail & Related papers (2023-03-07T11:03:33Z)
- TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos [57.92385818430939]
Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.
Existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices.
We propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency.
arXiv Detail & Related papers (2022-10-16T03:05:13Z)
- Deep Learning on Home Drone: Searching for the Optimal Architecture [54.535788447839884]
We suggest the first system that runs real-time semantic segmentation via deep learning on a weak micro-computer such as the Raspberry Pi Zero v2 attached to a toy-drone.
In particular, since the Raspberry Pi weighs less than $16$ grams and is half the size of a credit card, we could easily attach it to the common commercial DJI Tello toy-drone.
The result is an autonomous drone that can detect and classify objects in real-time from a video stream of an on-board monocular RGB camera.
arXiv Detail & Related papers (2022-09-21T11:41:45Z)
- Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
- Fast Autofocusing using Tiny Networks for Digital Holographic Microscopy [0.5057148335041798]
A deep learning (DL) solution is proposed that casts autofocusing as a regression problem; it is tested on both experimental and simulated holograms.
Experiments show that the predicted focusing distance $Z_R^{\mathrm{Pred}}$ is inferred with an accuracy of 1.2 $\mu$m.
Models reach state-of-the-art inference time on CPU, less than 25 ms per inference.
arXiv Detail & Related papers (2022-03-15T10:52:58Z)
- AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation [51.17610485589701]
We present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments.
AirPose estimates human pose and shape using images captured by multiple uncalibrated flying cameras.
AirPose itself calibrates the cameras relative to the person instead of relying on any pre-calibration.
arXiv Detail & Related papers (2022-01-20T09:46:20Z)
- FLEX: Parameter-free Multi-view 3D Human Motion Reconstruction [70.09086274139504]
Multi-view algorithms strongly depend on camera parameters, in particular, the relative positions among the cameras.
We introduce FLEX, an end-to-end parameter-free multi-view model.
We demonstrate results on the Human3.6M and KTH Multi-view Football II datasets.
arXiv Detail & Related papers (2021-05-05T09:08:12Z)