GoToNet: Fast Monocular Scene Exposure and Exploration
- URL: http://arxiv.org/abs/2206.05967v1
- Date: Mon, 13 Jun 2022 08:28:31 GMT
- Title: GoToNet: Fast Monocular Scene Exposure and Exploration
- Authors: Tom Avrech, Evgenii Zheltonozhskii, Chaim Baskin, Ehud Rivlin
- Abstract summary: We present a novel method for real-time environment exploration.
Our method requires only one look (image) to make a good tactical decision.
Two direction predictions, characterized by pixels dubbed the Goto and Lookat pixels, comprise the core of our method.
- Score: 0.6204265638103346
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Autonomous scene exposure and exploration, especially in localization- or
communication-denied areas, which is useful for finding targets in unknown scenes,
remains a challenging problem in computer navigation. In this work, we present
a novel method for real-time environment exploration, whose only requirements
are a visually similar dataset for pre-training, enough lighting in the scene,
and an on-board forward-looking RGB camera for environmental sensing. As
opposed to existing methods, our method requires only one look (image) to make
a good tactical decision, and therefore runs in constant time that does not grow
as exploration proceeds.
Two direction predictions, characterized by pixels dubbed the Goto and Lookat
pixels, comprise the core of our method. These pixels encode the recommended
flight instructions in the following way: the Goto pixel defines the direction
in which the agent should move by one distance unit, and the Lookat pixel
defines the direction in which the camera should point in the next
step. These flying-instruction pixels are optimized to expose the largest
amount of currently unexplored areas.
Our work presents a novel deep learning-based navigation approach that is
able to solve this problem and demonstrates its ability in an even more
complicated setup, i.e., when computational power is limited. In addition, we
propose a way to generate a navigation-oriented dataset, enabling efficient
training of our method using RGB and depth images. Tests conducted in a
simulator, evaluating both the inference of the sparse pixels' coordinates and
2D and 3D test flights aimed at unveiling areas and decreasing distances to
targets, achieve promising results. Comparison against a state-of-the-art
algorithm shows that our method outperforms it, as measured by the new voxels
per camera pose, minimum distance to target, percentage of surface voxels seen,
and compute time metrics.
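As a rough illustration of how the predicted Goto and Lookat pixels could be turned into flight instructions, the sketch below back-projects each pixel through an assumed pinhole camera model into a unit direction: the Goto ray, scaled to one distance unit, gives the next move, and the Lookat ray gives the next camera heading. The intrinsics, function names, and step handling are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the paper's code): convert predicted Goto/Lookat pixels
# into a one-unit move vector and a gaze direction, assuming a simple pinhole
# camera with known intrinsics. All names and values are illustrative.
import numpy as np

def pixel_to_ray(pixel_uv, fx, fy, cx, cy):
    """Back-project a pixel (u, v) to a unit ray in the camera frame."""
    u, v = pixel_uv
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return ray / np.linalg.norm(ray)

def flight_instruction(goto_uv, lookat_uv, intrinsics, step_size=1.0):
    """Turn the two predicted pixels into a move vector and a gaze direction.

    goto_uv   -- pixel toward which the agent should move one distance unit
    lookat_uv -- pixel the camera should point at in the next step
    """
    fx, fy, cx, cy = intrinsics
    move_dir = pixel_to_ray(goto_uv, fx, fy, cx, cy)    # direction of the next step
    gaze_dir = pixel_to_ray(lookat_uv, fx, fy, cx, cy)  # desired camera heading
    return step_size * move_dir, gaze_dir

if __name__ == "__main__":
    # Example with made-up intrinsics for a 640x480 camera.
    intrinsics = (525.0, 525.0, 320.0, 240.0)
    step, gaze = flight_instruction((400, 210), (350, 200), intrinsics)
    print("move by:", step, "then look along:", gaze)
```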
Related papers
- SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images [50.742420049839474]
'SaccadeDet' is an innovative architecture for gigapixel-level object detection, inspired by the human eye saccadic movement.
Our approach, evaluated on the PANDA dataset, achieves an 8x speed increase over the state-of-the-art methods.
It also demonstrates significant potential in gigapixel-level pathology analysis through its application to Whole Slide Imaging.
arXiv Detail & Related papers (2024-07-25T11:22:54Z) - EventTransAct: A video transformer-based framework for Event-camera
based action recognition [52.537021302246664]
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z) - R-C-P Method: An Autonomous Volume Calculation Method Using Image
Processing and Machine Vision [0.0]
Two cameras were used to measure the dimensions of a rectangular object in real-time.
The R-C-P method is developed using image processing and edge detection.
In addition to the surface areas, the R-C-P method also detects discontinuous edges or volumes.
arXiv Detail & Related papers (2023-08-19T15:39:27Z) - On the Generation of a Synthetic Event-Based Vision Dataset for
Navigation and Landing [69.34740063574921]
This paper presents a methodology for generating event-based vision datasets from optimal landing trajectories.
We construct sequences of photorealistic images of the lunar surface with the Planet and Asteroid Natural Scene Generation Utility.
We demonstrate that the pipeline can generate realistic event-based representations of surface features by constructing a dataset of 500 trajectories.
arXiv Detail & Related papers (2023-08-01T09:14:20Z) - Depth Monocular Estimation with Attention-based Encoder-Decoder Network
from Single Image [7.753378095194288]
Vision-based approaches have recently received much attention and can overcome these drawbacks.
In this work, we explore an extreme scenario in vision-based settings: estimate a depth map from one monocular image severely plagued by grid artifacts and blurry edges.
Our novel approach can find the focus of current image with minimal overhead and avoid losses of depth features.
arXiv Detail & Related papers (2022-10-24T23:01:25Z) - VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images [90.60881721134656]
We propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT).
Experiments on KITTI Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values.
arXiv Detail & Related papers (2022-06-06T14:02:06Z) - Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge
TPU [58.720142291102135]
In this paper we propose a pose estimation software exploiting neural network architectures.
We show how low power machine learning accelerators could enable Artificial Intelligence exploitation in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z) - Memory-Augmented Reinforcement Learning for Image-Goal Navigation [67.3963444878746]
We present a novel method that leverages a cross-episode memory to learn to navigate.
In order to avoid overfitting, we propose to use data augmentation on the RGB input during training.
We obtain this competitive performance from RGB input only, without access to additional sensors such as position or depth.
arXiv Detail & Related papers (2021-01-13T16:30:20Z) - Agile Reactive Navigation for A Non-Holonomic Mobile Robot Using A Pixel
Processor Array [22.789108850681146]
This paper presents an agile reactive navigation strategy for driving a non-holonomic ground vehicle around a preset course of gates in a cluttered environment using a low-cost processor array sensor.
We demonstrate a small ground vehicle running through or avoiding multiple gates at high speed using minimal computational resources.
arXiv Detail & Related papers (2020-09-27T09:11:31Z) - Semantic sensor fusion: from camera to sparse lidar information [7.489722641968593]
This paper presents an approach to fuse different sensory information, Light Detection and Ranging (lidar) scans and camera images.
The transference of semantic information between the labelled image and the lidar point cloud is performed in four steps.
arXiv Detail & Related papers (2020-03-04T03:09:33Z)