GoToNet: Fast Monocular Scene Exposure and Exploration
- URL: http://arxiv.org/abs/2206.05967v1
- Date: Mon, 13 Jun 2022 08:28:31 GMT
- Title: GoToNet: Fast Monocular Scene Exposure and Exploration
- Authors: Tom Avrech, Evgenii Zheltonozhskii, Chaim Baskin, Ehud Rivlin
- Abstract summary: We present a novel method for real-time environment exploration.
Our method requires only one look (image) to make a good tactical decision.
Two direction predictions, characterized by pixels dubbed the Goto and Lookat pixels, comprise the core of our method.
- Score: 0.6204265638103346
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Autonomous scene exposure and exploration, especially in localization- or
communication-denied areas, which is useful for finding targets in unknown scenes,
remains a challenging problem in computer navigation. In this work, we present
a novel method for real-time environment exploration, whose only requirements
are a visually similar dataset for pre-training, enough lighting in the scene,
and an on-board forward-looking RGB camera for environmental sensing. As
opposed to existing methods, our method requires only one look (image) to make
a good tactical decision, and therefore runs in constant time that does not grow
as exploration proceeds.
Two direction predictions, characterized by pixels dubbed the Goto and Lookat
pixels, comprise the core of our method. These pixels encode the recommended
flight instructions in the following way: the Goto pixel defines the direction
in which the agent should move by one distance unit, and the Lookat pixel
defines the direction in which the camera should point in the next
step. These flying-instruction pixels are optimized to expose the largest
amount of currently unexplored areas.
Our work presents a novel deep learning-based navigation approach that is
able to solve this problem and demonstrates its ability in an even more
complicated setup, i.e., when computational power is limited. In addition, we
propose a way to generate a navigation-oriented dataset, enabling efficient
training of our method using RGB and depth images. Tests conducted in a
simulator, evaluating both the inference of the sparse pixels' coordinates and
2D and 3D test flights aimed at unveiling areas and decreasing distances to
targets, achieve promising results. Comparison against a state-of-the-art
algorithm shows that our method outperforms it, as measured by the new voxels
per camera pose, minimum distance to target, percentage of surface voxels seen,
and compute time metrics.
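As a rough illustration of how the predicted Goto and Lookat pixels could be turned into flight instructions, the sketch below back-projects each pixel through an assumed pinhole camera model into a unit direction: the Goto ray, scaled to one distance unit, gives the next move, and the Lookat ray gives the next camera heading. The intrinsics, function names, and step handling are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the paper's code): convert predicted Goto/Lookat pixels
# into a one-unit move vector and a gaze direction, assuming a simple pinhole
# camera with known intrinsics. All names and values are illustrative.
import numpy as np

def pixel_to_ray(pixel_uv, fx, fy, cx, cy):
    """Back-project a pixel (u, v) to a unit ray in the camera frame."""
    u, v = pixel_uv
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return ray / np.linalg.norm(ray)

def flight_instruction(goto_uv, lookat_uv, intrinsics, step_size=1.0):
    """Turn the two predicted pixels into a move vector and a gaze direction.

    goto_uv   -- pixel toward which the agent should move one distance unit
    lookat_uv -- pixel the camera should point at in the next step
    """
    fx, fy, cx, cy = intrinsics
    move_dir = pixel_to_ray(goto_uv, fx, fy, cx, cy)    # direction of the next step
    gaze_dir = pixel_to_ray(lookat_uv, fx, fy, cx, cy)  # desired camera heading
    return step_size * move_dir, gaze_dir

if __name__ == "__main__":
    # Example with made-up intrinsics for a 640x480 camera.
    intrinsics = (525.0, 525.0, 320.0, 240.0)
    step, gaze = flight_instruction((400, 210), (350, 200), intrinsics)
    print("move by:", step, "then look along:", gaze)
```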
Related papers
- SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images [50.742420049839474]
'SaccadeDet' is an innovative architecture for gigapixel-level object detection, inspired by the human eye saccadic movement.
Our approach, evaluated on the PANDA dataset, achieves an 8x speed increase over the state-of-the-art methods.
It also demonstrates significant potential in gigapixel-level pathology analysis through its application to Whole Slide Imaging.
arXiv Detail & Related papers (2024-07-25T11:22:54Z) - EventTransAct: A video transformer-based framework for Event-camera
based action recognition [52.537021302246664]
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z) - R-C-P Method: An Autonomous Volume Calculation Method Using Image
Processing and Machine Vision [0.0]
Two cameras were used to measure the dimensions of a rectangular object in real-time.
The R-C-P method is developed using image processing and edge detection.
In addition to the surface areas, the R-C-P method also detects discontinuous edges or volumes.
arXiv Detail & Related papers (2023-08-19T15:39:27Z) - On the Generation of a Synthetic Event-Based Vision Dataset for
Navigation and Landing [69.34740063574921]
This paper presents a methodology for generating event-based vision datasets from optimal landing trajectories.
We construct sequences of photorealistic images of the lunar surface with the Planet and Asteroid Natural Scene Generation Utility.
We demonstrate that the pipeline can generate realistic event-based representations of surface features by constructing a dataset of 500 trajectories.
arXiv Detail & Related papers (2023-08-01T09:14:20Z) - Depth Monocular Estimation with Attention-based Encoder-Decoder Network
from Single Image [7.753378095194288]
Vision-based approaches have recently received much attention and can overcome these drawbacks.
In this work, we explore an extreme scenario in vision-based settings: estimate a depth map from one monocular image severely plagued by grid artifacts and blurry edges.
Our novel approach can find the focus of current image with minimal overhead and avoid losses of depth features.
arXiv Detail & Related papers (2022-10-24T23:01:25Z) - VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images [90.60881721134656]
We propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT).
Experiments on KITTI Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values.
arXiv Detail & Related papers (2022-06-06T14:02:06Z) - Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge
TPU [58.720142291102135]
In this paper we propose a pose estimation software exploiting neural network architectures.
We show how low power machine learning accelerators could enable Artificial Intelligence exploitation in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z) - Memory-Augmented Reinforcement Learning for Image-Goal Navigation [67.3963444878746]
We present a novel method that leverages a cross-episode memory to learn to navigate.
In order to avoid overfitting, we propose to use data augmentation on the RGB input during training.
We obtain this competitive performance from RGB input only, without access to additional sensors such as position or depth.
arXiv Detail & Related papers (2021-01-13T16:30:20Z) - Agile Reactive Navigation for A Non-Holonomic Mobile Robot Using A Pixel
Processor Array [22.789108850681146]
This paper presents an agile reactive navigation strategy for driving a non-holonomic ground vehicle around a preset course of gates in a cluttered environment using a low-cost processor array sensor.
We demonstrate a small ground vehicle running through or avoiding multiple gates at high speed using minimal computational resources.
arXiv Detail & Related papers (2020-09-27T09:11:31Z) - Semantic sensor fusion: from camera to sparse lidar information [7.489722641968593]
This paper presents an approach to fuse different sensory information, Light Detection and Ranging (lidar) scans and camera images.
The transference of semantic information between the labelled image and the lidar point cloud is performed in four steps.
arXiv Detail & Related papers (2020-03-04T03:09:33Z)