Radiance Field Learners As UAV First-Person Viewers
- URL: http://arxiv.org/abs/2408.05533v1
- Date: Sat, 10 Aug 2024 12:29:11 GMT
- Title: Radiance Field Learners As UAV First-Person Viewers
- Authors: Liqi Yan, Qifan Wang, Junhan Zhao, Qiang Guan, Zheng Tang, Jianhui Zhang, Dongfang Liu
- Abstract summary: First-Person-View (FPV) holds immense potential for revolutionizing the trajectory of Unmanned Aerial Vehicles (UAVs).
Traditional Neural Radiance Field (NeRF) methods face challenges such as sampling single points per iteration and requiring an extensive array of views for supervision.
We introduce FPV-NeRF, addressing these challenges through three key facets.
- Score: 36.59524833437512
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: First-Person-View (FPV) holds immense potential for revolutionizing the trajectory of Unmanned Aerial Vehicles (UAVs), offering an exhilarating avenue for navigating complex building structures. Yet, traditional Neural Radiance Field (NeRF) methods face challenges such as sampling single points per iteration and requiring an extensive array of views for supervision. UAV videos exacerbate these issues with limited viewpoints and significant spatial scale variations, resulting in inadequate detail rendering across diverse scales. In response, we introduce FPV-NeRF, addressing these challenges through three key facets: (1) Temporal consistency. Leveraging spatio-temporal continuity ensures seamless coherence between frames; (2) Global structure. Incorporating various global features during point sampling preserves space integrity; (3) Local granularity. Employing a comprehensive framework and multi-resolution supervision for multi-scale scene feature representation tackles the intricacies of UAV video spatial scales. Additionally, due to the scarcity of publicly available FPV videos, we introduce an innovative view synthesis method using NeRF to generate FPV perspectives from UAV footage, enhancing spatial perception for drones. Our novel dataset spans diverse trajectories, from outdoor to indoor environments, in the UAV domain, differing significantly from traditional NeRF scenarios. Through extensive experiments encompassing both interior and exterior building structures, FPV-NeRF demonstrates a superior understanding of the UAV flying space, outperforming state-of-the-art methods in our curated UAV dataset. Explore our project page for further insights: https://fpv-nerf.github.io/.
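The multi-resolution supervision of facet (3) is easiest to picture as a loss computed over several image scales. The sketch below is a minimal illustration under assumed hyperparameters (`scales`, `weights`), not the authors' exact formulation: rendered and ground-truth frames are compared at progressively coarser resolutions so that both global structure and fine detail are penalized.

```python
# Minimal sketch of multi-resolution photometric supervision.
# The scale/weight values are illustrative assumptions, not from the paper.
import torch
import torch.nn.functional as F

def multi_resolution_loss(rendered, target, scales=(1, 2, 4), weights=(1.0, 0.5, 0.25)):
    """rendered, target: (B, 3, H, W) tensors in [0, 1]."""
    loss = 0.0
    for s, w in zip(scales, weights):
        r = rendered if s == 1 else F.avg_pool2d(rendered, kernel_size=s)
        t = target if s == 1 else F.avg_pool2d(target, kernel_size=s)
        loss = loss + w * F.mse_loss(r, t)  # photometric error at this scale
    return loss
```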
Related papers
- On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration [0.7373617024876724]
Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing.
This study addresses the multi-UAV exploration problem by utilizing on-policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs.
The proposed solution includes actor-critic networks that use deep convolutional neural networks (CNNs) and long short-term memory (LSTM) to identify the UAVs and the areas that have already been covered (a minimal sketch follows this entry).
arXiv Detail & Related papers (2024-09-17T10:36:46Z)
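Since the entry names CNN- and LSTM-based actor-critic networks, here is a hedged sketch of one plausible shape for such a network. The layer sizes, the four-action move set, and the coverage-map input are assumptions, not details from the paper.

```python
# Sketch: a CNN encodes a 2D coverage grid, an LSTM tracks exploration
# history, and separate heads produce the policy and the value estimate.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, n_actions=4, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                      # encodes the coverage map
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),     # -> (B, 32*4*4)
        )
        self.lstm = nn.LSTM(32 * 4 * 4, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_actions)      # policy logits (e.g. N/E/S/W)
        self.critic = nn.Linear(hidden, 1)             # state-value estimate

    def forward(self, maps, state=None):
        # maps: (B, T, 1, H, W) sequence of coverage maps for one UAV
        B, T = maps.shape[:2]
        feats = self.cnn(maps.flatten(0, 1)).view(B, T, -1)
        out, state = self.lstm(feats, state)
        return self.actor(out), self.critic(out), state
```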
- Aerial-NeRF: Adaptive Spatial Partitioning and Sampling for Large-Scale Aerial Rendering [10.340739248752516]
We propose Aerial-NeRF to render complex aerial scenes with high precision (a toy partitioning sketch follows this entry).
Our model renders over 4 times faster than multiple competitors.
New state-of-the-art results have been achieved on two public large-scale aerial datasets.
arXiv Detail & Related papers (2024-05-10T02:57:02Z)
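The adaptive spatial partitioning the entry mentions can be illustrated with a toy routine: cluster the camera positions and route each query camera to the sub-NeRF that owns the nearest centroid. This is an assumed simplification, not the paper's actual partitioning or sampling scheme.

```python
# Toy spatial partitioning: k-means over camera positions, then
# nearest-centroid routing of a query camera to its sub-NeRF.
import numpy as np

def kmeans_partition(cam_positions, k=4, iters=10, seed=0):
    """cam_positions: (N, 3) float array of camera centers."""
    rng = np.random.default_rng(seed)
    centroids = cam_positions[rng.choice(len(cam_positions), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(cam_positions[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)                 # nearest-centroid assignment
        for j in range(k):                        # recompute centroids
            if (labels == j).any():
                centroids[j] = cam_positions[labels == j].mean(axis=0)
    return centroids, labels

def route_camera(cam_pos, centroids):
    """Index of the sub-NeRF responsible for rendering from cam_pos."""
    return int(np.linalg.norm(centroids - cam_pos, axis=-1).argmin())
```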
- Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z)
- F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories [100.37377892779654]
This paper presents a novel grid-based NeRF called F2-NeRF (Fast-Free-NeRF) for novel view synthesis.
F2-NeRF supports arbitrary input camera trajectories and trains in only a few minutes.
arXiv Detail & Related papers (2023-03-28T13:09:44Z)
- Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction [84.94140661523956]
We propose a tri-perspective view (TPV) representation that augments the bird's-eye view (BEV) with two additional perpendicular planes.
We model each point in 3D space by summing its projected features on the three planes (a minimal lookup sketch follows this entry).
Experiments show that our model trained with sparse supervision effectively predicts the semantic occupancy for all voxels.
arXiv Detail & Related papers (2023-02-15T17:58:10Z)
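The point modeling described above, projecting a 3D point onto three perpendicular planes and summing the sampled features, can be sketched directly; the tensor shapes and the [-1, 1] coordinate normalization are illustrative assumptions.

```python
# Tri-plane lookup: bilinearly sample each point's projection from three
# axis-aligned feature planes and sum the results.
import torch
import torch.nn.functional as F

def tpv_features(points, plane_xy, plane_xz, plane_yz):
    """points: (N, 3), coordinates already normalized to [-1, 1].
    plane_*: (1, C, H, W) learnable feature planes."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    feats = 0.0
    for plane, uv in ((plane_xy, (x, y)), (plane_xz, (x, z)), (plane_yz, (y, z))):
        grid = torch.stack(uv, dim=-1).view(1, -1, 1, 2)          # (1, N, 1, 2)
        sampled = F.grid_sample(plane, grid, align_corners=True)  # (1, C, N, 1)
        feats = feats + sampled[0, :, :, 0].t()                   # (N, C)
    return feats
```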
- Beyond the Field-of-View: Enhancing Scene Visibility and Perception with Clip-Recurrent Transformer [28.326852785609788]
In this paper, we propose the concept of online video inpainting for autonomous vehicles to expand the field of view.
The FlowLens architecture explicitly employs optical flow and implicitly incorporates a novel clip-recurrent transformer for feature propagation (a flow-warping sketch follows this entry).
Experiments and user studies involving offline and online video inpainting, as well as beyond-FoV perception tasks, demonstrate that FlowLens achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-11-21T09:34:07Z)
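The optical-flow half of that description can be sketched as feature warping; the clip-recurrent transformer is omitted, and this generic warp is an assumption rather than FlowLens's actual propagation module.

```python
# Warp the previous frame's features along the optical flow so content
# leaving the field of view can be reused for the current frame.
import torch
import torch.nn.functional as F

def warp_by_flow(prev_feats, flow):
    """prev_feats: (B, C, H, W); flow: (B, 2, H, W) in pixels (dx, dy)."""
    B, _, H, W = prev_feats.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(flow)    # (2, H, W) pixel grid
    coords = base[None] + flow                              # follow the flow
    # normalize pixel coordinates to [-1, 1] for grid_sample
    coords_x = 2 * coords[:, 0] / (W - 1) - 1
    coords_y = 2 * coords[:, 1] / (H - 1) - 1
    grid = torch.stack((coords_x, coords_y), dim=-1)        # (B, H, W, 2)
    return F.grid_sample(prev_feats, grid, align_corners=True)
```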
- Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds in autonomous driving scenes.
We show that a range-view (RV) based 3D detector with standard 2D convolutions alone can achieve performance comparable to state-of-the-art BEV-based detectors (a range-view projection sketch follows this entry).
arXiv Detail & Related papers (2022-05-27T05:42:16Z)
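The range-view representation that lets a LiDAR detector use plain 2D convolutions boils down to a spherical projection. The sketch below uses assumed sensor parameters (64 beams, a Velodyne-like vertical FoV), not the paper's configuration.

```python
# Spherically project a LiDAR point cloud into a 2D range image.
import numpy as np

def to_range_image(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """points: (N, 3) xyz. Returns an (H, W) image of ranges (0 = empty)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                                  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1, 1))
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = ((1 - (yaw + np.pi) / (2 * np.pi)) * W).astype(int) % W
    v = ((fov_up_r - pitch) / (fov_up_r - fov_down_r) * H).astype(int)
    img = np.zeros((H, W), dtype=np.float32)
    keep = (v >= 0) & (v < H)                               # inside vertical FoV
    img[v[keep], u[keep]] = r[keep]                         # range value per pixel
    return img
```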
- Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs [54.41204057689033]
We explore how to leverage neural radiance fields (NeRFs) to build interactive 3D environments from large-scale visual captures spanning buildings or even multiple city blocks, collected primarily from drone data.
In contrast to the single-object scenes against which NeRFs have traditionally been evaluated, this setting poses multiple challenges.
We introduce a simple clustering algorithm that partitions training images (or rather pixels) into different NeRF submodules that can be trained in parallel (a toy version of such a partition follows this entry).
arXiv Detail & Related papers (2021-12-20T17:40:48Z)
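A toy version of such a pixel-to-submodule partition: sample points along each training ray and hand the ray to every submodule whose cell centroid is nearest to at least one sample. The geometry and membership rule here are assumptions, not Mega-NeRF's published algorithm.

```python
# Assign each training ray to the submodule cells its samples fall nearest to.
import numpy as np

def assign_rays(origins, dirs, centroids, near=0.1, far=100.0, n_samples=32):
    """origins, dirs: (N, 3); centroids: (K, 3) submodule centers.
    Returns a boolean (N, K) membership matrix (a ray may feed several cells)."""
    t = np.linspace(near, far, n_samples)                       # depths along rays
    pts = origins[:, None] + t[None, :, None] * dirs[:, None]   # (N, S, 3)
    d = np.linalg.norm(pts[:, :, None] - centroids[None, None], axis=-1)  # (N, S, K)
    nearest = d.argmin(axis=-1)                                 # (N, S)
    member = np.zeros((len(origins), len(centroids)), dtype=bool)
    np.put_along_axis(member, nearest, True, axis=1)            # mark visited cells
    return member
```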
- Aerial-PASS: Panoramic Annular Scene Segmentation in Drone Videos [15.244418294614857]
We design a UAV system with a Panoramic Annular Lens (PAL), characterized by its small size, low weight, and 360-degree annular FoV (an unwrapping sketch follows this entry).
A lightweight panoramic annular semantic segmentation neural network model is designed to achieve high-accuracy and real-time scene parsing.
A comprehensive variety of experiments shows that the designed system performs satisfactorily in aerial panoramic scene parsing.
arXiv Detail & Related papers (2021-05-15T12:01:16Z)
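PAL imagery is typically unwrapped from its annular form into a rectangular panorama before segmentation. The sketch below shows that standard preprocessing step with assumed calibration values (center, inner/outer radii), which the entry does not specify.

```python
# Unwrap an annular PAL image into a rectangular panorama by sampling
# along polar coordinates (nearest-neighbor, for brevity).
import numpy as np

def unwrap_annulus(img, cx, cy, r_in, r_out, out_h=128, out_w=1024):
    """img: (H, W, 3) annular image; (cx, cy): annulus center in pixels."""
    theta = np.linspace(0, 2 * np.pi, out_w, endpoint=False)    # azimuth sweep
    radius = np.linspace(r_out, r_in, out_h)                    # top row = outer ring
    rr, tt = np.meshgrid(radius, theta, indexing="ij")
    xs = (cx + rr * np.cos(tt)).astype(int).clip(0, img.shape[1] - 1)
    ys = (cy + rr * np.sin(tt)).astype(int).clip(0, img.shape[0] - 1)
    return img[ys, xs]                                          # (out_h, out_w, 3)
```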