DronePose: Photorealistic UAV-Assistant Dataset Synthesis for 3D Pose
Estimation via a Smooth Silhouette Loss
- URL: http://arxiv.org/abs/2008.08823v2
- Date: Fri, 21 Aug 2020 06:14:34 GMT
- Title: DronePose: Photorealistic UAV-Assistant Dataset Synthesis for 3D Pose
Estimation via a Smooth Silhouette Loss
- Authors: Georgios Albanis, Nikolaos Zioulis, Anastasios Dimou, Dimitrios
Zarpalas, Petros Daras
- Abstract summary: 3D localisation of the UAV assistant is an important task that can facilitate the exchange of spatial information between the user and the UAV.
We design a data synthesis pipeline to create a realistic multimodal dataset that includes both the exocentric user view, and the egocentric UAV view.
We then exploit the joint availability of photorealistic and synthesized inputs to train a single-shot monocular pose estimation model.
- Score: 27.58747838557417
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work we consider UAVs as cooperative agents supporting human users in
their operations. In this context, the 3D localisation of the UAV assistant is
an important task that can facilitate the exchange of spatial information
between the user and the UAV. To address this in a data-driven manner, we
design a data synthesis pipeline to create a realistic multimodal dataset that
includes both the exocentric user view, and the egocentric UAV view. We then
exploit the joint availability of photorealistic and synthesized inputs to
train a single-shot monocular pose estimation model. During training we
leverage differentiable rendering to supplement a state-of-the-art direct
regression objective with a novel smooth silhouette loss. Our results
demonstrate its qualitative and quantitative performance gains over traditional
silhouette objectives. Our data and code are available at
https://vcl3d.github.io/DronePose
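The abstract describes a training objective that couples direct pose regression with a silhouette term obtained by differentiably rendering the drone model at the predicted pose. As a rough illustration of how such a combined objective could be wired up, the PyTorch sketch below blurs both silhouettes with a Gaussian kernel before an IoU-style comparison, so gradients stay informative around the object boundary. Everything here is an illustrative assumption rather than the paper's exact formulation: the helper names (gaussian_blur, smooth_silhouette_loss, total_loss), the blur-based smoothing, and the lam weighting are all hypothetical, and pred_sil is assumed to come from a differentiable (soft) renderer driven by the predicted pose.

```python
# Minimal sketch of a regression + smooth-silhouette objective.
# NOT the paper's exact loss: the Gaussian-blurred soft-IoU below is one
# plausible way to smooth a silhouette objective, chosen for illustration.
import torch
import torch.nn.functional as F


def gaussian_blur(mask: torch.Tensor, ksize: int = 11, sigma: float = 3.0) -> torch.Tensor:
    """Separable Gaussian blur of a (B, 1, H, W) silhouette in [0, 1]."""
    coords = torch.arange(ksize, dtype=mask.dtype, device=mask.device) - ksize // 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = (g / g.sum()).view(1, 1, 1, ksize)
    mask = F.conv2d(mask, g, padding=(0, ksize // 2))                  # horizontal pass
    mask = F.conv2d(mask, g.transpose(2, 3), padding=(ksize // 2, 0))  # vertical pass
    return mask


def smooth_silhouette_loss(pred_sil: torch.Tensor,
                           gt_sil: torch.Tensor,
                           eps: float = 1e-6) -> torch.Tensor:
    """Soft-IoU on blurred silhouettes; blurring widens the basin of attraction
    so small pose errors still produce useful gradients."""
    p, g = gaussian_blur(pred_sil), gaussian_blur(gt_sil)
    inter = (p * g).sum(dim=(1, 2, 3))
    union = (p + g - p * g).sum(dim=(1, 2, 3))
    return (1.0 - inter / (union + eps)).mean()


def total_loss(pred_pose: torch.Tensor, gt_pose: torch.Tensor,
               pred_sil: torch.Tensor, gt_sil: torch.Tensor,
               lam: float = 0.1) -> torch.Tensor:
    """Direct regression on pose parameters (e.g. translation + rotation),
    supplemented by the silhouette term with an assumed weight lam."""
    return F.smooth_l1_loss(pred_pose, gt_pose) + lam * smooth_silhouette_loss(pred_sil, gt_sil)
```

The silhouette term is added as a weighted supplement rather than used alone: on its own it is highly non-convex in the pose, so in a setup like this the direct regression would first pull the prediction into a roughly correct basin, after which the rendered-silhouette comparison can refine alignment.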
Related papers
- UAVScenes: A Multi-Modal Dataset for UAVs [45.752766099526525]
UAVScenes is a large-scale dataset designed to benchmark various tasks across both 2D and 3D modalities.
We enhance this dataset by providing manually labeled semantic annotations for both frame-wise images and LiDAR point clouds.
These additions enable a wide range of UAV perception tasks, including segmentation, depth estimation, 6-DoF localization, place recognition, and novel view synthesis.
arXiv Detail & Related papers (2025-07-30T06:29:52Z)
- UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting [54.883935964137706]
We introduce UAV4D, a framework for enabling photorealistic rendering for dynamic real-world scenes captured by UAVs.
We use a combination of a 3D foundation model and a human mesh reconstruction model to reconstruct both the scene background and humans.
Our results demonstrate the benefits of our approach over existing methods in novel view synthesis, achieving a 1.5 dB PSNR improvement and superior visual sharpness.
arXiv Detail & Related papers (2025-06-05T13:21:09Z)
- UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting [57.63613048492219]
We present UAVTwin, a method for creating digital twins from real-world environments and facilitating data augmentation for training downstream models embedded in unmanned aerial vehicles (UAVs).
This is achieved by integrating 3D Gaussian Splatting (3DGS) for reconstructing backgrounds along with controllable synthetic human models that display diverse appearances and actions in multiple poses.
arXiv Detail & Related papers (2025-04-02T22:17:30Z)
- Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles [81.29018359825872]
This paper consolidates a set of good practices to finetune large pretrained models for a real-world task.
Specifically, we develop several strategies to account for discrepancies between the synthetic data and real driving data.
Our insights lead to effective finetuning that results in a 68.8% reduction in FID for novel view synthesis over prior arts.
arXiv Detail & Related papers (2024-12-19T03:39:13Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- UAV-Sim: NeRF-based Synthetic Data Generation for UAV-based Perception [62.71374902455154]
We leverage recent advancements in neural rendering to improve static and dynamic novel-view UAV-based image rendering.
We demonstrate a considerable performance boost when a state-of-the-art detection model is optimized primarily on hybrid sets of real and synthetic data.
arXiv Detail & Related papers (2023-10-25T00:20:37Z)
- UAVStereo: A Multiple Resolution Dataset for Stereo Matching in UAV Scenarios [0.6524460254566905]
This paper constructs a multi-resolution UAV scenario dataset, called UAVStereo, with over 34k stereo image pairs covering 3 typical scenes.
In this paper, we evaluate traditional and state-of-the-art deep learning methods, highlighting their limitations in addressing challenges in UAV scenarios.
arXiv Detail & Related papers (2023-02-20T16:45:27Z)
- Archangel: A Hybrid UAV-based Human Detection Benchmark with Position and Pose Metadata [10.426019628829204]
Archangel is the first UAV-based object detection dataset composed of real and synthetic subsets.
A series of experiments are carefully designed with a state-of-the-art object detector to demonstrate the benefits of leveraging the metadata.
arXiv Detail & Related papers (2022-08-31T21:45:16Z)
- Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
- Real-Time Hybrid Mapping of Populated Indoor Scenes using a Low-Cost Monocular UAV [42.850288938936075]
We present the first system to perform simultaneous mapping and multi-person 3D human pose estimation from a monocular camera mounted on a single UAV.
In particular, we show how to loosely couple state-of-the-art monocular depth estimation and monocular 3D human pose estimation approaches to reconstruct a hybrid map of a populated indoor scene in real time.
arXiv Detail & Related papers (2022-03-04T17:31:26Z)
- Vision-Based UAV Self-Positioning in Low-Altitude Urban Environments [20.69412701553767]
Unmanned Aerial Vehicles (UAVs) rely on satellite systems for stable positioning.
When these signals are unavailable or degraded, vision-based techniques can serve as an alternative, ensuring the self-positioning capability of UAVs.
This paper presents a new dataset, DenseUAV, which is the first publicly available dataset designed for the UAV self-positioning task.
arXiv Detail & Related papers (2022-01-23T07:18:55Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.