Hestia: Hierarchical Next-Best-View Exploration for Systematic Intelligent Autonomous Data Collection
- URL: http://arxiv.org/abs/2508.01014v1
- Date: Fri, 01 Aug 2025 18:27:23 GMT
- Title: Hestia: Hierarchical Next-Best-View Exploration for Systematic Intelligent Autonomous Data Collection
- Authors: Cheng-You Lu, Zhuoli Zhuang, Nguyen Thanh Trung Le, Da Xiao, Yu-Cheng Chang, Thomas Do, Srinath Sridhar, Chin-teng Lin,
- Abstract summary: This study introduces Hierarchical Next-Best-View Exploration for Systematic Intelligent Autonomous Data Collection (Hestia)<n>Hestia systematically defines the next-best-view task by proposing core components such as dataset choice, observation design, action space, reward calculation, and learning schemes.<n> Experimental results show that Hestia performs robustly across three datasets translated and object settings in the NVIDIA IsaacLab environment.
- Score: 23.427212631082025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in 3D reconstruction and novel view synthesis have enabled efficient, photorealistic rendering, but the data collection process remains largely manual, making it time-consuming and labor-intensive. To address the challenges, this study introduces Hierarchical Next-Best-View Exploration for Systematic Intelligent Autonomous Data Collection (Hestia), which leverages reinforcement learning to learn a generalizable policy for 5-DoF next-best viewpoint prediction. Unlike prior approaches, Hestia systematically defines the next-best-view task by proposing core components such as dataset choice, observation design, action space, reward calculation, and learning schemes, forming a foundation for the planner. Hestia goes beyond prior next-best-view approaches and traditional capture systems through integration and validation in a real-world setup, where a drone serves as a mobile sensor for active scene exploration. Experimental results show that Hestia performs robustly across three datasets and translated object settings in the NVIDIA IsaacLab environment, and proves feasible for real-world deployment.
Related papers
- VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions [12.739233840342958]
VOccl3D is a Video-based human Occlusion dataset with 3D body pose and shape annotations.<n>Inspired by works such as AGORA and BEDLAM, we constructed this dataset using advanced computer graphics rendering techniques.
arXiv Detail & Related papers (2025-08-09T00:13:46Z) - A large-scale, physically-based synthetic dataset for satellite pose estimation [0.0]
This paper introduces the DLVS3-HST-V1 dataset, which focuses on the Hubble Space Telescope (HST) as a complex, articulated target.<n>The dataset is generated using advanced real-time and offline rendering technologies, integrating high-fidelity 3D models, dynamic lighting, and physically accurate material properties.<n>The pipeline supports the creation of large-scale, richly annotated image sets with ground-truth 6-DoF pose and keypoint data, semantic segmentation, depth, and normal maps.
arXiv Detail & Related papers (2025-06-15T09:24:32Z) - Spatial Understanding from Videos: Structured Prompts Meet Simulation Data [79.52833996220059]
We present a unified framework for enhancing 3D spatial reasoning in pre-trained vision-language models without modifying their architecture.<n>This framework combines SpatialMind, a structured prompting strategy that decomposes complex scenes and questions into interpretable reasoning steps, with ScanForgeQA, a scalable question-answering dataset built from diverse 3D simulation scenes.
arXiv Detail & Related papers (2025-06-04T07:36:33Z) - FrontierNet: Learning Visual Cues to Explore [54.8265603996238]
This work aims at leveraging 2D visual cues for efficient autonomous exploration, addressing the limitations of extracting goal poses from a 3D map.<n>We propose a visual-only frontier-based exploration system, with FrontierNet as its core component.<n>Our approach provides an alternative to existing 3D-dependent goal-extraction approaches, achieving a 15% improvement in early-stage exploration efficiency.
arXiv Detail & Related papers (2025-01-08T16:25:32Z) - Open-World Drone Active Tracking with Goal-Centered Rewards [62.21394499788672]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations.<n>We propose DAT, the first open-world drone active air-to-ground tracking benchmark.<n>We also propose GC-VAT, which aims to improve the performance of drone tracking targets in complex scenarios.
arXiv Detail & Related papers (2024-12-01T09:37:46Z) - Segmentation-aware Prior Assisted Joint Global Information Aggregated 3D Building Reconstruction [6.839442579589125]
Multi-View Stereo plays a pivotal role in civil engineering by facilitating 3D modeling, precise engineering surveying, quantitative analysis, as well as monitoring and maintenance.
However, Multi-View Stereo algorithms encounter challenges in reconstructing weakly-textured regions within large-scale building scenes.
In these areas, the stereo matching of pixels often fails, leading to inaccurate depth estimations.
We propose an algorithm that accurately segments weakly-textured regions and constructs their plane priors.
This function selects optimal plane prior information based on global information in the prior candidate set, constrained by geometric consistency during the depth estimation update process.
arXiv Detail & Related papers (2024-10-24T04:59:44Z) - OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z) - ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal
Feature Learning [132.20119288212376]
We propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.
To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system.
arXiv Detail & Related papers (2022-07-15T16:57:43Z) - Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.