Embodied Navigation at the Art Gallery
- URL: http://arxiv.org/abs/2204.09069v1
- Date: Tue, 19 Apr 2022 18:00:06 GMT
- Title: Embodied Navigation at the Art Gallery
- Authors: Roberto Bigazzi, Federico Landi, Silvia Cascianelli, Marcella Cornia,
Lorenzo Baraldi and Rita Cucchiara
- Abstract summary: We build and release a new 3D space with unique characteristics: the one of a complete art museum.
Compared with existing 3D scenes, the collected space is ampler, richer in visual features, and provides very sparse occupancy information.
We deliver a new benchmark for PointGoal navigation inside this new space.
- Score: 43.52107532692226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied agents, trained to explore and navigate indoor photorealistic
environments, have achieved impressive results on standard datasets and
benchmarks. So far, experiments and evaluations have involved domestic and
working scenes like offices, flats, and houses. In this paper, we build and
release a new 3D space with unique characteristics: the one of a complete art
museum. We name this environment ArtGallery3D (AG3D). Compared with existing 3D
scenes, the collected space is ampler, richer in visual features, and provides
very sparse occupancy information. This feature is challenging for
occupancy-based agents which are usually trained in crowded domestic
environments with plenty of occupancy information. Additionally, we annotate
the coordinates of the main points of interest inside the museum, such as
paintings, statues, and other items. Thanks to this manual process, we deliver
a new benchmark for PointGoal navigation inside this new space. Trajectories in
this dataset are far more complex and lengthy than existing ground-truth paths
for navigation in Gibson and Matterport3D. We carry out an extensive experimental
evaluation in our new space and show that existing methods hardly adapt to this
scenario. As such, we believe that the availability of
this 3D model will foster future research and help improve existing solutions.
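In PointGoal navigation, the agent is asked to reach target coordinates (here, the annotated positions of paintings, statues, and other exhibits), and an episode is typically judged successful if the agent stops within a fixed distance of the goal. The minimal NumPy sketch below only illustrates this goal specification and success check; it is not code from the paper, and the function names, coordinate conventions, and 0.2 m threshold are assumptions.

```python
import numpy as np

def polar_goal(agent_pos, agent_heading, goal_pos):
    """Express a 2D goal in the agent's egocentric polar coordinates.

    agent_pos, goal_pos: (x, y) world coordinates in metres.
    agent_heading: agent yaw in radians (0 = world x-axis).
    Returns (rho, phi): distance to the goal and bearing relative to heading.
    """
    delta = np.asarray(goal_pos, dtype=float) - np.asarray(agent_pos, dtype=float)
    rho = np.linalg.norm(delta)                    # straight-line distance
    phi = np.arctan2(delta[1], delta[0]) - agent_heading
    phi = (phi + np.pi) % (2 * np.pi) - np.pi      # wrap bearing to [-pi, pi)
    return rho, phi

def is_success(agent_pos, goal_pos, threshold=0.2):
    """Assumed success criterion: stop within `threshold` metres of the goal."""
    return np.linalg.norm(np.asarray(goal_pos) - np.asarray(agent_pos)) < threshold

# Example: a goal placed at an annotated exhibit, agent at the origin facing +x.
rho, phi = polar_goal(agent_pos=(0.0, 0.0), agent_heading=0.0, goal_pos=(4.0, 3.0))
print(f"goal at {rho:.2f} m, bearing {phi:.2f} rad;",
      "success:", is_success((3.9, 3.0), (4.0, 3.0)))
```

Benchmarks of this kind usually also report path-efficiency metrics such as SPL, which weights success by the ratio of the geodesic shortest-path length to the length of the path actually travelled; the longer and more complex AG3D trajectories make this a harder target.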
Related papers
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z) - Generating Visual Spatial Description via Holistic 3D Scene
Understanding [88.99773815159345]
Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images.
With an external 3D scene extractor, we obtain the 3D objects and scene features for input images.
We construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes.
arXiv Detail & Related papers (2023-05-19T15:53:56Z) - Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy.
We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z) - WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language [31.691159120136064]
We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data.
We present a novel method, dubbed WildRefer, for this task, fully utilizing the rich appearance information in images and the position and geometric clues in point clouds.
Our datasets are significant for research on 3D visual grounding in the wild and have huge potential to boost the development of autonomous driving and service robots.
arXiv Detail & Related papers (2023-04-12T06:48:26Z) - SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose SurroundOcc, a method to predict 3D occupancy from multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z) - A Comparison of Spatiotemporal Visualizations for 3D Urban Analytics [7.157706457130007]
This paper investigates how effective 3D urban visual analytics are at supporting spatiotemporal analysis on building surfaces.
We compare four representative visual designs used to visualize 3D spatiotemporal urban data: spatial juxtaposition, temporal juxtaposition, linked view, and embedded view.
Our results demonstrate that participants were more accurate using plot-based visualizations but faster using color-coded visualizations.
arXiv Detail & Related papers (2022-08-10T14:38:13Z) - Roominoes: Generating Novel 3D Floor Plans From Existing 3D Rooms [22.188206636953794]
We propose the task of generating novel 3D floor plans from existing 3D rooms.
We explore two strategies: one uses available 2D floor plans to guide selection and deformation of 3D rooms; the other learns to retrieve a set of compatible 3D rooms and combine them into novel layouts.
arXiv Detail & Related papers (2021-12-10T16:17:01Z) - Walk2Map: Extracting Floor Plans from Indoor Walk Trajectories [23.314557741879664]
We present Walk2Map, a data-driven approach to generate floor plans from trajectories of a person walking inside the rooms.
Thanks to advances in data-driven inertial odometry, such minimalistic input data can be acquired from the IMU readings of consumer-level smartphones.
We train our networks using scanned 3D indoor models and apply them in a cascaded fashion on an indoor walk trajectory.
arXiv Detail & Related papers (2021-02-27T16:29:09Z) - Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment (a generic sketch of the underlying depth-to-occupancy projection follows this list).
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z) - SAILenv: Learning in Virtual Visual Environments Made Simple [16.979621213790015]
We present a novel platform that allows researchers to experiment with visual recognition in virtual 3D scenes.
A few lines of code are needed to interface every algorithm with the virtual world, and non-3D-graphics experts can easily customize the 3D environment itself.
Our framework yields pixel-level semantic and instance labeling, depth, and, to the best of our knowledge, it is the only one that provides motion-related information directly inherited from the 3D engine.
arXiv Detail & Related papers (2020-07-16T09:50:23Z)
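Both the AG3D abstract above, which stresses that the museum's sparse occupancy is hard for occupancy-based agents, and the Occupancy Anticipation entry build on top-down occupancy maps computed from egocentric depth. The sketch below is a generic, illustrative version of that projection step under an assumed pinhole camera model; the parameter values and function name are assumptions and do not come from any of the papers listed.

```python
import numpy as np

def depth_to_topdown_occupancy(depth, hfov_deg=90.0, cell_size=0.05,
                               map_size=200, max_height=1.5):
    """Project an egocentric depth image (H x W, metres) onto a top-down
    occupancy grid centred on the agent. Assumes a pinhole camera looking
    along +z with a level image plane; purely illustrative."""
    h, w = depth.shape
    f = (w / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)    # focal length in pixels
    us, vs = np.meshgrid(np.arange(w) - w / 2.0, np.arange(h) - h / 2.0)
    z = depth                                             # forward distance
    x = us * z / f                                        # lateral offset (metres)
    y = -vs * z / f                                       # height above the camera
    valid = (z > 0) & (y < max_height)                    # drop ceiling points
    grid = np.zeros((map_size, map_size), dtype=np.uint8)
    col = (x[valid] / cell_size + map_size / 2).astype(int)
    row = (map_size - 1 - z[valid] / cell_size).astype(int)
    inside = (col >= 0) & (col < map_size) & (row >= 0) & (row < map_size)
    grid[row[inside], col[inside]] = 1                    # mark occupied cells
    return grid

# Example: a flat wall 3 m in front of the agent.
occupancy = depth_to_topdown_occupancy(np.full((120, 160), 3.0))
print(occupancy.sum(), "cells marked occupied")
```

An anticipation model, as in the Occupancy Anticipation paper, would then take such partial maps (together with RGB features) and predict occupancy for regions the camera has not yet observed.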
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.