PanGEA: The Panoramic Graph Environment Annotation Toolkit
- URL: http://arxiv.org/abs/2103.12703v1
- Date: Tue, 23 Mar 2021 17:24:12 GMT
- Title: PanGEA: The Panoramic Graph Environment Annotation Toolkit
- Authors: Alexander Ku and Peter Anderson and Jordi Pont-Tuset and Jason
Baldridge
- Abstract summary: PanGEA is a toolkit for collecting speech and text annotations in photo-realistic 3D environments.
PanGEA immerses annotators in a web-based simulation and allows them to move around easily as they speak and/or listen.
- Score: 83.12648898284048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: PanGEA, the Panoramic Graph Environment Annotation toolkit, is a lightweight
toolkit for collecting speech and text annotations in photo-realistic 3D
environments. PanGEA immerses annotators in a web-based simulation and allows
them to move around easily as they speak and/or listen. It includes database
and cloud storage integration, plus utilities for automatically aligning
recorded speech with manual transcriptions and the virtual pose of the
annotators. Out of the box, PanGEA supports two tasks -- collecting navigation
instructions and navigation instruction following -- and it could be easily
adapted for annotating walking tours, finding and labeling landmarks or
objects, and similar tasks. We share best practices learned from using PanGEA
in a 20,000 hour annotation effort to collect the Room-Across-Room dataset. We
hope that our open-source annotation toolkit and insights will both expedite
future data collection efforts and spur innovation on the kinds of grounded
language tasks such environments can support.
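The alignment utility mentioned in the abstract can be illustrated with a minimal sketch. Assuming PanGEA logs annotator poses as timestamped records and produces word-level timings from the speech/transcript alignment (the record layout, field names, and the function below are hypothetical illustrations, not PanGEA's actual schema or API), matching each word to the annotator's pose reduces to finding the most recent pose at or before the word's onset:

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class Pose:
    time: float        # seconds since recording start
    pano_id: str       # current panorama node in the navigation graph
    heading: float     # camera heading in degrees

@dataclass
class Word:
    start: float       # word onset time from the speech/transcript alignment
    text: str

def align_words_to_poses(words: list[Word], poses: list[Pose]) -> list[tuple[Word, Pose]]:
    """Pair each transcribed word with the last pose recorded at or before its onset."""
    pose_times = [p.time for p in poses]  # assumed sorted by time
    aligned = []
    for w in words:
        i = bisect_right(pose_times, w.start) - 1
        aligned.append((w, poses[max(i, 0)]))  # clamp words spoken before the first pose
    return aligned

# Example: the annotator speaks "turn left" while viewing the second panorama.
poses = [Pose(0.0, "pano_a", 90.0), Pose(2.5, "pano_b", 180.0), Pose(5.0, "pano_c", 270.0)]
words = [Word(3.1, "turn"), Word(3.4, "left")]
for word, pose in align_words_to_poses(words, poses):
    print(word.text, "->", pose.pano_id)
```

This kind of timestamp join is what lets a pose-annotated transcript be replayed later, e.g. to render what the annotator was looking at when each instruction word was spoken.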
Related papers
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models [14.871309526022516]
This paper proposes an interactive navigation framework by using large language and vision-language models.
We create an action-aware costmap to perform effective path planning without fine-tuning.
All experimental results demonstrated the proposed framework's effectiveness and adaptability to diverse environments.
arXiv Detail & Related papers (2023-10-13T05:59:03Z)
- PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation [96.8435716885159]
Vision-and-Language Navigation (VLN) requires the agent to follow language instructions to navigate through 3D environments.
One main challenge in VLN is the limited availability of training environments, which makes it hard to generalize to new and unseen environments.
We propose PanoGen, a generation method that can potentially create an infinite number of diverse panoramic environments conditioned on text.
arXiv Detail & Related papers (2023-05-30T16:39:54Z)
- Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by natural language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
We propose a multi-granularity map, which contains both object fine-grained details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
arXiv Detail & Related papers (2022-10-14T04:23:27Z)
- CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations [98.30038910061894]
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions.
We propose CLEAR: Cross-Lingual and Environment-Agnostic Representations.
Our language and visual representations can be successfully transferred to the Room-to-Room and Cooperative Vision-and-Dialogue Navigation tasks.
arXiv Detail & Related papers (2022-07-05T17:38:59Z)
- ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments [85.81157224163876]
We combine Vision-and-Language Navigation, assembly of collected objects, and object referring expression comprehension to create a novel joint navigation-and-assembly task, named ArraMon.
During this task, the agent is asked to find and collect different target objects one-by-one by navigating based on natural language instructions in a complex, realistic outdoor environment.
We present results for several baseline models (integrated and biased) and metrics (nDTW, CTC, rPOD, and PTC), and the large model-human performance gap demonstrates that our task is challenging and presents a wide scope for future work.
arXiv Detail & Related papers (2020-11-15T23:30:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.