On the Evaluation of Vision-and-Language Navigation Instructions
- URL: http://arxiv.org/abs/2101.10504v1
- Date: Tue, 26 Jan 2021 01:03:49 GMT
- Title: On the Evaluation of Vision-and-Language Navigation Instructions
- Authors: Ming Zhao, Peter Anderson, Vihan Jain, Su Wang, Alexander Ku, Jason
Baldridge, Eugene Ie
- Abstract summary: Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions.
Existing instruction generators have not been comprehensively evaluated.
BLEU, ROUGE, METEOR and CIDEr are ineffective for evaluating grounded navigation instructions.
- Score: 76.92085026018427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-and-Language Navigation wayfinding agents can be enhanced by
exploiting automatically generated navigation instructions. However, existing
instruction generators have not been comprehensively evaluated, and the
automatic evaluation metrics used to develop them have not been validated.
Using human wayfinders, we show that these generators perform on par with or
only slightly better than a template-based generator and far worse than human
instructors. Furthermore, we discover that BLEU, ROUGE, METEOR and CIDEr are
ineffective for evaluating grounded navigation instructions. To improve
instruction evaluation, we propose an instruction-trajectory compatibility
model that operates without reference instructions. Our model shows the highest
correlation with human wayfinding outcomes when scoring individual
instructions. For ranking instruction generation systems, if reference
instructions are available we recommend using SPICE.
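For context on the claim that BLEU, ROUGE, METEOR and CIDEr are ineffective here: these metrics score a generated instruction purely by textual overlap with reference instructions and never consult the trajectory, so an instruction that sends a wayfinder to the wrong place can still score well. A minimal BLEU sketch with NLTK, using invented instructions for illustration:

```python
# Minimal illustration of reference-based scoring: BLEU measures n-gram overlap
# with reference instructions and is blind to the trajectory itself.
# Both references and the hypothesis are invented for illustration.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "walk past the couch and stop at the top of the stairs".split(),
    "go straight through the living room and wait by the staircase".split(),
]
hypothesis = "walk forward and stop at the bottom of the stairs".split()  # wrong endpoint

# Smoothing avoids zero scores when higher-order n-grams have no overlap.
score = sentence_bleu(references, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")  # can look respectable even though the endpoint is wrong
```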
Related papers
- Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation [8.931633531104021]
SAS (Spatially-Aware Speaker) is an instruction generator that uses both structural and semantic knowledge of the environment to produce richer instructions.
Our method outperforms existing instruction generation models when evaluated with standard metrics.
arXiv Detail & Related papers (2024-09-09T13:12:11Z)
- Evaluation of Instruction-Following Ability for Large Language Models on Story-Ending Generation [2.4889060833127665]
In this paper, we focus on evaluating the instruction-following ability of Large Language Models (LLMs) in the context of story-ending generation.
We propose an automatic evaluation pipeline that uses a machine reading comprehension (MRC) model to determine whether the generated story ending reflects the instruction (a rough sketch of this idea appears after this entry).
arXiv Detail & Related papers (2024-06-24T06:53:36Z)
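As a loose illustration of the MRC-based check described above (not the authors' pipeline), one can ask an extractive question-answering model whether a generated ending contains what the instruction demanded. The model name, the hand-written question, and the 0.5 confidence threshold below are all assumptions made for the sketch:

```python
# Rough sketch of an MRC-style instruction-following check: ask a QA model
# whether the generated ending contains what the instruction asked for.
# Model name, question, and threshold are illustrative assumptions.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

# Instruction (hand-derived into the question below):
#   "End the story with the hero returning the stolen ring."
generated_ending = ("At dawn she slipped back into the temple and laid the "
                    "stolen ring on the altar where it belonged.")

result = qa(question="What does the hero do with the stolen ring?",
            context=generated_ending)

# Treat a confident extractive answer as evidence the ending follows the instruction.
follows = result["score"] > 0.5
print(f"answer={result['answer']!r} score={result['score']:.2f} follows={follows}")
```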
- Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis [51.04181562775778]
We present a novel approach to automatically synthesize "wayfinding instructions" for an embodied robot agent.
Our algorithm uses in-context learning to condition an LLM to generate instructions from just a few references (a prompt-construction sketch follows this entry).
We implement our approach on multiple simulation platforms including Matterport3D, AI Habitat and ThreeDWorld.
arXiv Detail & Related papers (2024-03-18T05:38:07Z)
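The in-context learning setup above amounts to prompt construction: pair a few reference routes with their human-written instructions, then append the new route. The sketch below invents the reference pairs and prompt wording; the completion call is left to whichever LLM client is in use:

```python
# Sketch of few-shot prompt construction for wayfinding-instruction synthesis.
# Reference pairs and wording are invented for illustration.
REFERENCE_PAIRS = [
    ("bedroom -> hallway -> turn left -> kitchen",
     "Leave the bedroom, follow the hallway, and turn left into the kitchen."),
    ("front door -> living room -> stairs up -> first door on the right",
     "Go through the living room, climb the stairs, and enter the first door on your right."),
]

def build_prompt(new_route: str) -> str:
    """Condition the LLM with (route, instruction) exemplars, then ask for an
    instruction describing the unseen route."""
    parts = ["Write a natural navigation instruction for each route."]
    for route, instruction in REFERENCE_PAIRS:
        parts.append(f"Route: {route}\nInstruction: {instruction}")
    parts.append(f"Route: {new_route}\nInstruction:")
    return "\n\n".join(parts)

print(build_prompt("garage -> laundry room -> straight ahead to dining room"))
```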
- Lana: A Language-Capable Navigator for Instruction Following and Generation [70.76686546473994]
LANA is a language-capable navigation agent that can execute human-written navigation commands and provide route descriptions to humans.
We empirically verify that, compared with recent advanced task-specific solutions, LANA attains better performance on both instruction following and route description.
In addition, endowed with language generation capability, LANA can explain its behavior to humans and assist their wayfinding.
arXiv Detail & Related papers (2023-03-15T07:21:28Z)
- Self-Instruct: Aligning Language Models with Self-Generated Instructions [76.42871502364697]
Self-Instruct is a framework for improving the instruction-following capabilities of pretrained language models.
Our pipeline generates instructions, input, and output samples from a language model, then filters invalid or similar ones before using them to finetune the original model (a similarity-filter sketch follows this entry).
For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin.
arXiv Detail & Related papers (2022-12-20T18:59:19Z)
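The filtering step above is straightforward to sketch: discard a newly generated instruction if it is too similar to anything already in the pool. The snippet assumes the rouge_score package and a 0.7 ROUGE-L cutoff purely for illustration; it is not the authors' code:

```python
# Sketch of similarity-based filtering for generated instructions: keep a new
# instruction only if its ROUGE-L overlap with every pooled instruction stays
# below a threshold. The rouge_score dependency and 0.7 cutoff are assumptions.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)

def is_novel(candidate: str, pool: list[str], threshold: float = 0.7) -> bool:
    """Return True if the candidate is not a near-duplicate of any pooled instruction."""
    return all(
        scorer.score(existing, candidate)["rougeL"].fmeasure < threshold
        for existing in pool
    )

pool = ["Summarize the given article in three sentences."]
print(is_novel("Write a three-sentence summary of the article.", pool))   # likely True
print(is_novel("Summarize the given article in three sentences.", pool))  # False
```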
- FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation [45.99831101677059]
We present FOAM, a Follower-Aware speaker Model that is constantly updated based on follower feedback.
We optimize the speaker using a bi-level optimization framework and obtain its training signals by evaluating the follower on labeled data.
arXiv Detail & Related papers (2022-06-09T06:11:07Z)
- Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation [172.15808300686584]
We describe an approach that learns the two tasks simultaneously and exploits their intrinsic correlations to boost the training of each.
Our approach improves the performance of various follower models and produces accurate navigation instructions.
arXiv Detail & Related papers (2022-03-30T18:15:26Z)
- Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation [145.84123197129298]
Language instructions play an essential role in natural-language-grounded navigation tasks.
We aim to train a more robust navigator that can dynamically extract the crucial parts of a long instruction.
Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator to move to the wrong target.
arXiv Detail & Related papers (2021-07-23T14:11:31Z)
- Sub-Instruction Aware Vision-and-Language Navigation [46.99329933894108]
Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.
We focus on the granularity of the visual and language sequences as well as the traceability of agents through the completion of an instruction.
We propose effective sub-instruction attention and shifting modules that select and attend to a single sub-instruction at each time-step.
arXiv Detail & Related papers (2020-04-06T14:44:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.