FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation
- URL: http://arxiv.org/abs/2206.04294v1
- Date: Thu, 9 Jun 2022 06:11:07 GMT
- Title: FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation
- Authors: Zi-Yi Dou, Nanyun Peng
- Abstract summary: We present textscfoam, a textscFollower-textscaware speaker textscModel that is constantly updated given the follower feedback.
We optimize the speaker using a bi-level optimization framework and obtain its training signals by evaluating the follower on labeled data.
- Score: 45.99831101677059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The speaker-follower models have proven to be effective in
vision-and-language navigation, where a speaker model is used to synthesize new
instructions to augment the training data for a follower navigation model.
However, in many of the previous methods, the generated instructions are not
directly trained to optimize the performance of the follower. In this paper, we
present \textsc{foam}, a \textsc{Fo}llower-\textsc{a}ware speaker
\textsc{M}odel that is constantly updated given the follower feedback, so that
the generated instructions can be more suitable to the current learning state
of the follower. Specifically, we optimize the speaker using a bi-level
optimization framework and obtain its training signals by evaluating the
follower on labeled data. Experimental results on the Room-to-Room and
Room-across-Room datasets demonstrate that our methods can outperform strong
baseline models across settings. Analyses also reveal that our generated
instructions are of higher quality than the baselines.
Related papers
- Improving Instruction-Following in Language Models through Activation Steering [58.876600545898675]
We derive instruction-specific vector representations from language models and use them to steer models accordingly.
We demonstrate how this method can enhance model adherence to constraints such as output format, length, and word inclusion.
Our findings demonstrate that activation steering offers a practical and scalable approach for fine-grained control in language generation.
arXiv Detail & Related papers (2024-10-15T08:38:20Z) - Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation [8.931633531104021]
SAS (Spatially-Aware Speaker) is an instruction generator that uses both structural and semantic knowledge of the environment to produce richer instructions.
Our method outperforms existing instruction generation models, evaluated using standard metrics.
arXiv Detail & Related papers (2024-09-09T13:12:11Z) - Self-Alignment with Instruction Backtranslation [162.02529653768096]
We present a method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions.
Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus.
arXiv Detail & Related papers (2023-08-11T17:47:54Z) - Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for
Navigation Instruction Generation [70.76686546473994]
We introduce a novel speaker model textscKefa for navigation instruction generation.
The proposed KEFA speaker achieves state-of-the-art instruction generation performance for both indoor and outdoor scenes.
arXiv Detail & Related papers (2023-07-25T09:39:59Z) - Self-supervised Speaker Diarization [19.111219197011355]
This study proposes an entirely unsupervised deep-learning model for speaker diarization.
Speaker embeddings are represented by an encoder trained in a self-supervised fashion using pairs of adjacent segments assumed to be of the same speaker.
arXiv Detail & Related papers (2022-04-08T16:27:14Z) - Counterfactual Cycle-Consistent Learning for Instruction Following and
Generation in Vision-Language Navigation [172.15808300686584]
We describe an approach that learns the two tasks simultaneously and exploits their intrinsic correlations to boost the training of each.
Our approach improves the performance of various follower models and produces accurate navigation instructions.
arXiv Detail & Related papers (2022-03-30T18:15:26Z) - Layer-wise Analysis of a Self-supervised Speech Representation Model [26.727775920272205]
Self-supervised learning approaches have been successful for pre-training speech representation models.
Not much has been studied about the type or extent of information encoded in the pre-trained representations themselves.
arXiv Detail & Related papers (2021-07-10T02:13:25Z) - Self-supervised Text-independent Speaker Verification using Prototypical
Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.