Multimodal Sequential Generative Models for Semi-Supervised Language
Instruction Following
- URL: http://arxiv.org/abs/2301.00676v1
- Date: Thu, 29 Dec 2022 03:23:43 GMT
- Title: Multimodal Sequential Generative Models for Semi-Supervised Language
Instruction Following
- Authors: Kei Akuzawa, Yusuke Iwasawa, Yutaka Matsuo
- Abstract summary: This paper proposes using multimodal generative models for semi-supervised learning in instruction-following tasks.
The models learn a shared representation of the paired data, and enable semi-supervised learning by reconstructing unpaired data.
Experiments on BabyAI and Room-to-Room environments show that the proposed method improves the performance of instruction following by leveraging unpaired data.
- Score: 26.386772715777223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents that can follow language instructions are expected to be useful in a
variety of situations such as navigation. However, training neural
network-based agents requires numerous paired trajectories and languages. This
paper proposes using multimodal generative models for semi-supervised learning
in instruction-following tasks. The models learn a shared representation of
the paired data, and enable semi-supervised learning by reconstructing unpaired
data through the representation. Key challenges in applying the models to
sequence-to-sequence tasks, including instruction following, are learning a
shared representation of variable-length multimodal data and incorporating
attention mechanisms. To address these problems, this paper proposes a novel
network architecture to absorb the difference in the sequence lengths of the
multimodal data. In addition, to further improve the performance, this paper
shows how to incorporate the generative model-based approach with an existing
semi-supervised method called a speaker-follower model, and proposes a
regularization term that improves inference using unpaired trajectories.
Experiments on BabyAI and Room-to-Room (R2R) environments show that the
proposed method improves the performance of instruction following by leveraging
unpaired data, and improves the performance of the speaker-follower model by
2% to 4% in R2R.
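To make the shared-representation idea concrete, here is a minimal sketch: two sequence encoders map trajectories and instructions into one latent space, paired examples align the two encodings and reconstruct through the latent, and unpaired trajectories still contribute a reconstruction loss. Everything below (the GRU encoders, the one-step decoder repeated over time, the specific losses) is a simplified assumption for illustration, not the paper's actual architecture, which also handles attention over variable-length sequences.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedLatentModel(nn.Module):
    """Two modality encoders feeding one latent space; reconstruction through
    that space is what lets unpaired examples contribute to training."""
    def __init__(self, traj_dim=8, vocab=100, hidden=64, latent=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.traj_enc = nn.GRU(traj_dim, hidden, batch_first=True)
        self.lang_enc = nn.GRU(hidden, hidden, batch_first=True)
        self.to_z = nn.Linear(hidden, latent)        # shared representation
        self.traj_dec = nn.Linear(latent, traj_dim)  # one-step decoder, repeated over time

    def z_from_traj(self, traj):                     # traj: (B, T, traj_dim)
        _, h = self.traj_enc(traj)
        return self.to_z(h[-1])

    def z_from_lang(self, tokens):                   # tokens: (B, L) int ids
        _, h = self.lang_enc(self.embed(tokens))
        return self.to_z(h[-1])

def paired_loss(model, traj, tokens):
    # paired data: pull both encodings together and reconstruct the trajectory
    z = model.z_from_lang(tokens)
    traj_hat = model.traj_dec(z).unsqueeze(1).expand_as(traj)
    return F.mse_loss(traj_hat, traj) + F.mse_loss(model.z_from_traj(traj), z.detach())

def unpaired_loss(model, traj):
    # unpaired trajectory: reconstruct it through the shared latent alone
    z = model.z_from_traj(traj)
    traj_hat = model.traj_dec(z).unsqueeze(1).expand_as(traj)
    return F.mse_loss(traj_hat, traj)

model = SharedLatentModel()
loss = paired_loss(model, torch.randn(4, 10, 8), torch.randint(0, 100, (4, 7)))
loss = loss + unpaired_loss(model, torch.randn(4, 10, 8))
loss.backward()
```

The unpaired term is the part that turns extra trajectories without annotations into usable training signal.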
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- An Active Learning Framework for Inclusive Generation by Large Language Models [32.16984263644299]
Large Language Models (LLMs) generate text representative of diverse sub-populations.
We propose a novel clustering-based active learning framework, enhanced with knowledge distillation.
We construct two new datasets in tandem with model training, showing a performance improvement of 2%-10% over baseline models.
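As a rough illustration of what "clustering-based active learning" can mean in practice, the sketch below clusters an unlabeled embedding pool and picks one representative per cluster to send for labeling. It is a generic reading of the blurb, assuming nothing about the paper's actual framework or its knowledge-distillation step.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_for_labeling(embeddings, budget):
    """Cluster the unlabeled pool, then pick the example nearest each
    centroid as the representative to send for annotation."""
    km = KMeans(n_clusters=budget, n_init=10, random_state=0).fit(embeddings)
    picks = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        picks.append(members[np.argmin(dists)])   # closest to centroid
    return np.array(picks)

pool = np.random.randn(1000, 32)                  # stand-in for text embeddings
print(select_for_labeling(pool, budget=10))
```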
arXiv Detail & Related papers (2024-10-17T15:09:35Z)
- Controlled Training Data Generation with Diffusion Models [48.123126522294015]
We present a method to control a text-to-image generative model to produce training data specifically "useful" for supervised learning.
We develop an automated closed-loop system which involves two feedback mechanisms.
arXiv Detail & Related papers (2024-03-22T15:59:24Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
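PETAL's "mode approximation" technique is not spelled out in the blurb, so the sketch below shows a generic low-rank adapter in the same parameter-efficient spirit: the backbone layer is frozen and only a small low-rank update is trained. The class name and rank are illustrative assumptions, not PETAL's method.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Wrap a frozen linear layer; train only a small low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # freeze backbone weights
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)               # start as a zero update

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

layer = LowRankAdapter(nn.Linear(512, 512), rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.3%}")
```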
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
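For readers unfamiliar with diffusion-based planning, the following is a bare DDPM-style sampling loop over a flattened toy "trajectory". The untrained MLP denoiser and noise schedule are placeholders; MTDiff's Transformer backbone, prompts, and task conditioning are not modeled here.

```python
import torch
import torch.nn as nn

# toy noise predictor over a flattened 32-dim trajectory plus a time scalar
denoiser = nn.Sequential(nn.Linear(33, 64), nn.ReLU(), nn.Linear(64, 32))

betas = torch.linspace(1e-4, 0.02, 100)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_plan(dim=32):
    x = torch.randn(1, dim)                           # start from pure noise
    for t in reversed(range(100)):
        t_in = torch.full((1, 1), t / 100.0)
        eps = denoiser(torch.cat([x, t_in], dim=1))   # predict the noise
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x

print(sample_plan().shape)   # torch.Size([1, 32])
```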
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Layer-wise Analysis of a Self-supervised Speech Representation Model [26.727775920272205]
Self-supervised learning approaches have been successful for pre-training speech representation models.
Not much has been studied about the type or extent of information encoded in the pre-trained representations themselves.
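A common way to run such a layer-wise analysis is to fit a small probe on each layer's representations and compare scores across layers. The sketch below does this with random stand-in features and labels, which is an assumption, since the paper's probing targets and model are not given here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# toy per-layer representations: layer_reps[l] has shape (n_frames, dim)
rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=500)                  # e.g. phone classes
layer_reps = [rng.standard_normal((500, 64)) for _ in range(6)]

for l, reps in enumerate(layer_reps):
    probe = LogisticRegression(max_iter=1000)
    acc = cross_val_score(probe, reps, labels, cv=3).mean()
    print(f"layer {l}: probe accuracy {acc:.3f}")
```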
arXiv Detail & Related papers (2021-07-10T02:13:25Z)
- Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
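A minimal version of the dictionary-learning idea, assuming simple fixed-size features per clip and caption (the paper's actual relational method is richer): learn sparse codes over concatenated video and text features, so each atom captures a recurring video-text pattern.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

video_feats = np.random.randn(200, 64)          # toy per-clip features
text_feats = np.random.randn(200, 64)           # toy per-caption features
pairs = np.hstack([video_feats, text_feats])    # joint representation

dl = DictionaryLearning(n_components=16, alpha=1.0, random_state=0)
codes = dl.fit_transform(pairs)                 # sparse codes over joint atoms
print(codes.shape)                              # (200, 16)
```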
arXiv Detail & Related papers (2020-11-18T20:21:19Z)
- Mixing Consistent Deep Clustering [3.5786621294068373]
Good latent representations produce semantically mixed outputs when decoding linear interpolations of two latent representations.
We propose the Mixing Consistent Deep Clustering method which encourages representations to appear realistic.
We show that the proposed method can be added to existing autoencoders to further improve clustering performance.
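One simple way to encode such a mixing-consistency constraint in an autoencoder is sketched below: decode an interpolation of two latents, re-encode the result, and penalize drift from the interpolated latent. This is an assumed simplification; the paper's exact losses may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 16))
dec = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 784))

def mixing_loss(x1, x2, alpha=0.5):
    """Decode an interpolation of two latents, re-encode it, and ask the
    re-encoded latent to stay close to the interpolation."""
    z_mix = alpha * enc(x1) + (1 - alpha) * enc(x2)
    x_mix = dec(z_mix)
    return F.mse_loss(enc(x_mix), z_mix.detach())

x1, x2 = torch.randn(8, 784), torch.randn(8, 784)
loss = F.mse_loss(dec(enc(x1)), x1) + mixing_loss(x1, x2)
loss.backward()
```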
arXiv Detail & Related papers (2020-11-03T19:47:06Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
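Dynamic Blocking keeps the decoder from copying the source verbatim by forbidding the token that follows a just-copied source token. The sketch below is a simplified, deterministic variant of that idea on toy token ids; the actual algorithm samples which successors to block, so treat this as an assumption.

```python
def blocked_next_tokens(source_ids, generated_ids):
    """If the last generated token matches a source token, forbid the source
    token that follows it, so the model cannot keep copying verbatim."""
    if not generated_ids:
        return set()
    last = generated_ids[-1]
    return {source_ids[i + 1]
            for i in range(len(source_ids) - 1)
            if source_ids[i] == last}

src = [5, 9, 2, 7]                        # toy token ids of the input sentence
print(blocked_next_tokens(src, [3, 9]))   # {2}: cannot continue the copy "9 2"
```

At each decoding step, the returned set would be masked out of the model's next-token distribution before sampling.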
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.