PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and
Compositional Experts
- URL: http://arxiv.org/abs/2305.14839v2
- Date: Tue, 13 Jun 2023 06:31:46 GMT
- Title: PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and
Compositional Experts
- Authors: Yunshui Li, Binyuan Hui, ZhiChao Yin, Min Yang, Fei Huang and Yongbin
Li
- Abstract summary: This paper proposes PaCE, a unified, structured, compositional multi-modal dialogue pre-training framework.
It utilizes a combination of several fundamental experts to accommodate multiple dialogue-related tasks and can be pre-trained using limited dialogue and extensive non-dialogue multi-modal data.
Experimental results demonstrate that PaCE achieves state-of-the-art results on eight multi-modal dialogue benchmarks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perceiving multi-modal information and fulfilling dialogues with humans is a
long-term goal of artificial intelligence. Pre-training is commonly regarded as
an effective approach for multi-modal dialogue. However, due to the limited
availability of multi-modal dialogue data, there is still scarce research on
multi-modal dialogue pre-training. Yet another intriguing challenge emerges
from the encompassing nature of multi-modal dialogue, which involves various
modalities and tasks. Moreover, new forms of tasks may arise at unpredictable
points in the future. Hence, it is essential for designed multi-modal dialogue
models to possess sufficient flexibility to adapt to such scenarios. This paper
proposes PaCE, a unified, structured, compositional multi-modal
dialogue pre-training framework. It utilizes a combination of several
fundamental experts to accommodate multiple dialogue-related tasks and can be
pre-trained using limited dialogue and extensive non-dialogue multi-modal data.
Furthermore, we propose a progressive training method where old experts from
the past can assist new experts, facilitating the expansion of their
capabilities. Experimental results demonstrate that PaCE achieves
state-of-the-art results on eight multi-modal dialogue benchmarks.
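The compositional-experts and progressive-training ideas in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the expert names, the additive combination, and the scalar "weights" below are all illustrative stand-ins for the actual modality/task experts and transformer layers.

```python
class Expert:
    """Tiny stand-in for one fundamental expert (e.g. a text or vision expert)."""
    def __init__(self, name, weight=1.0):
        self.name = name
        self.weight = weight        # stand-in for trainable parameters
        self.frozen = False         # frozen experts no longer update

    def forward(self, x):
        return self.weight * x      # placeholder for a real expert's computation

    def update(self, delta):
        if not self.frozen:         # progressive training: old experts are
            self.weight += delta    # frozen and only assist the new ones

class PaCEModel:
    """Composes a subset of fundamental experts for each downstream task."""
    def __init__(self):
        self.experts = {}

    def add_expert(self, name):
        # Freeze all existing experts before introducing a new one, so
        # earlier-stage knowledge assists later stages without being overwritten.
        for e in self.experts.values():
            e.frozen = True
        self.experts[name] = Expert(name)

    def forward(self, x, active):
        # Combine the outputs of the experts selected for this task.
        return sum(self.experts[n].forward(x) for n in active)

model = PaCEModel()
model.add_expert("text")
model.experts["text"].update(0.5)   # stage 1: train the text expert
model.add_expert("vision")          # stage 2: text expert is now frozen
model.experts["text"].update(0.5)   # no effect: frozen experts don't change
model.experts["vision"].update(-0.2)
out = model.forward(2.0, active=["text", "vision"])
```

The point of the sketch is the staging: once a new expert is added, the earlier experts still contribute to the composed forward pass but are no longer updated, which is how "old experts from the past can assist new experts" without catastrophic interference.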
Related papers
- DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
arXiv Detail & Related papers (2024-01-02T07:40:12Z)
- Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
- Multimodal Dialogue Response Generation [27.611204319057393]
We present a multimodal dialogue generation model that takes the dialogue history as input and generates a textual sequence or an image as the response.
We consider multimodal dialogue generation under a natural assumption that only limited training examples are available.
In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire model.
arXiv Detail & Related papers (2021-10-16T08:52:26Z)
- Advances in Multi-turn Dialogue Comprehension: A Survey [51.215629336320305]
Training machines to understand natural language and interact with humans is an elusive and essential task of artificial intelligence.
This paper reviews the previous methods from the technical perspective of dialogue modeling for the dialogue comprehension task.
In addition, we categorize dialogue-related pre-training techniques which are employed to enhance PrLMs in dialogue scenarios.
arXiv Detail & Related papers (2021-10-11T03:52:37Z)
- DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization [19.918194137007653]
We present a pre-training framework for long dialogue understanding and summarization.
Considering the nature of long conversations, we propose a window-based denoising approach for generative pre-training.
We conduct extensive experiments on five datasets of long dialogues, covering tasks of dialogue summarization, abstractive question answering and topic segmentation.
arXiv Detail & Related papers (2021-09-06T13:55:03Z)
- Dialogue-oriented Pre-training [70.03028879331339]
We propose three strategies to simulate the conversation features on general plain text.
Dialog-PrLM is fine-tuned on three public multi-turn dialogue datasets.
arXiv Detail & Related papers (2021-06-01T12:02:46Z)
- Emora STDM: A Versatile Framework for Innovative Dialogue System Development [17.14709845342071]
Emora STDM is a dialogue system development framework that provides novel support for rapid prototyping of chat-based dialogue managers.
Our framework caters to a wide range of expertise levels by supporting interoperability between two popular approaches to dialogue management: state machine and information state.
arXiv Detail & Related papers (2020-06-11T01:31:17Z)
- Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning [50.5572111079898]
Multi-role dialogue understanding comprises a wide range of diverse tasks such as question answering, act classification, dialogue summarization etc.
While dialogue corpora are abundantly available, labeled data, for specific learning tasks, can be highly scarce and expensive.
In this work, we investigate dialogue context representation learning with various types of unsupervised pretraining tasks.
arXiv Detail & Related papers (2020-02-27T04:36:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.