Multi-Party Supervised Fine-tuning of Language Models for Multi-Party Dialogue Generation
- URL: http://arxiv.org/abs/2412.05342v2
- Date: Wed, 18 Dec 2024 09:47:53 GMT
- Title: Multi-Party Supervised Fine-tuning of Language Models for Multi-Party Dialogue Generation
- Authors: Xiaoyu Wang, Ningyuan Xi, Teng Chen, Qingqing Gu, Yue Zhao, Xiaokai Chen, Zhonglin Jiang, Yong Chen, Luo Ji,
- Abstract summary: Large Language Models (LLM) are usually fine-tuned to participate in dyadic or two-party dialogues.
In this work, we design a multi-party fine-tuning framework (MuPaS) for LLMs on the multi-party dialogue datasets.
- Score: 11.340007143339657
- License:
- Abstract: Large Language Models (LLM) are usually fine-tuned to participate in dyadic or two-party dialogues, which can not adapt well to multi-party dialogues (MPD), which hinders their applications in such scenarios including multi-personal meetings, discussions and daily communication. Previous LLM-based researches mainly focus on the multi-agent framework, while their base LLMs are still pairwisely fine-tuned. In this work, we design a multi-party fine-tuning framework (MuPaS) for LLMs on the multi-party dialogue datasets, and prove such a straightforward framework can let the LLM align with the multi-party conversation style efficiently and effectively. We also design two training strategies which can convert MuPaS into the MPD simulator. Substantial experiments show that MuPaS can achieve state-of-the-art multi-party response, higher accuracy of the-next-speaker prediction, higher human and automatic evaluated utterance qualities, and can even generate reasonably with out-of-distribution scene, topic and role descriptions. The MuPaS framework bridges the LLM training with more complicated multi-party applications, such as conversation generation, virtual rehearsal or meta-universe.
Related papers
- Don't Stop the Multi-Party! On Generating Synthetic Multi-Party Conversations with Constraints [11.566214724241798]
Multi-Party Conversations (MPCs) are widely studied across disciplines, with social media as a primary data source due to their accessibility.
This work explores the feasibility of generating diverse MPCs with instruction-tuned Large Language Models.
arXiv Detail & Related papers (2025-02-19T10:10:43Z) - Can xLLMs Understand the Structure of Dialog? Exploring Multilingual Response Generation in Complex Scenarios [8.131774353504472]
We introduce XMP, a high-quality parallel Multilingual dataset sourced from Multi-party Podcast dialogues.
Each sample in the dataset features at least three participants discussing a wide range of topics, including society, culture, politics, and entertainment.
We uncover significant limitations in previously recognized multilingual capabilities of LLMs when applied to such complex dialogue scenarios.
arXiv Detail & Related papers (2025-01-20T04:33:03Z) - OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation [53.7173034249361]
End-to-end GPT-based model OmniFlatten capable of effectively modeling complex behaviors inherent natural conversations with low latency.
Our approach offers a simple modeling technique and a promising research direction for developing efficient and natural end-to-end full- spoken dialogue systems.
arXiv Detail & Related papers (2024-10-23T11:58:58Z) - Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions [68.98811048970963]
We present a pioneering effort to investigate the capability of large language models (LLMs) in transcribing speech in multi-talker environments.
Our approach utilizes WavLM and Whisper encoder to extract multi-faceted speech representations that are sensitive to speaker characteristics and semantic context.
Comprehensive experiments reveal the promising performance of our proposed system, MT-LLM, in cocktail party scenarios.
arXiv Detail & Related papers (2024-09-13T07:28:28Z) - DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
arXiv Detail & Related papers (2024-01-02T07:40:12Z) - Plug-and-Play Policy Planner for Large Language Model Powered Dialogue
Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z) - Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and
Text Integration [50.94902442781148]
We propose a novel multi-modal large language model (LLM) that seamlessly integrates visual, audio, and textual information.
Macaw-LLM consists of three main components: a modality module for encoding multi-modal data, a cognitive module for harnessing pretrained LLMs, and an alignment module for harmonizing diverse representations.
We construct a large-scale multi-modal instruction dataset in terms of multi-turn dialogue, including 69K image instances and 50K video instances.
arXiv Detail & Related papers (2023-06-15T12:45:25Z) - Dialogue-oriented Pre-training [70.03028879331339]
We propose three strategies to simulate the conversation features on general plain text.
Dialog-PrLM is fine-tuned on three public multi-turn dialogue datasets.
arXiv Detail & Related papers (2021-06-01T12:02:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.