Related papers: Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

URL: http://arxiv.org/abs/2406.15718v1
Date: Sat, 22 Jun 2024 03:20:10 GMT
Title: Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
Authors: Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu
Abstract summary: Traditional turn-based chat systems prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to listen to users while generating output and to provide users with instant feedback. We build a dataset that consists of alternating time slices of queries and responses and covers typical feedback types in instantaneous interactions.
Score: 66.24055500785657
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to duplex models so that these LLMs can listen for users while generating output and dynamically adjust themselves to provide users with instant feedback. Specifically, we divide the queries and responses of conversations into several time slices and then adopt a time-division-multiplexing (TDM) encoding-decoding strategy to pseudo-simultaneously process these slices. Furthermore, to make LLMs proficient enough to handle real-time conversations, we build a fine-tuning dataset consisting of alternating time slices of queries and responses as well as covering typical feedback types in instantaneous interactions. Our experiments show that although the queries and responses of conversations are segmented into incomplete slices for processing, LLMs can preserve their original performance on standard benchmarks with a few fine-tuning steps on our dataset. Automatic and human evaluation indicate that duplex models make user-AI interactions more natural and human-like, and greatly improve user satisfaction compared to vanilla LLMs. Our duplex model and dataset will be released.
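
The slicing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the fixed `slice_len`, the speaker tags, and the use of whitespace-separated words as stand-ins for tokens are all assumptions made for illustration. It only shows how a query and a response might be cut into slices and laid out as one alternating stream, which is the general shape of time-division multiplexing.

```python
from itertools import zip_longest


def split_into_slices(tokens, slice_len):
    """Cut a token list into consecutive slices of at most slice_len tokens."""
    return [tokens[i:i + slice_len] for i in range(0, len(tokens), slice_len)]


def interleave_slices(query_tokens, response_tokens, slice_len=2):
    """Alternate query and response slices, tagging each slice with its speaker.

    Hypothetical helper: a real duplex model would decide at each step
    whether to speak or stay silent; here the two sides simply take turns.
    """
    q_slices = split_into_slices(query_tokens, slice_len)
    r_slices = split_into_slices(response_tokens, slice_len)
    stream = []
    for q, r in zip_longest(q_slices, r_slices):
        if q is not None:
            stream.append(("user", q))
        if r is not None:
            stream.append(("model", r))
    return stream


# Toy data: words stand in for tokens.
query = "what is the capital of France".split()
response = "The capital is Paris".split()
stream = interleave_slices(query, response, slice_len=2)

for speaker, piece in stream:
    print(f"{speaker:>5}: {' '.join(piece)}")
```

Each slice is incomplete on its own, which is why the abstract stresses fine-tuning on data already arranged in this alternating form.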

Related papers

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation [60.164968941945645]
We introduce LLaVA-Reward, an efficient reward model designed to automatically evaluate text-to-image (T2I) generations across multiple perspectives. LLaVA-Reward directly utilizes the hidden states of multimodal large language models (MLLMs). We train LLaVA-Reward on four evaluation perspectives: text-image alignment, fidelity/artifact, safety, and overall ranking.
arXiv Detail & Related papers (2025-07-28T23:52:53Z)
Beyond Single-User Dialogue: Assessing Multi-User Dialogue State Tracking Capabilities of Large Language Models [7.5972186611957815]
Large language models (LLMs) have demonstrated remarkable performance in zero-shot dialogue state tracking (DST). In this study, we assess the robustness of LLMs in multi-user DST while minimizing dataset construction costs.
arXiv Detail & Related papers (2025-06-12T09:04:19Z)
A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
arXiv Detail & Related papers (2025-05-20T09:13:22Z)
LLMs Get Lost In Multi-Turn Conversation [44.26588510453331]
Large Language Models (LLMs) are conversational interfaces. LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, explore, and refine what they need through multi-turn conversational exchange.
arXiv Detail & Related papers (2025-05-09T15:21:44Z)
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale [51.9706400130481]
Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks. PERSONAMEM features curated user profiles with over 180 simulated user-LLM interaction histories. We evaluate LLM chatbots' ability to identify the most suitable response according to the current state of the user's profile.
arXiv Detail & Related papers (2025-04-19T08:16:10Z)
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models [96.64960606650115]
LongHalQA is an LLM-free hallucination benchmark that comprises 6K long and complex hallucination texts. LongHalQA features GPT4V-generated hallucinatory data that are well aligned with real-world scenarios.
arXiv Detail & Related papers (2024-10-13T18:59:58Z)
Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models [0.0]
We introduce a dynamic benchmarking system for conversational agents that evaluates their performance through a single, simulated, and lengthy user interaction. We context switch regularly to interleave the tasks, which constructs a realistic testing scenario in which we assess the Long-Term Memory, Continual Learning, and Information Integration capabilities of the agents.
arXiv Detail & Related papers (2024-09-30T12:01:29Z)
RAD-Bench: Evaluating Large Language Models Capabilities in Retrieval Augmented Dialogues [8.036117602566074]
External retrieval mechanisms are often employed to enhance the quality of augmented generations in dialogues. Existing benchmarks either assess LLMs' chat abilities in multi-turn dialogues or their use of retrieval for augmented responses in single-turn settings. We introduce RAD-Bench, a benchmark designed to evaluate LLMs' capabilities in multi-turn dialogues following retrievals.
arXiv Detail & Related papers (2024-09-19T08:26:45Z)
Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training. Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z)
Hidden in Plain Sight: Exploring Chat History Tampering in Interactive Language Models [12.920884182101142]
Large Language Models (LLMs) have become prevalent in real-world applications, exhibiting impressive text generation performance. To behave interactively, LLM-based chat systems must integrate prior chat history as context into their inputs, following a pre-defined structure. This paper introduces a systematic methodology to inject user-supplied history into LLM conversations without any prior knowledge of the target model.
arXiv Detail & Related papers (2024-05-30T16:36:47Z)
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions [58.57255822646756]
This paper introduces MathChat, a benchmark designed to evaluate large language models (LLMs) across a broader spectrum of mathematical tasks. We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single turn question answering, they significantly underperform in more complex scenarios. We develop MathChat sync, a synthetic dialogue based math dataset for LLM finetuning, focusing on improving models' interaction and instruction following capabilities in conversations.
arXiv Detail & Related papers (2024-05-29T18:45:55Z)
Evaluating Very Long-Term Conversational Memory of LLM Agents [95.84027826745609]
We introduce a machine-human pipeline to generate high-quality, very long-term dialogues. We equip each agent with the capability of sharing and reacting to images. The generated conversations are verified and edited by human annotators for long-range consistency.
arXiv Detail & Related papers (2024-02-27T18:42:31Z)
User-LLM: Efficient LLM Contextualization with User Embeddings [23.226164112909643]
User-LLM is a novel framework that leverages user embeddings to directly contextualize large language models with user history interactions. Our approach achieves significant efficiency gains by representing user timelines directly as embeddings, leading to substantial inference speedups of up to 78.1X.
arXiv Detail & Related papers (2024-02-21T08:03:27Z)
Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions [19.365615476223635]
Conversational question-answering systems aim to create interactive search systems that retrieve information by interacting with users. Existing work uses human annotators to play the roles of the questioner (student) and the answerer (teacher). We propose a simulation framework that employs zero-shot learner LLMs for simulating teacher-student interactions.
arXiv Detail & Related papers (2023-12-05T17:38:02Z)
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks. This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.