Related papers: Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

Related papers

F-Actor: Controllable Conversational Behaviour in Full-Duplex Models [70.48189107402145]
We present first open, instruction-following full-stage conversational speech model that can be trained efficiently under typical academic resource constraints.<n>Our model requires just 2,000 hours of data, without relying on large-scale pretraining or multi-stage pretraining.<n>Both the model and training code will be released to enable reproducible research on controllable full-like controllable full-stage speech systems.
arXiv Detail & Related papers (2026-01-16T14:25:57Z)
One Battle After Another: Probing LLMs' Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework [51.50565654314582]
Large language models can follow users' instructions throughout a dialogue spanning multiple topics.<n>Existing benchmarks are often limited to a fixed number of turns, making them susceptible to saturation and failing to account for the user's interactive experience.<n>We propose a framework for assessing multi-turn instruction-following ability.
arXiv Detail & Related papers (2025-11-05T14:39:59Z)
BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs [84.59993864748195]
We propose a new paradigm inspired by operationalism'' that decouples instruction understanding from speech generation.<n>We introduce BatonVoice, a framework where an LLM acts as a conductor'', understanding user instructions.<n>A separate TTS model, the orchestra'', then generates the speech from these features.
arXiv Detail & Related papers (2025-09-30T16:52:14Z)
Do What? Teaching Vision-Language-Action Models to Reject the Impossible [53.40183895299108]
Vision-Language-Action (VLA) models have demonstrated strong performance on a range of robotic tasks.<n>We propose Instruct-Verify-and-Act (IVA), a framework that detects when an instruction cannot be executed due to a false premise.<n>Our experiments show that IVA improves false premise detection accuracy by 97.56% over baselines.
arXiv Detail & Related papers (2025-08-22T10:54:33Z)
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems [48.42417538526542]
Text-to-Speech systems rely on fixed style labels or inserting a speech prompt to control these cues.<n>Recent attempts seek to employ natural-language instructions to modulate paralinguistic features.<n>InstructTTSEval is a benchmark for measuring the capability of complex natural-language style control.
arXiv Detail & Related papers (2025-06-19T15:08:01Z)
Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules [19.673388630963807]
We formalise the challenge as span-conditioned generation, decomposing each turn into the dialogue history.<n>We introduce a quotation-centric data pipeline that automatically synthesises task-specific dialogues.<n>We propose QuAda, a lightweight training-based method that attaches two bottleneck projections to every attention head.
arXiv Detail & Related papers (2025-05-30T07:06:11Z)
Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities [93.09944267871163]
FullDuplexBench is a benchmark that systematically evaluates key conversational behaviors. We aim to advance spoken dialogue modeling and encourage the development of more interactive and natural dialogue systems.
arXiv Detail & Related papers (2025-03-06T18:59:16Z)
Seq2Seq Model-Based Chatbot with LSTM and Attention Mechanism for Enhanced User Interaction [1.937324318931008]
This work proposes a Sequence-to-Sequence (Seq2Seq) model with an encoder-decoder architecture that incorporates attention mechanisms and Long Short-Term Memory (LSTM) cells. The proposed Seq2Seq model-based robot is trained, validated, and tested on a dataset specifically for the tourism sector in Draa-Tafilalet, Morocco.
arXiv Detail & Related papers (2024-12-27T23:50:54Z)
Prompt Engineering a Schizophrenia Chatbot: Utilizing a Multi-Agent Approach for Enhanced Compliance with Prompt Instructions [0.0699049312989311]
Patients with schizophrenia often present with cognitive impairments that may hinder their ability to learn about their condition. While Large Language Models (LLMs) have the potential to make topical mental health information more accessible and engaging, their black-box nature raises concerns about ethics and safety.
arXiv Detail & Related papers (2024-10-10T09:49:24Z)
Modeling Real-Time Interactive Conversations as Timed Diarized Transcripts [11.067252960486272]
We present a simple yet general method to simulate real-time interactive conversations using pretrained language models. We demonstrate the promise of this method with two case studies: instant messenger dialogues and spoken conversations.
arXiv Detail & Related papers (2024-05-21T21:14:31Z)
Dialogue-based generation of self-driving simulation scenarios using Large Language Models [14.86435467709869]
Simulation is an invaluable tool for developing and evaluating controllers for self-driving cars. Current simulation frameworks are driven by highly-specialist domain specific languages. There is often a gap between a concise English utterance and the executable code that captures the user's intent.
arXiv Detail & Related papers (2023-10-26T13:07:01Z)
Multi-Purpose NLP Chatbot : Design, Methodology & Conclusion [0.0]
This research paper provides a thorough analysis of the chatbots technology environment as it exists today. It provides a very flexible system that makes use of reinforcement learning strategies to improve user interactions and conversational experiences. The complexity of chatbots technology development is also explored in this study, along with the causes that have propelled these developments and their far-reaching effects on a range of sectors.
arXiv Detail & Related papers (2023-10-13T09:47:24Z)
Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [59.74002011562726]
We propose a novel linguistic cue-based chain-of-thoughts (textitCue-CoT) to provide a more personalized and engaging response. We build a benchmark with in-depth dialogue questions, consisting of 6 datasets in both Chinese and English. Empirical results demonstrate our proposed textitCue-CoT method outperforms standard prompting methods in terms of both textithelpfulness and textitacceptability on all datasets.
arXiv Detail & Related papers (2023-05-19T16:27:43Z)
Controllable Mixed-Initiative Dialogue Generation through Prompting [50.03458333265885]
Mixed-initiative dialogue tasks involve repeated exchanges of information and conversational control. Agents gain control by generating responses that follow particular dialogue intents or strategies, prescribed by a policy planner. Standard approach has been fine-tuning pre-trained language models to perform generation conditioned on these intents. We instead prompt large language models as a drop-in replacement to fine-tuning on conditional generation.
arXiv Detail & Related papers (2023-05-06T23:11:25Z)
Towards Robust Online Dialogue Response Generation [62.99904593650087]
We argue that this can be caused by a discrepancy between training and real-world testing. We propose a hierarchical sampling-based method consisting of both utterance-level sampling and semi-utterance-level sampling.
arXiv Detail & Related papers (2022-03-07T06:51:41Z)
Auto-tagging of Short Conversational Sentences using Natural Language Processing Methods [0.0]
We manually tagged approximately 14 thousand visitor inputs into ten basic categories. We considered three different state-of-the-art models and reported their auto-tagging capabilities. Implementation of the models used in these experiments can be cloned from our GitHub repository and tested for similar auto-tagging problems without much effort.
arXiv Detail & Related papers (2021-06-09T10:14:05Z)
Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions. Our framework included a guiding robot and an interlocutor model that plays the role of humans. We examined our framework using three experimental setups and evaluate the guiding robot with four different metrics to demonstrated flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they suffer from discrepancy between the two stages. We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.