DialDefer: A Framework for Detecting and Mitigating LLM Dialogic Deference
- URL: http://arxiv.org/abs/2601.10896v1
- Date: Thu, 15 Jan 2026 22:50:46 GMT
- Title: DialDefer: A Framework for Detecting and Mitigating LLM Dialogic Deference
- Authors: Parisa Rabbani, Priyam Sahoo, Ruben Mathew, Aishee Mondal, Harshita Ketharaman, Nimet Beyza Bozdag, Dilek Hakkani-Tür
- Abstract summary: We show that LLMs acting as third-party judges evaluate identical claims differently depending on framing. We call this dialogic deference and introduce DialDefer, a framework for detecting and mitigating these framing-induced judgment shifts. Our Dialogic Deference Score (DDS) captures directional shifts that aggregate accuracy obscures.
- Score: 6.820756409849046
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: LLMs are increasingly used as third-party judges, yet their reliability when evaluating speakers in dialogue remains poorly understood. We show that LLMs judge identical claims differently depending on framing: the same content elicits different verdicts when presented as a statement to verify ("Is this statement correct?") versus attributed to a speaker ("Is this speaker correct?"). We call this dialogic deference and introduce DialDefer, a framework for detecting and mitigating these framing-induced judgment shifts. Our Dialogic Deference Score (DDS) captures directional shifts that aggregate accuracy obscures. Across nine domains, 3k+ instances, and four models, conversational framing induces large shifts (|DDS| up to 87pp, p < .0001) while accuracy remains stable (<2pp), with effects amplifying 2-4x on naturalistic Reddit conversations. Models can shift toward agreement (deference) or disagreement (skepticism) depending on domain -- the same model ranges from DDS = -53 on graduate-level science to +58 on social judgment. Ablations reveal that human-vs-LLM attribution drives the largest shifts (17.7pp swing), suggesting models treat disagreement with humans as more costly than with AI. Mitigation attempts reduce deference but can over-correct into skepticism, which casts this as a calibration problem beyond accuracy optimization.
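The abstract does not spell out how DDS is computed, but its description (a signed, directional shift that aggregate accuracy hides) suggests a simple paired-framing comparison. The sketch below is an illustrative assumption, not the authors' implementation: the instance fields, function names, and the exact score definition are hypothetical, and the paper's DDS may differ (e.g., per-domain normalization or conditioning on claim correctness).

```python
# Hypothetical sketch of a directional deference score over paired framings.
# All names and the exact definition are assumptions, not the paper's code.
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    claim_is_true: bool      # gold label for the underlying claim
    verdict_statement: bool  # judge verdict under "Is this statement correct?"
    verdict_speaker: bool    # judge verdict under "Is this speaker correct?"


def accuracy(verdicts: List[bool], golds: List[bool]) -> float:
    return sum(v == g for v, g in zip(verdicts, golds)) / len(golds)


def directional_shift(instances: List[Instance]) -> float:
    """Signed shift (percentage points) toward endorsing the claim when it is
    attributed to a speaker rather than framed as a bare statement.
    Positive = deference, negative = skepticism (assumed convention)."""
    endorse_statement = sum(i.verdict_statement for i in instances) / len(instances)
    endorse_speaker = sum(i.verdict_speaker for i in instances) / len(instances)
    return 100.0 * (endorse_speaker - endorse_statement)


if __name__ == "__main__":
    # Toy data: accuracy is identical under both framings (3/4 each),
    # yet the judge endorses far more claims under speaker attribution.
    data = [
        Instance(claim_is_true=True,  verdict_statement=True,  verdict_speaker=True),
        Instance(claim_is_true=True,  verdict_statement=False, verdict_speaker=True),
        Instance(claim_is_true=False, verdict_statement=False, verdict_speaker=True),
        Instance(claim_is_true=False, verdict_statement=False, verdict_speaker=False),
    ]
    golds = [i.claim_is_true for i in data]
    print("accuracy, statement framing:", accuracy([i.verdict_statement for i in data], golds))
    print("accuracy, speaker framing:  ", accuracy([i.verdict_speaker for i in data], golds))
    print("directional shift (pp):     ", directional_shift(data))
```

Under this assumed definition, a positive value matches the deference direction described in the abstract: the judge flips toward agreeing once the same claim is voiced by a speaker, even though aggregate accuracy is unchanged.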
Related papers
- LLM Reasoning Predicts When Models Are Right: Evidence from Coding Classroom Discourse [0.18268488712787334]
Large Language Models (LLMs) are increasingly deployed to automatically label and analyze educational dialogue at scale. We investigate whether reasoning generated by LLMs can be used to predict the correctness of a model's own predictions. We analyze 30,300 teacher utterances from classroom dialogue, each labeled by multiple state-of-the-art LLMs with an instructional move construct and an accompanying reasoning.
arXiv Detail & Related papers (2026-02-10T14:38:13Z) - SpeakerSleuth: Evaluating Large Audio-Language Models as Judges for Multi-turn Speaker Consistency [12.420484491347073]
We present SpeakerSleuth, a benchmark evaluating whether LALMs can reliably judge speaker consistency in multi-turn dialogues. We construct 1,818 human-verified evaluation instances across four diverse datasets spanning synthetic and real speech. We find that models struggle to reliably detect acoustic inconsistencies.
arXiv Detail & Related papers (2026-01-07T15:45:41Z) - From Fact to Judgment: Investigating the Impact of Task Framing on LLM Conviction in Dialogue Systems [8.8953040142657]
We investigate how an LLM's conviction changes when a task is reframed from a direct factual query to a Conversational Judgment Task. We apply pressure in the form of a simple rebuttal ("The previous answer is incorrect.") to both conditions. Our findings show that while some models like GPT-4o-mini exhibit sycophantic tendencies under social framing, others like Llama-8B-Instruct become overly critical.
arXiv Detail & Related papers (2025-11-14T00:55:28Z) - Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL [64.3268313484078]
Large Language Models (LLMs) interact with millions of people worldwide in applications such as customer support, education, and healthcare. Their ability to produce deceptive outputs, whether intentionally or inadvertently, poses significant safety concerns. We investigate the extent to which LLMs engage in deception within dialogue, and propose the belief misalignment metric to quantify deception.
arXiv Detail & Related papers (2025-10-16T05:29:36Z) - Two Heads Are Better Than One: Audio-Visual Speech Error Correction with Dual Hypotheses [71.34350093068473]
This paper introduces a new generative error correction (GER) paradigm for audio-visual speech recognition (AVSR). Our framework, DualHyp, empowers a large language model (LLM) to compose independent N-best hypotheses from separate automatic speech recognition (ASR) and visual speech recognition (VSR) models. Our framework attains up to a 57.7% error-rate gain on the LRS2 benchmark over a standard ASR baseline, whereas single-stream GER approaches achieve only a 10% gain.
arXiv Detail & Related papers (2025-10-15T08:27:16Z) - Joint Detection of Fraud and Concept Drift in Online Conversations with LLM-Assisted Judgment [11.917779863156097]
We propose a two-stage detection framework that first identifies suspicious conversations using a tailored ensemble classification model. To improve the reliability of detection, we incorporate a concept drift analysis step using a One Class Drift Detector (OCDD). When drift is detected, a large language model (LLM) assesses whether the shift indicates fraudulent manipulation or a legitimate topic change.
arXiv Detail & Related papers (2025-05-07T22:30:53Z) - Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling [9.305763502526833]
We propose an accountability model for task-oriented dialogue agents to address user overreliance via friction turns. Our empirical findings demonstrate that the proposed approach not only enables reliable estimation of AI agent errors but also guides the decoder in generating more accurate actions.
arXiv Detail & Related papers (2025-01-17T17:40:12Z) - DebUnc: Improving Large Language Model Agent Communication With Uncertainty Metrics [52.242449026151846]
Multi-agent debates have been introduced to improve the accuracy of Large Language Models (LLMs). We propose DebUnc, a debate framework that uses uncertainty metrics to assess agent confidence.
arXiv Detail & Related papers (2024-07-08T22:15:01Z) - LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models [69.68379406317682]
We introduce a listener-aware finetuning method (LACIE) to calibrate implicit and explicit confidence markers.
We show that LACIE models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener.
We find that training with LACIE results in 47% fewer incorrect answers being accepted while maintaining the same level of acceptance for correct answers.
arXiv Detail & Related papers (2024-05-31T17:16:38Z) - Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? [53.98071556805525]
Neural language models (LMs) can be used to evaluate the truth of factual statements.
They can be queried for statement probabilities, or probed for internal representations of truthfulness.
Past work has found that these two procedures sometimes disagree, and that probes tend to be more accurate than LM outputs.
This has led some researchers to conclude that LMs "lie" or otherwise encode non-cooperative communicative intents.
arXiv Detail & Related papers (2023-11-27T18:59:14Z) - FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances [63.80959552818541]
We propose a white-box steganography-inspired adversarial attack that generates imperceptible perturbations against a speaker identification model.
Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function.
We validate FoolHD with a 250-speaker identification x-vector network, trained using VoxCeleb, in terms of accuracy, success rate, and imperceptibility.
arXiv Detail & Related papers (2020-11-17T07:38:26Z)