Related papers: Navigating the Synchrony-Stability Frontier in Adaptive Chatbots

URL: http://arxiv.org/abs/2510.00339v1
Date: Tue, 30 Sep 2025 22:50:30 GMT
Title: Navigating the Synchrony-Stability Frontier in Adaptive Chatbots
Authors: T. James Brandt
Abstract summary: We present a computational evaluation framework that makes the core design tension explicit. We simulate and compare explicit adaptation policies on a human-log dataset. We find bounded policies achieve substantial gains in stability at a modest cost to synchrony. We quantify "prompt legibility," showing that frontier policies reduce instruction churn and cut jarring register flips.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Adaptive chatbots that mimic a user's linguistic style can build rapport and engagement, yet unconstrained mimicry risks an agent that feels unstable or sycophantic. We present a computational evaluation framework that makes the core design tension explicit: balancing moment-to-moment linguistic synchrony against long-term persona stability. Using an 8-dimensional style vector and a closed-loop "base+delta" prompting architecture, we simulate and compare explicit adaptation policies - Uncapped, Cap, Exponential Moving Average (EMA), Dead-Band, and Hybrids - on a human-log dataset. Our analysis maps a clear Pareto frontier: bounded policies achieve substantial gains in stability at a modest cost to synchrony. For example, a Hybrid (EMA+Cap) raises stability from 0.542 to 0.878 (+62%) while reducing synchrony by only 17%. We confirm this trade-off through large-scale replications on three public corpora (DailyDialog, Persona-Chat, EmpatheticDialogues) and LLM-in-the-loop validation across two model families. Furthermore, we quantify "prompt legibility," showing that frontier policies reduce instruction churn and cut jarring register flips (major tone changes) from 0.254 to 0.092, yielding systems that are easier to reason about and maintain. Taken together, our framework provides a general evaluation harness for style adaptation; a systematic ablation that identifies Pareto-efficient policies; robust validation across diverse datasets and models; and novel legibility metrics linking policy choices to system maintainability.
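The abstract names the adaptation policies but does not give their update rules. The sketch below is one plausible reading, not the paper's implementation. It assumes that each policy maps the user's observed style offset (relative to the base persona) to a new per-turn "delta". All function names, the parameter values (cap=0.2, alpha=0.3, band=0.3), and the synchrony/stability proxy metrics are hypothetical. The data is synthetic rather than a human-log dataset.

```python
import random
from typing import Callable, Dict, List, Tuple

Vec = List[float]
# Hypothetical signature: (user's observed style offset, previous delta) -> new delta.
Policy = Callable[[Vec, Vec], Vec]
DIM = 8  # 8-dimensional style vector, as in the abstract; dimension meanings not modeled


def clip(v: Vec, cap: float) -> Vec:
    """Clamp every component of v to [-cap, cap]."""
    return [max(-cap, min(cap, x)) for x in v]


def uncapped(target: Vec, prev: Vec) -> Vec:
    # Full mimicry: the delta jumps straight to the user's current style offset.
    return list(target)


def make_cap(cap: float) -> Policy:
    # Cap (assumed form): follow the user, but never move further than `cap` from the base persona.
    return lambda target, prev: clip(target, cap)


def make_ema(alpha: float) -> Policy:
    # EMA: move only a fraction `alpha` of the way toward the user each turn.
    return lambda target, prev: [alpha * t + (1 - alpha) * p for t, p in zip(target, prev)]


def make_dead_band(band: float) -> Policy:
    # Dead-Band (assumed form): ignore per-dimension changes no larger than `band`.
    return lambda target, prev: [t if abs(t - p) > band else p for t, p in zip(target, prev)]


def make_hybrid(alpha: float, cap: float) -> Policy:
    # Hybrid (EMA+Cap): smooth first, then bound the result.
    ema = make_ema(alpha)
    return lambda target, prev: clip(ema(target, prev), cap)


def simulate(policy: Policy, user_offsets: List[Vec]) -> List[Vec]:
    """Apply a policy turn by turn, starting from the unmodified base persona (delta = 0)."""
    prev = [0.0] * DIM
    deltas = []
    for target in user_offsets:
        prev = policy(target, prev)
        deltas.append(prev)
    return deltas


def mean_abs(a: Vec, b: Vec) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def synchrony(deltas: List[Vec], user_offsets: List[Vec]) -> float:
    # Proxy metric (assumption): 1 - mean gap between agent style and user style.
    return 1 - sum(mean_abs(d, u) for d, u in zip(deltas, user_offsets)) / len(deltas)


def stability(deltas: List[Vec]) -> float:
    # Proxy metric (assumption): 1 - mean turn-to-turn change in the agent's own style.
    steps = [mean_abs(a, b) for a, b in zip(deltas, deltas[1:])]
    return 1 - sum(steps) / len(steps)


def run_all(seed: int = 0, turns: int = 200) -> Dict[str, Tuple[float, float]]:
    rng = random.Random(seed)
    # Synthetic user style offsets in [-0.5, 0.5]; a stand-in for real conversation logs.
    offsets = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(turns)]
    policies: Dict[str, Policy] = {
        "Uncapped": uncapped,
        "Cap": make_cap(0.2),
        "EMA": make_ema(0.3),
        "Dead-Band": make_dead_band(0.3),
        "Hybrid (EMA+Cap)": make_hybrid(0.3, 0.2),
    }
    results = {}
    for name, policy in policies.items():
        deltas = simulate(policy, offsets)
        results[name] = (synchrony(deltas, offsets), stability(deltas))
    return results


if __name__ == "__main__":
    for name, (sync, stab) in run_all().items():
        print(f"{name:18s} synchrony={sync:.3f} stability={stab:.3f}")
```

On these synthetic offsets, Uncapped scores a perfect synchrony proxy. The bounded policies (Cap, EMA, Hybrid) trade some synchrony for a higher stability proxy. This mirrors the qualitative frontier the abstract describes. The proxy values are not comparable to the paper's reported figures (e.g., 0.542 to 0.878 stability).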

Related papers

Words & Weights: Streamlining Multi-Turn Interactions via Co-Adaptation
Test-time policy adaptation for multi-turn interactions (T2PAM) is essential for aligning Large Language Models (LLMs) with dynamic user needs at inference time. We propose ROSA2, a framework that reformulates interaction as a joint optimization problem over the heterogeneous space of Words and Weights.
arXiv (2026-03-02T02:16:20Z)
Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction
Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated.
arXiv (2026-02-26T18:20:26Z)
Communication Enhances LLMs' Stability in Strategic Thinking
We evaluate whether short, costless pre-play messages emulating the cheap-talk paradigm affect strategic stability. We demonstrate consistent reductions in trajectory noise across a majority of the model-context pairings being studied. While communication rarely produces harmful instability, we document a few context-specific exceptions and identify the limited domains in which communication harms stability.
arXiv (2026-02-04T17:12:52Z)
AdaptNC: Adaptive Nonconformity Scores for Uncertainty-Aware Autonomous Systems in Dynamic Environments
Conformal Prediction methods maintain target coverage by adaptively scaling the conformal threshold. We show that keeping the nonconformity score's geometry fixed leads to highly conservative, volume-inefficient prediction regions when environments undergo structural shifts. We propose AdaptNC, a framework for the joint online adaptation of both the nonconformity score parameters and the conformal threshold.
arXiv (2026-02-02T04:41:35Z)
Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv (2026-02-01T12:56:10Z)
FLARE: Adaptive Multi-Dimensional Reputation for Robust Client Reliability in Federated Learning
Federated learning (FL) enables collaborative model training while preserving data privacy. It remains vulnerable to malicious clients who compromise model integrity through Byzantine attacks, data poisoning, or adaptive adversarial behaviors. We propose FLARE, an adaptive reputation-based framework that transforms client reliability assessment from binary decisions to a continuous, multi-dimensional trust evaluation.
arXiv (2025-11-18T17:57:40Z)
CO-PFL: Contribution-Oriented Personalized Federated Learning for Heterogeneous Networks
Contribution-Oriented PFL (CO-PFL) is a novel algorithm that dynamically estimates each client's contribution for global aggregation. CO-PFL consistently surpasses state-of-the-art methods in personalization accuracy, robustness, scalability, and convergence stability.
arXiv (2025-10-23T05:10:06Z)
SPACeR: Self-Play Anchoring with Centralized Reference Models
Sim agent policies need to be realistic, human-like, fast, and scalable in multi-agent settings. Recent progress in imitation learning with large diffusion-based or tokenized models has shown that behaviors can be captured directly from human driving data. We propose SPACeR, a framework that leverages a pretrained tokenized autoregressive motion model as a central reference policy.
arXiv (2025-10-20T19:53:02Z)
Drift No More? Context Equilibria in Multi-Turn LLM Interactions
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv (2025-10-09T04:48:49Z)
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. We introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining.
arXiv (2025-09-24T13:27:35Z)
Prompt Stability in Code LLMs: Measuring Sensitivity across Emotion- and Personality-Driven Variations
We present PromptSE, a framework that creates semantically equivalent prompt variants with emotion and personality templates. Our study shows that performance and stability behave as largely decoupled optimization objectives. PromptSE enables practitioners to quantify performance-stability trade-offs for deployment and model selection.
arXiv (2025-09-17T04:17:42Z)
STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. We develop anchored reinforcement training, a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv (2025-08-26T08:47:58Z)
FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning
Federated Learning (FL) offers a paradigm for privacy-preserving collaborative AI, but its decentralized nature creates significant vulnerabilities to model poisoning attacks. This paper introduces FedStrategist, a novel meta-learning framework that reframes robust aggregation as a real-time, cost-aware control problem.
arXiv (2025-07-18T18:53:26Z)
Self-Play Preference Optimization for Language Model Alignment
Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences. We propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game. Our approach, dubbed Self-Play Preference Optimization (SPPO), utilizes iterative policy updates to provably approximate the Nash equilibrium.
arXiv (2024-05-01T17:59:20Z)
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders. We provide sample complexity bounds and insights, and we demonstrate effectiveness both in simulations and on real-world longitudinal healthcare data on sepsis treatment.
arXiv (2023-02-01T18:40:53Z)
Sampling, Communication, and Prediction Co-Design for Synchronizing the Real-World Device and Digital Model in Metaverse
We develop a constrained Deep Reinforcement Learning (DRL) algorithm, named Knowledge-assisted Constrained Twin-Delayed Deep Deterministic (KC-TD3) policy gradient algorithm. We validate our framework on a prototype composed of a real-world robotic arm and its digital model.
arXiv (2022-07-31T20:17:31Z)
Higher Performance Visual Tracking with Dual-Modal Localization
Visual Object Tracking (VOT) has simultaneous requirements for both robustness and accuracy. We propose a dual-modal framework for target localization, consisting of robust localization that suppresses distractors via ONR and accurate localization that attends precisely to the target center via OFC.
arXiv (2021-03-18T08:47:56Z)
Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control. We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv (2019-12-31T03:01:55Z)
