Reasoning Stabilization Point: A Training-Time Signal for Stable Evidence and Shortcut Reliance
- URL: http://arxiv.org/abs/2601.11625v1
- Date: Mon, 12 Jan 2026 17:48:05 GMT
- Title: Reasoning Stabilization Point: A Training-Time Signal for Stable Evidence and Shortcut Reliance
- Authors: Sahil Rajesh Dhayalkar
- Abstract summary: We define explanation drift as the epoch-to-epoch change in normalized token attributions on a fixed probe set. RSP is computed from within-run drift dynamics and requires no tuning on out-of-distribution data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning pretrained language models can improve task performance while subtly altering the evidence a model relies on. We propose a training-time interpretability view that tracks token-level attributions across fine-tuning epochs. We define explanation drift as the epoch-to-epoch change in normalized token attributions on a fixed probe set, and introduce the Reasoning Stabilization Point (RSP), the earliest epoch after which drift remains consistently low. RSP is computed from within-run drift dynamics and requires no tuning on out-of-distribution data. Across multiple lightweight transformer classifiers and benchmark classification tasks, drift typically collapses into a low, stable regime early in training, while validation accuracy continues to change only marginally. In a controlled shortcut setting with label-correlated trigger tokens, attribution dynamics expose increasing reliance on the shortcut even when validation accuracy remains competitive. Overall, explanation drift provides a simple, low-cost diagnostic for monitoring how decision evidence evolves during fine-tuning and for selecting checkpoints in a stable-evidence regime.
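To make the two central quantities concrete, here is a minimal sketch of explanation drift and RSP detection. The abstract does not specify the drift metric or the stability rule, so the L1 distance, the threshold `tau`, and the function names below are illustrative assumptions, not the paper's exact definitions.

```python
# Minimal sketch of explanation drift and the Reasoning Stabilization Point (RSP).
# Illustrative assumptions, not the paper's exact definitions: drift is the mean
# L1 change between consecutive epochs' normalized attributions on a fixed probe
# set, and RSP is the earliest epoch after which drift stays below a threshold.
import numpy as np

def normalize_attributions(attr):
    """Normalize per-example token attributions to unit L1 mass."""
    attr = np.abs(attr)
    return attr / (attr.sum(axis=-1, keepdims=True) + 1e-12)

def explanation_drift(attr_prev, attr_curr):
    """Epoch-to-epoch drift: mean L1 change in normalized attributions."""
    a, b = normalize_attributions(attr_prev), normalize_attributions(attr_curr)
    return float(np.abs(a - b).sum(axis=-1).mean())

def reasoning_stabilization_point(drifts, tau=0.05):
    """Earliest epoch after which drift stays below tau for the rest of the run."""
    for t in range(len(drifts)):
        if all(d < tau for d in drifts[t:]):
            return t + 1  # drifts[t] compares epochs t+1 and t+2 (1-indexed)
    return None  # never stabilizes within the run

# Toy run: attributions over 10 epochs that settle toward a fixed pattern.
rng = np.random.default_rng(0)
base = rng.random((32, 16))  # (probe examples, tokens)
attributions = [base + rng.random((32, 16)) / (e + 1) ** 2 for e in range(10)]
drifts = [explanation_drift(a, b) for a, b in zip(attributions, attributions[1:])]
print("drift:", np.round(drifts, 3), "RSP:", reasoning_stabilization_point(drifts))
```

Under this reading, checkpoints at or after the returned epoch lie in the stable-evidence regime the abstract describes.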
Related papers
- Adaptive recurrent flow map operator learning for reaction diffusion dynamics [0.9137554315375919]
We develop an operator learner with adaptive recurrent training (DDOL-ART) using a robust recurrent strategy with lightweight validation milestones. DDOL-ART learns one-step operators that remain stable under long rollouts and generalize zero-shot to strong shifts. It is several-fold faster than a physics-based numerical-loss operator learner (NLOL) under matched settings.
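For context, the sketch below shows the generic recurrent (rollout) training idea such operator learners build on: apply a learned one-step operator repeatedly and penalize the whole trajectory. The architecture, rollout length, and loss are illustrative assumptions, not DDOL-ART's actual design (its adaptive validation milestones are omitted).

```python
# Hedged sketch of recurrent (rollout) training for a one-step flow-map operator.
# All specifics here (MLP operator, K=5 rollout, MSE loss) are illustrative.
import torch

step = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.Tanh(),
                           torch.nn.Linear(128, 64))  # one-step operator u_t -> u_{t+1}
opt = torch.optim.Adam(step.parameters(), lr=1e-3)

def rollout_loss(u0, targets):
    """Apply the one-step operator recurrently and accumulate loss over the rollout."""
    u, loss = u0, 0.0
    for u_true in targets:          # targets: future states u_1..u_K
        u = step(u)                 # recurrent application of the learned operator
        loss = loss + torch.mean((u - u_true) ** 2)
    return loss / len(targets)

u0 = torch.randn(8, 64)                           # batch of initial states
targets = [torch.randn(8, 64) for _ in range(5)]  # K=5 rollout supervision
loss = rollout_loss(u0, targets)
loss.backward()
opt.step()
```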
arXiv Detail & Related papers (2026-02-10T07:33:13Z)
- Rethinking Test-Time Training: Tilting The Latent Distribution For Few-Shot Source-Free Adaptation [3.5808917363708743]
We study test-time adaptation of foundation models for few-shot classification under a completely frozen-model regime. We propose arguably the first training-free inference method that adapts predictions to the new task by performing a change of measure over the latent embedding distribution induced by the encoder.
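A minimal sketch of the general "change of measure" idea in such training-free adaptation: exponentially tilt the frozen model's posterior toward statistics of the few labeled shots. The Gaussian-prototype tilt and the `beta` parameter are assumptions for illustration; the paper's actual measure change may differ.

```python
# Hedged sketch: reweight class posteriors from a frozen encoder with an
# exponential tilt estimated from a few labeled shots (illustrative, not the
# paper's method). p'(y|z) is proportional to p(y|z) * exp(beta * z . mu_y).
import numpy as np

def tilted_predictions(z_query, z_support, y_support, base_logits, beta=1.0):
    """Tilt frozen-model logits toward few-shot class prototypes."""
    classes = np.unique(y_support)
    protos = np.stack([z_support[y_support == c].mean(0) for c in classes])
    logits = base_logits + beta * z_query @ protos.T  # exponential tilt in logit space
    logits -= logits.max(1, keepdims=True)            # stable softmax
    p = np.exp(logits)
    return p / p.sum(1, keepdims=True)

z_s = np.random.randn(10, 32); y_s = np.repeat([0, 1], 5)     # 5-shot, 2 classes
z_q = np.random.randn(4, 32);  base = np.random.randn(4, 2)   # frozen-model logits
print(tilted_predictions(z_q, z_s, y_s, base).round(3))
```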
arXiv Detail & Related papers (2026-02-02T18:17:29Z)
- SteeringTTA: Guiding Diffusion Trajectories for Robust Test-Time-Adaptation [10.159672026403097]
Test-time adaptation (TTA) aims to correct performance degradation of deep models under distribution shifts by updating models or inputs using unlabeled test data. We propose SteeringTTA, an inference-only framework that adapts Feynman-Kac steering to guide diffusion-based input adaptation for classification, with rewards driven by pseudo-labels.
arXiv Detail & Related papers (2025-10-16T12:46:53Z)
- ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
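The residual reparameterization itself is simple; below is a hedged sketch with a constant-velocity inertial reference and illustrative shapes and scaling, not ResAD's exact normalization.

```python
# Hedged sketch of residual trajectory modeling: predict the deviation from an
# inertial (constant-velocity) reference rather than absolute waypoints, then
# add the reference back at decode time. Details here are illustrative.
import numpy as np

def inertial_reference(pos, vel, horizon, dt=0.1):
    """Constant-velocity rollout from the current state: shape (horizon, 2)."""
    steps = np.arange(1, horizon + 1)[:, None] * dt
    return pos[None, :] + steps * vel[None, :]

def decode_trajectory(residual, pos, vel, scale=1.0):
    """Model output is a (normalized) residual; add the inertial reference back."""
    return inertial_reference(pos, vel, len(residual)) + scale * residual

pos, vel = np.array([0.0, 0.0]), np.array([5.0, 0.0])  # ego position and velocity
residual = 0.1 * np.random.randn(8, 2)                 # stand-in for a diffusion-policy output
print(decode_trajectory(residual, pos, vel))
```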
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
- Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z)
- Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting [3.5808917363708743]
M-FISHER is a method for sequential distribution shift detection and stable adaptation in streaming data. For detection, we construct an exponential martingale from non-conformity scores and apply Ville's inequality to obtain time-uniform guarantees on false alarm control. For adaptation, we show that Fisher-preconditioned updates of prompt parameters implement natural gradient descent on the distributional manifold.
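The detection half of this recipe can be illustrated compactly: convert non-conformity scores into conformal p-values, bet on them with a power martingale, and alarm when the wealth crosses 1/alpha, which Ville's inequality turns into a time-uniform false-alarm guarantee. The power-martingale betting function and the Gaussian toy stream are standard illustrative choices, not M-FISHER's exact construction (the Fisher-preconditioned prompt updates are omitted).

```python
# Hedged sketch of martingale-based shift detection with Ville's inequality:
# under no shift, P(sup_t M_t >= 1/alpha) <= alpha for a nonnegative martingale
# with M_0 = 1, so crossing 1/alpha is a time-uniform alarm rule.
import numpy as np

rng = np.random.default_rng(0)
calib = rng.normal(size=500)                 # calibration non-conformity scores

def p_value(score, calib):
    """Conformal p-value: fraction of calibration scores at least as extreme."""
    return (np.sum(calib >= score) + 1) / (len(calib) + 1)

def detect(stream, alpha=0.01, eps=0.5):
    M = 1.0
    for t, s in enumerate(stream):
        M *= eps * p_value(s, calib) ** (eps - 1)   # power-martingale bet, E[.] = 1 under null
        if M >= 1 / alpha:                          # Ville's inequality threshold
            return t
    return None

stream = np.concatenate([rng.normal(size=200), rng.normal(loc=3.0, size=100)])
print("alarm at t =", detect(stream))   # expected soon after the shift at t=200
```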
arXiv Detail & Related papers (2025-10-04T15:31:26Z)
- Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct large models to follow human intent, dataset by dataset. Existing gradient updates can severely degrade performance on previous datasets during the CIT process. We propose a general continual instruction tuning framework to address this challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z)
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between predicted confidence and actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation [67.18144414660681]
We propose a Fast-Slow Test-Time Adaptation (FSTTA) approach for online Vision-and-Language Navigation (VLN).
Our method obtains impressive performance gains on four popular benchmarks.
arXiv Detail & Related papers (2023-11-22T07:47:39Z)
- Generalized Robust Test-Time Adaptation in Continuous Dynamic Scenarios [18.527640606971563]
Test-time adaptation (TTA) adapts pre-trained models to test distributions during the inference phase, using only unlabeled test data streams.
We propose a Generalized Robust Test-Time Adaptation (GRoTTA) method to address this challenging problem effectively.
arXiv Detail & Related papers (2023-10-07T07:13:49Z)
- Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection [3.884530687475798]
A streaming BERT-based sequence tagging model can detect disfluencies in real time.
The model attains state-of-the-art latency and stability scores compared with recent work on incremental disfluency detection.
arXiv Detail & Related papers (2022-05-02T02:13:24Z)
- StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR [46.69852287267763]
We propose a simple alignment-free regularization method, StableEmit, to encourage MoChA to emit tokens earlier.
We show that StableEmit significantly reduces recognition errors and emission latency simultaneously.
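A minimal reading of the "selection probability discount" idea, under the assumption that it simply scales the monotonic-attention emit probability during training so the model compensates by committing to emission earlier; the sigmoid parameterization and lambda value are illustrative.

```python
# Hedged sketch of a selection-probability discount for monotonic attention:
# during training, scale the per-frame "emit" probability by (1 - lam), which
# pushes the model to learn larger probabilities and emit tokens earlier.
import torch

def discounted_selection_prob(energy, lam=0.1, training=True):
    """Monotonic-attention selection probability with a training-time discount."""
    p = torch.sigmoid(energy)          # p(emit the next token at this encoder frame)
    return (1.0 - lam) * p if training else p

energy = torch.randn(4, 20)            # (batch, encoder frames) attention energies
print(discounted_selection_prob(energy).shape)
```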
arXiv Detail & Related papers (2021-07-01T17:49:31Z)
- Optimal Change-Point Detection with Training Sequences in the Large and Moderate Deviations Regimes [72.68201611113673]
This paper investigates a novel offline change-point detection problem from an information-theoretic perspective.
We assume that the underlying pre- and post-change distributions are unknown and can only be learned from the available training sequences.
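A hedged sketch of this learned-distributions setting: fit plug-in models to the two training sequences, then choose the split of the test sequence that maximizes total log-likelihood. Gaussian models and maximum-likelihood scanning are illustrative choices, not the paper's information-theoretic analysis.

```python
# Hedged sketch of offline change-point detection with learned pre-/post-change
# models: fit Gaussians to the training sequences, then scan all split points of
# the test sequence and keep the one with the highest total log-likelihood.
import numpy as np
from scipy.stats import norm

def fit(x):
    """Plug-in Gaussian estimates from a training sequence."""
    return x.mean(), x.std(ddof=1)

def estimate_change_point(test, train_pre, train_post):
    mu0, s0 = fit(train_pre)
    mu1, s1 = fit(train_post)
    ll0 = norm.logpdf(test, mu0, s0)   # per-sample log-lik under the pre-change model
    ll1 = norm.logpdf(test, mu1, s1)   # ... and under the post-change model
    scores = [ll0[:k].sum() + ll1[k:].sum() for k in range(1, len(test))]
    return int(np.argmax(scores)) + 1  # split maximizing total log-likelihood

rng = np.random.default_rng(1)
train_pre, train_post = rng.normal(0, 1, 300), rng.normal(2, 1, 300)
test = np.concatenate([rng.normal(0, 1, 80), rng.normal(2, 1, 120)])
print("estimated change point:", estimate_change_point(test, train_pre, train_post))  # near 80
```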
arXiv Detail & Related papers (2020-03-13T23:39:40Z)