Reasoning Stabilization Point: A Training-Time Signal for Stable Evidence and Shortcut Reliance
- URL: http://arxiv.org/abs/2601.11625v1
- Date: Mon, 12 Jan 2026 17:48:05 GMT
- Title: Reasoning Stabilization Point: A Training-Time Signal for Stable Evidence and Shortcut Reliance
- Authors: Sahil Rajesh Dhayalkar
- Abstract summary: We define explanation drift as the epoch-to-epoch change in normalized token attributions on a fixed probe set. RSP is computed from within-run drift dynamics and requires no tuning on out-of-distribution data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning pretrained language models can improve task performance while subtly altering the evidence a model relies on. We propose a training-time interpretability view that tracks token-level attributions across fine-tuning epochs. We define explanation drift as the epoch-to-epoch change in normalized token attributions on a fixed probe set, and introduce the Reasoning Stabilization Point (RSP), the earliest epoch after which drift remains consistently low. RSP is computed from within-run drift dynamics and requires no tuning on out-of-distribution data. Across multiple lightweight transformer classifiers and benchmark classification tasks, drift typically collapses into a low, stable regime early in training, while validation accuracy continues to change only marginally. In a controlled shortcut setting with label-correlated trigger tokens, attribution dynamics expose increasing reliance on the shortcut even when validation accuracy remains competitive. Overall, explanation drift provides a simple, low-cost diagnostic for monitoring how decision evidence evolves during fine-tuning and for selecting checkpoints in a stable-evidence regime.
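To make the two central quantities concrete, here is a minimal sketch of explanation drift and RSP detection. The abstract does not specify the drift metric or the stability rule, so the L1 distance, the threshold `tau`, and the function names below are illustrative assumptions, not the paper's exact definitions.

```python
# Minimal sketch of explanation drift and the Reasoning Stabilization Point (RSP).
# Illustrative assumptions, not the paper's exact definitions: drift is the mean
# L1 change between consecutive epochs' normalized attributions on a fixed probe
# set, and RSP is the earliest epoch after which drift stays below a threshold.
import numpy as np

def normalize_attributions(attr):
    """Normalize per-example token attributions to unit L1 mass."""
    attr = np.abs(attr)
    return attr / (attr.sum(axis=-1, keepdims=True) + 1e-12)

def explanation_drift(attr_prev, attr_curr):
    """Epoch-to-epoch drift: mean L1 change in normalized attributions."""
    a, b = normalize_attributions(attr_prev), normalize_attributions(attr_curr)
    return float(np.abs(a - b).sum(axis=-1).mean())

def reasoning_stabilization_point(drifts, tau=0.05):
    """Earliest epoch after which drift stays below tau for the rest of the run."""
    for t in range(len(drifts)):
        if all(d < tau for d in drifts[t:]):
            return t + 1  # drifts[t] compares epochs t+1 and t+2 (1-indexed)
    return None  # never stabilizes within the run

# Toy run: attributions over 10 epochs that settle toward a fixed pattern.
rng = np.random.default_rng(0)
base = rng.random((32, 16))  # (probe examples, tokens)
attributions = [base + rng.random((32, 16)) / (e + 1) ** 2 for e in range(10)]
drifts = [explanation_drift(a, b) for a, b in zip(attributions, attributions[1:])]
print("drift:", np.round(drifts, 3), "RSP:", reasoning_stabilization_point(drifts))
```

Under this reading, checkpoints at or after the returned epoch lie in the stable-evidence regime the abstract describes.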
Related papers
- Adaptive recurrent flow map operator learning for reaction diffusion dynamics [0.9137554315375919]
We develop an operator learner with adaptive recurrent training (DDOL-ART) using a robust recurrent strategy with lightweight validation milestones. DDOL-ART learns one-step operators that remain stable under long rollouts and generalize zero-shot to strong shifts. It is several-fold faster than a physics-based numerical-loss operator learner (NLOL) under matched settings.
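For context, the sketch below shows the generic recurrent (rollout) training idea such operator learners build on: apply a learned one-step operator repeatedly and penalize the whole trajectory. The architecture, rollout length, and loss are illustrative assumptions, not DDOL-ART's actual design (its adaptive validation milestones are omitted).

```python
# Hedged sketch of recurrent (rollout) training for a one-step flow-map operator.
# All specifics here (MLP operator, K=5 rollout, MSE loss) are illustrative.
import torch

step = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.Tanh(),
                           torch.nn.Linear(128, 64))  # one-step operator u_t -> u_{t+1}
opt = torch.optim.Adam(step.parameters(), lr=1e-3)

def rollout_loss(u0, targets):
    """Apply the one-step operator recurrently and accumulate loss over the rollout."""
    u, loss = u0, 0.0
    for u_true in targets:          # targets: future states u_1..u_K
        u = step(u)                 # recurrent application of the learned operator
        loss = loss + torch.mean((u - u_true) ** 2)
    return loss / len(targets)

u0 = torch.randn(8, 64)                           # batch of initial states
targets = [torch.randn(8, 64) for _ in range(5)]  # K=5 rollout supervision
loss = rollout_loss(u0, targets)
loss.backward()
opt.step()
```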
arXiv Detail & Related papers (2026-02-10T07:33:13Z)
- Rethinking Test-Time Training: Tilting The Latent Distribution For Few-Shot Source-Free Adaptation [3.5808917363708743]
We study test-time adaptation of foundation models for few-shot classification under a completely frozen-model regime. We propose arguably the first training-free inference method that adapts predictions to the new task by performing a change of measure over the latent embedding distribution induced by the encoder.
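A minimal sketch of the general "change of measure" idea in such training-free adaptation: exponentially tilt the frozen model's posterior toward statistics of the few labeled shots. The Gaussian-prototype tilt and the `beta` parameter are assumptions for illustration; the paper's actual measure change may differ.

```python
# Hedged sketch: reweight class posteriors from a frozen encoder with an
# exponential tilt estimated from a few labeled shots (illustrative, not the
# paper's method). p'(y|z) is proportional to p(y|z) * exp(beta * z . mu_y).
import numpy as np

def tilted_predictions(z_query, z_support, y_support, base_logits, beta=1.0):
    """Tilt frozen-model logits toward few-shot class prototypes."""
    classes = np.unique(y_support)
    protos = np.stack([z_support[y_support == c].mean(0) for c in classes])
    logits = base_logits + beta * z_query @ protos.T  # exponential tilt in logit space
    logits -= logits.max(1, keepdims=True)            # stable softmax
    p = np.exp(logits)
    return p / p.sum(1, keepdims=True)

z_s = np.random.randn(10, 32); y_s = np.repeat([0, 1], 5)     # 5-shot, 2 classes
z_q = np.random.randn(4, 32);  base = np.random.randn(4, 2)   # frozen-model logits
print(tilted_predictions(z_q, z_s, y_s, base).round(3))
```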
arXiv Detail & Related papers (2026-02-02T18:17:29Z)
- SteeringTTA: Guiding Diffusion Trajectories for Robust Test-Time-Adaptation [10.159672026403097]
Test-time adaptation (TTA) aims to correct performance degradation of deep models under distribution shifts by updating models or inputs using unlabeled test data. We propose SteeringTTA, an inference-only framework that adapts Feynman-Kac steering to guide diffusion-based input adaptation for classification, with rewards driven by pseudo-labels.
arXiv Detail & Related papers (2025-10-16T12:46:53Z)
- ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
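The residual reparameterization itself is simple; below is a hedged sketch with a constant-velocity inertial reference and illustrative shapes and scaling, not ResAD's exact normalization.

```python
# Hedged sketch of residual trajectory modeling: predict the deviation from an
# inertial (constant-velocity) reference rather than absolute waypoints, then
# add the reference back at decode time. Details here are illustrative.
import numpy as np

def inertial_reference(pos, vel, horizon, dt=0.1):
    """Constant-velocity rollout from the current state: shape (horizon, 2)."""
    steps = np.arange(1, horizon + 1)[:, None] * dt
    return pos[None, :] + steps * vel[None, :]

def decode_trajectory(residual, pos, vel, scale=1.0):
    """Model output is a (normalized) residual; add the inertial reference back."""
    return inertial_reference(pos, vel, len(residual)) + scale * residual

pos, vel = np.array([0.0, 0.0]), np.array([5.0, 0.0])  # ego position and velocity
residual = 0.1 * np.random.randn(8, 2)                 # stand-in for a diffusion-policy output
print(decode_trajectory(residual, pos, vel))
```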
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
- Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z)
- Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting [3.5808917363708743]
M-FISHER is a method for sequential distribution shift detection and stable adaptation in streaming data. For detection, we construct an exponential martingale from non-conformity scores and apply Ville's inequality to obtain time-uniform guarantees on false alarm control. For adaptation, we show that Fisher-preconditioned updates of prompt parameters implement natural gradient descent on the distributional manifold.
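The detection half of this recipe can be illustrated compactly: convert non-conformity scores into conformal p-values, bet on them with a power martingale, and alarm when the wealth crosses 1/alpha, which Ville's inequality turns into a time-uniform false-alarm guarantee. The power-martingale betting function and the Gaussian toy stream are standard illustrative choices, not M-FISHER's exact construction (the Fisher-preconditioned prompt updates are omitted).

```python
# Hedged sketch of martingale-based shift detection with Ville's inequality:
# under no shift, P(sup_t M_t >= 1/alpha) <= alpha for a nonnegative martingale
# with M_0 = 1, so crossing 1/alpha is a time-uniform alarm rule.
import numpy as np

rng = np.random.default_rng(0)
calib = rng.normal(size=500)                 # calibration non-conformity scores

def p_value(score, calib):
    """Conformal p-value: fraction of calibration scores at least as extreme."""
    return (np.sum(calib >= score) + 1) / (len(calib) + 1)

def detect(stream, alpha=0.01, eps=0.5):
    M = 1.0
    for t, s in enumerate(stream):
        M *= eps * p_value(s, calib) ** (eps - 1)   # power-martingale bet, E[.] = 1 under null
        if M >= 1 / alpha:                          # Ville's inequality threshold
            return t
    return None

stream = np.concatenate([rng.normal(size=200), rng.normal(loc=3.0, size=100)])
print("alarm at t =", detect(stream))   # expected soon after the shift at t=200
```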
arXiv Detail & Related papers (2025-10-04T15:31:26Z)
- Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct large models to follow human intent, dataset by dataset. Existing gradient updates can severely degrade performance on previous datasets during the CIT process. We propose a general continual instruction tuning framework to address this challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z)
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between predicted confidence and actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation [67.18144414660681]
We propose a Fast-Slow Test-Time Adaptation (FSTTA) approach for online Vision-and-Language Navigation (VLN).
Our method obtains impressive performance gains on four popular benchmarks.
arXiv Detail & Related papers (2023-11-22T07:47:39Z)
- Generalized Robust Test-Time Adaptation in Continuous Dynamic Scenarios [18.527640606971563]
Test-time adaptation (TTA) adapts pre-trained models to test distributions during the inference phase, using only unlabeled test data streams.
We propose a Generalized Robust Test-Time Adaptation (GRoTTA) method to address this challenging problem effectively.
arXiv Detail & Related papers (2023-10-07T07:13:49Z)
- Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection [3.884530687475798]
A streaming BERT-based sequence tagging model can detect disfluencies in real time.
The model attains state-of-the-art latency and stability scores compared with recent work on incremental disfluency detection.
arXiv Detail & Related papers (2022-05-02T02:13:24Z)
- StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR [46.69852287267763]
We propose a simple alignment-free regularization method, StableEmit, to encourage MoChA to emit tokens earlier.
We show that StableEmit significantly reduces recognition errors and emission latency simultaneously.
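A minimal reading of the "selection probability discount" idea, under the assumption that it simply scales the monotonic-attention emit probability during training so the model compensates by committing to emission earlier; the sigmoid parameterization and lambda value are illustrative.

```python
# Hedged sketch of a selection-probability discount for monotonic attention:
# during training, scale the per-frame "emit" probability by (1 - lam), which
# pushes the model to learn larger probabilities and emit tokens earlier.
import torch

def discounted_selection_prob(energy, lam=0.1, training=True):
    """Monotonic-attention selection probability with a training-time discount."""
    p = torch.sigmoid(energy)          # p(emit the next token at this encoder frame)
    return (1.0 - lam) * p if training else p

energy = torch.randn(4, 20)            # (batch, encoder frames) attention energies
print(discounted_selection_prob(energy).shape)
```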
arXiv Detail & Related papers (2021-07-01T17:49:31Z)
- Optimal Change-Point Detection with Training Sequences in the Large and Moderate Deviations Regimes [72.68201611113673]
This paper investigates a novel offline change-point detection problem from an information-theoretic perspective.
We assume that the underlying pre- and post-change distributions are unknown and can only be learned from the available training sequences.
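A hedged sketch of this learned-distributions setting: fit plug-in models to the two training sequences, then choose the split of the test sequence that maximizes total log-likelihood. Gaussian models and maximum-likelihood scanning are illustrative choices, not the paper's information-theoretic analysis.

```python
# Hedged sketch of offline change-point detection with learned pre-/post-change
# models: fit Gaussians to the training sequences, then scan all split points of
# the test sequence and keep the one with the highest total log-likelihood.
import numpy as np
from scipy.stats import norm

def fit(x):
    """Plug-in Gaussian estimates from a training sequence."""
    return x.mean(), x.std(ddof=1)

def estimate_change_point(test, train_pre, train_post):
    mu0, s0 = fit(train_pre)
    mu1, s1 = fit(train_post)
    ll0 = norm.logpdf(test, mu0, s0)   # per-sample log-lik under the pre-change model
    ll1 = norm.logpdf(test, mu1, s1)   # ... and under the post-change model
    scores = [ll0[:k].sum() + ll1[k:].sum() for k in range(1, len(test))]
    return int(np.argmax(scores)) + 1  # split maximizing total log-likelihood

rng = np.random.default_rng(1)
train_pre, train_post = rng.normal(0, 1, 300), rng.normal(2, 1, 300)
test = np.concatenate([rng.normal(0, 1, 80), rng.normal(2, 1, 120)])
print("estimated change point:", estimate_change_point(test, train_pre, train_post))  # near 80
```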
arXiv Detail & Related papers (2020-03-13T23:39:40Z)