SPINAL -- Scaling-law and Preference Integration in Neural Alignment Layers
- URL: http://arxiv.org/abs/2601.06238v1
- Date: Thu, 08 Jan 2026 17:47:12 GMT
- Title: SPINAL -- Scaling-law and Preference Integration in Neural Alignment Layers
- Authors: Arion Das, Partha Pratim Saha, Amit Dhanda, Vinija Jain, Aman Chadha, Amitava Das
- Abstract summary: We introduce SPINAL, a diagnostic that measures how alignment reshapes representations across depth. Across model families, DPO produces a layerwise calibration effect concentrated in the final decoder blocks. Aligned checkpoints show a late-layer ramp-up in contraction and a smooth reduction in transport, consistent with tightened and stabilized policy mass.
- Score: 16.976750197698063
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Direct Preference Optimization (DPO) is a principled, scalable alternative to RLHF for aligning large language models from pairwise preferences, but its internal geometric footprint remains undercharacterized, limiting audits, checkpoint comparisons, and failure prediction. We introduce SPINAL (Scaling-law and Preference Integration in Neural Alignment Layers), a diagnostic that measures how alignment reshapes representations across depth by tracing localized structural change layer by layer. Across model families, DPO produces a layerwise calibration effect concentrated in the final decoder blocks (often layers 21-30), where preference gradients most directly affect the next-token distribution. SPINAL encodes each checkpoint as a depth trace over (layer index, contraction score, transport score). The contraction score summarizes how quickly the tail of a layer's spectrum decays (how fast small modes vanish); higher values indicate stronger contraction into fewer effective directions. The transport score summarizes how much the token distribution shifts between adjacent layers using a bounded overlap measure; lower values indicate shorter, smoother steps through representation space. Aligned checkpoints show a late-layer ramp-up in contraction and a smooth reduction in transport, consistent with tightened and stabilized policy mass, while unaligned models trace higher-curvature, more entropic, and geometrically incoherent depth paths. Overall, alignment is geometrically localized: the final layers encode the dominant preference-induced corrections. SPINAL turns this localization into a practical audit signal, quantifying where alignment concentrates, how strongly it manifests, and when it begins to destabilize during training.
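To make the depth trace concrete, here is a minimal sketch of SPINAL-style scores. The estimator choices are assumptions for illustration: the contraction score fits a power law to the tail of the singular-value spectrum of a layer's hidden states, and the transport score uses total variation as the bounded measure of shift between adjacent layers' next-token distributions; the paper's exact definitions may differ.

```python
# Minimal sketch of a SPINAL-style depth trace; estimator choices
# (power-law tail fit, total variation) are illustrative assumptions.
import numpy as np

def contraction_score(H, tail_frac=0.5):
    """H: (tokens, dim) hidden states for one layer. Fits a power law to
    the tail of the singular-value spectrum; higher return values mean
    faster tail decay, i.e. contraction into fewer effective directions."""
    s = np.linalg.svd(H - H.mean(axis=0), compute_uv=False)
    s = s / s.sum()
    tail = s[int(len(s) * (1 - tail_frac)):]
    tail = tail[tail > 1e-12]
    k = np.arange(1, len(tail) + 1)
    slope, _ = np.polyfit(np.log(k), np.log(tail), 1)
    return -slope  # more negative slope => stronger contraction

def transport_score(p, q):
    """Bounded shift between adjacent layers' next-token distributions
    (total variation in [0, 1]; lower = shorter, smoother step)."""
    return 0.5 * np.abs(p - q).sum()

def spinal_trace(hidden_states, token_dists):
    """Encode a checkpoint as a trace of (layer index, contraction, transport)."""
    trace = [(0, contraction_score(hidden_states[0]), 0.0)]
    for i in range(1, len(hidden_states)):
        trace.append((i,
                      contraction_score(hidden_states[i]),
                      transport_score(token_dists[i], token_dists[i - 1])))
    return trace
```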
Related papers
- Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misalignment [27.352639822596146]
Cross-worker divergence in losses and gradients can remain invisible under conventional monitoring signals. We propose a model-agnostic diagnostic framework that quantifies worker-level consistency using training signals readily available in standard pipelines.
arXiv Detail & Related papers (2026-02-16T04:42:30Z)
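A hedged sketch of the kind of worker-level consistency signal this paper describes, using only quantities a standard data-parallel pipeline already exposes (per-worker loss and flattened gradient); the function name and return format are illustrative, not the paper's API.

```python
# Illustrative worker-consistency diagnostic for data-parallel training;
# uses only per-worker losses and flattened gradients already available.
import numpy as np

def worker_consistency(losses, grads):
    """losses: (W,) per-worker loss; grads: (W, D) per-worker gradients.
    Returns loss dispersion and each worker's cosine alignment with the
    mean gradient; low alignment flags silent optimization misalignment."""
    g_mean = grads.mean(axis=0)
    denom = np.linalg.norm(grads, axis=1) * np.linalg.norm(g_mean) + 1e-12
    alignment = grads @ g_mean / denom
    return {"loss_std": float(losses.std()),
            "grad_alignment": alignment}
```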
- AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning [8.698253005940503]
We propose Activation-Guided Zeroth-Order optimization (AGZO). Unlike prior methods, AGZO extracts a compact, activation-informed subspace on the fly during the forward pass and restricts perturbations to this low-rank subspace. AGZO consistently outperforms state-of-the-art ZO baselines and significantly narrows the performance gap with first-order fine-tuning.
arXiv Detail & Related papers (2026-01-24T02:28:15Z)
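A hedged sketch of the activation-guided idea: restrict zeroth-order perturbations to a low-rank subspace spanned by the top activation directions from the forward pass. The two-point estimator, the hyperparameters, and the simplifying assumption that the parameter block and activations share a feature dimension are illustrative, not AGZO's exact recipe.

```python
# Illustrative activation-guided ZO step: perturb only inside a low-rank
# subspace built from forward-pass activations (shapes simplified so the
# parameter block w and the activations share feature dimension D).
import numpy as np

def agzo_step(loss_fn, w, activations, rank=8, mu=1e-3, lr=1e-4):
    # activation-informed subspace: top right singular vectors
    _, _, Vt = np.linalg.svd(activations, full_matrices=False)
    U = Vt[:rank].T                       # (D, rank) orthonormal basis
    z = U @ np.random.randn(rank)         # perturbation inside the subspace
    # two-point (SPSA-style) directional gradient estimate
    g_hat = (loss_fn(w + mu * z) - loss_fn(w - mu * z)) / (2 * mu) * z
    return w - lr * g_hat
```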
- Suspicious Alignment of SGD: A Fine-Grained Step Size Condition Analysis [30.6120085647449]
This paper explores the suspicious alignment phenomenon in stochastic gradient descent (SGD) under ill-conditioned optimization. Specifically, during the initial phase of SGD updates, the alignment between the gradient and the dominant subspace tends to decrease. We show that under sufficient ill-conditioning, a step-size interval exists where projecting the SGD updates onto the bulk space decreases the loss while projecting them onto the dominant space increases the loss.
arXiv Detail & Related papers (2026-01-16T21:32:48Z)
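The alignment quantity studied here can be made concrete on an ill-conditioned quadratic, where the Hessian spectrum is known exactly; this sketch measures the fraction of gradient energy lying in the dominant eigenspace (the quadratic setting and names are illustrative assumptions).

```python
# Fraction of gradient energy in the dominant Hessian eigenspace for the
# quadratic f(x) = 0.5 x^T A x (so grad f = A x); illustrative setting.
import numpy as np

def dominant_alignment(A, x, k=5):
    g = A @ x
    eigvals, eigvecs = np.linalg.eigh(A)   # ascending eigenvalues
    top = eigvecs[:, -k:]                  # dominant (top-k) subspace
    proj = top @ (top.T @ g)               # project gradient onto it
    return np.linalg.norm(proj) ** 2 / np.linalg.norm(g) ** 2
```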
- CLAPS: Posterior-Aware Conformal Intervals via Last-Layer Laplace [0.0]
We present CLAPS, a posterior-aware conformal regression method that pairs a Last-Layer Laplace Approximation with split-conformal calibration. From the resulting Gaussian posterior, CLAPS defines a simple two-sided posterior CDF score that aligns the conformity metric with the full posterior shape, not just a point estimate. This alignment yields narrower prediction intervals at the same target coverage, especially on small-to-medium datasets where data are scarce and uncertainty modeling matters.
arXiv Detail & Related papers (2025-12-01T07:58:21Z)
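A hedged sketch of a two-sided posterior-CDF conformity score with split-conformal calibration, matching the description above; the Gaussian posterior (per-input mu, sigma) is assumed to come from a last-layer Laplace approximation, and the exact CLAPS score may differ.

```python
# Illustrative posterior-aware split conformal: score = |2 F(y) - 1|,
# where F is the Gaussian posterior CDF (assumed from last-layer Laplace).
import numpy as np
from scipy.stats import norm

def conformal_quantile(y_cal, mu_cal, sigma_cal, alpha=0.1):
    F = norm.cdf(y_cal, loc=mu_cal, scale=sigma_cal)
    scores = np.abs(2 * F - 1)             # two-sided CDF score in [0, 1]
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def predict_interval(mu, sigma, qhat):
    # invert |2 F(y) - 1| <= qhat around the posterior mean
    lo = norm.ppf((1 - qhat) / 2, loc=mu, scale=sigma)
    hi = norm.ppf((1 + qhat) / 2, loc=mu, scale=sigma)
    return lo, hi
```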
- DPAC: Distribution-Preserving Adversarial Control for Diffusion Sampling [0.7866885337535715]
Adversarially guided diffusion sampling often achieves the target class, but sample quality degrades as deviations between the adversarially controlled and nominal trajectories accumulate. We formalize this degradation as a path-space Kullback-Leibler divergence (path-KL) between the controlled and nominal (uncontrolled) diffusion processes. We show that minimizing this path-KL simultaneously tightens upper bounds on both the Wasserstein distance and Fréchet Inception Distance (FID), revealing a connection between adversarial control energy and perceptual fidelity.
arXiv Detail & Related papers (2025-12-01T00:15:05Z)
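For reference, the path-space KL this abstract refers to reduces, by Girsanov's theorem, to an expected control-energy integral; the exact constants depend on how the control enters the drift, so the form below is a standard instance rather than the paper's precise statement.

```latex
% Path-KL between controlled (P^u) and nominal (P^0) diffusions,
% for drift perturbation u_t and diffusion scale sigma_t (Girsanov).
\mathrm{KL}\!\left(\mathbb{P}^{u}\,\middle\|\,\mathbb{P}^{0}\right)
  = \mathbb{E}_{\mathbb{P}^{u}}\!\left[\int_{0}^{T}
      \frac{\lVert u_{t}\rVert^{2}}{2\,\sigma_{t}^{2}}\,\mathrm{d}t\right]
```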
- Closed-Form Last Layer Optimization [72.49151473937319]
Under a squared loss, the optimal solution for the linear last-layer weights is known in closed form. We show this is equivalent to alternating between gradient descent steps on the backbone and closed-form updates on the last layer.
arXiv Detail & Related papers (2025-10-06T09:14:39Z)
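The closed form in question is the (ridge) least-squares head on fixed backbone features; a minimal sketch, with the regularizer lam as an assumed numerical stabilizer:

```python
# Closed-form last-layer weights under squared loss: W minimizes
# ||Phi W - Y||^2 + lam ||W||^2 for fixed backbone features Phi.
import numpy as np

def closed_form_last_layer(Phi, Y, lam=1e-6):
    """Phi: (N, D) features; Y: (N, C) targets; lam: small ridge term."""
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ Y)
```

In the alternating scheme the abstract describes, one would take gradient steps on the backbone and recompute this solve for the head after each step.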
- Gaussian Primitive Optimized Deformable Retinal Image Registration [19.882820812725523]
Deformable retinal image registration is notoriously difficult due to large homogeneous regions and sparse but critical vascular features. We introduce a novel iterative framework (GPO) that performs structured message passing to overcome these challenges. Experiments on the FIRE dataset show that GPO reduces the target registration error from 6.2 px to 2.4 px and increases the AUC at 25 px from 0.770 to 0.938.
arXiv Detail & Related papers (2025-08-23T00:44:50Z)
- SPARE: Symmetrized Point-to-Plane Distance for Robust Non-Rigid 3D Registration [77.13381026159111]
We propose SPARE, a novel formulation that utilizes a symmetrized point-to-plane distance for robust non-rigid registration. The proposed method greatly improves the accuracy of non-rigid registration problems and maintains relatively high solution efficiency.
arXiv Detail & Related papers (2024-05-30T15:55:04Z)
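A minimal sketch of the symmetrized point-to-plane residual the abstract names: the displacement between corresponding points is projected onto the normals of both the source and the target (robust weighting and the non-rigid solver are omitted).

```python
# Symmetrized point-to-plane residual for one correspondence (p, q) with
# unit normals n_p, n_q; robust weighting and the solver are omitted.
import numpy as np

def sym_point_to_plane(p, q, n_p, n_q):
    d = p - q
    return np.dot(d, n_p) ** 2 + np.dot(d, n_q) ** 2
```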
- Semi-Supervised Coupled Thin-Plate Spline Model for Rotation Correction and Beyond [84.56978780892783]
We propose CoupledTPS, which iteratively couples multiple TPS with limited control points into a more flexible and powerful transformation.
In light of the laborious annotation cost, we develop a semi-supervised learning scheme to improve warping quality by exploiting unlabeled data.
Experiments demonstrate the superiority and universality of CoupledTPS over the existing state-of-the-art solutions for rotation correction.
arXiv Detail & Related papers (2024-01-24T13:03:28Z)
- Learning Signed Hyper Surfaces for Oriented Point Cloud Normal Estimation [53.19926259132379]
We propose a novel method called SHS-Net for oriented normal estimation of point clouds by learning signed hyper surfaces.
The signed hyper surfaces are implicitly learned in a high-dimensional feature space where the local and global information is aggregated.
An attention-weighted normal prediction module is proposed as a decoder, which takes the local and global latent codes as input to predict oriented normals.
arXiv Detail & Related papers (2023-05-10T03:40:25Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
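A hedged sketch of how one can track the Hessian norm as this abstract suggests: power iteration on Hessian-vector products, with the HVP taken by finite differences so that only a gradient oracle is required; grad_fn and the iteration counts are illustrative.

```python
# Estimate ||H|| (magnitude of the top Hessian eigenvalue) at parameters w
# via power iteration; HVPs come from finite differences of the gradient.
import numpy as np

def hessian_norm(grad_fn, w, iters=20, eps=1e-4):
    v = np.random.randn(*w.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)
        lam = np.linalg.norm(hv)
        v = hv / (lam + 1e-12)
    return lam
```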
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.