SPINAL -- Scaling-law and Preference Integration in Neural Alignment Layers
- URL: http://arxiv.org/abs/2601.06238v1
- Date: Thu, 08 Jan 2026 17:47:12 GMT
- Title: SPINAL -- Scaling-law and Preference Integration in Neural Alignment Layers
- Authors: Arion Das, Partha Pratim Saha, Amit Dhanda, Vinija Jain, Aman Chadha, Amitava Das
- Abstract summary: We introduce SPINAL, a diagnostic that measures how alignment reshapes representations across depth. Across model families, DPO produces a layerwise calibration effect concentrated in the final decoder blocks. Aligned checkpoints show a late-layer ramp-up in contraction and a smooth reduction in transport, consistent with tightened and stabilized policy mass.
- Score: 16.976750197698063
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Direct Preference Optimization (DPO) is a principled, scalable alternative to RLHF for aligning large language models from pairwise preferences, but its internal geometric footprint remains undercharacterized, limiting audits, checkpoint comparisons, and failure prediction. We introduce SPINAL (Scaling-law and Preference Integration in Neural Alignment Layers), a diagnostic that measures how alignment reshapes representations across depth by tracing localized structural change layer by layer. Across model families, DPO produces a layerwise calibration effect concentrated in the final decoder blocks (often layers 21-30), where preference gradients most directly affect the next-token distribution. SPINAL encodes each checkpoint as a depth trace over (layer index, contraction score, transport score). The contraction score summarizes how quickly the tail of a layer's spectrum decays (how fast small modes vanish); higher values indicate stronger contraction into fewer effective directions. The transport score summarizes how much the token distribution shifts between adjacent layers using a bounded overlap measure; lower values indicate shorter, smoother steps through representation space. Aligned checkpoints show a late-layer ramp-up in contraction and a smooth reduction in transport, consistent with tightened and stabilized policy mass, while unaligned models trace higher-curvature, more entropic, and geometrically incoherent depth paths. Overall, alignment is geometrically localized: the final layers encode the dominant preference-induced corrections. SPINAL turns this localization into a practical audit signal, quantifying where alignment concentrates, how strongly it manifests, and when it begins to destabilize during training.
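To make the depth trace concrete, here is a minimal sketch of SPINAL-style scores. The estimator choices are assumptions for illustration: the contraction score fits a power law to the tail of the singular-value spectrum of a layer's hidden states, and the transport score uses total variation as the bounded measure of shift between adjacent layers' next-token distributions; the paper's exact definitions may differ.

```python
# Minimal sketch of a SPINAL-style depth trace; estimator choices
# (power-law tail fit, total variation) are illustrative assumptions.
import numpy as np

def contraction_score(H, tail_frac=0.5):
    """H: (tokens, dim) hidden states for one layer. Fits a power law to
    the tail of the singular-value spectrum; higher return values mean
    faster tail decay, i.e. contraction into fewer effective directions."""
    s = np.linalg.svd(H - H.mean(axis=0), compute_uv=False)
    s = s / s.sum()
    tail = s[int(len(s) * (1 - tail_frac)):]
    tail = tail[tail > 1e-12]
    k = np.arange(1, len(tail) + 1)
    slope, _ = np.polyfit(np.log(k), np.log(tail), 1)
    return -slope  # more negative slope => stronger contraction

def transport_score(p, q):
    """Bounded shift between adjacent layers' next-token distributions
    (total variation in [0, 1]; lower = shorter, smoother step)."""
    return 0.5 * np.abs(p - q).sum()

def spinal_trace(hidden_states, token_dists):
    """Encode a checkpoint as a trace of (layer index, contraction, transport)."""
    trace = [(0, contraction_score(hidden_states[0]), 0.0)]
    for i in range(1, len(hidden_states)):
        trace.append((i,
                      contraction_score(hidden_states[i]),
                      transport_score(token_dists[i], token_dists[i - 1])))
    return trace
```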
Related papers
- Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misalignment [27.352639822596146]
Cross-worker divergence in losses and gradients can remain invisible under conventional monitoring signals. We propose a model-agnostic diagnostic framework that quantifies worker-level consistency using training signals readily available in standard pipelines.
arXiv Detail & Related papers (2026-02-16T04:42:30Z)
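A hedged sketch of the kind of worker-level consistency signal this paper describes, using only quantities a standard data-parallel pipeline already exposes (per-worker loss and flattened gradient); the function name and return format are illustrative, not the paper's API.

```python
# Illustrative worker-consistency diagnostic for data-parallel training;
# uses only per-worker losses and flattened gradients already available.
import numpy as np

def worker_consistency(losses, grads):
    """losses: (W,) per-worker loss; grads: (W, D) per-worker gradients.
    Returns loss dispersion and each worker's cosine alignment with the
    mean gradient; low alignment flags silent optimization misalignment."""
    g_mean = grads.mean(axis=0)
    denom = np.linalg.norm(grads, axis=1) * np.linalg.norm(g_mean) + 1e-12
    alignment = grads @ g_mean / denom
    return {"loss_std": float(losses.std()),
            "grad_alignment": alignment}
```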
- AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning [8.698253005940503]
We propose Activation-Guided Zeroth-Order optimization (AGZO). Unlike prior methods, AGZO extracts a compact, activation-informed subspace on the fly during the forward pass and restricts perturbations to this low-rank subspace. AGZO consistently outperforms state-of-the-art ZO baselines and significantly narrows the performance gap with first-order fine-tuning.
arXiv Detail & Related papers (2026-01-24T02:28:15Z)
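A hedged sketch of the activation-guided idea: restrict zeroth-order perturbations to a low-rank subspace spanned by the top activation directions from the forward pass. The two-point estimator, the hyperparameters, and the simplifying assumption that the parameter block and activations share a feature dimension are illustrative, not AGZO's exact recipe.

```python
# Illustrative activation-guided ZO step: perturb only inside a low-rank
# subspace built from forward-pass activations (shapes simplified so the
# parameter block w and the activations share feature dimension D).
import numpy as np

def agzo_step(loss_fn, w, activations, rank=8, mu=1e-3, lr=1e-4):
    # activation-informed subspace: top right singular vectors
    _, _, Vt = np.linalg.svd(activations, full_matrices=False)
    U = Vt[:rank].T                       # (D, rank) orthonormal basis
    z = U @ np.random.randn(rank)         # perturbation inside the subspace
    # two-point (SPSA-style) directional gradient estimate
    g_hat = (loss_fn(w + mu * z) - loss_fn(w - mu * z)) / (2 * mu) * z
    return w - lr * g_hat
```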
- Suspicious Alignment of SGD: A Fine-Grained Step Size Condition Analysis [30.6120085647449]
This paper explores the suspicious alignment phenomenon in stochastic gradient descent (SGD) under ill-conditioned optimization. Specifically, during the initial phase of SGD updates, the alignment between the gradient and the dominant subspace tends to decrease. We show that under sufficient ill-conditioning, a step-size interval exists where projecting the SGD updates onto the bulk space decreases the loss while projecting them onto the dominant space increases the loss.
arXiv Detail & Related papers (2026-01-16T21:32:48Z)
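The alignment quantity studied here can be made concrete on an ill-conditioned quadratic, where the Hessian spectrum is known exactly; this sketch measures the fraction of gradient energy lying in the dominant eigenspace (the quadratic setting and names are illustrative assumptions).

```python
# Fraction of gradient energy in the dominant Hessian eigenspace for the
# quadratic f(x) = 0.5 x^T A x (so grad f = A x); illustrative setting.
import numpy as np

def dominant_alignment(A, x, k=5):
    g = A @ x
    eigvals, eigvecs = np.linalg.eigh(A)   # ascending eigenvalues
    top = eigvecs[:, -k:]                  # dominant (top-k) subspace
    proj = top @ (top.T @ g)               # project gradient onto it
    return np.linalg.norm(proj) ** 2 / np.linalg.norm(g) ** 2
```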
- CLAPS: Posterior-Aware Conformal Intervals via Last-Layer Laplace [0.0]
We present CLAPS, a posterior-aware conformal regression method that pairs a Last-Layer Laplace Approximation with split-conformal calibration. From the resulting Gaussian posterior, CLAPS defines a simple two-sided posterior CDF score that aligns the conformity metric with the full posterior shape, not just a point estimate. This alignment yields narrower prediction intervals at the same target coverage, especially on small-to-medium datasets where data are scarce and uncertainty modeling matters.
arXiv Detail & Related papers (2025-12-01T07:58:21Z)
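A hedged sketch of a two-sided posterior-CDF conformity score with split-conformal calibration, matching the description above; the Gaussian posterior (per-input mu, sigma) is assumed to come from a last-layer Laplace approximation, and the exact CLAPS score may differ.

```python
# Illustrative posterior-aware split conformal: score = |2 F(y) - 1|,
# where F is the Gaussian posterior CDF (assumed from last-layer Laplace).
import numpy as np
from scipy.stats import norm

def conformal_quantile(y_cal, mu_cal, sigma_cal, alpha=0.1):
    F = norm.cdf(y_cal, loc=mu_cal, scale=sigma_cal)
    scores = np.abs(2 * F - 1)             # two-sided CDF score in [0, 1]
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def predict_interval(mu, sigma, qhat):
    # invert |2 F(y) - 1| <= qhat around the posterior mean
    lo = norm.ppf((1 - qhat) / 2, loc=mu, scale=sigma)
    hi = norm.ppf((1 + qhat) / 2, loc=mu, scale=sigma)
    return lo, hi
```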
- DPAC: Distribution-Preserving Adversarial Control for Diffusion Sampling [0.7866885337535715]
Adversarially guided diffusion sampling often achieves the target class, but sample quality degrades as deviations between the adversarially controlled and nominal trajectories accumulate. We formalize this degradation as a path-space Kullback-Leibler divergence (path-KL) between the controlled and nominal (uncontrolled) diffusion processes. We show that minimizing this path-KL simultaneously tightens upper bounds on both the Wasserstein distance and Fréchet Inception Distance (FID), revealing a connection between adversarial control energy and perceptual fidelity.
arXiv Detail & Related papers (2025-12-01T00:15:05Z)
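For reference, the path-space KL this abstract refers to reduces, by Girsanov's theorem, to an expected control-energy integral; the exact constants depend on how the control enters the drift, so the form below is a standard instance rather than the paper's precise statement.

```latex
% Path-KL between controlled (P^u) and nominal (P^0) diffusions,
% for drift perturbation u_t and diffusion scale sigma_t (Girsanov).
\mathrm{KL}\!\left(\mathbb{P}^{u}\,\middle\|\,\mathbb{P}^{0}\right)
  = \mathbb{E}_{\mathbb{P}^{u}}\!\left[\int_{0}^{T}
      \frac{\lVert u_{t}\rVert^{2}}{2\,\sigma_{t}^{2}}\,\mathrm{d}t\right]
```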
- Closed-Form Last Layer Optimization [72.49151473937319]
Under a squared loss, the optimal solution for the linear last-layer weights is known in closed form. We show this is equivalent to alternating between gradient descent steps on the backbone and closed-form updates on the last layer.
arXiv Detail & Related papers (2025-10-06T09:14:39Z)
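The closed form in question is the (ridge) least-squares head on fixed backbone features; a minimal sketch, with the regularizer lam as an assumed numerical stabilizer:

```python
# Closed-form last-layer weights under squared loss: W minimizes
# ||Phi W - Y||^2 + lam ||W||^2 for fixed backbone features Phi.
import numpy as np

def closed_form_last_layer(Phi, Y, lam=1e-6):
    """Phi: (N, D) features; Y: (N, C) targets; lam: small ridge term."""
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ Y)
```

In the alternating scheme the abstract describes, one would take gradient steps on the backbone and recompute this solve for the head after each step.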
- Gaussian Primitive Optimized Deformable Retinal Image Registration [19.882820812725523]
Deformable retinal image registration is notoriously difficult due to large homogeneous regions and sparse but critical vascular features. We introduce a novel iterative framework (GPO) that performs structured message passing to overcome these challenges. Experiments on the FIRE dataset show that GPO reduces the target registration error from 6.2 px to 2.4 px and increases the AUC at 25 px from 0.770 to 0.938.
arXiv Detail & Related papers (2025-08-23T00:44:50Z)
- SPARE: Symmetrized Point-to-Plane Distance for Robust Non-Rigid 3D Registration [77.13381026159111]
We propose SPARE, a novel formulation that utilizes a symmetrized point-to-plane distance for robust non-rigid registration. The proposed method greatly improves the accuracy of non-rigid registration problems and maintains relatively high solution efficiency.
arXiv Detail & Related papers (2024-05-30T15:55:04Z)
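A minimal sketch of the symmetrized point-to-plane residual the abstract names: the displacement between corresponding points is projected onto the normals of both the source and the target (robust weighting and the non-rigid solver are omitted).

```python
# Symmetrized point-to-plane residual for one correspondence (p, q) with
# unit normals n_p, n_q; robust weighting and the solver are omitted.
import numpy as np

def sym_point_to_plane(p, q, n_p, n_q):
    d = p - q
    return np.dot(d, n_p) ** 2 + np.dot(d, n_q) ** 2
```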
- Semi-Supervised Coupled Thin-Plate Spline Model for Rotation Correction and Beyond [84.56978780892783]
We propose CoupledTPS, which iteratively couples multiple TPS with limited control points into a more flexible and powerful transformation.
In light of the laborious annotation cost, we develop a semi-supervised learning scheme to improve warping quality by exploiting unlabeled data.
Experiments demonstrate the superiority and universality of CoupledTPS over the existing state-of-the-art solutions for rotation correction.
arXiv Detail & Related papers (2024-01-24T13:03:28Z)
- Learning Signed Hyper Surfaces for Oriented Point Cloud Normal Estimation [53.19926259132379]
We propose a novel method called SHS-Net for oriented normal estimation of point clouds by learning signed hyper surfaces.
The signed hyper surfaces are implicitly learned in a high-dimensional feature space where the local and global information is aggregated.
An attention-weighted normal prediction module is proposed as a decoder, which takes the local and global latent codes as input to predict oriented normals.
arXiv Detail & Related papers (2023-05-10T03:40:25Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
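A hedged sketch of how one can track the Hessian norm as this abstract suggests: power iteration on Hessian-vector products, with the HVP taken by finite differences so that only a gradient oracle is required; grad_fn and the iteration counts are illustrative.

```python
# Estimate ||H|| (magnitude of the top Hessian eigenvalue) at parameters w
# via power iteration; HVPs come from finite differences of the gradient.
import numpy as np

def hessian_norm(grad_fn, w, iters=20, eps=1e-4):
    v = np.random.randn(*w.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)
        lam = np.linalg.norm(hv)
        v = hv / (lam + 1e-12)
    return lam
```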
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.