2026-03-31 Daily Report: Curvature-Guided LoRA: Steering in the pretrained NTK subspace

Curvature-Guided LoRA: Steering in the pretrained NTK subspace

Authors Frédéric Zheng, Alexandre Proutière

Affiliations KTH Royal Institute of Technology

Categories Method / Fine-Tuning Techniques / Curvature-adaptive low-rank updates, Theory / Optimization Theory / Newton-like second-order methods, Application / Parameter-Efficient Transfer Learning / Efficient adaptation in pretrained NTK subspace

License CC BY 4.0

Abstract Overview

This paper introduces the prediction alignment problem for parameter-efficient fine-tuning (PEFT), which aims to match the outputs of a LoRA-adapted model to those of full fine-tuning at the function level rather than aligning parameter updates. Working in the NTK regime, the authors show that this objective leads to a curvature-aware second-order formulation where optimal low-rank update directions correspond to a curvature-whitened gradient. Based on this analysis, they propose Curvature-Guided LoRA (CG-LoRA), which uses K-FAC-style local curvature approximations to select and scale adapter directions without explicitly forming large second-order matrices. Preliminary experiments are conducted on RoBERTa-base and T5-base across several GLUE tasks, comparing against LoRA-GA, LoRA-One, and rsLoRA baselines.
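To make the distinction between output-level and parameter-level alignment concrete, here is a minimal NumPy sketch under the linearized (NTK-regime) assumption that outputs change as J times the parameter update. The Jacobian `J` and the two updates are random placeholders, and all names are illustrative rather than taken from the paper. The sketch shows that the squared prediction gap equals the parameter gap weighted by the curvature-like matrix J^T J, which is why a function-space objective naturally brings curvature into play.

```python
import numpy as np

rng = np.random.default_rng(0)
n_outputs, n_params = 6, 4

# Hypothetical Jacobian of model outputs w.r.t. parameters at the pretrained weights.
J = rng.normal(size=(n_outputs, n_params))
# Hypothetical parameter updates: full fine-tuning vs. the adapter-induced update.
delta_full = rng.normal(size=n_params)
delta_lora = rng.normal(size=n_params)

def prediction_gap(J, d_a, d_b):
    """Squared output mismatch of the linearized model f(theta0 + d) ~ f(theta0) + J d."""
    return float(np.sum((J @ (d_a - d_b)) ** 2))

def curvature_weighted_gap(J, d_a, d_b):
    """Same quantity written as a parameter-space distance weighted by J^T J."""
    diff = d_a - d_b
    return float(diff @ (J.T @ J) @ diff)

gap_pred = prediction_gap(J, delta_lora, delta_full)
gap_curv = curvature_weighted_gap(J, delta_lora, delta_full)
gap_param = float(np.sum((delta_lora - delta_full) ** 2))  # unweighted parameter-space gap

print(f"prediction gap:         {gap_pred:.4f}")
print(f"curvature-weighted gap: {gap_curv:.4f}")
print(f"plain parameter gap:    {gap_param:.4f}")
```

The first two numbers coincide, while the unweighted parameter gap generally differs, so matching parameters and matching predictions are distinct targets.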

Novelty

The main novelty is the formulation of PEFT as a prediction alignment problem in function space, targeting output-level agreement with full fine-tuning rather than parameter-space matching. This formulation yields a theoretical connection between optimal low-rank adapter initialization and a Newton-like curvature-whitened gradient under the K-FAC approximation, resulting in a concrete, computationally efficient curvature-guided LoRA initialization scheme.
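A generic derivation, not necessarily the paper's exact statement, illustrates how a whitened gradient can arise. Assume a single linear layer with weight W of shape d_out x d_in, loss gradient G, and a K-FAC curvature approximation F ≈ A ⊗ S, where A is the input-activation covariance and S is the output-gradient covariance; these symbols are assumptions introduced here for illustration.

```latex
% Newton-like step under the Kronecker-factored curvature F \approx A \otimes S
\Delta W^{\star} = -\,S^{-1} G A^{-1}

% Curvature-weighted error of a rank-r candidate update \Delta W
\mathcal{E}(\Delta W)
  = \bigl\| S^{1/2}\,(\Delta W - \Delta W^{\star})\,A^{1/2} \bigr\|_F^2

% With the whitened gradient \tilde{G} = S^{-1/2} G A^{-1/2}, we have
% S^{1/2} \Delta W^{\star} A^{1/2} = -\tilde{G}, so by the Eckart--Young theorem
\Delta W_r
  = \arg\min_{\operatorname{rank}(\Delta W)\le r} \mathcal{E}(\Delta W)
  = -\,S^{-1/2}\,\bigl[\tilde{G}\bigr]_r\,A^{-1/2}
```

Here [·]_r denotes truncation to the top r singular triplets. Under these assumptions, the best low-rank directions are read off the singular vectors of the whitened gradient rather than those of the raw gradient G.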

Results

On the reported RoBERTa-base GLUE experiments, CG-LoRA (no shift) achieves higher accuracy than LoRA-GA and LoRA-One across all five tested datasets, with notably lower variance and reduced sensitivity to learning rate on CoLA. On T5-base, results are more balanced, with CG-LoRA remaining competitive with existing initialization methods. On the RoBERTa-base CoLA setup, the initialization procedure is reported to be roughly twice as fast as LoRA-GA and to use about 2.6 times less memory (3.58 s and 509.95 MiB vs. 7.01 s and 1.28 GiB).
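The cost comparison can be checked with a few lines of arithmetic. The snippet below only uses the four figures quoted above and assumes the reported units are binary (1 GiB = 1024 MiB); the variable names are illustrative.

```python
MIB_PER_GIB = 1024  # binary units, as written in the report (MiB, GiB)

cg_time_s, ga_time_s = 3.58, 7.01   # CG-LoRA vs. LoRA-GA initialization time
cg_mem_mib = 509.95                 # CG-LoRA peak memory
ga_mem_mib = 1.28 * MIB_PER_GIB     # LoRA-GA peak memory, converted to MiB

speedup = ga_time_s / cg_time_s
mem_ratio = ga_mem_mib / cg_mem_mib

print(f"{speedup:.2f}x faster, {mem_ratio:.2f}x less memory")
# prints "1.96x faster, 2.57x less memory"
```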

Key Points

The paper derives a theoretical link between output-level alignment with full fine-tuning and curvature-aware low-rank updates in the pretrained NTK subspace, showing that optimal adapter directions come from a whitened gradient under the K-FAC approximation.
CG-LoRA constructs LoRA adapter initializations from a curvature-whitened gradient using K-FAC-style approximations, avoiding explicit construction of large second-order matrices and achieving lower initialization cost than LoRA-GA in the reported benchmark.
In preliminary experiments, CG-LoRA (no shift) shows the strongest improvements on RoBERTa-base relative to other LoRA initialization methods, with faster loss reduction and lower sensitivity to learning rate, while remaining competitive on T5-base where pretrained gradients are already informative.
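The following NumPy sketch illustrates one way a curvature-whitened gradient could be turned into low-rank adapter factors for a single linear layer. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the damping term, the even split of singular values between the two factors, and the random toy data are all hypothetical, and how the paper scales, signs, or shifts the resulting factors is not specified in this summary. Only two small covariance factors (d_in x d_in and d_out x d_out) are formed, never the full curvature matrix over all weights.

```python
import numpy as np

def inv_sqrt_psd(M, damping=1e-3):
    """Inverse square root of a symmetric PSD matrix, with damping for stability."""
    w, V = np.linalg.eigh(M + damping * np.eye(M.shape[0]))
    return (V / np.sqrt(w)) @ V.T

def curvature_guided_init(G, acts, out_grads, rank, damping=1e-3):
    """Rank-`rank` factors from a K-FAC-style whitened gradient (illustrative).

    G:         (d_out, d_in) loss gradient w.r.t. the layer weight
    acts:      (n, d_in)     layer inputs on a calibration batch
    out_grads: (n, d_out)    loss gradients w.r.t. the layer outputs
    Returns B_lora (d_out, rank) and A_lora (rank, d_in).
    """
    n = acts.shape[0]
    A_cov = acts.T @ acts / n            # input-side Kronecker factor
    S_cov = out_grads.T @ out_grads / n  # output-side Kronecker factor
    A_is = inv_sqrt_psd(A_cov, damping)
    S_is = inv_sqrt_psd(S_cov, damping)

    G_white = S_is @ G @ A_is            # curvature-whitened gradient
    U, s, Vt = np.linalg.svd(G_white, full_matrices=False)
    root_s = np.sqrt(s[:rank])

    # Map the top singular directions back to parameter space.
    B_lora = (S_is @ U[:, :rank]) * root_s
    A_lora = (root_s[:, None] * Vt[:rank]) @ A_is
    return B_lora, A_lora

# Toy usage with random data standing in for a real calibration batch.
rng = np.random.default_rng(0)
n, d_in, d_out, r = 64, 8, 5, 2
acts = rng.normal(size=(n, d_in))
out_grads = rng.normal(size=(n, d_out))
G = out_grads.T @ acts / n               # gradient of a linear layer y = W a
B_lora, A_lora = curvature_guided_init(G, acts, out_grads, rank=r)
print(B_lora.shape, A_lora.shape)        # (5, 2) (2, 8)
```

With the rank set to min(d_out, d_in), the product `B_lora @ A_lora` recovers the damped Newton-like direction S^{-1} G A^{-1}; a descent step would apply its negative.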

References

arXiv: https://arxiv.org/abs/2603.29824v1
Fugu-MT: https://fugumt.com/fugumt/paper_check/2603.29824v1