Curvature-Guided LoRA: Steering in the Pretrained NTK Subspace
Abstract Overview
This paper introduces the prediction alignment problem for parameter-efficient fine-tuning (PEFT): matching the outputs of a LoRA-adapted model to those of full fine-tuning at the function level, rather than aligning parameter updates. Working in the neural tangent kernel (NTK) regime, the authors show that this objective leads to a curvature-aware second-order formulation in which the optimal low-rank update directions correspond to a curvature-whitened gradient. Based on this analysis, they propose Curvature-Guided LoRA (CG-LoRA), which uses local curvature approximations in the style of K-FAC (Kronecker-factored approximate curvature) to select and scale adapter directions without explicitly forming large second-order matrices. Preliminary experiments on RoBERTa-base and T5-base across several GLUE tasks compare CG-LoRA against LoRA-GA, LoRA-One, and rsLoRA baselines.
Novelty
The main novelty is the formulation of PEFT as a prediction alignment problem in function space, targeting output-level agreement with full fine-tuning rather than parameter-space matching. This formulation yields a theoretical connection between optimal low-rank adapter initialization and a Newton-like curvature-whitened gradient under the K-FAC approximation, resulting in a concrete, computationally efficient curvature-guided LoRA initialization scheme.
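To make the whitened-gradient connection concrete, here is a sketch for a single linear layer. It rests on assumptions that are not stated in this summary: $G$ is the loss gradient with respect to the layer weight at the pretrained point, the K-FAC curvature is $A \otimes S$ with $A$ the input-activation covariance and $S$ the output-gradient covariance (both taken positive definite, for example after damping), $\eta$ is a step size, and the target is a Newton-like step. The symbols and the exact objective are illustrative and may differ from the paper's.

```latex
\begin{align}
\|\Delta\|_C^2 &= \operatorname{vec}(\Delta)^\top (A \otimes S)\,\operatorname{vec}(\Delta)
               = \|S^{1/2}\,\Delta\,A^{1/2}\|_F^2, \\
\Delta W^{\star} &= -\eta\, S^{-1} G A^{-1}, \\
\tilde{G} &= S^{-1/2}\, G\, A^{-1/2} = U \Sigma V^\top, \\
\Delta_r &= \arg\min_{\operatorname{rank}(\Delta) \le r} \|\Delta - \Delta W^{\star}\|_C
          = -\eta\, S^{-1/2}\, U_r \Sigma_r V_r^\top A^{-1/2}.
\end{align}
```

The last equality follows from the substitution $M = S^{1/2} \Delta A^{1/2}$, which preserves rank when $S$ and $A$ are positive definite, and from applying the Eckart-Young theorem to the target $-\eta\,\tilde{G}$. Under these assumptions, the best rank-$r$ directions are the top singular vectors of the whitened gradient $\tilde{G}$, mapped back through $S^{-1/2}$ and $A^{-1/2}$.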
Results
On the reported RoBERTa-base GLUE experiments, CG-LoRA (no shift) achieves higher accuracy than LoRA-GA and LoRA-One across all five tested datasets, with notably lower variance and reduced sensitivity to learning rate on CoLA. On T5-base, results are more balanced, with CG-LoRA remaining competitive with existing initialization methods. On the RoBERTa-base CoLA setup, the CG-LoRA initialization procedure is reported as roughly twice as fast as LoRA-GA's and as using roughly 40% of its memory (3.58 s and 509.95 MiB vs. 7.01 s and 1.28 GiB).
Key Points
- The paper derives a theoretical link between output-level alignment with full fine-tuning and curvature-aware low-rank updates in the pretrained NTK subspace, showing that optimal adapter directions come from a whitened gradient under the K-FAC approximation.
- CG-LoRA constructs LoRA adapter initializations from a curvature-whitened gradient using K-FAC-style approximations, avoiding explicit construction of large second-order matrices and achieving lower initialization cost than LoRA-GA in the reported benchmark.
- In preliminary experiments, CG-LoRA (no shift) shows the strongest improvements on RoBERTa-base relative to other LoRA initialization methods, with faster loss reduction and lower sensitivity to learning rate, while remaining competitive on T5-base where pretrained gradients are already informative.
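The initialization idea in the points above can be sketched in NumPy for a single linear layer. This is a minimal illustration and not the authors' implementation: the function names `inv_sqrt_psd` and `curvature_guided_init`, the `damping` parameter, the way the K-FAC factors are estimated from a batch, and the even split of singular values between the two factors are all assumptions made for the example.

```python
import numpy as np

def inv_sqrt_psd(M, damping=1e-3):
    """Inverse square root of a symmetric PSD matrix, with added damping (assumed)."""
    evals, evecs = np.linalg.eigh(M + damping * np.eye(M.shape[0]))
    return (evecs / np.sqrt(evals)) @ evecs.T

def curvature_guided_init(grad, acts, out_grads, rank, damping=1e-3):
    """Hypothetical helper: rank-`rank` factors (B, A) from a whitened gradient.

    grad:      (d_out, d_in) loss gradient w.r.t. the layer weight
    acts:      (n, d_in) layer inputs over a batch
    out_grads: (n, d_out) gradients w.r.t. the layer outputs over the same batch
    """
    n = acts.shape[0]
    A_cov = acts.T @ acts / n            # K-FAC input factor, (d_in, d_in)
    S_cov = out_grads.T @ out_grads / n  # K-FAC output factor, (d_out, d_out)
    S_is = inv_sqrt_psd(S_cov, damping)
    A_is = inv_sqrt_psd(A_cov, damping)

    whitened = S_is @ grad @ A_is        # curvature-whitened gradient
    U, s, Vt = np.linalg.svd(whitened, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]

    # Map the top directions back to weight space, splitting singular values evenly.
    root = np.sqrt(s)
    B = (S_is @ U) * root                # (d_out, rank)
    A = (root[:, None] * Vt) @ A_is      # (rank, d_in)
    return B, A                          # a descent step would use -eta * (B @ A)

# Toy usage with random data standing in for a real layer.
rng = np.random.default_rng(0)
acts = rng.normal(size=(64, 8))
out_grads = rng.normal(size=(64, 6))
grad = out_grads.T @ acts / 64
B, A = curvature_guided_init(grad, acts, out_grads, rank=2)
print(B.shape, A.shape)
```

Only the two small Kronecker factors (of size `d_in` by `d_in` and `d_out` by `d_out`) are ever formed, which is the sense in which a K-FAC-style scheme avoids building a full second-order matrix over all layer parameters.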