FuguReport

Curvature-Guided LoRA: Steering in the pretrained NTK subspace

Authors: Frédéric Zheng, Alexandre Proutière
Affiliations: KTH Royal Institute of Technology
Categories: Method / Fine-Tuning Techniques / Curvature-adaptive low-rank updates; Theory / Optimization Theory / Newton-like second-order methods; Application / Parameter-Efficient Transfer Learning / Efficient adaptation in pretrained NTK subspace
License: CC BY 4.0

Abstract Overview

This paper introduces the prediction alignment problem for parameter-efficient fine-tuning (PEFT), which aims to match the outputs of a LoRA-adapted model to those of full fine-tuning at the function level rather than aligning parameter updates. Working in the NTK regime, the authors show that this objective leads to a curvature-aware second-order formulation where optimal low-rank update directions correspond to a curvature-whitened gradient. Based on this analysis, they propose Curvature-Guided LoRA (CG-LoRA), which uses K-FAC-style local curvature approximations to select and scale adapter directions without explicitly forming large second-order matrices. Preliminary experiments are conducted on RoBERTa-base and T5-base across several GLUE tasks, comparing against LoRA-GA, LoRA-One, and rsLoRA baselines.

Novelty

The main novelty is the formulation of PEFT as a prediction alignment problem in function space, targeting output-level agreement with full fine-tuning rather than parameter-space matching. This formulation yields a theoretical connection between optimal low-rank adapter initialization and a Newton-like curvature-whitened gradient under the K-FAC approximation, resulting in a concrete, computationally efficient curvature-guided LoRA initialization scheme.
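
To make the claimed connection concrete, the following is a worked sketch of one standard route from function-space alignment to a whitened gradient. It is an illustration under stated assumptions, not the paper's own derivation: the notation, the squared-error mismatch measure, the single-layer Kronecker factorization, and the specific Newton-like form of the full fine-tuning step are all assumptions introduced here.

```latex
% Assumed notation (not taken from the paper):
%   f(x;\theta): model output, \theta_0: pretrained weights,
%   J(x) = \partial f / \partial\theta evaluated at \theta_0,
%   W \in R^{m \times n}: one linear layer, G = \nabla_W L at \theta_0,
%   A = E[a a^T] (layer inputs), S = E[g g^T] (backpropagated output gradients),
%   [M]_r: best rank-r approximation of M (truncated SVD).
\begin{align*}
f(x;\theta_0+\Delta) &\approx f(x;\theta_0) + J(x)\,\Delta
  && \text{(NTK linearization)} \\
\mathbb{E}_x \bigl\| J(x)(\Delta_{\mathrm{LR}} - \Delta_{\mathrm{FT}}) \bigr\|^2
  &= (\Delta_{\mathrm{LR}} - \Delta_{\mathrm{FT}})^\top H\,
     (\Delta_{\mathrm{LR}} - \Delta_{\mathrm{FT}}),
  \quad H = \mathbb{E}_x\bigl[J(x)^\top J(x)\bigr] \\
H &\approx A \otimes S
  \;\Rightarrow\;
  \|X\|_H^2 = \bigl\| S^{1/2} X A^{1/2} \bigr\|_F^2
  && \text{(K-FAC, one layer)} \\
\Delta_{\mathrm{FT}} &= -\eta\, S^{-1} G A^{-1}
  \;\Rightarrow\;
  S^{1/2} \Delta_{\mathrm{FT}} A^{1/2} = -\eta\, S^{-1/2} G A^{-1/2}
  && \text{(Newton-like step)} \\
X^\star &= -\eta\, S^{-1/2} \bigl[ S^{-1/2} G A^{-1/2} \bigr]_r A^{-1/2}
  && \text{(Eckart--Young, rank } r)
\end{align*}
```

Under these assumptions, the rank-r update that best matches the full fine-tuning predictions is obtained by truncating the whitened gradient S^{-1/2} G A^{-1/2} and mapping the result back through the same inverse square roots. Only the small per-layer factors A and S are needed, which is consistent with the summary's point that no large second-order matrix has to be formed.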

Results

On the reported RoBERTa-base GLUE experiments, CG-LoRA (no shift) achieves higher accuracy than LoRA-GA and LoRA-One across all five tested datasets, with notably lower variance and reduced sensitivity to learning rate on CoLA. On T5-base, results are more balanced, with CG-LoRA remaining competitive with existing initialization methods. The initialization procedure is reported as roughly twice as fast and requiring substantially less memory than LoRA-GA on the RoBERTa-base CoLA setup (3.58 s / 509.95 MiB vs. 7.01 s / 1.28 GiB).
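
As a quick check of the quoted initialization cost, the ratios can be computed directly from the four reported figures. The snippet below is only arithmetic on those numbers; the variable names are illustrative, and since 1.28 GiB is itself a rounded value the memory ratio is approximate.

```python
# Reported initialization cost on the RoBERTa-base CoLA setup.
init_time_s = {"CG-LoRA": 3.58, "LoRA-GA": 7.01}
# Convert LoRA-GA's figure to MiB so both use the same unit (1 GiB = 1024 MiB).
init_mem_mib = {"CG-LoRA": 509.95, "LoRA-GA": 1.28 * 1024}

speedup = init_time_s["LoRA-GA"] / init_time_s["CG-LoRA"]
mem_ratio = init_mem_mib["LoRA-GA"] / init_mem_mib["CG-LoRA"]

print(f"{speedup:.2f}x faster, {mem_ratio:.2f}x less memory")
# prints "1.96x faster, 2.57x less memory"
```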

Key Points

  1. The paper derives a theoretical link between output-level alignment with full fine-tuning and curvature-aware low-rank updates in the pretrained NTK subspace, showing that optimal adapter directions come from a whitened gradient under the K-FAC approximation.
  2. CG-LoRA constructs LoRA adapter initializations from a curvature-whitened gradient using K-FAC-style approximations, avoiding explicit construction of large second-order matrices and achieving lower initialization cost than LoRA-GA in the reported benchmark.
  3. In preliminary experiments, CG-LoRA (no shift) shows the strongest improvements on RoBERTa-base relative to other LoRA initialization methods, with faster loss reduction and lower sensitivity to learning rate, while remaining competitive on T5-base where pretrained gradients are already informative.
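
The idea in the key points can be sketched for a single linear layer in NumPy. This is a minimal illustration of a generic K-FAC-whitened, SVD-truncated initialization, not the authors' implementation: the function names `inv_sqrt_psd` and `curvature_guided_init`, the damping value, the even split of singular values between the two factors, and the use of per-sample layer inputs and output gradients are all assumptions made here. The sign and step size of the update, and any shift of the frozen weights, are left to the caller.

```python
import numpy as np


def inv_sqrt_psd(mat, damping):
    """Inverse square root of a damped symmetric PSD matrix (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(mat + damping * np.eye(mat.shape[0]))
    # V diag(1/sqrt(lambda)) V^T; damping keeps every eigenvalue positive.
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T


def curvature_guided_init(acts, out_grads, rank, damping=1e-3):
    """Rank-`rank` LoRA factors from a K-FAC-whitened gradient (illustrative sketch).

    acts:      (N, n) inputs to the layer for N samples
    out_grads: (N, m) loss gradients w.r.t. the layer outputs
    Returns lora_b (m, rank) and lora_a (rank, n).
    """
    n_samples = acts.shape[0]
    grad = out_grads.T @ acts / n_samples        # weight gradient, (m, n)
    a_cov = acts.T @ acts / n_samples            # input-side K-FAC factor, (n, n)
    s_cov = out_grads.T @ out_grads / n_samples  # output-side K-FAC factor, (m, m)

    a_is = inv_sqrt_psd(a_cov, damping)
    s_is = inv_sqrt_psd(s_cov, damping)

    # Whitened gradient S^{-1/2} G A^{-1/2}; only small per-layer matrices are formed.
    whitened = s_is @ grad @ a_is
    u, sing, vt = np.linalg.svd(whitened, full_matrices=False)

    # Keep the top singular directions and map them back to parameter space.
    root = np.sqrt(sing[:rank])
    lora_b = s_is @ (u[:, :rank] * root)
    lora_a = (root[:, None] * vt[:rank]) @ a_is
    return lora_b, lora_a


# Toy usage with random data standing in for real activations and gradients.
rng = np.random.default_rng(0)
acts = rng.standard_normal((256, 6))
out_grads = rng.standard_normal((256, 5))
lora_b, lora_a = curvature_guided_init(acts, out_grads, rank=2)
print(lora_b.shape, lora_a.shape)
```

With the rank set to the full matrix rank, the product `lora_b @ lora_a` reduces to the damped Newton-like direction S^{-1} G A^{-1}, which gives a simple sanity check on the construction. Splitting the singular values evenly between the two factors is one common convention; other scalings give the same product.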
