Related papers: GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR

URL: http://arxiv.org/abs/2601.09361v1
Date: Wed, 14 Jan 2026 10:41:34 GMT
Title: GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR
Authors: Jiaying Zhang, Lei Shi, Jiguo Li, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He,
Abstract summary: We propose GeoRA, which exploits the anisotropic and compressible nature of RL update subspaces. GeoRA mitigates optimization bottlenecks caused by geometric misalignment. It consistently outperforms established low-rank baselines on key mathematical benchmarks.
Score: 10.820638016337869
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is crucial for advancing large-scale reasoning models. However, existing parameter-efficient methods, such as PiSSA and MiLoRA, are designed for Supervised Fine-Tuning (SFT) and do not account for the distinct optimization dynamics and geometric structures of RLVR. Applying these methods directly leads to spectral collapse and optimization instability, which severely limit model performance. Meanwhile, alternative approaches that leverage update sparsity encounter significant efficiency bottlenecks on modern hardware due to unstructured computations. To address these challenges, we propose GeoRA (Geometry-Aware Low-Rank Adaptation), which exploits the anisotropic and compressible nature of RL update subspaces. GeoRA initializes adapters by extracting principal directions via Singular Value Decomposition (SVD) within a geometrically constrained subspace while freezing the residual components. This method preserves the pre-trained geometric structure and enables efficient GPU computation through dense operators. Experiments on Qwen and Llama demonstrate that GeoRA mitigates optimization bottlenecks caused by geometric misalignment. It consistently outperforms established low-rank baselines on key mathematical benchmarks, achieving state-of-the-art (SOTA) results. Moreover, GeoRA shows superior generalization and resilience to catastrophic forgetting in out-of-domain tasks.
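The SVD-based adapter initialization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how GeoRA's geometrically constrained subspace is built, so the code below is not GeoRA itself: it shows only the generic split used by principal-direction initializations in the style of PiSSA, where the top singular directions become trainable low-rank factors and the remainder is kept as a frozen residual. The function name `svd_adapter_init`, the matrix shape, and the rank are hypothetical choices made for illustration.

```python
import numpy as np


def svd_adapter_init(W, rank):
    """Split W into trainable low-rank factors A, B and a frozen residual.

    Hypothetical helper: a plain top-`rank` SVD split, NOT the constrained
    subspace construction from the GeoRA paper (not given in the abstract).
    """
    # Thin SVD: W = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_S = np.sqrt(S[:rank])
    A = U[:, :rank] * sqrt_S          # shape (out_dim, rank), scales columns
    B = sqrt_S[:, None] * Vt[:rank]   # shape (rank, in_dim), scales rows
    W_res = W - A @ B                 # residual components, kept frozen
    return A, B, W_res


# Toy weight matrix standing in for a pre-trained layer (assumed shape).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
A, B, W_res = svd_adapter_init(W, rank=2)

# At initialization the adapted layer reproduces the original weights.
print(np.allclose(A @ B + W_res, W))
```

Because `A @ B + W_res` equals `W` at initialization, training starts from the pre-trained function, and the forward pass uses only dense matrix products.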

Related papers

Joint Geometric and Trajectory Consistency Learning for One-Step Real-World Super-Resolution [14.52346301984322]
Diffusion-based Real-World Image Super-Resolution (Real-ISR) achieves impressive perceptual quality but suffers from high computational costs due to iterative sampling. We propose GTASR (Geometric Trajectory Alignment Super-Resolution), a simple yet effective consistency training paradigm for Real-ISR.
arXiv Detail & Related papers (2026-02-27T18:13:31Z)
ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations [54.886931928255564]
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning. We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE). We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
arXiv Detail & Related papers (2026-02-07T10:19:36Z)
Merging Beyond: Streaming LLM Updates via Activation-Guided Rotations [55.047454145941366]
Streaming Merging is an innovative model updating paradigm that conceptualizes merging as an iterative optimization process. ARM is a strategy designed to approximate gradient descent dynamics. ARM requires only early SFT checkpoints and, through iterative merging, surpasses the fully converged SFT model.
arXiv Detail & Related papers (2026-02-03T08:15:57Z)
The Path Not Taken: RLVR Provably Learns Off the Principals [85.41043469428365]
We show that sparsity is a surface artifact of a model-conditioned optimization bias. We mechanistically explain these dynamics with a Three-Gate Theory. We provide a parameter-level characterization of RLVR's learning dynamics.
arXiv Detail & Related papers (2025-11-11T18:49:45Z)
RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation [75.61028930882144]
We identify and quantify this critical issue, demonstrating a significant performance gap in 3D object detection when using synthetic versus real data. We introduce Reinforcement Learning with Geometric Feedback (RLGF). RLGF uniquely refines video diffusion models by incorporating rewards from specialized latent-space AD perception models. RLGF substantially reduces geometric errors (e.g., VP error by 21%, Depth error by 57%) and dramatically improves 3D object detection mAP by 12.7%, narrowing the gap to real-data performance.
arXiv Detail & Related papers (2025-09-20T02:23:36Z)
Reconstruction of SINR Maps from Sparse Measurements using Group Equivariant Non-Expansive Operators [1.9692747349111241]
We introduce a novel reconstruction framework based on Group Equivariant Non-Expansive Operators (GENEOs). Our key insight is that, for network management, preserving the topological structure of the SINR map is often more critical than minimizing pixel-wise error. Results show that while maintaining competitive MSE, our method dramatically outperforms established ML baselines in topological fidelity.
arXiv Detail & Related papers (2025-07-25T14:59:44Z)
Rolling Ball Optimizer: Learning by ironing out loss landscape wrinkles [19.667068548957143]
Training large neural networks (NNs) requires optimizing high-dimensional data-dependent loss functions. These functions are often highly complex and textured, even fractal-like. Noise in the training data can propagate forward and give rise to unrepresentative small-scale geometry.
arXiv Detail & Related papers (2025-05-26T05:26:21Z)
OSoRA: Output-Dimension and Singular-Value Initialized Low-Rank Adaptation [9.048461365342204]
We present OSoRA, a novel PEFT method for Large Language Models (LLMs). OSoRA substantially reduces computational resource requirements by minimizing the number of trainable parameters during fine-tuning. Comprehensive evaluations across mathematical reasoning, common sense reasoning, and other benchmarks demonstrate that OSoRA achieves comparable or superior performance to state-of-the-art methods.
arXiv Detail & Related papers (2025-05-20T13:34:06Z)
GeoLoRA: Geometric integration for parameter efficient fine-tuning [6.701651480567394]
Low-Rank Adaptation (LoRA) has become a widely used method for parameter-efficient fine-tuning of pre-trained neural networks. We introduce GeoLoRA, a novel approach that addresses the limitations by leveraging dynamical low-rank approximation theory. We demonstrate the effectiveness of GeoLoRA on several state-of-the-art benchmarks, showing that it outperforms existing methods in both accuracy and computational efficiency.
arXiv Detail & Related papers (2024-10-24T13:26:10Z)
Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape [52.98187034726091]
We introduce Flat-LoRA, which aims to identify a low-rank adaptation situated in a flat region of the full parameter space. We show that Flat-LoRA improves both in-domain and out-of-domain generalization.
arXiv Detail & Related papers (2024-09-22T11:24:10Z)
iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS). We tackle the hypergradient computation in DARTS based on the implicit function theorem. We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.