CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
- URL: http://arxiv.org/abs/2509.18993v1
- Date: Tue, 23 Sep 2025 13:43:02 GMT
- Title: CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
- Authors: Boao Kong, Junzhu Liang, Yuxi Liu, Renjia Deng, Kun Yuan
- Abstract summary: Cross-layer Low-Rank residual Network (CR-Net) is an innovative framework inspired by our discovery that inter-layer activation residuals possess low-rank properties. CR-Net consistently outperforms state-of-the-art low-rank frameworks while requiring fewer computational resources and less memory.
- Score: 8.92064131103945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-rank architectures have become increasingly important for efficient large language model (LLM) pre-training, providing substantial reductions in both parameter complexity and memory/computational demands. Despite these advantages, current low-rank methods face three critical shortcomings: (1) compromised model performance, (2) considerable computational overhead, and (3) limited activation memory savings. To address these limitations, we propose Cross-layer Low-Rank residual Network (CR-Net), an innovative parameter-efficient framework inspired by our discovery that inter-layer activation residuals possess low-rank properties. CR-Net implements this insight through a dual-path architecture that efficiently reconstructs layer activations by combining previous-layer outputs with their low-rank differences, thereby maintaining high-rank information with minimal parameters. We further develop a specialized activation recomputation strategy tailored for CR-Net that dramatically reduces memory requirements. Extensive pre-training experiments across model scales from 60M to 7B parameters demonstrate that CR-Net consistently outperforms state-of-the-art low-rank frameworks while requiring fewer computational resources and less memory.
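The dual-path idea in the abstract lends itself to a compact illustration. Below is a minimal PyTorch sketch of a CR-Net-style block, assuming the design amounts to reconstructing each layer's activation as the previous layer's output plus a learned low-rank difference; the class name, dimensions, and activation choice are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class CrossLayerLowRankBlock(nn.Module):
    """Sketch of a CR-Net-style block (illustrative): the activation is
    reconstructed as the previous layer's output plus a rank-r residual,
    so only the low-rank factors are trained per layer."""

    def __init__(self, d_model: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)  # V: d_model -> rank
        self.up = nn.Linear(rank, d_model, bias=False)    # U: rank -> d_model
        self.act = nn.GELU()

    def forward(self, h_prev: torch.Tensor) -> torch.Tensor:
        # h_l = h_{l-1} + U(act(V h_{l-1})): the identity path carries
        # high-rank information; each layer adds only a rank-r difference.
        return h_prev + self.up(self.act(self.down(h_prev)))

x = torch.randn(2, 16, 512)  # (batch, seq, d_model)
stack = nn.Sequential(*[CrossLayerLowRankBlock(512, rank=32) for _ in range(4)])
print(stack(x).shape)        # torch.Size([2, 16, 512])
```

The point of the structure is the parameter count: each block trains only 2·d·r weights for the rank-r factors instead of the O(d²) of a dense layer, while the identity path keeps high-rank information flowing forward.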
Related papers
- Training Large Reasoning Models Efficiently via Progressive Thought Encoding [63.254758972725654]
Large reasoning models (LRMs) excel on complex problems but face a critical barrier to efficiency. We introduce Progressive Thought, a parameter-efficient fine-tuning method that enables LRMs to reason effectively under fixed-size caches.
arXiv Detail & Related papers (2026-02-18T20:03:38Z)
- Sequential Reservoir Computing for Efficient High-Dimensional Spatiotemporal Forecasting [1.5313142881179707]
Reservoir Computing (RC) mitigates these challenges by replacing backpropagation with fixed recurrent dynamics and a trained temporal readout. We introduce a Sequential Reservoir Computing (Sequential RC) architecture that decomposes a large reservoir into a series of smaller, interconnected layers.
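As a rough sketch of the decomposition, assuming "sequential" means chaining small fixed reservoirs and training only a final linear readout (all sizes and scalings below are illustrative):

```python
import torch

torch.manual_seed(0)

def make_reservoir(n_in: int, n_res: int, spectral_radius: float = 0.9):
    """Fixed (untrained) reservoir weights, rescaled for echo-state stability."""
    w_in = torch.randn(n_res, n_in) * 0.1
    w = torch.randn(n_res, n_res)
    w *= spectral_radius / torch.linalg.eigvals(w).abs().max()
    return w_in, w

def run_reservoir(w_in, w, inputs):
    """Drive the reservoir with a sequence and collect its states."""
    states, h = [], torch.zeros(w.shape[0])
    for x in inputs:
        h = torch.tanh(w_in @ x + w @ h)
        states.append(h)
    return torch.stack(states)

# Chain two small reservoirs instead of one large one; in an RC pipeline
# only a final linear readout over the collected states would be trained.
seq = torch.randn(100, 8)            # (time, features)
w_in1, w1 = make_reservoir(8, 64)
w_in2, w2 = make_reservoir(64, 64)
states = run_reservoir(w_in2, w2, run_reservoir(w_in1, w1, seq))
print(states.shape)                  # torch.Size([100, 64])
```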
arXiv Detail & Related papers (2026-01-01T02:24:56Z)
- Memory-Efficient Fine-Tuning via Low-Rank Activation Compression [16.44044624606008]
Low-Rank Activation Compression (LoRAct) is a memory-efficient fine-tuning approach. LoRAct reduces activation memory by approximately 80% in comparison with the widely adopted LoRA method.
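A minimal sketch of the underlying idea, as we read the abstract: store a low-rank projection of the activation for the backward pass instead of the full tensor. The fixed random projection here is our illustrative stand-in, not the paper's actual compression scheme.

```python
import torch

class LowRankSavedLinear(torch.autograd.Function):
    """Linear layer that saves a rank-r projection of its input for the
    backward pass instead of the full activation (illustrative only)."""

    @staticmethod
    def forward(ctx, x, weight, proj):
        # Save the compressed activation x @ proj (rank r << d_in).
        ctx.save_for_backward(x @ proj, weight, proj)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_small, weight, proj = ctx.saved_tensors
        grad_x = grad_out @ weight
        # Approximate x as x_small @ proj.T when forming the weight gradient.
        grad_w = grad_out.t() @ (x_small @ proj.t())
        return grad_x, grad_w, None

d_in, d_out, r = 512, 512, 32
x = torch.randn(64, d_in)
w = torch.randn(d_out, d_in, requires_grad=True)
proj = torch.randn(d_in, r) / r ** 0.5          # fixed random projection
y = LowRankSavedLinear.apply(x, w, proj)
y.sum().backward()
print(w.grad.shape)                             # torch.Size([512, 512])
```

The saved tensor shrinks from (batch, d_in) to (batch, r), which is where a large activation-memory reduction would come from when r << d_in.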
arXiv Detail & Related papers (2025-09-27T19:48:32Z)
- S2A: A Unified Framework for Parameter and Memory Efficient Transfer Learning [8.602744958104969]
We propose a new PETL framework, called Structure to Activation (S2A), to reduce the memory footprint of activations during fine-tuning. Specifically, our framework consists of: 1) activation module designs (i.e., bias, prompt, and side modules) in the parametric model structure, which result in a significant reduction of adjustable parameters and activation memory. We show that our methods not only outperform existing PETL techniques, achieving a fourfold reduction in GPU memory footprint on average, but also achieve competitive accuracy with fewer tunable parameters.
arXiv Detail & Related papers (2025-03-11T08:10:03Z)
- LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
- Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity [39.483346492111515]
Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference. Unstructured sparsity offers a compelling solution, enabling substantial reductions in compute and memory requirements when accelerated by compatible hardware platforms. We find that highly sparse linear RNNs consistently achieve better efficiency-performance trade-offs than dense baselines.
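A toy illustration of why unstructured sparsity pays off in a linear recurrence: with the state matrix stored in sparse form, each step costs O(nnz) rather than O(n²). The magnitude-pruning setup below is generic, not the paper's training recipe.

```python
import torch

torch.manual_seed(0)
n, density = 256, 0.05

# Magnitude-prune a dense recurrence matrix A and store it in sparse form.
a_dense = torch.randn(n, n) / n ** 0.5
k = int(n * n * (1 - density))
threshold = a_dense.abs().flatten().kthvalue(k).values
a_sparse = torch.where(a_dense.abs() > threshold, a_dense, torch.zeros(())).to_sparse()

b = torch.randn(n, 8) * 0.1
h = torch.zeros(n, 1)
for _ in range(50):
    x_t = torch.randn(8, 1)
    # Linear recurrence h_t = A h_{t-1} + B x_t; the sparse matmul touches
    # only ~5% of A's entries instead of all n^2 of them.
    h = torch.sparse.mm(a_sparse, h) + b @ x_t
print(h.shape)  # torch.Size([256, 1])
```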
arXiv Detail & Related papers (2025-02-03T13:09:21Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal number of late pre-trained layers alleviates the peak memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on-the-fly.
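The "soft shrinkage" idea admits a short sketch: rather than hard-zeroing pruned weights, multiply the smallest-magnitude fraction by a decay factor each iteration, so the sparse structure can still be revised later. The shrink factor and schedule below are illustrative guesses, not the paper's settings.

```python
import torch

def iterative_soft_shrink(weight: torch.Tensor, sparsity: float, shrink: float = 0.9):
    """One ISS-style step (illustrative): multiply the smallest `sparsity`
    fraction of weights by `shrink` instead of zeroing them outright,
    so 'pruned' weights can still recover in later iterations."""
    flat = weight.abs().flatten()
    k = int(flat.numel() * sparsity)
    if k == 0:
        return weight
    threshold = flat.kthvalue(k).values
    mask = weight.abs() <= threshold
    return torch.where(mask, weight * shrink, weight)

w = torch.randn(64, 64)
for step in range(100):                 # a real loop would also update w
    w = iterative_soft_shrink(w, sparsity=0.5)
# After repeated shrinking, the bottom half is near zero but never hard-pruned.
print((w.abs() < 1e-3).float().mean())  # ~0.5
```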
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Residual Local Feature Network for Efficient Super-Resolution [20.62809970985125]
In this work, we propose a novel Residual Local Feature Network (RLFN).
The main idea is to use three convolutional layers for residual local feature learning to simplify feature aggregation.
In addition, we won the first place in the runtime track of the NTIRE 2022 efficient super-resolution challenge.
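Our reading of "three convolutional layers for residual local feature learning" is a plain residual block of three stacked 3x3 convolutions; the channel count and activations below are illustrative.

```python
import torch
import torch.nn as nn

class ResidualLocalFeatureBlock(nn.Module):
    """Illustrative RLFN-style block: three 3x3 convs learn a local
    residual that is added back to the block input."""

    def __init__(self, channels: int = 48):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

x = torch.randn(1, 48, 32, 32)
print(ResidualLocalFeatureBlock()(x).shape)   # torch.Size([1, 48, 32, 32])
```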
arXiv Detail & Related papers (2022-05-16T08:46:34Z)
- Online Convolutional Re-parameterization [51.97831675242173]
We present Online Convolutional Re-parameterization (OREPA), a two-stage pipeline that aims to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with state-of-the-art re-param models, OREPA reduces the training-time memory cost by about 70% and accelerates training by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
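The "squeeze into a single convolution" step rests on linearity: parallel linear branches over the same input can be summed into one kernel. The sketch below merges a 3x3 and a 1x1 branch, which illustrates structural re-parameterization in general rather than OREPA's specific online blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Two parallel linear branches over the same input...
conv3 = nn.Conv2d(16, 16, 3, padding=1, bias=False)
conv1 = nn.Conv2d(16, 16, 1, bias=False)

# ...collapse into one 3x3 kernel: zero-pad the 1x1 kernel to 3x3 and sum.
merged = nn.Conv2d(16, 16, 3, padding=1, bias=False)
with torch.no_grad():
    merged.weight.copy_(conv3.weight + F.pad(conv1.weight, [1, 1, 1, 1]))

x = torch.randn(2, 16, 8, 8)
branch_sum = conv3(x) + conv1(x)
print(torch.allclose(merged(x), branch_sum, atol=1e-5))   # True
```

The allclose check confirms the merged kernel reproduces the two-branch output exactly, so after merging only one convolution needs to run.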
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods can hardly deliver true inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
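One plausible reading of "weight unification" at a micro-structured level, sketched below: weights inside each small hardware-friendly block share a single magnitude (one scalar plus signs per block), and blocks whose shared magnitude is negligible are pruned whole. The block size and threshold are illustrative assumptions, not the paper's scheme.

```python
import torch

def unify_micro_blocks(weight: torch.Tensor, block: int = 4, prune_thresh: float = 0.05):
    """Illustrative micro-structured unification: within each `block`-sized
    group, all weights share one magnitude (the group mean); groups whose
    shared magnitude is small are pruned entirely."""
    w = weight.flatten()
    pad = (-w.numel()) % block                     # pad so length divides evenly
    w = torch.cat([w, torch.zeros(pad)]).view(-1, block)
    mag = w.abs().mean(dim=1, keepdim=True)        # one scalar per block
    mag = torch.where(mag < prune_thresh, torch.zeros(()), mag)
    unified = torch.sign(w) * mag                  # shared magnitude + signs
    return unified.flatten()[: weight.numel()].view_as(weight)

w = torch.randn(8, 10) * 0.2
print(unify_micro_blocks(w))
```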
arXiv Detail & Related papers (2021-06-15T17:22:59Z)