Graph-Based Spectral Decomposition for Parameter Coordination in Language Model Fine-Tuning
- URL: http://arxiv.org/abs/2504.19583v1
- Date: Mon, 28 Apr 2025 08:42:35 GMT
- Title: Graph-Based Spectral Decomposition for Parameter Coordination in Language Model Fine-Tuning
- Authors: Hanlu Zhang, Yumeng Ma, Shuo Wang, Guiran Liu, Binrong Zhu
- Abstract summary: The goal is to improve both fine-tuning efficiency and structural awareness during training. A weighted graph is constructed, and Laplacian spectral decomposition is applied to enable frequency-domain modeling. A spectral filtering mechanism is introduced during the optimization phase, enhancing the model's training stability and convergence behavior.
- Score: 5.69600290598441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a parameter collaborative optimization algorithm for large language models, enhanced with graph spectral analysis. The goal is to improve both fine-tuning efficiency and structural awareness during training. In the proposed method, the parameters of a pre-trained language model are treated as nodes in a graph. A weighted graph is constructed, and Laplacian spectral decomposition is applied to enable frequency-domain modeling and structural representation of the parameter space. Based on this structure, a joint loss function is designed. It combines the task loss with a spectral regularization term to facilitate collaborative updates among parameters. In addition, a spectral filtering mechanism is introduced during the optimization phase. This mechanism adjusts gradients in a structure-aware manner, enhancing the model's training stability and convergence behavior. The method is evaluated on multiple tasks, including traditional fine-tuning comparisons, few-shot generalization tests, and convergence speed analysis. In all settings, the proposed approach demonstrates superior performance. The experimental results confirm that the spectral collaborative optimization framework effectively reduces parameter perturbations and improves fine-tuning quality while preserving overall model performance. This work contributes significantly to the field of artificial intelligence by advancing parameter-efficient training methodologies for large-scale models, reinforcing the importance of structural signal processing in deep learning optimization, and offering a robust, generalizable framework for enhancing language model adaptability and performance.
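From the abstract alone, the two core ingredients (Laplacian spectral filtering of gradients and the joint loss) might be sketched in NumPy as below; the chain-graph construction, the hard low-pass cutoff `keep`, and the regularization weight `lam` are illustrative assumptions, not the authors' exact design.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W of a weighted adjacency W."""
    return np.diag(W.sum(axis=1)) - W

def spectral_filter_gradient(grad, W, keep=0.5):
    """Low-pass filter a flattened gradient in the Laplacian eigenbasis.

    grad : (n,) gradient over n parameter nodes
    W    : (n, n) weighted parameter-affinity graph (assumed given)
    keep : fraction of lowest-frequency eigenvectors retained
    """
    eigvals, U = np.linalg.eigh(graph_laplacian(W))  # ascending frequency
    k = max(1, int(keep * len(eigvals)))
    coeffs = U.T @ grad            # graph Fourier transform of the gradient
    coeffs[k:] = 0.0               # drop high-frequency components
    return U @ coeffs              # back to the parameter domain

def joint_loss(task_loss, params, W, lam=1e-3):
    """Task loss plus a Laplacian smoothness term x^T L x, one standard
    choice of spectral regularizer."""
    return task_loss + lam * params @ (graph_laplacian(W) @ params)

# Toy usage: 5 parameter nodes connected in a chain.
n = 5
W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
filtered = spectral_filter_gradient(np.random.randn(n), W)
```

A full eigendecomposition is infeasible over billions of parameters, so any practical variant would presumably treat grouped or layer-level statistics as the graph nodes.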
Related papers
- Contextual Gradient Flow Modeling for Large Language Model Generalization in Multi-Scale Feature Spaces [0.0]
A structured gradient refinement framework was introduced to incorporate multi-scale contextual adjustments. The hierarchical adjustment of weight updates provided an alternative to conventional backpropagation. Structured optimization strategies mitigated overfitting while preserving adaptability across heterogeneous text distributions.
arXiv Detail & Related papers (2025-02-06T22:57:40Z) - Parameter Tracking in Federated Learning with Adaptive Optimization [14.111863825607001]
In Federated Learning (FL), model training performance is strongly impacted by data heterogeneity across clients. Gradient Tracking (GT) has recently emerged as a solution which mitigates this issue by introducing correction terms to local model updates. To date, GT has only been considered under Stochastic Gradient Descent (SGD)-based model training, while modern FL frameworks increasingly employ adaptive optimizers for improved convergence.
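For reference, the GT correction term in its textbook single-worker form can be sketched as follows; the learning rate and quadratic toy objective are illustrative, and the paper's FL-specific update rule is not reproduced.

```python
import numpy as np

def gt_step(x, y, grad_fn, lr=0.1):
    """One gradient-tracking update in its standard form (a sketch of the
    GT correction idea, not this paper's exact FL update).

    x : local model parameters
    y : running estimate that tracks the average gradient
    """
    g_old = grad_fn(x)
    x_new = x - lr * y                  # step along the tracked direction
    y_new = y + grad_fn(x_new) - g_old  # correction term keeps y in sync
    return x_new, y_new

# Toy usage on f(x) = 0.5 * ||x||^2, so grad_fn is the identity.
x = np.array([1.0, -2.0])
y = x.copy()                            # initialize tracker at grad(x0)
for _ in range(20):
    x, y = gt_step(x, y, lambda v: v)
```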
arXiv Detail & Related papers (2025-02-04T21:21:30Z) - Context-Aware Neural Gradient Mapping for Fine-Grained Instruction Processing [0.0]
This paper introduces a dynamic gradient adjustment mechanism, incorporating contextual embeddings directly into the optimization process.
The proposed framework consistently outperforms baseline models across various metrics, including accuracy, robustness to noise, and computational efficiency.
The integration of context-specific embeddings allows for a more nuanced understanding of language, thereby improving the model's ability to handle diverse linguistic phenomena.
arXiv Detail & Related papers (2025-01-24T21:49:24Z) - Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [73.88009808326387]
We propose a novel spectrum-aware adaptation framework for generative models.
Our method adjusts both singular values and their basis vectors of pretrained weights.
We introduce Spectral Orthogonal Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity.
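One way to picture "adjusting both singular values and their basis vectors" is an SVD with a learned shift of the spectrum and a Cayley-parameterized rotation of the basis; the parameterization below (`delta_sigma`, `skew`) is an assumption for illustration, not SODA's actual construction.

```python
import numpy as np

def spectrum_aware_adapt(W0, delta_sigma, skew):
    """Adapt a pretrained weight by scaling its singular values and
    rotating its left singular basis (illustrative parameterization).

    W0          : (m, n) pretrained weight
    delta_sigma : (r,) learned additive shifts to the singular values
    skew        : (m, m) learned generator for an orthogonal rotation
    """
    U, S, Vt = np.linalg.svd(W0, full_matrices=False)
    A = 0.5 * (skew - skew.T)               # enforce skew-symmetry
    I = np.eye(A.shape[0])
    R = np.linalg.solve(I + A, I - A)       # Cayley transform: orthogonal
    S_new = np.clip(S + delta_sigma, 0.0, None)
    return (R @ U) @ np.diag(S_new) @ Vt

# Toy usage: adapt a random 4x3 "pretrained" weight.
rng = np.random.default_rng(0)
W0 = rng.standard_normal((4, 3))
W = spectrum_aware_adapt(W0, delta_sigma=np.full(3, 0.1),
                         skew=0.01 * rng.standard_normal((4, 4)))
```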
arXiv Detail & Related papers (2024-05-31T17:43:35Z) - Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training [32.154166415680066]
Methods like distillation, compression, and quantization leverage highly performant large models to induce smaller performant ones.
This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment.
arXiv Detail & Related papers (2024-02-07T17:07:41Z) - Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states.
This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO).
We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
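For orientation, a textbook stochastic EnKF analysis step looks as follows; the paper embeds this machinery inside a GPSSM variational framework, which this sketch does not attempt to reproduce.

```python
import numpy as np

def enkf_update(ensemble, H, y, R, rng):
    """Stochastic EnKF analysis step (textbook form).

    ensemble : (N, d) state samples, H : (k, d) observation operator,
    y : (k,) observation, R : (k, k) observation noise covariance
    """
    N = ensemble.shape[0]
    X = ensemble - ensemble.mean(axis=0)          # state anomalies
    Y = X @ H.T                                   # predicted-obs anomalies
    P_yy = Y.T @ Y / (N - 1) + R                  # innovation covariance
    P_xy = X.T @ Y / (N - 1)                      # cross covariance
    K = P_xy @ np.linalg.inv(P_yy)                # Kalman gain
    perturbed = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    return ensemble + (perturbed - ensemble @ H.T) @ K.T

# Toy usage: 100-member prior, observing the first of two coordinates.
rng = np.random.default_rng(0)
ens = rng.standard_normal((100, 2))
H = np.array([[1.0, 0.0]])
posterior = enkf_update(ens, H, np.array([0.5]), np.eye(1) * 0.1, rng)
```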
arXiv Detail & Related papers (2023-12-10T15:22:30Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
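The fixed-point view can be illustrated with a generic proximal-gradient iteration, using a placeholder soft-threshold in place of the paper's learned neural regularizer; only the forward solve is shown, not the implicit differentiation a Deep Equilibrium model would add.

```python
import numpy as np

def deq_deconvolve(y, blur_op, blur_op_T, prox_reg, step=0.1,
                   tol=1e-6, max_iter=500):
    """Solve deconvolution as the fixed-point iteration
    x = prox_reg(x - step * A^T (A x - y)), the proximal-gradient form
    underlying deep-equilibrium solvers."""
    x = np.zeros_like(y)
    for _ in range(max_iter):
        x_next = prox_reg(x - step * blur_op_T(blur_op(x) - y))
        if np.linalg.norm(x_next - x) < tol * (1 + np.linalg.norm(x)):
            return x_next                  # fixed point reached
        x = x_next
    return x

# Toy usage: identity "blur" and a soft-threshold placeholder regularizer.
soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.01, 0.0)
x_hat = deq_deconvolve(np.random.randn(32), lambda v: v, lambda v: v, soft)
```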
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Parameter Tuning Strategies for Metaheuristic Methods Applied to Discrete Optimization of Structural Design [0.0]
This paper presents several strategies to tune the parameters of metaheuristic methods for (discrete) design optimization of reinforced concrete (RC) structures.
A novel utility metric is proposed, based on the area under the average performance curve.
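One plausible reading of that metric is the normalized trapezoidal area under the average best-so-far performance curve, as in this sketch; the paper's exact normalization may differ.

```python
import numpy as np

def auc_utility(runs, budgets):
    """Area under the average best-so-far curve (assuming minimization).

    runs    : (n_runs, n_evals) objective values per evaluation
    budgets : (n_evals,) evaluation counts (x-axis)
    """
    best_so_far = np.minimum.accumulate(runs, axis=1)
    avg_curve = best_so_far.mean(axis=0)
    # Trapezoidal rule, normalized by the budget span.
    area = np.sum((avg_curve[1:] + avg_curve[:-1]) / 2 * np.diff(budgets))
    return area / (budgets[-1] - budgets[0])

# Toy usage: three runs of ten evaluations each.
scores = np.random.rand(3, 10)
print(auc_utility(scores, np.arange(1, 11)))
```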
arXiv Detail & Related papers (2021-10-12T17:34:39Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools for maximizing the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose a strategy for training the ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
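Schematically, such a sequence could look like blockwise optimization with the remaining parameters frozen, as sketched below; the blocks and the classical optimizer are illustrative, and the paper's pruning criteria are not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

def pect_style_optimize(loss, theta0, blocks):
    """Optimize ansatz parameters one block at a time, freezing the rest:
    a schematic reading of a 'sequence of variational algorithms'."""
    theta = np.array(theta0, dtype=float)
    for block in blocks:
        def sub_loss(sub, block=block):
            trial = theta.copy()
            trial[block] = sub         # vary only the current block
            return loss(trial)
        res = minimize(sub_loss, theta[block], method="COBYLA")
        theta[block] = res.x
    return theta

# Toy usage: quadratic "energy" over six parameters, three blocks of two.
loss = lambda t: float(np.sum((t - 1.0) ** 2))
theta = pect_style_optimize(loss, np.zeros(6), [[0, 1], [2, 3], [4, 5]])
```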
arXiv Detail & Related papers (2020-10-01T18:14:11Z) - Additive Tree-Structured Covariance Function for Conditional Parameter Spaces in Bayesian Optimization [34.89735938765757]
We generalize the additive assumption to tree-structured functions.
By incorporating the structure information of parameter spaces and the additive assumption in the BO loop, we develop a parallel algorithm to optimize the acquisition function.
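The additive assumption itself is easy to sketch: the covariance decomposes into a sum of base kernels over groups of dimensions. The flat grouping below is a simplification of the paper's tree-structured conditional spaces.

```python
import numpy as np

def additive_kernel(x1, x2, groups, base_kernel):
    """Additive covariance over groups of dimensions: a minimal sketch of
    the additive assumption, without the tree structure."""
    return sum(base_kernel(x1[g], x2[g]) for g in groups)

# Toy usage: an RBF base kernel over two groups of a 4-d space.
rbf = lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2))
x, y = np.random.rand(4), np.random.rand(4)
k_val = additive_kernel(x, y, [[0, 1], [2, 3]], rbf)
```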
arXiv Detail & Related papers (2020-06-21T11:21:55Z) - Automatically Learning Compact Quality-aware Surrogates for Optimization Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model for those parameters and then solving the problem using the predicted values.
Recent work has shown that including the optimization problem as a layer in the training pipeline yields parameter predictions that lead to higher-quality decisions.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)