Adaptive Principal Components Allocation with the $\ell_{2,g}$-regularized Gaussian Graphical Model for Efficient Fine-Tuning Large Models
- URL: http://arxiv.org/abs/2412.08592v1
- Date: Wed, 11 Dec 2024 18:11:21 GMT
- Title: Adaptive Principal Components Allocation with the $\ell_{2,g}$-regularized Gaussian Graphical Model for Efficient Fine-Tuning Large Models
- Authors: Jingjing Zheng, Yankai Cao
- Abstract summary: We propose a novel Parameter-Efficient Fine-Tuning (PEFT) approach based on Gaussian Graphical Models (GGMs).
We demonstrate the effectiveness of the proposed approach, achieving competitive performance with significantly fewer trainable parameters.
- Score: 7.6656660956453635
- License:
- Abstract: In this work, we propose a novel Parameter-Efficient Fine-Tuning (PEFT) approach based on Gaussian Graphical Models (GGMs), marking the first application of GGMs to PEFT tasks, to the best of our knowledge. The proposed method utilizes the $\ell_{2,g}$-norm to effectively select critical parameters and capture global dependencies. The resulting non-convex optimization problem is efficiently solved using a Block Coordinate Descent (BCD) algorithm. Experimental results on the GLUE benchmark [24] for fine-tuning RoBERTa-Base [18] demonstrate the effectiveness of the proposed approach, achieving competitive performance with significantly fewer trainable parameters. The code for this work is available at: https://github.com/jzheng20/Course projects.git.
Related papers
- Computing Approximate Graph Edit Distance via Optimal Transport [16.327678462502668]
Given a graph pair $(G1, G2)$, graph edit distance (GED) is defined as the minimum number of edit operations converting $G1$ to $G2$.
GEDIOT is based on inverse optimal transport that leverages a learnable Sinkhorn algorithm to generate the coupling matrix.
GEDGW models GED computation as a linear combination of optimal transport and its variant, the Gromov-Wasserstein discrepancy, for node and edge operations.
arXiv Detail & Related papers (2024-12-25T09:55:14Z) - Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z) - Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation [84.0621253654014]
We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest.
We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO.
arXiv Detail & Related papers (2023-07-25T09:45:47Z) - Fantasizing with Dual GPs in Bayesian Optimization and Active Learning [14.050425158209826]
We focus on 'fantasizing' batch acquisition functions that need the ability to condition on new fantasized data.
By using a sparse Dual GP parameterization, we gain linear scaling with batch size as well as one-step updates for non-Gaussian likelihoods.
arXiv Detail & Related papers (2022-11-02T11:37:06Z) - Optimization for Robustness Evaluation beyond $\ell_p$ Metrics [11.028091609739738]
Empirical evaluation of deep learning models against adversarial attacks involves solving nontrivial constrained optimization problems.
We introduce a novel framework that blends a general-purpose constrained-optimization solver, PyGRANSO, with constraint folding (PyGRANSO With Constraint-Folding, PWCF) to add reliability and generality to robustness evaluation.
arXiv Detail & Related papers (2022-10-02T20:48:05Z) - Surrogate modeling for Bayesian optimization beyond a single Gaussian process [62.294228304646516]
We propose a novel Bayesian surrogate model to balance exploration with exploitation of the search space.
To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model.
To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret.
arXiv Detail & Related papers (2022-05-27T16:43:10Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - Scalable Combinatorial Bayesian Optimization with Tractable Statistical models [44.25245545568633]
We study the problem of optimizing blackbox functions over combinatorial spaces (e.g., sets, sequences, trees, and graphs).
Based on recent advances in submodular relaxation, we study an approach referred to as Parametrized Submodular Relaxation (PSR), with the goal of improving the scalability and accuracy of solving AFO problems for the BOCS model.
Experiments on diverse benchmark problems show significant improvements with PSR for the BOCS model.
arXiv Detail & Related papers (2020-08-18T22:56:46Z) - Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework [60.981406394238434]
We propose a general framework of adversarial certification with non-Gaussian noise and for more general types of attacks.
Our proposed methods achieve better certification results than previous works and provide a new perspective on randomized smoothing certification.
arXiv Detail & Related papers (2020-02-21T07:52:47Z) - Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach [55.44107800525776]
Graph Convolutional Networks (GCNs) are state-of-the-art graph based representation learning models.
In this paper, we revisit GCN-based Collaborative Filtering (CF) for Recommender Systems (RS).
We show that removing non-linearities would enhance recommendation performance, consistent with the theories in simple graph convolutional networks.
We propose a residual network structure that is specifically designed for CF with user-item interaction modeling.
arXiv Detail & Related papers (2020-01-28T04:41:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.