Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
- URL: http://arxiv.org/abs/2411.06710v1
- Date: Mon, 11 Nov 2024 04:36:58 GMT
- Title: Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
- Authors: Chaeyun Jang, Hyungi Lee, Jungtaek Kim, Juho Lee,
- Abstract summary: Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains.
We introduce a novel model fusion technique that optimize both the desired metric and loss through multi-objective Bayesian optimization.
Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method.
- Score: 16.86812534268461
- License:
- Abstract: Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains. Despite its conceptual simplicity, fine-tuning entails several troublesome engineering choices, such as selecting hyperparameters and determining checkpoints from an optimization trajectory. To tackle the difficulty of choosing the best model, one effective solution is model fusion, which combines multiple models in a parameter space. However, we observe a large discrepancy between loss and metric landscapes during the fine-tuning of pre-trained language models. Building on this observation, we introduce a novel model fusion technique that optimizes both the desired metric and loss through multi-objective Bayesian optimization. In addition, to effectively select hyperparameters, we establish a two-stage procedure by integrating Bayesian optimization processes into our framework. Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method.
Related papers
- Trajectory-Based Multi-Objective Hyperparameter Optimization for Model Retraining [8.598456741786801]
We present a novel trajectory-based multi-objective Bayesian optimization algorithm.
Our algorithm outperforms the state-of-the-art multi-objectives in both locating better trade-offs and tuning efficiency.
arXiv Detail & Related papers (2024-05-24T07:43:45Z) - End-to-End Learning for Fair Multiobjective Optimization Under
Uncertainty [55.04219793298687]
The Predict-Then-Forecast (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z) - Towards Safe Multi-Task Bayesian Optimization [1.3654846342364308]
Reduced physical models of the system can be incorporated into the optimization process, accelerating it.
These models are able to offer an approximation of the actual system, and evaluating them is significantly cheaper.
Safety is a crucial criterion for online optimization methods such as Bayesian optimization.
arXiv Detail & Related papers (2023-12-12T13:59:26Z) - Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and
Optimization [59.386153202037086]
Predict-Then- framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving.
This approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models.
arXiv Detail & Related papers (2023-11-22T01:32:06Z) - A Survey on Multi-Objective based Parameter Optimization for Deep
Learning [1.3223682837381137]
We focus on exploring the effectiveness of multi-objective optimization strategies for parameter optimization in conjunction with deep neural networks.
The two methods are combined to provide valuable insights into the generation of predictions and analysis in multiple applications.
arXiv Detail & Related papers (2023-05-17T07:48:54Z) - Agent-based Collaborative Random Search for Hyper-parameter Tuning and
Global Function Optimization [0.0]
This paper proposes an agent-based collaborative technique for finding near-optimal values for any arbitrary set of hyper- parameters in a machine learning model.
The behavior of the presented model, specifically against the changes in its design parameters, is investigated in both machine learning and global function optimization applications.
arXiv Detail & Related papers (2023-03-03T21:10:17Z) - Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z) - Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z) - Bayesian Optimization for Selecting Efficient Machine Learning Models [53.202224677485525]
We present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency.
Experiments on model selection for recommendation tasks indicate models selected this way significantly improves model training efficiency.
arXiv Detail & Related papers (2020-08-02T02:56:30Z) - Automatically Learning Compact Quality-aware Surrogates for Optimization
Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values.
Recent work has shown that including the optimization problem as a layer in a complex training model pipeline results in predictions of iteration of unobserved decision making.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.