Related papers: Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

URL: http://arxiv.org/abs/2411.06710v1
Date: Mon, 11 Nov 2024 04:36:58 GMT
Title: Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
Authors: Chaeyun Jang, Hyungi Lee, Jungtaek Kim, Juho Lee,
Abstract summary: Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains. We introduce a novel model fusion technique that optimize both the desired metric and loss through multi-objective Bayesian optimization. Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method.
Score: 16.86812534268461
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains. Despite its conceptual simplicity, fine-tuning entails several troublesome engineering choices, such as selecting hyperparameters and determining checkpoints from an optimization trajectory. To tackle the difficulty of choosing the best model, one effective solution is model fusion, which combines multiple models in a parameter space. However, we observe a large discrepancy between loss and metric landscapes during the fine-tuning of pre-trained language models. Building on this observation, we introduce a novel model fusion technique that optimizes both the desired metric and loss through multi-objective Bayesian optimization. In addition, to effectively select hyperparameters, we establish a two-stage procedure by integrating Bayesian optimization processes into our framework. Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method.

Related papers

From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging [22.794831741556468]
Model merging combines expert models for multitask performance but faces challenges from parameter interference.<n>Existing approaches employ a compile-then-query paradigm, performing a costly offline multi-objective optimization to enable fast, preference-aware model generation.<n>We model this correction as an optimal linear transformation, yielding a closed-form solution that replaces the entire offline optimization process with a single-step, architecture-agnostic computation.
arXiv Detail & Related papers (2025-11-14T04:09:25Z)
Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization [13.271737599933147]
We introduce EntroPO, an entropy-enhanced framework that adapts existing preference optimization algorithms to the multi-turn, tool-assisted setting.<n>We validate EntroPO by fine-tuning a diverse suite of models from different families and sizes.<n>On the swebench leaderboard, our approach establishes new state-of-the-art results among open-weight models.
arXiv Detail & Related papers (2025-09-15T20:36:19Z)
Preference-Guided Diffusion for Multi-Objective Offline Optimization [64.08326521234228]
We propose a preference-guided diffusion model for offline multi-objective optimization. Our guidance is a preference model trained to predict the probability that one design dominates another. Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions.
arXiv Detail & Related papers (2025-03-21T16:49:38Z)
Trajectory-Based Multi-Objective Hyperparameter Optimization for Model Retraining [8.598456741786801]
We present a novel trajectory-based multi-objective Bayesian optimization algorithm. Our algorithm outperforms the state-of-the-art multi-objectives in both locating better trade-offs and tuning efficiency.
arXiv Detail & Related papers (2024-05-24T07:43:45Z)
End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Forecast (PtO) paradigm in machine learning aims to maximize downstream decision quality. This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives. It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
Towards Safe Multi-Task Bayesian Optimization [1.3654846342364308]
Reduced physical models of the system can be incorporated into the optimization process, accelerating it. These models are able to offer an approximation of the actual system, and evaluating them is significantly cheaper. Safety is a crucial criterion for online optimization methods such as Bayesian optimization.
arXiv Detail & Related papers (2023-12-12T13:59:26Z)
Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization [59.386153202037086]
Predict-Then- framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving. This approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step. This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models.
arXiv Detail & Related papers (2023-11-22T01:32:06Z)
A Survey on Multi-Objective based Parameter Optimization for Deep Learning [1.3223682837381137]
We focus on exploring the effectiveness of multi-objective optimization strategies for parameter optimization in conjunction with deep neural networks. The two methods are combined to provide valuable insights into the generation of predictions and analysis in multiple applications.
arXiv Detail & Related papers (2023-05-17T07:48:54Z)
Agent-based Collaborative Random Search for Hyper-parameter Tuning and Global Function Optimization [0.0]
This paper proposes an agent-based collaborative technique for finding near-optimal values for any arbitrary set of hyper- parameters in a machine learning model. The behavior of the presented model, specifically against the changes in its design parameters, is investigated in both machine learning and global function optimization applications.
arXiv Detail & Related papers (2023-03-03T21:10:17Z)
Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution. This approach poses a number of implementation and optimization challenges. We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z)
Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large. Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z)
Bayesian Optimization for Selecting Efficient Machine Learning Models [53.202224677485525]
We present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency. Experiments on model selection for recommendation tasks indicate models selected this way significantly improves model training efficiency.
arXiv Detail & Related papers (2020-08-02T02:56:30Z)
Automatically Learning Compact Quality-aware Surrogates for Optimization Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in a complex training model pipeline results in predictions of iteration of unobserved decision making. We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.