Leveraging free energy in pretraining model selection for improved fine-tuning
- URL: http://arxiv.org/abs/2410.05612v1
- Date: Tue, 8 Oct 2024 01:50:21 GMT
- Title: Leveraging free energy in pretraining model selection for improved fine-tuning
- Authors: Michael Munn, Susan Wei
- Abstract summary: We introduce a free energy criterion that quantifies a checkpoint's adaptability by measuring the concentration of nearby favorable parameters for the downstream task.
We provide empirical evidence that the free energy criterion reliably correlates with improved fine-tuning performance.
- Score: 4.005483185111992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in artificial intelligence have been fueled by the development of foundation models such as BERT, GPT, T5, and Vision Transformers. These models are first pretrained on vast and diverse datasets and then adapted to specific downstream tasks, often with significantly less data. However, the mechanisms behind the success of this ubiquitous pretrain-then-adapt paradigm remain underexplored, particularly the characteristics of pretraining checkpoints that lend themselves to good downstream adaptation. We introduce a Bayesian model selection criterion, called the downstream free energy, which quantifies a checkpoint's adaptability by measuring the concentration of nearby favorable parameters for the downstream task. We demonstrate that this free energy criterion can be effectively implemented without access to the downstream data or prior knowledge of the downstream task. Furthermore, we provide empirical evidence that the free energy criterion reliably correlates with improved fine-tuning performance, offering a principled approach to predicting model adaptability.
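The abstract describes the downstream free energy criterion only at a high level. As a rough illustration of the underlying idea, measuring how concentrated low-loss parameters are in a neighborhood of a checkpoint, here is a minimal numpy sketch using a toy quadratic stand-in for the downstream loss. The function names, the Gaussian perturbation scheme, and all constants are assumptions made for illustration; they are not the estimator proposed in the paper, which notably is implemented without access to downstream data.

```python
import numpy as np

def downstream_loss(w):
    """Toy stand-in for a downstream fine-tuning loss, minimized at w = (1, -2).
    (Illustrative only; the paper's criterion avoids using downstream data.)"""
    return 0.5 * np.sum((w - np.array([1.0, -2.0])) ** 2)

def free_energy_proxy(checkpoint, loss_fn, n=100, sigma=0.1, num_samples=2000, seed=0):
    """Monte Carlo proxy for a local 'downstream free energy':
        F(w*) ~= -log E_{w ~ N(w*, sigma^2 I)}[ exp(-n * L(w)) ].
    A lower value means low-loss parameters are concentrated near the
    checkpoint w*, i.e. the checkpoint should be easier to fine-tune."""
    rng = np.random.default_rng(seed)
    samples = checkpoint + sigma * rng.standard_normal((num_samples, checkpoint.size))
    log_weights = np.array([-n * loss_fn(w) for w in samples])
    m = log_weights.max()  # log-mean-exp for numerical stability
    return -(m + np.log(np.mean(np.exp(log_weights - m))))

# Two hypothetical checkpoints: one near the downstream optimum, one far away.
# The nearby checkpoint should receive the lower (better) free energy.
print(free_energy_proxy(np.array([0.9, -1.8]), downstream_loss))
print(free_energy_proxy(np.array([3.0, 2.0]), downstream_loss))
```

In this toy setting the checkpoint sitting near the downstream optimum receives the lower free energy, mirroring the paper's claim that lower downstream free energy correlates with better fine-tuning performance.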
Related papers
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z) - Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, achieving up to a 9.6% improvement over conventional baseline methods.
arXiv Detail & Related papers (2024-07-28T19:18:59Z) - Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization [28.977757627384165]
Domain Generalization (DG) aims to avoid performance degradation when a distribution shift occurs between the limited training data and unseen test data.
Recently, foundation models with enormous parameters have been pre-trained with huge datasets, demonstrating strong generalization ability.
Our framework achieves SOTA performance on five DG benchmarks, while only requiring training a small number of parameters without adding additional testing cost.
arXiv Detail & Related papers (2024-07-21T07:50:49Z) - AiGAS-dEVL: An Adaptive Incremental Neural Gas Model for Drifting Data Streams under Extreme Verification Latency [6.7236795813629]
In streaming setups, data flows are affected by factors that yield non-stationarities in the patterns (concept drift).
We propose a novel approach, AiGAS-dEVL, which relies on growing neural gas to characterize the distributions of all concepts detected within the stream over time.
Our approach shows that online analysis of the behavior of these points over time facilitates characterizing how concepts evolve in the feature space.
arXiv Detail & Related papers (2024-07-07T14:04:57Z) - Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models (a toy Monte Carlo estimator of an EPIG-style score is sketched after this list).
arXiv Detail & Related papers (2023-04-17T10:59:57Z) - On the contribution of pre-trained models to accuracy and utility in modeling distributed energy resources [0.0]
We evaluate the improvement in predictive accuracy due to pre-trained models, both with and without fine-tuning.
We consider the question of fairness: do pre-trained models create equal improvements for heterogeneous agents, and how does this translate to downstream utility?
arXiv Detail & Related papers (2023-02-22T22:29:40Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z) - End-to-End Weak Supervision [15.125993628007972]
We propose an end-to-end approach for directly learning the downstream model.
We show improved end-model performance on downstream test sets compared with prior work.
arXiv Detail & Related papers (2021-07-05T19:10:11Z) - Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and propose a neural framework, Back2Future, that aims to refine a given model's predictions in real time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z) - Energy-Based Processes for Exchangeable Data [109.04978766553612]
We introduce Energy-Based Processes (EBPs) to extend energy-based models to exchangeable data.
A key advantage of EBPs is the ability to express more flexible distributions over sets without restricting their cardinality.
We develop an efficient training procedure for EBPs that demonstrates state-of-the-art performance on a variety of tasks.
arXiv Detail & Related papers (2020-03-17T04:26:02Z)
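As referenced in the Prediction-Oriented Bayesian Active Learning entry above, EPIG measures information gain in the space of predictions rather than parameters. Below is a minimal numpy sketch of one standard Monte Carlo estimator of such a quantity under an ensemble approximation of the posterior. The function name epig_scores, the array shapes, and the toy data are assumptions made for illustration, not the authors' reference implementation.

```python
import numpy as np

def epig_scores(probs_pool, probs_targ):
    """Monte Carlo EPIG-style estimator from ensemble predictions.

    probs_pool: (K, N_pool, C) class probabilities for N_pool candidate
                points under K posterior/ensemble samples.
    probs_targ: (K, N_targ, C) class probabilities for N_targ points drawn
                from the target input distribution.
    Returns an (N_pool,) array: expected information gain, in prediction
    space, from labelling each candidate point.
    """
    K = probs_pool.shape[0]
    # Joint predictive p(y, y*) for every (pool, target) pair, averaged over
    # the K ensemble members: shape (N_pool, N_targ, C, C).
    joint = np.einsum("knc,kmd->nmcd", probs_pool, probs_targ) / K
    # Marginal predictives p(y) and p(y*).
    marg_pool = probs_pool.mean(axis=0)   # (N_pool, C)
    marg_targ = probs_targ.mean(axis=0)   # (N_targ, C)
    indep = np.einsum("nc,md->nmcd", marg_pool, marg_targ)
    # Mutual information I(y; y*) per pair, then average over target points.
    eps = 1e-12
    mi = np.sum(joint * (np.log(joint + eps) - np.log(indep + eps)), axis=(2, 3))
    return mi.mean(axis=1)

# Tiny usage example with random "ensemble" predictions
# (5 members, 4 pool points, 8 target points, 3 classes).
rng = np.random.default_rng(0)
softmax = lambda z: np.exp(z) / np.exp(z).sum(-1, keepdims=True)
print(epig_scores(softmax(rng.standard_normal((5, 4, 3))),
                  softmax(rng.standard_normal((5, 8, 3)))))
```

Candidate points whose labels would most reduce uncertainty about the predictions on the target inputs receive the highest scores.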