Assessing the Macro and Micro Effects of Random Seeds on Fine-Tuning Large Language Models
- URL: http://arxiv.org/abs/2503.07329v1
- Date: Mon, 10 Mar 2025 13:42:04 GMT
- Title: Assessing the Macro and Micro Effects of Random Seeds on Fine-Tuning Large Language Models
- Authors: Hao Zhou, Guergana Savova, Lijing Wang
- Abstract summary: We systematically evaluate the effects of random seeds on large language models (LLMs) using the GLUE and SuperGLUE benchmarks. Our experiments reveal significant variance at both macro and micro levels, underscoring the need for careful consideration of random seeds in fine-tuning and evaluation.
- Score: 15.45085233672237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The impact of random seeds in fine-tuning large language models (LLMs) has been largely overlooked despite its potential influence on model performance. In this study, we systematically evaluate the effects of random seeds on LLMs using the GLUE and SuperGLUE benchmarks. We analyze the macro-level impact through traditional metrics like accuracy and F1, calculating their mean and variance to quantify performance fluctuations. To capture the micro-level effects, we introduce a novel metric, consistency, measuring the stability of individual predictions across runs. Our experiments reveal significant variance at both macro and micro levels, underscoring the need for careful consideration of random seeds in fine-tuning and evaluation.
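The two levels of analysis described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the macro level is the mean and variance of a score (e.g. accuracy) across seeds, and that "consistency" is the fraction of evaluation examples that receive the same prediction in every seeded run.

```python
import statistics

def macro_stats(scores):
    """Mean and population variance of a metric (e.g. accuracy) across seeds."""
    return statistics.mean(scores), statistics.pvariance(scores)

def consistency(preds_per_seed):
    """Fraction of examples predicted identically across all runs.

    preds_per_seed: one prediction list per random seed, all aligned
    to the same evaluation examples.
    """
    per_example = zip(*preds_per_seed)  # group predictions by example
    stable = sum(1 for preds in per_example if len(set(preds)) == 1)
    return stable / len(preds_per_seed[0])

# Three hypothetical fine-tuning runs (different seeds) on five examples
runs = [
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
]
mean_acc, var_acc = macro_stats([0.80, 0.60, 0.80])
print(mean_acc, var_acc)   # macro-level view across seeds
print(consistency(runs))   # micro-level view: 3 of 5 examples stable -> 0.6
```

Note that the macro view can look stable (similar accuracy per run) even when the micro view is not: runs can agree on the overall score while disagreeing on which individual examples they get right, which is exactly the gap the consistency metric is meant to expose.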
Related papers
- Exploring Variability in Fine-Tuned Models for Text Classification with DistilBERT [0.9249657468385781]
This study evaluates fine-tuning strategies for text classification using the DistilBERT model.
We examine the influence of hyperparameters such as learning rate, batch size, and epochs on accuracy, F1-score, and loss.
arXiv Detail & Related papers (2024-12-31T03:16:15Z) - Quantifying perturbation impacts for large language models [49.1574468325115]
We introduce Distribution-Based Perturbation Analysis (DBPA), a framework that reformulates perturbation analysis as a frequentist hypothesis testing problem. We demonstrate the effectiveness of DBPA in evaluating perturbation impacts, showing its versatility for perturbation analysis.
arXiv Detail & Related papers (2024-12-01T16:13:09Z) - Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLM has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z) - What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length [61.71625297655583]
We show that MORCELA outperforms a commonly used linking theory for acceptability.
Larger models require a lower relative degree of adjustment for unigram frequency.
Our analysis shows that larger LMs' lower susceptibility to frequency effects can be explained by an ability to better predict rarer words in context.
arXiv Detail & Related papers (2024-11-04T19:05:49Z) - Measuring Variable Importance in Heterogeneous Treatment Effects with Confidence [33.12963161545068]
Causal machine learning holds promise for estimating individual treatment effects from complex data. We propose PermuCATE, an algorithm based on the Conditional Permutation Importance (CPI) method. We empirically demonstrate the benefits of PermuCATE in simulated and real-world health datasets.
arXiv Detail & Related papers (2024-08-23T11:44:07Z) - Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment [58.030196381554745]
We introduce the Hybrid-grained Weight Importance Assessment (HyWIA), a novel method that merges fine-grained and coarse-grained evaluations of weight importance for the pruning of large language models (LLMs). Extensive experiments on LLaMA-V1/V2, Vicuna, Baichuan, and Bloom across various benchmarks demonstrate the effectiveness of HyWIA in pruning LLMs.
arXiv Detail & Related papers (2024-03-16T04:12:50Z) - Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes [52.92110730286403]
It is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input dimensions.
We prove that by tuning hyperparameters, the performance, as measured by the marginal likelihood, improves monotonically with the input dimension.
We also prove that cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent.
arXiv Detail & Related papers (2022-10-14T08:09:33Z) - Variational Inference for Additive Main and Multiplicative Interaction Effects Models [0.0]
In plant breeding the presence of a genotype by environment (GxE) interaction has a strong impact on cultivation decision making.
In this article, we consider a variational inference approach for such a model.
We derive variational approximations for estimating the parameters and we compare the approximations to MCMC using both simulated and real data.
arXiv Detail & Related papers (2022-06-29T22:58:12Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - On the Impact of Random Seeds on the Fairness of Clinical Classifiers [27.71610203951057]
We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records.
We also find that the small sample sizes inherent to looking at intersections of minority groups and somewhat rare conditions limit our ability to accurately estimate disparities.
arXiv Detail & Related papers (2021-04-13T16:30:39Z) - Inferring Microbial Biomass Yield and Cell Weight using Probabilistic Macrochemical Modeling [0.0]
Growth rates and biomass yields are key descriptors used in microbiology studies to understand how microbial species respond to changes in the environment.
Estimating biomass from cell counts, as needed to assess yields, relies on an assumed cell weight.
Noise and discrepancies on these assumptions can lead to significant changes in conclusions regarding the microbes' response.
This article proposes a methodology to address these challenges using probabilistic macrochemical models of microbial growth.
arXiv Detail & Related papers (2020-10-06T14:23:21Z) - Coupling Machine Learning and Crop Modeling Improves Crop Yield Prediction in the US Corn Belt [2.580765958706854]
This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt.
The main objectives are to explore whether a hybrid approach (crop modeling + ML) would result in better predictions, and determine the features from the crop modeling that are most effective to be integrated with ML for corn yield prediction.
arXiv Detail & Related papers (2020-07-28T16:22:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.