Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective
- URL: http://arxiv.org/abs/2405.15895v1
- Date: Fri, 24 May 2024 19:33:05 GMT
- Title: Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective
- Authors: Pranshu Malviya, Jerry Huang, Quentin Fournier, Sarath Chandar,
- Abstract summary: We propose a metric to study the impact of expansion by estimating the size of the manifold.
Experimental results show a clear relationship between gains in performance and manifold size.
- Score: 10.547693900435917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The optimal model for a given task is often challenging to determine, requiring training multiple models from scratch which becomes prohibitive as dataset and model sizes grow. A more efficient alternative is to reuse smaller pre-trained models by expanding them, however, this is not widely adopted as how this impacts training dynamics remains poorly understood. While prior works have introduced statistics to measure these effects, they remain flawed. To rectify this, we offer a new approach for understanding and quantifying the impact of expansion through the lens of the loss landscape, which has been shown to contain a manifold of linearly connected minima. Building on this new perspective, we propose a metric to study the impact of expansion by estimating the size of the manifold. Experimental results show a clear relationship between gains in performance and manifold size, enabling the comparison of candidate models and presenting a first step towards expanding models more reliably based on geometric properties of the loss landscape.
Related papers
- "Why" Has the Least Side Effect on Model Editing [25.67779910446609]
This paper delves into a critical factor-question type-by categorizing model editing questions.
Our findings reveal that the extent of performance degradation varies significantly across different question types.
We also examine the impact of batch size on side effects, discovering that increasing the batch size can mitigate performance drops.
arXiv Detail & Related papers (2024-09-27T12:05:12Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Revisiting Catastrophic Forgetting in Large Language Model Tuning [79.70722658190097]
Catastrophic Forgetting (CF) means models forgetting previously acquired knowledge when learning new data.
This paper takes the first step to reveal the direct link between the flatness of the model loss landscape and the extent of CF in the field of large language models.
Experiments on three widely-used fine-tuning datasets, spanning different model scales, demonstrate the effectiveness of our method in alleviating CF.
arXiv Detail & Related papers (2024-06-07T11:09:13Z) - Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z) - Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - Improving Adversarial Transferability via Model Alignment [25.43899674478279]
We introduce a novel model alignment technique aimed at improving a given source model's ability in generating transferable adversarial perturbations.
Experiments on the ImageNet dataset, using a variety of model architectures, demonstrate that perturbations generated from aligned source models exhibit significantly higher transferability.
arXiv Detail & Related papers (2023-11-30T12:15:49Z) - On the Embedding Collapse when Scaling up Recommendation Models [53.66285358088788]
We identify the embedding collapse phenomenon as the inhibition of scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace.
We propose a simple yet effective multi-embedding design incorporating embedding-set-specific interaction modules to learn embedding sets with large diversity.
arXiv Detail & Related papers (2023-10-06T17:50:38Z) - The Quest of Finding the Antidote to Sparse Double Descent [1.336445018915526]
As the model's sparsity increases, the performance first worsens, then improves, and finally deteriorates.
Such a non-monotonic behavior raises serious questions about the optimal model's size to maintain high performance.
We show that a simple $ell$ regularization method can help to mitigate this phenomenon but sacrifices the performance/sparsity.
arXiv Detail & Related papers (2023-08-31T09:56:40Z) - Uncovering the Hidden Cost of Model Compression [43.62624133952414]
Visual Prompting has emerged as a pivotal method for transfer learning in computer vision.
Model compression detrimentally impacts the performance of visual prompting-based transfer.
However, negative effects on calibration are not present when models are compressed via quantization.
arXiv Detail & Related papers (2023-08-29T01:47:49Z) - Unlocking the Potential of Federated Learning for Deeper Models [24.875271131226707]
Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients.
We propose several technical guidelines based on reducing divergence, such as using wider models and reducing the receptive field.
These approaches can greatly improve the accuracy of FL on deeper models.
arXiv Detail & Related papers (2023-06-05T08:45:44Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL)
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.