Exploring Mode Connectivity for Pre-trained Language Models
- URL: http://arxiv.org/abs/2210.14102v1
- Date: Tue, 25 Oct 2022 15:40:11 GMT
- Title: Exploring Mode Connectivity for Pre-trained Language Models
- Authors: Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han,
Zhiyuan Liu, Maosong Sun and Jie Zhou
- Abstract summary: While many works study how to effectively adapt pre-trained language models (PLMs) to high-performance minima, little is known about how the resulting minima are connected.
In this paper, we investigate the geometric connections of different minima through the lens of mode connectivity.
- Score: 91.33378704580295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed the prevalent application of pre-trained language
models (PLMs) in NLP. From the perspective of parameter space, PLMs provide
generic initialization, starting from which high-performance minima could be
found. Although plenty of works have studied how to effectively and efficiently
adapt PLMs to high-performance minima, little is known about the connection of
various minima reached under different adaptation configurations. In this
paper, we investigate the geometric connections of different minima through the
lens of mode connectivity, which measures whether two minima can be connected
with a low-loss path. We conduct empirical analyses to investigate three
questions: (1) how could hyperparameters, specific tuning methods, and training
data affect PLM's mode connectivity? (2) How does mode connectivity change
during pre-training? (3) How does the PLM's task knowledge change along the
path connecting two minima? In general, exploring the mode connectivity of PLMs
conduces to understanding the geometric connection of different minima, which
may help us fathom the inner workings of PLM downstream adaptation.
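
To make the "low-loss path" criterion concrete, a common linear-interpolation probe evaluates the loss at every convex combination of two minima's parameters. The sketch below is a generic PyTorch illustration under assumed placeholder names (`model`, `data_loader`, `loss_fn`), not the paper's exact evaluation protocol.

```python
import copy
import torch

@torch.no_grad()
def loss_along_linear_path(model, state_a, state_b, data_loader, loss_fn, num_points=11):
    """Probe (linear) mode connectivity between two fine-tuned checkpoints.

    state_a / state_b are state_dicts of two minima adapted from the same PLM.
    Returns (alpha, mean loss) pairs; a flat, low curve suggests the minima lie
    in a connected low-loss region, while a pronounced bump suggests a barrier.
    """
    probe = copy.deepcopy(model)
    results = []
    for step in range(num_points):
        alpha = step / (num_points - 1)
        # theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b for float tensors;
        # integer buffers (e.g., position ids) are copied from state_a unchanged.
        mixed = {
            k: (1 - alpha) * v + alpha * state_b[k] if v.is_floating_point() else v
            for k, v in state_a.items()
        }
        probe.load_state_dict(mixed)
        probe.eval()
        total, count = 0.0, 0
        for inputs, labels in data_loader:
            total += loss_fn(probe(inputs), labels).item() * labels.size(0)
            count += labels.size(0)
        results.append((alpha, total / count))
    return results
```

One practical note: for transformer PLMs this direct evaluation is usually sufficient, since LayerNorm keeps no running statistics that would need recalibrating along the path (unlike BatchNorm in many vision models).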
Related papers
- In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning [0.6650227510403052]
Multi-objective reinforcement learning (MORL) is essential for addressing the intricacies of real-world RL problems.
MORL is challenging due to unstable learning dynamics with deep learning-based function approximators.
Our work empirically explores model-free policy learning loss functions and the impact of different architectural choices.
arXiv Detail & Related papers (2024-07-23T19:17:47Z) - Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences [49.14535254003683]
PaLoRA is a novel parameter-efficient method that augments the original model with task-specific low-rank adapters.
Experimental results show that PaLoRA outperforms multi-task learning (MTL) and Pareto front learning (PFL) baselines across various datasets.
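
For orientation, a generic low-rank adapter of the kind PaLoRA builds on can be wrapped around a frozen linear layer as below; this is an illustrative LoRA-style module with made-up names, not PaLoRA's task-specific adapters or its preference-weighted combination.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Frozen base layer plus a trainable low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the pre-trained weights stay frozen
            p.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts as the base model
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```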
arXiv Detail & Related papers (2024-07-10T21:25:51Z) - MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
Model merging is an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model.
Existing model-merging methods focus on enhancing average task accuracy.
We introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
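
As background on the merging setting, a minimal baseline combines the parameter deltas of several single-task models fine-tuned from one pre-trained checkpoint, as sketched below; the uniform coefficients are only placeholders, whereas MAP's contribution is choosing such coefficients via an amortized quadratic approximation of the Pareto front, which this sketch does not reproduce.

```python
import torch

def merge_finetuned_models(pretrained_state, finetuned_states, coeffs=None):
    """Merge single-task models fine-tuned from one pre-trained checkpoint.

    Each merged parameter is theta_pre + sum_i c_i * (theta_i - theta_pre).
    The coefficients would normally be tuned per task (e.g., along a Pareto
    front); here they default to a uniform average as a placeholder.
    """
    if coeffs is None:
        coeffs = [1.0 / len(finetuned_states)] * len(finetuned_states)
    merged = {}
    for name, base in pretrained_state.items():
        if base.is_floating_point():
            delta = sum(c * (s[name] - base) for c, s in zip(coeffs, finetuned_states))
            merged[name] = base + delta
        else:
            merged[name] = base  # integer buffers are copied unchanged
    return merged
```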
arXiv Detail & Related papers (2024-06-11T17:55:25Z) - Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient
Tuning [9.38259062204602]
Large language models (LLMs) exhibit remarkable performance in language understanding and generation.
LLMs are continuously fine-tuned on complex and diverse domain-specific downstream tasks.
A trade-off therefore needs to be maintained between learning plasticity and memory stability.
arXiv Detail & Related papers (2024-02-29T05:27:45Z) - Learning to Learn with Indispensable Connections [6.040904021861969]
We propose a novel meta-learning method called Meta-LTH that includes indispensable (necessary) connections.
Our method improves classification accuracy by approximately 2% on the Omniglot dataset in the 20-way 1-shot setting.
arXiv Detail & Related papers (2023-04-06T04:53:13Z) - LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of
Large Language Models [75.25782573728677]
This paper presents a framework for adapter-based parameter-efficient fine-tuning (PEFT) of large language models (LLMs).
The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapters, Prompt-based learning, and Reparametrization-based methods.
We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning.
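
To unpack the adapter vocabulary, the snippet below sketches a generic bottleneck ("series") adapter added after a frozen sub-layer with a residual connection; it is a simplified illustration rather than the exact modules implemented in LLM-Adapters.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Series adapter: down-project, nonlinearity, up-project, residual connection."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)  # start as an identity mapping of the frozen backbone
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```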
arXiv Detail & Related papers (2023-04-04T16:31:37Z) - PSNet: Parallel Symmetric Network for Video Salient Object Detection [85.94443548452729]
We propose PSNet, a video salient object detection (VSOD) network with up-down parallel symmetry.
Two parallel branches, each dominated by a different modality, are set up to achieve complete video saliency decoding.
arXiv Detail & Related papers (2022-10-12T04:11:48Z) - Contrastive and Non-Contrastive Self-Supervised Learning Recover Global
and Local Spectral Embedding Methods [19.587273175563745]
Self-Supervised Learning (SSL) surmises that inputs and pairwise positive relationships are enough to learn meaningful representations.
This paper proposes a unifying framework under the helm of spectral manifold learning to address the limitations of that premise.
arXiv Detail & Related papers (2022-05-23T17:59:32Z) - Hybrid Relation Guided Set Matching for Few-shot Action Recognition [51.3308583226322]
We propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components.
The purpose of the hybrid relation module is to learn task-specific embeddings by fully exploiting associated relations within and across videos in an episode.
We evaluate HyRSM on six challenging benchmarks, and the experimental results show its superiority over the state-of-the-art methods by a convincing margin.
arXiv Detail & Related papers (2022-04-28T11:43:41Z) - Multi-level Distance Regularization for Deep Metric Learning [20.178765779788492]
We propose a novel distance-based regularization method for deep metric learning called Multi-level Distance Regularization (MDR).
MDR explicitly disturbs a learning procedure by regularizing pairwise distances between embedding vectors into multiple levels.
By simply adopting MDR, previous approaches can be improved in performance and generalization ability.
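
The underlying idea, pulling pairwise embedding distances toward a small set of target levels, can be sketched as follows; the specific level values and penalty are illustrative assumptions rather than MDR's exact formulation.

```python
import torch

def multi_level_distance_regularizer(embeddings, levels=(0.5, 1.0, 1.5)):
    """Penalize pairwise embedding distances for drifting from their nearest target level.

    embeddings: (batch, dim) tensor of embedding vectors.
    levels: illustrative set of target distance levels (an assumption for this sketch).
    """
    dists = torch.cdist(embeddings, embeddings, p=2)        # (batch, batch) pairwise distances
    i, j = torch.triu_indices(dists.size(0), dists.size(0), offset=1)
    pair_dists = dists[i, j]                                 # unique pairs only
    level_tensor = torch.tensor(levels, device=embeddings.device)
    gaps = (pair_dists.unsqueeze(1) - level_tensor).abs()    # gap of each pair to every level
    return gaps.min(dim=1).values.mean()                     # pull each pair toward its nearest level
```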
arXiv Detail & Related papers (2021-02-08T14:16:07Z)