Gradient-Based Automated Iterative Recovery for Parameter-Efficient
Tuning
- URL: http://arxiv.org/abs/2302.06598v1
- Date: Mon, 13 Feb 2023 18:54:58 GMT
- Title: Gradient-Based Automated Iterative Recovery for Parameter-Efficient
Tuning
- Authors: Maximilian Mozes, Tolga Bolukbasi, Ann Yuan, Frederick Liu, Nithum
Thain, Lucas Dixon
- Abstract summary: We use TracIn to improve model performance in the parameter-efficient tuning (PET) setting.
We develop a new methodology for using gradient-based explainability techniques to improve model performance.
- Score: 11.124310650599146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained large language models (LLMs) are able to solve a wide variety of
tasks through transfer learning. Various explainability methods have been
developed to investigate their decision making process. TracIn (Pruthi et al.,
2020) is one such gradient-based method which explains model inferences based
on the influence of training examples. In this paper, we explore the use of
TracIn to improve model performance in the parameter-efficient tuning (PET)
setting. We develop conversational safety classifiers via the prompt-tuning PET
method and show how the unique characteristics of the PET regime enable TracIn
to identify the cause for certain misclassifications by LLMs. We develop a new
methodology for using gradient-based explainability techniques to improve model
performance, G-BAIR: gradient-based automated iterative recovery. We show that
G-BAIR can recover LLM performance on benchmarks after manually corrupting
training labels. This suggests that influence methods like TracIn can be used
to automatically perform data cleaning, and introduces the potential for
interactive debugging and relabeling for PET-based transfer learning methods.
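Because prompt tuning updates only a small set of parameters, per-example gradients (and hence TracIn dot products) are cheap to compute, which is what makes an iterative recovery loop over the training set practical. The sketch below is a minimal illustration of that idea, not the authors' released implementation: the top-k selection rule, the user-supplied `retrain` routine, and names such as `iterative_recovery` are assumptions for illustration.

```python
import torch

def example_grad(model, loss_fn, x, y):
    """Flat gradient of the loss on one example w.r.t. the tunable (e.g. prompt) parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def tracin_score(model, loss_fn, train_ex, test_ex, checkpoints, lrs):
    """TracIn-style influence of a training example on a test example:
    sum over saved checkpoints of lr * <grad(train), grad(test)>."""
    score = 0.0
    for state, lr in zip(checkpoints, lrs):
        model.load_state_dict(state)
        score += lr * torch.dot(example_grad(model, loss_fn, *train_ex),
                                example_grad(model, loss_fn, *test_ex)).item()
    return score

def iterative_recovery(model, loss_fn, train_set, suspect_dev, checkpoints, lrs,
                       retrain, rounds=3, k=10):
    """Hypothetical recovery loop in the spirit of G-BAIR: each round, rank training
    examples by their aggregate influence on misclassified dev examples, drop (or flag
    for relabeling) the k most influential ones, then retrain the PET parameters."""
    train_set = list(train_set)
    for _ in range(rounds):
        scores = [(sum(tracin_score(model, loss_fn, tr, dv, checkpoints, lrs)
                       for dv in suspect_dev), i)
                  for i, tr in enumerate(train_set)]
        # Which end of the ranking counts as "harmful" depends on the sign convention
        # used when selecting the misclassified dev examples; top-k is an assumption here.
        flagged = {i for _, i in sorted(scores, reverse=True)[:k]}
        train_set = [tr for i, tr in enumerate(train_set) if i not in flagged]
        checkpoints, lrs = retrain(train_set)  # user-supplied PET training routine
    return train_set
```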
Related papers
- SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models [26.484208658326857]
Continual learning aims to incrementally acquire new concepts in data streams while resisting forgetting previous knowledge.
With the rise of powerful pre-trained models (PTMs), there is a growing interest in training incremental learning systems.
arXiv Detail & Related papers (2024-11-04T15:34:30Z)
- Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label Regeneration and BEVMix [59.55173022987071]
We study the potential of semi-supervised learning for class-agnostic motion prediction.
Our framework adopts a consistency-based self-training paradigm, enabling the model to learn from unlabeled data.
Our method exhibits performance comparable to weakly supervised and to some fully supervised methods.
arXiv Detail & Related papers (2023-12-13T09:32:50Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
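For context, the IFD score above is commonly described as a ratio of answer perplexities with and without the instruction; the sketch below computes that ratio with a generic Hugging Face causal LM. The model name, helper names, and the simplified token masking are illustrative assumptions rather than the paper's implementation.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
lm = AutoModelForCausalLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def answer_loss(prefix: str, answer: str) -> float:
    """Mean cross-entropy of the answer tokens, optionally conditioned on a prefix."""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids if prefix else None
    answer_ids = tok(answer, return_tensors="pt").input_ids
    if prefix_ids is not None:
        input_ids = torch.cat([prefix_ids, answer_ids], dim=1)
        n_prefix = prefix_ids.shape[1]
    else:
        input_ids, n_prefix = answer_ids, 0
    labels = input_ids.clone()
    labels[:, :n_prefix] = -100  # score only the answer tokens
    return lm(input_ids, labels=labels).loss.item()

def ifd(instruction: str, answer: str) -> float:
    """Instruction-Following Difficulty as a perplexity ratio:
    PPL(answer | instruction) / PPL(answer). Values near (or above) 1 suggest
    the instruction barely helps the model produce the answer."""
    return math.exp(answer_loss(instruction, answer)) / math.exp(answer_loss("", answer))

print(ifd("Translate to French: Good morning.", "Bonjour."))
```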
- Exploring the Impact of Model Scaling on Parameter-Efficient Tuning [100.61202305296275]
Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) while updating only a small fraction of their parameters.
In small PLMs, there are usually noticeable performance differences among PET methods.
We introduce a more flexible PET method called Arbitrary PET (APET).
arXiv Detail & Related papers (2023-06-04T10:10:54Z)
- Mitigating ML Model Decay in Continuous Integration with Data Drift Detection: An Empirical Study [7.394099294390271]
This study investigates the use of data drift detection techniques for automatically identifying retraining points for ML models used for test case prioritization (TCP) in CI environments.
We employed the Hellinger distance to identify changes in both the values and distribution of input data and leveraged these changes as retraining points for the ML model.
Our experimental evaluation of the Hellinger distance-based method demonstrated its efficacy and efficiency in detecting retraining points and reducing the associated costs.
arXiv Detail & Related papers (2023-05-22T05:55:23Z)
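As a minimal illustration of the distance-based check described in the entry above (not the study's pipeline): compute histograms of a monitored feature for the training-time reference window and an incoming window, and flag a retraining point when the Hellinger distance exceeds a threshold. The bin count, threshold, and function names are assumptions for this sketch.

```python
import numpy as np

def hellinger(p: np.ndarray, q: np.ndarray) -> float:
    """Hellinger distance between two discrete distributions:
    H(P, Q) = (1 / sqrt(2)) * || sqrt(P) - sqrt(Q) ||_2, bounded in [0, 1]."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sqrt(0.5) * np.linalg.norm(np.sqrt(p) - np.sqrt(q)))

def needs_retraining(reference: np.ndarray, incoming: np.ndarray,
                     bins: int = 20, threshold: float = 0.1) -> bool:
    """Flag a retraining point when the histogram of an incoming feature window
    drifts away from the training-time reference by more than `threshold`.
    The bin count and threshold are illustrative, not values from the paper."""
    lo = min(reference.min(), incoming.min())
    hi = max(reference.max(), incoming.max())
    p, _ = np.histogram(reference, bins=bins, range=(lo, hi))
    q, _ = np.histogram(incoming, bins=bins, range=(lo, hi))
    # Small additive smoothing so empty bins do not dominate the distance.
    return hellinger(p + 1e-9, q + 1e-9) > threshold

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 5000)
drifted_feature = rng.normal(0.8, 1.2, 5000)
print(needs_retraining(train_feature, drifted_feature))  # True for this synthetic drift
```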
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE)
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, with better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Automated Essay Scoring Using Transformer Models [0.415623340386296]
We consider a transformer-based approach for automated essay scoring (AES)
We compare its performance to a logistic regression model based on the BOW approach and discuss their differences.
We show how such models can help increase the accuracy of human raters.
arXiv Detail & Related papers (2021-10-13T17:09:47Z)
- Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
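The zeroth-order step behind the black-box reprogramming entry above can be sketched generically: estimate a gradient from loss values alone via random-direction finite differences, so the target model never needs to expose gradients. This is a textbook two-point estimator with placeholder names, not the BAR implementation.

```python
import numpy as np

def zeroth_order_grad(loss_fn, theta: np.ndarray, n_dirs: int = 20, mu: float = 1e-2) -> np.ndarray:
    """Two-point zeroth-order gradient estimate: average over random directions u of
    (loss(theta + mu*u) - loss(theta - mu*u)) / (2*mu) * u. Only loss *values* are
    queried, so the target model can stay a black box."""
    grad = np.zeros_like(theta)
    for _ in range(n_dirs):
        u = np.random.randn(*theta.shape)
        grad += (loss_fn(theta + mu * u) - loss_fn(theta - mu * u)) / (2 * mu) * u
    return grad / n_dirs

# Toy black box: we can query the loss but pretend we cannot differentiate it.
target = np.array([1.0, -2.0, 0.5])
black_box_loss = lambda theta: float(np.sum((theta - target) ** 2))

theta = np.zeros(3)
for step in range(200):
    theta -= 0.05 * zeroth_order_grad(black_box_loss, theta)
print(theta)  # approaches `target` using only loss queries
```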
- Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis [64.82680813427054]
Plant diseases are one of the main threats to food security and crop production.
One popular approach is to cast this problem as a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs)
We propose a novel framework that incorporates rectified meta-learning module into common CNN paradigm to train a noise-robust deep network without using extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.