LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning
Tasks
- URL: http://arxiv.org/abs/2206.06565v2
- Date: Wed, 15 Jun 2022 01:14:49 GMT
- Title: LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning
Tasks
- Authors: Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira,
Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee
- Abstract summary: Fine-tuning pretrained language models (LMs) without making any architectural changes has become a norm for learning various language downstream tasks.
We propose Language-Interfaced Fine-Tuning (LIFT) to solve non-language downstream tasks without changing the model architecture or loss function.
Instead of task-specific input/output layers or losses, LIFT relies solely on the natural language interface.
- Score: 22.274913349275817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning pretrained language models (LMs) without making any architectural
changes has become a norm for learning various language downstream tasks.
However, for non-language downstream tasks, a common practice is to employ
task-specific designs for input, output layers, and loss functions. For
instance, it is possible to fine-tune an LM into an MNIST classifier by
replacing the word embedding layer with an image patch embedding layer, the
word token output layer with a 10-way output layer, and the word prediction
loss with a 10-way classification loss, respectively. A natural question
arises: can LM fine-tuning solve non-language downstream tasks without changing
the model architecture or loss function? To answer this, we propose
Language-Interfaced Fine-Tuning (LIFT) and study its efficacy and limitations
by conducting an extensive empirical study on a suite of non-language
classification and regression tasks. LIFT does not make any changes to the
model architecture or loss function, and it solely relies on the natural
language interface, enabling "no-code machine learning with LMs." We find that
LIFT performs relatively well across a wide range of low-dimensional
classification and regression tasks, matching the performances of the best
baselines in many cases, especially for the classification tasks. We report the
experimental results on the fundamental properties of LIFT, including its
inductive bias, sample efficiency, ability to extrapolate, robustness to
outliers and label noise, and generalization. We also analyze a few
properties/techniques specific to LIFT, e.g., context-aware learning via
appropriate prompting, quantification of predictive uncertainty, and two-stage
fine-tuning. Our code is available at
https://github.com/UW-Madison-Lee-Lab/LanguageInterfacedFineTuning.
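For illustration, the sketch below shows the core LIFT idea on a tabular example: each feature vector and label is serialized into a natural-language prompt/completion pair, so a pretrained LM can be fine-tuned on it with its standard next-token objective. The template and field names are illustrative assumptions, not the exact prompts used in the paper.
```python
# Minimal sketch of the LIFT data interface (illustrative template, not the
# paper's exact prompt format): a tabular (x, y) pair becomes plain text.

def lift_example(features, label, feature_names=None):
    """Serialize one (features, label) pair as a prompt/completion record."""
    names = feature_names or [f"x{i + 1}" for i in range(len(features))]
    described = ", ".join(f"{n} is {v}" for n, v in zip(names, features))
    prompt = f"When {described}, what should the y value be?"
    completion = f" y is {label}."          # predictions are read back by parsing this text
    return {"prompt": prompt, "completion": completion}

# One row of a toy regression dataset.
row = lift_example([5.1, 3.5], 0.42)
print(row["prompt"])      # When x1 is 5.1, x2 is 3.5, what should the y value be?
print(row["completion"])  #  y is 0.42.

# A JSONL file of such records is then passed to an off-the-shelf LM
# fine-tuning pipeline (e.g., GPT-3 fine-tuning or a GPT-J trainer); the
# model architecture and loss function are left untouched.
```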
Related papers
- Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model [14.92282077647913]
Continual fine-tuning (CFT) is the process of sequentially fine-tuning an LLM to enable the model to adapt to downstream tasks.
We study a two-phase CFT process in which an English-only end-to-end fine-tuned LLM is sequentially fine-tuned on a multilingual dataset.
We observe that the "similarity" of Phase 2 tasks with Phase 1 determines the LLM's adaptability.
arXiv Detail & Related papers (2024-10-21T13:39:03Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
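As a rough illustration of the self-synthesis step described above, the sketch below has the student LLM first invent new task inputs and then label them itself; the `generate` helper and prompt wording are hypothetical placeholders, not the paper's actual prompts or filtering procedure.
```python
# Hedged sketch of a SELF-GUIDE-style self-synthesis loop. `generate` is a
# hypothetical stand-in for a call to the student LLM.

def generate(prompt: str) -> str:
    """Placeholder for the student LLM's text-generation call."""
    raise NotImplementedError("wire this to your LLM inference endpoint")

def synthesize_pairs(instruction, seed_examples, n_new=100):
    """Have the model invent new inputs, then answer them itself."""
    pairs = []
    for _ in range(n_new):
        demos = "\n".join(f"Input: {x}" for x, _ in seed_examples)
        new_input = generate(f"{instruction}\n{demos}\nInput:")
        new_output = generate(f"{instruction}\nInput: {new_input}\nOutput:")
        pairs.append((new_input.strip(), new_output.strip()))
    return pairs

# The resulting synthetic (input, output) pairs are filtered and used to
# fine-tune the same student model on the target task.
```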
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer [50.40191599304911]
We introduce MoSECroT (Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer).
In this paper, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language.
We show that although our proposed framework is competitive with weak baselines when addressing MoSECroT, it fails to achieve competitive results compared with some strong baselines.
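For intuition, here is a minimal sketch of the relative-representation idea that this common space builds on: every embedding is re-expressed as its cosine similarities to a shared set of anchors, so source-language PLM embeddings and target-language static embeddings become comparable. The shapes and random data are purely illustrative, not MoSECroT's implementation.
```python
# Toy sketch of relative representations: embeddings from spaces of different
# dimension are mapped to cosine similarities against k anchors, yielding
# vectors in a shared k-dimensional space.
import numpy as np

def relative_representation(embeddings, anchors):
    """Map (n, d) embeddings to (n, k) cosine similarities to k anchors."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return e @ a.T

rng = np.random.default_rng(0)
# Source-language PLM embeddings (768-d) with their anchors.
src = relative_representation(rng.normal(size=(5, 768)), rng.normal(size=(10, 768)))
# Target-language static embeddings (300-d) with translation-aligned anchors.
tgt = relative_representation(rng.normal(size=(5, 300)), rng.normal(size=(10, 300)))
print(src.shape, tgt.shape)  # (5, 10) (5, 10): both live in the same 10-dim space
```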
arXiv Detail & Related papers (2024-01-09T21:09:07Z) - LaFFi: Leveraging Hybrid Natural Language Feedback for Fine-tuning
Language Models [14.087415157225715]
Fine-tuning Large Language Models (LLMs) adapts a trained model to specific downstream tasks.
Supervised Fine-Tuning (SFT) is a common approach, where an LLM is trained to produce desired answers.
This paper introduces an alternative to SFT called Natural Language Feedback for Finetuning LLMs (LaFFi).
arXiv Detail & Related papers (2023-12-31T21:18:16Z) - The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT), which restricts embedding entries to the language of interest to bolster time and memory efficiency.
We apply two language heuristics to trim the full vocabulary - Unicode-based script filtering and corpus-based selection - to different language families and sizes.
It is found that VT reduces the memory usage of small models by nearly 50% and has an upper bound of 25% improvement in generation speed.
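As a toy illustration of the Unicode-based script filtering heuristic mentioned above, the sketch below keeps only vocabulary entries whose non-ASCII characters belong to the Latin script; the script test and example vocabulary are simplifications, not the paper's implementation.
```python
# Toy Unicode-script filter for vocabulary trimming (simplified illustration).
import unicodedata

def keep_token(token: str) -> bool:
    """Keep tokens whose non-ASCII characters are all Latin-script."""
    for ch in token:
        if ch.isascii():
            continue
        if not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True

vocab = {"hello": 0, "world": 1, "こんにちは": 2, "café": 3, "мир": 4}
trimmed = {tok: idx for tok, idx in vocab.items() if keep_token(tok)}
print(trimmed)  # {'hello': 0, 'world': 1, 'café': 3}

# Shrinking the vocabulary shrinks the embedding and output matrices, which
# is where the reported memory and generation-speed savings come from.
```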
arXiv Detail & Related papers (2023-11-16T09:35:50Z) - Meta-Learning Online Adaptation of Language Models [88.8947656843812]
Large language models encode impressively broad world knowledge in their parameters.
However, the knowledge in static language models falls out of date, limiting the model's effective "shelf life."
arXiv Detail & Related papers (2023-05-24T11:56:20Z) - LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
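The sketch below shows one plausible way such a training sequence could be assembled from the three parts named above; the section markers and example content are invented for illustration.
```python
# Hedged sketch: concatenating instruction, generated program, and textual
# feedback into a single LETI-style fine-tuning sequence (markers invented).

def leti_training_text(instruction: str, program: str, feedback: str) -> str:
    return (
        f"Instruction:\n{instruction}\n\n"
        f"Program:\n{program}\n\n"
        f"Feedback:\n{feedback}\n"
    )

example = leti_training_text(
    "Write a function that returns the square of x.",
    "def square(x):\n    return x * x",
    "Correct: all test cases passed.",
)
print(example)
```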
arXiv Detail & Related papers (2023-05-17T15:53:31Z) - Actuarial Applications of Natural Language Processing Using
Transformers: Case Studies for Using Text Features in an Actuarial Context [0.0]
This tutorial demonstrates how to incorporate text data into actuarial classification and regression tasks.
The main focus is on methods employing transformer-based models.
The case studies tackle challenges related to a multi-lingual setting and long input sequences.
arXiv Detail & Related papers (2022-06-04T15:39:30Z) - Detecting Requirements Smells With Deep Learning: Experiences,
Challenges and Future Work [9.44316959798363]
This work aims to improve the previous work by creating a manually labeled dataset and using ensemble learning, Deep Learning (DL), and techniques such as word embeddings and transfer learning to overcome the generalization problem.
The current findings show that the dataset is unbalanced and indicate which classes need more examples.
arXiv Detail & Related papers (2021-08-06T12:45:15Z) - SML: a new Semantic Embedding Alignment Transformer for efficient
cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI), or summarization, has made them one of the best paradigms for addressing such tasks at present.
NLI is one of the best scenarios for testing these architectures, due to the knowledge required to understand complex sentences and establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
arXiv Detail & Related papers (2021-03-17T13:23:53Z)