The Power of Prompt Tuning for Low-Resource Semantic Parsing
- URL: http://arxiv.org/abs/2110.08525v1
- Date: Sat, 16 Oct 2021 09:33:09 GMT
- Title: The Power of Prompt Tuning for Low-Resource Semantic Parsing
- Authors: Nathan Schucher, Siva Reddy, Harm de Vries
- Abstract summary: We investigate prompt tuning for semantic parsing.
For large T5 models, we find (i) that prompt tuning significantly outperforms fine-tuning in the low-data regime and (ii) that canonicalization -- naturalizing the meaning representations -- barely improves performance.
This last result is surprising as it suggests that large T5 models can be modulated to generate sequences far from the pre-training distribution.
- Score: 10.37371743879877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning has recently emerged as an effective method for adapting
pre-trained language models to a number of language tasks. In this paper, we
investigate prompt tuning for semantic parsing, the task of mapping natural
language utterances onto formal meaning representations. For large T5 models we
find (i) that prompt tuning significantly outperforms fine-tuning in the low
data regime and (ii) that canonicalization -- i.e. naturalizing the meaning
representations -- barely improves performance. This last result is surprising
as it suggests that large T5 models can be modulated to generate sequences that
are far from the pre-training distribution.
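The prompt-tuning setup described in the abstract can be pictured as follows. This is a minimal sketch, assuming PyTorch and the Hugging Face transformers library; the prompt length, learning rate, and checkpoint name are illustrative choices, not the paper's exact configuration.

```python
# Minimal prompt-tuning sketch: the pre-trained T5 weights are frozen and
# only a small set of "soft prompt" embeddings prepended to the input is
# trained to map utterances to meaning representations.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Freeze every pre-trained weight; only the soft prompt below is updated.
for p in model.parameters():
    p.requires_grad = False

# Soft prompt: k trainable embedding vectors of the model's hidden size.
k, d = 20, model.config.d_model
soft_prompt = torch.nn.Parameter(torch.randn(k, d) * 0.5)
optimizer = torch.optim.Adam([soft_prompt], lr=0.3)

def step(utterance: str, meaning_representation: str) -> torch.Tensor:
    enc = tokenizer(utterance, return_tensors="pt")
    labels = tokenizer(meaning_representation, return_tensors="pt").input_ids
    # Embed the input tokens, then prepend the soft prompt embeddings.
    tok_embeds = model.get_input_embeddings()(enc.input_ids)           # (1, n, d)
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), tok_embeds], dim=1)
    attention_mask = torch.cat(
        [torch.ones(1, k, dtype=enc.attention_mask.dtype), enc.attention_mask], dim=1
    )
    loss = model(inputs_embeds=inputs_embeds,
                 attention_mask=attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss
```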
Related papers
- Ensembling Finetuned Language Models for Text Classification [55.15643209328513]
Finetuning is a common practice across different communities for adapting pretrained models to particular tasks.
Ensembles of neural networks are typically used to boost performance and provide reliable uncertainty estimates (a simple probability-averaging ensemble is sketched after this entry).
We present a metadataset with predictions from five large finetuned models on six datasets and report the results of different ensembling strategies.
arXiv Detail & Related papers (2024-10-25T09:15:54Z)
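As a concrete reference point for the ensembling strategies mentioned above, here is a minimal sketch of the simplest one: averaging the predicted class probabilities of several finetuned models. It assumes the per-model probabilities are already available as arrays and is not tied to the paper's metadataset or its best-performing strategy.

```python
# Probability-averaging ensemble for text classification.
import numpy as np

def ensemble_predict(member_probs: list[np.ndarray]) -> np.ndarray:
    """member_probs: list of (num_examples, num_classes) probability arrays."""
    avg = np.mean(np.stack(member_probs, axis=0), axis=0)  # average over members
    return avg.argmax(axis=-1)                             # predicted class per example

# Example with three hypothetical members, two examples, two classes:
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.7, 0.3], [0.2, 0.8]])
p3 = np.array([[0.6, 0.4], [0.7, 0.3]])
print(ensemble_predict([p1, p2, p3]))  # -> [0 1]
```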
- Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
- SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings [0.7349727826230863]
Soft prompt tuning techniques have gained traction as an effective strategy for the parameter-efficient tuning of pretrained language models.
We introduce SuperPos-Prompt, a new reparameterization technique employing the superposition of multiple pretrained vocabulary embeddings to improve the learning of soft prompts (a rough sketch of the idea follows this entry).
Our experiments consistently highlight SuperPos-Prompt's superiority over Residual Prompt tuning, exhibiting an average score increase of +6.4 on T5-Small and +5.0 on T5-Base.
Remarkably, SuperPos-Prompt occasionally outperforms even full fine-tuning methods.
arXiv Detail & Related papers (2024-06-07T22:18:49Z)
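A rough sketch of the superposition idea from the SuperPos-Prompt entry above, assuming PyTorch: each soft prompt token is parameterized as a learned mixture of a fixed subset of pretrained vocabulary embeddings. The sampling scheme, mixture normalization, and sizes here are illustrative assumptions, not the paper's exact formulation.

```python
# Each soft prompt slot is a trainable convex combination of frozen
# pretrained vocabulary embeddings; only the mixing weights are learned.
import torch

class SuperposedPrompt(torch.nn.Module):
    def __init__(self, vocab_embeddings: torch.Tensor, prompt_len: int = 10,
                 tokens_per_slot: int = 256):
        super().__init__()
        vocab_size, _ = vocab_embeddings.shape
        # Random, frozen subset of vocabulary embeddings per prompt slot.
        idx = torch.randint(0, vocab_size, (prompt_len, tokens_per_slot))
        self.register_buffer("basis", vocab_embeddings[idx])        # (L, T, d)
        self.weights = torch.nn.Parameter(torch.zeros(prompt_len, tokens_per_slot))

    def forward(self) -> torch.Tensor:
        mix = torch.softmax(self.weights, dim=-1)                   # (L, T)
        return torch.einsum("lt,ltd->ld", mix, self.basis)          # (L, d)

# Usage: vocab = model.get_input_embeddings().weight.detach()
# prompt = SuperposedPrompt(vocab)()  # prepend to the input embeddings as usual
```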
- UT5: Pretraining Non autoregressive T5 with unrolled denoising [9.656399724144192]
We studied unsupervised pretraining for non-autoregressive T5 models via unrolled denoising.
We showed its SoTA results on downstream generation tasks such as SQuAD question generation and XSum summarization.
arXiv Detail & Related papers (2023-11-14T21:28:10Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of the task instruction to after the input sentence (a toy example of the two layouts follows this entry).
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
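To make the instruction-placement idea in the preceding entry concrete, here is a toy Python example contrasting an instruction placed before the input with one placed after it. The template wording is purely illustrative and may differ from the paper's prompts.

```python
# Toy illustration of instruction placement for a translation request.
def instruction_first(src: str) -> str:
    return f"Translate the following sentence into German:\n{src}"

def instruction_last(src: str) -> str:
    # The entry above proposes placing the task instruction after the input.
    return f"{src}\nTranslate the above sentence into German."

print(instruction_last("The cat sat on the mat."))
```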
- Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation [9.736284584478032]
We show the effectiveness of character-level modeling in translation, particularly in cases where fine-tuning data is limited.
While evaluating the importance of source texts in driving model predictions, we highlight word-level patterns within ByT5.
We conclude by assessing the efficiency tradeoff of byte models, suggesting their usage in non-time-critical scenarios to boost translation quality.
arXiv Detail & Related papers (2023-02-28T00:50:19Z)
- Simple and Effective Gradient-Based Tuning of Sequence-to-Sequence Models [8.370770440898454]
The huge cost of training large language models can make tuning them prohibitively expensive.
We apply gradient-based hyperparameter optimization to sequence-to-sequence tasks for the first time.
We show efficiency and performance gains over strong baselines for both Neural Machine Translation and Natural Language Understanding (NLU) tasks.
arXiv Detail & Related papers (2022-09-10T14:52:41Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model (a rough sketch of this idea follows this entry).
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
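A rough sketch of the layerwise noise-stability idea described in the LNSR entry above, assuming PyTorch: perturb a hidden representation with standard Gaussian noise and penalize how much the layer's output changes. The layer choice, noise scale, and loss weighting are assumptions, not the paper's exact recipe.

```python
# Noise-stability penalty: the layer output should change little when its
# input hidden states are perturbed with Gaussian noise.
import torch

def noise_stability_penalty(layer: torch.nn.Module, hidden: torch.Tensor,
                            noise_std: float = 0.01) -> torch.Tensor:
    clean_out = layer(hidden)
    noisy_out = layer(hidden + noise_std * torch.randn_like(hidden))
    return ((clean_out - noisy_out) ** 2).mean()

# Usage: total_loss = task_loss + reg_weight * noise_stability_penalty(block, h)
```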
- An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM).
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
- Improving Compositional Generalization with Self-Training for Data-to-Text Generation [36.973617793800315]
We study the compositional generalization of current generation models in data-to-text tasks.
By simulating structural shifts in the compositional Weather dataset, we show that T5 models fail to generalize to unseen structures.
We propose an approach based on self-training, using a finetuned BLEURT model for pseudo-response selection (a generic version of this loop is sketched after this entry).
arXiv Detail & Related papers (2021-10-16T04:26:56Z)
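The self-training loop in the entry above can be outlined roughly as follows. The `generate` and `quality_score` callables are hypothetical stand-ins (the paper uses a finetuned BLEURT model for selection), and the candidate count and threshold are illustrative assumptions.

```python
# Generic self-training sketch: generate candidate outputs for unlabeled
# inputs, keep only those a learned quality model scores highly, and add
# them to the training set as pseudo-labeled pairs.
from typing import Callable, Iterable

def self_train_round(
    unlabeled_inputs: Iterable[str],
    generate: Callable[[str, int], list[str]],    # returns n candidates per input
    quality_score: Callable[[str, str], float],   # scores an (input, candidate) pair
    n_candidates: int = 8,
    threshold: float = 0.9,
) -> list[tuple[str, str]]:
    pseudo_labeled = []
    for x in unlabeled_inputs:
        candidates = generate(x, n_candidates)
        best = max(candidates, key=lambda y: quality_score(x, y))
        if quality_score(x, best) >= threshold:
            pseudo_labeled.append((x, best))      # new training pair for the next round
    return pseudo_labeled
```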
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.