The Power of Prompt Tuning for Low-Resource Semantic Parsing
- URL: http://arxiv.org/abs/2110.08525v1
- Date: Sat, 16 Oct 2021 09:33:09 GMT
- Title: The Power of Prompt Tuning for Low-Resource Semantic Parsing
- Authors: Nathan Schucher, Siva Reddy, Harm de Vries
- Abstract summary: We investigate prompt tuning for semantic parsing.
For large T5 models, we find (i) that prompt tuning significantly outperforms fine-tuning in the low-data regime and (ii) that canonicalization -- naturalizing the meaning representations -- barely improves performance.
This last result is surprising as it suggests that large T5 models can be modulated to generate sequences far from the pre-training distribution.
- Score: 10.37371743879877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning has recently emerged as an effective method for adapting
pre-trained language models to a number of language tasks. In this paper, we
investigate prompt tuning for semantic parsing, the task of mapping natural
language utterances onto formal meaning representations. For large T5 models we
find (i) that prompt tuning significantly outperforms fine-tuning in the low
data regime and (ii) that canonicalization -- i.e. naturalizing the meaning
representations -- barely improves performance. This last result is surprising
as it suggests that large T5 models can be modulated to generate sequences that
are far from the pre-training distribution.
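The prompt-tuning setup described in the abstract can be pictured as follows. This is a minimal sketch, assuming PyTorch and the Hugging Face transformers library; the prompt length, learning rate, and checkpoint name are illustrative choices, not the paper's exact configuration.

```python
# Minimal prompt-tuning sketch: the pre-trained T5 weights are frozen and
# only a small set of "soft prompt" embeddings prepended to the input is
# trained to map utterances to meaning representations.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Freeze every pre-trained weight; only the soft prompt below is updated.
for p in model.parameters():
    p.requires_grad = False

# Soft prompt: k trainable embedding vectors of the model's hidden size.
k, d = 20, model.config.d_model
soft_prompt = torch.nn.Parameter(torch.randn(k, d) * 0.5)
optimizer = torch.optim.Adam([soft_prompt], lr=0.3)

def step(utterance: str, meaning_representation: str) -> torch.Tensor:
    enc = tokenizer(utterance, return_tensors="pt")
    labels = tokenizer(meaning_representation, return_tensors="pt").input_ids
    # Embed the input tokens, then prepend the soft prompt embeddings.
    tok_embeds = model.get_input_embeddings()(enc.input_ids)           # (1, n, d)
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), tok_embeds], dim=1)
    attention_mask = torch.cat(
        [torch.ones(1, k, dtype=enc.attention_mask.dtype), enc.attention_mask], dim=1
    )
    loss = model(inputs_embeds=inputs_embeds,
                 attention_mask=attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss
```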
Related papers
- Ensembling Finetuned Language Models for Text Classification [55.15643209328513]
Finetuning is a common practice across different communities for adapting pretrained models to particular tasks.
Ensembles of neural networks are typically used to boost performance and provide reliable uncertainty estimates (a simple probability-averaging ensemble is sketched after this entry).
We present a metadataset with predictions from five large finetuned models on six datasets and report the results of different ensembling strategies.
arXiv Detail & Related papers (2024-10-25T09:15:54Z)
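As a concrete reference point for the ensembling strategies mentioned above, here is a minimal sketch of the simplest one: averaging the predicted class probabilities of several finetuned models. It assumes the per-model probabilities are already available as arrays and is not tied to the paper's metadataset or its best-performing strategy.

```python
# Probability-averaging ensemble for text classification.
import numpy as np

def ensemble_predict(member_probs: list[np.ndarray]) -> np.ndarray:
    """member_probs: list of (num_examples, num_classes) probability arrays."""
    avg = np.mean(np.stack(member_probs, axis=0), axis=0)  # average over members
    return avg.argmax(axis=-1)                             # predicted class per example

# Example with three hypothetical members, two examples, two classes:
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.7, 0.3], [0.2, 0.8]])
p3 = np.array([[0.6, 0.4], [0.7, 0.3]])
print(ensemble_predict([p1, p2, p3]))  # -> [0 1]
```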
- Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains.
We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
- SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings [0.7349727826230863]
Soft prompt tuning techniques have gained traction as an effective strategy for the parameter-efficient tuning of pretrained language models.
We introduce SuperPos-Prompt, a new reparameterization technique employing the superposition of multiple pretrained vocabulary embeddings to improve the learning of soft prompts (a rough sketch of the idea follows this entry).
Our experiments consistently highlight SuperPos-Prompt's superiority over Residual Prompt tuning, exhibiting an average score increase of +6.4 on T5-Small and +5.0 on T5-Base.
Remarkably, SuperPos-Prompt occasionally outperforms even full fine-tuning methods.
arXiv Detail & Related papers (2024-06-07T22:18:49Z)
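A rough sketch of the superposition idea from the SuperPos-Prompt entry above, assuming PyTorch: each soft prompt token is parameterized as a learned mixture of a fixed subset of pretrained vocabulary embeddings. The sampling scheme, mixture normalization, and sizes here are illustrative assumptions, not the paper's exact formulation.

```python
# Each soft prompt slot is a trainable convex combination of frozen
# pretrained vocabulary embeddings; only the mixing weights are learned.
import torch

class SuperposedPrompt(torch.nn.Module):
    def __init__(self, vocab_embeddings: torch.Tensor, prompt_len: int = 10,
                 tokens_per_slot: int = 256):
        super().__init__()
        vocab_size, _ = vocab_embeddings.shape
        # Random, frozen subset of vocabulary embeddings per prompt slot.
        idx = torch.randint(0, vocab_size, (prompt_len, tokens_per_slot))
        self.register_buffer("basis", vocab_embeddings[idx])        # (L, T, d)
        self.weights = torch.nn.Parameter(torch.zeros(prompt_len, tokens_per_slot))

    def forward(self) -> torch.Tensor:
        mix = torch.softmax(self.weights, dim=-1)                   # (L, T)
        return torch.einsum("lt,ltd->ld", mix, self.basis)          # (L, d)

# Usage: vocab = model.get_input_embeddings().weight.detach()
# prompt = SuperposedPrompt(vocab)()  # prepend to the input embeddings as usual
```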
- UT5: Pretraining Non autoregressive T5 with unrolled denoising [9.656399724144192]
We studied unsupervised pretraining for non-autoregressive T5 models via unrolled denoising.
We showed its SoTA results on downstream generation tasks such as SQuAD question generation and XSum summarization.
arXiv Detail & Related papers (2023-11-14T21:28:10Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of the task instruction to after the input sentence (a toy example of the two layouts follows this entry).
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
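To make the instruction-placement idea in the preceding entry concrete, here is a toy Python example contrasting an instruction placed before the input with one placed after it. The template wording is purely illustrative and may differ from the paper's prompts.

```python
# Toy illustration of instruction placement for a translation request.
def instruction_first(src: str) -> str:
    return f"Translate the following sentence into German:\n{src}"

def instruction_last(src: str) -> str:
    # The entry above proposes placing the task instruction after the input.
    return f"{src}\nTranslate the above sentence into German."

print(instruction_last("The cat sat on the mat."))
```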
- Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation [9.736284584478032]
We show the effectiveness of character-level modeling in translation, particularly in cases where fine-tuning data is limited.
While evaluating the importance of source texts in driving model predictions, we highlight word-level patterns within ByT5.
We conclude by assessing the efficiency tradeoff of byte models, suggesting their usage in non-time-critical scenarios to boost translation quality.
arXiv Detail & Related papers (2023-02-28T00:50:19Z)
- Simple and Effective Gradient-Based Tuning of Sequence-to-Sequence Models [8.370770440898454]
The huge cost of training large language models can make tuning them prohibitively expensive.
We apply gradient-based hyperparameter optimization to sequence-to-sequence tasks for the first time.
We show efficiency and performance gains over strong baselines for both Neural Machine Translation and Natural Language Understanding (NLU) tasks.
arXiv Detail & Related papers (2022-09-10T14:52:41Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model (a rough sketch of this idea follows this entry).
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
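A rough sketch of the layerwise noise-stability idea described in the LNSR entry above, assuming PyTorch: perturb a hidden representation with standard Gaussian noise and penalize how much the layer's output changes. The layer choice, noise scale, and loss weighting are assumptions, not the paper's exact recipe.

```python
# Noise-stability penalty: the layer output should change little when its
# input hidden states are perturbed with Gaussian noise.
import torch

def noise_stability_penalty(layer: torch.nn.Module, hidden: torch.Tensor,
                            noise_std: float = 0.01) -> torch.Tensor:
    clean_out = layer(hidden)
    noisy_out = layer(hidden + noise_std * torch.randn_like(hidden))
    return ((clean_out - noisy_out) ** 2).mean()

# Usage: total_loss = task_loss + reg_weight * noise_stability_penalty(block, h)
```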
- An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM).
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
- Improving Compositional Generalization with Self-Training for Data-to-Text Generation [36.973617793800315]
We study the compositional generalization of current generation models in data-to-text tasks.
By simulating structural shifts in the compositional Weather dataset, we show that T5 models fail to generalize to unseen structures.
We propose an approach based on self-training, using a finetuned BLEURT model for pseudo-response selection (a generic version of this loop is sketched after this entry).
arXiv Detail & Related papers (2021-10-16T04:26:56Z)
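The self-training loop in the entry above can be outlined roughly as follows. The `generate` and `quality_score` callables are hypothetical stand-ins (the paper uses a finetuned BLEURT model for selection), and the candidate count and threshold are illustrative assumptions.

```python
# Generic self-training sketch: generate candidate outputs for unlabeled
# inputs, keep only those a learned quality model scores highly, and add
# them to the training set as pseudo-labeled pairs.
from typing import Callable, Iterable

def self_train_round(
    unlabeled_inputs: Iterable[str],
    generate: Callable[[str, int], list[str]],    # returns n candidates per input
    quality_score: Callable[[str, str], float],   # scores an (input, candidate) pair
    n_candidates: int = 8,
    threshold: float = 0.9,
) -> list[tuple[str, str]]:
    pseudo_labeled = []
    for x in unlabeled_inputs:
        candidates = generate(x, n_candidates)
        best = max(candidates, key=lambda y: quality_score(x, y))
        if quality_score(x, best) >= threshold:
            pseudo_labeled.append((x, best))      # new training pair for the next round
    return pseudo_labeled
```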
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.