Training Naturalized Semantic Parsers with Very Little Data
- URL: http://arxiv.org/abs/2204.14243v1
- Date: Fri, 29 Apr 2022 17:14:54 GMT
- Title: Training Naturalized Semantic Parsers with Very Little Data
- Authors: Subendhu Rongali, Konstantine Arkoudas, Melanie Rubino, Wael Hamza
- Abstract summary: State-of-the-art (SOTA) semantic parsers are seq2seq architectures based on large language models that have been pretrained on vast amounts of text.
Recent work has explored a reformulation of semantic parsing whereby the output sequences are themselves natural language sentences.
We show that this method delivers new SOTA few-shot performance on the Overnight dataset.
- Score: 10.709587018625275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic parsing is an important NLP problem, particularly for voice
assistants such as Alexa and Google Assistant. State-of-the-art (SOTA) semantic
parsers are seq2seq architectures based on large language models that have been
pretrained on vast amounts of text. To better leverage that pretraining, recent
work has explored a reformulation of semantic parsing whereby the output
sequences are themselves natural language sentences, but in a controlled
fragment of natural language. This approach delivers strong results,
particularly for few-shot semantic parsing, which is of key importance in
practice and the focus of our paper. We push this line of work forward by
introducing an automated methodology that delivers very significant additional
improvements by utilizing modest amounts of unannotated data, which is
typically easy to obtain. Our method is based on a novel synthesis of four
techniques: joint training with auxiliary unsupervised tasks; constrained
decoding; self-training; and paraphrasing. We show that this method delivers
new SOTA few-shot performance on the Overnight dataset, particularly in very
low-resource settings, and very compelling few-shot results on a new semantic
parsing dataset.
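Of the four techniques the abstract names, constrained decoding is the easiest to illustrate in isolation. The sketch below is not the authors' implementation: it assumes the controlled fragment can be enumerated as a small set of tokenized canonical sentences, stores them in a prefix trie, and lets a greedy decoder choose only among tokens the trie permits. The names `build_trie`, `constrained_greedy_decode`, and `toy_score`, and the example sentences, are hypothetical, and `toy_score` is a word-overlap stand-in for a real seq2seq model's next-token score.

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-sentence marker


def build_trie(sentences: Sequence[Sequence[str]]) -> Dict:
    """Build a nested-dict prefix trie over tokenized canonical sentences."""
    root: Dict = {}
    for tokens in sentences:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root


def constrained_greedy_decode(
    score: Callable[[List[str], str], float],
    trie: Dict,
    max_len: int = 32,
) -> List[str]:
    """Greedy decoding that only considers tokens the trie allows next."""
    node, prefix = trie, []
    for _ in range(max_len):
        allowed = list(node.keys())
        if not allowed:
            break
        # Pick the highest-scoring token among the permitted continuations.
        best = max(allowed, key=lambda tok: score(prefix, tok))
        if best == END:
            break
        prefix.append(best)
        node = node[best]
    return prefix


def toy_score(utterance_tokens: Sequence[str]) -> Callable[[List[str], str], float]:
    """Stand-in for a model: favors tokens that appear in the input utterance."""
    words = set(utterance_tokens)

    def score(prefix: List[str], tok: str) -> float:
        return 1.0 if tok in words else 0.0

    return score


# Assumed toy fragment of canonical (naturalized) target sentences.
canonical = [
    "meetings that start at 10am".split(),
    "meetings that end at 10am".split(),
    "people that attend the meeting".split(),
]
trie = build_trie(canonical)
result = constrained_greedy_decode(
    toy_score("which meetings end at 10am".split()), trie
)
print(" ".join(result))
```

Because every step is restricted to continuations found in the trie, the decoder cannot emit a sentence outside the enumerated fragment, whatever the scorer prefers. A real system would more likely constrain subword tokens with a grammar than with an explicit list of sentences.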
Related papers
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases [0.0]
We propose a method for training effective language-specific sentence encoders without manually labeled data.
Our approach is to automatically construct a dataset of paraphrase pairs from sentence-aligned bilingual text corpora.
Our sentence encoder can be trained in less than a day on a single graphics card, achieving high performance on a diverse set of sentence-level tasks.
arXiv Detail & Related papers (2022-07-26T09:08:56Z)
- Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z)
- On the Use of External Data for Spoken Named Entity Recognition [40.93448412171246]
Recent advances in self-supervised speech representations have made it feasible to consider learning models with limited labeled data.
We draw on a variety of approaches, including self-training, knowledge distillation, and transfer learning, and consider their applicability to both end-to-end models and pipeline approaches.
arXiv Detail & Related papers (2021-12-14T18:49:26Z)
- To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP [0.0]
We investigate three categories of text augmentation methodologies which perform changes on the syntax.
We compare them on part-of-speech tagging, dependency parsing and semantic role labeling for a diverse set of language families.
Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT.
arXiv Detail & Related papers (2021-11-18T10:52:48Z)
- Translate & Fill: Improving Zero-Shot Multilingual Semantic Parsing with Synthetic Data [2.225882303328135]
We propose a novel Translate-and-Fill (TaF) method to produce silver training data for a multilingual semantic parsing task.
Experimental results on three multilingual semantic parsing datasets show that data augmentation with TaF reaches accuracies competitive with similar systems.
arXiv Detail & Related papers (2021-09-09T14:51:11Z)
- Semantic Parsing with Less Prior and More Monolingual Data [12.715221084359085]
This work investigates whether a generic transformer-based seq2seq model can achieve competitive performance with minimal semantic-parsing specific inductive bias design.
By exploiting a relatively large monolingual corpus of the target programming language, which, unlike a parallel corpus, is cheap to mine from the web, we achieved 80.75% exact match accuracy on Django and a BLEU score of 32.57 on CoNaLa, both of which are SOTA to the best of our knowledge.
arXiv Detail & Related papers (2021-01-01T16:02:38Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic Parsing [85.35582118010608]
Task-oriented semantic parsing is a critical component of virtual assistants.
Recent advances in deep learning have enabled several approaches to successfully parse more complex queries.
We propose a novel method that outperforms a supervised neural model at a 10-fold data reduction.
arXiv Detail & Related papers (2020-10-07T17:47:53Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.