Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via
Intent Conditioning
- URL: http://arxiv.org/abs/2204.06748v1
- Date: Thu, 14 Apr 2022 04:06:39 GMT
- Title: Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via
Intent Conditioning
- Authors: Geunseob Oh, Rahul Goel, Chris Hidey, Shachi Paul, Aditya Gupta,
Pararth Shah, Rushin Shah
- Abstract summary: We propose a novel NAR semantic parser that introduces intent conditioning on the decoder.
As the top-level intent governs the syntax and semantics of a parse, the intent conditioning allows the model to better control beam search.
We evaluate the proposed NAR on conversational SP datasets, TOP & TOPv2.
- Score: 11.307865386100993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic parsing (SP) is a core component of modern virtual assistants like
Google Assistant and Amazon Alexa. While sequence-to-sequence-based
auto-regressive (AR) approaches are common for conversational semantic parsing,
recent studies employ non-autoregressive (NAR) decoders and reduce inference
latency while maintaining competitive parsing quality. However, a major
drawback of NAR decoders is the difficulty of generating top-k (i.e., k-best)
outputs with approaches such as beam search. To address this challenge, we
propose a novel NAR semantic parser that introduces intent conditioning on the
decoder. Inspired by the traditional intent and slot tagging parsers, we
decouple the top-level intent prediction from the rest of a parse. As the
top-level intent largely governs the syntax and semantics of a parse, the
intent conditioning allows the model to better control beam search and improves
the quality and diversity of top-k outputs. We introduce a hybrid
teacher-forcing approach to avoid training and inference mismatch. We evaluate
the proposed NAR on conversational SP datasets, TOP & TOPv2. Like the existing
NAR models, we maintain the O(1) decoding time complexity while generating more
diverse outputs and improving the top-3 exact match (EM) by 2.4 points. In
comparison with AR models, our model speeds up beam search inference by 6.7
times on CPU with competitive top-k EM.
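To make the decoding scheme in the abstract concrete, here is a minimal sketch, assuming an encoder, a top-level intent classifier, a length predictor, and a parallel NAR decoder as separate callables. The module names and interfaces below are illustrative assumptions, not the authors' released code: each of the k beams is conditioned on a distinct top-level intent, and the rest of the parse is then filled in with one parallel decoding pass per beam.

```python
def intent_conditioned_topk_parse(encoder, intent_classifier, length_predictor,
                                  nar_decoder, utterance_tokens, k=3):
    """Hypothetical sketch of intent-conditioned top-k NAR decoding.

    All four callables are assumed to return PyTorch tensors; their names and
    signatures are illustrative only.
    """
    enc = encoder(utterance_tokens)  # encoder states, shape [T, H]

    # Decouple the top-level intent: keep the k most probable intents.
    intent_log_probs = intent_classifier(enc).log_softmax(-1)  # [num_intents]
    intent_scores, intents = intent_log_probs.topk(k)

    candidates = []
    for intent_score, intent in zip(intent_scores.tolist(), intents.tolist()):
        # Predict the target length conditioned on the chosen intent.
        length = length_predictor(enc, intent)

        # Fill every target position in a single parallel (NAR) pass,
        # again conditioned on the chosen intent.
        token_log_probs = nar_decoder(enc, intent, length).log_softmax(-1)  # [length, V]
        tokens = token_log_probs.argmax(-1)                                 # [length]
        seq_score = token_log_probs.gather(-1, tokens.unsqueeze(-1)).sum().item()

        candidates.append((intent_score + seq_score, intent, tokens))

    # Each candidate is tied to a distinct top-level intent, so the k parses
    # are structurally diverse; rank them by total log-probability.
    return sorted(candidates, key=lambda c: c[0], reverse=True)
```

Because every candidate is tied to a different top-level intent, the k outputs are diverse by construction, while each parse is still produced in a single parallel pass, consistent with the O(1) decoding depth claimed in the abstract.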
Related papers
- Context Perception Parallel Decoder for Scene Text Recognition [52.620841341333524]
Scene text recognition (STR) methods have struggled to attain high accuracy and fast inference speed.
We present an empirical study of AR decoding in STR, and discover that the AR decoder not only models linguistic context, but also provides guidance on visual context perception.
We construct a series of CPPD models and also plug the proposed modules into existing STR decoders. Experiments on both English and Chinese benchmarks demonstrate that the CPPD models achieve highly competitive accuracy while running approximately 8x faster than their AR-based counterparts.
arXiv Detail & Related papers (2023-07-23T09:04:13Z)
- Improving Code Search with Hard Negative Sampling Based on Fine-tuning [15.341959871682981]
We introduce a cross-encoder architecture for code search that jointly encodes the concatenation of query and code.
We also introduce a Retriever-Ranker (RR) framework that cascades the dual-encoder and cross-encoder to promote the efficiency of evaluation and online serving.
arXiv Detail & Related papers (2023-05-08T07:04:28Z)
- Noise-Robust Dense Retrieval via Contrastive Alignment Post Training [89.29256833403167]
Contrastive Alignment POst Training (CAPOT) is a highly efficient finetuning method that improves model robustness without requiring index regeneration.
CAPOT enables robust retrieval by freezing the document encoder while the query encoder learns to align noisy queries with their unaltered root.
We evaluate CAPOT on noisy variants of MSMARCO, Natural Questions, and Trivia QA passage retrieval, finding that CAPOT has a similar impact to data augmentation with none of its overhead.
arXiv Detail & Related papers (2023-04-06T22:16:53Z)
- Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves Non-Autoregressive Translators [35.939982651768666]
The probability framework of NAR models requires a conditional independence assumption on the target sequences.
We propose a simple and model-agnostic multi-task learning framework to provide more informative learning signals.
Our approach consistently improves the accuracy of multiple NAR baselines without adding any decoding overhead.
arXiv Detail & Related papers (2022-11-11T09:10:14Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
- Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation [41.51966652141165]
We propose a model-based pruning method, which also enables parallel encoding during inference.
Our experiments show that Fast-R2D2 significantly improves performance in grammar induction and achieves competitive results in downstream classification tasks.
arXiv Detail & Related papers (2022-03-01T07:54:44Z)
- Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding [0.0]
We present Recursive Decoding, a novel procedure for training and using seq2seq models.
Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time.
RD yields dramatic improvement on two previously neglected generalization tasks in gSCAN.
arXiv Detail & Related papers (2022-01-27T19:13:42Z)
- Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring [83.32560748324667]
This article describes an efficient end-to-end speech translation (E2E-ST) framework based on non-autoregressive (NAR) models.
We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.
arXiv Detail & Related papers (2021-09-09T16:50:16Z)
- Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder [64.55176104620848]
We propose a novel NAR E2E-ST framework, Orthros, in which both NAR and autoregressive (AR) decoders are jointly trained on the shared speech encoder.
The latter is used to select a better translation among candidates of various lengths generated by the former, which dramatically improves the effectiveness of a large length beam with negligible overhead (a sketch of this rescoring step follows this list).
Experiments on four benchmarks show the effectiveness of the proposed method in improving inference speed while maintaining competitive translation quality.
arXiv Detail & Related papers (2020-10-25T06:35:30Z)
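As a rough illustration of the dual-decoder idea in the Orthros entries above, the auxiliary AR decoder can act purely as a scorer: the NAR decoder proposes one candidate per hypothesized target length, and the AR decoder's teacher-forced log-likelihood selects the winner. This is a hypothetical sketch with illustrative function names, not the authors' code.

```python
def rescore_length_candidates(nar_decode, ar_score, encoder_states, length_candidates):
    """Sketch: the NAR decoder proposes, a shallow AR decoder rescores.

    Assumed helpers (names are illustrative):
      nar_decode(encoder_states, length) -> token sequence of that length
      ar_score(encoder_states, tokens)   -> autoregressive log-likelihood of tokens
    """
    best_tokens, best_score = None, float("-inf")
    for length in length_candidates:
        tokens = nar_decode(encoder_states, length)   # one parallel NAR pass per length
        score = ar_score(encoder_states, tokens)      # single teacher-forced scoring pass
        if score > best_score:
            best_tokens, best_score = tokens, score
    return best_tokens, best_score
```

Because scoring a fixed candidate is a single teacher-forced pass, widening the length beam adds little latency, which is consistent with the "negligible overhead" claim in the summary above.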