Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic
Parsing
- URL: http://arxiv.org/abs/2104.07275v2
- Date: Fri, 16 Apr 2021 18:33:06 GMT
- Title: Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic
Parsing
- Authors: Akshat Shrivastava, Pierce Chuang, Arun Babu, Shrey Desai, Abhinav
Arora, Alexander Zotov, Ahmed Aly
- Abstract summary: An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers to map utterances to semantic frames proceeds in three steps.
These models are typically bottlenecked by length prediction.
In our work, we propose span pointer networks, non-autoregressive parsers which shift the decoding task from text generation to span prediction.
- Score: 55.97957664897004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An effective recipe for building seq2seq, non-autoregressive, task-oriented
parsers to map utterances to semantic frames proceeds in three steps: encoding
an utterance $x$, predicting a frame's length $|y|$, and decoding a $|y|$-sized
frame with utterance and ontology tokens. Though empirically strong, these
models are typically bottlenecked by length prediction, as even small
inaccuracies change the syntactic and semantic characteristics of resulting
frames. In our work, we propose span pointer networks, non-autoregressive
parsers which shift the decoding task from text generation to span prediction;
that is, when imputing utterance spans into frame slots, our model produces
endpoints (e.g., [i, j]) as opposed to text (e.g., "6pm"). This natural
quantization of the output space reduces the variability of gold frames,
therefore improving length prediction and, ultimately, exact match.
Furthermore, length prediction is now responsible for frame syntax and the
decoder is responsible for frame semantics, resulting in a coarse-to-fine
model. We evaluate our approach on several task-oriented semantic parsing
datasets. Notably, we bridge the quality gap between non-autoregressive and
autoregressive parsers, achieving 87 EM on TOPv2 (Chen et al. 2020).
Furthermore, due to our more consistent gold frames, we show strong
improvements in model generalization in both cross-domain and cross-lingual
transfer in low-resource settings. Finally, due to our diminished output
vocabulary, we observe a 70% reduction in latency and an 83% reduction in memory at
beam size 5 compared to prior non-autoregressive parsers.
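
To make the span-prediction formulation concrete, here is a minimal sketch of resolving a predicted frame back into text. This is an illustration, not the authors' implementation: the whitespace tokenizer, the TOP-style frame encoding, and the `resolve_spans` helper are all assumptions.

```python
# Minimal sketch of span-based frame decoding (illustrative only).
# Assumptions not taken from the paper's code: whitespace tokenization,
# TOP-style ontology tokens, and inclusive endpoints [i, j].

def resolve_spans(utterance_tokens, frame_tokens):
    """Expand (i, j) endpoint pairs into the utterance tokens they cover."""
    resolved = []
    for tok in frame_tokens:
        if isinstance(tok, tuple):        # a predicted span [i, j]
            i, j = tok
            resolved.extend(utterance_tokens[i : j + 1])
        else:                             # an ontology token
            resolved.append(tok)
    return " ".join(resolved)

utterance = "set an alarm for 6 pm".split()
# A non-autoregressive decoder emits ontology tokens plus one endpoint
# pair per slot, instead of generating the slot text token by token.
frame = ["[IN:CREATE_ALARM", "[SL:DATE_TIME", (4, 5), "]", "]"]

print(resolve_spans(utterance, frame))
# -> [IN:CREATE_ALARM [SL:DATE_TIME 6 pm ] ]
```

Under this formulation each slot contributes exactly one endpoint pair to the target, so the frame length $|y|$ is determined by frame syntax alone (intent and slot tokens plus brackets) rather than by how many tokens the slot text spans; this is the output-space quantization the abstract credits for easier length prediction.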
Related papers
- Object Recognition as Next Token Prediction [99.40793702627396]
We present an approach to pose object recognition as next token prediction.
The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv Detail & Related papers (2023-12-04T18:58:40Z)
- AiluRus: A Scalable ViT Framework for Dense Prediction [95.1313839257891]
Vision transformers (ViTs) have emerged as a prevalent architecture for vision tasks owing to their impressive performance.
We propose to apply adaptive resolution for different regions in the image according to their importance.
We evaluate our proposed method on three different datasets and observe promising performance.
arXiv Detail & Related papers (2023-11-02T12:48:43Z)
- TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at
Scale [59.01246141215051]
We analyze the factor that leads to degradation from the perspective of language supervision.
We propose a tunable-free pre-training strategy to retain the generalization ability of the text encoder.
We produce a series of models, dubbed TVTSv2, with up to one billion parameters.
arXiv Detail & Related papers (2023-05-23T15:44:56Z)
- Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing [110.4684789199555]
We introduce scenario-based semantic parsing: a variant of the original task which first requires disambiguating an utterance's "scenario".
This formulation enables us to isolate coarse-grained and fine-grained aspects of the task, each of which we solve with off-the-shelf neural modules.
Our model is modular, differentiable, interpretable, and allows us to garner extra supervision from scenarios.
arXiv Detail & Related papers (2022-02-02T08:00:21Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented
Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
- AMR Parsing with Action-Pointer Transformer [18.382148821100152]
We propose a transition-based system that combines hard-attention over sentences with a target-side action pointer mechanism.
We show that our action-pointer approach leads to increased expressiveness and attains large gains against the best transition-based AMR parser.
arXiv Detail & Related papers (2021-04-29T22:01:41Z)
- Non-Autoregressive Semantic Parsing for Compositional Task-Oriented
Dialog [22.442123799917074]
We propose a non-autoregressive approach to predict semantic parse trees with an efficient seq2seq model architecture.
By combining non-autoregressive prediction with convolutional neural networks, we achieve significant latency gains and parameter size reduction.
arXiv Detail & Related papers (2021-04-11T05:44:35Z)
- Frame-To-Frame Consistent Semantic Segmentation [2.538209532048867]
We train a convolutional neural network (CNN) which propagates features through consecutive frames in a video.
Our results indicate that the added temporal information produces a frame-to-frame consistent and more accurate image understanding.
arXiv Detail & Related papers (2020-08-03T15:28:40Z)
- Parsing as Pretraining [13.03764728768944]
We first cast constituent and dependency parsing as sequence tagging.
We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree.
arXiv Detail & Related papers (2020-02-05T08:43:02Z)