A Robust Semantic Frame Parsing Pipeline on a New Complex Twitter Dataset
- URL: http://arxiv.org/abs/2212.08987v1
- Date: Sun, 18 Dec 2022 01:59:49 GMT
- Title: A Robust Semantic Frame Parsing Pipeline on a New Complex Twitter Dataset
- Authors: Yu Wang and Hongxia Jin
- Abstract summary: We introduce a robust semantic frame parsing pipeline that can handle both OOD patterns and OOV tokens.
We also build an E2E application to demonstrate the feasibility of our algorithm and show why it is useful in real applications.
- Score: 53.73316523766183
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most recent semantic frame parsing systems for spoken language understanding
(SLU) are designed based on recurrent neural networks. These systems display
decent performance on benchmark SLU datasets such as ATIS or SNIPS, which
contain short utterances with relatively simple patterns. However, the current
semantic frame parsing models lack a mechanism to handle out-of-distribution
(OOD) patterns and out-of-vocabulary (OOV) tokens. In this paper,
we introduce a robust semantic frame parsing pipeline that can handle both
OOD patterns and OOV tokens, in conjunction with a new complex
Twitter dataset that contains long tweets with more OOD patterns and
OOV tokens. The new pipeline demonstrates much better results in
comparison to state-of-the-art baseline SLU models on both the SNIPS dataset
and the new Twitter dataset (Our new Twitter dataset can be downloaded from
https://1drv.ms/u/s!AroHb-W6_OAlavK4begsDsMALfE?e=c8f2XX ). Finally, we also
build an E2E application to demonstrate the feasibility of our algorithm and show why
it is useful in real applications.
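The abstract does not spell out how the pipeline handles OOV tokens, but a common way to make a frame parser tolerant of them is to back off from a word-level embedding to a character-level one. The following minimal sketch illustrates that general pattern; it is not the authors' code, and all names are hypothetical:

```python
import torch
import torch.nn as nn

class OOVTolerantEncoder(nn.Module):
    """Word encoder that backs off to a character BiLSTM for OOV tokens."""

    def __init__(self, vocab_size, char_vocab_size, dim=128, unk_id=0):
        super().__init__()
        self.unk_id = unk_id
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.char_emb = nn.Embedding(char_vocab_size, dim // 4)
        self.char_rnn = nn.LSTM(dim // 4, dim // 2,
                                bidirectional=True, batch_first=True)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq); char_ids: (batch, seq, max_chars)
        w = self.word_emb(word_ids)
        b, s, c = char_ids.shape
        ch = self.char_emb(char_ids.view(b * s, c))
        _, (h, _) = self.char_rnn(ch)                  # h: (2, b*s, dim/2)
        ch_vec = torch.cat([h[0], h[1]], dim=-1).view(b, s, -1)
        # Wherever the word id is <unk>, substitute the character encoding.
        is_oov = (word_ids == self.unk_id).unsqueeze(-1)
        return torch.where(is_oov, ch_vec, w)
```

The resulting token vectors can feed any downstream intent/slot tagger, so an unseen token still gets a usable representation instead of a single shared unknown embedding.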
Related papers
- SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models [64.40250409933752]
We build upon our previous publication by implementing a simple and efficient non-autoregressive (NAR) TTS framework, termed SimpleSpeech 2.
SimpleSpeech 2 effectively combines the strengths of both autoregressive (AR) and non-autoregressive (NAR) methods.
We show a significant improvement in generation performance and generation speed compared to our previous work and other state-of-the-art (SOTA) large-scale TTS models.
arXiv Detail & Related papers (2024-08-25T17:07:39Z)
- (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork [60.889175951038496]
Large-scale neural networks have demonstrated remarkable performance in different domains like vision and language processing.
One of the key questions in structural pruning is how to estimate channel significance.
We propose a novel algorithmic framework, namely PASS.
It is a tailored hyper-network that takes both visual prompts and network weight statistics as input and outputs layer-wise channel sparsity in a recurrent manner.
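A minimal sketch of the recurrent-hypernetwork idea described above (an assumed layout, not the official PASS code): an LSTM cell consumes a prompt embedding plus per-layer weight statistics and emits one sparsity ratio per layer.

```python
import torch
import torch.nn as nn

class SparsityHyperNet(nn.Module):
    """Emit one channel-sparsity ratio per layer, one recurrent step each."""

    def __init__(self, prompt_dim=64, stat_dim=8, hidden=128):
        super().__init__()
        self.hidden = hidden
        self.cell = nn.LSTMCell(prompt_dim + stat_dim, hidden)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, prompt, layer_stats):
        # prompt: (batch, prompt_dim); layer_stats: list of (batch, stat_dim)
        h = prompt.new_zeros(prompt.size(0), self.hidden)
        c = torch.zeros_like(h)
        ratios = []
        for stats in layer_stats:                 # one recurrent step per layer
            h, c = self.cell(torch.cat([prompt, stats], dim=-1), (h, c))
            ratios.append(self.head(h))           # sparsity ratio in (0, 1)
        return torch.cat(ratios, dim=-1)          # (batch, num_layers)
```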
arXiv Detail & Related papers (2024-07-24T16:47:45Z)
- RETVec: Resilient and Efficient Text Vectorizer [5.181952693002194]
RETVec combines a novel character encoding with an optional small embedding model to embed words into a 256-dimensional vector space.
The RETVec embedding model is pre-trained using pair-wise metric learning to be robust against typos and character-level adversarial attacks.
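This is not RETVec itself, but a toy illustration of the underlying idea: derive a fixed-size word vector from its characters, so a typo only perturbs the vector slightly instead of collapsing the word to a single unknown id.

```python
import numpy as np

def char_vectorize(word: str, dim: int = 256) -> np.ndarray:
    """Hash (position, character) pairs into a fixed-size, normalized vector."""
    vec = np.zeros(dim, dtype=np.float32)
    for i, ch in enumerate(word[:dim]):
        bucket = (ord(ch) * 31 + i) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# A transposition typo only moves a couple of buckets, so the vectors stay close:
a, b = char_vectorize("vectorizer"), char_vectorize("vectroizer")
print(float(a @ b))  # cosine similarity, high for near-identical strings
```

RETVec then adds a small embedding model pre-trained with pair-wise metric learning on top of its character encoding, pulling typo'd variants toward the clean word's embedding.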
arXiv Detail & Related papers (2023-02-18T02:06:52Z)
- TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval [30.429340065755436]
We devise a new model-agnostic formulation for fine-grained cross-modal alignment.
Inspired by optimal transport theory, we introduce TokenFlow, an instantiation of the proposed scheme.
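For intuition only, here is a toy max-matching variant of fine-grained token-level alignment; TokenFlow itself uses an optimal-transport-inspired weighting rather than this simple best-match pooling.

```python
import torch
import torch.nn.functional as F

def token_alignment_score(text_tokens, visual_tokens):
    # text_tokens: (n_text, d); visual_tokens: (n_vis, d); both L2-normalized
    sim = text_tokens @ visual_tokens.T        # (n_text, n_vis) cosine sims
    return sim.max(dim=1).values.mean()        # best visual match per text token

score = token_alignment_score(
    F.normalize(torch.randn(7, 64), dim=-1),    # e.g., 7 text tokens
    F.normalize(torch.randn(49, 64), dim=-1),   # e.g., 7x7 image patch tokens
)
```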
arXiv Detail & Related papers (2022-09-28T04:11:05Z)
- Hierarchical Neural Network Approaches for Long Document Classification [3.6700088931938835]
We employ pre-trained Universal Sentence Encoder (USE) and Bidirectional Encoder Representations from Transformers (BERT) models in a hierarchical setup to capture better representations efficiently.
Our proposed models are conceptually simple: we divide the input data into chunks and then pass these through the base models, BERT and USE.
We show that USE + CNN/LSTM performs better than its stand-alone baseline, whereas BERT + CNN/LSTM performs on par with its stand-alone counterpart.
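A minimal sketch of that hierarchical setup, under assumed shapes and with a pluggable chunk encoder standing in for USE or BERT (not the authors' exact code): split the document into chunks, encode each chunk, then run an LSTM head over the chunk sequence.

```python
import torch
import torch.nn as nn

class HierarchicalClassifier(nn.Module):
    """Chunk a long document, encode chunks, classify the chunk sequence."""

    def __init__(self, chunk_encoder, enc_dim=512, hidden=256, num_classes=5):
        super().__init__()
        self.encode_chunk = chunk_encoder        # any text -> (enc_dim,) encoder
        self.rnn = nn.LSTM(enc_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, documents, chunk_size=200):
        batch = []
        for doc in documents:
            words = doc.split()
            chunks = [" ".join(words[i:i + chunk_size])
                      for i in range(0, len(words), chunk_size)]
            batch.append(torch.stack([self.encode_chunk(c) for c in chunks]))
        x = nn.utils.rnn.pad_sequence(batch, batch_first=True)
        _, (h, _) = self.rnn(x)                  # last hidden state per document
        return self.fc(h[-1])                    # class logits per document
```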
arXiv Detail & Related papers (2022-01-18T07:17:40Z)
- ESPnet2-TTS: Extending the Edge of TTS Research [62.92178873052468]
ESPnet2-TTS is an end-to-end text-to-speech (E2E-TTS) toolkit.
New features include: on-the-fly flexible pre-processing, joint training with neural vocoders, and state-of-the-art TTS models with extensions like full-band E2E text-to-waveform modeling.
arXiv Detail & Related papers (2021-10-15T03:27:45Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Generating Synthetic Data for Task-Oriented Semantic Parsing with Hierarchical Representations [0.8203855808943658]
In this work, we explore the possibility of generating synthetic data for neural semantic parsing.
Specifically, we first extract masked templates from the existing labeled utterances, and then fine-tune BART to generate synthetic utterances conditioned on these templates.
We show the potential of our approach when evaluating on the navigation domain of the Facebook TOP dataset.
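A sketch of that two-step recipe using Hugging Face Transformers (the checkpoint and helper below are illustrative, not the paper's release): mask slot values in labeled utterances to obtain templates, then have BART fill the masks to produce synthetic utterances. In practice a BART model fine-tuned on such templates would be used; the base checkpoint here only demonstrates the mechanics.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def make_template(utterance: str, slot_values: list[str]) -> str:
    # Replace each labeled slot value with BART's mask token.
    for value in slot_values:
        utterance = utterance.replace(value, tok.mask_token)
    return utterance

template = make_template("directions to 123 main street", ["123 main street"])
ids = tok(template, return_tensors="pt").input_ids
out = model.generate(ids, max_length=32, num_return_sequences=3,
                     do_sample=True, top_p=0.9)
print(tok.batch_decode(out, skip_special_tokens=True))
```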
arXiv Detail & Related papers (2020-11-03T22:55:40Z)
- Learning Reasoning Strategies in End-to-End Differentiable Proving [50.9791149533921]
Conditional Theorem Provers learn an optimal rule-selection strategy via gradient-based optimisation.
We show that Conditional Theorem Provers are scalable and yield state-of-the-art results on the CLUTRR dataset.
arXiv Detail & Related papers (2020-07-13T16:22:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.