Syntactically Look-Ahead Attention Network for Sentence Compression
- URL: http://arxiv.org/abs/2002.01145v2
- Date: Sun, 17 May 2020 13:33:00 GMT
- Title: Syntactically Look-Ahead Attention Network for Sentence Compression
- Authors: Hidetaka Kamigaito, Manabu Okumura
- Abstract summary: Sentence compression is the task of compressing a long sentence into a short one by deleting redundant words.
In sequence-to-sequence (Seq2Seq) based models, the decoder unidirectionally decides to retain or delete words.
We propose a novel Seq2Seq model, syntactically look-ahead attention network (SLAHAN), that can generate informative summaries.
- Score: 36.6256383447417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentence compression is the task of compressing a long sentence into a short
one by deleting redundant words. In sequence-to-sequence (Seq2Seq) based
models, the decoder unidirectionally decides to retain or delete words. Thus,
it usually cannot explicitly capture the relationships between decoded words
and unseen words that will be decoded at future time steps. Therefore, to
avoid generating ungrammatical sentences, the decoder sometimes drops important
words in compressing sentences. To solve this problem, we propose a novel
Seq2Seq model, syntactically look-ahead attention network (SLAHAN), that can
generate informative summaries by explicitly tracking both dependency parent
and child words during decoding and capturing important words that will be
decoded in the future. The results of the automatic evaluation on the Google
sentence compression dataset showed that SLAHAN achieved the best
kept-token-based-F1, ROUGE-1, ROUGE-2 and ROUGE-L scores of 85.5, 79.3, 71.3
and 79.1, respectively. SLAHAN also improved the summarization performance on
longer sentences. Furthermore, in the human evaluation, SLAHAN improved
informativeness without losing readability.
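The core idea of tracking dependency parents and children during decoding can be sketched in a few lines. The following is a toy illustration, not the authors' exact SLAHAN formulation: at each step, the decoder's view of a token is augmented with its dependency parent's representation and an attention-weighted summary of its children, so words that will only be decoded later can still influence the keep/delete decision. All function and variable names here are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def look_ahead_context(hidden, heads, t):
    """Toy look-ahead context for token t (illustrative, not SLAHAN itself).

    hidden: (n, d) array of encoder states for the n input tokens
    heads:  heads[i] = index of token i's dependency parent (-1 for the root)
    Returns the concatenation of the token state, its parent state, and an
    attention-weighted summary of its children.
    """
    n, d = hidden.shape
    parent = hidden[heads[t]] if heads[t] >= 0 else np.zeros(d)
    children = [i for i in range(n) if heads[i] == t]
    if children:
        # attend over the children with dot-product scores
        scores = softmax(hidden[children] @ hidden[t])
        child_ctx = scores @ hidden[children]
    else:
        child_ctx = np.zeros(d)
    return np.concatenate([hidden[t], parent, child_ctx])
```

Because the child summary pulls in representations of tokens that appear later in the sentence, a keep/delete classifier fed this context can "look ahead" along the dependency tree rather than relying only on already-decoded words.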
Related papers
- VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers [119.89284877061779]
This paper introduces VALL-E 2, the latest advancement in neural language models that marks a milestone in zero-shot text-to-speech (TTS).
VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases.
The advantages of this work could contribute to valuable endeavors, such as generating speech for individuals with aphasia or people with amyotrophic lateral sclerosis.
arXiv Detail & Related papers (2024-06-08T06:31:03Z)
- Crossword: A Semantic Approach to Data Compression via Masking [38.107509264270924]
This study places careful emphasis on English text and exploits its semantic aspect to enhance the compression efficiency further.
The proposed masking-based strategy resembles the above game.
In a nutshell, the encoder evaluates the semantic importance of each word according to the semantic loss and then masks the minor ones, while the decoder aims to recover the masked words from the semantic context by means of the Transformer.
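The masking step of that strategy can be illustrated with a toy sketch. Here the per-word importance scores are supplied directly rather than computed from a semantic loss, and the Transformer-based recovery decoder is omitted; the function name and `keep_ratio` parameter are illustrative assumptions.

```python
def mask_minor_words(words, importance, keep_ratio=0.7):
    """Toy sketch of the masking-based encoder described above:
    keep the most important words and mask the rest, leaving a
    decoder to recover the masked words from semantic context."""
    k = max(1, int(len(words) * keep_ratio))
    # rank word positions by descending importance and keep the top k
    ranked = sorted(range(len(words)), key=lambda i: -importance[i])
    keep = set(ranked[:k])
    return [w if i in keep else "[MASK]" for i, w in enumerate(words)]
```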
arXiv Detail & Related papers (2023-04-03T16:04:06Z)
- Inflected Forms Are Redundant in Question Generation Models [27.49894653349779]
We propose an approach to enhance the performance of Question Generation using an encoder-decoder framework.
Firstly, we identify the inflected forms of words in the encoder input and replace them with their root words.
Secondly, we propose to adapt QG as a combination of the following actions in the encoder-decoder framework: generating a question word, copying a word from the source sequence, or generating a word-transformation type.
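The three action types can be illustrated with a minimal decoding sketch. This is a toy assembly step, not the paper's model: the action names, the index-based arguments, and the transformation callable are all illustrative assumptions.

```python
def assemble_question(source, actions):
    """Toy sketch of building a question from the three action types
    described above: GEN emits a question word, COPY copies a source
    token by index, TRANS applies a (hypothetical) word transformation."""
    out = []
    for act, arg in actions:
        if act == "GEN":        # generate a question word directly
            out.append(arg)
        elif act == "COPY":     # copy a token from the source sequence
            out.append(source[arg])
        elif act == "TRANS":    # transform a source token (e.g. de-inflect)
            idx, transform = arg
            out.append(transform(source[idx]))
    return " ".join(out)
```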
arXiv Detail & Related papers (2023-01-01T13:08:11Z)
- Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data [145.95460945321253]
We introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes.
The proposed Speech2C can relatively reduce the word error rate (WER) by 19.2% over the method without decoder pre-training.
arXiv Detail & Related papers (2022-03-31T15:33:56Z)
- Using BERT Encoding and Sentence-Level Language Model for Sentence Ordering [0.9134244356393667]
We propose an algorithm for sentence ordering in a corpus of short stories.
Our proposed method uses a language model based on Universal Transformers (UT) that captures sentences' dependencies by employing an attention mechanism.
The proposed model includes three components: Sentence, Language Model, and Sentence Arrangement with Brute Force Search.
arXiv Detail & Related papers (2021-08-24T23:03:36Z)
- Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing [55.97957664897004]
An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers that map utterances to semantic frames proceeds in three steps.
These models are typically bottlenecked by length prediction.
In our work, we propose non-autoregressive parsers which shift the decoding task from text generation to span prediction.
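Span prediction can be illustrated with a toy frame-filling step: instead of generating slot text token by token, the model predicts (start, end) indices into the utterance. The function name and slot format below are illustrative, not the paper's API.

```python
def fill_frame(utterance, slot_spans):
    """Toy sketch of span prediction for semantic parsing: each slot
    value is a (start, end) token span into the utterance rather than
    generated text (names and format are illustrative)."""
    tokens = utterance.split()
    # inclusive (start, end) indices, so slice to end + 1
    return {slot: " ".join(tokens[s:e + 1]) for slot, (s, e) in slot_spans.items()}
```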
arXiv Detail & Related papers (2021-04-15T07:02:35Z)
- An Iterative Contextualization Algorithm with Second-Order Attention [0.40611352512781856]
We show how to combine the representations of words that make up a sentence into a cohesive whole.
Our algorithm starts with a presumably erroneous value of the context, and adjusts this value with respect to the tokens at hand.
Our models report strong results in several well-known text classification tasks.
arXiv Detail & Related papers (2021-03-03T05:34:50Z)
- Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT [72.93855288283059]
We propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
The model consists of an encoder, a decoder, and a position-dependent summarizer (PDS).
arXiv Detail & Related papers (2021-02-15T15:18:59Z)
- R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
- Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances [16.639133822656458]
We decode audio from the British National Corpus with an attentional encoder-decoder model trained solely on the LibriSpeech corpus.
We observe that there are many 5-second recordings that produce more than 500 characters of decoding output.
A frame-synchronous hybrid (DNN-HMM) model trained on the same data does not produce these unusually long transcripts.
arXiv Detail & Related papers (2020-02-12T18:53:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information on this page and is not responsible for any consequences of its use.