Hierarchical Learning for Generation with Long Source Sequences
- URL: http://arxiv.org/abs/2104.07545v1
- Date: Thu, 15 Apr 2021 15:57:32 GMT
- Title: Hierarchical Learning for Generation with Long Source Sequences
- Authors: Tobias Rohde, Xiaoxia Wu, Yinhan Liu
- Abstract summary: We design and study a new Hierarchical Attention Transformer-based architecture (HAT) that outperforms standard Transformers on several sequence to sequence tasks.
Our model achieves state-of-the-art results on four summarization tasks, including ArXiv, CNN/DM, SAMSum, and AMI, and we push the PubMed R1 & R2 state of the art further.
- Score: 4.851392124435261
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: One of the challenges for current sequence to sequence (seq2seq) models is
processing long sequences, such as those in summarization and document level
machine translation tasks. These tasks require the model to reason at the token
level as well as the sentence and paragraph level. We design and study a new
Hierarchical Attention Transformer-based architecture (HAT) that outperforms
standard Transformers on several sequence to sequence tasks. In particular, our
model achieves state-of-the-art results on four summarization tasks, including
ArXiv, CNN/DM, SAMSum, and AMI, and we push the PubMed R1 & R2 state of the art further. Our
model significantly outperforms our document-level machine translation baseline
by 28 BLEU on the WMT19 EN-DE document translation task. We also investigate
what the hierarchical layers learn by visualizing the hierarchical
encoder-decoder attention. Finally, we study hierarchical learning on
encoder-only pre-training and analyze its performance on classification
downstream tasks.
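The abstract describes attention at both the token level and the sentence/paragraph level. As a rough illustration only (a minimal PyTorch sketch, not the authors' code; the layer name, the use of sentence-boundary positions as sentence representatives, and all hyperparameters are assumptions), a hierarchical encoder layer might first run standard token-level self-attention and then let the states at sentence boundaries attend to one another:
```python
# Minimal sketch of hierarchical encoding (assumed names and structure,
# not the paper's implementation). Token-level self-attention is followed
# by attention over sentence-boundary states, adding sentence-level context.
import torch
import torch.nn as nn

class HierarchicalEncoderLayer(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.token_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.sent_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, sent_index):
        # x: (batch, seq_len, d_model)
        # sent_index: (batch, n_sents) positions of sentence-boundary tokens
        x = self.token_layer(x)                       # token-level pass
        idx = sent_index.unsqueeze(-1).expand(-1, -1, x.size(-1))
        sents = x.gather(1, idx)                      # (batch, n_sents, d_model)
        ctx, _ = self.sent_attn(sents, sents, sents)  # sentence-level attention
        sents = self.norm(sents + ctx)
        return x.scatter(1, idx, sents)               # write contexts back

# Example: two documents of 128 tokens with sentences starting at 0, 40, 90.
x = torch.randn(2, 128, 512)
idx = torch.tensor([[0, 40, 90], [0, 40, 90]])
out = HierarchicalEncoderLayer()(x, idx)              # (2, 128, 512)
```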
Related papers
- Implant Global and Local Hierarchy Information to Sequence based Code Representation Models [25.776540440893257]
We analyze how the complete hierarchical structure influences the tokens in code sequences and abstract this influence as a property of code tokens called hierarchical embedding.
We propose the Hierarchy Transformer (HiT), a simple but effective sequence model to incorporate the complete hierarchical embeddings of source code into a Transformer model.
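One simple way to picture a "hierarchical embedding" (an illustrative sketch under assumed details, not the HiT formulation itself) is a learned embedding of each token's position in the code hierarchy, e.g. its AST depth, added to the ordinary token embedding:
```python
# Hedged sketch: inject hierarchy information as a learned embedding of
# each token's AST depth, summed with the token embedding. HiT's actual
# hierarchical embedding is richer than this.
import torch
import torch.nn as nn

class HierarchyEmbedding(nn.Module):
    def __init__(self, vocab_size=10000, max_depth=64, d_model=256):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.depth = nn.Embedding(max_depth, d_model)

    def forward(self, token_ids, depths):
        # token_ids, depths: (batch, seq_len) integer tensors
        return self.tok(token_ids) + self.depth(depths)
```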
arXiv Detail & Related papers (2023-03-14T12:01:39Z)
- Hierarchical Decision Transformer [0.0]
This paper presents a hierarchical algorithm for learning a sequence model from demonstrations.
The high-level mechanism guides the low-level controller through the task by selecting sub-goals for the latter to reach.
We validate our method in multiple tasks of the OpenAI Gym, D4RL and RoboMimic benchmarks.
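A schematic of this two-level scheme (the high_level and low_level callables and the Gym-style environment interface are hypothetical; this is not the paper's algorithm in detail):
```python
# Illustrative sub-goal control loop: the high-level policy picks a sub-goal,
# the low-level controller acts toward it for a fixed horizon.
def run_episode(env, high_level, low_level, horizon=10):
    obs = env.reset()
    done = False
    while not done:
        subgoal = high_level(obs)                 # select the next sub-goal
        for _ in range(horizon):
            action = low_level(obs, subgoal)      # act toward the sub-goal
            obs, reward, done, _ = env.step(action)
            if done:
                break
    return obs
```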
arXiv Detail & Related papers (2022-09-21T15:48:40Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Efficient Long Sequence Encoding via Synchronization [29.075962393432857]
We propose a synchronization mechanism for hierarchical encoding.
Our approach first identifies anchor tokens across segments and groups them by their roles in the original input sequence.
Our approach is able to improve the global information exchange among segments while maintaining efficiency.
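Read literally, the mechanism lets recurring "anchor" tokens pool information across segments; a toy sketch (details are assumptions, e.g. treating repeated token ids as anchors and mean-pooling their states; the paper's grouping by roles is more refined):
```python
# Toy synchronization: hidden states of a token id that occurs in several
# places share one averaged state, passing information between segments.
import torch

def synchronize(hidden, token_ids):
    # hidden: (n_segments, seg_len, d); token_ids: (n_segments, seg_len)
    out = hidden.clone()
    for tok in token_ids.unique():
        mask = token_ids == tok
        if mask.sum() > 1:                     # token recurs: treat as anchor
            out[mask] = hidden[mask].mean(0)   # share the pooled state
    return out
```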
arXiv Detail & Related papers (2022-03-15T04:37:02Z)
- Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework [83.82026345508334]
We propose OFA, a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.).
OFA achieves new state-of-the-art results on a series of multimodal tasks, including image captioning (COCO test CIDEr: 149.6), text-to-image generation (COCO test FID: 10.5), VQA (test-std acc.: 80.02), SNLI-VE (test acc.: 90.
arXiv Detail & Related papers (2022-02-07T10:38:21Z)
- Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing [110.4684789199555]
We introduce scenario-based semantic parsing: a variant of the original task which first requires disambiguating an utterance's "scenario".
This formulation enables us to isolate coarse-grained and fine-grained aspects of the task, each of which we solve with off-the-shelf neural modules.
Our model is modular, differentiable, interpretable, and allows us to garner extra supervision from scenarios.
arXiv Detail & Related papers (2022-02-02T08:00:21Z)
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
- Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models [56.268862325167575]
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task.
We evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task.
arXiv Detail & Related papers (2020-04-04T11:07:54Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
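A much-simplified picture of attending over tree structure (illustrative only; the paper's constant-time accumulation is more involved): represent each internal parse-tree node by pooling its descendant leaves, then let attention see phrases alongside tokens:
```python
# Simplified tree accumulation: each internal node is the mean of its
# descendant leaf embeddings; attention keys cover tokens and phrases.
import torch

def accumulate(leaves, spans):
    # leaves: (n_leaves, d); spans: (start, end) leaf ranges of internal nodes
    nodes = torch.stack([leaves[s:e].mean(0) for s, e in spans])
    return torch.cat([leaves, nodes], dim=0)

# Example: 5 leaf tokens, nodes covering leaves [0,2), [2,5), [0,5).
keys = accumulate(torch.randn(5, 8), [(0, 2), (2, 5), (0, 5)])  # (8, 8)
```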
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.