Efficient Long Sequence Encoding via Synchronization
- URL: http://arxiv.org/abs/2203.07644v1
- Date: Tue, 15 Mar 2022 04:37:02 GMT
- Title: Efficient Long Sequence Encoding via Synchronization
- Authors: Xiangyang Mou, Mo Yu, Bingsheng Yao, Lifu Huang
- Abstract summary: We propose a synchronization mechanism for hierarchical encoding.
Our approach first identifies anchor tokens across segments and groups them by their roles in the original input sequence.
Our approach is able to improve the global information exchange among segments while maintaining efficiency.
- Score: 29.075962393432857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained Transformer models have achieved successes in a wide range of NLP
tasks, but are inefficient when dealing with long input sequences. Existing
studies try to overcome this challenge via segmenting the long sequence
followed by hierarchical encoding or post-hoc aggregation. We propose a
synchronization mechanism for hierarchical encoding. Our approach first
identifies anchor tokens across segments and groups them by their roles in the
original input sequence. Then, inside each Transformer layer, anchor embeddings are
synchronized within their group via a self-attention module. Our approach is a
general framework with sufficient flexibility -- when adapted to a new task, it
can easily be enhanced with task-specific anchor definitions. Experiments
on two representative tasks with different types of long input texts,
NarrativeQA summary setting and wild multi-hop reasoning from HotpotQA,
demonstrate that our approach is able to improve the global information
exchange among segments while maintaining efficiency.
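Below is a minimal sketch of the synchronization step described in the abstract, assuming a
PyTorch setting; the AnchorSynchronizer module, the anchor-group format, and all dimensions
are illustrative placeholders rather than the authors' released implementation.

```python
# Minimal sketch (assumed PyTorch setup): synchronize anchor embeddings that share a role
# across segments via a self-attention module. Shapes and the grouping format are placeholders.
import torch
import torch.nn as nn

class AnchorSynchronizer(nn.Module):
    """Runs self-attention within each group of anchor tokens gathered across segments."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, hidden, anchor_groups):
        # hidden: (num_segments, seg_len, d_model) -- per-segment token states
        # anchor_groups: list of groups; each group is a list of (segment_idx, token_idx)
        hidden = hidden.clone()
        for group in anchor_groups:
            seg_idx = torch.tensor([s for s, _ in group])
            tok_idx = torch.tensor([t for _, t in group])
            anchors = hidden[seg_idx, tok_idx].unsqueeze(0)   # (1, group_size, d_model)
            synced, _ = self.attn(anchors, anchors, anchors)  # exchange info within the group
            hidden[seg_idx, tok_idx] = synced.squeeze(0)      # write synchronized states back
        return hidden

# Toy usage: 3 segments of 16 tokens; one group links token 0 of every segment.
states = torch.randn(3, 16, 256)
groups = [[(0, 0), (1, 0), (2, 0)]]
out = AnchorSynchronizer(d_model=256)(states, groups)
print(out.shape)  # torch.Size([3, 16, 256])
```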
Related papers
- LAIT: Efficient Multi-Segment Encoding in Transformers with
Layer-Adjustable Interaction [31.895986544484206]
We introduce Layer-Adjustable Interactions in Transformers (LAIT).
Within LAIT, segmented inputs are first encoded independently, and then jointly.
We find LAIT able to reduce 30-50% of the attention FLOPs on many tasks, while preserving high accuracy.
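A rough sketch of the independent-then-joint encoding pattern described in this entry, built
from stock PyTorch encoder layers; the layer split, segment sizes, and shapes are assumptions
for illustration, not LAIT's actual architecture.

```python
# Sketch: encode segments independently for the first k layers, then jointly (assumed setup).
import torch
import torch.nn as nn

d_model, n_heads, n_layers, k_independent = 256, 8, 6, 3
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(n_layers)
)

segments = torch.randn(4, 64, d_model)       # 4 segments of 64 tokens each
x = segments
for layer in layers[:k_independent]:
    x = layer(x)                             # each segment attends only within itself
x = x.reshape(1, -1, d_model)                # concatenate segments along the sequence axis
for layer in layers[k_independent:]:
    x = layer(x)                             # full cross-segment attention in later layers
print(x.shape)                               # torch.Size([1, 256, 256])
```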
arXiv Detail & Related papers (2023-05-31T06:09:59Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate
Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strength of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Shift-Reduce Task-Oriented Semantic Parsing with Stack-Transformers [6.744385328015561]
Task-oriented dialogue systems, such as Apple Siri and Amazon Alexa, require a semantic parsing module in order to process user utterances and understand the action to be performed.
This semantic parsing component was initially implemented by rule-based or statistical slot-filling approaches for processing simple queries.
In this article, we advance the research on neural shift-reduce semantic parsing for task-oriented dialogue.
arXiv Detail & Related papers (2022-10-21T14:19:47Z)
- Pyramid-BERT: Reducing Complexity via Successive Core-set based Token
Selection [23.39962989492527]
Transformer-based language models such as BERT have achieved the state-of-the-art on various NLP tasks, but are computationally prohibitive.
We present Pyramid-BERT, where we replace previously used heuristics with a core-set based token selection method justified by theoretical results.
The core-set based token selection technique avoids expensive pre-training, enables space-efficient fine-tuning, and thus makes the model suitable for longer sequence lengths.
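As a toy illustration of core-set style token selection between encoder layers, the sketch
below uses greedy farthest-point (k-center) selection over token embeddings; it follows the
spirit of the entry above under assumed shapes and is not Pyramid-BERT's exact procedure.

```python
# Sketch: shrink the token sequence between layers by keeping a core-set of embeddings.
# Greedy farthest-point selection is a generic stand-in for the paper's selection method.
import torch

def coreset_select(tokens: torch.Tensor, k: int) -> torch.Tensor:
    # tokens: (seq_len, d_model); returns indices of k tokens that roughly cover the set
    chosen = [0]                                   # always keep the first token (e.g. [CLS])
    min_dists = torch.cdist(tokens, tokens[chosen]).min(dim=1).values
    for _ in range(k - 1):
        nxt = int(min_dists.argmax())              # farthest token from the current core-set
        chosen.append(nxt)
        new_dists = torch.cdist(tokens, tokens[nxt:nxt + 1]).squeeze(1)
        min_dists = torch.minimum(min_dists, new_dists)
    return torch.tensor(chosen)

hidden = torch.randn(128, 256)                     # 128 token states after some encoder layer
keep = coreset_select(hidden, k=64)                # retain a 64-token core-set
hidden = hidden[keep]                              # shorter sequence for the next layer
print(hidden.shape)                                # torch.Size([64, 256])
```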
arXiv Detail & Related papers (2022-03-27T19:52:01Z)
- Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing [110.4684789199555]
We introduce scenario-based semantic parsing: a variant of the original task which first requires disambiguating an utterance's "scenario".
This formulation enables us to isolate coarse-grained and fine-grained aspects of the task, each of which we solve with off-the-shelf neural modules.
Our model is modular, differentiable, interpretable, and allows us to garner extra supervision from scenarios.
arXiv Detail & Related papers (2022-02-02T08:00:21Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
- Cluster-Former: Clustering-based Sparse Transformer for Long-Range
Dependency Encoding [90.77031668988661]
Cluster-Former is a novel clustering-based sparse Transformer to perform attention across chunked sequences.
The proposed framework is pivoted on two unique types of Transformer layer: Sliding-Window Layer and Cluster-Former Layer.
Experiments show that Cluster-Former achieves state-of-the-art performance on several major QA benchmarks.
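A simplified sketch of clustering-based sparse attention in the spirit of the Cluster-Former
layer: token states are grouped by a small k-means step and self-attention runs only within
each cluster; the clustering choice and all shapes are assumptions, not the paper's design.

```python
# Sketch: cluster token states, then run self-attention only inside each cluster (assumed setup).
import torch
import torch.nn as nn

def kmeans_assign(x: torch.Tensor, k: int, iters: int = 5) -> torch.Tensor:
    # x: (n, d); a few rounds of Lloyd's algorithm, returns a cluster id per token
    centroids = x[torch.randperm(x.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centroids).argmin(dim=1)
        for c in range(k):
            members = x[assign == c]
            if members.numel() > 0:
                centroids[c] = members.mean(dim=0)
    return torch.cdist(x, centroids).argmin(dim=1)

d_model, k = 256, 4
attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
hidden = torch.randn(512, d_model)                 # token states of one long (chunked) sequence
assign = kmeans_assign(hidden, k)
out = hidden.clone()
for c in range(k):
    idx = (assign == c).nonzero(as_tuple=True)[0]
    if idx.numel() == 0:
        continue
    cluster = hidden[idx].unsqueeze(0)             # (1, cluster_size, d_model)
    synced, _ = attn(cluster, cluster, cluster)    # attention restricted to this cluster
    out[idx] = synced.squeeze(0)
print(out.shape)                                   # torch.Size([512, 256])
```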
arXiv Detail & Related papers (2020-09-13T22:09:30Z)
- SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization [39.85688193525843]
We study a sequence-to-sequence setting with input sequence lengths up to 100,000 tokens and output sequence lengths up to 768 tokens.
We propose SEAL, a Transformer-based model, featuring a new encoder-decoder attention that dynamically extracts/selects input snippets to sparsely attend to for each output segment.
The SEAL model achieves state-of-the-art results on existing long-form summarization tasks, and outperforms strong baseline models on a new dataset/task we introduce, Search2Wiki, with much longer input text.
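A bare-bones sketch of letting each output segment attend to only a few dynamically selected
input snippets, as this entry describes; the mean-pooled scoring rule and top-k selection are
illustrative assumptions rather than SEAL's actual extract-and-attend mechanism.

```python
# Sketch: pick the top-k most relevant input snippets for one decoder segment,
# then cross-attend only to them (scoring rule and shapes are assumptions).
import torch
import torch.nn as nn

d_model, n_snippets, snippet_len, top_k = 256, 32, 64, 4
cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

snippets = torch.randn(n_snippets, snippet_len, d_model)  # encoded input snippets
decoder_seg = torch.randn(1, 16, d_model)                 # states for the current output segment

query = decoder_seg.mean(dim=1)                           # (1, d_model) segment summary
snippet_keys = snippets.mean(dim=1)                       # (n_snippets, d_model) snippet summaries
scores = snippet_keys @ query.squeeze(0)                  # relevance score per snippet
selected = scores.topk(top_k).indices                     # indices of the most relevant snippets

memory = snippets[selected].reshape(1, -1, d_model)       # (1, top_k * snippet_len, d_model)
out, _ = cross_attn(decoder_seg, memory, memory)          # sparse cross-attention
print(out.shape)                                          # torch.Size([1, 16, 256])
```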
arXiv Detail & Related papers (2020-06-18T00:13:21Z)
- Multi-level Head-wise Match and Aggregation in Transformer for Textual
Sequence Matching [87.97265483696613]
We propose a new approach to sequence pair matching with Transformer, by learning head-wise matching representations on multiple levels.
Experiments show that our proposed approach can achieve new state-of-the-art performance on multiple tasks.
arXiv Detail & Related papers (2020-01-20T20:02:02Z)