Multi-level Head-wise Match and Aggregation in Transformer for Textual
Sequence Matching
- URL: http://arxiv.org/abs/2001.07234v1
- Date: Mon, 20 Jan 2020 20:02:02 GMT
- Title: Multi-level Head-wise Match and Aggregation in Transformer for Textual
Sequence Matching
- Authors: Shuohang Wang, Yunshi Lan, Yi Tay, Jing Jiang, Jingjing Liu
- Abstract summary: We propose a new approach to sequence pair matching with Transformer, by learning head-wise matching representations on multiple levels.
Experiments show that our proposed approach can achieve new state-of-the-art performance on multiple tasks.
- Score: 87.97265483696613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer has been successfully applied to many natural language processing
tasks. However, for textual sequence matching, simple matching between the
representation of a pair of sequences might bring in unnecessary noise. In this
paper, we propose a new approach to sequence pair matching with Transformer, by
learning head-wise matching representations on multiple levels. Experiments
show that our proposed approach can achieve new state-of-the-art performance on
multiple tasks that rely only on pre-computed sequence vector representations,
such as SNLI, MNLI-match, MNLI-mismatch, QQP, and SQuAD-binary.
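The abstract gives only the high-level idea, so below is a minimal sketch of head-wise matching with simple aggregation, assuming per-layer pooled sequence vectors that are split into per-head sub-vectors. The class name HeadwiseMatcher, the concat/difference/product match features, and the flatten-and-classify aggregation are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class HeadwiseMatcher(nn.Module):
    """Sketch: match two sequence representations head by head at every
    encoder layer, then aggregate the per-head match features."""

    def __init__(self, hidden_size=768, num_heads=12, num_layers=12, num_classes=3):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        # a small projection shared across heads and layers (an assumption, not the paper's design)
        self.match_proj = nn.Linear(4 * self.head_dim, self.head_dim)
        self.classifier = nn.Linear(num_layers * num_heads * self.head_dim, num_classes)

    def forward(self, states_a, states_b):
        # states_a, states_b: lists (one entry per layer) of pooled vectors, each (batch, hidden_size)
        match_feats = []
        for h_a, h_b in zip(states_a, states_b):
            # split each pooled vector into per-head sub-vectors: (batch, num_heads, head_dim)
            a = h_a.view(h_a.size(0), self.num_heads, self.head_dim)
            b = h_b.view(h_b.size(0), self.num_heads, self.head_dim)
            # classic matching features per head: concatenation, difference, element-wise product
            feat = torch.cat([a, b, a - b, a * b], dim=-1)
            match_feats.append(torch.relu(self.match_proj(feat)))
        # aggregate over layers and heads by flattening (the simplest possible aggregation)
        agg = torch.cat([f.flatten(1) for f in match_feats], dim=-1)
        return self.classifier(agg)
```

In use, one would feed the pooled (e.g. [CLS]-style) vector from each encoder layer for the two sequences; richer learned aggregation over heads and layers is where the paper's contribution lies.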
Related papers
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z) - Shift-Reduce Task-Oriented Semantic Parsing with Stack-Transformers [6.744385328015561]
Task-oriented dialogue systems, such as Apple Siri and Amazon Alexa, require a semantic parsing module in order to process user utterances and understand the action to be performed.
This semantic parsing component was initially implemented by rule-based or statistical slot-filling approaches for processing simple queries.
In this article, we advance the research on neural shift-reduce semantic parsing for task-oriented dialogue.
arXiv Detail & Related papers (2022-10-21T14:19:47Z) - Paragraph-based Transformer Pre-training for Multi-Sentence Inference [99.59693674455582]
We show that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks.
We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences.
arXiv Detail & Related papers (2022-05-02T21:41:14Z) - Efficient Long Sequence Encoding via Synchronization [29.075962393432857]
We propose a synchronization mechanism for hierarchical encoding.
Our approach first identifies anchor tokens across segments and groups them by their roles in the original input sequence.
Our approach is able to improve the global information exchange among segments while maintaining efficiency.
arXiv Detail & Related papers (2022-03-15T04:37:02Z) - s2s-ft: Fine-Tuning Pretrained Transformer Encoders for
Sequence-to-Sequence Learning [47.30689555136054]
We present a sequence-to-sequence fine-tuning toolkit s2s-ft, which adopts pretrained Transformers for conditional generation tasks.
s2s-ft achieves strong performance on several benchmarks for abstractive summarization and question generation.
arXiv Detail & Related papers (2021-10-26T12:45:34Z) - Inducing Transformer's Compositional Generalization Ability via
Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z) - Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene [10.822477939237459]
We propose contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learning.
CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
arXiv Detail & Related papers (2021-06-04T08:17:48Z) - Sequence-Level Mixed Sample Data Augmentation [119.94667752029143]
This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for sequence-to-sequence problems.
Our approach, SeqMix, creates new synthetic examples by softly combining input/output sequences from the training set.
arXiv Detail & Related papers (2020-11-18T02:18:04Z)
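As a concrete illustration of the SeqMix entry above, here is a minimal mixup-style sketch that softly combines two embedded training pairs with a shared Beta-sampled coefficient. The function name soft_seqmix, the embedding-level mixing, and the alpha parameter are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def soft_seqmix(src_a, src_b, tgt_a, tgt_b, alpha=0.1):
    """Softly combine two embedded training pairs (source and target sequences).

    src_*, tgt_*: (batch, seq_len, embed_dim) tensors, assumed padded to equal length.
    Returns the mixed source/target embeddings and the mixing coefficient.
    """
    # sample a single mixing coefficient from Beta(alpha, alpha), as in mixup
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed_src = lam * src_a + (1.0 - lam) * src_b
    mixed_tgt = lam * tgt_a + (1.0 - lam) * tgt_b
    return mixed_src, mixed_tgt, lam
```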
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.