BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input
Representation
- URL: http://arxiv.org/abs/2204.07837v1
- Date: Sat, 16 Apr 2022 16:19:47 GMT
- Title: BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input
Representation
- Authors: Zheng Zhang, Liang Ding, Dazhao Cheng, Xuebo Liu, Min Zhang, Dacheng
Tao
- Abstract summary: We propose a framework-level robust sequence-to-sequence learning approach, named BLISS, via self-supervised input representation.
We conduct comprehensive experiments to validate the effectiveness of BLISS on various tasks, including machine translation, grammatical error correction, and text summarization.
- Score: 92.75908003533736
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Data augmentation (DA) is central to achieving robust
sequence-to-sequence learning on various natural language processing (NLP)
tasks. However, most DA approaches force the decoder to make predictions
conditioned on the perturbed input representation, underutilizing the
supervision signal that the perturbed input provides. In this work, we propose
BLISS, a framework-level approach to robust sequence-to-sequence learning via
self-supervised input representation, which has great potential to complement
data-level augmentation approaches. The key idea is to supervise the
sequence-to-sequence framework with both \textit{supervised}
("input$\rightarrow$output") and \textit{self-supervised} ("perturbed
input$\rightarrow$input") signals. We conduct comprehensive experiments to
validate the effectiveness of BLISS on various tasks, including machine
translation, grammatical error correction, and text summarization. The results
show that BLISS significantly outperforms the vanilla Transformer and works
more consistently across tasks than the five contrastive baselines.
Extensive analyses reveal that BLISS learns robust representations and rich
linguistic knowledge, confirming our claim. Source code will be released upon
publication.
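To make the key idea concrete, below is a minimal PyTorch-style sketch of such a dual training signal, assuming a standard encoder-decoder Transformer: the decoder is trained on the usual "input$\rightarrow$output" target, while an auxiliary head reconstructs the original source tokens from the encoder states of the perturbed source. The perturbation function, the reconstruction head, and the weight `lambda_self` are illustrative assumptions, not the authors' implementation (which they state will be released upon publication).

```python
# Minimal sketch (not the released BLISS code): jointly optimizing the
# supervised "input -> output" objective and a self-supervised
# "perturbed input -> input" reconstruction objective. The perturbation
# scheme, the auxiliary head, and `lambda_self` are illustrative assumptions.
import random

import torch
import torch.nn as nn


def perturb(token_ids, drop_prob=0.1, mask_id=3):
    """Toy perturbation: randomly replace some source tokens with a mask token."""
    return [mask_id if random.random() < drop_prob else t for t in token_ids]


class DualObjectiveSeq2Seq(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # positional encoding omitted for brevity
        self.transformer = nn.Transformer(d_model, nhead, num_layers, num_layers,
                                          batch_first=True)
        self.out_head = nn.Linear(d_model, vocab_size)    # predicts the target sequence
        self.recon_head = nn.Linear(d_model, vocab_size)  # recovers the original source

    def forward(self, perturbed_src, tgt_in):
        memory = self.transformer.encoder(self.embed(perturbed_src))
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_in.size(1))
        dec = self.transformer.decoder(self.embed(tgt_in), memory, tgt_mask=tgt_mask)
        return self.out_head(dec), self.recon_head(memory)


def dual_objective_loss(model, src, tgt_in, tgt_out, lambda_self=0.5):
    """Supervised loss on the target plus self-supervised source reconstruction.

    Assumes `src`, `tgt_in`, `tgt_out` are equal-length (padded) LongTensor
    batches; padding/ignore_index handling is omitted for brevity.
    """
    ce = nn.CrossEntropyLoss()
    perturbed_src = torch.tensor([perturb(s.tolist()) for s in src])
    out_logits, recon_logits = model(perturbed_src, tgt_in)
    supervised = ce(out_logits.reshape(-1, out_logits.size(-1)), tgt_out.reshape(-1))
    self_supervised = ce(recon_logits.reshape(-1, recon_logits.size(-1)), src.reshape(-1))
    return supervised + lambda_self * self_supervised
```

A real training loop would add positional encodings and padding masks; the point of the sketch is simply that the encoder states of the perturbed source carry a second, self-supervised training signal alongside the usual decoding objective.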
Related papers
- Instance-Aware Graph Prompt Learning [71.26108600288308]
We introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper.
The process involves generating intermediate prompts for each instance using a lightweight architecture.
Experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-11-26T18:38:38Z) - Instruction Position Matters in Sequence Generation with Large Language
Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z) - Logit-Based Ensemble Distribution Distillation for Robust Autoregressive
Sequence Uncertainties [4.8986598953553555]
We investigate Ensemble Distribution Distillation (EDD) applied to large-scale natural language sequence-to-sequence data.
EDD aims to compress the superior uncertainty performance of an expensive (teacher) ensemble into a cheaper (student) single model.
We show, for modern transformer architectures on large-scale translation tasks, that modelling the ensemble logits, instead of softmax probabilities, leads to significantly better students.
arXiv Detail & Related papers (2023-05-17T17:21:10Z) - XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems
to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z) - Inducing Transformer's Compositional Generalization Ability via
Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z) - Enjoy the Salience: Towards Better Transformer-based Faithful
Explanations with Word Salience [9.147707153504117]
We propose an auxiliary loss function for guiding the multi-head attention mechanism during training to be close to salient information extracted using TextRank.
Experiments for explanation faithfulness across five datasets, show that models trained with SaLoss consistently provide more faithful explanations.
We further show that the latter results in higher predictive performance on downstream tasks.
arXiv Detail & Related papers (2021-08-31T11:21:30Z)
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations [4.36561468436181]
We present DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations.
Our approach closes the performance gap between unsupervised and supervised pretraining for universal sentence encoders.
Our code and pretrained models are publicly available and can be easily adapted to new domains or used to embed unseen text.
arXiv Detail & Related papers (2020-06-05T20:00:28Z) - BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z) - Hybrid Attention-Based Transformer Block Model for Distant Supervision
Relation Extraction [20.644215991166902]
We propose a new framework using a hybrid attention-based Transformer block with multi-instance learning to perform the DSRE task.
The proposed approach can outperform the state-of-the-art algorithms on the evaluation dataset.
arXiv Detail & Related papers (2020-03-10T13:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.