Paragraph-based Transformer Pre-training for Multi-Sentence Inference
- URL: http://arxiv.org/abs/2205.01228v1
- Date: Mon, 2 May 2022 21:41:14 GMT
- Title: Paragraph-based Transformer Pre-training for Multi-Sentence Inference
- Authors: Luca Di Liello, Siddhant Garg, Luca Soldaini, Alessandro Moschitti
- Abstract summary: We show that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks.
We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences.
- Score: 99.59693674455582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inference tasks such as answer sentence selection (AS2) or fact verification
are typically solved by fine-tuning transformer-based models as individual
sentence-pair classifiers. Recent studies show that these tasks benefit from
modeling dependencies across multiple candidate sentences jointly. In this
paper, we first show that popular pre-trained transformers perform poorly when
used for fine-tuning on multi-candidate inference tasks. We then propose a new
pre-training objective that models the paragraph-level semantics across
multiple input sentences. Our evaluation on three AS2 and one fact verification
datasets demonstrates the superiority of our pre-training technique over the
traditional ones for transformers used as joint models for multi-candidate
inference tasks, as well as when used as cross-encoders for sentence-pair
formulations of these tasks.
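To make the two formulations in the abstract concrete, here is a minimal sketch (not the paper's implementation) contrasting independent sentence-pair cross-encoding with a joint multi-candidate input that lets self-attention span all candidates. The checkpoint, example texts, and separator scheme are illustrative assumptions.
```python
# Minimal sketch, not the paper's code: sentence-pair (cross-encoder) scoring vs. a
# joint multi-candidate input for answer sentence selection (AS2).
# The checkpoint and example texts below are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()  # the classification head is untrained here; real use would fine-tune it

question = "Who wrote The Old Man and the Sea?"
candidates = [
    "The Old Man and the Sea was written by Ernest Hemingway in 1951.",
    "The novella is set in a small fishing village near Havana.",
    "Hemingway received the Pulitzer Prize for Fiction in 1953.",
]

# (a) Sentence-pair formulation: each candidate is scored in isolation.
pair_batch = tokenizer([question] * len(candidates), candidates,
                       padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    pair_scores = model(**pair_batch).logits.softmax(dim=-1)[:, 1]

# (b) Joint multi-candidate formulation: all candidates share one forward pass,
# so self-attention can model dependencies across candidates. A real joint model
# would add a per-candidate classification head on top of this encoding.
joint_text = f" {tokenizer.sep_token} ".join(candidates)
joint_batch = tokenizer(question, joint_text, truncation=True, return_tensors="pt")
with torch.no_grad():
    joint_output = model(**joint_batch)

print(pair_scores.tolist())
```
The abstract's point is that encoders pre-trained with standard objectives struggle when fine-tuned on inputs like (b), which is what the proposed paragraph-level pre-training objective targets.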
Related papers
- Pre-training Transformer Models with Sentence-Level Objectives for Answer Sentence Selection [99.59693674455582]
We propose three novel sentence-level transformer pre-training objectives that incorporate paragraph-level semantics within and across documents.
Our experiments on three public and one industrial AS2 datasets demonstrate the empirical superiority of our pre-trained transformers over baseline models.
arXiv Detail & Related papers (2022-05-20T22:39:00Z)
- Dependency Learning for Legal Judgment Prediction with a Unified Text-to-Text Transformer [13.896506220470748]
Legal Judgment Prediction (LJP) involves a series of sub-tasks, such as predicting violated law articles, charges, and terms of penalty.
We propose leveraging a unified text-to-text Transformer for LJP.
We show that this unified transformer, albeit pretrained on general-domain text, outperforms pretrained models tailored specifically for the legal domain.
arXiv Detail & Related papers (2021-12-13T01:38:37Z)
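As a concrete illustration of the unified text-to-text formulation mentioned in the entry above, here is a hedged sketch of casting one LJP sub-task as conditional generation; the prompt prefix, checkpoint, and example facts are assumptions rather than the paper's setup.
```python
# Hedged sketch of a text-to-text formulation for one LJP sub-task; the prompt
# format, checkpoint, and example are illustrative assumptions, not the paper's setup.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

facts = "The defendant entered a locked shed at night and took a bicycle without permission."
prompt = f"predict charge: {facts}"  # other sub-tasks would use other task prefixes

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```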
- Discriminative and Generative Transformer-based Models For Situation Entity Classification [8.029049649310211]
We re-examine the situation entity (SE) classification task with varying amounts of available training data.
We exploit a Transformer-based variational autoencoder to encode sentences into a lower dimensional latent space.
arXiv Detail & Related papers (2021-09-15T17:07:07Z)
- Consistency Regularization for Cross-Lingual Fine-Tuning [61.08704789561351]
We propose to improve cross-lingual fine-tuning with consistency regularization.
Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentations.
Experimental results on the XTREME benchmark show that our method significantly improves cross-lingual fine-tuning across various tasks.
arXiv Detail & Related papers (2021-06-15T15:35:44Z)
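Here is a minimal sketch of the example consistency regularization idea summarized in the entry above: penalize divergence between predictions on an example and on an augmented copy. The symmetric-KL form and the loss weighting are assumptions, not the paper's exact objective.
```python
# Hedged sketch of example consistency regularization: penalize how much the
# prediction changes under data augmentation. The symmetric KL form and the
# loss weighting below are assumptions, not the paper's exact objective.
import torch
import torch.nn.functional as F

def consistency_loss(logits_orig: torch.Tensor, logits_aug: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between predictions on original and augmented inputs."""
    log_p = F.log_softmax(logits_orig, dim=-1)
    log_q = F.log_softmax(logits_aug, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

# Example usage with dummy logits for a 3-class task:
logits_orig = torch.randn(8, 3)
logits_aug = logits_orig + 0.1 * torch.randn(8, 3)   # stands in for an augmented view
task_loss = torch.tensor(0.0)                        # placeholder for the supervised loss
total_loss = task_loss + 1.0 * consistency_loss(logits_orig, logits_aug)
```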
- Utilizing Bidirectional Encoder Representations from Transformers for Answer Selection [16.048329028104643]
We adopt a transformer-based model for the language modeling task on a large dataset and fine-tune it for downstream tasks.
We find that fine-tuning the BERT model for the answer selection task is very effective and observe a maximum improvement of 13.1% in the QA datasets and 18.7% in the CQA datasets.
arXiv Detail & Related papers (2020-11-14T03:15:26Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
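The entry above describes a controller trained with reinforcement learning to pick embedding concatenations; below is an illustrative sketch of that idea using a REINFORCE-style update over a binary selection mask. The embedding names, reward stub, and hyperparameters are assumptions.
```python
# Illustrative sketch of an RL controller that searches over embedding concatenations.
# The embedding names, the reward stub, and all hyperparameters are assumptions; the
# real method trains a task model on each sampled concatenation to obtain the reward.
import torch

embedding_names = ["word", "char", "elmo", "bert", "flair"]    # hypothetical candidates
logits = torch.zeros(len(embedding_names), requires_grad=True)  # controller parameters
optimizer = torch.optim.Adam([logits], lr=0.1)

def dev_accuracy(mask: torch.Tensor) -> float:
    """Placeholder: train a task model on the selected concatenation, return dev accuracy."""
    return float(torch.rand(()))

for step in range(20):
    probs = torch.sigmoid(logits)
    mask = torch.bernoulli(probs.detach())        # sample which embeddings to concatenate
    reward = dev_accuracy(mask)                   # accuracy of the resulting task model
    log_prob = (mask * torch.log(probs + 1e-8)
                + (1 - mask) * torch.log(1 - probs + 1e-8)).sum()
    loss = -reward * log_prob                     # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("selection probabilities:", torch.sigmoid(logits).tolist())
```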
- The Cascade Transformer: an Application for Efficient Answer Sentence Selection [116.09532365093659]
We introduce the Cascade Transformer, a technique to adapt transformer-based models into a cascade of rankers.
When compared to a state-of-the-art transformer model, our approach reduces computation by 37% with almost no impact on accuracy.
arXiv Detail & Related papers (2020-05-05T23:32:01Z)
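Below is a toy sketch of the cascading idea summarized in the entry above: intermediate layers score candidates and prune the weakest before the deeper, more expensive layers run. Dimensions, pruning points, and keep rates are assumptions, not the paper's configuration.
```python
# Toy sketch of the cascade-of-rankers idea: intermediate layers score candidates and
# drop the lowest-ranked ones so deeper layers process fewer inputs. Dimensions,
# pruning points, and keep rates are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class ToyCascadeRanker(nn.Module):
    def __init__(self, d_model=64, n_layers=6, prune_at=(1, 3), keep_rate=0.5):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        ])
        self.scorers = nn.ModuleDict({str(i): nn.Linear(d_model, 1) for i in range(n_layers)})
        self.prune_at, self.keep_rate = set(prune_at), keep_rate

    def forward(self, x):                          # x: (num_candidates, seq_len, d_model)
        alive = torch.arange(x.size(0))            # indices of surviving candidates
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i in self.prune_at:                 # rank by the first-token score and prune
                scores = self.scorers[str(i)](x[:, 0]).squeeze(-1)
                k = max(1, int(self.keep_rate * x.size(0)))
                top = scores.topk(k).indices
                x, alive = x[top], alive[top]
        final_scores = self.scorers[str(len(self.layers) - 1)](x[:, 0]).squeeze(-1)
        return alive, final_scores                 # surviving candidate ids and their scores

ranker = ToyCascadeRanker().eval()
candidates = torch.randn(16, 32, 64)               # 16 toy candidate encodings
with torch.no_grad():
    ids, scores = ranker(candidates)
```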
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
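The entry above describes casting plausibility ranking in a full-text format; the sketch below ranks candidate endings by a language model's likelihood of the complete text, with no task-specific head. Using a causal LM as the scorer, plus the checkpoint and examples, are assumptions rather than the paper's exact method.
```python
# Hedged sketch of scoring plausibility in a full-text format: rank candidate endings
# by the language-model likelihood of the complete text, with no classification head.
# The causal-LM scorer, checkpoint, and examples are assumptions, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

premise = "The man poured water on the campfire."
endings = ["The fire went out.", "The fire grew larger."]

def full_text_score(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)          # average cross-entropy over the full sequence
    return -out.loss.item()                   # higher score = more plausible

best = max(endings, key=lambda e: full_text_score(f"{premise} {e}"))
print(best)
```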
This list is automatically generated from the titles and abstracts of the papers on this site.