Structural Self-Supervised Objectives for Transformers
- URL: http://arxiv.org/abs/2309.08272v1
- Date: Fri, 15 Sep 2023 09:30:45 GMT
- Title: Structural Self-Supervised Objectives for Transformers
- Authors: Luca Di Liello
- Abstract summary: This thesis focuses on improving the pre-training of natural language models using unsupervised raw data.
In the first part, we introduce three alternative pre-training objectives to BERT's Masked Language Modeling (MLM).
In the second part, we propose self-supervised pre-training tasks that align structurally with downstream applications.
- Score: 3.018656336329545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This thesis focuses on improving the pre-training of natural language models
using unsupervised raw data to make them more efficient and aligned with
downstream applications.
In the first part, we introduce three alternative pre-training objectives to
BERT's Masked Language Modeling (MLM), namely Random Token Substitution (RTS),
Cluster-based Random Token Substitution (C-RTS), and Swapped Language Modeling
(SLM). These objectives involve token swapping instead of masking, with RTS and
C-RTS aiming to predict token originality and SLM predicting the original token
values. Results show that RTS and C-RTS require less pre-training time while
maintaining performance comparable to MLM. Surprisingly, SLM outperforms MLM on
certain tasks despite using the same computational budget.
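As a rough illustration of how such substitution-based objectives can be set up (a minimal sketch, not the thesis implementation; the function names, substitution rate, and uniform replacement sampling are assumptions), consider the PyTorch-style snippet below. RTS reduces to a per-token binary classification of original vs. substituted, while SLM predicts the original token identity at substituted positions; C-RTS would differ only in drawing replacements from clusters of similar tokens rather than uniformly from the vocabulary.

```python
# Minimal sketch of substitution-based pre-training losses (assumed names/shapes).
import torch
import torch.nn.functional as F

def corrupt_with_substitution(input_ids, vocab_size, sub_prob=0.15):
    """Randomly replace tokens with uniformly sampled vocabulary tokens.

    Returns the corrupted ids and a 0/1 label per position
    (1 = substituted, 0 = original). C-RTS would instead sample each
    replacement from a cluster of tokens similar to the original.
    """
    substituted = torch.rand(input_ids.shape, device=input_ids.device) < sub_prob
    random_ids = torch.randint(0, vocab_size, input_ids.shape, device=input_ids.device)
    corrupted = torch.where(substituted, random_ids, input_ids)
    return corrupted, substituted.long()

def rts_loss(binary_logits, substituted):
    # RTS / C-RTS: classify each position as original (0) or substituted (1).
    return F.cross_entropy(binary_logits.view(-1, 2), substituted.view(-1))

def slm_loss(token_logits, original_ids, substituted):
    # SLM: predict the original token value, scored only at substituted positions.
    per_token = F.cross_entropy(
        token_logits.view(-1, token_logits.size(-1)),
        original_ids.view(-1),
        reduction="none",
    )
    mask = substituted.view(-1).float()
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```

Note that the binary head used by RTS and C-RTS is far cheaper than the vocabulary-sized softmax required by SLM or MLM, which is consistent with the reduced pre-training time reported above.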
In the second part, we propose self-supervised pre-training tasks that align
structurally with downstream applications, reducing the need for labeled data.
We use large corpora like Wikipedia and CC-News to train models to recognize,
in several ways, whether text spans originate from the same paragraph or document. By
doing continuous pre-training, starting from existing models like RoBERTa,
ELECTRA, DeBERTa, BART, and T5, we demonstrate significant performance
improvements in tasks like Fact Verification, Answer Sentence Selection, and
Summarization. These improvements are especially pronounced when limited
annotation data is available. The proposed objectives also achieve
state-of-the-art results on various benchmark datasets, including FEVER (dev
set), ASNQ, WikiQA, and TREC-QA, as well as enhancing the quality of summaries.
Importantly, these techniques can be easily integrated with other methods
without altering the internal structure of Transformer models, making them
versatile for various NLP applications.
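For concreteness, the sketch below shows one way span pairs with structural labels could be sampled from a paragraph-segmented corpus such as Wikipedia or CC-News; the label set, the sampling ratios, and the helper name are illustrative assumptions, not the exact tasks defined in the thesis.

```python
# Illustrative sampling of span pairs for structural pre-training (assumed scheme).
import random

def sample_span_pair(documents):
    """documents: list of documents, each a list of paragraphs,
    each paragraph a list of sentence strings. Assumes at least two
    documents, and at least two paragraphs/sentences where sampled."""
    doc = random.choice(documents)
    r = random.random()
    if r < 1 / 3:
        # Both spans come from the same paragraph.
        a, b = random.sample(random.choice(doc), 2)
        label = "same_paragraph"
    elif r < 2 / 3:
        # Same document, different paragraphs.
        p1, p2 = random.sample(doc, 2)
        a, b = random.choice(p1), random.choice(p2)
        label = "same_document"
    else:
        # Spans drawn from two different documents.
        other = random.choice([d for d in documents if d is not doc])
        a = random.choice(random.choice(doc))
        b = random.choice(random.choice(other))
        label = "different_document"
    return a, b, label
```

Pairs produced this way can be fed to an ordinary sentence-pair classification head on top of an existing encoder (e.g. RoBERTa, ELECTRA, or DeBERTa) during continuous pre-training, which is why no change to the Transformer's internal structure is needed.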
Related papers
- Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning [1.6570772838074355]
Multimodal large language models (MLLMs) exhibit great potential for chart question answering (CQA).
Recent efforts primarily focus on scaling up training datasets through data collection and synthesis.
We propose a visualization-referenced instruction tuning approach to guide the training dataset enhancement and model development.
arXiv Detail & Related papers (2024-07-29T17:04:34Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of the tabular modality.
Pretrained Language Models (PLMs) have given rise to another paradigm, which takes as inputs the sentences of the textual modality.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z) - LLM-augmented Preference Learning from Natural Language [19.700169351688768]
Large Language Models (LLMs) are equipped to deal with larger context lengths.
LLMs can consistently outperform the SotA when the target text is large.
Few-shot learning yields better performance than zero-shot learning.
arXiv Detail & Related papers (2023-10-12T17:17:27Z) - Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z) - From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z) - Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task.
The proposed TaskRes is simple yet effective, which significantly outperforms previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z) - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling [10.732163031244651]
Masked language modeling (MLM) is widely used in natural language processing for learning text representations.
In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements of MLM.
arXiv Detail & Related papers (2021-09-04T08:52:37Z) - UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection [28.701023986344993]
Fine-tuning of pre-trained transformer networks such as BERT yields state-of-the-art results for text classification tasks.
Our RoBERTa-based classifier officially ranks 1st in SemEval 2020 Task 12 for the English language.
arXiv Detail & Related papers (2020-04-23T23:59:58Z) - Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models [56.268862325167575]
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task.
We evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task.
arXiv Detail & Related papers (2020-04-04T11:07:54Z)