Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining
- URL: http://arxiv.org/abs/2407.00935v1
- Date: Mon, 1 Jul 2024 03:35:59 GMT
- Title: Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining
- Authors: Qi Zhang, Tianqi Du, Haotian Huang, Yifei Wang, Yisen Wang
- Abstract summary: We establish the first theoretical comparisons between two leading generative SSL paradigms: autoregressive SSL and masked SSL.
In classification tasks, the flexibility of targeted tokens in masked SSL fosters more inter-sample connections.
In content generation tasks, the misalignment between the flexible lengths of test samples and the fixed length of unmasked texts in masked SSL hinders its generation performance.
- Score: 34.64600580301882
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, the rise of generative self-supervised learning (SSL) paradigms has exhibited impressive performance across visual, language, and multi-modal domains. While the varied designs of generative SSL objectives lead to distinct properties in downstream tasks, a theoretical understanding of these differences remains largely unexplored. In this paper, we establish the first theoretical comparisons between two leading generative SSL paradigms: autoregressive SSL and masked SSL. Through establishing theoretical frameworks, we elucidate the strengths and limitations of autoregressive and masked SSL within the primary evaluation tasks of classification and content generation. Our findings demonstrate that in classification tasks, the flexibility of targeted tokens in masked SSL fosters more inter-sample connections compared to the fixed position of target tokens in autoregressive SSL, which yields superior clustering performance. In content generation tasks, the misalignment between the flexible lengths of test samples and the fixed length of unmasked texts in masked SSL (vs. flexible lengths of conditional texts in autoregressive SSL) hinders its generation performance. To leverage each other's strengths and mitigate weaknesses, we propose diversity-enhanced autoregressive and variable-length masked objectives, which substantially improve the classification performance of autoregressive SSL and the generation performance of masked SSL. Code is available at https://github.com/PKU-ML/LookAheadLookAround.
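As a rough illustration of the contrast drawn in the abstract, the sketch below builds training examples for both objectives from one token sequence. In the autoregressive case the target is always the token right after the context, and the context length varies. In the masked case the target positions vary from draw to draw, while the number of visible tokens stays fixed for a given sequence length. The function names, the 25% mask ratio, and the `[MASK]` placeholder are illustrative assumptions, not taken from the paper or its repository.

```python
import random


def autoregressive_pairs(tokens):
    """Return (context, target) pairs where each prefix predicts the next token.

    The target position is fixed relative to the context (always the next
    token), while the context length varies from 1 to len(tokens) - 1.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_pair(tokens, mask_ratio=0.25, seed=0, mask_token="[MASK]"):
    """Return (visible, targets) for one random masking of the sequence.

    The masked positions change with the random draw, but the number of
    unmasked tokens is fixed once the length and mask ratio are fixed.
    mask_ratio and mask_token are hypothetical defaults for illustration.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = set(positions)
    visible = [mask_token if i in masked else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return visible, targets
```

Calling `autoregressive_pairs` on an eight-token sentence yields seven pairs with context lengths 1 through 7, whereas repeated calls to `masked_pair` with different seeds always leave six tokens visible and only move the two masked positions around.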
Related papers
- On the Discriminability of Self-Supervised Representation Learning [38.598160031349686]
Self-supervised learning (SSL) has recently achieved significant success in downstream visual tasks.
A notable gap still exists between SSL and supervised learning (SL), especially in complex downstream tasks.
arXiv Detail & Related papers (2024-07-18T14:18:03Z) - Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning [4.137391543972184]
Semi-supervised learning (SSL) has witnessed remarkable progress, resulting in numerous method variations.
In this paper, we present a novel SSL approach named FineSSL that significantly addresses this limitation by adapting pre-trained foundation models.
We demonstrate that FineSSL sets a new state of the art for SSL on multiple benchmark datasets, reduces the training cost by over six times, and can seamlessly integrate various fine-tuning and modern SSL algorithms.
arXiv Detail & Related papers (2024-05-20T03:33:12Z) - Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering [59.45743537594695]
We propose Dynamically Fusing Self-Supervised Learning (DyFSS) for graph clustering.
DyFSS fuses features extracted from diverse SSL tasks using distinct weights derived from a gating network.
Experiments show DyFSS outperforms state-of-the-art multi-task SSL methods by up to 8.66% on the accuracy metric.
arXiv Detail & Related papers (2024-01-12T14:24:10Z) - Reverse Engineering Self-Supervised Learning [17.720366509919167]
Self-supervised learning (SSL) is a powerful tool in machine learning.
This paper presents an in-depth empirical analysis of SSL-trained representations.
arXiv Detail & Related papers (2023-05-24T23:15:28Z) - Semi-supervised Learning with Deterministic Labeling and Large Margin Projection [25.398314796157933]
The centrality and diversity of the labeled data strongly influence the performance of semi-supervised learning (SSL).
This study learns a kernelized large margin metric for a small amount of the most stable and most divergent data, which are recognized based on the OLF structure.
Owing to this novel design, the accuracy and performance stability of the SSL model based on OLF are significantly improved compared with its baseline methods.
arXiv Detail & Related papers (2022-08-17T04:09:35Z) - Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation [27.857955394020475]
Self-Supervised Learning (SSL) models have been successfully applied in various deep learning-based speech tasks.
The quality of SSL representations depends highly on the relatedness between the SSL training domain(s) and the target data domain.
We propose a learnable and interpretable framework to combine SF and SSL representations.
arXiv Detail & Related papers (2022-04-05T20:09:15Z) - DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z) - Sound and Visual Representation Learning with Multiple Pretraining Tasks [104.11800812671953]
Self-supervised tasks (SSL) reveal different features from the data.
This work aims to combine multiple SSL tasks (Multi-SSL) so that the result generalizes well across all downstream tasks.
Experiments on sound representations demonstrate that Multi-SSL via incremental learning (IL) of SSL tasks outperforms single SSL task models.
arXiv Detail & Related papers (2022-01-04T09:09:38Z) - A Strong Baseline for Semi-Supervised Incremental Few-Shot Learning [54.617688468341704]
Few-shot learning aims to learn models that generalize to novel classes with limited training samples.
We propose a novel paradigm containing two parts: (1) a well-designed meta-training algorithm for mitigating ambiguity between base and novel classes caused by unreliable pseudo labels and (2) a model adaptation mechanism to learn discriminative features for novel classes while preserving base knowledge using few labeled and all the unlabeled data.
arXiv Detail & Related papers (2021-10-21T13:25:52Z) - Self-Supervised Learning of Graph Neural Networks: A Unified Review [50.71341657322391]
Self-supervised learning is emerging as a new paradigm for making use of large amounts of unlabeled samples.
We provide a unified review of different ways of training graph neural networks (GNNs) using SSL.
Our treatment of SSL methods for GNNs sheds light on the similarities and differences of various methods, setting the stage for developing new methods and algorithms.
arXiv Detail & Related papers (2021-02-22T03:43:45Z) - Boosting Few-Shot Learning With Adaptive Margin Loss [109.03665126222619]
This paper proposes an adaptive margin principle to improve the generalization ability of metric-based meta-learning approaches for few-shot learning problems.
Extensive experiments demonstrate that the proposed method can boost the performance of current metric-based meta-learning approaches.
arXiv Detail & Related papers (2020-05-28T07:58:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.