Improving BERT Pretraining with Syntactic Supervision
- URL: http://arxiv.org/abs/2104.10516v1
- Date: Wed, 21 Apr 2021 13:15:58 GMT
- Title: Improving BERT Pretraining with Syntactic Supervision
- Authors: Giorgos Tziafas, Konstantinos Kogkalidis, Gijs Wijnholds, Michael
Moortgat
- Abstract summary: Bidirectional masked Transformers have become the core theme in the current NLP landscape.
We apply our methodology on Lassy Large, an automatically annotated corpus of written Dutch.
Our experiments suggest that our syntax-aware model performs on par with established baselines.
- Score: 2.4087148947930634
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bidirectional masked Transformers have become the core theme in the current
NLP landscape. Despite their impressive benchmarks, a recurring theme in recent
research has been to question such models' capacity for syntactic
generalization. In this work, we seek to address this question by adding a
supervised, token-level supertagging objective to standard unsupervised
pretraining, enabling the explicit incorporation of syntactic biases into the
network's training dynamics. Our approach is straightforward to implement,
induces a marginal computational overhead and is general enough to adapt to a
variety of settings. We apply our methodology on Lassy Large, an automatically
annotated corpus of written Dutch. Our experiments suggest that our
syntax-aware model performs on par with established baselines, despite Lassy
Large being one order of magnitude smaller than commonly used corpora.
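The paper's core idea, adding a supervised token-level supertagging loss on top of the standard masked-language-modeling loss, can be illustrated with a minimal pure-Python sketch. The function name, the label convention (-100 marks positions excluded from a loss term), and the equal default weighting are assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits, label):
    # Negative log-probability of the gold label.
    return -math.log(softmax(logits)[label])


def joint_pretraining_loss(mlm_logits, mlm_labels,
                           tag_logits, tag_labels, tag_weight=1.0):
    """Combined objective for one sentence: the usual MLM loss averaged
    over masked positions (label -100 = not masked) plus a weighted
    token-level supertagging loss averaged over tagged tokens."""
    mlm_terms = [cross_entropy(l, y)
                 for l, y in zip(mlm_logits, mlm_labels) if y != -100]
    tag_terms = [cross_entropy(l, y)
                 for l, y in zip(tag_logits, tag_labels) if y != -100]
    mlm_loss = sum(mlm_terms) / max(len(mlm_terms), 1)
    tag_loss = sum(tag_terms) / max(len(tag_terms), 1)
    return mlm_loss + tag_weight * tag_loss
```

Because the supertag head is just one extra classifier over the encoder's token states, the computational overhead over plain MLM pretraining is marginal, matching the paper's claim.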
Related papers
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z)
- Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation [19.20874993309959]
Vision-language foundation models, such as CLIP, have showcased remarkable effectiveness in numerous zero-shot image-level tasks.
In this work, we propose a strong baseline for training-free OVSS, termed Neighbour-Aware CLIP (NACLIP).
Our method enforces localization of patches in the self-attention of CLIP's vision transformer which, despite being crucial for dense prediction tasks, has been overlooked in the OVSS literature.
arXiv Detail & Related papers (2024-04-12T01:08:04Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Context-aware Fine-tuning of Self-supervised Speech Models [56.95389222319555]
We study the use of context, i.e., surrounding segments, during fine-tuning.
We propose a new approach called context-aware fine-tuning.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks.
arXiv Detail & Related papers (2022-12-16T15:46:15Z)
- Position Prediction as an Effective Pretraining Strategy [20.925906203643883]
We propose a novel but surprisingly simple alternative to content reconstruction: predicting locations from content, without providing positional information for it.
Our approach brings improvements over strong supervised training baselines and is comparable to modern unsupervised/self-supervised pretraining methods.
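The position-prediction objective described above can be sketched in a few lines: the encoder sees content without positional information, and each token must classify which of the N sequence slots it occupies. This is a hedged pure-Python sketch; the function name and loss shape are illustrative assumptions, not the paper's code.

```python
import math


def position_prediction_loss(token_logits):
    """token_logits[i] is a length-N score vector for token i over the
    N possible positions; the target for token i is its own index i.
    Returns the average cross-entropy of each token predicting its slot."""
    def cross_entropy(logits, label):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        return -math.log(exps[label] / sum(exps))

    n = len(token_logits)
    return sum(cross_entropy(token_logits[i], i) for i in range(n)) / n
```

With near-diagonal logits (each token confidently picking its own slot) the loss approaches zero; with uninformative uniform logits it equals log N.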
arXiv Detail & Related papers (2022-07-15T17:10:48Z)
- Compositional generalization in semantic parsing with pretrained transformers [13.198689566654108]
We show that language models pretrained exclusively with non-English corpora, or even with programming language corpora, significantly improve out-of-distribution generalization.
We also show that larger models are harder to train from scratch and their generalization accuracy is lower when trained up to convergence.
arXiv Detail & Related papers (2021-09-30T13:06:29Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training sequence encoders.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs tend to ignore latent variables when paired with a strong auto-regressive decoder.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.