Transferring BERT-like Transformers' Knowledge for Authorship
Verification
- URL: http://arxiv.org/abs/2112.05125v1
- Date: Thu, 9 Dec 2021 18:57:29 GMT
- Title: Transferring BERT-like Transformers' Knowledge for Authorship
Verification
- Authors: Andrei Manolache, Florin Brad, Elena Burceanu, Antonio Barbalau, Radu
Ionescu, Marius Popescu
- Abstract summary: We study the effectiveness of several BERT-like transformers for the task of authorship verification.
We provide new splits for PAN-2020, where training and test data are sampled from disjoint topics or authors.
We show that those splits can enhance the models' capability to transfer knowledge over a new, significantly different dataset.
- Score: 8.443350618722562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of identifying the author of a text spans several decades and was
tackled using linguistics, statistics, and, more recently, machine learning.
Inspired by the impressive performance gains across a broad range of natural
language processing tasks and by the recent availability of the PAN large-scale
authorship dataset, we first study the effectiveness of several BERT-like
transformers for the task of authorship verification. Such models prove to
achieve very high scores consistently. Next, we empirically show that they
focus on topical clues rather than on author writing style characteristics,
taking advantage of existing biases in the dataset. To address this problem, we
provide new splits for PAN-2020, where training and test data are sampled from
disjoint topics or authors. Finally, we introduce DarkReddit, a dataset with a
different input data distribution. We further use it to analyze the domain
generalization performance of models in a low-data regime and how performance
varies when using the proposed PAN-2020 splits for fine-tuning. We show that
those splits can enhance the models' capability to transfer knowledge over a
new, significantly different dataset.
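As a rough illustration of the author-disjoint splitting described above (a sketch under assumed field names such as "author1"/"author2", not the authors' released code), the following Python snippet partitions verification pairs so that no author contributes data to both the training and the test side; pairs mixing held-out and training authors are discarded:

```python
import random

def author_disjoint_split(pairs, test_fraction=0.2, seed=0):
    """Split verification pairs so the train/test author sets are disjoint.

    `pairs` is a list of dicts with (assumed) keys "author1" and "author2".
    """
    authors = sorted({a for p in pairs for a in (p["author1"], p["author2"])})
    random.Random(seed).shuffle(authors)
    test_authors = set(authors[: int(len(authors) * test_fraction)])

    train, test, dropped = [], [], 0
    for p in pairs:
        pair_authors = {p["author1"], p["author2"]}
        if pair_authors <= test_authors:
            test.append(p)                   # both authors are held out
        elif not pair_authors & test_authors:
            train.append(p)                  # both authors stay in training
        else:
            dropped += 1                     # mixed pairs are discarded
    return train, test, dropped
```

A topic-disjoint split follows the same pattern, with topic labels used as the grouping key instead of author identifiers.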
Related papers
- Extensive Evaluation of Transformer-based Architectures for Adverse Drug
Events Extraction [6.78974856327994]
Adverse Event (ADE) extraction is one of the core tasks in digital pharmacovigilance.
We evaluate 19 Transformer-based models for ADE extraction on informal texts.
At the end of our analyses, we identify a list of take-home messages that can be derived from the experimental data.
arXiv Detail & Related papers (2023-06-08T15:25:24Z)
- Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., the Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT yields substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss [17.213602354715956]
We propose a BERT-based model with feature projection and length-balanced loss for readability assessment.
Our model achieves state-of-the-art performance on two English benchmark datasets and one dataset of Chinese textbooks.
arXiv Detail & Related papers (2022-10-19T05:33:27Z)
- On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation [12.896747108919968]
In this paper, we introduce a novel multi-scale essay representation for BERT that can be jointly learned.
Experiment results show that our approach derives much benefit from joint learning of multi-scale essay representation.
Our multi-scale essay representation also generalizes well to the CommonLit Readability Prize dataset.
arXiv Detail & Related papers (2022-05-08T10:36:54Z)
- Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z)
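As a loose illustration of the augmentation recipe summarized in the entry above, the sketch below uses the Hugging Face transformers text-generation pipeline to sample artificial training instances from GPT-2 given a few labeled seed examples. It omits the fine-tuning step the paper relies on, and the prompt format and sampling parameters are illustrative assumptions:

```python
from transformers import pipeline

# Plain GPT-2 via the text-generation pipeline; the paper fine-tunes GPT-2 on
# the labeled instances first, a step omitted in this sketch.
generator = pipeline("text-generation", model="gpt2")

def augment(label, seed_texts, n_new=5):
    """Sample n_new synthetic training texts for one class label."""
    # Hypothetical prompt format: "label: text" lines ending with an open label.
    prompt = "".join(f"{label}: {t}\n" for t in seed_texts) + f"{label}:"
    outputs = generator(
        prompt,
        max_new_tokens=40,
        num_return_sequences=n_new,
        do_sample=True,
        top_p=0.9,
    )
    synthetic = []
    for o in outputs:
        continuation = o["generated_text"][len(prompt):].strip()
        # Keep only the first generated line as one new instance.
        synthetic.append(continuation.splitlines()[0] if continuation else "")
    return synthetic

# Example: augment("positive", ["great battery life", "works as advertised"])
```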
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To curb the training cost brought by the enlarged dataset, we further propose a dataset distillation strategy that compresses the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Sensitive Data Detection and Classification in Spanish Clinical Text: Experiments with BERT [0.8379286663107844]
In this paper, we use a BERT-based sequence labelling model to conduct anonymisation experiments in Spanish.
Experiments show that a simple BERT-based model with general-domain pre-training obtains highly competitive results without any domain-specific feature engineering.
arXiv Detail & Related papers (2020-03-06T09:46:51Z)
- What BERT Sees: Cross-Modal Transfer for Visual Question Generation [21.640299110619384]
We study the visual capabilities of BERT out-of-the-box, avoiding any pre-training on supplementary data.
We introduce BERT-gen, a BERT-based architecture for text generation, able to leverage either mono- or multi-modal representations.
arXiv Detail & Related papers (2020-02-25T12:44:36Z)