RoChBert: Towards Robust BERT Fine-tuning for Chinese
- URL: http://arxiv.org/abs/2210.15944v1
- Date: Fri, 28 Oct 2022 07:08:00 GMT
- Title: RoChBert: Towards Robust BERT Fine-tuning for Chinese
- Authors: Zihan Zhang, Jinfeng Li, Ning Shi, Bo Yuan, Xiangyu Liu, Rong Zhang,
Hui Xue, Donghong Sun and Chao Zhang
- Abstract summary: RoChBERT is a framework to build more Robust BERT-based models.
It fuses Chinese phonetic and glyph features into pre-trained representations during fine-tuning.
- Score: 31.573147796706223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their superb performance on a wide range of tasks, pre-trained
language models (e.g., BERT) have been proven vulnerable to adversarial texts.
In this paper, we present RoChBERT, a framework to build more Robust BERT-based
models by utilizing a more comprehensive adversarial graph to fuse Chinese
phonetic and glyph features into pre-trained representations during
fine-tuning. Inspired by curriculum learning, we further propose to augment the
training dataset with adversarial texts in combination with intermediate
samples. Extensive experiments demonstrate that RoChBERT outperforms previous
methods in significant ways: (i) robust -- RoChBERT greatly improves the model
robustness without sacrificing accuracy on benign texts. Specifically, the
defense lowers the success rates of unlimited and limited attacks by 59.43% and
39.33%, respectively, while maintaining an accuracy of 93.30%; (ii) flexible --
RoChBERT can easily extend to various language models to solve different
downstream tasks with excellent performance; and (iii) efficient -- RoChBERT
can be directly applied to the fine-tuning stage without pre-training the language
model from scratch, and the proposed data augmentation method is also low-cost.
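To make the fusion idea concrete, below is a minimal sketch of injecting phonetic and glyph features into pre-trained representations during fine-tuning. It uses a simple gated combination rather than the paper's adversarial-graph fusion, and all module and parameter names are hypothetical.

```python
# Minimal sketch (not the paper's exact architecture): gated fusion of
# phonetic and glyph embeddings with pre-trained BERT hidden states.
import torch
import torch.nn as nn


class PhoneticGlyphFusion(nn.Module):
    """Gated fusion of auxiliary (phonetic + glyph) features into BERT hidden states."""

    def __init__(self, hidden_size=768, phonetic_dim=64, glyph_dim=64):
        super().__init__()
        self.aux_proj = nn.Linear(phonetic_dim + glyph_dim, hidden_size)
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, bert_hidden, phonetic_feats, glyph_feats):
        # bert_hidden:    (batch, seq_len, hidden_size) from the pre-trained encoder
        # phonetic_feats: (batch, seq_len, phonetic_dim), e.g. pinyin embeddings
        # glyph_feats:    (batch, seq_len, glyph_dim), e.g. radical/stroke embeddings
        aux = self.aux_proj(torch.cat([phonetic_feats, glyph_feats], dim=-1))
        g = torch.sigmoid(self.gate(torch.cat([bert_hidden, aux], dim=-1)))
        return g * bert_hidden + (1.0 - g) * aux  # fused representation fed to the task head


if __name__ == "__main__":
    fusion = PhoneticGlyphFusion()
    h = torch.randn(2, 16, 768)
    py = torch.randn(2, 16, 64)
    gl = torch.randn(2, 16, 64)
    print(fusion(h, py, gl).shape)  # torch.Size([2, 16, 768])
```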
Related papers
- RobustSentEmbed: Robust Sentence Embeddings Using Adversarial Self-Supervised Contrastive Learning [11.347789553984741]
RobustSentEmbed is a self-supervised sentence embedding framework designed to improve robustness in diverse text representation tasks.
Our framework achieves a significant reduction in the success rate of various adversarial attacks, notably reducing the BERTAttack success rate by almost half.
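As a general illustration of adversarial self-supervised contrastive learning (not RobustSentEmbed's specific implementation), the sketch below pairs a clean sentence embedding with an FGSM-style perturbed view under an InfoNCE loss; all names are hypothetical.

```python
# Illustrative sketch only: contrastive training with an adversarially
# perturbed positive view of each sentence embedding.
import torch
import torch.nn.functional as F


def info_nce(anchor, positive, temperature=0.05):
    # anchor, positive: (batch, dim); other rows in the batch act as negatives
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)


def adversarial_view(embeddings, loss_fn, epsilon=1e-2):
    # One FGSM-style step in embedding space to create a "hard" positive.
    emb = embeddings.clone().detach().requires_grad_(True)
    loss_fn(emb).backward()
    return (embeddings + epsilon * emb.grad.sign()).detach()


if __name__ == "__main__":
    clean = torch.randn(8, 256)  # stand-in sentence embeddings
    adv = adversarial_view(clean, lambda e: info_nce(e, clean.detach()))
    print(float(info_nce(clean, adv)))  # pull clean and adversarial views together
```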
arXiv Detail & Related papers (2024-03-17T04:29:45Z)
- A Curious Case of Searching for the Correlation between Training Data and Adversarial Robustness of Transformer Textual Models [11.938237087895649]
Existing works have shown that fine-tuned textual transformer models achieve state-of-the-art prediction performances but are also vulnerable to adversarial text perturbations.
In this paper, we want to prove that there is also a strong correlation between training data and model robustness.
We extract 13 different features representing a wide range of input fine-tuning corpora properties and use them to predict the adversarial robustness of the fine-tuned models.
arXiv Detail & Related papers (2024-02-18T05:58:25Z)
- oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes [82.99830498937729]
oBERTa is an easy-to-use set of language models for Natural Language Processing.
It allows NLP practitioners to obtain between 3.8 and 24.3 times faster models without expertise in model compression.
We explore the use of oBERTa on seven representative NLP tasks.
arXiv Detail & Related papers (2023-03-30T01:37:19Z)
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model.
We show that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance.
We propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics.
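A simplified sketch of the gradient-disentangled sharing idea, assuming the stop-gradient-plus-residual formulation: the discriminator reuses the generator's token embeddings through a stop-gradient plus a small residual table of its own, so its gradients never pull on the shared embeddings. Names are illustrative; this is not the DeBERTaV3 code.

```python
import torch
import torch.nn as nn


class GDESEmbedding(nn.Module):
    def __init__(self, vocab_size=30522, hidden_size=768):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, hidden_size)  # updated by the generator (MLM) loss
        self.delta = nn.Embedding(vocab_size, hidden_size)   # updated by the discriminator (RTD) loss
        nn.init.zeros_(self.delta.weight)

    def generator_embed(self, input_ids):
        return self.shared(input_ids)

    def discriminator_embed(self, input_ids):
        # stop-gradient on the shared table: discriminator gradients flow only into `delta`
        return self.shared(input_ids).detach() + self.delta(input_ids)


if __name__ == "__main__":
    emb = GDESEmbedding(vocab_size=100, hidden_size=32)
    ids = torch.randint(0, 100, (2, 8))
    emb.discriminator_embed(ids).sum().backward()
    print(emb.shared.weight.grad is None, emb.delta.weight.grad is not None)  # True True
```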
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
- BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation [38.017030073108735]
We show that a tailored and suitable bilingual pre-trained language model (dubbed BiBERT) achieves state-of-the-art translation performance.
Our best models achieve BLEU scores of 30.45 for En->De and 38.61 for De->En on the IWSLT'14 dataset, and 31.26 for En->De and 34.94 for De->En on the WMT'14 dataset.
arXiv Detail & Related papers (2021-09-09T23:43:41Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- Non-Autoregressive Text Generation with Pre-trained Language Models [40.50508206201288]
We show that BERT can be employed as the backbone of a NAG model to greatly improve performance.
We devise mechanisms to alleviate the two common problems of vanilla NAG models.
We propose a new decoding strategy, ratio-first, for applications where the output lengths can be approximately estimated beforehand.
arXiv Detail & Related papers (2021-02-16T15:30:33Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
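For intuition, here is a generic threshold-based ternarization sketch (in the spirit of ternary weight networks); TernaryBERT's actual scheme additionally uses distillation-aware training, which is omitted here.

```python
# Map full-precision weights to {-alpha, 0, +alpha} with a per-tensor scale.
import torch


def ternarize(w, threshold_factor=0.7):
    delta = threshold_factor * w.abs().mean()          # threshold below which weights become 0
    mask = (w.abs() > delta).float()
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)  # scale for the surviving weights
    return alpha * torch.sign(w) * mask


if __name__ == "__main__":
    w = torch.randn(768, 768)
    wt = ternarize(w)
    print(torch.unique(wt).numel())  # 3 distinct values: -alpha, 0, +alpha
```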
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than those on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
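The core step can be sketched as using a masked language model to propose contextual substitutes for a target word; the full attack also ranks word importance and keeps only substitutions that flip the victim model's prediction, which this hypothetical snippet omits.

```python
# Sketch: propose contextual replacements for one word with a masked LM.
# Downloads bert-base-uncased from the Hugging Face hub on first run.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")


def propose_substitutes(text, target_word, top_k=5):
    masked = text.replace(target_word, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**inputs).logits                  # (1, seq_len, vocab)
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    top_ids = logits[0, mask_pos[0]].topk(top_k).indices
    return [tokenizer.decode([int(i)]).strip() for i in top_ids]


if __name__ == "__main__":
    print(propose_substitutes("the movie was absolutely wonderful", "wonderful"))
```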
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.