RoBERTa-wwm-ext Fine-Tuning for Chinese Text Classification
- URL: http://arxiv.org/abs/2103.00492v1
- Date: Wed, 24 Feb 2021 18:57:57 GMT
- Title: RoBERTa-wwm-ext Fine-Tuning for Chinese Text Classification
- Authors: Zhuo Xu
- Abstract summary: Bidirectional Encoder Representations from Transformers (BERT) has been shown to be a promising way to dramatically improve performance across various Natural Language Processing tasks.
In this project, the RoBERTa-wwm-ext pre-trained language model was adopted and fine-tuned for Chinese text classification.
The models were able to classify Chinese texts into two categories: descriptions of legal behavior and descriptions of illegal behavior.
- Score: 5.71097144710995
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Bidirectional Encoder Representations from Transformers (BERT) has been shown to
be a promising way to dramatically improve performance across various Natural Language
Processing tasks [Devlin et al., 2019]. Meanwhile, progress made over the past few years by
various neural network architectures has also proved the effectiveness of neural networks in
the field of Natural Language Processing. In this project, the RoBERTa-wwm-ext
[Cui et al., 2019] pre-trained language model was adopted and fine-tuned for Chinese text
classification. The models were able to classify Chinese texts into two categories:
descriptions of legal behavior and descriptions of illegal behavior. Four different models
are also proposed in the paper. These models use RoBERTa-wwm-ext as their embedding layer
and feed the embeddings into different neural networks. The motivation behind proposing
these models is straightforward: by introducing a more complex output-layer architecture,
the overall performance of the models could be improved. All the models were trained on a
data set derived from Chinese public court records, and the performance of the different
models was compared. The experiments show that the proposed models failed to beat the
original RoBERTa-wwm-ext model in terms of accuracy and training efficiency.
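The paper does not include code, but the setup described above maps onto a standard fine-tuning recipe. The sketch below is a minimal, illustrative example rather than the authors' implementation: it loads a RoBERTa-wwm-ext checkpoint as an embedding layer and feeds the token representations into a small downstream head, here a BiLSTM chosen purely as a hypothetical stand-in for the paper's "more complex output layers". The checkpoint name hfl/chinese-roberta-wwm-ext, the BiLSTM head, and all hyper-parameters are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "hfl/chinese-roberta-wwm-ext"  # assumed Hugging Face checkpoint


class RobertaWwmClassifier(nn.Module):
    """RoBERTa-wwm-ext as an embedding layer, followed by a BiLSTM + linear head."""

    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(MODEL_NAME)
        hidden = self.encoder.config.hidden_size
        # Hypothetical "complex output layer": a BiLSTM over the token embeddings.
        self.bilstm = nn.LSTM(hidden, hidden // 2, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        token_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        lstm_out, _ = self.bilstm(token_states)
        # Classify from the representation at the [CLS] position.
        return self.classifier(lstm_out[:, 0, :])


tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = RobertaWwmClassifier()

# Toy batch with invented labels (0 = legal behavior, 1 = illegal behavior).
batch = tokenizer(
    ["示例文本一", "示例文本二"],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1]))
loss.backward()  # an optimizer step (e.g. AdamW) would complete one fine-tuning update
```

Dropping the BiLSTM and classifying directly from the encoder output would correspond to the plain RoBERTa-wwm-ext fine-tuning baseline, which, per the abstract, ultimately performed best.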
Related papers
- Can bidirectional encoder become the ultimate winner for downstream applications of foundation models? [1.8120356834558644]
Foundation models have the characteristics of pre-training, transfer learning, and self-supervised learning.
BERT broke through the limitation of using only unidirectional methods for language modeling in pre-training by introducing a masked language model.
This article analyzes unidirectional and bidirectional models based on GPT and BERT and compares their differences in light of each model's purpose.
arXiv Detail & Related papers (2024-11-27T03:31:14Z)
- SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models [64.40250409933752]
We build upon our previous publication by implementing a simple and efficient non-autoregressive (NAR) TTS framework, termed SimpleSpeech 2.
SimpleSpeech 2 effectively combines the strengths of both autoregressive (AR) and non-autoregressive (NAR) methods.
We show a significant improvement in generation performance and generation speed compared to our previous work and other state-of-the-art (SOTA) large-scale TTS models.
arXiv Detail & Related papers (2024-08-25T17:07:39Z)
- A Text-to-Text Model for Multilingual Offensive Language Identification [19.23565690468299]
This study presents the first pre-trained model with an encoder-decoder architecture for offensive language identification, built on text-to-text transformers (T5).
Our pre-trained T5 model outperforms other transformer-based models fine-tuned for offensive language detection, such as fBERT and HateBERT, in multiple English benchmarks.
Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5.
arXiv Detail & Related papers (2023-12-06T09:37:27Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- TunBERT: Pretrained Contextualized Text Representation for Tunisian Dialect [0.0]
We investigate the feasibility of training monolingual Transformer-based language models for under-represented languages.
We show that using noisy web-crawled data instead of structured data is more convenient for such a non-standardized language.
Our best performing TunBERT model reaches or improves the state-of-the-art in all three downstream tasks.
arXiv Detail & Related papers (2021-11-25T15:49:50Z)
- Neural Models for Offensive Language Detection [0.0]
Offensive language detection is an ever-growing natural language processing (NLP) application.
We believe that developing and comparing different machine learning models to fight such harmful content is an important and challenging goal of this thesis.
arXiv Detail & Related papers (2021-05-30T13:02:45Z)
- Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models [62.41139712595334]
We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
arXiv Detail & Related papers (2021-04-15T02:36:49Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Revisiting Pre-Trained Models for Chinese Natural Language Processing [73.65780892128389]
We revisit Chinese pre-trained language models to examine their effectiveness in a non-English language.
We also propose a model called MacBERT, which improves upon RoBERTa in several ways.
arXiv Detail & Related papers (2020-04-29T02:08:30Z)
- Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling [4.525267347429154]
We train a Transformer-based neural model on top of the BERT language model.
In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size.
The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset.
arXiv Detail & Related papers (2020-03-29T14:00:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.