LegalTurk Optimized BERT for Multi-Label Text Classification and NER
- URL: http://arxiv.org/abs/2407.00648v1
- Date: Sun, 30 Jun 2024 10:19:54 GMT
- Title: LegalTurk Optimized BERT for Multi-Label Text Classification and NER
- Authors: Farnaz Zeidi, Mehmet Fatih Amasyali, Çiğdem Erol
- Abstract summary: We introduce a modified pre-training approach that combines diverse masking strategies.
In this work, we focus on two essential downstream tasks in the legal domain: named entity recognition and multi-label text classification.
Our modified approach demonstrated significant improvements in both NER and multi-label text classification tasks compared to the original BERT model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The introduction of the Transformer neural network, along with techniques like self-supervised pre-training and transfer learning, has paved the way for advanced models like BERT. Despite BERT's impressive performance, opportunities for further enhancement exist. To our knowledge, most efforts focus on improving BERT's performance in English and in general domains, with no study specifically addressing the legal Turkish domain. Our study is primarily dedicated to enhancing the BERT model within the legal Turkish domain through modifications in the pre-training phase. In this work, we introduce a modified pre-training approach that combines diverse masking strategies. For fine-tuning, we focus on two essential downstream tasks in the legal domain: named entity recognition and multi-label text classification. To evaluate our modified pre-training approach, we fine-tuned all customized models alongside the original BERT models to compare their performance. Our modified approach demonstrated significant improvements in both NER and multi-label text classification tasks compared to the original BERT model. Finally, to showcase the impact of our proposed models, we trained our best models with different corpus sizes and compared them with BERTurk models. The experimental results demonstrate that our innovative approach, despite being pre-trained on a smaller corpus, competes with BERTurk.
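The abstract does not detail which masking strategies are combined. The sketch below is a minimal illustration, assuming a per-example choice between standard BERT-style token masking and contiguous-span masking as one plausible way to mix strategies during MLM pre-training; the constants (MASK_ID, VOCAB_SIZE, masking ratios) and function names are illustrative, not the paper's.

```python
import random
from typing import List, Tuple

MASK_ID = 4          # hypothetical [MASK] id; use tokenizer.mask_token_id in practice
VOCAB_SIZE = 32000   # hypothetical vocabulary size
IGNORE_INDEX = -100  # label value ignored by the MLM loss

def token_masking(tokens: List[int], ratio: float = 0.15) -> Tuple[List[int], List[int]]:
    """Classic BERT-style masking: each selected position is replaced by
    [MASK] (80%), a random token (10%), or kept unchanged (10%)."""
    inputs, labels = tokens[:], [IGNORE_INDEX] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < ratio:
            labels[i] = tok
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)
    return inputs, labels

def span_masking(tokens: List[int], ratio: float = 0.15, max_span: int = 3) -> Tuple[List[int], List[int]]:
    """Contiguous-span masking: whole spans of 1..max_span tokens become [MASK]."""
    inputs, labels = tokens[:], [IGNORE_INDEX] * len(tokens)
    budget = max(1, int(len(tokens) * ratio))
    while budget > 0:
        span = random.randint(1, max_span)
        start = random.randrange(max(1, len(tokens) - span))
        for i in range(start, min(start + span, len(tokens))):
            labels[i] = tokens[i]
            inputs[i] = MASK_ID
        budget -= span
    return inputs, labels

def mixed_masking(tokens: List[int], p_span: float = 0.5) -> Tuple[List[int], List[int]]:
    """Combine strategies: each training example is masked by one strategy,
    chosen at random, so the model sees a mixture over the corpus."""
    strategy = span_masking if random.random() < p_span else token_masking
    return strategy(tokens)

if __name__ == "__main__":
    example = list(range(100, 130))  # stand-in for a tokenized legal sentence
    masked, labels = mixed_masking(example)
    print(masked)
    print(labels)
```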
Related papers
- BERTer: The Efficient One [0.0]
We explore advanced fine-tuning techniques to boost BERT's performance in sentiment analysis, paraphrase detection, and semantic textual similarity.
Our findings reveal substantial improvements in model efficiency and effectiveness when combining multiple fine-tuning architectures.
arXiv Detail & Related papers (2024-07-19T05:33:09Z)
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations, we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise (a minimal adapter and fusion sketch follows this entry).
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
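The entry above combines bottleneck adapters with layer-wise fusion of audio-visual features into BERT's text representations. Below is a minimal PyTorch sketch of those two components under assumed dimensions; the gated-fusion formulation and all names are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted into a frozen Transformer layer:
    down-project, non-linearity, up-project, residual connection."""
    def __init__(self, hidden: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class GatedFusion(nn.Module):
    """Layer-wise fusion (assumed gating scheme): a learned gate decides how
    much of the projected audio-visual features to mix into the text states."""
    def __init__(self, hidden: int = 768, av_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(av_dim, hidden)
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, text: torch.Tensor, av: torch.Tensor) -> torch.Tensor:
        av = self.proj(av)
        g = torch.sigmoid(self.gate(torch.cat([text, av], dim=-1)))
        return text + g * av

# usage: fuse per-token audio-visual features into BERT hidden states
text, av = torch.randn(2, 16, 768), torch.randn(2, 16, 128)
print(GatedFusion()(Adapter()(text), av).shape)  # torch.Size([2, 16, 768])
```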
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks; a minimal Mixture-of-Experts sketch follows this entry.
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
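MoEBERT adapts BERT's feed-forward layers into experts via importance-guided splitting and distillation; that procedure is not reproduced here. The block below is only a generic top-1-routed Mixture-of-Experts feed-forward sketch showing why per-token routing keeps inference cheap; the class name, sizes, and routing rule are assumptions.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Sketch of an MoE feed-forward block: the usual Transformer FFN is
    replaced by several smaller experts, and a router sends each token to
    its top-1 expert, so only a fraction of parameters is active per token."""
    def __init__(self, hidden: int = 768, expert_dim: int = 768, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(hidden, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, expert_dim), nn.GELU(), nn.Linear(expert_dim, hidden))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        flat = x.reshape(-1, x.size(-1))       # flatten (batch, seq) into tokens
        top = self.router(flat).argmax(dim=-1)  # top-1 expert index per token
        out = torch.zeros_like(flat)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = expert(flat[mask])
        return out.reshape_as(x)

# usage: stands in for the FFN sub-layer of a BERT encoder block
print(MoEFeedForward()(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])
```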
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
- Optimizing small BERTs trained for German NER [0.16058099298620418]
We investigate various training techniques of smaller BERT models and evaluate them on five public German NER tasks.
We propose two new fine-tuning techniques leading to better performance: CSE-tagging and a modified form of LCRF.
Furthermore, we introduce a new technique called WWA which reduces BERT memory usage and leads to a small increase in performance.
arXiv Detail & Related papers (2021-04-23T12:36:13Z)
- Comparing the Performance of NLP Toolkits and Evaluation measures in Legal Tech [0.0]
We compare and analyze the pretrained neural language models XLNet (autoregressive) and BERT (autoencoder) on legal tasks.
The XLNet model performs better on our sequence classification task of legal opinions classification, whereas BERT produces better results on the NER task.
We use domain-specific pretraining and additional legal vocabulary to further adapt the BERT model to the legal domain.
arXiv Detail & Related papers (2021-03-12T11:06:32Z)
- ConvBERT: Improving BERT with Span-based Dynamic Convolution [144.25748617961082]
BERT relies heavily on the global self-attention block and thus suffers from a large memory footprint and computation cost.
We propose a novel span-based dynamic convolution to replace these self-attention heads to directly model local dependencies.
The novel convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.
arXiv Detail & Related papers (2020-08-06T07:43:19Z)
- DagoBERT: Generating Derivational Morphology with a Pretrained Language Model [20.81930455526026]
We show that pretrained language models (PLMs) can generate derivationally complex words.
Our best model, DagoBERT, clearly outperforms the previous state of the art in derivation generation.
Our experiments show that the input segmentation crucially impacts BERT's derivational knowledge.
arXiv Detail & Related papers (2020-05-02T01:26:46Z)
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation [84.64004917951547]
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP.
In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation (a minimal sketch of both follows this entry).
arXiv Detail & Related papers (2020-02-24T16:17:12Z)
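The entry above names the two mechanisms without spelling them out; the sketch below is one common reading: the teacher holds a running average of the student's own recent weights (self-ensemble), and a penalty pulls the student's outputs toward the teacher's (self-distillation). The function name, loss weights, and averaging schedule are illustrative assumptions, and the model is assumed to map a batch of inputs to logits.

```python
import copy
import torch
import torch.nn.functional as F

def fine_tune_with_self_distillation(model, loader, optimizer, device="cpu",
                                     distill_weight=1.0, steps_per_update=10):
    """Sketch of self-ensemble + self-distillation fine-tuning: the teacher is a
    moving average of the student's past weights; each step adds an MSE penalty
    pulling the student's logits toward the teacher's."""
    teacher = copy.deepcopy(model).to(device)
    for p in teacher.parameters():
        p.requires_grad_(False)

    for step, (inputs, labels) in enumerate(loader):  # loader yields (inputs, labels) batches
        inputs, labels = inputs.to(device), labels.to(device)
        logits = model(inputs)
        with torch.no_grad():
            teacher_logits = teacher(inputs)

        loss = F.cross_entropy(logits, labels)
        loss = loss + distill_weight * F.mse_loss(logits, teacher_logits)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # periodically refresh the teacher with an average of recent student weights
        if (step + 1) % steps_per_update == 0:
            with torch.no_grad():
                for t_p, s_p in zip(teacher.parameters(), model.parameters()):
                    t_p.mul_(0.5).add_(s_p, alpha=0.5)
    return model
```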