Harnessing the Power of BERT in the Turkish Clinical Domain: Pretraining
Approaches for Limited Data Scenarios
- URL: http://arxiv.org/abs/2305.03788v1
- Date: Fri, 5 May 2023 18:39:07 GMT
- Authors: Hazal Türkmen, Oğuz Dikenelli, Cenk Eraslan, Mehmet Cem Çallı, Süha Süreyya Özbek
- Abstract summary: General Turkish BERT model (BERTurk) and TurkRadBERT-task v1, both of which utilize knowledge from a substantial general-domain corpus, demonstrate the best overall performance.
Our results underscore the significance of domain-specific vocabulary during pre-training for enhancing model performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, major advancements in natural language processing (NLP) have
been driven by the emergence of large language models (LLMs), which have
significantly revolutionized research and development within the field.
Building upon this progress, our study delves into the effects of various
pre-training methodologies on Turkish clinical language models' performance in
a multi-label classification task involving radiology reports, with a focus on
addressing the challenges posed by limited language resources. Additionally, we
evaluated the simultaneous pretraining approach by utilizing limited clinical
task data for the first time. We developed four models, including
TurkRadBERT-task v1, TurkRadBERT-task v2, TurkRadBERT-sim v1, and
TurkRadBERT-sim v2. Our findings indicate that the general Turkish BERT model
(BERTurk) and TurkRadBERT-task v1, both of which utilize knowledge from a
substantial general-domain corpus, demonstrate the best overall performance.
Although the task-adaptive pre-training approach has the potential to capture
domain-specific patterns, it is constrained by the limited task-specific corpus
and may be susceptible to overfitting. Furthermore, our results underscore the
significance of domain-specific vocabulary during pre-training for enhancing
model performance. Ultimately, we observe that the combination of
general-domain knowledge and task-specific fine-tuning is essential for
achieving optimal performance across a range of categories. This study offers
valuable insights for developing effective Turkish clinical language models and
can guide future research on pre-training techniques for other low-resource
languages within the clinical domain.
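The multi-label setting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which each label gets its own sigmoid score and the training loss is the mean binary cross-entropy over labels; the abstract does not state the loss or threshold the authors used, so the function names, the 0.5 threshold, and the example logits below are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Score each label independently; a report may receive several labels."""
    # The 0.5 threshold is an assumption, not a value taken from the paper.
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def bce_loss(logits, targets):
    """Mean binary cross-entropy over labels (assumed training objective)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1.0 - p + eps)
    return total / len(logits)


if __name__ == "__main__":
    # Hypothetical logits for three report categories.
    example_logits = [2.0, -1.5, 0.3]
    print(predict_labels(example_logits))
    print(bce_loss(example_logits, [1, 0, 1]))
```

Unlike a softmax classifier, this formulation does not force the label probabilities to sum to one, which is why a single report can be assigned several categories at once.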
Related papers
- Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking [1.3716808114696444]
Large Language Models (LLMs) are becoming crucial across various fields, emphasizing the urgency for high-quality models in underrepresented languages.
This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations.
arXiv Detail & Related papers (2024-05-07T21:58:45Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- Fine-tuning Transformer-based Encoder for Turkish Language Understanding Tasks [0.0]
We provide a Transformer-based model and a baseline benchmark for the Turkish Language.
We successfully fine-tuned a Turkish BERT model, namely BERTurk, on many downstream tasks and evaluated it with the Turkish Benchmark dataset.
arXiv Detail & Related papers (2024-01-30T19:27:04Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain [22.846469609263416]
We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show how CLIN-X outperforms other pre-trained transformer models.
Our studies reveal stable model performance despite a lack of annotated data with improvements of up to 47 F1 points when only 250 labeled sentences are available.
Our results highlight the importance of specialized language models such as CLIN-X for concept extraction in non-standard domains.
arXiv Detail & Related papers (2021-12-16T10:07:39Z)
- XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation [80.18830380517753]
We develop a new task-agnostic distillation framework XtremeDistilTransformers.
We study the transferability of several source tasks, augmentation resources and model architecture for distillation.
arXiv Detail & Related papers (2021-06-08T17:49:33Z)
- Structured Prediction as Translation between Augmented Natural Languages [109.50236248762877]
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks.
Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages.
Our approach can match or outperform task-specific models on all tasks, and in particular, achieves new state-of-the-art results on joint entity and relation extraction.
arXiv Detail & Related papers (2021-01-14T18:32:21Z)
- Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning [30.5853328612593]
In this work, we explore fine-tuning methods for BERT, a pre-trained Transformer-based language model.
Our experimental results show an advantage in model performance by maximizing the approximate knowledge gain of the model.
We analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters.
arXiv Detail & Related papers (2020-12-04T08:34:39Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)