Efficient Domain-adaptive Continual Pretraining for the Process Industry in the German Language
- URL: http://arxiv.org/abs/2504.19856v2
- Date: Wed, 30 Apr 2025 07:13:15 GMT
- Title: Efficient Domain-adaptive Continual Pretraining for the Process Industry in the German Language
- Authors: Anastasia Zhukova, Christian E. Matt, Terry Ruas, Bela Gipp
- Abstract summary: Domain-adaptive continual pretraining (DAPT) is a state-of-the-art technique that further trains a language model (LM) on its pretraining task, e.g., language masking. This paper introduces an efficient approach called ICL-augmented pretraining (ICL-APT) that leverages in-context learning (ICL) and k-nearest neighbors (kNN) to augment target data with domain-related and in-domain texts. Our results show that this approach outperforms traditional DAPT by 3.5 points on the average of IR metrics while requiring almost 4 times less computing time.
- Score: 5.886032029544411
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Domain-adaptive continual pretraining (DAPT) is a state-of-the-art technique that further trains a language model (LM) on its pretraining task, e.g., language masking. Although popular, it requires a significant corpus of domain-related data, which is difficult to obtain for specific domains in languages other than English, such as the process industry in the German language. This paper introduces an efficient approach called ICL-augmented pretraining, or ICL-APT, that leverages in-context learning (ICL) and k-nearest neighbors (kNN) to augment target data with domain-related and in-domain texts, significantly reducing GPU time while maintaining strong model performance. Our results show that this approach outperforms traditional DAPT by 3.5 points on the average of IR metrics (mAP, MRR, and nDCG) while requiring almost 4 times less computing time, providing a cost-effective solution for industries with limited computational capacity. The findings highlight the broader applicability of this framework to other low-resource industries, making NLP-based solutions more accessible and feasible in production environments.
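The retrieval step at the heart of ICL-APT can be pictured as follows: for every document in the small target corpus, the k nearest neighbours from a larger pool of domain-related and in-domain texts are retrieved and attached as additional pretraining context. The snippet below is a minimal sketch of that idea using TF-IDF vectors and scikit-learn's NearestNeighbors; the embedding method, the example German texts, and the value of k are placeholders, not the configuration reported in the paper.

```python
# Minimal sketch of kNN-based corpus augmentation (assumed setup, not the
# authors' exact pipeline): for each target document, retrieve the k most
# similar texts from a larger domain-related pool and concatenate them so the
# augmented corpus can be used for continual pretraining.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

target_docs = [
    "Störung an der Dosierpumpe im Reaktor 3 festgestellt.",
    "Wartungsprotokoll: Filterwechsel an der Abluftanlage.",
]
pool_docs = [
    "Die Dosierpumpe wurde nach der Reparatur wieder in Betrieb genommen.",
    "Abluftanlage gereinigt und Filter ersetzt.",
    "Quartalsbericht der Finanzabteilung.",
    "Druckabfall im Kühlkreislauf dokumentiert.",
]

k = 2  # neighbours per target document (assumed value)

# Fit the vectorizer on both corpora so they share one feature space.
vectorizer = TfidfVectorizer().fit(target_docs + pool_docs)
pool_vecs = vectorizer.transform(pool_docs)
target_vecs = vectorizer.transform(target_docs)

# Cosine-distance kNN over the domain-related pool.
index = NearestNeighbors(n_neighbors=k, metric="cosine").fit(pool_vecs)
_, neighbor_ids = index.kneighbors(target_vecs)

# Augment each target document with its retrieved neighbours.
augmented_corpus = [
    " ".join([target_docs[i]] + [pool_docs[j] for j in neighbor_ids[i]])
    for i in range(len(target_docs))
]
for doc in augmented_corpus:
    print(doc)
```

The augmented corpus, rather than the raw target corpus alone, then serves as input to the masked-language-modeling stage, which is where the reported GPU-time savings come from.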
Related papers
- Multi-stage Training of Bilingual Islamic LLM for Neural Passage Retrieval [0.0]
The research employs a language reduction technique to create a lightweight bilingual large language model (LLM). Our approach for domain adaptation addresses the unique challenges faced in the Islamic domain, where substantial in-domain corpora exist only in Arabic. The findings suggest that combining domain adaptation and a multi-stage training method for the bilingual Islamic neural retrieval model enables it to outperform monolingual models on downstream retrieval tasks.
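"Language reduction" typically means shrinking a multilingual model by keeping only the vocabulary actually used by the target languages. The sketch below illustrates that idea with plain NumPy; the matrix shapes and token ids are placeholders, not taken from the paper.

```python
# Rough illustration of language reduction via vocabulary pruning (assumed
# approach): keep only token ids observed in the bilingual corpus and slice
# the embedding matrix down to that subset.
import numpy as np

vocab_size, hidden = 250_000, 768            # typical multilingual LM shapes (assumed)
embeddings = np.random.rand(vocab_size, hidden).astype("float32")

# Token ids observed when tokenizing the bilingual corpus (placeholder ids).
observed_ids = sorted({101, 102, 2023, 7592, 104512, 180034})

# Old-id -> new-id mapping and the pruned embedding matrix.
id_map = {old: new for new, old in enumerate(observed_ids)}
pruned_embeddings = embeddings[observed_ids]

print(pruned_embeddings.shape)   # (6, 768): a much smaller embedding table
print(id_map[2023])              # remapped id used by the reduced tokenizer
```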
arXiv Detail & Related papers (2025-01-17T13:17:42Z)
- Building an Efficient Multilingual Non-Profit IR System for the Islamic Domain Leveraging Multiprocessing Design in Rust [0.0]
This work focuses on the development of a multilingual non-profit IR system for the Islamic domain.
By employing methods like continued pre-training for domain adaptation and language reduction to decrease model size, a lightweight multilingual retrieval model was prepared.
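Continued pre-training for domain adaptation usually amounts to plain masked language modeling on in-domain text. A compact sketch with Hugging Face transformers follows; the checkpoint name, example sentences, and hyperparameters are placeholders rather than the configuration used in this paper.

```python
# Sketch of continued masked-language-model pretraining on in-domain text
# (placeholder checkpoint, data, and hyperparameters).
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import Dataset

checkpoint = "bert-base-multilingual-cased"  # assumed starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

in_domain_texts = ["Example in-domain sentence one.",
                   "Example in-domain sentence two."]
dataset = Dataset.from_dict({"text": in_domain_texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                            mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapt-ckpt", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```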
arXiv Detail & Related papers (2024-11-09T11:37:18Z)
- TransformLLM: Adapting Large Language Models via LLM-Transformed Reading Comprehension Text [5.523385345486362]
We have developed language models specifically designed for legal applications.
Our innovative approach significantly improves capabilities in legal tasks by using Large Language Models (LLMs) to convert raw training data into reading comprehension text.
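Turning raw text into reading-comprehension material is essentially a prompting task. The sketch below shows one plausible prompt template and keeps the generation backend abstract (any chat-LLM client can be passed in as `generate`); the prompt wording is an assumption, not the one used by TransformLLM.

```python
# Hypothetical prompt template that turns a raw legal passage into
# reading-comprehension training text; `generate` is any callable that sends
# the prompt to an LLM and returns its reply.
from typing import Callable

PROMPT_TEMPLATE = (
    "Read the following passage and write three question-answer pairs that "
    "test comprehension of it.\n\nPassage:\n{passage}\n\nQ&A pairs:"
)

def to_reading_comprehension(passage: str, generate: Callable[[str], str]) -> str:
    """Convert one raw passage into Q&A-style pretraining text."""
    return generate(PROMPT_TEMPLATE.format(passage=passage))

# Example with a stub backend (replace with a real LLM client).
def fake_llm(prompt: str) -> str:
    return "Q: ...\nA: ..."

print(to_reading_comprehension("The lessee shall return the premises ...", fake_llm))
```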
arXiv Detail & Related papers (2024-10-28T19:32:18Z)
- Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention [71.12193680015622]
Large Language Models (LLMs) have shown remarkable capabilities in natural language processing.
LLMs exhibit significant performance gaps among different languages.
We propose Inference-Time Cross-Lingual Intervention (INCLINE) to overcome these limitations without incurring significant costs.
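An inference-time cross-lingual intervention operates on hidden representations rather than on text: representations of a low-resource-language input are mapped toward the representations the model would produce for English. The toy sketch below uses an ordinary least-squares alignment learned from parallel examples; the layer choice, data, and closed-form solver are assumptions, not details from the INCLINE paper.

```python
# Toy sketch of an inference-time cross-lingual intervention: learn a linear
# map from source-language hidden states to English hidden states on parallel
# data, then apply it to new hidden states before the model's later layers.
import numpy as np

rng = np.random.default_rng(0)
hidden, n_pairs = 64, 500

# Hidden states of parallel sentences (placeholders for real model activations).
H_src = rng.normal(size=(n_pairs, hidden))   # e.g. German inputs
H_eng = rng.normal(size=(n_pairs, hidden))   # same sentences in English

# Closed-form least-squares alignment W minimising ||H_src @ W - H_eng||^2.
W, *_ = np.linalg.lstsq(H_src, H_eng, rcond=None)

def intervene(hidden_state: np.ndarray) -> np.ndarray:
    """Shift a source-language representation toward its English counterpart."""
    return hidden_state @ W

new_state = rng.normal(size=(hidden,))
print(intervene(new_state).shape)  # (64,): aligned representation fed onward
```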
arXiv Detail & Related papers (2024-10-16T11:23:03Z)
- Enhancing Medical Specialty Assignment to Patients using NLP Techniques [0.0]
We propose an alternative approach that achieves superior performance while being computationally efficient.
Specifically, we utilize keywords to train a deep learning architecture that outperforms a language model pretrained on a large corpus of text.
Our results demonstrate that utilizing keywords for text classification significantly improves classification performance.
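The idea is to restrict the input representation to informative keywords instead of full documents before training a comparatively small classifier. A minimal stand-in using a capped TF-IDF vocabulary and logistic regression is sketched below; the keyword-selection heuristic, example texts, and classifier are assumptions.

```python
# Minimal sketch of keyword-based text classification: keep only a small
# TF-IDF vocabulary as a crude keyword filter, then train a small classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "patient reports chest pain and shortness of breath",
    "skin rash and persistent itching on both arms",
    "irregular heartbeat noted during examination",
    "eczema flare-up treated with topical cream",
]
labels = ["cardiology", "dermatology", "cardiology", "dermatology"]

# Limiting max_features acts as the keyword filter (assumed heuristic).
model = make_pipeline(TfidfVectorizer(max_features=50, stop_words="english"),
                      LogisticRegression(max_iter=1000))
model.fit(docs, labels)
print(model.predict(["sudden chest pain while climbing stairs"]))
```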
arXiv Detail & Related papers (2023-12-09T14:13:45Z)
- Non-Parametric Domain Adaptation for End-to-End Speech Translation [72.37869362559212]
End-to-End Speech Translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters.
We propose a novel non-parametric method that leverages domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system.
arXiv Detail & Related papers (2022-05-23T11:41:02Z)
- Clinical Prompt Learning with Frozen Language Models [4.077071350659386]
Large but frozen pre-trained language models (PLMs) with prompt learning outperform smaller but fine-tuned models.
We investigated the viability of prompt learning on clinically meaningful decision tasks.
Results are partially in line with the prompt learning literature, with prompt learning able to match or improve on traditional fine-tuning.
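Prompt learning with a frozen PLM commonly means training only a small set of continuous "soft prompt" vectors that are prepended to the input embeddings while all model weights stay fixed. A bare-bones sketch is below; the checkpoint, prompt length, and classification setup are placeholders rather than the paper's clinical configuration.

```python
# Sketch of soft prompt learning with a frozen model: only a small matrix of
# prompt embeddings is trained (checkpoint and sizes are assumed).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "distilbert-base-uncased"      # placeholder PLM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

for p in model.parameters():                # freeze the language model
    p.requires_grad = False

n_prompt, hidden = 20, model.config.dim     # DistilBERT exposes hidden size as `dim`
prompt = torch.nn.Parameter(torch.randn(n_prompt, hidden) * 0.02)
optimizer = torch.optim.Adam([prompt], lr=1e-3)

batch = tokenizer(["patient discharged in stable condition"], return_tensors="pt")
input_embeds = model.get_input_embeddings()(batch["input_ids"])

# Prepend the learnable prompt to the token embeddings.
inputs_embeds = torch.cat([prompt.unsqueeze(0), input_embeds], dim=1)
attention_mask = torch.cat(
    [torch.ones(1, n_prompt, dtype=torch.long), batch["attention_mask"]], dim=1)

out = model(inputs_embeds=inputs_embeds, attention_mask=attention_mask,
            labels=torch.tensor([1]))
out.loss.backward()                         # gradients flow only into `prompt`
optimizer.step()
```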
arXiv Detail & Related papers (2022-05-11T14:25:13Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
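Feature Density is a simple corpus statistic; one common formulation, assumed here, is the number of unique features divided by the total number of feature occurrences in the dataset, computed over whichever linguistically-backed preprocessing is being compared. The short sketch below uses that formulation with word n-grams; it is not the paper's exact pipeline.

```python
# Sketch of a Feature Density estimate: unique features / total feature
# occurrences (one common formulation; the paper's preprocessing pipelines differ).
from sklearn.feature_extraction.text import CountVectorizer

def feature_density(corpus, ngram_range=(1, 1)):
    counts = CountVectorizer(ngram_range=ngram_range).fit_transform(corpus)
    unique_features = counts.shape[1]       # number of distinct features
    total_occurrences = counts.sum()        # all feature tokens in the corpus
    return unique_features / total_occurrences

corpus = [
    "you are so stupid",
    "have a nice day",
    "nobody likes you at all",
]
print(round(feature_density(corpus), 3))            # word unigrams
print(round(feature_density(corpus, (1, 2)), 3))    # unigrams + bigrams
```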
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation [61.27321597981737]
$k$NN-MT has shown the promising capability of combining a pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor retrieval.
We propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval.
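The underlying mechanism, shared with standard kNN-MT, is a datastore of decoder states paired with the target tokens that followed them; at decoding time the retrieved neighbours are turned into a distribution and interpolated with the NMT model's own distribution. The toy NumPy sketch below shows that interpolation step; the hidden size, temperature, and interpolation weight are arbitrary placeholder values.

```python
# Toy sketch of kNN-MT-style retrieval: a datastore maps decoder hidden states
# to the target token that followed them; retrieved neighbours are converted
# into a distribution and interpolated with the NMT distribution.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
vocab_size, hidden, n_entries = 10, 16, 200

# Datastore built offline from in-domain target-language text (placeholders).
keys = rng.normal(size=(n_entries, hidden))           # decoder states
values = rng.integers(0, vocab_size, size=n_entries)  # next-token ids

index = NearestNeighbors(n_neighbors=8).fit(keys)

def knn_augmented_probs(decoder_state, model_probs, lam=0.5, temperature=10.0):
    """Interpolate the NMT distribution with a kNN distribution."""
    dists, ids = index.kneighbors(decoder_state[None, :])
    weights = np.exp(-dists[0] / temperature)
    knn_probs = np.zeros(vocab_size)
    for w, tok in zip(weights, values[ids[0]]):
        knn_probs[tok] += w
    knn_probs /= knn_probs.sum()
    return lam * knn_probs + (1.0 - lam) * model_probs

state = rng.normal(size=hidden)
model_probs = np.full(vocab_size, 1.0 / vocab_size)
print(knn_augmented_probs(state, model_probs).round(3))
```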
arXiv Detail & Related papers (2021-09-14T11:50:01Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training [15.440627147018711]
We conduct an empirical investigation into known methods to mitigate catastrophic forgetting (CF). We find that elastic weight consolidation provides the best overall scores, yielding only a 0.33% drop in performance across seven generic tasks.
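Elastic weight consolidation counteracts forgetting by penalising movement of parameters that were important for earlier tasks, weighted by an estimate of the Fisher information: L(θ) = L_task(θ) + (λ/2) Σ_i F_i (θ_i − θ*_i)². A compact PyTorch sketch of that penalty is shown below; the λ value, the tiny model, and the all-ones Fisher estimate are placeholders.

```python
# Sketch of the elastic weight consolidation (EWC) penalty:
# L(theta) = L_task(theta) + (lambda / 2) * sum_i F_i * (theta_i - theta_star_i)^2
import torch

def ewc_penalty(model, old_params, fisher, lam=0.4):
    """Quadratic penalty keeping parameters close to their old-task values."""
    penalty = torch.zeros(())
    for name, param in model.named_parameters():
        penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# Minimal usage example with a toy model and a placeholder Fisher estimate.
model = torch.nn.Linear(4, 2)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
task_loss = torch.nn.functional.cross_entropy(model(x), y)
loss = task_loss + ewc_penalty(model, old_params, fisher)
loss.backward()
print(float(loss))
```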
arXiv Detail & Related papers (2020-10-01T09:20:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.