Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain
- URL: http://arxiv.org/abs/2404.08262v3
- Date: Wed, 06 Nov 2024 16:19:24 GMT
- Title: Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain
- Authors: Kosuke Takahashi, Takahiro Omi, Kosuke Arima, Tatsuya Ishigaki,
- Abstract summary: This paper presents our findings from training and evaluating a Japanese business domain-specific LLM.
Our pretrained model and business domain benchmark are publicly available to support further studies.
- Score: 4.133477882188227
- License:
- Abstract: The development of Large Language Models (LLMs) in various languages has been advancing, but the combination of non-English languages with domain-specific contexts remains underexplored. This paper presents our findings from training and evaluating a Japanese business domain-specific LLM designed to better understand business-related documents, such as the news on current affairs, technical reports, and patents. Additionally, LLMs in this domain require regular updates to incorporate the most recent knowledge. Therefore, we also report our findings from the first experiments and evaluations involving updates to this LLM using the latest article data, which is an important problem setting that has not been addressed in previous research. From our experiments on a newly created benchmark dataset for question answering in the target domain, we found that (1) our pretrained model improves QA accuracy without losing general knowledge, and (2) a proper mixture of the latest and older texts in the training data for the update is necessary. Our pretrained model and business domain benchmark are publicly available to support further studies.
Related papers
- CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search [34.08551439233784]
CPRM is a framework designed for the continual pre-training of large language models (LLMs)
Our framework includes three modules: 1) employing both queries and multi-field item to jointly pre-train for enhancing domain knowledge, 2) applying in-context pre-training, and 3) conducting reading comprehension on items to produce associated domain knowledge and background information.
arXiv Detail & Related papers (2024-12-02T08:35:54Z) - A Practical Guide to Fine-tuning Language Models with Limited Data [9.413178499853156]
Employing pre-trained Large Language Models (LLMs) has become the de facto standard in Natural Language Processing (NLP) despite their extensive data requirements.
Motivated by the recent surge in research focused on training LLMs with limited data, this paper surveys recent transfer learning approaches to optimize model performance in downstream tasks where data is scarce.
arXiv Detail & Related papers (2024-11-14T15:55:37Z) - TransformLLM: Adapting Large Language Models via LLM-Transformed Reading Comprehension Text [5.523385345486362]
We have developed language models specifically designed for legal applications.
Our innovative approach significantly improves capabilities in legal tasks by using Large Language Models (LLMs) to convert raw training data into reading comprehension text.
arXiv Detail & Related papers (2024-10-28T19:32:18Z) - Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z) - BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z) - Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation.
We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques.
We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
arXiv Detail & Related papers (2023-12-29T14:25:22Z) - A Self-enhancement Approach for Domain-specific Chatbot Training via
Knowledge Mining and Digest [62.63606958140248]
Large Language Models (LLMs) often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains.
This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources.
We train a knowledge miner, namely LLMiner, which autonomously extracts Question-Answer pairs from relevant documents.
arXiv Detail & Related papers (2023-11-17T16:09:10Z) - Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP)
They provide a highly useful, task-agnostic foundation for a wide range of applications.
However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles.
arXiv Detail & Related papers (2023-05-30T03:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.