EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models
with Semi-structured Data
- URL: http://arxiv.org/abs/2312.15696v1
- Date: Mon, 25 Dec 2023 11:31:47 GMT
- Title: EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models
with Semi-structured Data
- Authors: Shirong Ma, Shen Huang, Shulin Huang, Xiaobin Wang, Yangning Li,
Hai-Tao Zheng, Pengjun Xie, Fei Huang and Yong Jiang
- Abstract summary: Large Language Models (LLMs) pre-trained on massive corpora have exhibited remarkable performance on various NLP tasks.
However, applying these models to specific domains still poses significant challenges, such as a lack of domain knowledge.
We focus on domain-specific continual pre-training of LLMs, using the E-commerce domain as an exemplar.
- Score: 67.8302955948861
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) pre-trained on massive corpora have exhibited
remarkable performance on various NLP tasks. However, applying these models to
specific domains still poses significant challenges, such as a lack of domain
knowledge, limited capacity to leverage domain knowledge, and inadequate
adaptation to domain-specific data formats. Considering the exorbitant cost of
training LLMs from scratch and the scarcity of annotated data within particular
domains, in this work we focus on domain-specific continual pre-training of
LLMs, using the E-commerce domain as an exemplar. Specifically, we explore the
impact of continual pre-training on LLMs employing unlabeled general and
E-commerce corpora. Furthermore, we design a mixing strategy among different
data sources to better leverage E-commerce semi-structured data. We construct
multiple tasks to assess LLMs' few-shot in-context learning ability and their
zero-shot performance after instruction tuning in the E-commerce domain.
Experimental results demonstrate the effectiveness of continual pre-training of
E-commerce LLMs and the efficacy of our devised data mixing strategy.
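The abstract does not spell out the mixing strategy, but the general idea of interleaving general-domain text with E-commerce semi-structured data can be sketched as a weighted sampler over data sources. The snippet below is a minimal illustration under assumed names and ratios (mix_corpora, the 70/30 split, and the tab-separated product record are all hypothetical), not the authors' released pipeline.

```python
import random
from typing import Dict, Iterator, List

def mix_corpora(sources: Dict[str, List[str]],
                weights: Dict[str, float],
                seed: int = 0) -> Iterator[str]:
    """Yield training documents by sampling each source in proportion to its
    weight, so general and E-commerce data stay interleaved during training."""
    rng = random.Random(seed)
    names = list(sources)
    probs = [weights[n] for n in names]
    iters = {n: iter(sources[n]) for n in names}
    while iters:
        name = rng.choices(names, weights=probs, k=1)[0]
        try:
            yield next(iters[name])
        except StopIteration:
            # A source ran out: drop it and keep sampling from the rest.
            idx = names.index(name)
            del names[idx], probs[idx], iters[name]

# Hypothetical usage: 70% general web text, 30% semi-structured product records.
stream = mix_corpora(
    {"general": ["a general-domain document", "another general document"],
     "ecommerce": ["title: ...\tbrand: ...\tcategory: ..."]},
    {"general": 0.7, "ecommerce": 0.3},
)
print(list(stream))
```

Sampling per document rather than concatenating corpora keeps the two data sources mixed at the batch level, which is the usual motivation for such mixing strategies in continual pre-training.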
Related papers
- Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs' Domain-Specific Insight Learning? [4.390998479503661]
Large Language Models (LLMs) have demonstrated remarkable performance on various tasks.
However, their ability to extract and internalize deeper insights from domain-specific datasets remains underexplored.
This study investigates how continual pre-training can enhance LLMs' capacity for insight learning.
arXiv Detail & Related papers (2025-01-29T18:40:32Z)
- Commute Your Domains: Trajectory Optimality Criterion for Multi-Domain Learning [50.80758278865274]
In multi-domain learning, a single model is trained on diverse data domains to leverage shared knowledge and improve generalization.
The order in which the data from these domains is used for training can significantly affect the model's performance on each domain.
We investigate the influence of training order (or data mixing) in multi-domain learning using the concept of Lie bracket of gradient vector fields.
arXiv Detail & Related papers (2025-01-26T15:12:06Z)
- Unified Parameter-Efficient Unlearning for LLMs [25.195126838721492]
Large Language Models (LLMs) have revolutionized natural language processing, enabling advanced understanding and reasoning capabilities across a variety of tasks.
However, this raises significant privacy and security concerns, as models may inadvertently retain and disseminate sensitive or undesirable information from their training data.
We introduce a novel instance-wise unlearning framework, LLMEraser, which systematically categorizes unlearning tasks and applies precise adjustments using influence functions.
arXiv Detail & Related papers (2024-11-30T07:21:02Z)
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLMs (Multimodal Large Language Models) has become a common practice for improving performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring parameter importance under both the pre-training and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- A Practical Guide to Fine-tuning Language Models with Limited Data [9.413178499853156]
Employing pre-trained Large Language Models (LLMs) has become the de facto standard in Natural Language Processing (NLP) despite their extensive data requirements.
Motivated by the recent surge in research focused on training LLMs with limited data, this paper surveys recent transfer learning approaches to optimize model performance in downstream tasks where data is scarce.
arXiv Detail & Related papers (2024-11-14T15:55:37Z)
- Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to explain the performance gap empirically.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
- Investigating LLM Applications in E-Commerce [17.854070801235217]
Large Language Models (LLMs) have revolutionized natural language processing across various applications, especially in e-commerce.
This paper explores the efficacy of LLMs in the e-commerce domain, focusing on instruction-tuning an open-source LLM with public e-commerce datasets of varying sizes.
We also examine the effectiveness of the current niche industrial practice of applying very large LLMs with in-context learning to e-commerce-specific tasks.
arXiv Detail & Related papers (2024-08-23T00:57:37Z)
- Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data [53.70870879858533]
We introduce a Federated Domain-specific Knowledge Transfer (FDKT) framework.
It enables domain-specific knowledge transfer from LLMs to small language models (SLMs) while preserving clients' data privacy.
The proposed FDKT framework consistently and greatly improves SLMs' task performance by around 5% with a privacy budget of less than 10.
arXiv Detail & Related papers (2024-05-23T06:14:35Z)
- BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z)
- Fine-tuning Large Enterprise Language Models via Ontological Reasoning [5.12835891233968]
Large Language Models (LLMs) rely on fine-tuning with task-specific training data to adapt to diverse goals.
We propose a novel neurosymbolic architecture that leverages the power of ontological reasoning to build task- and domain-specific corpora for LLM fine-tuning.
arXiv Detail & Related papers (2023-06-19T06:48:45Z)