DACP: Domain-Adaptive Continual Pre-Training of Large Language Models for Phone Conversation Summarization
- URL: http://arxiv.org/abs/2510.05858v3
- Date: Thu, 09 Oct 2025 12:35:24 GMT
- Title: DACP: Domain-Adaptive Continual Pre-Training of Large Language Models for Phone Conversation Summarization
- Authors: Xue-Yong Fu, Elena Khasanova, Md Tahmid Rahman Laskar, Harsh Saini, Shashi Bhushan TN,
- Abstract summary: Large language models (LLMs) have achieved impressive performance in text summarization. Fine-tuning can improve summarization quality, but it typically relies on costly and scarce high-quality labeled data. We explore continual pre-training as a scalable, self-supervised approach to adapt LLMs for downstream summarization tasks.
- Score: 10.083326281775939
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have achieved impressive performance in text summarization, yet their performance often falls short when applied to specialized domains that differ from their original pre-training distribution. While fine-tuning can improve summarization quality, it typically relies on costly and scarce high-quality labeled data. In this work, we explore continual pre-training as a scalable, self-supervised approach to adapt LLMs for downstream summarization tasks, particularly in the context of noisy real-world conversation transcripts. We conduct extensive experiments using large-scale, unlabeled business conversation data to investigate whether continual pre-training enhances model capabilities in conversational summarization. Our results demonstrate that continual pre-training yields substantial gains in both in-domain and out-of-domain summarization benchmarks, while maintaining strong generalization and robustness. We also analyze the effects of data selection strategies, providing practical guidelines for applying continual pre-training in summarization-focused industrial applications.
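Continual pre-training in this setting is ordinary self-supervised next-token prediction run on unlabeled domain text. As a minimal illustrative sketch (not the authors' pipeline; the function and parameter names are hypothetical), raw transcript token streams can be packed into fixed-length causal-LM training examples like this:

```python
def pack_transcripts(token_streams, seq_len=8, stride=8):
    """Pack unlabeled transcript token streams into fixed-length
    causal-LM examples: each example pairs input tokens with labels
    shifted one position ahead (next-token prediction)."""
    examples = []
    for tokens in token_streams:
        # Slide a window of seq_len + 1 tokens; split into inputs/labels.
        for start in range(0, len(tokens) - seq_len, stride):
            window = tokens[start:start + seq_len + 1]
            examples.append((window[:-1], window[1:]))
    return examples
```

Each (input, label) pair feeds the standard cross-entropy objective; no human-written summaries are required, which is what makes the approach self-supervised and scalable.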
Related papers
- Bagging-Based Model Merging for Robust General Text Embeddings [73.51674133699196]
General-purpose text embedding models underpin a wide range of NLP and information retrieval applications. We present a systematic study of multi-task training for text embeddings from two perspectives: data scheduling and model merging. We propose Bagging-based rObust mOdel Merging (BOOM), which trains multiple embedding models on sampled subsets and merges them into a single model.
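The merging step can be sketched with the generic parameter-wise averaging recipe for combining checkpoints trained on different data subsets; BOOM's exact procedure may differ, and the function name here is hypothetical:

```python
import numpy as np

def merge_checkpoints(state_dicts, weights=None):
    """Bagging-style model merging: parameter-wise weighted average of
    several checkpoints (dicts mapping parameter names to arrays)
    trained on different sampled subsets of the data."""
    n = len(state_dicts)
    weights = weights if weights is not None else [1.0 / n] * n
    return {key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
            for key in state_dicts[0]}
```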
arXiv Detail & Related papers (2026-02-05T15:45:08Z) - DACIP-RC: Domain Adaptive Continual Instruction Pre-Training via Reading Comprehension on Business Conversations [12.671996818071817]
We propose Domain Adaptive Continual Instruction Pre-Training via Reading Comprehension (DACIP-RC). Unlike conventional pre-training approaches that rely on next-token prediction, DACIP-RC generates diverse task instructions and responses via reading comprehension on conversation transcripts. Our empirical evaluations demonstrate that DACIP-RC significantly improves zero-shot generalization across a wide range of business conversational tasks.
arXiv Detail & Related papers (2025-10-09T12:35:20Z) - Mid-Training of Large Language Models: A Survey [12.322464058364405]
Large language models (LLMs) are typically developed through large-scale pre-training followed by task-specific fine-tuning. Recent advances highlight the importance of an intermediate mid-training stage. We introduce the first taxonomy of mid-training spanning data distribution, learning-rate scheduling, and long-context extension.
arXiv Detail & Related papers (2025-10-08T09:49:37Z) - Experience Scaling: Post-Deployment Evolution For Large Language Models [44.48142891798125]
We propose experience scaling, a framework for continuous post-deployment evolution for large language models (LLMs). We validate the framework in simulated real-world scenarios involving generalization to previously unseen but related tasks, repetitive queries, and over-saturated knowledge stores. Results demonstrate that structured post-deployment learning can extend LLM capabilities beyond the limits of static human-generated data.
arXiv Detail & Related papers (2025-09-23T08:04:58Z) - From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System [49.57258257916805]
Large Language Models (LLMs) demonstrate strong zero-shot recommendation capabilities. Practical applications often favor smaller, internally managed recommender models due to scalability, interpretability, and data privacy constraints. We propose an active data augmentation framework that synthesizes conversational training data by leveraging black-box LLMs guided by active learning techniques.
arXiv Detail & Related papers (2025-04-21T23:05:47Z) - Aligning Instruction Tuning with Pre-training [61.50161961371844]
We propose Aligning Instruction Tuning with Pre-training (AITP) to align instruction tuning with pre-training distributions. We show consistent performance improvements with AITP on three fully open large language models (LLMs) across eight benchmarks.
arXiv Detail & Related papers (2025-01-16T08:27:40Z) - Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLMs has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z) - Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency. This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
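The instance-reweighting idea can be sketched with the standard KL-constrained DRO solution, in which per-sample weights are a softmax over per-sample losses so that hard (high-loss) samples get more weight; the paper's precise formulation may differ, and `temperature` here is a hypothetical knob:

```python
import numpy as np

def instance_weights(losses, temperature=1.0):
    """Up-weight hard (high-loss) training instances via a softmax over
    per-sample losses, as in KL-constrained distributionally robust
    optimization. Lower temperature concentrates weight on hard samples."""
    z = np.asarray(losses, dtype=float) / temperature
    z -= z.max()  # numerical stability before exponentiating
    w = np.exp(z)
    return w / w.sum()
```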
arXiv Detail & Related papers (2024-02-22T04:10:57Z) - Controlled Randomness Improves the Performance of Transformer Models [4.678970068275123]
We introduce controlled randomness, i.e. noise, into the training process to improve fine-tuning language models.
We find that adding such noise can improve the performance in our two downstream tasks of joint named entity recognition and relation extraction and text summarization.
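One common way to realize such controlled randomness is to perturb the gradients with zero-mean Gaussian noise during fine-tuning. The sketch below is illustrative only; the paper's exact injection point (inputs, weights, or gradients) may differ, and the function name is hypothetical:

```python
import numpy as np

def noisy_sgd_step(params, grads, lr=0.1, noise_std=0.01, rng=None):
    """One SGD step with controlled Gaussian noise added to each
    gradient. Setting noise_std=0 recovers plain SGD."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return {name: p - lr * (grads[name] + rng.normal(0.0, noise_std, p.shape))
            for name, p in params.items()}
```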
arXiv Detail & Related papers (2023-10-20T14:12:55Z) - Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets.
Existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets.
We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z) - Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning -- simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z) - Contextualization and Generalization in Entity and Relation Extraction [0.0]
We study the behaviour of state-of-the-art models regarding generalization to facts unseen during training.
Traditional benchmarks present important lexical overlap between mentions and relations used for training and evaluating models.
We propose empirical studies to separate performance based on mention and relation overlap with the training set.
arXiv Detail & Related papers (2022-06-15T14:16:42Z) - Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining [10.750492932503649]
Training a large summarization model is generally infeasible due to the scarcity of dialogue data with annotated summaries.
We propose a multi-source pretraining paradigm to better leverage the external summary data.
Our approach achieves competitive performance and generalizes well in different dialogue scenarios.
arXiv Detail & Related papers (2021-09-09T07:47:16Z) - An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training [15.440627147018711]
We conduct an empirical investigation into known methods to mitigate catastrophic forgetting (CF).
We find that elastic weight consolidation provides the best overall scores, yielding only a 0.33% drop in performance across seven generic tasks.
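Elastic weight consolidation mitigates forgetting by adding a quadratic penalty that anchors parameters important to earlier tasks, with importance estimated by the Fisher information. A minimal sketch, assuming diagonal Fisher estimates (names are hypothetical, not from the paper):

```python
import numpy as np

def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """Elastic weight consolidation regularizer:
    0.5 * lam * sum_i F_i * (theta_i - theta*_i)^2,
    penalizing drift from the anchor (pre-trained) parameters in
    proportion to their diagonal Fisher importance."""
    return 0.5 * lam * sum(
        float(np.sum(fisher[k] * (params[k] - anchor_params[k]) ** 2))
        for k in params)
```

The penalty is added to the new task's loss, so parameters with high Fisher values (important to the old tasks) are held close to their pre-trained values while unimportant ones remain free to adapt.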
arXiv Detail & Related papers (2020-10-01T09:20:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.