DS-TOD: Efficient Domain Specialization for Task Oriented Dialog
- URL: http://arxiv.org/abs/2110.08395v1
- Date: Fri, 15 Oct 2021 22:25:51 GMT
- Title: DS-TOD: Efficient Domain Specialization for Task Oriented Dialog
- Authors: Chia-Chien Hung, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš
- Abstract summary: Self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD).
We investigate the effects of domain specialization of pretrained language models (PLMs) for task-oriented dialog.
We propose a resource-efficient and modular domain specialization by means of domain adapters.
- Score: 12.395323315744625
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent work has shown that self-supervised dialog-specific pretraining on
large conversational datasets yields substantial gains over traditional
language modeling (LM) pretraining in downstream task-oriented dialog (TOD).
These approaches, however, exploit general dialogic corpora (e.g., Reddit) and
thus presumably fail to reliably embed domain-specific knowledge useful for
concrete downstream TOD domains. In this work, we investigate the effects of
domain specialization of pretrained language models (PLMs) for task-oriented
dialog. Within our DS-TOD framework, we first automatically extract salient
domain-specific terms, and then use them to construct DomainCC and DomainReddit
-- resources that we leverage for domain-specific pretraining, based on (i)
masked language modeling (MLM) and (ii) response selection (RS) objectives,
respectively. We further propose a resource-efficient and modular domain
specialization by means of domain adapters -- additional parameter-light layers
in which we encode the domain knowledge. Our experiments with two prominent TOD
tasks -- dialog state tracking (DST) and response retrieval (RR) --
encompassing five domains from the MultiWOZ TOD benchmark demonstrate the
effectiveness of our domain specialization approach. Moreover, we show that the
light-weight adapter-based specialization (1) performs comparably to full
fine-tuning in single-domain setups and (2) is particularly suitable for
multi-domain specialization, in which, besides advantageous computational
footprint, it can offer better downstream performance.
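The abstract above describes encoding domain knowledge in parameter-light adapter layers but includes no code, so the following is a minimal sketch of the standard bottleneck adapter design (down-projection, nonlinearity, up-projection, residual connection) that such layers typically follow. The class name, hidden size, and bottleneck size are illustrative assumptions, not details from the DS-TOD implementation.

```python
# Minimal sketch of a parameter-light bottleneck adapter, assuming the
# common down-project / nonlinearity / up-project / residual design.
# Names and sizes are illustrative, not taken from the DS-TOD code.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Residual bottleneck inserted after a transformer sub-layer."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)  # down-projection
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_size, hidden_size)    # up-projection

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the frozen PLM representation passes through
        # unchanged, and the adapter adds a small domain-specific correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


if __name__ == "__main__":
    adapter = BottleneckAdapter(hidden_size=768, bottleneck_size=48)
    layer_output = torch.randn(2, 16, 768)   # (batch, seq_len, hidden)
    print(adapter(layer_output).shape)       # torch.Size([2, 16, 768])
```

During domain-specific pretraining on DomainCC (MLM) or DomainReddit (RS), only such adapter parameters would be updated while the underlying PLM stays frozen, which is what keeps the specialization modular and resource-efficient.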
Related papers
- A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation [52.0964459842176]
Current state-of-the-art dialogue systems heavily rely on extensive training datasets.
We propose a novel data Augmentation framework for Multi-Domain Dialogue Generation, referred to as AMD$^2$G.
The AMD$^2$G framework consists of a data augmentation process and a two-stage training approach: domain-agnostic training and domain adaptation training.
arXiv Detail & Related papers (2024-06-14T09:52:27Z) - More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs [40.54076184225558]
The performance of Large Language Models (LLMs) on general tasks decreases after they are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF).
This paper presents a challenge for the real-world application of domain-specific LLMs beyond CF, called General Capabilities Integration (GCI).
The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and utilize both sets of skills in a cohesive manner to enhance performance on domain-specific tasks.
arXiv Detail & Related papers (2024-05-28T05:00:12Z) - Boosting Large Language Models with Continual Learning for Aspect-based Sentiment Analysis [33.86086075084374]
Aspect-based sentiment analysis (ABSA) is an important subtask of sentiment analysis.
We propose a Large Language Model-based Continual Learning (LLM-CL) model for ABSA.
arXiv Detail & Related papers (2024-05-09T02:00:07Z) - Unified Language-driven Zero-shot Domain Adaptation [55.64088594551629]
Unified Language-driven Zero-shot Domain Adaptation (ULDA) is a novel task setting.
It enables a single model to adapt to diverse target domains without explicit domain-ID knowledge.
arXiv Detail & Related papers (2024-04-10T16:44:11Z) - Role Prompting Guided Domain Adaptation with General Capability Preserve
for Large Language Models [55.51408151807268]
When tailored to specific domains, Large Language Models (LLMs) tend to experience catastrophic forgetting.
Crafting a versatile model for multiple domains simultaneously often results in a decline in overall performance.
We present the RolE Prompting Guided Multi-Domain Adaptation (REGA) strategy.
arXiv Detail & Related papers (2024-03-05T08:22:41Z) - Domain Prompt Learning with Quaternion Networks [49.45309818782329]
We propose to leverage domain-specific knowledge from domain-specific foundation models to transfer the robust recognition ability of Vision-Language Models to specialized domains.
We present a hierarchical approach that generates vision prompt features by analyzing intermodal relationships between hierarchical language prompt features and domain-specific vision features.
Our proposed method achieves new state-of-the-art results in prompt learning.
arXiv Detail & Related papers (2023-12-12T08:49:39Z) - Zero-Shot Generalizable End-to-End Task-Oriented Dialog System using
Context Summarization and Domain Schema [2.7178968279054936]
State-of-the-art approaches in task-oriented dialog systems formulate the problem as a conditional sequence generation task.
This requires labeled training data for each new domain or task.
We introduce a novel Zero-Shot generalizable end-to-end Task-oriented Dialog system, ZS-ToD.
arXiv Detail & Related papers (2023-03-28T18:56:31Z) - A Unified Knowledge Graph Augmentation Service for Boosting
Domain-specific NLP Tasks [10.28161912127425]
We propose KnowledgeDA, a unified domain language model development service to enhance the task-specific training procedure with domain knowledge graphs.
We implement a prototype of KnowledgeDA to learn language models for two domains, healthcare and software development.
arXiv Detail & Related papers (2022-12-10T09:18:43Z) - Structured Latent Embeddings for Recognizing Unseen Classes in Unseen
Domains [108.11746235308046]
We propose a novel approach that learns domain-agnostic structured latent embeddings by projecting images from different domains.
Our experiments on the challenging DomainNet and DomainNet-LS benchmarks show the superiority of our approach over existing methods.
arXiv Detail & Related papers (2021-07-12T17:57:46Z) - Domain Conditioned Adaptation Network [90.63261870610211]
We propose a Domain Conditioned Adaptation Network (DCAN) to excite distinct convolutional channels with a domain conditioned channel attention mechanism.
This is the first work to explore the domain-wise convolutional channel activation for deep DA networks.
arXiv Detail & Related papers (2020-05-14T04:23:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.