Building a Few-Shot Cross-Domain Multilingual NLU Model for Customer Care
- URL: http://arxiv.org/abs/2506.04389v1
- Date: Wed, 04 Jun 2025 19:14:48 GMT
- Title: Building a Few-Shot Cross-Domain Multilingual NLU Model for Customer Care
- Authors: Saurabh Kumar, Sourav Bansal, Neeraj Agrawal, Priyanka Bhatt, et al.
- Abstract summary: SOTA pre-trained models like multilingual-BERT, fine-tuned on annotated data, have shown good performance on downstream tasks relevant to Customer Care. We propose an embedder-cum-classifier model architecture which extends state-of-the-art domain-specific models to other domains with only a few labeled samples. Experiments on Canadian and Mexican e-commerce Customer Care datasets with few-shot intent detection show an increase in accuracy of 20-23%.
- Score: 1.0129089187146396
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Customer care is an essential pillar of the e-commerce shopping experience, with companies spending millions of dollars each year on automation and human agents across geographies (like the US, Canada, Mexico, and Chile), channels (like Chat and Interactive Voice Response (IVR)), and languages (like English and Spanish). SOTA pre-trained models like multilingual-BERT, fine-tuned on annotated data, have shown good performance on downstream tasks relevant to Customer Care. However, model performance is largely subject to the availability of sufficient annotated domain-specific data. Cross-domain availability of data remains a bottleneck, so building an intent classifier that generalizes across domains (defined by channel, geography, and language) with only a few annotations is of great practical value. In this paper, we propose an embedder-cum-classifier model architecture which extends state-of-the-art domain-specific models to other domains with only a few labeled samples. We adopt a supervised fine-tuning approach with isotropic regularizers to train a domain-specific sentence embedder, and a multilingual knowledge distillation strategy to generalize this embedder across multiple domains. The trained embedder, further augmented with a simple linear classifier, can be deployed for new domains. Experiments on Canadian and Mexican e-commerce Customer Care datasets with few-shot intent detection show an increase in accuracy of 20-23% over existing state-of-the-art pre-trained models.
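To make the deployment recipe concrete, here is a minimal sketch of the final step: a frozen sentence embedder with a simple linear classifier fitted on a handful of labeled utterances from the new domain. The paper's distilled, domain-specific embedder is not public, so the off-the-shelf multilingual sentence-transformers model, the intent labels, and the sample utterances below are all placeholder assumptions.

```python
# Minimal sketch: frozen sentence embedder + linear classifier for
# few-shot intent detection in a new domain. The embedder here is an
# off-the-shelf stand-in for the paper's distilled domain-specific model.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Few-shot support set for the new domain (placeholder utterances/intents).
support_texts = [
    "Where is my order?",             # track_order (English)
    "¿Dónde está mi pedido?",         # track_order (Spanish)
    "I want my money back.",          # refund (English)
    "Quiero devolver este artículo.", # refund (Spanish)
]
support_labels = ["track_order", "track_order", "refund", "refund"]

# Embed once with the frozen embedder, then fit only the linear head.
X = embedder.encode(support_texts)
clf = LogisticRegression(max_iter=1000).fit(X, support_labels)

print(clf.predict(embedder.encode(["donde esta mi paquete"])))
```

Because only the linear head is trained, extending to a new domain (channel, geography, or language) needs just the few labeled samples shown here; the heavy lifting is done once, when the embedder is fine-tuned with isotropic regularizers and distilled across languages.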
Related papers
- Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models [5.466962214217334]
Supervised fine-tuning (SFT) is widely used to align large language models (LLMs) with information extraction (IE) tasks, such as named entity recognition (NER). We propose the SaM framework, which dynamically Selects and Merges expert models at inference time.
arXiv Detail & Related papers (2025-06-28T08:28:52Z)
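Merging expert models usually means combining their parameters; the simplest variant is uniform weight averaging over the selected experts, sketched below in PyTorch. The selection step and the uniform average are illustrative assumptions, not SaM's exact strategy.

```python
# Toy sketch of the "merge" step: element-wise average of the state dicts
# of whichever expert models were selected for the current input. Uniform
# averaging is an assumption; SaM defines its own merging strategy.
import torch

def merge_experts(expert_state_dicts):
    """Average a list of architecturally compatible state dicts."""
    merged = {}
    for key in expert_state_dicts[0]:
        merged[key] = torch.stack(
            [sd[key].float() for sd in expert_state_dicts]
        ).mean(dim=0)
    return merged

# Usage: model.load_state_dict(merge_experts([expert_a.state_dict(),
#                                             expert_b.state_dict()]))
```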
- Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study multi-source domain generalization for text classification.
We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
- Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian [75.94354349994576]
This paper explores the feasibility of employing smaller, domain-specific encoder LMs alongside prompting techniques to enhance performance in specialized contexts.
Our study concentrates on the Italian bureaucratic and legal language, experimenting with both general-purpose and further pre-trained encoder-only models.
The results indicate that while further pre-trained models may show diminished robustness in general knowledge, they exhibit superior adaptability for domain-specific tasks, even in a zero-shot setting.
arXiv Detail & Related papers (2024-07-30T08:50:16Z)
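Zero-shot classification with an encoder-only model is typically cast as mask filling: the input is wrapped in a prompt and the logits of label words (verbalizers) at the mask position are compared. A minimal sketch follows; the English prompt, the verbalizers, and the mBERT checkpoint are placeholders for the paper's Italian bureaucratic/legal setup.

```python
# Sketch: zero-shot classification with an encoder LM via mask filling.
# Prompt, verbalizers, and checkpoint are placeholders; verbalizers must
# be single tokens in the model's vocabulary for this scoring to work.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

verbalizers = {"payment": "money", "delivery": "delivery"}
text = "I want my money back."
prompt = f"{text} This request is about {tok.mask_token}."

inputs = tok(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

scores = {label: logits[tok.convert_tokens_to_ids(word)].item()
          for label, word in verbalizers.items()}
print(max(scores, key=scores.get))
```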
- Federated and Generalized Person Re-identification through Domain and Feature Hallucinating [88.77196261300699]
We study the problem of federated domain generalization (FedDG) for person re-identification (re-ID).
We propose a novel method, called "Domain and Feature Hallucinating (DFH)", to produce diverse features for learning generalized local and global models.
Our method achieves the state-of-the-art performance for FedDG on four large-scale re-ID benchmarks.
arXiv Detail & Related papers (2022-03-05T09:15:13Z)
- CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain [22.846469609263416]
We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show how CLIN-X outperforms other pre-trained transformer models.
Our studies reveal stable model performance despite a lack of annotated data, with improvements of up to 47 F1 points when only 250 labeled sentences are available.
Our results highlight the importance of specialized language models such as CLIN-X for concept extraction in non-standard domains.
arXiv Detail & Related papers (2021-12-16T10:07:39Z)
- MDAPT: Multilingual Domain Adaptive Pretraining in a Single Model [17.566140528671134]
We show that a single multilingual domain-specific model can outperform the general multilingual model.
We propose different techniques to compose pretraining corpora that enable a language model to both become domain-specific and multilingual.
arXiv Detail & Related papers (2021-09-14T11:50:26Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
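Casting both predictions as sequence labeling means every token receives a flattened intent tag and a flattened slot tag from two separate heads over a shared encoder. Below is a schematic of that two-head setup; the encoder, hidden size, and tag inventories are placeholder assumptions rather than the paper's exact configuration.

```python
# Schematic of two parallel sequence-labeling heads over a shared encoder:
# one predicts flattened intent tags, the other flattened slot tags.
# Encoder and tag-set sizes are placeholders.
import torch.nn as nn

class TwoHeadTagger(nn.Module):
    def __init__(self, encoder, hidden_size, n_intent_tags, n_slot_tags):
        super().__init__()
        self.encoder = encoder  # any token encoder, e.g. an XLM-R backbone
        self.intent_head = nn.Linear(hidden_size, n_intent_tags)
        self.slot_head = nn.Linear(hidden_size, n_slot_tags)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids, attention_mask=attention_mask
        ).last_hidden_state  # [batch, seq_len, hidden_size]
        # One tag distribution per token from each head; the two
        # cross-entropy losses would be summed during training.
        return self.intent_head(hidden), self.slot_head(hidden)
```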
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
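Structurally, the decomposition amounts to two projection heads over the pretrained representation, one intended to be domain-invariant and one domain-specific, with mutual-information estimates driving the training losses. The sketch below shows only that structure; the projection shapes are assumptions, and the MI objectives are indicated in comments rather than implemented.

```python
# Structural sketch: split a pretrained cross-lingual representation into
# domain-invariant and domain-specific parts via two projection heads.
# The MI-based training objectives are indicated only as comments.
import torch.nn as nn

class FeatureDecomposer(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.invariant_proj = nn.Linear(hidden_size, hidden_size)
        self.specific_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, h):
        # h: pooled cross-lingual sentence representation [batch, dim]
        z_inv = self.invariant_proj(h)   # should carry no domain signal
        z_spec = self.specific_proj(h)   # should absorb the domain signal
        # Training would minimize an MI estimate between z_inv and the
        # domain, while maximizing it between z_spec and the domain.
        return z_inv, z_spec
```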
- Deep Contextual Embeddings for Address Classification in E-commerce [0.03222802562733786]
E-commerce customers in developing nations like India tend to follow no fixed format while entering shipping addresses.
It is imperative to understand the language of addresses, so that shipments can be routed without delays.
We propose a novel approach towards understanding customer addresses by deriving motivation from recent advances in Natural Language Processing (NLP).
arXiv Detail & Related papers (2020-07-06T19:06:34Z)
- Unsupervised Domain Clusters in Pretrained Language Models [61.832234606157286]
We show that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision.
We propose domain data selection methods based on such models.
We evaluate our data selection methods for neural machine translation across five diverse domains.
arXiv Detail & Related papers (2020-04-05T06:22:16Z)
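The practical upshot is a simple data-selection recipe: embed sentences, fit a clustering model, and keep candidates landing in the same cluster as a small in-domain seed set. A minimal sketch with a Gaussian mixture over sentence embeddings follows; the embedder, the cluster count, and the toy sentences are illustrative assumptions (the paper evaluates on machine-translation corpora).

```python
# Sketch: unsupervised domain clusters for data selection. Embed sentences,
# fit a Gaussian mixture, keep candidates in the seed sentences' cluster.
# Embedder choice, cluster count, and sentences are toy assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture

embedder = SentenceTransformer("all-MiniLM-L6-v2")
seed = ["where is my parcel", "track my order status"]        # in-domain
pool = ["cheap flights to rome", "my package never arrived"]  # candidates

gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gmm.fit(embedder.encode(seed + pool))

seed_cluster = np.bincount(gmm.predict(embedder.encode(seed))).argmax()
selected = [s for s in pool
            if gmm.predict(embedder.encode([s]))[0] == seed_cluster]
print(selected)
```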