Towards Probing Contact Center Large Language Models
- URL: http://arxiv.org/abs/2312.15922v1
- Date: Tue, 26 Dec 2023 07:34:39 GMT
- Title: Towards Probing Contact Center Large Language Models
- Authors: Varun Nathan, Ayush Kumar, Digvijay Ingle and Jithendra Vepa
- Abstract summary: Fine-tuning large language models (LLMs) with domain-specific instructions has emerged as an effective method to enhance their domain-specific understanding.
We benchmark the fundamental characteristics learned by contact-center (CC) specific instruction fine-tuned LLMs against out-of-the-box (OOB) LLMs.
Our findings reveal remarkable effectiveness of CC-LLMs on the in-domain downstream tasks, with improvement in response acceptability by over 48% compared to OOB-LLMs.
- Score: 11.018095513653758
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fine-tuning large language models (LLMs) with domain-specific instructions
has emerged as an effective method to enhance their domain-specific
understanding. Yet, there is limited work that examines the core
characteristics acquired during this process. In this study, we benchmark the
fundamental characteristics learned by contact-center (CC) specific instruction
fine-tuned LLMs against out-of-the-box (OOB) LLMs via probing tasks encompassing
conversational, channel, and automatic speech recognition (ASR) properties. We
explore different LLM architectures (Flan-T5 and Llama), sizes (3B, 7B, 11B,
13B), and fine-tuning paradigms (full fine-tuning vs PEFT). Our findings reveal
remarkable effectiveness of CC-LLMs on the in-domain downstream tasks, with
improvement in response acceptability by over 48% compared to OOB-LLMs.
Additionally, we compare the performance of OOB-LLMs and CC-LLMs on the widely
used SentEval dataset, and assess their capabilities in terms of surface,
syntactic, and semantic information through probing tasks. Intriguingly, we
note a relatively consistent performance of probing classifiers on the set of
probing tasks. Our observations indicate that CC-LLMs, while outperforming
their out-of-the-box counterparts, exhibit a tendency to rely less on encoding
surface, syntactic, and semantic properties, highlighting the intricate
interplay between domain-specific adaptation and probing task performance and
opening up opportunities to explore the behavior of fine-tuned language models in
specialized contexts.
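The probing setup mentioned in the abstract can be illustrated with a minimal sketch: a small linear classifier is trained on frozen representations to predict a property, and its held-out accuracy is read as a rough signal of how strongly that property is encoded. This is not the authors' implementation. The synthetic vectors below are an assumption standing in for real LLM embeddings, and the names `make_synthetic_embeddings`, `train_probe`, and `probe_accuracy` are hypothetical.

```python
import math
import random


def make_synthetic_embeddings(n, dim, seed=0):
    """Hypothetical stand-in for frozen LLM sentence embeddings.

    Each example is (vector, label). The binary label (imagine a probing
    property such as "long vs short utterance") is linearly encoded in
    dimension 0; all other dimensions are pure noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += 2.5 if label == 1 else -2.5
        data.append((vec, label))
    return data


def _sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(data, dim, epochs=30, lr=0.1):
    """Train a logistic-regression probe with plain SGD.

    Only the probe weights are updated; the embeddings stay fixed,
    which is the defining feature of a probing classifier.
    """
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in data:
            z = sum(wi * xi for wi, xi in zip(w, vec)) + b
            grad = _sigmoid(z) - label  # d(log-loss)/dz
            w = [wi - lr * grad * xi for wi, xi in zip(w, vec)]
            b -= lr * grad
    return w, b


def probe_accuracy(data, w, b):
    """Fraction of examples whose property the probe predicts correctly."""
    correct = 0
    for vec, label in data:
        z = sum(wi * xi for wi, xi in zip(w, vec)) + b
        predicted = 1 if z > 0 else 0
        correct += int(predicted == label)
    return correct / len(data)


if __name__ == "__main__":
    train_set = make_synthetic_embeddings(200, dim=8, seed=0)
    test_set = make_synthetic_embeddings(100, dim=8, seed=1)
    weights, bias = train_probe(train_set, dim=8)
    print("held-out probe accuracy:", probe_accuracy(test_set, weights, bias))
```

In a real study the synthetic vectors would be replaced by representations extracted from the OOB and fine-tuned models, and the same probe would be trained on each so that the accuracies are comparable.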
Related papers
- Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning [15.919493497867567]
This study aims to evaluate the performance of Multimodal Large Language Models (MLLMs) on the VALSE benchmark.
We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in model size and pretraining datasets.
arXiv Detail & Related papers (2024-07-17T11:26:47Z)
- Benchmarking General-Purpose In-Context Learning [19.40952728849431]
In-context learning (ICL) empowers generative models to address new tasks effectively and efficiently on the fly.
In this paper, we study extending ICL to address a broader range of tasks with an extended learning horizon and higher improvement potential.
arXiv Detail & Related papers (2024-05-27T14:50:42Z)
- Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [79.38140606606126]
We propose an algorithmic framework that fine-tunes vision-language models (VLMs) with reinforcement learning (RL).
Our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning.
We demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks.
arXiv Detail & Related papers (2024-05-16T17:50:19Z)
- An Empirical Study of Automated Vulnerability Localization with Large Language Models [21.84971967029474]
Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in vulnerability localization remains underexplored.
Our investigation encompasses 10+ leading LLMs suitable for code analysis, including ChatGPT and various open-source models.
We explore the efficacy of these LLMs using 4 distinct paradigms: zero-shot learning, one-shot learning, discriminative fine-tuning, and generative fine-tuning.
arXiv Detail & Related papers (2024-03-30T08:42:10Z)
- The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLMs).
We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions.
Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z)
- Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the generalization ability intrinsic to Large Language Models (LLMs).
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task [18.623619585980688]
We propose a unified robustness evaluation framework based on the slot-filling task to evaluate the dialogue understanding capability of large language models.
Specifically, we construct an input perturbation evaluation dataset, Noise-LLM, which contains five types of single perturbation and four types of mixed perturbation data.
Our aim is to assess how well various robustness methods of LLMs perform in real-world noisy scenarios.
arXiv Detail & Related papers (2023-10-10T10:22:05Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as general task solvers, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
- Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.