Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent
Detection
- URL: http://arxiv.org/abs/2402.17256v2
- Date: Mon, 4 Mar 2024 06:04:32 GMT
- Title: Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent
Detection
- Authors: Pei Wang, Keqing He, Yejie Wang, Xiaoshuai Song, Yutao Mou, Jingang
Wang, Yunsen Xian, Xunliang Cai, Weiran Xu
- Abstract summary: This paper conducts a comprehensive evaluation of large language models (LLMs) represented by ChatGPT.
We find that LLMs exhibit strong zero-shot and few-shot capabilities, but is still at a disadvantage compared to models fine-tuned with full resource.
- Score: 34.135738700682055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Out-of-domain (OOD) intent detection aims to examine whether the user's query
falls outside the predefined domain of the system, which is crucial for the
proper functioning of task-oriented dialogue (TOD) systems. Previous methods
address it by fine-tuning discriminative models. Recently, some studies have
been exploring the application of large language models (LLMs) represented by
ChatGPT to various downstream tasks, but it is still unclear for their ability
on OOD detection task.This paper conducts a comprehensive evaluation of LLMs
under various experimental settings, and then outline the strengths and
weaknesses of LLMs. We find that LLMs exhibit strong zero-shot and few-shot
capabilities, but is still at a disadvantage compared to models fine-tuned with
full resource. More deeply, through a series of additional analysis
experiments, we discuss and summarize the challenges faced by LLMs and provide
guidance for future work including injecting domain knowledge, strengthening
knowledge transfer from IND(In-domain) to OOD, and understanding long
instructions.
Related papers
- Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse.
Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-AI collaboration.
To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z) - Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in Multimodal Large Language Models (MLLMs) evaluation.
Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs.
Key findings reveal that some benchmarks allow high performance even without visual inputs and up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z) - Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [85.24399869971236]
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
Existing evaluations tend to rely solely on a final success rate.
We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
arXiv Detail & Related papers (2024-10-09T17:59:00Z) - EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - Defining Boundaries: A Spectrum of Task Feasibility for Large Language Models [6.008311204104302]
Large language models (LLMs) have shown remarkable performance in various tasks but often fail to handle queries that exceed their knowledge and capabilities.
This paper addresses the need for LLMs to recognize and refuse infeasible tasks due to the required skills surpassing their capabilities.
arXiv Detail & Related papers (2024-08-11T22:58:23Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - A Reality check of the benefits of LLM in business [1.9181612035055007]
Large language models (LLMs) have achieved remarkable performance in language understanding and generation tasks.
This paper thoroughly examines the usefulness and readiness of LLMs for business processes.
arXiv Detail & Related papers (2024-06-09T02:36:00Z) - Towards Modeling Learner Performance with Large Language Models [7.002923425715133]
This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the domain of knowledge tracing.
We compare two approaches to using LLMs for this task, zero-shot prompting and model fine-tuning, with existing, non-LLM approaches to knowledge tracing.
While LLM-based approaches do not achieve state-of-the-art performance, fine-tuned LLMs surpass the performance of naive baseline models and perform on par with standard Bayesian Knowledge Tracing approaches.
arXiv Detail & Related papers (2024-02-29T14:06:34Z) - How Good Are LLMs at Out-of-Distribution Detection? [13.35571704613836]
Out-of-distribution (OOD) detection plays a vital role in enhancing the reliability of machine learning (ML) models.
This paper embarks on a pioneering empirical investigation of OOD detection in the domain of large language models (LLMs)
arXiv Detail & Related papers (2023-08-20T13:15:18Z) - Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.