Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent
Detection
- URL: http://arxiv.org/abs/2402.17256v2
- Date: Mon, 4 Mar 2024 06:04:32 GMT
- Authors: Pei Wang, Keqing He, Yejie Wang, Xiaoshuai Song, Yutao Mou, Jingang
Wang, Yunsen Xian, Xunliang Cai, Weiran Xu
- Abstract summary: This paper conducts a comprehensive evaluation of large language models (LLMs) represented by ChatGPT.
We find that LLMs exhibit strong zero-shot and few-shot capabilities but are still at a disadvantage compared to models fine-tuned with full resources.
- Score: 34.135738700682055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Out-of-domain (OOD) intent detection aims to examine whether the user's query
falls outside the predefined domain of the system, which is crucial for the
proper functioning of task-oriented dialogue (TOD) systems. Previous methods
address it by fine-tuning discriminative models. Recently, some studies have
been exploring the application of large language models (LLMs) represented by
ChatGPT to various downstream tasks, but their ability on the OOD detection
task remains unclear. This paper conducts a comprehensive evaluation of LLMs
under various experimental settings and then outlines the strengths and
weaknesses of LLMs. We find that LLMs exhibit strong zero-shot and few-shot
capabilities but are still at a disadvantage compared to models fine-tuned with
full resources. Going further, through a series of additional analysis
experiments, we discuss and summarize the challenges faced by LLMs and provide
guidance for future work including injecting domain knowledge, strengthening
knowledge transfer from IND (in-domain) to OOD, and understanding long
instructions.
Related papers
- Exploring Large Language Models for Multimodal Sentiment Analysis: Challenges, Benchmarks, and Future Directions [0.0]
Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to extract aspect terms and their corresponding sentiment polarities from multimodal information, including text and images.
Traditional supervised learning methods have shown effectiveness in this task, but the adaptability of large language models (LLMs) to MABSA remains uncertain.
Recent advances in LLMs, such as Llama2, LLaVA, and ChatGPT, demonstrate strong capabilities in general tasks, yet their performance in complex and fine-grained scenarios like MABSA is underexplored.
arXiv Detail & Related papers (2024-11-23T02:17:10Z)
- Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse.
Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-AI collaboration.
To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z)
- Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [85.24399869971236]
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
Existing evaluations tend to rely solely on a final success rate.
We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
arXiv Detail & Related papers (2024-10-09T17:59:00Z)
- EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
- Defining Boundaries: A Spectrum of Task Feasibility for Large Language Models [6.008311204104302]
Large language models (LLMs) have shown remarkable performance in various tasks but often fail to handle queries that exceed their knowledge and capabilities.
This paper addresses the need for LLMs to recognize and refuse infeasible tasks due to the required skills surpassing their capabilities.
arXiv Detail & Related papers (2024-08-11T22:58:23Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- A Reality check of the benefits of LLM in business [1.9181612035055007]
Large language models (LLMs) have achieved remarkable performance in language understanding and generation tasks.
This paper thoroughly examines the usefulness and readiness of LLMs for business processes.
arXiv Detail & Related papers (2024-06-09T02:36:00Z)
- Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [68.83624133567213]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question.
We also propose a simple yet effective method, Active Deduction (AD), to encourage the model to actively perform composite deduction.
arXiv Detail & Related papers (2024-04-19T15:53:27Z)
- How Good Are LLMs at Out-of-Distribution Detection? [13.35571704613836]
Out-of-distribution (OOD) detection plays a vital role in enhancing the reliability of machine learning (ML) models.
This paper embarks on a pioneering empirical investigation of OOD detection in the domain of large language models (LLMs).
arXiv Detail & Related papers (2023-08-20T13:15:18Z)
- Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)