Towards Uncertainty-Aware Language Agent
- URL: http://arxiv.org/abs/2401.14016v3
- Date: Thu, 30 May 2024 13:26:38 GMT
- Title: Towards Uncertainty-Aware Language Agent
- Authors: Jiuzhou Han, Wray Buntine, Ehsan Shareghi
- Abstract summary: We present the Uncertainty-Aware Language Agent (UALA), a framework that orchestrates the interaction between the agent and the external world using uncertainty quantification.
Our experiments demonstrate that UALA brings a significant improvement in performance while relying substantially less on the external world.
- Score: 10.227089771963943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Language Agents have achieved promising success by placing Large Language Models at the core of a more versatile design that dynamically interacts with the external world, existing approaches neglect the notion of uncertainty during these interactions. We present the Uncertainty-Aware Language Agent (UALA), a framework that orchestrates the interaction between the agent and the external world using uncertainty quantification. Compared with other well-known counterparts like ReAct, our extensive experiments across 3 representative tasks (HotpotQA, StrategyQA, MMLU) and various LLM sizes demonstrate that UALA brings a significant improvement in performance while having a substantially lower reliance on the external world (i.e., a reduced number of tool calls and tokens). Our analyses provide various insights, including the great potential of UALA compared with agent fine-tuning, and underscore the unreliability of verbalised confidence of LLMs as a proxy for uncertainty.
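The abstract describes a gating mechanism: answer from the model's own knowledge when uncertainty is low, and call external tools only when it is high. The following is a minimal sketch of that idea, not the paper's implementation; the uncertainty measure (mean negative token log-probability), the threshold value, and the function names `llm` and `tool` are all assumptions.

```python
def token_uncertainty(logprobs):
    """Mean negative log-probability of the answer tokens: one common
    sequence-level uncertainty proxy (an assumption, not necessarily
    UALA's exact measure)."""
    return -sum(logprobs) / len(logprobs)

def uncertainty_gated_answer(question, llm, tool, threshold=1.0):
    """Answer directly when the model is confident; otherwise consult
    an external tool and answer again with the retrieved evidence.

    `llm(prompt)` is assumed to return (answer, per-token logprobs);
    `tool(query)` is any external resource such as a search API.
    """
    answer, logprobs = llm(question)
    if token_uncertainty(logprobs) < threshold:
        return answer                      # confident: no tool call needed
    evidence = tool(question)              # uncertain: ask the external world
    answer, _ = llm(f"{question}\nEvidence: {evidence}")
    return answer
```

Gating of this kind is also what would yield the abstract's claim of fewer tool calls and tokens: the external world is consulted only for the subset of questions where the model is uncertain.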
Related papers
- Positive Experience Reflection for Agents in Interactive Text Environments [9.982616173090264]
We introduce Sweet&Sour, a novel approach that incorporates positive experiences and managed memory to enrich the context available to the agent at decision time.
Our comprehensive analysis spans both closed- and open-source LLMs and demonstrates the effectiveness of Sweet&Sour in improving agent performance.
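As a rough illustration of the positive-experience memory the summary describes, the sketch below keeps a bounded store of episodes that ended well and retrieves the most relevant ones at decision time. The class name, the reward-based filter, and the word-overlap retrieval are illustrative assumptions, not the paper's method.

```python
from collections import deque

class PositiveExperienceMemory:
    """Bounded store of past episodes with positive outcomes,
    surfaced at decision time to enrich the agent's context
    (an illustrative sketch; the paper's memory management
    and retrieval may differ)."""

    def __init__(self, capacity=50):
        self.episodes = deque(maxlen=capacity)   # (state, action, reward)

    def add(self, state, action, reward):
        if reward > 0:                           # keep positive experiences only
            self.episodes.append((state, action, reward))

    def recall(self, state, k=3):
        # Naive relevance: word overlap with the current state.
        # A real system would likely use embedding similarity.
        def overlap(episode):
            return len(set(episode[0].split()) & set(state.split()))
        return sorted(self.episodes, key=overlap, reverse=True)[:k]
```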
arXiv Detail & Related papers (2024-11-04T16:15:28Z) - MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models [23.384966485398184]
This paper introduces a comprehensive investigation of Multilingual Confidence estimation (MlingConf) on Large Language Models (LLMs).
The benchmark comprises four meticulously checked and human-evaluated high-quality multilingual datasets for language-agnostic (LA) tasks and one for the language-specific (LS) task, tailored to the social, cultural, and geographical contexts of a language.
Experiments reveal that on LA tasks English exhibits notable linguistic dominance in confidence estimation over other languages, while on LS tasks, prompting LLMs in the question-related language yields better multilingual confidence estimations.
arXiv Detail & Related papers (2024-10-16T11:46:55Z) - DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations [52.242449026151846]
DebUnc is a multi-agent debate framework that uses uncertainty metrics to assess agent confidence levels.
We adapt the attention mechanism to adjust token weights based on confidence levels.
Our evaluations show that attention-based methods are particularly effective.
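The attention adaptation mentioned in the summary can be sketched as scaling each token's attention weight by the confidence of the agent that produced it, which is equivalent to adding log-confidence to the attention logits before the softmax. This is an assumption about the general shape of the method, not DebUnc's exact formulation.

```python
import numpy as np

def confidence_weighted_attention(logits, confidences):
    """Downweight tokens written by low-confidence agents.

    logits:      (n_tokens,) raw attention scores for one query
    confidences: (n_tokens,) confidence in (0, 1] of each token's author

    Adding log(confidence) to the logits multiplies each token's
    post-softmax weight by its author's confidence and renormalises.
    (Illustrative sketch; DebUnc's actual weighting may differ.)
    """
    logits = np.asarray(logits, dtype=float)
    weighted = logits + np.log(np.clip(confidences, 1e-6, 1.0))
    exp = np.exp(weighted - weighted.max())      # numerically stable softmax
    return exp / exp.sum()
```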
arXiv Detail & Related papers (2024-07-08T22:15:01Z) - Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study [51.19622266249408]
MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs.
Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts.
Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks.
arXiv Detail & Related papers (2024-06-11T08:38:13Z) - AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [76.95062553043607]
Evaluating large language models (LLMs) is essential for understanding their capabilities and facilitating their integration into practical applications.
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents.
arXiv Detail & Related papers (2024-01-24T01:51:00Z) - AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios.
However, their capability in handling complex, multi-character social interactions has yet to be fully explored.
We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z) - Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication [76.04373033082948]
Large Language Models (LLMs) have recently made significant strides in complex reasoning tasks through the Chain-of-Thought technique.
We propose Exchange-of-Thought (EoT), a novel framework that enables cross-model communication during problem-solving.
arXiv Detail & Related papers (2023-12-04T11:53:56Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)