Towards Uncertainty-Aware Language Agent
- URL: http://arxiv.org/abs/2401.14016v3
- Date: Thu, 30 May 2024 13:26:38 GMT
- Title: Towards Uncertainty-Aware Language Agent
- Authors: Jiuzhou Han, Wray Buntine, Ehsan Shareghi
- Abstract summary: We present the Uncertainty-Aware Language Agent (UALA), a framework that orchestrates the interaction between the agent and the external world using uncertainty quantification.
Our experiments demonstrate that UALA brings a significant performance improvement while relying substantially less on the external world.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Language Agents have achieved promising success by placing Large Language Models at the core of a more versatile design that dynamically interacts with the external world, the existing approaches neglect the notion of uncertainty during these interactions. We present the Uncertainty-Aware Language Agent (UALA), a framework that orchestrates the interaction between the agent and the external world using uncertainty quantification. Compared with other well-known counterparts like ReAct, our extensive experiments across 3 representative tasks (HotpotQA, StrategyQA, MMLU) and various LLM sizes demonstrate that UALA brings a significant improvement in performance, while having a substantially lower reliance on the external world (i.e., a reduced number of tool calls and tokens). Our analyses provide various insights, including the great potential of UALA compared with agent fine-tuning, and underscore the unreliability of verbalised confidence of LLMs as a proxy for uncertainty.
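For intuition, the gating idea from the abstract can be sketched as a short control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm_answer`, `tool_agent`, and the mean negative log-probability proxy are hypothetical placeholders, and the threshold would in practice be calibrated on held-out data.

```python
from typing import Callable, List, Tuple

def mean_neg_logprob(logprobs: List[float]) -> float:
    """Average negative log-probability of the answer tokens:
    a simple uncertainty proxy (higher means less certain)."""
    return -sum(logprobs) / max(len(logprobs), 1)

def uncertainty_gated_answer(
    question: str,
    llm_answer: Callable[[str], Tuple[str, List[float]]],  # hypothetical: (answer, token logprobs)
    tool_agent: Callable[[str], str],                      # hypothetical: e.g. a ReAct-style tool loop
    threshold: float,                                      # calibrated on held-out data
) -> str:
    """Consult the external world only when the LLM's direct answer
    is too uncertain; confident answers spend no tool calls."""
    answer, logprobs = llm_answer(question)
    if mean_neg_logprob(logprobs) <= threshold:
        return answer
    return tool_agent(question)
```

Gating on uncertainty in this way is what yields the reduced tool-call and token counts reported in the abstract.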
Related papers
- A Survey on Trustworthy LLM Agents: Threats and Countermeasures (2025-03-12)
Large Language Models (LLMs) and Multi-agent Systems (MAS) have significantly expanded the capabilities of LLM ecosystems.
We propose the TrustAgent framework, a comprehensive study of the trustworthiness of agents.
- Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework (2025-03-11)
Multimodal large language models (MLLMs) show promise in tasks like visual question answering (VQA).
Recent works adapt agentic frameworks or chain-of-thought (CoT) reasoning to improve performance.
We propose Seeing and Reasoning with Confidence (SRICE), a training-free multimodal reasoning framework.
- An Overview of Large Language Models for Statisticians (2025-02-25)
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI).
This paper explores potential areas where statisticians can make important contributions to the development of LLMs.
We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
- Uncertainty Quantification of Large Language Models through Multi-Dimensional Responses (2025-02-24)
We introduce a multi-dimensional UQ framework that integrates semantic and knowledge-aware similarity analysis.
This approach disentangles overlapping information from both semantic and knowledge dimensions, capturing both semantic variations and factual consistency.
Our empirical evaluations demonstrate that our method outperforms existing techniques in identifying uncertain responses.
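As a rough sketch of the general idea of similarity-based uncertainty over sampled responses (not the paper's method; `semantic_sim` and `knowledge_sim` are assumed callables, e.g. embedding cosine similarity and overlap of extracted facts):

```python
from itertools import combinations
from typing import Callable, List

def multi_view_uncertainty(
    responses: List[str],                        # multiple sampled answers to one query
    semantic_sim: Callable[[str, str], float],   # assumed: e.g. embedding cosine similarity
    knowledge_sim: Callable[[str, str], float],  # assumed: e.g. overlap of extracted facts
    alpha: float = 0.5,                          # weight between the two views
) -> float:
    """Dispersion of sampled responses under two similarity views:
    low average pairwise similarity signals high uncertainty."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0
    sims = [
        alpha * semantic_sim(a, b) + (1 - alpha) * knowledge_sim(a, b)
        for a, b in pairs
    ]
    return 1.0 - sum(sims) / len(sims)
```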
- On Verbalized Confidence Scores for LLMs (2024-12-19)
Uncertainty quantification for large language models (LLMs) can establish greater human trust in their responses.
This work focuses on asking the LLM itself to verbalize its uncertainty with a confidence score as part of its output tokens.
We assess the reliability of verbalized confidence scores with respect to different datasets, models, and prompt methods.
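The elicit-and-parse pattern looks roughly like the following; the prompt template and regular expression are illustrative assumptions, not the paper's exact setup:

```python
import re
from typing import Optional

# Hypothetical prompt: ask for an answer plus a 0-100 confidence score.
PROMPT_TEMPLATE = (
    "Answer the question, then rate how confident you are that the "
    "answer is correct on a scale from 0 to 100.\n"
    "Question: {question}\n"
    "Answer: <your answer>\n"
    "Confidence: <0-100>"
)

def parse_verbalized_confidence(completion: str) -> Optional[float]:
    """Extract the self-reported 'Confidence: <n>' score from the
    model's output tokens; None if the format was not followed."""
    match = re.search(r"Confidence:\s*(\d{1,3})", completion)
    if match is None:
        return None
    return min(int(match.group(1)), 100) / 100.0
```

Note that the UALA abstract above reports such verbalized scores to be an unreliable proxy for uncertainty.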
- Positive Experience Reflection for Agents in Interactive Text Environments (2024-11-04)
We introduce Sweet&Sour, a novel approach that incorporates positive experiences and managed memory to enrich the context available to the agent at decision time.
Our comprehensive analysis spans both closed- and open-source LLMs and demonstrates the effectiveness of Sweet&Sour in improving agent performance.
- MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models (2024-10-16)
This paper introduces a comprehensive investigation of Multilingual Confidence estimation (MlingConf) on Large Language Models (LLMs).
The benchmark comprises four meticulously checked and human-evaluated high-quality multilingual datasets for language-agnostic (LA) tasks and one for the language-specific (LS) task, tailored to the social, cultural, and geographical contexts of a language.
Experiments reveal that on LA tasks English exhibits notable dominance in confidence estimation over other languages, while on LS tasks prompting LLMs in the language of the question yields better multilingual confidence estimation.
- DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations (2024-07-08)
DebUnc is a multi-agent debate framework that uses uncertainty metrics to assess agent confidence levels.
It also adapts the attention mechanism to adjust token weights based on confidence levels.
Our evaluations show that attention-based methods are particularly effective.
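The attention adaptation can be pictured as a per-agent bias on attention logits. This is a loose reconstruction of the idea, not DebUnc's exact formulation; the shapes and the log-confidence bias are assumptions:

```python
import numpy as np

def confidence_weighted_attention(
    scores: np.ndarray,      # raw attention logits, shape (n_queries, n_keys)
    key_agent: np.ndarray,   # integer agent index of each key token, shape (n_keys,)
    confidence: np.ndarray,  # per-agent confidence in (0, 1], shape (n_agents,)
) -> np.ndarray:
    """Bias attention toward tokens written by confident agents by
    adding log-confidence to their logits before the softmax."""
    biased = scores + np.log(confidence[key_agent] + 1e-8)
    biased -= biased.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(biased)
    return weights / weights.sum(axis=-1, keepdims=True)
```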
- Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study (2024-06-11)
MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs.
Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts.
Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks.
- AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents (2024-01-24)
Evaluating large language models (LLMs) is essential for understanding their capabilities and facilitating their integration into practical applications.
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents.
- AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents (2024-01-12)
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios.
However, their capability in handling complex, multi-character social interactions has yet to be fully explored.
We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
- Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication (2023-12-04)
Large Language Models (LLMs) have recently made significant strides in complex reasoning tasks through the Chain-of-Thought technique.
We propose Exchange-of-Thought (EoT), a novel framework that enables cross-model communication during problem-solving.
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration (2023-11-14)
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.