I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench
- URL: http://arxiv.org/abs/2401.17882v2
- Date: Fri, 16 Feb 2024 09:47:38 GMT
- Title: I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench
- Authors: Yuan Li, Yue Huang, Yuli Lin, Siyuan Wu, Yao Wan and Lichao Sun
- Abstract summary: We introduce AwareBench, a benchmark designed to evaluate awareness in large language models (LLMs).
We categorize awareness in LLMs into five dimensions: capability, mission, emotion, culture, and perspective.
Our experiments, conducted on 13 LLMs, reveal that the majority of them struggle to fully recognize their capabilities and missions while demonstrating decent social intelligence.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Do large language models (LLMs) exhibit any forms of awareness similar to humans? In this paper, we introduce AwareBench, a benchmark designed to evaluate awareness in LLMs. Drawing from theories in psychology and philosophy, we define awareness in LLMs as the ability to understand themselves as AI models and to exhibit social intelligence. Subsequently, we categorize awareness in LLMs into five dimensions: capability, mission, emotion, culture, and perspective. Based on this taxonomy, we create a dataset called AwareEval, which contains binary, multiple-choice, and open-ended questions to assess LLMs' understanding of specific awareness dimensions. Our experiments, conducted on 13 LLMs, reveal that the majority of them struggle to fully recognize their capabilities and missions while demonstrating decent social intelligence. We conclude by connecting awareness of LLMs with AI alignment and safety, emphasizing its significance to the trustworthy and ethical development of LLMs. Our dataset and code are available at https://github.com/HowieHwong/Awareness-in-LLM.
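The repository above documents the actual AwareEval format; purely as an illustrative sketch of how a benchmark with binary and multiple-choice items across five awareness dimensions might be scored, a minimal evaluation loop could look like the following. The AwareItem fields, the query_model stub, and the prefix-matching scoring rule are assumptions for illustration, not the benchmark's real schema or API.

```python
# Hypothetical sketch of an AwareEval-style evaluation loop.
# Field names, the query_model() stub, and the scoring rule are
# illustrative assumptions; see the linked repository for the real format.
from dataclasses import dataclass

DIMENSIONS = ["capability", "mission", "emotion", "culture", "perspective"]

@dataclass
class AwareItem:
    dimension: str      # one of DIMENSIONS
    kind: str           # "binary", "multiple_choice", or "open_ended"
    question: str
    choices: list[str]  # empty for binary and open-ended items
    answer: str         # gold label; unused for open-ended items

def query_model(prompt: str) -> str:
    """Stand-in for a call to the LLM under evaluation."""
    raise NotImplementedError

def evaluate(items: list[AwareItem]) -> dict[str, float]:
    """Return per-dimension accuracy over binary and multiple-choice items."""
    correct = {d: 0 for d in DIMENSIONS}
    total = {d: 0 for d in DIMENSIONS}
    for item in items:
        if item.kind == "open_ended":
            continue  # open-ended items need human or model-based grading
        prompt = item.question
        if item.choices:
            prompt += "\nOptions: " + "; ".join(item.choices)
        reply = query_model(prompt).strip().lower()
        total[item.dimension] += 1
        if reply.startswith(item.answer.lower()):
            correct[item.dimension] += 1
    return {d: correct[d] / total[d] for d in DIMENSIONS if total[d]}
```

The sketch skips open-ended items because free-form answers would need rubric-based or human grading rather than label matching.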
Related papers
- Entering Real Social World! Benchmarking the Theory of Mind and Socialization Capabilities of LLMs from a First-person Perspective
In the era of artificial intelligence (AI), and especially with the development of large language models (LLMs), an intriguing question arises: how do LLMs perform in terms of theory of mind (ToM) and socialization capabilities?
We introduce EgoSocialArena, a novel framework designed to evaluate and investigate the ToM and socialization capabilities of LLMs from a first-person perspective.
arXiv Detail & Related papers (2024-10-08T16:55:51Z)
- A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition
Large Language Models (LLMs) are known for their remarkable ability to generate 'knowledge'.
However, there is a huge gap between LLMs' and humans' capabilities for understanding abstract concepts and reasoning.
We discuss these issues in a larger philosophical context of human knowledge acquisition and the Turing test.
arXiv Detail & Related papers (2024-08-13T03:25:49Z)
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
AI assistants such as ChatGPT are trained to respond to users by saying, "I am a large language model".
Are they aware of their current circumstances, such as being deployed to the public?
We refer to a model's knowledge of itself and its circumstances as situational awareness.
arXiv Detail & Related papers (2024-07-05T17:57:02Z)
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly taking on roles akin to human assistants.
This paper presents a framework for investigating psychological dimensions in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception
Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in visual perception and understanding.
These models also suffer from hallucinations, which limit their reliability as AI systems.
This paper aims to define and evaluate the self-awareness of MLLMs in perception.
arXiv Detail & Related papers (2024-01-15T08:19:22Z)
- RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge
This study aims to evaluate the ability of LLMs to distinguish reliable information from external knowledge.
Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information.
arXiv Detail & Related papers (2023-11-14T13:24:19Z)
- Towards Concept-Aware Large Language Models
Concepts play a pivotal role in various human cognitive functions, including learning, reasoning, and communication.
There is very little work on endowing machines with the ability to form and reason with concepts.
In this work, we analyze how well contemporary large language models (LLMs) capture human concepts and their structure.
arXiv Detail & Related papers (2023-11-03T12:19:22Z)
- Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding
Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text.
This position paper critically assesses three points recurring in critiques of LLM capacities.
We outline a pragmatic perspective on the issue of 'real' understanding and intentionality in LLMs.
arXiv Detail & Related papers (2023-10-30T15:51:04Z)
- Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation
This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments.
We introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information.
arXiv Detail & Related papers (2023-10-02T16:27:36Z)
- Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well.
Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries.
We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
- Do Large Language Models Know What They Don't Know?
Large language models (LLMs) have a wealth of knowledge that allows them to excel in various Natural Language Processing (NLP) tasks.
Despite their vast knowledge, LLMs are still limited by the amount of information they can accommodate and comprehend.
This study aims to evaluate LLMs' self-knowledge by assessing their ability to identify unanswerable or unknowable questions.
arXiv Detail & Related papers (2023-05-29T15:30:13Z)
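As a rough, non-authoritative illustration of the self-knowledge evaluation described in the last entry above (scoring whether a model abstains on unanswerable questions), the sketch below uses assumed refusal markers and data format rather than the paper's actual protocol.

```python
# Illustrative sketch: scoring self-knowledge as abstention on
# unanswerable questions. The refusal markers and data format are
# assumptions, not the paper's actual protocol.
REFUSAL_MARKERS = ("i don't know", "i do not know", "cannot be answered",
                   "no way to know", "unanswerable")

def abstained(reply: str) -> bool:
    """Heuristically detect whether the model declined to answer."""
    reply = reply.lower()
    return any(marker in reply for marker in REFUSAL_MARKERS)

def self_knowledge_rate(replies_to_unanswerable: list[str]) -> float:
    """Fraction of unanswerable questions on which the model abstained."""
    if not replies_to_unanswerable:
        return 0.0
    return sum(abstained(r) for r in replies_to_unanswerable) / len(replies_to_unanswerable)
```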