I Think, Therefore I am: Benchmarking Awareness of Large Language Models
Using AwareBench
- URL: http://arxiv.org/abs/2401.17882v2
- Date: Fri, 16 Feb 2024 09:47:38 GMT
- Title: I Think, Therefore I am: Benchmarking Awareness of Large Language Models
Using AwareBench
- Authors: Yuan Li, Yue Huang, Yuli Lin, Siyuan Wu, Yao Wan and Lichao Sun
- Abstract summary: We introduce AwareBench, a benchmark designed to evaluate awareness in large language models (LLMs).
We categorize awareness in LLMs into five dimensions: capability, mission, emotion, culture, and perspective.
Our experiments, conducted on 13 LLMs, reveal that the majority of them struggle to fully recognize their capabilities and missions while demonstrating decent social intelligence.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Do large language models (LLMs) exhibit any forms of awareness similar to
humans? In this paper, we introduce AwareBench, a benchmark designed to
evaluate awareness in LLMs. Drawing from theories in psychology and philosophy,
we define awareness in LLMs as the ability to understand themselves as AI
models and to exhibit social intelligence. Subsequently, we categorize
awareness in LLMs into five dimensions: capability, mission, emotion,
culture, and perspective. Based on this taxonomy, we create a dataset called
AwareEval, which contains binary, multiple-choice, and open-ended questions to
assess LLMs' understandings of specific awareness dimensions. Our experiments,
conducted on 13 LLMs, reveal that the majority of them struggle to fully
recognize their capabilities and missions while demonstrating decent social
intelligence. We conclude by connecting awareness of LLMs with AI alignment and
safety, emphasizing its significance to the trustworthy and ethical development
of LLMs. Our dataset and code are available at
https://github.com/HowieHwong/Awareness-in-LLM.
Related papers
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
AI assistants such as ChatGPT are trained to respond to users by saying, "I am a large language model."
Are they aware of their current circumstances, such as being deployed to the public?
We refer to a model's knowledge of itself and its circumstances as situational awareness.
arXiv Detail & Related papers (2024-07-05T17:57:02Z)
- Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating the psychological dimensions of LLMs, comprising psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Into the Unknown: Self-Learning Large Language Models
We propose a self-learning framework that enables an LLM to independently learn previously unknown knowledge.
Using the hallucination score, we introduce a new concept of Points in the Unknown (PiUs).
It facilitates the creation of a self-learning loop that focuses exclusively on the knowledge gap in Points in the Unknown.
arXiv Detail & Related papers (2024-02-14T12:56:58Z)
- MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception
Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in visual perception and understanding.
These models also suffer from hallucinations, which limit their reliability as AI systems.
This paper aims to define and evaluate the self-awareness of MLLMs in perception.
arXiv Detail & Related papers (2024-01-15T08:19:22Z)
- RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge
This study aims to evaluate the ability of LLMs to distinguish reliable information from external knowledge.
Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information.
arXiv Detail & Related papers (2023-11-14T13:24:19Z)
- Towards Concept-Aware Large Language Models
Concepts play a pivotal role in various human cognitive functions, including learning, reasoning and communication.
However, there is very little work on endowing machines with the ability to form and reason with concepts.
In this work, we analyze how well contemporary large language models (LLMs) capture human concepts and their structure.
arXiv Detail & Related papers (2023-11-03T12:19:22Z)
- Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding
Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text.
This position paper critically assesses three points recurring in critiques of LLM capacities.
We outline a pragmatic perspective on the issue of 'real' understanding and intentionality in LLMs.
arXiv Detail & Related papers (2023-10-30T15:51:04Z)
- Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation
This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments.
We introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information.
arXiv Detail & Related papers (2023-10-02T16:27:36Z)
- Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench
We propose to evaluate the empathy ability of Large Language Models (LLMs).
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z)
- Concept-Oriented Deep Learning with Large Language Models
Large Language Models (LLMs) have been successfully used in many natural-language tasks and applications including text generation and AI chatbots.
They are also a promising new technology for concept-oriented deep learning (CODL).
We discuss conceptual understanding in visual-language LLMs, the most important multimodal LLMs, and major uses of them for CODL including concept extraction from image, concept graph extraction from image, and concept learning.
arXiv Detail & Related papers (2023-06-29T16:47:11Z)
- Do Large Language Models Know What They Don't Know?
Large language models (LLMs) have a wealth of knowledge that allows them to excel in various Natural Language Processing (NLP) tasks.
Despite their vast knowledge, LLMs are still limited by the amount of information they can accommodate and comprehend.
This study aims to evaluate LLMs' self-knowledge by assessing their ability to identify unanswerable or unknowable questions.
arXiv Detail & Related papers (2023-05-29T15:30:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.