POSQA: Probe the World Models of LLMs with Size Comparisons
- URL: http://arxiv.org/abs/2310.13394v1
- Date: Fri, 20 Oct 2023 10:05:01 GMT
- Title: POSQA: Probe the World Models of LLMs with Size Comparisons
- Authors: Chang Shu, Jiuzhou Han, Fangyu Liu, Ehsan Shareghi, Nigel Collier
- Abstract summary: Embodied language comprehension emphasizes that language understanding is not solely a matter of mental processing in the brain.
With the explosive growth of Large Language Models (LLMs) and their already ubiquitous presence in our daily lives, it is becoming increasingly necessary to verify their real-world understanding.
- Score: 38.30479784257936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Embodied language comprehension emphasizes that language understanding is not
solely a matter of mental processing in the brain but also involves
interactions with the physical and social environment. With the explosive
growth of Large Language Models (LLMs) and their already ubiquitous presence in
our daily lives, it is becoming increasingly necessary to verify their
real-world understanding. Inspired by cognitive theories, we propose POSQA: a
Physical Object Size Question Answering dataset with simple size comparison
questions designed to probe the limits of, and analyze the potential mechanisms
behind, the embodied comprehension of the latest LLMs.
We show that even the largest LLMs today perform poorly under the zero-shot
setting. We then push their limits with advanced prompting techniques and
external knowledge augmentation. Furthermore, we investigate whether their
real-world comprehension primarily derives from contextual information or
internal weights, and analyse the impact of prompt formats and the reporting
bias of different objects. Our results show that the real-world understanding
LLMs form from textual data can be vulnerable to deception and confusion by the
surface form of prompts, which makes it less aligned with human behaviours.
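To make the probing setup concrete, below is a minimal sketch of the kind of zero-shot size-comparison probe the abstract describes. The prompt template, the toy object pairs, and the query_llm() stub are illustrative assumptions, not the released POSQA format or the authors' evaluation code.

```python
# Minimal sketch of a zero-shot size-comparison probe (assumptions noted above).

PROMPT = (
    "Answer with a single word, 'yes' or 'no'.\n"
    "Question: In the real world, is a {obj_a} larger than a {obj_b}?\n"
    "Answer:"
)

# (object_a, object_b, gold answer) -- toy pairs, not taken from POSQA.
EXAMPLES = [
    ("elephant", "teacup", "yes"),
    ("coin", "refrigerator", "no"),
    ("school bus", "ant", "yes"),
]


def query_llm(prompt: str) -> str:
    """Placeholder: call the LLM of your choice (API or local model) here."""
    raise NotImplementedError("plug in an LLM client")


def zero_shot_accuracy() -> float:
    """Ask each size-comparison question once and score yes/no accuracy."""
    correct = 0
    for obj_a, obj_b, gold in EXAMPLES:
        reply = query_llm(PROMPT.format(obj_a=obj_a, obj_b=obj_b))
        pred = "yes" if reply.strip().lower().startswith("yes") else "no"
        correct += int(pred == gold)
    return correct / len(EXAMPLES)
```

The abstract's follow-up experiments (advanced prompting, external knowledge augmentation, and varied prompt formats) would slot into such a harness as alternative ways of constructing the prompt string, with the scoring left unchanged.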
Related papers
- Information Anxiety in Large Language Models [21.574677910096735]
Large Language Models (LLMs) have demonstrated strong performance as knowledge repositories.
We take the investigation further by conducting a comprehensive analysis of the internal reasoning and retrieval mechanisms of LLMs.
Our work focuses on three critical dimensions - the impact of entity popularity, the models' sensitivity to lexical variations in query formulation, and the progression of hidden state representations.
arXiv Detail & Related papers (2024-11-16T14:28:33Z) - Narrative Analysis of True Crime Podcasts With Knowledge Graph-Augmented Large Language Models [8.78598447041169]
Large language models (LLMs) still struggle with complex narrative arcs as well as narratives containing conflicting information.
Recent work indicates LLMs augmented with external knowledge bases can improve the accuracy and interpretability of the resulting models.
In this work, we analyze the effectiveness of applying knowledge graphs (KGs) in understanding true-crime podcast data.
arXiv Detail & Related papers (2024-11-01T21:49:00Z) - A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition [0.6138671548064355]
Large Language Models (LLMs) are known for their remarkable ability to generate 'knowledge'.
However, there is a huge gap between LLMs' and human capabilities for understanding abstract concepts and reasoning.
We discuss these issues in a larger philosophical context of human knowledge acquisition and the Turing test.
arXiv Detail & Related papers (2024-08-13T03:25:49Z) - Crafting Interpretable Embeddings by Asking LLMs Questions [89.49960984640363]
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks.
We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question posed to an LLM (see the sketch after this list).
We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli.
arXiv Detail & Related papers (2024-05-26T22:30:29Z) - Can large language models understand uncommon meanings of common words? [30.527834781076546]
Large language models (LLMs) have shown significant advancements across diverse natural language understanding (NLU) tasks.
Yet, lacking widely acknowledged testing mechanisms, the question of whether LLMs are parrots or genuinely comprehend the world remains unclear.
This paper presents an innovative construction of a Lexical Semantic dataset with novel evaluation metrics.
arXiv Detail & Related papers (2024-05-09T12:58:22Z) - LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z) - Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be explained to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z) - From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs).
Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity.
When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents.
We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations.
We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z)
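As a concrete illustration of the QA-Emb idea mentioned in the related papers above, here is a minimal sketch of representing a text as a vector of yes/no answers from an LLM. The example questions and the ask_llm() stub are illustrative assumptions, not the question set or code from that paper.

```python
from typing import List

# Illustrative yes/no questions; QA-Emb's actual question set is task-specific.
QUESTIONS = [
    "Does the text mention a physical object?",
    "Does the text describe a person?",
    "Does the text refer to a location?",
]


def ask_llm(question: str, text: str) -> str:
    """Placeholder: pose a yes/no question about the text to an LLM of your choice."""
    raise NotImplementedError("plug in an LLM client")


def qa_embedding(text: str, questions: List[str] = QUESTIONS) -> List[float]:
    """Each feature is 1.0 if the LLM answers 'yes' to its question, else 0.0."""
    return [
        1.0 if ask_llm(q, text).strip().lower().startswith("yes") else 0.0
        for q in questions
    ]
```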
This list is automatically generated from the titles and abstracts of the papers on this site.