Shared Imagination: LLMs Hallucinate Alike
- URL: http://arxiv.org/abs/2407.16604v1
- Date: Tue, 23 Jul 2024 16:06:22 GMT
- Title: Shared Imagination: LLMs Hallucinate Alike
- Authors: Yilun Zhou, Caiming Xiong, Silvio Savarese, Chien-Sheng Wu
- Abstract summary: We propose a novel setting, imaginary question answering (IQA), to better understand model similarity.
In IQA, we ask one model to generate purely imaginary questions and prompt another model to answer.
Despite the total fictionality of these questions, all models can answer each other's questions with remarkable success.
- Score: 92.4557277529155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the recent proliferation of large language models (LLMs), their training recipes -- model architecture, pre-training data and optimization algorithm -- are often very similar. This naturally raises the question of the similarity among the resulting models. In this paper, we propose a novel setting, imaginary question answering (IQA), to better understand model similarity. In IQA, we ask one model to generate purely imaginary questions (e.g., on completely made-up concepts in physics) and prompt another model to answer. Surprisingly, despite the total fictionality of these questions, all models can answer each other's questions with remarkable success, suggesting a "shared imagination space" in which these models operate during such hallucinations. We conduct a series of investigations into this phenomenon and discuss implications on model homogeneity, hallucination, and computational creativity.
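To make the IQA setup concrete, below is a minimal sketch of the pipeline, assuming a generic chat-completion interface; the `chat` helper, prompts, and answer parsing are illustrative stand-ins, not the paper's actual implementation. One model invents a multiple-choice question about a made-up concept, another model answers it without seeing the answer key, and agreement well above the 25% chance level would point to the "shared imagination" effect.

```python
# Minimal IQA sketch (illustrative assumptions only): model A invents a question
# about a fictional concept, model B answers it, and we count agreement with A's key.
import re

def chat(model: str, prompt: str) -> str:
    """Placeholder for a real LLM API call (e.g., any chat-completion client)."""
    raise NotImplementedError

def generate_imaginary_question(generator: str) -> str:
    prompt = (
        "Invent a completely fictional physics concept and write one multiple-choice "
        "question about it with options A-D. End with a line 'Answer: <letter>'."
    )
    return chat(generator, prompt)

def answer_question(answerer: str, question_block: str) -> str:
    # Strip the generator's answer key before showing the question to the second model.
    question_only = re.sub(r"Answer:\s*[A-D].*", "", question_block, flags=re.S)
    reply = chat(answerer, question_only + "\nRespond with a single letter A-D.")
    match = re.search(r"\b[A-D]\b", reply)
    return match.group(0) if match else ""

def iqa_agreement(generator: str, answerer: str, n: int = 50) -> float:
    hits = 0
    for _ in range(n):
        block = generate_imaginary_question(generator)
        key = re.search(r"Answer:\s*([A-D])", block)
        if key and answer_question(answerer, block) == key.group(1):
            hits += 1
    # Agreement well above the 25% chance level hints at a shared imagination space.
    return hits / n
```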
Related papers
- Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs' Decoding Layers [3.4307476319801213]
Large language models (LLMs) are known to hallucinate, a phenomenon often linked to creativity.
We introduce an evaluation framework, HCL, which quantifies Hallucination and Creativity across different Layers of LLMs during decoding.
Our empirical analysis reveals a tradeoff between hallucination and creativity that is consistent across layer depth, model type, and model size.
arXiv Detail & Related papers (2025-03-04T18:27:00Z)
- Admitting Ignorance Helps the Video Question Answering Models to Answer [82.22149677979189]
We argue that models often establish shortcuts, resulting in spurious correlations between questions and answers.
We propose a novel training framework in which the model is compelled to acknowledge its ignorance when presented with an intervened question.
In practice, we integrate a state-of-the-art model into our framework to validate its effectiveness.
arXiv Detail & Related papers (2025-01-15T12:44:52Z)
- Understanding with toy surrogate models in machine learning [0.0]
Some of the simple surrogate models used to understand opaque machine learning (ML) models bear some resemblance to scientific toy models.
This paper provides an account of what it means to understand an opaque ML model globally with the aid of such simple models.
arXiv Detail & Related papers (2024-10-08T04:22:28Z)
- What is it for a Machine Learning Model to Have a Capability? [0.0]
We develop an account of machine learning models' capabilities which can be usefully applied to the nascent science of model evaluation.
Our core proposal is a conditional analysis of model abilities (CAMA): crudely, a machine learning model has a capability to X just when it would reliably succeed at doing X if it 'tried'.
arXiv Detail & Related papers (2024-05-14T23:03:52Z)
- PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns [69.17409440805498]
We evaluate large multimodal models with abstract patterns based on fundamental concepts.
We find that they are not able to generalize well to simple abstract patterns.
Our systematic analysis finds that the main bottlenecks of GPT-4V are weaker visual perception and inductive reasoning abilities.
arXiv Detail & Related papers (2024-03-20T05:37:24Z)
- Unfamiliar Finetuning Examples Control How Language Models Hallucinate [75.03210107477157]
Large language models are known to hallucinate when faced with unfamiliar queries.
We find that unfamiliar examples in the models' finetuning data are crucial in shaping these errors.
Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations.
arXiv Detail & Related papers (2024-03-08T18:28:13Z)
- Is a model equivalent to its computer implementation? [0.021756081703276]
We argue that even in widely used models the causal link between the (formal) mathematical model and the set of results is no longer certain.
A new perspective on this topic stems from the accelerating trend that in some branches of research only implemented models are used.
arXiv Detail & Related papers (2024-02-23T14:54:40Z)
- Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
arXiv Detail & Related papers (2022-10-20T18:46:31Z)
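As a rough illustration of the generator/scorer composition described in the entry above (a sketch under assumed interfaces, not the paper's actual algorithm; the `Generator` and `Scorer` callables are hypothetical), one closed-loop consensus round can be written as: generate candidates, score them with every scorer, keep the consensus-best candidate, and feed it back as the next seed.

```python
# Sketch of closed-loop iterative consensus (assumed interfaces, not the paper's
# code): a generator proposes candidates, an ensemble of scorers rates them, and
# the consensus-best candidate seeds the next round.
from typing import Callable, List

Generator = Callable[[str], List[str]]  # seed -> candidate outputs
Scorer = Callable[[str], float]         # candidate -> score, higher is better

def iterative_consensus(generate: Generator, scorers: List[Scorer],
                        seed: str, rounds: int = 5) -> str:
    best = seed
    for _ in range(rounds):
        candidates = generate(best)  # propose refinements of the current best
        # Consensus: average over the whole scorer ensemble, not a single scorer.
        consensus = lambda c: sum(s(c) for s in scorers) / len(scorers)
        best = max(candidates, key=consensus)  # feed the winner back into the loop
    return best
```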
- Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language [49.82293730925404]
Large foundation models can exhibit unique capabilities depending on the domain of data they are trained on.
We show that this model diversity is symbiotic, and can be leveraged to build AI systems with structured Socratic dialogue.
arXiv Detail & Related papers (2022-04-01T17:43:13Z)
- DREAM: Uncovering Mental Models behind Language Models [15.71233907204059]
DREAM is a model that takes a situational question as input to produce a mental model elaborating the situation.
It inherits its social commonsense through distant supervision from existing NLP resources.
Mental models generated by DREAM can be used as additional context for situational QA tasks.
arXiv Detail & Related papers (2021-12-16T06:22:47Z)
- Reconstruction of Pairwise Interactions using Energy-Based Models [3.553493344868414]
We show that hybrid models, which combine a pairwise model and a neural network, can lead to significant improvements in the reconstruction of pairwise interactions.
This is in line with the general idea that simple interpretable models and complex black-box models are not necessarily a dichotomy.
arXiv Detail & Related papers (2020-12-11T20:15:10Z)
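As a hedged illustration of the hybrid idea in the entry above (an assumed functional form, not the paper's exact model), the energy function below adds a small neural-network correction to a standard pairwise coupling term.

```python
# Hedged sketch of a hybrid energy function (assumed form, not the paper's exact
# model): a standard pairwise coupling term plus a small neural-network correction.
import numpy as np

def hybrid_energy(x: np.ndarray, J: np.ndarray, h: np.ndarray,
                  W1: np.ndarray, b1: np.ndarray, w2: np.ndarray) -> float:
    """x: variables; J, h: pairwise couplings and fields; W1, b1, w2: NN weights."""
    pairwise = -0.5 * x @ J @ x - h @ x        # interpretable pairwise part
    correction = w2 @ np.tanh(W1 @ x + b1)     # black-box one-hidden-layer term
    return float(pairwise + correction)
```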
- What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
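To make the perturbation-based evaluation in the last entry concrete, here is a minimal sketch under assumed interfaces; the `Predict` callable and the two perturbations shown are illustrative choices, not the paper's exact protocol. If the model's choice barely changes when the question is blanked out, it is likely exploiting spurious cues rather than reading the question.

```python
# Sketch of input perturbations for probing an MCQA model (illustrative only;
# `predict` is an assumed (question, options) -> option-index callable).
import random
from typing import Callable, List

Predict = Callable[[str, List[str]], int]

def perturbation_report(predict: Predict, question: str, options: List[str]) -> dict:
    original = predict(question, options)
    # Zero-information probe: blank the question; a model that truly reads the
    # question should now do no better than chance.
    blank_choice = predict("", options)
    # Consistency probe: shuffle the options; the chosen *content* should not change.
    perm = random.sample(range(len(options)), len(options))
    shuffled_choice = predict(question, [options[i] for i in perm])
    return {
        "original_choice": original,
        "choice_without_question": blank_choice,
        "consistent_under_shuffle": perm[shuffled_choice] == original,
    }
```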
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.