Say What You Mean! Large Language Models Speak Too Positively about
Negative Commonsense Knowledge
- URL: http://arxiv.org/abs/2305.05976v2
- Date: Sat, 13 May 2023 13:34:04 GMT
- Title: Say What You Mean! Large Language Models Speak Too Positively about
Negative Commonsense Knowledge
- Authors: Jiangjie Chen, Wei Shi, Ziquan Fu, Sijie Cheng, Lei Li, Yanghua Xiao
- Abstract summary: Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge.
However, negative knowledge, such as "lions don't live in the ocean", is also ubiquitous in the world but rarely mentioned explicitly in the text.
This work examines the ability of LLMs to handle negative commonsense knowledge.
- Score: 22.543345304998258
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) have been widely studied for their ability to
store and utilize positive knowledge. However, negative knowledge, such as
"lions don't live in the ocean", is also ubiquitous in the world but rarely
mentioned explicitly in the text. What do LLMs know about negative knowledge?
This work examines the ability of LLMs to handle negative commonsense knowledge. We
design a constrained keywords-to-sentence generation task (CG) and a Boolean
question-answering task (QA) to probe LLMs. Our experiments reveal that LLMs
frequently fail to generate valid sentences grounded in negative commonsense
knowledge, yet they can correctly answer polar yes-or-no questions. We term
this phenomenon the belief conflict of LLMs. Our further analysis shows that
statistical shortcuts and negation reporting bias from language modeling
pre-training cause this conflict.
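The two probes described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording, the `Fact` type, the negation-cue list, and the helper names are hypothetical and are not taken from the paper, and the keyword-based negation check is a crude stand-in for a real evaluation. The sketch only shows how a "belief conflict" might be flagged: the yes-or-no answer is right while the generated sentence is wrong.

```python
from dataclasses import dataclass

# Hypothetical negation cues for a crude check; not the paper's evaluation method.
NEGATION_CUES = ("not", "n't", "no", "never", "cannot")


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    holds: bool  # False marks negative knowledge, e.g. lions / live in / the ocean


def cg_prompt(fact: Fact) -> str:
    """Constrained generation (CG) probe: keywords in, one sentence out (assumed wording)."""
    return (
        "Write a short, factually correct sentence using the keywords: "
        f"{fact.subject}, {fact.relation}, {fact.obj}."
    )


def qa_prompt(fact: Fact) -> str:
    """Boolean question-answering (QA) probe (assumed wording)."""
    return f"Answer yes or no: do {fact.subject} {fact.relation} {fact.obj}?"


def is_negated(sentence: str) -> bool:
    """Return True if the sentence contains one of the negation cues."""
    tokens = sentence.lower().replace("n't", " n't").split()
    return any(tok.strip(".,!?") in NEGATION_CUES for tok in tokens)


def cg_correct(fact: Fact, sentence: str) -> bool:
    """A generated sentence is counted correct if it is negated exactly when the fact does not hold."""
    return is_negated(sentence) != fact.holds


def qa_correct(fact: Fact, answer: str) -> bool:
    """A yes-or-no answer is counted correct if 'yes' is given exactly when the fact holds."""
    return answer.strip().lower().startswith("yes") == fact.holds


def belief_conflict(fact: Fact, sentence: str, answer: str) -> bool:
    """Flag the case where the QA answer is right but the generated sentence is wrong."""
    return qa_correct(fact, answer) and not cg_correct(fact, sentence)
```

For the negative fact in the abstract, a model that answers "No" to the question but generates "Lions live in the ocean." would be flagged by `belief_conflict`, while one that generates "Lions don't live in the ocean." would not.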
Related papers
- Delving into the Reversal Curse: How Far Can Large Language Models Generalize? [40.64539467276017]
Large language models (LLMs) exhibit limitations when facing seemingly trivial tasks.
A prime example is the recently debated "reversal curse", which surfaces when models, having been trained on the fact "A is B", struggle to generalize this knowledge to infer that "B is A".
arXiv Detail & Related papers (2024-10-24T14:55:09Z)
- AI Meets the Classroom: When Does ChatGPT Harm Learning? [0.0]
We study how generative AI and specifically large language models (LLMs) impact learning in coding classes.
We show across three studies that LLM usage can have positive and negative effects on learning outcomes.
arXiv Detail & Related papers (2024-08-29T17:07:46Z)
- Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals [53.273592543786705]
Large language models (LLMs) have achieved great success, but their occasional content fabrication, or hallucination, limits their practical application.
We propose CoKE, which first probes LLMs' knowledge boundary via internal confidence given a set of questions, and then leverages the probing results to elicit the expression of the knowledge boundary.
arXiv Detail & Related papers (2024-06-16T10:07:20Z)
- LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z)
- See the Unseen: Better Context-Consistent Knowledge-Editing by Noises [73.54237379082795]
Knowledge-editing updates knowledge of large language models (LLMs).
Existing works ignore this property and the editing lacks generalization.
We empirically find that the effects of different contexts upon LLMs in recalling the same knowledge follow a Gaussian-like distribution.
arXiv Detail & Related papers (2024-01-15T09:09:14Z)
- Enabling Large Language Models to Learn from Rules [99.16680531261987]
We are inspired by the fact that humans can also learn new tasks or knowledge in another way: by learning from rules.
We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract the knowledge from the textual rules.
Our experiments show that making LLMs learn from rules by our method is much more efficient than example-based learning in both the sample size and generalization ability.
arXiv Detail & Related papers (2023-11-15T11:42:41Z)
- Do Large Language Models Know about Facts? [60.501902866946]
Large language models (LLMs) have recently driven striking performance improvements across a range of natural language processing tasks.
We aim to evaluate the extent and scope of factual knowledge within LLMs by designing the benchmark Pinocchio.
Pinocchio contains 20K diverse factual questions that span different sources, timelines, domains, regions, and languages.
arXiv Detail & Related papers (2023-10-08T14:26:55Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Can large language models generate salient negative statements? [18.577880767789097]
We examine the ability of large language models to generate salient (interesting) negative statements about real-world entities.
We probe the LLMs using zero- and k-shot unconstrained probes, and compare with traditional methods for negation generation.
We measure the correctness and salience of the generated lists about subjects from different domains.
arXiv Detail & Related papers (2023-05-26T09:13:59Z)
- Understanding Causality with Large Language Models: Feasibility and Opportunities [23.68197884888299]
We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses.
We discuss possible future directions and opportunities, such as enabling explicit and implicit causal modules.
arXiv Detail & Related papers (2023-04-11T22:30:03Z)