Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge
- URL: http://arxiv.org/abs/2305.05976v2
- Date: Sat, 13 May 2023 13:34:04 GMT
- Title: Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge
- Authors: Jiangjie Chen, Wei Shi, Ziquan Fu, Sijie Cheng, Lei Li, Yanghua Xiao
- Abstract summary: Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge.
Negative knowledge, such as "lions don't live in the ocean", is also ubiquitous in the world but rarely mentioned explicitly in the text.
This work examines the ability of LLMs to handle negative commonsense knowledge.
- Score: 22.543345304998258
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) have been widely studied for their ability to
store and utilize positive knowledge. However, negative knowledge, such as
"lions don't live in the ocean", is also ubiquitous in the world but rarely
mentioned explicitly in the text. What do LLMs know about negative knowledge?
This work examines the ability of LLMs to handle negative commonsense knowledge. We
design a constrained keywords-to-sentence generation task (CG) and a Boolean
question-answering task (QA) to probe LLMs. Our experiments reveal that LLMs
frequently fail to generate valid sentences grounded in negative commonsense
knowledge, yet they can correctly answer polar yes-or-no questions. We term
this phenomenon the belief conflict of LLMs. Our further analysis shows that
statistical shortcuts and negation reporting bias from language modeling
pre-training cause this conflict.
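To make the two probing tasks concrete, here is a minimal sketch of how the CG and QA probes could be posed to an LLM, using the "lions don't live in the ocean" example from the abstract. The prompt wording and the `query_model` helper are illustrative assumptions, not the paper's exact protocol.

```python
# A minimal sketch of the two probing formats, assuming a hypothetical
# query_model() helper; the exact prompts used in the paper may differ.

def make_cg_prompt(keywords):
    """Constrained keywords-to-sentence generation (CG): ask for one factually
    correct sentence that uses every given keyword, including any negation cue."""
    return ("Write a short, factually correct sentence that contains all of "
            "these keywords: " + ", ".join(keywords))

def make_qa_prompt(statement):
    """Boolean question answering (QA): probe the same knowledge as a polar
    yes/no question instead of a generation task."""
    return f"Answer with yes or no only. Is the following statement true? {statement}"

def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call (API or local model); replace as needed."""
    raise NotImplementedError

if __name__ == "__main__":
    # Negative commonsense example from the abstract: "lions don't live in the ocean".
    cg_prompt = make_cg_prompt(["lion", "ocean", "not"])
    qa_prompt = make_qa_prompt("Lions live in the ocean.")
    # The reported belief conflict: a model may answer the QA probe correctly
    # ("no") yet still generate an affirmative CG sentence such as
    # "Lions live in the ocean", ignoring the negation keyword.
    for prompt in (cg_prompt, qa_prompt):
        print(prompt)
        # print(query_model(prompt))  # uncomment once query_model is implemented
```

Comparing the two outputs on the same fact is what surfaces the belief conflict described in the abstract.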
Related papers
- Delving into the Reversal Curse: How Far Can Large Language Models Generalize? [40.64539467276017]
A prime example is the recently debated "reversal curse", which surfaces when models, having been trained on the fact "A is B", struggle to generalize this knowledge to infer that "B is A".
In this paper, we examine the manifestation of the reversal curse across various tasks and delve into both the generalization abilities and the problem-solving mechanisms of LLMs.
arXiv Detail & Related papers (2024-10-24T14:55:09Z)
- AI Meets the Classroom: When Does ChatGPT Harm Learning? [0.0]
We study how generative AI and specifically large language models (LLMs) impact learning in coding classes.
We show across three studies that LLM usage can have positive and negative effects on learning outcomes.
arXiv Detail & Related papers (2024-08-29T17:07:46Z)
- Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals [53.273592543786705]
Large language models (LLMs) have achieved great success, but their occasional content fabrication, or hallucination, limits their practical application.
We propose CoKE, which first probes LLMs' knowledge boundary via internal confidence given a set of questions, and then leverages the probing results to elicit the expression of the knowledge boundary.
arXiv Detail & Related papers (2024-06-16T10:07:20Z)
- LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z)
- See the Unseen: Better Context-Consistent Knowledge-Editing by Noises [73.54237379082795]
Knowledge editing updates the knowledge of large language models (LLMs), yet the same piece of knowledge should remain recallable under different contexts.
Existing works ignore this property, so the editing lacks generalization.
We empirically find that the effects of different contexts upon LLMs in recalling the same knowledge follow a Gaussian-like distribution.
arXiv Detail & Related papers (2024-01-15T09:09:14Z)
- Enabling Large Language Models to Learn from Rules [99.16680531261987]
Inspired by the observation that humans can also learn new tasks or knowledge from rules, we propose rule distillation, which first uses the strong in-context abilities of LLMs to extract knowledge from textual rules.
Our experiments show that making LLMs learn from rules with our method is much more efficient than example-based learning in terms of both sample size and generalization ability.
arXiv Detail & Related papers (2023-11-15T11:42:41Z)
- Do Large Language Models Know about Facts? [60.501902866946]
Large language models (LLMs) have recently driven striking performance improvements across a range of natural language processing tasks.
We aim to evaluate the extent and scope of factual knowledge within LLMs by designing the benchmark Pinocchio.
Pinocchio contains 20K diverse factual questions that span different sources, timelines, domains, regions, and languages.
arXiv Detail & Related papers (2023-10-08T14:26:55Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations better reveal how comprehensively language models understand the questions they are asked.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Can large language models generate salient negative statements? [18.577880767789097]
We examine the ability of large language models to generate salient (interesting) negative statements about real-world entities.
We probe the LLMs using zero- and k-shot unconstrained probes, and compare with traditional methods for negation generation.
We measure the correctness and salience of the generated lists about subjects from different domains.
arXiv Detail & Related papers (2023-05-26T09:13:59Z)
- Understanding Causality with Large Language Models: Feasibility and Opportunities [23.68197884888299]
We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses.
We discuss possible future directions and opportunities, such as enabling explicit and implicit causal modules.
arXiv Detail & Related papers (2023-04-11T22:30:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all of its information) and is not responsible for any consequences arising from its use.