Filling the Gap: Is Commonsense Knowledge Generation useful for Natural Language Inference?
- URL: http://arxiv.org/abs/2507.15100v1
- Date: Sun, 20 Jul 2025 19:42:45 GMT
- Title: Filling the Gap: Is Commonsense Knowledge Generation useful for Natural Language Inference?
- Authors: Chathuri Jayaweera, Brianna Yanqui, Bonnie Dorr
- Abstract summary: Natural Language Inference (NLI) is the task of determining the semantic entailment of a premise for a given hypothesis. Existing commonsense resources lack sufficient coverage for a variety of premise-hypothesis pairs. This study explores the potential of Large Language Models as commonsense knowledge generators for NLI along two key dimensions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Natural Language Inference (NLI) is the task of determining the semantic entailment of a premise for a given hypothesis. The task aims to develop systems that emulate natural human inferential processes, where commonsense knowledge plays a major role. However, existing commonsense resources lack sufficient coverage for a variety of premise-hypothesis pairs. This study explores the potential of Large Language Models as commonsense knowledge generators for NLI along two key dimensions: their reliability in generating such knowledge and the impact of that knowledge on prediction accuracy. We adapt and modify existing metrics to assess LLM factuality and consistency when generating knowledge in this context. While explicitly incorporating commonsense knowledge does not consistently improve overall results, it effectively helps distinguish entailing instances and moderately improves distinguishing contradictory and neutral inferences.
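The abstract describes a two-stage setup: first prompt an LLM for commonsense knowledge about a premise-hypothesis pair, then condition the NLI prediction on that generated knowledge. Below is a minimal sketch of that idea, not the authors' code; the `call_llm` helper, the prompt wording, and the example pair are all assumptions standing in for whichever LLM backend and prompts are actually used.

```python
# Hedged sketch of LLM-generated commonsense knowledge for NLI.
# `call_llm` is a hypothetical placeholder, not a real library API;
# swap in your own text-generation backend.

def call_llm(prompt: str) -> str:
    """Placeholder LLM call; returns a canned string so the sketch runs end to end."""
    return "Playing a guitar on stage is a form of musical performance."

def generate_commonsense(premise: str, hypothesis: str) -> str:
    # Stage 1: elicit background knowledge relevant to the pair.
    prompt = (
        "List commonsense facts that connect the premise and the hypothesis.\n"
        f"Premise: {premise}\nHypothesis: {hypothesis}\nFacts:"
    )
    return call_llm(prompt)

def predict_nli(premise: str, hypothesis: str, knowledge: str) -> str:
    # Stage 2: predict the NLI label, conditioning on the generated
    # knowledge as explicit context in the prompt.
    prompt = (
        f"Premise: {premise}\nHypothesis: {hypothesis}\n"
        f"Background knowledge: {knowledge}\n"
        "Answer with one word: entailment, contradiction, or neutral."
    )
    return call_llm(prompt).strip().lower()

if __name__ == "__main__":
    premise = "A man is playing a guitar on stage."
    hypothesis = "A person is performing music."
    knowledge = generate_commonsense(premise, hypothesis)
    print("Generated knowledge:", knowledge)
    print("Predicted label:", predict_nli(premise, hypothesis, knowledge))
```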
Related papers
- LExT: Towards Evaluating Trustworthiness of Natural Language Explanations [10.77745803401336]
We propose a framework for quantifying trustworthiness of natural language explanations, balancing Plausibility and Faithfulness. Applying our domain-agnostic framework to the healthcare domain using public medical datasets, we evaluate six models. Our findings demonstrate significant differences in their ability to generate trustworthy explanations.
arXiv Detail & Related papers (2025-04-08T17:16:52Z) - What Really is Commonsense Knowledge? [58.5342212738895]
We survey existing definitions of commonsense knowledge, ground them in three frameworks for defining concepts, and consolidate them into a unified definition of commonsense knowledge.
We then use the consolidated definition for annotations and experiments on the CommonsenseQA and CommonsenseQA 2.0 datasets.
Our study shows that the two datasets contain a large portion of non-commonsense-knowledge instances, and that there is a large performance gap between these two subsets.
arXiv Detail & Related papers (2024-11-06T14:54:19Z) - Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation.
We posit that retrieval-augmented language models have the inherent capability to supply responses according to both contextual and parametric knowledge.
Inspired by aligning language models with human preferences, we take a first step towards aligning retrieval-augmented language models so that they respond relying solely on external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z) - Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators [78.63553017938911]
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks.
However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge.
We introduce CONNER, designed to evaluate generated knowledge from six important perspectives.
arXiv Detail & Related papers (2023-10-11T08:22:37Z) - Uncertainty in Natural Language Generation: From Theory to Applications [42.55924708592451]
We argue that a principled treatment of uncertainty can assist in creating systems and evaluation protocols better aligned with these goals.
We first present the fundamental theory, frameworks and vocabulary required to represent uncertainty.
We then propose a two-dimensional taxonomy that is more informative and faithful than the popular aleatoric/epistemic dichotomy.
arXiv Detail & Related papers (2023-07-28T17:51:21Z) - Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
arXiv Detail & Related papers (2023-06-04T15:44:51Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Context-faithful Prompting for Large Language Models [51.194410884263135]
Large language models (LLMs) encode parametric knowledge about world facts.
Their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks.
We assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention.
arXiv Detail & Related papers (2023-03-20T17:54:58Z) - Generated Knowledge Prompting for Commonsense Reasoning [53.88983683513114]
We propose generating knowledge statements directly from a language model with a generic prompt format.
This approach improves performance of both off-the-shelf and finetuned language models on four commonsense reasoning tasks.
Notably, we find that a model's predictions can improve when using its own generated knowledge.
arXiv Detail & Related papers (2021-10-15T21:58:03Z) - Revisiting the Uniform Information Density Hypothesis [44.277066511088634]
We investigate the uniform information density (UID) hypothesis using reading time and acceptability data.
For acceptability judgments, we find clearer evidence that non-uniformity in information density is predictive of lower acceptability.
arXiv Detail & Related papers (2021-09-23T20:41:47Z) - Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings [35.2513653224183]
Natural language inference (NLI) requires models to learn and apply commonsense knowledge.
We investigate whether external knowledge can also improve their explanation capabilities.
We conduct the largest and most fine-grained explainable NLI crowdsourcing study to date.
arXiv Detail & Related papers (2021-09-16T09:56:20Z)