Related papers: Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference

Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference

URL: http://arxiv.org/abs/2601.05170v2
Date: Fri, 09 Jan 2026 17:37:13 GMT
Title: Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference
Authors: Rasmus Blanck, Bill Noble, Stergios Chatzikyriakidis,
Abstract summary: Natural Language Inference (NLI) has been an important task for evaluating language models for Natural Language Understanding.<n>We formulate three possible readings of the NLI label set and perform a comprehensive analysis of the meta-inferential properties they entail.<n>We derive insights into which reading of the logical relations is encoded by the dataset.
Score: 0.3431752534091897
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Natural Language Inference (NLI) has been an important task for evaluating language models for Natural Language Understanding, but the logical properties of the task are poorly understood and often mischaracterized. Understanding the notion of inference captured by NLI is key to interpreting model performance on the task. In this paper we formulate three possible readings of the NLI label set and perform a comprehensive analysis of the meta-inferential properties they entail. Focusing on the SNLI dataset, we exploit (1) NLI items with shared premises and (2) items generated by LLMs to evaluate models trained on SNLI for meta-inferential consistency and derive insights into which reading of the logical relations is encoded by the dataset.

Related papers

Let's CONFER: A Dataset for Evaluating Natural Language Inference Models on CONditional InFERence and Presupposition [6.429761894240061]
We introduce CONFER, a novel dataset designed to evaluate how NLI models process inference in conditional sentences.<n>We assess the performance of four NLI models, including two pre-trained models, to examine their generalization to conditional reasoning.<n>Our findings indicate that NLI models struggle with presuppositional reasoning in conditionals, and fine-tuning on existing NLI datasets does not necessarily improve their performance.
arXiv Detail & Related papers (2025-06-06T14:42:20Z)
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis [60.32962597618861]
IDA-Bench is a novel benchmark evaluating large language models in multi-round interactive scenarios.<n>Agent performance is judged by comparing its final numerical output to the human-derived baseline.<n>Even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on 50% of the tasks, highlighting limitations not evident in single-turn tests.
arXiv Detail & Related papers (2025-05-23T09:37:52Z)
Pushing the boundary on Natural Language Inference [49.15148871877941]
Natural Language Inference (NLI) is a central task in natural language understanding with applications in fact-checking, question answering and information retrieval.<n>Despite its importance, current NLI systems heavily rely on learning with limiting artifacts and biases, inference and real-world applicability.<n>This work provides a framework for building robust NLI systems without sacrificing quality or real-world applicability.
arXiv Detail & Related papers (2025-04-25T14:20:57Z)
Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study [41.84915013818794]
The Natural Language to Visualization (NL2Vis) task aims to transform natural-language descriptions into visual representations for a grounded table. Many deep learning-based approaches have been developed for NL2Vis, but challenges persist in visualizing data sourced from unseen databases or spanning multiple tables. Taking inspiration from the remarkable generation capabilities of Large Language Models (LLMs), this paper conducts an empirical study to evaluate their potential in generating visualizations.
arXiv Detail & Related papers (2024-04-26T03:25:35Z)
Exploiting Contextual Target Attributes for Target Sentiment Classification [53.30511968323911]
Existing PTLM-based models for TSC can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task. We present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes.
arXiv Detail & Related papers (2023-12-21T11:45:28Z)
kNN-ICL: Compositional Task-Oriented Parsing Generalization with Nearest Neighbor In-Context Learning [50.40636157214161]
Task-Oriented Parsing (TOP) enables conversational assistants to interpret user commands expressed in natural language. LLMs have achieved impressive performance in computer programs based on a natural language prompt. This paper focuses on harnessing the capabilities of LLMs for semantic parsing tasks.
arXiv Detail & Related papers (2023-12-17T17:26:50Z)
Native Language Identification with Large Language Models [60.80452362519818]
We show that GPT models are proficient at NLI classification, with GPT-4 setting a new performance record of 91.7% on the benchmark11 test set in a zero-shot setting. We also show that unlike previous fully-supervised settings, LLMs can perform NLI without being limited to a set of known classes.
arXiv Detail & Related papers (2023-12-13T00:52:15Z)
Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text. Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
Exploring Transitivity in Neural NLI Models through Veridicality [39.845425535943534]
We focus on the transitivity of inference relations, a fundamental property for systematically drawing inferences. A model capturing transitivity can compose basic inference patterns and draw new inferences. We find that current NLI models do not perform consistently well on transitivity inference tasks.
arXiv Detail & Related papers (2021-01-26T11:18:35Z)
Probing the Natural Language Inference Task with Automated Reasoning Tools [6.445605125467574]
The Natural Language Inference (NLI) task is an important task in modern NLP, as it asks a broad question to which many other tasks may be reducible. We use other techniques to examine the logical structure of the NLI task. We show how well a machine-oriented controlled natural language can be used to parse NLI sentences, and how well automated theorem provers can reason over the resulting formulae.
arXiv Detail & Related papers (2020-05-06T03:18:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.