Related papers: A MISMATCHED Benchmark for Scientific Natural Language Inference

A MISMATCHED Benchmark for Scientific Natural Language Inference

URL: http://arxiv.org/abs/2506.04603v1
Date: Thu, 05 Jun 2025 03:40:57 GMT
Title: A MISMATCHED Benchmark for Scientific Natural Language Inference
Authors: Firoz Shaik, Mobashir Sadat, Nikita Gautam, Doina Caragea, Cornelia Caragea,
Abstract summary: We introduce a novel evaluation benchmark for scientific NLI, called MISMATCHED.<n>The new benchmark covers three non-CS domains-PSYCHOLOGY, ENGINEERING, and PUBLIC HEALTH, and contains 2,700 human annotated sentence pairs.<n>In addition to introducing the MISMATCHED benchmark, we show that incorporating sentence pairs having an implicit scientific NLI relation between them in model training improves their performance on scientific NLI.
Score: 53.17435107472026
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scientific Natural Language Inference (NLI) is the task of predicting the semantic relation between a pair of sentences extracted from research articles. Existing datasets for this task are derived from various computer science (CS) domains, whereas non-CS domains are completely ignored. In this paper, we introduce a novel evaluation benchmark for scientific NLI, called MISMATCHED. The new MISMATCHED benchmark covers three non-CS domains-PSYCHOLOGY, ENGINEERING, and PUBLIC HEALTH, and contains 2,700 human annotated sentence pairs. We establish strong baselines on MISMATCHED using both Pre-trained Small Language Models (SLMs) and Large Language Models (LLMs). Our best performing baseline shows a Macro F1 of only 78.17% illustrating the substantial headroom for future improvements. In addition to introducing the MISMATCHED benchmark, we show that incorporating sentence pairs having an implicit scientific NLI relation between them in model training improves their performance on scientific NLI. We make our dataset and code publicly available on GitHub.

Related papers

A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding. There is no publicly available NLI corpus for the Romanian language. We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference [65.37685198688538]
This paper presents MSciNLI, a dataset containing 132,320 sentence pairs extracted from five new scientific domains. We establish strong baselines on MSciNLI by fine-tuning Pre-trained Language Models (PLMs) and prompting Large Language Models (LLMs) We show that domain shift degrades the performance of scientific NLI models which demonstrates the diverse characteristics of different domains in our dataset.
arXiv Detail & Related papers (2024-04-11T18:12:12Z)
Improving Domain-Specific Retrieval by NLI Fine-Tuning [64.79760042717822]
This article investigates the fine-tuning potential of natural language inference (NLI) data to improve information retrieval and ranking. We employ both monolingual and multilingual sentence encoders fine-tuned by a supervised method utilizing contrastive loss and NLI data. Our results point to the fact that NLI fine-tuning increases the performance of the models in both tasks and both languages, with the potential to improve mono- and multilingual models.
arXiv Detail & Related papers (2023-08-06T12:40:58Z)
NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts [51.64770549988806]
We introduce NaSGEC, a new dataset to facilitate research on Chinese grammatical error correction (CGEC) for native speaker texts from multiple domains. To broaden the target domain, we annotate multiple references for 12,500 sentences from three native domains, i.e., social media, scientific writing, and examination. We provide solid benchmark results for NaSGEC by employing cutting-edge CGEC models and different training data.
arXiv Detail & Related papers (2023-05-25T13:05:52Z)
SciNLI: A Corpus for Natural Language Inference on Scientific Text [47.293189105900524]
We introduce SciNLI, a large dataset for NLI that captures the formality in scientific text. Our best performing model with XLNet achieves a Macro F1 score of only 78.18% and an accuracy of 78.23%.
arXiv Detail & Related papers (2022-03-13T18:23:37Z)
OCNLI: Original Chinese Natural Language Inference [21.540733910984006]
We present the first large-scale NLI dataset (consisting of 56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI) Unlike recent attempts at extending NLI to other languages, our dataset does not rely on any automatic translation or non-expert annotation. We establish several baseline results on our dataset using state-of-the-art pre-trained models for Chinese, and find even the best performing models to be far outpaced by human performance.
arXiv Detail & Related papers (2020-10-12T04:25:48Z)
Probing the Natural Language Inference Task with Automated Reasoning Tools [6.445605125467574]
The Natural Language Inference (NLI) task is an important task in modern NLP, as it asks a broad question to which many other tasks may be reducible. We use other techniques to examine the logical structure of the NLI task. We show how well a machine-oriented controlled natural language can be used to parse NLI sentences, and how well automated theorem provers can reason over the resulting formulae.
arXiv Detail & Related papers (2020-05-06T03:18:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.