SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space
- URL: http://arxiv.org/abs/2307.02269v1
- Date: Wed, 5 Jul 2023 13:08:18 GMT
- Title: SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space
- Authors: Lasha Abzianidze, Joost Zwarts, Yoad Winter
- Abstract summary: We create an NLI dataset for spatial reasoning, called SpaceNLI.
The data samples are automatically generated from a curated set of reasoning patterns, where the patterns are annotated with inference labels by experts.
We test several SOTA NLI systems on SpaceNLI to gauge the complexity of the dataset and the system's capacity for spatial reasoning.
- Score: 0.6778628056950066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While many natural language inference (NLI) datasets target certain semantic
phenomena, e.g., negation, tense & aspect, monotonicity, and presupposition, to
the best of our knowledge, there is no NLI dataset that involves diverse types
of spatial expressions and reasoning. We fill this gap by semi-automatically
creating an NLI dataset for spatial reasoning, called SpaceNLI. The data
samples are automatically generated from a curated set of reasoning patterns,
where the patterns are annotated with inference labels by experts. We test
several SOTA NLI systems on SpaceNLI to gauge the complexity of the dataset and
the system's capacity for spatial reasoning. Moreover, we introduce a Pattern
Accuracy and argue that it is a more reliable and stricter measure than the
accuracy for evaluating a system's performance on pattern-based generated data
samples. Based on the evaluation results we find that the systems obtain
moderate results on the spatial NLI problems but lack consistency per inference
pattern. The results also reveal that non-projective spatial inferences
(especially due to the "between" preposition) are the most challenging ones.
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Trajectory Volatility for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]

We propose a trajectory-based method TV score, which uses trajectory volatility for OOD detection in mathematical reasoning.
Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios.
Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z)
- How Can Large Language Models Understand Spatial-Temporal Data? [12.968952073740796]
This paper introduces STG-LLM, an innovative approach empowering Large Language Models for spatial-temporal forecasting.
We tackle the data mismatch by proposing: 1) STG-Tokenizer: This spatial-temporal graph tokenizer transforms intricate graph data into concise tokens capturing both spatial and temporal relationships; 2) STG-Adapter: This minimalistic adapter, consisting of linear encoding and decoding layers, bridges the gap between tokenized data and LLM comprehension.
arXiv Detail & Related papers (2024-01-25T14:03:15Z)
- SpaCE: The Spatial Confounding Environment [2.572906392867547]
SpaCE provides realistic benchmark datasets and tools for evaluating causal inference methods.
Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores.
SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and evaluating machine learning and causal inference models.
arXiv Detail & Related papers (2023-12-01T16:42:57Z)
- Multi-Scales Data Augmentation Approach In Natural Language Inference For Artifacts Mitigation And Pre-Trained Model Optimization [0.0]
We provide a variety of techniques for analyzing and locating dataset artifacts inside the crowdsourced Stanford Natural Language Inference corpus.
To mitigate dataset artifacts, we employ a unique multi-scale data augmentation technique with two distinct frameworks.
Our combination method enhances our model's resistance to perturbation testing, enabling it to continuously outperform the pre-trained baseline.
arXiv Detail & Related papers (2022-12-16T23:37:44Z)
- SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in neural network based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z)
- Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
Interventions and additional rounds of labeling can then be performed to ameliorate the semantic bias of a dataset's hypothesis distribution.
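One way to make the "bias cluster" idea concrete: after clustering hypothesis embeddings (by any method, e.g. k-means over sentence vectors), measure how dominated each cluster is by a single gold label. The helper below is an illustrative sketch, not the paper's exact procedure; names and the purity cutoff are mine.

```python
from collections import Counter

def label_purity_by_cluster(assignments, labels):
    """For each cluster of hypothesis embeddings, compute the most
    frequent gold label and its share of the cluster. A cluster with
    very high purity is a candidate "bias cluster": the hypothesis
    wording alone predicts the label, signalling an annotation artifact.

    assignments: cluster id per example; labels: gold NLI label per example.
    """
    per_cluster = {}
    for a, l in zip(assignments, labels):
        per_cluster.setdefault(a, Counter())[l] += 1
    report = {}
    for c, counts in per_cluster.items():
        top_label, top_count = counts.most_common(1)[0]
        report[c] = (top_label, top_count / sum(counts.values()))
    return report

# Toy example: cluster 0 is fully "contradiction" (suspicious),
# cluster 1 is mixed (unremarkable).
report = label_purity_by_cluster(
    [0, 0, 0, 1, 1, 1],
    ["contradiction", "contradiction", "contradiction",
     "neutral", "entailment", "neutral"],
)
```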
arXiv Detail & Related papers (2021-12-16T22:49:01Z)
- Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces during training, we are able to obtain latent linguistic embeddings.
Our experiments show that the voice cloning system built with vector quantization has only a small degradation in terms of perceptive evaluations.
arXiv Detail & Related papers (2021-06-25T07:51:35Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely-used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in noisy environments.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- Distance in Latent Space as Novelty Measure [0.0]
We propose to intelligently select samples when constructing data sets.
The selection methodology is based on the presumption that two dissimilar samples are worth more than two similar samples in a data set.
By using a self-supervised method to construct the latent space, it is ensured that the space fits the data well and that any upfront labeling effort can be avoided.
arXiv Detail & Related papers (2020-03-31T09:14:56Z)
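The selection idea above (dissimilar samples are worth more than similar ones) can be operationalised as greedy farthest-point selection in the latent space. This is one plausible reading of the abstract rather than the paper's exact method; function and variable names are mine.

```python
import numpy as np

def select_diverse(latents, k):
    """Greedy farthest-point selection over latent vectors.

    Repeatedly pick the sample whose latent vector is farthest from
    everything already selected, so near-duplicates are skipped and
    novel samples are preferred when constructing a dataset.
    """
    latents = np.asarray(latents, dtype=float)
    chosen = [0]  # seed with the first sample
    # Distance from every point to its nearest already-chosen point.
    dists = np.linalg.norm(latents - latents[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(latents - latents[nxt], axis=1))
    return chosen

# Example: three near-duplicate latent vectors plus one outlier;
# the outlier is selected immediately after the seed.
pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
```

In the paper's setting the latent space comes from a self-supervised model, so no labels are needed to run a selection like this.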
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.