NevIR: Negation in Neural Information Retrieval
- URL: http://arxiv.org/abs/2305.07614v2
- Date: Mon, 26 Feb 2024 20:55:25 GMT
- Title: NevIR: Negation in Neural Information Retrieval
- Authors: Orion Weller, Dawn Lawrie, Benjamin Van Durme
- Abstract summary: Negation is a common everyday phenomenon and has been a consistent area of weakness for language models (LMs)
We construct a benchmark asking IR models to rank two documents that differ only by negation.
We show that the results vary widely according to the type of IR architecture: cross-encoders perform best, followed by late-interaction models, and in last place are bi-encoder and sparse neural architectures.
- Score: 45.9442701147499
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Negation is a common everyday phenomenon and has been a consistent area of
weakness for language models (LMs). Although the Information Retrieval (IR)
community has adopted LMs as the backbone of modern IR architectures, there has
been little to no research in understanding how negation impacts neural IR. We
therefore construct a straightforward benchmark on this theme: asking IR models
to rank two documents that differ only by negation. We show that the results
vary widely according to the type of IR architecture: cross-encoders perform
best, followed by late-interaction models, and in last place are bi-encoder and
sparse neural architectures. We find that most information retrieval models
(including SOTA ones) do not consider negation, performing the same or worse
than a random ranking. We show that although the obvious approach of continued
fine-tuning on a dataset of contrastive documents containing negations
increases performance (as does model size), there is still a large gap between
machine and human performance.
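The pairwise evaluation the abstract describes can be sketched as follows. The scorer and the example pair below are illustrative stand-ins (a toy bag-of-words cosine, not any model or data from the paper); a model counts a pair as correct only if it ranks the matching document first for both queries.

```python
# Sketch of a NevIR-style pairwise negation check. The bag-of-words
# cosine scorer is a toy stand-in for a neural retriever.
from collections import Counter
import math

def bow_score(query: str, doc: str) -> float:
    """Cosine similarity over word counts (illustrative scorer)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[w] * d[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def pairwise_accuracy(pairs, score=bow_score):
    """Fraction of pairs where BOTH queries rank their matching doc first."""
    correct = 0
    for q_pos, q_neg, doc_pos, doc_neg in pairs:
        if score(q_pos, doc_pos) > score(q_pos, doc_neg) and \
           score(q_neg, doc_neg) > score(q_neg, doc_pos):
            correct += 1
    return correct / len(pairs)

# Hypothetical pair: two documents differing only by negation.
pairs = [(
    "which drug was approved",
    "which drug was not approved",
    "The drug was approved by the agency.",
    "The drug was not approved by the agency.",
)]
print(pairwise_accuracy(pairs))  # → 1.0 for this toy pair
```

Here the toy scorer happens to succeed because the negated query literally contains "not"; the paper's point is that dense neural scorers often assign near-identical representations to the two documents and fail this check.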
Related papers
- Reproducing NevIR: Negation in Neural Information Retrieval [5.950812862331131]
Negation is a fundamental aspect of human communication, yet it remains a challenge for Language Models in Information Retrieval (IR).
We reproduce and extend the findings of NevIR, a benchmark study that revealed most IR models perform at or below the level of random ranking when dealing with negation.
Our findings show that a recently emerging category, listwise Large Language Model (LLM) re-rankers, outperforms other models but still falls short of human performance.
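Listwise re-ranking, the category highlighted above, gives the model the query and all candidates at once rather than scoring each passage independently. The prompt format and parser below are hypothetical illustrations, not the ones used in the paper:

```python
# Hypothetical listwise re-ranking sketch: build one prompt over all
# candidates, then parse an ordering like "2 > 1" out of the answer.
def listwise_prompt(query: str, passages: list[str]) -> str:
    lines = [f"Query: {query}",
             "Rank the passages from most to least relevant."]
    lines += [f"[{i}] {p}" for i, p in enumerate(passages, start=1)]
    lines.append("Answer with the passage numbers in order, e.g. 2 > 1.")
    return "\n".join(lines)

def parse_ranking(answer: str, n: int) -> list[int]:
    # Keep only tokens that are valid passage indices.
    order = [int(tok) for tok in answer.replace(">", " ").split() if tok.isdigit()]
    return [i for i in order if 1 <= i <= n]

prompt = listwise_prompt(
    "which drug was not approved",
    ["The drug was approved.", "The drug was not approved."],
)
```

Seeing both passages side by side is plausibly what lets a listwise LLM notice the negation contrast that independent per-passage scoring misses.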
arXiv Detail & Related papers (2025-02-19T07:50:59Z)
- Vision-Language Models Do Not Understand Negation [50.27667000027403]
NegBench is a benchmark designed to evaluate negation understanding across 18 task variations and 79k examples spanning image, video, and medical datasets.
We show that this approach can result in a 10% increase in recall on negated queries and a 40% boost in accuracy on multiple-choice questions with negated captions.
arXiv Detail & Related papers (2025-01-16T09:55:42Z)
- Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective [111.58315434849047]
The robustness of neural information retrieval (IR) models has garnered significant attention.
We view the robustness of IR to be a multifaceted concept, emphasizing its necessity against adversarial attacks, out-of-distribution (OOD) scenarios and performance variance.
We provide an in-depth discussion of existing methods, datasets, and evaluation metrics, shedding light on challenges and future directions in the era of large language models.
arXiv Detail & Related papers (2024-07-09T16:07:01Z)
- Explainable AI for Comparative Analysis of Intrusion Detection Models [20.683181384051395]
This research applies various machine learning models to the tasks of binary and multi-class classification for intrusion detection from network traffic.
We trained all models to an accuracy of 90% on the UNSW-NB15 dataset.
We also discover that Random Forest provides the best performance in terms of accuracy, time efficiency and robustness.
arXiv Detail & Related papers (2024-06-14T03:11:01Z)
- Evaluating Machine Learning Models with NERO: Non-Equivariance Revealed on Orbits [19.45052971156096]
We propose a novel evaluation workflow, named Non-Equivariance Revealed on Orbits (NERO) Evaluation.
The NERO evaluation consists of a task-agnostic interactive interface and a set of visualizations, called NERO plots.
We present case studies showing how NERO evaluation can be applied to multiple research areas, including 2D digit recognition, object detection, particle image velocimetry (PIV), and 3D point cloud classification.
arXiv Detail & Related papers (2023-05-31T14:24:35Z)
- Mind the Backbone: Minimizing Backbone Distortion for Robust Object Detection [52.355018626115346]
Building object detectors that are robust to domain shifts is critical for real-world applications.
We propose to use Relative Gradient Norm as a way to measure the vulnerability of a backbone to feature distortion.
We present recipes to boost OOD robustness for both types of backbones.
arXiv Detail & Related papers (2023-03-26T14:50:43Z)
- Towards Regression-Free Neural Networks for Diverse Compute Platforms [50.64489250972764]
We introduce REGression constrained Neural Architecture Search (REG-NAS) to design a family of highly accurate models that engender fewer negative flips.
REG-NAS consists of two components: (1) A novel architecture constraint that enables a larger model to contain all the weights of the smaller one thus maximizing weight sharing.
We demonstrate that REG-NAS can successfully find desirable architectures with few negative flips in three popular architecture search spaces.
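A "negative flip" is a test sample the old model classified correctly that the new model gets wrong; regression-free search aims to keep this rate near zero even as overall accuracy holds or improves. A minimal sketch of the metric, with illustrative data:

```python
# Negative flip rate: fraction of samples where the old model was
# right but the new model is wrong. Labels/predictions are toy data.
def negative_flip_rate(labels, old_preds, new_preds):
    flips = sum(
        1 for y, old, new in zip(labels, old_preds, new_preds)
        if old == y and new != y
    )
    return flips / len(labels)

labels    = [0, 1, 1, 0, 2]
old_preds = [0, 1, 0, 0, 2]  # 4/5 correct
new_preds = [0, 1, 1, 1, 2]  # equally accurate overall, yet sample 4 regressed
print(negative_flip_rate(labels, old_preds, new_preds))  # → 0.2
```

The example shows why aggregate accuracy alone is insufficient: both models score 4/5, but the new one silently breaks a previously correct prediction.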
arXiv Detail & Related papers (2022-09-27T23:19:16Z)
- Entity-Conditioned Question Generation for Robust Attention Distribution in Neural Information Retrieval [51.53892300802014]
We show that supervised neural information retrieval models are prone to learning sparse attention patterns over passage tokens.
Using a novel targeted synthetic data generation method, we teach neural IR to attend more uniformly and robustly to all entities in a given passage.
arXiv Detail & Related papers (2022-04-24T22:36:48Z)
- Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment [32.01355605506855]
Quantization-aware training can produce more stable models than standard, adversarial, and Mixup training.
Disagreements often have closer top-1 and top-2 output probabilities, and $Margin$ is a better indicator than other uncertainty metrics for distinguishing disagreements.
We open-source our code and models as a new benchmark for further study of quantized models.
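The $Margin$ metric mentioned above is the gap between the top-1 and top-2 output probabilities; small margins flag inputs where a model and its quantized copy are likely to disagree. A minimal sketch with illustrative probabilities:

```python
# Margin = top-1 probability minus top-2 probability.
# The probability vectors below are made-up examples.
def margin(probs: list[float]) -> float:
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2

confident  = [0.90, 0.06, 0.04]  # clear winner: likely stable under quantization
borderline = [0.48, 0.45, 0.07]  # near-tie: likely disagreement candidate
print(round(margin(confident), 2))   # → 0.84
print(round(margin(borderline), 2))  # → 0.03
```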
arXiv Detail & Related papers (2022-04-08T11:19:16Z)
- Match Your Words! A Study of Lexical Matching in Neural Information Retrieval [11.930815087240479]
We study the behavior of different state-of-the-art neural IR models, focusing on whether they are able to perform lexical matching when it's actually useful.
We show that neural IR models fail to properly generalize term importance on out-of-domain collections or terms almost unseen during training.
arXiv Detail & Related papers (2021-12-10T16:49:49Z)
- Learning from Context or Names? An Empirical Study on Neural Relation Extraction [112.06614505580501]
We study the effect of two main information sources in text: textual context and entity mentions (names).
We propose an entity-masked contrastive pre-training framework for relation extraction (RE).
Our framework can improve the effectiveness and robustness of neural models in different RE scenarios.
arXiv Detail & Related papers (2020-10-05T11:21:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.