Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR
- URL: http://arxiv.org/abs/2407.07790v1
- Date: Wed, 10 Jul 2024 16:07:51 GMT
- Title: Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR
- Authors: Nandan Thakur, Luiz Bonifacio, Maik Fröbe, Alexander Bondarenko, Ehsan Kamalloo, Martin Potthast, Matthias Hagen, Jimmy Lin
- Abstract summary: We run a study on the Touché 2020 data to explore the potential limits of neural retrieval models.
Our black-box evaluation reveals an inherent bias of neural models towards retrieving short passages.
As many of the short Touché passages are not argumentative and thus non-relevant per se, we denoise the Touché 2020 data by excluding very short passages.
- Score: 99.13855300096925
- Abstract: The zero-shot effectiveness of neural retrieval models is often evaluated on the BEIR benchmark -- a combination of different IR evaluation datasets. Interestingly, previous studies found that particularly on the BEIR subset Touché 2020, an argument retrieval task, neural retrieval models are considerably less effective than BM25. Still, so far, no further investigation has been conducted on what makes argument retrieval so "special". To more deeply analyze the respective potential limits of neural retrieval models, we run a reproducibility study on the Touché 2020 data. In our study, we focus on two experiments: (i) a black-box evaluation (i.e., no model retraining), incorporating a theoretical exploration using retrieval axioms, and (ii) a data denoising evaluation involving post-hoc relevance judgments. Our black-box evaluation reveals an inherent bias of neural models towards retrieving short passages from the Touché 2020 data, and we also find that quite a few of the neural models' results are unjudged in the Touché 2020 data. As many of the short Touché passages are not argumentative and thus non-relevant per se, and as the missing judgments complicate fair comparison, we denoise the Touché 2020 data by excluding very short passages (less than 20 words) and by augmenting the unjudged data with post-hoc judgments following the Touché guidelines. On the denoised data, the effectiveness of the neural models improves by up to 0.52 in nDCG@10, but BM25 is still more effective. Our code and the augmented Touché 2020 dataset are available at https://github.com/castorini/touche-error-analysis.
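As a concrete illustration of the denoising step, below is a minimal Python sketch, not the authors' code from the linked repository. It assumes a BEIR-style corpus.jsonl file for the webis-touche2020 corpus and simple whitespace tokenization, and it writes a copy of the corpus that drops every passage under the 20-word threshold named in the abstract; the file paths are placeholders.

```python
import json

MIN_WORDS = 20  # threshold from the abstract: passages with "less than 20 words" are dropped

def denoise(in_path: str, out_path: str) -> None:
    """Keep only passages with at least MIN_WORDS whitespace-separated words."""
    kept = dropped = 0
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            doc = json.loads(line)
            # BEIR corpus entries store the passage body in the "text" field.
            if len(doc.get("text", "").split()) >= MIN_WORDS:
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    print(f"dropped {dropped} of {kept + dropped} passages "
          f"shorter than {MIN_WORDS} words")

if __name__ == "__main__":
    # Hypothetical local paths; the official denoised data lives in the
    # repository linked above.
    denoise("webis-touche2020/corpus.jsonl",
            "webis-touche2020-denoised/corpus.jsonl")
```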
Related papers
- Investigating Weight-Perturbed Deep Neural Networks With Application in Iris Presentation Attack Detection [11.209470024746683]
We assess the sensitivity of deep neural networks against perturbations to their weight and bias parameters.
We obtain improved models simply by perturbing the network's parameters, without any retraining.
The ensemble at the parameter level shows an average improvement of 43.58% on the LivDet-Iris-2017 dataset and 9.25% on the LivDet-Iris-2020 dataset.
arXiv Detail & Related papers (2023-11-21T18:18:50Z)
- Continuous time recurrent neural networks: overview and application to forecasting blood glucose in the intensive care unit [56.801856519460465]
Continuous time autoregressive recurrent neural networks (CTRNNs) are deep learning models that account for irregular observations.
We demonstrate the application of these models to probabilistic forecasting of blood glucose in a critical care setting.
arXiv Detail & Related papers (2023-04-14T09:39:06Z)
- Reveal to Revise: An Explainable AI Life Cycle for Iterative Bias Correction of Deep Models [11.879170124003252]
State-of-the-art machine learning models often learn spurious correlations embedded in the training data.
This poses risks when deploying these models for high-stakes decision-making.
We propose Reveal to Revise (R2R) to identify, mitigate, and (re-)evaluate spurious model behavior.
arXiv Detail & Related papers (2023-03-22T15:23:09Z)
- Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding [82.46024259137823]
We propose a cross-model comparative loss for a broad range of tasks.
We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks.
arXiv Detail & Related papers (2023-01-10T03:04:27Z)
- Combining human parsing with analytical feature extraction and ranking schemes for high-generalization person reidentification [0.0]
Person reidentification (re-ID) has been receiving increasing attention in recent years due to its importance for both science and society.
Machine learning, and particularly deep learning (DL), has become the main re-ID tool, allowing researchers to achieve unprecedented accuracy levels on benchmark datasets.
We present a model without trainable parameters which shows great potential for high generalization.
arXiv Detail & Related papers (2022-07-28T17:22:48Z)
- PROMISSING: Pruning Missing Values in Neural Networks [0.0]
We propose a simple and intuitive yet effective method for pruning missing values (PROMISSING) during learning and inference steps in neural networks.
Our experiments show that PROMISSING results in similar prediction performance compared to various imputation techniques.
arXiv Detail & Related papers (2022-06-03T15:37:27Z)
- A Thorough Examination on Zero-shot Dense Retrieval [84.70868940598143]
We present the first thorough examination of the zero-shot capability of dense retrieval (DR) models.
We discuss the effect of several key factors related to source training set, analyze the potential bias from the target dataset, and review and compare existing zero-shot DR models.
arXiv Detail & Related papers (2022-04-27T07:59:07Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to infer from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion (CMI), where data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Classification of fNIRS Data Under Uncertainty: A Bayesian Neural Network Approach [0.15229257192293197]
We use a Bayesian Neural Network (BNN) to carry out a binary classification on an open-access dataset.
Our model produced an overall classification accuracy of 86.44% over 30 volunteers.
arXiv Detail & Related papers (2021-01-18T15:43:59Z)