ABNIRML: Analyzing the Behavior of Neural IR Models
- URL: http://arxiv.org/abs/2011.00696v2
- Date: Thu, 20 Jul 2023 08:56:26 GMT
- Title: ABNIRML: Analyzing the Behavior of Neural IR Models
- Authors: Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan
- Abstract summary: Pretrained language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.
We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML)
We conduct an empirical study that yields insights into the factors that contribute to the neural models' gains.
- Score: 45.74073795558624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained contextualized language models such as BERT and T5 have
established a new state-of-the-art for ad-hoc search. However, it is not yet
well-understood why these methods are so effective, what makes some variants
more effective than others, and what pitfalls they may have. We present a new
comprehensive framework for Analyzing the Behavior of Neural IR ModeLs
(ABNIRML), which includes new types of diagnostic probes that allow us to test
several characteristics -- such as writing styles, factuality, sensitivity to
paraphrasing and word order -- that are not addressed by previous techniques.
To demonstrate the value of the framework, we conduct an extensive empirical
study that yields insights into the factors that contribute to the neural
models' gains and identifies potential unintended biases the models exhibit.
Some of our results confirm conventional wisdom, for example that recent neural
ranking models rely less on exact term overlap with the query and instead
leverage richer linguistic information, as evidenced by their higher sensitivity
to word and sentence order. Other results are more surprising, such as that
some models (e.g., T5 and ColBERT) are biased towards factually correct (rather
than simply relevant) texts. Further, some characteristics vary even for the
same base language model, and other characteristics can appear due to random
variations during model training.
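To illustrate the kind of diagnostic probe the abstract describes, the sketch below measures a ranker's sensitivity to word order by comparing its score for a document against a word-shuffled copy. This is a minimal sketch under stated assumptions, not the authors' implementation: the `score(query, doc)` callable, the toy `overlap_score` baseline, and the example texts are hypothetical placeholders for a real neural ranker.

```python
import random


def shuffle_words(text: str, seed: int = 0) -> str:
    """Return a copy of `text` with its word order randomly permuted."""
    words = text.split()
    rng = random.Random(seed)
    rng.shuffle(words)
    return " ".join(words)


def word_order_sensitivity(score, query: str, doc: str, seed: int = 0) -> float:
    """Difference between the model's score for the original document and for a
    word-shuffled copy; larger values indicate stronger word-order sensitivity.
    `score(query, doc) -> float` is any ranking function (hypothetical here)."""
    return score(query, doc) - score(query, shuffle_words(doc, seed))


if __name__ == "__main__":
    # Toy bag-of-words overlap scorer, standing in for a real neural ranker.
    def overlap_score(query: str, doc: str) -> float:
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / max(len(q), 1)

    query = "effects of caffeine on sleep"
    doc = "Caffeine consumed late in the day can delay sleep onset."
    # A pure bag-of-words scorer is insensitive to shuffling (difference ~0),
    # whereas contextualized rankers would typically show a nonzero gap.
    print(word_order_sensitivity(overlap_score, query, doc))
```

The same pattern extends to the other probes named in the abstract (paraphrasing, writing style, factuality): hold relevance fixed, perturb one characteristic of the text, and compare the model's scores before and after the perturbation.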
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
- A generative framework to bridge data-driven models and scientific theories in language neuroscience [84.76462599023802]
We present generative explanation-mediated validation, a framework for generating concise explanations of language selectivity in the brain.
We show that explanatory accuracy is closely related to the predictive power and stability of the underlying statistical models.
arXiv Detail & Related papers (2024-10-01T15:57:48Z)
- Longer Fixations, More Computation: Gaze-Guided Recurrent Neural Networks [12.57650361978445]
Humans read texts at a varying pace, while machine learning models treat each token in the same way.
In this paper, we convert this intuition into a set of novel models with fixation-guided parallel RNNs or layers.
We find that, interestingly, the fixation durations predicted by neural networks bear some resemblance to human fixations.
arXiv Detail & Related papers (2023-10-31T21:32:11Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite being trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA).
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z)
- On the Lack of Robust Interpretability of Neural Text Classifiers [14.685352584216757]
We assess the robustness of interpretations of neural text classifiers based on pretrained Transformer encoders.
Both tests show surprising deviations from expected behavior, raising questions about the extent of insights that practitioners may draw from interpretations.
arXiv Detail & Related papers (2021-06-08T18:31:02Z)
- Recoding latent sentence representations -- Dynamic gradient-based activation modification in RNNs [0.0]
In RNNs, encoding information in a suboptimal way can impact the quality of representations based on later elements in the sequence.
I propose an augmentation to standard RNNs in the form of a gradient-based correction mechanism.
I conduct different experiments in the context of language modeling, where the impact of using such a mechanism is examined in detail.
arXiv Detail & Related papers (2021-01-03T17:54:17Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.