Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a
Sparse Approximation
- URL: http://arxiv.org/abs/2304.12631v1
- Date: Tue, 25 Apr 2023 07:58:38 GMT
- Title: Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a
Sparse Approximation
- Authors: Michael Llordes, Debasis Ganguly, Sumit Bhatia and Chirag Agarwal
- Abstract summary: We introduce the notion of equivalent queries that are generated by maximizing the similarity between the NRM's results and the result set of a sparse retrieval system.
We then compare this approach with existing methods such as RM3-based query expansion.
- Score: 19.922420813509518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural retrieval models (NRMs) have been shown to outperform their
statistical counterparts owing to their ability to capture semantic meaning via
dense document representations. These models, however, suffer from poor
interpretability as they do not rely on explicit term matching. As a form of
local per-query explanations, we introduce the notion of equivalent queries
that are generated by maximizing the similarity between the NRM's results and
the result set of a sparse retrieval system with the equivalent query. We then
compare this approach with existing methods such as RM3-based query expansion
and contrast differences in retrieval effectiveness and in the terms generated
by each approach.
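The optimization the abstract describes can be read as a discrete search over query terms: find a bag of words whose BM25 ranked list best approximates the NRM's ranked list. Below is a minimal sketch of that idea, assuming a greedy term-selection strategy; the helpers `bm25_search` and `candidate_terms` are hypothetical stand-ins, and Jaccard overlap of the top-k documents substitutes for whatever rank-list similarity the paper actually maximizes.

```python
def rank_overlap(run_a, run_b, k=100):
    """Jaccard overlap of the top-k document IDs of two ranked lists.
    A simple stand-in for the paper's rank-list similarity measure."""
    top_a, top_b = set(run_a[:k]), set(run_b[:k])
    return len(top_a & top_b) / len(top_a | top_b) if top_a | top_b else 0.0

def equivalent_query(nrm_results, candidate_terms, bm25_search, max_terms=20):
    """Greedily grow a query one term at a time, picking whichever candidate
    most improves the overlap between the BM25 result list and the NRM
    result list. `bm25_search` (list of terms -> ranked doc IDs) is an
    assumed callable, not an API from the paper."""
    query, best = [], 0.0
    for _ in range(max_terms):
        best_term, best_score = None, best
        for term in candidate_terms:
            if term in query:
                continue
            score = rank_overlap(bm25_search(query + [term]), nrm_results)
            if score > best_score:
                best_term, best_score = term, score
        if best_term is None:  # no remaining term improves the approximation
            break
        query.append(best_term)
        best = best_score
    return query, best
```

A query produced this way is "equivalent" only in the local, per-query sense the abstract describes: its BM25 results imitate the NRM's results for that one query, which is what makes the selected terms readable as an explanation.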
Related papers
- An Ensemble Embedding Approach for Improving Semantic Caching Performance in LLM-based Systems [4.364576564103288]
This paper presents an ensemble embedding approach that combines multiple embedding models through a trained meta-encoder to improve semantic similarity detection. We evaluate our method using the Quora Question Pairs dataset, measuring cache hit ratios, cache miss ratios, token savings, and response times.
arXiv Detail & Related papers (2025-07-08T09:20:12Z)
- Unifying Generative and Dense Retrieval for Sequential Recommendation [37.402860622707244]
We propose LIGER, a hybrid model that combines the strengths of sequential dense retrieval and generative retrieval.
LIGER integrates sequential dense retrieval into generative retrieval, mitigating performance differences and enhancing cold-start item recommendation.
This hybrid approach provides insights into the trade-offs between the two paradigms and demonstrates improvements in efficiency and effectiveness for recommendation systems on small-scale benchmarks.
arXiv Detail & Related papers (2024-11-27T23:36:59Z)
- Rethinking Distance Metrics for Counterfactual Explainability [53.436414009687]
We investigate a framing for counterfactual generation methods that considers counterfactuals not as independent draws from a region around the reference, but as jointly sampled with the reference from the underlying data distribution.
We derive a distance metric tailored for counterfactual similarity that can be applied to a broad range of settings.
arXiv Detail & Related papers (2024-10-18T15:06:50Z)
- DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval [49.076590578101985]
We present a diffusion-based ATR framework (DiffATR) that generates the joint distribution from noise.
Experiments on the AudioCaps and Clotho datasets show superior performance, verifying the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-16T06:33:26Z)
- Retrieval with Learned Similarities [2.729516456192901]
State-of-the-art retrieval algorithms have migrated to learned similarities.
We show empirically that Mixture-of-Logits (MoL) achieves superior performance on diverse retrieval scenarios.
arXiv Detail & Related papers (2024-07-22T08:19:34Z)
- CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
Cross-modal retrieval aims to search for instances that are semantically related to the query through the interaction of data from different modalities. Traditional solutions utilize a single-tower or dual-tower framework to explicitly compute the score between queries and candidates. We propose a generative cross-modal retrieval framework (CART) based on coarse-to-fine semantic modeling.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- Modelled Multivariate Overlap: A method for measuring vowel merger [0.0]
This paper introduces a novel method for quantifying vowel overlap.
We evaluate this method on corpus speech data targeting the PIN-PEN merger in four dialects of English.
arXiv Detail & Related papers (2024-06-24T04:56:26Z)
- DEMO: A Statistical Perspective for Efficient Image-Text Matching [32.256725860652914]
We introduce Distribution-based Structure Mining with Consistency Learning (DEMO) for efficient image-text matching.
DEMO characterizes each image using multiple augmented views, which are treated as samples drawn from its intrinsic semantic distribution.
In addition, we introduce collaborative consistency learning, which not only preserves the similarity structure in the Hamming space but also encourages consistency between retrieval distributions from different directions.
arXiv Detail & Related papers (2024-05-19T09:38:56Z)
- Counting Like Human: Anthropoid Crowd Counting on Modeling the Similarity of Objects [92.80955339180119]
Mainstream crowd counting methods regress a density map and integrate it to obtain counting results.
Inspired by how humans count by modeling the similarity of objects, we propose a rational and anthropoid crowd counting framework.
arXiv Detail & Related papers (2022-12-02T07:00:53Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical-matching-based approach achieves similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition [59.52434325897716]
We propose a solution, named DMUE, to address the problem of annotation ambiguity from two perspectives.
For the former, an auxiliary multi-branch learning framework is introduced to better mine and describe the latent distribution in the label space.
For the latter, the pairwise relationships of semantic features between instances are fully exploited to estimate the extent of ambiguity in the instance space.
arXiv Detail & Related papers (2021-04-01T03:21:57Z)
- Named Entity Recognition and Relation Extraction using Enhanced Table Filling by Contextualized Representations [14.614028420899409]
The proposed method computes representations for entity mentions and long-range dependencies without complicated hand-crafted features or neural-network architectures.
We also adapt a tensor dot-product to predict relation labels all at once without resorting to history-based predictions or search strategies.
Despite its simplicity, the experimental results demonstrate that the proposed method outperforms the state-of-the-art methods on the CoNLL04 and ACE05 English datasets.
arXiv Detail & Related papers (2020-10-15T04:58:23Z)
- Unsupervised Summarization by Jointly Extracting Sentences and Keywords [12.387378783627762]
RepRank is an unsupervised graph-based ranking model for extractive multi-document summarization.
We show that salient sentences and keywords can be extracted in a joint and mutual reinforcement process using our learned representations.
Experimental results on multiple benchmark datasets show that RepRank achieves the best or comparable performance in ROUGE.
arXiv Detail & Related papers (2020-09-16T05:58:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.