IR-BERT: Leveraging BERT for Semantic Search in Background Linking for News Articles
- URL: http://arxiv.org/abs/2007.12603v1
- Date: Fri, 24 Jul 2020 16:02:14 GMT
- Title: IR-BERT: Leveraging BERT for Semantic Search in Background Linking for News Articles
- Authors: Anup Anand Deshmukh and Udhav Sethi
- Abstract summary: This work describes our two approaches for the background linking task of TREC 2020 News Track.
The main objective of this task is to recommend a list of relevant articles that the reader should refer to in order to understand the context.
We empirically show that employing a language model benefits our approach in understanding the context as well as the background of the query article.
- Score: 2.707154152696381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work describes our two approaches for the background linking task of
TREC 2020 News Track. The main objective of this task is to recommend a list of
relevant articles that the reader should refer to in order to understand the
context and gain background information of the query article. Our first
approach focuses on building an effective search query by combining weighted
keywords extracted from the query document and uses BM25 for retrieval. The
second approach leverages the capability of SBERT (Reimers et al.) to
learn contextual representations of the query in order to perform semantic
search over the corpus. We empirically show that employing a language model
benefits our approach in understanding the context as well as the background of
the query article. The proposed approaches are evaluated on the TREC 2018
Washington Post dataset and our best model outperforms the TREC median as well
as the highest scoring model of 2018 in terms of the nDCG@5 metric. We further
propose a diversity measure to evaluate the effectiveness of the various
approaches in retrieving a diverse set of documents. This would potentially
motivate researchers to work on introducing diversity in their recommended
list. We have open-sourced our implementation on GitHub and plan to submit our
runs for the background linking task in TREC 2020.
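For readers who want to prototype the first approach, the following is a minimal sketch of weighted-keyword retrieval with BM25. It assumes the third-party rank_bm25 Python package; the keyword extraction and weighting shown here (repeating terms in proportion to their frequency) are illustrative stand-ins, not the authors' exact scheme.

```python
# Hedged sketch of approach 1: weighted keyword query + BM25 retrieval.
# rank_bm25 is a third-party package; the weighting below is a toy
# stand-in for the paper's keyword extraction, not the authors' method.
from collections import Counter
from rank_bm25 import BM25Okapi

corpus = [
    "city council election results and turnout analysis",
    "background on campaign finance rules for local elections",
    "weekend sports roundup and scores",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

def weighted_query(article: str, top_k: int = 10) -> list[str]:
    # Toy weighting: repeat each keyword in proportion to its frequency,
    # so BM25 sees the weight through term repetition (an assumption).
    tokens = [t for t in article.lower().split() if len(t) > 3]
    query: list[str] = []
    for term, count in Counter(tokens).most_common(top_k):
        query += [term] * count
    return query

query = weighted_query("local election turnout and campaign finance coverage")
scores = bm25.get_scores(query)
ranked = sorted(range(len(corpus)), key=scores.__getitem__, reverse=True)
print(ranked[:5])  # indices of candidate background articles
```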
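The second approach can be sketched with the sentence-transformers library, which provides SBERT-style encoders and a semantic search utility. The checkpoint name below is an assumption for illustration and may differ from the model the paper used.

```python
# Hedged sketch of approach 2: SBERT embeddings + cosine-similarity search.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

corpus = [
    "city council election results and turnout analysis",
    "background on campaign finance rules for local elections",
    "weekend sports roundup and scores",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_article = "local election turnout and campaign finance coverage"
query_embedding = model.encode(query_article, convert_to_tensor=True)

# semantic_search returns, per query, a list of {"corpus_id", "score"}
# dicts ranked by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=5)[0]
for hit in hits:
    print(hit["corpus_id"], round(hit["score"], 3))
```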
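Since results are reported in terms of nDCG@5, a small reference implementation of the standard exponential-gain formulation is given below; it is not code from the paper.

```python
# nDCG@k with exponential gain:
# DCG@k = sum over ranks r = 1..k of (2^rel_r - 1) / log2(r + 1),
# normalized by the DCG of the ideally ordered relevance list.
import math

def dcg_at_k(relevances: list[float], k: int) -> float:
    return sum(
        (2 ** rel - 1) / math.log2(i + 2)  # i is 0-based, so rank = i + 1
        for i, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(relevances: list[float], k: int = 5) -> float:
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Toy graded relevance labels for the top-5 retrieved background articles.
print(round(ndcg_at_k([4, 2, 0, 1, 0], k=5), 3))
```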
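The abstract also proposes a diversity measure but does not define it in the text above. As a placeholder, the sketch below computes one common notion of list diversity, the average pairwise cosine dissimilarity of the retrieved documents' embeddings; this is an assumption, not necessarily the paper's measure.

```python
# Hypothetical diversity score: mean pairwise cosine dissimilarity over
# the embeddings of a retrieved list (higher = more diverse). This is an
# illustrative stand-in, not the measure proposed in the paper.
from itertools import combinations
import numpy as np

def avg_pairwise_dissimilarity(embeddings: np.ndarray) -> float:
    # embeddings: (n_docs, dim), one row per retrieved document
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = [
        float(normed[i] @ normed[j])
        for i, j in combinations(range(len(normed)), 2)
    ]
    return 1.0 - sum(sims) / len(sims)

rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(5, 384))  # stand-in document vectors
print(round(avg_pairwise_dissimilarity(toy_embeddings), 3))
```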
Related papers
- CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval [15.757140563856675]
This work introduces a novel task that focuses on suggesting minimal textual modifications needed to explore visually consistent subsets of the collection.
To facilitate the evaluation and development of methods, we present a tailored benchmark named CroQS.
Baseline methods from related fields, such as image captioning and content summarization, are adapted for this task to provide reference performance scores.
arXiv Detail & Related papers (2024-12-18T13:24:09Z)
- Multi-Modal Retrieval For Large Language Model Based Speech Recognition [15.494654232953678]
We propose multi-modal retrieval with two approaches: kNN-LM and cross-attention techniques.
We show that speech-based multi-modal retrieval outperforms text-based retrieval.
We achieve state-of-the-art recognition results on the Spoken-SQuAD question answering dataset.
arXiv Detail & Related papers (2024-06-13T22:55:22Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- Query Rewriting for Retrieval-Augmented Large Language Models [139.242907155883]
Large Language Models (LLMs) serve as powerful, black-box readers in the retrieve-then-read pipeline.
This work introduces a new framework, Rewrite-Retrieve-Read, in place of the previous retrieve-then-read pipeline for retrieval-augmented LLMs.
arXiv Detail & Related papers (2023-05-23T17:27:50Z)
- Zero-Shot Listwise Document Reranking with a Large Language Model [58.64141622176841]
We propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data.
Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker.
arXiv Detail & Related papers (2023-05-03T14:45:34Z)
- Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval [50.882816288076725]
Cross-lingual information retrieval is the task of searching documents in one language with queries in another.
We provide a conceptual framework for organizing different approaches to cross-lingual retrieval, using multi-stage architectures for monolingual retrieval as a scaffold.
We implement simple yet effective reproducible baselines in the Anserini and Pyserini IR toolkits for test collections from the TREC 2022 NeuCLIR Track, in Persian, Russian, and Chinese.
arXiv Detail & Related papers (2023-04-03T14:17:00Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical-matching-based approach achieves similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
- ArgFuse: A Weakly-Supervised Framework for Document-Level Event Argument Aggregation [9.56216681584111]
We introduce the task of Information Aggregation or Argument Aggregation.
Our aim is to filter irrelevant and redundant argument mentions extracted at the sentence level and render a document-level information frame.
We present an extractive algorithm with multiple sieves which adopts active learning strategies to work efficiently in low-resource settings.
arXiv Detail & Related papers (2021-06-21T05:21:27Z)
- Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks, which leverage the query and its description to compute document relevance, Q2ID is a reverse task that aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z)