IITD-DBAI: Multi-Stage Retrieval with Pseudo-Relevance Feedback and
Query Reformulation
- URL: http://arxiv.org/abs/2203.17042v1
- Date: Thu, 31 Mar 2022 14:07:47 GMT
- Title: IITD-DBAI: Multi-Stage Retrieval with Pseudo-Relevance Feedback and
Query Reformulation
- Authors: Shivani Choudhary
- Abstract summary: Resolving contextual dependencies is one of the most challenging tasks in conversational systems.
Our submission produced a mean NDCG@3 better than the median model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Resolving contextual dependencies is one of the most challenging
tasks in conversational systems. Our submission to CAsT-2021 aimed to preserve
the key terms and the context across all subsequent turns and to use classical
information retrieval methods to pull documents as relevant as possible from
the corpus. We participated in the automatic track and submitted two runs to
CAsT-2021. Our submission produced a mean NDCG@3 better than the median model.
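As a concrete illustration of the kind of first-stage setup the abstract describes, here is a minimal sketch of BM25 retrieval with frequency-based pseudo-relevance feedback. It assumes the rank_bm25 package; the toy corpus, whitespace tokenization, and expansion depths are illustrative choices, not the run's actual configuration.

    # Sketch: BM25 first-stage retrieval plus pseudo-relevance feedback (PRF).
    # Assumes `pip install rank_bm25`; corpus and parameters are illustrative.
    from collections import Counter
    from rank_bm25 import BM25Okapi

    corpus = [
        "neural query rewriting for conversational search",
        "pseudo relevance feedback expands queries with feedback terms",
        "bm25 is a classical sparse information retrieval model",
    ]
    tokenized = [doc.split() for doc in corpus]
    bm25 = BM25Okapi(tokenized)

    def retrieve(query_terms, k=3):
        scores = bm25.get_scores(query_terms)
        return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]

    def prf_expand(query_terms, fb_docs=2, fb_terms=3):
        # Add the most frequent terms from the top feedback documents,
        # skipping terms already present in the query.
        counts = Counter(t for i in retrieve(query_terms, k=fb_docs)
                         for t in tokenized[i] if t not in query_terms)
        return query_terms + [t for t, _ in counts.most_common(fb_terms)]

    query = "query expansion".split()
    print(retrieve(prf_expand(query)))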
Related papers
- Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts [67.67746334493302]
Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet they often rely on external context to handle complex tasks.
We propose a tri-encoder sequential retriever that models this process as a Markov Decision Process (MDP).
We show that our method consistently and significantly outperforms baselines, underscoring the importance of explicitly modeling inter-example dependencies.
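The learned tri-encoder itself is not reproducible from the abstract, but the step-by-step decision process can be sketched: treat retrieval as a greedy policy over an MDP-like state (the query plus the examples picked so far). The dot-product scorer and redundancy penalty below are toy stand-ins for the learned encoders.

    # Toy sketch: retrieval as a greedy policy over an MDP-like state
    # (query plus the examples picked so far). The dot-product scorer and
    # redundancy penalty stand in for the paper's learned tri-encoder.
    import numpy as np

    rng = np.random.default_rng(0)
    docs = rng.normal(size=(10, 16))   # candidate context examples
    query = rng.normal(size=16)

    def score(query_vec, picked_vecs, candidate, redundancy=0.5):
        rel = float(candidate @ query_vec)                            # relevance
        red = max((float(candidate @ p) for p in picked_vecs), default=0.0)
        return rel - redundancy * red   # penalize redundant picks

    def retrieve_step_by_step(query_vec, docs, steps=3):
        picked, remaining = [], list(range(len(docs)))
        for _ in range(steps):          # one state transition per step
            best = max(remaining, key=lambda i: score(
                query_vec, [docs[j] for j in picked], docs[i]))
            picked.append(best)
            remaining.remove(best)
        return picked

    print(retrieve_step_by_step(query, docs))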
arXiv Detail & Related papers (2025-04-15T17:35:56Z) - The First Place Solution of WSDM Cup 2024: Leveraging Large Language
Models for Conversational Multi-Doc QA [15.405052113769164]
We introduce our winning approach for the "Conversational Multi-Doc QA" challenge in WSDM Cup 2024.
We first adapt Large Language Models to the task, then devise a hybrid training strategy to make the most of in-domain unlabeled data.
Our solution ranked 1st place in WSDM Cup 2024, surpassing its rivals by a large margin.
arXiv Detail & Related papers (2024-02-28T15:05:43Z) - Sequencing Matters: A Generate-Retrieve-Generate Model for Building
Conversational Agents [9.191944519634111]
This paper reports the work the Georgetown InfoSense group has done to address the challenges presented by TREC iKAT 2023.
Our submitted runs outperform the median runs by a significant margin, exhibiting superior nDCG performance across various cutoffs and a higher overall success rate.
Our solution involves the use of Large Language Models (LLMs) for initial answers, answer grounding by BM25, passage quality filtering by logistic regression, and answer generation by LLMs again.
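The four stages are concrete enough to sketch end to end. In the sketch below, llm() and passes_quality() are hypothetical stubs standing in for the group's LLM calls and logistic-regression filter, and rank_bm25 supplies the BM25 grounding step.

    # Sketch of a generate-retrieve-generate pipeline. `llm` and
    # `passes_quality` are hypothetical stubs, not the group's components.
    from rank_bm25 import BM25Okapi

    def llm(prompt: str) -> str:
        # Stand-in; swap in a real chat-completion client here.
        return "stub answer"

    def passes_quality(passage: str) -> bool:
        # Stand-in for the logistic-regression passage-quality filter.
        return len(passage.split()) > 5

    def answer(question: str, passages: list, k: int = 5) -> str:
        draft = llm(f"Answer briefly: {question}")        # 1) generate a draft
        bm25 = BM25Okapi([p.split() for p in passages])   # 2) retrieve, grounding
        scores = bm25.get_scores((question + " " + draft).split())
        top = sorted(range(len(passages)), key=lambda i: scores[i],
                     reverse=True)[:k]
        kept = [passages[i] for i in top
                if passes_quality(passages[i])]           # 3) filter by quality
        return llm("Context:\n" + "\n".join(kept)         # 4) generate the answer
                   + f"\n\nAnswer: {question}")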
arXiv Detail & Related papers (2023-11-16T02:37:58Z) - Hybrid Retrieval and Multi-stage Text Ranking Solution at TREC 2022 Deep
Learning Track [22.81602641419962]
We explain the hybrid text retrieval and multi-stage text ranking method adopted in our solution.
In the ranking stage, in addition to the full interaction-based ranking model built on a large pre-trained language model, we also propose a lightweight sub-ranking module.
Our models achieve the 1st and 4th rank on the test set of passage ranking and document ranking respectively.
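The summary leaves the fusion details open; one common reading of hybrid retrieval plus a lightweight sub-ranker is linear interpolation of normalized sparse and dense scores followed by a cheap re-scoring pass, sketched here with an illustrative weight.

    # Sketch: hybrid retrieval as linear interpolation of min-max
    # normalized sparse (BM25) and dense scores, then a lightweight
    # sub-ranking pass. The weight alpha is an illustrative choice.
    import numpy as np

    def minmax(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    def hybrid_rank(sparse_scores, dense_scores, alpha=0.5, k=100):
        fused = alpha * minmax(sparse_scores) + (1 - alpha) * minmax(dense_scores)
        return list(np.argsort(-fused)[:k])

    def subrank(candidate_ids, light_scorer):
        # Cheap reordering of the fused top-k, instead of running the
        # full interaction-based ranker on every candidate.
        return sorted(candidate_ids, key=light_scorer, reverse=True)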
arXiv Detail & Related papers (2023-08-23T09:56:59Z) - SSP: Self-Supervised Post-training for Conversational Search [63.28684982954115]
We propose Self-Supervised Post-training (SSP), a new post-training paradigm with three self-supervised tasks to efficiently initialize the conversational search model.
To verify the effectiveness of the proposed method, we apply the conversational encoder post-trained with SSP to the conversational search task on two benchmark datasets: CAsT-19 and CAsT-20.
arXiv Detail & Related papers (2023-07-02T13:36:36Z) - Retrieval as Attention: End-to-end Learning of Retrieval and Reading
within a Single Transformer [80.50327229467993]
We show that a single model trained end-to-end can achieve both competitive retrieval and QA performance.
We show that end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings.
arXiv Detail & Related papers (2022-12-05T04:51:21Z) - D2S: Document-to-Slide Generation Via Query-Based Text Summarization [27.576875048631265]
We first contribute a new dataset, SciDuet, consisting of pairs of papers and their corresponding slide decks from recent years' NLP and ML conferences.
Secondly, we present D2S, a novel system that tackles the document-to-slides task with a two-step approach.
Our evaluation suggests that long-form QA outperforms state-of-the-art summarization baselines on both automated ROUGE metrics and qualitative human evaluation.
arXiv Detail & Related papers (2021-05-08T10:29:41Z) - Leveraging Query Resolution and Reading Comprehension for Conversational
Passage Retrieval [6.490148466525755]
This paper describes the participation of UvA.ILPS group at the TREC CAsT 2020 track.
Our pipeline consists of (i) an initial retrieval module that uses BM25, and (ii) a re-ranking module that combines the score of a BERT ranking model with the score of a machine comprehension model adjusted for passage retrieval.
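That score combination is simple to write down; in the sketch below, bert_score and mc_score are placeholders for the group's trained models, and the interpolation weight is an assumption rather than the paper's tuned value.

    # Sketch of the described re-ranking step: mix a BERT ranking score
    # with a machine-comprehension score over BM25 candidates. The
    # scorers are placeholders and `mu` is an assumed weight.
    def rerank(query, candidates, bert_score, mc_score, mu=0.5):
        def combined(passage):
            return (mu * bert_score(query, passage)
                    + (1 - mu) * mc_score(query, passage))
        return sorted(candidates, key=combined, reverse=True)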
arXiv Detail & Related papers (2021-02-17T14:41:57Z) - Tradeoffs in Sentence Selection Techniques for Open-Domain Question
Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
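A minimal retrieval-based sentence selector of the kind benchmarked here can be sketched with TF-IDF cosine similarity; the scikit-learn vectorizer is an illustrative scorer, not the paper's exact model.

    # Sketch: retrieval-based sentence selection with TF-IDF cosine
    # similarity (scikit-learn); the featurization is illustrative.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def select_sentences(question, sentences, k=3):
        vec = TfidfVectorizer().fit(sentences + [question])
        sims = cosine_similarity(vec.transform([question]),
                                 vec.transform(sentences))[0]
        order = sorted(range(len(sentences)), key=lambda i: sims[i], reverse=True)
        return [sentences[i] for i in order[:k]]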
arXiv Detail & Related papers (2020-09-18T23:39:15Z) - Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
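The decision rule behind term-classification-based resolution is easy to sketch: label each history term as relevant or not, then append the relevant ones to the current turn. The keyword-overlap rule below is a toy stand-in for QuReTeC's bidirectional transformer classifier.

    # Sketch of term-classification-based query resolution: classify each
    # history term as relevant or not, then append the relevant ones to
    # the current turn. The toy rule stands in for QuReTeC's transformer.
    def resolve_query(current_turn, history_turns, classify_term):
        history_terms = {t for turn in history_turns for t in turn.split()}
        relevant = [t for t in history_terms
                    if classify_term(t, current_turn, history_turns)]
        return current_turn.split() + relevant

    def toy_classifier(term, current_turn, history_turns):
        # Keep terms absent from the current turn but recurring in history.
        return (term not in current_turn.split()
                and sum(term in h.split() for h in history_turns) > 1)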
arXiv Detail & Related papers (2020-05-24T11:37:22Z) - Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term
Importance Estimation and Neural Query Rewriting [56.268862325167575]
We tackle conversational passage retrieval (ConvPR) with query reformulation integrated into a multi-stage ad-hoc IR system.
We propose two conversational query reformulation (CQR) methods: (1) term importance estimation and (2) neural query rewriting.
For the former, we expand conversational queries using important terms extracted from the conversational context with frequency-based signals.
For the latter, we reformulate conversational queries into natural, standalone, human-understandable queries with a pretrained sequence-to-sequence model.
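Both reformulation strategies can be sketched compactly. Method (1) below expands the query with frequent context terms; method (2) calls a pretrained sequence-to-sequence model, where the generic t5-small checkpoint is a stand-in for the authors' trained rewriter.

    # Sketch of the two CQR strategies. (1) expands the query with
    # frequent context terms; (2) rewrites it with a pretrained seq2seq
    # model ("t5-small" is a generic stand-in, not the authors' model).
    from collections import Counter
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    def expand_by_term_importance(query, context_turns, n_terms=3):
        counts = Counter(t for turn in context_turns for t in turn.split()
                         if t not in query.split())
        return query + " " + " ".join(t for t, _ in counts.most_common(n_terms))

    def neural_rewrite(query, context_turns, model_name="t5-small"):
        tok = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
        prompt = " ||| ".join(list(context_turns) + [query])
        ids = tok(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=32)
        return tok.decode(out[0], skip_special_tokens=True)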
arXiv Detail & Related papers (2020-05-05T14:30:20Z) - A Study on Efficiency, Accuracy and Document Structure for Answer
Sentence Selection [112.0514737686492]
In this paper, we argue that by exploiting the intrinsic structure of the original rank together with an effective word-relatedness encoder, we can achieve competitive results.
Our model takes 9.5 seconds to train on the WikiQA dataset, i.e., very fast in comparison with the ~18 minutes required by a standard BERT-base fine-tuning.
arXiv Detail & Related papers (2020-03-04T22:12:18Z)