Knowledge-Aware Diverse Reranking for Cross-Source Question Answering
- URL: http://arxiv.org/abs/2506.20476v1
- Date: Wed, 25 Jun 2025 14:23:21 GMT
- Title: Knowledge-Aware Diverse Reranking for Cross-Source Question Answering
- Authors: Tong Zhou
- Abstract summary: This paper presents Team Marikarp's solution for the SIGIR 2025 LiveRAG competition. The competition's evaluation set, automatically generated by DataMorgana from internet corpora, encompassed a wide range of target topics. Our proposed knowledge-aware diverse reranking RAG pipeline achieved first place in the competition.
- Score: 9.788039182463768
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents Team Marikarp's solution for the SIGIR 2025 LiveRAG competition. The competition's evaluation set, automatically generated by DataMorgana from internet corpora, encompassed a wide range of target topics, question types, question formulations, audience types, and knowledge organization methods. It offered a fair evaluation of retrieving question-relevant supporting documents from a 15M documents subset of the FineWeb corpus. Our proposed knowledge-aware diverse reranking RAG pipeline achieved first place in the competition.
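The abstract does not specify how the reranker enforces diversity; a common way to make a reranking stage "diverse" is maximal marginal relevance (MMR), which trades off query relevance against redundancy with already-selected documents. The sketch below is an illustration of that general idea, not the team's actual method; the function name, score inputs, and the choice of MMR itself are assumptions.

```python
def mmr_rerank(query_sim, doc_sims, k=5, lamb=0.7):
    """Select k document indices by maximal marginal relevance:
    balance relevance to the query (query_sim[i]) against redundancy
    with documents already chosen (doc_sims[i][j]).

    query_sim: per-document relevance scores to the query
    doc_sims:  symmetric document-document similarity matrix
    lamb:      relevance/diversity trade-off in [0, 1]
    """
    selected, remaining = [], list(range(len(query_sim)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Redundancy = similarity to the closest already-selected doc.
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lamb * query_sim[i] - (1 - lamb) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lamb near 1 this reduces to plain relevance ranking; lowering it penalizes near-duplicate supporting documents, which matters when the retrieval pool (here, a 15M-document FineWeb subset) contains many redundant pages.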
Related papers
- GETALP@AutoMin 2025: Leveraging RAG to Answer Questions based on Meeting Transcripts [0.18846515534317265]
This paper documents GETALP's submission to the Third Run of the Automatic Minuting Shared Task at SIGDial 2025. Our method is based on a retrieval augmented generation (RAG) system and Abstract Meaning Representations (AMR). Our results show that incorporating AMR leads to high-quality responses for approximately 35% of the questions.
arXiv Detail & Related papers (2025-08-01T09:51:05Z) - PreQRAG -- Classify and Rewrite for Enhanced RAG [1.652907918484303]
We introduce PreQRAG, a Retrieval Augmented Generation architecture designed to improve retrieval and generation quality. PreQRAG incorporates a pipeline that first classifies each input question as either single-document or multi-document type. For single-document questions, we employ question rewriting techniques to improve retrieval precision and generation relevance. For multi-document questions, we decompose complex queries into focused sub-questions that can be processed more effectively.
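The PreQRAG abstract describes a classify-then-rewrite control flow but not its implementation; the sketch below only illustrates that routing logic. Every callable (classifier, rewriter, decomposer, retriever, generator) is a hypothetical stand-in for a component the paper would realize with models.

```python
def preqrag_route(question, classify, rewrite, decompose, retrieve, generate):
    """Route a question through the PreQRAG-style pipeline described above.

    classify(q)  -> "single" or "multi"   (question-type classifier)
    rewrite(q)   -> rewritten query       (single-document branch)
    decompose(q) -> list of sub-questions (multi-document branch)
    retrieve(q)  -> list of documents
    generate(q, context) -> answer string
    """
    if classify(question) == "single":
        # Rewrite once, retrieve with the sharpened query, answer directly.
        rewritten = rewrite(question)
        return generate(question, retrieve(rewritten))
    # Decompose, answer each sub-question, then synthesize a final answer.
    sub_answers = [generate(sq, retrieve(sq)) for sq in decompose(question)]
    return generate(question, sub_answers)
```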
arXiv Detail & Related papers (2025-06-20T22:02:05Z) - NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA [49.74911193222192]
The competition introduced a dataset of real invoice documents, along with associated questions and answers. The base model is a multi-modal generative language model, and sensitive information could be exposed through either the visual or textual input modality. Participants proposed elegant solutions to reduce communication costs while maintaining a minimum utility threshold.
arXiv Detail & Related papers (2024-11-06T07:51:19Z) - Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
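The coverage measure above can be stated simply: the fraction of a question's sub-questions that the generated answer addresses. The sketch below assumes a hypothetical `is_addressed` judge (in the paper's setting this would be a model-based check, not substring matching); the function names are illustrative, not from the paper.

```python
def subquestion_coverage(answer, sub_questions, is_addressed):
    """Fraction of sub-questions that the answer addresses.

    answer:        generated answer string
    sub_questions: facets of the original question (core/background/follow-up)
    is_addressed:  judge callable (answer, sub_question) -> bool; in practice
                   a hypothetical LLM or entailment model, stubbed here
    """
    if not sub_questions:
        return 1.0  # Vacuously covered: nothing to address.
    hits = sum(1 for sq in sub_questions if is_addressed(answer, sq))
    return hits / len(sub_questions)
```

Averaging this score over a question set, optionally restricted to core sub-questions, yields the kind of per-engine comparison the paper reports.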
arXiv Detail & Related papers (2024-10-20T22:59:34Z) - ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
We study out-of-scope questions, where the retrieved document appears semantically similar to the question but lacks the necessary information to answer it. We propose a guided hallucination-based approach, ELOQ, to automatically generate a diverse set of out-of-scope questions from post-cutoff documents.
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval [5.69361786082969]
Our study focuses on the open-domain QA setting, where the key challenge is to first uncover relevant evidence in large knowledge bases.
By utilizing the common retrieve-then-read QA pipeline and PubMed as a trustworthy collection of medical research documents, we answer health questions from three diverse datasets.
Our results reveal that cutting down on the amount of retrieved documents and favoring more recent and highly cited documents can improve the final macro F1 score up to 10%.
arXiv Detail & Related papers (2024-04-12T09:56:12Z) - NTIRE 2021 Multi-modal Aerial View Object Classification Challenge [88.89190054948325]
We introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR.
This challenge is composed of two different tracks using EO and SAR imagery.
We discuss the top methods submitted for this competition and evaluate their results on our blind test set.
arXiv Detail & Related papers (2021-07-02T16:55:08Z) - DeeperForensics Challenge 2020 on Real-World Face Forgery Detection: Methods and Results [144.5252578415748]
This paper reports methods and results in the DeeperForensics Challenge 2020 on real-world face forgery detection.
The challenge employs the DeeperForensics-1.0 dataset, with 60,000 videos constituted by a total of 17.6 million frames.
A total of 115 participants registered for the competition, and 25 teams made valid submissions.
arXiv Detail & Related papers (2021-02-18T16:48:57Z) - A Clarifying Question Selection System from NTES_ALONG in Convai3 Challenge [8.656503175492375]
This paper presents the participation of NetEase Game AI Lab team for the ClariQ challenge at Search-oriented Conversational AI (SCAI) EMNLP workshop in 2020.
The challenge asks for a complete conversational information retrieval system that can understand and generate clarification questions.
We propose a clarifying question selection system which consists of response understanding, candidate question recalling and clarifying question ranking.
arXiv Detail & Related papers (2020-10-27T11:22:53Z) - CAiRE-COVID: A Question Answering and Query-focused Multi-Document Summarization System for COVID-19 Scholarly Information Management [48.251211691263514]
We present CAiRE-COVID, a real-time question answering (QA) and multi-document summarization system, which won one of the 10 tasks in the Kaggle COVID-19 Open Research dataset Challenge.
Our system aims to tackle the recent challenge of mining the numerous scientific articles being published on COVID-19 by answering high priority questions from the community.
arXiv Detail & Related papers (2020-05-04T15:07:27Z) - Recognizing Families In the Wild: White Paper for the 4th Edition Data Challenge [91.55319616114943]
This paper summarizes the supported tasks (i.e., kinship verification, tri-subject verification, and search & retrieval of missing children) in the Recognizing Families In the Wild (RFIW) evaluation.
The purpose of this paper is to describe the 2020 RFIW challenge, end-to-end, along with forecasts in promising future directions.
arXiv Detail & Related papers (2020-02-15T02:22:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.