AUEB-Archimedes at RIRAG-2025: Is obligation concatenation really all you need?
- URL: http://arxiv.org/abs/2412.11567v1
- Date: Mon, 16 Dec 2024 08:54:21 GMT
- Title: AUEB-Archimedes at RIRAG-2025: Is obligation concatenation really all you need?
- Authors: Ioannis Chasandras, Odysseas S. Chlapanis, Ion Androutsopoulos
- Abstract summary: This paper presents the systems we developed for RIRAG-2025, a shared task that requires answering regulatory questions by retrieving relevant passages. The generated answers are evaluated using RePASs, a reference-free and model-based metric. We show that by exploiting a neural component of RePASs that extracts important sentences ('obligations') from the retrieved passages, we achieve a dubiously high score (0.947). We then show that by selecting the answer with the best RePASs among a few generated alternatives, we can generate readable, coherent answers that achieve a more plausible and relatively high score (0.639).
- Score: 11.172264842171682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents the systems we developed for RIRAG-2025, a shared task that requires answering regulatory questions by retrieving relevant passages. The generated answers are evaluated using RePASs, a reference-free and model-based metric. Our systems use a combination of three retrieval models and a reranker. We show that by exploiting a neural component of RePASs that extracts important sentences ('obligations') from the retrieved passages, we achieve a dubiously high score (0.947), even though the answers are directly extracted from the retrieved passages and are not actually generated answers. We then show that by selecting the answer with the best RePASs among a few generated alternatives and then iteratively refining this answer by reducing contradictions and covering more obligations, we can generate readable, coherent answers that achieve a more plausible and relatively high score (0.639).
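A minimal sketch of the selection-and-refinement loop the abstract describes, assuming hypothetical `llm` and `repass_score` helpers (the authors' actual prompts and the RePASs implementation are not reproduced here):

```python
# Hedged sketch: best-of-N answer selection by RePASs, then iterative
# refinement. `llm` and `repass_score` are stand-ins, not the paper's code.

def best_of_n_with_refinement(question, passages, llm, repass_score,
                              n_candidates=5, max_rounds=3):
    """Pick the candidate answer with the highest RePASs score, then refine
    it to reduce contradictions and cover more obligations."""
    # 1) Generate a few alternative answers and keep the best-scoring one.
    candidates = [llm.generate(question, passages) for _ in range(n_candidates)]
    best = max(candidates, key=lambda a: repass_score(a, passages))
    best_score = repass_score(best, passages)

    # 2) Ask the LLM to fix contradictions / add missed obligations.
    for _ in range(max_rounds):
        revised = llm.refine(best, passages)
        revised_score = repass_score(revised, passages)
        if revised_score <= best_score:  # keep a revision only if it helps
            break
        best, best_score = revised, revised_score
    return best, best_score
```

The key design choice is that a revision is kept only when its RePASs score improves, so refinement cannot degrade the retained answer.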
Related papers
- Question Decomposition for Retrieval-Augmented Generation [2.6409776648054764]
We propose a RAG pipeline that incorporates question decomposition into sub-questions. We show that question decomposition effectively assembles complementary documents, while reranking reduces noise. Although reranking itself is standard, we show that pairing an off-the-shelf cross-encoder reranker with LLM-driven question decomposition bridges the retrieval gap on multi-hop questions.
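A hedged sketch of such a decompose-retrieve-rerank pipeline; `llm_decompose` and `retrieve` are stand-ins, while the cross-encoder is a real off-the-shelf model of the kind the summary mentions:

```python
from sentence_transformers import CrossEncoder

# Off-the-shelf cross-encoder reranker.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def decomposed_retrieval(question, llm_decompose, retrieve, top_k=5):
    """Decompose the question, retrieve per sub-question, pool the
    candidates, and rerank the pooled set against the original question."""
    sub_questions = llm_decompose(question)  # stand-in: LLM returns a list of strings
    pool = {doc for sq in sub_questions for doc in retrieve(sq)}  # union removes duplicates
    scores = reranker.predict([(question, doc) for doc in pool])
    ranked = sorted(zip(pool, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```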
arXiv Detail & Related papers (2025-07-01T01:01:54Z) - Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles [51.0691253204425]
We introduce a retrieval approach leveraging Support Vector Regression ensembles, bootstrap aggregation (bagging), and embedding spaces on the German dataset for Legal Information Retrieval (GerDaLIR).
We show improved recall over the baselines using our voting ensemble, suggesting promising initial results, without training or fine-tuning any deep learning models.
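A rough sketch of the bagged-SVR idea on synthetic data; the GerDaLIR features and targets are assumptions, and scikit-learn's averaging over bootstrap estimators stands in for the paper's voting:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))  # e.g. query-document embedding features
y = rng.uniform(size=500)       # relevance targets

# Bootstrap-aggregated SVRs: each regressor sees a resampled subset.
model = BaggingRegressor(estimator=SVR(kernel="rbf"), n_estimators=10,
                         max_samples=0.8, random_state=0).fit(X, y)

# Rank candidate documents for one query by the ensemble's averaged score.
candidates = rng.normal(size=(20, 64))
ranking = np.argsort(-model.predict(candidates))
```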
arXiv Detail & Related papers (2025-01-09T07:21:44Z) - Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation [41.43397783169612]
Open-domain Question Answering (QA) has garnered substantial interest in combining faithfully retrieved passages with relevant passages generated by Large Language Models (LLMs).
However, there are no definitive labels available for pairing these two sources of knowledge.
We propose Bi-Reranking for Merging Generated and Retrieved Knowledge (BRMGR), which utilizes re-ranking methods for both retrieved passages and LLM-generated passages.
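One plausible reading of the bi-reranking step, greatly simplified; `rerank_score` is a stand-in, and the interleaving merge below is not claimed to be BRMGR's exact combination rule:

```python
from itertools import zip_longest

def bi_rerank_merge(question, retrieved, generated, rerank_score, top_k=5):
    """Rerank retrieved and LLM-generated passages separately, then
    interleave the two ranked lists to combine both knowledge sources."""
    r = sorted(retrieved, key=lambda p: rerank_score(question, p), reverse=True)
    g = sorted(generated, key=lambda p: rerank_score(question, p), reverse=True)
    merged = [p for pair in zip_longest(r, g) for p in pair if p is not None]
    return merged[:top_k]
```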
arXiv Detail & Related papers (2024-12-25T06:40:36Z) - Evidence Contextualization and Counterfactual Attribution for Conversational QA over Heterogeneous Data with RAG Systems [4.143039012104666]
Retrieval Augmented Generation (RAG) works as a backbone for interacting with an enterprise's own data via Conversational Question Answering (ConvQA). In this work, we demonstrate RAGONITE, a RAG system that remedies two common concerns, under-specified evidence and opaque answer attribution, by: (i) contextualizing evidence with source metadata and surrounding text; and (ii) computing counterfactual attribution.
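A simple sketch of point (i), evidence contextualization; the field names are invented for illustration:

```python
# Prepend source metadata and neighbouring text to a bare evidence snippet
# so the generator sees a self-contained passage (field names hypothetical).
def contextualize(evidence):
    return (f"[source: {evidence['doc_title']}, section: {evidence['section']}]\n"
            f"{evidence['preceding_text']} {evidence['text']} "
            f"{evidence['following_text']}")
```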
arXiv Detail & Related papers (2024-12-13T21:28:17Z) - Bridging Relevance and Reasoning: Rationale Distillation in Retrieval-Augmented Generation [43.50677378728461]
We propose RADIO, a novel and practical preference alignment framework with RAtionale DIstillatiOn. We first propose a rationale extraction method that leverages the reasoning capabilities of Large Language Models (LLMs) to extract the rationales necessary for answering the query. Subsequently, a rationale-based alignment process is designed to rerank the documents based on the extracted rationales, and fine-tune the reranker to align the preferences.
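A hedged sketch of the rationale-based reranking step; the prompt and `sim` scorer are illustrative, and RADIO's fine-tuning objective is not reproduced:

```python
# Rank documents by how well they support the rationale an answer would
# need, rather than by raw query similarity. `llm` and `sim` are stand-ins.
def rationale_rerank(query, docs, llm, sim):
    rationale = llm.generate(f"What facts are needed to answer: {query}?")
    return sorted(docs, key=lambda d: sim(rationale, d), reverse=True)
```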
arXiv Detail & Related papers (2024-12-11T16:32:41Z) - RAG-based Question Answering over Heterogeneous Data and Text [23.075485587443485]
This article presents the QUASAR system for question answering over unstructured text, structured tables, and knowledge graphs. The system adopts a RAG-based architecture, with a pipeline of evidence retrieval followed by answer generation, with the latter powered by a moderate-sized language model. Experiments with three different benchmarks demonstrate the high answering quality of our approach, being on par with or better than large GPT models.
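One way such a pipeline can treat heterogeneous sources uniformly is to verbalize structured evidence into text before retrieval; a sketch under that assumption (QUASAR's actual pipeline stages are richer):

```python
# Verbalize table rows and KG facts into plain text so one retriever and
# one generator can handle all three evidence types (a simplification).
def verbalize(evidence):
    if evidence["type"] == "table_row":
        return "; ".join(f"{col} is {val}" for col, val in evidence["row"].items())
    if evidence["type"] == "kg_fact":
        s, p, o = evidence["triple"]
        return f"{s} {p} {o}."
    return evidence["text"]  # plain text passes through unchanged

def answer(question, retrieve, generate):
    evidence = [verbalize(e) for e in retrieve(question)]
    return generate(question, evidence)  # moderate-sized LM conditioned on evidence
```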
arXiv Detail & Related papers (2024-12-10T11:18:29Z) - Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
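A minimal sketch of the coverage measure, assuming stand-in `decompose` and `is_addressed` components (e.g. an LLM or an entailment model):

```python
# Fraction of a question's sub-questions that the answer addresses.
def subquestion_coverage(question, answer, decompose, is_addressed):
    subs = decompose(question)  # e.g. core / background / follow-up facets
    covered = sum(is_addressed(answer, sq) for sq in subs)
    return covered / len(subs) if subs else 1.0
```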
arXiv Detail & Related papers (2024-10-20T22:59:34Z) - BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents.
Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding.
We show that incorporating explicit reasoning about the query improves retrieval performance by up to 12.2 points.
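A sketch of what "explicit reasoning about the query" can look like at retrieval time; the prompt wording is an assumption:

```python
# Have an LLM reason about the query first, then retrieve with the
# reasoning-expanded query text.
def reason_then_retrieve(query, llm, retrieve):
    reasoning = llm.generate(
        f"Think step by step about what knowledge is needed to answer: {query}")
    return retrieve(f"{query}\n{reasoning}")  # expanded query for the retriever
```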
arXiv Detail & Related papers (2024-07-16T17:58:27Z) - SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [85.54906813106683]
We propose a simple yet effective framework to enhance open-domain question answering (ODQA) with large language models (LLMs).
SuRe (Summarized Retrieval) helps LLMs predict more accurate answers for a given question, answers that are well supported by summaries of the retrieved passages.
Experimental results on diverse ODQA benchmarks demonstrate the superiority of SuRe, with improvements of up to 4.6% in exact match (EM) and 4.0% in F1 score over standard prompting approaches.
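A hedged sketch of the candidate-then-summarize loop; the prompts and `support_score` are stand-ins, not the paper's implementation:

```python
# Draft candidate answers, summarize the retrieved passages in support of
# each, and keep the answer whose summary looks best supported.
def sure_answer(question, passages, llm, support_score, n_candidates=3):
    candidates = llm.generate_candidates(question, passages, n=n_candidates)
    summaries = {a: llm.summarize(passages, focus=f"evidence for answer '{a}'")
                 for a in candidates}
    return max(candidates, key=lambda a: support_score(question, a, summaries[a]))
```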
arXiv Detail & Related papers (2024-04-17T01:15:54Z) - Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval [9.136948771060895]
We evaluate two methods for further improvement in this setting.
Both focus on combining rationales generated by a larger Language Model with longer contexts created from a multi-hop dense retrieval system.
Our single best Reasoning model materially improves upon strong comparable prior baselines for unseen evaluation datasets.
arXiv Detail & Related papers (2023-08-09T05:06:39Z) - Adapting Neural Link Predictors for Data-Efficient Complex Query Answering [45.961111441411084]
We propose a parameter-efficient score adaptation model optimised to re-calibrate neural link prediction scores for the complex query answering task.
CQD$^{\mathcal{A}}$ produces significantly more accurate results than current state-of-the-art methods.
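A tiny PyTorch sketch in the spirit of parameter-efficient score re-calibration: a learned affine transform over frozen link-prediction scores (the real adaptation function and training setup differ):

```python
import torch

class ScoreAdapter(torch.nn.Module):
    """Two-parameter re-calibration of frozen link predictor scores before
    they enter the complex-query aggregator (an illustrative simplification)."""
    def __init__(self):
        super().__init__()
        self.alpha = torch.nn.Parameter(torch.ones(1))   # scale
        self.beta = torch.nn.Parameter(torch.zeros(1))   # offset

    def forward(self, link_scores):
        return torch.sigmoid(self.alpha * link_scores + self.beta)
```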
arXiv Detail & Related papers (2023-01-29T00:17:16Z) - Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems [71.33737787564966]
End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to fall into the so-called 'likelihood trap': the most likely responses are often generic and dull.
We propose a reranking method which aims to select high-quality items from the lists of responses initially overgenerated by the system.
Our methods improve a state-of-the-art E2E ToD system by 2.4 BLEU, 3.2 ROUGE, and 2.8 METEOR scores, achieving new peak results.
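A sketch of the overgenerate-then-rerank recipe; the sampling parameters and `ranker` are placeholders, and the paper's reranker is trained rather than heuristic:

```python
# Sample many responses instead of trusting the single most likely one,
# then let a quality ranker pick, side-stepping the likelihood trap.
def overgenerate_and_rerank(dialogue_state, system, ranker, n=10):
    responses = [system.sample(dialogue_state, temperature=0.9) for _ in range(n)]
    return max(responses, key=lambda r: ranker(dialogue_state, r))
```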
arXiv Detail & Related papers (2022-11-07T15:59:49Z) - Joint Passage Ranking for Diverse Multi-Answer Retrieval [56.43443577137929]
We study multi-answer retrieval, an under-explored problem that requires retrieving passages to cover multiple distinct answers for a question.
This task requires joint modeling of retrieved passages, as models should not repeatedly retrieve passages containing the same answer at the cost of missing a different valid answer.
In this paper, we introduce JPR, a joint passage retrieval model focusing on reranking. To model the joint probability of the retrieved passages, JPR makes use of an autoregressive reranker that selects a sequence of passages, equipped with novel training and decoding algorithms.
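A greedy, MMR-style stand-in for the autoregressive reranker: select passages sequentially while penalizing similarity to those already chosen, so distinct answers get covered (a simplification, not JPR itself):

```python
# Greedy sequential selection trading off relevance against redundancy.
# `rel` and `sim` are stand-in scoring functions.
def select_diverse(query, passages, rel, sim, k=5, lam=0.7):
    selected = []
    pool = list(passages)
    while pool and len(selected) < k:
        best = max(pool, key=lambda p: lam * rel(query, p)
                   - (1 - lam) * max((sim(p, s) for s in selected), default=0.0))
        selected.append(best)
        pool.remove(best)
    return selected
```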
arXiv Detail & Related papers (2021-04-17T04:48:36Z)