Question Answering for Multi-Release Systems: A Case Study at Ciena
- URL: http://arxiv.org/abs/2601.02345v1
- Date: Mon, 05 Jan 2026 18:44:26 GMT
- Title: Question Answering for Multi-Release Systems: A Case Study at Ciena
- Authors: Parham Khamsepour, Mark Cole, Ish Ashraf, Sandeep Puri, Mehrdad Sabetzadeh, Shiva Nejati
- Abstract summary: Question answering over documents from multi-release systems poses challenges because different releases have distinct yet overlapping documentation. Motivated by the observed inaccuracy of state-of-the-art question-answering techniques on multi-release system documents, we propose QAMR. QAMR enhances traditional retrieval-augmented generation (RAG) to ensure accuracy in the face of highly similar yet distinct documentation for different releases.
- Score: 1.3252590516094356
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Companies regularly have to contend with multi-release systems, where several versions of the same software are in operation simultaneously. Question answering over documents from multi-release systems poses challenges because different releases have distinct yet overlapping documentation. Motivated by the observed inaccuracy of state-of-the-art question-answering techniques on multi-release system documents, we propose QAMR, a chatbot designed to answer questions across multi-release system documentation. QAMR enhances traditional retrieval-augmented generation (RAG) to ensure accuracy in the face of highly similar yet distinct documentation for different releases. It achieves this through a novel combination of pre-processing, query rewriting, and context selection. In addition, QAMR employs a dual-chunking strategy to enable separately tuned chunk sizes for retrieval and answer generation, improving overall question-answering accuracy. We evaluate QAMR using a public software-engineering benchmark as well as a collection of real-world, multi-release system documents from our industry partner, Ciena. Our evaluation yields five main findings: (1) QAMR outperforms a baseline RAG-based chatbot, achieving an average answer correctness of 88.5% and an average retrieval accuracy of 90%, which correspond to improvements of 16.5% and 12%, respectively. (2) An ablation study shows that QAMR's mechanisms for handling multi-release documents directly improve answer accuracy. (3) Compared to its component-ablated variants, QAMR achieves a 19.6% average gain in answer correctness and a 14.0% average gain in retrieval accuracy over the best ablation. (4) QAMR reduces response time by 8% on average relative to the baseline. (5) The automatically computed accuracy metrics used in our evaluation strongly correlate with expert human assessments, validating the reliability of our methodology.
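The dual-chunking strategy is the most implementation-oriented part of the abstract, and a minimal sketch helps make it concrete. The sketch below assumes a small-to-big design in which fine-grained chunks are matched against the query while their enclosing larger chunks are handed to the LLM; the chunk sizes, the token-overlap scorer, and all identifiers are illustrative assumptions, since QAMR's internals are not published here.

```python
# Minimal dual-chunking sketch (assumed small-to-big design, not QAMR's code):
# small chunks are indexed for retrieval; each maps to a larger parent chunk
# that is returned as generation context.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    parent_id: int  # index of the larger generation-time chunk

def split(text: str, size: int) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(doc: str, retrieval_size: int = 20, generation_size: int = 100):
    parents = split(doc, generation_size)            # large chunks for the prompt
    chunks = [Chunk(piece, pid)
              for pid, parent in enumerate(parents)
              for piece in split(parent, retrieval_size)]  # small chunks for matching
    return parents, chunks

def retrieve(query: str, parents: list[str], chunks: list[Chunk], k: int = 2) -> list[str]:
    q = set(query.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q & set(c.text.lower().split())),
                    reverse=True)
    seen, context = set(), []
    for c in ranked:
        if c.parent_id not in seen:                  # de-duplicate parents
            seen.add(c.parent_id)
            context.append(parents[c.parent_id])
        if len(context) == k:
            break
    return context
```

Tuning `retrieval_size` and `generation_size` independently is the point of the strategy: small chunks sharpen matching against short queries, while large chunks keep enough surrounding context for answer generation.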
Related papers
- Rethinking Retrieval: From Traditional Retrieval Augmented Generation to Agentic and Non-Vector Reasoning Systems in the Financial Domain for Large Language Models [0.0]
We present the first systematic evaluation comparing vector-based agentic RAG, using hybrid search and metadata filtering, against non-vector reasoning systems for financial question answering. We measure retrieval metrics (MRR, Recall@5), answer quality through LLM-as-a-judge pairwise comparisons, latency, and preprocessing costs. Our findings reveal that applying advanced RAG techniques to financial Q&A systems improves retrieval accuracy and answer quality, with cost-performance tradeoffs to consider in production.
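For reference, the two retrieval metrics named here are straightforward to compute. The sketch below assumes one gold document per query, which is a simplifying assumption rather than a detail from the abstract.

```python
# Mean Reciprocal Rank and Recall@k over per-query rankings of document ids.
def mrr(rankings: list[list[str]], gold: list[str]) -> float:
    total = 0.0
    for ranking, g in zip(rankings, gold):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id == g:
                total += 1.0 / rank   # reciprocal rank of first relevant hit
                break
    return total / len(gold)

def recall_at_k(rankings: list[list[str]], gold: list[str], k: int = 5) -> float:
    return sum(g in ranking[:k] for ranking, g in zip(rankings, gold)) / len(gold)
```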
arXiv Detail & Related papers (2025-11-22T20:06:25Z)
- Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z)
- Pathways of Thoughts: Multi-Directional Thinking for Long-form Personalized Question Answering [57.12316804290369]
Personalization is essential for adapting question answering systems to user-specific information needs. We propose Pathways of Thoughts (PoT), an inference-stage method that applies to any large language model (LLM) without requiring task-specific fine-tuning. PoT consistently outperforms competitive baselines, achieving up to a 13.1% relative improvement.
arXiv Detail & Related papers (2025-09-23T14:44:46Z)
- Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations [49.671779378073886]
We study question answering in the domain of radio regulations. We propose a telecom-specific Retrieval-Augmented Generation (RAG) pipeline. Our approach consistently improves generation accuracy across all tested models.
arXiv Detail & Related papers (2025-09-11T17:43:42Z)
- RAGentA: Multi-Agent Retrieval-Augmented Generation for Attributed Question Answering [4.224843546370802]
We present RAGentA, a framework for attributed question answering with large language models (LLMs). With the goal of trustworthy answer generation, RAGentA focuses on optimizing answer correctness, defined by coverage of and relevance to the question, as well as faithfulness. Central to the framework is a hybrid retrieval strategy that combines sparse and dense methods, improving Recall@20 by 12.5% compared to the best single retrieval model.
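The abstract does not say how RAGentA fuses the sparse and dense rankings; reciprocal rank fusion (RRF) is one common choice and serves as an illustrative stand-in below.

```python
# Reciprocal rank fusion of two rankings (e.g., BM25 and a dense retriever).
def reciprocal_rank_fusion(sparse: list[str], dense: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (sparse, dense):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)  # best-fused first
```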
arXiv Detail & Related papers (2025-06-20T13:37:03Z)
- Vendi-RAG: Adaptively Trading-Off Diversity And Quality Significantly Improves Retrieval Augmented Generation With LLMs [2.992602379681373]
Vendi-RAG is a framework based on an iterative process that jointly optimizes retrieval diversity and answer quality. Vendi-RAG leverages the Vendi Score (VS), a flexible similarity-based diversity metric, to promote semantic diversity in document retrieval. Vendi-RAG achieves significant accuracy improvements over traditional single-step and multi-step RAG approaches.
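The Vendi Score itself has a compact published definition: the exponential of the Shannon entropy of the eigenvalues of a normalized similarity matrix. How Vendi-RAG wires it into the retrieval loop is not specified in this summary, but the metric can be computed as follows.

```python
import numpy as np

def vendi_score(K: np.ndarray) -> float:
    """Vendi Score of an n x n PSD similarity matrix K with unit diagonal
    (e.g., cosine similarities of candidate-document embeddings). Returns the
    effective number of distinct items: 1 if all identical, n if all distinct."""
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]              # drop numerical zeros
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))
```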
arXiv Detail & Related papers (2025-02-16T18:46:10Z)
- MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation [34.66546005629471]
Large Language Models (LLMs) are essential tools for various natural language processing tasks but often suffer from generating outdated or incorrect information. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating external, real-time information retrieval to ground LLM responses. To tackle this problem, we propose Multi-Agent Filtering Retrieval-Augmented Generation (MAIN-RAG). MAIN-RAG is a training-free RAG framework that leverages multiple LLM agents to collaboratively filter and score retrieved documents.
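A hedged sketch of the filter-and-score idea follows; `judge` stands in for an LLM call, and the mean-score threshold is an illustrative guess at the paper's adaptive filtering, not its actual rule.

```python
from statistics import mean
from typing import Callable

# Each judge maps (question, document) to a relevance score, e.g. via an LLM.
Judge = Callable[[str, str], float]

def filter_documents(question: str, docs: list[str], judges: list[Judge]) -> list[str]:
    scored = [(mean(j(question, d) for j in judges), d) for d in docs]  # agent votes
    threshold = mean(score for score, _ in scored)                      # adaptive cut-off
    return [d for score, d in sorted(scored, reverse=True) if score >= threshold]
```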
arXiv Detail & Related papers (2024-12-31T08:07:26Z)
- Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU [3.1787418271023404]
We designed a Retrieval-Augmented Generation (RAG) system to provide large language models with relevant documents for answering domain-specific questions.
We extracted over 1,800 subpages using a greedy scraping strategy and employed a hybrid annotation process, combining manual and Mistral-generated question-answer pairs.
Our RAG framework integrates BM25 and FAISS retrievers, enhanced with a reranker for improved document retrieval accuracy.
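A minimal version of such a pipeline can be assembled from common open-source components; the model names and the union-then-rerank candidate handling below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
import faiss
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

docs = ["CMU was founded in 1900.",
        "Pittsburgh sits at the confluence of three rivers."]
encoder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

bm25 = BM25Okapi([d.lower().split() for d in docs])          # sparse index
emb = encoder.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])                      # dense index
index.add(emb)

def retrieve(query: str, k: int = 5) -> list[str]:
    k = min(k, len(docs))
    sparse = np.argsort(bm25.get_scores(query.lower().split()))[::-1][:k]
    _, dense = index.search(encoder.encode([query], normalize_embeddings=True), k)
    candidates = sorted({*sparse.tolist(), *dense[0].tolist()})  # union of hits
    scores = reranker.predict([(query, docs[i]) for i in candidates])
    return [docs[candidates[i]] for i in np.argsort(scores)[::-1]]
```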
arXiv Detail & Related papers (2024-11-20T20:10:43Z)
- ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
We study out-of-scope questions, where the retrieved document appears semantically similar to the question but lacks the necessary information to answer it. We propose a guided hallucination-based approach, ELOQ, to automatically generate a diverse set of out-of-scope questions from post-cutoff documents.
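The generation step can be pictured as a guided prompt over a document the model has not seen in training; the wording below is an illustrative assumption, not ELOQ's actual prompt.

```python
# Hypothetical prompt builder for out-of-scope question generation; pair it
# with any LLM completion call.
def out_of_scope_prompt(document: str) -> str:
    return (
        "Read the document below, then write a question that is on the same "
        "topic and uses its terminology, but CANNOT be answered from the "
        "document alone.\n\nDocument:\n" + document + "\n\nQuestion:"
    )
```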
arXiv Detail & Related papers (2024-10-18T16:11:29Z)
- SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation).
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
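The core idea, scoring a candidate against multiple positive and negative references, can be sketched as follows; the token-overlap similarity and the max-margin decision are illustrative stand-ins for SQuArE's actual scoring.

```python
def token_f1(a: str, b: str) -> float:
    """F1 over unique lowercase tokens; a crude stand-in for a learned similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    common = len(ta & tb)
    if common == 0:
        return 0.0
    p, r = common / len(ta), common / len(tb)
    return 2 * p * r / (p + r)

def square_like_score(candidate: str, positives: list[str], negatives: list[str]) -> float:
    pos = max(token_f1(candidate, ref) for ref in positives)
    neg = max(token_f1(candidate, ref) for ref in negatives)
    return pos - neg   # > 0 suggests the candidate is closer to correct answers
```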
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
- Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics [74.28810048824519]
Question answering-based summarization evaluation metrics must automatically determine whether the QA model's prediction is correct.
We benchmark the lexical answer verification methods that have been used by current QA-based metrics, as well as two more sophisticated text comparison methods.
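For context, the lexical baseline being benchmarked is typically SQuAD-style normalization plus exact match, as sketched below; the more sophisticated comparison methods the paper evaluates would replace `verify`.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop articles and punctuation, collapse whitespace (SQuAD-style)."""
    text = re.sub(r"\b(a|an|the)\b", " ", text.lower())
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())

def verify(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)
```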
arXiv Detail & Related papers (2022-04-21T15:43:45Z)
- Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees [63.62448343531963]
We propose a combination of the existing paradigms, intelligently sampling which responses are scored by humans.
We observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget.
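Quadratic weighted kappa, the agreement metric reported here, is standard and reproduced below for reference.

```python
import numpy as np

def quadratic_weighted_kappa(a: np.ndarray, b: np.ndarray, n_labels: int) -> float:
    """QWK between two integer score vectors with labels 0..n_labels-1."""
    observed = np.zeros((n_labels, n_labels))
    for i, j in zip(a, b):
        observed[i, j] += 1                          # joint score histogram
    expected = np.outer(np.bincount(a, minlength=n_labels),
                        np.bincount(b, minlength=n_labels)) / len(a)
    labels = np.arange(n_labels)
    weights = np.square(labels[:, None] - labels[None, :]) / (n_labels - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```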
arXiv Detail & Related papers (2021-11-17T05:00:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.