Meet Your New Client: Writing Reports for AI -- Benchmarking Information Loss in Market Research Deliverables
- URL: http://arxiv.org/abs/2508.15817v1
- Date: Sun, 17 Aug 2025 12:05:44 GMT
- Title: Meet Your New Client: Writing Reports for AI -- Benchmarking Information Loss in Market Research Deliverables
- Authors: Paul F. Simmering, Benedikt Schulz, Oliver Tabino, Georg Wittenburg
- Abstract summary: This study evaluates information loss when market research deliverables are ingested into RAG systems. Findings show that while text is reliably extracted, significant information is lost from complex objects like charts and diagrams. This suggests a need for specialized, AI-native deliverables to ensure research insights are not lost in translation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As organizations adopt retrieval-augmented generation (RAG) for their knowledge management systems (KMS), traditional market research deliverables face new functional demands. While PDF reports and slides have long served human readers, they are now also "read" by AI systems to answer user questions. To future-proof reports being delivered today, this study evaluates information loss during their ingestion into RAG systems. It compares how well PDF and PowerPoint (PPTX) documents converted to Markdown can be used by an LLM to answer factual questions in an end-to-end benchmark. Findings show that while text is reliably extracted, significant information is lost from complex objects like charts and diagrams. This suggests a need for specialized, AI-native deliverables to ensure research insights are not lost in translation.
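The end-to-end benchmark the abstract describes can be sketched compactly. The following is a minimal, illustrative Python sketch, not the authors' actual pipeline: the lexical retrieval, the `ask_llm` stub, and the answer-containment scoring rule are placeholder assumptions standing in for a real embedding index, a real LLM call, and the paper's evaluation protocol.

```python
def chunk(markdown: str, size: int = 800) -> list[str]:
    """Fixed-size character chunks; real systems split by document structure."""
    return [markdown[i:i + size] for i in range(0, len(markdown), size)]

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Naive lexical retrieval: rank chunks by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]

def ask_llm(question: str, context: list[str]) -> str:
    """Stub: a real benchmark would prompt an LLM with the retrieved context.
    Returning the context lets answer containment be checked directly."""
    return " ".join(context)

def information_loss_score(markdown: str,
                           qa_pairs: list[tuple[str, str]]) -> float:
    """Fraction of gold answers still recoverable after conversion; a crude
    containment proxy for the paper's factual-QA evaluation."""
    chunks = chunk(markdown)
    hits = sum(gold.lower() in ask_llm(q, retrieve(q, chunks)).lower()
               for q, gold in qa_pairs)
    return hits / len(qa_pairs)

# A fact that a PDF-to-Markdown converter drops (e.g., a value shown only
# in a chart) makes its QA pair unanswerable and lowers the score.
report_md = "# Q3 Survey\nBrand awareness rose to 62% among ages 18-34."
qas = [("What did brand awareness rise to?", "62%")]
print(information_loss_score(report_md, qas))  # 1.0 if the fact survived
```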
Related papers
- Inferential Question Answering [67.54465021408724]
We introduce Inferential QA -- a new task that challenges models to infer answers from answer-supporting passages that provide only clues. To study this problem, we construct the QUIT (QUestions requiring Inference from Texts) dataset, comprising 7,401 questions and 2.4M passages. We show that methods effective on traditional QA tasks struggle in inferential QA: retrievers underperform, rerankers offer limited gains, and fine-tuning provides inconsistent improvements.
arXiv Detail & Related papers (2026-02-01T14:02:43Z)
- GE-Chat: A Graph Enhanced RAG Framework for Evidential Response Generation of LLMs [6.3596531375179515]
This paper proposes GE-Chat, a knowledge-graph-enhanced retrieval-augmented generation framework to provide evidence-based response generation. Specifically, when the user uploads a material document, a knowledge graph is created, which helps construct a retrieval-augmented agent. We leverage Chain-of-Thought (CoT) logic generation, n-hop sub-graph searching, and entailment-based sentence generation to realize accurate evidence retrieval.
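The n-hop sub-graph search named in the abstract amounts to a bounded graph traversal. Here is a minimal sketch assuming an adjacency-list representation; GE-Chat's actual graph construction and traversal details are not specified in this summary.

```python
from collections import deque

def n_hop_subgraph(graph: dict[str, list[str]],
                   seeds: set[str], n: int) -> set[str]:
    """Collect all nodes reachable from the seed entities within n hops
    via breadth-first search over an adjacency list."""
    frontier = deque((s, 0) for s in seeds)
    visited = set(seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == n:
            continue  # do not expand beyond the hop budget
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return visited

# Toy knowledge graph built from an uploaded document (illustrative).
kg = {"report": ["chart_1", "section_2"], "chart_1": ["brand_awareness"]}
print(sorted(n_hop_subgraph(kg, {"report"}, n=2)))
# ['brand_awareness', 'chart_1', 'report', 'section_2']
```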
arXiv Detail & Related papers (2025-05-15T10:17:35Z)
- Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report [3.4632900249241874]
This paper presents an experience report on the development of Retrieval Augmented Generation (RAG) systems using PDF documents as the primary data source.
The RAG architecture combines the generative capabilities of Large Language Models (LLMs) with the precision of information retrieval.
The practical implications of this research lie in enhancing the reliability of generative AI systems in various sectors.
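A recurring practical detail in RAG-over-PDF pipelines is chunking the extracted text with overlap, so facts that straddle a chunk boundary remain retrievable. This helper is a generic sketch, not code from the experience report; the size and overlap values are illustrative defaults.

```python
def chunk_with_overlap(text: str, size: int = 1000,
                       overlap: int = 200) -> list[str]:
    """Sliding-window chunking: consecutive chunks share `overlap`
    characters, so a sentence split at a boundary still appears
    whole in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```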
arXiv Detail & Related papers (2024-10-21T12:21:49Z)
- ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
We study out-of-scope questions, where the retrieved document appears semantically similar to the question but lacks the necessary information to answer it. We propose ELOQ, a guided hallucination-based approach to automatically generate a diverse set of out-of-scope questions from post-cutoff documents.
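Guided hallucination-based generation could look roughly like the sketch below: one LLM call invents a question the document cannot answer, a second call verifies unanswerability. The `llm` callable and both prompts are hypothetical placeholders, not ELOQ's actual prompts or interface.

```python
def make_out_of_scope_question(document: str, llm) -> str | None:
    """Sketch: ask an LLM (a prompt -> str callable) for a question that
    looks on-topic for `document` but is not answerable from it, then
    verify with a second judging call. Returns None if the check fails."""
    question = llm(
        "Write a question that seems related to this text but cannot "
        f"be answered from it:\n{document}"
    )
    verdict = llm(
        "Can this question be answered from the text? Reply YES or NO.\n"
        f"Question: {question}\nText: {document}"
    )
    return question if verdict.strip().upper().startswith("NO") else None
```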
arXiv Detail & Related papers (2024-10-18T16:11:29Z)
- Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation [19.543102037001134]
Language models (LMs) are known to suffer from hallucinations and misinformation.
Retrieval augmented generation (RAG) that retrieves verifiable information from an external knowledge corpus provides a tangible solution to these problems.
RAG generation quality is highly dependent on the relevance between a user's query and the retrieved documents.
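One simple operationalization of that dependence is to gate generation on a query-document relevance score and abstain below a threshold. This sketch uses a bag-of-words cosine as a lexical proxy; the paper's characterization, a production system's embeddings, and the 0.2 threshold here are all stand-ins.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def relevant_enough(query: str, doc: str, threshold: float = 0.2) -> bool:
    """Gate generation on query-document relevance: below the threshold,
    the system should abstain rather than risk an ungrounded answer."""
    return cosine(Counter(query.lower().split()),
                  Counter(doc.lower().split())) >= threshold
```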
arXiv Detail & Related papers (2024-10-10T19:14:55Z)
- InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification [60.10193972862099]
This work proposes a framework to characterize and recover simplification-induced information loss in the form of question-and-answer pairs.
QA pairs are designed to help readers deepen their knowledge of a text.
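As a toy illustration of the framework's core idea (not its actual method), QA pairs derived from the original text can flag what a simplified version dropped; answer containment here is a crude heuristic.

```python
def lost_information(qa_pairs: list[tuple[str, str]],
                     simplified: str) -> list[tuple[str, str]]:
    """Return QA pairs whose answer no longer appears in the simplified
    text - a containment proxy for simplification-induced loss."""
    return [(q, a) for q, a in qa_pairs
            if a.lower() not in simplified.lower()]
```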
arXiv Detail & Related papers (2024-01-29T19:00:01Z)
- Context Matters: Pushing the Boundaries of Open-Ended Answer Generation with Graph-Structured Knowledge Context [4.1229332722825]
This paper introduces a novel framework that combines graph-driven context retrieval with knowledge-graph-based enhancement.
We conduct experiments on various Large Language Models (LLMs) with different parameter sizes to evaluate their ability to ground knowledge and determine factual accuracy in answers to open-ended questions.
Our methodology, GraphContextGen, consistently outperforms dominant text-based retrieval systems, demonstrating its robustness and adaptability across a wide range of use cases.
arXiv Detail & Related papers (2024-01-23T11:25:34Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Large Language Models for Information Retrieval: A Survey [83.75872593741578]
Information retrieval has evolved from term-based methods to its integration with advanced neural models. Recent research has sought to leverage large language models (LLMs) to improve IR systems. We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
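The survey's four-stage decomposition maps naturally onto a composable pipeline. This sketch wires the stages as plain functions; the type aliases and signatures are illustrative assumptions, since real systems differ in interfaces and state.

```python
from typing import Callable

# One callable per stage: query rewriter, retriever, reranker, reader.
Rewriter = Callable[[str], str]
Retriever = Callable[[str], list[str]]
Reranker = Callable[[str, list[str]], list[str]]
Reader = Callable[[str, list[str]], str]

def answer(query: str, rewrite: Rewriter, retrieve: Retriever,
           rerank: Reranker, read: Reader) -> str:
    q = rewrite(query)             # normalize or expand the query
    docs = rerank(q, retrieve(q))  # fetch candidates, then reorder them
    return read(q, docs)           # synthesize the final answer
```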
arXiv Detail & Related papers (2023-08-14T12:47:22Z)
- Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows retrieval models (RMs) to expand the knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
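The interplay described above is, in outline, an iterative expansion loop. This is a loose sketch under stated assumptions: `retrieve` (str -> list[str]) and `summarize` (str -> str) are placeholder callables, and InteR's actual exchange format between the RM and the LLM is richer than a string concatenation.

```python
def inter_style_search(query: str, retrieve, summarize,
                       rounds: int = 2) -> list[str]:
    """Each round, retrieved text is condensed by an LLM into a
    knowledge snippet that expands the next round's query."""
    docs: list[str] = []
    expanded = query
    for _ in range(rounds):
        docs = retrieve(expanded)
        knowledge = summarize(" ".join(docs))
        expanded = f"{query} {knowledge}"
    return docs
```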
arXiv Detail & Related papers (2023-05-12T11:58:15Z)
- SmartBook: AI-Assisted Situation Report Generation for Intelligence Analysts [55.73424958012229]
This work identifies intelligence analysts' practices and preferences for AI assistance in situation report generation.
We introduce SmartBook, an automated framework designed to generate situation reports from large volumes of news data.
Our comprehensive evaluation of SmartBook, encompassing a user study, a content review, and an editing study, reveals SmartBook's effectiveness in generating accurate and relevant situation reports.
arXiv Detail & Related papers (2023-03-25T03:03:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.