Towards Efficient Quantity Retrieval from Text:An Approach via Description Parsing and Weak Supervision
- URL: http://arxiv.org/abs/2507.08322v2
- Date: Mon, 14 Jul 2025 05:02:01 GMT
- Title: Towards Efficient Quantity Retrieval from Text:An Approach via Description Parsing and Weak Supervision
- Authors: Yixuan Cao, Zhengrong Chen, Chengxuan Xia, Kun Wu, Ping Luo,
- Abstract summary: Given a description of a quantitative fact, the system returns the relevant value and supporting evidence.<n>We introduce a framework based on description parsing that converts text into structured (description, quantity) pairs for effective retrieval.<n>We evaluate our approach on a large corpus of financial annual reports and a newly annotated quantity description dataset.
- Score: 29.941457252419493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantitative facts are continually generated by companies and governments, supporting data-driven decision-making. While common facts are structured, many long-tail quantitative facts remain buried in unstructured documents, making them difficult to access. We propose the task of Quantity Retrieval: given a description of a quantitative fact, the system returns the relevant value and supporting evidence. Understanding quantity semantics in context is essential for this task. We introduce a framework based on description parsing that converts text into structured (description, quantity) pairs for effective retrieval. To improve learning, we construct a large paraphrase dataset using weak supervision based on quantity co-occurrence. We evaluate our approach on a large corpus of financial annual reports and a newly annotated quantity description dataset. Our method significantly improves top-1 retrieval accuracy from 30.98 percent to 64.66 percent.
Related papers
- StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation [8.251302684712773]
StructText is an end-to-end framework for automatically generating high-fidelity benchmarks for key-value extraction from text.<n>We evaluate the proposed method on 71,539 examples across 49 documents.
arXiv Detail & Related papers (2025-07-28T21:20:44Z) - Map&Make: Schema Guided Text to Table Generation [41.52038779169547]
Text-to-Table generation is an essential task for information retrieval.<n>We introduce a versatile approach, Map&Make, which "dissects" text into propositional atomic statements.<n>Our approach is tested against two challenging datasets, Rotowire and Livesum.
arXiv Detail & Related papers (2025-05-29T07:12:46Z) - Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance [54.25184684077833]
We propose an efficient and scalable method for extracting quantitative insights from unstructured financial documents.<n>Our proposed system consists of two specialized agents: the emphExtraction Agent and the emphText-to-Agent
arXiv Detail & Related papers (2025-05-25T15:45:46Z) - Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization [7.218054628599005]
We study factual inconsistency errors and connect them with a line of discourse analysis.<n>We propose a framework that decomposes long texts into discourse-inspired chunks.
arXiv Detail & Related papers (2025-02-10T06:30:15Z) - Beyond Factual Accuracy: Evaluating Coverage of Diverse Factual Information in Long-form Text Generation [56.82274763974443]
ICAT is an evaluation framework for measuring coverage of diverse factual information in long-form text generation.<n>It computes the alignment between the atomic factual claims and various aspects expected to be presented in the output.<n>Our framework offers interpretable and fine-grained analysis of diversity and coverage.
arXiv Detail & Related papers (2025-01-07T05:43:23Z) - Beyond Relevant Documents: A Knowledge-Intensive Approach for Query-Focused Summarization using Large Language Models [27.90653125902507]
We propose a knowledge-intensive approach that reframes query-focused summarization as a knowledge-intensive task setup.
The retrieval module efficiently retrieves potentially relevant documents from a large-scale knowledge corpus.
The summarization controller seamlessly integrates a powerful large language model (LLM)-based summarizer with a carefully tailored prompt.
arXiv Detail & Related papers (2024-08-19T18:54:20Z) - CQE: A Comprehensive Quantity Extractor [2.2079886535603084]
We present a comprehensive quantity extraction framework from text data.
It efficiently detects combinations of values and units, the behavior of a quantity, and the concept a quantity is associated with.
Our framework makes use of dependency parsing and a dictionary of units, and it provides for a proper normalization and standardization of detected quantities.
arXiv Detail & Related papers (2023-05-15T17:59:41Z) - Large Language Models are Diverse Role-Players for Summarization
Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most of the automatic evaluation methods like BLUE/ROUGE may be not able to adequately capture the above dimensions.
We propose a new evaluation framework based on LLMs, which provides a comprehensive evaluation framework by comparing generated text and reference text from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z) - Evaluation of Automatically Constructed Word Meaning Explanations [0.0]
We present a new tool that derives explanations automatically based on collective information from very large corpora.
We show that the presented approach allows to create explanations that contain data useful for understanding the word meaning in approximately 90% of cases.
arXiv Detail & Related papers (2023-02-27T09:47:55Z) - Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE)
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
arXiv Detail & Related papers (2022-12-19T11:26:23Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.