MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and
Textual Data
- URL: http://arxiv.org/abs/2206.01347v1
- Date: Fri, 3 Jun 2022 00:24:35 GMT
- Title: MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and
Textual Data
- Authors: Yilun Zhao, Yunxiang Li, Chenying Li, Rui Zhang
- Abstract summary: Existing question answering benchmarks over hybrid data only include a single flat table in each document.
We construct a new large-scale benchmark, MultiHiertt, with QA pairs over Multi Hierarchical Tabular and Textual data.
Results show that MultiHiertt presents a strong challenge for existing baselines whose results lag far behind the performance of human experts.
- Score: 7.063167712310221
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Numerical reasoning over hybrid data containing both textual and tabular
content (e.g., financial reports) has recently attracted much attention in the
NLP community. However, existing question answering (QA) benchmarks over hybrid
data only include a single flat table in each document and thus lack examples
of multi-step numerical reasoning across multiple hierarchical tables. To
facilitate data analytical progress, we construct a new large-scale benchmark,
MultiHiertt, with QA pairs over Multi Hierarchical Tabular and Textual data.
MultiHiertt is built from a wealth of financial reports and has the following
unique characteristics: 1) each document contains multiple tables and longer
unstructured text; 2) most of the tables are hierarchical; 3) the
reasoning process required for each question is more complex and challenging
than existing benchmarks; and 4) fine-grained annotations of reasoning
processes and supporting facts are provided to reveal complex numerical
reasoning. We further introduce a novel QA model termed MT2Net, which first
applies a fact retrieval module to extract relevant supporting facts from both
tables and text, and then uses a reasoning module to perform symbolic reasoning over
retrieved facts. We conduct comprehensive experiments on various baselines. The
experimental results show that MultiHiertt presents a strong challenge for
existing baselines whose results lag far behind the performance of human
experts. The dataset and code are publicly available at
https://github.com/psunlpgroup/MultiHiertt.
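The retrieve-then-reason design described in the abstract can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' MT2Net implementation: the names `Fact`, `retrieve`, and `reason` are hypothetical, and a trained neural retriever and program generator would replace the toy scoring and single-operation step shown here.

```python
# Illustrative sketch only -- NOT the authors' MT2Net code. It mimics the
# two-stage design from the abstract: (1) retrieve supporting facts from
# linearized table cells and sentences, (2) run a symbolic program over
# the retrieved numbers. All names (Fact, retrieve, reason) are hypothetical.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fact:
    text: str               # a linearized table cell or a sentence
    value: Optional[float]  # parsed number, if the fact carries one

def retrieve(question: str, facts: List[Fact], top_k: int = 2) -> List[Fact]:
    """Toy retriever: rank facts by word overlap with the question.
    A trained encoder would replace this scoring in a real system."""
    q_words = set(question.lower().split())
    return sorted(
        facts,
        key=lambda f: len(q_words & set(f.text.lower().split())),
        reverse=True,
    )[:top_k]

def reason(op: str, operands: List[float]) -> float:
    """Toy symbolic step: one arithmetic operation, standing in for the
    multi-step programs that MultiHiertt questions require."""
    a, b = operands
    return {"add": a + b, "subtract": a - b,
            "multiply": a * b, "divide": a / b}[op]

facts = [
    Fact("revenue 2019 total 120.0", 120.0),
    Fact("revenue 2018 total 100.0", 100.0),
    Fact("employees 2019 total 5000", 5000.0),
]
top = retrieve("what is the change in total revenue from 2018 to 2019", facts)
answer = reason("subtract", [f.value for f in top])  # 120.0 - 100.0 -> 20.0
```

The separation matters because the reasoning module only ever sees the handful of retrieved facts, which keeps the symbolic step tractable even when a document contains many tables and long text.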
Related papers
- SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables [13.249024309069236]
Table-Text question answering tasks require models that can reason across long text and source tables, traversing multiple hops and executing complex operations such as aggregation. We present SPARTA, an end-to-end construction framework that automatically generates large-scale Table-Text QA benchmarks with lightweight human validation. On SPARTA, state-of-the-art models that reach over 70 F1 on HybridQA or over 50 F1 on OTT-QA drop by more than 30 F1 points.
arXiv Detail & Related papers (2026-02-26T17:59:51Z) - DocHop-QA: Towards Multi-Hop Reasoning over Multimodal Document Collections [23.428084176322866]
We propose DocHop-QA, a large-scale benchmark for multimodal, multi-document, multi-hop question answering. DocHop-QA is domain-agnostic and incorporates diverse information formats, including textual passages, tables, and structural layout cues. We evaluate DocHop-QA through four tasks spanning structured index prediction, generative answering, and multimodal integration.
arXiv Detail & Related papers (2025-08-20T08:17:45Z) - Benchmarking Multimodal Understanding and Complex Reasoning for ESG Tasks [56.350173737493215]
Environmental, Social, and Governance (ESG) reports are essential for evaluating sustainability practices, ensuring regulatory compliance, and promoting financial transparency. MMESGBench is a first-of-its-kind benchmark dataset to evaluate multimodal understanding and complex reasoning across structurally diverse and multi-source ESG documents. MMESGBench comprises 933 validated QA pairs derived from 45 ESG documents, spanning seven distinct document types and three major ESG source categories.
arXiv Detail & Related papers (2025-07-25T03:58:07Z) - MTabVQA: Evaluating Multi-Tabular Reasoning of Language Models in Visual Space [16.35255926212628]
We introduce MTabVQA, a novel benchmark specifically designed for multi-tabular visual question answering. MTabVQA comprises 3,745 complex question-answer pairs that necessitate multi-hop reasoning across several visually rendered table images. We show that fine-tuning VLMs with MTabVQA-Instruct substantially improves their performance on visual multi-tabular reasoning.
arXiv Detail & Related papers (2025-06-13T11:21:00Z) - Multimodal Tabular Reasoning with Privileged Structured Information [67.40011423365712]
We introduce TabUlar Reasoning with Bridged infOrmation (Turbo). Turbo benefits from a structure-aware reasoning trace generator based on DeepSeek-R1. Turbo achieves state-of-the-art performance (+7.2% vs. previous SOTA) across multiple datasets.
arXiv Detail & Related papers (2025-06-04T15:46:30Z) - TANQ: An open domain dataset of table answered questions [15.323690523538572]
TANQ is the first open domain question answering dataset where the answers require building tables from information across multiple sources.
We release the full source attribution for every cell in the resulting table and benchmark state-of-the-art language models in open, oracle, and closed book setups.
Our best-performing baseline, GPT-4, reaches an overall F1 score of 29.1, lagging behind human performance by 19.7 points.
arXiv Detail & Related papers (2024-05-13T14:07:20Z) - QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs [63.98556480088152]
Table summarization is a crucial task aimed at condensing information into concise and comprehensible textual summaries.
We propose a novel method to address these limitations by introducing query-focused multi-table summarization.
Our approach, which comprises a table serialization module, a summarization controller, and a large language model, generates query-dependent table summaries tailored to users' information needs.
arXiv Detail & Related papers (2024-05-08T15:05:55Z) - QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z) - MultiTabQA: Generating Tabular Answers for Multi-Table Question
Answering [61.48881995121938]
Real-world queries are complex in nature, often over multiple tables in a relational database or web page.
Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z) - Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text
Documents via Semantic-Oriented Hierarchical Graphs [79.0426838808629]
We study the TAT-DQA problem, i.e., answering questions over visually-rich table-text documents.
Specifically, we propose a novel Doc2SoarGraph framework with enhanced discrete reasoning capability.
We conduct extensive experiments on TAT-DQA dataset, and the results show that our proposed framework outperforms the best baseline model by 17.73% and 16.91% in terms of Exact Match (EM) and F1 score respectively on the test set.
arXiv Detail & Related papers (2023-05-03T07:30:32Z) - Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z) - Multi-Row, Multi-Span Distant Supervision For Table+Text Question Answering [33.809732338627136]
Question answering (QA) over tables and linked text, also called TextTableQA, has witnessed significant research in recent years.
We present MITQA, a transformer-based TextTableQA system that is explicitly designed to cope with distant supervision along both these axes.
arXiv Detail & Related papers (2021-12-14T12:48:19Z) - HiTab: A Hierarchical Table Dataset for Question Answering and Natural
Language Generation [35.73434495391091]
Hierarchical tables challenge existing methods by hierarchical indexing, as well as implicit relationships of calculation and semantics.
This work presents HiTab, a free and open dataset for the research community to study question answering (QA) and natural language generation (NLG) over hierarchical tables.
arXiv Detail & Related papers (2021-08-15T10:14:21Z) - Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)