TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding
- URL: http://arxiv.org/abs/2506.21393v1
- Date: Thu, 26 Jun 2025 15:41:34 GMT
- Title: TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding
- Authors: Junwen Zhang, Pu Chen, Yin Zhang
- Abstract summary: TableMoE is a neuro-symbolic Mixture-of-Connector-Experts (MoCE) architecture specifically designed for robust, structured reasoning over multimodal table data. TableMoE features an innovative Neuro-Symbolic Routing mechanism, which predicts latent semantic token roles and dynamically routes table elements to specialized experts. For evaluation, we curate and release four challenging WildStruct benchmarks, designed specifically to stress-test models under real-world multimodal degradation and structural complexity.
- Score: 3.404552731440374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal understanding of tables in real-world contexts is challenging due to the complexity of structure, symbolic density, and visual degradation (blur, skew, watermarking, incomplete structures or fonts, multi-span or hierarchically nested layouts). Existing multimodal large language models (MLLMs) struggle with such WildStruct conditions, resulting in limited performance and poor generalization. To address these challenges, we propose TableMoE, a neuro-symbolic Mixture-of-Connector-Experts (MoCE) architecture specifically designed for robust, structured reasoning over multimodal table data. TableMoE features an innovative Neuro-Symbolic Routing mechanism, which predicts latent semantic token roles (e.g., header, data cell, axis, formula) and dynamically routes table elements to specialized experts (Table-to-HTML, Table-to-JSON, Table-to-Code) using a confidence-aware gating strategy informed by symbolic reasoning graphs. To facilitate effective alignment-driven pretraining, we introduce the large-scale TableMoE-Align dataset, consisting of 1.2M table-HTML-JSON-code quadruples across finance, science, biomedicine and industry, utilized exclusively for model pretraining. For evaluation, we curate and release four challenging WildStruct benchmarks: WMMFinQA, WMMTatQA, WMMTabDialog, and WMMFinanceMath, designed specifically to stress-test models under real-world multimodal degradation and structural complexity. Experimental results demonstrate that TableMoE significantly surpasses existing state-of-the-art models. Extensive ablation studies validate each core component, emphasizing the critical role of Neuro-Symbolic Routing and structured expert alignment. Through qualitative analyses, we further showcase TableMoE's interpretability and enhanced robustness, underscoring the effectiveness of integrating neuro-symbolic reasoning for multimodal table understanding.
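The routing mechanism described in the abstract can be pictured with a short sketch. The following Python snippet is a hypothetical illustration of confidence-aware gating over role-conditioned connector experts, not the authors' released code; the class name, dimensions, and threshold tau are assumptions, and the symbolic reasoning graphs mentioned in the abstract are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidenceAwareRouter(nn.Module):
    """Hypothetical sketch of TableMoE-style neuro-symbolic routing.

    Predicts a latent semantic role per table token (e.g., header, data
    cell, axis, formula) and routes each token to one of three connector
    experts (HTML, JSON, Code). The paper's actual design may differ.
    """

    def __init__(self, d_model=768, n_roles=4, n_experts=3, tau=0.5):
        super().__init__()
        self.role_head = nn.Linear(d_model, n_roles)        # latent role prediction
        self.gate = nn.Linear(d_model + n_roles, n_experts)  # role-informed gating
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                          nn.Linear(d_model, d_model))
            for _ in range(n_experts)
        )
        self.tau = tau  # assumed confidence threshold for hard routing

    def forward(self, tokens):                         # tokens: (B, T, d_model)
        role_probs = F.softmax(self.role_head(tokens), dim=-1)
        gate_logits = self.gate(torch.cat([tokens, role_probs], dim=-1))
        gate_probs = F.softmax(gate_logits, dim=-1)    # (B, T, n_experts)
        conf, top = gate_probs.max(dim=-1)

        expert_out = torch.stack([e(tokens) for e in self.experts], dim=-2)
        hard = F.one_hot(top, gate_probs.size(-1)).to(gate_probs.dtype)
        # Confident tokens are routed hard to a single expert; uncertain
        # tokens keep the soft mixture -- a stand-in for the paper's
        # confidence-aware gating strategy.
        weights = torch.where(conf.unsqueeze(-1) >= self.tau, hard, gate_probs)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=-2)
```

In this toy version, a token whose top gate probability clears the threshold is committed to one connector expert, while low-confidence tokens fall back to a soft mixture; how the paper combines this with its symbolic reasoning graphs is not shown here.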
Related papers
- Tabular Data Understanding with LLMs: A Survey of Recent Advances and Challenges [22.054723113358865]
This paper introduces key concepts through a taxonomy of tabular input representations and an overview of table understanding tasks. Tables are two-dimensional, encompassing formats that range from well-structured database tables to complex, multi-layered spreadsheets, each serving different purposes. We highlight several critical gaps in the field that indicate the need for further research.
arXiv Detail & Related papers (2025-07-31T23:41:31Z)
- Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables [48.39080455781475]
ChemTable is a large-scale benchmark of real-world chemical tables curated from the experimental sections of the literature. ChemTable includes expert-annotated cell polygons, logical layouts, and domain-specific labels, including reagents, catalysts, yields, and graphical components. We evaluated a range of representative multimodal models, both open-source and closed-source, on ChemTable and reported a series of findings with practical and conceptual insights.
arXiv Detail & Related papers (2025-06-13T00:45:41Z)
- Multimodal Tabular Reasoning with Privileged Structured Information [67.40011423365712]
We introduce TabUlar Reasoning with Bridged infOrmation (Turbo). Turbo benefits from a structure-aware reasoning trace generator based on DeepSeek-R1. Turbo achieves state-of-the-art performance (+7.2% vs. previous SOTA) across multiple datasets.
arXiv Detail & Related papers (2025-06-04T15:46:30Z)
- Enhancing Large Vision-Language Models with Layout Modality for Table Question Answering on Japanese Annual Securities Reports [4.2134954427867]
We propose a method to enhance LVLM-based table understanding by incorporating in-table textual content and layout features. Experimental results demonstrate that these auxiliary modalities significantly improve performance.
arXiv Detail & Related papers (2025-05-23T08:36:22Z)
- Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding [42.841205217768106]
"Tree-of-Table" is a novel approach designed to enhance LLMs' reasoning capabilities over large and complex tables.
We show that Tree-of-Table sets a new benchmark with superior performance, showcasing remarkable efficiency and generalization capabilities in large-scale table reasoning.
arXiv Detail & Related papers (2024-11-13T11:02:04Z)
- TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs (see the sketch after this list). Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
- Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z)
- TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy [81.76462101465354]
We present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism.
This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering.
To better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA.
arXiv Detail & Related papers (2024-06-03T13:54:05Z)
- Benchmarking Diverse-Modal Entity Linking with Generative Models [78.93737257356784]
We construct a benchmark for diverse-modal EL (DMEL) from existing EL datasets.
To approach the DMEL task, we propose a generative diverse-modal model (GDMM) following a multimodal-encoder-decoder paradigm.
GDMM builds a stronger DMEL baseline, outperforming state-of-the-art task-specific EL models by 8.51 F1 score on average.
arXiv Detail & Related papers (2023-05-27T02:38:46Z)
- Neural Collaborative Graph Machines for Table Structure Recognition [18.759018425097747]
In this paper, we present a novel Neural Collaborative Graph Machines (NCGM) equipped with stacked collaborative blocks.
We show that the proposed NCGM can modulate the collaborative patterns of different modalities conditioned on the context of intra-modality cues.
arXiv Detail & Related papers (2021-11-26T08:40:47Z)
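The retrieval flow summarized in the TableRAG entry above (query expansion followed by schema and cell retrieval) can be sketched in a few lines. This is a minimal, hypothetical illustration, not the TableRAG implementation: the helper functions, the toy lexical retriever, and the prompt format are all assumptions, standing in for the paper's LM-based expansion and embedding-based retrieval.

```python
from typing import List

def expand_query(question: str) -> List[str]:
    """Hypothetical query expansion: TableRAG uses an LM for this step;
    here we simply pair the question with its longer content words."""
    words = [w.strip("?.,").lower() for w in question.split()]
    return [question.lower()] + [w for w in words if len(w) > 3]

def retrieve(queries: List[str], corpus: List[str], k: int) -> List[str]:
    """Toy lexical retriever standing in for embedding-based retrieval."""
    scored = [(sum(q in doc.lower() for q in queries), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_table_prompt(question: str, schema: List[str],
                       cells: List[str], k: int = 5) -> str:
    """Sketch of the summarized flow: expand the query, retrieve relevant
    schema columns and cells, and pass only those to the LM instead of
    the full (possibly million-token) table."""
    queries = expand_query(question)
    top_schema = retrieve(queries, schema, k)
    top_cells = retrieve(queries, cells, k)
    return (f"Question: {question}\n"
            f"Relevant columns: {top_schema}\n"
            f"Relevant cells: {top_cells}\n"
            "Answer using only the evidence above.")
```

The design point this sketch captures is that only retrieved schema entries and cells reach the LM's context, which is what lets a RAG approach scale to tables far larger than any context window.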