Neural Collaborative Graph Machines for Table Structure Recognition
- URL: http://arxiv.org/abs/2111.13359v1
- Date: Fri, 26 Nov 2021 08:40:47 GMT
- Title: Neural Collaborative Graph Machines for Table Structure Recognition
- Authors: Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren
- Abstract summary: In this paper, we present a novel Neural Collaborative Graph Machines (NCGM) equipped with stacked collaborative blocks.
We show that the proposed NCGM can modulate the collaborative pattern of different modalities conditioned on the context of intra-modality cues.
- Score: 18.759018425097747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, table structure recognition has achieved impressive progress with
the help of deep graph models. Most of them exploit the visual cues of tabular
elements alone, or simply combine visual cues with other modalities via early
fusion to reason about their graph relationships. However, neither early fusion
nor reasoning over each modality individually is appropriate for the great
diversity of table structures; instead, different modalities are expected to
collaborate with each other in different patterns for different table cases.
The importance of intra- and inter-modality interactions for table structure
reasoning remains unexplored in the community. In this paper, we define this as
the heterogeneous table structure recognition (Hetero-TSR) problem. To fill
this gap, we present Neural Collaborative Graph Machines (NCGM), a novel model
equipped with stacked collaborative blocks that alternately extract
intra-modality context and model inter-modality interactions in a hierarchical
way. It represents the intra- and inter-modality relationships of tabular
elements more robustly, which significantly improves recognition performance.
We also show that the proposed NCGM can modulate the collaborative pattern of
different modalities conditioned on the context of intra-modality cues, which
is vital for diverse table cases. Experimental results on benchmarks
demonstrate that the proposed NCGM achieves state-of-the-art performance and
beats other contemporary methods by a large margin, especially in challenging
scenarios.
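The abstract describes stacked collaborative blocks that alternate between extracting intra-modality context and modeling inter-modality interactions, but gives no implementation details. Purely as a rough illustration of that alternating pattern, here is a minimal NumPy sketch; the attention-based instantiation, the function names, and the residual fusion are all assumptions for demonstration, not the paper's actual design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention over the element (row) axis."""
    d = q.shape[-1]
    w = softmax(q @ k.T / np.sqrt(d))
    return w @ v

def collaborative_block(modalities):
    """One hypothetical collaborative block:
    1) intra-modality context: self-attention within each modality;
    2) inter-modality interaction: each modality attends to the
       concatenated, contextualized features of the other modalities."""
    # Step 1: intra-modality self-attention
    intra = [attention(m, m, m) for m in modalities]
    # Step 2: inter-modality cross-attention with a residual connection
    out = []
    for i, m in enumerate(intra):
        others = np.concatenate([x for j, x in enumerate(intra) if j != i])
        out.append(m + attention(m, others, others))
    return out

# Toy example: three cue modalities (e.g. visual, geometric, textual)
# for five tabular elements, each a 16-dim feature vector.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((5, 16)) for _ in range(3)]
for _ in range(2):  # stacking blocks refines features hierarchically
    feats = collaborative_block(feats)
print([f.shape for f in feats])  # → [(5, 16), (5, 16), (5, 16)]
```

Stacking the block preserves each modality's shape, so the intra/inter alternation can be repeated to any depth before the refined features feed a downstream relation classifier.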
Related papers
- Hybrid Hypergraph Networks for Multimodal Sequence Data Classification [9.688069013427057]
We propose the hybrid hypergraph network (HHN), a novel framework that models temporal multimodal data via a segmentation-first, graph-later strategy.
HHN achieves state-of-the-art results on four multimodal datasets, demonstrating its effectiveness in complex classification tasks.
arXiv Detail & Related papers (2025-07-30T12:13:05Z)
- TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding [3.404552731440374]
TableMoE is a neuro-symbolic Mixture-of-Connector-Experts (MoCE) architecture specifically designed for robust, structured reasoning over multimodal table data.
TableMoE features an innovative Neuro-Symbolic Routing mechanism, which predicts latent semantic token roles and dynamically routes table elements to specialized experts.
For evaluation, we curate and release four challenging WildStruct benchmarks, designed specifically to stress-test models under real-world multimodal degradation and structural complexity.
arXiv Detail & Related papers (2025-06-26T15:41:34Z)
- LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment [18.365849722239865]
Multi-modal entity alignment (MMEA) aims to identify equivalent entities between two multi-modal knowledge graphs.
We propose a novel local-to-global interaction network for MMEA, termed as LoginMEA.
arXiv Detail & Related papers (2024-07-29T01:06:45Z)
- GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation [68.63955715643974]
We propose an innovative Modality-prompted Heterogeneous Graph for Omnimodal Learning (GTP-4o).
arXiv Detail & Related papers (2024-07-08T01:06:13Z)
- Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning [78.49090351193269]
We propose a novel graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis.
Specifically, we formulate the WSI as a heterogeneous graph with a "nucleus-type" attribute on each node and a semantic-similarity attribute on each edge.
Our framework outperforms the state-of-the-art methods with considerable margins on various tasks.
arXiv Detail & Related papers (2023-07-09T14:43:40Z)
- Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA first learns individual representations from multiple modalities, then performs contrastive learning to jointly model intra-modal and inter-modal interactions.
arXiv Detail & Related papers (2022-09-02T08:59:57Z)
- Bayesian intrinsic groupwise registration via explicit hierarchical disentanglement [18.374535632681884]
We propose a general framework which formulates groupwise registration as a procedure of hierarchical Bayesian inference.
Here, we propose a novel variational posterior and network architecture that facilitate joint learning of the common structural representation.
Results have demonstrated the efficacy of our framework in realizing multimodal groupwise registration in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-06T06:13:24Z)
- Learning Multimodal VAEs through Mutual Supervision [72.77685889312889]
MEME combines information between modalities implicitly through mutual supervision.
We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes.
arXiv Detail & Related papers (2021-06-23T17:54:35Z)
- Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks [68.9026534589483]
RioGNN is a novel Reinforced, recursive and flexible neighborhood selection guided multi-relational Graph Neural Network architecture.
RioGNN can learn more discriminative node embedding with enhanced explainability due to the recognition of individual importance of each relation.
arXiv Detail & Related papers (2021-04-16T04:30:06Z)
- TCN: Table Convolutional Network for Web Table Interpretation [52.32515851633981]
We propose a novel table representation learning approach considering both the intra- and inter-table contextual information.
Our method outperforms competitive baselines by +4.8% F1 on column type prediction and by +4.1% F1 on column pairwise relation prediction.
arXiv Detail & Related papers (2021-02-17T02:18:10Z)
- Self-Supervised Multimodal Domino: in Search of Biomarkers for Alzheimer's Disease [19.86082635340699]
We propose a taxonomy of all reasonable ways to organize self-supervised representation-learning algorithms.
We first evaluate models on toy multimodal MNIST datasets and then apply them to a multimodal neuroimaging dataset with Alzheimer's disease patients.
Results show that the proposed approach outperforms previous self-supervised encoder-decoder methods.
arXiv Detail & Related papers (2020-12-25T20:28:13Z) - COBRA: Contrastive Bi-Modal Representation Algorithm [43.33840912256077]
We present a novel framework that aims to train two modalities in a joint fashion inspired by Contrastive Predictive Coding (CPC) and Noise Contrastive Estimation (NCE) paradigms.
We empirically show that this framework reduces the modality gap significantly and generates a robust and task agnostic joint-embedding space.
We outperform existing work on four diverse downstream tasks spanning across seven benchmark cross-modal datasets.
arXiv Detail & Related papers (2020-05-07T18:20:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.