Neural Collaborative Graph Machines for Table Structure Recognition
- URL: http://arxiv.org/abs/2111.13359v1
- Date: Fri, 26 Nov 2021 08:40:47 GMT
- Title: Neural Collaborative Graph Machines for Table Structure Recognition
- Authors: Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren
- Abstract summary: In this paper, we present a novel Neural Collaborative Graph Machines (NCGM) equipped with stacked collaborative blocks.
We show that the proposed NCGM can modulate the collaborative pattern of different modalities conditioned on the context of intra-modality cues.
- Score: 18.759018425097747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, table structure recognition has achieved impressive progress with
the help of deep graph models. Most of them exploit the visual cues of tabular
elements alone, or simply combine visual cues with other modalities via early
fusion to reason about their graph relationships. However, neither early fusion
nor reasoning over each modality individually is appropriate for the great
diversity of table structures; instead, different modalities are expected to
collaborate with each other in different patterns for different table cases.
The importance of intra- and inter-modality interactions for table structure
reasoning remains unexplored in the community. In this paper, we define this as
the heterogeneous table structure recognition (Hetero-TSR) problem. To fill
this gap, we present Neural Collaborative Graph Machines (NCGM), a novel model
equipped with stacked collaborative blocks that alternately extract
intra-modality context and model inter-modality interactions in a hierarchical
way. It represents the intra- and inter-modality relationships of tabular
elements more robustly, which significantly improves recognition performance.
We also show that the proposed NCGM can modulate the collaborative pattern of
different modalities conditioned on the context of intra-modality cues, which
is vital for diverse table cases. Experimental results on benchmarks
demonstrate that the proposed NCGM achieves state-of-the-art performance and
beats other contemporary methods by a large margin, especially in challenging
scenarios.
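The abstract describes stacked collaborative blocks that alternate between extracting intra-modality context and modeling inter-modality interactions, but gives no implementation details. Purely as a rough illustration of that alternating pattern, here is a minimal NumPy sketch; the attention-based instantiation, the function names, and the residual fusion are all assumptions for demonstration, not the paper's actual design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention over the element (row) axis."""
    d = q.shape[-1]
    w = softmax(q @ k.T / np.sqrt(d))
    return w @ v

def collaborative_block(modalities):
    """One hypothetical collaborative block:
    1) intra-modality context: self-attention within each modality;
    2) inter-modality interaction: each modality attends to the
       concatenated, contextualized features of the other modalities."""
    # Step 1: intra-modality self-attention
    intra = [attention(m, m, m) for m in modalities]
    # Step 2: inter-modality cross-attention with a residual connection
    out = []
    for i, m in enumerate(intra):
        others = np.concatenate([x for j, x in enumerate(intra) if j != i])
        out.append(m + attention(m, others, others))
    return out

# Toy example: three cue modalities (e.g. visual, geometric, textual)
# for five tabular elements, each a 16-dim feature vector.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((5, 16)) for _ in range(3)]
for _ in range(2):  # stacking blocks refines features hierarchically
    feats = collaborative_block(feats)
print([f.shape for f in feats])  # → [(5, 16), (5, 16), (5, 16)]
```

Stacking the block preserves each modality's shape, so the intra/inter alternation can be repeated to any depth before the refined features feed a downstream relation classifier.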
Related papers
- Hybrid Hypergraph Networks for Multimodal Sequence Data Classification [9.688069013427057]
We propose the hybrid hypergraph network (HHN), a novel framework that models temporal multimodal data via a segmentation-first, graph-later strategy.
HHN achieves state-of-the-art results on four multimodal datasets, demonstrating its effectiveness in complex classification tasks.
arXiv Detail & Related papers (2025-07-30T12:13:05Z)
- TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding [3.404552731440374]
TableMoE is a neuro-symbolic Mixture-of-Connector-Experts (MoCE) architecture specifically designed for robust, structured reasoning over multimodal table data.
TableMoE features an innovative Neuro-Symbolic Routing mechanism, which predicts latent semantic token roles and dynamically routes table elements to specialized experts.
For evaluation, we curate and release four challenging WildStruct benchmarks, designed specifically to stress-test models under real-world multimodal degradation and structural complexity.
arXiv Detail & Related papers (2025-06-26T15:41:34Z)
- LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment [18.365849722239865]
Multi-modal entity alignment (MMEA) aims to identify equivalent entities between two multi-modal knowledge graphs.
We propose a novel local-to-global interaction network for MMEA, termed as LoginMEA.
arXiv Detail & Related papers (2024-07-29T01:06:45Z)
- GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation [68.63955715643974]
We propose an innovative Modality-prompted Heterogeneous Graph for Omnimodal Learning (GTP-4o).
arXiv Detail & Related papers (2024-07-08T01:06:13Z)
- Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning [78.49090351193269]
We propose a novel graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis.
Specifically, we formulate the WSI as a heterogeneous graph with a "nucleus-type" attribute on each node and a semantic-similarity attribute on each edge.
Our framework outperforms the state-of-the-art methods with considerable margins on various tasks.
arXiv Detail & Related papers (2023-07-09T14:43:40Z)
- Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA first learns individual representations from multiple modalities, then performs contrastive learning to jointly model intra-modal and inter-modal interactions.
arXiv Detail & Related papers (2022-09-02T08:59:57Z)
- Bayesian intrinsic groupwise registration via explicit hierarchical disentanglement [18.374535632681884]
We propose a general framework which formulates groupwise registration as a procedure of hierarchical Bayesian inference.
Here, we propose a novel variational posterior and network architecture that facilitate joint learning of the common structural representation.
Results have demonstrated the efficacy of our framework in realizing multimodal groupwise registration in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-06T06:13:24Z)
- Learning Multimodal VAEs through Mutual Supervision [72.77685889312889]
MEME combines information between modalities implicitly through mutual supervision.
We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes.
arXiv Detail & Related papers (2021-06-23T17:54:35Z)
- Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks [68.9026534589483]
RioGNN is a novel Reinforced, recursive and flexible neighborhood selection guided multi-relational Graph Neural Network architecture.
RioGNN can learn more discriminative node embedding with enhanced explainability due to the recognition of individual importance of each relation.
arXiv Detail & Related papers (2021-04-16T04:30:06Z)
- TCN: Table Convolutional Network for Web Table Interpretation [52.32515851633981]
We propose a novel table representation learning approach considering both the intra- and inter-table contextual information.
Our method outperforms competitive baselines by +4.8% F1 on column type prediction and by +4.1% F1 on column pairwise relation prediction.
arXiv Detail & Related papers (2021-02-17T02:18:10Z)
- Self-Supervised Multimodal Domino: in Search of Biomarkers for Alzheimer's Disease [19.86082635340699]
We propose a taxonomy of all reasonable ways to organize self-supervised representation-learning algorithms.
We first evaluate models on toy multimodal MNIST datasets and then apply them to a multimodal neuroimaging dataset with Alzheimer's disease patients.
Results show that the proposed approach outperforms previous self-supervised encoder-decoder methods.
arXiv Detail & Related papers (2020-12-25T20:28:13Z) - COBRA: Contrastive Bi-Modal Representation Algorithm [43.33840912256077]
We present a novel framework that aims to train two modalities in a joint fashion inspired by Contrastive Predictive Coding (CPC) and Noise Contrastive Estimation (NCE) paradigms.
We empirically show that this framework reduces the modality gap significantly and generates a robust and task agnostic joint-embedding space.
We outperform existing work on four diverse downstream tasks spanning across seven benchmark cross-modal datasets.
arXiv Detail & Related papers (2020-05-07T18:20:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.