HYTREL: Hypergraph-enhanced Tabular Data Representation Learning
- URL: http://arxiv.org/abs/2307.08623v2
- Date: Fri, 27 Oct 2023 01:51:48 GMT
- Title: HYTREL: Hypergraph-enhanced Tabular Data Representation Learning
- Authors: Pei Chen, Soumajyoti Sarkar, Leonard Lausen, Balasubramaniam Srinivasan, Sheng Zha, Ruihong Huang and George Karypis
- Abstract summary: HYTREL is a language model that captures the row/column permutation invariances and three more structural properties of tabular data.
We show that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining.
Our qualitative analyses showcase that HYTREL can assimilate the table structures to generate robust representations for the cells, rows, columns, and the entire table.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models pretrained on large collections of tabular data have
demonstrated their effectiveness in several downstream tasks. However, many of
these models do not take into account the row/column permutation invariances,
hierarchical structure, etc. that exist in tabular data. To alleviate these
limitations, we propose HYTREL, a tabular language model that captures the
permutation invariances and three more structural properties of tabular data by
using hypergraphs - where the table cells make up the nodes and the cells
occurring jointly together in each row, column, and the entire table are used
to form three different types of hyperedges. We show that HYTREL is maximally
invariant under certain conditions for tabular data, i.e., two tables obtain
the same representations via HYTREL iff the two tables are identical up to
permutations. Our empirical results demonstrate that HYTREL consistently
outperforms other competitive baselines on four downstream tasks with minimal
pretraining, illustrating the advantages of incorporating the inductive biases
associated with tabular data into the representations. Finally, our qualitative
analyses showcase that HYTREL can assimilate the table structures to generate
robust representations for the cells, rows, columns, and the entire table.
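The hypergraph view described in the abstract can be illustrated with a minimal Python sketch. This is not HYTREL's implementation: HYTREL learns cell, row, column, and table representations with a neural model over this structure, whereas the sketch only builds the three hyperedge types (one per row, one per column, one for the whole table) and computes a hand-crafted summary. The names build_hyperedges and table_signature are hypothetical. The signature depends only on which cells share a hyperedge, so it does not change when rows or columns are reordered; it is merely invariant and is not claimed to be maximally invariant in the paper's sense.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Table = List[List[str]]
Cell = Tuple[int, int]


def build_hyperedges(table: Table) -> Dict[str, List[Cell]]:
    """Return hyperedges as {name: [cell nodes]}; every cell (r, c) is a node."""
    n_rows = len(table)
    n_cols = len(table[0]) if n_rows else 0
    edges: Dict[str, List[Cell]] = {}
    for r in range(n_rows):  # one row hyperedge per row
        edges[f"row{r}"] = [(r, c) for c in range(n_cols)]
    for c in range(n_cols):  # one column hyperedge per column
        edges[f"col{c}"] = [(r, c) for r in range(n_rows)]
    # a single table hyperedge joins every cell
    edges["table"] = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return edges


def table_signature(table: Table) -> FrozenSet:
    """Hypothetical permutation-invariant summary (not HYTREL's encoder).

    Each cell is described by its value plus the multiset of values on its
    row hyperedge and on its column hyperedge; the table is the multiset of
    those descriptions, so row/column order never enters the result.
    """
    edges = build_hyperedges(table)
    # order-free bag of cell values on each hyperedge
    bags = {
        name: frozenset(Counter(table[r][c] for r, c in nodes).items())
        for name, nodes in edges.items()
    }
    descriptions: Counter = Counter()
    for r, row in enumerate(table):
        for c, value in enumerate(row):
            descriptions[(value, bags[f"row{r}"], bags[f"col{c}"])] += 1
    return frozenset(descriptions.items())


if __name__ == "__main__":
    t = [["a", "1"], ["b", "2"]]
    print(len(build_hyperedges(t)))  # 5 hyperedges: 2 rows + 2 columns + 1 table
    # rows and columns both swapped: same table up to permutation
    print(table_signature(t) == table_signature([["2", "b"], ["1", "a"]]))  # True
    # values exchanged within a column: a genuinely different table
    print(table_signature(t) == table_signature([["a", "2"], ["b", "1"]]))  # False
```

The summary uses multisets (Counter) rather than lists precisely because multisets ignore order, which is the same inductive bias the abstract attributes to modelling rows and columns as hyperedges.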
Related papers
- LaTable: Towards Large Tabular Models
Tabular generative foundation models are hard to build due to the heterogeneous feature spaces of different datasets.
LaTable is a novel diffusion model that addresses these challenges and can be trained across different datasets.
We find that LaTable outperforms baselines on in-distribution generation, and that finetuning LaTable can generate out-of-distribution datasets better with fewer samples.
(2024-06-25T16:03:50Z)
- GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction
We propose GridFormer, a novel approach for interpreting unconstrained table structures.
It introduces a flexible table representation in the form of an M×N grid.
Then, we introduce a DETR-style table structure recognizer to efficiently predict this multi-objective information of the grid in a single shot.
(2023-09-26T14:29:45Z)
- SEMv2: Table Separation Line Detection Based on Instance Segmentation
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge).
We address the table separation line instance-level discrimination problem and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate the SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
(2023-03-08T05:15:01Z)
- TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
(2021-06-20T01:57:05Z)
- TABBIE: Pretrained Representations of Tabular Data
We devise a simple pretraining objective that learns exclusively from tabular data.
Unlike competing approaches, our model (TABBIE) provides embeddings of all table substructures.
A qualitative analysis of our model's learned cell, column, and row representations shows that it understands complex table semantics and numerical trends.
(2021-05-06T11:15:16Z)
- Retrieving Complex Tables with Multi-Granular Graph Representation Learning
The task of natural language table retrieval (NLTR) seeks to retrieve semantically relevant tables based on natural language queries.
Existing learning systems treat tables as plain text based on the assumption that tables are structured as dataframes.
We propose Graph-based Table Retrieval (GTR), a generalizable NLTR framework with multi-granular graph representation learning.
(2021-05-04T20:19:03Z)
- TCN: Table Convolutional Network for Web Table Interpretation
We propose a novel table representation learning approach considering both the intra- and inter-table contextual information.
Our method can outperform competitive baselines by +4.8% F1 on column type prediction and by +4.1% F1 on column pairwise relation prediction.
(2021-02-17T02:18:10Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
(2020-09-29T08:17:58Z)
- Identifying Table Structure in Documents using Conditional Generative Adversarial Networks
In many industries and in academic research, information is primarily transmitted in the form of unstructured documents.
We propose a top-down approach, first using a conditional generative adversarial network to map a table image into a standardised 'skeleton' table form.
We then derive the latent table structure using xy-cut projection and genetic algorithm optimisation.
(2020-01-13T20:42:40Z)