TabSim: A Siamese Neural Network for Accurate Estimation of Table
Similarity
- URL: http://arxiv.org/abs/2008.10856v1
- Date: Tue, 25 Aug 2020 07:32:09 GMT
- Title: TabSim: A Siamese Neural Network for Accurate Estimation of Table
Similarity
- Authors: Maryam Habibi, Johannes Starlinger, Ulf Leser
- Abstract summary: We present TabSim, a novel method to compute table similarity scores using deep neural networks.
To train and evaluate our method, we created a gold standard corpus consisting of 1500 table pairs extracted from biomedical articles.
Our evaluation shows that TabSim outperforms other table similarity measures on average by approx. 7 percentage points (pp) in F1-score in a binary similarity classification setting and by approx. 1.5 pp in a ranking scenario.
- Score: 5.889134549635538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tables are a popular and efficient means of presenting structured
information. They are used extensively in various kinds of documents including
web pages. Tables display information as a two-dimensional matrix, the
semantics of which is conveyed by a mixture of structure (rows, columns),
headers, caption, and content. Recent research has started to consider tables
as first class objects, not just as an addendum to texts, yielding interesting
results for problems like table matching, table completion, or value
imputation. All of these problems inherently rely on an accurate measure for
the semantic similarity of two tables. We present TabSim, a novel method to
compute table similarity scores using deep neural networks. Conceptually,
TabSim represents a table as a learned concatenation of embeddings of its
caption, its content, and its structure. Given two tables in this
representation, a Siamese neural network is trained to compute a score
correlating with the tables' semantic similarity. To train and evaluate our
method, we created a gold standard corpus consisting of 1500 table pairs
extracted from biomedical articles and manually scored regarding their degree
of similarity, and adopted two other corpora originally developed for a
different yet similar task. Our evaluation shows that TabSim outperforms other
table similarity measures on average by approx. 7 percentage points (pp) in
F1-score in a binary similarity classification setting and by approx. 1.5 pp in
a ranking scenario.
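The scoring idea in the abstract (one shared network applied to both tables, followed by a comparison of the two representations) can be illustrated with a minimal sketch. This is not the authors' architecture. The feature vectors, the plain concatenation, the single random tanh layer, the cosine comparison, and all names (make_encoder, table_representation, siamese_score) are assumptions made for illustration, and the weights are untrained.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Build one linear + tanh layer with fixed random weights.

    Assumption: this stands in for a trained sub-network; nothing is learned here.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(vec):
        return [math.tanh(sum(w * x for w, x in zip(row, vec)))
                for row in weights]

    return encode


def table_representation(table, encode):
    """Concatenate caption, content, and structure features, then encode them."""
    features = table["caption"] + table["content"] + table["structure"]
    return encode(features)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def siamese_score(table_a, table_b, encode):
    """Encode both tables with the SAME encoder and map cosine to [0, 1]."""
    rep_a = table_representation(table_a, encode)
    rep_b = table_representation(table_b, encode)
    return 0.5 * (1.0 + cosine(rep_a, rep_b))


# Hypothetical pre-computed features: 2 caption dims, 3 content dims,
# 2 structure dims (for example, row and column counts).
table_a = {"caption": [0.2, 0.1], "content": [0.7, 0.3, 0.5], "structure": [3.0, 4.0]}
table_b = {"caption": [0.1, 0.4], "content": [0.6, 0.2, 0.9], "structure": [5.0, 2.0]}

encode = make_encoder(in_dim=7, out_dim=4)
print(siamese_score(table_a, table_b, encode))
```

Because both inputs pass through the same weights, the score is symmetric in its arguments and a table compared with itself scores 1. In a trained setting, the encoder weights would be fitted so that the score tracks the manually assigned similarity labels.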
Related papers
- Is This a Bad Table? A Closer Look at the Evaluation of Table Generation from Text [21.699434525769586]
Existing measures for table quality evaluation fail to capture the overall semantics of the tables.
We propose TabEval, a novel table evaluation strategy that captures table semantics.
To validate our approach, we curate a dataset comprising text descriptions for 1,250 diverse Wikipedia tables.
(2024-06-21T02:18:03Z)
- SEMv2: Table Separation Line Detection Based on Instance Segmentation [96.36188168694781]
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge).
We address the table separation line instance-level discrimination problem and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate the SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
(2023-03-08T05:15:01Z)
- TRUST: An Accurate and End-to-End Table structure Recognizer Using
Splitting-based Transformers [56.56591337457137]
We propose an accurate and end-to-end transformer-based table structure recognition method, referred to as TRUST.
Transformers are suitable for table structure recognition because of their global computations, perfect memory, and parallel computation.
We conduct experiments on several popular benchmarks, including PubTabNet and SynthTable, and our method achieves new state-of-the-art results.
(2022-08-31T08:33:36Z)
- Data augmentation on graphs for table type classification [1.1859913430860336]
We address the classification of tables using a Graph Neural Network, exploiting the table structure for the message passing algorithm in use.
We achieve promising preliminary results, proposing a data augmentation method suitable for graph-based table representation.
(2022-08-23T21:54:46Z)
- Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis on a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
(2022-05-19T20:35:23Z)
- TGRNet: A Table Graph Reconstruction Network for Table Structure
Recognition [76.06530816349763]
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
(2021-06-20T01:57:05Z)
- TCN: Table Convolutional Network for Web Table Interpretation [52.32515851633981]
We propose a novel table representation learning approach considering both the intra- and inter-table contextual information.
Our method can outperform competitive baselines by +4.8% of F1 for column type prediction and by +4.1% of F1 for column pairwise relation prediction.
(2021-02-17T02:18:10Z)
- A Graph Representation of Semi-structured Data for Web Question
Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
(2020-10-14T04:01:54Z)
- Identifying Table Structure in Documents using Conditional Generative
Adversarial Networks [0.0]
In many industries and in academic research, information is primarily transmitted in the form of unstructured documents.
We propose a top-down approach, first using a conditional generative adversarial network to map a table image into a standardised 'skeleton' table form.
We then derive the latent table structure using xy-cut projection and Genetic Algorithm optimisation.
(2020-01-13T20:42:40Z)