Evaluating Table Structure Recognition: A New Perspective
- URL: http://arxiv.org/abs/2208.00385v1
- Date: Sun, 31 Jul 2022 07:48:36 GMT
- Title: Evaluating Table Structure Recognition: A New Perspective
- Authors: Tarun Kumar and Himanshu Sharad Bhatt
- Abstract summary: Existing metrics used to evaluate table structure recognition algorithms have shortcomings with regard to capturing text and empty cells alignment.
In this paper, we propose a new metric - TEDS based IOU similarity (TEDS (IOU)) for table structure recognition which uses bounding boxes instead of text while simultaneously being robust against the above disadvantages.
- Score: 2.1067139116005595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing metrics used to evaluate table structure recognition algorithms have
shortcomings with regard to capturing text and empty cells alignment. In this
paper, we build on prior work and propose a new metric - TEDS based IOU
similarity (TEDS (IOU)) for table structure recognition which uses bounding
boxes instead of text while simultaneously being robust against the above
disadvantages. We demonstrate the effectiveness of our metric against previous
metrics through various examples.
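The abstract says the metric compares bounding boxes rather than cell text. A minimal sketch of the box comparison it relies on, intersection over union (IoU), is given below. This page does not state how the paper combines IoU with TEDS, so the `substitution_cost` helper (cost = 1 - IoU when one cell is matched against another) is a hypothetical illustration and not the authors' formulation. The `(x_min, y_min, x_max, y_max)` box layout is also an assumption.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes are treated as non-overlapping.
    return inter / union if union > 0 else 0.0


def substitution_cost(pred: Box, truth: Box) -> float:
    """Hypothetical cost of matching a predicted cell to a ground-truth cell.

    Assumption for illustration only: identical boxes cost 0, disjoint boxes cost 1.
    """
    return 1.0 - iou(pred, truth)


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(iou(predicted, ground_truth))                # 1/7, about 0.1429
    print(substitution_cost(predicted, ground_truth))  # 6/7, about 0.8571
```

Because the comparison uses only geometry, an empty cell still contributes a box that can be matched or mismatched, which is one plausible reading of how a box-based score could account for empty cells.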
Related papers
- Using Similarity to Evaluate Factual Consistency in Summaries [2.7595794227140056]
Abstractive summarisers generate fluent summaries, but the factuality of the generated text is not guaranteed.
We propose a new zero-shot factuality evaluation metric, Sentence-BERTScore (SBERTScore), which compares sentences between the summary and the source document.
Our experiments indicate that each technique has different strengths, with SBERTScore particularly effective in identifying correct summaries.
arXiv Detail & Related papers (2024-09-23T15:02:38Z) - UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition [55.153629718464565]
We introduce UniTabNet, a novel framework for table structure parsing based on the image-to-text model.
UniTabNet employs a "divide-and-conquer" strategy, utilizing an image-to-text model to decouple table cells and integrating both physical and logical decoders to reconstruct the complete table structure.
arXiv Detail & Related papers (2024-09-20T01:26:32Z) - Revisiting Code Similarity Evaluation with Abstract Syntax Tree Edit Distance [6.164970071786899]
We revisit recent code similarity evaluation metrics, particularly focusing on the application of Abstract Syntax Tree (AST) editing distance.
Our experiments showcase the effectiveness of AST editing distance in capturing intricate code structures, revealing a high correlation with established metrics.
We propose, optimize, and publish a metric that demonstrates effectiveness across all tested languages.
arXiv Detail & Related papers (2024-04-12T21:28:18Z) - ClusterTabNet: Supervised clustering method for table detection and table structure recognition [0.0]
We present a novel deep-learning-based method to cluster words in documents which we apply to detect and recognize tables given the OCR output.
We interpret table structure bottom-up as a graph of relations between pairs of words and use a transformer encoder model to predict its adjacency matrix.
Compared to the current state-of-the-art detection methods such as DETR and Faster R-CNN, our method achieves similar or better accuracy, while requiring a significantly smaller model.
arXiv Detail & Related papers (2024-02-12T09:10:24Z) - SEMv2: Table Separation Line Detection Based on Instance Segmentation [96.36188168694781]
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge).
We address the table separation line instance-level discrimination problem and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
arXiv Detail & Related papers (2023-03-08T05:15:01Z) - TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition [76.06530816349763]
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
arXiv Detail & Related papers (2021-06-20T01:57:05Z) - Table Structure Recognition using Top-Down and Bottom-Up Cues [28.65687982486627]
We present an approach for table structure recognition that combines cell detection and interaction modules.
We empirically validate our method on publicly available real-world datasets.
arXiv Detail & Related papers (2020-10-09T13:32:53Z) - A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - Tabular Structure Detection from Document Images for Resource Constrained Devices Using A Row Based Similarity Measure [0.9814898713780167]
Tabular structures are used to present crucial information in a structured and crisp manner.
Most of the existing techniques detect tables from a document image by using prior knowledge of the structures of the tables.
arXiv Detail & Related papers (2020-08-26T21:59:27Z) - Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints [63.84063384518667]
We propose a novel Transformer-based generation framework for faithful table-to-text generation.
Core techniques in our method to enforce faithfulness include a new table-text optimal-transport matching loss.
To evaluate faithfulness, we propose a new automatic metric specialized to the table-to-text generation problem.
arXiv Detail & Related papers (2020-05-03T02:54:26Z) - ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
Existing methods, while usually fluent, often hallucinate phrases that are not supported by the table.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.