An End-to-End Multi-Task Learning Model for Image-based Table
Recognition
- URL: http://arxiv.org/abs/2303.08648v2
- Date: Wed, 29 Mar 2023 07:51:05 GMT
- Title: An End-to-End Multi-Task Learning Model for Image-based Table
Recognition
- Authors: Nam Tuan Ly and Atsuhiro Takasu
- Abstract summary: We propose an end-to-end multi-task learning model for image-based table recognition.
The proposed model consists of one shared encoder, one shared decoder, and three separate decoders.
The whole system can be easily trained end-to-end and used directly for inference.
- Score: 4.530704014707227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-based table recognition is a challenging task due to the diversity of
table styles and the complexity of table structures. Most of the previous
methods focus on a non-end-to-end approach that divides the problem into two
separate sub-problems, table structure recognition and cell-content recognition,
and then attempts to solve each sub-problem independently using two
separate systems. In this paper, we propose an end-to-end multi-task learning
model for image-based table recognition. The proposed model consists of one
shared encoder, one shared decoder, and three separate decoders which are used
for learning three sub-tasks of table recognition: table structure recognition,
cell detection, and cell-content recognition. The whole system can be easily
trained end-to-end and used directly for inference. In the experiments, we evaluate
the performance of the proposed model on two large-scale datasets: FinTabNet
and PubTabNet. The experimental results show that the proposed model outperforms
the state-of-the-art methods on all benchmark datasets.
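As a rough sketch of the architecture described in the abstract, the code below wires a shared encoder and a shared decoder into three task-specific heads for structure tokens, cell boxes, and cell content. It is only an illustration under our own assumptions: the small CNN backbone, the query-based transformer decoding, and all layer sizes are placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiTaskTableModel(nn.Module):
    """Illustrative sketch: shared encoder + shared decoder feeding three
    task-specific heads (structure, cell detection, cell content).
    Layer choices and sizes are assumptions, not the paper's exact design."""

    def __init__(self, d_model=256, struct_vocab=600, char_vocab=100):
        super().__init__()
        # Shared visual encoder: a small CNN mapping the image to d_model features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        # Shared decoder refines a set of learned queries over the image features.
        self.shared_decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.queries = nn.Parameter(torch.randn(1, 512, d_model))
        # Three separate task heads standing in for the three task decoders.
        self.structure_head = nn.Linear(d_model, struct_vocab)  # structure tokens
        self.cell_box_head = nn.Linear(d_model, 4)               # cell boxes (x, y, w, h)
        self.content_head = nn.Linear(d_model, char_vocab)       # cell-content characters

    def forward(self, images):
        feats = self.encoder(images)                   # (B, C, H', W')
        memory = feats.flatten(2).transpose(1, 2)      # (B, H'*W', C)
        queries = self.queries.expand(images.size(0), -1, -1)
        hidden = self.shared_decoder(queries, memory)  # (B, Q, C)
        return {
            "structure_logits": self.structure_head(hidden),
            "cell_boxes": self.cell_box_head(hidden).sigmoid(),
            "content_logits": self.content_head(hidden),
        }

# Joint training would simply sum the three task losses, e.g.
# loss = l_structure + l_cell_detection + l_content
```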
Related papers
- Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z)
- Multi-Cell Decoder and Mutual Learning for Table Structure and Character Recognition [1.2328446298523066]
We propose a multi-cell content decoder and a bidirectional mutual learning mechanism to improve the end-to-end approach.
The effectiveness is demonstrated on two large datasets, and the experimental results show comparable performance to state-of-the-art models.
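The bidirectional mutual learning mentioned above is in the spirit of deep mutual learning, where two prediction branches are additionally trained to match each other's output distributions. The sketch below shows that generic idea with a symmetric KL term; the function name, weighting factor, and loss form are our assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mutual_learning_loss(logits_a, logits_b, targets, alpha=0.5):
    """Generic bidirectional mutual learning objective (illustrative only):
    each branch is supervised by the labels and additionally pulled toward
    the other branch's predictions via a symmetric KL term."""
    ce = F.cross_entropy(logits_a, targets) + F.cross_entropy(logits_b, targets)
    kl_ab = F.kl_div(F.log_softmax(logits_a, dim=-1),
                     F.softmax(logits_b, dim=-1).detach(), reduction="batchmean")
    kl_ba = F.kl_div(F.log_softmax(logits_b, dim=-1),
                     F.softmax(logits_a, dim=-1).detach(), reduction="batchmean")
    return ce + alpha * (kl_ab + kl_ba)

# Example: two branches predicting over a 10-class vocabulary for 4 samples.
logits_a, logits_b = torch.randn(4, 10), torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
loss = mutual_learning_loss(logits_a, logits_b, targets)
```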
arXiv Detail & Related papers (2024-04-20T04:30:38Z)
- TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content [39.34067105360439]
We propose an end-to-end pipeline that integrates deep learning models, including DETR, CascadeTabNet, and PP OCR v2, to achieve comprehensive image-based table recognition.
Our system achieves simultaneous table detection (TD), table structure recognition (TSR), and table content recognition (TCR).
Our proposed approach achieves an IOU of 0.96 and an OCR Accuracy of 78%, showcasing a remarkable improvement of approximately 25% in the OCR Accuracy compared to the previous Table Transformer approach.
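The pipeline described above chains table detection, structure recognition, and content recognition over the same image. The sketch below only illustrates how such stages might be composed; the three callables are hypothetical placeholders and do not reflect the actual DETR, CascadeTabNet, or PP OCR v2 interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class TableResult:
    table_box: Tuple[int, int, int, int]                  # table region from the detector (TD)
    structure_html: str = ""                              # row/column layout from the TSR stage
    cell_texts: List[str] = field(default_factory=list)   # recognized cell content (TCR)

def recognize_tables(image,
                     detect_tables: Callable,        # hypothetical wrapper around a TD model
                     recognize_structure: Callable,  # hypothetical wrapper around a TSR model
                     read_cells: Callable) -> List[TableResult]:
    """Compose TD -> TSR -> TCR over one document image (all interfaces assumed)."""
    results = []
    for box in detect_tables(image):           # 1. locate each table in the page image
        crop = image.crop(box)                 # assuming a PIL-style image and (l, t, r, b) boxes
        structure = recognize_structure(crop)  # 2. recover the row/column structure
        texts = read_cells(crop, structure)    # 3. OCR the content of the individual cells
        results.append(TableResult(box, structure, texts))
    return results
```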
arXiv Detail & Related papers (2024-04-16T06:24:53Z)
- Reliable Representations Learning for Incomplete Multi-View Partial Multi-Label Classification [78.15629210659516]
In this paper, we propose an incomplete multi-view partial multi-label classification network named RANK.
We move beyond the fixed view-level weights inherent in existing methods and propose a quality-aware sub-network that dynamically assigns quality scores to each view of each sample.
Our model is not only able to handle complete multi-view multi-label datasets, but also works on datasets with missing instances and labels.
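The quality-aware weighting above can be illustrated with a small fusion module that predicts a per-sample, per-view score and fuses views by those scores, optionally masking views that are missing. This is a generic sketch under our own assumptions, not the RANK sub-network.

```python
import torch
import torch.nn as nn

class QualityWeightedFusion(nn.Module):
    """Illustrative sketch: score each view of each sample and fuse views
    by their normalized quality scores (not the actual RANK architecture)."""

    def __init__(self, view_dims, d_out=128):
        super().__init__()
        self.projections = nn.ModuleList([nn.Linear(d, d_out) for d in view_dims])
        self.scorer = nn.Linear(d_out, 1)   # quality score per (sample, view)

    def forward(self, views, masks=None):
        # views: list of (B, d_v) tensors; masks: optional (B, n_views) 0/1 tensor
        # marking which views are actually observed for each sample.
        feats = torch.stack([p(v) for p, v in zip(self.projections, views)], dim=1)
        scores = self.scorer(torch.tanh(feats)).squeeze(-1)   # (B, n_views)
        if masks is not None:
            scores = scores.masked_fill(masks == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1).unsqueeze(-1)  # (B, n_views, 1)
        return (weights * feats).sum(dim=1)                   # fused (B, d_out)

# Toy usage: two views of different dimensionality, second view missing for sample 0.
fusion = QualityWeightedFusion([32, 48])
views = [torch.randn(4, 32), torch.randn(4, 48)]
masks = torch.tensor([[1, 0], [1, 1], [1, 1], [1, 1]])
fused = fusion(views, masks)   # (4, 128)
```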
arXiv Detail & Related papers (2023-03-30T03:09:25Z)
- Multiview Representation Learning from Crowdsourced Triplet Comparisons [23.652378640389756]
Triplet similarity comparison is a type of crowdsourcing task in which crowd workers are asked: among three given objects, which two are more similar?
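To make the question concrete, the sketch below shows one possible way to represent a triplet query and a worker's answer; the schema is purely illustrative.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TripletQuery:
    """One crowdsourcing question: among three objects, which two are most similar?
    (purely illustrative schema)"""
    objects: Tuple[str, str, str]

@dataclass
class TripletAnswer:
    query: TripletQuery
    worker_id: str
    similar_pair: Tuple[int, int]   # indices into query.objects chosen by the worker

# Example: a worker judges "cat" and "tiger" to be the most similar pair.
answer = TripletAnswer(TripletQuery(("cat", "tiger", "car")),
                       worker_id="w01", similar_pair=(0, 1))
```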
arXiv Detail & Related papers (2023-02-08T10:51:44Z)
- Cross-view Graph Contrastive Representation Learning on Partially Aligned Multi-view Data [52.491074276133325]
Multi-view representation learning has developed rapidly over the past decades and has been applied in many fields.
We propose a new cross-view graph contrastive learning framework, which integrates multi-view information to align data and learn latent representations.
Experiments conducted on several real datasets demonstrate the effectiveness of the proposed method on the clustering and classification tasks.
arXiv Detail & Related papers (2022-11-08T09:19:32Z)
- Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA first learns individual representations from multiple modalities and then performs contrastive learning to jointly model intra-modal and inter-modal interactions.
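The two-step recipe above (per-modality representations, then joint contrastive learning) can be sketched generically as follows; the InfoNCE-style loss and the weighting of the intra-modal and inter-modal terms are our assumptions, not MCLEA's exact objectives.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.1):
    """Generic InfoNCE: matching rows of a and b are positives, the rest negatives."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    return F.cross_entropy(logits, targets)

def joint_contrastive_loss(modal_embs, joint_emb, beta=0.5):
    """Illustrative joint objective: intra-modal terms align each modality's
    embedding of an entity with the fused (joint) embedding of the same entity,
    while inter-modal terms align pairs of modalities directly."""
    intra = sum(info_nce(e, joint_emb) for e in modal_embs)
    inter = sum(info_nce(modal_embs[i], modal_embs[j])
                for i in range(len(modal_embs)) for j in range(i + 1, len(modal_embs)))
    return intra + beta * inter

# Toy usage: 8 entities, three modalities (e.g. image, text, graph structure).
modal_embs = [torch.randn(8, 64) for _ in range(3)]
joint_emb = torch.randn(8, 64)
loss = joint_contrastive_loss(modal_embs, joint_emb)
```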
arXiv Detail & Related papers (2022-09-02T08:59:57Z)
- TSK Fuzzy System Towards Few Labeled Incomplete Multi-View Data Classification [24.01191516774655]
A transductive semi-supervised incomplete multi-view TSK fuzzy system modeling method (SSIMV_TSK) is proposed to address the challenges of learning from few labeled, incomplete multi-view data.
The proposed method integrates missing view imputation, pseudo label learning of unlabeled data, and fuzzy system modeling into a single process to yield a model with interpretable fuzzy rules.
Experimental results on real datasets show that the proposed method significantly outperforms the state-of-the-art methods.
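Since the method builds on TSK fuzzy rules, the sketch below shows what a first-order TSK rule base computes: Gaussian memberships give each rule a firing strength, and the output is the firing-strength-weighted average of the rules' linear consequents. The parameters are toy values, and the imputation and pseudo-label components of SSIMV_TSK are not shown.

```python
import numpy as np

def tsk_predict(x, centers, sigmas, consequents):
    """First-order TSK fuzzy inference (illustrative, toy parameters).
    centers, sigmas: (n_rules, n_features) Gaussian membership parameters.
    consequents:     (n_rules, n_features + 1) linear coefficients [w, b] per rule."""
    # Rule firing strengths: product of Gaussian memberships over the features.
    memberships = np.exp(-((x - centers) ** 2) / (2 * sigmas ** 2))
    firing = memberships.prod(axis=1)         # (n_rules,)
    weights = firing / firing.sum()           # normalized firing strengths
    # Each rule's consequent is a linear function of the input.
    rule_outputs = consequents[:, :-1] @ x + consequents[:, -1]
    return float(weights @ rule_outputs)

# Toy example: 2 rules over 2 features.
x = np.array([0.3, 0.7])
centers = np.array([[0.0, 0.5], [1.0, 1.0]])
sigmas = np.ones((2, 2))
consequents = np.array([[1.0, -1.0, 0.1], [0.5, 2.0, -0.2]])
print(tsk_predict(x, centers, sigmas, consequents))
```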
arXiv Detail & Related papers (2021-10-08T11:41:06Z)
- Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations [63.98463053292982]
The recognition of tables consists of two main tasks, namely table detection and table structure recognition.
Recent work shows a clear trend towards deep learning approaches coupled with the use of transfer learning for the task of table structure recognition.
We present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for the problem of table recognition.
arXiv Detail & Related papers (2021-05-23T21:17:18Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weighting method into the two-stage learning scheme to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
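The re-weighting step above can be illustrated with a standard class-prior-based weighting of the cross-entropy loss; the inverse-frequency weights below are a common generic choice, not necessarily the paper's generalized re-weight scheme.

```python
import torch
import torch.nn.functional as F

def class_balanced_weights(class_counts):
    """Illustrative re-weighting from the class prior: weights proportional to the
    inverse class frequency, scaled so a balanced dataset gets weight 1 per class."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    return counts.sum() / (len(counts) * counts)

# Toy long-tailed setting: class counts range from 1000 (head) down to 10 (tail).
weights = class_balanced_weights([1000, 100, 10])
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
loss = F.cross_entropy(logits, targets, weight=weights)
```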
arXiv Detail & Related papers (2021-03-30T14:09:53Z)