Handling big tabular data of ICT supply chains: a multi-task,
machine-interpretable approach
- URL: http://arxiv.org/abs/2208.06031v1
- Date: Thu, 11 Aug 2022 20:29:45 GMT
- Title: Handling big tabular data of ICT supply chains: a multi-task,
machine-interpretable approach
- Authors: Bin Xiao, Murat Simsek, Burak Kantarci and Ala Abu Alkheir
- Abstract summary: We define a Table Structure Recognition (TSR) task and a Table Cell Type Classification (CTC) task.
Our proposed method can outperform state-of-the-art methods on ICDAR2013 and UNLV datasets.
- Score: 13.976736586808308
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Due to the characteristics of Information and Communications Technology (ICT)
products, the critical information of ICT devices is often summarized in big
tabular data shared across supply chains. Therefore, it is critical to
automatically interpret tabular structures with the surging amount of
electronic assets. To transform the tabular data in electronic documents into a
machine-interpretable format and provide layout and semantic information for
information extraction and interpretation, we define a Table Structure
Recognition (TSR) task and a Table Cell Type Classification (CTC) task. We use
a graph to represent complex table structures for the TSR task. Meanwhile,
table cells are categorized into three groups based on their functional roles
for the CTC task, namely Header, Attribute, and Data. Subsequently, we propose
a multi-task model to solve the defined two tasks simultaneously by using the
text modal and image modal features. Our experimental results show that our
proposed method can outperform state-of-the-art methods on ICDAR2013 and UNLV
datasets.
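As a rough illustration of the two tasks, the sketch below stores table cells as graph nodes tagged with one of the three cell types (Header, Attribute, Data) and links cells that share a row or a column. The abstract does not say how the authors define their graph, so the `Cell` fields, the `same_row`/`same_col` edge labels, and the `build_table_graph` helper are all hypothetical choices made for this example, not the paper's method.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations


class CellType(Enum):
    """The three functional roles named in the abstract (CTC task)."""
    HEADER = "Header"
    ATTRIBUTE = "Attribute"
    DATA = "Data"


@dataclass(frozen=True)
class Cell:
    """Hypothetical node record; field names are assumptions."""
    cell_id: int
    text: str
    row_span: tuple  # (first_row, last_row), inclusive
    col_span: tuple  # (first_col, last_col), inclusive
    cell_type: CellType


def spans_overlap(a, b):
    """True if two inclusive index ranges share at least one index."""
    return a[0] <= b[1] and b[0] <= a[1]


def build_table_graph(cells):
    """Return a set of (id_a, id_b, relation) edges (TSR task, assumed form).

    Two cells are linked when their row ranges or column ranges overlap,
    which lets a spanning cell connect to every cell it covers.
    """
    edges = set()
    for a, b in combinations(cells, 2):
        if spans_overlap(a.row_span, b.row_span):
            edges.add((a.cell_id, b.cell_id, "same_row"))
        if spans_overlap(a.col_span, b.col_span):
            edges.add((a.cell_id, b.cell_id, "same_col"))
    return edges


# A tiny made-up table: one header spanning two columns, then one row.
cells = [
    Cell(0, "Device", (0, 0), (0, 1), CellType.HEADER),
    Cell(1, "Model", (1, 1), (0, 0), CellType.ATTRIBUTE),
    Cell(2, "X100", (1, 1), (1, 1), CellType.DATA),
]
edges = build_table_graph(cells)
```

Representing spans as inclusive ranges keeps merged cells simple to handle: the spanning header above ends up connected to both cells beneath it through `same_col` edges.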
Related papers
- TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy [51.23025356179886]
We present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism.
This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering.
We establish a new and comprehensive table VQA benchmark, ComTQA, featuring approximately 9,000 QA pairs.
arXiv Detail & Related papers (2024-06-03T13:54:05Z)
- UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining [22.031699293366486]
We present UniTable, a training framework that unifies the training paradigm and training objective of table recognition.
Our framework unifies the training objectives of all three TR tasks into a unified task-agnostic training objective: language modeling.
UniTable's table parsing capability has surpassed both existing TR methods and general large vision-language models.
arXiv Detail & Related papers (2024-03-07T15:44:50Z)
- StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding [58.38480335579541]
Current chart-related tasks focus on either chart perception which refers to extracting information from the visual charts, or performing reasoning given the extracted data.
In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks.
Experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential for a unified chart perception-reasoning paradigm.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
- Factorized Contrastive Learning: Going Beyond Multi-view Redundancy [116.25342513407173]
This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy.
On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-06-08T15:17:04Z)
- Efficient Information Sharing in ICT Supply Chain Social Network via Table Structure Recognition [12.79419287446918]
Table Structure Recognition (TSR) aims to represent tables with complex structures in a machine-interpretable format.
We implement our proposed method based on Faster-RCNN and achieve 94.79% on mean Average Precision (AP)
arXiv Detail & Related papers (2022-11-03T20:03:07Z)
- SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning [5.5616364225463055]
We introduce a new framework, Subsetting features of Tabular data (SubTab)
We argue that reconstructing the data from the subset of its features rather than its corrupted version in an autoencoder setting can better capture its underlying representation.
arXiv Detail & Related papers (2021-10-08T20:11:09Z)
- Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations [63.98463053292982]
The recognition of tables consists of two main tasks, namely table detection and table structure recognition.
Recent work shows a clear trend towards deep learning approaches coupled with the use of transfer learning for the task of table structure recognition.
We present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for the problem of table recognition.
arXiv Detail & Related papers (2021-05-23T21:17:18Z)
- TCN: Table Convolutional Network for Web Table Interpretation [52.32515851633981]
We propose a novel table representation learning approach considering both the intra- and inter-table contextual information.
Our method can outperform competitive baselines by +4.8% of F1 for column type prediction and by +4.1% of F1 for column pairwise relation prediction.
arXiv Detail & Related papers (2021-02-17T02:18:10Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)