ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX
- URL: http://arxiv.org/abs/2105.14426v1
- Date: Sun, 30 May 2021 04:17:55 GMT
- Title: ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX
- Authors: Pratik Kayal, Mrinal Anand, Harsh Desai, Mayank Singh
- Abstract summary: This paper discusses the dataset, tasks, participants' methods, and results of the ICDAR 2021 Competition on Scientific Table Image Recognition.
We propose two subtasks: reconstructing the LaTeX structure code and reconstructing the LaTeX content code from a table image.
This report describes the datasets and ground truth specification, details the performance evaluation metrics used, presents the final results, and summarizes the participating methods.
- Score: 1.149654395906819
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tables present important information concisely in many scientific documents.
Visual features like mathematical symbols, equations, and spanning cells make
structure and content extraction from tables embedded in research documents
difficult. This paper discusses the dataset, tasks, participants' methods, and
results of the ICDAR 2021 Competition on Scientific Table Image Recognition to
LaTeX. Specifically, the task of the competition is to convert a tabular image
to its corresponding LaTeX source code. We proposed two subtasks. In Subtask 1,
we ask the participants to reconstruct the LaTeX structure code from an image.
In Subtask 2, we ask the participants to reconstruct the LaTeX content code
from an image. This report describes the datasets and ground truth
specification, details the performance evaluation metrics used, presents the
final results, and summarizes the participating methods. The submission by team
VCGroup achieved the highest Exact Match accuracy: 74% for Subtask 1 and 55%
for Subtask 2, beating the previous baselines by 5% and 12%, respectively. Although
improvements can still be made to the recognition capabilities of models, this
competition contributes to the development of fully automated table recognition
systems by challenging practitioners to solve problems under specific
constraints and sharing their approaches; the platform will remain available
for post-challenge submissions at
https://competitions.codalab.org/competitions/26979 .
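To make the two subtasks concrete, the following is an illustrative split of a LaTeX table source into structure code and content code. This is a sketch, not the dataset's exact tokenization: the `CELL` placeholder and the sample cell values are assumptions for illustration.

```latex
% Subtask 1 -- structure code: the tabular skeleton
% (column alignment, rules, cell delimiters), with placeholders for cells
\begin{tabular}{|l|c|}
\hline
CELL & CELL \\ \hline
CELL & CELL \\ \hline
\end{tabular}

% Subtask 2 -- content code: the token sequence filling each cell,
% e.g. Method, Accuracy, MASTER, 74.4
```

Separating the two lets structure recognition (rows, columns, rules) be evaluated independently of cell-content recognition (symbols, numbers, equations).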
Related papers
- LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement [11.931911831112357]
This paper proposes LATTE, the first iterative refinement framework for LaTeX recognition.
LATTE improves the source-extraction accuracy for both formulae and tables, outperforming existing techniques as well as GPT-4V.
arXiv Detail & Related papers (2024-09-21T17:18:49Z)
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs [62.84082370758761]
CharXiv is a comprehensive evaluation suite involving 2,323 charts from arXiv papers.
To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.
Results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary and open-source models.
arXiv Detail & Related papers (2024-06-26T17:50:11Z)
- VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models [76.94378391979228]
We introduce a new, more demanding task known as Interleaved Image-Text Comprehension (IITC).
This task challenges models to discern and disregard superfluous elements in both images and text to accurately answer questions.
In support of this task, we craft a new VEGA dataset, tailored for IITC on scientific content, and devise a subtask, Image-Text Association (ITA).
arXiv Detail & Related papers (2024-06-14T17:59:40Z)
- MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition [2.325171167252542]
We present an improved version of the benchmark dataset im2latex-100k, featuring 30 fonts instead of one.
Second, we introduce the real-world dataset realFormula, with MEs extracted from papers.
Third, we developed a MER model, MathNet, based on a convolutional vision transformer, with superior results on all four test sets.
arXiv Detail & Related papers (2024-04-21T14:03:34Z)
- Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs [79.0426838808629]
We address the TAT-DQA task, i.e., answering questions over visually-rich table-text documents.
Specifically, we propose a novel Doc2SoarGraph framework with enhanced discrete reasoning capability.
We conduct extensive experiments on TAT-DQA dataset, and the results show that our proposed framework outperforms the best baseline model by 17.73% and 16.91% in terms of Exact Match (EM) and F1 score respectively on the test set.
arXiv Detail & Related papers (2023-05-03T07:30:32Z)
- Texts as Images in Prompt Tuning for Multi-Label Image Recognition [70.9310322461598]
We advocate that image-text contrastive learning makes it feasible to treat texts as images for prompt tuning and introduce TaI prompting.
Particularly, we apply TaI prompting to multi-label image recognition, where sentences in the wild serve as alternatives to images for prompt tuning.
Our proposed TaI-DPT outperforms zero-shot CLIP by a large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-11-23T07:00:11Z)
- Tables to LaTeX: structure and content extraction from scientific tables [0.848135258677752]
We adapt the transformer-based language modeling paradigm for scientific table structure and content extraction.
We achieve an exact match accuracy of 70.35% and 49.69% on table structure and content extraction, respectively.
arXiv Detail & Related papers (2022-10-31T12:08:39Z)
- Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA [85.17249272519626]
An optimized OpenQA Table-Text Retriever (OTTeR) is proposed.
We conduct retrieval-centric mixed-modality synthetic pre-training.
OTTeR substantially improves the performance of table-and-text retrieval on the OTT-QA dataset.
arXiv Detail & Related papers (2022-10-11T07:04:39Z)
- Text-Based Person Search with Limited Data [66.26504077270356]
Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query.
We present a framework with two novel components to handle the problems brought by limited data.
arXiv Detail & Related papers (2021-10-20T22:20:47Z)
- PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Table Image Recognition to Latex [16.003357804292513]
The ICDAR 2021 Competition has two sub-tasks: Table Structure Reconstruction (TSR) and Table Content Reconstruction (TCR).
We leverage our previously proposed algorithm MASTER (Lu et al., 2019), originally developed for scene text recognition.
Our method achieves 0.7444 Exact Match and 0.8765 Exact Match @95% on the TSR task, and obtains 0.5586 Exact Match and 0.7386 Exact Match @95% on the TCR task.
arXiv Detail & Related papers (2021-05-05T03:15:48Z)
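The Exact Match and Exact Match @95% figures reported above can be sketched as token-sequence comparisons. This is a minimal sketch, assuming token-level comparison and assuming EM@95% relaxes strict equality to a 95% sequence-similarity threshold; the competition's precise definition may differ.

```python
from difflib import SequenceMatcher

def exact_match(pred, gold):
    """Strict exact match: predicted token sequence equals the ground truth."""
    return pred == gold

def exact_match_at(pred, gold, threshold=0.95):
    """Relaxed variant: the sequence-similarity ratio must reach the threshold."""
    return SequenceMatcher(None, pred, gold).ratio() >= threshold

# Ground-truth structure code for a small three-row table (15 tokens).
gold = r"\begin{tabular}{ll} A & B \\ \hline 1 & 2 \\ 3 & 4 \\ \end{tabular}".split()
# Prediction that misses the \hline token (14 of 15 tokens correct).
pred = [t for t in gold if t != r"\hline"]

print(exact_match(pred, gold))      # False: one token differs
print(exact_match_at(pred, gold))   # True: ratio = 28/29, about 0.966
```

Aggregating either boolean over a test set yields the accuracy scores quoted in the abstracts (e.g. 0.7444 Exact Match vs. 0.8765 Exact Match @95% on TSR).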
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.