A Study on Reproducibility and Replicability of Table Structure
Recognition Methods
- URL: http://arxiv.org/abs/2304.10439v1
- Date: Thu, 20 Apr 2023 16:30:58 GMT
- Title: A Study on Reproducibility and Replicability of Table Structure
Recognition Methods
- Authors: Kehinde Ajayi, Muntabhir Hasan Choudhury, Sarah Rajtmajer, and Jian Wu
- Abstract summary: We examine both reproducibility and replicability of a corpus of 16 papers on table structure recognition (TSR).
We reproduce results consistent with the original in only four of the 16 papers studied.
No paper is identified as replicable using the new dataset.
- Score: 3.8366337377024298
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Concerns about reproducibility in artificial intelligence (AI) have emerged,
as researchers have reported unsuccessful attempts to directly reproduce
published findings in the field. Replicability, the ability to affirm a finding
using the same procedures on new data, has not been well studied. In this
paper, we examine both reproducibility and replicability of a corpus of 16
papers on table structure recognition (TSR), an AI task aimed at identifying
cell locations of tables in digital documents. We attempt to reproduce
published results using codes and datasets provided by the original authors. We
then examine replicability using a dataset similar to the original as well as a
new dataset, GenTSR, consisting of 386 annotated tables extracted from
scientific papers. Out of 16 papers studied, we reproduce results consistent
with the original in only four. Two of the four papers are identified as
replicable using the similar dataset under certain IoU values. No paper is
identified as replicable using the new dataset. We offer observations on the
causes of irreproducibility and irreplicability. All code and data are
available on Codeocean at https://codeocean.com/capsule/6680116/tree.
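The abstract reports replicability "under certain IoU values" but this summary does not say how the scores were computed. As a rough illustration only, the sketch below shows how an intersection-over-union (IoU) threshold is commonly used to decide whether a predicted cell box matches an annotated one, and how that decision feeds an F1 score. The box format `(x_min, y_min, x_max, y_max)`, the greedy one-to-one matching, and the names `iou` and `f1_at_iou` are assumptions made for this example, not the authors' evaluation code.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(pred: List[Box], gt: List[Box], threshold: float) -> float:
    """F1 score where a prediction counts as correct if its best unmatched
    ground-truth box has IoU >= threshold (greedy matching, an assumption)."""
    matched = set()
    true_pos = 0
    for p in pred:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt):
            if j in matched:
                continue
            value = iou(p, g)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= threshold:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gt) if gt else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Raising the threshold makes matching stricter, so the same predictions can pass at one IoU value and fail at another, which is one way a method could look replicable only "under certain IoU values".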
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
- ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models [58.34560740973768]
We introduce a framework that leverages language models (LMs) to generate literature review tables.
A new dataset of 2,228 literature review tables extracted from ArXiv papers synthesizes a total of 7,542 research papers.
We evaluate LMs' abilities to reconstruct reference tables, finding this task benefits from additional context.
arXiv Detail & Related papers (2024-10-25T18:31:50Z)
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
- Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank [37.15089945367366]
We investigate the conditions under which relevance can be recovered from click data.
The recovery of relevance is feasible if and only if the identifiability graph (IG) is connected.
We introduce two methods, namely node intervention and node merging, designed to modify the dataset and restore the connectivity of the IG.
arXiv Detail & Related papers (2023-09-27T10:31:58Z)
- Replication: Contrastive Learning and Data Augmentation in Traffic Classification Using a Flowpic Input Representation [47.95762911696397]
We reproduce [16] on the same datasets and replicate its most salient aspect (the importance of data augmentation) on three additional public datasets.
While we confirm most of the original results, we also found a 20% accuracy drop on some of the investigated scenarios due to a data shift in the original dataset.
arXiv Detail & Related papers (2023-09-18T12:55:09Z)
- arXiVeri: Automatic table verification with GPT [44.388120096898554]
We propose a novel task of automatic table verification (AutoTV).
The objective is to verify the accuracy of numerical data in tables by cross-referencing cited sources.
By leveraging the flexible capabilities of modern large language models (LLMs), we propose simple baselines for table verification.
arXiv Detail & Related papers (2023-06-13T17:59:57Z)
- Replicable Reinforcement Learning [15.857503103543308]
We provide a provably replicable algorithm for parallel value iteration, and a provably replicable version of R-max in the episodic setting.
These are the first formal replicability results for control problems, which present different challenges for replication than batch learning settings.
arXiv Detail & Related papers (2023-05-24T16:05:15Z)
- Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter [63.5550818034739]
This paper presents a framework to evaluate state-of-the-art contributions to self-supervised monocular depth estimation.
It includes pretraining, backbone, architectural design choices and loss functions.
We re-implement, validate and re-evaluate 16 state-of-the-art contributions and introduce a new dataset.
arXiv Detail & Related papers (2022-08-02T14:38:53Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.