Tab2Know: Building a Knowledge Base from Tables in Scientific Papers
- URL: http://arxiv.org/abs/2107.13306v1
- Date: Wed, 28 Jul 2021 11:56:53 GMT
- Title: Tab2Know: Building a Knowledge Base from Tables in Scientific Papers
- Authors: Benno Kruit, Hongyu He, Jacopo Urbani
- Abstract summary: We present Tab2Know, a new end-to-end system to build a Knowledge Base from tables in scientific papers.
We propose a pipeline that employs both statistical-based classifiers and logic-based reasoning.
An empirical evaluation of our approach using a corpus of papers in the Computer Science domain has returned satisfactory performance.
- Score: 6.514665180383298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tables in scientific papers contain a wealth of valuable knowledge for the
scientific enterprise. To help the many of us who frequently consult this type
of knowledge, we present Tab2Know, a new end-to-end system to build a Knowledge
Base (KB) from tables in scientific papers. Tab2Know addresses the challenge of
automatically interpreting the tables in papers and of disambiguating the
entities that they contain. To solve these problems, we propose a pipeline that
employs both statistical-based classifiers and logic-based reasoning. First,
our pipeline applies weakly supervised classifiers to recognize the type of
tables and columns, with the help of a data labeling system and an ontology
specifically designed for our purpose. Then, logic-based reasoning is used to
link equivalent entities (via sameAs links) in different tables. An empirical
evaluation of our approach using a corpus of papers in the Computer Science
domain has returned satisfactory performance. This suggests that ours is a
promising step to create a large-scale KB of scientific knowledge.
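The linking step described in the abstract (connecting equivalent entities across tables via sameAs links) can be illustrated with a minimal sketch. This is not Tab2Know's actual rule set: the matching rule used here (two cell mentions are equivalent when they share a column type and a normalized label) and the names `normalize`, `UnionFind`, and `same_as_clusters` are assumptions made for illustration only. A union-find structure stands in for the reasoner and makes the sameAs relation transitive.

```python
from collections import defaultdict


def normalize(label: str) -> str:
    """Lowercase, treat hyphens as spaces, collapse whitespace (assumed rule)."""
    return " ".join(label.lower().replace("-", " ").split())


class UnionFind:
    """Disjoint-set structure used to close sameAs links transitively."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def same_as_clusters(mentions):
    """Group mentions into sameAs equivalence classes.

    mentions: iterable of (mention_id, column_type, label) tuples, where
    column_type is the (hypothetical) output of the column classifier.
    """
    mentions = list(mentions)
    uf = UnionFind()
    first_seen = {}
    for mention_id, column_type, label in mentions:
        uf.find(mention_id)  # register the mention even if it stays alone
        key = (column_type, normalize(label))
        if key in first_seen:
            uf.union(first_seen[key], mention_id)
        else:
            first_seen[key] = mention_id
    clusters = defaultdict(set)
    for mention_id, _, _ in mentions:
        clusters[uf.find(mention_id)].add(mention_id)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical cell mentions from four tables.
example = [
    ("t1:c0:r1", "Dataset", "MNIST"),
    ("t2:c0:r3", "Dataset", "mnist"),
    ("t3:c1:r2", "Method", "MNIST"),
    ("t2:c0:r4", "Dataset", "CIFAR-10"),
    ("t4:c0:r1", "Dataset", "cifar 10"),
]
clusters = same_as_clusters(example)
```

In this sketch the two "MNIST" dataset mentions end up in one class, while the "MNIST" mention in a Method column stays separate, which shows why the column type predicted by the classifiers matters for disambiguation.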
Related papers
- Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
The transferability of deep neural networks (DNNs) has made significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
- A Practical Entity Linking System for Tables in Scientific Literature [2.093510158982825]
This paper introduces a general-purpose system for linking entities to items in the Wikidata knowledge base.
It describes how we adapt this system for linking domain-specific entities, especially for those entities embedded within tables drawn from COVID-19-related scientific literature.
arXiv Detail & Related papers (2023-06-12T01:40:57Z)
- Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z)
- Data augmentation on graphs for table type classification [1.1859913430860336]
We address the classification of tables using a Graph Neural Network, exploiting the table structure for the message passing algorithm in use.
We achieve promising preliminary results, proposing a data augmentation method suitable for graph-based table representation.
arXiv Detail & Related papers (2022-08-23T21:54:46Z)
- Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents [1.1859913430860336]
The main contribution of this work is to tackle the problem of table extraction, exploiting Graph Neural Networks.
We experimentally evaluated the proposed approach on a new dataset obtained by merging the information provided in the PubLayNet and PubTables-1M datasets.
arXiv Detail & Related papers (2022-08-23T21:36:01Z)
- TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition [76.06530816349763]
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
arXiv Detail & Related papers (2021-06-20T01:57:05Z)
- Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations [63.98463053292982]
The recognition of tables consists of two main tasks, namely table detection and table structure recognition.
Recent work shows a clear trend towards deep learning approaches coupled with the use of transfer learning for the task of table structure recognition.
We present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for the problem of table recognition.
arXiv Detail & Related papers (2021-05-23T21:17:18Z)
- TUTA: Tree-based Transformers for Generally Structured Table Pre-training [47.181660558590515]
Recent attempts at table understanding mainly focus on relational tables, yet overlook other common table structures.
We propose TUTA, a unified pre-training architecture for understanding generally structured tables.
TUTA is highly effective, achieving state-of-the-art on five widely-studied datasets.
arXiv Detail & Related papers (2020-10-21T13:22:31Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- Novel Entity Discovery from Web Tables [21.16349961050804]
We leverage tables on the Web to discover new entities, properties, and relationships.
Our method identifies not only out-of-KB ("novel") information but also novel aliases for in-KB ("known") entities.
arXiv Detail & Related papers (2020-02-01T13:24:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.