S2abEL: A Dataset for Entity Linking from Scientific Tables
- URL: http://arxiv.org/abs/2305.00366v1
- Date: Sun, 30 Apr 2023 02:07:22 GMT
- Title: S2abEL: A Dataset for Entity Linking from Scientific Tables
- Authors: Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik,
Doug Downey
- Abstract summary: We present the first dataset for entity linking in scientific tables.
Our dataset, S2abEL, focuses on EL in machine learning results tables.
We introduce a neural baseline method designed for EL on scientific tables containing many out-of-knowledge-base mentions.
- Score: 15.300960829210164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entity linking (EL) is the task of linking a textual mention to its
corresponding entry in a knowledge base, and is critical for many
knowledge-intensive NLP applications. When applied to tables in scientific
papers, EL is a step toward large-scale scientific knowledge bases that could
enable advanced scientific question answering and analytics. We present the
first dataset for EL in scientific tables. EL for scientific tables is
especially challenging because scientific knowledge bases can be very
incomplete, and disambiguating table mentions typically requires understanding
the paper's text in addition to the table. Our dataset, S2abEL, focuses on EL
in machine learning results tables and includes hand-labeled cell types,
attributed sources, and entity links from the PaperswithCode taxonomy for 8,429
cells from 732 tables. We introduce a neural baseline method designed for EL on
scientific tables containing many out-of-knowledge-base mentions, and show that
it significantly outperforms a state-of-the-art generic table EL method. The
best baselines fall below human performance, and our analysis highlights
avenues for improvement.
Related papers
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- Towards Controlled Table-to-Text Generation with Scientific Reasoning [46.87189607486007]
We present a new task for generating fluent and logical descriptions that match user preferences over scientific data, aiming to automate scientific document analysis.
We construct a new challenging dataset, SciTab, consisting of table-description pairs extracted from the scientific literature, with highlighted cells and corresponding domain-specific knowledge base.
The results showed that large models struggle to produce accurate content that aligns with user preferences. As the first of its kind, our work should motivate further research in scientific domains.
arXiv Detail & Related papers (2023-12-08T22:57:35Z)
- ACL-Fig: A Dataset for Scientific Figure Classification [15.241086410108512]
We develop a pipeline that extracts figures and tables from the scientific literature and a deep-learning-based framework that classifies scientific figures using visual features.
We build the first large-scale automatically annotated corpus, ACL-Fig, consisting of 112,052 scientific figures extracted from 56K research papers in the ACL Anthology.
The ACL-Fig-Pilot dataset contains 1,671 manually labeled scientific figures belonging to 19 categories.
arXiv Detail & Related papers (2023-01-28T20:27:35Z)
- Data augmentation on graphs for table type classification [1.1859913430860336]
We address the classification of tables using a Graph Neural Network, exploiting the table structure for the message passing algorithm in use.
We achieve promising preliminary results, proposing a data augmentation method suitable for graph-based table representation.
arXiv Detail & Related papers (2022-08-23T21:54:46Z)
- Entity Linking Meets Deep Learning: Techniques and Solutions [49.017379833990155]
We present a comprehensive review and analysis of existing deep learning based EL methods.
We propose a new taxonomy, which organizes existing DL based EL methods using three axes: embedding, feature, and algorithm.
We give a quantitative performance analysis of DL based EL methods over data sets.
arXiv Detail & Related papers (2021-09-26T07:57:38Z)
- Tab2Know: Building a Knowledge Base from Tables in Scientific Papers [6.514665180383298]
We present Tab2Know, a new end-to-end system to build a Knowledge Base from tables in scientific papers.
We propose a pipeline that employs both statistical-based classifiers and logic-based reasoning.
An empirical evaluation of our approach using a corpus of papers in the Computer Science domain has returned satisfactory performance.
arXiv Detail & Related papers (2021-07-28T11:56:53Z)
- LNN-EL: A Neuro-Symbolic Approach to Short-text Entity Linking [62.634516517844496]
We propose LNN-EL, a neuro-symbolic approach that combines the advantages of using interpretable rules with the performance of neural learning.
Even though constrained to using rules, LNN-EL performs competitively against SotA black-box neural approaches.
arXiv Detail & Related papers (2021-06-17T20:22:45Z)
- Tab.IAIS: Flexible Table Recognition and Semantic Interpretation System [84.39812458417246]
We develop two rule-based algorithms that perform the complete table recognition process and support the most frequent table formats.
To incorporate the extraction of semantic information into the table recognition process, we develop a graph-based table interpretation method.
Our table recognition approach achieves results competitive with state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-25T12:31:02Z)
- Learning to Reason for Text Generation from Scientific Tables [100.61286775597947]
We introduce SciGen, a new challenge dataset for the task of reasoning-aware data-to-text generation.
Describing scientific tables goes beyond the surface realization of the table content and requires reasoning over table values.
We study the effectiveness of state-of-the-art data-to-text generation models on SciGen and evaluate the results using common metrics as well as human evaluation.
arXiv Detail & Related papers (2021-04-16T18:01:36Z)
- TCN: Table Convolutional Network for Web Table Interpretation [52.32515851633981]
We propose a novel table representation learning approach considering both the intra- and inter-table contextual information.
Our method can outperform competitive baselines by +4.8% of F1 for column type prediction and by +4.1% of F1 for column pairwise relation prediction.
arXiv Detail & Related papers (2021-02-17T02:18:10Z)