Symlink: A New Dataset for Scientific Symbol-Description Linking
- URL: http://arxiv.org/abs/2204.12070v1
- Date: Tue, 26 Apr 2022 04:36:14 GMT
- Title: Symlink: A New Dataset for Scientific Symbol-Description Linking
- Authors: Viet Dac Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt, Thien Huu Nguyen
- Abstract summary: We present a new large-scale dataset that emphasizes extracting symbols and descriptions in scientific documents.
Our experiments on Symlink demonstrate the challenges of the symbol-description linking task for existing models.
- Score: 69.97278287534157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mathematical symbols and descriptions appear in various forms across document
section boundaries without explicit markup. In this paper, we present a new
large-scale dataset that emphasizes extracting symbols and descriptions in
scientific documents. Symlink annotates scientific papers of 5 different
domains (i.e., computer science, biology, physics, mathematics, and economics).
Our experiments on Symlink demonstrate the challenges of the symbol-description
linking task for existing models and call for further research effort in this
area. We will publicly release Symlink to facilitate future research.
Related papers
- MuLMS: A Multi-Layer Annotated Text Corpus for Information Extraction in the Materials Science Domain [0.7947524927438001]
  We present MuLMS, a new dataset of 50 open-access articles, spanning seven sub-domains of materials science.
  We present competitive neural models for all tasks and demonstrate that multi-task training with existing related resources leads to benefits.
  (2023-10-24T07:23:46Z)
- MuLMS-AZ: An Argumentative Zoning Dataset for the Materials Science Domain [1.209268134212644]
  Classifying the Argumentative Zone (AZ) has been proposed to improve processing of scholarly documents.
  We present and release a new dataset of 50 manually annotated research articles.
  (2023-07-05T14:55:18Z)
- Contrastive Hierarchical Discourse Graph for Scientific Document Summarization [14.930704950433324]
  CHANGES is a contrastive hierarchical graph neural network for extractive scientific paper summarization.
  We also propose a graph contrastive learning module to learn global theme-aware sentence representations.
  (2023-05-31T20:54:43Z)
- Complex Mathematical Symbol Definition Structures: A Dataset and Model for Coordination Resolution in Definition Extraction [27.896132821710783]
  We present SymDef, an English language dataset of 5,927 sentences from full-text scientific papers.
  This dataset focuses specifically on complex coordination structures such as "respectively" constructions.
  We introduce a new definition extraction method that masks mathematical symbols, creates a copy of each sentence for each symbol, specifies a target symbol, and predicts its corresponding definition spans using slot filling.
  (2023-05-24T02:53:48Z)
- Scientific Paper Extractive Summarization Enhanced by Citation Graphs [50.19266650000948]
  We focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings.
  Preliminary results demonstrate that citation graph is helpful even in a simple unsupervised framework.
  Motivated by this, we propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available.
  (2022-12-08T11:53:12Z)
- Topic-Guided Abstractive Multi-Document Summarization [21.856615677793243]
  A critical point of multi-document summarization (MDS) is to learn the relations among various documents.
  We propose a novel abstractive MDS model, in which we represent multiple documents as a heterogeneous graph.
  We employ a neural topic model to jointly discover latent topics that can act as cross-document semantic units.
  (2021-10-21T15:32:30Z)
- Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph [96.95815946327079]
  It is difficult to learn the association between named entities and visual cues due to the long-tail distribution of named entities.
  We propose a novel approach that constructs a multi-modal knowledge graph to associate the visual objects with named entities.
  (2021-07-26T05:50:41Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
  We redefine the task of scientific papers summarization by utilizing their citation graph.
  We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
  Our model can achieve competitive performance when compared with the pretrained models.
  (2021-04-07T11:13:35Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
  We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
  We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
  We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
  (2020-11-06T02:23:01Z)
- Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
  We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
  Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
  Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
  (2020-05-20T13:39:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.