Evaluation of Embedding Models for Automatic Extraction and
Classification of Acknowledged Entities in Scientific Documents
- URL: http://arxiv.org/abs/2206.10939v1
- Date: Wed, 22 Jun 2022 09:32:28 GMT
- Title: Evaluation of Embedding Models for Automatic Extraction and
Classification of Acknowledged Entities in Scientific Documents
- Authors: Nina Smirnova, Philipp Mayr
- Abstract summary: The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities.
The training was conducted using three default Flair NER models with two differently-sized corpora.
Our model is able to recognize six entity types: funding agency, grant number, individuals, university, corporation and miscellaneous.
- Score: 5.330844352905488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acknowledgments in scientific papers may give an insight into aspects of the
scientific community, such as reward systems, collaboration patterns, and
hidden research trends. The aim of the paper is to evaluate the performance of
different embedding models for the task of automatic extraction and
classification of acknowledged entities from the acknowledgment text in
scientific papers. We trained and implemented a named entity recognition (NER)
task using the Flair NLP-framework. The training was conducted using three
default Flair NER models with two differently-sized corpora. The Flair
Embeddings model trained on the larger training corpus showed the best accuracy
of 0.77. Our model is able to recognize six entity types: funding agency, grant
number, individuals, university, corporation, and miscellaneous. The model works
more precisely for some entity types than for others: individuals and grant
numbers achieved very good F1-scores of over 0.9. Most previous works on
acknowledgement analysis were limited by manual evaluation of the data and
therefore by the amount of data that could be processed. This model can be
applied to the comprehensive analysis of acknowledgement texts and may make a
significant contribution to the field of automated acknowledgement analysis.
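The per-type F1-scores reported in the abstract follow the standard entity-level formulation for NER evaluation: a predicted span counts as a true positive only if its boundaries exactly match a gold annotation of the same type. A minimal sketch of that computation (not the authors' evaluation code; the spans and values below are hypothetical):

```python
def entity_f1(gold, pred):
    """Entity-level F1 for one entity type (e.g. "grant number").

    gold and pred are sets of (doc_id, start, end) spans for that type;
    a prediction is correct only on an exact boundary match.
    """
    tp = len(gold & pred)  # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotations: the third prediction is off by one character,
# so it does not count as a match.
gold = {("doc1", 10, 18), ("doc1", 40, 47), ("doc2", 5, 12)}
pred = {("doc1", 10, 18), ("doc1", 40, 47), ("doc2", 6, 12)}
print(round(entity_f1(gold, pred), 3))  # → 0.667
```

Averaging this score over all six entity types (or micro-averaging over all spans) yields the overall accuracy figures quoted above.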
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- Evaluating the Generation Capabilities of Large Chinese Language Models [27.598864484231477]
This paper unveils CG-Eval, the first-ever comprehensive and automated evaluation framework.
It assesses the generative capabilities of large Chinese language models across a spectrum of academic disciplines.
Gscore automates the quality measurement of a model's text generation against reference standards.
arXiv Detail & Related papers (2023-08-09T09:22:56Z)
- A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check [53.152011258252315]
We show that using phonetic and graphic information reasonably is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which reflects the shortcomings of models.
The commonly used benchmark, SIGHAN, cannot reliably evaluate models' performance.
arXiv Detail & Related papers (2023-07-25T17:02:38Z)
- Embedding Models for Supervised Automatic Extraction and Classification of Named Entities in Scientific Acknowledgements [5.330844352905488]
The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities.
The training was conducted using three default Flair NER models with four differently-sized corpora and different versions of the Flair NLP framework.
The model is able to recognize six entity types: funding agency, grant number, individuals, university, corporation, and miscellaneous.
arXiv Detail & Related papers (2023-07-25T09:51:17Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
We further elaborate a robustness metric under which a model is judged robust if its performance is consistently accurate across all cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- SciRepEval: A Multi-Format Benchmark for Scientific Document Representations [52.01865318382197]
We introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations.
We show how state-of-the-art models like SPECTER and SciNCL struggle to generalize across the task formats.
A new approach that learns multiple embeddings per document, each tailored to a different format, can improve performance.
arXiv Detail & Related papers (2022-11-23T21:25:39Z)
- Automated and Explainable Ontology Extension Based on Deep Learning: A Case Study in the Chemical Domain [0.9449650062296822]
We present a new methodology for automatic ontology extension for large domains.
We trained a Transformer-based deep learning model on the leaf nodes of the ChEBI ontology and the classes to which they belong.
The proposed model achieved an overall F1 score of 0.80, an improvement of 6 percentage points over our previous results.
arXiv Detail & Related papers (2021-09-19T19:37:08Z)
- An Intelligent Hybrid Model for Identity Document Classification [0.0]
Digitization may provide opportunities (e.g., increase in productivity, disaster recovery, and environmentally friendly solutions) and challenges for businesses.
One of the main challenges would be to accurately classify numerous scanned documents uploaded every day by customers.
Few studies address this challenge as an image classification application.
The proposed approach has been implemented using Python and experimentally validated on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-06-07T13:08:00Z)
- A Multi-Level Attention Model for Evidence-Based Fact Checking [58.95413968110558]
We present a simple model that can be trained on sequence structures.
Results on a large-scale dataset for Fact Extraction and VERification show that our model outperforms the graph-based approaches.
arXiv Detail & Related papers (2021-06-02T05:40:12Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.