Automatic Recognition and Classification of Future Work Sentences from
Academic Articles in a Specific Domain
- URL: http://arxiv.org/abs/2212.13860v1
- Date: Wed, 28 Dec 2022 15:26:04 GMT
- Title: Automatic Recognition and Classification of Future Work Sentences from
Academic Articles in a Specific Domain
- Authors: Chengzhi Zhang, Yi Xiang, Wenke Hao, Zhicheng Li, Yuchen Qian, Yuzhuo
Wang
- Abstract summary: Future work sentences (FWS) are the sentences in academic papers that contain the author's description of their proposed follow-up research direction.
This paper presents methods to automatically extract FWS from academic papers and classify them according to the different future directions embodied in the paper's content.
- Score: 7.652206854575039
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Future work sentences (FWS) are the particular sentences in academic papers
that contain the author's description of their proposed follow-up research
direction. This paper presents methods to automatically extract FWS from
academic papers and classify them according to the different future directions
embodied in the paper's content. FWS recognition methods will enable subsequent
researchers to locate future work sentences more accurately and quickly,
reducing the time and cost of acquiring a corpus. Existing work on the
automatic identification of future work sentences is relatively scarce, and
current approaches cannot accurately identify FWS in academic papers, which
prevents large-scale data mining. Furthermore, future work spans many aspects,
and subdividing its content facilitates the analysis of specific development
directions. In this paper, Natural Language Processing (NLP) is used as a case
study: FWS are extracted from academic papers and classified into different
types. We manually build an annotated
corpus with six different types of FWS. Then, automatic recognition and
classification of FWS are implemented using machine learning models, and the
performance of these models is compared based on the evaluation metrics. The
results show that the Bernoulli naive Bayes model has the best performance in the
automatic recognition task, with the Macro F1 reaching 90.73%, and the SCIBERT
model has the best performance in the automatic classification task, with the
weighted average F1 reaching 72.63%. Finally, we extract keywords from FWS to
gain a deeper understanding of their key content, and we demonstrate that the
content outlined in FWS is reflected in subsequent research by measuring the
similarity between future work sentences and abstracts.
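As a rough illustration of the recognition step, here is a minimal sketch of a
binary FWS recognizer using Bernoulli naive Bayes over binary bag-of-words
features, in the spirit of the model the abstract reports. The training
sentences, labels, and pipeline choices are illustrative assumptions, not the
paper's actual corpus or code.

```python
# Minimal sketch of binary FWS recognition with Bernoulli naive Bayes.
# The example sentences and labels below are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 1 = future work sentence, 0 = other.
sentences = [
    "In future work, we plan to extend the model to other domains.",
    "We will explore multilingual settings in follow-up research.",
    "The proposed method outperforms the baseline by 3 points.",
    "Table 2 reports the results on the test set.",
]
labels = [1, 1, 0, 0]

# Bernoulli naive Bayes operates on binary word-presence features,
# so the vectorizer is set to binary=True.
recognizer = make_pipeline(
    CountVectorizer(binary=True, lowercase=True),
    BernoulliNB(),
)
recognizer.fit(sentences, labels)

print(recognizer.predict(["We leave a deeper error analysis to future work."]))
```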
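For the six-way classification step, a hedged sketch of SCIBERT inference via
Hugging Face Transformers might look as follows. The checkpoint name
"allenai/scibert_scivocab_uncased" is the public SciBERT release, but the
example sentence is an assumption, and a real system would first fine-tune the
classification head on the annotated FWS corpus.

```python
# Sketch of six-way FWS classification with SCIBERT. The classification
# head below is randomly initialized; the paper's model would be
# fine-tuned on the annotated FWS corpus before inference.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased",
    num_labels=6,  # six FWS types, matching the annotated corpus
)

sentence = "In future work, we will extend the model to multilingual corpora."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_type = logits.argmax(dim=-1).item()  # index of the predicted FWS type
print(predicted_type)
```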
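For the final FWS-to-abstract similarity measurement, the exact similarity
measure is not specified in this summary, so the sketch below assumes a simple
TF-IDF cosine similarity between a future work sentence and a later abstract;
both texts are hypothetical.

```python
# Minimal sketch: cosine similarity between a future work sentence and
# an abstract over TF-IDF vectors. TF-IDF cosine is an assumption; the
# paper's exact similarity measure is not given in this summary.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

fws = "We plan to incorporate citation context into the classifier."
later_abstract = (
    "This paper improves sentence classification by incorporating "
    "citation context into a neural model."
)

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform([fws, later_abstract])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]
print(f"similarity: {score:.3f}")
```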
Related papers
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Information Extraction in Domain and Generic Documents: Findings from Heuristic-based and Data-driven Approaches [0.0]
Information extraction plays an important role in natural language processing.
Document genre and length influence the performance of IE tasks.
No single method demonstrated overwhelming performance in both tasks.
arXiv Detail & Related papers (2023-06-30T20:43:27Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under the elaborated robustness metric, a model is judged to be robust if its performance is consistently accurate across all cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Tuning Traditional Language Processing Approaches for Pashto Text Classification [0.0]
The main aim of this study is to establish a Pashto automatic text classification system.
This study compares several models containing both statistical and neural network machine learning techniques.
This research obtained an average testing accuracy of 94% using a classification algorithm with the TF-IDF feature extraction method.
arXiv Detail & Related papers (2023-05-04T22:57:45Z)
- Data-Centric Machine Learning in the Legal Domain [0.2624902795082451]
This paper explores how changes in a data set influence the measured performance of a model.
Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train/test splits, and the human labelling accuracy impact the performance.
The observed effects are surprisingly pronounced, especially when the per-class performance is considered.
arXiv Detail & Related papers (2022-01-17T23:05:14Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- What do writing features tell us about AI papers? [23.224038524126467]
We argue that studying interpretable dimensions of academic papers could lead to scalable solutions.
We extract a collection of writing features, and construct a suite of prediction tasks to assess the usefulness of these features in predicting citation counts and the publication of AI-related papers.
arXiv Detail & Related papers (2021-07-13T18:12:12Z)
- Automating Document Classification with Distant Supervision to Increase the Efficiency of Systematic Reviews [18.33687903724145]
Well-done systematic reviews are expensive, time-demanding, and labor-intensive.
We propose an automatic document classification approach to significantly reduce the effort in reviewing documents.
arXiv Detail & Related papers (2020-12-09T22:45:40Z)
- SPECTER: Document-level Representation Learning using Citation-informed Transformers [51.048515757909215]
SPECTER generates document-level embedding of scientific documents based on pretraining a Transformer language model.
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
arXiv Detail & Related papers (2020-04-15T16:05:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.