Feature engineering vs. deep learning for paper section identification: Toward applications in Chinese medical literature
- URL: http://arxiv.org/abs/2412.11125v1
- Date: Sun, 15 Dec 2024 09:11:14 GMT
- Title: Feature engineering vs. deep learning for paper section identification: Toward applications in Chinese medical literature
- Authors: Sijia Zhou, Xin Li,
- Abstract summary: Section identification is an important task for library science, especially knowledge management.
We study the paper section identification problem in the context of Chinese medical literature analysis.
- Score: 5.773921786449337
- License:
- Abstract: Section identification is an important task for library science, especially knowledge management. Identifying the sections of a paper would help filter noise in entity and relation extraction. In this research, we studied the paper section identification problem in the context of Chinese medical literature analysis, where the subjects, methods, and results are more valuable from a physician's perspective. Based on previous studies on English literature section identification, we experiment with the effective features to use with classic machine learning algorithms to tackle the problem. It is found that Conditional Random Fields, which consider sentence interdependency, is more effective in combining different feature sets, such as bag-of-words, part-of-speech, and headings, for Chinese literature section identification. Moreover, we find that classic machine learning algorithms are more effective than generic deep learning models for this problem. Based on these observations, we design a novel deep learning model, the Structural Bidirectional Long Short-Term Memory (SLSTM) model, which models word and sentence interdependency together with the contextual information. Experiments on a human-curated asthma literature dataset show that our approach outperforms the traditional machine learning methods and other deep learning methods and achieves close to 90% precision and recall in the task. The model shows good potential for use in other text mining tasks. The research has significant methodological and practical implications.
Related papers
- Enhancing literature review with LLM and NLP methods. Algorithmic trading case [0.0]
This study utilizes machine learning algorithms to analyze and organize knowledge in the field of algorithmic trading.
By filtering a dataset of 136 million research papers, we identified 14,342 relevant articles published between 1956 and Q1 2020.
arXiv Detail & Related papers (2024-10-23T13:37:27Z) - Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Onologies are widely used for representing domain knowledge and meta data.
logical reasoning that can directly support are quite limited in learning, approximation and prediction.
One straightforward solution is to integrate statistical analysis and machine learning.
arXiv Detail & Related papers (2024-06-16T14:49:19Z) - Construction of a Syntactic Analysis Map for Yi Shui School through Text
Mining and Natural Language Processing Research [5.015294834550435]
This study constructs a word segmentation and entity relationship extraction model based on conditional random fields.
The dependency network is used to analyze the grammatical relationship between entities in each ancient book article.
arXiv Detail & Related papers (2024-02-16T14:59:55Z) - Searching for chromate replacements using natural language processing
and machine learning algorithms [0.0]
This study demonstrates it is possible to extract knowledge from the automated interpretation of the scientific literature and achieve expert human level insights.
We have employed the Word2Vec model, previously explored by others, and the BERT model - applying them towards a unique challenge in materials engineering.
arXiv Detail & Related papers (2022-08-11T07:21:18Z) - What Makes Good Contrastive Learning on Small-Scale Wearable-based
Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library textttCL-HAR, which can serve as a practical tool for researchers.
arXiv Detail & Related papers (2022-02-12T06:10:15Z) - Enhancing Identification of Structure Function of Academic Articles
Using Contextual Information [6.28532577139029]
This paper takes articles of the ACL conference as the corpus to identify the structure function of academic articles.
We employ the traditional machine learning models and deep learning models to construct the classifiers based on various feature input.
Inspired by (2), this paper introduces contextual information into the deep learning models and achieved significant results.
arXiv Detail & Related papers (2021-11-28T11:21:21Z) - Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or
Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z) - Ten Quick Tips for Deep Learning in Biology [116.78436313026478]
Machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling.
Deep learning has become its own subfield of machine learning.
In the context of biological research, deep learning has been increasingly used to derive novel insights from high-dimensional biological data.
arXiv Detail & Related papers (2021-05-29T21:02:44Z) - Positioning yourself in the maze of Neural Text Generation: A
Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches relaying task impacts across various generation tasks such as storytelling, summarization, translation etc.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding and the key challenges outstanding in the field in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z) - Towards Improved Model Design for Authorship Identification: A Survey on
Writing Style Understanding [30.642840676899734]
Authorship identification tasks rely heavily on linguistic styles.
Traditional machine learning methods based on handcrafted feature sets are already approaching their performance limits.
We describe outstanding methods in style-related tasks in general and analyze how they are used in combination in the top-performing models.
arXiv Detail & Related papers (2020-09-30T05:17:42Z) - Confident Coreset for Active Learning in Medical Image Analysis [57.436224561482966]
We propose a novel active learning method, confident coreset, which considers both uncertainty and distribution for effectively selecting informative samples.
By comparative experiments on two medical image analysis tasks, we show that our method outperforms other active learning methods.
arXiv Detail & Related papers (2020-04-05T13:46:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.