Automatic selection of primary studies in systematic reviews with evolutionary rule-based classification
- URL: http://arxiv.org/abs/2509.23981v1
- Date: Sun, 28 Sep 2025 17:13:20 GMT
- Title: Automatic selection of primary studies in systematic reviews with evolutionary rule-based classification
- Authors: José de la Torre-López, Aurora Ramírez, José Raúl Romero
- Abstract summary: We propose an evolutionary machine learning approach, called ourmodel, to automatically determine whether a paper retrieved from a literature search process is relevant. The use of a grammar to define the syntax and the structure of the rules allows ourmodel to easily combine the usual textual information with other bibliometric data not considered by state-of-the-art methods.
- Score: 0.30586855806896035
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Searching, filtering and analysing scientific literature are time-consuming tasks when performing a systematic literature review. With the rise of artificial intelligence, some steps in the review process are progressively being automated. In particular, machine learning for automatic paper selection can greatly reduce the effort required to identify relevant literature in scientific databases. We propose an evolutionary machine learning approach, called \ourmodel, to automatically determine whether a paper retrieved from a literature search process is relevant. \ourmodel builds an interpretable rule-based classifier using grammar-guided genetic programming. The use of a grammar to define the syntax and the structure of the rules allows \ourmodel to easily combine the usual textual information with other bibliometric data not considered by state-of-the-art methods. Our experiments demonstrate that it is possible to generate accurate classifiers without impairing interpretability and using configurable information sources not supported so far.
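The abstract describes rules whose syntax is fixed by a grammar and which mix textual conditions with bibliometric ones. As a rough illustration only, the following sketch derives such a rule from a toy grammar and applies it to a paper record. The grammar, the `Paper` fields, the condition names (`title_has`, `citations_ge`, and so on) and the vocabulary are all hypothetical and are not taken from the paper; the evolutionary search itself (fitness evaluation, selection, crossover, mutation) is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical record: two text fields and two bibliometric fields."""
    title: str
    abstract: str
    citations: int
    year: int


# Hypothetical grammar: each non-terminal maps to its alternative expansions.
GRAMMAR = {
    "<rule>": [["<cond>"], ["<cond>", "AND", "<rule>"], ["<cond>", "OR", "<rule>"]],
    "<cond>": [["<text>"], ["<biblio>"]],
    "<text>": [["title_has", "<term>"], ["abstract_has", "<term>"]],
    "<biblio>": [["citations_ge", "<count>"], ["year_ge", "<year>"]],
    "<term>": [["review"], ["testing"], ["genetic"]],
    "<count>": [["5"], ["20"]],
    "<year>": [["2015"], ["2020"]],
}


def derive(symbol, rng, depth=0, max_depth=3):
    """Expand a symbol into a flat token list by random grammar derivation."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol
    options = GRAMMAR[symbol]
    if symbol == "<rule>" and depth >= max_depth:
        options = options[:1]  # force the non-recursive alternative
    tokens = []
    for child in rng.choice(options):
        tokens.extend(derive(child, rng, depth + 1, max_depth))
    return tokens


def matches(tokens, paper):
    """Evaluate a derived rule; connectives group to the right: a AND (b OR c)."""
    op, arg, rest = tokens[0], tokens[1], tokens[2:]
    if op == "title_has":
        value = arg in paper.title.lower()
    elif op == "abstract_has":
        value = arg in paper.abstract.lower()
    elif op == "citations_ge":
        value = paper.citations >= int(arg)
    elif op == "year_ge":
        value = paper.year >= int(arg)
    else:
        raise ValueError(f"unknown condition: {op}")
    if not rest:
        return value
    connective, tail = rest[0], rest[1:]
    if connective == "AND":
        return value and matches(tail, paper)
    return value or matches(tail, paper)


if __name__ == "__main__":
    paper = Paper("A systematic review of software testing",
                  "We survey test generation tools.", citations=12, year=2018)
    rule = derive("<rule>", random.Random(0))
    print(" ".join(rule), "->", matches(rule, paper))
```

Because every rule is a derivation of the grammar, adding a new information source in this sketch only requires a new production and a matching branch in `matches`; the rule remains a readable sequence of conditions.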
Related papers
- DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers [18.279429202248632]
We introduce DISCERN, a framework for interpreting systematic biases in text classifiers using language explanations.
DISCERN iteratively generates precise natural language descriptions of systematic errors by employing an interactive loop between two large language models.
We show that users can interpret systematic biases more effectively (by over 25% relative) and efficiently when described through language explanations as opposed to cluster exemplars.
arXiv Detail & Related papers (2024-10-29T17:04:55Z)
- SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics [2.3742710594744105]
We introduce SciPrompt, a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks.
Our method outperforms state-of-the-art, prompt-based fine-tuning methods on scientific text classification tasks under few and zero-shot settings.
arXiv Detail & Related papers (2024-10-02T18:45:04Z)
- Evaluating Generative Ad Hoc Information Retrieval [58.800799175084286]
Generative retrieval systems often directly return a grounded generated text as a response to a query.
Quantifying the utility of the textual responses is essential for appropriately evaluating such generative ad hoc retrieval.
arXiv Detail & Related papers (2023-11-08T14:05:00Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- A Framework For Refining Text Classification and Object Recognition from Academic Articles [2.699900017799093]
Current data mining methods for academic articles employ rule-based (RB) or machine learning (ML) approaches.
We have developed a novel Text Block Refinement Framework (TBRF), a hybrid of machine learning and rule-based schemes.
arXiv Detail & Related papers (2023-05-27T07:59:49Z)
- No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling [0.0]
Machine learning algorithms can be used to extract knowledge from unlabeled texts.
Unsupervised learning can lead to variability depending on the machine learning algorithm.
The presence of outliers and anomalies can be a determining factor.
arXiv Detail & Related papers (2022-08-02T19:51:43Z)
- Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work presents what is, to our knowledge, the first systematic study of formality detection methods based on statistical, neural-based, and Transformer-based machine learning methods.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
- Toward Educator-focused Automated Scoring Systems for Reading and Writing [0.0]
This paper addresses the challenges of data and label availability, authentic and extended writing, domain scoring, prompt and source variety, and transfer learning.
It employs techniques that preserve essay length as an important feature without increasing model training costs.
arXiv Detail & Related papers (2021-12-22T15:44:30Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [78.8500633981247]
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning".
Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly.
arXiv Detail & Related papers (2021-07-28T18:09:46Z)
- Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [52.663117551150954]
A few popular metrics remain as the de facto metrics to evaluate tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community to consider more carefully how it automatically evaluates its models.
arXiv Detail & Related papers (2020-10-26T13:57:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.