Related papers: Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming

URL: http://arxiv.org/abs/2012.07515v1
Date: Fri, 4 Dec 2020 03:44:46 GMT
Title: Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming
Authors: J Liu, R Bai, Z Lu, P Ge, D Liu, Uwe Aickelin
Abstract summary: This study proposes a novel regular expression-based text classification method making use of genetic programming (GP) approaches to evolve regular expressions. Our method is evaluated with real-life medical text inquiries from an online healthcare provider and shows promising performance.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In medical fields, text classification is one of the most important tasks that can significantly reduce human workload through structured information digitization and intelligent decision support. Despite the popularity of learning-based text classification techniques, it is hard for humans to understand or manually fine-tune the classification results for better precision and recall, due to the black-box nature of learning. This study proposes a novel regular expression-based text classification method that uses genetic programming (GP) to evolve regular expressions that can classify a given medical text inquiry with satisfactory precision and recall while allowing humans to read the classifier and fine-tune it if necessary. Given a seed population of regular expressions (either randomly initialized or manually constructed by experts), our method evolves a population of regular expressions according to a chosen fitness function, using a novel regular expression syntax and a series of carefully chosen reproduction operators. Our method is evaluated with real-life medical text inquiries from an online healthcare provider and shows promising performance. More importantly, our method generates classifiers that can be fully understood, checked and updated by medical doctors, which is fundamentally crucial for medical practice.
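
The evolutionary loop described in the abstract (seed population, fitness function, reproduction operators) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' method: the toy samples, the keyword vocabulary, the plain alternation syntax, the F1 fitness, and the mutation and crossover operators are all hypothetical stand-ins for the paper's own regular expression syntax and operators.

```python
import random
import re

# Hypothetical toy data: (text, label); True marks the target class.
SAMPLES = [
    ("I have a fever and a bad cough", True),
    ("persistent cough for two weeks", True),
    ("high fever since last night", True),
    ("how do I book an appointment", False),
    ("what are your opening hours", False),
    ("I need to change my billing address", False),
]

# Hypothetical keyword pool used as building blocks for candidates.
VOCAB = ["fever", "cough", "book", "hours", "billing", "night", "weeks"]


def to_pattern(terms):
    """Render a set of terms as an alternation, e.g. (?:cough|fever)."""
    return "(?:" + "|".join(re.escape(t) for t in sorted(terms)) + ")"


def fitness(terms, samples):
    """F1 score of the rule 'pattern matches => positive class'."""
    if not terms:
        return 0.0
    pattern = re.compile(to_pattern(terms))
    tp = fp = fn = 0
    for text, label in samples:
        hit = pattern.search(text) is not None
        if hit and label:
            tp += 1
        elif hit:
            fp += 1
        elif label:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mutate(terms, rng):
    """Toggle one random vocabulary term, never leaving the set empty."""
    child = set(terms)
    term = rng.choice(VOCAB)
    if term in child and len(child) > 1:
        child.remove(term)
    else:
        child.add(term)
    return frozenset(child)


def crossover(a, b, rng):
    """Keep each term of either parent with probability 0.5."""
    pool = sorted(a | b)
    child = {t for t in pool if rng.random() < 0.5}
    return frozenset(child or {rng.choice(pool)})


def evolve(samples, generations=30, pop_size=12, seed=0):
    """Evolve term sets with truncation selection; parents survive (elitism)."""
    rng = random.Random(seed)
    # Random seed population; experts could supply hand-written sets instead.
    population = [frozenset({rng.choice(VOCAB)}) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: fitness(t, samples), reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=lambda t: fitness(t, samples))


if __name__ == "__main__":
    best = evolve(SAMPLES)
    print(to_pattern(best), fitness(best, SAMPLES))
```

Because each candidate is rendered as an ordinary regular expression, the evolved classifier stays readable and can be edited by hand, which is the interpretability property the abstract emphasizes.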

Related papers

Large Language Models in the Task of Automatic Validation of Text Classifier Predictions [55.2480439325792]
Machine learning models for text classification are trained to predict a class for a given text. To do this, training and validation samples must be prepared, and each text is assigned a class. Classes are usually assigned by human annotators with different expertise levels, depending on the specific classification task. This paper proposes several approaches to replace human annotators with Large Language Models.
arXiv Detail & Related papers (2025-05-24T13:19:03Z)
Comparing Lexical and Semantic Vector Search Methods When Classifying Medical Documents [0.0]
Our task was to classify rigidly-structured medical documents according to their content. We found that using off-the-shelf semantic vector search produced slightly worse predictive accuracy than creating a bespoke lexical vector search model.
arXiv Detail & Related papers (2025-05-16T17:06:35Z)
AI-assisted summary of suicide risk Formulation [0.9224875902060083]
This study describes how we developed advanced Natural Language Processing (NLP) algorithms, a branch of Artificial Intelligence (AI). Formulation, associated with suicide risk assessment, is an individualised process that seeks to understand the idiosyncratic nature and development of an individual's problems.
arXiv Detail & Related papers (2024-11-29T16:40:28Z)
SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics [2.3742710594744105]
We introduce SciPrompt, a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks. Our method outperforms state-of-the-art, prompt-based fine-tuning methods on scientific text classification tasks under few and zero-shot settings.
arXiv Detail & Related papers (2024-10-02T18:45:04Z)
Self-Supervised Representation Learning for Online Handwriting Text Classification [0.8594140167290099]
We propose the novel Part of Stroke Masking (POSM) as a pretext task for pretraining models to extract informative representations from the online handwriting of individuals in English and Chinese languages. To evaluate the quality of the extracted representations, we use both intrinsic and extrinsic evaluation methods. The pretrained models are fine-tuned to achieve state-of-the-art results in tasks such as writer identification, gender classification, and handedness classification.
arXiv Detail & Related papers (2023-10-10T14:07:49Z)
Detecting automatically the layout of clinical documents to enhance the performances of downstream natural language processing [53.797797404164946]
We designed an algorithm to process clinical PDF documents and extract only clinically relevant text. The algorithm consists of several steps: initial text extraction using a PDF, followed by classification into such categories as body text, left notes, and footers. Medical performance was evaluated by examining the extraction of medical concepts of interest from the text in their respective sections.
arXiv Detail & Related papers (2023-05-23T08:38:33Z)
Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the true value of the hypothesis follows the text. In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis. We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation. Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z)
Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on PPL and sentiment accuracy based on the external classifier of generated texts. At the same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z)
Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work proposes what is, to our knowledge, the first systematic study of formality detection methods based on statistical, neural-based, and Transformer-based machine learning methods. We conducted three types of experiments: monolingual, multilingual, and cross-lingual. The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text based on word simplification and language modelling. We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians. Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
Word-level Text Highlighting of Medical Texts for Telehealth Services [0.0]
This paper aims to show how different text highlighting techniques can capture relevant medical context. Three different word-level text highlighting methodologies are implemented and evaluated. The results of our experiments show that the neural network approach is successful in highlighting medically-relevant terms.
arXiv Detail & Related papers (2021-05-21T15:13:54Z)
Learning Regular Expressions for Interpretable Medical Text Classification Using a Pool-based Simulated Annealing and Word-vector Models [0.6807963587057013]
We propose a rule-based engine composed of high-quality and interpretable regular expressions for medical classification. The regular expressions are auto-generated by a constructive method and optimized using a Pool-based Simulated Annealing (PSA) approach.
arXiv Detail & Related papers (2020-11-16T07:20:02Z)
Revisiting Regex Generation for Modeling Industrial Applications by Incorporating Byte Pair Encoder [14.42244606935982]
This work focuses on automatically generating regular expressions and proposes a novel genetic algorithm to deal with this problem. We first utilize byte pair encoder (BPE) to extract some frequent items, which are then used to construct regular expressions. By doing exponential decay, the training speed is approximately 100 times faster than the methods without using exponential decay.
arXiv Detail & Related papers (2020-05-06T02:09:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.