Data-Driven Regular Expressions Evolution for Medical Text
  Classification Using Genetic Programming
- URL: http://arxiv.org/abs/2012.07515v1
- Date: Fri, 4 Dec 2020 03:44:46 GMT
- Title: Data-Driven Regular Expressions Evolution for Medical Text
  Classification Using Genetic Programming
- Authors: J Liu, R Bai, Z Lu, P Ge, D Liu, Uwe Aickelin
- Abstract summary: This study proposes a novel regular expression-based text classification method making use of genetic programming (GP) approaches to evolve regular expressions.
Our method is evaluated with real-life medical text inquiries from an online healthcare provider and shows promising performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   In medical fields, text classification is one of the most important tasks
that can significantly reduce human workload through structured information
digitization and intelligent decision support. Despite the popularity of
learning-based text classification techniques, it is hard for humans to
understand or manually fine-tune the classification results for better
precision and recall, due to the black-box nature of learning. This study
proposes a novel regular expression-based text classification method that uses
genetic programming (GP) to evolve regular expressions that can
classify a given medical text inquiry with satisfactory precision and recall
while allowing humans to read the classifier and fine-tune it if
necessary. Given a seed population of regular expressions (randomly
initialized or manually constructed by experts), our method evolves a
population of regular expressions according to a chosen fitness function, using a
novel regular expression syntax and a series of carefully chosen reproduction
operators. Our method is evaluated with real-life medical text inquiries from
an online healthcare provider and shows promising performance. More
importantly, our method generates classifiers that can be fully understood,
checked and updated by medical doctors, which is fundamentally crucial for
medical practice.
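
The evolutionary loop described in the abstract (seed population, fitness function, reproduction operators) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy inquiries in `DATA`, the keyword list `VOCAB`, the choice of F1 as fitness, and the simple `mutate` and `crossover` operators. It does not reproduce the paper's regular expression syntax or its actual operators, which are not specified in this listing.

```python
import random
import re

# Toy labelled inquiries, invented for illustration: 1 = target class.
DATA = [
    ("I have a bad cough and a fever", 1),
    ("persistent cough at night", 1),
    ("fever and chills since monday", 1),
    ("my knee hurts after running", 0),
    ("rash on my arm", 0),
    ("how do I book an appointment", 0),
]
# Hypothetical keyword vocabulary the individuals are built from.
VOCAB = ["cough", "fever", "knee", "rash", "book", "night", "chills"]


def to_regex(genes):
    """An individual is a set of keywords; its regex is their alternation."""
    return re.compile("|".join(re.escape(g) for g in sorted(genes)))


def f1(genes, data=DATA):
    """Fitness (assumed): F1 of 'regex matches' as a predictor of label 1."""
    pattern = to_regex(genes)
    tp = fp = fn = 0
    for text, label in data:
        hit = pattern.search(text) is not None
        tp += int(hit and label == 1)
        fp += int(hit and label == 0)
        fn += int(not hit and label == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mutate(genes, rng):
    """Add or drop one keyword, never leaving the set empty."""
    genes = set(genes)
    if len(genes) > 1 and rng.random() < 0.5:
        genes.remove(rng.choice(sorted(genes)))
    else:
        genes.add(rng.choice(VOCAB))
    return frozenset(genes)


def crossover(a, b, rng):
    """Keep each keyword from either parent with probability 0.5."""
    child = {g for g in sorted(a | b) if rng.random() < 0.5}
    return frozenset(child or a)  # fall back to a parent if nothing survives


def evolve(generations=30, pop_size=12, seed=0):
    """Elitist loop: keep the better half, refill with mutated offspring."""
    rng = random.Random(seed)
    pop = [frozenset(rng.sample(VOCAB, 2)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f1, reverse=True)
        elite = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=f1)


best = evolve()
print(to_regex(best).pattern, f1(best))
```

Because each individual stays a readable pattern such as `cough|fever`, a domain expert could inspect or hand-edit it and rerun `f1`, which is the interpretability property the abstract emphasizes.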

Related papers
        - Large Language Models in the Task of Automatic Validation of Text   Classifier Predictions [55.2480439325792]
 Machine learning models for text classification are trained to predict a class for a given text.
To do this, training and validation samples must be prepared, and each text is assigned a class.
Classes are usually assigned by human annotators with different expertise levels, depending on the specific classification task.
This paper proposes several approaches to replace human annotators with Large Language Models.
 arXiv  Detail & Related papers  (2025-05-24T13:19:03Z)
- Comparing Lexical and Semantic Vector Search Methods When Classifying   Medical Documents [0.0]
 Our task was to classify rigidly-structured medical documents according to their content.
We found that using off-the-shelf semantic vector search produced slightly worse predictive accuracy than creating a bespoke lexical vector search model.
 arXiv  Detail & Related papers  (2025-05-16T17:06:35Z)
- AI-assisted summary of suicide risk Formulation [0.9224875902060083]
 This study describes how we developed advanced Natural Language Processing (NLP) algorithms, a branch of Artificial Intelligence (AI)
Formulation, associated with suicide risk assessment, is an individualised process that seeks to understand the idiosyncratic nature and development of an individual's problems.
 arXiv  Detail & Related papers  (2024-11-29T16:40:28Z)
- SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization   of Scientific Topics [2.3742710594744105]
 We introduce SciPrompt, a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks.
Our method outperforms state-of-the-art, prompt-based fine-tuning methods on scientific text classification tasks under few and zero-shot settings.
 arXiv  Detail & Related papers  (2024-10-02T18:45:04Z)
- Self-Supervised Representation Learning for Online Handwriting Text
  Classification [0.8594140167290099]
 We propose the novel Part of Stroke Masking (POSM) as a pretext task for pretraining models to extract informative representations from the online handwriting of individuals in English and Chinese languages.
To evaluate the quality of the extracted representations, we use both intrinsic and extrinsic evaluation methods.
The pretrained models are fine-tuned to achieve state-of-the-art results in tasks such as writer identification, gender classification, and handedness classification.
 arXiv  Detail & Related papers  (2023-10-10T14:07:49Z)
- Detecting automatically the layout of clinical documents to enhance the
  performances of downstream natural language processing [53.797797404164946]
 We designed an algorithm to process clinical PDF documents and extract only clinically relevant text.
The algorithm consists of several steps: initial text extraction using a PDF parser, followed by classification into such categories as body text, left notes, and footers.
Medical performance was evaluated by examining the extraction of medical concepts of interest from the text in their respective sections.
 arXiv  Detail & Related papers  (2023-05-23T08:38:33Z)
- Textual Entailment Recognition with Semantic Features from Empirical
  Text Representation [60.31047947815282]
 A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
 arXiv  Detail & Related papers  (2022-10-18T10:03:51Z)
- Curriculum-Based Self-Training Makes Better Few-Shot Learners for
  Data-to-Text Generation [56.98033565736974]
 We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
 arXiv  Detail & Related papers  (2022-06-06T16:11:58Z)
- Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
 We show that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on PPL and sentiment accuracy based on the external classifier of generated texts.
At the same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.
 arXiv  Detail & Related papers  (2022-05-15T12:58:35Z)
- Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
 This work proposes what is, to our knowledge, the first systematic study of formality detection methods based on statistical, neural-based, and Transformer-based machine learning methods.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
 arXiv  Detail & Related papers  (2022-04-19T16:23:07Z)
- Towards more patient friendly clinical notes through language models and
  ontologies [57.51898902864543]
 We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
 arXiv  Detail & Related papers  (2021-12-23T16:11:19Z)
- Word-level Text Highlighting of Medical Texts for Telehealth Services [0.0]
 This paper aims to show how different text highlighting techniques can capture relevant medical context.
Three different word-level text highlighting methodologies are implemented and evaluated.
The results of our experiments show that the neural network approach is successful in highlighting medically-relevant terms.
 arXiv  Detail & Related papers  (2021-05-21T15:13:54Z)
- Learning Regular Expressions for Interpretable Medical Text
  Classification Using a Pool-based Simulated Annealing and Word-vector Models [0.6807963587057013]
 We propose a rule-based engine composed of high quality and interpretable regular expressions for medical classification.
The regular expressions are auto generated by a constructive method and optimized using a Pool-based Simulated Annealing (PSA) approach.
 arXiv  Detail & Related papers  (2020-11-16T07:20:02Z)
- Revisiting Regex Generation for Modeling Industrial Applications by
  Incorporating Byte Pair Encoder [14.42244606935982]
 This work focuses on automatically generating regular expressions and proposes a novel genetic algorithm to deal with this problem.
We first utilize byte pair encoder (BPE) to extract some frequent items, which are then used to construct regular expressions.
With exponential decay, training is approximately 100 times faster than with methods that do not use it.
 arXiv  Detail & Related papers  (2020-05-06T02:09:10Z)