Data-Driven Regular Expressions Evolution for Medical Text
Classification Using Genetic Programming
- URL: http://arxiv.org/abs/2012.07515v1
- Date: Fri, 4 Dec 2020 03:44:46 GMT
- Title: Data-Driven Regular Expressions Evolution for Medical Text
Classification Using Genetic Programming
- Authors: J Liu, R Bai, Z Lu, P Ge, D Liu, Uwe Aickelin
- Abstract summary: This study proposes a novel regular expression-based text classification method making use of genetic programming (GP) approaches to evolve regular expressions.
Our method is evaluated with real-life medical text inquiries from an online healthcare provider and shows promising performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In medical fields, text classification is one of the most important tasks
that can significantly reduce human workload through structured information
digitization and intelligent decision support. Despite the popularity of
learning-based text classification techniques, it is hard for humans to
understand or manually fine-tune the classification results for better
precision and recall, due to the black-box nature of learning. This study
proposes a novel regular expression-based text classification method that uses
genetic programming (GP) to evolve regular expressions that can
classify a given medical text inquiry with satisfactory precision and recall
while allowing humans to read the classifier and fine-tune it if
necessary. Given a seed population of regular expressions (either randomly
initialized or manually constructed by experts), our method evolves a
population of regular expressions according to a chosen fitness function, using a
novel regular expression syntax and a series of carefully chosen reproduction
operators. Our method is evaluated with real-life medical text inquiries from
an online healthcare provider and shows promising performance. More
importantly, our method generates classifiers that can be fully understood,
checked and updated by medical doctors, which is fundamentally crucial for
medical-related practices.
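The evolutionary loop described in the abstract (seed population, fitness function, reproduction operators) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' method: the toy inquiries in `DATA` are invented, an individual is simplified to a keyword list rendered as a single alternation rather than the paper's novel regular expression syntax, fitness is taken to be F1, and `mutate` and `crossover` are generic stand-ins for the paper's reproduction operators.

```python
import random
import re

# Toy labeled inquiries (invented for illustration, not the paper's data).
DATA = [
    ("I have had a cough and fever for three days", True),
    ("persistent cough at night with sore throat", True),
    ("high fever and chills since yesterday", True),
    ("my knee hurts after running", False),
    ("skin rash on my arm after new soap", False),
    ("how do I renew my prescription online", False),
]
VOCAB = sorted({w for text, _ in DATA for w in text.lower().split()})


def to_regex(genes):
    """Render an individual (a non-empty keyword list) as one alternation."""
    return r"\b(?:" + "|".join(re.escape(g) for g in genes) + r")\b"


def f1(genes, data=DATA):
    """Assumed fitness: F1 score, treating a regex match as a positive."""
    pattern = re.compile(to_regex(genes))
    tp = fp = fn = 0
    for text, label in data:
        hit = pattern.search(text.lower()) is not None
        if hit and label:
            tp += 1
        elif hit and not label:
            fp += 1
        elif label:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mutate(genes, rng):
    """Hypothetical operator: add, drop, or replace one keyword."""
    genes = list(genes)
    op = rng.choice(["add", "drop", "swap"])
    if op == "add" or len(genes) == 1:
        genes.append(rng.choice(VOCAB))
    elif op == "drop":
        genes.pop(rng.randrange(len(genes)))
    else:
        genes[rng.randrange(len(genes))] = rng.choice(VOCAB)
    return sorted(set(genes))


def crossover(a, b, rng):
    """Hypothetical operator: sample a child from the parents' keywords."""
    pool = sorted(set(a) | set(b))
    return sorted(rng.sample(pool, rng.randint(1, len(pool))))


def evolve(seed_population, generations=30, size=20, seed=0):
    """Elitist loop; needs at least two seed individuals."""
    rng = random.Random(seed)
    population = [sorted(set(ind)) for ind in seed_population]
    for _ in range(generations):
        population.sort(key=f1, reverse=True)
        elite = population[: max(2, size // 4)]  # survivors are kept as-is
        children = []
        while len(elite) + len(children) < size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return max(population, key=f1)


# Seeds play the role of expert-written or random starting expressions.
best = evolve([["knee"], ["fever"], ["online"]])
print(to_regex(best), f1(best))
```

Because the elite individuals are carried over unchanged, the best fitness never decreases between generations, and the evolved pattern stays a plain regular expression that a reader can inspect and edit by hand.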
Related papers
- SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics [2.3742710594744105]
We introduce SciPrompt, a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks.
Our method outperforms state-of-the-art, prompt-based fine-tuning methods on scientific text classification tasks under few and zero-shot settings.
arXiv Detail & Related papers (2024-10-02T18:45:04Z)
- Self-Supervised Representation Learning for Online Handwriting Text
Classification [0.8594140167290099]
We propose the novel Part of Stroke Masking (POSM) as a pretext task for pretraining models to extract informative representations from the online handwriting of individuals in English and Chinese languages.
To evaluate the quality of the extracted representations, we use both intrinsic and extrinsic evaluation methods.
The pretrained models are fine-tuned to achieve state-of-the-art results in tasks such as writer identification, gender classification, and handedness classification.
arXiv Detail & Related papers (2023-10-10T14:07:49Z)
- Detecting automatically the layout of clinical documents to enhance the
performances of downstream natural language processing [53.797797404164946]
We designed an algorithm to process clinical PDF documents and extract only clinically relevant text.
The algorithm consists of several steps: initial text extraction using a PDF parser, followed by classification into such categories as body text, left notes, and footers.
Medical performance was evaluated by examining the extraction of medical concepts of interest from the text in their respective sections.
arXiv Detail & Related papers (2023-05-23T08:38:33Z)
- Textual Entailment Recognition with Semantic Features from Empirical
Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Curriculum-Based Self-Training Makes Better Few-Shot Learners for
Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z)
- Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on PPL and sentiment accuracy based on the external classifier of generated texts.
At the same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z)
- Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work presents what is, to our knowledge, the first systematic study of formality detection methods based on statistical, neural-based, and Transformer-based machine learning methods.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
- Towards more patient friendly clinical notes through language models and
ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Word-level Text Highlighting of Medical Texts for Telehealth Services [0.0]
This paper aims to show how different text highlighting techniques can capture relevant medical context.
Three different word-level text highlighting methodologies are implemented and evaluated.
The results of our experiments show that the neural network approach is successful in highlighting medically-relevant terms.
arXiv Detail & Related papers (2021-05-21T15:13:54Z)
- Learning Regular Expressions for Interpretable Medical Text
Classification Using a Pool-based Simulated Annealing and Word-vector Models [0.6807963587057013]
We propose a rule-based engine composed of high quality and interpretable regular expressions for medical classification.
The regular expressions are auto generated by a constructive method and optimized using a Pool-based Simulated Annealing (PSA) approach.
arXiv Detail & Related papers (2020-11-16T07:20:02Z)
- Revisiting Regex Generation for Modeling Industrial Applications by
Incorporating Byte Pair Encoder [14.42244606935982]
This work focuses on automatically generating regular expressions and proposes a novel genetic algorithm to deal with this problem.
We first use a byte pair encoder (BPE) to extract frequent items, which are then used to construct regular expressions.
With exponential decay, training is approximately 100 times faster than with methods that do not use it.
arXiv Detail & Related papers (2020-05-06T02:09:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.