Learning Regular Expressions for Interpretable Medical Text
Classification Using a Pool-based Simulated Annealing and Word-vector Models
- URL: http://arxiv.org/abs/2011.09351v1
- Date: Mon, 16 Nov 2020 07:20:02 GMT
- Title: Learning Regular Expressions for Interpretable Medical Text
Classification Using a Pool-based Simulated Annealing and Word-vector Models
- Authors: Chaofan Tu, Ruibin Bai, Zheng Lu, Uwe Aickelin, Peiming Ge, Jianshuang
Zhao
- Abstract summary: We propose a rule-based engine composed of high-quality, interpretable regular expressions for medical text classification.
The regular expressions are auto-generated by a constructive method and optimized using a Pool-based Simulated Annealing (PSA) approach.
- Score: 0.6807963587057013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a rule-based engine composed of high-quality,
interpretable regular expressions for medical text classification. The regular
expressions are auto-generated by a constructive heuristic method and optimized
using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep
Neural Network (DNN) methods achieve high performance in most Natural Language
Processing (NLP) applications, their solutions are regarded as black boxes that
are uninterpretable to humans. Rule-based methods are therefore often
introduced when interpretable solutions are needed, especially in the medical
field. However, constructing regular expressions by hand can be extremely
labor-intensive for large data sets. This research aims to reduce that manual
effort while maintaining high-quality solutions.
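As a rough illustration of the approach described above, the sketch below runs a pool-based simulated annealing loop over candidate regular expressions. The fitness function, the mutation operator, and the vocabulary of related words (a stand-in for the paper's word-vector component) are illustrative assumptions, not the authors' implementation.

```python
import math
import random
import re

def fitness(pattern, positives, negatives):
    # F1-style score: reward matching positive texts, penalize negatives.
    # (Illustrative objective; the paper's actual fitness is not shown here.)
    try:
        rx = re.compile(pattern)
    except re.error:
        return 0.0  # malformed candidates score zero
    tp = sum(1 for t in positives if rx.search(t))
    fp = sum(1 for t in negatives if rx.search(t))
    fn = len(positives) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def mutate(pattern, vocab):
    # Toy neighbourhood move: swap in a related word, or loosen the pattern.
    # A word-vector model could supply `vocab` as nearest-neighbour words.
    if vocab and random.random() < 0.5:
        return pattern.replace(random.choice(vocab), random.choice(vocab), 1)
    return pattern + r"\w*"

def pool_sa(seeds, positives, negatives, vocab,
            t0=1.0, cooling=0.95, steps=200):
    # Maintain a pool of candidate regexes; accept worse neighbours with
    # probability exp(delta / T) while the temperature T cools.
    pool = {p: fitness(p, positives, negatives) for p in seeds}
    t = t0
    for _ in range(steps):
        current = random.choice(list(pool))
        neighbour = mutate(current, vocab)
        delta = fitness(neighbour, positives, negatives) - pool[current]
        if delta > 0 or random.random() < math.exp(delta / t):
            pool[neighbour] = pool[current] + delta
        t *= cooling
    return max(pool, key=pool.get)
```

Keeping every accepted candidate in the pool means good intermediate patterns are never lost, which is one plausible reading of the pool-based idea; a call such as `pool_sa([r"fever|headache"], positives, negatives, vocab=["fever", "cough"])` would return the best pattern found (all names hypothetical).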
Related papers
- Bi-Encoders based Species Normalization -- Pairwise Sentence Learning to
Rank [0.0]
We present a novel deep learning approach for named entity normalization, treating it as a pairwise learning-to-rank problem.
We conduct experiments on species entity types and evaluate our method against state-of-the-art techniques.
arXiv Detail & Related papers (2023-10-22T17:30:16Z) - Toward Unified Controllable Text Generation via Regular Expression
Instruction [56.68753672187368]
Our paper introduces Regular Expression Instruction (REI), which utilizes an instruction-based mechanism to fully exploit regular expressions' advantages to uniformly model diverse constraints.
Our method only requires fine-tuning on medium-scale language models or few-shot, in-context learning on large language models, and requires no further adjustment when applied to various constraint combinations.
arXiv Detail & Related papers (2023-09-19T09:05:14Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
However, these models often require substantial amounts of medical text data and generalize poorly.
arXiv Detail & Related papers (2023-04-17T17:13:42Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
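As a loose illustration of the noise-stability idea (a conceptual reading, not the authors' exact objective), the sketch below perturbs the input of a tiny feed-forward stack with Gaussian noise and measures how far each layer's hidden representation moves; that distance can be added to the fine-tuning loss as a stability penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(layers, x):
    # Tiny feed-forward stack; returns every layer's hidden representation.
    hidden, h = [], x
    for w in layers:
        h = np.tanh(h @ w)
        hidden.append(h)
    return hidden

def noise_stability_penalty(layers, x, sigma=0.1):
    # Perturb the input with standard Gaussian noise (scaled by sigma) and
    # penalize the layer-wise drift of the hidden representations.
    clean = forward(layers, x)
    noisy = forward(layers, x + sigma * rng.standard_normal(x.shape))
    return sum(np.mean((c - n) ** 2) for c, n in zip(clean, noisy))

# Toy usage with two random 8-unit layers and a batch of 4 inputs.
layers = [0.5 * rng.standard_normal((8, 8)) for _ in range(2)]
x = rng.standard_normal((4, 8))
print(noise_stability_penalty(layers, x))
```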
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - Regularization-based Pruning of Irrelevant Weights in Deep Neural
Architectures [0.0]
We propose a method for learning sparse neural topologies via a regularization technique that identifies non-relevant weights and selectively shrinks their norm.
We test the proposed technique on several image classification and natural language generation tasks, obtaining results on par with or better than competitors in terms of sparsity and task metrics.
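A minimal sketch of the selective-shrinkage step, assuming a magnitude-based relevance criterion (the paper's own criterion may differ):

```python
import numpy as np

def shrink_irrelevant(weights, threshold=1e-2, decay=0.9):
    # One regularization-style pruning step: weights whose magnitude is
    # small relative to the layer's largest weight are treated as
    # non-relevant and multiplicatively shrunk toward zero.
    # (The relevance test here is an assumption, not the paper's.)
    pruned = []
    for w in weights:
        relevance = np.abs(w) / (np.abs(w).max() + 1e-12)
        w = w.copy()
        w[relevance < threshold] *= decay  # selective norm shrinkage
        pruned.append(w)
    return pruned
```

Repeating such a step during training drives the masked weights toward zero, after which they can be removed to obtain a sparse topology.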
arXiv Detail & Related papers (2022-04-11T09:44:16Z) - SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z) - Lexically-constrained Text Generation through Commonsense Knowledge
Extraction and Injection [62.071938098215085]
We focus on the CommonGen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z) - Data-Driven Regular Expressions Evolution for Medical Text
Classification Using Genetic Programming [0.0]
This study proposes a novel regular expression-based text classification method making use of genetic programming (GP) approaches to evolve regular expressions.
Our method is evaluated with real-life medical text inquiries from an online healthcare provider and shows promising performance.
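One way such a GP loop over regexes might look, with toy crossover on alternation branches and a hit-rate fitness (both illustrative assumptions, not the study's implementation):

```python
import random
import re

def fitness(pattern, positives, negatives):
    # Hit rate on positives minus false-alarm rate on negatives
    # (a stand-in for the study's actual objective).
    try:
        rx = re.compile(pattern)
    except re.error:
        return -1.0
    hits = sum(1 for t in positives if rx.search(t)) / len(positives)
    alarms = sum(1 for t in negatives if rx.search(t)) / len(negatives)
    return hits - alarms

def crossover(a, b):
    # Toy recombination: splice alternation branches of two parents.
    return a.split("|")[0] + "|" + b.split("|")[-1]

def evolve(seeds, positives, negatives, generations=50, pop_size=20):
    population = list(seeds)
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, positives, negatives),
                        reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [crossover(random.choice(parents), random.choice(parents))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda p: fitness(p, positives, negatives))
```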
arXiv Detail & Related papers (2020-12-04T03:44:46Z) - Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z) - Revisiting Regex Generation for Modeling Industrial Applications by
Incorporating Byte Pair Encoder [14.42244606935982]
This work focuses on automatically generating regular expressions and proposes a novel genetic algorithm to deal with this problem.
We first utilize a byte pair encoder (BPE) to extract frequent items, which are then used to construct regular expressions.
With exponential decay applied, training is approximately 100 times faster than without it.
arXiv Detail & Related papers (2020-05-06T02:09:10Z)
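The frequent-item extraction step above can be pictured with a toy byte-pair-encoding pass like the one below (illustrative only; the merged items would then seed regex fragments):

```python
from collections import Counter

def bpe_frequent_items(corpus, merges=10):
    # Toy BPE pass: repeatedly merge the most frequent adjacent symbol
    # pair; the merged items are the "frequent items" that can seed
    # regular-expression fragments. (Illustrative, not the paper's code.)
    seqs = [list(text) for text in corpus]
    items = []
    for _ in range(merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        items.append(a + b)
        merged_seqs = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == (a, b):
                    out.append(a + b)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            merged_seqs.append(out)
        seqs = merged_seqs
    return items

print(bpe_frequent_items(["headache and fever", "fever then headache"]))
```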
This list is automatically generated from the titles and abstracts of the papers on this site.