Learning Regular Expressions for Interpretable Medical Text
Classification Using a Pool-based Simulated Annealing and Word-vector Models
- URL: http://arxiv.org/abs/2011.09351v1
- Date: Mon, 16 Nov 2020 07:20:02 GMT
- Title: Learning Regular Expressions for Interpretable Medical Text
Classification Using a Pool-based Simulated Annealing and Word-vector Models
- Authors: Chaofan Tu, Ruibin Bai, Zheng Lu, Uwe Aickelin, Peiming Ge, Jianshuang
Zhao
- Abstract summary: We propose a rule-based engine composed of high-quality, interpretable regular expressions for medical text classification.
The regular expressions are auto-generated by a constructive method and optimized using a Pool-based Simulated Annealing (PSA) approach.
- Score: 0.6807963587057013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a rule-based engine composed of high-quality,
interpretable regular expressions for medical text classification. The regular
expressions are auto-generated by a constructive heuristic method and optimized
using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep
Neural Network (DNN) methods achieve high performance in most Natural Language
Processing (NLP) applications, their solutions are regarded as black boxes that
are uninterpretable to humans. Rule-based methods are therefore often
introduced when interpretable solutions are needed, especially in the medical
field. However, constructing regular expressions by hand can be extremely
labor-intensive for large data sets. This research aims to reduce that manual
effort while maintaining high-quality solutions.
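As a rough illustration of the approach described above, the sketch below runs a pool-based simulated annealing loop over candidate regular expressions. The fitness function, the mutation operator, and the vocabulary of related words (a stand-in for the paper's word-vector component) are illustrative assumptions, not the authors' implementation.

```python
import math
import random
import re

def fitness(pattern, positives, negatives):
    # F1-style score: reward matching positive texts, penalize negatives.
    # (Illustrative objective; the paper's actual fitness is not shown here.)
    try:
        rx = re.compile(pattern)
    except re.error:
        return 0.0  # malformed candidates score zero
    tp = sum(1 for t in positives if rx.search(t))
    fp = sum(1 for t in negatives if rx.search(t))
    fn = len(positives) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def mutate(pattern, vocab):
    # Toy neighbourhood move: swap in a related word, or loosen the pattern.
    # A word-vector model could supply `vocab` as nearest-neighbour words.
    if vocab and random.random() < 0.5:
        return pattern.replace(random.choice(vocab), random.choice(vocab), 1)
    return pattern + r"\w*"

def pool_sa(seeds, positives, negatives, vocab,
            t0=1.0, cooling=0.95, steps=200):
    # Maintain a pool of candidate regexes; accept worse neighbours with
    # probability exp(delta / T) while the temperature T cools.
    pool = {p: fitness(p, positives, negatives) for p in seeds}
    t = t0
    for _ in range(steps):
        current = random.choice(list(pool))
        neighbour = mutate(current, vocab)
        delta = fitness(neighbour, positives, negatives) - pool[current]
        if delta > 0 or random.random() < math.exp(delta / t):
            pool[neighbour] = pool[current] + delta
        t *= cooling
    return max(pool, key=pool.get)
```

Keeping every accepted candidate in the pool means good intermediate patterns are never lost, which is one plausible reading of the pool-based idea; a call such as `pool_sa([r"fever|headache"], positives, negatives, vocab=["fever", "cough"])` would return the best pattern found (all names hypothetical).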
Related papers
- Bi-Encoders based Species Normalization -- Pairwise Sentence Learning to
Rank [0.0]
We present a novel deep learning approach for named entity normalization, treating it as a pairwise learning-to-rank problem.
We conduct experiments on species entity types and evaluate our method against state-of-the-art techniques.
arXiv Detail & Related papers (2023-10-22T17:30:16Z) - Toward Unified Controllable Text Generation via Regular Expression
Instruction [56.68753672187368]
Our paper introduces Regular Expression Instruction (REI), which utilizes an instruction-based mechanism to fully exploit regular expressions' advantages to uniformly model diverse constraints.
Our method only requires fine-tuning on medium-scale language models or few-shot, in-context learning on large language models, and requires no further adjustment when applied to various constraint combinations.
arXiv Detail & Related papers (2023-09-19T09:05:14Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
However, these models often require substantial amounts of medical text data and generalize poorly.
arXiv Detail & Related papers (2023-04-17T17:13:42Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
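As a loose illustration of the noise-stability idea (a conceptual reading, not the authors' exact objective), the sketch below perturbs the input of a tiny feed-forward stack with Gaussian noise and measures how far each layer's hidden representation moves; that distance can be added to the fine-tuning loss as a stability penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(layers, x):
    # Tiny feed-forward stack; returns every layer's hidden representation.
    hidden, h = [], x
    for w in layers:
        h = np.tanh(h @ w)
        hidden.append(h)
    return hidden

def noise_stability_penalty(layers, x, sigma=0.1):
    # Perturb the input with standard Gaussian noise (scaled by sigma) and
    # penalize the layer-wise drift of the hidden representations.
    clean = forward(layers, x)
    noisy = forward(layers, x + sigma * rng.standard_normal(x.shape))
    return sum(np.mean((c - n) ** 2) for c, n in zip(clean, noisy))

# Toy usage with two random 8-unit layers and a batch of 4 inputs.
layers = [0.5 * rng.standard_normal((8, 8)) for _ in range(2)]
x = rng.standard_normal((4, 8))
print(noise_stability_penalty(layers, x))
```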
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - Regularization-based Pruning of Irrelevant Weights in Deep Neural
Architectures [0.0]
We propose a method for learning sparse neural topologies via a regularization technique that identifies non-relevant weights and selectively shrinks their norm.
We test the proposed technique on several image classification and natural language generation tasks, obtaining results on par with or better than competitors in terms of sparsity and task metrics.
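A minimal sketch of the selective-shrinkage step, assuming a magnitude-based relevance criterion (the paper's own criterion may differ):

```python
import numpy as np

def shrink_irrelevant(weights, threshold=1e-2, decay=0.9):
    # One regularization-style pruning step: weights whose magnitude is
    # small relative to the layer's largest weight are treated as
    # non-relevant and multiplicatively shrunk toward zero.
    # (The relevance test here is an assumption, not the paper's.)
    pruned = []
    for w in weights:
        relevance = np.abs(w) / (np.abs(w).max() + 1e-12)
        w = w.copy()
        w[relevance < threshold] *= decay  # selective norm shrinkage
        pruned.append(w)
    return pruned
```

Repeating such a step during training drives the masked weights toward zero, after which they can be removed to obtain a sparse topology.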
arXiv Detail & Related papers (2022-04-11T09:44:16Z) - SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z) - Lexically-constrained Text Generation through Commonsense Knowledge
Extraction and Injection [62.071938098215085]
We focus on the CommonGen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z) - Data-Driven Regular Expressions Evolution for Medical Text
Classification Using Genetic Programming [0.0]
This study proposes a novel regular expression-based text classification method making use of genetic programming (GP) approaches to evolve regular expressions.
Our method is evaluated with real-life medical text inquiries from an online healthcare provider and shows promising performance.
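One way such a GP loop over regexes might look, with toy crossover on alternation branches and a hit-rate fitness (both illustrative assumptions, not the study's implementation):

```python
import random
import re

def fitness(pattern, positives, negatives):
    # Hit rate on positives minus false-alarm rate on negatives
    # (a stand-in for the study's actual objective).
    try:
        rx = re.compile(pattern)
    except re.error:
        return -1.0
    hits = sum(1 for t in positives if rx.search(t)) / len(positives)
    alarms = sum(1 for t in negatives if rx.search(t)) / len(negatives)
    return hits - alarms

def crossover(a, b):
    # Toy recombination: splice alternation branches of two parents.
    return a.split("|")[0] + "|" + b.split("|")[-1]

def evolve(seeds, positives, negatives, generations=50, pop_size=20):
    population = list(seeds)
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, positives, negatives),
                        reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [crossover(random.choice(parents), random.choice(parents))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda p: fitness(p, positives, negatives))
```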
arXiv Detail & Related papers (2020-12-04T03:44:46Z) - Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z) - Revisiting Regex Generation for Modeling Industrial Applications by
Incorporating Byte Pair Encoder [14.42244606935982]
This work focuses on automatically generating regular expressions and proposes a novel genetic algorithm to deal with this problem.
We first utilize a byte pair encoder (BPE) to extract frequent items, which are then used to construct regular expressions.
With exponential decay applied, training is approximately 100 times faster than without it.
arXiv Detail & Related papers (2020-05-06T02:09:10Z)
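The frequent-item extraction step above can be pictured with a toy byte-pair-encoding pass like the one below (illustrative only; the merged items would then seed regex fragments):

```python
from collections import Counter

def bpe_frequent_items(corpus, merges=10):
    # Toy BPE pass: repeatedly merge the most frequent adjacent symbol
    # pair; the merged items are the "frequent items" that can seed
    # regular-expression fragments. (Illustrative, not the paper's code.)
    seqs = [list(text) for text in corpus]
    items = []
    for _ in range(merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        items.append(a + b)
        merged_seqs = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == (a, b):
                    out.append(a + b)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            merged_seqs.append(out)
        seqs = merged_seqs
    return items

print(bpe_frequent_items(["headache and fever", "fever then headache"]))
```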
This list is automatically generated from the titles and abstracts of the papers on this site.