Noise Pollution in Hospital Readmission Prediction: Long Document
Classification with Reinforcement Learning
- URL: http://arxiv.org/abs/2005.01259v2
- Date: Sat, 23 May 2020 04:36:36 GMT
- Title: Noise Pollution in Hospital Readmission Prediction: Long Document
Classification with Reinforcement Learning
- Authors: Liyan Xu, Julien Hogan, Rachel E. Patzer and Jinho D. Choi
- Abstract summary: This paper presents a reinforcement learning approach to extract noise in long clinical documents for the task of readmission prediction after kidney transplant.
We first experiment with four types of encoders to empirically decide the best document representation, and then apply reinforcement learning to remove noisy text from the long documents.
- Score: 15.476161876559074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a reinforcement learning approach to extracting noise from long clinical documents for the task of readmission prediction after kidney transplant. We face the challenge of developing robust models on a small dataset where each document may consist of over 10K tokens full of noise, including tabular text and task-irrelevant sentences. We first experiment with four types of encoders to empirically decide the best document representation, and then apply reinforcement learning to remove noisy text from the long documents, modeling the noise extraction process as a sequential decision problem. Our results show that the traditional bag-of-words encoder outperforms deep learning-based encoders on this task, and that reinforcement learning improves upon the baseline while pruning out 25% of the text segments. Our analysis shows that reinforcement learning identifies both typical noisy tokens and task-specific noisy text.
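To make the sequential-decision framing concrete, below is a minimal sketch of noise pruning trained with REINFORCE. The segment features, network sizes, and reward definition are illustrative assumptions for exposition, not the paper's actual configuration.

```python
# Minimal sketch: noise pruning as a sequential decision problem,
# trained with REINFORCE. All names, sizes, and the reward definition
# are illustrative assumptions, not the paper's actual setup.
import torch
import torch.nn as nn

class PrunePolicy(nn.Module):
    """Scores each text segment and emits a keep/drop probability."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, seg_feats: torch.Tensor) -> torch.Tensor:
        # seg_feats: (num_segments, feat_dim) -> keep probs (num_segments,)
        return self.net(seg_feats).squeeze(-1)

def reinforce_step(policy, seg_feats, reward_fn, optimizer):
    """One policy-gradient update.

    reward_fn(mask) is assumed to return a scalar task reward, e.g. the
    readmission classifier's validation score on the pruned document.
    """
    keep_probs = policy(seg_feats)
    dist = torch.distributions.Bernoulli(keep_probs)
    mask = dist.sample()          # 1 = keep segment, 0 = prune
    reward = reward_fn(mask)      # scalar, task-dependent
    # REINFORCE: maximize E[reward] => minimize -reward * log-prob
    loss = -(reward * dist.log_prob(mask).sum())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return mask, reward
```

In practice one would subtract a baseline from the reward to reduce gradient variance and regularize the expected prune rate (the paper reports pruning roughly 25% of text segments); those details are omitted here.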
Related papers
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
- NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification [7.464154519547575]
Existing research on learning with noisy labels predominantly focuses on synthetic noise patterns.
We constructed a benchmark dataset to better understand label noise in real-world text classification settings.
Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise.
arXiv Detail & Related papers (2024-07-09T06:18:40Z)
- Improving the Robustness of Summarization Systems with Dual Augmentation [68.53139002203118]
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input.
We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise.
We propose SummAttacker, an efficient approach to generating adversarial samples based on language models.
arXiv Detail & Related papers (2023-06-01T19:04:17Z)
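As a toy illustration of the word-level synonym-substitution perturbations used to probe summarizer robustness, the snippet below swaps words via a hand-built lookup table; SummAttacker itself selects substitutions with a language model, which this sketch does not attempt.

```python
# Illustrative only: a toy word-level synonym-substitution perturbation.
# The synonym table is a hypothetical stand-in for a language model.
import random

SYNONYMS = {
    "doctor": ["physician", "clinician"],
    "illness": ["disease", "condition"],
}

def perturb(text: str, rate: float = 0.15, seed: int = 0) -> str:
    rng = random.Random(seed)
    out = []
    for w in text.split():
        key = w.lower()
        if key in SYNONYMS and rng.random() < rate:
            out.append(rng.choice(SYNONYMS[key]))  # substitute a synonym
        else:
            out.append(w)                          # keep the original word
    return " ".join(out)

print(perturb("The doctor noted a chronic illness", rate=1.0))
```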
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- Coherence and Diversity through Noise: Self-Supervised Paraphrase Generation via Structure-Aware Denoising [5.682665111938764]
We propose SCANING, an unsupervised framework for paraphrasing via controlled noise injection.
We focus on the novel task of paraphrasing algebraic word problems, which has practical applications in online pedagogy.
We demonstrate SCANING considerably improves performance in terms of both semantic preservation and producing diverse paraphrases.
arXiv Detail & Related papers (2023-02-06T13:50:57Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a lightweight method for detecting and removing such noise from the input during model inference, without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval [103.85002875155551]
We propose a novel generalized distillation method, TeachText, for exploiting large-scale language pretraining.
We extend our method to video side modalities and show that we can effectively reduce the number of used modalities at test time.
Our approach advances the state of the art on several video retrieval benchmarks by a significant margin and adds no computational overhead at test time.
arXiv Detail & Related papers (2021-04-16T17:55:28Z)
- Towards Robustness to Label Noise in Text Classification via Noise Modeling [7.863638253070439]
Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures.
We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier.
arXiv Detail & Related papers (2021-01-27T05:41:57Z)
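A common way to realize such an auxiliary noise model, sketched below, is to compose the base classifier with a learned label-transition matrix, so that training on noisy labels fits p(noisy label | x). This generic parameterization is an assumption for illustration and may differ from the cited paper's exact formulation.

```python
# Hedged sketch of an auxiliary noise model: a learned label-transition
# matrix T composed with the base classifier, so the training objective
# fits p(noisy_y | x) = sum_y p(clean_y | x) * T[clean_y, noisy_y].
import torch
import torch.nn as nn

class NoisyWrapper(nn.Module):
    def __init__(self, base: nn.Module, num_classes: int):
        super().__init__()
        self.base = base
        # Unconstrained logits, softmaxed row-wise into a stochastic
        # matrix; initialized near the identity (little assumed noise).
        self.T_logits = nn.Parameter(torch.eye(num_classes) * 5.0)

    def forward(self, x):
        clean = self.base(x).softmax(-1)   # p(clean_y | x)
        T = self.T_logits.softmax(-1)      # each row sums to 1
        noisy = clean @ T                  # p(noisy_y | x)
        return noisy.clamp_min(1e-8).log() # log-probs for nn.NLLLoss
```

Training uses nn.NLLLoss against the noisy labels; at test time one reads predictions from the base classifier directly, bypassing the noise model.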
- Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
arXiv Detail & Related papers (2020-02-16T09:12:27Z)
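For reference, the gambler's loss augments a standard classifier with an extra "abstain" output whose probability, discounted by a payoff hyperparameter, is added to the probability of the correct class before taking the negative log. The sketch below follows that general form; the cited paper's exact variant may differ.

```python
# Hedged sketch of the gambler's loss: cross-entropy over m classes
# plus one abstain output. Abstaining contributes p_abstain / payoff,
# letting the model opt out of fitting examples with suspect labels.
import torch

def gamblers_loss(logits: torch.Tensor, targets: torch.Tensor,
                  payoff: float = 2.5) -> torch.Tensor:
    """logits: (batch, num_classes + 1); the last column scores abstention."""
    probs = logits.softmax(-1)
    correct_p = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    abstain_p = probs[:, -1]
    return -(correct_p + abstain_p / payoff).clamp_min(1e-8).log().mean()
```

The payoff must exceed 1 (otherwise abstaining always dominates); values closer to 1 make abstention cheaper, which is what encourages the model to "abstain" from learning on data points with noisy labels.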