Evaluating the Robustness of Adverse Drug Event Classification Models Using Templates
- URL: http://arxiv.org/abs/2407.02432v1
- Date: Tue, 2 Jul 2024 17:09:24 GMT
- Title: Evaluating the Robustness of Adverse Drug Event Classification Models Using Templates
- Authors: Dorothea MacPhail, David Harbecke, Lisa Raithel, Sebastian Möller
- Abstract summary: An adverse drug effect (ADE) is any harmful event resulting from medical drug treatment.
Despite their importance, ADEs are often under-reported in official channels.
Some research has turned to detecting discussions of ADEs in social media.
- Score: 11.276505487445782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An adverse drug effect (ADE) is any harmful event resulting from medical drug treatment. Despite their importance, ADEs are often under-reported in official channels. Some research has therefore turned to detecting discussions of ADEs in social media. Impressive results have been achieved in various attempts to detect ADEs. In a high-stakes domain such as medicine, however, an in-depth evaluation of a model's abilities is crucial. We address the issue of thorough performance evaluation in English-language ADE detection with hand-crafted templates for four capabilities: Temporal order, negation, sentiment, and beneficial effect. We find that models with similar performance on held-out test sets have varying results on these capabilities.
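A minimal sketch of how such capability templates can be instantiated and scored per capability is given below; the templates, slot fillers, and the predict() stub are illustrative placeholders, not the templates released with the paper.

```python
# Sketch of template-based capability testing for an ADE classifier.
# Templates, slot fillers, and the predict() stub are illustrative; the
# paper's actual templates differ.
from itertools import product

# Each template pairs a text pattern with the label a robust model should
# predict once the slots are filled in (1 = ADE present, 0 = no ADE).
TEMPLATES = {
    "negation": [
        ("I took {drug} and did not get {effect}.", 0),
        ("After taking {drug} I developed {effect}.", 1),
    ],
    "temporal_order": [
        ("I had {effect} before I ever started {drug}.", 0),
        ("Soon after starting {drug}, {effect} set in.", 1),
    ],
    "sentiment": [
        ("I hate {drug}, but it never gave me {effect}.", 0),
    ],
    "beneficial_effect": [
        ("{drug} finally relieved my {effect}.", 0),
    ],
}

DRUGS = ["ibuprofen", "metformin"]
EFFECTS = ["headaches", "nausea"]


def build_cases(templates=TEMPLATES, drugs=DRUGS, effects=EFFECTS):
    """Instantiate every template with every drug/effect combination."""
    cases = []
    for capability, pairs in templates.items():
        for (pattern, label), drug, effect in product(pairs, drugs, effects):
            cases.append({
                "capability": capability,
                "text": pattern.format(drug=drug, effect=effect),
                "label": label,
            })
    return cases


def evaluate(predict, cases):
    """Report per-capability accuracy for any predict(text) -> 0/1 function."""
    results = {}
    for case in cases:
        hit, total = results.get(case["capability"], (0, 0))
        correct = int(predict(case["text"]) == case["label"])
        results[case["capability"]] = (hit + correct, total + 1)
    return {cap: hit / total for cap, (hit, total) in results.items()}
```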
Related papers
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an averaged AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z) - Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech [60.08015780474457]
Alzheimer's Disease (AD) detection has emerged as a promising research area that employs machine learning classification models.
We identify within-class variation as a critical challenge in AD detection: individuals with AD exhibit a spectrum of cognitive impairments.
We propose two novel methods: Soft Target Distillation (SoTD) and Instance-level Re-balancing (InRe), targeting two problems respectively.
arXiv Detail & Related papers (2024-09-22T02:06:05Z) - An Evaluation Benchmark for Adverse Drug Event Prediction from Clinical Trial Results [0.10051474951635876]
Adverse drug events (ADEs) are a major safety issue in clinical trials.
We introduce CT-ADE, a dataset for multilabel ADE prediction in monopharmacy treatments.
arXiv Detail & Related papers (2024-04-19T12:04:32Z) - Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR).
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
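Test-time temperature scaling itself is straightforward to fold into an evaluation loop; a minimal sketch, assuming NumPy arrays of held-out logits and labels (the general calibration recipe, not the paper's exact protocol), is:

```python
# Fit a single temperature T on held-out logits so that softmax(logits / T)
# is calibrated, then reuse T when scoring perturbed/adversarial inputs.
import numpy as np
from scipy.optimize import minimize_scalar


def nll(temperature, logits, labels):
    """Negative log-likelihood of the labels under temperature-scaled softmax."""
    scaled = logits / temperature
    scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def fit_temperature(logits, labels):
    """Find the temperature minimising NLL on a calibration split."""
    result = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded",
                             args=(logits, labels))
    return result.x


# Usage: calibrate on clean validation logits, then divide the logits produced
# on adversarial inputs by the same temperature before thresholding.
# T = fit_temperature(val_logits, val_labels)
```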
arXiv Detail & Related papers (2024-02-27T13:49:12Z) - Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z) - Increasing Adverse Drug Events extraction robustness on social media: case study on negation and speculation [7.052238842788185]
In the last decade, an increasing number of users have started reporting Adverse Drug Events (ADE) on social media platforms.
This paper takes into consideration four state-of-the-art systems for ADE detection on social media texts.
We introduce SNAX, a benchmark to test their performance against samples containing negated and speculated ADEs.
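As a rough illustration of the idea behind such a benchmark, the sketch below rewrites ADE-positive sentences into negated and speculated variants and measures how often a classifier's predictions flip; the rewrite rules are toy examples, not SNAX's actual construction procedure.

```python
# Toy probe: perturb ADE-positive sentences with negation or speculation and
# measure prediction flips. The rules are illustrative, not how SNAX was built.
NEGATION_RULES = [("gave me", "did not give me"), ("caused", "did not cause")]


def negate(sentence):
    """Apply the first matching negation rule, or return None if none applies."""
    for old, new in NEGATION_RULES:
        if old in sentence:
            return sentence.replace(old, new, 1)
    return None


def speculate(sentence):
    """Prefix the sentence with a speculation cue."""
    return "I wonder whether " + sentence[0].lower() + sentence[1:]


def flip_rate(predict, positives, perturb):
    """Fraction of perturbed positives the model no longer labels as ADE."""
    flipped, total = 0, 0
    for sentence in positives:
        variant = perturb(sentence)
        if variant is None:
            continue
        total += 1
        flipped += int(predict(variant) == 0)
    return flipped / max(total, 1)


# Usage: flip_rate(model.predict, ade_positive_sentences, negate)
#        flip_rate(model.predict, ade_positive_sentences, speculate)
```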
arXiv Detail & Related papers (2022-09-06T20:38:42Z) - Assessment of contextualised representations in detecting outcome phrases in clinical trials [14.584741378279316]
We introduce "EBM-COMET", a dataset in which 300 PubMed abstracts are expertly annotated for clinical outcomes.
To extract outcomes, we fine-tune a variety of pre-trained contextualized representations.
We observe that our best model (BioBERT) achieves 81.5% F1, 81.3% sensitivity, and 98.0% specificity.
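A minimal sketch of such a fine-tuning setup, framed as BIO token classification with Hugging Face transformers (the checkpoint name and label set are assumptions, not fixed by the paper):

```python
# Fine-tuning a biomedical encoder for outcome-phrase extraction as BIO token
# classification; the checkpoint and label set are illustrative choices.
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-OUTCOME", "I-OUTCOME"]
checkpoint = "dmis-lab/biobert-base-cased-v1.1"  # assumed BioBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(labels))

# Tokenize one abstract sentence and run a forward pass; in practice the model
# would be fine-tuned on the full annotated corpus before evaluation.
inputs = tokenizer("Patients reported reduced pain scores after treatment.",
                   return_tensors="pt")
outputs = model(**inputs)
predicted_label_ids = outputs.logits.argmax(dim=-1)
```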
arXiv Detail & Related papers (2022-02-13T15:08:00Z) - Explaining medical AI performance disparities across sites with confounder Shapley value analysis [8.785345834486057]
Multi-site evaluations are key to diagnosing such disparities.
Our framework provides a method for quantifying the marginal and cumulative effect of each type of bias on the overall performance difference.
We demonstrate its usefulness in a case study of a deep learning model trained to detect the presence of pneumothorax.
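The underlying Shapley decomposition can be sketched as follows: each confounder's contribution to the cross-site performance gap is its average marginal reduction of that gap over all orderings in which confounders are adjusted for. Here gap_when_adjusted is a hypothetical placeholder for however the remaining gap is estimated after adjusting for a subset of confounders; the paper's exact estimation procedure is not reproduced.

```python
# Shapley-style attribution of a cross-site performance gap to confounders.
from itertools import permutations


def shapley_contributions(confounders, gap_when_adjusted):
    """Average marginal reduction in the gap attributable to each confounder."""
    contributions = {c: 0.0 for c in confounders}
    orderings = list(permutations(confounders))
    for order in orderings:
        adjusted = set()
        for c in order:
            before = gap_when_adjusted(frozenset(adjusted))
            adjusted.add(c)
            after = gap_when_adjusted(frozenset(adjusted))
            contributions[c] += (before - after) / len(orderings)
    return contributions


# Toy example with three assumed confounders and a made-up gap function.
toy_gap = {frozenset(): 0.10,
           frozenset({"scanner"}): 0.07, frozenset({"age"}): 0.08,
           frozenset({"prevalence"}): 0.06,
           frozenset({"scanner", "age"}): 0.05,
           frozenset({"scanner", "prevalence"}): 0.04,
           frozenset({"age", "prevalence"}): 0.05,
           frozenset({"scanner", "age", "prevalence"}): 0.02}
print(shapley_contributions(["scanner", "age", "prevalence"],
                            lambda s: toy_gap[s]))
```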
arXiv Detail & Related papers (2021-11-12T18:54:10Z) - NADE: A Benchmark for Robust Adverse Drug Events Extraction in Face of Negations [8.380439657099906]
Adverse Drug Event (ADE) extraction models can rapidly examine large collections of social media texts, detecting mentions of drug-related adverse reactions and triggering medical investigations.
Despite the recent advances in NLP, it is currently unknown whether such models are robust in the face of negation, which is pervasive across language varieties.
In this paper we evaluate three state-of-the-art systems, showing their fragility against negation, and then we introduce two possible strategies to increase the robustness of these models.
arXiv Detail & Related papers (2021-09-21T10:33:29Z) - Learn what you can't learn: Regularized Ensembles for Transductive Out-of-distribution Detection [76.39067237772286]
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios.
This paper studies how such "hard" OOD scenarios can benefit from adjusting the detection method after observing a batch of the test data.
We propose a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch.
arXiv Detail & Related papers (2020-12-10T16:55:13Z) - How Robust are the Estimated Effects of Nonpharmaceutical Interventions
against COVID-19? [46.28845358816497]
We investigate 2 state-of-the-art NPI effectiveness models and propose 6 variants that make different structural assumptions.
We investigate how well NPI effectiveness estimates generalise to unseen countries, and their sensitivity to unobserved factors.
We mathematically ground the interpretation of NPI effectiveness estimates when certain common assumptions do not hold.
arXiv Detail & Related papers (2020-07-27T11:49:54Z) - On Adversarial Examples for Biomedical NLP Tasks [4.7677261488999205]
We propose an adversarial evaluation scheme on two well-known datasets for medical NER and STS.
We show that we can significantly improve the robustness of the models by training them with adversarial examples.
arXiv Detail & Related papers (2020-04-23T13:46:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.