Related papers: UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection

URL: http://arxiv.org/abs/2312.01225v1
Date: Sat, 2 Dec 2023 21:03:23 GMT
Title: UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection
Authors: Ruofan Hu, Dongyu Zhang, Dandan Tao, Huayi Zhang, Hao Feng, and Elke Rundensteiner
Abstract summary: We propose EGAL, a deep learning framework for foodborne illness detection. EGAL uses small expert-labeled tweets augmented by crowdsourced-labeled and massive unlabeled data. EGAL has the potential to be deployed for real-time analysis of tweet streaming, contributing to foodborne illness outbreak surveillance efforts.
Score: 8.934980946374367
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Foodborne illnesses significantly impact public health. Deep learning surveillance applications using social media data aim to detect early warning signals. However, labeling foodborne illness-related tweets for model training requires extensive human resources, making it challenging to collect a sufficient number of high-quality labels for tweets within a limited budget. The severe class imbalance resulting from the scarcity of foodborne illness-related tweets among the vast volume of social media further exacerbates the problem. Classifiers trained on a class-imbalanced dataset are biased towards the majority class, making accurate detection difficult. To overcome these challenges, we propose EGAL, a deep learning framework for foodborne illness detection that uses small expert-labeled tweets augmented by crowdsourced-labeled and massive unlabeled data. Specifically, by leveraging tweets labeled by experts as a reward set, EGAL learns to assign a weight of zero to incorrectly labeled tweets to mitigate their negative influence. Other tweets receive proportionate weights to counter-balance the unbalanced class distribution. Extensive experiments on real-world TWEET-FID data show that EGAL outperforms strong baseline models across different settings, including varying expert-labeled set sizes and class imbalance ratios. A case study on a multistate outbreak of Salmonella Typhimurium infection linked to packaged salad greens demonstrates how the trained model captures relevant tweets offering valuable outbreak insights. EGAL, funded by the U.S. Department of Agriculture (USDA), has the potential to be deployed for real-time analysis of tweet streaming, contributing to foodborne illness outbreak surveillance efforts.
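The reward-set reweighting described in the abstract can be illustrated with a small sketch. This is not EGAL's implementation. It swaps in a first-order version of the learning-to-reweight scheme of Ren et al. (2018) on a plain logistic-regression model. The helper names (`reweight`, `grad_logloss`) and the per-class normalization are assumptions made for illustration. Each crowdsourced-labeled example is scored by how well its loss gradient aligns with the mean gradient on the expert-labeled reward set. Examples whose gradients point against the reward set, typically mislabeled ones, are clipped to a weight of zero. The remaining weights are normalized so that each class receives the same total weight, one simple way to counter class imbalance.

```python
import math
from typing import List, Sequence, Tuple

# A labeled example: (feature vector, binary label in {0, 1}).
Example = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad_logloss(w: Sequence[float], x: Sequence[float], y: int) -> List[float]:
    """Gradient of binary cross-entropy w.r.t. w for logistic regression: (p - y) * x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def reweight(w: Sequence[float], noisy: List[Example], clean: List[Example]) -> List[float]:
    """Hypothetical helper: weight noisy examples using a clean (expert-labeled) reward set."""
    # Mean loss gradient over the expert-labeled reward set.
    g_clean = [0.0] * len(w)
    for x, y in clean:
        for j, gj in enumerate(grad_logloss(w, x, y)):
            g_clean[j] += gj / len(clean)

    # Raw score: alignment with the reward-set gradient; misaligned examples get 0.
    raw = [
        max(0.0, sum(a * b for a, b in zip(grad_logloss(w, x, y), g_clean)))
        for x, y in noisy
    ]

    # Assumed class balancing: each class present gets an equal share of the total
    # weight (a class whose raw scores are all zero keeps zero weight).
    classes = sorted({y for _, y in noisy})
    weights = [0.0] * len(noisy)
    for c in classes:
        idx = [i for i, (_, y) in enumerate(noisy) if y == c]
        total = sum(raw[i] for i in idx)
        if total > 0:
            for i in idx:
                weights[i] = raw[i] / (total * len(classes))
    return weights


# Toy 1-D features with a bias term [x, 1] and an untrained model (w = 0).
clean = [([2.0, 1.0], 1), ([-2.0, 1.0], 0)]
# The second noisy example is deliberately mislabeled (large x labeled 0).
noisy = [([3.0, 1.0], 1), ([3.0, 1.0], 0), ([-3.0, 1.0], 0), ([-1.0, 1.0], 0)]
weights = reweight([0.0, 0.0], noisy, clean)
print(weights)  # → [0.5, 0.0, 0.375, 0.125]
```

In a full training loop these weights would be recomputed at every step and plugged into a weighted loss. With a deep model, the gradient alignment is usually obtained through a differentiable one-step lookahead rather than by materializing per-example gradients as done here.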

Related papers

Epidemiology-informed Network for Robust Rumor Detection [59.89351792706995]
We propose a novel Epidemiology-informed Network (EIN) that integrates epidemiological knowledge to enhance performance. To adapt epidemiology theory to rumor detection, it is expected that each user's stance toward the source information will be annotated. Our experimental results demonstrate that the proposed EIN not only outperforms state-of-the-art methods on real-world datasets but also exhibits enhanced robustness across varying tree depths.
arXiv (2024-11-20T00:43:32Z)
Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks have been shown to be vulnerable to data poisoning attacks. Detecting poisoned samples in a mixed dataset is both beneficial and challenging. We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv (2024-08-15T13:26:13Z)
CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting. Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv (2023-10-23T07:01:09Z)
Named Entity Recognition for Monitoring Plant Health Threats in Tweets: a ChouBERT Approach [0.0]
ChouBERT is a pre-trained language model that can identify Tweets concerning observations of plant health issues with generalizability on unseen natural hazards. This paper tackles the lack of labeled data by further studying ChouBERT's know-how on token-level annotation tasks over small labeled sets.
arXiv (2023-10-19T06:54:55Z)
A Novel Site-Agnostic Multimodal Deep Learning Model to Identify Pro-Eating Disorder Content on Social Media [0.0]
This study aimed to create a multimodal deep learning model that can determine if a social media post promotes eating disorders. A labeled dataset of Tweets was collected from Twitter, recently rebranded as X, upon which twelve deep learning models were trained and evaluated. The RoBERTa and MaxViT fusion model, deployed to classify an unlabeled dataset of posts from the social media sites Tumblr and Reddit, generated results akin to those of previous research studies.
arXiv (2023-07-06T16:04:46Z)
Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information. By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples. We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv (2023-06-06T14:45:24Z)
RevealED: Uncovering Pro-Eating Disorder Content on Twitter Using Deep Learning [0.0]
This study aimed to create a deep learning model capable of determining whether a social media post promotes eating disorders based solely on image data. Several deep-learning models were trained on the scraped dataset and were evaluated based on their accuracy, F1 score, precision, and recall. The model, which was applied to unlabeled Twitter image data scraped from "#selfie", uncovered seasonal fluctuations in the relative abundance of pro-eating disorder content.
arXiv (2022-12-28T16:50:49Z)
Attend Who is Weak: Pruning-assisted Medical Image Localization under Sophisticated and Implicit Imbalances [102.68466217374655]
Deep neural networks (DNNs) have rapidly become a de facto choice for medical image understanding tasks. In this paper, we propose to use pruning to automatically and adaptively identify hard-to-learn (HTL) training samples. We also present an interesting demographic analysis which illustrates the ability of HTL samples to capture complex demographic imbalances.
arXiv (2022-12-06T00:32:03Z)
TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks [14.523433519237607]
Foodborne illness is a serious but preventable public health problem. There is a dearth of labeled datasets for developing effective outbreak detection models. We present TWEET-FID, the first publicly available annotated dataset for foodborne illness incident detection tasks.
arXiv (2022-05-22T03:47:18Z)
Robust Deep Semi-Supervised Learning: A Brief Introduction [63.09703308309176]
Semi-supervised learning (SSL) aims to improve learning performance by leveraging unlabeled data when labels are insufficient. SSL with deep models has proven to be successful on standard benchmark tasks. However, these methods remain vulnerable to various robustness threats in real-world applications.
arXiv (2022-02-12T04:16:41Z)
Combining exogenous and endogenous signals with a semi-supervised co-attention network for early detection of COVID-19 fake tweets [14.771202995527315]
During COVID-19, tweets with misinformation should be flagged and neutralized in their early stages to mitigate the damage. Most existing methods for early detection of fake news assume that sufficient propagation information and large numbers of labeled tweets are available. We present ENDEMIC, a novel early detection model that leverages exogenous and endogenous signals related to tweets.
arXiv (2021-04-12T10:01:44Z)
Leveraging Multi-Source Weak Social Supervision for Early Detection of Fake News [67.53424807783414]
Social media has greatly enabled people to participate in online activities at an unprecedented rate. This unrestricted access also exacerbates the spread of misinformation and fake news online, which might cause confusion and chaos unless it is detected and mitigated early. We jointly leverage the limited amount of clean data along with weak signals from social engagements to train deep neural networks in a meta-learning framework to estimate the quality of different weak instances. Experiments on real-world datasets demonstrate that the proposed framework outperforms state-of-the-art baselines for early detection of fake news without using any user engagements at prediction time.
arXiv (2020-04-03T18:26:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.