Suicide Risk Assessment on Social Media with Semi-Supervised Learning
- URL: http://arxiv.org/abs/2411.12767v1
- Date: Mon, 18 Nov 2024 02:43:05 GMT
- Title: Suicide Risk Assessment on Social Media with Semi-Supervised Learning
- Authors: Max Lovitt, Haotian Ma, Song Wang, Yifan Peng
- Abstract summary: We propose a semi-supervised framework that leverages labeled and unlabeled data.
We manually verify a subset of the pseudo-labeled data that was not predicted unanimously across multiple trials of pseudo-label generation.
By leveraging partially validated pseudo-labeled data in addition to ground-truth labeled data, we substantially improve our model's ability to assess suicide risk from social media posts.
- Score: 20.193174124912282
- License:
- Abstract: With social media communities increasingly becoming places where suicidal individuals post and congregate, natural language processing presents an exciting avenue for the development of automated suicide risk assessment systems. However, past efforts suffer from a lack of labeled data and class imbalances within the available labeled data. To accommodate this task's imperfect data landscape, we propose a semi-supervised framework that leverages labeled (n=500) and unlabeled (n=1,500) data and extends the self-training algorithm with a novel pseudo-label acquisition process designed to handle imbalanced datasets. To further ensure pseudo-label quality, we manually verify a subset of the pseudo-labeled data that was not predicted unanimously across multiple trials of pseudo-label generation. We test various models as the backbone for this framework and find that RoBERTa performs best. By leveraging partially validated pseudo-labeled data in addition to ground-truth labeled data, we substantially improve our model's ability to assess suicide risk from social media posts.
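The abstract does not spell out how the pseudo-label acquisition process works, so the Python below is only a minimal sketch under stated assumptions, not the authors' method. It assumes that acquisition means taking up to a fixed per-class quota of the most confident predictions (so that minority risk levels are not crowded out by the majority class), and that "not predicted unanimously" means an example whose pseudo-label differs, or is missing, in at least one trial. The function names `balanced_pseudo_labels` and `needs_manual_review` and the `per_class` parameter are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def balanced_pseudo_labels(
    probs: Sequence[Sequence[float]], per_class: int
) -> Dict[int, int]:
    """Acquire up to `per_class` most-confident pseudo-labels for each class.

    probs[i][c] is the model's probability that unlabeled example i has class c.
    Returns a mapping {example_index: pseudo_label}. (Hypothetical helper.)
    """
    if not probs:
        return {}
    num_classes = len(probs[0])
    # Bucket (confidence, index) pairs by the predicted (argmax) class.
    buckets: Dict[int, List[Tuple[float, int]]] = {c: [] for c in range(num_classes)}
    for i, p in enumerate(probs):
        label = max(range(num_classes), key=lambda c: p[c])
        buckets[label].append((p[label], i))
    selected: Dict[int, int] = {}
    for label, items in buckets.items():
        # Most confident first; the per-class quota stops majority classes
        # from filling the whole pseudo-labeled set.
        items.sort(reverse=True)
        for _, i in items[:per_class]:
            selected[i] = label
    return selected


def needs_manual_review(trials: Sequence[Dict[int, int]]) -> List[int]:
    """Return indices whose pseudo-label is not unanimous across trials.

    trials[t] is the {example_index: pseudo_label} output of trial t
    (e.g. a different random seed). An example acquired in some trials but
    not in others is also treated as non-unanimous (an assumption).
    """
    candidates = set().union(*trials)  # all example indices seen in any trial
    return sorted(
        i for i in candidates
        if len({trial.get(i) for trial in trials}) > 1
    )
```

For example, with `probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]` and `per_class=1`, only example 0 (class 0) and example 2 (class 1) are acquired, even though class 0 is predicted three times. In the spirit of the abstract's verification step, the indices returned by `needs_manual_review` would be the ones a human checks before they are used for training.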
Related papers
- Fair-OBNC: Correcting Label Noise for Fairer Datasets [9.427445881721814]
Biases in the training data are sometimes related to label noise.
Models trained on such biased data may perpetuate or even aggravate the biases with respect to sensitive information.
We propose Fair-OBNC, a label noise correction method with fairness considerations.
Date: 2024-10-08T17:18:18Z
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
Date: 2023-07-17T08:31:59Z
- Label Matching Semi-Supervised Object Detection [85.99282969977541]
Semi-supervised object detection has made significant progress with the development of mean-teacher-driven self-training.
Label mismatch, a problem not yet fully explored in previous work, leads to severe confirmation bias during self-training.
We propose a simple yet effective LabelMatch framework from two different yet complementary perspectives.
Date: 2022-06-14T05:59:41Z
- Construction of Large-Scale Misinformation Labeled Datasets from Social Media Discourse using Label Refinement [16.754951815543006]
We propose to leverage news-source credibility labels as weak labels for social media posts.
The framework incorporates the social context of each post, in terms of the community of its associated user, to surface inaccurate labels.
The approach is demonstrated by building a large-scale misinformation dataset on COVID-19 vaccines.
Date: 2022-02-24T23:10:29Z
- Debiased Learning from Naturally Imbalanced Pseudo-Labels for Zero-Shot and Semi-Supervised Learning [27.770473405635585]
This work studies the bias issue of pseudo-labeling, a natural phenomenon that occurs widely but is often overlooked by prior research.
We observe heavily long-tailed pseudo-labels when the semi-supervised learning model FixMatch predicts labels on the unlabeled set, even though the unlabeled data is curated to be balanced.
Without intervention, the training model inherits the bias from the pseudo-labels and ends up being sub-optimal.
Date: 2022-01-05T07:40:24Z
- Uncertainty-aware Mean Teacher for Source-free Unsupervised Domain Adaptive 3D Object Detection [6.345037597566315]
Pseudo-label-based self-training approaches are a popular method for source-free unsupervised domain adaptation.
We propose an uncertainty-aware mean teacher framework which implicitly filters incorrect pseudo-labels during training.
Date: 2021-09-29T18:17:09Z
- Boosting Semi-Supervised Face Recognition with Noise Robustness [54.342992887966616]
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise arising from auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which builds on the robust training ability provided by GN.
Date: 2021-05-10T14:43:11Z
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its semi-supervised learning (SSL) and transfer learning (TL) counterparts on five tasks by large margins.
Date: 2021-02-25T14:56:19Z
- Self-Supervised Noisy Label Learning for Source-Free Unsupervised Domain Adaptation [87.60688582088194]
We propose a novel Self-Supervised Noisy Label Learning method.
Our method can easily achieve state-of-the-art results and surpass other methods by a very large margin.
Date: 2021-02-23T10:51:45Z
- Semi-supervised Relation Extraction via Incremental Meta Self-Training [56.633441255756075]
Semi-Supervised Relation Extraction methods aim to leverage unlabeled data in addition to learning from limited samples.
Existing self-training methods suffer from the gradual drift problem, in which noisy pseudo-labels on unlabeled data are incorporated during training.
We propose a method called MetaSRE, in which a Relation Label Generation Network assesses the quality of pseudo-labels by (meta-)learning from the successful and failed attempts of the Relation Classification Network as an additional meta-objective.
Date: 2020-10-06T03:54:11Z
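The entry "Debiased Learning from Naturally Imbalanced Pseudo-Labels for Zero-Shot and Semi-Supervised Learning" reports that pseudo-labels become long-tailed even on balanced unlabeled data, which is the same imbalance concern that motivates the main paper. One common countermeasure is logit adjustment: keep a running estimate of the predicted class distribution and subtract a scaled log of it from the logits before taking the argmax. The Python below is a minimal sketch of that idea only; the class name `DebiasedPseudoLabeler`, the EMA `momentum`, and the `strength` factor are hypothetical choices, and it is not claimed to reproduce that paper's exact method.

```python
import math
from typing import List, Sequence


class DebiasedPseudoLabeler:
    """Pseudo-labeling with logit adjustment against a running class prior (sketch)."""

    def __init__(self, num_classes: int, momentum: float = 0.999, strength: float = 1.0):
        self.momentum = momentum  # EMA decay for the prior (hypothetical default)
        self.strength = strength  # how strongly to subtract log(prior)
        self.prior = [1.0 / num_classes] * num_classes  # start from a uniform prior

    @staticmethod
    def softmax(logits: Sequence[float]) -> List[float]:
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def update_prior(self, batch_logits: Sequence[Sequence[float]]) -> None:
        """Fold the batch's mean predicted distribution into the prior via EMA."""
        if not batch_logits:
            return
        n = len(batch_logits)
        mean = [0.0] * len(self.prior)
        for logits in batch_logits:
            for c, p in enumerate(self.softmax(logits)):
                mean[c] += p / n
        self.prior = [
            self.momentum * old + (1.0 - self.momentum) * new
            for old, new in zip(self.prior, mean)
        ]

    def pseudo_label(self, logits: Sequence[float]) -> int:
        """Argmax of the logits after subtracting strength * log(prior)."""
        adjusted = [z - self.strength * math.log(p) for z, p in zip(logits, self.prior)]
        return max(range(len(adjusted)), key=adjusted.__getitem__)
```

With `momentum=0.0` and one batch dominated by class 0, the logits `[0.5, 0.0]` receive pseudo-label 1 rather than 0, because the adjustment penalizes the over-predicted class. Setting `strength=0.0` recovers ordinary argmax pseudo-labeling.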
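Two entries, "Label Matching Semi-Supervised Object Detection" and "Uncertainty-aware Mean Teacher for Source-free Unsupervised Domain Adaptive 3D Object Detection", refer to the mean teacher, in which the teacher's weights are an exponential moving average (EMA) of the student's and the teacher produces pseudo-labels. The sketch below shows the EMA update on a toy dictionary of scalar parameters and, as a plainly swapped-in simplification rather than either paper's uncertainty-aware mechanism, keeps only pseudo-labels whose teacher confidence clears a fixed threshold. The names `ema_update` and `filter_pseudo_labels` and the `decay` and `threshold` values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ema_update(
    teacher: Dict[str, float], student: Dict[str, float], decay: float = 0.99
) -> Dict[str, float]:
    """Mean-teacher step: teacher <- decay * teacher + (1 - decay) * student."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name] for name in teacher}


def filter_pseudo_labels(
    teacher_probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Keep (index, label) pairs whose top teacher probability is >= threshold.

    A fixed confidence threshold stands in here for uncertainty-aware filtering.
    """
    kept: List[Tuple[int, int]] = []
    for i, p in enumerate(teacher_probs):
        label = max(range(len(p)), key=p.__getitem__)
        if p[label] >= threshold:
            kept.append((i, label))
    return kept
```

A high `decay` makes the teacher change slowly, which tends to make its pseudo-labels more stable than the student's raw predictions; the threshold trades pseudo-label coverage for precision.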