Lived Experience Matters: Automatic Detection of Stigma on Social Media Toward People Who Use Substances
- URL: http://arxiv.org/abs/2302.02064v2
- Date: Sun, 16 Jul 2023 11:26:31 GMT
- Title: Lived Experience Matters: Automatic Detection of Stigma on Social Media Toward People Who Use Substances
- Authors: Salvatore Giorgi, Douglas Bellew, Daniel Roy Sadek Habib, Garrick
Sherman, Joao Sedoc, Chase Smitterberg, Amanda Devoto, McKenzie
Himelein-Wachowiak, and Brenda Curtis
- Abstract summary: Stigma toward people who use substances (PWUS) is a leading barrier to seeking treatment.
This paper explores stigma toward PWUS using a data set of roughly 5,000 public Reddit posts.
- Score: 1.7378386177263254
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Stigma toward people who use substances (PWUS) is a leading barrier to
seeking treatment. Further, those in treatment are more likely to drop out if
they experience higher levels of stigmatization. While related concepts of hate
speech and toxicity, including those targeted toward vulnerable populations,
have been the focus of automatic content moderation research, stigma, and in
particular stigma toward people who use substances, has not. This paper explores
stigma toward PWUS using a data set of roughly 5,000 public Reddit posts. We
performed a crowd-sourced annotation task in which workers were asked to annotate
each post for the presence of stigma toward PWUS and answer a series of questions related
to their experiences with substance use. Results show that workers who use
substances or know someone with a substance use disorder are more likely to
rate a post as stigmatizing. Building on this, we use a supervised machine
learning framework that centers workers with lived substance use experience to
label each Reddit post as stigmatizing. Modeling person-level demographics in
addition to comment-level language results in a classification accuracy (as
measured by AUC) of 0.69 -- a 17% increase over modeling language alone.
Finally, we explore the linguistic cues which distinguish stigmatizing content:
both PWUS and those who do not use substances agree that language around othering
("people", "they") and terms like "addict" are stigmatizing, while PWUS (as
opposed to those who do not) find discussions around specific substances more
stigmatizing. Our findings offer insights into the nature of perceived stigma
in substance use. Additionally, these results further establish the subjective
nature of such machine learning tasks, highlighting the need for understanding
their social contexts.
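As a concrete illustration of the modeling setup the abstract describes (comment-level language features combined with person-level annotator demographics, scored by AUC), here is a minimal sketch. It assumes scikit-learn with TF-IDF text features, one-hot demographics, and a logistic regression classifier; the toy data, column names, and classifier choice are illustrative assumptions, not the paper's dataset or released code.

```python
# Minimal sketch: combine comment-level language features with
# person-level annotator demographics for stigma classification.
# Feature choices (TF-IDF, one-hot) and the classifier are
# illustrative assumptions, not the authors' actual setup.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the annotated Reddit posts plus annotator traits.
df = pd.DataFrame({
    "text": [
        "those addicts ruin every neighborhood they touch",
        "recovery is hard and people deserve support",
        "they should all be locked up",
        "naloxone access saves lives",
    ] * 25,
    "knows_pwus": ["yes", "no"] * 50,        # hypothetical column names
    "uses_substances": ["no", "yes"] * 50,
    "stigma": [1, 0, 1, 0] * 25,             # 1 = rated stigmatizing
})

features = ColumnTransformer([
    # Comment-level language: TF-IDF-weighted bag of words.
    ("language", TfidfVectorizer(), "text"),
    # Person-level demographics: one-hot encoded annotator attributes.
    ("person", OneHotEncoder(handle_unknown="ignore"),
     ["knows_pwus", "uses_substances"]),
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[["text", "knows_pwus", "uses_substances"]], df["stigma"],
    test_size=0.25, random_state=0, stratify=df["stigma"],
)
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
print(f"AUC: {roc_auc_score(y_test, probs):.2f}")
```

Dropping the "person" branch of the ColumnTransformer yields the language-only baseline; that is the comparison behind the abstract's reported 17% AUC improvement.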
Related papers
- Words Matter: Reducing Stigma in Online Conversations about Substance Use with Large Language Models [0.0]
Stigma is a barrier to treatment for individuals struggling with substance use disorders (SUD).
This study investigates how stigma manifests on social media, particularly Reddit, where anonymity can exacerbate discriminatory behaviors.
arXiv Detail & Related papers (2024-08-15T01:00:28Z)
- Decoding the Narratives: Analyzing Personal Drug Experiences Shared on Reddit [1.080878521069079]
This study aims to develop a multi-level, multi-label classification model to analyze online user-generated texts about substance use experiences.
Using various multi-label classification algorithms on a set of annotated data, we show that GPT-4, when prompted with instructions, definitions, and examples, outperformed all other models.
arXiv Detail & Related papers (2024-06-17T21:56:57Z)
- Reddit-Impacts: A Named Entity Recognition Dataset for Analyzing Clinical and Social Effects of Substance Use Derived from Social Media [6.138126219622993]
Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research.
Social media are unique and important sources of information about SUDs, particularly since the data in such sources are often generated by people with lived experiences.
In this paper, we introduce Reddit-Impacts, a challenging Named Entity Recognition (NER) dataset curated from subreddits dedicated to discussions on prescription and illicit opioids, as well as medications for opioid use disorder.
The dataset specifically concentrates on the lesser-studied, yet critically important, aspects of substance use--its clinical and social impacts.
arXiv Detail & Related papers (2024-05-09T23:43:57Z)
- Comparing Hallucination Detection Metrics for Multilingual Generation [62.97224994631494]
This paper assesses how well various factual hallucination detection metrics identify hallucinations in generated biographical summaries across languages.
We compare how well automatic metrics correlate to each other and whether they agree with human judgments of factuality.
Our analysis reveals that while the lexical metrics are ineffective, NLI-based metrics perform well, correlating with human annotations in many settings and often outperforming supervised models.
arXiv Detail & Related papers (2024-02-16T08:10:34Z)
- Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts [26.161892748901252]
We present a corpus of 2500 opioid-related posts from various subreddits labeled with six different phases of opioid use.
For every post, we annotate span-level explanations and crucially study their role both in annotation quality and model development.
arXiv Detail & Related papers (2023-11-15T16:05:55Z)
- Bias Against 93 Stigmatized Groups in Masked Language Models and Downstream Sentiment Classification Tasks [2.5690340428649323]
This study extends the focus of bias evaluation in extant work by examining bias against social stigmas on a large scale.
It focuses on 93 stigmatized groups in the United States, including a wide range of conditions related to disease, disability, drug use, mental illness, religion, sexuality, socioeconomic status, and other relevant factors.
We investigate bias against these groups in English pre-trained Masked Language Models (MLMs) and their downstream sentiment classification tasks.
arXiv Detail & Related papers (2023-06-08T20:46:09Z)
- Goal Driven Discovery of Distributional Differences via Language Descriptions [58.764821647036946]
Mining large corpora can generate useful discoveries but is time-consuming for humans.
We formulate a new task, D5, that automatically discovers differences between two large corpora in a goal-driven way.
Our system produces discoveries previously unknown to the authors on a wide range of applications in OpenD5.
arXiv Detail & Related papers (2023-02-28T01:32:32Z)
- Semantic Similarity Models for Depression Severity Estimation [53.72188878602294]
This paper presents an efficient semantic pipeline to study depression severity in individuals based on their social media writings.
We use test user sentences for producing semantic rankings over an index of representative training sentences corresponding to depressive symptoms and severity levels.
We evaluate our methods on two Reddit-based benchmarks, achieving a 30% improvement over the state of the art in measuring depression severity.
arXiv Detail & Related papers (2022-11-14T18:47:26Z)
- Toxicity Detection can be Sensitive to the Conversational Context [64.28043776806213]
We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
arXiv Detail & Related papers (2021-11-19T13:57:26Z)
- Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
arXiv Detail & Related papers (2021-11-15T18:58:20Z)
- Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical markers (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.