Opioid Named Entity Recognition (ONER-2025) from Reddit
- URL: http://arxiv.org/abs/2504.00027v3
- Date: Wed, 30 Apr 2025 21:34:50 GMT
- Title: Opioid Named Entity Recognition (ONER-2025) from Reddit
- Authors: Grigori Sidorov, Muhammad Ahmad, Iqra Ameer, Muhammad Usman, Ildar Batyrshin
- Abstract summary: Social media platforms like Reddit provide vast amounts of unstructured data that offer insights into public perceptions, discussions, and experiences related to opioid use. This study leverages Natural Language Processing (NLP), specifically Opioid Named Entity Recognition (ONER-2025), to extract actionable information from these platforms. First, we created a unique, manually annotated dataset sourced from Reddit, where users share self-reported experiences of opioid use via different administration routes. Second, we detail our annotation process and guidelines while discussing the challenges of labeling the ONER-2025 dataset. Third, we analyze key linguistic challenges, including slang, ambiguity, fragmented sentences, and emotionally charged language, in opioid discussions.
- Score: 5.641312824886231
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The opioid overdose epidemic remains a critical public health crisis, particularly in the United States, leading to significant mortality and societal costs. Social media platforms like Reddit provide vast amounts of unstructured data that offer insights into public perceptions, discussions, and experiences related to opioid use. This study leverages Natural Language Processing (NLP), specifically Opioid Named Entity Recognition (ONER-2025), to extract actionable information from these platforms. Our research makes four key contributions. First, we created a unique, manually annotated dataset sourced from Reddit, where users share self-reported experiences of opioid use via different administration routes. This dataset contains 331,285 tokens and includes eight major opioid entity categories. Second, we detail our annotation process and guidelines while discussing the challenges of labeling the ONER-2025 dataset. Third, we analyze key linguistic challenges, including slang, ambiguity, fragmented sentences, and emotionally charged language, in opioid discussions. Fourth, we propose a real-time monitoring system to process streaming data from social media, healthcare records, and emergency services to identify overdose events. Using 5-fold cross-validation in 11 experiments, our system integrates machine learning, deep learning, and transformer-based language models with advanced contextual embeddings to enhance understanding. Our transformer-based models (bert-base-NER and roberta-base) achieved 97% accuracy and F1-score, outperforming baselines by 10.23% (RF=0.88).
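The closing figure in the abstract (a 10.23% gain over a Random Forest baseline of 0.88) appears to be a relative improvement rather than a difference in percentage points. The short Python check below works through that arithmetic. Reading the number as a relative gain is an assumption on our part, and the helper name `relative_gain` is hypothetical, not something taken from the paper.

```python
def relative_gain(score: float, baseline: float) -> float:
    """Return the improvement of `score` over `baseline` as a percentage of the baseline."""
    return (score - baseline) / baseline * 100.0


# Scores as reported in the abstract (accuracy / F1 on a 0-1 scale).
transformer_score = 0.97  # bert-base-NER and roberta-base
rf_baseline = 0.88        # Random Forest baseline

gain = relative_gain(transformer_score, rf_baseline)
absolute_points = (transformer_score - rf_baseline) * 100.0

print(f"relative gain: {gain:.2f}%")
print(f"absolute gain: {absolute_points:.0f} percentage points")
```

Under this reading, the absolute gap between the two scores is 9 percentage points, and dividing that gap by the baseline of 0.88 yields the 10.23% quoted in the abstract.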
Related papers
- A Thematic Framework for Analyzing Large-scale Self-reported Social Media Data on Opioid Use Disorder Treatment Using Buprenorphine Product [1.4599176517017673]
Buprenorphine is one of the key FDA-approved medications for Opioid Use Disorder.
Despite its popularity, individuals often report various information needs regarding buprenorphine treatment on social media platforms like Reddit.
We propose a theme-based framework to curate and analyze large-scale data from social media to characterize self-reported treatment information needs.
arXiv Detail & Related papers (2024-10-02T15:04:21Z)
- A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis [48.84443450990355]
While deep networks have achieved broad success in analyzing natural images, they often fail in unexpected situations when applied to medical scans.
We investigate this challenge and focus on model sensitivity to domain shifts, such as data sampled from different hospitals or data confounded by demographic variables such as sex and race, in the context of chest X-rays and skin lesion images.
Taking inspiration from medical training, we propose giving deep networks a prior grounded in explicit medical knowledge communicated in natural language.
arXiv Detail & Related papers (2024-05-23T17:55:02Z)
- Reddit-Impacts: A Named Entity Recognition Dataset for Analyzing Clinical and Social Effects of Substance Use Derived from Social Media [6.138126219622993]
Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research.
Social media are unique and important sources of information about SUDs, particularly since the data in such sources are often generated by people with lived experiences.
In this paper, we introduce Reddit-Impacts, a challenging Named Entity Recognition (NER) dataset curated from subreddits dedicated to discussions on prescription and illicit opioids, as well as medications for opioid use disorder.
The dataset specifically concentrates on the lesser-studied, yet critically important, aspects of substance use--its
arXiv Detail & Related papers (2024-05-09T23:43:57Z)
- "Hey..! This medicine made me sick": Sentiment Analysis of User-Generated Drug Reviews using Machine Learning Techniques [2.2874754079405535]
This project proposes a drug review classification system that classifies user reviews on a particular drug into different classes, such as positive, negative, and neutral.
The collected data is manually labeled and then verified to ensure that the labels are correct.
arXiv Detail & Related papers (2024-04-09T08:42:34Z)
- Learning to Describe for Predicting Zero-shot Drug-Drug Interactions [54.172575323610175]
Adverse drug-drug interactions can compromise the effectiveness of concurrent drug administration.
Traditional computational methods for DDI prediction may fail to capture interactions for new drugs due to the lack of knowledge.
We propose TextDDI with a language model-based DDI predictor and a reinforcement learning (RL)-based information selector.
arXiv Detail & Related papers (2024-03-13T09:42:46Z)
- Detection of Opioid Users from Reddit Posts via an Attention-based Bidirectional Recurrent Neural Network [11.491225833044021]
We take advantage of recent advances in machine learning to identify opioid users on Reddit.
Posts were collected from more than 1,000 users who posted on three subreddits over a period of one month.
We apply an attention-based bidirectional long short-term memory model to identify opioid users.
arXiv Detail & Related papers (2024-02-09T22:12:20Z)
- Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts [26.161892748901252]
We present a corpus of 2500 opioid-related posts from various subreddits labeled with six different phases of opioid use.
For every post, we annotate span-level explanations and crucially study their role both in annotation quality and model development.
arXiv Detail & Related papers (2023-11-15T16:05:55Z)
- Goal Driven Discovery of Distributional Differences via Language Descriptions [58.764821647036946]
Mining large corpora can generate useful discoveries but is time-consuming for humans.
We formulate a new task, D5, that automatically discovers differences between two large corpora in a goal-driven way.
Our system produces discoveries previously unknown to the authors on a wide range of applications in OpenD5.
arXiv Detail & Related papers (2023-02-28T01:32:32Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
- Computational Support for Substance Use Disorder Prevention, Detection, Treatment, and Recovery [62.9980747784214]
Substance Use Disorders involve the misuse of alcohol, opioids, marijuana, and methamphetamine.
1 in 12 U.S. adults have or have had a substance use disorder.
The National Institute on Drug Abuse estimates that SUDs cost the U.S. $520 billion annually.
arXiv Detail & Related papers (2020-06-23T18:30:20Z)
- DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment.
DeepEnroll is a cross-modal inference learning model that jointly encodes enrollment criteria (text) and patient records (tabular data) into a shared latent space for matching inference.
arXiv Detail & Related papers (2020-01-22T17:51:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.