Theme-driven Keyphrase Extraction to Analyze Social Media Discourse
- URL: http://arxiv.org/abs/2301.11508v2
- Date: Sun, 28 May 2023 20:06:35 GMT
- Title: Theme-driven Keyphrase Extraction to Analyze Social Media Discourse
- Authors: William Romano, Omar Sharif, Madhusudan Basak, Joseph Gatto, and Sarah Preum
- Abstract summary: This paper introduces a theme-driven keyphrase extraction framework tailored for social media.
We develop a novel data collection and curation framework for theme-driven keyphrase extraction.
We create MOUD-Keyphrase, the first dataset of its kind comprising human-annotated keyphrases from a Reddit community.
- Score: 3.2365983191405103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media platforms are vital resources for sharing self-reported health
experiences, offering rich data on various health topics. Despite advancements
in Natural Language Processing (NLP) enabling large-scale social media data
analysis, a gap remains in applying keyphrase extraction to health-related
content. Keyphrase extraction is used to identify salient concepts in social
media discourse without being constrained by predefined entity classes. This
paper introduces a theme-driven keyphrase extraction framework tailored for
social media, a pioneering approach designed to capture clinically relevant
keyphrases from user-generated health texts. Themes are defined as broad
categories determined by the objectives of the extraction task. We formulate
this novel task of theme-driven keyphrase extraction and demonstrate its
potential for efficiently mining social media text for the use case of
treatment for opioid use disorder. This paper leverages qualitative and
quantitative analysis to demonstrate the feasibility of extracting actionable
insights from social media data and efficiently extracting keyphrases using
minimally supervised NLP models. Our contributions include the development of a
novel data collection and curation framework for theme-driven keyphrase
extraction and the creation of MOUD-Keyphrase, the first dataset of its kind
comprising human-annotated keyphrases from a Reddit community. We also identify
the scope of minimally supervised NLP models to extract keyphrases from social
media data efficiently. Lastly, we found that a large language model (ChatGPT)
outperforms unsupervised keyphrase extraction models, and we evaluate its
efficacy in this task.
Related papers
- Zero-Shot Keyphrase Generation: Investigating Specialized Instructions and Multi-Sample Aggregation on Large Language Models [52.829293635314194]
Keyphrase generation is a long-standing NLP task for automatically generating keyphrases for a given document.
We focus on the zero-shot capabilities of open-source instruction-tuned LLMs (Phi-3, Llama-3) and the closed-source GPT-4o for this task.
arXiv Detail & Related papers (2025-03-01T19:38:57Z)
- MetaKP: On-Demand Keyphrase Generation [52.48698290354449]
We introduce on-demand keyphrase generation, a novel paradigm that requires keyphrases that conform to specific high-level goals or intents.
We present MetaKP, a large-scale benchmark comprising four datasets, 7500 documents, and 3760 goals across news and biomedical domains with human-annotated keyphrases.
We demonstrate the potential of our method to serve as a general NLP infrastructure, exemplified by its application in epidemic event detection from social media.
arXiv Detail & Related papers (2024-06-28T19:02:59Z)
- A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech.
We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
arXiv Detail & Related papers (2024-04-15T00:03:16Z)
- Improving Keyphrase Extraction with Data Augmentation and Information Filtering [67.43025048639333]
Keyphrase extraction is one of the essential tasks for document understanding in NLP.
We present a novel corpus and method for keyphrase extraction from the videos streamed on the Behance platform.
arXiv Detail & Related papers (2022-09-11T22:38:02Z)
- Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training [66.64843711515341]
Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text.
We call attention to a new setting named multilingual keyphrase generation.
We propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages.
arXiv Detail & Related papers (2022-05-21T00:45:21Z)
- Representation Learning for Resource-Constrained Keyphrase Generation [78.02577815973764]
We introduce salient span recovery and salient span prediction as guided denoising language modeling objectives.
We show the effectiveness of the proposed approach for low-resource and zero-shot keyphrase generation.
arXiv Detail & Related papers (2022-03-15T17:48:04Z)
- Unsupervised Keyphrase Extraction via Interpretable Neural Networks [27.774524511005172]
Keyphrases that are most useful for predicting the topic of a text are important keyphrases.
INSPECT is a self-explaining neural framework for identifying influential keyphrases.
We show that INSPECT achieves state-of-the-art results in unsupervised keyphrase extraction across four diverse datasets.
arXiv Detail & Related papers (2022-03-15T04:30:47Z)
- Deep Keyphrase Completion [59.0413813332449]
Keyphrases provide accurate information about document content that is highly compact, concise, and full of meaning, and they are widely used for discourse comprehension, organization, and text retrieval.
We propose keyphrase completion (KPC) to generate more keyphrases for a document (e.g., a scientific publication), taking advantage of the document content along with a very limited number of known keyphrases.
We name it deep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with known keyphrases via a deep learning framework.
arXiv Detail & Related papers (2021-10-29T07:15:35Z)
- UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction [20.26899340581431]
The keyphrase prediction (KP) task aims at predicting several keyphrases that can summarize the main idea of the given document.
Mainstream KP methods can be categorized into purely generative approaches and integrated models with extraction and generation.
We propose UniKeyphrase, a novel end-to-end learning framework that jointly learns to extract and generate keyphrases.
arXiv Detail & Related papers (2021-06-09T07:09:51Z)
- Persian Keyphrase Generation Using Sequence-to-Sequence Models [1.192436948211501]
Keyphrases are a summary of an input text and provide the main subjects discussed in the text.
In this paper, we try to tackle the problem of keyphrase generation and extraction from news articles using deep sequence-to-sequence models.
arXiv Detail & Related papers (2020-09-25T14:40:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.