Balanced and Explainable Social Media Analysis for Public Health with
Large Language Models
- URL: http://arxiv.org/abs/2309.05951v1
- Date: Tue, 12 Sep 2023 04:15:34 GMT
- Title: Balanced and Explainable Social Media Analysis for Public Health with
Large Language Models
- Authors: Yan Jiang, Ruihong Qiu, Yi Zhang, Peng-Fei Zhang
- Abstract summary: Current techniques for public health analysis involve popular models such as BERT and large language models (LLMs).
To tackle these challenges, the data imbalance issue can be overcome by sophisticated data augmentation methods for social media datasets.
In this paper, a novel ALEX framework is proposed for social media analysis on public health.
- Score: 13.977401672173533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As social media becomes increasingly popular, more and more public health
activities emerge, which is worth noting for pandemic monitoring and government
decision-making. Current techniques for public health analysis involve popular
models such as BERT and large language models (LLMs). Although recent progress
in LLMs has shown a strong ability to comprehend knowledge by being fine-tuned
on specific domain datasets, the cost of training an in-domain LLM for every
specific public health task is especially high. Furthermore, such in-domain
datasets from social media are generally highly imbalanced, which hinders
the efficiency of LLM tuning. To tackle these challenges, the data
imbalance issue can be overcome by sophisticated data augmentation methods for
social media datasets. In addition, the ability of the LLMs can be effectively
utilised by prompting the model properly. In light of the above discussion, in
this paper, a novel ALEX framework is proposed for social media analysis on
public health. Specifically, an augmentation pipeline is developed to resolve
the data imbalance issue. Furthermore, an LLM explanation mechanism is
proposed by prompting an LLM with the predicted results from BERT models.
Extensive experiments on three tasks at the Social Media Mining for
Health 2023 (SMM4H) competition, ranking first in two of them, demonstrate
the superior performance of the proposed ALEX method. Our code has been
released at https://github.com/YanJiangJerry/ALEX.
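The abstract names two components: balancing an imbalanced dataset, and prompting an LLM with a BERT model's prediction to obtain an explanation. It does not say how the augmentation works or what the prompt looks like, so the sketch below is only an illustration under stated assumptions: it swaps in plain random oversampling for the augmentation step, and the prompt template is hypothetical. The function names `oversample_minority` and `build_explanation_prompt` are made up here and are not taken from the ALEX repository.

```python
import random
from collections import Counter


def oversample_minority(texts, labels, seed=0):
    """Randomly duplicate examples until every class matches the largest one.

    Assumption: simple random oversampling stands in for the paper's
    augmentation pipeline, whose details are not given in the abstract.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def build_explanation_prompt(post, predicted_label):
    """Format a prompt asking an LLM to assess a classifier's label.

    Assumption: this template is hypothetical, not the prompt used in ALEX.
    """
    return (
        "A classifier labelled the following social media post as "
        f"'{predicted_label}'.\n"
        f"Post: {post}\n"
        "Explain whether this label is correct and why."
    )


if __name__ == "__main__":
    texts = ["I tested positive today", "nice weather", "going for a run", "lunch time"]
    labels = [1, 0, 0, 0]
    balanced_texts, balanced_labels = oversample_minority(texts, labels)
    print(Counter(balanced_labels))
    print(build_explanation_prompt(texts[0], "self-reported diagnosis"))
```

In a real setup the predicted label would come from a fine-tuned BERT classifier and the prompt would be sent to an LLM; both steps are left out here because the abstract does not specify them.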
Related papers
- How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library [68.10605098856087]
With the rise of Large Language Models (LLMs) in recent years, new opportunities are emerging, but also new challenges, and contamination is quickly becoming critical.
Business applications and fundraising in AI have reached a scale at which a few percentage points gained on popular question-answering benchmarks could translate into dozens of millions of dollars.
It is becoming harder and harder to keep track of the data that LLMs have seen; if not impossible with closed-source models like GPT-4 and Claude-3 not divulging any information on the training set.
arXiv Detail & Related papers (2024-03-31T14:32:02Z)
- ChatGPT Based Data Augmentation for Improved Parameter-Efficient
Debiasing of LLMs [69.27030571729392]
Large Language models (LLMs) exhibit harmful social biases.
This work introduces a novel approach utilizing ChatGPT to generate synthetic training data.
arXiv Detail & Related papers (2024-02-19T01:28:48Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs' ability for general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Countering Misinformation via Emotional Response Generation [15.383062216223971]
The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and democracy.
Previous research has shown how social correction can be an effective way to curb misinformation.
We present VerMouth, the first large-scale dataset comprising roughly 12 thousand claim-response pairs.
arXiv Detail & Related papers (2023-11-17T15:37:18Z)
- Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation
through the Lens of News Headline Generation [58.31430028519306]
This study explores how humans can best leverage LLMs for writing and how interacting with these models affects feelings of ownership and trust in the writing process.
While LLMs alone can generate satisfactory news headlines, on average, human control is needed to fix undesirable model outputs.
arXiv Detail & Related papers (2023-10-16T15:11:01Z)
- Automated Claim Matching with Large Language Models: Empowering
Fact-Checkers in the Fight Against Misinformation [11.323961700172175]
FACT-GPT is a framework designed to automate the claim matching phase of fact-checking using Large Language Models.
This framework identifies new social media content that either supports or contradicts claims previously debunked by fact-checkers.
We evaluated FACT-GPT on an extensive dataset of social media content related to public health.
arXiv Detail & Related papers (2023-10-13T16:21:07Z)
- A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics [32.10937977924507]
The utilization of large language models (LLMs) in the Healthcare domain has generated both excitement and concern.
This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process.
arXiv Detail & Related papers (2023-10-09T13:15:23Z)
- MentaLLaMA: Interpretable Mental Health Analysis on Social Media with
Large Language Models [28.62967557368565]
We build the first multi-task and multi-source interpretable mental health instruction dataset on social media, with 105K data samples.
We use expert-written few-shot prompts and collected labels to prompt ChatGPT and obtain explanations from its responses.
Based on the IMHI dataset and LLaMA2 foundation models, we train MentaLLaMA, the first open-source LLM series for interpretable mental health analysis.
arXiv Detail & Related papers (2023-09-24T06:46:08Z)
- UQ at #SMM4H 2023: ALEX for Public Health Analysis with Social Media [33.081637097464146]
Current techniques for public health analysis involve popular models such as BERT and large language models (LLMs).
In this paper, a novel ALEX framework is proposed to improve the performance of public health analysis on social media.
arXiv Detail & Related papers (2023-09-08T08:54:55Z)
- Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks.
Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect information.
This survey presents a comprehensive overview of these alignment technologies, including the following aspects.
arXiv Detail & Related papers (2023-07-24T17:44:58Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)