Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models
- URL: http://arxiv.org/abs/2409.17990v1
- Date: Thu, 26 Sep 2024 16:02:00 GMT
- Title: Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models
- Authors: Georg Ahnert, Max Pellert, David Garcia, Markus Strohmaier
- Abstract summary: This paper proposes temporally aligned Large Language Models (LLMs) as a tool for longitudinal analysis of social media data.
We fine-tune Temporal Adapters for Llama 3 8B on full timelines from a panel of British Twitter users, and extract longitudinal aggregates of emotions and attitudes with established questionnaires.
- Score: 1.9299251284637737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes temporally aligned Large Language Models (LLMs) as a tool for longitudinal analysis of social media data. We fine-tune Temporal Adapters for Llama 3 8B on full timelines from a panel of British Twitter users, and extract longitudinal aggregates of emotions and attitudes with established questionnaires. We validate our estimates against representative British survey data and find strong positive, significant correlations for several collective emotions. The obtained estimates are robust across multiple training seeds and prompt formulations, and in line with collective emotions extracted using a traditional classification model trained on labeled data. To the best of our knowledge, this is the first work to extend the analysis of affect in LLMs to a longitudinal setting through Temporal Adapters. Our work enables new approaches towards the longitudinal analysis of social media data.
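The validation step described in the abstract — correlating LLM-derived affect aggregates with representative survey data over time — can be sketched with a plain Pearson correlation. This is a toy illustration with made-up weekly values, not the paper's actual data or model:

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation between two equal-length time series
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical weekly aggregates: questionnaire scores extracted from the
# temporally adapted LLM vs. means from a representative survey panel
llm_scores   = [0.42, 0.48, 0.55, 0.51, 0.60, 0.58]
survey_means = [0.40, 0.45, 0.57, 0.50, 0.62, 0.55]
r = pearson(llm_scores, survey_means)  # strong positive correlation
```

A high r over many time steps is the kind of evidence the paper reports; the real analysis additionally checks significance and robustness across seeds and prompts.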
Related papers
- Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding [57.62275091656578]
We refer to the complex events composed of many news articles over an extended period as Temporal Complex Event (TCE)
This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event chain within TCE.
arXiv Detail & Related papers (2024-06-04T16:42:17Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Evaluation of Faithfulness Using the Longest Supported Subsequence [52.27522262537075]
We introduce a novel approach to evaluate faithfulness of machine-generated text by computing the longest noncontiguous subsequence of the claim that is supported by the context.
Using a new human-annotated dataset, we finetune a model to generate Longest Supported Subsequence (LSS)
Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset.
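The core idea — the longest (possibly noncontiguous) token subsequence of a claim that also appears, in order, in the context — corresponds to a classic longest-common-subsequence dynamic program. The sketch below illustrates that idea only; the paper's metric uses a fine-tuned model, not exact token matching:

```python
def longest_supported_subsequence(claim, context):
    # Length of the longest token subsequence of `claim` that appears,
    # in order but not necessarily contiguously, in `context` (LCS).
    c, k = claim.split(), context.split()
    dp = [[0] * (len(k) + 1) for _ in range(len(c) + 1)]
    for i, cw in enumerate(c):
        for j, kw in enumerate(k):
            dp[i + 1][j + 1] = dp[i][j] + 1 if cw == kw else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(c)][len(k)]

claim = "the model was trained on tweets"
context = "the model described here was trained mainly on tweets"
# Normalizing by claim length gives a support ratio in [0, 1]
ratio = longest_supported_subsequence(claim, context) / len(claim.split())
```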
arXiv Detail & Related papers (2023-08-23T14:18:44Z) - Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis [0.0]
Machine learning and natural language processing algorithms have been deployed to analyze the vast amount of textual data available online.
In this study, we examine and compare five frequently used topic modeling methods specifically applied to customer reviews.
Our findings reveal that BERTopic consistently yields more meaningful extracted topics and achieves favorable results.
arXiv Detail & Related papers (2023-08-19T08:18:04Z) - Tweet Insights: A Visualization Platform to Extract Temporal Insights from Twitter [19.591692602304494]
This paper introduces a large collection of time series data derived from Twitter.
This data spans the past five years and captures changes in n-gram frequency, similarity, sentiment, and topic distribution.
The interface built on top of this data enables temporal analysis for detecting and characterizing shifts in meaning.
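The kind of n-gram frequency series such a platform exposes can be approximated from raw posts with a simple per-day count. A minimal sketch (toy input, not the platform's API):

```python
from collections import Counter, defaultdict

def daily_bigram_freq(posts):
    # posts: iterable of (date_str, text) pairs
    # returns {date: Counter mapping each bigram to its count that day}
    by_day = defaultdict(Counter)
    for day, text in posts:
        toks = text.lower().split()
        by_day[day].update(zip(toks, toks[1:]))
    return by_day

freqs = daily_bigram_freq([
    ("2023-08-01", "clean energy now"),
    ("2023-08-01", "clean energy transition"),
])
```

Tracking such counters day by day yields exactly the n-gram frequency time series mentioned above; sentiment or topic series would replace the Counter with per-day model scores.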
arXiv Detail & Related papers (2023-08-04T05:39:26Z) - Curating corpora with classifiers: A case study of clean energy sentiment online [0.8192907805418583]
Large-scale corpora of social media posts contain broad public opinion.
Surveys can be expensive to run and lag public opinion by days or weeks.
We propose a method for rapidly selecting the best corpus of relevant documents for analysis.
arXiv Detail & Related papers (2023-05-04T18:15:45Z) - Time Series Contrastive Learning with Information-Aware Augmentations [57.45139904366001]
A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples.
How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question.
We propose a new contrastive learning approach with information-aware augmentations, InfoTS, that adaptively selects optimal augmentations for time series representation learning.
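InfoTS selects adaptively among candidate augmentations; two standard candidates in time-series contrastive learning are jittering (small additive noise) and scaling. A sketch of those candidates only — the adaptive selection mechanism itself is the paper's contribution and is not reproduced here:

```python
import random

def jitter(series, sigma=0.05, seed=0):
    # Add small Gaussian noise to each point (a common TS augmentation)
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]

def scale(series, factor=1.1):
    # Multiply the whole series by a constant factor
    return [x * factor for x in series]

augmented = jitter(scale([0.1, 0.2, 0.3], factor=1.2))
```

In a contrastive setup, `augmented` would serve as a positive sample paired with the original series.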
arXiv Detail & Related papers (2023-03-21T15:02:50Z) - A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
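The IoU component underlying "dR@n,IoU@m" measures how well a predicted temporal moment overlaps the annotated one. A sketch of that component (the paper's discount factor applied on top of the recall score is defined in the paper and not reproduced here):

```python
def temporal_iou(pred, gold):
    # Intersection-over-union of two time intervals (start, end), in seconds
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = max(pred[1], gold[1]) - min(pred[0], gold[0])
    return inter / union if union > 0 else 0.0

# A prediction counts toward recall@n,IoU@m when its IoU clears threshold m
hit = temporal_iou((2.0, 7.0), (5.0, 10.0)) >= 0.5
```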
arXiv Detail & Related papers (2022-03-10T08:58:18Z) - Validating daily social media macroscopes of emotions [0.12656629989060433]
We run a large-scale survey at an online newspaper to gather daily self-reports of affective states from its users.
We compare these with aggregated results of sentiment analysis of user discussions on the same online platform.
For both platforms, we find strong correlations between text analysis results and levels of self-reported emotions.
arXiv Detail & Related papers (2021-08-17T14:28:56Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as -- or better -- than traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - Learning User Embeddings from Temporal Social Media Data: A Survey [15.324014759254915]
We survey representative work on learning a concise latent user representation (a.k.a. user embedding) that can capture the main characteristics of a social media user.
The learned user embeddings can later be used to support different downstream user analysis tasks such as personality modeling, suicidal risk assessment and purchase decision prediction.
arXiv Detail & Related papers (2021-05-17T16:22:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.