It's Just a Matter of Time: Detecting Depression with Time-Enriched
Multimodal Transformers
- URL: http://arxiv.org/abs/2301.05453v1
- Date: Fri, 13 Jan 2023 09:40:19 GMT
- Title: It's Just a Matter of Time: Detecting Depression with Time-Enriched
Multimodal Transformers
- Authors: Ana-Maria Bucur, Adrian Cosma, Paolo Rosso, Liviu P. Dinu
- Abstract summary: We propose a flexible time-enriched multimodal transformer architecture for detecting depression from social media posts.
Our model operates directly at the user-level, and we enrich it with the relative time between posts by using time2vec positional embeddings.
We show that our method, using EmoBERTa and CLIP embeddings, surpasses other methods on two multimodal datasets.
- Score: 24.776445591293186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depression detection from user-generated content on the internet has been a
long-lasting topic of interest in the research community, providing valuable
screening tools for psychologists. The ubiquitous use of social media platforms
lays out the perfect avenue for exploring mental health manifestations in posts
and interactions with other users. Current methods for depression detection
from social media mainly focus on text processing, and only a few also utilize
images posted by users. In this work, we propose a flexible time-enriched
multimodal transformer architecture for detecting depression from social media
posts, using pretrained models for extracting image and text embeddings. Our
model operates directly at the user-level, and we enrich it with the relative
time between posts by using time2vec positional embeddings. Moreover, we
propose another model variant, which can operate on randomly sampled and
unordered sets of posts to be more robust to dataset noise. We show that our
method, using EmoBERTa and CLIP embeddings, surpasses other methods on two
multimodal datasets, obtaining state-of-the-art results of 0.931 F1 score on a
popular multimodal Twitter dataset, and 0.902 F1 score on the only multimodal
Reddit dataset.
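As a rough illustration of the time2vec positional embeddings mentioned in the abstract, the sketch below shows how relative times between posts might be embedded and added to pretrained per-post embeddings before a user-level transformer encoder. This is a minimal, hypothetical PyTorch example, not the authors' implementation; the layer name, dimensions, mean pooling, and the `Time2Vec` module itself are assumptions made for illustration.
```python
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    """Minimal time2vec layer: one linear component plus (embed_dim - 1)
    periodic (sine) components of a scalar time input."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.w0 = nn.Parameter(torch.randn(1))              # linear frequency
        self.b0 = nn.Parameter(torch.randn(1))              # linear phase
        self.w = nn.Parameter(torch.randn(embed_dim - 1))   # periodic frequencies
        self.b = nn.Parameter(torch.randn(embed_dim - 1))   # periodic phases

    def forward(self, tau: torch.Tensor) -> torch.Tensor:
        # tau: (batch, num_posts) relative time between posts (e.g. in days)
        tau = tau.unsqueeze(-1)                              # (batch, num_posts, 1)
        linear = self.w0 * tau + self.b0                     # (batch, num_posts, 1)
        periodic = torch.sin(self.w * tau + self.b)          # (batch, num_posts, embed_dim - 1)
        return torch.cat([linear, periodic], dim=-1)         # (batch, num_posts, embed_dim)

# Hypothetical usage: add time2vec embeddings to per-post features
# (e.g. EmoBERTa/CLIP embeddings) before a user-level transformer.
embed_dim = 768
time_enc = Time2Vec(embed_dim)
post_embeddings = torch.randn(2, 16, embed_dim)              # 2 users, 16 posts each
rel_times = torch.rand(2, 16) * 30.0                         # days since first post
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True),
    num_layers=2,
)
user_repr = encoder(post_embeddings + time_enc(rel_times)).mean(dim=1)  # one vector per user
```
The unordered-set variant described in the abstract would presumably drop the time encoding and operate on randomly sampled posts; the pooling and encoder depth shown here are placeholders.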
Related papers
- Multi-modal Stance Detection: New Datasets and Model [56.97470987479277]
We study multi-modal stance detection for tweets consisting of texts and images.
We propose a simple yet effective Targeted Multi-modal Prompt Tuning framework (TMPT).
TMPT achieves state-of-the-art performance in multi-modal stance detection.
arXiv Detail & Related papers (2024-02-22T05:24:19Z)
- Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues [11.942057763913208]
Depression, a prominent contributor to global disability, affects a substantial portion of the population.
Efforts to detect depression from social media texts have been prevalent, yet only a few works explored depression detection from user-generated video content.
We propose a simple and flexible multi-modal temporal model capable of discerning non-verbal depression cues from diverse modalities in noisy, real-world videos.
arXiv Detail & Related papers (2024-01-05T10:47:42Z)
- An Attention-Based Denoising Framework for Personality Detection in Social Media Texts [1.4887196224762684]
Personality detection from user-generated texts is a universal method that can be used to build user profiles.
We propose an attention-based information extraction mechanism (AIEM) for long texts, which quickly locates valuable pieces of information.
We obtain an average accuracy improvement of 10.2% on the gold standard Twitter-Myers-Briggs Type Indicator dataset.
arXiv Detail & Related papers (2023-11-16T14:56:09Z)
- Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4).
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z)
- Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering [79.44443231700201]
Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the core points of the input text-image pair.
The input text and image are often not perfectly matched, and thus the image may introduce noise into the model.
We propose a novel multi-modal keyphrase generation model, which not only enriches the model input with external knowledge, but also effectively filters image noise.
arXiv Detail & Related papers (2023-09-09T09:41:36Z)
- Multimodal Detection of Bots on X (Twitter) using Transformers [6.390468088226495]
We propose a novel method for detecting bots in social media.
We use only the user description field and three-channel images.
Experiments conducted on the Cresci'17 and TwiBot-20 datasets demonstrate the advantages of our proposed approaches.
arXiv Detail & Related papers (2023-08-28T10:51:11Z)
- Depression detection in social media posts using affective and social norm features [84.12658971655253]
We propose a deep architecture for depression detection from social media posts.
We incorporate profanity and morality features of posts and words in our architecture using a late fusion scheme.
The inclusion of the proposed features yields state-of-the-art results in both settings.
arXiv Detail & Related papers (2023-03-24T21:26:27Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- GAME-ON: Graph Attention Network based Multimodal Fusion for Fake News Detection [6.037721620350107]
We propose GAME-ON, a Graph Neural Network based end-to-end trainable framework to learn more robust data representations for multimodal fake news detection.
Our model outperforms the best comparable state-of-the-art baseline on Twitter by an average of 11% and keeps competitive performance on Weibo, within a 2.6% margin, while using 65% fewer parameters.
arXiv Detail & Related papers (2022-02-25T03:27:37Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Explainable Depression Detection with Multi-Modalities Using a Hybrid Deep Learning Model on Social Media [21.619614611039257]
We propose MDHAN, an interpretable Multi-Modal Depression Detection model with a Hierarchical Attention Network.
Our model improves predictive performance when detecting depression in users who post messages publicly on social media.
arXiv Detail & Related papers (2020-07-03T12:11:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences arising from its use.