GASTON: Graph-Aware Social Transformer for Online Networks
- URL: http://arxiv.org/abs/2602.02524v1
- Date: Mon, 26 Jan 2026 05:45:48 GMT
- Title: GASTON: Graph-Aware Social Transformer for Online Networks
- Authors: Olha Wloch, Liam Hebert, Robin Cohen, Lukasz Golab
- Abstract summary: GASTON (Graph-Aware Social Transformer for Online Networks) learns text and user embeddings grounded in their local norms. Our solution pretrains community embeddings based on user membership patterns, capturing a community's user base before processing any text. Experiments on tasks such as stress detection, toxicity scoring, and norm violation demonstrate that the embeddings produced by GASTON outperform state-of-the-art baselines.
- Score: 5.659290426197765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online communities have become essential places for socialization and support, yet they also possess toxicity, echo chambers, and misinformation. Detecting this harmful content is difficult because the meaning of an online interaction stems from both what is written (textual content) and where it is posted (social norms). We propose GASTON (Graph-Aware Social Transformer for Online Networks), which learns text and user embeddings that are grounded in their local norms, providing the necessary context for downstream tasks. The heart of our solution is a contrastive initialization strategy that pretrains community embeddings based on user membership patterns, capturing a community's user base before processing any text. This allows GASTON to distinguish between communities (e.g., a support group vs. a hate group) based on who interacts there, even if they share similar vocabulary. Experiments on tasks such as stress detection, toxicity scoring, and norm violation demonstrate that the embeddings produced by GASTON outperform state-of-the-art baselines.
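The contrastive initialization described in the abstract can be illustrated with a toy sketch. The abstract does not give the loss, architecture, or hyperparameters, so everything below is an assumption: the function name `pretrain_community_embeddings`, the softmax (InfoNCE-style) objective in which each user's own communities are positives and all other communities are negatives, and the tiny membership matrix are hypothetical and are not GASTON's actual method. The sketch only shows how community embeddings can be learned from membership patterns alone, with no text involved.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pretrain_community_embeddings(M, dim=4, steps=300, lr=0.5, seed=0):
    """Learn community embeddings from a binary user x community matrix M.

    Assumed objective (not taken from the paper): for every membership
    (user u, community c), minimize -log softmax(U[u] @ C.T)[c], so a
    user's own communities are positives and the rest are negatives.
    """
    rng = np.random.default_rng(seed)
    n_users, n_comms = M.shape
    U = 0.1 * rng.standard_normal((n_users, dim))   # user embeddings
    C = 0.1 * rng.standard_normal((n_comms, dim))   # community embeddings
    counts = M.sum(axis=1, keepdims=True)           # memberships per user
    for _ in range(steps):
        logits = U @ C.T
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Gradient of the mean loss with respect to the logits.
        G = (counts * P - M) / n_users
        grad_U = G @ C
        grad_C = G.T @ U
        U -= lr * grad_U
        C -= lr * grad_C
    return C

# Toy data: communities 0 and 1 share the same four users,
# community 2 has a disjoint user base.
M = np.zeros((8, 3))
M[:4, 0] = 1
M[:4, 1] = 1
M[4:, 2] = 1

C = pretrain_community_embeddings(M)
sim_shared = cosine(C[0], C[1])    # communities with overlapping users
sim_disjoint = cosine(C[0], C[2])  # communities with no common users
```

In this toy setting, communities with overlapping user bases end up closer in embedding space than communities with disjoint user bases, which is the kind of signal the abstract says lets a model separate communities that share vocabulary but not members.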
Related papers
- Community Norms in the Spotlight: Enabling Task-Agnostic Unsupervised Pre-Training to Benefit Online Social Media [1.518418913270911]
We advocate a paradigm shift from task-specific fine-tuning to unsupervised pretraining. We believe that this direction offers many opportunities for AI for Social Good.
arXiv Detail & Related papers (2026-01-26T05:52:19Z) - Operational Validation of Large-Language-Model Agent Social Simulation: Evidence from Voat v/technology [59.63189507373199]
We build a technology community simulation modeled on Voat, a Reddit-like alt-right news aggregator and discussion platform active from 2014 to 2020. Using the YSocial framework, we seed the simulation with a fixed catalog of technology links sampled from Voat's shared URLs. Agents generate posts, replies, and reactions under platform rules for link and text submissions, threaded replies and daily activity cycles. Results indicate familiar online regularities: similar activity rhythms, heavy-tailed participation, sparse low-clustering interaction networks, core-periphery structure, topical alignment with Voat, and elevated toxicity.
arXiv Detail & Related papers (2025-08-29T16:06:27Z) - Understanding writing style in social media with a supervised contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 x 10^6 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z) - Analyzing Norm Violations in Live-Stream Chat [49.120561596550395]
We present the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms.
We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch.
Our results show that appropriate contextual information can boost moderation performance by 35%.
arXiv Detail & Related papers (2023-05-18T05:58:27Z) - Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - CRUSH: Contextually Regularized and User anchored Self-supervised Hate speech Detection [6.759148939470331]
We introduce CRUSH, a framework for hate speech detection using user-anchored self-supervision and contextual regularization.
Our proposed approach secures a 1-12% improvement in test set metrics over the best performing previous approaches on two types of tasks and multiple popular English social media datasets.
arXiv Detail & Related papers (2022-04-13T13:51:51Z) - News consumption and social media regulations policy [70.31753171707005]
We analyze two social media that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in the tendency of the user to engage with both types of content, showing a slight preference for the questionable ones which may account for a dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z) - CASS: Towards Building a Social-Support Chatbot for Online Health Community [67.45813419121603]
The CASS architecture is based on advanced neural network algorithms.
It can handle new inputs from users and generate a variety of responses to them.
With a follow-up field experiment, CASS is proven useful in supporting individual members who seek emotional support.
arXiv Detail & Related papers (2021-01-04T05:52:03Z) - Analysing Social Media Network Data with R: Semi-Automated Screening of Users, Comments and Communication Patterns [0.0]
Communication on social media platforms is increasingly widespread across societies.
Fake news, hate speech and radicalizing elements are part of this modern form of communication.
A basic understanding of these mechanisms and communication patterns could help to counteract negative forms of communication.
arXiv Detail & Related papers (2020-11-26T14:52:01Z) - Detecting Online Hate Speech: Approaches Using Weak Supervision and Network Embedding Models [2.3322477552758234]
We (i) propose a weak supervision deep learning model that quantitatively uncovers hateful users and (ii) present a novel qualitative analysis to uncover indirect hateful conversations.
We evaluate our model on 19.2M posts and show that our weak supervision model outperforms the baseline models in identifying indirect hateful interactions.
We also analyze a multilayer network, constructed from two types of user interactions in Gab (quote and reply) with interaction scores from the weak supervision model as edge weights, to predict hateful users.
arXiv Detail & Related papers (2020-07-24T18:13:52Z) - Quantifying the Vulnerabilities of the Online Public Square to Adversarial Manipulation Tactics [43.98568073610101]
We use a social media model to quantify the impacts of several adversarial manipulation tactics on the quality of content.
We find that the presence of influential accounts, a hallmark of social media, exacerbates the vulnerabilities of online communities to manipulation.
These insights suggest countermeasures that platforms could employ to increase the resilience of social media users to manipulation.
arXiv Detail & Related papers (2019-07-13T21:12:08Z)