@GrokSet: multi-party Human-LLM Interactions in Social Media
- URL: http://arxiv.org/abs/2602.21236v1
- Date: Wed, 11 Feb 2026 12:42:32 GMT
- Title: @GrokSet: multi-party Human-LLM Interactions in Social Media
- Authors: Matteo Migliarini, Berat Ercevik, Oluwagbemike Olowe, Saira Fatima, Sarah Zhao, Minh Anh Le, Vasu Sharma, Ashwinee Panda,
- Abstract summary: Large Language Models (LLMs) are increasingly deployed as active participants on public social media platforms. We introduce @GrokSet, a large-scale dataset of over 1 million tweets involving the @Grok LLM on X. Our analysis reveals a distinct functional shift: rather than serving as a general assistant, the LLM is frequently invoked as an authoritative arbiter in high-stakes, polarizing political debates.
- Score: 6.836704021198838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed as active participants on public social media platforms, yet their behavior in these unconstrained social environments remains largely unstudied. Existing datasets, drawn primarily from private chat interfaces, lack the multi-party dynamics and public visibility crucial for understanding real-world performance. To address this gap, we introduce @GrokSet, a large-scale dataset of over 1 million tweets involving the @Grok LLM on X. Our analysis reveals a distinct functional shift: rather than serving as a general assistant, the LLM is frequently invoked as an authoritative arbiter in high-stakes, polarizing political debates. However, we observe a persistent engagement gap: despite this visibility, the model functions as a low-status utility, receiving significantly less social validation (likes, replies) than human peers. Finally, we find that this adversarial context exposes shallow alignment: users bypass safety filters not through complex jailbreaks, but through simple persona adoption and tone mirroring. We release @GrokSet as a critical resource for studying the intersection of AI agents and societal discourse.
Related papers
- Interpretable Debiasing of Vision-Language Models for Social Fairness [55.85977929985967]
We introduce an interpretable, model-agnostic bias mitigation framework, DeBiasLens, that localizes social attribute neurons in Vision-Language models. We train SAEs on facial image or caption datasets without corresponding social attribute labels to uncover neurons highly responsive to specific demographics. Our research lays the groundwork for future auditing tools, prioritizing social fairness in emerging real-world AI systems.
arXiv Detail & Related papers (2026-02-27T13:37:11Z)
- Grok in the Wild: Characterizing the Roles and Uses of Large Language Models on Social Media [5.844783557050257]
xAI's large language model, Grok, is called by millions of people each week on the social media platform X. At the platform level, we find that Grok responds to 62% of requests, that the majority (51%) are in English, and that engagement is low. We also inductively build a taxonomy of 10 roles that LLMs play in mediating social interactions and use these roles to analyze 41,735 interactions with Grok on X.
arXiv Detail & Related papers (2026-02-11T19:06:22Z)
- From Obstacles to Etiquette: Robot Social Navigation with VLM-Informed Path Selection [57.74400052368147]
This paper presents a social robot navigation framework that integrates geometric planning with contextual social reasoning. The system first extracts obstacles and human dynamics to generate geometrically feasible candidate paths, then leverages a fine-tuned vision-language model (VLM) to evaluate these paths. Experiments in four social navigation contexts demonstrate that our method achieves the best overall performance with the lowest personal space violation duration, the minimal pedestrian-facing time, and no social zone intrusions.
arXiv Detail & Related papers (2026-02-09T18:46:12Z)
- Persona Jailbreaking in Large Language Models [8.618075786777219]
Large Language Models (LLMs) are increasingly deployed in domains such as education, mental health and customer support. Black-box persona manipulation remains unexplored, raising concerns for robustness in realistic interactions. We introduce the task of persona editing, which adversarially steers LLM traits through user-side inputs under a black-box, inference-only setting.
arXiv Detail & Related papers (2026-01-23T05:51:35Z)
- HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z)
- SoMe: A Realistic Benchmark for LLM-based Social Media Agents [64.05026384906915]
SoMe is a benchmark designed to evaluate social media agents equipped with various agent tools for accessing and analyzing social media data. SoMe comprises a diverse collection of 8 social media agent tasks, 9,164,284 posts, 6,591 user profiles, and 25,686 reports from various social media platforms and external websites. Through extensive quantitative and qualitative analysis, we provide the first overview of the performance of mainstream agentic LLMs in realistic social media environments.
arXiv Detail & Related papers (2025-12-09T08:36:09Z)
- Can MLLMs Read the Room? A Multimodal Benchmark for Verifying Truthfulness in Multi-Party Social Interactions [21.974884890305365]
We present a novel dataset derived from the social deduction game Werewolf. This dataset provides synchronized video and text, with verifiable ground-truth labels for every statement. We evaluate state-of-the-art MLLMs, revealing a significant performance gap.
arXiv Detail & Related papers (2025-10-31T05:36:36Z)
- SI-Bench: Benchmarking Social Intelligence of Large Language Models in Human-to-Human Conversations [8.453848538355508]
We introduce SI-Bench, a novel benchmark to evaluate aspects of social intelligence in large language models (LLMs). Grounded in broad social science theories, SI-Bench contains 2,221 authentic multi-turn dialogues collected from a social networking application. Experiments show that SOTA models have surpassed the human expert in process reasoning under complex social situations, yet they still fall behind humans in reply quality.
arXiv Detail & Related papers (2025-10-27T10:21:46Z)
- SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning [53.16179295245888]
We introduce SIV-Bench, a novel video benchmark for evaluating the capabilities of Multimodal Large Language Models (MLLMs) across Social Scene Understanding (SSU), Social State Reasoning (SSR), and Social Dynamics Prediction (SDP). SIV-Bench features 2,792 video clips and 8,792 meticulously generated question-answer pairs derived from a human-LLM collaborative pipeline. It also includes a dedicated setup for analyzing the impact of different textual cues: original on-screen text, added dialogue, or no text.
arXiv Detail & Related papers (2025-06-05T05:51:35Z)
- Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation [51.44040615856536]
This paper analyzes large language models' ability to simulate social media engagement through action-guided response generation. We benchmark GPT-4o-mini, O1-mini, and DeepSeek-R1 in social media engagement simulation regarding a major societal event.
arXiv Detail & Related papers (2025-02-17T17:43:08Z)
- Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding [85.63710017456792]
FuSe is a novel approach that enables finetuning visuomotor generalist policies on heterogeneous sensor modalities. We show that FuSe enables performing challenging tasks that require reasoning jointly over modalities such as vision, touch, and sound. Experiments in the real world show that FuSe is able to increase success rates by over 20% compared to all considered baselines.
arXiv Detail & Related papers (2025-01-08T18:57:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.