Related papers: SoMe: A Realistic Benchmark for LLM-based Social Media Agents

URL: http://arxiv.org/abs/2512.14720v1
Date: Tue, 09 Dec 2025 08:36:09 GMT
Title: SoMe: A Realistic Benchmark for LLM-based Social Media Agents
Authors: Dizhan Xue, Jing Cui, Shengsheng Qian, Chuanrui Hu, Changsheng Xu
Abstract summary: SoMe is a benchmark designed to evaluate social media agents equipped with various agent tools for accessing and analyzing social media data. SoMe comprises a diverse collection of 8 social media agent tasks, 9,164,284 posts, 6,591 user profiles, and 25,686 reports from various social media platforms and external websites. Through extensive quantitative and qualitative analysis, we provide the first overview of the performance of mainstream agentic LLMs in realistic social media environments.
Score: 64.05026384906915
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Intelligent agents powered by large language models (LLMs) have recently demonstrated impressive capabilities and gained increasing popularity on social media platforms. While LLM agents are reshaping the ecology of social media, there remains a gap in comprehensively evaluating their ability to comprehend media content, understand user behaviors, and make intricate decisions. To address this challenge, we introduce SoMe, a pioneering benchmark designed to evaluate social media agents equipped with various agent tools for accessing and analyzing social media data. SoMe comprises a diverse collection of 8 social media agent tasks, 9,164,284 posts, 6,591 user profiles, and 25,686 reports from various social media platforms and external websites, with 17,869 meticulously annotated task queries. Compared with existing datasets and benchmarks for social media tasks, SoMe is the first to provide a versatile and realistic platform for LLM-based social media agents to handle diverse social media tasks. Through extensive quantitative and qualitative analysis, we provide the first overview of the performance of mainstream agentic LLMs in realistic social media environments and identify several limitations. Our evaluation reveals that neither current closed-source nor open-source LLMs can handle social media agent tasks satisfactorily. SoMe provides a challenging yet meaningful testbed for future social media agents. Our code and data are available at https://github.com/LivXue/SoMe

Related papers

Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics [7.849709311008473]
Large language models (LLMs) have traditionally relied on static training data, limiting their knowledge to fixed snapshots. Recent advancements have equipped LLMs with web browsing capabilities, enabling real-time information retrieval and multi-step reasoning over live web content. Here, we evaluate whether web-browsing LLMs can infer demographic attributes of social media users given only their usernames. We show that these models can access social media content and predict user demographics with reasonable accuracy.
arXiv Detail & Related papers (2025-07-16T16:21:01Z)
How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks [6.487500253901779]
Large language models (LLMs) play roles in multi-user, multi-turn social agent tasks. We propose a novel benchmark, How Social Is It (we call it HSII below), designed to assess LLMs' social capabilities. HSII comprises four stages: format parsing, target selection, target switching conversation, and stable conversation, which collectively evaluate the communication and task completion capabilities of LLMs.
arXiv Detail & Related papers (2025-04-04T08:59:01Z)
Can LLMs Simulate Social Media Engagement? A Study on Action-Guided Response Generation [51.44040615856536]
This paper analyzes large language models' ability to simulate social media engagement through action-guided response generation. We benchmark GPT-4o-mini, O1-mini, and DeepSeek-R1 in social media engagement simulation regarding a major societal event.
arXiv Detail & Related papers (2025-02-17T17:43:08Z)
OASIS: Open Agent Social Interaction Simulations with One Million Agents [147.00696959981173]
We propose a scalable social media simulator based on real-world social media platforms. OASIS supports large-scale user simulations capable of modeling up to one million users. We replicate various social phenomena, including information spreading, group polarization, and herd effects across X and Reddit platforms.
arXiv Detail & Related papers (2024-11-18T13:57:35Z)
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space. AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
SocialBench: Sociality Evaluation of Role-Playing Conversational Agents [85.6641890712617]
Large language models (LLMs) have advanced the development of various AI conversational agents. SocialBench is the first benchmark designed to evaluate the sociality of role-playing conversational agents at both individual and group levels. We find that agents excelling at the individual level are not necessarily proficient at the group level.
arXiv Detail & Related papers (2024-03-20T15:38:36Z)
MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms [25.73585435351771]
This paper introduces MM-Soc, a benchmark designed to evaluate Multimodal Large Language Models' understanding of social media content. MM-Soc compiles prominent multimodal datasets and incorporates a novel large-scale YouTube tagging dataset. Our analysis reveals that, in a zero-shot setting, various types of MLLMs generally exhibit difficulties in handling social media tasks.
arXiv Detail & Related papers (2024-02-21T22:27:40Z)
SoMeLVLM: A Large Vision Language Model for Social Media Processing [78.47310657638567]
We introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM). SoMeLVLM is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation. Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks.
arXiv Detail & Related papers (2024-02-20T14:02:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
