AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms
- URL: http://arxiv.org/abs/2502.18754v1
- Date: Wed, 26 Feb 2025 02:10:25 GMT
- Title: AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms
- Authors: Yuwei Yan, Yu Shang, Qingbin Zeng, Yu Li, Keyu Zhao, Zhiheng Zheng, Xuefei Ning, Tianji Wu, Shengen Yan, Yu Wang, Fengli Xu, Yong Li,
- Abstract summary: The AgentSociety Challenge is the first competition in the Web Conference that aims to explore the potential of Large Language Model (LLM) agents.<n>The Challenge has attracted 295 teams across the globe and received over 1,400 submissions in total over the course of 37 official competition days.<n>This paper discusses the detailed designs of the Challenge, analyzes the outcomes, and highlights the most successful LLM agent designs.
- Score: 21.518109364403642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The AgentSociety Challenge is the first competition in the Web Conference that aims to explore the potential of Large Language Model (LLM) agents in modeling user behavior and enhancing recommender systems on web platforms. The Challenge consists of two tracks: the User Modeling Track and the Recommendation Track. Participants are tasked to utilize a combined dataset from Yelp, Amazon, and Goodreads, along with an interactive environment simulator, to develop innovative LLM agents. The Challenge has attracted 295 teams across the globe and received over 1,400 submissions in total over the course of 37 official competition days. The participants have achieved 21.9% and 20.3% performance improvement for Track 1 and Track 2 in the Development Phase, and 9.1% and 15.9% in the Final Phase, representing a significant accomplishment. This paper discusses the detailed designs of the Challenge, analyzes the outcomes, and highlights the most successful LLM agent designs. To support further research and development, we have open-sourced the benchmark environment at https://tsinghua-fib-lab.github.io/AgentSocietyChallenge.
Related papers
- SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
Super aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems with the best model (GPT-4o) solving only 16.3% of the end-to-end set, and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z) - Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks [62.443665295250035]
We present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023)
In total, 32 competing teams register for the challenge, from which we received 11 successful submissions.
arXiv Detail & Related papers (2024-07-20T10:13:54Z) - Benchmarking Robustness and Generalization in Multi-Agent Systems: A
Case Study on Neural MMO [50.58083807719749]
We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions.
This competition targets robustness and generalization in multi-agent systems.
We will open-source our benchmark including the environment wrapper, baselines, a visualization tool, and selected policies for further research.
arXiv Detail & Related papers (2023-08-30T07:16:11Z) - MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised
Learning [90.17500229142755]
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
This paper introduces the motivation behind this challenge, describe the benchmark dataset, and provide some statistics about participants.
We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
arXiv Detail & Related papers (2023-04-18T13:23:42Z) - Retrospective on the 2021 BASALT Competition on Learning from Human
Feedback [92.37243979045817]
The goal of the competition was to promote research towards agents that use learning from human feedback (LfHF) techniques to solve open-world tasks.
Rather than mandating the use of LfHF techniques, we described four tasks in natural language to be accomplished in the video game Minecraft.
Teams developed a diverse range of LfHF algorithms across a variety of possible human feedback types.
arXiv Detail & Related papers (2022-04-14T17:24:54Z) - The Second Place Solution for ICCV2021 VIPriors Instance Segmentation
Challenge [6.087398773657721]
The Visual Inductive Priors(VIPriors) for Data-Efficient Computer Vision challenges ask competitors to train models from scratch in a data-deficient setting.
We introduce the technical details of our submission to the ICCV 2021 VIPriors instance segmentation challenge.
Our approach can achieve 40.2%AP@0.50:0.95 on the test set of ICCV 2021 VIPriors instance segmentation challenge.
arXiv Detail & Related papers (2021-12-02T09:23:02Z) - Two-Stream Consensus Network: Submission to HACS Challenge 2021
Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify action of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2rd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.