SimLab: A Platform for Simulation-based Evaluation of Conversational Information Access Systems
- URL: http://arxiv.org/abs/2507.04888v1
- Date: Mon, 07 Jul 2025 11:19:28 GMT
- Title: SimLab: A Platform for Simulation-based Evaluation of Conversational Information Access Systems
- Authors: Nolwenn Bernard, Sharath Chandra Etagi Suresh, Krisztian Balog, ChengXiang Zhai
- Abstract summary: We introduce SimLab, the first cloud-based platform to benchmark both conversational systems and user simulators in a controlled and reproducible environment. We present the design and implementation of an initial version of SimLab and showcase its features with an initial evaluation task of conversational movie recommendation. This paper is a call for the community to contribute to the platform to drive progress in the field of conversational information access and user simulation.
- Score: 33.48172339249859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research on interactive and conversational information access systems, including search engines, recommender systems, and conversational assistants, has been hindered by the difficulty in evaluating such systems with reproducible experiments. User simulation provides a promising solution, but there is a lack of infrastructure and tooling to support this kind of evaluation. To facilitate simulation-based evaluation of conversational information access systems, we introduce SimLab, the first cloud-based platform to provide a centralized general solution for the community to benchmark both conversational systems and user simulators in a controlled and reproducible environment. We articulate requirements for such a platform and propose a general infrastructure to address these requirements. We then present the design and implementation of an initial version of SimLab and showcase its features with an initial evaluation task of conversational movie recommendation, which is made publicly available. Furthermore, we discuss the sustainability of the platform and its future opportunities. This paper is a call for the community to contribute to the platform to drive progress in the field of conversational information access and user simulation.
Related papers
- Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems [40.09105175322562]
RecInter is a novel agent-based simulation platform for recommender systems. In RecInter, simulated user actions (e.g., likes, reviews, purchases) dynamically update item attributes in real time. Merchant Agents can reply, fostering a more realistic and evolving ecosystem.
arXiv Detail & Related papers (2025-05-22T09:14:23Z)
- YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models [50.86336063222539]
We introduce a novel social simulator called YuLan-OneSim. Users can simply describe and refine their simulation scenarios through natural language interactions with our simulator. We implement 50 default simulation scenarios spanning 8 domains, including economics, sociology, politics, psychology, organization, demographics, law, and communication.
arXiv Detail & Related papers (2025-05-12T14:05:17Z)
- clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations [18.256529559741075]
clem:todd is a framework for systematically evaluating dialogue systems under consistent conditions. It supports plug-and-play integration and ensures uniform datasets, evaluation metrics, and computational constraints. Our results provide actionable insights into how architecture, scale, and prompting strategies affect dialogue performance.
arXiv Detail & Related papers (2025-05-08T17:36:36Z)
- Design of JiuTian Intelligent Network Simulation Platform [16.343389061714973]
The paper introduces the JiuTian Intelligent Network Simulation Platform, which can provide wireless communication simulation data services for the Open Innovation Platform.
The platform contains a series of scalable simulator functionalities, offering open services that enable users to use reinforcement learning algorithms for model training and inference based on simulation environments and data.
arXiv Detail & Related papers (2023-09-28T07:02:39Z)
- User Simulation for Evaluating Information Access Systems [38.48048183731099]
Evaluating the effectiveness of interactive intelligent systems is a complex scientific challenge.
This book provides a thorough understanding of user simulation techniques designed specifically for evaluation.
It covers both general frameworks for designing user simulators, and specific models and algorithms for simulating user interactions with search engines, recommender systems, and conversational assistants.
arXiv Detail & Related papers (2023-06-14T14:54:06Z)
- Information Extraction and Human-Robot Dialogue towards Real-life Tasks: A Baseline Study with the MobileCS Dataset [52.22314870976088]
The SereTOD challenge releases the MobileCS dataset, which consists of real-world dialog transcripts between real users and customer-service staff from China Mobile.
Based on the MobileCS dataset, the SereTOD challenge comprises two tasks: one evaluates the construction of the dialogue system itself, and the other examines information extraction from dialog transcripts.
This paper mainly presents a baseline study of the two tasks with the MobileCS dataset.
arXiv Detail & Related papers (2022-09-27T15:30:43Z)
- Synthetic Data-Based Simulators for Recommender Systems: A Survey [55.60116686945561]
This survey aims to provide a comprehensive overview of recent trends in the field of modeling and simulation.
We start with the motivation behind the development of frameworks that implement simulations, i.e., simulators.
We provide a new consistent classification of existing simulators based on their functionality, approbation, and industrial effectiveness.
arXiv Detail & Related papers (2022-06-22T19:33:21Z)
- Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems [80.77917437785773]
Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation.
We propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates users' analogical thinking in interactions with systems.
We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities.
arXiv Detail & Related papers (2022-04-02T05:11:03Z)
- Recommendation System Simulations: A Discussion of Two Key Challenges [0.0]
Simulations provide an avenue for understanding the impacts of recommendation systems on individuals and society.
This paper discusses two key challenges: first, defining a model of how users select or engage with recommended items, and second, defining a mechanism by which users encounter items that the platform does not recommend to them directly.
arXiv Detail & Related papers (2021-08-25T15:11:38Z)