Related papers: User Simulation for Evaluating Information Access Systems

URL: http://arxiv.org/abs/2306.08550v2
Date: Thu, 23 May 2024 19:29:07 GMT
Title: User Simulation for Evaluating Information Access Systems
Authors: Krisztian Balog, ChengXiang Zhai
Abstract summary: Evaluating the effectiveness of interactive intelligent systems is a complex scientific challenge. This book provides a thorough understanding of user simulation techniques designed specifically for evaluation. It covers both general frameworks for designing user simulators and specific models and algorithms for simulating user interactions with search engines, recommender systems, and conversational assistants.
Score: 38.48048183731099
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Information access systems, such as search engines, recommender systems, and conversational assistants, have become integral to our daily lives as they help us satisfy our information needs. However, evaluating the effectiveness of these systems presents a long-standing and complex scientific challenge. This challenge is rooted in the difficulty of assessing a system's overall effectiveness in assisting users to complete tasks through interactive support, and further exacerbated by the substantial variation in user behaviour and preferences. To address this challenge, user simulation emerges as a promising solution. This book focuses on providing a thorough understanding of user simulation techniques designed specifically for evaluation purposes. We begin with a background of information access system evaluation and explore the diverse applications of user simulation. Subsequently, we systematically review the major research progress in user simulation, covering general frameworks for designing user simulators, the use of user simulation for evaluation, and specific models and algorithms for simulating user interactions with search engines, recommender systems, and conversational assistants. Realizing that user simulation is an interdisciplinary research topic, whenever possible, we attempt to establish connections with related fields, including machine learning, dialogue systems, user modeling, and economics. We end the book with a detailed discussion of important future research directions, many of which extend beyond the evaluation of information access systems and are expected to have broader impact on how to evaluate interactive intelligent systems in general.

Related papers

SimLab: A Platform for Simulation-based Evaluation of Conversational Information Access Systems [33.48172339249859]
We introduce SimLab, the first cloud-based platform to benchmark both conversational systems and user simulators in a controlled and reproducible environment. We present the design and implementation of an initial version of SimLab and showcase its features with an initial evaluation task of conversational movie recommendation. This paper is a call for the community to contribute to the platform to drive progress in the field of conversational information access and user simulation.
arXiv Detail & Related papers (2025-07-07T11:19:28Z)
Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User [117.82681846559909]
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations. We propose a generative reward model based simulated user, named GRSU, for automatic interaction with CRSs.
arXiv Detail & Related papers (2025-04-29T06:37:30Z)
Proactive User Information Acquisition via Chats on User-Favored Topics [3.6698472838681893]
This study proposes the PIVOT task, designed to advance the technical foundation for these systems. We found that even recent large language models (LLMs) show a low success rate in the PIVOT task. We developed a simple but effective system for this task by incorporating insights obtained through the analysis of this dataset.
arXiv Detail & Related papers (2025-04-10T12:32:16Z)
A Survey on (M)LLM-Based GUI Agents [62.57899977018417]
Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction. Recent advances in large language models and multimodal learning have revolutionized GUI automation across desktop, mobile, and web platforms. This survey identifies key technical challenges, including accurate element localization, effective knowledge retrieval, long-horizon planning, and safety-aware execution control.
arXiv Detail & Related papers (2025-03-27T17:58:31Z)
User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation [38.48048183731099]
User simulation is an emerging interdisciplinary topic with multiple critical applications in the era of Generative AI. It involves creating an intelligent agent that mimics the actions of a human user interacting with an AI system. User simulation has profound implications for diverse fields and plays a vital role in the pursuit of Artificial General Intelligence.
arXiv Detail & Related papers (2025-01-08T10:49:13Z)
Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems [2.788542465279969]
This paper introduces DAUS, a Domain-Aware User Simulator. We fine-tune DAUS on real examples of task-oriented dialogues. Results on two relevant benchmarks showcase significant improvements in terms of user goal fulfillment.
arXiv Detail & Related papers (2024-02-20T20:57:47Z)
User Behavior Simulation with Large Language Model based Agents [116.74368915420065]
We propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors. Based on extensive experiments, we find that the behaviors simulated by our method are very close to those of real humans.
arXiv Detail & Related papers (2023-06-05T02:58:35Z)
Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications. It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data. We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z)
Synthetic Data-Based Simulators for Recommender Systems: A Survey [55.60116686945561]
This survey aims at providing a comprehensive overview of the recent trends in the field of modeling and simulation. We start with the motivation behind the development of frameworks implementing the simulations -- simulators. We provide a new consistent classification of existing simulators based on their functionality, approbation, and industrial effectiveness.
arXiv Detail & Related papers (2022-06-22T19:33:21Z)
Use-Case-Grounded Simulations for Explanation Evaluation [23.584251632331046]
We introduce Use-Case-Grounded Simulated Evaluations (SimEvals). SimEvals involve training algorithmic agents that take as input the information content that would be presented to each participant in a human subject study. We run a comprehensive evaluation on three real-world use cases to demonstrate that SimEvals can effectively identify which explanation methods will help humans for each use case.
arXiv Detail & Related papers (2022-06-05T20:12:19Z)
Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems [80.77917437785773]
Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation. We propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates a user's analogical thinking in interactions with systems. We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities.
arXiv Detail & Related papers (2022-04-02T05:11:03Z)
Learning User-Interpretable Descriptions of Black-Box AI System Capabilities [9.608555640607731]
This paper presents an approach for learning user-interpretable symbolic descriptions of the limits and capabilities of a black-box AI system. It uses a hierarchical active querying paradigm to generate questions and to learn a user-interpretable model of the AI system based on its responses.
arXiv Detail & Related papers (2021-07-28T23:33:31Z)
Micro-entries: Encouraging Deeper Evaluation of Mental Models Over Time for Interactive Data Systems [7.578368459974474]
We discuss the evaluation of users' mental models of system logic. Mental models are challenging to capture and analyze. By asking users to describe what they know and how they know it, researchers can collect structured, time-ordered insight.
arXiv Detail & Related papers (2020-09-02T18:27:04Z)
Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions. These inferences can be made regardless of prior knowledge and across different types of user behavior. We introduce Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
Large-scale Hybrid Approach for Predicting User Satisfaction with Conversational Agents [28.668681892786264]
Measuring user satisfaction is a challenging task and a critical component in developing large-scale conversational agent systems. Human-annotation-based approaches are easier to control, but hard to scale. A novel alternative approach is to collect users' direct feedback via a feedback elicitation system embedded in the conversational agent system.
arXiv Detail & Related papers (2020-05-29T16:29:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.