Imitate TheWorld: A Search Engine Simulation Platform
- URL: http://arxiv.org/abs/2107.07693v1
- Date: Fri, 16 Jul 2021 03:55:33 GMT
- Title: Imitate TheWorld: A Search Engine Simulation Platform
- Authors: Yongqing Gao, Guangda Huzhang, Weijie Shen, Yawen Liu, Wen-Ji Zhou,
Qing Da, Dan Shen, Yang Yu
- Abstract summary: We build AESim, a simulated search engine that gives feedback on generated pages through a well-trained discriminator.
Unlike previous simulation platforms, which lose their connection with the real world, ours depends on real data from AliExpress Search.
Our experiments also show that AESim reflects the online performance of ranking models better than classic ranking metrics do.
- Score: 13.011052642314421
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent E-commerce applications benefit from the growth of deep learning
techniques. However, we notice that many works attempt to maximize business
objectives by closely matching offline labels that follow the supervised
learning paradigm. This results in models that achieve high offline performance
in terms of Area Under Curve (AUC) and Normalized Discounted Cumulative Gain
(NDCG) but cannot consistently increase revenue metrics such as users' purchase
amounts. To address these issues, we build AESim, a simulated search engine
that gives proper feedback on generated pages through a well-trained
discriminator and thereby acts as a dynamic dataset. Unlike previous simulation
platforms, which lose their connection with the real world, ours depends on
real data from AliExpress Search: we use adversarial learning to generate
virtual users and use Generative Adversarial Imitation Learning (GAIL) to
capture the behavior patterns of users. Our experiments also show that AESim
reflects the online performance of ranking models better than classic ranking
metrics do, implying that AESim can serve as a surrogate for AliExpress Search
and evaluate models without going online.
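The paper does not publish AESim's implementation, but the GAIL mechanic it describes can be sketched: a discriminator learns to separate real user behavior on real pages from simulated behavior on generated pages, and its score doubles as the reward that trains the virtual-user policy. In the minimal sketch below, all network shapes, names, and the action space are illustrative assumptions, not the paper's architecture.

```python
# Minimal GAIL sketch (hypothetical shapes and names; AESim's code is unpublished).
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 16, 4  # assumed: page-feature and user-action sizes

policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                       nn.Linear(64, ACTION_DIM))            # virtual user
discriminator = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
                              nn.Linear(64, 1))              # real vs. simulated
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
p_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def sample_user_actions(pages):
    """Sample user actions (e.g., click/skip/buy) from the virtual user."""
    dist = torch.distributions.Categorical(logits=policy(pages))
    actions = dist.sample()
    return F.one_hot(actions, ACTION_DIM).float(), dist.log_prob(actions)

for step in range(1000):
    # Stand-ins for real logged behavior and generated pages.
    real_pages = torch.randn(64, STATE_DIM)
    real_acts = F.one_hot(torch.randint(ACTION_DIM, (64,)), ACTION_DIM).float()
    gen_pages = torch.randn(64, STATE_DIM)
    gen_acts, _ = sample_user_actions(gen_pages)

    # 1) Discriminator update: real pairs -> 1, simulated pairs -> 0.
    d_loss = (bce(discriminator(torch.cat([real_pages, real_acts], -1)),
                  torch.ones(64, 1)) +
              bce(discriminator(torch.cat([gen_pages, gen_acts], -1)),
                  torch.zeros(64, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Policy update: a higher discriminator score means the simulated
    #    behavior looks more real, so it serves as the reward.
    gen_acts, log_prob = sample_user_actions(gen_pages)
    with torch.no_grad():
        reward = torch.sigmoid(
            discriminator(torch.cat([gen_pages, gen_acts], -1))).squeeze(-1)
    p_loss = -(log_prob * reward).mean()                     # REINFORCE-style step
    p_opt.zero_grad(); p_loss.backward(); p_opt.step()
```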
Related papers
- Online Bandit Learning with Offline Preference Data [15.799929216215672]
We propose a posterior sampling algorithm for online learning that can be warm-started with an offline dataset with noisy preference feedback.
We show that by modeling the 'competence' of the expert that generated the data, such a dataset can be used most effectively.
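A minimal illustration of this warm-starting idea, under assumed details (Bernoulli arms, Beta posteriors, and a scalar 'competence' weight; the paper's actual algorithm and preference model differ):

```python
# Illustrative sketch: warm-starting Thompson sampling with noisy offline
# preference data, down-weighted by an assumed expert competence.
import numpy as np

rng = np.random.default_rng(0)
K = 5                                  # number of arms
true_means = rng.uniform(0.2, 0.8, K)  # hidden reward probabilities
alpha, beta = np.ones(K), np.ones(K)   # Beta(1, 1) priors

# Offline preferences: (winner, loser) pairs from an imperfect expert.
competence = 0.8                       # assumed probability the expert is right
best = int(np.argmax(true_means))
offline_prefs = [(best, j) for j in range(K) if j != best]
for winner, loser in offline_prefs:
    alpha[winner] += competence        # soft evidence, scaled by competence
    beta[loser] += competence

# Online phase: standard Thompson sampling from the warm-started posterior.
for t in range(500):
    arm = int(np.argmax(rng.beta(alpha, beta)))
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior means:", (alpha / (alpha + beta)).round(2))
```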
arXiv Detail & Related papers (2024-06-13T20:25:52Z)
- BASES: Large-scale Web Search User Simulation with Large Language Model based Agents [108.97507653131917]
BASES is a novel user simulation framework based on large language model (LLM) agents.
Our simulation framework can generate unique user profiles at scale, which subsequently leads to diverse search behaviors.
WARRIORS is a new large-scale dataset encompassing web search user behaviors, including both Chinese and English versions.
arXiv Detail & Related papers (2024-02-27T13:44:09Z)
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
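For readers unfamiliar with model-based value expansion, here is a generic h-step sketch (not MOTO's exact implementation; all shapes and networks below are placeholders): roll a learned dynamics model forward a few steps, accumulate predicted rewards, and bootstrap the tail with a learned value function.

```python
# Hedged sketch of h-step model-based value expansion (generic form).
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, H, GAMMA = 8, 2, 3, 0.99  # assumed sizes / horizon

dynamics = nn.Linear(STATE_DIM + ACT_DIM, STATE_DIM)   # s' = f(s, a)
reward_fn = nn.Linear(STATE_DIM + ACT_DIM, 1)          # r  = r(s, a)
value_fn = nn.Linear(STATE_DIM, 1)                     # V(s)
policy = nn.Linear(STATE_DIM, ACT_DIM)                 # a  = pi(s)

def expanded_value(state: torch.Tensor) -> torch.Tensor:
    """H-step model rollout plus a bootstrapped value at the horizon."""
    total, discount = torch.zeros(state.shape[0], 1), 1.0
    for _ in range(H):
        action = torch.tanh(policy(state))
        sa = torch.cat([state, action], dim=-1)
        total = total + discount * reward_fn(sa)
        state = dynamics(sa)            # imagined next state
        discount *= GAMMA
    return total + discount * value_fn(state)

batch = torch.randn(32, STATE_DIM)      # e.g., states drawn from prior data
print(expanded_value(batch).shape)      # torch.Size([32, 1])
```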
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a user simulator for recommendation based on Large Language Models.
Each agent interacts with personalized recommender models in a page-by-page manner.
arXiv Detail & Related papers (2023-10-16T06:41:16Z)
- Automatic Music Playlist Generation via Simulation-based Reinforcement Learning [17.628525710776877]
Personalization of playlists is a common feature in music streaming services.
We present a reinforcement learning framework that optimizes user satisfaction metrics using a simulated playlist-generation environment.
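A toy version of this simulation loop, with an entirely invented user model and a standard gradient-bandit update standing in for the paper's agent:

```python
# Toy sketch: a simulated user scores recommended tracks, and the agent
# learns preferences from that simulated satisfaction signal.
import numpy as np

rng = np.random.default_rng(1)
N_TRACKS = 20
user_affinity = rng.random(N_TRACKS)          # hidden simulated user model
preferences = np.zeros(N_TRACKS)              # agent's learned track scores
baseline = 0.0                                # running average reward

def simulated_user_feedback(track: int) -> float:
    """Stochastic satisfaction signal from the simulated user."""
    return float(rng.random() < user_affinity[track])

for episode in range(2000):
    # Softmax exploration over learned track scores.
    probs = np.exp(preferences - preferences.max())
    probs /= probs.sum()
    track = rng.choice(N_TRACKS, p=probs)
    reward = simulated_user_feedback(track)
    baseline += 0.01 * (reward - baseline)
    grad = -probs                              # gradient for unchosen tracks
    grad[track] += 1.0                         # and for the chosen track
    preferences += 0.1 * (reward - baseline) * grad  # gradient-bandit update

print("top tracks:", np.argsort(-preferences)[:5])
```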
arXiv Detail & Related papers (2023-10-13T14:13:02Z)
- Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning [23.638418776700522]
We propose a new model named the causal decision transformer for recommender systems (CDT4Rec).
CDT4Rec is an offline reinforcement learning system that can learn from a dataset rather than from online interaction.
To demonstrate the feasibility and superiority of our model, we have conducted experiments on six real-world offline datasets and one online simulator.
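The decision-transformer input convention that CDT4Rec builds on can be shown in a few lines (the causal-estimation component of CDT4Rec is omitted, and the logged episode below is made up): logged interactions become sequences of (return-to-go, state, action) tokens, so action prediction is conditioned on the desired future return.

```python
# Sketch of decision-transformer-style input preparation from logged data.
import numpy as np

# Hypothetical logged episode: per-step rewards, state ids, chosen item ids.
rewards = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
states = np.array([3, 7, 7, 2, 9])
actions = np.array([12, 5, 5, 31, 8])

# Return-to-go at step t is the sum of rewards from t onward.
rtg = np.cumsum(rewards[::-1])[::-1]

# Interleave tokens as the sequence model would consume them.
for t in range(len(rewards)):
    print(f"t={t}: (rtg={rtg[t]:.1f}, state={states[t]}, action={actions[t]})")
```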
arXiv Detail & Related papers (2023-04-17T00:05:52Z)
- Towards Data-Driven Offline Simulations for Online Reinforcement Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
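One simple way to operationalize 'fidelity', sketched under assumptions (the paper's protocol is more involved): run the same learner against the real and the simulated environment and compare the resulting learning curves.

```python
# Toy fidelity check: mean absolute gap between real and simulated
# learning curves (both curves below are synthetic stand-ins).
import numpy as np

rng = np.random.default_rng(2)
steps = np.arange(1, 101)

# Stand-in learning curves (in practice: measured returns over training).
real_curve = 1 - np.exp(-steps / 30) + rng.normal(0, 0.02, 100)
sim_curve = 1 - np.exp(-steps / 25) + rng.normal(0, 0.02, 100)

# Fidelity proxy: lower gap = the simulation better predicts learning.
fidelity_gap = np.abs(real_curve - sim_curve).mean()
print(f"mean learning-curve gap: {fidelity_gap:.3f}")
```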
arXiv Detail & Related papers (2022-11-14T18:36:13Z)
- Offline Reinforcement Learning Hands-On [60.36729294485601]
Offline RL aims to turn large datasets into powerful decision-making engines without any online interactions with the environment.
This work aims to reflect upon these efforts from a practitioner viewpoint.
We experimentally validate that diversity and high-return examples in the data are crucial to the success of offline RL.
arXiv Detail & Related papers (2020-11-29T14:45:02Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
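The kind of check this line of work performs can be reproduced in miniature: rank models by an offline metric and by (simulated) online reward, then measure rank agreement. The scores below are placeholders, not the paper's data.

```python
# Rank agreement between an offline metric and simulated online reward.
from scipy.stats import spearmanr

models = ["model_a", "model_b", "model_c", "model_d", "model_e"]
offline_ndcg = [0.71, 0.74, 0.69, 0.80, 0.77]   # hypothetical offline scores
online_reward = [0.42, 0.45, 0.40, 0.44, 0.49]  # hypothetical simulated reward

rho, pval = spearmanr(offline_ndcg, online_reward)
print(f"Spearman rho = {rho:.2f} (p = {pval:.2f})")
for name, ndcg, rew in sorted(zip(models, offline_ndcg, online_reward),
                              key=lambda x: -x[2]):
    print(f"{name}: NDCG={ndcg:.2f}, online={rew:.2f}")
```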
arXiv Detail & Related papers (2020-11-07T01:41:13Z)
- AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online [60.887637616379926]
This paper proposes an evaluator-generator framework for learning-to-rank.
It consists of an evaluator that generalizes to evaluate recommendations involving the context, and a generator that maximizes the evaluator score by reinforcement learning.
Our method achieves a significant improvement in terms of Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.
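A minimal sketch of the evaluator-generator loop under assumed architectures (both networks below are stand-ins, not the paper's models): a frozen evaluator scores a sampled ranking, and the generator is updated with REINFORCE to raise that score.

```python
# Evaluator-generator sketch: Plackett-Luce ranking sampled from generator
# scores, rewarded by a frozen evaluator's position-discounted page score.
import torch
import torch.nn as nn

N_ITEMS, FEAT = 10, 6
items = torch.randn(N_ITEMS, FEAT)                 # candidate item features

evaluator = nn.Linear(FEAT, 1)                     # frozen evaluator (stand-in)
generator = nn.Linear(FEAT, 1)                     # scores items for ranking
opt = torch.optim.Adam(generator.parameters(), lr=1e-2)

def page_score(order: torch.Tensor) -> torch.Tensor:
    """Evaluator's score for a page: position-discounted item scores."""
    discounts = 1.0 / torch.log2(torch.arange(len(order)) + 2.0)
    with torch.no_grad():
        return (evaluator(items[order]).squeeze(-1) * discounts).sum()

for step in range(200):
    # Sample a ranking sequentially without replacement (Plackett-Luce).
    logits = generator(items).squeeze(-1)
    order, log_prob = [], 0.0
    mask = torch.zeros(N_ITEMS, dtype=torch.bool)
    for _ in range(N_ITEMS):
        masked = logits.masked_fill(mask, float("-inf"))
        dist = torch.distributions.Categorical(logits=masked)
        pick = dist.sample()
        log_prob = log_prob + dist.log_prob(pick)
        order.append(pick.item())
        mask[pick] = True
    reward = page_score(torch.tensor(order))
    loss = -log_prob * reward                      # REINFORCE on evaluator reward
    opt.zero_grad(); loss.backward(); opt.step()
```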
arXiv Detail & Related papers (2020-03-25T10:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.