Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible
Off-Policy Evaluation
- URL: http://arxiv.org/abs/2008.07146v5
- Date: Tue, 26 Oct 2021 08:57:39 GMT
- Title: Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible
Off-Policy Evaluation
- Authors: Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita
- Abstract summary: Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using data generated by a different policy.
There is, however, no real-world public dataset that enables the evaluation of OPE.
We present the Open Bandit Dataset, a public logged bandit dataset collected on ZOZOTOWN, a large-scale fashion e-commerce platform.
- Score: 10.135719343010178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-policy evaluation (OPE) aims to estimate the performance of hypothetical
policies using data generated by a different policy. Because of its huge
potential impact in practice, there has been growing research interest in this
field. There is, however, no real-world public dataset that enables the
evaluation of OPE, making its experimental studies unrealistic and
irreproducible. With the goal of enabling realistic and reproducible OPE
research, we present Open Bandit Dataset, a public logged bandit dataset
collected on a large-scale fashion e-commerce platform, ZOZOTOWN. Our dataset
is unique in that it contains a set of multiple logged bandit datasets
collected by running different policies on the same platform. This enables
experimental comparisons of different OPE estimators for the first time. We
also develop Python software called Open Bandit Pipeline to streamline and
standardize the implementation of batch bandit algorithms and OPE. Our open
data and software will contribute to fair and transparent OPE research and help
the community identify fruitful research directions. We provide extensive
benchmark experiments of existing OPE estimators using our dataset and
software. The results open up essential challenges and new avenues for future
OPE research.
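The OPE setting described in the abstract, estimating a counterfactual policy's value from data logged by a different policy, can be made concrete with a minimal inverse propensity scoring (IPS) sketch, one of the standard estimators such benchmarks compare. This is a generic NumPy illustration, not the Open Bandit Pipeline API; all function and variable names here are invented for the example.

```python
import numpy as np

def ips_estimate(rewards, propensities, target_probs):
    """Inverse propensity scoring (IPS) estimate of a target policy's value.

    rewards:      observed rewards for the logged actions
    propensities: P(logged action | context) under the logging policy
    target_probs: P(logged action | context) under the target policy
    """
    weights = target_probs / propensities  # importance weights
    return float(np.mean(weights * rewards))

# Toy logged data: 4 rounds with binary rewards.
rewards = np.array([1.0, 0.0, 1.0, 0.0])
propensities = np.array([0.5, 0.5, 0.5, 0.5])  # uniform logging policy
target_probs = np.array([0.8, 0.2, 0.2, 0.8])  # target favors the first action

value = ips_estimate(rewards, propensities, target_probs)
```

IPS is unbiased when the logging propensities are known exactly, but its variance grows as the target policy diverges from the logging policy, which is one motivation for the variance-reduction estimators such benchmark studies compare.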
Related papers
- Survey on Datasets for Perception in Unstructured Outdoor Environments [0.0]
We focus on datasets for common perception tasks in field robotics.
This survey categorizes and compares available research datasets.
We believe more consideration should be given to choosing compatible annotation policies across datasets in unstructured outdoor environments.
arXiv Detail & Related papers (2024-04-29T14:49:35Z)
- When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective [64.73162159837956]
Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging.
We propose DataCOPE, a data-centric framework for evaluating a target policy given a dataset.
Our empirical analysis of DataCOPE in the logged contextual bandit settings using healthcare datasets confirms its ability to evaluate both machine-learning and human expert policies.
arXiv Detail & Related papers (2023-11-23T17:13:37Z)
- Packaging code for reproducible research in the public sector [0.0]
The jtstats project consists of R and Python packages for importing, processing, and visualising large and complex datasets.
Jtstats shows how domain specific packages can enable reproducible research within the public sector and beyond.
arXiv Detail & Related papers (2023-05-25T16:07:24Z)
- Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data, both self-supervised and weakly supervised.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z)
- DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Design of Experiments for Stochastic Contextual Linear Bandits [47.804797753836894]
In the linear contextual bandit setting, there exist several minimax procedures for exploration with policies that are reactive to the data being acquired.
We design a single policy to collect a good dataset from which a near-optimal policy can be extracted.
We present a theoretical analysis as well as numerical experiments on both synthetic and real-world datasets.
arXiv Detail & Related papers (2021-07-21T07:25:37Z)
- Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits [5.144809478361604]
We improve the doubly robust (DR) estimator by adaptively weighting observations to control its variance.
We provide empirical evidence for our estimator's improved accuracy and inferential properties relative to existing alternatives.
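The doubly robust (DR) estimator this entry builds on combines a learned reward model with importance weighting. A minimal sketch of the generic textbook DR estimator follows (not the paper's adaptive-weighting variant, and all names are invented for the example):

```python
import numpy as np

def dr_estimate(rewards, weights, q_logged, q_target):
    """Doubly robust (DR) off-policy value estimate.

    rewards:  observed rewards under the logging policy
    weights:  importance weights pi_target(a|x) / pi_logging(a|x)
    q_logged: model-predicted reward for each logged action
    q_target: model-predicted expected reward under the target policy
    """
    # Reward-model baseline, corrected by importance-weighted residuals.
    return float(np.mean(q_target + weights * (rewards - q_logged)))

# Toy logged data for 4 rounds.
rewards  = np.array([1.0, 0.0, 1.0, 0.0])
weights  = np.array([1.6, 0.4, 0.4, 1.6])
q_logged = np.array([0.6, 0.3, 0.7, 0.2])
q_target = np.array([0.5, 0.5, 0.5, 0.5])

value = dr_estimate(rewards, weights, q_logged, q_target)
```

DR is consistent if either the reward model or the propensities are accurate, and the residual correction typically reduces variance relative to plain importance weighting; controlling that variance further is the focus of the adaptive-weighting approach above.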
arXiv Detail & Related papers (2021-06-03T17:54:44Z)
- Benchmarks for Deep Off-Policy Evaluation [152.28569758144022]
We present a collection of policies that can be used for benchmarking off-policy evaluation.
The goal of our benchmark is to provide a standardized measure of progress that is motivated from a set of principles.
We provide open-source access to our data and code to foster future research in this area.
arXiv Detail & Related papers (2021-03-30T18:09:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.