Adaptive Experimentation with Delayed Binary Feedback
- URL: http://arxiv.org/abs/2202.00846v1
- Date: Wed, 2 Feb 2022 01:47:10 GMT
- Title: Adaptive Experimentation with Delayed Binary Feedback
- Authors: Zenan Wang, Carlos Carrion, Xiliang Lin, Fuhua Ji, Yongjun Bao,
Weipeng Yan
- Abstract summary: This paper presents an adaptive experimentation solution tailored for delayed binary feedback objectives.
It estimates the real underlying objectives before they materialize and dynamically allocates variants based on the estimates.
This solution is currently deployed in the online experimentation platform of JD.com.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Conducting experiments with objectives that take significant delays to
materialize (e.g. conversions, add-to-cart events, etc.) is challenging.
Although the classical "split sample testing" is still valid for the delayed
feedback, the experiment will take longer to complete, which also means
spending more resources on worse-performing strategies due to their fixed
allocation schedules. Alternatively, adaptive approaches such as "multi-armed
bandits" are able to effectively reduce the cost of experimentation. But these
methods generally cannot handle delayed objectives directly out of the box.
This paper presents an adaptive experimentation solution tailored for delayed
binary feedback objectives by estimating the real underlying objectives before
they materialize and dynamically allocating variants based on the estimates.
Experiments show that the proposed method is more efficient for delayed
feedback compared to various other approaches and is robust in different
settings. In addition, we describe an experimentation product powered by this
algorithm. This product is currently deployed in the online experimentation
platform of JD.com, a large e-commerce company and a publisher of digital ads.
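The abstract's core idea — correct the conversions observed so far for the fraction that has not yet materialized, then allocate traffic with a bandit on the corrected estimates — can be sketched roughly as follows. The exponential delay model, the `delay_rate` parameter, and the Beta-posterior Thompson sampler are illustrative assumptions for this sketch, not the paper's exact estimator.

```python
import math
import random

def expected_matured_fraction(ages_hours, delay_rate=0.1):
    """Fraction of eventual conversions expected to have materialized by now,
    assuming an exponential delay distribution (an illustrative choice)."""
    if not ages_hours:
        return 1.0
    return sum(1 - math.exp(-delay_rate * a) for a in ages_hours) / len(ages_hours)

def thompson_allocate(arms):
    """Pick a variant by Thompson sampling on delay-corrected Beta posteriors.
    Each arm is a dict with observed conversions, exposure count, and the
    ages (in hours) of its assignments."""
    best, best_draw = None, -1.0
    for name, arm in arms.items():
        matured = expected_matured_fraction(arm["ages_hours"])
        # Inflate observed successes to estimate eventual conversions.
        est_conversions = arm["conversions"] / max(matured, 1e-6)
        est_conversions = min(est_conversions, arm["n"])  # cannot exceed exposures
        draw = random.betavariate(1 + est_conversions, 1 + arm["n"] - est_conversions)
        if draw > best_draw:
            best, best_draw = name, draw
    return best
```

Because the estimated eventual conversions feed a Beta posterior, a better-performing variant quickly receives more traffic even while most conversions are still pending.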
Related papers
- Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments [18.721012607370977]
Decision-makers wish to optimize for long-term treatment effects of the system changes.
We describe a novel approach that combines fast experiments (e.g., biased experiments run only for a few hours or days) with long-running, slow experiments.
arXiv Detail & Related papers (2025-06-23T15:18:54Z)
- Active Test-time Vision-Language Navigation [60.69722522420299]
ATENA is a test-time active learning framework that enables practical human-robot interaction via episodic feedback on uncertain navigation outcomes.
In particular, ATENA learns to increase certainty in successful episodes and decrease it in failed ones, improving uncertainty calibration.
In addition, we propose a self-active learning strategy that enables an agent to evaluate its navigation outcomes based on confident predictions.
arXiv Detail & Related papers (2025-06-07T02:24:44Z)
- Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative for improving performance.
We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
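The two ingredients named above — Best-of-N sampling and verifier-guided iterative refinement — can be sketched with caller-supplied stand-ins for the generator, refiner, and verifier. All function names here are illustrative assumptions, not the paper's API.

```python
def best_of_n(generate, score, prompt, n=4):
    """Best-of-N sampling: draw n candidates and keep the one the
    verifier (`score`) rates highest."""
    return max((generate(prompt) for _ in range(n)), key=score)

def iterative_decode(generate, refine, score, prompt, n=4, rounds=3):
    """A rough sketch of verifier-guided iterative decoding in the spirit
    of IAD: each round, sample candidates, keep the best under the
    verifier, and use it to seed the next round's refinements."""
    best = max((generate(prompt) for _ in range(n)), key=score)
    for _ in range(rounds - 1):
        candidates = [refine(prompt, best) for _ in range(n)]
        candidates.append(best)  # never discard the current best candidate
        best = max(candidates, key=score)
    return best
```

Keeping the current best in every candidate pool guarantees the verifier score is non-decreasing across rounds.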
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Dual Test-time Training for Out-of-distribution Recommender System [91.15209066874694]
We propose a novel Dual Test-Time-Training framework for OOD Recommendation, termed DT3OR.
In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model.
To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy.
arXiv Detail & Related papers (2024-07-22T13:27:51Z)
- Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
arXiv Detail & Related papers (2024-06-15T20:54:48Z)
- Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification [9.030753181146176]
We propose a unified model that simultaneously accounts for within-experiment performance and post-experiment outcomes.
We show that substantial reductions in experiment duration can often be achieved with minimal impact on both within-experiment and post-experiment regret.
arXiv Detail & Related papers (2024-02-16T11:27:48Z)
- Unraveling Batch Normalization for Realistic Test-Time Adaptation [22.126177142716188]
This paper delves into the problem of mini-batch degradation.
By unraveling batch normalization, we discover that the inexact target statistics largely stem from the substantially reduced class diversity in batch.
We introduce a straightforward tool, Test-time Exponential Moving Average (TEMA), to bridge the class diversity gap between training and testing batches.
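The test-time exponential-moving-average idea can be sketched as blending each incoming test batch's statistics into running estimates, rather than normalizing with a small, class-imbalanced batch alone. The smoothing factor `alpha` and the plain per-feature mean/variance update are assumptions of this sketch, not TEMA's exact formulation.

```python
import numpy as np

def tema_update(running_mean, running_var, batch, alpha=0.1):
    """Blend a test batch's mean/variance into running estimates via an
    exponential moving average (simplified sketch of the TEMA idea)."""
    new_mean = (1 - alpha) * running_mean + alpha * batch.mean(axis=0)
    new_var = (1 - alpha) * running_var + alpha * batch.var(axis=0)
    return new_mean, new_var

def normalize(batch, mean, var, eps=1e-5):
    """Normalize a batch with the smoothed statistics."""
    return (batch - mean) / np.sqrt(var + eps)
```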
arXiv Detail & Related papers (2023-12-15T01:52:35Z)
- Search Strategies for Self-driving Laboratories with Pending Experiments [4.416701099409113]
Self-driving laboratories (SDLs) consist of multiple stations that perform material synthesis and characterisation tasks.
It is practical to run experiments in asynchronous parallel, in which multiple experiments are being performed at once in different stages.
We build a simulator for a multi-stage SDL and compare optimisation strategies for dealing with delayed feedback and asynchronous parallelized operation.
arXiv Detail & Related papers (2023-12-06T12:41:53Z)
- Task-specific experimental design for treatment effect estimation [59.879567967089145]
Large randomised controlled trials (RCTs) are the standard for causal inference.
Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adaptable to the downstream application for which the causal effect is sought.
We develop a task-specific approach to experimental design and derive sampling strategies customised to particular downstream applications.
arXiv Detail & Related papers (2023-06-08T18:10:37Z)
- Evaluation of Test-Time Adaptation Under Computational Time Constraints [80.40939405129102]
Test Time Adaptation (TTA) methods leverage unlabeled data at test time to adapt to distribution shifts.
Current evaluation protocols overlook the effect of this extra cost, affecting their real-world applicability.
We propose a more realistic evaluation protocol for TTA methods, where data is received in an online fashion from a constant-speed data stream.
arXiv Detail & Related papers (2023-04-10T18:01:47Z)
- Clustering-based Imputation for Dropout Buyers in Large-scale Online Experimentation [4.753069295451989]
In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and enhance the decision-making process.
In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers.
For the analysis of incomplete metrics, we propose a clustering-based imputation method using $k$-nearest neighbors.
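A plain $k$-nearest-neighbor imputation step can be sketched as follows: estimate a missing metric value (e.g., purchase amount) for a user from the $k$ most similar users with complete data. This version uses Euclidean distance and a simple average for illustration; the paper's clustering step is omitted here.

```python
import numpy as np

def knn_impute(X, y, x_missing, k=3):
    """Impute a missing metric for one user by averaging the metric of the
    k nearest complete users. X: (n, d) feature matrix of complete users,
    y: (n,) their observed metric, x_missing: (d,) the incomplete user."""
    dists = np.linalg.norm(X - x_missing, axis=1)   # distance to each complete user
    nearest = np.argsort(dists)[:k]                 # indices of the k closest
    return float(np.mean(y[nearest]))
```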
arXiv Detail & Related papers (2022-09-09T01:05:53Z)
- Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design [11.414086057582324]
We introduce Deep Adaptive Design (DAD), a method for amortizing the cost of performing sequential adaptive experiments.
We demonstrate that DAD successfully amortizes the process of experimental design, outperforming alternative strategies on a number of problems.
arXiv Detail & Related papers (2021-03-03T14:43:48Z)
- Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.