Adaptive Experimentation with Delayed Binary Feedback
- URL: http://arxiv.org/abs/2202.00846v1
- Date: Wed, 2 Feb 2022 01:47:10 GMT
- Title: Adaptive Experimentation with Delayed Binary Feedback
- Authors: Zenan Wang, Carlos Carrion, Xiliang Lin, Fuhua Ji, Yongjun Bao,
Weipeng Yan
- Abstract summary: This paper presents an adaptive experimentation solution tailored for delayed binary feedback objectives.
It estimates the real underlying objectives before they materialize and dynamically allocates variants based on the estimates.
This solution is currently deployed in the online experimentation platform of JD.com.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Conducting experiments with objectives that take significant delays to
materialize (e.g. conversions, add-to-cart events, etc.) is challenging.
Although the classical "split sample testing" is still valid for the delayed
feedback, the experiment will take longer to complete, which also means
spending more resources on worse-performing strategies due to their fixed
allocation schedules. Alternatively, adaptive approaches such as "multi-armed
bandits" are able to effectively reduce the cost of experimentation. But these
methods generally cannot handle delayed objectives directly out of the box.
This paper presents an adaptive experimentation solution tailored for delayed
binary feedback objectives by estimating the real underlying objectives before
they materialize and dynamically allocating variants based on the estimates.
Experiments show that the proposed method is more efficient for delayed
feedback compared to various other approaches and is robust in different
settings. In addition, we describe an experimentation product powered by this
algorithm. This product is currently deployed in the online experimentation
platform of JD.com, a large e-commerce company and a publisher of digital ads.
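The abstract's core idea — correct the conversions observed so far for the fraction that has not yet materialized, then allocate traffic with a bandit on the corrected estimates — can be sketched roughly as follows. The exponential delay model, the `delay_rate` parameter, and the Beta-posterior Thompson sampler are illustrative assumptions for this sketch, not the paper's exact estimator.

```python
import math
import random

def expected_matured_fraction(ages_hours, delay_rate=0.1):
    """Fraction of eventual conversions expected to have materialized by now,
    assuming an exponential delay distribution (an illustrative choice)."""
    if not ages_hours:
        return 1.0
    return sum(1 - math.exp(-delay_rate * a) for a in ages_hours) / len(ages_hours)

def thompson_allocate(arms):
    """Pick a variant by Thompson sampling on delay-corrected Beta posteriors.
    Each arm is a dict with observed conversions, exposure count, and the
    ages (in hours) of its assignments."""
    best, best_draw = None, -1.0
    for name, arm in arms.items():
        matured = expected_matured_fraction(arm["ages_hours"])
        # Inflate observed successes to estimate eventual conversions.
        est_conversions = arm["conversions"] / max(matured, 1e-6)
        est_conversions = min(est_conversions, arm["n"])  # cannot exceed exposures
        draw = random.betavariate(1 + est_conversions, 1 + arm["n"] - est_conversions)
        if draw > best_draw:
            best, best_draw = name, draw
    return best
```

Because the estimated eventual conversions feed a Beta posterior, a better-performing variant quickly receives more traffic even while most conversions are still pending.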
Related papers
- Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments [18.721012607370977]
Decision-makers wish to optimize for long-term treatment effects of the system changes.
We describe a novel approach that combines fast experiments (e.g., biased experiments run only for a few hours or days) with long-running, slow experiments.
arXiv Detail & Related papers (2025-06-23T15:18:54Z)
- Active Test-time Vision-Language Navigation [60.69722522420299]
ATENA is a test-time active learning framework that enables practical human-robot interaction via episodic feedback on uncertain navigation outcomes.
In particular, ATENA learns to increase certainty in successful episodes and decrease it in failed ones, improving uncertainty calibration.
In addition, we propose a self-active learning strategy that enables an agent to evaluate its navigation outcomes based on confident predictions.
arXiv Detail & Related papers (2025-06-07T02:24:44Z)
- Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative for improving performance.
We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
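The two ingredients named above — Best-of-N sampling and verifier-guided iterative refinement — can be sketched with caller-supplied stand-ins for the generator, refiner, and verifier. All function names here are illustrative assumptions, not the paper's API.

```python
def best_of_n(generate, score, prompt, n=4):
    """Best-of-N sampling: draw n candidates and keep the one the
    verifier (`score`) rates highest."""
    return max((generate(prompt) for _ in range(n)), key=score)

def iterative_decode(generate, refine, score, prompt, n=4, rounds=3):
    """A rough sketch of verifier-guided iterative decoding in the spirit
    of IAD: each round, sample candidates, keep the best under the
    verifier, and use it to seed the next round's refinements."""
    best = max((generate(prompt) for _ in range(n)), key=score)
    for _ in range(rounds - 1):
        candidates = [refine(prompt, best) for _ in range(n)]
        candidates.append(best)  # never discard the current best candidate
        best = max(candidates, key=score)
    return best
```

Keeping the current best in every candidate pool guarantees the verifier score is non-decreasing across rounds.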
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Dual Test-time Training for Out-of-distribution Recommender System [91.15209066874694]
We propose a novel Dual Test-Time-Training framework for OOD Recommendation, termed DT3OR.
In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model.
To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy.
arXiv Detail & Related papers (2024-07-22T13:27:51Z)
- Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
arXiv Detail & Related papers (2024-06-15T20:54:48Z)
- Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification [9.030753181146176]
We propose a unified model that simultaneously accounts for within-experiment performance and post-experiment outcomes.
We show that substantial reductions in experiment duration can often be achieved with minimal impact on both within-experiment and post-experiment regret.
arXiv Detail & Related papers (2024-02-16T11:27:48Z)
- Unraveling Batch Normalization for Realistic Test-Time Adaptation [22.126177142716188]
This paper delves into the problem of mini-batch degradation.
By unraveling batch normalization, we discover that the inexact target statistics largely stem from the substantially reduced class diversity in batch.
We introduce a straightforward tool, Test-time Exponential Moving Average (TEMA), to bridge the class diversity gap between training and testing batches.
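The test-time exponential-moving-average idea can be sketched as blending each incoming test batch's statistics into running estimates, rather than normalizing with a small, class-imbalanced batch alone. The smoothing factor `alpha` and the plain per-feature mean/variance update are assumptions of this sketch, not TEMA's exact formulation.

```python
import numpy as np

def tema_update(running_mean, running_var, batch, alpha=0.1):
    """Blend a test batch's mean/variance into running estimates via an
    exponential moving average (simplified sketch of the TEMA idea)."""
    new_mean = (1 - alpha) * running_mean + alpha * batch.mean(axis=0)
    new_var = (1 - alpha) * running_var + alpha * batch.var(axis=0)
    return new_mean, new_var

def normalize(batch, mean, var, eps=1e-5):
    """Normalize a batch with the smoothed statistics."""
    return (batch - mean) / np.sqrt(var + eps)
```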
arXiv Detail & Related papers (2023-12-15T01:52:35Z)
- Search Strategies for Self-driving Laboratories with Pending Experiments [4.416701099409113]
Self-driving laboratories (SDLs) consist of multiple stations that perform material synthesis and characterisation tasks.
It is practical to run experiments in asynchronous parallel, in which multiple experiments are being performed at once in different stages.
We build a simulator for a multi-stage SDL and compare optimisation strategies for dealing with delayed feedback and asynchronous parallelized operation.
arXiv Detail & Related papers (2023-12-06T12:41:53Z)
- Task-specific experimental design for treatment effect estimation [59.879567967089145]
Large randomised controlled trials (RCTs) are the standard for causal inference.
Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adaptable to the downstream application for which the causal effect is sought.
We develop a task-specific approach to experimental design and derive sampling strategies customised to particular downstream applications.
arXiv Detail & Related papers (2023-06-08T18:10:37Z)
- Evaluation of Test-Time Adaptation Under Computational Time Constraints [80.40939405129102]
Test Time Adaptation (TTA) methods leverage unlabeled data at test time to adapt to distribution shifts.
Current evaluation protocols overlook the effect of this extra cost, affecting their real-world applicability.
We propose a more realistic evaluation protocol for TTA methods, where data is received in an online fashion from a constant-speed data stream.
arXiv Detail & Related papers (2023-04-10T18:01:47Z)
- Clustering-based Imputation for Dropout Buyers in Large-scale Online Experimentation [4.753069295451989]
In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and enhance the decision-making process.
In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers.
For the analysis of incomplete metrics, we propose a clustering-based imputation method using $k$-nearest neighbors.
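A plain $k$-nearest-neighbor imputation step can be sketched as follows: estimate a missing metric value (e.g., purchase amount) for a user from the $k$ most similar users with complete data. This version uses Euclidean distance and a simple average for illustration; the paper's clustering step is omitted here.

```python
import numpy as np

def knn_impute(X, y, x_missing, k=3):
    """Impute a missing metric for one user by averaging the metric of the
    k nearest complete users. X: (n, d) feature matrix of complete users,
    y: (n,) their observed metric, x_missing: (d,) the incomplete user."""
    dists = np.linalg.norm(X - x_missing, axis=1)   # distance to each complete user
    nearest = np.argsort(dists)[:k]                 # indices of the k closest
    return float(np.mean(y[nearest]))
```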
arXiv Detail & Related papers (2022-09-09T01:05:53Z)
- Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design [11.414086057582324]
We introduce Deep Adaptive Design (DAD), a method for amortizing the cost of performing sequential adaptive experiments.
We demonstrate that DAD successfully amortizes the process of experimental design, outperforming alternative strategies on a number of problems.
arXiv Detail & Related papers (2021-03-03T14:43:48Z)
- Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.