Theoretical Analysis on the Efficiency of Interleaved Comparisons
- URL: http://arxiv.org/abs/2306.10023v1
- Date: Wed, 31 May 2023 03:04:29 GMT
- Title: Theoretical Analysis on the Efficiency of Interleaved Comparisons
- Authors: Kojiro Iizuka, Hajime Morita, Makoto P. Kato
- Abstract summary: This study presents a theoretical analysis on the efficiency of interleaving, an efficient online evaluation method for rankings.
We begin by designing a simple interleaving method similar to ordinary interleaving methods.
We explore a condition under which the interleaving method is more efficient than A/B testing and find that this is the case when users leave the ranking depending on the item's relevance.
- Score: 3.654658106140114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study presents a theoretical analysis on the efficiency of interleaving,
an efficient online evaluation method for rankings. Although interleaving has
already been applied to production systems, the source of its high efficiency
has not been clarified in the literature. Therefore, this study presents a
theoretical analysis on the efficiency of interleaving methods. We begin by
designing a simple interleaving method similar to ordinary interleaving
methods. Then, we explore a condition under which the interleaving method is
more efficient than A/B testing and find that this is the case when users leave
the ranking depending on the item's relevance, a typical assumption made in
click models. Finally, we perform experiments based on numerical analysis and
user simulation, demonstrating that the theoretical results are consistent with
the empirical results.
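The kind of interleaved comparison the abstract analyzes can be sketched with a team-draft-style merge: two rankings are interleaved into one list, each item is tagged with the ranker that contributed it, and clicks are credited to that ranker. The 50/50 drafting rule and function names below are illustrative assumptions, not the paper's exact method:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, seed=None):
    """Merge two rankings into one interleaved list, recording which
    ranker ("A" or "B") contributed each position."""
    rng = random.Random(seed)
    interleaved, teams = [], []
    ia = ib = 0
    while ia < len(ranking_a) or ib < len(ranking_b):
        # Flip a fair coin to decide who drafts next, unless one side is done.
        pick_a = ib >= len(ranking_b) or (ia < len(ranking_a) and rng.random() < 0.5)
        source = ranking_a if pick_a else ranking_b
        idx = ia if pick_a else ib
        # Skip items the other ranker has already contributed.
        while idx < len(source) and source[idx] in interleaved:
            idx += 1
        if idx < len(source):
            interleaved.append(source[idx])
            teams.append("A" if pick_a else "B")
        if pick_a:
            ia = idx + 1
        else:
            ib = idx + 1
    return interleaved, teams

def score(teams, clicked_positions):
    """Credit each observed click to the ranker that contributed the item."""
    credit = {"A": 0, "B": 0}
    for pos in clicked_positions:
        credit[teams[pos]] += 1
    return credit
```

Because both rankers are exposed to the same users in the same session, per-user variance largely cancels, which is the intuition behind interleaving's efficiency advantage over A/B testing that the paper formalizes.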
Related papers
- Comparing Model-agnostic Feature Selection Methods through Relative Efficiency [8.870380386952993]
We present a theoretical comparison under three model settings: linear models, non-linear additive models, and single index models that mimic a single-layer neural network. Our theoretical results, along with empirical findings, demonstrate that GCM-related methods generally outperform LOCO under suitable regularity conditions. Our simulations and real data analysis include widely used machine learning methods such as neural networks and gradient boosting trees.
arXiv Detail & Related papers (2025-08-19T20:55:43Z) - Accelerated Bayesian Optimal Experimental Design via Conditional Density Estimation and Informative Data [0.4178382980763478]
The Design of Experiments (DOEs) is a fundamental scientific methodology that provides researchers with systematic principles and techniques to enhance the validity, reliability, and efficiency of experimental outcomes. In this study, we explore optimal experimental design within a Bayesian framework, utilizing Bayes' theorem to reformulate the utility expectation.
arXiv Detail & Related papers (2025-07-21T04:41:05Z) - Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset [13.530403536762064]
We evaluate a range of common test-time scaling methods on the TPBench physics dataset. We develop a novel, symbolic weak-verifier framework to improve parallel scaling results. Our findings highlight the power of step-wise symbolic verification for tackling complex scientific problems.
arXiv Detail & Related papers (2025-06-25T18:00:18Z) - An extensive simulation study evaluating the interaction of resampling techniques across multiple causal discovery contexts [2.0946534289186842]
We present theoretical results proving that certain resampling methods emulate the assignment of specific values to algorithm tuning parameters.
We also report the results of extensive simulation experiments, which verify the theoretical result and provide substantial data.
arXiv Detail & Related papers (2025-03-19T17:18:18Z) - Machine learning-based system reliability analysis with Gaussian Process Regression [1.0445957451908694]
We propose several theorems that facilitate such exploration.
Cases that consider and neglect the correlations among the candidate design samples are elaborated in detail.
We prove that the well-known U learning function can be reformulated as the optimal learning function for the case that neglects the Kriging correlation.
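The U learning function mentioned above is a standard acquisition criterion in Kriging-based reliability analysis. A minimal sketch, assuming the surrogate returns a predictive mean and standard deviation for each candidate design sample (the function names here are illustrative, not from the paper):

```python
def u_learning_function(means, stds):
    """U(x) = |mu(x)| / sigma(x): a small U means the sign of the
    limit-state prediction at x is uncertain, i.e. x is the candidate
    most likely to be misclassified by the current surrogate."""
    return [abs(m) / s for m, s in zip(means, stds)]

def next_sample_index(means, stds):
    """Select the candidate design sample with the smallest U value."""
    u = u_learning_function(means, stds)
    return min(range(len(u)), key=u.__getitem__)
```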
arXiv Detail & Related papers (2024-03-17T07:17:07Z) - MFABA: A More Faithful and Accelerated Boundary-based Attribution Method
for Deep Neural Networks [69.28125286491502]
We introduce MFABA, an attribution algorithm that adheres to axioms.
Results demonstrate its superiority, achieving speeds over 101.5142 times faster than state-of-the-art attribution algorithms.
arXiv Detail & Related papers (2023-12-21T07:48:15Z) - Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches [13.504353263032359]
The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency.
Traditionally, experimenters determine AES based on domain knowledge, but this method becomes impractical for online experimentation services managing numerous experiments.
We propose two data-driven solutions for AES selection in online experimentation services.
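To illustrate why the assumed effect size drives experiment duration, a standard two-sample z-test power calculation can be sketched as follows. This is a textbook formula, not the paper's proposed method, and `sigma` and the daily-traffic figure are assumed inputs:

```python
from math import ceil
from statistics import NormalDist

def samples_per_arm(aes, sigma, alpha=0.05, power=0.8):
    """Required sample size per arm for a two-sided two-sample z-test,
    given an assumed effect size (AES) in absolute units."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / aes ** 2)

def duration_days(aes, sigma, daily_users_per_arm, **kw):
    """Experiment duration implied by the AES and available traffic."""
    return ceil(samples_per_arm(aes, sigma, **kw) / daily_users_per_arm)
```

Halving the AES quadruples the required sample size, which is why a poorly chosen AES makes the experiment either underpowered or needlessly long.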
arXiv Detail & Related papers (2023-12-20T09:34:28Z) - Revisiting the Role of Similarity and Dissimilarity in Best Counter
Argument Retrieval [1.7607244667735586]
We develop an efficient model for scoring counter-arguments based on similarity and dissimilarity metrics.
We propose Bipolar-encoder, a novel BERT-based model to learn an optimal representation for simultaneous similarity and dissimilarity.
Experimental results show that our proposed method achieves an accuracy@1 of 49.04%, significantly outperforming other baselines.
arXiv Detail & Related papers (2023-04-18T08:13:48Z) - Building an Efficiency Pipeline: Commutativity and Cumulativeness of
Efficiency Operators for Transformers [68.55472265775514]
We consider an efficiency method as an operator applied on a model.
In this paper, we study the plausibility of this idea, and the commutativity and cumulativeness of efficiency operators.
arXiv Detail & Related papers (2022-07-31T18:01:06Z) - Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z) - Learnability of Competitive Threshold Models [11.005966612053262]
We study the learnability of the competitive threshold model from a theoretical perspective.
We demonstrate how competitive threshold models can be seamlessly simulated by artificial neural networks.
arXiv Detail & Related papers (2022-05-08T01:11:51Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
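The conventional pairwise approach that the abstract contrasts with can be sketched as a single BPR-style SGD step: the score of an observed (positive) item is pushed above that of a sampled negative item. This is a plain-Python sketch of the conventional baseline, not the paper's proposed method; the names and learning rate are illustrative:

```python
import math

def bpr_step(user_vec, pos_vec, neg_vec, lr=0.01):
    """One SGD step on the BPR pairwise objective: raise the dot-product
    score of the positive item above that of the sampled negative."""
    score_pos = sum(u * v for u, v in zip(user_vec, pos_vec))
    score_neg = sum(u * v for u, v in zip(user_vec, neg_vec))
    # Gradient scale: sigmoid of the negated score margin.
    g = 1.0 / (1.0 + math.exp(score_pos - score_neg))
    # Update only the user vector; item vectors are held fixed for brevity.
    return [u + lr * g * (p - n) for u, p, n in zip(user_vec, pos_vec, neg_vec)]
```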
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory
to Learning Algorithms [91.3755431537592]
We analyze four broad meta-learning strategies which rely on plug-in estimation and pseudo-outcome regression.
We highlight how this theoretical reasoning can be used to guide principled algorithm design and translate our analyses into practice.
arXiv Detail & Related papers (2021-01-26T17:11:40Z) - Robust Imitation Learning from Noisy Demonstrations [81.67837507534001]
We show that robust imitation learning can be achieved by optimizing a classification risk with a symmetric loss.
We propose a new imitation learning method that effectively combines pseudo-labeling with co-training.
Experimental results on continuous-control benchmarks show that our method is more robust compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-10-20T10:41:37Z) - Detangling robustness in high dimensions: composite versus
model-averaged estimation [11.658462692891355]
Robust methods, though ubiquitous in practice, are yet to be fully understood in the context of regularized estimation and high dimensions.
This paper provides a toolbox to further study robustness in these settings and focuses on prediction.
arXiv Detail & Related papers (2020-06-12T20:40:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.