Probabilistic Factorial Experimental Design for Combinatorial Interventions
- URL: http://arxiv.org/abs/2506.03363v1
- Date: Tue, 03 Jun 2025 20:15:08 GMT
- Title: Probabilistic Factorial Experimental Design for Combinatorial Interventions
- Authors: Divya Shyamal, Jiaqi Zhang, Caroline Uhler
- Abstract summary: We introduce probabilistic factorial experimental design, formalized from how scientists perform lab experiments. We address the optimal experimental design problem within an intervention model that imposes bounded-degree interactions between treatments. Our results prove that a dosage of $\tfrac{1}{2}$ for each treatment is optimal up to a factor of $1+O(\tfrac{\ln(n)}{n})$ for estimating any $k$-way interaction model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A combinatorial intervention, consisting of multiple treatments applied to a single unit with potentially interactive effects, has substantial applications in fields such as biomedicine, engineering, and beyond. Given $p$ possible treatments, conducting all possible $2^p$ combinatorial interventions can be laborious and quickly becomes infeasible as $p$ increases. Here we introduce probabilistic factorial experimental design, formalized from how scientists perform lab experiments. In this framework, the experimenter selects a dosage for each possible treatment and applies it to a group of units. Each unit independently receives a random combination of treatments, sampled from a product Bernoulli distribution determined by the dosages. Additionally, the experimenter can carry out such experiments over multiple rounds, adapting the design in an active manner. We address the optimal experimental design problem within an intervention model that imposes bounded-degree interactions between treatments. In the passive setting, we provide a closed-form solution for the near-optimal design. Our results prove that a dosage of $\tfrac{1}{2}$ for each treatment is optimal up to a factor of $1+O(\tfrac{\ln(n)}{n})$ for estimating any $k$-way interaction model, regardless of $k$, and imply that $O\big(kp^{3k}\ln(p)\big)$ observations are required to accurately estimate this model. For the multi-round setting, we provide a near-optimal acquisition function that can be numerically optimized. We also explore several extensions of the design problem and finally validate our findings through simulations.
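The abstract describes two ingredients: each unit independently receives a random treatment combination drawn from a product Bernoulli distribution set by the dosages, and the outcome follows a model with interactions of degree at most $k$. The Python sketch below illustrates both under our own assumptions, not the paper's exact formulation: treatments are encoded as $\pm 1$, the outcome is a linear combination of products over treatment subsets of size at most $k$, and coefficients are fit by ordinary least squares. The helper names (`interaction_subsets`, `sample_design`, `features`, `population_gram`) are hypothetical. The sketch also computes the exact second-moment matrix of the features, which equals the identity when every dosage is $\tfrac{1}{2}$. That offers one intuition for the balanced dosage; it is not the paper's optimality proof.

```python
import itertools

import numpy as np


def interaction_subsets(p, k):
    """All subsets of {0, ..., p-1} with at most k elements (the k-way interaction terms)."""
    return [s for r in range(k + 1) for s in itertools.combinations(range(p), r)]


def sample_design(dosages, n, rng):
    """Product Bernoulli design: unit j gets treatment i with probability dosages[i], independently."""
    dosages = np.asarray(dosages, dtype=float)
    return (rng.random((n, dosages.size)) < dosages).astype(int)


def features(X, subsets):
    """ASSUMPTION: map 0/1 treatments to -1/+1 and take one product feature per subset.

    The empty subset yields the constant feature 1 (the intercept).
    """
    Z = 2 * np.asarray(X) - 1
    cols = [np.prod(Z[:, np.array(s, dtype=int)], axis=1) for s in subsets]
    return np.column_stack(cols)


def population_gram(dosages, subsets):
    """Exact E[phi(x) phi(x)^T] under the product Bernoulli design.

    With x_i in {-1, +1} and P(x_i = +1) = d_i, independence gives
    E[phi_S phi_T] = prod over i in the symmetric difference of S and T of (2 d_i - 1).
    """
    mu = 2 * np.asarray(dosages, dtype=float) - 1
    m = len(subsets)
    G = np.empty((m, m))
    for a, S in enumerate(subsets):
        for b, T in enumerate(subsets):
            # np.prod of an empty list is 1.0, which covers the diagonal (S == T).
            G[a, b] = np.prod([mu[i] for i in set(S) ^ set(T)])
    return G


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, k, n = 4, 2, 2000
    subsets = interaction_subsets(p, k)        # 1 + 4 + 6 = 11 terms
    beta = rng.normal(size=len(subsets))       # hypothetical true coefficients
    X = sample_design([0.5] * p, n, rng)
    Phi = features(X, subsets)
    y = Phi @ beta + 0.1 * rng.normal(size=n)  # hypothetical noise level
    beta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    print("max coefficient error:", np.abs(beta_hat - beta).max())
    for d in (0.5, 0.2):
        lam = np.linalg.eigvalsh(population_gram([d] * p, subsets)).min()
        print(f"dosage {d}: min eigenvalue of E[phi phi^T] = {lam:.3f}")
```

With a dosage of 0.2 the feature moments no longer cancel, so the second-moment matrix has off-diagonal entries and its smallest eigenvalue drops below 1. Since the least-squares covariance is roughly $\sigma^2 G^{-1}/n$, this inflates the worst-case estimation variance for a fixed number of units.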
Related papers
- ML-assisted Randomization Tests for Detecting Treatment Effects in A/B Experiments
In this paper, we construct randomization tests for complex treatment effects.
A key feature of our approach is the use of flexible machine learning (ML) models.
This approach combines the predictive power of modern ML tools with the finite-sample validity of randomization procedures.
Date: 2025-01-13
- Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization
Learning conditional distributions is a central problem in machine learning.
We propose a new paradigm that integrates both paired and unpaired data.
We show that our approach can theoretically recover true conditional distributions with arbitrarily small error.
Date: 2024-10-03
- Adaptive Experimentation When You Can't Experiment
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
Date: 2024-06-15
- Clustered Switchback Designs for Experimentation Under Spatio-temporal Interference
We estimate the global average treatment effect (GATE): the difference between the average outcomes when all units are exposed to treatment at all times and when all units are exposed to control at all times.
We propose a clustered switchback design, where units are grouped into clusters and time steps are grouped into blocks.
We show that for graphs that admit good clustering, a truncated Horvitz-Thompson estimator achieves a $\tilde O(1/NT)$ mean squared error (MSE).
Our results simultaneously generalize the results from [hu2022switchback], [ugander2013graph], and [leung2022rate].
Date: 2023-12-25
- Optimal Multi-Distribution Learning
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
Date: 2023-12-08
- Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions
We learn unit-specific potential outcomes for any combination of interventions, i.e., $N \times 2^p$ causal parameters.
Running $N \times 2^p$ experiments to estimate the various parameters is likely expensive and/or infeasible as $N$ and $p$ grow.
Date: 2023-03-24
- Neural Design for Genetic Perturbation Experiments
We introduce the Optimistic Arm Elimination (OAE) principle to find an almost optimal arm under different functional relationships between the queries (arms) and the outputs (rewards).
OAE also outperforms the benchmark algorithms in 3 of 4 datasets in the GeneDisco experimental planning challenge.
Date: 2022-07-26
- Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the method so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of the optimal bound in the smooth nonconvex regime.
Date: 2022-06-05
- What can the millions of random treatments in nonexperimental data reveal about causes?
The article introduces one such model and a Bayesian approach to combine the $O(n^2)$ pairwise observations typically available in nonexperimental data.
We demonstrate that the proposed approach recovers causal effects in common NSW samples, as well as in arbitrary subpopulations and an order-of-magnitude larger supersample.
Date: 2021-05-03
- Brain Image Synthesis with Unsupervised Multivariate Canonical CSC$\ell_4$Net
We propose to learn dedicated features that cross both inter- and intra-modal variations using a novel CSC$\ell_4$Net.
Date: 2021-03-22
- High-Dimensional Feature Selection for Sample Efficient Treatment Effect Estimation
The estimation of causal treatment effects from observational data is a fundamental problem in causal inference.
We propose a common objective function involving outcomes across treatment cohorts.
We validate our approach with experiments on treatment effect estimation.
Date: 2020-11-03
- Optimal Testing of Discrete Distributions with High Probability
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
Date: 2020-09-14
- Optimal Bayesian experimental design for subsurface flow problems
We propose a novel approach for developing a polynomial chaos expansion (PCE) surrogate model for the design utility function.
This technique enables the derivation of a reasonable-quality response surface for the targeted objective function with a computational budget comparable to several single-point evaluations.
Date: 2020-08-10
- Synthetic Interventions
The synthetic controls (SC) methodology is a prominent tool for policy evaluation in panel data applications.
In the recent work of [Abadie '20], one of the pioneering authors of the SC method posed the question of how the SC framework can be extended to multiple treatments.
This article offers one resolution to this open question, which we call synthetic interventions (SI).
Date: 2020-06-13
- Locally Private Hypothesis Selection
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to that of the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
Date: 2020-02-21