Chiseling: Powerful and Valid Subgroup Selection via Interactive Machine Learning
- URL: http://arxiv.org/abs/2509.19490v3
- Date: Wed, 29 Oct 2025 21:17:22 GMT
- Title: Chiseling: Powerful and Valid Subgroup Selection via Interactive Machine Learning
- Authors: Nathan Cheng, Asher Spector, Lucas Janson
- Abstract summary: In regression and causal inference, controlled subgroup selection aims to identify a subgroup on which the average response or treatment effect is above a given threshold. We propose a novel framework called chiseling that allows the analyst to interactively refine and test a candidate subgroup by iteratively shrinking it.
- Score: 7.170797040538138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In regression and causal inference, controlled subgroup selection aims to identify, with inferential guarantees, a subgroup (defined as a subset of the covariate space) on which the average response or treatment effect is above a given threshold. E.g., in a clinical trial, it may be of interest to find a subgroup with a positive average treatment effect. However, existing methods either lack inferential guarantees, heavily restrict the search for the subgroup, or sacrifice efficiency by naive data splitting. We propose a novel framework called chiseling that allows the analyst to interactively refine and test a candidate subgroup by iteratively shrinking it. The sole restriction is that the shrinkage direction only depends on the points outside the current subgroup, but otherwise the analyst may leverage any prior information or machine learning algorithm. Despite this flexibility, chiseling controls the probability that the discovered subgroup is null (e.g., has a non-positive average treatment effect) under minimal assumptions: for example, in randomized experiments, this inferential validity guarantee holds under only bounded moment conditions. When applied to a variety of simulated datasets and a real survey experiment, chiseling identifies substantially better subgroups than existing methods with inferential guarantees.
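The information restriction described in the abstract (each shrinking step may depend only on points outside the current subgroup) can be illustrated with a small sketch. This is not the authors' implementation and it omits the testing step entirely. The function names `shrink_once` and `shrink_path`, the least-squares model, the fixed number of points removed per step, and the simulated data are all assumptions made for illustration. The sketch also assumes that covariates of in-region points may be used for ranking while their responses stay hidden; the abstract does not spell this out.

```python
import numpy as np


def shrink_once(X, y, in_region, n_remove, rng):
    """Shrink the region by one step, reading responses only outside it.

    Hypothetical helper: fits least squares on the points already outside
    the region, then drops the in-region points with the lowest predictions.
    """
    outside = ~in_region
    inside_idx = np.flatnonzero(in_region)
    n_remove = min(n_remove, inside_idx.size)
    if outside.sum() < X.shape[1] + 2:
        # Too few revealed points to fit a model: remove points at random,
        # which uses no response data at all.
        drop = rng.choice(inside_idx, size=n_remove, replace=False)
    else:
        # Fit an intercept-plus-linear model on revealed (outside) points only.
        design_out = np.column_stack([np.ones(outside.sum()), X[outside]])
        coef, *_ = np.linalg.lstsq(design_out, y[outside], rcond=None)
        # Rank in-region points by predicted response from covariates alone;
        # y[in_region] is never read.
        design_in = np.column_stack([np.ones(inside_idx.size), X[inside_idx]])
        pred = design_in @ coef
        drop = inside_idx[np.argsort(pred)[:n_remove]]
    new_region = in_region.copy()
    new_region[drop] = False
    return new_region


def shrink_path(X, y, n_steps, n_remove, seed=0):
    """Start from the full sample and apply shrink_once repeatedly."""
    rng = np.random.default_rng(seed)
    in_region = np.ones(len(X), dtype=bool)
    for _ in range(n_steps):
        in_region = shrink_once(X, y, in_region, n_remove, rng)
    return in_region


# Simulated data (an assumption): the response increases with the first covariate.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 2))
y = X[:, 0] + data_rng.normal(scale=0.5, size=200)

region = shrink_path(X, y, n_steps=10, n_remove=15)
print("points remaining in the candidate subgroup:", int(region.sum()))
```

Because every step reads responses only from points that have already left the region, overwriting the responses of the points that remain in the final region and rerunning the procedure yields the same region. That property is what the sketch is meant to show; how the paper converts it into a valid test is not covered here.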
Related papers
- Fair Decisions from Calibrated Scores: Achieving Optimal Classification While Satisfying Sufficiency [2.0686600920324163]
Binary classification based on predicted probabilities (scores) is a fundamental task in supervised machine learning. We present an exact solution for optimal binary classification under sufficiency, assuming finite sets of group-calibrated scores.
arXiv Detail & Related papers (2026-02-07T00:26:40Z)
- An Algorithm for Identifying Interpretable Subgroups With Elevated Treatment Effects [0.0]
We introduce an algorithm for identifying interpretable subgroups with elevated treatment effects, given an estimate of individual or conditional average treatment effects (CATE). Subgroups are characterized by "rule sets": easy-to-understand statements of the form (Condition A AND Condition B) OR (Condition C).
arXiv Detail & Related papers (2025-07-13T05:01:48Z)
- Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness [61.45587642780908]
We propose a three-step approach for parameter-efficient fine-tuning of image-text foundation models. Our method improves its two key components: minority samples identification and the robust training algorithm. Our theoretical analysis shows that our PPA enhances minority group identification and is Bayes optimal for minimizing the balanced group error.
arXiv Detail & Related papers (2025-03-12T15:46:12Z)
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
- Statistical Performance Guarantee for Subgroup Identification with Generic Machine Learning [0.5989855268111279]
We develop uniform confidence bands for estimation of the group average treatment effect sorted by generic ML algorithm (GATES). We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders.
arXiv Detail & Related papers (2023-10-12T01:41:47Z)
- Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z)
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct the neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
- Group Testing with Non-identical Infection Probabilities [59.96266198512243]
We develop an adaptive group testing algorithm using the set formation method.
We show that our algorithm outperforms the state of the art, and performs close to the entropy lower bound.
arXiv Detail & Related papers (2021-08-27T17:53:25Z)
- Finding Subgroups with Significant Treatment Effects [20.457122933924463]
We propose a machine-learning method that is specifically optimized for finding such subgroups in noisy data.
Unlike available methods for personalized treatment assignment, our tool is designed to take significance testing into account.
It produces a subgroup that is chosen to maximize the probability of obtaining a statistically significant positive treatment effect.
arXiv Detail & Related papers (2021-03-12T03:36:03Z)
- Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification [84.53697297858146]
Subgroup analysis of treatment effects plays an important role in applications from medicine to public policy to recommender systems.
Most current methods of subgroup analysis begin with a particular algorithm for estimating individualized treatment effects (ITE).
This paper develops a new method for subgroup analysis, R2P, that addresses all these weaknesses.
arXiv Detail & Related papers (2020-06-14T14:50:02Z)
- Almost-Matching-Exactly for Treatment Effect Estimation under Network Interference [73.23326654892963]
We propose a matching method that recovers direct treatment effects from randomized experiments where units are connected in an observed network.
Our method matches units almost exactly on counts of unique subgraphs within their neighborhood graphs.
arXiv Detail & Related papers (2020-03-02T15:21:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.