Multivariate Conformal Selection
- URL: http://arxiv.org/abs/2505.00917v1
- Date: Thu, 01 May 2025 23:33:57 GMT
- Title: Multivariate Conformal Selection
- Authors: Tian Bai, Yue Zhao, Xiang Yu, Archer Y. Yang,
- Abstract summary: We propose a generalization of Conformal Selection (CS) to provide rigorous uncertainty quantification.<n>We present two variants: mCS-dist, using distance-based scores, and mCS-learn, which learns optimal scores via differentiable optimization.<n>Experiments on simulated and real-world datasets demonstrate that mCS significantly improves selection power while maintaining False Discovery Rate (FDR) control.
- Score: 9.431551477608528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Selecting high-quality candidates from large datasets is critical in applications such as drug discovery, precision medicine, and alignment of large language models (LLMs). While Conformal Selection (CS) provides rigorous uncertainty quantification, it is limited to univariate responses and scalar criteria. To address this issue, we propose Multivariate Conformal Selection (mCS), a generalization of CS designed for multivariate response settings. Our method introduces regional monotonicity and employs multivariate nonconformity scores to construct conformal p-values, enabling finite-sample False Discovery Rate (FDR) control. We present two variants: mCS-dist, using distance-based scores, and mCS-learn, which learns optimal scores via differentiable optimization. Experiments on simulated and real-world datasets demonstrate that mCS significantly improves selection power while maintaining FDR control, establishing it as a robust framework for multivariate selection tasks.
Related papers
- Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks [81.44256822500257]
RLHF has emerged as a predominant approach for aligning artificial intelligence systems with human preferences.<n> RLHF exhibits insufficient compliance capabilities when confronted with complex multi-instruction tasks.<n>We propose a novel Multi-level Aware Preference Learning (MAPL) framework, capable of enhancing multi-instruction capabilities.
arXiv Detail & Related papers (2025-05-19T08:33:11Z) - Multi-QuAD: Multi-Level Quality-Adaptive Dynamic Network for Reliable Multimodal Classification [57.08108545219043]
Existing reliable multimodal classification methods fail to provide robust estimation of data quality.<n>New framework for reliable classification termed textitMulti-level Quality-Adaptive Dynamic multimodal network (Multi-QuAD) is proposed.<n>Experiments conducted on four datasets demonstrate that Multi-QuAD significantly outperforms state-of-the-art methods in classification performance and reliability.
arXiv Detail & Related papers (2024-12-19T03:26:51Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.<n> Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.<n>We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - Optimized Conformal Selection: Powerful Selective Inference After Conformity Score Optimization [4.984656106595651]
This paper presents OptCS, a framework that allows valid statistical testing (selection) after flexible data-driven model optimization.<n>We introduce general conditions under which OptCS constructs valid conformal p-values despite substantial data reuse.<n>We propose three FDR-controlling procedures, each optimizing the models differently.
arXiv Detail & Related papers (2024-11-27T01:40:50Z) - An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.<n>Our work offers a comprehensive comparison of existing truncation sampling methods and serves as a practical user guideline for their parameter selection.
arXiv Detail & Related papers (2024-08-24T14:14:32Z) - Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states.
This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO)
We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z) - A data-science pipeline to enable the Interpretability of Many-Objective
Feature Selection [0.1474723404975345]
Many-Objective Feature Selection (MOFS) approaches use four or more objectives to determine the relevance of a subset of features in a supervised learning task.
This paper proposes an original methodology to support data scientists in the interpretation and comparison of the MOFS outcome by combining post-processing and visualisation of the set of solutions.
arXiv Detail & Related papers (2023-11-30T17:44:22Z) - Variable Selection in Maximum Mean Discrepancy for Interpretable
Distribution Comparison [9.12501922682336]
Two-sample testing decides whether two datasets are generated from the same distribution.
This paper studies variable selection for two-sample testing, the task being to identify the variables responsible for the discrepancies between the two distributions.
arXiv Detail & Related papers (2023-11-02T18:38:39Z) - BOtied: Multi-objective Bayesian optimization with tied multivariate ranks [33.414682601242006]
In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function.
Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied.
Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions.
arXiv Detail & Related papers (2023-06-01T04:50:06Z) - Variable Selection for Kernel Two-Sample Tests [9.581064100940381]
We consider the variable selection problem for two-sample tests.<n>We propose a novel framework based on the kernel maximum mean discrepancy (MMD)<n>Our approach seeks a subset of variables with a pre-specified size that maximizes the variance-regularized kernel MMD statistic.
arXiv Detail & Related papers (2023-02-15T00:39:56Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Greedy Search Algorithms for Unsupervised Variable Selection: A
Comparative Study [3.4888132404740797]
This paper focuses on unsupervised variable selection based dimensionality reduction.
We present a critical evaluation of seven unsupervised greedy variable selection algorithms.
We introduce and evaluate for the first time, a lazy implementation of the variance explained based forward selection component analysis (FSCA) algorithm.
arXiv Detail & Related papers (2021-03-03T21:10:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.