Efficiently Controlling Multiple Risks with Pareto Testing
- URL: http://arxiv.org/abs/2210.07913v1
- Date: Fri, 14 Oct 2022 15:54:39 GMT
- Title: Efficiently Controlling Multiple Risks with Pareto Testing
- Authors: Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, Tommi Jaakkola
- Abstract summary: We propose a two-stage process which combines multi-objective optimization with multiple hypothesis testing.
We demonstrate the effectiveness of our approach in reliably accelerating the execution of large-scale Transformer models in natural language processing (NLP) applications.
- Score: 34.83506056862348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning applications frequently come with multiple diverse
objectives and constraints that can change over time. Accordingly, trained
models can be tuned with sets of hyper-parameters that affect their predictive
behavior (e.g., their run-time efficiency versus error rate). As the numbers of
constraints and hyper-parameter dimensions grow, naively selected settings may
lead to sub-optimal and/or unreliable results. We develop an efficient method
for calibrating models such that their predictions provably satisfy multiple
explicit and simultaneous statistical guarantees (e.g., upper-bounded error
rates), while also optimizing any number of additional, unconstrained
objectives (e.g., total run-time cost). Building on recent results in
distribution-free, finite-sample risk control for general losses, we propose
Pareto Testing: a two-stage process which combines multi-objective optimization
with multiple hypothesis testing. The optimization stage constructs a set of
promising combinations on the Pareto frontier. We then apply statistical
testing only to this frontier to identify configurations that have (i) high
utility with respect to our objectives, and (ii) guaranteed risk levels with
respect to our constraints, with specifiable high probability. We demonstrate
the effectiveness of our approach in reliably accelerating the execution of
large-scale Transformer models in natural language processing (NLP)
applications. In particular, we show how Pareto Testing can be used to
dynamically configure multiple inter-dependent model attributes -- including
the number of layers computed before exiting, number of attention heads pruned,
or number of text tokens considered -- to simultaneously control and optimize
various accuracy and cost metrics.
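To make the two-stage recipe concrete, here is a minimal Python sketch. It assumes a finite set of candidate configurations, per-example calibration losses bounded in [0, 1], a Hoeffding-based p-value, and a fixed-sequence traversal of the frontier ordered by empirical risk; these are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def hoeffding_pvalue(losses, alpha):
    """p-value for H0: E[loss] > alpha, given i.i.d. losses bounded in [0, 1]."""
    n = len(losses)
    gap = alpha - np.mean(losses)
    return 1.0 if gap <= 0 else float(np.exp(-2.0 * n * gap ** 2))

def pareto_testing(configs, risk_losses, objective, alpha, delta):
    """Stage 1: keep the empirical Pareto frontier of (risk, cost).
    Stage 2: fixed-sequence testing along the frontier, so each returned
    config satisfies risk <= alpha with probability >= 1 - delta."""
    stats = [(float(np.mean(risk_losses[c])), objective[c], c) for c in configs]
    def dominated(s):
        return any((o[0] < s[0] and o[1] <= s[1]) or
                   (o[0] <= s[0] and o[1] < s[1]) for o in stats)
    frontier = sorted((s for s in stats if not dominated(s)), key=lambda s: s[0])
    certified = []
    for _risk_hat, _cost, c in frontier:
        if hoeffding_pvalue(risk_losses[c], alpha) <= delta:
            certified.append(c)  # H0 rejected: risk control certified for c
        else:
            break  # fixed-sequence testing must stop at the first failure
    return certified
```

From the certified set one would typically deploy the configuration with the lowest cost. With multiple risk constraints, each configuration's test must account for all of them, e.g., by taking the maximum of the per-constraint p-values.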
Related papers
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
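A heavily simplified sketch of the weight-fusion idea: the paper's MoE module learns input-dependent routing, whereas here a fixed preference vector plays the router's role, so the function name and the softmax weighting are illustrative assumptions.

```python
import torch

def fuse_experts(expert_state_dicts, preference):
    """Fuse single-task expert weights into one model along a preference
    direction on the trade-off front (static stand-in for a learned router)."""
    w = torch.softmax(torch.as_tensor(preference, dtype=torch.float32), dim=0)
    return {name: sum(wi * sd[name] for wi, sd in zip(w, expert_state_dicts))
            for name in expert_state_dicts[0]}

# Usage: model.load_state_dict(fuse_experts([sd_task_a, sd_task_b], [0.7, 0.3]))
```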
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- Risk-Controlling Model Selection via Guided Bayesian Optimization [35.53469358591976]
We find a configuration that adheres to user-specified limits on certain risks while being useful with respect to other conflicting metrics.
Our method identifies a set of optimal configurations residing in a designated region of interest.
We demonstrate the effectiveness of our approach on a range of tasks with multiple desiderata, including low error rates, equitable predictions, handling spurious correlations, managing rate and distortion in generative models, and reducing computational costs.
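A minimal sketch of searching only a region of interest, using two Gaussian-process surrogates from scikit-learn; the conservative one-sigma feasibility filter and the function names are assumptions, not the paper's acquisition rule.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def guided_proposals(X_obs, risk_obs, util_obs, candidates, alpha):
    """Rank candidate configurations by predicted utility, restricted to the
    region of interest where the risk surrogate predicts the limit holds."""
    gp_risk = GaussianProcessRegressor().fit(X_obs, risk_obs)
    gp_util = GaussianProcessRegressor().fit(X_obs, util_obs)
    risk_mean, risk_std = gp_risk.predict(candidates, return_std=True)
    feasible = risk_mean + risk_std <= alpha   # conservative one-sigma filter
    utility = gp_util.predict(candidates)
    utility[~feasible] = -np.inf               # ignore configs outside the region
    return candidates[np.argsort(utility)[::-1]]  # best predicted configs first
```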
arXiv Detail & Related papers (2023-12-04T07:29:44Z)
- Adaptive Batch Sizes for Active Learning: A Probabilistic Numerics Approach [28.815294991377645]
Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation.
This fixed approach is inefficient because the trade-off between cost and speed changes dynamically over the course of experimentation.
We propose a novel probabilistic numerics framework that adaptively changes batch sizes.
arXiv Detail & Related papers (2023-06-09T12:17:18Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Symmetric Tensor Networks for Generative Modeling and Constrained Combinatorial Optimization [72.41480594026815]
Constrained optimization problems abound in industry, from portfolio optimization to logistics.
One of the major roadblocks in solving these problems is the presence of non-trivial hard constraints which limit the valid search space.
In this work, we encode arbitrary integer-valued equality constraints of the form Ax=b directly into U(1) symmetric tensor networks (TNs) and leverage their applicability as quantum-inspired generative models.
arXiv Detail & Related papers (2022-11-16T18:59:54Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
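A simplified sketch of column-wise iterative imputation with automatic model selection, assuming an all-numeric matrix and a two-model candidate pool; HyperImpute itself searches a much larger learner space.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def iterative_impute(X, rounds=3):
    """Each round, every incomplete column is re-imputed with whichever
    learner cross-validates best on that column's observed rows."""
    X = X.astype(float).copy()
    mask = np.isnan(X)
    X[mask] = np.take(np.nanmean(X, axis=0), np.where(mask)[1])  # mean init
    for _ in range(rounds):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            others = np.delete(X, j, axis=1)
            candidates = [LinearRegression(), RandomForestRegressor(n_estimators=50)]
            model = max(candidates, key=lambda m: cross_val_score(
                m, others[~miss], X[~miss, j], cv=3).mean())
            model.fit(others[~miss], X[~miss, j])
            X[miss, j] = model.predict(others[miss])
    return X
```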
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
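A generic primal-dual sketch of the idea; the greedy primal step and all names here are illustrative choices, not the paper's own dual formulation.

```python
import numpy as np

def dual_ascent_batch(utility, slack, budget, lr=0.05, steps=200):
    """Select a batch by Lagrangian relaxation of a constrained objective.
    utility: (n,) acquisition score per unlabeled candidate.
    slack: (m, n) per-candidate contribution to each of m constraints;
    a batch is feasible when every constraint's summed slack is <= 0."""
    lam = np.zeros(slack.shape[0])              # one multiplier per constraint
    batch = np.argsort(utility)[-budget:]
    for _ in range(steps):
        adjusted = utility - lam @ slack        # primal: penalized scores
        batch = np.argsort(adjusted)[-budget:]  # greedy top-k batch
        violation = slack[:, batch].sum(axis=1) # dual: measure violations
        lam = np.maximum(0.0, lam + lr * violation)
    return batch
```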
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Constrained multi-objective optimization of process design parameters in settings with scarce data: an application to adhesive bonding [48.7576911714538]
Finding the optimal process parameters for an adhesive bonding process is challenging.
Because experimental data are scarce and costly to obtain, traditional evolutionary approaches (such as genetic algorithms) are ill-suited to solving the problem.
In this research, we successfully applied specific machine learning techniques to emulate the objective and constraint functions.
arXiv Detail & Related papers (2021-12-16T10:14:39Z)
- Evolutionary Optimization of High-Coverage Budgeted Classifiers [1.7767466724342065]
Budgeted multi-stage classifiers (MSCs) process inputs through a sequence of partial feature acquisition and evaluation steps.
This paper proposes a problem-specific MSC that incorporates a terminal reject option for indecisive predictions, as sketched below.
The algorithm's design emphasizes efficiency while respecting a notion of aggregated performance.
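A schematic of the general pattern (not the paper's evolutionary search): stage costs are paid only until the classifier is confident, and an input that never clears a confidence threshold is rejected. All names are illustrative.

```python
def staged_predict(stages, x, thresholds):
    """Run a budgeted multi-stage classifier: acquire features stage by stage,
    stop as soon as the prediction is confident, and reject otherwise.
    stages: list of (acquire_features, classify) callables of increasing cost.
    thresholds: per-stage confidence needed to commit to a prediction."""
    features = {}
    for (acquire, classify), tau in zip(stages, thresholds):
        features.update(acquire(x))           # pay for this stage's features
        probs = classify(features)            # dict: class -> probability
        if max(probs.values()) >= tau:
            return max(probs, key=probs.get)  # confident enough: predict
    return None                               # terminal reject: abstain
```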
arXiv Detail & Related papers (2021-10-25T16:03:07Z)
- Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set [17.617944390196286]
Modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade-off multiple objective functions.
We propose a first-order algorithm that approximately solves the OPT-in-Pareto problem (optimizing a criterion within the Pareto set) using only gradient information.
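For intuition, a standard first-order primitive in this area is the min-norm convex combination of per-objective gradients (the MGDA direction); the Frank-Wolfe solver below is a textbook sketch, not the paper's exact update.

```python
import numpy as np

def min_norm_direction(grads, iters=100):
    """Min-norm point in the convex hull of objective gradients, via
    Frank-Wolfe on the simplex. grads: (k, d), one gradient per objective.
    Stepping against the returned direction decreases all objectives
    whenever the current point is not Pareto-stationary."""
    k = grads.shape[0]
    w = np.full(k, 1.0 / k)
    G = grads @ grads.T              # (k, k) Gram matrix
    for t in range(iters):
        i = int(np.argmin(G @ w))    # best vertex for the linearized objective
        gamma = 2.0 / (t + 2.0)      # standard Frank-Wolfe step size
        e = np.zeros(k)
        e[i] = 1.0
        w = (1.0 - gamma) * w + gamma * e
    return grads.T @ w
```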
arXiv Detail & Related papers (2021-10-17T04:07:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.