Efficiently Controlling Multiple Risks with Pareto Testing
- URL: http://arxiv.org/abs/2210.07913v1
- Date: Fri, 14 Oct 2022 15:54:39 GMT
- Title: Efficiently Controlling Multiple Risks with Pareto Testing
- Authors: Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, Tommi Jaakkola
- Abstract summary: We propose a two-stage process which combines multi-objective optimization with multiple hypothesis testing.
We demonstrate the effectiveness of our approach in reliably accelerating the execution of large-scale Transformer models in natural language processing (NLP) applications.
- Score: 34.83506056862348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning applications frequently come with multiple diverse
objectives and constraints that can change over time. Accordingly, trained
models can be tuned with sets of hyper-parameters that affect their predictive
behavior (e.g., their run-time efficiency versus error rate). As the numbers of
constraints and hyper-parameter dimensions grow, naively selected settings may
lead to sub-optimal and/or unreliable results. We develop an efficient method
for calibrating models such that their predictions provably satisfy multiple
explicit and simultaneous statistical guarantees (e.g., upper-bounded error
rates), while also optimizing any number of additional, unconstrained
objectives (e.g., total run-time cost). Building on recent results in
distribution-free, finite-sample risk control for general losses, we propose
Pareto Testing: a two-stage process which combines multi-objective optimization
with multiple hypothesis testing. The optimization stage constructs a set of
promising combinations on the Pareto frontier. We then apply statistical
testing only to this frontier to identify configurations that have (i) high
utility with respect to our objectives, and (ii) guaranteed risk levels with
respect to our constraints, with specifiable high probability. We demonstrate
the effectiveness of our approach in reliably accelerating the execution of
large-scale Transformer models in natural language processing (NLP)
applications. In particular, we show how Pareto Testing can be used to
dynamically configure multiple inter-dependent model attributes -- including
the number of layers computed before exiting, number of attention heads pruned,
or number of text tokens considered -- to simultaneously control and optimize
various accuracy and cost metrics.
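To make the two-stage recipe concrete, here is a minimal Python sketch. It assumes a finite set of candidate configurations, per-example calibration losses bounded in [0, 1], a Hoeffding-based p-value, and a fixed-sequence traversal of the frontier ordered by empirical risk; these are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def hoeffding_pvalue(losses, alpha):
    """p-value for H0: E[loss] > alpha, given i.i.d. losses bounded in [0, 1]."""
    n = len(losses)
    gap = alpha - np.mean(losses)
    return 1.0 if gap <= 0 else float(np.exp(-2.0 * n * gap ** 2))

def pareto_testing(configs, risk_losses, objective, alpha, delta):
    """Stage 1: keep the empirical Pareto frontier of (risk, cost).
    Stage 2: fixed-sequence testing along the frontier, so each returned
    config satisfies risk <= alpha with probability >= 1 - delta."""
    stats = [(float(np.mean(risk_losses[c])), objective[c], c) for c in configs]
    def dominated(s):
        return any((o[0] < s[0] and o[1] <= s[1]) or
                   (o[0] <= s[0] and o[1] < s[1]) for o in stats)
    frontier = sorted((s for s in stats if not dominated(s)), key=lambda s: s[0])
    certified = []
    for _risk_hat, _cost, c in frontier:
        if hoeffding_pvalue(risk_losses[c], alpha) <= delta:
            certified.append(c)  # H0 rejected: risk control certified for c
        else:
            break  # fixed-sequence testing must stop at the first failure
    return certified
```

From the certified set one would typically deploy the configuration with the lowest cost. With multiple risk constraints, each configuration's test must account for all of them, e.g., by taking the maximum of the per-constraint p-values.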
Related papers
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
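A heavily simplified sketch of the weight-fusion idea: the paper's MoE module learns input-dependent routing, whereas here a fixed preference vector plays the router's role, so the function name and the softmax weighting are illustrative assumptions.

```python
import torch

def fuse_experts(expert_state_dicts, preference):
    """Fuse single-task expert weights into one model along a preference
    direction on the trade-off front (static stand-in for a learned router)."""
    w = torch.softmax(torch.as_tensor(preference, dtype=torch.float32), dim=0)
    return {name: sum(wi * sd[name] for wi, sd in zip(w, expert_state_dicts))
            for name in expert_state_dicts[0]}

# Usage: model.load_state_dict(fuse_experts([sd_task_a, sd_task_b], [0.7, 0.3]))
```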
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- Risk-Controlling Model Selection via Guided Bayesian Optimization [35.53469358591976]
We find a configuration that adheres to user-specified limits on certain risks while being useful with respect to other conflicting metrics.
Our method identifies a set of optimal configurations residing in a designated region of interest.
We demonstrate the effectiveness of our approach on a range of tasks with multiple desiderata, including low error rates, equitable predictions, handling spurious correlations, managing rate and distortion in generative models, and reducing computational costs.
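A minimal sketch of searching only a region of interest, using two Gaussian-process surrogates from scikit-learn; the conservative one-sigma feasibility filter and the function names are assumptions, not the paper's acquisition rule.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def guided_proposals(X_obs, risk_obs, util_obs, candidates, alpha):
    """Rank candidate configurations by predicted utility, restricted to the
    region of interest where the risk surrogate predicts the limit holds."""
    gp_risk = GaussianProcessRegressor().fit(X_obs, risk_obs)
    gp_util = GaussianProcessRegressor().fit(X_obs, util_obs)
    risk_mean, risk_std = gp_risk.predict(candidates, return_std=True)
    feasible = risk_mean + risk_std <= alpha   # conservative one-sigma filter
    utility = gp_util.predict(candidates)
    utility[~feasible] = -np.inf               # ignore configs outside the region
    return candidates[np.argsort(utility)[::-1]]  # best predicted configs first
```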
arXiv Detail & Related papers (2023-12-04T07:29:44Z)
- Adaptive Batch Sizes for Active Learning: A Probabilistic Numerics Approach [28.815294991377645]
Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation.
This fixed approach is inefficient because the trade-off between cost and speed changes dynamically over the course of experimentation.
We propose a novel probabilistic numerics framework that adaptively changes batch sizes.
arXiv Detail & Related papers (2023-06-09T12:17:18Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Symmetric Tensor Networks for Generative Modeling and Constrained Combinatorial Optimization [72.41480594026815]
Constrained optimization problems abound in industry, from portfolio optimization to logistics.
One of the major roadblocks in solving these problems is the presence of non-trivial hard constraints which limit the valid search space.
In this work, we encode arbitrary integer-valued equality constraints of the form Ax=b directly into U(1) symmetric tensor networks (TNs) and leverage their applicability as quantum-inspired generative models.
arXiv Detail & Related papers (2022-11-16T18:59:54Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
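A simplified sketch of column-wise iterative imputation with automatic model selection, assuming an all-numeric matrix and a two-model candidate pool; HyperImpute itself searches a much larger learner space.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def iterative_impute(X, rounds=3):
    """Each round, every incomplete column is re-imputed with whichever
    learner cross-validates best on that column's observed rows."""
    X = X.astype(float).copy()
    mask = np.isnan(X)
    X[mask] = np.take(np.nanmean(X, axis=0), np.where(mask)[1])  # mean init
    for _ in range(rounds):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            others = np.delete(X, j, axis=1)
            candidates = [LinearRegression(), RandomForestRegressor(n_estimators=50)]
            model = max(candidates, key=lambda m: cross_val_score(
                m, others[~miss], X[~miss, j], cv=3).mean())
            model.fit(others[~miss], X[~miss, j])
            X[miss, j] = model.predict(others[miss])
    return X
```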
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
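A generic primal-dual sketch of the idea; the greedy primal step and all names here are illustrative choices, not the paper's own dual formulation.

```python
import numpy as np

def dual_ascent_batch(utility, slack, budget, lr=0.05, steps=200):
    """Select a batch by Lagrangian relaxation of a constrained objective.
    utility: (n,) acquisition score per unlabeled candidate.
    slack: (m, n) per-candidate contribution to each of m constraints;
    a batch is feasible when every constraint's summed slack is <= 0."""
    lam = np.zeros(slack.shape[0])              # one multiplier per constraint
    batch = np.argsort(utility)[-budget:]
    for _ in range(steps):
        adjusted = utility - lam @ slack        # primal: penalized scores
        batch = np.argsort(adjusted)[-budget:]  # greedy top-k batch
        violation = slack[:, batch].sum(axis=1) # dual: measure violations
        lam = np.maximum(0.0, lam + lr * violation)
    return batch
```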
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Constrained multi-objective optimization of process design parameters in settings with scarce data: an application to adhesive bonding [48.7576911714538]
Finding the optimal process parameters for an adhesive bonding process is challenging.
Because experimental data are scarce and costly to obtain, traditional evolutionary approaches (such as genetic algorithms) are ill-suited to solving the problem.
In this research, we successfully applied specific machine learning techniques to emulate the objective and constraint functions.
arXiv Detail & Related papers (2021-12-16T10:14:39Z)
- Evolutionary Optimization of High-Coverage Budgeted Classifiers [1.7767466724342065]
Budgeted multi-stage classifiers (MSCs) process inputs through a sequence of partial feature acquisition and evaluation steps.
This paper proposes a problem-specific MSC that incorporates a terminal reject option for indecisive predictions, as sketched below.
The algorithm's design emphasizes efficiency while respecting a notion of aggregated performance.
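A schematic of the general pattern (not the paper's evolutionary search): stage costs are paid only until the classifier is confident, and an input that never clears a confidence threshold is rejected. All names are illustrative.

```python
def staged_predict(stages, x, thresholds):
    """Run a budgeted multi-stage classifier: acquire features stage by stage,
    stop as soon as the prediction is confident, and reject otherwise.
    stages: list of (acquire_features, classify) callables of increasing cost.
    thresholds: per-stage confidence needed to commit to a prediction."""
    features = {}
    for (acquire, classify), tau in zip(stages, thresholds):
        features.update(acquire(x))           # pay for this stage's features
        probs = classify(features)            # dict: class -> probability
        if max(probs.values()) >= tau:
            return max(probs, key=probs.get)  # confident enough: predict
    return None                               # terminal reject: abstain
```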
arXiv Detail & Related papers (2021-10-25T16:03:07Z)
- Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set [17.617944390196286]
Modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade-off multiple objective functions.
We propose a first-order algorithm that approximately solves the OPT-in-Pareto problem (optimizing a criterion within the Pareto set) using only gradient information.
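For intuition, a standard first-order primitive in this area is the min-norm convex combination of per-objective gradients (the MGDA direction); the Frank-Wolfe solver below is a textbook sketch, not the paper's exact update.

```python
import numpy as np

def min_norm_direction(grads, iters=100):
    """Min-norm point in the convex hull of objective gradients, via
    Frank-Wolfe on the simplex. grads: (k, d), one gradient per objective.
    Stepping against the returned direction decreases all objectives
    whenever the current point is not Pareto-stationary."""
    k = grads.shape[0]
    w = np.full(k, 1.0 / k)
    G = grads @ grads.T              # (k, k) Gram matrix
    for t in range(iters):
        i = int(np.argmin(G @ w))    # best vertex for the linearized objective
        gamma = 2.0 / (t + 2.0)      # standard Frank-Wolfe step size
        e = np.zeros(k)
        e[i] = 1.0
        w = (1.0 - gamma) * w + gamma * e
    return grads.T @ w
```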
arXiv Detail & Related papers (2021-10-17T04:07:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.