Efficiently Controlling Multiple Risks with Pareto Testing
- URL: http://arxiv.org/abs/2210.07913v1
- Date: Fri, 14 Oct 2022 15:54:39 GMT
- Title: Efficiently Controlling Multiple Risks with Pareto Testing
- Authors: Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, Tommi Jaakkola
- Abstract summary: We propose a two-stage process which combines multi-objective optimization with multiple hypothesis testing.
We demonstrate the effectiveness of our approach to reliably accelerate the execution of large-scale Transformer models in natural language processing (NLP) applications.
- Score: 34.83506056862348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning applications frequently come with multiple diverse
objectives and constraints that can change over time. Accordingly, trained
models can be tuned with sets of hyper-parameters that affect their predictive
behavior (e.g., their run-time efficiency versus error rate). As the number of
constraints and hyper-parameter dimensions grow, naively selected settings may
lead to sub-optimal and/or unreliable results. We develop an efficient method
for calibrating models such that their predictions provably satisfy multiple
explicit and simultaneous statistical guarantees (e.g., upper-bounded error
rates), while also optimizing any number of additional, unconstrained
objectives (e.g., total run-time cost). Building on recent results in
distribution-free, finite-sample risk control for general losses, we propose
Pareto Testing: a two-stage process which combines multi-objective optimization
with multiple hypothesis testing. The optimization stage constructs a set of
promising combinations on the Pareto frontier. We then apply statistical
testing to this frontier only to identify configurations that have (i) high
utility with respect to our objectives, and (ii) guaranteed risk levels with
respect to our constraints, with specifiable high probability. We demonstrate
the effectiveness of our approach to reliably accelerate the execution of
large-scale Transformer models in natural language processing (NLP)
applications. In particular, we show how Pareto Testing can be used to
dynamically configure multiple inter-dependent model attributes -- including
the number of layers computed before exiting, number of attention heads pruned,
or number of text tokens considered -- to simultaneously control and optimize
various accuracy and cost metrics.
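To make the two-stage recipe concrete, here is a minimal sketch, not the authors' implementation: it assumes per-configuration empirical risks and utilities have already been computed on a held-out calibration split, certifies each risk with a Hoeffding-style p-value for losses bounded in [0, 1], and walks the empirical Pareto front with fixed sequence testing. All names (`pareto_testing`, `risk_emp`, `util_emp`) are illustrative; in the paper, the front and its ordering come from an optimization split disjoint from the calibration data used for testing.

```python
import numpy as np

def hoeffding_pvalue(emp_risk, alpha, n):
    # Super-uniform p-value for H0: true risk > alpha, valid for
    # losses bounded in [0, 1] (Hoeffding's inequality).
    return float(np.exp(-2.0 * n * max(alpha - emp_risk, 0.0) ** 2))

def pareto_front(objs):
    # Indices of non-dominated rows; every column is to be minimized.
    keep = []
    for i in range(len(objs)):
        dominated = any(
            np.all(objs[j] <= objs[i]) and np.any(objs[j] < objs[i])
            for j in range(len(objs)) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

def pareto_testing(risk_emp, util_emp, alphas, n_cal, delta):
    # Stage 1 (optimization): restrict attention to the empirical
    # Pareto front of (risks, -utility) over all configurations.
    objs = np.column_stack([risk_emp, -util_emp])
    front = pareto_front(objs)
    # Stage 2 (fixed sequence testing): order the front from most to
    # least conservative; because the order is fixed before testing,
    # every test may spend the full error budget delta.
    front.sort(key=lambda i: np.max(risk_emp[i] / alphas))
    certified = []
    for i in front:
        pvals = [hoeffding_pvalue(risk_emp[i, k], alphas[k], n_cal)
                 for k in range(len(alphas))]
        if max(pvals) > delta:  # some risk fails to certify: stop early
            break
        certified.append(i)
    # Among certified configurations, return the most useful one,
    # or None if nothing could be certified.
    return max(certified, key=lambda i: util_emp[i], default=None)
```

Every configuration accepted this way controls all constrained risks simultaneously with probability at least 1 - delta, which is the guarantee the abstract refers to.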
Related papers
- Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications [79.53938312089308]
The MIDX-Sampler is a novel adaptive sampling strategy based on an inverted multi-index approach.
Our method is backed by rigorous theoretical analysis, addressing key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds.
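The inverted multi-index is what makes a strong adaptive proposal cheap to sample from; the snippet below is not MIDX itself but the generic sampled-softmax estimator such a proposal plugs into, including the standard -log q logit correction that the sampling-bias analysis concerns. Shapes are assumptions: `W` is a (V, d) output embedding matrix and `h` a (d,) hidden state.

```python
import numpy as np

def sampled_softmax_loss(h, W, target, proposal_p, n_neg, rng):
    # Scoring all V classes is O(V); instead score the target plus
    # n_neg negatives drawn from a proposal distribution q, correcting
    # each logit by -log q(class) to reduce sampling bias.
    V = W.shape[0]
    negatives = rng.choice(V, size=n_neg, replace=True, p=proposal_p)
    classes = np.concatenate(([target], negatives))
    logits = W[classes] @ h - np.log(proposal_p[classes])
    logits -= logits.max()               # numerical stability
    log_z = np.log(np.exp(logits).sum())
    return -(logits[0] - log_z)          # NLL of the target class

# Usage: loss = sampled_softmax_loss(h, W, y, q, 64, np.random.default_rng(0))
```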
arXiv Detail & Related papers (2025-01-15T04:09:21Z)
- Distilling Calibration via Conformalized Credal Inference [36.01369881486141]
One way to enhance reliability is through uncertainty quantification via Bayesian inference.
This paper introduces a low-complexity methodology to address this challenge by distilling calibration information from a more complex model.
Experiments on visual and language tasks demonstrate that the proposed approach, termed Conformalized Distillation for Credal Inference (CD-CI), significantly improves calibration performance.
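CD-CI's credal construction is more involved than plain conformal prediction, but the calibration step that such distillation methods build on can be illustrated with a generic split-conformal sketch (function names are ours, not the paper's API):

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    # Nonconformity score: 1 - predicted probability of the true class.
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected empirical quantile.
    level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, q):
    # Keep every class that clears the calibrated threshold; the set
    # contains the true label with marginal probability >= 1 - alpha.
    return np.where(1.0 - probs <= q)[0]
```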
arXiv Detail & Related papers (2025-01-10T15:57:23Z)
- Vector Optimization with Gaussian Process Bandits [7.049738935364297]
Learning problems in which multiple objectives must be considered simultaneously often arise in various fields, including engineering, drug design, and environmental management.
Traditional methods for handling multiple black-box objective functions are limited in their ability to incorporate preferences over objectives and to explore the solution space accordingly.
We propose Vector Optimization with Gaussian Process (VOGP), a probably approximately correct adaptive elimination algorithm that performs black-box vector optimization using Gaussian process bandits.
arXiv Detail & Related papers (2024-12-03T14:47:46Z)
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
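As rough intuition for what ensembling the weights of specialized single-task models buys, the sketch below linearly blends two parameter dictionaries and sweeps the mixing weight to trace a trade-off curve; the paper's MoE module learns the blending rather than grid-searching a scalar as done here.

```python
import numpy as np

def blend_weights(state_a, state_b, w):
    # Convex combination of two specialized models' parameters.
    return {k: w * state_a[k] + (1.0 - w) * state_b[k] for k in state_a}

def sweep_tradeoff(state_a, state_b, eval_obj1, eval_obj2, steps=11):
    # Evaluate both objectives along the interpolation path; the
    # resulting (obj1, obj2) pairs approximate a Pareto trade-off curve.
    curve = []
    for w in np.linspace(0.0, 1.0, steps):
        params = blend_weights(state_a, state_b, w)
        curve.append((w, eval_obj1(params), eval_obj2(params)))
    return curve
```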
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- Risk-Controlling Model Selection via Guided Bayesian Optimization [35.53469358591976]
We find a configuration that adheres to user-specified limits on certain risks while being useful with respect to other conflicting metrics.
Our method identifies a set of optimal configurations residing in a designated region of interest.
We demonstrate the effectiveness of our approach on a range of tasks with multiple desiderata, including low error rates, equitable predictions, handling spurious correlations, managing rate and distortion in generative models, and reducing computational costs.
arXiv Detail & Related papers (2023-12-04T07:29:44Z)
- Adaptive Batch Sizes for Active Learning: A Probabilistic Numerics Approach [28.815294991377645]
Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation.
This fixed approach is inefficient because the trade-off between cost and speed is dynamic.
We propose a novel probabilistic numerics framework that adaptively changes batch sizes.
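The paper's batch sizes come out of a probabilistic numerics analysis; the toy rule below is only meant to illustrate the trade-off being adapted: grow the batch while estimated gain per unit cost keeps improving, with a fixed per-round overhead that makes very small batches expensive. All inputs are assumed estimates, not quantities the paper defines.

```python
def next_batch_size(gain_estimates, cost_per_label, overhead_per_round):
    # gain_estimates: per-candidate estimated improvement from labeling it.
    # Choose the batch size k that maximizes cumulative estimated gain per
    # unit cost, where each acquisition round also pays a fixed overhead
    # (retraining, setup), so singleton batches are rarely optimal.
    best_k, best_rate = 1, float("-inf")
    cumulative = 0.0
    for k, gain in enumerate(sorted(gain_estimates, reverse=True), start=1):
        cumulative += gain
        rate = cumulative / (k * cost_per_label + overhead_per_round)
        if rate > best_rate:
            best_k, best_rate = k, rate
    return best_k
```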
arXiv Detail & Related papers (2023-06-09T12:17:18Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
The proposed adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
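A bare-bones version of the column-wise iterative pattern, with a tiny automatic model-selection step, might look as follows (assuming numpy and scikit-learn; HyperImpute's actual search space and interfaces are considerably richer):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def iterative_impute(X, n_iters=5):
    # X: float array with np.nan marking missing entries.
    X = X.copy()
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])  # mean warm start
    candidates = [LinearRegression(), RandomForestRegressor(n_estimators=50)]
    for _ in range(n_iters):
        for j in np.where(mask.any(axis=0))[0]:
            obs = ~mask[:, j]
            X_obs = np.delete(X[obs], j, axis=1)
            y_obs = X[obs, j]
            # Automatic per-column model selection by cross-validation.
            model = max(candidates, key=lambda m:
                        cross_val_score(m, X_obs, y_obs, cv=3).mean())
            model.fit(X_obs, y_obs)
            X[mask[:, j], j] = model.predict(
                np.delete(X[mask[:, j]], j, axis=1))
    return X
```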
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
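In symbols, a constrained formulation of the kind the summary describes, together with its Lagrangian (our paraphrase; the paper's exact losses and thresholds may differ):

```latex
\begin{aligned}
&\min_{\theta}\; \mathcal{L}_{\mathrm{u}}(\theta)
 \quad \text{s.t.}\quad \ell(x_i, y_i; \theta) \le \epsilon
 \quad \forall i \in \mathcal{S}_{\mathrm{labeled}}, \\
&\mathcal{L}(\theta, \lambda)
 = \mathcal{L}_{\mathrm{u}}(\theta)
 + \sum_{i \in \mathcal{S}_{\mathrm{labeled}}} \lambda_i
   \bigl(\ell(x_i, y_i; \theta) - \epsilon\bigr),
 \qquad \lambda_i \ge 0.
\end{aligned}
```

Large optimal multipliers flag samples whose constraints bind hardest, which is the kind of duality signal that can score candidates for acquisition.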
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Constrained multi-objective optimization of process design parameters in settings with scarce data: an application to adhesive bonding [48.7576911714538]
Finding the optimal process parameters for an adhesive bonding process is challenging, since experimental evaluations are costly and data are scarce.
Traditional evolutionary approaches (such as genetic algorithms) are therefore ill-suited to solve the problem.
In this research, we successfully applied specific machine learning techniques to emulate the objective and constraint functions.
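The summary does not name the exact learners; the pattern, though, is to replace expensive physical experiments with cheap surrogate calls. A hedged sketch assuming scikit-learn Gaussian processes as the emulators:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def fit_emulators(X, f_vals, g_vals):
    # One GP per expensive function: objective f and constraint g,
    # trained on the few (parameters -> measurement) pairs available.
    f_gp = GaussianProcessRegressor(normalize_y=True).fit(X, f_vals)
    g_gp = GaussianProcessRegressor(normalize_y=True).fit(X, g_vals)
    return f_gp, g_gp

def best_feasible(f_gp, g_gp, candidates, g_max):
    # Screen a cheap candidate grid with the emulators; return the
    # predicted-feasible point with the best predicted objective.
    f_hat = f_gp.predict(candidates)
    g_hat = g_gp.predict(candidates)
    feasible = g_hat <= g_max
    if not feasible.any():
        return None
    return candidates[feasible][np.argmin(f_hat[feasible])]
```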
arXiv Detail & Related papers (2021-12-16T10:14:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.