Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows
- URL: http://arxiv.org/abs/2406.16218v1
- Date: Sun, 23 Jun 2024 21:05:31 GMT
- Title: Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows
- Authors: Ching-An Cheng, Allen Nie, Adith Swaminathan
- Abstract summary: We study a class of optimization problems motivated by automating the design and update of AI systems like coding assistants, robots, and copilots.
We propose an end-to-end optimization framework, which treats the computational workflow of an AI system as a graph akin to neural networks.
- Score: 19.89948665187903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a class of optimization problems motivated by automating the design and update of AI systems like coding assistants, robots, and copilots. We propose an end-to-end optimization framework, Trace, which treats the computational workflow of an AI system as a graph akin to neural networks, based on a generalization of back-propagation. Optimization of computational workflows often involves rich feedback (e.g. console output or user's responses), heterogeneous parameters (e.g. prompts, hyper-parameters, code), and intricate objectives (beyond maximizing a score). Moreover, the computation graph can change dynamically with the inputs and parameters. We frame a new mathematical setup of iterative optimization, Optimization with Trace Oracle (OPTO), to capture and abstract these properties so as to design optimizers that work across many domains. In OPTO, an optimizer receives an execution trace along with feedback on the computed output and updates parameters iteratively. Trace is the tool that implements OPTO in practice: its Python interface efficiently converts a computational workflow into an OPTO instance using PyTorch-like syntax. Using Trace, we develop a general-purpose LLM-based optimizer called OptoPrime that can effectively solve OPTO problems. In empirical studies, we find that OptoPrime is capable of first-order numerical optimization, prompt optimization, hyper-parameter tuning, robot controller design, code debugging, etc., and is often competitive with specialized optimizers for each domain. We believe that Trace, OptoPrime, and the OPTO framework will enable the next generation of interactive agents that automatically adapt using various kinds of feedback. Website: https://microsoft.github.io/Trace
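To make the PyTorch-like interface concrete, here is a minimal sketch of an OPTO loop written against Trace. The names below (`node`, `bundle`, `OptoPrime`, `zero_feedback`, `backward`, `step`) follow the project's public examples, but exact signatures are assumptions; treat this as illustrative rather than authoritative, and consult the Trace repository for the actual API. OptoPrime also assumes an LLM backend is configured (e.g. via an API key).

```python
# Minimal sketch of an OPTO loop with Trace (API names assumed from the
# project's public examples; exact signatures may differ).
from opto.trace import node, bundle
from opto.optimizers import OptoPrime

# A trainable parameter node; it becomes part of the traced graph.
x = node(-8.0, trainable=True)

# `bundle` wraps an ordinary Python function as a traced operator,
# so each call is recorded in the execution trace.
@bundle()
def blackbox(x):
    return -x * 2

optimizer = OptoPrime([x])

for _ in range(10):
    y = blackbox(x)                   # forward pass builds the trace
    feedback = "Smaller is better."   # rich textual feedback, not a gradient
    optimizer.zero_feedback()
    optimizer.backward(y, feedback)   # propagate feedback along the trace
    optimizer.step()                  # LLM proposes new parameter values
```

The loop has the shape of a PyTorch training loop, but `backward` carries an execution trace plus textual feedback instead of numerical gradients, which is what lets the same optimizer update prompts, hyper-parameters, or code.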
Related papers
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - MADA: Meta-Adaptive Optimizers through hyper-gradient Descent [73.1383658672682]
We introduce Meta-Adaptive Optimizers (MADA), a unified framework that can generalize several known optimizers and dynamically learn the most suitable one during training.
We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers.
We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization.
arXiv Detail & Related papers (2024-01-17T00:16:46Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - Learning to Optimize Quasi-Newton Methods [22.504971951262004]
This paper introduces a novel machine learning optimizer called LODO, which tries to meta-learn the best preconditioner online during optimization.
Unlike other L2O methods, LODO does not require any meta-training on a training task distribution.
We show that our learned preconditioner approximates the inverse Hessian in noisy loss landscapes and is capable of representing a wide range of inverse Hessians.
arXiv Detail & Related papers (2022-10-11T03:47:14Z) - Teaching Networks to Solve Optimization Problems [13.803078209630444]
We propose to replace the iterative solvers altogether with a trainable parametric set function.
We show the feasibility of learning such parametric (set) functions to solve various classic optimization problems.
arXiv Detail & Related papers (2022-02-08T19:13:13Z) - Automatic Tuning of Tensorflow's CPU Backend using Gradient-Free Optimization Algorithms [0.6543507682026964]
Deep learning (DL) applications are built using DL libraries and frameworks such as TensorFlow and PyTorch.
These frameworks have complex parameters and tuning them to obtain good training and inference performance is challenging for typical users.
In this paper, we treat the problem of tuning parameters of DL frameworks to improve training and inference performance as a black-box problem.
arXiv Detail & Related papers (2021-09-13T19:10:23Z) - Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
Conservative objective models (COMs) are simple to implement and outperform a number of existing methods on a wide range of model-based optimization (MBO) problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z) - Meta Learning Black-Box Population-Based Optimizers [0.0]
We propose the use of meta-learning to infer population-based black-box optimizers.
We show that the meta-loss function encourages a learned algorithm to alter its search behavior so that it can easily fit into a new context.
arXiv Detail & Related papers (2021-03-05T08:13:25Z) - Transferable Graph Optimizers for ML Compilers [18.353830282858834]
We propose an end-to-end, transferable deep reinforcement learning method for computational graph optimization (GO).
GO generates decisions on the entire graph rather than on each individual node autoregressively, drastically speeding up the search compared to prior methods.
GO achieves 21% improvement over human experts and 18% improvement over the prior state of the art with 15x faster convergence.
arXiv Detail & Related papers (2020-10-21T20:28:33Z) - Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves [53.37905268850274]
We introduce a new, neural-network-parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization.
Most learned optimizers have been trained on only a single task, or a small number of tasks.
We train ours on thousands of tasks, making use of orders of magnitude more compute, resulting in learned optimizers that generalize better to unseen tasks.
arXiv Detail & Related papers (2020-09-23T16:35:09Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper sub-problems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.