Large Language Models Synergize with Automated Machine Learning
- URL: http://arxiv.org/abs/2405.03727v2
- Date: Sat, 11 May 2024 07:49:39 GMT
- Title: Large Language Models Synergize with Automated Machine Learning
- Authors: Jinglue Xu, Jialong Li, Zhen Liu, Nagar Anthel Venkatesh Suryanarayanan, Guoyuan Zhou, Jia Guo, Hitoshi Iba, Kenji Tei,
- Abstract summary: This paper explores a novel form of program synthesis, targeting machine learning (ML) programs, by combining large language models (LLMs) and automated machine learning (autoML)
In experiments, given the textual task description, our method, Text-to-ML, generates the complete and optimized ML program in a fully autonomous process.
- Score: 12.364087286739647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, program synthesis driven by large language models (LLMs) has become increasingly popular. However, program synthesis for machine learning (ML) tasks still poses significant challenges. This paper explores a novel form of program synthesis, targeting ML programs, by combining LLMs and automated machine learning (autoML). Specifically, our goal is to fully automate the generation and optimization of the code of the entire ML workflow, from data preparation to modeling and post-processing, utilizing only textual descriptions of the ML tasks. To manage the length and diversity of ML programs, we propose to break each ML program into smaller, manageable parts. Each part is generated separately by the LLM, with careful consideration of their compatibilities. To ensure compatibilities, we design a testing technique for ML programs. Unlike traditional program synthesis, which typically relies on binary evaluations (i.e., correct or incorrect), evaluating ML programs necessitates more than just binary judgments. Therefore, we further assess ML programs numerically and select the optimal programs from a range of candidates using AutoML methods. In experiments across various ML tasks, our method outperforms existing methods in 10 out of 12 tasks for generating ML programs. In addition, autoML significantly improves the performance of the generated ML programs. In experiments, given the textual task description, our method, Text-to-ML, generates the complete and optimized ML program in a fully autonomous process.
Related papers
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce Bench, a benchmark that challenges Large Language Models to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
arXiv Detail & Related papers (2024-06-22T15:52:04Z) - Verbalized Machine Learning: Revisiting Machine Learning with Language Models [63.10391314749408]
We introduce the framework of verbalized machine learning (VML)
VML constrains the parameter space to be human-interpretable natural language.
We conduct several studies to empirically evaluate the effectiveness of VML.
arXiv Detail & Related papers (2024-06-06T17:59:56Z) - AutoMMLab: Automatically Generating Deployable Models from Language
Instructions for Computer Vision Tasks [39.71649832548044]
AutoMMLab is a general-purpose LLM-empowered AutoML system that follows user's language instructions.
The proposed AutoMMLab system effectively employs LLMs as the bridge to connect AutoML and OpenMMLab community.
Experiments show that our AutoMMLab system is versatile and covers a wide range of mainstream tasks.
arXiv Detail & Related papers (2024-02-23T14:38:19Z) - ML-On-Rails: Safeguarding Machine Learning Models in Software Systems A
Case Study [4.087995998278127]
We introduce ML-On-Rails, a protocol designed to safeguard machine learning models.
ML-On-Rails establishes a well-defined endpoint interface for different ML tasks, and clear communication between ML providers and ML consumers.
We evaluate the protocol through a real-world case study of the MoveReminder application.
arXiv Detail & Related papers (2024-01-12T11:27:15Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks.
This paper presents the first comprehensive MLLM Evaluation benchmark MME.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z) - SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs)
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SATLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z) - LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs(4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z) - Machine Learning for Software Engineering: A Tertiary Study [13.832268599253412]
Machine learning (ML) techniques increase the effectiveness of software engineering (SE) lifecycle activities.
We systematically collected, quality-assessed, summarized, and categorized 83 reviews in ML for SE published between 2009-2022, covering 6,117 primary studies.
The SE areas most tackled with ML are software quality and testing, while human-centered areas appear more challenging for ML.
arXiv Detail & Related papers (2022-11-17T09:19:53Z) - Operationalizing Machine Learning: An Interview Study [13.300075655862573]
We conduct semi-structured interviews with 18 machine learning engineers (MLEs) working across many applications.
Our interviews expose three variables that govern success for a production ML deployment: Velocity, Validation, and Versioning.
We summarize common practices for successful ML experimentation, deployment, and sustaining production performance.
arXiv Detail & Related papers (2022-09-16T16:59:36Z) - MLGO: a Machine Learning Guided Compiler Optimizations Framework [0.0]
This work is the first full integration of machine learning in a complex compiler pass in a real-world setting.
We use two different ML algorithms to train the inlining-for-size model, and achieve up to 7% size reduction.
The same model generalizes well to a diversity of real-world targets, as well as to the same set of targets after months of active development.
arXiv Detail & Related papers (2021-01-13T00:02:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.