Archon: An Architecture Search Framework for Inference-Time Techniques
- URL: http://arxiv.org/abs/2409.15254v5
- Date: Thu, 3 Oct 2024 05:41:48 GMT
- Title: Archon: An Architecture Search Framework for Inference-Time Techniques
- Authors: Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Guha, E. Kelly Buchanan, Mayee Chen, Neel Guha, Christopher RĂ©, Azalia Mirhoseini,
- Abstract summary: Archon is a framework for selecting, combining, and stacking layers of inference-time techniques.
We evaluate Archon architectures across a range of instruction-following, reasoning, and coding benchmarks.
- Score: 31.655124464284523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inference-time techniques are emerging as highly effective tools to enhance large language model (LLM) capabilities. However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of individual inference-time techniques and the interactions between them. Additionally, efficiently and automatically searching the space of model choices, inference-time techniques, and their compositions is challenging due to the large design space. To address these challenges, we introduce Archon, a modular framework for selecting, combining, and stacking layers of inference-time techniques to construct optimized LLM systems for target benchmarks. Rather than relying on a single LLM called once, we leverage a diverse set of LLMs and inference-time techniques, creating LLM systems greater than the sum of their parts. Archon defines an extensible design space, encompassing techniques such as generation ensembling, repeated sampling, ranking, fusion, critiquing, verification, and unit testing. It transforms the problem of building LLM systems into a hyperparameter optimization objective. Given the available LLMs, inference-time techniques, and compute budget, Archon utilizes hyperparameter search techniques to discover optimized architectures for target benchmark(s). We evaluate Archon architectures across a range of instruction-following, reasoning, and coding benchmarks, including MT-Bench, Arena-Hard-Auto, AlpacaEval 2.0, MixEval, MixEval Hard, MATH, and CodeContests. Archon architectures outperform frontier models, such as GPT-4o and Claude 3.5 Sonnet, on these benchmarks, achieving an average accuracy increase of 15.1 percentage points by using all available LLMs. We make our code and datasets available publicly on Github: https://github.com/ScalingIntelligence/Archon.
Related papers
- DesignX: Human-Competitive Algorithm Designer for Black-Box Optimization [11.467054529894497]
We present DesignX, the first automated algorithm design framework that generates an effective specific to a given black-box optimization problem within seconds.<n>A comprehensive modular algorithmic space is first built, embracing hundreds of algorithm components collected from decades of research.<n>Remarkably, through days of autonomous learning, DesignX-generated meta-trainings surpass human-crafted designs.
arXiv Detail & Related papers (2025-05-23T13:16:01Z) - ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity.
This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics.
Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z) - SEKI: Self-Evolution and Knowledge Inspiration based Neural Architecture Search via Large Language Models [11.670056503731905]
We introduce SEKI, a novel large language model (LLM)-based neural architecture search (NAS) method.
Inspired by the chain-of-thought (CoT) paradigm in modern LLMs, SEKI operates in two key stages: self-evolution and knowledge distillation.
arXiv Detail & Related papers (2025-02-27T09:17:49Z) - AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations [3.6231171463908938]
Design space exploration plays a crucial role in enabling custom hardware architectures.<n>Recently, AIrchitect v1, the first attempt to address the limitations of DSE into a search-time classification problem.
arXiv Detail & Related papers (2025-01-17T04:57:42Z) - A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation.<n> deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency.<n>This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z) - Improving Parallel Program Performance Through DSL-Driven Code Generation with LLM Optimizers [9.880183350366792]
Mapping computations to processors and assigning memory are critical for maximizing performance in parallel programming.
These mapping decisions are managed through the development of specialized low-level system code, called mappers, crafted by performance engineers.
We introduce an approach that leverages recent advances in LLM-baseds for mapper design.
In under ten minutes, our method automatically discovers mappers that surpass human expert designs in scientific applications by up to 1.34X speedup.
arXiv Detail & Related papers (2024-10-21T04:08:37Z) - Search for Efficient Large Language Models [52.98684997131108]
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research.
Weight pruning, quantization, and distillation have been embraced to compress LLMs, targeting memory reduction and inference acceleration.
Most model compression techniques concentrate on weight optimization, overlooking the exploration of optimal architectures.
arXiv Detail & Related papers (2024-09-25T21:32:12Z) - Rome was Not Built in a Single Step: Hierarchical Prompting for LLM-based Chip Design [22.70660876673987]
Large Language Models (LLMs) are effective in computer hardware synthesis via hardware description language (HDL) generation.
However, LLM-assisted approaches for HDL generation struggle when handling complex tasks.
We introduce a suite of hierarchical prompting techniques which facilitate efficient stepwise design methods.
arXiv Detail & Related papers (2024-07-23T21:18:31Z) - Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models [8.02264001053969]
Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts.
With constant innovation in LLM serving optimizations and model architecture evolving at breakneck speed, the hardware requirements to meet Service Level Objectives (SLOs) remain an open research question.
We present an analytical tool, GenZ, to efficiently navigate the relationship between diverse LLM model architectures and AI platform design parameters.
arXiv Detail & Related papers (2024-06-03T18:00:50Z) - CRAFT: Customizing LLMs by Creating and Retrieving from Specialized
Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs)
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z) - ArchGym: An Open-Source Gymnasium for Machine Learning Assisted
Architecture Design [52.57999109204569]
ArchGym is an open-source framework that connects diverse search algorithms to architecture simulators.
We evaluate ArchGym across multiple vanilla and domain-specific search algorithms in designing custom memory controller, deep neural network accelerators, and custom SOC for AR/VR workloads.
arXiv Detail & Related papers (2023-06-15T06:41:23Z) - LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization [4.951599300340954]
Large Language Models (LLMs) have emerged as powerful tools capable of accomplishing a broad spectrum of tasks.
We propose using the coding abilities of LLMs to introduce meaningful variations to code defining neural networks.
By merging the code-generating abilities of LLMs with the diversity and robustness of QD solutions, we introduce textttLLMatic, a Neural Architecture Search (NAS) algorithm.
arXiv Detail & Related papers (2023-06-01T19:33:21Z) - POPNASv3: a Pareto-Optimal Neural Architecture Search Solution for Image
and Time Series Classification [8.190723030003804]
This article presents the third version of a sequential model-based NAS algorithm targeting different hardware environments and multiple classification tasks.
Our method is able to find competitive architectures within large search spaces, while keeping a flexible structure and data processing pipeline to adapt to different tasks.
The experiments performed on images and time series classification datasets provide evidence that POPNASv3 can explore a large set of assorted operators and converge to optimal architectures suited for the type of data provided under different scenarios.
arXiv Detail & Related papers (2022-12-13T17:14:14Z) - Multi-Agent Reinforcement Learning for Microprocessor Design Space
Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high-performance and energy-efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z) - Pareto-aware Neural Architecture Generation for Diverse Computational
Budgets [94.27982238384847]
Existing methods often perform an independent architecture search process for each target budget.
We propose a Neural Architecture Generator (PNAG) which only needs to be trained once and dynamically produces the optimal architecture for any given budget via inference.
Such a joint search algorithm not only greatly reduces the overall search cost but also improves the results.
arXiv Detail & Related papers (2022-10-14T08:30:59Z) - FreeREA: Training-Free Evolution-based Architecture Search [17.202375422110553]
FreeREA is a custom cell-based evolution NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures.
Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that i) FreeREA is a fast, efficient, and effective search method for models automatic design.
arXiv Detail & Related papers (2022-06-17T11:16:28Z) - iDARTS: Differentiable Architecture Search with Stochastic Implicit
Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS)
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z) - Pareto-Frontier-aware Neural Architecture Generation for Diverse Budgets [93.79297053429447]
Existing methods often perform an independent architecture search for each target budget.
We propose a general architecture generator that automatically produces effective architectures for an arbitrary budget merely via model inference.
Extensive experiments on three platforms (i.e., mobile, CPU, and GPU) show the superiority of the proposed method over existing NAS methods.
arXiv Detail & Related papers (2021-02-27T13:59:17Z) - Off-Policy Reinforcement Learning for Efficient and Effective GAN
Architecture Search [50.40004966087121]
We introduce a new reinforcement learning based neural architecture search (NAS) methodology for generative adversarial network (GAN) architecture search.
The key idea is to formulate the GAN architecture search problem as a Markov decision process (MDP) for smoother architecture sampling.
We exploit an off-policy GAN architecture search algorithm that makes efficient use of the samples generated by previous policies.
arXiv Detail & Related papers (2020-07-17T18:29:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.