Related papers: Towards LLM-based optimization compilers. Can LLMs learn how to apply a single peephole optimization? Reasoning is all LLMs need!

Towards LLM-based optimization compilers. Can LLMs learn how to apply a single peephole optimization? Reasoning is all LLMs need!

URL: http://arxiv.org/abs/2412.12163v1
Date: Wed, 11 Dec 2024 18:44:31 GMT
Title: Towards LLM-based optimization compilers. Can LLMs learn how to apply a single peephole optimization? Reasoning is all LLMs need!
Authors: Xiangxin Fang, Lev Mukhanov,
Abstract summary: We show that OpenAI GPT-o1, despite not being fine-tuned, outperforms the fine-tuned Llama2 and GPT-4o.<n>Our findings indicate that this advantage is largely due to the chain-of-thought reasoning implemented in GPT-o1.
Score: 0.9054540533394926
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated great potential in various language processing tasks, and recent studies have explored their application in compiler optimizations. However, all these studies focus on the conventional open-source LLMs, such as Llama2, which lack enhanced reasoning mechanisms. In this study, we investigate the errors produced by the fine-tuned 7B-parameter Llama2 model as it attempts to learn and apply a simple peephole optimization for the AArch64 assembly code. We provide an analysis of the errors produced by the LLM and compare it with state-of-the-art OpenAI models which implement advanced reasoning logic, including GPT-4o and GPT-o1 (preview). We demonstrate that OpenAI GPT-o1, despite not being fine-tuned, outperforms the fine-tuned Llama2 and GPT-4o. Our findings indicate that this advantage is largely due to the chain-of-thought reasoning implemented in GPT-o1. We hope our work will inspire further research on using LLMs with enhanced reasoning mechanisms and chain-of-thought for code generation and optimization.

Related papers

GOLLuM: Gaussian Process Optimized LLMs -- Reframing LLM Finetuning through Bayesian Optimization [0.4037357056611557]
Large Language Models (LLMs) can encode complex relationships in their latent spaces. We introduce LLM-based deep kernels, jointly optimized with GPs to preserve the benefits of both. Our method nearly doubles the discovery rate of high-performing reactions compared to static LLM embeddings.
arXiv Detail & Related papers (2025-04-08T17:59:57Z)
Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective [18.791753740931185]
OpenAI o3-mini and DeepSeek-R1 use enhanced reasoning through Chain-of-Thought (CoT) This paper investigates whether reasoning LLMs can address challenges in High-Level Synthesis (HLS) design space exploration and optimization.
arXiv Detail & Related papers (2025-03-17T01:21:39Z)
LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs. We introduce LLM2, a novel framework that combines an LLM with a process-based verifier. LLMs2 is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z)
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding [74.31981011985681]
Large language models (LLMs) have shown impressive capabilities, but still struggle with complex reasoning tasks requiring multiple steps. We introduce LaTent Reasoning Optimization (LaTRO), a principled framework that formulates reasoning as sampling from a latent distribution. We validate LaTRO through experiments on GSM8K and ARC-Challenge datasets using multiple model architectures.
arXiv Detail & Related papers (2024-11-06T22:02:30Z)
LLM-based Optimization of Compound AI Systems: A Survey [64.39860384538338]
In a compound AI system, components such as an LLM call, a retriever, a code interpreter, or tools are interconnected. Recent advancements enable end-to-end optimization of these parameters using an LLM. This paper presents a survey of the principles and emerging trends in LLM-based optimization of compound AI systems.
arXiv Detail & Related papers (2024-10-21T18:06:25Z)
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling [62.19438812624467]
Large language models (LLMs) have exhibited their problem-solving abilities in mathematical reasoning. We propose OptiBench, a benchmark for End-to-end optimization problem-solving with human-readable inputs and outputs.
arXiv Detail & Related papers (2024-07-13T13:27:57Z)
LLaMoCo: Instruction Tuning of Large Language Models for Optimization Code Generation [26.975412742800614]
We introduce LLaMoCo, the first instruction-tuning framework designed to adapt large language models for solving optimization problems in a code-to-code manner. Specifically, we establish a comprehensive instruction set containing well-described problem prompts and effective optimization codes. Experiment results demonstrate that a CodeGen (350M) model fine-tuned by our LLaMoCo achieves superior optimization performance compared to GPT-4 Turbo.
arXiv Detail & Related papers (2024-03-02T08:21:59Z)
Are Large Language Models Good Prompt Optimizers? [65.48910201816223]
We conduct a study to uncover the actual mechanism of LLM-based Prompt Optimization. Our findings reveal that the LLMs struggle to identify the true causes of errors during reflection, tending to be biased by their own prior knowledge. We introduce a new "Automatic Behavior Optimization" paradigm, which directly optimize the target model's behavior in a more controllable manner.
arXiv Detail & Related papers (2024-02-03T09:48:54Z)
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs) Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models. We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond [29.778018058541676]
GPT-Fathom is an open-source and reproducible evaluation suite for large language models (LLMs) built on top of OpenAI Evals. We evaluate 10+ leading LLMs as well as OpenAI's legacy models on 20+ curated benchmarks across 7 capability categories, all aligned under settings.
arXiv Detail & Related papers (2023-09-28T16:43:35Z)
Learning to Optimize: A Primer and A Benchmark [94.29436694770953]
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods. This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization.
arXiv Detail & Related papers (2021-03-23T20:46:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.