Related papers: From Memorization to Creativity: LLM as a Designer of Novel Neural-Architectures

URL: http://arxiv.org/abs/2601.02997v1
Date: Tue, 06 Jan 2026 13:20:28 GMT
Title: From Memorization to Creativity: LLM as a Designer of Novel Neural-Architectures
Authors: Waleed Khalid, Dmitry Ignatov, Radu Timofte
Abstract summary: Large language models (LLMs) excel in program synthesis, yet their ability to autonomously navigate neural architecture design--balancing reliability, performance, and structural novelty--remains underexplored. We address this by placing a code-oriented LLM within a closed-loop synthesis framework, analyzing its evolution over 22 supervised fine-tuning cycles.
Score: 48.83701310501069
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) excel in program synthesis, yet their ability to autonomously navigate neural architecture design--balancing syntactic reliability, performance, and structural novelty--remains underexplored. We address this by placing a code-oriented LLM within a closed-loop synthesis framework, analyzing its evolution over 22 supervised fine-tuning cycles. The model synthesizes PyTorch convolutional networks which are validated, evaluated via low-fidelity performance signals (single-epoch accuracy), and filtered using a MinHash-Jaccard criterion to prevent structural redundancy. High-performing, novel architectures are converted into prompt-code pairs for iterative fine-tuning via parameter-efficient LoRA adaptation, initialized from the LEMUR dataset. Across cycles, the LLM internalizes empirical architectural priors, becoming a robust generator. The valid generation rate stabilizes at 50.6 percent (peaking at 74.5 percent), while mean first-epoch accuracy rises from 28.06 percent to 50.99 percent, and the fraction of candidates exceeding 40 percent accuracy grows from 2.04 percent to 96.81 percent. Analyses confirm the model moves beyond replicating existing motifs, synthesizing 455 high-performing architectures absent from the original corpus. By grounding code synthesis in execution feedback, this work provides a scalable blueprint for transforming stochastic generators into autonomous, performance-driven neural designers, establishing that LLMs can internalize empirical, non-textual rewards to transcend their training data.
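The MinHash-Jaccard filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how code is shingled, how long the signatures are, or which similarity cutoff is used, so the token 3-gram shingling, the 64-slot signature, the 0.8 threshold, and every function name below are assumptions chosen for illustration, not the authors' implementation.

```python
import hashlib
import re

NUM_PERM = 64     # assumed signature length (not given in the abstract)
SHINGLE = 3       # assumed token-shingle width (not given in the abstract)
THRESHOLD = 0.8   # assumed similarity cutoff (not given in the abstract)


def shingles(code: str, k: int = SHINGLE) -> set[str]:
    """Split source code into overlapping k-token shingles."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _hash(seed: int, item: str) -> int:
    """Seeded 64-bit hash; each seed acts as one MinHash 'permutation'."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(items: set[str], num_perm: int = NUM_PERM) -> list[int]:
    """Signature: the minimum hash of the set under each seeded hash."""
    return [min(_hash(seed, s) for s in items) for seed in range(num_perm)]


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """The fraction of matching slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_novel(candidates: list[str], threshold: float = THRESHOLD) -> list[str]:
    """Keep a candidate only if it is below the threshold against all kept ones."""
    kept: list[str] = []
    signatures: list[list[int]] = []
    for code in candidates:
        sig = minhash(shingles(code))
        if all(estimated_jaccard(sig, s) < threshold for s in signatures):
            kept.append(code)
            signatures.append(sig)
    return kept
```

In this sketch, a generated architecture that is a near-copy of one already kept is discarded, while a structurally different one is retained. A production pipeline would more likely compare candidates against the whole training corpus and use an indexed lookup rather than the pairwise scan shown here.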

Related papers

From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs [48.83701310501069]
Large language models (LLMs) have achieved notable performance in code synthesis. We introduce a performance-aware, closed-loop solution that enables LLMs to autonomously engineer optimal transformations. We fine-tune LLMs with Low-Rank Adaptation on a novel repository of more than 6,000 empirically evaluated PyTorch augmentation functions.
arXiv Detail & Related papers (2026-01-07T11:13:02Z)
NNGPT: Rethinking AutoML with Large Language Models [36.90850535125572]
NNGPT is an open-source framework that turns a large language model (LLM) into a self-improving AutoML engine for neural network development. It integrates within one unified workflow five synergistic LLM-based pipelines: zero-shot architecture synthesis, hyperparameter optimization, code-aware accuracy/early-stop prediction, and reinforcement learning. The system has already generated over 5K validated models, proving NNGPT as an autonomous AutoML engine.
arXiv Detail & Related papers (2025-11-25T14:10:44Z)
wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation [1.2929845407528824]
We introduce wa-hls4ml, a benchmark for ML accelerator resource and latency estimation. We also introduce GNN- and transformer-based surrogate models that predict latency and resources for ML accelerators.
arXiv Detail & Related papers (2025-11-06T17:18:13Z)
QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code [52.66657751895655]
Large Language Models (LLMs) offer a compelling new paradigm: Neural Compilation. This paper introduces NeuComBack, a novel benchmark dataset specifically designed for IR-to-assembly compilation. We propose a self-evolving prompt optimization method that enables LLMs to evolve their internal prompt strategies.
arXiv Detail & Related papers (2025-11-03T03:20:26Z)
Beyond Single LLMs: Enhanced Code Generation via Multi-Stage Performance-Guided LLM Orchestration [12.674888937998086]
Large Language Models (LLMs) have become the predominant paradigm for automated code generation. This paper challenges the single-model convention by introducing a multi-stage, performance-guided orchestration framework. Perch orchestrates top-performing LLMs for each task context through stage-wise validation and rollback mechanisms.
arXiv Detail & Related papers (2025-10-01T19:07:16Z)
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation [52.2244588424002]
We present RoboTwin 2.0, a scalable framework for automated, large-scale generation of diverse and realistic data. At its core is RoboTwin-OD, an object library of 731 instances across 147 categories with semantic and manipulation-relevant annotations. To improve sim-to-real transfer, RoboTwin 2.0 applies structured domain randomization along five axes.
arXiv Detail & Related papers (2025-06-22T16:26:53Z)
ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics. Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z)
Few-Shot Optimized Framework for Hallucination Detection in Resource-Limited NLP Systems [1.0124625066746595]
We introduce DeepSeek Few-shot optimization to enhance weak label generation through iterative prompt engineering. We achieve high-quality annotations that considerably enhanced the performance of downstream models. We further fine-tuned the Mistral-7B-Instruct-v0.3 model on these optimized annotations, enabling it to accurately detect hallucinations in resource-limited settings.
arXiv Detail & Related papers (2025-01-28T01:26:22Z)
rule4ml: An Open-Source Tool for Resource Utilization and Latency Estimation for ML Models on FPGA [0.0]
This paper introduces a novel method to predict the resource utilization and inference latency of Neural Networks (NNs) before their synthesis and implementation on FPGA. We leverage HLS4ML, a tool-flow that helps translate NNs into high-level synthesis (HLS) code. Our method uses trained regression models for immediate pre-synthesis predictions.
arXiv Detail & Related papers (2024-08-09T19:35:10Z)
Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives. We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis. We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
TarGEN: Targeted Data Generation with Large Language Models [51.87504111286201]
TarGEN is a multi-step prompting strategy for generating high-quality synthetic datasets. We augment TarGEN with a method known as self-correction empowering LLMs to rectify inaccurately labeled instances. A comprehensive analysis of the synthetic dataset compared to the original dataset reveals similar or higher levels of dataset complexity and diversity.
arXiv Detail & Related papers (2023-10-27T03:32:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.