Related papers: LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding

URL: http://arxiv.org/abs/2509.05657v3
Date: Thu, 25 Sep 2025 05:43:31 GMT
Title: LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding
Authors: Yuxuan Hu, Jihao Liu, Ke Wang, Jinliang Zhen, Weikang Shi, Manyuan Zhang, Qi Dou, Rui Liu, Aojun Zhou, Hongsheng Li
Abstract summary: LM-Searcher is a novel framework for cross-domain neural architecture optimization. Central to our approach is NCode, a universal numerical string representation for neural architectures. Our dataset, encompassing a wide range of architecture-performance pairs, encourages robust and transferable learning.
Score: 55.5535016040221
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent progress in Large Language Models (LLMs) has opened new avenues for solving complex optimization problems, including Neural Architecture Search (NAS). However, existing LLM-driven NAS approaches rely heavily on prompt engineering and domain-specific tuning, limiting their practicality and scalability across diverse tasks. In this work, we propose LM-Searcher, a novel framework that leverages LLMs for cross-domain neural architecture optimization without the need for extensive domain-specific adaptation. Central to our approach is NCode, a universal numerical string representation for neural architectures, which enables cross-domain architecture encoding and search. We also reformulate the NAS problem as a ranking task, training LLMs to select high-performing architectures from candidate pools using instruction-tuning samples derived from a novel pruning-based subspace sampling strategy. Our curated dataset, encompassing a wide range of architecture-performance pairs, encourages robust and transferable learning. Comprehensive experiments demonstrate that LM-Searcher achieves competitive performance in both in-domain (e.g., CNNs for image classification) and out-of-domain (e.g., LoRA configurations for segmentation and generation) tasks, establishing a new paradigm for flexible and generalizable LLM-based architecture search. The datasets and models will be released at https://github.com/Ashone3/LM-Searcher.
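As a rough illustration of the two ideas in the abstract (a numerical string encoding and search framed as ranking a candidate pool), here is a minimal Python sketch. It is not the authors' NCode format or training setup: the `encode` scheme, the prompt wording, and `placeholder_selector` (a simple nearest-to-best heuristic standing in for the instruction-tuned LLM) are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def encode(choices: Sequence[int]) -> str:
    """Hypothetical encoding: one integer choice index per design slot,
    joined into a space-separated numerical string."""
    return " ".join(str(c) for c in choices)


def build_ranking_prompt(history: List[Tuple[str, float]],
                         candidates: List[str]) -> str:
    """Format evaluated (code, score) pairs and a candidate pool as text.
    The wording is illustrative, not the prompt used in the paper."""
    lines = ["Evaluated architectures (code -> score):"]
    for code, score in sorted(history, key=lambda p: p[1], reverse=True):
        lines.append(f"{code} -> {score:.2f}")
    lines.append("Candidates:")
    for i, code in enumerate(candidates):
        lines.append(f"[{i}] {code}")
    lines.append("Answer with the index of the most promising candidate.")
    return "\n".join(lines)


def placeholder_selector(history: List[Tuple[str, float]],
                         candidates: List[str]) -> int:
    """Stand-in for the LLM ranker (an assumption, not the real model):
    pick the candidate that differs in the fewest slots from the
    best-scoring architecture seen so far."""
    best_code = max(history, key=lambda p: p[1])[0].split()

    def distance(code: str) -> int:
        return sum(a != b for a, b in zip(code.split(), best_code))

    return min(range(len(candidates)), key=lambda i: distance(candidates[i]))


# Toy data: four design slots, scores are made-up accuracies.
history = [(encode([0, 2, 1, 3]), 91.5), (encode([1, 1, 0, 2]), 88.0)]
candidates = [encode([1, 1, 1, 2]), encode([0, 2, 1, 1]), encode([3, 0, 0, 0])]

prompt = build_ranking_prompt(history, candidates)
choice = placeholder_selector(history, candidates)
print(prompt)
print("selected candidate index:", choice)
```

Because every architecture is reduced to a string of integers, the same prompt format could in principle describe a CNN cell or a LoRA configuration; only the meaning of each slot changes.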

Related papers

Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models [48.83701310501069]
Large Language Models (LLMs) offer a transformative approach to Neural Architecture Search (NAS). We formulate the search as a sequence of conditional code generation tasks, where an LLM refines architectural specifications based on performance telemetry. We generate a vast corpus of valid, shape-consistent architectures via Abstract Syntax Tree (AST) mutations. Experimental results on CIFAR-100 validate the efficacy of this approach, demonstrating that the model yields statistically significant improvements in accuracy.
arXiv Detail & Related papers (2026-01-13T13:00:30Z)
LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding [6.576358106930216]
Designing state encoders for reinforcement learning with multiple information sources remains underexplored and often requires manual design. We formalize this challenge as a problem of composite neural architecture search (NAS), where multiple source-specific modules and a fusion module are jointly optimized. We propose an LLM-driven NAS pipeline in which the LLM serves as a neural architecture design agent, leveraging language-model priors and intermediate-output signals.
arXiv Detail & Related papers (2025-12-07T20:25:07Z)
ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics. Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z)
Instructing the Architecture Search for Spatial-temporal Sequence Forecasting with LLM [18.649295352998546]
We propose a novel NAS method for STSF based on large language models (LLMs). Our method can achieve competitive effectiveness with superior efficiency against existing NAS methods for STSF.
arXiv Detail & Related papers (2025-03-23T08:59:04Z)
SEKI: Self-Evolution and Knowledge Inspiration based Neural Architecture Search via Large Language Models [11.670056503731905]
We introduce SEKI, a novel large language model (LLM)-based neural architecture search (NAS) method. Inspired by the chain-of-thought (CoT) paradigm in modern LLMs, SEKI operates in two key stages: self-evolution and knowledge distillation.
arXiv Detail & Related papers (2025-02-27T09:17:49Z)
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback [52.763620660061115]
ONI is a distributed architecture that simultaneously learns an RL policy and an intrinsic reward function. We explore a range of algorithmic choices for reward modeling with varying complexity. Our approach achieves state-of-the-art performance across a range of challenging tasks from the NetHack Learning Environment.
arXiv Detail & Related papers (2024-10-30T13:52:43Z)
Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation [66.72195610471624]
Cross-Domain Sequential Recommendation aims to mine and transfer users' sequential preferences across different domains. We propose a novel framework named URLLM, which aims to improve the CDSR performance by exploring the User Retrieval approach.
arXiv Detail & Related papers (2024-06-05T09:19:54Z)
LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model. This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization [4.951599300340954]
Large Language Models (LLMs) have emerged as powerful tools capable of accomplishing a broad spectrum of tasks. We propose using the coding abilities of LLMs to introduce meaningful variations to code defining neural networks. By merging the code-generating abilities of LLMs with the diversity and robustness of QD solutions, we introduce LLMatic, a Neural Architecture Search (NAS) algorithm.
arXiv Detail & Related papers (2023-06-01T19:33:21Z)
Elastic Architecture Search for Diverse Tasks with Different Resources [87.23061200971912]
We study a new challenging problem of efficient deployment for diverse tasks with different resources, where the resource constraint and task of interest corresponding to a group of classes are dynamically specified at testing time. Previous NAS approaches seek to design architectures for all classes simultaneously, which may not be optimal for some individual tasks. We present a novel and general framework, called Elastic Architecture Search (EAS), permitting instant specializations at runtime for diverse tasks with various resource constraints.
arXiv Detail & Related papers (2021-08-03T00:54:27Z)
MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning [71.90902837008278]
We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL). In order to adapt to different task combinations, we disentangle the GP-MTL networks into single-task backbones. We also propose a novel single-shot gradient-based search algorithm that closes the performance gap between the searched architectures.
arXiv Detail & Related papers (2020-03-31T09:49:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.