A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks
- URL: http://arxiv.org/abs/2512.04329v1
- Date: Wed, 03 Dec 2025 23:28:30 GMT
- Title: A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks
- Authors: Waleed Khalid, Dmitry Ignatov, Radu Timofte
- Abstract summary: We introduce NN-RAG, a retrieval-augmented generation system that converts large, heterogeneous PyTorch codebases into a searchable library of validated neural modules. Applied to 19 major repositories, the pipeline extracted 1,289 candidate blocks, validated 941 (73.0%), and demonstrated that over 80% are structurally unique.
- Score: 48.83701310501069
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reusing existing neural-network components is central to research efficiency, yet discovering, extracting, and validating such modules across thousands of open-source repositories remains difficult. We introduce NN-RAG, a retrieval-augmented generation system that converts large, heterogeneous PyTorch codebases into a searchable and executable library of validated neural modules. Unlike conventional code search or clone-detection tools, NN-RAG performs scope-aware dependency resolution, import-preserving reconstruction, and validator-gated promotion -- ensuring that every retrieved block is scope-closed, compilable, and runnable. Applied to 19 major repositories, the pipeline extracted 1,289 candidate blocks, validated 941 (73.0%), and demonstrated that over 80% are structurally unique. Through multi-level de-duplication (exact, lexical, structural), we find that NN-RAG contributes the overwhelming majority of unique architectures to the LEMUR dataset, supplying approximately 72% of all novel network structures. Beyond quantity, NN-RAG uniquely enables cross-repository migration of architectural patterns, automatically identifying reusable modules in one project and regenerating them, dependency-complete, in another context. To our knowledge, no other open-source system provides this capability at scale. The framework's neutral specifications further allow optional integration with language models for synthesis or dataset registration without redistributing third-party code. Overall, NN-RAG transforms fragmented vision code into a reproducible, provenance-tracked substrate for algorithmic discovery, offering a first open-source solution that both quantifies and expands the diversity of executable neural architectures across repositories.
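The abstract's multi-level de-duplication (exact, lexical, structural) can be sketched as follows. This is a minimal illustration of the three levels, not the paper's implementation: the function names, the SHA-256 hashing choice, and the set-based filtering are all assumptions for the sake of a runnable example.

```python
# Three-level de-duplication sketch: exact (raw bytes), lexical (token
# stream, ignoring comments/whitespace), structural (AST node-type shape,
# ignoring identifiers and literals).
import ast
import hashlib
import io
import tokenize


def exact_key(source: str) -> str:
    """Level 1: hash the raw source text byte-for-byte."""
    return hashlib.sha256(source.encode()).hexdigest()


def lexical_key(source: str) -> str:
    """Level 2: hash the token strings, dropping comments and layout tokens."""
    skip = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT)
    toks = [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type not in skip]
    return hashlib.sha256(" ".join(toks).encode()).hexdigest()


def structural_key(source: str) -> str:
    """Level 3: hash the sequence of AST node types, so renamed variables
    and changed constants still collide."""
    shape = " ".join(type(node).__name__ for node in ast.walk(ast.parse(source)))
    return hashlib.sha256(shape.encode()).hexdigest()


def dedup(blocks: list[str]) -> list[str]:
    """Keep a block only if it is new at every level."""
    seen: set[str] = set()
    unique: list[str] = []
    for src in blocks:
        keys = (exact_key(src), lexical_key(src), structural_key(src))
        if not any(k in seen for k in keys):
            unique.append(src)
        seen.update(keys)
    return unique
```

Under this scheme, a block that merely renames variables or edits comments collides at the structural level and is filtered out, while a genuinely different computation survives all three checks.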
Related papers
- Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition [53.50448142467294]
RAIM is a multi-design, architecture-aware framework for repository-level feature addition. It shifts away from linear patching by generating multiple diverse implementation designs. Experiments on the NoCode-bench Verified dataset demonstrate that RAIM achieves new state-of-the-art performance.
arXiv Detail & Related papers (2026-03-02T12:50:40Z)
- Unified Implementations of Recurrent Neural Networks in Multiple Deep Learning Frameworks [0.5187177298223502]
torchrecurrent, RecurrentLayers.jl, and LuxRecurrentLayers.jl offer a consistent framework for constructing and extending RNN models. All packages are available under the MIT license and actively maintained on GitHub.
arXiv Detail & Related papers (2025-10-24T08:35:33Z)
- ONNX-Net: Towards Universal Representations and Instant Performance Prediction for Neural Architectures [60.14199724905456]
ONNX-Bench is a benchmark consisting of a collection of neural networks in a unified format based on ONNX files. ONNX-Net represents any neural architecture using a natural language description that acts as input to a performance predictor. Experiments show strong zero-shot performance across disparate search spaces using only a small number of pretraining samples.
arXiv Detail & Related papers (2025-10-06T15:43:36Z)
- LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding [55.5535016040221]
LM-Searcher is a novel framework for cross-domain neural architecture optimization. Central to our approach is NCode, a universal numerical string representation for neural architectures. Our dataset, encompassing a wide range of architecture-performance pairs, encourages robust and transferable learning.
arXiv Detail & Related papers (2025-09-06T09:26:39Z)
- Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers [103.4410890572479]
We introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification. LoongBench is a curated seed dataset containing 8,729 human-vetted examples across 12 domains. LoongEnv is a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples.
arXiv Detail & Related papers (2025-09-03T06:42:40Z)
- Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning [2.981775461282335]
A common approach involves using Variational Autoencoders (VAEs) to map discrete architectures onto a continuous representation space. We introduce a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a discrete latent space more naturally aligned with the discrete nature of neural architectures. Compared to VAE-based methods, our approach improves the generation of valid and unique architectures by over 80% on NASBench-101 and over 8% on NASBench-201.
arXiv Detail & Related papers (2025-03-28T00:56:56Z)
- GNN-Coder: Boosting Semantic Code Retrieval with Combined GNNs and Transformer [15.991615273248804]
We introduce GNN-Coder, a novel framework based on Graph Neural Networks (GNNs) that utilizes the Abstract Syntax Tree (AST). GNN-Coder significantly boosts retrieval performance, with a 1%-10% improvement in MRR on the CSN dataset and a notable 20% gain in zero-shot performance on the CosQA dataset.
arXiv Detail & Related papers (2025-02-21T04:29:53Z)
- A novel Region of Interest Extraction Layer for Instance Segmentation [3.5493798890908104]
This paper is motivated by the need to overcome the limitations of existing RoI extractors.
The proposed layer, called Generic RoI Extractor (GRoIE), introduces non-local building blocks and attention mechanisms to boost performance.
GRoIE can be integrated seamlessly with every two-stage architecture for both object detection and instance segmentation tasks.
arXiv Detail & Related papers (2020-04-28T17:07:32Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.