From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs
- URL: http://arxiv.org/abs/2601.03808v1
- Date: Wed, 07 Jan 2026 11:13:02 GMT
- Title: From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs
- Authors: Usha Shrestha, Dmitry Ignatov, Radu Timofte
- Abstract summary: Large language models (LLMs) have achieved notable performance in code synthesis. We introduce a performance-aware, closed-loop solution that enables LLMs to autonomously engineer optimal transformations. We fine-tune LLMs with Low-Rank Adaptation on a novel repository of more than 6,000 empirically evaluated PyTorch augmentation functions.
- Score: 48.83701310501069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have achieved notable performance in code synthesis; however, data-aware augmentation remains a limiting factor, handled via heuristic design or brute-force approaches. We introduce a performance-aware, closed-loop solution in the NNGPT ecosystem of projects that enables LLMs to autonomously engineer optimal transformations by internalizing empirical performance cues. We fine-tune LLMs with Low-Rank Adaptation on a novel repository of more than 6,000 empirically evaluated PyTorch augmentation functions, each annotated solely by downstream model accuracy. Training uses pairwise performance ordering (better-worse transformations), enabling alignment through empirical feedback without reinforcement learning, reward models, or symbolic objectives. This reduces the need for exhaustive search, requiring up to 600x fewer evaluated candidates than brute-force discovery while maintaining competitive peak accuracy and shifting generation from random synthesis to task-aligned design. Ablation studies show that structured Chain-of-Thought prompting introduces syntactic noise and degrades performance, whereas direct prompting ensures stable optimization in performance-critical code tasks. Qualitative and quantitative analyses demonstrate that the model internalizes semantic performance cues rather than memorizing syntax. These results show that LLMs can exhibit task-level reasoning through non-textual feedback loops, bypassing explicit symbolic rewards.
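Since the abstract describes alignment through pairwise performance ordering without RL or reward models, here is a minimal sketch of one plausible reading: a margin ranking loss over sequence log-likelihoods of the better and worse augmentation functions. The loss form, helper names, and Hugging-Face-style model interface are illustrative assumptions, not the paper's published objective.

```python
# Hypothetical sketch of pairwise performance-ordering fine-tuning (assumed
# loss form; the paper does not specify this exact objective here).
import torch
import torch.nn.functional as F

def sequence_logprob(model, tokenizer, prompt: str, completion: str) -> torch.Tensor:
    """Sum of token log-probabilities of `completion` given `prompt` for a
    Hugging-Face-style causal LM; assumes tokenization splits cleanly at the
    prompt/completion boundary."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    logits = model(full_ids).logits[:, :-1]   # position i predicts token i+1
    targets = full_ids[:, 1:]
    logps = torch.log_softmax(logits, dim=-1).gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return logps[:, prompt_len - 1:].sum()    # completion tokens only

def pairwise_ordering_loss(model, tokenizer, prompt, better_src, worse_src, margin=1.0):
    """Rank the empirically better augmentation function above the worse one."""
    lp_better = sequence_logprob(model, tokenizer, prompt, better_src)
    lp_worse = sequence_logprob(model, tokenizer, prompt, worse_src)
    return F.relu(margin - (lp_better - lp_worse))
```

In this reading, the repository's better-worse pairs, ordered solely by downstream accuracy, supply the training signal, and the LoRA adapters absorb the gradient updates.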
Related papers
- Quantum Circuit Generation via test-time learning with large language models [0.0]
Large language models (LLMs) can generate structured artifacts, but using them as dependable tools for scientific design requires a mechanism for iterative improvement under black-box evaluation. Here, we cast quantum circuit synthesis as a closed-loop, test-time optimization problem: an LLM proposes edits to a fixed-length gate list, and an external simulator evaluates the resulting state with the Meyer-Wallach (MW) global entanglement measure. We introduce a lightweight test-time learning recipe that reuses prior high-performing candidates as an explicit memory trace, augments prompts with score-difference feedback, and applies restart-from-...
arXiv Detail & Related papers (2026-02-03T12:41:25Z)
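A minimal sketch of the closed loop just described, assuming a hypothetical `propose_edit` (the LLM call) and `simulate` (the statevector simulator); only the Meyer-Wallach computation follows a standard published form, Q = 2(1 - (1/n) Σ_k Tr ρ_k²).

```python
# Sketch of LLM-in-the-loop circuit search scored by Meyer-Wallach entanglement.
import numpy as np

def meyer_wallach(state: np.ndarray, n_qubits: int) -> float:
    """Q = 2 * (1 - mean_k Tr(rho_k^2)) for a normalized n-qubit statevector."""
    psi = state.reshape([2] * n_qubits)
    purity_sum = 0.0
    for k in range(n_qubits):
        m = np.moveaxis(psi, k, 0).reshape(2, -1)   # qubit k vs. all others
        rho_k = m @ m.conj().T                      # single-qubit reduced state
        purity_sum += float(np.real(np.trace(rho_k @ rho_k)))
    return 2.0 * (1.0 - purity_sum / n_qubits)

def optimize_circuit(initial, propose_edit, simulate, n_qubits, budget=50):
    """Closed test-time loop: keep the best gate list as the memory trace and
    feed its score back to the proposer (propose_edit/simulate are assumed)."""
    best, best_q = initial, meyer_wallach(simulate(initial), n_qubits)
    for _ in range(budget):
        candidate = propose_edit(best, feedback=best_q)   # LLM-proposed edit
        q = meyer_wallach(simulate(candidate), n_qubits)  # black-box evaluation
        if q > best_q:
            best, best_q = candidate, q
    return best, best_q
```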
- Rectifying LLM Thought from Lens of Optimization [48.98086817378953]
Long chain-of-thought (CoT) prompting enables thorough exploration and deliberation. Despite these advances, long-CoT LLMs often exhibit suboptimal reasoning behaviors. We introduce RePro, a novel approach that refines LLM reasoning during post-training.
arXiv Detail & Related papers (2025-12-01T17:41:08Z)
- LaSeR: Reinforcement Learning with Last-Token Self-Rewarding [54.72617309922891]
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a core paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). Previous practice requires the LLM to sequentially generate solutions and self-verifications using two separate prompt templates, which significantly reduces efficiency. We propose LaSeR (Reinforcement Learning with Last-Token Self-Rewarding), an algorithm that simply augments the original RLVR loss with an MSE loss.
arXiv Detail & Related papers (2025-10-16T17:55:11Z)
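Taking the LaSeR abstract at face value, the objective reduces to one extra regression term; the weighting `beta` and the way the last-token self-reward is extracted are assumptions in the sketch below, not the paper's exact formulation.

```python
# Hedged sketch: RLVR loss plus an MSE term that teaches the model's own last
# token to predict the verifiable reward.
import torch
import torch.nn.functional as F

def laser_loss(rlvr_loss: torch.Tensor,
               last_token_self_reward: torch.Tensor,
               verifier_reward: torch.Tensor,
               beta: float = 0.1) -> torch.Tensor:
    """Augment the original RLVR objective with a self-rewarding MSE term."""
    return rlvr_loss + beta * F.mse_loss(last_token_self_reward, verifier_reward)
```

At inference time, the same last token would then serve as a free self-verification signal, removing the second prompting pass the abstract says reduces efficiency.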
- MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems. We build a benchmark consisting of 1,260 samples across 42 challenging synthetic tasks. We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z)
- Hierarchical Verification of Speculative Beams for Accelerating LLM Inference [0.0]
The Hierarchical Verification Tree (HVT) is a novel framework that restructures speculative beam decoding by prioritizing high-likelihood drafts. HVT consistently outperforms existing speculative decoding schemes, achieving substantial reductions in inference time and energy consumption. These findings highlight the potential of hierarchical verification strategies as a new direction for accelerating large language model inference.
arXiv Detail & Related papers (2025-07-30T02:58:03Z)
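The HVT abstract gives only the high-level idea, so the toy sketch below improvises a data layout: draft branches in a max-heap keyed by draft likelihood, so the target model verifies the most promising branch first and rejected subtrees are pruned without further calls. None of this is the paper's actual structure or API.

```python
# Toy priority-ordered verification of a speculative draft tree (layout assumed).
import heapq
from itertools import count

def verify_tree(root_drafts, target_accepts):
    """root_drafts: iterable of (likelihood, tokens, children) nodes, with
    children having the same shape. Returns the longest accepted sequence."""
    ticket = count()                                  # heap tie-breaker
    heap = [(-lik, next(ticket), toks, kids) for lik, toks, kids in root_drafts]
    heapq.heapify(heap)
    best = []
    while heap:
        _, _, toks, kids = heapq.heappop(heap)        # most likely branch first
        if not target_accepts(toks):                  # one target-model check
            continue                                  # prune branch and subtree
        if len(toks) > len(best):
            best = toks
        for lik, child_toks, grandkids in kids:
            heapq.heappush(heap, (-lik, next(ticket), child_toks, grandkids))
    return best
```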
- The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency. UPFT removes the need for labeled data or exhaustive sampling. Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z)
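A minimal sketch of the UPFT recipe as the summary states it, assuming a Hugging-Face-style tokenizer and a caller-supplied `generate`; the 32-token prefix budget is an illustrative placeholder, not the paper's number.

```python
# Sketch: keep only the first few tokens of one unlabeled sample per question,
# then fine-tune on those prefixes (no labels, no exhaustive sampling).
def build_prefix_dataset(questions, generate, tokenizer, prefix_tokens=32):
    dataset = []
    for q in questions:
        response = generate(q)                               # single sample, no reward
        ids = tokenizer(response).input_ids[:prefix_tokens]  # truncate to the prefix
        dataset.append({"prompt": q, "target": tokenizer.decode(ids)})
    return dataset  # feed to any standard supervised fine-tuning loop
```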
- PEARL: Towards Permutation-Resilient LLMs [29.55886726376898]
The in-context learning (ICL) capability of large language models (LLMs) enables them to perform challenging tasks using provided demonstrations. However, ICL is highly sensitive to the ordering of demonstrations, leading to instability in predictions. This paper shows that this vulnerability can be exploited to design a natural attack that achieves a nearly 80% success rate on LLaMA-3.
arXiv Detail & Related papers (2025-02-20T15:07:02Z)
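The attack side of the PEARL abstract is easy to picture: the brute-force sketch below permutes the same demonstrations until the model's answer flips. PEARL's actual contribution is the resilience training, which this sketch does not implement; `predict` is a stand-in for one LLM call.

```python
# Illustrative permutation attack on in-context demonstration ordering.
from itertools import permutations

def find_adversarial_order(demos, query, predict, reference_answer):
    """Return a demonstration ordering that changes the model's answer, if any."""
    for order in permutations(demos):
        prompt = "\n".join(order) + "\n" + query
        if predict(prompt) != reference_answer:   # ordering alone flips the output
            return order
    return None                                   # no flipping order found
```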
- RaCT: Ranking-aware Chain-of-Thought Optimization for LLMs [30.216174551427443]
Large language models (LLMs) have demonstrated remarkable potential in text reranking tasks. However, conventional supervised fine-tuning approaches for specializing LLMs in ranking tasks often lead to significant degradation of the models' general-purpose abilities. This paper presents a novel methodology that strategically combines Chain-of-Thought (CoT) prompting techniques with an innovative two-stage training pipeline.
arXiv Detail & Related papers (2024-12-18T23:24:15Z)
- In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. To handle these challenges, a direct solution is to generate "high-confidence" data from unsupervised downstream tasks. We propose a novel approach, the pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.