Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
- URL: http://arxiv.org/abs/2504.04785v1
- Date: Mon, 07 Apr 2025 07:27:31 GMT
- Title: Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
- Authors: Fan Nie, Lan Feng, Haotian Ye, Weixin Liang, Pan Lu, Huaxiu Yao, Alexandre Alahi, James Zou
- Abstract summary: This paper proposes Weak-for-Strong Harnessing (W4S), a novel framework that customizes smaller, cost-efficient language models to design and optimize workflows for harnessing stronger models. W4S formulates workflow design as a multi-turn Markov decision process and introduces reinforcement learning for agentic workflow optimization (RLAO) to train a weak meta-agent. Empirical results demonstrate the superiority of W4S: our 7B meta-agent, trained with just one GPU hour, outperforms the strongest baseline by 2.9% ~ 24.6% across eleven benchmarks.
- Score: 104.5401871607713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficiently leveraging the capabilities of contemporary large language models (LLMs) is increasingly challenging, particularly when direct fine-tuning is expensive and often impractical. Existing training-free methods, including manually or automatically designed workflows, typically demand substantial human effort or yield suboptimal results. This paper proposes Weak-for-Strong Harnessing (W4S), a novel framework that customizes smaller, cost-efficient language models to design and optimize workflows for harnessing stronger models. W4S formulates workflow design as a multi-turn Markov decision process and introduces reinforcement learning for agentic workflow optimization (RLAO) to train a weak meta-agent. Through iterative interaction with the environment, the meta-agent learns to design increasingly effective workflows without manual intervention. Empirical results demonstrate the superiority of W4S: our 7B meta-agent, trained with just one GPU hour, outperforms the strongest baseline by 2.9% ~ 24.6% across eleven benchmarks, successfully elevating the performance of state-of-the-art models such as GPT-3.5-Turbo and GPT-4o. Notably, W4S exhibits strong generalization capabilities across both seen and unseen tasks, offering an efficient, high-performing alternative to directly fine-tuning strong models.
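Concretely, the described loop can be pictured as follows: the state is the task plus the history of workflows tried and their feedback, the action is the next candidate workflow emitted by the weak meta-agent, and the reward is the strong executor's validation performance on that workflow. Below is a minimal sketch of this interaction loop under those assumptions; the agent, executor, and workflow space are illustrative stand-ins, not the paper's implementation.

```python
import random

def weak_meta_agent(state, history):
    """Stand-in for the trained weak meta-agent: proposes the next
    workflow (here, just a prompt template) given past feedback."""
    templates = [
        "Answer directly: {q}",
        "Think step by step, then answer: {q}",
        "Draft an answer, critique it, then revise it: {q}",
    ]
    return random.choice(templates)

def strong_executor(workflow, task):
    """Stand-in for the strong executor (e.g. GPT-4o) running the workflow
    on validation data; returns a simulated accuracy as the reward."""
    return random.uniform(0.5, 1.0)  # placeholder for real evaluation

def collect_rlao_trajectory(task="math_qa", num_turns=5):
    """One episode of the multi-turn MDP: state = (task, feedback history),
    action = candidate workflow, reward = validation accuracy."""
    history, trajectory = [], []
    for turn in range(num_turns):
        workflow = weak_meta_agent(task, history)
        reward = strong_executor(workflow, task)
        trajectory.append((list(history), workflow, reward))
        history.append((workflow, reward))  # feedback conditions the next design
        print(f"turn {turn}: reward={reward:.3f} workflow={workflow!r}")
    return trajectory

if __name__ == "__main__":
    collect_rlao_trajectory()
```

In RLAO proper, the logged (state, action, reward) triples would drive an offline policy update of the meta-agent; the sketch only collects them.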
Related papers
- EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models [64.18350535770357]
We propose an automatic pruning method for large vision-language models to enhance the efficiency of multimodal reasoning.
Our approach only leverages a small number of samples to search for the desired pruning policy.
We conduct extensive experiments on the ScienceQA, VizWiz, MM-Vet, and LLaVA-Bench datasets for the task of visual question answering.
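As summarized, the core of the method is a search over pruning policies scored on only a handful of samples. Here is a hedged sketch of such a sample-efficient policy search; the policy space, proxy scorer, and hyperparameters are assumptions, not the paper's design.

```python
import random

def proxy_score(policy, samples):
    """Hypothetical proxy: accuracy of the pruned model on a few held-out
    VQA samples. Simulated here; in practice this runs real inference."""
    return random.gauss(sum(policy) / len(policy), 0.05)

def search_pruning_policy(num_layers=8, num_samples=16, num_candidates=32):
    """Sample-efficient search: propose per-layer keep-ratios, score each
    candidate on a small sample set, and keep the best one."""
    samples = list(range(num_samples))  # placeholder for real data
    best_policy, best_score = None, float("-inf")
    for _ in range(num_candidates):
        policy = [random.choice([0.5, 0.75, 1.0]) for _ in range(num_layers)]
        score = proxy_score(policy, samples)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score

print(search_pruning_policy())
```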
arXiv Detail & Related papers (2025-03-19T16:07:04Z)
- MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models [34.138699712315]
This paper introduces a novel vision-language-action (VLA) model, Mixture of Robotic Experts (MoRE), for quadruped robots. MoRE integrates multiple low-rank adaptation modules as distinct experts within a dense multi-modal large language model. Experiments demonstrate that MoRE outperforms all baselines across six different skills and exhibits superior generalization capabilities in out-of-distribution scenarios.
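A compact sketch of the "LoRA modules as distinct experts" idea attached to a frozen dense layer follows; the dense softmax gating, ranks, and dimensions are assumptions, and MoRE's actual routing may differ.

```python
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """One low-rank adapter: x -> B(A(x)), with rank r much smaller than d."""
    def __init__(self, d_in, d_out, r=8):
        super().__init__()
        self.A = nn.Linear(d_in, r, bias=False)
        self.B = nn.Linear(r, d_out, bias=False)
        nn.init.zeros_(self.B.weight)  # experts start as a zero update

    def forward(self, x):
        return self.B(self.A(x))

class MixtureOfLoRA(nn.Module):
    """Gated mixture of LoRA experts around a frozen base linear layer."""
    def __init__(self, base: nn.Linear, num_experts=4, r=8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the dense backbone stays frozen
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features, r)
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(base.in_features, num_experts)

    def forward(self, x):
        w = torch.softmax(self.gate(x), dim=-1)                     # (..., E)
        deltas = torch.stack([e(x) for e in self.experts], dim=-1)  # (..., d_out, E)
        return self.base(x) + (deltas * w.unsqueeze(-2)).sum(dim=-1)

layer = MixtureOfLoRA(nn.Linear(64, 64))
out = layer(torch.randn(2, 64))  # shape (2, 64)
```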
arXiv Detail & Related papers (2025-03-11T03:13:45Z)
- PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation [68.17081518640934]
We propose a PrImitive-driVen waypOinT-aware world model for Robotic manipulation (PIVOT-R).
PIVOT-R consists of a Waypoint-aware World Model (WAWM) and a lightweight action prediction module.
Our PIVOT-R outperforms state-of-the-art open-source models on the SeaWave benchmark, achieving an average relative improvement of 19.45% across four levels of instruction tasks.
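Read literally, the entry describes a two-module pipeline: a waypoint-aware world model that plans at low frequency and a lightweight head that maps the current observation and the cached waypoint to an action at every control step. A toy sketch under those assumptions; the dimensions, networks, and asynchronous schedule are invented for illustration.

```python
import torch
import torch.nn as nn

class WaypointWorldModel(nn.Module):
    """Stand-in WAWM: predicts the next primitive waypoint embedding from
    the current observation and instruction embeddings."""
    def __init__(self, d=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, obs, instr):
        return self.net(torch.cat([obs, instr], dim=-1))

class ActionHead(nn.Module):
    """Lightweight action predictor: maps (observation, waypoint) to a
    low-dimensional continuous action at every control step."""
    def __init__(self, d=128, action_dim=7):
        super().__init__()
        self.net = nn.Linear(2 * d, action_dim)

    def forward(self, obs, waypoint):
        return self.net(torch.cat([obs, waypoint], dim=-1))

wm, head = WaypointWorldModel(), ActionHead()
obs, instr = torch.randn(1, 128), torch.randn(1, 128)
waypoint = wm(obs, instr)     # refreshed every K steps (low frequency)
action = head(obs, waypoint)  # computed each step against the cached waypoint
```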
arXiv Detail & Related papers (2024-10-14T11:30:18Z)
- Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework [0.0]
Auto-Evolve is a novel framework that enables Large Language Models to self-create dynamic reasoning modules.
We evaluate Auto-Evolve on the challenging BigBench-Hard dataset with Claude 2.0, Claude 3 Sonnet, Mistral Large, and GPT-4.
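The self-created "dynamic reasoning modules" can be approximated by a two-stage prompting loop: the model first writes a problem-specific reasoning plan, then solves the problem by following it. A minimal sketch; both prompts are illustrative, not the paper's.

```python
def auto_evolve(llm, question):
    """Two-stage toy loop: the model first authors a problem-specific
    reasoning plan, then solves the problem by following its own plan.
    `llm` is any callable mapping a prompt string to a text response."""
    plan = llm(
        "Devise a short, numbered reasoning plan (3-5 steps) tailored to "
        f"this problem. Do not solve it yet.\n\nProblem: {question}"
    )
    answer = llm(
        "Follow this reasoning plan step by step, then state the final "
        f"answer.\n\nPlan:\n{plan}\n\nProblem: {question}"
    )
    return plan, answer
```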
arXiv Detail & Related papers (2024-10-08T20:07:47Z)
- Patched MOA: optimizing inference for diverse software development tasks [1.14219428942199]
This paper introduces Patched MOA, an inference optimization technique that significantly enhances the performance of large language models (LLMs).
We evaluate three inference optimization algorithms - Best of N, Mixture of Agents, and Monte Carlo Tree Search.
We demonstrate that Patched MOA can boost the performance of smaller models to surpass that of larger, more expensive models.
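Of the three algorithms, Best of N is the simplest to sketch: sample several candidate completions and keep the one a scorer prefers. A minimal version, assuming caller-supplied `llm` and `score` callables (hypothetical names, not the library's API):

```python
def best_of_n(llm, score, prompt, n=5, temperature=0.8):
    """Best of N: sample n candidate completions and return the one the
    scorer prefers (e.g. a reward model, or unit tests for code tasks)."""
    candidates = [llm(prompt, temperature=temperature) for _ in range(n)]
    return max(candidates, key=score)
```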
arXiv Detail & Related papers (2024-07-26T05:34:34Z)
- Compact Language Models via Pruning and Knowledge Distillation [61.56557874432008]
Minitron models exhibit up to a 16% improvement in MMLU scores compared to training from scratch.
Deriving 8B and 4B models from an already pretrained 15B model using our approach requires up to 40x fewer training tokens per model compared to training from scratch.
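The retraining step in a prune-then-distill pipeline typically blends a soft loss against the teacher's tempered logits with the usual hard-label loss. Here is a standard knowledge-distillation objective as a sketch; the temperature and mixing weight are assumptions, not Minitron's exact recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of soft and hard losses for retraining a pruned student:
    KL divergence against the teacher's tempered distribution plus the
    usual cross-entropy on ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard loss's magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```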
arXiv Detail & Related papers (2024-07-19T21:47:57Z)
- VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought [38.03704123835915]
ICAL iteratively refines suboptimal trajectories into high-quality data with optimized actions and detailed reasoning. ICAL surpasses state-of-the-art in TEACh, VisualWebArena, and Ego4D. ICAL scales 2x better than raw human demonstrations and reduces manual prompt engineering.
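The refinement step can be pictured as a loop in which the model rewrites each raw trajectory into a corrected, annotated example that joins a growing in-context memory. A toy sketch; the prompt and memory policy are illustrative, not ICAL's.

```python
def ical_loop(llm, trajectories, memory=None):
    """Toy abstraction loop: each raw (possibly suboptimal) trajectory is
    rewritten into a corrected, annotated example and appended to a
    reusable in-context memory. `llm` is any prompt -> text callable."""
    memory = memory or []
    for traj in trajectories:
        prompt = (
            "Rewrite this agent trajectory: fix suboptimal actions and add "
            "a brief rationale for each step.\n\n"
            f"Trajectory:\n{traj}\n\nRecent refined examples:\n{memory[-3:]}"
        )
        memory.append(llm(prompt))  # distilled experience, reused as exemplars
    return memory
```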
arXiv Detail & Related papers (2024-06-20T17:45:02Z)
- InternLM2 Technical Report [159.70692271378581]
This paper introduces InternLM2, an open-source Large Language Model (LLM) that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks.
The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types.
InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages.
arXiv Detail & Related papers (2024-03-26T00:53:24Z)
- Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training [32.154166415680066]
Methods like distillation, compression or quantization help leverage the highly performant large models to induce smaller performant ones.
This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment.
arXiv Detail & Related papers (2024-02-07T17:07:41Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
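To make the 0.5% figure concrete, here is a back-of-the-envelope parameter count for a generic rank-r adapter per linear layer. This is a low-rank stand-in for illustration only, not PETAL's mode-approximation decomposition.

```python
import torch.nn as nn

def adapter_budget(model: nn.Module, r=4):
    """Illustrates the parameter budget only: freeze the backbone and count
    the trainable weights a rank-r update per Linear layer would add."""
    total = sum(p.numel() for p in model.parameters())
    for p in model.parameters():
        p.requires_grad = False  # backbone stays frozen
    extra = sum(
        r * (m.in_features + m.out_features)
        for m in model.modules() if isinstance(m, nn.Linear)
    )
    return extra / total  # small fraction for small r and large layer widths

print(f"{adapter_budget(nn.Sequential(nn.Linear(4096, 4096))):.2%}")
```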
arXiv Detail & Related papers (2023-12-16T17:13:08Z)