An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation
- URL: http://arxiv.org/abs/2505.20182v1
- Date: Mon, 26 May 2025 16:25:38 GMT
- Authors: Shubham Gandhi, Atharva Naik, Yiqing Xie, Carolyn Rose
- Abstract summary: We study cost-efficient collaboration between strong and weak language models for repository-level code generation. We evaluate a broad spectrum of collaboration strategies: context-based, pipeline-based, and dynamic. Our most effective collaborative strategy achieves equivalent performance to the strong model while reducing the cost by 40%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study cost-efficient collaboration between strong and weak language models for repository-level code generation, where the weak model handles simpler tasks at lower cost, and the most challenging tasks are delegated to the strong model. While many works propose architectures for this task, few analyze performance relative to cost. We evaluate a broad spectrum of collaboration strategies: context-based, pipeline-based, and dynamic, on GitHub issue resolution. Our most effective collaborative strategy achieves equivalent performance to the strong model while reducing the cost by 40%. Based on our findings, we offer actionable guidelines for choosing collaboration strategies under varying budget and performance constraints. Our results show that strong-weak collaboration substantially boosts the weak model's performance at a fraction of the cost, with pipeline- and context-based methods being most efficient. We release the code for our work at https://github.com/shubhamrgandhi/codegen-strong-weak-collab.
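The delegation rule described in the abstract (weak model first, strong model only for hard tasks) can be sketched as a confidence-gated cascade. This is an illustrative simplification, not the paper's implementation: the function names, the stub models, and the length-based confidence heuristic are all hypothetical stand-ins for real LLM calls and routers.

```python
# Hypothetical sketch of a dynamic strong-weak cascade: the weak model answers
# first, and the task escalates to the strong model only when a confidence
# check fails. All model calls are stubbed; a real system would call LLM APIs.
from typing import Callable

def cascade(task: str,
            weak: Callable[[str], tuple[str, float]],
            strong: Callable[[str], str],
            threshold: float = 0.7) -> tuple[str, str]:
    """Return (answer, model_used). Escalate when weak confidence < threshold."""
    answer, confidence = weak(task)
    if confidence >= threshold:
        return answer, "weak"
    return strong(task), "strong"

# Stub models standing in for real LLM calls (purely illustrative).
def weak_model(task: str) -> tuple[str, float]:
    # Pretend short task descriptions are "easy" and answered confidently.
    return f"weak-fix({task})", 0.9 if len(task) < 20 else 0.3

def strong_model(task: str) -> str:
    return f"strong-fix({task})"

print(cascade("typo in README", weak_model, strong_model))
print(cascade("refactor the async scheduler deadlock", weak_model, strong_model))
```

The cost saving comes from the fraction of tasks resolved on the cheap path; the threshold trades accuracy against spend, which mirrors the budget/performance guidelines the paper discusses.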
Related papers
- MoCo: A One-Stop Shop for Model Collaboration Research [132.52160996841505]
We present MoCo: a one-stop Python library for executing, benchmarking, and comparing model collaboration algorithms at scale. MoCo features 26 model collaboration methods, spanning diverse levels of cross-model information exchange. Extensive experiments with MoCo demonstrate that most collaboration strategies outperform models without collaboration. We envision MoCo as a valuable toolkit to facilitate and turbocharge the quest for an open, modular, decentralized, and collaborative AI future.
arXiv Detail & Related papers (2026-01-29T04:36:52Z) - Multi-task Code LLMs: Data Mix or Model Merge? [5.741318641887549]
We compare two approaches for creating small, multi-task code LLMs: data mixing versus model merging. Our evaluation on HumanEval, MBPP, and CodeXGlue benchmarks reveals that model merging achieves the best overall performance at larger scale.
arXiv Detail & Related papers (2026-01-28T23:06:09Z) - CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning [12.710191300398924]
We introduce CoDA, a reinforcement learning framework that decouples high-level planning from low-level execution. CoDA achieves significant performance improvements over state-of-the-art baselines on complex multi-hop question-answering benchmarks.
arXiv Detail & Related papers (2025-12-14T14:41:29Z) - Collaborative LLM Inference via Planning for Efficient Reasoning [50.04696654679751]
We propose a test-time collaboration framework in which a planner model first generates a plan, defined as a distilled and high-level abstraction of the problem. Small and large models take turns acting as planner and reasoner, exchanging plans in a multi-round cascade to collaboratively solve complex tasks. Our method achieves accuracy comparable to strong proprietary models alone, while significantly reducing reliance on paid inference.
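The multi-round plan-then-reason loop described above can be sketched as follows. This is a minimal illustration, not the paper's method: the planner/reasoner stubs and the acceptance rule are hypothetical.

```python
# Illustrative sketch of a plan-then-reason cascade: a small planner drafts a
# high-level plan, a larger reasoner executes it, and the plan is refined over
# rounds until the reasoner accepts an answer. Models are stubbed.
def plan_then_reason(problem, planner, reasoner, rounds=2):
    plan = None
    answer = None
    for _ in range(rounds):
        plan = planner(problem, plan)           # distill or refine a plan
        answer, done = reasoner(problem, plan)  # execute the current plan
        if done:
            break
    return answer

# Stub models for demonstration only.
def planner(problem, prior_plan):
    return f"plan({problem})" if prior_plan is None else f"refined-{prior_plan}"

def reasoner(problem, plan):
    # Accept once the plan has been refined at least once.
    return f"answer-via-{plan}", plan.startswith("refined")

print(plan_then_reason("2-hop question", planner, reasoner))
```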
arXiv Detail & Related papers (2025-06-13T08:35:50Z) - CoLA: Collaborative Low-Rank Adaptation [3.421904493396495]
Fine-tuning a pre-trained model for specific tasks achieves strong performance; however, it is computationally expensive and inefficient. LoRA, in particular, has proven effective, but its application to multi-task scenarios is limited by interference between tasks. We propose CoLA, a more flexible LoRA architecture and three collaborative strategies to enhance performance by better utilizing the quantitative relationships between $A$ and $B$.
arXiv Detail & Related papers (2025-05-21T12:46:42Z) - MergeBench: A Benchmark for Merging Domain-Specialized LLMs [19.49737955489798]
We introduce MergeBench, a comprehensive evaluation suite designed to assess model merging at scale. MergeBench builds on state-of-the-art open-source language models, including Llama and Gemma families at 2B to 9B scales. We assess eight representative merging methods across multi-task performance, forgetting, and runtime efficiency.
arXiv Detail & Related papers (2025-05-16T04:02:55Z) - Synergistic Weak-Strong Collaboration by Aligning Preferences [53.47675666475273]
Current Large Language Models (LLMs) excel in general reasoning yet struggle with specialized tasks requiring proprietary or domain-specific knowledge. We propose a collaborative framework that pairs a specialized weak model with a general strong model. We find that the collaboration significantly outperforms each model alone by leveraging complementary strengths.
arXiv Detail & Related papers (2025-04-21T15:57:33Z) - Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute [54.22256089592864]
This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths.
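The repeated-sampling-then-voting idea with multiple models reduces to pooling samples from every model and taking the majority answer. A minimal sketch, with hypothetical stub samplers in place of real LLMs:

```python
# Minimal sketch of multi-model repeated sampling with majority voting:
# each model contributes several samples and the most frequent answer wins.
from collections import Counter

def multi_model_vote(question, models, samples_per_model=3):
    ballots = []
    for sample in models.values():
        for i in range(samples_per_model):
            ballots.append(sample(question, i))
    return Counter(ballots).most_common(1)[0][0]

# Stubs: a strong model that is always right, a weak one that sometimes is.
models = {
    "strong": lambda q, i: "42",
    "weak":   lambda q, i: "42" if i == 0 else "41",
}
print(multi_model_vote("answer?", models))  # majority of 6 ballots is "42"
```

Even a partly wrong weak model adds correct ballots, which is the complementary-strengths effect the abstract describes.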
arXiv Detail & Related papers (2025-04-01T13:13:43Z) - Reward-Guided Speculative Decoding for Efficient LLM Reasoning [80.55186052123196]
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD incorporates a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness. RSD delivers significant efficiency gains over decoding with the target model only, while achieving significantly better accuracy than parallel decoding methods on average.
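A heavily simplified, hedged sketch of the reward-guided acceptance idea: a cheap draft model proposes a token, a reward model scores it, and the expensive target model is consulted only when the score falls below a threshold. This is not RSD's actual algorithm; the single-token step, threshold rule, and stub models are illustrative assumptions.

```python
# Simplified sketch of reward-guided acceptance in speculative decoding.
# All three models are stubbed; a real system would run LLM forward passes.
def rsd_step(prefix, draft, target, reward, tau=0.5):
    candidate = draft(prefix)
    if reward(prefix, candidate) >= tau:
        return candidate, "draft"    # cheap path: keep the draft token
    return target(prefix), "target"  # fall back to the strong target model

# Hypothetical stubs for demonstration.
draft  = lambda prefix: prefix[-1] * 2 if prefix else "a"
target = lambda prefix: "T"
reward = lambda prefix, tok: 0.9 if len(tok) <= 2 else 0.1

print(rsd_step("ab", draft, target, reward))            # draft accepted
print(rsd_step("ab", draft, target, reward, tau=0.95))  # escalates to target
```

Raising the threshold trades speed for fidelity to the target model, which is the controlled bias the abstract contrasts with strictly unbiased speculative decoding.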
arXiv Detail & Related papers (2025-01-31T17:19:57Z) - Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [72.10987117380584]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. We find existing methods discard task-specific information that, while causing conflicts, is crucial for performance. Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z) - On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion [23.63688816017186]
Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge.
We propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task.
Our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios.
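The core operation behind logit fusion is a weighted sum of the small models' logits before decoding; "dynamic" means the weights are chosen per input. A minimal sketch under that assumption, with toy expert logits standing in for real model outputs:

```python
# Illustrative sketch of logit fusion: combine logits from task-specific
# small models with per-input weights, then decode greedily from the fused
# scores. The experts and weights here are hypothetical toy values.
def fuse_logits(logit_lists, weights):
    assert len(logit_lists) == len(weights)
    fused = [0.0] * len(logit_lists[0])
    for logits, w in zip(logit_lists, weights):
        for i, v in enumerate(logits):
            fused[i] += w * v
    return fused

def greedy_token(fused):
    return max(range(len(fused)), key=lambda i: fused[i])

# Two small "experts" over a 4-token vocabulary; the dynamic weights happen
# to favor the second expert for this input.
expert_a = [2.0, 0.5, 0.1, 0.0]
expert_b = [0.1, 0.2, 3.0, 0.0]
fused = fuse_logits([expert_a, expert_b], weights=[0.3, 0.7])
print(greedy_token(fused))
```

A static-ratio method would fix the weights once; computing them per input is what lets different specialized models dominate on different tasks.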
arXiv Detail & Related papers (2024-06-17T03:07:41Z) - Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training [32.154166415680066]
Methods like distillation, compression or quantization help leverage the highly performant large models to induce smaller performant ones.
This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment.
arXiv Detail & Related papers (2024-02-07T17:07:41Z) - Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate, a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.