When Many-Shot Prompting Fails: An Empirical Study of LLM Code Translation
- URL: http://arxiv.org/abs/2510.16809v1
- Date: Sun, 19 Oct 2025 12:29:13 GMT
- Title: When Many-Shot Prompting Fails: An Empirical Study of LLM Code Translation
- Authors: Amirkia Rafiei Oskooei, Kaan Baturalp Cosdan, Husamettin Isiktas, Mehmet S. Aktas,
- Abstract summary: Large Language Models (LLMs) with vast context windows offer new avenues for in-context learning (ICL)<n>We systematically evaluate the impact of scaling in-context examples from zero-shot to many-shot configurations of up to 625 examples.<n>Our findings reveal a "many-shot paradox": while static similarity metrics may modestly improve with more examples, functional correctness consistently peaks with few-shot prompting.
- Score: 0.23332469289621782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) with vast context windows offer new avenues for in-context learning (ICL), where providing many examples ("many-shot" prompting) is often assumed to enhance performance. We investigate this assumption for the complex task of code translation. Through a large-scale empirical study of over 90,000 translations, we systematically evaluate the impact of scaling in-context examples from zero-shot to many-shot configurations of up to 625 examples, with prompts spanning from approximately 100,000 to 800,000 tokens. Our findings reveal a "many-shot paradox": while static similarity metrics may modestly improve with more examples, functional correctness consistently peaks with few-shot prompting (5-25 examples). Providing substantially more examples often degrades this crucial functional performance. This study highlights that for code translation, the quality of a few well-chosen examples outweighs sheer quantity, challenging the universal efficacy of "more is better" for ICL and underscoring the task-dependent nature of optimal prompting strategies. Our results have significant implications for effectively leveraging LLMs in software engineering.
Related papers
- On Selecting Few-Shot Examples for LLM-based Code Vulnerability Detection [8.460805514983816]
Large language models (LLMs) have demonstrated impressive capabilities for many coding tasks.<n> detecting code vulnerabilities remains a challenging task for LLMs.<n>In-context learning (ICL) provides few-shot examples similar to the query, along with correct answers.
arXiv Detail & Related papers (2025-10-31T17:41:58Z) - From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation [18.988069926846357]
Many-shot in-context learning (ICL) can lead to performance benefits, but it is unclear what aspects dominate the benefits and whether simply scaling to more examples is the most effective way of improving ICL.<n>We propose BRIDGE, an algorithm that alternates between the optimize step with Bayesian optimization to discover the influential sets of examples and the generate step to reuse this set to expand the reasoning paths of the examples back to the many-shot regime automatically.<n>On Gemini, Claude, and Mistral LLMs of different sizes, we show that BRIDGE to significant improvements across a diverse set of tasks, including
arXiv Detail & Related papers (2025-02-01T06:23:24Z) - PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks [57.86928556668849]
Large Language Models (LLMs) have recently demonstrated impressive few-shot learning capabilities through in-context learning (ICL)<n>ICL performance is highly dependent on the choice of few-shot demonstrations, making the selection of the most optimal examples a persistent research challenge.<n>In this work, we propose PromptRefine, a novel Alternating Minimization approach for example selection that improves ICL performance on low-resource Indic languages.
arXiv Detail & Related papers (2024-12-07T17:51:31Z) - Many-Shot In-Context Learning [58.395589302800566]
Large language models (LLMs) excel at few-shot in-context learning (ICL)
We observe significant performance gains across a wide variety of generative and discriminative tasks.
Unlike few-shot learning, many-shot learning is effective at overriding pretraining biases.
arXiv Detail & Related papers (2024-04-17T02:49:26Z) - ParaICL: Towards Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.<n>Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.<n>We propose a novel method named parallel in-context learning (ParaICL)
arXiv Detail & Related papers (2024-03-31T05:56:15Z) - RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches and learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z) - In-context Example Selection with Influences [8.058815264255152]
In-context learning (ICL) is a powerful paradigm emerged from large language models (LLMs)
In this work, we use $textitin-context influences$ to analyze few-shot ICL performance directly from in-context examples.
arXiv Detail & Related papers (2023-02-21T22:47:45Z) - Prompting Large Language Model for Machine Translation: A Case Study [87.88120385000666]
We offer a systematic study on prompting strategies for machine translation.
We examine factors for prompt template and demonstration example selection.
We explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning.
arXiv Detail & Related papers (2023-01-17T18:32:06Z) - Structured Prompting: Scaling In-Context Learning to 1,000 Examples [78.41281805608081]
We introduce structured prompting that breaks the length limit and scales in-context learning to thousands of examples.
Specifically, demonstration examples are separately encoded with well-designed position embeddings, and then they are jointly attended by the test example using a rescaled attention mechanism.
arXiv Detail & Related papers (2022-12-13T16:31:21Z) - Large Language Models are Zero-Shot Reasoners [28.6899375595088]
Chain of thought (CoT) prompting is a technique for eliciting complex multi-step reasoning through step-by-step answer examples.
We show that LLMs are decent zero-shot reasoners by simply adding Let's think step by step'' before each answer.
Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances.
arXiv Detail & Related papers (2022-05-24T09:22:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.