Related papers: Beyond the Surface: A Solution-Aware Retrieval Model for Competition-level Code Generation

Beyond the Surface: A Solution-Aware Retrieval Model for Competition-level Code Generation

URL: http://arxiv.org/abs/2509.01129v1
Date: Mon, 01 Sep 2025 04:47:51 GMT
Title: Beyond the Surface: A Solution-Aware Retrieval Model for Competition-level Code Generation
Authors: Shiwen Zhang, Lingxiang Wang, Hainan Zhang, Ziwei Wang, Sijia Wen, Zhiming Zheng,
Abstract summary: We propose SolveRank, a solution-aware ranking model empowered by synthetic data for competitive programming tasks.<n>Specifically, we leverage the DeepSeek-R1 model to generate equivalent but differently phrased new problems, verified by GPT-4o for solution consistency.<n>During inference, SolveRank retrieves relevant problems and corresponding code from the corpus to assist a downstream code generator.
Score: 12.805684772054684
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In competitive programming task, problem statements are often embedded within elaborate narrative backgrounds, requiring deep understanding of the underlying solutions to successfully complete the tasks. Current code generation models primarily focus on token-level semantic modeling, highly susceptible to distractions from irrelevant narrative statements. Inspired by RAG, retrieving reference code with similar solutions may help enhance model performance on difficult problems. However, existing retrieval models also emphasize surface-level semantic similarity, neglecting the deeper solution-level logical similarities that are critical in competitive programming. Therefore, designing ranking models capable of accurately identifying and retrieving problems and corresponding codes remains an urgent research problem in competitive code generation. In this paper, we propose SolveRank, a solution-aware ranking model empowered by synthetic data for competitive programming tasks. Specifically, we leverage the DeepSeek-R1 model to generate logically equivalent but differently phrased new problems, verified by GPT-4o for solution consistency. Then, we train SolveRank with these as positive samples and BM25/random-retrieved problems as negatives. During inference, SolveRank retrieves relevant problems and corresponding code from the corpus to assist a downstream code generator. Experiments on the xCodeEval dataset demonstrate that SolveRank outperforms SOTA ranking methods in precision and recall metrics, and boosts code generation performance for difficult problems.

Related papers

Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks [41.75017840131367]
Large language models (LLMs) have shown impressive promise in code generation.<n>We present a scalable synthetic data generation pipeline that produces nearly 800k instruction-reasoning-code-test quadruplets.
arXiv Detail & Related papers (2025-10-27T10:54:25Z)
Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models [44.17697803306198]
We introduce textitCodeSeq, a synthetic post-training dataset built from number sequences.<n>Our pipeline generates supervised fine data by reflecting on failed test cases and incorporating iterative corrections.<n> Experimental results show that the models trained with textitCodeSeq improve on various reasoning tasks and can preserve the models' OOD performance.
arXiv Detail & Related papers (2025-10-16T12:29:40Z)
SciML Agents: Write the Solver, Not the Solution [69.5021018644143]
We introduce two new datasets: a diagnostic dataset of adversarial "misleading" problems; and a large-scale benchmark of 1,000 diverse ODE tasks.<n>We evaluate open- and closed-source LLM models along two axes: (i) unguided versus guided prompting with domain-specific knowledge; and (ii) off-the-shelf versus fine-tuned variants.<n>Preliminary results indicate that careful prompting and fine-tuning can yield a specialized LLM agent capable of reliably solving simple ODE problems.
arXiv Detail & Related papers (2025-09-12T02:53:57Z)
CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming [56.17331530444765]
CPRet is a retrieval-oriented benchmark suite for competitive programming.<n>It covers four retrieval tasks: two code-centric (i.e., Text-to-Code and Code-to-Code) and two newly proposed problem-centric tasks (i.e., Problem-to-Duplicate and Simplified-to-Full)<n>Our contribution includes both high-quality training data and temporally separated test sets for reliable evaluation.
arXiv Detail & Related papers (2025-05-19T10:07:51Z)
SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization.<n>We construct SweLoc, a large-scale dataset curated from public GitHub repositories.<n>We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z)
Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks.<n>However, improvement is plateauing due to the exhaustion of readily available high-quality data.<n>We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking [45.18877655831977]
We introduce CoRNStack, a large-scale, high-quality contrastive training dataset for code that spans multiple programming languages.<n>This dataset is curated using consistency filtering to eliminate noisy positives and is further enriched with mined hard negatives.<n>We demonstrate that contrastive training of embedding models using CoRNStack leads to state-of-the-art performance across a variety of code retrieval tasks.
arXiv Detail & Related papers (2024-12-01T23:54:12Z)
Learning to Better Search with Language Models via Guided Reinforced Self-Training [15.289058352618468]
We propose guided self-training (Guided-ReST) to improve the model's capability for effective search during inference.<n>Guided-ReST incorporates optimal solutions into the model's search procedure, enabling the generation of high-quality search traces.<n>Our method significantly enhances the search capabilities of language models on arithmetic reasoning and code self-repair tasks.
arXiv Detail & Related papers (2024-10-03T21:07:59Z)
List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation [80.12531449946655]
We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently. GenRT integrates reranking and truncation via generative paradigm based on encoder-decoder architecture. Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
arXiv Detail & Related papers (2024-02-05T06:52:53Z)
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning. During inference, we introduce a new generation procedure with a critical sampling strategy. For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
On the Difficulty of Generalizing Reinforcement Learning Framework for Combinatorial Optimization [6.935838847004389]
Combinatorial optimization problems (COPs) on the graph with real-life applications are canonical challenges in Computer Science. The underlying principle of this approach is to deploy a graph neural network (GNN) for encoding both the local information of the nodes and the graph-structured data. We use the security-aware phone clone allocation in the cloud as a classical quadratic assignment problem (QAP) to investigate whether or not deep RL-based model is generally applicable to solve other classes of such hard problems.
arXiv Detail & Related papers (2021-08-08T19:12:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.