Code Search Debiasing: Improve Search Results beyond Overall Ranking
Performance
- URL: http://arxiv.org/abs/2311.14901v2
- Date: Sat, 17 Feb 2024 01:22:24 GMT
- Title: Code Search Debiasing: Improve Search Results beyond Overall Ranking
Performance
- Authors: Sheng Zhang, Hui Li, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang,
Rongrong Ji
- Abstract summary: Biased code search engines provide poor user experience, even though they show promising overall performance.
We develop a general debiasing framework that employs reranking to calibrate search results.
Experiments show that our framework can effectively reduce biases.
- Score: 10.059769537424582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A code search engine is an essential tool in software development. Many code
search methods have sprung up, focusing on the overall ranking performance of
code search. In this paper, we study code search from another perspective by
analyzing the bias of code search models. Biased code search engines provide
poor user experience, even though they show promising overall performance. Due
to different development conventions (e.g., prefer long queries or
abbreviations), some programmers will find the engine useful, while others may
find it hard to get desirable search results. To mitigate biases, we develop a
general debiasing framework that employs reranking to calibrate search results.
It can be easily plugged into existing engines and handle new code search
biases discovered in the future. Experiments show that our framework can
effectively reduce biases. Meanwhile, the overall ranking performance of code
search gets improved after debiasing.
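
The abstract describes debiasing as a reranking step that plugs into an existing engine and can accommodate newly discovered biases. The listing does not give the framework's actual formulation, so the following is only a minimal sketch of that general pattern, not the authors' method. The names `Candidate`, `rerank`, and `keyword_overlap`, the additive score adjustment, and the `weight` parameter are all hypothetical and chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    code_id: str
    code: str
    score: float  # relevance score produced by the base search engine


# A correction maps (query, candidate) to a score adjustment.
# Hypothetical interface: one correction per bias being calibrated.
Correction = Callable[[str, Candidate], float]


def keyword_overlap(query: str, cand: Candidate) -> float:
    """Illustrative correction: fraction of query words found in the code."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    c = set(re.findall(r"[a-z]+", cand.code.lower()))
    return len(q & c) / len(q) if q else 0.0


def rerank(
    query: str,
    candidates: Sequence[Candidate],
    corrections: Sequence[Correction],
    weight: float = 0.5,
) -> List[Candidate]:
    """Add the weighted corrections to each base score, then re-sort."""
    adjusted = [
        Candidate(
            c.code_id,
            c.code,
            c.score + weight * sum(fn(query, c) for fn in corrections),
        )
        for c in candidates
    ]
    return sorted(adjusted, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    results = [
        Candidate("a", "def sort_list(xs): return sorted(xs)", 0.60),
        Candidate("b", "def read_file(path): return open(path).read()", 0.62),
    ]
    for c in rerank("sort list", results, [keyword_overlap]):
        print(c.code_id, round(c.score, 2))
```

Because the corrections are passed in as a list, handling an additional bias in this sketch only means appending another function; the base engine and its scores are left untouched.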
Related papers
- SmartSearch: Process Reward-Guided Query Refinement for Search Agents [63.46067892354375]
Large language model (LLM)-based search agents have proven promising for addressing knowledge-intensive problems.
Existing works largely focus on optimizing the reasoning paradigms of search agents, yet the quality of intermediate search queries during reasoning remains overlooked.
We introduce SmartSearch, a framework built upon two key mechanisms to mitigate this issue.
arXiv Detail & Related papers (2026-01-08T12:39:05Z)
- RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation [65.5353313491402]
We introduce RethinkMCTS, which employs the Monte Carlo Tree Search (MCTS) algorithm to conduct thought-level searches before generating code.
We construct verbal feedback from fine-grained code execution feedback to refine erroneous thoughts during the search.
We demonstrate that RethinkMCTS outperforms previous search-based and feedback-based code generation baselines.
arXiv Detail & Related papers (2024-09-15T02:07:28Z)
- A Survey of Source Code Search: A 3-Dimensional Perspective [17.524674603550043]
Code search has attracted wide attention from software engineering researchers because it can improve the productivity and quality of software development.
To realize effective and efficient code search, many techniques have been proposed successively.
arXiv Detail & Related papers (2023-11-13T06:42:08Z)
- Whole Page Unbiased Learning to Rank [59.52040055543542]
Unbiased Learning to Rank (ULTR) algorithms are proposed to learn an unbiased ranking model from biased click data.
We propose a Bias Agnostic whole-page unbiased Learning to rank algorithm, named BAL, to automatically find the user behavior model.
Experimental results on a real-world dataset verify the effectiveness of BAL.
arXiv Detail & Related papers (2022-10-19T16:53:08Z)
- Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework.
It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates.
It then uses fine-grained cross-encoders for finer ranking.
arXiv Detail & Related papers (2022-08-24T02:34:27Z)
- Accelerating Code Search with Deep Hashing and Code Classification [64.3543949306799]
Code search retrieves reusable code snippets from a source code corpus based on natural language queries.
We propose a novel method CoSHC to accelerate code search with deep hashing and code classification.
arXiv Detail & Related papers (2022-03-29T07:05:30Z)
- Search4Code: Code Search Intent Classification Using Weak Supervision [5.441318460204245]
We propose a weak-supervision-based approach for detecting code search intent in search queries for the C# and Java programming languages.
We evaluate the approach against several baselines on a real-world dataset of over 1 million queries mined from the Bing web search engine.
We are also releasing Search4Code, the first large-scale real-world dataset of code search queries mined from the Bing web search engine.
arXiv Detail & Related papers (2020-11-24T08:06:53Z)
- COSEA: Convolutional Code Search with Layer-wise Attention [90.35777733464354]
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic.
COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
arXiv Detail & Related papers (2020-10-19T13:53:38Z)
- Best-First Beam Search [78.71330480725668]
We show that the standard implementation of beam search can be made up to 10x faster in practice.
We propose a memory-reduced variant of Best-First Beam Search, which has a similar beneficial search bias in terms of downstream performance.
arXiv Detail & Related papers (2020-07-08T05:56:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.