Related papers: Accelerating Code Search with Deep Hashing and Code Classification

Accelerating Code Search with Deep Hashing and Code Classification

URL: http://arxiv.org/abs/2203.15287v2
Date: Thu, 31 Mar 2022 03:01:55 GMT
Title: Accelerating Code Search with Deep Hashing and Code Classification
Authors: Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Michael R. Lyu
Abstract summary: Code search is to search reusable code snippets from source code corpus based on natural languages queries. We propose a novel method CoSHC to accelerate code search with deep hashing and code classification.
Score: 64.3543949306799
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Code search is to search reusable code snippets from source code corpus based on natural languages queries. Deep learning-based methods of code search have shown promising results. However, previous methods focus on retrieval accuracy but lacked attention to the efficiency of the retrieval process. We propose a novel method CoSHC to accelerate code search with deep hashing and code classification, aiming to perform an efficient code search without sacrificing too much accuracy. To evaluate the effectiveness of CoSHC, we apply our method to five code search models. Extensive experimental results indicate that compared with previous code search baselines, CoSHC can save more than 90% of retrieval time meanwhile preserving at least 99% of retrieval accuracy.

Related papers

SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing [83.35231185111464]
Deep learning has shifted the retrieval paradigm from lexical-based matching to encode source code and queries into vector representations. Previous research proposes deep hashing-based methods, which generate hash codes for queries and code snippets and use Hamming distance for rapid recall of code candidates. We propose a novel approach, which converts long hash codes calculated by existing deep hashing approaches into several short hash code segments through an iterative training strategy.
arXiv Detail & Related papers (2024-12-16T12:51:35Z)
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation [65.5353313491402]
We introduce RethinkMCTS, which employs the Monte Carlo Tree Search (MCTS) algorithm to conduct thought-level searches before generating code. We construct verbal feedback from fine-turbo code execution feedback to refine erroneous thoughts during the search. We demonstrate that RethinkMCTS outperforms previous search-based and feedback-based code generation baselines.
arXiv Detail & Related papers (2024-09-15T02:07:28Z)
Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework. It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates. It then uses fine-grained cross-encoders for finer ranking.
arXiv Detail & Related papers (2022-08-24T02:34:27Z)
Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
Search4Code: Code Search Intent Classification Using Weak Supervision [5.441318460204245]
We propose a weak supervision based approach for detecting code search intent in search queries for C# and Java programming languages. We evaluate the approach against several baselines on a real-world dataset comprised of over 1 million queries mined from Bing web search engine. We are also releasing Search4Code, the first large-scale real-world dataset of code search queries mined from Bing web search engine.
arXiv Detail & Related papers (2020-11-24T08:06:53Z)
CoNCRA: A Convolutional Neural Network Code Retrieval Approach [0.0]
We propose a technique for semantic code search: A Convolutional Neural Network approach to code retrieval. Our technique aims to find the code snippet that most closely matches the developer's intent, expressed in natural language. We evaluated our approach's efficacy on a dataset composed of questions and code snippets collected from Stack Overflow.
arXiv Detail & Related papers (2020-09-03T23:38:52Z)
Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent [1.1168121941015012]
We study how code retrieval systems can be improved by leveraging descriptions to better capture the intents of code snippets. Building on recent progress in transfer learning and natural language processing, we create a domain-specific retrieval model for code annotated with a natural language description.
arXiv Detail & Related papers (2020-08-27T15:39:09Z)
Faster Person Re-Identification [68.22203008760269]
We introduce a new solution for fast ReID by formulating a novel Coarse-to-Fine hashing code search strategy. It uses shorter codes to coarsely rank broad matching similarities and longer codes to refine only a few top candidates for more accurate instance ReID. Experimental results on 2 datasets show that our proposed method (CtF) is not only 8% more accurate but also 5x faster than contemporary hashing ReID methods.
arXiv Detail & Related papers (2020-08-16T03:02:49Z)
Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder. Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.