Accelerating Code Search with Deep Hashing and Code Classification
- URL: http://arxiv.org/abs/2203.15287v2
- Date: Thu, 31 Mar 2022 03:01:55 GMT
- Title: Accelerating Code Search with Deep Hashing and Code Classification
- Authors: Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang,
and Michael R. Lyu
- Abstract summary: Code search retrieves reusable code snippets from a source code corpus based on natural language queries.
We propose CoSHC, a novel method that accelerates code search with deep hashing and code classification.
- Score: 64.3543949306799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code search retrieves reusable code snippets from a source code corpus based
on natural language queries. Deep learning-based code search methods have
shown promising results. However, previous methods focus on retrieval accuracy
and pay little attention to the efficiency of the retrieval process. We propose
CoSHC, a novel method that accelerates code search with deep hashing and code
classification, aiming to perform efficient code search without sacrificing
too much accuracy. To evaluate the effectiveness of CoSHC, we apply our method
to five code search models. Extensive experimental results indicate that,
compared with previous code search baselines, CoSHC can save more than 90% of
retrieval time while preserving at least 99% of retrieval accuracy.
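To make the efficiency argument concrete, the following is a minimal sketch of a generic hash-then-rerank retrieval pattern, not the CoSHC implementation. It assumes a simple sign-based binarization as a stand-in for a learned deep hash, and it omits the code classification step because the abstract does not describe how that step is used. All function names (`sign_hash`, `hamming`, `cosine`, `search`) and the toy vectors are hypothetical.

```python
from typing import List, Sequence


def sign_hash(vec: Sequence[float]) -> int:
    """Binarize an embedding: bit i is set when component i is positive.

    Assumption: a stand-in for a learned deep hashing model.
    """
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sum(x * x for x in u) ** 0.5
    norm_v = sum(y * y for y in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(
    query_vec: Sequence[float],
    code_vecs: Sequence[Sequence[float]],
    code_hashes: Sequence[int],
    recall_k: int,
    top_k: int,
) -> List[int]:
    """Return indices of the top_k snippets for the query."""
    q_hash = sign_hash(query_vec)
    # Stage 1: cheap recall over the whole corpus using Hamming distance.
    candidates = sorted(
        range(len(code_hashes)),
        key=lambda i: hamming(q_hash, code_hashes[i]),
    )[:recall_k]
    # Stage 2: rerank only the recalled candidates with the dense vectors.
    reranked = sorted(
        candidates,
        key=lambda i: cosine(query_vec, code_vecs[i]),
        reverse=True,
    )
    return reranked[:top_k]


# Toy 4-dimensional embeddings (hypothetical data for illustration only).
code_vecs = [
    [1.0, 1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0, 1.0],
    [0.9, 0.8, -0.7, 0.2],
    [1.0, -1.0, 1.0, -1.0],
]
code_hashes = [sign_hash(v) for v in code_vecs]  # precomputed offline
query_vec = [0.8, 0.9, -0.6, -0.1]

print(search(query_vec, code_vecs, code_hashes, recall_k=2, top_k=1))
```

In this toy run the Hamming stage recalls snippets 0 and 2, and the dense rerank then places snippet 2 first. The expensive similarity is computed for only `recall_k` candidates rather than the whole corpus, which is where this kind of pipeline saves time.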
Related papers
- RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation [65.5353313491402]
We introduce RethinkMCTS, which employs the Monte Carlo Tree Search (MCTS) algorithm to conduct thought-level searches before generating code.
We construct verbal feedback from fine-grained code execution feedback to refine erroneous thoughts during the search.
We demonstrate that RethinkMCTS outperforms previous search-based and feedback-based code generation baselines.
arXiv Detail & Related papers (2024-09-15T02:07:28Z)
- Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework.
It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates.
It then uses fine-grained cross-encoders for finer ranking.
arXiv Detail & Related papers (2022-08-24T02:34:27Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- Search4Code: Code Search Intent Classification Using Weak Supervision [5.441318460204245]
We propose a weak supervision based approach for detecting code search intent in search queries for C# and Java programming languages.
We evaluate the approach against several baselines on a real-world dataset of over 1 million queries mined from the Bing web search engine.
We are also releasing Search4Code, the first large-scale real-world dataset of code search queries mined from the Bing web search engine.
arXiv Detail & Related papers (2020-11-24T08:06:53Z)
- CoNCRA: A Convolutional Neural Network Code Retrieval Approach [0.0]
We propose a technique for semantic code search: A Convolutional Neural Network approach to code retrieval.
Our technique aims to find the code snippet that most closely matches the developer's intent, expressed in natural language.
We evaluated our approach's efficacy on a dataset composed of questions and code snippets collected from Stack Overflow.
arXiv Detail & Related papers (2020-09-03T23:38:52Z)
- Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent [1.1168121941015012]
We study how code retrieval systems can be improved by leveraging descriptions to better capture the intents of code snippets.
Building on recent progress in transfer learning and natural language processing, we create a domain-specific retrieval model for code annotated with a natural language description.
arXiv Detail & Related papers (2020-08-27T15:39:09Z)
- Faster Person Re-Identification [68.22203008760269]
We introduce a new solution for fast ReID by formulating a novel Coarse-to-Fine hashing code search strategy.
It uses shorter codes to coarsely rank broad matching similarities and longer codes to refine only a few top candidates for more accurate instance ReID.
Experimental results on 2 datasets show that our proposed method (CtF) is not only 8% more accurate but also 5x faster than contemporary hashing ReID methods.
arXiv Detail & Related papers (2020-08-16T03:02:49Z)
- Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder.
Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.