Synthesizing Conjunctive Queries for Code Search
- URL: http://arxiv.org/abs/2305.04316v2
- Date: Thu, 11 May 2023 15:02:19 GMT
- Title: Synthesizing Conjunctive Queries for Code Search
- Authors: Chengpeng Wang, Peisen Yao, Wensheng Tang, Gang Fan, and Charles Zhang
- Abstract summary: Squid is a new conjunctive query synthesis algorithm for searching code with target patterns.
Squid successfully synthesizes the conjunctive queries for all the tasks, taking only 2.56 seconds on average.
- Score: 9.146394499214672
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents Squid, a new conjunctive query synthesis algorithm for
searching code with target patterns. Given positive and negative examples along
with a natural language description, Squid analyzes the relations derived from
the examples by a Datalog-based program analyzer and synthesizes a conjunctive
query expressing the search intent. The synthesized query can be further used
to search for desired grammatical constructs in the editor. To achieve high
efficiency, we prune the huge search space by removing unnecessary relations
and enumerating query candidates via refinement. We also introduce two
quantitative metrics for query prioritization to select the queries from
multiple candidates, yielding desired queries for code search. We have
evaluated Squid on over thirty code search tasks. It is shown that Squid
successfully synthesizes the conjunctive queries for all the tasks, taking only
2.56 seconds on average.
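The enumeration-with-refinement idea in the abstract can be sketched in a few lines: treat each relation extracted by the analyzer as the set of program nodes satisfying it, and enumerate conjunctions of increasing size until one covers every positive example and excludes every negative one. The relations and node ids below are invented for illustration; Squid's actual Datalog relations and pruning are far richer.

```python
from itertools import combinations

# Hypothetical toy relations, standing in for the output of a
# Datalog-based program analyzer. Each maps a relation name to the
# set of node ids that satisfy it.
RELATIONS = {
    "is_call":      {1, 2, 3, 5},
    "callee_free":  {1, 2, 5},
    "arg_is_field": {2, 5},
    "in_loop":      {3, 5},
}

def satisfies(node, query):
    """A conjunctive query here is just a tuple of relation names
    that must all hold for the node."""
    return all(node in RELATIONS[r] for r in query)

def synthesize(positives, negatives, max_size=3):
    """Enumerate candidate conjunctions by increasing size (a crude
    form of refinement), returning the first one consistent with the
    positive and negative examples, or None if no such query exists."""
    names = sorted(RELATIONS)
    for size in range(1, max_size + 1):
        for query in combinations(names, size):
            if all(satisfies(p, query) for p in positives) and \
               not any(satisfies(n, query) for n in negatives):
                return query
    return None

print(synthesize(positives={2, 5}, negatives={1, 3}))  # → ('arg_is_field',)
```

Real synthesizers additionally prune relations that cannot distinguish the examples before enumeration, which is where most of the speedup comes from.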
Related papers
- PseudoSeer: a Search Engine for Pseudocode [18.726136894285403]
A novel pseudocode search engine is designed to facilitate efficient retrieval and search of academic papers containing pseudocode.
By leveraging snippets, the system enables users to search across various facets of a paper, such as the title, abstract, author information, and code snippets.
The search engine uses a weighted BM25-based ranking algorithm, and the factors it considers when prioritizing search results are described.
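The weighted-BM25 ranking the summary mentions can be approximated by scoring each facet (title, abstract, code snippet) with plain BM25 and combining the per-facet scores with weights. The corpus and weights below are a toy sketch, not PseudoSeer's implementation.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Plain BM25 over a small corpus; `docs` is a list of token lists."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()  # document frequency of each term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue  # term absent from the corpus
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def weighted_score(field_scores, weights):
    """Combine per-facet BM25 scores (e.g. title, abstract, code)
    with illustrative facet weights."""
    return sum(w * s for w, s in zip(weights, field_scores))
```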
arXiv Detail & Related papers (2024-11-19T16:58:03Z)
- Aligning Query Representation with Rewritten Query and Relevance Judgments in Conversational Search [32.35446999027349]
We leverage both rewritten queries and relevance judgments in the conversational search data to train a better query representation model.
The proposed model, Query Representation Alignment Conversational Retriever (QRACDR), is tested on eight datasets.
arXiv Detail & Related papers (2024-07-29T17:14:36Z)
- CoSQA+: Enhancing Code Search Dataset with Matching Code [27.10957318333608]
CoSQA+ pairs high-quality queries with multiple suitable codes.
CoSQA+ has demonstrated superior quality over CoSQA.
We propose a new metric to assess one-to-N code search performance.
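A natural way to score one-to-N retrieval, where a query has several correct code snippets, is average precision over the N relevant snippets. The helper below is an illustrative metric, not necessarily the one CoSQA+ proposes.

```python
def one_to_n_map(ranked_ids, relevant_ids):
    """Average precision for a query with multiple relevant snippets:
    at each rank where a relevant snippet appears, record the precision
    so far, then average over the total number of relevant snippets."""
    hits, ap = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            ap += hits / rank
    return ap / len(relevant_ids) if relevant_ids else 0.0
```

A perfect ranking that places all N relevant snippets first scores 1.0; burying them lowers the score smoothly.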
arXiv Detail & Related papers (2024-06-17T14:34:14Z)
- ExcluIR: Exclusionary Neural Information Retrieval [74.08276741093317]
We present ExcluIR, a set of resources for exclusionary retrieval.
The evaluation benchmark includes 3,452 high-quality exclusionary queries.
The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document.
arXiv Detail & Related papers (2024-04-26T09:43:40Z)
- Decoding a Neural Retriever's Latent Space for Query Suggestion [28.410064376447718]
We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph.
We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco.
On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion.
arXiv Detail & Related papers (2022-10-21T16:19:31Z)
- Query-Response Interactions by Multi-tasks in Semantic Search for Chatbot Candidate Retrieval [12.615150401073711]
We propose a novel approach, called Multitask-based Semantic Search Neural Network (MSSNN) for candidate retrieval.
The method employs a Seq2Seq modeling task to learn a good query encoder, then performs a word prediction task to build response embeddings, and finally applies a simple dot-product scorer as the matching model.
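With the learned encoders abstracted away, the final matching step is just a dot product between a query embedding and each candidate response embedding; the vectors below are stand-ins for encoder outputs.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def best_response(query_vec, response_vecs):
    """Return the index of the candidate response whose embedding has
    the highest dot-product score against the query embedding."""
    scores = [dot(query_vec, r) for r in response_vecs]
    return max(range(len(scores)), key=scores.__getitem__)
```

In production systems this argmax over dot products is typically served by an approximate nearest-neighbor index rather than a linear scan.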
arXiv Detail & Related papers (2022-08-23T15:07:35Z)
- Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in uncovering users' search intents and helping users locate the information they seek.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z)
- How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning.
We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(\frac{N}{k^2})$.
In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample with triplet queries.
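As a much-simplified illustration of oracle-based labeling (not the paper's batch or triplet schemes), a greedy scheme with pairwise "same class?" queries compares each new sample against one representative per discovered cluster:

```python
def label_by_pairwise_queries(samples, same_class):
    """Greedy clustering via a pairwise oracle: for each new sample,
    query it against one representative per existing cluster and join
    the first cluster that answers 'same class'; otherwise open a new
    cluster. Returns the clusters and the number of oracle queries."""
    clusters = []   # each cluster is a list of its member samples
    n_queries = 0
    for s in samples:
        for c in clusters:
            n_queries += 1
            if same_class(s, c[0]):
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters, n_queries
```

In the worst case this uses on the order of one query per (sample, cluster) pair, which is exactly the overhead that batching and richer query types aim to reduce.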
arXiv Detail & Related papers (2021-10-05T20:15:35Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug in queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
- Session-Aware Query Auto-completion using Extreme Multi-label Ranking [61.753713147852125]
We take the novel approach of modeling session-aware query auto-completion as an eXtreme Multi-label Ranking (XMR) problem.
We adapt a popular XMR algorithm for this purpose by proposing several modifications to the key steps in the algorithm.
Our approach meets the stringent latency requirements for auto-complete systems while leveraging session information in making suggestions.
arXiv Detail & Related papers (2020-12-09T17:56:22Z)
- Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.