A Systematic Review of Automated Query Reformulations in Source Code
Search
- URL: http://arxiv.org/abs/2108.09646v2
- Date: Thu, 8 Jun 2023 22:10:08 GMT
- Title: A Systematic Review of Automated Query Reformulations in Source Code
Search
- Authors: Mohammad Masudur Rahman and Chanchal K. Roy
- Abstract summary: We select 70 studies on query reformulations from 2,970 candidate studies.
We discuss the best practices and future opportunities to advance the state of research in search query reformulations.
- Score: 12.234169944475537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fixing software bugs and adding new features are two of the major maintenance
tasks. Software bugs and features are reported as change requests. Developers
consult these requests and often choose a few keywords from them as an ad hoc
query. Then they execute the query with a search engine to find the exact
locations within software code that need to be changed. Unfortunately, even
experienced developers often fail to choose appropriate queries, which leads to
costly trials and errors during a code search. Over the years, many studies
attempt to reformulate the ad hoc queries from developers to support them. In
this systematic literature review, we carefully select 70 primary studies on
query reformulations from 2,970 candidate studies, perform an in-depth
qualitative analysis (e.g., Grounded Theory), and then answer seven research
questions with major findings. First, to date, eight major methodologies (e.g.,
term weighting, term co-occurrence analysis, thesaurus lookup) have been
adopted to reformulate queries. Second, the existing studies suffer from
several major limitations (e.g., lack of generalizability, vocabulary mismatch
problem, subjective bias) that might prevent their wide adoption. Finally, we
discuss the best practices and future opportunities to advance the state of
research in search query reformulations.
Related papers
- Forgetful by Design? A Critical Audit of YouTube's Search API for Academic Research [0.0]
This paper critically audits the search endpoint of YouTube's Data API (v3), a common tool for academic research.<n>We identify major limitations regarding completeness, representativeness, consistency, and bias.
arXiv Detail & Related papers (2025-06-13T12:39:59Z) - MultiConIR: Towards multi-condition Information Retrieval [57.6405602406446]
We introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios.
We propose three tasks to assess retrieval and reranking models on multi-condition robustness, monotonic relevance ranking, and query format sensitivity.
arXiv Detail & Related papers (2025-03-11T05:02:03Z) - Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs)
We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z) - CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval [52.134133938779776]
We present CLARINET, a system that asks informative clarification questions by choosing questions whose answers would maximize certainty in the correct candidate.
Our approach works by augmenting a large language model (LLM) to condition on a retrieval distribution, finetuning end-to-end to generate the question that would have maximized the rank of the true candidate at each turn.
arXiv Detail & Related papers (2024-04-28T18:21:31Z) - Typo-Robust Representation Learning for Dense Retrieval [6.148710657178892]
One of the main challenges of dense retrieval in real-world settings is the handling of queries containing misspelled words.
A popular approach for handling misspelled queries is minimizing the representations discrepancy between misspelled queries and their pristine ones.
Unlike the existing approaches, which only focus on the alignment between misspelled and pristine queries, our method also improves the contrast between each misspelled query and its surrounding queries.
arXiv Detail & Related papers (2023-06-17T13:48:30Z) - ConvGQR: Generative Query Reformulation for Conversational Search [37.54018632257896]
ConvGQR is a new framework to reformulate conversational queries based on generative pre-trained language models.
We propose a knowledge infusion mechanism to optimize both query reformulation and retrieval.
arXiv Detail & Related papers (2023-05-25T01:45:06Z) - Keyword Embeddings for Query Suggestion [3.7900158137749322]
This paper proposes two novel models for the keyword suggestion task trained on scientific literature.
Our techniques adapt the architecture of Word2Vec and FastText to generate keyword embeddings by leveraging documents' keyword co-occurrence.
We evaluate our proposals against the state-of-the-art word and sentence embedding models showing considerable improvements over the baselines for the tasks.
arXiv Detail & Related papers (2023-01-19T11:13:04Z) - UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question
Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph(KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z) - Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, as well as conducting an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z) - Diversity driven Query Rewriting in Search Advertising [1.5289756643078838]
generative retrieval models have been shown to be effective at the task of generating such query rewrites.
We introduce CLOVER, a framework to generate both high-quality and diverse rewrites.
We empirically show the effectiveness of our proposed approach through offline experiments on search queries across geographies spanning three major languages.
arXiv Detail & Related papers (2021-06-07T17:30:45Z) - Automated Query Reformulation for Efficient Search based on Query Logs
From Stack Overflow [0.0]
We propose an automated software-specific query reformulation approach based on deep learning.
We construct a large-scale query reformulation corpus, including the original queries and corresponding reformulated ones.
Our approach trains a Transformer model that can automatically generate candidate reformulated queries when given the user's original query.
arXiv Detail & Related papers (2021-02-01T13:31:50Z) - On the Social and Technical Challenges of Web Search Autosuggestion
Moderation [118.47867428272878]
Autosuggestions are typically generated by machine learning (ML) systems trained on a corpus of search logs and document representations.
While current search engines have become increasingly proficient at suppressing such problematic suggestions, there are still persistent issues that remain.
We discuss several dimensions of problematic suggestions, difficult issues along the pipeline, and why our discussion applies to the increasing number of applications beyond web search.
arXiv Detail & Related papers (2020-07-09T19:22:00Z) - Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.