Related papers: A Systematic Review of Automated Query Reformulations in Source Code Search

A Systematic Review of Automated Query Reformulations in Source Code Search

URL: http://arxiv.org/abs/2108.09646v2
Date: Thu, 8 Jun 2023 22:10:08 GMT
Title: A Systematic Review of Automated Query Reformulations in Source Code Search
Authors: Mohammad Masudur Rahman and Chanchal K. Roy
Abstract summary: We select 70 studies on query reformulations from 2,970 candidate studies. We discuss the best practices and future opportunities to advance the state of research in search query reformulations.
Score: 12.234169944475537
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. Then they execute the query with a search engine to find the exact locations within software code that need to be changed. Unfortunately, even experienced developers often fail to choose appropriate queries, which leads to costly trials and errors during a code search. Over the years, many studies attempt to reformulate the ad hoc queries from developers to support them. In this systematic literature review, we carefully select 70 primary studies on query reformulations from 2,970 candidate studies, perform an in-depth qualitative analysis (e.g., Grounded Theory), and then answer seven research questions with major findings. First, to date, eight major methodologies (e.g., term weighting, term co-occurrence analysis, thesaurus lookup) have been adopted to reformulate queries. Second, the existing studies suffer from several major limitations (e.g., lack of generalizability, vocabulary mismatch problem, subjective bias) that might prevent their wide adoption. Finally, we discuss the best practices and future opportunities to advance the state of research in search query reformulations.

Related papers

Forgetful by Design? A Critical Audit of YouTube's Search API for Academic Research [0.0]
This paper critically audits the search endpoint of YouTube's Data API (v3), a common tool for academic research.<n>We identify major limitations regarding completeness, representativeness, consistency, and bias.
arXiv Detail & Related papers (2025-06-13T12:39:59Z)
MultiConIR: Towards multi-condition Information Retrieval [57.6405602406446]
We introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios. We propose three tasks to assess retrieval and reranking models on multi-condition robustness, monotonic relevance ranking, and query format sensitivity.
arXiv Detail & Related papers (2025-03-11T05:02:03Z)
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs) We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval [52.134133938779776]
We present CLARINET, a system that asks informative clarification questions by choosing questions whose answers would maximize certainty in the correct candidate. Our approach works by augmenting a large language model (LLM) to condition on a retrieval distribution, finetuning end-to-end to generate the question that would have maximized the rank of the true candidate at each turn.
arXiv Detail & Related papers (2024-04-28T18:21:31Z)
Typo-Robust Representation Learning for Dense Retrieval [6.148710657178892]
One of the main challenges of dense retrieval in real-world settings is the handling of queries containing misspelled words. A popular approach for handling misspelled queries is minimizing the representations discrepancy between misspelled queries and their pristine ones. Unlike the existing approaches, which only focus on the alignment between misspelled and pristine queries, our method also improves the contrast between each misspelled query and its surrounding queries.
arXiv Detail & Related papers (2023-06-17T13:48:30Z)
ConvGQR: Generative Query Reformulation for Conversational Search [37.54018632257896]
ConvGQR is a new framework to reformulate conversational queries based on generative pre-trained language models. We propose a knowledge infusion mechanism to optimize both query reformulation and retrieval.
arXiv Detail & Related papers (2023-05-25T01:45:06Z)
Keyword Embeddings for Query Suggestion [3.7900158137749322]
This paper proposes two novel models for the keyword suggestion task trained on scientific literature. Our techniques adapt the architecture of Word2Vec and FastText to generate keyword embeddings by leveraging documents' keyword co-occurrence. We evaluate our proposals against the state-of-the-art word and sentence embedding models showing considerable improvements over the baselines for the tasks.
arXiv Detail & Related papers (2023-01-19T11:13:04Z)
UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph(KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question. We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems. We derive an evaluation metric to measure the quality of a ranking of exposing queries, as well as conducting an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
Diversity driven Query Rewriting in Search Advertising [1.5289756643078838]
generative retrieval models have been shown to be effective at the task of generating such query rewrites. We introduce CLOVER, a framework to generate both high-quality and diverse rewrites. We empirically show the effectiveness of our proposed approach through offline experiments on search queries across geographies spanning three major languages.
arXiv Detail & Related papers (2021-06-07T17:30:45Z)
Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow [0.0]
We propose an automated software-specific query reformulation approach based on deep learning. We construct a large-scale query reformulation corpus, including the original queries and corresponding reformulated ones. Our approach trains a Transformer model that can automatically generate candidate reformulated queries when given the user's original query.
arXiv Detail & Related papers (2021-02-01T13:31:50Z)
On the Social and Technical Challenges of Web Search Autosuggestion Moderation [118.47867428272878]
Autosuggestions are typically generated by machine learning (ML) systems trained on a corpus of search logs and document representations. While current search engines have become increasingly proficient at suppressing such problematic suggestions, there are still persistent issues that remain. We discuss several dimensions of problematic suggestions, difficult issues along the pipeline, and why our discussion applies to the increasing number of applications beyond web search.
arXiv Detail & Related papers (2020-07-09T19:22:00Z)
Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers. We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.