Exposing Query Identification for Search Transparency
- URL: http://arxiv.org/abs/2110.07701v1
- Date: Thu, 14 Oct 2021 20:19:27 GMT
- Title: Exposing Query Identification for Search Transparency
- Authors: Ruohan Li, Jianxiang Li, Bhaskar Mitra, Fernando Diaz, Asia J. Biega
- Abstract summary: We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
- Score: 69.06545074617685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Search systems control the exposure of ranked content to searchers. In many
cases, creators value not only the exposure of their content but, moreover, an
understanding of the specific searches where the content is surfaced. The
problem of identifying which queries expose a given piece of content in the
ranking results is an important and relatively under-explored search
transparency challenge. Exposing queries are useful for quantifying various
issues of search bias, privacy, data protection, security, and search engine
optimization.
Exact identification of exposing queries in a given system is computationally
expensive, especially in dynamic contexts such as web search. In quest of a
more lightweight solution, we explore the feasibility of approximate exposing
query identification (EQI) as a retrieval task by reversing the role of queries
and documents in two classes of search systems: dense dual-encoder models and
traditional BM25 models. We then propose how this approach can be improved
through metric learning over the retrieval embedding space. We further derive
an evaluation metric to measure the quality of a ranking of exposing queries,
and conduct an empirical analysis focusing on various practical
aspects of approximate EQI.
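
To make the role reversal concrete, here is a minimal sketch of approximate EQI with a dense dual encoder: the document plays the part of the "query", and its likely exposing queries are retrieved from an embedding index built over a query log. It assumes the sentence-transformers and faiss-cpu packages; the encoder name, toy query log, and k are illustrative stand-ins, not the paper's exact setup. The BM25 variant is analogous: index the logged queries as BM25 "documents" and score them against the document's terms.

```python
# Minimal sketch of approximate exposing query identification (EQI) via
# reversed dense retrieval. Assumes sentence-transformers and faiss are
# installed; encoder name, query log, and k below are illustrative.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical dual encoder

# Embed the logged queries once and build a nearest-neighbour index over them.
query_log = [
    "who can see my personal data",
    "search engine bias examples",
    "gdpr right of access",
]
q_emb = encoder.encode(query_log, normalize_embeddings=True).astype(np.float32)
index = faiss.IndexFlatIP(q_emb.shape[1])  # inner product = cosine (normalized)
index.add(q_emb)

# Reverse the usual roles: the document acts as the "query", and we retrieve
# the logged queries most likely to expose it in their result rankings.
document = "A report on how search engines collect and expose personal data."
d_emb = encoder.encode([document], normalize_embeddings=True).astype(np.float32)
scores, ids = index.search(d_emb, 2)  # top-k candidate exposing queries
for rank, (i, s) in enumerate(zip(ids[0], scores[0]), start=1):
    print(f"{rank}. {query_log[i]} (score={s:.3f})")
```

The metric learning mentioned in the abstract would refine such an embedding space further, e.g., by pulling a document's embedding toward its known exposing queries with a triplet-style loss; that training step is omitted from this sketch.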
Related papers
- PseudoSeer: a Search Engine for Pseudocode [18.726136894285403]
A novel pseudocode search engine is designed to facilitate efficient retrieval and search of academic papers containing pseudocode.
By leveraging snippets, the system enables users to search across various facets of a paper, such as the title, abstract, author information, and code snippets.
The search engine uses a weighted BM25-based ranking algorithm, and the factors considered when prioritizing search results are described.
arXiv Detail & Related papers (2024-11-19T16:58:03Z)
- QUIDS: Query Intent Generation via Dual Space Modeling [12.572815037915348]
We propose a dual-space model that uses semantic relevance and irrelevance information in the returned documents to explain how the query intent is understood.
Our methods produce high-quality query intent descriptions, outperforming existing methods for this task, as well as state-of-the-art query-based summarization methods.
arXiv Detail & Related papers (2024-10-16T09:28:58Z)
- Improving Retrieval in Sponsored Search by Leveraging Query Context Signals [6.152499434499752]
We propose an approach to enhance query understanding by augmenting queries with rich contextual signals.
We use web search titles and snippets to ground queries in real-world information and utilize GPT-4 to generate query rewrites and explanations.
Our context-aware approach substantially outperforms context-free models.
arXiv Detail & Related papers (2024-07-19T14:28:53Z)
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and strengthen session search modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
- Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness [56.42192735214931]
Retrievers are expected not only to rely on the semantic relevance between the documents and the queries but also to recognize the nuanced intents or perspectives behind a user query.
In this work, we study whether retrievers can recognize and respond to different perspectives of the queries.
We show that current retrievers have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives.
arXiv Detail & Related papers (2024-05-04T17:10:00Z)
- ExcluIR: Exclusionary Neural Information Retrieval [74.08276741093317]
We present ExcluIR, a set of resources for exclusionary retrieval.
The evaluation benchmark includes 3,452 high-quality exclusionary queries.
The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document.
arXiv Detail & Related papers (2024-04-26T09:43:40Z)
- How Does Generative Retrieval Scale to Millions of Passages? [68.98628807288972]
We conduct the first empirical study of generative retrieval techniques across various corpus scales.
We scale generative retrieval to millions of passages, using a corpus of 8.8M passages and evaluating model sizes up to 11B parameters.
While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
arXiv Detail & Related papers (2023-05-19T17:33:38Z)
- Compositional Attention: Disentangling Search and Retrieval [66.7108739597771]
Multi-head, key-value attention is the backbone of the Transformer model and its variants.
Standard attention heads learn a rigid mapping between search and retrieval.
We propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure.
arXiv Detail & Related papers (2021-10-18T15:47:38Z)
- Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search [36.64582291809485]
Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems.
In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources.
Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.
arXiv Detail & Related papers (2020-06-13T03:24:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.