Exposing Query Identification for Search Transparency
- URL: http://arxiv.org/abs/2110.07701v1
- Date: Thu, 14 Oct 2021 20:19:27 GMT
- Title: Exposing Query Identification for Search Transparency
- Authors: Ruohan Li, Jianxiang Li, Bhaskar Mitra, Fernando Diaz, Asia J. Biega
- Abstract summary: We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries and conduct an empirical analysis of various practical aspects of approximate EQI.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Search systems control the exposure of ranked content to searchers. In many
cases, creators value not only the exposure of their content but, moreover, an
understanding of the specific searches where the content is surfaced. The
problem of identifying which queries expose a given piece of content in the
ranking results is an important and relatively under-explored search
transparency challenge. Exposing queries are useful for quantifying various
issues of search bias, privacy, data protection, security, and search engine
optimization.
Exact identification of exposing queries in a given system is computationally
expensive, especially in dynamic contexts such as web search. In quest of a
more lightweight solution, we explore the feasibility of approximate exposing
query identification (EQI) as a retrieval task by reversing the role of queries
and documents in two classes of search systems: dense dual-encoder models and
traditional BM25 models. We then propose how this approach can be improved
through metric learning over the retrieval embedding space. We further derive
an evaluation metric to measure the quality of a ranking of exposing queries
and conduct an empirical analysis focusing on various practical aspects of
approximate EQI.
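The reversed-retrieval idea above can be sketched in a few lines. This is a minimal toy example, not the paper's implementation: random vectors stand in for embeddings from a trained dual encoder, the document plays the role of the "query", and the historical queries are indexed as "documents" to be ranked by similarity.

```python
import numpy as np

def build_query_index(query_embeddings):
    # Normalize query vectors so a dot product equals cosine similarity.
    norms = np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    return query_embeddings / norms

def exposing_queries(doc_embedding, query_index, k=5):
    # Approximate EQI with roles reversed: score every indexed query
    # against the document and return the top-k as candidate exposing
    # queries, each paired with its similarity score.
    doc = doc_embedding / np.linalg.norm(doc_embedding)
    scores = query_index @ doc
    top = np.argsort(-scores)[:k]
    return list(zip(top.tolist(), scores[top].tolist()))

# Toy data: 100 query embeddings and one document embedding.
rng = np.random.default_rng(0)
queries = rng.normal(size=(100, 32))
doc = rng.normal(size=32)

index = build_query_index(queries)
print(exposing_queries(doc, index, k=3))
```

In the exact formulation, every query in the log would be run through the ranker to check whether the document appears in its top results; the sketch replaces that expensive sweep with a single nearest-neighbor lookup in the shared embedding space, which is what makes the approximation lightweight.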
Related papers
- Improving Retrieval in Sponsored Search by Leveraging Query Context Signals [6.152499434499752]
We propose an approach to enhance query understanding by augmenting queries with rich contextual signals.
We use web search titles and snippets to ground queries in real-world information and utilize GPT-4 to generate query rewrites and explanations.
Our context-aware approach substantially outperforms context-free models.
arXiv Detail & Related papers (2024-07-19T14:28:53Z)
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and empower the modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
- Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness [56.42192735214931]
Retrievers are expected to not only rely on the semantic relevance between the documents and the queries but also recognize the nuanced intents or perspectives behind a user query.
In this work, we study whether retrievers can recognize and respond to different perspectives of the queries.
We show that current retrievers have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives.
arXiv Detail & Related papers (2024-05-04T17:10:00Z)
- End-to-end Knowledge Retrieval with Multi-modal Queries [50.01264794081951]
ReMuQ requires a system to retrieve knowledge from a large corpus by integrating contents from both text and image queries.
We introduce a retriever model "ReViz" that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion.
We demonstrate superior performance in retrieval on two datasets under zero-shot settings.
arXiv Detail & Related papers (2023-06-01T08:04:12Z)
- How Does Generative Retrieval Scale to Millions of Passages? [68.98628807288972]
We conduct the first empirical study of generative retrieval techniques across various corpus scales.
We scale generative retrieval to millions of passages, using a corpus of 8.8M passages and evaluating model sizes up to 11B parameters.
While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
arXiv Detail & Related papers (2023-05-19T17:33:38Z)
- Improving Content Retrievability in Search with Controllable Query Generation [5.450798147045502]
Machine-learned search engines have a high retrievability bias, where the majority of the queries return the same entities.
We propose CtrlQGen, a method that generates queries for a chosen underlying intent, either narrow or broad.
Our results on datasets from the domains of music, podcasts, and books reveal that we can significantly decrease the retrievability bias of a dense retrieval model.
arXiv Detail & Related papers (2023-03-21T07:46:57Z)
- Compositional Attention: Disentangling Search and Retrieval [66.7108739597771]
Multi-head, key-value attention is the backbone of the Transformer model and its variants.
Standard attention heads learn a rigid mapping between search and retrieval.
We propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure.
arXiv Detail & Related papers (2021-10-18T15:47:38Z)
- Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval [7.3371176873092585]
We present novel neural architectures and methods motivated by the specific needs and challenges of information retrieval.
In many real-life IR tasks, the retrieval involves extremely large collections--such as the document index of a commercial Web search engine--containing billions of documents.
arXiv Detail & Related papers (2020-12-21T21:20:16Z)
- Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search [36.64582291809485]
Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems.
In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources.
Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.
arXiv Detail & Related papers (2020-06-13T03:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.