Improving Query Safety at Pinterest
- URL: http://arxiv.org/abs/2006.11511v2
- Date: Tue, 23 Jun 2020 04:12:09 GMT
- Title: Improving Query Safety at Pinterest
- Authors: Abhijit Mahabal, Yinrui Li, Rajat Raina, Daniel Sun, Revati Mahajan,
Jure Leskovec
- Abstract summary: PinSets is a system for query-set expansion.
It applies a simple yet powerful mechanism to search user sessions.
It expands a tiny seed set into thousands of related queries at nearly perfect precision.
- Score: 46.57632646205479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Query recommendations in search engines are a double-edged sword, with
undeniable benefits but potential for harm. Identifying unsafe queries is
necessary to protect users from inappropriate query suggestions. However,
identifying these is non-trivial because of the linguistic diversity resulting
from large vocabularies, social-group-specific slang and typos, and because the
inappropriateness of a term depends on the context. Here we formulate the
problem as query-set expansion, where we are given a small and potentially
biased seed set and the aim is to identify a diverse set of semantically
related queries. We present PinSets, a system for query-set expansion, which
applies a simple yet powerful mechanism to search user sessions, expanding a
tiny seed set into thousands of related queries at nearly perfect precision,
deep into the tail, along with explanations that are easy to interpret. PinSets
owes its high quality expansion to using a hybrid of textual and behavioral
techniques (i.e., treating queries both as compositional and as black boxes).
Experiments show that, for the domain of drugs-related queries, PinSets expands
20 seed queries into 15,670 positive training examples at over 99% precision.
The generated expansions have diverse vocabulary and correctly handle words
with ambiguous safety. PinSets decreased unsafe query suggestions at Pinterest
by 90%.
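The abstract frames the problem as expanding a small seed set by mining search sessions. As a rough illustration of the behavioral half of that idea only, here is a minimal sketch of a session co-occurrence heuristic. It is not the PinSets algorithm, which the abstract describes as a hybrid of textual and behavioral techniques. The function name `expand_seed_queries`, the thresholds `min_sessions` and `min_seed_ratio`, and the toy sessions are all assumptions made for this example.

```python
from collections import Counter


def expand_seed_queries(sessions, seeds, min_sessions=2, min_seed_ratio=0.5):
    """Hypothetical helper: score each non-seed query by the fraction of its
    sessions that also contain at least one seed query."""
    seeds = set(seeds)
    total = Counter()      # number of sessions containing each query
    with_seed = Counter()  # number of those sessions that also contain a seed

    for session in sessions:
        queries = set(session)  # count each query once per session
        has_seed = bool(queries & seeds)
        for query in queries:
            total[query] += 1
            if has_seed:
                with_seed[query] += 1

    expanded = {}
    for query, count in total.items():
        # Skip the seeds themselves and queries seen in too few sessions.
        if query in seeds or count < min_sessions:
            continue
        ratio = with_seed[query] / count
        if ratio >= min_seed_ratio:
            expanded[query] = ratio
    return expanded


# Toy sessions with placeholder queries (assumed data, not from the paper).
sessions = [
    ["seed a", "candidate x", "candidate y"],
    ["seed b", "candidate x"],
    ["candidate y", "unrelated z"],
    ["unrelated z", "other w"],
    ["unrelated z", "seed a"],
]
result = expand_seed_queries(sessions, ["seed a", "seed b"])
```

In this toy data, "candidate x" always appears alongside a seed, "candidate y" does so in half of its sessions, and "unrelated z" falls below the assumed ratio threshold. A system that reaches the precision reported in the abstract would need considerably more than this heuristic, including the textual signals the abstract mentions.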
Related papers
- AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries [56.82807063333088]
We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-SQL parsers.
Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness).
In each case, the ambiguity persists even when the database context is provided.
This is achieved through a novel approach that involves controlled generation of databases from scratch.
arXiv Detail & Related papers (2024-06-27T10:43:04Z) - Blowfish: Topological and statistical signatures for quantifying ambiguity in semantic search [0.0]
We show that proxy ambiguous queries display different distributions of homology 0 and 1 based features than proxy clear queries.
We propose a strategy to leverage those findings as a new scoring strategy of semantic similarities.
arXiv Detail & Related papers (2024-06-12T08:26:30Z) - QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set
Operations [36.70770411188946]
QUEST is a dataset of 3357 natural language queries with implicit set operations.
The dataset challenges models to match multiple constraints mentioned in queries with corresponding evidence in documents.
We analyze several modern retrieval systems, finding that they often struggle on such queries.
arXiv Detail & Related papers (2023-05-19T14:19:32Z) - Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z) - Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in exploring users' search intents and facilitating users to locate their most desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z) - ConQX: Semantic Expansion of Spoken Queries for Intent Detection based
on Conditioned Text Generation [4.264192013842096]
We propose a method for semantic expansion of spoken queries, called ConQX.
To avoid off-topic text generation, we condition the input query to a structured context with prompt mining.
We then apply zero-shot, one-shot, and few-shot learning to fine-tune BERT and RoBERTa for intent detection.
arXiv Detail & Related papers (2021-09-02T05:57:07Z) - Session-Aware Query Auto-completion using Extreme Multi-label Ranking [61.753713147852125]
We take the novel approach of modeling session-aware query auto-completion as an eXtreme Multi-label Ranking (XMR) problem.
We adapt a popular XMR algorithm for this purpose by proposing several modifications to the key steps in the algorithm.
Our approach meets the stringent latency requirements for auto-complete systems while leveraging session information in making suggestions.
arXiv Detail & Related papers (2020-12-09T17:56:22Z) - Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z) - Coupled intrinsic and extrinsic human language resource-based query
expansion [0.0]
We present here a query expansion framework which capitalizes on both linguistic characteristics for query constituent encoding, expansion concept extraction and concept weighting.
A thorough empirical evaluation on real-world datasets validates our approach against unigram language model, relevance model and a sequential dependence based technique.
arXiv Detail & Related papers (2020-04-23T11:22:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.