Related papers: Synthetic Prefixes to Mitigate Bias in Real-Time Neural Query Autocomplete


URL: http://arxiv.org/abs/2510.01574v1
Date: Thu, 02 Oct 2025 01:44:44 GMT
Title: Synthetic Prefixes to Mitigate Bias in Real-Time Neural Query Autocomplete
Authors: Adithya Rajan, Xiaoyu Liu, Prateek Verma, Vibhu Arora,
Abstract summary: We introduce a data-centric approach for mitigating presentation bias in real-time neural query autocomplete systems through the use of synthetic prefixes. These prefixes are generated from complete user queries collected during regular search sessions where autocomplete was not active. Our system demonstrates statistically significant improvements in user engagement, as measured by mean reciprocal rank and related metrics.
Score: 11.632489223177773
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce a data-centric approach for mitigating presentation bias in real-time neural query autocomplete systems through the use of synthetic prefixes. These prefixes are generated from complete user queries collected during regular search sessions where autocomplete was not active. This allows us to enrich the training data for learning-to-rank models with more diverse and less biased examples. This method addresses the inherent bias in engagement signals collected from live query autocomplete interactions, where model suggestions influence user behavior. Our neural ranker is optimized for real-time deployment under strict latency constraints and incorporates a rich set of features, including query popularity, seasonality, fuzzy match scores, and contextual signals such as department affinity, device type, and vertical alignment with previous user queries. To support efficient training, we introduce a task-specific simplification of the listwise loss, reducing computational complexity from $O(n^2)$ to $O(n)$ by leveraging the query autocomplete structure of having only one ground-truth selection per prefix. Deployed in a large-scale e-commerce setting, our system demonstrates statistically significant improvements in user engagement, as measured by mean reciprocal rank and related metrics. Our findings show that synthetic prefixes not only improve generalization but also provide a scalable path toward bias mitigation in other low-latency ranking tasks, including related searches and query recommendations.
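
Illustrative sketch (not from the paper): the abstract does not give the exact prefix-generation rule or the loss formula, so the Python below is only one plausible reading. The function names `synthetic_prefixes` and `single_target_softmax_loss`, the `min_len` parameter, and the choice of softmax cross-entropy as the O(n) single-target loss are all assumptions made for illustration.

```python
import math


def synthetic_prefixes(query, min_len=1):
    """Return every proper prefix of `query` with at least `min_len` characters.

    Assumption: a synthetic prefix is a leading substring of a complete
    query typed without autocomplete; the paper may sample or filter
    prefixes differently.
    """
    return [query[:i] for i in range(min_len, len(query))]


def single_target_softmax_loss(scores, target_index):
    """Softmax cross-entropy over one candidate list with one ground truth.

    Assumption: with exactly one selected suggestion per prefix, a listwise
    objective can be written as -log softmax(scores)[target_index], which
    needs a single pass over the n candidates (O(n)) instead of comparing
    all candidate pairs (O(n^2)).
    """
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target_index]


if __name__ == "__main__":
    # Each (prefix, full query) pair could serve as a training example.
    for prefix in synthetic_prefixes("shoes", min_len=2):
        print(prefix, "->", "shoes")
    # Hypothetical ranker scores for three candidate completions.
    print(single_target_softmax_loss([2.0, 0.5, -1.0], target_index=0))
```

In this reading, each complete query yields several (prefix, target) pairs that were never shaped by a displayed suggestion list, and the loss touches each candidate score once per prefix.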

Related papers

A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts. With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS). Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z)
Sequential Decision-Making for Inline Text Autocomplete [14.83046358936405]
We study the problem of improving inline autocomplete suggestions in text entry systems. We use reinforcement learning to learn suggestion policies through repeated interactions with a target user.
arXiv Detail & Related papers (2024-03-21T22:33:16Z)
Type-based Neural Link Prediction Adapter for Complex Query Answering [2.1098688291287475]
We propose TypE-based Neural Link Prediction Adapter (TENLPA), a novel model that constructs type-based entity-relation graphs. In order to effectively combine type information with complex logical queries, an adaptive learning mechanism is introduced. Experiments on 3 standard datasets show that the TENLPA model achieves state-of-the-art performance on complex query answering.
arXiv Detail & Related papers (2024-01-29T10:54:28Z)
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization [66.22007368434633]
We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR). The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task. We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
arXiv Detail & Related papers (2023-09-29T14:18:59Z)
Adapting Neural Link Predictors for Data-Efficient Complex Query Answering [45.961111441411084]
We propose a parameter-efficient score adaptation model optimised to re-calibrate neural link prediction scores for the complex query answering task. CQD$^{\mathcal{A}}$ produces significantly more accurate results than current state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T00:17:16Z)
Fine-grained Retrieval Prompt Tuning [149.9071858259279]
Fine-grained Retrieval Prompt Tuning steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompt and feature adaptation. Our FRPT with fewer learnable parameters achieves the state-of-the-art performance on three widely-used fine-grained datasets.
arXiv Detail & Related papers (2022-07-29T04:10:04Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
Session-Aware Query Auto-completion using Extreme Multi-label Ranking [61.753713147852125]
We take the novel approach of modeling session-aware query auto-completion as an eXtreme Multi-label Ranking (XMR) problem. We adapt a popular XMR algorithm for this purpose by proposing several modifications to the key steps in the algorithm. Our approach meets the stringent latency requirements for auto-complete systems while leveraging session information in making suggestions.
arXiv Detail & Related papers (2020-12-09T17:56:22Z)
Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks. We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
