Weak Supervision for Improved Precision in Search Systems
- URL: http://arxiv.org/abs/2503.07025v1
- Date: Mon, 10 Mar 2025 08:06:30 GMT
- Title: Weak Supervision for Improved Precision in Search Systems
- Authors: Sriram Vasudevan
- Abstract summary: We present a weak supervision approach to infer the quality of query-document pairs. We apply it within a Learning to Rank framework to enhance the precision of a large-scale search system.
- Score: 1.5773159234875098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labeled datasets are essential for modern search engines, which increasingly rely on supervised learning methods like Learning to Rank and massive amounts of data to power deep learning models. However, creating these datasets is both time-consuming and costly, leading to the common use of user click and activity logs as proxies for relevance. In this paper, we present a weak supervision approach to infer the quality of query-document pairs and apply it within a Learning to Rank framework to enhance the precision of a large-scale search system.
Related papers
- DeepRetrieval: Powerful Query Generation for Information Retrieval with Reinforcement Learning [0.9065034043031668]
DeepRetrieval is a novel reinforcement learning-based approach that trains LLMs to perform query augmentation directly through trial and error. Our preliminary results demonstrate that DeepRetrieval significantly outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-02-28T22:16:42Z)
- Meta Learning to Rank for Sparsely Supervised Queries [10.422527051110526]
In many real-world search and retrieval scenarios, supervisory signals may not be readily available or could be costly to obtain for some queries.
We propose a novel meta learning to rank framework which leverages the fast learning and adaptation capability of meta-learning.
The proposed method would yield significant advantages, especially when new queries differ in characteristics from the training queries.
arXiv Detail & Related papers (2024-09-29T04:24:38Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z)
- Evaluating and Crafting Datasets Effective for Deep Learning With Data Maps [0.0]
Training on large datasets often requires excessive system resources and an infeasible amount of time.
For supervised learning, large datasets require more time for manually labeling samples.
We propose a method of curating smaller datasets with comparable out-of-distribution model accuracy after an initial training session.
arXiv Detail & Related papers (2022-08-22T03:30:18Z)
- Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z)
- Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z)
- Curriculum Learning: A Survey [65.31516318260759]
Curriculum learning strategies have been successfully employed in all areas of machine learning.
We construct a taxonomy of curriculum learning approaches by hand, considering various classification criteria.
We build a hierarchical tree of curriculum learning methods using an agglomerative clustering algorithm.
arXiv Detail & Related papers (2021-01-25T20:08:32Z)
- Online feature selection for rapid, low-overhead learning in networked systems [0.0]
We present an online algorithm, called OSFS, that selects a small feature set from a large number of available data sources.
We find that OSFS requires several hundred measurements to reduce the number of data sources by two orders of magnitude.
arXiv Detail & Related papers (2020-10-28T12:00:42Z)
- AutoOD: Automated Outlier Detection via Curiosity-guided Search and Self-imitation Learning [72.99415402575886]
Outlier detection is an important data mining task with numerous practical applications.
We propose AutoOD, an automated outlier detection framework, which aims to search for an optimal neural network model.
Experimental results on various real-world benchmark datasets demonstrate that the deep model identified by AutoOD achieves the best performance.
arXiv Detail & Related papers (2020-06-19T18:57:51Z)
- Modeling Document Interactions for Learning to Rank with Regularized Self-Attention [22.140197412459393]
We explore modeling document interactions with self-attention based neural networks.
We propose simple yet effective regularization terms designed to model interactions between documents.
We show that training a self-attention network with our proposed regularization terms can significantly outperform existing learning to rank methods.
arXiv Detail & Related papers (2020-05-08T09:53:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.