Migrating a Job Search Relevance Function
- URL: http://arxiv.org/abs/2504.01284v1
- Date: Wed, 02 Apr 2025 01:22:55 GMT
- Title: Migrating a Job Search Relevance Function
- Authors: Bennett Mountain, Gabriel Womark, Ritvik Kharkar,
- Abstract summary: We describe the migration of a homebrewed C++ search engine to OpenSearch, aimed at preserving and improving search performance with minimal impact on business metrics. We froze our job corpus and executed queries in low inventory locations to capture a representative mixture of high- and low-quality search results. We fine-tuned a new retrieval algorithm on OpenSearch, replicating key components of the original engine's logic while introducing new functionality where necessary.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we describe the migration of a homebrewed C++ search engine to OpenSearch, aimed at preserving and improving search performance with minimal impact on business metrics. To facilitate the migration, we froze our job corpus and executed queries in low inventory locations to capture a representative mixture of high- and low-quality search results. These query-job pairs were labeled by crowd-sourced annotators using a custom rubric designed to reflect relevance and user satisfaction. Leveraging Bayesian optimization, we fine-tuned a new retrieval algorithm on OpenSearch, replicating key components of the original engine's logic while introducing new functionality where necessary. Through extensive online testing, we demonstrated that the new system performed on par with the original, showing improvements in specific engagement metrics, with negligible effects on revenue.
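The abstract says Bayesian optimization was used to fine-tune the OpenSearch retrieval function but does not name the tuned parameters or the objective. The sketch below assumes a single hypothetical tunable parameter (for example, one field-boost weight) and a hypothetical black-box objective, such as an offline relevance score computed against the crowd-labeled query-job pairs. It uses a Gaussian-process surrogate with an expected-improvement acquisition evaluated on a grid. This is an illustration of the general technique under those assumptions, not the authors' implementation, and it requires NumPy.

```python
import math

import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 0.15) -> np.ndarray:
    """Squared-exponential covariance between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Posterior mean and std of a zero-mean, unit-variance GP at x_query."""
    k = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))  # jitter keeps K invertible
    k_s = rbf_kernel(x_obs, x_query)
    mean = k_s.T @ np.linalg.solve(k, y_obs)
    var = 1.0 - np.sum(k_s * np.linalg.solve(k, k_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(objective, lo, hi, n_init=3, n_iter=12, grid_size=201):
    """Maximize a black-box objective over [lo, hi] with GP + EI on a grid.

    `objective` is hypothetical: in a search-tuning setting it might re-run a
    labeled query set with the candidate parameter and return a relevance score.
    """
    grid = np.linspace(0.0, 1.0, grid_size)  # search in normalized units
    to_param = lambda u: lo + u * (hi - lo)
    u_obs = np.linspace(0.0, 1.0, n_init)  # evenly spaced initial probes
    y_obs = np.array([objective(to_param(u)) for u in u_obs])
    for _ in range(n_iter):
        # Standardize observations so the unit-variance GP prior is sensible.
        mu, sd = y_obs.mean(), y_obs.std() + 1e-9
        y_std = (y_obs - mu) / sd
        mean, std = gp_posterior(u_obs, y_std, grid)
        ei = expected_improvement(mean, std, y_std.max())
        u_next = grid[int(np.argmax(ei))]  # most promising candidate
        u_obs = np.append(u_obs, u_next)
        y_obs = np.append(y_obs, objective(to_param(u_next)))
    best = int(np.argmax(y_obs))
    return to_param(u_obs[best]), float(y_obs[best])
```

For example, `bayes_opt(lambda w: -(w - 0.3) ** 2, 0.0, 1.0)` returns a weight close to 0.3 after 15 objective calls. Sample efficiency is the point of the design: if each evaluation means replaying a labeled query set against the engine, a surrogate model that picks the next candidate carefully is cheaper than a dense grid search. Tuning several parameters at once would need a multi-dimensional kernel and a better acquisition optimizer than a fixed grid.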
Related papers
- Automated Query-Product Relevance Labeling using Large Language Models for E-commerce Search
Traditional approaches for annotating query-product pairs rely on human-based labeling services. We show that Large Language Models (LLMs) can approach human-level accuracy on this task in a fraction of the time and cost required by human labelers. This scalable alternative to human annotation has significant implications for information retrieval domains.
Date: 2025-02-21T22:59:36Z
- Iterative NLP Query Refinement for Enhancing Domain-Specific Information Retrieval: A Case Study in Career Services
Retrieving semantically relevant documents in niche domains poses significant challenges for TF-IDF-based systems. This paper introduces an iterative and semi-automated query refinement methodology tailored to Humber College's career services webpages.
Date: 2024-12-22T15:57:35Z
- Learning to Rank for Multiple Retrieval-Augmented Models through Iterative Utility Maximization
This paper investigates the design of a unified search engine to serve multiple retrieval-augmented generation (RAG) agents.
We introduce an iterative approach where the search engine generates retrieval results for these RAG agents and gathers feedback on the quality of the retrieved documents during an offline phase.
We adapt this approach to an online setting, allowing the search engine to refine its behavior based on real-time feedback from individual agents.
Date: 2024-10-13T17:53:50Z
- Tree Search for Language Model Agents
We propose an inference-time search algorithm for LM agents to perform exploration and multi-step planning in interactive web environments.
Our approach is a form of best-first tree search that operates within the actual environment space.
It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks.
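The summary describes a best-first tree search. A minimal sketch of generic best-first search with a priority queue follows. In the paper, expansions act in the real web environment and states are scored by some value estimate; here `expand`, `value`, and `is_goal` are hypothetical callbacks, and the expansion `budget` is an assumed stopping rule.

```python
import heapq
import itertools
from typing import Callable, Hashable, Iterable, TypeVar

S = TypeVar("S", bound=Hashable)


def best_first_search(
    root: S,
    expand: Callable[[S], Iterable[S]],
    value: Callable[[S], float],
    is_goal: Callable[[S], bool],
    budget: int = 100,
) -> S:
    """Expand the highest-value frontier state first; stop at a goal or budget.

    Returns the goal state if one is popped, otherwise the best-scoring
    state seen within the expansion budget.
    """
    tie = itertools.count()  # tie-breaker so states never need to be comparable
    frontier = [(-value(root), next(tie), root)]  # heapq is a min-heap, so negate
    seen = {root}
    best_state, best_value = root, value(root)
    expansions = 0
    while frontier and expansions < budget:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        expansions += 1
        for child in expand(state):
            if child in seen:  # skip states already queued or visited
                continue
            seen.add(child)
            v = value(child)
            if v > best_value:
                best_state, best_value = child, v
            heapq.heappush(frontier, (-v, next(tie), child))
    return best_state
```

As a toy usage example, searching from `1` with `expand = lambda n: [n + 1, n * 2]`, `value = lambda n: -abs(10 - n)`, and `is_goal = lambda n: n == 10` reaches `10` after a handful of expansions. The budget matters more for real agents than for toys, since every expansion in a live environment is costly.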
Date: 2024-07-01T17:07:55Z
- Learning to Retrieve for Job Matching
We discuss applying learning-to-retrieve technology to enhance LinkedIn's job search and recommendation systems.
We leverage confirmed hire data to construct a graph that evaluates a seeker's qualification for a job, and utilize learned links for retrieval.
In addition to a solution based on a conventional inverted index, we developed an on-GPU solution capable of supporting both KNN and term matching efficiently.
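The summary contrasts a conventional inverted-index solution with an on-GPU one. A minimal sketch of the conventional term-matching side is shown below, assuming lowercase whitespace tokenization and ranking by the number of distinct query terms matched. It does not model the learned graph links, KNN retrieval, or the GPU implementation, and the job ids and titles in the usage note are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase whitespace token to the set of doc ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def term_match(index: dict[str, set[str]], query: str, min_match: int = 1) -> list[str]:
    """Return doc ids matching at least `min_match` distinct query terms,
    ordered by number of matched terms (descending), then by doc id."""
    counts: dict[str, int] = defaultdict(int)
    for term in set(query.lower().split()):
        # Only the posting lists of query terms are touched, never the whole corpus.
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1
    hits = [d for d, c in counts.items() if c >= min_match]
    return sorted(hits, key=lambda d: (-counts[d], d))
```

For example, indexing `{"j1": "Senior Python Engineer", "j2": "Python Data Analyst", "j3": "Warehouse Associate"}` and querying `"python engineer"` returns `["j1", "j2"]`; raising `min_match` to 2 keeps only `"j1"`.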
Date: 2024-02-21T00:05:25Z
- Unified Embedding Based Personalized Retrieval in Etsy Search
We propose learning a unified embedding model incorporating graph, transformer and term-based embeddings end to end.
Our personalized retrieval model significantly improves the overall search experience, as measured by a 5.58% increase in search purchase rate and a 2.63% increase in site-wide conversion rate.
Date: 2023-06-07T23:24:50Z
- Query Rewriting for Retrieval-Augmented Large Language Models
Large Language Models (LLMs) serve as powerful black-box readers in the retrieve-then-read pipeline.
This work introduces a new framework, Rewrite-Retrieve-Read, in place of the previous retrieve-then-read pipeline for retrieval-augmented LLMs.
Date: 2023-05-23T17:27:50Z
- How Does Generative Retrieval Scale to Millions of Passages?
We conduct the first empirical study of generative retrieval techniques across various corpus scales.
We scale generative retrieval to millions of passages with a corpus of 8.8M passages and evaluate model sizes up to 11B parameters.
While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
Date: 2023-05-19T17:33:38Z
- Synergistic Interplay between Search and Large Language Models for Information Retrieval
InteR allows retrieval models (RMs) to expand the knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
Date: 2023-05-12T11:58:15Z
- Efficient Non-Parametric Optimizer Search for Diverse Tasks
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
Date: 2022-09-27T17:51:31Z
- Exposing Query Identification for Search Transparency
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries and conduct an empirical analysis of various practical aspects of approximate EQI.
Date: 2021-10-14T20:19:27Z
- Mining Implicit Relevance Feedback from User Behavior for Web Question Answering
We present the first study of the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human-labeled data.
In practice, this work has proved effective in substantially reducing the human labeling cost for the QA service in a global commercial search engine.
Date: 2020-06-13T07:02:08Z