Improving Ad matching via Cluster-Adaptive Keyword Expansion and Relevance tuning
- URL: http://arxiv.org/abs/2505.18897v1
- Date: Sat, 24 May 2025 23:02:19 GMT
- Title: Improving Ad matching via Cluster-Adaptive Keyword Expansion and Relevance tuning
- Authors: Dipanwita Saha, Anis Zaman, Hua Zou, Ning Chen, Xinxin Shu, Nadia Vase, Abraham Bagherjeiran
- Abstract summary: This work extends keyword reach through document-side semantic keyword expansion. We propose a solution using a pre-trained siamese model to generate dense vector representations of ad keywords. We introduce a cluster-based thresholding mechanism that adjusts similarity cutoffs based on local semantic density.
- Score: 2.730740440506481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In search advertising, keyword matching connects user queries with relevant ads. While token-based matching increases ad coverage, it can reduce relevance due to overly permissive semantic expansion. This work extends keyword reach through document-side semantic keyword expansion, using a language model to broaden token-level matching without altering queries. We propose a solution using a pre-trained siamese model to generate dense vector representations of ad keywords and identify semantically related variants through nearest neighbor search. To maintain precision, we introduce a cluster-based thresholding mechanism that adjusts similarity cutoffs based on local semantic density. Each expanded keyword maps to a group of seller-listed items, which may only partially align with the original intent. To ensure relevance, we enhance the downstream relevance model by adapting it to the expanded keyword space using an incremental learning strategy with a lightweight decision tree ensemble. This system improves both relevance and click-through rate (CTR), offering a scalable, low-latency solution adaptable to evolving query behavior and advertising inventory.
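The expansion step described in the abstract (dense keyword vectors, nearest neighbor search, and a similarity cutoff that depends on the local cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy 3-dimensional vectors stand in for siamese-model embeddings, the cluster labels are given by hand, "local semantic density" is approximated as the mean pairwise cosine similarity inside a cluster, and the names `cluster_density`, `adaptive_threshold`, and `expand` along with the parameters `base`, `alpha`, `lo`, and `hi` are hypothetical.

```python
import math
from itertools import combinations

# Toy stand-ins for siamese-model embeddings (assumption: real vectors would
# come from a pre-trained encoder and have hundreds of dimensions).
EMBEDDINGS = {
    "running shoes": (0.9, 0.1, 0.0),
    "jogging sneakers": (0.85, 0.15, 0.05),
    "trail runners": (0.8, 0.2, 0.1),
    "garden hose": (0.0, 0.1, 0.95),
    "hose reel": (0.3, 0.1, 0.8),
}

# Hand-assigned cluster ids (assumption: a real system would cluster the
# embeddings, for example with k-means).
CLUSTERS = {
    "running shoes": 0,
    "jogging sneakers": 0,
    "trail runners": 0,
    "garden hose": 1,
    "hose reel": 1,
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_density(vectors):
    """Mean pairwise cosine similarity within one cluster (1.0 for singletons)."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def adaptive_threshold(density, base=0.80, alpha=0.5, lo=0.60, hi=0.95):
    """Move the base cutoff part of the way toward the density, clamped to [lo, hi].

    Assumed policy: tighter clusters get a stricter cutoff.
    """
    return min(hi, max(lo, base + alpha * (density - base)))


def expand(keyword, embeddings, clusters, k=3):
    """Return up to k nearest neighbours that clear the keyword's cluster cutoff."""
    cid = clusters[keyword]
    members = [embeddings[w] for w, c in clusters.items() if c == cid]
    cutoff = adaptive_threshold(cluster_density(members))
    # Brute-force nearest neighbor search; an ANN index would replace this at scale.
    scored = sorted(
        ((cosine(embeddings[keyword], vec), other)
         for other, vec in embeddings.items() if other != keyword),
        reverse=True,
    )
    return [other for sim, other in scored[:k] if sim >= cutoff]


if __name__ == "__main__":
    for kw in ("running shoes", "garden hose"):
        print(kw, "->", expand(kw, EMBEDDINGS, CLUSTERS))
```

In this sketch the cutoff is computed per cluster rather than globally, so a tightly packed cluster only admits very close variants while a looser cluster tolerates more distant ones. Whether the paper raises or lowers the cutoff with density is not stated in the abstract; the direction chosen here is an assumption.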
Related papers
- Beyond Nearest Neighbors: Semantic Compression and Graph-Augmented Retrieval for Enhanced Vector Search [2.377892000761193]
We introduce a new retrieval paradigm: semantic compression, which aims to select a compact, representative set of vectors that captures the broader semantic structure around a query. To operationalize this idea, we propose graph-augmented vector retrieval, which overlays semantic graphs (e.g., kNN or knowledge-based links) atop vector spaces. Our work outlines a foundation for meaning-centric vector search systems, emphasizing hybrid indexing, diversity-aware querying, and structured semantic retrieval.
arXiv Detail & Related papers (2025-07-25T23:35:11Z)
- Knowledge Graph Completion with Relation-Aware Anchor Enhancement [50.50944396454757]
We propose a relation-aware anchor enhanced knowledge graph completion method (RAA-KGC). We first generate anchor entities within the relation-aware neighborhood of the head entity. Then, by pulling the query embedding towards the neighborhoods of the anchors, it is tuned to be more discriminative for target entity matching.
arXiv Detail & Related papers (2025-04-08T15:22:08Z)
- Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation [72.28364940168092]
Open-vocabulary semantic segmentation models associate vision and text to label pixels from an undefined set of classes using textual queries. We introduce Semantic Library Adaptation (SemLA), a novel framework for training-free, test-time domain adaptation.
arXiv Detail & Related papers (2025-03-27T17:59:58Z)
- LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
Top-k kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance.
There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results.
We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z)
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve the text into multi concepts for multilingual semantic matching to liberate the model from the reliance on NER models.
We conduct comprehensive experiments on English datasets QQP and MRPC, and Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z)
- Information Retrieval in long documents: Word clustering approach for improving Semantics [0.0]
We propose an alternative to deep neural networks for semantic information retrieval for the case of long documents. This new approach exploiting clustering techniques takes into account the meaning of words in Information Retrieval systems targeting long as well as short documents.
arXiv Detail & Related papers (2023-02-20T18:32:57Z)
- Keyword Targeting Optimization in Sponsored Search Advertising: Combining Selection and Matching [0.0]
An optimal keyword targeting strategy guarantees reaching the right population effectively.
This paper aims to address the keyword targeting problem, which is a challenging task because of the incomplete information of historical advertising performance indices.
Experimental results show that (a) BB-KSM outperforms seven baselines in terms of profit; (b) BB-KSM shows its superiority as the budget increases.
arXiv Detail & Related papers (2022-10-19T03:37:32Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
- Graph Adaptive Semantic Transfer for Cross-domain Sentiment Classification [68.06496970320595]
Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain.
We present the Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that is able to learn domain-invariant semantics from both word sequences and syntactic graphs.
arXiv Detail & Related papers (2022-05-18T07:47:01Z)
- Quotient Space-Based Keyword Retrieval in Sponsored Search [7.639289301435027]
Synonymous keyword retrieval has become an important problem for sponsored search.
We propose a novel quotient space-based retrieval framework to address this problem.
This method has been successfully implemented in Baidu's online sponsored search system.
arXiv Detail & Related papers (2021-05-26T07:27:54Z)
- Unsupervised Key-phrase Extraction and Clustering for Classification Scheme in Scientific Publications [0.0]
We investigate possible ways of automating parts of the Systematic Mapping (SM) and Systematic Review (SR) process.
Key-phrases are extracted from scientific documents using unsupervised methods, which are then used to construct the corresponding Classification Scheme.
We also explore how clustering can be used to group related key-phrases.
arXiv Detail & Related papers (2021-01-25T10:17:33Z)
- A Linguistically Driven Framework for Query Expansion via Grammatical Constituent Highlighting and Role-Based Concept Weighting [0.0]
Concepts-of-Interest are recognized as the core concepts that represent the gist of the search goal.
The remaining query constituents which serve to specify the search goal and complete the query structure are classified as descriptive, relational or structural.
arXiv Detail & Related papers (2020-04-25T01:43:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.