Related papers: From Pixels to Purchase: Building and Evaluating a Taxonomy-Decoupled Visual Search Engine for Home Goods E-commerce

URL: http://arxiv.org/abs/2601.11769v1
Date: Fri, 16 Jan 2026 20:54:30 GMT
Title: From Pixels to Purchase: Building and Evaluating a Taxonomy-Decoupled Visual Search Engine for Home Goods E-commerce
Authors: Cheng Lyu, Jingyue Zhang, Ryan Maunu, Mengwei Li, Vinny DeGenova, Yuanli Pei,
Abstract summary: Visual search is critical for e-commerce, especially in style-driven domains where user intent is subjective and open-ended. We propose a taxonomy-decoupled architecture that uses classification-free region proposals and unified embeddings for similarity retrieval. Our system improves retrieval quality and yields a measurable uplift in customer engagement, while our offline evaluation metrics strongly correlate with real-world outcomes.
Score: 6.200631104634354
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual search is critical for e-commerce, especially in style-driven domains where user intent is subjective and open-ended. Existing industrial systems typically couple object detection with taxonomy-based classification and rely on catalog data for evaluation, which is prone to noise that limits robustness and scalability. We propose a taxonomy-decoupled architecture that uses classification-free region proposals and unified embeddings for similarity retrieval, enabling a more flexible and generalizable visual search. To overcome the evaluation bottleneck, we propose an LLM-as-a-Judge framework that assesses nuanced visual similarity and category relevance for query-result pairs in a zero-shot manner, removing dependence on human annotations or noise-prone catalog data. Deployed at scale on a global home goods platform, our system improves retrieval quality and yields a measurable uplift in customer engagement, while our offline evaluation metrics strongly correlate with real-world outcomes.
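The retrieval step the abstract describes (unified embeddings compared by similarity, with no category filter in between) can be illustrated with a minimal sketch. The vectors, product IDs, and the `top_k` helper are hypothetical and chosen only for illustration; the paper's actual embedding model, index, and similarity measure are not specified in this listing, and cosine similarity is an assumption here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, catalog, k=2):
    """Rank all catalog items by similarity to the query embedding.

    No taxonomy or category filter is applied: every product competes
    in the same embedding space (the "taxonomy-decoupled" idea).
    """
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical catalog embeddings (assumed, not from the paper).
catalog = {
    "sofa_a": [0.9, 0.1, 0.0],
    "lamp_b": [0.0, 1.0, 0.2],
    "chair_c": [0.8, 0.3, 0.1],
}

# Hypothetical embedding of one region proposal cropped from a query image.
region_embedding = [1.0, 0.2, 0.0]

results = top_k(region_embedding, catalog, k=2)
for pid, score in results:
    print(pid, round(score, 3))
```

In a deployed system the linear scan would normally be replaced by an approximate nearest-neighbor index; the scan is kept here so the ranking logic stays visible.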

Related papers

UniDGF: A Unified Detection-to-Generation Framework for Hierarchical Object Visual Recognition [14.256812146187565]
We introduce a detection-guided generative framework that predicts hierarchical category and attribute tokens. For each detected object, we extract refined ROI-level features and employ a BART-based generator to produce semantic tokens. Experiments on both large-scale proprietary e-commerce datasets and open-source datasets demonstrate that our approach significantly outperforms existing similarity-based pipelines.
arXiv Detail & Related papers (2025-11-20T02:37:43Z)
Taxonomy-based Negative Sampling In Personalized Semantic Search for E-commerce [46.251483528080236]
We present a semantic retrieval model for e-commerce search that embeds queries and products into a shared vector space. To further tailor retrievals, we incorporate user-level personalization by modeling each customer's past purchase history and behavior.
arXiv Detail & Related papers (2025-11-01T20:25:00Z)
Improving E-commerce Search with Category-Aligned Retrieval [0.0]
Category-Aligned Retrieval System (CARS) improves search relevance by first predicting the product category from a user's query and then boosting products within that category. We introduce a novel method for creating "Trainable Category Prototypes" from query embeddings.
arXiv Detail & Related papers (2025-09-03T20:43:52Z)
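The predict-then-boost idea in the CARS summary above can be sketched as follows. This is an illustrative reading, not the paper's implementation: the prototype vectors, the dot-product category prediction, the additive `boost` value, and all product names are assumptions made for the example.

```python
def predict_category(query_vec, prototypes):
    """Pick the category whose prototype has the largest dot product with the query."""
    def score(category):
        return sum(q * p for q, p in zip(query_vec, prototypes[category]))
    return max(prototypes, key=score)

def rerank(base_scores, product_category, predicted, boost=0.2):
    """Add a fixed bonus to products in the predicted category, then sort by score."""
    boosted = {
        pid: s + (boost if product_category[pid] == predicted else 0.0)
        for pid, s in base_scores.items()
    }
    return sorted(boosted, key=boosted.get, reverse=True)

# Hypothetical category prototypes in the query embedding space.
prototypes = {"rugs": [1.0, 0.0], "lighting": [0.0, 1.0]}
query_vec = [0.9, 0.1]

# Hypothetical first-stage relevance scores and catalog categories.
base_scores = {"lamp_1": 0.70, "rug_1": 0.65, "rug_2": 0.40}
product_category = {"lamp_1": "lighting", "rug_1": "rugs", "rug_2": "rugs"}

predicted = predict_category(query_vec, prototypes)
ranking = rerank(base_scores, product_category, predicted)
print(predicted, ranking)
```

A soft boost like this keeps out-of-category items in the result list, so a wrong category prediction degrades the ranking rather than emptying it.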
Zero-Shot Retrieval for Scalable Visual Search in a Two-Sided Marketplace [0.0]
This paper presents a scalable visual search system deployed in Mercari's C2C marketplace. We evaluate recent vision-language models for zero-shot image retrieval and compare their performance with an existing fine-tuned baseline.
arXiv Detail & Related papers (2025-07-31T05:13:20Z)
SCAN: Structured Capability Assessment and Navigation for LLMs [54.54085382131134]
SCAN (Structured Capability Assessment and Navigation) is a practical framework that enables detailed characterization of Large Language Models. SCAN incorporates four key components, including TaxBuilder, which extracts capability-indicating tags from queries to construct a hierarchical taxonomy; RealMix, a query synthesis and filtering mechanism that ensures sufficient evaluation data for each capability tag; and a PC$^2$-based (Pre-Comparison-derived Criteria) LLM-as-a-Judge approach that achieves significantly higher accuracy than the classic LLM-as-a-Judge method.
arXiv Detail & Related papers (2025-05-10T16:52:40Z)
Multi-output Headed Ensembles for Product Item Classification [0.9053163124987533]
We propose a deep learning based classification model framework for e-commerce catalogs. We show improvements against robust industry standard baseline models. We also propose a novel way to evaluate model performance using user sessions.
arXiv Detail & Related papers (2023-07-29T01:23:36Z)
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories. We first contribute the Product1M dataset and define two practical instance-level retrieval tasks. We then train a more effective cross-modal model that can adaptively incorporate key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlap with any ground-truth object. This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
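One way to read "how well the location and shape of a region overlap with any ground-truth object" in the OLN summary is as an overlap score such as intersection-over-union (IoU), taken against the best-matching ground-truth box. The sketch below computes that quantity for axis-aligned boxes. It is an assumption-laden illustration: the `(x1, y1, x2, y2)` box format, the `objectness_target` name, and the use of plain IoU alone are choices made here, not details confirmed by this listing.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def objectness_target(proposal, gt_boxes):
    """Class-agnostic target: best overlap with any ground-truth box.

    No category label is used, only geometry.
    """
    return max((iou(proposal, g) for g in gt_boxes), default=0.0)

# Hypothetical proposal and ground-truth boxes.
proposal = (0.0, 0.0, 2.0, 2.0)
gt_boxes = [(1.0, 1.0, 3.0, 3.0), (0.0, 0.0, 2.0, 1.0)]

target = objectness_target(proposal, gt_boxes)
print(target)
```

Because the target depends only on box geometry, the same scoring applies to object types that never appeared with a label, which is the property the summary credits for cross-category generalization.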
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
CRACT: Cascaded Regression-Align-Classification for Robust Visual Tracking [97.84109669027225]
We introduce an improved proposal refinement module, Cascaded Regression-Align-Classification (CRAC). CRAC yields new state-of-the-art performances on many benchmarks. In experiments on seven benchmarks including OTB-2015, UAV123, NfS, VOT-2018, TrackingNet, GOT-10k and LaSOT, our CRACT exhibits very promising results in comparison with state-of-the-art competitors.
arXiv Detail & Related papers (2020-11-25T02:18:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.