Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings
- URL: http://arxiv.org/abs/2509.15858v1
- Date: Fri, 19 Sep 2025 10:49:39 GMT
- Title: Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings
- Authors: Aysenur Kulunk, Berk Taskin, M. Furkan Eseoglu, H. Bahadir Sahin,
- Abstract summary: We introduce a scalable, multimodal product deduplication system designed specifically for the e-commerce domain. Our approach employs a domain-specific text model grounded in the BERT architecture in conjunction with Masked Autoencoders for image representations. By integrating these feature extraction mechanisms with Milvus, an optimized vector database, our system facilitates efficient and high-precision similarity searches.
- Score: 0.13999481573773068
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In large-scale e-commerce marketplaces, duplicate product listings frequently cause consumer confusion and operational inefficiencies, eroding trust in the platform and increasing costs. Traditional keyword-based search methodologies falter in accurately identifying duplicates due to their reliance on exact textual matches, neglecting semantic similarities inherent in product titles. To address these challenges, we introduce a scalable, multimodal product deduplication framework designed specifically for the e-commerce domain. Our approach employs a domain-specific text model grounded in the BERT architecture in conjunction with Masked Autoencoders for image representations. Both of these architectures are augmented with dimensionality reduction techniques to produce compact 128-dimensional embeddings without significant information loss. Complementing this, we also developed a novel decider model that leverages both text and image vectors. By integrating these feature extraction mechanisms with Milvus, an optimized vector database, our system can facilitate efficient and high-precision similarity searches across extensive product catalogs exceeding 200 million items with just 100GB of system RAM consumption. Empirical evaluations demonstrate that our matching system achieves a macro-average F1 score of 0.90, outperforming third-party solutions which attain an F1 score of 0.83. Our findings show the potential of combining domain-specific adaptations with state-of-the-art machine learning techniques to mitigate duplicate listings in large-scale e-commerce environments.
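The pipeline the abstract describes can be illustrated with a minimal, self-contained sketch: embeddings from separate text and image encoders are reduced to 128 dimensions, candidate matches are retrieved by cosine similarity, and a decider combines both modalities' scores. This is a hypothetical illustration using random data, NumPy PCA-via-SVD in place of the paper's (unspecified) reduction technique, a brute-force search in place of Milvus, and an invented weighted-threshold decider; none of these choices are taken from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def reduce_dim(embeddings: np.ndarray, k: int = 128) -> np.ndarray:
    """Project embeddings onto their top-k principal directions (PCA via SVD)."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def cosine_top1(query: np.ndarray, index: np.ndarray) -> tuple[int, float]:
    """Brute-force stand-in for a vector-database lookup: return the
    (row index, cosine score) of the nearest catalog item."""
    q = query / np.linalg.norm(query)
    x = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = x @ q
    best = int(np.argmax(scores))
    return best, float(scores[best])

def decide_duplicate(text_sim: float, image_sim: float,
                     w_text: float = 0.6, w_image: float = 0.4,
                     threshold: float = 0.9) -> bool:
    """Toy decider: weighted fusion of the two modality similarities.
    The weights and threshold are illustrative assumptions."""
    return w_text * text_sim + w_image * image_sim >= threshold

# Stand-in catalog: 1000 products with 768-d "text" and "image" embeddings.
text_emb = rng.normal(size=(1000, 768))
image_emb = rng.normal(size=(1000, 768))

text_128 = reduce_dim(text_emb)     # shape (1000, 128)
image_128 = reduce_dim(image_emb)   # shape (1000, 128)

# A near-duplicate of product 0: its embeddings plus small noise.
probe_t = text_128[0] + 0.01 * rng.normal(size=128)
probe_i = image_128[0] + 0.01 * rng.normal(size=128)

it, st = cosine_top1(probe_t, text_128)
ii, si = cosine_top1(probe_i, image_128)
print(it, ii, decide_duplicate(st, si))  # → 0 0 True
```

In a production deployment the brute-force `cosine_top1` would be replaced by an approximate-nearest-neighbor index (as the paper does with Milvus), which is what keeps search tractable at catalog scales of hundreds of millions of items.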
Related papers
- OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search [43.94443394870866]
OneSearch is an end-to-end generative framework for e-commerce search. It has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users and generating tens of millions of PVs daily.
arXiv Detail & Related papers (2025-09-03T11:50:04Z) - UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion [20.13803245640432]
Current e-commerce multimodal retrieval systems face two key limitations: they optimize for specific tasks with fixed modality pairings, and they lack comprehensive benchmarks for evaluating unified retrieval approaches. We introduce UniECS, a unified multimodal e-commerce search framework that handles all retrieval scenarios across image, text, and their combinations.
arXiv Detail & Related papers (2025-08-19T14:06:13Z) - PRISM: Distributed Inference for Foundation Models at Edge [73.54372283220444]
PRISM is a communication-efficient and compute-aware strategy for distributed Transformer inference on edge devices. We evaluate PRISM on ViT, BERT, and GPT-2 across diverse datasets.
arXiv Detail & Related papers (2025-07-16T11:25:03Z) - Learning Item Representations Directly from Multimodal Features for Effective Recommendation [51.49251689107541]
Multimodal recommender systems predominantly leverage Bayesian Personalized Ranking (BPR) optimization to learn item representations. We propose a novel model (i.e., LIRDRec) that learns item representations directly from multimodal features to augment recommendation performance.
arXiv Detail & Related papers (2025-05-08T05:42:22Z) - Generative Retrieval and Alignment Model: A New Paradigm for E-commerce Retrieval [12.318142818707317]
This paper introduces a novel e-commerce retrieval paradigm: the Generative Retrieval and Alignment Model (GRAM). GRAM employs joint training on text information from both queries and products to generate shared text codes. GRAM significantly outperforms traditional models and the latest generative retrieval models.
arXiv Detail & Related papers (2025-04-02T06:40:09Z) - Semantic Ads Retrieval at Walmart eCommerce with Language Models Progressively Trained on Multiple Knowledge Domains [6.1008328784394]
We present an end-to-end solution tailored to optimize the ads retrieval system on Walmart.com. Our approach is to pretrain a BERT-like classification model with product category information. It enhances the search relevance metric by up to 16% compared to a baseline DSSM-based model.
arXiv Detail & Related papers (2025-02-13T09:01:34Z) - Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M datasets, and define two real practical instance-level retrieval tasks.
We train a more effective cross-modal model that is adaptively capable of incorporating key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z) - VFed-SSD: Towards Practical Vertical Federated Advertising [53.08038962443853]
We propose a semi-supervised split distillation framework VFed-SSD to alleviate the two limitations.
Specifically, we develop a self-supervised task, Matched Pair Detection (MPD), to exploit the vertically partitioned unlabeled data.
Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deploying cost and significant performance lift.
arXiv Detail & Related papers (2022-05-31T17:45:30Z) - ACE-BERT: Adversarial Cross-modal Enhanced BERT for E-commerce Retrieval [6.274310862007448]
We propose a novel Adversarial Cross-modal Enhanced BERT (ACE-BERT) for efficient E-commerce retrieval.
With the pre-trained enhanced BERT as the backbone network, ACE-BERT adopts adversarial learning to ensure the distribution consistency of different modality representations.
Experimental results demonstrate that ACE-BERT outperforms the state-of-the-art approaches on the retrieval task.
arXiv Detail & Related papers (2021-12-14T07:36:20Z) - Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining [108.86502855439774]
We investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval.
We contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval.
We propose a novel model named Cross-modal contrAstive Product Transformer for instance-level prodUct REtrieval (CAPTURE)
arXiv Detail & Related papers (2021-07-30T12:11:24Z) - Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval [80.35589927511667]
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model.
Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups, demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
arXiv Detail & Related papers (2021-03-22T15:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences arising from its use.