Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings
- URL: http://arxiv.org/abs/2509.15858v1
- Date: Fri, 19 Sep 2025 10:49:39 GMT
- Title: Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings
- Authors: Aysenur Kulunk, Berk Taskin, M. Furkan Eseoglu, H. Bahadir Sahin,
- Abstract summary: We introduce a scalable, multimodal product deduplication system designed specifically for the e-commerce domain. Our approach employs a domain-specific text model grounded in the BERT architecture in conjunction with Masked Autoencoders for image representations. By integrating these feature extraction mechanisms with Milvus, an optimized vector database, our system facilitates efficient and high-precision similarity searches.
- Score: 0.13999481573773068
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In large-scale e-commerce marketplaces, duplicate product listings frequently cause consumer confusion and operational inefficiencies, eroding trust in the platform and increasing costs. Traditional keyword-based search methodologies falter in accurately identifying duplicates due to their reliance on exact textual matches, neglecting semantic similarities inherent in product titles. To address these challenges, we introduce a scalable, multimodal product deduplication framework designed specifically for the e-commerce domain. Our approach employs a domain-specific text model grounded in the BERT architecture in conjunction with Masked Autoencoders for image representations. Both of these architectures are augmented with dimensionality reduction techniques to produce compact 128-dimensional embeddings without significant information loss. Complementing this, we also developed a novel decider model that leverages both text and image vectors. By integrating these feature extraction mechanisms with Milvus, an optimized vector database, our system can facilitate efficient and high-precision similarity searches across extensive product catalogs exceeding 200 million items with just 100GB of system RAM consumption. Empirical evaluations demonstrate that our matching system achieves a macro-average F1 score of 0.90, outperforming third-party solutions which attain an F1 score of 0.83. Our findings show the potential of combining domain-specific adaptations with state-of-the-art machine learning techniques to mitigate duplicate listings in large-scale e-commerce environments.
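The pipeline the abstract describes can be illustrated with a minimal, self-contained sketch: embeddings from separate text and image encoders are reduced to 128 dimensions, candidate matches are retrieved by cosine similarity, and a decider combines both modalities' scores. This is a hypothetical illustration using random data, NumPy PCA-via-SVD in place of the paper's (unspecified) reduction technique, a brute-force search in place of Milvus, and an invented weighted-threshold decider; none of these choices are taken from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def reduce_dim(embeddings: np.ndarray, k: int = 128) -> np.ndarray:
    """Project embeddings onto their top-k principal directions (PCA via SVD)."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def cosine_top1(query: np.ndarray, index: np.ndarray) -> tuple[int, float]:
    """Brute-force stand-in for a vector-database lookup: return the
    (row index, cosine score) of the nearest catalog item."""
    q = query / np.linalg.norm(query)
    x = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = x @ q
    best = int(np.argmax(scores))
    return best, float(scores[best])

def decide_duplicate(text_sim: float, image_sim: float,
                     w_text: float = 0.6, w_image: float = 0.4,
                     threshold: float = 0.9) -> bool:
    """Toy decider: weighted fusion of the two modality similarities.
    The weights and threshold are illustrative assumptions."""
    return w_text * text_sim + w_image * image_sim >= threshold

# Stand-in catalog: 1000 products with 768-d "text" and "image" embeddings.
text_emb = rng.normal(size=(1000, 768))
image_emb = rng.normal(size=(1000, 768))

text_128 = reduce_dim(text_emb)     # shape (1000, 128)
image_128 = reduce_dim(image_emb)   # shape (1000, 128)

# A near-duplicate of product 0: its embeddings plus small noise.
probe_t = text_128[0] + 0.01 * rng.normal(size=128)
probe_i = image_128[0] + 0.01 * rng.normal(size=128)

it, st = cosine_top1(probe_t, text_128)
ii, si = cosine_top1(probe_i, image_128)
print(it, ii, decide_duplicate(st, si))  # → 0 0 True
```

In a production deployment the brute-force `cosine_top1` would be replaced by an approximate-nearest-neighbor index (as the paper does with Milvus), which is what keeps search tractable at catalog scales of hundreds of millions of items.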
Related papers
- OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search [43.94443394870866]
OneSearch is an end-to-end generative framework for e-commerce search. It has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users and generating tens of millions of PVs daily.
arXiv Detail & Related papers (2025-09-03T11:50:04Z) - UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion [20.13803245640432]
Current e-commerce multimodal retrieval systems face two key limitations: they optimize for specific tasks with fixed modality pairings, and they lack comprehensive benchmarks for evaluating unified retrieval approaches. We introduce UniECS, a unified multimodal e-commerce search framework that handles all retrieval scenarios across image, text, and their combinations.
arXiv Detail & Related papers (2025-08-19T14:06:13Z) - PRISM: Distributed Inference for Foundation Models at Edge [73.54372283220444]
PRISM is a communication-efficient and compute-aware strategy for distributed Transformer inference on edge devices. We evaluate PRISM on ViT, BERT, and GPT-2 across diverse datasets.
arXiv Detail & Related papers (2025-07-16T11:25:03Z) - Learning Item Representations Directly from Multimodal Features for Effective Recommendation [51.49251689107541]
Multimodal recommender systems predominantly leverage Bayesian Personalized Ranking (BPR) optimization to learn item representations. We propose a novel model (i.e., LIRDRec) that learns item representations directly from multimodal features to augment recommendation performance.
arXiv Detail & Related papers (2025-05-08T05:42:22Z) - Generative Retrieval and Alignment Model: A New Paradigm for E-commerce Retrieval [12.318142818707317]
This paper introduces a novel e-commerce retrieval paradigm: the Generative Retrieval and Alignment Model (GRAM). GRAM employs joint training on text information from both queries and products to generate shared text codes. GRAM significantly outperforms traditional models and the latest generative retrieval models.
arXiv Detail & Related papers (2025-04-02T06:40:09Z) - Semantic Ads Retrieval at Walmart eCommerce with Language Models Progressively Trained on Multiple Knowledge Domains [6.1008328784394]
We present an end-to-end solution tailored to optimize the ads retrieval system on Walmart.com. Our approach is to pretrain a BERT-like classification model with product category information. It enhances the search relevance metric by up to 16% compared to a baseline DSSM-based model.
arXiv Detail & Related papers (2025-02-13T09:01:34Z) - Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M datasets, and define two real practical instance-level retrieval tasks.
We train a more effective cross-modal model that is adaptively capable of incorporating key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z) - VFed-SSD: Towards Practical Vertical Federated Advertising [53.08038962443853]
We propose a semi-supervised split distillation framework VFed-SSD to alleviate the two limitations.
Specifically, we develop a self-supervised task, Matched Pair Detection (MPD), to exploit the vertically partitioned unlabeled data.
Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deploying cost and significant performance lift.
arXiv Detail & Related papers (2022-05-31T17:45:30Z) - ACE-BERT: Adversarial Cross-modal Enhanced BERT for E-commerce Retrieval [6.274310862007448]
We propose a novel Adversarial Cross-modal Enhanced BERT (ACE-BERT) for efficient E-commerce retrieval.
With the pre-trained enhanced BERT as the backbone network, ACE-BERT adopts adversarial learning to ensure the distribution consistency of different modality representations.
Experimental results demonstrate that ACE-BERT outperforms the state-of-the-art approaches on the retrieval task.
arXiv Detail & Related papers (2021-12-14T07:36:20Z) - Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining [108.86502855439774]
We investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval.
We contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval.
We propose a novel model named Cross-modal contrAstive Product Transformer for instance-level prodUct REtrieval (CAPTURE)
arXiv Detail & Related papers (2021-07-30T12:11:24Z) - Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval [80.35589927511667]
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model.
Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups, demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
arXiv Detail & Related papers (2021-03-22T15:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences arising from its use.