End-to-end multi-modal product matching in fashion e-commerce
- URL: http://arxiv.org/abs/2403.11593v1
- Date: Mon, 18 Mar 2024 09:12:16 GMT
- Title: End-to-end multi-modal product matching in fashion e-commerce
- Authors: Sándor Tóth, Stephen Wilson, Alexia Tsoukara, Enric Moreu, Anton Masalovich, Lars Roemheld
- Abstract summary: We present a robust multi-modal product matching system in an industry setting.
We show how a human-in-the-loop process can be combined with model-based predictions to achieve near perfect precision.
- Score: 0.6047429555885261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Product matching, the task of identifying different representations of the same product for better discoverability, curation, and pricing, is a key capability for online marketplace and e-commerce companies. We present a robust multi-modal product matching system in an industry setting, where large datasets, data distribution shifts and unseen domains pose challenges. We compare different approaches and conclude that a relatively straightforward projection of pretrained image and text encoders, trained through contrastive learning, yields state-of-the-art results, while balancing cost and performance. Our solution outperforms single modality matching systems and large pretrained models, such as CLIP. Furthermore we show how a human-in-the-loop process can be combined with model-based predictions to achieve near perfect precision in a production system.
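The abstract's core recipe — lightweight projection heads on top of frozen pretrained image and text encoders, trained with a contrastive objective — can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes PyTorch, precomputed encoder embeddings, and a standard symmetric InfoNCE loss, with all dimensions and names chosen for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionMatcher(nn.Module):
    """Illustrative sketch: project frozen image/text embeddings into a
    shared space and train with a symmetric contrastive (InfoNCE) loss."""

    def __init__(self, img_dim=768, txt_dim=512, proj_dim=256, temperature=0.07):
        super().__init__()
        # Only these lightweight heads are trained; the large pretrained
        # encoders that produced the input embeddings stay frozen.
        self.img_proj = nn.Linear(img_dim, proj_dim)
        self.txt_proj = nn.Linear(txt_dim, proj_dim)
        self.temperature = temperature

    def forward(self, img_emb, txt_emb):
        # L2-normalize so logits are scaled cosine similarities.
        z_img = F.normalize(self.img_proj(img_emb), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_emb), dim=-1)
        logits = z_img @ z_txt.t() / self.temperature
        # Matching pairs lie on the diagonal of the similarity matrix.
        targets = torch.arange(logits.size(0))
        # Symmetric cross-entropy: image-to-text and text-to-image.
        loss = (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2
        return loss

# Toy batch of precomputed encoder outputs (stand-ins for real embeddings).
img_emb = torch.randn(8, 768)
txt_emb = torch.randn(8, 512)
model = ProjectionMatcher()
loss = model(img_emb, txt_emb)
print(loss.item())
```

At inference time, candidate products would be matched by nearest-neighbor search over the projected embeddings; the contrastive objective pulls representations of the same product together and pushes different products apart.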
Related papers
- Text-Based Product Matching -- Semi-Supervised Clustering Approach [9.748519919202986]
This paper presents a new approach to product matching based on semi-supervised clustering.
We study the properties of this method by experimenting with the IDEC algorithm on a real-world dataset.
arXiv Detail & Related papers (2024-02-01T18:52:26Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- UniMatch: A Unified User-Item Matching Framework for the Multi-purpose Merchant Marketing [27.459774494479227]
We present a unified user-item matching framework to simultaneously conduct item recommendation and user targeting with just one model.
Our framework results in significant performance gains in comparison with the state-of-the-art methods, with greatly reduced cost on computing resources and daily maintenance.
arXiv Detail & Related papers (2023-07-19T13:49:35Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- Cross-Modal Fine-Tuning: Align then Refine [83.37294254884446]
ORCA is a cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities.
We show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities.
arXiv Detail & Related papers (2023-02-11T16:32:28Z)
- FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M datasets, and define two real practical instance-level retrieval tasks.
We train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- Multimodal Adversarially Learned Inference with Factorized Discriminators [10.818838437018682]
We propose a novel approach to generative modeling of multimodal data based on generative adversarial networks.
To learn a coherent multimodal generative model, we show that it is necessary to align different encoder distributions with the joint decoder distribution simultaneously.
By taking advantage of contrastive learning through factorizing the discriminator, we train our model on unimodal data.
arXiv Detail & Related papers (2021-12-20T08:18:49Z)
- Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining [108.86502855439774]
We investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval.
We contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval.
We propose a novel model named Cross-modal contrAstive Product Transformer for instance-level prodUct REtrieval (CAPTURE).
arXiv Detail & Related papers (2021-07-30T12:11:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.