Cross-Domain Product Representation Learning for Rich-Content E-Commerce
- URL: http://arxiv.org/abs/2308.05550v1
- Date: Thu, 10 Aug 2023 13:06:05 GMT
- Title: Cross-Domain Product Representation Learning for Rich-Content E-Commerce
- Authors: Xuehan Bai, Yan Li, Yanhua Cheng, Wenjie Yang, Quan Chen, Han Li
- Abstract summary: This paper introduces a large-scale cRoss-dOmain Product rEcognition dataset, called ROPE.
ROPE covers a wide range of product categories and contains over 180,000 products, corresponding to millions of short videos and live streams.
It is the first dataset to cover product pages, short videos, and live streams simultaneously, providing the basis for establishing a unified product representation across different media domains.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of short video and live-streaming platforms has
revolutionized how consumers engage in online shopping. Instead of browsing
product pages, consumers are now turning to rich-content e-commerce, where they
can purchase products through dynamic and interactive media like short videos
and live streams. This emerging form of online shopping has introduced
technical challenges, as products may be presented differently across various
media domains. Therefore, a unified product representation is essential for
achieving cross-domain product recognition to ensure an optimal user search
experience and effective product recommendations. Despite the urgent industrial
need for a unified cross-domain product representation, previous studies have
predominantly focused only on product pages without taking into account short
videos and live streams. To fill the gap in the rich-content e-commerce area,
in this paper, we introduce a large-scale cRoss-dOmain Product rEcognition
dataset, called ROPE. ROPE covers a wide range of product categories and
contains over 180,000 products, corresponding to millions of short videos and
live streams. It is the first dataset to cover product pages, short videos, and
live streams simultaneously, providing the basis for establishing a unified
product representation across different media domains. Furthermore, we propose
a Cross-dOmain Product rEpresentation framework, namely COPE, which unifies
product representations in different domains through multimodal learning
including text and vision. Extensive experiments on downstream tasks
demonstrate the effectiveness of COPE in learning a joint feature space for all
product domains.
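The abstract characterizes COPE only at a high level. As an illustration of how such a joint feature space is commonly learned, the sketch below pairs a two-tower multimodal (vision + text) encoder with a symmetric InfoNCE-style contrastive loss that pulls together embeddings of the same product observed in two domains, e.g. a product page and a short video or live stream. The module names, feature dimensions, and temperature are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of cross-domain multimodal contrastive alignment.
# All names, dimensions, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerProductEncoder(nn.Module):
    """Maps a (visual, textual) observation of a product from any media
    domain into one shared, unit-norm embedding space."""
    def __init__(self, vis_dim=2048, txt_dim=768, embed_dim=256):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, embed_dim)   # frame/image features
        self.txt_proj = nn.Linear(txt_dim, embed_dim)   # title/caption/ASR text features
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, vis_feat, txt_feat):
        v = self.vis_proj(vis_feat)
        t = self.txt_proj(txt_feat)
        z = self.fuse(torch.cat([v, t], dim=-1))
        return F.normalize(z, dim=-1)                   # unit-norm embeddings

def cross_domain_info_nce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE: row i of each batch is the SAME product seen in two
    different domains; all other rows act as in-batch negatives."""
    logits = z_a @ z_b.t() / temperature                # [B, B] cosine similarities
    targets = torch.arange(z_a.size(0))                 # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: a batch of 8 products, each seen on a product page and in a video.
encoder = TwoTowerProductEncoder()
z_page  = encoder(torch.randn(8, 2048), torch.randn(8, 768))
z_video = encoder(torch.randn(8, 2048), torch.randn(8, 768))
loss = cross_domain_info_nce(z_page, z_video)
loss.backward()
```

Because every domain passes through the same encoder into one embedding space, nearest-neighbor search works identically whether the query comes from a product page, a short video, or a live stream.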
Related papers
- ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval [28.13183873658186]
E-commerce is increasingly multimedia-enriched, with products exhibited in a broad-domain manner as images, short videos, or live stream promotions.
Due to large intra-product variance and high inter-product similarity in the broad-domain scenario, a visual-only representation is inadequate.
We propose ASR-enhanced Multimodal Product Representation Learning (AMPere).
arXiv Detail & Related papers (2024-08-06T06:24:10Z)
- Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval [32.478352606125306]
We propose a text-guided attention mechanism that leverages the spoken content of salespeople to guide the model to focus on the intended products.
A long-range spatiotemporal graph network is further designed to achieve both instance-level interaction and frame-level matching.
We demonstrate the superior performance of our proposed SGMN model, surpassing the state-of-the-art methods by a substantial margin.
arXiv Detail & Related papers (2024-07-23T07:36:54Z)
- MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization [93.5217515566437]
Multi-modal Product Summarization (MPS) aims to increase customers' desire to purchase by highlighting product characteristics.
Existing MPS methods can produce promising results, but they still lack end-to-end product summarization.
We propose an end-to-end multi-modal attribute-aware product summarization method (MMAPS) for generating high-quality product summaries in e-commerce.
arXiv Detail & Related papers (2023-08-22T11:00:09Z)
- Cross-view Semantic Alignment for Livestreaming Product Recognition [24.38606354376169]
We present LPR4M, a large-scale multimodal dataset that covers 34 categories.
LPR4M contains diverse videos and noisy modality pairs while exhibiting a long-tailed distribution.
A novel Patch Feature Reconstruction loss is proposed to penalize the semantic misalignment between cross-view patches.
arXiv Detail & Related papers (2023-08-09T12:23:41Z)
- Multi-queue Momentum Contrast for Microvideo-Product Retrieval [57.527227171945796]
We formulate the microvideo-product retrieval task, the first attempt to explore retrieval between two kinds of multi-modal instances: micro-videos and products.
A novel approach named Multi-Queue Momentum Contrast (MQMC) network is proposed for bidirectional retrieval.
A discriminative multi-queue selection strategy is used to weigh the importance of different negatives according to their categories; a minimal sketch of this multi-queue idea appears after this list.
arXiv Detail & Related papers (2022-12-22T03:47:14Z)
- e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce [9.46186546774799]
We propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images.
We present techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges.
arXiv Detail & Related papers (2022-07-01T05:16:47Z)
- ItemSage: Learning Product Embeddings for Shopping Recommendations at Pinterest [60.841761065439414]
At Pinterest, we build a single set of product embeddings called ItemSage to provide relevant recommendations in all shopping use cases.
This approach has led to significant improvements in engagement and conversion metrics, while reducing both infrastructure and maintenance cost.
arXiv Detail & Related papers (2022-05-24T02:28:58Z)
- Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining [108.86502855439774]
We investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval.
We contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval.
We propose a novel model named Cross-modal contrAstive Product Transformer for instance-level prodUct REtrieval (CAPTURE)
arXiv Detail & Related papers (2021-07-30T12:11:24Z)
- Fashion Focus: Multi-modal Retrieval System for Video Commodity Localization in E-commerce [18.651201334846352]
We present an innovative demonstration of a multi-modal retrieval system called "Fashion Focus".
It precisely localizes product images in online videos as the retrieval focuses.
Our system employs two procedures for analysis, including video content structuring and multi-modal retrieval, to automatically achieve accurate video-to-shop matching.
arXiv Detail & Related papers (2021-02-09T09:45:04Z)
- Poet: Product-oriented Video Captioner for E-commerce [124.9936946822493]
In e-commerce, a growing number of user-generated videos are used for product promotion. How to generate video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promotion.
We propose a product-oriented video captioner framework, abbreviated as Poet.
We show that Poet achieves consistent improvements over previous methods in generation quality, product-aspect capturing, and lexical diversity.
arXiv Detail & Related papers (2020-08-16T10:53:46Z)
- Comprehensive Information Integration Modeling Framework for Video Titling [124.11296128308396]
We integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework.
The proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization.
We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform.
arXiv Detail & Related papers (2020-06-24T10:38:15Z)
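As noted in the Multi-queue Momentum Contrast entry above, the sketch below illustrates the multi-queue idea: MoCo-style momentum contrast in which negatives are drawn from one FIFO queue per product category, and same-category (hard) negatives are weighted more heavily. The queue length, momentum value, and weighting rule are assumptions for illustration, not the paper's actual configuration.

```python
# Minimal sketch of a multi-queue momentum-contrast loss (MQMC-style).
# Queue length, momentum, and the category weighting rule are assumptions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(key_encoder, query_encoder, m=0.999):
    # MoCo-style update: the key encoder slowly trails the query encoder.
    for kp, qp in zip(key_encoder.parameters(), query_encoder.parameters()):
        kp.mul_(m).add_(qp, alpha=1.0 - m)

class MultiQueueLoss:
    """InfoNCE over negatives drawn from one FIFO queue per product category.
    Negatives sharing the query's category are up-weighted, since they are
    the hardest to distinguish from the true product."""
    def __init__(self, num_categories, queue_len=1024, dim=256):
        # Stand-in key queues; in training these would hold past key embeddings.
        self.queues = [F.normalize(torch.randn(queue_len, dim), dim=-1)
                       for _ in range(num_categories)]

    def __call__(self, q, k_pos, category, temperature=0.07,
                 same_cat_weight=2.0):
        # q, k_pos: [B, dim] L2-normalized query / positive-key embeddings.
        # category: [B] long tensor of product category ids.
        pos = (q * k_pos).sum(-1, keepdim=True) / temperature   # [B, 1]
        neg_logits, neg_weights = [], []
        for c, queue in enumerate(self.queues):
            neg = q @ queue.t() / temperature                   # [B, L]
            w = torch.where((category == c).unsqueeze(1),
                            torch.full_like(neg, same_cat_weight),
                            torch.ones_like(neg))
            neg_logits.append(neg)
            neg_weights.append(w)
        logits = torch.cat([pos] + neg_logits, dim=1)
        weights = torch.cat([torch.ones_like(pos)] + neg_weights, dim=1)
        # Weighted softmax denominator: sum_j w_j * exp(l_j); positive at 0.
        return (torch.logsumexp(logits + weights.log(), dim=1)
                - pos.squeeze(1)).mean()

# Toy usage: 8 queries over 5 categories in a 256-d embedding space.
loss_fn = MultiQueueLoss(num_categories=5)
q = F.normalize(torch.randn(8, 256), dim=-1)
k = F.normalize(torch.randn(8, 256), dim=-1)
loss = loss_fn(q, k, category=torch.randint(0, 5, (8,)))
```

Keeping a separate queue per category makes it cheap to bias the softmax denominator toward the negatives most likely to be confused with the query, without enlarging the batch.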
This list is automatically generated from the titles and abstracts of the papers on this site.