Transformer-empowered Multi-modal Item Embedding for Enhanced Image
Search in E-Commerce
- URL: http://arxiv.org/abs/2311.17954v2
- Date: Thu, 8 Feb 2024 15:35:36 GMT
- Title: Transformer-empowered Multi-modal Item Embedding for Enhanced Image
Search in E-Commerce
- Authors: Chang Liu, Peng Hou, Anxiang Zeng, Han Yu
- Abstract summary: Multi-modal Item Embedding Model (MIEM) is capable of utilizing both textual information and multiple images about a product to construct meaningful product features.
MIEM has become an integral part of the Shopee image search platform.
- Score: 20.921870288665627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the past decade, significant advances have been made in the field of
image search for e-commerce applications. Traditional image-to-image retrieval
models, which focus solely on image details such as texture, tend to overlook
useful semantic information contained within the images. As a result, the
retrieved products might possess similar image details, but fail to fulfil the
user's search goals. Moreover, the use of image-to-image retrieval models for
products containing multiple images results in significant online product
feature storage overhead and complex mapping implementations. In this paper, we
report the design and deployment of the proposed Multi-modal Item Embedding
Model (MIEM) to address these limitations. It is capable of utilizing both
textual information and multiple images about a product to construct meaningful
product features. By leveraging semantic information from images, MIEM
effectively supplements the image search process, improving the overall
accuracy of retrieval results. MIEM has become an integral part of the Shopee
image search platform. Since its deployment in March 2023, it has achieved a
remarkable 9.90% increase in clicks per user and a 4.23% boost in orders per
user for the image search feature on the Shopee e-commerce platform.
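The abstract above describes fusing textual information with multiple product images into a single item embedding that is then used for retrieval. The paper's actual transformer architecture is not reproduced here; the following is a minimal, hypothetical sketch in plain NumPy. The helper names `fuse_item_embedding` and `retrieve` are made up for illustration, and it assumes per-modality embeddings have already been produced by some upstream encoder, with mean-pooling over images and simple averaging standing in for the learned fusion step.

```python
import numpy as np

def fuse_item_embedding(text_emb, image_embs):
    """Fuse one text embedding with several image embeddings into a
    single product feature (hypothetical fusion: mean-pool the images,
    average with the text, then L2-normalize)."""
    img_pooled = np.mean(image_embs, axis=0)
    fused = (text_emb + img_pooled) / 2.0
    return fused / np.linalg.norm(fused)

def retrieve(query_emb, item_embs, k=2):
    """Rank items by cosine similarity to the query embedding and
    return the indices of the top-k matches."""
    q = query_emb / np.linalg.norm(query_emb)
    sims = item_embs @ q  # rows are unit vectors, so this is cosine
    return np.argsort(-sims)[:k]

# Toy example: 3 products, 4-dim embeddings, 2 images per product.
rng = np.random.default_rng(0)
items = np.stack([
    fuse_item_embedding(rng.normal(size=4), rng.normal(size=(2, 4)))
    for _ in range(3)
])
ranking = retrieve(items[1], items, k=2)
print(ranking)  # the query item itself should rank first
```

One practical point the abstract hints at: collapsing all of a product's images into one stored vector avoids the per-image storage overhead and the image-to-product mapping complexity of plain image-to-image retrieval.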
Related papers
- VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model [76.02314305164595]
This work presents a novel image outpainting framework that is capable of customizing the results according to the requirement of users.
We take advantage of a Multimodal Large Language Model (MLLM) that automatically extracts and organizes the corresponding textual descriptions of the masked and unmasked parts of a given image.
In addition, a special Cross-Attention module, namely Center-Total-Surrounding (CTS), is elaborately designed to further enhance the interaction between specific spatial regions of the image and the corresponding parts of the text prompts.
arXiv Detail & Related papers (2024-06-03T07:14:19Z) - Many-to-many Image Generation with Auto-regressive Diffusion Models [59.5041405824704]
This paper introduces a domain-general framework for many-to-many image generation, capable of producing interrelated image series from a given set of images.
We present MIS, a novel large-scale multi-image dataset, containing 12M synthetic multi-image samples, each with 25 interconnected images.
We learn M2M, an autoregressive model for many-to-many generation, where each image is modeled within a diffusion framework.
arXiv Detail & Related papers (2024-04-03T23:20:40Z) - Mutual Query Network for Multi-Modal Product Image Segmentation [13.192334066413837]
We propose a mutual query network to segment products based on both visual and linguistic modalities.
To promote research in this field, we also construct a Multi-Modal Product dataset (MMPS).
The proposed method significantly outperforms the state-of-the-art methods on MMPS.
arXiv Detail & Related papers (2023-06-26T03:18:38Z) - EDIS: Entity-Driven Image Search over Multimodal Web Content [95.40238328527931]
We introduce Entity-Driven Image Search (EDIS), a dataset for cross-modal image search in the news domain.
EDIS consists of 1 million web images from actual search engine results and curated datasets, with each image paired with a textual description.
arXiv Detail & Related papers (2023-05-23T02:59:19Z) - Unified Vision-Language Representation Modeling for E-Commerce
Same-Style Products Retrieval [12.588713044749177]
Same-style products retrieval plays an important role in e-commerce platforms.
We propose a unified vision-language modeling method for e-commerce same-style products retrieval.
It is capable of cross-modal product-to-product retrieval, as well as style transfer and user-interactive search.
arXiv Detail & Related papers (2023-02-10T07:24:23Z) - Unposed: Unsupervised Pose Estimation based Product Image
Recommendations [4.467248776406006]
We propose a Human Pose Detection-based unsupervised method to scan a product's image set for missing pose variants.
The unsupervised approach suggests a fair approach to sellers based on product and category irrespective of any biases.
We manually surveyed 200 products, a large fraction of which had at least one repeated image or missing variant, and sampled 3K products (20K images), a significant proportion of which had scope for adding many image variants.
arXiv Detail & Related papers (2023-01-19T05:02:55Z) - Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose a novel Multi-modal Retrieval-based framework (MoRe).
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image in the knowledge corpus respectively.
arXiv Detail & Related papers (2022-12-03T13:11:32Z) - Automatic Generation of Product-Image Sequence in E-commerce [46.06263129000091]
Multi-modality Unified Image-sequence Classifier (MUIsC) is able to simultaneously detect all categories of rule violations through learning.
By Dec 2021, our AGPIS framework had generated high-standard images for about 1.5 million products, achieving a 13.6% reject rate.
arXiv Detail & Related papers (2022-06-26T23:38:42Z) - A Study on the Efficient Product Search Service for the Damaged Image
Information [12.310316230437005]
The idea of this study is to help users search for products by restoring damaged images with image pre-processing and image inpainting algorithms.
The system has the advantage of efficiently displaying information by category, enabling efficient sales of registered information.
arXiv Detail & Related papers (2021-11-14T13:58:48Z) - An Automatic Image Content Retrieval Method for better Mobile Device
Display User Experiences [91.3755431537592]
A new mobile application for image content retrieval and classification for mobile device display is proposed.
The application was run on thousands of pictures and showed encouraging results towards a better user visual experience with mobile displays.
arXiv Detail & Related papers (2021-08-26T23:44:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.