MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding
- URL: http://arxiv.org/abs/2406.10701v3
- Date: Sat, 12 Oct 2024 05:08:54 GMT
- Title: MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding
- Authors: Baixuan Xu, Weiqi Wang, Haochen Shi, Wenxuan Ding, Huihao Jing, Tianqing Fang, Jiaxin Bai, Xin Liu, Changlong Yu, Zheng Li, Chen Luo, Qingyu Yin, Bing Yin, Long Chen, Yangqiu Song
- Abstract summary: MIND is a framework that infers purchase intentions from multimodal product metadata and prioritizes human-centric ones.
Using Amazon Review data, we create a multimodal intention knowledge base, which contains 1,264,441 intentions.
Our obtained intentions significantly enhance large language models in two intention comprehension tasks.
- Score: 67.26334044239161
- Abstract: Improving user experience and providing personalized search results in E-commerce platforms heavily rely on understanding purchase intention. However, existing methods for acquiring large-scale intentions depend on distilling large language models with human annotation for verification. Such an approach tends to generate product-centric intentions, overlook valuable visual information from product images, and incur high costs when scaled up. To address these issues, we introduce MIND, a multimodal framework that allows Large Vision-Language Models (LVLMs) to infer purchase intentions from multimodal product metadata and prioritize human-centric ones. Using Amazon Review data, we apply MIND and create a multimodal intention knowledge base, which contains 1,264,441 intentions derived from 126,142 co-buy shopping records across 107,215 products. Extensive human evaluations demonstrate the high plausibility and typicality of our obtained intentions and validate the effectiveness of our distillation framework and filtering mechanism. Additional experiments reveal that our obtained intentions significantly enhance large language models in two intention comprehension tasks.
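As a rough illustration of the distillation pipeline the abstract describes, the following is a minimal sketch, assuming a generic LVLM client: prompt the model with the metadata and images of a co-buy product pair, then keep only human-centric intentions. All names here (`query_lvlm`, `is_human_centric`, the prompt wording, the filter heuristic) are hypothetical placeholders, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Product:
    title: str        # textual product metadata (could be extended with category, etc.)
    image_path: str   # path to the product image

def query_lvlm(prompt: str, image_paths: List[str]) -> List[str]:
    """Hypothetical wrapper around an LVLM API; returns one candidate intention per line."""
    raise NotImplementedError("plug in an LVLM client here")

def generate_intentions(a: Product, b: Product) -> List[str]:
    # Ask the LVLM why a shopper might buy these two products together,
    # conditioning on both the textual metadata and the product images.
    prompt = (
        f"A customer bought '{a.title}' and '{b.title}' together. "
        "List plausible purchase intentions, one per line."
    )
    return query_lvlm(prompt, [a.image_path, b.image_path])

def is_human_centric(intention: str) -> bool:
    # Toy filter: prefer intentions phrased around the shopper's needs or goals
    # rather than restatements of product features (stand-in for MIND's mechanism).
    product_centric_cues = ("this product", "the item", "its features")
    return not any(cue in intention.lower() for cue in product_centric_cues)

def distill(co_buy_pairs: List[Tuple[Product, Product]]) -> List[str]:
    knowledge_base = []
    for a, b in co_buy_pairs:
        for intention in generate_intentions(a, b):
            if is_human_centric(intention):
                knowledge_base.append(intention)
    return knowledge_base
```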
Related papers
- Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation [3.670782697615276]
Large Language Models (LLMs) have the potential to address the scaling limits of human relevance annotation.
We propose a framework for assessing product search engines in a large-scale e-commerce setting.
Our method, validated through deployment on a large e-commerce platform, demonstrates comparable quality to human annotations.
arXiv Detail & Related papers (2024-09-18T10:30:50Z)
- Image Score: Learning and Evaluating Human Preferences for Mercari Search [2.1555050262085027]
Large Language Models (LLMs) are being actively studied and used for data labelling tasks.
We propose a cost-efficient LLM-driven approach for assessing and predicting image quality in e-commerce settings.
We show that our LLM-produced labels correlate with user behavior on Mercari.
arXiv Detail & Related papers (2024-08-21T05:30:06Z)
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
With the right distillation strategy, evaluated across diverse benchmarks, even a 2.7B-parameter model can perform on par with larger 7B or 13B models (a generic distillation-loss sketch follows this entry).
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
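For context on what "distillation algorithms" typically means in this setting, here is a minimal, textbook logit-level knowledge-distillation loss (temperature-scaled KL divergence blended with cross-entropy). It is a generic sketch, not LLAVADI's exact recipe; `temperature` and `alpha` are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend soft-label KL against the teacher with hard-label cross-entropy."""
    # Soft targets: KL divergence between temperature-softened distributions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Hard targets: standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```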
- IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce [71.37481473399559]
In this paper, we present IntentionQA, a benchmark to evaluate LMs' comprehension of purchase intentions in E-commerce.
IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline.
Human evaluations demonstrate the high quality and low false-negative rate of our benchmark.
arXiv Detail & Related papers (2024-06-14T16:51:21Z)
- Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model [82.93634081255942]
We propose a vision-language connector that enables MLLMs to achieve high accuracy while maintaining low cost.
We first reveal the existence of visual anchors in the Vision Transformer and propose a cost-effective search algorithm to extract them.
We introduce the Anchor Former (AcFormer), a novel vision-language connector designed to leverage the rich prior knowledge obtained from these visual anchors during pretraining (a minimal connector sketch follows this entry).
arXiv Detail & Related papers (2024-05-28T04:23:00Z)
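A minimal sketch of the general anchor-as-aggregator idea: pick a small set of patch tokens as anchors and let them cross-attend to all patch tokens, so only a few aggregated visual tokens are handed to the language model. The anchor-selection heuristic below (top CLS-attention patches) and all dimensions are assumptions for illustration, not the paper's actual search algorithm or AcFormer architecture.

```python
import torch
import torch.nn as nn

class AnchorConnector(nn.Module):
    """Aggregate ViT patch tokens into a few anchor tokens via cross-attention."""

    def __init__(self, vis_dim: int, llm_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(vis_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(vis_dim, llm_dim)  # map to the LLM embedding size

    def forward(self, patch_tokens: torch.Tensor, cls_attn: torch.Tensor,
                num_anchors: int = 16) -> torch.Tensor:
        # patch_tokens: (B, N, vis_dim); cls_attn: (B, N) attention weights from the CLS token.
        # Illustrative heuristic: treat the most CLS-attended patches as anchors.
        idx = cls_attn.topk(num_anchors, dim=-1).indices          # (B, K)
        anchors = torch.gather(
            patch_tokens, 1,
            idx.unsqueeze(-1).expand(-1, -1, patch_tokens.size(-1)),
        )                                                          # (B, K, vis_dim)
        # Anchors act as queries that aggregate information from all patch tokens.
        out, _ = self.attn(query=anchors, key=patch_tokens, value=patch_tokens)
        return self.proj(out)                                      # (B, K, llm_dim)
```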
- ItemSage: Learning Product Embeddings for Shopping Recommendations at Pinterest [60.841761065439414]
At Pinterest, we build a single set of product embeddings called ItemSage to provide relevant recommendations in all shopping use cases.
This approach has led to significant improvements in engagement and conversion metrics, while reducing both infrastructure and maintenance cost.
arXiv Detail & Related papers (2022-05-24T02:28:58Z)
- Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining [108.86502855439774]
We investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval.
We contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval.
We propose a novel model named Cross-modal contrAstive Product Transformer for instance-level prodUct REtrieval (CAPTURE).
arXiv Detail & Related papers (2021-07-30T12:11:24Z)
- A Multimodal Late Fusion Model for E-Commerce Product Classification [7.463657960984954]
In this study, we investigated a multimodal late fusion approach based on text and image modalities to categorize e-commerce products on Rakuten.
Specifically, we developed modality-specific state-of-the-art deep neural networks for each input modality and then fused them at the decision level (a minimal late-fusion sketch follows this entry).
Our team, pa_curis, won 1st place on the final leaderboard with a macro-F1 of 0.9144.
arXiv Detail & Related papers (2020-08-14T03:46:24Z)
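A minimal decision-level (late) fusion sketch in the spirit of that entry: each modality gets its own classifier, and their class probabilities are combined with a weighted average. The stand-in linear "backbones", feature sizes, and fusion weight below are placeholders, not the winning system.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LateFusionClassifier(nn.Module):
    """Combine per-modality class probabilities at the decision level."""

    def __init__(self, text_model: nn.Module, image_model: nn.Module,
                 text_weight: float = 0.5):
        super().__init__()
        self.text_model = text_model    # maps text features -> class logits
        self.image_model = image_model  # maps image features -> class logits
        self.text_weight = text_weight

    def forward(self, text_inputs: torch.Tensor, image_inputs: torch.Tensor) -> torch.Tensor:
        p_text = F.softmax(self.text_model(text_inputs), dim=-1)
        p_image = F.softmax(self.image_model(image_inputs), dim=-1)
        # Weighted average of the two per-modality probability distributions.
        return self.text_weight * p_text + (1.0 - self.text_weight) * p_image

# Usage sketch with stand-in linear heads over precomputed features:
num_classes = 10  # placeholder
fusion = LateFusionClassifier(nn.Linear(768, num_classes), nn.Linear(512, num_classes))
probs = fusion(torch.randn(4, 768), torch.randn(4, 512))  # (4, num_classes) probabilities
```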