Multi-modal Generative Models in Recommendation System
- URL: http://arxiv.org/abs/2409.10993v1
- Date: Tue, 17 Sep 2024 08:55:50 GMT
- Title: Multi-modal Generative Models in Recommendation System
- Authors: Arnau Ramisa, Rene Vidal, Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Mahesh Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci
- Abstract summary: Many recommendation systems limit user inputs to text strings or behavior signals such as clicks and purchases.
With the advent of generative AI, users have come to expect richer levels of interaction.
We argue that future recommendation systems will benefit from a multi-modal understanding of the products.
- Score: 34.45328907249946
- Abstract: Many recommendation systems limit user inputs to text strings or behavior signals such as clicks and purchases, and system outputs to a list of products sorted by relevance. With the advent of generative AI, users have come to expect richer levels of interaction. In visual search, for example, a user may provide a picture of their desired product along with a natural language modification of the content of the picture (e.g., a dress like the one shown in the picture but in red color). Moreover, users may want to better understand the recommendations they receive by visualizing how the product fits their use case, e.g., with a representation of how a garment might look on them, or how a furniture item might look in their room. Such advanced levels of interaction require recommendation systems that are able to discover both shared and complementary information about the product across modalities, and visualize the product in a realistic and informative way. However, existing systems often treat multiple modalities independently: text search is usually done by comparing the user query to product titles and descriptions, while visual search is typically done by comparing an image provided by the customer to product images. We argue that future recommendation systems will benefit from a multi-modal understanding of the products that leverages the rich information retailers have about both customers and products to come up with the best recommendations. In this chapter we review recommendation systems that use multiple data modalities simultaneously.
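The visual-search example in the abstract (a dress like the pictured one, but in red) amounts to composing an image embedding with a text modification in a shared space and ranking the catalog against the result. A minimal sketch, with random stand-in encoders in place of a trained vision-language dual encoder; every name below is hypothetical:

```python
# Composed image+text retrieval over a shared embedding space.
# The encoders here are random stand-ins; a real system would use a
# trained vision-language model (e.g., a CLIP-style dual encoder).
import numpy as np

rng = np.random.default_rng(0)
DIM = 64

def encode_image(image) -> np.ndarray:
    """Stand-in image encoder: returns a unit vector in the shared space."""
    v = rng.normal(size=DIM)
    return v / np.linalg.norm(v)

def encode_text(text: str) -> np.ndarray:
    """Stand-in text encoder: returns a unit vector in the shared space."""
    v = rng.normal(size=DIM)
    return v / np.linalg.norm(v)

def composed_query(image, modification: str, alpha: float = 0.5) -> np.ndarray:
    """Fuse the reference image with the text modification."""
    q = alpha * encode_image(image) + (1 - alpha) * encode_text(modification)
    return q / np.linalg.norm(q)

# Catalog embeddings could themselves fuse title, description, and image.
catalog = {f"product_{i}": encode_image(None) for i in range(1000)}

q = composed_query(image=None, modification="like this dress but in red")
scores = {pid: float(v @ q) for pid, v in catalog.items()}
print(sorted(scores, key=scores.get, reverse=True)[:5])
```

Additive fusion is only the simplest choice; composed-retrieval systems typically learn the combination function instead.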
Related papers
- Attention-based sequential recommendation system using multimodal data [8.110978727364397]
We propose an attention-based sequential recommendation method that employs multimodal data of items such as images, texts, and categories.
Experimental results on the Amazon datasets show that the proposed method outperforms conventional sequential recommendation systems.
arXiv Detail & Related papers (2024-05-28T08:41:05Z)
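The attention-based method above fuses per-item image, text, and category features, then attends over the user's interaction sequence. A minimal numpy sketch of that fusion-plus-attention pattern, using random stand-in features and a parameter-free attention layer (not the paper's trained model):

```python
# Attention over a multimodal interaction sequence: item vectors fuse
# image, text, and category features; next-item scores come from
# attending over the user's history.
import numpy as np

rng = np.random.default_rng(1)
DIM, N_ITEMS, SEQ = 32, 500, 8

# Hypothetical per-modality features, fused by simple averaging.
img = rng.normal(size=(N_ITEMS, DIM))
txt = rng.normal(size=(N_ITEMS, DIM))
cat = rng.normal(size=(N_ITEMS, DIM))
items = (img + txt + cat) / 3.0

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(seq):  # (SEQ, DIM) -> (SEQ, DIM), single head, no params
    attn = softmax(seq @ seq.T / np.sqrt(DIM))
    return attn @ seq

history = rng.integers(0, N_ITEMS, size=SEQ)      # user's clicked item ids
user_state = self_attention(items[history])[-1]   # last position summarizes
scores = items @ user_state                       # score every catalog item
print("top-5 next-item candidates:", np.argsort(-scores)[:5])
```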
- MMGRec: Multimodal Generative Recommendation with Transformer Model [81.61896141495144]
MMGRec aims to introduce a generative paradigm into multimodal recommendation.
We first devise a hierarchical quantization method, Graph CF-RQVAE, to assign a Rec-ID to each item from its multimodal information.
We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences.
arXiv Detail & Related papers (2024-04-25T12:11:27Z)
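The MMGRec pipeline hinges on Rec-IDs: discrete codes derived from item content that a Transformer can emit token by token instead of ranking a flat ID space. A minimal sketch of the quantization step, using two-level residual quantization with random stand-in codebooks rather than the paper's learned Graph CF-RQVAE:

```python
# Quantize each item's multimodal embedding into a short tuple of
# discrete codes, so a sequence model can *generate* item IDs.
import numpy as np

rng = np.random.default_rng(2)
DIM, N_ITEMS, K = 16, 200, 32

item_emb = rng.normal(size=(N_ITEMS, DIM))        # fused multimodal embeddings
codebooks = [rng.normal(size=(K, DIM)) for _ in range(2)]

def rec_id(x: np.ndarray) -> tuple[int, ...]:
    """Two-level residual quantization: each level encodes the residual
    left over from the previous level's nearest codeword."""
    codes, residual = [], x
    for cb in codebooks:
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return tuple(codes)

ids = [rec_id(e) for e in item_emb]
print("item 0 ->", ids[0])   # e.g. (17, 4): tokens a transformer can emit
```

A sequence model trained over such code tuples can then generate the Rec-ID of the next item a user is likely to prefer.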
- Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval [12.588713044749177]
Same-style products retrieval plays an important role in e-commerce platforms.
We propose a unified vision-language modeling method for e-commerce same-style products retrieval.
It is capable of cross-modal product-to-product retrieval, as well as style transfer and user-interactive search.
arXiv Detail & Related papers (2023-02-10T07:24:23Z)
- Talk the Walk: Synthetic Data Generation for Conversational Music Recommendation [62.019437228000776]
We present TalkWalk, which generates realistic, high-quality conversational data by leveraging the expertise encoded in widely available item collections.
We generate over one million diverse conversations in a human-collected dataset.
arXiv Detail & Related papers (2023-01-27T01:54:16Z)
- Deep Multi-View Learning for Tire Recommendation [0.0]
We present a comparative study of several state-of-the-art multi-view models applied to our industrial data.
Our study demonstrates the relevance of using multi-view learning within recommender systems.
arXiv Detail & Related papers (2022-03-23T14:43:14Z)
- Knowledge-Enhanced Hierarchical Graph Transformer Network for Multi-Behavior Recommendation [56.12499090935242]
This work proposes a Knowledge-Enhanced Hierarchical Graph Transformer Network (KHGT) to investigate multi-typed interactive patterns between users and items in recommender systems.
KHGT is built upon a graph-structured neural architecture to capture type-specific behavior characteristics.
We show that KHGT consistently outperforms many state-of-the-art recommendation methods across various evaluation settings.
arXiv Detail & Related papers (2021-10-08T09:44:00Z)
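KHGT's premise, that clicks, add-to-carts, and purchases carry different signals, can be illustrated with type-specific message passing: one transform per behavior type before aggregation. A toy numpy sketch on random data, not the paper's graph transformer:

```python
# Type-specific message passing for multi-behavior interaction data.
# Hypothetical shapes and random data throughout.
import numpy as np

rng = np.random.default_rng(3)
DIM, N_USERS, N_ITEMS = 16, 50, 80
BEHAVIORS = ["click", "cart", "purchase"]

item_emb = rng.normal(size=(N_ITEMS, DIM))
# One 0/1 interaction matrix and one transform per behavior type.
adj = {b: (rng.random((N_USERS, N_ITEMS)) < 0.05).astype(float) for b in BEHAVIORS}
W = {b: rng.normal(size=(DIM, DIM)) / np.sqrt(DIM) for b in BEHAVIORS}

def user_embeddings() -> np.ndarray:
    """Mean over each user's items per behavior, transformed by that
    behavior's matrix, then summed across behavior types."""
    out = np.zeros((N_USERS, DIM))
    for b in BEHAVIORS:
        deg = adj[b].sum(axis=1, keepdims=True).clip(min=1.0)
        out += (adj[b] @ item_emb / deg) @ W[b]
    return out

users = user_embeddings()
scores = users @ item_emb.T            # recommend by inner product
print("user 0 top-5:", np.argsort(-scores[0])[:5])
```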
- What Users Want? WARHOL: A Generative Model for Recommendation [9.195173526948125]
We argue that existing recommendation models cannot directly be used to predict the optimal combination of features that will make new products better serve the needs of the target audience.
We develop WARHOL, a product generation and recommendation architecture that takes as input past user shopping activity.
We show that WARHOL can approach the performance of state-of-the-art recommendation models, while being able to generate entirely new products that are relevant to the given user profiles.
arXiv Detail & Related papers (2021-09-02T17:15:28Z)
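WARHOL's generative framing can be sketched as mapping a user-profile vector into product-embedding space, where a downstream decoder (not shown) would render the point as images and text of a new product. A toy illustration with a random linear "generator"; nothing here reflects WARHOL's actual architecture:

```python
# Profile-conditioned product generation: user activity -> a new point
# in product-embedding space. Random data and weights are stand-ins.
import numpy as np

rng = np.random.default_rng(4)
DIM, N_ITEMS = 32, 300

item_emb = rng.normal(size=(N_ITEMS, DIM))
G = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)   # "generator" weights

history = rng.integers(0, N_ITEMS, size=10)      # past shopping activity
profile = item_emb[history].mean(axis=0)         # crude user profile
new_product = profile @ G                        # generated embedding

# Sanity check: nearest existing catalog items to the generated product.
sims = item_emb @ new_product / (np.linalg.norm(item_emb, axis=1)
                                 * np.linalg.norm(new_product))
print("closest real items:", np.argsort(-sims)[:3])
```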
- An Overview of Recommender Systems and Machine Learning in Feature Modeling and Configuration [55.67505546330206]
We give an overview of a potential new line of research: applying recommender systems and machine learning techniques to feature modeling and configuration.
We give examples of such applications and discuss future research issues.
arXiv Detail & Related papers (2021-02-12T17:21:36Z)
- Pre-training Graph Transformer with Multimodal Side Information for Recommendation [82.4194024706817]
We propose a pre-training strategy to learn item representations by considering both item side information and their relationships.
We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item.
The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction.
arXiv Detail & Related papers (2020-10-23T10:30:24Z)
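Both PMGT objectives can be written down directly as losses over an item graph. A minimal numpy sketch with random embeddings standing in for the transformer encoder's outputs; all shapes and data are hypothetical:

```python
# PMGT-style pre-training losses: (1) graph structure reconstruction,
# (2) masked node feature reconstruction. Random stand-in data.
import numpy as np

rng = np.random.default_rng(5)
N, DIM, FEAT = 60, 16, 24

emb = rng.normal(size=(N, DIM))                  # item embeddings (learned)
adj = (rng.random((N, N)) < 0.1).astype(float)   # item-item relation graph
feats = rng.normal(size=(N, FEAT))               # multimodal side features
W_dec = rng.normal(size=(DIM, FEAT)) / np.sqrt(DIM)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Objective 1: graph structure reconstruction (binary cross-entropy on edges).
p = sigmoid(emb @ emb.T)
bce = -(adj * np.log(p + 1e-9) + (1 - adj) * np.log(1 - p + 1e-9)).mean()

# Objective 2: masked node feature reconstruction (MSE on masked nodes).
mask = rng.random(N) < 0.15
mse = ((emb[mask] @ W_dec - feats[mask]) ** 2).mean()

print(f"structure loss {bce:.3f}, masked-feature loss {mse:.3f}")
```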
- Exploiting Latent Codes: Interactive Fashion Product Generation, Similar Image Retrieval, and Cross-Category Recommendation using Variational Autoencoders [0.0]
The author proposes using a Variational Autoencoder (VAE) to build an interactive fashion product application framework.
This pipeline is applicable to the booming e-commerce industry, enabling direct user interaction in specifying desired products.
arXiv Detail & Related papers (2020-09-02T13:27:30Z)
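The latent-code framing enables the interactive part: encode products into the VAE's latent space, move through it, and decode the result for the user. A toy sketch with linear stand-ins for the encoder and decoder; a real system would train the VAE on product images:

```python
# Latent-space interpolation between two products. A real VAE's encoder
# outputs a mean and variance; linear maps stand in for both nets here.
import numpy as np

rng = np.random.default_rng(6)
X_DIM, Z_DIM = 100, 8
enc = rng.normal(size=(X_DIM, Z_DIM)) / np.sqrt(X_DIM)
dec = rng.normal(size=(Z_DIM, X_DIM)) / np.sqrt(Z_DIM)

def encode(x):   # mean of q(z|x)
    return x @ enc

def decode(z):   # mean of p(x|z)
    return z @ dec

item_a, item_b = rng.normal(size=X_DIM), rng.normal(size=X_DIM)
za, zb = encode(item_a), encode(item_b)
for t in (0.0, 0.5, 1.0):           # walk the latent path between products
    blended = decode((1 - t) * za + t * zb)
    print(f"t={t:.1f} -> decoded feature norm {np.linalg.norm(blended):.2f}")
```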
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.