Multimodal Recommendation Dialog with Subjective Preference: A New
Challenge and Benchmark
- URL: http://arxiv.org/abs/2305.18212v1
- Date: Fri, 26 May 2023 08:43:46 GMT
- Title: Multimodal Recommendation Dialog with Subjective Preference: A New
Challenge and Benchmark
- Authors: Yuxing Long, Binyuan Hui, Caixia Yuan, Fei Huang, Yongbin Li, Xiaojie
Wang
- Abstract summary: This paper introduces a new dataset, SURE (Multimodal Recommendation Dialog with SUbjective PREference).
The data is built in two phases with human annotations to ensure quality and diversity.
SURE is well-annotated with subjective preferences and recommendation acts proposed by sales experts.
- Score: 38.613625892808706
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing multimodal task-oriented dialog data fails to capture the
diverse expressions of user subjective preferences and recommendation acts found in
real-life shopping scenarios. This paper introduces a new dataset, SURE
(Multimodal Recommendation Dialog with SUbjective PREference), which contains
12K shopping dialogs in complex store scenes. The data is built in two phases
with human annotations to ensure quality and diversity. SURE is well-annotated
with subjective preferences and recommendation acts proposed by sales experts.
A comprehensive analysis is given to reveal the distinguishing features of
SURE. Three benchmark tasks are then proposed on the data to evaluate the
capability of multimodal recommendation agents. Based on SURE, we propose a
baseline model, powered by a state-of-the-art multimodal model, for these
tasks.
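As a rough illustration only, a SURE-style annotated dialog turn could be organized along the lines below; the field names and label values (subjective_preference, recommendation_act, etc.) are hypothetical stand-ins, not the dataset's actual schema.

```python
# Hypothetical sketch of a SURE-style annotated dialog; field names and label
# values are illustrative, not the dataset's actual schema.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Turn:
    speaker: str                                          # "user" or "assistant"
    utterance: str                                        # natural-language text of the turn
    image_ids: List[str] = field(default_factory=list)    # items visible in the store scene
    subjective_preference: Optional[str] = None           # e.g. "prefers a casual style"
    recommendation_act: Optional[str] = None              # e.g. "recommend_by_style"

@dataclass
class Dialog:
    dialog_id: str
    scene_id: str                                         # complex store scene grounding the dialog
    turns: List[Turn] = field(default_factory=list)

example = Dialog(
    dialog_id="sure_000001",
    scene_id="store_scene_042",
    turns=[
        Turn("user", "I want something that looks more casual.", ["item_17"],
             subjective_preference="prefers casual style"),
        Turn("assistant", "How about this one? It has a relaxed fit.", ["item_23"],
             recommendation_act="recommend_by_style"),
    ],
)
print(example.turns[1].recommendation_act)
```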
Related papers
- IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification [60.38841251693781]
We propose a novel framework for robust multi-modal object re-identification (ReID).
Our framework uses Modal Prefixes and InverseNet to integrate multi-modal information with semantic guidance from inverted text.
Experiments on three multi-modal object ReID benchmarks demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2025-03-13T13:00:31Z)
- Joint Modeling in Recommendations: A Survey [46.000357352884926]
Joint modeling approaches are central to overcoming limitations by integrating diverse tasks, scenarios, modalities, and behaviors in the recommendation process.
We define the scope of joint modeling through four distinct dimensions: multi-task, multi-scenario, multi-modal, and multi-behavior modeling.
We highlight several promising avenues for future exploration in joint modeling for recommendations and provide a concise conclusion to our findings.
arXiv Detail & Related papers (2025-02-28T16:14:00Z)
- Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark [54.93461228053298]
We introduce our benchmark, Scenario-Wise Rec, which comprises 6 public datasets and 12 benchmark models, along with a training and evaluation pipeline.
We aim for this benchmark to offer researchers valuable insights from prior work, enabling the development of novel models.
arXiv Detail & Related papers (2024-12-23T08:15:34Z)
- Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation [9.506245109666907]
Multi-faceted features characterizing products and services may influence each customer on online selling platforms differently.
The common multimodal recommendation pipeline involves (i) extracting multimodal features, (ii) refining their high-level representations to suit the recommendation task, and (iii) predicting the user-item score.
This paper presents the first attempt to offer large-scale benchmarking for multimodal recommender systems, with a specific focus on multimodal extractors.
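For reference, here is a minimal sketch of that generic pipeline, assuming pre-extracted visual and textual item features, simple late fusion, and a dot-product scorer; the shapes and module names are illustrative, not the Ducho/Elliot implementation.

```python
# Minimal sketch of the generic multimodal recommendation pipeline:
# (i) extract per-item multimodal features, (ii) refine/fuse them for the
# recommendation task, (iii) predict a user-item score.
# Shapes and the fusion rule are illustrative, not Ducho/Elliot APIs.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d_vis, d_txt, d = 100, 500, 512, 384, 64

# (i) multimodal features as produced by frozen extractors (random stand-ins here)
visual_feats = rng.normal(size=(n_items, d_vis))
text_feats = rng.normal(size=(n_items, d_txt))

# (ii) refine the high-level representations into a shared space (simple late fusion)
W_vis = rng.normal(scale=0.02, size=(d_vis, d))
W_txt = rng.normal(scale=0.02, size=(d_txt, d))
item_emb = visual_feats @ W_vis + text_feats @ W_txt

# user embeddings would normally be learned from interaction data
user_emb = rng.normal(scale=0.02, size=(n_users, d))

# (iii) predict user-item scores and rank items per user
scores = user_emb @ item_emb.T                 # (n_users, n_items)
top10 = np.argsort(-scores, axis=1)[:, :10]    # top-10 recommendations per user
print(top10[0])
```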
arXiv Detail & Related papers (2024-09-24T08:29:10Z)
- Towards Bridging the Cross-modal Semantic Gap for Multi-modal Recommendation [12.306686291299146]
Multi-modal recommendation greatly enhances the performance of recommender systems.
Most existing multi-modal recommendation models exploit multimedia information propagation processes to enrich item representations.
We propose a novel framework to bridge the semantic gap between modalities and extract fine-grained multi-view semantic information.
arXiv Detail & Related papers (2024-07-07T15:56:03Z)
- BiVRec: Bidirectional View-based Multimodal Sequential Recommendation [55.87443627659778]
We propose an innovative framework, BiVRec, that jointly trains the recommendation tasks in both ID and multimodal views.
BiVRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.
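A hedged sketch of what joint training over an ID view and a multimodal view of the same task can look like; the GRU encoder, the shared next-item loss, and all module names are illustrative, not BiVRec's actual architecture.

```python
# Hedged sketch of jointly training an ID view and a multimodal view of the
# same sequential-recommendation task; illustrative only, not BiVRec itself.
import torch
import torch.nn as nn

class TwoViewRecommender(nn.Module):
    def __init__(self, n_items: int, d_mm: int, d: int = 64):
        super().__init__()
        self.id_emb = nn.Embedding(n_items, d)   # ID view: learned item embeddings
        self.mm_proj = nn.Linear(d_mm, d)        # multimodal view: projected item features
        self.user_enc = nn.GRU(d, d, batch_first=True)

    def forward(self, seq_ids, seq_mm):
        # encode the same interaction sequence under both views
        _, u_id = self.user_enc(self.id_emb(seq_ids))
        _, u_mm = self.user_enc(self.mm_proj(seq_mm))
        return u_id.squeeze(0), u_mm.squeeze(0)

model = TwoViewRecommender(n_items=1000, d_mm=128)
seq_ids = torch.randint(0, 1000, (8, 20))        # batch of 8 sequences, length 20
seq_mm = torch.randn(8, 20, 128)                 # multimodal feature per item in the sequence
target = torch.randint(0, 1000, (8,))            # next-item labels

u_id, u_mm = model(seq_ids, seq_mm)
logits_id = u_id @ model.id_emb.weight.T         # score next item under the ID view
logits_mm = u_mm @ model.id_emb.weight.T         # and under the multimodal view
loss = nn.functional.cross_entropy(logits_id, target) + \
       nn.functional.cross_entropy(logits_mm, target)
loss.backward()
```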
arXiv Detail & Related papers (2024-02-27T09:10:41Z)
- Ada-Retrieval: An Adaptive Multi-Round Retrieval Paradigm for Sequential Recommendations [50.03560306423678]
We propose Ada-Retrieval, an adaptive multi-round retrieval paradigm for recommender systems.
Ada-Retrieval iteratively refines user representations to better capture potential candidates in the full item space.
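A minimal sketch of the multi-round idea, where each round's retrieved items feed back into the user representation; the refinement rule below is a simple stand-in, not Ada-Retrieval's learned adapters.

```python
# Hedged sketch of a multi-round retrieval loop: each round retrieves a batch
# of candidates, then the user representation is refined with what was already
# retrieved so later rounds cover other regions of the item space.
# Purely illustrative; not Ada-Retrieval's actual components.
import numpy as np

rng = np.random.default_rng(0)
n_items, d, rounds, k_per_round = 10_000, 64, 3, 50

item_emb = rng.normal(size=(n_items, d))
user_rep = rng.normal(size=(d,))

retrieved = []
for r in range(rounds):
    scores = item_emb @ user_rep
    scores[retrieved] = -np.inf                  # never re-retrieve earlier candidates
    top_k = np.argsort(-scores)[:k_per_round]
    retrieved.extend(top_k.tolist())
    # refine the user representation with the centroid of this round's results
    user_rep = 0.7 * user_rep + 0.3 * item_emb[top_k].mean(axis=0)

print(len(retrieved), "candidates over", rounds, "rounds")
```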
arXiv Detail & Related papers (2024-01-12T15:26:40Z)
- DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
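A rough sketch of the prompt-tuning pattern, in which a small trainable context generator emits prompt vectors that are prepended to a frozen encoder's input; the encoder below is a generic stand-in, not CLIP's actual interface.

```python
# Hedged sketch of parameter-efficient prompt tuning for dialog retrieval:
# a small trainable context generator maps dialog-context features to prompt
# vectors that are prepended to a frozen encoder's input sequence.
# The frozen encoder is a stand-in, not CLIP.
import torch
import torch.nn as nn

d_model, n_prompts = 512, 8

frozen_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2,
)
for p in frozen_encoder.parameters():
    p.requires_grad = False                        # pre-trained backbone stays fixed

context_generator = nn.Sequential(                 # the only trainable part
    nn.Linear(d_model, d_model),
    nn.ReLU(),
    nn.Linear(d_model, n_prompts * d_model),
)

dialog_context = torch.randn(4, d_model)           # pooled multi-modal context features
token_embs = torch.randn(4, 32, d_model)           # embedded dialog / candidate tokens

prompts = context_generator(dialog_context).view(4, n_prompts, d_model)
encoded = frozen_encoder(torch.cat([prompts, token_embs], dim=1))
query = encoded[:, 0]                              # use the first prompt position as the query
print(query.shape)                                 # torch.Size([4, 512])
```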
arXiv Detail & Related papers (2024-01-02T07:40:12Z)
- Application of frozen large-scale models to multimodal task-oriented dialogue [0.0]
We use the existing Large Language Models ENhanced to See (LENS) Framework to test the feasibility of multimodal task-oriented dialogues.
The LENS Framework has been proposed as a method to solve computer vision tasks without additional training and with fixed parameters of pre-trained models.
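Broadly, this style of approach composes the textual outputs of frozen vision modules into a prompt for a frozen language model; the functions in the sketch below are placeholders, not the LENS Framework's actual components.

```python
# Hedged sketch of the "frozen models" idea: frozen vision modules describe the
# image in text, and a frozen language model reasons over that text plus the
# dialog. Every function here is a placeholder, not a LENS component.
from typing import List

def frozen_tagger(image_path: str) -> List[str]:
    # stand-in for a frozen image-tagging model
    return ["sneaker", "white", "low-top"]

def frozen_captioner(image_path: str) -> str:
    # stand-in for a frozen captioning model
    return "A pair of white low-top sneakers on a store shelf."

def frozen_llm(prompt: str) -> str:
    # stand-in for a frozen large language model
    return "These white low-top sneakers match the casual style you asked for."

def answer(image_path: str, dialog_history: str, user_turn: str) -> str:
    # no module is fine-tuned; only the prompt composition changes per task
    prompt = (
        f"Image tags: {', '.join(frozen_tagger(image_path))}\n"
        f"Image caption: {frozen_captioner(image_path)}\n"
        f"Dialog so far: {dialog_history}\n"
        f"User: {user_turn}\nAssistant:"
    )
    return frozen_llm(prompt)

print(answer("scene.jpg", "User asked for casual shoes.", "Which of these fits me best?"))
```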
arXiv Detail & Related papers (2023-10-02T01:42:28Z)
- Large Language Models as Zero-Shot Conversational Recommenders [52.57230221644014]
We present empirical studies on conversational recommendation tasks using representative large language models in a zero-shot setting.
We construct a new dataset of recommendation-related conversations by scraping a popular discussion website.
We observe that even without fine-tuning, large language models can outperform existing fine-tuned conversational recommendation models.
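In a zero-shot setting, the conversation is simply serialized into a prompt and the model is asked to recommend; a minimal sketch follows, where call_llm is a placeholder rather than the paper's exact protocol.

```python
# Minimal sketch of zero-shot conversational recommendation: the dialog is
# serialized into a prompt and a frozen LLM is asked to name items.
# call_llm is a placeholder, not a real API or the paper's exact setup.
from typing import List

def call_llm(prompt: str) -> str:
    # placeholder for any chat/completions endpoint or local model
    return "1. The Shawshank Redemption\n2. The Green Mile\n3. Dead Poets Society"

def recommend(dialog: List[str], n: int = 3) -> List[str]:
    history = "\n".join(dialog)
    prompt = (
        "You are a recommender. Based on the conversation below, "
        f"suggest {n} items the user would likely enjoy, one per line.\n\n{history}"
    )
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

print(recommend(["User: I loved Forrest Gump, something similarly moving?",
                 "Assistant: Do you prefer dramas or comedies?",
                 "User: Dramas."]))
```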
arXiv Detail & Related papers (2023-08-19T15:29:45Z)
- Read, Look or Listen? What's Needed for Solving a Multimodal Dataset [7.0430001782867]
We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.
We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality.
We analyze MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, but also with auditory speaker identification.
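The two-step idea, i.e. label a small seed by hand with the modality each instance needs, then train a lightweight classifier to label the rest, could be sketched as follows; the features and labels are hypothetical stand-ins.

```python
# Hedged sketch of the two-step analysis: (1) humans annotate a small seed of
# instances with the modality required to answer them, (2) a lightweight
# classifier trained on that seed labels the rest of the dataset.
# Features and labels are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
MODALITIES = ["text", "image", "audio"]

# step 1: small human-annotated seed (features could be question/answer embeddings)
seed_X = rng.normal(size=(200, 32))
seed_y = rng.integers(0, len(MODALITIES), size=200)

clf = LogisticRegression(max_iter=1000).fit(seed_X, seed_y)

# step 2: map every remaining instance to its (predicted) required modality
rest_X = rng.normal(size=(5000, 32))
pred = clf.predict(rest_X)
for i, m in enumerate(MODALITIES):
    print(m, f"{(pred == i).mean():.1%}")
```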
arXiv Detail & Related papers (2023-07-06T08:02:45Z)
- DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization [127.714919036388]
DIONYSUS is a pre-trained encoder-decoder model for summarizing dialogues in any new domain.
Our experiments show that DIONYSUS outperforms existing methods on six datasets.
arXiv Detail & Related papers (2022-12-20T06:21:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.