Related papers: Tell Me What Is Good About This Property: Leveraging Reviews For Segment-Personalized Image Collection Summarization

URL: http://arxiv.org/abs/2310.19743v1
Date: Mon, 30 Oct 2023 17:06:49 GMT
Title: Tell Me What Is Good About This Property: Leveraging Reviews For Segment-Personalized Image Collection Summarization
Authors: Monika Wysoczanska, Moran Beladev, Karen Lastmann Assaraf, Fengjun Wang, Ofri Kleinfeld, Gil Amsalem, Hadas Harush Boker
Abstract summary: We consider user intentions in the summarization of property visuals by analyzing property reviews. By incorporating the insights from reviews in our visual summaries, we enhance the summaries by presenting the relevant content to a user. Our experiments, including human perceptual studies, demonstrate the superiority of our cross-modal approach.
Score: 3.063926257586959
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Image collection summarization techniques aim to present a compact representation of an image gallery through a carefully selected subset of images that captures its semantic content. When it comes to web content, however, the ideal selection can vary based on the user's specific intentions and preferences. This is particularly relevant at Booking.com, where presenting properties and their visual summaries that align with users' expectations is crucial. To address this challenge, we consider user intentions in the summarization of property visuals by analyzing property reviews and extracting the most significant aspects mentioned by users. By incorporating the insights from reviews into our visual summaries, we enhance the summaries by presenting the relevant content to a user. Moreover, we achieve this without the need for costly annotations. Our experiments, including human perceptual studies, demonstrate the superiority of our cross-modal approach, which we call CrossSummarizer, over the no-personalization and image-based clustering baselines.

Related papers

RAGAR: Retrieval Augment Personalized Image Generation Guided by Recommendation [9.31199434211423]
We propose Retrieval Augment Personalized Image GenerAtion guided by Recommendation (RAGAR). Our approach uses a retrieval mechanism to assign different weights to historical items according to their similarities to the reference item. RAGAR achieves significant improvements in both personalization and semantic metrics compared to five baselines.
arXiv Detail & Related papers (2025-05-03T02:20:30Z)
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning [77.2852342808769]
In this paper, we introduce a detailed caption benchmark, termed as CompreCap, to evaluate the visual context from a directed scene graph view. We first manually segment the image into semantically meaningful regions according to common-object vocabulary, while also distinguishing attributes of objects within all those regions. Then directional relation labels of these objects are annotated to compose a directed scene graph that can well encode rich compositional information of the image.
arXiv Detail & Related papers (2024-12-11T18:37:42Z)
Enhancing Historical Image Retrieval with Compositional Cues [3.2276097734075426]
We introduce a crucial factor from computational aesthetics, namely image composition, into this topic. By explicitly integrating composition-related information extracted by CNN into the designed retrieval model, our method considers both the image's composition rules and semantic information.
arXiv Detail & Related papers (2024-03-21T10:51:19Z)
Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models [53.337728969143086]
Recommendation systems harness user-item interactions like clicks and reviews to learn their representations. Previous studies improve recommendation accuracy and interpretability by modeling user preferences across various aspects and intents. We introduce a chain-based prompting approach to uncover semantic aspect-aware interactions.
arXiv Detail & Related papers (2023-12-26T15:44:09Z)
Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context. We introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals that is an order of magnitude larger than existing relevant datasets and where rich semantic ground-truth annotations are readily available. We derive a simple yet efficient, personalized text-to-image baseline that does not require test-time fine-tuning for each subject and which sets quantitatively and in human trials a new SoTA.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
Collaborative Group: Composed Image Retrieval via Consensus Learning from Noisy Annotations [67.92679668612858]
We propose the Consensus Network (Css-Net), inspired by the psychological concept that groups outperform individuals. Css-Net comprises two core components: (1) a consensus module with four diverse compositors, each generating distinct image-text embeddings; and (2) a Kullback-Leibler divergence loss that encourages learning of inter-compositor interactions. On benchmark datasets, particularly FashionIQ, Css-Net demonstrates marked improvements. Notably, it achieves significant recall gains, with a 2.77% increase in R@10 and 6.67% boost in R@50, underscoring its
arXiv Detail & Related papers (2023-06-03T11:50:44Z)
The Elements of Visual Art Recommendation: Learning Latent Semantic Representations of Paintings [7.79230326339002]
Artwork recommendation is challenging because it requires understanding how users interact with highly subjective content. In this paper, we focus on efficiently capturing the elements (i.e., latent semantic relationships) of visual art for personalized recommendation.
arXiv Detail & Related papers (2023-02-28T18:17:36Z)
Can you recommend content to creatives instead of final consumers? A RecSys based on user's preferred visual styles [69.69160476215895]
This report is an extension of the paper "Learning Users' Preferred Visual Styles in an Image Marketplace", presented at ACM RecSys '22. We design a RecSys that learns visual style preferences transversal to the semantics of the projects users work on.
arXiv Detail & Related papers (2022-08-23T12:11:28Z)
FaIRCoP: Facial Image Retrieval using Contrastive Personalization [43.293482565385055]
Retrieving facial images from attributes plays a vital role in various systems such as face recognition and suspect identification. Existing methods do so by comparing specific characteristics from the user's mental image against the suggested images. We propose a method that uses the user's feedback to label images as either similar or dissimilar to the target image.
arXiv Detail & Related papers (2022-05-28T09:52:09Z)
Composition and Style Attributes Guided Image Aesthetic Assessment [66.60253358722538]
We propose a method for the automatic prediction of the aesthetics of an image. The proposed network includes: a pre-trained network for semantic feature extraction (the Backbone); a Multi Layer Perceptron (MLP) network that relies on the Backbone features for the prediction of image attributes (the AttributeNet). Given an image, the proposed multi-network is able to predict style and composition attributes, and the aesthetic score distribution.
arXiv Detail & Related papers (2021-11-08T17:16:38Z)
Exploring Set Similarity for Dense Self-supervised Representation Learning [96.35286140203407]
We propose to explore set similarity (SetSim) for dense self-supervised representation learning. We generalize pixel-wise similarity learning to a set-wise one to improve robustness, because sets contain more semantic and structural information. Specifically, by resorting to attentional features of views, we establish corresponding sets, thus filtering out noisy backgrounds that may cause incorrect correspondences.
arXiv Detail & Related papers (2021-07-19T09:38:27Z)
Content and Context Features for Scene Image Representation [16.252523139552174]
We propose new techniques to compute content features and context features, and then fuse them together. For content features, we design multi-scale deep features based on background and foreground information in images. For context features, we use annotations of similar images available on the web to design a codebook of filter words.
arXiv Detail & Related papers (2020-06-05T03:19:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.