Tell Me What Is Good About This Property: Leveraging Reviews For
Segment-Personalized Image Collection Summarization
- URL: http://arxiv.org/abs/2310.19743v1
- Date: Mon, 30 Oct 2023 17:06:49 GMT
- Title: Tell Me What Is Good About This Property: Leveraging Reviews For
Segment-Personalized Image Collection Summarization
- Authors: Monika Wysoczanska, Moran Beladev, Karen Lastmann Assaraf, Fengjun
Wang, Ofri Kleinfeld, Gil Amsalem, Hadas Harush Boker
- Abstract summary: We consider user intentions in the summarization of property visuals by analyzing property reviews.
By incorporating the insights from reviews in our visual summaries, we enhance the summaries by presenting the relevant content to a user.
Our experiments, including human perceptual studies, demonstrate the superiority of our cross-modal approach.
- Score: 3.063926257586959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image collection summarization techniques aim to present a compact
representation of an image gallery through a carefully selected subset of
images that captures its semantic content. When it comes to web content,
however, the ideal selection can vary based on the user's specific intentions
and preferences. This is particularly relevant at Booking.com, where presenting
properties and their visual summaries that align with users' expectations is
crucial. To address this challenge, we consider user intentions in the
summarization of property visuals by analyzing property reviews and extracting
the most significant aspects mentioned by users. By incorporating the insights
from reviews in our visual summaries, we enhance the summaries by presenting
the relevant content to a user. Moreover, we achieve it without the need for
costly annotations. Our experiments, including human perceptual studies,
demonstrate the superiority of our cross-modal approach, which we coin as
CrossSummarizer, over the no-personalization and image-based clustering
baselines.
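The paper's implementation is not reproduced here, but the cross-modal idea can be illustrated with a small sketch: mine aspect phrases from guest reviews, embed aspects and gallery images in a joint text-image space, and pick the best-matching image per aspect. The sketch below assumes an off-the-shelf CLIP model from Hugging Face transformers; the aspect strings and file names are placeholders, and the actual CrossSummarizer pipeline may differ substantially.

```python
# Hypothetical sketch of review-driven image selection using a
# CLIP-style joint embedding; NOT the paper's exact CrossSummarizer.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Aspects the paper mines automatically from reviews;
# hard-coded placeholders here.
aspects = ["great breakfast", "clean spacious room", "sea view"]
# Placeholder file names standing in for a property's image gallery.
images = [Image.open(p) for p in ["room.jpg", "breakfast.jpg", "pool.jpg"]]

inputs = processor(text=aspects, images=images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_image has shape (n_images, n_aspects); pick, for each
# review aspect, the gallery image that depicts it best.
best = out.logits_per_image.argmax(dim=0)
for aspect, idx in zip(aspects, best.tolist()):
    print(f"{aspect!r} -> image {idx}")
```

Images selected this way can then be ordered by how often their aspect is mentioned in reviews to form the segment-personalized summary.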
Related papers
- Enhancing Historical Image Retrieval with Compositional Cues [3.2276097734075426]
We introduce a crucial factor from computational aesthetics, namely image composition, into this topic.
By explicitly integrating composition-related information extracted by CNN into the designed retrieval model, our method considers both the image's composition rules and semantic information.
arXiv Detail & Related papers (2024-03-21T10:51:19Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that take a single image of an individual and ground the generation process on it, along with text describing the desired visual context.
We introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets, and rich semantic ground-truth annotations are readily available.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and that sets a new SoTA both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- Collaborative Group: Composed Image Retrieval via Consensus Learning from Noisy Annotations [67.92679668612858]
We propose the Consensus Network (Css-Net), inspired by the psychological concept that groups outperform individuals.
Css-Net comprises two core components: (1) a consensus module with four diverse compositors, each generating distinct image-text embeddings; and (2) a Kullback-Leibler divergence loss that encourages learning of inter-compositor interactions.
On benchmark datasets, particularly FashionIQ, Css-Net demonstrates marked improvements, achieving significant recall gains of 2.77% in R@10 and 6.67% in R@50, underscoring its effectiveness; a toy sketch of the consensus idea follows this entry.
arXiv Detail & Related papers (2023-06-03T11:50:44Z)
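As a toy illustration of the consensus idea summarized above, the snippet below pulls each compositor head's distribution over candidate images toward the heads' mean distribution with a KL term. This is a loose sketch, not Css-Net's actual architecture or loss; the head count and tensor shapes are arbitrary.

```python
# Toy consensus-style KL regularizer over several "compositor" heads,
# loosely inspired by the Css-Net summary; details are assumptions.
import torch
import torch.nn.functional as F

def consensus_kl(logits_per_head):
    """KL of each head's distribution against the heads' mean."""
    probs = [F.softmax(l, dim=-1) for l in logits_per_head]
    mean = torch.stack(probs).mean(dim=0)
    loss = 0.0
    for p in probs:
        # F.kl_div expects log-probabilities as input, probabilities as target.
        loss = loss + F.kl_div(p.log(), mean, reduction="batchmean")
    return loss / len(probs)

# Four compositors scoring 8 candidate images for a batch of 2 queries.
heads = [torch.randn(2, 8) for _ in range(4)]
print(consensus_kl(heads))
```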
- The Elements of Visual Art Recommendation: Learning Latent Semantic Representations of Paintings [7.79230326339002]
Artwork recommendation is challenging because it requires understanding how users interact with highly subjective content.
In this paper, we focus on efficiently capturing the elements (i.e., latent semantic relationships) of visual art for personalized recommendation.
arXiv Detail & Related papers (2023-02-28T18:17:36Z)
- Can you recommend content to creatives instead of final consumers? A RecSys based on user's preferred visual styles [69.69160476215895]
This report is an extension of the paper "Learning Users' Preferred Visual Styles in an Image Marketplace", presented at ACM RecSys '22.
We design a RecSys that ties users' preferred visual styles to the semantics of the projects they work on.
arXiv Detail & Related papers (2022-08-23T12:11:28Z)
- FaIRCoP: Facial Image Retrieval using Contrastive Personalization [43.293482565385055]
Retrieving facial images from attributes plays a vital role in various systems such as face recognition and suspect identification.
Existing methods do so by comparing specific characteristics from the user's mental image against the suggested images.
We propose a method that uses the user's feedback to label images as either similar or dissimilar to the target image; a toy sketch of such feedback-driven re-ranking follows this entry.
arXiv Detail & Related papers (2022-05-28T09:52:09Z)
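As a rough illustration of feedback-driven retrieval of this kind (not FaIRCoP's actual algorithm), a Rocchio-style update can move a query embedding toward images the user labels similar and away from those labeled dissimilar before re-ranking; the embeddings and weights below are made up.

```python
# Hypothetical feedback loop: similar/dissimilar labels nudge a query
# embedding before re-ranking; NOT FaIRCoP's actual update rule.
import numpy as np

def refine_query(query, similar, dissimilar, alpha=0.5, beta=0.3):
    """Rocchio-style update: toward liked images, away from disliked."""
    q = query.copy()
    if len(similar):
        q += alpha * np.mean(similar, axis=0)
    if len(dissimilar):
        q -= beta * np.mean(dissimilar, axis=0)
    return q / np.linalg.norm(q)

def rank(gallery, query):
    """Cosine ranking; gallery rows are assumed L2-normalized."""
    return np.argsort(gallery @ query)[::-1]

rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
query = refine_query(gallery[0], similar=gallery[:3],
                     dissimilar=gallery[50:52])
print(rank(gallery, query)[:5])  # indices of the top-5 candidates
```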
- Semantic-Aware Generation for Self-Supervised Visual Representation Learning [116.5814634936371]
We advocate for Semantic-aware Generation (SaGe) to facilitate richer semantics rather than details to be preserved in the generated image.
SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations.
We execute SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks including nearest neighbor test, linear classification, and fine-scaled image recognition.
arXiv Detail & Related papers (2021-11-25T16:46:13Z)
- Composition and Style Attributes Guided Image Aesthetic Assessment [66.60253358722538]
We propose a method for the automatic prediction of the aesthetics of an image.
The proposed network includes a pre-trained network for semantic feature extraction (the Backbone) and a Multi-Layer Perceptron (MLP) network that relies on the Backbone features to predict image attributes (the AttributeNet).
Given an image, the proposed multi-network predicts style and composition attributes and an aesthetic score distribution; a minimal sketch of this layout follows this entry.
arXiv Detail & Related papers (2021-11-08T17:16:38Z)
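A minimal PyTorch sketch of the layout described above: a backbone for semantic features, an MLP "AttributeNet" for style/composition attributes, and a head producing an aesthetic score distribution. The ResNet-18 backbone, layer sizes, and attribute/bin counts are assumptions, not the paper's configuration.

```python
# Minimal multi-head network in the spirit of the summary above;
# the backbone choice and all sizes are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class AestheticNet(nn.Module):
    def __init__(self, n_attributes=14, n_score_bins=10):
        super().__init__()
        backbone = resnet18(weights=None)
        # Drop the classification layer; keep the 512-d pooled features.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        self.attribute_net = nn.Sequential(      # the "AttributeNet" MLP
            nn.Flatten(), nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, n_attributes))
        self.score_head = nn.Sequential(         # aesthetic score distribution
            nn.Flatten(), nn.Linear(512, n_score_bins), nn.Softmax(dim=-1))

    def forward(self, x):
        feats = self.backbone(x)
        return self.attribute_net(feats), self.score_head(feats)

attrs, scores = AestheticNet()(torch.randn(2, 3, 224, 224))
print(attrs.shape, scores.shape)  # torch.Size([2, 14]) torch.Size([2, 10])
```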
- Exploring Set Similarity for Dense Self-supervised Representation Learning [96.35286140203407]
We propose to explore set similarity (SetSim) for dense self-supervised representation learning.
We generalize pixel-wise similarity learning to set-wise one to improve the robustness because sets contain more semantic and structure information.
Specifically, by resorting to attentional features of views, we establish corresponding sets, thus filtering out noisy backgrounds that may cause incorrect correspondences; a toy sketch of set-wise matching follows this entry.
arXiv Detail & Related papers (2021-07-19T09:38:27Z)
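A toy sketch of such set-wise matching: a saliency score (plain feature norms here, standing in for the paper's attentional features) selects spatial positions from each view, and the two sets are compared with a Chamfer-style best-match cosine similarity. These details are assumptions, not SetSim's implementation.

```python
# Toy set-wise similarity between two augmented views; saliency and
# matching details are stand-ins, not SetSim's actual method.
import torch
import torch.nn.functional as F

def salient_set(feat_map, keep=16):
    """feat_map: (C, H, W); keep the `keep` most salient positions."""
    c, h, w = feat_map.shape
    feats = feat_map.reshape(c, h * w).T           # (HW, C)
    saliency = feats.norm(dim=1)                   # norm as a crude saliency
    idx = saliency.topk(keep).indices
    return F.normalize(feats[idx], dim=1)          # (keep, C), unit norm

def set_similarity(set_a, set_b):
    """Mean over set_a of its best cosine match in set_b (Chamfer-like)."""
    sims = set_a @ set_b.T                         # pairwise cosine sims
    return sims.max(dim=1).values.mean()

view1, view2 = torch.randn(256, 14, 14), torch.randn(256, 14, 14)
print(set_similarity(salient_set(view1), salient_set(view2)))
```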
- Content and Context Features for Scene Image Representation [16.252523139552174]
We propose new techniques to compute content features and context features, and then fuse them together.
For content features, we design multi-scale deep features based on background and foreground information in images.
For context features, we use annotations of similar images available on the web to design filter words (a codebook); a rough sketch of such content-context fusion follows this entry.
arXiv Detail & Related papers (2020-06-05T03:19:13Z)
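To make the fusion concrete, here is a rough sketch with a made-up codebook and annotation source: a bag-of-filter-words context vector, built from annotations of visually similar web images, is concatenated with a deep content feature.

```python
# Rough sketch of content+context fusion; the codebook, annotations,
# and feature sizes are invented for illustration.
import numpy as np

codebook = ["beach", "mountain", "kitchen", "street", "forest"]  # filter words

def context_vector(annotations):
    """Bag-of-filter-words over annotations of similar web images."""
    counts = np.array([sum(word in text for text in annotations)
                       for word in codebook], dtype=float)
    return counts / (counts.sum() or 1.0)

def fuse(content_feat, annotations):
    """Concatenate deep content features with the context vector."""
    return np.concatenate([content_feat, context_vector(annotations)])

content = np.random.default_rng(0).normal(size=128)  # stand-in deep feature
fused = fuse(content, ["sunny beach resort", "beach with palm trees"])
print(fused.shape)  # (133,)
```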
This list is automatically generated from the titles and abstracts of the papers in this site.