End-to-End Image-Based Fashion Recommendation
- URL: http://arxiv.org/abs/2205.02923v1
- Date: Thu, 5 May 2022 21:14:42 GMT
- Title: End-to-End Image-Based Fashion Recommendation
- Authors: Shereen Elsayed, Lukas Brinkmeyer and Lars Schmidt-Thieme
- Abstract summary: In fashion-based recommendation settings, incorporating item image features is considered a crucial factor.
We propose a simple yet effective attribute-aware model that incorporates image features for better item representation learning.
Experiments on two image-based real-world recommender systems datasets show that the proposed model significantly outperforms all state-of-the-art image-based models.
- Score: 5.210197476419621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In fashion-based recommendation settings, incorporating item image
features is considered a crucial factor, and it has yielded significant
improvements for many traditional models, including but not limited to matrix
factorization, auto-encoders, and nearest neighbor models. While there are
numerous image-based recommender approaches that utilize dedicated deep neural
networks, comparisons to attribute-aware models are often disregarded despite
their ability to be easily extended to leverage items' image features. In this
paper, we propose a simple yet effective attribute-aware model that
incorporates image features for better item representation learning in item
recommendation tasks. The proposed model utilizes items' image features
extracted by a calibrated ResNet50 component. We present an ablation study
comparing three different techniques for incorporating the image features into
the recommender system component, which can seamlessly leverage any available
item attributes. Experiments on two image-based real-world recommender
system datasets show that the proposed model significantly outperforms all
state-of-the-art image-based models.
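The abstract does not spell out the three fusion techniques or the exact recommender component, so the following is only a minimal sketch of the general recipe under stated assumptions: pool image features from a pretrained ResNet50, combine them with item attribute embeddings (concatenation is shown as one plausible variant), and score user-item pairs with a dot product. All class names, dimensions, and the fusion choice are illustrative, not the authors' implementation.

```python
# Minimal sketch, not the paper's implementation: names, dimensions, and the
# concatenation-based fusion are assumptions made for illustration only.
import torch
import torch.nn as nn
import torchvision.models as models


class ImageAttributeItemTower(nn.Module):
    """Builds an item embedding from pooled ResNet50 image features and item attributes."""

    def __init__(self, num_attrs: int, attr_dim: int = 64, embed_dim: int = 128):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the classification head; keep the 2048-d globally pooled feature.
        self.image_encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.attr_embedding = nn.Embedding(num_attrs, attr_dim)
        # One possible fusion technique: concatenate image and attribute features.
        self.fuse = nn.Linear(2048 + attr_dim, embed_dim)

    def forward(self, images: torch.Tensor, attr_ids: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(images).flatten(1)         # (B, 2048)
        attr_feat = self.attr_embedding(attr_ids).mean(dim=1)    # (B, attr_dim)
        return self.fuse(torch.cat([img_feat, attr_feat], dim=-1))


class SimpleImageRecommender(nn.Module):
    """Scores user-item pairs by a dot product of user and fused item embeddings."""

    def __init__(self, num_users: int, num_attrs: int, embed_dim: int = 128):
        super().__init__()
        self.user_embedding = nn.Embedding(num_users, embed_dim)
        self.item_tower = ImageAttributeItemTower(num_attrs, embed_dim=embed_dim)

    def forward(self, user_ids, item_images, item_attr_ids):
        user_vec = self.user_embedding(user_ids)                 # (B, embed_dim)
        item_vec = self.item_tower(item_images, item_attr_ids)   # (B, embed_dim)
        return (user_vec * item_vec).sum(dim=-1)                 # (B,) preference scores
```

Such a model would typically be trained with a pairwise ranking loss (e.g., BPR) over observed and sampled negative items; the other fusion variants from the ablation could replace the concatenation layer without changing the rest of the scaffold.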
Related papers
- Generating Multi-Image Synthetic Data for Text-to-Image Customization [48.59231755159313]
Customization of text-to-image models enables users to insert custom concepts and generate the concepts in unseen settings.
Existing methods either rely on costly test-time optimization or train encoders on single-image training datasets without multi-image supervision.
We propose a simple approach that addresses both limitations.
arXiv Detail & Related papers (2025-02-03T18:59:41Z)
- Personalized Fashion Recommendation with Image Attributes and Aesthetics Assessment [15.423307815155534]
We aim to provide more accurate personalized fashion recommendations by converting available information, especially images, into two attribute graphs.
Compared with previous methods that treat image and text as two separate components, the proposed method combines image and text information to create a richer attribute graph.
Preliminary experiments on the IQON3000 dataset have shown that the proposed method achieves competitive accuracy compared with baselines.
arXiv Detail & Related papers (2025-01-06T15:31:10Z)
- FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models [112.94440113631897]
Current methods attempt to distill identity and style from source images.
"style" is a broad concept that includes texture, color, and artistic elements, but does not cover other important attributes such as lighting and dynamics.
We formulate a more effective approach to decompose the aesthetics of a picture into specific visual attributes, allowing users to apply characteristics such as lighting, texture, and dynamics from different images.
arXiv Detail & Related papers (2024-12-10T17:02:58Z)
- Multi-View Photometric Stereo Revisited [100.97116470055273]
Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images.
We present a simple, practical approach to MVPS, which works well for isotropic as well as other object material types such as anisotropic and glossy.
The proposed approach shows state-of-the-art results when tested extensively on several benchmark datasets.
arXiv Detail & Related papers (2022-10-14T09:46:15Z)
- Composition and Style Attributes Guided Image Aesthetic Assessment [66.60253358722538]
We propose a method for the automatic prediction of the aesthetics of an image.
The proposed network includes: a pre-trained network for semantic feature extraction (the Backbone), and a Multi-Layer Perceptron (MLP) network that relies on the Backbone features to predict image attributes (the AttributeNet).
Given an image, the proposed multi-network is able to predict: style and composition attributes, and aesthetic score distribution.
arXiv Detail & Related papers (2021-11-08T17:16:38Z)
- Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
- Inverting Adversarially Robust Networks for Image Synthesis [37.927552662984034]
We propose the use of robust representations as a perceptual primitive for feature inversion models.
We empirically show that adopting robust representations as an image prior significantly improves the reconstruction accuracy of CNN-based feature inversion models.
Following these findings, we propose an encoding-decoding network based on robust representations and show its advantages for applications such as anomaly detection, style transfer and image denoising.
arXiv Detail & Related papers (2021-06-13T05:51:00Z)
- Apparel Recommender System based on Bilateral image shape features [0.0]
This study proposes a novel probabilistic model that integrates two convolutional neural networks (CNNs) into a recommender system.
For apparel goods, two CNNs trained on the image shape features of users and items are combined, and the latent variables of users and items are optimized.
Our model predicts outcomes more accurately than other recommender models.
arXiv Detail & Related papers (2021-05-04T14:48:38Z)
- Adaptive Compact Attention For Few-shot Video-to-video Translation [13.535988102579918]
We introduce a novel adaptive compact attention mechanism to efficiently extract contextual features jointly from multiple reference images.
Our core idea is to extract compact basis sets from all the reference images as higher-level representations.
We extensively evaluate our method on a large-scale talking-head video dataset and a human dancing dataset.
arXiv Detail & Related papers (2020-11-30T11:19:12Z)
- Saliency-driven Class Impressions for Feature Visualization of Deep Neural Networks [55.11806035788036]
It is advantageous to visualize the features considered to be essential for classification.
Existing visualization methods generate high-confidence images consisting of both background and foreground features.
In this work, we propose a saliency-driven approach to visualize discriminative features that are considered most important for a given task.
arXiv Detail & Related papers (2020-07-31T06:11:06Z)