End-to-End Image-Based Fashion Recommendation
- URL: http://arxiv.org/abs/2205.02923v1
- Date: Thu, 5 May 2022 21:14:42 GMT
- Title: End-to-End Image-Based Fashion Recommendation
- Authors: Shereen Elsayed, Lukas Brinkmeyer and Lars Schmidt-Thieme
- Abstract summary: In fashion-based recommendation settings, incorporating the item image features is considered a crucial factor.
We propose a simple yet effective attribute-aware model that incorporates image features for better item representation learning.
Experiments on two image-based real-world recommender systems datasets show that the proposed model significantly outperforms all state-of-the-art image-based models.
- Score: 5.210197476419621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In fashion-based recommendation settings, incorporating the item image
features is considered a crucial factor, and it has shown significant
improvements to many traditional models, including but not limited to matrix
factorization, auto-encoders, and nearest neighbor models. While there are
numerous image-based recommender approaches that utilize dedicated deep neural
networks, comparisons to attribute-aware models are often disregarded despite
their ability to be easily extended to leverage items' image features. In this
paper, we propose a simple yet effective attribute-aware model that
incorporates image features for better item representation learning in item
recommendation tasks. The proposed model utilizes items' image features
extracted by a calibrated ResNet50 component. We present an ablation study to
compare incorporating the image features using three different techniques into
the recommender system component that can seamlessly leverage any available
items' attributes. Experiments on two image-based real-world recommender
systems datasets show that the proposed model significantly outperforms all
state-of-the-art image-based models.
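As a rough illustration of the idea in the abstract, the sketch below shows one way an attribute-aware scorer could fold a precomputed image feature vector into an item representation. It is not the authors' architecture. The fusion by concatenation and linear projection, the dot-product scoring, the dimensions, and all names (`project`, `fuse_item`, `score`) are assumptions made for illustration. The image features are random placeholders standing in for ResNet50 outputs.

```python
import random

def project(vec, weights):
    """Linear projection; `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def fuse_item(item_embedding, image_features, weights):
    """Concatenate the item embedding with its image features, then project
    the result back to the embedding size (hypothetical fusion choice)."""
    return project(item_embedding + image_features, weights)

def score(user_embedding, item_vector):
    """Dot product between a user embedding and a fused item vector."""
    return sum(u * v for u, v in zip(user_embedding, item_vector))

rng = random.Random(0)
EMB_DIM, IMG_DIM = 4, 8  # toy sizes; real image features are far larger

def random_vector(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Untrained placeholder parameters; a real model would learn all of these.
user = random_vector(EMB_DIM)
items = {
    name: (random_vector(EMB_DIM), random_vector(IMG_DIM))
    for name in ("shirt", "dress", "boots")
}
weights = [random_vector(EMB_DIM + IMG_DIM) for _ in range(EMB_DIM)]

scores = {
    name: score(user, fuse_item(emb, img, weights))
    for name, (emb, img) in items.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In a trained system the embeddings and projection weights would be fit to user-item interaction data, and any further item attributes could be appended to the concatenated vector in the same way.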
Related papers
- Ada-adapter:Fast Few-shot Style Personlization of Diffusion Model with Pre-trained Image Encoder [57.574544285878794]
Ada-Adapter is a novel framework for few-shot style personalization of diffusion models.
Our method enables efficient zero-shot style transfer utilizing a single reference image.
We demonstrate the effectiveness of our approach on various artistic styles, including flat art, 3D rendering, and logo design.
arXiv Detail & Related papers (2024-07-08T02:00:17Z)
- Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose a novel Multi-modal Retrieval based framework (MoRe).
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image in the knowledge corpus respectively.
arXiv Detail & Related papers (2022-12-03T13:11:32Z)
- Multi-View Photometric Stereo Revisited [100.97116470055273]
Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images.
We present a simple, practical approach to MVPS, which works well for isotropic as well as other object material types such as anisotropic and glossy.
The proposed approach shows state-of-the-art results when tested extensively on several benchmark datasets.
arXiv Detail & Related papers (2022-10-14T09:46:15Z)
- Aesthetic Attribute Assessment of Images Numerically on Mixed Multi-attribute Datasets [16.120684660965978]
We construct an image attribute dataset called the aesthetic mixed dataset with attributes (AMD-A) and design external attribute features for fusion.
Our model can achieve aesthetic classification, overall scoring and attribute scoring.
Experimental results obtained with MindSpore show that the proposed method effectively improves both overall aesthetic assessment and attribute assessment.
arXiv Detail & Related papers (2022-07-05T04:42:10Z)
- Composition and Style Attributes Guided Image Aesthetic Assessment [66.60253358722538]
We propose a method for the automatic prediction of the aesthetics of an image.
The proposed network includes: a pre-trained network for semantic feature extraction (the Backbone); a Multi Layer Perceptron (MLP) network that relies on the Backbone features for the prediction of image attributes (the AttributeNet).
Given an image, the proposed multi-network is able to predict: style and composition attributes, and aesthetic score distribution.
arXiv Detail & Related papers (2021-11-08T17:16:38Z)
- Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
- Inverting Adversarially Robust Networks for Image Synthesis [37.927552662984034]
We propose the use of robust representations as a perceptual primitive for feature inversion models.
We empirically show that adopting robust representations as an image prior significantly improves the reconstruction accuracy of CNN-based feature inversion models.
Following these findings, we propose an encoding-decoding network based on robust representations and show its advantages for applications such as anomaly detection, style transfer and image denoising.
arXiv Detail & Related papers (2021-06-13T05:51:00Z)
- Apparel Recommender System based on Bilateral image shape features [0.0]
This study proposes a novel probabilistic model that integrates double convolutional neural networks (CNNs) into recommender systems.
For apparel goods, two trained CNNs from the image shape features of users and items are combined, and the latent variables of users and items are optimized.
Our model predicts outcomes more accurately than other recommender models.
arXiv Detail & Related papers (2021-05-04T14:48:38Z)
- Adaptive Compact Attention For Few-shot Video-to-video Translation [13.535988102579918]
We introduce a novel adaptive compact attention mechanism to efficiently extract contextual features jointly from multiple reference images.
Our core idea is to extract compact basis sets from all the reference images as higher-level representations.
We extensively evaluate our method on a large-scale talking-head video dataset and a human dancing dataset.
arXiv Detail & Related papers (2020-11-30T11:19:12Z)
- Saliency-driven Class Impressions for Feature Visualization of Deep Neural Networks [55.11806035788036]
It is advantageous to visualize the features considered to be essential for classification.
Existing visualization methods develop high-confidence images consisting of both background and foreground features.
In this work, we propose a saliency-driven approach to visualize discriminative features that are considered most important for a given task.
arXiv Detail & Related papers (2020-07-31T06:11:06Z)
- Multi-Image Summarization: Textual Summary from a Set of Cohesive Images [17.688344968462275]
This paper proposes the new task of multi-image summarization.
It aims to generate a concise and descriptive textual summary given a coherent set of input images.
A dense average image feature aggregation network allows the model to focus on a coherent subset of attributes.
arXiv Detail & Related papers (2020-06-15T18:45:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.