End-to-End Image-Based Fashion Recommendation
- URL: http://arxiv.org/abs/2205.02923v1
- Date: Thu, 5 May 2022 21:14:42 GMT
- Title: End-to-End Image-Based Fashion Recommendation
- Authors: Shereen Elsayed, Lukas Brinkmeyer and Lars Schmidt-Thieme
- Abstract summary: In fashion-based recommendation settings, incorporating the item image features is considered a crucial factor.
We propose a simple yet effective attribute-aware model that incorporates image features for better item representation learning.
Experiments on two image-based real-world recommender systems datasets show that the proposed model significantly outperforms all state-of-the-art image-based models.
- Score: 5.210197476419621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In fashion-based recommendation settings, incorporating the item image
features is considered a crucial factor, and it has shown significant
improvements to many traditional models, including but not limited to matrix
factorization, auto-encoders, and nearest neighbor models. While there are
numerous image-based recommender approaches that utilize dedicated deep neural
networks, comparisons to attribute-aware models are often disregarded despite
their ability to be easily extended to leverage items' image features. In this
paper, we propose a simple yet effective attribute-aware model that
incorporates image features for better item representation learning in item
recommendation tasks. The proposed model utilizes items' image features
extracted by a calibrated ResNet50 component. We present an ablation study to
compare incorporating the image features using three different techniques into
the recommender system component that can seamlessly leverage any available
items' attributes. Experiments on two image-based real-world recommender
systems datasets show that the proposed model significantly outperforms all
state-of-the-art image-based models.
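The abstract describes an attribute-aware recommender whose item representation is enriched with image features from a ResNet50 component. The listing does not give the architecture, so the following is only a minimal sketch of one plausible fusion (concatenating an item ID embedding with a projected image vector), not the authors' model. The class `ImageAwareScorer`, its weight names, the dimensions, and the dot-product scoring are all assumptions made for illustration, and the image features are a stand-in list rather than real ResNet50 output.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


class ImageAwareScorer:
    """Hypothetical scorer: user embedding dotted with a fused item vector."""

    def __init__(self, n_users, n_items, image_dim, latent_dim=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a real model would learn these.
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.user_emb = init(n_users, latent_dim)
        self.item_emb = init(n_items, latent_dim)
        self.W_img = init(latent_dim, image_dim)        # projects image features
        self.W_fuse = init(latent_dim, 2 * latent_dim)  # mixes ID and image parts

    def item_vector(self, item_id, image_features):
        # Project the image features into the latent space.
        img_latent = relu(matvec(self.W_img, image_features))
        # Assumed fusion: concatenate the ID embedding and the image latent.
        fused = self.item_emb[item_id] + img_latent
        return matvec(self.W_fuse, fused)

    def score(self, user_id, item_id, image_features):
        # Higher score means the item is ranked higher for this user.
        return dot(self.user_emb[user_id], self.item_vector(item_id, image_features))


model = ImageAwareScorer(n_users=3, n_items=5, image_dim=16)
features = [0.5] * 16  # stand-in for image features from a CNN backbone
print(model.score(0, 2, features))
```

In an end-to-end setup the image features would come from the CNN itself and all weights would be trained jointly on the interaction data; here they are fixed random values so the sketch runs on its own.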
Related papers
- MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models [85.30735602813093]
Multi-Image Augmented Direct Preference Optimization (MIA-DPO) is a visual preference alignment approach that effectively handles multi-image inputs.
MIA-DPO mitigates the scarcity of diverse multi-image training data by extending single-image data with unrelated images arranged in grid collages or pic-in-pic formats.
arXiv Detail & Related papers (2024-10-23T07:56:48Z)
- ARMADA: Attribute-Based Multimodal Data Augmentation [93.05614922383822]
Attribute-based Multimodal Data Augmentation (ARMADA) is a novel multimodal data augmentation method via knowledge-guided manipulation of visual attributes.
ARMADA is a novel multimodal data generation framework that: (i) extracts knowledge-grounded attributes from symbolic KBs for semantically consistent yet distinctive image-text pair generation.
This also highlights the need to leverage external knowledge proxies for enhanced interpretability and real-world grounding.
arXiv Detail & Related papers (2024-08-19T15:27:25Z)
- Multi-View Photometric Stereo Revisited [100.97116470055273]
Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images.
We present a simple, practical approach to MVPS, which works well for isotropic as well as other object material types such as anisotropic and glossy.
The proposed approach shows state-of-the-art results when tested extensively on several benchmark datasets.
arXiv Detail & Related papers (2022-10-14T09:46:15Z)
- Aesthetic Attribute Assessment of Images Numerically on Mixed Multi-attribute Datasets [16.120684660965978]
We construct an image attribute dataset called the aesthetic mixed dataset with attributes (AMD-A) and design external attribute features for fusion.
Our model can achieve aesthetic classification, overall scoring and attribute scoring.
Experimental results obtained with MindSpore show that our proposed method can effectively improve the performance of overall aesthetic assessment and attribute assessment.
arXiv Detail & Related papers (2022-07-05T04:42:10Z)
- Composition and Style Attributes Guided Image Aesthetic Assessment [66.60253358722538]
We propose a method for the automatic prediction of the aesthetics of an image.
The proposed network includes: a pre-trained network for semantic features extraction (the Backbone); a Multi Layer Perceptron (MLP) network that relies on the Backbone features for the prediction of image attributes (the AttributeNet)
Given an image, the proposed multi-network is able to predict: style and composition attributes, and aesthetic score distribution.
arXiv Detail & Related papers (2021-11-08T17:16:38Z)
- Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
- Inverting Adversarially Robust Networks for Image Synthesis [37.927552662984034]
We propose the use of robust representations as a perceptual primitive for feature inversion models.
We empirically show that adopting robust representations as an image prior significantly improves the reconstruction accuracy of CNN-based feature inversion models.
Following these findings, we propose an encoding-decoding network based on robust representations and show its advantages for applications such as anomaly detection, style transfer and image denoising.
arXiv Detail & Related papers (2021-06-13T05:51:00Z)
- Apparel Recommender System based on Bilateral image shape features [0.0]
This study proposes a novel probabilistic model that integrates double convolutional neural networks (CNNs) into recommender systems.
For apparel goods, two trained CNNs from the image shape features of users and items are combined, and the latent variables of users and items are optimized.
Our model predicts outcomes more accurately than other recommender models.
arXiv Detail & Related papers (2021-05-04T14:48:38Z)
- Adaptive Compact Attention For Few-shot Video-to-video Translation [13.535988102579918]
We introduce a novel adaptive compact attention mechanism to efficiently extract contextual features jointly from multiple reference images.
Our core idea is to extract compact basis sets from all the reference images as higher-level representations.
We extensively evaluate our method on a large-scale talking-head video dataset and a human dancing dataset.
arXiv Detail & Related papers (2020-11-30T11:19:12Z)
- Saliency-driven Class Impressions for Feature Visualization of Deep Neural Networks [55.11806035788036]
It is advantageous to visualize the features considered to be essential for classification.
Existing visualization methods develop high confidence images consisting of both background and foreground features.
In this work, we propose a saliency-driven approach to visualize discriminative features that are considered most important for a given task.
arXiv Detail & Related papers (2020-07-31T06:11:06Z)
- Multi-Image Summarization: Textual Summary from a Set of Cohesive Images [17.688344968462275]
This paper proposes the new task of multi-image summarization.
It aims to generate a concise and descriptive textual summary given a coherent set of input images.
A dense average image feature aggregation network allows the model to focus on a coherent subset of attributes.
arXiv Detail & Related papers (2020-06-15T18:45:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.