Related papers: End-to-End Image-Based Fashion Recommendation

URL: http://arxiv.org/abs/2205.02923v1
Date: Thu, 5 May 2022 21:14:42 GMT
Title: End-to-End Image-Based Fashion Recommendation
Authors: Shereen Elsayed, Lukas Brinkmeyer and Lars Schmidt-Thieme
Abstract summary: In fashion-based recommendation settings, incorporating the item image features is considered a crucial factor. We propose a simple yet effective attribute-aware model that incorporates image features for better item representation learning. Experiments on two image-based real-world recommender systems datasets show that the proposed model significantly outperforms all state-of-the-art image-based models.
Score: 5.210197476419621
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In fashion-based recommendation settings, incorporating the item image features is considered a crucial factor, and it has shown significant improvements to many traditional models, including but not limited to matrix factorization, auto-encoders, and nearest neighbor models. While there are numerous image-based recommender approaches that utilize dedicated deep neural networks, comparisons to attribute-aware models are often disregarded despite their ability to be easily extended to leverage items' image features. In this paper, we propose a simple yet effective attribute-aware model that incorporates image features for better item representation learning in item recommendation tasks. The proposed model utilizes items' image features extracted by a calibrated ResNet50 component. We present an ablation study to compare incorporating the image features using three different techniques into the recommender system component that can seamlessly leverage any available items' attributes. Experiments on two image-based real-world recommender systems datasets show that the proposed model significantly outperforms all state-of-the-art image-based models.

Related papers

Generating Multi-Image Synthetic Data for Text-to-Image Customization [48.59231755159313]
Customization of text-to-image models enables users to insert custom concepts and generate the concepts in unseen settings. Existing methods either rely on costly test-time optimization or train encoders on single-image training datasets without multi-image supervision. We propose a simple approach that addresses both limitations.
arXiv Detail & Related papers (2025-02-03T18:59:41Z)
Personalized Fashion Recommendation with Image Attributes and Aesthetics Assessment [15.423307815155534]
We aim to provide more accurate personalized fashion recommendations by converting available information, especially images, into two attribute graphs. Compared with previous methods that separate image and text as two components, the proposed method combines image and text information to create a richer attribute graph. Preliminary experiments on the IQON3000 dataset have shown that the proposed method achieves competitive accuracy compared with baselines.
arXiv Detail & Related papers (2025-01-06T15:31:10Z)
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models [112.94440113631897]
Current methods attempt to distill identity and style from source images. "Style" is a broad concept that includes texture, color, and artistic elements, but does not cover other important attributes such as lighting and dynamics. We formulate a more effective approach to decompose the aesthetics of a picture into specific visual attributes, allowing users to apply characteristics such as lighting, texture, and dynamics from different images.
arXiv Detail & Related papers (2024-12-10T17:02:58Z)
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models [85.30735602813093]
Multi-Image Augmented Direct Preference Optimization (MIA-DPO) is a visual preference alignment approach that effectively handles multi-image inputs. MIA-DPO mitigates the scarcity of diverse multi-image training data by extending single-image data with unrelated images arranged in grid collages or pic-in-pic formats.
arXiv Detail & Related papers (2024-10-23T07:56:48Z)
ARMADA: Attribute-Based Multimodal Data Augmentation [93.05614922383822]
Attribute-based Multimodal Data Augmentation (ARMADA) is a novel multimodal data augmentation method via knowledge-guided manipulation of visual attributes. ARMADA is a novel multimodal data generation framework that extracts knowledge-grounded attributes from symbolic KBs for semantically consistent yet distinctive image-text pair generation. This also highlights the need to leverage external knowledge proxies for enhanced interpretability and real-world grounding.
arXiv Detail & Related papers (2024-08-19T15:27:25Z)
Multi-View Photometric Stereo Revisited [100.97116470055273]
Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images. We present a simple, practical approach to MVPS, which works well for isotropic as well as other object material types such as anisotropic and glossy. The proposed approach shows state-of-the-art results when tested extensively on several benchmark datasets.
arXiv Detail & Related papers (2022-10-14T09:46:15Z)
Aesthetic Attribute Assessment of Images Numerically on Mixed Multi-attribute Datasets [16.120684660965978]
We construct an image attribute dataset called aesthetic mixed dataset with attributes (AMD-A) and design external attribute features for fusion. Our model can achieve aesthetic classification, overall scoring and attribute scoring. Experimental results, using MindSpore, show that our proposed method can effectively improve the performance of the aesthetic overall and attribute assessment.
arXiv Detail & Related papers (2022-07-05T04:42:10Z)
Composition and Style Attributes Guided Image Aesthetic Assessment [66.60253358722538]
We propose a method for the automatic prediction of the aesthetics of an image. The proposed network includes a pre-trained network for semantic feature extraction (the Backbone) and a Multi-Layer Perceptron (MLP) network that relies on the Backbone features for the prediction of image attributes (the AttributeNet). Given an image, the proposed multi-network is able to predict style and composition attributes, and aesthetic score distribution.
arXiv Detail & Related papers (2021-11-08T17:16:38Z)
Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention. We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
Inverting Adversarially Robust Networks for Image Synthesis [37.927552662984034]
We propose the use of robust representations as a perceptual primitive for feature inversion models. We empirically show that adopting robust representations as an image prior significantly improves the reconstruction accuracy of CNN-based feature inversion models. Following these findings, we propose an encoding-decoding network based on robust representations and show its advantages for applications such as anomaly detection, style transfer and image denoising.
arXiv Detail & Related papers (2021-06-13T05:51:00Z)
Apparel Recommender System based on Bilateral image shape features [0.0]
This study proposes a novel probabilistic model that integrates double convolutional neural networks (CNNs) into recommender systems. For apparel goods, two trained CNNs from the image shape features of users and items are combined, and the latent variables of users and items are optimized. Our model predicts outcomes more accurately than other recommender models.
arXiv Detail & Related papers (2021-05-04T14:48:38Z)
Adaptive Compact Attention For Few-shot Video-to-video Translation [13.535988102579918]
We introduce a novel adaptive compact attention mechanism to efficiently extract contextual features jointly from multiple reference images. Our core idea is to extract compact basis sets from all the reference images as higher-level representations. We extensively evaluate our method on a large-scale talking-head video dataset and a human dancing dataset.
arXiv Detail & Related papers (2020-11-30T11:19:12Z)
Saliency-driven Class Impressions for Feature Visualization of Deep Neural Networks [55.11806035788036]
It is advantageous to visualize the features considered to be essential for classification. Existing visualization methods develop high confidence images consisting of both background and foreground features. In this work, we propose a saliency-driven approach to visualize discriminative features that are considered most important for a given task.
arXiv Detail & Related papers (2020-07-31T06:11:06Z)
Multi-Image Summarization: Textual Summary from a Set of Cohesive Images [17.688344968462275]
This paper proposes the new task of multi-image summarization. It aims to generate a concise and descriptive textual summary given a coherent set of input images. A dense average image feature aggregation network allows the model to focus on a coherent subset of attributes.
arXiv Detail & Related papers (2020-06-15T18:45:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.