Where to Look and How to Describe: Fashion Image Retrieval with an
Attentional Heterogeneous Bilinear Network
- URL: http://arxiv.org/abs/2010.13357v1
- Date: Mon, 26 Oct 2020 06:01:09 GMT
- Title: Where to Look and How to Describe: Fashion Image Retrieval with an
Attentional Heterogeneous Bilinear Network
- Authors: Haibo Su, Peng Wang, Lingqiao Liu, Hui Li, Zhen Li, Yanning Zhang
- Abstract summary: We propose a biologically inspired framework for image-based fashion product retrieval.
Our proposed framework achieves satisfactory performance on three image-based fashion product retrieval benchmarks.
- Score: 50.19558726384559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fashion products typically combine a variety of styles across
different clothing parts. To distinguish images of different fashion
products, we need to extract both appearance (i.e., "how to describe") and
localization (i.e., "where to look") information, as well as their interactions.
To this end, we propose a biologically inspired framework for image-based fashion
product retrieval, which mimics the hypothesized two-stream visual processing
system of the human brain. The proposed attentional heterogeneous bilinear network
(AHBN) consists of two branches: a deep CNN branch to extract fine-grained
appearance attributes and a fully convolutional branch to extract landmark
localization information. A joint channel-wise attention mechanism is further
applied to the extracted heterogeneous features to focus on important channels,
followed by a compact bilinear pooling layer to model the interaction of the
two streams. Our proposed framework achieves satisfactory performance on three
image-based fashion product retrieval benchmarks.
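The abstract describes a two-stream design: a CNN appearance branch, a fully convolutional localization branch, channel-wise attention on both streams, and bilinear pooling of their interaction. Below is a minimal PyTorch sketch of that structure, not the authors' implementation: the ResNet-50 backbone, the small convolutional localization stream, the squeeze-and-excitation-style attention, and the plain outer-product pooling (in place of compact bilinear pooling) are all assumptions made for illustration, and the class names (AHBNSketch, ChannelAttention) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel reweighting (assumed form of the
    paper's joint channel-wise attention)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # global average pool -> (B, C)
        return x * w[:, :, None, None]           # reweight channels


class AHBNSketch(nn.Module):
    """Illustrative two-branch retrieval embedding, loosely following the
    abstract's description; layer sizes are arbitrary choices."""
    def __init__(self, embed_dim=512):
        super().__init__()
        # Appearance branch: deep CNN for fine-grained attributes,
        # reduced to 256 channels with a 1x1 conv to keep the sketch small.
        resnet = models.resnet50(weights=None)
        self.appearance = nn.Sequential(*list(resnet.children())[:-2],
                                        nn.Conv2d(2048, 256, 1))
        # Localization branch: a small fully convolutional stand-in for the
        # landmark-localization stream.
        self.localization = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.att_a = ChannelAttention(256)
        self.att_l = ChannelAttention(64)
        # The paper uses compact bilinear pooling; here a plain outer product
        # of pooled features plus a linear projection approximates it.
        self.proj = nn.Linear(256 * 64, embed_dim)

    def forward(self, img):
        fa = self.att_a(self.appearance(img)).mean(dim=(2, 3))    # (B, 256)
        fl = self.att_l(self.localization(img)).mean(dim=(2, 3))  # (B, 64)
        bilinear = torch.einsum('bi,bj->bij', fa, fl).flatten(1)  # (B, 256*64)
        bilinear = torch.sign(bilinear) * torch.sqrt(bilinear.abs() + 1e-8)
        return F.normalize(self.proj(bilinear), dim=1)             # retrieval embedding
```

Retrieval would then compare these L2-normalized embeddings (e.g., by cosine similarity) between query and gallery images; the exact training losses and landmark supervision are described in the paper itself.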
Related papers
- BIMM: Brain Inspired Masked Modeling for Video Representation Learning [47.56270575865621]
We propose the Brain Inspired Masked Modeling (BIMM) framework, aiming to learn comprehensive representations from videos.
Specifically, our approach consists of ventral and dorsal branches, which learn image and video representations, respectively.
To mirror the roles of different visual cortices in the brain, we segment the encoder of each branch into three intermediate blocks and reconstruct progressive prediction targets with lightweight decoders.
arXiv Detail & Related papers (2024-05-21T13:09:04Z) - Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation [6.479933058008389]
Style-Extracting Diffusion Models generate images with unseen characteristics beneficial for downstream tasks.
In this work, we show the capability of our method on a natural image dataset as a proof-of-concept.
We verify the added value of the generated images by showing improved segmentation results and lower performance variability between patients.
arXiv Detail & Related papers (2024-03-21T14:36:59Z) - Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z) - MMFL-Net: Multi-scale and Multi-granularity Feature Learning for
Cross-domain Fashion Retrieval [3.7045939497992917]
Cross-domain fashion retrieval is a difficult task due to a wide range of consumer-to-shop (C2S) domain discrepancies.
We propose a novel multi-scale and multi-granularity feature learning network (MMFL-Net), which can jointly learn global-local aggregation feature representations of clothing images.
Our proposed model also combines the multi-task attribute recognition and classification module with multi-label semantic attributes and product ID labels.
arXiv Detail & Related papers (2022-10-27T02:25:52Z) - Single Stage Virtual Try-on via Deformable Attention Flows [51.70606454288168]
Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image.
We develop a novel Deformable Attention Flow (DAFlow) which applies the deformable attention scheme to multi-flow estimation.
Our proposed method achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-07-19T10:01:31Z) - Cross-View Panorama Image Synthesis [68.35351563852335]
We propose PanoGAN, a novel adversarial feedback GAN framework.
PanoGAN enables high-quality panorama image generation with more convincing details than state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-22T15:59:44Z) - Improving Generation and Evaluation of Visual Stories via Semantic
Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z) - Unsupervised Learning of Landmarks based on Inter-Intra Subject
Consistencies [72.67344725725961]
We present a novel unsupervised learning approach to image landmark discovery by incorporating the inter-subject landmark consistencies on facial images.
This is achieved via an inter-subject mapping module that transforms original subject landmarks based on an auxiliary subject-related structure.
To recover the original subject from the transformed images, the landmark detector is forced to learn spatial locations that carry consistent semantic meanings both within paired intra-subject images and across paired inter-subject images.
arXiv Detail & Related papers (2020-04-16T20:38:16Z)