Symbiotic Adversarial Learning for Attribute-based Person Search
- URL: http://arxiv.org/abs/2007.09609v2
- Date: Mon, 24 Aug 2020 12:24:34 GMT
- Title: Symbiotic Adversarial Learning for Attribute-based Person Search
- Authors: Yu-Tong Cao, Jingya Wang, Dacheng Tao
- Abstract summary: We present a symbiotic adversarial learning framework, called SAL. Two GANs sit at the base of the framework in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
- Score: 86.7506832053208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attribute-based person search is in significant demand for applications where
no detected query images are available, such as identifying a criminal from
witness descriptions. However, the task itself is quite challenging because there is a huge
modality gap between images and physical descriptions of attributes. Often,
there may also be a large number of unseen categories (attribute combinations).
The current state-of-the-art methods either focus on learning better
cross-modal embeddings by mining only seen data, or they explicitly use
generative adversarial networks (GANs) to synthesize unseen features. The
former tends to produce poor embeddings due to insufficient data, while the
latter does not preserve intra-class compactness during generation. In this
paper, we present a symbiotic adversarial learning framework, called SAL. Two
GANs sit at the base of the framework in a symbiotic learning scheme: one
synthesizes features of unseen classes/categories, while the other optimizes
the embedding and performs the cross-modal alignment on the common embedding
space. Specifically, two different types of generative adversarial networks
learn collaboratively throughout the training process and the interactions
between the two mutually benefit each other. Extensive evaluations show SAL's
superiority over nine state-of-the-art methods on two challenging pedestrian
benchmarks, PETA and Market-1501. The code is publicly available at:
https://github.com/ycao5602/SAL .
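As a rough illustration of the two-GAN scheme described above, here is a minimal PyTorch sketch in which one GAN synthesizes image features for unseen attribute combinations while the other adversarially aligns image and attribute embeddings in a common space. All module shapes, names, and the collapsed joint loss are illustrative assumptions, not SAL's actual implementation (see the linked repository for that).

```python
# Minimal sketch of a symbiotic two-GAN scheme for attribute-based search.
# Shapes, module names, and loss weights are assumptions, not SAL's code.
import torch
import torch.nn as nn

ATTR_DIM, NOISE_DIM, FEAT_DIM, EMB_DIM = 35, 64, 512, 128

# GAN 1: synthesizes image features for unseen attribute combinations.
synth_G = nn.Sequential(nn.Linear(ATTR_DIM + NOISE_DIM, 256), nn.ReLU(),
                        nn.Linear(256, FEAT_DIM))
synth_D = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                        nn.Linear(256, 1))

# GAN 2: embeds both modalities and aligns them in a common space.
img_embed = nn.Linear(FEAT_DIM, EMB_DIM)
attr_embed = nn.Linear(ATTR_DIM, EMB_DIM)
align_D = nn.Sequential(nn.Linear(EMB_DIM, 64), nn.ReLU(),
                        nn.Linear(64, 1))  # guesses the source modality

bce = nn.BCEWithLogitsLoss()

def training_step(img_feat, attrs, unseen_attrs):
    """One collapsed symbiotic step: synthesis feeds alignment.

    Real GAN training alternates generator/discriminator updates;
    both objectives are folded into one loss here for brevity.
    """
    noise = torch.randn(unseen_attrs.size(0), NOISE_DIM)
    fake_feat = synth_G(torch.cat([unseen_attrs, noise], dim=1))

    # Synthesis GAN: generated unseen features should look like real ones.
    loss_synth = bce(synth_D(fake_feat), torch.ones(fake_feat.size(0), 1))

    # Alignment GAN: image embeddings (real + synthetic) and attribute
    # embeddings should be indistinguishable in the common space.
    z_img = img_embed(torch.cat([img_feat, fake_feat], dim=0))
    z_attr = attr_embed(torch.cat([attrs, unseen_attrs], dim=0))
    loss_align = bce(align_D(z_img), torch.zeros(z_img.size(0), 1)) + \
                 bce(align_D(z_attr), torch.ones(z_attr.size(0), 1))

    # Optimized jointly, each GAN shapes the other's inputs -- the
    # "symbiotic" interaction the abstract describes.
    return loss_synth + loss_align

loss = training_step(torch.randn(8, FEAT_DIM), torch.rand(8, ATTR_DIM),
                     torch.rand(8, ATTR_DIM))
loss.backward()
```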
Related papers
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric comes in two formats, built from diagonal and block-diagonal terms.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
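To make the diagonal versus block-diagonal distinction concrete, the following toy sketch contrasts the two structured forms of a weighted pair-wise distance; it illustrates the general idea under assumed shapes and is not GSSF's exact formulation.

```python
# Toy sketch: structured pair-wise distances with diagonal vs.
# block-diagonal parameterizations (an assumed illustration, not GSSF).
import torch

def diagonal_distance(x, y, w):
    """Per-dimension weighted squared distance; w is a (D,) weight vector."""
    return ((x - y) ** 2 * w).sum(dim=-1)

def block_diagonal_distance(x, y, blocks):
    """Quadratic-form distance with one matrix per feature block.

    blocks: list of (start, end, W) with W of shape (end-start, end-start),
    modeling interactions within each block but not across blocks.
    """
    d = x - y
    total = torch.zeros(x.shape[:-1])
    for start, end, W in blocks:
        db = d[..., start:end]
        total = total + torch.einsum('...i,ij,...j->...', db, W, db)
    return total

x, y = torch.randn(4, 8), torch.randn(4, 8)
w = torch.rand(8)                                       # diagonal weights
blocks = [(0, 4, torch.eye(4)), (4, 8, torch.eye(4))]   # block matrices
print(diagonal_distance(x, y, w), block_diagonal_distance(x, y, blocks))
```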
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
- Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework [0.0]
Generalized Zero-Shot Learning (GZSL) is a challenging task requiring accurate classification of both seen and unseen classes.
We introduce a general framework employing out-of-distribution (OOD) detection, aiming to combine the strengths of classifiers trained on seen classes with zero-shot methods for unseen ones.
We test our framework on three popular audio-visual datasets and observe a significant improvement compared to existing state-of-the-art works.
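A minimal sketch of the routing idea follows, under the assumption that an OOD score gates between a seen-class classifier and a zero-shot branch; the components and the max-logit threshold are stand-ins, not the paper's detectors.

```python
# Sketch: route likely-seen samples to a supervised classifier and
# likely-unseen ones to a zero-shot head. Components are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_SEEN, NUM_ALL, EMB = 10, 25, 128
seen_clf = nn.Linear(EMB, NUM_SEEN)       # trained on seen classes only
class_embeds = torch.randn(NUM_ALL, EMB)  # semantic embeddings, seen + unseen

def gzsl_predict(feat, ood_threshold=2.0):
    """Return a class index, assuming seen classes occupy indices 0..NUM_SEEN-1."""
    logits = seen_clf(feat)
    # Max-logit as a simple OOD score: low confidence => likely unseen.
    if logits.max() > ood_threshold:
        return logits.argmax().item()     # seen-class prediction
    # Zero-shot branch: nearest semantic class embedding.
    sims = F.cosine_similarity(feat.unsqueeze(0), class_embeds, dim=1)
    return sims.argmax().item()

print(gzsl_predict(torch.randn(EMB)))
```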
arXiv Detail & Related papers (2024-08-02T14:10:20Z)
- Few-Shot Classification of Interactive Activities of Daily Living (InteractADL) [17.15896055218621]
We propose a new dataset and benchmark, InteractADL, for understanding complex ADLs that involve interaction between humans (and objects).
We propose a novel method for fine-grained few-shot video classification called Name Tuning that enables greater semantic separability by learning optimal class name vectors.
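The class-name-vector idea can be sketched as learning one embedding per class against frozen video features; the dimensions and the cosine/cross-entropy objective below are assumptions, not the paper's exact Name Tuning recipe.

```python
# Sketch: learn one vector per class name against frozen video features.
# Dimensions and objective are assumptions, not the paper's exact recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, EMB = 5, 256
name_vecs = nn.Parameter(torch.randn(NUM_CLASSES, EMB))  # learnable names
opt = torch.optim.Adam([name_vecs], lr=1e-2)

def name_tuning_step(video_feats, labels, temperature=0.07):
    # Cosine similarity between (frozen) video features and name vectors.
    logits = F.normalize(video_feats, dim=1) @ F.normalize(name_vecs, dim=1).T
    loss = F.cross_entropy(logits / temperature, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

print(name_tuning_step(torch.randn(8, EMB),
                       torch.randint(0, NUM_CLASSES, (8,))))
```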
arXiv Detail & Related papers (2024-06-03T17:59:55Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin.
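The weight-initialization idea can be sketched in a few lines: copy a text embedding of each HOI label into the classification head so semantically related classes start close together. The stand-in text encoder below is an assumption, and the paper's LSE-Sign loss is not reproduced here.

```python
# Sketch: initialize a classifier head with text embeddings of HOI labels.
# The random text_embed below is a stand-in for a real language encoder.
import torch
import torch.nn as nn

hoi_labels = ["ride bicycle", "hold cup", "throw ball"]
EMB = 512

def text_embed(label):
    """Stand-in encoder: maps each label to a fixed random vector per run."""
    torch.manual_seed(abs(hash(label)) % 2**31)
    return torch.randn(EMB)

head = nn.Linear(EMB, len(hoi_labels), bias=False)
with torch.no_grad():
    head.weight.copy_(torch.stack([text_embed(l) for l in hoi_labels]))

# Multi-label HOI scores for one image feature; the paper trains this
# with its LSE-Sign loss for long-tailed data (not reproduced here).
scores = torch.sigmoid(head(torch.randn(EMB)))
print(scores)
```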
arXiv Detail & Related papers (2022-03-10T23:35:00Z)
- GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot Action Recognition [33.23662792742078]
We propose a two-stage deep neural network for zero-shot action recognition.
In the sampling stage, we utilize a generative adversarial network (GAN) trained on action features and word vectors of seen classes.
In the classification stage, we construct a knowledge graph based on the relationship between word vectors of action classes and related objects.
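The sampling stage can be sketched as a conditional generator that maps a class word vector plus noise to synthetic action features, on which a classifier for unseen classes can then be trained; the shapes below are assumptions, and the knowledge-graph stage is not reproduced.

```python
# Sketch of the sampling stage: condition a generator on a class word
# vector to synthesize action features. Shapes are assumptions; the
# knowledge-graph classification stage is not reproduced here.
import torch
import torch.nn as nn

WORD, NOISE, FEAT = 300, 100, 2048
G = nn.Sequential(nn.Linear(WORD + NOISE, 1024), nn.ReLU(),
                  nn.Linear(1024, FEAT))

def synthesize(word_vec, n_samples=32):
    """Generate n_samples synthetic features for one unseen action class."""
    cond = word_vec.expand(n_samples, WORD)
    noise = torch.randn(n_samples, NOISE)
    return G(torch.cat([cond, noise], dim=1))

fake_feats = synthesize(torch.randn(WORD))  # features for an unseen class
print(fake_feats.shape)                     # torch.Size([32, 2048])
```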
arXiv Detail & Related papers (2021-05-25T09:34:42Z)
- Generative Multi-Label Zero-Shot Learning [136.17594611722285]
Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training.
Our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting.
Our cross-level fusion-based generative approach outperforms the state-of-the-art on all three datasets.
arXiv Detail & Related papers (2021-01-27T18:56:46Z)
- Deep Class-Specific Affinity-Guided Convolutional Network for Multimodal Unpaired Image Segmentation [7.021001169318551]
Multi-modal medical image segmentation plays an essential role in clinical diagnosis.
It remains challenging as the input modalities are often not well-aligned spatially.
We propose an affinity-guided fully convolutional network for multimodal image segmentation.
arXiv Detail & Related papers (2021-01-05T13:56:51Z)
- ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image Classification [49.87503122462432]
We introduce a novel neural network termed Relation-and-Margin learning Network (ReMarNet).
Our method assembles two networks with different backbones to learn features that perform well under both classification mechanisms, relation learning and margin learning.
Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples.
arXiv Detail & Related papers (2020-06-27T13:50:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.