Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and
Novel-View Synthesis
- URL: http://arxiv.org/abs/2008.06981v2
- Date: Tue, 6 Apr 2021 19:18:51 GMT
- Title: Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and
Novel-View Synthesis
- Authors: Zhipeng Bao, Yu-Xiong Wang and Martial Hebert
- Abstract summary: We propose a novel task of joint few-shot recognition and novel-view synthesis.
We aim to simultaneously learn an object classifier and generate images of that type of object from new viewpoints.
We focus on the interaction and cooperation between a generative model and a discriminative model.
- Score: 39.53519330457627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel task of joint few-shot recognition and novel-view
synthesis: given only one or few images of a novel object from arbitrary views
with only category annotation, we aim to simultaneously learn an object
classifier and generate images of that type of object from new viewpoints.
While existing work copes with two or more tasks mainly by multi-task learning
of shareable feature representations, we take a different perspective. We focus
on the interaction and cooperation between a generative model and a
discriminative model, in a way that facilitates knowledge to flow across tasks
in complementary directions. To this end, we propose bowtie networks that
jointly learn 3D geometric and semantic representations with a feedback loop.
Experimental evaluation on challenging fine-grained recognition datasets
demonstrates that our synthesized images are realistic from multiple viewpoints
and significantly improve recognition performance as a form of data augmentation,
especially in the low-data regime. Code and pre-trained models are released at
https://github.com/zpbao/bowtie_networks.
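The bowtie design couples the two models directly: the generator's novel-view samples augment the classifier's training data, and the classifier's recognition loss is backpropagated into the generator. Below is a minimal PyTorch sketch of that feedback loop; it is not the released implementation (see the repository above), and the feature dimensions, viewpoint encoding, and loss weight are illustrative assumptions.

```python
# Hypothetical sketch of a bowtie-style feedback loop, not the authors' code.
# A generator G maps (image feature, target viewpoint) to a synthesized
# feature; a classifier C is trained on real plus synthesized samples, and
# the recognition loss on synthesized samples also updates G.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """(feature, target viewpoint) -> synthesized novel-view feature."""
    def __init__(self, feat_dim=256, view_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + view_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )
    def forward(self, feat, view):
        return self.net(torch.cat([feat, view], dim=-1))

class Classifier(nn.Module):
    def __init__(self, feat_dim=256, num_classes=100):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)
    def forward(self, feat):
        return self.fc(feat)

G, C = Generator(), Classifier()
opt = torch.optim.Adam(list(G.parameters()) + list(C.parameters()), lr=1e-4)

# One illustrative training step on a toy batch.
feats = torch.randn(8, 256)            # features of few-shot images
labels = torch.randint(0, 100, (8,))   # category annotations only
views = torch.randn(8, 3)              # sampled target viewpoints (assumed encoding)

fake_feats = G(feats, views)           # hallucinated novel-view features
logits_real = C(feats)
logits_fake = C(fake_feats)            # synthesized data as augmentation

# Feedback loop: the classification loss on synthesized samples flows into G.
# The 0.5 weight is an arbitrary placeholder, not a value from the paper.
loss = F.cross_entropy(logits_real, labels) + \
       0.5 * F.cross_entropy(logits_fake, labels)
opt.zero_grad(); loss.backward(); opt.step()
```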
Related papers
- Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs [57.492124844326206]
This work delves into the task of pose-free novel view synthesis from stereo pairs, a challenging and pioneering task in 3D vision.
Our framework integrates 2D correspondence matching, camera pose estimation, and NeRF rendering, allowing the three tasks to reinforce one another.
arXiv Detail & Related papers (2023-12-12T13:22:44Z)
- Sample-Efficient Learning of Novel Visual Concepts [7.398195748292981]
State-of-the-art deep learning models struggle to recognize novel objects in a few-shot setting.
We show that incorporating a symbolic knowledge graph into a state-of-the-art recognition model enables a new approach for effective few-shot classification.
arXiv Detail & Related papers (2023-06-15T20:24:30Z)
- ImageBind: One Embedding Space To Bind Them All [41.46167013891263]
ImageBind is an approach to learn a joint embedding across six different modalities.
We show that only image-paired data is sufficient to bind the modalities together.
arXiv Detail & Related papers (2023-05-09T17:59:07Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render&compare strategy that can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation that leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- AutoRF: Learning 3D Object Radiance Fields from Single View Observations [17.289819674602295]
AutoRF is a new approach for learning neural 3D object representations where each object in the training set is observed by only a single view.
We show that our method generalizes well to unseen objects, even across different datasets of challenging real-world street scenes.
arXiv Detail & Related papers (2022-04-07T17:13:39Z)
- Deep Contrastive Learning for Multi-View Network Embedding [20.035449838566503]
Multi-view network embedding aims at projecting nodes in the network to low-dimensional vectors.
Most contrastive-learning-based methods rely on high-quality graph embeddings.
We design a novel node-to-node Contrastive learning framework for Multi-view network Embedding (CREME).
arXiv Detail & Related papers (2021-08-16T06:29:18Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition [86.31412529187243]
Few-shot video recognition aims at learning new actions with only very few labeled samples.
We propose a depth-guided Adaptive Meta-Fusion Network for few-shot video recognition, termed AMeFu-Net.
arXiv Detail & Related papers (2020-10-20T03:06:20Z)
- Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning [21.89909688056478]
We propose a new two-level joint idea to augment the generative network with an inference network during training.
This provides strong cross-modal interaction for effective transfer of knowledge between visual and semantic domains.
We evaluate our approach on four benchmark datasets against several state-of-the-art methods and demonstrate its effectiveness.
arXiv Detail & Related papers (2020-07-15T15:34:09Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which improve our model's layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, which is better suited to multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
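For context, SceneFID keeps the standard FID statistic and changes only what the features are computed from: object crops rather than whole images, per the summary above. Below is a minimal sketch of the underlying FID computation, assuming deep features (e.g., Inception activations per crop) have already been extracted; the feature source and toy dimensions are illustrative assumptions, not the paper's code.

```python
# Sketch of the Frechet Inception Distance that SceneFID adapts: fit a
# Gaussian to each feature set and compute the Frechet distance between them.
import numpy as np
from scipy import linalg

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Frechet distance between Gaussians fit to two (N, D) feature sets."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):   # drop tiny imaginary parts from
        covmean = covmean.real     # numerical noise in the matrix sqrt
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# SceneFID-style usage sketch: pool features of all object crops from the
# real and generated scenes. Real FID uses 2048-d Inception features; the
# 64-d random features here are placeholders so the demo runs quickly.
real_crop_feats = np.random.randn(500, 64)
fake_crop_feats = np.random.randn(500, 64)
print(fid(real_crop_feats, fake_crop_feats))
```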
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.