NeurIPS 2019 Disentanglement Challenge: Improved Disentanglement through
Learned Aggregation of Convolutional Feature Maps
- URL: http://arxiv.org/abs/2002.12356v2
- Date: Tue, 17 Nov 2020 15:16:35 GMT
- Title: NeurIPS 2019 Disentanglement Challenge: Improved Disentanglement through
Learned Aggregation of Convolutional Feature Maps
- Authors: Maximilian Seitzer, Andreas Foltyn, Felix P. Kemeth
- Abstract summary: We propose to train a variational autoencoder on regionally aggregated feature maps obtained from networks pretrained on the ImageNet database.
Our approach achieved the 2nd place in stage 2 of the NeurIPS 2019 disentanglement challenge.
- Score: 2.8360662552057323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This report on our stage 2 submission to the NeurIPS 2019 disentanglement
challenge presents a simple image preprocessing method for learning
disentangled latent factors. We propose to train a variational autoencoder on
regionally aggregated feature maps obtained from networks pretrained on the
ImageNet database, utilizing the implicit inductive bias contained in those
features for disentanglement. This bias can be further enhanced by explicitly
fine-tuning the feature maps on auxiliary tasks useful for the challenge, such
as angle estimation, position estimation, or color classification. Our approach achieved
the 2nd place in stage 2 of the challenge. Code is available at
https://github.com/mseitzer/neurips2019-disentanglement-challenge.
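The regional aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes a fixed average pooling over a grid of regions, whereas the report's title refers to a learned aggregation, so this is an illustration of the general idea and not the authors' implementation. The function name `aggregate_regions`, the grid size, and the feature shape are hypothetical, and random numbers stand in for features from an ImageNet-pretrained network.

```python
import numpy as np


def aggregate_regions(feature_map, grid=2):
    """Average-pool a (C, H, W) feature map over a grid x grid layout of regions.

    Hypothetical helper: fixed average pooling is an assumption made for
    illustration, not the learned aggregation used in the report.
    """
    c, h, w = feature_map.shape
    if h % grid or w % grid:
        raise ValueError("H and W must be divisible by grid")
    # Split H and W into (region index, offset inside region) pairs.
    blocks = feature_map.reshape(c, grid, h // grid, grid, w // grid)
    # Average over the offsets inside each region: result is (C, grid, grid).
    pooled = blocks.mean(axis=(2, 4))
    # Flatten to a single vector that a VAE could take as input.
    return pooled.reshape(-1)


# Random stand-in for convolutional features of one image (assumed shape).
rng = np.random.default_rng(0)
features = rng.standard_normal((512, 8, 8))
vae_input = aggregate_regions(features, grid=2)
print(vae_input.shape)
```

Under these assumptions, the VAE would be trained on the flattened vector rather than on raw pixels, which is how the inductive bias of the pretrained features would enter the latent representation.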
Related papers
- P2P-Bridge: Diffusion Bridges for 3D Point Cloud Denoising [81.92854168911704]
We tackle the task of point cloud denoising through a novel framework that adapts Diffusion Schrödinger bridges to point clouds.
Experiments on object datasets show that P2P-Bridge achieves significant improvements over existing methods.
arXiv Detail & Related papers (2024-08-29T08:00:07Z)
- Learning to Search Feasible and Infeasible Regions of Routing Problems with Flexible Neural k-Opt [30.510841880901655]
We present Neural k-Opt (NeuOpt), a novel learning-to-search (L2S) solver for routing problems.
It learns to perform flexible k-opt exchanges based on a tailored action factorization method and a customized recurrent dual-stream decoder.
arXiv Detail & Related papers (2023-10-27T16:51:41Z)
- N2V2 -- Fixing Noise2Void Checkerboard Artifacts with Modified Sampling Strategies and a Tweaked Network Architecture [66.03918859810022]
We present two modifications to the vanilla N2V setup that both help to reduce the unwanted artifacts considerably.
We validate our modifications on a range of microscopy and natural image data.
arXiv Detail & Related papers (2022-11-15T21:12:09Z)
- TSG: Target-Selective Gradient Backprop for Probing CNN Visual Saliency [72.9106103283475]
We study visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks.
Inspired by those observations, we propose a novel visual saliency framework, termed Target-Selective Gradient (TSG) backprop.
The proposed TSG consists of two components, namely, TSG-Conv and TSG-FC, which rectify the gradients for convolutional layers and fully-connected layers, respectively.
arXiv Detail & Related papers (2021-10-11T12:00:20Z)
- An audiovisual and contextual approach for categorical and continuous emotion recognition in-the-wild [27.943550651941166]
We tackle the task of video-based audio-visual emotion recognition in the context of the 2nd Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW).
Standard methodologies that rely solely on the extraction of facial features often fall short of accurate emotion prediction in cases where the aforementioned source of affective information is inaccessible due to head/body orientation, low resolution and poor illumination.
We aspire to alleviate this problem by leveraging bodily as well as contextual features, as part of a broader emotion recognition framework.
arXiv Detail & Related papers (2021-07-07T20:13:17Z)
- Learning Semantic Person Image Generation by Region-Adaptive Normalization [81.52223606284443]
We propose a new two-stage framework to handle the pose and appearance translation.
In the first stage, we predict the target semantic parsing maps to eliminate the difficulties of pose transfer.
In the second stage, we suggest a new person image generation method by incorporating the region-adaptive normalization.
arXiv Detail & Related papers (2021-04-14T06:51:37Z)
- Dual-Level Collaborative Transformer for Image Captioning [126.59298716978577]
We introduce a novel Dual-Level Collaborative Transformer (DLCT) network to realize the complementary advantages of the two features.
In addition, we propose a Locality-Constrained Cross Attention module to address the semantic noises caused by the direct fusion of these two features.
arXiv Detail & Related papers (2021-01-16T15:43:17Z)
- RESA: Recurrent Feature-Shift Aggregator for Lane Detection [32.246537653680484]
We present a novel module named REcurrent Feature-Shift Aggregator (RESA) to enrich lane features after preliminary feature extraction with an ordinary CNN.
RESA can conjecture lanes accurately in challenging scenarios with weak appearance clues by aggregating sliced feature maps.
Our method achieves state-of-the-art results on two popular lane detection benchmarks (CULane and Tusimple).
arXiv Detail & Related papers (2020-08-31T16:37:30Z)
- 1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation [116.25081559037872]
This article introduces the solutions of the two champion teams, 'MMfruit' for the detection track and 'MMfruitSeg' for the segmentation track, in OpenImage Challenge 2019.
It is commonly known that for an object detector, the shared feature at the end of the backbone is not appropriate for both classification and regression.
We propose the Decoupling Head (DH) to disentangle the object classification and regression via the self-learned optimal feature extraction.
arXiv Detail & Related papers (2020-03-17T06:45:07Z)
- NeurIPS 2019 Disentanglement Challenge: Improved Disentanglement through Aggregated Convolutional Feature Maps [2.538209532048867]
This report presents a simple image preprocessing method for training VAEs.
In particular, we propose to use regionally aggregated feature maps extracted from CNNs pretrained on ImageNet.
Our method achieved the 2nd place in stage 1 of the NeurIPS 2019 disentanglement challenge.
arXiv Detail & Related papers (2020-02-23T22:35:59Z)