DiverseNet: When One Right Answer is not Enough
- URL: http://arxiv.org/abs/2008.10634v1
- Date: Mon, 24 Aug 2020 18:12:49 GMT
- Title: DiverseNet: When One Right Answer is not Enough
- Authors: Michael Firman, Neill D. F. Campbell, Lourdes Agapito, Gabriel J. Brostow
- Abstract summary: We introduce a simple method for training a neural network, which enables diverse structured predictions to be made for each test-time query.
Our method results in quantitative improvements across three challenging tasks: 2D image completion, 3D volume estimation, and flow prediction.
- Score: 35.764028730120096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many structured prediction tasks in machine vision have a collection of
acceptable answers, instead of one definitive ground truth answer. Segmentation
of images, for example, is subject to human labeling bias. Similarly, there are
multiple possible pixel values that could plausibly complete occluded image
regions. State-of-the-art supervised learning methods are typically optimized
to make a single test-time prediction for each query, failing to find other
modes in the output space. Existing methods that allow for sampling often
sacrifice speed or accuracy.
We introduce a simple method for training a neural network, which enables
diverse structured predictions to be made for each test-time query. For a
single input, we learn to predict a range of possible answers. We compare
favorably to methods that seek diversity through an ensemble of networks. Such
stochastic multiple choice learning faces mode collapse, where one or more
ensemble members fail to receive any training signal. Our best performing
solution can be deployed for various tasks, and just involves small
modifications to the existing single-mode architecture, loss function, and
training regime. We demonstrate that our method results in quantitative
improvements across three challenging tasks: 2D image completion, 3D volume
estimation, and flow prediction.
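To make the abstract's description concrete, here is a minimal PyTorch-style sketch of one way to obtain diverse predictions from a single network: condition the network on a small set of control codes and back-propagate each training example only through the code whose prediction currently fits it best. The toy completion network, the one-hot code injection, and the winner-takes-all selection below are illustrative assumptions consistent with the abstract, not the authors' actual architecture, loss function, or training regime.

```python
# Minimal sketch (assumption): one network conditioned on a control code,
# trained so that different codes specialise to different plausible answers
# for the same input. Not the authors' implementation.
import torch
import torch.nn as nn

NUM_CODES = 8  # number of diverse hypotheses the single network can emit


class DiverseCompletionNet(nn.Module):
    """Toy image-completion network; the control code is injected as extra channels."""

    def __init__(self, in_ch=3, num_codes=NUM_CODES):
        super().__init__()
        self.num_codes = num_codes
        self.body = nn.Sequential(
            nn.Conv2d(in_ch + num_codes, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, in_ch, 3, padding=1),
        )

    def forward(self, x, code_idx):
        # Broadcast a one-hot code over the spatial dimensions and concatenate.
        b, _, h, w = x.shape
        one_hot = torch.zeros(b, self.num_codes, h, w, device=x.device)
        one_hot[torch.arange(b, device=x.device), code_idx] = 1.0
        return self.body(torch.cat([x, one_hot], dim=1))


def winner_takes_all_loss(net, x, target):
    """Evaluate every code and back-propagate only through the best one per sample."""
    per_code = []
    for c in range(net.num_codes):
        idx = torch.full((x.size(0),), c, dtype=torch.long, device=x.device)
        pred = net(x, idx)
        # Per-sample reconstruction error for this hypothesis.
        per_code.append(((pred - target) ** 2).flatten(1).mean(dim=1))
    per_code = torch.stack(per_code, dim=1)   # shape (batch, num_codes)
    best = per_code.min(dim=1).values         # winner per training example
    return best.mean()


# Toy training step on random data.
net = DiverseCompletionNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = torch.rand(4, 3, 32, 32), torch.rand(4, 3, 32, 32)
loss = winner_takes_all_loss(net, x, y)
opt.zero_grad(); loss.backward(); opt.step()
```

At test time, sweeping the code index over its full range yields a set of distinct candidate completions for the same partial input.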
Related papers
- Customize Your Own Paired Data via Few-shot Way [14.193031218059646]
Some supervised methods require huge amounts of paired training data, which greatly limits their use.
Other, unsupervised methods take full advantage of large-scale pre-trained priors, and are thus strictly restricted to the domains those priors were trained on, performing poorly in out-of-distribution cases.
In our proposed framework, a novel few-shot learning mechanism based on directional transformations among samples is introduced, which expands the learnable space exponentially.
arXiv Detail & Related papers (2024-05-21T04:21:35Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the data-hungry nature of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Zero-Episode Few-Shot Contrastive Predictive Coding: Solving intelligence tests without prior training [0.0]
We argue that finding a predictive latent variable and using it to evaluate the consistency of a future image enables data-efficient predictions.
We show that a one-dimensional Markov Contrastive Predictive Coding model solves sequence completion intelligence tests efficiently.
arXiv Detail & Related papers (2022-05-04T07:46:03Z)
- Enhanced Performance of Pre-Trained Networks by Matched Augmentation Distributions [10.74023489125222]
We propose a simple solution to address the train-test distributional shift.
We combine results from multiple random crops of a test image.
This not only matches the train-time augmentation but also provides full coverage of the input image.
arXiv Detail & Related papers (2022-01-19T22:33:00Z)
- Thought Flow Nets: From Single Predictions to Trains of Model Thought [39.619001911390804]
When humans solve complex problems, they rarely come up with a decision right away.
Instead, they start with an intuitive decision, reflect upon it, spot mistakes, resolve contradictions, and jump between different hypotheses.
arXiv Detail & Related papers (2021-07-26T13:56:37Z)
- Warp Consistency for Unsupervised Learning of Dense Correspondences [116.56251250853488]
A key challenge in learning dense correspondences is the lack of ground-truth matches for real image pairs.
We propose Warp Consistency, an unsupervised learning objective for dense correspondence regression.
Our approach sets a new state-of-the-art on several challenging benchmarks, including MegaDepth, RobotCar and TSS.
arXiv Detail & Related papers (2021-04-07T17:58:22Z)
- MT3: Meta Test-Time Training for Self-Supervised Test-Time Adaption [69.76837484008033]
An unresolved problem in deep learning is enabling neural networks to cope with domain shifts at test time.
We combine meta-learning, self-supervision and test-time training to learn to adapt to unseen test distributions.
Our approach significantly improves the state-of-the-art results on the CIFAR-10-Corrupted image classification benchmark.
arXiv Detail & Related papers (2021-03-30T09:33:38Z)
- Recurrent Multi-view Alignment Network for Unsupervised Surface Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z)
- Diverse Plausible Shape Completions from Ambiguous Depth Images [7.652701739127332]
PSSNet is a network architecture for generating plausible 3D reconstructions from a single 2.5D depth image.
We perform experiments using ShapeNet mugs and partially occluded YCB objects and find that our method performs comparably on datasets with little ambiguity.
arXiv Detail & Related papers (2020-11-18T16:42:51Z)