DiverseNet: When One Right Answer is not Enough
- URL: http://arxiv.org/abs/2008.10634v1
- Date: Mon, 24 Aug 2020 18:12:49 GMT
- Title: DiverseNet: When One Right Answer is not Enough
- Authors: Michael Firman, Neill D. F. Campbell, Lourdes Agapito, Gabriel J. Brostow
- Abstract summary: We introduce a simple method for training a neural network, which enables diverse structured predictions to be made for each test-time query.
Our method results in quantitative improvements across three challenging tasks: 2D image completion, 3D volume estimation, and flow prediction.
- Score: 35.764028730120096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many structured prediction tasks in machine vision have a collection of
acceptable answers, instead of one definitive ground truth answer. Segmentation
of images, for example, is subject to human labeling bias. Similarly, there are
multiple possible pixel values that could plausibly complete occluded image
regions. State-of-the-art supervised learning methods are typically optimized
to make a single test-time prediction for each query, failing to find other
modes in the output space. Existing methods that allow for sampling often
sacrifice speed or accuracy.
We introduce a simple method for training a neural network, which enables
diverse structured predictions to be made for each test-time query. For a
single input, we learn to predict a range of possible answers. We compare
favorably to methods that seek diversity through an ensemble of networks. Such
stochastic multiple choice learning faces mode collapse, where one or more
ensemble members fail to receive any training signal. Our best performing
solution can be deployed for various tasks, and just involves small
modifications to the existing single-mode architecture, loss function, and
training regime. We demonstrate that our method results in quantitative
improvements across three challenging tasks: 2D image completion, 3D volume
estimation, and flow prediction.
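To make the abstract's description concrete, here is a minimal PyTorch-style sketch of one way to obtain diverse predictions from a single network: condition the network on a small set of control codes and back-propagate each training example only through the code whose prediction currently fits it best. The toy completion network, the one-hot code injection, and the winner-takes-all selection below are illustrative assumptions consistent with the abstract, not the authors' actual architecture, loss function, or training regime.

```python
# Minimal sketch (assumption): one network conditioned on a control code,
# trained so that different codes specialise to different plausible answers
# for the same input. Not the authors' implementation.
import torch
import torch.nn as nn

NUM_CODES = 8  # number of diverse hypotheses the single network can emit


class DiverseCompletionNet(nn.Module):
    """Toy image-completion network; the control code is injected as extra channels."""

    def __init__(self, in_ch=3, num_codes=NUM_CODES):
        super().__init__()
        self.num_codes = num_codes
        self.body = nn.Sequential(
            nn.Conv2d(in_ch + num_codes, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, in_ch, 3, padding=1),
        )

    def forward(self, x, code_idx):
        # Broadcast a one-hot code over the spatial dimensions and concatenate.
        b, _, h, w = x.shape
        one_hot = torch.zeros(b, self.num_codes, h, w, device=x.device)
        one_hot[torch.arange(b, device=x.device), code_idx] = 1.0
        return self.body(torch.cat([x, one_hot], dim=1))


def winner_takes_all_loss(net, x, target):
    """Evaluate every code and back-propagate only through the best one per sample."""
    per_code = []
    for c in range(net.num_codes):
        idx = torch.full((x.size(0),), c, dtype=torch.long, device=x.device)
        pred = net(x, idx)
        # Per-sample reconstruction error for this hypothesis.
        per_code.append(((pred - target) ** 2).flatten(1).mean(dim=1))
    per_code = torch.stack(per_code, dim=1)   # shape (batch, num_codes)
    best = per_code.min(dim=1).values         # winner per training example
    return best.mean()


# Toy training step on random data.
net = DiverseCompletionNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = torch.rand(4, 3, 32, 32), torch.rand(4, 3, 32, 32)
loss = winner_takes_all_loss(net, x, y)
opt.zero_grad(); loss.backward(); opt.step()
```

At test time, sweeping the code index over its full range yields a set of distinct candidate completions for the same partial input.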
Related papers
- Customize Your Own Paired Data via Few-shot Way [14.193031218059646]
Some supervised methods require huge amounts of paired training data, which greatly limits their use.
Other, unsupervised methods take full advantage of large-scale pre-trained priors, and are thus strictly restricted to the domains those priors were trained on, performing poorly in out-of-distribution cases.
In our proposed framework, a novel few-shot learning mechanism based on directional transformations among samples is introduced, which expands the learnable space exponentially.
arXiv Detail & Related papers (2024-05-21T04:21:35Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the data-hungry nature of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Zero-Episode Few-Shot Contrastive Predictive Coding: Solving intelligence tests without prior training [0.0]
We argue that finding a predictive latent variable and using it to evaluate the consistency of a future image enables data-efficient predictions.
We show that a one-dimensional Markov Contrastive Predictive Coding model solves sequence completion intelligence tests efficiently.
arXiv Detail & Related papers (2022-05-04T07:46:03Z)
- Enhanced Performance of Pre-Trained Networks by Matched Augmentation Distributions [10.74023489125222]
We propose a simple solution to address the train-test distributional shift.
We combine results from multiple random crops of a test image.
This not only matches the train-time augmentation but also provides full coverage of the input image.
arXiv Detail & Related papers (2022-01-19T22:33:00Z)
- Thought Flow Nets: From Single Predictions to Trains of Model Thought [39.619001911390804]
When humans solve complex problems, they rarely come up with a decision right away.
Instead, they start with an intuitive decision, reflect upon it, spot mistakes, resolve contradictions, and jump between different hypotheses.
arXiv Detail & Related papers (2021-07-26T13:56:37Z)
- Warp Consistency for Unsupervised Learning of Dense Correspondences [116.56251250853488]
A key challenge in learning dense correspondences is the lack of ground-truth matches for real image pairs.
We propose Warp Consistency, an unsupervised learning objective for dense correspondence regression.
Our approach sets a new state-of-the-art on several challenging benchmarks, including MegaDepth, RobotCar and TSS.
arXiv Detail & Related papers (2021-04-07T17:58:22Z)
- MT3: Meta Test-Time Training for Self-Supervised Test-Time Adaption [69.76837484008033]
An unresolved problem in deep learning is enabling neural networks to cope with domain shifts at test time.
We combine meta-learning, self-supervision and test-time training to learn to adapt to unseen test distributions.
Our approach significantly improves the state-of-the-art results on the CIFAR-10-Corrupted image classification benchmark.
arXiv Detail & Related papers (2021-03-30T09:33:38Z)
- Recurrent Multi-view Alignment Network for Unsupervised Surface Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z)
- Diverse Plausible Shape Completions from Ambiguous Depth Images [7.652701739127332]
PSSNet is a network architecture for generating plausible 3D reconstructions from a single 2.5D depth image.
We perform experiments using ShapeNet mugs and partially occluded YCB objects and find that our method performs comparably on datasets with little ambiguity.
arXiv Detail & Related papers (2020-11-18T16:42:51Z)