One Size Fits All: Can We Train One Denoiser for All Noise Levels?
- URL: http://arxiv.org/abs/2005.09627v3
- Date: Thu, 16 Jul 2020 20:25:19 GMT
- Title: One Size Fits All: Can We Train One Denoiser for All Noise Levels?
- Authors: Abhiram Gnansambandam, Stanley H. Chan
- Abstract summary: It is often preferred to train one neural network estimator and apply it to all noise levels.
The de facto protocol is to train the estimator with noisy samples whose noise levels are uniformly distributed.
This paper addresses the training-sample distribution problem from a minimax risk optimization perspective.
- Score: 13.46272057205994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When training an estimator such as a neural network for tasks like image
denoising, it is often preferred to train one estimator and apply it to all
noise levels. The de facto training protocol to achieve this goal is to train
the estimator with noisy samples whose noise levels are uniformly distributed
across the range of interest. However, why should we allocate the samples
uniformly? Can we have more training samples that are less noisy, and fewer
samples that are more noisy? What is the optimal distribution? How do we obtain
such a distribution? The goal of this paper is to address this training sample
distribution problem from a minimax risk optimization perspective. We derive a
dual ascent algorithm to determine the optimal sampling distribution, whose
convergence is guaranteed as long as the set of admissible estimators is
closed and convex. For estimators with non-convex admissible sets such as deep
neural networks, our dual formulation converges to a solution of the convex
relaxation. We discuss how the algorithm can be implemented in practice. We
evaluate the algorithm on linear estimators and deep networks.
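The dual ascent procedure is described only at a high level above, so the following is a minimal sketch of the idea, not the authors' implementation. It alternates between training an estimator under the current sampling distribution over noise levels and shifting probability mass toward the noise levels with the highest risk; `train_denoiser` and `per_level_risk` are hypothetical placeholders for the inner training loop and a held-out risk evaluation.

```python
# Hedged sketch of dual ascent over a sampling distribution of noise levels.
# `train_denoiser(noise_levels, pi)` and `per_level_risk(model, sigma)` are
# hypothetical placeholders, not code from the paper.
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    tau = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + tau, 0.0)

def dual_ascent_sampling(noise_levels, train_denoiser, per_level_risk,
                         steps=50, lr=0.1):
    """Return one sampling weight per noise level."""
    pi = np.full(len(noise_levels), 1.0 / len(noise_levels))  # start uniform
    for _ in range(steps):
        model = train_denoiser(noise_levels, pi)                # inner minimization
        risks = np.array([per_level_risk(model, s) for s in noise_levels])
        pi = project_to_simplex(pi + lr * risks)                # dual (ascent) step
    return pi
```

Because the per-level risks play the role of a gradient of the worst-case risk with respect to the sampling weights, the projected ascent step pushes weight toward the hardest noise levels, which is why the optimal distribution generally differs from the uniform one.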
Related papers
- Efficient Neural Network Training via Subset Pretraining [5.352839075466439]
In training neural networks, it is common practice to use partial gradients computed over batches.
The loss minimum of the training set can be expected to be well-approximated by the minima of its subsets.
Experiments confirm that results equivalent to conventional training can be reached.
arXiv Detail & Related papers (2024-10-21T21:31:12Z)
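A minimal sketch of the subset-pretraining idea summarized in the entry above, under the assumption that training is exposed through a generic `sgd_train(model, dataset, epochs)` helper; the subset fraction and epoch counts are illustrative, not values from the paper.

```python
# Sketch: optimize on a small random subset first, then continue on the full
# training set from the resulting parameters.  `sgd_train` is a hypothetical
# (model, dataset, epochs) -> model helper, not code from the paper.
import numpy as np

def subset_pretrain_then_finetune(model, dataset, sgd_train,
                                  subset_frac=0.1, pre_epochs=20, full_epochs=5):
    rng = np.random.default_rng(0)
    n = len(dataset)
    idx = rng.choice(n, size=max(1, int(subset_frac * n)), replace=False)
    subset = [dataset[i] for i in idx]
    model = sgd_train(model, subset, epochs=pre_epochs)   # cheap pretraining pass
    return sgd_train(model, dataset, epochs=full_epochs)  # short full-set refinement
```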
- The Sampling-Gaussian for stereo matching [7.9898209414259425]
The soft-argmax operation is widely adopted in neural network-based stereo matching methods.
Previous methods failed to effectively improve the accuracy and even compromised the efficiency of the network.
We propose a novel supervision method for stereo matching, Sampling-Gaussian.
arXiv Detail & Related papers (2024-10-09T03:57:13Z)
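The soft-argmax operation mentioned in the Sampling-Gaussian entry above is standard; the Gaussian-target supervision below is one plausible reading of the method, with the bandwidth `sigma` and the cross-entropy loss being assumptions for illustration.

```python
# Soft-argmax disparity regression plus a discretized-Gaussian target.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_argmax_disparity(cost_volume):
    """cost_volume: (H, W, D) costs -> ((H, W) disparities, (H, W, D) probabilities)."""
    prob = softmax(-cost_volume, axis=-1)                  # low cost -> high probability
    d = np.arange(cost_volume.shape[-1])
    return (prob * d).sum(axis=-1), prob

def gaussian_target(gt_disp, num_disp, sigma=1.0):
    """Discretized Gaussian centered at the ground-truth disparity map (H, W)."""
    d = np.arange(num_disp)
    g = np.exp(-0.5 * ((d - gt_disp[..., None]) / sigma) ** 2)
    return g / g.sum(axis=-1, keepdims=True)

def distribution_loss(prob, target, eps=1e-12):
    """Cross-entropy between the Gaussian target and the predicted distribution."""
    return -(target * np.log(prob + eps)).sum(axis=-1).mean()
```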
- Learning conditional distributions on continuous spaces [0.0]
We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes.
We employ two distinct clustering schemes: one based on a fixed-radius ball and the other on nearest neighbors.
We propose to incorporate the nearest neighbors method into neural network training, as our empirical analysis indicates it has better performance in practice.
arXiv Detail & Related papers (2024-06-13T17:53:47Z)
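For the conditional-distribution entry above, the nearest-neighbor scheme can be sketched as follows: approximate the law of Y given X = x by the empirical distribution of Y over the k training points whose X is closest to x. The value k = 32 and the Euclidean metric are assumptions for illustration.

```python
# k-nearest-neighbor approximation of a conditional distribution.
import numpy as np

def knn_conditional_samples(X, Y, x_query, k=32):
    """Y-values of the k nearest neighbors of x_query in X (an empirical sample of Y | X ~ x)."""
    dist = np.linalg.norm(X - x_query, axis=1)
    idx = np.argpartition(dist, k)[:k]
    return Y[idx]

def knn_conditional_mean(X, Y, x_query, k=32):
    return knn_conditional_samples(X, Y, x_query, k).mean(axis=0)
```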
- Distributed Extra-gradient with Optimal Complexity and Communication Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
arXiv Detail & Related papers (2023-08-17T21:15:04Z)
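The extra-gradient update referred to in the entry above is classical; the sketch below shows it for a monotone operator F, with a toy unbiased stochastic-rounding compressor standing in for quantized communication. This is not the Q-GenX method itself, only the baseline update it builds on.

```python
# Extra-gradient for a monotone operator F with an unbiased toy compressor.
import numpy as np

def unbiased_round(v, levels=256, rng=np.random.default_rng(0)):
    """Stochastic rounding to a grid; unbiased, i.e. E[unbiased_round(v)] = v."""
    scale = (np.abs(v).max() + 1e-12) / levels
    low = np.floor(v / scale)
    p_up = v / scale - low
    return scale * (low + (rng.random(v.shape) < p_up))

def extragradient(F, x0, step=0.1, iters=1000, compress=unbiased_round):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x_half = x - step * compress(F(x))        # extrapolation ("look-ahead") step
        x = x - step * compress(F(x_half))        # update using the look-ahead operator value
    return x

# Example: the saddle point of f(u, v) = u * v, i.e. F(x) = (v, -u); solution (0, 0).
sol = extragradient(lambda z: np.array([z[1], -z[0]]), x0=[1.0, 1.0])
```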
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation, which frames this self-supervised task as an estimation problem for an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
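A minimal numpy sketch of the noise-contrastive estimation objective mentioned in the entry above: fit an energy-based model by logistic discrimination between data samples and samples from a known noise distribution. The quadratic energy model and standard-normal noise are illustrative assumptions, not the paper's setup.

```python
# NCE loss: logistic regression on the log-ratio G(x) = log p_model(x) - log p_noise(x).
import numpy as np

def log_noise(x):                      # standard normal noise density
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_model(x, theta):               # quadratic energy plus a learned log-normalizer
    w, c = theta
    return -0.5 * w * x**2 + c

def nce_loss(theta, x_data, x_noise):
    g_data = log_model(x_data, theta) - log_noise(x_data)
    g_noise = log_model(x_noise, theta) - log_noise(x_noise)
    return (np.logaddexp(0.0, -g_data).mean()      # -log sigmoid(G) on data
            + np.logaddexp(0.0, g_noise).mean())   # -log sigmoid(-G) on noise
```

Minimizing this loss over `theta` recovers both the energy parameters and the normalizing constant; which noise distribution to use is exactly the question studied in this entry and the next.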
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from the widely held assumption that the optimal noise distribution should equal the data distribution can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
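As background for the particle-filtering entry above, the sketch below is a standard bootstrap particle filter with a scalar state; the paper learns the sampling (proposal) distribution, which is hand-specified here via `transition_sample`. All model functions are hypothetical placeholders.

```python
# Bootstrap particle filter: propagate, weight by the observation likelihood,
# estimate, and resample.  Assumes a scalar latent state for simplicity.
import numpy as np

def bootstrap_particle_filter(observations, init_sample, transition_sample,
                              obs_likelihood, num_particles=500,
                              rng=np.random.default_rng(0)):
    particles = init_sample(num_particles, rng)              # x_0^i ~ p(x_0)
    estimates = []
    for y in observations:
        particles = transition_sample(particles, rng)        # x_t^i ~ p(x_t | x_{t-1}^i)
        w = obs_likelihood(y, particles)                      # weight by p(y_t | x_t^i)
        w = w / w.sum()
        estimates.append((w * particles).sum())               # posterior-mean estimate
        idx = rng.choice(num_particles, size=num_particles, p=w)
        particles = particles[idx]                            # multinomial resampling
    return np.array(estimates)
```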
- Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks [3.075766050800645]
Training Graph Convolutional Networks (GCNs) is expensive as it needs to aggregate data from neighboring nodes.
Previous works have proposed various neighbor sampling methods that estimate the aggregation result based on a small number of sampled neighbors.
We present an algorithm that determines the local sampling probabilities and ensures that the skewed neighbor sampling does not significantly affect the convergence of training.
arXiv Detail & Related papers (2021-01-19T16:12:44Z)
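In the spirit of the GCN-sampling entry above, the sketch below estimates a node's neighborhood aggregation from a few sampled neighbors and reweights them so the estimate of the full neighborhood sum remains unbiased. Choosing probabilities proportional to feature norms is an illustrative assumption, not the paper's rule for the local sampling probabilities.

```python
# Importance-weighted neighbor sampling for one GCN aggregation step.
import numpy as np

def sampled_aggregate(neighbor_feats, num_samples=5, rng=np.random.default_rng(0)):
    """Unbiased estimate of sum_j h_j from `num_samples` sampled neighbors."""
    norms = np.linalg.norm(neighbor_feats, axis=1) + 1e-12
    p = norms / norms.sum()                                   # local sampling probabilities
    idx = rng.choice(len(p), size=num_samples, p=p)           # sample with replacement
    return (neighbor_feats[idx] / p[idx, None]).mean(axis=0)  # Horvitz-Thompson reweighting
```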
- Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.