Differentiable Adaptive Computation Time for Visual Reasoning
- URL: http://arxiv.org/abs/2004.12770v3
- Date: Fri, 22 May 2020 16:57:14 GMT
- Title: Differentiable Adaptive Computation Time for Visual Reasoning
- Authors: Cristobal Eyzaguirre, Alvaro Soto
- Abstract summary: This paper presents a novel attention-based algorithm for achieving adaptive computation called DACT.
In particular, we study its application to the widely known MAC architecture.
We show that by increasing the maximum number of steps used, we surpass the accuracy of even our best non-adaptive MAC in the CLEVR dataset.
- Score: 4.7518908453572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel attention-based algorithm for achieving adaptive
computation called DACT, which, unlike existing ones, is end-to-end
differentiable. Our method can be used in conjunction with many networks; in
particular, we study its application to the widely known MAC architecture,
obtaining a significant reduction in the number of recurrent steps needed to
achieve similar accuracies, thereby improving its performance-to-computation
ratio. Furthermore, we show that by increasing the maximum number of steps
used, we surpass the accuracy of even our best non-adaptive MAC in the CLEVR
dataset, demonstrating that our approach is able to control the number of steps
without significant loss of performance. Additional advantages provided by our
approach include considerably improving interpretability by discarding useless
steps and providing more insights into the underlying reasoning process.
Finally, we present adaptive computation as equivalent to an ensemble of
models, similar to a mixture-of-experts formulation. Both the code and the
configuration files for our experiments are made available to support further
research in this area.
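As a rough illustration of the general idea behind differentiable adaptive computation (a minimal sketch only, not the authors' exact DACT formulation), the snippet below runs a recurrent cell for up to a fixed budget of steps, emits a candidate answer and a sigmoid halting signal at each step, and returns a convex combination of the candidates, so the effective number of steps remains end-to-end differentiable. The module name, the GRU cell, and all sizes are illustrative placeholders.
```python
import torch
import torch.nn as nn


class AdaptiveStepCombiner(nn.Module):
    """Minimal sketch (not the authors' exact DACT formulation): run a
    recurrent cell for up to max_steps, emit a candidate output and a
    sigmoid halting signal at each step, and combine the candidates with
    differentiable weights so gradients can shape how many steps matter."""

    def __init__(self, hidden_dim: int, out_dim: int, max_steps: int = 12):
        super().__init__()
        self.cell = nn.GRUCell(hidden_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, out_dim)  # per-step candidate answer
        self.halt = nn.Linear(hidden_dim, 1)        # per-step halting signal
        self.max_steps = max_steps

    def forward(self, x):
        batch = x.size(0)
        h = torch.zeros(batch, self.cell.hidden_size, device=x.device)
        remaining = torch.ones(batch, 1, device=x.device)  # probability mass not yet spent
        combined = torch.zeros(batch, self.head.out_features, device=x.device)
        for _ in range(self.max_steps):
            h = self.cell(x, h)
            p_halt = torch.sigmoid(self.halt(h))         # soft "stop here" signal in (0, 1)
            weight = remaining * p_halt                  # mass assigned to this step
            combined = combined + weight * self.head(h)  # ensemble-like weighted sum
            remaining = remaining * (1.0 - p_halt)       # mass carried to later steps
        # leftover mass goes to the final step's candidate so the weights sum to 1
        return combined + remaining * self.head(h)


# Hypothetical usage: 32 question/image encodings of size 256, 28 answer classes
model = AdaptiveStepCombiner(hidden_dim=256, out_dim=28)
logits = model(torch.randn(32, 256))
print(logits.shape)  # torch.Size([32, 28])
```
Because the final output is a weighted sum of per-step candidates, this also makes the ensemble / mixture-of-experts reading in the abstract concrete: each step acts as an expert whose contribution is gated by the halting weights.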
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
- Efficient Computation of Sparse and Robust Maximum Association Estimators [0.5156484100374059]
High-dimensional empirical examples underline the usefulness of this procedure.
A combination of a Lagrangian algorithm and sparse descent is implemented to also include suitable constraints for inducing sparsity.
arXiv Detail & Related papers (2023-11-29T11:57:50Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Federated Learning via Inexact ADMM [46.99210047518554]
In this paper, we develop an inexact alternating direction method of multipliers (ADMM).
It is both computation- and communication-efficient, capable of combating the stragglers' effect, and convergent under mild conditions.
It achieves high numerical performance compared with several state-of-the-art algorithms for federated learning.
arXiv Detail & Related papers (2022-04-22T09:55:33Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent [8.714458129632158]
Kolmogorov model (KM) is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables.
We propose a computationally scalable KM learning algorithm, based on the regularized dual optimization combined with enhanced gradient descent (GD) method.
It is shown that the accuracy of logical relation mining for interpretability using the proposed KM learning algorithm exceeds $80\%$.
arXiv Detail & Related papers (2021-07-11T10:33:02Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and requires no gradient computations (see the sketch after this list).
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- Adaptive Discretization for Model-Based Reinforcement Learning [10.21634042036049]
We introduce the technique of adaptive discretization to design an efficient model-based episodic reinforcement learning algorithm.
Our algorithm is based on optimistic one-step value iteration extended to maintain an adaptive discretization of the space.
arXiv Detail & Related papers (2020-07-01T19:36:46Z)
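The finite-difference score matching entry above relies on approximating directional derivatives using only function evaluations. Below is a minimal sketch of the standard first-order central-difference estimate (illustrative only; the paper's scheme covers any-order derivatives, and the function f, direction v, and step eps here are hypothetical placeholders).
```python
import numpy as np


def directional_derivative(f, x, v, eps=1e-4):
    """Central-difference estimate of the derivative of f at x along v:
    (f(x + eps*v) - f(x - eps*v)) / (2*eps). It needs only two function
    evaluations, which can run in parallel, and no gradient computations."""
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)


# Tiny usage example with a quadratic, where the exact answer is 2 * (x . v)
x = np.array([1.0, -2.0, 0.5])
v = np.array([0.0, 1.0, 0.0])
approx = directional_derivative(lambda z: np.sum(z ** 2), x, v)
print(approx)  # close to 2 * x[1] = -4.0
```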
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.