Differentiable Adaptive Computation Time for Visual Reasoning
- URL: http://arxiv.org/abs/2004.12770v3
- Date: Fri, 22 May 2020 16:57:14 GMT
- Title: Differentiable Adaptive Computation Time for Visual Reasoning
- Authors: Cristobal Eyzaguirre, Alvaro Soto
- Abstract summary: This paper presents a novel attention-based algorithm for achieving adaptive computation called DACT.
In particular, we study its application to the widely known MAC architecture.
We show that by increasing the maximum number of steps used, we surpass the accuracy of even our best non-adaptive MAC in the CLEVR dataset.
- Score: 4.7518908453572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel attention-based algorithm for achieving adaptive
computation called DACT, which, unlike existing ones, is end-to-end
differentiable. Our method can be used in conjunction with many networks; in
particular, we study its application to the widely known MAC architecture,
obtaining a significant reduction in the number of recurrent steps needed to
achieve similar accuracies, thereby improving its performance-to-computation
ratio. Furthermore, we show that by increasing the maximum number of steps
used, we surpass the accuracy of even our best non-adaptive MAC in the CLEVR
dataset, demonstrating that our approach is able to control the number of steps
without significant loss of performance. Additional advantages provided by our
approach include considerably improving interpretability by discarding useless
steps and providing more insights into the underlying reasoning process.
Finally, we present adaptive computation as equivalent to an ensemble of
models, similar to a mixture-of-experts formulation. Both the code and the
configuration files for our experiments are made available to support further
research in this area.
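As a rough illustration of the general idea behind differentiable adaptive computation (a minimal sketch only, not the authors' exact DACT formulation), the snippet below runs a recurrent cell for up to a fixed budget of steps, emits a candidate answer and a sigmoid halting signal at each step, and returns a convex combination of the candidates, so the effective number of steps remains end-to-end differentiable. The module name, the GRU cell, and all sizes are illustrative placeholders.
```python
import torch
import torch.nn as nn


class AdaptiveStepCombiner(nn.Module):
    """Minimal sketch (not the authors' exact DACT formulation): run a
    recurrent cell for up to max_steps, emit a candidate output and a
    sigmoid halting signal at each step, and combine the candidates with
    differentiable weights so gradients can shape how many steps matter."""

    def __init__(self, hidden_dim: int, out_dim: int, max_steps: int = 12):
        super().__init__()
        self.cell = nn.GRUCell(hidden_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, out_dim)  # per-step candidate answer
        self.halt = nn.Linear(hidden_dim, 1)        # per-step halting signal
        self.max_steps = max_steps

    def forward(self, x):
        batch = x.size(0)
        h = torch.zeros(batch, self.cell.hidden_size, device=x.device)
        remaining = torch.ones(batch, 1, device=x.device)  # probability mass not yet spent
        combined = torch.zeros(batch, self.head.out_features, device=x.device)
        for _ in range(self.max_steps):
            h = self.cell(x, h)
            p_halt = torch.sigmoid(self.halt(h))         # soft "stop here" signal in (0, 1)
            weight = remaining * p_halt                  # mass assigned to this step
            combined = combined + weight * self.head(h)  # ensemble-like weighted sum
            remaining = remaining * (1.0 - p_halt)       # mass carried to later steps
        # leftover mass goes to the final step's candidate so the weights sum to 1
        return combined + remaining * self.head(h)


# Hypothetical usage: 32 question/image encodings of size 256, 28 answer classes
model = AdaptiveStepCombiner(hidden_dim=256, out_dim=28)
logits = model(torch.randn(32, 256))
print(logits.shape)  # torch.Size([32, 28])
```
Because the final output is a weighted sum of per-step candidates, this also makes the ensemble / mixture-of-experts reading in the abstract concrete: each step acts as an expert whose contribution is gated by the halting weights.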
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
- Efficient Computation of Sparse and Robust Maximum Association Estimators [0.5156484100374059]
High-dimensional empirical examples underline the usefulness of this procedure.
A combination of a Lagrangian algorithm and sparse descent is implemented to also include suitable constraints for inducing sparsity.
arXiv Detail & Related papers (2023-11-29T11:57:50Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Federated Learning via Inexact ADMM [46.99210047518554]
In this paper, we develop an inexact alternating direction method of multipliers (ADMM).
It is both computation- and communication-efficient, capable of combating the stragglers' effect, and convergent under mild conditions.
It achieves high numerical performance compared with several state-of-the-art algorithms for federated learning.
arXiv Detail & Related papers (2022-04-22T09:55:33Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent [8.714458129632158]
Kolmogorov model (KM) is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables.
We propose a computationally scalable KM learning algorithm, based on the regularized dual optimization combined with enhanced gradient descent (GD) method.
It is shown that the accuracy of logical relation mining for interpretability using the proposed KM learning algorithm exceeds $80\%$.
arXiv Detail & Related papers (2021-07-11T10:33:02Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and requires no gradient computations (see the sketch after this list).
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- Adaptive Discretization for Model-Based Reinforcement Learning [10.21634042036049]
We introduce the technique of adaptive discretization to design an efficient model-based episodic reinforcement learning algorithm.
Our algorithm is based on optimistic one-step value iteration extended to maintain an adaptive discretization of the space.
arXiv Detail & Related papers (2020-07-01T19:36:46Z)
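The finite-difference score matching entry above relies on approximating directional derivatives using only function evaluations. Below is a minimal sketch of the standard first-order central-difference estimate (illustrative only; the paper's scheme covers any-order derivatives, and the function f, direction v, and step eps here are hypothetical placeholders).
```python
import numpy as np


def directional_derivative(f, x, v, eps=1e-4):
    """Central-difference estimate of the derivative of f at x along v:
    (f(x + eps*v) - f(x - eps*v)) / (2*eps). It needs only two function
    evaluations, which can run in parallel, and no gradient computations."""
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)


# Tiny usage example with a quadratic, where the exact answer is 2 * (x . v)
x = np.array([1.0, -2.0, 0.5])
v = np.array([0.0, 1.0, 0.0])
approx = directional_derivative(lambda z: np.sum(z ** 2), x, v)
print(approx)  # close to 2 * x[1] = -4.0
```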
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.