Score-Based Methods for Discrete Optimization in Deep Learning
- URL: http://arxiv.org/abs/2310.09890v1
- Date: Sun, 15 Oct 2023 17:14:17 GMT
- Title: Score-Based Methods for Discrete Optimization in Deep Learning
- Authors: Eric Lei, Arman Adibi, Hamed Hassani
- Abstract summary: We investigate a score-based approximation framework to solve such problems.
We experimentally demonstrate, in adversarial set classification tasks, that our method achieves a superior trade-off between speed and solution quality compared to heuristic methods.
- Score: 30.446056972242616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrete optimization problems often arise in deep learning tasks, despite
the fact that neural networks typically operate on continuous data. One class
of these problems involves objective functions that depend on neural networks,
but whose optimization variables are discrete. Although the discrete
optimization literature provides efficient algorithms, they are still
impractical in these settings due to the high cost of an objective function
evaluation, which involves a neural network forward-pass. In particular, they
require $O(n)$ complexity per iteration, but real data such as point clouds
have values of $n$ in the thousands or more. In this paper, we investigate a
score-based approximation framework to solve such problems. This framework uses
a score function as a proxy for the marginal gain of the objective, leveraging
embeddings of the discrete variables and the speed of auto-differentiation
frameworks to compute backward-passes in parallel. We experimentally
demonstrate, in adversarial set classification tasks, that our method achieves
a superior trade-off in terms of speed and solution quality compared to
heuristic methods.
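The score-as-proxy idea can be illustrated with a toy sketch (this is not the authors' implementation: the objective below is a hand-written quadratic standing in for a neural-network objective, and its gradient is computed analytically in place of an autodiff backward pass). One gradient evaluation yields all $n$ scores at once, whereas exact marginal gains need $n$ extra objective evaluations:

```python
# Toy illustration of a score-based proxy: the gradient of a relaxed
# objective F at the current point x approximates all n marginal gains
# F(x + e_i) - F(x) in a single pass, instead of n separate evaluations.

def F(x, w, lam):
    # stand-in objective: linear utility minus a coupling penalty
    s = sum(x)
    return sum(wi * xi for wi, xi in zip(w, x)) - lam * s * s

def grad_F(x, w, lam):
    # analytic gradient dF/dx_i = w_i - 2*lam*sum(x); in the paper's
    # setting this would come from one autodiff backward pass
    s = sum(x)
    return [wi - 2 * lam * s for wi in w]

def exact_marginal_gains(x, w, lam):
    # brute force: n forward evaluations, one per candidate item
    base = F(x, w, lam)
    gains = []
    for i in range(len(x)):
        if x[i] == 0:
            y = list(x)
            y[i] = 1
            gains.append(F(y, w, lam) - base)
        else:
            gains.append(float("-inf"))  # item already selected
    return gains

w = [3.0, 1.0, 2.5, 0.5]
x = [0, 0, 0, 0]
lam = 0.1
scores = grad_F(x, w, lam)
gains = exact_marginal_gains(x, w, lam)
best_by_score = max(range(len(x)), key=lambda i: scores[i])
best_by_gain = max(range(len(x)), key=lambda i: gains[i])
print(best_by_score, best_by_gain)  # both pick item 0
```

Here the gradient-based score and the exact marginal gain agree on which item to add next, while the score requires only one pass regardless of $n$.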
Related papers
- Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA).
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z) - Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z) - Multilevel Objective-Function-Free Optimization with an Application to
Neural Networks Training [0.0]
A class of multi-level algorithms for unconstrained nonlinear optimization is presented.
The choice of avoiding the evaluation of the objective function is intended to make the algorithms less sensitive to noise.
The evaluation complexity of these algorithms is analyzed and their behaviour in the presence of noise is illustrated.
arXiv Detail & Related papers (2023-02-14T14:03:22Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Training Neural Networks using SAT solvers [1.0152838128195465]
We propose an algorithm that uses SAT solvers as a global optimisation method for training a neural network.
In the experiments, we demonstrate the effectiveness of our algorithm against the ADAM optimiser in certain tasks like parity learning.
arXiv Detail & Related papers (2022-06-10T01:31:12Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
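The fixed-point computation described above can be sketched in a few lines (a scalar toy layer, not the paper's model; the layer `f` and its contraction constant are illustrative choices that guarantee simple iteration converges):

```python
import math

# Minimal deep-equilibrium-style sketch: instead of stacking layers, the
# "infinite-depth" output z* solves z = f(z, x) for a single layer f.
# Here f(z, x) = tanh(a*z + x) with |a| < 1, so the map is a contraction
# and plain fixed-point iteration converges.

def f(z, x, a=0.5):
    return math.tanh(a * z + x)

def deq_forward(x, tol=1e-12, max_iter=1000):
    z = 0.0
    for _ in range(max_iter):
        z_next = f(z, x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

z_star = deq_forward(0.3)
# at the fixed point, z* = f(z*, x) holds up to tolerance
print(abs(z_star - f(z_star, 0.3)) < 1e-8)
```

Real DEQs use accelerated root-finders (e.g. Broyden's method) and implicit differentiation rather than naive iteration, but the fixed-point condition is the same.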
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - A Sparse Structure Learning Algorithm for Bayesian Network
Identification from Discrete High-Dimensional Data [0.40611352512781856]
This paper addresses the problem of learning a sparse structure Bayesian network from high-dimensional discrete data.
We propose a score function that satisfies the sparsity and the DAG property simultaneously.
Specifically, we use a variance-reducing method in our optimization algorithm so that it works efficiently on high-dimensional data.
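As one concrete example of a smooth score enforcing the DAG property, the NOTEARS penalty $h(W) = \mathrm{tr}(e^{W \circ W}) - d$ (Zheng et al., 2018) vanishes exactly on acyclic weighted graphs; this is a well-known construction shown for illustration, not necessarily the score this particular paper proposes:

```python
from math import factorial

# NOTEARS-style smooth acyclicity penalty, evaluated via a truncated
# power series: tr(exp(B)) - d = sum_{k>=1} tr(B^k)/k!, with B = W∘W
# (elementwise square). The sum is zero iff the graph of W is acyclic.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def dag_penalty(W, terms=20):
    n = len(W)
    B = [[W[i][j] ** 2 for j in range(n)] for i in range(n)]
    P = [row[:] for row in B]          # P = B^1
    h = trace(P) / factorial(1)
    for k in range(2, terms + 1):
        P = matmul(P, B)               # P = B^k
        h += trace(P) / factorial(k)
    return h

acyclic = [[0.0, 1.0], [0.0, 0.0]]     # edge 0 -> 1 only
cyclic = [[0.0, 1.0], [1.0, 0.0]]      # 0 -> 1 -> 0
print(dag_penalty(acyclic), dag_penalty(cyclic))
```

A penalty like this can be added to a sparsity-regularized likelihood so that a single continuous objective captures both the sparsity and the DAG constraint.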
arXiv Detail & Related papers (2021-08-21T12:21:01Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize over their local data.
Our algorithm is based on a new concentration technique, and we analyze its sample complexity.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve the learning performance at each task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z) - Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
arXiv Detail & Related papers (2020-02-20T11:11:32Z)
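The core construction, replacing a piecewise-constant argmax with its expectation under random perturbations, can be sketched with a Monte Carlo estimate (a minimal sketch of the general idea, not the paper's full framework or gradient estimators):

```python
import random

# argmax(theta) is piecewise constant in theta, so its gradient is zero
# almost everywhere. The Monte Carlo average of argmax(theta + eps * Z),
# Z ~ N(0, I), is instead a smooth function of theta whose entries are
# selection probabilities.

def one_hot_argmax(theta):
    i = max(range(len(theta)), key=lambda j: theta[j])
    out = [0.0] * len(theta)
    out[i] = 1.0
    return out

def perturbed_argmax(theta, eps=1.0, n_samples=2000, seed=0):
    rng = random.Random(seed)
    acc = [0.0] * len(theta)
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in theta]
        y = one_hot_argmax([t + eps * zi for t, zi in zip(theta, z)])
        acc = [a + yi for a, yi in zip(acc, y)]
    return [a / n_samples for a in acc]

theta = [2.0, 1.0, 0.5]
p = perturbed_argmax(theta)
print(p)  # a soft distribution concentrated on index 0
```

Shrinking `eps` sharpens the output toward the hard argmax, while larger `eps` yields a smoother, more exploratory relaxation; the same recipe applies to any discrete solver in place of `one_hot_argmax`.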
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.