Neural Lattice Reduction: A Self-Supervised Geometric Deep Learning Approach
- URL: http://arxiv.org/abs/2311.08170v2
- Date: Mon, 10 Feb 2025 13:57:58 GMT
- Title: Neural Lattice Reduction: A Self-Supervised Geometric Deep Learning Approach
- Authors: Giovanni Luca Marchetti, Gabriele Cesa, Pratik Kumar, Arash Behboodi
- Abstract summary: We show that it is possible to parametrize the algorithm space for the lattice reduction problem with neural networks and find an algorithm without supervised data. We design a deep neural model outputting factorized unimodular matrices and train it in a self-supervised manner by penalizing non-orthogonal lattice bases. We show that this approach yields an algorithm with comparable complexity and performance to the Lenstra-Lenstra-Lovász algorithm on a set of benchmarks.
- Score: 12.679411410749521
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lattice reduction is a combinatorial optimization problem aimed at finding the most orthogonal basis in a given lattice. The Lenstra-Lenstra-Lovász (LLL) algorithm is the best algorithm in the literature for solving this problem. In light of recent research on algorithm discovery, in this work we ask: is it possible to parametrize the algorithm space for the lattice reduction problem with neural networks and find an algorithm without supervised data? Our strategy is to use equivariant and invariant parametrizations and to train in a self-supervised way. We design a deep neural model outputting factorized unimodular matrices and train it in a self-supervised manner by penalizing non-orthogonal lattice bases. We incorporate the symmetries of lattice reduction into the model by making it invariant to isometries and scaling of the ambient space and equivariant with respect to the hyperoctahedral group permuting and flipping the lattice basis elements. We show that this approach yields an algorithm with comparable complexity and performance to the LLL algorithm on a set of benchmarks. Additionally, motivated by certain applications in wireless communication, we extend our method to a convolutional architecture which performs joint reduction of spatially-correlated lattices arranged in a grid, thereby amortizing its cost over multiple lattices.
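The abstract specifies the training signal only at a high level (penalizing non-orthogonal lattice bases). Below is a minimal sketch of such a self-supervised loss, assuming the orthogonality defect as the penalty and treating the network that outputs the factorized unimodular matrix U as a black box; the function names and the choice of penalty are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumption, not the authors' code): a self-supervised
# penalty on non-orthogonal lattice bases, here taken to be the
# orthogonality defect of the candidate reduced basis B @ U.
import torch

def orthogonality_defect(B: torch.Tensor) -> torch.Tensor:
    """prod_i ||b_i|| / sqrt(det(B^T B)) for a basis whose columns are b_i.
    Equals 1 exactly when the basis is orthogonal, and grows otherwise."""
    col_norms = torch.linalg.norm(B, dim=0)
    gram = B.T @ B
    # Work in the log domain for numerical stability.
    log_defect = col_norms.log().sum() - 0.5 * torch.logdet(gram)
    return log_defect.exp()

def self_supervised_loss(B: torch.Tensor, U: torch.Tensor) -> torch.Tensor:
    """Penalize non-orthogonality of the reduced basis B @ U, where U is the
    (factorized) unimodular matrix proposed by the model."""
    return orthogonality_defect(B @ U)

# Toy usage: a random full-rank basis and the identity as a trivial "reduction".
B = torch.randn(4, 4, dtype=torch.float64)
U = torch.eye(4, dtype=torch.float64)
loss = self_supervised_loss(B, U)  # >= 1; equal to 1 only for an orthogonal basis

Because U is unimodular, B @ U spans the same lattice as B, so driving this penalty toward 1 searches over bases of the fixed lattice without any supervised targets.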
Related papers
- An Efficient Alternating Algorithm for ReLU-based Symmetric Matrix Decomposition [0.0]
This paper focuses on exploiting the low-rank structure of non-negative and sparse matrices via the rectified linear unit (ReLU) activation function.
We propose the ReLU-based nonlinear symmetric matrix decomposition (ReLU-NSMD) model, introduce an accelerated alternating partial Bregman (AAPB) method for its solution, and present the algorithm's convergence results.
arXiv Detail & Related papers (2025-03-21T04:32:53Z) - Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks [15.074950361970194]
We provide a unified analysis for a family of algorithms that encompasses IRLS, the recently proposed lin-RFM algorithm, and alternating minimization on linear diagonal neural networks.
We show that, with an appropriately chosen reweighting policy, favorable performance can be achieved on a handful of sparse structures.
We also show that leveraging this in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
arXiv Detail & Related papers (2024-06-04T20:37:17Z) - ADMM-MM Algorithm for General Tensor Decomposition [7.0326155922512275]
The proposed algorithm supports three basic loss functions ($\ell_2$-loss, $\ell_1$-loss and KL divergence) and various low-rank tensor decomposition models (CP, Tucker, TT, and TR decompositions).
We show that wide-range applications can be solved by the proposed algorithm, and can be easily extended to any established tensor decomposition models in a plug-and-play manner.
arXiv Detail & Related papers (2023-12-19T00:17:34Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints [9.301728976515255]
This article provides new practical and theoretical developments for the landing algorithm.
First, the method is extended to the Stiefel manifold.
We also consider variance reduction algorithms when the cost function is an average of many functions.
arXiv Detail & Related papers (2023-03-29T07:36:54Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling the differentiable and non-differentiable parts separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Simplifying Momentum-based Positive-definite Submanifold Optimization
with Applications to Deep Learning [24.97120654216651]
We show how to solve difficult differential equations with momentum on a submanifold.
We do so by proposing a generalized version of the Riemannian normal coordinates.
We use our approach to simplify existing approaches for structured covariances and develop matrix-inverse-free $2^\text{nd}$-order optimizers for deep learning with low precision by using only matrix multiplications.
arXiv Detail & Related papers (2023-02-20T03:31:11Z) - Implicit Regularization for Group Sparsity [33.487964460794764]
We show that gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure.
We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates.
In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression.
arXiv Detail & Related papers (2023-01-29T20:54:03Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning
Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - Adaptive Stochastic Optimisation of Nonconvex Composite Objectives [2.1700203922407493]
We propose and analyse a family of generalised composite mirror descent algorithms.
With adaptive step sizes, the proposed algorithms converge without requiring prior knowledge of the problem.
We exploit the low-dimensional structure of the decision sets for high-dimensional problems.
arXiv Detail & Related papers (2022-11-21T18:31:43Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Graph Polynomial Convolution Models for Node Classification of
Non-Homophilous Graphs [52.52570805621925]
We investigate efficient learning from higher-order graph convolution and learning directly from the adjacency matrix for node classification.
We show that the resulting model leads to new graphs and a residual scaling parameter.
We demonstrate that the proposed methods obtain improved accuracy for node classification on non-homophilous graphs.
arXiv Detail & Related papers (2022-09-12T04:46:55Z) - The Dynamics of Riemannian Robbins-Monro Algorithms [101.29301565229265]
We propose a family of Riemannian algorithms generalizing and extending the seminal approximation framework of Robbins and Monro.
Compared to their Euclidean counterparts, Riemannian algorithms are much less understood due to the lack of a global linear structure on the manifold.
We provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes.
arXiv Detail & Related papers (2022-06-14T12:30:11Z) - Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z) - An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural
Network with Group Sparsity [13.27709100571336]
A leaky ReLU network with a group regularization term has been widely used in recent years.
We show that there is a lack of approaches to compute a stationary point deterministically.
We propose an inexact augmented Lagrangian algorithm for solving the new model.
arXiv Detail & Related papers (2022-05-11T11:53:15Z) - Nonlinear matrix recovery using optimization on the Grassmann manifold [18.655422834567577]
We investigate the problem of recovering a partially observed high-rank clustering matrix whose columns obey a nonlinear structure such as a union of subspaces.
We show that the alternating scheme converges to a unique limit point using the Kurdyka-Łojasiewicz property.
arXiv Detail & Related papers (2021-09-13T16:13:13Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via
GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - A Parallelizable Lattice Rescoring Strategy with Neural Language Models [62.20538383769179]
A posterior-based lattice expansion algorithm is proposed for efficient lattice rescoring with neural language models (LMs) for automatic speech recognition.
Experiments on the Switchboard dataset show that the proposed rescoring strategy obtains comparable recognition performance.
The parallel rescoring method offers more flexibility by simplifying the integration of PyTorch-trained neural LMs for lattice rescoring with Kaldi.
arXiv Detail & Related papers (2021-03-08T21:23:12Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)