The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
- URL: http://arxiv.org/abs/2410.18441v1
- Date: Thu, 24 Oct 2024 05:29:20 GMT
- Title: The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
- Authors: Fulu Li
- Abstract summary: We explore and discuss potential further enhancements to current state-of-the-art methods for key underlying technologies of generative AI models.
In particular, we present an optimal solution for sub-word encoding (SWE) based on initial settings similar to those of the byte-pair encoding (BPE) algorithm.
- Abstract: In this paper, we give an in-depth analysis of the mathematical problem formulations and the probabilistic optimization explorations for some of the key components of the Transformer model [33] in the field of generative AI. We explore and discuss potential further enhancements to current state-of-the-art methods for some key underlying technologies of generative AI models from an algorithmic and probabilistic optimization perspective. In particular, we present an optimal solution for sub-word encoding (SWE) based on initial settings similar to those of the byte-pair encoding (BPE) algorithm in [9], with objectives similar to those of the WordPiece approach in [28, 31], namely to maximize the likelihood of the training data. We also present a cross-entropy optimization method to optimize hyperparameters of the word2vec model [17]. In addition, we propose a factored combination of rotary positional encoding (RoPE) [32] and attention with linear biases (ALiBi) [23] with a harmonic series. We also present a probabilistic FlashAttention [6, 7] (PrFlashAttention) method with a probability distribution over block distances in the matrix to decide which block is likely to participate in a given round of attention computation, while maintaining the lower-triangular shape of the tensor for autoregressive language models by re-shaping the tensors. Finally, we present staircase adaptive quantization (SAQ) of the key-value (KV) cache for multi-query attention (MQA), based on the framework presented in [16], to achieve gradual quantization degradation while maintaining reasonable model quality and cost savings.
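To make the sub-word encoding discussion concrete, here is a minimal sketch of a BPE-style merge loop that, in the spirit of the WordPiece objective, greedily merges the adjacent symbol pair with the largest training-likelihood gain. The scoring function and toy token stream are illustrative assumptions, not the paper's exact SWE formulation.

```python
from collections import Counter
import math

def likelihood_merge(tokens, num_merges=10):
    """BPE-style greedy merging, but scored by likelihood gain as in
    WordPiece: merge the pair (a, b) maximizing
    count(ab) * log(p(ab) / (p(a) * p(b))) under a unigram model.
    Illustrative sketch only, not the paper's optimal SWE solution."""
    tokens = list(tokens)
    for _ in range(num_merges):
        unigrams = Counter(tokens)
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        total = len(tokens)

        def gain(pair):
            c_ab = pairs[pair]
            return c_ab * math.log(c_ab * total / (unigrams[pair[0]] * unigrams[pair[1]]))

        best = max(pairs, key=gain)
        if gain(best) <= 0:  # no merge improves the likelihood
            break
        out, i = [], 0
        while i < len(tokens):  # rewrite the stream with the merged symbol
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                out.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

# Illustrative use on a toy character stream ("_" marks word boundaries).
vocab = likelihood_merge(list("low_lower_lowest_low"))
```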
Related papers
- Cross-Entropy Optimization for Hyperparameter Optimization in Stochastic Gradient-based Approaches to Train Deep Neural Networks [2.1046873879077794]
We present a cross-entropy optimization method for hyperparameter optimization of a learning algorithm.
The presented method can also be applied to other optimization problems in deep learning; a minimal sketch of the generic cross-entropy method is given below.
arXiv Detail & Related papers (2024-09-14T00:39:37Z)
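As a concrete illustration, here is a minimal sketch of the generic cross-entropy method for hyperparameter search; the Gaussian sampling distribution, elite fraction, and toy objective are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def cross_entropy_search(objective, mu, sigma, iters=20, samples=50, elite_frac=0.2, seed=0):
    """Generic cross-entropy method: sample hyperparameters from a Gaussian,
    keep the elite fraction with the best objective values, and refit the
    Gaussian to the elites. Textbook CE loop, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iters):
        candidates = rng.normal(mu, sigma, size=(samples, len(mu)))
        scores = np.array([objective(c) for c in candidates])
        elite = candidates[np.argsort(scores)[:n_elite]]  # lower is better
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-8
    return mu

# Illustrative use: tune (log-learning-rate, momentum) on a toy quadratic proxy.
best = cross_entropy_search(lambda h: (h[0] + 3.0) ** 2 + (h[1] - 0.9) ** 2,
                            mu=np.array([0.0, 0.5]), sigma=np.array([2.0, 0.5]))
```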
- The Stochastic Conjugate Subgradient Algorithm For Kernel Support Vector Machines [1.738375118265695]
This paper proposes an innovative method specifically designed for kernel support vector machines (SVMs).
It not only achieves faster progress per iteration but also exhibits enhanced convergence compared to conventional stochastic first-order (SFO) techniques.
Our experimental results demonstrate that the proposed algorithm not only maintains but potentially exceeds the scalability of SFO methods; a plain SFO baseline is sketched below for contrast.
arXiv Detail & Related papers (2024-07-30T17:03:19Z)
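For contrast with the conjugate variant, here is a plain kernelized stochastic subgradient (Pegasos-style) baseline from the SFO family the paper improves on; the interface and step-size schedule are illustrative assumptions, not the paper's method.

```python
import numpy as np

def kernel_svm_sfo(K, y, lam=0.1, epochs=5, seed=0):
    """Kernelized stochastic subgradient descent (Pegasos-style) for the
    hinge-loss SVM: a plain SFO baseline, not the paper's stochastic
    conjugate subgradient algorithm.
    K: precomputed (n, n) kernel matrix; y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    alpha = np.zeros(n)  # update counts acting as dual-style coefficients
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            f_i = (K[i] @ (alpha * y)) / (lam * t)  # current decision value
            if y[i] * f_i < 1.0:                    # hinge subgradient active
                alpha[i] += 1.0
    return alpha / (lam * t)  # weights: f(x) = sum_j w_j * y_j * k(x_j, x)

# Illustrative use with an RBF kernel on toy data.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=40))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
w = kernel_svm_sfo(K, y)
```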
- GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
By exploiting the regularity of the target function intrinsic in the decay of its spectrum, we can at the same time obtain precise certificates and leverage the advanced and powerful computational techniques developed for neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point computation within the Deep Equilibrium framework; a generic fixed-point loop is sketched below.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
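A generic fixed-point loop of the kind Deep Equilibrium models rely on, hedged: `f` stands in for the learned layer, and the stopping rule is a common default rather than the paper's specific solver.

```python
import numpy as np

def fixed_point_solve(f, x0, tol=1e-6, max_iter=500):
    """Generic fixed-point solver: iterate x <- f(x) until the update is
    small. Here f is any contractive map; in a DEQ it would be the learned
    layer. Schematic sketch only, not the paper's derived solver."""
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if np.linalg.norm(x_new - x) < tol * (1.0 + np.linalg.norm(x)):
            return x_new
        x = x_new
    return x

# Illustrative use: x = 0.5 * x + b has the fixed point 2 * b.
b = np.array([1.0, -2.0])
x_star = fixed_point_solve(lambda x: 0.5 * x + b, np.zeros(2))
```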
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy; a standard Sinkhorn baseline for the entropy-regularized problem is sketched below.
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
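As a reference point for the entropy-regularized problem, here is a standard Sinkhorn iteration sketch; the paper itself uses an extragradient method, so this is a baseline, not its algorithm.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, iters=500):
    """Entropy-regularized OT via Sinkhorn iterations: alternately rescale
    rows and columns of the Gibbs kernel exp(-C/eps) to match the marginals
    mu and nu. A standard baseline, not the paper's extragradient method."""
    K = np.exp(-C / eps)                # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)              # fit column marginals
        u = mu / (K @ v)                # fit row marginals
    P = u[:, None] * K * v[None, :]     # approximate transport plan
    return (P * C).sum()                # transport cost of the plan

# Illustrative use: two small histograms on a grid of 5 points.
x = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - x[None, :]) ** 2
cost = sinkhorn(np.array([0.5, 0.2, 0.1, 0.1, 0.1]),
                np.array([0.1, 0.1, 0.1, 0.2, 0.5]), C)
```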
- Data-Driven Optimization of Directed Information over Discrete Alphabets [15.372626012233736]
Directed information (DI) is a fundamental measure for the study and analysis of sequential stochastic models.
We propose a novel estimation-optimization framework for DI over discrete input spaces.
arXiv Detail & Related papers (2023-01-02T12:25:40Z)
- An AI-Assisted Design Method for Topology Optimization Without Pre-Optimized Training Data [68.8204255655161]
An AI-assisted design method based on topology optimization is presented, which is able to obtain optimized designs in a direct way.
Designs are provided by an artificial neural network, the predictor, on the basis of boundary conditions and degree of filling as input data.
arXiv Detail & Related papers (2020-12-11T14:33:27Z)
- Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose a strategy for such ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms; a schematic block-coordinate analogue is sketched below.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
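A schematic classical analogue of the PECT idea, assuming finite-difference gradients in place of quantum expectation values: optimize one small block of parameters at a time while freezing the rest.

```python
import numpy as np

def sequential_subset_optimize(loss, theta, block_size=2, rounds=3, lr=0.1, steps=100):
    """Schematic analogue of PECT-style training: rather than optimizing all
    parameters at once, run a sequence of optimizations, each over one small
    block of parameters with the rest frozen. Gradients are finite-difference
    here; a real VQA would evaluate a quantum expectation value."""
    n = len(theta)
    for _ in range(rounds):
        for start in range(0, n, block_size):
            idx = np.arange(start, min(start + block_size, n))
            for _ in range(steps):
                grad = np.zeros_like(idx, dtype=float)
                for k, j in enumerate(idx):        # central finite differences
                    e = np.zeros(n)
                    e[j] = 1e-5
                    grad[k] = (loss(theta + e) - loss(theta - e)) / 2e-5
                theta[idx] -= lr * grad            # update only the active block
    return theta

# Illustrative use: minimize a toy "energy" over 6 parameters.
theta = sequential_subset_optimize(lambda t: np.sum(np.cos(t) + 0.1 * t ** 2),
                                   theta=np.random.default_rng(0).normal(size=6))
```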
- Adaptive Sampling of Pareto Frontiers with Binary Constraints Using Regression and Classification [0.0]
We present a novel adaptive optimization algorithm for black-box multi-objective optimization problems with binary constraints.
Our method is based on probabilistic regression and classification models, which act as a surrogate for the optimization goals.
We also present a novel ellipsoid truncation method to speed up the expected hypervolume calculation; a small Pareto-dominance building block is sketched below.
arXiv Detail & Related papers (2020-08-27T09:15:02Z)
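A small building block that surrogate-based multi-objective methods rest on, sketched under minimal assumptions: extracting the non-dominated (Pareto) subset of candidate objective vectors.

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated subset of a set of objective vectors
    (minimization): q dominates p iff q <= p componentwise and q < p
    somewhere. Illustrative helper, not the paper's sampling algorithm."""
    pts = np.asarray(points)
    keep = []
    for i, p in enumerate(pts):
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]

# Illustrative use on random 2-objective candidates (minimization).
pts = np.random.default_rng(0).random((20, 2))
front = pareto_front(pts)
```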
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach; a plain EM iteration of the kind being analyzed is sketched below.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
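To make the object of the analysis concrete, here is a plain maximum-likelihood EM iteration for a two-component 1D Gaussian mixture; the paper's MAP-EM variant would add a prior term to the M-step, so this is a simplified stand-in.

```python
import numpy as np

def em_gaussian_mixture(x, iters=50):
    """Plain EM for a two-component 1D Gaussian mixture (MLE, not MAP-EM).
    Illustrates the iterative map whose convergence the paper analyzes."""
    rng = np.random.default_rng(0)
    pi, mu, var = 0.5, rng.normal(size=2), np.ones(2)
    for _ in range(iters):
        # E-step: posterior responsibility of component 0 for each point
        p0 = pi * np.exp(-(x - mu[0]) ** 2 / (2 * var[0])) / np.sqrt(var[0])
        p1 = (1 - pi) * np.exp(-(x - mu[1]) ** 2 / (2 * var[1])) / np.sqrt(var[1])
        r = p0 / (p0 + p1)
        # M-step: re-estimate mixture weight, means, and variances
        pi = r.mean()
        mu = np.array([(r * x).sum() / r.sum(),
                       ((1 - r) * x).sum() / (1 - r).sum()])
        var = np.array([(r * (x - mu[0]) ** 2).sum() / r.sum(),
                        ((1 - r) * (x - mu[1]) ** 2).sum() / (1 - r).sum()])
    return pi, mu, var

# Illustrative use on synthetic data drawn from two Gaussians.
x = np.concatenate([np.random.default_rng(1).normal(-2, 1, 200),
                    np.random.default_rng(2).normal(3, 1, 200)])
pi, mu, var = em_gaussian_mixture(x)
```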
- Uncertainty Modelling in Risk-averse Supply Chain Systems Using Multi-objective Pareto Optimization [0.0]
One of the arduous tasks in supply chain modelling is to build models that are robust against irregular variations.
We introduce a novel methodology, namely multi-objective Pareto optimization, to handle uncertainties and bound the entropy of such uncertainties by explicitly modelling them under some a priori assumptions.
arXiv Detail & Related papers (2020-04-24T21:04:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.