Deep Autoencoder with SVD-Like Convergence and Flat Minima
- URL: http://arxiv.org/abs/2410.18148v1
- Date: Wed, 23 Oct 2024 00:04:26 GMT
- Title: Deep Autoencoder with SVD-Like Convergence and Flat Minima
- Authors: Nithin Somasekharan, Shaowu Pan
- Abstract summary: We propose a learnable weighted hybrid autoencoder to overcome the Kolmogorov barrier.
We empirically find that our trained model has a sharpness thousands of times smaller than that of other models.
- Score: 1.0742675209112622
- Abstract: Representation learning for high-dimensional, complex physical systems aims to identify a low-dimensional intrinsic latent space, which is crucial for reduced-order modeling and modal analysis. To overcome the well-known Kolmogorov barrier, deep autoencoders (AEs) have been introduced in recent years, but they often suffer from poor convergence behavior as the rank of the latent space increases. To address this issue, we propose the learnable weighted hybrid autoencoder, a hybrid approach that combines the strengths of singular value decomposition (SVD) with deep autoencoders through a learnable weighted framework. We find that the introduction of learnable weighting parameters is essential: without them, the resulting model would either collapse into a standard POD or fail to exhibit the desired convergence behavior. Additionally, we empirically find that our trained model has a sharpness thousands of times smaller than that of other models. Our experiments on classical chaotic PDE systems, including the 1D Kuramoto-Sivashinsky and forced isotropic turbulence datasets, demonstrate that our approach significantly improves generalization performance compared to several competing methods, paving the way for robust representation learning of high-dimensional, complex physical systems.
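The listing carries no code, so the following is only a minimal PyTorch sketch of one way a learnable weighted hybrid of a frozen SVD/POD basis and a deep autoencoder could be wired; the class name HybridAE, the sigmoid gating, and the near-POD initialisation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HybridAE(nn.Module):
    """Illustrative hybrid of a frozen POD/SVD basis and a deep AE,
    blended per latent mode by learnable weights (a sketch, not the
    paper's implementation)."""

    def __init__(self, pod_basis: torch.Tensor, hidden: int = 128):
        super().__init__()
        n_full, n_modes = pod_basis.shape
        self.register_buffer("U", pod_basis)      # frozen POD/SVD modes
        self.enc = nn.Sequential(nn.Linear(n_full, hidden), nn.GELU(),
                                 nn.Linear(hidden, n_modes))
        self.dec = nn.Sequential(nn.Linear(n_modes, hidden), nn.GELU(),
                                 nn.Linear(hidden, n_full))
        # Learnable per-mode weights; initialised negative so training
        # starts near the pure-POD solution (sigmoid(-4) ~= 0.018).
        self.w = nn.Parameter(torch.full((n_modes,), -4.0))

    def forward(self, x):                              # x: (batch, n_full)
        a = torch.sigmoid(self.w)                      # blend factors in (0, 1)
        z = (1 - a) * (x @ self.U) + a * self.enc(x)   # weighted hybrid latent
        return z @ self.U.T + self.dec(z)              # linear decode + NN correction
```

Under this sketch the abstract's two failure modes are visible: pinning the weights at zero reduces the model to plain POD, while dropping the gating altogether leaves an ordinary deep AE with no SVD-like convergence in the latent rank. A basis for `pod_basis` can be computed from mean-centred snapshots X with `torch.linalg.svd(X.T, full_matrices=False).U[:, :n_modes]`.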
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
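For orientation only, a hedged sketch of the general mixture-of-low-rank-experts pattern. Note that SMILE itself constructs its experts zero-shot from pre-trained source models rather than training them as done here; SparseLowRankMoE, the router, and the top-k routing are all assumed names and choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLowRankMoE(nn.Module):
    """Sketch: frozen pre-trained base layer plus sparsely routed
    low-rank experts (in the spirit of SMILE; details differ)."""

    def __init__(self, base: nn.Linear, n_experts=4, rank=8, top_k=1):
        super().__init__()
        d_in, d_out = base.in_features, base.out_features
        self.base = base.requires_grad_(False)         # pre-trained, frozen
        self.router = nn.Linear(d_in, n_experts, bias=False)
        self.A = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, d_out))
        self.top_k = top_k

    def forward(self, x):                              # x: (batch, d_in)
        y = self.base(x)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # sparse routing weights
        for k in range(self.top_k):
            A, B = self.A[idx[:, k]], self.B[idx[:, k]]
            delta = torch.bmm(torch.bmm(x.unsqueeze(1), A), B).squeeze(1)
            y = y + weights[:, k:k + 1] * delta        # low-rank expert update
        return y
```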
- Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training [31.495803865226158]
Adversarial Training (AT) is rarely, if ever, deployed in practical AI systems, primarily because of its GPU cost and its effect on generalization: AT causes a drop in generalization in vision models, whereas in encoder-based language models generalization either improves or remains unchanged.
The proposed SMAAT requires only 25-33% of the GPU time of standard AT, while significantly improving robustness across all applications.
arXiv Detail & Related papers (2024-05-27T12:48:30Z)
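A rough sketch of adversarial training applied at an intermediate layer, the generic mechanism behind GPU-time savings of this kind: attack steps backpropagate only through the shallow sub-network `head` above the perturbed layer. The function name, step sizes, and layer choice are assumptions; SMAAT's actual layer selection comes from the paper's intrinsic-dimensionality analysis.

```python
import torch
import torch.nn.functional as F

def latent_pgd_step(head, h, y, eps=0.1, alpha=0.02, steps=3):
    """Generic PGD in an intermediate representation h (batch, dim);
    `head` maps hidden states to logits. Illustrative only."""
    delta = torch.zeros_like(h, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(head(h + delta), y)
        grad, = torch.autograd.grad(loss, delta)       # shallow backprop only
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (h + delta).detach()                        # adversarial hidden state
```

The returned perturbed representation is then fed through `head` again for the training loss, so the expensive lower layers are never touched by the attack loop.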
- On the Embedding Collapse when Scaling up Recommendation Models [53.66285358088788]
We identify the embedding collapse phenomenon as an obstacle to scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace.
We propose a simple yet effective multi-embedding design incorporating embedding-set-specific interaction modules to learn embedding sets with large diversity.
arXiv Detail & Related papers (2023-10-06T17:50:38Z)
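A hedged sketch of the multi-embedding pattern described above: several smaller embedding tables, each paired with its own interaction module, whose outputs are combined downstream. The sizes and the MLP interaction are placeholder choices, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiEmbedding(nn.Module):
    """Sketch: multiple embedding sets with set-specific interaction
    modules to keep the learned sets diverse (illustrative only)."""

    def __init__(self, n_items, dim=64, n_sets=4, n_fields=10):
        super().__init__()
        self.tables = nn.ModuleList(
            nn.Embedding(n_items, dim // n_sets) for _ in range(n_sets))
        # one interaction module per embedding set
        self.interact = nn.ModuleList(
            nn.Sequential(nn.Linear(n_fields * (dim // n_sets), dim),
                          nn.ReLU()) for _ in range(n_sets))
        self.out = nn.Linear(n_sets * dim, 1)

    def forward(self, ids):                      # ids: (batch, n_fields)
        feats = [m(t(ids).flatten(1))
                 for t, m in zip(self.tables, self.interact)]
        return self.out(torch.cat(feats, dim=-1))
```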
- Low-dimensional Data-based Surrogate Model of a Continuum-mechanical Musculoskeletal System Based on Non-intrusive Model Order Reduction [0.0]
Non-traditional approaches, such as surrogate modeling using data-driven model order reduction, are used to make high-fidelity models more widely accessible.
We demonstrate the benefits of the surrogate modeling approach on a complex finite element model of a human upper-arm.
arXiv Detail & Related papers (2023-02-13T17:14:34Z)
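The generic non-intrusive recipe behind such surrogates fits in a few lines: build a truncated POD basis from high-fidelity snapshots, then fit a cheap data-driven map from inputs to reduced coordinates. The data below are random placeholders and the linear least-squares regressor is an assumption; the paper's surrogate of the continuum-mechanical arm model is substantially more elaborate.

```python
import numpy as np

# Placeholder snapshots X (n_dof, n_samples) and inputs P (n_samples, n_params)
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 200))
P = rng.standard_normal((200, 3))

Xc = X - X.mean(1, keepdims=True)                    # mean-centred snapshots
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
r = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), 0.999) + 1
Ur = U[:, :r]                                        # truncated POD basis
Q = Ur.T @ Xc                                        # reduced coordinates (r, n)

# Non-intrusive step: regress reduced coordinates on the inputs
W, *_ = np.linalg.lstsq(P, Q.T, rcond=None)

def surrogate(p):
    """Fast online evaluation: inputs p -> full-order state estimate."""
    return X.mean(1) + Ur @ (p @ W)
```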
- On Robust Numerical Solver for ODE via Self-Attention Mechanism [82.95493796476767]
We explore training efficient and robust AI-enhanced numerical solvers with a small data size by mitigating intrinsic noise disturbances.
We first analyze the ability of the self-attention mechanism to regulate noise in supervised learning, and then propose a simple yet effective numerical solver, Attr, which introduces an additive self-attention mechanism into the numerical solution of differential equations.
arXiv Detail & Related papers (2023-02-05T01:39:21Z)
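A hedged sketch of what an additive self-attention term in a numerical stepper can look like: a plain explicit Euler step plus a learned correction computed by self-attention over a trajectory window. The class name and the Euler base scheme are assumptions; the paper's solver may differ in both.

```python
import torch
import torch.nn as nn

class AttentiveEulerSolver(nn.Module):
    """Sketch of an AI-enhanced stepper with an *additive*
    self-attention correction (illustrative only)."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.correction = nn.Linear(dim, dim)

    def step(self, f, x, dt):
        # x: (batch, seq, dim) trajectory window; f: ODE right-hand side
        base = x[:, -1] + dt * f(x[:, -1])           # plain explicit Euler
        ctx, _ = self.attn(x, x, x)                  # attention over the window
        return base + self.correction(ctx[:, -1])    # additive learned term
```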
- Solving High-Dimensional PDEs with Latent Spectral Models [74.1011309005488]
We present Latent Spectral Models (LSM), an efficient and precise solver for high-dimensional PDEs.
Inspired by classical spectral methods in numerical analysis, we design a neural spectral block to solve PDEs in the latent space.
LSM achieves consistent state-of-the-art results, with an average relative gain of 11.5% across seven benchmarks.
arXiv Detail & Related papers (2023-01-30T04:58:40Z)
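A minimal sketch of the spectral-style latent block idea: project latent features onto a small set of learned basis vectors, operate on the coefficients, and project back, mimicking how classical spectral methods solve in coefficient space. The residual form and basis size are assumptions, not the authors' exact block.

```python
import torch
import torch.nn as nn

class NeuralSpectralBlock(nn.Module):
    """Sketch inspired by the LSM idea: solve in a learned
    coefficient space (illustrative only)."""

    def __init__(self, dim, n_basis=16):
        super().__init__()
        self.basis = nn.Parameter(torch.randn(n_basis, dim) / dim**0.5)
        self.mix = nn.Parameter(torch.eye(n_basis))   # coefficient operator

    def forward(self, z):                    # z: (batch, tokens, dim)
        coeff = z @ self.basis.T             # spectral coefficients
        coeff = coeff @ self.mix             # "solve" in coefficient space
        return z + coeff @ self.basis        # residual back-projection
```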
- Data-driven Control of Agent-based Models: an Equation/Variable-free Machine Learning Approach [0.0]
We present an Equation/Variable free machine learning (EVFML) framework for the control of the collective dynamics of complex/multiscale systems.
The proposed implementation consists of three steps: (A) from high-dimensional agent-based simulations, machine learning (in particular, non-linear manifold learning with Diffusion Maps (DMs)) identifies a set of coarse-grained variables that parametrize the low-dimensional manifold on which the emergent dynamics evolve.
We exploit the Equation-free approach to perform numerical bifurcation analysis of the emergent dynamics.
We design data-driven embedded wash-out controllers that drive the agent-based simulators to their intrinsic, imprecisely known, emergent open-loop unstable steady-states.
arXiv Detail & Related papers (2022-07-12T18:16:22Z)
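The "DMs" in this entry are Diffusion Maps, a standard non-linear manifold learning construction; the sketch below shows only that generic embedding step, without the paper's bifurcation analysis or wash-out controllers.

```python
import numpy as np

def diffusion_maps(X, n_coords=2, eps=1.0):
    """Plain diffusion-maps embedding of samples X (n, d):
    Gaussian kernel, Markov normalisation, leading eigenvectors."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    K = np.exp(-d2 / eps)                                 # Gaussian kernel
    P = K / K.sum(1, keepdims=True)                       # row-stochastic matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    # skip the trivial constant eigenvector, keep the leading coordinates
    keep = order[1:n_coords + 1]
    return vecs.real[:, keep] * vals.real[keep]
```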
- Interfacing Finite Elements with Deep Neural Operators for Fast Multiscale Modeling of Mechanics Problems [4.280301926296439]
In this work, we explore the idea of multiscale modeling with machine learning and employ DeepONet, a neural operator, as an efficient surrogate of the expensive solver.
DeepONet is trained offline using data acquired from the fine solver for learning the underlying and possibly unknown fine-scale dynamics.
We present various benchmarks to assess accuracy and speedup, and in particular we develop a coupling algorithm for a time-dependent problem.
arXiv Detail & Related papers (2022-02-25T20:46:08Z)
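DeepONet's branch-trunk form is standard (Lu et al.), so a minimal version is easy to sketch; the widths and depths below are placeholders, and the paper's contribution is the coupling of such a surrogate with a finite element solver rather than the network itself.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal branch-trunk DeepONet: branch encodes the input function
    sampled at fixed sensors, trunk encodes query coordinates."""

    def __init__(self, n_sensors, coord_dim=1, width=64, p=32):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, width), nn.Tanh(),
                                    nn.Linear(width, p))
        self.trunk = nn.Sequential(nn.Linear(coord_dim, width), nn.Tanh(),
                                   nn.Linear(width, p))

    def forward(self, u, y):
        # u: (batch, n_sensors) function samples; y: (batch, m, coord_dim)
        b = self.branch(u)                        # (batch, p)
        t = self.trunk(y)                         # (batch, m, p)
        return torch.einsum("bp,bmp->bm", b, t)   # operator output at y
```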
- Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders [22.54887526392739]
We propose a novel approach to training models with deep-latent hierarchies based on Optimal Transport.
We show that our method enables the generative model to fully leverage its deep-latent hierarchy, avoiding the well known "latent variable collapse" issue of VAEs.
arXiv Detail & Related papers (2020-10-07T15:04:20Z)
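For context, a sketch of the per-layer Wasserstein-autoencoder objective in its common MMD form (Tolstikhin et al.): reconstruction error plus a kernel MMD penalty matching codes to the prior. Stacking autoencoders trained this way is the entry's deep-latent-hierarchy idea; the paper's exact Optimal Transport formulation differs.

```python
import torch

def imq(a, b, c=1.0):
    """Inverse multiquadratic kernel, a common choice for WAE-MMD."""
    return c / (c + torch.cdist(a, b) ** 2)

def wae_mmd_loss(x, x_hat, z, lam=10.0):
    """Reconstruction error plus MMD between codes z and a unit
    Gaussian prior (sketch of one stackable per-layer loss)."""
    p = torch.randn_like(z)                    # samples from the prior
    n = z.size(0)
    off_diag = lambda k: (k.sum() - k.diagonal().sum()) / (n * (n - 1))
    mmd = off_diag(imq(z, z)) + off_diag(imq(p, p)) - 2 * imq(z, p).mean()
    return ((x - x_hat) ** 2).mean() + lam * mmd
```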
- Limited-angle tomographic reconstruction of dense layered objects by dynamical machine learning [68.9515120904028]
Limited-angle tomography of strongly scattering quasi-transparent objects is a challenging, highly ill-posed problem.
Regularizing priors are necessary to reduce artifacts by improving the condition of such problems.
We devised a recurrent neural network (RNN) architecture with a novel split-convolutional gated recurrent unit (SC-GRU) as the building block.
arXiv Detail & Related papers (2020-07-21T11:48:22Z)
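As a reference point only, a generic convolutional GRU cell; the paper's split-convolutional variant (SC-GRU) modifies how the convolutions are factored, which is not reproduced here, so every detail below is a placeholder.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Generic convolutional GRU cell for image-shaped states
    (illustrative; not the paper's SC-GRU)."""

    def __init__(self, channels, hidden, k=3):
        super().__init__()
        pad = k // 2
        self.gates = nn.Conv2d(channels + hidden, 2 * hidden, k, padding=pad)
        self.cand = nn.Conv2d(channels + hidden, hidden, k, padding=pad)

    def forward(self, x, h):
        # x: (batch, channels, H, W) input; h: (batch, hidden, H, W) state
        zr = torch.sigmoid(self.gates(torch.cat([x, h], dim=1)))
        z, r = zr.chunk(2, dim=1)                 # update / reset gates
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde
```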
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where the time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows stabilizes training, improves its effectiveness, and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
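The orthogonal-flow mechanism is concrete enough to sketch: evolving the weight matrix by right-multiplication with the exponential of a skew-symmetric generator keeps it on O(d), so its singular values stay at 1 and gradients neither vanish nor explode through the recurrence. The Euler discretisation and cell structure below are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class ODEtoODECell(nn.Module):
    """Sketch of the nested-flow idea: the weight matrix itself evolves
    on the orthogonal group O(d) via the exponential map of a learned
    skew-symmetric generator (illustrative discretisation)."""

    def __init__(self, d, dt=0.1):
        super().__init__()
        self.G = nn.Parameter(torch.randn(d, d) * 0.01)  # generator params
        self.dt = dt

    def forward(self, x, W):
        # x: (batch, d) state; W: (d, d) orthogonal weight, init = identity
        A = self.G - self.G.T                  # skew-symmetric => expm orthogonal
        W = W @ torch.matrix_exp(self.dt * A)  # matrix flow on O(d)
        x = x + self.dt * torch.tanh(x @ W)    # main Neural ODE flow step
        return x, W
```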