Related papers: Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training

Related papers

Visual Autoregressive Modelling for Monocular Depth Estimation [69.01449528371916]
We propose a monocular depth estimation method based on visual autoregressive (VAR) priors. Our method adapts a large-scale text-to-image VAR model and introduces a scale-wise conditional upsampling mechanism. We report state-of-the-art performance in indoor benchmarks under constrained training conditions, and strong performance when applied to outdoor datasets.
arXiv Detail & Related papers (2025-12-27T17:08:03Z)
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate [1.0152838128195467]
The prevailing paradigm for scaling large language models (LLMs) involves monolithic, end-to-end training. This paper explores an alternative, constructive scaling paradigm, enabled by the principle of emergent semantics in Transformers. We operationalize this with a layer-wise constructive methodology that combines strict layer freezing in early stages with efficient, holistic fine-tuning of the entire model stack.
arXiv Detail & Related papers (2025-07-08T20:01:15Z)
Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
Parameter-Efficient Continual Fine-Tuning: A Survey [5.59258786465086]
We believe the next breakthrough in AI lies in enabling efficient adaptation to evolving environments. One alternative to efficiently adapt these large-scale models is known as Parameter-Efficient Fine-Tuning (PEFT).
arXiv Detail & Related papers (2025-04-18T17:51:51Z)
Large EEG-U-Transformer for Time-Step Level Detection Without Pre-Training [1.3254304182988286]
We propose a simple U-shaped model to efficiently learn representations by capturing both local and global features. Compared to other window-level classification models, our method directly outputs predictions at the time-step level. Our model won 1st place in the 2025 "seizure detection challenge" organized in the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders.
arXiv Detail & Related papers (2025-04-01T01:33:42Z)
Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage - performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z)
Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships. Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
arXiv Detail & Related papers (2025-02-16T23:13:55Z)
Meta-Learning Adaptable Foundation Models [37.458141335750696]
We introduce a meta-learning framework infused with PEFT in this intermediate retraining stage to learn a model that can be easily adapted to unseen tasks. In this setting, we demonstrate the suboptimality of standard retraining for finding an adaptable set of parameters. We then apply these theoretical insights to retraining the RoBERTa model to predict the continuation of conversations within the ConvAI2 dataset.
arXiv Detail & Related papers (2024-10-29T17:24:18Z)
Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting. We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
Explaining the role of Intrinsic Dimensionality in Adversarial Training [31.495803865226158]
We show that off-manifold adversarial examples (AEs) enhance robustness, while on-manifold AEs improve generalization. We introduce SMAAT, which improves the scalability of AT for encoder-based models by perturbing the layer with the lowest intrinsic dimensionality. We validate SMAAT across multiple tasks, including text generation, sentiment classification, safety filtering, and retrieval augmented generation setups.
arXiv Detail & Related papers (2024-05-27T12:48:30Z)
Enhancing Dynamical System Modeling through Interpretable Machine Learning Augmentations: A Case Study in Cathodic Electrophoretic Deposition [0.8796261172196743]
We introduce a comprehensive data-driven framework aimed at enhancing the modeling of physical systems. As a demonstrative application, we pursue the modeling of cathodic electrophoretic deposition (EPD), commonly known as e-coating.
arXiv Detail & Related papers (2024-01-16T14:58:21Z)
Unlocking the Potential of Federated Learning for Deeper Models [24.875271131226707]
Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients. We propose several technical guidelines based on reducing divergence, such as using wider models and reducing the receptive field. These approaches can greatly improve the accuracy of FL on deeper models.
arXiv Detail & Related papers (2023-06-05T08:45:44Z)
Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO). MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts. Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories. We propose a novel two-stage learning paradigm, FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models [40.08137765886609]
We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics. Our approach is able to obtain high returns, while allowing fast execution during deployment by avoiding test time policy gradient optimization.
arXiv Detail & Related papers (2021-02-16T17:21:55Z)
Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders [22.54887526392739]
We propose a novel approach to training models with deep-latent hierarchies based on Optimal Transport. We show that our method enables the generative model to fully leverage its deep-latent hierarchy, avoiding the well known "latent variable collapse" issue of VAEs.
arXiv Detail & Related papers (2020-10-07T15:04:20Z)
Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference [55.35176938713946]
We develop a deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network. We propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a downward generative model. The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
arXiv Detail & Related papers (2020-06-15T22:22:56Z)
Towards Interpretable Deep Learning Models for Knowledge Tracing [62.75876617721375]
We propose to adopt the post-hoc method to tackle the interpretability issue for deep learning based knowledge tracing (DLKT) models. Specifically, we focus on applying the layer-wise relevance propagation (LRP) method to interpret an RNN-based DLKT model. Experiment results show the feasibility of using the LRP method for interpreting the DLKT model's predictions.
arXiv Detail & Related papers (2020-05-13T04:03:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.