Latent Iterative Refinement for Modular Source Separation
- URL: http://arxiv.org/abs/2211.11917v2
- Date: Mon, 16 Oct 2023 03:06:50 GMT
- Title: Latent Iterative Refinement for Modular Source Separation
- Authors: Dimitrios Bralios, Efthymios Tzinis, Gordon Wichern, Paris Smaragdis,
Jonathan Le Roux
- Abstract summary: Traditional source separation approaches train deep neural network models end-to-end with all the data available at once.
We argue that we can significantly increase resource efficiency during both training and inference stages.
- Score: 44.78689915209527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional source separation approaches train deep neural network models
end-to-end with all the data available at once by minimizing the empirical risk
on the whole training set. On the inference side, after training the model, the
user fetches a static computation graph and runs the full model on some
specified observed mixture signal to get the estimated source signals.
Additionally, many of those models consist of several basic processing blocks
which are applied sequentially. We argue that we can significantly increase
resource efficiency during both training and inference stages by reformulating
a model's training and inference procedures as iterative mappings of latent
signal representations. First, we can apply the same processing block more than
once on its output to refine the input signal and consequently improve
parameter efficiency. During training, we can follow a block-wise procedure
which enables a reduction in memory requirements. Thus, one can train a very
complicated network structure using significantly less computation compared to
end-to-end training. During inference, we can dynamically adjust how many
processing blocks and iterations of a specific block an input signal needs
using a gating module.
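To make the iterative-refinement and gating ideas concrete, below is a minimal sketch in PyTorch. The paper does not specify this exact architecture: the convolutional block design, the RefinementBlock, Gate, and IterativeSeparator names, the latent shapes, and the stopping threshold are all illustrative assumptions, not the authors' implementation. The sketch shows a single processing block being re-applied to its own output, with a small gating module deciding at inference time how many extra iterations each input receives.

```python
# Illustrative sketch (not the authors' code): iterative refinement of a latent
# mixture representation with per-block gating that decides how many extra
# passes an input receives at inference time. All names are hypothetical.
import torch
import torch.nn as nn


class RefinementBlock(nn.Module):
    """A reusable block mapping a latent representation to a refined latent
    of the same shape, so the same block can be applied repeatedly."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z + self.net(z)  # residual update keeps input/output shapes equal


class Gate(nn.Module):
    """Tiny gating module: looks at the current latent and outputs a
    'continue refining' probability."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Pool over time, then score each example in the batch.
        return torch.sigmoid(self.score(z.mean(dim=-1)))


class IterativeSeparator(nn.Module):
    def __init__(self, dim: int = 128, num_blocks: int = 4, max_iters: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList(RefinementBlock(dim) for _ in range(num_blocks))
        self.gates = nn.ModuleList(Gate(dim) for _ in range(num_blocks))
        self.max_iters = max_iters

    def forward(self, z: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
        # z: (batch, dim, time) latent representation of the observed mixture.
        for block, gate in zip(self.blocks, self.gates):
            for _ in range(self.max_iters):
                z = block(z)
                if gate(z).mean() < threshold:  # gate says "good enough", stop early
                    break
        return z


# Example: dynamically adjusted compute for a batch of latent mixtures.
model = IterativeSeparator()
latents = torch.randn(2, 128, 256)
with torch.no_grad():
    refined = model(latents)
print(refined.shape)  # torch.Size([2, 128, 256])
```

Under the block-wise training procedure described in the abstract, one could imagine training each such block with only its own activations in memory rather than backpropagating through the full unrolled computation, which is where the claimed reduction in training memory would come from; the exact training schedule is left to the paper itself.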
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After this network is trained on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Partitioned Neural Network Training via Synthetic Intermediate Labels [0.0]
GPU memory constraints have become a notable bottleneck in training such sizable models.
This study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train individual segments.
This approach results in a more efficient training process that minimizes data communication while maintaining model accuracy.
arXiv Detail & Related papers (2024-03-17T13:06:29Z)
- Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of exchanging gradients.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
- Block-local learning with probabilistic latent representations [2.839567756494814]
Locking and weight transport are problems because they prevent efficient parallelization and horizontal scaling of the training process.
We propose a new method to address both these problems and scale up the training of large models.
We present results on a variety of tasks and architectures, demonstrating state-of-the-art performance using block-local learning.
arXiv Detail & Related papers (2023-05-24T10:11:30Z)
- Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z)
- Lightweight and Flexible Deep Equilibrium Learning for CSI Feedback in FDD Massive MIMO [13.856867175477042]
In frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems, downlink channel state information (CSI) needs to be sent back to the base station (BS) by the users.
We propose a lightweight and flexible deep learning-based CSI feedback approach by capitalizing on deep equilibrium models.
arXiv Detail & Related papers (2022-11-28T05:53:09Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning [14.642266310020505]
This paper proposes a new sequential model learning architecture to solve partially observable Markov decision problems.
The proposed architecture generates a latent variable in each data block with a length of multiple timesteps and passes the most relevant information to the next block for policy optimization.
Numerical results show that the proposed method significantly outperforms previous methods in various partially observable environments.
arXiv Detail & Related papers (2021-12-10T05:38:24Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture that explicitly targets multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- Efficient Learning of Model Weights via Changing Features During Training [0.0]
We propose a machine learning model that dynamically changes the features during training.
Our main motivation is to update the model in small increments during training by replacing less descriptive features with new ones drawn from a large pool.
arXiv Detail & Related papers (2020-02-21T12:38:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.