A Model Compression Method with Matrix Product Operators for Speech
Enhancement
- URL: http://arxiv.org/abs/2010.04950v1
- Date: Sat, 10 Oct 2020 08:53:25 GMT
- Authors: Xingwei Sun, Ze-Feng Gao, Zhong-Yi Lu, Junfeng Li, Yonghong Yan
- Abstract summary: We propose a model compression method based on matrix product operators (MPO) to substantially reduce the number of parameters in neural network models for speech enhancement.
Our proposal provides an effective model compression method for speech enhancement, especially in cloud-free applications.
- Score: 15.066942043773267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network (DNN) based speech enhancement approaches have
achieved promising performance. However, these models usually involve an
enormous number of parameters, which seriously restricts their use in real
speech enhancement applications on devices with limited resources. To deal
with this issue, model compression techniques are being widely studied. In
this paper, we propose a model compression method based on matrix product
operators (MPO) to substantially reduce the number of parameters in DNN models
for speech enhancement. In this method, the weight matrices in the linear
transformations of the neural network model are replaced by their MPO
decomposition format before training. In the experiments, this process is
applied to causal neural network models, such as the feedforward multilayer
perceptron (MLP) and long short-term memory (LSTM) models. Both MLP and LSTM
models, with and without compression, are then used to estimate the ideal
ratio mask for monaural speech enhancement. The experimental results show that
the proposed MPO-based method outperforms the widely used pruning method for
speech enhancement under various compression rates, and that further
improvement can be achieved at low compression rates. Our proposal provides an
effective model compression method for speech enhancement, especially in
cloud-free applications.
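As a rough illustration of the MPO format mentioned in the abstract, the NumPy sketch below parameterizes a weight matrix as a chain of small local tensors and contracts them back into a dense matrix. The factor sizes, bond dimension, initialization scale, and function names (`init_mpo_cores`, `mpo_to_matrix`) are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def init_mpo_cores(in_factors, out_factors, bond_dim, seed=0):
    """Create random local tensors of shape (d_prev, i_k, j_k, d_next)."""
    rng = np.random.default_rng(seed)
    n = len(in_factors)
    bonds = [1] + [bond_dim] * (n - 1) + [1]  # boundary bonds have size 1
    return [
        0.1 * rng.standard_normal(
            (bonds[k], in_factors[k], out_factors[k], bonds[k + 1])
        )
        for k in range(n)
    ]


def mpo_to_matrix(cores):
    """Contract the bond indices to recover the dense (I, J) weight matrix."""
    result = cores[0][0]  # drop the size-1 left bond -> (i_1, j_1, d_1)
    for core in cores[1:]:
        # result: (rows, cols, bond), core: (bond, i, j, next_bond)
        result = np.einsum("abd,dijk->aibjk", result, core)
        rows, i, cols, j, bond = result.shape
        result = result.reshape(rows * i, cols * j, bond)
    return result[:, :, 0]  # drop the size-1 right bond


# Hypothetical layer: 256 inputs (4*8*8) and 512 outputs (8*8*8).
in_factors, out_factors = (4, 8, 8), (8, 8, 8)
cores = init_mpo_cores(in_factors, out_factors, bond_dim=4)
W = mpo_to_matrix(cores)

n_mpo = sum(core.size for core in cores)  # parameters stored in MPO form
n_dense = W.size                          # parameters of the dense matrix

x = np.ones((2, W.shape[0]))  # a dummy batch of two input vectors
y = x @ W                     # the linear transformation using the MPO weights
print(W.shape, n_dense, n_mpo)
```

With these assumed sizes, the three local tensors hold 1,408 parameters in place of the 131,072 entries of the dense 256 x 512 matrix. In a trainable layer the local tensors would be the learned parameters, and the bond dimension controls the trade-off between compression rate and expressiveness.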
Related papers
- Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization [40.15915011575071]
Low-rank compression is a promising technique to reduce non-essential parameters in large language models.
We conduct empirical research on the low-rank characteristics of large models.
We propose a low-rank compression method suitable for large language models.
arXiv Detail & Related papers (2024-05-17T08:27:12Z)
- A Survey on Transformer Compression [84.18094368700379]
Transformer plays a vital role in the realms of natural language processing (NLP) and computer vision (CV).
Model compression methods reduce the memory and computational cost of Transformer.
This survey provides a comprehensive review of recent compression methods, with a specific focus on their application to Transformer-based models.
arXiv Detail & Related papers (2024-02-05T12:16:28Z)
- CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks [1.5199992713356987]
This paper introduces CompactifAI, an innovative compression approach using quantum-inspired networks.
Our method is versatile and can be implemented with - or on top of - other compression techniques.
As a benchmark, we demonstrate that combining CompactifAI with quantization reduces the memory size of LLaMA 7B by 93%.
arXiv Detail & Related papers (2024-01-25T11:45:21Z)
- Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models [9.91972450276408]
This paper introduces an innovative approach for the parametric and practical compression of Large Language Models (LLMs) based on reduced order modelling.
Our method represents a significant advancement in model compression by leveraging matrix decomposition, demonstrating superior efficacy compared to the prevailing state-of-the-art structured pruning method.
arXiv Detail & Related papers (2023-12-12T07:56:57Z)
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
A non-autoregressive framework enhances controllability, and a duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z)
- Exploring Effective Mask Sampling Modeling for Neural Image Compression [171.35596121939238]
Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy.
Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pretraining strategy for neural image compression.
Our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods.
arXiv Detail & Related papers (2023-06-09T06:50:20Z)
- Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z)
- Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations [51.75960511842552]
Fine-tuning of pretrained language models (PLMs) is prone to overfitting in low-resource scenarios.
We present a novel method that operates on the hidden representations of a PLM to reduce overfitting.
arXiv Detail & Related papers (2022-11-16T09:39:29Z)
- Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators [31.461762905053426]
We present a novel pre-trained language model (PLM) compression approach based on the matrix product operator (MPO) from quantum many-body physics.
Our approach can be applied to the original or the compressed PLMs in a general way, which derives a lighter network and significantly reduces the parameters to be fine-tuned.
arXiv Detail & Related papers (2021-06-04T01:50:15Z)
- Compressing LSTM Networks by Matrix Product Operators [7.395226141345625]
Long Short-Term Memory (LSTM) models are the building blocks of many state-of-the-art natural language processing (NLP) and speech enhancement (SE) algorithms.
Here we introduce the MPO decomposition, which describes the local correlation of quantum states in quantum many-body physics.
We propose a matrix product operator (MPO) based neural network architecture to replace the LSTM model.
arXiv Detail & Related papers (2020-12-22T11:50:06Z)
- Pretraining Techniques for Sequence-to-Sequence Voice Conversion [57.65753150356411]
Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to their ability to convert prosody.
We propose to transfer knowledge from other speech processing tasks where large-scale corpora are easily available, typically text-to-speech (TTS) and automatic speech recognition (ASR).
We argue that VC models with such pretrained ASR or TTS model parameters can generate effective hidden representations for high-fidelity, highly intelligible converted speech.
arXiv Detail & Related papers (2020-08-07T11:02:07Z)