Once-for-All Sequence Compression for Self-Supervised Speech Models
- URL: http://arxiv.org/abs/2211.02332v4
- Date: Tue, 9 May 2023 11:14:52 GMT
- Title: Once-for-All Sequence Compression for Self-Supervised Speech Models
- Authors: Hsuan-Jui Chen, Yen Meng, Hung-yi Lee
- Abstract summary: We introduce a once-for-all sequence compression framework for self-supervised speech models.
The framework is evaluated on various tasks, showing only marginal degradation compared to fixed compression rate variants.
We also explore adaptive compressing rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid search.
- Score: 62.60723685118747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The sequence length along the time axis is often the dominant factor of the
computation in speech processing. Works have been proposed to reduce the
sequence length for lowering the computational cost in self-supervised speech
models. However, different downstream tasks have different tolerances to
sequence compression, so a model that produces a fixed compression rate may not
fit all tasks. In this work, we introduce a once-for-all (OFA) sequence
compression framework for self-supervised speech models that supports a
continuous range of operating compressing rates. The framework is evaluated on
various tasks, showing marginal degradation compared to the fixed compressing
rate variants with a smooth performance-efficiency trade-off. We further
explore adaptive compressing rate learning, demonstrating the ability to select
task-specific preferred frame periods without needing a grid search.
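To make the idea of a selectable operating compression rate concrete, here is a minimal sketch of average-pooling subsampling along the time axis where a single function serves a continuous range of rates. This is an illustrative assumption, not the paper's implementation; the name `subsample` and the segment-partitioning scheme are hypothetical.

```python
def subsample(frames, rate):
    """Average-pool a frame sequence down by a (possibly fractional) rate.

    frames: list of per-timestep feature vectors (lists of floats)
    rate: compression rate; e.g. 2.0 roughly halves the sequence length

    This is a hedged sketch of rate-controllable subsampling, not the
    OFA framework's actual mechanism.
    """
    T = len(frames)
    out_len = max(1, round(T / rate))
    dim = len(frames[0])
    out = []
    for i in range(out_len):
        # Integer segment boundaries that partition the T frames evenly.
        start = round(i * T / out_len)
        end = round((i + 1) * T / out_len)
        segment = frames[start:end]
        out.append([sum(f[d] for f in segment) / len(segment)
                    for d in range(dim)])
    return out
```

Because `rate` is a float, the same module can be queried at any operating point (e.g. `subsample(frames, 1.5)`), which is the kind of smooth performance-efficiency trade-off the abstract describes.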
Related papers
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression [38.09558772881095]
Under fixed compression ratios, dual-path compression combining both time and frequency methods yields further performance improvement.
Proposed models show competitive performance compared with fast FullSubNet and DeepNetFilter.
arXiv Detail & Related papers (2023-08-21T21:36:56Z)
- Latent Discretization for Continuous-time Sequence Compression [21.062288207034968]
In this work, we treat data sequences as observations from an underlying continuous-time process.
We show that our approaches can automatically achieve reductions in bit rates by learning how to discretize.
arXiv Detail & Related papers (2022-12-28T01:15:27Z)
- On Compressing Sequences for Self-Supervised Speech Models [78.62210521316081]
We study fixed-length and variable-length subsampling along the time axis in self-supervised learning.
We find that variable-length subsampling performs particularly well under low frame rates.
If we have access to phonetic boundaries, we find no degradation in performance for an average frame rate as low as 10 Hz.
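The boundary-conditioned variant described above can be pictured as mean-pooling the frames inside each given segment. The sketch below is a hedged illustration under that assumption; the function name `pool_by_boundaries` and its interface are hypothetical, not code from the cited paper.

```python
def pool_by_boundaries(frames, boundaries):
    """Variable-length subsampling: average the frames inside each segment.

    frames: list of per-timestep feature vectors (lists of floats)
    boundaries: sorted segment edges (e.g. phonetic boundaries), starting
                at 0 and ending at len(frames)

    Illustrative sketch only; the cited work's actual pooling may differ.
    """
    dim = len(frames[0])
    out = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        seg = frames[start:end]
        out.append([sum(f[d] for f in seg) / len(seg) for d in range(dim)])
    return out
```

With phone-level boundaries, the output frame rate adapts to the signal: long steady segments collapse to one vector each, which is how an average rate as low as 10 Hz can remain usable.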
arXiv Detail & Related papers (2022-10-13T17:10:02Z)
- Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning [10.380414189465347]
Sentence compression reduces the length of text by removing non-essential content.
Unsupervised, objective-driven methods for sentence compression can be used to create customized models.
arXiv Detail & Related papers (2022-05-17T10:34:28Z)
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find inter-correlations and intra-correlations exist when observing latent variables in a vectorized perspective.
Our model has better rate-distortion performance and an impressive $3.18\times$ compression speed up.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
- Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning [60.20205278845412]
Modern approaches to text to speech require the entire input character sequence to be processed before any audio is synthesised.
This latency limits the suitability of such models for time-sensitive tasks like simultaneous interpretation.
We propose a reinforcement learning based framework to train an agent to make this decision.
arXiv Detail & Related papers (2020-08-07T11:48:05Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we process efficient semantic video segmentation in a per-frame fashion during the inference process.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.