Once-for-All Sequence Compression for Self-Supervised Speech Models
- URL: http://arxiv.org/abs/2211.02332v4
- Date: Tue, 9 May 2023 11:14:52 GMT
- Title: Once-for-All Sequence Compression for Self-Supervised Speech Models
- Authors: Hsuan-Jui Chen, Yen Meng, Hung-yi Lee
- Abstract summary: We introduce a once-for-all sequence compression framework for self-supervised speech models.
The framework is evaluated on various tasks, showing marginal degradation compared to the fixed compressing rate variants.
We also explore adaptive compressing rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid search.
- Score: 62.60723685118747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The sequence length along the time axis is often the dominant factor of the
computation in speech processing. Works have been proposed to reduce the
sequence length for lowering the computational cost in self-supervised speech
models. However, different downstream tasks have different tolerance of
sequence compressing, so a model that produces a fixed compressing rate may not
fit all tasks. In this work, we introduce a once-for-all (OFA) sequence
compression framework for self-supervised speech models that supports a
continuous range of operating compressing rates. The framework is evaluated on
various tasks, showing marginal degradation compared to the fixed compressing
rate variants with a smooth performance-efficiency trade-off. We further
explore adaptive compressing rate learning, demonstrating the ability to select
task-specific preferred frame periods without needing a grid search.
Related papers
- Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles [49.65811277223873]
Style-Compress is a lightweight framework that adapts a smaller language model to compress prompts for a larger model on a new task without additional training.
Our approach iteratively generates and selects effective compressed prompts as task-specific demonstrations through style variation and in-context learning.
Style-Compress outperforms two baseline compression models in four tasks: original prompt reconstruction, text summarization, multi-hop QA, and CoT reasoning.
arXiv Detail & Related papers (2024-10-17T21:35:49Z) - Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning [63.43972993473501]
Token compression expedites the training and inference of Vision Transformers (ViTs)
However, when applied to downstream tasks, compression degrees are mismatched between training and inference stages.
We propose a model arithmetic framework to decouple the compression degrees between the two stages.
arXiv Detail & Related papers (2024-08-13T10:36:43Z) - Ultra Dual-Path Compression For Joint Echo Cancellation And Noise
Suppression [38.09558772881095]
Under fixed compression ratios, dual-path compression combining both the time and frequency methods will give further performance improvement.
Proposed models show competitive performance compared with fast FullSubNet and DeepNetFilter.
arXiv Detail & Related papers (2023-08-21T21:36:56Z) - Latent Discretization for Continuous-time Sequence Compression [21.062288207034968]
In this work, we treat data sequences as observations from an underlying continuous-time process.
We show that our approaches can automatically achieve reductions in bit rates by learning how to discretize.
arXiv Detail & Related papers (2022-12-28T01:15:27Z) - On Compressing Sequences for Self-Supervised Speech Models [78.62210521316081]
We study fixed-length and variable-length subsampling along the time axis in self-supervised learning.
We find that variable-length subsampling performs particularly well under low frame rates.
If we have access to phonetic boundaries, we find no degradation in performance for an average frame rate as low as 10 Hz.
arXiv Detail & Related papers (2022-10-13T17:10:02Z) - Incremental Text to Speech for Neural Sequence-to-Sequence Models using
Reinforcement Learning [60.20205278845412]
Modern approaches to text to speech require the entire input character sequence to be processed before any audio is synthesised.
This latency limits the suitability of such models for time-sensitive tasks like simultaneous interpretation.
We propose a reinforcement learning based framework to train an agent to make this decision.
arXiv Detail & Related papers (2020-08-07T11:48:05Z) - Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we process efficient semantic video segmentation in a per-frame fashion during the inference process.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.