Once-for-All Sequence Compression for Self-Supervised Speech Models
- URL: http://arxiv.org/abs/2211.02332v4
- Date: Tue, 9 May 2023 11:14:52 GMT
- Title: Once-for-All Sequence Compression for Self-Supervised Speech Models
- Authors: Hsuan-Jui Chen, Yen Meng, Hung-yi Lee
- Abstract summary: We introduce a once-for-all sequence compression framework for self-supervised speech models.
The framework is evaluated on various tasks, showing marginal degradation compared to the fixed compressing rate variants.
We also explore adaptive compressing rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid search.
- Score: 62.60723685118747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The sequence length along the time axis is often the dominant factor of the
computation in speech processing. Works have been proposed to reduce the
sequence length for lowering the computational cost in self-supervised speech
models. However, different downstream tasks have different tolerance of
sequence compressing, so a model that produces a fixed compressing rate may not
fit all tasks. In this work, we introduce a once-for-all (OFA) sequence
compression framework for self-supervised speech models that supports a
continuous range of operating compressing rates. The framework is evaluated on
various tasks, showing marginal degradation compared to the fixed compressing
rate variants with a smooth performance-efficiency trade-off. We further
explore adaptive compressing rate learning, demonstrating the ability to select
task-specific preferred frame periods without needing a grid search.
 
      
Related papers
        - KV-Distill: Nearly Lossless Learnable Context Compression for LLMs [37.0803484148612]
 We introduce KV-Distill, a Transformer compression framework that distills long context KV caches into significantly shorter representations.
 KV-Distill can be trained as a parameter-efficient adaptor for pretrained models.
It can be fine-tuned on domain-specific contexts to reduce lengths by up to 99% while preserving downstream performance.
 arXiv  Detail & Related papers  (2025-03-13T13:15:28Z)
- Choose Your Model Size: Any Compression by a Single Gradient Descent [9.074689052563878]
 We present Any Compression via Iterative Pruning (ACIP).
ACIP is an algorithmic approach to determine a compression-performance trade-off from a single gradient descent run.
We show that ACIP seamlessly complements common quantization-based compression techniques.
 arXiv  Detail & Related papers  (2025-02-03T18:40:58Z)
- Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles [49.65811277223873]
 Style-Compress is a lightweight framework that adapts a smaller language model to compress prompts for a larger model on a new task without additional training.
Our approach iteratively generates and selects effective compressed prompts as task-specific demonstrations through style variation and in-context learning.
Style-Compress outperforms two baseline compression models in four tasks: original prompt reconstruction, text summarization, multi-hop QA, and CoT reasoning.
 arXiv  Detail & Related papers  (2024-10-17T21:35:49Z)
- Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning [63.43972993473501]
 Token compression expedites the training and inference of Vision Transformers (ViTs).
However, when applied to downstream tasks, compression degrees are mismatched between training and inference stages.
We propose a model arithmetic framework to decouple the compression degrees between the two stages.
 arXiv  Detail & Related papers  (2024-08-13T10:36:43Z)
- Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression [38.09558772881095]
 Under fixed compression ratios, dual-path compression combining both the time and frequency methods will give further performance improvement.
Proposed models show competitive performance compared with fast FullSubNet and DeepNetFilter.
 arXiv  Detail & Related papers  (2023-08-21T21:36:56Z)
- Latent Discretization for Continuous-time Sequence Compression [21.062288207034968]
 In this work, we treat data sequences as observations from an underlying continuous-time process.
We show that our approaches can automatically achieve reductions in bit rates by learning how to discretize.
 arXiv  Detail & Related papers  (2022-12-28T01:15:27Z)
- On Compressing Sequences for Self-Supervised Speech Models [78.62210521316081]
 We study fixed-length and variable-length subsampling along the time axis in self-supervised learning.
We find that variable-length subsampling performs particularly well under low frame rates.
If we have access to phonetic boundaries, we find no degradation in performance for an average frame rate as low as 10 Hz.
 arXiv  Detail & Related papers  (2022-10-13T17:10:02Z)
- Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning [60.20205278845412]
 Modern approaches to text to speech require the entire input character sequence to be processed before any audio is synthesised.
This latency limits the suitability of such models for time-sensitive tasks like simultaneous interpretation.
We propose a reinforcement learning based framework to train an agent to make this decision.
 arXiv  Detail & Related papers  (2020-08-07T11:48:05Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
 In this work, we process efficient semantic video segmentation in a per-frame fashion during the inference process.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
 arXiv  Detail & Related papers  (2020-02-26T12:24:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     