Related papers: Once-for-All Sequence Compression for Self-Supervised Speech Models

URL: http://arxiv.org/abs/2211.02332v4
Date: Tue, 9 May 2023 11:14:52 GMT
Title: Once-for-All Sequence Compression for Self-Supervised Speech Models
Authors: Hsuan-Jui Chen, Yen Meng, Hung-yi Lee
Abstract summary: We introduce a once-for-all sequence compression framework for self-supervised speech models. The framework is evaluated on various tasks, showing marginal degradation compared to the fixed compressing rate variants. We also explore adaptive compressing rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid search.
Score: 62.60723685118747
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The sequence length along the time axis is often the dominant factor of the computation in speech processing. Works have been proposed to reduce the sequence length for lowering the computational cost in self-supervised speech models. However, different downstream tasks have different tolerance of sequence compressing, so a model that produces a fixed compressing rate may not fit all tasks. In this work, we introduce a once-for-all (OFA) sequence compression framework for self-supervised speech models that supports a continuous range of operating compressing rates. The framework is evaluated on various tasks, showing marginal degradation compared to the fixed compressing rate variants with a smooth performance-efficiency trade-off. We further explore adaptive compressing rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid search.

Related papers

KV-Distill: Nearly Lossless Learnable Context Compression for LLMs [37.0803484148612]
We introduce KV-Distill, a Transformer compression framework that distills long context KV caches into significantly shorter representations. KV-Distill can be trained as a parameter-efficient adaptor for pretrained models. It can be fine-tuned on domain-specific contexts to reduce lengths by up to 99% while preserving downstream performance.
arXiv Detail & Related papers (2025-03-13T13:15:28Z)
Choose Your Model Size: Any Compression by a Single Gradient Descent [9.074689052563878]
We present Any Compression via Iterative Pruning (ACIP). ACIP is an algorithmic approach to determine a compression-performance trade-off from a single gradient descent run. We show that ACIP seamlessly complements common quantization-based compression techniques.
arXiv Detail & Related papers (2025-02-03T18:40:58Z)
Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles [49.65811277223873]
Style-Compress is a lightweight framework that adapts a smaller language model to compress prompts for a larger model on a new task without additional training. Our approach iteratively generates and selects effective compressed prompts as task-specific demonstrations through style variation and in-context learning. Style-Compress outperforms two baseline compression models in four tasks: original prompt reconstruction, text summarization, multi-hop QA, and CoT reasoning.
arXiv Detail & Related papers (2024-10-17T21:35:49Z)
Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning [63.43972993473501]
Token compression expedites the training and inference of Vision Transformers (ViTs). However, when applied to downstream tasks, compression degrees are mismatched between training and inference stages. We propose a model arithmetic framework to decouple the compression degrees between the two stages.
arXiv Detail & Related papers (2024-08-13T10:36:43Z)
Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression [38.09558772881095]
Under fixed compression ratios, dual-path compression combining both the time and frequency methods will give further performance improvement. Proposed models show competitive performance compared with fast FullSubNet and DeepNetFilter.
arXiv Detail & Related papers (2023-08-21T21:36:56Z)
Latent Discretization for Continuous-time Sequence Compression [21.062288207034968]
In this work, we treat data sequences as observations from an underlying continuous-time process. We show that our approaches can automatically achieve reductions in bit rates by learning how to discretize.
arXiv Detail & Related papers (2022-12-28T01:15:27Z)
On Compressing Sequences for Self-Supervised Speech Models [78.62210521316081]
We study fixed-length and variable-length subsampling along the time axis in self-supervised learning. We find that variable-length subsampling performs particularly well under low frame rates. If we have access to phonetic boundaries, we find no degradation in performance for an average frame rate as low as 10 Hz.
arXiv Detail & Related papers (2022-10-13T17:10:02Z)
Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning [60.20205278845412]
Modern approaches to text to speech require the entire input character sequence to be processed before any audio is synthesised. This latency limits the suitability of such models for time-sensitive tasks like simultaneous interpretation. We propose a reinforcement learning based framework to train an agent to make this decision.
arXiv Detail & Related papers (2020-08-07T11:48:05Z)
Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we process efficient semantic video segmentation in a per-frame fashion during the inference process. We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.