On Compressing Sequences for Self-Supervised Speech Models
- URL: http://arxiv.org/abs/2210.07189v2
- Date: Fri, 14 Oct 2022 15:21:22 GMT
- Title: On Compressing Sequences for Self-Supervised Speech Models
- Authors: Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia,
Hung-yi Lee, Hao Tang
- Abstract summary: We study fixed-length and variable-length subsampling along the time axis in self-supervised learning.
We find that variable-length subsampling performs particularly well under low frame rates.
If we have access to phonetic boundaries, we find no degradation in performance for an average frame rate as low as 10 Hz.
- Score: 78.62210521316081
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compressing self-supervised models has become increasingly necessary as
these models grow larger. While previous approaches have primarily
focused on compressing the model size, shortening sequences is also effective
in reducing the computational cost. In this work, we study fixed-length and
variable-length subsampling along the time axis in self-supervised learning. We
explore how individual downstream tasks are sensitive to input frame rates.
Subsampling while training self-supervised models not only improves the overall
performance on downstream tasks under certain frame rates, but also brings a
significant speed-up in inference. Variable-length subsampling performs
particularly well under low frame rates. In addition, if we have access to
phonetic boundaries, we find no degradation in performance for an average frame
rate as low as 10 Hz.
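As a rough illustration of the two subsampling schemes compared in the paper, the sketch below contrasts fixed-length subsampling (mean pooling at a fixed stride) with variable-length subsampling (mean pooling within segments, e.g. those given by phonetic boundaries). This is not the authors' implementation; the helper names, the 50 Hz input frame rate, and the example boundaries are assumptions for illustration only.
```python
import numpy as np

def fixed_length_subsample(frames: np.ndarray, stride: int) -> np.ndarray:
    """Fixed-length subsampling: mean-pool every `stride` consecutive frames."""
    n_out = len(frames) // stride
    trimmed = frames[: n_out * stride].reshape(n_out, stride, -1)
    return trimmed.mean(axis=1)

def variable_length_subsample(frames: np.ndarray, boundaries: list[int]) -> np.ndarray:
    """Variable-length subsampling: mean-pool frames within each segment.

    `boundaries` are segment end indices (e.g. phonetic boundaries),
    with the last entry equal to len(frames).
    """
    pooled, start = [], 0
    for end in boundaries:
        pooled.append(frames[start:end].mean(axis=0))
        start = end
    return np.stack(pooled)

# 2 seconds of 50 Hz frame-level features (100 frames, 64 dims) -- made-up data.
feats = np.random.default_rng(0).standard_normal((100, 64))

# Fixed-length: stride 5 turns 50 Hz into a 10 Hz average frame rate.
print(fixed_length_subsample(feats, stride=5).shape)                  # (20, 64)

# Variable-length: one pooled vector per hypothetical phone segment.
print(variable_length_subsample(feats, [12, 30, 47, 71, 100]).shape)  # (5, 64)
```
In the actual models the pooling would act on intermediate representations inside the network rather than on raw features, and the boundaries would come from forced alignment or a learned segmenter; the sketch only shows the shape bookkeeping.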
Related papers
- Efficient Continuous Video Flow Model for Video Prediction [43.16308241800144]
Multi-step prediction models, such as diffusion and rectified flow models, exhibit higher latency in sampling new frames compared to single-step methods.
We propose a novel approach to modeling the multi-step process, aimed at alleviating latency constraints and facilitating the adaptation of such processes for video prediction tasks.
arXiv Detail & Related papers (2024-12-07T12:11:25Z)
- DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models [55.608981341747246]
We introduce Data Adaptive Self-Supervised Early Exit (DAISY), an approach that decides when to exit based on the self-supervised loss.
Our analysis of the adaptivity of DAISY shows that the model exits early (using fewer layers) on clean data and late (using more layers) on noisy data.
arXiv Detail & Related papers (2024-06-08T12:58:13Z)
- HumMUSS: Human Motion Understanding using State Space Models [6.821961232645209]
We propose a novel attention-free model for human motion understanding, building upon recent advancements in state space models.
Our model supports both offline and real-time applications.
For real-time sequential prediction, our model is both memory efficient and several times faster than transformer-based approaches.
arXiv Detail & Related papers (2024-04-16T19:59:21Z)
- Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image through thousands of denoising steps, usually suffers from slow inference.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
- Efficient Video Prediction via Sparsely Conditioned Flow Matching [24.32740918613266]
We introduce a novel generative model for video prediction based on latent flow matching.
We call our model Random frame conditioned flow Integration for VidEo pRediction, or, in short, RIVER.
arXiv Detail & Related papers (2022-11-26T14:18:50Z)
- Once-for-All Sequence Compression for Self-Supervised Speech Models [62.60723685118747]
We introduce a once-for-all sequence compression framework for self-supervised speech models.
The framework is evaluated on various tasks, showing marginal degradation compared to fixed compression-rate variants.
We also explore adaptive compression-rate learning, demonstrating the ability to select task-specific preferred frame periods without a grid search.
arXiv Detail & Related papers (2022-11-04T09:19:13Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution and design new knowledge distillation methods to narrow the performance gap between compact and large models.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.