Group Communication with Context Codec for Ultra-Lightweight Source Separation
- URL: http://arxiv.org/abs/2012.07291v1
- Date: Mon, 14 Dec 2020 06:57:58 GMT
- Title: Group Communication with Context Codec for Ultra-Lightweight Source Separation
- Authors: Yi Luo, Cong Han, Nima Mesgarani
- Abstract summary: We propose the group communication with context codec (GC3) design to decrease both model size and complexity without sacrificing the model performance.
GC3 can achieve performance on par with or better than a wide range of baseline architectures with as little as 2.5% of their model size.
- Score: 32.975741399690214
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Ultra-lightweight model design is an important topic for the deployment of
existing speech enhancement and source separation techniques on low-resource
platforms. Various lightweight model design paradigms have been proposed in
recent years; however, most models still struggle to balance model size,
model complexity, and model performance. In this paper, we propose
the group communication with context codec (GC3) design to decrease both model
size and complexity without sacrificing the model performance. Group
communication splits a high-dimensional feature into groups of low-dimensional
features and applies a module to capture the inter-group dependency. A model
can then be applied to the groups in parallel with a significantly smaller
width. A context codec is applied to decrease the length of a sequential
feature, where a context encoder compresses the temporal context of local
features into a single feature representing the global characteristics of the
context, and a context decoder decompresses the transformed global features
back to the context features. Experimental results show that GC3 can achieve
performance on par with or better than a wide range of baseline architectures
with as little as 2.5% of their model size.
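Read together, the two ideas are concrete enough to sketch. Below is a minimal PyTorch-style rendering written from the abstract alone; the GRU modules, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two GC3 ingredients described in the abstract.
# All module choices (GRUs) and sizes are assumptions for illustration.
import torch
import torch.nn as nn

class GroupComm(nn.Module):
    """Split a width-N feature into K groups of width N//K and let a small
    module capture the inter-group dependency, so the main separation model
    can run on each narrow group in parallel."""
    def __init__(self, width, num_groups):
        super().__init__()
        assert width % num_groups == 0
        self.num_groups = num_groups
        self.group_width = width // num_groups
        # Assumed inter-group module: a tiny GRU applied across the group axis.
        self.inter = nn.GRU(self.group_width, self.group_width, batch_first=True)

    def forward(self, x):                      # x: (batch*time, width)
        g = x.view(-1, self.num_groups, self.group_width)
        g, _ = self.inter(g)                   # communicate across groups
        return g                               # (batch*time, K, width//K)

class ContextCodec(nn.Module):
    """Compress each non-overlapping context of C frames into one summary
    frame (encoder), and expand the processed summaries back to the original
    sequence length (decoder)."""
    def __init__(self, width, context):
        super().__init__()
        self.context = context
        self.enc = nn.GRU(width, width, batch_first=True)
        self.dec = nn.GRU(width, width, batch_first=True)

    def encode(self, x):                       # x: (batch, time, width)
        b, t, d = x.shape
        assert t % self.context == 0
        blocks = x.reshape(b * t // self.context, self.context, d)
        _, h = self.enc(blocks)                # final hidden state = summary
        return h[-1].view(b, t // self.context, d)

    def decode(self, s):                       # s: (batch, time//C, width)
        b, n, d = s.shape
        rep = s.repeat_interleave(self.context, dim=1)   # broadcast summaries
        out, _ = self.dec(rep.reshape(b * n, self.context, d))
        return out.reshape(b, n * self.context, d)
```

The size reduction comes from the group axis: with K groups, the separation model applied per group is roughly K times narrower, while the context codec shortens the sequence the expensive model has to process by a factor of C.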
Related papers
- Collective Model Intelligence Requires Compatible Specialization [29.590052023903457]
We show that as models specialize, the similarity in their feature space structure diminishes, hindering their capacity for collective use.
We propose a new direction for achieving collective model intelligence through what we call compatible specialization.
arXiv Detail & Related papers (2024-11-04T15:59:16Z)
- Two are better than one: Context window extension with multi-grained self-injection [111.1376461868317]
SharedLLM is a novel approach grounded in the design philosophy of multi-grained context compression and query-aware information retrieval.
We introduce a specialized tree-style data structure to efficiently encode, store and retrieve multi-grained contextual information for text chunks.
arXiv Detail & Related papers (2024-10-25T06:08:59Z)
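As a rough illustration only, a tree of multi-grained context for the entry above might look like the following; the node fields and the query-aware descent are guesses for exposition, not SharedLLM's actual design.

```python
# Hypothetical chunk tree: coarse summaries near the root, raw text at the
# leaves; a query expands only the branches it finds relevant.
from dataclasses import dataclass, field

@dataclass
class ChunkNode:
    text: str                                  # raw text span of this node
    summary: str                               # coarser-grained compressed view
    children: list = field(default_factory=list)  # finer-grained sub-chunks

    def retrieve(self, is_relevant, budget):
        """Query-aware descent: expand into finer-grained children only when
        the query deems the node relevant and depth budget remains."""
        if budget <= 0 or not is_relevant(self.summary):
            return [self.summary]              # keep only the coarse view
        if not self.children:
            return [self.text]
        out = []
        for child in self.children:
            out.extend(child.retrieve(is_relevant, budget - 1))
        return out
```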
- HM3: Heterogeneous Multi-Class Model Merging [0.0]
We explore training-free model merging techniques to consolidate auxiliary guard-rail models into a single, multi-functional model.
We propose Heterogeneous Multi-Class Model Merging (HM3) as a simple technique for merging multi-class classifiers with heterogeneous label spaces.
We report promising results for merging BERT-based guard models, some of which attain an average F1-score higher than the source models while reducing the inference time by up to 44%.
arXiv Detail & Related papers (2024-09-27T22:42:45Z)
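For intuition only: the simplest training-free scheme for classifiers that share a backbone but carry different heads is to average the shared tensors and keep every head, as sketched below. HM3's actual recipe may well differ, and the `classifier.` prefix is an assumed naming convention.

```python
# Generic stand-in for training-free merging, not HM3 itself: average all
# shared (backbone) weights, keep each model's own classification head.
import torch

def merge_backbones(state_dicts, head_prefix="classifier."):
    merged, heads = {}, []
    for sd in state_dicts:
        heads.append({k: v for k, v in sd.items() if k.startswith(head_prefix)})
        for k, v in sd.items():
            if not k.startswith(head_prefix):
                merged[k] = merged.get(k, 0) + v / len(state_dicts)
    return merged, heads   # one shared backbone, one head per source model
```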
- GrootVL: Tree Topology is All You Need in State Space Model [66.36757400689281]
GrootVL is a versatile multimodal framework that can be applied to both visual and textual tasks.
Our method significantly outperforms existing structured state space models on image classification, object detection and segmentation.
By fine-tuning large language models, our approach achieves consistent improvements in multiple textual tasks at minor training cost.
arXiv Detail & Related papers (2024-06-04T15:09:29Z)
- Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time across different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
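Schematically, the two function classes in the contextual-MDP entry above can be written as follows, with $c$ the context, $\phi$ a feature map, and $\theta$ a linear weight vector; the notation is assumed here rather than taken from the paper.

```latex
\text{Model I (context-varying representation, shared weights):}\quad
  Q_c(s, a) \approx \langle \phi_c(s, a),\, \theta \rangle
\qquad
\text{Model II (shared representation, context-varying weights):}\quad
  Q_c(s, a) \approx \langle \phi(s, a),\, \theta_c \rangle
```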
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
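A hedged sketch of the generic big-little loop such frameworks build on: the small model drafts tokens and defers to the big model when its confidence drops. The threshold and hand-off rule are illustrative, HuggingFace-style causal LMs exposing `.logits` are assumed, and BiLD's exact fallback and rollback policies are described in the paper itself.

```python
# Generic big-little decoding sketch (illustrative, not BiLD's exact policy).
import torch

@torch.no_grad()
def big_little_decode(small, big, ids, max_new=64, threshold=0.8, eos=None):
    for _ in range(max_new):
        probs = small(ids).logits[:, -1].softmax(-1)   # small model drafts
        conf, tok = probs.max(-1)
        if conf.item() < threshold:                    # low confidence:
            tok = big(ids).logits[:, -1].argmax(-1)    # defer to the big model
        ids = torch.cat([ids, tok.view(1, 1)], dim=1)
        if eos is not None and tok.item() == eos:
            break
    return ids
```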
- Sparsity-guided Network Design for Frame Interpolation [39.828644638174225]
We present a compression-driven network design for frame interpolation algorithms.
We leverage model pruning through sparsity-inducing optimization to greatly reduce the model size.
We achieve a considerable performance gain with a quarter of the size of the original AdaCoF.
arXiv Detail & Related papers (2022-09-09T23:13:25Z)
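The two-step recipe in the entry above is easy to sketch generically: train with a sparsity-inducing penalty, then prune small-magnitude weights. The L1 form, penalty weight, and threshold below are placeholders, not values from the paper.

```python
# Generic sparsity-inducing training + magnitude pruning (illustrative).
import torch

def l1_penalty(model, lam=1e-5):
    """Added to the task loss to push weights toward zero during training."""
    return lam * sum(p.abs().sum() for p in model.parameters())

def magnitude_prune(model, threshold=1e-3):
    """Zero out weights whose magnitude training drove below the threshold."""
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() > threshold).float())
```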
- SUNet: Scale-aware Unified Network for Panoptic Segmentation [25.626882426111198]
We propose two lightweight modules to mitigate the problem of segmenting objects of various scales.
We present an end-to-end Scale-aware Unified Network (SUNet) which is more adaptable to multi-scale objects.
arXiv Detail & Related papers (2022-09-07T01:40:41Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network (MS-SGN) is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
- Model Patching: Closing the Subgroup Performance Gap with Data Augmentation [50.35010342284508]
We introduce model patching, a framework for improving robustness of machine learning models.
Model patching encourages the model to be invariant to subgroup differences and to focus on class information shared by subgroups.
We instantiate model patching with CAMEL, which (1) uses a CycleGAN to learn the intra-class, inter-subgroup augmentations, and (2) balances subgroup performance using a theoretically-motivated consistency regularizer.
We demonstrate CAMEL's effectiveness on 3 benchmark datasets, with reductions in robust error up to 33% relative to the best baseline.
arXiv Detail & Related papers (2020-08-15T20:01:23Z)
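The consistency regularizer in the model-patching entry above can be illustrated with a symmetric-KL penalty between predictions on an example and on its subgroup-translated augmentation; this specific form is an assumption for illustration, not necessarily CAMEL's exact regularizer.

```python
# Illustrative consistency regularizer: push the model's predictions on an
# example and its augmented counterpart toward each other (symmetric KL).
import torch.nn.functional as F

def consistency_loss(logits_orig, logits_aug):
    p = F.log_softmax(logits_orig, dim=-1)
    q = F.log_softmax(logits_aug, dim=-1)
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))
```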
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.