Switchable Representation Learning Framework with Self-compatibility
- URL: http://arxiv.org/abs/2206.08289v4
- Date: Thu, 23 Mar 2023 10:54:32 GMT
- Title: Switchable Representation Learning Framework with Self-compatibility
- Authors: Shengsen Wu, Yan Bai, Yihang Lou, Xiongkun Linghu, Jianzhong He and Ling-Yu Duan
- Abstract summary: We propose a Switchable representation learning Framework with Self-Compatibility (SFSC).
SFSC generates a series of compatible sub-models with different capacities through one training process.
SFSC achieves state-of-the-art performance on the evaluated datasets.
- Score: 50.48336074436792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world visual search systems involve deployments on multiple platforms
with different computing and storage resources. Deploying a unified model that
suits the most resource-constrained platform leads to limited accuracy. It is preferable
to deploy models with different capacities adapted to each platform's resource
constraints, which requires features extracted by these models to be aligned in
the metric space. The method for achieving this feature alignment is called
"compatible learning". Existing research mainly focuses on the one-to-one
compatible paradigm, which is limited in learning compatibility among multiple
models. We propose a Switchable representation learning Framework with
Self-Compatibility (SFSC). SFSC generates a series of compatible sub-models
with different capacities through one training process. The optimization of
sub-models faces gradient conflicts, and we mitigate this problem from the
perspectives of magnitude and direction. We adjust the priorities of
sub-models dynamically through uncertainty estimation to co-optimize sub-models
properly. Besides, the gradients with conflicting directions are projected to
avoid mutual interference. SFSC achieves state-of-the-art performance on the
evaluated datasets.
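The direction-side idea in the abstract (projecting gradients whose directions conflict) can be illustrated with a small sketch. This is not the authors' implementation: it is a generic PCGrad-style projection in plain Python, where a conflict is assumed to mean a negative dot product between two sub-model gradients. The names `project_conflicting` and `combine`, and the fixed `weights` standing in for the uncertainty-based priorities, are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_conflicting(grads: List[Vector]) -> List[Vector]:
    """Remove from each gradient its component along any other gradient
    it conflicts with (assumption: conflict means negative dot product)."""
    projected = []
    for i, g in enumerate(grads):
        g_new = list(g)
        for j, other in enumerate(grads):
            if i == j:
                continue
            d = dot(g_new, other)
            norm_sq = dot(other, other)
            if d < 0 and norm_sq > 0:
                # Subtract the projection of g_new onto `other`.
                scale = d / norm_sq
                g_new = [x - scale * y for x, y in zip(g_new, other)]
        projected.append(g_new)
    return projected


def combine(grads: List[Vector], weights: List[float]) -> Vector:
    """Weighted sum of gradients. The weights are a hypothetical stand-in
    for per-sub-model priorities (magnitude side of the problem)."""
    size = len(grads[0])
    return [sum(w * g[k] for w, g in zip(weights, grads)) for k in range(size)]


# Two toy sub-model gradients pointing in conflicting directions.
grads = [[1.0, 0.0], [-1.0, 1.0]]
projected = project_conflicting(grads)
update = combine(projected, weights=[0.5, 0.5])
```

After projection, each gradient is orthogonal to the one it conflicted with, so applying the combined update no longer pushes one sub-model directly against another. Gradients that do not conflict are left unchanged.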
Related papers
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- Collective Model Intelligence Requires Compatible Specialization [29.590052023903457]
We show that as models specialize, the similarity in their feature space structure diminishes, hindering their capacity for collective use.
We propose a new direction for achieving collective model intelligence through what we call compatible specialization.
arXiv Detail & Related papers (2024-11-04T15:59:16Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially.
We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation.
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.