Enhancing Target-unspecific Tasks through a Features Matrix
- URL: http://arxiv.org/abs/2505.03414v5
- Date: Tue, 03 Jun 2025 04:27:33 GMT
- Title: Enhancing Target-unspecific Tasks through a Features Matrix
- Authors: Fangming Cui, Yonggang Zhang, Xuan Wang, Xinmei Tian, Jun Yu
- Abstract summary: General knowledge strongly promotes performance on target-unspecific tasks. We propose a novel Features Matrix (FM) approach designed to enhance vision-language models on target-unspecific tasks.
- Score: 28.809451200584288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in prompt learning for large Vision-Language Models (VLMs) have significantly improved performance on target-specific tasks. However, these prompting methods often struggle with target-unspecific, or generalizable, tasks. This may be because overfitting during training causes the model to forget its general knowledge, which strongly benefits target-unspecific tasks. To alleviate this issue, we propose a novel Features Matrix (FM) approach designed to enhance these models on target-unspecific tasks. Our method extracts and leverages general knowledge, shaping a Features Matrix (FM). Specifically, the FM captures the semantics of diverse inputs from a deep and fine-grained perspective, preserving essential general knowledge and mitigating the risk of overfitting. Representative evaluations demonstrate that: 1) the FM is compatible with existing frameworks as a generic and flexible module, and 2) the FM is highly effective on target-unspecific tasks (base-to-novel generalization, domain generalization, and cross-dataset generalization), achieving state-of-the-art performance.
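The abstract describes the FM only at a high level, so the paper's exact construction is not reproduced here. Below is a minimal, hypothetical sketch of how a features-matrix regularizer for prompt tuning could look, assuming a frozen CLIP-style text encoder; `encode_text`, `build_features_matrix`, and `fm_regularizer` are illustrative names, not the paper's API.

```python
# Hypothetical sketch of a features-matrix (FM) regularizer for prompt tuning.
# The idea: cache frozen-encoder features of diverse inputs as a matrix of
# "general knowledge" and penalize tuned features that drift away from it.
import torch
import torch.nn.functional as F

def build_features_matrix(encode_text, diverse_prompts):
    """Stack frozen-encoder features of diverse inputs into a matrix FM."""
    with torch.no_grad():
        rows = [encode_text(p) for p in diverse_prompts]  # each: (dim,)
    fm = torch.stack(rows)                 # (num_prompts, dim)
    return F.normalize(fm, dim=-1)         # unit rows for cosine comparisons

def fm_regularizer(tuned_features, fm):
    """Penalize tuned features far from the span of general-knowledge rows.

    tuned_features: (batch, dim) features produced with learnable prompts.
    fm:             (num_prompts, dim) frozen features matrix.
    """
    tuned = F.normalize(tuned_features, dim=-1)
    sims = tuned @ fm.T                    # cosine similarity to each FM row
    # Keep each tuned feature close to its nearest general-knowledge feature.
    return (1.0 - sims.max(dim=-1).values).mean()

# Toy usage with a placeholder encoder (a real setup would use a frozen
# CLIP text tower and add this term to the usual prompt-tuning loss).
dim = 512
weight = torch.randn(dim, dim)
encode_text = lambda p: torch.tanh(weight @ torch.randn(dim))  # placeholder
fm = build_features_matrix(
    encode_text, ["a photo of a dog", "a sketch of a car", "a painting of a tree"]
)
prompt_features = torch.randn(4, dim, requires_grad=True)
loss = fm_regularizer(prompt_features, fm)  # added to the task loss in practice
loss.backward()
```

The design choice in this sketch is to compare each tuned feature against the nearest row of the frozen matrix, so the penalty only discourages drifting outside the span of general knowledge rather than pinning every feature to a single anchor.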
Related papers
- SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning [62.18315467642528]
Multimodal Continual Instruction Tuning (MCIT) aims to enable Multimodal Large Language Models (MLLMs) to incrementally learn new tasks without catastrophic forgetting.
Superficial forgetting refers to cases where the model's knowledge may not be genuinely lost, but its responses to previous tasks deviate from expected formats.
By contrast, essential forgetting refers to situations where the model provides correctly formatted but factually inaccurate answers, indicating a true loss of knowledge.
arXiv Detail & Related papers (2025-05-05T09:09:41Z)
- Chimera: Improving Generalist Model with Domain-Specific Experts [35.706585190958634]
We introduce a scalable and low-cost multi-modal pipeline designed to boost the ability of existing LMMs with domain-specific experts.
Specifically, we design a progressive training strategy to integrate features from expert models into the input of a generalist LMM.
This results in a versatile model that excels across the chart, table, math, and document domains.
arXiv Detail & Related papers (2024-12-08T16:10:42Z)
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLMs has become a common practice for improving performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring parameter importance under both the pre-trained and fine-tuned distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction [17.44991827937427]
Masked Image Modeling techniques have redefined the landscape of computer vision.
Despite their success, the full potential of MIM-based methods in dense prediction tasks, particularly in depth estimation, remains untapped.
We propose SG-MIM, a novel Structured knowledge Guided Masked Image Modeling framework designed to enhance dense prediction tasks by utilizing structured knowledge alongside images.
arXiv Detail & Related papers (2024-09-04T08:24:53Z)
- Fully Fine-tuned CLIP Models are Efficient Few-Shot Learners [8.707819647492467]
We explore capturing task-specific information via meticulous refinement of entire Vision-Language Models (VLMs).
To mitigate the issues this raises, we propose a framework named CLIP-CITE, built around a discriminative visual-text task.
arXiv Detail & Related papers (2024-07-04T15:22:54Z)
- Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning [23.763256908202496]
Foundation models (FMs) have revolutionized the field of AI by showing remarkable performance on various tasks.
However, FMs exhibit numerous limitations that prevent their broader adoption in many real-world systems.
We propose a conceptual framework that encapsulates different modes by which agents could interact with FMs.
arXiv Detail & Related papers (2024-02-02T18:00:35Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way. A toy sketch of this three-step pipeline appears after the list below.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Specialist or Generalist? Instruction Tuning for Specific NLP Tasks [58.422495509760154]
We investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model.
Our experiments assess four target tasks with distinct coverage levels.
The benefit of incorporating generalist instruction tuning is particularly pronounced when the amount of task-specific training data is limited.
arXiv Detail & Related papers (2023-10-23T19:46:48Z)
- Learning from models beyond fine-tuning [78.20895343699658]
Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FMs) based on the model interface.
The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta-learning, and model editing.
This paper gives a comprehensive review of current FM-based methods from the perspective of LFM.
arXiv Detail & Related papers (2023-10-12T10:20:36Z)
- VideoGLUE: Video General Understanding Evaluation of Foundation Models [89.07145427268948]
We evaluate video understanding capabilities of foundation models (FMs) using a carefully designed experiment protocol.
We jointly profile FMs' efficacy and efficiency when adapting to general video understanding tasks.
arXiv Detail & Related papers (2023-07-06T17:47:52Z)
- Generalized Inverse Planning: Learning Lifted non-Markovian Utility for Generalizable Task Representation [83.55414555337154]
In this work, we study learning such lifted non-Markovian utility from human demonstrations.
We propose a new quest, Generalized Inverse Planning, for utility learning in this domain.
We outline a computational framework, Maximum Entropy Inverse Planning (MEIP), that learns non-Markovian utility and associated concepts in a generative manner.
arXiv Detail & Related papers (2020-11-12T21:06:26Z)
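As promised in the Knowledge Plugins entry above, here is a toy sketch of the three-step DOKE-style pipeline (prepare, select, express). The class, the method names, and the keyword-matching heuristic are purely illustrative assumptions, not the paper's actual implementation:

```python
# Illustrative sketch of a DOKE-style domain knowledge extractor:
# step 1 prepares knowledge, step 2 selects facts per sample, and
# step 3 expresses them as plain text an LLM can consume.
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str

class DomainKnowledgeExtractor:
    def __init__(self, facts):
        self.facts = facts  # step 1: knowledge prepared for the task

    def select(self, sample, k=3):
        """Step 2: pick the facts most relevant to one sample (toy keyword match)."""
        scored = [
            (sum(term in sample.lower() for term in (f.subject.lower(), f.obj.lower())), f)
            for f in self.facts
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [f for score, f in scored[:k] if score > 0]

    def express(self, facts):
        """Step 3: render selected facts as prompt text for the LLM."""
        return "\n".join(f"- {f.subject} {f.relation} {f.obj}." for f in facts)

# Usage: augment a recommendation query with selected domain knowledge.
extractor = DomainKnowledgeExtractor([
    Fact("The Matrix", "was directed by", "the Wachowskis"),
    Fact("The Matrix", "belongs to the genre", "science fiction"),
    Fact("Inception", "was directed by", "Christopher Nolan"),
])
question = "Recommend movies similar to The Matrix."
knowledge = extractor.express(extractor.select(question))
augmented_prompt = f"{question}\nRelevant knowledge:\n{knowledge}"
```

In a real system, step 2 would typically use a learned retriever or embedding similarity rather than keyword overlap, but the division of labor among the three steps stays the same.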
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.