Enhancing Target-unspecific Tasks through a Features Matrix
- URL: http://arxiv.org/abs/2505.03414v5
- Date: Tue, 03 Jun 2025 04:27:33 GMT
- Title: Enhancing Target-unspecific Tasks through a Features Matrix
- Authors: Fangming Cui, Yonggang Zhang, Xuan Wang, Xinmei Tian, Jun Yu
- Abstract summary: General knowledge strongly promotes performance on target-unspecific tasks. We propose a novel Features Matrix (FM) approach designed to enhance vision-language models on target-unspecific tasks.
- Score: 28.809451200584288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in prompt learning for large Vision-Language Models (VLMs) have significantly improved performance on target-specific tasks. However, these prompting methods often struggle with target-unspecific, or generalizable, tasks. This may be because overfitting during training causes the model to forget its general knowledge, which strongly benefits target-unspecific tasks. To alleviate this issue, we propose a novel Features Matrix (FM) approach designed to enhance these models on target-unspecific tasks. Our method extracts and leverages general knowledge, shaping a Features Matrix (FM). Specifically, the FM captures the semantics of diverse inputs from a deep and fine-grained perspective, preserving essential general knowledge and mitigating the risk of overfitting. Representative evaluations demonstrate that: 1) the FM is compatible with existing frameworks as a generic and flexible module, and 2) the FM is highly effective on target-unspecific tasks (base-to-novel generalization, domain generalization, and cross-dataset generalization), achieving state-of-the-art performance.
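The abstract describes the FM only at a high level, so the paper's exact construction is not reproduced here. Below is a minimal, hypothetical sketch of how a features-matrix regularizer for prompt tuning could look, assuming a frozen CLIP-style text encoder; `encode_text`, `build_features_matrix`, and `fm_regularizer` are illustrative names, not the paper's API.

```python
# Hypothetical sketch of a features-matrix (FM) regularizer for prompt tuning.
# The idea: cache frozen-encoder features of diverse inputs as a matrix of
# "general knowledge" and penalize tuned features that drift away from it.
import torch
import torch.nn.functional as F

def build_features_matrix(encode_text, diverse_prompts):
    """Stack frozen-encoder features of diverse inputs into a matrix FM."""
    with torch.no_grad():
        rows = [encode_text(p) for p in diverse_prompts]  # each: (dim,)
    fm = torch.stack(rows)                 # (num_prompts, dim)
    return F.normalize(fm, dim=-1)         # unit rows for cosine comparisons

def fm_regularizer(tuned_features, fm):
    """Penalize tuned features far from the span of general-knowledge rows.

    tuned_features: (batch, dim) features produced with learnable prompts.
    fm:             (num_prompts, dim) frozen features matrix.
    """
    tuned = F.normalize(tuned_features, dim=-1)
    sims = tuned @ fm.T                    # cosine similarity to each FM row
    # Keep each tuned feature close to its nearest general-knowledge feature.
    return (1.0 - sims.max(dim=-1).values).mean()

# Toy usage with a placeholder encoder (a real setup would use a frozen
# CLIP text tower and add this term to the usual prompt-tuning loss).
dim = 512
weight = torch.randn(dim, dim)
encode_text = lambda p: torch.tanh(weight @ torch.randn(dim))  # placeholder
fm = build_features_matrix(
    encode_text, ["a photo of a dog", "a sketch of a car", "a painting of a tree"]
)
prompt_features = torch.randn(4, dim, requires_grad=True)
loss = fm_regularizer(prompt_features, fm)  # added to the task loss in practice
loss.backward()
```

The design choice in this sketch is to compare each tuned feature against the nearest row of the frozen matrix, so the penalty only discourages drifting outside the span of general knowledge rather than pinning every feature to a single anchor.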
Related papers
- SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning [62.18315467642528]
Multimodal Continual Instruction Tuning (MCIT) aims to enable Multimodal Large Language Models (MLLMs) to incrementally learn new tasks without catastrophic forgetting.
Superficial forgetting refers to cases where the model's knowledge may not be genuinely lost, but its responses to previous tasks deviate from expected formats.
By contrast, essential forgetting refers to situations where the model provides correctly formatted but factually inaccurate answers, indicating a true loss of knowledge.
arXiv Detail & Related papers (2025-05-05T09:09:41Z)
- Chimera: Improving Generalist Model with Domain-Specific Experts [35.706585190958634]
We introduce a scalable and low-cost multi-modal pipeline designed to boost the ability of existing LMMs with domain-specific experts.
Specifically, we design a progressive training strategy to integrate features from expert models into the input of a generalist LMM.
This results in a versatile model that excels across the chart, table, math, and document domains.
arXiv Detail & Related papers (2024-12-08T16:10:42Z)
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLMs has become a common practice for improving performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring parameter importance under both the pre-trained and fine-tuned distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction [17.44991827937427]
Masked Image Modeling techniques have redefined the landscape of computer vision.
Despite their success, the full potential of MIM-based methods in dense prediction tasks, particularly in depth estimation, remains untapped.
We propose SG-MIM, a novel Structured knowledge Guided Masked Image Modeling framework designed to enhance dense prediction tasks by utilizing structured knowledge alongside images.
arXiv Detail & Related papers (2024-09-04T08:24:53Z)
- Fully Fine-tuned CLIP Models are Efficient Few-Shot Learners [8.707819647492467]
We explore capturing task-specific information via meticulous refinement of entire Vision-Language Models (VLMs).
To mitigate the issues this raises, we propose a framework named CLIP-CITE, built around a discriminative visual-text task.
arXiv Detail & Related papers (2024-07-04T15:22:54Z)
- Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning [23.763256908202496]
Foundation models (FMs) have revolutionized the field of AI by showing remarkable performance on various tasks.
However, FMs exhibit numerous limitations that prevent their broader adoption in many real-world systems.
We propose a conceptual framework that encapsulates different modes by which agents could interact with FMs.
arXiv Detail & Related papers (2024-02-02T18:00:35Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way. A toy sketch of this three-step pipeline appears after the list below.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Specialist or Generalist? Instruction Tuning for Specific NLP Tasks [58.422495509760154]
We investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model.
Our experiments assess four target tasks with distinct coverage levels.
The benefit of incorporating generalist instruction tuning is particularly pronounced when the amount of task-specific training data is limited.
arXiv Detail & Related papers (2023-10-23T19:46:48Z)
- Learning from models beyond fine-tuning [78.20895343699658]
Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FMs) based on the model interface.
The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta-learning, and model editing.
This paper gives a comprehensive review of current FM-based methods from the perspective of LFM.
arXiv Detail & Related papers (2023-10-12T10:20:36Z)
- VideoGLUE: Video General Understanding Evaluation of Foundation Models [89.07145427268948]
We evaluate video understanding capabilities of foundation models (FMs) using a carefully designed experiment protocol.
We jointly profile FMs' efficacy and efficiency when adapting to general video understanding tasks.
arXiv Detail & Related papers (2023-07-06T17:47:52Z)
- Generalized Inverse Planning: Learning Lifted non-Markovian Utility for Generalizable Task Representation [83.55414555337154]
In this work, we study learning such lifted non-Markovian utility from human demonstrations.
We propose a new quest, Generalized Inverse Planning, for utility learning in this domain.
We outline a computational framework, Maximum Entropy Inverse Planning (MEIP), that learns non-Markovian utility and associated concepts in a generative manner.
arXiv Detail & Related papers (2020-11-12T21:06:26Z)
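As promised in the Knowledge Plugins entry above, here is a toy sketch of the three-step DOKE-style pipeline (prepare, select, express). The class, the method names, and the keyword-matching heuristic are purely illustrative assumptions, not the paper's actual implementation:

```python
# Illustrative sketch of a DOKE-style domain knowledge extractor:
# step 1 prepares knowledge, step 2 selects facts per sample, and
# step 3 expresses them as plain text an LLM can consume.
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str

class DomainKnowledgeExtractor:
    def __init__(self, facts):
        self.facts = facts  # step 1: knowledge prepared for the task

    def select(self, sample, k=3):
        """Step 2: pick the facts most relevant to one sample (toy keyword match)."""
        scored = [
            (sum(term in sample.lower() for term in (f.subject.lower(), f.obj.lower())), f)
            for f in self.facts
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [f for score, f in scored[:k] if score > 0]

    def express(self, facts):
        """Step 3: render selected facts as prompt text for the LLM."""
        return "\n".join(f"- {f.subject} {f.relation} {f.obj}." for f in facts)

# Usage: augment a recommendation query with selected domain knowledge.
extractor = DomainKnowledgeExtractor([
    Fact("The Matrix", "was directed by", "the Wachowskis"),
    Fact("The Matrix", "belongs to the genre", "science fiction"),
    Fact("Inception", "was directed by", "Christopher Nolan"),
])
question = "Recommend movies similar to The Matrix."
knowledge = extractor.express(extractor.select(question))
augmented_prompt = f"{question}\nRelevant knowledge:\n{knowledge}"
```

In a real system, step 2 would typically use a learned retriever or embedding similarity rather than keyword overlap, but the division of labor among the three steps stays the same.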
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.