CHIPS: Efficient CLIP Adaptation via Curvature-aware Hybrid Influence-based Data Selection
- URL: http://arxiv.org/abs/2511.18519v1
- Date: Sun, 23 Nov 2025 16:25:42 GMT
- Title: CHIPS: Efficient CLIP Adaptation via Curvature-aware Hybrid Influence-based Data Selection
- Authors: Xinlin Zhuang, Yichen Li, Xiwei Liu, Haolin Yang, Yifan Lu, Ziyun Zou, Yulong Li, Huifa Li, Dongliang Chen, Qinglei Wang, Weiyang Liu, Ying Qian, Jiangming Shi, Imran Razzak
- Abstract summary: Adapting CLIP to vertical domains is typically approached by novel fine-tuning strategies or by continual pre-training (CPT) on large domain-specific datasets. We revisit this task from a data-centric perspective: Can effective data selection substitute for large-scale datasets in CPT? We introduce CHIPS (Curvature-aware Hybrid Influence in Projection Subspace), which assigns each image-text pair a utility score that integrates three complementary factors aligned with three goals.
- Score: 41.61500990573312
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Adapting CLIP to vertical domains is typically approached by novel fine-tuning strategies or by continual pre-training (CPT) on large domain-specific datasets. Yet, data itself remains an underexplored factor in this process. We revisit this task from a data-centric perspective: Can effective data selection substitute for large-scale datasets in CPT? We introduce CHIPS (Curvature-aware Hybrid Influence in Projection Subspace), which assigns each image-text pair a utility score that integrates three complementary factors aligned with three goals: faithfulness via a curvature-aware, Newton-style alignment computed in CLIP's end-point subspace; scalability via an InfoNCE-aware curvature estimator with Johnson-Lindenstrauss (JL) sketching; and retention via a selection-aware relevance weight combined with learnability to balance target adaptation against general-domain preservation. We justify this design theoretically by proving a lower-bound guarantee on the proxy's correlation with full-parameter alignment and by characterizing the bias-variance trade-offs introduced by curvature mixing and JL sketching. We evaluate CHIPS empirically across various settings: 1) CHIPS attains state-of-the-art performance among selection baselines on 17 medical benchmarks, matches full-dataset CPT with 30% of the data, and outperforms half-dataset CPT using only 10%; 2) on 31 general-domain benchmarks, CHIPS yields the smallest performance drop under 10-30% data-retention budgets. Code, data, and checkpoints will be released.
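The scalability ingredient rests on Johnson-Lindenstrauss (JL) sketching: projecting high-dimensional gradient features through a random Gaussian matrix approximately preserves inner products, so alignment scores can be computed in the sketched space at a fraction of the cost. Below is a minimal illustrative sketch of that idea in Python; the gradient features, planted alignment direction, and dimensions are hypothetical stand-ins, not the authors' released pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 5000, 4096, 512              # pairs, gradient dim, sketch dim

# Synthetic per-pair gradient features with a planted alignment signal
# (stand-ins for CLIP end-point-subspace gradients; purely illustrative).
u = rng.normal(size=d)
u /= np.linalg.norm(u)                 # target alignment direction
coef = rng.normal(scale=10.0, size=n)  # each pair's true alignment strength
G = coef[:, None] * u + rng.normal(size=(n, d))

# Gaussian JL sketch: with k = O(log n / eps^2) rows, inner products are
# preserved up to a (1 +/- eps) factor with high probability.
S = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))

align_exact = G @ u                    # O(n * d) per scoring pass
align_sketch = (G @ S.T) @ (S @ u)     # O(n * k) once features are sketched

print(np.corrcoef(align_exact, align_sketch)[0, 1])  # close to 1
```

The paper's lower-bound guarantee concerns exactly this kind of correlation: how faithfully the cheap sketched proxy tracks full-parameter alignment.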
Related papers
- Geometric Prior-Guided Federated Prompt Calibration [21.766231067185956]
Federated Prompt Learning (FPL) offers a parameter-efficient solution for collaboratively training large models. Existing methods, focusing on aggregation or regularization, fail to address the root cause of local training bias. We propose Geometry-Guided Text Prompt (GGTPC), a novel framework that directly corrects this bias by providing clients with a global geometric prior.
arXiv Detail & Related papers (2025-12-08T06:42:32Z)
- Geometric Data Valuation via Leverage Scores [0.2538209532048866]
We propose a geometric alternative to Shapley data valuation based on statistical leverage scores. We show that our scores satisfy the dummy, efficiency, and symmetry axioms of Shapley valuation. We also show that training on a leverage-sampled subset produces a model whose parameters and predictive risk are within $O(\varepsilon)$ of the full-data optimum.
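Leverage scores have a simple closed form: score i is the i-th diagonal entry of the hat matrix H = X(XᵀX)⁻¹Xᵀ, computable stably from a thin QR factorization. A self-contained sketch (the leverage-proportional sampling at the end is one plausible way to draw the subset, not necessarily the paper's exact scheme):

```python
import numpy as np

def leverage_scores(X):
    # h_ii of the hat matrix H = X (X^T X)^{-1} X^T, computed stably:
    # with X = QR (thin QR), h_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(X, mode="reduced")
    return np.einsum("ij,ij->i", Q, Q)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
h = leverage_scores(X)
assert np.isclose(h.sum(), np.linalg.matrix_rank(X))  # scores sum to rank(X)

# Draw a leverage-proportional subset for training (assumed scheme).
p = h / h.sum()
subset = rng.choice(len(X), size=200, replace=False, p=p)
```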
arXiv Detail & Related papers (2025-11-03T22:20:50Z)
- Data-Efficient RLVR via Off-Policy Influence Guidance [84.60336960383867]
This work proposes a theoretically-grounded approach using influence functions to estimate the contribution of each data point to the learning objective. We develop Curriculum RL with Off-Policy Influence guidance (CROPI), a multi-stage RL framework that iteratively selects the most influential data for the current policy.
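Influence-based selection usually starts from a first-order proxy: score each candidate by how well its gradient aligns with the gradient of the objective you care about. The toy sketch below shows that generic proxy; it is not CROPI's off-policy estimator, whose construction is specific to the paper.

```python
import torch

def flat_grad(loss, params):
    # Concatenate parameter gradients into one flat vector.
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

model = torch.nn.Linear(16, 1)          # toy stand-in for the policy
params = list(model.parameters())
loss_fn = torch.nn.MSELoss()

# Gradient of the target objective (here: a held-out validation batch).
x_val, y_val = torch.randn(64, 16), torch.randn(64, 1)
g_target = flat_grad(loss_fn(model(x_val), y_val), params)

# First-order influence proxy per candidate: <grad_i, grad_target>.
pool = [(torch.randn(1, 16), torch.randn(1, 1)) for _ in range(100)]
scores = [
    torch.dot(flat_grad(loss_fn(model(x), y), params), g_target).item()
    for x, y in pool
]
selected = sorted(range(len(pool)), key=lambda i: -scores[i])[:10]
```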
arXiv Detail & Related papers (2025-10-30T13:40:52Z)
- CO-PFL: Contribution-Oriented Personalized Federated Learning for Heterogeneous Networks [51.43780477302533]
Contribution-Oriented PFL (CO-PFL) is a novel algorithm that dynamically estimates each client's contribution for global aggregation. CO-PFL consistently surpasses state-of-the-art methods in personalization accuracy, robustness, scalability, and convergence stability.
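Contribution-weighted aggregation itself is a small variation on FedAvg, sketched below; the per-client contribution scores are placeholders for CO-PFL's estimator, which this sketch does not reproduce.

```python
import torch

def contribution_weighted_aggregate(client_states, contributions):
    # FedAvg-style parameter averaging, weighted by each client's
    # estimated contribution rather than its sample count.
    w = torch.tensor(contributions, dtype=torch.float32)
    w = w / w.sum()
    return {
        key: sum(wi * state[key] for wi, state in zip(w, client_states))
        for key in client_states[0]
    }

# Usage: three clients with toy parameters and unequal contributions.
clients = [{"layer.weight": torch.full((2, 2), float(i))} for i in range(3)]
global_state = contribution_weighted_aggregate(clients, [0.2, 0.3, 0.5])
```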
arXiv Detail & Related papers (2025-10-23T05:10:06Z)
- Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning [71.30276778807068]
We propose a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning achieves a +38% average improvement over the full-data SFT baseline using only 12.5% of the original training data.
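Coordinated sample-and-token pruning can be pictured as two nested top-k selections, as in the sketch below; the loss-based scores are illustrative stand-ins, since Q-Tuning's actual quality criteria are defined in the paper.

```python
import torch

def joint_prune(per_token_loss, sample_keep=0.125, token_keep=0.8):
    # per_token_loss: (n_samples, seq_len) language-modeling losses.
    # Stage 1: keep the highest-scoring samples (here: mean token loss).
    sample_score = per_token_loss.mean(dim=1)
    n_keep = max(1, int(sample_keep * per_token_loss.shape[0]))
    sample_idx = sample_score.topk(n_keep).indices
    # Stage 2: within kept samples, keep the highest-scoring tokens.
    kept = per_token_loss[sample_idx]
    t_keep = max(1, int(token_keep * kept.shape[1]))
    token_idx = kept.topk(t_keep, dim=1).indices
    return sample_idx, token_idx

sample_idx, token_idx = joint_prune(torch.rand(1000, 256))
```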
arXiv Detail & Related papers (2025-09-28T13:27:38Z)
- A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning [43.847168319564844]
We propose an FL backdoor defense framework, named CLIP-Fed, that utilizes the zero-shot learning capabilities of vision-language pre-training models. Our scheme overcomes the limitations that Non-IID data imposes on defense effectiveness by integrating pre-aggregation and post-aggregation defense strategies.
arXiv Detail & Related papers (2025-08-14T03:39:54Z)
- Quality over Quantity: An Effective Large-Scale Data Reduction Strategy Based on Pointwise V-Information [2.133855532092057]
We propose an effective data reduction strategy based on Pointwise V-Information (PVI). Experiments show that classifier performance is maintained with only a 0.0001% to 0.76% decline in accuracy when 10%-30% of the data is removed. We have adapted the PVI framework, which was previously limited to English datasets, to a variety of Chinese Natural Language Processing (NLP) tasks and base models.
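Pointwise V-information measures, in bits, how much usable information an input x gives a model family about its label y: PVI(x -> y) = -log2 g'[null](y) + log2 g[x](y), where g' is fit on labels alone and g on (input, label) pairs. A minimal numeric sketch (which PVI range to drop is the strategy's decision and is assumed here):

```python
import math

def pvi(p_null_y, p_full_y_given_x):
    # PVI(x -> y) = -log2 p(y | no input) + log2 p(y | x).
    # Large positive values: x makes y much easier to predict.
    return -math.log2(p_null_y) + math.log2(p_full_y_given_x)

# A label with a 25% no-input prior that the input-conditioned model
# predicts at 90% carries about 1.85 bits of usable information.
print(pvi(0.25, 0.90))

# Reduction then sorts examples by PVI and removes a low-value slice,
# e.g. the bottom 10-30% (the exact rule is the paper's, assumed here).
```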
arXiv Detail & Related papers (2025-06-19T06:59:19Z)
- CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning [19.100022935748225]
Data selection has emerged as a core issue for large-scale visual-language model pretraining (e.g., CLIP). Three main data selection approaches are: (1) leveraging external non-CLIP models to aid data selection, (2) training new CLIP-style embedding models that are more effective at selecting high-quality data, and (3) designing better metrics or strategies universally applicable to any CLIP embedding.
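Approach (3) is often instantiated as plain CLIP-score filtering: rank candidate pairs by the cosine similarity of their image and text embeddings and keep the top fraction under the retention budget. A minimal sketch over precomputed embeddings:

```python
import torch

def clip_score_filter(img_emb, txt_emb, keep_frac=0.3):
    # Per-pair cosine similarity between CLIP image and text embeddings.
    img = torch.nn.functional.normalize(img_emb, dim=-1)
    txt = torch.nn.functional.normalize(txt_emb, dim=-1)
    scores = (img * txt).sum(dim=-1)
    # Keep the highest-scoring fraction of the candidate pool.
    k = max(1, int(keep_frac * scores.numel()))
    return scores.topk(k).indices

kept = clip_score_filter(torch.randn(10_000, 512), torch.randn(10_000, 512))
```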
arXiv Detail & Related papers (2024-05-29T22:19:57Z)
- FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs [24.991684983495542]
This paper proposes FairerCLIP, a general approach for making zero-shot predictions of CLIP more fair and robust to spurious correlations.
We formulate the problem of jointly debiasing CLIP's image and text representations in reproducing kernel Hilbert spaces (RKHSs).
arXiv Detail & Related papers (2024-03-22T19:41:26Z)
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and the classifier in an alternating optimization manner to shift the biased decision boundary effectively.
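Decoupled optimization of this kind can be sketched as two interleaved phases, each updating one module while the other stays frozen; the phase contents below are generic placeholders rather than the paper's exact procedure.

```python
import torch

backbone = torch.nn.Linear(32, 16)    # stand-in feature extractor
classifier = torch.nn.Linear(16, 4)   # stand-in classifier head
opt_b = torch.optim.SGD(backbone.parameters(), lr=1e-2)
opt_c = torch.optim.SGD(classifier.parameters(), lr=1e-2)
loss_fn = torch.nn.CrossEntropyLoss()

def alternating_step(x, y, train_backbone):
    # Freeze one module so classifier bias from imbalanced labels does
    # not steer feature learning (and vice versa).
    backbone.requires_grad_(train_backbone)
    classifier.requires_grad_(not train_backbone)
    opt = opt_b if train_backbone else opt_c
    opt.zero_grad()
    loss_fn(classifier(backbone(x)), y).backward()
    opt.step()

x, y = torch.randn(8, 32), torch.randint(0, 4, (8,))
for epoch in range(4):
    alternating_step(x, y, train_backbone=(epoch % 2 == 0))
```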
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data [122.282521548393]
Contrastive Language-Image Pre-training (CLIP) has become the standard for cross-modal image-text representation learning. We introduce HELIP, a cost-effective strategy that improves CLIP models by exploiting challenging text-image pairs within existing datasets in continuous training.
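Mining "hard" pairs can be sketched as a nearest-neighbour search in pair-embedding space, with the retrieved neighbours reused as hard negatives during continued training; this simplification is not HELIP's exact pair-matching rule.

```python
import torch

def mine_hard_pairs(pair_emb, k=5):
    # For each image-text pair, retrieve the k most similar other pairs;
    # these act as challenging negatives in continued CLIP training.
    z = torch.nn.functional.normalize(pair_emb, dim=-1)
    sim = z @ z.T
    sim.fill_diagonal_(-float("inf"))   # never match a pair with itself
    return sim.topk(k, dim=1).indices

hard = mine_hard_pairs(torch.randn(100, 512))   # (100, 5) neighbour ids
```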
arXiv Detail & Related papers (2023-05-09T07:00:17Z)
- DataComp: In search of the next generation of multimodal datasets [179.79323076587255]
DataComp is a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl.
Our benchmark consists of multiple compute scales spanning four orders of magnitude.
In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet.
arXiv Detail & Related papers (2023-04-27T11:37:18Z)