Related papers: Online-PVLM: Advancing Personalized VLMs with Online Concept Learning

Online-PVLM: Advancing Personalized VLMs with Online Concept Learning

URL: http://arxiv.org/abs/2511.20056v1
Date: Tue, 25 Nov 2025 08:25:30 GMT
Title: Online-PVLM: Advancing Personalized VLMs with Online Concept Learning
Authors: Huiyu Bai, Runze Wang, Zhuoyun Du, Yiyang Zhao, Fengji Zhang, Haoyu Chen, Xiaoyong Zhu, Bo Zheng, Xuejiao Zhao,
Abstract summary: Online-PVLM is a framework for online concept learning by leveraging hyperbolic representations.<n>We develop OP-Eval, a benchmark comprising 1,292 concepts and over 30K high-quality instances with diverse question types.
Score: 19.46716778297505
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Personalized Visual Language Models (VLMs) are gaining increasing attention for their formidable ability in user-specific concepts aligned interactions (e.g., identifying a user's bike). Existing methods typically require the learning of separate embeddings for each new concept, which fails to support real-time adaptation during testing. This limitation becomes particularly pronounced in large-scale scenarios, where efficient retrieval of concept embeddings is not achievable. To alleviate this gap, we propose Online-PVLM, a framework for online concept learning by leveraging hyperbolic representations. Our approach makes a train-free paradigm for concept embeddings generation at test time, making the use of personalized VLMs both scalable and efficient. In addition, we develop OP-Eval, a comprehensive and large-scale benchmark comprising 1,292 concepts and over 30K high-quality instances with diverse question types, designed to rigorously assess online concept learning in realistic scenarios. Extensive experiments demonstrate the state-of-the-art performance of our proposed framework. Our source code and dataset will be made available.

Related papers

Concept-Aware Batch Sampling Improves Language-Image Pretraining [78.53540190580189]
Concept-Aware Batch Sampling (CABS) is a simple yet effective batch sampling framework that flexibly constructs batches on-the-fly.<n>We show that CABS significantly benefits CLIP/SigLIP model classes and yields highly performant models.<n>Overall, CABS represents a strong open-source alternative to proprietary online data curation algorithms.
arXiv Detail & Related papers (2025-11-25T18:58:07Z)
FaCT: Faithful Concept Traces for Explaining Neural Network Decisions [56.796533084868884]
Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge.<n>We put emphasis on the faithfulness of concept-based explanations and propose a new model with model-inherent mechanistic concept-explanations.<n>Our concepts are shared across classes and, from any layer, their contribution to the logit and their input-visualization can be faithfully traced.
arXiv Detail & Related papers (2025-10-29T13:35:46Z)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model [51.645660375766575]
This paper proposes the first multi-concept personalization paradigm, MC-LLaVA.<n>MC-LLaVA employs a multi-concept instruction tuning strategy, effectively integrating multiple concepts in a single training step.<n> Comprehensive qualitative and quantitative experiments demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses.
arXiv Detail & Related papers (2025-03-24T16:32:17Z)
Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models [25.84386438333865]
We show that concepts and classes form a complex web of relationships, which is susceptible to degradation and needs to be preserved and augmented across experiences.<n>We propose a novel method - MuCIL - that uses multimodal concepts to perform classification without increasing the number of trainable parameters across experiences.
arXiv Detail & Related papers (2025-02-27T18:59:29Z)
CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification [7.641602003136712]
Concept Bottleneck Models (CBMs) tackle the latter by constraining the model output on a set of predefined and human-interpretable concepts.<n>We propose a simple, yet effective, methodology, CBVLM, which tackles both of the aforementioned challenges.<n>We validate our approach with extensive experiments across four medical datasets and twelve LVLMs.
arXiv Detail & Related papers (2025-01-21T16:38:04Z)
MC-LLaVA: Multi-Concept Personalized Vision-Language Model [51.645660375766575]
This paper proposes the first multi-concept personalization paradigm, MC-LLaVA.<n>MC-LLaVA employs a multi-concept instruction tuning strategy, effectively integrating multiple concepts in a single training step.<n> Comprehensive qualitative and quantitative experiments demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses.
arXiv Detail & Related papers (2024-11-18T16:33:52Z)
MyVLM: Personalizing VLMs for User-Specific Queries [78.33252556805931]
We take a first step toward the personalization of vision-language models, enabling them to learn and reason over user-provided concepts. To effectively recognize a variety of user-specific concepts, we augment the VLM with external concept heads that function as toggles for the model. Having recognized the concept, we learn a new concept embedding in the intermediate feature space of the VLM. This embedding is tasked with guiding the language model to naturally integrate the target concept in its generated response.
arXiv Detail & Related papers (2024-03-21T17:51:01Z)
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering [95.35905804211698]
We propose a competence-aware curriculum for visual concept learning in a question-answering manner. We design a neural-symbolic concept learner for learning the visual concepts and a multi-dimensional Item Response Theory (mIRT) model for guiding the learning process. Experimental results on CLEVR show that with a competence-aware curriculum, the proposed method achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-07-03T05:08:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.