LMC: Large Model Collaboration with Cross-assessment for Training-Free
Open-Set Object Recognition
- URL: http://arxiv.org/abs/2309.12780v3
- Date: Thu, 21 Dec 2023 05:52:52 GMT
- Title: LMC: Large Model Collaboration with Cross-assessment for Training-Free
Open-Set Object Recognition
- Authors: Haoxuan Qu, Xiaofei Hui, Yujun Cai, Jun Liu
- Abstract summary: We propose a novel framework named Large Model Collaboration (LMC), which reduces the reliance on spurious-discriminative features in open-set object recognition by having different off-the-shelf large models collaborate in a training-free manner.
The framework also incorporates several novel designs to effectively extract implicit knowledge from large models.
- Score: 13.703679771847506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-set object recognition aims to identify if an object is from a class
that has been encountered during training or not. To perform open-set object
recognition accurately, a key challenge is how to reduce the reliance on
spurious-discriminative features. In this paper, motivated by the observation
that large models pre-trained through different paradigms can possess rich yet
distinct implicit knowledge, we propose a novel framework named Large Model
Collaboration (LMC) to tackle the above challenge by having different
off-the-shelf large models collaborate in a training-free manner. Moreover, we
also equip the proposed framework with several novel designs to
effectively extract implicit knowledge from large models. Extensive experiments
demonstrate the efficacy of our proposed framework. Code is available at
https://github.com/Harryqu123/LMC
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- UniFS: Universal Few-shot Instance Perception with Point Representations [36.943019984075065]
We propose UniFS, a universal few-shot instance perception model that unifies a wide range of instance perception tasks.
Our approach makes minimal assumptions about the tasks, yet it achieves competitive results compared to highly specialized and well-optimized specialist models.
arXiv Detail & Related papers (2024-04-30T09:47:44Z)
- Generative Active Learning for Image Synthesis Personalization [57.01364199734464]
This paper explores the application of active learning, traditionally studied in the context of discriminative models, to generative models.
The primary challenge in conducting active learning on generative models lies in the open-ended nature of querying.
We introduce the concept of anchor directions to transform the querying process into a semi-open problem.
arXiv Detail & Related papers (2024-03-22T06:45:45Z)
- MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks [59.09343552273045]
We propose a decoder-only model for multimodal tasks, which is surprisingly effective at jointly learning disparate vision-language tasks.
We demonstrate that joint learning of these diverse objectives is simple, effective, and maximizes the weight-sharing of the model across these tasks.
Our model achieves the state of the art on image-text and text-image retrieval, video question answering and open-vocabulary detection tasks, outperforming much larger and more extensively trained foundational models.
arXiv Detail & Related papers (2023-03-29T16:42:30Z)
- Prototype-guided Cross-task Knowledge Distillation for Large-scale Models [103.04711721343278]
Cross-task knowledge distillation helps to train a small student model to obtain a competitive performance.
We propose a Prototype-guided Cross-task Knowledge Distillation (ProC-KD) approach to transfer the intrinsic local-level object knowledge of a large-scale teacher network to various task scenarios.
arXiv Detail & Related papers (2022-12-26T15:00:42Z)
- Object Pursuit: Building a Space of Objects via Discriminative Weight Generation [23.85039747700698]
We propose a framework to continuously learn object-centric representations for visual learning and understanding.
We leverage interactions to sample diverse variations of an object and the corresponding training signals while learning the object-centric representations.
We perform an extensive study of the key features of the proposed framework and analyze the characteristics of the learned representations.
arXiv Detail & Related papers (2021-12-15T08:25:30Z)
- An Explicit-Joint and Supervised-Contrastive Learning Framework for Few-Shot Intent Classification and Slot Filling [12.85364483952161]
Intent classification (IC) and slot filling (SF) are critical building blocks in task-oriented dialogue systems.
Few IC/SF models perform well when the number of training samples per class is quite small.
We propose a novel explicit-joint and supervised-contrastive learning framework for few-shot intent classification and slot filling.
arXiv Detail & Related papers (2021-10-26T13:28:28Z)
- Learning from demonstration using products of experts: applications to manipulation and task prioritization [12.378784643460474]
We show that the fusion of models in different task spaces can be expressed as a product of experts (PoE).
Multiple experiments are presented to show that learning the different models jointly in the PoE framework significantly improves the quality of the model.
arXiv Detail & Related papers (2020-10-07T16:24:41Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of deep learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objectives are used to furnish a plausible attack to the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.