Analyzing Multimodal Objectives Through the Lens of Generative Diffusion Guidance
- URL: http://arxiv.org/abs/2302.10305v1
- Date: Fri, 10 Feb 2023 11:17:20 GMT
- Title: Analyzing Multimodal Objectives Through the Lens of Generative Diffusion Guidance
- Authors: Chaerin Kong, Nojun Kwak
- Abstract summary: We leverage the fact that classifier-guided diffusion models generate images that reflect the semantic signals provided by the classifier.
Specifically, we compare the contrastive, matching, and captioning losses in terms of their semantic signals, and introduce a simple baseline that not only supports our analyses but also improves the quality of generative guidance.
- Score: 34.27851973031995
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent years have witnessed astonishing advances in the field of multimodal
representation learning, with contrastive learning being the cornerstone for
major breakthroughs. Latest works delivered further improvements by
incorporating different objectives such as masked modeling and captioning into
the frameworks, but our understanding of how these objectives facilitate
learning remains vastly incomplete. In this paper, we leverage the fact that
classifier-guided diffusion models generate images that reflect the semantic
signals provided by the classifier to study the characteristics of multimodal
learning objectives. Specifically, we compare the contrastive, matching, and
captioning losses in terms of their semantic signals, and introduce a simple
baseline that not only supports our analyses but also improves the quality of
generative guidance in a straightforward manner.
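The mechanism the abstract relies on is classifier guidance: during reverse diffusion, the unconditional score is shifted by the gradient of a classifier's log-probability, so the sampler drifts toward samples the classifier associates with the target label. The toy sketch below illustrates this idea only; the quadratic "score model", the Gaussian "classifier", and the Langevin-style update are hypothetical stand-ins, not the paper's actual models or sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

def score_model(x, t):
    # Toy unconditional score (t is unused here): pulls samples toward the origin,
    # i.e., the score of a standard Gaussian prior.
    return -x

def classifier_log_prob_grad(x, y, t):
    # Toy "classifier" gradient: the gradient of -0.5 * ||x - mu_y||^2,
    # which points from x toward a class-specific mean mu_y.
    class_means = {0: np.array([2.0, 0.0]), 1: np.array([-2.0, 0.0])}
    return class_means[y] - x

def guided_sample(y, guidance_scale=1.0, steps=50, step_size=0.1):
    # Langevin-style sampling with the guided score
    #   grad log p(x | y) ~ grad log p(x) + s * grad log p(y | x),
    # where s is the guidance scale.
    x = rng.standard_normal(2)
    for t in range(steps, 0, -1):
        g = score_model(x, t) + guidance_scale * classifier_log_prob_grad(x, y, t)
        noise = rng.standard_normal(2) * np.sqrt(step_size)
        x = x + 0.5 * step_size * g + noise
    return x

# Samples guided toward class 0 concentrate near its mean [2, 0];
# class 1 samples concentrate near [-2, 0].
print(guided_sample(y=0, guidance_scale=2.0))
print(guided_sample(y=1, guidance_scale=2.0))
```

Because the guidance term enters only through the gradient of the classifier's log-probability, the generated samples directly reflect whatever semantic signal the classifier (or, in the paper, a multimodal loss) provides, which is what makes guided generation a probe for comparing those objectives.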
Related papers
- Towards Generative Class Prompt Learning for Fine-grained Visual Recognition [5.633314115420456]
Generative Class Prompt Learning and Contrastive Multi-class Prompt Learning are presented.
Generative Class Prompt Learning improves visio-linguistic synergy in class embeddings by conditioning on few-shot exemplars with learnable class prompts.
CoMPLe builds on this foundation by introducing a contrastive learning component that encourages inter-class separation.
arXiv Detail & Related papers (2024-09-03T12:34:21Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Heterogeneous Contrastive Learning for Foundation Models and Beyond [73.74745053250619]
In the era of big data and Artificial Intelligence, an emerging paradigm is to utilize contrastive self-supervised learning to model large-scale heterogeneous data.
This survey critically evaluates the current landscape of heterogeneous contrastive learning for foundation models.
arXiv Detail & Related papers (2024-03-30T02:55:49Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations [13.437097059358067]
We show that reconstruction-based learning features are significantly dissimilar to joint-embedding based learning features.
We find that joint-embedding features yield better linear probe transfer for classification because the different objectives drive different distributions of information.
arXiv Detail & Related papers (2023-04-25T18:48:23Z)
- Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis [15.623293264871181]
This study investigates approaches to improving modality representations with contrastive learning.
We devise a three-stage framework with multi-view contrastive learning to refine representations for the specific objectives.
We conduct experiments on three open datasets, and the results demonstrate the advantage of our model.
arXiv Detail & Related papers (2022-10-28T01:25:16Z)
- Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? [0.06299766708197882]
We create a new task targeted at evaluating understanding of predicate-noun dependencies in a controlled setup.
We evaluate a range of state-of-the-art models and find that their performance on the task varies considerably.
This study highlights that targeted and controlled evaluations are a crucial step for a precise and rigorous test of the multimodal knowledge of vision-and-language models.
arXiv Detail & Related papers (2022-10-21T16:07:00Z)
- Learning Transferable Adversarial Robust Representations via Multi-view Consistency [57.73073964318167]
We propose a novel meta-adversarial multi-view representation learning framework with dual encoders.
We demonstrate the effectiveness of our framework on few-shot learning tasks from unseen domains.
arXiv Detail & Related papers (2022-10-19T11:48:01Z)
- Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
- Revisiting Meta-Learning as Supervised Learning [69.2067288158133]
We aim to provide a principled, unifying framework by revisiting and strengthening the connection between meta-learning and traditional supervised learning.
By treating pairs of task-specific data sets and target models as (feature, label) samples, we can reduce many meta-learning algorithms to instances of supervised learning.
This view not only unifies meta-learning into an intuitive and practical framework but also allows us to transfer insights from supervised learning directly to improve meta-learning.
arXiv Detail & Related papers (2020-02-03T06:13:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.