Related papers: GenView++: Unifying Adaptive View Generation and Quality-Driven Supervision for Contrastive Representation Learning

URL: http://arxiv.org/abs/2509.23770v1
Date: Sun, 28 Sep 2025 09:35:37 GMT
Title: GenView++: Unifying Adaptive View Generation and Quality-Driven Supervision for Contrastive Representation Learning
Authors: Xiaojie Li, Bei Wang, Jianlong Wu, Yue Yu, Liqiang Nie, Min Zhang
Abstract summary: GenView++ is a unified framework for image-based contrastive learning. It introduces a multi-source adaptive view generation mechanism to synthesize diverse yet semantically coherent views. A quality-driven contrastive learning mechanism assesses each pair's semantic alignment and diversity to dynamically reweight their training contribution. Experiments demonstrate the effectiveness of GenView++ across both vision and vision-language tasks.
Score: 71.47606279139679
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The success of contrastive learning depends on the construction and utilization of high-quality positive pairs. However, current methods face critical limitations on two fronts: on the construction side, both handcrafted and generative augmentations often suffer from limited diversity and risk semantic corruption; on the learning side, the absence of a quality assessment mechanism leads to suboptimal supervision where all pairs are treated equally. To tackle these challenges, we propose GenView++, a unified framework that addresses both fronts by introducing two synergistic innovations. To improve pair construction, GenView++ introduces a multi-source adaptive view generation mechanism to synthesize diverse yet semantically coherent views by dynamically modulating generative parameters across image-conditioned, text-conditioned, and image-text-conditioned strategies. Second, a quality-driven contrastive learning mechanism assesses each pair's semantic alignment and diversity to dynamically reweight their training contribution, prioritizing high-quality pairs while suppressing redundant or misaligned pairs. Extensive experiments demonstrate the effectiveness of GenView++ across both vision and vision-language tasks. For vision representation learning, it improves MoCov2 by +2.5% on ImageNet linear classification. For vision-language learning, it raises the average zero-shot classification accuracy by +12.31% over CLIP and +5.31% over SLIP across ten datasets, and further improves Flickr30k text retrieval R@5 by +3.2%. The code is available at https://github.com/xiaojieli0903/GenViewPlusPlus.
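The quality-driven reweighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the additive combination of alignment and diversity scores, the softmax normalization, and the temperature value are all assumptions made for illustration. The sketch only shows the general idea of scaling each positive pair's InfoNCE loss by a weight derived from its quality scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.2):
    """Standard InfoNCE loss for one anchor/positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


def quality_weights(alignment, diversity):
    """Hypothetical weighting: softmax over (alignment + diversity),
    rescaled so that the weights average to 1 across the batch."""
    scores = [a + d for a, d in zip(alignment, diversity)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [len(scores) * e / total for e in exps]


def weighted_contrastive_loss(pairs, negatives, alignment, diversity,
                              temperature=0.2):
    """Average of per-pair InfoNCE losses, each scaled by its quality weight."""
    weights = quality_weights(alignment, diversity)
    losses = [info_nce(a, p, negatives, temperature) for a, p in pairs]
    return sum(w * l for w, l in zip(weights, losses)) / len(pairs)
```

With equal scores for every pair, the weights are all 1 and the result reduces to the plain mean InfoNCE loss. Pairs with higher combined scores contribute more, which mirrors the stated goal of prioritizing high-quality pairs while suppressing redundant or misaligned ones.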

Related papers

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing [67.77471070868852]
DeepGen 1.0 is a lightweight 5B unified model for image generation and editing. It is trained on only 50M samples, surpassing the 80B HunyuanImage by 28% on WISE and the 27B Qwen-Image-Edit by 37% on UniREditBench. By open-sourcing our training code, weights, and datasets, we provide an efficient, high-performance alternative to democratize unified multimodal research.
arXiv Detail & Related papers (2026-02-12T17:44:24Z)
Socratic-Geo: Synthetic Data Generation and Geometric Reasoning via Multi-Agent Interaction [11.021067780524348]
Socratic-Geo is a fully autonomous framework that couples data synthesis with model learning through multi-agent interaction. Socratic-r achieves 49.11 on six benchmarks using one-quarter of baseline data, surpassing strong baselines by 2.43 points. Socratic-Generator achieves 42.4% on GenExam, establishing new state-of-the-art for open-source models.
arXiv Detail & Related papers (2026-02-03T11:42:25Z)
TULIP: Towards Unified Language-Image Pretraining [60.99500935831526]
We introduce TULIP, an open-source, drop-in replacement for existing CLIP-like models. Our method leverages generative data augmentation, enhanced image-image and text-text contrastive learning, and image/text reconstruction regularization to learn fine-grained visual features. Our approach, scaling to over 1B parameters, outperforms existing state-of-the-art (SOTA) models across benchmarks.
arXiv Detail & Related papers (2025-03-19T17:58:57Z)
G-Refine: A General Quality Refiner for Text-to-Image Generation [74.16137826891827]
We introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compromising integrity of high-quality ones. The model is composed of three interconnected modules: a perception quality indicator, an alignment quality indicator, and a general quality enhancement module. Extensive experimentation reveals that AIGIs after G-Refine outperform in 10+ quality metrics across 4 databases.
arXiv Detail & Related papers (2024-04-29T00:54:38Z)
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning [90.13980177575809]
GenView is a controllable framework that augments the diversity of positive views. We introduce a quality-driven contrastive loss, which assesses the quality of positive pairs. Thanks to the improved positive view quality and the quality-driven contrastive loss, GenView significantly improves self-supervised learning.
arXiv Detail & Related papers (2024-03-18T17:41:26Z)
Hallucination Improves the Performance of Unsupervised Visual Representation Learning [9.504503675097137]
We propose Hallucinator, which efficiently generates additional positive samples for further contrast. The Hallucinator is differentiable and creates new data in the feature space. Remarkably, we empirically show that the proposed Hallucinator generalizes well to various contrastive learning models.
arXiv Detail & Related papers (2023-07-22T21:15:56Z)
Hierarchical Contrastive Learning Enhanced Heterogeneous Graph Neural Network [59.860534520941485]
Heterogeneous graph neural networks (HGNNs), an emerging technique, have shown superior capacity for dealing with heterogeneous information networks (HINs). Recently, contrastive learning, a self-supervised method, has become one of the most exciting learning paradigms and shows great potential when there are no labels. In this paper, we study the problem of self-supervised HGNNs and propose a novel co-contrastive learning mechanism for HGNNs, named HeCo.
arXiv Detail & Related papers (2023-04-24T16:17:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
