DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
- URL: http://arxiv.org/abs/2503.03651v1
- Date: Wed, 05 Mar 2025 16:26:58 GMT
- Title: DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
- Authors: Rui Zhao, Weijia Mao, Mike Zheng Shou
- Abstract summary: We propose DoraCycle, which integrates two multimodal cycles: text-to-image-to-text and image-to-text-to-image. The model is optimized through cross-entropy loss computed at the cycle endpoints, where both endpoints share the same modality. For tasks involving new paired knowledge, such as specific identities, a combination of a small set of paired image-text examples and larger-scale unpaired data is sufficient.
- Score: 19.096747443000194
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Adapting generative models to specific domains presents an effective solution for satisfying specialized requirements. However, adapting to some complex domains remains challenging, especially when these domains require substantial paired data to capture the targeted distributions. Since unpaired data from a single modality, such as vision or language, is more readily available, we utilize the bidirectional mappings between vision and language learned by the unified generative model to enable training on unpaired data for domain adaptation. Specifically, we propose DoraCycle, which integrates two multimodal cycles: text-to-image-to-text and image-to-text-to-image. The model is optimized through cross-entropy loss computed at the cycle endpoints, where both endpoints share the same modality. This facilitates self-evolution of the model without reliance on annotated text-image pairs. Experimental results demonstrate that for tasks independent of paired knowledge, such as stylization, DoraCycle can effectively adapt the unified model using only unpaired data. For tasks involving new paired knowledge, such as specific identities, a combination of a small set of paired image-text examples and larger-scale unpaired data is sufficient for effective domain-oriented adaptation. The code will be released at https://github.com/showlab/DoraCycle.
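The cycle objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy mappings, the greedy decoding of the intermediate modality, and all names (make_toy_map, cycle_loss, t2i, i2t) are assumptions made for illustration. The sketch only evaluates the loss value and does not address how gradients pass through the discrete intermediate tokens.

```python
import math
from typing import Callable, List, Sequence

Logits = List[List[float]]  # one row of vocabulary scores per token position
TokenMap = Callable[[Sequence[int]], Logits]


def cross_entropy(logits: Logits, targets: Sequence[int]) -> float:
    """Mean token-level cross-entropy between logits and target token ids."""
    total = 0.0
    for row, target in zip(logits, targets):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target]
    return total / len(targets)


def greedy_decode(logits: Logits) -> List[int]:
    """Pick the highest-scoring token at each position."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def cycle_loss(tokens: Sequence[int], first: TokenMap, second: TokenMap) -> float:
    """Map A -> B -> A and compare the endpoint with the starting tokens."""
    intermediate = greedy_decode(first(tokens))
    return cross_entropy(second(intermediate), tokens)


def make_toy_map(vocab_size: int, shift: int, sharpness: float = 10.0) -> TokenMap:
    """Hypothetical stand-in for a model: logits peak at (token + shift) mod vocab."""
    def mapping(tokens: Sequence[int]) -> Logits:
        return [
            [sharpness if v == (t + shift) % vocab_size else 0.0
             for v in range(vocab_size)]
            for t in tokens
        ]
    return mapping


# Toy "text-to-image" and "image-to-text" maps that invert each other.
t2i = make_toy_map(vocab_size=4, shift=1)
i2t = make_toy_map(vocab_size=4, shift=-1)

text_tokens = [0, 1, 2, 3]    # unpaired text sample
image_tokens = [3, 3, 0, 1]   # unpaired image sample (as discrete tokens)

# Text-to-image-to-text cycle plus image-to-text-to-image cycle.
total_loss = (cycle_loss(text_tokens, t2i, i2t)
              + cycle_loss(image_tokens, i2t, t2i))

# A backward map that does not invert the forward map gives a large loss.
mismatched_loss = cycle_loss(text_tokens, t2i, make_toy_map(4, 0))
```

Because each cycle starts and ends in the same modality, the starting tokens serve as their own targets, so no annotated text-image pairs are needed to compute the loss.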
Related papers
- Cross-Domain Content Generation with Domain-Specific Small Language Models [3.2772349789781616]
This study explores methods to enable a small language model to produce coherent and relevant outputs for two different domains.
We find that utilizing custom tokenizers tailored to each dataset significantly enhances generation quality.
Our findings demonstrate that knowledge expansion with frozen layers is an effective method for small language models to generate domain-specific content.
arXiv Detail & Related papers (2024-09-19T21:45:13Z) - Multi-source Unsupervised Domain Adaptation on Graphs with Transferability Modeling [35.39202826643388]
We present the framework Selective Multi-source Adaptation for Graph (method), with a graph-modeling-based domain selector, a sub-graph node selector, and a bi-level alignment objective.
Results on five graph datasets show the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-06-14T22:05:21Z) - SALUDA: Surface-based Automotive Lidar Unsupervised Domain Adaptation [62.889835139583965]
We introduce an unsupervised auxiliary task of learning an implicit underlying surface representation simultaneously on source and target data.
As both domains share the same latent representation, the model is forced to accommodate discrepancies between the two sources of data.
Our experiments demonstrate that our method achieves a better performance than the current state of the art, both in real-to-real and synthetic-to-real scenarios.
arXiv Detail & Related papers (2023-04-06T17:36:23Z) - BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency [66.8685113725007]
BiCro aims to estimate soft labels for noisy data pairs to reflect their true correspondence degree.
Experiments on three popular cross-modal matching datasets demonstrate that BiCro significantly improves the noise-robustness of various matching models.
arXiv Detail & Related papers (2023-03-22T09:33:50Z) - Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic Image Classification [61.656149405657246]
Domain adaptation is effective in image classification tasks where obtaining sufficient label data is challenging.
We propose a novel method, named SELDA, for stacking ensemble learning via extending three domain adaptation methods.
Experimental results on the Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-09-27T14:19:00Z) - Adapting the Mean Teacher for keypoint-based lung registration under geometric domain shifts [75.51482952586773]
Deep neural networks generally require plenty of labeled training data and are vulnerable to domain shifts between training and test data.
We present a novel approach to geometric domain adaptation for image registration, adapting a model from a labeled source to an unlabeled target domain.
Our method consistently improves on the baseline model by 50%/47% while even matching the accuracy of models trained on target data.
arXiv Detail & Related papers (2022-07-01T12:16:42Z) - PixMatch: Unsupervised Domain Adaptation via Pixelwise Consistency Training [4.336877104987131]
Unsupervised domain adaptation is a promising technique for semantic segmentation.
We present a novel framework for unsupervised domain adaptation based on the notion of target-domain consistency training.
Our approach is simpler, easier to implement, and more memory-efficient during training.
arXiv Detail & Related papers (2021-05-17T19:36:28Z) - Cross-modal Learning for Domain Adaptation in 3D Semantic Segmentation [11.895722159139108]
Domain adaptation is an important task to enable learning when labels are scarce.
We propose cross-modal learning, where we enforce consistency between the predictions of two modalities via mutual mimicking.
We constrain our network to make correct predictions on labeled data and consistent predictions across modalities on unlabeled target-domain data.
arXiv Detail & Related papers (2021-01-18T18:59:21Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision [73.76277367528657]
Convolutional neural network-based approaches have achieved remarkable progress in semantic segmentation.
To cope with this limitation, automatically annotated data generated from graphic engines are used to train segmentation models.
We propose a two-step self-supervised domain adaptation approach to minimize the inter-domain and intra-domain gap together.
arXiv Detail & Related papers (2020-04-16T15:24:11Z)