Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark
- URL: http://arxiv.org/abs/2108.10840v1
- Date: Tue, 24 Aug 2021 17:07:34 GMT
- Title: Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark
- Authors: Shuhao Qiu, Chuang Zhu, Wenli Zhou
- Abstract summary: In recent years, deep learning-based methods have shown promising results in computer vision.
A common deep learning model requires a large amount of labeled data, which is labor-intensive to collect and annotate.
We propose a new method called Meta Self-Learning, which combines the self-learning method with the meta-learning paradigm.
- Score: 3.6248855491320016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, deep learning-based methods have shown promising results in computer vision. However, a common deep learning model requires a large amount of labeled data, which is labor-intensive to collect and annotate. Moreover, model performance can degrade badly under the domain shift between training data and testing data. Text recognition is a broadly studied field in computer vision and suffers from the same problems due to the diversity of fonts and complicated backgrounds. In this paper, we focus on the text recognition problem and make three contributions toward these problems. First, we collect a multi-source domain adaptation dataset for text recognition, including five different domains with over five million images, which is, to the best of our knowledge, the first multi-domain text recognition dataset. Second, we propose a new method called Meta Self-Learning, which combines the self-learning method with the meta-learning paradigm and achieves better recognition results in the multi-domain adaptation setting. Third, extensive experiments are conducted on the dataset to provide a benchmark and to show the effectiveness of our method. The code and dataset will be available soon at https://bupt-ai-cz.github.io/Meta-SelfLearning/.
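The abstract describes the core algorithmic idea only at a high level. As a rough illustration, a minimal sketch of one training step that combines self-learning (confidence-thresholded pseudo-labeling on the target domain) with a first-order, MAML-style meta-update across source domains might look like the following. The update scheme, thresholds, and all function names are assumptions for illustration, not the authors' exact method.

```python
import copy
import torch
import torch.nn.functional as F

def meta_self_learning_step(model, optimizer, source_loaders, target_loader,
                            inner_lr=1e-3, conf_thresh=0.9):
    """One illustrative training step: self-learning on the unlabeled target
    domain plus a first-order MAML-style meta-update across source domains.
    This is an assumed scheme for illustration, not the authors' exact one."""
    # Self-learning: pseudo-label an unlabeled target batch with the model.
    x_tgt = next(iter(target_loader))            # assumed to yield image batches
    with torch.no_grad():
        probs = F.softmax(model(x_tgt), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        keep = conf > conf_thresh                # keep only confident predictions

    # Inner loop: adapt a copy of the model on one "meta-train" source domain.
    adapted = copy.deepcopy(model)
    x_src, y_src = next(iter(source_loaders[0]))
    inner_loss = F.cross_entropy(adapted(x_src), y_src)
    grads = torch.autograd.grad(inner_loss, adapted.parameters())
    with torch.no_grad():
        for p, g in zip(adapted.parameters(), grads):
            p -= inner_lr * g                    # one SGD step

    # Outer loss: a held-out "meta-test" source domain plus the confident
    # target pseudo-labels produced above.
    x_val, y_val = next(iter(source_loaders[1]))
    outer_loss = F.cross_entropy(adapted(x_val), y_val)
    if keep.any():
        outer_loss = outer_loss + F.cross_entropy(adapted(x_tgt[keep]), pseudo[keep])
    outer_loss.backward()

    # First-order approximation: move the adapted model's gradients back onto
    # the original parameters before stepping.
    optimizer.zero_grad()
    for p, q in zip(model.parameters(), adapted.parameters()):
        p.grad = q.grad.detach().clone()
    optimizer.step()
    return outer_loss.item()
```

In the paper's setting the model would be a text recognition network and the five domains would rotate through the meta-train/meta-test roles; the fixed domain indexing above is only for brevity.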
Related papers
- A Cross-Lingual Meta-Learning Method Based on Domain Adaptation for Speech Emotion Recognition [1.8377902806196766]
Best-performing speech models are trained on large amounts of data in the language they are meant to work for.
Most languages have only sparse data, which makes training such models challenging.
Our work explores model performance under limited data, specifically for speech emotion recognition.
arXiv Detail & Related papers (2024-10-06T21:33:51Z)
- VLMine: Long-Tail Data Mining with Vision Language Models [18.412533708652102]
This work focuses on the problem of identifying rare examples within a corpus of unlabeled data.
We propose a simple and scalable data mining approach that leverages the knowledge contained within a large vision language model (VLM).
Our experiments consistently show large improvements (between 10% and 50%) over the baseline techniques.
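As a rough sketch of the idea (the paper's exact pipeline may differ), one could caption each unlabeled image with an off-the-shelf VLM, count keyword frequencies over the corpus, and flag images whose keywords are rare. The `caption` callable below is a hypothetical stand-in for any captioning model.

```python
from collections import Counter

def mine_long_tail(images, caption, top_fraction=0.05):
    """Hypothetical sketch: flag images whose VLM-derived keywords are rare
    across the corpus. `caption(image)` is an assumed stand-in for any
    vision-language captioning model that returns a string."""
    keywords_per_image = [set(caption(img).lower().split()) for img in images]
    freq = Counter(kw for kws in keywords_per_image for kw in kws)
    # Score each image by the corpus frequency of its rarest keyword.
    scores = [min(freq[kw] for kw in kws) if kws else float("inf")
              for kws in keywords_per_image]
    ranked = sorted(range(len(images)), key=lambda i: scores[i])
    return ranked[: max(1, int(top_fraction * len(images)))]
```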
arXiv Detail & Related papers (2024-09-23T19:13:51Z)
- M3: A Multi-Task Mixed-Objective Learning Framework for Open-Domain Multi-Hop Dense Sentence Retrieval [12.277521531556852]
M3 is a Multi-hop dense sentence retrieval system built upon a novel Multi-task Mixed-objective approach for dense text representation learning.
Our approach yields state-of-the-art performance on a large-scale open-domain fact verification benchmark dataset, FEVER.
arXiv Detail & Related papers (2024-03-21T01:52:07Z)
- CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples.
Existing methods in video action recognition rely on large labeled datasets from the same domain.
We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z)
- Using Language to Extend to Unseen Domains [81.37175826824625]
It is expensive to collect training data for every possible domain that a vision model may encounter when deployed.
We consider how simply verbalizing the training domain, as well as the domains we want to extend to but have no data for, can improve robustness.
Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain.
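A speculative sketch of such a transform under a CLIP-like joint embedding space follows; the module shape and both loss terms are assumptions, not LADS's published objective. Here `txt_src` and `txt_tgt` stand for text embeddings of the source- and target-domain prompts (e.g. "a photo of ..." vs. "a sketch of ...").

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainTransform(nn.Module):
    """Illustrative LADS-style transform (assumed form): maps training-domain
    image embeddings toward an unseen domain described only in language."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z):
        return F.normalize(self.net(z), dim=-1)

def lads_like_loss(transform, img_emb, txt_src, txt_tgt, classifier, labels):
    """Two assumed objectives: (1) the transformed embedding should move along
    the text-space direction from source-domain to target-domain prompt;
    (2) class predictions should be preserved after the transform."""
    z = transform(img_emb)
    domain_dir = F.normalize(txt_tgt - txt_src, dim=-1)
    align = 1 - F.cosine_similarity(z - img_emb, domain_dir.expand_as(z), dim=-1).mean()
    consistency = F.cross_entropy(classifier(z), labels)
    return align + consistency
```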
arXiv Detail & Related papers (2022-10-18T01:14:02Z)
- Multimodal Masked Autoencoders Learn Transferable Representations [127.35955819874063]
We propose a simple and scalable network architecture, the Multimodal Masked Autoencoder (M3AE).
M3AE learns a unified encoder for both vision and language data via masked token prediction.
We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.
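The unified-encoder idea can be sketched in a few lines; dimensions and depth below are illustrative, and positional embeddings and the actual masking logic are omitted for brevity.

```python
import torch
import torch.nn as nn

class TinyM3AE(nn.Module):
    """Minimal sketch of the M3AE idea: one transformer encodes image patches
    and text tokens together, and training reconstructs masked tokens of both
    modalities. Sizes are illustrative, not the paper's configuration."""
    def __init__(self, vocab=30522, dim=256, patch_dim=768):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, dim)   # flattened image patches -> tokens
        self.tok_emb = nn.Embedding(vocab, dim)       # text token ids -> embeddings
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.patch_head = nn.Linear(dim, patch_dim)   # reconstruct masked patches
        self.text_head = nn.Linear(dim, vocab)        # predict masked text tokens

    def forward(self, patches, tokens):
        # patches: (B, N, patch_dim); tokens: (B, T) integer ids.
        x = torch.cat([self.patch_proj(patches), self.tok_emb(tokens)], dim=1)
        h = self.encoder(x)
        n = patches.size(1)
        return self.patch_head(h[:, :n]), self.text_head(h[:, n:])
```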
arXiv Detail & Related papers (2022-05-27T19:09:42Z)
- CLMLF: A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection [24.243349217940274]
We propose a Contrastive Learning and Multi-Layer Fusion (CLMLF) method for multimodal sentiment detection.
Specifically, we first encode text and image to obtain hidden representations, and then use a multi-layer fusion module to align and fuse the token-level features of text and image.
In addition to the sentiment analysis task, we also design two contrastive learning tasks: label-based contrastive learning and data-based contrastive learning.
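The label-based task can be illustrated with a standard supervised contrastive loss over the fused multimodal features; CLMLF's exact formulation may differ from this sketch.

```python
import torch
import torch.nn.functional as F

def label_based_contrastive(features, labels, temperature=0.1):
    """Illustrative label-based contrastive loss in the common supervised
    contrastive form: fused multimodal features that share a sentiment label
    are pulled together, all others pushed apart."""
    z = F.normalize(features, dim=-1)
    sim = z @ z.t() / temperature                        # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim - sim.max(dim=1, keepdim=True).values.detach()  # numerical stability
    exp_sim = sim.exp().masked_fill(self_mask, 0.0)           # drop self-pairs
    log_prob = sim - exp_sim.sum(dim=1, keepdim=True).log()
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask).sum(dim=1).div(pos_counts).mean()
```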
arXiv Detail & Related papers (2022-04-12T04:03:06Z)
- Text-Based Person Search with Limited Data [66.26504077270356]
Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query.
We present a framework with two novel components to handle the problems brought by limited data.
arXiv Detail & Related papers (2021-10-20T22:20:47Z)
- Domain Adaptive Semantic Segmentation without Source Data [50.18389578589789]
We investigate domain adaptive semantic segmentation without source data, which assumes that the model is pre-trained on the source domain.
We propose an effective framework for this challenging problem with two components: positive learning and negative learning.
Our framework can be easily implemented and incorporated with other methods to further enhance the performance.
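A minimal sketch of the positive/negative learning pairing follows, assuming per-sample (or flattened per-pixel) logits of shape (N, C); the thresholds and exact loss forms are assumptions, not the paper's values.

```python
import torch
import torch.nn.functional as F

def positive_negative_losses(logits, pos_thresh=0.95, neg_thresh=0.05):
    """Illustrative losses for source-free adaptation. Positive learning:
    cross-entropy on high-confidence pseudo-labels. Negative learning:
    suppress classes the model deems very unlikely ("this class it is NOT")."""
    probs = F.softmax(logits, dim=-1)
    conf, pseudo = probs.max(dim=-1)

    pos_mask = (conf > pos_thresh).float()
    pos_loss = (F.cross_entropy(logits, pseudo, reduction="none") * pos_mask).sum() \
               / pos_mask.sum().clamp(min=1)

    neg_mask = (probs < neg_thresh).float()
    neg_loss = -(torch.log(1 - probs + 1e-7) * neg_mask).sum() \
               / neg_mask.sum().clamp(min=1)
    return pos_loss, neg_loss
```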
arXiv Detail & Related papers (2021-10-13T04:12:27Z)
- Machine learning with limited data [1.2183405753834562]
We study few-shot image classification, in which only very few labeled examples are available.
One method augments image features by mixing the styles of images.
The second method is applying spatial attention to explore the relations between patches of images.
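The style-mixing idea (the first method) can be sketched as a MixStyle-like operation on convolutional feature maps of shape (B, C, H, W); the paper's exact variant may differ, and the spatial-attention method is not sketched here.

```python
import torch

def mix_style(features, alpha=0.1):
    """Illustrative feature-style mixing: interpolate channel-wise mean/std
    statistics between randomly paired images in the batch, keeping content."""
    b = features.size(0)
    mu = features.mean(dim=(2, 3), keepdim=True)         # per-channel style stats
    sigma = features.std(dim=(2, 3), keepdim=True) + 1e-6
    normalized = (features - mu) / sigma                 # strip the original style
    perm = torch.randperm(b)
    lam = torch.distributions.Beta(alpha, alpha).sample((b, 1, 1, 1))
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]
    return normalized * sigma_mix + mu_mix               # re-apply the mixed style
```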
arXiv Detail & Related papers (2021-01-18T17:10:39Z)
- A Review of Single-Source Deep Unsupervised Visual Domain Adaptation [81.07994783143533]
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks.
In many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data.
To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain.
arXiv Detail & Related papers (2020-09-01T00:06:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.