DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning
- URL: http://arxiv.org/abs/2111.12062v1
- Date: Tue, 23 Nov 2021 18:22:14 GMT
- Title: DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning
- Authors: Alex Tamkin, Vincent Liu, Rongfei Lu, Daniel Fein, Colin Schultz, Noah Goodman
- Abstract summary: We present DABS: a Domain-Agnostic Benchmark for Self-supervised learning.
An algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions.
We also present e-Mix and ShED: two baseline domain-agnostic algorithms.
- Score: 6.040682281295584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning algorithms, including BERT and SimCLR, have enabled
significant strides in fields like natural language processing, computer
vision, and speech processing. However, these algorithms are domain-specific,
meaning that new self-supervised learning algorithms must be developed for each
new setting, including myriad healthcare, scientific, and multimodal domains.
To catalyze progress toward domain-agnostic methods, we introduce DABS: a
Domain-Agnostic Benchmark for Self-supervised learning. To perform well on
DABS, an algorithm is evaluated on seven diverse domains: natural images,
multichannel sensor data, English text, speech recordings, multilingual text,
chest x-rays, and images with text descriptions. Each domain contains an
unlabeled dataset for pretraining; the model is then scored based on its
downstream performance on a set of labeled tasks in the domain. We also present
e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively
modest performance demonstrates that significant progress is needed before
self-supervised learning is an out-of-the-box solution for arbitrary domains.
Code for benchmark datasets and baseline algorithms is available at
https://github.com/alextamkin/dabs.
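The evaluation protocol lends itself to a compact outline. Below is a minimal sketch of the pretrain-then-transfer loop the abstract describes, assuming a generic PyTorch setup; the names (`Encoder`, `pretrain`, `linear_probe_accuracy`) and the masked-reconstruction pretraining objective are illustrative stand-ins, not the actual DABS API.

```python
# Minimal sketch of the pretrain-then-transfer protocol described above.
# All names and the masked-reconstruction objective are stand-ins.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for a domain-agnostic encoder (DABS uses Transformers)."""
    def __init__(self, dim_in=64, dim_out=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(),
                                 nn.Linear(128, dim_out))

    def forward(self, x):
        return self.net(x)

def pretrain(encoder, unlabeled, steps=200):
    # Placeholder self-supervised objective: reconstruct masked inputs.
    head = nn.Linear(32, 64)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()))
    for _ in range(steps):
        x = unlabeled[torch.randint(len(unlabeled), (32,))]
        mask = (torch.rand_like(x) > 0.15).float()
        loss = ((head(encoder(x * mask)) - x) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def linear_probe_accuracy(encoder, x, y, steps=200):
    # Downstream scoring: freeze the encoder, fit a linear classifier.
    probe = nn.Linear(32, int(y.max()) + 1)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    with torch.no_grad():
        feats = encoder(x)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(probe(feats), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return (probe(feats).argmax(-1) == y).float().mean().item()

# Toy stand-ins for one domain's unlabeled corpus and a labeled task.
unlabeled = torch.randn(1000, 64)
x_task, y_task = torch.randn(200, 64), torch.randint(0, 5, (200,))
encoder = Encoder()
pretrain(encoder, unlabeled)
print("probe accuracy:", linear_probe_accuracy(encoder, x_task, y_task))
```

On DABS, this inner loop is repeated once per domain with that domain's unlabeled corpus and labeled tasks, and a single domain-agnostic algorithm must perform well across all seven.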
Related papers
- A Curriculum Learning Approach for Multi-domain Text Classification Using Keyword weight Ranking [17.71297141482757]
We propose to use a curriculum learning strategy based on keyword weight ranking to improve the performance of multi-domain text classification models.
The experimental results on the Amazon review and FDU-MTL datasets show that our curriculum learning strategy effectively improves the performance of multi-domain text classification models.
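A toy reading of such a curriculum, for intuition only (the scoring and ordering here are my assumptions; the paper's exact ranking scheme may differ): score each example by the TF-IDF weight of its strongest keyword and present the most clearly marked examples first.

```python
# Toy keyword-weight curriculum: rank examples by their top TF-IDF
# keyword and train on the most keyword-indicative examples first.
# (An assumption about the mechanics, not the paper's exact method.)
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["great battery life", "the plot was dull",
         "camera focus is sharp", "the acting felt wooden"]
tfidf = TfidfVectorizer().fit_transform(texts)
scores = tfidf.toarray().max(axis=1)  # weight of each example's top keyword
order = np.argsort(-scores)           # most keyword-indicative first
print([texts[i] for i in order])
```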
arXiv Detail & Related papers (2022-10-27T03:15:26Z)
- Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains [73.54897096088149]
We propose a Domain-invariant Masked AutoEncoder (DiMAE) for self-supervised learning from multi-domains.
The core idea is to augment the input image with style noise from different domains and then reconstruct the image from the embedding of the augmented image.
Experiments on PACS and DomainNet illustrate that DiMAE achieves considerable gains compared with recent state-of-the-art methods.
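A hedged sketch of that corruption-and-reconstruction loop follows; the AdaIN-style channel-statistics swap below is my stand-in for DiMAE's actual style-noise modules.

```python
# Sketch of the DiMAE idea as summarized above (not the authors' code):
# inject another domain's style statistics into the input, then
# reconstruct the clean image from the corrupted embedding.
import torch
import torch.nn as nn

def inject_style(content, style, eps=1e-5):
    # Keep the content image's layout but swap in another domain's
    # per-channel mean/std as "style noise".
    c_mu = content.mean((2, 3), keepdim=True)
    c_std = content.std((2, 3), keepdim=True) + eps
    s_mu = style.mean((2, 3), keepdim=True)
    s_std = style.std((2, 3), keepdim=True) + eps
    return (content - c_mu) / c_std * s_std + s_mu

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 16, 3, padding=1))
decoder = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 3, 3, padding=1))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))

content = torch.rand(8, 3, 32, 32)  # batch from one domain
style = torch.rand(8, 3, 32, 32)    # batch from another domain
for _ in range(10):
    corrupted = inject_style(content, style)
    # Reconstruct the clean image from the corrupted embedding.
    loss = ((decoder(encoder(corrupted)) - content) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```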
arXiv Detail & Related papers (2022-05-10T09:49:40Z)
- Balancing Multi-Domain Corpora Learning for Open-Domain Response Generation [3.3242685629646256]
Open-domain conversational systems are assumed to generate equally good responses on multiple domains.
This paper explores methods for generating relevant responses for each domain when learning from multiple multi-domain corpora.
arXiv Detail & Related papers (2022-05-05T11:10:54Z)
- Domain Adaptation via Prompt Learning [39.97105851723885]
Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain.
We introduce a novel prompt learning paradigm for UDA, named Domain Adaptation via Prompt Learning (DAPL).
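A toy rendering of the prompt-learning idea for UDA (my illustration; DAPL itself learns continuous prompt vectors with a CLIP-style backbone): classification becomes matching an image against prompts that pair a domain token with a class token, letting the model disentangle the two.

```python
# Toy prompt construction for UDA (illustrative; DAPL learns continuous
# prompt vectors rather than fixed strings like these).
from itertools import product

domains = ["photo", "sketch"]
classes = ["dog", "car", "bird"]
prompts = {(d, c): f"a {d} of a {c}" for d, c in product(domains, classes)}
# Each prompt would be embedded by a text encoder; an image is assigned
# the class of its most similar (domain, class) prompt embedding.
print(prompts)
```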
arXiv Detail & Related papers (2022-02-14T13:25:46Z)
- Contrastive Learning and Self-Training for Unsupervised Domain Adaptation in Semantic Segmentation [71.77083272602525]
UDA attempts to provide efficient knowledge transfer from a labeled source domain to an unlabeled target domain.
We propose a contrastive learning approach that adapts category-wise centroids across domains.
We extend our method with self-training, where we use a memory-efficient temporal ensemble to generate consistent and reliable pseudo-labels.
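The following is an illustrative reconstruction of those two ingredients (not the paper's code): pull the per-class feature centroids of the two domains together, and derive pseudo-labels from a temporal ensemble, i.e. an exponential moving average of target predictions.

```python
# Sketch: category-wise centroid alignment plus EMA-based pseudo-labels.
import torch

def centroid_alignment_loss(src_feats, src_labels, tgt_feats, tgt_labels, num_classes):
    losses = []
    for c in range(num_classes):
        src_c = src_feats[src_labels == c]
        tgt_c = tgt_feats[tgt_labels == c]
        if len(src_c) > 0 and len(tgt_c) > 0:
            # Squared distance between the two domains' class-c centroids.
            losses.append(((src_c.mean(0) - tgt_c.mean(0)) ** 2).sum())
    return torch.stack(losses).mean() if losses else torch.zeros(())

num_classes = 19
ema_probs = (torch.randn(100, num_classes) * 4).softmax(-1)  # running average of predictions
new_probs = (torch.randn(100, num_classes) * 4).softmax(-1)  # current-step predictions
ema_probs = 0.9 * ema_probs + 0.1 * new_probs                # temporal-ensemble update
conf, pseudo = ema_probs.max(-1)
pseudo[conf < 0.5] = -1                                      # drop unreliable pseudo-labels

src_feats, tgt_feats = torch.randn(100, 32), torch.randn(100, 32)
src_labels = torch.randint(0, num_classes, (100,))
print(centroid_alignment_loss(src_feats, src_labels, tgt_feats, pseudo, num_classes))
```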
arXiv Detail & Related papers (2021-05-05T11:55:53Z)
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning [53.32740707197856]
We present a new state-of-the-art unsupervised method based on pre-trained Transformers and a Sequential Denoising Auto-Encoder (TSDAE).
It can achieve up to 93.1% of the performance of in-domain supervised approaches.
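TSDAE is implemented in the sentence-transformers library; the snippet below follows the library's documented recipe (corrupt each sentence by token deletion, encode it to one pooled vector, train a tied decoder to reconstruct the original), with a tiny in-memory corpus standing in for real unlabeled sentences.

```python
# TSDAE training as exposed in the sentence-transformers library.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

word_emb = models.Transformer("bert-base-uncased")
pooling = models.Pooling(word_emb.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_emb, pooling])

train_sentences = ["an unlabeled sentence", "another unlabeled sentence"]
# The dataset applies TSDAE's corruption (token deletion) on the fly.
train_data = datasets.DenoisingAutoEncoderDataset(train_sentences)
loader = DataLoader(train_data, batch_size=2, shuffle=True)
# A tied decoder reconstructs the original sentence from the single
# pooled embedding produced by the encoder.
loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)
model.fit(train_objectives=[(loader, loss)], epochs=1, scheduler="constantlr")
```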
arXiv Detail & Related papers (2021-04-14T17:02:18Z)
- PADA: A Prompt-based Autoregressive Approach for Adaptation to Unseen Domains [19.682729518136142]
We present PADA: a Prompt-based Autoregressive Domain Adaptation algorithm, based on the T5 model.
In experiments with two tasks, PADA strongly outperforms state-of-the-art approaches and additional strong baselines.
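A hedged paraphrase of PADA-style two-step inference follows (schematic only; the untrained "t5-small" checkpoint and the prompt strings are stand-ins, not the released model): the T5 model first generates a domain-related prompt for the input, then the task prediction is conditioned on that self-generated prompt.

```python
# Schematic two-step prompt-then-predict inference in the PADA spirit.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "the camera stopped working after a week"
# Step 1: generate a prompt characterizing the (unseen) input domain.
ids = model.generate(**tok("domain: " + text, return_tensors="pt"), max_new_tokens=8)
prompt = tok.decode(ids[0], skip_special_tokens=True)
# Step 2: condition the task prediction on the self-generated prompt.
ids = model.generate(**tok(prompt + " classify: " + text, return_tensors="pt"), max_new_tokens=4)
print(tok.decode(ids[0], skip_special_tokens=True))
```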
arXiv Detail & Related papers (2021-02-24T11:02:29Z)
- CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search [89.48123965553098]
This paper presents a search system to alleviate the special-domain adaptation problem.
The system utilizes the domain-adaptive pretraining and few-shot learning technologies to help neural rankers mitigate the domain discrepancy.
Our system performs the best among the non-manual runs in Round 2 of the TREC-COVID task.
arXiv Detail & Related papers (2020-11-03T09:10:48Z)
- Meta-Learning for Domain Generalization in Semantic Parsing [124.32975734073949]
We use a meta-learning framework that targets zero-shot domain generalization for semantic parsing.
We apply a model-agnostic training algorithm that simulates zero-shot parsing using virtual train and test sets from disjoint domains.
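A compact MAML-style sketch of that virtual train/test step (illustrative: a linear classifier on toy data stands in for a semantic parser). Each step takes an inner update on one domain and asks the updated parameters to also fit a disjoint held-out domain.

```python
# MAML-style domain-generalization step with virtual train/test domains.
import random
import torch
import torch.nn.functional as F

W = torch.zeros(4, 16, requires_grad=True)
b = torch.zeros(4, requires_grad=True)
domains = {d: (torch.randn(64, 16), torch.randint(0, 4, (64,))) for d in "ABCD"}
inner_lr, outer_lr = 0.1, 0.05

for step in range(50):
    train_dom, test_dom = random.sample(sorted(domains), 2)  # disjoint domains
    (xt, yt), (xv, yv) = domains[train_dom], domains[test_dom]
    # Inner step: simulated update on the virtual-train domain.
    loss_tr = F.cross_entropy(xt @ W.t() + b, yt)
    gW, gb = torch.autograd.grad(loss_tr, (W, b), create_graph=True)
    W_fast, b_fast = W - inner_lr * gW, b - inner_lr * gb
    # Outer objective: the updated parameters must also generalize to
    # the held-out virtual-test domain.
    loss = loss_tr + F.cross_entropy(xv @ W_fast.t() + b_fast, yv)
    loss.backward()
    with torch.no_grad():
        W -= outer_lr * W.grad
        b -= outer_lr * b.grad
        W.grad = None
        b.grad = None
```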
arXiv Detail & Related papers (2020-10-22T19:00:36Z)
- Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning [85.6386289476598]
We develop a novel adversarial graph representation adaptation (AGRA) framework for cross-domain holistic-local feature co-adaptation.
We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2020-08-03T15:00:31Z)
- Text Detection on Roughly Placed Books by Leveraging a Learning-based Model Trained with Another Domain Data [0.30458514384586394]
In this paper, we focus on how to generate bounding boxes appropriate for capturing text areas on books.
We develop algorithms that construct the bounding boxes by improving and leveraging the results of a learning-based method.
Our algorithms can utilize different learning-based approaches to detect scene texts.
arXiv Detail & Related papers (2020-06-26T05:53:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.