Efficient Hierarchical Domain Adaptation for Pretrained Language Models
- URL: http://arxiv.org/abs/2112.08786v1
- Date: Thu, 16 Dec 2021 11:09:29 GMT
- Title: Efficient Hierarchical Domain Adaptation for Pretrained Language Models
- Authors: Alexandra Chronopoulou, Matthew E. Peters, Jesse Dodge
- Abstract summary: Generative language models are trained on diverse, general domain corpora.
We introduce a method to scale domain adaptation to many diverse domains using a computationally efficient adapter approach.
- Score: 77.02962815423658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative language models are trained on diverse, general domain corpora.
However, this limits their applicability to narrower domains, and prior work
has shown that continued in-domain training can provide further gains. In this
paper, we introduce a method to scale domain adaptation to many diverse domains
using a computationally efficient adapter approach. Our method is based on the
observation that textual domains are partially overlapping, and we represent
domains as a hierarchical tree structure where each node in the tree is
associated with a set of adapter weights. When combined with a frozen
pretrained language model, this approach enables parameter sharing among
related domains, while avoiding negative interference between unrelated ones.
The approach is efficient: computational cost scales as O(log(D)) for D domains.
Experimental results with GPT-2 and a large fraction of the 100 most
represented websites in C4 show across-the-board in-domain improvements. We
additionally provide an inference-time algorithm for a held-out domain and show
that averaging over multiple paths through the tree enables further gains in
generalization, while adding only a marginal cost to inference.
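The hierarchical-adapter idea admits a compact sketch. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes standard bottleneck adapters, a parent-array encoding of the domain tree, and that adapted hidden states along a root-to-leaf path are averaged; `Adapter` and `HierarchicalAdapters` are illustrative names.
```python
# Minimal sketch of tree-structured adapters over a frozen LM layer.
# Assumptions (not confirmed by the abstract): bottleneck adapters and
# averaging of adapted states along the root-to-leaf path.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, d_model: int, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))

class HierarchicalAdapters(nn.Module):
    """One adapter per tree node; a domain activates its root-to-leaf path."""
    def __init__(self, d_model: int, parent: list):
        super().__init__()
        self.parent = parent  # parent[i] = index of node i's parent, -1 at the root
        self.adapters = nn.ModuleList(Adapter(d_model) for _ in parent)

    def path(self, node: int) -> list:
        nodes = []
        while node != -1:          # walk leaf -> root
            nodes.append(node)
            node = self.parent[node]
        return nodes               # ~O(log D) nodes for a balanced tree over D domains

    def forward(self, h: torch.Tensor, leaf: int) -> torch.Tensor:
        # Average the adapted hidden states produced along the path.
        outs = [self.adapters[i](h) for i in self.path(leaf)]
        return torch.stack(outs).mean(dim=0)

# Toy tree: node 0 is the root, nodes 1 and 2 are its children (two domains).
tree = HierarchicalAdapters(d_model=768, parent=[-1, 0, 0])
h = torch.randn(4, 16, 768)        # frozen-LM hidden states (batch, seq, dim)
adapted = tree(h, leaf=1)          # adapt toward domain 1
```
For a held-out domain, the same forward pass can be run down several candidate paths and the results averaged, mirroring the multi-path inference the abstract describes.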
Related papers
- DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting [35.83485358725357]
Domain adaptation is commonly employed in crowd counting to bridge the domain gaps between different datasets.
Existing domain adaptation methods tend to focus on inter-dataset differences while overlooking intra-dataset differences.
We propose a Domain-agnostically Aligned Optimal Transport (DAOT) strategy that aligns domain-agnostic factors between domains.
arXiv Detail & Related papers (2023-08-10T02:59:40Z)
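The summary above only names the optimal-transport alignment in DAOT, so the following is a generic, hedged sketch of one standard realization: entropic optimal transport via Sinkhorn iterations between source and target feature sets with uniform marginals. The cost function and how DAOT extracts its "domain-agnostic factors" are not reproduced here.
```python
# Hedged sketch: entropic OT (Sinkhorn) as one way to align two feature sets.
import torch

def sinkhorn(cost: torch.Tensor, eps: float = 0.1, iters: int = 100) -> torch.Tensor:
    """Transport plan for a cost matrix, uniform marginals, entropic regularization."""
    n, m = cost.shape
    K = torch.exp(-cost / eps)          # Gibbs kernel
    a = torch.full((n,), 1.0 / n)       # uniform source marginal
    b = torch.full((m,), 1.0 / m)       # uniform target marginal
    u, v = torch.ones(n), torch.ones(m)
    for _ in range(iters):              # alternating marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    return torch.diag(u) @ K @ torch.diag(v)

src, tgt = torch.randn(32, 128), torch.randn(48, 128)
cost = torch.cdist(src, tgt) ** 2       # pairwise squared-distance cost
plan = sinkhorn(cost)
ot_loss = (plan * cost).sum()           # alignment objective to minimize
```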
- Improving Domain Generalization with Domain Relations [77.63345406973097]
This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on.
We propose a new approach called D$^3$G to learn domain-specific models.
Our results show that D$^3$G consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T08:11:16Z)
- M2D2: A Massively Multi-domain Language Modeling Dataset [76.13062203588089]
We present M2D2, a fine-grained, massively multi-domain corpus for studying domain adaptation in language models (LMs).
Using categories derived from Wikipedia and ArXiv, we organize the domains in each data source into 22 groups.
We show the benefits of adapting the LM along a domain hierarchy; adapting to smaller amounts of fine-grained domain-specific data can lead to larger in-domain performance gains.
arXiv Detail & Related papers (2022-10-13T21:34:52Z)
- Dynamic Instance Domain Adaptation [109.53575039217094]
Most studies on unsupervised domain adaptation assume that each domain's training samples come with domain labels.
We develop a dynamic neural network whose adaptive convolutional kernels generate instance-adaptive residuals, adapting domain-agnostic deep features to each individual instance.
Our model, dubbed DIDA-Net, achieves state-of-the-art performance on several commonly used single-source and multi-source UDA datasets.
arXiv Detail & Related papers (2022-03-09T20:05:54Z)
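The instance-adaptive residual in DIDA-Net can be illustrated with the standard grouped-convolution trick for per-instance convolution. This is a hedged sketch, not the published architecture: the global-pooling context, the 3x3 depthwise kernel, and all names are assumptions.
```python
# Sketch: a hypernetwork predicts a depthwise kernel per instance; its output
# is added as an instance-adaptive residual (shapes/choices are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceAdaptiveResidual(nn.Module):
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        # Predict one k*k depthwise kernel per channel from pooled features.
        self.kernel_gen = nn.Linear(channels, channels * k * k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        ctx = x.mean(dim=(2, 3))                              # (b, c) global context
        kernels = self.kernel_gen(ctx).view(b * c, 1, self.k, self.k)
        # Grouped-conv trick: fold the batch into channels so every
        # instance is convolved with its own depthwise kernel.
        out = F.conv2d(x.reshape(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return x + out.view(b, c, h, w)                       # adaptive residual

block = InstanceAdaptiveResidual(channels=64)
y = block(torch.randn(8, 64, 32, 32))   # per-instance adapted features
```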
- Domain Adaptation via Prompt Learning [39.97105851723885]
Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain.
We introduce a novel prompt learning paradigm for UDA, named Domain Adaptation via Prompt Learning (DAPL).
arXiv Detail & Related papers (2022-02-14T13:25:46Z)
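The summary gives only the paradigm's name, so here is a deliberately schematic sketch of disentangled domain and category prompts: learnable domain and class prompt vectors are summed and scored against image features. DAPL itself composes token-level prompts through CLIP's text encoder, which is omitted; the additive composition and every name below are assumptions.
```python
# Schematic sketch of learnable domain + category prompts (assumption-laden;
# the real method routes prompts through CLIP's text encoder).
import torch
import torch.nn as nn

d, num_domains, num_classes = 512, 2, 10
domain_prompts = nn.Parameter(torch.randn(num_domains, d) * 0.02)
class_prompts = nn.Parameter(torch.randn(num_classes, d) * 0.02)

def prompt_features(domain: int) -> torch.Tensor:
    """One text-side feature per class, conditioned on the domain prompt."""
    return domain_prompts[domain] + class_prompts   # (num_classes, d)

images = torch.randn(8, d)                # stand-in for image-encoder features
logits = images @ prompt_features(0).T    # (8, num_classes) similarity scores
loss = nn.functional.cross_entropy(logits, torch.randint(0, num_classes, (8,)))
```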
- Adaptive Methods for Aggregated Domain Generalization [26.215904177457997]
In many settings, privacy concerns prohibit obtaining domain labels for the training data samples.
We propose a domain-adaptive approach to this problem, which operates in two steps.
Our approach achieves state-of-the-art performance on a variety of domain generalization benchmarks without using domain labels.
arXiv Detail & Related papers (2021-12-09T08:57:01Z)
- Contrastive Learning and Self-Training for Unsupervised Domain Adaptation in Semantic Segmentation [71.77083272602525]
Unsupervised domain adaptation (UDA) attempts to provide efficient knowledge transfer from a labeled source domain to an unlabeled target domain.
We propose a contrastive learning approach that adapts category-wise centroids across domains.
We extend our method with self-training, where we use a memory-efficient temporal ensemble to generate consistent and reliable pseudo-labels.
arXiv Detail & Related papers (2021-05-05T11:55:53Z)
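A hedged sketch of the centroid side of this method: per-class feature centroids are computed on the source (with ground-truth labels) and on the target (with pseudo-labels), then pulled together by a simple alignment loss. The contrastive formulation and the memory-efficient temporal ensemble are not reproduced; shapes and names are illustrative.
```python
# Sketch: category-wise centroids across domains (illustrative shapes/names;
# the paper's contrastive loss and temporal ensemble are not shown).
import torch

def class_centroids(feats: torch.Tensor, labels: torch.Tensor,
                    num_classes: int) -> torch.Tensor:
    """Mean feature per class: feats (n, d), labels (n,) -> (num_classes, d)."""
    sums = torch.zeros(num_classes, feats.size(1)).index_add_(0, labels, feats)
    counts = torch.bincount(labels, minlength=num_classes).clamp(min=1)
    return sums / counts.unsqueeze(1).float()

num_classes, d = 19, 256                  # e.g. 19 Cityscapes classes
src_feats = torch.randn(64, d); src_labels = torch.randint(0, num_classes, (64,))
tgt_feats = torch.randn(64, d); tgt_pseudo = torch.randint(0, num_classes, (64,))

src_c = class_centroids(src_feats, src_labels, num_classes)
tgt_c = class_centroids(tgt_feats, tgt_pseudo, num_classes)   # pseudo-labeled target
align_loss = (src_c - tgt_c).pow(2).sum(dim=1).mean()         # pull matching classes together
```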
- Adaptive Methods for Real-World Domain Generalization [32.030688845421594]
In our work, we investigate whether it is possible to leverage domain information from unseen test samples themselves.
We propose a domain-adaptive approach consisting of two steps: a) we first learn a discriminative domain embedding from unsupervised training examples, and b) use this domain embedding as supplementary information to build a domain-adaptive model.
Our approach achieves state-of-the-art performance on various domain generalization benchmarks.
arXiv Detail & Related papers (2021-03-29T17:44:35Z)
- Batch Normalization Embeddings for Deep Domain Generalization [50.51405390150066]
Domain generalization aims at training machine learning models to perform robustly across different and unseen domains.
We show a significant increase in classification accuracy over current state-of-the-art techniques on popular domain generalization benchmarks.
arXiv Detail & Related papers (2020-11-25T12:02:57Z)
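The batch-norm-embedding idea admits a compact sketch: accumulate each training domain's batch-norm statistics as that domain's embedding, then place a test sample in this space via its own instance statistics. The single-layer setup, the Euclidean distance, and the softmax blending below are assumptions, not the paper's exact formulation.
```python
# Sketch: per-domain batch-norm statistics as domain embeddings
# (distance metric and single-layer setup are assumptions).
import torch

def bn_embedding(feats: torch.Tensor) -> torch.Tensor:
    """Concatenate per-channel mean and variance; feats (n, c, h, w)."""
    return torch.cat([feats.mean(dim=(0, 2, 3)), feats.var(dim=(0, 2, 3))])

# Features collected from one layer, for four training domains.
per_domain_feats = [torch.randn(32, 64, 8, 8) for _ in range(4)]
domain_embs = torch.stack([bn_embedding(f) for f in per_domain_feats])  # (4, 128)

test_feats = torch.randn(1, 64, 8, 8)     # a single unseen-domain sample
dists = (domain_embs - bn_embedding(test_feats)).norm(dim=1)
weights = torch.softmax(-dists, dim=0)    # soft assignment over training domains
```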