Exploring Distributional Shifts in Large Language Models for Code Analysis
- URL: http://arxiv.org/abs/2303.09128v2
- Date: Tue, 5 Dec 2023 19:25:52 GMT
- Title: Exploring Distributional Shifts in Large Language Models for Code Analysis
- Authors: Shushan Arakelyan, Rocktim Jyoti Das, Yi Mao and Xiang Ren
- Abstract summary: We study how three large language models with code capabilities generalize to out-of-domain data.
We consider two fundamental applications - code summarization and code generation.
We find that a model adapted to multiple domains simultaneously performs on par with one adapted to a single domain.
- Score: 36.73114441988879
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We systematically study how three large language models with code
capabilities - CodeT5, Codex, and ChatGPT - generalize to out-of-domain data.
We consider two fundamental applications - code summarization and code
generation. We split data into domains following its natural boundaries - by an
organization, by a project, and by a module within the software project. We
establish that samples from each new domain present all the models with a
significant challenge of distribution shift. We study how established methods
adapt models to better generalize to new domains. Our experiments show that
while multitask learning alone is a reasonable baseline, combining it with
few-shot finetuning on examples retrieved from training data can achieve very
strong performance. Moreover, this solution can outperform direct finetuning
for very low-data scenarios. Finally, we consider variations of this approach
to create a more broadly applicable method to adapt to multiple domains at
once. We find that for code generation, a model adapted to multiple domains
simultaneously performs on par with one adapted to a single domain.
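To make the described recipe concrete, here is a minimal, self-contained sketch of the pipeline from the abstract: samples carry a domain label drawn from a natural boundary (organization / project / module), and for an out-of-domain query the nearest training examples are retrieved to form a small few-shot finetuning (or prompting) set. The Jaccard token-overlap retriever and the toy data are illustrative assumptions, not the authors' implementation.

```python
# Sketch of retrieval-based few-shot adaptation for code summarization.
# Assumptions: a toy Jaccard retriever over whitespace tokens and toy data;
# the paper's actual retriever and finetuning setup may differ.
from dataclasses import dataclass


@dataclass
class Sample:
    code: str
    summary: str
    domain: str  # natural boundary: "organization/project/module"


def tokens(text: str) -> set[str]:
    """Crude whitespace tokenizer, used only for similarity scoring."""
    return set(text.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity between two code snippets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve_fewshot(query: Sample, train: list[Sample], k: int = 4) -> list[Sample]:
    """Return the k training examples whose code is most similar to the query."""
    q = tokens(query.code)
    return sorted(train, key=lambda s: jaccard(q, tokens(s.code)), reverse=True)[:k]


if __name__ == "__main__":
    train = [
        Sample("def add(a, b): return a + b", "Add two numbers.", "org1/projA/math"),
        Sample("def sub(a, b): return a - b", "Subtract b from a.", "org1/projA/math"),
        Sample("def read(p): return open(p).read()", "Read a file.", "org2/projB/io"),
    ]
    # An out-of-domain query: unseen organization, project, and module.
    query = Sample("def mul(a, b): return a * b", "", "org3/projC/math")
    for s in retrieve_fewshot(query, train, k=2):
        print(s.domain, "->", s.code)
```

In the paper's setting, the retrieved support set would then be used to finetune a multitask-trained model such as CodeT5, or be placed in the prompt for Codex/ChatGPT.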
Related papers
- Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study multi-source domain generalization for text classification.
We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
- Cross-Domain Content Generation with Domain-Specific Small Language Models [3.2772349789781616]
This study explores methods to enable a small language model to produce coherent and relevant outputs for two different domains.
We find that utilizing custom tokenizers tailored to each dataset significantly enhances generation quality.
Our findings demonstrate that knowledge expansion with frozen layers is an effective method for small language models to generate domain-specific content.
arXiv Detail & Related papers (2024-09-19T21:45:13Z)
- Multi-BERT: Leveraging Adapters and Prompt Tuning for Low-Resource Multi-Domain Adaptation [14.211024633768986]
The rapid growth in the volume and diversity of texts presents formidable challenges in multi-domain settings.
Traditional approaches, whether employing a unified model for multiple domains or an individual model for each domain, frequently suffer from significant limitations.
This paper introduces a novel approach composed of one core model with multiple sets of domain-specific parameters; a toy sketch of this shared-core pattern appears after this list.
arXiv Detail & Related papers (2024-04-02T22:15:48Z)
- Virtual Classification: Modulating Domain-Specific Knowledge for Multidomain Crowd Counting [67.38137379297717]
Multidomain crowd counting aims to learn a general model for multiple diverse datasets.
Deep networks tend to model the distributions of the dominant domains rather than all domains, a phenomenon known as domain bias.
We propose a Modulating Domain-specific Knowledge Network (MDKNet) to handle the domain bias issue in multidomain crowd counting.
arXiv Detail & Related papers (2024-02-06T06:49:04Z)
- Improving Domain Generalization with Domain Relations [77.63345406973097]
This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on.
We propose a new approach called D$^3$G to learn domain-specific models.
Our results show that D$3$G consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T08:11:16Z)
- Multi-Domain Long-Tailed Learning by Augmenting Disentangled Representations [80.76164484820818]
There is an inescapable long-tailed class-imbalance issue in many real-world classification problems.
We study this multi-domain long-tailed learning problem and aim to produce a model that generalizes well across all classes and domains.
Built upon a selective balanced sampling strategy, the proposed method, TALLY, achieves this by mixing the semantic representation of one example with the domain-associated nuisances of another.
arXiv Detail & Related papers (2022-10-25T21:54:26Z)
- Learning to Generalize across Domains on Single Test Samples [126.9447368941314]
We learn to generalize across domains on single test samples.
We formulate the adaptation to the single test sample as a variational Bayesian inference problem.
Our model achieves at least comparable -- and often better -- performance than state-of-the-art methods on multiple benchmarks for domain generalization.
arXiv Detail & Related papers (2022-02-16T13:21:04Z)
- Boosting Binary Masks for Multi-Domain Learning through Affine Transformations [49.25451497933657]
The goal of multi-domain learning is to produce a single model performing a task in all the domains together.
Recent works showed how we can address this problem by masking the internal weights of a given original conv-net through learned binary variables.
We provide a general formulation of binary-mask-based models for multi-domain learning via affine transformations of the original network parameters.
arXiv Detail & Related papers (2021-03-25T14:54:37Z)
- StandardGAN: Multi-source Domain Adaptation for Semantic Segmentation of Very High Resolution Satellite Images by Data Standardization [6.481759968656932]
In this work, we deal with the multi-source domain adaptation problem.
Our method, namely StandardGAN, standardizes each source and target domain so that all the data have similar distributions.
We conduct extensive experiments on two remote sensing data sets, in which the first one consists of multiple cities from a single country, and the other one contains multiple cities from different countries.
arXiv Detail & Related papers (2020-04-14T10:16:50Z)
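Several entries above (most directly Multi-BERT) share one architectural pattern: a frozen core model augmented with small sets of domain-specific parameters. Below is a toy sketch of that pattern; the module names, sizes, and the linear "backbone" are illustrative assumptions, not any paper's actual architecture.

```python
# Toy "one core model + per-domain adapters" pattern (illustrative only).
import torch
import torch.nn as nn


class AdapterModel(nn.Module):
    def __init__(self, dim: int, domains: list[str], bottleneck: int = 16):
        super().__init__()
        self.backbone = nn.Linear(dim, dim)  # stand-in for a pretrained encoder
        self.backbone.requires_grad_(False)  # core parameters stay frozen
        self.adapters = nn.ModuleDict({
            d: nn.Sequential(  # small bottleneck adapter per domain
                nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim)
            )
            for d in domains
        })

    def forward(self, x: torch.Tensor, domain: str) -> torch.Tensor:
        h = self.backbone(x)
        return h + self.adapters[domain](h)  # residual adapter


model = AdapterModel(dim=32, domains=["news", "reviews"])
x = torch.randn(4, 32)
print(model(x, "news").shape)  # torch.Size([4, 32])
```

Only the adapters are trained for each domain, so supporting a new domain adds a small parameter set instead of a full model copy.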