Utilization of Pre-trained Language Model for Adapter-based Knowledge
Transfer in Software Engineering
- URL: http://arxiv.org/abs/2307.08540v2
- Date: Tue, 6 Feb 2024 08:03:31 GMT
- Title: Utilization of Pre-trained Language Model for Adapter-based Knowledge
Transfer in Software Engineering
- Authors: Iman Saberi, Fatemeh Fard and Fuxiang Chen
- Abstract summary: We study knowledge transfer using adapters on multiple downstream tasks, including cloze test, code clone detection, and code summarization.
These adapters are trained on code corpora and are inserted into a PLM that is pre-trained on English corpora or code corpora.
We observed an improvement in results using NL-PLM over a PLM without adapters, suggesting that adapters can transfer and utilize useful knowledge from NL-PLM to SE tasks.
- Score: 0.3963827913892984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software Engineering (SE) Pre-trained Language Models (PLMs), such as
CodeBERT, are pre-trained on large code corpora, and their learned knowledge has
shown success in transferring to downstream tasks (e.g., code clone detection)
through fine-tuning of the PLMs. In Natural Language Processing (NLP), an
alternative way of transferring the knowledge of PLMs is explored through the
use of adapters, compact and parameter-efficient modules that are inserted into
a PLM. Although the use of adapters has shown promising results on many NLP
downstream tasks, their application and exploration in SE downstream tasks are
limited.
Here, we study knowledge transfer using adapters on multiple downstream tasks,
including cloze test, code clone detection, and code summarization. These
adapters are trained on code corpora and are inserted into a PLM that is
pre-trained on English corpora or code corpora. We refer to these PLMs as NL-PLM
and C-PLM, respectively. We observed an improvement in results using NL-PLM over
a PLM without adapters, suggesting that adapters can transfer and utilize useful
knowledge from NL-PLM for SE tasks. The results are sometimes on par with or
exceed those of C-PLM, while being more efficient in terms of the number of
parameters and training time. Interestingly, adapters inserted into a C-PLM
generally yield better results than a traditionally fine-tuned C-PLM. Our
results open new directions for building more compact models for SE tasks.
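As a concrete illustration of the adapter mechanism described in the abstract, the following is a minimal sketch of a bottleneck adapter together with a helper that freezes the backbone so that only adapter parameters are trained. It assumes a PyTorch backbone with a RoBERTa/CodeBERT-style hidden size of 768; the class and function names are illustrative and are not taken from the paper's implementation.

```python
# Minimal sketch (assumptions: PyTorch backbone, hidden size 768 as in
# CodeBERT/RoBERTa-base). Names are illustrative, not the paper's code.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project -> non-linearity -> up-project, with a residual connection."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual keeps the adapter close to the identity at initialization,
        # so the frozen PLM's behavior is preserved before adapter training starts.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


def freeze_backbone_except_adapters(model: nn.Module) -> None:
    # Only parameters whose names contain "adapter" receive gradients; the
    # pre-trained backbone stays frozen, which is what makes adapter training
    # cheaper than full fine-tuning in both parameters and training time.
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name.lower()
```

With a hidden size of 768 and a bottleneck of 64, each adapter adds roughly 100K parameters per layer, a small fraction of the roughly 125M parameters in a CodeBERT-sized backbone, which is where the parameter and training-time savings mentioned in the abstract come from.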
Related papers
- Leveraging Large Language Models for Wireless Symbol Detection via In-Context Learning [29.28683810366379]
We propose to leverage the in-context learning ability (a.k.a. prompting) of large language models (LLMs) to solve wireless tasks in the low data regime without any training or fine-tuning.
Our results reveal that using LLMs via ICL methods generally outperforms traditional DNNs on the symbol demodulation task.
arXiv Detail & Related papers (2024-08-28T17:19:20Z) - Exploring and Unleashing the Power of Large Language Models in Automated Code Translation [40.25727029618665]
This paper investigates diverse LLMs and learning-based transpilers for automated code translation tasks.
UniTrans is a Unified code Translation framework, applicable to various LLMs.
Three recent LLMs of diverse sizes are tested with UniTrans, and all achieve substantial improvements.
arXiv Detail & Related papers (2024-04-23T00:49:46Z) - CodecLM: Aligning Language Models with Tailored Synthetic Data [51.59223474427153]
We introduce CodecLM, a framework for adaptively generating high-quality synthetic data for instruction-following abilities.
We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution.
We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples.
arXiv Detail & Related papers (2024-04-08T21:15:36Z) - StepCoder: Improve Code Generation with Reinforcement Learning from
Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimizes the model by masking the unexecuted code segments, providing Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large
Language Models [77.2078051555533]
We propose a novel and affordable solution, MMA, for the effective vision-language (VL) adaptation of large language models (LLMs).
Instead of using large neural networks to connect the image encoder and LLM, MMA adopts lightweight modules, i.e., adapters.
MMA is also equipped with a routing algorithm to help LLMs achieve an automatic shift between single- and multi-modal instructions.
arXiv Detail & Related papers (2023-05-24T11:06:15Z) - LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of
Large Language Models [75.25782573728677]
This paper presents a framework for adapter-based parameter-efficient fine-tuning (PEFT) of large language models (LLMs).
The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapters, Prompt-based learning, and Reparametrization-based methods (a minimal series-vs-parallel placement sketch appears after this list).
We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning.
arXiv Detail & Related papers (2023-04-04T16:31:37Z) - CHAPTER: Exploiting Convolutional Neural Network Adapters for
Self-supervised Speech Models [62.60723685118747]
Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data.
We propose an efficient tuning method specifically designed for SSL speech models, by applying CNN adapters at the feature extractor.
We empirically found that adding CNN adapters to the feature extractor helps adaptation on emotion and speaker tasks.
arXiv Detail & Related papers (2022-12-01T08:50:12Z) - Selective Token Generation for Few-shot Natural Language Generation [19.015739016376532]
We develop a novel additive learning algorithm based on reinforcement learning (RL).
We show that the proposed selective token generation significantly outperforms the previous additive learning algorithms based on the PLMs.
arXiv Detail & Related papers (2022-09-17T00:48:52Z) - KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaptation framework for pre-trained language models (PLMs).
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
arXiv Detail & Related papers (2022-04-22T08:11:59Z) - On The Cross-Modal Transfer from Natural Language to Code through
Adapter Modules [0.0]
We explore the knowledge transfer using adapters in software engineering.
Three programming languages, C/C++, Python, and Java, are studied along with extensive experiments on the best setup used for adapters.
Our results can open new directions to build smaller models for more software engineering tasks.
arXiv Detail & Related papers (2022-04-19T04:18:02Z)