Contrastive Prompt Learning-based Code Search based on Interaction
Matrix
- URL: http://arxiv.org/abs/2310.06342v1
- Date: Tue, 10 Oct 2023 06:24:52 GMT
- Title: Contrastive Prompt Learning-based Code Search based on Interaction
Matrix
- Authors: Yubo Zhang, Yanfang Liu, Xinxin Fan, Yunfeng Lu
- Abstract summary: We propose CPLCS, a contrastive prompt learning-based code search method based on the cross-modal interaction mechanism.
We conduct extensive experiments to evaluate the effectiveness of our approach on a real-world dataset across six programming languages.
- Score: 5.379749366580253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code search aims to retrieve the code snippet that highly matches the given
query described in natural language. Recently, many code pre-training
approaches have demonstrated impressive performance on code search. However,
existing code search methods still suffer from two performance constraints:
inadequate semantic representation and the semantic gap between natural
language (NL) and programming language (PL). In this paper, we propose CPLCS, a
contrastive prompt learning-based code search method based on the cross-modal
interaction mechanism. CPLCS comprises:(1) PL-NL contrastive learning, which
learns the semantic matching relationship between PL and NL representations;
(2) a prompt learning design for a dual-encoder structure that can alleviate
the problem of inadequate semantic representation; (3) a cross-modal
interaction mechanism to enhance the fine-grained mapping between NL and PL. We
conduct extensive experiments to evaluate the effectiveness of our approach on
a real-world dataset across six programming languages. The experiment results
demonstrate the efficacy of our approach in improving semantic representation
quality and mapping ability between PL and NL.
Related papers
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z) - DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning [19.93800175353809]
DeTriever is a novel demonstration retrieval framework that learns a weighted combination of hidden states.
Our method significantly outperforms the state-of-the-art baselines on one-shot NL2 tasks.
arXiv Detail & Related papers (2024-06-12T06:33:54Z) - Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL)
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves comparable performance with SOTA as well as being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z) - A Syntax-Guided Multi-Task Learning Approach for Turducken-Style Code
Generation [19.489202790935902]
We propose a syntax-guided multi-task learning approach TurduckenGen.
Specifically, we first explicitly append the type information to the code tokens to capture the representation of syntactic constraints.
Then we formalize code generation with syntactic constraint representation as an auxiliary task to enable the model to learn the syntactic constraints of the code.
arXiv Detail & Related papers (2023-03-09T06:22:07Z) - CPL: Counterfactual Prompt Learning for Vision and Language Models [76.18024920393245]
This paper presents a novel underlinetextbfCounterfactual underlinetextbfPrompt underlinetextbfLearning (CPL) method for vision and language models.
CPL simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework.
Experiments demonstrate that CPL can obtain superior few-shot performance on different vision and language tasks.
arXiv Detail & Related papers (2022-10-19T08:06:39Z) - Enhancing Semantic Code Search with Multimodal Contrastive Learning and
Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z) - Visualizing the Relationship Between Encoded Linguistic Information and
Task Performance [53.223789395577796]
We study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality.
We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performances.
Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance.
arXiv Detail & Related papers (2022-03-29T19:03:10Z) - Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual
Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved.
However, the peak performance is not met using the general-purpose multilingual text encoders off-the-shelf', but rather relying on their variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z) - Adversarial Training for Code Retrieval with Question-Description
Relevance Regularization [34.29822107097347]
We adapt a simple adversarial learning technique to generate difficult code snippets given the input question.
We propose to leverage question-description relevance to regularize adversarial learning.
Our adversarial learning method is able to improve the performance of state-of-the-art models.
arXiv Detail & Related papers (2020-10-19T19:32:03Z) - Leveraging Code Generation to Improve Code Retrieval and Summarization
via Dual Learning [18.354352985591305]
Code summarization generates brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query.
Recent studies have combined these two tasks to improve their performance.
We propose a novel end-to-end model for the two tasks by introducing an additional code generation task.
arXiv Detail & Related papers (2020-02-24T12:26:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.