Related papers: Contrastive Prompt Learning-based Code Search based on Interaction Matrix

Contrastive Prompt Learning-based Code Search based on Interaction Matrix

URL: http://arxiv.org/abs/2310.06342v1
Date: Tue, 10 Oct 2023 06:24:52 GMT
Title: Contrastive Prompt Learning-based Code Search based on Interaction Matrix
Authors: Yubo Zhang, Yanfang Liu, Xinxin Fan, Yunfeng Lu
Abstract summary: We propose CPLCS, a contrastive prompt learning-based code search method based on the cross-modal interaction mechanism. We conduct extensive experiments to evaluate the effectiveness of our approach on a real-world dataset across six programming languages.
Score: 5.379749366580253
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Code search aims to retrieve the code snippet that highly matches the given query described in natural language. Recently, many code pre-training approaches have demonstrated impressive performance on code search. However, existing code search methods still suffer from two performance constraints: inadequate semantic representation and the semantic gap between natural language (NL) and programming language (PL). In this paper, we propose CPLCS, a contrastive prompt learning-based code search method based on the cross-modal interaction mechanism. CPLCS comprises:(1) PL-NL contrastive learning, which learns the semantic matching relationship between PL and NL representations; (2) a prompt learning design for a dual-encoder structure that can alleviate the problem of inadequate semantic representation; (3) a cross-modal interaction mechanism to enhance the fine-grained mapping between NL and PL. We conduct extensive experiments to evaluate the effectiveness of our approach on a real-world dataset across six programming languages. The experiment results demonstrate the efficacy of our approach in improving semantic representation quality and mapping ability between PL and NL.

Related papers

Post-Incorporating Code Structural Knowledge into LLMs via In-Context Learning for Code Translation [10.77747590700758]
Large language models (LLMs) have achieved significant advancements in software mining. handling the syntactic structure of source code remains a challenge. This paper employs incontext learning (ICL) to integrate code structural knowledge into pre-trained LLMs.
arXiv Detail & Related papers (2025-03-28T10:59:42Z)
Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification. In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction. Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning [19.93800175353809]
DeTriever is a novel demonstration retrieval framework that learns a weighted combination of hidden states. Our method significantly outperforms the state-of-the-art baselines on one-shot NL2 tasks.
arXiv Detail & Related papers (2024-06-12T06:33:54Z)
Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL) GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval. Our method achieves comparable performance with SOTA as well as being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
A Syntax-Guided Multi-Task Learning Approach for Turducken-Style Code Generation [19.489202790935902]
We propose a syntax-guided multi-task learning approach TurduckenGen. Specifically, we first explicitly append the type information to the code tokens to capture the representation of syntactic constraints. Then we formalize code generation with syntactic constraint representation as an auxiliary task to enable the model to learn the syntactic constraints of the code.
arXiv Detail & Related papers (2023-03-09T06:22:07Z)
CPL: Counterfactual Prompt Learning for Vision and Language Models [76.18024920393245]
This paper presents a novel underlinetextbfCounterfactual underlinetextbfPrompt underlinetextbfLearning (CPL) method for vision and language models. CPL simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework. Experiments demonstrate that CPL can obtain superior few-shot performance on different vision and language tasks.
arXiv Detail & Related papers (2022-10-19T08:06:39Z)
Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
Visualizing the Relationship Between Encoded Linguistic Information and Task Performance [53.223789395577796]
We study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality. We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performances. Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance.
arXiv Detail & Related papers (2022-03-29T19:03:10Z)
Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks. For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved. However, the peak performance is not met using the general-purpose multilingual text encoders off-the-shelf', but rather relying on their variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z)
Adversarial Training for Code Retrieval with Question-Description Relevance Regularization [34.29822107097347]
We adapt a simple adversarial learning technique to generate difficult code snippets given the input question. We propose to leverage question-description relevance to regularize adversarial learning. Our adversarial learning method is able to improve the performance of state-of-the-art models.
arXiv Detail & Related papers (2020-10-19T19:32:03Z)
Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning [18.354352985591305]
Code summarization generates brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Recent studies have combined these two tasks to improve their performance. We propose a novel end-to-end model for the two tasks by introducing an additional code generation task.
arXiv Detail & Related papers (2020-02-24T12:26:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.