HGAdapter: Hypergraph-based Adapters in Language Models for Code Summarization and Clone Detection
- URL: http://arxiv.org/abs/2510.17591v1
- Date: Mon, 20 Oct 2025 14:41:28 GMT
- Title: HGAdapter: Hypergraph-based Adapters in Language Models for Code Summarization and Clone Detection
- Authors: Guang Yang, Yujie Zhu,
- Abstract summary: We propose three types of high-order correlations in code tokens, i.e. abstract syntax tree family correlation, lexical correlation, and line correlation. We design a tokens and hyperedges generator to capture these high-order data correlations. Experiments were conducted on several public datasets, including six languages of code summarization and code clone detection tasks.
- Score: 5.383338161281297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (PLMs) are increasingly being applied to code-related tasks. Although PLMs have achieved good results, they do not take into account potential high-order data correlations within the code. We propose three types of high-order correlations in code tokens, i.e. abstract syntax tree family correlation, lexical correlation, and line correlation. We design a tokens and hyperedges generator to capture these high-order data correlations. We improve the architecture of hypergraph neural networks and combine it with adapter tuning to propose a novel hypergraph-based adapter (HGAdapter) to fine-tune PLMs. HGAdapter can encode high-order data correlations and can be inserted into various PLMs to enhance performance. Experiments were conducted on several public datasets, including six languages of code summarization and code clone detection tasks. Our methods improved the performance of PLMs on these datasets to varying degrees. Experimental results confirm that introducing high-order data correlations contributes to improved effectiveness.
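The abstract names line correlation and lexical correlation but does not say how the hyperedges are built, so the following is only a minimal sketch under stated assumptions, not the paper's generator. It assumes the input is Python source, that a line hyperedge groups the tokens on one source line, and that a lexical hyperedge groups repeated occurrences of the same identifier. The function names `code_tokens`, `build_hyperedges`, and `incidence_matrix` are hypothetical. AST family correlation is omitted because it requires mapping parser nodes back to tokens.

```python
import io
import tokenize
from collections import defaultdict

# Token types that carry no code content (layout, comments, end marker).
SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
        tokenize.COMMENT, tokenize.ENDMARKER}


def code_tokens(source):
    """Return (text, line_number) pairs for the content tokens of Python source."""
    stream = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.string, tok.start[0]) for tok in stream if tok.type not in SKIP]


def build_hyperedges(tokens):
    """Group token indices into line and lexical hyperedges (assumed definitions)."""
    by_line = defaultdict(list)
    by_text = defaultdict(list)
    for idx, (text, line) in enumerate(tokens):
        by_line[line].append(idx)
        if text.isidentifier():  # names and keywords, not operators
            by_text[text].append(idx)
    line_edges = [nodes for _, nodes in sorted(by_line.items())]
    # A lexical hyperedge is only useful if it connects at least two tokens.
    lexical_edges = [nodes for nodes in by_text.values() if len(nodes) > 1]
    return {"line": line_edges, "lexical": lexical_edges}


def incidence_matrix(num_tokens, hyperedges):
    """Build H where H[i][j] is 1 if token i belongs to hyperedge j, else 0."""
    return [[1 if i in edge else 0 for edge in hyperedges]
            for i in range(num_tokens)]
```

For example, calling `build_hyperedges(code_tokens("total = a + b\nprint(total)\n"))` yields two line hyperedges (one per source line) and one lexical hyperedge linking the two occurrences of `total`. Unlike an ordinary graph edge, each hyperedge can connect any number of tokens, which is what lets a hypergraph express correlations beyond pairs.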
Related papers
- Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks [41.75017840131367]
Large language models (LLMs) have shown impressive promise in code generation.
We present a scalable synthetic data generation pipeline that produces nearly 800k instruction-reasoning-code-test quadruplets.
arXiv Detail & Related papers (2025-10-27T10:54:25Z)
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
- DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z)
- Code Needs Comments: Enhancing Code LLMs with Comment Augmentation [91.52444946362547]
We introduce a novel data augmentation method that generates comments for existing code, coupled with a data filtering strategy that filters out code data poorly correlated with natural language.
We conducted experiments on three code-focused Large Language Models and observed consistent improvements in performance on two widely-used programming skill benchmarks.
arXiv Detail & Related papers (2024-02-20T13:56:38Z)
- Heterogeneous Directed Hypergraph Neural Network over abstract syntax tree (AST) for Code Classification [7.470593257656977]
We propose a heterogeneous directed hypergraph (HDHG) to represent abstract syntax tree (AST) and a heterogeneous directed hypergraph neural network (HDHGN) to process the graph for code classification.
Our method improves code understanding and can represent high-order data correlations beyond paired interactions.
arXiv Detail & Related papers (2023-05-07T09:28:16Z)
- Towards Better Dynamic Graph Learning: New Architecture and Unified Library [29.625205125350313]
DyGFormer is a Transformer-based architecture for dynamic graph learning.
DyGLib is a unified library with standard training pipelines and coding interfaces.
arXiv Detail & Related papers (2023-03-23T05:27:32Z)
- Personalized Decentralized Multi-Task Learning Over Dynamic Communication Graphs [59.96266198512243]
We propose a decentralized and federated learning algorithm for tasks that are positively and negatively correlated.
Our algorithm uses gradients to calculate the correlations among tasks automatically, and dynamically adjusts the communication graph to connect mutually beneficial tasks and isolate those that may negatively impact each other.
We conduct experiments on a synthetic Gaussian dataset and a large-scale celebrity attributes (CelebA) dataset.
arXiv Detail & Related papers (2022-12-21T18:58:24Z)
- Equivariant Hypergraph Diffusion Neural Operators [81.32770440890303]
Hypergraph neural networks (HNNs) using neural networks to encode hypergraphs provide a promising way to model higher-order relations in data.
This work proposes a new HNN architecture named ED-HNN, which provably represents any continuous equivariant hypergraph diffusion operator.
We evaluate ED-HNN for node classification on nine real-world hypergraph datasets.
arXiv Detail & Related papers (2022-07-14T06:17:00Z)
- Highly Parallel Autoregressive Entity Linking with Discriminative Correction [51.947280241185]
We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions.
Our model is >70 times faster and more accurate than the previous generative method.
arXiv Detail & Related papers (2021-09-08T17:28:26Z)
- Learnable Hypergraph Laplacian for Hypergraph Learning [34.28748027233654]
HyperGraph Convolutional Neural Networks (HGCNNs) have demonstrated their potential in modeling high-order relations preserved in graph structured data.
We propose the first learning-based method tailored for constructing adaptive hypergraph structure, termed HypERgrAph Laplacian aDaptor (HERALD).
HERALD adaptively optimizes the adjacency relationship between hypernodes and hyperedges in an end-to-end manner, and thus a task-aware hypergraph is learned.
arXiv Detail & Related papers (2021-06-12T02:07:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.