Implant Global and Local Hierarchy Information to Sequence based Code
Representation Models
- URL: http://arxiv.org/abs/2303.07826v1
- Date: Tue, 14 Mar 2023 12:01:39 GMT
- Title: Implant Global and Local Hierarchy Information to Sequence based Code
Representation Models
- Authors: Kechi Zhang, Zhuo Li, Zhi Jin, Ge Li
- Abstract summary: We analyze how the complete hierarchical structure influences the tokens in code sequences and abstract this influence as a property of code tokens called hierarchical embedding.
We propose the Hierarchy Transformer (HiT), a simple but effective sequence model to incorporate the complete hierarchical embeddings of source code into a Transformer model.
- Score: 25.776540440893257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Source code representation with deep learning techniques is an important
research field. There have been many studies that learn sequential or
structural information for code representation. However, sequence-based and
non-sequence models both have limitations. Researchers have attempted to
incorporate structural information into sequence-based models, but these
attempts mine only part of the token-level hierarchical structure information.
In this paper, we
analyze how the complete hierarchical structure influences the tokens in code
sequences and abstract this influence as a property of code tokens called
hierarchical embedding. The hierarchical embedding is further divided into
statement-level global hierarchy and token-level local hierarchy. Furthermore,
we propose the Hierarchy Transformer (HiT), a simple but effective sequence
model to incorporate the complete hierarchical embeddings of source code into a
Transformer model. We demonstrate the effectiveness of hierarchical embedding
in learning code structure with an experiment on the variable scope detection
task. Further evaluation shows that HiT outperforms SOTA baseline models and
exhibits stable training efficiency on three source code-related tasks,
covering both classification and generation, across 8 different datasets.
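The abstract describes the model only at a high level, but the central idea of combining statement-level global and token-level local hierarchy information with ordinary token embeddings can be sketched roughly as follows. This is a minimal PyTorch illustration under assumed names (HierarchyTransformerSketch, global_depth, local_depth, max_depth); it is not the authors' actual HiT implementation, which may compute and inject the hierarchical embeddings differently.
```python
import torch
import torch.nn as nn

class HierarchyTransformerSketch(nn.Module):
    """Rough sketch (not the authors' implementation): token embeddings are
    summed with a statement-level (global) and a token-level (local)
    hierarchy-depth embedding before a standard Transformer encoder."""

    def __init__(self, vocab_size, d_model=256, max_depth=64, n_layers=4, n_heads=8):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.global_emb = nn.Embedding(max_depth, d_model)  # statement-level hierarchy depth
        self.local_emb = nn.Embedding(max_depth, d_model)   # token-level hierarchy depth
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids, global_depth, local_depth):
        # All inputs: (batch, seq_len) integer tensors; the depth tensors would
        # be derived from each token's position in the statement / AST hierarchy.
        x = self.tok_emb(token_ids) + self.global_emb(global_depth) + self.local_emb(local_depth)
        return self.encoder(x)

# Tiny smoke test with random inputs.
model = HierarchyTransformerSketch(vocab_size=1000)
ids = torch.randint(0, 1000, (2, 16))
depths = torch.randint(0, 64, (2, 16))
out = model(ids, depths, depths)  # shape: (2, 16, 256)
```
In such a setup each code token carries two extra integer features, a statement-level depth and a token-level depth, which would be computed from the program's hierarchical structure (e.g., its AST) before training.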
Related papers
- From Logits to Hierarchies: Hierarchical Clustering made Simple [16.132657141993548]
We show that a lightweight procedure implemented on top of pre-trained non-hierarchical clustering models outperforms models designed specifically for hierarchical clustering.
Our proposed approach is computationally efficient and applicable to any pre-trained clustering model that outputs logits, without requiring any fine-tuning.
arXiv Detail & Related papers (2024-10-10T12:27:45Z)
- How transformers learn structured data: insights from hierarchical filtering [2.7784685368355744]
We introduce a hierarchical filtering procedure for generative models of sequences on trees.
We provide evidence that vanilla encoder-only transformer architectures can implement the optimal belief propagation algorithm.
We analyze how the transformer layers succeed by focusing on attention maps from models trained with varying degrees of filtering.
arXiv Detail & Related papers (2024-08-27T15:23:09Z)
- Generating Hierarchical Structures for Improved Time Series Classification Using Stochastic Splitting Functions [0.0]
This study introduces a novel hierarchical divisive clustering approach with stochastic splitting functions (SSFs) to enhance classification performance on multi-class datasets through hierarchical classification (HC).
The method has the unique capability of generating a hierarchy without requiring explicit information, making it suitable for datasets lacking prior knowledge of the hierarchy.
arXiv Detail & Related papers (2023-09-21T10:34:50Z)
- How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model [47.617093812158366]
We introduce the Random Hierarchy Model: a family of synthetic tasks inspired by the hierarchical structure of language and images.
We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups.
Our results indicate how deep networks overcome the curse of dimensionality by building invariant representations.
arXiv Detail & Related papers (2023-07-05T09:11:09Z)
- Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework [75.79736930414715]
We present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes.
We introduce novel hierarchy-preserving losses, which jointly apply a hierarchical penalty to the contrastive loss and enforce the hierarchy constraint.
arXiv Detail & Related papers (2022-04-27T21:41:44Z)
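The summary above does not spell out the losses. As a generic, hedged illustration of the idea of applying a hierarchical penalty to a contrastive loss (not the paper's actual formulation), the sketch below weights each positive pair by the depth at which the two samples' label paths agree:
```python
import torch
import torch.nn.functional as F

def shared_depth(path_a, path_b):
    """Length of the common prefix of two label paths, e.g.
    ('animal', 'dog', 'corgi') vs ('animal', 'dog', 'husky') -> 2."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth

def hierarchy_weighted_contrastive_loss(z, label_paths, temperature=0.1):
    """Toy, O(N^2) illustration (not the paper's losses): a supervised
    contrastive loss in which each positive pair is weighted by the depth of
    the deepest hierarchy level at which the two samples' labels agree."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    sim = sim - sim.max(dim=1, keepdim=True).values    # numerical stability
    exp_sim = sim.exp()

    n = z.size(0)
    total, weight_sum = z.new_zeros(()), 0
    for i in range(n):
        denom = exp_sim[i].sum() - exp_sim[i, i]        # all candidates except self
        for j in range(n):
            if i == j:
                continue
            w = shared_depth(label_paths[i], label_paths[j])
            if w == 0:
                continue                                # no shared ancestor: not a positive
            total = total - w * torch.log(exp_sim[i, j] / denom)
            weight_sum += w
    return total / max(weight_sum, 1)

# Example: four embeddings with a two-level label hierarchy.
z = torch.randn(4, 8)
paths = [("animal", "dog"), ("animal", "dog"), ("animal", "cat"), ("vehicle", "car")]
print(hierarchy_weighted_contrastive_loss(z, paths))
```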
- HiStruct+: Improving Extractive Text Summarization with Hierarchical Structure Information [0.6443952406204634]
We propose a novel approach to formulate, extract, encode and inject hierarchical structure information explicitly into an extractive summarization model.
Across various experimental settings on three datasets (CNN/DailyMail, PubMed and arXiv), our HiStruct+ model collectively outperforms a strong baseline.
arXiv Detail & Related papers (2022-03-17T21:49:26Z)
- HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression [53.90578309960526]
Large pre-trained language models (PLMs) have shown overwhelming performance compared with traditional neural network methods.
We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
arXiv Detail & Related papers (2021-10-16T11:23:02Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming languages that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
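To make the "where-the-value-comes-from" relation in the entry above concrete, here is a toy illustration that links each variable read in straight-line Python code to the most recent assignment of that name. It is only a simplification for intuition; the paper's actual data-flow extraction is more general and is not reproduced here.
```python
import ast

def data_flow_edges(source: str):
    """Toy sketch of the "where-the-value-comes-from" relation for
    straight-line code: each variable read is linked to the most recent
    assignment of that name. Branches, loops and functions are ignored."""
    tree = ast.parse(source)
    last_def = {}            # variable name -> line of its latest assignment
    edges = []               # (use_line, def_line, variable)
    for stmt in tree.body:   # top-level statements in source order
        # First record reads (they refer to earlier definitions) ...
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    edges.append((node.lineno, last_def[node.id], node.id))
        # ... then record the writes made by this statement.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = node.lineno
    return edges

print(data_flow_edges("x = 1\ny = x + 2\nz = x + y\n"))
# [(2, 1, 'x'), (3, 1, 'x'), (3, 2, 'y')]
```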
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.