DPS: Design Pattern Summarisation Using Code Features
- URL: http://arxiv.org/abs/2504.11081v1
- Date: Tue, 15 Apr 2025 11:27:44 GMT
- Title: DPS: Design Pattern Summarisation Using Code Features
- Authors: Najam Nazar, Sameer Sikka, Christoph Treude
- Abstract summary: We generate summaries for software design patterns using Java and NLG libraries. Our summaries closely align with human-written summaries. A follow-up survey shows that DPS summaries were rated as capturing context better than human-generated summaries.
- Score: 8.24515384844758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic summarisation has been used efficiently in recent years to condense texts, conversations, audio, code, and various other artefacts. A range of methods, from simple template-based summaries to complex machine learning techniques -- and more recently, large language models -- have been employed to generate these summaries. Summarising software design patterns is important because it helps developers quickly understand and reuse complex design concepts, thereby improving software maintainability and development efficiency. However, the generation of summaries for software design patterns has not yet been explored. Our approach utilises code features and JavaParser to parse the code and create a JSON representation. Using an NLG library on this JSON representation, we convert it into natural language text that acts as a summary of the code, capturing the contextual information of the design pattern. Our empirical results indicate that the summaries generated by our approach capture the context in which patterns are applied in the codebase. Statistical evaluations demonstrate that our summaries closely align with human-written summaries, as evident from high values in the ROUGE-L, BLEU-4, NIST, and FrugalScore metrics. A follow-up survey further shows that DPS summaries were rated as capturing context better than human-generated summaries.
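The second half of the pipeline described in the abstract (JSON representation in, natural-language summary out) can be illustrated with a minimal sketch. This is not the authors' implementation: the JSON field names (`pattern`, `roles`, `class`, `methods`) and the example classes are hypothetical, and plain string templates stand in for the NLG library, which the abstract does not name. The parsing step with JavaParser is assumed to have already produced the JSON.

```python
import json

# Hypothetical JSON record for one detected pattern instance. The field names
# and class names are illustrative assumptions, not the schema from the paper.
PATTERN_JSON = """
{
  "pattern": "Observer",
  "roles": [
    {"role": "Subject", "class": "NewsAgency",
     "methods": ["attach", "detach", "notifyObservers"]},
    {"role": "Observer", "class": "NewsChannel",
     "methods": ["update"]}
  ]
}
"""


def join_names(names):
    """Join names into an English list: 'a', 'a and b', 'a, b and c'."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


def summarise(record):
    """Render one pattern record as a short template-based summary."""
    sentences = [f"This code applies the {record['pattern']} pattern."]
    for role in record["roles"]:
        methods = join_names([name + "()" for name in role["methods"]])
        # Pick singular or plural so the sentence stays grammatical.
        noun = "method" if len(role["methods"]) == 1 else "methods"
        sentences.append(
            f"Class {role['class']} acts as the {role['role']} "
            f"and defines the {noun} {methods}."
        )
    return " ".join(sentences)


summary = summarise(json.loads(PATTERN_JSON))
print(summary)
```

A dedicated NLG library would normally handle agreement, aggregation and referring expressions; the singular/plural switch above is only a small stand-in for that kind of realisation logic.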
Related papers
- Towards Leveraging Large Language Model Summaries for Topic Modeling in Source Code [0.0]
Large language models (LLMs) have demonstrated remarkable program comprehension capabilities.
Transformer-based topic modeling techniques offer effective ways to extract semantic information from text.
This paper proposes and explores a novel approach that combines these strengths to automatically identify meaningful topics in a corpus of Python programs.
arXiv Detail & Related papers (2025-04-24T10:30:40Z)
- Bridging Textual-Collaborative Gap through Semantic Codes for Sequential Recommendation [91.13055384151897]
CoCoRec is a novel Code-based textual and Collaborative semantic fusion method for sequential Recommendation.
We generate fine-grained semantic codes from multi-view text embeddings through vector quantization techniques.
In order to further enhance the fusion of textual and collaborative semantics, we introduce an optimization strategy.
arXiv Detail & Related papers (2025-03-15T15:54:44Z)
- Consistency Evaluation of News Article Summaries Generated by Large (and Small) Language Models [0.0]
Large Language Models (LLMs) have shown promise in generating fluent abstractive summaries but they can produce hallucinated details not grounded in the source text.
This paper embarks on an exploration of text summarization with a diverse set of techniques, including TextRank, BART, Mistral-7B-Instruct, and OpenAI GPT-3.5-Turbo.
We find that all summarization models produce consistent summaries when tested on the XL-Sum dataset.
arXiv Detail & Related papers (2025-02-28T01:58:17Z)
- Contextualized Data-Wrangling Code Generation in Computational Notebooks [131.26365849822932]
We propose an automated approach, CoCoMine, to mine data-wrangling code generation examples with clear multi-modal contextual dependency.
We construct CoCoNote, a dataset containing 58,221 examples for Contextualized Data-wrangling Code generation in Notebooks.
Experiment results demonstrate the significance of incorporating data context in data-wrangling code generation.
arXiv Detail & Related papers (2024-09-20T14:49:51Z)
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% ROUGE-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- An Extractive-and-Abstractive Framework for Source Code Summarization [28.553366270065656]
Code summarization aims to automatically generate summaries/comments for a given code snippet in the form of natural language.
We propose a novel extractive-and-abstractive framework to generate human-written-like summaries with preserved factual details.
arXiv Detail & Related papers (2022-06-15T02:14:24Z)
- Automated News Summarization Using Transformers [4.932130498861987]
We present a comprehensive comparison of several pre-trained transformer-based models for text summarization.
For analysis and comparison, we use the BBC news dataset, which contains news texts for summarization together with human-generated summaries.
arXiv Detail & Related papers (2021-04-23T04:22:33Z)
- Learning to Synthesize Data for Semantic Parsing [57.190817162674875]
We propose a generative model which models the composition of programs and maps a program to an utterance.
Due to the simplicity of PCFG and pre-trained BART, our generative model can be efficiently learned from existing data at hand.
We evaluate our method in both in-domain and out-of-domain settings of text-to-Query parsing on the standard benchmarks of GeoQuery and Spider.
arXiv Detail & Related papers (2021-04-12T21:24:02Z)
- Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward [42.925345819778656]
We present ASGARD, a novel framework for Abstractive Summarization with Graph-Augmentation and semantic-driven RewarD.
We propose the use of dual encoders (a sequential document encoder and a graph-structured encoder) to maintain the global context and local characteristics of entities.
Results show that our models produce significantly higher ROUGE scores than a variant without knowledge graph as input on both New York Times and CNN/Daily Mail datasets.
arXiv Detail & Related papers (2020-05-03T18:23:06Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.