APIDocBooster: An Extract-Then-Abstract Framework Leveraging Large
Language Models for Augmenting API Documentation
- URL: http://arxiv.org/abs/2312.10934v2
- Date: Wed, 10 Jan 2024 11:02:33 GMT
- Authors: Chengran Yang, Jiakun Liu, Bowen Xu, Christoph Treude, Yunbo Lyu,
Junda He, Ming Li, David Lo
- Abstract summary: APIDocBooster fuses the advantages of both extractive summarization (i.e., enabling faithful summaries without length limitation) and abstractive summarization (i.e., producing coherent and concise summaries). APIDocBooster consists of two stages: Context-aware Sentence Section Classification (CSSC) and UPdate SUMmarization (UPSUM).
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: API documentation is often the most trusted resource for programming. Many
approaches have been proposed to augment API documentation by summarizing
complementary information from external resources such as Stack Overflow.
Existing extractive-based summarization approaches excel in producing faithful
summaries that accurately represent the source content without input length
restrictions. Nevertheless, they suffer from inherent readability limitations.
On the other hand, our empirical study on the abstractive-based summarization
method, i.e., GPT-4, reveals that GPT-4 can generate coherent and concise
summaries but presents limitations in terms of informativeness and
faithfulness.
We introduce APIDocBooster, an extract-then-abstract framework that
seamlessly fuses the advantages of both extractive (i.e., enabling faithful
summaries without length limitation) and abstractive summarization (i.e.,
producing coherent and concise summaries). APIDocBooster consists of two
stages: (1) Context-aware Sentence Section Classification (CSSC) and (2)
UPdate SUMmarization
(UPSUM). CSSC classifies API-relevant information collected from multiple
sources into API documentation sections. UPSUM first generates extractive
summaries distinct from the original API documentation and then generates
abstractive summaries guided by extractive summaries through in-context
learning.
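Concretely, the two stages compose into a simple pipeline: CSSC routes each externally collected sentence into an API documentation section, and UPSUM extracts the sentences that add information beyond the official text before letting an LLM abstract over them in-context. A minimal Python sketch of that flow follows; `call_llm`, the prompts, and the section names are illustrative placeholders, not APIDocBooster's published interface.

```python
# Minimal sketch of the extract-then-abstract flow; every helper name and
# prompt here is an assumption for illustration, not APIDocBooster's code.

SECTIONS = ["Functionality", "Parameter", "Usage", "Exception", "Note"]

def classify_section(sentence: str, context: str, call_llm) -> str:
    """Stage 1 (CSSC): route a collected sentence to a documentation
    section, using its surrounding context to disambiguate."""
    prompt = (
        f"Context: {context}\nSentence: {sentence}\n"
        f"Which API documentation section does the sentence belong to? "
        f"Options: {', '.join(SECTIONS)}. Answer with one option."
    )
    answer = call_llm(prompt).strip()
    return answer if answer in SECTIONS else "Note"

def upsum(section: str, sentences: list[str], api_doc: str, call_llm) -> str:
    """Stage 2 (UPSUM): extract sentences that add information beyond the
    official doc, then abstract over them via in-context learning."""
    extractive = [s for s in sentences if s not in api_doc]  # crude novelty filter
    prompt = (
        f"Official '{section}' section:\n{api_doc}\n\n"
        "Extracted complementary sentences:\n- " + "\n- ".join(extractive) +
        "\n\nWrite one concise, faithful paragraph that augments the section, "
        "covering only the extracted sentences."
    )
    return call_llm(prompt)

def augment(api_doc_sections: dict[str, str], collected: list[str],
            context: str, call_llm) -> dict[str, str]:
    """Chain the two stages over sentences mined from external sources."""
    buckets: dict[str, list[str]] = {s: [] for s in SECTIONS}
    for sent in collected:
        buckets[classify_section(sent, context, call_llm)].append(sent)
    return {sec: upsum(sec, sents, api_doc_sections.get(sec, ""), call_llm)
            for sec, sents in buckets.items() if sents}
```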
To enable automatic evaluation of APIDocBooster, we construct the first
dataset for API document augmentation. Our automatic evaluation results reveal
that each stage in APIDocBooster outperforms its baselines by a large margin.
Our human evaluation also demonstrates the superiority of APIDocBooster over
GPT-4 and shows that it improves informativeness, relevance, and faithfulness
by 13.89%, 15.15%, and 30.56%, respectively.
Related papers
- Context-Aware Hierarchical Merging for Long Document Summarization [56.96619074316232]
We propose different approaches to enrich hierarchical merging with context from the source document.
Experimental results on datasets representing legal and narrative domains show that contextual augmentation consistently outperforms zero-shot and hierarchical merging baselines.
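As a rough illustration of the idea, hierarchical merging can be sketched as pairwise fusion of chunk summaries in which the aligned source spans are re-injected as context at every level. In the sketch below, `summarize` is a placeholder for any LLM summarization call; the chunking and prompt are assumptions, not the paper's implementation.

```python
# Hedged sketch of context-aware hierarchical merging: leaf summaries are
# merged pairwise, with the aligned source spans passed back in as context.

def hierarchical_merge(chunks: list[str], summarize) -> str:
    summaries = [summarize(c) for c in chunks]  # leaf-level summaries
    sources = list(chunks)                      # source span aligned to each summary
    while len(summaries) > 1:
        next_summaries, next_sources = [], []
        for i in range(0, len(summaries) - 1, 2):
            merged_source = sources[i] + "\n" + sources[i + 1]
            prompt = (f"Source excerpt:\n{merged_source}\n\n"
                      "Merge these partial summaries faithfully:\n"
                      f"1. {summaries[i]}\n2. {summaries[i + 1]}")
            next_summaries.append(summarize(prompt))
            next_sources.append(merged_source)
        if len(summaries) % 2:                  # carry an odd leftover upward
            next_summaries.append(summaries[-1])
            next_sources.append(sources[-1])
        summaries, sources = next_summaries, next_sources
    return summaries[0]
```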
arXiv Detail & Related papers (2025-02-03T01:14:31Z)
- Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback.
Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
- ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions [44.938469262938725]
ABEX is a generative data augmentation methodology for Natural Language Understanding (NLU) tasks.
We first convert a document into its concise, abstract description and then generate new documents based on expanding the resultant abstraction.
We demonstrate the effectiveness of ABEX on 4 NLU tasks spanning 12 datasets and 4 low-resource settings.
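The abstract-then-expand loop is straightforward to picture in code. Below is a sketch of the description above; `call_llm` and both prompts are assumptions, not ABEX's own.

```python
# Illustrative abstract-then-expand augmentation: compress a document to a
# detail-free description, then expand it into several new variants.

def abstract_then_expand(document: str, call_llm, n_variants: int = 3) -> list[str]:
    abstraction = call_llm(
        "Condense this document into one short, abstract, "
        f"detail-free description:\n{document}"
    )
    return [
        call_llm("Expand this abstract description into a full, "
                 f"concrete document:\n{abstraction}")
        for _ in range(n_variants)
    ]
```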
arXiv Detail & Related papers (2024-06-06T17:29:57Z)
- Source Identification in Abstractive Summarization [0.8883733362171033]
We define input sentences that contain essential information in the generated summary as "source sentences" and study how abstractive summaries are made by analyzing the source sentences.
We formulate automatic source sentence detection and compare multiple methods to establish a strong baseline for the task.
Experimental results show that the perplexity-based method performs well in highly abstractive settings, while similarity-based methods perform robustly in relatively extractive settings.
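A similarity-based detector of the kind described fits in a few lines: embed the summary and every input sentence, then flag the closest inputs as sources. The encoder choice and `top_k` cutoff below are illustrative assumptions, not the paper's configuration.

```python
# Similarity-based source-sentence detection: input sentences nearest to
# the summary in embedding space are treated as its sources.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

def source_sentences(inputs: list[str], summary: str, top_k: int = 3) -> list[str]:
    sent_emb = model.encode(inputs, convert_to_tensor=True)
    sum_emb = model.encode(summary, convert_to_tensor=True)
    scores = util.cos_sim(sum_emb, sent_emb)[0]        # one score per input sentence
    top = scores.argsort(descending=True)[:top_k]
    return [inputs[i] for i in top.tolist()]
```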
arXiv Detail & Related papers (2024-02-07T09:09:09Z)
- APIGen: Generative API Method Recommendation [16.541442856821]
APIGen is a generative API recommendation approach based on enhanced in-context learning (ICL).
APIGen searches for posts similar to the programming query from lexical, syntactical, and semantic perspectives.
Through this reasoning process, APIGen makes the recommended APIs better match the programming requirements of the queries.
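Retrieval-augmented ICL of this sort reduces to ranking corpus posts against the query and packing the best matches into the prompt as demonstrations. In the sketch below, plain token overlap stands in for APIGen's combined lexical, syntactical, and semantic retrieval; the prompt format is an assumption.

```python
# Sketch of similar-post retrieval for in-context learning: rank
# (question, API) pairs by overlap with the query and build a few-shot prompt.

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_icl_prompt(query: str, posts: list[tuple[str, str]], k: int = 3) -> str:
    # posts: (question, accepted API answer) pairs mined from a Q&A corpus
    demos = sorted(posts, key=lambda p: jaccard(query, p[0]), reverse=True)[:k]
    shots = "\n\n".join(f"Q: {q}\nRecommended API: {api}" for q, api in demos)
    return f"{shots}\n\nQ: {query}\nRecommended API:"
```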
arXiv Detail & Related papers (2024-01-29T02:35:42Z)
- Leveraging Deep Learning for Abstractive Code Summarization of Unofficial Documentation [1.1816942730023887]
This paper proposes an automatic approach using the BART algorithm to generate summaries for APIs discussed in StackOverflow.
We built an oracle of human-generated summaries and evaluated our approach against it using ROUGE and BLEU metrics.
Our findings demonstrate that using deep learning algorithms can improve summary quality and outperform the previous work by an average of 57% in precision.
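For orientation, the core of such a system is a BART summarizer run over Stack Overflow text. The sketch below uses a generic pretrained checkpoint from Hugging Face as a stand-in for the paper's fine-tuned model.

```python
# BART abstractive summarization over an API-related post; the checkpoint
# is a generic stand-in, not the paper's StackOverflow-tuned model.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

post = ("The pandas.DataFrame.apply method runs a function along an axis; "
        "note that axis=1 applies it row-wise, which is much slower than "
        "vectorized alternatives on large frames.")
print(summarizer(post, max_length=40, min_length=10)[0]["summary_text"])
```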
arXiv Detail & Related papers (2023-10-23T15:10:37Z)
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- Reinforcing Semantic-Symmetry for Document Summarization [15.113768658584979]
Document summarization condenses a long document into a short version with salient information and accurate semantic descriptions.
This paper proposes a new reinforcing semantic-symmetry learning model for document summarization.
A series of experiments have been conducted on two widely used benchmark datasets, CNN/Daily Mail and BigPatent.
arXiv Detail & Related papers (2021-12-14T17:41:37Z)
- Eider: Evidence-enhanced Document-level Relation Extraction [56.71004595444816]
Document-level relation extraction (DocRE) aims at extracting semantic relations among entity pairs in a document.
We propose a three-stage evidence-enhanced DocRE framework consisting of joint relation and evidence extraction, evidence-centered relation extraction (RE), and fusion of extraction results.
arXiv Detail & Related papers (2021-06-16T09:43:16Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
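The matching formulation can be miniaturized as follows: enumerate candidate summaries (here, sentence pairs), embed each one whole, and keep the candidate closest to the document, rather than scoring sentences independently. The encoder and candidate size `k` are illustrative assumptions, not the paper's setup.

```python
# Extraction as semantic text matching: score whole candidate summaries
# against the document in embedding space and return the best match.
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

def match_summary(sentences: list[str], k: int = 2) -> list[str]:
    assert len(sentences) >= k, "need at least k sentences"
    doc_emb = model.encode(" ".join(sentences), convert_to_tensor=True)
    best, best_score = None, float("-inf")
    for cand in combinations(sentences, k):            # candidate summaries
        cand_emb = model.encode(" ".join(cand), convert_to_tensor=True)
        score = util.cos_sim(doc_emb, cand_emb).item() # summary-level match score
        if score > best_score:
            best, best_score = cand, score
    return list(best)
```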
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
- At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization [110.54963847339775]
We show that unnecessary and redundant content arises when extracting full sentences.
We propose extracting sub-sentential units based on the constituency parsing tree.
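A toy version of that extraction step, using a hand-written bracketed parse and an assumed set of candidate constituent labels:

```python
# Extract phrase-level constituents from a constituency parse as candidate
# summary units; the parse string and label set are illustrative.
from nltk.tree import Tree

UNIT_LABELS = {"S", "SBAR", "NP", "VP"}  # assumed candidate constituent types

def sub_sentential_units(parse_str: str, min_words: int = 2) -> list[str]:
    tree = Tree.fromstring(parse_str)
    total = len(tree.leaves())
    units = {" ".join(st.leaves()) for st in tree.subtrees()
             if st.label() in UNIT_LABELS
             and min_words <= len(st.leaves()) < total}  # skip the full sentence
    return sorted(units, key=len)

parse = "(S (NP (DT The) (NN model)) (VP (VBZ extracts) (NP (JJ salient) (NNS phrases))))"
print(sub_sentential_units(parse))
# ['The model', 'salient phrases', 'extracts salient phrases']
```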
arXiv Detail & Related papers (2020-04-06T13:35:10Z)