Multi-granular Software Annotation using File-level Weak Labelling
- URL: http://arxiv.org/abs/2311.11607v1
- Date: Mon, 20 Nov 2023 08:48:18 GMT
- Title: Multi-granular Software Annotation using File-level Weak Labelling
- Authors: Cezar Sas, Andrea Capiluppi
- Abstract summary: We propose a novel approach to annotate source code files using a weak labelling approach and a hierarchical aggregation.
We evaluate whether this approach is effective in achieving multi-granular annotations of software projects.
- Score: 7.106986689736827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the most time-consuming tasks for developers is the comprehension of
new code bases. An effective approach to aid this process is to label source
code files with meaningful annotations, which can help developers understand
the content and functionality of a code base more quickly. However, most existing
solutions for code annotation focus on project-level classification: manually
labelling individual files is time-consuming, error-prone and hard to scale.
The work presented in this paper aims to automate the annotation of files by
leveraging project-level labels, and to use the file-level annotations to
annotate items at coarser levels of granularity, such as packages and the
whole project.
We propose a novel approach to annotate source code files using a weak
labelling approach and a subsequent hierarchical aggregation. We investigate
whether this approach is effective in achieving multi-granular annotations of
software projects, which can aid developers in understanding the content and
functionalities of a code base more quickly.
Our evaluation uses a combination of human assessment and automated metrics
to evaluate the annotations' quality. Our approach correctly annotated 50% of
files and more than 50% of packages. Moreover, the information captured at the
file-level allowed us to identify, on average, three new relevant labels for
any given project.
We can conclude that the proposed approach is a convenient and promising way
to generate noisy (not precise) annotations for files. Furthermore,
hierarchical aggregation effectively preserves the information captured at
the file level, which can then be propagated to packages and to the overall
project.
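The abstract does not spell out how the hierarchical aggregation is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each file already carries noisy label scores, treats a file's parent directory as its package, and uses a plain mean as the aggregation rule. The function names (`mean_scores`, `aggregate`, `top_labels`), the file paths, and the scores are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def mean_scores(score_dicts):
    """Average a list of {label: score} dicts; a missing label counts as 0."""
    totals = defaultdict(float)
    for scores in score_dicts:
        for label, value in scores.items():
            totals[label] += value
    n = len(score_dicts)
    return {label: total / n for label, total in totals.items()}


def aggregate(file_scores):
    """Roll file-level scores up to packages (parent directories), then to the project."""
    by_package = defaultdict(list)
    for path, scores in file_scores.items():
        package = str(PurePosixPath(path).parent)
        by_package[package].append(scores)
    package_scores = {pkg: mean_scores(items) for pkg, items in by_package.items()}
    # Assumption: the project is the unweighted mean of its packages.
    project_scores = mean_scores(list(package_scores.values()))
    return package_scores, project_scores


def top_labels(scores, k=1):
    """Return the k labels with the highest aggregated score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical noisy file-level annotations (label -> score).
file_scores = {
    "app/net/http.py": {"networking": 0.8, "security": 0.2},
    "app/net/tls.py": {"networking": 0.4, "security": 0.6},
    "app/db/orm.py": {"database": 0.9, "networking": 0.1},
}

package_scores, project_scores = aggregate(file_scores)
print(top_labels(package_scores["app/net"]))
print(top_labels(project_scores, k=2))
```

Averaging packages rather than files keeps a single large package from dominating the project-level result; weighting by file count would be an equally plausible choice under different assumptions.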
Related papers
- Codetations: Intelligent, Persistent Notes and UIs for Programs and Other Documents [0.85830154886823]
We present Codetations, a system that helps developers contextualize documents with rich notes and tools.
Unlike previous approaches, notes in Codetations stay outside the document to prevent code clutter, attaching to spans in the document using a hybrid edit-tracking/LLM-based method.
Their content is dynamic, interactive, and synchronized with code changes.
arXiv Detail & Related papers (2025-04-25T21:33:25Z)
- Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets [51.74296438621836]
We introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels.
The main limitation of scribbles as source for weak supervision is the lack of challenging datasets for scribble segmentation.
Scribbles for All provides scribble labels for several popular segmentation datasets and provides an algorithm to automatically generate scribble labels for any dataset with dense annotations.
arXiv Detail & Related papers (2024-08-22T15:29:08Z)
- AutoFL: A Tool for Automatic Multi-granular Labelling of Software Repositories [6.0158981171030685]
AutoFL is a tool for automatically labelling software repositories from source code.
It allows multi-granular annotations at the file, package, and project level.
arXiv Detail & Related papers (2024-08-05T15:34:26Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- Deep Active Learning with Noisy Oracle in Object Detection [5.5165579223151795]
We propose a composite active learning framework including a label review module for deep object detection.
We show that utilizing part of the annotation budget to correct the noisy annotations partially in the active dataset leads to early improvements in model performance.
In our experiments we achieve improvements of up to 4.5 mAP points of object detection performance by incorporating label reviews at equal annotation budget.
arXiv Detail & Related papers (2023-09-30T13:28:35Z)
- Substituting Data Annotation with Balanced Updates and Collective Loss in Multi-label Text Classification [19.592985329023733]
Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text.
We study the MLTC problem in annotation-free and scarce-annotation settings in which the magnitude of available supervision signals is linear in the number of labels.
Our method follows three steps: (1) mapping input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph from label descriptions, and (3) updating the preliminary label likelihoods with message passing along the label dependency graph.
arXiv Detail & Related papers (2023-09-24T04:12:52Z)
- Multi-Label Knowledge Distillation [86.03990467785312]
We propose a novel multi-label knowledge distillation method.
On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems.
On the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings.
arXiv Detail & Related papers (2023-08-12T03:19:08Z)
- Visual Recognition by Request [111.94887516317735]
We present a novel protocol of annotation and evaluation for visual recognition.
It does not require the labeler/algorithm to annotate/recognize all targets (objects, parts, etc.) at once, but instead raises a number of recognition instructions and the algorithm recognizes targets by request.
We evaluate the recognition system on two mixed-annotated datasets, CPP and ADE20K, and demonstrate its promising ability of learning from partially labeled data.
arXiv Detail & Related papers (2022-07-28T16:55:11Z)
- Fine-Grained Visual Entailment [51.66881737644983]
We propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
Unlike prior work, our method is inherently explainable and makes logical predictions at different levels of granularity.
We evaluate our method on a new dataset of manually annotated knowledge elements and show that our method achieves 68.18% accuracy at this challenging task.
arXiv Detail & Related papers (2022-03-29T16:09:38Z)
- Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort [9.379650501033465]
We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations.
We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories.
Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even complex and subtle frames.
arXiv Detail & Related papers (2021-12-15T13:14:58Z)
- OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis [6.159771892460152]
We propose OPAD, a novel framework using reinforcement policy for active learning in content detection tasks for documents.
The framework learns the acquisition function to decide the samples to be selected while optimizing performance metrics.
We show superior performance of the proposed OPAD framework for active learning on various tasks related to document understanding.
arXiv Detail & Related papers (2021-10-01T07:40:56Z)
- Weakly-Supervised Salient Object Detection via Scribble Annotations [54.40518383782725]
We propose a weakly-supervised salient object detection model to learn saliency from scribble labels.
We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps.
Our method not only outperforms existing weakly-supervised/unsupervised methods, but also is on par with several fully-supervised state-of-the-art models.
arXiv Detail & Related papers (2020-03-17T12:59:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.