On-the-Fly Syntax Highlighting: Generalisation and Speed-ups
- URL: http://arxiv.org/abs/2402.08754v1
- Date: Tue, 13 Feb 2024 19:43:22 GMT
- Title: On-the-Fly Syntax Highlighting: Generalisation and Speed-ups
- Authors: Marco Edoardo Palma, Alex Wolf, Pasquale Salza, Harald C. Gall
- Abstract summary: On-the-fly syntax highlighting is the task of rapidly associating visual secondary notation values with each character of a language derivation.
Speed constraints are essential to ensure tool usability, manifesting as responsiveness for end users accessing online source code.
Achieving precise highlighting is critical for enhancing code comprehensibility.
Addressing the development costs of such resolvers is imperative, given the multitude of programming language versions.
- Score: 2.208443815105053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-the-fly syntax highlighting is the task of rapidly associating visual
secondary notation values with each character of a language derivation.
Research in this domain is driven by the prevalence of online software
development tools, which frequently display source code on screen and heavily
rely on syntax highlighting mechanisms. In this context, three contrasting
demands confront resolvers in this space: speed, accuracy, and development
costs. Speed constraints are essential to ensure tool usability, manifesting as
responsiveness for end users accessing online source code and minimising system
overhead. Simultaneously, achieving precise highlighting is critical for
enhancing code comprehensibility. Nevertheless, obtaining accurate results
necessitates the capacity to perform grammatical analysis on the code under
consideration, even in cases of varying grammatical correctness. Furthermore,
addressing the development costs of such resolvers is imperative, given the
multitude of programming language versions. The current state-of-the-art
approach in this field leverages the original lexer and parser of programming
languages to create syntax highlighting oracles, subsequently used for training
base Recurrent Neural Network models. As the question of the generalisation of
such a solution persists, this paper addresses this aspect by extending the
original work to three additional mainstream programming languages and
conducting a comprehensive review of the outcomes. Moreover, the original
limitations in evaluation performance and training costs are mitigated through
the introduction of a novel Convolutional-based Neural Network model. This
study examines the performance gains of running models on GPUs, finding that
the new CNN implementation is much faster than previous methods while
maintaining high accuracy.
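For illustration, a minimal per-character convolutional tagger of the kind the abstract describes could look like the sketch below; the vocabulary size, number of highlighting classes, and layer widths are assumptions for the sake of the example, not the authors' released architecture.

```python
# Hedged sketch: a character-level CNN that assigns a highlighting class to
# every character of a source snippet. Hyperparameters are illustrative only.
import torch
import torch.nn as nn

class CharCNNHighlighter(nn.Module):
    def __init__(self, vocab_size=256, num_classes=8, embed_dim=64, channels=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Same-padding 1D convolutions preserve the sequence length, so each
        # input character receives its own vector of class logits.
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.head = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, char_ids):                     # (batch, seq_len)
        x = self.embed(char_ids).transpose(1, 2)     # (batch, embed_dim, seq_len)
        x = self.conv(x)
        return self.head(x).transpose(1, 2)          # (batch, seq_len, num_classes)

# Usage: encode a snippet as byte ids and predict one class per character.
snippet = "def f(x):\n    return x + 1\n"
ids = torch.tensor([[min(ord(c), 255) for c in snippet]])
model = CharCNNHighlighter()
with torch.no_grad():
    per_char_classes = model(ids).argmax(dim=-1)     # shape (1, len(snippet))
```

Unlike a recurrent tagger, a convolutional model of this kind evaluates all positions in parallel, which is one reason such architectures tend to run faster on GPUs.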
Related papers
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
- IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition [5.525052547053668]
Scene text recognition has attracted increasing attention due to its diverse applications.
Most state-of-the-art methods adopt an encoder-decoder framework with the attention mechanism, autoregressively generating text from left to right.
We propose an alternative solution, using a parallel and iterative decoder that adopts an easy-first decoding strategy.
arXiv Detail & Related papers (2023-12-19T08:03:19Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- A Transformer-based Approach for Arabic Offline Handwritten Text Recognition [0.0]
We introduce two alternative architectures for recognizing offline Arabic handwritten text.
Our approach can model language dependencies and relies only on the attention mechanism, thereby making it more parallelizable and less complex.
Our evaluation on the Arabic KHATT dataset demonstrates that our proposed method outperforms the current state-of-the-art approaches.
arXiv Detail & Related papers (2023-07-27T17:51:52Z)
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [48.588772371355816]
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.
Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples.
arXiv Detail & Related papers (2023-01-30T13:21:00Z)
- A Survey on Pretrained Language Models for Neural Code Intelligence [4.020523898765404]
The field of Neural Code Intelligence (NCI) has emerged as a promising solution to tackle analytical tasks on source code.
NCI aims to improve programming efficiency and minimize human errors within the software industry.
Pretrained language models have become a dominant force in NCI research, consistently delivering state-of-the-art results.
arXiv Detail & Related papers (2022-12-20T08:34:56Z)
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
- Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments have been conducted on the MS-COCO dataset demonstrating the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimises a model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
- Sequence Model Design for Code Completion in the Modern IDE [3.4824234779710452]
We propose a novel design for predicting top-k next tokens that combines static analysis' ability to enumerate all valid keywords and in-scope identifiers with the ability of a language model to place a probability distribution over them.
Our model mixes character-level input representation with token output to represent out-of-vocabulary (OOV) tokens meaningfully and minimize prediction latency.
arXiv Detail & Related papers (2020-04-10T22:40:49Z)
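To illustrate the mechanism summarised in the last entry above, the idea of combining statically enumerated candidates with a language model's distribution can be sketched in a few lines; the token scores, the candidate set, and the helper name below are hypothetical placeholders, not that paper's implementation.

```python
# Hedged sketch: restrict a language model's next-token distribution to the
# tokens that static analysis reports as valid in the current scope, then
# renormalise and return the top-k suggestions. All values are illustrative.
import math

def rank_completions(lm_log_probs: dict, valid_tokens: set, k: int = 5):
    filtered = {t: lp for t, lp in lm_log_probs.items() if t in valid_tokens}
    log_norm = math.log(sum(math.exp(lp) for lp in filtered.values()))
    renormalised = {t: lp - log_norm for t, lp in filtered.items()}
    return sorted(renormalised.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Example with made-up scores: 'cout' scores well but is not in scope, so it
# is filtered out before ranking.
lm_scores = {"count": -0.7, "cout": -0.9, "return": -2.3, "for": -3.0}
in_scope = {"count", "return", "for", "if"}
print(rank_completions(lm_scores, in_scope, k=3))
```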
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.