On-the-Fly Syntax Highlighting: Generalisation and Speed-ups
- URL: http://arxiv.org/abs/2402.08754v1
- Date: Tue, 13 Feb 2024 19:43:22 GMT
- Title: On-the-Fly Syntax Highlighting: Generalisation and Speed-ups
- Authors: Marco Edoardo Palma, Alex Wolf, Pasquale Salza, Harald C. Gall
- Abstract summary: On-the-fly syntax highlighting is the task of rapidly associating visual secondary notation values with each character of a language derivation.
Speed constraints are essential to ensure tool usability, manifesting as responsiveness for end users accessing online source code.
Achieving precise highlighting is critical for enhancing code comprehensibility.
Addressing the development costs of such resolvers is imperative, given the multitude of programming language versions.
- Score: 2.208443815105053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-the-fly syntax highlighting is the task of rapidly associating visual
secondary notation values with each character of a language derivation.
Research in this domain is driven by the prevalence of online software
development tools, which frequently display source code on screen and heavily
rely on syntax highlighting mechanisms. In this context, three contrasting
demands confront resolvers in this space: speed, accuracy, and development
costs. Speed constraints are essential to ensure tool usability, manifesting as
responsiveness for end users accessing online source code and minimising system
overhead. Simultaneously, achieving precise highlighting is critical for
enhancing code comprehensibility. Nevertheless, obtaining accurate results
necessitates the capacity to perform grammatical analysis on the code under
consideration, even in cases of varying grammatical correctness. Furthermore,
addressing the development costs of such resolvers is imperative, given the
multitude of programming language versions. The current state-of-the-art
approach in this field leverages the original lexer and parser of programming
languages to create syntax highlighting oracles, subsequently used for training
base Recurrent Neural Network models. As the question of the generalisation of
such a solution persists, this paper addresses this aspect by extending the
original work to three additional mainstream programming languages and
conducting a comprehensive review of the outcomes. Moreover, the original
limitations in evaluation performance and training costs are mitigated through
the introduction of a novel Convolutional Neural Network (CNN) based model. This
study also examines the performance gains of running models on GPUs, finding that
the new CNN implementation is substantially faster than the previous RNN-based
methods while maintaining high accuracy.
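The abstract describes the approach only at a high level: the language's own
lexer/parser serves as an oracle that labels every character with a highlighting
class, and a character-level neural model is trained to reproduce those labels.
As a purely illustrative aid, here is a minimal, hypothetical PyTorch sketch of a
character-level 1D-CNN tagger in that spirit; the byte vocabulary, layer sizes,
and five-class highlighting scheme are assumptions of this sketch, not the
authors' published configuration.

```python
# Hypothetical sketch of a character-level CNN syntax-highlighting tagger,
# in the spirit of the paper's CNN model. Vocabulary, layer sizes, and the
# highlighting class set are illustrative assumptions, not the authors' setup.
import torch
import torch.nn as nn

NUM_CLASSES = 5  # e.g. keyword, identifier, literal, comment, other (assumed)

class CharCNNHighlighter(nn.Module):
    def __init__(self, vocab_size=256, embed_dim=64, hidden=128,
                 num_classes=NUM_CLASSES):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # 1D convolutions over the character axis; "same" padding keeps the
        # sequence length so every character receives a prediction.
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, hidden, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=7, padding=3),
            nn.ReLU(),
        )
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, char_ids):                  # (batch, seq_len) int64
        x = self.embed(char_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        x = self.conv(x).transpose(1, 2)          # (batch, seq_len, hidden)
        return self.head(x)                       # (batch, seq_len, classes)

# Usage: encode source bytes, predict one highlighting class per character.
code = "def f(x): return x + 1"
ids = torch.tensor([[min(ord(c), 255) for c in code]])
model = CharCNNHighlighter()
pred = model(ids).argmax(-1)  # per-character class ids, same length as `code`
```

Unlike a recurrent model, every character position here is computed in parallel
by the convolutions, which is the property that makes a GPU speed-up of the kind
reported above plausible.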
Related papers
- ReLearn: Unlearning via Learning for Large Language Models [64.2802606302194]
We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning.
This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation.
Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
arXiv Detail & Related papers (2025-02-16T16:31:00Z) - A Framework for On the Fly Input Refinement for Deep Learning Models [0.0]
Deep learning models still exhibit notable mispredictions in real-world applications, even when trained on up-to-date data.
This research introduces an adaptive, on-the-fly input refinement framework aimed at improving model performance through input validation and transformation.
As a scalable and resource-efficient solution, this framework holds significant promise for high-stakes applications in software engineering, natural language processing, and computer vision.
arXiv Detail & Related papers (2025-02-08T05:41:01Z) - Language Models for Code Optimization: Survey, Challenges and Future Directions [7.928856221466083]
Language models (LMs) built upon deep neural networks (DNNs) have recently demonstrated breakthrough effectiveness in software engineering tasks.
This study aims to provide actionable insights and references for both researchers and practitioners in this rapidly evolving field.
arXiv Detail & Related papers (2025-01-02T14:20:36Z) - Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z) - IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition [5.525052547053668]
Scene text recognition has attracted increasing attention due to its diverse applications.
Most state-of-the-art methods adopt an encoder-decoder framework with the attention mechanism, autoregressively generating text from left to right.
We propose an alternative solution, using a parallel and iterative decoder that adopts an easy-first decoding strategy.
arXiv Detail & Related papers (2023-12-19T08:03:19Z) - Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z) - A Transformer-based Approach for Arabic Offline Handwritten Text Recognition [0.0]
We introduce two alternative architectures for recognizing offline Arabic handwritten text.
Our approach can model language dependencies and relies only on the attention mechanism, thereby making it more parallelizable and less complex.
Our evaluation on the Arabic KHATT dataset demonstrates that our proposed method outperforms the current state-of-the-art approaches.
arXiv Detail & Related papers (2023-07-27T17:51:52Z) - Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z) - Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments have been conducted on the MS-COCO dataset demonstrating the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z) - Sequence Model Design for Code Completion in the Modern IDE [3.4824234779710452]
We propose a novel design for predicting top-k next tokens that combines static analysis's ability to enumerate all valid keywords and in-scope identifiers with the ability of a language model to place a probability distribution over them (a minimal sketch of this masking idea appears after the list).
Our model mixes a character-level input representation with token output to represent out-of-vocabulary (OOV) tokens meaningfully and minimize prediction latency.
arXiv Detail & Related papers (2020-04-10T22:40:49Z)
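The code-completion entry above describes a concrete mechanism: static analysis
enumerates the tokens that are valid at the cursor, and the language model's
next-token distribution is restricted to those candidates. Here is a minimal,
hypothetical PyTorch sketch of that masking idea; the toy vocabulary and
candidate set are illustrative assumptions, not that paper's implementation.

```python
# Hypothetical sketch: mask a language model's next-token distribution to the
# candidates a static analyzer deems valid (keywords + in-scope identifiers).
# The toy vocabulary size and candidate set are illustrative assumptions.
import torch

def masked_top_k(logits, valid_token_ids, k=5):
    """Keep probability mass only on statically valid tokens, then take top-k."""
    mask = torch.full_like(logits, float("-inf"))
    mask[valid_token_ids] = 0.0                  # allow only valid candidates
    probs = torch.softmax(logits + mask, dim=-1) # invalid tokens get zero mass
    return torch.topk(probs, k)

# Toy example: vocabulary of 10 tokens, of which static analysis says only
# tokens {2, 5, 7} are valid at the cursor position.
logits = torch.randn(10)
top = masked_top_k(logits, torch.tensor([2, 5, 7]), k=3)
print(top.values, top.indices)  # ranked suggestions drawn only from {2, 5, 7}
```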
This list is automatically generated from the titles and abstracts of the papers on this site.