On-the-Fly Syntax Highlighting: Generalisation and Speed-ups
- URL: http://arxiv.org/abs/2402.08754v1
- Date: Tue, 13 Feb 2024 19:43:22 GMT
- Title: On-the-Fly Syntax Highlighting: Generalisation and Speed-ups
- Authors: Marco Edoardo Palma, Alex Wolf, Pasquale Salza, Harald C. Gall
- Abstract summary: On-the-fly syntax highlighting is the task of rapidly associating visual secondary notation values with each character of a language derivation.
Speed constraints are essential to ensure tool usability, manifesting as responsiveness for end users accessing online source code.
Achieving precise highlighting is critical for enhancing code comprehensibility.
Addressing the development costs of such resolvers is imperative, given the multitude of programming language versions.
- Score: 2.208443815105053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-the-fly syntax highlighting is the task of rapidly associating visual
secondary notation values with each character of a language derivation.
Research in this domain is driven by the prevalence of online software
development tools, which frequently display source code on screen and heavily
rely on syntax highlighting mechanisms. In this context, three contrasting
demands confront resolvers in this space: speed, accuracy, and development
costs. Speed constraints are essential to ensure tool usability, manifesting as
responsiveness for end users accessing online source code and minimising system
overhead. Simultaneously, achieving precise highlighting is critical for
enhancing code comprehensibility. Nevertheless, obtaining accurate results
necessitates the capacity to perform grammatical analysis on the code under
consideration, even in cases of varying grammatical correctness. Furthermore,
addressing the development costs of such resolvers is imperative, given the
multitude of programming language versions. The current state-of-the-art
approach in this field leverages the original lexer and parser of programming
languages to create syntax highlighting oracles, subsequently used for training
base Recurrent Neural Network models. As the question of the generalisation of
such a solution persists, this paper addresses this aspect by extending the
original work to three additional mainstream programming languages and
conducting a comprehensive review of the outcomes. Moreover, the original
limitations in evaluation performance and training costs are mitigated through
the introduction of a novel Convolutional-based Neural Network model. This
study examines the performance gains of running models on GPUs, finding that
the new CNN implementation is much faster than previous methods while
maintaining high accuracy.
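For illustration, a minimal per-character convolutional tagger of the kind the abstract describes could look like the sketch below; the vocabulary size, number of highlighting classes, and layer widths are assumptions for the sake of the example, not the authors' released architecture.

```python
# Hedged sketch: a character-level CNN that assigns a highlighting class to
# every character of a source snippet. Hyperparameters are illustrative only.
import torch
import torch.nn as nn

class CharCNNHighlighter(nn.Module):
    def __init__(self, vocab_size=256, num_classes=8, embed_dim=64, channels=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Same-padding 1D convolutions preserve the sequence length, so each
        # input character receives its own vector of class logits.
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.head = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, char_ids):                     # (batch, seq_len)
        x = self.embed(char_ids).transpose(1, 2)     # (batch, embed_dim, seq_len)
        x = self.conv(x)
        return self.head(x).transpose(1, 2)          # (batch, seq_len, num_classes)

# Usage: encode a snippet as byte ids and predict one class per character.
snippet = "def f(x):\n    return x + 1\n"
ids = torch.tensor([[min(ord(c), 255) for c in snippet]])
model = CharCNNHighlighter()
with torch.no_grad():
    per_char_classes = model(ids).argmax(dim=-1)     # shape (1, len(snippet))
```

Unlike a recurrent tagger, a convolutional model of this kind evaluates all positions in parallel, which is one reason such architectures tend to run faster on GPUs.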
Related papers
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
- IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition [5.525052547053668]
Scene text recognition has attracted increasing attention due to its diverse applications.
Most state-of-the-art methods adopt an encoder-decoder framework with the attention mechanism, autoregressively generating text from left to right.
We propose an alternative solution, using a parallel and iterative decoder that adopts an easy-first decoding strategy.
arXiv Detail & Related papers (2023-12-19T08:03:19Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- A Transformer-based Approach for Arabic Offline Handwritten Text Recognition [0.0]
We introduce two alternative architectures for recognizing offline Arabic handwritten text.
Our approach can model language dependencies and relies only on the attention mechanism, thereby making it more parallelizable and less complex.
Our evaluation on the Arabic KHATT dataset demonstrates that our proposed method outperforms the current state-of-the-art approaches.
arXiv Detail & Related papers (2023-07-27T17:51:52Z)
- Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization [76.57699934689468]
We propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side to enhance the performance of neural models.
To overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens.
arXiv Detail & Related papers (2023-05-18T16:02:04Z)
- On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [48.588772371355816]
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.
Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples.
arXiv Detail & Related papers (2023-01-30T13:21:00Z)
- A Survey on Pretrained Language Models for Neural Code Intelligence [4.020523898765404]
The field of Neural Code Intelligence (NCI) has emerged as a promising solution to tackle analytical tasks on source code.
NCI aims to improve programming efficiency and minimize human errors within the software industry.
Pretrained language models have become a dominant force in NCI research, consistently delivering state-of-the-art results.
arXiv Detail & Related papers (2022-12-20T08:34:56Z)
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
- Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments have been conducted on the MS-COCO dataset demonstrating the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimises a model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
- Sequence Model Design for Code Completion in the Modern IDE [3.4824234779710452]
We propose a novel design for predicting top-k next tokens that combines static analysis' ability to enumerate all valid keywords and in-scope identifiers with the ability of a language model to place a probability distribution over them.
Our model mixes character-level input representation with token output to represent out-of-vocabulary (OOV) tokens meaningfully and minimize prediction latency.
arXiv Detail & Related papers (2020-04-10T22:40:49Z)
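To illustrate the mechanism summarised in the last entry above, the idea of combining statically enumerated candidates with a language model's distribution can be sketched in a few lines; the token scores, the candidate set, and the helper name below are hypothetical placeholders, not that paper's implementation.

```python
# Hedged sketch: restrict a language model's next-token distribution to the
# tokens that static analysis reports as valid in the current scope, then
# renormalise and return the top-k suggestions. All values are illustrative.
import math

def rank_completions(lm_log_probs: dict, valid_tokens: set, k: int = 5):
    filtered = {t: lp for t, lp in lm_log_probs.items() if t in valid_tokens}
    log_norm = math.log(sum(math.exp(lp) for lp in filtered.values()))
    renormalised = {t: lp - log_norm for t, lp in filtered.items()}
    return sorted(renormalised.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Example with made-up scores: 'cout' scores well but is not in scope, so it
# is filtered out before ranking.
lm_scores = {"count": -0.7, "cout": -0.9, "return": -2.3, "for": -3.0}
in_scope = {"count", "return", "for", "if"}
print(rank_completions(lm_scores, in_scope, k=3))
```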
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.