Fugu-MT Paper Translation (Abstract): Multi Language Models for On-the-Fly Syntax Highlighting

Paper overview: Multi Language Models for On-the-Fly Syntax Highlighting

arxiv url: http://arxiv.org/abs/2510.04166v1
Date: Sun, 05 Oct 2025 11:48:49 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.498561
Title: Multi Language Models for On-the-Fly Syntax Highlighting
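Title (reference translation): オンザフライ構文ハイライトのための多言語モデル
Authors: Marco Edoardo Palma, Pooja Rani, Harald C. Gall
Abstract summary: This paper proposes a unified model capable of highlighting up to six mainstream programming languages. It reduces deployment complexity by a factor of six and improves performance on unseen languages.
Reference score (independently computed attention metric): 2.4216414826638353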
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Syntax highlighting is a critical feature in modern software development environments, enhancing code readability and developer productivity. However, delivering accurate highlighting in real time remains challenging for online and web-based development tools due to strict time and memory constraints on backend services. These systems must serve highlights rapidly and frequently, even when code is partially valid or invalid. This has led to on-the-fly syntax highlighting, where visual annotations are generated just before content is served, often at high request rates and under incomplete input conditions. To meet these demands efficiently, state-of-the-art models use deep learning to learn the behavior of brute-force syntax highlighting resolvers, tools that are easy to implement but too slow for production. Through the Deep Abstraction process, brute-force strategies are encoded into fast statistical models that achieve both high accuracy and low-latency inference. Despite their success, such models face key challenges: they support only one programming language per model, require large datasets from slow brute-force generators, and involve resource-intensive training. In multi-language environments, this means maintaining multiple independent models, increasing system complexity and operational cost. This work addresses these issues by introducing a unified model capable of highlighting up to six mainstream programming languages, reducing deployment complexity by a factor of six and improving performance on unseen languages. A novel normalization technique significantly enhances model generalization, while few-shot learning experiments show that a small number of oracle samples can replace large datasets, minimizing dependence on brute-force generators. Combined, these innovations enable efficient, scalable, and cost-effective syntax highlighting across diverse programming languages.
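
The abstract does not describe its normalization technique, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that language-specific lexer token types can be mapped onto a shared vocabulary, and that a fast statistical model (here a simple bigram lookup table) is fitted to labels produced by a slow oracle. All names in it (`NORMALIZE`, `normalize`, `fit`, `predict`, the token types, and the highlight labels) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from (language, lexer token type) to a shared token type.
# The real normalization used in the paper is not specified in the abstract.
NORMALIZE = {
    ("java", "CLASS"): "KW_CLASS",
    ("java", "Identifier"): "IDENT",
    ("python", "CLASS"): "KW_CLASS",
    ("python", "NAME"): "IDENT",
}


def normalize(lang, token_types):
    """Map language-specific token types onto the shared vocabulary."""
    return [NORMALIZE.get((lang, t), "OTHER") for t in token_types]


def fit(samples):
    """Fit a bigram lookup table from oracle-labelled samples.

    samples: list of (normalized token types, oracle highlight labels).
    """
    counts = defaultdict(Counter)
    for types, labels in samples:
        prev = "<s>"  # start-of-sequence marker
        for t, y in zip(types, labels):
            counts[(prev, t)][y] += 1
            prev = t
    # Keep the most frequent oracle label for each (previous, current) pair.
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def predict(model, types):
    """Label each token using the learned table; unknown contexts get 'none'."""
    out, prev = [], "<s>"
    for t in types:
        out.append(model.get((prev, t), "none"))
        prev = t
    return out


# One oracle sample from Java: `class Foo`
java_sample = (normalize("java", ["CLASS", "Identifier"]), ["keyword", "type_decl"])
model = fit([java_sample])

# The same table is reused for Python (`class Bar`), which was never seen in training.
python_labels = predict(model, normalize("python", ["CLASS", "NAME"]))
print(python_labels)
```

Because both languages are mapped onto the same normalized token types, a single table fitted on one Java sample also labels the Python input. This loosely mirrors the abstract's claims about one model serving several languages and about few oracle samples replacing large datasets.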

Related papers