Fugu-MT: arXiv Paper Translations

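This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.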

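The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).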

Papers with a publication date of 20241101.

Title	Authors	Abstract	Publication date / translation date
# DQ-DETR: DETR with Dynamic Query for Tiny Object Detection ( http://arxiv.org/abs/2404.03507v5 ) License: see link	Hou-I Liu, Yi-Xin Huang, Hong-Han Shuai, Wen-Huang Cheng	Although previous DETR-like methods have performed well in generic object detection, tiny object detection remains challenging for them because the positional information of object queries is not customized for detecting tiny objects, whose scale is far smaller than that of general objects. In addition, the fixed number of queries used by DETR-like methods makes them unsuitable for aerial datasets, which contain only tiny objects and in which the number of instances is imbalanced across images. Thus, we present a simple yet effective model, named DQ-DETR, which consists of three components: a categorical counting module, counting-guided feature enhancement, and dynamic query selection. DQ-DETR uses the prediction and density maps from the categorical counting module to dynamically adjust the number of object queries and improve the positional information of queries. Our model DQ-DETR outperforms previous CNN-based and DETR-like methods, achieving state-of-the-art mAP 30.2% on the AI-TOD-V2 dataset, which mostly consists of tiny objects. Our code will be available at https://github.com/Katie0723/DQ-DETR.	Translated: 2024-11-09 03:26:10 Published: 2024-11-01
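The DQ-DETR abstract above describes adjusting the number of object queries from a predicted count category and using a density map to improve query positions. A minimal sketch of that idea follows. The bucket boundaries, the query budgets, and both function names are hypothetical illustrations, not values or APIs confirmed by the abstract.

```python
from typing import List, Tuple

# (upper bound on predicted instance count, number of queries).
# These numbers are assumptions chosen for illustration only.
QUERY_BUCKETS: List[Tuple[float, int]] = [
    (10, 300),
    (100, 500),
    (500, 900),
    (float("inf"), 1500),
]


def num_queries_for_count(predicted_count: int) -> int:
    """Pick a query budget from the predicted number of instances."""
    for upper, n_queries in QUERY_BUCKETS:
        if predicted_count <= upper:
            return n_queries
    raise ValueError("unreachable: the last bucket is unbounded")


def select_query_positions(
    density_map: List[List[float]], k: int
) -> List[Tuple[int, int]]:
    """Return the (row, col) positions of the k highest-density cells."""
    cells = [
        (value, r, c)
        for r, row in enumerate(density_map)
        for c, value in enumerate(row)
    ]
    # Sort by density only; ties keep their original scan order.
    cells.sort(key=lambda t: t[0], reverse=True)
    return [(r, c) for _, r, c in cells[:k]]


toy_density = [[0.1, 0.9], [0.5, 0.2]]
budget = num_queries_for_count(predicted_count=37)
positions = select_query_positions(toy_density, k=2)
```

The point of the sketch is only the control flow: a sparse image gets a small query budget, a crowded one gets a larger budget, and the initial query positions come from where the density map is highest.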
# HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model ( http://arxiv.org/abs/2406.00307v3 ) License: see link	Khoa Vo, Thinh Phan, Kashu Yamazaki, Minh Tran, Ngan Le	Current video-language models (VLMs) rely extensively on instance-level alignment between video and language modalities, which presents two major limitations: (1) visual reasoning departs from the natural perception that humans have in first-person perspective, leading to a lack of reasoning interpretation; and (2) learning is limited in capturing inherent fine-grained relationships between the two modalities. In this paper, we take inspiration from human perception and explore a compositional approach for egocentric video representation. We introduce HENASY (Hierarchical ENtities ASsemblY), which includes a spatiotemporal token grouping mechanism to explicitly assemble dynamically evolving scene entities through time and model their relationship for video representation. By leveraging compositional structure understanding, HENASY possesses strong interpretability via visual grounding with free-form text queries. We further explore a suite of multi-grained contrastive losses to facilitate entity-centric understanding. This comprises three alignment types: video-narration, noun-entity, and verb-entities alignment. Our method demonstrates strong interpretability in both quantitative and qualitative experiments, while maintaining competitive performance on five downstream tasks via zero-shot transfer or as video/text representation, including video/text retrieval, action recognition, multi-choice query, natural language query, and moments query.	Translated: 2024-11-09 01:56:09 Published: 2024-11-01
# DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs ( http://arxiv.org/abs/2406.01721v2 ) License: see link	Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, Ying Wei	Quantization of large language models (LLMs) faces significant challenges, particularly due to the presence of outlier activations that impede efficient low-bit representation. Traditional approaches predominantly address $\textit{Normal Outliers}$, which are activations across all tokens with relatively large magnitudes. However, these methods struggle with smoothing $\textit{Massive Outliers}$ that display significantly larger values, which leads to significant performance degradation in low-bit quantization. In this paper, we introduce DuQuant, a novel approach that utilizes rotation and permutation transformations to more effectively mitigate both massive and normal outliers. First, DuQuant starts by constructing rotation matrices, using specific outlier dimensions as prior knowledge, to redistribute outliers to adjacent channels by block-wise rotation. Second, we further employ a zigzag permutation to balance the distribution of outliers across blocks, thereby reducing block-wise variance. A subsequent rotation further smooths the activation landscape, enhancing model performance. DuQuant simplifies the quantization process and excels in managing outliers, outperforming the state-of-the-art baselines across various sizes and types of LLMs on multiple tasks, even with 4-bit weight-activation quantization. Our code is available at https://github.com/Hsu1023/DuQuant.	Translated: 2024-11-09 01:56:09 Published: 2024-11-01
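The DuQuant abstract above relies on rotation to spread outlier activations across channels. The sketch below illustrates the general principle only: for an orthogonal matrix R, rotating activations by R and weights by R^T leaves the layer output unchanged while lowering the largest activation magnitude. It uses a fixed normalized Hadamard matrix as a stand-in, which is an assumption; DuQuant's own construction of rotation matrices from outlier dimensions and its zigzag permutation are not reproduced here, and the toy values are made up.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def max_abs(a: Matrix) -> float:
    return max(abs(v) for row in a for v in row)


# Normalized 4x4 Hadamard matrix: orthogonal, so R times its transpose is I.
R: Matrix = [
    [0.5 * s for s in row]
    for row in [
        [1, 1, 1, 1],
        [1, -1, 1, -1],
        [1, 1, -1, -1],
        [1, -1, -1, 1],
    ]
]

# One token whose first channel is an outlier (toy values).
x: Matrix = [[100.0, 1.0, -2.0, 0.5]]
# Toy weight matrix mapping 4 channels to 2 outputs.
w: Matrix = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6], [0.7, -0.8]]

x_rot = matmul(x, R)             # rotated activations
w_rot = matmul(transpose(R), w)  # counter-rotated weights

y_ref = matmul(x, w)             # original layer output
y_rot = matmul(x_rot, w_rot)     # same output, computed in the rotated basis
```

Because the rotated activations have a smaller dynamic range, a low-bit quantizer applied to `x_rot` wastes fewer levels on a single extreme channel. The quantizer itself is left out to keep the sketch short.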
# Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search ( http://arxiv.org/abs/2407.06334v2 ) License: see link	Kevin Yu, Jihye Roh, Ziang Li, Wenhao Gao, Runzhong Wang, Connor W. Coley	Computer-aided synthesis planning (CASP) algorithms have demonstrated expert-level abilities in planning retrosynthetic routes to molecules of low to moderate complexity. However, current search methods assume the sufficiency of reaching arbitrary building blocks, failing to address the common real-world constraint where using specific molecules is desired. To this end, we present a formulation of synthesis planning with starting material constraints. Under this formulation, we propose Double-Ended Synthesis Planning (DESP), a novel CASP algorithm under a bidirectional graph search scheme that interleaves expansions from the target and from the goal starting materials to ensure constraint satisfiability. The search algorithm is guided by a goal-conditioned cost network learned offline from a partially observed hypergraph of valid chemical reactions. We demonstrate the utility of DESP in improving solve rates and reducing the number of search expansions by biasing synthesis planning towards expert goals on multiple new benchmarks. DESP can make use of existing one-step retrosynthesis models, and we anticipate its performance to scale as these one-step model capabilities improve.	Translated: 2024-11-08 23:13:33 Published: 2024-11-01
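The DESP abstract above describes a bidirectional search that interleaves expansions from the target and from the required starting material. The sketch below shows only the generic interleaving pattern, as a level-by-level bidirectional breadth-first search on a small undirected toy graph. It is not the DESP algorithm: DESP searches a reaction hypergraph and is guided by a learned cost network, neither of which appears here. The node names and the graph are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[str]]


def _expand_level(
    graph: Graph,
    frontier: List[str],
    dist_own: Dict[str, int],
    dist_other: Dict[str, int],
) -> Tuple[List[str], Optional[int]]:
    """Expand one full BFS level; report the shortest meeting length, if any."""
    next_frontier: List[str] = []
    best: Optional[int] = None
    for node in frontier:
        for neighbor in graph.get(node, []):
            if neighbor in dist_own:
                continue  # already reached from this side
            dist_own[neighbor] = dist_own[node] + 1
            if neighbor in dist_other:
                total = dist_own[neighbor] + dist_other[neighbor]
                best = total if best is None else min(best, total)
            next_frontier.append(neighbor)
    return next_frontier, best


def bidirectional_search(graph: Graph, start: str, goal: str) -> Optional[int]:
    """Length in edges of a shortest path in an undirected graph, or None."""
    if start == goal:
        return 0
    dist_fwd, dist_bwd = {start: 0}, {goal: 0}
    frontier_fwd, frontier_bwd = [start], [goal]
    forward_turn = True
    while frontier_fwd and frontier_bwd:
        if forward_turn:
            frontier_fwd, best = _expand_level(graph, frontier_fwd, dist_fwd, dist_bwd)
        else:
            frontier_bwd, best = _expand_level(graph, frontier_bwd, dist_bwd, dist_fwd)
        if best is not None:
            return best
        forward_turn = not forward_turn  # interleave the two directions
    return None  # one side ran out of nodes without meeting the other


# Hypothetical toy graph; edges are listed in both directions.
toy_graph: Graph = {
    "target": ["a", "b"],
    "a": ["target", "c"],
    "b": ["target"],
    "c": ["a", "start_material"],
    "start_material": ["c"],
    "isolated": [],
}
route_length = bidirectional_search(toy_graph, "target", "start_material")
```

Searching from both ends means each side only needs to explore roughly half the route depth, which is the intuition behind the reduced number of expansions reported in the abstract.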
# Virtual Personas for Language Models via an Anthology of Backstories ( http://arxiv.org/abs/2407.06576v2 ) License: see link	Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Kohen Behar, David M. Chan	Large language models (LLMs) are trained from vast repositories of text authored by millions of distinct authors, reflecting an enormous diversity of human traits. While these models bear the potential to be used as approximations of human subjects in behavioral studies, prior efforts have been limited in steering model responses to match individual human users. In this work, we introduce "Anthology", a method for conditioning LLMs to particular virtual personas by harnessing open-ended life narratives, which we refer to as "backstories." We show that our methodology enhances the consistency and reliability of experimental outcomes while ensuring better representation of diverse sub-populations. Across three nationally representative human surveys conducted as part of Pew Research Center's American Trends Panel (ATP), we demonstrate that Anthology achieves up to 18% improvement in matching the response distributions of human respondents and 27% improvement in consistency metrics. Our code and generated backstories are available at https://github.com/CannyLab/anthology.	Translated: 2024-11-08 23:02:19 Published: 2024-11-01
# Virtual Personas for Language Models via an Anthology of Backstories ( http://arxiv.org/abs/2407.06576v3 ) License: see link	Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Kohen Behar, David M. Chan	Large language models (LLMs) are trained from vast repositories of text authored by millions of distinct authors, reflecting an enormous diversity of human traits. While these models bear the potential to be used as approximations of human subjects in behavioral studies, prior efforts have been limited in steering model responses to match individual human users. In this work, we introduce "Anthology", a method for conditioning LLMs to particular virtual personas by harnessing open-ended life narratives, which we refer to as "backstories." We show that our methodology enhances the consistency and reliability of experimental outcomes while ensuring better representation of diverse sub-populations. Across three nationally representative human surveys conducted as part of Pew Research Center's American Trends Panel (ATP), we demonstrate that Anthology achieves up to 18% improvement in matching the response distributions of human respondents and 27% improvement in consistency metrics. Our code and generated backstories are available at https://github.com/CannyLab/anthology.	Translated: 2024-11-08 23:02:19 Published: 2024-11-01
# Model-agnostic clean-label backdoor mitigation in cybersecurity environments ( http://arxiv.org/abs/2407.08159v3 ) License: see link	Giorgio Severi, Simona Boboila, John Holodnak, Kendra Kratkiewicz, Rauf Izmailov, Michael J. De Lucia, Alina Oprea	The training phase of machine learning models is a delicate step, especially in cybersecurity contexts. Recent research has surfaced a series of insidious training-time attacks that inject backdoors in models designed for security classification tasks without altering the training labels. With this work, we propose new techniques that leverage insights in cybersecurity threat models to effectively mitigate these clean-label poisoning attacks, while preserving the model utility. By performing density-based clustering on a carefully chosen feature subspace, and progressively isolating the suspicious clusters through a novel iterative scoring procedure, our defensive mechanism can mitigate the attacks without requiring many of the common assumptions in the existing backdoor defense literature. To show the generality of our proposed mitigation, we evaluate it on two clean-label model-agnostic attacks on two different classic cybersecurity data modalities: network flows classification and malware classification, using gradient boosting and neural network models.	Translated: 2024-11-08 22:29:08 Published: 2024-11-01
# MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine ( http://arxiv.org/abs/2407.08739v2 ) License: see link	Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Ziyu Guo, Shicheng Li, Yichi Zhang, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Chunyuan Li, Hongsheng Li	The mathematical capabilities of Multi-modal Large Language Models (MLLMs) remain under-explored with three areas to be improved: visual encoding of math diagrams, diagram-language alignment, and chain-of-thought (CoT) reasoning. This draws forth an urgent demand for an effective training paradigm and a large-scale, comprehensive dataset with detailed CoT rationales, which is challenging to collect and costly to annotate manually. To tackle this issue, we propose MAVIS, a MAthematical VISual instruction tuning pipeline for MLLMs, featuring an automatic data engine to efficiently create mathematical visual datasets. We design the data generation process to be entirely independent of human intervention or GPT API usage, while ensuring the diagram-caption correspondence, question-answer correctness, and CoT reasoning quality. With this approach, we curate two datasets, MAVIS-Caption (558K diagram-caption pairs) and MAVIS-Instruct (834K visual math problems with CoT rationales), and propose four progressive stages for training MLLMs from scratch. First, we utilize MAVIS-Caption to fine-tune a math-specific vision encoder (CLIP-Math) through contrastive learning, tailored for improved diagram visual encoding. Second, we also leverage MAVIS-Caption to align the CLIP-Math with a large language model (LLM) by a projection layer, enhancing vision-language alignment in mathematical domains. Third, we adopt MAVIS-Instruct to perform the instruction tuning for robust problem-solving skills, and term the resulting model MAVIS-7B. Fourth, we apply Direct Preference Optimization (DPO) to enhance the CoT capabilities of our model, further refining its step-wise reasoning performance. Code and data will be released at https://github.com/ZrrSkywalker/MAVIS	Translated: 2024-11-08 22:17:54 Published: 2024-11-01
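The fourth MAVIS training stage above applies Direct Preference Optimization (DPO). As a reminder of what that objective computes, here is a minimal sketch of the standard per-example DPO loss on sequence log-probabilities. The function name, the example log-probabilities, and the default `beta=0.1` are assumptions for illustration; the abstract does not state MAVIS's hyperparameters or how its preference pairs are built.

```python
import math


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    The margin is how much more the policy prefers the chosen response
    over the rejected one, relative to the frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) equals log(1 + exp(-z)); branch for numerical stability.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: loss is log(2).
neutral = dpo_loss(-3.0, -3.0, -3.0, -3.0)
# Policy raised the chosen rationale and lowered the rejected one.
improved = dpo_loss(-1.0, -5.0, -3.0, -3.0)
# Policy moved in the wrong direction.
worsened = dpo_loss(-5.0, -1.0, -3.0, -3.0)
```

In a CoT setting the "chosen" and "rejected" sequences would be a better and a worse step-by-step rationale for the same problem; the loss falls as the policy separates them more than the reference does.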
# DPEC: Dual-Path Error Compensation Method for Enhanced Low-Light Image Clarity ( http://arxiv.org/abs/2407.09553v3 ) License: see link	Shuang Wang, Qianwen Lu, Yihe Nie, Qingchuan Tao, Yanmei Yu	For the task of low-light image enhancement, deep learning-based algorithms have demonstrated superiority and effectiveness compared to traditional methods. Existing deep learning algorithms are proposed mainly based on the Retinex theory but overlook the noise and color distortion present in the input, which frequently results in significant noise amplification and local color distortion in the final results. To address this, we propose a Dual-Path Error Compensation method (DPEC), which aims to improve image quality in low-light conditions. DPEC performs precise pixel-level error estimation, which accurately captures subtle pixel differences, and independent denoising, which effectively removes unnecessary noise. This method restores image brightness while preserving local texture details and avoiding noise amplification. Furthermore, to compensate for the traditional CNN's limited ability to capture long-range semantic information and considering both computational speed and resource efficiency, we integrated the VMamba architecture into the backbone of DPEC. In addition, we introduced the HIS-Retinex loss to constrain the training of DPEC, ensuring that the overall brightness distribution of the images more closely aligns with real-world conditions. Comprehensive quantitative and qualitative experimental results demonstrate that our algorithm significantly outperforms state-of-the-art methods across six benchmark tests.	Translated: 2024-11-08 21:54:45 Published: 2024-11-01
# DPEC: Dual-Path Error Compensation Method for Enhanced Low-Light Image Clarity ( http://arxiv.org/abs/2407.09553v4 ) License: see link	Shuang Wang, Qianwen Lu, Boxing Peng, Yihe Nie, Qingchuan Tao	For the task of low-light image enhancement, deep learning-based algorithms have demonstrated superiority and effectiveness compared to traditional methods. However, these methods, primarily based on Retinex theory, tend to overlook the noise and color distortions in input images, leading to significant noise amplification and local color distortions in enhanced results. To address these issues, we propose the Dual-Path Error Compensation (DPEC) method, designed to improve image quality under low-light conditions by preserving local texture details while restoring global image brightness without amplifying noise. DPEC incorporates precise pixel-level error estimation to capture subtle differences and an independent denoising mechanism to prevent noise amplification. We introduce the HIS-Retinex loss to guide DPEC's training, ensuring the brightness distribution of enhanced images closely aligns with real-world conditions. To balance computational speed and resource efficiency while training DPEC for a comprehensive understanding of the global context, we integrated the VMamba architecture into its backbone. Comprehensive quantitative and qualitative experimental results demonstrate that our algorithm significantly outperforms state-of-the-art methods in low-light image enhancement. The code is publicly available online at https://github.com/wangshuang233/DPEC.	Translated: 2024-11-08 21:54:45 Published: 2024-11-01
# $\texttt{MixGR}$: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity ( http://arxiv.org/abs/2407.10691v2 ) License: see link	Fengyu Cai, Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Iryna Gurevych, Heinz Koeppl	Recent studies show the growing significance of document retrieval in the generation of LLMs, i.e., RAG, within the scientific domain by bridging their knowledge gap. However, dense retrievers often struggle with domain-specific retrieval and complex query-document relationships, particularly when query segments correspond to various parts of a document. To alleviate such prevalent challenges, this paper introduces $\texttt{MixGR}$, which improves dense retrievers' awareness of query-document matching across various levels of granularity in queries and documents using a zero-shot approach. $\texttt{MixGR}$ fuses various metrics based on these granularities into a unified score that reflects a comprehensive query-document similarity. Our experiments demonstrate that $\texttt{MixGR}$ outperforms previous document retrieval by 24.7%, 9.8%, and 6.9% on nDCG@5 with unsupervised, supervised, and LLM-based retrievers, respectively, averaged on queries containing multiple subqueries from five scientific retrieval datasets. Moreover, the efficacy of two downstream scientific question-answering tasks highlights the advantage of $\texttt{MixGR}$ to boost the application of LLMs in the scientific domain. The code and experimental datasets are available.	Translated: 2024-11-08 21:32:38 Published: 2024-11-01
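The MixGR abstract above says that similarity metrics computed at several granularities are fused into a single score. One common zero-shot way to fuse rankings is reciprocal rank fusion (RRF), sketched below. Whether MixGR uses exactly this rule is not stated in the abstract, so treat the fusion rule, the constant `k=60`, and the three example rankings as assumptions.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores the sum of 1 / (k + rank) over the lists that
    contain it, with ranks starting at 1. Ties are broken by document id.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings of the same three documents at three granularities.
query_vs_document = ["d1", "d2", "d3"]
subquery_vs_document = ["d2", "d1", "d3"]
subquery_vs_proposition = ["d2", "d3", "d1"]

fused = reciprocal_rank_fusion(
    [query_vs_document, subquery_vs_document, subquery_vs_proposition]
)
```

Rank-based fusion needs no training and ignores the raw score scales of the individual retrievers, which fits the zero-shot setting the abstract describes.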
# Revisiting the Impact of Pursuing Modularity for Code Generation ( http://arxiv.org/abs/2407.11406v2 ) License: see link	Deokyeong Kang, Ki Jung Seo, Taeuk Kim	Modular programming, which aims to construct the final program by integrating smaller, independent building blocks, has been regarded as a desirable practice in software development. However, with the rise of recent code generation agents built upon large language models (LLMs), a question emerges: is this traditional practice equally effective for these new tools? In this work, we assess the impact of modularity in code generation by introducing a novel metric for its quantitative measurement. Surprisingly, unlike conventional wisdom on the topic, we find that modularity is not a core factor for improving the performance of code generation models. We also explore potential explanations for why LLMs do not exhibit a preference for modular code compared to non-modular code.	Translated: 2024-11-08 21:10:26 Published: 2024-11-01
# Revisiting the Impact of Pursuing Modularity for Code Generation ( http://arxiv.org/abs/2407.11406v3 ) License: see link	Deokyeong Kang, Ki Jung Seo, Taeuk Kim	Modular programming, which aims to construct the final program by integrating smaller, independent building blocks, has been regarded as a desirable practice in software development. However, with the rise of recent code generation agents built upon large language models (LLMs), a question emerges: is this traditional practice equally effective for these new tools? In this work, we assess the impact of modularity in code generation by introducing a novel metric for its quantitative measurement. Surprisingly, unlike conventional wisdom on the topic, we find that modularity is not a core factor for improving the performance of code generation models. We also explore potential explanations for why LLMs do not exhibit a preference for modular code compared to non-modular code.	Translated: 2024-11-08 21:10:26 Published: 2024-11-01
# HDLCopilot: Natural Language Exploration of Hardware Designs and Libraries ( http://arxiv.org/abs/2407.12749v2 ) License: see link	Manar Abdelatty, Jacob Rosenstein, Sherief Reda	Hardware design workflows often involve working with Process Design Kits (PDKs) from various fabrication labs, each containing its own set of standard cell libraries optimized for metrics such as speed, power, or density. These libraries include multiple views for information on timing and electrical properties of cells, cell layout details, and process design rules. Engineers typically navigate between the design and the target technology to make informed decisions on different design scenarios, such as selecting specific gates for area optimization or enhancing critical path speed. Navigating this complex landscape to retrieve specific information about gates or design rules is often time-consuming and error-prone. To address this, we present HDLCopilot, a multi-agent collaborative framework powered by large language models that enables engineers to streamline interactions with hardware design and PDKs through natural language queries. HDLCopilot enables engineers to quickly access relevant information on gates and design rules and to evaluate tradeoffs related to area, speed, and power in order to make informed decisions efficiently and accurately. The framework achieves an execution accuracy of 96.33% on a diverse set of complex natural language queries. HDLCopilot positions itself as a powerful assistant in hardware design workflows, enhancing productivity and reducing potential human errors.	Translated: 2024-11-08 20:36:48 Published: 2024-11-01
# Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies ( http://arxiv.org/abs/2407.13623v3 ) License: see link	Chaofan Tao, Qian Liu, Longxu Dou, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min Lin, Ngai Wong	Research on scaling large language models (LLMs) has primarily focused on model parameters and training data size, overlooking the role of vocabulary size. We investigate how vocabulary size impacts LLM scaling laws by training models ranging from 33M to 3B parameters on up to 500B characters with various vocabulary configurations. We propose three complementary approaches for predicting the compute-optimal vocabulary size: IsoFLOPs analysis, derivative estimation, and parametric fit of the loss function. Our approaches converge on the conclusion that the optimal vocabulary size depends on the compute budget, with larger models requiring larger vocabularies. Most LLMs, however, use insufficient vocabulary sizes. For example, we predict that the optimal vocabulary size of Llama2-70B should have been at least 216K, 7 times larger than its vocabulary of 32K. We validate our predictions empirically by training models with 3B parameters across different FLOPs budgets. Adopting our predicted optimal vocabulary size consistently improves downstream performance over commonly used vocabulary sizes. By increasing the vocabulary size from the conventional 32K to 43K, we improve performance on ARC-Challenge from 29.1 to 32.0 with the same 2.3e21 FLOPs. Our work highlights the importance of jointly considering tokenization and model scaling for efficient pre-training. The code and demo are available at https://github.com/sail-sg/scaling-with-vocab and https://hf.co/spaces/sail/scaling-with-vocab-demo.	Translated: 2024-11-08 20:25:29 Published: 2024-11-01
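The figures quoted in the scaling-law abstract above can be checked with a few lines of arithmetic. The sketch below computes the vocabulary ratio (the "7 times" in the abstract is a rounding of 6.75) and estimates how many extra parameters a larger vocabulary would add. The hidden size of 8192, the untied input/output embedding assumption, the reading of "32K" and "216K" as 32,000 and 216,000, and the helper name are all assumptions for illustration, not values given in the abstract.

```python
def embedding_params(vocab_size: int, hidden_dim: int, tied: bool = False) -> int:
    """Parameters in the input embedding plus, if untied, the output projection."""
    tables = 1 if tied else 2
    return tables * vocab_size * hidden_dim


CURRENT_VOCAB = 32_000     # "32K" in the abstract, read as 32,000
PREDICTED_VOCAB = 216_000  # "216K" in the abstract, read as 216,000
HIDDEN_DIM = 8192          # assumed hidden size for a 70B-scale model

# Ratio between the predicted optimal and the actual vocabulary size.
ratio = PREDICTED_VOCAB / CURRENT_VOCAB

# Extra vocabulary-related parameters under the assumptions above.
extra_params = embedding_params(PREDICTED_VOCAB, HIDDEN_DIM) - embedding_params(
    CURRENT_VOCAB, HIDDEN_DIM
)

# Reported ARC-Challenge gain when moving from a 32K to a 43K vocabulary.
arc_gain = 32.0 - 29.1
```

Under these assumptions the larger vocabulary adds about 3.0 billion parameters, a few percent of a 70B model, which gives a sense of why the abstract argues that vocabulary size belongs in the compute budget alongside the other parameters.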
# 手術映像における弱教師付き物体検出とセグメンテーションのための空間的時間的知識の遠心化 Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video ( http://arxiv.org/abs/2407.15794v4 ) ライセンス: Link先を確認	Guiqiu Liao, Matjaz Jogan, Sai Koushik, Eric Eaton, Daniel A. Hashimoto,	(参考訳) 弱教師付きビデオオブジェクトセグメンテーション(WSVOS)は、オブジェクトマスクの広範なトレーニングデータセットを必要としないセグメンテーションマップの識別を可能にし、代わりに、オブジェクトの存在を示す粗いビデオラベルに依存する。現在の最先端の手法では、モーションキューを使用する複数の独立した処理段階を必要とするか、あるいはエンドツーエンドのトレーニング可能なネットワークの場合、セグメント化の精度が欠如している。これにより、複数の手術ツールが視野内を頻繁に移動する手術ビデオのセマンティックアノテーションに対するWSVOSの適用が制限されるが、WSVOSでは通常遭遇するよりも難しい問題である。本稿では,半分離型知識蒸留を用いて時空間情報を分散し,高品質なクラスアクティベーションマップ(CAM)を予測するフレームワークであるVDST-Netを提案する。ビデオ中の物体の位置やタイミングに関する特定情報が提供されていない場合の時間的矛盾を解決するために設計された教師ネットワークは、時間的依存を活用して情報を統合する学生ネットワークで動作する。提案するフレームワークは,一般的な参照データセットや,オブジェクトが平均60倍未満のアノテートフレームに存在するような,より困難な手術用ビデオデータセット上で有効であることを示す。本手法は最先端技術より優れ,映像レベルの弱い監督下で優れたセグメンテーションマスクを生成する。 Weakly supervised video object segmentation (WSVOS) enables the identification of segmentation maps without requiring an extensive training dataset of object masks, relying instead on coarse video labels indicating object presence. Current state-of-the-art methods either require multiple independent stages of processing that employ motion cues or, in the case of end-to-end trainable networks, lack in segmentation accuracy, in part due to the difficulty of learning segmentation maps from videos with transient object presence. This limits the application of WSVOS for semantic annotation of surgical videos where multiple surgical tools frequently move in and out of the field of view, a problem that is more difficult than typically encountered in WSVOS. This paper introduces Video Spatio-Temporal Disentanglement Networks (VDST-Net), a framework to disentangle spatiotemporal information using semi-decoupled knowledge distillation to predict high-quality class activation maps (CAMs). 
A teacher network, designed to resolve temporal conflicts when specifics about object location and timing in the video are not provided, works with a student network that integrates information over time by leveraging temporal dependencies. We demonstrate the efficacy of our framework on a public reference dataset and on a more challenging surgical video dataset where objects are, on average, present in less than 60% of annotated frames. Our method outperforms state-of-the-art techniques and generates superior segmentation masks under video-level weak supervision.	翻訳日:2024-11-08 15:45:25 公開日:2024-11-01
# CrysToGraph: 結晶材料特性の総合予測モデルとベンチマーク CrysToGraph: A Comprehensive Predictive Model for Crystal Materials Properties and the Benchmark ( http://arxiv.org/abs/2407.16131v2 ) ライセンス: Link先を確認	Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong,	(参考訳) 格子を横切るイオン結合と秩序のある微視的構造は、独特の対称性を持つ結晶を包含し、そのマクロな性質を決定づける。特に非伝統的な結晶は、非古典的な格子構造を示すか、またはエキゾチックな物理的性質を持つため、研究対象として興味をそそる。したがって、結晶の物理的および化学的性質を正確に予測するためには、長距離秩序を考えることが重要である。 GNNは結晶中の原子の局所的な環境を捉えるのに優れていますが、その深さが限られているため、しばしば長距離の相互作用を効果的に捉えるという課題に直面します。本稿では,非古典結晶系に特化して設計された新しいTransformerベースの幾何グラフネットワークであるCrysToGraph ($\textbf{Crys}$tals with $\textbf{T}$ransformers $\textbf{o}$n $\textbf{Graph}$s)と,欠陥結晶,低次元結晶,MOFなどの非古典結晶材料に対するモデル予測性能を評価するための総合ベンチマークであるUnconvBenchを提案する。 CrysToGraphは、トランスフォーマーベースのグラフ畳み込みブロックと、グラフワイドトランスフォーマーブロックとの長距離インタラクションを効果的にキャプチャする。 CrysToGraphは、非伝統的な結晶材料を複数のタスクでモデル化する効果を証明し、また、非伝統的な結晶と伝統的な結晶の両方のベンチマークにおいて、新しい最先端の結果を達成して、既存の方法よりも優れていることを証明している。 The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of crystals, it is crucial to consider long-range orders. While GNNs excel at capturing the local environment of atoms in crystals, they often face challenges in effectively capturing longer-ranged interactions due to their limited depth. 
In this paper, we propose CrysToGraph ($\textbf{Crys}$tals with $\textbf{T}$ransformers $\textbf{o}$n $\textbf{Graph}$s), a novel transformer-based geometric graph network designed specifically for unconventional crystalline systems, and UnconvBench, a comprehensive benchmark to evaluate models' predictive performance on unconventional crystal materials such as defective crystals, low-dimensional crystals and MOFs. CrysToGraph effectively captures short-range interactions with transformer-based graph convolution blocks as well as long-range interactions with graph-wise transformer blocks. CrysToGraph proves its effectiveness in modelling unconventional crystal materials in multiple tasks, and moreover, it outperforms most existing methods, achieving new state-of-the-art results on the benchmarks of both unconventional crystals and traditional crystals.	翻訳日:2024-11-08 15:34:26 公開日:2024-11-01
# QLDPC手術の改善 : 論理的計測とブリッジコード Improved QLDPC Surgery: Logical Measurements and Bridging Codes ( http://arxiv.org/abs/2407.18393v2 ) ライセンス: Link先を確認	Andrew Cross, Zhiyang He, Patrick Rall, Theodore Yoder,	(参考訳) 本稿では,Cohen et al. (Sci. Adv. 8, eabn1717) の構成に基づく論理的測定法であるゲージ固定型QLDPC手術法を提案する。提案手法はタナーグラフの拡張特性を利用してQLDPC手術の空間オーバーヘッドを大幅に低減する。ある場合には、重量$w$論理演算子をフォールトトレラントに測定するために、$\Theta(w)$ ancilla qubitsしか必要としない。提案手法の符号距離と故障距離を厳密に解析し,最大故障距離を実現するモジュールデコーディングアルゴリズムを提案する。さらに,論理演算子の耐故障継手測定を容易にするブリッジシステムを導入する。このブリッジ構築により、我々のスキームは、異なるQLDPC符号のファミリーを1つのユニバーサルアーキテクチャに接続するために使用できる。ツールボックスを適用して、[[144,12,12]]二変量自転車のコードですべての論理的なクリフォードゲートを実行する方法を示します。本手法では接続グラフに103個のアンシラ量子ビットを付加し,12個の論理量子ビットのうちの1つをゲート合成のアンシラとして用いる。論理的測定は、288 パウリ積の測定を実装するために Bravyi et al. (Nature 627, 778-782) によって研究された自己同型ゲートと組み合わせられる。本稿では,BPOSDとマッチングを組み合わせたモジュール型デコーダを用いて,回路レベルのノイズシミュレーションにより提案手法の実用性を実証する。 In this paper, we introduce the gauge-fixed QLDPC surgery scheme, an improved logical measurement scheme based on the construction of Cohen et al. (Sci. Adv. 8, eabn1717). Our scheme leverages expansion properties of the Tanner graph to substantially reduce the space overhead of QLDPC surgery. In certain cases, we only require $\Theta(w)$ ancilla qubits to fault-tolerantly measure a weight $w$ logical operator. We provide rigorous analysis for the code distance and fault distance of our schemes, and present a modular decoding algorithm that achieves maximal fault-distance. We further introduce a bridge system to facilitate fault-tolerant joint measurements of logical operators. Augmented by this bridge construction, our scheme can be used to connect different families of QLDPC codes into one universal architecture. Applying our toolbox, we show how to perform all logical Clifford gates on the [[144,12,12]] bivariate bicycle code. Our scheme adds 103 ancilla qubits into the connectivity graph, and one of the twelve logical qubits is used as an ancilla for gate synthesis. 
Logical measurements are combined with the automorphism gates studied by Bravyi et al. (Nature 627, 778-782) to implement 288 Pauli product measurements. We demonstrate the practicality of our scheme through circuit-level noise simulations, leveraging our proposed modular decoder that combines BPOSD with matching.	翻訳日:2024-11-08 14:50:05 公開日:2024-11-01
# QLDPC手術の改善 : 論理的計測とブリッジコード Improved QLDPC Surgery: Logical Measurements and Bridging Codes ( http://arxiv.org/abs/2407.18393v3 ) ライセンス: Link先を確認	Andrew Cross, Zhiyang He, Patrick Rall, Theodore Yoder,	(参考訳) 本稿では,Cohen et al. (Sci. Adv. 8, eabn1717) の構成に基づく論理的測定法であるゲージ固定型QLDPC手術法を提案する。提案手法はタナーグラフの拡張特性を利用してQLDPC手術の空間オーバーヘッドを大幅に低減する。ある場合には、重量$w$論理演算子をフォールトトレラントに測定するために、$\Theta(w)$ ancilla qubitsしか必要としない。提案手法の符号距離と故障距離を厳密に解析し,最大故障距離を実現するモジュールデコーディングアルゴリズムを提案する。さらに,論理演算子の耐故障継手測定を容易にするブリッジシステムを導入する。このブリッジ構築により、我々のスキームは、異なるQLDPC符号のファミリーを1つのユニバーサルアーキテクチャに接続するために使用できる。ツールボックスを適用して、[[144,12,12]]二変量自転車のコードですべての論理的なクリフォードゲートを実行する方法を示します。本手法では接続グラフに103個のアンシラ量子ビットを付加し,12個の論理量子ビットのうちの1つをゲート合成のアンシラとして用いる。論理的測定は、288 パウリ積の測定を実装するために Bravyi et al. (Nature 627, 778-782) によって研究された自己同型ゲートと組み合わせられる。本稿では,BPOSDとマッチングを組み合わせたモジュール型デコーダを用いて,回路レベルのノイズシミュレーションにより提案手法の実用性を実証する。 In this paper, we introduce the gauge-fixed QLDPC surgery scheme, an improved logical measurement scheme based on the construction of Cohen et al. (Sci. Adv. 8, eabn1717). Our scheme leverages expansion properties of the Tanner graph to substantially reduce the space overhead of QLDPC surgery. In certain cases, we only require $\Theta(w)$ ancilla qubits to fault-tolerantly measure a weight $w$ logical operator. We provide rigorous analysis for the code distance and fault distance of our schemes, and present a modular decoding algorithm that achieves maximal fault-distance. We further introduce a bridge system to facilitate fault-tolerant joint measurements of logical operators. Augmented by this bridge construction, our scheme can be used to connect different families of QLDPC codes into one universal architecture. Applying our toolbox, we show how to perform all logical Clifford gates on the [[144,12,12]] bivariate bicycle code. Our scheme adds 103 ancilla qubits into the connectivity graph, and one of the twelve logical qubits is used as an ancilla for gate synthesis. 
Logical measurements are combined with the automorphism gates studied by Bravyi et al. (Nature 627, 778-782) to implement 288 Pauli product measurements. We demonstrate the practicality of our scheme through circuit-level noise simulations, leveraging our proposed modular decoder that combines BPOSD with matching.	翻訳日:2024-11-08 14:50:05 公開日:2024-11-01
# 逆ロバスト決定変換器 Adversarially Robust Decision Transformer ( http://arxiv.org/abs/2407.18414v2 ) ライセンス: Link先を確認	Xiaohang Tang, Afonso Marques, Parameswaran Kamalaruban, Ilija Bogunovic,	(参考訳) Reinforcement Learning via Supervised Learning (RvS) 手法の代表的な1つであるDecision Transformer (DT) は、強力なTransformerアーキテクチャを活用して、オフライン学習タスクにおいて強力なパフォーマンスを実現している。しかしながら、敵の環境では、リターンは意思決定者と敵双方の戦略に依存しているため、これらの手法は損なわれない。観測されたリターンに条件付き確率モデルのトレーニングは、データセットのリターンを達成する軌道が、最適でない振る舞いの逆によって達成された可能性があるため、一般化に失敗する可能性がある。そこで我々は,最低ケース対応のRvSアルゴリズムであるAdversarially Robust Decision Transformer (ARDT)を提案する。 ARDTは、最小限の期待回帰によって学習した最悪のケースリターンとターゲットリターンを一致させ、強力なテストタイム敵に対する堅牢性を高める。完全なデータカバレッジを持つシーケンシャルゲームで実施された実験では、ARDTは最大の対向ロバスト性を持つ解である最大(ナッシュ平衡)戦略を生成することができる。大規模なシーケンシャルゲームや、部分的なデータカバレッジを持つ連続的敵RL環境では、ARDTは強力なテストタイムの敵に対して非常に優れたロバスト性を示し、現代のDT法と比較して最悪のケースリターンを達成している。 Decision Transformer (DT), as one of the representative Reinforcement Learning via Supervised Learning (RvS) methods, has achieved strong performance in offline learning tasks by leveraging the powerful Transformer architecture for sequential decision-making. However, in adversarial environments, these methods can be non-robust, since the return is dependent on the strategies of both the decision-maker and adversary. Training a probabilistic model conditioned on observed return to predict action can fail to generalize, as the trajectories that achieve a return in the dataset might have done so due to a suboptimal behavior adversary. To address this, we propose a worst-case-aware RvS algorithm, the Adversarially Robust Decision Transformer (ARDT), which learns and conditions the policy on in-sample minimax returns-to-go. ARDT aligns the target return with the worst-case return learned through minimax expectile regression, thereby enhancing robustness against powerful test-time adversaries. 
In experiments conducted on sequential games with full data coverage, ARDT can generate a maximin (Nash Equilibrium) strategy, the solution with the largest adversarial robustness. In large-scale sequential games and continuous adversarial RL environments with partial data coverage, ARDT demonstrates significantly superior robustness to powerful test-time adversaries and attains higher worst-case returns compared to contemporary DT methods.	翻訳日:2024-11-08 14:50:05 公開日:2024-11-01
# ハイゼンベルクスピンチェーン量子電池のエルゴトロピーとキャパシティ最適化 Ergotropy and capacity optimization in Heisenberg spin-chain quantum batteries ( http://arxiv.org/abs/2408.00133v2 ) ライセンス: Link先を確認	Asad Ali, Saif Al-Kuwari, M. I. Hussain, Tim Byrnes, M. T. Rahim, James Q. Quach, Mehrdad Ghominejad, Saeed Haddadi,	(参考訳) 本研究は, ハイゼンベルクスピンモデルを用いた有限スピン量子電池 (QB) の性能を, ジアルシンスキー-モリヤ (DM) とカプラン-シェフトマン-エンチン-ヴルマン-アハロニー (KSEA) 相互作用を用いて検討した。 QBは局所的不均一磁場における相互作用量子スピンとしてモデル化され、可変ゼーマン分裂を誘導する。最近 Yang et al [Phys. Rev. Lett. 131, 030402 (2023)] が検討したように, 最大抽出可能作業, エルゴトロピー, QBs の容量に関する解析式を導出する。これらの量は、前述の研究で示されたように、特定の量子相関を通じて分析的にリンクされる。異なるハイゼンベルクスピンチェーンモデルは異なる条件下での異なる挙動を示し、QB性能を最適化するためのモデル選択の重要性を強調している。反強磁性(AFM)系では、最大エルゴトロピーはいずれのスピンにも作用するゼーマン分裂場と共に起こるが、強磁性(FM)系は均一なゼーマン場から恩恵を受ける。 AFM症例のエルゴトロピーは, FM症例と比較して温度上昇に対して概ね強いが, 温度はQB性能に大きく影響した。 DMとKSEAの結合はQBのキャパシティとエルゴトロピーの抽出を著しく向上させる。しかし、これらの相互作用のさらなる増加がキャパシティとエルゴトロピーの急激な減少を引き起こすしきい値が存在する。この挙動は温度と量子コヒーレンスの影響を受けており、これは突然の相転移の発生を示唆している。 Baumgratzらによって提唱された量子コヒーレンスの資源理論(Phys. Lett. 113, 140401 (2014))は、エルゴトロピーとキャパシティを高める上で重要な役割を果たす。しかしながら、エルゴトロピーはシステムの能力とコヒーレンス量の両方によって制限される。これらの知見はスピンベースのQBの理論的枠組みを支持しており、将来の量子エネルギー貯蔵装置の研究に役立つかもしれない。 This study examines the performance of finite spin quantum batteries (QBs) using Heisenberg spin models with Dzyaloshinsky-Moriya (DM) and Kaplan--Shekhtman--Entin-Wohlman--Aharony (KSEA) interactions. The QBs are modeled as interacting quantum spins in local inhomogeneous magnetic fields, inducing variable Zeeman splitting. We derive analytical expressions for the maximal extractable work, ergotropy and the capacity of QBs, as recently examined by Yang et al. [Phys. Rev. Lett. 131, 030402 (2023)]. These quantities are analytically linked through certain quantum correlations, as posited in the aforementioned study. Different Heisenberg spin chain models exhibit distinct behaviors under varying conditions, emphasizing the importance of model selection for optimizing QB performance. 
In antiferromagnetic (AFM) systems, maximum ergotropy occurs with a Zeeman splitting field applied to either spin, while ferromagnetic (FM) systems benefit from a uniform Zeeman field. Temperature significantly impacts QB performance, with ergotropy in the AFM case being generally more robust against temperature increases compared to the FM case. Incorporating DM and KSEA couplings can significantly enhance the capacity and ergotropy extraction of QBs. However, there exists a threshold beyond which additional increases in these interactions cause a sharp decline in capacity and ergotropy. This behavior is influenced by temperature and quantum coherence, which signal the occurrence of a sudden phase transition. The resource theory of quantum coherence proposed by Baumgratz et al. [Phys. Rev. Lett. 113, 140401 (2014)] plays a crucial role in enhancing ergotropy and capacity. However, ergotropy is limited by both the system's capacity and the amount of coherence. These findings support the theoretical framework of spin-based QBs and may benefit future research on quantum energy storage devices.	翻訳日:2024-11-08 13:40:31 公開日:2024-11-01
# テキスト属性を計算するためのテーブル変換器 Table Transformers for Imputing Textual Attributes ( http://arxiv.org/abs/2408.02128v2 ) ライセンス: Link先を確認	Ting-Ruen Wei, Yuan Wang, Yoshitaka Inoue, Hsin-Tai Wu, Yi Fang,	(参考訳) ダウンストリームタスクのパフォーマンスは通常、トレーニングデータセットの完全性に依存するため、表形式のデータセットでのデータの欠落は一般的な問題である。従来のデータ計算手法では、数値列と分類列に重点を置いていたが、変換器をベースとしたテーブル変換器(TTITA)と呼ばれる新しいエンドツーエンドの手法を提案し、テーブル内の他の列を用いて非構造化テキスト列をインプットする。提案手法は,3つのデータセットに対して広範な実験を行い,リカレントニューラルネットワークやLlama2などのベースラインモデルよりも優れた性能を示す。ターゲットシーケンスの長さが長い場合には、パフォーマンスの改善がより重要である。さらに、マルチタスク学習を組み込んで、不均一な列を同時にインプットし、テキストインプットの性能を高める。また、現実的なアプリケーションではChatGPTと定性的に比較する。 Missing data in tabular datasets is a common issue as the performance of downstream tasks usually depends on the completeness of the training dataset. Previous missing data imputation methods focus on numeric and categorical columns, but we propose a novel end-to-end approach called Table Transformers for Imputing Textual Attributes (TTITA) based on the transformer to impute unstructured textual columns using other columns in the table. We conduct extensive experiments on three datasets, and our approach shows competitive performance, outperforming baseline models such as recurrent neural networks and Llama2. The performance improvement is more significant when the target sequence has a longer length. Additionally, we incorporate multi-task learning to simultaneously impute for heterogeneous columns, boosting the performance for text imputation. We also qualitatively compare with ChatGPT for realistic applications.	翻訳日:2024-11-08 12:55:51 公開日:2024-11-01
# コンテキストはパラメータに勝る:コミットメッセージ生成でプロプライエタリLLMを上回る Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation ( http://arxiv.org/abs/2408.02502v2 ) ライセンス: Link先を確認	Aaron Imani, Iftekhar Ahmed, Mohammad Moshirpour,	(参考訳) コミットメッセージは、自然言語を使ってコミットで行った変更の説明を提供する。近年のLLM(Large Language Models)の発展は、Omniscient Message Generator (OMG)のような高品質なコミットメッセージの生成に寄与している。この方法はGPT-4を使って最先端のコミットメッセージを生成する。しかし、コーディングタスクにおける GPT-4 のような独自 LLM の使用は、プライバシとサステナビリティの懸念を生じさせ、産業的採用を妨げる可能性がある。コンパイラバリデーションなどの開発者タスクにおいて,オープンソースのLLMが競争力のあるパフォーマンスを達成したことを考慮し,OMGに匹敵するコミットメッセージの生成に利用することができるかを検討する。実験の結果,オープンソース LLM はOMG に匹敵するコミットメッセージを生成することができることがわかった。さらに,4ビット量子化8BオープンソースLLMを用いたCMG手法であるlOcal MessagE GenerAtor (OMEGA)を提案する。 OMEGAは最先端のコミットメッセージを生成し、実践者の好みでGPT-4のパフォーマンスを上回っている。 Commit messages provide descriptions of the modifications made in a commit using natural language, making them crucial for software maintenance and evolution. Recent developments in Large Language Models (LLMs) have led to their use in generating high-quality commit messages, such as the Omniscient Message Generator (OMG). This method employs GPT-4 to produce state-of-the-art commit messages. However, the use of proprietary LLMs like GPT-4 in coding tasks raises privacy and sustainability concerns, which may hinder their industrial adoption. Considering that open-source LLMs have achieved competitive performance in developer tasks such as compiler validation, this study investigates whether they can be used to generate commit messages that are comparable with OMG. Our experiments show that an open-source LLM can generate commit messages that are comparable to those produced by OMG. In addition, through a series of contextual refinements, we propose lOcal MessagE GenerAtor (OMEGA), a CMG approach that uses a 4-bit quantized 8B open-source LLM. OMEGA produces state-of-the-art commit messages, surpassing the performance of GPT-4 in practitioners' preference.	翻訳日:2024-11-08 12:55:50 公開日:2024-11-01
# ゲームにおける性能予測とメカニズム設計 Performative Prediction on Games and Mechanism Design ( http://arxiv.org/abs/2408.05146v2 ) ライセンス: Link先を確認	António Góis, Mehrnaz Mofakhami, Fernando P. Santos, Gauthier Gidel, Simon Lacoste-Julien,	(参考訳) エージェントは集団の行動に依存する個々の目標を持つことが多い。エージェントが集団行動の予測を信頼し、戦略的に適応すれば、そのような予測は結果に非自明に影響を与え、結果としてパフォーマンス予測の一形態となる。この効果は、パンデミックの予測から選挙投票まで、あらゆるシナリオで見られるが、既存の研究は予測されたエージェント間の相互依存を無視している。この方向への第一歩として、エージェントが過去の正確性に基づいて予測を信頼するかを動的に決定する集団リスクジレンマについて検討する。予測が集合的な結果を形成するにつれて、社会福祉は関心の指標として自然に現れる。精度と福祉の相互作用を考察し、安定した正確な予測を求めることが、我々の設定において高い確率で社会福祉を最小化できることを実証する。ベイズエージェントの行動モデルに関する知識を仮定することにより、よりよいトレードオフをどうやって達成し、それらをメカニズム設計に利用するかを示す。 Agents often have individual goals which depend on a group's actions. If agents trust a forecast of collective action and adapt strategically, such prediction can influence outcomes non-trivially, resulting in a form of performative prediction. This effect is ubiquitous in scenarios ranging from pandemic predictions to election polls, but existing work has ignored interdependencies among predicted agents. As a first step in this direction, we study a collective risk dilemma where agents dynamically decide whether to trust predictions based on past accuracy. As predictions shape collective outcomes, social welfare arises naturally as a metric of concern. We explore the resulting interplay between accuracy and welfare, and demonstrate that searching for stable accurate predictions can minimize social welfare with high probability in our setting. By assuming knowledge of a Bayesian agent behavior model, we then show how to achieve better trade-offs and use them for mechanism design.	翻訳日:2024-11-08 12:00:36 公開日:2024-11-01
# フォールトトレラント量子入出力 Fault-tolerant quantum input/output ( http://arxiv.org/abs/2408.05260v2 ) ライセンス: Link先を確認	Matthias Christandl, Omar Fawzi, Ashutosh Goswami,	(参考訳) フォールトトレラント計算の一般的なシナリオは、Shorのファクタリングアルゴリズムのような古典関数を計算する量子アルゴリズムのフォールトトレラントな実現に関するものである。特にこれは、量子アルゴリズムへの入力と出力が古典的であることを意味する。スタンドアローンのシングルコア量子コンピュータとは対照的に、多くの分散シナリオでは、量子情報は1つの量子情報処理システムから別の量子に渡さなければならない。このような状況では、量子情報処理装置は量子入力、量子出力、あるいはその両方を持ち、互いに量子ビットを渡す。我々は[Kitaev, 1997]のフォールトトレラント・フレームワークで、量子入力と出力を持つ任意の量子回路をフォールトトレラント・サーキットに変換し、入力と出力に何らかの制御されたノイズを印加した理想回路を生成することを示す。このフレームワークはステートメントの直接的な構成を可能にし、汎用的な将来のアプリケーションを可能にする。これを2つの具体的な応用例で説明する。第一に、故障した符号化と復号処理を伴うノイズのあるチャネル上の通信 [Christandl and Müller-Hermes, 2024]。線形最小距離の通信符号に対しては、一般的な雑音(コヒーレントエラーを含む)に対するフォールトトレラントエンコーダとデコーダを構築する。より弱いが標準的な局所確率雑音のモデルに対して、一定の分数ランダム誤差を補正できる通信符号に対して、フォールトトレラントエンコーダとデコーダを得る。第2の応用では、[Gottesman, 2014] の構成における状態準備回路として、一般雑音に対するフォールトトレラントな量子計算が一定の空間オーバーヘッドで達成できることを示すために、我々の結果を用いている。 Usual scenarios of fault-tolerant computation are concerned with the fault-tolerant realization of quantum algorithms that compute classical functions, such as Shor's algorithm for factoring. In particular, this means that input and output to the quantum algorithm are classical. In contrast to stand-alone single-core quantum computers, in many distributed scenarios, quantum information might have to be passed on from one quantum information processing system to another one, possibly via noisy quantum communication channels with noise levels above fault-tolerant thresholds. In such situations, quantum information processing devices will have quantum inputs, quantum outputs or even both, which pass qubits among each other. Working in the fault-tolerant framework of [Kitaev, 1997], we show that any quantum circuit with quantum input and output can be transformed into a fault-tolerant circuit that produces the ideal circuit with some controlled noise applied at the input and output. 
The framework allows the direct composition of the statements, enabling versatile future applications. We illustrate this with two concrete applications. The first one concerns communication over a noisy channel with faulty encoding and decoding operations [Christandl and Müller-Hermes, 2024]. For communication codes with linear minimum distance, we construct fault-tolerant encoders and decoders for general noise (including coherent errors). For the weaker, but standard, model of local stochastic noise, we obtain fault-tolerant encoders and decoders for any communication code that can correct a constant fraction of random errors. In the second application, we use our result for a state preparation circuit within the construction of [Gottesman, 2014] to establish that fault-tolerant quantum computation for general noise can be achieved with constant space overhead.	翻訳日:2024-11-08 12:00:36 公開日:2024-11-01
# 相互学習 Reciprocal Learning ( http://arxiv.org/abs/2408.06257v2 ) ライセンス: Link先を確認	Julian Rodemann, Christoph Jansen, Georg Schollmeyer,	(参考訳) 我々は、幅広い機械学習アルゴリズムが1つのパラダイムの特定の例であることを示した。これらのインスタンスは、マルチアームのバンディットに関するアクティブな学習から、自己学習まで多岐にわたる。これらのアルゴリズムは、データからパラメータを学習するだけでなく、その逆も示す: 現在のモデルに適合する方法で、トレーニングデータを反復的に変更する。本稿では,これらのアルゴリズムの一般化として,決定論の言語を用いた相互学習を紹介する。これにより、どの条件で収束するかを研究できます。鍵となるのは、バナッハの不動点定理が適用されるような相互学習契約を保証することである。このようにして、相反学習アルゴリズムは損失関数の比較的穏やかな仮定の下で線形速度でほぼ最適モデルに収束する。我々はこれらの知見を解釈し、特定のアクティブラーニング、自己学習、およびバンディットのアルゴリズムに関連づけられたコースを提供する。 We demonstrate that a wide array of machine learning algorithms are specific instances of one single paradigm: reciprocal learning. These instances range from active learning over multi-armed bandits to self-training. We show that all these algorithms do not only learn parameters from data but also vice versa: They iteratively alter training data in a way that depends on the current model fit. We introduce reciprocal learning as a generalization of these algorithms using the language of decision theory. This allows us to study under what conditions they converge. The key is to guarantee that reciprocal learning contracts such that the Banach fixed-point theorem applies. In this way, we find that reciprocal learning algorithms converge at linear rates to an approximately optimal model under relatively mild assumptions on the loss function, if their predictions are probabilistic and the sample adaption is both non-greedy and either randomized or regularized. We interpret these findings and provide corollaries that relate them to specific active learning, self-training, and bandit algorithms.	翻訳日:2024-11-08 11:38:16 公開日:2024-11-01
# 相互学習 Reciprocal Learning ( http://arxiv.org/abs/2408.06257v3 ) ライセンス: Link先を確認	Julian Rodemann, Christoph Jansen, Georg Schollmeyer,	(参考訳) 我々は、幅広い機械学習アルゴリズムが1つのパラダイムの特定の例であることを示した。これらのインスタンスは、マルチアームのバンディットに関するアクティブな学習から、自己学習まで多岐にわたる。これらのアルゴリズムは、データからパラメータを学習するだけでなく、その逆も示す: 現在のモデルに適合する方法で、トレーニングデータを反復的に変更する。本稿では,これらのアルゴリズムの一般化として,決定論の言語を用いた相互学習を紹介する。これにより、どの条件で収束するかを研究できます。鍵となるのは、バナッハの不動点定理が適用されるような相互学習契約を保証することである。このようにして、相反学習アルゴリズムは損失関数の比較的穏やかな仮定の下で線形速度でほぼ最適モデルに収束する。我々はこれらの知見を解釈し、特定のアクティブラーニング、自己学習、およびバンディットのアルゴリズムに関連づけられたコースを提供する。 We demonstrate that a wide array of machine learning algorithms are specific instances of one single paradigm: reciprocal learning. These instances range from active learning over multi-armed bandits to self-training. We show that all these algorithms do not only learn parameters from data but also vice versa: They iteratively alter training data in a way that depends on the current model fit. We introduce reciprocal learning as a generalization of these algorithms using the language of decision theory. This allows us to study under what conditions they converge. The key is to guarantee that reciprocal learning contracts such that the Banach fixed-point theorem applies. In this way, we find that reciprocal learning algorithms converge at linear rates to an approximately optimal model under relatively mild assumptions on the loss function, if their predictions are probabilistic and the sample adaption is both non-greedy and either randomized or regularized. We interpret these findings and provide corollaries that relate them to specific active learning, self-training, and bandit algorithms.	翻訳日:2024-11-08 11:38:16 公開日:2024-11-01
# RED-CT:計算社会科学のためのエッジ分類器の訓練と展開にLLMラベルデータを使用するシステム設計手法 RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science ( http://arxiv.org/abs/2408.08217v2 ) ライセンス: Link先を確認	David Farr, Nico Manzonelli, Iain Cruickshank, Jevin West,	(参考訳) 大規模言語モデル(LLM)は、構造化されていない自然言語データを迅速に分析し分類する能力を向上した。しかしながら、コスト、ネットワーク制限、セキュリティ上の制約に関する懸念は、彼らの作業プロセスへの統合に問題を引き起こしている。本研究では,下流教師あり学習課題において,LLMを不完全なデータアノテータとして活用するためのシステム設計アプローチを採用し,分類性能の向上を目的とした新たなシステム介入対策を導入する。提案手法は, LLM生成ラベルを8つのテストのうち7つのテストで上回り, 多くの産業ユースケースにおいて, 専門的, 教師あり学習モデルの設計と展開にLLMを組み込むことの効果的な戦略を示す。 Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.	翻訳日:2024-11-08 07:29:14 公開日:2024-11-01
# シークエンシャルレコメンデーションのためのインスタンスワイズ LoRA を用いた言語モデルのカスタマイズ Customizing Language Models with Instance-wise LoRA for Sequential Recommendation ( http://arxiv.org/abs/2408.10159v2 ) ライセンス: Link先を確認	Xiaoyu Kong, Jiancan Wu, An Zhang, Leheng Sheng, Hui Lin, Xiang Wang, Xiangnan He,	(参考訳) 時系列レコメンデーションシステムは、過去のインタラクションを分析し、個別の好みに合わせてレコメンデーションを調整することで、ユーザの次の関心項目を予測する。知識理解と推論におけるLLM(Large Language Models)の強みを生かして、近年のアプローチでは、LLMを言語生成パラダイムを通じてシーケンシャルなレコメンデーションに応用している。これらの手法は,Low-Rank Adaptation (LoRA) モジュールを用いて,ユーザ動作シーケンスをLLM微調整のプロンプトに変換する。しかし、多様なユーザの行動にまたがるLoRAの均一な適用は、個々の変動を捉えるのに失敗することがある。これらの課題に対処するため、我々は、LoRAとMixture of Experts (MoE)フレームワークを統合するインスタンスワイドLoRA(iLoRA)を提案する。 iLoRAはさまざまな専門家の配列を生成し、それぞれがユーザの好みの特定の側面をキャプチャし、シーケンス表現ガイドゲート関数を導入している。このゲート関数は歴史的相互作用シーケンスを処理してリッチな表現を生成し、ゲーティングネットワークにカスタマイズされた専門家参加重みを出力させる。この調整されたアプローチは、ネガティブな伝達を軽減し、多様な行動パターンに動的に適応する。 3つのベンチマークデータセットに対する大規模な実験は、iLoRAの有効性を示し、ユーザ固有の好みをキャプチャし、レコメンデーションの精度を向上させる既存の方法と比較して、その優れたパフォーマンスを強調している。 Sequential recommendation systems predict a user's next item of interest by analyzing past interactions, aligning recommendations with individual preferences. Leveraging the strengths of Large Language Models (LLMs) in knowledge comprehension and reasoning, recent approaches have applied LLMs to sequential recommendation through language generation paradigms. These methods convert user behavior sequences into prompts for LLM fine-tuning, utilizing Low-Rank Adaptation (LoRA) modules to refine recommendations. However, the uniform application of LoRA across diverse user behaviors sometimes fails to capture individual variability, leading to suboptimal performance and negative transfer between disparate sequences. To address these challenges, we propose Instance-wise LoRA (iLoRA), integrating LoRA with the Mixture of Experts (MoE) framework. iLoRA creates a diverse array of experts, each capturing specific aspects of user preferences, and introduces a sequence representation guided gate function. 
This gate function processes historical interaction sequences to generate enriched representations, guiding the gating network to output customized expert participation weights. This tailored approach mitigates negative transfer and dynamically adjusts to diverse behavior patterns. Extensive experiments on three benchmark datasets demonstrate the effectiveness of iLoRA, highlighting its superior performance compared to existing methods in capturing user-specific preferences and improving recommendation accuracy.	翻訳日:2024-11-08 06:44:48 公開日:2024-11-01
# シークエンシャルレコメンデーションのためのインスタンスワイズ LoRA を用いた言語モデルのカスタマイズ Customizing Language Models with Instance-wise LoRA for Sequential Recommendation ( http://arxiv.org/abs/2408.10159v3 ) ライセンス: Link先を確認	Xiaoyu Kong, Jiancan Wu, An Zhang, Leheng Sheng, Hui Lin, Xiang Wang, Xiangnan He,	(参考訳) 時系列レコメンデーションシステムは、ユーザの過去のインタラクションに基づいて次のインタラクション項目を予測し、個別の好みに合わせてレコメンデーションを調整する。知識理解と推論において、LLM(Large Language Models)の強みを活用することで、最近のアプローチは、LLMをシーケンシャルなレコメンデーションに適用したいと考えている。一般的なパラダイムは、ユーザ動作シーケンスを命令データに変換し、Low-Rank Adaption (LoRA)のようなパラメータ効率の良い細調整(PEFT)手法でLLMを微調整する。しかし、多様なユーザの行動にまたがるLoRAの均一な適用は、個々の変動を捉えるには不十分であり、異なるシーケンス間の負の移動をもたらす。これらの課題に対処するために、インスタンスワイズLoRA(iLoRA)を提案する。逐次レコメンデーションタスクをマルチタスク学習の一形態として,LoRAとMixture of Experts(MoE)フレームワークを統合した。このアプローチは、さまざまな専門家にユーザ行動のさまざまな側面を捉えるように促します。さらに、各ユーザシーケンスごとにカスタマイズされた専門家参加ウェイトを生成するシーケンス表現ガイドゲート関数を導入し、インスタンスワイドレコメンデーションの動的パラメータ調整を可能にする。逐次レコメンデーションでは,iLoRA は基本 LoRA よりも11.4% の相対的改善を達成し,トレーニング可能なパラメータの相対的増加は 1% 未満である。 3つのベンチマークデータセットに対する大規模な実験は、iLoRAの有効性を示し、負の転送を緩和し、レコメンデーション精度を向上させる既存の方法に比べて優れたパフォーマンスを示している。私たちのデータとコードはhttps://github.com/AkaliKong/iLoRAで公開されています。 Sequential recommendation systems predict the next interaction item based on users' past interactions, aligning recommendations with individual preferences. Leveraging the strengths of Large Language Models (LLMs) in knowledge comprehension and reasoning, recent approaches are eager to apply LLMs to sequential recommendation. A common paradigm is converting user behavior sequences into instruction data, and fine-tuning the LLM with parameter-efficient fine-tuning (PEFT) methods like Low-Rank Adaption (LoRA). However, the uniform application of LoRA across diverse user behaviors is insufficient to capture individual variability, resulting in negative transfer between disparate sequences. To address these challenges, we propose Instance-wise LoRA (iLoRA). 
We innovatively treat the sequential recommendation task as a form of multi-task learning, integrating LoRA with the Mixture of Experts (MoE) framework. This approach encourages different experts to capture various aspects of user behavior. Additionally, we introduce a sequence representation guided gate function that generates customized expert participation weights for each user sequence, which allows dynamic parameter adjustment for instance-wise recommendations. In sequential recommendation, iLoRA achieves an average relative improvement of 11.4% over basic LoRA in the hit ratio metric, with less than a 1% relative increase in trainable parameters. Extensive experiments on three benchmark datasets demonstrate the effectiveness of iLoRA, highlighting its superior performance compared to existing methods in mitigating negative transfer and improving recommendation accuracy. Our data and code are available at https://github.com/AkaliKong/iLoRA.	翻訳日:2024-11-08 06:44:48 公開日:2024-11-01
# LongVILA:ロングビデオのためのロングコンテキストビジュアル言語モデルのスケーリング LongVILA: Scaling Long-Context Visual Language Models for Long Videos ( http://arxiv.org/abs/2408.10188v4 ) ライセンス: Link先を確認	Fuzhao Xue, Yukang Chen, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han,	(参考訳) ロングコンテクスト能力はマルチモーダル基礎モデル、特にロングビデオ理解において重要である。本稿では,LongVILAを提案する。LongVILAは長文ビジュアル言語モデルのためのフルスタックソリューションで,アルゴリズムとシステムを共同設計する。モデルトレーニングでは、既存のVLMをアップグレードして、2つの追加ステージ、すなわち、長期文脈拡張と長期ビデオ教師付き微調整を組み込むことにより、長いビデオ理解を支援する。しかし、長ビデオのトレーニングは計算的かつメモリ集約的である。我々は,長いビデオのトレーニングと推論を効率的に並列化し,勾配チェックポイントを使わずに256GPU上で2Mのコンテキスト長トレーニングを可能にする,長文マルチモーダルシーケンス並列(MM-SP)システムを提案する。 LongVILA は VILA の動画フレーム数を 8 から 2048 に効率的に拡張し、長いビデオキャプションスコアを 2.00 から 3.26 に改善し、6,000 フレーム (100 万枚以上のトークン) のビデオニードル・イン・ア・ヘイスタックで 99.8% の精度を実現した。 LongVILA-7B は VideoMME ベンチマークで強い精度を示す。加えて、MM-SPはリングスタイルのシーケンス並列性より2.1x - 5.7倍速く、ハイブリッドコンテキストとテンソル並列性を持つメガトロンより1.1x - 1.4倍速い。さらに、Hugging Face Transformersとシームレスに統合される。 Long-context capability is critical for multi-modal foundation models, especially for long video understanding. We introduce LongVILA, a full-stack solution for long-context visual-language models by co-designing the algorithm and system. For model training, we upgrade existing VLMs to support long video understanding by incorporating two additional stages, i.e., long context extension and long video supervised fine-tuning. However, training on long video is computationally and memory intensive. We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing. 
LongVILA efficiently extends the number of video frames of VILA from 8 to 2048, improving the long video captioning score from 2.00 to 3.26 (out of 5), achieving 99.8% accuracy in 6,000-frame (more than 1 million tokens) video needle-in-a-haystack. LongVILA-7B demonstrates strong accuracy on the VideoMME benchmark, i.e., 61.8% with subtitle. Besides, MM-SP is 2.1x - 5.7x faster than ring style sequence parallelism and 1.1x - 1.4x faster than Megatron with a hybrid context and tensor parallelism. Moreover, it seamlessly integrates with Hugging Face Transformers.	翻訳日:2024-11-08 06:44:48 公開日:2024-11-01
# LongVILA:ロングビデオのためのロングコンテキストビジュアル言語モデルのスケーリング LongVILA: Scaling Long-Context Visual Language Models for Long Videos ( http://arxiv.org/abs/2408.10188v5 ) ライセンス: Link先を確認	Fuzhao Xue, Yukang Chen, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han,	(参考訳) ロングコンテクスト能力はマルチモーダル基礎モデル、特にロングビデオ理解において重要である。本稿では,LongVILAを提案する。LongVILAは,アルゴリズムとシステムの共同設計により,長文ビジュアル言語モデルのためのフルスタックソリューションである。モデルトレーニングでは,既存のVLMをアップグレードして,2つの追加ステージ,すなわち長期文脈拡張と長期ビデオ教師付き微調整を組み込むことで,長時間ビデオ理解を支援する。しかし、長ビデオのトレーニングは計算的かつメモリ集約的である。我々は,長いビデオのトレーニングと推論を効率的に並列化し,勾配チェックポイントを使わずに256GPU上で2Mのコンテキスト長トレーニングを可能にする,長文マルチモーダルシーケンス並列(MM-SP)システムを提案する。 LongVILA は VILA の動画フレーム数を 8 から 2048 に効率的に拡張し、長いビデオキャプションスコアを 2.00 から 3.26 に改善し、6,000 フレーム (100 万枚以上のトークン) のビデオニードル・イン・ア・ヘイスタックで 99.8% の精度を実現した。 LongVILA-7B は VideoMME ベンチマークで強い精度を示す。加えて、MM-SPはリングスタイルのシーケンス並列性より2.1x - 5.7倍速く、ハイブリッドコンテキストとテンソル並列性を持つメガトロンより1.1x - 1.4倍速い。さらに、Hugging Face Transformersとシームレスに統合される。 Long-context capability is critical for multi-modal foundation models, especially for long video understanding. We introduce LongVILA, a full-stack solution for long-context visual-language models by co-designing the algorithm and system. For model training, we upgrade existing VLMs to support long video understanding by incorporating two additional stages, i.e., long context extension and long video supervised fine-tuning. However, training on long video is computationally and memory intensive. We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing. 
LongVILA efficiently extends the number of video frames of VILA from 8 to 2048, improving the long video captioning score from 2.00 to 3.26 (out of 5), achieving 99.8% accuracy in 6,000-frame (more than 1 million tokens) video needle-in-a-haystack. LongVILA-7B demonstrates strong accuracy on the VideoMME benchmark, i.e., 61.8% with subtitle. Besides, MM-SP is 2.1x - 5.7x faster than ring style sequence parallelism and 1.1x - 1.4x faster than Megatron with a hybrid context and tensor parallelism. Moreover, it seamlessly integrates with Hugging Face Transformers.	翻訳日:2024-11-08 06:44:48 公開日:2024-11-01
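To make the scale in the LongVILA abstract concrete, the sketch below splits a token sequence evenly across ranks, which is the basic idea behind sequence parallelism. It is only an even split under stated assumptions: `shard_sequence` is a hypothetical helper, "2M" is read as 2 × 1024 × 1024 tokens, and the actual MM-SP system may balance image and text tokens quite differently.

```python
def shard_sequence(num_tokens, world_size):
    """Return one (start, end) token range per rank, as evenly as possible.
    The first `num_tokens % world_size` ranks receive one extra token."""
    base, extra = divmod(num_tokens, world_size)
    ranges = []
    start = 0
    for rank in range(world_size):
        length = base + (1 if rank < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges

# Assumed reading of "2M context on 256 GPUs": 2 * 1024 * 1024 tokens.
ranges = shard_sequence(2 * 1024 * 1024, 256)
tokens_per_rank = ranges[0][1] - ranges[0][0]
```

Under that reading, each of the 256 ranks would hold 8,192 tokens of the sequence, which suggests why such a length can fit in memory without gradient checkpointing.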
# 食品融合 : 拡散モデルによる食品画像合成の新しいアプローチ Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models ( http://arxiv.org/abs/2408.14135v2 ) ライセンス: Link先を確認	Chaohua Shi, Xuan Wang, Si Shi, Xule Wang, Mingrui Zhu, Nannan Wang, Xinbo Gao,	(参考訳) 食品画像の構成には、既存の食器画像と背景画像を用いて自然な新しいイメージを合成する必要があるが、拡散モデルは画像生成に大きな進歩をもたらし、将来性のある結果をもたらすエンドツーエンドアーキテクチャの構築を可能にしている。しかし、既存の拡散モデルでは、複数の画像からの情報処理と融合が困難であり、高品質な公開データセットへのアクセスが欠如しているため、食品画像合成における拡散モデルの適用が妨げられる。本稿では,22,000個の前景,背景,地上の真理3値からなる大規模で高品質な食品画像合成データセットFC22kを紹介する。さらに,事前学習した拡散モデルの能力を生かした新しい食品画像合成手法であるFoodfusionを提案し,前景や背景情報を処理・統合するためのFusion Moduleを組み込んだ。この融合した情報は、デノイングUNetのクロスアテンション層にグローバルな構造情報をマージすることにより、前景の特徴と背景構造とを整合させる。背景のコンテンツと構造をさらに強化するため、コンテンツ構造制御モジュールも統合する。提案手法の有効性と拡張性を示す実験を行った。 Food image composition requires the use of existing dish images and background images to synthesize a natural new image, while diffusion models have made significant advancements in image generation, enabling the construction of end-to-end architectures that yield promising results. However, existing diffusion models face challenges in processing and fusing information from multiple images and lack access to high-quality publicly available datasets, which prevents the application of diffusion models in food image composition. In this paper, we introduce a large-scale, high-quality food image composite dataset, FC22k, which comprises 22,000 foreground, background, and ground truth ternary image pairs. Additionally, we propose a novel food image composition method, Foodfusion, which leverages the capabilities of the pre-trained diffusion models and incorporates a Fusion Module for processing and integrating foreground and background information. This fused information aligns the foreground features with the background structure by merging the global structural information at the cross-attention layer of the denoising UNet. To further enhance the content and structure of the background, we also integrate a Content-Structure Control Module. 
Extensive experiments demonstrate the effectiveness and scalability of our proposed method.	翻訳日:2024-11-08 05:04:12 公開日:2024-11-01
# 空間認識拡散モデルによる大域的電場再構成とスパース観測 Spatially-Aware Diffusion Models with Cross-Attention for Global Field Reconstruction with Sparse Observations ( http://arxiv.org/abs/2409.00230v2 ) ライセンス: Link先を確認	Yilin Zhuang, Sibo Cheng, Karthik Duraisamy,	(参考訳) 拡散モデルは、複雑な分布を表現し、不確実性を組み込む能力に注目されており、ノイズや不完全データの存在下での堅牢な予測に理想的である。本研究では,部分的な観測から完全な空間場を推定するフィールド再構成タスクにおいて,スコアに基づく拡散モデルを開発し,拡張する。本研究では,観測された領域と観測されていない領域間のトラクタブルマッピングを構築するために,スパース観測と補間フィールドの学習可能な統合を帰納バイアスとして利用する条件符号化手法を提案する。センシング表現の洗練と時間次元の未解決により、任意の移動センサを処理し、フィールドを効果的に再構築することができる。さらに,静的および時間依存PDEにおける決定論的補間法に対するアプローチの総合的なベンチマークを行う。本研究は, 様々なサンプリングハイパーパラメータ, ノイズレベル, コンディショニング手法における性能評価のための, 強いベースラインのギャップに対処する試みである。提案手法は,ノイズのないデータに優れるが,クロスアテンションを持つ拡散モデルと条件エンコーディングにより,雑音条件下での他の手法よりも優れることを示す。さらに、拡散モデルと決定論的手法の両方が、定常問題に対する精度と計算コストの数値的アプローチを超越している。また, アンサンブルサンプリングを用いた共分散に基づく修正作業において, モデルが再現可能かどうかを把握し, 融合結果の精度を向上させる能力を示す。 Diffusion models have gained attention for their ability to represent complex distributions and incorporate uncertainty, making them ideal for robust predictions in the presence of noisy or incomplete data. In this study, we develop and enhance score-based diffusion models in field reconstruction tasks, where the goal is to estimate complete spatial fields from partial observations. We introduce a condition encoding approach to construct a tractable mapping between observed and unobserved regions using a learnable integration of sparse observations and interpolated fields as an inductive bias. With refined sensing representations and an unraveled temporal dimension, our method can handle arbitrary moving sensors and effectively reconstruct fields. Furthermore, we conduct a comprehensive benchmark of our approach against a deterministic interpolation-based method across various static and time-dependent PDEs. Our study attempts to address the gap in strong baselines for evaluating performance across varying sampling hyperparameters, noise levels, and conditioning methods. 
Our results show that diffusion models with cross-attention and the proposed conditional encoding generally outperform other methods under noisy conditions, although the deterministic method excels with noiseless data. Additionally, both the diffusion models and the deterministic method surpass the numerical approach in accuracy and computational cost for the steady problem. We also demonstrate the ability of the model to capture possible reconstructions and improve the accuracy of fused results in covariance-based correction tasks using ensemble sampling.	翻訳日:2024-11-08 03:46:25 公開日:2024-11-01
# 添加物製造におけるディジタルツイン : システムレビュー Digital Twins in Additive Manufacturing: A Systematic Review ( http://arxiv.org/abs/2409.00877v2 ) ライセンス: Link先を確認	Md Manjurul Ahsan, Yingtao Liu, Shivakumar Raman, Zahed Siddique,	(参考訳) Digital Twins (DT) は、AMマシンの物理的コンポーネントの仮想レプリカを作成する能力によって、リアルタイム生産監視に役立っているため、アダプティブマニュファクチャリング (AM) で人気が高まっている。機械学習(ML)、拡張現実(AR)、シミュレーションベースのモデルといった高度な技術は、製造プロセスにおいてインテリジェントで適応可能なDTを開発する上で重要な役割を果たします。しかし、スケーラビリティ、高品質なデータの統合、DT開発におけるリアルタイムアプリケーションに必要な計算能力について疑問が残る。 AMにおけるDTの現在の状態を理解することは、これらの課題に対処し、AMプロセスを進める上でそのポテンシャルを完全に活用するために不可欠である。この機会を考慮して、本研究は以下の4つの研究課題に対処することで、AMにおけるDTの総合的な概要を提供することを目的としている。 2)最近のDTの開発と実装について教えてください。 (3)プロセス改善とハイブリッド製造にDTはどのように使われているか? (4) DTは産業用 4.0 技術とどのように統合されているか? 現在の応用と技術について議論することで、AMやDTの研究者や実践者に対して、より深い理解と今後の研究の方向性を提供することを目指している。 Digital Twins (DTs) are becoming popular in Additive Manufacturing (AM) due to their ability to create virtual replicas of physical components of AM machines, which helps in real-time production monitoring. Advanced techniques such as Machine Learning (ML), Augmented Reality (AR), and simulation-based models play key roles in developing intelligent and adaptable DTs in manufacturing processes. However, questions remain regarding scalability, the integration of high-quality data, and the computational power required for real-time applications in developing DTs. Understanding the current state of DTs in AM is essential to address these challenges and fully utilize their potential in advancing AM processes. Considering this opportunity, this work aims to provide a comprehensive overview of DTs in AM by addressing the following four research questions: (1) What are the key types of DTs used in AM and their specific applications? (2) What are the recent developments and implementations of DTs? (3) How are DTs employed in process improvement and hybrid manufacturing? (4) How are DTs integrated with Industry 4.0 technologies? 
By discussing current applications and techniques, we aim to offer a better understanding and potential future research directions for researchers and practitioners in AM and DTs.	翻訳日:2024-11-08 03:35:26 公開日:2024-11-01
# ニューラルネットワークを用いた高精度実空間電子密度 Highly Accurate Real-space Electron Densities with Neural Networks ( http://arxiv.org/abs/2409.01306v2 ) ライセンス: Link先を確認	Lixue Cheng, P. Bernát Szabó, Zeno Schätzle, Derk P. Kooi, Jonas Köhler, Klaas J. H. Giesbertz, Frank Noé, Jan Hermann, Paola Gori-Giorgi, Adam Foster,	(参考訳) 量子化学における変分ab-initio法は、波動関数への直接アクセスを提供する他の方法の中でも際立っている。これは原則として、エネルギー以外の他の観測可能な興味の抽出を可能にするが、実際、この抽出は技術的に困難であり、計算的に非現実的であることが多い。ここでは,電子密度を量子化学において観測可能な中心となるものとみなし,その密度を既知の漸近特性を捉えるニューラルネットワークを用いて表現し,スコアマッチングとノイズコントラスト推定により波動関数からトレーニングすることにより,実空間多電子波関数から正確な密度を求める新しい手法を提案する。深層学習型 ansätze (深部QMC) を用いた変分量子モンテカルロを用いて、基底セット誤差のない高精度な波動関数を得るとともに、新しい手法を用いて、双極子モーメント、原子間力、接触密度、その他の密度に基づく特性を計算して、対応する正確な電子密度を求める。 Variational ab-initio methods in quantum chemistry stand out among other methods in providing direct access to the wave function. This allows in principle straightforward extraction of any other observable of interest, besides the energy, but in practice this extraction is often technically difficult and computationally impractical. Here, we consider the electron density as a central observable in quantum chemistry and introduce a novel method to obtain accurate densities from real-space many-electron wave functions by representing the density with a neural network that captures known asymptotic properties and is trained from the wave function by score matching and noise-contrastive estimation. We use variational quantum Monte Carlo with deep-learning ansätze (deep QMC) to obtain highly accurate wave functions free of basis set errors, and from them, using our novel method, correspondingly accurate electron densities, which we demonstrate by calculating dipole moments, nuclear forces, contact densities, and other density-based properties.	翻訳日:2024-11-08 03:23:46 公開日:2024-11-01
# カスタム環境多目的強化学習のための効率的な逆関数探索器としての大規模言語モデル Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning ( http://arxiv.org/abs/2409.02428v2 ) ライセンス: Link先を確認	Guanwen Xie, Jingzehua Xu, Yiyuan Yang, Yimian Ding, Shuai Zhang,	(参考訳) 複雑なカスタム環境と複数の要件を持つ強化学習(RL)タスクにおいて,報酬関数の効果的な設計と改善を実現することは,大きな課題となる。本稿では,LLMを用いた効率的な報酬関数探索機能であるERFSLを提案する。具体的には、各数値的明示的なユーザ要求に対して報酬成分を生成し、報酬批評家を用いて正しいコード形式を特定する。次に、LLMは、トレーニングログアナライザによって提供されるコンテキストに基づいて、遺伝的アルゴリズムと同様に、方向変異や交叉戦略を柔軟に適用することにより、報酬成分に重みを割り当て、そのバランスをとるとともに、曖昧さや冗長な調整なしに重みを反復的に調整する。このフレームワークを水中データ収集RLタスクに適用し、直接のフィードバックや報酬の例(ゼロショット学習)を使わずに適用した。報酬批評家は、各要求に対して1つのフィードバックインスタンスで報酬コードを修正し、修正不可能なエラーを効果的に防止する。ウェイトの初期化は、ウェイト探索を必要とせず、パレート解集合内の異なる報酬関数の取得を可能にする。重量が500倍の場合でも、平均してユーザ要求を満たすのに5.2回しか必要ありません。 ERFSLは、GPT-4o miniを利用するほとんどのプロンプトともうまく機能する。 Achieving the effective design and improvement of reward functions in reinforcement learning (RL) tasks with complex custom environments and multiple requirements presents considerable challenges. In this paper, we propose ERFSL, an efficient reward function searcher using LLMs, which enables LLMs to be effective white-box searchers and highlights their advanced semantic understanding capabilities. Specifically, we generate reward components for each numerically explicit user requirement and employ a reward critic to identify the correct code form. Then, LLMs assign weights to the reward components to balance their values and iteratively adjust the weights without ambiguity and redundant adjustments by flexibly adopting directional mutation and crossover strategies, similar to genetic algorithms, based on the context provided by the training log analyzer. We applied the framework to an underwater data collection RL task without direct human feedback or reward examples (zero-shot learning). The reward critic successfully corrects the reward code with only one feedback instance for each requirement, effectively preventing unrectifiable errors. 
The initialization of weights enables the acquisition of different reward functions within the Pareto solution set without the need for weight search. Even in cases where a weight is 500 times off, on average, only 5.2 iterations are needed to meet user requirements. The ERFSL also works well with most prompts utilizing GPT-4o mini, as we decompose the weight searching process to reduce the requirement for numerical and long-context understanding capabilities.	翻訳日:2024-11-07 23:45:04 公開日:2024-11-01
# カスタム環境多目的強化学習のための効率的な逆関数探索器としての大規模言語モデル Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning ( http://arxiv.org/abs/2409.02428v3 ) ライセンス: Link先を確認	Guanwen Xie, Jingzehua Xu, Yiyuan Yang, Yimian Ding, Shuai Zhang,	(参考訳) 複雑なカスタム環境と複数の要件を持つ強化学習(RL)タスクにおいて,報酬関数の効果的な設計と改善を実現することは,大きな課題となる。本稿では,LLMを用いた効率的な報酬関数探索機能であるERFSLを提案する。具体的には、各数値的明示的なユーザ要求に対して報酬成分を生成し、報酬批評家を用いて正しいコード形式を特定する。次に、LLMは、トレーニングログアナライザによって提供されるコンテキストに基づいて、遺伝的アルゴリズムと同様に、方向変異や交叉戦略を柔軟に適用することにより、報酬成分に重みを割り当て、そのバランスをとるとともに、曖昧さや冗長な調整なしに重みを反復的に調整する。このフレームワークを水中データ収集RLタスクに適用し、直接のフィードバックや報酬の例(ゼロショット学習)を使わずに適用した。報酬批評家は、各要求に対して1つのフィードバックインスタンスで報酬コードを修正し、修正不可能なエラーを効果的に防止する。ウェイトの初期化は、ウェイト探索を必要とせず、パレート解集合内の異なる報酬関数の取得を可能にする。重量が500倍の場合でも、平均してユーザ要求を満たすのに5.2回しか必要ありません。 ERFSLは、GPT-4o miniを利用するほとんどのプロンプトともうまく機能する。 Achieving the effective design and improvement of reward functions in reinforcement learning (RL) tasks with complex custom environments and multiple requirements presents considerable challenges. In this paper, we propose ERFSL, an efficient reward function searcher using LLMs, which enables LLMs to be effective white-box searchers and highlights their advanced semantic understanding capabilities. Specifically, we generate reward components for each numerically explicit user requirement and employ a reward critic to identify the correct code form. Then, LLMs assign weights to the reward components to balance their values and iteratively adjust the weights without ambiguity and redundant adjustments by flexibly adopting directional mutation and crossover strategies, similar to genetic algorithms, based on the context provided by the training log analyzer. We applied the framework to an underwater data collection RL task without direct human feedback or reward examples (zero-shot learning). The reward critic successfully corrects the reward code with only one feedback instance for each requirement, effectively preventing unrectifiable errors. 
The initialization of weights enables the acquisition of different reward functions within the Pareto solution set without the need for weight search. Even in cases where a weight is 500 times off, on average, only 5.2 iterations are needed to meet user requirements. The ERFSL also works well with most prompts utilizing GPT-4o mini, as we decompose the weight searching process to reduce the requirement for numerical and long-context understanding capabilities.	翻訳日:2024-11-07 23:45:04 公開日:2024-11-01
# AmazonのアクティブファイアモデリングにおけるLSTMとGRUを用いたニューラルネットワーク Neural Networks with LSTM and GRU in Modeling Active Fires in the Amazon ( http://arxiv.org/abs/2409.02681v4 ) ライセンス: Link先を確認	Ramon Tavares, Ricardo Olinda,	(参考訳) 本研究は,ブラジルのアマゾンにあるAqua_M-T衛星によって検出された活動点の歴史的時系列をモデル化し,予測するための包括的方法論を提案する。このアプローチでは、Long Short-Term Memory(LSTM)とGated Recurrent Unit(GRU)アーキテクチャを組み合わせた混合リカレントニューラルネットワーク(RNN)モデルを採用して、毎日検出されたアクティブファイアスポットの月次蓄積を予測する。データ分析の結果、一貫した季節性を示し、年間最大値と最低値が毎年同じ期間に繰り返される傾向があった。主な目的は、予測が機械学習技術によってこの固有の季節を捉えているかどうかを検証することである。この手法は,2種の種子を用いたクロスバリデーションを用いたデータ準備,モデル構成,トレーニングを慎重に行い,両種子の試験および検証セットの両方にデータを一般化することを保証した。その結果,LSTMモデルとGRUモデルを組み合わせることで予測性能が向上し,複雑な時間パターンを捕捉し,観測時系列をモデル化する効果が示された。本研究は, 環境モニタリングにおける深層学習技術の適用, 特にアクティブファイアスポットの予測に大きく貢献する。提案手法は,他の時系列予測課題への適応の可能性を強調し,機械学習の研究開発と自然現象の予測に新たな機会を開く。キーワード:時系列予測、リカレントニューラルネットワーク、ディープラーニング。 This study presents a comprehensive methodology for modeling and forecasting the historical time series of active fire spots detected by the AQUA_M-T satellite in the Amazon, Brazil. The approach employs a mixed Recurrent Neural Network (RNN) model, combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures to predict the monthly accumulations of daily detected active fire spots. Data analysis revealed a consistent seasonality over time, with annual maximum and minimum values tending to repeat at the same periods each year. The primary objective is to verify whether the forecasts capture this inherent seasonality through machine learning techniques. The methodology involved careful data preparation, model configuration, and training using cross-validation with two seeds, ensuring that the data generalizes well to both the test and validation sets for both seeds. The results indicate that the combined LSTM and GRU model delivers excellent forecasting performance, demonstrating its effectiveness in capturing complex temporal patterns and modeling the observed time series. 
This research significantly contributes to the application of deep learning techniques in environmental monitoring, specifically in forecasting active fire spots. The proposed approach highlights the potential for adaptation to other time series forecasting challenges, opening new opportunities for research and development in machine learning and prediction of natural phenomena. Keywords: Time Series Forecasting; Recurrent Neural Networks; Deep Learning.	翻訳日:2024-11-07 23:34:03 公開日:2024-11-01
# AmazonのアクティブファイアモデリングにおけるLSTMとGRUを用いたニューラルネットワーク Neural Networks with LSTM and GRU in Modeling Active Fires in the Amazon ( http://arxiv.org/abs/2409.02681v5 ) ライセンス: Link先を確認	Ramon Tavares, Ricardo Olinda,	(参考訳) 本研究は,ブラジルのアマゾンにあるAqua_M-T衛星によって検出された活動点の歴史的時系列をモデル化し,予測するための包括的方法論を提案する。このアプローチでは、Long Short-Term Memory(LSTM)とGated Recurrent Unit(GRU)アーキテクチャを組み合わせた混合リカレントニューラルネットワーク(RNN)モデルを採用して、毎日検出されたアクティブファイアスポットの月次蓄積を予測する。データ分析の結果、一貫した季節性を示し、年間最大値と最低値が毎年同じ期間に繰り返される傾向があった。主な目的は、予測が機械学習技術によってこの固有の季節を捉えているかどうかを検証することである。この手法は,2種の種子を用いたクロスバリデーションを用いたデータ準備,モデル構成,トレーニングを慎重に行い,両種子の試験および検証セットの両方にデータを一般化することを保証した。その結果,LSTMモデルとGRUモデルを組み合わせることで予測性能が向上し,複雑な時間パターンを捕捉し,観測時系列をモデル化する効果が示された。本研究は, 環境モニタリングにおける深層学習技術の適用, 特にアクティブファイアスポットの予測に大きく貢献する。提案手法は,他の時系列予測課題への適応の可能性を強調し,機械学習の研究開発と自然現象の予測に新たな機会を開く。キーワード:時系列予測、リカレントニューラルネットワーク、ディープラーニング。 This study presents a comprehensive methodology for modeling and forecasting the historical time series of active fire spots detected by the AQUA_M-T satellite in the Amazon, Brazil. The approach employs a mixed Recurrent Neural Network (RNN) model, combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures to predict the monthly accumulations of daily detected active fire spots. Data analysis revealed a consistent seasonality over time, with annual maximum and minimum values tending to repeat at the same periods each year. The primary objective is to verify whether the forecasts capture this inherent seasonality through machine learning techniques. The methodology involved careful data preparation, model configuration, and training using cross-validation with two seeds, ensuring that the data generalizes well to both the test and validation sets for both seeds. The results indicate that the combined LSTM and GRU model delivers excellent forecasting performance, demonstrating its effectiveness in capturing complex temporal patterns and modeling the observed time series. 
This research significantly contributes to the application of deep learning techniques in environmental monitoring, specifically in forecasting active fire spots. The proposed approach highlights the potential for adaptation to other time series forecasting challenges, opening new opportunities for research and development in machine learning and prediction of natural phenomena. Keywords: Time Series Forecasting; Recurrent Neural Networks; Deep Learning.	翻訳日:2024-11-07 23:34:03 公開日:2024-11-01
# AmazonのアクティブファイアモデリングにおけるLSTMとGRUを用いたニューラルネットワーク Neural Networks with LSTM and GRU in Modeling Active Fires in the Amazon ( http://arxiv.org/abs/2409.02681v6 ) ライセンス: Link先を確認	Ramon Tavares, Ricardo Olinda,	(参考訳) 本研究は,ブラジルのアマゾンにあるAqua_M-T衛星によって検出された活動点の歴史的時系列をモデル化し,予測するための包括的方法論を提案する。このアプローチでは、Long Short-Term Memory(LSTM)とGated Recurrent Unit(GRU)アーキテクチャを組み合わせた混合リカレントニューラルネットワーク(RNN)モデルを採用して、毎日検出されたアクティブファイアスポットの月次蓄積を予測する。データ分析の結果、一貫した季節性を示し、年間最大値と最低値が毎年同じ期間に繰り返される傾向があった。主な目的は、予測が機械学習技術によってこの固有の季節を捉えているかどうかを検証することである。この手法は,2種の種子を用いたクロスバリデーションを用いたデータ準備,モデル構成,トレーニングを慎重に行い,両種子の試験および検証セットの両方にデータを一般化することを保証した。その結果,LSTMモデルとGRUモデルを組み合わせることで予測性能が向上し,複雑な時間パターンを捕捉し,観測時系列をモデル化する効果が示された。本研究は, 環境モニタリングにおける深層学習技術の適用, 特にアクティブファイアスポットの予測に大きく貢献する。提案手法は,他の時系列予測課題への適応の可能性を強調し,機械学習の研究開発と自然現象の予測に新たな機会を開く。キーワード:時系列予測、リカレントニューラルネットワーク、ディープラーニング。 This study presents a comprehensive methodology for modeling and forecasting the historical time series of active fire spots detected by the AQUA_M-T satellite in the Amazon, Brazil. The approach employs a mixed Recurrent Neural Network (RNN) model, combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures to predict the monthly accumulations of daily detected active fire spots. Data analysis revealed a consistent seasonality over time, with annual maximum and minimum values tending to repeat at the same periods each year. The primary objective is to verify whether the forecasts capture this inherent seasonality through machine learning techniques. The methodology involved careful data preparation, model configuration, and training using cross-validation with two seeds, ensuring that the data generalizes well to both the test and validation sets for both seeds. The results indicate that the combined LSTM and GRU model delivers excellent forecasting performance, demonstrating its effectiveness in capturing complex temporal patterns and modeling the observed time series. 
This research significantly contributes to the application of deep learning techniques in environmental monitoring, specifically in forecasting active fire spots. The proposed approach highlights the potential for adaptation to other time series forecasting challenges, opening new opportunities for research and development in machine learning and prediction of natural phenomena. Keywords: Time Series Forecasting; Recurrent Neural Networks; Deep Learning.	翻訳日:2024-11-07 23:34:03 公開日:2024-11-01
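The data preparation step described in the abstract above (monthly accumulations of daily detections, then forecasting from past values) might look roughly like the following sketch. The function names and the toy values are hypothetical, and the recurrent LSTM/GRU model itself is deliberately left out; only the aggregation and the windowing that would feed such a model are shown.

```python
from collections import defaultdict
from datetime import date

def monthly_totals(daily_counts):
    """Sum daily detections into (year, month) totals, sorted chronologically.
    `daily_counts` is an iterable of (datetime.date, int) pairs."""
    totals = defaultdict(int)
    for day, count in daily_counts:
        totals[(day.year, day.month)] += count
    return sorted(totals.items())

def sliding_windows(series, window):
    """Build (inputs, target) pairs: `window` past values predict the next one."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

# Toy daily detections (made-up numbers, for illustration only).
daily = [(date(2020, 1, 1), 3), (date(2020, 1, 15), 2), (date(2020, 2, 1), 5)]
monthly = monthly_totals(daily)
pairs = sliding_windows([1, 2, 3, 4], 2)
```

With monthly data, a window of 12 would be a natural choice for exposing the annual seasonality the abstract mentions, although the window length used by the authors is not stated here.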
# CMM-Math:大規模マルチモーダルモデルの数学推論の評価と拡張を目的とした中国のマルチモーダル数学データセット CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models ( http://arxiv.org/abs/2409.02834v3 ) ライセンス: Link先を確認	Wentao Liu, Qianjun Pan, Yi Zhang, Zhuo Liu, Ji Wu, Jie Zhou, Aimin Zhou, Qin Chen, Bo Jiang, Liang He,	(参考訳) 大規模言語モデル(LLM)は、人間の知能の基礎となる数学的推論において有望な結果を得た。従来の研究は、テキスト数学推論データセット(例えば、MATH、GSM8K)に基づくLLMの性能改善と測定に重点を置いていた。最近、数人の研究者が大規模なマルチモーダルモデル(LMM)の有効性を評価するために、英語のマルチモーダル数学データセット(例えば、MATHVISTA、MATH-V)をリリースした。本稿では,LMMの数学的推論を評価するために,ベンチマークやトレーニング部品を含む中国のマルチモーダル数学(CMM-Math)データセットをリリースする。 CMM-Mathには28,000以上の高品質なサンプルが含まれており、中国の小学校から高校まで、12段階の詳細なソリューションを備えた様々な問題タイプ(例えば、多重選択、ブランクの補充など)が特徴である。特に、視覚的コンテキストは質問や意見の中に存在し、このデータセットをより困難にします。包括的分析により、CMM-Mathデータセット上の最先端のLMMが課題に直面しており、LMM開発におけるさらなる改善の必要性を強調している。また,複数画像とテキストセグメントの混合入力による問題に対処するマルチモーダル数学的LMM(Math-LMM)を提案する。基礎的な事前学習、基礎的な微調整、数学的微調整を含む3つの段階を用いてモデルを訓練する。より広範な実験により,本モデルは3つのマルチモーダルな数学的データセット上でのSOTA LMMと比較することにより,数学推論性能を効果的に向上することが示された。 Large language models (LLMs) have obtained promising results in mathematical reasoning, which is a foundational skill for human intelligence. Most previous studies focus on improving and measuring the performance of LLMs based on textual math reasoning datasets (e.g., MATH, GSM8K). Recently, a few researchers have released English multimodal math datasets (e.g., MATHVISTA and MATH-V) to evaluate the effectiveness of large multimodal models (LMMs). In this paper, we release a Chinese multimodal math (CMM-Math) dataset, including benchmark and training parts, to evaluate and enhance the mathematical reasoning of LMMs. CMM-Math contains over 28,000 high-quality samples, featuring a variety of problem types (e.g., multiple-choice, fill-in-the-blank, and so on) with detailed solutions across 12 grade levels from elementary to high school in China. Specifically, the visual context may be present in the questions or opinions, which makes this dataset more challenging. 
Through comprehensive analysis, we discover that state-of-the-art LMMs on the CMM-Math dataset face challenges, emphasizing the necessity for further improvements in LMM development. We also propose a Multimodal Mathematical LMM (Math-LMM) to handle the problems with mixed input of multiple images and text segments. We train our model using three stages, including foundational pre-training, foundational fine-tuning, and mathematical fine-tuning. The extensive experiments indicate that our model effectively improves math reasoning performance by comparing it with the SOTA LMMs over three multimodal mathematical datasets.	翻訳日:2024-11-07 23:34:03 公開日:2024-11-01
# LLMによる競争性市場行動に関する実験的研究 An Experimental Study of Competitive Market Behavior Through LLMs ( http://arxiv.org/abs/2409.08357v2 ) ライセンス: Link先を確認	Jingru Jia, Zehua Yuan,	(参考訳) 本研究では,市場実験を行うための大規模言語モデル (LLM) の可能性について検討し,競争市場のダイナミクスを理解する能力を理解することを目的とした。我々は,市場エージェントの行動を制御された実験環境でモデル化し,競争均衡に向けて収束する能力を評価する。その結果,人間の取引行動に特徴的な動的意思決定プロセスの複製において,LLMが直面する課題が明らかになった。人間とは異なり、LLMは市場均衡を達成する能力に欠けていた。この研究は、LLMがスケーラブルで再現可能な市場シミュレーションのための貴重なツールを提供する一方で、現在の制限は市場行動の複雑さを完全に捉えるためにさらなる進歩を必要としていることを実証している。動的学習能力を高め、行動経済学の要素を取り入れた将来の仕事は、経済領域におけるLLMの有効性を改善し、市場のダイナミクスに関する新たな洞察を提供し、経済政策の洗練に寄与する。 This study explores the potential of large language models (LLMs) to conduct market experiments, aiming to understand their capability to comprehend competitive market dynamics. We model the behavior of market agents in a controlled experimental setting, assessing their ability to converge toward competitive equilibria. The results reveal the challenges current LLMs face in replicating the dynamic decision-making processes characteristic of human trading behavior. Unlike humans, LLMs lacked the capacity to achieve market equilibrium. The research demonstrates that while LLMs provide a valuable tool for scalable and reproducible market simulations, their current limitations necessitate further advancements to fully capture the complexities of market behavior. Future work that enhances dynamic learning capabilities and incorporates elements of behavioral economics could improve the effectiveness of LLMs in the economic domain, providing new insights into market dynamics and aiding in the refinement of economic policies.	翻訳日:2024-11-07 21:20:36 公開日:2024-11-01
# 差分プライバシーのためのセキュアサンプリングプロトコルのベンチマーク Benchmarking Secure Sampling Protocols for Differential Privacy ( http://arxiv.org/abs/2409.10667v2 ) ライセンス: Link先を確認	Yucheng Fu, Tianhao Wang,	(参考訳) 差分プライバシー(DP)は、集約されたデータからの情報漏洩を制限することにより、個人に対してプライバシー保護を提供するために広く利用されている。 DPの2つのよく知られたモデルは、中心モデルと局所モデルである。前者はデータアグリゲーションに信頼できるサーバを必要とし、後者は個人がノイズを加えることを必要とし、集約された結果の有用性を著しく低下させる。近年,分散環境でのセキュアなマルチパーティ計算(MPC)によるDPの実現,すなわち,特定のセキュリティ前提の下では,中央モデルに匹敵するユーティリティを持つ分散モデルの実現が提案されている。分散モデルにおけるDPを実現する一つの課題は、MPCで効率的にノイズをサンプリングすることである。多くの安全なサンプリング法が提案されているが、それらは異なるセキュリティ仮定と独立した理論解析を持っている。パフォーマンスを計測し比較する実験的な評価が不足しています。我々は、既存のサンプリングプロトコルをMPCでベンチマークし、その効率を総合的に測定することで、このギャップを埋める。まず,これらのサンプリングプロトコルの基礎となる手法の分類について述べる。第二に、広く使われている分散ノイズ発生プロトコルを拡張して、ビザンチン攻撃に対する耐性を高める。第3に、離散サンプリングプロトコルを実装し、セキュリティ設定を公平に比較する。そして、その効率性と有用性を研究するために、広範囲な評価を行う。 Differential privacy (DP) is widely employed to provide privacy protection for individuals by limiting information leakage from the aggregated data. Two well-known models of DP are the central model and the local model. The former requires a trustworthy server for data aggregation, while the latter requires individuals to add noise, significantly decreasing the utility of aggregated results. Recently, many studies have proposed to achieve DP with Secure Multi-party Computation (MPC) in distributed settings, namely, the distributed model, which has utility comparable to central model while, under specific security assumptions, preventing parties from obtaining others' information. One challenge of realizing DP in distributed model is efficiently sampling noise with MPC. Although many secure sampling methods have been proposed, they have different security assumptions and isolated theoretical analyses. There is a lack of experimental evaluations to measure and compare their performances. We fill this gap by benchmarking existing sampling protocols in MPC and performing comprehensive measurements of their efficiency. 
First, we present a taxonomy of the underlying techniques of these sampling protocols. Second, we extend widely used distributed noise generation protocols to be resilient against Byzantine attackers. Third, we implement discrete sampling protocols and align their security settings for a fair comparison. We then conduct an extensive evaluation to study their efficiency and utility.	翻訳日:2024-11-07 20:24:11 公開日:2024-11-01
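As a point of reference for the sampling problem in the abstract above, here is a plaintext (non-MPC) sketch of drawing discrete Laplace noise as the difference of two geometric variables. It is not one of the benchmarked protocols: the function names are hypothetical, the sampling happens in the clear on one machine, and floating-point sampling of this kind is generally not considered safe for production differential privacy.

```python
import math
import random

def sample_geometric(alpha, rng):
    """Number of failures before the first success, with P(G >= k) = alpha**k."""
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    return int(math.floor(math.log(u) / math.log(alpha)))

def sample_discrete_laplace(epsilon, sensitivity=1, rng=random):
    """Difference of two i.i.d. geometric variables, which gives
    P(X = k) proportional to exp(-epsilon * |k| / sensitivity)."""
    alpha = math.exp(-epsilon / sensitivity)
    return sample_geometric(alpha, rng) - sample_geometric(alpha, rng)

def noisy_count(true_count, epsilon, rng=random):
    """Release an integer count with discrete Laplace noise (sensitivity 1)."""
    return true_count + sample_discrete_laplace(epsilon, 1, rng)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_discrete_laplace(1.0, 1, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

In the distributed model, the difficulty is producing the same distribution jointly so that no single party learns the noise value, which is where the efficiency differences between protocols arise.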
# 真空から魔法を放つ Harvesting magic from the vacuum ( http://arxiv.org/abs/2409.11473v2 ) ライセンス: Link先を確認	Ron Nyström, Nicola Pranzini, Esko Keski-Vakkuri,	(参考訳) Magic(マジック)は、量子コンピュータが古典的な計算によって効率的にシミュレートできない操作を実行できる量子リソースである。そのため、量子システムにおける魔法の生成は、量子上の優位性を達成するために不可欠である。この手紙は、初期真空状態の量子場と相互作用する3レベルのUnruh-DeWitt検出器(量子ビット)によって魔法を収穫できることを示している。量子場理論(QFT)から資源を抽出するという考え方は、絡み合いの収穫から生まれたものであるが、この結果は、石英を非魔法の状態から魔法状態へと進化させるプロトコルを拡張し、QFTから魔法を生成することができる。 Magic is the quantum resource allowing a quantum computer to perform operations that cannot be simulated efficiently by classical computation. As such, generating magic in a quantum system is crucial for achieving quantum advantage. This letter shows that magic can be harvested by a three-level Unruh-DeWitt detector (a qutrit) interacting with a quantum field in an initial vacuum state. While the idea of extracting resources from Quantum Field Theories (QFT) was born from the harvesting of entanglement, our result extends the protocol to evolve a qutrit from a non-magical state to a magical one, making it possible to generate magic from QFT.	翻訳日:2024-11-07 20:01:55 公開日:2024-11-01
# TART: 説明可能なテーブルベースの推論のためのオープンソースのツール拡張フレームワーク TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning ( http://arxiv.org/abs/2409.11724v2 ) ライセンス: Link先を確認	Xinyuan Lu, Liangming Pan, Yubo Ma, Preslav Nakov, Min-Yen Kan,	(参考訳) 現在のLarge Language Models (LLMs) は、テーブル構造を理解し、正確な数値推論を適用する能力に限界がある。この能力は、テーブル質問応答(TQA)やテーブルベースの事実検証(TFV)といったタスクに不可欠である。これらの課題に対処するために、LLMと特殊なツールを統合するTART(Tool-Augmented Reasoning framework for Tables)を紹介する。 TARTには、正確なデータ表現を保証するテーブルフォーマッター、特定の計算ツールを開発するツールメーカー、説明可能性を維持するための説明ジェネレータの3つの重要なコンポーネントが含まれている。また、テーブルとツールの統合におけるLLMのトレーニングに特化して設計された新しいベンチマークであるTOOLTABデータセットも提示する。実験の結果、TARTはデータ処理の精度と推論プロセスの明確さの両方を向上させることにより、既存の手法(例えばChain-of-Thought)よりも大幅な改善を達成することが示された。特に、CodeLlamaと組み合わせたTARTは、クローズドソースのLLMであるGPT-3.5-turboの精度の90.0%を達成し、さまざまな実世界のシナリオにおける堅牢性を示している。すべてのコードとデータはhttps://github.com/XinyuanLu00/TARTで入手できる。 Current Large Language Models (LLMs) exhibit limited ability to understand table structures and to apply precise numerical reasoning, which is crucial for tasks such as table question answering (TQA) and table-based fact verification (TFV). To address these challenges, we introduce our Tool-Augmented Reasoning framework for Tables (TART), which integrates LLMs with specialized tools. TART contains three key components: a table formatter to ensure accurate data representation, a tool maker to develop specific computational tools, and an explanation generator to maintain explainability. We also present the TOOLTAB dataset, a new benchmark designed specifically for training LLMs in table-tool integration. Our experiments indicate that TART achieves substantial improvements over existing methods (e.g., Chain-of-Thought) by improving both the precision of data processing and the clarity of the reasoning process. Notably, TART paired with CodeLlama achieves 90.0% of the accuracy of the closed-sourced LLM GPT-3.5-turbo, highlighting its robustness in diverse real-world scenarios. All the code and data are available at https://github.com/XinyuanLu00/TART.	翻訳日:2024-11-07 19:50:48 公開日:2024-11-01
# HSIGene:ハイパースペクトル画像生成の基盤モデル HSIGene: A Foundation Model For Hyperspectral Image Generation ( http://arxiv.org/abs/2409.12470v1 ) ライセンス: Link先を確認	Li Pang, Datao Tang, Shuang Xu, Deyu Meng, Xiangyong Cao,	(参考訳) ハイパースペクトル画像(HSI)は農業や環境モニタリングなど様々な分野で重要な役割を果たしている。しかし、取得コストが高いためハイパースペクトル画像の数は限られており、下流タスクの性能が低下する。近年、拡散モデルを用いてHSIを合成しようとする研究もあるが、依然としてHSIの不足に悩まされており、生成画像の信頼性と多様性に影響が出ている。空間的多様性を高めるためにマルチモーダルデータを組み込むことを提案する研究もあるが、スペクトルの忠実度は保証できない。さらに、既存のHSI合成モデルは、通常は制御不能であるか単一条件制御のみをサポートしており、正確で信頼性の高いHSIを生成する能力が制限されている。これらの問題を緩和するため,我々は潜在拡散に基づき多条件制御をサポートする新しいHSI生成基盤モデルHSIGeneを提案し,より正確で信頼性の高いHSI生成を可能にする。スペクトル忠実度を保ちながらトレーニングデータの空間的多様性を高めるため,空間超解像に基づく新たなデータ拡張手法を提案する。この手法ではまずHSIを高解像度化し,その高解像度HSIを切り出すことで豊富な学習パッチを得る。さらに,拡張データの知覚的品質を向上させるために,まずRGBバンドを超解像化し,次に提案するRGAN(Rectangular Guided Attention Network)を用いてガイド付きHSI超解像を行う,新しい2段階HSI超解像フレームワークを導入する。実験により,提案モデルは,デノイジングや超解像といった下流タスク向けに,大量の現実的なHSIを生成できることが示された。コードとモデルはhttps://github.com/LiPang/HSIGeneで入手できる。 Hyperspectral image (HSI) plays a vital role in various fields such as agriculture and environmental monitoring. However, due to the expensive acquisition cost, the number of hyperspectral images is limited, degenerating the performance of downstream tasks. Although some recent studies have attempted to employ diffusion models to synthesize HSIs, they still struggle with the scarcity of HSIs, affecting the reliability and diversity of the generated images. Some studies propose to incorporate multi-modal data to enhance spatial diversity, but the spectral fidelity cannot be ensured. In addition, existing HSI synthesis models are typically uncontrollable or only support single-condition control, limiting their ability to generate accurate and reliable HSIs. To alleviate these issues, we propose HSIGene, a novel HSI generation foundation model which is based on latent diffusion and supports multi-condition control, allowing for more precise and reliable HSI generation. To enhance the spatial diversity of the training data while preserving spectral fidelity, we propose a new data augmentation method based on spatial super-resolution, in which HSIs are upscaled first, and thus abundant training patches could be obtained by cropping the high-resolution HSIs. In addition, to improve the perceptual quality of the augmented data, we introduce a novel two-stage HSI super-resolution framework, which first applies RGB bands super-resolution and then utilizes our proposed Rectangular Guided Attention Network (RGAN) for guided HSI super-resolution. Experiments demonstrate that the proposed model is capable of generating a vast quantity of realistic HSIs for downstream tasks such as denoising and super-resolution. The code and models are available at https://github.com/LiPang/HSIGene.	翻訳日:2024-11-07 14:41:29 公開日:2024-11-01
# HSIGene:ハイパースペクトル画像生成の基盤モデル HSIGene: A Foundation Model For Hyperspectral Image Generation ( http://arxiv.org/abs/2409.12470v2 ) ライセンス: Link先を確認	Li Pang, Xiangyong Cao, Datao Tang, Shuang Xu, Xueru Bai, Feng Zhou, Deyu Meng,	(参考訳) ハイパースペクトル画像(HSI)は農業や環境モニタリングなど様々な分野で重要な役割を果たしている。しかし、取得コストが高いためハイパースペクトル画像の数は限られており、下流タスクの性能が低下する。近年、拡散モデルを用いてHSIを合成しようとする研究もあるが、依然としてHSIの不足に悩まされており、生成画像の信頼性と多様性に影響が出ている。空間的多様性を高めるためにマルチモーダルデータを組み込むことを提案する研究もあるが、スペクトルの忠実度は保証できない。さらに、既存のHSI合成モデルは、通常は制御不能であるか単一条件制御のみをサポートしており、正確で信頼性の高いHSIを生成する能力が制限されている。これらの問題を緩和するため,我々は潜在拡散に基づき多条件制御をサポートする新しいHSI生成基盤モデルHSIGeneを提案し,より正確で信頼性の高いHSI生成を可能にする。スペクトル忠実度を保ちながらトレーニングデータの空間的多様性を高めるため,空間超解像に基づく新たなデータ拡張手法を提案する。この手法ではまずHSIを高解像度化し,その高解像度HSIを切り出すことで豊富な学習パッチを得る。さらに,拡張データの知覚的品質を向上させるために,まずRGBバンドを超解像化し,次に提案するRGAN(Rectangular Guided Attention Network)を用いてガイド付きHSI超解像を行う,新しい2段階HSI超解像フレームワークを導入する。実験により,提案モデルは,デノイジングや超解像といった下流タスク向けに,大量の現実的なHSIを生成できることが示された。コードとモデルはhttps://github.com/LiPang/HSIGeneで入手できる。 Hyperspectral image (HSI) plays a vital role in various fields such as agriculture and environmental monitoring. However, due to the expensive acquisition cost, the number of hyperspectral images is limited, degenerating the performance of downstream tasks. Although some recent studies have attempted to employ diffusion models to synthesize HSIs, they still struggle with the scarcity of HSIs, affecting the reliability and diversity of the generated images. Some studies propose to incorporate multi-modal data to enhance spatial diversity, but the spectral fidelity cannot be ensured. In addition, existing HSI synthesis models are typically uncontrollable or only support single-condition control, limiting their ability to generate accurate and reliable HSIs. To alleviate these issues, we propose HSIGene, a novel HSI generation foundation model which is based on latent diffusion and supports multi-condition control, allowing for more precise and reliable HSI generation. To enhance the spatial diversity of the training data while preserving spectral fidelity, we propose a new data augmentation method based on spatial super-resolution, in which HSIs are upscaled first, and thus abundant training patches could be obtained by cropping the high-resolution HSIs. In addition, to improve the perceptual quality of the augmented data, we introduce a novel two-stage HSI super-resolution framework, which first applies RGB bands super-resolution and then utilizes our proposed Rectangular Guided Attention Network (RGAN) for guided HSI super-resolution. Experiments demonstrate that the proposed model is capable of generating a vast quantity of realistic HSIs for downstream tasks such as denoising and super-resolution. The code and models are available at https://github.com/LiPang/HSIGene.	翻訳日:2024-11-07 14:41:29 公開日:2024-11-01
# フレームレベル基準に基づく軽量トランスデューサ Lightweight Transducer Based on Frame-Level Criterion ( http://arxiv.org/abs/2409.13698v1 ) ライセンス: Link先を確認	Genshun Wan, Mengzhi Wang, Tingzhi Mao, Hang Chen, Zhongfu Ye,	(参考訳) シーケンスレベルの基準に基づいてトレーニングされたトランスデューサモデルは、大きな確率行列を生成するため、多くのメモリを必要とする。我々は,CTC強制アライメントアルゴリズムの結果を用いて各フレームのラベルを決定する、フレームレベル基準に基づく軽量トランスデューサモデルを提案する。これにより、エンコーダ出力は、トランスデューサのようにエンコーダ出力の各要素をデコーダ出力の各要素に加算するのではなく、対応する時刻のデコーダ出力と組み合わせることができる。その結果、メモリと計算の要求が大幅に削減される。ラベル中の過剰なブランクによる不均衡な分類の問題に対処するため、ブランク確率と非ブランク確率を分離し、ブランク分類器からメインネットワークへの勾配を打ち切る。これにより、軽量トランスデューサはトランスデューサと同等の結果を達成できる。さらに、よりリッチな情報を用いてブランクの確率を予測することで、トランスデューサを上回る結果を得る。 The transducer model trained based on sequence-level criterion requires a lot of memory due to the generation of the large probability matrix. We proposed a lightweight transducer model based on frame-level criterion, which uses the results of the CTC forced alignment algorithm to determine the label for each frame. Then the encoder output can be combined with the decoder output at the corresponding time, rather than adding each element output by the encoder to each element output by the decoder as in the transducer. This significantly reduces memory and computation requirements. To address the problem of imbalanced classification caused by excessive blanks in the label, we decouple the blank and non-blank probabilities and truncate the gradient of the blank classifier to the main network. This enables the lightweight transducer achieving similar results to transducer. Additionally, we use richer information to predict the probability of blank, achieving superior results to transducer.	翻訳日:2024-11-07 05:57:35 公開日:2024-11-01
# フレームレベル基準に基づく軽量トランスデューサ Lightweight Transducer Based on Frame-Level Criterion ( http://arxiv.org/abs/2409.13698v2 ) ライセンス: Link先を確認	Genshun Wan, Mengzhi Wang, Tingzhi Mao, Hang Chen, Zhongfu Ye,	(参考訳) シーケンスレベルの基準に基づいてトレーニングされたトランスデューサモデルは、大きな確率行列を生成するため、多くのメモリを必要とする。我々は,CTC強制アライメントアルゴリズムの結果を用いて各フレームのラベルを決定する、フレームレベル基準に基づく軽量トランスデューサモデルを提案する。これにより、エンコーダ出力は、トランスデューサのようにエンコーダ出力の各要素をデコーダ出力の各要素に加算するのではなく、対応する時刻のデコーダ出力と組み合わせることができる。その結果、メモリと計算の要求が大幅に削減される。ラベル中の過剰なブランクによる不均衡な分類の問題に対処するため、ブランク確率と非ブランク確率を分離し、ブランク分類器からメインネットワークへの勾配を打ち切る。 AISHELL-1の実験では、これにより軽量トランスデューサがトランスデューサと同等の結果を達成できることを示した。さらに、よりリッチな情報を用いてブランクの確率を予測することで、トランスデューサを上回る結果を得る。 The transducer model trained based on sequence-level criterion requires a lot of memory due to the generation of the large probability matrix. We proposed a lightweight transducer model based on frame-level criterion, which uses the results of the CTC forced alignment algorithm to determine the label for each frame. Then the encoder output can be combined with the decoder output at the corresponding time, rather than adding each element output by the encoder to each element output by the decoder as in the transducer. This significantly reduces memory and computation requirements. To address the problem of imbalanced classification caused by excessive blanks in the label, we decouple the blank and non-blank probabilities and truncate the gradient of the blank classifier to the main network. Experiments on the AISHELL-1 demonstrate that this enables the lightweight transducer to achieve similar results to transducer. Additionally, we use richer information to predict the probability of blank, achieving superior results to transducer.	翻訳日:2024-11-07 05:57:35 公開日:2024-11-01
# RACOON:知識グラフを用いた検索拡張カラム型アノテーションのためのLLMベースのフレームワーク RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph ( http://arxiv.org/abs/2409.14556v1 ) ライセンス: Link先を確認	Linxi Wei, Guorui Xiao, Magdalena Balazinska,	(参考訳) データ探索と統合の重要なコンポーネントとして、カラム型アノテーション(CTA)は、テーブルの列に1つ以上のセマンティックタイプをラベル付けすることを目的としている。最近のLarge Language Models (LLMs)の発展に伴い、研究者はその強力なゼロショット能力を活用して、CTAにLLMを使用する可能性を探り始めた。本稿では、この有望な研究に基づき、LLMに提供するコンテキスト情報を知識グラフ(KG)を用いて拡張する方法を示すことで、CTAのためのLLMベースの手法を改善する。 RACOONと呼ばれる我々の手法は、生成時に事前学習済みのパラメトリック知識と非パラメトリック知識を組み合わせることで、CTAにおけるLLMの性能を向上させる。実験の結果、RACOONは素のLLM推論と比較して最大0.21のmicro F-1の改善を達成した。 As an important component of data exploration and integration, Column Type Annotation (CTA) aims to label columns of a table with one or more semantic types. With the recent development of Large Language Models (LLMs), researchers have started to explore the possibility of using LLMs for CTA, leveraging their strong zero-shot capabilities. In this paper, we build on this promising work and improve on LLM-based methods for CTA by showing how to use a Knowledge Graph (KG) to augment the context information provided to the LLM. Our approach, called RACOON, combines both pre-trained parametric and non-parametric knowledge during generation to improve LLMs' performance on CTA. Our experiments show that RACOON achieves up to a 0.21 micro F-1 improvement compared against vanilla LLM inference.	翻訳日:2024-11-06 22:08:18 公開日:2024-11-01
# RACOON:知識グラフを用いた検索拡張カラム型アノテーションのためのLLMベースのフレームワーク RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph ( http://arxiv.org/abs/2409.14556v2 ) ライセンス: Link先を確認	Lindsey Linxi Wei, Guorui Xiao, Magdalena Balazinska,	(参考訳) データ探索と統合の重要なコンポーネントとして、カラム型アノテーション(CTA)は、テーブルの列に1つ以上のセマンティックタイプをラベル付けすることを目的としている。最近のLarge Language Models (LLMs)の発展に伴い、研究者はその強力なゼロショット能力を活用して、CTAにLLMを使用する可能性を探り始めた。本稿では、この有望な研究に基づき、LLMに提供するコンテキスト情報を知識グラフ(KG)を用いて拡張する方法を示すことで、CTAのためのLLMベースの手法を改善する。 RACOONと呼ばれる我々の手法は、生成時に事前学習済みのパラメトリック知識と非パラメトリック知識を組み合わせることで、CTAにおけるLLMの性能を向上させる。実験の結果、RACOONは素のLLM推論と比較して最大0.21のmicro F-1の改善を達成した。 As an important component of data exploration and integration, Column Type Annotation (CTA) aims to label columns of a table with one or more semantic types. With the recent development of Large Language Models (LLMs), researchers have started to explore the possibility of using LLMs for CTA, leveraging their strong zero-shot capabilities. In this paper, we build on this promising work and improve on LLM-based methods for CTA by showing how to use a Knowledge Graph (KG) to augment the context information provided to the LLM. Our approach, called RACOON, combines both pre-trained parametric and non-parametric knowledge during generation to improve LLMs' performance on CTA. Our experiments show that RACOON achieves up to a 0.21 micro F-1 improvement compared against vanilla LLM inference.	翻訳日:2024-11-06 22:08:18 公開日:2024-11-01
# 地上観測衛星ネットワークのための航空深層学習統合意味推論モデル On-Air Deep Learning Integrated Semantic Inference Models for Enhanced Earth Observation Satellite Networks ( http://arxiv.org/abs/2409.15246v2 ) ライセンス: Link先を確認	Hong-fu Chou, Vu Nguyen Ha, Prabhu Thiruvasagam, Thanh-Dung Le, Geoffrey Eappen, Ti Ti Nguyen, Luis M. Garces-Socarras, Jorge L. Gonzalez-Rios, Juan Carlos Merlano-Duncan, Symeon Chatzinotas,	(参考訳) 地球観測(EO)システムは、衛星ネットワークを通じて重要なグローバルデータを収集・分析することで持続可能な開発目標を達成する上で重要な役割を担っている。これらのシステムは, マッピング, 災害監視, 資源管理といったタスクには不可欠だが, 農業や災害対応などの専門分野において, 大量のEOデータを処理, 送信する上で, 課題に直面している。ドメイン適応型大規模言語モデル(LLM)は、広範なEOデータとセマンティックEOデータとのデータ融合を容易にすることで、有望なソリューションを提供する。多様なデータセットの統合と解釈を改善することで、LLMは農業や災害対応アプリケーションで専門的な情報を処理するという課題に対処する。この融合は送信されたデータの正確性と関連性を高める。本稿では,EO衛星ネットワークにおけるセマンティック通信のためのフレームワークを提案する。提案方式では,ディスクリート・タスク指向のソース・チャネル符号化 (DT-JSCC) とセマンティック・データ拡張 (SA) を用いて,通信オーバーヘッドを最小限に抑えながら関連情報に集中する。認知的セマンティック処理と衛星間リンクを統合することにより、マルチスペクトル衛星画像の解析と伝送を強化し、オブジェクト検出、パターン認識、リアルタイム意思決定を改善する。 CSA(Cognitive Semantic Augmentation)の導入により、衛星はセマンティック情報を処理および送信することができ、環境やアプリケーションニーズの変化への適応性を高めることができる。このエンドツーエンドアーキテクチャは、6Gをサポートする次世代衛星ネットワーク向けに調整されており、効率と精度が大幅に向上している。 Earth Observation (EO) systems play a crucial role in achieving Sustainable Development Goals by collecting and analyzing vital global data through satellite networks. These systems are essential for tasks like mapping, disaster monitoring, and resource management, but they face challenges in processing and transmitting large volumes of EO data, especially in specialized fields such as agriculture and real-time disaster response. Domain-adapted Large Language Models (LLMs) provide a promising solution by facilitating data fusion between extensive EO data and semantic EO data. By improving integration and interpretation of diverse datasets, LLMs address the challenges of processing specialized information in agriculture and disaster response applications. This fusion enhances the accuracy and relevance of transmitted data. 
This paper presents a framework for semantic communication in EO satellite networks, aimed at improving data transmission efficiency and overall system performance through cognitive processing techniques. The proposed system employs Discrete-Task-Oriented Source-Channel Coding (DT-JSCC) and Semantic Data Augmentation (SA) to focus on relevant information while minimizing communication overhead. By integrating cognitive semantic processing and inter-satellite links, the framework enhances the analysis and transmission of multispectral satellite imagery, improving object detection, pattern recognition, and real-time decision-making. The introduction of Cognitive Semantic Augmentation (CSA) allows satellites to process and transmit semantic information, boosting adaptability to changing environments and application needs. This end-to-end architecture is tailored for next-generation satellite networks, such as those supporting 6G, and demonstrates significant improvements in efficiency and accuracy.	翻訳日:2024-11-06 20:27:58 公開日:2024-11-01
# 地上観測衛星ネットワークのための航空深層学習統合意味推論モデル On-Air Deep Learning Integrated Semantic Inference Models for Enhanced Earth Observation Satellite Networks ( http://arxiv.org/abs/2409.15246v3 ) ライセンス: Link先を確認	Hong-fu Chou, Vu Nguyen Ha, Prabhu Thiruvasagam, Thanh-Dung Le, Geoffrey Eappen, Ti Ti Nguyen, Luis M. Garces-Socarras, Jorge L. Gonzalez-Rios, Juan Carlos Merlano-Duncan, Symeon Chatzinotas,	(参考訳) 地球観測(EO)システムは、地図作成、災害監視、資源管理に不可欠である。それにもかかわらず、特に精密農業やリアルタイム災害対応といった専門分野において、広範なデータの処理や伝達にかなりの障害に直面している。リモートセンシング技術を備えた地球観測衛星は、オンボードセンサーやIoT対応の地上オブジェクトからデータを収集し、リモートで重要な情報を提供する。ドメイン適応型大規模言語モデル(LLM)は、生および処理されたEOデータの統合を可能にするソリューションを提供する。ドメイン適応により、LLMは多くのデータソースの同化と分析を改善し、農業や災害対応における特別なデータセットの複雑さに対処する。 LLMによって誘導されるこのデータ合成は、伝達された情報の精度とパーシステンスを高める。本研究は,高度なEOシステムのための意味推論と深層学習を徹底的に検討する。 EO衛星ネットワークにおけるセマンティック通信のための革新的なアーキテクチャを提案し,セマンティック処理手法を用いてデータ伝送効率を向上させる。近年のオンボード処理技術の進歩は、軌道上での信頼性、適応性、エネルギー効率の高いデータ管理を可能にしている。これらの改良により、放射線硬化・再構成技術による悪環境における信頼性の高い性能が保証される。これらの進歩により、次世代衛星ミッションの処理能力は向上し、運用の柔軟性と6G衛星通信におけるリアルタイムな意思決定に欠かせないものとなった。 Earth Observation (EO) systems are crucial for cartography, disaster surveillance, and resource administration. Nonetheless, they encounter considerable obstacles in the processing and transmission of extensive data, especially in specialized domains such as precision agriculture and real-time disaster response. Earth observation satellites, outfitted with remote sensing technology, gather data from onboard sensors and IoT-enabled terrestrial objects, delivering important information remotely. Domain-adapted Large Language Models (LLMs) provide a solution by enabling the integration of raw and processed EO data. Through domain adaptation, LLMs improve the assimilation and analysis of many data sources, tackling the intricacies of specialized datasets in agriculture and disaster response. This data synthesis, directed by LLMs, enhances the precision and pertinence of conveyed information. 
This study provides a thorough examination of using semantic inference and deep learning for sophisticated EO systems. It presents an innovative architecture for semantic communication in EO satellite networks, designed to improve data transmission efficiency using semantic processing methodologies. Recent advancements in onboard processing technologies enable dependable, adaptable, and energy-efficient data management in orbit. These improvements guarantee reliable performance in adverse space circumstances using radiation-hardened and reconfigurable technology. Collectively, these advancements enable next-generation satellite missions with improved processing capabilities, crucial for operational flexibility and real-time decision-making in 6G satellite communication.	翻訳日:2024-11-06 20:27:58 公開日:2024-11-01
# IIoTにおけるデータ不均一性を考慮した表面欠陥分類のための対向的フェデレーション・コンセンサス学習 Adversarial Federated Consensus Learning for Surface Defect Classification Under Data Heterogeneity in IIoT ( http://arxiv.org/abs/2409.15711v2 ) ライセンス: Link先を確認	Jixuan Cui, Jun Li, Zhen Mei, Yiyang Ni, Wen Chen, Zengxiang Li,	(参考訳) データ不足の課題は、産業用表面欠陥分類(SDC)におけるディープラーニングの適用を妨げる。プライバシー上の懸念から、産業用モノのインターネット(IIoT)のさまざまなエンティティから十分なトレーニングデータを収集、集中させることが難しいからだ。フェデレートラーニング(FL)は、プライバシを維持しながら、クライアント間で協調的なグローバルモデルトレーニングを可能にするソリューションを提供する。しかし、クライアント間でのデータ分散が不均一であるためにパフォーマンスが低下する可能性がある。本稿では,SDC の異なるクライアント間でのデータの異質性に挑戦するために,Adversarial Federated Consensus Learning (AFedCL) という新しいパーソナライズされた FL (PFL) アプローチを提案する。まず,データの不均一性による性能劣化を軽減するために,動的コンセンサス構築戦略を開発する。敵対的トレーニングを通じて、異なるクライアントのローカルモデルは、グローバルモデルをブリッジとして利用し、分散アライメントを実現し、グローバル知識の忘れる問題を緩和する。この戦略を補完し,コンセンサスを考慮したアグリゲーション機構を提案する。グローバルな知識学習における有効性に基づいて、集約重みを異なるクライアントに割り当て、グローバルなモデルの一般化能力を高める。最後に,グローバルな知識利用効率を高めるために,適応的な特徴融合モジュールを設計する。パーソナライズされた融合重みは、グローバルな特徴とローカルな特徴を最適にバランスするために、各クライアントに対して徐々に調整される。 FedALAのような最先端のFL法と比較して、提案手法は3つのSDCデータセットで最大5.67%の精度向上を実現する。 The challenge of data scarcity hinders the application of deep learning in industrial surface defect classification (SDC), as it's difficult to collect and centralize sufficient training data from various entities in Industrial Internet of Things (IIoT) due to privacy concerns. Federated learning (FL) provides a solution by enabling collaborative global model training across clients while maintaining privacy. However, performance may suffer due to data heterogeneity-discrepancies in data distributions among clients. In this paper, we propose a novel personalized FL (PFL) approach, named Adversarial Federated Consensus Learning (AFedCL), for the challenge of data heterogeneity across different clients in SDC. First, we develop a dynamic consensus construction strategy to mitigate the performance degradation caused by data heterogeneity. 
Through adversarial training, local models from different clients utilize the global model as a bridge to achieve distribution alignment, alleviating the problem of global knowledge forgetting. Complementing this strategy, we propose a consensus-aware aggregation mechanism. It assigns aggregation weights to different clients based on their efficacy in global knowledge learning, thereby enhancing the global model's generalization capabilities. Finally, we design an adaptive feature fusion module to further enhance global knowledge utilization efficiency. Personalized fusion weights are gradually adjusted for each client to optimally balance global and local features. Compared with state-of-the-art FL methods like FedALA, the proposed AFedCL method achieves an accuracy increase of up to 5.67% on three SDC datasets.	翻訳日:2024-11-06 19:32:29 公開日:2024-11-01
# VascXモデル:カラー眼底画像からの網膜血管解析のためのモデルアンサンブル VascX Models: Model Ensembles for Retinal Vascular Analysis from Color Fundus Images ( http://arxiv.org/abs/2409.16016v2 ) ライセンス: Link先を確認	Jose Vargas Quiros, Bart Liefers, Karin van Garderen, Jeroen Vermeulen, Eyened Reading Center, Sinergia Consortium, Caroline Klaver,	(参考訳) 本稿では,カラー眼底画像(CFI)から網膜血管を解析するための包括的なモデルアンサンブル群であるVascXモデルを紹介する。アノテーション付きCFIは公開データセットから集約した。さらに、主に人口ベースのロッテルダム研究(Rotterdam Study)から得た追加のCFIに対し、グレーダーが動脈と静脈をピクセルレベルでアノテーションし、患者の属性と撮像条件が多様なデータセットを得た。 VascXモデルは、既存の公開モデルと比較して、データセット、画像品質レベル、解剖学的領域のいずれにおいても優れたセグメンテーション性能を示した。これは、学習セットの規模と多様性が増したためと考えられる。動脈・静脈および視神経乳頭のセグメンテーション性能、特に大規模コホートや臨床データセットで一般的な中程度の品質のCFIにおいて重要な改善が認められた。重要な点として、VascXのセグメンテーションマスクから抽出した特徴を、従来のモデルが生成したセグメンテーションマスクから抽出した特徴と比較すると、これらの改善は有意に正確な血管特徴につながっていた。 VascXモデルでは、実装を簡素化し、自動網膜血管解析の品質を向上させることを目的とした、堅牢ですぐに使えるモデルアンサンブルと推論コードを提供する。モデルによって生成される正確な血管パラメータは、眼の内外における疾患パターンを特定するための出発点となり得る。 We introduce VascX models, a comprehensive set of model ensembles for analyzing retinal vasculature from color fundus images (CFIs). Annotated CFIs were aggregated from public datasets. Additional CFIs, mainly from the population-based Rotterdam Study were annotated by graders for arteries and veins at pixel level, resulting in a dataset diverse in patient demographics and imaging conditions. VascX models demonstrated superior segmentation performance across datasets, image quality levels, and anatomic regions when compared to existing, publicly available models, likely due to the increased size and variety of our training set. Important improvements were observed in artery-vein and disc segmentation performance, particularly in segmentations of these structures on CFIs of intermediate quality, common in large cohorts and clinical datasets. Importantly, these improvements translated into significantly more accurate vascular features when we compared features extracted from VascX segmentation masks with features extracted from segmentation masks generated by previous models. With VascX models we provide a robust, ready-to-use set of model ensembles and inference code aimed at simplifying the implementation and enhancing the quality of automated retinal vasculature analyses. The precise vessel parameters generated by the model can serve as starting points for the identification of disease patterns in and outside of the eye.	翻訳日:2024-11-06 18:04:33 公開日:2024-11-01
# Plurals: シミュレートされたソーシャルアンサンブルによるLLM誘導システム Plurals: A System for Guiding LLMs Via Simulated Social Ensembles ( http://arxiv.org/abs/2409.17213v3 ) ライセンス: Link先を確認	Joshua Ashkinaze, Emily Fry, Narendra Edara, Eric Gilbert, Ceren Budak,	(参考訳) 近年の議論では、言語モデルが特定の視点を優遇するのではないかという懸念が提起されている。しかし、解決策が「どこでもない場所からの視点」を目指すことではなく、むしろ異なる視点を活用することにあるとしたらどうだろうか。本稿では,多元的なAI熟議のためのシステムおよびPythonライブラリであるPluralsを紹介する。 Pluralsは、カスタマイズ可能なStructure内で熟議するAgent(LLM、任意でペルソナ付き)と、熟議を監督するModeratorから構成される。 Pluralsは、シミュレートされたソーシャルアンサンブルのジェネレータである。 Pluralsは政府データセットと統合して全国を代表するペルソナを作成し、民主的熟議理論に着想を得た熟議テンプレートを含み、ユーザーは情報共有構造とStructure内の熟議行動の両方をカスタマイズできる。 6つのケーススタディは、理論的構成概念への忠実さと有効性を示している。 3つのランダム化実験では、シミュレートされたフォーカスグループが、関連するオーディエンスのオンラインサンプルに響く出力を生成した(75%の試行でゼロショット生成よりも選ばれた)。 Pluralsは多元的AIのためのパラダイムであると同時に具体的なシステムでもある。 Pluralsライブラリはhttps://github.com/josh-ashkinaze/pluralsで公開されており、継続的に更新される予定である。 Recent debates raised concerns that language models may favor certain viewpoints. But what if the solution is not to aim for a 'view from nowhere' but rather to leverage different viewpoints? We introduce Plurals, a system and Python library for pluralistic AI deliberation. Plurals consists of Agents (LLMs, optionally with personas) which deliberate within customizable Structures, with Moderators overseeing deliberation. Plurals is a generator of simulated social ensembles. Plurals integrates with government datasets to create nationally representative personas, includes deliberation templates inspired by democratic deliberation theory, and allows users to customize both information-sharing structures and deliberation behavior within Structures. Six case studies demonstrate fidelity to theoretical constructs and efficacy. Three randomized experiments show simulated focus groups produced output resonant with an online sample of the relevant audiences (chosen over zero-shot generation in 75% of trials). Plurals is both a paradigm and a concrete system for pluralistic AI. The Plurals library is available at https://github.com/josh-ashkinaze/plurals and will be continually updated.	翻訳日:2024-11-06 16:30:51 公開日:2024-11-01
# Plurals: シミュレートされたソーシャルアンサンブルによるLLM誘導システム Plurals: A System for Guiding LLMs Via Simulated Social Ensembles ( http://arxiv.org/abs/2409.17213v4 ) ライセンス: Link先を確認	Joshua Ashkinaze, Emily Fry, Narendra Edara, Eric Gilbert, Ceren Budak,	(参考訳) 近年の議論では、言語モデルが特定の視点を優遇するのではないかという懸念が提起されている。しかし、解決策が「どこでもない場所からの視点」を目指すことではなく、むしろ異なる視点を活用することにあるとしたらどうだろうか。本稿では,多元的なAI熟議のためのシステムおよびPythonライブラリであるPluralsを紹介する。 Pluralsは、カスタマイズ可能なStructure内で熟議するAgent(LLM、任意でペルソナ付き)と、熟議を監督するModeratorから構成される。 Pluralsは、シミュレートされたソーシャルアンサンブルのジェネレータである。 Pluralsは政府データセットと統合して全国を代表するペルソナを作成し、民主的熟議理論に着想を得た熟議テンプレートを含み、ユーザーは情報共有構造とStructure内の熟議行動の両方をカスタマイズできる。 6つのケーススタディは、理論的構成概念への忠実さと有効性を示している。 3つのランダム化実験では、シミュレートされたフォーカスグループが、関連するオーディエンスのオンラインサンプルに響く出力を生成した(75%の試行でゼロショット生成よりも選ばれた)。 Pluralsは多元的AIのためのパラダイムであると同時に具体的なシステムでもある。 Pluralsライブラリはhttps://github.com/josh-ashkinaze/pluralsで公開されており、継続的に更新される予定である。 Recent debates raised concerns that language models may favor certain viewpoints. But what if the solution is not to aim for a 'view from nowhere' but rather to leverage different viewpoints? We introduce Plurals, a system and Python library for pluralistic AI deliberation. Plurals consists of Agents (LLMs, optionally with personas) which deliberate within customizable Structures, with Moderators overseeing deliberation. Plurals is a generator of simulated social ensembles. Plurals integrates with government datasets to create nationally representative personas, includes deliberation templates inspired by democratic deliberation theory, and allows users to customize both information-sharing structures and deliberation behavior within Structures. Six case studies demonstrate fidelity to theoretical constructs and efficacy. Three randomized experiments show simulated focus groups produced output resonant with an online sample of the relevant audiences (chosen over zero-shot generation in 75% of trials). Plurals is both a paradigm and a concrete system for pluralistic AI. The Plurals library is available at https://github.com/josh-ashkinaze/plurals and will be continually updated.	翻訳日:2024-11-06 16:30:51 公開日:2024-11-01
# Uni-Med: Connector-MoEによるマルチタスク学習のための統一医療ジェネラリスト基盤モデル Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE ( http://arxiv.org/abs/2409.17508v2 ) ライセンス: Link先を確認	Xun Zhu, Ying Hu, Fanbin Mo, Miao Li, Ji Wu,	(参考訳) MLLM(Multi-modal large language model)は、様々な視覚的・言語的タスクのための汎用インタフェースとして、印象的な能力を示している。しかし、医療分野におけるマルチタスク学習のための統一MLLMの構築は、依然として厄介な課題である。 MLLMにおけるマルチモーダル・マルチタスク最適化の綱引き問題を軽減するため、近年の進歩は主にLLMコンポーネントの改善に重点を置いており、モダリティ間のギャップを埋めるコネクタは軽視されている。本稿では,汎用視覚特徴抽出モジュール,コネクタ混合エキスパート(CMoE)モジュール,LLMから構成される新しい医療ジェネラリスト基盤モデルUni-Medを紹介する。コネクタにおいて射影エキスパートの混合とよく設計されたルータを活用するCMoEにより、Uni-Medは綱引き問題に対する効率的な解決策を実現し、質問応答、視覚的質問応答、レポート生成、参照表現理解、参照表現生成、画像分類を含む6つの異なる医療タスクを実行できる。我々の知る限り、Uni-MedはMLLMのコネクタにおけるマルチタスク干渉に対処する最初の試みである。大規模なアブレーション実験により、任意の構成でCMoEを導入する効果が検証され、平均で最大8%の性能向上が得られた。さらに、勾配最適化とパラメータ統計の観点から、綱引き問題の解釈分析を行う。従来の最先端の医療MLLMと比較して、Uni-Medは多様なタスクにおいて同等以上の評価指標を達成している。コードとリソースはhttps://github.com/tsinghua-msiip/Uni-Medで入手できる。 Multi-modal large language models (MLLMs) have shown impressive capabilities as a general-purpose interface for various visual and linguistic tasks. However, building a unified MLLM for multi-task learning in the medical field remains a thorny challenge. To mitigate the tug-of-war problem of multi-modal multi-task optimization in MLLMs, recent advances primarily focus on improving the LLM components, while neglecting the connector that bridges the gap between modalities. In this paper, we introduce Uni-Med, a novel medical generalist foundation model which consists of a universal visual feature extraction module, a connector mixture-of-experts (CMoE) module, and an LLM. Benefiting from the proposed CMoE that leverages a well-designed router with a mixture of projection experts at the connector, Uni-Med achieves efficient solution to the tug-of-war problem and can perform six different medical tasks including question answering, visual question answering, report generation, referring expression comprehension, referring expression generation and image classification. To the best of our knowledge, Uni-Med is the first effort to tackle multi-task interference at the connector in MLLMs. Extensive ablation experiments validate the effectiveness of introducing CMoE under any configuration, with up to an average 8% performance gains. We further provide interpretation analysis of the tug-of-war problem from the perspective of gradient optimization and parameter statistics. Compared to previous state-of-the-art medical MLLMs, Uni-Med achieves competitive or superior evaluation metrics on diverse tasks. Code and resources are available at https://github.com/tsinghua-msiip/Uni-Med.	翻訳日:2024-11-06 16:20:44 公開日:2024-11-01
# MetaMath:大規模言語モデルにおける数学的推論強化のための自然言語とコードの統合 MetaMath: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models ( http://arxiv.org/abs/2409.19381v1 ) ライセンス: Link先を確認	Xuyuan Xiong, Simeng Han, Ziyue Zhou, Arman Cohan,	(参考訳) 大規模言語モデル(LLM)は、自然言語、コード、あるいは両者の組み合わせという、以下の形式で数学的推論問題の解を生成するために一般的に用いられる。本稿では,GPT-4o-mini や LLama-3.1-8b-Turbo など,最先端の LLM を用いた自然言語とコードを用いた数学的推論問題の解法に関する基礎的考察を行う。その結果,LLMはコードよりも自然言語の推論が優れていることがわかった。さらに、自然言語とコードは相補的な推論の形式として機能するが、特定のシナリオでは負の形で互いに影響しあうことができる。これらの知見は, LLMを利用して最適推論形式を動的に選択し, GPT-4o-miniと同等のベースライン上での性能を向上させるメタマスという新たなプロンプト手法の開発を動機付けている。 Large Language Models (LLMs) are commonly used to generate solutions for mathematical reasoning problems in the following formats: natural language, code, or a combination of both. In this paper, we explore fundamental questions related to solving mathematical reasoning problems using natural language and code with state-of-the-art LLMs, including GPT-4o-mini and LLama-3.1-8b-Turbo. Our findings show that LLMs are better at reasoning in natural language compared to code. Additionally, although natural language and code serve as complementary forms of reasoning, they can affect each other in a negative way in certain scenarios. These insights motivate our development of a new prompting method, MetaMath, which leverages an LLM to dynamically select the most appropriate reasoning form, resulting in improved performance over comparable baselines with GPT-4o-mini.	翻訳日:2024-11-05 23:38:55 公開日:2024-11-01
# INC-Math:大規模言語モデルにおける数学的推論強化のための自然言語とコードの統合 INC-Math: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models ( http://arxiv.org/abs/2409.19381v2 ) ライセンス: Link先を確認	Xuyuan Xiong, Simeng Han, Ziyue Zhou, Arman Cohan,	(参考訳) 大規模言語モデル(LLM)は、自然言語、コード、あるいは両者の組み合わせという、以下の形式で数学的推論問題の解を生成するために一般的に用いられる。本稿では,GPT-4o-mini や LLama-3.1-8b-Turbo など,最先端の LLM を用いた自然言語とコードを用いた数学的推論問題の解法に関する基礎的考察を行う。その結果,LLMはコードよりも自然言語の推論が優れていることがわかった。さらに、自然言語とコードは相補的な推論の形式として機能するが、特定のシナリオでは負の形で互いに影響しあうことができる。これらの知見は, LLM を利用して最適推論形式を動的に選択し, GPT-4o-mini で同等のベースライン上での性能を向上させる新しいプロンプト手法 INC-Math の開発を動機付けている。 Large Language Models (LLMs) are commonly used to generate solutions for mathematical reasoning problems in the following formats: natural language, code, or a combination of both. In this paper, we explore fundamental questions related to solving mathematical reasoning problems using natural language and code with state-of-the-art LLMs, including GPT-4o-mini and LLama-3.1-8b-Turbo. Our findings show that LLMs are better at reasoning in natural language compared to code. Additionally, although natural language and code serve as complementary forms of reasoning, they can affect each other in a negative way in certain scenarios. These insights motivate our development of a new prompting method, INC-Math, which leverages an LLM to dynamically select the most appropriate reasoning form, resulting in improved performance over comparable baselines with GPT-4o-mini.	翻訳日:2024-11-05 23:38:55 公開日:2024-11-01
# INC-Math:大規模言語モデルにおける数学的推論強化のための自然言語とコードの統合 INC-Math: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models ( http://arxiv.org/abs/2409.19381v3 ) ライセンス: Link先を確認	Xuyuan Xiong, Simeng Han, Ziyue Zhou, Arman Cohan,	(参考訳) 大規模言語モデル(LLM)は、自然言語、コード、あるいは両者の組み合わせという、以下の形式で数学的推論問題の解を生成するために一般的に用いられる。本稿では,GPT-4o-mini や LLama-3.1-8b-Turbo など,最先端の LLM を用いた自然言語とコードを用いた数学的推論問題の解法に関する基礎的考察を行う。その結果,LLMはコードよりも自然言語の推論が優れていることがわかった。さらに、自然言語とコードは相補的な推論の形式として機能するが、特定のシナリオでは負の形で互いに影響しあうことができる。これらの知見は, LLM を利用して最適推論形式を動的に選択し, GPT-4o-mini で同等のベースライン上での性能を向上させる新しいプロンプト手法 INC-Math の開発を動機付けている。 Large Language Models (LLMs) are commonly used to generate solutions for mathematical reasoning problems in the following formats: natural language, code, or a combination of both. In this paper, we explore fundamental questions related to solving mathematical reasoning problems using natural language and code with state-of-the-art LLMs, including GPT-4o-mini and LLama-3.1-8b-Turbo. Our findings show that LLMs are better at reasoning in natural language compared to code. Additionally, although natural language and code serve as complementary forms of reasoning, they can affect each other in a negative way in certain scenarios. These insights motivate our development of a new prompting method, INC-Math, which leverages an LLM to dynamically select the most appropriate reasoning form, resulting in improved performance over comparable baselines with GPT-4o-mini.	翻訳日:2024-11-05 23:38:55 公開日:2024-11-01
# 不完全データを用いたロバストマルチモーダル感性分析に向けて Towards Robust Multimodal Sentiment Analysis with Incomplete Data ( http://arxiv.org/abs/2409.20012v1 ) ライセンス: Link先を確認	Haoyu Zhang, Wenbin Wang, Tianshu Yu,	(参考訳) マルチモーダル・センティメント・アナリティクス(MSA)の分野は、データ不完全性の問題に対処する新たな方向性を最近見てきた。言語モダリティには通常、密度の強い感情情報が含まれていることを認識し、これを支配的なモダリティとみなし、堅牢なMSAを実現するために、言語に支配された耐雑音学習ネットワーク(LNLN)を提案する。提案したLNLNは、支配的モダリティ補正(DMC)モジュールと支配的モダリティベースマルチモーダル学習(DMML)モジュールを備え、支配的モダリティ表現の品質を保証することにより、様々なノイズシナリオにおけるモデルの堅牢性を高める。方法論的な設計とは別に,いくつかの一般的なデータセット(例: MOSI, MOSEI, SIMS)の多様かつ有意義な設定を利用して,ランダムなデータ不足シナリオ下で総合的な実験を行い,文献における既存の評価に比べて統一性,透明性,公正性を付加する。経験的に、LNLNは既存のベースラインを一貫して上回り、これらの挑戦的で広範な評価指標よりも優れたパフォーマンスを示している。 The field of Multimodal Sentiment Analysis (MSA) has recently witnessed an emerging direction seeking to tackle the issue of data incompleteness. Recognizing that the language modality typically contains dense sentiment information, we consider it as the dominant modality and present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust MSA. The proposed LNLN features a dominant modality correction (DMC) module and dominant modality based multimodal learning (DMML) module, which enhances the model's robustness across various noise scenarios by ensuring the quality of dominant modality representations. Aside from the methodical design, we perform comprehensive experiments under random data missing scenarios, utilizing diverse and meaningful settings on several popular datasets (e.g., MOSI, MOSEI, and SIMS), providing additional uniformity, transparency, and fairness compared to existing evaluations in the literature. Empirically, LNLN consistently outperforms existing baselines, demonstrating superior performance across these challenging and extensive evaluation metrics.	翻訳日:2024-11-05 16:08:18 公開日:2024-11-01
# 不完全データを用いたロバストマルチモーダル感性分析に向けて Towards Robust Multimodal Sentiment Analysis with Incomplete Data ( http://arxiv.org/abs/2409.20012v2 ) ライセンス: Link先を確認	Haoyu Zhang, Wenbin Wang, Tianshu Yu,	(参考訳) マルチモーダル・センティメント・アナリティクス(MSA)の分野は、データ不完全性の問題に対処する新たな方向性を最近見てきた。言語モダリティには通常、密度の強い感情情報が含まれていることを認識し、これを支配的なモダリティとみなし、堅牢なMSAを実現するために、言語に支配された耐雑音学習ネットワーク(LNLN)を提案する。提案したLNLNは、支配的モダリティ補正(DMC)モジュールと支配的モダリティベースマルチモーダル学習(DMML)モジュールを備え、支配的モダリティ表現の品質を保証することにより、様々なノイズシナリオにおけるモデルの堅牢性を高める。方法論的な設計とは別に,いくつかの一般的なデータセット(例: MOSI, MOSEI, SIMS)の多様かつ有意義な設定を利用して,ランダムなデータ不足シナリオ下で総合的な実験を行い,文献における既存の評価に比べて統一性,透明性,公正性を付加する。経験的に、LNLNは既存のベースラインを一貫して上回り、これらの挑戦的で広範な評価指標よりも優れたパフォーマンスを示している。 The field of Multimodal Sentiment Analysis (MSA) has recently witnessed an emerging direction seeking to tackle the issue of data incompleteness. Recognizing that the language modality typically contains dense sentiment information, we consider it as the dominant modality and present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust MSA. The proposed LNLN features a dominant modality correction (DMC) module and dominant modality based multimodal learning (DMML) module, which enhances the model's robustness across various noise scenarios by ensuring the quality of dominant modality representations. Aside from the methodical design, we perform comprehensive experiments under random data missing scenarios, utilizing diverse and meaningful settings on several popular datasets (e.g., MOSI, MOSEI, and SIMS), providing additional uniformity, transparency, and fairness compared to existing evaluations in the literature. Empirically, LNLN consistently outperforms existing baselines, demonstrating superior performance across these challenging and extensive evaluation metrics.	翻訳日:2024-11-05 16:08:18 公開日:2024-11-01
# Embodied Agent Interface: Embodied Decision Making のための LLM ベンチマーク Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making ( http://arxiv.org/abs/2410.07166v2 ) ライセンス: Link先を確認	Manling Li, Shiyu Zhao, Qineng Wang, Kangrui Wang, Yu Zhou, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu,	(参考訳) 我々は,大規模言語モデル (LLM) を具体的意思決定のために評価することを目指している。具体的環境における意思決定にLLMを利用する研究は数多く行われてきたが、それらは通常、異なる目的のために異なるドメインに適用され、異なる入力や出力に基づいて構築されるため、そのパフォーマンスに関する体系的な理解はいまだに欠けている。さらに、既存の評価は最終成功率にのみ依存する傾向にあり、LLMに欠落している能力や、その問題のある場所を特定することは困難であり、結果として、具体化されたエージェントがLLMを効果的かつ選択的に活用することを妨げる。これらの制約に対処するために,多種多様なタスクの形式化とLLMベースのモジュールの入出力仕様をサポートする汎用インタフェース(Embodied Agent Interface)を提案する。具体的には、1) 状態目標と時間的に拡張された目標の両方を含む幅広い具体的意思決定タスク、2) 意思決定に広く用いられる4つのLLMベースのモジュール(ゴール解釈、サブゴール分解、アクションシークエンシング、トランジションモデリング)、3) 評価を幻覚の誤り、アフォーダンスの誤り、様々な種類の計画の誤りなど、さまざまな種類のエラーに分解する詳細な指標群、を統一的に扱うことができる。総合的に、我々のベンチマークは、異なるサブタスクに対するLLMのパフォーマンスを総合的に評価し、LLM駆動型AIシステムの強みと弱みを指摘し、具体的意思決定においてLLMを効果的かつ選択的に活用するための洞察を提供する。 We aim to evaluate Large Language Models (LLMs) for embodied decision making. While a significant body of work has been leveraging LLMs for decision making in embodied environments, we still lack a systematic understanding of their performance because they are usually applied in different domains, for different purposes, and built based on different inputs and outputs. Furthermore, existing evaluations tend to rely solely on a final success rate, making it difficult to pinpoint what ability is missing in LLMs and where the problem lies, which in turn blocks embodied agents from leveraging LLMs effectively and selectively. To address these limitations, we propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks and input-output specifications of LLM-based modules. Specifically, it allows us to unify 1) a broad set of embodied decision-making tasks involving both state and temporally extended goals, 2) four commonly-used LLM-based modules for decision making: goal interpretation, subgoal decomposition, action sequencing, and transition modeling, and 3) a collection of fine-grained metrics which break down evaluation into various types of errors, such as hallucination errors, affordance errors, various types of planning errors, etc. Overall, our benchmark offers a comprehensive assessment of LLMs' performance for different subtasks, pinpointing the strengths and weaknesses in LLM-powered embodied AI systems, and providing insights for effective and selective use of LLMs in embodied decision making.	翻訳日:2024-11-05 14:59:58 公開日:2024-11-01
# FedGraph:フェデレーショングラフ学習のための研究ライブラリとベンチマーク FedGraph: A Research Library and Benchmark for Federated Graph Learning ( http://arxiv.org/abs/2410.06340v2 ) ライセンス: Link先を確認	Yuhang Yao, Yuan Li, Xinyi Fan, Junhao Li, Kay Liu, Weizhao Jin, Srivatsan Ravi, Philip S. Yu, Carlee Joe-Wong,	(参考訳) フェデレーショングラフ学習は、重要な実践上の課題を持つ新興分野である。大規模グラフ上のノード分類問題などに対して,グラフニューラルネットワークをフェデレーション方式でトレーニングする際の精度を高めるために多くのアルゴリズムが提案されているが,現実の展開に不可欠であるにもかかわらず,そのシステム性能は見過ごされがちである。このギャップに対処するため、フェデレーショングラフ学習における実用的な分散デプロイメントとベンチマークのための研究ライブラリであるFedGraphを紹介する。 FedGraphは、最先端のグラフ学習メソッドをサポートし、トレーニング中の通信コストと計算コストに焦点を当ててシステムパフォーマンスを評価するための組み込みのプロファイリングツールを含んでいる。既存のベンチマークプラットフォームとは異なり、FedGraphは準同型暗号をネイティブに取り入れてプライバシ保護を強化し、複数の物理マシンをまたいだ分散トレーニングを可能にし、将来のフェデレーショングラフ学習アルゴリズムのシステム設計をガイドする評価フレームワークを提供することにより、実用的なアプリケーションの開発を促進する。これらの最適化を活用し、FedGraphを用いて、1億のノードを持つグラフ上で動作する最初のプライバシ保護フェデレーション学習システムを実証する。 Federated graph learning is an emerging field with significant practical challenges. While many algorithms have been proposed to enhance the accuracy of training graph neural networks, e.g., for node classification problems on large graphs, in a federated manner, their system performance is often overlooked, even though it is crucial for real-world deployment. To address this gap, we introduce FedGraph, a research library built for practical distributed deployment and benchmarking in federated graph learning. FedGraph supports a range of state-of-the-art graph learning methods and includes built-in profiling tools to evaluate system performance, focusing specifically on communication and computation costs during training. Unlike existing benchmark platforms, FedGraph natively incorporates homomorphic encryption to enhance privacy preservation and facilitates the development of practical applications by enabling distributed training across multiple physical machines, providing an evaluation framework that can guide the system design of future federated graph learning algorithms. Leveraging these optimizations, we use FedGraph to demonstrate the first privacy-preserving federated learning system to run on graphs with 100 million nodes.	翻訳日:2024-11-05 14:50:13 公開日:2024-11-01
# 量子アドバンテージの暗号学的特徴付け Cryptographic Characterization of Quantum Advantage ( http://arxiv.org/abs/2410.00499v1 ) ライセンス: Link先を確認	Tomoyuki Morimae, Yuki Shirakawa, Takashi Yamakawa,	(参考訳) 量子計算の優位性は、量子コンピューティングでは容易だが古典的な計算では困難である計算タスクの存在を指す。無条件で量子優位性を示すことは、現在の複雑性理論の理解を超えており、従っていくつかの計算的な仮定が必要である。量子優位性にとって必要十分な複雑性の仮定は何か? 本稿では,古典的に安全な一方向パズル(OWPuzzs)が存在する場合,かつその場合に限り,非効率な検証者による量子性の証明(IV-PoQ)が存在することを示す。私たちが知る限りでは、量子優位性の完全な暗号学的特徴付けが得られたのはこれが初めてである。 IV-PoQは、サンプリングの優位性や探索の優位性など、以前に研究された様々な種類の量子優位性を捉えている。以前の研究 [Morimae and Yamakawa 2024] では、IV-PoQは一方向関数(OWF)から構築できることが示されたが、より弱い仮定によるIV-PoQの構築は未解決であった。私たちの結果はこの未解決問題を解決する。 OWPuzzsは、OWFよりも弱い多くの量子暗号プリミティブによって含意される、最も基本的な量子暗号プリミティブの1つである。したがって、IV-PoQと古典的に安全なOWPuzzsの同値性は、量子優位性がなければ、これらの基本的なプリミティブは存在しないことを強調する。この同値性はまた、量子優位性がOWPuzzsの応用の一例であることを意味する。コミットメントを除いて、OWPuzzsの応用は以前には知られていなかった。本結果は、量子優位性がOWPuzzsの別の応用であることを示す。さらに、これはOWPuzzsの最初の量子計算古典通信(QCCC)応用である。 Quantum computational advantage refers to an existence of computational tasks that are easy for quantum computing but hard for classical one. Unconditionally showing quantum advantage is beyond our current understanding of complexity theory, and therefore some computational assumptions are needed. Which complexity assumption is necessary and sufficient for quantum advantage? In this paper, we show that inefficient-verifier proofs of quantumness (IV-PoQ) exist if and only if classically-secure one-way puzzles (OWPuzzs) exist. As far as we know, this is the first time that a complete cryptographic characterization of quantum advantage is obtained. IV-PoQ capture various types of quantum advantage previously studied, such as sampling advantage and searching advantage. Previous work [Morimae and Yamakawa 2024] showed that IV-PoQ can be constructed from OWFs, but a construction of IV-PoQ from weaker assumptions was left open. Our result solves the open problem. OWPuzzs are one of the most fundamental quantum cryptographic primitives implied by many quantum cryptographic primitives weaker than one-way functions (OWFs). The equivalence between IV-PoQ and classically-secure OWPuzzs therefore highlights that if there is no quantum advantage, then these fundamental primitives do not exist. The equivalence also means that quantum advantage is an example of the applications of OWPuzzs. Except for commitments, no application of OWPuzzs was known before. Our result shows that quantum advantage is another application of OWPuzzs. Moreover, it is the first quantum computation classical communication (QCCC) application of OWPuzzs.	翻訳日:2024-11-05 05:16:55 公開日:2024-11-01
# 量子アドバンテージの暗号学的特徴付け Cryptographic Characterization of Quantum Advantage ( http://arxiv.org/abs/2410.00499v2 ) ライセンス: Link先を確認	Tomoyuki Morimae, Yuki Shirakawa, Takashi Yamakawa,	(参考訳) 量子計算の優位性は、量子コンピューティングでは容易だが古典的な計算では困難である計算タスクの存在を指す。無条件で量子優位性を示すことは、現在の複雑性理論の理解を超えており、従っていくつかの計算的な仮定が必要である。量子優位性にとって必要十分な複雑性の仮定は何か? 本稿では,古典的に安全な一方向パズル(OWPuzzs)が存在する場合,かつその場合に限り,非効率な検証者による量子性の証明(IV-PoQ)が存在することを示す。私たちが知る限りでは、量子優位性の完全な暗号学的特徴付けが得られたのはこれが初めてである。 IV-PoQは、サンプリングに基づく量子優位性や探索に基づく量子優位性など、以前に研究された様々な種類の量子優位性を捉えている。以前の研究 [Morimae and Yamakawa, Crypto 2024] では、IV-PoQは一方向関数(OWF)から構築できることが示されたが、より弱い仮定によるIV-PoQの構築は未解決であった。私たちの結果はこの未解決問題を解決する。 OWPuzzsは、OWFよりも弱い多くの量子暗号プリミティブによって含意される、最も基本的な量子暗号プリミティブの1つである。したがって、IV-PoQと古典的に安全なOWPuzzsの同値性は、量子優位性がなければ、これらの基本的なプリミティブは存在しないことを強調する。この同値性はまた、量子優位性がOWPuzzsの応用の一例であることを意味する。コミットメントを除いて、OWPuzzsの応用は以前には知られていなかった。本結果は、量子優位性がOWPuzzsの別の応用であることを示し、[Chung, Goldin, and Gray, Crypto 2024] の未解決問題を解決する。さらに、これはOWPuzzsの最初の量子計算古典通信(QCCC)応用である。 Quantum computational advantage refers to an existence of computational tasks that are easy for quantum computing but hard for classical one. Unconditionally showing quantum advantage is beyond our current understanding of complexity theory, and therefore some computational assumptions are needed. Which complexity assumption is necessary and sufficient for quantum advantage? In this paper, we show that inefficient-verifier proofs of quantumness (IV-PoQ) exist if and only if classically-secure one-way puzzles (OWPuzzs) exist. As far as we know, this is the first time that a complete cryptographic characterization of quantum advantage is obtained. IV-PoQ capture various types of quantum advantage previously studied, such as sampling-based quantum advantage and searching-based one. Previous work [Morimae and Yamakawa, Crypto 2024] showed that IV-PoQ can be constructed from OWFs, but a construction of IV-PoQ from weaker assumptions was left open. Our result solves the open problem. OWPuzzs are one of the most fundamental quantum cryptographic primitives implied by many quantum cryptographic primitives weaker than one-way functions (OWFs). The equivalence between IV-PoQ and classically-secure OWPuzzs therefore highlights that if there is no quantum advantage, then these fundamental primitives do not exist. The equivalence also means that quantum advantage is an example of the applications of OWPuzzs. Except for commitments, no application of OWPuzzs was known before. Our result shows that quantum advantage is another application of OWPuzzs, which solves the open question of [Chung, Goldin, and Gray, Crypto 2024]. Moreover, it is the first quantum-computation-classical-communication (QCCC) application of OWPuzzs.	翻訳日:2024-11-05 05:16:55 公開日:2024-11-01
# STONE: アクティブ3次元オブジェクト検出のためのサブモジュール最適化フレームワーク STONE: A Submodular Optimization Framework for Active 3D Object Detection ( http://arxiv.org/abs/2410.03918v2 ) ライセンス: Link先を確認	Ruiyu Mao, Sarthak Kumar Maharana, Rishabh K Iyer, Yunhui Guo,	(参考訳) 3Dオブジェクト検出は、自律運転やロボット工学など、様々な新興アプリケーションにとって基本的に重要である。正確な3Dオブジェクト検出器をトレーニングするための重要な要件は、大量のLiDARベースのポイントクラウドデータが利用可能であることである。残念なことに、ポイントクラウドデータのラベル付けは非常に難しい。本稿では,3次元物体検出装置のトレーニングにおけるラベル付けコストを大幅に削減する,統合されたアクティブな3次元物体検出フレームワークを提案する。本フレームワークは, アクティブな3次元物体検出の問題に特化して, サブモジュラー最適化の新たな定式化を基礎としている。特に, アクティブな3Dオブジェクト検出に関連する2つの基本的な課題に対処する: データ不均衡と, 様々な難易度を持つLiDARベースのポイントクラウドデータを含むデータの分布をカバーする必要性。大規模実験により,本手法は既存の能動学習法と比較して,高い計算効率で最先端の性能を達成できることが実証された。コードはhttps://github.com/RuiyuM/STONEで入手できる。 3D object detection is fundamentally important for various emerging applications, including autonomous driving and robotics. A key requirement for training an accurate 3D object detector is the availability of a large amount of LiDAR-based point cloud data. Unfortunately, labeling point cloud data is extremely challenging, as accurate 3D bounding boxes and semantic labels are required for each potential object. This paper proposes a unified active 3D object detection framework, for greatly reducing the labeling cost of training 3D object detectors. Our framework is based on a novel formulation of submodular optimization, specifically tailored to the problem of active 3D object detection. In particular, we address two fundamental challenges associated with active 3D object detection: data imbalance and the need to cover the distribution of the data, including LiDAR-based point cloud data of varying difficulty levels. Extensive experiments demonstrate that our method achieves state-of-the-art performance with high computational efficiency compared to existing active learning methods. The code is available at https://github.com/RuiyuM/STONE.	翻訳日:2024-11-04 21:09:23 公開日:2024-11-01
# オーバーコンプリート画素を用いたコントラスト学習による動きブラインド画像の配向 Aligning Motion-Blurred Images Using Contrastive Learning on Overcomplete Pixels ( http://arxiv.org/abs/2410.07410v2 ) ライセンス: Link先を確認	Leonid Pogorelyuk, Stefan T. Radev,	(参考訳) 動きのぼかしに不変なオーバーコンプリート画素レベルの特徴を学習するための新しいコントラスト的目的を提案する。他の不変性(例えば、ポーズ、照明、天候)は、自己監督訓練中にラベルのない画像に対応する変換を適用することで学習することができる。我々の目的を訓練した単純なU-Netは、現実的で困難な条件下で撮影される見えないビデオのフレームを移動カメラに合わせるのに有用なローカル機能を生み出すことができることを実証する。また、慎重にデザインされた玩具の例を用いて、画像中のオブジェクトの同一性やそれらのオブジェクトに対する画素座標を符号化できることも示す。 We propose a new contrastive objective for learning overcomplete pixel-level features that are invariant to motion blur. Other invariances (e.g., pose, illumination, or weather) can be learned by applying the corresponding transformations on unlabeled images during self-supervised training. We showcase that a simple U-Net trained with our objective can produce local features useful for aligning the frames of an unseen video captured with a moving camera under realistic and challenging conditions. Using a carefully designed toy example, we also show that the overcomplete pixels can encode the identity of objects in an image and the pixel coordinates relative to these objects.	翻訳日:2024-11-04 21:09:23 公開日:2024-11-01
# In-Context Transfer Learning: 類似タスクの転送によるデモンストレーション合成 In-Context Transfer Learning: Demonstration Synthesis by Transferring Similar Tasks ( http://arxiv.org/abs/2410.01548v1 ) ライセンス: Link先を確認	Dingzirui Wang, Xuangliang Zhang, Qiguang Chen, Longxu Dou, Xiao Xu, Rongyu Cao, Yingwei Ma, Qingfu Zhu, Wanxiang Che, Binhua Li, Fei Huang, Yongbin Li,	(参考訳) In-context Learning (ICL) は、対象タスクのデモを与えることで、大規模言語モデル(LLM)が様々なタスクに適応するのを助ける効果的なアプローチである。デモのラベル付けコストが高いことを考えると、多くの手法がLLMを用いてスクラッチからデモを合成することを提案している。しかし、スクラッチから合成されたデモの質は、LLMの能力と知識によって制限される。そこで本稿では,転移学習にヒントを得て,類似するソースタスクのラベル付きデモを転送することで対象タスクのデモを合成するICTL(In-Context Transfer Learning)を提案する。 ICTLはソースサンプリングとターゲット転送の2つのステップから構成される。まず,転送エラーを最小化する最適化目標を定義し,対象タスクに類似したソースデモをサンプリングする。次に,LLMを用いてサンプリングしたソースデモを対象タスクに転送し,対象タスクの定義と形式に一致させる。 Super-NIでの実験の結果,ICTLはスクラッチからの合成を平均2.0%上回り,本手法の有効性が示された。 In-context learning (ICL) is an effective approach to help large language models (LLMs) adapt to various tasks by providing demonstrations of the target task. Considering the high cost of labeling demonstrations, many methods propose synthesizing demonstrations from scratch using LLMs. However, the quality of the demonstrations synthesized from scratch is limited by the capabilities and knowledge of LLMs. To address this, inspired by transfer learning, we propose In-Context Transfer Learning (ICTL), which synthesizes target task demonstrations by transferring labeled demonstrations from similar source tasks. ICTL consists of two steps: source sampling and target transfer. First, we define an optimization objective, which minimizes transfer error to sample source demonstrations similar to the target task. Then, we employ LLMs to transfer the sampled source demonstrations to the target task, matching the definition and format of the target task. Experiments on Super-NI show that ICTL outperforms synthesis from scratch by 2.0% on average, demonstrating the effectiveness of our method.	翻訳日:2024-11-04 17:04:38 公開日:2024-11-01
# In-Context Transfer Learning: 類似タスクの転送によるデモンストレーション合成 In-Context Transfer Learning: Demonstration Synthesis by Transferring Similar Tasks ( http://arxiv.org/abs/2410.01548v2 ) ライセンス: Link先を確認	Dingzirui Wang, Xuanliang Zhang, Qiguang Chen, Longxu Dou, Xiao Xu, Rongyu Cao, Yingwei Ma, Qingfu Zhu, Wanxiang Che, Binhua Li, Fei Huang, Yongbin Li,	(参考訳) In-context Learning (ICL) は、対象タスクのデモを与えることで、大規模言語モデル(LLM)が様々なタスクに適応するのを助ける効果的なアプローチである。デモのラベル付けコストが高いことを考えると、多くの手法がLLMを用いてスクラッチからデモを合成することを提案している。しかし、スクラッチから合成されたデモの質は、LLMの能力と知識によって制限される。そこで本稿では,転移学習にヒントを得て,類似するソースタスクのラベル付きデモを転送することで対象タスクのデモを合成するICTL(In-Context Transfer Learning)を提案する。 ICTLはソースサンプリングとターゲット転送の2つのステップから構成される。まず,転送エラーを最小化する最適化目標を定義し,対象タスクに類似したソースデモをサンプリングする。次に,LLMを用いてサンプリングしたソースデモを対象タスクに転送し,対象タスクの定義と形式に一致させる。 Super-NIでの実験の結果,ICTLはスクラッチからの合成を平均2.0%上回り,本手法の有効性が示された。 In-context learning (ICL) is an effective approach to help large language models (LLMs) adapt to various tasks by providing demonstrations of the target task. Considering the high cost of labeling demonstrations, many methods propose synthesizing demonstrations from scratch using LLMs. However, the quality of the demonstrations synthesized from scratch is limited by the capabilities and knowledge of LLMs. To address this, inspired by transfer learning, we propose In-Context Transfer Learning (ICTL), which synthesizes target task demonstrations by transferring labeled demonstrations from similar source tasks. ICTL consists of two steps: source sampling and target transfer. First, we define an optimization objective, which minimizes transfer error to sample source demonstrations similar to the target task. Then, we employ LLMs to transfer the sampled source demonstrations to the target task, matching the definition and format of the target task. Experiments on Super-NI show that ICTL outperforms synthesis from scratch by 2.0% on average, demonstrating the effectiveness of our method.	翻訳日:2024-11-04 17:04:38 公開日:2024-11-01
# 量子コミットメントと量子一方向性のオラクル分離 Oracle Separation Between Quantum Commitments and Quantum One-wayness ( http://arxiv.org/abs/2410.03358v2 ) ライセンス: Link先を確認	John Bostanci, Boyang Chen, Barak Nehoran,	(参考訳) 量子コミットメントは存在するが、(効率的に検証可能な)一方向状態生成器は存在しないような、ユニタリな量子オラクルが存在することを示す。どちらも、暗号の最小仮定、すなわち計算量的暗号のすべてから含意される最も弱い暗号仮定として、一方向関数を置き換える候補と広く考えられている。最近の研究は、一方向状態生成器からコミットメントを構築できることを示したが、逆方向は未解決のままであった。我々の結果はあらゆるブラックボックス構成を除外することでこの重要な未解決問題に決着をつけ、量子コミットメント(およびその同値クラスであるEFI対、量子紛失通信、セキュアな量子多パーティ計算)が、既知のすべての暗号プリミティブの中で厳密に最も弱いと考えられることを示唆している。 We show that there exists a unitary quantum oracle relative to which quantum commitments exist but no (efficiently verifiable) one-way state generators exist. Both have been widely considered candidates for replacing one-way functions as the minimal assumption for cryptography: the weakest cryptographic assumption implied by all of computational cryptography. Recent work has shown that commitments can be constructed from one-way state generators, but the other direction has remained open. Our results rule out any black-box construction, and thus settle this crucial open problem, suggesting that quantum commitments (as well as its equivalency class of EFI pairs, quantum oblivious transfer, and secure quantum multiparty computation) appear to be strictly weakest among all known cryptographic primitives.	翻訳日:2024-11-04 14:45:01 公開日:2024-11-01
# 目標認識型コントラスト損失の増大によるノード表現の改善 Improving Node Representation by Boosting Target-Aware Contrastive Loss ( http://arxiv.org/abs/2410.03901v2 ) ライセンス: Link先を確認	Ying-Chun Lin, Jennifer Neville,	(参考訳) グラフは、複雑な接続をキャプチャするノードとエッジを持つエンティティ間の複雑な関係をモデル化する。ノード表現学習では、ノードを低次元の埋め込みに変換する。これらの埋め込みは典型的には下流タスクの特徴量として使用される。そのため、その品質はタスクのパフォーマンスに大きな影響を与える。ノード表現学習のための既存のアプローチは、(半)教師あり、教師なし、自己教師ありのパラダイムにまたがる。グラフ領域では、(半)教師あり学習はクラスラベルのみに基づいてモデルを最適化することが多く、他の豊富なグラフ信号を無視するため、一般化が制限される。自己教師あり学習や教師なし学習は、基礎となるグラフ信号をよりよくキャプチャする表現を生成するが、これらのキャプチャされた信号が下流のターゲットタスクにどの程度有用であるかは様々である。このギャップを埋めるために,自己教師あり学習プロセスにおいて目標タスクとノード表現間の相互情報量を最大化し,目標タスク性能を向上させることを目的としたターゲット認識コントラスト学習(Target-Aware Contrastive Learning, Target-aware CL)を導入する。これは、提案するTarget-Aware Contrastive Loss (XTCL) のための適切な正のサンプルをサンプリングする、XGBoost Sampler (XGSampler) というサンプリング関数によって実現される。 XTCLを最小化することにより、Target-aware CLは目標タスクとノード表現間の相互情報量を増大させ、モデルの一般化が向上する。さらに、XGSamplerは適切な正のサンプルをサンプリングするための重みを示すことによって、各信号の解釈可能性を高める。実験により,XTCLはノード分類とリンク予測の2つの目標タスクにおいて,最先端モデルと比較して性能を著しく向上することを示した。 Graphs model complex relationships between entities, with nodes and edges capturing intricate connections. Node representation learning involves transforming nodes into low-dimensional embeddings. These embeddings are typically used as features for downstream tasks. Therefore, their quality has a significant impact on task performance. Existing approaches for node representation learning span (semi-)supervised, unsupervised, and self-supervised paradigms. In graph domains, (semi-)supervised learning often only optimizes models based on class labels, neglecting other abundant graph signals, which limits generalization. While self-supervised or unsupervised learning produces representations that better capture underlying graph signals, the usefulness of these captured signals for downstream target tasks can vary. To bridge this gap, we introduce Target-Aware Contrastive Learning (Target-aware CL) which aims to enhance target task performance by maximizing the mutual information between the target task and node representations with a self-supervised learning process. This is achieved through a sampling function, XGBoost Sampler (XGSampler), to sample proper positive examples for the proposed Target-Aware Contrastive Loss (XTCL). By minimizing XTCL, Target-aware CL increases the mutual information between the target task and node representations, such that model generalization is improved. Additionally, XGSampler enhances the interpretability of each signal by showing the weights for sampling the proper positive examples. We show experimentally that XTCL significantly improves the performance on two target tasks: node classification and link prediction tasks, compared to state-of-the-art models.	翻訳日:2024-11-04 14:45:01 公開日:2024-11-01
# 限られたラベルを持つソーシャルメディア上での自殺検出のための大規模言語モデルの導入 Leveraging Large Language Models for Suicide Detection on Social Media with Limited Labels ( http://arxiv.org/abs/2410.04501v3 ) ライセンス: Link先を確認	Vy Nguyen, Chau Pham,	(参考訳) 自殺思考の頻度の増加は、早期発見と介入の重要性を強調している。ユーザが個人的な経験を共有し、助けを求めることの多いソーシャルメディアプラットフォームは、リスクのある個人を特定するために利用できる可能性がある。しかし、日々の投稿が大量にあるため、手作業によるレビューは非現実的である。本稿では,テキストベースのソーシャルメディア投稿における自殺的内容を自動的に検出するためのLarge Language Models (LLMs) の利用について検討する。ラベルの精度を高めるため,従来の分類微調整技術とともに,LLMへのプロンプティングによるラベルなしデータの擬似ラベル生成手法を提案する。強力な自殺検出モデルを構築するために,Qwen2-72B-Instructへのプロンプティングと,Llama3-8B,Llama3.1-8B,Gemma2-9Bなどの微調整モデルを組み合わせたアンサンブル手法を開発した。我々は、IEEE Big Data 2024 Big Data Cupのトラックである Suicide Ideation Detection on Social Media Challenge のデータセットでこのアプローチを評価した。さらに、異なるモデルや微調整戦略が検出性能に与える影響を総合的に分析する。実験の結果,アンサンブルモデルは個々のモデルと比較して検出精度を5ポイント向上させた。公開テストセットで0.770、プライベートテストセットで0.731の重み付きF1スコアを達成し、ソーシャルメディアで自殺内容を特定するための有望なソリューションを提供する。解析の結果,LLMの選択がプロンプティングの性能に影響を及ぼし,より大きなモデルほど精度が高いことがわかった。私たちのコードとチェックポイントはhttps://github.com/khanhvynguyen/Suicide_Detection_LLMsで公開されています。 The increasing frequency of suicidal thoughts highlights the importance of early detection and intervention. Social media platforms, where users often share personal experiences and seek help, could be utilized to identify individuals at risk. However, the large volume of daily posts makes manual review impractical. This paper explores the use of Large Language Models (LLMs) to automatically detect suicidal content in text-based social media posts. We propose a novel method for generating pseudo-labels for unlabeled data by prompting LLMs, along with traditional classification fine-tuning techniques to enhance label accuracy. To create a strong suicide detection model, we develop an ensemble approach involving prompting with Qwen2-72B-Instruct, and using fine-tuned models such as Llama3-8B, Llama3.1-8B, and Gemma2-9B. We evaluate our approach on the dataset of the Suicide Ideation Detection on Social Media Challenge, a track of the IEEE Big Data 2024 Big Data Cup. Additionally, we conduct a comprehensive analysis to assess the impact of different models and fine-tuning strategies on detection performance. Experimental results show that the ensemble model significantly improves the detection accuracy, by 5 percentage points compared with the individual models. It achieves a weighted F1 score of 0.770 on the public test set, and 0.731 on the private test set, providing a promising solution for identifying suicidal content in social media. Our analysis shows that the choice of LLMs affects the prompting performance, with larger models providing better accuracy. Our code and checkpoints are publicly available at https://github.com/khanhvynguyen/Suicide_Detection_LLMs.	翻訳日:2024-11-04 14:45:01 公開日:2024-11-01
# アウトカム非依存型MNAR交絡下における因果効果の境界と感度解析 Bounds and Sensitivity Analysis of the Causal Effect Under Outcome-Independent MNAR Confounding ( http://arxiv.org/abs/2410.06726v2 ) ライセンス: Link先を確認	Jose M. Peña,	(参考訳) 交絡因子がランダムでない欠測(MNAR)である場合に、曝露下と非曝露下における潜在的結果の確率の間の任意のコントラストについて、仮定に依存しない境界を報告する。欠測メカニズムは結果非依存であると仮定する。また,この境界を補完する感度解析手法も報告する。 We report assumption-free bounds for any contrast between the probabilities of the potential outcome under exposure and non-exposure when the confounders are missing not at random. We assume that the missingness mechanism is outcome-independent. We also report a sensitivity analysis method to complement our bounds.	翻訳日:2024-11-04 14:45:01 公開日:2024-11-01
# 多言語自己改善のための言語不均衡駆動リワード Language Imbalance Driven Rewarding for Multilingual Self-improving ( http://arxiv.org/abs/2410.08964v2 ) ライセンス: Link先を確認	Wen Yang, Junhong Wu, Chen Wang, Chengqing Zong, Jiajun Zhang,	(参考訳) 大規模言語モデル(LLM)は多くのタスクで最先端のパフォーマンスを達成した。しかし、これらの進歩は主に英語や中国語のような「第一級」の言語に恩恵をもたらしており、他の多くの言語は十分にカバーされていない。この不均衡は、より広範なアプリケーションを制限する一方で、言語間の自然な選好ランキングを生成し、自己改善的な方法でLLMの多言語能力をブートストラップする機会を提供する。そこで我々は, LLM内の支配的言語と非支配的言語との間の固有の不均衡を報酬信号として活用する Language Imbalance Driven Rewarding を提案する。反復的なDPO訓練により、このアプローチが非支配言語におけるLLM性能を高めるだけでなく、支配言語の能力も向上させ、その結果として反復的な報酬信号が得られることを示した。このアプローチを2回反復してMeta-Llama-3-8B-Instructを微調整することで、命令追従タスクと算術推論タスクの多言語パフォーマンスが継続的に改善され、X-AlpacaEvalのリーダーボードでは勝率が平均7.46%、MGSMベンチマークでは精度が13.9%向上した。この研究は初期の探索として機能し、LLMの多言語自己改善への道を開くものである。 Large Language Models (LLMs) have achieved state-of-the-art performance across numerous tasks. However, these advancements have predominantly benefited "first-class" languages such as English and Chinese, leaving many other languages underrepresented. This imbalance, while limiting broader applications, generates a natural preference ranking between languages, offering an opportunity to bootstrap the multilingual capabilities of LLM in a self-improving manner. Thus, we propose Language Imbalance Driven Rewarding, where the inherent imbalance between dominant and non-dominant languages within LLMs is leveraged as a reward signal. Iterative DPO training demonstrates that this approach not only enhances LLM performance in non-dominant languages but also improves the dominant language's capacity, thereby yielding an iterative reward signal. Fine-tuning Meta-Llama-3-8B-Instruct over two iterations of this approach results in continuous improvements in multilingual performance across instruction-following and arithmetic reasoning tasks, evidenced by an average improvement of 7.46% win rate on the X-AlpacaEval leaderboard and 13.9% accuracy on the MGSM benchmark. This work serves as an initial exploration, paving the way for multilingual self-improvement of LLMs.	翻訳日:2024-11-04 14:45:01 公開日:2024-11-01

# HENASY:Egocentric Video-Language Modelのためのシーンエンティティの集合学習

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model ( http://arxiv.org/abs/2406.00307v3 )

ライセンス: Link先を確認

Khoa Vo, Thinh Phan, Kashu Yamazaki, Minh Tran, Ngan Le,

(参考訳) 現在のビデオ言語モデル(VLM)は、ビデオと言語モダリティ間のインスタンスレベルのアライメントに大きく依存しており、(1)視覚的推論は、人間が一人称視点で行う自然な認識に反し、推論の解釈の欠如を招き、(2)学習は、2つのモダリティ間の固有のきめ細かい関係を捉えるのに限られている。本稿では、人間の知覚からインスピレーションを得て、エゴセントリックな映像表現のための構成的アプローチを探求する。 HENASY (Hierarchical ENtities ASsemblY) を導入し、時間を通して動的に進化するシーンエンティティを明示的にアレンジし、ビデオ表現とそれらの関係をモデル化する時空間トークングループ化機構を含む。構成構造理解を活用することで、HENASYは、自由形式のテキストクエリによる視覚的グラウンドリングを通じて、強い解釈性を持つ。さらに、エンティティ中心の理解を促進するために、多義的なコントラスト損失のスイートについても検討する。これは、ビデオナレーション、名詞、動詞のアライメントという3つのアライメントタイプから構成される。提案手法は,ビデオ/テキスト検索,アクション認識,マルチチョイスクエリ,自然言語クエリ,モーメントクエリを含む,ゼロショット転送やビデオ/テキスト表現による5つの下流タスクの競合性能を維持しながら,定量的および定性的な実験において高い解釈性を示す。

Current video-language models (VLMs) rely extensively on instance-level alignment between video and language modalities, which presents two major limitations: (1) visual reasoning disobeys the natural perception that humans do in first-person perspective, leading to a lack of reasoning interpretation; and (2) learning is limited in capturing inherent fine-grained relationships between two modalities. In this paper, we take inspiration from human perception and explore a compositional approach for egocentric video representation. We introduce HENASY (Hierarchical ENtities ASsemblY), which includes a spatiotemporal token grouping mechanism to explicitly assemble dynamically evolving scene entities through time and model their relationship for video representation. By leveraging compositional structure understanding, HENASY possesses strong interpretability via visual grounding with free-form text queries. We further explore a suite of multi-grained contrastive losses to facilitate entity-centric understandings. This comprises three alignment types: video-narration, noun-entity, and verb-entities alignments. Our method demonstrates strong interpretability in both quantitative and qualitative experiments, while maintaining competitive performance on five downstream tasks via zero-shot transfer or as video/text representation, including video/text retrieval, action recognition, multi-choice query, natural language query, and moments query.

翻訳日:2024-11-09 01:56:09 公開日:2024-11-01

# DuQuant: 二重変換による外れ値の分散が、より強力な量子化LLMを実現

DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs ( http://arxiv.org/abs/2406.01721v2 )

ライセンス: Link先を確認

Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, Ying Wei,

(参考訳) 大規模言語モデル(LLM)の量子化は、特に効率的な低ビット表現を妨げる外れ値アクティベーションの存在により、大きな課題に直面している。従来のアプローチは主に、全トークンにわたって比較的大きな値を持つアクティベーションである$\textit{Normal Outliers}$に対処する。しかし、これらの手法は、はるかに大きな値を示す$\textit{Massive Outliers}$の平滑化に苦慮し、低ビット量子化の大幅な性能低下につながる。本稿では, 回転変換と置換変換を利用して, 大規模な外れ値と通常の外れ値の両方をより効果的に緩和する新しいアプローチであるDuQuantを紹介する。まず、DuQuantは、特定の外れ値次元を事前知識として回転行列を構築し、ブロック単位の回転により外れ値を隣接チャネルへ再分配する。第2に,ジグザグ置換を用いてブロック間の外れ値の分布のバランスをとり,ブロック単位の分散を低減する。その後の回転はアクティベーションランドスケープをさらに平滑化し、モデル性能を高める。 DuQuantは、量子化プロセスを単純化し、外れ値の管理に優れ、4ビットの重み・アクティベーション量子化であっても、複数のタスクにおいて、さまざまなサイズやタイプのLLMに対して最先端のベースラインよりも優れている。私たちのコードはhttps://github.com/Hsu1023/DuQuantから入手可能です。

Quantization of large language models (LLMs) faces significant challenges, particularly due to the presence of outlier activations that impede efficient low-bit representation. Traditional approaches predominantly address $\textit{Normal Outliers}$, which are activations across all tokens with relatively large magnitudes. However, these methods struggle with smoothing $\textit{Massive Outliers}$ that display significantly larger values, which leads to significant performance degradation in low-bit quantization. In this paper, we introduce DuQuant, a novel approach that utilizes rotation and permutation transformations to more effectively mitigate both massive and normal outliers. First, DuQuant starts by constructing rotation matrices, using specific outlier dimensions as prior knowledge, to redistribute outliers to adjacent channels by block-wise rotation. Second, we further employ a zigzag permutation to balance the distribution of outliers across blocks, thereby reducing block-wise variance. A subsequent rotation further smooths the activation landscape, enhancing model performance. DuQuant simplifies the quantization process and excels in managing outliers, outperforming the state-of-the-art baselines across various sizes and types of LLMs on multiple tasks, even with 4-bit weight-activation quantization. Our code is available at https://github.com/Hsu1023/DuQuant.

翻訳日:2024-11-09 01:56:09 公開日:2024-11-01
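
The zigzag permutation step in the DuQuant abstract can be pictured with a toy sketch. This is not the authors' code: the helper names `zigzag_permutation` and `block_means`, the toy magnitudes, and the choice to measure balance as the variance of per-block mean magnitudes are all assumptions made for illustration. The idea shown is only the one the abstract states, namely dealing large-magnitude channels to blocks in a back-and-forth order so that no block collects all the outliers.

```python
from statistics import pvariance

def zigzag_permutation(magnitudes, num_blocks):
    """Reorder channel indices so large channels are spread across blocks."""
    n = len(magnitudes)
    if n % num_blocks != 0:
        raise ValueError("channel count must be divisible by num_blocks")
    # Channel indices sorted from largest to smallest magnitude.
    order = sorted(range(n), key=lambda i: magnitudes[i], reverse=True)
    blocks = [[] for _ in range(num_blocks)]
    for rank, channel in enumerate(order):
        sweep, pos = divmod(rank, num_blocks)
        # Even sweeps go left-to-right, odd sweeps right-to-left.
        target = pos if sweep % 2 == 0 else num_blocks - 1 - pos
        blocks[target].append(channel)
    # Concatenating the blocks gives the permuted channel order.
    return [channel for block in blocks for channel in block]

def block_means(magnitudes, perm, num_blocks):
    """Mean magnitude of each contiguous block under a channel order."""
    size = len(perm) // num_blocks
    return [
        sum(magnitudes[c] for c in perm[b * size:(b + 1) * size]) / size
        for b in range(num_blocks)
    ]

# Toy data: both outliers start out in the first block.
mags = [100.0, 90.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0]
identity = list(range(len(mags)))
perm = zigzag_permutation(mags, num_blocks=2)

var_before = pvariance(block_means(mags, identity, 2))
var_after = pvariance(block_means(mags, perm, 2))
```

On this toy input the two outliers end up in different blocks, so the variance between block means falls sharply. In the method described above, a further rotation would then smooth values inside each block.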

# ゴール制約付き双方向探索による両端合成計画

Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search ( http://arxiv.org/abs/2407.06334v2 )

ライセンス: Link先を確認

Kevin Yu, Jihye Roh, Ziang Li, Wenhao Gao, Runzhong Wang, Connor W. Coley,

(参考訳) コンピュータ支援合成計画(CASP)アルゴリズムは、低から中程度の複雑さの分子への逆合成経路を計画する専門家レベルの能力を示している。しかし、現在の探索法は、任意のビルディングブロックに到達すれば十分であると仮定しており、特定の分子の使用が望まれるという一般的な現実世界の制約に対処できていない。そこで,本論文では,出発物質の制約を伴う合成計画の定式化を提示する。本定式化のもとで,制約の充足を保証するために,ターゲットからの展開と目標とする出発物質からの展開をインターリーブする双方向グラフ探索方式に基づく新しいCASPアルゴリズムであるDouble-Ended Synthesis Planning (DESP)を提案する。探索アルゴリズムは、有効な化学反応の部分的に観察されたハイパーグラフからオフラインで学習した目標条件付きコストネットワークによって導かれる。複数の新しいベンチマークにおいて、合成計画を専門家の目標に向けてバイアスすることで、DESPが解決率の向上と探索展開数の削減に有用であることを実証する。 DESPは既存のワンステップ逆合成モデルを利用することができ、これらのワンステップモデルの性能が向上するにつれて、その性能も向上すると予想する。

Computer-aided synthesis planning (CASP) algorithms have demonstrated expert-level abilities in planning retrosynthetic routes to molecules of low to moderate complexity. However, current search methods assume the sufficiency of reaching arbitrary building blocks, failing to address the common real-world constraint where using specific molecules is desired. To this end, we present a formulation of synthesis planning with starting material constraints. Under this formulation, we propose Double-Ended Synthesis Planning (DESP), a novel CASP algorithm under a bidirectional graph search scheme that interleaves expansions from the target and from the goal starting materials to ensure constraint satisfiability. The search algorithm is guided by a goal-conditioned cost network learned offline from a partially observed hypergraph of valid chemical reactions. We demonstrate the utility of DESP in improving solve rates and reducing the number of search expansions by biasing synthesis planning towards expert goals on multiple new benchmarks. DESP can make use of existing one-step retrosynthesis models, and we anticipate its performance to scale as these one-step model capabilities improve.

翻訳日:2024-11-08 23:13:33 公開日:2024-11-01

# バックストーリーのアンソロジーによる言語モデルのための仮想ペルソナ

Virtual Personas for Language Models via an Anthology of Backstories ( http://arxiv.org/abs/2407.06576v2 )

ライセンス: Link先を確認

Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Kohen Behar, David M. Chan,

(参考訳) 大規模言語モデル(LLM)は、何百万人もの異なる著者によって書かれた膨大なテキストリポジトリから訓練され、人間の特性の多様性を反映している。これらのモデルは、行動学的研究において、人間の被験者の近似として使われる可能性があるが、これまでの取り組みでは、個々の人間のユーザーに合わせてモデル応答を誘導することに限界があった。本研究では,「バックストーリー」と呼ぶオープンエンドのライフストーリーを活用することで,LLMを特定の仮想ペルソナに条件付ける手法であるAnthologyを紹介する。本手法は,多様なサブ集団のより良い表現を確保しつつ,実験結果の一貫性と信頼性を高めることを示す。 Pew Research CenterのAmerican Trends Panel (ATP) の一環として実施された、全国を代表する3つの人間調査において、Anthology は人間の回答分布との一致を最大18%改善し、一貫性の指標を27%改善することを示した。私たちのコードと生成されたバックストーリーはhttps://github.com/CannyLab/anthologyで公開されています。

Large language models (LLMs) are trained from vast repositories of text authored by millions of distinct authors, reflecting an enormous diversity of human traits. While these models bear the potential to be used as approximations of human subjects in behavioral studies, prior efforts have been limited in steering model responses to match individual human users. In this work, we introduce "Anthology", a method for conditioning LLMs to particular virtual personas by harnessing open-ended life narratives, which we refer to as "backstories." We show that our methodology enhances the consistency and reliability of experimental outcomes while ensuring better representation of diverse sub-populations. Across three nationally representative human surveys conducted as part of Pew Research Center's American Trends Panel (ATP), we demonstrate that Anthology achieves up to 18% improvement in matching the response distributions of human respondents and 27% improvement in consistency metrics. Our code and generated backstories are available at https://github.com/CannyLab/anthology.

翻訳日:2024-11-08 23:02:19 公開日:2024-11-01

# バックストーリーのアンソロジーによる言語モデルのための仮想ペルソナ

Virtual Personas for Language Models via an Anthology of Backstories ( http://arxiv.org/abs/2407.06576v3 )

ライセンス: Link先を確認

Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Kohen Behar, David M. Chan,

翻訳日:2024-11-08 23:02:19 公開日:2024-11-01

# サイバーセキュリティ環境におけるモデル非依存クリーンラベルバックドア緩和

Model-agnostic clean-label backdoor mitigation in cybersecurity environments ( http://arxiv.org/abs/2407.08159v3 )

ライセンス: Link先を確認

Giorgio Severi, Simona Boboila, John Holodnak, Kendra Kratkiewicz, Rauf Izmailov, Michael J. De Lucia, Alina Oprea,

(参考訳) 機械学習モデルのトレーニングフェーズは、特にサイバーセキュリティにおける微妙なステップである。近年の研究では、トレーニングラベルを変更することなく、セキュリティ分類タスク用に設計されたモデルにバックドアを注入する、一連の悪質なトレーニングタイム攻撃が表面化している。本研究では,サイバーセキュリティの脅威モデルに対する洞察を利用して,これらのクリーンラベル中毒攻撃を効果的に軽減し,モデルユーティリティを保ちながら,新たな手法を提案する。慎重に選択された特徴部分空間上で密度に基づくクラスタリングを行い、新たな反復的なスコアリング手順によって不審なクラスタを段階的に分離することにより、既存のバックドア防衛文献に共通する前提の多くを必要とせずに攻撃を緩和することができる。提案手法の汎用性を示すため,ネットワークフローの分類とマルウェアの分類という,2つの古典的サイバーセキュリティデータに対するクリーンラベルモデルに依存しない2つの攻撃について,勾配強化とニューラルネットワークモデルを用いて評価を行った。

The training phase of machine learning models is a delicate step, especially in cybersecurity contexts. Recent research has surfaced a series of insidious training-time attacks that inject backdoors in models designed for security classification tasks without altering the training labels. With this work, we propose new techniques that leverage insights in cybersecurity threat models to effectively mitigate these clean-label poisoning attacks, while preserving the model utility. By performing density-based clustering on a carefully chosen feature subspace, and progressively isolating the suspicious clusters through a novel iterative scoring procedure, our defensive mechanism can mitigate the attacks without requiring many of the common assumptions in the existing backdoor defense literature. To show the generality of our proposed mitigation, we evaluate it on two clean-label model-agnostic attacks on two different classic cybersecurity data modalities: network flows classification and malware classification, using gradient boosting and neural network models.

翻訳日:2024-11-08 22:29:08 公開日:2024-11-01

# MAVIS: 自動データエンジンによる数学的ビジュアルインストラクションチューニング

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine ( http://arxiv.org/abs/2407.08739v2 )

ライセンス: Link先を確認

Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Ziyu Guo, Shicheng Li, Yichi Zhang, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Chunyuan Li, Hongsheng Li,

(参考訳) MLLM(Multi-modal Large Language Models)の数学的能力は、数学図の視覚的エンコーディング、図と言語のアライメント、思考の連鎖(Chain-of-Thought, CoT)推論という3つの領域に改善の余地があり、まだ十分に調査されていない。これにより、効果的なトレーニングパラダイムと、詳細なCoTの根拠を備えた大規模で包括的なデータセットへの緊急の需要が生まれるが、そのようなデータセットは収集が難しく、手動でのアノテーションにもコストがかかる。この問題に対処するために,数学的な視覚データセットを効率的に作成する自動データエンジンを備えた,MLLMのための数学的視覚命令チューニング(MAthematical VISual instruction tuning)パイプラインであるMAVISを提案する。我々は,データ生成プロセスが人間の介入やGPT APIの使用から完全に独立するように設計し,同時に図とキャプションの対応,質問と回答の正しさ,CoT推論の品質を保証した。このアプローチにより,MAVIS-Caption(558Kの図・キャプションペア)とMAVIS-Instruct(CoTの根拠付きの834Kの視覚数学問題)の2つのデータセットをキュレートし,MLLMをスクラッチからトレーニングするための4つの段階を提案する。まず,MAVIS-Captionを用いて,図の視覚符号化の改善に適した対照学習により,数学固有の視覚エンコーダ(CLIP-Math)を微調整する。第2に、MAVIS-Captionを利用して、CLIP-Mathをプロジェクション層によって大規模言語モデル(LLM)に整合させ、数学的領域における視覚言語アライメントを向上させる。第3に、堅牢な問題解決スキルのための命令チューニングにMAVIS-Instructを採用し、結果のモデルをMAVIS-7Bと呼ぶ。第4に、モデルのCoT能力を高めるために直接選好最適化(DPO)を適用し、ステップワイズな推論性能をさらに改善する。コードとデータはhttps://github.com/ZrrSkywalker/MAVISで公開される。

The mathematical capabilities of Multi-modal Large Language Models (MLLMs) remain under-explored with three areas to be improved: visual encoding of math diagrams, diagram-language alignment, and chain-of-thought (CoT) reasoning. This draws forth an urgent demand for an effective training paradigm and a large-scale, comprehensive dataset with detailed CoT rationales, which is challenging to collect and costly to annotate manually. To tackle this issue, we propose MAVIS, a MAthematical VISual instruction tuning pipeline for MLLMs, featuring an automatic data engine to efficiently create mathematical visual datasets. We design the data generation process to be entirely independent of human intervention or GPT API usage, while ensuring the diagram-caption correspondence, question-answer correctness, and CoT reasoning quality. With this approach, we curate two datasets, MAVIS-Caption (558K diagram-caption pairs) and MAVIS-Instruct (834K visual math problems with CoT rationales), and propose four progressive stages for training MLLMs from scratch. First, we utilize MAVIS-Caption to fine-tune a math-specific vision encoder (CLIP-Math) through contrastive learning, tailored for improved diagram visual encoding. Second, we also leverage MAVIS-Caption to align the CLIP-Math with a large language model (LLM) by a projection layer, enhancing vision-language alignment in mathematical domains. Third, we adopt MAVIS-Instruct to perform the instruction tuning for robust problem-solving skills, and term the resulting model as MAVIS-7B. Fourth, we apply Direct Preference Optimization (DPO) to enhance the CoT capabilities of our model, further refining its step-wise reasoning performance. Code and data will be released at https://github.com/ZrrSkywalker/MAVIS

翻訳日:2024-11-08 22:17:54 公開日:2024-11-01

# DPEC:低光度画像明瞭度向上のためのデュアルパス誤差補償法

DPEC: Dual-Path Error Compensation Method for Enhanced Low-Light Image Clarity ( http://arxiv.org/abs/2407.09553v3 )

ライセンス: Link先を確認

Shuang Wang, Qianwen Lu, Yihe Nie, Qingchuan Tao, Yanmei Yu,

(参考訳) 低照度画像強調の課題に対して,ディープラーニングに基づくアルゴリズムは従来の手法に比べて優れ,有効性を示している。既存のディープラーニングアルゴリズムは、主にRetinex理論に基づいて提案されているが、入力に含まれるノイズや色歪みを見落とし、最終的な結果において大きなノイズ増幅と局所色歪みをもたらすことがしばしばある。そこで本研究では,低照度条件下での画質向上を目的としたDual-Path Error Compensation法(DPEC)を提案する。 DPECは、微妙なピクセル差を正確に捉えた正確なピクセルレベルの誤差推定と、不要なノイズを効果的に除去する独立デノナイズを行う。局所的なテクスチャの詳細を保存し、ノイズ増幅を回避しつつ、画像の明るさを復元する。さらに,従来のCNNの長期的意味情報収集能力の限界を補うとともに,計算速度と資源効率の両方を考慮して,VMambaアーキテクチャをDPECのバックボーンに統合した。さらに, DPECのトレーニングを制約するため, HIS-Retinex損失を導入し, 画像の全体輝度分布が実環境とより密に一致していることを確認する。総合的な定量的および定性的な実験結果から,本アルゴリズムは6つのベンチマークテストにおいて,最先端の手法を著しく上回っていることが示された。

For the task of low-light image enhancement, deep learning-based algorithms have demonstrated superiority and effectiveness compared to traditional methods. Existing deep learning algorithms are proposed mainly based on the Retinex theory but overlook the noise and color distortion present in the input, which frequently results in significant noise amplification and local color distortion in the final results. To address this, we propose a Dual-Path Error Compensation method (DPEC), which aims to improve image quality in low-light conditions. DPEC performs precise pixel-level error estimation, which accurately captures subtle pixels differences, and independent denoising, which effectively removes unnecessary noise. This method restores image brightness while preserving local texture details and avoiding noise amplification. Furthermore, to compensate for the traditional CNN's limited ability to capture long-range semantic information and considering both computational speed and resource efficiency, we integrated the VMamba architecture into the backbone of DPEC. In addition, we introduced the HIS-Retinex loss to constrain the training of DPEC, ensuring that the overall brightness distribution of the images more closely aligns with real-world conditions. Comprehensive quantitative and qualitative experimental results demonstrate that our algorithm significantly outperforms state-of-the-art methods across six benchmark tests.

翻訳日:2024-11-08 21:54:45 公開日:2024-11-01

# DPEC:低光度画像明瞭度向上のためのデュアルパス誤差補償法

DPEC: Dual-Path Error Compensation Method for Enhanced Low-Light Image Clarity ( http://arxiv.org/abs/2407.09553v4 )

ライセンス: Link先を確認

Shuang Wang, Qianwen Lu, Boxing Peng, Yihe Nie, Qingchuan Tao,

(参考訳) 低照度画像強調の課題に対して,ディープラーニングに基づくアルゴリズムは従来の手法に比べて優れ,有効性を示している。しかし、これらの手法は主にレチネックス理論に基づいており、入力画像のノイズや色歪みを見落とし、ノイズの増幅や局所色歪みが増大する傾向にある。これらの問題に対処するため,低照度条件下での画質向上を目的としたDual-Path Error Compensation (DPEC)法を提案する。 DPECには、微妙な違いを捉えるための正確なピクセルレベルの誤差推定と、ノイズ増幅を防ぐための独立したデノナイジング機構が組み込まれている。我々は、DPECのトレーニングをガイドするためにHIS-Retinex損失を導入し、拡張画像の輝度分布が現実世界の条件と密接に一致していることを保証する。グローバルコンテキストの包括的理解のためにDPECを訓練しながら計算速度と資源効率のバランスをとるため,VMambaアーキテクチャをバックボーンに統合した。総合的な定量的および定性的実験結果から,このアルゴリズムは低照度画像強調における最先端手法を著しく上回っていることが示された。コードはhttps://github.com/wangshuang233/DPECで公開されている。

For the task of low-light image enhancement, deep learning-based algorithms have demonstrated superiority and effectiveness compared to traditional methods. However, these methods, primarily based on Retinex theory, tend to overlook the noise and color distortions in input images, leading to significant noise amplification and local color distortions in enhanced results. To address these issues, we propose the Dual-Path Error Compensation (DPEC) method, designed to improve image quality under low-light conditions by preserving local texture details while restoring global image brightness without amplifying noise. DPEC incorporates precise pixel-level error estimation to capture subtle differences and an independent denoising mechanism to prevent noise amplification. We introduce the HIS-Retinex loss to guide DPEC's training, ensuring the brightness distribution of enhanced images closely aligns with real-world conditions. To balance computational speed and resource efficiency while training DPEC for a comprehensive understanding of the global context, we integrated the VMamba architecture into its backbone. Comprehensive quantitative and qualitative experimental results demonstrate that our algorithm significantly outperforms state-of-the-art methods in low-light image enhancement. The code is publicly available online at https://github.com/wangshuang233/DPEC.

翻訳日:2024-11-08 21:54:45 公開日:2024-11-01

# $\texttt{MixGR}$:Complementary Granularityによる科学領域のRetriever Generalizationの強化

$\texttt{MixGR}$: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity ( http://arxiv.org/abs/2407.10691v2 )

ライセンス: Link先を確認

Fengyu Cai, Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Iryna Gurevych, Heinz Koeppl,

(参考訳) 近年の研究では、知識ギャップを埋めることにより、科学領域内でのLLM、すなわちRAGの生成において文書検索の重要性が増している。しかし、密度の高い検索者は、特にクエリセグメントがドキュメントの様々な部分に対応する場合、ドメイン固有の検索と複雑なクエリドキュメントの関係に苦慮することが多い。そこで本研究では,クエリやドキュメントの様々なレベルの粒度にまたがるクエリ文書マッチングに対する高密度な検索者の認識を改善するために,ゼロショットアプローチを用いて$\texttt{MixGR}$を導入する。 $\texttt{MixGR}$は、これらの粒度に基づくさまざまなメトリクスを統合スコアに融合させ、包括的なクエリドキュメントの類似性を反映させる。実験の結果,nDCG@5では$\texttt{MixGR}$が従来の文書検索を24.7%,9.8%,6.9%で上回った。さらに、下流の2つの科学的質問応答タスクの有効性は、科学領域におけるLSMの適用を促進するために$\texttt{MixGR}$の利点を強調している。コードと実験データセットが利用可能だ。

Recent studies show the growing significance of document retrieval in the generation of LLMs, i.e., RAG, within the scientific domain by bridging their knowledge gap. However, dense retrievers often struggle with domain-specific retrieval and complex query-document relationships, particularly when query segments correspond to various parts of a document. To alleviate such prevalent challenges, this paper introduces $\texttt{MixGR}$, which improves dense retrievers' awareness of query-document matching across various levels of granularity in queries and documents using a zero-shot approach. $\texttt{MixGR}$ fuses various metrics based on these granularities to a united score that reflects a comprehensive query-document similarity. Our experiments demonstrate that $\texttt{MixGR}$ outperforms previous document retrieval by 24.7%, 9.8%, and 6.9% on nDCG@5 with unsupervised, supervised, and LLM-based retrievers, respectively, averaged on queries containing multiple subqueries from five scientific retrieval datasets. Moreover, the efficacy of two downstream scientific question-answering tasks highlights the advantage of $\texttt{MixGR}$ to boost the application of LLMs in the scientific domain. The code and experimental datasets are available.

翻訳日:2024-11-08 21:32:38 公開日:2024-11-01
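
The MixGR abstract says that metrics computed at several query and document granularities are fused into one united score, without naming the fusion rule. As a hedged illustration, the sketch below uses reciprocal rank fusion (RRF), a common way to merge rankings; treating it as the rule used here is an assumption. The function name, the document IDs, the three example rankings, and the constant `k=60` are likewise hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings of document IDs into one ranked list.

    Each ranking lists document IDs from best to worst. A document earns
    1 / (k + rank) from every ranking in which it appears.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)

# Hypothetical rankings produced at three granularities.
query_vs_document = ["d1", "d2", "d3"]
subquery_vs_document = ["d2", "d1", "d3"]
subquery_vs_proposition = ["d2", "d3", "d1"]

fused_order = reciprocal_rank_fusion(
    [query_vs_document, subquery_vs_document, subquery_vs_proposition]
)
```

Here `d2` comes out on top because it ranks first under two of the three granularities, even though the coarse query-versus-document ranking alone would have preferred `d1`.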

# コード生成におけるモジュール性獲得の影響を再考する

Revisiting the Impact of Pursuing Modularity for Code Generation ( http://arxiv.org/abs/2407.11406v2 )

ライセンス: Link先を確認

Deokyeong Kang, Ki Jung Seo, Taeuk Kim,

(参考訳) より小さな独立したビルディングブロックを統合することで最終プログラムを構築することを目的としたモジュールプログラミングは、ソフトウェア開発において望ましい実践とみなされてきた。しかし、最近、大規模言語モデル(LLM)上に構築されたコード生成エージェントの台頭により、この伝統的な実践はこれらの新しいツールにも同様に有効なのか、という疑問が浮かび上がっている。本研究では,モジュラリティを定量的に測定するための新しい指標を導入することにより,コード生成におけるモジュラリティの影響を評価する。驚くべきことに、このトピックに関する従来の通念とは異なり、モジュラリティはコード生成モデルのパフォーマンスを改善するための中核的な要素ではないことがわかった。また、LLMが非モジュラーコードと比べてモジュラーコードを好む傾向を示さない理由についても、考えられる説明を検討する。

Modular programming, which aims to construct the final program by integrating smaller, independent building blocks, has been regarded as a desirable practice in software development. However, with the rise of recent code generation agents built upon large language models (LLMs), a question emerges: is this traditional practice equally effective for these new tools? In this work, we assess the impact of modularity in code generation by introducing a novel metric for its quantitative measurement. Surprisingly, unlike conventional wisdom on the topic, we find that modularity is not a core factor for improving the performance of code generation models. We also explore potential explanations for why LLMs do not exhibit a preference for modular code compared to non-modular code.

翻訳日:2024-11-08 21:10:26 公開日:2024-11-01

# コード生成におけるモジュール性獲得の影響を再考する

Revisiting the Impact of Pursuing Modularity for Code Generation ( http://arxiv.org/abs/2407.11406v3 )

ライセンス: Link先を確認

Deokyeong Kang, Ki Jung Seo, Taeuk Kim,

翻訳日:2024-11-08 21:10:26 公開日:2024-11-01

# HDLCopilot: ハードウェア設計とライブラリの自然言語探索

HDLCopilot: Natural Language Exploration of Hardware Designs and Libraries ( http://arxiv.org/abs/2407.12749v2 )

ライセンス: Link先を確認

Manar Abdelatty, Jacob Rosenstein, Sherief Reda,

(参考訳) ハードウェア設計のワークフローは、様々な製造ラボのプロセスデザインキット(PDK)を扱うことが多く、それぞれが速度、電力、密度などのメトリクスに最適化された、独自の標準セルライブラリを含んでいる。これらのライブラリには、セルのタイミングと電気的性質に関する情報、セルレイアウトの詳細、プロセス設計規則に関する複数のビューが含まれている。エンジニアは通常、設計とターゲット技術の間を行き来して、面積最適化のための特定のゲートの選択やクリティカルパス速度の向上など、さまざまな設計シナリオについて情報に基づいた意思決定を行う。ゲートや設計ルールに関する特定の情報を取得するために、この複雑な状況をナビゲートすることは、しばしば時間がかかり、エラーが発生しやすい。そこで本研究では,ハードウェア設計やPDKとのインタラクションを,自然言語クエリを通じて効率化する,大規模言語モデルを用いたマルチエージェント協調フレームワークであるHDLCopilotを提案する。 HDLCopilotは、エンジニアがゲートや設計ルールに関する関連情報に迅速にアクセスし、面積、速度、電力に関するトレードオフを評価して、情報に基づいた意思決定を効率的かつ正確に行うことを可能にする。このフレームワークは、複雑な自然言語クエリの多様なセットに対して96.33%の実行精度を達成する。 HDLCopilotは、ハードウェア設計ワークフローにおける強力なアシスタントとしての地位を確立し、生産性を高め、潜在的なヒューマンエラーを減らす。

Hardware design workflows often involve working with Process Design Kits (PDKs) from various fabrication labs, each containing its own set of standard cell libraries optimized for metrics such as speed, power, or density. These libraries include multiple views for information on timing and electrical properties of cells, cell layout details, and process design rules. Engineers typically navigate between the design and the target technology to make informed decisions on different design scenarios, such as selecting specific gates for area optimization or enhancing critical path speed. Navigating this complex landscape to retrieve specific information about gates or design rules is often time-consuming and error-prone. To address this, we present HDLCopilot, a multi-agent collaborative framework powered by large language models that enables engineers to streamline interactions with hardware design and PDKs through natural language queries. HDLCopilot enables engineers to quickly access relevant information on gates and design rules, evaluate tradeoffs related to area, speed, and power in order to make informed decisions efficiently and accurately. The framework achieves an execution accuracy of 96.33% on a diverse set of complex natural language queries. HDLCopilot positions itself as a powerful assistant in hardware design workflows, enhancing productivity and reducing potential human errors.

翻訳日:2024-11-08 20:36:48 公開日:2024-11-01

# 語彙を考慮したスケーリング則:より大きなモデルにはより大きな語彙がふさわしい

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies ( http://arxiv.org/abs/2407.13623v3 )

ライセンス: Link先を確認

Chaofan Tao, Qian Liu, Longxu Dou, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min Lin, Ngai Wong,

(参考訳) 大規模言語モデル(LLM)のスケーリングに関する研究は、主にモデルパラメータとトレーニングデータサイズに重点を置いており、語彙サイズの役割を見落としている。語彙サイズがLLMのスケーリング則にどう影響するかを,さまざまな語彙構成で最大500B文字を用いて33Mから3Bパラメータのモデルをトレーニングすることで検討した。本稿では,IsoFLOPs解析,微分推定,損失関数のパラメトリックフィットという,計算最適な語彙サイズを予測するための3つの補完的手法を提案する。我々のアプローチは、最適な語彙サイズは計算予算に依存し、より大きなモデルはより大きな語彙を必要とするという結論に収束する。しかし、ほとんどのLLMは語彙サイズが不十分である。例えば、Llama2-70Bの最適な語彙サイズは少なくとも216Kであるべきだったと予測され、これは実際の32Kの語彙の7倍である。異なるFLOPs予算で3Bパラメータのモデルをトレーニングすることで、予測を実証的に検証する。予測された最適な語彙サイズを採用することで、一般的に使用される語彙サイズよりも下流のパフォーマンスが一貫して向上する。語彙サイズを従来の32Kから43Kへ拡大することで、同じ2.3e21 FLOPsでARC-Challengeの性能を29.1から32.0に改善した。我々の研究は、効率的な事前学習のために、トークン化とモデルのスケーリングを共同で検討することの重要性を強調している。コードとデモはhttps://github.com/sail-sg/scaling-with-vocabとhttps://hf.co/spaces/sail/scaling-with-vocab-demoで公開されている。

Research on scaling large language models (LLMs) has primarily focused on model parameters and training data size, overlooking the role of vocabulary size. We investigate how vocabulary size impacts LLM scaling laws by training models ranging from 33M to 3B parameters on up to 500B characters with various vocabulary configurations. We propose three complementary approaches for predicting the compute-optimal vocabulary size: IsoFLOPs analysis, derivative estimation, and parametric fit of the loss function. Our approaches converge on the conclusion that the optimal vocabulary size depends on the compute budget, with larger models requiring larger vocabularies. Most LLMs, however, use insufficient vocabulary sizes. For example, we predict that the optimal vocabulary size of Llama2-70B should have been at least 216K, 7 times larger than its vocabulary of 32K. We validate our predictions empirically by training models with 3B parameters across different FLOPs budgets. Adopting our predicted optimal vocabulary size consistently improves downstream performance over commonly used vocabulary sizes. By increasing the vocabulary size from the conventional 32K to 43K, we improve performance on ARC-Challenge from 29.1 to 32.0 with the same 2.3e21 FLOPs. Our work highlights the importance of jointly considering tokenization and model scaling for efficient pre-training. The code and demo are available at https://github.com/sail-sg/scaling-with-vocab and https://hf.co/spaces/sail/scaling-with-vocab-demo.

翻訳日:2024-11-08 20:25:29 公開日:2024-11-01
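
The headline figures in the abstract above can be checked with a few lines of arithmetic. The sketch uses only numbers quoted in the abstract and assumes that "K" means one thousand; the variable names are made up for illustration.

```python
# Figures quoted in the abstract; "K" is assumed to mean 1,000.
predicted_optimal_vocab = 216_000  # predicted optimum for Llama2-70B
actual_vocab = 32_000              # vocabulary size Llama2-70B uses

ratio = predicted_optimal_vocab / actual_vocab
print(f"predicted / actual vocabulary: {ratio:.2f}x")

# ARC-Challenge scores with a 32K and a 43K vocabulary at equal FLOPs.
arc_before, arc_after = 29.1, 32.0
gain = arc_after - arc_before
print(f"ARC-Challenge gain: {gain:.1f} points")
```

The ratio works out to 6.75, which the abstract rounds to "7 times", and the ARC-Challenge gain is 2.9 points at the same compute budget.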

# 手術映像における弱教師付き物体検出とセグメンテーションのための時空間知識の分離

Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video ( http://arxiv.org/abs/2407.15794v4 )

ライセンス: Link先を確認

Guiqiu Liao, Matjaz Jogan, Sai Koushik, Eric Eaton, Daniel A. Hashimoto,

(参考訳) 弱教師付きビデオオブジェクトセグメンテーション(WSVOS)は、オブジェクトマスクの広範なトレーニングデータセットを必要とせずにセグメンテーションマップの識別を可能にし、代わりに、オブジェクトの存在を示す粗いビデオラベルに依存する。現在の最先端の手法は、モーションキューを使用する複数の独立した処理段階を必要とするか、あるいはエンドツーエンドでトレーニング可能なネットワークの場合、オブジェクトが一時的にしか存在しないビデオからセグメンテーションマップを学習することが難しいこともあって、セグメンテーションの精度が不足している。これにより、複数の手術ツールが視野に頻繁に出入りする手術ビデオのセマンティックアノテーションへのWSVOSの適用が制限される。これは、WSVOSで通常遭遇するよりも難しい問題である。本稿では,半分離型知識蒸留を用いて時空間情報を分離し,高品質なクラスアクティベーションマップ(CAM)を予測するフレームワークであるVideo Spatio-Temporal Disentanglement Networks (VDST-Net)を提案する。ビデオ中の物体の位置やタイミングに関する詳細が与えられない場合の時間的矛盾を解決するように設計された教師ネットワークが、時間的依存を活用して時間方向に情報を統合する生徒ネットワークと連携して動作する。提案するフレームワークの有効性を、公開の参照データセットと、オブジェクトが存在するのが平均してアノテートされたフレームの60%未満である、より困難な手術用ビデオデータセットで示す。本手法は最先端技術より優れ,ビデオレベルの弱い監督下で優れたセグメンテーションマスクを生成する。

Weakly supervised video object segmentation (WSVOS) enables the identification of segmentation maps without requiring an extensive training dataset of object masks, relying instead on coarse video labels indicating object presence. Current state-of-the-art methods either require multiple independent stages of processing that employ motion cues or, in the case of end-to-end trainable networks, lack in segmentation accuracy, in part due to the difficulty of learning segmentation maps from videos with transient object presence. This limits the application of WSVOS for semantic annotation of surgical videos where multiple surgical tools frequently move in and out of the field of view, a problem that is more difficult than typically encountered in WSVOS. This paper introduces Video Spatio-Temporal Disentanglement Networks (VDST-Net), a framework to disentangle spatiotemporal information using semi-decoupled knowledge distillation to predict high-quality class activation maps (CAMs). A teacher network designed to resolve temporal conflicts when specifics about object location and timing in the video are not provided works with a student network that integrates information over time by leveraging temporal dependencies. We demonstrate the efficacy of our framework on a public reference dataset and on a more challenging surgical video dataset where objects are, on average, present in less than 60% of annotated frames. Our method outperforms state-of-the-art techniques and generates superior segmentation masks under video-level weak supervision.

翻訳日:2024-11-08 15:45:25 公開日:2024-11-01

# CrysToGraph: 結晶材料特性の総合予測モデルとベンチマーク

CrysToGraph: A Comprehensive Predictive Model for Crystal Materials Properties and the Benchmark ( http://arxiv.org/abs/2407.16131v2 )

ライセンス: Link先を確認

Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong,

(参考訳) 格子全体にわたるイオン結合と秩序だった微視的構造は、結晶に独特の対称性を与え、そのマクロな性質を決定づける。特に非従来型の結晶は、非伝統的な格子構造を示すか、またはエキゾチックな物理的性質を持つため、興味深い研究対象である。したがって、結晶の物理的および化学的性質を正確に予測するためには、長距離秩序を考慮することが重要である。 GNNは結晶中の原子の局所的な環境を捉えるのに優れているが、その深さが限られているため、より長距離の相互作用を効果的に捉えることにしばしば課題を抱える。本稿では,非従来型の結晶系に特化して設計された新しいトランスフォーマーベースの幾何グラフネットワークであるCrysToGraph ($\textbf{Crys}$tals with $\textbf{T}$ransformers $\textbf{o}$n $\textbf{Graph}$s)と,欠陥結晶,低次元結晶,MOFなどの非従来型結晶材料に対するモデルの予測性能を評価するための総合ベンチマークであるUnconvBenchを提案する。 CrysToGraphは、トランスフォーマーベースのグラフ畳み込みブロックで短距離相互作用を、グラフワイドトランスフォーマーブロックで長距離相互作用を効果的に捉える。 CrysToGraphは、複数のタスクで非従来型結晶材料のモデル化における有効性を示し、さらに、非従来型結晶と従来型結晶の両方のベンチマークにおいて新しい最先端の結果を達成し、ほとんどの既存手法を上回る。

The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of crystals, it is crucial to consider long-range orders. While GNNs excel at capturing the local environment of atoms in crystals, they often face challenges in effectively capturing longer-ranged interactions due to their limited depth. In this paper, we propose CrysToGraph ($\textbf{Crys}$tals with $\textbf{T}$ransformers $\textbf{o}$n $\textbf{Graph}$s), a novel transformer-based geometric graph network designed specifically for unconventional crystalline systems, and UnconvBench, a comprehensive benchmark to evaluate models' predictive performance on unconventional crystal materials such as defected crystals, low-dimension crystals and MOF. CrysToGraph effectively captures short-range interactions with transformer-based graph convolution blocks as well as long-range interactions with graph-wise transformer blocks. CrysToGraph proves its effectiveness in modelling unconventional crystal materials in multiple tasks, and moreover, it outperforms most existing methods, achieving new state-of-the-art results on the benchmarks of both unconventional crystals and traditional crystals.

翻訳日:2024-11-08 15:34:26 公開日:2024-11-01

# QLDPC手術の改善 : 論理的計測とブリッジコード

Improved QLDPC Surgery: Logical Measurements and Bridging Codes ( http://arxiv.org/abs/2407.18393v2 )

ライセンス: Link先を確認

Andrew Cross, Zhiyang He, Patrick Rall, Theodore Yoder,

(参考訳) 本稿では,Cohen et al. (Sci. Adv. 8, eabn1717) の構成に基づく改良された論理的測定法であるゲージ固定型QLDPC手術法を提案する。提案手法はタナーグラフの拡張特性を利用してQLDPC手術の空間オーバーヘッドを大幅に低減する。ある場合には、重み$w$の論理演算子をフォールトトレラントに測定するために、$\Theta(w)$個のアンシラ量子ビットしか必要としない。提案手法の符号距離と故障距離を厳密に解析し,最大故障距離を実現するモジュール型デコーディングアルゴリズムを提案する。さらに,論理演算子のフォールトトレラントな同時測定を容易にするブリッジシステムを導入する。このブリッジ構築により、我々のスキームは、異なるQLDPC符号のファミリーを1つのユニバーサルアーキテクチャに接続するために使用できる。ツールボックスを適用して、[[144,12,12]] bivariate bicycle符号ですべての論理クリフォードゲートを実行する方法を示す。本手法では接続グラフに103個のアンシラ量子ビットを付加し,12個の論理量子ビットのうちの1つをゲート合成のアンシラとして用いる。論理的測定は、Bravyi et al. (Nature 627, 778-782) によって研究された自己同型ゲートと組み合わせられ、288個のパウリ積測定を実装する。本稿では,BPOSDとマッチングを組み合わせたモジュール型デコーダを用いて,回路レベルのノイズシミュレーションにより提案手法の実用性を実証する。

In this paper, we introduce the gauge-fixed QLDPC surgery scheme, an improved logical measurement scheme based on the construction of Cohen et al. (Sci. Adv. 8, eabn1717). Our scheme leverages expansion properties of the Tanner graph to substantially reduce the space overhead of QLDPC surgery. In certain cases, we only require $\Theta(w)$ ancilla qubits to fault-tolerantly measure a weight $w$ logical operator. We provide rigorous analysis for the code distance and fault distance of our schemes, and present a modular decoding algorithm that achieves maximal fault-distance. We further introduce a bridge system to facilitate fault-tolerant joint measurements of logical operators. Augmented by this bridge construction, our scheme can be used to connect different families of QLDPC codes into one universal architecture. Applying our toolbox, we show how to perform all logical Clifford gates on the [[144,12,12]] bivariate bicycle code. Our scheme adds 103 ancilla qubits into the connectivity graph, and one of the twelve logical qubits is used as an ancilla for gate synthesis. Logical measurements are combined with the automorphism gates studied by Bravyi et al. (Nature 627, 778-782) to implement 288 Pauli product measurements. We demonstrate the practicality of our scheme through circuit-level noise simulations, leveraging our proposed modular decoder that combines BPOSD with matching.

翻訳日:2024-11-08 14:50:05 公開日:2024-11-01

# QLDPC手術の改善 : 論理的計測とブリッジコード

Improved QLDPC Surgery: Logical Measurements and Bridging Codes ( http://arxiv.org/abs/2407.18393v3 )

ライセンス: Link先を確認

Andrew Cross, Zhiyang He, Patrick Rall, Theodore Yoder,

翻訳日:2024-11-08 14:50:05 公開日:2024-11-01

# 敵対的にロバストなDecision Transformer

Adversarially Robust Decision Transformer ( http://arxiv.org/abs/2407.18414v2 )

ライセンス: Link先を確認

Xiaohang Tang, Afonso Marques, Parameswaran Kamalaruban, Ilija Bogunovic,

(参考訳) Reinforcement Learning via Supervised Learning (RvS) 手法の代表的な1つであるDecision Transformer (DT) は、強力なTransformerアーキテクチャを逐次的意思決定に活用して、オフライン学習タスクにおいて強力なパフォーマンスを実現している。しかしながら、敵対的な環境では、リターンは意思決定者と敵対者双方の戦略に依存するため、これらの手法は頑健でない可能性がある。観測されたリターンを条件として行動を予測する確率モデルのトレーニングは、データセット内でそのリターンを達成した軌道が、最適でない行動をとる敵対者のおかげで達成された可能性があるため、一般化に失敗する可能性がある。そこで我々は,最悪ケースを考慮したRvSアルゴリズムであるAdversarially Robust Decision Transformer (ARDT)を提案する。これはサンプル内のミニマックスreturns-to-goを学習し、ポリシーをそれに条件付ける。 ARDTは、ミニマックス・エクスペクタイル回帰によって学習した最悪ケースのリターンにターゲットリターンを合わせることで、強力なテスト時の敵対者に対する堅牢性を高める。完全なデータカバレッジを持つシーケンシャルゲームで実施された実験では、ARDTは最大の敵対的ロバスト性を持つ解であるマキシミン(ナッシュ均衡)戦略を生成することができる。大規模なシーケンシャルゲームや、部分的なデータカバレッジを持つ連続的な敵対的RL環境では、ARDTは強力なテスト時の敵対者に対して有意に優れたロバスト性を示し、現代のDT法と比較してより高い最悪ケースリターンを達成している。

Decision Transformer (DT), as one of the representative Reinforcement Learning via Supervised Learning (RvS) methods, has achieved strong performance in offline learning tasks by leveraging the powerful Transformer architecture for sequential decision-making. However, in adversarial environments, these methods can be non-robust, since the return is dependent on the strategies of both the decision-maker and adversary. Training a probabilistic model conditioned on observed return to predict action can fail to generalize, as the trajectories that achieve a return in the dataset might have done so due to a suboptimal behavior adversary. To address this, we propose a worst-case-aware RvS algorithm, the Adversarially Robust Decision Transformer (ARDT), which learns and conditions the policy on in-sample minimax returns-to-go. ARDT aligns the target return with the worst-case return learned through minimax expectile regression, thereby enhancing robustness against powerful test-time adversaries. In experiments conducted on sequential games with full data coverage, ARDT can generate a maximin (Nash Equilibrium) strategy, the solution with the largest adversarial robustness. In large-scale sequential games and continuous adversarial RL environments with partial data coverage, ARDT demonstrates significantly superior robustness to powerful test-time adversaries and attains higher worst-case returns compared to contemporary DT methods.

翻訳日:2024-11-08 14:50:05 公開日:2024-11-01
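
The abstract above relies on expectile regression to estimate worst-case returns. As a minimal sketch (not the authors' implementation), the snippet below computes a sample expectile with the usual asymmetric squared loss: a small `tau` pulls the estimate toward the minimum, which is the sense in which an expectile can stand in for a worst-case value. The function name `expectile` and the list `returns` are hypothetical and chosen only for illustration.

```python
def expectile(values, tau, iters=100):
    """Return the tau-expectile of `values` (0 < tau < 1).

    The tau-expectile m minimizes sum_i w_i * (v_i - m)^2 with
    w_i = tau if v_i > m else 1 - tau; we solve it by fixed-point iteration.
    """
    m = sum(values) / len(values)  # the 0.5-expectile is the plain mean
    for _ in range(iters):
        # Asymmetric weights: tau above the current estimate, 1 - tau at or below it.
        weights = [tau if v > m else 1.0 - tau for v in values]
        new_m = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_m - m) < 1e-12:
            break
        m = new_m
    return m


# Hypothetical returns of one action under four different adversary replies.
returns = [1.0, 2.0, 3.0, 4.0]
mean_like = expectile(returns, 0.5)    # equals the mean
worst_like = expectile(returns, 0.01)  # close to min(returns)
best_like = expectile(returns, 0.99)   # close to max(returns)
```

Nesting a low-`tau` expectile over adversary actions inside a high-`tau` expectile over the decision-maker's actions gives a soft minimax; how ARDT combines the two in detail is described in the paper, not in this sketch.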

# ハイゼンベルクスピンチェーン量子電池のエルゴトロピーとキャパシティ最適化

Ergotropy and capacity optimization in Heisenberg spin-chain quantum batteries ( http://arxiv.org/abs/2408.00133v2 )

ライセンス: Link先を確認

Asad Ali, Saif Al-Kuwari, M. I. Hussain, Tim Byrnes, M. T. Rahim, James Q. Quach, Mehrdad Ghominejad, Saeed Haddadi,

(参考訳) 本研究は, Dzyaloshinsky-Moriya (DM) 相互作用と Kaplan-Shekhtman-Entin-Wohlman-Aharony (KSEA) 相互作用を含むハイゼンベルクスピンモデルを用いて, 有限スピン量子電池 (QB) の性能を検討する。 QBは局所的な不均一磁場中で相互作用する量子スピンとしてモデル化され、この磁場が可変なゼーマン分裂を誘起する。最近 Yang et al. [Phys. Rev. Lett. 131, 030402 (2023)] が検討した, 最大抽出可能仕事であるエルゴトロピーとQBの容量について解析式を導出する。これらの量は、前述の研究で示されたように、特定の量子相関を通じて解析的に結びつけられる。異なるハイゼンベルクスピンチェーンモデルは条件によって異なる挙動を示し、QB性能を最適化するためのモデル選択の重要性を強調している。反強磁性(AFM)系では、最大エルゴトロピーはいずれか一方のスピンにゼーマン分裂場を印加した場合に生じるが、強磁性(FM)系は均一なゼーマン場から恩恵を受ける。温度はQB性能に大きく影響し, AFMの場合のエルゴトロピーはFMの場合と比較して温度上昇に対して概ね頑健である。 DMとKSEAの結合を取り入れることで、QBの容量とエルゴトロピー抽出を著しく向上させることができる。しかし、これらの相互作用をさらに増加させると容量とエルゴトロピーが急激に低下するしきい値が存在する。この挙動は温度と量子コヒーレンスの影響を受けており、これらは突然の相転移の発生を示唆している。 Baumgratz et al. [Phys. Rev. Lett. 113, 140401 (2014)] によって提唱された量子コヒーレンスの資源理論は、エルゴトロピーと容量を高める上で重要な役割を果たす。しかしながら、エルゴトロピーはシステムの容量とコヒーレンス量の両方によって制限される。これらの知見はスピンベースのQBの理論的枠組みを支持しており、将来の量子エネルギー貯蔵装置の研究に役立つかもしれない。

This study examines the performance of finite spin quantum batteries (QBs) using Heisenberg spin models with Dzyaloshinsky-Moriya (DM) and Kaplan--Shekhtman--Entin-Wohlman--Aharony (KSEA) interactions. The QBs are modeled as interacting quantum spins in local inhomogeneous magnetic fields, inducing variable Zeeman splitting. We derive analytical expressions for the maximal extractable work, ergotropy and the capacity of QBs, as recently examined by Yang et al. [Phys. Rev. Lett. 131, 030402 (2023)]. These quantities are analytically linked through certain quantum correlations, as posited in the aforementioned study. Different Heisenberg spin chain models exhibit distinct behaviors under varying conditions, emphasizing the importance of model selection for optimizing QB performance. In antiferromagnetic (AFM) systems, maximum ergotropy occurs with a Zeeman splitting field applied to either spin, while ferromagnetic (FM) systems benefit from a uniform Zeeman field. Temperature significantly impacts QB performance, with ergotropy in the AFM case being generally more robust against temperature increases compared to the FM case. Incorporating DM and KSEA couplings can significantly enhance the capacity and ergotropy extraction of QBs. However, there exists a threshold beyond which additional increases in these interactions cause a sharp decline in capacity and ergotropy. This behavior is influenced by temperature and quantum coherence, which signal the occurrence of a sudden phase transition. The resource theory of quantum coherence proposed by Baumgratz et al. [Phys. Rev. Lett. 113, 140401 (2014)] plays a crucial role in enhancing ergotropy and capacity. However, ergotropy is limited by both the system's capacity and the amount of coherence. These findings support the theoretical framework of spin-based QBs and may benefit future research on quantum energy storage devices.

翻訳日:2024-11-08 13:40:31 公開日:2024-11-01
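
To make the statement that ergotropy is bounded by capacity concrete, here is a small sketch under strong simplifying assumptions: the state is taken to be diagonal in the energy eigenbasis (so the coherence effects the abstract emphasizes are deliberately left out), ergotropy is computed as the energy minus the energy of the passive rearrangement, and capacity is taken to be the gap between the highest and lowest energies reachable by permuting populations, which is my reading of the definition attributed to Yang et al. All function names are hypothetical.

```python
def passive_energy(populations, energies):
    # Lowest reachable energy: largest populations sit on the lowest levels.
    pops = sorted(populations, reverse=True)
    levels = sorted(energies)
    return sum(p * e for p, e in zip(pops, levels))


def active_energy(populations, energies):
    # Highest reachable energy: largest populations sit on the highest levels.
    pops = sorted(populations, reverse=True)
    levels = sorted(energies, reverse=True)
    return sum(p * e for p, e in zip(pops, levels))


def ergotropy(populations, energies):
    # Extractable work = current energy minus passive-state energy.
    energy = sum(p * e for p, e in zip(populations, energies))
    return energy - passive_energy(populations, energies)


def capacity(populations, energies):
    # Assumed definition: span between active and passive energies.
    return active_energy(populations, energies) - passive_energy(populations, energies)


levels = [0.0, 1.0]        # a single two-level cell, arbitrary energy units
charged = [0.2, 0.8]       # population-inverted (charged) state
discharged = [0.8, 0.2]    # passive state with the same spectrum

w_charged = ergotropy(charged, levels)
w_discharged = ergotropy(discharged, levels)
c_any = capacity(charged, levels)
```

In this toy case the charged state saturates the bound (ergotropy equals capacity), while the passive state has zero ergotropy but the same capacity, since capacity depends only on the two spectra.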

# テキスト属性を計算するためのテーブル変換器

Table Transformers for Imputing Textual Attributes ( http://arxiv.org/abs/2408.02128v2 )

ライセンス: Link先を確認

Ting-Ruen Wei, Yuan Wang, Yoshitaka Inoue, Hsin-Tai Wu, Yi Fang,

(参考訳) ダウンストリームタスクのパフォーマンスは通常、トレーニングデータセットの完全性に依存するため、表形式のデータセットでのデータの欠落は一般的な問題である。従来のデータ計算手法では、数値列と分類列に重点を置いていたが、変換器をベースとしたテーブル変換器(TTITA)と呼ばれる新しいエンドツーエンドの手法を提案し、テーブル内の他の列を用いて非構造化テキスト列をインプットする。提案手法は,3つのデータセットに対して広範な実験を行い,リカレントニューラルネットワークやLlama2などのベースラインモデルよりも優れた性能を示す。ターゲットシーケンスの長さが長い場合には、パフォーマンスの改善がより重要である。さらに、マルチタスク学習を組み込んで、不均一な列を同時にインプットし、テキストインプットの性能を高める。また、現実的なアプリケーションではChatGPTと定性的に比較する。

Missing data in tabular datasets is a common issue, as the performance of downstream tasks usually depends on the completeness of the training dataset. Previous missing data imputation methods focus on numeric and categorical columns, but we propose a novel end-to-end approach called Table Transformers for Imputing Textual Attributes (TTITA) based on the transformer to impute unstructured textual columns using other columns in the table. We conduct extensive experiments on three datasets, and our approach shows competitive performance, outperforming baseline models such as recurrent neural networks and Llama2. The performance improvement is more significant when the target sequence has a longer length. Additionally, we incorporate multi-task learning to simultaneously impute heterogeneous columns, boosting the performance for text imputation. We also qualitatively compare with ChatGPT for realistic applications.

翻訳日:2024-11-08 12:55:51 公開日:2024-11-01

# コンテキストはパラメータに勝る:コミットメッセージ生成でプロプライエタリLLMを上回る

Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation ( http://arxiv.org/abs/2408.02502v2 )

ライセンス: Link先を確認

Aaron Imani, Iftekhar Ahmed, Mohammad Moshirpour,

(参考訳) コミットメッセージは、自然言語を使ってコミットで行った変更の説明を提供するものであり、ソフトウェアの保守と進化に不可欠である。近年のLLM(Large Language Models)の発展により、Omniscient Message Generator (OMG)のように、高品質なコミットメッセージの生成にLLMが利用されるようになった。この方法はGPT-4を使って最先端のコミットメッセージを生成する。しかし、コーディングタスクにおける GPT-4 のようなプロプライエタリ LLM の使用は、プライバシとサステナビリティの懸念を生じさせ、産業的採用を妨げる可能性がある。コンパイラバリデーションなどの開発者タスクにおいて,オープンソースのLLMが競争力のあるパフォーマンスを達成したことを考慮し,OMGに匹敵するコミットメッセージの生成に利用できるかを検討する。実験の結果,オープンソース LLM はOMG に匹敵するコミットメッセージを生成できることがわかった。さらに,一連のコンテキストの改良を通じて,4ビット量子化8BオープンソースLLMを用いたCMG手法であるlOcal MessagE GenerAtor (OMEGA)を提案する。 OMEGAは最先端のコミットメッセージを生成し、実務者の選好においてGPT-4のパフォーマンスを上回っている。

Commit messages provide descriptions of the modifications made in a commit using natural language, making them crucial for software maintenance and evolution. Recent developments in Large Language Models (LLMs) have led to their use in generating high-quality commit messages, such as the Omniscient Message Generator (OMG). This method employs GPT-4 to produce state-of-the-art commit messages. However, the use of proprietary LLMs like GPT-4 in coding tasks raises privacy and sustainability concerns, which may hinder their industrial adoption. Considering that open-source LLMs have achieved competitive performance in developer tasks such as compiler validation, this study investigates whether they can be used to generate commit messages that are comparable with OMG. Our experiments show that an open-source LLM can generate commit messages that are comparable to those produced by OMG. In addition, through a series of contextual refinements, we propose lOcal MessagE GenerAtor (OMEGA), a CMG approach that uses a 4-bit quantized 8B open-source LLM. OMEGA produces state-of-the-art commit messages, surpassing the performance of GPT-4 in practitioners' preference.

翻訳日:2024-11-08 12:55:50 公開日:2024-11-01

# ゲームにおける性能予測とメカニズム設計

Performative Prediction on Games and Mechanism Design ( http://arxiv.org/abs/2408.05146v2 )

ライセンス: Link先を確認

António Góis, Mehrnaz Mofakhami, Fernando P. Santos, Gauthier Gidel, Simon Lacoste-Julien,

(参考訳) エージェントは集団の行動に依存する個々の目標を持つことが多い。エージェントが集団行動の予測を信頼し、戦略的に適応すれば、そのような予測は結果に非自明に影響を与え、結果としてパフォーマンス予測の一形態となる。この効果は、パンデミックの予測から選挙投票まで、あらゆるシナリオで見られるが、既存の研究は予測されたエージェント間の相互依存を無視している。この方向への第一歩として、エージェントが過去の正確性に基づいて予測を信頼するかを動的に決定する集団リスクジレンマについて検討する。予測が集合的な結果を形成するにつれて、社会福祉は関心の指標として自然に現れる。精度と福祉の相互作用を考察し、安定した正確な予測を求めることが、我々の設定において高い確率で社会福祉を最小化できることを実証する。ベイズエージェントの行動モデルに関する知識を仮定することにより、よりよいトレードオフをどうやって達成し、それらをメカニズム設計に利用するかを示す。

Agents often have individual goals which depend on a group's actions. If agents trust a forecast of collective action and adapt strategically, such prediction can influence outcomes non-trivially, resulting in a form of performative prediction. This effect is ubiquitous in scenarios ranging from pandemic predictions to election polls, but existing work has ignored interdependencies among predicted agents. As a first step in this direction, we study a collective risk dilemma where agents dynamically decide whether to trust predictions based on past accuracy. As predictions shape collective outcomes, social welfare arises naturally as a metric of concern. We explore the resulting interplay between accuracy and welfare, and demonstrate that searching for stable accurate predictions can minimize social welfare with high probability in our setting. By assuming knowledge of a Bayesian agent behavior model, we then show how to achieve better trade-offs and use them for mechanism design.

翻訳日:2024-11-08 12:00:36 公開日:2024-11-01

# フォールトトレラント量子入出力

Fault-tolerant quantum input/output ( http://arxiv.org/abs/2408.05260v2 )

ライセンス: Link先を確認

Matthias Christandl, Omar Fawzi, Ashutosh Goswami,

(参考訳) フォールトトレラント計算の一般的なシナリオは、Shorのファクタリングアルゴリズムのような古典関数を計算する量子アルゴリズムのフォールトトレラントな実現に関するものである。特にこれは、量子アルゴリズムへの入力と出力が古典的であることを意味する。スタンドアローンのシングルコア量子コンピュータとは対照的に、多くの分散シナリオでは、量子情報は1つの量子情報処理システムから別の量子に渡さなければならない。このような状況では、量子情報処理装置は量子入力、量子出力、あるいはその両方を持ち、互いに量子ビットを渡す。我々は[Kitaev, 1997]のフォールトトレラント・フレームワークで、量子入力と出力を持つ任意の量子回路をフォールトトレラント・サーキットに変換し、入力と出力に何らかの制御されたノイズを印加した理想回路を生成することを示す。このフレームワークはステートメントの直接的な構成を可能にし、汎用的な将来のアプリケーションを可能にする。これを2つの具体的な応用例で説明する。第一に、故障した符号化と復号処理を伴うノイズのあるチャネル上の通信 [Christandl and M{\"u}ller-Hermes, 2024]。線形最小距離の通信符号に対しては、一般的な雑音(コヒーレントエラーを含む)に対するフォールトトレラントエンコーダとデコーダを構築する。より弱いが標準的な局所確率雑音のモデルに対して、一定の分数ランダム誤差を補正できる通信符号に対して、フォールトトレラントエンコーダとデコーダを得る。第2の応用では、[Gottesman, 2014] の構成における状態準備回路として、一般雑音に対するフォールトトレラントな量子計算が一定の空間オーバーヘッドで達成できることを示すために、我々の結果を用いている。

Usual scenarios of fault-tolerant computation are concerned with the fault-tolerant realization of quantum algorithms that compute classical functions, such as Shor's algorithm for factoring. In particular, this means that input and output to the quantum algorithm are classical. In contrast to stand-alone single-core quantum computers, in many distributed scenarios, quantum information might have to be passed on from one quantum information processing system to another one, possibly via noisy quantum communication channels with noise levels above fault-tolerant thresholds. In such situations, quantum information processing devices will have quantum inputs, quantum outputs or even both, which pass qubits among each other. Working in the fault-tolerant framework of [Kitaev, 1997], we show that any quantum circuit with quantum input and output can be transformed into a fault-tolerant circuit that produces the ideal circuit with some controlled noise applied at the input and output. The framework allows the direct composition of the statements, enabling versatile future applications. We illustrate this with two concrete applications. The first one concerns communication over a noisy channel with faulty encoding and decoding operations [Christandl and Müller-Hermes, 2024]. For communication codes with linear minimum distance, we construct fault-tolerant encoders and decoders for general noise (including coherent errors). For the weaker, but standard, model of local stochastic noise, we obtain fault-tolerant encoders and decoders for any communication code that can correct a constant fraction of random errors. In the second application, we use our result for a state preparation circuit within the construction of [Gottesman, 2014] to establish that fault-tolerant quantum computation for general noise can be achieved with constant space overhead.

翻訳日:2024-11-08 12:00:36 公開日:2024-11-01

# 相互学習

Reciprocal Learning ( http://arxiv.org/abs/2408.06257v2 )

ライセンス: Link先を確認

Julian Rodemann, Christoph Jansen, Georg Schollmeyer,

(参考訳) 我々は、幅広い機械学習アルゴリズムが、相互学習(reciprocal learning)という1つのパラダイムの特定の例であることを示す。これらの例は、アクティブラーニングからマルチアームバンディット、自己学習まで多岐にわたる。これらのアルゴリズムは、データからパラメータを学習するだけでなく、その逆も行う: 現在のモデルの適合結果に依存する形で、トレーニングデータを反復的に変更する。本稿では,これらのアルゴリズムの一般化として,決定理論の言語を用いて相互学習を導入する。これにより、どのような条件で収束するかを研究できる。鍵となるのは、バナッハの不動点定理が適用できるように、相互学習が縮小写像となることを保証することである。このようにして、予測が確率的であり、サンプル適応が非貪欲でかつランダム化または正則化されていれば、相互学習アルゴリズムは損失関数に関する比較的穏やかな仮定の下で、線形速度でほぼ最適なモデルに収束することがわかる。我々はこれらの知見を解釈し、特定のアクティブラーニング、自己学習、およびバンディットのアルゴリズムに関連づける系を示す。

We demonstrate that a wide array of machine learning algorithms are specific instances of one single paradigm: reciprocal learning. These instances range from active learning over multi-armed bandits to self-training. We show that all these algorithms do not only learn parameters from data but also vice versa: They iteratively alter training data in a way that depends on the current model fit. We introduce reciprocal learning as a generalization of these algorithms using the language of decision theory. This allows us to study under what conditions they converge. The key is to guarantee that reciprocal learning contracts such that the Banach fixed-point theorem applies. In this way, we find that reciprocal learning algorithms converge at linear rates to an approximately optimal model under relatively mild assumptions on the loss function, if their predictions are probabilistic and the sample adaption is both non-greedy and either randomized or regularized. We interpret these findings and provide corollaries that relate them to specific active learning, self-training, and bandit algorithms.

翻訳日:2024-11-08 11:38:16 公開日:2024-11-01
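
The convergence argument in this abstract rests on the Banach fixed-point theorem: if one round of "fit the model, then alter the data" is a contraction, repeating it converges to a unique fixed point at a linear rate. The snippet below is only a toy illustration of that mechanism, with a hypothetical scalar map `toy_update` standing in for a full reciprocal-learning step; it is not an algorithm from the paper.

```python
def iterate_to_fixed_point(update, x0, tol=1e-10, max_steps=1000):
    """Repeat x <- update(x) until successive iterates differ by less than tol."""
    x = x0
    trace = [x]
    for _ in range(max_steps):
        x_next = update(x)
        trace.append(x_next)
        if abs(x_next - x) < tol:
            break
        x = x_next
    return trace[-1], trace


def toy_update(theta):
    # Hypothetical contraction with Lipschitz constant 0.5; its fixed point is 2.0.
    return 0.5 * theta + 1.0


theta_star, trace = iterate_to_fixed_point(toy_update, x0=10.0)
# Distance to the fixed point after each step: it halves every iteration,
# which is what "linear rate" means here.
errors = [abs(t - 2.0) for t in trace]
```

The paper's conditions (probabilistic predictions, non-greedy and randomized or regularized sample adaption) are what make the real update map behave like `toy_update` in this respect.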

# 相互学習

Reciprocal Learning ( http://arxiv.org/abs/2408.06257v3 )

ライセンス: Link先を確認

Julian Rodemann, Christoph Jansen, Georg Schollmeyer,

翻訳日:2024-11-08 11:38:16 公開日:2024-11-01

# RED-CT:計算社会科学のためのエッジ分類器の訓練と展開にLLMラベルデータを使用するシステム設計手法

RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science ( http://arxiv.org/abs/2408.08217v2 )

ライセンス: Link先を確認

David Farr, Nico Manzonelli, Iain Cruickshank, Jevin West,

(参考訳) 大規模言語モデル(LLM)は、構造化されていない自然言語データを迅速に分析し分類する能力を向上した。しかしながら、コスト、ネットワーク制限、セキュリティ上の制約に関する懸念は、彼らの作業プロセスへの統合に問題を引き起こしている。本研究では,下流教師あり学習課題において,LLMを不完全なデータアノテータとして活用するためのシステム設計アプローチを採用し,分類性能の向上を目的とした新たなシステム介入対策を導入する。提案手法は, LLM生成ラベルを8つのテストのうち7つのテストで上回り, 多くの産業ユースケースにおいて, 専門的, 教師あり学習モデルの設計と展開にLLMを組み込むことの効果的な戦略を示す。

Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.

翻訳日:2024-11-08 07:29:14 公開日:2024-11-01

# シークエンシャルレコメンデーションのためのインスタンスワイズ LoRA を用いた言語モデルのカスタマイズ

Customizing Language Models with Instance-wise LoRA for Sequential Recommendation ( http://arxiv.org/abs/2408.10159v2 )

ライセンス: Link先を確認

Xiaoyu Kong, Jiancan Wu, An Zhang, Leheng Sheng, Hui Lin, Xiang Wang, Xiangnan He,

(参考訳) 時系列レコメンデーションシステムは、過去のインタラクションを分析し、個別の好みに合わせてレコメンデーションを調整することで、ユーザの次の関心項目を予測する。知識理解と推論におけるLLM(Large Language Models)の強みを生かして、近年のアプローチでは、LLMを言語生成パラダイムを通じてシーケンシャルなレコメンデーションに応用している。これらの手法は,Low-Rank Adaptation (LoRA) モジュールを用いて,ユーザ動作シーケンスをLLM微調整のプロンプトに変換する。しかし、多様なユーザの行動にまたがるLoRAの均一な適用は、個々の変動を捉えるのに失敗することがある。これらの課題に対処するため、我々は、LoRAとMixture of Experts (MoE)フレームワークを統合するインスタンスワイドLoRA(iLoRA)を提案する。 iLoRAはさまざまな専門家の配列を生成し、それぞれがユーザの好みの特定の側面をキャプチャし、シーケンス表現ガイドゲート関数を導入している。このゲート関数は歴史的相互作用シーケンスを処理してリッチな表現を生成し、ゲーティングネットワークにカスタマイズされた専門家参加重みを出力させる。この調整されたアプローチは、ネガティブな伝達を軽減し、多様な行動パターンに動的に適応する。 3つのベンチマークデータセットに対する大規模な実験は、iLoRAの有効性を示し、ユーザ固有の好みをキャプチャし、レコメンデーションの精度を向上させる既存の方法と比較して、その優れたパフォーマンスを強調している。

Sequential recommendation systems predict a user's next item of interest by analyzing past interactions, aligning recommendations with individual preferences. Leveraging the strengths of Large Language Models (LLMs) in knowledge comprehension and reasoning, recent approaches have applied LLMs to sequential recommendation through language generation paradigms. These methods convert user behavior sequences into prompts for LLM fine-tuning, utilizing Low-Rank Adaptation (LoRA) modules to refine recommendations. However, the uniform application of LoRA across diverse user behaviors sometimes fails to capture individual variability, leading to suboptimal performance and negative transfer between disparate sequences. To address these challenges, we propose Instance-wise LoRA (iLoRA), integrating LoRA with the Mixture of Experts (MoE) framework. iLoRA creates a diverse array of experts, each capturing specific aspects of user preferences, and introduces a sequence representation guided gate function. This gate function processes historical interaction sequences to generate enriched representations, guiding the gating network to output customized expert participation weights. This tailored approach mitigates negative transfer and dynamically adjusts to diverse behavior patterns. Extensive experiments on three benchmark datasets demonstrate the effectiveness of iLoRA, highlighting its superior performance compared to existing methods in capturing user-specific preferences and improving recommendation accuracy.

翻訳日:2024-11-08 06:44:48 公開日:2024-11-01

Xiaoyu Kong, Jiancan Wu, An Zhang, Leheng Sheng, Hui Lin, Xiang Wang, Xiangnan He,

(参考訳) 時系列レコメンデーションシステムは、ユーザの過去のインタラクションに基づいて次のインタラクション項目を予測し、個別の好みに合わせてレコメンデーションを調整する。知識理解と推論において、LLM(Large Language Models)の強みを活用することで、最近のアプローチは、LLMをシーケンシャルなレコメンデーションに適用したいと考えている。一般的なパラダイムは、ユーザ動作シーケンスを命令データに変換し、Low-Rank Adaption (LoRA)のようなパラメータ効率の良い微調整(PEFT)手法でLLMを微調整する。しかし、多様なユーザの行動にまたがるLoRAの均一な適用は、個々の変動を捉えるには不十分であり、異なるシーケンス間の負の転移をもたらす。これらの課題に対処するために、インスタンスワイズLoRA(iLoRA)を提案する。逐次レコメンデーションタスクをマルチタスク学習の一形態として扱い,LoRAとMixture of Experts(MoE)フレームワークを統合した。このアプローチは、さまざまな専門家にユーザ行動のさまざまな側面を捉えるように促します。さらに、各ユーザシーケンスごとにカスタマイズされた専門家参加ウェイトを生成するシーケンス表現ガイドゲート関数を導入し、インスタンスワイドレコメンデーションの動的パラメータ調整を可能にする。逐次レコメンデーションでは,iLoRA はヒット率指標において基本 LoRA よりも平均 11.4\% の相対的改善を達成し,トレーニング可能なパラメータの相対的増加は 1\% 未満である。 3つのベンチマークデータセットに対する大規模な実験は、iLoRAの有効性を示し、負の転移を緩和し、レコメンデーション精度を向上させる点で既存の方法に比べて優れたパフォーマンスを示している。私たちのデータとコードはhttps://github.com/AkaliKong/iLoRAで公開されています。

Sequential recommendation systems predict the next interaction item based on users' past interactions, aligning recommendations with individual preferences. Leveraging the strengths of Large Language Models (LLMs) in knowledge comprehension and reasoning, recent approaches are eager to apply LLMs to sequential recommendation. A common paradigm is converting user behavior sequences into instruction data, and fine-tuning the LLM with parameter-efficient fine-tuning (PEFT) methods like Low-Rank Adaption (LoRA). However, the uniform application of LoRA across diverse user behaviors is insufficient to capture individual variability, resulting in negative transfer between disparate sequences. To address these challenges, we propose Instance-wise LoRA (iLoRA). We innovatively treat the sequential recommendation task as a form of multi-task learning, integrating LoRA with the Mixture of Experts (MoE) framework. This approach encourages different experts to capture various aspects of user behavior. Additionally, we introduce a sequence representation guided gate function that generates customized expert participation weights for each user sequence, which allows dynamic parameter adjustment for instance-wise recommendations. In sequential recommendation, iLoRA achieves an average relative improvement of 11.4\% over basic LoRA in the hit ratio metric, with less than a 1\% relative increase in trainable parameters. Extensive experiments on three benchmark datasets demonstrate the effectiveness of iLoRA, highlighting its superior performance compared to existing methods in mitigating negative transfer and improving recommendation accuracy. Our data and code are available at https://github.com/AkaliKong/iLoRA.

翻訳日:2024-11-08 06:44:48 公開日:2024-11-01
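
The gating idea described above (per-sequence weights that mix several LoRA experts) can be sketched in a few lines. This is an illustrative toy in plain Python, not the released iLoRA code: the gate scores are passed in directly, whereas the paper derives them from a sequence representation, and the names `gated_lora_delta`, `experts`, and `gate_scores` are hypothetical.

```python
import math


def softmax(scores):
    # Numerically stable softmax: turns raw gate scores into weights summing to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gated_lora_delta(x, experts, gate_scores):
    """Weighted sum of low-rank updates B_i (A_i x), one (A_i, B_i) pair per expert.

    A_i has shape (rank, d_in) and B_i has shape (d_out, rank).
    """
    weights = softmax(gate_scores)
    d_out = len(experts[0][1])
    delta = [0.0] * d_out
    for w, (a_mat, b_mat) in zip(weights, experts):
        update = matvec(b_mat, matvec(a_mat, x))
        delta = [d + w * u for d, u in zip(delta, update)]
    return delta, weights


# Two rank-1 experts on a 2-dimensional input (toy numbers).
experts = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),  # expert 0 passes through the first coordinate
    ([[0.0, 1.0]], [[0.0], [1.0]]),  # expert 1 passes through the second coordinate
]
x = [2.0, 3.0]
balanced_delta, balanced_w = gated_lora_delta(x, experts, gate_scores=[0.0, 0.0])
peaked_delta, peaked_w = gated_lora_delta(x, experts, gate_scores=[100.0, 0.0])
```

With equal scores both experts contribute half of their update; with a sharply peaked gate the output is essentially that of a single expert, which is how instance-wise weights can keep dissimilar sequences from interfering with each other.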

# LongVILA:ロングビデオのためのロングコンテキストビジュアル言語モデルのスケーリング

LongVILA: Scaling Long-Context Visual Language Models for Long Videos ( http://arxiv.org/abs/2408.10188v4 )

ライセンス: Link先を確認

Fuzhao Xue, Yukang Chen, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han,

(参考訳) ロングコンテクスト能力はマルチモーダル基礎モデル、特にロングビデオ理解において重要である。本稿では,LongVILAを提案する。LongVILAは長文ビジュアル言語モデルのためのフルスタックソリューションで,アルゴリズムとシステムを共同設計する。モデルトレーニングでは、既存のVLMをアップグレードして、2つの追加ステージ、すなわち、長期文脈拡張と長期ビデオ教師付き微調整を組み込むことにより、長いビデオ理解を支援する。しかし、長ビデオのトレーニングは計算的かつメモリ集約的である。我々は,長いビデオのトレーニングと推論を効率的に並列化し,勾配チェックポイントを使わずに256GPU上で2Mのコンテキスト長トレーニングを可能にする,長文マルチモーダルシーケンス並列(MM-SP)システムを提案する。 LongVILA は VILA の動画フレーム数を 8 から 2048 に効率的に拡張し、長いビデオキャプションスコアを 2.00 から 3.26 に改善し、6,000 フレーム (100 万枚以上のトークン) のビデオニードル・イン・ア・ヘイスタックで 99.8% の精度を実現した。 LongVILA-7B は VideoMME ベンチマークで強い精度を示す。加えて、MM-SPはリングスタイルのシーケンス並列性より2.1x - 5.7倍速く、ハイブリッドコンテキストとテンソル並列性を持つメガトロンより1.1x - 1.4倍速い。さらに、Hugging Face Transformersとシームレスに統合される。

Long-context capability is critical for multi-modal foundation models, especially for long video understanding. We introduce LongVILA, a full-stack solution for long-context visual-language models by co-designing the algorithm and system. For model training, we upgrade existing VLMs to support long video understanding by incorporating two additional stages, i.e., long context extension and long video supervised fine-tuning. However, training on long video is computationally and memory intensive. We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing. LongVILA efficiently extends the number of video frames of VILA from 8 to 2048, improving the long video captioning score from 2.00 to 3.26 (out of 5), achieving 99.8% accuracy in 6,000-frame (more than 1 million tokens) video needle-in-a-haystack. LongVILA-7B demonstrates strong accuracy on the VideoMME benchmark, i.e., 61.8% with subtitle. Besides, MM-SP is 2.1x - 5.7x faster than ring style sequence parallelism and 1.1x - 1.4x faster than Megatron with a hybrid context and tensor parallelism. Moreover, it seamlessly integrates with Hugging Face Transformers.

翻訳日:2024-11-08 06:44:48 公開日:2024-11-01

# LongVILA:ロングビデオのためのロングコンテキストビジュアル言語モデルのスケーリング

LongVILA: Scaling Long-Context Visual Language Models for Long Videos ( http://arxiv.org/abs/2408.10188v5 )

ライセンス: Link先を確認

(参考訳) ロングコンテクスト能力はマルチモーダル基礎モデル、特にロングビデオ理解において重要である。本稿では,LongVILAを提案する。LongVILAは,アルゴリズムとシステムの共同設計により,長文ビジュアル言語モデルのためのフルスタックソリューションである。モデルトレーニングでは,既存のVLMをアップグレードして,2つの追加ステージ,すなわち長期文脈拡張と長期ビデオ教師付き微調整を組み込むことで,長時間ビデオ理解を支援する。しかし、長ビデオのトレーニングは計算的かつメモリ集約的である。我々は,長いビデオのトレーニングと推論を効率的に並列化し,勾配チェックポイントを使わずに256GPU上で2Mのコンテキスト長トレーニングを可能にする,長文マルチモーダルシーケンス並列(MM-SP)システムを提案する。 LongVILA は VILA の動画フレーム数を 8 から 2048 に効率的に拡張し、長いビデオキャプションスコアを 2.00 から 3.26 に改善し、6,000 フレーム (100 万枚以上のトークン) のビデオニードル・イン・ア・ヘイスタックで 99.8% の精度を実現した。 LongVILA-7B は VideoMME ベンチマークで強い精度を示す。加えて、MM-SPはリングスタイルのシーケンス並列性より2.1x - 5.7倍速く、ハイブリッドコンテキストとテンソル並列性を持つメガトロンより1.1x - 1.4倍速い。さらに、Hugging Face Transformersとシームレスに統合される。

Long-context capability is critical for multi-modal foundation models, especially for long video understanding. We introduce LongVILA, a full-stack solution for long-context visual-language models by co-designing the algorithm and system. For model training, we upgrade existing VLMs to support long video understanding by incorporating two additional stages, i.e., long context extension and long video supervised fine-tuning. However, training on long video is computationally and memory intensive. We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing. LongVILA efficiently extends the number of video frames of VILA from 8 to 2048, improving the long video captioning score from 2.00 to 3.26 (out of 5), achieving 99.8% accuracy in 6,000-frame (more than 1 million tokens) video needle-in-a-haystack. LongVILA-7B demonstrates strong accuracy on the VideoMME benchmark, i.e., 61.8% with subtitle. Besides, MM-SP is 2.1x - 5.7x faster than ring style sequence parallelism and 1.1x - 1.4x faster than Megatron with a hybrid context and tensor parallelism. Moreover, it seamlessly integrates with Hugging Face Transformers.

翻訳日:2024-11-08 06:44:48 公開日:2024-11-01

# 食品融合 : 拡散モデルによる食品画像合成の新しいアプローチ

Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models ( http://arxiv.org/abs/2408.14135v2 )

ライセンス: Link先を確認

Chaohua Shi, Xuan Wang, Si Shi, Xule Wang, Mingrui Zhu, Nannan Wang, Xinbo Gao,

(参考訳) 食品画像の構成には、既存の食器画像と背景画像を用いて自然な新しいイメージを合成する必要があるが、拡散モデルは画像生成に大きな進歩をもたらし、将来性のある結果をもたらすエンドツーエンドアーキテクチャの構築を可能にしている。しかし、既存の拡散モデルでは、複数の画像からの情報処理と融合が困難であり、高品質な公開データセットへのアクセスが欠如しているため、食品画像合成における拡散モデルの適用が妨げられる。本稿では,22,000個の前景,背景,地上の真理3値からなる大規模で高品質な食品画像合成データセットFC22kを紹介する。さらに,事前学習した拡散モデルの能力を生かした新しい食品画像合成手法であるFoodfusionを提案し,前景や背景情報を処理・統合するためのFusion Moduleを組み込んだ。この融合した情報は、デノイングUNetのクロスアテンション層にグローバルな構造情報をマージすることにより、前景の特徴と背景構造とを整合させる。背景のコンテンツと構造をさらに強化するため、コンテンツ構造制御モジュールも統合する。提案手法の有効性と拡張性を示す実験を行った。

Food image composition requires the use of existing dish images and background images to synthesize a natural new image, while diffusion models have made significant advancements in image generation, enabling the construction of end-to-end architectures that yield promising results. However, existing diffusion models face challenges in processing and fusing information from multiple images and lack access to high-quality publicly available datasets, which prevents the application of diffusion models in food image composition. In this paper, we introduce a large-scale, high-quality food image composite dataset, FC22k, which comprises 22,000 foreground, background, and ground truth ternary image pairs. Additionally, we propose a novel food image composition method, Foodfusion, which leverages the capabilities of the pre-trained diffusion models and incorporates a Fusion Module for processing and integrating foreground and background information. This fused information aligns the foreground features with the background structure by merging the global structural information at the cross-attention layer of the denoising UNet. To further enhance the content and structure of the background, we also integrate a Content-Structure Control Module. Extensive experiments demonstrate the effectiveness and scalability of our proposed method.

翻訳日:2024-11-08 05:04:12 公開日:2024-11-01

# スパース観測からの大域的な場の再構成のためのクロスアテンション付き空間認識拡散モデル

Spatially-Aware Diffusion Models with Cross-Attention for Global Field Reconstruction with Sparse Observations ( http://arxiv.org/abs/2409.00230v2 )

ライセンス: Link先を確認

Yilin Zhuang, Sibo Cheng, Karthik Duraisamy,

(参考訳) 拡散モデルは、複雑な分布を表現し、不確実性を組み込む能力に注目されており、ノイズや不完全データの存在下での堅牢な予測に理想的である。本研究では,部分的な観測から完全な空間場を推定するフィールド再構成タスクにおいて,スコアに基づく拡散モデルを開発し,拡張する。本研究では,観測された領域と観測されていない領域間のトラクタブルマッピングを構築するために,スパース観測と補間フィールドの学習可能な統合を帰納バイアスとして利用する条件符号化手法を提案する。センシング表現の洗練と時間次元の展開により、任意の移動センサを処理し、フィールドを効果的に再構築することができる。さらに,静的および時間依存PDEにおいて,決定論的な補間ベースの手法に対する提案手法の総合的なベンチマークを行う。本研究は, 様々なサンプリングハイパーパラメータ, ノイズレベル, コンディショニング手法における性能評価のための, 強いベースラインのギャップに対処する試みである。その結果,ノイズのないデータでは決定論的手法が優れるものの,クロスアテンションと提案する条件エンコーディングを備えた拡散モデルは,雑音条件下で概ね他の手法を上回ることを示す。さらに、定常問題では、拡散モデルと決定論的手法の両方が、精度と計算コストの両面で数値的アプローチを上回る。また, アンサンブルサンプリングを用いた共分散に基づく修正作業において, モデルがあり得る再構成を捉え, 融合結果の精度を向上させる能力を示す。

Diffusion models have gained attention for their ability to represent complex distributions and incorporate uncertainty, making them ideal for robust predictions in the presence of noisy or incomplete data. In this study, we develop and enhance score-based diffusion models in field reconstruction tasks, where the goal is to estimate complete spatial fields from partial observations. We introduce a condition encoding approach to construct a tractable mapping between observed and unobserved regions using a learnable integration of sparse observations and interpolated fields as an inductive bias. With refined sensing representations and an unraveled temporal dimension, our method can handle arbitrary moving sensors and effectively reconstruct fields. Furthermore, we conduct a comprehensive benchmark of our approach against a deterministic interpolation-based method across various static and time-dependent PDEs. Our study attempts to address the gap in strong baselines for evaluating performance across varying sampling hyperparameters, noise levels, and conditioning methods. Our results show that diffusion models with cross-attention and the proposed conditional encoding generally outperform other methods under noisy conditions, although the deterministic method excels with noiseless data. Additionally, both the diffusion models and the deterministic method surpass the numerical approach in accuracy and computational cost for the steady problem. We also demonstrate the ability of the model to capture possible reconstructions and improve the accuracy of fused results in covariance-based correction tasks using ensemble sampling.

翻訳日:2024-11-08 03:46:25 公開日:2024-11-01
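
The abstract above describes conditioning on sparse observations together with an interpolated field. As a rough illustration only, the sketch below builds such a pair of inputs with plain nearest-neighbor interpolation. The interpolation scheme, the function name `nearest_neighbor_field`, and the `(row, col, value)` sensor format are assumptions made for this example; the paper's encoder is learnable and is not reproduced here.

```python
def nearest_neighbor_field(sensors, height, width):
    """Fill a height x width grid from sparse (row, col, value) sensors.

    Returns (field, mask): field[r][c] holds the value of the closest
    sensor, and mask[r][c] is 1 where a sensor sits and 0 elsewhere.
    This is a hypothetical, non-learnable stand-in for a condition encoder.
    """
    if not sensors:
        raise ValueError("need at least one sensor")
    field = [[0.0] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Squared Euclidean distance is enough to rank sensors.
            nearest = min(sensors, key=lambda s: (s[0] - r) ** 2 + (s[1] - c) ** 2)
            field[r][c] = nearest[2]
    for r, c, _ in sensors:
        mask[r][c] = 1
    return field, mask
```

The interpolated field and the observation mask could then be stacked as conditioning channels. Because the function takes sensor positions as an argument, moving sensors would simply mean calling it with new positions at each time step.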

# 添加物製造におけるディジタルツイン : システムレビュー

Digital Twins in Additive Manufacturing: A Systematic Review ( http://arxiv.org/abs/2409.00877v2 )

ライセンス: Link先を確認

Md Manjurul Ahsan, Yingtao Liu, Shivakumar Raman, Zahed Siddique,

(参考訳) Digital Twins (DT) は、AMマシンの物理的コンポーネントの仮想レプリカを作成する能力によって、リアルタイム生産監視に役立っているため、アダプティブマニュファクチャリング (AM) で人気が高まっている。機械学習(ML)、拡張現実(AR)、シミュレーションベースのモデルといった高度な技術は、製造プロセスにおいてインテリジェントで適応可能なDTを開発する上で重要な役割を果たします。しかし、スケーラビリティ、高品質なデータの統合、DT開発におけるリアルタイムアプリケーションに必要な計算能力について疑問が残る。 AMにおけるDTの現在の状態を理解することは、これらの課題に対処し、AMプロセスを進める上でそのポテンシャルを完全に活用するために不可欠である。この機会を考慮して、本研究は以下の4つの研究課題に対処することで、AMにおけるDTの総合的な概要を提供することを目的としている。 2)最近のDTの開発と実装について教えてください。 (3)プロセス改善とハイブリッド製造にDTはどのように使われているか? (4) DTは産業用 4.0 技術とどのように統合されているか? 現在の応用と技術について議論することで、AMやDTの研究者や実践者に対して、より深い理解と今後の研究の方向性を提供することを目指している。

Digital Twins (DTs) are becoming popular in Additive Manufacturing (AM) due to their ability to create virtual replicas of physical components of AM machines, which helps in real-time production monitoring. Advanced techniques such as Machine Learning (ML), Augmented Reality (AR), and simulation-based models play key roles in developing intelligent and adaptable DTs in manufacturing processes. However, questions remain regarding scalability, the integration of high-quality data, and the computational power required for real-time applications in developing DTs. Understanding the current state of DTs in AM is essential to address these challenges and fully utilize their potential in advancing AM processes. Considering this opportunity, this work aims to provide a comprehensive overview of DTs in AM by addressing the following four research questions: (1) What are the key types of DTs used in AM and their specific applications? (2) What are the recent developments and implementations of DTs? (3) How are DTs employed in process improvement and hybrid manufacturing? (4) How are DTs integrated with Industry 4.0 technologies? By discussing current applications and techniques, we aim to offer a better understanding and potential future research directions for researchers and practitioners in AM and DTs.

翻訳日:2024-11-08 03:35:26 公開日:2024-11-01

# ニューラルネットワークを用いた高精度実空間電子密度

Highly Accurate Real-space Electron Densities with Neural Networks ( http://arxiv.org/abs/2409.01306v2 )

ライセンス: Link先を確認

Lixue Cheng, P. Bernát Szabó, Zeno Schätzle, Derk P. Kooi, Jonas Köhler, Klaas J. H. Giesbertz, Frank Noé, Jan Hermann, Paola Gori-Giorgi, Adam Foster,

(参考訳) 量子化学における変分ab-initio法は、波動関数への直接アクセスを提供する他の方法の中でも際立っている。これは原則として、エネルギー以外の他の観測可能な興味の抽出を可能にするが、実際、この抽出は技術的に困難であり、計算的に非現実的であることが多い。ここでは,電子密度を量子化学において観測可能な中心となるものとみなし,その密度を既知の漸近特性を捉えるニューラルネットワークを用いて表現し,スコアマッチングとノイズコントラスト推定により波動関数からトレーニングすることにより,実空間多電子波関数から正確な密度を求める新しい手法を提案する。深層学習型 ansätze (深部QMC) を用いた変分量子モンテカルロを用いて、基底セット誤差のない高精度な波動関数を得るとともに、新しい手法を用いて、双極子モーメント、原子間力、接触密度、その他の密度に基づく特性を計算して、対応する正確な電子密度を求める。

Variational ab-initio methods in quantum chemistry stand out among other methods in providing direct access to the wave function. This allows in principle straightforward extraction of any other observable of interest, besides the energy, but in practice this extraction is often technically difficult and computationally impractical. Here, we consider the electron density as a central observable in quantum chemistry and introduce a novel method to obtain accurate densities from real-space many-electron wave functions by representing the density with a neural network that captures known asymptotic properties and is trained from the wave function by score matching and noise-contrastive estimation. We use variational quantum Monte Carlo with deep-learning ansätze (deep QMC) to obtain highly accurate wave functions free of basis set errors, and from them, using our novel method, correspondingly accurate electron densities, which we demonstrate by calculating dipole moments, nuclear forces, contact densities, and other density-based properties.

翻訳日:2024-11-08 03:23:46 公開日:2024-11-01
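
To show why an accurate density gives access to the properties listed above, the standard textbook expression for the molecular dipole moment in terms of the electron density is given below (atomic units). The symbols are introduced here for illustration and are not taken from the abstract: $Z_A$ and $\mathbf{R}_A$ are the charge and position of nucleus $A$, $\rho$ is the electron density, and $N$ is the number of electrons.

```latex
% Dipole moment from the electron density (atomic units).
% Z_A, \mathbf{R}_A: charge and position of nucleus A; \rho: electron density.
\[
  \boldsymbol{\mu} \;=\; \sum_{A} Z_A\,\mathbf{R}_A \;-\; \int \mathbf{r}\,\rho(\mathbf{r})\,\mathrm{d}\mathbf{r},
  \qquad
  \int \rho(\mathbf{r})\,\mathrm{d}\mathbf{r} = N .
\]
```

In the same notation, the contact density at nucleus $A$ is simply $\rho(\mathbf{R}_A)$, which is why a pointwise-accurate real-space density matters.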

# カスタム環境多目的強化学習のための効率的な逆関数探索器としての大規模言語モデル

Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning ( http://arxiv.org/abs/2409.02428v2 )

ライセンス: Link先を確認

Guanwen Xie, Jingzehua Xu, Yiyuan Yang, Yimian Ding, Shuai Zhang,

(参考訳) 複雑なカスタム環境と複数の要件を持つ強化学習(RL)タスクにおいて,報酬関数の効果的な設計と改善を実現することは,大きな課題となる。本稿では,LLMを用いた効率的な報酬関数探索機能であるERFSLを提案する。具体的には、各数値的明示的なユーザ要求に対して報酬成分を生成し、報酬批評家を用いて正しいコード形式を特定する。次に、LLMは、トレーニングログアナライザによって提供されるコンテキストに基づいて、遺伝的アルゴリズムと同様に、方向変異や交叉戦略を柔軟に適用することにより、報酬成分に重みを割り当て、そのバランスをとるとともに、曖昧さや冗長な調整なしに重みを反復的に調整する。このフレームワークを水中データ収集RLタスクに適用し、直接のフィードバックや報酬の例(ゼロショット学習)を使わずに適用した。報酬批評家は、各要求に対して1つのフィードバックインスタンスで報酬コードを修正し、修正不可能なエラーを効果的に防止する。ウェイトの初期化は、ウェイト探索を必要とせず、パレート解集合内の異なる報酬関数の取得を可能にする。重量が500倍の場合でも、平均してユーザ要求を満たすのに5.2回しか必要ありません。 ERFSLは、GPT-4o miniを利用するほとんどのプロンプトともうまく機能する。

Achieving the effective design and improvement of reward functions in reinforcement learning (RL) tasks with complex custom environments and multiple requirements presents considerable challenges. In this paper, we propose ERFSL, an efficient reward function searcher using LLMs, which enables LLMs to be effective white-box searchers and highlights their advanced semantic understanding capabilities. Specifically, we generate reward components for each numerically explicit user requirement and employ a reward critic to identify the correct code form. Then, LLMs assign weights to the reward components to balance their values and iteratively adjust the weights without ambiguity and redundant adjustments by flexibly adopting directional mutation and crossover strategies, similar to genetic algorithms, based on the context provided by the training log analyzer. We applied the framework to an underwater data collection RL task without direct human feedback or reward examples (zero-shot learning). The reward critic successfully corrects the reward code with only one feedback instance for each requirement, effectively preventing unrectifiable errors. The initialization of weights enables the acquisition of different reward functions within the Pareto solution set without the need for weight search. Even in cases where a weight is 500 times off, on average, only 5.2 iterations are needed to meet user requirements. The ERFSL also works well with most prompts utilizing GPT-4o mini, as we decompose the weight searching process to reduce the requirement for numerical and long-context understanding capabilities.

翻訳日:2024-11-07 23:45:04 公開日:2024-11-01

# AmazonのアクティブファイアモデリングにおけるLSTMとGRUを用いたニューラルネットワーク

Neural Networks with LSTM and GRU in Modeling Active Fires in the Amazon ( http://arxiv.org/abs/2409.02681v4 )

ライセンス: Link先を確認

Ramon Tavares, Ricardo Olinda,

(参考訳) 本研究は,ブラジルのアマゾンにあるAqua_M-T衛星によって検出された活動点の歴史的時系列をモデル化し,予測するための包括的方法論を提案する。このアプローチでは、Long Short-Term Memory(LSTM)とGated Recurrent Unit(GRU)アーキテクチャを組み合わせた混合リカレントニューラルネットワーク(RNN)モデルを採用して、毎日検出されたアクティブファイアスポットの月次蓄積を予測する。データ分析の結果、一貫した季節性を示し、年間最大値と最低値が毎年同じ期間に繰り返される傾向があった。主な目的は、予測が機械学習技術によってこの固有の季節を捉えているかどうかを検証することである。この手法は,2種の種子を用いたクロスバリデーションを用いたデータ準備,モデル構成,トレーニングを慎重に行い,両種子の試験および検証セットの両方にデータを一般化することを保証した。その結果,LSTMモデルとGRUモデルを組み合わせることで予測性能が向上し,複雑な時間パターンを捕捉し,観測時系列をモデル化する効果が示された。本研究は, 環境モニタリングにおける深層学習技術の適用, 特にアクティブファイアスポットの予測に大きく貢献する。提案手法は,他の時系列予測課題への適応の可能性を強調し,機械学習の研究開発と自然現象の予測に新たな機会を開く。キーワード:時系列予測、リカレントニューラルネットワーク、ディープラーニング。

This study presents a comprehensive methodology for modeling and forecasting the historical time series of active fire spots detected by the AQUA_M-T satellite in the Amazon, Brazil. The approach employs a mixed Recurrent Neural Network (RNN) model, combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures to predict the monthly accumulations of daily detected active fire spots. Data analysis revealed a consistent seasonality over time, with annual maximum and minimum values tending to repeat at the same periods each year. The primary objective is to verify whether the forecasts capture this inherent seasonality through machine learning techniques. The methodology involved careful data preparation, model configuration, and training using cross-validation with two seeds, ensuring that the data generalizes well to both the test and validation sets for both seeds. The results indicate that the combined LSTM and GRU model delivers excellent forecasting performance, demonstrating its effectiveness in capturing complex temporal patterns and modeling the observed time series. This research significantly contributes to the application of deep learning techniques in environmental monitoring, specifically in forecasting active fire spots. The proposed approach highlights the potential for adaptation to other time series forecasting challenges, opening new opportunities for research and development in machine learning and prediction of natural phenomena. Keywords: Time Series Forecasting; Recurrent Neural Networks; Deep Learning.

翻訳日:2024-11-07 23:34:03 公開日:2024-11-01
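
The abstract above forecasts monthly accumulations of daily fire-spot detections. A minimal sketch of the data-preparation step this implies is shown below: summing daily counts per month and cutting the monthly series into input windows for a recurrent model. The function names, the input format, and the `lookback` parameter are assumptions for illustration; the authors' actual preprocessing is not described in the abstract.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_counts):
    """Sum (date, count) pairs into (year, month) buckets, in chronological order."""
    totals = defaultdict(int)
    for day, count in daily_counts:
        totals[(day.year, day.month)] += count
    return sorted(totals.items())


def sliding_windows(series, lookback):
    """Turn a 1-D series into (input window, next value) training pairs."""
    return [
        (series[i:i + lookback], series[i + lookback])
        for i in range(len(series) - lookback)
    ]
```

For a strongly seasonal series like the one described, a `lookback` of 12 months would be a natural starting point, since each window then spans a full annual cycle. That choice is a suggestion, not something stated in the source.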

# AmazonのアクティブファイアモデリングにおけるLSTMとGRUを用いたニューラルネットワーク

Neural Networks with LSTM and GRU in Modeling Active Fires in the Amazon ( http://arxiv.org/abs/2409.02681v5 )

ライセンス: Link先を確認

Ramon Tavares, Ricardo Olinda,

翻訳日:2024-11-07 23:34:03 公開日:2024-11-01

# AmazonのアクティブファイアモデリングにおけるLSTMとGRUを用いたニューラルネットワーク

Neural Networks with LSTM and GRU in Modeling Active Fires in the Amazon ( http://arxiv.org/abs/2409.02681v6 )

ライセンス: Link先を確認

Ramon Tavares, Ricardo Olinda,

翻訳日:2024-11-07 23:34:03 公開日:2024-11-01

# CMM-Math:大規模マルチモーダルモデルの数学推論の評価と拡張を目的とした中国のマルチモーダル数学データセット

CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models ( http://arxiv.org/abs/2409.02834v3 )

ライセンス: Link先を確認

Wentao Liu, Qianjun Pan, Yi Zhang, Zhuo Liu, Ji Wu, Jie Zhou, Aimin Zhou, Qin Chen, Bo Jiang, Liang He,

(参考訳) 大規模言語モデル(LLM)は、人間の知能の基礎となる数学的推論において有望な結果を得た。従来の研究は、テキスト数学推論データセット(例えば、MATH、GSM8K)に基づくLLMの性能改善と測定に重点を置いていた。最近、数人の研究者が大規模なマルチモーダルモデル(LMM)の有効性を評価するために、英語のマルチモーダル数学データセット(例えば、MATHVISTA、MATH-V)をリリースした。本稿では,LMMの数学的推論を評価するために,ベンチマークやトレーニング部品を含む中国のマルチモーダル数学(CMM-Math)データセットをリリースする。 CMM-Mathには28,000以上の高品質なサンプルが含まれており、中国の小学校から高校まで、12段階の詳細なソリューションを備えた様々な問題タイプ(例えば、多重選択、ブランクの補充など)が特徴である。特に、視覚的コンテキストは質問や意見の中に存在し、このデータセットをより困難にします。包括的分析により、CMM-Mathデータセット上の最先端のLMMが課題に直面しており、LMM開発におけるさらなる改善の必要性を強調している。また,複数画像とテキストセグメントの混合入力による問題に対処するマルチモーダル数学的LMM(Math-LMM)を提案する。基礎的な事前学習、基礎的な微調整、数学的微調整を含む3つの段階を用いてモデルを訓練する。より広範な実験により,本モデルは3つのマルチモーダルな数学的データセット上でのSOTA LMMと比較することにより,数学推論性能を効果的に向上することが示された。

Large language models (LLMs) have obtained promising results in mathematical reasoning, which is a foundational skill for human intelligence. Most previous studies focus on improving and measuring the performance of LLMs based on textual math reasoning datasets (e.g., MATH, GSM8K). Recently, a few researchers have released English multimodal math datasets (e.g., MATHVISTA and MATH-V) to evaluate the effectiveness of large multimodal models (LMMs). In this paper, we release a Chinese multimodal math (CMM-Math) dataset, including benchmark and training parts, to evaluate and enhance the mathematical reasoning of LMMs. CMM-Math contains over 28,000 high-quality samples, featuring a variety of problem types (e.g., multiple-choice, fill-in-the-blank, and so on) with detailed solutions across 12 grade levels from elementary to high school in China. Specifically, the visual context may be present in the questions or opinions, which makes this dataset more challenging. Through comprehensive analysis, we discover that state-of-the-art LMMs on the CMM-Math dataset face challenges, emphasizing the necessity for further improvements in LMM development. We also propose a Multimodal Mathematical LMM (Math-LMM) to handle the problems with mixed input of multiple images and text segments. We train our model using three stages, including foundational pre-training, foundational fine-tuning, and mathematical fine-tuning. The extensive experiments indicate that our model effectively improves math reasoning performance by comparing it with the SOTA LMMs over three multimodal mathematical datasets.

翻訳日:2024-11-07 23:34:03 公開日:2024-11-01

# LLMによる競争性市場行動に関する実験的研究

An Experimental Study of Competitive Market Behavior Through LLMs ( http://arxiv.org/abs/2409.08357v2 )

ライセンス: Link先を確認

Jingru Jia, Zehua Yuan,

(参考訳) 本研究では,市場実験を行うための大規模言語モデル (LLM) の可能性について検討し,競争市場のダイナミクスを理解する能力を理解することを目的とした。我々は,市場エージェントの行動を制御された実験環境でモデル化し,競争均衡に向けて収束する能力を評価する。その結果,人間の取引行動に特徴的な動的意思決定プロセスの複製において,LLMが直面する課題が明らかになった。人間とは異なり、LLMは市場均衡を達成する能力に欠けていた。この研究は、LLMがスケーラブルで再現可能な市場シミュレーションのための貴重なツールを提供する一方で、現在の制限は市場行動の複雑さを完全に捉えるためにさらなる進歩を必要としていることを実証している。動的学習能力を高め、行動経済学の要素を取り入れた将来の仕事は、経済領域におけるLLMの有効性を改善し、市場のダイナミクスに関する新たな洞察を提供し、経済政策の洗練に寄与する。

This study explores the potential of large language models (LLMs) to conduct market experiments, aiming to understand their capability to comprehend competitive market dynamics. We model the behavior of market agents in a controlled experimental setting, assessing their ability to converge toward competitive equilibria. The results reveal the challenges current LLMs face in replicating the dynamic decision-making processes characteristic of human trading behavior. Unlike humans, LLMs lacked the capacity to achieve market equilibrium. The research demonstrates that while LLMs provide a valuable tool for scalable and reproducible market simulations, their current limitations necessitate further advancements to fully capture the complexities of market behavior. Future work that enhances dynamic learning capabilities and incorporates elements of behavioral economics could improve the effectiveness of LLMs in the economic domain, providing new insights into market dynamics and aiding in the refinement of economic policies.

翻訳日:2024-11-07 21:20:36 公開日:2024-11-01

# 差分プライバシーのためのセキュアサンプリングプロトコルのベンチマーク

Benchmarking Secure Sampling Protocols for Differential Privacy ( http://arxiv.org/abs/2409.10667v2 )

ライセンス: Link先を確認

Yucheng Fu, Tianhao Wang,

(参考訳) 差分プライバシー(DP)は、集約されたデータからの情報漏洩を制限することにより、個人に対してプライバシー保護を提供するために広く利用されている。 DPの2つのよく知られたモデルは、中心モデルと局所モデルである。前者はデータアグリゲーションに信頼できるサーバを必要とし、後者は個人がノイズを加えることを必要とし、集約された結果の有用性を著しく低下させる。近年,分散環境でのセキュアなマルチパーティ計算(MPC)によるDPの実現,すなわち,特定のセキュリティ前提の下では,中央モデルに匹敵するユーティリティを持つ分散モデルの実現が提案されている。分散モデルにおけるDPを実現する一つの課題は、MPCで効率的にノイズをサンプリングすることである。多くの安全なサンプリング法が提案されているが、それらは異なるセキュリティ仮定と独立した理論解析を持っている。パフォーマンスを計測し比較する実験的な評価が不足しています。我々は、既存のサンプリングプロトコルをMPCでベンチマークし、その効率を総合的に測定することで、このギャップを埋める。まず,これらのサンプリングプロトコルの基礎となる手法の分類について述べる。第二に、広く使われている分散ノイズ発生プロトコルを拡張して、ビザンチン攻撃に対する耐性を高める。第3に、離散サンプリングプロトコルを実装し、セキュリティ設定を公平に比較する。そして、その効率性と有用性を研究するために、広範囲な評価を行う。

Differential privacy (DP) is widely employed to provide privacy protection for individuals by limiting information leakage from the aggregated data. Two well-known models of DP are the central model and the local model. The former requires a trustworthy server for data aggregation, while the latter requires individuals to add noise, significantly decreasing the utility of aggregated results. Recently, many studies have proposed to achieve DP with Secure Multi-party Computation (MPC) in distributed settings, namely, the distributed model, which has utility comparable to the central model while, under specific security assumptions, preventing parties from obtaining others' information. One challenge of realizing DP in the distributed model is efficiently sampling noise with MPC. Although many secure sampling methods have been proposed, they have different security assumptions and isolated theoretical analyses. There is a lack of experimental evaluations to measure and compare their performances. We fill this gap by benchmarking existing sampling protocols in MPC and performing comprehensive measurements of their efficiency. First, we present a taxonomy of the underlying techniques of these sampling protocols. Second, we extend widely used distributed noise generation protocols to be resilient against Byzantine attackers. Third, we implement discrete sampling protocols and align their security settings for a fair comparison. We then conduct an extensive evaluation to study their efficiency and utility.

翻訳日:2024-11-07 20:24:11 公開日:2024-11-01
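
To make the idea of distributed noise generation concrete, the sketch below uses a classic construction: a Laplace variable can be written as a sum of per-party shares, each being the difference of two Gamma(1/n, b) draws. It runs in the clear with Python's `random` module and involves no MPC or Byzantine resilience at all, so it only illustrates the arithmetic. Whether this particular construction is among the protocols the paper benchmarks is not stated in the abstract, and the function names are hypothetical.

```python
import random


def partial_noise(n_parties, scale, rng):
    """One party's noise share: the difference of two Gamma(1/n, scale) draws."""
    shape = 1.0 / n_parties
    return rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)


def distributed_laplace(n_parties, scale, rng):
    """Sum of all parties' shares, which is distributed as Laplace(0, scale)."""
    return sum(partial_noise(n_parties, scale, rng) for _ in range(n_parties))
```

The construction works because n independent Gamma(1/n, b) variables sum to an exponential variable with scale b, and the difference of two such exponentials is Laplace with scale b. No single party's share reveals the total noise, which is the property a secure protocol would then have to enforce against misbehaving parties.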

# 真空から魔法を放つ

Harvesting magic from the vacuum ( http://arxiv.org/abs/2409.11473v2 )

ライセンス: Link先を確認

Ron Nyström, Nicola Pranzini, Esko Keski-Vakkuri,

(参考訳) Magic(マジック)は、量子コンピュータが古典的な計算によって効率的にシミュレートできない操作を実行できる量子リソースである。そのため、量子システムにおける魔法の生成は、量子上の優位性を達成するために不可欠である。この手紙は、初期真空状態の量子場と相互作用する3レベルのUnruh-DeWitt検出器(量子ビット)によって魔法を収穫できることを示している。量子場理論(QFT)から資源を抽出するという考え方は、絡み合いの収穫から生まれたものであるが、この結果は、石英を非魔法の状態から魔法状態へと進化させるプロトコルを拡張し、QFTから魔法を生成することができる。

Magic is the quantum resource allowing a quantum computer to perform operations that cannot be simulated efficiently by classical computation. As such, generating magic in a quantum system is crucial for achieving quantum advantage. This letter shows that magic can be harvested by a three-level Unruh-DeWitt detector (a qutrit) interacting with a quantum field in an initial vacuum state. While the idea of extracting resources from Quantum Field Theories (QFT) was born from the harvesting of entanglement, our result extends the protocol to evolve a qutrit from a non-magical state to a magical one, making it possible to generate magic from QFT.

翻訳日:2024-11-07 20:01:55 公開日:2024-11-01

# TART: 説明可能なテーブルベースの推論のためのオープンソースのツール拡張フレームワーク

TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning ( http://arxiv.org/abs/2409.11724v2 )

ライセンス: Link先を確認

Xinyuan Lu, Liangming Pan, Yubo Ma, Preslav Nakov, Min-Yen Kan,

(参考訳) 現在のLarge Language Models (LLMs) は、テーブル構造を理解し、正確な数値推論を適用する能力に限界があり、これはテーブル質問応答(TQA)やテーブルベースの事実検証(TFV)といったタスクに不可欠である。これらの課題に対処するために、特殊なツールとLLMを統合するTART(Tool-Augmented Reasoning framework for Tables)を紹介します。 TARTには、正確なデータ表現を保証するテーブルフォーマッター、特定の計算ツールを開発するツールメーカー、説明可能性を維持するための説明ジェネレータの3つの重要なコンポーネントが含まれている。また、テーブル-ツール統合におけるLLMのトレーニングに特化して設計された新しいベンチマークであるTOOLTABデータセットも提示する。実験の結果,データ処理の精度と推論プロセスの明確さを両立させることにより,既存の手法(例えばChain-of-Thought)よりも大幅に改善できることが示唆された。特に、CodeLlamaと組み合わせたTARTは、クローズドソースのLCM GPT-3.5-turboの精度の90.0%を達成し、さまざまな実世界のシナリオにおける堅牢性を強調している。すべてのコードとデータはhttps://github.com/XinyuanLu00/TARTで入手できる。

Current Large Language Models (LLMs) exhibit limited ability to understand table structures and to apply precise numerical reasoning, which is crucial for tasks such as table question answering (TQA) and table-based fact verification (TFV). To address these challenges, we introduce our Tool-Augmented Reasoning framework for Tables (TART), which integrates LLMs with specialized tools. TART contains three key components: a table formatter to ensure accurate data representation, a tool maker to develop specific computational tools, and an explanation generator to maintain explainability. We also present the TOOLTAB dataset, a new benchmark designed specifically for training LLMs in table-tool integration. Our experiments indicate that TART achieves substantial improvements over existing methods (e.g., Chain-of-Thought) by improving both the precision of data processing and the clarity of the reasoning process. Notably, TART paired with CodeLlama achieves 90.0% of the accuracy of the closed-sourced LLM GPT-3.5-turbo, highlighting its robustness in diverse real-world scenarios. All the code and data are available at https://github.com/XinyuanLu00/TART.

翻訳日:2024-11-07 19:50:48 公開日:2024-11-01

# HSIGene:ハイパースペクトル画像生成の基礎モデル

HSIGene: A Foundation Model For Hyperspectral Image Generation ( http://arxiv.org/abs/2409.12470v1 )

ライセンス: Link先を確認

Li Pang, Datao Tang, Shuang Xu, Deyu Meng, Xiangyong Cao,

(参考訳) ハイパースペクトル画像(HSI)は農業や環境モニタリングなど様々な分野で重要な役割を果たしている。しかし、高コストな取得コストのため、ハイパースペクトル画像の数は制限され、下流タスクの性能が低下する。近年、拡散モデルを用いてHSIを合成しようとする研究もあるが、それでもHSIの不足に悩まされ、生成した画像の信頼性と多様性に影響を及ぼす。空間的多様性を高めるためにマルチモーダルデータを組み込むことを提案する研究もあるが、スペクトルの忠実度は保証できない。さらに、既存のHSI合成モデルは、通常は制御不能または単一条件制御のみをサポートし、正確で信頼性の高いHSIを生成する能力を制限する。これらの問題を緩和するため,我々は遅延拡散に基づく新しいHSI生成基盤モデルであるHSIGeneを提案し,より正確で信頼性の高いHSI生成を実現する。スペクトル密度を保ちながらトレーニングデータの空間的多様性を高めるため,空間超解像に基づく新たなデータ拡張手法を提案する。さらに,拡張データの知覚的品質を向上させるために,まずRGBバンドを超解像化し,次に,ガイド付きHSI超解像にRGAN(Rectangular Guided Attention Network)を用いた新しい2段階HSI超解像フレームワークを導入する。実験により,提案モデルでは,デノナイズや超解像といった下流タスクに対して,膨大な量の現実的なHSIを生成することができることが示された。コードとモデルはhttps://github.com/LiPang/HSIGeneで入手できる。

Hyperspectral image (HSI) plays a vital role in various fields such as agriculture and environmental monitoring. However, due to the expensive acquisition cost, the number of hyperspectral images is limited, degenerating the performance of downstream tasks. Although some recent studies have attempted to employ diffusion models to synthesize HSIs, they still struggle with the scarcity of HSIs, affecting the reliability and diversity of the generated images. Some studies propose to incorporate multi-modal data to enhance spatial diversity, but the spectral fidelity cannot be ensured. In addition, existing HSI synthesis models are typically uncontrollable or only support single-condition control, limiting their ability to generate accurate and reliable HSIs. To alleviate these issues, we propose HSIGene, a novel HSI generation foundation model which is based on latent diffusion and supports multi-condition control, allowing for more precise and reliable HSI generation. To enhance the spatial diversity of the training data while preserving spectral fidelity, we propose a new data augmentation method based on spatial super-resolution, in which HSIs are upscaled first, and thus abundant training patches could be obtained by cropping the high-resolution HSIs. In addition, to improve the perceptual quality of the augmented data, we introduce a novel two-stage HSI super-resolution framework, which first applies RGB bands super-resolution and then utilizes our proposed Rectangular Guided Attention Network (RGAN) for guided HSI super-resolution. Experiments demonstrate that the proposed model is capable of generating a vast quantity of realistic HSIs for downstream tasks such as denoising and super-resolution. The code and models are available at https://github.com/LiPang/HSIGene.

翻訳日:2024-11-07 14:41:29 公開日:2024-11-01

# HSIGene:ハイパースペクトル画像生成の基礎モデル

HSIGene: A Foundation Model For Hyperspectral Image Generation ( http://arxiv.org/abs/2409.12470v2 )

ライセンス: Link先を確認

Li Pang, Xiangyong Cao, Datao Tang, Shuang Xu, Xueru Bai, Feng Zhou, Deyu Meng,

翻訳日:2024-11-07 14:41:29 公開日:2024-11-01

# フレームレベル基準に基づく軽量トランスデューサ

Lightweight Transducer Based on Frame-Level Criterion ( http://arxiv.org/abs/2409.13698v1 )

ライセンス: Link先を確認

Genshun Wan, Mengzhi Wang, Tingzhi Mao, Hang Chen, Zhongfu Ye,

(参考訳) シーケンスレベルの基準に基づいてトレーニングされたトランスデューサモデルは、大きな確率行列を生成するため、多くのメモリを必要とする。我々は,CTC強制アライメントアルゴリズムの結果を用いて,フレーム単位のラベルを決定する軽量トランスデューサモデルを提案する。そして、デコーダ出力は、トランスデューサのように、デコーダ出力の各素子にエンコーダ出力を付加するのではなく、対応するタイミングでデコーダ出力と組み合わせることができる。これにより、メモリと計算の要求が大幅に削減される。ラベル中の過剰な空白による不均衡な分類の問題に対処するため、空白と非ブランク確率を分離し、空白分類器の勾配をメインネットワークに切り離す。これにより、軽量なトランスデューサがトランスデューサと同じような結果が得られる。さらに、よりリッチな情報を用いてブランクの確率を予測し、トランスデューサに優れた結果を得る。

The transducer model trained based on sequence-level criterion requires a lot of memory due to the generation of the large probability matrix. We propose a lightweight transducer model based on frame-level criterion, which uses the results of the CTC forced alignment algorithm to determine the label for each frame. Then the encoder output can be combined with the decoder output at the corresponding time, rather than adding each element output by the encoder to each element output by the decoder as in the transducer. This significantly reduces memory and computation requirements. To address the problem of imbalanced classification caused by excessive blanks in the label, we decouple the blank and non-blank probabilities and truncate the gradient of the blank classifier to the main network. This enables the lightweight transducer to achieve results similar to those of the transducer. Additionally, we use richer information to predict the probability of blank, achieving results superior to those of the transducer.

翻訳日:2024-11-07 05:57:35 公開日:2024-11-01

# フレームレベル基準に基づく軽量トランスデューサ

Lightweight Transducer Based on Frame-Level Criterion ( http://arxiv.org/abs/2409.13698v2 )

ライセンス: Link先を確認

Genshun Wan, Mengzhi Wang, Tingzhi Mao, Hang Chen, Zhongfu Ye,

(参考訳) シーケンスレベルの基準に基づいてトレーニングされたトランスデューサモデルは、大きな確率行列を生成するため、多くのメモリを必要とする。我々は,CTC強制アライメントアルゴリズムの結果を用いて,フレーム単位のラベルを決定する軽量トランスデューサモデルを提案する。そして、デコーダ出力は、トランスデューサのように、デコーダ出力の各素子にエンコーダ出力を付加するのではなく、対応するタイミングでデコーダ出力と組み合わせることができる。これにより、メモリと計算の要求が大幅に削減される。ラベル中の過剰な空白による不均衡な分類の問題に対処するため、空白と非ブランク確率を分離し、空白分類器の勾配をメインネットワークに切り離す。 AISHELL-1の実験では、軽量トランスデューサがトランスデューサと同じような結果が得られることを示した。さらに、よりリッチな情報を用いてブランクの確率を予測し、トランスデューサに優れた結果を得る。

The transducer model trained based on sequence-level criterion requires a lot of memory due to the generation of the large probability matrix. We proposed a lightweight transducer model based on frame-level criterion, which uses the results of the CTC forced alignment algorithm to determine the label for each frame. Then the encoder output can be combined with the decoder output at the corresponding time, rather than adding each element output by the encoder to each element output by the decoder as in the transducer. This significantly reduces memory and computation requirements. To address the problem of imbalanced classification caused by excessive blanks in the label, we decouple the blank and non-blank probabilities and truncate the gradient of the blank classifier to the main network. Experiments on the AISHELL-1 demonstrate that this enables the lightweight transducer to achieve similar results to transducer. Additionally, we use richer information to predict the probability of blank, achieving superior results to transducer.

翻訳日:2024-11-07 05:57:35 公開日:2024-11-01
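
The memory saving described above comes from pairing each encoder frame with a single decoder state instead of with every decoder state. The sketch below shows one way that pairing could be derived from a per-frame alignment. It assumes the alignment has already been reduced so that each frame carries either a blank or exactly one newly emitted token; the blank id `BLANK = 0` and both function names are hypothetical, and the paper's exact use of CTC forced alignment may differ.

```python
BLANK = 0  # hypothetical blank id, chosen for this example


def decoder_index_per_frame(frame_labels):
    """For each frame, return how many non-blank labels were emitted before it.

    That count selects the decoder state to combine with the frame's
    encoder output.
    """
    indices = []
    emitted = 0
    for label in frame_labels:
        indices.append(emitted)
        if label != BLANK:
            emitted += 1
    return indices


def joint_sizes(num_frames, num_tokens):
    """Number of (frame, decoder state) pairs: full lattice vs. frame-level."""
    return num_frames * (num_tokens + 1), num_frames
```

With 6 frames and 2 tokens, the full transducer lattice has 18 frame/state pairs while the frame-level pairing has 6. The gap grows with utterance length and transcript length, which matches the memory argument in the abstract.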

# RACOON:知識グラフを用いた検索型カラム型アノテーションのためのLLMベースのフレームワーク

RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph ( http://arxiv.org/abs/2409.14556v1 )

ライセンス: Link先を確認

Linxi Wei, Guorui Xiao, Magdalena Balazinska,

(参考訳) データ探索と統合の重要なコンポーネントとして、カラム型アノテーション(CTA)は、1つ以上のセマンティックタイプを持つテーブルの列をラベル付けすることを目的としている。最近のLarge Language Models (LLMs)の開発で、研究者は強力なゼロショット機能を活用して、CTAにLLMを使用する可能性を探り始めた。本稿では、この有望な作業に基づいて、LLMに提供されたコンテキスト情報をKG(Knowledge Graph)を用いて拡張する方法を示すことで、CTAのLCMベースの手法を改善する。 RACOONと呼ばれる我々の手法は、CTAにおけるLLMの性能を向上させるために、事前訓練されたパラメトリック知識と非パラメトリック知識を組み合わせる。実験の結果, RACOONはバニラLEM推定と比較して最大0.21マイクロF-1の改善を達成できた。

As an important component of data exploration and integration, Column Type Annotation (CTA) aims to label columns of a table with one or more semantic types. With the recent development of Large Language Models (LLMs), researchers have started to explore the possibility of using LLMs for CTA, leveraging their strong zero-shot capabilities. In this paper, we build on this promising work and improve on LLM-based methods for CTA by showing how to use a Knowledge Graph (KG) to augment the context information provided to the LLM. Our approach, called RACOON, combines both pre-trained parametric and non-parametric knowledge during generation to improve LLMs' performance on CTA. Our experiments show that RACOON achieves up to a 0.21 micro F-1 improvement compared against vanilla LLM inference.

翻訳日:2024-11-06 22:08:18 公開日:2024-11-01

Lindsey Linxi Wei, Guorui Xiao, Magdalena Balazinska,

翻訳日:2024-11-06 22:08:18 公開日:2024-11-01

# 地上観測衛星ネットワークのための航空深層学習統合意味推論モデル

On-Air Deep Learning Integrated Semantic Inference Models for Enhanced Earth Observation Satellite Networks ( http://arxiv.org/abs/2409.15246v2 )

ライセンス: Link先を確認

Hong-fu Chou, Vu Nguyen Ha, Prabhu Thiruvasagam, Thanh-Dung Le, Geoffrey Eappen, Ti Ti Nguyen, Luis M. Garces-Socarras, Jorge L. Gonzalez-Rios, Juan Carlos Merlano-Duncan, Symeon Chatzinotas,

(参考訳) 地球観測(EO)システムは、衛星ネットワークを通じて重要なグローバルデータを収集・分析することで持続可能な開発目標を達成する上で重要な役割を担っている。これらのシステムは, マッピング, 災害監視, 資源管理といったタスクには不可欠だが, 農業や災害対応などの専門分野において, 大量のEOデータを処理, 送信する上で, 課題に直面している。ドメイン適応型大規模言語モデル(LLM)は、広範なEOデータとセマンティックEOデータとのデータ融合を容易にすることで、有望なソリューションを提供する。多様なデータセットの統合と解釈を改善することで、LLMは農業や災害対応アプリケーションで専門的な情報を処理するという課題に対処する。この融合は送信されたデータの正確性と関連性を高める。本稿では,EO衛星ネットワークにおけるセマンティック通信のためのフレームワークを提案する。提案方式では,ディスクリート・タスク指向のソース・チャネル符号化 (DT-JSCC) とセマンティック・データ拡張 (SA) を用いて,通信オーバーヘッドを最小限に抑えながら関連情報に集中する。認知的セマンティック処理と衛星間リンクを統合することにより、マルチスペクトル衛星画像の解析と伝送を強化し、オブジェクト検出、パターン認識、リアルタイム意思決定を改善する。 CSA(Cognitive Semantic Augmentation)の導入により、衛星はセマンティック情報を処理および送信することができ、環境やアプリケーションニーズの変化への適応性を高めることができる。このエンドツーエンドアーキテクチャは、6Gをサポートする次世代衛星ネットワーク向けに調整されており、効率と精度が大幅に向上している。

Earth Observation (EO) systems play a crucial role in achieving Sustainable Development Goals by collecting and analyzing vital global data through satellite networks. These systems are essential for tasks like mapping, disaster monitoring, and resource management, but they face challenges in processing and transmitting large volumes of EO data, especially in specialized fields such as agriculture and real-time disaster response. Domain-adapted Large Language Models (LLMs) provide a promising solution by facilitating data fusion between extensive EO data and semantic EO data. By improving integration and interpretation of diverse datasets, LLMs address the challenges of processing specialized information in agriculture and disaster response applications. This fusion enhances the accuracy and relevance of transmitted data. This paper presents a framework for semantic communication in EO satellite networks, aimed at improving data transmission efficiency and overall system performance through cognitive processing techniques. The proposed system employs Discrete-Task-Oriented Source-Channel Coding (DT-JSCC) and Semantic Data Augmentation (SA) to focus on relevant information while minimizing communication overhead. By integrating cognitive semantic processing and inter-satellite links, the framework enhances the analysis and transmission of multispectral satellite imagery, improving object detection, pattern recognition, and real-time decision-making. The introduction of Cognitive Semantic Augmentation (CSA) allows satellites to process and transmit semantic information, boosting adaptability to changing environments and application needs. This end-to-end architecture is tailored for next-generation satellite networks, such as those supporting 6G, and demonstrates significant improvements in efficiency and accuracy.

翻訳日:2024-11-06 20:27:58 公開日:2024-11-01

Hong-fu Chou, Vu Nguyen Ha, Prabhu Thiruvasagam, Thanh-Dung Le, Geoffrey Eappen, Ti Ti Nguyen, Luis M. Garces-Socarras, Jorge L. Gonzalez-Rios, Juan Carlos Merlano-Duncan, Symeon Chatzinotas,

(参考訳) 地球観測(EO)システムは、地図作成、災害監視、資源管理に不可欠である。それにもかかわらず、特に精密農業やリアルタイム災害対応といった専門分野において、広範なデータの処理や伝達にかなりの障害に直面している。リモートセンシング技術を備えた地球観測衛星は、オンボードセンサーやIoT対応の地上オブジェクトからデータを収集し、リモートで重要な情報を提供する。ドメイン適応型大規模言語モデル(LLM)は、生および処理されたEOデータの統合を可能にするソリューションを提供する。ドメイン適応により、LLMは多くのデータソースの同化と分析を改善し、農業や災害対応における特別なデータセットの複雑さに対処する。 LLMによって誘導されるこのデータ合成は、伝達された情報の精度とパーシステンスを高める。本研究は,高度なEOシステムのための意味推論と深層学習を徹底的に検討する。 EO衛星ネットワークにおけるセマンティック通信のための革新的なアーキテクチャを提案し,セマンティック処理手法を用いてデータ伝送効率を向上させる。近年のオンボード処理技術の進歩は、軌道上での信頼性、適応性、エネルギー効率の高いデータ管理を可能にしている。これらの改良により、放射線硬化・再構成技術による悪環境における信頼性の高い性能が保証される。これらの進歩により、次世代衛星ミッションの処理能力は向上し、運用の柔軟性と6G衛星通信におけるリアルタイムな意思決定に欠かせないものとなった。

Earth Observation (EO) systems are crucial for cartography, disaster surveillance, and resource administration. Nonetheless, they encounter considerable obstacles in the processing and transmission of extensive data, especially in specialized domains such as precision agriculture and real-time disaster response. Earth observation satellites, outfitted with remote sensing technology, gather data from onboard sensors and IoT-enabled terrestrial objects, delivering important information remotely. Domain-adapted Large Language Models (LLMs) provide a solution by enabling the integration of raw and processed EO data. Through domain adaptation, LLMs improve the assimilation and analysis of many data sources, tackling the intricacies of specialized datasets in agriculture and disaster response. This data synthesis, directed by LLMs, enhances the precision and pertinence of conveyed information. This study provides a thorough examination of using semantic inference and deep learning for sophisticated EO systems. It presents an innovative architecture for semantic communication in EO satellite networks, designed to improve data transmission efficiency using semantic processing methodologies. Recent advancements in onboard processing technologies enable dependable, adaptable, and energy-efficient data management in orbit. These improvements guarantee reliable performance in adverse space circumstances using radiation-hardened and reconfigurable technology. Collectively, these advancements enable next-generation satellite missions with improved processing capabilities, crucial for operational flexibility and real-time decision-making in 6G satellite communication.

翻訳日:2024-11-06 20:27:58 公開日:2024-11-01

# IIoTにおけるデータ不均一性を考慮した表面欠陥分類のための対向的フェデレーション・コンセンサス学習

Adversarial Federated Consensus Learning for Surface Defect Classification Under Data Heterogeneity in IIoT ( http://arxiv.org/abs/2409.15711v2 )

ライセンス: Link先を確認

Jixuan Cui, Jun Li, Zhen Mei, Yiyang Ni, Wen Chen, Zengxiang Li,

(参考訳) データ不足の課題は、産業用表面欠陥分類(SDC)におけるディープラーニングの適用を妨げる。プライバシー上の懸念から、産業用モノのインターネット(IIoT)のさまざまなエンティティから十分なトレーニングデータを収集、集中させることが難しいからだ。フェデレートラーニング(FL)は、プライバシを維持しながら、クライアント間で協調的なグローバルモデルトレーニングを可能にするソリューションを提供する。しかし、クライアント間でのデータ分散が不均一であるためにパフォーマンスが低下する可能性がある。本稿では,SDC の異なるクライアント間でのデータの異質性に挑戦するために,Adversarial Federated Consensus Learning (AFedCL) という新しいパーソナライズされた FL (PFL) アプローチを提案する。まず,データの不均一性による性能劣化を軽減するために,動的コンセンサス構築戦略を開発する。敵対的トレーニングを通じて、異なるクライアントのローカルモデルは、グローバルモデルをブリッジとして利用し、分散アライメントを実現し、グローバル知識の忘れる問題を緩和する。この戦略を補完し,コンセンサスを考慮したアグリゲーション機構を提案する。グローバルな知識学習における有効性に基づいて、集約重みを異なるクライアントに割り当て、グローバルなモデルの一般化能力を高める。最後に,グローバルな知識利用効率を高めるために,適応的な特徴融合モジュールを設計する。パーソナライズされた融合重みは、グローバルな特徴とローカルな特徴を最適にバランスするために、各クライアントに対して徐々に調整される。 FedALAのような最先端のFL法と比較して、提案手法は3つのSDCデータセットで最大5.67%の精度向上を実現する。

The challenge of data scarcity hinders the application of deep learning in industrial surface defect classification (SDC), as it is difficult to collect and centralize sufficient training data from various entities in the Industrial Internet of Things (IIoT) due to privacy concerns. Federated learning (FL) provides a solution by enabling collaborative global model training across clients while maintaining privacy. However, performance may suffer due to data heterogeneity, that is, discrepancies in data distributions among clients. In this paper, we propose a novel personalized FL (PFL) approach, named Adversarial Federated Consensus Learning (AFedCL), for the challenge of data heterogeneity across different clients in SDC. First, we develop a dynamic consensus construction strategy to mitigate the performance degradation caused by data heterogeneity. Through adversarial training, local models from different clients utilize the global model as a bridge to achieve distribution alignment, alleviating the problem of global knowledge forgetting. Complementing this strategy, we propose a consensus-aware aggregation mechanism. It assigns aggregation weights to different clients based on their efficacy in global knowledge learning, thereby enhancing the global model's generalization capabilities. Finally, we design an adaptive feature fusion module to further enhance global knowledge utilization efficiency. Personalized fusion weights are gradually adjusted for each client to optimally balance global and local features. Compared with state-of-the-art FL methods like FedALA, the proposed AFedCL method achieves an accuracy increase of up to 5.67% on three SDC datasets.

翻訳日:2024-11-06 19:32:29 公開日:2024-11-01
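
The AFedCL abstract above describes a consensus-aware aggregation step that weights clients by how well they learn global knowledge. A minimal sketch of that idea follows. It is not the authors' implementation: the function name and the `consensus_scores` input are assumptions made for illustration, since the abstract does not say how the per-client efficacy is measured.

```python
from typing import Dict, List


def consensus_weighted_average(
    client_params: Dict[str, List[float]],
    consensus_scores: Dict[str, float],
) -> List[float]:
    """Average client parameter vectors, weighted by a per-client score.

    `consensus_scores` is a hypothetical stand-in for each client's
    "efficacy in global knowledge learning"; higher means more weight.
    """
    total = sum(consensus_scores[client] for client in client_params)
    if total <= 0:
        raise ValueError("consensus scores must sum to a positive value")

    dim = len(next(iter(client_params.values())))
    aggregated = [0.0] * dim
    for client, params in client_params.items():
        weight = consensus_scores[client] / total  # normalized aggregation weight
        for i, value in enumerate(params):
            aggregated[i] += weight * value
    return aggregated
```

With equal scores this reduces to a plain unweighted mean of the client vectors, which makes the effect of the scores easy to isolate in an experiment.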

# VascXモデル:カラー眼底画像からの網膜血管解析のためのモデルアンサンブル

VascX Models: Model Ensembles for Retinal Vascular Analysis from Color Fundus Images ( http://arxiv.org/abs/2409.16016v2 )

ライセンス: Link先を確認

Jose Vargas Quiros, Bart Liefers, Karin van Garderen, Jeroen Vermeulen, Eyened Reading Center, Sinergia Consortium, Caroline Klaver,

(参考訳) 本稿では,カラー眼底画像(CFI)から網膜血管を解析するための包括的なモデルアンサンブル群であるVascXモデルを紹介する。アノテーション付きCFIは公開データセットから集約された。さらに,主に人口ベースのロッテルダム研究(Rotterdam Study)から得た追加のCFIに,グレーダーが動脈と静脈をピクセルレベルでアノテーションし,患者の人口統計と撮影条件の両面で多様なデータセットを得た。 VascXモデルは、既存の公開モデルと比較して、データセット、画像品質レベル、解剖学的領域のいずれにおいても優れたセグメンテーション性能を示した。これは訓練セットの規模と多様性が増したためと考えられる。動脈・静脈および視神経乳頭のセグメンテーション性能,特に大規模コホートや臨床データセットで一般的な中程度の品質のCFIにおいて,重要な改善が認められた。重要な点として,VascXのセグメンテーションマスクから抽出した特徴を,従来モデルが生成したマスクから抽出した特徴と比較すると,これらの改善は有意に正確な血管特徴量につながった。 VascXモデルでは、実装を簡素化し、自動網膜血管解析の品質を向上させることを目的とした、堅牢ですぐに使えるモデルアンサンブルと推論コードを提供する。モデルが生成する正確な血管パラメータは、眼の内外における疾患パターンを特定するための出発点となり得る。

We introduce VascX models, a comprehensive set of model ensembles for analyzing retinal vasculature from color fundus images (CFIs). Annotated CFIs were aggregated from public datasets. Additional CFIs, mainly from the population-based Rotterdam Study, were annotated by graders for arteries and veins at pixel level, resulting in a dataset diverse in patient demographics and imaging conditions. VascX models demonstrated superior segmentation performance across datasets, image quality levels, and anatomic regions when compared to existing, publicly available models, likely due to the increased size and variety of our training set. Important improvements were observed in artery-vein and disc segmentation performance, particularly in segmentations of these structures on CFIs of intermediate quality, common in large cohorts and clinical datasets. Importantly, these improvements translated into significantly more accurate vascular features when we compared features extracted from VascX segmentation masks with features extracted from segmentation masks generated by previous models. With VascX models we provide a robust, ready-to-use set of model ensembles and inference code aimed at simplifying the implementation and enhancing the quality of automated retinal vasculature analyses. The precise vessel parameters generated by the model can serve as starting points for the identification of disease patterns in and outside of the eye.

翻訳日:2024-11-06 18:04:33 公開日:2024-11-01

# Plurals:シミュレートされたソーシャルアンサンブルによるLLM誘導システム

Plurals: A System for Guiding LLMs Via Simulated Social Ensembles ( http://arxiv.org/abs/2409.17213v3 )

ライセンス: Link先を確認

Joshua Ashkinaze, Emily Fry, Narendra Edara, Eric Gilbert, Ceren Budak,

(参考訳) 近年の議論は、言語モデルが特定の視点を好むのではないかという懸念を提起した。しかし、もし解決策が「どこでもない場所からの視点(view from nowhere)」を目指すことではなく、むしろ異なる視点を活用することにあるとしたらどうだろうか? 本稿では,多元的AIによる熟議のためのシステムでありPythonライブラリであるPluralsを紹介する。 Pluralsは、カスタマイズ可能な構造(Structure)の中で熟議するエージェント(LLM、オプションでペルソナを持つ)と、熟議を監督するモデレーターから構成される。 Pluralsは、シミュレートされたソーシャルアンサンブルのジェネレータである。 Pluralsは政府データセットと連携して全国を代表するペルソナを作成し、民主的熟議理論に触発された熟議テンプレートを含み、ユーザーは情報共有構造と構造内の熟議行動の両方をカスタマイズできる。 6つのケーススタディは、理論的構成概念への忠実さと有効性を示している。 3つのランダム化実験は、シミュレートされたフォーカスグループが、関連する対象者のオンラインサンプルに響く出力を生成したことを示した(75%の試行でゼロショット生成よりも選ばれた)。 Pluralsは多元的AIのためのパラダイムであり、具体的なシステムでもある。 Pluralsライブラリは https://github.com/josh-ashkinaze/plurals で公開されており、継続的に更新される。

Recent debates raised concerns that language models may favor certain viewpoints. But what if the solution is not to aim for a 'view from nowhere' but rather to leverage different viewpoints? We introduce Plurals, a system and Python library for pluralistic AI deliberation. Plurals consists of Agents (LLMs, optionally with personas) which deliberate within customizable Structures, with Moderators overseeing deliberation. Plurals is a generator of simulated social ensembles. Plurals integrates with government datasets to create nationally representative personas, includes deliberation templates inspired by democratic deliberation theory, and allows users to customize both information-sharing structures and deliberation behavior within Structures. Six case studies demonstrate fidelity to theoretical constructs and efficacy. Three randomized experiments show simulated focus groups produced output resonant with an online sample of the relevant audiences (chosen over zero-shot generation in 75% of trials). Plurals is both a paradigm and a concrete system for pluralistic AI. The Plurals library is available at https://github.com/josh-ashkinaze/plurals and will be continually updated.

翻訳日:2024-11-06 16:30:51 公開日:2024-11-01

# Plurals:シミュレートされたソーシャルアンサンブルによるLLM誘導システム

Plurals: A System for Guiding LLMs Via Simulated Social Ensembles ( http://arxiv.org/abs/2409.17213v4 )

ライセンス: Link先を確認

Joshua Ashkinaze, Emily Fry, Narendra Edara, Eric Gilbert, Ceren Budak,

翻訳日:2024-11-06 16:30:51 公開日:2024-11-01

# Uni-Med: Connector-MoEによるマルチタスク学習のための統一医療ジェネラリスト基盤モデル

Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE ( http://arxiv.org/abs/2409.17508v2 )

ライセンス: Link先を確認

Xun Zhu, Ying Hu, Fanbin Mo, Miao Li, Ji Wu,

(参考訳) MLLM(Multi-modal large language model)は、様々な視覚的・言語的タスクのための汎用インタフェースとして、印象的な機能を示している。しかし、医療分野におけるマルチタスク学習のための統一MLLMの構築は、依然として厄介な課題である。 MLLMにおけるマルチモーダルマルチタスク最適化の綱引き問題を軽減するため、近年の進歩は、モダリティ間のギャップを埋めるコネクタを無視しつつ、LLMコンポーネントの改善に重点を置いている。本稿では,汎用的な視覚特徴抽出モジュール,コネクタ・ミクスチャ・オブ・エキスパート(CMoE)モジュール,およびLLMから構成される,新しい医療ジェネラリスト基盤モデルであるUni-Medを紹介する。コネクタにおいて、よく設計されたルータと射影エキスパートの混合を活用するCMoEにより、Uni-Medは綱引き問題に対する効率的な解決策を実現し、質問応答、視覚的質問応答、レポート生成、参照表現理解、参照表現生成、画像分類を含む6つの異なる医療タスクを実行できる。我々の知る限り、Uni-MedはMLLMのコネクタにおけるマルチタスク干渉に対処する最初の試みである。大規模なアブレーション実験により、任意の構成でCMoEを導入する効果が検証され、平均で最大8%の性能向上が得られた。さらに、勾配最適化とパラメータ統計の観点から、綱引き問題の解釈分析を行う。従来の最先端の医療MLLMと比較すると、Uni-Medは多様なタスクにおいて同等または優れた評価指標を達成している。コードとリソースは https://github.com/tsinghua-msiip/Uni-Med で入手できる。

Multi-modal large language models (MLLMs) have shown impressive capabilities as a general-purpose interface for various visual and linguistic tasks. However, building a unified MLLM for multi-task learning in the medical field remains a thorny challenge. To mitigate the tug-of-war problem of multi-modal multi-task optimization in MLLMs, recent advances primarily focus on improving the LLM components, while neglecting the connector that bridges the gap between modalities. In this paper, we introduce Uni-Med, a novel medical generalist foundation model which consists of a universal visual feature extraction module, a connector mixture-of-experts (CMoE) module, and an LLM. Benefiting from the proposed CMoE that leverages a well-designed router with a mixture of projection experts at the connector, Uni-Med achieves efficient solution to the tug-of-war problem and can perform six different medical tasks including question answering, visual question answering, report generation, referring expression comprehension, referring expression generation and image classification. To the best of our knowledge, Uni-Med is the first effort to tackle multi-task interference at the connector in MLLMs. Extensive ablation experiments validate the effectiveness of introducing CMoE under any configuration, with up to an average 8% performance gains. We further provide interpretation analysis of the tug-of-war problem from the perspective of gradient optimization and parameter statistics. Compared to previous state-of-the-art medical MLLMs, Uni-Med achieves competitive or superior evaluation metrics on diverse tasks. Code and resources are available at https://github.com/tsinghua-msiip/Uni-Med.

翻訳日:2024-11-06 16:20:44 公開日:2024-11-01
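
The Uni-Med abstract above places a router and a mixture of projection experts at the connector between the visual encoder and the LLM. The sketch below shows one common way such a layer can work: soft routing of a single visual token over linear experts. The soft (dense) gating, the function names, and the matrix shapes are all assumptions; the abstract does not specify the routing scheme the authors use.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def connector_moe(token: Vector, router: Matrix, experts: List[Matrix]) -> Vector:
    """Project one visual token with a soft mixture of projection experts.

    `router` has one row per expert; each expert is a projection matrix.
    """
    gates = softmax(matvec(router, token))          # one gate per expert
    outputs = [matvec(expert, token) for expert in experts]
    dim = len(outputs[0])
    return [sum(g * out[i] for g, out in zip(gates, outputs)) for i in range(dim)]
```

A sparse variant would keep only the top-scoring experts per token; the dense form is used here because it is the shortest correct illustration of gated mixing.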

# MetaMath:大規模言語モデルにおける数学的推論強化のための自然言語とコードの統合

MetaMath: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models ( http://arxiv.org/abs/2409.19381v1 )

ライセンス: Link先を確認

Xuyuan Xiong, Simeng Han, Ziyue Zhou, Arman Cohan,

(参考訳) 大規模言語モデル(LLM)は、自然言語、コード、あるいは両者の組み合わせという形式で数学的推論問題の解を生成するために一般的に用いられる。本稿では,GPT-4o-mini や LLama-3.1-8b-Turbo など,最先端の LLM を用いて自然言語とコードで数学的推論問題を解くことに関する基本的な問いを探る。その結果,LLMはコードよりも自然言語での推論に優れていることがわかった。さらに、自然言語とコードは相補的な推論形式として機能するが、特定のシナリオでは互いに悪影響を及ぼすことがある。これらの知見は, LLMを利用して最適な推論形式を動的に選択し, GPT-4o-miniを用いた同等のベースラインよりも性能を向上させる新たなプロンプト手法MetaMathの開発を動機付けている。

Large Language Models (LLMs) are commonly used to generate solutions for mathematical reasoning problems in the following formats: natural language, code, or a combination of both. In this paper, we explore fundamental questions related to solving mathematical reasoning problems using natural language and code with state-of-the-art LLMs, including GPT-4o-mini and LLama-3.1-8b-Turbo. Our findings show that LLMs are better at reasoning in natural language compared to code. Additionally, although natural language and code serve as complementary forms of reasoning, they can affect each other in a negative way in certain scenarios. These insights motivate our development of a new prompting method, MetaMath, which leverages an LLM to dynamically select the most appropriate reasoning form, resulting in improved performance over comparable baselines with GPT-4o-mini.

翻訳日:2024-11-05 23:38:55 公開日:2024-11-01

# INC-Math:大規模言語モデルにおける数学的推論強化のための自然言語とコードの統合

INC-Math: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models ( http://arxiv.org/abs/2409.19381v2 )

ライセンス: Link先を確認

Xuyuan Xiong, Simeng Han, Ziyue Zhou, Arman Cohan,

(参考訳) 大規模言語モデル(LLM)は、自然言語、コード、あるいは両者の組み合わせという、以下の形式で数学的推論問題の解を生成するために一般的に用いられる。本稿では,GPT-4o-mini や LLama-3.1-8b-Turbo など,最先端の LLM を用いた自然言語とコードを用いた数学的推論問題の解法に関する基礎的考察を行う。その結果,LLMはコードよりも自然言語の推論が優れていることがわかった。さらに、自然言語とコードは相補的な推論の形式として機能するが、特定のシナリオでは負の形で互いに影響しあうことができる。これらの知見は, LLM を利用して最適推論形式を動的に選択し, GPT-4o-mini で同等のベースライン上での性能を向上させる新しいプロンプト手法 INC-Math の開発を動機付けている。

Large Language Models (LLMs) are commonly used to generate solutions for mathematical reasoning problems in the following formats: natural language, code, or a combination of both. In this paper, we explore fundamental questions related to solving mathematical reasoning problems using natural language and code with state-of-the-art LLMs, including GPT-4o-mini and LLama-3.1-8b-Turbo. Our findings show that LLMs are better at reasoning in natural language compared to code. Additionally, although natural language and code serve as complementary forms of reasoning, they can affect each other in a negative way in certain scenarios. These insights motivate our development of a new prompting method, INC-Math, which leverages an LLM to dynamically select the most appropriate reasoning form, resulting in improved performance over comparable baselines with GPT-4o-mini.

翻訳日:2024-11-05 23:38:55 公開日:2024-11-01
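
The INC-Math abstract above says an LLM dynamically selects the most appropriate reasoning form for each problem. The control flow can be sketched as a small dispatcher. Everything here is hypothetical scaffolding: `choose_form` stands in for the LLM-based selector and `solvers` for the natural-language and code pipelines, none of which the abstract specifies.

```python
from typing import Callable, Dict

Solver = Callable[[str], str]


def solve_with_selected_form(
    problem: str,
    choose_form: Callable[[str], str],
    solvers: Dict[str, Solver],
    default_form: str = "natural_language",
) -> str:
    """Route a problem to the solver for the selected reasoning form.

    Falls back to `default_form` when the selector returns an unknown label,
    which guards against free-form selector output.
    """
    form = choose_form(problem)
    if form not in solvers:
        form = default_form
    return solvers[form](problem)
```

In practice the selector and both solvers would each be a prompted model call; plain callables keep the sketch testable without any API.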

# 不完全データを用いたロバストなマルチモーダル感情分析に向けて

Towards Robust Multimodal Sentiment Analysis with Incomplete Data ( http://arxiv.org/abs/2409.20012v1 )

ライセンス: Link先を確認

Haoyu Zhang, Wenbin Wang, Tianshu Yu,

(参考訳) マルチモーダル・センティメント・アナリティクス(MSA)の分野は、データ不完全性の問題に対処する新たな方向性を最近見てきた。言語モダリティには通常、密度の強い感情情報が含まれていることを認識し、これを支配的なモダリティとみなし、堅牢なMSAを実現するために、言語に支配された耐雑音学習ネットワーク(LNLN)を提案する。提案したLNLNは、支配的モダリティ補正(DMC)モジュールと支配的モダリティベースマルチモーダル学習(DMML)モジュールを備え、支配的モダリティ表現の品質を保証することにより、様々なノイズシナリオにおけるモデルの堅牢性を高める。方法論的な設計とは別に,いくつかの一般的なデータセット(例: MOSI, MOSEI, SIMS)の多様かつ有意義な設定を利用して,ランダムなデータ不足シナリオ下で総合的な実験を行い,文献における既存の評価に比べて統一性,透明性,公正性を付加する。経験的に、LNLNは既存のベースラインを一貫して上回り、これらの挑戦的で広範な評価指標よりも優れたパフォーマンスを示している。

The field of Multimodal Sentiment Analysis (MSA) has recently witnessed an emerging direction seeking to tackle the issue of data incompleteness. Recognizing that the language modality typically contains dense sentiment information, we consider it as the dominant modality and present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust MSA. The proposed LNLN features a dominant modality correction (DMC) module and dominant modality based multimodal learning (DMML) module, which enhances the model's robustness across various noise scenarios by ensuring the quality of dominant modality representations. Aside from the methodical design, we perform comprehensive experiments under random data missing scenarios, utilizing diverse and meaningful settings on several popular datasets (e.g., MOSI, MOSEI, and SIMS), providing additional uniformity, transparency, and fairness compared to existing evaluations in the literature. Empirically, LNLN consistently outperforms existing baselines, demonstrating superior performance across these challenging and extensive evaluation metrics.

翻訳日:2024-11-05 16:08:18 公開日:2024-11-01
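
The LNLN abstract above evaluates models under random data missing scenarios. One simple way to simulate such a scenario is to blank out each time step of a modality with a fixed probability. This is a sketch under stated assumptions: zero-filling whole time steps and the `missing_rate` parameter are illustrative choices, not the evaluation protocol the authors describe.

```python
import random
from typing import List, Optional


def drop_randomly(
    sequence: List[List[float]],
    missing_rate: float,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Return a copy of `sequence` with time steps zeroed at `missing_rate`.

    Each inner list is the feature vector for one time step. A seeded
    generator keeps the corruption reproducible across runs.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be in [0, 1]")
    rng = random.Random(seed)
    return [
        [0.0] * len(step) if rng.random() < missing_rate else list(step)
        for step in sequence
    ]
```

Sweeping `missing_rate` from 0.0 to 1.0 with a fixed seed gives a reproducible robustness curve for a given model.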

# 不完全データを用いたロバストなマルチモーダル感情分析に向けて

Towards Robust Multimodal Sentiment Analysis with Incomplete Data ( http://arxiv.org/abs/2409.20012v2 )

ライセンス: Link先を確認

Haoyu Zhang, Wenbin Wang, Tianshu Yu,

翻訳日:2024-11-05 16:08:18 公開日:2024-11-01

# Embodied Agent Interface: Embodied Decision Making のための LLM ベンチマーク

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making ( http://arxiv.org/abs/2410.07166v2 )

ライセンス: Link先を確認

Manling Li, Shiyu Zhao, Qineng Wang, Kangrui Wang, Yu Zhou, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu,

(参考訳) 我々は,エンボディド(身体性を持つ)意思決定における大規模言語モデル (LLM) の評価を目指している。エンボディド環境における意思決定にLLMを活用する研究は数多く存在するが、それらは通常、異なるドメインに、異なる目的で適用され、異なる入力と出力に基づいて構築されているため、その性能に関する体系的な理解はいまだに欠けている。さらに、既存の評価は最終的な成功率のみに依存する傾向があり、LLMにどの能力が欠けているのか、問題がどこにあるのかを特定することが困難であり、結果として、エンボディドエージェントがLLMを効果的かつ選択的に活用することを妨げている。これらの制約に対処するために,多種多様なタスクの形式化とLLMベースのモジュールの入出力仕様をサポートする汎用インタフェース(Embodied Agent Interface)を提案する。具体的には、以下を統一的に扱うことができる。 1) 状態目標と時間的に拡張された目標の両方を含む、幅広いエンボディド意思決定タスク。 2) 意思決定に広く用いられる4つのLLMベースのモジュール:ゴール解釈、サブゴール分解、アクションシークエンシング、トランジションモデリング。 3) 評価を、幻覚エラー、アフォーダンスエラー、様々な種類の計画エラーなど、さまざまな種類のエラーに分解する詳細な指標群。総合的に、我々のベンチマークは、異なるサブタスクに対するLLMの性能を包括的に評価し、LLM駆動のエンボディドAIシステムの強みと弱みを特定し、エンボディド意思決定においてLLMを効果的かつ選択的に活用するための洞察を提供する。

We aim to evaluate Large Language Models (LLMs) for embodied decision making. While a significant body of work has been leveraging LLMs for decision making in embodied environments, we still lack a systematic understanding of their performance because they are usually applied in different domains, for different purposes, and built based on different inputs and outputs. Furthermore, existing evaluations tend to rely solely on a final success rate, making it difficult to pinpoint what ability is missing in LLMs and where the problem lies, which in turn blocks embodied agents from leveraging LLMs effectively and selectively. To address these limitations, we propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks and input-output specifications of LLM-based modules. Specifically, it allows us to unify 1) a broad set of embodied decision-making tasks involving both state and temporally extended goals, 2) four commonly-used LLM-based modules for decision making: goal interpretation, subgoal decomposition, action sequencing, and transition modeling, and 3) a collection of fine-grained metrics which break down evaluation into various types of errors, such as hallucination errors, affordance errors, various types of planning errors, etc. Overall, our benchmark offers a comprehensive assessment of LLMs' performance for different subtasks, pinpointing the strengths and weaknesses in LLM-powered embodied AI systems, and providing insights for effective and selective use of LLMs in embodied decision making.

翻訳日:2024-11-05 14:59:58 公開日:2024-11-01
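
The Embodied Agent Interface abstract above argues for breaking evaluation into error types rather than reporting only a final success rate. A minimal tally of that kind can be sketched as follows. The label strings and the function name are illustrative assumptions; the benchmark's actual metric definitions are not given in the abstract.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def error_breakdown(outcomes: Iterable[Optional[str]]) -> Dict[str, float]:
    """Return the share of successes and of each error type.

    Each outcome is None for a successful episode, or an error label such
    as "hallucination" or "affordance" for a failed one.
    """
    labels = ["success" if outcome is None else outcome for outcome in outcomes]
    if not labels:
        raise ValueError("at least one outcome is required")
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}
```

Running this per module (goal interpretation, subgoal decomposition, and so on) gives the kind of per-stage error profile the abstract describes.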

# FedGraph:フェデレーショングラフ学習のための研究ライブラリとベンチマーク

FedGraph: A Research Library and Benchmark for Federated Graph Learning ( http://arxiv.org/abs/2410.06340v2 )

ライセンス: Link先を確認

Yuhang Yao, Yuan Li, Xinyi Fan, Junhao Li, Kay Liu, Weizhao Jin, Srivatsan Ravi, Philip S. Yu, Carlee Joe-Wong,

(参考訳) フェデレーショングラフ学習は、重要な実践上の課題を持つ新興分野である。大規模グラフ上のノード分類問題に対するグラフニューラルネットワークのトレーニング精度を高めるために,多くのアルゴリズムが提案されているが,現実の展開に不可欠であるにもかかわらず,そのシステム性能は見過ごされがちである。このギャップに対処するため、フェデレーショングラフ学習において、実用的な分散デプロイメントとベンチマークのための研究ライブラリであるFedGraphを紹介した。 FedGraphは、最先端のグラフ学習メソッドをサポートし、トレーニング中の通信と計算コストに特化して、システムパフォーマンスを評価するための組み込みのプロファイリングツールを含んでいる。既存のベンチマークプラットフォームとは異なり、FedGraphは同型暗号化をネイティブに取り入れてプライバシ保護を強化し、複数の物理マシンをまたいだ分散トレーニングを可能にし、将来のフェデレーショングラフ学習アルゴリズムのシステム設計をガイドする評価フレームワークを提供することにより、実用的なアプリケーションの開発を促進する。これらの最適化を活用して、1億のノードを持つグラフ上で実行される最初のプライバシ保護フェデレーション学習システムを示すために、FedGraphを使用します。

Federated graph learning is an emerging field with significant practical challenges. While many algorithms have been proposed to enhance the accuracy of training graph neural networks, e.g., for node classification problems on large graphs, in a federated manner, their system performance is often overlooked, even though it is crucial for real-world deployment. To address this gap, we introduce FedGraph, a research library built for practical distributed deployment and benchmarking in federated graph learning. FedGraph supports a range of state-of-the-art graph learning methods and includes built-in profiling tools to evaluate system performance, focusing specifically on communication and computation costs during training. Unlike existing benchmark platforms, FedGraph natively incorporates homomorphic encryption to enhance privacy preservation and facilitates the development of practical applications by enabling distributed training across multiple physical machines, providing an evaluation framework that can guide the system design of future federated graph learning algorithms. Leveraging these optimizations, we use FedGraph to demonstrate the first privacy-preserving federated learning system to run on graphs with 100 million nodes.

翻訳日:2024-11-05 14:50:13 公開日:2024-11-01

# 量子アドバンテージの暗号的特徴付け

Cryptographic Characterization of Quantum Advantage ( http://arxiv.org/abs/2410.00499v1 )

ライセンス: Link先を確認

Tomoyuki Morimae, Yuki Shirakawa, Takashi Yamakawa,

(参考訳) 量子計算の優位性は、量子コンピューティングでは容易だが古典的な計算では困難である計算タスクの存在を指す。無条件で量子優位性を示すことは、現在の複雑性理論の理解を超えており、従っていくつかの計算量的仮定が必要である。どの複雑性の仮定が量子優位性にとって必要十分か? 本稿では,古典的に安全な一方向パズル(OWPuzzs)が存在する場合、かつその場合に限り、非効率検証者による量子性証明(IV-PoQ)が存在することを示す。私たちが知る限りでは、量子優位性の完全な暗号的特徴付けが得られたのはこれが初めてである。 IV-PoQは、サンプリングの優位性や探索の優位性など、以前に研究された様々な種類の量子優位性を捉えている。以前の研究[Morimae and Yamakawa 2024]では、IV-PoQがOWFから構築できることが示されたが、より弱い仮定からのIV-PoQの構築は未解決であった。私たちの結果はこの未解決問題を解決する。 OWPuzzsは、一方向関数(OWF)よりも弱い多くの量子暗号プリミティブから導かれる、最も基本的な量子暗号プリミティブの1つである。したがって、IV-PoQと古典的に安全なOWPuzzsの同値性は、量子優位性がなければ、これらの基本的なプリミティブは存在しないことを強調する。この同値性はまた、量子優位性がOWPuzzsの応用の一例であることを意味する。コミットメントを除いて、OWPuzzsの応用は以前には知られていなかった。私たちの結果は、量子優位性がOWPuzzsのもう一つの応用であることを示す。さらに、これはOWPuzzsの最初の量子計算古典通信(QCCC)応用である。

Quantum computational advantage refers to an existence of computational tasks that are easy for quantum computing but hard for classical one. Unconditionally showing quantum advantage is beyond our current understanding of complexity theory, and therefore some computational assumptions are needed. Which complexity assumption is necessary and sufficient for quantum advantage? In this paper, we show that inefficient-verifier proofs of quantumness (IV-PoQ) exist if and only if classically-secure one-way puzzles (OWPuzzs) exist. As far as we know, this is the first time that a complete cryptographic characterization of quantum advantage is obtained. IV-PoQ capture various types of quantum advantage previously studied, such as sampling advantage and searching advantage. Previous work [Morimae and Yamakawa 2024] showed that IV-PoQ can be constructed from OWFs, but a construction of IV-PoQ from weaker assumptions was left open. Our result solves the open problem. OWPuzzs are one of the most fundamental quantum cryptographic primitives implied by many quantum cryptographic primitives weaker than one-way functions (OWFs). The equivalence between IV-PoQ and classically-secure OWPuzzs therefore highlights that if there is no quantum advantage, then these fundamental primitives do not exist. The equivalence also means that quantum advantage is an example of the applications of OWPuzzs. Except for commitments, no application of OWPuzzs was known before. Our result shows that quantum advantage is another application of OWPuzzs. Moreover, it is the first quantum computation classical communication (QCCC) application of OWPuzzs.

翻訳日:2024-11-05 05:16:55 公開日:2024-11-01

# 量子アドバンテージの暗号的特徴付け

Cryptographic Characterization of Quantum Advantage ( http://arxiv.org/abs/2410.00499v2 )

ライセンス: Link先を確認

Tomoyuki Morimae, Yuki Shirakawa, Takashi Yamakawa,

(参考訳) 量子計算の優位性は、量子コンピューティングでは容易だが古典的な計算では困難である計算タスクの存在を指す。無条件で量子優位性を示すことは、現在の複雑性理論の理解を超えており、従っていくつかの計算量的仮定が必要である。どの複雑性の仮定が量子優位性にとって必要十分か? 本稿では,古典的に安全な一方向パズル(OWPuzzs)が存在する場合、かつその場合に限り、非効率検証者による量子性証明(IV-PoQ)が存在することを示す。私たちが知る限りでは、量子優位性の完全な暗号的特徴付けが得られたのはこれが初めてである。 IV-PoQは、サンプリングベースの量子優位性や探索ベースの量子優位性など、以前に研究された様々な種類の量子優位性を捉えている。以前の研究[Morimae and Yamakawa, Crypto 2024]では、IV-PoQがOWFから構築できることが示されたが、より弱い仮定からのIV-PoQの構築は未解決であった。私たちの結果はこの未解決問題を解決する。 OWPuzzsは、一方向関数(OWF)よりも弱い多くの量子暗号プリミティブから導かれる、最も基本的な量子暗号プリミティブの1つである。したがって、IV-PoQと古典的に安全なOWPuzzsの同値性は、量子優位性がなければ、これらの基本的なプリミティブは存在しないことを強調する。この同値性はまた、量子優位性がOWPuzzsの応用の一例であることを意味する。コミットメントを除いて、OWPuzzsの応用は以前には知られていなかった。私たちの結果は、量子優位性がOWPuzzsのもう一つの応用であることを示し、[Chung, Goldin, and Gray, Crypto 2024] の未解決問題を解決する。さらに、これはOWPuzzsの最初の量子計算古典通信(QCCC)応用である。

Quantum computational advantage refers to an existence of computational tasks that are easy for quantum computing but hard for classical one. Unconditionally showing quantum advantage is beyond our current understanding of complexity theory, and therefore some computational assumptions are needed. Which complexity assumption is necessary and sufficient for quantum advantage? In this paper, we show that inefficient-verifier proofs of quantumness (IV-PoQ) exist if and only if classically-secure one-way puzzles (OWPuzzs) exist. As far as we know, this is the first time that a complete cryptographic characterization of quantum advantage is obtained. IV-PoQ capture various types of quantum advantage previously studied, such as sampling-based quantum advantage and searching-based one. Previous work [Morimae and Yamakawa, Crypto 2024] showed that IV-PoQ can be constructed from OWFs, but a construction of IV-PoQ from weaker assumptions was left open. Our result solves the open problem. OWPuzzs are one of the most fundamental quantum cryptographic primitives implied by many quantum cryptographic primitives weaker than one-way functions (OWFs). The equivalence between IV-PoQ and classically-secure OWPuzzs therefore highlights that if there is no quantum advantage, then these fundamental primitives do not exist. The equivalence also means that quantum advantage is an example of the applications of OWPuzzs. Except for commitments, no application of OWPuzzs was known before. Our result shows that quantum advantage is another application of OWPuzzs, which solves the open question of [Chung, Goldin, and Gray, Crypto 2024]. Moreover, it is the first quantum-computation-classical-communication (QCCC) application of OWPuzzs.

翻訳日:2024-11-05 05:16:55 公開日:2024-11-01

# STONE: アクティブ3次元オブジェクト検出のためのサブモジュール最適化フレームワーク

STONE: A Submodular Optimization Framework for Active 3D Object Detection ( http://arxiv.org/abs/2410.03918v2 )

ライセンス: Link先を確認

Ruiyu Mao, Sarthak Kumar Maharana, Rishabh K Iyer, Yunhui Guo,

(参考訳) 3Dオブジェクト検出は、自律運転やロボット工学など、様々な新興アプリケーションにとって基本的に重要である。正確な3Dオブジェクト検出器をトレーニングするための重要な要件は、大量のLiDARベースのポイントクラウドデータが利用可能であることである。残念なことに、ポイントクラウドデータのラベル付けは非常に難しい。本稿では,3次元物体検出装置のトレーニングにおけるラベル付けコストを大幅に削減する,統合されたアクティブな3次元物体検出フレームワークを提案する。本フレームワークは, アクティブな3次元物体検出の問題に特化して, サブモジュラー最適化の新たな定式化を基礎としている。特に, アクティブな3Dオブジェクト検出に関連する2つの基本的な課題に対処する: データ不均衡と, 様々な難易度を持つLiDARベースのポイントクラウドデータを含むデータの分布をカバーする必要性。大規模実験により,本手法は既存の能動学習法と比較して,高い計算効率で最先端の性能を達成できることが実証された。コードはhttps://github.com/RuiyuM/STONEで入手できる。

3D object detection is fundamentally important for various emerging applications, including autonomous driving and robotics. A key requirement for training an accurate 3D object detector is the availability of a large amount of LiDAR-based point cloud data. Unfortunately, labeling point cloud data is extremely challenging, as accurate 3D bounding boxes and semantic labels are required for each potential object. This paper proposes a unified active 3D object detection framework, for greatly reducing the labeling cost of training 3D object detectors. Our framework is based on a novel formulation of submodular optimization, specifically tailored to the problem of active 3D object detection. In particular, we address two fundamental challenges associated with active 3D object detection: data imbalance and the need to cover the distribution of the data, including LiDAR-based point cloud data of varying difficulty levels. Extensive experiments demonstrate that our method achieves state-of-the-art performance with high computational efficiency compared to existing active learning methods. The code is available at https://github.com/RuiyuM/STONE.

翻訳日:2024-11-04 21:09:23 公開日:2024-11-01
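
The STONE abstract above frames active sample selection as submodular optimization. The standard tool for such objectives is greedy selection, sketched here with a facility-location objective over pairwise similarities. The objective is a textbook stand-in chosen for illustration, not the formulation proposed in the paper, and the function names are hypothetical.

```python
from typing import List, Sequence


def marginal_gain(
    similarity: Sequence[Sequence[float]],
    covered: Sequence[float],
    candidate: int,
) -> float:
    """Facility-location gain of adding `candidate` to the selected set."""
    return sum(
        max(0.0, similarity[i][candidate] - covered[i])
        for i in range(len(covered))
    )


def greedy_select(similarity: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Greedily pick up to `budget` items maximizing facility-location coverage.

    `similarity[i][j]` is a non-negative similarity between items i and j.
    """
    n = len(similarity)
    covered = [0.0] * n          # best similarity of each item to the selection
    selected: List[int] = []
    for _ in range(min(budget, n)):
        best, best_gain = -1, -1.0
        for candidate in range(n):
            if candidate in selected:
                continue
            gain = marginal_gain(similarity, covered, candidate)
            if gain > best_gain:  # strict comparison keeps the earliest on ties
                best, best_gain = candidate, gain
        selected.append(best)
        covered = [max(covered[i], similarity[i][best]) for i in range(n)]
    return selected
```

For monotone submodular objectives like this one, greedy selection carries the classical (1 - 1/e) approximation guarantee, which is why it is the usual baseline for budgeted labeling.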

# オーバーコンプリート画素上のコントラスト学習による動きぼけ画像の位置合わせ

Aligning Motion-Blurred Images Using Contrastive Learning on Overcomplete Pixels ( http://arxiv.org/abs/2410.07410v2 )

ライセンス: Link先を確認

Leonid Pogorelyuk, Stefan T. Radev,

(参考訳) 動きのぼかしに不変なオーバーコンプリート画素レベルの特徴を学習するための新しいコントラスト的目的を提案する。他の不変性(例えば、ポーズ、照明、天候)は、自己監督訓練中にラベルのない画像に対応する変換を適用することで学習することができる。我々の目的を訓練した単純なU-Netは、現実的で困難な条件下で撮影される見えないビデオのフレームを移動カメラに合わせるのに有用なローカル機能を生み出すことができることを実証する。また、慎重にデザインされた玩具の例を用いて、画像中のオブジェクトの同一性やそれらのオブジェクトに対する画素座標を符号化できることも示す。

We propose a new contrastive objective for learning overcomplete pixel-level features that are invariant to motion blur. Other invariances (e.g., pose, illumination, or weather) can be learned by applying the corresponding transformations on unlabeled images during self-supervised training. We showcase that a simple U-Net trained with our objective can produce local features useful for aligning the frames of an unseen video captured with a moving camera under realistic and challenging conditions. Using a carefully designed toy example, we also show that the overcomplete pixels can encode the identity of objects in an image and the pixel coordinates relative to these objects.

翻訳日:2024-11-04 21:09:23 公開日:2024-11-01

# In-Context Transfer Learning: 類似タスクの転移によるデモンストレーション合成

In-Context Transfer Learning: Demonstration Synthesis by Transferring Similar Tasks ( http://arxiv.org/abs/2410.01548v1 )

ライセンス: Link先を確認

Dingzirui Wang, Xuanliang Zhang, Qiguang Chen, Longxu Dou, Xiao Xu, Rongyu Cao, Yingwei Ma, Qingfu Zhu, Wanxiang Che, Binhua Li, Fei Huang, Yongbin Li,

(参考訳) In-context learning (ICL) は、対象タスクのデモンストレーションを与えることで、大規模言語モデル(LLM)を様々なタスクに適応させる効果的なアプローチである。デモンストレーションのラベル付けコストが高いことから、多くの手法がLLMを用いてデモンストレーションをゼロから合成することを提案している。しかし、ゼロから合成されたデモンストレーションの品質は、LLMの能力と知識によって制限される。そこで本稿では、転移学習に着想を得て、類似するソースタスクのラベル付きデモンストレーションを転移することで対象タスクのデモンストレーションを合成するIn-Context Transfer Learning (ICTL)を提案する。ICTLはソースサンプリングとターゲット転移の2つのステップから構成される。まず、転移誤差を最小化する最適化目標を定義し、対象タスクに類似したソースデモンストレーションをサンプリングする。次に、LLMを用いて、サンプリングしたソースデモンストレーションを対象タスクの定義と形式に合わせて対象タスクへ転移する。Super-NIでの実験の結果、ICTLはゼロからの合成を平均2.0%上回り、本手法の有効性が示された。

In-context learning (ICL) is an effective approach to help large language models (LLMs) adapt to various tasks by providing demonstrations of the target task. Considering the high cost of labeling demonstrations, many methods propose synthesizing demonstrations from scratch using LLMs. However, the quality of the demonstrations synthesized from scratch is limited by the capabilities and knowledge of LLMs. To address this, inspired by transfer learning, we propose In-Context Transfer Learning (ICTL), which synthesizes target task demonstrations by transferring labeled demonstrations from similar source tasks. ICTL consists of two steps: source sampling and target transfer. First, we define an optimization objective, which minimizes transfer error to sample source demonstrations similar to the target task. Then, we employ LLMs to transfer the sampled source demonstrations to the target task, matching the definition and format of the target task. Experiments on Super-NI show that ICTL outperforms synthesis from scratch by 2.0% on average, demonstrating the effectiveness of our method.

翻訳日:2024-11-04 17:04:38 公開日:2024-11-01

# 量子コミットメントと量子一方向性のオラクル分離

Oracle Separation Between Quantum Commitments and Quantum One-wayness ( http://arxiv.org/abs/2410.03358v2 )

ライセンス: Link先を確認

John Bostanci, Boyang Chen, Barak Nehoran,

(参考訳) 量子コミットメントは存在するが、(効率的に検証可能な)一方向状態生成器は存在しないようなユニタリ量子オラクルが存在することを示す。どちらも、暗号における最小の仮定、すなわち計算量的暗号全体から導かれる最も弱い暗号学的仮定として、一方向性関数に代わる候補と広く考えられてきた。最近の研究で、一方向状態生成器からコミットメントを構成できることが示されたが、逆方向は未解決のままであった。我々の結果はあらゆるブラックボックス構成を排除し、この重要な未解決問題に決着をつけるものであり、量子コミットメント(およびそれと等価なクラスであるEFIペア、量子紛失通信、安全な量子マルチパーティ計算)が、既知のすべての暗号プリミティブの中で厳密に最も弱いとみられることを示唆している。

We show that there exists a unitary quantum oracle relative to which quantum commitments exist but no (efficiently verifiable) one-way state generators exist. Both have been widely considered candidates for replacing one-way functions as the minimal assumption for cryptography: the weakest cryptographic assumption implied by all of computational cryptography. Recent work has shown that commitments can be constructed from one-way state generators, but the other direction has remained open. Our results rule out any black-box construction, and thus settle this crucial open problem, suggesting that quantum commitments (as well as its equivalency class of EFI pairs, quantum oblivious transfer, and secure quantum multiparty computation) appear to be strictly weakest among all known cryptographic primitives.