Fugu-MT: Translations of arXiv Papers

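This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translations are produced with Fugu-Machine Translator.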

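The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).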

Papers with a publication date of 2024-10-04.

Title	Authors	Abstract	Publication date / translation date
# Spectrum-Aware Debiasing: A Modern Inference Framework with Applications to Principal Components Regression ( http://arxiv.org/abs/2309.07810v4 ) License: see link	Yufan Li, Pragya Sur	Debiasing is a fundamental concept in high-dimensional statistics. While degrees-of-freedom adjustment is the state-of-the-art technique in high-dimensional linear regression, it is limited to i.i.d. samples and sub-Gaussian covariates. These constraints hinder its broader practical use. Here, we introduce Spectrum-Aware Debiasing--a novel method for high-dimensional regression. Our approach applies to problems with structured dependencies, heavy tails, and low-rank structures. Our method achieves debiasing through a rescaled gradient descent step, deriving the rescaling factor using spectral information of the sample covariance matrix. The spectrum-based approach enables accurate debiasing in much broader contexts. We study the common modern regime where the number of features and samples scale proportionally. We establish asymptotic normality of our proposed estimator (suitably centered and scaled) under various convergence notions when the covariates are right-rotationally invariant. Such designs have garnered recent attention due to their crucial role in compressed sensing. Furthermore, we devise a consistent estimator for its asymptotic variance. Our work has two notable by-products: first, we use Spectrum-Aware Debiasing to correct bias in principal components regression (PCR), providing the first debiased PCR estimator in high dimensions. Second, we introduce a principled test for checking alignment between the signal and the eigenvectors of the sample covariance matrix. This test is independently valuable for statistical methods developed using approximate message passing, leave-one-out, or convex Gaussian min-max theorems. We demonstrate our method through simulated and real data experiments. Technically, we connect approximate message passing algorithms with debiasing and provide the first proof of the Cauchy property of vector approximate message passing (V-AMP).	Translated: 2024-11-09 14:28:50 Published: 2024-10-04
# Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model ( http://arxiv.org/abs/2405.03958v2 ) License: see link	Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu	Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to the EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.	Translated: 2024-11-09 02:52:29 Published: 2024-10-04
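The abstract above does not spell out how the LoRA conditioning is wired in, so the following is only a minimal sketch of one plausible reading: a frozen projection weight plus a few low-rank updates whose mixing weights depend on the conditioning signal. The names `lora_conditioned_projection`, `lora_pairs`, and `mix_weights` are hypothetical and not taken from the paper or any library.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_conditioned_projection(x, base_weight, lora_pairs, mix_weights):
    """Compute (W + sum_k mix_k * B_k A_k) x without forming the full sum.

    base_weight : d_out x d_in matrix W (left unchanged).
    lora_pairs  : list of (A_k, B_k) with A_k of shape r x d_in and
                  B_k of shape d_out x r (the low-rank factors).
    mix_weights : one scalar per pair; assumed here to be produced from
                  the time or class embedding by some other module.
    """
    out = matvec(base_weight, x)
    for (a, b), mix in zip(lora_pairs, mix_weights):
        low_rank = matvec(a, x)        # project down to rank r
        update = matvec(b, low_rank)   # project back up to d_out
        out = [o + mix * u for o, u in zip(out, update)]
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
pair = ([[1.0, 1.0]], [[1.0], [0.0]])  # rank-1 factors A and B
print(lora_conditioned_projection([1.0, 2.0], identity, [pair], [0.5]))
```

With all mixing weights set to zero the projection reduces to the unmodified base layer, which is what makes this kind of addition "drop-in".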
# A fixed phase tunable directional coupler based on coupling tuning ( http://arxiv.org/abs/2405.13660v2 ) License: see link	Yang Yang, Tim Weiss, Hamed Arianfard, Akram Youssry, Alberto Peruzzo	The field of photonic integrated circuits has witnessed significant progress in recent years, with a growing demand for devices that offer high-performance reconfigurability. Due to the inability of conventional tunable directional couplers (TDCs) to maintain a fixed phase while tuning the reflectivity, Mach-Zehnder interferometers (MZIs) are employed as the primary building blocks for reflectivity tuning in constructing large-scale circuits. However, MZIs are prone to fabrication errors due to the need for perfectly balanced directional couplers to achieve 0-1 reflectivity, which hinders their scalability. In this study, we introduce a design of a TDC based on coupling constant tuning in the thin film Lithium Niobate platform and present an optimized design. Our optimized TDC design enables arbitrary reflectivity tuning while ensuring a consistent phase across a wide range of operating wavelengths. Furthermore, it exhibits fewer bending sections than MZIs and is inherently resilient to fabrication errors in waveguide geometry and coupling length compared to both MZIs and conventional TDCs. Our work contributes to developing high-performance photonic integrated circuits with implications for various fields, including optical communication systems and quantum information processing.	Translated: 2024-11-09 02:18:45 Published: 2024-10-04
# LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech ( http://arxiv.org/abs/2407.04280v2 ) License: see link	Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang, Juho Kim	Prevalent ungrammatical expressions and disfluencies in spontaneous speech from second language (L2) learners pose unique challenges to Automatic Speech Recognition (ASR) systems. However, few datasets are tailored to L2 learner speech. We publicly release LearnerVoice, a dataset consisting of 50.04 hours of audio and transcriptions of L2 learners' spontaneous speech. Our linguistic analysis reveals that transcriptions in our dataset contain L2S (L2 learner's Spontaneous speech) features, consisting of ungrammatical expressions and disfluencies (e.g., filler words, word repetitions, self-repairs, false starts), significantly more than native speech datasets. Fine-tuning whisper-small.en with LearnerVoice achieves a WER of 10.26%, 44.2% lower than vanilla whisper-small.en. Furthermore, our qualitative analysis indicates that 54.2% of errors from the vanilla model on LearnerVoice are attributable to L2S features, with 48.1% of them being reduced in the fine-tuned model.	Translated: 2024-11-08 23:57:53 Published: 2024-10-04
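For readers unfamiliar with the WER figure quoted in the LearnerVoice abstract, the sketch below computes word error rate as word-level edit distance divided by reference length. The helper `implied_baseline` assumes that "44.2% lower" is a relative reduction; the abstract does not state this explicitly, and the baseline value it yields is not reported in the abstract. Both function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def implied_baseline(wer_after: float, relative_reduction: float) -> float:
    """Baseline implied by wer_after = baseline * (1 - relative_reduction)."""
    return wer_after / (1.0 - relative_reduction)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
print(implied_baseline(10.26, 0.442))
```

Under the relative-reduction assumption, the vanilla model's WER would be roughly 18.4%; under an absolute reading it would be far higher, so the figure should be treated as a back-of-the-envelope check only.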
# Differentially Private Inductive Miner ( http://arxiv.org/abs/2407.04595v2 ) License: see link	Max Schulze, Yorck Zisgen, Moritz Kirschte, Esfandiar Mohammadi, Agnes Koschmider	Protecting personal data about individuals, such as event traces in process mining, is an inherently difficult task since an event trace leaks information about the path in a process model that an individual has triggered. Yet, prior anonymization methods of event traces like k-anonymity or event log sanitization struggled to protect against such leakage, in particular against adversaries with sufficient background knowledge. In this work, we provide a method that tackles the challenge of summarizing sensitive event traces by learning the underlying process tree in a privacy-preserving manner. We prove via the so-called Differential Privacy (DP) property that from the resulting summaries no useful inference can be drawn about any personal data in an event trace. On the technical side, we introduce a differentially private approximation (DPIM) of the Inductive Miner. Experimentally, we compare our DPIM with the Inductive Miner on 14 real-world event traces by evaluating well-known metrics: fitness, precision, simplicity, and generalization. The experiments show that our DPIM not only protects personal data but also generates faithful process trees that exhibit little utility loss above the Inductive Miner.	Translated: 2024-11-08 23:46:45 Published: 2024-10-04
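The abstract above does not describe DPIM's internals, so the sketch below shows only the generic Laplace mechanism applied to a histogram of trace variants, as a reminder of what a differential privacy guarantee on event-log summaries can look like. It is not the authors' algorithm. It assumes each individual contributes exactly one trace and that the list of candidate variants is fixed in advance and does not depend on the data; `dp_variant_counts` and its parameters are hypothetical names.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_variant_counts(traces, candidate_variants, epsilon, rng):
    """Release noisy counts for a fixed, data-independent list of variants.

    Adding or removing one trace changes exactly one count by 1, so the
    L1 sensitivity is 1 and Laplace noise with scale 1/epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_counts = Counter(tuple(trace) for trace in traces)
    scale = 1.0 / epsilon
    return {
        variant: true_counts.get(variant, 0) + laplace_noise(scale, rng)
        for variant in candidate_variants
    }


log = [["register", "check", "pay"], ["register", "pay"], ["register", "check", "pay"]]
variants = [("register", "check", "pay"), ("register", "pay"), ("register", "cancel")]
print(dp_variant_counts(log, variants, epsilon=1.0, rng=random.Random(0)))
```

Smaller values of `epsilon` add more noise and give stronger protection; a process discovery algorithm run on such noisy summaries trades some fidelity for that protection, which is the kind of utility loss the abstract evaluates.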
# CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ( http://arxiv.org/abs/2407.07087v2 ) License: see link	Tong Chen, Akari Asai, Niloofar Mireshghallah, Sewon Min, James Grimmelmann, Yejin Choi, Hannaneh Hajishirzi, Luke Zettlemoyer, Pang Wei Koh	Evaluating the degree of reproduction of copyright-protected content by language models (LMs) is of significant interest to the AI and legal communities. Although both literal and non-literal similarities are considered by courts when assessing the degree of reproduction, prior research has focused only on literal similarities. To bridge this gap, we introduce CopyBench, a benchmark designed to measure both literal and non-literal copying in LM generations. Using copyrighted fiction books as text sources, we provide automatic evaluation protocols to assess literal and non-literal copying, balanced against the model utility in terms of the ability to recall facts from the copyrighted works and generate fluent completions. We find that, although literal copying is relatively rare, two types of non-literal copying -- event copying and character copying -- occur even in models as small as 7B parameters. Larger models demonstrate significantly more copying, with literal copying rates increasing from 0.2% to 10.5% and non-literal copying from 2.3% to 5.9% when comparing Llama3-8B and 70B models, respectively. We further evaluate the effectiveness of current strategies for mitigating copying and show that (1) training-time alignment can reduce literal copying but may increase non-literal copying, and (2) current inference-time mitigation methods primarily reduce literal but not non-literal copying.	Translated: 2024-11-08 22:51:19 Published: 2024-10-04
# Entanglement asymmetry in conformal field theory and holography ( http://arxiv.org/abs/2407.07969v2 ) License: see link	Francesco Benini, Victor Godet, Amartya Harsh Singh	Entanglement asymmetry is a measure of symmetry breaking in quantum subsystems, inspired by quantum information theory, particularly suited to study out-of-equilibrium states. We study the entanglement asymmetry of a class of excited "coherent states" in conformal quantum field theories with a U(1) symmetry, employing Euclidean path-integral methods with topological symmetry defects and the replica formalism. We compute, at leading order in perturbation theory, the asymmetry for a variety of subsystems, including finite spherical subregions in flat space, in finite volume, and at positive temperature. We also study its Lorentzian time evolution, showcasing the dynamical restoration of the symmetry due to thermalization, as well as the presence of a quantum Mpemba effect. Our results are universal, and apply in any number of dimensions. We also show that the perturbative entanglement asymmetry is related to the Fisher information metric, which has a known holographic dual called Hollands-Wald canonical energy, and that it is captured by the AdS bulk charge contained in the entanglement wedge.	Translated: 2024-11-08 22:29:09 Published: 2024-10-04
# Investigating LLMs as Voting Assistants via Contextual Augmentation: A Case Study on the European Parliament Elections 2024 ( http://arxiv.org/abs/2407.08495v2 ) License: see link	Ilias Chalkidis	In light of the recent 2024 European Parliament elections, we are investigating if LLMs can be used as Voting Advice Applications (VAAs). We audit MISTRAL and MIXTRAL models and evaluate their accuracy in predicting the stance of political parties based on the latest "EU and I" voting assistance questionnaire. Furthermore, we explore alternatives to improve models' performance by augmenting the input context via Retrieval-Augmented Generation (RAG) relying on web search, and Self-Reflection using staged conversations that aim to re-collect relevant content from the model's internal memory. We find that MIXTRAL is highly accurate with an 82% accuracy on average with a significant performance disparity across different political groups (50-95%). Augmenting the input context with expert-curated information can lead to a significant boost of approx. 9%, which remains an open challenge for automated RAG approaches, even considering curated content.	Translated: 2024-11-08 22:17:54 Published: 2024-10-04
# Single-Image Shadow Removal Using Deep Learning: A Comprehensive Survey ( http://arxiv.org/abs/2407.08865v2 ) License: see link	Laniqng Guo, Chong Wang, Yufei Wang, Yi Yu, Siyu Huang, Wenhan Yang, Alex C. Kot, Bihan Wen	Shadow removal aims at restoring the image content within shadow regions, pursuing a uniform distribution of illumination that is consistent between shadow and non-shadow regions. Compared to other image restoration tasks, there are two unique challenges in shadow removal: 1) The patterns of shadows are arbitrary, varied, and often have highly complex trace structures, making "trace-less" image recovery difficult. 2) The degradation caused by shadows is spatially non-uniform, resulting in inconsistencies in illumination and color between shadow and non-shadow areas. Recent developments in this field are primarily driven by deep learning-based solutions, employing a variety of learning strategies, network architectures, loss functions, and training data. Nevertheless, a thorough and insightful review of deep learning-based shadow removal techniques is still lacking. In this paper, we are the first to provide a comprehensive survey to cover various aspects ranging from technical details to applications. We highlight the major advancements in deep learning-based single-image shadow removal methods, thoroughly review previous research across various categories, and provide insights into the historical progression of these developments. Additionally, we summarize performance comparisons both quantitatively and qualitatively. Beyond the technical aspects of shadow removal methods, we also explore potential future directions for this field.	Translated: 2024-11-08 22:17:54 Published: 2024-10-04
# Digital-analog quantum genetic algorithm using Rydberg-atom arrays ( http://arxiv.org/abs/2407.09308v2 ) License: see link	Aleix Llenas, Lucas Lamata	Digital-analog quantum computing (DAQC) combines digital gates with analog operations, offering an alternative paradigm for universal quantum computation. This approach leverages the higher fidelities of analog operations and the flexibility of local single-qubit gates. In this paper, we propose a quantum genetic algorithm within the DAQC framework using a Rydberg-atom emulator. The algorithm employs single-qubit operations in the digital domain and a global driving interaction based on the Rydberg Hamiltonian in the analog domain. We evaluate the algorithm performance by estimating the ground-state energy of Hamiltonians, with a focus on molecules such as $\rm H_2$, $\rm LiH$, and $\rm BeH_2$. Our results show energy estimations with less than 1% error and state overlaps nearing 1, with computation times ranging from a few minutes for $\rm H_2$ (2-qubit circuits) to one to two days for $\rm LiH$ and $\rm BeH_2$ (6-qubit circuits). The gate fidelities of global analog operations further underscore DAQC as a promising quantum computing strategy in the noisy intermediate-scale quantum era.	Translated: 2024-11-08 22:06:29 Published: 2024-10-04
# Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT ( http://arxiv.org/abs/2407.11041v3 ) License: see link	Tianheng Ling, Chao Qian, Gregor Schiele	This paper presents the design of a hardware accelerator for Transformers, optimized for on-device time-series forecasting in AIoT systems. It integrates integer-only quantization and Quantization-Aware Training with optimized hardware designs to realize 6-bit and 4-bit quantized Transformer models, which achieved precision comparable to 8-bit quantized models from related research. Utilizing a complete implementation on an embedded FPGA (Xilinx Spartan-7 XC7S15), we examine the feasibility of deploying Transformer models on embedded IoT devices. This includes a thorough analysis of achievable precision, resource utilization, timing, power, and energy consumption for on-device inference. Our results indicate that while sufficient performance can be attained, the optimization process is not trivial. For instance, reducing the quantization bitwidth does not consistently result in decreased latency or energy consumption, underscoring the necessity of systematically exploring various optimization combinations. Compared to an 8-bit quantized Transformer model in related studies, our 4-bit quantized Transformer model increases test loss by only 0.63%, operates up to 132.33x faster, and consumes 48.19x less energy.	Translated: 2024-11-08 21:21:36 Published: 2024-10-04
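To make the phrase "4-bit quantized" in the abstract above concrete, here is a minimal sketch of symmetric per-tensor quantization to signed integers and an integer-only dot product. It is a generic textbook scheme, not the authors' quantization pipeline; a real integer-only accelerator would typically replace the final floating-point rescale with a fixed-point multiplier and shift. The function names are illustrative.

```python
def quantize(values, bits):
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def int_dot(qa, qb):
    """Dot product using only integer multiply-accumulate."""
    return sum(a * b for a, b in zip(qa, qb))


def dequantized_dot(qa, scale_a, qb, scale_b):
    """Rescale the integer accumulator back to the real-valued range."""
    return int_dot(qa, qb) * scale_a * scale_b


weights = [0.7, -1.0, 0.2]
q, s = quantize(weights, bits=4)
print(q, s)
print(dequantized_dot(q, s, q, s))  # approximates 0.49 + 1.0 + 0.04 = 1.53
```

Lower bit widths shrink the integer range and therefore the multiplier and memory footprint, but as the abstract notes, that does not automatically translate into lower latency or energy on a given FPGA design.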
# Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT ( http://arxiv.org/abs/2407.11041v4 ) License: see link	Tianheng Ling, Chao Qian, Gregor Schiele	This paper presents the design of a hardware accelerator for Transformers, optimized for on-device time-series forecasting in AIoT systems. It integrates integer-only quantization and Quantization-Aware Training with optimized hardware designs to realize 6-bit and 4-bit quantized Transformer models, which achieved precision comparable to 8-bit quantized models from related research. Utilizing a complete implementation on an embedded FPGA (Xilinx Spartan-7 XC7S15), we examine the feasibility of deploying Transformer models on embedded IoT devices. This includes a thorough analysis of achievable precision, resource utilization, timing, power, and energy consumption for on-device inference. Our results indicate that while sufficient performance can be attained, the optimization process is not trivial. For instance, reducing the quantization bitwidth does not consistently result in decreased latency or energy consumption, underscoring the necessity of systematically exploring various optimization combinations. Compared to an 8-bit quantized Transformer model in related studies, our 4-bit quantized Transformer model increases test loss by only 0.63%, operates up to 132.33x faster, and consumes 48.19x less energy.	Translated: 2024-11-08 21:21:36 Published: 2024-10-04
# Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness ( http://arxiv.org/abs/2407.11229v2 ) License: see link	Srija Mukhopadhyay, Adnan Qidwai, Aparna Garimella, Pritika Ramu, Vivek Gupta, Dan Roth	Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art VLMs on comprehensive datasets, developed specifically for this study, encompassing diverse question categories and chart formats. We investigate two key aspects: 1) the models' ability to handle varying levels of chart and question complexity, and 2) their robustness across different visual representations of the same underlying data. Our analysis reveals significant performance variations based on question and chart types, highlighting both strengths and weaknesses of current models. Additionally, we identify areas for improvement and propose future research directions to build more robust and reliable CQA systems. This study sheds light on the limitations of current models and paves the way for future advancements in the field.	Translated: 2024-11-08 21:10:26 Published: 2024-10-04
# Investigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical Analysis ( http://arxiv.org/abs/2407.11463v3 ) License: see link	Zhipeng He, Chun Ouyang, Laith Alzubaidi, Alistair Barros, Catarina Moreira	Adversarial attacks are a potential threat to machine learning models by causing incorrect predictions through imperceptible perturbations to the input data. While these attacks have been extensively studied in unstructured data like images, applying them to tabular data poses new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies in tabular data, which differ from image data. To account for this distinction, it is necessary to establish tailored imperceptibility criteria specific to tabular data. However, there is currently a lack of standardised metrics for assessing the imperceptibility of adversarial attacks on tabular data. To address this gap, we propose a set of key properties and corresponding metrics designed to comprehensively characterise imperceptible adversarial attacks on tabular data. These are: proximity to the original input, sparsity of altered features, deviation from the original data distribution, sensitivity in perturbing features with narrow distribution, immutability of certain features that should remain unchanged, feasibility of specific feature values that should not go beyond valid practical ranges, and feature interdependencies capturing complex relationships between data attributes. We evaluate the imperceptibility of five adversarial attacks, including both bounded attacks and unbounded attacks, on tabular data using the proposed imperceptibility metrics. The results reveal a trade-off between the imperceptibility and effectiveness of these attacks. The study also identifies limitations in current attack algorithms, offering insights that can guide future research in the area. The findings gained from this empirical analysis provide valuable direction for enhancing the design of adversarial attack algorithms, thereby advancing adversarial machine learning on tabular data.	Translated: 2024-11-08 21:10:26 Published: 2024-10-04
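Three of the properties listed in the abstract above (proximity, sparsity, and immutability) have simple conventional formulations, sketched below for a single numeric row. The abstract does not give the paper's exact metric definitions, so the choices here (Euclidean distance, a count of changed features, and an index list of immutable features) are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def proximity_l2(original, adversarial):
    """Euclidean distance between the original and perturbed rows."""
    return math.sqrt(sum((a - o) ** 2 for o, a in zip(original, adversarial)))


def sparsity_l0(original, adversarial, tol=1e-9):
    """Number of features whose value was changed by more than tol."""
    return sum(1 for o, a in zip(original, adversarial) if abs(a - o) > tol)


def immutable_violations(original, adversarial, immutable_indices, tol=1e-9):
    """Indices of features that should stay fixed but were perturbed."""
    return [
        i for i in immutable_indices
        if abs(adversarial[i] - original[i]) > tol
    ]


row = [1.0, 2.0, 3.0]
attacked = [1.0, 2.5, 1.0]
print(proximity_l2(row, attacked))                   # sqrt(0.25 + 4.0)
print(sparsity_l0(row, attacked))                    # two features changed
print(immutable_violations(row, attacked, [0, 2]))   # feature 2 was altered
```

In practice tabular features are usually normalised before distances are computed, since raw columns such as age and income live on very different scales.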
# TGIF: Text-Guided Inpainting Forgery Dataset ( http://arxiv.org/abs/2407.11566v2 ) License: see link	Hannes Mareen, Dimitrios Karageorgiou, Glenn Van Wallendael, Peter Lambert, Symeon Papadopoulos	Digital image manipulation has become increasingly accessible and realistic with the advent of generative AI technologies. Recent developments allow for text-guided inpainting, making sophisticated image edits possible with minimal effort. This poses new challenges for digital media forensics. For example, diffusion model-based approaches could either splice the inpainted region into the original image, or regenerate the entire image. In the latter case, traditional image forgery localization (IFL) methods typically fail. This paper introduces the Text-Guided Inpainting Forgery (TGIF) dataset, a comprehensive collection of images designed to support the training and evaluation of image forgery localization and synthetic image detection (SID) methods. The TGIF dataset includes approximately 75k forged images, originating from popular open-source and commercial methods, namely SD2, SDXL, and Adobe Firefly. We benchmark several state-of-the-art IFL and SID methods on TGIF. Whereas traditional IFL methods can detect spliced images, they fail to detect regenerated inpainted images. Moreover, traditional SID may detect the regenerated inpainted images to be fake, but cannot localize the inpainted area. Finally, both IFL and SID methods fail when exposed to stronger compression, while they are less robust to modern compression algorithms, such as WEBP. In conclusion, this work demonstrates the inefficiency of state-of-the-art detectors on local manipulations performed by modern generative approaches, and aspires to help with the development of more capable IFL and SID methods. The dataset and code can be downloaded at https://github.com/IDLabMedia/tgif-dataset.	Translated: 2024-11-08 21:10:26 Published: 2024-10-04
# 歴史インク:19世紀のラテンアメリカ・スペイン新聞社 LLM OCR 補正 Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction ( http://arxiv.org/abs/2407.12838v2 ) ライセンス: Link先を確認	Laura Manrique-Gómez, Tony Montes, Arturo Rodríguez-Herrera, Rubén Manrique,	(参考訳) まず、19世紀のラテンアメリカの新聞のテキストのデータセットを導入し、この地域の歴史的・言語学的分析のための特殊なコーパスにおける重要なギャップに対処する。第二に、デジタルコーパスにおけるOCR誤り訂正と言語表面形状検出にLarge Language Modelを利用するフレキシブルなフレームワークを開発する。この半自動フレームワークは、さまざまなコンテキストやデータセットに適用可能で、新しく作成されたデータセットに適用できる。 This paper presents two significant contributions: First, it introduces a novel dataset of 19th-century Latin American newspaper texts, addressing a critical gap in specialized corpora for historical and linguistic analysis in this region. Second, it develops a flexible framework that utilizes a Large Language Model for OCR error correction and linguistic surface form detection in digitized corpora. This semi-automated framework is adaptable to various contexts and datasets and is applied to the newly created dataset.	翻訳日:2024-11-08 20:25:29 公開日:2024-10-04
# 大規模言語モデルは人間レベルナラティブを生成することができるか? Are Large Language Models Capable of Generating Human-Level Narratives? ( http://arxiv.org/abs/2407.13248v2 ) ライセンス: Link先を確認	Yufei Tian, Tenghao Huang, Miri Liu, Derek Jiang, Alexander Spangher, Muhao Chen, Jonathan May, Nanyun Peng,	(参考訳) 本稿ではストーリーテリングにおけるLLMの能力について考察し,物語の展開とプロットの進行に着目した。 i) ストーリーアーク、ii) 転換点、iii) 覚醒度(arousal)と感情価(valence)を含む情動的次元、という3つの談話レベルの側面から物語を分析するための新しい計算フレームワークを導入する。専門家による注釈と自動アノテーションを活用することで、LLMが書いた物語と人間が書いた物語の間に大きな相違があることを明らかにする。人間による物語はサスペンスがあり、刺激的であり、物語構造において多様であるが、LLMの物語は均質に肯定的であり、緊張を欠いている。次に、ナラティブ推論スキルを生成能力の前駆体として測定し、ほとんどのLLMは談話理解において人間の能力に及ばないと結論付けた。最後に、多様性、サスペンス、覚醒の観点でニューラルストーリーテリングが40%以上改善することから示されるように、上記の談話特徴を明示的に統合することでストーリーテリングを強化できることを示す。 This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression. We introduce a novel computational framework to analyze narratives through three discourse-level aspects: i) story arcs, ii) turning points, and iii) affective dimensions, including arousal and valence. By leveraging expert and automatic annotations, we uncover significant discrepancies between the LLM- and human- written stories. While human-written stories are suspenseful, arousing, and diverse in narrative structures, LLM stories are homogeneously positive and lack tension. Next, we measure narrative reasoning skills as a precursor to generative capacities, concluding that most LLMs fall short of human abilities in discourse understanding. Finally, we show that explicit integration of aforementioned discourse features can enhance storytelling, as is demonstrated by over 40% improvement in neural storytelling in terms of diversity, suspense, and arousal.	翻訳日:2024-11-08 20:14:30 公開日:2024-10-04
# SpeciaLex: In-Context Specialized Lexicon Learningのベンチマーク SpeciaLex: A Benchmark for In-Context Specialized Lexicon Learning ( http://arxiv.org/abs/2407.13297v2 ) ライセンス: Link先を確認	Joseph Marvin Imperial, Harish Tayyar Madabushi,	(参考訳) 特殊レキシコン(英: Specialized lexicons)は、特別な定義、特定の役割、想定される対象読者など、関連する制約を伴う単語の集合である。これらの制約は、テキストコンテンツの曖昧さを減らし、特定の読者層に対する全体的な可読性を高めることを目的とする、コンテンツ生成およびドキュメント作成タスク(例えば、テクニカルマニュアルや子供向けの読書資料を書くこと)に必要である。大規模言語モデルがこれらの制約をどのように捉えるかを理解することで、研究者はNLPコミュニティを超えて広く使われる、より優れた、より影響力のあるツールを構築することができる。この目的に向けて、私たちはSpeciaLexを紹介する。これは言語モデルが特殊レキシコンに基づく制約に従う能力を評価するためのベンチマークであり、18の多様なサブタスクにまたがり、チェック、識別、書き換え、オープンジェネレーションのコアタスクをカバーする1,785のテストインスタンスを含む。本稿では、15のオープンソースおよびクローズドソースのLLMの実証評価を行い、モデルスケール、オープン性、セットアップ、新しさ(recency)などの要因が、ベンチマークで評価した場合のパフォーマンスに与える影響について考察する。 Specialized lexicons are collections of words with associated constraints such as special definitions, specific roles, and intended target audiences. These constraints are necessary for content generation and documentation tasks (e.g., writing technical manuals or children's reading materials), where the goal is to reduce the ambiguity of text content and increase its overall readability for a specific group of audience. Understanding how large language models can capture these constraints can help researchers build better, more impactful tools for wider use beyond the NLP community. Towards this end, we introduce SpeciaLex, a benchmark for evaluating a language model's ability to follow specialized lexicon-based constraints across 18 diverse subtasks with 1,785 test instances covering core tasks of Checking, Identification, Rewriting, and Open Generation. We present an empirical evaluation of 15 open and closed-source LLMs and discuss insights on how factors such as model scale, openness, setup, and recency affect performance upon evaluating with the benchmark.	翻訳日:2024-11-08 20:14:30 公開日:2024-10-04
# 逆二乗相互作用を持つ新しい並進不変な超対称鎖:分配関数、熱力学、臨界性 A novel translationally invariant supersymmetric chain with inverse-square interactions: partition function, thermodynamics and criticality ( http://arxiv.org/abs/2407.13827v3 ) ライセンス: Link先を確認	Bireswar Basu-Mallick, Federico Finkel, Artemio González-López,	(参考訳) 我々は、ルート系に直接関連しない長距離相互作用を持つ並進不変な su$(m\|n)$ 超対称スピン鎖の新しい族を導入する。我々はこれらのモデルの対称性について研究し、特にこの種の系に特徴的なボソン-フェルミオン双対性(boson-fermion duality)の存在を確立する。新しい鎖とそれに付随する多体超対称スピン力学モデルの関係を利用して、$m$ と $n$ のすべての値と任意の数のスピンに対して、それらの分配関数を閉形式で計算することができる。 $m$ と $n$ の両方が偶数であるとき、分配関数は2つの超対称ハルダン-シャストリースピン鎖の分配関数の積として分解され、したがって適切な転送行列のペロン固有値を用いたスピンあたりの熱力学的自由エネルギーの簡単な式が導かれることを示す。この式を用いて、これらの鎖の広いクラスの熱力学を解析し、特に、比熱が適切な $k$ 準位モデルとほぼ同じ温度に単一のショットキーピークを示すことを示す。また、新しい鎖の臨界挙動、特に基底状態の縮退と、線形なエネルギー-運動量分散関係を持つ低エネルギー励起の存在を解析する。このようにして、臨界となり得る鎖は $m=0,1,2$ のものだけであることを示すことができる。さらに、分配関数の明示的な公式を用いて、偶数の $n$ に対する su$(0\|n)$ および su$(2\|n)$ 鎖の臨界性を確立し、関連する共形場理論の中心電荷を評価することができる。 We introduce a novel family of translationally-invariant su$(m\|n)$ supersymmetric spin chains with long-range interaction not directly associated to a root system. We study the symmetries of these models, establishing in particular the existence of a boson-fermion duality characteristic of this type of systems. Taking advantage of the relation of the new chains with an associated many-body supersymmetric spin dynamical model, we are able to compute their partition function in closed form for all values of $m$ and $n$ and for an arbitrary number of spins. When both $m$ and $n$ are even, we show that the partition function factorizes as the product of the partition functions of two supersymmetric Haldane-Shastry spin chains, which in turn leads to a simple expression for the thermodynamic free energy per spin in terms of the Perron eigenvalue of a suitable transfer matrix. 
We use this expression to study the thermodynamics of a large class of these chains, showing in particular that the specific heat presents a single Schottky peak at approximately the same temperature as a suitable $k$-level model. We also analyze the critical behavior of the new chains, and in particular the ground state degeneracy and the existence of low energy excitations with a linear energy-momentum dispersion relation. In this way we are able to show that the only possible critical chains are the ones with $m=0,1,2$. In addition, using the explicit formula for the partition function we are able to establish the criticality of the su$(0\|n)$ and su$(2\|n)$ chains with even $n$, and to evaluate the central charge of their associated conformal field theory.	翻訳日:2024-11-08 20:01:00 公開日:2024-10-04
# グリッドパズル解決のためのステップバイステップ推論: LLMはどこでつまずくのか? Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter? ( http://arxiv.org/abs/2407.14790v2 ) ライセンス: Link先を確認	Nemika Tyagi, Mihir Parmar, Mohith Kulkarni, Aswin RRV, Nisarg Patel, Mutsumi Nakamura, Arindam Mitra, Chitta Baral,	(参考訳) グリッドパズルを解くには、かなりの量の論理的推論が必要となる。したがって、これはモデルの推論能力を評価するのに適したドメインであり、その評価はモデルの推論能力の改善に向けた指針となり得る。しかし、既存のほとんどの研究は、LLMの推論連鎖の詳細な分析(例えば、どこでつまずくか)を掘り下げたり、それらを評価するためのより詳細な指標を提供したりすることなく、パズルの最終的な予測解のみを評価している。 LLMは単純なヒューリスティックやアーティファクトに頼って最終解を予測する可能性があるため、LLMの推論能力を正確に評価するためには、全体的な正しさの尺度を超えて、生成された推論連鎖を評価することが重要である。この目的のために、まずGridPuzzleを開発した。これは、複雑度が異なる274のグリッドベースのパズルからなる評価データセットである。第2に、GPT-4, Claude-3, Gemini, Mistral, Llama-2 などのLLMの推論連鎖を手動で解析して得られた新しい誤り分類法を提案する。次に、大規模な主観的評価(すなわち誤りの特定)のためのLLMベースのフレームワークと、推論連鎖の正しさを評価する客観的な指標であるPuzzleEvalを開発する。 LLMの推論連鎖を評価することで、いくつかの興味深い発見が得られる。さらに、モデルの推論能力を向上させるために使われている既存のプロンプト手法は、GridPuzzleでの性能を向上させないことを示す。このことは、細粒度の誤りを理解することの重要性を強調し、これらの誤りに対処する手法を開発することでLLMのパズル解決能力を高めるという今後の研究課題を示している。データとソースコードは https://github.com/Mihir3009/GridPuzzle で入手できる。 Solving grid puzzles involves a significant amount of logical reasoning. Hence, it is a good domain to evaluate the reasoning capability of a model which can then guide us to improve the reasoning ability of models. However, most existing works evaluate only the final predicted answer of a puzzle, without delving into an in-depth analysis of the LLMs' reasoning chains (such as where they falter) or providing any finer metrics to evaluate them. Since LLMs may rely on simple heuristics or artifacts to predict the final answer, it is crucial to evaluate the generated reasoning chain beyond overall correctness measures, for accurately evaluating the reasoning abilities of LLMs. To this end, we first develop GridPuzzle, an evaluation dataset comprising 274 grid-based puzzles with different complexities. 
Second, we propose a new error taxonomy derived from manual analysis of reasoning chains from LLMs including GPT-4, Claude-3, Gemini, Mistral, and Llama-2. Then, we develop an LLM-based framework for large-scale subjective evaluation (i.e., identifying errors) and an objective metric, PuzzleEval, to evaluate the correctness of reasoning chains. Evaluating reasoning chains from LLMs leads to several interesting findings. We further show that existing prompting methods used for enhancing models' reasoning abilities do not improve performance on GridPuzzle. This highlights the importance of understanding fine-grained errors and presents a challenge for future research to enhance LLMs' puzzle-solving abilities by developing methods that address these errors. Data and source code are available at https://github.com/Mihir3009/GridPuzzle.	翻訳日:2024-11-08 19:27:32 公開日:2024-10-04
# クロスドメインマニピュレーションインタフェースとしてのフロー Flow as the Cross-Domain Manipulation Interface ( http://arxiv.org/abs/2407.15208v2 ) ライセンス: Link先を確認	Mengda Xu, Zhenjia Xu, Yinghao Xu, Cheng Chi, Gordon Wetzstein, Manuela Veloso, Shuran Song,	(参考訳) 本稿では、現実世界のロボットのトレーニングデータを必要とせずに、ロボットが現実世界の操作スキルを習得できるスケーラブルな学習フレームワークであるIm2Flow2Actを提案する。 Im2Flow2Actの背景にある重要な考え方は、操作インターフェースとしてオブジェクトフローを使用し、異なる身体性(人間とロボット)とトレーニング環境(現実世界とシミュレーション)の間のドメインギャップを埋めることである。 Im2Flow2Actはフロー生成ネットワークとフロー条件付きポリシーの2つのコンポーネントから構成される。人間のデモビデオに基づいて訓練されたフロー生成ネットワークは、タスク記述を条件として初期シーン画像からオブジェクトフローを生成する。シミュレーションされたロボットプレイデータに基づいて訓練されたフロー条件付きポリシーは、生成されたオブジェクトフローをロボットアクションにマッピングし、所望のオブジェクトの動きを実現する。フローを入力として使うことで、このポリシーは最小限のsim-to-realギャップで現実世界に直接展開できる。実世界の人間のビデオとシミュレーションされたロボットのプレイデータを活用することで、現実世界での物理的ロボットの遠隔操作という課題を回避し、多様なタスクのためのスケーラブルなシステムを実現する。我々は、剛体、関節のあるオブジェクト、変形可能なオブジェクトの操作を含む様々な実世界のタスクにおいて、Im2Flow2Actの能力を実証する。 We present Im2Flow2Act, a scalable learning framework that enables robots to acquire real-world manipulation skills without the need of real-world robot training data. The key idea behind Im2Flow2Act is to use object flow as the manipulation interface, bridging domain gaps between different embodiments (i.e., human and robot) and training environments (i.e., real-world and simulated). Im2Flow2Act comprises two components: a flow generation network and a flow-conditioned policy. The flow generation network, trained on human demonstration videos, generates object flow from the initial scene image, conditioned on the task description. The flow-conditioned policy, trained on simulated robot play data, maps the generated object flow to robot actions to realize the desired object movements. By using flow as input, this policy can be directly deployed in the real world with a minimal sim-to-real gap. By leveraging real-world human videos and simulated robot play data, we bypass the challenges of teleoperating physical robots in the real world, resulting in a scalable system for diverse tasks. 
We demonstrate Im2Flow2Act's capabilities in a variety of real-world tasks, including the manipulation of rigid, articulated, and deformable objects.	翻訳日:2024-11-08 15:56:37 公開日:2024-10-04
# PreAlign:多言語アライメントの早期確立による言語間移動の促進 PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment ( http://arxiv.org/abs/2407.16222v2 ) ライセンス: Link先を確認	Jiahuan Li, Shujian Huang, Aarron Ching, Xinyu Dai, Jiajun Chen,	(参考訳) 大規模な言語モデルは、英語中心の事前訓練にもかかわらず、合理的な多言語能力を示す。しかし、これらのモデルにおける自発的な多言語アライメントは弱く、不満足な言語間移動と知識共有をもたらすことが示されている。事前訓練の前後に多言語アライメント情報を明示的に注入することでこの問題に対処する。したがって、事前訓練の初期段階において、アライメントは言語間で情報や知識を共有するために弱い。本稿では,言語モデル事前学習に先立って多言語アライメントを確立するフレームワークであるPreAlignを提案する。 PreAlignはモデルを初期化して多言語アライメントを注入し、アライメントされた単語の類似表現を生成し、事前訓練中にコードスイッチング戦略を用いてこのアライメントを保存する。 PreAlignは、言語モデリング、ゼロショットの言語間移動、および言語間知識アプリケーションにおいて、標準多言語共同訓練を著しく上回っている。実世界のシナリオにおけるさらなる実験は、様々なモデルサイズにわたるPreAlignの有効性をさらに検証した。 Large language models demonstrate reasonable multilingual abilities, despite predominantly English-centric pretraining. However, the spontaneous multilingual alignment in these models is shown to be weak, leading to unsatisfactory cross-lingual transfer and knowledge sharing. Previous works attempt to address this issue by explicitly injecting multilingual alignment information during or after pretraining. Thus for the early stage in pretraining, the alignment is weak for sharing information or knowledge across languages. In this paper, we propose PreAlign, a framework that establishes multilingual alignment prior to language model pretraining. PreAlign injects multilingual alignment by initializing the model to generate similar representations of aligned words and preserves this alignment using a code-switching strategy during pretraining. Extensive experiments in a synthetic English to English-Clone setting demonstrate that PreAlign significantly outperforms standard multilingual joint training in language modeling, zero-shot cross-lingual transfer, and cross-lingual knowledge application. Further experiments in real-world scenarios further validate PreAlign's effectiveness across various model sizes.	翻訳日:2024-11-08 15:34:26 公開日:2024-10-04
# OpenHands: ジェネラリストエージェントとしてのAIソフトウェア開発者のためのオープンプラットフォーム OpenHands: An Open Platform for AI Software Developers as Generalist Agents ( http://arxiv.org/abs/2407.16741v2 ) ライセンス: Link先を確認	Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig,	(参考訳) ソフトウェアは人間の手元にある最も強力なツールの1つです。熟練したプログラマが複雑で深い方法で世界と対話することを可能にするのです。同時に、大規模言語モデル(LLM)の改善により、周辺環境と相互作用し、変化をもたらすAIエージェントの開発も急速に進んでいる。本稿では、人間の開発者と同じような方法、すなわちコードを書き、コマンドラインを操作し、Webを閲覧することで世界と対話する、強力で柔軟なAIエージェントを開発するためのプラットフォームであるOpenHands(f.k.a. OpenDevin)を紹介します。本稿では、このプラットフォームが新しいエージェントの実装、コード実行のためのサンドボックス環境との安全なインタラクション、複数エージェント間の調整、評価ベンチマークの組み込みをどのように可能にするかについて述べる。現在組み込まれているベンチマークに基づいて、ソフトウェアエンジニアリング(SWE-BENCHなど)やWebブラウジング(WEBARENAなど)を含む15の困難なタスクに対してエージェントの評価を行う。寛容なMITライセンスの下でリリースされているOpenHandsは、学術と産業にまたがるコミュニティプロジェクトであり、188人以上のコントリビュータから2.1K以上のコントリビューションがある。 Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, coordination between multiple agents, and incorporation of evaluation benchmarks. 
Based on our currently incorporated benchmarks, we perform an evaluation of agents over 15 challenging tasks, including software engineering (e.g., SWE-BENCH) and web browsing (e.g., WEBARENA), among others. Released under the permissive MIT license, OpenHands is a community project spanning academia and industry with more than 2.1K contributions from over 188 contributors.	翻訳日:2024-11-08 15:23:20 公開日:2024-10-04
# 知るか知らないか : あいまいさ下における大規模言語モデルの自己整合性の分析 To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity ( http://arxiv.org/abs/2407.17125v3 ) ライセンス: Link先を確認	Anastasiia Sedova, Robert Litschko, Diego Frassinelli, Benjamin Roth, Barbara Plank,	(参考訳) 大規模言語モデル(LLM)の顕著な性能に寄与する主要な側面の1つは、事前学習中に蓄積された膨大な事実知識である。しかし、多くのLLMは自己矛盾(self-inconsistency)に悩まされており、その信頼性に疑問が生じている。本稿では、エンティティ型のあいまいさに着目し、あいまいなエンティティを含むプロンプトが与えられた場合の事実知識の適用において、最先端のLLMの習熟度と一貫性を解析する。そのために、知識を持っていることと知識を適用することを切り分ける評価プロトコルを提案し、49個のあいまいなエンティティで最先端のLLMをテストした。実験の結果、LLMは正しいエンティティの解釈を選ぶことに苦労し、平均精度はわずか85%、指定が不十分なプロンプトでは75%まで低下した。結果はまた、LLMの挙動における系統的な不一致を明らかにし、モデルは知識を持っているかもしれないが、それを一貫して適用することに苦労し、好みの解釈への偏りを示し、自己矛盾を示すことを示した。これは、より信頼できるLLMのために、今後エンティティのあいまいさに対処する必要性を強調している。 One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.	翻訳日:2024-11-08 15:23:20 公開日:2024-10-04
# 計算量削減のためのツール支援学習 Tool-Assisted Learning of Computational Reductions ( http://arxiv.org/abs/2407.18215v2 ) ライセンス: Link先を確認	Tristan Kneisel, Elias Radtke, Marko Schmellenkamp, Fabian Vehlken, Thomas Zeume,	(参考訳) 計算機科学において計算量削減は重要かつ強力な概念である。しかし、多くの学生には理解が難しい。本稿では,削減学習が教育支援システムによってどのように支援されるか,という概念を概説する。本稿では,そのようなシステムにおける概念の具体的実装について述べるとともに,理論計算機科学の入門講座において,その教材を用いた経験を報告する。 Computational reductions are an important and powerful concept in computer science. However, they are difficult for many students to grasp. In this paper, we outline a concept for how the learning of reductions can be supported by educational support systems. We present an implementation of the concept within such a system, concrete web-based and interactive learning material for reductions, and report on our experiences using the material in a large introductory course on theoretical computer science.	翻訳日:2024-11-08 15:01:09 公開日:2024-10-04
# ベイズ並列分岐グラフニューラルネットワークにおけるロバスト学習:狭幅限界 Robust Learning in Bayesian Parallel Branching Graph Neural Networks: The Narrow Width Limit ( http://arxiv.org/abs/2407.18807v2 ) ライセンス: Link先を確認	Zechen Zhang, Haim Sompolinsky,	(参考訳) ランダムニューラルネットワークの無限幅制限は、タスク非依存のカーネルを特徴とするGaussian Process (NNGP) (Lee et al [2018]) としてニューラルネットワークに現れることが知られている。より広いネットワーク幅が一般化に寄与することが広く受け入れられている(Park et al [2019])。しかし、この研究は、残余ネットワークに類似したアーキテクチャであるベイズ並列分岐グラフニューラルネットワーク(BPB-GNN)の幅制限を調査することによって、この概念に挑戦する。我々は,BPB-GNNの幅がトレーニング例の数に比べて著しく小さい場合,各分岐はカーネル再正規化における分岐の対称性の破れにより,より堅牢な学習を示すことを示した。驚いたことに、狭い幅制限におけるBPB-GNNの性能は、バイアス制限シナリオの幅制限で達成されるものよりも、一般的に優れているか、同等である。さらに、狭い幅制限における各ブランチの読み出しノルムは、アーキテクチャのハイパーパラメータとは独立しているが、概してデータの性質を反映している。本結果は,並列分岐ネットワークにおいて,新たに定義された狭帯域方式を特徴付けるものである。 The infinite width limit of random neural networks is known to result in Neural Networks as Gaussian Process (NNGP) (Lee et al. [2018]), characterized by task-independent kernels. It is widely accepted that larger network widths contribute to improved generalization (Park et al. [2019]). However, this work challenges this notion by investigating the narrow width limit of the Bayesian Parallel Branching Graph Neural Network (BPB-GNN), an architecture that resembles residual networks. We demonstrate that when the width of a BPB-GNN is significantly smaller compared to the number of training examples, each branch exhibits more robust learning due to a symmetry breaking of branches in kernel renormalization. Surprisingly, the performance of a BPB-GNN in the narrow width limit is generally superior or comparable to that achieved in the wide width limit in bias-limited scenarios. Furthermore, the readout norms of each branch in the narrow width limit are mostly independent of the architectural hyperparameters but generally reflective of the nature of the data. Our results characterize a newly defined narrow-width regime for parallel branching networks in general.	翻訳日:2024-11-08 14:50:05 公開日:2024-10-04
# 高次解法における異方性p適応と誤差推定のための強化学習 Reinforcement learning for anisotropic p-adaptation and error estimation in high-order solvers ( http://arxiv.org/abs/2407.19000v2 ) ライセンス: Link先を確認	David Huergo, Martín de Frutos, Eduardo Jané, Oscar A. Marino, Gonzalo Rubio, Esteban Ferrer,	(参考訳) Reinforcement Learning (RL) を用いた高次h/pソルバにおける異方性p適応の自動化と最適化のための新しい手法を提案する。動的RL適応は、高階多項式を調整するために進化的解を用いる。我々は,シミュレーションを行う際の最小限のオーバーコストを示す,主解法から切り離されたオフライントレーニング手法を開発した。さらに、局所的な離散化誤差の定量化を可能にする、安価なRLベースの誤差推定手法を導出する。提案手法は計算メッシュと解く偏微分方程式の両方に非依存である。 RLのメッシュ適応への応用にはいくつかの利点がある。これにより、自動化された適応的なメッシュリファインメントが可能になり、手作業による介入の必要が軽減される。計算資源を最適化し、必要であれば高次多項式を動的に割当て、安定な領域での洗練を最小化する。これにより、解の精度を維持しながら計算コストの削減につながる。さらに、RLは従来のメッシュ適応の探索を可能にし、シミュレーションの精度と堅牢性を高める可能性がある。この研究は、より堅牢で再現性があり、複雑な3次元問題に適用可能なアプローチを提供することによって、我々の当初の研究を拡張します。本稿では, 円柱, テイラー・グリーン・ボルテックス, 10MWの風力タービンによる, 提案手法の柔軟性の検証を行う。 We present a novel approach to automate and optimize anisotropic p-adaptation in high-order h/p solvers using Reinforcement Learning (RL). The dynamic RL adaptation uses the evolving solution to adjust the high-order polynomials. We develop an offline training approach, decoupled from the main solver, which shows minimal overcost when performing simulations. In addition, we derive an inexpensive RL-based error estimation approach that enables the quantification of local discretization errors. The proposed methodology is agnostic to both the computational mesh and the partial differential equation to be solved. The application of RL to mesh adaptation offers several benefits. It enables automated and adaptive mesh refinement, reducing the need for manual intervention. It optimizes computational resources by dynamically allocating high-order polynomials where necessary and minimizing refinement in stable regions. This leads to computational cost savings while maintaining the accuracy of the solution. 
Furthermore, RL allows for the exploration of unconventional mesh adaptations, potentially enhancing the accuracy and robustness of simulations. This work extends our original research, offering a more robust, reproducible, and generalizable approach applicable to complex three-dimensional problems. We provide validation for laminar and turbulent cases: circular cylinders, Taylor Green Vortex and a 10MW wind turbine to illustrate the flexibility of the proposed approach.	翻訳日:2024-11-08 14:38:53 公開日:2024-10-04
# 低雑音に対するスパースLPNとLSPNのアルゴリズム Algorithms for Sparse LPN and LSPN Against Low-noise ( http://arxiv.org/abs/2407.19215v2 ) ライセンス: Link先を確認	Xue Chen, Wenxuan Shu, Zhaienhe Zhou,	(参考訳) 本研究では、古典的な雑音付きパリティ学習(LPN)問題の2つのスパース変種に対する学習アルゴリズムについて検討する。我々は、幅広いパラメータに対して最先端を改善する新しいアルゴリズムフレームワークを提供する。このフレームワークは、従来のアプローチとは異なる単純な構造を持つ。最初のステップはスパース性の知識を用いたドメインの縮小であり、次にガウスの消去法によって部分問題を解く。 $n$ を次元、$k$ をスパース性パラメータ、$\eta$ をノイズレートとし、各ラベルは確率 $\eta$ で反転するものとする。スパースLPN問題(様々なパラメータを持つ)は、暗号に広く応用されている。 $\mathbf{F}_2^n$ のランダムベクトルをサンプリングする標準のLPN問題とは異なり、スパースLPN問題ではランダムな $k$-スパースベクトルをサンプリングする。誕生日パラドックスにより、$m=n^{k/2}$ 個のサンプルが与えられれば自明な識別アルゴリズムが得られる。 $\delta \in (0,1)$ に対して $m=n^{1+(\frac{k}{2}-1)(1-\delta)}$ の場合、既知の最良のアルゴリズムの実行時間は $\min\{e^{\eta n}, e^{\tilde{O}(n^{\delta})}\}$ である。我々は、時間複雑性 $e^{\tilde{O}(\eta \cdot n^{\frac{1+\delta}{2}})}$、サンプル複雑性 $m=\max\{1,\frac{\eta \cdot n^{\frac{1+\delta}{2}}}{k^2}\} \cdot \tilde{O}(n)^{1+(\frac{k-1}{2})(1-\delta)}$ のスパースLPNの学習アルゴリズムを提案する。これは、広い範囲の $\eta$ において、任意の定数または超定数の $k$ に対する以前の結果を改善する。雑音付きスパースパリティ学習(LSPN)問題は、隠れパリティが $k$-スパースであると仮定する。 LSPNは学習理論と暗号の両方で広く研究されている。しかし、最先端技術は幅広いパラメータに対して ${n \choose k/2} = \Omega(n/k)^{k/2}$ 時間を必要とし、単純な列挙アルゴリズムは ${n \choose k}=O(n/k)^k$ 時間を必要とする。我々のLSPNアルゴリズムは、任意の $\eta$ と $k$ に対して、時間 $O(\eta \cdot n/k)^k$ で実行される。これにより、幅広いパラメータでスパースパリティを学習するための最先端技術が改善される。 We study learning algorithms for two sparse variants of the classical learning parity with noise (LPN) problem. We provide a new algorithmic framework that improves the state of the art for a wide range of parameters. This framework has a simple structure different from previous approaches: the first step is a domain reduction via the knowledge of sparsity; then it solves sub-problems by Gaussian elimination. Let $n$ be the dimension, $k$ be the sparsity parameter, and $\eta$ be the noise rate such that each label gets flipped with probability $\eta$. The sparse LPN problem (with various parameters) has wide applications in cryptography. 
Different from the standard LPN problem that samples random vectors in $\mathbf{F}_2^n$, it samples random $k$-sparse vectors. The birthday paradox implies a trivial distinguishing algorithm given $m=n^{k/2}$ samples. For $m=n^{1+(\frac{k}{2}-1)(1-\delta)}$ with $\delta \in (0,1)$, the best known algorithm has running time $\min\{e^{\eta n}, e^{\tilde{O}(n^{\delta})}\}$. We present a learning algorithm for sparse LPN with time complexity $e^{\tilde{O}(\eta \cdot n^{\frac{1+\delta}{2}})}$ and sample complexity $m=\max\{1,\frac{\eta \cdot n^{\frac{1+\delta}{2}}}{k^2}\} \cdot \tilde{O}(n)^{1+(\frac{k-1}{2})(1-\delta)}$. It improves previous results for any constant or super-constant $k$ with a wide range of $\eta$. The learning sparse parity with noise (LSPN) problem assumes the hidden parity is $k$-sparse. LSPN has been extensively studied in both learning theory and cryptography. However, the state of the art needs ${n \choose k/2} = \Omega(n/k)^{k/2}$ time for a wide range of parameters while the simple enumeration algorithm takes ${n \choose k}=O(n/k)^k$ time. Our LSPN algorithm runs in time $O(\eta \cdot n/k)^k$ for any $\eta$ and $k$. This improves the state-of-the-art for learning sparse parity in a wide range of parameters.	翻訳日:2024-11-08 14:38:53 公開日:2024-10-04
# 低雑音に対するスパースLPNとLSPNのアルゴリズム Algorithms for Sparse LPN and LSPN Against Low-noise ( http://arxiv.org/abs/2407.19215v3 ) ライセンス: Link先を確認	Xue Chen, Wenxuan Shu, Zhaienhe Zhou,	(参考訳) 本研究では、古典的な雑音付きパリティ学習(LPN)問題の2つのスパース変種に対する学習アルゴリズムについて検討する。我々は、幅広いパラメータに対して最先端を改善する新しいアルゴリズムフレームワークを提供する。このフレームワークは、従来のアプローチとは異なる単純な構造を持つ。最初のステップはスパース性の知識を用いたドメインの縮小であり、次にガウスの消去法によって部分問題を解く。 $n$ を次元、$k$ をスパース性パラメータ、$\eta$ をノイズレートとし、各ラベルは確率 $\eta$ で反転するものとする。スパースLPN問題(様々なパラメータを持つ)は、暗号に広く応用されている。 $\mathbf{F}_2^n$ のランダムベクトルをサンプリングする標準のLPN問題とは異なり、スパースLPN問題ではランダムな $k$-スパースベクトルをサンプリングする。誕生日パラドックスにより、$m=n^{k/2}$ 個のサンプルが与えられれば自明な識別アルゴリズムが得られる。 $\delta \in (0,1)$ に対して $m=n^{1+(\frac{k}{2}-1)(1-\delta)}$ の場合、既知の最良のアルゴリズムの実行時間は $\min\{e^{\eta n}, e^{\tilde{O}(n^{\delta})}\}$ である。我々は、時間複雑性 $e^{\tilde{O}(\eta \cdot n^{\frac{1+\delta}{2}})}$、サンプル複雑性 $m=\max\{1,\frac{\eta \cdot n^{\frac{1+\delta}{2}}}{k^2}\} \cdot \tilde{O}(n)^{1+(\frac{k-1}{2})(1-\delta)}$ のスパースLPNの学習アルゴリズムを提案する。これは、広い範囲の $\eta$ において、任意の定数または超定数の $k$ に対する以前の結果を改善する。雑音付きスパースパリティ学習(LSPN)問題は、隠れパリティが $k$-スパースであると仮定する。 LSPNは学習理論と暗号の両方で広く研究されている。しかし、最先端技術は幅広いパラメータに対して ${n \choose k/2} = \Omega(n/k)^{k/2}$ 時間を必要とし、単純な列挙アルゴリズムは ${n \choose k}=O(n/k)^k$ 時間を必要とする。我々のLSPNアルゴリズムは、任意の $\eta$ と $k$ に対して、時間 $O(\eta \cdot n/k)^k$ で実行される。これにより、幅広いパラメータでスパースパリティを学習するための最先端技術が改善される。 We study learning algorithms for two sparse variants of the classical learning parity with noise (LPN) problem. We provide a new algorithmic framework that improves the state of the art for a wide range of parameters. This framework has a simple structure different from previous approaches: the first step is a domain reduction via the knowledge of sparsity; then it solves sub-problems by Gaussian elimination. Let $n$ be the dimension, $k$ be the sparsity parameter, and $\eta$ be the noise rate such that each label gets flipped with probability $\eta$. The sparse LPN problem (with various parameters) has wide applications in cryptography. 
Different from the standard LPN problem that samples random vectors in $\mathbf{F}_2^n$, it samples random $k$-sparse vectors. The birthday paradox implies a trivial distinguishing algorithm given $m=n^{k/2}$ samples. For $m=n^{1+(\frac{k}{2}-1)(1-\delta)}$ with $\delta \in (0,1)$, the best known algorithm has running time $\min\{e^{\eta n}, e^{\tilde{O}(n^{\delta})}\}$. We present a learning algorithm for sparse LPN with time complexity $e^{\tilde{O}(\eta \cdot n^{\frac{1+\delta}{2}})}$ and sample complexity $m=\max\{1,\frac{\eta \cdot n^{\frac{1+\delta}{2}}}{k^2}\} \cdot \tilde{O}(n)^{1+(\frac{k-1}{2})(1-\delta)}$. It improves previous results for any constant or super-constant $k$ with a wide range of $\eta$. The learning sparse parity with noise (LSPN) problem assumes the hidden parity is $k$-sparse. LSPN has been extensively studied in both learning theory and cryptography. However, the state of the art needs ${n \choose k/2} = \Omega(n/k)^{k/2}$ time for a wide range of parameters while the simple enumeration algorithm takes ${n \choose k}=O(n/k)^k$ time. Our LSPN algorithm runs in time $O(\eta \cdot n/k)^k$ for any $\eta$ and $k$. This improves the state-of-the-art for learning sparse parity in a wide range of parameters.	翻訳日:2024-11-08 14:38:53 公開日:2024-10-04
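To make the problem setup concrete, here is a minimal Python sketch of how sparse LPN samples are drawn and how Gaussian elimination over $\mathbf{F}_2$ recovers a consistent secret from noiseless samples. It is not the authors' algorithm (their domain-reduction step is not reproduced here); the function names `sparse_lpn_samples` and `solve_gf2` and all parameter values are assumptions made only for illustration.

```python
import random

def sparse_lpn_samples(secret, k, eta, m, rng):
    """Draw m samples (a, b): a is a uniformly random k-sparse vector in F_2^n,
    and b = <a, secret> mod 2, flipped independently with probability eta."""
    n = len(secret)
    samples = []
    for _ in range(m):
        support = rng.sample(range(n), k)        # positions of the k ones
        a = [0] * n
        for i in support:
            a[i] = 1
        label = sum(secret[i] for i in support) % 2
        if rng.random() < eta:                   # label noise
            label ^= 1
        samples.append((a, label))
    return samples

def solve_gf2(samples, n):
    """Gauss-Jordan elimination over F_2 for the linear system a . s = b.
    Returns one solution (free variables set to 0), or None if inconsistent."""
    rows = [a[:] + [b] for a, b in samples]      # augmented matrix
    pivots = []
    r = 0
    for c in range(n):
        p = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if p is None:
            continue                             # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [x ^ y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(row[n] for row in rows[r:]):          # a row reading 0 = 1
        return None
    s = [0] * n
    for i, c in enumerate(pivots):
        s[c] = rows[i][n]
    return s
```

With $\eta = 0$ the system is always consistent, so `solve_gf2` returns some vector that agrees with every sample. With $\eta > 0$, even one flipped label among the equations used can make the system inconsistent or yield a wrong secret, which is why noise-tolerant algorithms restrict elimination to small sub-problems.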
# 方向性グラフのための良い位置エンコーディングとは何か? What Are Good Positional Encodings for Directed Graphs? ( http://arxiv.org/abs/2407.20912v2 ) ライセンス: Link先を確認	Yinan Huang, Haoyu Wang, Pan Li,	(参考訳) 位置エンコーディング(PE)は、ノード間の相対空間関係を効果的に捉えるため、強力で表現力のあるグラフニューラルネットワークとグラフトランスフォーマーを構築するために不可欠である。無向グラフのPEについて広範な研究が行われてきたが、有向グラフのPEは比較的未探索のままである。この研究はこのギャップに対処しようと試みている。まず、有向グラフに対するウォークカウントシーケンスの一般化であるウォークプロファイルの概念を紹介する。ウォークプロファイルは、プログラム解析や回路性能予測など、有向グラフ関連アプリケーションに不可欠な多くの構造的特徴を含んでいる。歩行プロファイルの表現における既存のPE手法の限界を特定し,複数のポテンシャル因子を組み込むことで,磁気ラプラシア固有ベクトルに基づくPEを拡張した,新しいMulti-q Magnetic Laplacian PEを提案する。新しいPEは、歩行プロファイルを確実に表現できる。さらに,従来の基底不変ニューラルネットワークを一般化し,複雑な領域における新しいPEの安定した利用を可能にする。提案するPEの表現性を検証し,ネットワークの整合性の解決と回路ベンチマークの高速化に有効であることを示す。私たちのコードはhttps://github.com/Graph-COM/Multi-q-Maglapで利用可能です。 Positional encodings (PEs) are essential for building powerful and expressive graph neural networks and graph transformers, as they effectively capture the relative spatial relationships between nodes. Although extensive research has been devoted to PEs in undirected graphs, PEs for directed graphs remain relatively unexplored. This work seeks to address this gap. We first introduce the notion of Walk Profile, a generalization of walk-counting sequences for directed graphs. A walk profile encompasses numerous structural features crucial for directed graph-relevant applications, such as program analysis and circuit performance prediction. We identify the limitations of existing PE methods in representing walk profiles and propose a novel Multi-q Magnetic Laplacian PE, which extends the Magnetic Laplacian eigenvector-based PE by incorporating multiple potential factors. The new PE can provably express walk profiles. Furthermore, we generalize prior basis-invariant neural networks to enable the stable use of the new PE in the complex domain. 
Our numerical experiments validate the expressiveness of the proposed PEs and demonstrate their effectiveness in solving sorting network satisfiability and performing well on general circuit benchmarks. Our code is available at https://github.com/Graph-COM/Multi-q-Maglap.	翻訳日:2024-11-08 14:05:01 公開日:2024-10-04
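As a rough illustration of the underlying object, the sketch below builds a Magnetic Laplacian for a directed graph under one common convention, $L^{(q)} = D_s - A_s \odot \exp(i\,2\pi q\,(A - A^\top))$ with $A_s = (A + A^\top)/2$, and computes its eigenvectors for several potentials $q$. The convention, the helper names `magnetic_laplacian` and `multi_q_eigs`, and the chosen $q$ values are assumptions for illustration; the authors' repository is the reference for their actual encoding.

```python
import numpy as np

def magnetic_laplacian(A, q):
    """Magnetic Laplacian D_s - A_s * exp(i * 2*pi*q * (A - A^T)) of a directed
    graph with adjacency matrix A (one common convention, assumed here)."""
    A = np.asarray(A, dtype=float)
    A_s = (A + A.T) / 2.0                    # symmetrized edge weights
    D_s = np.diag(A_s.sum(axis=1))           # degrees of the symmetrized graph
    theta = 2.0 * np.pi * q * (A - A.T)      # antisymmetric phase encodes direction
    return D_s - A_s * np.exp(1j * theta)    # elementwise (Hadamard) product

def multi_q_eigs(A, qs):
    """Eigenvalues and eigenvectors of the Magnetic Laplacian for each q in qs
    (hypothetical helper; the eigenvectors are the raw material for a PE)."""
    return {q: np.linalg.eigh(magnetic_laplacian(A, q)) for q in qs}
```

For $q = 0$ the phase vanishes and the matrix reduces to the ordinary Laplacian of the symmetrized graph, so direction information only enters for $q \neq 0$. The matrix is Hermitian for every $q$, which is why `np.linalg.eigh` applies.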
# $U(N)$上の量子信号処理と量子特異値変換 Quantum Signal Processing and Quantum Singular Value Transformation on $U(N)$ ( http://arxiv.org/abs/2408.01439v2 ) ライセンス: Link先を確認	Xi Lu, Yuan Liu, Hongwei Lin,	(参考訳) 量子信号処理と量子特異値変換は、ブロック符号化行列の多項式変換を量子コンピュータに実装するための強力なツールであり、多くの著名な量子アルゴリズムにおいて漸近的に最適な複雑性を達成している。ブロック符号化された入力から複数の多項式を同時に実現する$U(N)$上の量子信号処理と量子特異値変換の枠組みを,元のフレームワークにおける$U(2)$上のものの一般化として提案する。また、達成可能な多項式の包括的解析を行い、所望の多項式変換を与える量子回路を構成する再帰的アルゴリズムを与える。2つの応用例として、二変量多項式関数を実現するためのフレームワークを提案し、漸近的に最適なクエリ複雑性を持つ量子振幅推定アルゴリズムについて検討する。 Quantum signal processing and quantum singular value transformation are powerful tools to implement polynomial transformations of block-encoded matrices on quantum computers, and have achieved asymptotically optimal complexity in many prominent quantum algorithms. We propose a framework of quantum signal processing and quantum singular value transformation on $U(N)$, which realizes multiple polynomials simultaneously from a block-encoded input, as a generalization of those on $U(2)$ in the original frameworks. We also perform a comprehensive analysis on achievable polynomials and give a recursive algorithm to construct the quantum circuit that gives the desired polynomial transformation. As two example applications, we propose a framework to realize bi-variate polynomial functions, and study the quantum amplitude estimation algorithm with asymptotically optimal query complexity.	翻訳日:2024-11-08 13:18:17 公開日:2024-10-04
# ランダウアーの原理とブラックホールの面積量子化 Landauer's principle and black hole area quantization ( http://arxiv.org/abs/2408.02077v3 ) ライセンス: Link先を確認	Bijan Bagchi, Aritra Ghosh, Sauvik Sen,	(参考訳) 本稿では、シュワルツシルトブラックホールの面積量子化の文脈において、情報理論におけるランダウアーの原理を評価する。ホーキング蒸発を面積(または質量)スペクトルの離散状態間の遷移として解釈できる量子力学的視点のもとで、ブラックホールのミクロ状態の数が$2^n$となるとき、ランダウアーの原理が飽和した形で一貫して成り立つことを示す。ここで$n$は、半古典的領域における面積・質量スペクトルの準位を表す大きな正の整数である。これは面積間隔$\Delta A = \alpha l_P^2$(自然単位)と等価であり、$\alpha = 4 \ln 2$のとき、ボルツマン単位での連続する準位間のエントロピー間隔がちょうど1ビットの情報と一致する。また、文献で一般的な$\alpha$の他の値についてもコメントする。 This article assesses Landauer's principle from information theory in the context of area quantization of the Schwarzschild black hole. Within a quantum-mechanical perspective where Hawking evaporation can be interpreted in terms of transitions between the discrete states of the area (or mass) spectrum, we justify that Landauer's principle holds consistently in the saturated form when the number of microstates of the black hole goes as $2^n$, where $n$ is a large positive integer labeling the levels of the area/mass spectrum in the semiclassical regime. This is equivalent to the area spacing $\Delta A = \alpha l_P^2$ (in natural units), where $\alpha = 4 \ln 2$ for which the entropy spacing between consecutive levels in Boltzmann units coincides exactly with one bit of information. We also comment on the situation for other values of $\alpha$ prevalent in the literature.	翻訳日:2024-11-08 12:55:51 公開日:2024-10-04
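The link between $\alpha = 4 \ln 2$ and "one bit per level" in the abstract above can be written out in a few lines. This is only a sketch under assumptions that the abstract does not spell out: the Bekenstein-Hawking entropy $S = A / (4 l_P^2)$ in units with $k_B = 1$, and an equally spaced area spectrum $A_n = \alpha\, n\, l_P^2$ with no offset.

```latex
\begin{align}
S &= \frac{A}{4 l_P^2}, \qquad A_n = \alpha\, n\, l_P^2 \\
\Delta S &= S_{n+1} - S_n = \frac{\Delta A}{4 l_P^2} = \frac{\alpha}{4} \\
\alpha = 4 \ln 2 \;&\Longrightarrow\; \Delta S = \ln 2, \qquad \Omega_n = e^{S_n} = 2^n
\end{align}
```

Under these assumptions each transition between neighbouring levels changes the entropy by exactly $\ln 2$, i.e. one bit, which is the amount of information to which Landauer's bound of $k_B T \ln 2$ per erased bit refers.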
# Logistic Regression は小さな LLM を強力かつ説明可能な "tens-of-shot" 分類器にする Logistic Regression makes small LLMs strong and explainable "tens-of-shot" classifiers ( http://arxiv.org/abs/2408.03414v2 ) ライセンス: Link先を確認	Marcus Buckmann, Edward Hill,	(参考訳) 簡単な分類タスクでは,性能のトレードオフや追加のラベル付けコストを伴わずに,大規模な商用モデルではなく,小規模でローカルな生成言語モデルを使用することの利点を享受できることを示す。プライバシ、可用性、コスト、説明可能性といったこれらの利点は、商用アプリケーションにおいても、AIの広範な民主化においても重要である。 17の文分類タスク (2-4クラス) の実験を通して、小さなLLMの埋め込みに対する罰則付きロジスティック回帰は、"tens-of-shot"(数十ショット)の設定において大きなLLMの性能に匹敵する(そして通常は上回る)ことを示す。これは、大きなLLMの性能を検証するのに必要な数を超えるラベル付きインスタンスを必要としない。最後に,分類決定のための安定かつ妥当な説明を抽出する。 For simple classification tasks, we show that users can benefit from the advantages of using small, local, generative language models instead of large commercial models without a trade-off in performance or introducing extra labelling costs. These advantages, including those around privacy, availability, cost, and explainability, are important both in commercial applications and in the broader democratisation of AI. Through experiments on 17 sentence classification tasks (2-4 classes), we show that penalised logistic regression on the embeddings from a small LLM equals (and usually betters) the performance of a large LLM in the "tens-of-shot" regime. This requires no more labelled instances than are needed to validate the performance of the large LLM. Finally, we extract stable and sensible explanations for classification decisions.	翻訳日:2024-11-08 12:33:46 公開日:2024-10-04
# Mathfish:教育カリキュラムのグラウンド化による言語モデル数学推論の評価 Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula ( http://arxiv.org/abs/2408.04226v3 ) ライセンス: Link先を確認	Li Lucy, Tal August, Rose E. Wang, Luca Soldaini, Courtney Allison, Kyle Lo,	(参考訳) 数学カリキュラムが学年に適しており、教育基準に沿って重要なスキルや概念と整合することを保証するため、教育専門家は、公表された数学問題を何ヶ月もかけて慎重にレビューすることがある。このプロセスからインスピレーションを得て,本研究は,言語モデル(LM)が数学コンテンツによって育まれるスキルや概念を識別できるかどうかを調べることで,LMの数学的能力を評価する新しい切り口を示す。我々は2つのデータセットを提供する。1つは、Achieve the Core(ATC)によるK-12の数学スキルと概念(すなわち標準)の385件のきめ細かい記述からなり、もう1つは、これらの標準でラベル付けされた9.9Kの数学問題(MathFish)からなる。LMが数学問題を評価する能力を測るために,(1) 問題が与えられた標準に合致するかどうかを検証するタスクと,(2) 問題に合致するすべての標準をタグ付けするタスクの2つを開発する。経験豊富な教師との協働により、LMは問題に関連する標準のタグ付けと検証に苦労し、代わりに、正解に近いが微妙に異なるラベルを予測することがわかった。また, LMは, プロンプトに記載された標準と完全には一致しない問題をしばしば生成することを示し, LMを用いてカリキュラム教材を生成するユースケースには慎重な精査が必要であることを示唆する。最後に、GSM8kの問題を数学標準を用いて分類し、なぜ一部の問題が他の問題よりもモデルにとって解くのが難しいのかをよりよく理解できるようにする。 To ensure that math curriculum is grade-appropriate and aligns with critical skills or concepts in accordance with educational standards, pedagogical experts can spend months carefully reviewing published math problems. Drawing inspiration from this process, our work presents a novel angle for evaluating language models' (LMs) mathematical abilities, by investigating whether they can discern skills and concepts enabled by math content. We contribute two datasets: one consisting of 385 fine-grained descriptions of K-12 math skills and concepts, or standards, from Achieve the Core (ATC), and another of 9.9K math problems labeled with these standards (MathFish). We develop two tasks for evaluating LMs' abilities to assess math problems: (1) verifying whether a problem aligns with a given standard, and (2) tagging a problem with all aligned standards. Working with experienced teachers, we find that LMs struggle to tag and verify standards linked to problems, and instead predict labels that are close to ground truth, but differ in subtle ways. 
We also show that LMs often generate problems that do not fully align with standards described in prompts, suggesting the need for careful scrutiny on use cases involving LMs for generating curricular materials. Finally, we categorize problems in GSM8k using math standards, allowing us to better understand why some problems are more difficult to solve for models than others.	翻訳日:2024-11-08 12:22:45 公開日:2024-10-04
# モデル評価のためのクロスバリデーションに基づく品質指標のロバスト性調査 Robustness investigation of cross-validation based quality measures for model assessment ( http://arxiv.org/abs/2408.04391v2 ) ライセンス: Link先を確認	Thomas Most, Lars Gräning, Sebastian Wolff,	(参考訳) 本稿では,機械学習モデルの評価のための品質指標の精度とロバスト性について検討する。機械学習モデルの予測品質は、未知データに対して近似誤差を推定するクロスバリデーションアプローチに基づいて、モデルに依存しない評価を行う。提案手法は,モデル予測における説明された変動量の定量化である。これらの測定の信頼性は、いくつかの数値的な例を用いて評価され、推定された予測誤差の検証のための追加データセットが利用可能である。さらに、提案した品質指標の信頼性境界を推定し、クロスバリデーション手法により得られた予測残差から局所品質指標を導出する。 In this paper the accuracy and robustness of quality measures for the assessment of machine learning models are investigated. The prediction quality of a machine learning model is evaluated model-independent based on a cross-validation approach, where the approximation error is estimated for unknown data. The presented measures quantify the amount of explained variation in the model prediction. The reliability of these measures is assessed by means of several numerical examples, where an additional data set for the verification of the estimated prediction error is available. Furthermore, the confidence bounds of the presented quality measures are estimated and local quality measures are derived from the prediction residuals obtained by the cross-validation approach.	翻訳日:2024-11-08 12:11:36 公開日:2024-10-04
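The abstract above describes quality measures that quantify explained variation using cross-validation residuals. A minimal Python sketch of that general idea follows. It is not the authors' exact measure: the function names, the interleaved k-fold split, and the one-dimensional least-squares line used as a stand-in model are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for any model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cv_explained_variation(xs, ys, k=5):
    """Return (1 - SSE_cv / SST, residuals), using held-out predictions only."""
    n = len(xs)
    residuals = [0.0] * n
    for fold in range(k):
        # Interleaved split: sample i belongs to fold i % k.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            residuals[i] = ys[i] - model(xs[i])
    my = mean(ys)
    sst = sum((y - my) ** 2 for y in ys)      # total variation of the data
    sse = sum(r * r for r in residuals)       # unexplained part, estimated on unseen points
    return 1.0 - sse / sst, residuals
```

Because every residual comes from a model that never saw the corresponding sample, the score estimates prediction quality on unknown data rather than goodness of fit. The per-sample residuals returned alongside it are the kind of quantity from which local quality measures could be derived.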
# DataNarrative: 可視化とテキストによるデータ駆動ストーリテリングの自動化 DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts ( http://arxiv.org/abs/2408.05346v3 ) ライセンス: Link先を確認	Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty,	(参考訳) データ駆動型ストーリーテリングは、物語技法と可視化とテキストを組み合わせることで洞察を伝達する強力な方法である。これらのストーリーには、ハイライトされたバーやチャートの行などの視覚的補助と、洞察を説明するテキストアノテーションが組み込まれている。しかし、そのような物語を作るには、データと綿密な物語計画の深い理解が必要であり、しばしば人間の介入を必要とする。 LLM(Large Language Models)は様々なNLPタスクに優れていますが、一貫性のある包括的なデータストーリーを生成する能力はまだ未定です。本研究では,データストーリ生成のための新しいタスクと,さまざまなソースから1,449件のストーリを含むベンチマークを紹介する。一貫性のあるデータストーリーを作成する上での課題に対処するために,人間のストーリーテリングプロセスを再現する2つのLLMエージェントを用いたマルチエージェントフレームワークを提案する。我々のエージェント・フレームワークは一般的にモデルベースと人的評価の両方において非エージェント・フレームワークよりも優れていますが、結果はデータ・ストーリー・ジェネレーションにおける独特な課題を明らかにします。 Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate visual aids, such as highlighted bars and lines in charts, along with textual annotations explaining insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, often necessitating human intervention, which can be time-consuming and mentally taxing. While Large Language Models (LLMs) excel in various NLP tasks, their ability to generate coherent and comprehensive data stories remains underexplored. In this work, we introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources. To address the challenges of crafting coherent data stories, we propose a multiagent framework employing two LLM agents designed to replicate the human storytelling process: one for understanding and describing the data (Reflection), generating the outline, and narration, and another for verification at each intermediary step. 
While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.	翻訳日:2024-11-08 12:00:35 公開日:2024-10-04
# 条件の連鎖:条件付き質問応答のための条件の構築、検証、解決 Chain of Condition: Construct, Verify and Solve Conditions for Conditional Question Answering ( http://arxiv.org/abs/2408.05442v2 ) ライセンス: Link先を確認	Jiuheng Lin, Yuxuan Lai, Yansong Feng,	(参考訳) 条件付き質問応答(CQA)は、可能性の高い回答を見つけ、不足している条件を特定することを目的とした重要なタスクである。既存のアプローチは,(1)必要な条件とその論理的関係を正確に同定すること,(2)条件を検証して不足しているものを検出すること,という2つの課題のためにCQAに苦戦している。本稿では,まずすべての条件を特定して文書に従いその論理的関係を明示的に構築し,次にそれらの条件が満たされているかどうかを検証し,最後に論理式を解いて不足している条件を示し,それに応じて回答を生成する,Chain of Conditionという新しいプロンプト手法を提案する。 2つのCQAベンチマークデータセットの実験では、Chain of Conditionは既存のプロンプトベースラインを上回り、新しい最先端性能を確立している。さらに, わずかな例だけで, GPT-3.5-Turbo や GPT-4 が既存のすべての教師付きモデルを上回ることを可能にする。 Conditional question answering (CQA) is an important task that aims to find probable answers and identify missing conditions. Existing approaches struggle with CQA due to two challenges: (1) precisely identifying necessary conditions and the logical relationship, and (2) verifying conditions to detect any that are missing. In this paper, we propose a novel prompting approach, Chain of condition, by first identifying all conditions and constructing their logical relationships explicitly according to the document, then verifying whether these conditions are satisfied, finally solving the logical expression to indicate any missing conditions and generating the answer accordingly. Experiments on two CQA benchmark datasets show our chain of condition outperforms existing prompting baselines, establishing a new state of the art. Furthermore, with only a few examples, our method can facilitate GPT-3.5-Turbo or GPT-4 to outperform all existing supervised models.	翻訳日:2024-11-08 12:00:35 公開日:2024-10-04
# LLMを用いたタスク計画のための検索拡張階層型インコンテキスト強化学習と後知恵モジュラーリフレクション Retrieval-Augmented Hierarchical in-Context Reinforcement Learning and Hindsight Modular Reflections for Task Planning with LLMs ( http://arxiv.org/abs/2408.06520v2 ) ライセンス: Link先を確認	Chuanneng Sun, Songjun Huang, Dario Pompili,	(参考訳) 大規模言語モデル(LLM)は、様々な言語タスクにおいて顕著な能力を示しており、ロボット工学における意思決定の候補として有望である。階層強化学習(Hierarchical Reinforcement Learning, HRL)に着想を得て, LLMベースの高レベルポリシーを用いて複雑なタスクをその場でサブタスクに分解する新しいフレームワークであるRetrieval-Augmented in-context reinforcement Learning (RAHL)を提案する。目標によって定義されたサブタスクは、低レベルポリシーに割り当てられて遂行される。マルチエピソード実行におけるエージェントの性能を向上させるため,軌跡全体ではなく短い部分軌跡についてエージェントに振り返らせることで振り返りの効率を高めるHindsight Modular Reflection (HMR)を提案する。提案するRAHLの意思決定能力を,ALFWorld,Webshop,HotpotQAの3つのベンチマーク環境で評価した。その結果, RAHLは5エピソードの実行で, 強力なベースラインに対してそれぞれ9%, 42%, 10%の性能向上を達成できることが示された。さらに,Boston Dynamics SPOTロボットにRAHLを実装した。実験の結果、ロボットはLLMポリシーの制御のもとで環境をスキャンし、入り口を見つけ、新しい部屋へと移動できることがわかった。 Large Language Models (LLMs) have demonstrated remarkable abilities in various language tasks, making them promising candidates for decision-making in robotics. Inspired by Hierarchical Reinforcement Learning (HRL), we propose Retrieval-Augmented in-context reinforcement Learning (RAHL), a novel framework that decomposes complex tasks into sub-tasks on the fly using an LLM-based high-level policy. The sub-tasks, defined by goals, are assigned to the low-level policy to complete. To improve the agent's performance in multi-episode execution, we propose Hindsight Modular Reflection (HMR), where, instead of reflecting on the full trajectory, we let the agent reflect on shorter sub-trajectories to improve reflection efficiency. We evaluated the decision-making ability of the proposed RAHL in three benchmark environments--ALFWorld, Webshop, and HotpotQA. The results show that RAHL can achieve performance improvements of 9%, 42%, and 10% over strong baselines within 5 episodes of execution. 
Furthermore, we also implemented RAHL on the Boston Dynamics SPOT robot. The experiment shows that the robot can scan the environment, find entrances, and navigate to new rooms controlled by the LLM policy.	翻訳日:2024-11-08 11:26:46 公開日:2024-10-04
# リンク予測における知識グラフ埋め込みの予測多重性 Predictive Multiplicity of Knowledge Graph Embeddings in Link Prediction ( http://arxiv.org/abs/2408.08226v2 ) ライセンス: Link先を確認	Yuqicheng Zhu, Nico Potyka, Mojtaba Nayyeri, Bo Xiong, Yunjie He, Evgeny Kharlamov, Steffen Staab,	(参考訳) 知識グラフ埋め込み(KGE)モデルは、知識グラフ(KG)の欠落したリンクを予測するためにしばしば使用される。しかし、複数のKG埋め込みは、リンク予測でほぼ同等の性能を示しながら、未知のクエリに対して矛盾する予測を与えることがある。この現象は文献において\textit{predictive multiplicity}(予測多重性)と呼ばれる。これはハイステークス領域におけるKGEベースのアプリケーションに重大なリスクをもたらすが、KGEの研究では見過ごされてきた。我々は、リンク予測における予測多重性を定義し、評価指標を導入し、一般的なベンチマークデータセット上で代表的なKGE手法の予測多重性を測定する。我々の実証研究は、リンク予測において顕著な予測多重性が存在し、テストクエリの$8\%$から$39\%$が矛盾する予測を示すことを明らかにした。社会選択理論の投票手法を活用してこの問題に対処し、実験では矛盾を$66\%$から$78\%$軽減した。 Knowledge graph embedding (KGE) models are often used to predict missing links for knowledge graphs (KGs). However, multiple KG embeddings can perform almost equally well for link prediction yet give conflicting predictions for unseen queries. This phenomenon is termed \textit{predictive multiplicity} in the literature. It poses substantial risks for KGE-based applications in high-stake domains but has been overlooked in KGE research. We define predictive multiplicity in link prediction, introduce evaluation metrics and measure predictive multiplicity for representative KGE methods on commonly used benchmark datasets. Our empirical study reveals significant predictive multiplicity in link prediction, with $8\%$ to $39\%$ testing queries exhibiting conflicting predictions. We address this issue by leveraging voting methods from social choice theory, significantly mitigating conflicts by $66\%$ to $78\%$ in our experiments.	翻訳日:2024-11-08 07:29:14 publication-placeholder
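To make "conflicting predictions" and "voting" concrete, here is a small Python sketch. It is illustrative only: the abstract names neither its exact metrics nor which voting rules it uses, so `conflict_rate` and `plurality_vote` are hypothetical names, and plurality voting is chosen simply as the most basic rule from social choice theory.

```python
from collections import Counter


def conflict_rate(predictions):
    """Fraction of queries on which the models do not all agree.

    predictions[m][q] is model m's top-1 answer for query q.
    """
    n_queries = len(predictions[0])
    conflicts = sum(
        1
        for q in range(n_queries)
        if len({model[q] for model in predictions}) > 1
    )
    return conflicts / n_queries


def plurality_vote(predictions):
    """Aggregate per-query answers by plurality (most frequent answer wins)."""
    n_queries = len(predictions[0])
    return [
        Counter(model[q] for model in predictions).most_common(1)[0][0]
        for q in range(n_queries)
    ]
```

In this sketch, the multiplicity of a set of near-equally accurate models is read off as the share of queries where their answers differ, and voting replaces the individual answers with a single aggregated answer per query.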
# 代理的視点から見た評価者の結束と品質 Rater Cohesion and Quality from a Vicarious Perspective ( http://arxiv.org/abs/2408.08411v2 ) ライセンス: Link先を確認	Deepak Pandita, Tharindu Cyril Weerasooriya, Sujan Dutta, Sarah K. Luger, Tharindu Ranasinghe, Ashiqur R. KhudaBukhsh, Marcos Zampieri, Christopher M. Homan,	(参考訳) 人間のフィードバックは、AI安全性、コンテンツモデレーション、感情分析など、不一致が頻発する領域において、人間中心のAIシステムを構築するために不可欠である。多くの不一致は、特に政治的に緊迫した状況において、評価者が相反する価値観や信念を持っているために生じる。 Vicarious(代理的)アノテーションは、他の人がデータにどうアノテーションを付けると思うかを評価者に尋ねることで、不一致を分解する方法である。本稿では,評価者間の不一致を調整するための分析手法とともに、代理的アノテーションの利用について検討する。我々は評価者結束指標を用いて、政治的所属や人口統計学的背景が、評価者の攻撃性(offense)に対する認識に与えうる影響を調べる。さらに、評価者の人口統計を考慮に入れたCrowdTruthの評価者品質指標を用いて、評価者とそのアノテーションをスコアリングする。評価者品質指標が,個人的レベルと代理的レベルにわたって,グループ内およびグループ間の評価者結束にどのように影響するかを検討する。 Human feedback is essential for building human-centered AI systems across domains where disagreement is prevalent, such as AI safety, content moderation, or sentiment analysis. Many disagreements, particularly in politically charged settings, arise because raters have opposing values or beliefs. Vicarious annotation is a method for breaking down disagreement by asking raters how they think others would annotate the data. In this paper, we explore the use of vicarious annotation with analytical methods for moderating rater disagreement. We employ rater cohesion metrics to study the potential influence of political affiliations and demographic backgrounds on raters' perceptions of offense. Additionally, we utilize CrowdTruth's rater quality metrics, which consider the demographics of the raters, to score the raters and their annotations. We study how the rater quality metrics influence the in-group and cross-group rater cohesion across the personal and vicarious levels.	翻訳日:2024-11-08 07:29:14 公開日:2024-10-04
# Soda-Eval:LLM時代のオープンドメイン対話評価 Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs ( http://arxiv.org/abs/2408.10902v2 ) ライセンス: Link先を確認	John Mendonça, Isabel Trancoso, Alon Lavie,	(参考訳) オープンドメイン対話評価では,人間による評価がゴールドスタンダードとなっているが,Large Language Models (LLMs) を用いた自動評価の人気が高まっている。しかし、ほとんどのフレームワークは、現在のモデルに関連する課題を反映していない、流布や妥当性といった側面で古いチャットボットを評価するベンチマークを活用している。実際、GPT-3.5生成対話データセットであるSodaの質的分析では、現在のチャットボットはコヒーレンスやコモンセンスの知識にまつわるいくつかの繰り返しの問題を示す可能性があるが、一般的には高度に流動的で関連する応答を生成する。上述の制限について,本論文では,10K対話で120K以上のターンレベルアセスメントをカバーし,GPT-4でアノテーションを生成するSoda-Evalについて紹介する。 Soda-Eval をベンチマークとして,複数のオープンアクセス命令チューニング LLM の性能を調べた結果,対話評価は依然として困難であることが判明した。これらのモデルを微調整することで、相関と説明の両面において、数ショットの推論よりもパフォーマンスが向上する。 Although human evaluation remains the gold standard for open-domain dialogue evaluation, the growing popularity of automated evaluation using Large Language Models (LLMs) has also extended to dialogue. However, most frameworks leverage benchmarks that assess older chatbots on aspects such as fluency and relevance, which are not reflective of the challenges associated with contemporary models. In fact, a qualitative analysis on Soda, a GPT-3.5 generated dialogue dataset, suggests that current chatbots may exhibit several recurring issues related to coherence and commonsense knowledge, but generally produce highly fluent and relevant responses. Noting the aforementioned limitations, this paper introduces Soda-Eval, an annotated dataset based on Soda that covers over 120K turn-level assessments across 10K dialogues, where the annotations were generated by GPT-4. Using Soda-Eval as a benchmark, we then study the performance of several open-access instruction-tuned LLMs, finding that dialogue evaluation remains challenging. Fine-tuning these models improves performance over few-shot inferences, both in terms of correlation and explanation.	翻訳日:2024-11-08 06:22:37 公開日:2024-10-04
# Soda-Eval:LLM時代のオープンドメイン対話評価 Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs ( http://arxiv.org/abs/2408.10902v3 ) ライセンス: Link先を確認	John Mendonça, Isabel Trancoso, Alon Lavie,	(参考訳) オープンドメイン対話評価では,人間による評価がゴールドスタンダードとなっているが,Large Language Models (LLMs) を用いた自動評価の人気が高まっている。しかし、ほとんどのフレームワークは、現在のモデルに関連する課題を反映していない、流布や妥当性といった側面で古いチャットボットを評価するベンチマークを活用している。実際、GPT-3.5生成対話データセットであるSodaの質的分析では、現在のチャットボットはコヒーレンスやコモンセンスの知識にまつわるいくつかの繰り返しの問題を示す可能性があるが、一般的には高度に流動的で関連する応答を生成する。上述の制限について,本論文では,10K対話で120K以上のターンレベルアセスメントをカバーし,GPT-4でアノテーションを生成するSoda-Evalについて紹介する。 Soda-Eval をベンチマークとして,複数のオープンアクセス命令チューニング LLM の性能を調べた結果,対話評価は依然として困難であることが判明した。これらのモデルを微調整することで、相関と説明の両面において、数ショットの推論よりもパフォーマンスが向上する。 Although human evaluation remains the gold standard for open-domain dialogue evaluation, the growing popularity of automated evaluation using Large Language Models (LLMs) has also extended to dialogue. However, most frameworks leverage benchmarks that assess older chatbots on aspects such as fluency and relevance, which are not reflective of the challenges associated with contemporary models. In fact, a qualitative analysis on Soda, a GPT-3.5 generated dialogue dataset, suggests that current chatbots may exhibit several recurring issues related to coherence and commonsense knowledge, but generally produce highly fluent and relevant responses. Noting the aforementioned limitations, this paper introduces Soda-Eval, an annotated dataset based on Soda that covers over 120K turn-level assessments across 10K dialogues, where the annotations were generated by GPT-4. Using Soda-Eval as a benchmark, we then study the performance of several open-access instruction-tuned LLMs, finding that dialogue evaluation remains challenging. Fine-tuning these models improves performance over few-shot inferences, both in terms of correlation and explanation.	翻訳日:2024-11-08 06:22:37 公開日:2024-10-04
# LLMを用いたRAGとFew-Shot In-Context Learningを用いたエビデンス支援Fact Checking Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs ( http://arxiv.org/abs/2408.12060v2 ) ライセンス: Link先を確認	Ronit Singhal, Pransh Patwa, Parth Patwa, Aman Chadha, Amitava Das,	(参考訳) ソーシャルメディア上で誤情報が広く拡散していることから、オンライン上の主張に対するファクトチェック機構の実装が不可欠である。すべての主張を手動で検証することは非常に困難であり、自動ファクトチェックシステムの必要性が浮き彫りになる。本稿では,この問題に対処するために設計した我々のシステムについて述べる。 Averitec データセット (Schlichtkrull et al., 2023) を用いてファクトチェックシステムの性能を評価する。真偽予測に加えて,本システムは,データセットから抽出した裏付けとなる証拠を提示する。知識ベースから関連する証拠文を抽出するRetrieve and Generate (RAG)パイプラインを開発し,抽出した証拠文を主張とともに大規模言語モデル(LLM)に入力して分類を行う。また,複数のLLMのfew-shot In-Context Learning (ICL)能力についても評価する。本システムは0.33の「Averitec」スコアを達成しており、これはベースラインに対して22%の絶対改善である。私たちのコードはhttps://github.com/ronit-singhal/evidence-backed-fact-checking-using-rag-and-few-shot-in-context-learning-with-llms で公開されています。 Given the widespread dissemination of misinformation on social media, implementing fact-checking mechanisms for online claims is essential. Manually verifying every claim is very challenging, underscoring the need for an automated fact-checking system. This paper presents our system designed to address this issue. We utilize the Averitec dataset (Schlichtkrull et al., 2023) to assess the performance of our fact-checking system. In addition to veracity prediction, our system provides supporting evidence, which is extracted from the dataset. We develop a Retrieve and Generate (RAG) pipeline to extract relevant evidence sentences from a knowledge base, which are then inputted along with the claim into a large language model (LLM) for classification. We also evaluate the few-shot In-Context Learning (ICL) capabilities of multiple LLMs. Our system achieves an 'Averitec' score of 0.33, which is a 22% absolute improvement over the baseline. Our code is publicly available on https://github.com/ronit-singhal/evidence-backed-fact-checking-using-rag-and-few-shot-in-context-learning-with-llms.	翻訳日:2024-11-08 05:49:00 公開日:2024-10-04
# 言語モデルのアライメントのための選好誘導リフレクティブサンプリング Preference-Guided Reflective Sampling for Aligning Language Models ( http://arxiv.org/abs/2408.12163v2 ) ライセンス: Link先を確認	Hai Ye, Hwee Tou Ng,	(参考訳) 反復的なデータ生成とモデルの再訓練は、大規模言語モデル(LLM)を人間の選好に効果的に整合させることができる。データサンプリングのプロセスは、方策改善の成否に大きく影響するため、非常に重要である。繰り返しランダムサンプリングは、モデルに独立に複数回クエリして出力を生成する、広く使われている手法である。本研究では,より効果的なサンプリング手法であるPreference-Guided Reflective Sampling (PRS)を提案する。ランダムサンプリングとは異なり、PRSはより効率的なサンプリングを可能にするためにツリーベースの生成フレームワークを使用する。また、適応的な自己改良(self-refinement)技術を活用してサンプリング空間をよりよく探索する。自然言語でユーザの選好を指定することで、PRSはこれらの選好に応じて応答生成をさらに最適化できる。その結果、PRSはモデルを多様なユーザの選好に整合させることができる。実験の結果,PRSは著しく高い報酬を持つ高品質な応答を生成することがわかった。 AlpacaEval と Arena-Hard では、PRS は best-of-$N$ サンプリングにおいて繰り返しランダムサンプリングを大幅に上回る。さらに、PRSは、反復的なオフラインRLトレーニングに適用した場合にも高い性能を示す。 Iterative data generation and model re-training can effectively align large language models (LLMs) to human preferences. The process of data sampling is crucial, as it significantly influences the success of policy improvement. Repeated random sampling is a widely used method that independently queries the model multiple times to generate outputs. In this work, we propose a more effective sampling method, named Preference-Guided Reflective Sampling (PRS). Unlike random sampling, PRS employs a tree-based generation framework to enable more efficient sampling. It leverages adaptive self-refinement techniques to better explore the sampling space. By specifying user preferences in natural language, PRS can further optimize response generation according to these preferences. As a result, PRS can align models to diverse user preferences. Our experiments demonstrate that PRS generates higher-quality responses with significantly higher rewards. On AlpacaEval and Arena-Hard, PRS substantially outperforms repeated random sampling in best-of-$N$ sampling. Moreover, PRS shows strong performance when applied in iterative offline RL training.	翻訳日:2024-11-08 05:49:00 公開日:2024-10-04
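For orientation, the baseline that the abstract compares against (repeated random sampling with best-of-$N$ selection) can be sketched in a few lines of Python. This shows only the baseline, not PRS itself, whose tree-based generation and self-refinement steps are not specified in the abstract. The sampler and reward function below are toy stand-ins with hypothetical names.

```python
import random


def best_of_n(sample_fn, reward_fn, prompt, n, seed=0):
    """Draw n independent candidates and return the highest-reward one."""
    rng = random.Random(seed)
    candidates = [sample_fn(prompt, rng) for _ in range(n)]
    best = max(candidates, key=reward_fn)
    return best, candidates


# Toy stand-ins (hypothetical): a "model" that repeats the prompt a random
# number of times, and a "reward" that simply prefers longer responses.
def toy_sample(prompt, rng):
    return " ".join([prompt] * rng.randint(1, 5))


def toy_reward(response):
    return len(response.split())
```

Each candidate here is drawn without reference to the others. That independence is the property the abstract contrasts with PRS, which is described as using a tree-based generation framework and adaptive self-refinement to explore the sampling space.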
# 深層学習に基づく量子鍵分布プロトコルの連続攻撃 Deep-learning-based continuous attacks on quantum key distribution protocols ( http://arxiv.org/abs/2408.12571v2 ) ライセンス: Link先を確認	Théo Lejeune, François Damanet,	(参考訳) 量子鍵分配(QKD)プロトコルの最も重要な特徴は、サードパーティの攻撃に対するセキュリティと潜在的な対策である。新たなタイプの攻撃は文献で定期的に開発されているが、弱い連続測定を使用することは滅多にない。ここでは、深いリカレントニューラルネットワークの強力なパターン認識能力とともに連続測定を利用する、$\textit{Deep-learning-based continuous attack}$ (DLCA)と呼ばれる新たな攻撃方式を設計する。 BB84プロトコルに適用した場合、スパイが量子通信チャネルに送信された量子ビットの状態に関する重要な情報を抽出しながらも、我々の攻撃に気づくことが困難であることを示す。最後に、スパイが量子フィードバックを利用してトラックをさらにカバーする方法について研究する。我々の攻撃方法は、まだおもちゃモデルの初期段階にあるが、様々なQKDプロトコルに適用でき、様々な方法で一般化できるため、調査に値する潜在的な脅威を構成している。 The most important characteristic of a Quantum Key Distribution (QKD) protocol is its security against third-party attacks, and the potential countermeasures available. While new types of attacks are regularly developed in the literature, they rarely involve the use of weak continuous measurement. Here, we design a new attack scheme called $\textit{Deep-learning-based continuous attack}$ (DLCA) that exploits continuous measurement together with the powerful pattern recognition capacities of deep recurrent neural networks. We show that, when applied to the BB84 protocol, our attack can be difficult to notice while still allowing the spy to extract significant information about the states of the qubits sent in the quantum communication channel. Finally, we study how the spy can exploit quantum feedback to further cover their tracks. Our attack scheme, while still at the early stages of a toy model, constitutes a potential threat which is worthwhile to be investigated, also as it could be applied to different QKD protocols and generalized in many different ways.	翻訳日:2024-11-08 05:37:29 公開日:2024-10-04
# MobileQuant:オンデバイス言語モデルのためのモバイルフレンドリーな量子化 MobileQuant: Mobile-friendly Quantization for On-device Language Models ( http://arxiv.org/abs/2408.13933v2 ) ライセンス: Link先を確認	Fuwen Tan, Royson Lee, Łukasz Dudziak, Shell Xu Hu, Sourav Bhattacharya, Timothy Hospedales, Georgios Tzimiropoulos, Brais Martinez,	(参考訳) 大規模言語モデル(LLM)は言語処理に革命をもたらし、複数のアプリケーションにまたがって優れた結果をもたらしている。しかしながら、エッジデバイスにLLMをデプロイすることは、メモリ、エネルギー、計算コストに関していくつかの課題をもたらし、携帯電話などのデバイスでの普及を制限している。有望な解決策は、重みとアクティベーションを表すために使われるビット数を減らすことである。既存の研究は、LLMを低ビット幅(例えば4ビットの重み)に量子化することに部分的に成功しているが、16ビットを超えてアクティベーションを量子化すると、デバイス上の量子化サポートが貧弱なために大きな計算オーバーヘッドが生じるか、相当な精度低下を招くことが多い。しかし、8ビットのアクティベーションは、Neural Processing Unit(NPU)などのモバイルフレンドリーなハードウェアをLLMが完全に活用できるようにするため、デバイス上でのデプロイにとって非常に魅力的である。本研究では、整数のみの量子化を用いたLLMのデバイス上での展開を容易にするための最初の試みを行う。まず、オンデバイス展開における既存の量子化手法の限界について、特にアクティベーション量子化に着目して検討する。次に、これらの限界に対処するため、MobileQuantという簡単な学習後量子化手法を導入し、重み変換とアクティベーション範囲パラメータをエンドツーエンドで同時に最適化することで、従来の重み等価変換の研究を拡張する。 MobileQuantは、1) 幅広いLLMベンチマークでほぼ無損失の量子化を達成し、2) 現在のオンデバイス量子化戦略と比較してレイテンシとエネルギー消費を20～50%削減し、3) 限られた計算予算しか必要とせず、4) NPUなどのモバイルフレンドリーな計算ユニットと互換性がある、という点で既存手法に対する優位性を示す。 Large language models (LLMs) have revolutionized language processing, delivering outstanding results across multiple applications. However, deploying LLMs on edge devices poses several challenges with respect to memory, energy, and compute costs, limiting their widespread use in devices such as mobile phones. A promising solution is to reduce the number of bits used to represent weights and activations. While existing works have found partial success at quantizing LLMs to lower bitwidths, e.g. 4-bit weights, quantizing activations beyond 16 bits often leads to large computational overheads due to poor on-device quantization support, or a considerable accuracy drop. Yet, 8-bit activations are very attractive for on-device deployment as they would enable LLMs to fully exploit mobile-friendly hardware, e.g. Neural Processing Units (NPUs). 
In this work, we make a first attempt to facilitate the on-device deployment of LLMs using integer-only quantization. We first investigate the limitations of existing quantization methods for on-device deployment, with a special focus on activation quantization. We then address these limitations by introducing a simple post-training quantization method, named MobileQuant, that extends previous weight equivalent transformation works by jointly optimizing the weight transformation and activation range parameters in an end-to-end manner. MobileQuant demonstrates superior capabilities over existing methods by 1) achieving near-lossless quantization on a wide range of LLM benchmarks, 2) reducing latency and energy consumption by 20\%-50\% compared to current on-device quantization strategies, 3) requiring limited compute budget, 4) being compatible with mobile-friendly compute units, e.g. NPU.	翻訳日:2024-11-08 05:15:13 公開日:2024-10-04
# 物理認識リプログラミングによる、言語モデルを活用した時空間予測 Language Model Empowered Spatio-Temporal Forecasting via Physics-Aware Reprogramming ( http://arxiv.org/abs/2408.14505v2 ) ライセンス: Link先を確認	Hao Wang, Jindong Han, Wei Fan, Hao Liu,	(参考訳) 時空間予測は、交通計画、エネルギー管理、気候モニタリングなど、多くの実世界の応用において重要である。本研究では,事前学習済み言語モデル(PLM)の推論能力と汎化能力を活用して,特にデータ不足のシナリオにおいて,より効果的な時空間予測を実現することを目的とする。しかし、最近の研究では、主にテキストデータで訓練されているPLMは、数値時系列における複雑な相関のモデル化を課されるとしばしばつまずき、時空間データの理解における有効性が制限されることが明らかになっている。このギャップを埋めるために,時空間予測に適した物理認識型のPLMリプログラミングフレームワーク RePST を提案する。具体的には、まず、空間的に相関した時系列を解釈可能なサブコンポーネントに適応的に分解する物理認識型分解器を提案し、分割統治戦略によってPLMが高度な時空間ダイナミクスを理解しやすくする。さらに,拡張した時空間語彙空間を導入して時空間系列を離散表現に射影する,選択的離散リプログラミング手法を提案する。この手法は、リプログラミング中の情報損失を最小限に抑え、PLMから得られる表現を豊かにする。実世界のデータセットに対する広範な実験により、提案するRePSTは、特にデータ不足のシナリオにおいて12の最先端ベースライン手法を上回ることが示され、時空間予測におけるPLMの有効性と優れた汎化能力が浮き彫りになった。 Spatio-temporal forecasting is pivotal in numerous real-world applications, including transportation planning, energy management, and climate monitoring. In this work, we aim to harness the reasoning and generalization abilities of Pre-trained Language Models (PLMs) for more effective spatio-temporal forecasting, particularly in data-scarce scenarios. However, recent studies uncover that PLMs, which are primarily trained on textual data, often falter when tasked with modeling the intricate correlations in numerical time series, thereby limiting their effectiveness in comprehending spatio-temporal data. To bridge the gap, we propose RePST, a physics-aware PLM reprogramming framework tailored for spatio-temporal forecasting. Specifically, we first propose a physics-aware decomposer that adaptively disentangles spatially correlated time series into interpretable sub-components, which facilitates PLM to understand sophisticated spatio-temporal dynamics via a divide-and-conquer strategy. Moreover, we propose a selective discrete reprogramming scheme, which introduces an expanded spatio-temporal vocabulary space to project spatio-temporal series into discrete representations. 
This scheme minimizes the information loss during reprogramming and enriches the representations derived by PLMs. Extensive experiments on real-world datasets show that the proposed RePST outperforms twelve state-of-the-art baseline methods, particularly in data-scarce scenarios, highlighting the effectiveness and superior generalization capabilities of PLMs for spatio-temporal forecasting.	翻訳日:2024-11-08 05:04:12 公開日:2024-10-04
# T-FAKE: 顔のランドマーク検出のための熱画像の合成 T-FAKE: Synthesizing Thermal Images for Facial Landmarking ( http://arxiv.org/abs/2408.15127v2 ) ライセンス: Link先を確認	Philipp Flotho, Moritz Piening, Anna Kukleva, Gabriele Steidl,	(参考訳) 顔分析は、セキュリティ、自動運転、エンターテイメント、ヘルスケアなど、幅広いアプリケーションにおいて重要なコンポーネントである。様々な顔のRGBデータセットが利用可能であるにもかかわらず、生命科学、医学、バイオメトリクスにおいて重要な役割を果たす熱モダリティはほとんど見過ごされてきた。このギャップに対処するために、スパースなランドマークと密なランドマークを備えた新しい大規模合成熱データセットであるT-FAKEデータセットを導入する。データセットの作成を容易にするため,RGBの顔画像へのサーマルスタイルの転移を可能にする新しいRGB2Thermal損失関数を提案する。サーマルパッチとRGBパッチ間のワッサースタイン距離と、顔の臨床的な温度分布の統計解析を利用して、生成したサーマル画像が実際のサンプルとよく似たものになるようにする。 RGB2Thermal損失関数に基づくRGB2Thermalスタイル転移を用いて、顔の大規模合成熱データセットであるT-FAKEデータセットを作成する。新たなT-FAKEデータセット、確率的ランドマーク予測、ラベル適応ネットワークを活用して、異なるランドマーク規約にわたり、熱画像におけるランドマーク検出手法の大幅な改善を示す。我々のモデルは、スパースな70点のランドマークと密な478点のランドマークアノテーションの両方で優れた性能を示している。私たちのコードとモデルはhttps://github.com/phflot/tfake で公開されています。 Facial analysis is a key component in a wide range of applications such as security, autonomous driving, entertainment, and healthcare. Despite the availability of various facial RGB datasets, the thermal modality, which plays a crucial role in life sciences, medicine, and biometrics, has been largely overlooked. To address this gap, we introduce the T-FAKE dataset, a new large-scale synthetic thermal dataset with sparse and dense landmarks. To facilitate the creation of the dataset, we propose a novel RGB2Thermal loss function, which enables the transfer of thermal style to RGB faces. By utilizing the Wasserstein distance between thermal and RGB patches and the statistical analysis of clinical temperature distributions on faces, we ensure that the generated thermal images closely resemble real samples. Using RGB2Thermal style transfer based on our RGB2Thermal loss function, we create the T-FAKE dataset, a large-scale synthetic thermal dataset of faces. 
Leveraging our novel T-FAKE dataset, probabilistic landmark prediction, and label adaptation networks, we demonstrate significant improvements in landmark detection methods on thermal images across different landmark conventions. Our models show excellent performance with both sparse 70-point landmarks and dense 478-point landmark annotations. Our code and models are available at https://github.com/phflot/tfake.	翻訳日:2024-11-08 04:41:58 公開日:2024-10-04
# 基本的なエントロピー不等式から生じる量子エントロピーの連続性境界 Continuity bounds for quantum entropies arising from a fundamental entropic inequality ( http://arxiv.org/abs/2408.15306v2 ) ライセンス: Link先を確認	Koenraad Audenaert, Bjarne Bergh, Nilanjana Datta, Michael G. Jabbour, Ángela Capel, Paul Gondolf,	(参考訳) 我々は、2つの量子状態 $\rho_1$ と $\rho_2$ のフォン・ノイマンエントロピーの差について、タイトな上界を確立する。この上界は、差作用素 $(\rho_1 - \rho_2)$ のジョルダン=ハーン分解から導かれる互いに直交する状態のフォン・ノイマンエントロピーで表される。これは、よく知られた Audenaert-Fannes (AF) 不等式を含意する新しいエントロピー不等式をもたらす。実際、これはAF不等式の精密化にも繋がる。この不等式を用いて、条件付ける系上の周辺状態が一致する2つの状態の量子条件付きエントロピーに対して、一様連続性境界を得る。さらに、これを用いて、両変数に関する量子相対エントロピーの連続性境界を導出する。我々の証明は、主にマジョライゼーション理論と凸最適化に基づいている。興味深いことに、この基本的なエントロピー不等式は無限次元においても有効である。 We establish a tight upper bound for the difference in von Neumann entropies between two quantum states, $\rho_1$ and $\rho_2$. This bound is expressed in terms of the von Neumann entropies of the mutually orthogonal states derived from the Jordan-Hahn decomposition of the difference operator $(\rho_1 - \rho_2)$. This yields a novel entropic inequality that implies the well-known Audenaert-Fannes (AF) inequality. In fact, it also leads to a refinement of the AF inequality. We employ this inequality to obtain a uniform continuity bound for the quantum conditional entropy of two states whose marginals on the conditioning system coincide. We additionally use it to derive a continuity bound for the quantum relative entropy in both variables. Our proofs are largely based on majorization theory and convex optimization. Interestingly, the fundamental entropic inequality is also valid in infinite dimensions.	翻訳日:2024-11-08 04:41:58 公開日:2024-10-04
# CyberCortex.AI: 自律ロボットと複雑自動化のためのAIベースのオペレーティングシステム CyberCortex.AI: An AI-based Operating System for Autonomous Robotics and Complex Automation ( http://arxiv.org/abs/2409.01241v2 ) ライセンス: Link先を確認	Sorin Grigorescu, Mihai Zaha,	(参考訳) 自律型ロボットと複雑な自動化アプリケーションを制御するための基盤となるフレームワークは、知覚制御タスクをスケジューリングできるオペレーティングシステム(OS)であり、他のロボットピアやリモートクラウドコンピュータにリアルタイムのデータ通信を提供する。本稿では、異種AIベースのロボットと複雑な自動化アプリケーションを実現するために設計されたロボットOSであるCyberCortex AIを紹介する。 CyberCortex AIは分散分散OSで、ロボット同士の対話やクラウド上の高性能コンピュータ(HPC)との通信を可能にする。ロボットのセンサーと制御データは、その後ロボットにデプロイされるAIアルゴリズムのトレーニングを目的として、HPCシステムに向けてストリームされる。ロボットの各機能(例えば、知覚データ取得、経路計画、動作制御など)は、インターネットを介して共有されるいわゆるDataBlock of Filterの中で実行される。データは、いわゆるTAM(Temporal Addressable Memory)を通じて格納され、各フィルタの入力と出力の間のゲートウェイとして機能する。 CyberCortex.AIには2つの主要なコンポーネントがある。一ロボットの組込みハードウェア上で動作するDataBlockのリアルタイム実装であるCyberCortex AI推論システム二クラウド上のHPCコンピュータ上で実行されるCyberCortex AI dojoで、AIアルゴリズムの設計、トレーニング、デプロイに使用される。本稿では,2つの協調ロボティクスアプリケーションを用いて提案手法の定量的,定性的な性能解析を行う。一ユニツリーA1脚ロボット及びAnafi Parrot 4Kドローンに基づく森林火災防止システム二協調認識及び運動制御にCyberCortex.AIを使用する自律運転システム。 The underlying framework for controlling autonomous robots and complex automation applications are Operating Systems (OS) capable of scheduling perception-and-control tasks, as well as providing real-time data communication to other robotic peers and remote cloud computers. In this paper, we introduce CyberCortex AI, a robotics OS designed to enable heterogeneous AI-based robotics and complex automation applications. CyberCortex AI is a decentralized distributed OS which enables robots to talk to each other, as well as to High Performance Computers (HPC) in the cloud. Sensory and control data from the robots is streamed towards HPC systems with the purpose of training AI algorithms, which are afterwards deployed on the robots. Each functionality of a robot (e.g. sensory data acquisition, path planning, motion control, etc.) 
is executed within a so-called DataBlock of Filters shared through the internet, where each filter is computed either locally on the robot itself, or remotely on a different robotic system. The data is stored and accessed via a so-called Temporal Addressable Memory (TAM), which acts as a gateway between each filter's input and output. CyberCortex.AI has two main components: i) the CyberCortex AI inference system, which is a real-time implementation of the DataBlock running on the robots' embedded hardware, and ii) the CyberCortex AI dojo, which runs on an HPC computer in the cloud and is used to design, train and deploy AI algorithms. We present a quantitative and qualitative performance analysis of the proposed approach using two collaborative robotics applications: i) a forest fire prevention system based on a Unitree A1 legged robot and an Anafi Parrot 4K drone, and ii) an autonomous driving system which uses CyberCortex.AI for collaborative perception and motion control.	翻訳日:2024-11-08 03:23:46 公開日:2024-10-04
# CyberCortex.AI: 自律ロボットと複雑自動化のためのAIベースのオペレーティングシステム CyberCortex.AI: An AI-based Operating System for Autonomous Robotics and Complex Automation ( http://arxiv.org/abs/2409.01241v3 ) ライセンス: Link先を確認	Sorin Grigorescu, Mihai Zaha,	(参考訳) 自律型ロボットと複雑な自動化アプリケーションを制御するための基盤となるフレームワークは、知覚制御タスクをスケジューリングできるオペレーティングシステム(OS)であり、他のロボットピアやリモートクラウドコンピュータにリアルタイムのデータ通信を提供する。本稿では、異種AIベースのロボットと複雑な自動化アプリケーションを実現するために設計されたロボットOSであるCyberCortex AIを紹介する。 CyberCortex AIは分散分散OSで、ロボット同士の対話やクラウド上の高性能コンピュータ(HPC)との通信を可能にする。ロボットのセンサーと制御データは、その後ロボットにデプロイされるAIアルゴリズムのトレーニングを目的として、HPCシステムに向けてストリームされる。ロボットの各機能(例えば、知覚データ取得、経路計画、動作制御など)は、インターネットを介して共有されるいわゆるDataBlock of Filterの中で実行される。データは、いわゆるTAM(Temporal Addressable Memory)を通じて格納され、各フィルタの入力と出力の間のゲートウェイとして機能する。 CyberCortex AIには2つの主要コンポーネントがある。一ロボットの組込みハードウェア上で動作するDataBlockのリアルタイム実装であるCyberCortex AI推論システム二クラウド上のHPCコンピュータ上で実行されるCyberCortex AI dojoで、AIアルゴリズムの設計、トレーニング、デプロイに使用される。本稿では,2つの協調ロボティクスアプリケーションを用いて提案手法の定量的,定性的な性能解析を行う。一ユニツリーA1脚ロボット及びAnafi Parrot 4Kドローンに基づく森林火災防止システム二協調認識及び運動制御にCyberCortex AIを使用する自律運転システム。 The underlying framework for controlling autonomous robots and complex automation applications are Operating Systems (OS) capable of scheduling perception-and-control tasks, as well as providing real-time data communication to other robotic peers and remote cloud computers. In this paper, we introduce CyberCortex AI, a robotics OS designed to enable heterogeneous AI-based robotics and complex automation applications. CyberCortex AI is a decentralized distributed OS which enables robots to talk to each other, as well as to High Performance Computers (HPC) in the cloud. Sensory and control data from the robots is streamed towards HPC systems with the purpose of training AI algorithms, which are afterwards deployed on the robots. Each functionality of a robot (e.g. sensory data acquisition, path planning, motion control, etc.) 
is executed within a so-called DataBlock of Filters shared through the internet, where each filter is computed either locally on the robot itself, or remotely on a different robotic system. The data is stored and accessed via a so-called Temporal Addressable Memory (TAM), which acts as a gateway between each filter's input and output. CyberCortex AI has two main components: i) the CyberCortex AI inference system, which is a real-time implementation of the DataBlock running on the robots' embedded hardware, and ii) the CyberCortex AI dojo, which runs on an HPC computer in the cloud and is used to design, train and deploy AI algorithms. We present a quantitative and qualitative performance analysis of the proposed approach using two collaborative robotics applications: i) a forest fire prevention system based on a Unitree A1 legged robot and an Anafi Parrot 4K drone, and ii) an autonomous driving system which uses CyberCortex AI for collaborative perception and motion control.	翻訳日:2024-11-08 03:23:46 公開日:2024-10-04
# 原子干渉計を用いた重力曲率の局所測定方式 Local Measurement Scheme of Gravitational Curvature using Atom Interferometers ( http://arxiv.org/abs/2409.03515v3 ) ライセンス: Link先を確認	Michael Werner, Ali Lezeik, Dennis Schlippert, Ernst Rasel, Naceur Gaaloul, Klemens Hammerer,	(参考訳) 光パルス原子干渉計(AIF)は、空間的不均一性と重力曲率の精巧な量子プローブである。さらに、超長基線原子干渉法(VLBAI)には、詳細な測定と較正が前提条件として必要である。ここでは、同一位置に配置した2つの干渉計の差分信号が、重力ポテンシャルの曲率に比例した位相シフトを抽出する手法を提案する。スケール係数は、光子波数、干渉計時間、原子反跳という、よく制御された量にのみ依存するため、測定された位相から曲率を正確に推定できる。ケーススタディとして、ハノーバーVLBAI施設を想定して、このような同一位置配置の勾配計型干渉計を数値シミュレーションし、複雑な空間依存性を持つ重力場における位相シフトのロバスト性を示す。非自明な重力場に対する重力曲率の推定量を定義し、空間分解能に関する信号強度と推定精度のトレードオフを計算する。今後の展望として、時間依存重力場の場合とそれに対応する測定戦略について議論する。 Light pulse atom interferometers (AIFs) are exquisite quantum probes of spatial inhomogeneity and gravitational curvature. Moreover, detailed measurement and calibration are necessary prerequisites for very-long-baseline atom interferometry (VLBAI). Here we present a method in which the differential signal of two co-located interferometers singles out a phase shift proportional to the curvature of the gravitational potential. The scale factor depends only on well controlled quantities, namely the photon wave number, the interferometer time and the atomic recoil, which allows the curvature to be accurately inferred from a measured phase. As a case study, we numerically simulate such a co-located gradiometric interferometer in the context of the Hannover VLBAI facility and prove the robustness of the phase shift in gravitational fields with complex spatial dependence. We define an estimator of the gravitational curvature for non-trivial gravitational fields and calculate the trade-off between signal strength and estimation accuracy with regard to spatial resolution. As a perspective, we discuss the case of a time-dependent gravitational field and corresponding measurement strategies.	翻訳日:2024-11-07 23:23:02 公開日:2024-10-04
# Qihoo-T2X:テキスト・ツー・アニータスクのための効率的なプロキシ・トークン型拡散変換器 Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task ( http://arxiv.org/abs/2409.04005v2 ) ライセンス: Link先を確認	Jing Wang, Ao Ma, Jiasong Feng, Dawei Leng, Yuhui Yin, Xiaodan Liang,	(参考訳) 拡散変圧器におけるグローバル自己保持機構は、視覚情報のスパースと冗長性に起因する冗長な計算を伴い、空間窓内のトークンの注意マップは、かなりの類似性を示している。この冗長性に対処するため、グローバルな視覚情報を効率的にモデル化するために、スパースな代表トークンアテンション(代表トークンの数はトークンの総数よりもはるかに少ない)を利用するプロキシ・トークン化拡散変換器(PT-DiT)を提案する。具体的には、各トランスブロック内で、各時空間ウィンドウから平均化トークンを計算し、その領域のプロキシトークンとして機能する。グローバルセマンティクスは、これらのプロキシトークンの自己アテンションを通じてキャプチャされ、その後、クロスアテンションを介してすべての潜在トークンに注入される。同時に、スパースアテンション機構によって引き起こされる詳細モデリングの限界に対処するために、ウィンドウとシフトウインドウのアテンションを導入する。 PT-DiTに基づいて,T2I,T2V,T2MVタスクの様々なモデルを含むQihoo-T2Xファミリーをさらに発展させる。実験の結果,PT-DiTは画像生成タスクと映像生成タスクの計算複雑性を減らし,競争性能が向上することがわかった(例:DiTに比べて49%,PixArt-$\alpha$に比べて34%)。 Qihoo-T2Xのビジュアルエキシビションとソースコードはhttps://360cvgroup.github.io/Qihoo-T2X/で公開されている。 The global self-attention mechanism in diffusion transformers involves redundant computation due to the sparse and redundant nature of visual information, and the attention map of tokens within a spatial window shows significant similarity. To address this redundancy, we propose the Proxy-Tokenized Diffusion Transformer (PT-DiT), which employs sparse representative token attention (where the number of representative tokens is much smaller than the total number of tokens) to model global visual information efficiently. Specifically, within each transformer block, we compute an averaging token from each spatial-temporal window to serve as a proxy token for that region. The global semantics are captured through the self-attention of these proxy tokens and then injected into all latent tokens via cross-attention. Simultaneously, we introduce window and shift window attention to address the limitations in detail modeling caused by the sparse attention mechanism. 
Building on the well-designed PT-DiT, we further develop the Qihoo-T2X family, which includes a variety of models for T2I, T2V, and T2MV tasks. Experimental results show that PT-DiT achieves competitive performance while reducing the computational complexity in both image and video generation tasks (e.g., a 49% reduction compared to DiT and a 34% reduction compared to PixArt-$\alpha$). The visual exhibition and source code of Qihoo-T2X are available at https://360cvgroup.github.io/Qihoo-T2X/.	翻訳日:2024-11-07 23:11:54 公開日:2024-10-04
# GRVFL-MV:マルチビュー学習に基づくグラフランダムベクトル関数リンク GRVFL-MV: Graph Random Vector Functional Link Based on Multi-View Learning ( http://arxiv.org/abs/2409.04743v2 ) ライセンス: Link先を確認	M. Tanveer, R. K. Sharma, M. Sajid, A. Quadir,	(参考訳) ランダム化されたニューラルネットワークであるランダムベクトル汎関数リンク(RVFL)の分類性能は広く認識されている。しかし、その浅い学習特性のため、RVFLはデータセットで利用可能なすべての関連情報を考慮できないことが多い。さらにデータセットの幾何学的性質も見落としている。これらの制約に対処するために,マルチビュー学習(GRVFL-MV)モデルに基づく新しいグラフランダムベクトル汎関数リンクを提案する。提案モデルは,マルチビュー学習(MVL)の概念を取り入れた複数のビューに基づいて学習し,グラフ埋め込み(GE)フレームワークを用いて,すべてのビューの幾何学的特性を取り入れた。 RVFLネットワーク, MVL, GEフレームワークの融合により, 提案したモデルにより, 以下のことを実現できる。 i) 効率的な学習: RVFLのトポロジを活用することにより,提案したモデルは,多視点データ内の非線形関係を効率的に把握し,効率的かつ正確な予測を容易にする。二包括的表現多様な視点から情報を融合することにより、提案されたモデルがデータ内の複雑なパターンや関係を捕捉し、モデル全体の一般化性能を向上させる能力を高めること。三構造意識:本提案モデルは、GEフレームワークを用いて、本質的及びペナルティ的サブスペース学習基準の両方を自然に活用することにより、データセットの本来のデータ分布を利用する。 27のUCIデータセットとKEELデータセット、Corel5kの50データセット、AwAの45データセットを含む、さまざまなデータセット上で提案されたGRVFL-MVモデルの評価は、ベースラインモデルよりも優れたパフォーマンスを示している。これらの結果は,提案したGRVFL-MVモデルの多種多様なデータセットに対する拡張一般化能力を強調した。 The classification performance of the random vector functional link (RVFL), a randomized neural network, has been widely acknowledged. However, due to its shallow learning nature, RVFL often fails to consider all the relevant information available in a dataset. Additionally, it overlooks the geometrical properties of the dataset. To address these limitations, a novel graph random vector functional link based on multi-view learning (GRVFL-MV) model is proposed. The proposed model is trained on multiple views, incorporating the concept of multiview learning (MVL), and it also incorporates the geometrical properties of all the views using the graph embedding (GE) framework. 
The fusion of RVFL networks, MVL, and the GE framework enables our proposed model to achieve the following: i) efficient learning: by leveraging the topology of RVFL, our proposed model can efficiently capture nonlinear relationships within the multi-view data, facilitating efficient and accurate predictions; ii) comprehensive representation: fusing information from diverse perspectives enhances the proposed model's ability to capture complex patterns and relationships within the data, thereby improving the model's overall generalization performance; and iii) structural awareness: by employing the GE framework, our proposed model leverages the original data distribution of the dataset by naturally exploiting both intrinsic and penalty subspace learning criteria. The evaluation of the proposed GRVFL-MV model on various datasets, including 27 UCI and KEEL datasets, 50 datasets from Corel5k, and 45 datasets from AwA, demonstrates its superior performance compared to baseline models. These results highlight the enhanced generalization capabilities of the proposed GRVFL-MV model across a diverse range of datasets.	翻訳日:2024-11-07 22:49:49 公開日:2024-10-04
# データアトリビューションに対する敵対的攻撃 Adversarial Attacks on Data Attribution ( http://arxiv.org/abs/2409.05657v2 ) ライセンス: Link先を確認	Xinhe Wang, Pingbang Hu, Junwei Deng, Jiaqi W. Ma,	(参考訳) データ属性は、トレーニングデータの価値を測定し、データプロバイダを補うために使用されるAIモデルの出力に対する個々のトレーニングデータポイントの貢献を定量化することを目的としている。金融決定と補償機構への影響を考えると、データ帰属手法の対角的堅牢性に批判的な疑問が生じる。しかし、この問題に対処する体系的な研究はほとんど行われていない。本研究は、敵の目標と能力について明確な仮定で脅威モデルを詳述し、データ帰属に対する原則的敵攻撃手法を提案することによって、このギャップを埋めることを目的としている。本稿では, 処理したデータセットを生成し, 補償を逆方向に拡大するシャドウアタックとアウトリーアタックという2つの手法を提案する。シャドーアタック(シャドーアタック)は、AIアプリケーションにおけるデータ配布に関する知識を活用し、メンバシップ推論攻撃で一般的に使用されるテクニックである"シャドートレーニング(Shadow training)"を通じて、敵の摂動を導出する。対照的に、Outlier攻撃はデータ配布に関する知識を前提とせず、ターゲットモデルの予測にブラックボックスクエリのみに依存する。多くのデータ属性メソッドに存在する帰納バイアス(アウトリーなデータポイントは影響を受けやすい)を活用し、操作されたデータセットを生成するために逆例を使用する。画像分類やテキスト生成タスクにおいて、シャドウアタックはデータ属性ベースの補償を少なくとも200%増加させ、アウトリエアタックは185%から643%の補償インフレーションを達成する。 Data attribution aims to quantify the contribution of individual training data points to the outputs of an AI model, which has been used to measure the value of training data and compensate data providers. Given the impact on financial decisions and compensation mechanisms, a critical question arises concerning the adversarial robustness of data attribution methods. However, there has been little to no systematic research addressing this issue. In this work, we aim to bridge this gap by detailing a threat model with clear assumptions about the adversary's goal and capabilities and proposing principled adversarial attack methods on data attribution. We present two methods, Shadow Attack and Outlier Attack, which generate manipulated datasets to inflate the compensation adversarially. The Shadow Attack leverages knowledge about the data distribution in the AI applications, and derives adversarial perturbations through "shadow training", a technique commonly used in membership inference attacks. 
In contrast, the Outlier Attack does not assume any knowledge about the data distribution and relies solely on black-box queries to the target model's predictions. It exploits an inductive bias present in many data attribution methods - outlier data points are more likely to be influential - and employs adversarial examples to generate manipulated datasets. Empirically, in image classification and text generation tasks, the Shadow Attack can inflate the data-attribution-based compensation by at least 200%, while the Outlier Attack achieves compensation inflation ranging from 185% to as much as 643%.	翻訳日:2024-11-07 22:27:40 公開日:2024-10-04
# One Policy to Run Them All: マルチエンボディメント・ロコモーションへのエンドツーエンド学習アプローチ One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion ( http://arxiv.org/abs/2409.06366v2 ) ライセンス: Link先を確認	Nico Bohlinger, Grzegorz Czechmanowski, Maciej Krupka, Piotr Kicki, Krzysztof Walas, Jan Peters, Davide Tateo,	(参考訳) 深層強化学習技術は、ロバストな脚式ロコモーションにおいて最先端の結果を達成している。四足歩行ロボット、ヒューマノイド、ヘキサポッドなど多種多様な脚式プラットフォームが存在するが、この分野には、これらの異なるエンボディメントをすべて簡単かつ効果的に制御でき、さらに未知のロボットエンボディメントへゼロショットまたは少数ショットで転移できる可能性を持つ単一の学習フレームワークがまだ欠けている。本稿では、このギャップを埋めるために、統一ロボット形態アーキテクチャであるURMA(Unified Robot Morphology Architecture)を紹介する。我々のフレームワークは、脚式ロボットの領域にエンドツーエンドのマルチタスク強化学習アプローチを導入し、学習した方策があらゆる種類のロボット形態を制御できるようにする。提案手法の鍵となる考え方は、形態に依存しないエンコーダとデコーダにより、エンボディメント間でシームレスに共有できる抽象的なロコモーション制御器をネットワークが学習できるようにすることである。この柔軟なアーキテクチャは、脚式ロボットのロコモーションのための基盤モデルを構築する第一歩となる可能性がある。実験の結果、URMAは複数のエンボディメント上でロコモーション方策を学習でき、その方策はシミュレーションおよび実世界の未知のロボットプラットフォームに容易に転移できることが示された。 Deep Reinforcement Learning techniques are achieving state-of-the-art results in robust legged locomotion. While there exists a wide variety of legged platforms such as quadrupeds, humanoids, and hexapods, the field is still missing a single learning framework that can control all these different embodiments easily and effectively and possibly transfer, zero or few-shot, to unseen robot embodiments. We introduce URMA, the Unified Robot Morphology Architecture, to close this gap. Our framework brings the end-to-end Multi-Task Reinforcement Learning approach to the realm of legged robots, enabling the learned policy to control any type of robot morphology. The key idea of our method is to allow the network to learn an abstract locomotion controller that can be seamlessly shared between embodiments thanks to our morphology-agnostic encoders and decoders. This flexible architecture can be seen as a potential first step in building a foundation model for legged robot locomotion. Our experiments show that URMA can learn a locomotion policy on multiple embodiments that can be easily transferred to unseen robot platforms in simulation and the real world.	翻訳日:2024-11-07 22:16:23 公開日:2024-10-04
# RePlay: 実験と生産のための推奨フレームワーク RePlay: a Recommendation Framework for Experimentation and Production Use ( http://arxiv.org/abs/2409.07272v3 ) ライセンス: Link先を確認	Alexey Vasilev, Anna Volodkevich, Denis Kulandin, Tatiana Bysheva, Anton Klenitskiy,	(参考訳) 推奨システムの構築と比較に1つのツールを使用すると、新しいモデルの市場投入までの時間が大幅に削減される。さらに、このようなツールを使用する場合の比較結果は、より一貫性があるように見える。そのため、リコメンデーション分野の研究者のための様々なツールやライブラリが最近登場した。残念なことに、これらのフレームワークのほとんどは主に研究者を対象としており、大規模なデータセットや不適切なアーキテクチャで作業できないため、本番環境での使用のために修正が必要である。このデモでは、オープンソースのツールキットであるRePlayを紹介します。 RePlayはまた、各ステージでパイプラインに適したスタック(Pandas、Polars、Spark)を使用することもできる。これにより、ライブラリは計算をスケールし、クラスタにデプロイできる。したがって、RePlayはデータサイエンティストが同じインターフェイスを使って簡単に研究モードからプロダクションモードに移行することを可能にする。 Using a single tool to build and compare recommender systems significantly reduces the time to market for new models. In addition, the comparison results when using such tools look more consistent. This is why many different tools and libraries for researchers in the field of recommendations have recently appeared. Unfortunately, most of these frameworks are aimed primarily at researchers and require modification for use in production due to the inability to work on large datasets or an inappropriate architecture. In this demo, we present our open-source toolkit RePlay - a framework containing an end-to-end pipeline for building recommender systems, which is ready for production use. RePlay also allows you to use a suitable stack for the pipeline on each stage: Pandas, Polars, or Spark. This allows the library to scale computations and deploy to a cluster. Thus, RePlay allows data scientists to easily move from research mode to production mode using the same interfaces.	翻訳日:2024-11-07 21:53:46 公開日:2024-10-04
# Soft Pairwise Accuracyによる、自動評価指標の人手評価における統計的有意性の改善 Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy ( http://arxiv.org/abs/2409.09598v2 ) ライセンス: Link先を確認	Brian Thompson, Nitika Mathur, Daniel Deutsch, Huda Khayrallah,	(参考訳) 人間のアノテータを最もよく模倣する自動評価指標を選択することは、「最もよく模倣する」ことの明確な定義がないため、しばしば自明ではない。人間の判断と自動評価指標のスコアを比較するにはメタメトリックが必要であり、指標のランキングはメタメトリックの選択に依存する。我々は、Pairwise Accuracy(PA)に基づきつつ、人間の判断と指標スコアの両方の統計的有意性を取り入れた新しいメタメトリックであるSoft Pairwise Accuracy(SPA)を提案する。評価に用いるシステム数/セグメント数の変化に対して、SPAはPAよりも安定であることを示す。また、PAは指標に対して少数の異なる出力値しか割り当てられず、その結果、多くの指標に全く同じPAスコアが人為的に割り当てられることを示す。 SPAがこの問題を解決することを実証する。最後に、SPAはPAよりも識別力が高く、指標間で統計的に有意な比較をより多くもたらすことを示す。 SPAは2024年WMT Metrics Shared Taskの公式なシステムレベル指標に選ばれた。 Selecting an automatic metric that best emulates human annotators is often non-trivial, because there is no clear definition of "best emulates." A meta-metric is required to compare the human judgments to the automatic metric scores, and metric rankings depend on the choice of meta-metric. We propose Soft Pairwise Accuracy (SPA), a new meta-metric that builds on Pairwise Accuracy (PA) but incorporates the statistical significance of both the human judgments and the metric scores. We show that SPA is more stable than PA with respect to changes in the number of systems/segments used for evaluation. We also show that PA can only assign a small set of distinct output values to metrics, and this results in many metrics being artificially assigned the exact same PA score. We demonstrate that SPA fixes this issue. Finally, we show that SPA is more discriminative than PA, producing more statistically significant comparisons between metrics. SPA was selected as the official system-level metric for the 2024 WMT Metrics Shared Task.	翻訳日:2024-11-07 20:46:36 公開日:2024-10-04
# GOSt-MT: 機械翻訳における作業関連性バイアスの知識グラフ GOSt-MT: A Knowledge Graph for Occupation-related Gender Biases in Machine Translation ( http://arxiv.org/abs/2409.10989v2 ) ライセンス: Link先を確認	Orfeas Menis Mastromichalakis, Giorgos Filandrianos, Eva Tsouparopoulou, Dimitris Parsanoglou, Maria Symeonaki, Giorgos Stamou,	(参考訳) 機械翻訳(MT)システムにおけるジェンダーバイアスは、しばしば有害なステレオタイプを補強する重大な課題を引き起こす。特に、職業が特定の性別と不正確な関係にある労働領域では、そのような偏見は伝統的なジェンダーのステレオタイプを持続させ、社会に大きな影響を及ぼす。これらの問題に対処することは、公平かつ正確なMTシステムの確保に不可欠である。本稿では, GOSt-MT (Gender and Occupation Statistics for Machine Translation) Knowledge Graph の作成を通じて, 職業関連性バイアスを研究するための新しい手法を提案する。 GOSt-MTは、MTトレーニングで使用される実世界の労働データとテキストコーパスからの包括的性別統計を統合している。この知識グラフは、英語、フランス語、ギリシア語にまたがる男女バイアスの詳細な分析を可能にし、永続的なステレオタイプと介入を必要とする領域の同定を容易にする。 GOSt-MTは、労働市場とMTシステムの両方でどのように職業がジェンダー化されているかを理解するための構造化された枠組みを提供することによって、MTシステムをより公平にし、自動翻訳における性別バイアスを減らすことを目的とした取り組みに貢献している。 Gender bias in machine translation (MT) systems poses significant challenges that often result in the reinforcement of harmful stereotypes. Especially in the labour domain where frequently occupations are inaccurately associated with specific genders, such biases perpetuate traditional gender stereotypes with a significant impact on society. Addressing these issues is crucial for ensuring equitable and accurate MT systems. This paper introduces a novel approach to studying occupation-related gender bias through the creation of the GOSt-MT (Gender and Occupation Statistics for Machine Translation) Knowledge Graph. GOSt-MT integrates comprehensive gender statistics from real-world labour data and textual corpora used in MT training. This Knowledge Graph allows for a detailed analysis of gender bias across English, French, and Greek, facilitating the identification of persistent stereotypes and areas requiring intervention. 
By providing a structured framework for understanding how occupations are gendered in both labour markets and MT systems, GOSt-MT contributes to efforts aimed at making MT systems more equitable and reducing gender biases in automated translations.	翻訳日:2024-11-07 20:13:03 公開日:2024-10-04
# 家庭の音:音声除去された音声イベント検出用家庭用オーディオデータセット The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection ( http://arxiv.org/abs/2409.11262v2 ) ライセンス: Link先を確認	Gabriel Bibbó, Thomas Deacon, Arshdeep Singh, Mark D. Plumbley,	(参考訳) 本稿では,高齢者の幸福感向上を目的としたスマートホームアプリケーションのための音声イベント検出研究を支援する住宅用オーディオデータセットを提案する。このデータセットは、55～80歳の8人の家庭に7日間の音声記録システムを展開することで構築される。音響特性は、詳細なフロアプランと建設材料情報を通して記録され、AIモデル展開のための記録環境の複製を可能にする。事前訓練された音声ニューラルネットワークを用いて、他の音声イベントを含むセグメントを保存しながら、音声を含むセグメントを検出し、除去する、新しい自動音声除去パイプラインを開発する。得られたデータセットは、住宅空間内の日常生活の音環境と活動を正確に把握するプライバシーに準拠したオーディオ記録で構成されている。本稿では,データセット作成手法,カスケードモデルアーキテクチャを利用した音声除去パイプライン,音声ラベル分布の解析を行い,音声除去プロセスの検証を行う。このデータセットは、家庭内アプリケーションに特化した音響イベント検出モデルの開発とベンチマークを可能にする。 This paper presents a residential audio dataset to support sound event detection research for smart home applications aimed at promoting wellbeing for older adults. The dataset is constructed by deploying audio recording systems in the homes of 8 participants aged 55-80 years for a 7-day period. Acoustic characteristics are documented through detailed floor plans and construction material information to enable replication of the recording environments for AI model deployment. A novel automated speech removal pipeline is developed, using pre-trained audio neural networks to detect and remove segments containing spoken voice, while preserving segments containing other sound events. The resulting dataset consists of privacy-compliant audio recordings that accurately capture the soundscapes and activities of daily living within residential spaces. The paper details the dataset creation methodology, the speech removal pipeline utilizing cascaded model architectures, and an analysis of the vocal label distribution to validate the speech removal process. This dataset enables the development and benchmarking of sound event detection models tailored specifically for in-home applications.	翻訳日:2024-11-07 20:13:03 公開日:2024-10-04
# EIA: プライバシ漏洩のためのジェネリストWebエージェントに対する環境注入攻撃 EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage ( http://arxiv.org/abs/2409.11295v2 ) ライセンス: Link先を確認	Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun,	(参考訳) ジェネラリストのウェブエージェントは、実際のウェブサイトで広範囲のタスクを自律的に完了させ、人間の生産性を著しく向上させる驚くべき可能性を示してきた。しかし、フライトの予約のようなウェブタスクは、通常ユーザーのPIIを介し、Webエージェントが誤って侵害されたウェブサイトと対話した場合、潜在的にプライバシー上のリスクにさらされる可能性がある。本研究では,敵環境におけるジェネラリストWebエージェントのプライバシーリスクに関する最初の研究を行うことにより,このギャップを狭める。まず,Webサイト上での攻撃に対する現実的な脅威モデルを提示し,ユーザ固有のPIIを盗むか,あるいはユーザ要求全体に対して,敵対的な2つのターゲットを検討する。次に,環境注入攻撃(EIA)と呼ばれる新しい攻撃手法を提案する。 EIAは、エージェントが操作する環境に順応するように設計された悪意のあるコンテンツを注入し、我々の作業は、Web環境のプライバシーシナリオに特化してEIAをインスタンス化する。我々は、Mind2Webから様々なPIIカテゴリを含む177のアクションステップを収集し、これまでで最も有能なジェネラリストWebエージェントフレームワークの1つを使用して実験を行う。その結果、EIAは特定のPIIを盗む際に最大70%のASRを達成し、16%のASRを全ユーザ要求で達成した。さらに、ステルスネスにアクセスして防衛システムプロンプトを試すことにより、EIAは検出および緩和が困難であることを示す。特に、Webページに適さない攻撃は、人間の検査によって検出できるため、セキュリティと自律性の間のトレードオフに関する議論につながります。しかし、追加の攻撃者の努力はEIAをシームレスに適応させ、そのような監督を効果的にしない。そこで我々は,人事監督に頼らず,より先進的な防衛戦略を求めることなく,Webサイトの前・後段階での防衛についてさらに議論する。 Generalist web agents have demonstrated remarkable potential in autonomously completing a wide range of tasks on real websites, significantly boosting human productivity. However, web tasks, such as booking flights, usually involve users' PII, which may be exposed to potential privacy risks if web agents accidentally interact with compromised websites, a scenario that remains largely unexplored in the literature. In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request. Then, we propose a novel attack method, termed Environmental Injection Attack (EIA). 
EIA injects malicious content designed to adapt well to environments where the agents operate, and our work instantiates EIA specifically for privacy scenarios in web environments. We collect 177 action steps that involve diverse PII categories on realistic websites from Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date. The results demonstrate that EIA achieves up to 70% ASR in stealing specific PII and 16% ASR in stealing the full user request. Additionally, by assessing the stealthiness and experimenting with a defensive system prompt, we indicate that EIA is hard to detect and mitigate. Notably, attacks that are not well adapted for a webpage can be detected via human inspection, leading to our discussion about the trade-off between security and autonomy. However, additional attacker effort can make EIA seamlessly adapted, rendering such supervision ineffective. Thus, we further discuss the defenses at the pre- and post-deployment stages of the websites without relying on human supervision and call for more advanced defense strategies.	翻訳日:2024-11-07 20:13:03 公開日:2024-10-04
# EIA: プライバシ漏洩のためのジェネリストWebエージェントに対する環境注入攻撃 EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage ( http://arxiv.org/abs/2409.11295v3 ) ライセンス: Link先を確認	Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun,	(参考訳) ジェネラリストのウェブエージェントは、実際のウェブサイトで広範囲のタスクを自律的に完了させ、人間の生産性を著しく向上させる驚くべき可能性を示してきた。しかし、フライトの予約のようなウェブタスクは、通常ユーザーのPIIを介し、Webエージェントが誤って侵害されたウェブサイトと対話した場合、潜在的にプライバシー上のリスクにさらされる可能性がある。本研究では,敵環境におけるジェネラリストWebエージェントのプライバシーリスクに関する最初の研究を行うことにより,このギャップを狭める。まず,Webサイト上での攻撃に対する現実的な脅威モデルを提示し,ユーザ固有のPIIを盗むか,あるいはユーザ要求全体に対して,敵対的な2つのターゲットを検討する。次に,環境注入攻撃(EIA)と呼ばれる新しい攻撃手法を提案する。 EIAは、エージェントが操作する環境に順応するように設計された悪意のあるコンテンツを注入し、我々の作業は、Web環境のプライバシーシナリオに特化してEIAをインスタンス化する。我々は、Mind2Webから様々なPIIカテゴリを含む177のアクションステップを収集し、これまでで最も有能なジェネラリストWebエージェントフレームワークの1つを使用して実験を行う。その結果、EIAは特定のPIIを盗む際に最大70%のASRを達成し、16%のASRを全ユーザ要求で達成した。さらに、ステルスネスにアクセスして防衛システムプロンプトを試すことにより、EIAは検出および緩和が困難であることを示す。特に、Webページに適さない攻撃は、人間の検査によって検出できるため、セキュリティと自律性の間のトレードオフに関する議論につながります。しかし、追加の攻撃者の努力はEIAをシームレスに適応させ、そのような監督を効果的にしない。そこで我々は,人事監督に頼らず,より先進的な防衛戦略を求めることなく,Webサイトの前・後段階での防衛についてさらに議論する。 Generalist web agents have demonstrated remarkable potential in autonomously completing a wide range of tasks on real websites, significantly boosting human productivity. However, web tasks, such as booking flights, usually involve users' PII, which may be exposed to potential privacy risks if web agents accidentally interact with compromised websites, a scenario that remains largely unexplored in the literature. In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request. Then, we propose a novel attack method, termed Environmental Injection Attack (EIA). 
EIA injects malicious content designed to adapt well to the environments where agents operate, and our work instantiates EIA specifically for privacy scenarios in web environments. We collect 177 action steps that involve diverse PII categories on realistic websites from Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date. The results demonstrate that EIA achieves up to 70% ASR (attack success rate) in stealing specific PII and 16% ASR in stealing the full user request. Additionally, by assessing the stealthiness of the attack and experimenting with a defensive system prompt, we show that EIA is hard to detect and mitigate. Notably, attacks that are not well adapted to a webpage can be detected via human inspection, leading to our discussion of the trade-off between security and autonomy. However, extra effort from attackers can make EIA seamlessly adapted, rendering such supervision ineffective. Thus, we further discuss defenses at the pre- and post-deployment stages of websites that do not rely on human supervision and call for more advanced defense strategies.	翻訳日:2024-11-07 20:13:03 公開日:2024-10-04
# 確率的時系列予測のためのリカレント補間器 Recurrent Interpolants for Probabilistic Time Series Prediction ( http://arxiv.org/abs/2409.11684v2 ) ライセンス: Link先を確認	Yu Chen, Marin Biloš, Sarthak Mittal, Wei Deng, Kashif Rasul, Anderson Schneider,	(参考訳) リカレントニューラルネットワークやトランスフォーマーのような逐次モデルは、様々な領域にわたる確率的多変量時系列予測の標準となっている。その強みにもかかわらず、彼らは高次元の分布と機能横断的な依存関係を捉えるのに苦労している。近年の研究では、拡散モデルやフローベースモデルを用いて、時系列計算や予測に拡張した生成的アプローチについて検討している。しかし、スケーラビリティは依然として課題である。本研究は, 確率的補間と制御機能付き条件付き生成に基づく拡散モデルの確率モデルに, 繰り返しニューラルネットワークの効率を組み合わす新しい手法を提案する。 Sequential models like recurrent neural networks and transformers have become standard for probabilistic multivariate time series forecasting across various domains. Despite their strengths, they struggle with capturing high-dimensional distributions and cross-feature dependencies. Recent work explores generative approaches using diffusion or flow-based models, extending to time series imputation and forecasting. However, scalability remains a challenge. This work proposes a novel method combining recurrent neural networks' efficiency with diffusion models' probabilistic modeling, based on stochastic interpolants and conditional generation with control features, offering insights for future developments in this dynamic field.	翻訳日:2024-11-07 19:50:48 公開日:2024-10-04
# 衛星映像における赤外小ターゲット検出:新しいデータセットと新しい特徴再構成フレームワーク Infrared Small Target Detection in Satellite Videos: A New Dataset and A Novel Recurrent Feature Refinement Framework ( http://arxiv.org/abs/2409.12448v1 ) ライセンス: Link先を確認	Xinyi Ying, Li Liu, Zaipin Lin, Yangsi Shi, Yingqian Wang, Ruojing Li, Xu Cao, Boyang Li, Shilin Zhou,	(参考訳) 衛星ビデオにおけるMIRST(Multi-frame infrared small target)検出は、何十年にもわたって持続する基本的かつ困難な課題であり、その課題は次のように要約できる: 第一に、非常に小さなターゲットサイズ、非常に複雑なクラッタとノイズ、様々な衛星の動きは、限られた特徴表現、高い偽アラーム、難しい動き解析である。第2に、衛星ビデオにおける大規模公開可能なMIRSTデータセットの欠如は、アルゴリズムの開発を著しく妨げている。上記の課題に対処するため、我々はまず衛星ビデオ(IRSatVideo-LEO)におけるMIRST検出のための大規模データセットを構築し、次にベースライン法として繰り返し機能改善(RFR)フレームワークを開発する。具体的には、IRSatVideo-LEOは、合成された衛星の動き、ターゲットの外観、軌道、強度を備えたセミシミュレートされたデータセットであり、衛星ビデオ生成のための標準ツールボックスと、アルゴリズム開発を容易にする信頼性の高い評価プラットフォームを提供することができる。ベースライン法では,時間的依存の長期利用と統合的動き補償とMIRST検出のための既存の強力なCNNベースの手法が提案されている。具体的には, ピラミッド変形性アライメント (PDA) モジュールと時間空間周波数変調 (TSFM) モジュールを提案し, 効率的な特徴アライメント, 伝搬, 凝集, 精製を実現する。提案手法の有効性と優位性を示すため, 大規模な実験を行った。比較の結果,ResUNetのRFRは最先端のMIRST検出法よりも優れていた。データセットとコードはhttps://github.com/XinyiYing/RFR.comで公開されている。 Multi-frame infrared small target (MIRST) detection in satellite videos is a long-standing, fundamental yet challenging task for decades, and the challenges can be summarized as: First, extremely small target size, highly complex clutters & noises, various satellite motions result in limited feature representation, high false alarms, and difficult motion analyses. Second, the lack of large-scale public available MIRST dataset in satellite videos greatly hinders the algorithm development. To address the aforementioned challenges, in this paper, we first build a large-scale dataset for MIRST detection in satellite videos (namely IRSatVideo-LEO), and then develop a recurrent feature refinement (RFR) framework as the baseline method. 
Specifically, IRSatVideo-LEO is a semi-simulated dataset with synthesized satellite motion, target appearance, trajectory and intensity, which can provide a standard toolbox for satellite video generation and a reliable evaluation platform to facilitate algorithm development. For the baseline method, RFR is designed so that existing powerful CNN-based methods can be equipped with it, enabling long-term temporal dependency exploitation and integrated motion compensation and MIRST detection. Specifically, a pyramid deformable alignment (PDA) module and a temporal-spatial-frequency modulation (TSFM) module are proposed to achieve effective and efficient feature alignment, propagation, aggregation and refinement. Extensive experiments have been conducted to demonstrate the effectiveness and superiority of our scheme. The comparative results show that ResUNet equipped with RFR outperforms the state-of-the-art MIRST detection methods. Dataset and code are released at https://github.com/XinyiYing/RFR.	翻訳日:2024-11-07 14:52:37 公開日:2024-10-04
# 衛星映像における赤外小ターゲット検出:新しいデータセットと新しい特徴再構成フレームワーク Infrared Small Target Detection in Satellite Videos: A New Dataset and A Novel Recurrent Feature Refinement Framework ( http://arxiv.org/abs/2409.12448v2 ) ライセンス: Link先を確認	Xinyi Ying, Li Liu, Zaipin Lin, Yangsi Shi, Yingqian Wang, Ruojing Li, Xu Cao, Boyang Li, Shilin Zhou,	(参考訳) 衛星ビデオにおけるMIRST(Multi-frame infrared small target)検出は、何十年にもわたって持続する基本的かつ困難な課題であり、その課題は次のように要約できる: 第一に、非常に小さなターゲットサイズ、非常に複雑なクラッタとノイズ、様々な衛星の動きは、限られた特徴表現、高い偽アラーム、難しい動き解析である。第2に、衛星ビデオにおける大規模公開可能なMIRSTデータセットの欠如は、アルゴリズムの開発を著しく妨げている。上記の課題に対処するため、我々はまず衛星ビデオ(IRSatVideo-LEO)におけるMIRST検出のための大規模データセットを構築し、次にベースライン法として繰り返し機能改善(RFR)フレームワークを開発する。具体的には、IRSatVideo-LEOは、合成された衛星の動き、ターゲットの外観、軌道、強度を備えたセミシミュレートされたデータセットであり、衛星ビデオ生成のための標準ツールボックスと、アルゴリズム開発を容易にする信頼性の高い評価プラットフォームを提供することができる。ベースライン法では,時間的依存の長期利用と統合的動き補償とMIRST検出のための既存の強力なCNNベースの手法が提案されている。具体的には, ピラミッド変形性アライメント (PDA) モジュールと時間空間周波数変調 (TSFM) モジュールを提案し, 効率的な特徴アライメント, 伝搬, 凝集, 精製を実現する。提案手法の有効性と優位性を示すため, 大規模な実験を行った。比較の結果,ResUNetのRFRは最先端のMIRST検出法よりも優れていた。データセットとコードはhttps://github.com/XinyiYing/RFR.comで公開されている。 Multi-frame infrared small target (MIRST) detection in satellite videos is a long-standing, fundamental yet challenging task for decades, and the challenges can be summarized as: First, extremely small target size, highly complex clutters & noises, various satellite motions result in limited feature representation, high false alarms, and difficult motion analyses. Second, the lack of large-scale public available MIRST dataset in satellite videos greatly hinders the algorithm development. To address the aforementioned challenges, in this paper, we first build a large-scale dataset for MIRST detection in satellite videos (namely IRSatVideo-LEO), and then develop a recurrent feature refinement (RFR) framework as the baseline method. 
Specifically, IRSatVideo-LEO is a semi-simulated dataset with synthesized satellite motion, target appearance, trajectory and intensity, which can provide a standard toolbox for satellite video generation and a reliable evaluation platform to facilitate algorithm development. For the baseline method, RFR is designed so that existing powerful CNN-based methods can be equipped with it, enabling long-term temporal dependency exploitation and integrated motion compensation and MIRST detection. Specifically, a pyramid deformable alignment (PDA) module and a temporal-spatial-frequency modulation (TSFM) module are proposed to achieve effective and efficient feature alignment, propagation, aggregation and refinement. Extensive experiments have been conducted to demonstrate the effectiveness and superiority of our scheme. The comparative results show that ResUNet equipped with RFR outperforms the state-of-the-art MIRST detection methods. Dataset and code are released at https://github.com/XinyiYing/RFR.	翻訳日:2024-11-07 14:52:37 公開日:2024-10-04
# CodePlan: コード形式計画のスケールアップによる大規模言語モデルにおける推論可能性のアンロック CodePlan: Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning ( http://arxiv.org/abs/2409.12452v1 ) ライセンス: Link先を確認	Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, Minlie Huang,	(参考訳) 従来の自然言語処理タスクにおける大規模言語モデル(LLM)の顕著な成功にもかかわらず、その計画能力は複雑な多段階推論タスクに取り組む上で重要なボトルネックとなっている。既存のアプローチは主にプロンプトやタスク固有の微調整に依存しており、しばしば弱い堅牢性とクロスタスクの一般化に悩まされている。この制限に対処するため,私たちは,高度で構造化された推論プロセスの概要を概説したコード形式計画の擬似コードの生成と追跡を可能にする,スケーラブルなパラダイムであるCODEPLANを紹介した。 CODEPLANは、構造化され汎用的なコードの性質を活用することで、洗練された推論に固有のリッチなセマンティクスと制御フローを効果的にキャプチャする。重要な点として、CODEPLANは、大規模で広範囲なテキストコーパスから、修正されたタスク固有のデータセットを必要とせずに、コード形式のプランを自動的に抽出することを可能にする。これにより、効率よくスケールアップし、さまざまなシナリオにおける推論機能を改善することができる。 CODEPLANをトレーニングするために,既存のコーパスから標準のプロンプト応答ペアとコード形式計画を統合する2Mサンプルの大規模データセットを構築した。 CODEPLANは、トレーニングと推論の間、計算オーバーヘッドが最小限に抑えられ、直接生成する応答と比較して25.1%の改善を実現し、数学的推論、記号的推論、命令追従、マルチホップQA、意思決定タスクにまたがる13の挑戦的なマルチステップ推論ベンチマークで平均化されている。さらなる分析により、CODEPLANはより複雑な推論タスクの性能向上と、その一般化能力によるデータ効率の向上を明らかにしている。 Despite the remarkable success of large language models (LLMs) on traditional natural language processing tasks, their planning ability remains a critical bottleneck in tackling complex multi-step reasoning tasks. Existing approaches mainly rely on prompting or task-specific fine-tuning, often suffering from weak robustness and cross-task generalization. To address the limitation, we introduce CODEPLAN, a scalable paradigm that empowers LLMs to generate and follow code-form plans: pseudocode that outlines high-level, structured reasoning processes. By leveraging the structured and versatile nature of code, CODEPLAN effectively captures the rich semantics and control flows inherent to sophisticated reasoning. Importantly, CODEPLAN allows the automatic extraction of code-form plans from massive, wide-ranging text corpora without the need for curated, task-specific datasets. This enables it to scale up efficiently and improve reasoning capabilities across diverse scenarios. 
To train CODEPLAN, we construct a large-scale dataset of 2M examples that integrate code-form plans with standard prompt-response pairs from existing corpora. With minimal computation overhead during both training and inference, CODEPLAN achieves a 25.1% relative improvement compared with directly generating responses, averaged across 13 challenging multi-step reasoning benchmarks, spanning mathematical reasoning, symbolic reasoning, instruction-following, multi-hop QA, and decision-making tasks. Further analysis reveals CODEPLAN's increasing performance gains on more complex reasoning tasks, as well as significant data efficiency thanks to its generalization ability.	翻訳日:2024-11-07 14:52:37 公開日:2024-10-04
# コード・フォーム・プランニングのスケーリングによる大規模言語モデルにおけるアンロック推論の可能性 Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning ( http://arxiv.org/abs/2409.12452v2 ) ライセンス: Link先を確認	Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, Minlie Huang,	(参考訳) 従来の自然言語処理タスクにおける大規模言語モデル(LLM)の顕著な成功にもかかわらず、その計画能力は複雑な多段階推論タスクに取り組む上で重要なボトルネックとなっている。既存のアプローチは主にプロンプトやタスク固有の微調整に依存しており、しばしばロバスト性やクロスタスクの一般化に悩まされている。この制限に対処するため、私たちは、LLMが高レベルの構造化された推論プロセスを概説する擬似コードであるコード形式計画を生成し、それに従うことを可能にするスケーラブルなフレームワークであるCodePlanを紹介します。 CodePlanは構造化され、汎用的なコードの性質を活用することで、洗練された推論タスクに固有のリッチなセマンティクスと制御フローを効果的にキャプチャする。重要な点として、CodePlanは、大規模で広範囲なテキストコーパスから、修正されたタスク固有のデータセットを必要とせずに、コード形式のプランを自動的に抽出することを可能にする。これにより、効率よくスケールアップでき、様々なシナリオでLLMの推論能力を改善することができる。 CodePlanをトレーニングするために、コードフォームプランと既存のコーパスから標準のプロンプト-レスポンスペアを統合する2Mサンプルの大規模なデータセットを構築した。トレーニングと推論の両方で計算オーバーヘッドが最小限に抑えられ、CodePlanは直接生成する応答と比較して25.1%の改善を実現し、数学的推論、記号的推論、命令追従、マルチホップQA、意思決定タスクにまたがる13の挑戦的なマルチステップ推論ベンチマークで平均化されている。さらなる分析により、より複雑な推論タスクにおけるCodePlanのパフォーマンス向上と、その一般化能力によるデータ効率の向上が明らかになった。 Despite the remarkable success of large language models (LLMs) on traditional natural language processing tasks, their planning ability remains a critical bottleneck in tackling complex multi-step reasoning tasks. Existing approaches mainly rely on prompting or task-specific fine-tuning, often suffering from poor robustness and cross-task generalization. To address the limitation, we introduce CodePlan, a scalable framework that empowers LLMs to generate and follow code-form plans: pseudocode that outlines high-level, structured reasoning processes. By leveraging the structured and versatile nature of code, CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks. Importantly, CodePlan allows automatic extraction of code-form plans from massive, wide-ranging text corpora without the need for curated, task-specific datasets. This enables it to scale up efficiently and improve LLMs' reasoning capabilities across diverse scenarios. 
To train CodePlan, we construct a large-scale dataset of 2M examples that integrate code-form plans with standard prompt-response pairs from existing corpora. With minimal computation overhead during both training and inference, CodePlan achieves a 25.1% relative improvement compared with directly generating responses, averaged across 13 challenging multi-step reasoning benchmarks, spanning mathematical reasoning, symbolic reasoning, instruction-following, multi-hop QA, and decision-making tasks. Further analysis reveals CodePlan's increasing performance gains on more complex reasoning tasks, as well as significant data efficiency thanks to its generalization ability.	翻訳日:2024-11-07 14:52:37 公開日:2024-10-04
# 強化学習による自己補正のための言語モデルの構築 Training Language Models to Self-Correct via Reinforcement Learning ( http://arxiv.org/abs/2409.12917v1 ) ライセンス: Link先を確認	Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, John D Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust,	(参考訳) 自己補正は大規模言語モデル(LLM)において非常に望ましい能力であるが、現代のLLMではほとんど効果がないことが一貫して確認されている。自己補正を訓練するための既存のアプローチは、複数のモデルを必要とするか、より有能なモデルや他の形式の監督に依存している。この目的のために,完全自己生成データを用いたLLMの自己補正能力を大幅に向上させるマルチターンオンライン強化学習(RL)手法であるSCoReを開発した。 SCoReを構築するために、オフラインモデル生成補正トレースにおける教師付き微調整(SFT)の変種が自己補正動作の注入に不十分であることを示す。特に、SFTによるトレーニングは、トレーニングデータとモデル自身の応答の間の分布ミスマッチに苦しむか、あるいはテスト時に有効でない特定の修正行動のみを暗黙的に好むかのどちらかである。 SCoReは、モデル独自の自己生成補正トレースの分布の下でトレーニングを行い、適切な正規化を使用して、与えられたプロンプトに単純にハイリワード応答を適合させるのではなく、テスト時に有効である自己補正戦略を学習する。この正規化は、ベースモデル上でRLの第1フェーズを実行して、崩壊しにくいポリシー初期化を生成し、トレーニング中の自己補正を増幅するために報酬ボーナスを使用する。 Gemini 1.0 Pro と 1.5 Flash モデルに適用すると、SCoRe は最先端の自己補正性能を達成し、それぞれ MATH と HumanEval ベンチマークでベースモデルの自己補正を 15.6% と 9.1% 改善している。 Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing approaches for training self-correction either require multiple models or rely on a more capable model or other forms of supervision. To this end, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data. To build SCoRe, we first show that variants of supervised fine-tuning (SFT) on offline model-generated correction traces are insufficient for instilling self-correction behavior. 
In particular, we observe that training via SFT either suffers from a distribution mismatch between the training data and the model's own responses or implicitly prefers only a certain mode of correction behavior that is often not effective at test time. SCoRe addresses these challenges by training under the model's own distribution of self-generated correction traces and using appropriate regularization to steer the learning process into learning a self-correction strategy that is effective at test time as opposed to simply fitting high-reward responses for a given prompt. This regularization prescribes running a first phase of RL on a base model to generate a policy initialization that is less susceptible to collapse and then using a reward bonus to amplify self-correction during training. When applied to Gemini 1.0 Pro and 1.5 Flash models, we find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on the MATH and HumanEval benchmarks.	翻訳日:2024-11-07 12:59:09 公開日:2024-10-04
# 強化学習による自己補正のための言語モデルの構築 Training Language Models to Self-Correct via Reinforcement Learning ( http://arxiv.org/abs/2409.12917v2 ) ライセンス: Link先を確認	Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, John D Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust,	(参考訳) 自己補正は大規模言語モデル(LLM)において非常に望ましい能力であるが、現代のLLMではほとんど効果がないことが一貫して確認されている。現在の自己補正の訓練方法は、通常、複数のモデル、より高度なモデル、または追加の監督形式に依存する。これらの欠点に対処するため、完全自己生成データを用いたLLMの自己補正能力を大幅に向上させるマルチターンオンライン強化学習(RL)アプローチであるSCoReを開発した。 SCoReを構築するために、オフラインモデル生成補正トレースにおける教師付き微調整(SFT)の変種は、しばしば自己補正動作の注入に不十分であることを示す。特に、SFTによるトレーニングは、データ収集ポリシーとモデル自身の応答のミスによる分布ミスマッチや、学習が暗黙的に修正行動の特定のモードのみを優先する行動崩壊に陥ることが観察された。 SCoReは、モデル独自の自己生成補正トレースの分布の下でトレーニングを行い、適切な正規化を使用して、与えられたプロンプトに高次応答を適用するのではなく、テスト時に有効である自己補正動作を学ぶ。この正規化プロセスは、基本モデル上のマルチターンRLの初期フェーズを含み、崩壊しにくいポリシー初期化を生成し、その後、報酬ボーナスを使用して自己補正を増幅する。 Gemini 1.0 Pro と 1.5 Flash モデルでは、SCoRe は最先端の自己補正性能を達成し、ベースモデルの自己補正を MATH と HumanEval でそれぞれ 15.6% と 9.1% 改善している。 Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Current methods for training self-correction typically depend on either multiple models, a more advanced model, or additional forms of supervision. To address these shortcomings, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data. To build SCoRe, we first show that variants of supervised fine-tuning (SFT) on offline model-generated correction traces are often insufficient for instilling self-correction behavior. 
In particular, we observe that training via SFT falls prey to either a distribution mismatch between mistakes made by the data-collection policy and the model's own responses, or to behavior collapse, where learning implicitly prefers only a certain mode of correction behavior that is often not effective at self-correction on test problems. SCoRe addresses these challenges by training under the model's own distribution of self-generated correction traces and using appropriate regularization to steer the learning process into learning a self-correction behavior that is effective at test time as opposed to fitting high-reward responses for a given prompt. This regularization process includes an initial phase of multi-turn RL on a base model to generate a policy initialization that is less susceptible to collapse, followed by using a reward bonus to amplify self-correction. With Gemini 1.0 Pro and 1.5 Flash models, we find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on MATH and HumanEval.	翻訳日:2024-11-07 12:59:09 公開日:2024-10-04
# 最適大域制御による絡み合い型量子センシング Entanglement-enhanced quantum sensing via optimal global control ( http://arxiv.org/abs/2409.12932v1 ) ライセンス: Link先を確認	Vineesha Srivastava, Sven Jandura, Gavin K Brennen, Guido Pupillo,	(参考訳) 共役キャビティモードに結合した$N$スピンの対称ディック部分空間における任意の絡み合った状態を生成するための決定論的プロトコルを提案する。このプロトコルは、新しい幾何学的位相ゲート、ノイズのある量子チャネルダイナミクスの解析解、最適制御法を組み合わせることで、量子センシングに有用な絡み合った状態を作成し、光子キャビティ損失、自然放出、復号化の存在下で、標準量子限界よりも精度が大幅に向上する。この研究は、キャビティ内の冷たい閉じ込められた原子と絡み合うエンハンスドセンシングへの道を開き、また、閉じ込められたイオンの実験にも直接的に関係している。 We present a deterministic protocol for the preparation of arbitrary entangled states in the symmetric Dicke subspace of $N$ spins coupled to a common cavity mode. By combining a new geometric phase gate, an analytic solution of the noisy quantum channel dynamics and optimal control methods, the protocol prepares entangled states that are useful for quantum sensing, achieving a precision significantly better than the standard quantum limit in the presence of photon cavity loss, spontaneous emission and dephasing. This work opens the way to entanglement-enhanced sensing with cold trapped atoms in cavities and is also directly relevant for experiments with trapped ions.	翻訳日:2024-11-07 12:48:01 公開日:2024-10-04
# 最適大域制御による絡み合い型量子センシング Entanglement-enhanced quantum sensing via optimal global control ( http://arxiv.org/abs/2409.12932v2 ) ライセンス: Link先を確認	Vineesha Srivastava, Sven Jandura, Gavin K Brennen, Guido Pupillo,	(参考訳) 共役キャビティモードに結合した$N$スピンの対称ディック部分空間における任意の絡み合った状態を生成するための決定論的プロトコルを提案する。このプロトコルは、新しい幾何学的位相ゲート、ノイズのある量子チャネルダイナミクスの解析解、最適制御法を組み合わせることで、量子センシングに有用な絡み合った状態を作成し、光子キャビティ損失、自然放出、復号化の存在下で、標準量子限界よりも精度が大幅に向上する。この研究は、キャビティ内の冷たい閉じ込められた原子と絡み合うエンハンスドセンシングへの道を開き、また、閉じ込められたイオンの実験にも直接的に関係している。 We present a deterministic protocol for the preparation of arbitrary entangled states in the symmetric Dicke subspace of $N$ spins coupled to a common cavity mode. By combining a new geometric phase gate, an analytic solution of the noisy quantum channel dynamics and optimal control methods, the protocol prepares entangled states that are useful for quantum sensing, achieving a precision significantly better than the standard quantum limit in the presence of photon cavity loss, spontaneous emission and dephasing. This work opens the way to entanglement-enhanced sensing with cold trapped atoms in cavities and is also directly relevant for experiments with trapped ions.	翻訳日:2024-11-07 12:48:01 公開日:2024-10-04
# データダイエット:PET/CTデータセットをトリミングできるか? Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation? ( http://arxiv.org/abs/2409.13548v1 ) ライセンス: Link先を確認	Alexander Jaus, Simon Reiß, Jens Klesiek, Rainer Stiefelhagen,	(参考訳) 本稿では,AutoPET3データ中心のトラックで競合するアプローチについて述べる。従来の知恵は、より大きなデータセットがより良いモデル性能をもたらすことを示しているが、最近の研究では、特定のトレーニングサンプルを除くと、モデルの精度が向上することを示している。 AutoPETIIIデータセットでは,特にPSMA-PETに対して多数の偽陽性を発生させることにより,データセット全体をトレーニングしたモデルが望ましくない特性を示すことがわかった。我々は、スクラッチから再トレーニングする前に、モデル損失によって測定されたトレーニングデータセットから最も簡単なサンプルを取り除き、これを対処する。提案手法を用いることで, 予備試験セットにおける偽陰体積とダイススコアの両方において, 偽負体積を下げ, ベースラインモデルを改善することができる。コードと事前訓練されたモデルはgithub.com/alexanderjaus/autopet3_datadietで入手できる。 In this work, we describe our approach to compete in the autoPET3 datacentric track. While conventional wisdom suggests that larger datasets lead to better model performance, recent studies indicate that excluding certain training samples can enhance model accuracy. We find that in the autoPETIII dataset, a model that is trained on the entire dataset exhibits undesirable characteristics by producing a large number of false positives particularly for PSMA-PETs. We counteract this by removing the easiest samples from the training dataset as measured by the model loss before retraining from scratch. Using the proposed approach we manage to drive down the false negative volume and improve upon the baseline model in both false negative volume and dice score on the preliminary test set. Code and pre-trained models are available at github.com/alexanderjaus/autopet3_datadiet.	翻訳日:2024-11-07 06:41:58 公開日:2024-10-04
# データダイエット:PET/CTデータセットをトリミングできるか? Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation? ( http://arxiv.org/abs/2409.13548v2 ) ライセンス: Link先を確認	Alexander Jaus, Simon Reiß, Jens Klesiek, Rainer Stiefelhagen,	(参考訳) 本稿では,AutoPET3データ中心のトラックで競合するアプローチについて述べる。従来の知恵は、より大きなデータセットがより良いモデル性能をもたらすことを示しているが、最近の研究では、特定のトレーニングサンプルを除くと、モデルの精度が向上することを示している。 AutoPETIIIデータセットでは,特にPSMA-PETに対して多数の偽陽性を発生させることにより,データセット全体をトレーニングしたモデルが望ましくない特性を示すことがわかった。我々は、スクラッチから再トレーニングする前に、モデル損失によって測定されたトレーニングデータセットから最も簡単なサンプルを取り除き、これを対処する。提案手法を用いることで, 予備試験セットにおける偽陰体積とダイススコアの両方において, 偽負体積を下げ, ベースラインモデルを改善することができる。コードと事前訓練されたモデルはgithub.com/alexanderjaus/autopet3_datadietで入手できる。 In this work, we describe our approach to compete in the autoPET3 datacentric track. While conventional wisdom suggests that larger datasets lead to better model performance, recent studies indicate that excluding certain training samples can enhance model accuracy. We find that in the autoPETIII dataset, a model that is trained on the entire dataset exhibits undesirable characteristics by producing a large number of false positives particularly for PSMA-PETs. We counteract this by removing the easiest samples from the training dataset as measured by the model loss before retraining from scratch. Using the proposed approach we manage to drive down the false negative volume and improve upon the baseline model in both false negative volume and dice score on the preliminary test set. Code and pre-trained models are available at github.com/alexanderjaus/autopet3_datadiet.	翻訳日:2024-11-07 06:41:58 公開日:2024-10-04
# データダイエット:PET/CTデータセットをトリミングできるか? Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation? ( http://arxiv.org/abs/2409.13548v3 ) ライセンス: Link先を確認	Alexander Jaus, Simon Reiß, Jens Klesiek, Rainer Stiefelhagen,	(参考訳) 本稿では,AutoPET3データ中心のトラックで競合するアプローチについて述べる。従来の知恵は、より大きなデータセットがより良いモデル性能をもたらすことを示しているが、最近の研究では、特定のトレーニングサンプルを除くと、モデルの精度が向上することを示している。 AutoPETIIIデータセットでは,特にPSMA-PETに対して多数の偽陽性を発生させることにより,データセット全体をトレーニングしたモデルが望ましくない特性を示すことがわかった。我々は、スクラッチから再トレーニングする前に、モデル損失によって測定されたトレーニングデータセットから最も簡単なサンプルを取り除き、これを対処する。提案手法を用いることで, 予備試験セットにおける偽陰体積とダイススコアの両方において, 偽負体積を下げ, ベースラインモデルを改善することができる。コードと事前訓練されたモデルはgithub.com/alexanderjaus/autopet3_datadietで入手できる。 In this work, we describe our approach to compete in the autoPET3 datacentric track. While conventional wisdom suggests that larger datasets lead to better model performance, recent studies indicate that excluding certain training samples can enhance model accuracy. We find that in the autoPETIII dataset, a model that is trained on the entire dataset exhibits undesirable characteristics by producing a large number of false positives particularly for PSMA-PETs. We counteract this by removing the easiest samples from the training dataset as measured by the model loss before retraining from scratch. Using the proposed approach we manage to drive down the false negative volume and improve upon the baseline model in both false negative volume and dice score on the preliminary test set. Code and pre-trained models are available at github.com/alexanderjaus/autopet3_datadiet.	翻訳日:2024-11-07 06:41:58 公開日:2024-10-04
# ブロックワールドにおける修復: マルチモーダル言語モデルによるユーザ訂正処理のための新しいベンチマーク Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models ( http://arxiv.org/abs/2409.14247v1 ) ライセンス: Link先を確認	Javier Chiyah-Garcia, Alessandro Suglia, Arash Eshghi,	(参考訳) 対話では、聞き手が最初に話者を誤解して誤った応答をすることがあり、その場合、話者は次のターンで第3位置修復(Third Position Repair, TPR)によって誤解を訂正することが多い。このような修復シーケンスを適切に処理し、応答する能力は、会話型AIシステムにおいて重要である。本稿では,まずBlockWorld-Repairsを収集・分析・公開する。これは,設計上,参照の曖昧さに満ちた指示追従操作タスクにおけるマルチモーダルなTPRシーケンスのデータセットである。このデータセットを用いて、複数の設定にまたがって複数の最先端のビジョン・アンド・言語モデル(VLM)を評価し、TPRを処理し、正確に応答し、それによって誤通信から回復する能力に焦点を当てる。このタスクでは、人間に比べて、すべてのモデルの性能が著しく劣っていることが分かりました。次に、VLMは、微調整中に関連するトークンをターゲットとした特別な損失の恩恵を受けることができ、より良い性能と汎化性を実現することができることを示す。これらのモデルは、修復が一般的であるマルチモーダルな協調環境において、まだ展開する準備が整っていないことを示唆し、インタラクションからの学習を容易にするトレーニング体制や目的を設計する必要性を強調した。 In dialogue, the addressee may initially misunderstand the speaker and respond erroneously, often prompting the speaker to correct the misunderstanding in the next turn with a Third Position Repair (TPR). The ability to process and respond appropriately to such repair sequences is thus crucial in conversational AI systems. In this paper, we first collect, analyse, and publicly release BlockWorld-Repairs: a dataset of multi-modal TPR sequences in an instruction-following manipulation task that is, by design, rife with referential ambiguity. We employ this dataset to evaluate several state-of-the-art Vision and Language Models (VLM) across multiple settings, focusing on their capability to process and accurately respond to TPRs and thus recover from miscommunication. We find that, compared to humans, all models significantly underperform in this task. We then show that VLMs can benefit from specialised losses targeting relevant tokens during fine-tuning, achieving better performance and generalisability. 
Our results suggest that these models are not yet ready to be deployed in multi-modal collaborative settings where repairs are common, and highlight the need to design training regimes and objectives that facilitate learning from interaction.	翻訳日:2024-11-06 23:37:15 公開日:2024-10-04
# ブロックワールドにおける修復: マルチモーダル言語モデルによるユーザ訂正処理のための新しいベンチマーク Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models ( http://arxiv.org/abs/2409.14247v2 ) ライセンス: Link先を確認	Javier Chiyah-Garcia, Alessandro Suglia, Arash Eshghi,	(参考訳) 対話では、聞き手が最初に話者を誤解して誤った応答をすることがあり、その場合、話者は次のターンで第3位置修復(Third Position Repair, TPR)によって誤解を訂正することが多い。このような修復シーケンスを適切に処理し、応答する能力は、会話型AIシステムにおいて重要である。本稿では,まずBlockWorld-Repairsを収集・分析・公開する。これは,設計上,参照の曖昧さに満ちた指示追従操作タスクにおけるマルチモーダルなTPRシーケンスのデータセットである。このデータセットを用いて、複数の設定にまたがって複数の最先端のビジョン・アンド・言語モデル(VLM)を評価し、TPRを処理し、正確に応答し、それによって誤通信から回復する能力に焦点を当てる。このタスクでは、人間に比べて、すべてのモデルの性能が著しく劣っていることが分かりました。次に、VLMは、微調整中に関連するトークンをターゲットとした特別な損失の恩恵を受けることができ、パフォーマンスが向上し、新しいシナリオへの汎化も向上することを示す。これらのモデルは、修復が一般的であるマルチモーダルな協調環境において、まだ展開する準備が整っていないことを示唆し、インタラクションからの学習を容易にするトレーニング体制や目的を設計する必要性を強調した。私たちのコードとデータはwww.github.com/JChiyah/blockworld-repairsで利用可能です。 In dialogue, the addressee may initially misunderstand the speaker and respond erroneously, often prompting the speaker to correct the misunderstanding in the next turn with a Third Position Repair (TPR). The ability to process and respond appropriately to such repair sequences is thus crucial in conversational AI systems. In this paper, we first collect, analyse, and publicly release BlockWorld-Repairs: a dataset of multi-modal TPR sequences in an instruction-following manipulation task that is, by design, rife with referential ambiguity. We employ this dataset to evaluate several state-of-the-art Vision and Language Models (VLM) across multiple settings, focusing on their capability to process and accurately respond to TPRs and thus recover from miscommunication. We find that, compared to humans, all models significantly underperform in this task. We then show that VLMs can benefit from specialised losses targeting relevant tokens during fine-tuning, achieving better performance and generalising better to new scenarios. 
Our results suggest that these models are not yet ready to be deployed in multi-modal collaborative settings where repairs are common, and highlight the need to design training regimes and objectives that facilitate learning from interaction. Our code and data are available at www.github.com/JChiyah/blockworld-repairs	翻訳日:2024-11-06 23:37:15 公開日:2024-10-04
# アスペクト感度トリプレット抽出におけるASTE変換器の依存性のモデル化 ASTE Transformer Modelling Dependencies in Aspect-Sentiment Triplet Extraction ( http://arxiv.org/abs/2409.15202v2 ) ライセンス: Link先を確認	Iwo Naglik, Mateusz Lango,	(参考訳) Aspect-Sentiment Triplet extract (ASTE)は、最近提案されたアスペクトベースの感情分析のタスクであり、ある文から三重項(アスペクトフレーズ、意見フレーズ、感情極性)を抽出する。最近の最先端の手法では、まず与えられたテキストから可能なすべてのテキストを抽出し、次に潜在的なアスペクトと意見句を分類器でフィルタリングし、最後にすべてのペアを別の分類器で考慮し、さらに感情の極性を割り当てることによって、このタスクにアプローチしている。上記のスキームのいくつかのバリエーションが提案されているが、一般的な特徴は、最終的な結果が独立した分類器の連続によって構成されることである。これにより、抽出されたフレーズ間の依存関係の活用が妨げられ、分類器間の相互関係に関する知識の使用が防止され、性能が向上する。本稿では,3つのトランスフォーマーにインスパイアされたレイヤからなる新しいASTE手法を提案する。実験結果から,この手法はF1測度において,他のベンチマーク手法よりも高い性能を示すことが示された。さらに,簡単な事前学習手法により,モデルの性能が向上することを示す。 Aspect-Sentiment Triplet Extraction (ASTE) is a recently proposed task of aspect-based sentiment analysis that consists in extracting (aspect phrase, opinion phrase, sentiment polarity) triples from a given sentence. Recent state-of-the-art methods approach this task by first extracting all possible text spans from a given text, then filtering the potential aspect and opinion phrases with a classifier, and finally considering all their pairs with another classifier that additionally assigns sentiment polarity to them. Although several variations of the above scheme have been proposed, the common feature is that the final result is constructed by a sequence of independent classifier decisions. This hinders the exploitation of dependencies between extracted phrases and prevents the use of knowledge about the interrelationships between classifier predictions to improve performance. In this paper, we propose a new ASTE approach consisting of three transformer-inspired layers, which enables the modelling of dependencies both between phrases and between the final classifier decisions. Experimental results show that the method achieves higher performance in terms of F1 measure than other methods studied on popular benchmarks. 
In addition, we show that a simple pre-training technique further improves the performance of the model.	翻訳日:2024-11-06 20:27:58 公開日:2024-10-04
# 視覚認識におけるパラメータ効率変換学習(PETL)の統一的研究から学んだ教訓 Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition ( http://arxiv.org/abs/2409.16434v2 ) ライセンス: Link先を確認	Zheda Mai, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Li Zhang, Wei-Lun Chao,	(参考訳) 近年, パラメータ効率変換学習 (PETL) が注目されている。これは, 事前学習モデルのサイズが増大し, より優れたダウンストリーム性能を実現するために, それらを微調整 (FT) する必要があるためである。このコミュニティ全体の熱意は、多くのアプローチを引き起こしました。それにもかかわらず、パフォーマンスと適切なアプリケーションシナリオを理解するための体系的な研究には不足があり、PETLをいつ適用するか、どのアプローチを使うかといった疑問がほとんど答えられていない。本稿では,視覚変換器の文脈における代表的PETL手法の統一的な実証的研究を行う。我々は、下流タスクの精度を正確に比較するために、これらのハイパーパラメータを体系的に調整する。私たちの研究は価値あるユーザーガイドを提供するだけでなく、いくつかの新しい洞察も発表しています。まず、慎重に調整すると、異なるPETL法がローショットベンチマークVTAB-1Kで同様の精度が得られる。これにはFTのような単純な方法が含まれており、バイアス項は劣っていると報告されている。第二に、PETL法は類似した精度で異なる誤りと高い信頼率の予測を行う。このような矛盾(あるいは相補性)はアンサンブル手法の機会を開き、予備的な試みを行う。第3に、一般的に使用されるローショットタスクを超えて、PETLは、多くのショットレシエーションでも有用であることが分かりました。最後に,PETLの分散シフトに対する頑健性(例えば,CLIPバックボーン)を維持する能力について検討する。おそらく驚くことではないが、PETL法は完全なFT法よりも優れている。しかし、重量空間のアンサンブルでは、完全な微調整モデルにより、分布と分布シフト性能のバランスが良くなり、PETLの今後の研究方向性が示唆される。 Parameter-efficient transfer learning (PETL) has attracted significant attention lately, due to the increasing size of pre-trained models and the need to fine-tune (FT) them for superior downstream performance. This community-wide enthusiasm has sparked a plethora of approaches. Nevertheless, a systematic study to understand their performance and suitable application scenarios is lacking, leaving questions like when to apply PETL and which approach to use largely unanswered. In this paper, we conduct a unifying empirical study of representative PETL methods in the context of Vision Transformers. We systematically tune their hyper-parameters to fairly compare their accuracy on downstream tasks. Our study not only offers a valuable user guide but also unveils several new insights. First, if tuned carefully, different PETL methods can obtain similar accuracy in the low-shot benchmark VTAB-1K. 
This includes simple methods, such as fine-tuning only the bias terms, that were previously reported to be inferior. Second, though with similar accuracy, we find that PETL methods make different mistakes and high-confidence predictions, likely due to their different inductive biases. Such an inconsistency (or complementariness) opens up the opportunity for ensemble methods, and we make preliminary attempts at this. Third, going beyond the commonly used low-shot tasks, we find that PETL is also useful in many-shot regimes -- it achieves comparable and sometimes better accuracy than full FT, using far fewer learnable parameters. Last but not least, we investigate PETL's ability to preserve a pre-trained model's robustness to distribution shifts (e.g., a CLIP backbone). Perhaps not surprisingly, PETL methods outperform full FT alone. However, with weight-space ensembles, the fully fine-tuned model can better balance target (i.e., downstream) distribution and distribution shift performance, suggesting a future research direction for PETL.	翻訳日:2024-11-06 17:30:16 公開日:2024-10-04
# 視覚認識におけるパラメータ効率変換学習(PETL)の統一的研究から学んだ教訓 Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition ( http://arxiv.org/abs/2409.16434v3 ) ライセンス: Link先を確認	Zheda Mai, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Li Zhang, Wei-Lun Chao,	(参考訳) 近年, パラメータ効率変換学習 (PETL) が注目されている。これは, 事前学習モデルのサイズが増大し, より優れたダウンストリーム性能を実現するために, それらを微調整 (FT) する必要があるためである。このコミュニティ全体の熱意は、多くのアプローチを引き起こしました。それにもかかわらず、パフォーマンスと適切なアプリケーションシナリオを理解するための体系的な研究には不足があり、PETLをいつ適用するか、どのアプローチを使うかといった疑問がほとんど答えられていない。本稿では,視覚変換器の文脈における代表的PETL手法の統一的な実証的研究を行う。我々は、下流タスクの精度を正確に比較するために、これらのハイパーパラメータを体系的に調整する。私たちの研究は価値あるユーザーガイドを提供するだけでなく、いくつかの新しい洞察も発表しています。まず、慎重に調整すると、異なるPETL法がローショットベンチマークVTAB-1Kで同様の精度が得られる。これにはFTのような単純な方法が含まれており、バイアス項は劣っていると報告されている。第二に、PETL法は類似した精度で異なる誤りと高い信頼率の予測を行う。このような矛盾(あるいは相補性)はアンサンブル手法の機会を開き、予備的な試みを行う。第3に、一般的に使用されるローショットタスクを超えて、PETLは、多くのショットレシエーションでも有用であることが分かりました。最後に,PETLの分散シフトに対する頑健性(例えば,CLIPバックボーン)を維持する能力について検討する。おそらく驚くことではないが、PETL法は完全なFT法よりも優れている。しかし、重量空間のアンサンブルでは、完全な微調整モデルにより、分布と分布シフト性能のバランスが良くなり、PETLの今後の研究方向性が示唆される。 Parameter-efficient transfer learning (PETL) has attracted significant attention lately, due to the increasing size of pre-trained models and the need to fine-tune (FT) them for superior downstream performance. This community-wide enthusiasm has sparked a plethora of approaches. Nevertheless, a systematic study to understand their performance and suitable application scenarios is lacking, leaving questions like when to apply PETL and which approach to use largely unanswered. In this paper, we conduct a unifying empirical study of representative PETL methods in the context of Vision Transformers. We systematically tune their hyper-parameters to fairly compare their accuracy on downstream tasks. Our study not only offers a valuable user guide but also unveils several new insights. First, if tuned carefully, different PETL methods can obtain similar accuracy in the low-shot benchmark VTAB-1K. 
This includes simple methods, such as fine-tuning only the bias terms, that were previously reported to be inferior. Second, though with similar accuracy, we find that PETL methods make different mistakes and high-confidence predictions, likely due to their different inductive biases. Such an inconsistency (or complementariness) opens up the opportunity for ensemble methods, and we make preliminary attempts at this. Third, going beyond the commonly used low-shot tasks, we find that PETL is also useful in many-shot regimes -- it achieves comparable and sometimes better accuracy than full FT, using far fewer learnable parameters. Last but not least, we investigate PETL's ability to preserve a pre-trained model's robustness to distribution shifts (e.g., a CLIP backbone). Perhaps not surprisingly, PETL methods outperform full FT alone. However, with weight-space ensembles, the fully fine-tuned model can better balance target (i.e., downstream) distribution and distribution shift performance, suggesting a future research direction for PETL.	翻訳日:2024-11-06 17:30:16 公開日:2024-10-04
# オンラインからオフラインへのフードデリバリープラットフォームが健康食品選択に与える影響を調査するサイバーフードスワンプ Cyber Food Swamps: Investigating the Impacts of Online-to-Offline Food Delivery Platforms on Healthy Food Choices ( http://arxiv.org/abs/2409.16601v2 ) ライセンス: Link先を確認	Yunke Zhang, Yiran Fan, Peijie Liu, Fengli Xu, Yong Li,	(参考訳) オンライン・トゥ・オフライン(O2O)フードデリバリープラットフォームは、都市住民の食品選択を大幅に強化し、より便利な食品アウトレットへのアクセスを可能にしている。しかし,O2Oフードデリバリープラットフォームがユーザの健康的な食品選択に与える影響については,特に懸念が残る。本研究は、大手O2Oデリバリープラットフォームからの大規模実証データを利用して、オンライン食品選択行動の包括的分析と、ファーストフードレストランへのオンライン露出、すなわちオンライン食品環境の影響について述べる。分析の結果,人口集団や都市規模において,男性,低所得者,若年者,大都市におけるファストフードの注文は,O2Oプラットフォームを経由する傾向がみられた。さらに、オンラインおよびオフライン環境における食品暴露の違いについて比較分析を行い、O2Oプラットフォームの拡張サービス範囲がより大きな「サイバーフード湿地」を創出できることを確認した。さらに、レグレッション分析では、ファーストフードの注文の比率が高いのは、アクセス可能なファーストフードレストランの比率が高いのが特徴の「サイバーフード湿地」と関連していることを示している。このシェアが10%上昇すると、ファーストフードの注文率が22.0%上昇する。さらに、準自然実験は、オンライン食品環境の変化が健康食品選択に長期的な因果効果を裏付けるものである。以上の結果から,O2Oフードデリバリープラットフォームは,オンライン食品選択曝露の健康への影響に対処し,住民の食生活改善に様々な利害関係者の努力を喚起する必要性が示唆された。 Online-to-offline (O2O) food delivery platforms have substantially enriched the food choices of urban residents by allowing them to conveniently access farther food outlets. However, concerns about the healthiness of delivered food persist, especially because the impact of O2O food delivery platforms on users' healthy food choices remains unclear. This study leverages large-scale empirical data from a leading O2O delivery platform to comprehensively analyze online food choice behaviors and how they are influenced by the online exposure to fast food restaurants, i.e., online food environment. Our analyses reveal significant discrepancies in food preferences across demographic groups and city sizes, where male, low-income, and younger users and those located in larger cities are more likely to order fast food via O2O platforms. 
We also perform a comparative analysis of the food exposure differences between online and offline environments, confirming that the extended service ranges of O2O platforms can create larger "cyber food swamps". Furthermore, regression analysis highlights that a higher ratio of fast food orders is associated with "cyber food swamps", areas characterized by a higher share of accessible fast food restaurants. A 10% increase in this share raises the probability of ordering fast food by 22.0%. Moreover, a quasi-natural experiment substantiates the long-term causal effect of online food environment changes on healthy food choices. Our findings underscore the need for O2O food delivery platforms to address the health implications of online food choice exposure, thereby informing efforts by various stakeholders to improve residents' dietary health.	翻訳日:2024-11-06 17:30:16 公開日:2024-10-04
# SDCL:半教師型医用画像分割のための学生の不一致情報修正学習 SDCL: Students Discrepancy-Informed Correction Learning for Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2409.16728v2 ) ライセンス: Link先を確認	Bentao Song, Qingfeng Wang,	(参考訳) 半教師付き医用画像セグメンテーション(SSMIS)は、限られた医療ラベル付きデータの問題を緩和する可能性を実証している。しかし, 教師によるSSMIS法は, 疑似ラベルの誤用により, 確証と認知バイアスが影響する可能性が示唆された。この課題に対処するために,我々は,2人の学生と1人の非訓練教師を含む,平均的教師のアプローチを改善し,自己修正学習の指導に2人の学生の分節差を利用するSDCL(Students Discrepancy-Informed Correction Learning)フレームワークを提案する。 SDCLの本質は、セグメンテーションの差異の領域を潜在的なバイアス領域として識別し、モデルが正しい認知をレビューし、これらの領域で自身のバイアスを補正することを奨励することである。連続的なレビューと修正によるバイアス補正学習を容易にするために、正しいセグメンテーションボクセル距離を最小化し、誤セグメンテーションボクセルエントロピーを最大化する2つの補正損失関数を用いる。 2つの3次元データセット(CTとMRI)と1つの2次元データセット(MRI)の3つの公開医用画像データセットについて実験を行った。その結果, SDCL は現在の State-of-the-Art (SOTA) 法を2.57%, 3.04%, 2.34% で上回っていることがわかった。さらに,本手法の精度は,ACDCデータセットの完全教師付き手法に非常に近く,膵臓およびLAデータセットの完全教師付き手法を超えている。 (コードは https://github.com/pascalcpp/SDCL)。 Semi-supervised medical image segmentation (SSMIS) has demonstrated the potential to mitigate the issue of limited medical labeled data. However, confirmation and cognitive biases may affect the prevalent teacher-student based SSMIS methods due to erroneous pseudo-labels. To tackle this challenge, we improve the mean teacher approach and propose the Students Discrepancy-Informed Correction Learning (SDCL) framework that includes two students and one non-trainable teacher, which utilizes the segmentation difference between the two students to guide the self-correcting learning. The essence of SDCL is to identify the areas of segmentation discrepancy as the potential bias areas, and then encourage the model to review the correct cognition and rectify their own biases in these areas. To facilitate the bias correction learning with continuous review and rectification, two correction loss functions are employed to minimize the correct segmentation voxel distance and maximize the erroneous segmentation voxel entropy. 
We conducted experiments on three public medical image datasets: two 3D datasets (CT and MRI) and one 2D dataset (MRI). The results show that our SDCL surpasses the current State-of-the-Art (SOTA) methods by 2.57%, 3.04%, and 2.34% in the Dice score on the Pancreas, LA, and ACDC datasets, respectively. In addition, the accuracy of our method is very close to the fully supervised method on the ACDC dataset, and even exceeds the fully supervised method on the Pancreas and LA datasets. (Code available at https://github.com/pascalcpp/SDCL).	翻訳日:2024-11-06 17:20:02 公開日:2024-10-04
# インフォームド深層階層分類--非標準解析によるアプローチ Informed deep hierarchical classification: a non-standard analysis inspired approach ( http://arxiv.org/abs/2409.16956v2 ) ライセンス: Link先を確認	Lorenzo Fiaschi, Marco Cococcioni,	(参考訳) 本研究は, 厳密な親子構造に組織された複数のラベルによるデータ分類の問題という, 階層的分類課題に対する新しいアプローチを提案する。出力層の前に配置された特定のプロジェクション演算子を備えた多出力ディープニューラルネットワークで構成されている。辞書型ハイブリッドディープニューラルネットワーク(LH-DNN)と呼ばれるアーキテクチャの設計は、辞書型多目的最適化、非標準分析、ディープラーニングといった、異なる研究分野のツールを組み合わせることで実現されている。このアプローチの有効性を評価するために、結果として得られるネットワークは、階層的な分類タスクに適した畳み込みニューラルネットワークであるB-CNN、CIFAR10、CIFAR100(複数の現実世界のアプリケーションに採用され、調整される前に提案された)、Fashion-MNISTベンチマークと比較される。エビデンスによれば、LH-DNNは、特に階層関係の学習において、アドホック損失関数を重み付けすることなく、学習パラメータの劇的な減少、エポックの訓練、計算時間に直面して、優れた性能を達成できる。 This work proposes a novel approach to the deep hierarchical classification task, i.e., the problem of classifying data according to multiple labels organized in a rigid parent-child structure. It consists in a multi-output deep neural network equipped with specific projection operators placed before each output layer. The design of such an architecture, called lexicographic hybrid deep neural network (LH-DNN), has been possible by combining tools from different and quite distant research fields: lexicographic multi-objective optimization, non-standard analysis, and deep learning. To assess the efficacy of the approach, the resulting network is compared against the B-CNN, a convolutional neural network tailored for hierarchical classification tasks, on the CIFAR10, CIFAR100 (where it has been originally and recently proposed before being adopted and tuned for multiple real-world applications) and Fashion-MNIST benchmarks. Evidence states that an LH-DNN can achieve comparable if not superior performance, especially in the learning of the hierarchical relations, in the face of a drastic reduction of the learning parameters, training epochs, and computational time, without the need for ad-hoc loss functions weighting values.	翻訳日:2024-11-06 17:10:14 公開日:2024-10-04
# シリコン窒化物外部キャビティレーザーの100Hz以下固有線幅852nm Sub-100 Hz Intrinsic Linewidth 852 nm Silicon Nitride External Cavity Laser ( http://arxiv.org/abs/2409.17382v2 ) ライセンス: Link先を確認	Hani Nejadriahi, Eric Kittlaus, Debapam Bose, Nitesh Chauhan, Jiawei Wang, Mathieu Fradet, Mahmood Bagheri, Andrei Isichenko, David Heim, Siamak Forouhar, Daniel Blumenthal,	(参考訳) レーザー冷却とセシウム原子の操作に関係し, 動作波長852nm付近に100Hz以下の固有線幅を有する外部共振器レーザを試作した。最大CW出力は24mW、波長可変は15nm、サイドモード抑制比は50dBを超える。この性能レベルは、市販の半導体ゲインチップと組み合わせて外部キャビティとして機能する低損失集積窒化ケイ素フォトニック回路を慎重に設計することによる。提案手法は, 半導体ゲイン媒質の選択により, より短い波長に拡張可能な, 超低温原子をベースとした新しいセンサ概念の必要性に着目した, サブkHzライン幅の小型集積レーザの実現可能性を示すものである。 We demonstrate an external cavity laser with intrinsic linewidth below 100 Hz around an operating wavelength of 852 nm, selected for its relevance to laser cooling and manipulation of cesium atoms. This system achieves a maximum CW output power of 24 mW, wavelength tunability over 15 nm, and a side-mode suppression ratio exceeding 50 dB. This performance level is facilitated by careful design of a low-loss integrated silicon nitride photonic circuit serving as the external cavity combined with commercially available semiconductor gain chips. This approach demonstrates the feasibility of compact integrated lasers with sub-kHz linewidth centering on the needs of emerging sensor concepts based on ultracold atoms and can be further extended to shorter wavelengths via selection of suitable semiconductor gain media.	翻訳日:2024-11-06 16:30:51 公開日:2024-10-04
# MoJE:脱獄専門家の混成、暴行攻撃の警護にタブラル・クラシファイア MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks ( http://arxiv.org/abs/2409.17699v3 ) ライセンス: Link先を確認	Giandomenico Cornacchia, Giulio Zizzo, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Mark Purcell,	(参考訳) 多様なアプリケーションにおけるLarge Language Models(LLMs)の普及は、潜在的ジェイルブレイク攻撃を防ぐための堅牢なセキュリティ対策の必要性を浮き彫りにしている。これらの攻撃は、LSM内の脆弱性、データ完全性やユーザのプライバシを危険にさらす。ガードレールはこのような脅威に対して重要な防御機構として機能するが、既存のモデルは検出精度と計算効率の両方の観点から、しばしば不足する。本稿では,LLMに対するジェイルブレイク攻撃防止の重要性を論じ,これらのモデルを保護する上での入力ガードレールの役割を強調した。現状のガードレールの限界を超えるよう設計された新しいガードレールアーキテクチャであるMoJE(Mixture of Jailbreak Expert)を紹介する。単純な言語統計手法を用いることで、MoJEはモデル推論中に最小限の計算オーバーヘッドを維持しながら、ジェイルブレイク攻撃の検出に優れる。厳格な実験を通じて、MoJEは良心的なプロンプトを損なうことなく90%の攻撃を検知できる優れた性能を示し、脱獄攻撃に対するLLMの安全性を高めた。 The proliferation of Large Language Models (LLMs) in diverse applications underscores the pressing need for robust security measures to thwart potential jailbreak attacks. These attacks exploit vulnerabilities within LLMs, endangering data integrity and user privacy. Guardrails serve as crucial protective mechanisms against such threats, but existing models often fall short in terms of both detection accuracy and computational efficiency. This paper advocates for the significance of jailbreak attack prevention on LLMs, and emphasises the role of input guardrails in safeguarding these models. We introduce MoJE (Mixture of Jailbreak Expert), a novel guardrail architecture designed to surpass current limitations in existing state-of-the-art guardrails. By employing simple linguistic statistical techniques, MoJE excels in detecting jailbreak attacks while maintaining minimal computational overhead during model inference. Through rigorous experimentation, MoJE demonstrates superior performance capable of detecting 90% of the attacks without compromising benign prompts, enhancing LLMs security against jailbreak attacks.	翻訳日:2024-11-06 16:10:55 公開日:2024-10-04
# Pairwise RankingのためのFew-shot Prompting: 効果的な非パラメトリック検索モデル Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model ( http://arxiv.org/abs/2409.17745v3 ) ライセンス: Link先を確認	Nilanjan Sinhababu, Andrew Parry, Debasis Ganguly, Debasis Samanta, Pabitra Mitra,	(参考訳) 教師付きランキングモデルは、効果的であることの利点にもかかわらず、通常複雑な処理(通常、タスク固有の事前トレーニングと微調整の複数の段階)を伴います。これによって研究者たちは,ゼロショットで動作可能な大規模言語モデル(LLM)を活用した,シンプルなパイプラインの探索を動機付けている。しかし、ゼロショット推論では、クエリのペアとその関連ドキュメントのトレーニングセットは使用しないため、そのパフォーマンスは、そのようなペアでトレーニングされる教師付きモデルよりも大幅に低下する。トレーニングサンプルが一般的にゼロショットのパフォーマンスを改善するという既存の知見に触発されて、私たちの研究では、これがランキングモデルにも当てはまるかどうか調査している。より具体的には、クエリとドキュメントのペアが与えられた場合、トレーニングセットから類似したクエリの好みの例を増やすことで、好み予測タスクが改善される。提案手法は,インドメイン (TREC DL) とアウトドメイン (BEIR サブセット) の検索ベンチマークにおいて,ゼロショットベースラインに対する一貫した改善を示す。また,複雑なトレーニングパイプラインを必要とせず,教師付きモデルに近い性能を実現する。 A supervised ranking model, despite its advantage of being effective, usually involves complex processing - typically multiple stages of task-specific pre-training and fine-tuning. This has motivated researchers to explore simpler pipelines leveraging large language models (LLMs) that are capable of working in a zero-shot manner. However, since zero-shot inference does not make use of a training set of pairs of queries and their relevant documents, its performance is mostly worse than that of supervised models, which are trained on such example pairs. Motivated by the existing findings that training examples generally improve zero-shot performance, in our work, we explore if this also applies to ranking models. More specifically, given a query and a pair of documents, the preference prediction task is improved by augmenting examples of preferences for similar queries from a training set. Our proposed pairwise few-shot ranker demonstrates consistent improvements over the zero-shot baseline on both in-domain (TREC DL) and out-domain (BEIR subset) retrieval benchmarks. 
Our method also achieves performance close to that of a supervised model without requiring any complex training pipeline.	翻訳日:2024-11-06 16:00:56 公開日:2024-10-04
# MLによる透かしの安全性評価:コピーと除去攻撃 Evaluation of Security of ML-based Watermarking: Copy and Removal Attacks ( http://arxiv.org/abs/2409.18211v1 ) ライセンス: Link先を確認	Vitaliy Kinakh, Brian Pulfer, Yury Belousov, Pierre Fernandez, Teddy Furon, Slava Voloshynovskiy,	(参考訳) 現実世界やAIが生成したメディアから取得した膨大な量のデジタルコンテンツは、著作権保護、トレーサビリティ、データ証明の方法を必要とする。デジタル透かしはこれらの課題に対処するための重要なアプローチである。その進化は、手作り、オートエンコーダベース、基礎モデルベースメソッドの3世代に及ぶ。これらのシステムの堅牢性は十分に文書化されているが、敵の攻撃に対するセキュリティは未解明のままである。本稿では,逆埋め込み技術を用いた基礎モデルの潜時空間デジタル透かしシステムのセキュリティ評価を行う。一連の実験は、コピーと削除攻撃の下でのセキュリティの次元を調査し、これらのシステムの脆弱性に関する実証的な洞察を提供する。すべての実験コードと結果は https://github.com/vkinakh/ssl-watermarking-attacks で公開されている。 The vast amounts of digital content captured from the real world or AI-generated media necessitate methods for copyright protection, traceability, or data provenance verification. Digital watermarking serves as a crucial approach to address these challenges. Its evolution spans three generations: handcrafted, autoencoder-based, and foundation model based methods. While the robustness of these systems is well-documented, the security against adversarial attacks remains underexplored. This paper evaluates the security of foundation models' latent space digital watermarking systems that utilize adversarial embedding techniques. A series of experiments investigate the security dimensions under copy and removal attacks, providing empirical insights into these systems' vulnerabilities. All experimental codes and results are available at https://github.com/vkinakh/ssl-watermarking-attacks .	翻訳日:2024-11-06 15:21:45 公開日:2024-10-04
# MLによる透かしの安全性評価:コピーと除去攻撃 Evaluation of Security of ML-based Watermarking: Copy and Removal Attacks ( http://arxiv.org/abs/2409.18211v2 ) ライセンス: Link先を確認	Vitaliy Kinakh, Brian Pulfer, Yury Belousov, Pierre Fernandez, Teddy Furon, Slava Voloshynovskiy,	(参考訳) 現実世界やAIが生成したメディアから取得した膨大な量のデジタルコンテンツは、著作権保護、トレーサビリティ、データ証明の方法を必要とする。デジタル透かしはこれらの課題に対処するための重要なアプローチである。その進化は、手作り、オートエンコーダベース、基礎モデルベースメソッドの3世代に及ぶ。これらのシステムの堅牢性は十分に文書化されているが、敵の攻撃に対するセキュリティは未解明のままである。本稿では,逆埋め込み技術を用いた基礎モデルの潜時空間デジタル透かしシステムのセキュリティ評価を行う。一連の実験は、コピーと削除攻撃の下でのセキュリティの次元を調査し、これらのシステムの脆弱性に関する実証的な洞察を提供する。実験コードと結果はすべてhttps://github.com/vkinakh/ssl-watermarking- attacksで公開されている。 The vast amounts of digital content captured from the real world or AI-generated media necessitate methods for copyright protection, traceability, or data provenance verification. Digital watermarking serves as a crucial approach to address these challenges. Its evolution spans three generations: handcrafted, autoencoder-based, and foundation model based methods. While the robustness of these systems is well-documented, the security against adversarial attacks remains underexplored. This paper evaluates the security of foundation models' latent space digital watermarking systems that utilize adversarial embedding techniques. A series of experiments investigate the security dimensions under copy and removal attacks, providing empirical insights into these systems' vulnerabilities. All experimental codes and results are available at https://github.com/vkinakh/ssl-watermarking-attacks .	翻訳日:2024-11-06 15:21:45 公開日:2024-10-04
# 拡散形状事前推定によるアモーダル・インスタンス・セグメンテーション Amodal Instance Segmentation with Diffusion Shape Prior Estimation ( http://arxiv.org/abs/2409.18256v1 ) ライセンス: Link先を確認	Minh Tran, Khoa Vo, Tri Nguyen, Ngan Le,	(参考訳) Amodal Instance Segmentation (AIS)は、画像内のオブジェクトの可視部分と隠蔽部分の両方のセグメンテーション予測を含む、興味深い課題を提示している。従来は、アモーダルセグメンテーションを強化するために、トレーニングデータから収集した形状の事前情報に頼っていた。しかし、これらのアプローチは対象圏の詳細を過度に適合させ無視する可能性がある。最近の進歩は、潜在空間から画像を生成するために、広範囲なデータセットで事前訓練された条件付き拡散モデルの可能性を強調している。そこで我々は,拡散形状優先推定(DiffSP)モジュールを用いたAISDiffを提案する。 AISDiffは、目に見えるセグメンテーションマスクとオブジェクトカテゴリの予測から始まり、オクルージョンマスクの予測を通じてオクルージョン認識処理を行う。その後、これらの要素はDiffSPモジュールに入力され、オブジェクトの前の形状を推測します。 DiffSPは、広範囲なデータセットで事前訓練された条件付き拡散モデルを使用して、形状事前推定のためのリッチな視覚的特徴を抽出する。さらに,アモーダルセグメンテーションに先立って,その形状から注目に基づく特徴写像を利用する形状優先アモーダル予測器を提案する。様々なAISベンチマークによる実験では、AISDiffの有効性が示されています。 Amodal Instance Segmentation (AIS) presents an intriguing challenge, including the segmentation prediction of both visible and occluded parts of objects within images. Previous methods have often relied on shape prior information gleaned from training data to enhance amodal segmentation. However, these approaches are susceptible to overfitting and disregard object category details. Recent advancements highlight the potential of conditioned diffusion models, pretrained on extensive datasets, to generate images from latent space. Drawing inspiration from this, we propose AISDiff with a Diffusion Shape Prior Estimation (DiffSP) module. AISDiff begins with the prediction of the visible segmentation mask and object category, alongside occlusion-aware processing through the prediction of occluding masks. Subsequently, these elements are inputted into our DiffSP module to infer the shape prior of the object. DiffSP utilizes conditioned diffusion models pretrained on extensive datasets to extract rich visual features for shape prior estimation. 
Additionally, we introduce the Shape Prior Amodal Predictor, which utilizes attention-based feature maps from the shape prior to refine amodal segmentation. Experiments across various AIS benchmarks demonstrate the effectiveness of our AISDiff.	翻訳日:2024-11-06 15:01:18 公開日:2024-10-04
# 拡散形状事前推定によるアモーダル・インスタンス・セグメンテーション Amodal Instance Segmentation with Diffusion Shape Prior Estimation ( http://arxiv.org/abs/2409.18256v2 ) ライセンス: Link先を確認	Minh Tran, Khoa Vo, Tri Nguyen, Ngan Le,	(参考訳) Amodal Instance Segmentation (AIS)は、画像内のオブジェクトの可視部分と隠蔽部分の両方のセグメンテーション予測を含む、興味深い課題を提示している。従来は、アモーダルセグメンテーションを強化するために、トレーニングデータから収集した形状の事前情報に頼っていた。しかし、これらのアプローチは対象圏の詳細を過度に適合させ無視する可能性がある。最近の進歩は、潜在空間から画像を生成するために、広範囲なデータセットで事前訓練された条件付き拡散モデルの可能性を強調している。そこで我々は,拡散形状優先推定(DiffSP)モジュールを用いたAISDiffを提案する。 AISDiffは、目に見えるセグメンテーションマスクとオブジェクトカテゴリの予測から始まり、オクルージョンマスクの予測を通じてオクルージョン認識処理を行う。その後、これらの要素はDiffSPモジュールに入力され、オブジェクトの前の形状を推測します。 DiffSPは、広範囲なデータセットで事前訓練された条件付き拡散モデルを使用して、形状事前推定のためのリッチな視覚的特徴を抽出する。さらに,アモーダルセグメンテーションに先立って,その形状から注目に基づく特徴写像を利用する形状優先アモーダル予測器を提案する。様々なAISベンチマークによる実験では、AISDiffの有効性が示されています。 Amodal Instance Segmentation (AIS) presents an intriguing challenge, including the segmentation prediction of both visible and occluded parts of objects within images. Previous methods have often relied on shape prior information gleaned from training data to enhance amodal segmentation. However, these approaches are susceptible to overfitting and disregard object category details. Recent advancements highlight the potential of conditioned diffusion models, pretrained on extensive datasets, to generate images from latent space. Drawing inspiration from this, we propose AISDiff with a Diffusion Shape Prior Estimation (DiffSP) module. AISDiff begins with the prediction of the visible segmentation mask and object category, alongside occlusion-aware processing through the prediction of occluding masks. Subsequently, these elements are inputted into our DiffSP module to infer the shape prior of the object. DiffSP utilizes conditioned diffusion models pretrained on extensive datasets to extract rich visual features for shape prior estimation. 
Additionally, we introduce the Shape Prior Amodal Predictor, which utilizes attention-based feature maps from the shape prior to refine amodal segmentation. Experiments across various AIS benchmarks demonstrate the effectiveness of our AISDiff.	翻訳日:2024-11-06 15:01:18 公開日:2024-10-04
# 合成西ブロット源属性のための説明可能なアーティファクト Explainable Artifacts for Synthetic Western Blot Source Attribution ( http://arxiv.org/abs/2409.18881v2 ) ライセンス: Link先を確認	João Phillipe Cardenuto, Sara Mandelli, Daniel Moreira, Paolo Bestagini, Edward Delp, Anderson Rocha,	(参考訳) 近年の人工知能の進歩により、生成モデルは原始的なものと区別できない合成科学的イメージを作成できるようになった。不正な記事を体系的に生成する製紙所として知られる組織によって活用されると、これらの技術は根拠のない科学に関する誤報の拡散に大きく寄与し、科学研究への信頼を損なう可能性がある。以前の研究では、合成コンテンツを識別するための畳み込みニューラルネットワークのようなブラックボックスソリューションを探索してきたが、異なるモデルにまたがって一般化し、検出過程を知らせる合成画像のアーティファクトに関する洞察を提供するという課題に対処する者はほとんどいなかった。本研究の目的は、最先端の生成モデル(ジェネレーティブ・ディフュージョン・モデル、ジェネレーティブ・ディフュージョン・モデル)によって生成された説明可能なアーティファクトを特定し、それらをオープン・セットの識別とソース属性(すなわち、画像を作成するモデルを指し示す)に活用することである。 Recent advancements in artificial intelligence have enabled generative models to produce synthetic scientific images that are indistinguishable from pristine ones, posing a challenge even for expert scientists habituated to working with such content. When exploited by organizations known as paper mills, which systematically generate fraudulent articles, these technologies can significantly contribute to the spread of misinformation about ungrounded science, potentially undermining trust in scientific research. While previous studies have explored black-box solutions, such as Convolutional Neural Networks, for identifying synthetic content, only some have addressed the challenge of generalizing across different models and providing insight into the artifacts in synthetic images that inform the detection process. This study aims to identify explainable artifacts generated by state-of-the-art generative models (e.g., Generative Adversarial Networks and Diffusion Models) and leverage them for open-set identification and source attribution (i.e., pointing to the model that created the image).	翻訳日:2024-11-06 05:32:49 公開日:2024-10-04
# 半修正対象検出における低バイアス教師モデルの適用 Applying the Lower-Biased Teacher Model in Semi-Suepervised Object Detection ( http://arxiv.org/abs/2409.19703v1 ) ライセンス: Link先を確認	Shuang Wang,	(参考訳) 半教師対象検出タスクに適したアンバイアスド教師モデルの強化であるローワーバイアスド教師モデルを提案する。このモデルの主な革新は、教師モデルへのローカライズ損失の統合であり、擬似ラベル生成の精度を大幅に向上させる。クラス不均衡やバウンディングボックスの精度といった重要な問題に対処することにより、ローワーバイアスト・教師・モデルはオブジェクト検出タスクにおいて優れたパフォーマンスを示す。複数の半教師対象検出データセットに対する大規模な実験により、下バイアス教師モデルは、クラス不均衡に起因する擬似ラベルバイアスを低減させるだけでなく、不正な境界ボックスによる誤りを緩和することが示された。その結果,既存の手法と比較して,mAPスコアが向上し,信頼性の高い検出結果が得られることがわかった。本研究は,精度の高い擬似ラベル生成の重要性を浮き彫りにして,半教師あり学習におけるオブジェクト検出のための堅牢なフレームワークを提供する。 I present the Lower Biased Teacher model, an enhancement of the Unbiased Teacher model, specifically tailored for semi-supervised object detection tasks. The primary innovation of this model is the integration of a localization loss into the teacher model, which significantly improves the accuracy of pseudo-label generation. By addressing key issues such as class imbalance and the precision of bounding boxes, the Lower Biased Teacher model demonstrates superior performance in object detection tasks. Extensive experiments on multiple semi-supervised object detection datasets show that the Lower Biased Teacher model not only reduces the pseudo-labeling bias caused by class imbalances but also mitigates errors arising from incorrect bounding boxes. As a result, the model achieves higher mAP scores and more reliable detection outcomes compared to existing methods. This research underscores the importance of accurate pseudo-label generation and provides a robust framework for future advancements in semi-supervised learning for object detection.	翻訳日:2024-11-05 21:29:26 公開日:2024-10-04
# 半監督対象検出における低バイアス教師モデルの適用 Applying the Lower-Biased Teacher Model in Semi-Supervised Object Detection ( http://arxiv.org/abs/2409.19703v2 ) ライセンス: Link先を確認	Shuang Wang,	(参考訳) 半教師対象検出タスクに適したアンバイアスド教師モデルの強化であるローワーバイアスド教師モデルを提案する。このモデルの主な革新は、教師モデルへのローカライズ損失の統合であり、擬似ラベル生成の精度を大幅に向上させる。クラス不均衡やバウンディングボックスの精度といった重要な問題に対処することにより、ローワーバイアスト・教師・モデルはオブジェクト検出タスクにおいて優れたパフォーマンスを示す。複数の半教師対象検出データセットに対する大規模な実験により、下バイアス教師モデルは、クラス不均衡に起因する擬似ラベルバイアスを低減させるだけでなく、不正な境界ボックスによる誤りを緩和することが示された。その結果,既存の手法と比較して,mAPスコアが向上し,信頼性の高い検出結果が得られることがわかった。本研究は,精度の高い擬似ラベル生成の重要性を浮き彫りにして,半教師あり学習におけるオブジェクト検出のための堅牢なフレームワークを提供する。 I present the Lower Biased Teacher model, an enhancement of the Unbiased Teacher model, specifically tailored for semi-supervised object detection tasks. The primary innovation of this model is the integration of a localization loss into the teacher model, which significantly improves the accuracy of pseudo-label generation. By addressing key issues such as class imbalance and the precision of bounding boxes, the Lower Biased Teacher model demonstrates superior performance in object detection tasks. Extensive experiments on multiple semi-supervised object detection datasets show that the Lower Biased Teacher model not only reduces the pseudo-labeling bias caused by class imbalances but also mitigates errors arising from incorrect bounding boxes. As a result, the model achieves higher mAP scores and more reliable detection outcomes compared to existing methods. This research underscores the importance of accurate pseudo-label generation and provides a robust framework for future advancements in semi-supervised learning for object detection.	翻訳日:2024-11-05 21:29:26 公開日:2024-10-04
# Coffee-Gym: 誤ったコードに対する自然言語フィードバックの評価と改善のための環境 Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code ( http://arxiv.org/abs/2409.19715v1 ) ライセンス: Link先を確認	Hyungjoo Chae, Taeyoon Kwon, Seungjun Moon, Yongho Song, Dongjin Kang, Kai Tzu-iunn Ong, Beong-woo Kwak, Seonghyeon Bae, Seung-won Hwang, Jinyoung Yeo,	(参考訳) 本稿では、コード編集のフィードバックを提供する訓練モデルのための総合的なRL環境であるCoffee-Gymについて述べる。 Coffee-Gymには,(1)人間のコード編集トレースを含むデータセットであるCoffee,(2)誤ったコード編集のための機械によるフィードバックを含むデータセットであるCoffeeEval,(2)修正されたコードのパフォーマンスをユニットテストで評価することで,フィードバックの有用性を忠実に反映する報酬関数であるCoffeeEvalが含まれる。それらとともに、Coffee-Gymは、RLでフィードバックモデルをトレーニングするための高品質データセットの有効性に対処し、SOTA報酬モデル(すなわちGPT-4)よりも正確な報酬を提供する。 Coffee-Gymを適用することで、オープンソースのLLMのコード編集の強化において、ベースラインよりも優れたフィードバックモデルを求め、それをクローズドソースのLLMに匹敵するものにする。データセットとモデルチェックポイントを公開しています。 This paper presents Coffee-Gym, a comprehensive RL environment for training models that provide feedback on code editing. Coffee-Gym includes two major components: (1) Coffee, a dataset containing humans' code edit traces for coding questions and machine-written feedback for editing erroneous code; (2) CoffeeEval, a reward function that faithfully reflects the helpfulness of feedback by assessing the performance of the revised code in unit tests. With them, Coffee-Gym addresses the unavailability of high-quality datasets for training feedback models with RL, and provides more accurate rewards than the SOTA reward model (i.e., GPT-4). By applying Coffee-Gym, we elicit feedback models that outperform baselines in enhancing open-source code LLMs' code editing, making them comparable with closed-source LLMs. We make the dataset and the model checkpoint publicly available.	翻訳日:2024-11-05 21:29:26 公開日:2024-10-04
# Coffee-Gym: 誤ったコードに対する自然言語フィードバックの評価と改善のための環境 Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code ( http://arxiv.org/abs/2409.19715v2 ) ライセンス: Link先を確認	Hyungjoo Chae, Taeyoon Kwon, Seungjun Moon, Yongho Song, Dongjin Kang, Kai Tzu-iunn Ong, Beong-woo Kwak, Seonghyeon Bae, Seung-won Hwang, Jinyoung Yeo,	(参考訳) 本稿では、コード編集に対するフィードバックを提供するモデルを訓練するための総合的なRL環境であるCoffee-Gymについて述べる。 Coffee-Gymには,(1)コーディング問題に対する人間のコード編集トレースと,誤ったコードを編集するための機械生成フィードバックを含むデータセットであるCoffee,(2)修正されたコードのパフォーマンスをユニットテストで評価することで,フィードバックの有用性を忠実に反映する報酬関数であるCoffeeEvalの2つの主要コンポーネントが含まれる。これらにより、Coffee-Gymは、RLでフィードバックモデルをトレーニングするための高品質データセットが入手できないという問題に対処し、SOTA報酬モデル(すなわちGPT-4)よりも正確な報酬を提供する。 Coffee-Gymを適用することで、オープンソースのコードLLMのコード編集の強化においてベースラインを上回るフィードバックモデルが得られ、クローズドソースのLLMに匹敵する性能となる。データセットとモデルチェックポイントを公開している。 This paper presents Coffee-Gym, a comprehensive RL environment for training models that provide feedback on code editing. Coffee-Gym includes two major components: (1) Coffee, a dataset containing humans' code edit traces for coding questions and machine-written feedback for editing erroneous code; (2) CoffeeEval, a reward function that faithfully reflects the helpfulness of feedback by assessing the performance of the revised code in unit tests. With them, Coffee-Gym addresses the unavailability of high-quality datasets for training feedback models with RL, and provides more accurate rewards than the SOTA reward model (i.e., GPT-4). By applying Coffee-Gym, we elicit feedback models that outperform baselines in enhancing open-source code LLMs' code editing, making them comparable with closed-source LLMs. We make the dataset and the model checkpoint publicly available.	翻訳日:2024-11-05 21:29:26 公開日:2024-10-04
# 指紋の質と人口動態に関する大規模運用研究 A large-scale operational study of fingerprint quality and demographics ( http://arxiv.org/abs/2409.19992v1 ) ライセンス: Link先を確認	Javier Galbally, Aleksandrs Cepilovs, Ramon Blanco-Gonzalo, Gillian Ormiston, Oscar Miguel-Hurtado, Istvan Sz. Racz,	(参考訳) 特定の人口集団に対する指紋認識技術の性能にはある程度の偏りがあるが、性別、年齢、指型などの特定の要因が指紋の品質や指紋照合精度に与える影響を理解するための十分な証拠は残っていない。本研究は、約16,000人の被験者の10プリントインプレッションを含む大規模運用データのデータベース上で、まだ研究中の課題に対処する。以上の結果から, 指紋品質と人口動態の依存性についてさらなる知見が得られ, 実際に, 個体群の異なる部分を対象とした指紋認識システムには, ある程度の性能変動が存在することが示唆された。実験的な評価に基づき、研究は、データ駆動による証拠に基づく新しい観察を指摘し、そのような観察を説明するための妥当な仮説を提供し、観察された指紋品質の違いを減らすのに役立つ潜在的なフォローアップ行動で結論付ける。このようにして、本論文は、バイオメトリック技術のアルゴリズム的公正性と等価性をさらに高めるための貢献とみなすことができる。 Even though a few initial works have shown on small sets of data some level of bias in the performance of fingerprint recognition technology with respect to certain demographic groups, there is still not sufficient evidence to understand the impact that certain factors such as gender, age or finger-type may have on fingerprint quality and, in turn, also on fingerprint matching accuracy. The present work addresses this still under researched topic, on a large-scale database of operational data containing 10-print impressions of almost 16,000 subjects. The results reached provide further insight into the dependency of fingerprint quality and demographics, and show that there in fact exists a certain degree of performance variability in fingerprint-based recognition systems for different segments of the population. Based on the experimental evaluation, the work points out new observations based on data-driven evidence, provides plausible hypotheses to explain such observations, and concludes with potential follow-up actions that can help to reduce the observed fingerprint quality differences. This way, the current paper can be considered as a contribution to further increase the algorithmic fairness and equality of biometric technology.	翻訳日:2024-11-05 16:18:02 公開日:2024-10-04
# 指紋の質と人口動態に関する大規模運用研究 A large-scale operational study of fingerprint quality and demographics ( http://arxiv.org/abs/2409.19992v2 ) ライセンス: Link先を確認	Javier Galbally, Aleksandrs Cepilovs, Ramon Blanco-Gonzalo, Gillian Ormiston, Oscar Miguel-Hurtado, Istvan Sz. Racz,	(参考訳) 特定の人口集団に対する指紋認識技術の性能にはある程度の偏りがあるが、性別、年齢、指型などの特定の要因が指紋の品質や指紋照合精度に与える影響を理解するための十分な証拠は残っていない。本研究は、約16,000人の被験者の10プリントインプレッションを含む大規模運用データのデータベース上で、まだ研究中の課題に対処する。以上の結果から, 指紋品質と人口動態の依存性についてさらなる知見が得られ, 実際に, 個体群の異なる部分を対象とした指紋認識システムには, ある程度の性能変動が存在することが示唆された。実験的な評価に基づき、研究は、データ駆動による証拠に基づく新しい観察を指摘し、そのような観察を説明するための妥当な仮説を提供し、観察された指紋品質の違いを減らすのに役立つ潜在的なフォローアップ行動で結論付ける。このようにして、本論文は、バイオメトリック技術のアルゴリズム的公正性と等価性をさらに高めるための貢献とみなすことができる。 Even though a few initial works have shown on small sets of data some level of bias in the performance of fingerprint recognition technology with respect to certain demographic groups, there is still not sufficient evidence to understand the impact that certain factors such as gender, age or finger-type may have on fingerprint quality and, in turn, also on fingerprint matching accuracy. The present work addresses this still under researched topic, on a large-scale database of operational data containing 10-print impressions of almost 16,000 subjects. The results reached provide further insight into the dependency of fingerprint quality and demographics, and show that there in fact exists a certain degree of performance variability in fingerprint-based recognition systems for different segments of the population. Based on the experimental evaluation, the work points out new observations based on data-driven evidence, provides plausible hypotheses to explain such observations, and concludes with potential follow-up actions that can help to reduce the observed fingerprint quality differences. This way, the current paper can be considered as a contribution to further increase the algorithmic fairness and equality of biometric technology.	翻訳日:2024-11-05 16:18:02 公開日:2024-10-04
# VideoINSTA: LLMを用いたインフォーマティブ空間時間推論によるゼロショット長ビデオ理解 VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs ( http://arxiv.org/abs/2409.20365v2 ) ライセンス: Link先を確認	Ruotong Liao, Max Erler, Huiyu Wang, Guangyao Zhai, Gengyuan Zhang, Yunpu Ma, Volker Tresp,	(参考訳) ビデオ言語領域では、ビデオ理解のためのゼロショットのLarge Language Modelベースの推論を利用した最近の研究が、従来のエンドツーエンドモデルの有力な競合手法となっている。しかし、長いビデオ理解は、ゼロショットLLMベースのアプローチであっても、長い時間幅にわたる推論の複雑さのために、ユニークな課題を呈している。長ビデオにおける情報冗長性の課題は、大規模言語モデル(LLM)にどのような情報が必要なのか、そしてそれを長期ビデオ解析における複雑な時空間推論にどのように活用するかという問題を引き起こす。本稿では,ゼロショット長ビデオ理解のためのフレームワークVideoINSTA(INformative Spatial-TemporAl Reasoning)を提案する。 VideoINSTAは,(1)LLMを用いた長時間ビデオ理解のためのゼロショットフレームワーク,(2)LLMがビデオ内の時空間情報を推論するためのイベントベースの時間的推論とコンテンツに基づく空間的推論アプローチ,(3)情報充足性と予測信頼度に基づく時間的要因のバランスをとる自己反射的情報推論スキームを提供する。本モデルは、EgoSchema、NextQA、IntentQAの3つの長いビデオ質問応答ベンチマークと、オープンな質問応答データセットActivityNetQAにおいて、最先端の性能を大幅に改善する。コードは、https://github.com/mayhugotong/VideoINSTA で公開されている。 In the video-language domain, recent work leveraging zero-shot Large Language Model-based reasoning for video understanding has become a competitive challenger to previous end-to-end models. However, long video understanding presents unique challenges due to the complexity of reasoning over extended timespans, even for zero-shot LLM-based approaches. The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage it for complex spatial-temporal reasoning in long-form video analysis. We propose VideoINSTA, a framework for INformative Spatial-TemporAl Reasoning for zero-shot long-form video understanding. VideoINSTA contributes (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach that lets LLMs reason over spatial-temporal information in videos; (3) a self-reflective information reasoning scheme balancing temporal factors based on information sufficiency and prediction confidence. Our model significantly improves the state of the art on three long video question-answering benchmarks, EgoSchema, NextQA, and IntentQA, and on the open question answering dataset ActivityNetQA. The code is released here: https://github.com/mayhugotong/VideoINSTA.	翻訳日:2024-11-05 15:48:47 公開日:2024-10-04
# M2Distill: 一生学習のためのマルチモーダル蒸留 M2Distill: Multi-Modal Distillation for Lifelong Imitation Learning ( http://arxiv.org/abs/2410.00064v1 ) ライセンス: Link先を確認	Kaushik Roy, Akila Dissanayake, Brendan Tidd, Peyman Moghadam,	(参考訳) 操作タスクに対する生涯の模倣学習は、漸進的な学習ステップで発生する分散シフトによって大きな課題を生じさせる。既存の手法はしばしば教師なしのスキル発見に焦点を合わせ、成長を続けるスキルライブラリを構築したり、複数のポリシーから蒸留したりすることで、多様な操作タスクが継続的に導入され、学習プロセスを通して一貫した潜伏空間を確保するのに失敗するなど、スケーラビリティの問題につながる可能性がある。本稿では,マルチモーダル蒸留を用いた生涯模擬学習手法であるM2Distillを紹介し,学習過程全体を通して視覚,言語,行動分布を一貫した潜伏空間を保存することに着目した。従来の段階から現在の段階にまたがる潜在表現の変化を規制し、連続的な学習ステップ間のガウス混合モデル(GMM)ポリシーの相違を低減させることにより、学習方針は、新しいスキルをシームレスに統合しながら、学習済みのタスクを実行する能力を維持する。 LIBERO-OBJECT, LIBERO-GOAL, LIBERO-SPATIALなど, 寿命の長い模擬学習ベンチマークスイートの大規模な評価は, 評価指標のすべてにおいて, 従来手法よりも常に優れていたことを示す。 Lifelong imitation learning for manipulation tasks poses significant challenges due to distribution shifts that occur in incremental learning steps. Existing methods often focus on unsupervised skill discovery to construct an ever-growing skill library or distillation from multiple policies, which can lead to scalability issues as diverse manipulation tasks are continually introduced and may fail to ensure a consistent latent space throughout the learning process, leading to catastrophic forgetting of previously learned skills. In this paper, we introduce M2Distill, a multi-modal distillation-based method for lifelong imitation learning focusing on preserving consistent latent space across vision, language, and action distributions throughout the learning process. By regulating the shifts in latent representations across different modalities from previous to current steps, and reducing discrepancies in Gaussian Mixture Model (GMM) policies between consecutive learning steps, we ensure that the learned policy retains its ability to perform previously learned tasks while seamlessly integrating new skills. 
Extensive evaluations on the LIBERO lifelong imitation learning benchmark suites, including LIBERO-OBJECT, LIBERO-GOAL, and LIBERO-SPATIAL, demonstrate that our method consistently outperforms prior state-of-the-art methods across all evaluated metrics.	翻訳日:2024-11-05 15:09:43 公開日:2024-10-04
# M2Distill: 一生学習のためのマルチモーダル蒸留 M2Distill: Multi-Modal Distillation for Lifelong Imitation Learning ( http://arxiv.org/abs/2410.00064v2 ) ライセンス: Link先を確認	Kaushik Roy, Akila Dissanayake, Brendan Tidd, Peyman Moghadam,	(参考訳) 操作タスクに対する生涯の模倣学習は、漸進的な学習ステップで発生する分散シフトによって大きな課題を生じさせる。既存の手法はしばしば教師なしのスキル発見に焦点を合わせ、成長を続けるスキルライブラリを構築したり、複数のポリシーから蒸留したりすることで、多様な操作タスクが継続的に導入され、学習プロセスを通して一貫した潜伏空間を確保するのに失敗するなど、スケーラビリティの問題につながる可能性がある。本稿では,マルチモーダル蒸留を用いた生涯模擬学習手法であるM2Distillを紹介し,学習過程全体を通して視覚,言語,行動分布を一貫した潜伏空間を保存することに着目した。従来の段階から現在の段階にまたがる潜在表現の変化を規制し、連続的な学習ステップ間のガウス混合モデル(GMM)ポリシーの相違を低減させることにより、学習方針は、新しいスキルをシームレスに統合しながら、学習済みのタスクを実行する能力を維持する。 LIBERO-OBJECT, LIBERO-GOAL, LIBERO-SPATIALなど, 寿命の長い模擬学習ベンチマークスイートの大規模な評価は, 評価指標のすべてにおいて, 従来手法よりも常に優れていたことを示す。 Lifelong imitation learning for manipulation tasks poses significant challenges due to distribution shifts that occur in incremental learning steps. Existing methods often focus on unsupervised skill discovery to construct an ever-growing skill library or distillation from multiple policies, which can lead to scalability issues as diverse manipulation tasks are continually introduced and may fail to ensure a consistent latent space throughout the learning process, leading to catastrophic forgetting of previously learned skills. In this paper, we introduce M2Distill, a multi-modal distillation-based method for lifelong imitation learning focusing on preserving consistent latent space across vision, language, and action distributions throughout the learning process. By regulating the shifts in latent representations across different modalities from previous to current steps, and reducing discrepancies in Gaussian Mixture Model (GMM) policies between consecutive learning steps, we ensure that the learned policy retains its ability to perform previously learned tasks while seamlessly integrating new skills. 
Extensive evaluations on the LIBERO lifelong imitation learning benchmark suites, including LIBERO-OBJECT, LIBERO-GOAL, and LIBERO-SPATIAL, demonstrate that our method consistently outperforms prior state-of-the-art methods across all evaluated metrics.	翻訳日:2024-11-05 15:09:43 公開日:2024-10-04
# 限られたデータによる深層モデル解釈 : コアセットに基づくアプローチ Deep Model Interpretation with Limited Data : A Coreset-based Approach ( http://arxiv.org/abs/2410.00524v1 ) ライセンス: Link先を確認	Hamed Behzadi-Khormouji, José Oramas,	(参考訳) モデル解釈は、訓練されたモデルの内部から洞察を抽出することを目的としている。この課題に対処する一般的なアプローチは、適切な操作に欠かせないモデルで内部的に符号化された関連する機能の特徴づけである。これらの手法は近年進歩しているものの、データセットを密に評価する必要があるため、計算コストが高いという弱点がある。その結果、これらの手法の設計に関する研究は、より小さなデータサブセットに焦点を合わせており、洞察の減少につながる可能性がある。これらの計算コストに対処するために,コアセット選択手法を用いて,大規模なデータセットの代表的なサブセットを抽出するコアセットベースの解釈フレームワークを提案する。そこで本稿では,入力データ量に対するモデル解釈手法のロバスト性を評価するための類似性に基づく評価プロトコルを提案する。いくつかの解釈法、DNNモデル、コアセット選択法を考慮した実験は、提案手法の有効性を示す。 Model interpretation aims to extract insights from the internals of a trained model. A common approach to this task is the characterization of relevant features internally encoded in the model that are critical for its proper operation. Despite recent progress, these methods are computationally expensive because they require dense evaluation of datasets. As a consequence, research on the design of these methods has focused on smaller data subsets, which may lead to reduced insights. To address these computational costs, we propose a coreset-based interpretation framework that utilizes coreset selection methods to sample a representative subset of the large dataset for the interpretation task. Towards this goal, we propose a similarity-based evaluation protocol to assess the robustness of model interpretation methods to the amount of data they take as input. Experiments considering several interpretation methods, DNN models, and coreset selection methods show the effectiveness of the proposed framework.	翻訳日:2024-11-05 05:07:10 公開日:2024-10-04
# 限られたデータによる深層モデル解釈 : コアセットに基づくアプローチ Deep Model Interpretation with Limited Data : A Coreset-based Approach ( http://arxiv.org/abs/2410.00524v2 ) ライセンス: Link先を確認	Hamed Behzadi-Khormouji, José Oramas,	(参考訳) モデル解釈は、訓練されたモデルの内部から洞察を抽出することを目的としている。この課題に対処する一般的なアプローチは、適切な操作に欠かせないモデルで内部的に符号化された関連する機能の特徴づけである。これらの手法は近年進歩しているものの、データセットを密に評価する必要があるため、計算コストが高いという弱点がある。その結果、これらの手法の設計に関する研究は、より小さなデータサブセットに焦点を合わせており、洞察の減少につながる可能性がある。これらの計算コストに対処するために,コアセット選択手法を用いて,大規模なデータセットの代表的なサブセットを抽出するコアセットベースの解釈フレームワークを提案する。そこで本稿では,入力データ量に対するモデル解釈手法のロバスト性を評価するための類似性に基づく評価プロトコルを提案する。いくつかの解釈法、DNNモデル、コアセット選択法を考慮した実験は、提案手法の有効性を示す。 Model interpretation aims to extract insights from the internals of a trained model. A common approach to this task is the characterization of relevant features internally encoded in the model that are critical for its proper operation. Despite recent progress, these methods are computationally expensive because they require dense evaluation of datasets. As a consequence, research on the design of these methods has focused on smaller data subsets, which may lead to reduced insights. To address these computational costs, we propose a coreset-based interpretation framework that utilizes coreset selection methods to sample a representative subset of the large dataset for the interpretation task. Towards this goal, we propose a similarity-based evaluation protocol to assess the robustness of model interpretation methods to the amount of data they take as input. Experiments considering several interpretation methods, DNN models, and coreset selection methods show the effectiveness of the proposed framework.	翻訳日:2024-11-05 05:07:10 公開日:2024-10-04
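The abstract above does not say which coreset selection methods are used. As a hedged illustration only, the sketch below implements k-center greedy (farthest-point) selection, one common coreset heuristic. The function name `k_center_greedy`, the toy 2-D feature vectors, and the choice of the starting point are assumptions made for this example, not details from the paper.

```python
import math


def k_center_greedy(points, k, first=0):
    """Pick k indices so that every point is close to some picked point.

    Assumes 1 <= k <= len(points). `first` is the (arbitrary) starting index.
    """
    selected = [first]
    # Distance from each point to its nearest selected point so far.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        # Take the point that is currently worst covered.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        selected.append(nxt)
        nearest = [
            min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)
        ]
    return selected


# Hypothetical feature vectors: two tight clusters and one isolated point.
features = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0), (5.0, 5.0)]
subset = k_center_greedy(features, k=3)
print(subset)
```

On this toy input the selection keeps one representative per cluster plus the isolated point, which is the behaviour one would want from a subset that stands in for the full dataset during interpretation.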
# VideoCLIP-XL:ビデオCLIPモデルの長文記述理解の改善 VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models ( http://arxiv.org/abs/2410.00741v1 ) ライセンス: Link先を確認	Jiapeng Wang, Chengyu Wang, Kunzhe Huang, Jun Huang, Lianwen Jin,	(参考訳) Contrastive Language-Image Pre-Training (CLIP) は広く研究され、多くの応用に応用されている。しかし、事前トレーニング中の短い要約テキストに重点を置いているため、CLIPは長い記述を理解することができない。この問題は、ビデオが豊富な詳細コンテンツを含んでいることを考えると、特に鋭い。本稿では,ビデオCLIPモデルの長文理解能力を解き放つことを目的とした,ビデオCLIP-XL(eXtra Length)モデルを提案する。まず、自動データ収集システムを構築し、VIdeoとLong-Descriptionのペアで大規模なVILD事前学習データセットを収集する。次に,テキスト類似性誘導型プライマリコンポーネントマッチング(TPCM)を提案し,長文記述能力を拡張しながら特徴空間の分布をよりよく学習する。また,より理解を深めるために,Detail-aware Description Ranking (DDR) と Hallucination-aware Description Ranking (HDR) という2つの新しいタスクを導入した。最後に,Long Video Description Ranking (LVDR) ベンチマークを構築し,より包括的にLong Video Description Ranking (LVDR) を評価する。長文と短文を併用した広範に使用されているテキストビデオ検索ベンチマークとLVDRベンチマークの大規模な実験結果により,本手法の有効性が明らかとなった。 Contrastive Language-Image Pre-training (CLIP) has been widely studied and applied in numerous applications. However, the emphasis on brief summary texts during pre-training prevents CLIP from understanding long descriptions. This issue is particularly acute regarding videos given that videos often contain abundant detailed contents. In this paper, we propose the VideoCLIP-XL (eXtra Length) model, which aims to unleash the long-description understanding capability of video CLIP models. Firstly, we establish an automatic data collection system and gather a large-scale VILD pre-training dataset with VIdeo and Long-Description pairs. Then, we propose Text-similarity-guided Primary Component Matching (TPCM) to better learn the distribution of feature space while expanding the long description capability. We also introduce two new tasks namely Detail-aware Description Ranking (DDR) and Hallucination-aware Description Ranking (HDR) for further understanding improvement. 
Finally, we construct a Long Video Description Ranking (LVDR) benchmark for evaluating the long-description capability more comprehensively. Extensive experiments on widely used text-video retrieval benchmarks with both short and long descriptions, and on our LVDR benchmark, demonstrate the effectiveness of our method.	翻訳日:2024-11-05 04:05:39 公開日:2024-10-04
# VideoCLIP-XL:ビデオCLIPモデルの長文記述理解の改善 VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models ( http://arxiv.org/abs/2410.00741v2 ) ライセンス: Link先を確認	Jiapeng Wang, Chengyu Wang, Kunzhe Huang, Jun Huang, Lianwen Jin,	(参考訳) Contrastive Language-Image Pre-Training (CLIP) は広く研究され、多くの応用に応用されている。しかし、事前トレーニング中の短い要約テキストに重点を置いているため、CLIPは長い記述を理解することができない。この問題は、ビデオが豊富な詳細コンテンツを含んでいることを考えると、特に鋭い。本稿では,ビデオCLIPモデルの長文理解能力を解き放つことを目的とした,ビデオCLIP-XL(eXtra Length)モデルを提案する。まず、自動データ収集システムを構築し、VIdeoとLong-Descriptionのペアで大規模なVILD事前学習データセットを収集する。次に,テキスト類似性誘導型プライマリコンポーネントマッチング(TPCM)を提案し,長文記述能力を拡張しながら特徴空間の分布をよりよく学習する。また,より理解を深めるために,Detail-aware Description Ranking (DDR) と Hallucination-aware Description Ranking (HDR) という2つの新しいタスクを導入した。最後に,Long Video Description Ranking (LVDR) ベンチマークを構築し,より包括的にLong Video Description Ranking (LVDR) を評価する。長文と短文を併用した広範に使用されているテキストビデオ検索ベンチマークとLVDRベンチマークの大規模な実験結果により,本手法の有効性が明らかとなった。 Contrastive Language-Image Pre-training (CLIP) has been widely studied and applied in numerous applications. However, the emphasis on brief summary texts during pre-training prevents CLIP from understanding long descriptions. This issue is particularly acute regarding videos given that videos often contain abundant detailed contents. In this paper, we propose the VideoCLIP-XL (eXtra Length) model, which aims to unleash the long-description understanding capability of video CLIP models. Firstly, we establish an automatic data collection system and gather a large-scale VILD pre-training dataset with VIdeo and Long-Description pairs. Then, we propose Text-similarity-guided Primary Component Matching (TPCM) to better learn the distribution of feature space while expanding the long description capability. We also introduce two new tasks namely Detail-aware Description Ranking (DDR) and Hallucination-aware Description Ranking (HDR) for further understanding improvement. 
Finally, we construct a Long Video Description Ranking (LVDR) benchmark for evaluating the long-description capability more comprehensively. Extensive experiments on widely used text-video retrieval benchmarks with both short and long descriptions, and on our LVDR benchmark, demonstrate the effectiveness of our method.	翻訳日:2024-11-05 04:05:39 公開日:2024-10-04
# VHASR:視覚ホットワードを用いたマルチモーダル音声認識システム VHASR: A Multimodal Speech Recognition System With Vision Hotwords ( http://arxiv.org/abs/2410.00822v1 ) ライセンス: Link先を確認	Jiliang Hu, Zuchao Li, Ping Wang, Haojun Ai, Lefei Zhang, Hai Zhao,	(参考訳) 画像に基づくマルチモーダル音声認識(ASR)モデルは、音声関連画像を組み込んだ音声認識性能を向上させる。しかし、モデルに画像情報を導入することは、ASRの性能向上に寄与しない、という研究もある。本稿では,音声関連画像情報を活用した新しい手法を提案し,視覚をホットワードとして利用するマルチモーダル音声認識システムVHASRを提案する。本システムでは,まず2つのストリームのテキストを別々に書き起こし,出力を合成する。提案したモデルをFlickr8k,ADE20k,COCO,OpenImagesの4つのデータセットで評価した。実験の結果,VHASRは画像のキー情報を効果的に活用し,モデルの音声認識能力を向上できることがわかった。既存の画像ベースマルチモーダル ASR の中で,その性能は単調な ASR を上回るだけでなく,SOTA も達成している。 The image-based multimodal automatic speech recognition (ASR) model enhances speech recognition performance by incorporating audio-related image. However, some works suggest that introducing image information to model does not help improving ASR performance. In this paper, we propose a novel approach effectively utilizing audio-related image information and set up VHASR, a multimodal speech recognition system that uses vision as hotwords to strengthen the model's speech recognition capability. Our system utilizes a dual-stream architecture, which firstly transcribes the text on the two streams separately, and then combines the outputs. We evaluate the proposed model on four datasets: Flickr8k, ADE20k, COCO, and OpenImages. The experimental results show that VHASR can effectively utilize key information in images to enhance the model's speech recognition ability. Its performance not only surpasses unimodal ASR, but also achieves SOTA among existing image-based multimodal ASR.	翻訳日:2024-11-05 03:55:54 公開日:2024-10-04
# VHASR:視覚ホットワードを用いたマルチモーダル音声認識システム VHASR: A Multimodal Speech Recognition System With Vision Hotwords ( http://arxiv.org/abs/2410.00822v2 ) ライセンス: Link先を確認	Jiliang Hu, Zuchao Li, Ping Wang, Haojun Ai, Lefei Zhang, Hai Zhao,	(参考訳) 画像に基づくマルチモーダル音声認識(ASR)モデルは、音声関連画像を組み込んだ音声認識性能を向上させる。しかし、モデルに画像情報を導入することは、ASRの性能向上に寄与しない、という研究もある。本稿では,音声関連画像情報を活用した新しい手法を提案し,視覚をホットワードとして利用するマルチモーダル音声認識システムVHASRを提案する。本システムでは,まず2つのストリームのテキストを別々に書き起こし,出力を合成する。提案したモデルをFlickr8k,ADE20k,COCO,OpenImagesの4つのデータセットで評価した。実験の結果,VHASRは画像のキー情報を効果的に活用し,モデルの音声認識能力を向上できることがわかった。既存の画像ベースマルチモーダル ASR の中で,その性能は単調な ASR を上回るだけでなく,SOTA も達成している。 The image-based multimodal automatic speech recognition (ASR) model enhances speech recognition performance by incorporating audio-related image. However, some works suggest that introducing image information to model does not help improving ASR performance. In this paper, we propose a novel approach effectively utilizing audio-related image information and set up VHASR, a multimodal speech recognition system that uses vision as hotwords to strengthen the model's speech recognition capability. Our system utilizes a dual-stream architecture, which firstly transcribes the text on the two streams separately, and then combines the outputs. We evaluate the proposed model on four datasets: Flickr8k, ADE20k, COCO, and OpenImages. The experimental results show that VHASR can effectively utilize key information in images to enhance the model's speech recognition ability. Its performance not only surpasses unimodal ASR, but also achieves SOTA among existing image-based multimodal ASR.	翻訳日:2024-11-05 03:55:54 公開日:2024-10-04
# 長軸マニピュレーションタスクのための安定力学系のシングルショット学習 Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks ( http://arxiv.org/abs/2410.01033v1 ) ライセンス: Link先を確認	Alexandre St-Aubin, Amin Abyaneh, Hsiu-Chin Lin,	(参考訳) 複雑なシーケンシャルなタスクをマスターすることは、ロボティクスにおいて重要な課題である。長距離操作タスクの学習は進歩してきたが、既存のほとんどのアプローチは信頼性と成功を保証するための厳密な数学的保証を欠いている。本稿では,課題達成率の向上に焦点をあて,必要となるトレーニングデータの量を削減することを目的とした,長期的タスクの学習と安定政策に関するこれまでの取り組みを拡張する。提案手法では,(1)経路ポイントとサブゴールによって定義された離散的なステップに分割し,(2)知覚ノイズやランダムな乱れに直面した場合でも,ロボットを各サブゴールに誘導するグローバルな動的システムポリシーを学習する。シミュレーションと実世界の両方の実験を通して,本手法を検証し,シミュレーションから物理ロボットプラットフォームへの効果的移行を実証した。コードはhttps://github.com/Alestaubin/stable-imitation-policy-with-waypointsで公開されている。 Mastering complex sequential tasks continues to pose a significant challenge in robotics. While there has been progress in learning long-horizon manipulation tasks, most existing approaches lack rigorous mathematical guarantees for ensuring reliable and successful execution. In this paper, we extend previous work on learning long-horizon tasks and stable policies, focusing on improving task success rates while reducing the amount of training data needed. Our approach introduces a novel method that (1) segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals, and (2) learns globally stable dynamical system policies to guide the robot to each subgoal, even in the face of sensory noise and random disturbances. We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms. Code is available at https://github.com/Alestaubin/stable-imitation-policy-with-waypoints	翻訳日:2024-11-04 23:40:11 公開日:2024-10-04
# 長軸マニピュレーションタスクのための安定力学系のシングルショット学習 Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks ( http://arxiv.org/abs/2410.01033v2 ) ライセンス: Link先を確認	Alexandre St-Aubin, Amin Abyaneh, Hsiu-Chin Lin,	(参考訳) 複雑なシーケンシャルなタスクをマスターすることは、ロボティクスにおいて重要な課題である。長距離操作タスクの学習は進歩してきたが、既存のほとんどのアプローチは信頼性と成功を保証するための厳密な数学的保証を欠いている。本稿では,課題達成率の向上に焦点をあて,必要となるトレーニングデータの量を削減することを目的とした,長期的タスクの学習と安定政策に関するこれまでの取り組みを拡張する。提案手法では,(1)経路ポイントとサブゴールによって定義された離散的なステップに分割し,(2)知覚ノイズやランダムな乱れに直面した場合でも,ロボットを各サブゴールに誘導するグローバルな動的システムポリシーを学習する。シミュレーションと実世界の両方の実験を通して,本手法を検証し,シミュレーションから物理ロボットプラットフォームへの効果的移行を実証した。コードはhttps://github.com/Alestaubin/stable-imitation-policy-with-waypointsで公開されている。 Mastering complex sequential tasks continues to pose a significant challenge in robotics. While there has been progress in learning long-horizon manipulation tasks, most existing approaches lack rigorous mathematical guarantees for ensuring reliable and successful execution. In this paper, we extend previous work on learning long-horizon tasks and stable policies, focusing on improving task success rates while reducing the amount of training data needed. Our approach introduces a novel method that (1) segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals, and (2) learns globally stable dynamical system policies to guide the robot to each subgoal, even in the face of sensory noise and random disturbances. We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms. Code is available at https://github.com/Alestaubin/stable-imitation-policy-with-waypoints	翻訳日:2024-11-04 23:30:27 公開日:2024-10-04
# 必要なRNNは全部あるのか? Were RNNs All We Needed? ( http://arxiv.org/abs/2410.01201v1 ) ライセンス: Link先を確認	Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadegh,	(参考訳) シーケンス長に関するトランスフォーマーのスケーラビリティ制限は、トレーニング中に並列化可能なリカレントシーケンスモデルに新たな関心を寄せている。その結果、S4、Mamba、Aarenといった新しい再並行アーキテクチャが、同等のパフォーマンスを実現するために提案されている。本研究では、従来のリカレントニューラルネットワーク(RNN)を10年以上前のLSTM(1997年)とGRU(2014年)で再検討する。これらのモデルは,時間的バックプロパゲーション(BPTT)を必要とするため遅いが,入力から隠れた状態依存を取り除くことで,LSTMやGRUはBPTTを必要とせず,並列で効率的に訓練できることを示す。これに基づいて,(1)従来のパラメータよりもはるかに少ないパラメータを使用する最小バージョン (minLSTMs と minGRUs) を導入し,(2) はトレーニング中に完全に並列化可能である(長さ512のシーケンスでは175倍高速)。最後に、これらの10年前のRNNの取り除かれたバージョンは、最近のシーケンスモデルの実証的な性能と一致していることを示す。 The scalability limitations of Transformers regarding sequence length have renewed interest in recurrent sequence models that are parallelizable during training. As a result, many novel recurrent architectures, such as S4, Mamba, and Aaren, have been proposed that achieve comparable performance. In this work, we revisit traditional recurrent neural networks (RNNs) from over a decade ago: LSTMs (1997) and GRUs (2014). While these models were slow due to requiring to backpropagate through time (BPTT), we show that by removing their hidden state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer need to BPTT and can be efficiently trained in parallel. Building on this, we introduce minimal versions (minLSTMs and minGRUs) that (1) use significantly fewer parameters than their traditional counterparts and (2) are fully parallelizable during training (175x faster for a sequence of length 512). Lastly, we show that these stripped-down versions of decade-old RNNs match the empirical performance of recent sequence models.	翻訳日:2024-11-04 22:50:44 公開日:2024-10-04
# 必要なRNNは全部あるのか? Were RNNs All We Needed? ( http://arxiv.org/abs/2410.01201v2 ) ライセンス: Link先を確認	Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadegh,	(参考訳) シーケンス長に関するトランスフォーマーのスケーラビリティ制限は、トレーニング中に並列化可能なリカレントシーケンスモデルに新たな関心を寄せている。その結果、S4、Mamba、Aarenといった新しい再並行アーキテクチャが、同等のパフォーマンスを実現するために提案されている。本研究では、従来のリカレントニューラルネットワーク(RNN)を10年以上前のLSTM(1997年)とGRU(2014年)で再検討する。これらのモデルは,時間的バックプロパゲーション(BPTT)を必要とするため遅いが,入力から隠れた状態依存を取り除くことで,LSTMやGRUはBPTTを必要とせず,並列で効率的に訓練できることを示す。これに基づいて,(1)従来のパラメータよりもはるかに少ないパラメータを使用する最小バージョン (minLSTMs と minGRUs) を導入し,(2) はトレーニング中に完全に並列化可能である(長さ512のシーケンスでは175倍高速)。最後に、これらの10年前のRNNの取り除かれたバージョンは、最近のシーケンスモデルの実証的な性能と一致していることを示す。 The scalability limitations of Transformers regarding sequence length have renewed interest in recurrent sequence models that are parallelizable during training. As a result, many novel recurrent architectures, such as S4, Mamba, and Aaren, have been proposed that achieve comparable performance. In this work, we revisit traditional recurrent neural networks (RNNs) from over a decade ago: LSTMs (1997) and GRUs (2014). While these models were slow due to requiring to backpropagate through time (BPTT), we show that by removing their hidden state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer need to BPTT and can be efficiently trained in parallel. Building on this, we introduce minimal versions (minLSTMs and minGRUs) that (1) use significantly fewer parameters than their traditional counterparts and (2) are fully parallelizable during training (175x faster for a sequence of length 512). Lastly, we show that these stripped-down versions of decade-old RNNs match the empirical performance of recent sequence models.	翻訳日:2024-11-04 22:40:58 公開日:2024-10-04
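The abstract above states that removing hidden-state dependencies from the gates lets the recurrence be trained in parallel. The sketch below is a minimal scalar illustration of why that works, under stated assumptions: the update form h_t = (1 - z_t) * h_{t-1} + z_t * h~_t with z_t and h~_t computed from x_t alone is the commonly cited minGRU form and is not given in the abstract, and the weights `wz`, `bz`, `wh`, `bh` are hypothetical scalars. Because each step is an affine map of h, steps compose associatively, which is the property a parallel prefix scan needs.

```python
import math
from itertools import accumulate


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def min_gru_sequential(xs, wz, bz, wh, bh, h0=0.0):
    """Plain step-by-step recurrence; gates depend on x_t only."""
    h, out = h0, []
    for x in xs:
        z = sigmoid(wz * x + bz)   # update gate, no dependence on h
        h_tilde = wh * x + bh      # candidate state, no dependence on h
        h = (1.0 - z) * h + z * h_tilde
        out.append(h)
    return out


def combine(left, right):
    """Compose two affine maps h -> a*h + b (apply `left`, then `right`)."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def min_gru_scan(xs, wz, bz, wh, bh, h0=0.0):
    """Same result via a prefix scan over per-step affine coefficients.

    `accumulate` runs sequentially here; since `combine` is associative,
    the same scan could be evaluated in parallel on suitable hardware.
    """
    coeffs = []
    for x in xs:
        z = sigmoid(wz * x + bz)
        coeffs.append((1.0 - z, z * (wh * x + bh)))
    return [a * h0 + b for a, b in accumulate(coeffs, combine)]


xs = [0.5, -1.0, 2.0, 0.0]
seq = min_gru_sequential(xs, wz=0.7, bz=-0.2, wh=1.3, bh=0.1, h0=0.3)
par = min_gru_scan(xs, wz=0.7, bz=-0.2, wh=1.3, bh=0.1, h0=0.3)
print(all(abs(s - p) < 1e-12 for s, p in zip(seq, par)))
```

If the gate depended on h_{t-1}, the per-step coefficients could not be computed up front and this rewriting would not apply, which is the contrast the abstract draws with traditional LSTMs and GRUs.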
# CSIM:画像品質評価のための局所的変化に敏感なコピュラに基づく類似度指数 CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment ( http://arxiv.org/abs/2410.01411v1 ) ライセンス: Link先を確認	Safouane El Ghazouali, Umberto Michelucci, Yassin El Hillali, Hichem Nouira,	(参考訳) 画像類似度メトリクスは、画像処理、コンピュータビジョン、機械学習で使用されるため、コンピュータビジョンアプリケーションにおいて重要な役割を果たす。さらに、これらのメトリクスは、画像検索、オブジェクト認識、品質評価などのタスクを可能にし、医療、天文学、監視といった分野に必須である。 PSNR、MSE、SSIM、ISSM、FSIMといった既存のメトリクスは、画像の小さな変更に対する速度、複雑さ、感度のいずれにおいても制限に直面していることが多い。これらの課題に対処するために,画像の微妙な変化に敏感でありながらリアルタイムに組み合わせた新しい画像類似度指標CSIMについて検討した。この新しい計量は、確率論からガウスコピュラを使い、画像が局所的な画像パッチに関連する画素分布のベクトルに変換する。これらのベクトルには、強度と画素位置に加えて、画素値間の依存関係に関する情報が含まれ、画像内の構造的関係をキャプチャする。 Copulasの特性を活用することで、CSIMはピクセル強度の結合分布を効果的にモデル化し、画像パッチのより微妙な比較を可能にし、他のメトリクスと比較して局所的な変化に敏感になる。実験により、CSIMは、ノイズ、圧縮アーティファクト、ぼやけなど、様々な画像歪みシナリオにおいて、既存の類似度指標よりも優れていることが示された。この距離計が微妙な違いを検知する能力は、医用画像などの高精度なアプリケーションに適しており、小さな異常の検出が重要となる可能性がある。この研究で得られた結果は、このGithubリポジトリから再現することができる。 Image similarity metrics play an important role in computer vision applications, as they are used in image processing, computer vision and machine learning. Furthermore, those metrics enable tasks such as image retrieval, object recognition and quality assessment, essential in fields like healthcare, astronomy and surveillance. Existing metrics, such as PSNR, MSE, SSIM, ISSM and FSIM, often face limitations in terms of either speed, complexity or sensitivity to small changes in images. To address these challenges, a novel image similarity metric, namely CSIM, that combines real-time while being sensitive to subtle image variations is investigated in this paper. The novel metric uses Gaussian Copula from probability theory to transform an image into vectors of pixel distribution associated to local image patches. These vectors contain, in addition to intensities and pixel positions, information on the dependencies between pixel values, capturing the structural relationships within the image. 
By leveraging the properties of copulas, CSIM effectively models the joint distribution of pixel intensities, enabling a more nuanced comparison of image patches and making it more sensitive to local changes than other metrics. Experimental results demonstrate that CSIM outperforms existing similarity metrics in various image distortion scenarios, including noise, compression artifacts and blur. The metric's ability to detect subtle differences makes it suitable for applications requiring high precision, such as medical imaging, where the detection of minor anomalies can be of high importance. The results obtained in this work can be reproduced from this GitHub repository: https://github.com/safouaneelg/copulasimilarity.	翻訳日:2024-11-04 21:09:23 公開日:2024-10-04
# CSIM:画像品質評価のための局所的変化に敏感なコピュラに基づく類似度指数 CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment ( http://arxiv.org/abs/2410.01411v2 ) ライセンス: Link先を確認	Safouane El Ghazouali, Umberto Michelucci, Yassin El Hillali, Hichem Nouira,	(参考訳) 画像類似度メトリクスは、画像処理、コンピュータビジョン、機械学習で使用されるため、コンピュータビジョンアプリケーションにおいて重要な役割を果たす。さらに、これらのメトリクスは、画像検索、オブジェクト認識、品質評価などのタスクを可能にし、医療、天文学、監視といった分野に必須である。 PSNR、MSE、SSIM、ISSM、FSIMといった既存のメトリクスは、画像の小さな変更に対する速度、複雑さ、感度のいずれにおいても制限に直面していることが多い。これらの課題に対処するために,画像の微妙な変化に敏感でありながらリアルタイムに組み合わせた新しい画像類似度指標CSIMについて検討した。この新しい計量は、確率論からガウスコピュラを使い、画像が局所的な画像パッチに関連する画素分布のベクトルに変換する。これらのベクトルには、強度と画素位置に加えて、画素値間の依存関係に関する情報が含まれ、画像内の構造的関係をキャプチャする。 Copulasの特性を活用することで、CSIMはピクセル強度の結合分布を効果的にモデル化し、画像パッチのより微妙な比較を可能にし、他のメトリクスと比較して局所的な変化に敏感になる。実験により、CSIMは、ノイズ、圧縮アーティファクト、ぼやけなど、様々な画像歪みシナリオにおいて、既存の類似度指標よりも優れていることが示された。この距離計が微妙な違いを検知する能力は、医用画像などの高精度なアプリケーションに適しており、小さな異常の検出が重要となる可能性がある。この研究で得られた結果は、このGithubリポジトリから再現することができる。 Image similarity metrics play an important role in computer vision applications, as they are used in image processing, computer vision and machine learning. Furthermore, those metrics enable tasks such as image retrieval, object recognition and quality assessment, essential in fields like healthcare, astronomy and surveillance. Existing metrics, such as PSNR, MSE, SSIM, ISSM and FSIM, often face limitations in terms of either speed, complexity or sensitivity to small changes in images. To address these challenges, a novel image similarity metric, namely CSIM, that combines real-time while being sensitive to subtle image variations is investigated in this paper. The novel metric uses Gaussian Copula from probability theory to transform an image into vectors of pixel distribution associated to local image patches. These vectors contain, in addition to intensities and pixel positions, information on the dependencies between pixel values, capturing the structural relationships within the image. 
By leveraging the properties of copulas, CSIM effectively models the joint distribution of pixel intensities, enabling a more nuanced comparison of image patches and making it more sensitive to local changes than other metrics. Experimental results demonstrate that CSIM outperforms existing similarity metrics in various image distortion scenarios, including noise, compression artifacts and blur. The metric's ability to detect subtle differences makes it suitable for applications requiring high precision, such as medical imaging, where the detection of minor anomalies can be of high importance. The results obtained in this work can be reproduced from this GitHub repository: https://github.com/safouaneelg/copulasimilarity.	翻訳日:2024-11-04 21:09:23 公開日:2024-10-04
# Verbalized Graph Representation Learning: Entire Processを通しての大規模言語モデルに基づく完全解釈可能なグラフモデル Verbalized Graph Representation Learning: A Fully Interpretable Graph Model Based on Large Language Models Throughout the Entire Process ( http://arxiv.org/abs/2410.01457v1 ) ライセンス: Link先を確認	Xingyu Ji, Jiale Liu, Lu Li, Maojun Wang, Zeyu Zhang,	(参考訳) テキスト分散グラフ(TAGs)での表現学習は、特にグラフニューラルネットワーク(GNNs)を通じて、広範囲にわたる実世界の応用により、大きな関心を集めている。従来のGNN手法はグラフの構造情報を符号化することに重点を置いており、しばしばノードやエッジ属性の浅いテキスト埋め込みを用いている。これにより、データ内のリッチなセマンティック情報と、複雑な下流タスクの推論能力を理解することができ、解釈可能性も欠如する。大規模言語モデル(LLM)の台頭に伴い、グラフ表現学習や下流タスクのためのGNNと組み合わせた研究が増えている。これらのアプローチは、TAGsデータセットのリッチなセマンティック情報を効果的に活用するが、主な欠点は、部分的に解釈可能であり、クリティカルフィールドでの応用を制限することである。本稿では,完全に解釈可能な言語型グラフ表現学習(VGRL)手法を提案する。通常、連続的なパラメータ空間内で最適化される従来のグラフ機械学習モデルとは対照的に、VGRLは、このパラメータ空間をテキスト記述に制限し、プロセス全体を通して完全な解釈可能性を保証する。我々は,VGRLの有効性を実証的に評価するためにいくつかの研究を行い,これらの手法がグラフ表現学習におけるステップストーンとして役立つと信じている。 Representation learning on text-attributed graphs (TAGs) has attracted significant interest due to its wide-ranging real-world applications, particularly through Graph Neural Networks (GNNs). Traditional GNN methods focus on encoding the structural information of graphs, often using shallow text embeddings for node or edge attributes. This limits the model to understand the rich semantic information in the data and its reasoning ability for complex downstream tasks, while also lacking interpretability. With the rise of large language models (LLMs), an increasing number of studies are combining them with GNNs for graph representation learning and downstream tasks. While these approaches effectively leverage the rich semantic information in TAGs datasets, their main drawback is that they are only partially interpretable, which limits their application in critical fields. In this paper, we propose a verbalized graph representation learning (VGRL) method which is fully interpretable. 
In contrast to traditional graph machine learning models, which are usually optimized within a continuous parameter space, VGRL constrains this parameter space to text descriptions, which ensures complete interpretability throughout the entire process and makes it easier for users to understand and trust the decisions of the model. We conduct several studies to empirically evaluate the effectiveness of VGRL, and we believe this method can serve as a stepping stone in graph representation learning.	翻訳日:2024-11-04 17:44:25 公開日:2024-10-04
# Verbalized Graph Representation Learning: Entire Processを通しての大規模言語モデルに基づく完全解釈可能なグラフモデル Verbalized Graph Representation Learning: A Fully Interpretable Graph Model Based on Large Language Models Throughout the Entire Process ( http://arxiv.org/abs/2410.01457v2 ) ライセンス: Link先を確認	Xingyu Ji, Jiale Liu, Lu Li, Maojun Wang, Zeyu Zhang,	(参考訳) テキスト分散グラフ(TAGs)での表現学習は、特にグラフニューラルネットワーク(GNNs)を通じて、広範囲にわたる実世界の応用により、大きな関心を集めている。従来のGNN手法はグラフの構造情報を符号化することに重点を置いており、しばしばノードやエッジ属性の浅いテキスト埋め込みを用いている。これにより、データ内のリッチなセマンティック情報と、複雑な下流タスクの推論能力を理解することができ、解釈可能性も欠如する。大規模言語モデル(LLM)の台頭に伴い、グラフ表現学習や下流タスクのためのGNNと組み合わせた研究が増えている。これらのアプローチは、TAGsデータセットのリッチなセマンティック情報を効果的に活用するが、主な欠点は、部分的に解釈可能であり、クリティカルフィールドでの応用を制限することである。本稿では,完全に解釈可能な言語型グラフ表現学習(VGRL)手法を提案する。通常、連続的なパラメータ空間内で最適化される従来のグラフ機械学習モデルとは対照的に、VGRLは、このパラメータ空間をテキスト記述に制限し、プロセス全体を通して完全な解釈可能性を保証する。我々は,VGRLの有効性を実証的に評価するためにいくつかの研究を行い,これらの手法がグラフ表現学習におけるステップストーンとして役立つと信じている。 Representation learning on text-attributed graphs (TAGs) has attracted significant interest due to its wide-ranging real-world applications, particularly through Graph Neural Networks (GNNs). Traditional GNN methods focus on encoding the structural information of graphs, often using shallow text embeddings for node or edge attributes. This limits the model to understand the rich semantic information in the data and its reasoning ability for complex downstream tasks, while also lacking interpretability. With the rise of large language models (LLMs), an increasing number of studies are combining them with GNNs for graph representation learning and downstream tasks. While these approaches effectively leverage the rich semantic information in TAGs datasets, their main drawback is that they are only partially interpretable, which limits their application in critical fields. In this paper, we propose a verbalized graph representation learning (VGRL) method which is fully interpretable. 
In contrast to traditional graph machine learning models, which are usually optimized within a continuous parameter space, VGRL constrains this parameter space to text descriptions, which ensures complete interpretability throughout the entire process and makes it easier for users to understand and trust the decisions of the model. We conduct several studies to empirically evaluate the effectiveness of VGRL, and we believe this method can serve as a stepping stone in graph representation learning.	翻訳日:2024-11-04 17:44:25 公開日:2024-10-04
# フェデレーション学習における低ランク適応のための選択的集約 Selective Aggregation for Low-Rank Adaptation in Federated Learning ( http://arxiv.org/abs/2410.01463v1 ) ライセンス: Link先を確認	Pengxin Guo, Shuang Zeng, Yanran Wang, Huijie Fan, Feifei Wang, Liangqiong Qu,	(参考訳) 我々は、学習した$A$および$B$行列の非対称性解析のレンズを通して、連合学習におけるLoRAについて検討する。そうすることで、$A$行列が一般的な知識を学習するのに対して、$B$行列はクライアント固有の知識を取得することに重点を置いていることがわかった。この発見に基づいて、Federated Share-A Low-Rank Adaptation(FedSA-LoRA)を導入する。これは重み更新をモデル化するために2つの低ランクの学習可能な行列$A$と$B$を用いるが、集約のためにサーバと共有されるのは$A$行列のみである。さらに、学習した$A$と$B$の関係をrsLoRAやVeRAといった他のLoRA変種で調べ、一貫したパターンを明らかにする。その結果,FedSA-LoRA法をこれらのLoRA変種に拡張し,FedSA-rsLoRAとFedSA-VeRAを得た。このようにして、LoRAとFLを統合するための一般的なパラダイムを確立し、今後のLoRA変種とFLを組み合わせる研究のためのガイダンスを提供する。自然言語理解と生成タスクに関する広範な実験結果から,提案手法の有効性が示された。 We investigate LoRA in federated learning through the lens of the asymmetry analysis of the learned $A$ and $B$ matrices. In doing so, we uncover that $A$ matrices are responsible for learning general knowledge, while $B$ matrices focus on capturing client-specific knowledge. Based on this finding, we introduce Federated Share-A Low-Rank Adaptation (FedSA-LoRA), which employs two low-rank trainable matrices $A$ and $B$ to model the weight update, but only $A$ matrices are shared with the server for aggregation. Moreover, we delve into the relationship between the learned $A$ and $B$ matrices in other LoRA variants, such as rsLoRA and VeRA, revealing a consistent pattern. Consequently, we extend our FedSA-LoRA method to these LoRA variants, resulting in FedSA-rsLoRA and FedSA-VeRA. In this way, we establish a general paradigm for integrating LoRA with FL, offering guidance for future work on subsequent LoRA variants combined with FL. Extensive experimental results on natural language understanding and generation tasks demonstrate the effectiveness of the proposed method.	翻訳日:2024-11-04 17:34:40 公開日:2024-10-04
# フェデレーション学習における低ランク適応のための選択的集約 Selective Aggregation for Low-Rank Adaptation in Federated Learning ( http://arxiv.org/abs/2410.01463v2 ) ライセンス: Link先を確認	Pengxin Guo, Shuang Zeng, Yanran Wang, Huijie Fan, Feifei Wang, Liangqiong Qu,	(参考訳) 我々は、学習した$A$および$B$行列の非対称性解析のレンズを通して、連合学習におけるLoRAについて検討する。そうすることで、$A$行列が一般的な知識を学習するのに対して、$B$行列はクライアント固有の知識を取得することに重点を置いていることがわかった。この発見に基づいて、Federated Share-A Low-Rank Adaptation(FedSA-LoRA)を導入する。これは重み更新をモデル化するために2つの低ランクの学習可能な行列$A$と$B$を用いるが、集約のためにサーバと共有されるのは$A$行列のみである。さらに、学習した$A$と$B$の関係をrsLoRAやVeRAといった他のLoRA変種で調べ、一貫したパターンを明らかにする。その結果,FedSA-LoRA法をこれらのLoRA変種に拡張し,FedSA-rsLoRAとFedSA-VeRAを得た。このようにして、LoRAとFLを統合するための一般的なパラダイムを確立し、今後のLoRA変種とFLを組み合わせる研究のためのガイダンスを提供する。自然言語理解と生成タスクに関する広範な実験結果から,提案手法の有効性が示された。私たちのコードはhttps://github.com/Pengxin-Guo/FedSA-LoRAで公開されています。 We investigate LoRA in federated learning through the lens of the asymmetry analysis of the learned $A$ and $B$ matrices. In doing so, we uncover that $A$ matrices are responsible for learning general knowledge, while $B$ matrices focus on capturing client-specific knowledge. Based on this finding, we introduce Federated Share-A Low-Rank Adaptation (FedSA-LoRA), which employs two low-rank trainable matrices $A$ and $B$ to model the weight update, but only $A$ matrices are shared with the server for aggregation. Moreover, we delve into the relationship between the learned $A$ and $B$ matrices in other LoRA variants, such as rsLoRA and VeRA, revealing a consistent pattern. Consequently, we extend our FedSA-LoRA method to these LoRA variants, resulting in FedSA-rsLoRA and FedSA-VeRA. In this way, we establish a general paradigm for integrating LoRA with FL, offering guidance for future work on subsequent LoRA variants combined with FL. Extensive experimental results on natural language understanding and generation tasks demonstrate the effectiveness of the proposed method. Our code is available at https://github.com/Pengxin-Guo/FedSA-LoRA.	翻訳日:2024-11-04 17:34:40 publicDate-placeholder
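The FedSA-LoRA abstract above describes sharing only the $A$ matrices with the server while each client keeps its own $B$. A minimal sketch of one such communication round follows. It assumes plain unweighted averaging of the $A$ matrices (a FedAvg-style choice that the abstract does not specify) and represents matrices as nested lists. The names `average_matrices` and `fedsa_round` and the client dictionary layout are hypothetical and are not taken from the authors' repository.

```python
def average_matrices(mats):
    """Element-wise mean of equally shaped matrices given as lists of lists."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(cols)]
            for i in range(rows)]


def fedsa_round(clients):
    """One round: aggregate A across clients, leave every client's B untouched."""
    # Only the A matrices are sent to the server and averaged (assumed: unweighted mean).
    global_a = average_matrices([c["A"] for c in clients])
    # Each client receives a copy of the aggregated A and keeps its local B.
    return [{"A": [row[:] for row in global_a], "B": c["B"]} for c in clients]


clients = [
    {"A": [[1.0, 2.0]], "B": [[5.0], [6.0]]},  # rank-1 factors: A is 1x2, B is 2x1
    {"A": [[3.0, 4.0]], "B": [[7.0], [8.0]]},
]
updated = fedsa_round(clients)
```

After the round, both clients hold the same averaged $A$ while their $B$ matrices still differ. This matches the split the abstract describes between general knowledge in $A$ and client-specific knowledge in $B$.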
# HarmAug: 安全ガードモデルの知識蒸留に有効なデータ拡張 HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models ( http://arxiv.org/abs/2410.01524v1 ) ライセンス: Link先を確認	Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang,	(参考訳) 大規模言語モデル(LLM)を対象とした悪意のあるクエリを検出する安全ガードモデルは、現実世界のアプリケーションにおけるLLMのセキュアで責任あるデプロイを保証するために不可欠である。しかし、モバイル機器にLLMと並行して数十億のパラメータを持つ既存の安全ガードモデルをデプロイするのは、かなりのメモリ要件とレイテンシのために現実的ではない。このコストを削減するため、二元的有害性ラベルを持つ命令応答対のラベル付きデータセットを用いて、大規模な教師安全ガードモデルをより小さなものに蒸留する。既存のラベル付きデータセットの有害な命令の多様性が限られているため、素朴に蒸留したモデルはより大きなモデルに比べて性能が劣る傾向にある。小型モデルと大規模モデルの間のギャップを埋めるため,LLMをジェイルブレイクして有害な命令を生成させる、単純だが効果的なデータ拡張手法であるHarmAugを提案する。「攻撃的コンテンツを誘発する単一の有害なインストラクションプロンプトを作成せよ」のようなプロンプトが与えられたら、LLMの応答に肯定的なプレフィックス(例:"I have an idea for a prompt:")を追加する。これによりLLMは応答の残りを引き続き生成し、有害な命令がサンプリングされる。別のLLMは有害な命令に対する応答を生成し、教師モデルは命令応答対をラベル付けする。 HarmAugが他の関連するベースラインより優れていることを実証的に示す。さらに、HarmAugでトレーニングされた4億3500万パラメータの安全ガードモデルは、70億以上のパラメータを持つ大型モデルに匹敵するF1スコアを達成し、その計算コストの25%未満で動作しながら、AUPRCではそれらを上回る。 Safety guard models that detect malicious queries aimed at large language models (LLMs) are essential for ensuring the secure and responsible deployment of LLMs in real-world applications. However, deploying existing safety guard models with billions of parameters alongside LLMs on mobile devices is impractical due to substantial memory requirements and latency. To reduce this cost, we distill a large teacher safety guard model into a smaller one using a labeled dataset of instruction-response pairs with binary harmfulness labels. Due to the limited diversity of harmful instructions in the existing labeled dataset, naively distilled models tend to underperform compared to larger models. To bridge the gap between small and large models, we propose HarmAug, a simple yet effective data augmentation method that involves jailbreaking an LLM and prompting it to generate harmful instructions. 
Given a prompt such as, "Make a single harmful instruction prompt that would elicit offensive content", we add an affirmative prefix (e.g., "I have an idea for a prompt:") to the LLM's response. This encourages the LLM to continue generating the rest of the response, leading to sampling harmful instructions. Another LLM generates a response to the harmful instruction, and the teacher model labels the instruction-response pair. We empirically show that our HarmAug outperforms other relevant baselines. Moreover, a 435-million-parameter safety guard model trained with HarmAug achieves an F1 score comparable to larger models with over 7 billion parameters, and even outperforms them in AUPRC, while operating at less than 25% of their computational cost.	翻訳日:2024-11-04 17:14:45 公開日:2024-10-04
# HarmAug: 安全ガードモデルの知識蒸留に有効なデータ拡張 HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models ( http://arxiv.org/abs/2410.01524v2 ) ライセンス: Link先を確認	Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang,	(参考訳) 大規模言語モデル(LLM)を対象とした悪意のあるクエリを検出する安全ガードモデルは、現実世界のアプリケーションにおけるLLMのセキュアで責任あるデプロイを保証するために不可欠である。しかし、モバイル機器にLLMと並行して数十億のパラメータを持つ既存の安全ガードモデルをデプロイするのは、かなりのメモリ要件とレイテンシのために現実的ではない。このコストを削減するため、二元的有害性ラベルを持つ命令応答対のラベル付きデータセットを用いて、大規模な教師安全ガードモデルをより小さなものに蒸留する。既存のラベル付きデータセットの有害な命令の多様性が限られているため、素朴に蒸留したモデルはより大きなモデルに比べて性能が劣る傾向にある。小型モデルと大規模モデルの間のギャップを埋めるため,LLMをジェイルブレイクして有害な命令を生成させる、単純だが効果的なデータ拡張手法であるHarmAugを提案する。「攻撃的コンテンツを誘発する単一の有害なインストラクションプロンプトを作成せよ」のようなプロンプトが与えられたら、LLMの応答に肯定的なプレフィックス(例:"I have an idea for a prompt:")を追加する。これによりLLMは応答の残りを引き続き生成し、有害な命令がサンプリングされる。別のLLMは有害な命令に対する応答を生成し、教師モデルは命令応答対をラベル付けする。 HarmAugが他の関連するベースラインより優れていることを実証的に示す。さらに、HarmAugでトレーニングされた4億3500万パラメータの安全ガードモデルは、70億以上のパラメータを持つ大型モデルに匹敵するF1スコアを達成し、その計算コストの25%未満で動作しながら、AUPRCではそれらを上回る。 Safety guard models that detect malicious queries aimed at large language models (LLMs) are essential for ensuring the secure and responsible deployment of LLMs in real-world applications. However, deploying existing safety guard models with billions of parameters alongside LLMs on mobile devices is impractical due to substantial memory requirements and latency. To reduce this cost, we distill a large teacher safety guard model into a smaller one using a labeled dataset of instruction-response pairs with binary harmfulness labels. Due to the limited diversity of harmful instructions in the existing labeled dataset, naively distilled models tend to underperform compared to larger models. To bridge the gap between small and large models, we propose HarmAug, a simple yet effective data augmentation method that involves jailbreaking an LLM and prompting it to generate harmful instructions. 
Given a prompt such as, "Make a single harmful instruction prompt that would elicit offensive content", we add an affirmative prefix (e.g., "I have an idea for a prompt:") to the LLM's response. This encourages the LLM to continue generating the rest of the response, leading to sampling harmful instructions. Another LLM generates a response to the harmful instruction, and the teacher model labels the instruction-response pair. We empirically show that our HarmAug outperforms other relevant baselines. Moreover, a 435-million-parameter safety guard model trained with HarmAug achieves an F1 score comparable to larger models with over 7 billion parameters, and even outperforms them in AUPRC, while operating at less than 25% of their computational cost.	翻訳日:2024-11-04 17:14:45 公開日:2024-10-04
# 球面上の打ち切りカーネル確率的勾配降下法 Truncated Kernel Stochastic Gradient Descent on Spheres ( http://arxiv.org/abs/2410.01570v1 ) ライセンス: Link先を確認	JinHui Bai, Lei Shi,	(参考訳) 球面調和関数の構造に着想を得て,球面データのフィッティングのための,最小二乗損失関数を用いた打ち切りカーネル確率的勾配降下法(T-kernel SGD)を提案する。 T-kernel SGDは「打ち切り」演算を用いることで、級数ベースのカーネル関数を確率的勾配降下法に適用できるようにし、高次元空間で適切な閉形式のカーネル関数を見つける難しさを回避する。従来のカーネルSGDとは対照的に、T-kernel SGDは反復中に仮説空間を動的に調整することで、バイアスと分散のバランスをより効果的にとる。提案アルゴリズムの最も重要な利点は、カーネルSGDに固有の飽和問題を克服しつつ、一定のステップサイズ(サンプルサイズに依存しない)を用いて理論的に最適な収束率を達成できることである。さらに、球面多項式の構造を利用して等価なT-kernel SGDを導出し、カーネルSGDと比較してストレージと計算コストを大幅に削減する。典型的には、T-kernel SGDは、$d$次元球面上で最適レートを達成するために$\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$の計算量と$\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$のストレージしか必要としない。ここで$0<\epsilon<\frac{1}{2}$は、最適なフィッティングまたは基礎となる空間が十分な正則性を持つ場合、任意に小さくできる。この正則性は、目的関数の滑らかさのパラメータと、カーネル関数に付随する積分作用素の固有値の減衰率によって決定され、どちらも推定問題の難しさを反映している。本研究の主な成果は,この事前情報がT-kernel SGDの収束にどのように影響するかを定量的に特徴づけることである。数値実験により, 本論文で示された理論的知見をさらに検証した。 Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm with a least-squares loss function for spherical data fitting. T-kernel SGD employs a "truncation" operation, enabling the application of a series-based kernel function in stochastic gradient descent, thereby avoiding the difficulties of finding suitable closed-form kernel functions in high-dimensional spaces. In contrast to traditional kernel SGD, T-kernel SGD is more effective in balancing bias and variance by dynamically adjusting the hypothesis space during iterations. The most significant advantage of the proposed algorithm is that it can achieve theoretically optimal convergence rates using a constant step size (independent of the sample size) while overcoming the inherent saturation problem of kernel SGD. Additionally, we leverage the structure of spherical polynomials to derive an equivalent T-kernel SGD, significantly reducing storage and computational costs compared to kernel SGD. 
Typically, T-kernel SGD requires only $\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$ computational complexity and $\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$ storage to achieve optimal rates for the $d$-dimensional sphere, where $0<\epsilon<\frac{1}{2}$ can be arbitrarily small if the optimal fitting or the underlying space possesses sufficient regularity. This regularity is determined by the smoothness parameter of the objective function and the decay rate of the eigenvalues of the integral operator associated with the kernel function, both of which reflect the difficulty of the estimation problem. Our main results quantitatively characterize how this prior information influences the convergence of T-kernel SGD. The numerical experiments further validate the theoretical findings presented in this paper.	翻訳日:2024-11-04 16:54:49 公開日:2024-10-04
# 球面上の打ち切りカーネル確率的勾配降下法 Truncated Kernel Stochastic Gradient Descent on Spheres ( http://arxiv.org/abs/2410.01570v2 ) ライセンス: Link先を確認	JinHui Bai, Lei Shi,	(参考訳) 球面調和関数の構造に着想を得て,球面データのフィッティングのための,最小二乗損失関数を用いた打ち切りカーネル確率的勾配降下法(T-kernel SGD)を提案する。 T-kernel SGDは「打ち切り」演算を用いることで、級数ベースのカーネル関数を確率的勾配降下法に適用できるようにし、高次元空間で適切な閉形式のカーネル関数を見つける難しさを回避する。従来のカーネルSGDとは対照的に、T-kernel SGDは反復中に仮説空間を動的に調整することで、バイアスと分散のバランスをより効果的にとる。提案アルゴリズムの最も重要な利点は、カーネルSGDに固有の飽和問題を克服しつつ、一定のステップサイズ(サンプルサイズに依存しない)を用いて理論的に最適な収束率を達成できることである。さらに、球面多項式の構造を利用して等価なT-kernel SGDを導出し、カーネルSGDと比較してストレージと計算コストを大幅に削減する。典型的には、T-kernel SGDは、$d$次元球面上で最適レートを達成するために$\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$の計算量と$\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$のストレージしか必要としない。ここで$0<\epsilon<\frac{1}{2}$は、最適なフィッティングまたは基礎となる空間が十分な正則性を持つ場合、任意に小さくできる。この正則性は、目的関数の滑らかさのパラメータと、カーネル関数に付随する積分作用素の固有値の減衰率によって決定され、どちらも推定問題の難しさを反映している。本研究の主な成果は,この事前情報がT-kernel SGDの収束にどのように影響するかを定量的に特徴づけることである。数値実験により, 本論文で示された理論的知見をさらに検証した。 Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm with a least-squares loss function for spherical data fitting. T-kernel SGD employs a "truncation" operation, enabling the application of a series-based kernel function in stochastic gradient descent, thereby avoiding the difficulties of finding suitable closed-form kernel functions in high-dimensional spaces. In contrast to traditional kernel SGD, T-kernel SGD is more effective in balancing bias and variance by dynamically adjusting the hypothesis space during iterations. The most significant advantage of the proposed algorithm is that it can achieve theoretically optimal convergence rates using a constant step size (independent of the sample size) while overcoming the inherent saturation problem of kernel SGD. Additionally, we leverage the structure of spherical polynomials to derive an equivalent T-kernel SGD, significantly reducing storage and computational costs compared to kernel SGD. 
Typically, T-kernel SGD requires only $\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$ computational complexity and $\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$ storage to achieve optimal rates for the $d$-dimensional sphere, where $0<\epsilon<\frac{1}{2}$ can be arbitrarily small if the optimal fitting or the underlying space possesses sufficient regularity. This regularity is determined by the smoothness parameter of the objective function and the decay rate of the eigenvalues of the integral operator associated with the kernel function, both of which reflect the difficulty of the estimation problem. Our main results quantitatively characterize how this prior information influences the convergence of T-kernel SGD. The numerical experiments further validate the theoretical findings presented in this paper.	翻訳日:2024-11-04 16:54:49 公開日:2024-10-04
# HarmoniCa: 拡散トランスフォーマーアクセラレーションにおけるより良い機能キャッシュのためのトレーニングと推論の調和 HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration ( http://arxiv.org/abs/2410.01723v1 ) ライセンス: Link先を確認	Yushi Huang, Zining Wang, Ruihao Gong, Jing Liu, Xinjie Zhang, Jinyang Guo, Xianglong Liu, Jun Zhang,	(参考訳) Diffusion Transformer (DiTs) は、生成タスクにおける優れたスケーラビリティと優れたパフォーマンスで有名である。しかし、そのかなりの推論コストは実践的な展開を妨げる。タイムステップ間で冗長な計算を保存および検索する機能キャッシュメカニズムは、拡散モデルにおけるステップごとの推論時間を削減することを約束する。 DiTの既存のキャッシュメソッドは手動で設計されている。学習ベースのアプローチは戦略を適応的に最適化しようとするが、トレーニングと推論の相違に悩まされ、パフォーマンスと加速度比の両方を損なう。より詳細な分析では,(1)事前の時間差,(2)早期のキャッシュ使用の影響を無視する事前の時間差,(2)訓練対象(各時間差の予測ノイズ)が推論目標(高品質な画像の生成)から逸脱する客観的なミスマッチ,の2点が主な特徴である。これらの相違を緩和するために,ステップワイズ・デノナイジング・トレーニング(SDT)とイメージエラー・プロキシ・ガイド・オブジェクト(IEPO)をベースとした新しい学習ベースキャッシング・フレームワークを用いて,トレーニングと推論を調和させる新しい手法であるHarmoniCaを提案する。従来のトレーニングパラダイムと比較すると、新たに提案されたSDTは、推論時の動作と同じように、トレーニング中の前のタイムステップからの情報を活用することができるように、デノナイジングプロセスの継続性を維持している。さらに,キャッシュされた特徴の再利用による最終的な画像誤差を近似するために,効率的なプロキシ機構を統合したIEPOを設計する。したがって、IEPOは最終的な画像品質とキャッシュ利用のバランスを保ち、各タイムステップで予測される出力に対するキャッシュ使用の影響のみを考慮したトレーニングの問題を解消する。 Diffusion Transformers (DiTs) have gained prominence for outstanding scalability and extraordinary performance in generative tasks. However, their considerable inference costs impede practical deployment. The feature cache mechanism, which involves storing and retrieving redundant computations across timesteps, holds promise for reducing per-step inference time in diffusion models. Most existing caching methods for DiT are manually designed. Although the learning-based approach attempts to optimize strategies adaptively, it suffers from discrepancies between training and inference, which hampers both the performance and acceleration ratio. 
A detailed analysis shows that these discrepancies stem primarily from two sources: (1) Prior Timestep Disregard, where training ignores the effect of cache usage at earlier timesteps, and (2) Objective Mismatch, where the training target (aligning the predicted noise at each timestep) deviates from the goal of inference (generating a high-quality image). To alleviate these discrepancies, we propose HarmoniCa, a novel method that Harmonizes training and inference with a novel learning-based Caching framework built upon Step-Wise Denoising Training (SDT) and Image Error Proxy-Guided Objective (IEPO). Compared to the traditional training paradigm, the newly proposed SDT maintains the continuity of the denoising process, enabling the model to leverage information from prior timesteps during training, similar to the way it operates during inference. Furthermore, we design IEPO, which integrates an efficient proxy mechanism to approximate the final image error caused by reusing the cached feature. IEPO therefore helps balance final image quality and cache utilization, addressing the problem of a training objective that considers only the impact of cache usage on the predicted output at each timestep.	翻訳日:2024-11-04 15:43:48 公開日:2024-10-04
# HarmoniCa: 拡散トランスフォーマーアクセラレーションにおけるより良い機能キャッシュのためのトレーニングと推論の調和 HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration ( http://arxiv.org/abs/2410.01723v2 ) ライセンス: Link先を確認	Yushi Huang, Zining Wang, Ruihao Gong, Jing Liu, Xinjie Zhang, Jun Zhang,	(参考訳) Diffusion Transformer (DiTs) は、生成タスクにおける優れたスケーラビリティと優れたパフォーマンスで有名である。しかし、そのかなりの推論コストは実践的な展開を妨げる。タイムステップ間で冗長な計算を保存および検索する機能キャッシュメカニズムは、拡散モデルにおけるステップごとの推論時間を削減することを約束する。 DiTの既存のキャッシュメソッドは手動で設計されている。学習ベースのアプローチは戦略を適応的に最適化しようとするが、トレーニングと推論の相違に悩まされ、パフォーマンスと加速度比の両方を損なう。より詳細な分析では,(1)事前の時間差,(2)早期のキャッシュ使用の影響を無視する事前の時間差,(2)訓練対象(各時間差の予測ノイズ)が推論目標(高品質な画像の生成)から逸脱する客観的なミスマッチ,の2点が主な特徴である。これらの相違を緩和するために,ステップワイズ・デノナイジング・トレーニング(SDT)とイメージエラー・プロキシ・ガイド・オブジェクト(IEPO)をベースとした新しい学習ベースキャッシング・フレームワークを用いて,トレーニングと推論を調和させる新しい手法であるHarmoniCaを提案する。従来のトレーニングパラダイムと比較すると、新たに提案されたSDTは、推論時の動作と同じように、トレーニング中の前のタイムステップからの情報を活用することができるように、デノナイジングプロセスの継続性を維持している。さらに,キャッシュされた特徴の再利用による最終的な画像誤差を近似するために,効率的なプロキシ機構を統合したIEPOを設計する。したがって、IEPOは最終的な画像品質とキャッシュ利用のバランスを保ち、各タイムステップで予測される出力に対するキャッシュ使用の影響のみを考慮したトレーニングの問題を解消する。 Diffusion Transformers (DiTs) have gained prominence for outstanding scalability and extraordinary performance in generative tasks. However, their considerable inference costs impede practical deployment. The feature cache mechanism, which involves storing and retrieving redundant computations across timesteps, holds promise for reducing per-step inference time in diffusion models. Most existing caching methods for DiT are manually designed. Although the learning-based approach attempts to optimize strategies adaptively, it suffers from discrepancies between training and inference, which hampers both the performance and acceleration ratio. 
A detailed analysis shows that these discrepancies stem primarily from two sources: (1) Prior Timestep Disregard, where training ignores the effect of cache usage at earlier timesteps, and (2) Objective Mismatch, where the training target (aligning the predicted noise at each timestep) deviates from the goal of inference (generating a high-quality image). To alleviate these discrepancies, we propose HarmoniCa, a novel method that Harmonizes training and inference with a novel learning-based Caching framework built upon Step-Wise Denoising Training (SDT) and Image Error Proxy-Guided Objective (IEPO). Compared to the traditional training paradigm, the newly proposed SDT maintains the continuity of the denoising process, enabling the model to leverage information from prior timesteps during training, similar to the way it operates during inference. Furthermore, we design IEPO, which integrates an efficient proxy mechanism to approximate the final image error caused by reusing the cached feature. IEPO therefore helps balance final image quality and cache utilization, addressing the problem of a training objective that considers only the impact of cache usage on the predicted output at each timestep.	翻訳日:2024-11-04 15:43:48 公開日:2024-10-04
# 極端イベントによる説明可能な地球表面の予測 Explainable Earth Surface Forecasting under Extreme Events ( http://arxiv.org/abs/2410.01770v1 ) ライセンス: Link先を確認	Oscar J. Pellicer-Valero, Miguel-Ángel Fernández-Torres, Chaonan Ji, Miguel D. Mahecha, Gustau Camps-Valls,	(参考訳) 気候変動に関連する極端な出来事が増加する中、高次元地球観測データは生態系への影響を予測し理解するためのユニークな機会となる。しかし、これは処理、視覚化、モデリング、データの説明の複雑さによって妨げられている。この課題にどのように対処できるかを示すために、私たちは、新しいDeepExtremeCubesデータセットに基づいて、畳み込みの長い短期メモリベースのアーキテクチャをトレーニングします。 DeepExtremeCubesには、世界中の4万件のSentinel-2ミニキューブ(2016年1月～2022年10月)と、極端な気象、気象データ、植生の土地被覆、地形図などが含まれている。カーネル正規化差分植生指標を用いて将来の反射率と植生の影響を予測すると、実験セットのR$^2$スコアが0.9055に達した。説明可能な人工知能は、2020年10月の中央南アメリカの複合熱波と干ばつイベントにおけるモデルの予測を分析するために使用された。その結果, 平均気温と表面圧力は, 通常の条件下での予測値として最適であることが判明した。対照的に、蒸発と表面潜熱フラックスの最小限の異常は、イベント中にリードを取る。イベント前の属性にもレギュレーションの変化が見られ、イベントが発生するまでの期間を評価するのに役立つかもしれない。この論文のすべての実験と数字を再現するコードはhttps://github.com/DeepExtremes/txyXAIで公開されている。 With climate change-related extreme events on the rise, high dimensional Earth observation data presents a unique opportunity for forecasting and understanding impacts on ecosystems. This is, however, impeded by the complexity of processing, visualizing, modeling, and explaining this data. To showcase how this challenge can be met, here we train a convolutional long short-term memory-based architecture on the novel DeepExtremeCubes dataset. DeepExtremeCubes includes around 40,000 long-term Sentinel-2 minicubes (January 2016-October 2022) worldwide, along with labeled extreme events, meteorological data, vegetation land cover, and topography map, sampled from locations affected by extreme climate events and surrounding areas. When predicting future reflectances and vegetation impacts through kernel normalized difference vegetation index, the model achieved an R$^2$ score of 0.9055 in the test set. Explainable artificial intelligence was used to analyze the model's predictions during the October 2020 Central South America compound heatwave and drought event. 
We chose the same area exactly one year before the event as a counterfactual, finding that the average temperature and surface pressure are generally the best predictors under normal conditions. In contrast, minimum anomalies of evaporation and surface latent heat flux take the lead during the event. A change of regime is also observed in the attributions before the event, which might help assess how long the event was brewing before it happened. The code to replicate all experiments and figures in this paper is publicly available at https://github.com/DeepExtremes/txyXAI.	翻訳日:2024-11-04 15:24:19 公開日:2024-10-04
# 極端イベントによる説明可能な地球表面の予測 Explainable Earth Surface Forecasting under Extreme Events ( http://arxiv.org/abs/2410.01770v2 ) ライセンス: Link先を確認	Oscar J. Pellicer-Valero, Miguel-Ángel Fernández-Torres, Chaonan Ji, Miguel D. Mahecha, Gustau Camps-Valls,	(参考訳) 気候変動に関連する極端な出来事が増加する中、高次元地球観測データは生態系への影響を予測し理解するためのユニークな機会となる。しかし、これは処理、視覚化、モデリング、データの説明の複雑さによって妨げられている。この課題にどのように対処できるかを示すために、私たちは、新しいDeepExtremeCubesデータセットに基づいて、畳み込みの長い短期メモリベースのアーキテクチャをトレーニングします。 DeepExtremeCubesには、世界中の4万件のSentinel-2ミニキューブ(2016年1月～2022年10月)と、極端な気象、気象データ、植生の土地被覆、地形図などが含まれている。カーネル正規化差分植生指標を用いて将来の反射率と植生の影響を予測すると、実験セットのR$^2$スコアが0.9055に達した。説明可能な人工知能は、2020年10月の中央南アメリカの複合熱波と干ばつイベントにおけるモデルの予測を分析するために使用された。その結果, 平均気温と表面圧力は, 通常の条件下での予測値として最適であることが判明した。対照的に、蒸発と表面潜熱フラックスの最小限の異常は、イベント中にリードを取る。イベント前の属性にもレギュレーションの変化が見られ、イベントが発生するまでの期間を評価するのに役立つかもしれない。この論文のすべての実験と数字を再現するコードはhttps://github.com/DeepExtremes/txyXAIで公開されている。 With climate change-related extreme events on the rise, high dimensional Earth observation data presents a unique opportunity for forecasting and understanding impacts on ecosystems. This is, however, impeded by the complexity of processing, visualizing, modeling, and explaining this data. To showcase how this challenge can be met, here we train a convolutional long short-term memory-based architecture on the novel DeepExtremeCubes dataset. DeepExtremeCubes includes around 40,000 long-term Sentinel-2 minicubes (January 2016-October 2022) worldwide, along with labeled extreme events, meteorological data, vegetation land cover, and topography map, sampled from locations affected by extreme climate events and surrounding areas. When predicting future reflectances and vegetation impacts through kernel normalized difference vegetation index, the model achieved an R$^2$ score of 0.9055 in the test set. Explainable artificial intelligence was used to analyze the model's predictions during the October 2020 Central South America compound heatwave and drought event. 
We chose the same area exactly one year before the event as counterfactual, finding that the average temperature and surface pressure are generally the best predictors under normal conditions. In contrast, minimum anomalies of evaporation and surface latent heat flux take the lead during the event. A change of regime is also observed in the attributions before the event, which might help assess how long the event was brewing before happening. The code to replicate all experiments and figures in this paper is publicly available at https://github.com/DeepExtremes/txyXAI	翻訳日:2024-11-04 15:24:19 公開日:2024-10-04
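The two quantities named in this abstract, the kernel normalized difference vegetation index (kNDVI) and the R$^2$ score, can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say which kNDVI variant or which R$^2$ definition the authors use, so the sketch assumes the commonly used simplified form kNDVI = tanh(NDVI^2) and the standard coefficient of determination. The function names and sample values are hypothetical.

```python
import math


def kndvi(nir, red):
    """Simplified kNDVI (assumed form): tanh of the squared NDVI."""
    ndvi = (nir - red) / (nir + red)
    return math.tanh(ndvi ** 2)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reflectances and predictions, for illustration only.
print(kndvi(0.6, 0.2))
print(r2_score([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

A perfect prediction gives an R$^2$ of 1, and always predicting the mean of the targets gives 0.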
# 言語モデルが推論に最適化されているとき、自己回帰のエンバーをまだ示しているのか? OpenAI o1の分析 When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 ( http://arxiv.org/abs/2410.01792v1 ) ライセンス: Link先を確認	R. Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, Thomas L. Griffiths,	(参考訳) 「Embers of Autoregression」(McCoy et al., 2023)において、我々は、いくつかの大規模言語モデル(LLM)には、次単語予測という起源に起因する重要な制限があることを示した。ここでは,従来のLLMと異なり,推論に最適化されたOpenAIの新しいシステムであるo1について,これらの問題が継続するかどうかを検討する。多くの場合、o1 は従来の LLM よりも大幅に優れており、特に共通タスクの稀な変種(例えば、最初の文字ではなく、リスト内の各単語の2番目の文字から頭字語を生成する)で大きく改善されている。しかし、これらの定量的改善にもかかわらず、o1は以前のシステムで観測したのと同じ定性的傾向を示している。具体的には、従来のLLMと同様、o1は例やタスクの確率に敏感で、高確率設定では低確率設定よりも性能が高く、必要な「思考トークン」も少ない。これらの結果は、推論のための言語モデルの最適化は確率感度を緩和できるが、完全には克服できない可能性があることを示している。 In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction. Here we investigate whether these issues persist with o1, a new system from OpenAI that differs from previous LLMs in that it is optimized for reasoning. We find that o1 substantially outperforms previous LLMs in many cases, with particularly large improvements on rare variants of common tasks (e.g., forming acronyms from the second letter of each word in a list, rather than the first letter). Despite these quantitative improvements, however, o1 still displays the same qualitative trends that we observed in previous systems. Specifically, o1 - like previous LLMs - is sensitive to the probability of examples and tasks, performing better and requiring fewer "thinking tokens" in high-probability settings than in low-probability ones. These results show that optimizing a language model for reasoning can mitigate but might not fully overcome the language model's probability sensitivity.	翻訳日:2024-11-04 15:14:33 公開日:2024-10-04
# 言語モデルが推論に最適化されているとき、自己回帰のエンバーをまだ示しているのか? OpenAI o1の分析 When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 ( http://arxiv.org/abs/2410.01792v2 ) ライセンス: Link先を確認	R. Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, Thomas L. Griffiths,	(参考訳) 「Embers of Autoregression」(McCoy et al., 2023)において、我々は、いくつかの大規模言語モデル(LLM)には、次単語予測という起源に起因する重要な制限があることを示した。ここでは,従来のLLMと異なり,推論に最適化されたOpenAIの新しいシステムであるo1について,これらの問題が継続するかどうかを検討する。多くの場合、o1 は従来の LLM よりも大幅に優れており、特に共通タスクの稀な変種(例えば、最初の文字ではなく、リスト内の各単語の2番目の文字から頭字語を生成する)で大きく改善されている。しかし、これらの定量的改善にもかかわらず、o1は以前のシステムで観測したのと同じ定性的傾向を示している。具体的には、従来のLLMと同様、o1は例やタスクの確率に敏感で、高確率設定では低確率設定よりも性能が高く、必要な「思考トークン」も少ない。これらの結果は、推論のための言語モデルの最適化は確率感度を緩和できるが、完全には克服できない可能性があることを示している。 In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction. Here we investigate whether these issues persist with o1, a new system from OpenAI that differs from previous LLMs in that it is optimized for reasoning. We find that o1 substantially outperforms previous LLMs in many cases, with particularly large improvements on rare variants of common tasks (e.g., forming acronyms from the second letter of each word in a list, rather than the first letter). Despite these quantitative improvements, however, o1 still displays the same qualitative trends that we observed in previous systems. Specifically, o1 -- like previous LLMs -- is sensitive to the probability of examples and tasks, performing better and requiring fewer "thinking tokens" in high-probability settings than in low-probability ones. These results show that optimizing a language model for reasoning can mitigate but might not fully overcome the language model's probability sensitivity.	翻訳日:2024-11-04 15:14:33 公開日:2024-10-04
# Bayes-CATSI:医療時系列データ補完のための変分ベイズ的アプローチ Bayes-CATSI: A variational Bayesian approach for medical time series data imputation ( http://arxiv.org/abs/2410.01847v1 ) ライセンス: Link先を確認	Omkar Kulkarni, Rohitash Chandra,	(参考訳) 医療時系列データセットには欠損値があり、データ補完手法を必要とするが、従来の機械学習モデルは予測における不確実性の定量化を欠いているため不十分である。これらのモデルの中で、CATSI(Context-Aware Time Series Imputation)は、各患者のグローバルな依存関係を捉えるコンテキストベクトルを補完プロセスに組み込むことで、高い有効性を示している。本稿では,変分推論による不確実性定量化を利用したベイズ文脈認識時系列補完(Bayes-CATSI)フレームワークを提案する。脳波(EEG)、眼電図(EOG)、筋電図(EMG)、心電図(EKG)から得られる時系列を考察した。変分推論は事後分布の形状を仮定し、クルバック・ライブラー(KL)発散を最小化することで、真の事後分布に最も近い変分密度を求める。そこで我々は,変分ベイズディープラーニング層をCATSIモデルに統合した。その結果,Bayes-CATSIは不確実性の定量化を提供するだけでなく,CATSIモデルよりも優れた補完性能が得られることがわかった。具体的には、Bayes-CATSIのインスタンスはCATSIを9.57%上回っている。Bayes-CATSIを他の医療データ補完問題に適用するためのオープンソースコード実装を提供する。 Medical time series datasets feature missing values that need data imputation methods; however, conventional machine learning models fall short due to a lack of uncertainty quantification in predictions. Among these models, the CATSI (Context-Aware Time Series Imputation) stands out for its effectiveness by incorporating a context vector into the imputation process, capturing the global dependencies of each patient. In this paper, we propose a Bayesian Context-Aware Time Series Imputation (Bayes-CATSI) framework which leverages uncertainty quantification offered by variational inference. We consider the time series derived from electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), electrocardiology (EKG). Variational Inference assumes the shape of the posterior distribution and through minimization of the Kullback-Leibler (KL) divergence it finds variational densities that are closest to the true posterior distribution. Thus, we integrate the variational Bayesian deep learning layers into the CATSI model. Our results show that Bayes-CATSI not only provides uncertainty quantification but also achieves superior imputation performance compared to the CATSI model. 
Specifically, an instance of Bayes-CATSI outperforms CATSI by 9.57%. We provide an open-source code implementation for applying Bayes-CATSI to other medical data imputation problems.	翻訳日:2024-11-04 14:34:44 公開日:2024-10-04
# Bayes-CATSI:医療時系列データ補完のための変分ベイズ深層学習フレームワーク Bayes-CATSI: A variational Bayesian deep learning framework for medical time series data imputation ( http://arxiv.org/abs/2410.01847v2 ) ライセンス: Link先を確認	Omkar Kulkarni, Rohitash Chandra,	(参考訳) 医療時系列データセットには欠損値があり、データ補完手法を必要とするが、従来の機械学習モデルは予測における不確実性の定量化を欠いているため不十分である。これらのモデルの中で、CATSI(Context-Aware Time Series Imputation)は、各患者のグローバルな依存関係を捉えるコンテキストベクトルを補完プロセスに組み込むことで、高い有効性を示している。本稿では,変分推論による不確実性定量化を利用したベイズ文脈認識時系列補完(Bayes-CATSI)フレームワークを提案する。脳波(EEG)、眼電図(EOG)、筋電図(EMG)、心電図(EKG)から得られる時系列を考察した。変分推論は事後分布の形状を仮定し、クルバック・ライブラー(KL)発散を最小化することで、真の事後分布に最も近い変分密度を求める。そこで我々は,変分ベイズディープラーニング層をCATSIモデルに統合した。その結果,Bayes-CATSIは不確実性の定量化を提供するだけでなく,CATSIモデルよりも優れた補完性能が得られることがわかった。具体的には、Bayes-CATSIのインスタンスはCATSIを9.57%上回っている。Bayes-CATSIを他の医療データ補完問題に適用するためのオープンソースコード実装を提供する。 Medical time series datasets feature missing values that need data imputation methods; however, conventional machine learning models fall short due to a lack of uncertainty quantification in predictions. Among these models, the CATSI (Context-Aware Time Series Imputation) stands out for its effectiveness by incorporating a context vector into the imputation process, capturing the global dependencies of each patient. In this paper, we propose a Bayesian Context-Aware Time Series Imputation (Bayes-CATSI) framework which leverages uncertainty quantification offered by variational inference. We consider the time series derived from electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), electrocardiology (EKG). Variational Inference assumes the shape of the posterior distribution and through minimization of the Kullback-Leibler (KL) divergence it finds variational densities that are closest to the true posterior distribution. Thus, we integrate the variational Bayesian deep learning layers into the CATSI model. Our results show that Bayes-CATSI not only provides uncertainty quantification but also achieves superior imputation performance compared to the CATSI model. 
Specifically, an instance of Bayes-CATSI outperforms CATSI by 9.57%. We provide an open-source code implementation for applying Bayes-CATSI to other medical data imputation problems.	翻訳日:2024-11-04 14:34:44 publication:2024-10-04
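The KL-divergence step described in the Bayes-CATSI abstract can be illustrated with a small sketch. The KL divergence to the true posterior cannot be evaluated directly, so variational methods usually optimize an evidence lower bound whose regularizer is the KL divergence between the variational density and the prior. The abstract does not state the variational family or the prior, so the mean-field Gaussian posterior and standard normal prior below are assumptions, and the function names are hypothetical rather than taken from the Bayes-CATSI code.

```python
import math


def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) for univariate Gaussians
    q = N(mu_q, sigma_q^2) and p = N(mu_p, sigma_p^2)."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)


def kl_mean_field(mus, sigmas, prior_mu=0.0, prior_sigma=1.0):
    """KL term for a factorized (mean-field) Gaussian over several weights:
    independent factors make the total KL a sum of per-weight terms."""
    return sum(kl_gaussian(m, s, prior_mu, prior_sigma)
               for m, s in zip(mus, sigmas))


# Hypothetical variational parameters for three weights.
print(kl_mean_field([0.0, 0.3, -1.2], [1.0, 0.5, 0.8]))
```

The term is zero exactly when the variational density equals the prior, and it grows as the learned means and standard deviations move away from it.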
# コンテンツ型分解による視覚基礎モデルの半監督的微調整 Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition ( http://arxiv.org/abs/2410.02069v1 ) ライセンス: Link先を確認	Mariia Drozdova, Vitaliy Kinakh, Yury Belousov, Erica Lastufka, Slava Voloshynovskiy,	(参考訳) 本稿では,限定ラベル付きデータを用いた下流タスクにおける基礎モデルの性能向上を目的とした,半教師付き微調整手法を提案する。情報理論フレームワーク内でのコンテンツスタイルの分解を利用して、事前学習された視覚基盤モデルの潜在表現を強化し、特定のタスク目標とより効果的に整合させ、分散シフトの問題に対処する。我々は、MNIST、その拡張されたバリエーション(黄色と白のストライプ)、CIFAR-10、SVHN、GalaxyMNISTを含む複数のデータセットに対するアプローチを評価した。実験は、純粋な教師付きベースライン、特に低ラベルのデータレギュレーションにおいて、テストされたデータセットの大部分に対して、凍結されたバックボーンとトレーニング可能なバックボーンの両方で改善されていることを示す。 In this paper, we present a semi-supervised fine-tuning approach designed to improve the performance of foundation models on downstream tasks with limited labeled data. By leveraging content-style decomposition within an information-theoretic framework, our method enhances the latent representations of pre-trained vision foundation models, aligning them more effectively with specific task objectives and addressing the problem of distribution shift. We evaluate our approach on multiple datasets, including MNIST, its augmented variations (with yellow and white stripes), CIFAR-10, SVHN, and GalaxyMNIST. The experiments show improvements over purely supervised baselines, particularly in low-labeled data regimes, across both frozen and trainable backbones for the majority of the tested datasets.	翻訳日:2024-11-04 09:05:40 公開日:2024-10-04
# コンテンツ型分解による視覚基礎モデルの半監督的微調整 Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition ( http://arxiv.org/abs/2410.02069v2 ) ライセンス: Link先を確認	Mariia Drozdova, Vitaliy Kinakh, Yury Belousov, Erica Lastufka, Slava Voloshynovskiy,	(参考訳) 本稿では,ラベル付きデータに制限のある下流タスクにおいて,事前学習した基礎モデルの性能向上を目的とした半教師付き微調整手法を提案する。情報理論フレームワーク内でのコンテンツスタイルの分解を利用して、事前学習された視覚基盤モデルの潜在表現を強化し、特定のタスク目標とより効果的に整合させ、分散シフトの問題に対処する。我々は、MNIST、その拡張されたバリエーション(黄色と白のストライプ)、CIFAR-10、SVHN、GalaxyMNISTを含む複数のデータセットに対するアプローチを評価した。実験では、トレーニング済みモデルの教師付き微調整ベースライン、特に低ラベルのデータレギュレーションにおいて、テスト済みデータセットの大部分の凍結およびトレーニング可能なバックボーンに対して改善が示されている。 In this paper, we present a semi-supervised fine-tuning approach designed to improve the performance of pre-trained foundation models on downstream tasks with limited labeled data. By leveraging content-style decomposition within an information-theoretic framework, our method enhances the latent representations of pre-trained vision foundation models, aligning them more effectively with specific task objectives and addressing the problem of distribution shift. We evaluate our approach on multiple datasets, including MNIST, its augmented variations (with yellow and white stripes), CIFAR-10, SVHN, and GalaxyMNIST. The experiments show improvements over supervised finetuning baseline of pre-trained models, particularly in low-labeled data regimes, across both frozen and trainable backbones for the majority of the tested datasets.	翻訳日:2024-11-04 09:05:40 公開日:2024-10-04
# EC-DIT:Adaptive Expert-Choice Routingによる拡散変換器のスケーリング EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing ( http://arxiv.org/abs/2410.02098v1 ) ライセンス: Link先を確認	Haotian Sun, Bowen Zhang, Yanghao Li, Haoshuo Huang, Tao Lei, Ruoming Pang, Bo Dai, Nan Du,	(参考訳) 拡散変換器はテキストからの画像合成に広く採用されている。これらのモデルを数十億のパラメータにスケールすることは有望であるが、現在のサイズを超えてスケールすることの有効性は十分に探究されておらず、困難なままである。画像生成の計算的不均一性を明示的に活用することにより、エキスパート・チョイス・ルーティングを持つ拡散トランスフォーマーのためのMixture-of-Experts(MoE)モデル群(EC-DIT)を新たに開発する。 EC-DITは、入力テキストの理解と各画像パッチの生成に割り当てる計算を適応的に最適化することを学習し、テキストと画像の複雑さの違いに応じた不均一な計算を可能にする。この不均一性は、EC-DITを最大970億のパラメータにスケーリングする効率的な方法を提供し、高密度モデルや従来のMoEモデルと比べて、トレーニング収束、テキスト・ツー・イメージアライメント、全体的な生成品質を大幅に向上させる。広範なアブレーションを通じて,EC-DITがエンド・ツー・エンドのトレーニングによってテキストの重要度の違いを認識し,優れたスケーラビリティと適応的な計算割り当てを示すことを明らかにする。特に,テキストと画像のアライメント評価では,我々の最大のモデルが最先端のGenEvalスコア71.68%を達成し,直感的な解釈可能性を備えつつ競争力のある推論速度を維持している。 Diffusion transformers have been widely adopted for text-to-image synthesis. While scaling these models up to billions of parameters shows promise, the effectiveness of scaling beyond current sizes remains underexplored and challenging. By explicitly exploiting the computational heterogeneity of image generations, we develop a new family of Mixture-of-Experts (MoE) models (EC-DIT) for diffusion transformers with expert-choice routing. EC-DIT learns to adaptively optimize the compute allocated to understand the input texts and generate the respective image patches, enabling heterogeneous computation aligned with varying text-image complexities. This heterogeneity provides an efficient way of scaling EC-DIT up to 97 billion parameters and achieving significant improvements in training convergence, text-to-image alignment, and overall generation quality over dense models and conventional MoE models. Through extensive ablations, we show that EC-DIT demonstrates superior scalability and adaptive compute allocation by recognizing varying textual importance through end-to-end training. 
Notably, in text-to-image alignment evaluation, our largest models achieve a state-of-the-art GenEval score of 71.68% and still maintain competitive inference speed with intuitive interpretability.	翻訳日:2024-11-04 08:55:37 公開日:2024-10-04
# EC-DIT:Adaptive Expert-Choice Routingによる拡散変換器のスケーリング EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing ( http://arxiv.org/abs/2410.02098v2 ) ライセンス: Link先を確認	Haotian Sun, Tao Lei, Bowen Zhang, Yanghao Li, Haoshuo Huang, Ruoming Pang, Bo Dai, Nan Du,	(参考訳) 拡散変換器はテキストからの画像合成に広く採用されている。これらのモデルを数十億のパラメータにスケールすることは有望であるが、現在のサイズを超えてスケールすることの有効性は十分に探究されておらず、困難なままである。画像生成の計算的不均一性を明示的に活用することにより、エキスパート・チョイス・ルーティングを持つ拡散トランスフォーマーのためのMixture-of-Experts(MoE)モデル群(EC-DIT)を新たに開発する。 EC-DITは、入力テキストの理解と各画像パッチの生成に割り当てる計算を適応的に最適化することを学習し、テキストと画像の複雑さの違いに応じた不均一な計算を可能にする。この不均一性は、EC-DITを最大970億のパラメータにスケーリングする効率的な方法を提供し、高密度モデルや従来のMoEモデルと比べて、トレーニング収束、テキスト・ツー・イメージアライメント、全体的な生成品質を大幅に向上させる。広範なアブレーションを通じて,EC-DITがエンド・ツー・エンドのトレーニングによってテキストの重要度の違いを認識し,優れたスケーラビリティと適応的な計算割り当てを示すことを明らかにする。特に,テキストと画像のアライメント評価では,我々の最大のモデルが最先端のGenEvalスコア71.68%を達成し,直感的な解釈可能性を備えつつ競争力のある推論速度を維持している。 Diffusion transformers have been widely adopted for text-to-image synthesis. While scaling these models up to billions of parameters shows promise, the effectiveness of scaling beyond current sizes remains underexplored and challenging. By explicitly exploiting the computational heterogeneity of image generations, we develop a new family of Mixture-of-Experts (MoE) models (EC-DIT) for diffusion transformers with expert-choice routing. EC-DIT learns to adaptively optimize the compute allocated to understand the input texts and generate the respective image patches, enabling heterogeneous computation aligned with varying text-image complexities. This heterogeneity provides an efficient way of scaling EC-DIT up to 97 billion parameters and achieving significant improvements in training convergence, text-to-image alignment, and overall generation quality over dense models and conventional MoE models. Through extensive ablations, we show that EC-DIT demonstrates superior scalability and adaptive compute allocation by recognizing varying textual importance through end-to-end training. 
Notably, in text-to-image alignment evaluation, our largest models achieve a state-of-the-art GenEval score of 71.68% and still maintain competitive inference speed with intuitive interpretability.	翻訳日:2024-11-04 08:45:48 公開日:2024-10-04
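Expert-choice routing, as named in the EC-DIT abstract, reverses the usual direction of Mixture-of-Experts routing: each expert selects the tokens it will process, so different tokens can receive different amounts of compute. The sketch below shows only that selection step. The affinity scores are made up, the function name is hypothetical, and a real router would compute learned scores and weight the expert outputs. It is not the EC-DIT implementation.

```python
def expert_choice_route(scores, capacity):
    """scores[t][e] is the affinity of token t for expert e.
    Each expert keeps its `capacity` highest-scoring tokens.
    Returns, for every token, the list of experts that selected it."""
    num_tokens = len(scores)
    num_experts = len(scores[0])
    assignment = [[] for _ in range(num_tokens)]
    for e in range(num_experts):
        # Rank tokens from this expert's point of view.
        ranked = sorted(range(num_tokens),
                        key=lambda t: scores[t][e], reverse=True)
        for t in ranked[:capacity]:
            assignment[t].append(e)
    return assignment


# Hypothetical affinities for 4 tokens (rows) and 2 experts (columns).
scores = [
    [0.9, 0.8],  # token 0: attractive to both experts
    [0.1, 0.2],  # token 1: attractive to neither
    [0.7, 0.1],  # token 2
    [0.2, 0.6],  # token 3
]
routing = expert_choice_route(scores, capacity=2)
print(routing)  # [[0, 1], [], [0], [1]]
```

Token 0 is processed by both experts while token 1 is processed by none, which is the kind of uneven compute allocation the abstract refers to.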
# L-CiteEval: ロングコンテキストモデルは応答の際にコンテキストを真に活用するのか? L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? ( http://arxiv.org/abs/2410.02115v1 ) ライセンス: Link先を確認	Zecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, Min Zhang,	(参考訳) 近年、LCM(Long-context Model)は、文書要約などの長いコンテキストを含むタスクを扱うための利便性をユーザに提供することで、顕著な進歩を遂げている。コミュニティが生成結果の忠実さをますます優先するにつれて、LCM出力の正確性を保証するだけでは不十分である。極めて長いコンテキストから人間が結果を検証することは非常に困難だからである。しかし、LCMが本当にコンテキストに基づいて応答しているかを評価する取り組みはいくつかあるものの、これらは特定のタスクに限定されているか、GPT-4のような外部評価リソースに大きく依存している。本研究では、引用を伴う長文コンテキスト理解のための包括的なマルチタスクベンチマークであるL-CiteEvalを導入し、LCMの理解能力と忠実さの両方を評価することを目指す。 L-CiteEvalは、8Kから48Kまでのコンテキスト長にわたり、さまざまなドメインの11のタスクをカバーし、完全に自動化された評価スイートを提供する。 11個の最先端のクローズドソースおよびオープンソースLCMを用いてテストした結果、これらのモデルは生成された結果に小さな違いがあるものの、オープンソースモデルは引用精度とリコールの点でクローズドソースモデルよりもかなり遅れていることがわかった。これは、現在のオープンソースのLCMは、与えられたコンテキストよりも、その固有の知識に基づいて応答する傾向があり、実用的なアプリケーションにおけるユーザエクスペリエンスに重大なリスクを及ぼすことを示唆している。また,RAGアプローチを評価し,RAGは生成品質がわずかに低下するものの,LCMの忠実度を著しく向上させることができることを観察した。さらに,LCMの注意機構と引用生成過程の相関関係を見いだした。 Long-context models (LCMs) have made remarkable strides in recent years, offering users great convenience for handling tasks that involve long context, such as document summarization. As the community increasingly prioritizes the faithfulness of generated results, merely ensuring the accuracy of LCM outputs is insufficient, as it is quite challenging for humans to verify the results from the extremely lengthy context. 
Yet, although some efforts have been made to assess whether LCMs respond truly based on the context, these works either are limited to specific tasks or heavily rely on external evaluation resources like GPT-4. In this work, we introduce L-CiteEval, a comprehensive multi-task benchmark for long-context understanding with citations, aiming to evaluate both the understanding capability and faithfulness of LCMs. L-CiteEval covers 11 tasks from diverse domains, spanning context lengths from 8K to 48K, and provides a fully automated evaluation suite. Through testing with 11 cutting-edge closed-source and open-source LCMs, we find that although these models show minor differences in their generated results, open-source models substantially trail behind their closed-source counterparts in terms of citation accuracy and recall. This suggests that current open-source LCMs are prone to responding based on their inherent knowledge rather than the given context, posing a significant risk to the user experience in practical applications. We also evaluate the RAG approach and observe that RAG can significantly improve the faithfulness of LCMs, albeit with a slight decrease in the generation quality. Furthermore, we discover a correlation between the attention mechanisms of LCMs and the citation generation process.	翻訳日:2024-11-04 08:45:48 公開日:2024-10-04
# L-CiteEval: ロングコンテキストモデルは応答の際にコンテキストを真に活用するのか? L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? ( http://arxiv.org/abs/2410.02115v2 ) ライセンス: Link先を確認	Zecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, Min Zhang,	(参考訳) 近年、LCM(Long-context Model)は、文書要約などの長いコンテキストを含むタスクを扱うための利便性をユーザに提供することで、顕著な進歩を遂げている。コミュニティが生成結果の忠実さをますます優先するにつれて、LCM出力の正確性を保証するだけでは不十分である。極めて長いコンテキストから人間が結果を検証することは非常に困難だからである。しかし、LCMが本当にコンテキストに基づいて応答しているかを評価する取り組みはいくつかあるものの、これらは特定のタスクに限定されているか、GPT4のような外部評価リソースに大きく依存している。本研究では、引用を伴う長文コンテキスト理解のための包括的なマルチタスクベンチマークであるL-CiteEvalを導入し、LCMの理解能力と忠実さの両方を評価することを目指す。 L-CiteEvalは、8Kから48Kまでのコンテキスト長にわたり、さまざまなドメインの11のタスクをカバーし、完全に自動化された評価スイートを提供する。 11個の最先端のクローズドソースおよびオープンソースLCMを用いてテストした結果、これらのモデルは生成された結果に小さな違いがあるものの、オープンソースモデルは引用精度とリコールの点でクローズドソースモデルよりもかなり遅れていることがわかった。これは、現在のオープンソースのLCMは、与えられたコンテキストよりも、その固有の知識に基づいて応答する傾向があり、実用的なアプリケーションにおけるユーザエクスペリエンスに重大なリスクを及ぼすことを示唆している。また,RAGアプローチを評価し,RAGは生成品質がわずかに低下するものの,LCMの忠実度を著しく向上させることができることを観察した。さらに,LCMの注意機構と引用生成過程の相関関係を見いだした。 Long-context models (LCMs) have made remarkable strides in recent years, offering users great convenience for handling tasks that involve long context, such as document summarization. As the community increasingly prioritizes the faithfulness of generated results, merely ensuring the accuracy of LCM outputs is insufficient, as it is quite challenging for humans to verify the results from the extremely lengthy context. 
Yet, although some efforts have been made to assess whether LCMs respond truly based on the context, these works either are limited to specific tasks or heavily rely on external evaluation resources like GPT4. In this work, we introduce L-CiteEval, a comprehensive multi-task benchmark for long-context understanding with citations, aiming to evaluate both the understanding capability and faithfulness of LCMs. L-CiteEval covers 11 tasks from diverse domains, spanning context lengths from 8K to 48K, and provides a fully automated evaluation suite. Through testing with 11 cutting-edge closed-source and open-source LCMs, we find that although these models show minor differences in their generated results, open-source models substantially trail behind their closed-source counterparts in terms of citation accuracy and recall. This suggests that current open-source LCMs are prone to responding based on their inherent knowledge rather than the given context, posing a significant risk to the user experience in practical applications. We also evaluate the RAG approach and observe that RAG can significantly improve the faithfulness of LCMs, albeit with a slight decrease in the generation quality. Furthermore, we discover a correlation between the attention mechanisms of LCMs and the citation generation process.	翻訳日:2024-11-04 08:45:48 公開日:2024-10-04
# 構造化行列の連続空間上での効率的な線形層の探索 Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices ( http://arxiv.org/abs/2410.02117v1 ) ライセンス: Link先を確認	Andres Potapczynski, Shikai Qiu, Marc Finzi, Christopher Ferri, Zixi Chen, Micah Goldblum, Bayan Bruss, Christopher De Sa, Andrew Gordon Wilson,	(参考訳) 密な線形層は、大規模ニューラルネットワークにおいて支配的な計算ボトルネックであり、より効率的な代替手段が強く求められている。従来の取り組みは少数の手作りの構造化行列に焦点を当てており、モデルサイズとトレーニング例の両方を最適に割り当てた場合に、これらの構造が計算最適スケーリング則の観点で密な層を上回れるかどうかは調査されてこなかった。本研究では,アインシュタイン和を通じて表現可能なすべての線形作用素の探索を可能にする統一フレームワークを提案する。このフレームワークは、低ランク、クローネッカー、テンソル・トレイン、ブロック・テンソル・トレイン(BTT)、Monarchなど、これまでに提案された多くの構造に加え、多くの新しい構造を含む。この枠組みを解析するために、計算的および代数的特性に基づく全ての演算子の分類を開発し、計算最適スケーリング則の違いは、我々が導入する少数の変数によって主に支配されていることを示す。具体的には、小さな$\omega$(パラメータの共有を計測する)と大きな$\psi$(ランクを計測する)は、確実にスケーリング則の改善につながった。計算単位あたりのパラメータを最大化するフルランク構造が最も性能が高いという知見に導かれて,BTT構造における計算のスパース化によって得られる新しいMixture-of-Experts (MoE)アーキテクチャであるBTT-MoEを提案する。フィードフォワードネットワーク全体を単位とする標準的なスパースMoEとは対照的に、BTT-MoEは、アテンションブロック内の投影行列を含むモデルのすべての線形層でMoEを学習する。 BTT-MoEは密な層や標準MoEに比べて計算効率が大幅に向上することがわかった。 Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are optimally allocated. In this work, we present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses many previously proposed structures, such as low-rank, Kronecker, Tensor-Train, Block Tensor-Train (BTT), and Monarch, along with many novel structures. To analyze the framework, we develop a taxonomy of all such operators based on their computational and algebraic properties and show that differences in the compute-optimal scaling laws are mostly governed by a small number of variables that we introduce. 
Namely, a small $\omega$ (which measures parameter sharing) and large $\psi$ (which measures the rank) reliably led to better scaling laws. Guided by the insight that full-rank structures that maximize parameters per unit of compute perform the best, we propose BTT-MoE, a novel Mixture-of-Experts (MoE) architecture obtained by sparsifying computation in the BTT structure. In contrast to the standard sparse MoE for each entire feed-forward network, BTT-MoE learns an MoE in every single linear layer of the model, including the projection matrices in the attention blocks. We find BTT-MoE provides a substantial compute-efficiency gain over dense layers and standard MoE.	翻訳日:2024-11-04 08:45:48 公開日:2024-10-04
# 構造化行列の連続空間上での効率的な線形層の探索 Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices ( http://arxiv.org/abs/2410.02117v2 ) ライセンス: Link先を確認	Andres Potapczynski, Shikai Qiu, Marc Finzi, Christopher Ferri, Zixi Chen, Micah Goldblum, Bayan Bruss, Christopher De Sa, Andrew Gordon Wilson,	(参考訳) 密な線形層は、大規模ニューラルネットワークにおいて支配的な計算ボトルネックであり、より効率的な代替手段が強く求められている。従来の取り組みは少数の手作りの構造化行列に焦点を当てており、モデルサイズとトレーニング例の両方を最適に割り当てた場合に、これらの構造が計算最適スケーリング則の観点で密な層を上回れるかどうかは調査されてこなかった。本研究では,アインシュタイン和を通じて表現可能なすべての線形作用素の探索を可能にする統一フレームワークを提案する。このフレームワークは、低ランク、クローネッカー、テンソル・トレイン、ブロック・テンソル・トレイン(BTT)、Monarchなど、これまでに提案された多くの構造に加え、多くの新しい構造を含む。この枠組みを解析するために、計算的および代数的特性に基づく全ての演算子の分類を開発し、計算最適スケーリング則の違いは、我々が導入する少数の変数によって主に支配されていることを示す。具体的には、小さな$\omega$(パラメータの共有を計測する)と大きな$\psi$(ランクを計測する)は、確実にスケーリング則の改善につながった。計算単位あたりのパラメータを最大化するフルランク構造が最も性能が高いという知見に導かれて,BTT構造における計算のスパース化によって得られる新しいMixture-of-Experts (MoE)アーキテクチャであるBTT-MoEを提案する。フィードフォワードネットワーク全体を単位とする標準的なスパースMoEとは対照的に、BTT-MoEは、アテンションブロック内の投影行列を含むモデルのすべての線形層でMoEを学習する。 BTT-MoEは密な層や標準MoEに比べて計算効率が大幅に向上することがわかった。 Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are optimally allocated. In this work, we present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses many previously proposed structures, such as low-rank, Kronecker, Tensor-Train, Block Tensor-Train (BTT), and Monarch, along with many novel structures. To analyze the framework, we develop a taxonomy of all such operators based on their computational and algebraic properties and show that differences in the compute-optimal scaling laws are mostly governed by a small number of variables that we introduce. 
Namely, a small $\omega$ (which measures parameter sharing) and large $\psi$ (which measures the rank) reliably led to better scaling laws. Guided by the insight that full-rank structures that maximize parameters per unit of compute perform the best, we propose BTT-MoE, a novel Mixture-of-Experts (MoE) architecture obtained by sparsifying computation in the BTT structure. In contrast to the standard sparse MoE for each entire feed-forward network, BTT-MoE learns an MoE in every single linear layer of the model, including the projection matrices in the attention blocks. We find BTT-MoE provides a substantial compute-efficiency gain over dense layers and standard MoE.	翻訳日:2024-11-04 08:45:48 公開日:2024-10-04
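To make "linear operators expressible via an Einstein summation" concrete, the sketch below applies a Kronecker-structured matrix to a vector as an index contraction, without ever forming the dense matrix. It illustrates one of the structures the abstract lists and is not the authors' search framework. The helper names are hypothetical, and plain Python loops are used only to keep the example dependency-free. With NumPy the same contraction could be written as `np.einsum("ij,kl,jl->ik", A, B, X)`.

```python
def kron_matvec(A, B, x):
    """Compute y = kron(A, B) @ x without building kron(A, B).
    A is m x n, B is p x q, x has length n*q.
    Einsum form: Y[i][k] = sum over j, l of A[i][j] * B[k][l] * X[j][l]."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    X = [x[j * q:(j + 1) * q] for j in range(n)]  # reshape x to n x q
    # Contract over l first: T[j][k] = sum_l B[k][l] * X[j][l]
    T = [[sum(B[k][l] * X[j][l] for l in range(q)) for k in range(p)]
         for j in range(n)]
    # Then contract over j: Y[i][k] = sum_j A[i][j] * T[j][k]
    Y = [[sum(A[i][j] * T[j][k] for j in range(n)) for k in range(p)]
         for i in range(m)]
    return [Y[i][k] for i in range(m) for k in range(p)]


def dense_kron_matvec(A, B, x):
    """Reference: build the dense Kronecker product, then multiply."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    K = [[A[i][j] * B[k][l] for j in range(n) for l in range(q)]
         for i in range(m) for k in range(p)]
    return [sum(row[c] * x[c] for c in range(n * q)) for row in K]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
x = [1, 2, 3, 4]
print(kron_matvec(A, B, x))  # [10, 7, 22, 15]
```

The structured version stores mn + pq parameters instead of mnpq and needs about npq + mnp multiplications instead of mnpq. This is the kind of parameter-versus-compute trade-off that scaling-law comparisons between structures measure.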
# C-MELT:ECG-Language事前学習のためのコントラスト強化マスク付きオートエンコーダ C-MELT: Contrastive Enhanced Masked Auto-Encoders for ECG-Language Pre-Training ( http://arxiv.org/abs/2410.02131v1 ) ライセンス: Link先を確認	Manh Pham, Aaqib Saeed, Dong Ma,	(参考訳) 心電図(ECG)信号の正確な解釈は心血管疾患の診断に重要である。 ECG信号と付随するテキストレポートを統合することは、生理学的データと質的な洞察を組み合わせることで臨床診断を強化する大きな可能性を秘めている。しかし、この統合は、固有のモダリティの相違と、堅牢なクロスモーダル学習のためのラベル付きデータの不足により、大きな課題に直面している。これらの障害に対処するために,コントラッシブマスク付きオートエンコーダアーキテクチャを用いて,ECGとテキストデータを事前学習する新しいフレームワークであるC-MELTを提案する。 C-MELTは、生成性の強さと識別能力の強化を一意に組み合わせて、堅牢なクロスモーダル表現を実現する。これは、マスク付きモダリティモデリング、特殊損失関数、およびクロスモーダルアライメントに適した改善されたネガティブサンプリング戦略によって達成される。さまざまなダウンストリームタスクにわたる5つの公開データセットに対する大規模な実験により、C-MELTは既存の手法よりも大幅に優れており、それぞれ、最先端モデルよりも線形プローブとゼロショットのパフォーマンスが15%、2%向上していることが示された。これらの結果はC-MELTの有効性を浮き彫りにしており, マルチモーダル表現による臨床診断の進歩の可能性を示している。 Accurate interpretation of Electrocardiogram (ECG) signals is pivotal for diagnosing cardiovascular diseases. Integrating ECG signals with their accompanying textual reports holds immense potential to enhance clinical diagnostics through the combination of physiological data and qualitative insights. However, this integration faces significant challenges due to inherent modality disparities and the scarcity of labeled data for robust cross-modal learning. To address these obstacles, we propose C-MELT, a novel framework that pre-trains ECG and text data using a contrastive masked auto-encoder architecture. C-MELT uniquely combines the strengths of generative with enhanced discriminative capabilities to achieve robust cross-modal representations. This is accomplished through masked modality modeling, specialized loss functions, and an improved negative sampling strategy tailored for cross-modal alignment. Extensive experiments on five public datasets across diverse downstream tasks demonstrate that C-MELT significantly outperforms existing methods, achieving 15% and 2% increases in linear probing and zero-shot performance over state-of-the-art models, respectively. 
These results highlight the effectiveness of C-MELT, underscoring its potential to advance automated clinical diagnostics through multi-modal representations.	翻訳日:2024-11-04 08:35:44 公開日:2024-10-04
# C-MELT:ECG-Language事前学習のためのコントラスト強化マスク付きオートエンコーダ C-MELT: Contrastive Enhanced Masked Auto-Encoders for ECG-Language Pre-Training ( http://arxiv.org/abs/2410.02131v2 ) ライセンス: Link先を確認	Manh Pham, Aaqib Saeed, Dong Ma,	(参考訳) 心電図(ECG)信号の正確な解釈は心血管疾患の診断に重要である。 ECG信号と付随するテキストレポートを統合することは、生理学的データと質的な洞察を組み合わせることで臨床診断を強化する大きな可能性を秘めている。しかし、この統合は、固有のモダリティの相違と、堅牢なクロスモーダル学習のためのラベル付きデータの不足により、大きな課題に直面している。これらの障害に対処するために,コントラッシブマスク付きオートエンコーダアーキテクチャを用いて,ECGとテキストデータを事前学習する新しいフレームワークであるC-MELTを提案する。 C-MELTは、生成性の強さと識別能力の強化を一意に組み合わせて、堅牢なクロスモーダル表現を実現する。これは、マスク付きモダリティモデリング、特殊損失関数、およびクロスモーダルアライメントに適した改善されたネガティブサンプリング戦略によって達成される。さまざまなダウンストリームタスクにわたる5つの公開データセットに対する大規模な実験により、C-MELTは既存の手法よりも大幅に優れており、それぞれ、最先端モデルよりも線形プローブとゼロショットのパフォーマンスが15%、2%向上していることが示された。これらの結果はC-MELTの有効性を浮き彫りにしており, マルチモーダル表現による臨床診断の進歩の可能性を示している。 Accurate interpretation of Electrocardiogram (ECG) signals is pivotal for diagnosing cardiovascular diseases. Integrating ECG signals with their accompanying textual reports holds immense potential to enhance clinical diagnostics through the combination of physiological data and qualitative insights. However, this integration faces significant challenges due to inherent modality disparities and the scarcity of labeled data for robust cross-modal learning. To address these obstacles, we propose C-MELT, a novel framework that pre-trains ECG and text data using a contrastive masked auto-encoder architecture. C-MELT uniquely combines the strengths of generative with enhanced discriminative capabilities to achieve robust cross-modal representations. This is accomplished through masked modality modeling, specialized loss functions, and an improved negative sampling strategy tailored for cross-modal alignment. Extensive experiments on five public datasets across diverse downstream tasks demonstrate that C-MELT significantly outperforms existing methods, achieving 15% and 2% increases in linear probing and zero-shot performance over state-of-the-art models, respectively. 
These results highlight the effectiveness of C-MELT, underscoring its potential to advance automated clinical diagnostics through multi-modal representations.	翻訳日:2024-11-04 08:35:44 公開日:2024-10-04
# ピクセルからトークンへ:量子化された視覚モーダルのバイトペアエンコーディング From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities ( http://arxiv.org/abs/2410.02155v1 ) ライセンス: Link先を確認	Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu,	(参考訳) マルチモーダル大言語モデルは、視覚情報とテキスト情報を統合するために大きな進歩を遂げてきたが、これらのモダリティを効果的に整合させるのにしばしば苦労している。本稿では,BPE(Byte-Pair Encoding)の原理を視覚データに適用することにより,このギャップを埋める新しい画像トークンを提案する。視覚的エンコーダを分離する従来の手法とは異なり、本手法は構造的事前情報を画像トークンに直接組み込んで、テキストのみの大規模言語モデルで使われるトークン化戦略を模倣する。この革新的なアプローチにより、Transformerモデルはモダリティをより効果的に学習し、推論することができる。理論的解析と広範な実験により,BPEイメージトケナイザは,限られたトレーニングデータであっても,MLLMのマルチモーダル理解能力を著しく向上させることを示した。提案手法は,様々なベンチマークにおける性能向上だけでなく,有望なスケーラビリティを示すとともに,より効率的かつ有能なマルチモーダル基盤モデルの実現にも寄与する可能性がある。 Multimodal Large Language Models have made significant strides in integrating visual and textual information, yet they often struggle with effectively aligning these modalities. We introduce a novel image tokenizer that bridges this gap by applying the principle of Byte-Pair Encoding (BPE) to visual data. Unlike conventional approaches that rely on separate visual encoders, our method directly incorporates structural prior information into image tokens, mirroring the successful tokenization strategies used in text-only Large Language Models. This innovative approach enables Transformer models to more effectively learn and reason across modalities. Through theoretical analysis and extensive experiments, we demonstrate that our BPE Image Tokenizer significantly enhances MLLMs' multimodal understanding capabilities, even with limited training data. Our method not only improves performance across various benchmarks but also shows promising scalability, potentially paving the way for more efficient and capable multimodal foundation models.	翻訳日:2024-11-04 08:25:54 公開日:2024-10-04
# ピクセルからトークンへ:量子化された視覚モーダルのバイトペアエンコーディング From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities ( http://arxiv.org/abs/2410.02155v2 ) ライセンス: Link先を確認	Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu,	(参考訳) マルチモーダル大言語モデルは、視覚情報とテキスト情報を統合するために大きな進歩を遂げてきたが、これらのモダリティを効果的に整合させるのにしばしば苦労している。本稿では,BPE(Byte-Pair Encoding)の原理を視覚データに適用することにより,このギャップを埋める新しい画像トークンを提案する。視覚的エンコーダを分離する従来の手法とは異なり、本手法は構造的事前情報を画像トークンに直接組み込んで、テキストのみの大規模言語モデルで使われるトークン化戦略を模倣する。この革新的なアプローチにより、Transformerモデルはモダリティをより効果的に学習し、推論することができる。理論的解析と広範な実験により,BPEイメージトケナイザは,限られたトレーニングデータであっても,MLLMのマルチモーダル理解能力を著しく向上させることを示した。提案手法は,様々なベンチマークにおける性能向上だけでなく,有望なスケーラビリティを示すとともに,より効率的かつ有能なマルチモーダル基盤モデルの実現にも寄与する可能性がある。 Multimodal Large Language Models have made significant strides in integrating visual and textual information, yet they often struggle with effectively aligning these modalities. We introduce a novel image tokenizer that bridges this gap by applying the principle of Byte-Pair Encoding (BPE) to visual data. Unlike conventional approaches that rely on separate visual encoders, our method directly incorporates structural prior information into image tokens, mirroring the successful tokenization strategies used in text-only Large Language Models. This innovative approach enables Transformer models to more effectively learn and reason across modalities. Through theoretical analysis and extensive experiments, we demonstrate that our BPE Image Tokenizer significantly enhances MLLMs' multimodal understanding capabilities, even with limited training data. Our method not only improves performance across various benchmarks but also shows promising scalability, potentially paving the way for more efficient and capable multimodal foundation models.	翻訳日:2024-11-04 08:25:54 公開日:2024-10-04
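The abstract above describes applying the Byte-Pair Encoding principle to quantized visual data. As a rough illustration of the merge step only, the sketch below runs plain BPE over one-dimensional sequences of integer token IDs. It is a simplification under stated assumptions and not the authors' tokenizer: the paper works with images, while this sketch ignores 2D structure entirely, and the names `most_frequent_pair`, `merge_pair`, `learn_bpe`, and `first_new_id` are hypothetical.

```python
from collections import Counter


def most_frequent_pair(seqs):
    """Return the most frequent adjacent ID pair across all sequences, or None."""
    counts = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def merge_pair(seq, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `seq` with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(seqs, num_merges, first_new_id):
    """Learn up to `num_merges` merges; new IDs start at `first_new_id` (assumed unused)."""
    merges = {}
    seqs = [list(s) for s in seqs]
    for step in range(num_merges):
        pair = most_frequent_pair(seqs)
        if pair is None:
            break
        new_id = first_new_id + step
        merges[pair] = new_id
        seqs = [merge_pair(s, pair, new_id) for s in seqs]
    return merges, seqs


# Toy "quantized image" sequences; the IDs stand in for codebook indices.
merges, encoded = learn_bpe([[1, 2, 1, 2, 3], [1, 2, 3]], num_merges=1, first_new_id=100)
```

Each merge shortens the sequences by replacing a frequent pair with a single new ID, which is the sense in which BPE bakes recurring structure into the vocabulary.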
# POSIX: 大規模言語モデルのための素早い感度指数 POSIX: A Prompt Sensitivity Index For Large Language Models ( http://arxiv.org/abs/2410.02185v1 ) ライセンス: Link先を確認	Anwoy Chatterjee, H S V N S Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty,	(参考訳) その顕著な能力にもかかわらず、LLM(Large Language Models)はプロンプトの小さなバリエーションに驚くほど敏感であり、スペルエラー、単語の変更、プロンプトテンプレートなどのプロンプトの小さなバリエーションに応答して、かなり異なる出力を生成することが多い。しかしながら、LLMの品質を評価する一方で、ダウンストリームタスクにおけるパフォーマンスのみに焦点をあてる傾向があり、センシティブに注意を払わないことが多い。このギャップを埋めるため,新しいPrOmpt Sensitivity IndeXのPOSIXを提案する。 POSIXの背景にある重要な考え方は、対応するプロンプトを異なるインテント保存プロンプトに置き換えることによって、所定の応答のログ化の相対的な変化を捉えることである。本研究はPOSIXの迅速な感度測定における有効性を実証する実験的な証拠を提供する。パラメータ数の増加や命令のチューニングだけでは即発感度を低下させるわけではないが、数発の例を1回だけ追加しても、ほぼ常に即発感度を低下させる。また,テンプレートの更新がMCQ型タスクでは最も感度が高いのに対して,パラフレーズ化はオープンな生成タスクでは最も感度が高いことが判明した。結果の再現コードはhttps://github.com/kowndinyarenduchintala/POSIX.comで公開されている。 Despite their remarkable capabilities, Large Language Models (LLMs) are found to be surprisingly sensitive to minor variations in prompts, often generating significantly divergent outputs in response to minor variations in the prompts, such as spelling errors, alteration of wording or the prompt template. However, while assessing the quality of an LLM, the focus often tends to be solely on its performance on downstream tasks, while very little to no attention is paid to prompt sensitivity. To fill this gap, we propose POSIX - a novel PrOmpt Sensitivity IndeX as a reliable measure of prompt sensitivity, thereby offering a more comprehensive evaluation of LLM performance. The key idea behind POSIX is to capture the relative change in loglikelihood of a given response upon replacing the corresponding prompt with a different intent-preserving prompt. We provide thorough empirical evidence demonstrating the efficacy of POSIX in capturing prompt sensitivity and subsequently use it to measure and thereby compare prompt sensitivity of various open-source LLMs. 
We find that merely increasing the parameter count or instruction tuning does not necessarily reduce prompt sensitivity, whereas adding few-shot exemplars, even just one, almost always leads to a significant decrease in prompt sensitivity. We also find that alterations to the prompt template lead to the highest sensitivity in the case of MCQ-type tasks, whereas paraphrasing results in the highest sensitivity in open-ended generation tasks. The code for reproducing our results is open-sourced at https://github.com/kowndinyarenduchintala/POSIX.	翻訳日:2024-11-04 08:15:54 公開日:2024-10-04
# POSIX: 大規模言語モデルのための素早い感度指数 POSIX: A Prompt Sensitivity Index For Large Language Models ( http://arxiv.org/abs/2410.02185v2 ) ライセンス: Link先を確認	Anwoy Chatterjee, H S V N S Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty,	(参考訳) その顕著な能力にもかかわらず、LLM(Large Language Models)はプロンプトの小さなバリエーションに驚くほど敏感であり、スペルエラー、単語の変更、プロンプトテンプレートなどのプロンプトの小さなバリエーションに応答して、かなり異なる出力を生成することが多い。しかしながら、LLMの品質を評価する一方で、ダウンストリームタスクにおけるパフォーマンスのみに焦点をあてる傾向があり、センシティブに注意を払わないことが多い。このギャップを埋めるため,新しいPrOmpt Sensitivity IndeXのPOSIXを提案する。 POSIXの背景にある重要な考え方は、対応するプロンプトを異なるインテント保存プロンプトに置き換えることによって、所定の応答のログ化の相対的な変化を捉えることである。本研究はPOSIXの迅速な感度測定における有効性を実証する実験的な証拠を提供する。パラメータ数の増加や命令のチューニングだけでは即発感度を低下させるわけではないが、数発の例を1回だけ追加しても、ほぼ常に即発感度を低下させる。また,テンプレートの変更がMCQ型タスクでは高い感度をもたらすのに対して,パラフレーズ化はオープン・エンド・ジェネレーションタスクでは高い感度をもたらすことがわかった。結果の再現コードはhttps://github.com/kowndinya-renduchintala/POSIX.comで公開されている。 Despite their remarkable capabilities, Large Language Models (LLMs) are found to be surprisingly sensitive to minor variations in prompts, often generating significantly divergent outputs in response to minor variations in the prompts, such as spelling errors, alteration of wording or the prompt template. However, while assessing the quality of an LLM, the focus often tends to be solely on its performance on downstream tasks, while very little to no attention is paid to prompt sensitivity. To fill this gap, we propose POSIX - a novel PrOmpt Sensitivity IndeX as a reliable measure of prompt sensitivity, thereby offering a more comprehensive evaluation of LLM performance. The key idea behind POSIX is to capture the relative change in loglikelihood of a given response upon replacing the corresponding prompt with a different intent-preserving prompt. We provide thorough empirical evidence demonstrating the efficacy of POSIX in capturing prompt sensitivity and subsequently use it to measure and thereby compare prompt sensitivity of various open-source LLMs. 
We find that merely increasing the parameter count or instruction tuning does not necessarily reduce prompt sensitivity, whereas adding few-shot exemplars, even just one, almost always leads to a significant decrease in prompt sensitivity. We also find that alterations to the prompt template lead to the highest sensitivity in the case of MCQ-type tasks, whereas paraphrasing results in the highest sensitivity in open-ended generation tasks. The code for reproducing our results is open-sourced at https://github.com/kowndinya-renduchintala/POSIX.	翻訳日:2024-11-04 08:15:54 公開日:2024-10-04
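The POSIX abstract describes its key idea as the relative change in log-likelihood of a response when its prompt is swapped for another intent-preserving prompt. The sketch below is one plausible reading of that idea, not the paper's definition: the abstract does not state the normalization, so dividing by response length and averaging over ordered prompt pairs are assumptions, and the function name `sensitivity_index` and its inputs are hypothetical. It expects precomputed log-likelihoods rather than calling any model.

```python
def sensitivity_index(loglik, resp_lengths):
    """Average length-normalized change in log-likelihood across prompt swaps.

    loglik[i][j] is assumed to hold log P(y_j | x_i), where y_j is the response
    produced for prompt x_j and x_0..x_{n-1} are intent-preserving variants.
    resp_lengths[j] is the token length of y_j (assumed normalization).
    """
    n = len(loglik)
    if n < 2:
        raise ValueError("need at least two prompt variants")
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # How much does the score of y_j move when x_j is replaced by x_i?
            total += abs(loglik[i][j] - loglik[j][j]) / resp_lengths[j]
    return total / (n * (n - 1))


# Two prompt variants with made-up log-likelihoods.
score = sensitivity_index([[-2.0, -4.0], [-3.0, -1.0]], [2, 2])
```

A value of zero would mean every response is scored identically under every variant of the prompt; larger values indicate stronger dependence on the exact wording.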
# Buckle Up: データキュレーションによるすべてのカスタマイズステージにおけるLLMのロバスト化 Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation ( http://arxiv.org/abs/2410.02220v1 ) ライセンス: Link先を確認	Xiaoqun Liu, Jiacheng Liang, Luoxi Tang, Chenyu You, Muchao Ye, Zhaohan Xi,	(参考訳) 大規模な言語モデル(LLM)は、"カストミゼーション(customization)"と呼ばれるプロセスを通じて下流のアプリケーションに広く適用され、微調整はドメイン固有の専門知識を統合する一般的な方法である。しかし、最近の研究では、LSMを悪意のあるサンプルでチューニングすることで、その堅牢性を損なうことができ、有害なコンテンツを増幅する脆弱性が明らかにされている。このような攻撃を緩和するために、データキュレーションを利用した効果的な防御フレームワークを提案し、LLMの観点から、コモンセンステキストを改訂し、安全性を高める。キュレートされたテキストは、カスタマイズプロセスのすべての段階で、Jailbreak攻撃を緩和することができる: 将来のJailbreakの試みに対してLLMを免疫するカスタマイズ前、Jailbreakリスクを中和するカスタマイズ中、または、妥協されたモデルを復元するカスタマイズ後。キュレートされたデータは、標準の微調整ワークフローを通じてLLMを強化するため、LLM推論中に追加モジュールを導入せず、元のカスタマイズプロセスを保存する。実験の結果、ジェイルブレイク効果は大幅に減少し、最大で100%の応答が得られた。特に,本手法は,安全関連データよりも容易なコモンセンステキストでも有効である。あらゆる段階の防御フレームワークと実験性能により、この作業は、脱獄リスクを軽減し、LLMの安全なカスタマイズを確保するための重要な進歩を示す。 Large language models (LLMs) are extensively adapted for downstream applications through a process known as "customization," with fine-tuning being a common method for integrating domain-specific expertise. However, recent studies have revealed a vulnerability that tuning LLMs with malicious samples can compromise their robustness and amplify harmful content, an attack known as "jailbreaking." To mitigate such attack, we propose an effective defensive framework utilizing data curation to revise commonsense texts and enhance their safety implication from the perspective of LLMs. The curated texts can mitigate jailbreaking attacks at every stage of the customization process: before customization to immunize LLMs against future jailbreak attempts, during customization to neutralize jailbreaking risks, or after customization to restore the compromised models. Since the curated data strengthens LLMs through the standard fine-tuning workflow, we do not introduce additional modules during LLM inference, thereby preserving the original customization process. 
Experimental results demonstrate a substantial reduction in jailbreaking effects, with up to 100% success in generating responsible responses. Notably, our method is effective even with commonsense texts, which are often more readily available than safety-relevant data. With its every-stage defensive framework and supporting experimental results, this work represents a significant advancement in mitigating jailbreaking risks and ensuring the secure customization of LLMs.	翻訳日:2024-11-04 07:55:57 公開日:2024-10-04
# Buckle Up: データキュレーションによるすべてのカスタマイズステージにおけるLLMのロバスト化 Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation ( http://arxiv.org/abs/2410.02220v2 ) ライセンス: Link先を確認	Xiaoqun Liu, Jiacheng Liang, Luoxi Tang, Chenyu You, Muchao Ye, Zhaohan Xi,	(参考訳) 大規模な言語モデル(LLM)は、"カストミゼーション(customization)"と呼ばれるプロセスを通じて下流のアプリケーションに広く適用され、微調整はドメイン固有の専門知識を統合する一般的な方法である。しかし、最近の研究では、LSMを悪意のあるサンプルでチューニングすることで、その堅牢性を損なうことができ、有害なコンテンツを増幅する脆弱性が明らかにされている。このような攻撃を緩和するために、データキュレーションを利用した効果的な防御フレームワークを提案し、LLMの観点から、コモンセンステキストを改訂し、安全性を高める。キュレートされたテキストは、カスタマイズプロセスのすべての段階で、Jailbreak攻撃を緩和することができる: 将来のJailbreakの試みに対してLLMを免疫するカスタマイズ前、Jailbreakリスクを中和するカスタマイズ中、または、妥協されたモデルを復元するカスタマイズ後。キュレートされたデータは、標準の微調整ワークフローを通じてLLMを強化するため、LLM推論中に追加モジュールを導入せず、元のカスタマイズプロセスを保存する。実験の結果、ジェイルブレイク効果は大幅に減少し、最大で100%の応答が得られた。特に,本手法は,安全関連データよりも容易なコモンセンステキストでも有効である。あらゆる段階の防御フレームワークと実験性能により、この作業は、脱獄リスクを軽減し、LLMの安全なカスタマイズを確保するための重要な進歩を示す。 Large language models (LLMs) are extensively adapted for downstream applications through a process known as "customization," with fine-tuning being a common method for integrating domain-specific expertise. However, recent studies have revealed a vulnerability that tuning LLMs with malicious samples can compromise their robustness and amplify harmful content, an attack known as "jailbreaking." To mitigate such attack, we propose an effective defensive framework utilizing data curation to revise commonsense texts and enhance their safety implication from the perspective of LLMs. The curated texts can mitigate jailbreaking attacks at every stage of the customization process: before customization to immunize LLMs against future jailbreak attempts, during customization to neutralize jailbreaking risks, or after customization to restore the compromised models. Since the curated data strengthens LLMs through the standard fine-tuning workflow, we do not introduce additional modules during LLM inference, thereby preserving the original customization process. 
Experimental results demonstrate a substantial reduction in jailbreaking effects, with up to 100% success in generating responsible responses. Notably, our method is effective even with commonsense texts, which are often more readily available than safety-relevant data. With its every-stage defensive framework and supporting experimental results, this work represents a significant advancement in mitigating jailbreaking risks and ensuring the secure customization of LLMs.	翻訳日:2024-11-04 07:55:57 公開日:2024-10-04
# コーパスノベルティのアノテーションガイドライン : その1-名前付きエンティティ認識 Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition ( http://arxiv.org/abs/2410.02281v1 ) ライセンス: Link先を確認	Arthur Amalvy, Vincent Labatut,	(参考訳) ノベルティ・コーパス(英: Novelties corpus)は、名前付きエンティティ認識(NER)に注釈を付けた小説(と小説の一部)のコレクションである。本書では、その注釈中に適用されるガイドラインについて記述する。アノテータが使用する指示に加えて、注釈付き小説から検索した多くの例や、実体としてマークすべき表現、そうすべきでない表現を含む。 The Novelties corpus is a collection of novels (and parts of novels) annotated for Named Entity Recognition (NER) among other tasks. This document describes the guidelines applied during its annotation. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels, and illustrating expressions that should be marked as entities as well as expressions that should not.	翻訳日:2024-11-04 07:36:05 公開日:2024-10-04
# マルチアーマッドバンドにおけるレイのアッパー信頼境界について On Lai's Upper Confidence Bound in Multi-Armed Bandits ( http://arxiv.org/abs/2410.02279v1 ) ライセンス: Link先を確認	Huachen Ren, Cun-Hui Zhang,	(参考訳) この記念論文では、Tze Leung Lai による多武装の盗賊のトピックへの献身的な貢献を顕彰する。ガウス報酬の探索レベルが一定である高信頼度有界指数に対して、急激な非漸近的後悔境界を確立する。さらに, 対応する腕の標本サイズに応じて減少する探索関数を用いて, 上面信頼度有界指数に対する非漸近的後悔境界を定めている。後悔境界は、レイ・ロビンズの下界と一致する鉛直定数を持つ。我々の結果は、機械学習の文献にもっと注目に値する、Laiの独創的な作品の側面を強調します。 In this memorial paper, we honor Tze Leung Lai's seminal contributions to the topic of multi-armed bandits, with a specific focus on his pioneering work on the upper confidence bound. We establish sharp non-asymptotic regret bounds for an upper confidence bound index with a constant level of exploration for Gaussian rewards. Furthermore, we establish a non-asymptotic regret bound for the upper confidence bound index of Lai (1987) which employs an exploration function that decreases with the sample size of the corresponding arm. The regret bounds have leading constants that match the Lai-Robbins lower bound. Our results highlight an aspect of Lai's seminal works that deserves more attention in the machine learning literature.	翻訳日:2024-11-04 04:12:15 公開日:2024-10-04
# マルチアーマッドバンドにおけるレイのアッパー信頼境界について On Lai's Upper Confidence Bound in Multi-Armed Bandits ( http://arxiv.org/abs/2410.02279v2 ) ライセンス: Link先を確認	Huachen Ren, Cun-Hui Zhang,	(参考訳) この記念論文では、Tze Leung Lai による多武装の盗賊のトピックへの献身的な貢献を顕彰する。ガウス報酬の探索レベルが一定である高信頼度有界指数に対して、急激な非漸近的後悔境界を確立する。さらに,Lai (1987) の高信頼束縛指数に対して,対応する腕の標本サイズに比例して減少する探索関数を用いた非漸近的後悔境界を確立する。後悔境界は、レイ・ロビンズの下界と一致する鉛直定数を持つ。我々の結果は、機械学習の文献にもっと注目に値する、Laiの独創的な作品の側面を強調します。 In this memorial paper, we honor Tze Leung Lai's seminal contributions to the topic of multi-armed bandits, with a specific focus on his pioneering work on the upper confidence bound. We establish sharp non-asymptotic regret bounds for an upper confidence bound index with a constant level of exploration for Gaussian rewards. Furthermore, we establish a non-asymptotic regret bound for the upper confidence bound index of Lai (1987) which employs an exploration function that decreases with the sample size of the corresponding arm. The regret bounds have leading constants that match the Lai-Robbins lower bound. Our results highlight an aspect of Lai's seminal works that deserves more attention in the machine learning literature.	翻訳日:2024-11-04 04:12:15 公開日:2024-10-04
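For readers unfamiliar with upper confidence bound indices, the sketch below shows a textbook UCB rule with a constant exploration level on Gaussian rewards. It is an illustration under stated assumptions, not the index analysed in the paper: Lai's 1987 index uses an exploration function that decreases with the arm's sample size, which this sketch does not implement, and the bonus form, the `level` parameter, and the function names are hypothetical choices.

```python
import math
import random


def ucb_index(mean, pulls, t, sigma=1.0, level=2.0):
    """Empirical mean plus an exploration bonus with a constant exploration level."""
    return mean + sigma * math.sqrt(level * math.log(t) / pulls)


def run_bandit(true_means, horizon, sigma=1.0, seed=0):
    """Simulate the rule on Gaussian arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            arm = max(
                range(k),
                key=lambda a: ucb_index(sums[a] / counts[a], counts[a], t, sigma),
            )
        reward = rng.gauss(true_means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
    return counts


pull_counts = run_bandit([0.0, 1.0], horizon=200)
```

The bonus shrinks as an arm is pulled more often, so under-sampled arms keep getting revisited while the arm with the highest empirical mean is favoured over time.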
# コーパスノベルティのアノテーションガイドライン : その1-名前付きエンティティ認識 Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition ( http://arxiv.org/abs/2410.02281v2 ) ライセンス: Link先を確認	Arthur Amalvy, Vincent Labatut,	(参考訳) ノベルティ・コーパス(英: Novelties corpus)は、名前付きエンティティ認識(NER)に注釈を付けた小説(と小説の一部)のコレクションである。本書では、その注釈中に適用されるガイドラインについて記述する。アノテータが使用する指示に加えて、注釈付き小説から検索した多くの例や、実体としてマークすべき表現、そうすべきでない表現を含む。 The Novelties corpus is a collection of novels (and parts of novels) annotated for Named Entity Recognition (NER) among other tasks. This document describes the guidelines applied during its annotation. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels, and illustrating expressions that should be marked as entities as well as expressions that should not.	翻訳日:2024-11-04 04:12:15 公開日:2024-10-04
# オンライン手書き文字作成におけるグリフからの切り離し Decoupling Layout from Glyph in Online Chinese Handwriting Generation ( http://arxiv.org/abs/2410.02309v1 ) ライセンス: Link先を確認	Ren-Min Si, Yan-Ming Zhang, Yi Chen,	(参考訳) テキストは人類の文明の伝達において重要な役割を担い、様々なスタイルでオンラインの手書きテキストを生成する機械を教えることは、興味深い、重要な課題である。しかし、これまでのほとんどの研究は個々の中国語フォントの生成に集中しており、完全なテキスト行の生成はほとんど探索されていない。本稿では,テキスト行が自然にレイアウトとグリフの2つの構成要素に分けることができることを示す。この分割に基づいて,この課題に階層的に対処するために,テキスト行レイアウトジェネレータと拡散型スタイリゼーションフォント合成器を併用したテキスト行レイアウトジェネレータを設計した。より具体的には、レイアウト生成装置は、テキスト内容と提供されたスタイル参照に基づいて、コンテキスト内学習を行い、各グリフの位置を自己回帰的に生成する。一方、文字埋め込み辞書、複数スケールの書体スタイルエンコーダ、及び1D U-Netベースの拡散デノイザからなるフォントシンセサイザは、所定の書体参照から抽出した書体スタイルを模倣しつつ、その位置に各フォントを生成する。 CASIA-OLHWDBの定性的および定量的実験により,本手法は構造的正確かつ識別不能な模擬サンプルを生成することができることを示した。 Text plays a crucial role in the transmission of human civilization, and teaching machines to generate online handwritten text in various styles presents an interesting and significant challenge. However, most prior work has concentrated on generating individual Chinese fonts, leaving complete text line generation largely unexplored. In this paper, we identify that text lines can naturally be divided into two components: layout and glyphs. Based on this division, we designed a text line layout generator coupled with a diffusion-based stylized font synthesizer to address this challenge hierarchically. More concretely, the layout generator performs in-context-like learning based on the text content and the provided style references to generate positions for each glyph autoregressively. Meanwhile, the font synthesizer which consists of a character embedding dictionary, a multi-scale calligraphy style encoder, and a 1D U-Net based diffusion denoiser will generate each font on its position while imitating the calligraphy style extracted from the given style references. Qualitative and quantitative experiments on the CASIA-OLHWDB demonstrate that our method is capable of generating structurally correct and indistinguishable imitation samples.	翻訳日:2024-11-04 04:00:02 公開日:2024-10-04
# オンライン手書き文字作成におけるグリフからの切り離し Decoupling Layout from Glyph in Online Chinese Handwriting Generation ( http://arxiv.org/abs/2410.02309v2 ) ライセンス: Link先を確認	Min-Si Ren, Yan-Ming Zhang, Yi Chen,	(参考訳) テキストは人類の文明の伝達において重要な役割を担い、様々なスタイルでオンラインの手書きテキストを生成する機械を教えることは、興味深い、重要な課題である。しかし、これまでのほとんどの研究は個々の中国語フォントの生成に集中しており、完全なテキスト行の生成はほとんど探索されていない。本稿では,テキスト行が自然にレイアウトとグリフの2つの構成要素に分けることができることを示す。この分割に基づいて,この課題に階層的に対処するために,テキスト行レイアウトジェネレータと拡散型スタイリゼーションフォント合成器を併用したテキスト行レイアウトジェネレータを設計した。より具体的には、レイアウト生成装置は、テキスト内容と提供されたスタイル参照に基づいて、コンテキスト内学習を行い、各グリフの位置を自己回帰的に生成する。一方、文字埋め込み辞書、複数スケールの書体スタイルエンコーダ、及び1D U-Netベースの拡散デノイザからなるフォントシンセサイザは、所定の書体参照から抽出した書体スタイルを模倣しつつ、その位置に各フォントを生成する。 CASIA-OLHWDBの定性的および定量的実験により,本手法は構造的正確かつ識別不能な模擬サンプルを生成することができることを示した。 Text plays a crucial role in the transmission of human civilization, and teaching machines to generate online handwritten text in various styles presents an interesting and significant challenge. However, most prior work has concentrated on generating individual Chinese fonts, leaving complete text line generation largely unexplored. In this paper, we identify that text lines can naturally be divided into two components: layout and glyphs. Based on this division, we designed a text line layout generator coupled with a diffusion-based stylized font synthesizer to address this challenge hierarchically. More concretely, the layout generator performs in-context-like learning based on the text content and the provided style references to generate positions for each glyph autoregressively. Meanwhile, the font synthesizer which consists of a character embedding dictionary, a multi-scale calligraphy style encoder, and a 1D U-Net based diffusion denoiser will generate each font on its position while imitating the calligraphy style extracted from the given style references. Qualitative and quantitative experiments on the CASIA-OLHWDB demonstrate that our method is capable of generating structurally correct and indistinguishable imitation samples.	翻訳日:2024-11-04 04:00:02 公開日:2024-10-04
# RAGはLLMの推論にどの程度役立つか? How Much Can RAG Help the Reasoning of LLM? ( http://arxiv.org/abs/2410.02338v1 ) ライセンス: Link先を確認	Jingyu Liu, Jiaen Lin, Yong Liu,	(参考訳) Retrieval-Augmented Generation (RAG) は、新しい知識の導入と幻覚の低減に効果があるため、現代のLarge Language Models (LLMs) において大きな人気を集めている。しかし、RAGの深い理解は依然として限られており、RAGが推論プロセスをどのように助け、RAGが推論能力を改善するのにどう役立つのかは疑問である。外部文書はドメイン固有の情報を組み込む方法として一般的に考えられているが、クエリに関連する中間的推論結果も含んでいることから、これまで検討されていないLCMの推論能力を高める可能性が示唆されている。本稿では,この問題を深く検討し,RAGが推論を補助できるのに対して,支援は限定的であることを示す。一定の深さを持つ木として推論過程を概念化すれば、RAGはより深い推論を行うLLMを支援するのに苦労する。さらに、ドキュメント内の情報はノイズをフィルタリングするために事前処理が必要である。我々は、この前処理がLLMの微調整を単純に行うのが困難であることを示し、その問題を解決するために多くのトランスフォーマー層を必要とすることをしばしば示している。問題を単純化するために,DPromptチューニングを提案する。これは,限られた変圧器層内での問題を効果的に解決し,性能が向上する。 Retrieval-Augmented Generation (RAG) has gained significant popularity in modern Large Language Models (LLMs) due to its effectiveness in introducing new knowledge and reducing hallucinations. However, the deep understanding of RAG remains limited, how does RAG help the reasoning process and can RAG help improve the reasoning capability remains question. While external documents are typically considered as a method to incorporate domain-specific information, they also contain intermediate reasoning results related to the query, this suggests that documents could enhance the reasoning capability of LLMs, which has not been previously explored. In this paper, we investigate this issue in depth and find that while RAG can assist with reasoning, the help is limited. If we conceptualize the reasoning process as a tree with fixed depth, then RAG struggles to assist LLMs in performing deeper reasoning. Additionally, the information in the documents requires preprocessing to filter out noise. We demonstrate that this preprocessing is difficult to achieve simply fine-tuning of the LLM, it often necessitates numerous additional transformer layers to solve the problem. 
To simplify the problem, we propose DPrompt tuning, which effectively resolves the issue within only a limited number of transformer layers, leading to improved performance.	翻訳日:2024-11-04 03:50:17 公開日:2024-10-04
# RAGはLLMの推論にどの程度役立つか? How Much Can RAG Help the Reasoning of LLM? ( http://arxiv.org/abs/2410.02338v2 ) ライセンス: Link先を確認	Jingyu Liu, Jiaen Lin, Yong Liu,	(参考訳) Retrieval-Augmented Generation (RAG) は、新しい知識の導入と幻覚の低減に効果があるため、現代のLarge Language Models (LLMs) において大きな人気を集めている。しかし、RAGの深い理解は依然として限られており、RAGが推論プロセスをどのように助け、RAGが推論能力を改善するのにどう役立つのかは疑問である。外部文書はドメイン固有の情報を組み込む方法として一般的に考えられているが、クエリに関連する中間的推論結果も含んでいることから、これまで検討されていないLCMの推論能力を高める可能性が示唆されている。本稿では,この問題を深く検討し,RAGが推論を補助できるのに対して,支援は限定的であることを示す。一定の深さを持つ木として推論過程を概念化すれば、RAGはより深い推論を行うLLMを支援するのに苦労する。さらに、ドキュメント内の情報はノイズをフィルタリングするために事前処理が必要である。我々は、この前処理がLLMの微調整を単純に行うのが困難であることを示し、その問題を解決するために多くのトランスフォーマー層を必要とすることをしばしば示している。問題を単純化するために,DPromptチューニングを提案する。これは,限られた変圧器層内での問題を効果的に解決し,性能が向上する。 Retrieval-Augmented Generation (RAG) has gained significant popularity in modern Large Language Models (LLMs) due to its effectiveness in introducing new knowledge and reducing hallucinations. However, the deep understanding of RAG remains limited, how does RAG help the reasoning process and can RAG help improve the reasoning capability remains question. While external documents are typically considered as a method to incorporate domain-specific information, they also contain intermediate reasoning results related to the query, this suggests that documents could enhance the reasoning capability of LLMs, which has not been previously explored. In this paper, we investigate this issue in depth and find that while RAG can assist with reasoning, the help is limited. If we conceptualize the reasoning process as a tree with fixed depth, then RAG struggles to assist LLMs in performing deeper reasoning. Additionally, the information in the documents requires preprocessing to filter out noise. We demonstrate that this preprocessing is difficult to achieve simply fine-tuning of the LLM, it often necessitates numerous additional transformer layers to solve the problem. 
To simplify the problem, we propose DPrompt tuning, which effectively resolves the issue within only a limited number of transformer layers, leading to improved performance.	翻訳日:2024-11-04 03:50:17 公開日:2024-10-04
# IoT-LLM: 大規模言語モデルによる実世界のIoTタスク推論の強化 IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models ( http://arxiv.org/abs/2410.02429v1 ) ライセンス: Link先を確認	Tuo An, Yunjiao Zhou, Han Zou, Jianfei Yang,	(参考訳) 大規模言語モデル(LLM)は、テキストと視覚領域にまたがる顕著な能力を示してきたが、しばしば物理法則に違反した出力を生成し、物理世界に対する理解のギャップを明らかにしている。知覚が推論の基礎となる人間の認知に触発されて,モノのインターネット(IoT)センサデータを用いた知覚能力の向上と,物理世界でのIoTタスク推論に関する関連する知識について検討する。本研究では,実世界のIoTタスクに対処するLLMの能力を,認識と知識ベースを増強して体系的に研究し,その能力を高めるために統合されたフレームワークであるIoT-LLMを提案する。 IoT-LLMでは、IoTデータをLLMに対応可能なフォーマットにプリプロセッシングし、チェーン・オブ・シンクレットのプロンプトと特殊な役割定義を通じてコモンセンスの知識を活性化し、コンテキスト内学習に基づくIoT指向の検索強化生成を通じて理解を深める、という3つのステップをカスタマイズします。性能を評価するため、我々は、異なるデータタイプと推論困難を持つ5つの実世界のIoTタスクを備えた新しいベンチマークを設計し、6つのオープンソースおよびオープンソースLLM上でベンチマーク結果を提供する。実験の結果,これらのタスクを効果的に実行できないテキスト入力による既存のLLMの限界が示された。 GPT-4 などの LLM を推論した IoT タスクの性能は IoT-LLM により大幅に向上し,従来の手法と比較して,各タスクの平均 65% の改善が達成された。結果は、推論プロセスを提供することで、IoTデータとデータの背後にある物理法則を理解するLLMの能力を示す。我々の研究の限界は、この新時代の将来の研究に刺激を与えると主張されている。 Large Language Models (LLMs) have demonstrated remarkable capabilities across textual and visual domains but often generate outputs that violate physical laws, revealing a gap in their understanding of the physical world. Inspired by human cognition, where perception is fundamental to reasoning, we explore augmenting LLMs with enhanced perception abilities using Internet of Things (IoT) sensor data and pertinent knowledge for IoT task reasoning in the physical world. In this work, we systematically study LLMs capability to address real-world IoT tasks by augmenting their perception and knowledge base, and then propose a unified framework, IoT-LLM, to enhance such capability. In IoT-LLM, we customize three steps for LLMs: preprocessing IoT data into formats amenable to LLMs, activating their commonsense knowledge through chain-of-thought prompting and specialized role definitions, and expanding their understanding via IoT-oriented retrieval-augmented generation based on in-context learning. 
To evaluate the performance, we design a new benchmark with five real-world IoT tasks with different data types and reasoning difficulties and provide the benchmarking results on six open-source and closed-source LLMs. Experimental results demonstrate the limitations of existing LLMs with naive textual inputs that cannot perform these tasks effectively. We show that IoT-LLM significantly enhances the IoT task reasoning performance of LLMs such as GPT-4, achieving an average improvement of 65% across various tasks against previous methods. The results also showcase LLMs' ability to comprehend IoT data and the physical laws behind the data by providing a reasoning process. The limitations of our work are stated to inspire future research in this new era.	翻訳日:2024-11-04 03:20:51 公開日:2024-10-04
# IoT-LLM: 大規模言語モデルによる実世界のIoTタスク推論の強化 IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models ( http://arxiv.org/abs/2410.02429v2 ) ライセンス: Link先を確認	Tuo An, Yunjiao Zhou, Han Zou, Jianfei Yang,	(参考訳) 大規模言語モデル(LLM)は、テキストと視覚領域にまたがる顕著な能力を示してきたが、しばしば物理法則に違反した出力を生成し、物理世界に対する理解のギャップを明らかにしている。知覚が推論の基礎となる人間の認知に触発されて,モノのインターネット(IoT)センサデータを用いた知覚能力の向上と,物理世界でのIoTタスク推論に関する関連する知識について検討する。本研究では,実世界のIoTタスクに対処するLLMの能力を,認識と知識ベースを増強して体系的に研究し,その能力を高めるために統合されたフレームワークであるIoT-LLMを提案する。 IoT-LLMでは、IoTデータをLLMに対応可能なフォーマットにプリプロセッシングし、チェーン・オブ・シンクレットのプロンプトと特殊な役割定義を通じてコモンセンスの知識を活性化し、コンテキスト内学習に基づくIoT指向の検索強化生成を通じて理解を深める、という3つのステップをカスタマイズします。性能を評価するため、我々は、異なるデータタイプと推論困難を持つ5つの実世界のIoTタスクを備えた新しいベンチマークを設計し、6つのオープンソースおよびオープンソースLLM上でベンチマーク結果を提供する。実験の結果,これらのタスクを効果的に実行できないテキスト入力による既存のLLMの限界が示された。 GPT-4 などの LLM を推論した IoT タスクの性能は IoT-LLM により大幅に向上し,従来の手法と比較して,各タスクの平均 65% の改善が達成された。結果は、推論プロセスを提供することで、IoTデータとデータの背後にある物理法則を理解するLLMの能力を示す。我々の研究の限界は、この新時代の将来の研究に刺激を与えると主張されている。 Large Language Models (LLMs) have demonstrated remarkable capabilities across textual and visual domains but often generate outputs that violate physical laws, revealing a gap in their understanding of the physical world. Inspired by human cognition, where perception is fundamental to reasoning, we explore augmenting LLMs with enhanced perception abilities using Internet of Things (IoT) sensor data and pertinent knowledge for IoT task reasoning in the physical world. In this work, we systematically study LLMs capability to address real-world IoT tasks by augmenting their perception and knowledge base, and then propose a unified framework, IoT-LLM, to enhance such capability. In IoT-LLM, we customize three steps for LLMs: preprocessing IoT data into formats amenable to LLMs, activating their commonsense knowledge through chain-of-thought prompting and specialized role definitions, and expanding their understanding via IoT-oriented retrieval-augmented generation based on in-context learning. 
To evaluate the performance, we design a new benchmark with five real-world IoT tasks with different data types and reasoning difficulties and provide the benchmarking results on six open-source and closed-source LLMs. Experimental results demonstrate the limitations of existing LLMs with naive textual inputs that cannot perform these tasks effectively. We show that IoT-LLM significantly enhances the IoT task reasoning performance of LLMs such as GPT-4, achieving an average improvement of 65% across various tasks against previous methods. The results also showcase LLMs' ability to comprehend IoT data and the physical laws behind the data by providing a reasoning process. The limitations of our work are stated to inspire future research in this new era.	翻訳日:2024-11-04 03:20:51 公開日:2024-10-04
# MedVisionLlama: トレーニング済みの大規模言語モデルレイヤを活用して医療画像のセグメンテーションを促進する MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation ( http://arxiv.org/abs/2410.02458v1 ) ライセンス: Link先を確認	Gurucharan Marthi Krishna Kumar, Aman Chadha, Janine Mendola, Amir Shmuel,	(参考訳) テキストデータにおける汎用性で知られる大規模言語モデル (LLM) は, 正確な画像診断を行う上で重要な課題である, 医用画像のセグメンテーションを強化する可能性について, 研究が進んでいる。本研究では、予め訓練されたLCMトランスブロックを統合することで、医用画像セグメンテーションのためのビジョントランス(ViT)の強化について検討する。凍結LDMトランスバータブロックをViTモデルエンコーダに組み込んだアプローチにより,様々な医用画像モダリティのセグメンテーション性能が大幅に向上した。本稿では,グローバルな特徴学習と局所的な特徴学習を組み合わせたハイブリッド注意機構を提案する。改良されたモデルでは、平均Diceスコアが0.74から0.79に向上し、精度、精度、ジャカード指数が向上した。これらの結果は, 医用画像分割の精細化におけるLLMトランスフォーマーの有効性を示し, モデル精度とロバスト性を大幅に向上させる可能性を強調した。ソースコードと実装は以下の通りである。 Large Language Models (LLMs), known for their versatility in textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study explores enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs	翻訳日:2024-11-04 03:11:05 公開日:2024-10-04
# MedVisionLlama: トレーニング済みの大規模言語モデルレイヤを活用して医療画像のセグメンテーションを促進する MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation ( http://arxiv.org/abs/2410.02458v2 ) ライセンス: Link先を確認	Gurucharan Marthi Krishna Kumar, Aman Chadha, Janine Mendola, Amir Shmuel,	(参考訳) テキストデータにおける汎用性で知られる大規模言語モデル (LLM) は, 正確な画像診断を行う上で重要な課題である, 医用画像のセグメンテーションを強化する可能性について, 研究が進んでいる。本研究では、予め訓練されたLCMトランスブロックを統合することで、医用画像セグメンテーションのためのビジョントランス(ViT)の強化について検討する。凍結LDMトランスバータブロックをViTモデルエンコーダに組み込んだアプローチにより,様々な医用画像モダリティのセグメンテーション性能が大幅に向上した。本稿では,グローバルな特徴学習と局所的な特徴学習を組み合わせたハイブリッド注意機構を提案する。改良されたモデルでは、平均Diceスコアが0.74から0.79に向上し、精度、精度、ジャカード指数が向上した。これらの結果は, 医用画像分割の精細化におけるLLMトランスフォーマーの有効性を示し, モデル精度とロバスト性を大幅に向上させる可能性を強調した。ソースコードと実装は以下の通りである。 Large Language Models (LLMs), known for their versatility in textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study explores enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs	翻訳日:2024-11-04 03:11:05 公開日:2024-10-04
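The MedVisionLlama abstract reports gains in the Dice score and the Jaccard Index. For reference, the sketch below computes both overlap metrics from their standard definitions on flat binary masks. It is not taken from the authors' code: the function name `dice_and_jaccard` is hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two equal-length binary masks given as flat sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + target_sum)
    jaccard = inter / union
    return dice, jaccard


# One overlapping pixel out of two predicted and two ground-truth pixels.
dice, jaccard = dice_and_jaccard([1, 1, 0, 0], [1, 0, 1, 0])
```

The two metrics are monotonically related through Jaccard = Dice / (2 - Dice), so an improvement in one implies an improvement in the other on the same mask pair.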
# 拡散モデルは進化的アルゴリズムである Diffusion Models are Evolutionary Algorithms ( http://arxiv.org/abs/2410.02543v1 ) ライセンス: Link先を確認	Yanbo Zhang, Benedikt Hartl, Hananel Hazan, Michael Levin,	(参考訳) 機械学習と生物学の融合領域において、拡散モデルが進化的アルゴリズムであることを明らかにする。進化をデノイジング過程、逆向きの進化を拡散とみなすことにより、拡散モデルが本質的に進化的アルゴリズムを実行し、選択、突然変異、生殖隔離を自然に包含することを数学的に示す。この同値性に基づいて拡散進化法(Diffusion Evolution method)を提案する。これは、拡散モデルの文脈で最初に導入された反復的デノイジングを利用して、パラメータ空間における解をヒューリスティックに洗練する進化的アルゴリズムである。従来のアプローチとは異なり、拡散進化は複数の最適解を効率的に同定し、主要な進化的アルゴリズムより優れている。さらに、拡散モデルの先進的な概念である潜在空間拡散と加速サンプリングを活用して、潜在空間拡散進化(Latent Space Diffusion Evolution)を導入する。これは、高次元の複雑なパラメータ空間における進化的タスクの解を、計算ステップを大幅に削減しながら求める。この拡散と進化の間の類似性は、2つの異なる分野を橋渡しするだけでなく、相互拡張のための新たな道を開き、オープンエンド進化に関する疑問を提起し、拡散進化の文脈において非ガウス的または離散的な拡散モデルを利用する可能性を示す。 In a convergence of machine learning and biology, we reveal that diffusion models are evolutionary algorithms. By considering evolution as a denoising process and reversed evolution as diffusion, we mathematically demonstrate that diffusion models inherently perform evolutionary algorithms, naturally encompassing selection, mutation, and reproductive isolation. Building on this equivalence, we propose the Diffusion Evolution method: an evolutionary algorithm utilizing iterative denoising -- as originally introduced in the context of diffusion models -- to heuristically refine solutions in parameter spaces. Unlike traditional approaches, Diffusion Evolution efficiently identifies multiple optimal solutions and outperforms prominent mainstream evolutionary algorithms. Furthermore, leveraging advanced concepts from diffusion models, namely latent space diffusion and accelerated sampling, we introduce Latent Space Diffusion Evolution, which finds solutions for evolutionary tasks in high-dimensional complex parameter space while significantly reducing computational steps. 
This parallel between diffusion and evolution not only bridges two different fields but also opens new avenues for mutual enhancement, raising questions about open-ended evolution and potentially utilizing non-Gaussian or discrete diffusion models in the context of Diffusion Evolution.	翻訳日:2024-11-04 02:31:52 公開日:2024-10-04
# 拡散モデルは進化的アルゴリズムである Diffusion Models are Evolutionary Algorithms ( http://arxiv.org/abs/2410.02543v2 ) ライセンス: Link先を確認	Yanbo Zhang, Benedikt Hartl, Hananel Hazan, Michael Levin,	(参考訳) 機械学習と生物学の融合領域において、拡散モデルが進化的アルゴリズムであることを明らかにする。進化をデノイジング過程、逆向きの進化を拡散とみなすことにより、拡散モデルが本質的に進化的アルゴリズムを実行し、選択、突然変異、生殖隔離を自然に包含することを数学的に示す。この同値性に基づいて拡散進化法(Diffusion Evolution method)を提案する。これは、拡散モデルの文脈で最初に導入された反復的デノイジングを利用して、パラメータ空間における解をヒューリスティックに洗練する進化的アルゴリズムである。従来のアプローチとは異なり、拡散進化は複数の最適解を効率的に同定し、主要な進化的アルゴリズムより優れている。さらに、拡散モデルの先進的な概念である潜在空間拡散と加速サンプリングを活用して、潜在空間拡散進化(Latent Space Diffusion Evolution)を導入する。これは、高次元の複雑なパラメータ空間における進化的タスクの解を、計算ステップを大幅に削減しながら求める。この拡散と進化の間の類似性は、2つの異なる分野を橋渡しするだけでなく、相互拡張のための新たな道を開き、オープンエンド進化に関する疑問を提起し、拡散進化の文脈において非ガウス的または離散的な拡散モデルを利用する可能性を示す。 In a convergence of machine learning and biology, we reveal that diffusion models are evolutionary algorithms. By considering evolution as a denoising process and reversed evolution as diffusion, we mathematically demonstrate that diffusion models inherently perform evolutionary algorithms, naturally encompassing selection, mutation, and reproductive isolation. Building on this equivalence, we propose the Diffusion Evolution method: an evolutionary algorithm utilizing iterative denoising -- as originally introduced in the context of diffusion models -- to heuristically refine solutions in parameter spaces. Unlike traditional approaches, Diffusion Evolution efficiently identifies multiple optimal solutions and outperforms prominent mainstream evolutionary algorithms. Furthermore, leveraging advanced concepts from diffusion models, namely latent space diffusion and accelerated sampling, we introduce Latent Space Diffusion Evolution, which finds solutions for evolutionary tasks in high-dimensional complex parameter space while significantly reducing computational steps. 
This parallel between diffusion and evolution not only bridges two different fields but also opens new avenues for mutual enhancement, raising questions about open-ended evolution and potentially utilizing non-Gaussian or discrete diffusion models in the context of Diffusion Evolution.	翻訳日:2024-11-04 02:31:52 公開日:2024-10-04
# 自動音声認識におけるスペクトログラム圧縮のための畳み込み変分オートエンコーダ Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition ( http://arxiv.org/abs/2410.02560v1 ) ライセンス: Link先を確認	Olga Yakovenko, Ivan Bondarenko,	(参考訳) 多くの自動音声認識(ASR)タスクでは、スペクトログラムの音声特徴がメル周波数ケプストラム係数(MFCC)よりも良い結果を示すが、実際には特徴空間の次元が複雑なため使用が困難である。本論文では、畳み込み変分オートエンコーダ(VAE)に基づいて圧縮スペクトログラム表現を生成する代替手法を提案する。畳み込みVAEモデルは、13次元の埋め込みから短いオーディオスペクトログラム(25ms)の断片を再構成するために、LibriSpeechデータセットのサブサンプルで訓練された。40次元(300ms)の埋め込み用に訓練されたモデルを用いて、GoogleSpeechCommandsデータセットの音声コマンドコーパスの特徴量を生成した。生成された特徴量を用いてASRシステムを構築し、MFCC特徴量を用いたモデルと比較した。 For many Automatic Speech Recognition (ASR) tasks audio features as spectrograms show better results than Mel-frequency Cepstral Coefficients (MFCC), but in practice they are hard to use due to a complex dimensionality of a feature space. The following paper presents an alternative approach towards generating compressed spectrogram representation, based on Convolutional Variational Autoencoders (VAE). A Convolutional VAE model was trained on a subsample of the LibriSpeech dataset to reconstruct short fragments of audio spectrograms (25 ms) from a 13-dimensional embedding. The trained model for a 40-dimensional (300 ms) embedding was used to generate features for corpus of spoken commands on the GoogleSpeechCommands dataset. Using the generated features an ASR system was built and compared to the model with MFCC features.	翻訳日:2024-11-04 02:31:52 公開日:2024-10-04
# 自動音声認識におけるスペクトログラム圧縮のための畳み込み変分オートエンコーダ Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition ( http://arxiv.org/abs/2410.02560v2 ) ライセンス: Link先を確認	Olga Iakovenko, Ivan Bondarenko,	(参考訳) 多くの自動音声認識(ASR)タスクでは、スペクトログラムの音声特徴がメル周波数ケプストラム係数(MFCC)よりも良い結果を示すが、実際には特徴空間の次元が複雑なため使用が困難である。本論文では、畳み込み変分オートエンコーダ(VAE)に基づいて圧縮スペクトログラム表現を生成する代替手法を提案する。畳み込みVAEモデルは、13次元の埋め込みから短いオーディオスペクトログラム(25ms)の断片を再構成するために、LibriSpeechデータセットのサブサンプルで訓練された。40次元(300ms)の埋め込み用に訓練されたモデルを用いて、GoogleSpeechCommandsデータセットの音声コマンドコーパスの特徴量を生成した。生成された特徴量を用いてASRシステムを構築し、MFCC特徴量を用いたモデルと比較した。 For many Automatic Speech Recognition (ASR) tasks audio features as spectrograms show better results than Mel-frequency Cepstral Coefficients (MFCC), but in practice they are hard to use due to a complex dimensionality of a feature space. The following paper presents an alternative approach towards generating compressed spectrogram representation, based on Convolutional Variational Autoencoders (VAE). A Convolutional VAE model was trained on a subsample of the LibriSpeech dataset to reconstruct short fragments of audio spectrograms (25 ms) from a 13-dimensional embedding. The trained model for a 40-dimensional (300 ms) embedding was used to generate features for corpus of spoken commands on the GoogleSpeechCommands dataset. Using the generated features an ASR system was built and compared to the model with MFCC features.	翻訳日:2024-11-04 02:31:52 公開日:2024-10-04
# 他人の予測から3次元知覚を学ぶ Learning 3D Perception from Others' Predictions ( http://arxiv.org/abs/2410.02646v1 ) ライセンス: Link先を確認	Jinsu Yoo, Zhenyang Feng, Tai-Yu Pan, Yihong Sun, Cheng Perng Phoo, Xiangyu Chen, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao,	(参考訳) 実環境における高精度な3Dオブジェクト検出には,高品質な大量の注釈付きデータが必要である。このようなデータを取得するのは面倒で費用がかかるため、新しいセンサーが採用されたり、検出器が新しい環境にデプロイされたりする際には、繰り返し作業が必要になることが多い。本研究では,3次元物体検出器を構築するための新たなシナリオとして,正確な検出器を備えた近傍ユニットの予測から学習する設定について検討する。例えば、自動運転車が新しいエリアに入ると、その領域に最適化された検出器を持つ他の交通参加者から学ぶことができる。この設定はラベル効率、センサ非依存、通信効率が高い:近くのユニットは予測をエゴエージェント(例えば車)と共有するだけでよい。しかし、受信した予測を単純に正解(ground truth)として用いてエゴ車の検出器を訓練すると、性能の低下につながる。本研究では,問題を体系的に検討し,視点のミスマッチと(同期やGPSの誤差による)位置ずれを主な原因として特定する。これらは偽陽性,偽陰性,不正確な擬似ラベルを不可避的に生じさせる。距離に基づくカリキュラムを提案し、まず、類似した視点を持つ近接したユニットから学習し、その後、自己学習によって他のユニットの予測の質を向上させる。さらに、有効な擬似ラベルリファインメントモジュールを少数の注釈付きデータでトレーニングできることを示し、オブジェクト検出器のトレーニングに必要なデータ量を大幅に削減する。我々は、エゴカーの擬似ラベルとして参照車の予測を用いて、最近リリースされた実世界の協調運転データセットに対するアプローチを検証する。いくつかのシナリオ(異なるセンサ、検出器、ドメインなど)を含む広範囲な実験は、他のユニットの予測から3D知覚をラベル効率よく学習するアプローチの有効性を実証している。 Accurate 3D object detection in real-world environments requires a huge amount of annotated data with high quality. Acquiring such data is tedious and expensive, and often needs repeated effort when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units only need to share the predictions with the ego agent (e.g., car). Naively using the received predictions as ground-truths to train the detector for the ego car, however, leads to inferior performance. 
We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum, first learning from closer units with similar viewpoints and subsequently improving the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with a handful of annotated data, largely reducing the data quantity necessary to train an object detector. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments including several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions.	翻訳日:2024-11-04 01:52:35 公開日:2024-10-04
# 他人の予測から3次元知覚を学ぶ Learning 3D Perception from Others' Predictions ( http://arxiv.org/abs/2410.02646v2 ) ライセンス: Link先を確認	Jinsu Yoo, Zhenyang Feng, Tai-Yu Pan, Yihong Sun, Cheng Perng Phoo, Xiangyu Chen, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao,	(参考訳) 実環境における高精度な3Dオブジェクト検出には,高品質な大量の注釈付きデータが必要である。このようなデータを取得するのは面倒で費用がかかるため、新しいセンサーが採用されたり、検出器が新しい環境にデプロイされたりする際には、繰り返し作業が必要になることが多い。本研究では,3次元物体検出器を構築するための新たなシナリオとして,正確な検出器を備えた近傍ユニットの予測から学習する設定について検討する。例えば、自動運転車が新しいエリアに入ると、その領域に最適化された検出器を持つ他の交通参加者から学ぶことができる。この設定はラベル効率、センサ非依存、通信効率が高い:近くのユニットは予測をエゴエージェント(例えば車)と共有するだけでよい。しかし、受信した予測を単純に正解(ground truth)として用いてエゴ車の検出器を訓練すると、性能の低下につながる。本研究では,問題を体系的に検討し,視点のミスマッチと(同期やGPSの誤差による)位置ずれを主な原因として特定する。これらは偽陽性,偽陰性,不正確な擬似ラベルを不可避的に生じさせる。距離に基づくカリキュラムを提案し、まず、類似した視点を持つ近接したユニットから学習し、その後、自己学習によって他のユニットの予測の質を向上させる。さらに、有効な擬似ラベルリファインメントモジュールを少数の注釈付きデータでトレーニングできることを示し、オブジェクト検出器のトレーニングに必要なデータ量を大幅に削減する。我々は、エゴカーの擬似ラベルとして参照車の予測を用いて、最近リリースされた実世界の協調運転データセットに対するアプローチを検証する。いくつかのシナリオ(異なるセンサ、検出器、ドメインなど)を含む広範囲な実験は、他のユニットの予測から3D知覚をラベル効率よく学習するアプローチの有効性を実証している。 Accurate 3D object detection in real-world environments requires a huge amount of annotated data with high quality. Acquiring such data is tedious and expensive, and often needs repeated effort when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units only need to share the predictions with the ego agent (e.g., car). Naively using the received predictions as ground-truths to train the detector for the ego car, however, leads to inferior performance. 
We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum, first learning from closer units with similar viewpoints and subsequently improving the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with a handful of annotated data, largely reducing the data quantity necessary to train an object detector. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments including several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions.	翻訳日:2024-11-04 01:52:35 公開日:2024-10-04
# 合成データを用いたビデオインストラクションチューニング Video Instruction Tuning With Synthetic Data ( http://arxiv.org/abs/2410.02713v1 ) ライセンス: Link先を確認	Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li,	(参考訳) ビデオ大マルチモーダルモデル(LMM)の開発は,Webから大量の高品質な生データを収集することの難しさによって妨げられている。そこで本稿では,LLaVA-Video-178Kというビデオ命令追従のための高品質な合成データセットを作成する方法を提案する。このデータセットには、詳細なキャプション、オープンエンド質問回答(QA)、複数選択QAといった重要なタスクが含まれている。このデータセットをトレーニングすることにより、既存の視覚的インストラクションチューニングデータと組み合わせて、新しいビデオLMMであるLLaVA-Videoを導入する。実験の結果,LLaVA-Videoは様々なビデオベンチマークで高い性能を示し,データセットの有効性を強調した。データセット、生成パイプライン、モデルチェックポイントをリリースする予定です。 The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web. To address this, we propose an alternative approach by creating a high-quality synthetic dataset specifically for video instruction-following, namely LLaVA-Video-178K. This dataset includes key tasks such as detailed captioning, open-ended question-answering (QA), and multiple-choice QA. By training on this dataset, in combination with existing visual instruction tuning data, we introduce LLaVA-Video, a new video LMM. Our experiments demonstrate that LLaVA-Video achieves strong performance across various video benchmarks, highlighting the effectiveness of our dataset. We plan to release the dataset, its generation pipeline, and the model checkpoints.	翻訳日:2024-11-04 01:23:03 公開日:2024-10-04
# 合成データを用いたビデオインストラクションチューニング Video Instruction Tuning With Synthetic Data ( http://arxiv.org/abs/2410.02713v2 ) ライセンス: Link先を確認	Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li,	(参考訳) ビデオ大マルチモーダルモデル(LMM)の開発は,Webから大量の高品質な生データを収集することの難しさによって妨げられている。そこで本稿では,LLaVA-Video-178Kというビデオ命令追従のための高品質な合成データセットを作成する方法を提案する。このデータセットには、詳細なキャプション、オープンエンド質問回答(QA)、複数選択QAといった重要なタスクが含まれている。このデータセットをトレーニングすることにより、既存の視覚的インストラクションチューニングデータと組み合わせて、新しいビデオLMMであるLLaVA-Videoを導入する。実験の結果,LLaVA-Videoは様々なビデオベンチマークで高い性能を示し,データセットの有効性を強調した。データセット、生成パイプライン、モデルチェックポイントをリリースする予定です。 The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web. To address this, we propose an alternative approach by creating a high-quality synthetic dataset specifically for video instruction-following, namely LLaVA-Video-178K. This dataset includes key tasks such as detailed captioning, open-ended question-answering (QA), and multiple-choice QA. By training on this dataset, in combination with existing visual instruction tuning data, we introduce LLaVA-Video, a new video LMM. Our experiments demonstrate that LLaVA-Video achieves strong performance across various video benchmarks, highlighting the effectiveness of our dataset. We plan to release the dataset, its generation pipeline, and the model checkpoints.	翻訳日:2024-11-04 01:23:03 公開日:2024-10-04
# 正義か偏見か? LLM-as-a-Judgeにおけるバイアスの定量化 Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge ( http://arxiv.org/abs/2410.02736v1 ) ライセンス: Link先を確認	Jiayi Ye, Yanbo Wang, Yue Huang, Dongping Chen, Qihui Zhang, Nuno Moniz, Tian Gao, Werner Geyer, Chao Huang, Pin-Yu Chen, Nitesh V Chawla, Xiangliang Zhang,	(参考訳) LLM-as-a-Judgeは様々なベンチマークで評価手法として広く利用されており、モデルトレーニングにおける教師付き報酬として機能している。しかし、多くのドメインで優れているにもかかわらず、潜在的な問題は未調査であり、その信頼性と実用性の範囲を損なう。そこで本研究では,12の主要な潜在的バイアスを特定し,LLM-as-a-Judgeにおける各種類のバイアスを,自動化された原則に基づく修正を用いて体系的に定量化し解析する,新しい自動バイアス定量化フレームワークであるCALMを提案する。実験では,複数の人気言語モデルについて検討し,高度なモデルが全体として優れた性能を達成する一方で,特定のタスクにおいて重大なバイアスが持続することを示した。実験結果から, LLM-as-a-Judgeの信頼性は改善の余地があることが示唆された。さらに,これらのバイアスの明示的および暗黙的な影響についても論じ,LLM-as-a-Judgeを信頼性高く適用するための提案を行う。本研究は、これらの問題に対処するステークホルダの必要性を強調し、LLM-as-a-Judgeの適用において注意を払うようユーザに促す。 LLM-as-a-Judge has been widely utilized as an evaluation method in various benchmarks and served as supervised rewards in model training. However, despite their excellence in many domains, potential issues are under-explored, undermining their reliability and the scope of their utility. Therefore, we identify 12 key potential biases and propose a new automated bias quantification framework-CALM-which systematically quantifies and analyzes each type of bias in LLM-as-a-Judge by using automated and principle-guided modification. Our experiments cover multiple popular language models, and the results indicate that while advanced models have achieved commendable overall performance, significant biases persist in certain specific tasks. Empirical results suggest that there remains room for improvement in the reliability of LLM-as-a-Judge. Moreover, we also discuss the explicit and implicit influence of these biases and give some suggestions for the reliable application of LLM-as-a-Judge. Our work highlights the need for stakeholders to address these issues and remind users to exercise caution in LLM-as-a-Judge applications.	翻訳日:2024-11-04 01:13:18 公開日:2024-10-04
# 正義か偏見か? LLM-as-a-Judgeにおけるバイアスの定量化 Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge ( http://arxiv.org/abs/2410.02736v2 ) ライセンス: Link先を確認	Jiayi Ye, Yanbo Wang, Yue Huang, Dongping Chen, Qihui Zhang, Nuno Moniz, Tian Gao, Werner Geyer, Chao Huang, Pin-Yu Chen, Nitesh V Chawla, Xiangliang Zhang,	(参考訳) LLM-as-a-Judgeは様々なベンチマークで評価手法として広く利用されており、モデルトレーニングにおける教師付き報酬として機能している。しかし、多くのドメインで優れているにもかかわらず、潜在的な問題は未調査であり、その信頼性と実用性の範囲を損なう。そこで本研究では,12の主要な潜在的バイアスを特定し,LLM-as-a-Judgeにおける各種類のバイアスを,自動化された原則に基づく修正を用いて体系的に定量化し解析する,新しい自動バイアス定量化フレームワークであるCALMを提案する。実験では,複数の人気言語モデルについて検討し,高度なモデルが全体として優れた性能を達成する一方で,特定のタスクにおいて重大なバイアスが持続することを示した。実験結果から, LLM-as-a-Judgeの信頼性は改善の余地があることが示唆された。さらに,これらのバイアスの明示的および暗黙的な影響についても論じ,LLM-as-a-Judgeを信頼性高く適用するための提案を行う。本研究は、これらの問題に対処するステークホルダの必要性を強調し、LLM-as-a-Judgeの適用において注意を払うようユーザに促す。 LLM-as-a-Judge has been widely utilized as an evaluation method in various benchmarks and served as supervised rewards in model training. However, despite their excellence in many domains, potential issues are under-explored, undermining their reliability and the scope of their utility. Therefore, we identify 12 key potential biases and propose a new automated bias quantification framework-CALM-which systematically quantifies and analyzes each type of bias in LLM-as-a-Judge by using automated and principle-guided modification. Our experiments cover multiple popular language models, and the results indicate that while advanced models have achieved commendable overall performance, significant biases persist in certain specific tasks. Empirical results suggest that there remains room for improvement in the reliability of LLM-as-a-Judge. Moreover, we also discuss the explicit and implicit influence of these biases and give some suggestions for the reliable application of LLM-as-a-Judge. Our work highlights the need for stakeholders to address these issues and remind users to exercise caution in LLM-as-a-Judge applications.	翻訳日:2024-11-04 01:13:18 公開日:2024-10-04
# AVG-LLaVA:適応的な視覚的粒度を持つマルチモーダル大モデル AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity ( http://arxiv.org/abs/2410.02745v1 ) ライセンス: Link先を確認	Zhibin Lan, Liqiang Niu, Fandong Meng, Wenbo Li, Jie Zhou, Jinsong Su,	(参考訳) 近年、高解像度画像を扱う場合、支配的なLMMは通常、それらを複数のローカル画像と1つのグローバル画像に分割するため、大量のビジュアルトークンが生じる。本研究では、入力画像と命令に基づいて適切な視覚的粒度を適応的に選択できるLMMであるAVG-LLaVAを紹介する。このアプローチは、ビジュアルトークンの数を減らし、推論を高速化するだけでなく、全体的なモデルパフォーマンスも改善する。具体的には、LLaVA-NeXTに基づく以下のモジュールを紹介する。 (a)異なる粒度の視覚的トークンを得るために複数のプーリング層を含む視覚的粒度スケーラ、 (b)トランスフォーマー層、MLP層、及び投票器層を含み、画像及び命令に基づいて適切な視覚的粒度を選択する視覚的粒度ルータ。さらに,手動で注釈付けされた追加データを必要とせずに,ルータが予測する粒度をLMMの好みに合わせることを目的とした新たなトレーニングパラダイムであるRGLFを提案する。大規模な実験と分析の結果、AVG-LLaVAは11のベンチマークで優れたパフォーマンスを達成し、ビジュアルトークンの数を大幅に削減し、推論を高速化している(例えば、AI2Dベンチマークではビジュアルトークンが85.3%削減され、推論速度が2.53$\times$向上する)。 Recently, when dealing with high-resolution images, dominant LMMs usually divide them into multiple local images and one global image, which will lead to a large number of visual tokens. In this work, we introduce AVG-LLaVA, an LMM that can adaptively select the appropriate visual granularity based on the input image and instruction. This approach not only reduces the number of visual tokens and speeds up inference, but also improves the overall model performance. Specifically, we introduce the following modules based on LLaVA-NeXT: (a) a visual granularity scaler that includes multiple pooling layers to obtain visual tokens with different granularities; (b) a visual granularity router, which includes a Transformer layer, an MLP layer, and a voter layer, used to select the appropriate visual granularity based on the image and instruction. Furthermore, we propose RGLF, a novel training paradigm that aims at aligning the granularity predicted by the router with the preferences of the LMM, without the need for additional manually annotated data. 
Extensive experiments and analysis show that AVG-LLaVA achieves superior performance across 11 benchmarks, as well as significantly reduces the number of visual tokens and speeds up inference (e.g., an 85.3% reduction in visual tokens and a 2.53$\times$ increase in inference speed on the AI2D benchmark).	翻訳日:2024-11-04 01:03:22 公開日:2024-10-04
# AVG-LLaVA:適応的な視覚的粒度を持つ大規模マルチモーダルモデル AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity ( http://arxiv.org/abs/2410.02745v2 ) ライセンス: Link先を確認	Zhibin Lan, Liqiang Niu, Fandong Meng, Wenbo Li, Jie Zhou, Jinsong Su,	(参考訳) 近年、高解像度画像を扱う場合、支配的なLMMは通常、それらを複数のローカル画像と1つのグローバル画像に分割するため、大量のビジュアルトークンが生じる。本研究では、入力画像と命令に基づいて適切な視覚的粒度を適応的に選択できるLMMであるAVG-LLaVAを紹介する。このアプローチは、ビジュアルトークンの数を減らし、推論を高速化するだけでなく、全体的なモデルパフォーマンスも改善する。具体的には、LLaVA-NeXTに基づく以下のモジュールを紹介する。 (a)異なる粒度の視覚的トークンを得るために複数のプーリング層を含む視覚的粒度スケーラ、 (b)トランスフォーマー層、MLP層、及び投票器層を含み、画像及び命令に基づいて適切な視覚的粒度を選択する視覚的粒度ルータ。さらに,手動で注釈付けされた追加データを必要とせずに,ルータが予測する粒度をLMMの好みに合わせることを目的とした新たなトレーニングパラダイムであるRGLFを提案する。大規模な実験と分析の結果、AVG-LLaVAは11のベンチマークで優れたパフォーマンスを達成し、ビジュアルトークンの数を大幅に削減し、推論を高速化している(例えば、AI2Dベンチマークではビジュアルトークンが85.3%削減され、推論速度が2.53$\times$向上する)。 Recently, when dealing with high-resolution images, dominant LMMs usually divide them into multiple local images and one global image, which will lead to a large number of visual tokens. In this work, we introduce AVG-LLaVA, an LMM that can adaptively select the appropriate visual granularity based on the input image and instruction. This approach not only reduces the number of visual tokens and speeds up inference, but also improves the overall model performance. Specifically, we introduce the following modules based on LLaVA-NeXT: (a) a visual granularity scaler that includes multiple pooling layers to obtain visual tokens with different granularities; (b) a visual granularity router, which includes a Transformer layer, an MLP layer, and a voter layer, used to select the appropriate visual granularity based on the image and instruction. Furthermore, we propose RGLF, a novel training paradigm that aims at aligning the granularity predicted by the router with the preferences of the LMM, without the need for additional manually annotated data. 
Extensive experiments and analysis show that AVG-LLaVA achieves superior performance across 11 benchmarks, as well as significantly reduces the number of visual tokens and speeds up inference (e.g., an 85.3% reduction in visual tokens and a 2.53$\times$ increase in inference speed on the AI2D benchmark).	翻訳日:2024-11-04 01:03:22 公開日:2024-10-04
# ソーシャル・アウェア・ダイアログのためのスケーラブルなフレームベースによる社会文化的ノルムベースの構築 Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues ( http://arxiv.org/abs/2410.03049v1 ) ライセンス: Link先を確認	Shilin Qu, Weiqing Wang, Xin Zhou, Haolan Zhan, Zhuang Li, Lizhen Qu, Linhao Luo, Yuan-Fang Li, Gholamreza Haffari,	(参考訳) 社会文化的規範は、社会的相互作用における個人の行動の指針として機能し、尊敬、協力、適切な行動を重視するものであり、会話情報検索、文脈情報検索、検索強化機械学習といったタスクに役立てることができる。本稿では,社会認識対話のために,大規模言語モデル(LLM)を用いて社会文化的ノルム(SCN)ベースを構築するスケーラブルなアプローチを提案する。我々は、包括的で公開された中国社会文化ノルムベースを構築した。提案手法は,コンテキストフレームで強化された社会認識対話を主データ源として利用し,生成過程を制約して幻覚を低減する。これにより、状況に対する発話の語用論的な含意を生かし、高品質でニュアンスのある自然言語のノルム文を抽出することができる。ゴールドフレームで注釈付けされた実対話は容易には利用できないため、合成データの利用を提案する。実験結果は以下のことを示す。 (i) 合成データから得られるSCNの品質は、ゴールドフレームで注釈付けされた実対話から得られるものに匹敵する。 (ii) シルバー(予測)またはゴールドフレームで注釈付けされた実データから抽出したSCNの品質は、フレームアノテーションなしの場合を上回る。さらに、複数の下流対話タスクを推論するRAG(Retrieval-Augmented Generation)ベースのモデルにおいて、抽出したSCNの有効性を示す。 Sociocultural norms serve as guiding principles for personal conduct in social interactions, emphasizing respect, cooperation, and appropriate behavior, which is able to benefit tasks including conversational information retrieval, contextual information retrieval and retrieval-enhanced machine learning. We propose a scalable approach for constructing a Sociocultural Norm (SCN) Base using Large Language Models (LLMs) for socially aware dialogues. We construct a comprehensive and publicly accessible Chinese Sociocultural NormBase. Our approach utilizes socially aware dialogues, enriched with contextual frames, as the primary data source to constrain the generating process and reduce the hallucinations. This enables extracting of high-quality and nuanced natural-language norm statements, leveraging the pragmatic implications of utterances with respect to the situation. As real dialogue annotated with gold frames are not readily available, we propose using synthetic data. 
Our empirical results show: (i) the quality of the SCNs derived from synthetic data is comparable to that from real dialogues annotated with gold frames, and (ii) the quality of the SCNs extracted from real data, annotated with either silver (predicted) or gold frames, surpasses that without the frame annotations. We further show the effectiveness of the extracted SCNs in a RAG-based (Retrieval-Augmented Generation) model to reason about multiple downstream dialogue tasks.	翻訳日:2024-11-03 04:16:10 公開日:2024-10-04
# AuroraCap: 効率的で高性能なビデオ詳細キャプションと新しいベンチマーク AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ( http://arxiv.org/abs/2410.03051v1 ) ライセンス: Link先を確認	Wenhao Chai, Enxin Song, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jeng-Neng Hwang, Saining Xie, Christopher D. Manning,	(参考訳) ビデオ詳細キャプションは、ビデオコンテンツの包括的で一貫性のあるテキスト記述を生成することを目的とした重要なタスクであり、ビデオの理解と生成の両方に役立つ。本稿では,大規模マルチモーダルモデルに基づくビデオキャプショナであるAuroraCapを提案する。時間的モデリングのためのパラメータを追加せずに、最もシンプルなアーキテクチャ設計に従う。長大なビデオシーケンスによるオーバーヘッドに対処するため、トークンマージ戦略を実装し、入力されるビジュアルトークンの数を減らす。驚いたことに、この戦略による性能低下はわずかであることがわかった。AuroraCapは様々なビデオおよび画像キャプションのベンチマークで優れた性能を示し、例えばFlickr30kで88.9のCIDErを達成して、GPT-4V (55.3)とGemini-1.5 Pro (82.2)を上回った。しかし、既存のビデオキャプションベンチマークには、数十語からなる単純な記述のみが含まれており、この分野の研究を制限している。そこで我々は,千以上の注意深く注釈付けされた構造化キャプションを持つビデオ詳細キャプションベンチマークであるVDCを開発した。さらに,長いキャプションの評価を複数の短い質問応答対に変換する分割統治戦略を採用した,新しいLLM支援メトリックVDCscoreを提案する。人間によるEloランキングを用いた実験により、このベンチマークがビデオ詳細キャプションの品質に関する人間の判断とよりよく相関することを示す。 Video detailed captioning is a key task which aims to generate comprehensive and coherent textual descriptions of video content, benefiting both video understanding and generation. In this paper, we propose AuroraCap, a video captioner based on a large multimodal model. We follow the simplest architecture design without additional parameters for temporal modeling. To address the overhead caused by lengthy video sequences, we implement the token merging strategy, reducing the number of input visual tokens. Surprisingly, we found that this strategy results in little performance loss. AuroraCap shows superior performance on various video and image captioning benchmarks, for example, obtaining a CIDEr of 88.9 on Flickr30k, beating GPT-4V (55.3) and Gemini-1.5 Pro (82.2). However, existing video caption benchmarks only include simple descriptions, consisting of a few dozen words, which limits research in this field. Therefore, we develop VDC, a video detailed captioning benchmark with over one thousand carefully annotated structured captions. 
In addition, we propose a new LLM-assisted metric VDCscore for bettering evaluation, which adopts a divide-and-conquer strategy to transform long caption evaluation into multiple short question-answer pairs. With the help of human Elo ranking, our experiments show that this benchmark better correlates with human judgments of video detailed captioning quality.	翻訳日:2024-11-03 04:16:10 公開日:2024-10-04
# 高速最適輸送を用いたクラス階層の埋め込みによる構造化表現の学習 Learning Structured Representations by Embedding Class Hierarchy with Fast Optimal Transport ( http://arxiv.org/abs/2410.03052v1 ) ライセンス: Link先を確認	Siqi Zeng, Sixian Du, Makoto Yamada, Han Zhao,	(参考訳) ラベル内の構造化された知識を特徴表現に組み込むため、先行研究 (Zeng et al., 2022) では、教師あり学習における正則化項としてCophenetic Correlation Coefficient (CPCC) を用いることが提案された。この正則化項は、クラス平均のペアワイズユークリッド距離を算出し、ラベル階層木から導かれる対応する最短経路距離と整合させる。しかし、クラス条件分布が多峰性である場合は特に、クラス平均はその分布のよい代表ではないかもしれない。この制限に対処するため、CPCCフレームワークの下で、特徴空間内のクラス間のペアワイズ距離を測定するために、Earth Mover's Distance (EMD) を用いることを提案する。厳密なEMDに基づく提案手法は従来の手法を一般化し,特徴空間におけるクラス条件分布がガウス分布である場合には既存のアルゴリズムに帰着することを示す。提案手法の計算効率をさらに向上するために,4つのEMD近似変種を探索し,Optimal Transport-CPCC (OT-CPCC) ファミリーを導入する。最も効率的なOT-CPCC変種は、データセットとタスクをまたいで競争力のある性能を維持しながら、データセットのサイズに対して線形時間で実行される。 To embed structured knowledge within labels into feature representations, prior work (Zeng et al., 2022) proposed to use the Cophenetic Correlation Coefficient (CPCC) as a regularizer during supervised learning. This regularizer calculates pairwise Euclidean distances of class means and aligns them with the corresponding shortest path distances derived from the label hierarchy tree. However, class means may not be good representatives of the class conditional distributions, especially when they are multi-mode in nature. To address this limitation, under the CPCC framework, we propose to use the Earth Mover's Distance (EMD) to measure the pairwise distances among classes in the feature space. We show that our exact EMD method generalizes previous work, and recovers the existing algorithm when class-conditional distributions are Gaussian in the feature space. To further improve the computational efficiency of our method, we introduce the Optimal Transport-CPCC family by exploring four EMD approximation variants. Our most efficient OT-CPCC variant runs in linear time in the size of the dataset, while maintaining competitive performance across datasets and tasks.	翻訳日:2024-11-03 04:16:10 公開日:2024-10-04
# CLIP-Clique:オブジェクトベースグローバルローカライゼーションのための視覚言語モデルによるグラフベースの対応マッチング CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization ( http://arxiv.org/abs/2410.03054v1 ) ライセンス: Link先を確認	Shigemichi Matsuzaki, Kazuhito Tanaka, Kazuhiro Shintani,	(参考訳) 本レターでは,意味的オブジェクトランドマークを持つ地図上でのグローバルローカライゼーション手法を提案する。オブジェクトマップ上のローカライゼーションのための最も有望なアプローチの1つは、周囲のオブジェクトの分布から計算されたランドマーク記述子を用いる意味グラフマッチングである。これらの記述子は誤分類や部分的な観測に弱い。さらに、多くの既存手法はRANSACを用いたインライア抽出に依存しており、これは確率的であり、高い外れ値率に敏感である。前者の問題に対処するために、視覚言語モデル(VLM)を用いて対応マッチングを強化する。ランドマークの識別性は、周囲の物体に依存しないVLM埋め込みによって改善される。さらに、インライアはグラフ理論的アプローチを用いて決定論的に推定される。また、対応の類似度と観測の完全性を考慮した重み付き最小二乗法によるポーズ計算を導入し、ロバスト性を向上させる。ScanNetおよびTUMデータセットを用いた実験により,マッチング精度とポーズ推定精度の改善を確認した。 This letter proposes a method of global localization on a map with semantic object landmarks. One of the most promising approaches for localization on object maps is to use semantic graph matching using landmark descriptors calculated from the distribution of surrounding objects. These descriptors are vulnerable to misclassification and partial observations. Moreover, many existing methods rely on inlier extraction using RANSAC, which is stochastic and sensitive to a high outlier rate. To address the former issue, we augment the correspondence matching using Vision Language Models (VLMs). Landmark discriminability is improved by VLM embeddings, which are independent of surrounding objects. In addition, inliers are estimated deterministically using a graph-theoretic approach. We also incorporate pose calculation using the weighted least squares considering correspondence similarity and observation completeness to improve the robustness. We confirmed improvements in matching and pose estimation accuracy through experiments on ScanNet and TUM datasets.	翻訳日:2024-11-03 04:16:10 公開日:2024-10-04
# 大規模言語モデルに対する許容情報フロー解析 Permissive Information-Flow Analysis for Large Language Models ( http://arxiv.org/abs/2410.03055v1 ) ライセンス: Link先を確認	Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David Krueger, Andrew Paverd, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Menglin Xia, Santiago Zanella-Béguelin,	(参考訳) 大規模言語モデル(LLM)は、大規模ソフトウェアシステムのコモディティコンポーネントになりつつある。これは自然のセキュリティとプライバシの問題を引き起こす。あるコンポーネントから取得した有毒なデータは、モデルの振る舞いを変更し、システム全体を汚染する可能性がある。 1つの有望なアプローチは、動的情報フロー(別名 taint)トラッキングを通じて、システムレベルでこの問題に取り組むことである。残念ながら、出力に最も制限のある入力ラベルを伝搬する従来のアプローチは、多様なソースから取得した入力に対してLLMが動作するアプリケーションには保守的すぎる。本稿では,LLMクエリを通じて情報フローラベルを伝搬する,新しい,より寛容な手法を提案する。提案手法の背景にある重要な考え方は、モデル出力の生成に影響を及ぼすサンプルのラベルのみを伝播させ、不要な入力のラベルを除去することである。このアプローチの2つのバリエーションの有効性を実装し,検討する。 (i)プロンプトベースの検索強化、及び (ii)$k$-nearest-neighbors言語モデル。本稿では,言語モデルに出力ラベルの予測を直接依頼するイントロスペクションに基づくインフルエンス推定器のベースラインと比較する。その結果, プロンプトベースラベルプロパゲータの優位性が強調され, LLMエージェント設定の85%以上でラベルが改善した。これらの知見は,検索増強のための許容ラベル伝搬の実用性を強調した。 Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems. This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including coercing the model to spread confidential data to untrusted components. One promising approach is to tackle this problem at the system level via dynamic information flow (aka taint) tracking. Unfortunately, the traditional approach of propagating the most restrictive input label to the output is too conservative for applications where LLMs operate on inputs retrieved from diverse sources. In this paper, we propose a novel, more permissive approach to propagate information flow labels through LLM queries. The key idea behind our approach is to propagate only the labels of the samples that were influential in generating the model output and to eliminate the labels of unnecessary input. 
We implement and investigate the effectiveness of two variations of this approach, based on (i) prompt-based retrieval augmentation, and (ii) a $k$-nearest-neighbors language model. We compare these with the baseline of an introspection-based influence estimator that directly asks the language model to predict the output label. The results obtained highlight the superiority of our prompt-based label propagator, which improves the label in more than 85% of the cases in an LLM agent setting. These findings underscore the practicality of permissive label propagation for retrieval augmentation.	翻訳日:2024-11-03 04:16:10 公開日:2024-10-04
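The difference between conservative and permissive label propagation described above can be sketched as follows. This is only a toy model: the three-level `Label` lattice, the `influence` scores, and the `threshold` are hypothetical stand-ins, and the paper's actual influence estimators (prompt-based retrieval augmentation and a $k$-nearest-neighbors language model) are not reproduced here.

```python
from enum import IntEnum


class Label(IntEnum):
    """Hypothetical confidentiality lattice; larger means more restrictive."""
    PUBLIC = 0
    INTERNAL = 1
    SECRET = 2


def join(labels):
    """Least upper bound of a collection of labels (PUBLIC if empty)."""
    return max(labels, default=Label.PUBLIC)


def conservative_label(docs):
    """Traditional taint tracking: the output inherits the join of all inputs."""
    return join(d["label"] for d in docs)


def permissive_label(docs, influence, threshold=0.5):
    """Propagate only the labels of inputs judged influential for the output.

    `influence` holds one score per document; how such scores are obtained
    is an assumption left outside this sketch.
    """
    influential = [d for d, s in zip(docs, influence) if s >= threshold]
    return join(d["label"] for d in influential)
```

With three retrieved documents labeled PUBLIC, SECRET, and INTERNAL, a conservative tracker labels the output SECRET, while the permissive variant can assign INTERNAL if the SECRET document had negligible influence on the generation.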
# アンタングル表現評価のための改良されたメトリクスの実現に向けて Towards an Improved Metric for Evaluating Disentangled Representations ( http://arxiv.org/abs/2410.03056v1 ) ライセンス: Link先を確認	Sahib Julka, Yashu Wang, Michael Granitzer,	(参考訳) ディスエンタングル表現学習は、表現を制御可能、解釈可能、転送可能にする上で重要な役割を果たす。この領域におけるその重要性にもかかわらず、信頼性が高く一貫した定量的ディスエンタングルメント指標の探求は依然として大きな課題である。これは、異なる特性を測定する多様な指標が利用されていることと、その設計によって導入される潜在的なバイアスに由来する。本研究は,既存の代表的なディスエンタングルメント評価指標を網羅的に検討し,ディスエンタングルメントの側面(モジュラリティ,コンパクト性,明示性)の測定,因子コード関係の検出,ディスエンタングルメントの程度の記述という観点から比較する。本稿では,直感的な概念である排他性(exclusivity)と改良された因子コード関係を利用してアドホックな決定を最小化する指標「EDI」を導入し,ディスエンタングルメントを定量化する新しい枠組みを提案する。詳細な分析によると、EDIは既存の指標よりも高い安定性を示しつつ本質的な特性を測定しており、標準化されたアプローチとしての採用が支持される。 Disentangled representation learning plays a pivotal role in making representations controllable, interpretable and transferable. Despite its significance in the domain, the quest for a reliable and consistent quantitative disentanglement metric remains a major challenge. This stems from the utilisation of diverse metrics measuring different properties and the potential bias introduced by their design. Our work undertakes a comprehensive examination of existing popular disentanglement evaluation metrics, comparing them in terms of measuring aspects of disentanglement (viz. Modularity, Compactness, and Explicitness), detecting the factor-code relationship, and describing the degree of disentanglement. We propose a new framework for quantifying disentanglement, introducing a metric entitled EDI that leverages the intuitive concept of exclusivity and an improved factor-code relationship to minimize ad-hoc decisions. An in-depth analysis reveals that EDI measures essential properties while offering more stability than existing metrics, advocating for its adoption as a standardised approach.	翻訳日:2024-11-03 04:16:10 公開日:2024-10-04
# DiffKillR:Dense Microscopy Imagesにおける細胞アノテーションのためのDiffomorphismsの殺害と再生 DiffKillR: Killing and Recreating Diffeomorphisms for Cell Annotation in Dense Microscopy Images ( http://arxiv.org/abs/2410.03058v1 ) ライセンス: Link先を確認	Chen Liu, Danqi Liao, Alejandro Parada-Mayorga, Alejandro Ribeiro, Marcello DiStasio, Smita Krishnaswamy,	(参考訳) 自動スライドスキャンの進歩によって誘導されるデジタル顕微鏡画像の拡散は、生体医学的な研究や臨床診断に重要な機会をもたらす。しかし、これらの画像に密集した情報を正確に注釈付けすることは大きな課題である。 DiffKillRは、アーチェタイプマッチングと画像登録タスクの組み合わせとしてセルアノテーションを再構成する新しいフレームワークである。 DiffKillRは、堅牢な細胞マッチングのために微分同相不変の特徴空間を学習するニューラルネットワークと、アノテーションマッピングのために細胞間の正確なワープフィールドを計算するニューラルネットワークを2つ採用している。注釈付きアーチタイプの小さなセットを使用して、DiffKillRは、大きな顕微鏡画像間でアノテーションを効率よく伝播し、広範囲な手動ラベリングの必要性を減らす。さらに重要なのは、どんな種類のピクセルレベルのアノテーションにも適しています。我々はDiffKillRの理論的性質について論じ、それを3つの顕微鏡タスクで検証し、既存の教師付き・半教師なし・教師なしの手法に対する利点を実証する。 The proliferation of digital microscopy images, driven by advances in automated whole slide scanning, presents significant opportunities for biomedical research and clinical diagnostics. However, accurately annotating densely packed information in these images remains a major challenge. To address this, we introduce DiffKillR, a novel framework that reframes cell annotation as the combination of archetype matching and image registration tasks. DiffKillR employs two complementary neural networks: one that learns a diffeomorphism-invariant feature space for robust cell matching and another that computes the precise warping field between cells for annotation mapping. Using a small set of annotated archetypes, DiffKillR efficiently propagates annotations across large microscopy images, reducing the need for extensive manual labeling. More importantly, it is suitable for any type of pixel-level annotation. We will discuss the theoretical properties of DiffKillR and validate it on three microscopy tasks, demonstrating its advantages over existing supervised, semi-supervised, and unsupervised methods.	翻訳日:2024-11-03 04:06:08 公開日:2024-10-04
# トロッターエラーに対する下界 Lower Bounds for the Trotter Error ( http://arxiv.org/abs/2410.03059v1 ) ライセンス: Link先を確認	Alexander Hahn, Paul Hartung, Daniel Burgarth, Paolo Facchi, Kazuya Yuasa,	(参考訳) 実用上重要な量子系のアナログおよびデジタルシミュレーションでは、ターゲット力学は近似的にしか実装できない。トロッター積公式は、精度を調整できる汎用的な方法であるため、最も一般的な近似スキームである。非可換な作用素に対してトロッターシミュレーションは常に不正確であるが、達成可能な最小誤差が何であるかは現在分かっていない。トロッター誤差の上界はしばしば大幅な過大評価であることが知られているため、これは重要な量である。ここでは、ノルムおよび状態に関する誤差の明示的な下界を示し、最小限のリソース要求を導出できるようにする。真の誤差との数値的な比較は、我々の境界が正確かつタイトな推定を与えることを示している。 In analog and digital simulations of practically relevant quantum systems, the target dynamics can only be implemented approximately. The Trotter product formula is the most common approximation scheme as it is a generic method that allows the accuracy to be tuned. The Trotter simulation will always be inexact for non-commuting operators, but it is currently unknown what the minimum possible error is. This is an important quantity because upper bounds for the Trotter error are known to often be vast overestimates. Here, we present explicit lower bounds on the error, in norm and on states, allowing one to derive minimum resource requirements. Numerical comparison with the true error shows that our bounds offer accurate and tight estimates.	翻訳日:2024-11-03 04:06:08 公開日:2024-10-04
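To make the notion of Trotter error in the abstract above concrete, here is a small numerical sketch for a single qubit with Hamiltonian H = X + Z. It compares the exact evolution exp(-itH) with the first-order product formula (exp(-itX/n) exp(-itZ/n))^n. The choice of Hamiltonian, the use of the Frobenius norm, and all function names are assumptions made for illustration; the paper's bounds are not reproduced here.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 2x2 complex matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def exp_x(theta):
    """exp(-i * theta * X) for the Pauli X matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -1j * s), (-1j * s, c))


def exp_z(theta):
    """exp(-i * theta * Z) for the Pauli Z matrix."""
    return ((cmath.exp(-1j * theta), 0), (0, cmath.exp(1j * theta)))


def exact_evolution(t):
    """exp(-i * t * (X + Z)), using the identity (X + Z)^2 = 2 * I."""
    w = math.sqrt(2.0)
    c, s = math.cos(w * t), math.sin(w * t) / w
    return ((c - 1j * s, -1j * s), (-1j * s, c + 1j * s))


def trotter_error(t, n):
    """Frobenius-norm distance between exact and n-step Trotter evolution."""
    step = matmul(exp_x(t / n), exp_z(t / n))
    product = ((1, 0), (0, 1))
    for _ in range(n):
        product = matmul(product, step)
    exact = exact_evolution(t)
    return math.sqrt(
        sum(abs(product[i][j] - exact[i][j]) ** 2
            for i in range(2) for j in range(2))
    )
```

Because X and Z do not commute, `trotter_error(t, n)` stays strictly positive for generic `t`, and for the first-order formula it shrinks roughly in proportion to 1/n as the number of steps grows.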
# DocKD:オープンワールド文書理解モデルのためのLLMからの知識蒸留 DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models ( http://arxiv.org/abs/2410.03061v1 ) ライセンス: Link先を確認	Sungnyun Kim, Haofu Liao, Srikar Appalaraju, Peng Tang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan, Stefano Soatto,	(参考訳) ビジュアル文書理解(VDU)は、様々なモダリティ(テキストや画像)とレイアウト(フォーム、テーブルなど)にわたる文書の理解を伴う、困難なタスクである。本研究の目的は,LLMの知識を蒸留することにより,小型VDUモデルの一般化性を高めることである。 LLMの直接的なプロンプトは、しばしば情報的で有用なデータを生成するのに失敗する。これに対し、外部文書知識を統合することでデータ生成プロセスを充実させる新しいフレームワーク(DocKD)を提案する。具体的には、キーと値のペアやレイアウト、記述など、さまざまなドキュメント要素を備えたLCMを提供し、オープンな回答を導き出します。実験の結果,DocKDは高品質な文書アノテーションを生成し,外部文書知識を活用できない直接知識蒸留手法を超越していることがわかった。さらに、DocKD生成データのみでトレーニングされた学生VDUモデルは、ドメイン内タスクで人間が注釈付けしたデータでトレーニングされたモデルに匹敵するだけでなく、ドメイン外タスクで大幅に最適化されている。 Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.). This study aims to enhance generalizability of small VDU models by distilling knowledge from LLMs. We identify that directly prompting LLMs often fails to generate informative and useful data. In response, we present a new framework (called DocKD) that enriches the data generation process by integrating external document knowledge. Specifically, we provide an LLM with various document elements like key-value pairs, layouts, and descriptions, to elicit open-ended answers. Our experiments show that DocKD produces high-quality document annotations and surpasses the direct knowledge distillation approach that does not leverage external document knowledge. Moreover, student VDU models trained with solely DocKD-generated data are not only comparable to those trained with human-annotated data on in-domain tasks but also significantly excel them on out-of-domain tasks.	翻訳日:2024-11-03 04:06:08 公開日:2024-10-04
# 画像ファーストかテキストファーストか? 大規模言語モデルにおけるモーダリティのシークエンシングの最適化と推論 Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks ( http://arxiv.org/abs/2410.03062v1 ) ライセンス: Link先を確認	Grant Wardle, Teo Susnjak,	(参考訳) 本稿では,マルチモーダル内の画像とテキストのシークエンシングが,大規模言語モデル(LLM)の推論性能にどのように影響するかを検討する。 3種類の商用LCMを用いて経験的評価を行った。以上の結果から,モダリティが提示される順序は,特に複雑度が変化するタスクにおいて,性能に大きく影響を及ぼす可能性が示唆された。単一の画像を含む単純なタスクに対して、モダリティシークエンシングは精度に明確な影響を及ぼした。しかし、複数の画像と複雑な推論ステップを含むより複雑なタスクでは、シークエンシングの効果が減少し、おそらくタスクの認知的要求が増大したためである。また,本研究は疑問/急激な構造の重要性も強調した。ネストおよびマルチステップの推論タスクでは、モダリティシークエンシングがモデルパフォーマンスを形成する上で重要な役割を果たした。 LLMは推論の初期の段階では優れていたが、以前の情報の再編成に苦慮し、トランスフォーマーアーキテクチャにおけるマルチホップ推論の課題を浮き彫りにした。このことは、モダリティの列と推論ステップの論理フローとの整合性は、モダリティ順序単独よりも重要であることを示唆している。これらの知見は、教育、医用画像、クロスモーダル・ラーニングといった分野にまたがる幅広い応用によって、マルチモーダル・プロンプト・デザインを改善する上で重要な意味を持つ。 This paper examines how the sequencing of images and text within multi-modal prompts influences the reasoning performance of large language models (LLMs). We performed empirical evaluations using three commercial LLMs. Our results demonstrate that the order in which modalities are presented can significantly affect performance, particularly in tasks of varying complexity. For simpler tasks involving a single image, modality sequencing had a clear impact on accuracy. However, in more complex tasks involving multiple images and intricate reasoning steps, the effect of sequencing diminished, likely due to the increased cognitive demands of the task. Our findings also highlight the importance of question/prompt structure. In nested and multi-step reasoning tasks, modality sequencing played a key role in shaping model performance. While LLMs excelled in the initial stages of reasoning, they struggled to re-incorporate earlier information, underscoring the challenges of multi-hop reasoning within transformer architectures. 
This suggests that aligning the sequence of modalities with the logical flow of reasoning steps is more critical than modality order alone. These insights offer valuable implications for improving multi-modal prompt design, with broader applications across fields such as education, medical imaging, and cross-modal learning.	翻訳日:2024-11-03 04:06:08 公開日:2024-10-04
# プログラミング入門科目における自然言語プロンプトタスクの統合 Integrating Natural Language Prompting Tasks in Introductory Programming Courses ( http://arxiv.org/abs/2410.03063v1 ) ライセンス: Link先を確認	Chris Kerslake, Paul Denny, David H Smith IV, James Prather, Juho Leinonen, Andrew Luxton-Reilly, Stephen MacNeil,	(参考訳) 入門プログラミングコースは、より複雑で興味深いプログラムに進む前に、マスター構文と基本的な構成を強調することが多い。このボトムアップのアプローチは、初心者にとってはイライラさせる可能性があり、問題の解決から焦点を移し、幅広い学生にとってコンピューティングの魅力を損なう可能性がある。コード生産のための生成AIの台頭は、ハイレベルなプロンプトの構築や自動生成されるコードの評価を含む、AIモデルとのインタラクションを通じて新しいスキルを育むことによって、これらの問題に部分的に対処する可能性がある。本経験報告では,6週間のモジュールで4つの実験室にまたがって実施された,イントロダクトリー・コースにおける2つのアクティベーションに焦点を当てた2つのアクティビティについて検討する。第一に、学生は自然言語のプロンプトを書き、構文上の問題解決を強調することで、計算問題を解く必要がある。 2つ目は、プロンプトとコードの関係を理解するために、提供されたフラグメントに相当するコードを生成するプロンプトを作成することである。コースの学生の多くは、プログラミングを学ぶのが難しいと報告しており、しばしば、構文やデバッグに関する不満を引用している。学習プログラムにおける自己報告の難しさは、期待通り、テストやプロジェクトといった従来のプログラミングアセスメントのパフォーマンスと強い逆関係があることがわかりました。しかし、自然言語タスクのパフォーマンスは、自己報告の難しさとあまり強く関連しておらず、異なるスキルをターゲットにしていることが示唆された。 AIコーディングモデルとコミュニケーションする方法を学ぶことは重要なスキルとなり、自然言語によるタスクの促進は幅広い学生にアピールする可能性がある。 Introductory programming courses often emphasize mastering syntax and basic constructs before progressing to more complex and interesting programs. This bottom-up approach can be frustrating for novices, shifting the focus away from problem solving and potentially making computing less appealing to a broad range of students. The rise of generative AI for code production could partially address these issues by fostering new skills via interaction with AI models, including constructing high-level prompts and evaluating code that is automatically generated. In this experience report, we explore the inclusion of two prompt-focused activities in an introductory course, implemented across four labs in a six-week module. The first requires students to solve computational problems by writing natural language prompts, emphasizing problem-solving over syntax. 
The second involves students crafting prompts to generate code equivalent to provided fragments, to foster an understanding of the relationship between prompts and code. Most of the students in the course had reported finding programming difficult to learn, often citing frustrations with syntax and debugging. We found that self-reported difficulty with learning programming had a strong inverse relationship with performance on traditional programming assessments such as tests and projects, as expected. However, performance on the natural language tasks was less strongly related to self-reported difficulty, suggesting they may target different skills. Learning how to communicate with AI coding models is becoming an important skill, and natural language prompting tasks may appeal to a broad range of students.	翻訳日:2024-11-03 04:06:08 公開日:2024-10-04
# KVキャッシュを計算するか、ロードするか? なぜ両方ではないのか? Compute Or Load KV Cache? Why Not Both? ( http://arxiv.org/abs/2410.03065v1 ) ライセンス: Link先を確認	Shuowei Jin, Xueshen Liu, Qingzhao Zhang, Z. Morley Mao,	(参考訳) 近年のLLM(Large Language Models)の進歩により、コンテキストウィンドウのサイズが大幅に増加し、高度なアプリケーションを実現するとともに、特にプリフィルステージにおけるキー値(KV)キャッシュの計算といった計算オーバーヘッドも大幅に増大した。このシナリオでGPUパワーを節約するためにプレフィックスキャッシュが登場し、KVキャッシュをディスクに保存して複数のクエリで再利用する。しかしながら、従来のプレフィックスキャッシュメカニズムは、ディスクからGPUメモリへのKVキャッシュのロード速度がI/Oデバイスのスループットによってボトルネックになるため、大きなレイテンシに悩まされることが多い。長文プリフィルのレイテンシを最適化するために,双方向並列化KVキャッシュ生成戦略を採用した新しいKVキャッシュローダであるCakeを提案する。プリフィルタスクを受信すると、Cakeは保存されたKVキャッシュをプレフィックスキャッシュ位置から動的にロードすると同時に、ローカルGPU上でKVキャッシュを計算し、利用可能な計算とI/O帯域幅リソースの利用を最大化する。さらに、Cakeは手動のパラメータチューニングなしで様々なシステム状態に自動的に適応する。様々なプロンプトデータセット、GPU、I/Oデバイスの実験において、Cakeは計算のみの手法と比較して最大68.1%、I/Oのみの手法と比較して94.6%のTTFT(Time To First Token)削減を達成した。 Recent advancements in Large Language Models (LLMs) have significantly increased context window sizes, enabling sophisticated applications but also introducing substantial computational overheads, particularly for computing the key-value (KV) cache in the prefill stage. Prefix caching has emerged to save GPU power in this scenario: it saves the KV cache to disk and reuses it across multiple queries. However, traditional prefix caching mechanisms often suffer from substantial latency because the speed of loading KV cache from disks to GPU memory is bottlenecked by the throughput of I/O devices. To optimize the latency of long-context prefill, we propose Cake, a novel KV cache loader, which employs a bidirectional parallelized KV cache generation strategy. Upon receiving a prefill task, Cake simultaneously and dynamically loads saved KV cache from prefix cache locations and computes KV cache on local GPUs, maximizing the utilization of available computation and I/O bandwidth resources. Additionally, Cake automatically adapts to diverse system statuses without manual parameter tuning.
In experiments on various prompt datasets, GPUs, and I/O devices, Cake offers up to 68.1% Time To First Token (TTFT) reduction compared with the compute-only method and 94.6% TTFT reduction compared with the I/O-only method.	翻訳日:2024-11-03 04:06:08 公開日:2024-10-04
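The idea of overlapping computation and loading can be illustrated with a toy scheduler. In the sketch below, one worker computes KV chunks from the front of the prompt while another loads saved chunks from the back, and they stop when they meet. The direction of each worker, the uniform per-chunk costs, and the greedy assignment rule are simplifying assumptions made for this illustration, not details confirmed by the abstract.

```python
def bidirectional_prefill(num_chunks, compute_time, load_time):
    """Simulate two workers filling a KV cache from opposite ends.

    Returns (finish_time, computed_chunks, loaded_chunks). Each unassigned
    chunk goes to whichever worker would finish its next chunk earliest;
    costs per chunk are assumed constant (a simplification).
    """
    front, back = 0, num_chunks - 1
    t_compute = t_load = 0.0
    computed, loaded = [], []
    while front <= back:
        if t_compute + compute_time <= t_load + load_time:
            # The compute worker takes the next chunk from the front.
            t_compute += compute_time
            computed.append(front)
            front += 1
        else:
            # The I/O worker loads the next saved chunk from the back.
            t_load += load_time
            loaded.append(back)
            back -= 1
    return max(t_compute, t_load), computed, loaded
```

In this model the combined finish time is never worse than the faster of the compute-only and load-only baselines, and the split point shifts automatically when either cost changes, which loosely mirrors the adaptivity the abstract mentions.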
# FedCert:Federated Acertacy Certification FedCert: Federated Accuracy Certification ( http://arxiv.org/abs/2410.03067v1 ) ライセンス: Link先を確認	Minh Hieu Nguyen, Huu Tien Nguyen, Trung Thanh Nguyen, Manh Duong Nguyen, Trong Nghia Hoang, Truong Thao Nguyen, Phi Le Nguyen,	(参考訳) フェデレートラーニング(FL)は、機械学習モデルを分散的にトレーニングするための強力なパラダイムとして登場し、クライアントにローカルデータを保持することでデータのプライバシを保存する。しかし、これらのモデルのクライアントにおけるデータ摂動に対する堅牢性を評価することは、依然として大きな課題である。従来の研究では、モデルの予測の一定の割合が入力データが摂動しても正しいことを保証し、認証精度に基づいて集中トレーニングにおけるモデルの有効性を評価してきた。しかし、これらの評価をFLに拡張するという課題は、未知のクライアントのローカルデータが原因で未解決のままである。そこで本研究では,FLシステムのロバスト性評価に向けた第一歩として,FedCertという手法を提案する。提案手法は,各クライアントの認証精度とクラス分布に基づいて,グローバルモデルの認証精度を近似する。さらに、実世界のシナリオにおけるデータの非独立分散(Non-IID)特性を考慮し、近似アルゴリズムの集約段階において、信頼性の高い精度を保証するためのクライアントグループ化アルゴリズムを導入する。理論的解析を通じて,FLシステムの堅牢性と信頼性を評価する上で,FedCertの有効性を示す。さらに,様々なシナリオにおけるCIFAR-10とCIFAR-100データセットの実験結果から,FedCertはベースライン法に比べて推定誤差を一貫して減少させることが示された。本研究は,FLシステムの堅牢性を評価するためのソリューションを提供し,分散学習の信頼性を高めるための今後の研究の基盤となる。ソースコードはhttps://github.com/thanhhff/FedCert/で入手できる。 Federated Learning (FL) has emerged as a powerful paradigm for training machine learning models in a decentralized manner, preserving data privacy by keeping local data on clients. However, evaluating the robustness of these models against data perturbations on clients remains a significant challenge. Previous studies have assessed the effectiveness of models in centralized training based on certified accuracy, which guarantees that a certain percentage of the model's predictions will remain correct even if the input data is perturbed. However, the challenge of extending these evaluations to FL remains unresolved due to the unknown client's local data. To tackle this challenge, this study proposed a method named FedCert to take the first step toward evaluating the robustness of FL systems. The proposed method is designed to approximate the certified accuracy of a global model based on the certified accuracy and class distribution of each client. 
Additionally, considering the Non-Independent and Identically Distributed (Non-IID) nature of data in real-world scenarios, we introduce the client grouping algorithm to ensure reliable certified accuracy during the aggregation step of the approximation algorithm. Through theoretical analysis, we demonstrate the effectiveness of FedCert in assessing the robustness and reliability of FL systems. Moreover, experimental results on the CIFAR-10 and CIFAR-100 datasets under various scenarios show that FedCert consistently reduces the estimation error compared to baseline methods. This study offers a solution for evaluating the robustness of FL systems and lays the groundwork for future research to enhance the dependability of decentralized learning. The source code is available at https://github.com/thanhhff/FedCert/.	翻訳日:2024-11-03 04:06:08 公開日:2024-10-04
# ソフトウェアアプリケーションのための対話型GDPR互換プライバシポリシ生成 Interactive GDPR-Compliant Privacy Policy Generation for Software Applications ( http://arxiv.org/abs/2410.03069v1 ) ライセンス: Link先を確認	Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Omar Haggag, John Grundy,	(参考訳) ソフトウェアアプリケーションは、幅広いタスクやインタラクションの実行を支援するように設計されている。彼らは普及し、このデジタル時代において人々の生活に不可欠な役割を担っている。これらのソフトウェアアプリケーションを使用するには、ユーザは時々、個人情報を提供するように要求される。プライバシーは重要な関心事となり、世界中で多くのデータ保護規制が存在しているため、ソフトウェアアプリケーションは、ユーザーの個人情報の収集と処理方法を詳述したプライバシーポリシーをユーザーに提供しなければならない。本稿では,多種多様なソフトウェアアプリケーションに対する一般データ保護規則(GDPR)に関して,包括的かつ遵守可能なプライバシポリシを生成するアプローチを提案する。これをサポートするために、我々はまず、既存のプライバシーポリシー分析に基づくプライバシー条項のライブラリを構築した。そして、インタラクティブなルールベースのシステムを開発し、一連の質問をソフトウェア開発者に促し、その回答を使って、特定のソフトウェアアプリケーション用にカスタマイズされたプライバシポリシを生成しました。我々は、我々のアプローチによって生成されたプライバシーポリシーを可読性、完全性、カバレッジの観点から評価し、3つの既存のプライバシーポリシージェネレータと生成AIベースのツールによって生成されたプライバシーポリシーと比較した。評価結果から,我々のアプローチが生み出すプライバシポリシが最も完全かつ包括的であることを示唆した。 Software applications are designed to assist users in conducting a wide range of tasks or interactions. They have become prevalent and play an integral part in people's lives in this digital era. To use those software applications, users are sometimes requested to provide their personal information. As privacy has become a significant concern and many data protection regulations exist worldwide, software applications must provide users with a privacy policy detailing how their personal information is collected and processed. We propose an approach that generates a comprehensive and compliant privacy policy with respect to the General Data Protection Regulation (GDPR) for diverse software applications. To support this, we first built a library of privacy clauses based on existing privacy policy analysis. We then developed an interactive rule-based system that prompts software developers with a series of questions and uses their answers to generate a customised privacy policy for a given software application. 
We evaluated privacy policies generated by our approach in terms of readability, completeness and coverage and compared them to privacy policies generated by three existing privacy policy generators and a Generative AI-based tool. Our evaluation results show that the privacy policy generated by our approach is the most complete and comprehensive.	翻訳日:2024-11-03 04:06:08 公開日:2024-10-04
# FedMAC: クロスモーダル・アグリゲーションとコントラスト規則化によるフェデレーション学習における部分モダリティの欠如に対処する FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization ( http://arxiv.org/abs/2410.03070v1 ) ライセンス: Link先を確認	Manh Duong Nguyen, Trung Thanh Nguyen, Huy Hieu Pham, Trong Nghia Hoang, Phi Le Nguyen, Thanh Trung Huynh,	(参考訳) Federated Learning(FL)は、分散データソースを使用して機械学習モデルをトレーニングする手法である。データをローカルに保存しながら、クライアントが共同で共有グローバルモデルを学ぶことによって、プライバシを保証する。しかしながら、ある機能やモダリティが利用できない、あるいは不完全なクライアントのデータセットで欠落したモダリティを扱う場合には、大きな課題が生じる。これまでの研究では、完全なモダリティの欠如の問題に対処してきたが、インスタンスレベルでのクライアント間の重大不均一性を考慮して、部分モダリティの欠如に対処できなかった。この課題に対処するために,FL に欠落した部分モダリティ条件下で欠落する多重モダリティに対処するために,FedMAC という新しいフレームワークを提案する。さらに,マルチモーダルな特徴の自明な集約を避けるために,潜在表現空間に制約を加えるために,コントラッシブベース正規化を導入する。実験の結果, 統計的不均一性を有する各種クライアント構成におけるFedMACの有効性が示され, 深刻な欠落シナリオにおいて, 最大26%のベースライン法を上回り, フェデレートシステムにおける部分的に欠落するモダリティの解決法としての可能性を強調した。 Federated Learning (FL) is a method for training machine learning models using distributed data sources. It ensures privacy by allowing clients to collaboratively learn a shared global model while storing their data locally. However, a significant challenge arises when dealing with missing modalities in clients' datasets, where certain features or modalities are unavailable or incomplete, leading to heterogeneous data distribution. While previous studies have addressed the issue of complete-modality missing, they fail to tackle partial-modality missing on account of severe heterogeneity among clients at an instance level, where the pattern of missing data can vary significantly from one sample to another. To tackle this challenge, this study proposes a novel framework named FedMAC, designed to address multi-modality missing under conditions of partial-modality missing in FL. Additionally, to avoid trivial aggregation of multi-modal features, we introduce contrastive-based regularization to impose additional constraints on the latent representation space. 
The experimental results demonstrate the effectiveness of FedMAC across various client configurations with statistical heterogeneity, outperforming baseline methods by up to 26% in severe missing scenarios, highlighting its potential as a solution for the challenge of partially missing modalities in federated systems.	翻訳日:2024-11-03 04:06:08 公開日:2024-10-04
# 拡散モデルを用いたマルチロボット動作計画 Multi-Robot Motion Planning with Diffusion Models ( http://arxiv.org/abs/2410.03072v1 ) ライセンス: Link先を確認	Yorai Shaoul, Itamar Mishani, Shivam Vats, Jiaoyang Li, Maxim Likhachev,	(参考訳) 拡散モデルは近年、データから複雑なマルチモーダル動作を学ぶための幅広いロボット工学応用に適用され、成功を収めている。しかし, 従来の研究は, マルチロボット拡散モデル学習のサンプル複雑性が高いため, 主に単一ロボットと小規模環境に限られている。本論文では,単一ロボットデータのみを用いて,基礎となるデータ分布に適合する衝突のないマルチロボット軌道を生成する手法を提案する。我々のアルゴリズムであるMulti-robot Multi-model planning Diffusion (MMD)は、学習した拡散モデルと古典的な探索に基づく手法を組み合わせることで、衝突制約下でのデータ駆動動作を生成する。さらに,単一拡散モデルがうまく一般化できない大規模環境において,複数の拡散モデルを構成して計画する方法を示す。我々は,ロジスティクス環境に動機付けられた様々なシミュレーションシナリオにおいて,数十台のロボットを計画する上でのアプローチの有効性を実証する。ビデオデモは補足資料で、コードは https://github.com/yoraish/mmd で公開されている。 Diffusion models have recently been successfully applied to a wide range of robotics applications for learning complex multi-modal behaviors from data. However, prior works have mostly been confined to single-robot and small-scale environments due to the high sample complexity of learning multi-robot diffusion models. In this paper, we propose a method for generating collision-free multi-robot trajectories that conform to underlying data distributions while using only single-robot data. Our algorithm, Multi-robot Multi-model planning Diffusion (MMD), does so by combining learned diffusion models with classical search-based techniques -- generating data-driven motions under collision constraints. Scaling further, we show how to compose multiple diffusion models to plan in large environments where a single diffusion model fails to generalize well. We demonstrate the effectiveness of our approach in planning for dozens of robots in a variety of simulated scenarios motivated by logistics environments. View video demonstrations in our supplementary material, and our code at: https://github.com/yoraish/mmd.	翻訳日:2024-11-03 04:06:08 公開日:2024-10-04
# MetaOOD:OOD検出モデルの自動選択 MetaOOD: Automatic Selection of OOD Detection Models ( http://arxiv.org/abs/2410.03074v1 ) ライセンス: Link先を確認	Yuehan Qin, Yichi Zhang, Yi Nian, Xueying Ding, Yue Zhao,	(参考訳) 様々なタスクに対して、アウト・オブ・ディストリビューション(OOD)検出モデルを自動的に選択するにはどうすればよいのか? これは、特にオンライントランザクション、自律運転、リアルタイムの患者診断といった重要な領域において、データの分散シフトを特定することによって、オープンワールドアプリケーションの信頼性を維持するために不可欠である。多くのOOD検出方法が利用可能であるにもかかわらず、様々なタスクに対して最適なモデルを選択するという課題は、特に真理ラベルが欠如しているシナリオにおいて、ほとんど未探索のままである。本稿では,メタラーニングを利用してOOD検出モデルを自動的に選択する,最初のゼロショット・アン教師なしフレームワークであるMetaOODを紹介する。メタラーニングアプローチとして、MetaOODは、さまざまなベンチマークOODデータセットにわたる既存のメソッドの履歴パフォーマンスデータを活用することにより、テスト時にラベル付きデータを必要とせずに、新しいデータセットに適したモデルを効果的に選択することが可能になる。タスクの類似性をより正確に定量化するために、データセットと検出モデルの両方の特有のOOD特性をキャプチャする言語モデルに基づく埋め込みを導入する。また,11個のOOD検出モデルの中から24個のユニークなデータセットペアを選別して実験を行い,MetaOODが既存の手法を著しく上回っており,時間的オーバーヘッドが極端に大きいことを実証した。我々の結果はウィルコクソン統計試験によって検証され、MetaOODは確立されたOOD検出器や高度な教師なし選択法を含む11のベースラインの多様なグループを超越していることが示されている。 How can we automatically select an out-of-distribution (OOD) detection model for various underlying tasks? This is crucial for maintaining the reliability of open-world applications by identifying data distribution shifts, particularly in critical domains such as online transactions, autonomous driving, and real-time patient diagnosis. Despite the availability of numerous OOD detection methods, the challenge of selecting an optimal model for diverse tasks remains largely underexplored, especially in scenarios lacking ground truth labels. In this work, we introduce MetaOOD, the first zero-shot, unsupervised framework that utilizes meta-learning to automatically select an OOD detection model. As a meta-learning approach, MetaOOD leverages historical performance data of existing methods across various benchmark OOD datasets, enabling the effective selection of a suitable model for new datasets without the need for labeled data at the test time. 
To quantify task similarities more accurately, we introduce language model-based embeddings that capture the distinctive OOD characteristics of both datasets and detection models. Through extensive experimentation with 24 unique test dataset pairs to choose from among 11 OOD detection models, we demonstrate that MetaOOD significantly outperforms existing methods and only brings marginal time overhead. Our results, validated by Wilcoxon statistical tests, show that MetaOOD surpasses a diverse group of 11 baselines, including established OOD detectors and advanced unsupervised selection methods.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# Xにおける多言語トピック分類:データセットと解析 Multilingual Topic Classification in X: Dataset and Analysis ( http://arxiv.org/abs/2410.03075v1 ) ライセンス: Link先を確認	Dimosthenis Antypas, Asahi Ushio, Francesco Barbieri, Jose Camacho-Collados,	(参考訳) ソーシャルメディアのダイナミックな領域では、多様なトピックが日々議論され、言語境界を越えている。しかし、様々な言語にまたがる理解と分類の複雑さは、この多言語的多様性に苦しむトピックモデリングのような伝統的な手法において、依然として重要な課題である。本稿では,トピック分類を目的とした4言語(英語,スペイン語,日本語,ギリシャ語)のコンテンツを含む多言語データセットであるX-Topicを紹介する。私たちのデータセットには、ソーシャルメディアコンテンツに適した幅広いトピックが含まれており、クロス言語分析、堅牢な多言語モデルの開発、オンライン対話の研究を行う科学者や専門家にとって貴重なリソースとなっている。最後に、X-Topicを活用し、包括的な言語間および多言語分析を行い、現在の汎用言語モデルとドメイン固有言語モデルの能力を比較する。 In the dynamic realm of social media, diverse topics are discussed daily, transcending linguistic boundaries. However, the complexities of understanding and categorising this content across various languages remain an important challenge with traditional techniques like topic modelling often struggling to accommodate this multilingual diversity. In this paper, we introduce X-Topic, a multilingual dataset featuring content in four distinct languages (English, Spanish, Japanese, and Greek), crafted for the purpose of tweet topic classification. Our dataset includes a wide range of topics, tailored for social media content, making it a valuable resource for scientists and professionals working on cross-linguistic analysis, the development of robust multilingual models, and computational scientists studying online dialogue. Finally, we leverage X-Topic to perform a comprehensive cross-linguistic and multilingual analysis, and compare the capabilities of current general- and domain-specific language models.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# CommonIT: データ分割による大規模言語モデルの共通性を考慮したインストラクションチューニング CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions ( http://arxiv.org/abs/2410.03077v1 ) ライセンス: Link先を確認	Jun Rao, Xuebo Liu, Lian Lian, Shengjun Cheng, Yunjie Liao, Min Zhang,	(参考訳) 命令チューニングにより、LLM(Large Language Models)はコマンドに準拠する能力を高めることができる。データミキシングに焦点を当てた多くの研究とは異なり、本研究はトレーニング中のデータサンプリングの観点からモデルの能力を向上させることに重点を置く。単一の種類のトピックを集中的に練習することで類似のトピックに対する解法を習得しやすくなるという人間の学習プロセスからインスピレーションを得て,CommonIT: Commonality-aware Instruction Tuningという新しい命令チューニング戦略を導入する。具体的には、命令データセットを3つのメトリクス(Task, Embedding, Length)で異なるグループにクラスタ化する。各トレーニングのミニバッチ(パーティション)は、単一のグループからのデータのみで構成されており、ミニバッチ全体にわたるデータランダム性と、バッチ内のデータ類似性の両方をもたらす。 LLaMaモデルの厳密なテストは、ITデータセット(FLAN、CoT、Alpaca)とモデル(LLaMa2-7B、Qwen2-7B、LLaMa 13B、BLOOM 7B)を通じてLLMの命令追従能力を向上するCommonITの有効性を示す。 CommonITは、Lengthメトリックによる一般ドメイン(知識、推論、多言語性、コーディングの平均スコア)で平均2.1%、Taskメトリックによる特殊ドメイン(GSM、Openfunctions、Code)で平均5.2%、Embeddingメトリックによる特定のタスク(MMLU)で平均3.8%の改善を一貫してもたらす。コードは https://github.com/raojay7/CommonIT で入手できる。 With instruction tuning, Large Language Models (LLMs) can enhance their ability to adhere to commands. Diverging from most works focusing on data mixing, our study concentrates on enhancing the model's capabilities from the perspective of data sampling during training. Drawing inspiration from the human learning process, where it is generally easier to master solutions to similar topics through focused practice on a single type of topic, we introduce a novel instruction tuning strategy termed CommonIT: Commonality-aware Instruction Tuning. Specifically, we cluster instruction datasets into distinct groups with three proposed metrics (Task, Embedding and Length). We ensure each training mini-batch, or "partition", consists solely of data from a single group, which brings about both data randomness across mini-batches and intra-batch data similarity.
Rigorous testing on LLaMa models demonstrates CommonIT's effectiveness in enhancing the instruction-following capabilities of LLMs across IT datasets (FLAN, CoT, and Alpaca) and models (LLaMa2-7B, Qwen2-7B, LLaMa 13B, and BLOOM 7B). CommonIT consistently yields an average improvement of 2.1% on the general domain (i.e., the average score of Knowledge, Reasoning, Multilinguality and Coding) with the Length metric, 5.2% on the special domain (i.e., GSM, Openfunctions and Code) with the Task metric, and 3.8% on the specific tasks (i.e., MMLU) with the Embedding metric. Code is available at https://github.com/raojay7/CommonIT.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
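The batching scheme described in the CommonIT abstract (each mini-batch drawn from one group, with batch order randomized) can be sketched as below. The function name, the `group_of` callback, and the seeding are hypothetical; how the groups are actually formed from the Task, Embedding, or Length metric is not modeled here.

```python
import random
from collections import defaultdict


def commonality_batches(samples, group_of, batch_size, seed=0):
    """Build mini-batches whose members all share one group key.

    `group_of` maps a sample to its group (for example a task name or a
    length bucket). Samples are shuffled within each group and the
    resulting batches are shuffled across groups.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)

    batches = []
    for members in groups.values():
        rng.shuffle(members)  # randomness inside a group
        for start in range(0, len(members), batch_size):
            batches.append(members[start:start + batch_size])

    rng.shuffle(batches)  # randomness across mini-batches
    return batches
```

A usage example under these assumptions: `commonality_batches(data, lambda s: s["task"], batch_size=8)` yields batches that never mix tasks, while the order in which tasks appear during training stays random.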
# 1RSB-fullRSB転移を持つ厳密に解ける量子スピングラスモデル Exactly solvable quantum spin-glass model with 1RSB-fullRSB transition ( http://arxiv.org/abs/2410.03079v1 ) ライセンス: Link先を確認	Naoto Shiraishi,	(参考訳) 横方向の平均場型ランダム磁石を伴うシェリントン・カークパトリックモデルという新しい量子スピングラスモデルを導入する。パラメータ領域全体にわたり、このモデルの自由エネルギーの厳密な表式を数学的に厳密に導出する。得られた厳密解は、低温における1RSB-fullRSB転移の存在を示唆する。我々の手法は一般の古典スピンモデルに適用でき、任意の可解な古典スピンモデルに対応する可解な量子モデルが存在することを示す。 We introduce a novel quantum spin-glass model, a Sherrington-Kirkpatrick model with a transverse mean-field type random magnet. We rigorously derive the exact expression of the free energy of this model over the entire parameter region. The obtained exact solution implies the existence of a 1RSB-fullRSB transition at low temperatures. Our technique can be applied to general classical spin models, showing that any solvable classical spin model has a solvable quantum counterpart.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# Stable Diffusionによる生成的エッジ検出 Generative Edge Detection with Stable Diffusion ( http://arxiv.org/abs/2410.03080v1 ) ライセンス: Link先を確認	Caixia Zhou, Yaping Huang, Mochu Xiang, Jiahui Ren, Haibin Ling, Jing Zhang,	(参考訳) エッジ検出は一般的に、主に識別的手法によって対処されるピクセルレベルの分類問題と見なされる。近年、生成的エッジ検出手法、特に拡散モデルに基づく手法が、エッジ検出タスクに導入されている。大きな可能性にもかかわらず、タスク固有に設計されたモジュールの再トレーニングと多段階のデノイジング推論が、より広範な応用を制限している。より詳しく調べると、その理由の一部は、広範囲に事前訓練された大規模モデル(例えば安定拡散モデル)に符号化された豊富な識別情報の探索不足にあると推測される。そこで我々は,事前学習した安定拡散モデルのポテンシャルを十分に活用する,GED(Generative Edge Detector)という新しい手法を提案する。我々のモデルは、事前訓練された安定拡散によって得られる豊富な高レベルかつ低レベルの事前知識により、特定のネットワーク設計なしで効率的に訓練および推論することができる。具体的には、潜在画像特徴マップを入力として、デノイジングU-Netを微調整し、潜在エッジマップを直接予測することを提案する。さらに、エッジの主観性と曖昧さを踏まえ、エッジの粒度を条件の一つとしてデノイジングU-Netモデルに組み込み、制御可能かつ多様な予測を実現する。さらに、複数の予測の相対的な粒度関係を確保するために、粒度正規化を考案する。我々は、複数のデータセットに対して広範な実験を行い、競争力のある性能を達成する(例えば、BSDSテストデータセット上でODSとOISはそれぞれ0.870と0.880)。 Edge detection is typically viewed as a pixel-level classification problem mainly addressed by discriminative methods. Recently, generative edge detection methods, especially diffusion-model-based solutions, have been introduced to the edge detection task. Despite great potential, the retraining of task-specific designed modules and multi-step denoising inference limit their broader applications. Upon closer investigation, we speculate that part of the reason is the under-exploration of the rich discriminative information encoded in extensively pre-trained large models (e.g., stable diffusion models). Thus motivated, we propose a novel approach, named Generative Edge Detector (GED), by fully utilizing the potential of the pre-trained stable diffusion model. Our model can be trained and inferred efficiently without specific network design due to the rich high-level and low-level prior knowledge empowered by the pre-trained stable diffusion. Specifically, we propose to finetune the denoising U-Net and predict latent edge maps directly, by taking the latent image feature maps as input. 
Additionally, due to the subjectivity and ambiguity of the edges, we also incorporate the granularity of the edges into the denoising U-Net model as one of the conditions to achieve controllable and diverse predictions. Furthermore, we devise a granularity regularization to ensure the relative granularity relationship of the multiple predictions. We conduct extensive experiments on multiple datasets and achieve competitive performance (e.g., 0.870 and 0.880 in terms of ODS and OIS on the BSDS test dataset).	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# 品質データを用いたパラメータ制約言語モデルのスケーリング Scaling Parameter-Constrained Language Models with Quality Data ( http://arxiv.org/abs/2410.03083v1 ) ライセンス: Link先を確認	Ernie Chang, Matteo Paltenghi, Yang Li, Pin-Jie Lin, Changsheng Zhao, Patrick Huber, Zechun Liu, Rastislav Rabatin, Yangyang Shi, Vikas Chandra,	(参考訳) 言語モデリングにおける法則のスケーリングは、伝統的にデータセットのサイズとモデルパラメータの関数としてトレーニング損失を定量化し、計算最適推定を提供するが、しばしばデータ品質がモデル一般化に与える影響を無視する。本稿では,パラメータ制約言語モデルの性能決定に重要な要因であると考えられる,原定式化におけるデータ品質の顕微鏡的ビュー – 効果的なトレーニングトークン – を提供することにより,従来のスケーリング法則の理解を拡大する。具体的には、提案された効果的なトレーニングトークンの用語を、簡単に計算可能な2つのテキスト指標の組み合わせとして定式化する。 (i)テキストの多様性二教師モデルによる合成性。テキストの品質,モデルサイズ,トレーニングトークン,および8つの推論タスク精度スコアに関連する定数を推定した。我々は,推定定数+0.83ピアソン相関を真の精度と比較し,データサンプリングや合成といったデータ品質の向上を目的とした,広く使われているデータ技術を含むシナリオで解析した。 Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization. In this paper, we extend the conventional understanding of scaling law by offering a microscopic view of data quality within the original formulation -- effective training tokens -- which we posit to be a critical determinant of performance for parameter-constrained language models. Specifically, we formulate the proposed term of effective training tokens to be a combination of two readily-computed indicators of text: (i) text diversity and (ii) syntheticity as measured by a teacher model. We pretrained over $200$ models of 25M to 1.5B parameters on a diverse set of sampled, synthetic data, and estimated the constants that relate text quality, model size, training tokens, and eight reasoning task accuracy scores. We demonstrated the estimated constants yield +0.83 Pearson correlation with true accuracies, and analyzed it in scenarios involving widely-used data techniques such as data sampling and synthesis which aim to improve data quality.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# 散逸促進エンタングルメント生成 Dissipation-accelerated entanglement generation ( http://arxiv.org/abs/2410.03084v1 ) ライセンス: Link先を確認	Xiao-Wei Zheng, Jun-Cong Zheng, Xue-Feng Pan, Li-Hua Lin, Pei-Rong Han, Peng-Bo Li,	(参考訳) 散逸は通常、量子効果を観測し、それらを量子技術に利用するうえでの負の因子とみなされる。本稿では,2つの結合量子ビット間の量子絡み合いの生成を,そのうち一方の量子ビットに強い散逸チャネルを導入することで高速化する手法を提案する。最大絡み合いは、これらの2つの量子ビットの間に1つの励起を均等に分配することによって条件付きで確立される。励起が最初に散逸量子ビットに保持されている場合、散逸は量子ジャンプを伴わない量子状態軌道における励起の再分配過程を加速させる。我々の結果は,最大絡み合いを条件付きで達成するのに要する時間が,散逸率の増加とともに単調に減少することを示す。さらに、このスキームは、2つのエルミート量子ビットに1つの非エルミート(NH)量子ビットが対称結合された3量子ビット系に対するW状態の生成を加速するために一般化できることを示す。 Dissipation is usually considered a negative factor for observing quantum effects and for harnessing them for quantum technologies. Here we propose a scheme for speeding up the generation of quantum entanglement between two coupled qubits by introducing a strong dissipation channel to one of these qubits. The maximal entanglement is conditionally established by evenly distributing a single excitation between these two qubits. When the excitation is initially held by the dissipative qubit, the dissipation accelerates the excitation re-distribution process for the quantum state trajectory without quantum jumps. Our results show that the time needed to conditionally attain the maximal entanglement decreases monotonically as the dissipation rate increases. We further show that this scheme can be generalized to accelerate the production of the W state for the three-qubit system, where one non-Hermitian (NH) qubit is symmetrically coupled to two Hermitian qubits.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# 限定ラベル付きデータとトレーニング時間を用いた最適化プロキシ -半スーパービジョンベイズニューラルネットワークアプローチ Optimization Proxies using Limited Labeled Data and Training Time -- A Semi-Supervised Bayesian Neural Network Approach ( http://arxiv.org/abs/2410.03085v1 ) ライセンス: Link先を確認	Parikshit Pareek, Kaarthik Sundar, Deepjyoti Deka, Sidhant Misra,	(参考訳) 制約のある最適化問題は、在庫管理や電力網などの様々なエンジニアリングシステムで発生する。しかし、不確実なパラメータでそのような最適化問題を何度も解くという要求は、重要な計算課題を生じさせる。本研究では,限定ラベル付きデータと限定モデルトレーニング時間の下での制約付き最適化問題を解決するため,ベイズニューラルネットワーク(BNN)を用いた学習手法を提案する。そこで本研究では,サンドイッチ方式で学習開始を行ない,コストを最小化するための教師付き学習ステップ(ラベル付きデータを用いた)と制約実現性を高めるための教師なし学習ステップ(ラベル付きデータを用いた)を交互に行う,実用的で複雑なシステムのための半教師付きBNNを提案する。教師なしと教師なしの両方のステップはベイズ的アプローチを用いており、確率的変分推論はベイズ的推論に近似的に用いられる。提案手法は,従来のBNNおよびディープニューラルネットワーク(DNN)アーキテクチャを,エネルギーネットワーク操作による重要な非凸制約最適化問題において,最大等式ギャップの最大10倍の削減,最適性と不等式(実現可能性)のギャップの半減,補正や投射のステップを必要とせず達成する。 BNNが最小の計算コストで後続サンプルを提供する能力を活用することにより、後続(SvP)スキームによる選択が、さらに10%以上の平等ギャップを削減できることを実証する。また、ラベル付きテストデータの少ない数で構築でき、他のアプリケーションにも容易に適応できる、厳密で実用的な確率的信頼境界も提供します。 Constrained optimization problems arise in various engineering system operations such as inventory management and electric power grids. However, the requirement to repeatedly solve such optimization problems with uncertain parameters poses a significant computational challenge. This work introduces a learning scheme using Bayesian Neural Networks (BNNs) to solve constrained optimization problems under limited labeled data and restricted model training times. We propose a semi-supervised BNN for this practical but complex regime, wherein training commences in a sandwiched fashion, alternating between a supervised learning step (using labeled data) for minimizing cost, and an unsupervised learning step (using unlabeled data) for enforcing constraint feasibility. Both supervised and unsupervised steps use a Bayesian approach, where Stochastic Variational Inference is employed for approximate Bayesian inference. 
We show that the proposed semi-supervised learning method outperforms conventional BNN and deep neural network (DNN) architectures on important non-convex constrained optimization problems from energy network operations, achieving up to a tenfold reduction in expected maximum equality gap and halving the optimality and inequality (feasibility) gaps, without requiring any correction or projection step. By leveraging the BNN's ability to provide posterior samples at minimal computational cost, we demonstrate that a Selection via Posterior (SvP) scheme can further reduce equality gaps by more than 10%. We also provide tight and practically meaningful probabilistic confidence bounds that can be constructed using a low number of labeled testing data and readily adapted to other applications.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# UNComp: 効率的な大規模言語モデル推論のための不確かさを意識した長期圧縮機 UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference ( http://arxiv.org/abs/2410.03090v1 ) ライセンス: Link先を確認	Jing Xiong, Jianghan Shen, Fanghua Ye, Chaofan Tao, Zhongwei Wan, Jianqiao Lu, Xun Wu, Chuanyang Zheng, Zhijiang Guo, Lingpeng Kong, Ngai Wong,	(参考訳) 大規模言語モデル(LLM)のデプロイは、特に長期コンテキスト推論において、高いメモリと計算要求のために困難である。キー値(KV)キャッシュは、以前計算されたキーと値の再利用によって推論を加速するが、メモリオーバーヘッドも大幅に増加する。既存のKVキャッシュ圧縮手法であるエヴィジョンやマージは、生成後にKVキャッシュを圧縮し、隠れた状態のエヴィジョンを見落とし、プリフィルステージの速度を向上することができない。さらに、異なるアテンションヘッドに均一な圧縮速度を適用すると、過剰な圧縮により、ニードル・イン・ア・ヘイスタックタスクにおいて重要な検索ヘッドを損なう可能性がある。本論文では,行列エントロピーを利用した不確実性を考慮した圧縮手法UNCompを提案する。レイヤとヘッドを不確実性に基づいてグループ化することで、UNCompは隠れた状態とKVキャッシュの両方を適応的に圧縮する。本手法はプリフィル段階で1.6倍の高速化を実現し,KVキャッシュを4.74%に削減し,スループットが6.4倍向上し,1.4倍の高速化を実現した。注目すべきは、ニードル・イン・ア・ヘイスタックのタスクでは、UNCompは元のサイズの9.38%に圧縮された場合でも、フルサイズのKVキャッシュより優れていることである。当社のアプローチは,既存のKVキャッシュスキームにシームレスに統合可能な,効率的かつトレーニング不要なグループクエリアテンションパラダイムを提供する。 Deploying large language models (LLMs) is challenging due to their high memory and computational demands, especially during long-context inference. While key-value (KV) caching accelerates inference by reusing previously computed keys and values, it also introduces significant memory overhead. Existing KV cache compression methods such as eviction and merging typically compress the KV cache after it is generated and overlook the eviction of hidden states, failing to improve the speed of the prefilling stage. Additionally, applying a uniform compression rate across different attention heads can harm crucial retrieval heads in needle-in-a-haystack tasks due to excessive compression. In this paper, we propose UNComp, an uncertainty-aware compression scheme that leverages matrix entropy to estimate model uncertainty across layers and heads at the token sequence level. By grouping layers and heads based on their uncertainty, UNComp adaptively compresses both the hidden states and the KV cache. 
Our method achieves a 1.6x speedup in the prefilling stage and reduces the KV cache to 4.74% of its original size, resulting in a 6.4x increase in throughput and a 1.4x speedup in inference with only a 1.41% performance loss. Remarkably, in needle-in-a-haystack tasks, UNComp outperforms the full-size KV cache even when compressed to 9.38% of its original size. Our approach offers an efficient, training-free Grouped-Query Attention paradigm that can be seamlessly integrated into existing KV cache schemes.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# AIレースダイナミクスのシミュレーションゲームからの戦略的洞察 Strategic Insights from Simulation Gaming of AI Race Dynamics ( http://arxiv.org/abs/2410.03092v1 ) ライセンス: Link先を確認	Ross Gruetzemacher, Shahar Avin, James Fox, Alexander K Saeri,	(参考訳) 我々は、AIの将来の可能性に関するシナリオ探索演習である"Intelligence Rising"から得られた洞察を提示する。4年間にわたり43回のゲームを監督してきたファシリテーターの経験に基づいて,ゲームプレイ中に観察された繰り返し現れるパターン,戦略,意思決定過程を明らかにする。このシミュレーション環境でのAI開発の軌跡に関する重要な戦略的考察として、AI開発競争の不安定化効果、破滅的リスクの軽減における国際協力の重要な役割、企業と国家の利益を一致させることの難しさ、AI能力の急速かつ変革的な変化の可能性などが明らかになった。AIガバナンスに内在する複雑さと不確実性を参加者に体感させるうえで、このゲームが効果的であったと考えられる点を強調する。繰り返し現れる主要なゲームプレイのテーマは、国際協定の出現、そのような協定の堅牢性への挑戦、AI開発におけるサイバーセキュリティの重要な役割、予期せぬ危機がAIの軌道を劇的に変える可能性である。これらの洞察を文書化することで、AI開発とガバナンスの複雑な状況に対処する政策立案者、業界リーダー、研究者に貴重な先見性を提供することを目指す。 We present insights from "Intelligence Rising", a scenario exploration exercise about possible AI futures. Drawing on the experiences of facilitators who have overseen 43 games over a four-year period, we illuminate recurring patterns, strategies, and decision-making processes observed during gameplay. Our analysis reveals key strategic considerations about AI development trajectories in this simulated environment, including: the destabilising effects of AI races, the crucial role of international cooperation in mitigating catastrophic risks, the challenges of aligning corporate and national interests, and the potential for rapid, transformative change in AI capabilities. We highlight places where we believe the game has been effective in exposing participants to the complexities and uncertainties inherent in AI governance. Key recurring gameplay themes include the emergence of international agreements, challenges to the robustness of such agreements, the critical role of cybersecurity in AI development, and the potential for unexpected crises to dramatically alter AI trajectories. By documenting these insights, we aim to provide valuable foresight for policymakers, industry leaders, and researchers navigating the complex landscape of AI development and governance.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# 絡み合いによる証明可能で堅牢な量子学習の利点 Entanglement-induced provable and robust quantum learning advantages ( http://arxiv.org/abs/2410.03094v1 ) ライセンス: Link先を確認	Haimeng Zhao, Dong-Ling Deng,	(参考訳) 量子コンピューティングは、機械学習の強化、高速化、革新のための非並列ポテンシャルを持っている。しかし、量子学習の優位性の明白な実証は、今のところ達成されていない。ここでは,従来の機械学習モデルと比較して,表現性,推論速度,トレーニング効率の面で,ノイズロストな非条件量子学習の優位性を厳格に確立する。量子絡み合いは、非ローカル機械学習タスクで必要とされる通信を減らすために用いられる。特に、エンタングルメント資源を用いた変動パラメータの一定数の量子モデルを用いて、単位精度で解くことができる完全古典的タスクを設計する。さらに、量子モデルは一定時間で訓練でき、多くのサンプルは問題の大きさに逆比例することを示した。この利点は、一定偏極雑音に対して頑健であることを示す。シミュレーションにより,従来のモデルではサイズが大きくなるにつれて性能が向上するが,オーバーフィッティングに悩まされることを示した。オーバーフィッティング問題によって強化された定数対線形分離により、比較的小さなシステムサイズで量子上の優位性を示すことができる。我々は,量子古典的学習分離法であるIonQ Ariaの数値シミュレーションとトラップイオン実験を併用して実証した。この結果は,現在ノイズの多い中間規模量子デバイスを用いた実用的な応用において,量子学習の優位性を実証するための貴重なガイドを提供する。 Quantum computing holds the unparalleled potentials to enhance, speed up or innovate machine learning. However, an unambiguous demonstration of quantum learning advantage has not been achieved so far. Here, we rigorously establish a noise-robust, unconditional quantum learning advantage in terms of expressivity, inference speed, and training efficiency, compared to commonly-used classical machine learning models. Our proof is information-theoretic and pinpoints the origin of this advantage: quantum entanglement can be used to reduce the communication required by non-local machine learning tasks. In particular, we design a fully classical task that can be solved with unit accuracy by a quantum model with a constant number of variational parameters using entanglement resources, whereas commonly-used classical models must scale at least linearly with the size of the task to achieve a larger-than-exponentially-small accuracy. We further show that the quantum model can be trained with constant time and a number of samples inversely proportional to the problem size. We prove that this advantage is robust against constant depolarization noise. 
We show through numerical simulations that even though the classical models can have improved performance as their sizes are increased, they would suffer from overfitting. The constant-versus-linear separation, bolstered by the overfitting problem, makes it possible to demonstrate the quantum advantage with relatively small system sizes. We demonstrate, through both numerical simulations and trapped-ion experiments on IonQ Aria, the desired quantum-classical learning separation. Our results provide a valuable guide for demonstrating quantum learning advantages in practical applications with current noisy intermediate-scale quantum devices.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# テキストベースとドラッグベースを組み合わせた高精度・フレキシブルな画像編集 Combing Text-based and Drag-based Editing for Precise and Flexible Image Editing ( http://arxiv.org/abs/2410.03097v1 ) ライセンス: Link先を確認	Ziqi Jiang, Zhen Wang, Long Chen,	(参考訳) 正確で柔軟な画像編集は、コンピュータビジョンにおける基本的な課題であり続けている。修正される領域に基づいて、ほとんどの編集方法は、大域的な編集と局所的な編集の2つのタイプに分けられる。本稿では,最も一般的な2つの編集手法(テキストベースの編集とドラッグベースの編集)を選択し,その欠点を解析する。具体的には、テキストベースの手法は望ましい修正を正確に記述できないことが多く、ドラッグベースの手法は曖昧さに悩まされる。これらの問題に対処するため, 拡散モデル上でテキストとドラッグ信号を初めて組み合わせ, 正確で曖昧さのない操作を行う新しい画像編集法であるCLIPDragを提案する。これら2つの信号を十分に活用するために、テキスト信号をグローバルなガイダンスとして、ドラッグポイントをローカルな情報として扱う。そこで,CLIPのような事前学習済みの言語視覚モデルを適応させることで,テキスト信号を既存のドラッグベース手法に統合する,新たなグローバル・ローカル動作監督手法を導入する。さらに,CLIPDragにおける収束の遅さにも対処し,ドラッグポイントを正しい方向へ移動させる高速な点追跡手法を提案する。広範な実験により、CLIPDragはドラッグベースまたはテキストベースのみの既存手法よりも優れていることが示される。 Precise and flexible image editing remains a fundamental challenge in computer vision. Based on the modified areas, most editing methods can be divided into two main types: global editing and local editing. In this paper, we choose the two most common editing approaches (i.e., text-based editing and drag-based editing) and analyze their drawbacks. Specifically, text-based methods often fail to describe the desired modifications precisely, while drag-based methods suffer from ambiguity. To address these issues, we propose CLIPDrag, a novel image editing method that is the first to combine text and drag signals for precise and ambiguity-free manipulations on diffusion models. To fully leverage these two signals, we treat text signals as global guidance and drag points as local information. Then we introduce a novel global-local motion supervision method to integrate text signals into existing drag-based methods by adapting a pre-trained language-vision model like CLIP. Furthermore, we also address the problem of slow convergence in CLIPDrag by presenting a fast point-tracking method that forces drag points to move in the correct directions. 
Extensive experiments demonstrate that CLIPDrag outperforms existing methods that rely on drag-based or text-based editing alone.	翻訳日:2024-11-03 03:56:19 公開日:2024-10-04
# CoCoHD:議会委員会公聴会データセット CoCoHD: Congress Committee Hearing Dataset ( http://arxiv.org/abs/2410.03099v1 ) ライセンス: Link先を確認	Arnav Hiray, Yunsong Liu, Mingxiao Song, Agam Shah, Sudheer Chava,	(参考訳) 米国議会の公聴会は国家経済と社会構造に大きな影響を与え、個人の生活にも影響を及ぼす。その重要性にもかかわらず、これらの議論を分析するための包括的なデータセットが欠如している。これに対応するために、1997年から2024年までの86の委員会における公聴会を32,697レコードでカバーする議会委員会公聴会データセット(CoCoHD)を提案する。このデータセットにより、医療、LGBTQ+の権利、気候正義といった重要な問題に関する政策言語の研究が可能になる。1,000件のエネルギー関連文に関するケーススタディで、化石燃料消費に対するエネルギー・商業委員会(Energy and Commerce Committee)のスタンスを分析し,その可能性を実証する。事前学習した言語モデルを微調整することにより、各公聴会に対するエネルギー関連の指標を作成する。市場分析の結果,CoCoHDを用いた自然言語分析がエネルギーセクターのトレンドを予測・強調できることがわかった。 U.S. congressional hearings significantly influence the national economy and social fabric, impacting individual lives. Despite their importance, there is a lack of comprehensive datasets for analyzing these discourses. To address this, we propose the Congress Committee Hearing Dataset (CoCoHD), covering hearings from 1997 to 2024 across 86 committees, with 32,697 records. This dataset enables researchers to study policy language on critical issues like healthcare, LGBTQ+ rights, and climate justice. We demonstrate its potential with a case study on 1,000 energy-related sentences, analyzing the Energy and Commerce Committee's stance on fossil fuel consumption. By fine-tuning pre-trained language models, we create energy-relevant measures for each hearing. Our market analysis shows that natural language analysis using CoCoHD can predict and highlight trends in the energy sector.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# YouTubeのオートコンプリート提案における人種ステレオタイプの検討 Examining Racial Stereotypes in YouTube Autocomplete Suggestions ( http://arxiv.org/abs/2410.03102v1 ) ライセンス: Link先を確認	Eunbin Ha, Haein Kong, Shagun Jhaver,	(参考訳) Autocompleteは、ユーザ入力に基づいてクエリを予測し、潜在的に関連性のある提案のセットにユーザーを誘導する人気のある検索機能である。本研究では、YouTubeのオートコンプリートが、人種情報を探究するユーザーのための情報ソースとしてどのように機能するかを検討する。本研究では、4つの人種グループに関する入力クエリに対する自動完全提案のアルゴリズム出力監査を行い、それらが具現化するステレオタイプについて検討する。批判的談話分析を用いて、人種的偏見が現れる5つの主要な社会文化的文脈(外観、能力、文化、社会的平等、マナー)を同定する。以上の結果から,我々の収集したオートコンプリートにおける差別と人種間緊張の集合的証拠が示され,他の人種的マイノリティの潜在的なリスクを浮き彫りにした。我々は、コンテンツモデレーションポリシーの設計と、検索アウトプットにおけるこれらのバイアスに対処するための実施において、緊急のイノベーションを求めている。 Autocomplete is a popular search feature that predicts queries based on user input and guides users to a set of potentially relevant suggestions. In this study, we examine how YouTube autocompletes serve as an information source for users exploring information about race. We perform an algorithm output audit of autocomplete suggestions for input queries about four racial groups and examine the stereotypes they embody. Using critical discourse analysis, we identify five major sociocultural contexts in which racial biases manifest -- Appearance, Ability, Culture, Social Equity, and Manner. Our results show evidence of aggregated discrimination and interracial tensions in the autocompletes we collected and highlight their potential risks in othering racial minorities. We call for urgent innovations in content moderation policy design and enforcement to address these biases in search outputs.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# 水平長予測:ルックアヘッド計画によるコード生成の中間機能向上 Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning ( http://arxiv.org/abs/2410.03103v1 ) ライセンス: Link先を確認	Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, Zijian Wang,	(参考訳) Fill-in-the-Middle(FIM)は、コード言語モデルに不可欠なものとなり、左と右の両方のコンテキストに与えられた不足コードの生成を可能にした。しかし、現在のFIMトレーニングパラダイムは、元のトレーニングシーケンスをリオーダーし、次に通常の次の学習予測(NTP)を実行することで、しばしば、周囲のコンテキストとスムーズに整合したコンテンツを生成するのに苦労するモデルに繋がる。重要なことは、既存の作業はこの弱点を回避するためにルールベースの後処理に依存しているが、そのような方法は、制約のあるデータセット固有の仮定に依存するため、オープンドメインのコード補完タスクで実際に使用することはできない(例えば、基底真実と同じ数の行を生成する)。さらに、これらの非現実的な仮定なしに、FIMタスクのモデル性能は著しく低下する。 NTPだけでは、コード入力を成功させる重要な要因である、遠くの状況で効率的な計画条件を学習するモデルには不十分である、という仮説を立てる。これを解決するために,各ステップで残るミドルトークンの数(すなわち水平長)をモデルに予測させる新たなトレーニング目標であるHorizon-Length Prediction (HLP)を提案する。 HLPはルックアヘッド計画によってFIMを前進させ、データセット固有の後処理に頼ることなく、任意の左右コンテキストの入力境界を本質的に学習することを可能にする。異なるモデルとサイズで評価したところ、HLPはファイルレベルやリポジトリレベル、非現実的なポストプロセッシング手法を使わずに、様々なベンチマークでFIM性能を最大24%向上させることがわかった。さらに、HLPによる計画能力の向上により、コード推論におけるモデルパフォーマンスが向上する。重要なのは、HLPは無視可能なトレーニングオーバーヘッドと追加の推論コストのみを発生させ、現実のシナリオにおける実用性を保証することだ。 Fill-in-the-Middle (FIM) has become integral to code language models, enabling generation of missing code given both left and right contexts. However, the current FIM training paradigm, which reorders original training sequences and then performs regular next-token prediction (NTP), often leads to models struggling to generate content that aligns smoothly with the surrounding context. Crucially, while existing works rely on rule-based post-processing to circumvent this weakness, such methods are not practically usable in open-domain code completion tasks as they depend on restrictive, dataset-specific assumptions (e.g., generating the same number of lines as in the ground truth). Moreover, model performance on FIM tasks deteriorates significantly without these unrealistic assumptions. 
We hypothesize that NTP alone is insufficient for models to learn effective planning conditioned on the distant right context, a critical factor for successful code infilling. To overcome this, we propose Horizon-Length Prediction (HLP), a novel training objective that teaches models to predict the number of remaining middle tokens (i.e., horizon length) at each step. HLP advances FIM with lookahead planning, enabling models to inherently learn infilling boundaries for arbitrary left and right contexts without relying on dataset-specific post-processing. Our evaluation across different models and sizes shows that HLP significantly improves FIM performance, by up to 24% (relative), on diverse benchmarks at both the file level and the repository level, without resorting to unrealistic post-processing methods. Furthermore, the enhanced planning capability gained through HLP boosts model performance on code reasoning. Importantly, HLP incurs only negligible training overhead and no additional inference cost, ensuring its practicality for real-world scenarios.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
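To make the horizon-length target concrete, the sketch below builds a fill-in-the-middle example and the per-position count of remaining middle tokens. It is an illustration under stated assumptions rather than the paper's code: the sentinel strings `<PRE>`, `<SUF>`, `<MID>`, the prefix-suffix-middle ordering, the function names, and the convention that the last middle token has horizon 0 are all choices made here, and the abstract does not say whether the target is normalized.

```python
def horizon_targets(middle_tokens):
    """Return, for each middle position, how many middle tokens remain.

    Convention assumed here: the target at position t counts the tokens
    still to be generated after token t, so the last position gets 0.
    """
    total = len(middle_tokens)
    return [total - 1 - t for t in range(total)]


def build_fim_example(prefix, middle, suffix):
    """Reorder a (prefix, middle, suffix) split into prefix-suffix-middle order.

    The sentinel strings are placeholders; real tokenizers define their own.
    """
    sequence = ["<PRE>"] + prefix + ["<SUF>"] + suffix + ["<MID>"] + middle
    middle_start = len(sequence) - len(middle)
    return {
        "tokens": sequence,            # input for ordinary next-token prediction
        "middle_start": middle_start,  # index of the first middle token
        "horizon": horizon_targets(middle),  # extra target per middle token
    }


if __name__ == "__main__":
    example = build_fim_example(
        prefix=["def", "f", "(", ")", ":"],
        middle=["return", "1"],
        suffix=["\n", "print", "(", "f", "(", ")", ")"],
    )
    print(example["tokens"])
    print(example["horizon"])
```

In training, the horizon values would presumably feed an auxiliary prediction head alongside the usual next-token loss; that wiring is not shown here.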
# Mamba in Vision: 技術と応用に関する総合的な調査 Mamba in Vision: A Comprehensive Survey of Techniques and Applications ( http://arxiv.org/abs/2410.03105v1 ) ライセンス: Link先を確認	Md Maklachur Rahman, Abdullah Aman Tutul, Ankur Nath, Lamyanba Laishram, Soon Ki Jung, Tracy Hammond,	(参考訳) Mambaは、コンピュータビジョンにおいて畳み込みニューラルネットワーク(CNN)とビジョントランスフォーマー(ViT)が直面する課題を克服するための、新しいアプローチとして登場している。CNNは局所的な特徴の抽出に長けているが、複雑なアーキテクチャ変更なしに長距離依存関係を捉えるのに苦労することが多い。対照的に、ViTはグローバルな関係を効果的にモデル化するが、自己注意機構の二次的な計算量のために高い計算コストに悩まされる。MambaはSelective Structured State Space Modelsを活用して、線形の計算量で長距離依存を効果的に捉えることで、これらの制限に対処する。本調査では,Mambaモデルのユニークな貢献,計算上の利点,応用について分析し,課題と今後の研究の方向性を明らかにする。コンピュータビジョンにおけるMambaモデルの理解と発展を促進する基盤となるリソースを提供する。この研究の概要は https://github.com/maklachur/Mamba-in-Computer-Vision で確認できる。 Mamba is emerging as a novel approach to overcome the challenges faced by Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in computer vision. While CNNs excel at extracting local features, they often struggle to capture long-range dependencies without complex architectural modifications. In contrast, ViTs effectively model global relationships but suffer from high computational costs due to the quadratic complexity of their self-attention mechanisms. Mamba addresses these limitations by leveraging Selective Structured State Space Models to effectively capture long-range dependencies with linear computational complexity. This survey analyzes the unique contributions, computational benefits, and applications of Mamba models while also identifying challenges and potential future research directions. We provide a foundational resource for advancing the understanding and growth of Mamba models in computer vision. An overview of this work is available at https://github.com/maklachur/Mamba-in-Computer-Vision.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# MBDS:グラフネットワークシミュレータのためのマルチボディダイナミクスシミュレーションデータセット MBDS: A Multi-Body Dynamics Simulation Dataset for Graph Networks Simulators ( http://arxiv.org/abs/2410.03107v1 ) ライセンス: Link先を確認	Sheng Yang, Fengge Wu, Junsuo Zhao,	(参考訳) 物理世界の構造と事象をモデル化することは、ニューラルネットワークの基本的な目的である。様々なアプローチの中で、グラフネットワークシミュレータ(GNS)は、計算コストが低く、精度が高いため、物理現象をモデル化する主要な手法として登場した。物理シミュレーション技術のトレーニングと評価に使用されるデータセットは、典型的には研究者自身によって生成され、しばしばデータ量と品質が制限される。これにより,これらの手法の性能を正確に評価する上での課題が生じる。これに対応して、1D、2D、3Dシーンを含む高品質な物理シミュレーションデータセットを構築した。さらに、我々の研究は8つの完全なシーンを開発し、データセットの包括性を大幅に向上させることで、自分自身を区別する。私たちのデータセットの重要な特徴は、物理世界のより現実的なシミュレーションを促進する、正確な多体ダイナミクスを取り入れることである。高品質なデータセットを用いて,既存のGNS手法の体系的評価を行った。私たちのデータセットはhttps://github.com/Sherlocktein/MBDSでダウンロード可能です。 Modeling the structure and events of the physical world constitutes a fundamental objective of neural networks. Among the diverse approaches, Graph Network Simulators (GNS) have emerged as the leading method for modeling physical phenomena, owing to their low computational cost and high accuracy. The datasets employed for training and evaluating physical simulation techniques are typically generated by researchers themselves, often resulting in limited data volume and quality. Consequently, this poses challenges in accurately assessing the performance of these methods. In response to this, we have constructed a high-quality physical simulation dataset encompassing 1D, 2D, and 3D scenes, along with more trajectories and time-steps compared to existing datasets. Furthermore, our work distinguishes itself by developing eight complete scenes, significantly enhancing the dataset's comprehensiveness. A key feature of our dataset is the inclusion of precise multi-body dynamics, facilitating a more realistic simulation of the physical world. Utilizing our high-quality dataset, we conducted a systematic evaluation of various existing GNS methods. 
Our dataset is accessible for download at https://github.com/Sherlocktein/MBDS, offering a valuable resource for researchers to enhance the training and evaluation of their methodologies.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# 確率力学系学習のための学習自由条件拡散モデル A Training-Free Conditional Diffusion Model for Learning Stochastic Dynamical Systems ( http://arxiv.org/abs/2410.03108v1 ) ライセンス: Link先を確認	Yanfang Liu, Yuan Chen, Dongbin Xiu, Guannan Zhang,	(参考訳) 本研究では,未知確率微分方程式(SDE)をデータを用いて学習するための学習自由条件拡散モデルを提案する。提案手法は、スコアベース拡散モデルを用いて確率フローマップを近似することにより、SDEをモデル化するための計算効率と精度の重要な課題に対処する。既存の手法とは異なり、この手法は解析的に導出された閉形式正確なスコア関数に基づいており、これは軌道データを用いてモンテカルロ法によって効率的に推定することができ、スコア関数を学ぶためにニューラルネットワークのトレーニングを不要にする。対応する逆常微分方程式を解くことでラベル付きデータを生成することにより、フローマップの教師あり学習を可能にする。線形系,非線形系,多次元系を含む多種多様なSDE型に対する大規模数値実験により,本手法の汎用性と有効性を示す。学習されたモデルは、未知の確率系の短期的および長期的挙動を予測し、しばしばドリフトと拡散係数を推定する際に、GANのようなベースライン法を上回っている。 This study introduces a training-free conditional diffusion model for learning unknown stochastic differential equations (SDEs) using data. The proposed approach addresses key challenges in computational efficiency and accuracy for modeling SDEs by utilizing a score-based diffusion model to approximate their stochastic flow map. Unlike the existing methods, this technique is based on an analytically derived closed-form exact score function, which can be efficiently estimated by Monte Carlo method using the trajectory data, and eliminates the need for neural network training to learn the score function. By generating labeled data through solving the corresponding reverse ordinary differential equation, the approach enables supervised learning of the flow map. Extensive numerical experiments across various SDE types, including linear, nonlinear, and multi-dimensional systems, demonstrate the versatility and effectiveness of the method. The learned models exhibit significant improvements in predicting both short-term and long-term behaviors of unknown stochastic systems, often surpassing baseline methods like GANs in estimating drift and diffusion coefficients.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# LoRC: プログレッシブ圧縮戦略によるLDMKVキャッシュの低ランク圧縮 LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy ( http://arxiv.org/abs/2410.03111v1 ) ライセンス: Link先を確認	Rongzhi Zhang, Kuang Wang, Liyuan Liu, Shuohang Wang, Hao Cheng, Chao Zhang, Yelong Shen,	(参考訳) キーバリュー(KV)キャッシュは、トランスフォーマーベースの自己回帰型大言語モデル(LLM)を提供する上で重要なコンポーネントであり、以前計算されたKVベクトルを格納することでより高速な推論を可能にする。しかし、メモリ消費はシーケンス長とバッチサイズと線形にスケールし、LLMデプロイメントにおいて大きなボトルネックとなる。この問題を軽減するための既存のアプローチとしては、(1) 事前訓練されたLCMには適さない広範囲なパラメータチューニングを必要とするアップサイクリング段階に統合された効率的なアテンションバリアント、(2) テスト時のKVキャッシュ圧縮、主に層間依存関係を見落とし、タスク固有のトークン消去ポリシーがある。本稿では,KVキャッシュ圧縮に対する直交的アプローチを提案する。そこで我々は,KV重量行列の低ランク近似を提案し,モデル再構成なしで既存のトランスフォーマーベースLLMとのプラグイン統合を実現する。重みレベルでKVキャッシュを効果的に圧縮するために、我々は階層的に感度を調整し、深層ネットワークにおける圧縮エラーの蓄積に関する理論的解析によって支持されるプログレッシブ圧縮戦略を導入する。本手法は,テスト段階におけるアップサイクリング段階のモデルチューニングやタスク固有のプロファイリングを伴わずに機能するように設計されている。 LLaMAモデルによる多種多様なタスクにわたる8Bから70Bパラメータの大規模な実験により、我々のアプローチは性能を維持しながらGPUメモリのフットプリントを大幅に削減することを示した。 The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs), enabling faster inference by storing previously computed KV vectors. However, its memory consumption scales linearly with sequence length and batch size, posing a significant bottleneck in LLM deployment. Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages, which requires extensive parameter tuning thus unsuitable for pre-trained LLMs; (2) KV cache compression at test time, primarily through token eviction policies, which often overlook inter-layer dependencies and can be task-specific. This paper introduces an orthogonal approach to KV cache compression. We propose a low-rank approximation of KV weight matrices, allowing for plug-in integration with existing transformer-based LLMs without model retraining. 
To effectively compress KV cache at the weight level, we adjust for layerwise sensitivity and introduce a progressive compression strategy, which is supported by our theoretical analysis on how compression errors accumulate in deep networks. Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages. Extensive experiments with LLaMA models ranging from 8B to 70B parameters across various tasks show that our approach significantly reduces the GPU memory footprint while maintaining performance.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# X-ALMA: 大規模な高品質翻訳のためのプラグアンドプレイモジュールと適応的拒絶 X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale ( http://arxiv.org/abs/2410.03115v1 ) ライセンス: Link先を確認	Haoran Xu, Kenton Murray, Philipp Koehn, Hieu Hoang, Akiko Eriguchi, Huda Khayrallah,	(参考訳) 大規模言語モデル(LLM)は様々なNLPタスクで顕著な成功を収めてきたが、英語中心の事前学習と限られた多言語データのため、その焦点は主に英語に置かれてきた。一部の多言語LLMは数百の言語をサポートすると主張しているが、中・低リソース言語では高品質な応答が得られないことが多く、その性能は英語や中国語のような高リソース言語に大きく偏っている。本稿では、言語数の拡大よりも品質を優先し、多言語機械翻訳タスクに焦点を当てて、リソースレベルに関わらず50の多様な言語でトップレベルの性能を確保することを目指して設計されたモデルX-ALMAを導入する。X-ALMAは、COMET-22による評価で、FLORESおよびWMT'23テストデータセット上のすべての翻訳方向において、Aya-101やAya-23のような最先端のオープンソース多言語LLMを上回る。これは、訓練中の言語間の競合を防ぐプラグアンドプレイの言語固有モジュールアーキテクチャと、翻訳性能を最大化する新しい最適化手法を備えた慎重に設計された訓練レジメンによって達成される。訓練レジメンの最終段階において、提案する適応的拒絶選好最適化(ARPO)は、翻訳タスクにおいて既存の選好最適化手法を上回る。 Large language models (LLMs) have achieved remarkable success across various NLP tasks, yet their focus has predominantly been on English due to English-centric pre-training and limited multilingual data. While some multilingual LLMs claim to support for hundreds of languages, models often fail to provide high-quality response for mid- and low-resource languages, leading to imbalanced performance heavily skewed in favor of high-resource languages like English and Chinese. In this paper, we prioritize quality over scaling number of languages, with a focus on multilingual machine translation task, and introduce X-ALMA, a model designed with a commitment to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels. X-ALMA surpasses state-of-the-art open-source multilingual LLMs, such as Aya-101 and Aya-23, in every single translation direction on the FLORES and WMT'23 test datasets according to COMET-22. This is achieved by plug-and-play language-specific module architecture to prevent language conflicts during training and a carefully designed training regimen with novel optimization methods to maximize the translation performance. 
At the final stage of training regimen, our proposed Adaptive Rejection Preference Optimization (ARPO) surpasses existing preference optimization methods in translation tasks.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# ProcBench: 多段階推論と手順追従のためのベンチマーク ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure ( http://arxiv.org/abs/2410.03117v1 ) ライセンス: Link先を確認	Ippei Fujisawa, Sensho Nobe, Hiroki Seto, Rina Onda, Yoshiaki Uchida, Hiroki Ikoma, Pei-Chun Chien, Ryota Kanai,	(参考訳) 推論は幅広い知的活動の中心であり、大規模言語モデル(LLM)の能力は進歩を続けているが、推論タスクにおける性能は依然として限られている。推論の基礎となるプロセスとメカニズムはまだ完全には理解されていないが、重要な要素として経路探索、関連知識の選択、多段階推論が挙げられる。問題はこれらの要素を組み合わせることで解決される。本稿では、推論能力の特定の側面、すなわち多段階推論の直接評価に焦点を当てたベンチマークを提案する。この目的のために、経路探索と暗黙的知識の利用をほぼ排除することで、多段階推論に特化した推論タスクを設計する。我々のデータセットは、明示的な指示とそれに対応する質問のペアで構成されており、質問の解決に必要な手順は指示の中にすべて詳細に記述されている。この設定により、モデルは与えられた指示に従うだけで問題を解決できる。解決に必要なステップ数が異なる問題を構築し、各ステップで応答を評価することにより、最先端のLLMの指示追従能力を徹底的に評価できる。評価の頑健性を確保するために、複数の異なるタスクを含めている。さらに、タスク間の精度の比較、ステップを考慮したメトリクスの利用、別途定義した複雑性の尺度の適用により、推論タスクにおけるLLMの能力と限界に関する洞察を与える実験を行う。本研究の知見は、LLMの開発に重要な意味を持ち、推論能力向上に向けた今後の研究領域を浮き彫りにする。データセットは \url{https://huggingface.co/datasets/ifujisawa/procbench} で、コードは \url{https://github.com/ifujisawa/proc-bench} で利用可能である。 Reasoning is central to a wide range of intellectual activities, and while the capabilities of large language models (LLMs) continue to advance, their performance in reasoning tasks remains limited. The processes and mechanisms underlying reasoning are not yet fully understood, but key elements include path exploration, selection of relevant knowledge, and multi-step inference. Problems are solved through the synthesis of these components. In this paper, we propose a benchmark that focuses on a specific aspect of reasoning ability: the direct evaluation of multi-step inference. To this end, we design a special reasoning task where multi-step inference is specifically focused by largely eliminating path exploration and implicit knowledge utilization. Our dataset comprises pairs of explicit instructions and corresponding questions, where the procedures necessary for solving the questions are entirely detailed within the instructions. 
This setup allows models to solve problems solely by following the provided directives. By constructing problems that require varying numbers of steps to solve and evaluating responses at each step, we enable a thorough assessment of state-of-the-art LLMs' ability to follow instructions. To ensure the robustness of our evaluation, we include multiple distinct tasks. Furthermore, by comparing accuracy across tasks, utilizing step-aware metrics, and applying separately defined measures of complexity, we conduct experiments that offer insights into the capabilities and limitations of LLMs in reasoning tasks. Our findings have significant implications for the development of LLMs and highlight areas for future research in advancing their reasoning abilities. Our dataset is available at \url{https://huggingface.co/datasets/ifujisawa/procbench} and code at \url{https://github.com/ifujisawa/proc-bench}.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# 精度、安定性、一般化:カウンター言語とダイク言語の分類におけるRNNの学習可能性の総合評価 Precision, Stability, and Generalization: A Comprehensive Assessment of RNNs learnability capability for Classifying Counter and Dyck Languages ( http://arxiv.org/abs/2410.03118v1 ) ライセンス: Link先を確認	Neisarg Dave, Daniel Kifer, Lee Giles, Ankur Mali,	(参考訳) 本研究では、構造化された形式言語の分類におけるリカレントニューラルネットワーク(RNN)の学習可能性を、カウンター言語とダイク言語に着目して検討する。従来、一階(LSTM)と二階(O2RNN)のRNNはいずれも、主にチョムスキー階層内での理論的表現力に基づいて、こうしたタスクに有効であると考えられてきた。しかし本研究は、RNNが主に状態機械として動作し、その言語能力が埋め込みの精度と負例のサンプリング戦略に大きく左右されることを示すことで、この見方に異議を唱える。実験の結果、正例と負例の構造的類似性が高まるにつれて性能が著しく低下することがわかった。注目すべきことに、RNN埋め込みを用いた基本的な単層分類器でさえ、偶然を上回る性能を示した。一般化を評価するため、長さ40までの文字列でモデルを訓練し、長さ41から500までの文字列でテストした。統計的な頑健性を確保するため、10個の異なるシードを用いた。LSTMモデルとO2RNNモデルの安定性の比較により、O2RNNは一般に様々なシナリオでより高い安定性を示すことがわかった。さらに、異なる初期化戦略の影響を調べ、我々の仮説が様々なRNNで一貫していることを明らかにする。全体として、本研究はRNNの計算能力に関する定説に疑問を投げかけ、言語分類タスクに対するニューラルネットワークの可能性を評価する上で、データ構造とサンプリング手法が重要であることを強調する。また、単なる表現力では学習の本質を捉えられないため、真の学習可能性を理解するには表現力に対するより強い制約が不可欠であることを強調する。 This study investigates the learnability of Recurrent Neural Networks (RNNs) in classifying structured formal languages, focusing on counter and Dyck languages. Traditionally, both first-order (LSTM) and second-order (O2RNN) RNNs have been considered effective for such tasks, primarily based on their theoretical expressiveness within the Chomsky hierarchy. However, our research challenges this notion by demonstrating that RNNs primarily operate as state machines, where their linguistic capabilities are heavily influenced by the precision of their embeddings and the strategies used for sampling negative examples. Our experiments revealed that performance declines significantly as the structural similarity between positive and negative examples increases. Remarkably, even a basic single-layer classifier using RNN embeddings performed better than chance. 
To evaluate generalization, we trained models on strings up to a length of 40 and tested them on strings from lengths 41 to 500, using 10 unique seeds to ensure statistical robustness. Stability comparisons between LSTM and O2RNN models showed that O2RNNs generally offer greater stability across various scenarios. We further explore the impact of different initialization strategies revealing that our hypothesis is consistent with various RNNs. Overall, this research questions established beliefs about RNNs' computational capabilities, highlighting the importance of data structure and sampling techniques in assessing neural networks' potential for language classification tasks. It emphasizes that stronger constraints on expressivity are crucial for understanding true learnability, as mere expressivity does not capture the essence of learning.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# 強化学習システムにおけるリングアトラクタを用いた空間認識型の意思決定 Spatial-aware decision-making with ring attractors in reinforcement learning systems ( http://arxiv.org/abs/2410.03119v1 ) ライセンス: Link先を確認	Marcos Negre Saura, Richard Allmendinger, Theodore Papamarkou, Wei Pan,	(参考訳) 本稿では、神経回路のダイナミクスに着想を得た数学的モデルであるリングアトラクタを、強化学習(RL)の行動選択プロセスに統合することを検討する。リングアトラクタは、空間情報と不確実性を符号化する、脳に着想を得た特殊な構造であり、学習速度と予測性能を改善する生物学的に妥当なメカニズムを提供する。これは、行動空間を明示的に符号化し、神経活動の組織化を促進し、深層RLの文脈でニューラルネットワーク全体に空間表現を分散させることによって実現される。RLの行動選択プロセスにおけるリングアトラクタの適用では、行動をリング上の特定の位置にマッピングし、神経活動に基づいて選択された行動をデコードする。本研究では、リングアトラクタを外生モデルとして構築する方法と、深層学習による方策アルゴリズムの一部として統合する方法の両方を検討する。その結果、Atari 100kベンチマークにおいて最先端モデルに対する大幅な改善が示された。特に、統合アプローチは最先端モデルの性能を約半分向上させ、選択したベースラインに対して53\%の向上となった。 This paper explores the integration of ring attractors, a mathematical model inspired by neural circuit dynamics, into the reinforcement learning (RL) action selection process. Ring attractors, as specialized brain-inspired structures that encode spatial information and uncertainty, offer a biologically plausible mechanism to improve learning speed and predictive performance. They do so by explicitly encoding the action space, facilitating the organization of neural activity, and enabling the distribution of spatial representations across the neural network in the context of deep RL. The application of ring attractors in the RL action selection process involves mapping actions to specific locations on the ring and decoding the selected action based on neural activity. We investigate the application of ring attractors by both building them as exogenous models and integrating them as part of a Deep Learning policy algorithm. Our results show a significant improvement in state-of-the-art models for the Atari 100k benchmark. Notably, our integrated approach improves the performance of state-of-the-art models by half, representing a 53\% increase over selected baselines.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# RIPPLECOT: Chain-of-Thoughtインコンテキスト学習による言語モデルの知識編集におけるリップル効果の増幅 RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning ( http://arxiv.org/abs/2410.03122v1 ) ライセンス: Link先を確認	Zihao Zhao, Yuchen Yang, Yijiang Li, Yinzhi Cao,	(参考訳) リップル効果は、大規模言語モデルの知識編集において重要な課題である。すなわち、単一の事実が編集されると、モデルは連鎖する関連事実を正確に更新することが難しく、これは関連事実の連鎖に結びついたマルチホップ質問によって評価される。最近の戦略は、従来のパラメータ更新から、より柔軟で計算負荷の低い手法へと移行しており、これらはリップル効果への対処により有効であることが示されている。インコンテキスト学習(ICL)による編集は、単純なデモンストレーション `Imagine that + new fact` を用いてLLMを誘導するが、新しい事実だけではそのようなシナリオに関わる事実の連鎖を特定できないため、複雑なマルチホップ質問に苦戦する。また、メモリベースの編集は、すべての編集と関連事実のための追加ストレージを保持し、有効性を保つには継続的な更新が必要である。これらの設計上の制限の結果、課題は未解決のままであり、Vicuna-7BでのMQuAKE-cfベンチマークにおける最高精度は33.8%にとどまる。そこで我々は、Chain-of-Thought(COT)推論を統合した新しいICL編集手法であるRippleCOTを提案する。RippleCOTはデモンストレーションを `newfact, question, thought, answer` として構成し、質問内のマルチホップ論理を特定・分解するための思考(thought)コンポーネントを組み込む。このアプローチは、関連事実の連鎖を伴う複雑なマルチホップ質問において、モデルを効果的に導く。包括的な実験により、RippleCOTはリップル効果に関して最先端手法を大きく上回り、7.8%から87.1%の精度向上を達成することが示された。 The ripple effect poses a significant challenge in knowledge editing for large language models. Namely, when a single fact is edited, the model struggles to accurately update the related facts in a sequence, which is evaluated by multi-hop questions linked to a chain of related facts. Recent strategies have moved away from traditional parameter updates to more flexible, less computation-intensive methods, proven to be more effective in addressing the ripple effect. In-context learning (ICL) editing uses a simple demonstration `Imagine that + new fact` to guide LLMs, but struggles with complex multi-hop questions as the new fact alone fails to specify the chain of facts involved in such scenarios. Besides, memory-based editing maintains additional storage for all edits and related facts, requiring continuous updates to stay effective. As a result of these design limitations, the challenge remains, with the highest accuracy being only 33.8% on the MQuAKE-cf benchmarks for Vicuna-7B. 
To address this, we propose RippleCOT, a novel ICL editing approach integrating Chain-of-Thought (COT) reasoning. RippleCOT structures demonstrations as `newfact, question, thought, answer`, incorporating a thought component to identify and decompose the multi-hop logic within questions. This approach effectively guides the model through complex multi-hop questions with chains of related facts. Comprehensive experiments demonstrate that RippleCOT significantly outperforms the state-of-the-art on the ripple effect, achieving accuracy gains ranging from 7.8% to 87.1%.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# 収縮: 符号付き距離場からのパラメータ化曲面の再構成 Shrinking: Reconstruction of Parameterized Surfaces from Signed Distance Fields ( http://arxiv.org/abs/2410.03123v1 ) ライセンス: Link先を確認	Haotian Yin, Przemyslaw Musialski,	(参考訳) 本稿では、3次元曲面に対して広く用いられている暗黙的ニューラル表現(INR)である符号付き距離場(SDF)から、明示的なパラメータ化曲面を再構成する新しい手法を提案する。マーチングキューブのような従来の再構成手法は、INRの連続性と微分可能性を失った離散メッシュを抽出するが、本手法はパラメータ化された初期球を目標のSDF形状に合うよう反復的に収縮させ、微分可能性と曲面パラメータ化を一貫して保持する。これにより、テクスチャマッピング、幾何処理、アニメーション、有限要素解析などの下流アプリケーションが可能になる。典型的な幾何形状とABCデータセットのパーツで評価した結果、本手法は、高度なコンピュータグラフィックスや幾何学的深層学習の応用に不可欠な滑らかさと微分可能性を保ちながら、競争力のある再構成品質を達成した。 We propose a novel method for reconstructing explicit parameterized surfaces from Signed Distance Fields (SDFs), a widely used implicit neural representation (INR) for 3D surfaces. While traditional reconstruction methods like Marching Cubes extract discrete meshes that lose the continuous and differentiable properties of INRs, our approach iteratively contracts a parameterized initial sphere to conform to the target SDF shape, preserving differentiability and surface parameterization throughout. This enables downstream applications such as texture mapping, geometry processing, animation, and finite element analysis. Evaluated on the typical geometric shapes and parts of the ABC dataset, our method achieves competitive reconstruction quality, maintaining smoothness and differentiability crucial for advanced computer graphics and geometric deep learning applications.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# ブラックボックス言語モデルを用いた分類のための教師なしプロンプト学習について On Unsupervised Prompt Learning for Classification with Black-box Language Models ( http://arxiv.org/abs/2410.03124v1 ) ライセンス: Link先を確認	Zhen-Yu Zhang, Jiandong Zhang, Huaxiu Yao, Gang Niu, Masashi Sugiyama,	(参考訳) 大規模言語モデル(LLM)はテキスト形式の学習問題において目覚ましい成功を収めており、人気の高いLLMの多くはブラックボックス方式で提供されている。一方、特定の下流タスクでより良い性能を得るには通常ファインチューニングが必要であり、この機能はブラックボックスLLMの所有者によって提供される。ブラックボックスLLMをファインチューニングするには、モデルパラメータを調整するためのラベル付きデータが常に必要となる。しかし、多くの実世界のアプリケーションでは、LLMは熟練した人間のアノテータよりも高い品質でテキストデータセットにラベルを付けることができ、これがラベルなしデータでブラックボックスLLMをファインチューニングする可能性を探る動機となっている。本稿では、ブラックボックスLLMを用いた分類のための教師なしプロンプト学習を提案する。ここで学習パラメータは、プロンプト自体とラベルなしデータの擬似ラベルである。具体的には、プロンプトは離散トークンの列としてモデル化され、各トークンは学習対象となる独自のカテゴリ分布を持つ。一方、擬似ラベルの学習においては、LLMのインコンテキスト学習(ICL)能力を初めて考慮する。まずLLMを用いて信頼性の高い擬似ラベル付きデータを特定し、次にプロンプトに基づいて他のラベルなしデータに擬似ラベルを割り当てることで、擬似ラベル付きデータをプロンプトと並ぶインコンテキストのデモンストレーションとして利用できるようにする。これらのインコンテキストのデモンストレーションは重要である。従来、デモンストレーションはプロンプトを予測に用いる際には使われるが、プロンプトの訓練時には使われていなかった。そのため、訓練時にこれらを考慮することで、プロンプト学習とプロンプト利用の段階がより一貫したものになる。ベンチマークデータセットを用いた実験により、提案アルゴリズムの有効性が示された。教師なしプロンプト学習の後、擬似ラベル付きデータセットを用いて、ブラックボックスLLMの所有者がさらなるファインチューニングを行うことができる。 Large language models (LLMs) have achieved impressive success in text-formatted learning problems, and most popular LLMs have been deployed in a black-box fashion. Meanwhile, fine-tuning is usually necessary for a specific downstream task to obtain better performance, and this functionality is provided by the owners of the black-box LLMs. To fine-tune a black-box LLM, labeled data are always required to adjust the model parameters. However, in many real-world applications, LLMs can label textual datasets with even better quality than skilled human annotators, motivating us to explore the possibility of fine-tuning black-box LLMs with unlabeled data. In this paper, we propose unsupervised prompt learning for classification with black-box LLMs, where the learning parameters are the prompt itself and the pseudo labels of unlabeled data. 
Specifically, the prompt is modeled as a sequence of discrete tokens, and every token has its own to-be-learned categorical distribution. On the other hand, for learning the pseudo labels, we are the first to consider the in-context learning (ICL) capabilities of LLMs: we first identify reliable pseudo-labeled data using the LLM, and then assign pseudo labels to other unlabeled data based on the prompt, allowing the pseudo-labeled data to serve as in-context demonstrations alongside the prompt. Those in-context demonstrations matter: previously, they are involved when the prompt is used for prediction while they are not involved when the prompt is trained; thus, taking them into account during training makes the prompt-learning and prompt-using stages more consistent. Experiments on benchmark datasets show the effectiveness of our proposed algorithm. After unsupervised prompt learning, we can use the pseudo-labeled dataset for further fine-tuning by the owners of the black-box LLMs.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# 可変レンジ相互作用を持つ量子格子モデルにおける相関の伝播 Correlation Spreading in Quantum Lattice Models with Variable-Range Interactions ( http://arxiv.org/abs/2410.03125v1 ) ライセンス: Link先を確認	Julien Despres,	(参考訳) 本論文では、急激な大域的クエンチによって平衡から遠く離れた状態に駆動された、短距離または長距離相互作用を持つ孤立格子モデルにおける量子相関の伝播を調べた。準粒子理論に基づく一般的な理論的アプローチを提示する。これにより、超立方格子上の短距離および長距離相互作用する粒子・スピン格子モデルの双方に有効な、等時連結相関関数の一般的な表式が得られた。停留位相の議論に基づき、その因果円錐が、相関エッジと一連の局所極値からなる普遍的な二重構造を示し、それぞれが時空相関の外部構造と内部構造を定めることを示した。短距離相互作用では、各構造の運動は弾道的であり、対応する伝播速度は、クエンチ後のハミルトニアンの準粒子分散関係の群速度および位相速度と関係している。$1/\|R\|^{\alpha}$ という形の長距離相互作用では、べき乗則の指数 $\alpha$ を調整すると群速度が発散しうるため、相関の伝播は大きく異なる。群速度が発散する場合、すなわち準局所領域では、因果円錐に対する普遍的な代数的構造の証拠を示した。相関エッジの運動は常に弾道的より遅いことがわかったが、局所極値は、ギャップレスな量子系では弾道的より速く、ギャップのある量子系では弾道的に伝播する。明確に定義された群速度を持つ局所領域では、相関の因果円錐について短距離の場合と同様のスケーリング則と伝播速度が得られた。 In this thesis, we have investigated the spreading of quantum correlations in isolated lattice models with short- or long-range interactions driven far from equilibrium via sudden global quenches. A general theoretical approach relying on a quasiparticle theory is presented. The latter has permitted to unveil a generic expression for the equal-time connected correlation functions valid both for short-range and long-range interacting particle and spin lattice models on a hypercubic lattice. Relying on stationary phase arguments, we have shown that its causality cone displays a universal twofold structure consisting of a correlation edge and a series of local extrema defining the outer and inner structure of the space-time correlations. For short-range interactions, the motion of each structure is ballistic and the associated spreading velocities are related to the group and phase velocities of the quasiparticle dispersion relation of the post-quench Hamiltonian. For long-range interactions of the form $1/\|R\|^{\alpha}$, the correlation spreading is substantially different due to a possible divergence of group velocity when tuning the power-law exponent $\alpha$. 
For a divergent group velocity, i.e. the quasi-local regime, we have presented evidence of a universal algebraic structure for the causality cone. While, the correlation edge motion has been found to be always slower than ballistic, the local extrema propagate faster than ballistically and ballistically for gapless and gapped quantum systems respectively. For the local regime implying a well-defined group velocity, we have recovered similar scaling laws and spreading velocities than the short-range case for the causality cone of correlations.	翻訳日:2024-11-03 03:46:34 公開日:2024-10-04
# 資格改善の機会が存在する場合における、決定対象者のAIモデルへの関与と公平性の認識の理解 Understanding Decision Subjects' Engagement with and Perceived Fairness of AI Models When Opportunities of Qualification Improvement Exist ( http://arxiv.org/abs/2410.03126v1 ) ライセンス: Link先を確認	Meric Altug Gemalmaz, Ming Yin,	(参考訳) AIモデルの決定の公平性が、その決定の対象となる人々のモデルへの関与と公平性の認識にどのように影響するかを、人々がこれらの決定に対して繰り返し戦略的に反応できる状況で調査する。モデルとの対話を継続するかどうか、および将来モデルから有利な決定を得る可能性を高めるために自己投資するかどうか、という2種類の戦略的反応を検討する。3つの被験者実験により、決定対象者がAIモデルと戦略的かつ反復的に相互作用する状況では、モデルが顕著な保護属性に関して不公平であっても、モデルの決定の公平性はモデルと対話する意思や自己改善の意思を変化させないことがわかった。しかし、決定対象者は、AIモデルが自分のグループに対して体系的に偏っている場合、特に有利な決定を得るための資格改善の難しさが低資格者ほど大きい場合に、モデルの公平性が低いと認識する。 We explore how an AI model's decision fairness affects people's engagement with and perceived fairness of the model if they are subject to its decisions, but could repeatedly and strategically respond to these decisions. Two types of strategic responses are considered -- people could determine whether to continue interacting with the model, and whether to invest in themselves to improve their chance of future favorable decisions from the model. Via three human-subject experiments, we found that in decision subjects' strategic, repeated interactions with an AI model, the model's decision fairness does not change their willingness to interact with the model or to improve themselves, even when the model exhibits unfairness on salient protected attributes. However, decision subjects still perceive the AI model to be less fair when it systematically biases against their group, especially if the difficulty of improving one's qualification for the favorable decision is larger for the lowly-qualified people.	翻訳日:2024-11-03 03:36:45 公開日:2024-10-04
# von Mises-Fisher分布を用いたベイズ推論による変分量子アルゴリズム A variational quantum algorithm by Bayesian Inference with von Mises-Fisher distribution ( http://arxiv.org/abs/2410.03130v1 ) ライセンス: Link先を確認	Trung Huynh, Gwangil An, Minsu Kim, Yu-Seong Jeon, Jinhyoung Lee,	(参考訳) 変分量子固有値ソルバーアルゴリズムは、多くの物理・化学的問題における基本的な課題であるハミルトニアンの基底状態と基底エネルギーを求める能力から注目を集めている。有望な結果を示しているものの、様々な種類の測定を用いる必要があることが依然として大きな障害である。最近、主系に結合した補助(アンシラ)系を追加で導入することでこの問題を克服する、量子位相推定アルゴリズムに着想を得た測定手法が提案された。この測定手法に基づき、ベイズ推論の原理とフォン・ミーゼス・フィッシャー分布を組み合わせた新しい手法を提示し、様々なランダムなハミルトニアン行列に対して新しいアルゴリズムが基底状態を確実に特定できることを理論的に示す。これはまた、他の量子情報科学の問題におけるフォン・ミーゼス・フィッシャー分布の可能性を探る新しい道を開く。 The variational quantum eigensolver algorithm has gained attentions due to its capability of locating the ground state and ground energy of a Hamiltonian, which is a fundamental task in many physical and chemical problems. Although it has demonstrated promising results, the use of various types of measurements remains a significant obstacle. Recently, a quantum phase estimation algorithm inspired measurement scheme has been proposed to overcome this issue by introducing an additional ancilla system that is coupled to the primary system. Based on this measurement scheme, we present a novel approach that employs Bayesian inference principles together with von Mises-Fisher distribution and theoretically demonstrates the new algorithm's capability in identifying the ground state with certain for various random Hamiltonian matrices. This also opens a new way for exploring the von Mises-Fisher distribution potential in other quantum information science problems.	翻訳日:2024-11-03 03:36:45 公開日:2024-10-04
# 残余寿命予測:大規模言語モデルに基づく多次元産業信号処理と効率的な伝達学習に関する研究 Remaining Useful Life Prediction: A Study on Multidimensional Industrial Signal Processing and Efficient Transfer Learning Based on Large Language Models ( http://arxiv.org/abs/2410.03134v1 ) ライセンス: Link先を確認	Yan Chen, Cheng Liu,	(参考訳) RUL(Remaining useful Life)予測は、機器の信頼性と運用安全性が最重要である現代産業システムの維持に不可欠である。従来の手法は、小規模なディープラーニングや物理・統計モデルに基づいており、複雑な多次元センサーデータや様々な操作条件に悩まされ、一般化能力を制限している。これらの課題に対処するために,大規模言語モデル(LLM)をRUL予測に用いる革新的な回帰フレームワークを提案する。コーパスデータに事前学習したLLMのモデリング能力を利用することで,複雑な時間依存性を効果的に把握し,予測精度を向上させることができる。ターボファンエンジンのRUL予測タスクにおける広範囲な実験により、提案モデルは挑戦的なFD002およびFD004サブセットの最先端(SOTA)手法を超越し、他のサブセットのSOTAに近い結果が得られることが示された。これまでの研究と異なり、我々のフレームワークは全てのサブセットに対して同じスライディングウィンドウ長と全てのセンサ信号を使用し、強い一貫性と一般化を示している。さらに、転送学習実験により、微調整のための最小限のターゲットドメインデータでは、モデルが完全なターゲットドメインデータに基づいて訓練されたSOTAメソッドより優れていることが明らかになった。本研究は、産業信号処理とRUL予測におけるLLMの意義を強調し、将来のインテリジェント産業システムにおける健康管理のための先進的なソリューションを提供する。 Remaining useful life (RUL) prediction is crucial for maintaining modern industrial systems, where equipment reliability and operational safety are paramount. Traditional methods, based on small-scale deep learning or physical/statistical models, often struggle with complex, multidimensional sensor data and varying operating conditions, limiting their generalization capabilities. To address these challenges, this paper introduces an innovative regression framework utilizing large language models (LLMs) for RUL prediction. By leveraging the modeling power of LLMs pre-trained on corpus data, the proposed model can effectively capture complex temporal dependencies and improve prediction accuracy. Extensive experiments on the Turbofan engine's RUL prediction task show that the proposed model surpasses state-of-the-art (SOTA) methods on the challenging FD002 and FD004 subsets and achieves near-SOTA results on the other subsets. 
Notably, different from previous research, our framework uses the same sliding window length and all sensor signals for all subsets, demonstrating strong consistency and generalization. Moreover, transfer learning experiments reveal that with minimal target domain data for fine-tuning, the model outperforms SOTA methods trained on full target domain data. This research highlights the significant potential of LLMs in industrial signal processing and RUL prediction, offering a forward-looking solution for health management in future intelligent industrial systems.	翻訳日:2024-11-03 03:36:45 公開日:2024-10-04
# 正確な世界モデルを用いた構造認識プランニングとしてのLLMの熟慮的推論 Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model ( http://arxiv.org/abs/2410.03136v1 ) ライセンス: Link先を確認	Siheng Xiong, Ali Payani, Yuan Yang, Faramarz Fekri,	(参考訳) 大規模言語モデル(LLM)の推論能力の強化は、特に複雑で多段階の意思決定を必要とするタスクにおいて、依然として重要な課題である。人間は、内部の世界モデルを用いた熟慮的な計画によって様々な行動の潜在的な結果をシミュレートすることで、これらのタスクをうまくこなす。これに着想を得て、我々はLLMのための新しい多段階推論フレームワークを提案し、これをSWAP(Structure-aware Planning with Accurate World Model)と呼ぶ。自然言語によるChain-of-Thought(CoT)推論のみに依存する従来のアプローチとは異なり、SWAPは構造情報を取り入れて世界モデルを通じて推論プロセスを導き、各ステップに対するソフトな検証メカニズムを提供する。さらに、SWAPは、より信頼性の高い世界モデリングを可能にするGenerator-Discriminatorアーキテクチャを導入することで、複雑な推論タスクにおける正確な世界状態予測の課題を克服する。具体的には、ジェネレータが次の状態を予測し、ディスクリミネータが問題の文脈で要求される論理的一貫性との整合を保証する。SWAPはまた、早すぎる収束を防ぐために、方策モデルが幅広い潜在的行動を探索するよう促す。多様性に基づくモデリング(DBM)を用いて行動と状態の両方の生成多様性のボトルネックを解消し、対照的ランキング(CR)によって識別精度を向上させることにより、SWAPはLLMの推論性能を大幅に向上させる。SWAPを、数学的推論、論理的推論、コーディングタスクを含む多様な推論集約型ベンチマークで評価する。大規模な実験により、SWAPはベースラインに対して大幅な改善を達成し、同程度のサイズの既存LLMを一貫して上回ることが示された。 Enhancing the reasoning capabilities of large language models (LLMs) remains a key challenge, especially for tasks that require complex, multi-step decision-making. Humans excel at these tasks by leveraging deliberate planning with an internal world model to simulate the potential outcomes of various actions. Inspired by this, we propose a novel multi-step reasoning framework for LLMs, referred to as Structure-aware Planning with Accurate World Model (SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT) reasoning in natural language, SWAP incorporates structural information to guide the reasoning process via a world model and provides a soft verification mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate world state predictions in complex reasoning tasks by introducing a Generator-Discriminator architecture, which enables more reliable world modeling. 
Specifically, the generator predicts the next state, and the discriminator ensures alignment with the logical consistency required by the problem context. SWAP also encourages the policy model to explore a broad range of potential actions to prevent premature convergence. By resolving the bottlenecks of generation diversity for both actions and states using diversity-based modeling (DBM) and improving discrimination accuracy through contrastive ranking (CR), SWAP significantly enhances the reasoning performance of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks including math reasoning, logical reasoning, and coding tasks. Extensive experiments demonstrate that SWAP achieves substantial improvements over the baselines and consistently outperforms existing LLMs of similar sizes.	翻訳日:2024-11-03 03:36:45 公開日:2024-10-04
# SAG: モデル協調によるスタイル整合記事生成 SAG: Style-Aligned Article Generation via Model Collaboration ( http://arxiv.org/abs/2410.03137v1 ) ライセンス: Link先を確認	Chenning Xu, Fangxun Shu, Dian Jin, Jinghao Wei, Hao Jiang,	(参考訳) 大規模言語モデル(LLM)は、パーソナライズされたスタイル性のあるコンテンツ生成への需要を高めている。しかし、GPT-4のようなクローズドソースモデルは最適化の余地が限られており、一方でQwen-72Bのようなオープンソースの代替モデルは、多大な訓練コストと柔軟性の低さが大きな課題となる。逆に、小規模言語モデル(SLM)は、複雑な指示の理解や、学習した能力の新しい文脈への転移に苦戦し、しばしばより顕著な限界を示す。本稿では、LLMとSLMの双方の長所を活用した、スタイル記事生成のための新しい協調訓練フレームワークを提案し、いずれのモデル単独の性能も上回ることを示す。LLMを凍結してその堅牢な指示追従能力を活用し、その上でスタイル固有のデータを用いてSLMに教師ありファインチューニングを適用する。さらに、スタイルの一貫性を高める自己改善手法を導入する。新しいベンチマークであるNoteBenchは、スタイル整合生成を徹底的に評価する。大規模な実験により、本手法は最先端の性能を達成し、GPT-4と比較してROUGE-Lで0.78、BLEU-4で0.55のスコア向上を示しつつ、事実性と忠実性に関する幻覚率を低く抑えることが示された。 Large language models (LLMs) have increased the demand for personalized and stylish content generation. However, closed-source models like GPT-4 present limitations in optimization opportunities, while the substantial training costs and inflexibility of open-source alternatives, such as Qwen-72B, pose considerable challenges. Conversely, small language models (SLMs) struggle with understanding complex instructions and transferring learned capabilities to new contexts, often exhibiting more pronounced limitations. In this paper, we present a novel collaborative training framework that leverages the strengths of both LLMs and SLMs for style article generation, surpassing the performance of either model alone. We freeze the LLMs to harness their robust instruction-following capabilities and subsequently apply supervised fine-tuning on the SLM using style-specific data. Additionally, we introduce a self-improvement method to enhance style consistency. Our new benchmark, NoteBench, thoroughly evaluates style-aligned generation. Extensive experiments show that our approach achieves state-of-the-art performance, with improvements of 0.78 in ROUGE-L and 0.55 in BLEU-4 scores compared to GPT-4, while maintaining a low hallucination rate regarding factual and faithfulness.	翻訳日:2024-11-03 03:36:45 公開日:2024-10-04
# LLMは多様な分子を生成できるか? 構造的多様性との整合に向けて Can LLMs Generate Diverse Molecules? Towards Alignment with Structural Diversity ( http://arxiv.org/abs/2410.03138v1 ) ライセンス: Link先を確認	Hyosoon Jang, Yunhui Jang, Jaehyung Kim, Sungsoo Ahn,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、薬物候補となる分子構造の生成において優れた性能を示しており、創薬を加速する大きな可能性を秘めている。しかし、現在のLLMは、多様な分子の集合を提案するという創薬の重要な要件を見落としている。この多様性は、ウェットラボや臨床での検証で他の分子が失敗した場合にも成功しうる代替分子を提供するため、有望な薬物を見つける可能性を高めるうえで不可欠である。このような多様性の必要性にもかかわらず、LLMは与えられたプロンプトから構造的に類似した分子を出力することが多い。ビームサーチのような復号方式はテキストとしての多様性を高めうるが、これは分子の構造的多様性と一致しないことが多い。そこで本研究では、分子生成LLMをファインチューニングし、構造的に多様な分子の集合を自己回帰的に生成する新しい手法を提案する。ここで各分子は、それ以前に生成された分子を条件として生成される。提案手法は、(1) LLMが分子を系列として自己回帰的に生成できるよう適応させる教師ありファインチューニングと、(2) 生成された分子内の構造的多様性を最大化するための強化学習の2段階からなる。実験により、(1) 提案するファインチューニング手法によって、既存の復号方式と比較してLLMがより多様な分子を発見できること、(2) ファインチューニングしたモデルが、化学ドメインでファインチューニングされたものを含む他の代表的なLLMよりも多様な分子の生成において優れることを示した。 Recent advancements in large language models (LLMs) have demonstrated impressive performance in generating molecular structures as drug candidates, which offers significant potential to accelerate drug discovery. However, the current LLMs overlook a critical requirement for drug discovery: proposing a diverse set of molecules. This diversity is essential for improving the chances of finding a viable drug, as it provides alternative molecules that may succeed where others fail in wet-lab or clinical validations. Despite such a need for diversity, the LLMs often output structurally similar molecules from a given prompt. While decoding schemes like beam search may enhance textual diversity, this often does not align with molecular structural diversity. In response, we propose a new method for fine-tuning molecular generative LLMs to autoregressively generate a set of structurally diverse molecules, where each molecule is generated by conditioning on the previously generated molecules. 
Our approach consists of two stages: (1) supervised fine-tuning to adapt LLMs to autoregressively generate molecules in a sequence and (2) reinforcement learning to maximize structural diversity within the generated molecules. Our experiments show that (1) our fine-tuning approach enables the LLMs to better discover diverse molecules compared to existing decoding schemes and (2) our fine-tuned model outperforms other representative LLMs in generating diverse molecules, including the ones fine-tuned on chemical domains.	翻訳日:2024-11-03 03:36:45 公開日:2024-10-04
# 疑似相関の存在下でのインコンテキスト学習 In-context Learning in Presence of Spurious Correlations ( http://arxiv.org/abs/2410.03140v1 ) ライセンス: Link先を確認	Hrayr Harutyunyan, Rafayel Darbinyan, Samvel Karapetyan, Hrant Khachatrian,	(参考訳) 大規模言語モデルは、いくつかの例からタスクを解くことを学ぶ、コンテキスト内学習において顕著な能力を示す。近年の研究では、コンテクスト内で単純な回帰タスクを実行するためにトランスフォーマーを訓練できることが示されている。本研究は,疑似的特徴を含む分類タスクに対して,文脈内学習者を訓練する可能性について検討する。従来の文脈内学習者の訓練手法は、疑似的特徴に影響を受けやすいことが判明した。さらに、メタトレーニングデータセットが1つのタスクのみのインスタンスを含む場合、従来のアプローチはタスクの記憶に結びつき、予測にコンテキストを活用するモデルの生成に失敗する。そこで本研究では,そのような学習者に対して,与えられた分類課題を学習するための新しい手法を提案する。注目すべきは、このコンテキスト内学習者は、ERMやGroupDROのような強力な手法に匹敵し、時にはそれを上回ることである。しかし、これらのアルゴリズムとは異なり、他のタスクによく当てはまらない。そこで本研究では,合成されたコンテキスト内学習インスタンスの多様なデータセットをトレーニングすることにより,未知のタスクに一般化するインコンテキスト学習者を得ることが可能であることを示す。 Large language models exhibit a remarkable capacity for in-context learning, where they learn to solve tasks given a few examples. Recent work has shown that transformers can be trained to perform simple regression tasks in-context. This work explores the possibility of training an in-context learner for classification tasks involving spurious features. We find that the conventional approach of training in-context learners is susceptible to spurious features. Moreover, when the meta-training dataset includes instances of only one task, the conventional approach leads to task memorization and fails to produce a model that leverages context for predictions. Based on these observations, we propose a novel technique to train such a learner for a given classification task. Remarkably, this in-context learner matches and sometimes outperforms strong methods like ERM and GroupDRO. However, unlike these algorithms, it does not generalize well to other tasks. We show that it is possible to obtain an in-context learner that generalizes to unseen tasks by training on a diverse dataset of synthetic in-context learning instances.	翻訳日:2024-11-03 03:24:16 公開日:2024-10-04
# Margin Matching Preference Optimization: グラニュラーフィードバックによるモデルアライメントの強化 Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback ( http://arxiv.org/abs/2410.03145v1 ) ライセンス: Link先を確認	Kyuyoung Kim, Ah Jeong Seo, Hao Liu, Jinwoo Shin, Kimin Lee,	(参考訳) 人間のフィードバックからの強化学習など、アライメント技術で微調整された大規模言語モデル(LLM)は、これまでで最も有能なAIシステムの開発に役立っている。その成功にもかかわらず、既存の手法は、ペア間の相対的な品質の微妙な違いを捉えるのに失敗する、ペアの選好で好まれる出力を示すような単純なバイナリラベルに依存するのが一般的である。この制限に対処するために、相対的な品質マージンを最適化に組み込んだMMPO(Margin Matching Preference Optimization)というアプローチを導入し、LLMポリシーと報酬モデルの改善につながった。具体的には、ペアの選好における品質マージンを考慮し、Bradley-Terryモデルに基づくソフトターゲット確率を設計し、標準のクロスエントロピー目標を持つモデルを訓練する。人間とAIの両方のフィードバックデータによる実験によると、MMPOはMT-benchやRewardBenchといった一般的なベンチマークにおいて、ベースラインメソッドよりも一貫してパフォーマンスが向上している。特に、MMPOでトレーニングされた7Bモデルは、2024年6月現在、RewardBenchで最先端のパフォーマンスを達成しており、同じスケールの他のモデルよりも優れています。我々の分析は、MMPOが過剰適合に対してより堅牢であることを示し、より良い校正モデルをもたらすことも示している。 Large language models (LLMs) fine-tuned with alignment techniques, such as reinforcement learning from human feedback, have been instrumental in developing some of the most capable AI systems to date. Despite their success, existing methods typically rely on simple binary labels, such as those indicating preferred outputs in pairwise preferences, which fail to capture the subtle differences in relative quality between pairs. To address this limitation, we introduce an approach called Margin Matching Preference Optimization (MMPO), which incorporates relative quality margins into optimization, leading to improved LLM policies and reward models. Specifically, given quality margins in pairwise preferences, we design soft target probabilities based on the Bradley-Terry model, which are then used to train models with the standard cross-entropy objective. Experiments with both human and AI feedback data demonstrate that MMPO consistently outperforms baseline methods, often by a substantial margin, on popular benchmarks including MT-bench and RewardBench. 
Notably, the 7B model trained with MMPO achieves state-of-the-art performance on RewardBench as of June 2024, outperforming other models of the same scale. Our analysis also shows that MMPO is more robust to overfitting, leading to better-calibrated models.	翻訳日:2024-11-03 03:24:16 公開日:2024-10-04
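The MMPO abstract above describes soft target probabilities derived from the Bradley-Terry model and trained with a standard cross-entropy objective. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the `scale` parameter, and the choice of mapping the quality margin to a target via a plain sigmoid are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, used here as the Bradley-Terry win probability."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_target(margin: float, scale: float = 1.0) -> float:
    """Assumed mapping from a quality margin to a soft preference target.

    A margin of 0 gives 0.5 (no preference); larger margins approach 1.
    `scale` is a hypothetical temperature-like knob, not taken from the paper.
    """
    return sigmoid(scale * margin)


def margin_matching_loss(reward_chosen: float, reward_rejected: float,
                         margin: float, scale: float = 1.0) -> float:
    """Cross-entropy between the soft target and the model's predicted
    Bradley-Terry probability that the chosen output beats the rejected one."""
    target = soft_target(margin, scale)
    predicted = sigmoid(reward_chosen - reward_rejected)
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(predicted + eps)
             + (1.0 - target) * math.log(1.0 - predicted + eps))
```

Under these assumptions the loss is smallest when the predicted reward gap equals the scaled quality margin, so a pair with a small margin no longer pushes the rewards as far apart as a binary label would.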
# 自律型とウィザード・オブ・オズのユーザ行動の違いの分析と検出 Analysis and Detection of Differences in Spoken User Behaviors between Autonomous and Wizard-of-Oz Systems ( http://arxiv.org/abs/2410.03147v1 ) ライセンス: Link先を確認	Mikey Elmers, Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara,	(参考訳) 本研究では、遠隔操作ロボットと自律対話システムとの対話を比較検討し、日本語の人間とロボットの対話の大規模コーパスにおけるユーザの行動差について検討した。傾聴と就職面接の対話シナリオにおけるユーザ音声行動の分析を行った。その結果, 発話長, 発話速度, フィラー, バックチャネル, 非流暢性, 笑いなどの指標において, オペレータ制御条件と自律条件の間に有意な差が認められた。さらに,オペレータと自律的なシステム条件を区別する予測モデルを開発した。ベースラインモデルと比較して正解率と適合率が向上し, ベースラインモデルよりもF1スコアが高いモデルもいくつか存在する。 This study examined users' behavioral differences in a large corpus of Japanese human-robot interactions, comparing interactions between a tele-operated robot and an autonomous dialogue system. We analyzed user spoken behaviors in both attentive listening and job interview dialogue scenarios. Results revealed significant differences in metrics such as speech length, speaking rate, fillers, backchannels, disfluencies, and laughter between operator-controlled and autonomous conditions. Furthermore, we developed predictive models to distinguish between operator and autonomous system conditions. Our models demonstrated higher accuracy and precision compared to the baseline model, with several models also achieving a higher F1 score than the baseline.	翻訳日:2024-11-03 03:24:16 公開日:2024-10-04
# イベント中心物語のレンズによるメディアフレーミング Media Framing through the Lens of Event-Centric Narratives ( http://arxiv.org/abs/2410.03151v1 ) ライセンス: Link先を確認	Rohan Das, Aditya Chandra, I-Ta Lee, Maria Leonor Pacheco,	(参考訳) コミュニケーションの観点から、フレームは特定の解釈を奨励し、他の解釈を抑制するために使用される言語のパッケージングを定義する。例えば、ニュース記事は、移民を経済の押し上げ要因または負担とみなすことができるため、同じ現象の全く異なる解釈を伝えることができる。本論では, フレーミング装置を説明するためには, 物語の作り方を考える必要がある,と論じる。この方向への第一歩として、イベントと他のイベントとの関係を抽出し、それらを高レベルな物語に分類し、ニュース記事のフレームを説明するためのフレームワークを提案する。我々のフレームワークは、移民と銃規制という2つの異なる領域において、米国のニュースにおけるフレーミングを分析するのに利用できることを示す。 From a communications perspective, a frame defines the packaging of the language used in such a way as to encourage certain interpretations and to discourage others. For example, a news article can frame immigration as either a boost or a drain on the economy, and thus communicate very different interpretations of the same phenomenon. In this work, we argue that to explain framing devices we have to look at the way narratives are constructed. As a first step in this direction, we propose a framework that extracts events and their relations to other events, and groups them into high-level narratives that help explain frames in news articles. We show that our framework can be used to analyze framing in U.S. news for two different domains: immigration and gun control.	翻訳日:2024-11-03 03:24:16 公開日:2024-10-04
# メモリ拡張リカレントニューラルネットワークにおける学習可能性の探索:精度,安定性,経験的考察 Exploring Learnability in Memory-Augmented Recurrent Neural Networks: Precision, Stability, and Empirical Insights ( http://arxiv.org/abs/2410.03154v1 ) ライセンス: Link先を確認	Shrabon Das, Ankur Mali,	(参考訳) 本研究では,Pushdown Automataと理論的に等価であるメモリレスおよびメモリ拡張RNNの学習可能性について検討する。経験的な結果から、これらのモデルは記号文法を習得するよりも精度に頼って、長い列の一般化に失敗することが多い。完全トレーニングおよびコンポーネント凍結モデルの実験により、メモリコンポーネントの凍結はパフォーマンスを著しく向上し、Penn Treebankデータセット(テストパープレキシティを123.5から120.5に削減した)で最先端の結果が得られた。凍結メモリを持つモデルでは、通常のモデルでは60%減少するのに対して、より長いシーケンスで初期性能の90%を保った。理論的解析は、凍結記憶が時間的依存を安定化させ、堅牢な収束をもたらすことを示唆している。これらの知見は、RNNの真の学習可能性限界を理解するために、安定したメモリ設計と長いシーケンス評価の必要性を強調している。 This study explores the learnability of memory-less and memory-augmented RNNs, which are theoretically equivalent to Pushdown Automata. Empirical results show that these models often fail to generalize on longer sequences, relying more on precision than mastering symbolic grammar. Experiments on fully trained and component-frozen models reveal that freezing the memory component significantly improves performance, achieving state-of-the-art results on the Penn Treebank dataset (test perplexity reduced from 123.5 to 120.5). Models with frozen memory retained up to 90% of initial performance on longer sequences, compared to a 60% drop in standard models. Theoretical analysis suggests that freezing memory stabilizes temporal dependencies, leading to robust convergence. These findings stress the need for stable memory designs and long-sequence evaluations to understand RNNs' true learnability limits.	翻訳日:2024-11-03 03:24:16 公開日:2024-10-04
# MELODI: 長期のコンテキストに対するメモリ圧縮の探索 MELODI: Exploring Memory Compression for Long Contexts ( http://arxiv.org/abs/2410.03156v1 ) ライセンス: Link先を確認	Yinpeng Chen, DeLesley Hutchins, Aren Jansen, Andrey Zhmoginov, David Racz, Jesper Andersen,	(参考訳) 本稿では,短いコンテキストウィンドウを用いて,長い文書を効率的に処理できる新しいメモリアーキテクチャMELODIを提案する。 MELODIの鍵となる原理は、短期記憶と長期記憶をネットワーク層とコンテキストウィンドウの両方にわたる階層的な圧縮スキームとして表現することである。特に、短期記憶は、複数のレイヤにわたるコンテキストウィンドウの繰り返し圧縮によって達成され、ウィンドウ間のスムーズな遷移を保証する。対照的に、長期記憶は単一の中間層内でさらなる圧縮を行い、コンテキストウィンドウ全体で情報を集約し、履歴全体から重要な情報を効果的に統合する。強いベースライン – 大規模な長期メモリ(64Kキー値ペア)に対して集中的に注意を払っているMemorizing Transformer – と比較して, 提案手法は, 様々な長期コンテキストデータセットにおいて優れた性能を示し, メモリフットプリントを8分の1に大幅に削減する。 We present MELODI, a novel memory architecture designed to efficiently process long documents using short context windows. The key principle behind MELODI is to represent short-term and long-term memory as a hierarchical compression scheme across both network layers and context windows. Specifically, the short-term memory is achieved through recurrent compression of context windows across multiple layers, ensuring smooth transitions between windows. In contrast, the long-term memory performs further compression within a single middle layer and aggregates information across context windows, effectively consolidating crucial information from the entire history. Compared to a strong baseline - the Memorizing Transformer employing dense attention over a large long-term memory (64K key-value pairs) - our method demonstrates superior performance on various long-context datasets while remarkably reducing the memory footprint by a factor of 8.	翻訳日:2024-11-03 03:24:16 公開日:2024-10-04
# 選択状態空間モデルにおけるメモリ圧縮の数学的形式化 Mathematical Formalism for Memory Compression in Selective State Space Models ( http://arxiv.org/abs/2410.03158v1 ) ライセンス: Link先を確認	Siddhanth Bhat,	(参考訳) 状態空間モデル(SSM)は、シーケンスデータの長距離依存性をモデル化するための強力なフレームワークとして登場した。従来のリカレントニューラルネットワーク(RNN)や畳み込みニューラルネットワーク(CNN)とは異なり、SSMは、制御理論や力学系の原理を利用して、シーケンスモデリングに対する構造的かつ安定したアプローチを提供する。しかし、シーケンスモデリングにおける重要な課題は、重要な情報を失うことなく、長期的依存関係をコンパクトな隠れ状態表現に圧縮することである。本稿では,選択状態空間モデルにおけるメモリ圧縮を理解するための厳密な数学的枠組みを開発する。入力関連性に基づいて隠れた状態を動的にフィルタリング・更新する選択ゲーティング機構を導入し,効率的なメモリ圧縮を実現する。我々は、相互情報やレート歪曲理論などの情報理論ツールを用いて、メモリ効率と情報保持のトレードオフを定式化する。我々の分析は、モデル性能を犠牲にすることなく圧縮できる情報量に関する理論的境界を提供する。また、選択的SSMにおける隠れ状態の安定性と収束性を証明し、信頼性のある長期記憶保持を保証する定理を導出する。計算複雑性解析により、選択SSMは従来のRNNモデルと比較してメモリ効率と処理速度を大幅に改善することが明らかになった。時系列予測や自然言語処理などのシーケンスモデリングタスクの実証的検証を通じて、選択的なSSMが、少ないメモリと計算資源を使用しながら、最先端のパフォーマンスを達成できることを実証する。 State space models (SSMs) have emerged as a powerful framework for modelling long-range dependencies in sequence data. Unlike traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs), SSMs offer a structured and stable approach to sequence modelling, leveraging principles from control theory and dynamical systems. However, a key challenge in sequence modelling is compressing long-term dependencies into a compact hidden state representation without losing critical information. In this paper, we develop a rigorous mathematical framework for understanding memory compression in selective state space models. We introduce a selective gating mechanism that dynamically filters and updates the hidden state based on input relevance, allowing for efficient memory compression. We formalize the trade-off between memory efficiency and information retention using information-theoretic tools, such as mutual information and rate-distortion theory. Our analysis provides theoretical bounds on the amount of information that can be compressed without sacrificing model performance. 
We also derive theorems that prove the stability and convergence of the hidden state in selective SSMs, ensuring reliable long-term memory retention. Computational complexity analysis reveals that selective SSMs offer significant improvements in memory efficiency and processing speed compared to traditional RNN-based models. Through empirical validation on sequence modelling tasks such as time-series forecasting and natural language processing, we demonstrate that selective SSMs achieve state-of-the-art performance while using less memory and computational resources.	翻訳日:2024-11-03 03:24:16 公開日:2024-10-04
# 時系列予測のための自己回帰移動平均アテンション機構 Autoregressive Moving-average Attention Mechanism for Time Series Forecasting ( http://arxiv.org/abs/2410.03159v1 ) ライセンス: Link先を確認	Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang,	(参考訳) 本稿では,様々な線形アテンション機構に適応できる自己回帰(AR)移動平均アテンション構造を提案する。本稿では、時系列予測(TSF)タスクにおいて、予め見落とされたデコーダのみの自己回帰変換モデルを用いて、適切なトークン化とトレーニング手法を適用すると、最適なベースラインに匹敵する結果が得られることを示す。さらに、統計学と最近の線形注意の進歩からARMAモデルに着想を得て、既存の自己回帰的注意機構に完全なARMA構造を導入する。間接MA重み生成法を用いて,MA項を基礎となる効率的な注目モデルの時間的複雑さとパラメータサイズを維持しつつ組み込む。さらに、間接パラメータ生成が局所的時間的影響のモデリング要求に合致する暗黙のMA重みを生成する方法について検討する。実験結果から、ARMA構造を組み込むことで、TSFタスクにおける様々なAR注意の処理性能が向上し、最先端の結果が得られた。 We propose an Autoregressive (AR) Moving-average (MA) attention structure that can adapt to various linear attention mechanisms, enhancing their ability to capture long-range and local temporal patterns in time series. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines when appropriate tokenization and training methods are applied. Moreover, inspired by the ARMA model from statistics and recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. By using an indirect MA weight generation method, we incorporate the MA term while maintaining the time complexity and parameter size of the underlying efficient attention models. We further explore how indirect parameter generation can produce implicit MA weights that align with the modeling requirements for local temporal impacts. Experimental results show that incorporating the ARMA structure consistently improves the performance of various AR attentions on TSF tasks, achieving state-of-the-art results.	翻訳日:2024-11-03 03:24:16 公開日:2024-10-04
# ビデオ拡散における時間的モデリングの再定義:ベクトル化された時間ステップアプローチ Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach ( http://arxiv.org/abs/2410.03160v1 ) ライセンス: Link先を確認	Yaofang Liu, Yumeng Ren, Xiaodong Cun, Aitor Artola, Yang Liu, Tieyong Zeng, Raymond H. Chan, Jean-michel Morel,	(参考訳) 拡散モデルは画像生成に革命をもたらし、ビデオ生成への拡張は将来性を示している。しかしながら、現在のビデオ拡散モデル(VDM)は、クリップレベルで適用されるスカラータイムステップ変数に依存しており、画像からビデオへの生成のような様々なタスクに必要な複雑な時間依存性をモデル化する能力を制限する。この制限に対処するため,新しいベクトル化タイムステップ変数(VTV)を導入したフレーム対応ビデオ拡散モデル(FVDM)を提案する。従来のVDMとは異なり、我々の手法では各フレームが独立したノイズスケジュールに従うことができ、モデルが微粒な時間依存性を捉える能力を高めることができる。 FVDMの柔軟性は、標準的なビデオ生成、画像からビデオへの生成、ビデオ補間、長いビデオ合成など、複数のタスクで実証されている。様々なVTV構成により、微調整時の破滅的な忘却やゼロショット法における限定的な一般化性といった課題を克服し、生成ビデオの質の向上を実現する。実験的な評価により、FVDMはビデオ生成品質において最先端の手法よりも優れ、拡張タスクにも優れることを示す。既存のVDMの根本的な欠点に対処することで、FVDMはビデオ合成の新しいパラダイムを設定し、生成モデリングやマルチメディアアプリケーションに重要な意味を持つ堅牢なフレームワークを提供する。 Diffusion models have revolutionized image generation, and their extension to video generation has shown promise. However, current video diffusion models (VDMs) rely on a scalar timestep variable applied at the clip level, which limits their ability to model complex temporal dependencies needed for various tasks like image-to-video generation. To address this limitation, we propose a frame-aware video diffusion model (FVDM), which introduces a novel vectorized timestep variable (VTV). Unlike conventional VDMs, our approach allows each frame to follow an independent noise schedule, enhancing the model's capacity to capture fine-grained temporal dependencies. FVDM's flexibility is demonstrated across multiple tasks, including standard video generation, image-to-video generation, video interpolation, and long video synthesis. 
Through a diverse set of VTV configurations, we achieve superior quality in generated videos, overcoming challenges such as catastrophic forgetting during fine-tuning and limited generalizability in zero-shot methods. Our empirical evaluations show that FVDM outperforms state-of-the-art methods in video generation quality, while also excelling in extended tasks. By addressing fundamental shortcomings in existing VDMs, FVDM sets a new paradigm in video synthesis, offering a robust framework with significant implications for generative modeling and multimedia applications.	翻訳日:2024-11-03 03:24:16 公開日:2024-10-04
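The FVDM abstract above contrasts a scalar timestep shared by a whole clip with a vectorized timestep that lets each frame follow its own noise level. A rough sketch of that forward-noising idea follows. It is an illustration only: the cosine-style schedule, the function names, and the flat-list frame representation are assumptions, and the abstract does not specify any of them.

```python
import math
import random


def alpha_bar(t: int, num_steps: int) -> float:
    """Assumed cosine-style cumulative signal level: 1.0 at t=0, ~0 at t=num_steps."""
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def add_noise(frames, timesteps, num_steps=1000, rng=None):
    """Noise each frame with its own timestep (one entry of `timesteps` per frame).

    `frames` is a list of frames, each a flat list of floats (a stand-in for pixels).
    Passing the same value for every frame recovers the usual scalar-timestep case.
    """
    if len(frames) != len(timesteps):
        raise ValueError("need exactly one timestep per frame")
    rng = rng or random.Random(0)
    noisy = []
    for frame, t in zip(frames, timesteps):
        a = alpha_bar(t, num_steps)
        noisy.append([math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
                      for x in frame])
    return noisy
```

For example, an image-to-video style configuration could keep the first frame clean and fully noise the rest with `add_noise(frames, [0] + [1000] * (len(frames) - 1))`, whereas a conventional clip-level setup would pass `[t] * len(frames)`.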
# 適応型マスキングは視覚的グラウンド化を促進する Adaptive Masking Enhances Visual Grounding ( http://arxiv.org/abs/2410.03161v1 ) ライセンス: Link先を確認	Sen Jia, Lei Li,	(参考訳) 近年では、LAION-5BやDataComp-1Bのような拡張データセットでの大規模視覚言語事前学習の成功により、視覚的グラウンド化におけるゼロショット学習と少数ショット学習が注目されている。しかし、これらのデータセットの継続的な拡張は、特にデータの可用性と計算オーバーヘッドに関して重要な課題を示し、ローショット学習能力の進歩にボトルネックを生じさせる。本稿では,低ショットの学習シナリオにおいて,データセットサイズの増加を必要とせず,語彙グラウンド化の強化を目的とした IMAGE (Interpretative MAsking with Gaussian radiation modEling) を提案する。認知科学からインスピレーションを得て,近年のマスク付きオートエンコーダ(MAE)の成功により,視覚バックボーンが生成する特徴マップの有能な領域における適応マスキングを活用している。これにより、隠蔽された情報の再構成を通じて、頑健で一般化された表現を学習し、局所的特徴とグローバル的特徴の両方に効果的な注意を向けることができる。我々はCOCOやODinWを含むベンチマークデータセットに対するアプローチの有効性を評価し、ゼロショットタスクや少数ショットタスクにおいて優れた性能を示す。実験結果から、IMAGEはベースラインモデルより優れ、一般化の向上と低ショットシナリオの性能向上を実現していることがわかった。これらの知見は、ゼロショット学習と少数ショット学習の進歩にデータセットサイズを継続的にスケーリングするアプローチに代わる、アダプティブな特徴操作とガウス的モデリングの可能性を浮き彫りにしている。私たちのコードは https://github.com/git-lenny/IMAGE で公開されています。 In recent years, zero-shot and few-shot learning in visual grounding have garnered considerable attention, largely due to the success of large-scale vision-language pre-training on expansive datasets such as LAION-5B and DataComp-1B. However, the continuous expansion of these datasets presents significant challenges, particularly with respect to data availability and computational overhead, thus creating a bottleneck in the advancement of low-shot learning capabilities. In this paper, we propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, aimed at enhancing vocabulary grounding in low-shot learning scenarios without necessitating an increase in dataset size. Drawing inspiration from cognitive science and the recent success of masked autoencoders (MAE), our method leverages adaptive masking on salient regions of the feature maps generated by the vision backbone. This enables the model to learn robust, generalized representations through the reconstruction of occluded information, thereby facilitating effective attention to both local and global features. 
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks. Experimental results consistently show that IMAGE outperforms baseline models, achieving enhanced generalization and improved performance in low-shot scenarios. These findings highlight the potential of adaptive feature manipulation through attention mechanisms and Gaussian modeling as a promising alternative to approaches that rely on the continual scaling of dataset sizes for the advancement of zero-shot and few-shot learning. Our code is publicly available at https://github.com/git-lenny/IMAGE.	翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
# 透かし付きLLMは、加工プロンプトでユーザによって識別できるか? Can Watermarked LLMs be Identified by Users via Crafted Prompts? ( http://arxiv.org/abs/2410.03168v1 ) ライセンス: Link先を確認	Aiwei Liu, Sheng Guan, Yiming Liu, Leyi Pan, Yifei Zhang, Liancheng Fang, Lijie Wen, Philip S. Yu, Xuming Hu,	(参考訳) 大規模言語モデル(LLM)のためのテキスト透かしは,LLM出力の検出と誤用防止に大きく進歩している。現在の透かし技術は、高い検出性、テキスト品質への影響の最小化、テキスト編集に対する堅牢性を提供する。しかし、近年の研究はLLMサービスにおける透かし技術の知覚不能性についての調査を欠いている。 LLMプロバイダは、実際のシナリオにおける透かしの存在を開示したくないかもしれないため、サービスを使用するユーザの意欲を減らし、攻撃に対する透かしをより脆弱にする可能性がある。この研究は、透かしLLMの知覚不能性を初めて研究したものである。そこで我々は,LLMに適切に設計されたプロンプトによって透かしを検出する,Water-Probeと呼ばれる識別アルゴリズムを設計した。我々の主要な動機は、現在の透かしLLMが同じ透かしキーの下で一貫した偏りを露呈し、異なる透かしキーの下で同様の違いをもたらすことである。実験では、ほぼすべての主流の透かしアルゴリズムが、よく設計されたプロンプトと容易に識別できることが示され、一方、Water-Probeは、非透かしLLMに対して最小の偽陽性率を示す。最後に,透かしLLMの知覚不能性を高める鍵として,透かしキー選択のランダム性を高めることを提案する。そこで本研究では,複数の透かしキーをマージすることで,透かしの知覚不能性を著しく向上するWater-Bag戦略を提案する。 Text watermarking for Large Language Models (LLMs) has made significant progress in detecting LLM outputs and preventing misuse. Current watermarking techniques offer high detectability, minimal impact on text quality, and robustness to text editing. However, current research lacks investigation into the imperceptibility of watermarking techniques in LLM services. This is crucial as LLM providers may not want to disclose the presence of watermarks in real-world scenarios, as it could reduce user willingness to use the service and make watermarks more vulnerable to attacks. This work is the first to investigate the imperceptibility of watermarked LLMs. We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts to the LLM. Our key motivation is that current watermarked LLMs expose consistent biases under the same watermark key, resulting in similar differences across prompts under different watermark keys. 
Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts, while Water-Probe demonstrates a minimal false positive rate for non-watermarked LLMs. Finally, we propose that the key to enhancing the imperceptibility of watermarked LLMs is to increase the randomness of watermark key selection. Based on this, we introduce the Water-Bag strategy, which significantly improves watermark imperceptibility by merging multiple watermark keys.	翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
# 自己回帰型大言語モデルは計算的に普遍的である Autoregressive Large Language Models are Computationally Universal ( http://arxiv.org/abs/2410.03170v1 ) ライセンス: Link先を確認	Dale Schuurmans, Hanjun Dai, Francesco Zanini,	(参考訳) 変換器をベースとした言語モデルの自己回帰復号化は,外部介入や重みの変更を伴わずに,普遍的な計算を実現することができることを示す。この結果を確立するには、言語モデルが有界なコンテキストを使って任意の長さの入力を処理できるかを理解する必要がある。この目的のために,長い入力が与えられたとき,コンテクストウィンドウが進行するにつれて,出力されたトークンがシーケンスの最後に付加される自己回帰復号の一般化を検討する。まず、この結果が計算の古典的モデルであり、計算的に普遍的であることが古くから知られているラグシステムに対応していることを示す。新しい証明を活用することで、2027個の生成規則を持つラグシステムにより、普遍的なチューリングマシンをシミュレートできることが示される。次に,既存の大言語モデルがこのような普遍的なラグシステムの振る舞いをシミュレートできるかどうかを検討する。本稿では,2027個の生成規則のそれぞれを正しく適用するために,決定的(貪欲)デコーディングの下でモデルを動かすgemini-1.5-pro-001に対して,単一のシステムプロンプトを開発できることを示し,肯定的な回答を与える。我々は、チャーチ=チューリングのテーゼにより、拡張された自己回帰(貪欲)デコーディングを用いてプロンプトされたgemini-1.5-pro-001は汎用コンピュータであると結論付ける。 We show that autoregressive decoding of a transformer-based language model can realize universal computation, without external intervention or modification of the model's weights. Establishing this result requires understanding how a language model can process arbitrarily long inputs using a bounded context. For this purpose, we consider a generalization of autoregressive decoding where, given a long input, emitted tokens are appended to the end of the sequence as the context window advances. We first show that the resulting system corresponds to a classical model of computation, a Lag system, that has long been known to be computationally universal. By leveraging a new proof, we show that a universal Turing machine can be simulated by a Lag system with 2027 production rules. We then investigate whether an existing large language model can simulate the behaviour of such a universal Lag system. We give an affirmative answer by showing that a single system-prompt can be developed for gemini-1.5-pro-001 that drives the model, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules. 
We conclude that, by the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.	翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
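The abstract above relates extended autoregressive decoding to a Lag system: a bounded window reads the front of the sequence, the emitted symbols are appended to the end, and the window then advances. The toy interpreter below sketches that mechanism under the usual reading of a Lag system (the rule is chosen by the first `context` symbols, and only the first symbol is dropped per step). The three rules are invented for illustration and have nothing to do with the paper's 2027-rule construction.

```python
from collections import deque
from itertools import islice


def run_lag_system(rules, tape, context=2, max_steps=100):
    """Run a toy Lag system and return the final string.

    rules:   dict mapping a `context`-symbol prefix to the string to append
    tape:    initial string
    Halts when the string is shorter than the window, when no rule matches,
    or after `max_steps` steps (a safety bound, since such systems may loop).
    """
    queue = deque(tape)
    for _ in range(max_steps):
        if len(queue) < context:
            break
        prefix = "".join(islice(queue, context))  # what the bounded window sees
        if prefix not in rules:
            break
        queue.extend(rules[prefix])  # emitted symbols go to the end
        queue.popleft()              # the window advances by one symbol
    return "".join(queue)


# Hypothetical rules, chosen only so that the run is short and halts.
TOY_RULES = {"ab": "b", "bb": "a", "ba": ""}
```

With these toy rules, the run from `"ab"` passes through `"bb"` and `"ba"` before halting on `"a"`. In the setting the abstract describes, the language model plays the role of the rule table, mapping the visible window to the tokens to append.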
# 深層カーネル学習型遺伝的アルゴリズムによる高次元空間の高速最適化 Rapid optimization in high dimensional space by deep kernel learning augmented genetic algorithms ( http://arxiv.org/abs/2410.03173v1 ) ライセンス: Link先を確認	Mani Valleti, Aditya Raghavan, Sergei V. Kalinin,	(参考訳) 複雑な高次元空間の探索は、分子発見、プロセス最適化、サプライチェーン管理といった分野における重要な課題を示す。遺伝的アルゴリズム(GA)は、新しい候補空間を作成するための大きなパワーを提供するが、新しい提案された各ソリューションの評価を必要とするため、しばしば高い計算要求を伴う。一方、Deep Kernel Learning(DKL)は、選択された候補構造の空間を効率的にナビゲートするが、生成能力に欠ける。本研究では,新しい候補空間の挙動を迅速に把握するために,DKLに基づく代理モデルの効率性を持つ新しい候補を生成するために,GAの生成力を両立させるアプローチを提案する。このDKL-GAフレームワークは、ベイズ最適化(BO)ワークフローを構築するためにさらに使用できる。本稿では,フェロSIMモデルの最適化による本手法の有効性を実証し,分子発見や電池充電の最適化など多種多様な課題に適用可能であることを示す。 Exploration of complex high-dimensional spaces presents significant challenges in fields such as molecular discovery, process optimization, and supply chain management. Genetic Algorithms (GAs), while offering significant power for creating new candidate spaces, often entail high computational demands due to the need for evaluation of each new proposed solution. On the other hand, Deep Kernel Learning (DKL) efficiently navigates the spaces of preselected candidate structures but lacks generative capabilities. This study introduces an approach that amalgamates the generative power of GAs to create new candidates with the efficiency of DKL-based surrogate models to rapidly ascertain the behavior of new candidate spaces. This DKL-GA framework can be further used to build Bayesian Optimization (BO) workflows. We demonstrate the effectiveness of this approach through the optimization of the FerroSIM model, showcasing its broad applicability to diverse challenges, including molecular discovery and battery charging optimization.	翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
# HRVMamba: 密度予測のための高解像度ビジュアルステートスペースモデル HRVMamba: High-Resolution Visual State Space Model for Dense Prediction ( http://arxiv.org/abs/2410.03174v1 ) ライセンス: Link先を確認	Hao Zhang, Yongqiang Ma, Wenqi Shao, Ping Luo, Nanning Zheng, Kaipeng Zhang,	(参考訳) 近年、効率的なハードウェア対応設計(Mamba)を備えた状態空間モデル(SSM)は、トークン長とグローバルな受容領域に関する線形計算の複雑さから、コンピュータビジョンタスクにおいて有意な可能性を証明している。しかし、人間のポーズ推定やセマンティックセグメンテーションを含む密集した予測タスクにおけるマンバのパフォーマンスは、帰納的バイアスの不足、長距離の忘れ、低解像度の出力表現の3つの主要な課題によって制約されている。これらの課題に対処するために,マルチスケールの畳み込みカーネルを用いた動的ビジュアル状態空間(DVSS)ブロックを導入し,様々なスケールの局所的特徴を抽出し,帰納的バイアスを高めるとともに,変形可能な畳み込みを用いて,長距離の忘れの問題を緩和しつつ,入力情報とタスク固有情報に基づいて適応的な空間的アグリゲーションを実現する。 HRNetで提案されたマルチ解像度並列設計を活用し,DVSSブロックに基づく高分解能視覚状態空間モデル(HRVMamba)を導入し、プロセス全体を通して高分解能表現を保存し、効果的なマルチスケール特徴学習を促進する。大規模な実験では、HRVMambaの高密度予測タスクにおける印象的なパフォーマンスを強調し、余計な工夫なしに既存のベンチマークモデルと競合する結果を達成している。コードは https://github.com/zhanghao5201/HRVMamba で入手できる。 Recently, State Space Models (SSMs) with efficient hardware-aware designs, i.e., Mamba, have demonstrated significant potential in computer vision tasks due to their linear computational complexity with respect to token length and their global receptive field. However, Mamba's performance on dense prediction tasks, including human pose estimation and semantic segmentation, has been constrained by three key challenges: insufficient inductive bias, long-range forgetting, and low-resolution output representation. To address these challenges, we introduce the Dynamic Visual State Space (DVSS) block, which utilizes multi-scale convolutional kernels to extract local features across different scales and enhance inductive bias, and employs deformable convolution to mitigate the long-range forgetting problem while enabling adaptive spatial aggregation based on input and task-specific information. 
By leveraging the multi-resolution parallel design proposed in HRNet, we introduce High-Resolution Visual State Space Model (HRVMamba) based on the DVSS block, which preserves high-resolution representations throughout the entire process while promoting effective multi-scale feature learning. Extensive experiments highlight HRVMamba's impressive performance on dense prediction tasks, achieving competitive results against existing benchmark models without bells and whistles. Code is available at https://github.com/zhanghao5201/HRVMamba.	翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
# 事前学習型視覚言語(CLIP)モデルにおける対象幻覚の探索と緩和 Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models ( http://arxiv.org/abs/2410.03176v1 ) ライセンス: Link先を確認	Yufang Liu, Tao Ji, Changzhi Sun, Yuanbin Wu, Aimin Zhou,	(参考訳) LVLM(Large Vision-Language Models)は目覚ましい性能を達成しているが、これらのモデルにおける物体の幻覚に深刻な問題があることが研究で指摘されている。しかし、これらの幻覚の由来については明確な結論は出ていない。本稿では,CLIPモデルにおける物体幻覚問題に関する詳細な研究について述べる。孤立しても、CLIPモデルは対象の幻覚に傾向があり、幻覚問題は単に視覚と言語モダリティの相互作用によるものではないことを示唆する。そこで本研究では,種々の幻覚的問題を伴う負のサンプルを作成することで,対物的データ拡張手法を提案する。提案手法は,CLIPモデルのオブジェクト幻覚を効果的に緩和できることを示し,拡張されたモデルを視覚エンコーダとして使用することにより,LVLMにおけるオブジェクト幻覚の問題を効果的に緩和できることを示す。 Large Vision-Language Models (LVLMs) have achieved impressive performance, yet research has pointed out a serious issue with object hallucinations within these models. However, there is no clear conclusion as to which part of the model these hallucinations originate from. In this paper, we present an in-depth investigation into the object hallucination problem specifically within the CLIP model, which serves as the backbone for many state-of-the-art vision-language systems. We unveil that even in isolation, the CLIP model is prone to object hallucinations, suggesting that the hallucination problem is not solely due to the interaction between vision and language modalities. To address this, we propose a counterfactual data augmentation method by creating negative samples with a variety of hallucination issues. We demonstrate that our method can effectively mitigate object hallucinations for the CLIP model, and we show that the enhanced model can be employed as a visual encoder, effectively alleviating the object hallucination issue in LVLMs.	翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
# 宇宙粒子生成における動的カシミール効果とアナログ特性の基礎的課題 Foundational Issues in Dynamical Casimir Effect and Analogue Features in Cosmological Particle Creation ( http://arxiv.org/abs/2410.03179v1 ) ライセンス: Link先を確認	Jen-Tsung Hsiang, Bei-Lok Hu,	(参考訳) ブラックホールからのホーキング放射のアナログ源としての移動鏡は、力学カシミール効果(英語版)(DCE)とCPCのパラメトリック増幅機構に基づく類似性があるにもかかわらず、宇宙論的な粒子生成(英語版)(CPC)とともに広く研究されてきた。この「パースペクティブ」エッセイは、CPCの理論的基礎となる曲線時空における量子場理論の厳密さと完全性の一部を、様々な実験的な探索を楽しむDCEに伝えることを目的としている。実験室での空間実験を行う場合、なぜ「曲線」時空に悩まされるべきなのかというような単純な問題から、フィールド環境における着色雑音によるシステム力学における非局所散逸の頻繁な出現、量子レンツ法の存在、DCE放出のバックリアクション効果におけるDCE放出の変動散逸関係、鏡や媒体の動的応答を考慮したマイクロフィジカルモデルの構築など、基本的な理論的問題まで7つの課題を選択した。 DCEの理論的基盤の強化は、概念的明確性の向上だけでなく、DCEの将来の実験設計のコンセプトタイプの証明の開発にも有用である。 DCE実験の結果は、これらの基本的な過程の検証に最も期待するアナログ重力の精神から、初期の宇宙における磁場効果の理解を深めることになる。 Moving mirrors as analogue sources of Hawking radiation from black holes have been explored extensively, less so with cosmological particle creation (CPC), even though the analogy between dynamical Casimir effect (DCE) and CPC based on the mechanism of parametric amplification of quantum field fluctuations has also been known for a long time. This `perspective' essay intends to convey some of the rigor and thoroughness of quantum field theory in curved spacetime, which serves as the theoretical foundation of CPC, to DCE, which enjoys a variety of active experimental explorations. We have selected out seven issues of relevance to address, starting from the naively simple ones, e.g., why should one be bothered with `curved' spacetime when performing a laboratory experiment in ostensibly flat space, to foundational theoretical ones, such as the frequent appearance of nonlocal dissipation in the system dynamics induced by colored noises in its field environment, the existence of quantum Lenz law and fluctuation-dissipation relations in the backreaction effects of DCE emission on the moving atom/mirror or the source, and the construction of a microphysics model to account for the dynamical responses of a mirror or medium. 
The strengthening of theoretical ground for DCE is not only useful for improving conceptual clarity but also needed for the development of proof-of-concept designs for future DCE experiments. Results from DCE experiments in turn will enrich our understanding of quantum field effects in the early universe because they are, in the spirit of analogue gravity, our best hope for the verification of these fundamental processes.	翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
# VDM-SLの仕様スライシング Specification Slicing for VDM-SL ( http://arxiv.org/abs/2410.03180v1 ) ライセンス: Link先を確認	Tomohiro Oda, Han-Myung Chang,	(参考訳) 実行可能な仕様は、軽量なフォーマルなソフトウェア開発における強力なツールの1つです。 VDM-SLは命令文を通じて内部状態を参照して更新する操作の明示的で実行可能な定義を可能にする。 VDM-SLの広範な実行可能なサブセットは仕様段階での検証とテストを可能にするが、命令型プログラミングのように読み書きやデバッグが困難になる。本稿では,プログラムスライシングに基づくVDM-SLの仕様スライシングを定義する。そして、その応用を提示し、議論する。 VDM-SL のスライサは ViennaTalk で実装されており、ブラウザや VDM-SL 仕様を記述するデバッガで使用することができる。 The executable specification is one of the powerful tools in lightweight formal software development. VDM-SL allows the explicit and executable definition of operations that reference and update internal state through imperative statements. While the extensive executable subset of VDM-SL enables validation and testing in the specification phase, it also brings difficulties in reading and debugging as in imperative programming. In this paper, we define specification slicing for VDM-SL based on program slicing, a technique used for debugging and maintaining program source code in implementation languages. We then present and discuss its applications. The slicer for VDM-SL is implemented on ViennaTalk and can be used on browsers and debuggers describing the VDM-SL specification.	翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
# Kiss up, Kick down: ビジュアルペルソナを割り当てたマルチモーダル大規模言語モデルの振る舞い変化を探る Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas ( http://arxiv.org/abs/2410.03181v1 ) ライセンス: Link先を確認	Seungjong Sun, Eungu Lee, Seo Yeon Baek, Seunghyun Hwang, Wonbyung Lee, Dongyan Nan, Bernard J. Jansen, Jang Hyun Kim,	(参考訳) 本研究は,多モーダル大言語モデル(LLM)が視覚的ペルソナと行動の整合性について検討し,主にテキストに基づくペルソナに焦点を当てた文献における大きなギャップに対処する試みである。我々は,LLMの視覚的ペルソナとして割り当てるための5K架空のアバター画像の新たなデータセットを開発し,これらの画像に表される視覚的特徴に基づいて,アグレッシブ性に着目して,それらの交渉行動を分析した。その結果,LLMは人間に類似した方法で画像の攻撃性を評価し,攻撃的な視覚的ペルソナを刺激するとより攻撃的な交渉行動を出力することがわかった。興味深いことに、LLMは、相手のイメージが自分より攻撃的でなく、相手のイメージが攻撃的に見えるときの攻撃的行動がより少ない場合に、より攻撃的な交渉行動を示した。 This study is the first to explore whether multi-modal large language models (LLMs) can align their behaviors with visual personas, addressing a significant gap in the literature that predominantly focuses on text-based personas. We developed a novel dataset of 5K fictional avatar images for assignment as visual personas to LLMs, and analyzed their negotiation behaviors based on the visual traits depicted in these images, with a particular focus on aggressiveness. The results indicate that LLMs assess the aggressiveness of images in a manner similar to humans and output more aggressive negotiation behaviors when prompted with an aggressive visual persona. Interestingly, the LLM exhibited more aggressive negotiation behaviors when the opponent's image appeared less aggressive than their own, and less aggressive behaviors when the opponent's image appeared more aggressive.	翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
# 辞書アシスタントとしての大型言語モデルを用いたバイリンガル例文の生成 Generating bilingual example sentences with large language models as lexicography assistants ( http://arxiv.org/abs/2410.03182v1 ) ライセンス: Link先を確認	Raphael Merx, Ekaterina Vylomova, Kemal Kurniawan,	(参考訳) 本稿では,フランス語(高資源),インドネシア語(中資源),テトゥン語(低資源),英語を対象言語とする言語間のバイリンガル辞書の例文の生成と評価におけるLLMの性能について述べる。 GDEX(Good Dictionary Example)基準に対するLCM生成例の品質評価を行った。この結果から,LLMは十分な辞書例を生成できるが,低リソース言語では性能が著しく低下することが明らかとなった。また,低いアノテータ間の合意率に反映される品質など,人間の嗜好の変動も観察する。そこで本研究では,LLMを個々のアノテータの好みに合わせることができることを示す。さらに、実例の自動評価に事前訓練された言語モデルを用いることについて検討し、文の難易度が高リソース言語における典型性とインテリジェンスのための優れたプロキシとなることを発見した。また,LLM生成文対に対する600の新たな評価データセットも提供し,特に低リソース言語において,LLMが辞書作業のコスト削減に寄与する可能性について考察した。 We present a study of LLMs' performance in generating and rating example sentences for bilingual dictionaries across languages with varying resource levels: French (high-resource), Indonesian (mid-resource), and Tetun (low-resource), with English as the target language. We evaluate the quality of LLM-generated examples against the GDEX (Good Dictionary EXample) criteria: typicality, informativeness, and intelligibility. Our findings reveal that while LLMs can generate reasonably good dictionary examples, their performance degrades significantly for lower-resourced languages. We also observe high variability in human preferences for example quality, reflected in low inter-annotator agreement rates. To address this, we demonstrate that in-context learning can successfully align LLMs with individual annotator preferences. Additionally, we explore the use of pre-trained language models for automated rating of examples, finding that sentence perplexity serves as a good proxy for typicality and intelligibility in higher-resourced languages. Our study also contributes a novel dataset of 600 ratings for LLM-generated sentence pairs, and provides insights into the potential of LLMs in reducing the cost of lexicographic work, particularly for low-resource languages.	
翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
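The abstract above reports that sentence perplexity is a good proxy for typicality and intelligibility in higher-resourced languages. As a rough illustration of what that quantity is (not the authors' code), perplexity can be computed as the exponential of the negative mean token log-probability. The function name and the toy log-probabilities below are assumptions made for this sketch; in practice the log-probabilities would come from a pre-trained language model.

```python
import math

def sentence_perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability of the tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy values (hypothetical): a "typical" sentence receives higher token probabilities.
typical = [math.log(0.5), math.log(0.4), math.log(0.6)]
unusual = [math.log(0.05), math.log(0.1), math.log(0.02)]

# Lower perplexity suggests the sentence is less surprising to the model.
print(sentence_perplexity(typical) < sentence_perplexity(unusual))
```

A candidate example sentence with lower perplexity would, under this proxy, be ranked as more typical; the threshold for "good enough" is left open here.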
# EXAQ: LLMの高速化のための指数的アウェア量子化 EXAQ: Exponent Aware Quantization For LLMs Acceleration ( http://arxiv.org/abs/2410.03185v1 ) ライセンス: Link先を確認	Moran Shkolnik, Maxim Fishman, Brian Chmiel, Hilla Ben-Yaacov, Ron Banner, Kfir Yehuda Levy,	(参考訳) 量子化は、LLM(Large Language Models)推論に関連する計算と記憶のコストを削減するための主要なアプローチとして確立されている。現在の研究の大半は、重みとアクティベーションの定量化に重点を置いており、低ビットの汎用行列多重演算(GEMM)が可能であり、残りの非線形演算は高い精度で実行される。本研究では, これらの手法の適用により, LLMの推論における主要なボトルネックがソフトマックス層にあることを発見した。ソフトマックス演算は, 指数計算, 累積, 正規化の3段階からなる。ソフトマックス関数への入力に対して最適なクリッピング値を決定するための解析的手法を提案する。この方法では、$e^x$と$\sum(e^x)$の両方の計算を最小限の精度で高速化する。例えば、LLaMA1-30Bでは、よく知られた"Physical Interaction: Question Answering"(PIQA)データセット評価に基づいて、2ビット量子化を行い、ベースライン性能を実現する。この超低ビット量子化は、蓄積相において初めて約4倍の加速を可能にする。 $e^x$と$\sum(e^x)$の両方を加速させることで、ソフトマックス演算の36.9%の加速が得られる。 Quantization has established itself as the primary approach for decreasing the computational and storage expenses associated with Large Language Models (LLMs) inference. The majority of current research emphasizes quantizing weights and activations to enable low-bit general-matrix-multiply (GEMM) operations, with the remaining non-linear operations executed at higher precision. In our study, we discovered that following the application of these techniques, the primary bottleneck in LLMs inference lies in the softmax layer. The softmax operation comprises three phases: exponent calculation, accumulation, and normalization. Our work focuses on optimizing the first two phases. We propose an analytical approach to determine the optimal clipping value for the input to the softmax function, enabling sub-4-bit quantization for LLMs inference. This method accelerates the calculations of both $e^x$ and $\sum(e^x)$ with minimal to no accuracy degradation. For example, in LLaMA1-30B, we achieve baseline performance with 2-bit quantization on the well-known "Physical Interaction: Question Answering" (PIQA) dataset evaluation. 
This ultra-low bit quantization allows, for the first time, an acceleration of approximately 4x in the accumulation phase. The combination of accelerating both $e^x$ and $\sum(e^x)$ results in a 36.9% acceleration in the softmax operation.	翻訳日:2024-11-03 03:14:31 公開日:2024-10-04
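The three softmax phases named in the abstract above (exponent calculation, accumulation, normalization) can be sketched as follows. This is only an illustration under stated assumptions: the clipping value `clip=-4.0` is an arbitrary placeholder rather than the analytically derived optimum the paper proposes, and the lookup-table design, the function name, and its parameters are choices made for this sketch, not the authors' implementation.

```python
import math

def clipped_quantized_softmax(logits, clip=-4.0, bits=2):
    """Softmax whose exponent inputs are clipped to [clip, 0] and quantized to 2**bits levels.

    clip and bits are illustrative defaults (assumptions), not values from the paper.
    """
    m = max(logits)                      # shift so every exponent input is <= 0
    levels = 2 ** bits
    step = -clip / (levels - 1)          # spacing between quantization levels
    # Lookup table: one precomputed exponential per quantization level.
    lut = [math.exp(-k * step) for k in range(levels)]
    # Phase 1: exponent calculation on shifted, clipped, quantized inputs.
    idx = [min(levels - 1, round((m - x) / step)) for x in logits]
    exps = [lut[k] for k in idx]
    # Phase 2: accumulation of the exponentials.
    total = sum(exps)
    # Phase 3: normalization.
    return [e / total for e in exps]

probs = clipped_quantized_softmax([2.0, 1.0, -10.0])
print(probs)
```

With only `2**bits` distinct exponent values, the first two phases reduce to table lookups and a sum over a handful of repeated constants, which is one plausible reading of why low-bit quantization could speed them up.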
# 糖尿病網膜症分類における概念記述法の検討 Looking into Concept Explanation Methods for Diabetic Retinopathy Classification ( http://arxiv.org/abs/2410.03188v1 ) ライセンス: Link先を確認	Andrea M. Storås, Josefine V. Sundgaard,	(参考訳) 糖尿病網膜症は糖尿病の一般的な合併症であり,眼底画像を用いた網膜異常の進行のモニタリングが重要である。画像は医療専門家によって解釈されなければならないため、糖尿病網膜症のために糖尿病患者全員をスクリーニングすることは不可能である。深層学習は、眼底画像の自動解析とグルーピングの素晴らしい結果を示している。しかし、1つの欠点は、解釈可能性の欠如であり、クリニックにおけるそのようなシステムの実装を妨げている。説明可能な人工知能手法は、ディープニューラルネットワークを説明するために応用できる。概念に基づく説明は人間の理解には直感的であるが、糖尿病網膜症のグレーディングについては詳細は明らかにされていない。本研究は、糖尿病網膜症の自動診断のために開発されたディープニューラルネットワークを説明するための概念に基づく2つの説明手法について検討・比較する。いずれの方法にも長所と短所があることに気付き、メソッドの選択は利用可能なデータとエンドユーザの好みを考慮に入れなければなりません。 Diabetic retinopathy is a common complication of diabetes, and monitoring the progression of retinal abnormalities using fundus imaging is crucial. Because the images must be interpreted by a medical expert, it is infeasible to screen all individuals with diabetes for diabetic retinopathy. Deep learning has shown impressive results for automatic analysis and grading of fundus images. One drawback is, however, the lack of interpretability, which hampers the implementation of such systems in the clinic. Explainable artificial intelligence methods can be applied to explain the deep neural networks. Explanations based on concepts have been shown to be intuitive for humans to understand, but have not yet been explored in detail for diabetic retinopathy grading. This work investigates and compares two concept-based explanation techniques for explaining deep neural networks developed for automatic diagnosis of diabetic retinopathy: Quantitative Testing with Concept Activation Vectors and Concept Bottleneck Models. We found that both methods have strengths and weaknesses, and that the choice of method should take the available data and the end user's preferences into account.	翻訳日:2024-11-03 03:04:25 公開日:2024-10-04
# ペアワイズサンプル最適化を用いた時間ステップ拡散モデルのチューニング Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization ( http://arxiv.org/abs/2410.03190v1 ) ライセンス: Link先を確認	Zichen Miao, Zhengyuan Yang, Kevin Lin, Ze Wang, Zicheng Liu, Lijuan Wang, Qiang Qiu,	(参考訳) 近年の時間分割拡散モデルの進歩により、非蒸留多段階モデルに匹敵する高品質な画像生成が可能になったが、推論ステップは大幅に少なくなった。このようなモデルは、低推論コストと遅延のためにアプリケーションにとって魅力的であるが、単純な拡散目標でそれらを微調整すると、劣化し、ぼやけた出力が得られる。直感的な代替手段は、優れた結果を生み出すが、複雑で計算集約的な、微調整された教師モデルで拡散蒸留を繰り返すことである。本稿では,任意の時間ステップ蒸留拡散モデルを直接微調整できるPSOアルゴリズムを提案する。 PSOは、現在の時間ステップ蒸留モデルからサンプリングされた追加の参照画像を導入し、トレーニング画像と参照画像との相対的な近縁率を増大させる。これにより、モデルは出力分布を微調整しながら、数ステップの生成能力を維持できる。また、PSOは、オフラインサンプリングとオンラインサンプリングの両方のペアワイズデータに柔軟に拡張できる一般化された定式化であり、拡散モデル優先最適化の様々な一般的な目的をカバーできることを示した。我々は、好みの最適化と、スタイル転送やコンセプトのカスタマイズなど、その他の微調整タスクにおいてPSOを評価する。 PSOは、オフラインとオンラインのペアワイズ画像データの両方を用いて、蒸留モデルを直接人間の好ましくない世代に適応させることができることを示す。 PSOはまた、時間ステップ蒸留拡散モデルを直接チューニングすることで、スタイル転送と概念カスタマイズの有効性を示す。 Recent advancements in timestep-distilled diffusion models have enabled high-quality image generation that rivals non-distilled multi-step models, but with significantly fewer inference steps. While such models are attractive for applications due to the low inference cost and latency, fine-tuning them with a naive diffusion objective would result in degraded and blurry outputs. An intuitive alternative is to repeat the diffusion distillation process with a fine-tuned teacher model, which produces good results but is cumbersome and computationally intensive; the distillation training usually requires magnitude higher of training compute compared to fine-tuning for specific image styles. In this paper, we present an algorithm named pairwise sample optimization (PSO), which enables the direct fine-tuning of an arbitrary timestep-distilled diffusion model. PSO introduces additional reference images sampled from the current time-step distilled model, and increases the relative likelihood margin between the training images and reference images. 
This enables the model to retain its few-step generation ability, while allowing for fine-tuning of its output distribution. We also demonstrate that PSO is a generalized formulation which can be flexibly extended to both offline-sampled and online-sampled pairwise data, covering various popular objectives for diffusion model preference optimization. We evaluate PSO in both preference optimization and other fine-tuning tasks, including style transfer and concept customization. We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data. PSO also demonstrates effectiveness in style transfer and concept customization by directly tuning timestep-distilled diffusion models.	翻訳日:2024-11-03 03:04:25 公開日:2024-10-04
# MultiVerse: 効率的かつ表現力のあるマルチタスクテキスト音声合成 MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech ( http://arxiv.org/abs/2410.03192v1 ) ライセンス: Link先を確認	Taejun Bak, Youngsik Eom, SeungJae Choi, Young-Sun Joo,	(参考訳) 訓練データ量をスケールアップするテキスト音声合成システム(TTS)は、ゼロショット音声合成において大幅に改善されている。しかし、これらのシステムには一定の制限があり、大量のトレーニングデータが必要であり、コストが増大し、しばしばプロソディの類似性を見落としている。これらの問題に対処するために、ゼロショットマルチタスクTSシステムであるMultiVerseを提案する。 MultiVerseは、従来のデータ駆動型アプローチよりも、トレーニングデータが少ない。限られたデータであってもゼロショット性能を確保するために,フィルタ関連およびソース関連表現をモデル化するためのプロンプトを利用して,ソースフィルタ理論に基づくアンタングルメントを利用する。さらに,プロソディの類似性をさらに向上するため,プロソディ・モデリング手法として,プロソディ・ベースの自己回帰的手法と非自己回帰的手法を併用した。評価の結果,MultiVerse のマルチタスク TTS 性能は,データ量が少ないデータ駆動型 TTS システムに匹敵するゼロショット TTS 性能を達成できるだけでなく,同じデータ量で訓練された他のゼロショット TTS システムよりも大幅に向上することが示された。特に,提案するプロソディ・モデリング技術は,与えられたプロソディと高いプロソディ類似性を持つ音声を生成するMultiVerseの能力に大きく寄与する。私たちのサンプルはhttps://nc-ai.github.io/speech/publications/multiverse/index.htmlで公開されています。 Text-to-speech (TTS) systems that scale up the amount of training data have achieved significant improvements in zero-shot speech synthesis. However, these systems have certain limitations: they require a large amount of training data, which increases costs, and often overlook prosody similarity. To address these issues, we propose MultiVerse, a zero-shot multi-task TTS system that is able to perform TTS or speech style transfer in zero-shot and cross-lingual conditions. MultiVerse requires much less training data than traditional data-driven approaches. To ensure zero-shot performance even with limited data, we leverage source-filter theory-based disentanglement, utilizing the prompt for modeling filter-related and source-related representations. Additionally, to further enhance prosody similarity, we adopt a prosody modeling approach combining prompt-based autoregressive and non-autoregressive methods. 
Evaluations demonstrate the remarkable zero-shot multi-task TTS performance of MultiVerse and show that MultiVerse not only achieves zero-shot TTS performance comparable to data-driven TTS systems with much less data, but also significantly outperforms other zero-shot TTS systems trained with the same small amount of data. In particular, our novel prosody modeling technique significantly contributes to MultiVerse's ability to generate speech with high prosody similarity to the given prompts. Our samples are available at https://nc-ai.github.io/speech/publications/multiverse/index.html	翻訳日:2024-11-03 03:04:25 公開日:2024-10-04
# マスク言語モデルを用いた並列コーパス拡張 Parallel Corpus Augmentation using Masked Language Models ( http://arxiv.org/abs/2410.03194v1 ) ライセンス: Link先を確認	Vibhuti Kumari, Narayana Murthy Kavi,	(参考訳) 本稿では, 良質なテキストコーパスを並列テキストコーパスに拡張する手法を提案し, 得られたシードコーパスよりも多くの折りたたみ式コーパスを生成できることを示した。追加の単言語コーパスは不要である。我々は、多言語マスク言語モデルを用いて、文脈における代替単語のマスキングと予測を行い、文の組込みを用いて、互いに翻訳される可能性のある文対をチェックし、選択する。 MT品質評価のための指標を用いて手法を横断的に検証する。本手法は,適切なシードコーパスが利用できるすべての言語ペアにおいて,データ不足の問題を大幅に軽減できると考えている。 In this paper we propose a novel method of augmenting parallel text corpora which promises good quality and is also capable of producing many fold larger corpora than the seed corpus we start with. We do not need any additional monolingual corpora. We use Multi-Lingual Masked Language Model to mask and predict alternative words in context and we use Sentence Embeddings to check and select sentence pairs which are likely to be translations of each other. We cross check our method using metrics for MT Quality Estimation. We believe this method can greatly alleviate the data scarcity problem for all language pairs for which a reasonable seed corpus is available.	翻訳日:2024-11-03 03:04:24 公開日:2024-10-04
# 大規模社会技術システムの要求工学における市民プラットフォームの可能性 The Potential of Citizen Platforms for Requirements Engineering of Large Socio-Technical Software Systems ( http://arxiv.org/abs/2410.03195v1 ) ライセンス: Link先を確認	Jukka Ruohonen, Kalle Hjerppe,	(参考訳) 参加型市民プラットフォーム(Participatory citizen platform)は、政策立案と熟考型民主主義に市民をデジタル的により深く関与させる革新的なソリューションである。これらのプラットフォームはエンジニアリングの文脈でも使用されているが、これまでのところ、プラットフォームと要求工学を結びつけるための作業は行われていない。本稿ではこの顕著なギャップを埋める。要件工学とともにプラットフォームについて議論することに加えて、この論文は潜在的な利点とデメリットを詳述し、ソフトウェア工学の文脈における将来のパイロット研究の道を開く。これらの工学的特徴により、この論文は、その実装とガバナンスを含む公共部門における大規模社会技術ソフトウェアシステムの研究にも貢献する。 Participatory citizen platforms are innovative solutions to digitally better engage citizens in policy-making and deliberative democracy in general. Although these platforms have been used also in an engineering context, thus far, there is no existing work for connecting the platforms to requirements engineering. The present paper fills this notable gap. In addition to discussing the platforms in conjunction with requirements engineering, the paper elaborates potential advantages and disadvantages, thus paving the way for a future pilot study in a software engineering context. With these engineering tenets, the paper also contributes to the research of large socio-technical software systems in a public sector context, including their implementation and governance.	翻訳日:2024-11-03 03:04:24 公開日:2024-10-04
# 対話型言語における対話型構造学習による自動質問生成のための言語間移動 Cross-lingual Transfer for Automatic Question Generation by Learning Interrogative Structures in Target Languages ( http://arxiv.org/abs/2410.03197v1 ) ライセンス: Link先を確認	Seonjeong Hwang, Yunsu Kim, Gary Geunbae Lee,	(参考訳) 自動質問生成(QG)は、QAコーパスの強化、チャットボットシステムの強化、教育材料の開発など、幅広い目的を果たす。その重要性にもかかわらず、既存のデータセットのほとんどは英語に重点を置いており、その結果、他の言語でのデータ可用性にかなりの差がある。 QG(XLT-QG)の言語間転送は、高ソース言語データセットでトレーニングされたモデルが低リソース言語で質問を生成することを可能にすることで、この制限に対処する。本稿では,小言語モデルを用いて,単言語,並列,ラベル付きデータを必要としない,単純かつ効率的なXLT-QG手法を提案する。我々のモデルは、英語のQAデータセットのみに基づいて訓練され、限定された質問例から質問構造を学習し、対象言語で質問を生成する。実験の結果,提案手法は複数のXLT-QGベースラインより優れ,GPT-3.5-turboに匹敵する性能を示した。さらに,本モデルが生成した合成データは,多言語QAモデルの学習に有用であることを示す。大規模言語モデルよりもパラメータが大幅に少なく、ターゲット言語に対する追加のトレーニングを必要としないため、本手法は様々な言語を対象としたQGおよびQAタスクに有効なソリューションを提供する。 Automatic question generation (QG) serves a wide range of purposes, such as augmenting question-answering (QA) corpora, enhancing chatbot systems, and developing educational materials. Despite its importance, most existing datasets predominantly focus on English, resulting in a considerable gap in data availability for other languages. Cross-lingual transfer for QG (XLT-QG) addresses this limitation by allowing models trained on high-resource language datasets to generate questions in low-resource languages. In this paper, we propose a simple and efficient XLT-QG method that operates without the need for monolingual, parallel, or labeled data in the target language, utilizing a small language model. Our model, trained solely on English QA datasets, learns interrogative structures from a limited set of question exemplars, which are then applied to generate questions in the target language. Experimental results show that our method outperforms several XLT-QG baselines and achieves performance comparable to GPT-3.5-turbo across different languages. Additionally, the synthetic data generated by our model proves beneficial for training multilingual QA models. 
With significantly fewer parameters than large language models and without requiring additional training for target languages, our approach offers an effective solution for QG and QA tasks across various languages.	翻訳日:2024-11-03 03:04:24 公開日:2024-10-04
# PersoBench: 大規模言語モデルにおけるパーソナライズされた応答生成のベンチマーク PersoBench: Benchmarking Personalized Response Generation in Large Language Models ( http://arxiv.org/abs/2410.03198v1 ) ライセンス: Link先を確認	Saleh Afzoon, Usman Naseem, Amin Beheshti, Zahra Jamali,	(参考訳) 大きな言語モデル(LLM)は印象的な会話能力を示したが、パーソナライズされた応答を提供する能力は未だに不明である。近年のベンチマークでは、ロールプレイングコンテキストにおけるペルソナの一貫性を自動的に評価しているが、応答生成におけるパーソナライゼーションの評価は未定である。このギャップに対処するため、ゼロショット環境での対話生成におけるLLMのパーソナライズ能力を評価するために、新しいベンチマークPersoBenchを提案する。我々は、よく知られたデータセットと様々なメトリクスを用いて、3つのオープンソースと3つのクローズドソースLLMの性能を評価する。 3つの有名なペルソナ・アウェア・データセットを用いて分析を行い、標準およびチェーン・オブ・ソート(chain-of-thought)プロンプティングの手法を用いて、流布、多様性、コヒーレンス、パーソナライゼーションを含む応答品質の複数の次元を評価した。以上の結果から,LLMは流動的で多様な応答を生成するのに優れるが,会話コンテキストと提供されるペルソナの両方を考慮して,パーソナライズされた一貫性のある応答を提供するのに十分ではないことが明らかとなった。ベンチマーク実装は https://github.com/salehafzoon/PersoBench で公開しています。 While large language models (LLMs) have exhibited impressive conversational capabilities, their proficiency in delivering personalized responses remains unclear. Although recent benchmarks automatically evaluate persona consistency in role-playing contexts using LLM-based judgment, the evaluation of personalization in response generation remains underexplored. To address this gap, we present a new benchmark, PersoBench, to evaluate the personalization ability of LLMs in persona-aware dialogue generation within a zero-shot setting. We assess the performance of three open-source and three closed-source LLMs using well-known datasets and a range of metrics. Our analysis, conducted on three well-known persona-aware datasets, evaluates multiple dimensions of response quality, including fluency, diversity, coherence, and personalization, across both standard and chain-of-thought prompting methods. Our findings reveal that while LLMs excel at generating fluent and diverse responses, they are far from satisfactory in delivering personalized and coherent responses considering both the conversation context and the provided personas. 
Our benchmark implementation is available at https://github.com/salehafzoon/PersoBench.	翻訳日:2024-11-03 03:04:24 公開日:2024-10-04
# サイバー物理システムのためのテストジェネレータの学習 Learning test generators for cyber-physical systems ( http://arxiv.org/abs/2410.03202v1 ) ライセンス: Link先を確認	Jarkko Peltomäki, Ivan Porres,	(参考訳) サイバー物理システムに対するブラックボックス実行時検証手法は、入力と出力が時間とともに信号として表現され、その正確性要件が時間論理で規定されるシステムにおけるエラーを発見するために用いられる。既存の方法、例えば要求のファルシフィケーションは、システム正当性に対する反例である単一の入力を見つけることに集中することが多い。本稿では,単一要求に対して多種多様な反例を生成可能なテストジェネレータの開発方法について検討する。いくつかの反例は、入力条件の異なるシステム障害を露呈し、障害の根本原因分析をサポートする。本稿では,WOGANアルゴリズムを用いて自動生成する手法を提案する。このアルゴリズムは、反例の集合上の均一分布のターゲット分布をモデル化するワッサーシュタイン生成逆数ネットワークを反復的に訓練することによって機能する。 WOGANは、実行時検証のためのテストジェネレータとして機能する生成モデルを訓練するアルゴリズムである。トレーニングは、以前のモデルやデータセットを必要とせずにオンラインで実行される。また,このようなテストジェネレータの評価基準も提案する。我々は、ARCH-COMPのファルシフィケーションベンチマークなど、よく知られたいくつかの問題に対して、訓練されたジェネレータを評価した。実験結果から,WOGANアルゴリズムによって訓練された発電機は,一様ランダムサンプリングのサンプルと同等に多種多様である試験を生成する一方で,最先端の要求ファルシフィケーションアルゴリズムと同じくらい有効であることが示唆された。我々は、WOGANは自動でテストジェネレータを生成するための実行可能な方法であり、これらのテストジェネレータは、サイバー物理システムの実行時検証のために、多種多様な反例を生成することができると結論付けた。 Black-box runtime verification methods for cyber-physical systems can be used to discover errors in systems whose inputs and outputs are expressed as signals over time and their correctness requirements are specified in a temporal logic. Existing methods, such as requirement falsification, often focus on finding a single input that is a counterexample to system correctness. In this paper, we study how to create test generators that can produce multiple and diverse counterexamples for a single requirement. Several counterexamples expose system failures in varying input conditions and support the root cause analysis of the faults. We present the WOGAN algorithm to create such test generators automatically. The algorithm works by training iteratively a Wasserstein generative adversarial network that models the target distribution of the uniform distribution on the set of counterexamples. WOGAN is an algorithm that trains generative models that act as test generators for runtime verification. 
The training is performed online without the need for a previous model or dataset. We also propose criteria to evaluate such test generators. We evaluate the trained generators on several well-known problems including the ARCH-COMP falsification benchmarks. Our experimental results indicate that generators trained by the WOGAN algorithm are as effective as state-of-the-art requirement falsification algorithms while producing tests that are as diverse as a sample from uniform random sampling. We conclude that WOGAN is a viable method to produce test generators automatically and that these test generators can generate multiple and diverse counterexamples for the runtime verification of cyber-physical systems.	翻訳日:2024-11-03 03:04:24 公開日:2024-10-04
# 1次論理変換による意味構造学習 Learning Semantic Structure through First-Order-Logic Translation ( http://arxiv.org/abs/2410.03203v1 ) ライセンス: Link先を確認	Akshay Chaturvedi, Nicholas Asher,	(参考訳) 本論文では,トランスフォーマーに基づく言語モデルが,簡単な文から述語構造を抽出できるかどうかを考察する。まず、どの述語がどの対象に当てはまるかを言語モデルが混同することがあることを示す。これを軽減するために,質問応答(Q/A),一階述語論理(FOL)翻訳という2つの課題と,プロンプティングと微調整という2つの方法を検討する。 FOL翻訳では、一般化能力を評価するために設計された合成データセット上で、いくつかの大きな言語モデルを微調整する。 Q/AではBERTやRoBERTaのようなエンコーダモデルを微調整し、LLMのプロンプトを使用する。その結果,LLMのFOL翻訳は述語構造を学習するのに適していることがわかった。 In this paper, we study whether transformer-based language models can extract predicate argument structure from simple sentences. We first show that language models sometimes confuse which predicates apply to which objects. To mitigate this, we explore two tasks: question answering (Q/A), and first-order logic (FOL) translation, and two regimes, prompting and finetuning. In FOL translation, we finetune several large language models on synthetic datasets designed to gauge their generalization abilities. For Q/A, we finetune encoder models like BERT and RoBERTa and use prompting for LLMs. The results show that FOL translation for LLMs is better suited to learn predicate argument structure.	翻訳日:2024-11-03 03:04:24 公開日:2024-10-04
# メタヒューリスティックアルゴリズムの設計・実験と実世界の最適化問題への応用に関する研究 A Tutorial on the Design, Experimentation and Application of Metaheuristic Algorithms to Real-World Optimization Problems ( http://arxiv.org/abs/2410.03205v1 ) ライセンス: Link先を確認	Eneko Osaba, Esther Villar-Rodriguez, Javier Del Ser, Antonio J. Nebro, Daniel Molina, Antonio LaTorre, Ponnuthurai N. Suganthan, Carlos A. Coello Coello, Francisco Herrera,	(参考訳) ここ数年、メタヒューリスティックアルゴリズムによる実世界の最適化問題の定式化と効率的な解法は、数多くの研究の触媒となっている。メタヒューリスティックの設計と使用に関する数十年の歴史的進歩にもかかわらず、新しい技術成果の理解可能性、アルゴリズム設計の正しさ、性能検証性に関して大きな困難が残っている。明確な例は、最適化に使用されるメタヒューリスティック(英語版)を扱う作業の複製性の欠如に起因している。さらに、多くの場合、報告された結果に疑わしい統計的意義がある。この研究は、科学的厳密さ、価値、透明性を提供するために最適化に使用されるメタヒューリスティックス手法の研究を行う際に、受け入れるべき良いプラクティスの提案を聴衆に提供することを目的としている。この目的のために、我々は、この科学分野に取り組む際に従うべきすべての研究フェーズをカバーするステップバイステップの方法論を紹介した。具体的には、問題の定式化、ソリューションエンコーディング、探索演算子の実装、評価指標、実験の設計、実世界のパフォーマンスに関する考察等について、しばしば見過ごされがちな側面と有用な勧告について論じる。最後に、現実のアプリケーション環境上での展開と運用において、新しく開発された最適化メタヒューリスティックスの成功に向けた重要な考察、課題、研究の方向性について概説する。 In the last few years, the formulation of real-world optimization problems and their efficient solution via metaheuristic algorithms has been a catalyst for a myriad of research studies. In spite of decades of historical advancements on the design and use of metaheuristics, large difficulties still remain in regards to the understandability, algorithmic design uprightness, and performance verifiability of new technical achievements. A clear example stems from the scarce replicability of works dealing with metaheuristics used for optimization, which is often infeasible due to ambiguity and lack of detail in the presentation of the methods to be reproduced. Additionally, in many cases, there is a questionable statistical significance of their reported results. This work aims at providing the audience with a proposal of good practices which should be embraced when conducting studies about metaheuristics methods used for optimization in order to provide scientific rigor, value and transparency. 
To this end, we introduce a step by step methodology covering every research phase that should be followed when addressing this scientific field. Specifically, frequently overlooked yet crucial aspects and useful recommendations will be discussed in regards to the formulation of the problem, solution encoding, implementation of search operators, evaluation metrics, design of experiments, and considerations for real-world performance, among others. Finally, we will outline important considerations, challenges, and research directions for the success of newly developed optimization metaheuristics in their deployment and operation over real-world application environments.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
# SPHINX:ハイパーグラフ推論ネットワークを用いた構造予測 SPHINX: Structural Prediction using Hypergraph Inference Network ( http://arxiv.org/abs/2410.03208v1 ) ライセンス: Link先を確認	Iulia Duta, Pietro Liò,	(参考訳) 高次関係の重要性は、多くの現実世界システムにおいて広く認識されている。しかし、それらに注釈をつけるのは退屈な作業であり、時には不可能な作業である。その結果、データモデリングの現在のアプローチは、高次相互作用を完全に無視するか、あるいはペア接続に単純化する。高次処理を容易にするため、ハイパーグラフ構造が利用できない場合でも、最終ノードレベル信号のみから、非教師なしの方法で遅延ハイパーグラフ構造を推論するモデルであるハイパーグラフ推論ネットワーク(SPHINX)を用いて構造予測を導入する。このモデルは、各ハイパーエッジに対して、ノード上の確率分布を逐次予測するために使用されるソフトで微分可能なクラスタリング法と、それらを明示的なハイパーグラフ構造に変換するサンプリングアルゴリズムから構成される。近年のk-サブセットサンプリングの進歩は,先行研究で示されたトレーニング不安定性のいくつかに対処して,離散ハイパーグラフ構造を生成するのに適したツールであることが示されている。結果として得られるモデルは、最新のハイパーグラフニューラルネットワークに必要な高次構造を生成することができ、注釈付けが難しいドメインでの高次相互作用のキャプチャを容易にする。トラジェクトリ予測のための2つの挑戦的データセットを用いて行った広範囲なアブレーション研究と実験を通じて、我々のモデルは、解釈可能で最終的な性能を高めるための適切な潜時ハイパーグラフを推測できることを実証した。 The importance of higher-order relations is widely recognized in a large number of real-world systems. However, annotating them is a tedious and sometimes impossible task. Consequently, current approaches for data modelling either ignore the higher-order interactions altogether or simplify them into pairwise connections. In order to facilitate higher-order processing, even when a hypergraph structure is not available, we introduce Structural Prediction using Hypergraph Inference Network (SPHINX), a model that learns to infer a latent hypergraph structure in an unsupervised way, solely from the final node-level signal. The model consists of a soft, differentiable clustering method used to sequentially predict, for each hyperedge, the probability distribution over the nodes and a sampling algorithm that converts them into an explicit hypergraph structure. We show that the recent advancement in k-subset sampling represents a suitable tool for producing discrete hypergraph structures, addressing some of the training instabilities exhibited by prior works. 
The resulting model can generate the higher-order structure necessary for any modern hypergraph neural network, facilitating the capture of higher-order interaction in domains where annotating them is difficult. Through extensive ablation studies and experiments conducted on two challenging datasets for trajectory prediction, we demonstrate that our model is capable of inferring suitable latent hypergraphs, that are interpretable and enhance the final performance.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
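The abstract above mentions a sampling step, based on k-subset sampling, that turns per-hyperedge node probabilities into an explicit hypergraph structure. One common way to draw k distinct items from a probability vector is the Gumbel top-k trick, sketched below. Whether SPHINX uses this particular sampler is not stated in the abstract, so this is an assumption for illustration; the sketch is also the plain, non-differentiable version, and the function name and arguments are hypothetical.

```python
import math
import random

def gumbel_top_k(probs, k, rng=random):
    """Sample k distinct indices: perturb log-probabilities with Gumbel noise, keep the k largest."""
    if not 0 < k <= len(probs):
        raise ValueError("k must be between 1 and len(probs)")
    scores = []
    for i, p in enumerate(probs):
        # Keep u strictly inside (0, 1) so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)
        gumbel = -math.log(-math.log(u))
        # Zero-probability nodes get -inf so they are chosen only as a last resort.
        scores.append((math.log(p) + gumbel if p > 0 else float("-inf"), i))
    scores.sort(reverse=True)
    return sorted(i for _, i in scores[:k])

# Hypothetical node-membership probabilities for one hyperedge over five nodes.
random.seed(0)
hyperedge_nodes = gumbel_top_k([0.05, 0.4, 0.3, 0.05, 0.2], k=3)
print(hyperedge_nodes)
```

Repeating the draw once per hyperedge would yield a discrete incidence structure; a trainable model would additionally need a gradient estimator through the sampling step, which this sketch leaves out.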
# Tadashi: 保証された正確さでAIベースの自動コード生成を実現する Tadashi: Enabling AI-Based Automated Code Generation With Guaranteed Correctness ( http://arxiv.org/abs/2410.03210v1 ) ライセンス: Link先を確認	Emil Vatai, Aleksandr Drozd, Ivan R. Ivanov, Yinghao Ren, Mohamed Wahib,	(参考訳) フレームワークとDSL 自動生成コードは、伝統的に、適用されたコード変換の合法性を保証するために厳格な方法を持つように、開発する人間の専門家に依存してきました。機械学習(ML)は、ハードウェアターゲットに最適化されたコードを自動生成する手段として広く採用されている。しかし、MLソリューション、特にブラックボックスDNNは、合法性に関する保証を提供していない。本稿では,多面体モデルを利用して,MLをコード生成に適用する上で不可欠なデータセットのキュレートを求める研究者を支援するライブラリTadashiを提案する。 Tadashiは、ベースライン参照コードに適用された多面的スケジュールに基づいて、候補変換の合法性を確実かつ実践的にチェックする機能を提供する。ライブラリが生成した変換の合法性を保証することを証明し、その軽量な実用コストを実証する。 Tadashiは https://github.com/vatai/tadashi/ で入手できる。 Frameworks and DSLs auto-generating code have traditionally relied on human experts developing them to have in place rigorous methods to assure the legality of the applied code transformations. Machine Learning (ML) is gaining wider adoption as a means to auto-generate code optimised for the hardware target. However, ML solutions, and in particular black-box DNNs, provide no such guarantees on legality. In this paper we propose a library, Tadashi, which leverages the polyhedral model to empower researchers seeking to curate datasets crucial for applying ML in code-generation. Tadashi provides the ability to reliably and practically check the legality of candidate transformations on polyhedral schedules applied on a baseline reference code. We provide a proof that our library guarantees the legality of generated transformations, and demonstrate its lightweight practical cost. Tadashi is available at https://github.com/vatai/tadashi/.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
# CUDLE:未管理環境における大麻検出のためのラベルスカルシティ下での学習 CUDLE: Learning Under Label Scarcity to Detect Cannabis Use in Uncontrolled Environments ( http://arxiv.org/abs/2410.03211v1 ) ライセンス: Link先を確認	Reza Rahimi Azghan, Nicholas C. Glodosky, Ramesh Kumar Sah, Carrie Cuttler, Ryan McLaughlin, Michael J. Cleveland, Hassan Ghasemzadeh,	(参考訳) ウェアラブルセンサーシステムは、行動介入を支援するために生理的健康をリアルタイムで客観的に監視する大きな可能性を実証している。しかし、人間の監督が限られており、患者による自己ラベル化に依存しているため、生活自由環境で正確なラベルを取得することは依然として困難であり、データ収集や教師付き学習は特に困難である。この問題に対処するために、我々はCUDLE(Cannabis Use Detection with Label efficiency)を紹介した。これは、現実のウェアラブルセンサーデータによる自己教師型学習を活用して、医療的課題に対処する新しいフレームワークである。 CUDLEは、対照的な学習フレームワークを通じて、センサ由来のデータを使用して大麻の消費モーメントを特定する。まず、データ拡張を伴う自己教師付きプレテキストタスクを通じて、堅牢な表現を学習する。これらの表現は、浅い分類器で下流タスクで微調整され、CUDLEは従来の教師付きメソッド、特にラベル付きデータよりも優れている。アプローチを評価するため,大麻利用者20名を対象に,EMA(Ecological Momentary Assessment)手法を用いて,利用者が報告した大麻使用モーメントと合わせて500時間以上のウェアラブルセンサデータを収集した。収集したデータを用いて広範囲に分析したところ,CUDLEの精度は73.4%,教師付きアプローチでは71.1%,ラベル数が減少するにつれてパフォーマンスギャップが拡大していることがわかった。特に、CUDLEは、75%少ないラベルを使用しながら教師付きモデルを上回るだけでなく、はるかに少ない被験者でピーク性能に達する。 Wearable sensor systems have demonstrated a great potential for real-time, objective monitoring of physiological health to support behavioral interventions. However, obtaining accurate labels in free-living environments remains difficult due to limited human supervision and the reliance on self-labeling by patients, making data collection and supervised learning particularly challenging. To address this issue, we introduce CUDLE (Cannabis Use Detection with Label Efficiency), a novel framework that leverages self-supervised learning with real-world wearable sensor data to tackle a pressing healthcare challenge: the automatic detection of cannabis consumption in free-living environments. CUDLE identifies cannabis consumption moments using sensor-derived data through a contrastive learning framework. It first learns robust representations via a self-supervised pretext task with data augmentation. 
These representations are then fine-tuned in a downstream task with a shallow classifier, enabling CUDLE to outperform traditional supervised methods, especially with limited labeled data. To evaluate our approach, we conducted a clinical study with 20 cannabis users, collecting over 500 hours of wearable sensor data alongside user-reported cannabis use moments through EMA (Ecological Momentary Assessment) methods. Our extensive analysis using the collected data shows that CUDLE achieves a higher accuracy of 73.4%, compared to 71.1% for the supervised approach, with the performance gap widening as the number of labels decreases. Notably, CUDLE not only surpasses the supervised model while using 75% less labels, but also reaches peak performance with far fewer subjects.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
# WMT24 Indic MT共有タスクのためのNLIP_Lab-IITH低リソースMTシステム NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task ( http://arxiv.org/abs/2410.03215v1 ) ライセンス: Link先を確認	Pramit Sahoo, Maharaj Brahma, Maunendra Sankar Desarkar,	(参考訳) 本稿では,WMT 24の低リソースIndic言語翻訳共有タスクに向けた我々のシステムについて述べる。 eng $\leftrightarrow$ {as, kha, lus, mni} を参加言語ペアとする。この共有タスクでは、22のインド指定言語を対象に、アライメント拡張によって埋め込みをより近くに整列させることを事前学習の目的とした事前学習モデルの微調整について検討する。我々の主システムは、事前学習モデルに対する言語固有の微調整に基づいている。我々は、eng$\rightarrow$as, eng$\rightarrow$kha, eng$\rightarrow$lus, eng$\rightarrow$mniの公式公開テストセットにおいて、それぞれ50.6, 42.3, 54.9, 66.3のchrF2スコアを得る。また、言語グループ化の有無および層凍結を伴う多言語学習についても検討する。私たちのコード、モデル、生成された翻訳は https://github.com/pramitsahoo/WMT2024-LRILT で利用可能である。 In this paper, we describe our system for the WMT 24 shared task of Low-Resource Indic Language Translation. We consider eng $\leftrightarrow$ {as, kha, lus, mni} as participating language pairs. In this shared task, we explore the finetuning of a pre-trained model motivated by the pre-trained objective of aligning embeddings closer by alignment augmentation \cite{lin-etal-2020-pre} for 22 scheduled Indian languages. Our primary system is based on language-specific finetuning on a pre-trained model. We achieve chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng$\rightarrow$as, eng$\rightarrow$kha, eng$\rightarrow$lus, eng$\rightarrow$mni respectively. We also explore multilingual training with/without language grouping and layer-freezing. Our code, models, and generated translations are available here: https://github.com/pramitsahoo/WMT2024-LRILT.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
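The chrF2 scores reported above are character n-gram F-scores with beta = 2, so recall counts more than precision. The following is a minimal, simplified sketch of that idea, not the official scorer: the function names `char_ngrams` and `chrf` are hypothetical, whitespace is simply removed, and results will not match a reference implementation such as sacreBLEU exactly.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n (assumption: spaces are dropped)."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: average n-gram precision/recall, combined as F-beta (0-100)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

With beta = 2, a hypothesis that covers the whole reference but adds extra material is penalised less than one that omits part of the reference.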
# 医療データ管理のためのインテリジェントな量子サイバーセキュリティフレームワーク An Intelligent Quantum Cyber-Security Framework for Healthcare Data Management ( http://arxiv.org/abs/2410.03217v1 ) ライセンス: Link先を確認	Kishu Gupta, Deepika Saxena, Pooja Rani, Jitendra Kumar, Aaisha Makkar, Ashutosh Kumar Singh, Chung-Nan Lee,	(参考訳) デジタル医療は、医療サービスの強化のために、消費者が医療データにアクセスし、配布しやすくするために不可欠である。しかし、医療システム間のデジタル化に関する重要な懸念は、機密性の高いデジタル医療データ共有と悪意あるエンティティの積極的な評価を促進するために、迅速な、生産的で安全な保管施設と活発なコミュニケーション戦略を必要とすることである。本稿では,医療データ管理におけるセキュリティとプライバシの潜在的な問題を克服する,包括的な量子ベースのフレームワークを提案する。量子暗号化を利用することで、セキュアなストレージと共有クラウドプラットフォーム上での医療データの分散に量子暗号化を装備する。また、このフレームワークは、量子フィードフォワードニューラルネットワークユニットを使用して、アクセスを許可する前にデータ要求の背後にある意図を調べ、潜在的なデータ漏洩を積極的に推定する。このようにして、提案したフレームワークは、高度な量子アプローチと機械学習を結合して、悪意あるエンティティを自動で保護し、アクセスし、予測することで、医療データ全体を管理する。このように提案されたIQ-HDMは、より協力的で効果的な医療提供をもたらし、個人の健康データを適切に管理する権限を与える。提案されたIQ-HDMフレームワークと最先端の手法の実験的評価と比較は、医療データセキュリティに関連するサイバー脅威に対処する上で、67.6%の大幅な改善を概説している。 Digital healthcare is essential to facilitate consumers to access and disseminate their medical data easily for enhanced medical care services. However, the significant concern with digitalization across healthcare systems necessitates for a prompt, productive, and secure storage facility along with a vigorous communication strategy, to stimulate sensitive digital healthcare data sharing and proactive estimation of malicious entities. In this context, this paper introduces a comprehensive quantum-based framework to overwhelm the potential security and privacy issues for secure healthcare data management. It equips quantum encryption for the secured storage and dispersal of healthcare data over the shared cloud platform by employing quantum encryption. Also, the framework furnishes a quantum feed-forward neural network unit to examine the intention behind the data request before granting access, for proactive estimation of potential data breach. 
In this way, the proposed framework delivers overall healthcare data management by coupling the advanced and more competent quantum approach with machine learning to safeguard the data storage, access, and prediction of malicious entities in an automated manner. Thus, the proposed IQ-HDM leads to more cooperative and effective healthcare delivery and empowers individuals with adequate custody of their health data. The experimental evaluation and comparison of the proposed IQ-HDM framework with state-of-the-art methods outline a considerable improvement up to 67.6%, in tackling cyber threats related to healthcare data security.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
# ブラウン雑音のもとで操舵を学ぶ Learning to steer with Brownian noise ( http://arxiv.org/abs/2410.03221v1 ) ライセンス: Link先を確認	Stefan Ankirchner, Sören Christensen, Jan Kallsen, Philip Le Borne, Stefan Perko,	(参考訳) 本稿では,有界速度フォロワー問題のエルゴード版について考察し,意思決定者が基礎となるシステムパラメータの知識を欠いており,制御と同時にそれらを学習しなければならないと仮定する。本研究では,移動経験平均に基づくアルゴリズムを提案し,統計的手法と確率的制御理論を統合するための枠組みを開発する。主要な結果は,対数オーダーの期待リグレット率である。これを実現するために,本研究では,基礎となる過程のエルゴード収束率と,考慮する推定量のリスクを厳密に分析する。 This paper considers an ergodic version of the bounded velocity follower problem, assuming that the decision maker lacks knowledge of the underlying system parameters and must learn them while simultaneously controlling. We propose algorithms based on moving empirical averages and develop a framework for integrating statistical methods with stochastic control theory. Our primary result is a logarithmic expected regret rate. To achieve this, we conduct a rigorous analysis of the ergodic convergence rates of the underlying processes and the risks of the considered estimators.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
# 大規模言語モデルを用いた産業機械故障の相談 Consultation on Industrial Machine Faults with Large language Models ( http://arxiv.org/abs/2410.03223v1 ) ライセンス: Link先を確認	Apiradee Boonmee, Kritsada Wongsuwan, Pimchanok Sukjai,	(参考訳) 産業機械故障診断は、製造環境における運転効率と安全性の重要な要素である。従来の手法は専門家の知識と特定の機械学習モデルに大きく依存しており、適応性に制限があり、広範なラベル付きデータを必要とする。本稿では,大規模言語モデル(LLM)を活用し,特に構造化されたマルチラウンドのプロンプト手法によって故障診断の精度を向上させる新しい手法を提案する。プロンプトを動的に作成することにより,多様なデータソースから情報を統合するモデルの能力が向上し,文脈理解と実行可能な推奨事項が改善される。実験の結果,本手法はベースラインモデルより優れており,各種故障タイプの診断において91%の精度が得られた。この知見は, 産業機械の故障相談の実務を変革するLLMの可能性を浮き彫りにし, 複雑な環境におけるより効果的な保守戦略への道を開くものである。 Industrial machine fault diagnosis is a critical component of operational efficiency and safety in manufacturing environments. Traditional methods rely heavily on expert knowledge and specific machine learning models, which can be limited in their adaptability and require extensive labeled data. This paper introduces a novel approach leveraging Large Language Models (LLMs), specifically through a structured multi-round prompting technique, to improve fault diagnosis accuracy. By dynamically crafting prompts, our method enhances the model's ability to synthesize information from diverse data sources, leading to improved contextual understanding and actionable recommendations. Experimental results demonstrate that our approach outperforms baseline models, achieving an accuracy of 91% in diagnosing various fault types. The findings underscore the potential of LLMs in revolutionizing industrial fault consultation practices, paving the way for more effective maintenance strategies in complex environments.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
# ScriptViz: 大規模な映画データベースに基づくスクリプト作成を支援する可視化ツール ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database ( http://arxiv.org/abs/2410.03224v1 ) ライセンス: Link先を確認	Anyi Rao, Jean-Peïc Chou, Maneesh Agrawala,	(参考訳) スクリプトライターは通常、自分の心の可視化に頼って、自分の想像力を使って、自分が書いているシーンを見たり、感じたり、経験したりすることで、鮮やかなストーリーを作る。メンタルヴィジュアライゼーションの他に、映画内の既存のイメージやシーンを参照し、視覚要素を分析して特定の雰囲気や雰囲気を作り出すことも多い。本稿では,スクリーンライティングプロセスのための大規模映画データベースをベースとした外部可視化を実現するためのScriptVizを開発する。スクリプトのテキストと対話に基づいて、大規模な映画データベースから参照ビジュアルをリアルタイムで取得する。このツールは、作者が視覚的要素を制御できる2つのタイプのコントロールを提供する。 1) 固定された視覚要素で何が欲しいか正確に確認し、 2)不確実な要素の分散を見よ。 15人のスクリプト作者のユーザ評価によると、ScriptVizは、スクリプトと密に連携し、スクリプトの作成を支援する、一貫性がありながら多様な視覚的可能性を持つスクリプトを提示できる。 Scriptwriters usually rely on their mental visualization to create a vivid story by using their imagination to see, feel, and experience the scenes they are writing. Besides mental visualization, they often refer to existing images or scenes in movies and analyze the visual elements to create a certain mood or atmosphere. In this paper, we develop ScriptViz to provide external visualization based on a large movie database for the screenwriting process. It retrieves reference visuals on the fly based on scripts' text and dialogue from a large movie database. The tool provides two types of control on visual elements that enable writers to 1) see exactly what they want with fixed visual elements and 2) see variances in uncertain elements. User evaluation among 15 scriptwriters shows that ScriptViz is able to present scriptwriters with consistent yet diverse visual possibilities, aligning closely with their scripts and helping their creation.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
# ALR$^2$:Long-context Question AnsweringのためのRetrieve-then-Reason Framework ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering ( http://arxiv.org/abs/2410.03227v1 ) ライセンス: Link先を確認	Huayang Li, Pat Verga, Priyanka Sen, Bowen Yang, Vijay Viswanathan, Patrick Lewis, Taro Watanabe, Yixuan Su,	(参考訳) 近年,大規模言語モデル (LLM) のコンテキストウィンドウが大幅に拡張されている。しかし、LLMが処理できるコンテキスト長は増大しているが、そのコンテキストを正確に推論するモデルの能力は著しく低下している。これは、現代のLLMがコンテキスト内の膨大な情報に圧倒されることが多いためであり、質問に答える際には、モデルはテキスト全体にわずかに分散している関係する証拠を識別し、推論しなければならない。長文推論の課題を軽減するために,LLMが中間的検索ステップで収集した関連する証拠を推論することのできる,検索テーマ推論フレームワークを開発した。現代のLLMは、関連した事実を正確に取り出すのに苦労し、しばしば「回収された事実」を幻覚させ、欠陥のある推論と誤った答えを生み出す。これらの問題に対処するために、ALR$^2$を導入し、LLMの長文推論能力を明示的な2段階の手順により強化する手法、すなわち、LLMを検索と推論の両方の目的と整合させる手法を提案する。長文推論タスクの性能劣化を軽減するために, ALR$^2$の有効性を示す。長文QAベンチマークの広範な実験により、我々の手法は、HotpotQAデータセットとSQuADデータセットの長文バージョンで、それぞれ8.4と7.9のEMゲインを達成し、競争ベースラインを大きなマージンで上回ります。 The context window of large language models (LLMs) has been extended significantly in recent years. However, while the context length that the LLM can process has grown, the capability of the model to accurately reason over that context degrades noticeably. This occurs because modern LLMs often become overwhelmed by the vast amount of information in the context; when answering questions, the model must identify and reason over relevant evidence sparsely distributed throughout the text. To alleviate the challenge of long-context reasoning, we develop a retrieve-then-reason framework, enabling LLMs to reason over relevant evidence collected during an intermediate retrieval step. We find that modern LLMs struggle to accurately retrieve relevant facts and instead, often hallucinate "retrieved facts", resulting in flawed reasoning and the production of incorrect answers. 
To address these issues, we introduce ALR$^2$, a method that augments the long-context reasoning capability of LLMs via an explicit two-stage procedure, i.e., aligning LLMs with the objectives of both retrieval and reasoning. We demonstrate the efficacy of ALR$^2$ for mitigating performance degradation in long-context reasoning tasks. Through extensive experiments on long-context QA benchmarks, we find our method to outperform competitive baselines by large margins, achieving at least 8.4 and 7.9 EM gains on the long-context versions of HotpotQA and SQuAD datasets, respectively.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
# 予測のためのフローマッチングにおける確率パスの設計選択 Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting ( http://arxiv.org/abs/2410.03229v1 ) ライセンス: Link先を確認	Soon Hoe Lim, Yijin Wang, Annan Yu, Emma Hart, Michael W. Mahoney, Xiaoye S. Li, N. Benjamin Erichson,	(参考訳) フローマッチングは、最近、生成モデリングの強力なパラダイムとして現れ、潜在空間における確率的時系列予測にまで拡張されている。しかし,確率経路モデルの選択が予測性能に与える影響は未検討のままである。本研究では,フローマッチングによる時空間データの予測が,確率パスモデルの選択に非常に敏感であることを示す。そこで本研究では,予測性能の向上を目的とした新しい確率パスモデルを提案する。各種力学系ベンチマークにおける実験結果から,本モデルがトレーニング中の収束を高速化し,既存の確率パスモデルと比較して予測性能が向上することが示唆された。重要なことは、我々のアプローチは推論時に効率的であり、ほんの数ステップのサンプリングしか必要としない。これにより,提案手法は実世界の応用に有効であり,確率予測のための新たな道を開くことができる。 Flow matching has recently emerged as a powerful paradigm for generative modeling and has been extended to probabilistic time series forecasting in latent spaces. However, the impact of the specific choice of probability path model on forecasting performance remains under-explored. In this work, we demonstrate that forecasting spatio-temporal data with flow matching is highly sensitive to the selection of the probability path model. Motivated by this insight, we propose a novel probability path model designed to improve forecasting performance. Our empirical results across various dynamical system benchmarks show that our model achieves faster convergence during training and improved predictive performance compared to existing probability path models. Importantly, our approach is efficient during inference, requiring only a few sampling steps. This makes our proposed model practical for real-world applications and opens new avenues for probabilistic forecasting.	翻訳日:2024-11-03 02:54:39 公開日:2024-10-04
# LLMの確信度に基づくLLM生成コードの選択的表示 Showing LLM-Generated Code Selectively Based on Confidence of LLMs ( http://arxiv.org/abs/2410.03234v1 ) ライセンス: Link先を確認	Jia Li, Yuqi Zhu, Yongmin Li, Ge Li, Zhi Jin,	(参考訳) 大規模言語モデル(LLM)は、コード生成において印象的な能力を示しているが、誤ったプログラムを生成する可能性がある。プログラムを読むのには、書くのに比べて10倍の時間がかかる。これらの誤ったプログラムを開発者に示すことは、開発者の労力を無駄にし、ソフトウェアにセキュリティリスクを持ち込む。上記の制限に対処するため,新しいLLMベースのコード生成手法であるHonestCoderを提案する。 HonestCoder は LLM の確信度に基づいて、生成したプログラムを開発者に選択的に表示する。確信度は、生成されたプログラムの正確性に関する貴重な洞察を提供する。この目的を達成するために,コード生成におけるLLMの確信度を推定する新しい手法を提案する。この手法は、LLM が生成したプログラム間のマルチモーダル類似度を測定することで確信度を推定する。我々はTruthCodeBenchという多言語ベンチマークを収集・公開する。これは2,265のサンプルからなり、2つの人気のあるプログラミング言語(PythonとJava)をカバーする。我々は、HonestCoderを4つの人気のあるLLM(例えば、DeepSeek-CoderとCode Llama)に適用し、TruthCodeBenchで評価する。実験の結果,以下の知見を得た。 (1) HonestCoderはLLMの確信度を効果的に推定し,生成したプログラムの正確性を正確に判定する。例えば、HonestCoderは、AUROCでは27.79%、AUCPRでは63.74%、最先端のベースラインを上回っている。 (2) HonestCoderは、開発者に示される誤ったプログラムの数を減らすことができる。 8つのベースラインと比較して、より多くの正しいプログラムと、より少ない誤ったプログラムを開発者に示すことができる。 (3) コードを無差別に表示する場合と比較して、HonestCoderはわずかな時間オーバーヘッド(要求あたり約0.4秒)しか追加しない。 (4) ソフトウェア開発におけるLLMの活用を促進するための今後の方向性について論じる。コード関連タスクの実行におけるLLMの出力の信頼性の測定について,この取り組みが広範な議論の動機となることを願っている。 Large Language Models (LLMs) have shown impressive abilities in code generation, but they may generate erroneous programs. Reading a program takes ten times longer than writing it. Showing these erroneous programs to developers will waste developers' energies and introduce security risks to software. To address the above limitations, we propose HonestCoder, a novel LLM-based code generation approach. HonestCoder selectively shows the generated programs to developers based on LLMs' confidence. The confidence provides valuable insights into the correctness of generated programs. To achieve this goal, we propose a novel approach to estimate LLMs' confidence in code generation. It estimates confidence by measuring the multi-modal similarity between LLMs-generated programs. 
We collect and release a multilingual benchmark named TruthCodeBench, which consists of 2,265 samples and covers two popular programming languages (i.e., Python and Java). We apply HonestCoder to four popular LLMs (e.g., DeepSeek-Coder and Code Llama) and evaluate it on TruthCodeBench. Based on the experiments, we obtain the following insights. (1) HonestCoder can effectively estimate LLMs' confidence and accurately determine the correctness of generated programs. For example, HonestCoder outperforms the state-of-the-art baseline by 27.79% in AUROC and 63.74% in AUCPR. (2) HonestCoder can decrease the number of erroneous programs shown to developers. Compared to eight baselines, it can show more correct programs and fewer erroneous programs to developers. (3) Compared to showing code indiscriminately, HonestCoder only adds slight time overhead (approximately 0.4 seconds per requirement). (4) We discuss future directions to facilitate the application of LLMs in software development. We hope this work can motivate broad discussions about measuring the reliability of LLMs' outputs in performing code-related tasks.	翻訳日:2024-11-02 23:28:42 公開日:2024-10-04
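The selective-display idea in this abstract (estimate confidence from how similar several sampled programs are, then show code only when confidence is high) can be illustrated with a small sketch. This is not HonestCoder's implementation: the abstract does not specify its multi-modal similarity measure, so the sketch substitutes plain textual similarity from Python's `difflib`, and the names `confidence`, `show_selectively` and the threshold value are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confidence(programs):
    """Mean pairwise textual similarity of sampled programs, in [0, 1].

    Assumption: textual similarity stands in for the paper's
    multi-modal similarity, which the abstract does not detail.
    """
    if len(programs) < 2:
        return 0.0  # a single sample gives no agreement signal
    sims = [SequenceMatcher(None, a, b).ratio()
            for a, b in combinations(programs, 2)]
    return sum(sims) / len(sims)


def show_selectively(programs, threshold=0.8):
    """Return the first sample if the samples agree enough, else None."""
    if confidence(programs) >= threshold:
        return programs[0]
    return None  # withhold the code from the developer
```

The threshold trades off how many correct programs are shown against how many erroneous ones slip through; in practice it would have to be tuned on held-out data.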
# 大規模言語モデルを用いた不整合公理付きオントロジーの強化 Enriching Ontologies with Disjointness Axioms using Large Language Models ( http://arxiv.org/abs/2410.03235v1 ) ライセンス: Link先を確認	Elias Crum, Antonio De Santis, Manon Ovide, Jiaxin Pan, Alessia Pisu, Nicolas Lazzari, Sebastian Rudolph,	(参考訳) オントロジは、知識グラフの洗練された推論と一貫性チェックに有用であるにもかかわらず、クラス間での明確な不一致宣言を欠いていることが多い。本研究では,クラス不整合公理を同定し,主張することで,オントロジーを充実させるLarge Language Models (LLMs) の可能性を探る。提案手法は,LLMに埋め込まれた暗黙の知識を活用することを目的としている。本手法をDBpediaのオントロジーで検証し,オープンソース LLM に着目した。本研究は, LLMが効果的なプロンプト戦略によって導かれると, クラス間の関係を確実に識別し, オントロジーの完了過程を広範囲な手入力なしで合理化できることを示唆する。包括的不整合性向上のために,不整合性とサブクラス文の論理的関係を考慮に入れ,満足度を維持し,LLMへの呼び出し数を減少させるプロセスを提案する。この研究は、自動オントロジー拡張におけるLLMの将来の応用の基礎を提供し、戦略的プロンプト設計によるLLM性能の最適化に関する洞察を提供する。私たちのコードはGitHubでhttps://github.com/n28div/llm-disjointnessで公開されています。 Ontologies often lack explicit disjointness declarations between classes, despite their usefulness for sophisticated reasoning and consistency checking in Knowledge Graphs. In this study, we explore the potential of Large Language Models (LLMs) to enrich ontologies by identifying and asserting class disjointness axioms. Our approach aims at leveraging the implicit knowledge embedded in LLMs, using prompt engineering to elicit this knowledge for classifying ontological disjointness. We validate our methodology on the DBpedia ontology, focusing on open-source LLMs. Our findings suggest that LLMs, when guided by effective prompt strategies, can reliably identify disjoint class relationships, thus streamlining the process of ontology completion without extensive manual input. For comprehensive disjointness enrichment, we propose a process that takes logical relationships between disjointness and subclass statements into account in order to maintain satisfiability and reduce the number of calls to the LLM. This work provides a foundation for future applications of LLMs in automated ontology enhancement and offers insights into optimizing LLM performance through strategic prompt design. 
Our code is publicly available on GitHub at https://github.com/n28div/llm-disjointness.	翻訳日:2024-11-02 23:28:42 公開日:2024-10-04
# 映画の字幕を超えて:YouTubeは話し言葉の語彙の最良の近似か? Beyond Film Subtitles: Is YouTube the Best Approximation of Spoken Vocabulary? ( http://arxiv.org/abs/2410.03240v1 ) ライセンス: Link先を確認	Adam Nohejl, Frederikus Hudi, Eunike Andriani Kardinata, Shintaro Ozaki, Maria Angelica Riera Machin, Hongyu Sun, Justin Vasselli, Taro Watanabe,	(参考訳) 単語頻度は、心理言語学において重要な変数であり、大規模言語モデル(LLM)の時代でさえ、単語に対する人間の親密度をモデル化するのに有用である。映画の字幕における頻度は、日常的な言語接触の特に良い近似であることが証明されている。しかし、多くの言語では、映画の字幕は簡単には入手できないか、大部分が英語からの翻訳である。我々は、慎重に処理されたYouTube字幕から抽出された頻度が、現在利用可能な最も優れたリソースに匹敵し、しばしばそれを上回る近似を提供することを示す。さらに、高品質な字幕コーパスや音声コーパスが存在しない言語でも利用できる。我々は,中国語,英語,インドネシア語,日本語,スペイン語の5つの多様な言語に対して,YouTube字幕を用いて頻度ノルムを構築し,語彙判断時間,単語親密度,語彙的複雑性との相関を評価する。 2つの心理言語学的変数と強く相関するのに加えて、新しい頻度に対する単純な線形回帰は、英語と日本語の語彙的複雑性予測タスクにおいて、映画字幕の頻度で訓練されたモデルとLLM GPT-4の両方を上回り、新たな最高スコアを達成する。私たちのコード、頻度リスト、fastText単語埋め込み、統計的言語モデルは https://github.com/naist-nlp/tubelex で無料で利用可能である。 Word frequency is a key variable in psycholinguistics, useful for modeling human familiarity with words even in the era of large language models (LLMs). Frequency in film subtitles has proved to be a particularly good approximation of everyday language exposure. For many languages, however, film subtitles are not easily available, or are overwhelmingly translated from English. We demonstrate that frequencies extracted from carefully processed YouTube subtitles provide an approximation comparable to, and often better than, the best currently available resources. Moreover, they are available for languages for which a high-quality subtitle or speech corpus does not exist. We use YouTube subtitles to construct frequency norms for five diverse languages, Chinese, English, Indonesian, Japanese, and Spanish, and evaluate their correlation with lexical decision time, word familiarity, and lexical complexity. 
In addition to being strongly correlated with two psycholinguistic variables, a simple linear regression on the new frequencies achieves a new high score on a lexical complexity prediction task in English and Japanese, surpassing both models trained on film subtitle frequencies and the LLM GPT-4. Our code, the frequency lists, fastText word embeddings, and statistical language models are freely available at https://github.com/naist-nlp/tubelex.	翻訳日:2024-11-02 23:28:42 公開日:2024-10-04
# オンライン模倣学習のための、単一の歩行サイクル実演から得る潜在行動プライア Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning ( http://arxiv.org/abs/2410.03246v1 ) ライセンス: Link先を確認	Oliver Hausdörfer, Alexander von Rohr, Éric Lefort, Angela Schoellig,	(参考訳) シミュレーションにおける深層強化学習(DRL)は、しばしば脆く非現実的な学習結果をもたらす。エージェントをより望ましい解へ導くために、例えば報酬形成、専門家データ、モーションプリミティブを通じて、学習プロセスに事前情報を注入することができる。本稿では,ロボット学習における追加の帰納的バイアスとして,専門家の実演から学習した潜在行動を行動空間のプライアとして用いることを提案する。単純なオートエンコーダを用いて、単一のオープンループ歩行サイクルのみからこれらの行動プライアを学習できることを示す。 DRLにおいて、これらの潜在行動プライアを、模倣のための確立されたスタイル報酬と組み合わせることで、専門家の実演を上回るレベルの性能が達成され、より望ましい歩容につながる。さらに、行動プライアは転移タスクの性能を大幅に改善し、より高い目標速度では歩容遷移さえもたらす。ビデオとコードは https://sites.google.com/view/latent-action-priors で公開されている。 Deep Reinforcement Learning (DRL) in simulation often results in brittle and unrealistic learning outcomes. To push the agent towards more desirable solutions, prior information can be injected in the learning process through, for instance, reward shaping, expert data, or motion primitives. We propose an additional inductive bias for robot learning: latent actions learned from expert demonstration as priors in the action space. We show that these action priors can be learned from only a single open-loop gait cycle using a simple autoencoder. Using these latent action priors combined with established style rewards for imitation in DRL achieves above expert demonstration level of performance and leads to more desirable gaits. Further, action priors substantially improve the performance on transfer tasks, even leading to gait transitions for higher target speeds. Videos and code are available at https://sites.google.com/view/latent-action-priors.	翻訳日:2024-11-02 23:28:42 公開日:2024-10-04
# 多チャンネル情報を用いた神経細胞の3次元分割と細胞型同定 3D Segmentation of Neuronal Nuclei and Cell-Type Identification using Multi-channel Information ( http://arxiv.org/abs/2410.03248v1 ) ライセンス: Link先を確認	Antonio LaTorre, Lidia Alonso-Nanclares, José María Peña, Javier De Felipe,	(参考訳) 背景分析画像を用いて脳内の異なる細胞の種類を正確に推定することは神経科学の主要な目的である。神経細胞の自動的、選択的検出とセグメンテーションは、神経解剖学的研究において重要なステップである。神経核の3次元再構成を改良し,非神経細胞型以外の領域を分割する手法を提案する。結果は,ラット新皮質からの画像のスタック上で,複雑なシナリオ(大きな画像のスタック,不均一な染色,異なる細胞マーカーを可視化する3つの異なるチャネル)で検証した。神経細胞核と3Dセグメンテーションの良好な識別比を提供することができた。既存の方法との比較: 多くの自動ツールが現在利用可能であるが、異なる方法では、ラベル付けとイメージング技術の違いや、細胞を検知するアルゴリズムの違いにより、同じ脳の領域でも異なる細胞数の推定結果が得られる。さらに、利用可能な自動化ソフトウェア手法のいくつかは、神経解剖学者による評価の後、不正確または不整合であると報告された細胞数を推定した。結論神経細胞、グリア細胞、および血管周囲細胞を識別する自動セグメンテーションのためのツールを持つことは重要である。それは、現在手動で実行されているタスクを大幅にスピードアップし、細胞のカウントを体系的にし、人間のバイアスを避けます。さらに、異なる細胞の3次元再構成により、細胞の空間分布のモデルを生成することができる。 Background Analyzing images to accurately estimate the number of different cell types in the brain using automatic methods is a major objective in neuroscience. The automatic and selective detection and segmentation of neurons would be an important step in neuroanatomical studies. New method We present a method to improve the 3D reconstruction of neuronal nuclei that allows their segmentation, excluding the nuclei of non-neuronal cell types. Results We have tested the algorithm on stacks of images from rat neocortex, in a complex scenario (large stacks of images, uneven staining, and three different channels to visualize different cellular markers). It was able to provide a good identification ratio of neuronal nuclei and a 3D segmentation. Comparison with Existing Methods: Many automatic tools are in fact currently available, but different methods yield different cell count estimations, even in the same brain regions, due to differences in the labeling and imaging techniques, as well as in the algorithms used to detect cells. 
Moreover, some of the available automated software methods have provided estimations of cell numbers that have been reported to be inaccurate or inconsistent after evaluation by neuroanatomists. Conclusions It is critical to have a tool for automatic segmentation that allows discrimination between neurons, glial cells and perivascular cells. It would greatly speed up a task that is currently performed manually and would allow the cell counting to be systematic, avoiding human bias. Furthermore, the resulting 3D reconstructions of different cell types can be used to generate models of the spatial distribution of cells.	翻訳日:2024-11-02 23:28:42 公開日:2024-10-04
# エキスパートレベルの言語モデルはエキスパートレベルのアノテータか? Are Expert-Level Language Models Expert-Level Annotators? ( http://arxiv.org/abs/2410.03254v1 ) ライセンス: Link先を確認	Yu-Min Tseng, Wei-Lin Chen, Chung-Chi Chen, Hsin-Hsi Chen,	(参考訳) データアノテーションは、関連する情報によるテキストデータのラベル付けやタグ付けを指す。 LLMをヒトのアノテータの代替として利用することについて、多くの研究が肯定的な結果を報告している。しかし、既存の研究は古典的なNLPタスクに焦点をあてており、専門家の知識を必要とする領域において、データアノテータとしてのLLMがどの程度の性能を発揮するかは十分に検討されていない。本研究では,高度に専門的な3つの分野にわたる包括的アプローチについて検討し,費用対効果の観点からの実践的提案について考察する。我々の知る限り、我々はLLMを専門家レベルのデータアノテータとして初めて体系的に評価した。 Data annotation refers to the labeling or tagging of textual data with relevant information. A large body of works have reported positive results on leveraging LLMs as an alternative to human annotators. However, existing studies focus on classic NLP tasks, and the extent to which LLMs as data annotators perform in domains requiring expert knowledge remains underexplored. In this work, we investigate comprehensive approaches across three highly specialized domains and discuss practical suggestions from a cost-effectiveness perspective. To the best of our knowledge, we present the first systematic evaluation of LLMs as expert-level data annotators.	翻訳日:2024-11-02 23:28:42 公開日:2024-10-04
# 事前学習言語モデルのファインチューニングにおける語彙適応強化のための適応的BPEトークン化 Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models ( http://arxiv.org/abs/2410.03258v1 ) ライセンス: Link先を確認	Gunjan Balde, Soumyadeep Roy, Mainack Mondal, Niloy Ganguly,	(参考訳) 本研究では, 事前学習言語モデル (PLM) を専門ドメインへ微調整する際に, バイトペア符号化(BPE)トークン化方式を用いる語彙適応手法が持つ基本的な制限を示す。現在のアプローチでは、PLM語彙の最後にターゲットドメイン固有の語彙を単純に付加している。このアプローチは低い優先度スコアをもたらし、与えられたテキストのトークン化にマージルールを反復的に適用するBPEにおいて、準最適なトークン化を引き起こす。この問題を軽減するために,BPEトークン化の初期化フェーズを修正し,文字レベルでトークン化する前に,追加された(ターゲット)語彙に対する最長文字列マッチングを最初に行うAdaptBPEを提案する。各種の分類タスクと要約タスクに対して,AdaptBPEと標準BPEを広範囲に評価し,AdaptBPEは精度で3.57%,Rouge-Lで1.87%向上した。 MEDVOCに対するAdaptBPEは、参照要約のOOV濃度が高い場合や長さが長い場合に、特にうまく機能する。また,人手による評価を行い,AdaptBPEがMEDVOCと比較して,より関連性が高く忠実な要約を生成することを明らかにする。コードベースは https://github.com/gb-kgp/adaptbpe で公開している。 In this work, we show a fundamental limitation in vocabulary adaptation approaches that use Byte-Pair Encoding (BPE) tokenization scheme for fine-tuning pretrained language models (PLMs) to expert domains. Current approaches trivially append the target domain-specific vocabulary at the end of the PLM vocabulary. This approach leads to a lower priority score and causes sub-optimal tokenization in BPE that iteratively uses merge rules to tokenize a given text. To mitigate this issue, we propose AdaptBPE where the BPE tokenization initialization phase is modified to first perform the longest string matching on the added (target) vocabulary before tokenizing at the character level. We perform an extensive evaluation of AdaptBPE versus the standard BPE over various classification and summarization tasks; AdaptBPE improves by 3.57% (in terms of accuracy) and 1.87% (in terms of Rouge-L), respectively. AdaptBPE for MEDVOC works particularly well when reference summaries have high OOV concentration or are longer in length. We also conduct a human evaluation, revealing that AdaptBPE generates more relevant and more faithful summaries as compared to MEDVOC. We make our codebase publicly available at https://github.com/gb-kgp/adaptbpe.	翻訳日:2024-11-02 23:28:42 公開日:2024-10-04
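The modification described in this abstract (longest string matching on the added vocabulary before falling back to character-level splitting, followed by the usual BPE merges) can be sketched as below. This is a toy illustration under stated assumptions, not the authors' code: `longest_match_split`, `apply_merges`, the greedy left-to-right matching order, and the example vocabulary are all hypothetical.

```python
def longest_match_split(word, added_vocab):
    """Initial split: greedy longest match on added_vocab, else single characters."""
    pieces, i = [], 0
    max_len = max((len(tok) for tok in added_vocab), default=0)
    while i < len(word):
        match = None
        # Try the longest candidate first; only multi-character tokens matter here.
        for j in range(min(len(word), i + max_len), i + 1, -1):
            if word[i:j] in added_vocab:
                match = word[i:j]
                break
        if match is None:
            pieces.append(word[i])
            i += 1
        else:
            pieces.append(match)
            i += len(match)
    return pieces


def apply_merges(pieces, merges):
    """Standard BPE step: repeatedly apply the highest-priority applicable merge."""
    rank = {pair: r for r, pair in enumerate(merges)}
    pieces = list(pieces)
    while len(pieces) > 1:
        candidates = [(rank[(a, b)], k)
                      for k, (a, b) in enumerate(zip(pieces, pieces[1:]))
                      if (a, b) in rank]
        if not candidates:
            break
        _, k = min(candidates)  # lowest rank = highest priority, leftmost first
        pieces[k:k + 2] = [pieces[k] + pieces[k + 1]]
    return pieces
```

For instance, with the hypothetical added tokens `{"peri", "carditis"}`, `longest_match_split("pericarditis", ...)` keeps both domain tokens whole, whereas a character-level start followed by low-priority merges would leave the word fragmented.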
# 部分空間アライメントによる回帰テスト時間適応 Test-time Adaptation for Regression by Subspace Alignment ( http://arxiv.org/abs/2410.03263v1 ) ライセンス: Link先を確認	Kazuki Adachi, Shin'ya Yamaguchi, Atsutoshi Kumagai, Tomoki Hamagami,	(参考訳) 本稿では、ソース領域で事前訓練された回帰モデルを、ラベルなしのターゲットデータを含む未知のターゲット分布に適応させる、回帰のためのテスト時間適応(TTA)について検討する。回帰は機械学習の基本的なタスクの1つであるが、既存のTTA手法のほとんどは分類固有の設計を持ち、モデルがクラス分類予測を出力するのに対し、回帰モデルは典型的には単一のスカラー値のみを出力する。回帰のためにTTAを有効にするために、ソースとターゲットドメイン間の特徴分布を整列させてドメインギャップを緩和する機能アライメントアプローチを採用する。しかし, 従来のTTA手法では, 小部分空間に分散し, 生の特徴次元の多くが出力にはほとんど意味がないため, 特徴アライメントが不有効あるいはさらに悪化することが判明した。回帰のためのTTAにおける効果的な特徴アライメントとして,SSA(Significant-subspace Alignment)を提案する。 SSAは、部分空間検出と次元重み付けという2つのコンポーネントから構成される。部分空間検出は、出力に代表的で重要な特徴部分空間を見つける。そして、TTA中にサブスペースで特徴アライメントを行う。一方、次元重み付けは出力により大きな意味を持つ特徴部分空間の次元の重要性を高める。実世界のデータセットにおいて,SSAが様々なベースラインより優れていることを示す。 This paper investigates test-time adaptation (TTA) for regression, where a regression model pre-trained in a source domain is adapted to an unknown target distribution with unlabeled target data. Although regression is one of the fundamental tasks in machine learning, most of the existing TTA methods have classification-specific designs, which assume that models output class-categorical predictions, whereas regression models typically output only single scalar values. To enable TTA for regression, we adopt a feature alignment approach, which aligns the feature distributions between the source and target domains to mitigate the domain gap. However, we found that naive feature alignment employed in existing TTA methods for classification is ineffective or even worse for regression because the features are distributed in a small subspace and many of the raw feature dimensions have little significance to the output. For an effective feature alignment in TTA for regression, we propose Significant-subspace Alignment (SSA). SSA consists of two components: subspace detection and dimension weighting. 
Subspace detection finds the feature subspace that is representative and significant to the output. Then, the feature alignment is performed in the subspace during TTA. Meanwhile, dimension weighting raises the importance of the dimensions of the feature subspace that have greater significance to the output. We experimentally show that SSA outperforms various baselines on real-world datasets.	翻訳日:2024-11-02 23:28:42 公開日:2024-10-04
# $ε$-contaminated Credal Setsの最適輸送 Optimal Transport for $ε$-Contaminated Credal Sets ( http://arxiv.org/abs/2410.03267v1 ) ライセンス: Link先を確認	Michele Caprio,	(参考訳) 我々は,モンジュとカントロヴィッチの最適輸送問題の下側確率版を提供する。下側確率が$\epsilon$-contaminated set の下側包絡である場合、我々のモンジュ問題と、我々のカントロヴィッチ問題の制限版は、それぞれの古典版と一致することを示す。また,我々のカントロヴィッチ問題の最適計画が存在するための十分条件と,2つの問題が等価となるための十分条件についても述べる。副産物として、$\epsilon$-contamination の場合、モンジュとカントロヴィッチの最適輸送問題の下側確率版は必ずしも一致しないことを示す。機械学習と人工知能への本研究の応用についても論じる。 We provide a version for lower probabilities of Monge's and Kantorovich's optimal transport problems. We show that, when the lower probabilities are the lower envelopes of $\epsilon$-contaminated sets, then our version of Monge's, and a restricted version of our Kantorovich's problems, coincide with their respective classical versions. We also give sufficient conditions for the existence of our version of Kantorovich's optimal plan, and for the two problems to be equivalent. As a byproduct, we show that for $\epsilon$-contaminations the lower probability versions of Monge's and Kantorovich's optimal transport problems need not coincide. The applications of our results to Machine Learning and Artificial Intelligence are also discussed.	翻訳日:2024-11-02 23:28:42 公開日:2024-10-04
# 空間探索のための量子ウォークに及ぼす二変量ガウスポテンシャルの影響 Impact of Bivariate Gaussian Potentials on Quantum Walks for Spatial Search ( http://arxiv.org/abs/2410.03269v1 ) ライセンス: Link先を確認	Franklin de L. Marquezino, Raqueline A. M. Santos,	(参考訳) 空間探索問題における量子ウォークの力学に対するポテンシャル場,特に二変量ガウス分布関数の利用の影響について検討する。二次元格子を探索するためのAmbainis-Kempe-Rivosh(AKR)モデル上に構築し,二変量ガウス関数の標準偏差変化と正規化が探索アルゴリズムの性能に与える影響について検討する。その結果,標準偏差が小さい場合,量子ウォークはAKRアルゴリズムを密接に反映するが,標準偏差が大きくなるにつれて成功確率が急速に低下することを示した。この振る舞いは、二変量ガウスがAKRアルゴリズム内の雑音の多いオラクルを効果的にモデル化する方法を示す。さらに、AKRベースのモデルと代替量子ウォークモデルとの比較を、ハダマール硬貨と標準シフトを用いて行った。これらの知見は、量子ウォーク探索アルゴリズムの堅牢性を理解し、量子ウォークを最適化アルゴリズムに適用する方法に関する洞察を与えるのに寄与する。 We examine the impact of potential fields, particularly utilizing a bivariate Gaussian distribution function, on the dynamics of quantum walks in spatial search problems. Building on the Ambainis-Kempe-Rivosh (AKR) model for searching on a two-dimensional grid, we incorporate potential fields to investigate how changes in standard deviation and normalization of the bivariate Gaussian function impact the performance of the search algorithm. Our results show that the quantum walk closely mirrors the AKR algorithm when the standard deviation is small but exhibits a rapid decay in success probability as the standard deviation increases. This behavior demonstrates how the bivariate Gaussian can effectively model a noisy oracle within the AKR algorithm. Additionally, we compare the AKR-based model with an alternative quantum walk model using a Hadamard coin and standard shift. These findings contribute to understanding the robustness of quantum walk search algorithms, and provide insights into how quantum walks can be applied to optimization algorithms.	翻訳日:2024-11-02 23:28:42 公開日:2024-10-04
# 感情を帯びたユーザ生成コンテンツの機械翻訳評価のためのマルチタスク学習フレームワーク A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content ( http://arxiv.org/abs/2410.03277v1 ) ライセンス: Link先を確認	Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo,	(参考訳) ユーザ生成コンテンツ(UGC)の機械翻訳(MT)は、スラング、感情、アイロニーや皮肉といった修辞技法の扱いなど、特有の課題を伴う。現在の評価指標はUGCに遍在するこれらの特徴に焦点を当てていないため、こうした翻訳の品質を評価することは難しい。この問題に対処するために、感情ラベルと、Multi-dimensional Quality Metricsに基づく人手アノテーションの翻訳誤りを含む既存の感情関連データセットを利用する。これを文レベル評価スコアと単語レベルラベルで拡張し、マルチタスク設定での文レベル・単語レベルの翻訳評価と感情分類に適したデータセットを作成する。我々はこれらのタスクを同時に実行する新しいアーキテクチャと、Nash損失やAligned損失のような異なる損失ヒューリスティックを統合した新しい複合損失関数を提案する。本評価では,既存の微調整手法とマルチタスク学習手法を比較し,複数のデータセット上でのアブレーション実験により一般化を評価する。提案手法は最先端性能を実現し,UGCのMT評価に関する包括的な分析を示す。 Machine translation (MT) of user-generated content (UGC) poses unique challenges, including handling slang, emotion, and literary devices like irony and sarcasm. Evaluating the quality of these translations is challenging as current metrics do not focus on these ubiquitous features of UGC. To address this issue, we utilize an existing emotion-related dataset that includes emotion labels and human-annotated translation errors based on Multi-dimensional Quality Metrics. We extend it with sentence-level evaluation scores and word-level labels, leading to a dataset suitable for sentence- and word-level translation evaluation and emotion classification, in a multi-task setting. We propose a new architecture to perform these tasks concurrently, with a novel combined loss function, which integrates different loss heuristics, like the Nash and Aligned losses. Our evaluation compares existing fine-tuning and multi-task learning approaches, assessing generalization with ablative experiments over multiple datasets. Our approach achieves state-of-the-art performance and we present a comprehensive analysis for MT evaluation of UGC.	翻訳日:2024-11-02 23:18:36 公開日:2024-10-04
# デジタル聴診器を用いたマネキン録音心肺音データセット Manikin-Recorded Cardiopulmonary Sounds Dataset Using Digital Stethoscope ( http://arxiv.org/abs/2410.03280v1 ) ライセンス: Link先を確認	Yasaman Torabi, Shahram Shirani, James P. Reilly,	(参考訳) 心臓と肺の音は、医療監視に不可欠です。近年の聴診器技術の進歩により、患者の音をより高い精度で捉えられるようになった。本データセットでは,デジタル聴診器を用いて、個別の録音と混合録音を含む心臓と肺の両方の音を収録した。私たちの知る限りでは、このデータセットは個別の心肺音と混合した心肺音の両方を提供する最初のデータセットです。録音は、ヒトの生理状態を再現するように設計された患者シミュレータである臨床用マネキンから収集され、身体の異なる部位でクリーンな心音と肺音を生成している。このデータセットは、正常な音と様々な異常音(心雑音、心房細動、頻拍、房室ブロック、第3および第4心音、喘鳴、クラックル(断続性ラ音)、いびき音(ロンカイ)、胸膜摩擦音、ゴロゴロ音)を含む。このデータセットは、専門看護師が定めた異なる解剖学的部位で行われた胸部検査の音声記録を含む。それぞれの録音は、特定の音種を強調するために周波数フィルタを用いて強調処理されている。このデータセットは、自動心肺疾患検出、音分類、教師なし分離技術、音声信号処理に関連するディープラーニングアルゴリズムなど、人工知能の応用に有用である。 Heart and lung sounds are crucial for healthcare monitoring. Recent improvements in stethoscope technology have made it possible to capture patient sounds with enhanced precision. In this dataset, we used a digital stethoscope to capture both heart and lung sounds, including individual and mixed recordings. To our knowledge, this is the first dataset to offer both separate and mixed cardiorespiratory sounds. The recordings were collected from a clinical manikin, a patient simulator designed to replicate human physiological conditions, generating clean heart and lung sounds at different body locations. This dataset includes both normal sounds and various abnormalities (i.e., murmur, atrial fibrillation, tachycardia, atrioventricular block, third and fourth heart sound, wheezing, crackles, rhonchi, pleural rub, and gurgling sounds). The dataset includes audio recordings of chest examinations performed at different anatomical locations, as determined by specialist nurses. Each recording has been enhanced using frequency filters to highlight specific sound types. This dataset is useful for applications in artificial intelligence, such as automated cardiopulmonary disease detection, sound classification, unsupervised separation techniques, and deep learning algorithms related to audio signal processing.	翻訳日:2024-11-02 23:18:36 公開日:2024-10-04
# BN-SCAFFOLD:フェデレートラーニングにおけるバッチ正規化統計のドリフト制御 BN-SCAFFOLD: controlling the drift of Batch Normalization statistics in Federated Learning ( http://arxiv.org/abs/2410.03281v1 ) ライセンス: Link先を確認	Gonzalo Iñaki Quintana, Laurence Vancamberg, Vincent Jugnon, Mathilde Mougeot, Agnès Desolneux,	(参考訳) 機械学習(ML)モデルを分散的にトレーニングするための学習パラダイムとして、フェデレートラーニング(FL)が注目を集めている。バッチ正規化(BN)は、収束と一般化を改善するため、ディープニューラルネットワーク(DNN)においてユビキタスである。しかし、BNは異種FLにおけるDNNの性能を阻害すると報告されている。近年、BN統計と全てのクライアントからの勾配を集約することにより、BN上の不均一性の影響を軽減するためにFedTANアルゴリズムが提案されている。しかし、通信コストが高く、DNNの深さとともに線形に増加する。 SCAFFOLDは分散低減アルゴリズムであり、クライアントのドリフトを通信効率のよい方法で推定し、補正する。ヘテロジニアスFL設定の有望な結果にもかかわらず、BNを持つモデルでは性能が劣っていることが報告されている。本研究では、異種FLにおけるBNを用いたDNNの効率的なトレーニング方法として、SCAFFOLD、より一般的には分散低減を復活させることを目指す。 Wang et al. 2023 の研究に触発された、BN-DNN 設定における分散低減アルゴリズムの収束を解析するための統一理論フレームワークを導入し,SCAFFOLD が BN によって生じるバイアスを除去できないことを示す。そこで我々は,SCAFFOLDのクライアントドリフト補正をBN統計に拡張するBN-SCAFFOLDアルゴリズムを提案する。上記のフレームワークを用いて収束を証明し、MNISTとCIFAR-10の実験により理論的結果を検証する。 BN-SCAFFOLDは、高い通信コストを伴わずにFedTANと同等の性能を達成し、フェデレート平均化(FedAvg)、SCAFFOLD、およびBNの不均一性を緩和するために設計された他のFLアルゴリズムよりも優れている。 Federated Learning (FL) is gaining traction as a learning paradigm for training Machine Learning (ML) models in a decentralized way. Batch Normalization (BN) is ubiquitous in Deep Neural Networks (DNN), as it improves convergence and generalization. However, BN has been reported to hinder performance of DNNs in heterogeneous FL. Recently, the FedTAN algorithm has been proposed to mitigate the effect of heterogeneity on BN, by aggregating BN statistics and gradients from all the clients. However, it has a high communication cost, that increases linearly with the depth of the DNN. SCAFFOLD is a variance reduction algorithm, that estimates and corrects the client drift in a communication-efficient manner. Despite its promising results in heterogeneous FL settings, it has been reported to underperform for models with BN. In this work, we seek to revive SCAFFOLD, and more generally variance reduction, as an efficient way of training DNN with BN in heterogeneous FL. We introduce a unified theoretical framework for analyzing the convergence of variance reduction algorithms in the BN-DNN setting, inspired by the work of Wang et al. 2023, and show that SCAFFOLD is unable to remove the bias introduced by BN. We thus propose the BN-SCAFFOLD algorithm, which extends the client drift correction of SCAFFOLD to BN statistics. We prove convergence using the aforementioned framework and validate the theoretical results with experiments on MNIST and CIFAR-10. BN-SCAFFOLD equals the performance of FedTAN, without its high communication cost, outperforming Federated Averaging (FedAvg), SCAFFOLD, and other FL algorithms designed to mitigate BN heterogeneity.	翻訳日:2024-11-02 23:18:36 公開日:2024-10-04
# ボルツマン密度からのニューラルサンプリング:ワッサーシュタイン幾何学におけるフィッシャー・ラオ曲線 Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry ( http://arxiv.org/abs/2410.03282v1 ) ライセンス: Link先を確認	Jannis Chemseddine, Christian Wald, Richard Duong, Gabriele Steidl,	(参考訳) 単純な密度$\rho_Z$ から始まりエネルギー$f_t$ で与えられるボルツマン曲線を学習することにより、非正規化ボルツマン密度$\rho_D$ からサンプリングするタスクを扱う。まず、フィッシャー・ラオ流がワッサーシュタイン幾何学において絶対連続である条件を検討する。第二に、特定の補間 $f_t$ と、関連する密度/速度対 $(\rho_t,v_t)$ の学習を扱う。速度場$v_t$のパラメトリゼーションしか必要としない線形補間は「質量のテレポーテーション」の問題に悩まされることが数値的に観察されている。ワッサーシュタイン幾何学のツールを用いて、速度場の爆発を正確に測定できる解析的な例を示す。 $f_t$ と $v_t$ の両方をパラメタライズする Máté と Fleuret に着想を得て、$f_t$ のみをパラメタライズし、適切な $v_t$ を固定する補間法を提案する。これはランジュバン力学に関連するカルバック・ライブラー発散のワッサーシュタイン勾配流に対応している。本モデルが,上記のサンプリング課題をうまく解く、振る舞いの良い流れ場を与えることを数値例により示す。 We deal with the task of sampling from an unnormalized Boltzmann density $\rho_D$ by learning a Boltzmann curve given by energies $f_t$ starting in a simple density $\rho_Z$. First, we examine conditions under which Fisher-Rao flows are absolutely continuous in the Wasserstein geometry. Second, we address specific interpolations $f_t$ and the learning of the related density/velocity pairs $(\rho_t,v_t)$. It was numerically observed that the linear interpolation, which requires only a parametrization of the velocity field $v_t$, suffers from a "teleportation-of-mass" issue. Using tools from the Wasserstein geometry, we give an analytical example, where we can precisely measure the explosion of the velocity field. Inspired by Máté and Fleuret, who parametrize both $f_t$ and $v_t$, we propose an interpolation which parametrizes only $f_t$ and fixes an appropriate $v_t$. This corresponds to the Wasserstein gradient flow of the Kullback-Leibler divergence related to Langevin dynamics. We demonstrate by numerical examples that our model provides a well-behaved flow field which successfully solves the above sampling task.	翻訳日:2024-11-02 23:18:36 公開日:2024-10-04
# uniINF:パラメータフリーな重い裾を持つMABのためのBest-of-Both-Worldsアルゴリズム uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs ( http://arxiv.org/abs/2410.03284v1 ) ライセンス: Link先を確認	Yu Chen, Jiatai Huang, Yan Dai, Longbo Huang,	(参考訳) 本稿では,HTMAB(Heavy-Tailed Multi-Armed Bandits)問題に対する新しいアルゴリズムuniINFを提案し、確率的環境と敵対的環境の両方における頑健性と適応性を示す。時間とともに損失分布が定常である確率的MAB設定とは異なり、本研究は、アームと時間の両方に依存する重い裾を持つ分布から損失が生成される敵対的な設定にまで拡張する。我々の新しいアルゴリズム「uniINF」は、いわゆるBest-of-Both-Worlds(BoBW)特性を持ち、正確な環境タイプを知らずに確率的環境と敵対的環境の両方で最適に機能する。さらに,本アルゴリズムはパラメータフリーの特徴も備えており,重い裾のパラメータ $(\sigma, \alpha)$ を事前に知らずに動作する。正確に言うと、uniINFは確率的環境と敵対的環境の両方においてほぼ最適なリグレットを保証し、$(\sigma, \alpha)$が既知の場合の対応する下界と(対数因子を除いて)一致する。我々の知る限り、uniINFは重い裾を持つMAB問題に対してBoBW特性を達成する最初のパラメータフリーアルゴリズムである。技術的には、パラメータフリーHTMABのBoBW保証を実現する革新的な技術を開発しており、これにはログバリアのダイナミクスの洗練された解析、自動バランス型の学習率スケジューリングスキーム、適応的なスキップ・クリッピング損失調整技術、対数リグレットのための停止時間解析が含まれる。 In this paper, we present a novel algorithm, uniINF, for the Heavy-Tailed Multi-Armed Bandits (HTMAB) problem, demonstrating robustness and adaptability in both stochastic and adversarial environments. Unlike the stochastic MAB setting where loss distributions are stationary with time, our study extends to the adversarial setup, where losses are generated from heavy-tailed distributions that depend on both arms and time. Our novel algorithm `uniINF` enjoys the so-called Best-of-Both-Worlds (BoBW) property, performing optimally in both stochastic and adversarial environments without knowing the exact environment type. Moreover, our algorithm also possesses a Parameter-Free feature, i.e., it operates without the need of knowing the heavy-tail parameters $(\sigma, \alpha)$ a-priori. To be precise, uniINF ensures nearly-optimal regret in both stochastic and adversarial environments, matching the corresponding lower bounds when $(\sigma, \alpha)$ is known (up to logarithmic factors). To our knowledge, uniINF is the first parameter-free algorithm to achieve the BoBW property for the heavy-tailed MAB problem. Technically, we develop innovative techniques to achieve BoBW guarantees for Parameter-Free HTMABs, including a refined analysis for the dynamics of log-barrier, an auto-balancing learning rate scheduling scheme, an adaptive skipping-clipping loss tuning technique, and a stopping-time analysis for logarithmic regret.	翻訳日:2024-11-02 23:18:35 公開日:2024-10-04
# 計算外交 : 「善のためのハッカソン」がデジタル時代の多国間主義の参加型の未来にどう貢献するか Computational Diplomacy: How "hackathons for good" feed a participatory future for multilateralism in the digital age ( http://arxiv.org/abs/2410.03286v1 ) ライセンス: Link先を確認	Thomas Maillart, Lucia Gomez, Ewa Lombard, Alexander Nolte, Francesco Pisano,	(参考訳) この記事では、グローバルなSDGの課題に取り組むソフトウェアおよびハードウェア開発者のコミュニティを構築する上での「善のためのハッカソン」の役割について考察する。我々は、この動きを計算外交として理論化する。すなわち、主要なグローバル課題に取り組むために集合知を活用する、デジタルガバナンスのための分散的で参加型のプロセスである。 DevpostとGitHubのデータを分析すると、2010年以降のハッカソンの30%がSDGのトピックに取り組み、革新的なソリューションを作るためにさまざまな技術を採用していることが分かる。ハッカソンは重要なカイロス(好機)の瞬間として機能し、即時のプロジェクト成果と長期的な生産の両方を駆動するイノベーションのバーストを引き起こす。これらのイベントは、人間の協力と共感の神経生物学的基盤を活用し、集団としての目的意識を育み、対人偏見を減らすと我々は提案する。デジタルガバナンスに対するこのボトムアップアプローチは、ソフトウェア開発、人間の集合知、集団行動を統合し、変革的な変化のための動的モデルを作り出す。カイロスの瞬間を活用することで、計算外交は、将来のデジタル多国間ガバナンスのためのより包括的で効果的なモデルを促進する。 This article explores the role of hackathons for good in building a community of software and hardware developers focused on addressing global SDG challenges. We theorise this movement as computational diplomacy: a decentralised, participatory process for digital governance that leverages collective intelligence to tackle major global issues. Analysing Devpost and GitHub data reveals that 30% of hackathons since 2010 have addressed SDG topics, employing diverse technologies to create innovative solutions. Hackathons serve as crucial kairos moments, sparking innovation bursts that drive both immediate project outcomes and long-term production. We propose that these events harness the neurobiological basis of human cooperation and empathy, fostering a collective sense of purpose and reducing interpersonal prejudice. This bottom-up approach to digital governance integrates software development, human collective intelligence, and collective action, creating a dynamic model for transformative change. By leveraging kairos moments, computational diplomacy promotes a more inclusive and effective model for digital multilateral governance of the future.	翻訳日:2024-11-02 23:18:35 公開日:2024-10-04
# セマンティックセグメンテーションに基づく病理組織全スライド画像の品質管理 Semantic Segmentation Based Quality Control of Histopathology Whole Slide Images ( http://arxiv.org/abs/2410.03289v1 ) ライセンス: Link先を確認	Abhijeet Patil, Garima Jain, Harsh Diwakar, Jay Sawant, Tripti Bameta, Swapnil Rane, Amit Sethi,	(参考訳) 我々は, さまざまなレベルのぼけ, 組織領域, 組織の折れ, ペンマークなど, さまざまな領域をセグメンテーションする, 病理組織全スライド画像(WSI)の品質管理(QC)のためのソフトウェアパイプラインを開発した。 WSIを処理するためのGPUの必要性と利用可能性の向上を踏まえ、提案したパイプラインは、精度と速度のバランスをとるために、複数の軽量ディープラーニングモデルで構成されている。パイプラインはTCGA全体で評価された。TCGAは、28の臓器から11,000以上の病理組織画像を含む、公開されているWSIデータセットとして最大のものである。深層学習に基づかない先行研究と比較したところ、臓器をまたいでセグメンテーション結果が一貫して改善された。組織およびぼけのセグメンテーションに対するアノテーションの労力を最小限に抑えるため, パッチ分類ツールHistoROIを用いてラベルを同定したさまざまなWSIのパッチ(サブイメージ)をモザイク状に組み合わせることで, 注釈付き画像を自動的に作成した。トレーニング済みのQCパイプラインの汎用性と、その広範なテストにより、この研究の潜在的な影響は広い。研究と臨床の両方において、大規模な病理組織画像解析の精度と信頼性を高めるために、あらゆるWSIコホートの自動前処理に使用できる。トレーニング済みモデル、トレーニングスクリプト、トレーニングデータ、推論結果は https://github.com/abhijeetptl5/wsisegqc で公開している。 We developed a software pipeline for quality control (QC) of histopathology whole slide images (WSIs) that segments various regions, such as blurs of different levels, tissue regions, tissue folds, and pen marks. Given the necessity and increasing availability of GPUs for processing WSIs, the proposed pipeline comprises multiple lightweight deep learning models to strike a balance between accuracy and speed. The pipeline was evaluated in all TCGAs, which is the largest publicly available WSI dataset containing more than 11,000 histopathological images from 28 organs. It was compared to a previous work, which was not based on deep learning, and it showed consistent improvement in segmentation results across organs. To minimize annotation effort for tissue and blur segmentation, annotated images were automatically prepared by mosaicking patches (sub-images) from various WSIs whose labels were identified using a patch classification tool HistoROI. Due to the generality of our trained QC pipeline and its extensive testing the potential impact of this work is broad. 
It can be used for automated pre-processing any WSI cohort to enhance the accuracy and reliability of large-scale histopathology image analysis for both research and clinical use. We have made the trained models, training scripts, training data, and inference results publicly available at https://github.com/abhijeetptl5/wsisegqc, which should enable the research community to use the pipeline right out of the box or further customize it to new datasets and applications in the future.	翻訳日:2024-11-02 23:18:35 公開日:2024-10-04
# Grounded-VideoLLM:ビデオ大規模言語モデルにおける細粒度の時間的グラウンディングの強化 Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ( http://arxiv.org/abs/2410.03290v1 ) ライセンス: Link先を確認	Haibo Wang, Zhiyang Xu, Yu Cheng, Shizhe Diao, Yufan Zhou, Yixin Cao, Qifan Wang, Weifeng Ge, Lifu Huang,	(参考訳) ビデオ大規模言語モデル (Video-LLMs) は、粗粒度のビデオ理解において顕著な能力を示したが、細粒度の時間的グラウンディングには苦戦している。本稿では,特定の映像モーメントをきめ細かく知覚・推論できる新しいVideo-LLMであるGrounded-VideoLLMを紹介する。現在のVideo-LLMは、効果的な時間モデリングとタイムスタンプ表現が欠如しているため,細粒度のビデオ理解に限界があることを明らかにする。そこで我々は,(1)フレーム間の関係を符号化するための追加の時間ストリームと,(2)タイムスタンプを表現するための、特定の時間知識を付与した離散的時間トークンを組み込むことにより,モデルを強化する。 Grounded-VideoLLMのトレーニングを最適化するために、簡単なビデオキャプションタスクから始め、複雑さを増すビデオ時間的グラウンディングタスクを段階的に導入する多段階トレーニング方式を採用する。 Grounded-VideoLLMの時間的推論能力をさらに強化するため、自動アノテーションパイプラインによりグラウンディング付きVideoQAデータセットも構築する。広範な実験により、Grounded-VideoLLMは、時間的文グラウンディング、高密度ビデオキャプション、グラウンディング付きVideoQAといった細粒度のグラウンディングタスクに優れるだけでなく、一般的なビデオ理解のための多目的ビデオアシスタントとしても大きな可能性を示す。 Video Large Language Models (Video-LLMs) have demonstrated remarkable capabilities in coarse-grained video understanding, however, they struggle with fine-grained temporal grounding. In this paper, we introduce Grounded-VideoLLM, a novel Video-LLM adept at perceiving and reasoning over specific video moments in a fine-grained manner. We identify that current Video-LLMs have limitations for fine-grained video understanding since they lack effective temporal modeling and timestamp representation. In light of this, we sharpen our model by incorporating (1) an additional temporal stream to encode the relationships between frames and (2) discrete temporal tokens enriched with specific time knowledge to represent timestamps. To optimize the training of Grounded-VideoLLM, we employ a multi-stage training scheme, beginning with simple video-captioning tasks and progressively introducing video temporal grounding tasks of increasing complexity. To further enhance Grounded-VideoLLM's temporal reasoning capability, we also curate a grounded VideoQA dataset by an automatic annotation pipeline. Extensive experiments demonstrate that Grounded-VideoLLM not only excels in fine-grained grounding tasks such as temporal sentence grounding, dense video captioning, and grounded VideoQA, but also shows great potential as a versatile video assistant for general video understanding.	翻訳日:2024-11-02 23:18:35 公開日:2024-10-04
# 動的システムのコンテキスト内学習のための拡張トランスフォーマーアーキテクチャ Enhanced Transformer architecture for in-context learning of dynamical systems ( http://arxiv.org/abs/2410.03291v1 ) ライセンス: Link先を確認	Matteo Rufolo, Dario Piga, Gabriele Maroni, Marco Forgione,	(参考訳) 著者らの一部によって最近導入されたインコンテキスト同定パラダイムは、システムのクラス全体の振る舞いを記述するメタモデルを、オフラインで合成データに基づいて推定することを目的としている。訓練後、このメタモデルには実システムによって生成された観測済みの入出力シーケンス(コンテキスト)が入力され、その振る舞いをゼロショット学習方式で予測する。本稿では、確率的フレームワーク内で学習タスクを定式化すること、非連続的なコンテキストウィンドウとクエリウィンドウを扱うこと、長いコンテキストシーケンスを効果的に扱うために再帰的パッチング(recurrent patching)を採用すること、という3つの主要な革新を通じて、元のメタモデリングフレームワークを強化する。これらの修正の有効性は、Wiener-Hammersteinシステムクラスに焦点を当てた数値例を通して示され、モデルの性能とスケーラビリティの向上が強調される。 Recently introduced by some of the authors, the in-context identification paradigm aims at estimating, offline and based on synthetic data, a meta-model that describes the behavior of a whole class of systems. Once trained, this meta-model is fed with an observed input/output sequence (context) generated by a real system to predict its behavior in a zero-shot learning fashion. In this paper, we enhance the original meta-modeling framework through three key innovations: by formulating the learning task within a probabilistic framework; by managing non-contiguous context and query windows; and by adopting recurrent patching to effectively handle long context sequences. The efficacy of these modifications is demonstrated through a numerical example focusing on the Wiener-Hammerstein system class, highlighting the model's enhanced performance and scalability.	翻訳日:2024-11-02 23:18:35 公開日:2024-10-04
# 深層選択的状態空間モデルのトークンダイナミクスの解明 Demystifying the Token Dynamics of Deep Selective State Space Models ( http://arxiv.org/abs/2410.03292v1 ) ライセンス: Link先を確認	Thieu N Vo, Tung D. Pham, Xin T. Tong, Tan Minh Nguyen,	(参考訳) Mamba のような選択的状態空間モデル (SSM) は、系列データのモデリングにおける有効性で注目を集めている。その卓越した経験的性能にもかかわらず、深層選択的SSMの包括的な理論的理解は依然として得られておらず、高い忠実性を必要とするアプリケーションに向けたさらなる開発と採用を妨げている。本稿では,事前学習されたMambaモデルにおけるトークンの動的特性について検討する。特に,Mambaモデルの連続時間極限を支配する力学系を導出し,その解の漸近挙動を特徴づける。一次元の場合、すべてのトークンが 0 に収束するか、すべてのトークンが無限大に発散するかという2つのシナリオのうち、いずれか一方のみが起こることを証明する。各シナリオがいつ発生するかを決定するために、モデルパラメータに基づく基準を与える。収束シナリオについては、このシナリオがモデルの性能に悪影響を及ぼすことを実証的に検証する。発散シナリオについては、異なるトークンが異なる速度で無限大に発散し、その結果モデルトレーニング中の更新への寄与が不均等になることを証明する。これらの調査に基づき,収束シナリオを除外することと、重要度スコアに基づいてトークンを並べ替えることという、いずれも実用的な性能の向上を目的としたモデルの2つの改良を提案する。実験結果はこれらの改良を検証し、実世界の応用におけるMambaの有効性を高めるための洞察を与える。 Selective state space models (SSM), such as Mamba, have gained prominence for their effectiveness in modeling sequential data. Despite their outstanding empirical performance, a comprehensive theoretical understanding of deep selective SSM remains elusive, hindering their further development and adoption for applications that need high fidelity. In this paper, we investigate the dynamical properties of tokens in a pre-trained Mamba model. In particular, we derive the dynamical system governing the continuous-time limit of the Mamba model and characterize the asymptotic behavior of its solutions. In the one-dimensional case, we prove that only one of the following two scenarios happens: either all tokens converge to zero, or all tokens diverge to infinity. We provide criteria based on model parameters to determine when each scenario occurs. For the convergent scenario, we empirically verify that this scenario negatively impacts the model's performance. For the divergent scenario, we prove that different tokens will diverge to infinity at different rates, thereby contributing unequally to the updates during model training. Based on these investigations, we propose two refinements for the model: excluding the convergent scenario and reordering tokens based on their importance scores, both aimed at improving practical performance. Our experimental results validate these refinements, offering insights into enhancing Mamba's effectiveness in real-world applications.	翻訳日:2024-11-02 23:18:35 公開日:2024-10-04
# 多言語テキスト分類におけるゼロショット自己説明と人間による根拠の比較 Comparing zero-shot self-explanations with human rationales in multilingual text classification ( http://arxiv.org/abs/2410.03296v1 ) ライセンス: Link先を確認	Stephanie Brandl, Oliver Eberle,	(参考訳) インストラクションチューニングされたLLMは、勾配計算や複雑になりうるXAI手法の適用を必要としない自己説明を生成することで、その出力についての説明をユーザに提供できる。本稿では,入力根拠(rationale)の形をとる自己説明を、人間にとっての妥当性とモデルに対する忠実性の観点から評価することで、この能力が良い説明をもたらすかどうかを分析する。そのために,感情分類と強制労働検出という2つのテキスト分類タスクを用いる。英語に加えて、感情分類タスクのデンマーク語訳とイタリア語訳も含め、全サンプルについて自己説明と人間のアノテーションを比較する。直接比較を可能にするため,事後的な特徴帰属手法である層別関連性伝播(LRP)も計算し、このパイプラインを4つのLLM(Llama2,Llama3,Mistral,Mixtral)に適用する。その結果,自己説明はLRPよりも人間のアノテーションと密接に一致し,忠実性は同等の水準を維持することがわかった。 Instruction-tuned LLMs are able to provide an explanation about their output to users by generating self-explanations that do not require gradient computations or the application of possibly complex XAI methods. In this paper, we analyse whether this ability results in a good explanation by evaluating self-explanations in the form of input rationales with respect to their plausibility to humans as well as their faithfulness to models. For this, we apply two text classification tasks: sentiment classification and forced labour detection. Next to English, we further include Danish and Italian translations of the sentiment classification task and compare self-explanations to human annotations for all samples. To allow for direct comparisons, we also compute post-hoc feature attribution, i.e., layer-wise relevance propagation (LRP) and apply this pipeline to 4 LLMs (Llama2, Llama3, Mistral and Mixtral). Our results show that self-explanations align more closely with human annotations compared to LRP, while maintaining a comparable level of faithfulness.	翻訳日:2024-11-02 23:08:51 公開日:2024-10-04
# マルコフ領域および非マルコフ領域における外部駆動・散逸・集団効果の相互作用 Interplay between external driving, dissipation and collective effects in the Markovian and non-Markovian regimes ( http://arxiv.org/abs/2410.03297v1 ) ライセンス: Link先を確認	Roie Dann,	(参考訳) 量子光学系の時間発展は、周囲環境との相互作用、外部から制御されるレーザーとの相互作用、異なるシステム構成要素間の相互作用という3つの重要な要素によって決定される。この3つの動的寄与の間の相互作用を理解することは、非平衡現象の研究や技術応用に不可欠である。本研究では、ボソン場に同時に結合した駆動光学系における開放系の現象を調べる。フォトニック結晶に結合したマイクロキャビティの線形系について、環境との相互作用と外部制御が、印加されたコヒーレント駆動に有意な非マルコフ補正をもたらすことを解析的に示す。さらに、複数のモードが同じ場に結合している場合、あるモードに印加されたレーザーが他のモードを実効的に駆動する、集団的なクロスドライブ効果が生じる。線形解に基づいて、2準位エミッタに対する非マルコフ的マスター方程式が導出される。注目すべきことに、提案された運動方程式は、エミッタをボソンモードで近似できない中程度の駆動強度においても正確である。非線形性の影響を解析し、厳密な擬モード解に対してベンチマークし、マルコフ領域で確立されたマスター方程式と比較する。この領域内において、比較により、環境の帯域幅の逆数を大きく超える時間においても短時間の非マルコフ効果が存在すること、および短いレーザーパルスによって誘起されるメモリ効果が示される。これらの発見は、量子光学系、固体材料に埋め込まれた不純物、分子系などの駆動開放系のダイナミクスに関する貴重な洞察を与え、それらの量子状態の精密な制御への道を開く。 The evolution of quantum optical systems is determined by three key factors: the interactions with their surrounding environment, externally controlled lasers and between the different system components. Understanding the interplay between the three dynamical contributions is essential for the study of out-of-equilibrium phenomena as well as technological applications. The present study investigates open system phenomena in driven optical systems coupled simultaneously to a bosonic field. For a linear system of micro-cavities coupled to a photonic crystal, it is analytically shown that environmental interaction and external control cause significant non-Markovian corrections to the applied coherent drive. Additionally, collective cross-driving effects arise when multiple modes are coupled to the same field, where a laser applied to one mode effectively drives other modes. Based on the linear solution, a non-Markovian master equation for two-level emitters is derived. Remarkably, the proposed equation of motion remains accurate even for moderate driving intensities, where emitters cannot be approximated by bosonic modes. 
The influence of the non-linearity is analyzed and benchmarked against an exact pseudo-mode solution, and compared with established master equations in the Markovian regime. Within this regime, the comparison demonstrates the presence of short-time non-Markovian effects at times well beyond the inverse of the environment's bandwidth, and memory effects induced by short laser pulses. These findings offer valuable insights into the driven open system dynamics of quantum optical systems, impurities embedded in solid-state materials, molecular systems, and more, paving the way for precise control of their quantum states.	翻訳日:2024-11-02 23:08:51 公開日:2024-10-04
# SELU: 未知環境における自己学習する身体性MLLM SELU: Self-Learning Embodied MLLMs in Unknown Environments ( http://arxiv.org/abs/2410.03303v1 ) ライセンス: Link先を確認	Boyu Li, Haobin Jiang, Ziluo Ding, Xinrun Xu, Haoran Li, Dongbin Zhao, Zongqing Lu,	(参考訳) 近年,マルチモーダル大規模言語モデル (MLLM) は強力な視覚理解能力と意思決定能力を示しており,未知の環境でMLLMを自律的に改善する探究が可能になっている。しかし、人間や環境からのフィードバックのような外部フィードバックは、必ずしも利用できない。この課題に対処するため,既存の手法は主に投票や採点の仕組みによるMLLMの意思決定能力の向上に重点を置いているが,未知環境におけるMLLMの環境理解の向上にはほとんど努力が払われていない。 MLLMの自己学習の可能性を完全に引き出すために,強化学習におけるアクター・クリティックのパラダイムに着想を得た,SELUと呼ばれる新しいアクター・クリティック型の自己学習パラダイムを提案する。クリティックは、アクターが収集したインタラクション軌跡から知識を抽出するために、自己質問(self-asking)と後知恵による再ラベル付け(hindsight relabeling)を用い、それによって環境理解を強化する。同時に、アクターはクリティックが提供する自己フィードバックによって改善され、意思決定が強化される。 AI2-THORおよびVirtualHome環境で本手法を評価したところ、SELUは自己学習により、クリティックで約28%と30%、アクターで約20%と24%の改善を達成した。 Recently, multimodal large language models (MLLMs) have demonstrated strong visual understanding and decision-making capabilities, enabling the exploration of autonomously improving MLLMs in unknown environments. However, external feedback like human or environmental feedback is not always available. To address this challenge, existing methods primarily focus on enhancing the decision-making capabilities of MLLMs through voting and scoring mechanisms, while little effort has been paid to improving the environmental comprehension of MLLMs in unknown environments. To fully unleash the self-learning potential of MLLMs, we propose a novel actor-critic self-learning paradigm, dubbed SELU, inspired by the actor-critic paradigm in reinforcement learning. The critic employs self-asking and hindsight relabeling to extract knowledge from interaction trajectories collected by the actor, thereby augmenting its environmental comprehension. Simultaneously, the actor is improved by the self-feedback provided by the critic, enhancing its decision-making. We evaluate our method in the AI2-THOR and VirtualHome environments, and SELU achieves critic improvements of approximately 28% and 30%, and actor improvements of about 20% and 24% via self-learning.	翻訳日:2024-11-02 23:08:51 公開日:2024-10-04
# ユニタリケイリーグラフにおける隣接行列ハミルトニアンに支配される量子分数復元 Quantum fractional revival governed by adjacency matrix Hamiltonian in unitary Cayley graphs ( http://arxiv.org/abs/2410.03310v1 ) ライセンス: Link先を確認	Rachana Soni, Neelam Choudhary, Navneet Pratap Singh,	(参考訳) 本稿では、隣接行列をハミルトニアンとして用い、ユニタリケイリーグラフにおける量子分数復元の存在を特徴づける。ユニタリケイリーグラフ $X=(Z_n, S)$ は、接続集合 $S \subseteq Z_n$ が $n$ と互いに素な元の集まりである特別なグラフである。ユニタリケイリーグラフは整グラフであり、その隣接行列は巡回行列である。ユニタリケイリーグラフにおける量子分数復元は、頂点の数が偶数である場合にのみ存在することを証明する。量子分数復元を許容するユニタリケイリーグラフに対して、数論的およびスペクトル的特徴付けを与える。量子分数復元は量子エンタングルメントに類似している。これは量子情報の通信に有用な量子ビット状態転送現象の1つである。 In this article, we give characterization for existence of quantum fractional revival in unitary Cayley graph utilizing adjacency matrix Hamiltonian. Unitary Cayley graph $X=( Z_n, S)$ is a special graph as connection set $S \subseteq Z_n$ is the collection of coprimes to $n$. Unitary Cayley graph is an integral graph and its adjacency matrix is a circulant one. We prove that quantum fractional revival in unitary Cayley graphs exists only when the number of vertices is even. Number-theoretic and spectral characterizations are given for unitary Cayley graph admitting quantum fractional revival. Quantum fractional revival is analogous to quantum entanglement. It is one of qubit state transfer phenomena useful in communication of quantum information.	翻訳日:2024-11-02 23:08:51 公開日:2024-10-04
# Quo Vadis, Motion Generation? 大規模言語モデルから大規模運動モデルへ Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models ( http://arxiv.org/abs/2410.03311v1 ) ライセンス: Link先を確認	Ye Wang, Sipeng Zheng, Bin Cao, Qianshan Wei, Qin Jin, Zongqing Lu,	(参考訳) 近年のLLMの成功に触発されて、人間の動き理解の分野は、大きな動きモデルの開発へと移りつつある。いくつかの進歩にもかかわらず、現在の最先端の作業は、大規模で高品質なモーションデータがないために、真のジェネラリストモデルを達成するには程遠いままである。これを解決するために、最初の100万レベルのモーション生成ベンチマークであるMotionBaseを紹介し、前回の最大データセットの15倍のデータ量を提供し、階層的な詳細なテキスト記述を備えたマルチモーダルデータを特徴付ける。この膨大なデータセットを活用することで、我々の大きな動きモデルは、目に見えないものを含む幅広い動きの強いパフォーマンスを示す。組織的な調査を通じて、我々は、データ取得コストの軽減に重要な役割を果たす合成データと擬似ラベルを用いて、データサイズとモデルサイズの両方をスケールすることの重要性を強調した。さらに,本研究では,既存の評価指標,特にドメイン外のテキスト命令を扱う際の限界を明らかにする。さらに,動作情報を保存し,コードブックの容量を拡大し,大規模動きモデルの表現能力を向上する,動きトークン化のための新しい2次元ルックアップフリーアプローチを提案する。 MotionBaseのリリースとこの研究から得られた知見は、より強力で汎用的なモーション生成モデルを開発するための道を開くことが期待されている。 Inspired by the recent success of LLMs, the field of human motion understanding has increasingly shifted towards the development of large motion models. Despite some progress, current state-of-the-art works remain far from achieving truly generalist models, largely due to the lack of large-scale, high-quality motion data. To address this, we present MotionBase, the first million-level motion generation benchmark, offering 15 times the data volume of the previous largest dataset, and featuring multimodal data with hierarchically detailed text descriptions. By leveraging this vast dataset, our large motion model demonstrates strong performance across a broad range of motions, including unseen ones. Through systematic investigation, we underscore the importance of scaling both data and model size, with synthetic data and pseudo labels playing a crucial role in mitigating data acquisition costs. Moreover, our research reveals the limitations of existing evaluation metrics, particularly in handling out-of-domain text instructions -- an issue that has long been overlooked. 
In addition to these, we introduce a novel 2D lookup-free approach for motion tokenization, which preserves motion information and expands codebook capacity, further enhancing the representative ability of large motion models. The release of MotionBase and the insights gained from this study are expected to pave the way for the development of more powerful and versatile motion generation models.	翻訳日:2024-11-02 23:08:51 公開日:2024-10-04
# 大規模言語モデルを用いたASR後感情認識における文脈とシステム融合 Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models ( http://arxiv.org/abs/2410.03312v1 ) ライセンス: Link先を確認	Pavel Stepachev, Pinzhen Chen, Barry Haddow,	(参考訳) 大規模言語モデル(LLM)は、音声とテキストのモデリングにおいて重要な役割を担い始めている。我々は、文脈と複数システムの出力をASR後の音声感情予測に最適に活用する方法を探るため、GenSEC という最近のタスクで LLM のプロンプティングを検討する。我々の手法には、ASR書き起こしのランキング、可変長の会話文脈、システム出力の融合が含まれる。会話文脈の効果は逓減すること、および予測に用いる書き起こしを選択する指標が極めて重要であることを示す。最後に、我々の最良の提出システムは、提供されたベースラインを絶対精度で20%上回る。 Large language models (LLMs) have started to play a vital role in modelling speech and text. To explore the best use of context and multiple systems' outputs for post-ASR speech emotion prediction, we study LLM prompting on a recent task named GenSEC. Our techniques include ASR transcript ranking, variable conversation context, and system output fusion. We show that the conversation context has diminishing returns and the metric used to select the transcript for prediction is crucial. Finally, our best submission surpasses the provided baseline by 20% in absolute accuracy.	翻訳日:2024-11-02 23:08:51 公開日:2024-10-04
# インパクト指向の個人化フェデレーション学習 Influence-oriented Personalized Federated Learning ( http://arxiv.org/abs/2410.03315v1 ) ライセンス: Link先を確認	Yue Tan, Guodong Long, Jing Jiang, Chengqi Zhang,	(参考訳) 伝統的な連合学習(FL)法は、しばしばパラメータアグリゲーションの固定重み付けに依存し、他者による相互影響を無視している。したがって、不均一なデータコンテキストにおけるそれらの有効性は限られている。この問題に対処するために,各クライアントに対して適応パラメータアグリゲーションを実現するために,クライアントレベルとクラスレベルの影響を定量的に測定する,影響指向のフェデレーション学習フレームワークであるFedC^2Iを提案する。我々の中核となる考え方は、十分に構成された影響ベクトルと影響行列を用いて、FLシステム内のクライアント間影響を明示的にモデル化することである。インフルエンスベクトルは、クライアントレベルの影響を定量化し、クライアントが他者からの知識を選択的に取得し、特徴表現層の集約をガイドする。一方、影響行列は、パーソナライズされた分類器アグリゲーションを達成するために、よりきめ細かな方法でクラスレベルの影響をキャプチャする。非IID環境下での既存のフェデレート学習手法に対するFedC^2Iの性能評価を行い,本手法の優位性を実証した。 Traditional federated learning (FL) methods often rely on fixed weighting for parameter aggregation, neglecting the mutual influence by others. Hence, their effectiveness in heterogeneous data contexts is limited. To address this problem, we propose an influence-oriented federated learning framework, namely FedC^2I, which quantitatively measures Client-level and Class-level Influence to realize adaptive parameter aggregation for each client. Our core idea is to explicitly model the inter-client influence within an FL system via the well-crafted influence vector and influence matrix. The influence vector quantifies client-level influence, enables clients to selectively acquire knowledge from others, and guides the aggregation of feature representation layers. Meanwhile, the influence matrix captures class-level influence in a more fine-grained manner to achieve personalized classifier aggregation. We evaluate the performance of FedC^2I against existing federated learning methods under non-IID settings and the results demonstrate the superiority of our method.	翻訳日:2024-11-02 22:58:38 公開日:2024-10-04
# Visual-O1:マルチモーダル・マルチターンの思考連鎖推論による曖昧な指示の理解 Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning ( http://arxiv.org/abs/2410.03321v1 ) ライセンス: Link先を確認	Minheng Ni, Yutao Fan, Lei Zhang, Wangmeng Zuo,	(参考訳) 大規模モデルが進化するにつれて、言語による指示はマルチモーダルタスクでますます活用されている。人間の言語習慣のため、これらの指示は現実のシナリオにおいてしばしば曖昧さを含み、正確な解釈のために視覚的文脈や常識の統合を必要とする。しかし、高度に知的な大規模モデルでさえ、曖昧な指示に対して顕著な性能上の限界を示し、曖昧さ解消の推論能力の弱さは破滅的な誤りを引き起こす可能性がある。この問題に対処するため,本稿では,マルチモーダル・マルチターンの思考連鎖 (chain-of-thought) 推論フレームワークであるVisual-O1を提案する。これは人間のマルチモーダル・マルチターン推論をシミュレートし、曖昧な指示を理解できるよう、高知能モデルには事例的経験 (instantial experience) を、一般的な知能のモデルには経験的経験 (empirical experience) を提供する。長いテキストの理解や長く複雑な推論のためにモデルに高い知能を要求する従来の手法とは異なり、我々のフレームワークは計算オーバーヘッドを著しく増加させず、一般的な知能のモデルに対してもより汎用的で効果的である。実験により,本手法は,曖昧な指示に対して異なる知能レベルのモデルの性能を著しく向上するだけでなく,汎用データセット上での性能も向上することが示された。本研究は、不確実性と曖昧さのある現実のシナリオにおいて、人工知能が人間のように機能する可能性を強調する。データとコードは公開予定である。 As large-scale models evolve, language instructions are increasingly utilized in multi-modal tasks. Due to human language habits, these instructions often contain ambiguities in real-world scenarios, necessitating the integration of visual context or common sense for accurate interpretation. However, even highly intelligent large models exhibit significant performance limitations on ambiguous instructions, where weak reasoning abilities of disambiguation can lead to catastrophic errors. To address this issue, this paper proposes Visual-O1, a multi-modal multi-turn chain-of-thought reasoning framework. It simulates human multi-modal multi-turn reasoning, providing instantial experience for highly intelligent models or empirical experience for generally intelligent models to understand ambiguous instructions. Unlike traditional methods that require models to possess high intelligence to understand long texts or perform lengthy complex reasoning, our framework does not significantly increase computational overhead and is more general and effective, even for generally intelligent models. 
Experiments show that our method not only significantly enhances the performance of models of different intelligence levels on ambiguous instructions but also improves their performance on general datasets. Our work highlights the potential of artificial intelligence to work like humans in real-world scenarios with uncertainty and ambiguity. We will release our data and code.	翻訳日:2024-11-02 22:58:37 公開日:2024-10-04
# SpatioTemporal Informationは2つのビデオ要約ベンチマークに役立つか? Does SpatioTemporal information benefit Two video summarization benchmarks? ( http://arxiv.org/abs/2410.03323v1 ) ライセンス: Link先を確認	Aashutosh Ganesh, Mirela Popa, Daan Odijk, Nava Tintarev,	(参考訳) ビデオの要約における重要な側面は、ビデオの各部分の背後にある時間的文脈を理解して、何が重要で何が重要でないかを理解することである。近年、ビデオ要約モデルは、この情報を表現するために時空間関係をモデル化している。これらのモデルは重要なベンチマークデータセットに対して最先端の相関スコアを得た。しかし、レビューされていないのは、時空間関係が最先端の結果を得るために必要であるかどうかである。これまでのアクティビティ認識の研究は、シーンやオブジェクトのような静的なキューを、モーション情報よりも優先することで、バイアスを見つけてきた。本稿では,類似の関係が映像要約の課題に影響を及ぼすかどうかを考察する。そのために、既存のベンチマークデータセットで時間情報が果たす役割を分析します。まず、時間的に不変なモデルでベースラインを推定し、そのようなモデルがベンチマークデータセット(TVSumとSumMe)上でどれだけうまくランクされているかを確認する。次に、ビデオの時間的順序を乱して、既存の最先端モデルに与える影響を調査します。我々の研究結果の1つは、TVSumデータセット上の人間のベースラインに近い競合相関スコアを時間的不変モデルが達成することである。また,既存モデルは時間的摂動の影響を受けないことを示す。さらに、一定の時間セグメントをシャッフルする破壊戦略により、相関スコアを実際に改善することができる。これらの結果から,時空間的関係が微妙な役割を果たしていることが判明し,これらのベンチマークが映像要約のタスクを適切にモデル化するかどうかという疑問が提起された。 https://github.com/AashGan/TemporalPerturbSum An important aspect of summarizing videos is understanding the temporal context behind each part of the video to grasp what is and is not important. Video summarization models have in recent years modeled spatio-temporal relationships to represent this information. These models achieved state-of-the-art correlation scores on important benchmark datasets. However, what has not been reviewed is whether spatio-temporal relationships are even required to achieve state-of-the-art results. Previous work in activity recognition has found biases, by prioritizing static cues such as scenes or objects, over motion information. In this paper we inquire if similar spurious relationships might influence the task of video summarization. To do so, we analyse the role that temporal information plays on existing benchmark datasets. We first estimate a baseline with temporally invariant models to see how well such models rank on benchmark datasets (TVSum and SumMe). 
We then disrupt the temporal order of the videos to investigate the impact it has on existing state-of-the-art models. One of our findings is that the temporally invariant models achieve competitive correlation scores that are close to the human baselines on the TVSum dataset. We also demonstrate that existing models are not affected by temporal perturbations. Furthermore, with certain disruption strategies that shuffle fixed time segments, we can actually improve their correlation scores. With these results, we find that spatio-temporal relationships play a minor role and we raise the question of whether these benchmarks adequately model the task of video summarization. Code available at: https://github.com/AashGan/TemporalPerturbSum	翻訳日:2024-11-02 22:58:37 公開日:2024-10-04
# デコヒーレンスフリー部分空間を用いたフォトニック絡み合い状態の決定論的生成 Deterministic generation of photonic entangled states using decoherence-free subspaces ( http://arxiv.org/abs/2410.03325v1 ) ライセンス: Link先を確認	Oriol Rubies-Bigorda, Stuart J. Masson, Susanne F. Yelin, Ana Asenjo-Garcia,	(参考訳) 本稿では、量子情報技術の基礎となる光の量子状態を決定論的に生成するための資源として、物質の集団状態を用いることを提案する。我々の最小モデルは、半導波路、すなわちミラーで終端された1次元導波路に結合した3つのエミッタから構成される。エミッタ間の光子を介した相互作用により、明状態と暗状態が現れる。暗状態は、散逸から保護されたデコヒーレンスフリー部分空間を形成する。エミッタの局所的な駆動と共鳴周波数の制御により、デコヒーレンスフリー部分空間内で任意の量子ゲートを実行できる。明状態への結合は光子放出を促進するため、光と物質の間の量子ゲートの実現が可能になる。これらのゲートを逐次的に適用することで、グリーンバーガー=ホーン=ツァイリンガー(GHZ)状態や1次元および2次元のクラスター状態のようなフォトニックなエンタングル状態を生成できることを示す。 We propose the use of collective states of matter as a resource for the deterministic generation of quantum states of light, which are fundamental for quantum information technologies. Our minimal model consists of three emitters coupled to a half-waveguide, i.e., a one-dimensional waveguide terminated by a mirror. Photon-mediated interactions between the emitters result in the emergence of bright and dark states. The dark states form a decoherence-free subspace, protected from dissipation. Local driving of the emitters and control of their resonance frequencies allow arbitrary quantum gates to be performed within the decoherence-free subspace. Coupling to bright states facilitates photon emission, thereby enabling the realization of quantum gates between light and matter. We demonstrate that sequential application of these gates leads to the generation of photonic entangled states, such as Greenberger-Horne-Zeilinger and one- and two-dimensional cluster states.	翻訳日:2024-11-02 22:58:37 公開日:2024-10-04
# EmojiHeroVR:頭部ディスプレイの部分的閉塞下での表情認識に関する研究 EmojiHeroVR: A Study on Facial Expression Recognition under Partial Occlusion from Head-Mounted Displays ( http://arxiv.org/abs/2410.03331v1 ) ライセンス: Link先を確認	Thorben Ortmann, Qi Wang, Larissa Putzar,	(参考訳) 感情認識は、感情フィードバックを提供し、高度なパーソナライゼーションを可能にすることで、バーチャルリアリティ(VR)体験の評価と向上を促進する。しかし、HMD(Head-Mounted Displays)は顔の上半分を遮蔽するので、表情はユーザーの感情を認識するために使われることは滅多にない。この問題に対処するため,新しいVRゲームEmojiHeroVRをプレイした37人の参加者を対象に調査を行った。収集されたデータベースであるEmoHeVRDB(EmojiHeroVR Database)には、3,556のラベル付き顔画像と1,778の再現された感情が含まれている。ラベル付き画像ごとに、ラベル付き画像の前後に直接記録された29のフレームも提供し、動的顔表情認識(FER)を容易にする。さらに、EmoHeVRDBには、フレーム毎にMeta Quest Pro VRヘッドセットを介してキャプチャされた63の表情のアクティベートに関するデータが含まれている。データベースを利用して,静的FER分類タスクのベースライン評価を行い,6つの基本的な感情と,EfficientNet-B0アーキテクチャを用いた中立性について検討した。最良のモデルでは、テストセットで69.84%の精度を達成し、HMD閉塞下のFERは従来のFERよりもはるかに困難であることを示した。 Emotion recognition promotes the evaluation and enhancement of Virtual Reality (VR) experiences by providing emotional feedback and enabling advanced personalization. However, facial expressions are rarely used to recognize users' emotions, as Head-Mounted Displays (HMDs) occlude the upper half of the face. To address this issue, we conducted a study with 37 participants who played our novel affective VR game EmojiHeroVR. The collected database, EmoHeVRDB (EmojiHeroVR Database), includes 3,556 labeled facial images of 1,778 reenacted emotions. For each labeled image, we also provide 29 additional frames recorded directly before and after the labeled image to facilitate dynamic Facial Expression Recognition (FER). Additionally, EmoHeVRDB includes data on the activations of 63 facial expressions captured via the Meta Quest Pro VR headset for each frame. Leveraging our database, we conducted a baseline evaluation on the static FER classification task with six basic emotions and neutral using the EfficientNet-B0 architecture. 
The best model achieved an accuracy of 69.84% on the test set, indicating that FER under HMD occlusion is feasible but significantly more challenging than conventional FER.	翻訳日:2024-11-02 22:58:37 公開日:2024-10-04
# 乳がん分類における先行CNNアーキテクチャの比較解析とエンサンブルエンハンスメント Comparative Analysis and Ensemble Enhancement of Leading CNN Architectures for Breast Cancer Classification ( http://arxiv.org/abs/2410.03333v1 ) ライセンス: Link先を確認	Gary Murphy, Raghubir Singh,	(参考訳) 本研究は,病理組織像を用いた乳癌分類への新規かつ正確なアプローチを提案する。さまざまな画像データセットにまたがる主要な畳み込みニューラルネットワーク(CNN)モデルを体系的に比較し、最適なハイパーパラメータを特定し、分類の有効性に基づいてランク付けする。探索する各モデルの分類精度を最大化するために、データ強化、代替の完全接続層、モデルトレーニングハイパーパラメータ設定、および事前トレーニングされた重みの使用に対するモデルの再トレーニングの利点がある。私たちの方法論には、トレーニング実行中に一貫性のあるデータ条件を確保するために生成されたデータセットのシリアライズや、トレーニング期間の大幅な短縮など、いくつかのオリジナルの概念が含まれている。結果の自動キュレーションと組み合わせることで、2,000以上のトレーニング順列の探索が可能になった。本研究は,独立系CNNモデルにおいて,例外的分類精度を達成し,モデルの有効性でランク付けするために必要な設定を確立した。これらの結果に基づき、3つの高性能スタンドアロンCNNモデルと多様な分類器を積み重ねたアンサンブルアーキテクチャを提案し、その結果、分類精度が向上した。非常に多くのモデル置換を体系的に実行して最高の結果を得る能力は、BreakHis x40とBreakHis x200の99.75%、Bachデータセットをトレーニング、検証、テストデータセットに分割する95.18%など、非常に高品質な結果をもたらす。 Bach Onlineのブラインドチャレンジでは89%がこのアプローチを使用していた。本研究は乳癌の病理組織像データセットに基づいているが,他の医用画像データセットにも等しく適用可能である。 This study introduces a novel and accurate approach to breast cancer classification using histopathology images. It systematically compares leading Convolutional Neural Network (CNN) models across varying image datasets, identifies their optimal hyperparameters, and ranks them based on classification efficacy. To maximize classification accuracy for each model we explore, the effects of data augmentation, alternative fully-connected layers, model training hyperparameter settings, and, the advantages of retraining models versus using pre-trained weights. Our methodology includes several original concepts, including serializing generated datasets to ensure consistent data conditions across training runs and significantly reducing training duration. Combined with automated curation of results, this enabled the exploration of over 2,000 training permutations -- such a comprehensive comparison is as yet unprecedented. 
Our findings establish the settings required to achieve exceptional classification accuracy for standalone CNN models and rank them by model efficacy. Based on these results, we propose ensemble architectures that stack three high-performing standalone CNN models together with diverse classifiers, resulting in improved classification accuracy. The ability to systematically run so many model permutations to get the best outcomes gives rise to very high-quality results, including 99.75% for BreakHis x40 and BreakHis x200 and 95.18% for the Bach datasets when split into train, validation and test datasets. The Bach Online blind challenge yielded 89% using this approach. Whilst this study is based on breast cancer histopathology image datasets, the methodology is equally applicable to other medical image datasets.	翻訳日:2024-11-02 22:58:37 公開日:2024-10-04
# X線は15の価値がある: 解釈可能な放射線学レポート作成のためのスパースオートエンコーダ An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation ( http://arxiv.org/abs/2410.03334v1 ) ライセンス: Link先を確認	Ahmed Abdulaal, Hugo Fry, Nina Montaña-Brown, Ayodeji Ijishakin, Jack Gao, Stephanie Hyland, Daniel C. Alexander, Daniel C. Castro,	(参考訳) 放射線サービスは前例のない需要を経験しており、放射線学レポート生成の自動化への関心が高まっている。既存のビジョンランゲージモデル(VLM)は幻覚に悩まされ、解釈性に欠け、高価な微調整を必要とする。我々は,SAE-Radを導入し,スパースオートエンコーダ(SAE)を用いて,事前学習された視覚変換器から人間の解釈可能な特徴へ潜在表現を分解する。我々のハイブリッドアーキテクチャは、最先端のSAEの進歩を組み合わせ、空間性を維持しつつ正確な遅延再構築を実現します。既成の言語モデルを用いて,SAEの各特徴について,地中真実のレポートをラジオロジカルな記述に分解し,各画像の完全なレポートにコンパイルすることで,このタスクのために大規模なモデルを微調整する必要がなくなる。我々の知る限り、SAE-Radは下流のマルチモーダル推論タスクに機械的解釈可能性手法を明示的に用いた最初の例である。 MIMIC-CXRデータセットでは、SAE-Radは、最先端のモデルと比較して競合する放射線学固有のメトリクスを達成し、トレーニングのために計算資源を著しく少なくしている。質的な分析により、SAE-Radは意味のある視覚概念を学び、専門家の解釈と密接に一致したレポートを生成することが明らかになった。以上の結果から,SAEは医療におけるマルチモーダル推論を強化し,既存のVLMの代替となる可能性が示唆された。 Radiological services are experiencing unprecedented demand, leading to increased interest in automating radiology report generation. Existing Vision-Language Models (VLMs) suffer from hallucinations, lack interpretability, and require expensive fine-tuning. We introduce SAE-Rad, which uses sparse autoencoders (SAEs) to decompose latent representations from a pre-trained vision transformer into human-interpretable features. Our hybrid architecture combines state-of-the-art SAE advancements, achieving accurate latent reconstructions while maintaining sparsity. Using an off-the-shelf language model, we distil ground-truth reports into radiological descriptions for each SAE feature, which we then compile into a full report for each image, eliminating the need for fine-tuning large models for this task. To the best of our knowledge, SAE-Rad represents the first instance of using mechanistic interpretability techniques explicitly for a downstream multi-modal reasoning task. 
On the MIMIC-CXR dataset, SAE-Rad achieves competitive radiology-specific metrics compared to state-of-the-art models while using significantly fewer computational resources for training. Qualitative analysis reveals that SAE-Rad learns meaningful visual concepts and generates reports aligning closely with expert interpretations. Our results suggest that SAEs can enhance multimodal reasoning in healthcare, providing a more interpretable alternative to existing VLMs.	翻訳日:2024-11-02 22:58:37 公開日:2024-10-04
# Audio-Agent: オーディオ生成、編集、合成にLLMを活用する Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition ( http://arxiv.org/abs/2410.03335v1 ) ライセンス: Link先を確認	Zixuan Wang, Yu-Wing Tai, Chi-Keung Tang,	(参考訳) 本稿では,テキストやビデオの入力に基づく音声生成,編集,合成のためのマルチモーダルフレームワークであるAudio-Agentを紹介する。従来のTTA(text-to-audio)タスクのアプローチは、テキスト記述からシングルパス推論を行うことが多い。しかし、このデザインは複雑なテキスト条件が与えられた場合、高品質なオーディオを作り出すのに苦労している。本手法では,事前学習したTTA拡散ネットワークを音声生成エージェントとして利用し,テキスト条件をアトミックな特定の命令に分解し,音声生成のためにエージェントを呼び出す。その結果、Audio-Agentは、提供されたテキストやビデオと密に一致した高品質なオーディオを生成し、可変長生成もサポートする。 VTA(Video-to-audio)タスクでは、既存のほとんどの手法では、ビデオイベントと生成されたオーディオを同期させるタイムスタンプ検出器をトレーニングする必要がある。本稿では,事前学習したLarge Language Model(LLM),例えばGemma2-2B-itを微調整して,ビデオとオーディオのモダリティをブリッジする意味的条件と時間的条件の両方を得る,というシンプルなアプローチを提案する。したがって、我々のフレームワークは、トレーニングにおいてかなりの計算オーバーヘッドを伴わずに、TTAタスクとVTAタスクの両方に包括的なソリューションを提供する。 We introduce Audio-Agent, a multimodal framework for audio generation, editing and composition based on text or video inputs. Conventional approaches for text-to-audio (TTA) tasks often make single-pass inferences from text descriptions. While straightforward, this design struggles to produce high-quality audio when given complex text conditions. In our method, we utilize a pre-trained TTA diffusion network as the audio generation agent to work in tandem with GPT-4, which decomposes the text condition into atomic, specific instructions, and calls the agent for audio generation. Consequently, Audio-Agent generates high-quality audio that is closely aligned with the provided text or video while also supporting variable-length generation. For video-to-audio (VTA) tasks, most existing methods require training a timestamp detector to synchronize video events with generated audio, a process that can be tedious and time-consuming. We propose a simpler approach by fine-tuning a pre-trained Large Language Model (LLM), e.g., Gemma2-2B-it, to obtain both semantic and temporal conditions to bridge video and audio modality. 
Thus our framework provides a comprehensive solution for both TTA and VTA tasks without substantial computational overhead in training.	翻訳日:2024-11-02 22:58:37 公開日:2024-10-04
# 自然論理と大規模言語モデルによるゼロショットファクト検証 Zero-Shot Fact Verification via Natural Logic and Large Language Models ( http://arxiv.org/abs/2410.03341v1 ) ライセンス: Link先を確認	Marek Strong, Rami Aly, Andreas Vlachos,	(参考訳) 近年の自然論理による事実検証システムの発達は、主張を集合論演算子を通して証拠と整合させ、忠実な正当化を提供することによって説明可能性を高めている。これらの進歩にもかかわらず、このようなシステムは自然論理に注釈付けされた大量のトレーニングデータに依存していることが多い。そこで本研究では,命令調整型大規模言語モデルの一般化機能を利用したゼロショット手法を提案する。提案手法と他の事実検証システムのゼロショット能力を総合的に評価するために,多言語データセットを含む,人工的および実世界のクレームに関するすべてのモデルを評価する。また,本手法を他の事実検証システムと比較する。まず、ゼロショットの一般化設定において、本手法は、自然論理データに特化して訓練されていない他のシステムよりも優れており、最高性能のベースラインに対して平均8.96ポイントの精度向上を実現していることを示す。第2に、ゼロショット転送設定において、自然論理データに基づいて訓練された現在のシステムは、他の領域にうまく一般化しないことを示し、本手法は、実世界のクレームを持つ全てのデータセットにおいて、これらのシステムより優れていることを示す。 The recent development of fact verification systems with natural logic has enhanced their explainability by aligning claims with evidence through set-theoretic operators, providing faithful justifications. Despite these advancements, such systems often rely on a large amount of training data annotated with natural logic. To address this issue, we propose a zero-shot method that utilizes the generalization capabilities of instruction-tuned large language models. To comprehensively assess the zero-shot capabilities of our method and other fact verification systems, we evaluate all models on both artificial and real-world claims, including multilingual datasets. We also compare our method against other fact verification systems in two setups. First, in the zero-shot generalization setup, we demonstrate that our approach outperforms other systems that were not specifically trained on natural logic data, achieving an average accuracy improvement of 8.96 points over the best-performing baseline. Second, in the zero-shot transfer setup, we show that current systems trained on natural logic data do not generalize well to other domains, and our method outperforms these systems across all datasets with real-world claims.	翻訳日:2024-11-02 22:58:37 公開日:2024-10-04
# Dolphin: スケーラブルなニューロシンボリックラーニングのためのプログラム可能なフレームワーク Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning ( http://arxiv.org/abs/2410.03348v1 ) ライセンス: Link先を確認	Aaditya Naik, Jason Liu, Claire Wang, Saikat Dutta, Mayur Naik, Eric Wong,	(参考訳) ニューロシンボリック学習は、シンボリック推論をディープラーニングモデルに組み込むための有望なパラダイムとして登場した。しかし、既存のフレームワークは、トレーニングデータとシンボリックプログラムの複雑さの両方に関してスケーラビリティに制限がある。シンボリックプログラムにおける前向き連鎖と後向きの勾配伝播の両方をベクトル化計算にマッピングすることにより,ニューロシンボリック学習を根本的なレベルでスケールさせるフレームワークであるDolphinを提案する。この目的のためにDolphinは、PyTorchのような高性能なディープラーニングフレームワークの上に直接構築された一連の抽象化とプリミティブを導入し、シンボリックプログラムをPyTorchモジュールとして書けるようにする。これにより、開発者が慣れ親しんだPythonのような言語でニューロシンボリックプログラムを記述し、GPU上でエンドツーエンドの微分が可能な計算グラフにコンパイルすることが可能になる。我々はDolphinを、テキスト、画像、ビデオ処理のディープラーニングモデルと、マルチホップ推論、再帰、さらにはPython eval()のようなブラックボックス関数を含むシンボリックプログラムを組み合わせた5つのニューロシンボリックタスクにわたる13のベンチマークで評価した。タスクごとの最大入力において、Dolphinの訓練時間はベースラインであるScallop、ISED、IndeCateR+の0.33%-37.17%(平均2.77%)にとどまり、これらのベースラインはそうした入力の大半でタイムアウトする。Dolphinで書かれたモデルは、最大のベンチマークでも最先端の精度を達成する。 Neurosymbolic learning has emerged as a promising paradigm to incorporate symbolic reasoning into deep learning models. However, existing frameworks are limited in scalability with respect to both the training data and the complexity of symbolic programs. We propose Dolphin, a framework to scale neurosymbolic learning at a fundamental level by mapping both forward chaining and backward gradient propagation in symbolic programs to vectorized computations. For this purpose, Dolphin introduces a set of abstractions and primitives built directly on top of a high-performance deep learning framework like PyTorch, effectively enabling symbolic programs to be written as PyTorch modules. It thereby enables neurosymbolic programs to be written in a language like Python that is familiar to developers and compile them to computation graphs that are amenable to end-to-end differentiation on GPUs. 
We evaluate Dolphin on a suite of 13 benchmarks across 5 neurosymbolic tasks that combine deep learning models for text, image, or video processing with symbolic programs that involve multi-hop reasoning, recursion, and even black-box functions like Python eval(). Dolphin only takes 0.33%-37.17% of the time (and 2.77% on average) to train these models on the largest input per task compared to baselines Scallop, ISED, and IndeCateR+, which time out on most of these inputs. Models written in Dolphin also achieve state-of-the-art accuracies even on the largest benchmarks.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# 自己反省アプローチによるコードの等価表現の生成 Generating Equivalent Representations of Code By A Self-Reflection Approach ( http://arxiv.org/abs/2410.03351v1 ) ライセンス: Link先を確認	Jia Li, Ge Li, Lecheng Wang, Hao Zhu, Zhi Jin,	(参考訳) コードの等価表現(ER)は、コード自体と同じ意味を保持するテキスト表現であり、例えば自然言語コメントや擬似コードがある。 ERはソフトウェア開発とメンテナンスにおいて重要な役割を担う。しかし、コードのERを自動的に生成する方法は、依然として未解決の課題である。本稿では,コードのERを生成するための自己反省 (self-reflection) 手法を提案する。本手法は、2つの大規模言語モデル(LLM)を相互に動作させ、リフレクションプロセスを通じてERを生成する。 ERに制約が課されるかどうかに応じて、本手法はオープンな設定と制約付きの設定の両方でERを生成する。 2つの設定でERを生成する実証的研究を行い,8つの知見を得た。 (1)オープンな設定でのER生成。オープンな設定では、LLMが制約なしにコードを表現できるようにし、その結果のERを分析して、5つの重要な知見を明らかにする。これらの知見は、LLMがコード内の構文構造、API、数値計算をどのように理解しているかに光を当てる。 (2)制約付きの設定でのER生成。制約付きの設定では、自然言語コメント、擬似コード、フローチャートなどの制約をERに課す。これにより、本手法はさまざまなソフトウェア工学タスクに対処できる。実験結果から,本手法が特定の制約に従うERを効果的に生成し,様々なソフトウェア工学タスクをサポートできることを示す3つの知見を得た。 (3)今後の方向性。また、コード生成のための中間言語の導出、LLMフレンドリーな要件記述の探索、ソフトウェア工学タスクのさらなる支援など、将来的な研究の方向性についても論じる。本論文は,研究コミュニティの議論を喚起し,多くのフォローアップ研究を刺激すると考えられる。 Equivalent Representations (ERs) of code are textual representations that preserve the same semantics as the code itself, e.g., natural language comments and pseudocode. ERs play a critical role in software development and maintenance. However, how to automatically generate ERs of code remains an open challenge. In this paper, we propose a self-reflection approach to generating ERs of code. It enables two Large Language Models (LLMs) to work mutually and produce an ER through a reflection process. Depending on whether constraints on ERs are applied, our approach generates ERs in both open and constrained settings. We conduct an empirical study to generate ERs in two settings and obtain eight findings. (1) Generating ERs in the open setting. In the open setting, we allow LLMs to represent code without any constraints, analyzing the resulting ERs and uncovering five key findings. These findings shed light on how LLMs comprehend syntactic structures, APIs, and numerical computations in code. (2) Generating ERs in the constrained setting. 
In the constrained setting, we impose constraints on ERs, such as natural language comments, pseudocode, and flowcharts. This allows our approach to address a range of software engineering tasks. Based on our experiments, we have three findings demonstrating that our approach can effectively generate ERs that adhere to specific constraints, thus supporting various software engineering tasks. (3) Future directions. We also discuss potential future research directions, such as deriving intermediate languages for code generation, exploring LLM-friendly requirement descriptions, and further supporting software engineering tasks. We believe that this paper will spark discussions in research communities and inspire many follow-up studies.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# LANTERN:Relaxed Speculative Decodingによる視覚自己回帰モデルの高速化 LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding ( http://arxiv.org/abs/2410.03355v1 ) ライセンス: Link先を確認	Doohyuk Jang, Sihwan Park, June Yong Yang, Yeonsung Jung, Jihun Yun, Souvik Kundu, Sung-Yub Kim, Eunho Yang,	(参考訳) 自己回帰(AR)モデルは画像生成において最近注目を集めており、しばしば拡散モデルに匹敵するか、それを上回る性能を示す。しかし、ARモデルの大きな制限の1つはその逐次的な性質であり、トークンを一度に1つずつ処理するため、より効率的に動作するGANや拡散ベースの手法と比較して生成が遅くなる。投機的復号は、1回のフォワードパスで複数のトークンを生成することでLLMを高速化する効果が実証されているが、視覚ARモデルへの応用はほとんど検討されていない。本稿では,この設定における課題として,視覚ARモデルがトークンに一様に低い確率を割り当てることが多く,投機的復号の性能を損なう「トークン選択の曖昧さ (token selection ambiguity)」を特定する。この課題を克服するために、潜在空間におけるトークンの交換可能性を活用する、LANTERNと呼ばれる緩和された受理条件を提案する。この緩和は、本来なら早期に棄却されてしまう候補トークンをより柔軟に利用できるようにすることで、視覚ARモデルにおける投機的復号の有効性を回復させる。さらに、全変動距離の上界を組み込むことで、画像品質や意味的一貫性を著しく損なうことなく、これらの高速化を実現する。実験結果は,提案手法が投機的復号に対して大幅な高速化をもたらすことを示す。具体的には、現代の視覚ARモデルである LlamaGen に適用した場合、最先端の投機的復号を素朴に適用した場合と比較して、LANTERN は greedy 復号とランダムサンプリングにおいて、スピードアップをそれぞれ $\mathbf{1.75}\times$ と $\mathbf{1.76}\times$ 向上させる。 Auto-Regressive (AR) models have recently gained prominence in image generation, often matching or even surpassing the performance of diffusion models. However, one major limitation of AR models is their sequential nature, which processes tokens one at a time, slowing down generation compared to models like GANs or diffusion-based methods that operate more efficiently. While speculative decoding has proven effective for accelerating LLMs by generating multiple tokens in a single forward, its application in visual AR models remains largely unexplored. In this work, we identify a challenge in this setting, which we term \textit{token selection ambiguity}, wherein visual AR models frequently assign uniformly low probabilities to tokens, hampering the performance of speculative decoding. To overcome this challenge, we propose a relaxed acceptance condition referred to as LANTERN that leverages the interchangeability of tokens in latent space. 
This relaxation restores the effectiveness of speculative decoding in visual AR models by enabling more flexible use of candidate tokens that would otherwise be prematurely rejected. Furthermore, by incorporating a total variation distance bound, we ensure that these speed gains are achieved without significantly compromising image quality or semantic coherence. Experimental results demonstrate the efficacy of our method in providing a substantial speed-up over speculative decoding. Specifically, compared to a naïve application of state-of-the-art speculative decoding, LANTERN increases speed-ups by $\mathbf{1.75}\times$ and $\mathbf{1.76}\times$, as compared to greedy decoding and random sampling, respectively, when applied to LlamaGen, a contemporary visual AR model.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# 言語横断型AMRパーシングはメタになるべきか? : メタラーニングと共同学習型AMRパーシングの実証評価 Should Cross-Lingual AMR Parsing go Meta? An Empirical Assessment of Meta-Learning and Joint Learning AMR Parsing ( http://arxiv.org/abs/2410.03357v1 ) ライセンス: Link先を確認	Jeongwoo Kang, Maximin Coavoux, Cédric Lopez, Didier Schwab,	(参考訳) 言語間AMR解析は、トレーニングデータがソース言語でのみ利用できる場合、ターゲット言語でAMRグラフを予測するタスクである。 AMRトレーニングデータと評価データのサイズが小さいため、言語間AMR解析は英語、スペイン語、ドイツ語、中国語、イタリア語などの小さな言語でのみ研究されている。言語間構文解析にメタラーニングを適用したLangedijk et al(2022)からインスピレーションを得て,メタラーニングを用いた言語間AMR解析について検討した。我々は,これらのモデルを$k$-shotシナリオ(0-shotを含む)で評価し,クロアチア語,ファルシ語,韓国語,中国語,フランス語での有効性を評価した。特に、韓国とクロアチアのテストセットは、既存のThe Little Prince English AMRコーパスに基づいて、我々の研究の一部として開発され、公開されています。従来のジョイントラーニングと比較し,実証的研究を行った。メタラーニングモデルは, 特定の言語に対する0ショット評価において若干改善されているが, $k$ が 0 よりも高い場合, 性能向上は最小か欠落であることが示唆された。 Cross-lingual AMR parsing is the task of predicting AMR graphs in a target language when training data is available only in a source language. Due to the small size of AMR training data and evaluation data, cross-lingual AMR parsing has only been explored in a small set of languages such as English, Spanish, German, Chinese, and Italian. Taking inspiration from Langedijk et al. (2022), who apply meta-learning to tackle cross-lingual syntactic parsing, we investigate the use of meta-learning for cross-lingual AMR parsing. We evaluate our models in $k$-shot scenarios (including 0-shot) and assess their effectiveness in Croatian, Farsi, Korean, Chinese, and French. Notably, Korean and Croatian test sets are developed as part of our work, based on the existing The Little Prince English AMR corpus, and made publicly available. We empirically study our method by comparing it to classical joint learning. Our findings suggest that while the meta-learning model performs slightly better in 0-shot evaluation for certain languages, the performance gain is minimal or absent when $k$ is higher than 0.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# 量子コミットと量子片方向のOracle分離 Oracle Separation Between Quantum Commitments and Quantum One-wayness ( http://arxiv.org/abs/2410.03358v1 ) ライセンス: Link先を確認	John Bostanci, Boyang Chen, Barak Nehoran,	(参考訳) 量子コミットメントが存在するが、(効率的に検証可能な)片方向状態生成器が存在しないような、ユニタリな量子オラクルが存在することを示す。どちらも、暗号の最小仮定としてワンウェイ関数を置き換える候補として広く考えられている。最近の研究は、片方向状態生成器からコミットメントを構築することができることを示したが、逆方向は未解決のままである。我々の結果はブラックボックスの構成を除外し、この決定的なオープンな問題を解決し、量子コミットメント(およびそれと同値なEFI対、量子紛失通信、セキュアな量子多パーティ計算)が、すべての既知の暗号プリミティブの中で、厳密に最も弱いように見えることを示唆している。 We show that there exists a unitary quantum oracle relative to which quantum commitments exist but no (efficiently verifiable) one-way state generators exist. Both have been widely considered candidates for replacing one-way functions as the minimal assumption for cryptography: the weakest cryptographic assumption implied by all of computational cryptography. Recent work has shown that commitments can be constructed from one-way state generators, but the other direction has remained open. Our results rule out any black-box construction, and thus settle this crucial open problem, suggesting that quantum commitments (as well as its equivalency class of EFI pairs, quantum oblivious transfer, and secure quantum multiparty computation) appear to be strictly weakest among all known cryptographic primitives.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# マルチカラー空間テンソルマージを利用した高調波高調波結合ハイブリッド変圧器ネットワークアーキテクチャ An Enhanced Harmonic Densely Connected Hybrid Transformer Network Architecture for Chronic Wound Segmentation Utilising Multi-Colour Space Tensor Merging ( http://arxiv.org/abs/2410.03359v1 ) ライセンス: Link先を確認	Bill Cassidy, Christian Mcbride, Connah Kendrick, Neil D. Reeves, Joseph M. Pappachan, Cornelius J. Fernandez, Elias Chacko, Raphael Brüngel, Christoph M. Friedrich, Metib Alotaibi, Abdullah Abdulaziz AlWabel, Mohammad Alderwish, Kuan-Ying Lai, Moi Hoon Yap,	(参考訳) 慢性的な傷や合併症は、世界中のクリニックや病院の負担を増大させ続けている。静脈、動脈、糖尿病、圧傷は世界中でますます一般的になりつつある。これらの状態は、感染によって引き起こされる手足の切断や死亡リスクの増加により、感染した人に対する非常に不安定な反感を引き起こす可能性がある。したがって、慢性的な創傷治療において、臨床医を支援する新しい方法が、高品質なケア基準を維持する上で不可欠である。本稿では,ネットワークの初期層にコントラスト除去コンポーネントを統合し,特徴学習を強化する改良型HarDNetセグメンテーションアーキテクチャを提案する。また、マルチカラー空間テンソルマージプロセスを利用し、畳み込みブロックの調和形状を調整し、これらの追加的特徴を容易にする。提案モデルでは,光肌患者の創傷画像を用いてトレーニングを行い,より暗い肌の症例のみからなる2つのテストセット(1セットは真実,もう1セットは不要)でモデルをテストする。主観的評価は, クラス内相関係数を指標とした臨床創傷専門家から得られた。土台真実を含む暗色の音色検定では、Dice類似度係数(+0.1221)と結合(+0.1274)の交叉による改善を実証する。定性分析では, 基準モデルと提案モデルを比較した場合, 3%に改善が認められた。本研究は, 創傷画像のみを訓練したモデルを用いて, 慢性的な創傷セグメント化のための黒色色調に焦点を当てた最初の研究である。糖尿病は、患者がより暗い肌の色合いを持つ国で流行し、そのようなケースにもっと焦点を合わせる必要があることを強調している。また, 慢性的な創傷断裂に対して, 今までで最大の定性的研究を行った。 Chronic wounds and associated complications present ever growing burdens for clinics and hospitals world wide. Venous, arterial, diabetic, and pressure wounds are becoming increasingly common globally. These conditions can result in highly debilitating repercussions for those affected, with limb amputations and increased mortality risk resulting from infection becoming more common. New methods to assist clinicians in chronic wound care are therefore vital to maintain high quality care standards. This paper presents an improved HarDNet segmentation architecture which integrates a contrast-eliminating component in the initial layers of the network to enhance feature learning. 
We also utilise a multi-colour space tensor merging process and adjust the harmonic shape of the convolution blocks to facilitate these additional features. We train our proposed model using wound images from light-skinned patients and test the model on two test sets (one set with ground truth, and one without) comprising only darker-skinned cases. Subjective ratings are obtained from clinical wound experts with intraclass correlation coefficient used to determine inter-rater reliability. For the dark-skin tone test set with ground truth, we demonstrate improvements in terms of Dice similarity coefficient (+0.1221) and intersection over union (+0.1274). Qualitative analysis showed high expert ratings, with improvements of >3% demonstrated when comparing the baseline model with the proposed model. This paper presents the first study to focus on darker-skin tones for chronic wound segmentation using models trained only on wound images exhibiting lighter skin. Diabetes is highly prevalent in countries where patients have darker skin tones, highlighting the need for a greater focus on such cases. Additionally, we conduct the largest qualitative study to date for chronic wound segmentation.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# スピン-j系のエンタングリングパワー--幾何学的アプローチ Entangling power of spin-j systems: a geometrical approach ( http://arxiv.org/abs/2410.03361v1 ) ライセンス: Link先を確認	Eduardo Serrano-Ensástiga, Diego Morachis Galindo, Jesús A. Maytorena, Chryssomalis Chryssomalakos,	(参考訳) 高いエンタングリング能力を持つユニタリゲートは、そのエンタングリング能力のためにいくつかの量子強化技術に関係している。スピン状態やボゾン系のような対称多粒子系では、粒子交換対称性はこれらのゲートと非絡み合い状態の集合を制限する。本研究では、SU(2)不変量から得られる成分を持つベクトル間の内積として再構成することで、スピン系の絡み合う力を解析する。このアプローチにより、小さなスピンに対して最大化するユニタリゲートの検出を含む、小さなスピン系に対して、この量を研究することができる。極端ユニタリゲートは、ある状態のフシミ関数の凸結合に結びついているのと同様、高い回転対称性を持つ絡み合い分布を示す。さらに、エンタングリングパワーといくつかのスピン状態部分空間で許容されるシュミット数との関係について検討する。このように、ここで提示される幾何学的アプローチは、量子情報理論における他の概念と結びついた絡み合う力を研究するための新しい経路を示唆している。 Unitary gates with high entangling power are relevant for several quantum-enhanced technologies due to their entangling capabilities. For symmetric multiparticle systems, such as spin states or bosonic systems, the particle exchange symmetry restricts these gates and also the set of not-entangled states. In this work, we analyze the entangling power of spin systems by reformulating it as an inner product between vectors with components given by SU(2) invariants. This approach allows us to study this quantity for small-spin systems including the detection of the unitary gate that maximizes it for small spins. We observe that extremal unitary gates exhibit entanglement distributions with high rotational symmetry, same that are linked to a convex combination of Husimi functions of certain states. Furthermore, we explore the connection between entangling power and the Schmidt numbers admissible in some spin state subspaces. Thus, the geometrical approach presented here suggests new paths for studying entangling power linked to other concepts in quantum information theory.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# オンライン非パラメトリック回帰のためのMinimax Adaptive Boosting Minimax Adaptive Boosting for Online Nonparametric Regression ( http://arxiv.org/abs/2410.03363v1 ) ライセンス: Link先を確認	Paul Liautaud, Pierre Gaillard, Olivier Wintenberger,	(参考訳) 一般凸損失を伴う対向的オンライン非パラメトリック回帰の促進について検討した。我々はまず,パラメータフリーなオンライン勾配向上アルゴリズム(OGB)を導入し,その連鎖木への応用により,リプシッツ関数と競合する際の最小限の後悔を実現することを示す。非パラメトリック関数クラスと競合するのは難しいが、ローカルなリプシッツネスのようなローカルなパターンは、オンラインアルゴリズムがパフォーマンスを改善するために活用できる。連鎖木に基づくコアツリーにOGBを適用することにより,異なるリプシッツプロファイルに整合したすべてのプルーニングに対して効率よく競合し,局所正規性への最適依存を示す。その結果,オンライン回帰に対する局所的適応的最適率を持つ最初の計算効率の良いアルゴリズムが,対角的条件下で得られた。 We study boosting for adversarial online nonparametric regression with general convex losses. We first introduce a parameter-free online gradient boosting (OGB) algorithm and show that its application to chaining trees achieves minimax optimal regret when competing against Lipschitz functions. While competing with nonparametric function classes can be challenging, the latter often exhibit local patterns, such as local Lipschitzness, that online algorithms can exploit to improve performance. By applying OGB over a core tree based on chaining trees, our proposed method effectively competes against all prunings that align with different Lipschitz profiles and demonstrates optimal dependence on the local regularities. As a result, we obtain the first computationally efficient algorithm with locally adaptive optimal rates for online regression in an adversarial setting.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# 誤り訂正符号変換器:非統一から統一へ Error Correction Code Transformer: From Non-Unified to Unified ( http://arxiv.org/abs/2410.03364v1 ) ライセンス: Link先を確認	Yongli Yan, Jieao Zhu, Tianyue Zheng, Jiaqi He, Linglong Dai,	(参考訳) チャネル符号化は、現代の無線システムにおいて信頼性の高いデータ伝送に不可欠であり、その重要性は、様々なエラー訂正コードをサポートする必要がある第6世代(6G)ネットワークの出現とともに増大する。しかし、従来のデコーダは特定のデコードアルゴリズムに適した固定ハードウェア回路として設計され、非効率性と柔軟性が制限された。これらの課題に対処するために,Polar,Low-Density Parity-Check(LDPC),Bose-Chaudhuri-Hocquenghem(BCH)など,複数の線形ブロックコードを扱う,コードに依存しないトランスフォーマーベースのデコードアーキテクチャを提案する。これを実現するために、標準化されたユニットが様々なコードタイプにまたがるパラメータを調和させるのに使われ、再設計された統一されたアテンションモジュールは様々なコードワードの構造情報を圧縮する。さらに、パリティチェック行列のスパース性から派生したスパースマスクを導入し、情報とパリティチェックビット間の固有の制約を捕捉し、復号精度とロバスト性を向上させる。広範にわたる実験結果から,提案手法は既存の手法に勝るだけでなく,次世代無線通信システムに対して,柔軟性,効率,高性能なソリューションを提供することが示された。 Channel coding is vital for reliable data transmission in modern wireless systems, and its significance will increase with the emergence of sixth-generation (6G) networks, which will need to support various error correction codes. However, traditional decoders were typically designed as fixed hardware circuits tailored to specific decoding algorithms, leading to inefficiencies and limited flexibility. To address these challenges, this paper proposes a unified, code-agnostic Transformer-based decoding architecture capable of handling multiple linear block codes, including Polar, Low-Density Parity-Check (LDPC), and Bose-Chaudhuri-Hocquenghem (BCH), within a single framework. To achieve this, standardized units are employed to harmonize parameters across different code types, while the redesigned unified attention module compresses the structural information of various codewords. Additionally, a sparse mask, derived from the sparsity of the parity-check matrix, is introduced to enhance the model's ability to capture inherent constraints between information and parity-check bits, resulting in improved decoding accuracy and robustness. 
Extensive experimental results demonstrate that the proposed unified Transformer-based decoder not only outperforms existing methods but also provides a flexible, efficient, and high-performance solution for next-generation wireless communication systems.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# 生成拡散モデルにおける潜在抽象化 Latent Abstractions in Generative Diffusion Models ( http://arxiv.org/abs/2410.03368v1 ) ライセンス: Link先を確認	Giulio Franzese, Mattia Martini, Giulio Corallo, Paolo Papotti, Pietro Michiardi,	(参考訳) 本研究では,拡散に基づく生成モデルが画像などの高次元データをどのように生成するかを,低次元の潜在抽象概念の表現に暗黙的に頼って検討し,生成過程を導く。我々は,NLFを拡張した理論的枠組みを提案し,SDEに基づく生成モデルについて一意に考察する。本理論の進展は, 関節(状態および測定)力学の新たな定式化と, システム状態が測定過程に与える影響の情報理論的尺度に依存する。我々の理論によれば、拡散モデルはSDEのシステムとしてキャストすることができ、観測不可能な遅延抽象の進化が観測可能な測定過程(生成経路に対応する)のダイナミクスを操縦する非線形フィルタを記述することができる。さらに、生成過程の異なる段階における潜伏抽象の出現に関する、我々の理論と過去の経験的結果を検証するための実証的研究を行った。 In this work we study how diffusion-based generative models produce high-dimensional data, such as an image, by implicitly relying on a manifestation of a low-dimensional set of latent abstractions, that guide the generative process. We present a novel theoretical framework that extends NLF, and that offers a unique perspective on SDE-based generative models. The development of our theory relies on a novel formulation of the joint (state and measurement) dynamics, and an information-theoretic measure of the influence of the system state on the measurement process. According to our theory, diffusion models can be cast as a system of SDE, describing a non-linear filter in which the evolution of unobservable latent abstractions steers the dynamics of an observable measurement process (corresponding to the generative pathways). In addition, we present an empirical study to validate our theory and previous empirical results on the emergence of latent abstractions at different stages of the generative process.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# インターバル境界の伝播を再び大きくする Make Interval Bound Propagation great again ( http://arxiv.org/abs/2410.03373v1 ) ライセンス: Link先を確認	Patryk Krukowski, Daniel Wilczak, Jacek Tabor, Anna Bielawska, Przemysław Spurek,	(参考訳) 医療データ分析,自律運転,敵対的訓練など,実生活に動機づけられたさまざまなシナリオにおいて,我々は,堅牢なディープネットワークに関心を持っている。入力の比較的小さな摂動が出力の劇的な変化(クラスの変更など)を引き起こすことができない場合、ネットワークは堅牢である。これは、NNC(Neural Network Certification)の幅広い分野に該当する。 NNCにおける2つの重要な問題は、与えられた事前訓練されたネットワークの堅牢性を計算する方法と、堅牢なネットワークを構築する方法である。堅牢なネットワークを構築するための一般的なアプローチは、インターバルバウンド・プロパゲーション (Interval Bound Propagation, IBP) である。本報告では,IBPはラップ効果の影響を受けやすいことから,第1の場合では準最適であることを示す。線形活性化においても、IBPは強い準最適境界を与える。したがって、ラップ効果の影響を受けない戦略を用いて最適に近い境界を得る必要がある。我々は、ニューラルネットワークのラップ効果を軽減するために、厳密な計算に特化した2つの古典的なアプローチ、Dubleton ArithmeticとAffine Arithmeticを適用する。これらの手法は線形活性化関数を持つネットワークに対して正確な結果をもたらし、ラップ効果に抵抗する。その結果,IBPよりも最適値に近い値が得られることがわかった。 In various scenarios motivated by real life, such as medical data analysis, autonomous driving, and adversarial training, we are interested in robust deep networks. A network is robust when a relatively small perturbation of the input cannot lead to drastic changes in output (like change of class, etc.). This falls under the broader scope field of Neural Network Certification (NNC). Two crucial problems in NNC are of profound interest to the scientific community: how to calculate the robustness of a given pre-trained network and how to construct robust networks. The common approach to constructing robust networks is Interval Bound Propagation (IBP). This paper demonstrates that IBP is sub-optimal in the first case due to its susceptibility to the wrapping effect. Even for linear activation, IBP gives strongly sub-optimal bounds. Consequently, one should use strategies immune to the wrapping effect to obtain bounds close to optimal ones. We adapt two classical approaches dedicated to strict computations -- Dubleton Arithmetic and Affine Arithmetic -- to mitigate the wrapping effect in neural networks. 
These techniques yield precise results for networks with linear activation functions, thus resisting the wrapping effect. As a result, we achieve bounds significantly closer to the optimal level than IBPs.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# SoundSignature:どんな音楽が好き? SoundSignature: What Type of Music Do You Like? ( http://arxiv.org/abs/2410.03375v1 ) ライセンス: Link先を確認	Brandon James Carone, Pablo Ripollés,	(参考訳) SoundSignatureは、ユーザーのお気に入りの曲を分析するためにカスタムのOpenAIアシスタントを統合する音楽アプリケーションである。このシステムには最先端の音楽情報検索(MIR)Pythonパッケージが組み込まれており、抽出された音響的・音楽的特徴と、アシスタントのアーティストやバンドに関する広範な知識を組み合わせている。この知識を組み合わせることでSoundSignatureは、新たなIoT of Sounds(IoS)エコシステムのセマンティックオーディオと原則を活用し、MIRとAIを統合して、音楽の音響特性に関するパーソナライズされた洞察をユーザに提供する。ユーザーはチャットボットと対話して、演奏された音響分析と音楽の味との関係についてより深い質問をすることができる。この対話性はアプリケーションを変え、親しみのある曲やお気に入りの曲に関する情報資源としてだけでなく、ユーザーが音楽の特徴、音楽理論、信号処理でよく使われる音響特性、そして音楽の背後にあるアーティストの理解を深めるための教育プラットフォームとしても機能する。一般的なユーザビリティ以外にも、コード認識アルゴリズム(CREMA)、ソース分離アルゴリズム(DEMUCS)、オーディオ・トゥ・MIDIコンバータ(基本ピッチ)など、確立されたオープンソースのミュージシャン固有のツールが組み込まれている。これらの機能は、コーディングスキルのないユーザが、チャットボットと対話することで、高度なオープンソースの音楽処理アルゴリズムにアクセスできるようにする。本稿では,アプリケーションの革新的な特徴と教育的可能性を強調し,その有効性とユーザビリティを評価するパイロットユーザ研究から得られた知見を紹介する。 SoundSignature is a music application that integrates a custom OpenAI Assistant to analyze users' favorite songs. The system incorporates state-of-the-art Music Information Retrieval (MIR) Python packages to combine extracted acoustic/musical features with the assistant's extensive knowledge of the artists and bands. Capitalizing on this combined knowledge, SoundSignature leverages semantic audio and principles from the emerging Internet of Sounds (IoS) ecosystem, integrating MIR with AI to provide users with personalized insights into the acoustic properties of their music, akin to a musical preference personality report. Users can then interact with the chatbot to explore deeper inquiries about the acoustic analyses performed and how they relate to their musical taste. 
This interactivity transforms the application, acting not only as an informative resource about familiar and/or favorite songs, but also as an educational platform that enables users to deepen their understanding of musical features, music theory, acoustic properties commonly used in signal processing, and the artists behind the music. Beyond general usability, the application also incorporates several well-established open-source musician-specific tools, such as a chord recognition algorithm (CREMA), a source separation algorithm (DEMUCS), and an audio-to-MIDI converter (basic-pitch). These features allow users without coding skills to access advanced, open-source music processing algorithms simply by interacting with the chatbot (e.g., can you give me the stems of this song?). In this paper, we highlight the application's innovative features and educational potential, and present findings from a pilot user study that evaluates its efficacy and usability.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# ベクトル量子化による深部強化学習における逆方向摂動の緩和 Mitigating Adversarial Perturbations for Deep Reinforcement Learning via Vector Quantization ( http://arxiv.org/abs/2410.03376v1 ) ライセンス: Link先を確認	Tung M. Luu, Thanh Nguyen, Tee Joshua Tian Jin, Sungwoon Kim, Chang D. Yoo,	(参考訳) 近年の研究では、訓練における優れた強化学習(RL)エージェントは、デプロイメント中に敵の摂動に対してレジリエンスを欠いていることが示されている。これは、現実世界にデプロイする前に堅牢なエージェントを構築することの重要性を強調している。従来の作業は、ディープニューラルネットワークコンポーネント自体の堅牢性の向上や、強力な攻撃に対するエージェントの対角的なトレーニングなど、この問題に対処するための堅牢なトレーニングベースの手順の開発に重点を置いていた。そこで本研究では,RLの入力変換に基づくディフェンスについて検討する。具体的には、ベクトル量子化(VQ)の変種を入力観測の変換として使用し、テスト中の敵攻撃の空間を削減し、その結果、変換された観測は攻撃の影響を受けない。本手法は, 計算効率が高く, 対人訓練とシームレスに統合され, 対人攻撃に対するRLエージェントの堅牢性をさらに向上する。複数の環境における広範囲な実験を通して、VQを入力変換として使用すると、エージェントの観察に対する敵の攻撃に対して効果的に防御できることを示した。 Recent studies reveal that well-performing reinforcement learning (RL) agents in training often lack resilience against adversarial perturbations during deployment. This highlights the importance of building a robust agent before deploying it in the real world. Most prior works focus on developing robust training-based procedures to tackle this problem, including enhancing the robustness of the deep neural network component itself or adversarially training the agent on strong attacks. In this work, we instead study an input transformation-based defense for RL. Specifically, we propose using a variant of vector quantization (VQ) as a transformation for input observations, which is then used to reduce the space of adversarial attacks during testing, resulting in the transformed observations being less affected by attacks. Our method is computationally efficient and seamlessly integrates with adversarial training, further enhancing the robustness of RL agents against adversarial attacks. Through extensive experiments in multiple environments, we demonstrate that using VQ as the input transformation effectively defends against adversarial attacks on the agent's observations.	翻訳日:2024-11-02 22:48:52 公開日:2024-10-04
# 因果微分ネットワークによる摂動目標予測 Predicting perturbation targets with causal differential networks ( http://arxiv.org/abs/2410.03380v1 ) ライセンス: Link先を確認	Menghua Wu, Umesh Padia, Sean H. Murphy, Regina Barzilay, Tommi Jaakkola,	(参考訳) 生物学的システムの変更に関与する変数を相対的に同定することで、病気の理解や細胞工学における無数の応用が可能になる。因果関係の観点から、同じ因果関係モデルによって生成された2つのデータセット、観察的(制御)と介入的(摂動)が与えられる。目的は、介入の標的である測定変数(eg遺伝子)のサブセットを分離することである。因果グラフを知ることは、探索空間を制限し、これらの変数を効率的に特定することを可能にする。しかしながら、未知の介入目標が存在する場合、因果グラフを推定する現在のアルゴリズムは、グラフの組合せ空間と一貫した介入目標を共同で探索する必要があるため、生物学的データ中の数百から数千の変数に不適切にスケールする。本研究では,2つの探索ステップを分離する摂動目標の予測に因果性に着想を得たアプローチを提案する。まず, 因果グラフを観察データと介入データから別々に推定するために, 償却因果探索モデルを用いる。そして、これらのペアグラフを、教師付き学習フレームワークにおいて、介入された変数の集合にマッピングすることを学ぶ。このアプローチは、数千の変数を持つ7つのシングルセルトランスクリプトミクスデータセットの摂動モデリングのベースラインを一貫して上回る。また、6つの因果探索アルゴリズムに対して、様々な抽出可能な合成データセットの介入目標を予測することで、大幅な改善を示す。 Rationally identifying variables responsible for changes to a biological system can enable myriad applications in disease understanding and cell engineering. From a causality perspective, we are given two datasets generated by the same causal model, one observational (control) and one interventional (perturbed). The goal is to isolate the subset of measured variables (e.g. genes) that were the targets of the intervention, i.e. those whose conditional independencies have changed. Knowing the causal graph would limit the search space, allowing us to efficiently pinpoint these variables. However, current algorithms that infer causal graphs in the presence of unknown intervention targets scale poorly to the hundreds or thousands of variables in biological data, as they must jointly search the combinatorial spaces of graphs and consistent intervention targets. In this work, we propose a causality-inspired approach for predicting perturbation targets that decouples the two search steps. First, we use an amortized causal discovery model to separately infer causal graphs from the observational and interventional datasets. 
Then, we learn to map these paired graphs to the sets of variables that were intervened upon, in a supervised learning framework. This approach consistently outperforms baselines for perturbation modeling on seven single-cell transcriptomics datasets, each with thousands of measured variables. We also demonstrate significant improvements over six causal discovery algorithms in predicting intervention targets across a variety of tractable, synthetic datasets.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# マシン内のcogs、それがすべきことをする -- WMT24の一般翻訳タスクへのAMIサブミッション Cogs in a Machine, Doing What They're Meant to Do -- The AMI Submission to the WMT24 General Translation Task ( http://arxiv.org/abs/2410.03381v1 ) ライセンス: Link先を確認	Atli Jasonarson, Hinrik Hafsteinsson, Bjarki Ármannsson, Steinþór Steingrímsson,	(参考訳) 本稿では,WMT24の一般翻訳課題に対する「アルニ・マグヌッソン研究所のチーム」の提出について述べる。我々は、英語からアイスランド語への翻訳の方向について研究している。本システムは4つの翻訳モデルと文法補正モデルから構成される。モデルのトレーニングには、データセットを慎重にキュレートし、システムの出力の品質に有害な可能性のある文ペアを積極的にフィルタリングします。データのいくつかは人間の翻訳から収集され、いくつかは人工的に生成される。合成データの一部がLLMを用いて生成され,システムの翻訳能力が著しく向上することがわかった。 This paper presents the submission of the Árni Magnusson Institute's team to the WMT24 General translation task. We work on the English->Icelandic translation direction. Our system comprises four translation models and a grammar correction model. For training our models we carefully curate our datasets, aggressively filtering out sentence pairs that may detrimentally affect the quality of our system's output. Some of our data are collected from human translations and some are synthetically generated. A part of the synthetic data is generated using an LLM, and we find that it increases the translation capability of our system significantly.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# ポッツ量子スピン鎖上の閉じ込めと偽真空崩壊 Confinement and false vacuum decay on the Potts quantum spin chain ( http://arxiv.org/abs/2410.03382v1 ) ライセンス: Link先を確認	Octavio Pomponio, Anna Krasznai, Gábor Takács,	(参考訳) 強磁性状態における混合体三状態ポッツ量子鎖における量子クエンチ後の非平衡ダイナミクスを考察する。イジングスピン鎖の類似した設定と比較すると、ポッツモデルはよりリッチな現象論を持ち、これは部分的にスペクトルのバリオン励起から始まり、部分的には初期磁化と長手磁場の様々な相対的アライメントから生じる。半古典的近似と正確な対角化を組み合わせることで励起スペクトルを求め、その結果を用いて観察する様々な動的挙動を説明する。ダイナミックな閉じ込めの回復に加え、イジング連鎖に似たブロッホ振動によるワニエ・スタークの局在は、クエンチ分光におけるバリオン励起の存在を特徴としている。さらに、初期磁化と長手磁場が不一致の場合には、閉じ込めとブロッホ振動の両方が部分的な局所化をもたらすだけであり、いくつかの相関は、エンタングルメントエントロピーの対応する成長とともに、抑制されていない光円錐挙動を保持する。 We consider non-equilibrium dynamics after quantum quenches in the mixed-field three-state Potts quantum chain in the ferromagnetic regime. Compared to the analogous setting for the Ising spin chain, the Potts model has a much richer phenomenology, which originates partly from baryonic excitations in the spectrum and partly from the various possible relative alignments of the initial magnetisation and the longitudinal field. We obtain the excitation spectrum by combining semi-classical approximation and exact diagonalisation, and we use the results to explain the various dynamical behaviours we observe. Besides recovering dynamical confinement, as well as Wannier-Stark localisation due to Bloch oscillations similar to the Ising chain, a novel feature is the presence of baryonic excitations in the quench spectroscopy. In addition, when the initial magnetisation and the longitudinal field are misaligned, both confinement and Bloch oscillations only result in partial localisation, with some correlations retaining an unsuppressed light-cone behaviour together with a corresponding growth of entanglement entropy.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# てんかん発作の分類から検出へ:脳波信号の深層学習に基づくアプローチ From Epilepsy Seizures Classification to Detection: A Deep Learning-based Approach for Raw EEG Signals ( http://arxiv.org/abs/2410.03385v1 ) ライセンス: Link先を確認	Davy Darankoum, Manon Villalba, Clelia Allioux, Baptiste Caraballo, Carine Dumont, Eloise Gronlier, Corinne Roucard, Yann Roche, Chloe Habermacher, Sergei Grudinin, Julien Volle,	(参考訳) てんかんは世界で最も多い神経疾患である。内側側頭葉てんかん(MTLE)患者の3分の1は薬剤耐性を示し、新しい治療の必要性を訴えている。抗てんかん発作薬(ASM)開発における重要な役割は、脳波(EEG)信号で発生するてんかん発作を検出・定量する能力であり、治療効果の評価に欠かせない。本研究では,生の脳波信号に適用した深層学習モデルに基づく発作検出パイプラインを提案する。このパイプラインは、発作と発作のない活動を事前に区別せずに、連続した生の脳波信号をセグメント化する新しい前処理技術、脳波のセグメントを再編成し、発作の開始/終了の識別を可能にする後処理アルゴリズム、そして最後に、予測されたラベルと実際のラベルの厳密な発作イベントの比較に基づく新しい評価手順を統合する。データ漏洩の可能性に対処するデータ分割戦略を使用して、モデルトレーニングが実施されている。発作分類と発作検出タスクの基本的な相違を実証し,2つのタスクのパフォーマンスの相違を示した。最後に、畳み込みニューラルネットワークとトランスフォーマーエンコーダを組み合わせて、最高のアーキテクチャの種間での一般化能力を実証した。モデルは動物の脳波でトレーニングされ、バランスの取れたボンデータセット上でF1スコアの93%で人間の脳波でテストされた。 Epilepsy represents the most prevalent neurological disease in the world. One-third of people suffering from mesial temporal lobe epilepsy (MTLE) exhibit drug resistance, urging the need to develop new treatments. A key part in anti-seizure medication (ASM) development is the capability of detecting and quantifying epileptic seizures occurring in electroencephalogram (EEG) signals, which is crucial for treatment efficacy evaluation. In this study, we introduced a seizure detection pipeline based on deep learning models applied to raw EEG signals. This pipeline integrates: a new pre-processing technique which segments continuous raw EEG signals without prior distinction between seizure and seizure-free activities; a post-processing algorithm developed to reassemble EEG segments and allow the identification of seizures start/end; and finally, a new evaluation procedure based on a strict seizure events comparison between predicted and real labels. 
Models training have been performed using a data splitting strategy which addresses the potential for data leakage. We demonstrated the fundamental differences between a seizure classification and a seizure detection task and showed the differences in performance between the two tasks. Finally, we demonstrated the generalization capabilities across species of our best architecture, combining a Convolutional Neural Network and a Transformer encoder. The model was trained on animal EEGs and tested on human EEGs with a F1-score of 93% on a balanced Bonn dataset.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# 行動データを用いた慢性疾患診断 Chronic Disease Diagnoses Using Behavioral Data ( http://arxiv.org/abs/2410.03386v1 ) ライセンス: Link先を確認	Di Wang, Yidan Hu, Eng Sing Lee, Hui Hwang Teong, Ray Tian Rui Lai, Wai Han Hoi, Chunyan Miao,	(参考訳) 慢性疾患の早期発見は、タイムリーな介入の黄金の機会を提供することによって、医療にとって有益である。多くの先行研究は、疾患の診断に機械学習(ML)モデルを使うことに成功しているが、医療データに大きく依存しており、慢性疾患の初期段階のほとんどの患者には不十分である。本稿では, 糖尿病, 高脂血症, 高血圧症(総称して3H) の診断を, 臨床現場で収集した医療データを用いることなく早期に3Hの検出を可能にすることを目的とする。具体的には、3ヶ月の学習期間で629人の被験者から毎日の行動データを収集し、データ前処理後のさまざまなMLモデルを訓練した。実験の結果, 糖尿病, 高脂血症, 高血圧の3H診断は, それぞれ80.2%, 71.3%, 81.2%であった。さらに、訓練されたモデル上でShapley分析を行い、疾患の種類ごとに最も影響のある特徴を特定する。特定された影響力のある特徴は、文献で報告された特徴と一致している。 Early detection of chronic diseases is beneficial to healthcare by providing a golden opportunity for timely interventions. Although numerous prior studies have successfully used machine learning (ML) models for disease diagnoses, they highly rely on medical data, which are scarce for most patients in the early stage of the chronic diseases. In this paper, we aim to diagnose hyperglycemia (diabetes), hyperlipidemia, and hypertension (collectively known as 3H) using our own collected behavioral data, thus enabling the early detection of 3H without using medical data collected in clinical settings. Specifically, we collected daily behavioral data from 629 participants over a 3-month study period, and trained various ML models after data preprocessing. Experimental results show that only using the participants' uploaded behavioral data, we can achieve accurate 3H diagnoses: 80.2%, 71.3%, and 81.2% for diabetes, hyperlipidemia, and hypertension, respectively. Furthermore, we conduct Shapley analysis on the trained models to identify the most influential features for each type of diseases. The identified influential features are consistent with those reported in the literature.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# イオン輸送における量子効果:熱力学資源論のアプローチ Quantum Effects in Ion Transport: A Thermodynamic Resource Theory Approach ( http://arxiv.org/abs/2410.03389v1 ) ライセンス: Link先を確認	Amin Mohammadi, Afshin Shafiee,	(参考訳) 近年、量子状態における熱力学の理解は、ナノスケール物理学や実験技術の発展によって大きな注目を集めている。並行して、成長する証拠は、様々な生物学的プロセスにおける量子効果の重要性を支持し、量子熱力学に関係がますます高まっている。本研究では、熱力学の資源理論を応用し、細胞膜を横断するイオン輸送における量子的性質の役割を解明する。この枠組みの中で、量子的性質は、量子状態における一般化熱力学的制約の下で資源として扱われる。具体的には、イオン輸送力学におけるメモリ効果を反映する非マルコビアン性は、イオン輸送過程の収量と効率を高める重要な量子資源として機能することを示した。対照的に、イオン輸送タンパク質のエネルギー状態の重畳として表される量子コヒーレンスは、これらの指標を減らし、イオンチャネルとイオンポンプを区別する重要な役割を担っている。最後に、余剰コヒーレントシステムを導入することで、コヒーレンスによりイオンポンプのイオンチャネルへの変換が容易になることを示す。 In recent years, understanding thermodynamics in the quantum regime has garnered significant attention, driven by advances in nanoscale physics and experimental techniques. In parallel, growing evidence supports the importance of quantum effects in various biological processes, making them increasingly relevant to quantum thermodynamics. In this study, we apply resource theory formulations of thermodynamics to investigate the role of quantum properties in ion transport across cell membranes. Within this framework, quantum properties are treated as resources under generalized thermodynamic constraints in the quantum regime. Specifically, our findings reveal that non-Markovianity, which reflects memory effects in ion transport dynamics, serves as a key quantum resource that enhances the yield and efficiency of the ion transport process. In contrast, quantum coherence, manifested as the superposition of energy states in ion-transport proteins, reduces these metrics but plays a crucial role in distinguishing between ion channels and ion pumps: two distinct types of ion-transport proteins in cell membranes. Finally, we demonstrate that introducing an additional coherent system allows coherence to facilitate the transformation of an ion pump into an ion channel.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# Lightning UQ Box: ディープラーニングにおける不確実性定量化のための総合的なフレームワーク Lightning UQ Box: A Comprehensive Framework for Uncertainty Quantification in Deep Learning ( http://arxiv.org/abs/2410.03390v1 ) ライセンス: Link先を確認	Nils Lehmann, Jakob Gawlikowski, Adam J. Stewart, Vytautas Jancauskas, Stefan Depeweg, Eric Nalisnick, Nina Maria Gottschling,	(参考訳) 不確実性定量化(英: Uncertainty Quantification、UQ)は、ディープニューラルネットワーク(DNN)を現実世界のタスクに適用するための重要なツールである。しかし、その利点にもかかわらず、UQは既存のUQ手順を適用し評価するために必要な追加の技術知識のため、標準のDNNワークフローから外されることが多い。したがって、ユーザーは大きなオーバーヘッドを伴わずに、UQをモデリングワークフローに統合できる包括的なツールボックスが必要である。本稿では,UQ に対する様々なアプローチを適用し評価するための統一インターフェースである Lightning UQ Box を紹介する。本稿では,ツールボックスに実装された最先端のUQ手法を理論的,定量的に比較する。私たちは2つの挑戦的なビジョンタスクに焦点を合わせます。 (i)赤外線衛星画像から熱帯低気圧風速の推定 (ii)空のRGB画像から太陽電池パネルの出力を推定する。方法の違いを強調することで、我々の結果は、UQメソッドのベンチマークに使用できる、広範かつアプローチ可能なUQの実験フレームワークの必要性を示しています。ツールボックス、実装例、その他の情報は、https://github.com/lightning-uq-box/lightning-uq-boxで確認できる。 Uncertainty quantification (UQ) is an essential tool for applying deep neural networks (DNNs) to real world tasks, as it attaches a degree of confidence to DNN outputs. However, despite its benefits, UQ is often left out of the standard DNN workflow due to the additional technical knowledge required to apply and evaluate existing UQ procedures. Hence there is a need for a comprehensive toolbox that allows the user to integrate UQ into their modelling workflow, without significant overhead. We introduce Lightning UQ Box: a unified interface for applying and evaluating various approaches to UQ. In this paper, we provide a theoretical and quantitative comparison of the wide range of state-of-the-art UQ methods implemented in our toolbox. We focus on two challenging vision tasks: (i) estimating tropical cyclone wind speeds from infrared satellite imagery and (ii) estimating the power output of solar panels from RGB images of the sky. 
By highlighting the differences between methods our results demonstrate the need for a broad and approachable experimental framework for UQ, that can be used for benchmarking UQ methods. The toolbox, example implementations, and further information are available at: https://github.com/lightning-uq-box/lightning-uq-box	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# 直線偏光に対する量子回路複雑性 Quantum circuit complexity for linearly polarised light ( http://arxiv.org/abs/2410.03391v1 ) ライセンス: Link先を確認	E. M. F. Curado, S. Faci, J. P. Gazeau, T. Koide, A. C. Maioli, D. Noguera,	(参考訳) 本研究では,開放系に拡張可能な量子回路複雑性の一形式を検討する。この方法論を説明するために、状態の射影ヒルベルト空間がユークリッド平面上の向きの集合で表される基本的なモデルに着目する。具体的には、混合量子状態が一連のゲートと相互作用する際のダイナミクスを調べる。我々のアプローチでは,実 $2\times2$ 密度行列の列を解析する。この数学的モデルは、準単色光ビームの直線偏光を記述するストークス密度行列と、量子偏光子とみなされ、その状態もまた実 $2\times2$ 密度行列であるゲートによって物理的に例示される。偏光子と直線偏光との相互作用は、この量子形式の文脈で解釈される。光の各密度行列は、連続するゲート間の時間間隔において、Gorini-Kossakowski-Lindblad-Sudarshan(GKLS)過程に類似した形で発展する。特に、コスト関数、許容誤差、または精度に上限を設けると、最適なゲート数がべき乗則に従うことが分かる。 In this study, we explore a form of quantum circuit complexity that extends to open systems. To illustrate our methodology, we focus on a basic model where the projective Hilbert space of states is depicted by the set of orientations in the Euclidean plane. Specifically, we investigate the dynamics of mixed quantum states as they undergo interactions with a sequence of gates. Our approach involves the analysis of sequences of real $2\times2$ density matrices. This mathematical model is physically exemplified by the Stokes density matrices, which delineate the linear polarisation of a quasi-monochromatic light beam, and the gates, which are viewed as quantum polarisers, whose states are also real $2\times2$ density matrices. The interaction between polariser-linearly polarised light is construed within the context of this quantum formalism. Each density matrix for the light evolves in an approach analogous to a Gorini-Kossakowski-Lindblad-Sudarshan (GKLS) process during the time interval between consecutive gates. Notably, when considering an upper limit for the cost function or tolerance or accuracy, we unearth that the optimal number of gates follows a power-law relationship.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# 1つの石で2匹のハエを殺す: 英語→アイスランド語の慣用句と固有名を使ってLLMを破ろうとする試み Killing Two Flies with One Stone: An Attempt to Break LLMs Using English->Icelandic Idioms and Proper Names ( http://arxiv.org/abs/2410.03394v1 ) ライセンス: Link先を確認	Bjarki Ármannsson, Hinrik Hafsteinsson, Atli Jasonarson, Steinþór Steingrímsson,	(参考訳) 本稿では,WMT24テストスイートのサブタスクに対するÁrni Magnússon Instituteのチームの提出内容を紹介し,英語からアイスランド語への翻訳方向における慣用表現と固有名に焦点をあてる。直感的にも経験的にも、慣用句や固有名は現代の翻訳モデルにとって重要な課題であることが知られている。 2つの異なるテストスイートを作成した。 1つ目は、一般的な英語の慣用表現を翻訳するMTシステムの能力を評価するとともに、同じフレーズが文字どおりの文脈で使われた場合と慣用表現とをシステムが区別できるかをテストする。2つ目のテストスイートは、アイスランド語のエクソニム(外名)に翻訳され、かつ正しく語形変化されるべき地名と、男性形と女性形で表層形が同じになるアイスランド語の名前のペアからなり、誤訳は可読性だけでなく意味にも影響する。報告されたスコアは比較的低く、特に慣用表現と地名で低いことから、改善の余地がかなりあることを示している。 This paper presents the submission of the Árni Magnússon Institute's team to the WMT24 test suite subtask, focusing on idiomatic expressions and proper names for the English->Icelandic translation direction. Intuitively and empirically, idioms and proper names are known to be a significant challenge for modern translation models. We create two different test suites. The first evaluates the competency of MT systems in translating common English idiomatic expressions, as well as testing whether systems can distinguish between those expressions and the same phrases when used in a literal context. The second test suite consists of place names that should be translated into their Icelandic exonyms (and correctly inflected) and pairs of Icelandic names that share a surface form between the male and female variants, so that incorrect translations impact meaning as well as readability. The scores reported are relatively low, especially for idiomatic expressions and place names, and indicate considerable room for improvement.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# GraphCroc: グラフ構造再構築のための相互相関オートエンコーダ GraphCroc: Cross-Correlation Autoencoder for Graph Structural Reconstruction ( http://arxiv.org/abs/2410.03396v1 ) ライセンス: Link先を確認	Shijin Duan, Ruyi Ding, Jiaxing He, Aidong Adam Ding, Yunsi Fei, Xiaolin Xu,	(参考訳) グラフ構造化データは、多くのアプリケーションに不可欠なものであり、様々なグラフ表現法の開発を促す。グラフオートエンコーダ(GAE)、特にノード埋め込みからグラフ構造を再構築する。現在のGAEモデルは、主に自己相関を利用してグラフ構造を表現し、しばしばマルチグラフシナリオを見下ろすノードレベルのタスクに焦点を当てている。我々の理論的分析は、一般に島、対称構造、方向エッジといった特定のグラフの特徴、特により小さいグラフや複数のグラフの文脈において正確に表現できないことを示唆している。これらの制約に対処するために,GAE表現能力を著しく向上する相互相関機構を導入する。さらに,様々な下流タスクに適したフレキシブルエンコーダアーキテクチャをサポートする新しいGAEであるGraphCrocを提案する。このモデルは、損失分散戦略を実装することにより、最適化中の表現バイアスの課題にも対処する。理論的解析と数値評価の両方で、我々の手法はグラフ構造再構築において既存の自己相関に基づくGAEよりも著しく優れていることが示されている。 Graph-structured data is integral to many applications, prompting the development of various graph representation methods. Graph autoencoders (GAEs), in particular, reconstruct graph structures from node embeddings. Current GAE models primarily utilize self-correlation to represent graph structures and focus on node-level tasks, often overlooking multi-graph scenarios. Our theoretical analysis indicates that self-correlation generally falls short in accurately representing specific graph features such as islands, symmetrical structures, and directional edges, particularly in smaller or multiple graph contexts. To address these limitations, we introduce a cross-correlation mechanism that significantly enhances the GAE representational capabilities. Additionally, we propose GraphCroc, a new GAE that supports flexible encoder architectures tailored for various downstream tasks and ensures robust structural reconstruction, through a mirrored encoding-decoding process. This model also tackles the challenge of representation bias during optimization by implementing a loss-balancing strategy. 
Both theoretical analysis and numerical evaluations demonstrate that our methodology significantly outperforms existing self-correlation-based GAEs in graph structure reconstruction.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# EBES: イベントシーケンスのベンチマークを容易にする EBES: Easy Benchmarking for Event Sequences ( http://arxiv.org/abs/2410.03399v1 ) ライセンス: Link先を確認	Dmitry Osin, Igor Udovichenko, Viktor Moskvoretskii, Egor Shvetsov, Evgeny Burnaev,	(参考訳) イベントシーケンスは、不規則なサンプリング間隔とカテゴリと数値の混合によって特徴づけられ、医療、ファイナンス、ユーザーインタラクションログといった様々な現実世界のドメインで一般的なデータ構造である。時間データモデリング技術の進歩にもかかわらず、イベントシーケンスのパフォーマンスを評価するための標準ベンチマークは存在しない。これは、様々な評価プロトコルによって異なる論文間での結果の比較を複雑にし、この分野の進歩を誤解させる可能性がある。本稿では,標準化された評価シナリオとプロトコルを備えた総合的なベンチマークツールEBESを紹介する。私たちのライブラリは、統一インターフェースによるベンチマーク、データセットの追加、メソッド統合を簡単にします。それは、新しい合成データセットを含み、公開可能な最大の銀行用データセットを含む、前処理された現実世界のデータセットを提供する。この結果から,データセットの詳細な分析を行い,モデル比較に不適なデータセットを同定した。本稿では、時間的およびシーケンシャルなコンポーネントのモデリングの重要性と、モデルの堅牢性とスケーリング特性について考察する。これらの知見は今後の研究の方向性を浮き彫りにしている。本ベンチマークの目的は,再現可能な研究の促進,進歩の迅速化,実環境への影響の増大である。 Event sequences, characterized by irregular sampling intervals and a mix of categorical and numerical features, are common data structures in various real-world domains such as healthcare, finance, and user interaction logs. Despite advances in temporal data modeling techniques, there is no standardized benchmarks for evaluating their performance on event sequences. This complicates result comparison across different papers due to varying evaluation protocols, potentially misleading progress in this field. We introduce EBES, a comprehensive benchmarking tool with standardized evaluation scenarios and protocols, focusing on regression and classification problems with sequence-level targets. Our library simplifies benchmarking, dataset addition, and method integration through a unified interface. It includes a novel synthetic dataset and provides preprocessed real-world datasets, including the largest publicly available banking dataset. Our results provide an in-depth analysis of datasets, identifying some as unsuitable for model comparison. We investigate the importance of modeling temporal and sequential components, as well as the robustness and scaling properties of the models. 
These findings highlight potential directions for future research. Our benchmark aim is to facilitate reproducible research, expediting progress and increasing real-world impacts.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# 分散ネットワーク型マルチタスク学習 Distributed Networked Multi-task Learning ( http://arxiv.org/abs/2410.03403v1 ) ライセンス: Link先を確認	Lingzhou Hong, Alfredo Garcia,	(参考訳) 異種および/または相関したデータストリームを含む複数の線形モデル推定タスクを考慮に入れた分散マルチタスク学習方式を提案する。ノードを異なる学習タスクに対応するグループに分割し、有向ネットワークトポロジに従って通信することができると仮定する。各ノードは、線形モデルを非同期に推定し、それぞれ雑音低減と一般化性能の向上を目的とした局所(グループ内)正則化と大域(グループ間)正則化の条件を満たす。本稿では,推定器の収束度とタスク関係を有限時間で評価し,ランダム場温度推定と,異なる学区の学生のパフォーマンスのモデル化という2つの例において,スキームの一般適用性を説明する。 We consider a distributed multi-task learning scheme that accounts for multiple linear model estimation tasks with heterogeneous and/or correlated data streams. We assume that nodes can be partitioned into groups corresponding to different learning tasks and communicate according to a directed network topology. Each node estimates a linear model asynchronously and is subject to local (within-group) regularization and global (across groups) regularization terms targeting noise reduction and generalization performance improvement respectively. We provide a finite-time characterization of convergence of the estimators and task relation and illustrate the scheme's general applicability in two examples: random field temperature estimation and modeling student performance from different academic districts.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# Camel: 差分プライバシのシャッフルモデルにおける通信効率が高く悪意ある攻撃者に対して安全な連合学習 Camel: Communication-Efficient and Maliciously Secure Federated Learning in the Shuffle Model of Differential Privacy ( http://arxiv.org/abs/2410.03407v1 ) ライセンス: Link先を確認	Shuangqing Xu, Yifeng Zheng, Zhongyun Hua,	(参考訳) 連合学習(FL)は、複数のクライアントがローカルのプライベートデータを公開することなく、集約用の勾配更新のみを共有してモデルを共同で訓練できる魅力的なパラダイムとして急速に普及している。それ自体がプライバシー上機微になりうる勾配更新を保護するため、形式的なプライバシー保証を与えるローカル差分プライバシー(LDP)メカニズムが研究されてきた。 LDPメカニズムでは、クライアントは勾配更新を集約のために共有する前にローカルで摂動させる。しかし、こうした手法は、大きなノイズを付加するためにモデルの有用性を大きく低下させることが知られている。より良いプライバシーと有用性のトレードオフを実現するため、近年は、摂動された勾配更新に対する中間的なシャッフル操作によってプライバシー増幅を実現する、DPのシャッフルモデルをFLに適用する流れが現れている。本稿では,この流れに沿って,DPのシャッフルモデルにおける,通信効率が高く悪意ある攻撃者に対して安全な新しいFLフレームワークであるCamelを提案する。 Camelはまず、シャッフル計算の完全性検査を意欲的にサポートし、悪意ある攻撃者に対する安全性を達成する点で既存研究と異なる。具体的には、近年注目されている暗号プリミティブである秘密分散シャッフルを基盤とし、システム全体の通信効率を最適化するための独自の技術と、サーバ側計算の安全性を強化する軽量な完全性検査のための独自の技術を開発した。さらに、FLプロセス全体のRenyi差分プライバシー(RDP)を解析することで、プライバシー損失に対して大幅にタイトな上界を導出する。大規模な実験により、Camelは最先端の研究よりも優れたプライバシーと有用性のトレードオフを、有望な性能とともに達成することを示す。 Federated learning (FL) has rapidly become a compelling paradigm that enables multiple clients to jointly train a model by sharing only gradient updates for aggregation, without revealing their local private data. In order to protect the gradient updates which could also be privacy-sensitive, there has been a line of work studying local differential privacy (LDP) mechanisms to provide a formal privacy guarantee. With LDP mechanisms, clients locally perturb their gradient updates before sharing them out for aggregation. However, such approaches are known for greatly degrading the model utility, due to heavy noise addition. To enable a better privacy-utility tradeoff, a recently emerging trend is to apply the shuffle model of DP in FL, which relies on an intermediate shuffling operation on the perturbed gradient updates to achieve privacy amplification. 
Following this trend, in this paper, we present Camel, a new communication-efficient and maliciously secure FL framework in the shuffle model of DP. Camel first departs from existing works by ambitiously supporting integrity check for the shuffle computation, achieving security against malicious adversary. Specifically, Camel builds on the trending cryptographic primitive of secret-shared shuffle, with custom techniques we develop for optimizing system-wide communication efficiency, and for lightweight integrity checks to harden the security of server-side computation. In addition, we also derive a significantly tighter bound on the privacy loss through analyzing the Renyi differential privacy (RDP) of the overall FL process. Extensive experiments demonstrate that Camel achieves better privacy-utility trade-offs than the state-of-the-art work, with promising performance.	翻訳日:2024-11-02 22:39:00 公開日:2024-10-04
# 決定変換器の予測符号化 Predictive Coding for Decision Transformer ( http://arxiv.org/abs/2410.03408v1 ) ライセンス: Link先を確認	Tung M. Luu, Donghoon Lee, Chang D. Yoo,	(参考訳) オフライン強化学習(RL)における最近の研究は、リターン条件付き教師付き学習として意思決定を定式化する効果を実証している。特に、決定変換器(DT)アーキテクチャは、様々な領域で約束されている。しかし、初期の成功にもかかわらず、DTはゴール条件付きRLのいくつかの挑戦的なデータセットでは性能が劣っている。この制限は、特に非構造的、最適でないデータセットにおいて、政策学習を導くためのリターン条件付けの非効率性に起因し、DTは時間的構成性を効果的に学習することができない。さらに、この問題は長期のスパース・リワードタスクでさらに悪化する可能性がある。この課題に対処するために、一般化された将来の条件付けを活用してDT手法を強化するPCDT(Predictive Coding for Decision Transformer)フレームワークを提案する。 PCDTはDTフレームワークを拡張し、予測的なコーディングを条件に、過去と未来の両方の要因に基づいた意思決定を可能にし、一般化を改善するアーキテクチャを利用する。提案手法は,AntMaze環境とFrankaKitchen環境の8つのデータセットに対する広範な実験を通じて,オフラインゴール条件RLにおける既存の値ベースおよびトランスフォーマーベースの手法に匹敵する性能を実現する。さらに,本手法を物理ロボットを用いた目標達成作業でも評価する。 Recent work in offline reinforcement learning (RL) has demonstrated the effectiveness of formulating decision-making as return-conditioned supervised learning. Notably, the decision transformer (DT) architecture has shown promise across various domains. However, despite its initial success, DTs have underperformed on several challenging datasets in goal-conditioned RL. This limitation stems from the inefficiency of return conditioning for guiding policy learning, particularly in unstructured and suboptimal datasets, resulting in DTs failing to effectively learn temporal compositionality. Moreover, this problem might be further exacerbated in long-horizon sparse-reward tasks. To address this challenge, we propose the Predictive Coding for Decision Transformer (PCDT) framework, which leverages generalized future conditioning to enhance DT methods. PCDT utilizes an architecture that extends the DT framework, conditioned on predictive codings, enabling decision-making based on both past and future factors, thereby improving generalization. 
Through extensive experiments on eight datasets from the AntMaze and FrankaKitchen environments, our proposed method achieves performance on par with or surpassing existing popular value-based and transformer-based methods in offline goal-conditioned RL. Furthermore, we also evaluate our method on a goal-reaching task with a physical robot.	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
# サロゲートに基づくヒューリスティック最適化のための回帰モデルとペアワイズモデルの比較研究 Comparative study of regression vs pairwise models for surrogate-based heuristic optimisation ( http://arxiv.org/abs/2410.03409v1 ) ライセンス: Link先を確認	Pablo S. Naharro, Pablo Toharia, Antonio LaTorre, José-María Peña,	(参考訳) ヒューリスティック最適化アルゴリズムは、解をサンプリングし、その適合度を評価し、有望な解の方向に探索をバイアスすることで探索空間を探索する。しかし、多くの場合、この適合度関数は高コストな計算を伴い、現実的に実行できる評価回数を大幅に減らしてしまう。この文脈において、サロゲートモデルはこうした計算上の問題を緩和する優れた代替手段として登場した。本稿では,サロゲート問題の定式化として,適合度を近似する回帰モデル(表面サロゲートモデル)と,分類モデルに結び付ける新しい方法(ペアワイズサロゲートモデル)の両方を扱う。ペアワイズアプローチは、例えば差分進化(Differential Evolution)のように、探索を進めるうえで適合度の値そのものは必要なく、ある解が別の解より優れているかどうかが分かれば十分なアルゴリズムで直接利用できる。これらのモデリングアプローチに基づき、異なる機械学習アルゴリズム(正則化回帰、ニューラルネットワーク、決定木、ブースティング法、ランダムフォレスト)や異なるサロゲート戦略(多様性の促進や予測しきい値の緩和)といったさまざまな構成の下でサロゲートモデルの多次元的な分析を行い、表面サロゲートモデルとペアワイズサロゲートモデルの両方について比較した。論文の実験部分には、SOCO2011コンペティションで提案された連続最適化のベンチマーク問題と、最近のGECCO2021 Industrial Challengeに含まれるシミュレーション問題が含まれる。本稿では,オンラインの機械学習ベースのサロゲートモデルを用いる場合,探索全体の性能は,予測モデルの精度だけでなく,正例・負例に対するバイアスの種類や,実際の適合度関数を実行するかどうかを判断する際の予測の使い方にも依存することを示す。 Heuristic optimisation algorithms explore the search space by sampling solutions, evaluating their fitness, and biasing the search in the direction of promising solutions. However, in many cases, this fitness function involves executing expensive computational calculations, drastically reducing the reasonable number of evaluations. In this context, surrogate models have emerged as an excellent alternative to alleviate these computational problems. This paper addresses the formulation of surrogate problems as both regression models that approximate fitness (surface surrogate models) and a novel way to connect classification models (pairwise surrogate models). The pairwise approach can be directly exploited by some algorithms, such as Differential Evolution, in which the fitness value is not actually needed to drive the search, and it is sufficient to know whether a solution is better than another one or not. 
Based on these modelling approaches, we have conducted a multidimensional analysis of surrogate models under different configurations: different machine learning algorithms (regularised regression, neural networks, decision trees, boosting methods, and random forests), different surrogate strategies (encouraging diversity or relaxing prediction thresholds), and compare them for both surface and pairwise surrogate models. The experimental part of the article includes the benchmark problems already proposed for the SOCO2011 competition in continuous optimisation and a simulation problem included in the recent GECCO2021 Industrial Challenge. This paper shows that the performance of the overall search, when using online machine learning-based surrogate models, depends not only on the accuracy of the predictive model but also on both the kind of bias towards positive or negative cases and how the optimisation uses those predictions to decide whether to execute the actual fitness function.	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
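The pairwise idea in the abstract above can be illustrated with a small, hedged Python sketch. It is not the authors' implementation: `gated_selection`, `toy_predicts_better`, and the gating rule (skip the expensive evaluation whenever the surrogate predicts that the trial is worse) are hypothetical names and choices made only for illustration, and the "classifier" is a trivial stand-in rather than a trained model.

```python
def gated_selection(parent, parent_fitness, trial, predicts_better, fitness):
    """One minimisation selection step gated by a pairwise surrogate.

    Returns (survivor, survivor_fitness, evaluated), where `evaluated`
    records whether the expensive fitness function was actually called.
    """
    # The pairwise surrogate only answers "is trial better than parent?".
    if not predicts_better(trial, parent):
        return parent, parent_fitness, False  # expensive call skipped
    trial_fitness = fitness(trial)  # expensive call, made only when promising
    if trial_fitness <= parent_fitness:
        return trial, trial_fitness, True
    return parent, parent_fitness, True


def sphere(x):
    """Toy fitness function (assumed here only for the demonstration)."""
    return sum(v * v for v in x)


def toy_predicts_better(a, b):
    """Stand-in for a trained pairwise classifier: compares L1 norms."""
    return sum(abs(v) for v in a) < sum(abs(v) for v in b)


parent = [2.0, -1.0]
kept, kept_fit, evaluated = gated_selection(
    parent, sphere(parent), [3.0, 3.0], toy_predicts_better, sphere)
better, better_fit, evaluated2 = gated_selection(
    parent, sphere(parent), [1.0, 0.5], toy_predicts_better, sphere)
print(kept, evaluated, better, evaluated2)
```

In this sketch a false negative from the surrogate discards a good trial without ever evaluating it, which is one way the bias of the classifier, and not only its accuracy, can affect the search.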
# 合成関係データの忠実度と実用性のベンチマーク Benchmarking the Fidelity and Utility of Synthetic Relational Data ( http://arxiv.org/abs/2410.03411v1 ) ライセンス: Link先を確認	Valter Hudovernik, Martin Jurkovič, Erik Štrumbelj,	(参考訳) リレーショナルデータの合成は、研究者、実践者、業界からより多くの注目を集め始めています。このタスクは、テーブル間の関係が複雑になるため、単一のテーブルを合成するよりも難しい。同じ理由から、リレーショナルデータを合成するためのベンチマーク手法は、新しい課題をもたらす。我々の研究は、最先端の手法の実証的な評価の欠如と、そのような評価をどのように行うべきかの理解のギャップによって動機付けられている。我々は、関係データ合成、共通ベンチマークデータセット、および合成データの忠実性と有用性を測定するためのアプローチに関する関連研究についてレビューする。ベストプラクティスと新しい堅牢な検出アプローチをベンチマークツールに組み合わせ、それを2つの商用ツールを含む6つの方法の比較に使用します。一部のメソッドは他のメソッドよりも優れているが、元のデータと区別できないデータセットを合成する手段はない。実用面では、モデル予測性能と特徴量の両方において、実データと合成データの適度な相関が観察されるのが一般的である。 Synthesizing relational data has started to receive more attention from researchers, practitioners, and industry. The task is more difficult than synthesizing a single table due to the added complexity of relationships between tables. For the same reason, benchmarking methods for synthesizing relational data introduces new challenges. Our work is motivated by a lack of an empirical evaluation of state-of-the-art methods and by gaps in the understanding of how such an evaluation should be done. We review related work on relational data synthesis, common benchmarking datasets, and approaches to measuring the fidelity and utility of synthetic data. We combine the best practices and a novel robust detection approach into a benchmarking tool and use it to compare six methods, including two commercial tools. While some methods are better than others, no method is able to synthesize a dataset that is indistinguishable from original data. For utility, we typically observe moderate correlation between real and synthetic data for both model predictive performance and feature importance.	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
# Team MTS @ AutoMin 2021: 既存の要約手法の概要と教師なし要約手法との比較 Team MTS @ AutoMin 2021: An Overview of Existing Summarization Approaches and Comparison to Unsupervised Summarization Techniques ( http://arxiv.org/abs/2410.03412v1 ) ライセンス: Link先を確認	Olga Iakovenko, Anna Andreeva, Anna Lapidus, Liana Mikaelyan,	(参考訳) 世界的なパンデミックにより、ビデオ会議や音声会議による遠隔コミュニケーションはこれまで以上に普及している。こうした状況が話し言葉の自動議事録作成(automatic minuting)システムの開発を促し、AutoMin 2021チャレンジへとつながった。本稿は、Automatic Minutesチャレンジに参加したチームMTSの研究成果を報告する。具体的には,テキストと音声の要約に対する既存のアプローチを分析し,クラスタリングに基づく教師なし要約手法を提案するとともに,実環境の録音で動作するよう適応させた自動音声認識ブロックを含むパイプラインを提供する。提案する教師なし手法は,自動議事録作成タスクにおいて事前学習済みの要約モデルを上回り,開発セットではRouge 1,Rouge 2,Rouge Lがそれぞれ0.21,0.02,0.2,テストセットではRouge 1,Rouge 2,Rouge L,Adequacy,Grammatical correctness,Fluencyがそれぞれ0.180,0.035,0.098,1.857,2.304,1.911であった。 Remote communication through video or audio conferences has become more popular than ever because of the worldwide pandemic. These events, therefore, have provoked the development of systems for automatic minuting of spoken language leading to AutoMin 2021 challenge. The following paper illustrates the results of the research that team MTS has carried out while participating in the Automatic Minutes challenge. In particular, in this paper we analyze existing approaches to text and speech summarization, propose an unsupervised summarization technique based on clustering and provide a pipeline that includes an adapted automatic speech recognition block able to run on real-life recordings. The proposed unsupervised technique outperforms pre-trained summarization models on the automatic minuting task with Rouge 1, Rouge 2 and Rouge L values of 0.21, 0.02 and 0.2 on the dev set, with Rouge 1, Rouge 2, Rouge L, Adequacy, Grammatical correctness and Fluency values of 0.180, 0.035, 0.098, 1.857, 2.304, 1.911 on the test set accordingly	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
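For readers unfamiliar with the Rouge 1 and Rouge 2 scores quoted in the abstract above, the following is a minimal Python sketch of n-gram overlap scoring. It assumes whitespace tokenisation and lowercasing only; the official ROUGE tooling used in the challenge may apply stemming and other preprocessing, so this function (`rouge_n`, a name chosen here for illustration) will not reproduce the reported numbers, and the example sentences are invented.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of clipped n-gram overlap."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example minutes, used only to exercise the function.
p, r, f = rouge_n("the team agreed on the deadline", "the team set the deadline")
print(round(p, 3), round(r, 3), round(f, 3))
```

Here four of the six candidate unigrams and four of the five reference unigrams overlap, so precision is 4/6, recall is 4/5, and the F1 score is 8/11.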
# セキュアな鍵リースのためのシンプルなフレームワーク A Simple Framework for Secure Key Leasing ( http://arxiv.org/abs/2410.03413v1 ) ライセンス: Link先を確認	Fuyuki Kitagawa, Tomoyuki Morimae, Takashi Yamakawa,	(参考訳) セキュアな鍵リース(別名: 鍵失効可能暗号)により、暗号鍵を量子状態としてリースし、後でその鍵を検証可能な方法で失効させることができる。本稿では,BB84状態の認証付き削除(certified deletion)特性を利用して,セキュアな鍵リース付きの暗号プリミティブを構成するための簡単なフレームワークを提案する。この枠組みに基づき、以下のスキームを得る。 - 任意のIND-CPA安全な公開鍵暗号スキームに基づく、古典的な失効が可能なセキュアな鍵リース付き公開鍵暗号スキーム。先行研究は、量子的な失効か、Learning with Errors(LWE)問題の量子困難性のようなより強い仮定に依存していた。 - 一方向性関数に基づく、古典的な失効が可能なセキュアな鍵リース付き擬似ランダム関数。先行研究は、LWE問題の量子困難性のようなより強い仮定に依存していた。 - Short Integer Solution(SIS)問題の量子困難性に基づく、古典的な失効が可能なセキュアな鍵リース付きデジタル署名スキーム。我々の構成は静的な署名鍵を持つ。すなわち、署名鍵の状態は署名の前後でほとんど変化しない。先行する構成は、非静的な署名鍵に依存するか、コピー保護というより強い目標を達成するために識別不能難読化に依存していた。さらに、敵が有効な削除証明書を提出した後に失効用の検証鍵が漏洩しても、これらのスキームはすべて安全性を保つ。我々の知る限り、先行する構成はすべてこの設定では完全に破られる。加えて、我々の見解では、我々の安全性証明は既存のスキームのものよりはるかに単純である。 Secure key leasing (a.k.a. key-revocable cryptography) enables us to lease a cryptographic key as a quantum state in such a way that the key can be later revoked in a verifiable manner. We propose a simple framework for constructing cryptographic primitives with secure key leasing via the certified deletion property of BB84 states. Based on our framework, we obtain the following schemes. - A public key encryption scheme with secure key leasing that has classical revocation based on any IND-CPA secure public key encryption scheme. Prior works rely on either quantum revocation or stronger assumptions such as the quantum hardness of the learning with errors (LWE) problem. - A pseudorandom function with secure key leasing that has classical revocation based on one-way functions. Prior works rely on stronger assumptions such as the quantum hardness of the LWE problem. - A digital signature scheme with secure key leasing that has classical revocation based on the quantum hardness of the short integer solution (SIS) problem. Our construction has static signing keys, i.e., the state of a signing key almost does not change before and after signing. 
Prior constructions either rely on non-static signing keys or indistinguishability obfuscation to achieve a stronger goal of copy-protection. In addition, all of our schemes remain secure even if a verification key for revocation is leaked after the adversary submits a valid certificate of deletion. To our knowledge, all prior constructions are totally broken in this setting. Moreover, in our view, our security proofs are much simpler than those for existing schemes.	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
# 単一ベクトルアブレーションによる言語モデルにおける偽の拒絶の軽減 Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation ( http://arxiv.org/abs/2410.03415v1 ) ライセンス: Link先を確認	Xinpeng Wang, Chengzhi Hu, Paul Röttger, Barbara Plank,	(参考訳) モデルが悪意のある指示に従うことや、有害なアドバイスをすることを拒否したり(例: "どうやって誰かを殺すのか" など)、安全でないもの(例: "どのようにPythonプロセスを殺すのか" など)に似ても、安全な要求を拒否するべきではない。このような誤った拒絶を避けることは、以前の研究が示すように、高機能な言語モデルでさえ困難である。本稿では,単一ベクトルアブレーションによる言語モデルにおける偽の拒絶を緩和するための簡易かつ外科的手法を提案する。与えられたモデルに対して、偽の拒絶ベクトルを抽出し、このベクトルを非難することで、モデル安全性や一般モデルの能力に悪影響を及ぼすことなく、偽の拒絶率を低減することを示す。また,本手法はモデル安全性のきめ細かい校正に有効であることを示す。提案手法はトレーニング不要で,モデルに依存しないため,現在および将来の言語モデルにおける誤認の問題を軽減するのに有用である。 Training a language model to be both helpful and harmless requires careful calibration of refusal behaviours: Models should refuse to follow malicious instructions or give harmful advice (e.g. "how do I kill someone?"), but they should not refuse safe requests, even if they superficially resemble unsafe ones (e.g. "how do I kill a Python process?"). Avoiding such false refusal, as prior work has shown, is challenging even for highly-capable language models. In this paper, we propose a simple and surgical method for mitigating false refusal in language models via single vector ablation. For a given model, we extract a false refusal vector and show that ablating this vector reduces false refusal rate without negatively impacting model safety and general model capabilities. We also show that our approach can be used for fine-grained calibration of model safety. Our approach is training-free and model-agnostic, making it useful for mitigating the problem of false refusal in current and future language models.	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
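The phrase "ablating this vector" in the abstract above is commonly understood as removing the component of a hidden activation along one direction. The Python sketch below shows only that generic projection step, h' = h - (h . v) v for a unit vector v. It is an assumption-laden illustration, not the paper's procedure: how the false refusal vector is extracted, which layers are modified, and any further processing are not described in the abstract, and `ablate_direction` and the toy vectors are hypothetical.

```python
import math


def ablate_direction(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    # Scalar projection of the hidden state onto the unit direction.
    coeff = sum(h * u for h, u in zip(hidden, unit))
    return [h - coeff * u for h, u in zip(hidden, unit)]


# Toy 3-dimensional "activation" and "direction" (illustrative values only).
hidden = [3.0, 4.0, 0.0]
direction = [0.0, 2.0, 0.0]
ablated = ablate_direction(hidden, direction)
print(ablated)
```

After ablation the result is orthogonal to the chosen direction, while components along all other directions are left unchanged; scaling `coeff` by a factor between 0 and 1 instead of removing it fully would be one conceivable way to get a graded effect, though whether the paper does this is not stated here.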
# Img2CAD:構造的視覚幾何学を用いた単一画像からの3次元CADモデル生成 Img2CAD: Conditioned 3D CAD Model Generation from Single Image with Structured Visual Geometry ( http://arxiv.org/abs/2410.03417v1 ) ライセンス: Link先を確認	Tianrun Chen, Chunan Yu, Yuanqi Hu, Jing Li, Tao Xu, Runlong Cao, Lanyun Zhu, Ying Zang, Yong Zhang, Zejian Li, Linyun Sun,	(参考訳) 本稿では,2次元画像入力から編集可能なパラメータを持つCADモデルを生成する,我々の知る限り初のアプローチであるImg2CADを提案する。テキストや画像を入力とする既存の3Dモデル生成AI手法は,CADツールと互換性がなく編集性や細かい制御に欠けるメッシュベースの表現に依存することが多いが,Img2CADはAIベースの3D再構成とCADソフトウェアのシームレスな統合を可能にする。我々は、オブジェクトから抽出されたベクトル化されたワイヤフレームを特徴とする、構造化ビジュアル幾何学(SVG)と呼ばれる革新的な中間表現を特定した。この表現は、条件付きCADモデルの生成性能を大幅に向上させる。さらに,この分野の研究を支援するために2つの新しいデータセットを導入する。ABC-monoはレンダリング画像付きの20万以上の3D CADモデルからなる既知の最大のデータセットであり、KOCADは、実世界で撮影されたオブジェクトとその正解CADモデルを組み合わせた最初のデータセットである。 In this paper, we propose Img2CAD, the first approach to our knowledge that uses 2D image inputs to generate CAD models with editable parameters. Unlike existing AI methods for 3D model generation using text or image inputs, which often rely on mesh-based representations that are incompatible with CAD tools and lack editability and fine control, Img2CAD enables seamless integration between AI-based 3D reconstruction and CAD software. We have identified an innovative intermediate representation called Structured Visual Geometry (SVG), characterized by vectorized wireframes extracted from objects. This representation significantly enhances the performance of generating conditioned CAD models. Additionally, we introduce two new datasets to further support research in this area: ABC-mono, the largest known dataset comprising over 200,000 3D CAD models with rendered images, and KOCAD, the first dataset featuring real-world captured objects alongside their ground truth CAD models, supporting further research in conditioned CAD model generation.	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
# CNN層のみのデノイジングオートエンコーダアーキテクチャによる航空機電波高度計の干渉軽減 Aircraft Radar Altimeter Interference Mitigation Through a CNN-Layer Only Denoising Autoencoder Architecture ( http://arxiv.org/abs/2410.03423v1 ) ライセンス: Link先を確認	Samuel B. Brown, Stephen Young, Adam Wagenknecht, Daniel Jakubisin, Charles E. Thornton, Aaron Orndorff, William C. Headley,	(参考訳) 信号処理向けのデノイジングオートエンコーダは、特にサンプル数が多い領域において、無線周波数通信信号の再構成を学習することが非常に困難であることが示されている。通信システムでは、この困難は主に、本質的に確率的な変調データストリームを再構成する必要があることに起因する。本研究では,この制限を逆に利用し,デノイジングオートエンコーダを用いて,高度に構造化されたFMCWレーダ信号を再構成しながら,干渉する無線周波数通信信号を除去する。具体的には、CNN層のみのオートエンコーダアーキテクチャを用いることで、多数の干渉信号からなる厳しい干渉環境においても、レーダ高度計の測距推定精度を向上できることを示す。これは、畳み込み層のみのオートエンコーダを用いる場合と用いない場合について、エンドツーエンドのFMCWレーダ高度計シミュレーションの包括的な性能解析を行うことで実証される。提案手法は、狭帯域トーン干渉と広帯域QPSK干渉のいずれが存在する場合でも、測距RMS誤差、誤った高度報告の数、および得られるレンジプロファイルのピーク対サイドローブ比の観点から干渉軽減を著しく改善する。最大4万IQサンプルのFMCWレーダ信号を確実に再構成できる。 Denoising autoencoders for signal processing applications have been shown to experience significant difficulty in learning to reconstruct radio frequency communication signals, particularly in the large sample regime. In communication systems, this challenge is primarily due to the need to reconstruct the modulated data stream which is generally highly stochastic in nature. In this work, we take advantage of this limitation by using the denoising autoencoder to instead remove interfering radio frequency communication signals while reconstructing highly structured FMCW radar signals. More specifically, in this work we show that a CNN-layer only autoencoder architecture can be utilized to improve the accuracy of a radar altimeter's ranging estimate even in severe interference environments consisting of a multitude of interference signals. This is demonstrated through comprehensive performance analysis of an end-to-end FMCW radar altimeter simulation with and without the convolutional layer-only autoencoder. 
The proposed approach significantly improves interference mitigation in the presence of both narrow-band tone interference as well as wideband QPSK interference in terms of range RMS error, number of false altitude reports, and the peak-to-sidelobe ratio of the resulting range profile. FMCW radar signals of up to 40,000 IQ samples can be reliably reconstructed.	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
# Cayley Graph Propagation Cayley Graph Propagation ( http://arxiv.org/abs/2410.03424v1 ) ライセンス: Link先を確認	JJ Wilson, Maya Bechler-Speicher, Petar Veličković,	(参考訳) グラフ構造化データのモデリングにおいてグラフニューラルネットワーク(GNN)の成功例は数多くあるが、GNNはオーバースカッシング(over-squashing)に弱いことで知られている。これは、離れたノード対の間で情報を混合する必要があるタスクで生じる。この問題に対処するため、先行研究では、情報フローを改善するためにグラフ構造を再配線(rewiring)することが提案されている。あるいは、オーバースカッシングを緩和するために、ボトルネックのないグラフ構造を発見し事前計算することに多くの研究が取り組んできた。数学の分野でよく知られたボトルネックのないグラフ族の一つがエクスパンダーグラフであり、先行研究であるExpander Graph Propagation (EGP)は、よく知られたエクスパンダーグラフ族である特殊線形群 $\mathrm{SL}(2,\mathbb{Z}_n)$ のケイリーグラフをGNNの計算テンプレートとして用いることを提案している。しかし、EGPでは、使用する計算グラフは与えられた入力グラフに合わせて切り詰められる。本研究では、この切り詰めが、望ましい拡張特性を損なうことを示す。代わりに、完全なケイリーグラフ構造上で情報を伝播する手法であるCGPを提案し、ボトルネックのない構造を保つことでオーバースカッシングをより良く緩和する。複数の実世界データセットでの実験結果は、CGPがEGPに比べて大幅な改善を取り戻すだけでなく、計算量の大きいグラフ再配線手法と同等かそれ以上の性能を示すことを示している。 In spite of the plethora of success stories with graph neural networks (GNNs) on modelling graph-structured data, they are notoriously vulnerable to over-squashing, whereby tasks necessitate the mixing of information between distant pairs of nodes. To address this problem, prior work suggests rewiring the graph structure to improve information flow. Alternatively, a significant body of research has dedicated itself to discovering and precomputing bottleneck-free graph structures to ameliorate over-squashing. One well regarded family of bottleneck-free graphs within the mathematical community are expander graphs, with prior work, Expander Graph Propagation (EGP), proposing the use of a well-known expander graph family, the Cayley graphs of the $\mathrm{SL}(2,\mathbb{Z}_n)$ special linear group, as a computational template for GNNs. However, in EGP the computational graphs used are truncated to align with a given input graph. In this work, we show that truncation is detrimental to the coveted expansion properties. 
Instead, we propose CGP, a method to propagate information over a complete Cayley graph structure, thereby ensuring it is bottleneck-free to better alleviate over-squashing. Our empirical evidence across several real-world datasets not only shows that CGP recovers significant improvements as compared to EGP, but it is also akin to or outperforms computationally complex graph rewiring techniques.	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
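To make "the Cayley graphs of the $\mathrm{SL}(2,\mathbb{Z}_n)$ special linear group" in the abstract above more concrete, here is a small Python sketch that enumerates such a graph by breadth-first search. The generator set (the two elementary matrices [[1,1],[0,1]] and [[1,0],[1,1]] together with their inverses) is an assumption based on the usual construction and is not stated in the abstract; `cayley_graph_sl2` is a hypothetical helper, and nothing here reflects how CGP or EGP actually wire a GNN onto the graph.

```python
from collections import deque


def cayley_graph_sl2(n):
    """Adjacency sets of a Cayley graph of SL(2, Z_n).

    Matrices [[a, b], [c, d]] are stored as tuples (a, b, c, d) mod n.
    Assumed generators: [[1,1],[0,1]], [[1,0],[1,1]] and their inverses.
    """
    gens = [(1, 1, 0, 1), (1, n - 1, 0, 1), (1, 0, 1, 1), (1, 0, n - 1, 1)]

    def mul(x, y):
        a, b, c, d = x
        e, f, g, h = y
        return ((a * e + b * g) % n, (a * f + b * h) % n,
                (c * e + d * g) % n, (c * f + d * h) % n)

    adjacency = {}
    queue = deque([(1, 0, 0, 1)])  # start from the identity matrix
    while queue:
        node = queue.popleft()
        if node in adjacency:
            continue
        neighbours = {mul(node, g) for g in gens}
        adjacency[node] = neighbours
        for nb in neighbours:
            if nb not in adjacency:
                queue.append(nb)
    return adjacency


graph = cayley_graph_sl2(3)
print(len(graph), {len(nb) for nb in graph.values()})
```

For n = 3 this yields 24 nodes, each with 4 neighbours. The node count is fixed by n, so it generally does not match the number of nodes of an input graph, which is the mismatch that the truncation discussed in the abstract is meant to handle.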
# このテストセットはどれくらい難しいか? トレーニングダイナミクスを利用したNLIの特性評価 How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics ( http://arxiv.org/abs/2410.03429v1 ) ライセンス: Link先を確認	Adrian Cosma, Stefan Ruseti, Mihai Dascalu, Cornelia Caragea,	(参考訳) 自然言語推論(NLI)の評価は言語理解モデルを評価する上で重要であるが、一般的なデータセットには、実際のモデル性能を人為的に高く見せる体系的な疑似相関(spurious correlation)の問題がある。そこで本研究では,人工的で非現実的な例を手作業で構築することに頼ることなく,難易度の高いテストセットを自動的に作成する手法を提案する。トレーニングダイナミクスを利用する手法を用いて,一般的なNLIデータセットのテストセットを3つの難易度に分類する。この分類により疑似相関の指標は大幅に低下し,最も難易度が高いとされた事例では性能が著しく低下するとともに,より現実的で多様な言語現象が含まれる。本手法をトレーニングセットに適用すると,データの一部のみで訓練したモデルがフルデータセットで訓練したモデルに匹敵する性能を達成し,他のデータセット特性評価手法を上回る。本研究は,NLIデータセット構築における制約に対処し,モデル性能のより信頼できる評価を提供するものであり,多様なNLUアプリケーションに示唆を与える。 Natural Language Inference (NLI) evaluation is crucial for assessing language understanding models; however, popular datasets suffer from systematic spurious correlations that artificially inflate actual model performance. To address this, we propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples. We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics. This categorization significantly reduces spurious correlation measures, with examples labeled as having the highest difficulty showing markedly decreased performance and encompassing more realistic and diverse linguistic phenomena. When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset, surpassing other dataset characterization techniques. Our research addresses limitations in NLI dataset construction, providing a more authentic evaluation of model performance with implications for diverse NLU applications.	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
# Image Speak Volumes: アクセシブルコミュニケーションのための画像生成のユーザ中心評価 Images Speak Volumes: User-Centric Assessment of Image Generation for Accessible Communication ( http://arxiv.org/abs/2410.03430v1 ) ライセンス: Link先を確認	Miriam Anschütz, Tringa Sylaj, Georg Groh,	(参考訳) 説明画像は、アクセシブルで読みやすい(E2R)テキストにおいて重要な役割を果たす。しかし、オンラインデータベースで利用可能な画像はそれぞれのテキストに合わせて調整されておらず、カスタマイズされた画像の作成は高価である。本研究では,手軽にカスタマイズ可能な画像を提供することで,テキスト・画像生成モデルがこのギャップを埋めることができるかを検討した。我々は、7、4つのオープンソース、3つのクローズドソース画像生成モデルをベンチマークし、その結果の画像を広範囲に評価した。また,E2Rターゲットグループの人々とユーザスタディを行い,画像が要件を満たしているかどうかを検討した。いくつかのモデルは優れた性能を示すが、人間の監督なしに大規模に使用する準備ができていない。我々の研究は、E2Rクリエーターにとってアクセス可能な情報の作成を容易にし、ターゲットグループのニーズに合わせてアクセス可能なイメージを調整するための重要なステップである。 Explanatory images play a pivotal role in accessible and easy-to-read (E2R) texts. However, the images available in online databases are not tailored toward the respective texts, and the creation of customized images is expensive. In this large-scale study, we investigated whether text-to-image generation models can close this gap by providing customizable images quickly and easily. We benchmarked seven, four open- and three closed-source, image generation models and provide an extensive evaluation of the resulting images. In addition, we performed a user study with people from the E2R target group to examine whether the images met their requirements. We find that some of the models show remarkable performance, but none of the models are ready to be used at a larger scale without human supervision. Our research is an important step toward facilitating the creation of accessible information for E2R creators and tailoring accessible images to the target group's needs.	翻訳日:2024-11-02 22:29:14 公開日:2024-10-04
# EB-NeRD:ニュースレコメンデーションのための大規模データセット EB-NeRD: A Large-Scale Dataset for News Recommendation ( http://arxiv.org/abs/2410.03432v1 ) ライセンス: Link先を確認	Johannes Kruse, Kasper Lindskow, Saikishore Kalloori, Marco Polignano, Claudio Pomo, Abhishek Srivastava, Anshuk Uppal, Michael Riis Andersen, Jes Frellsen,	(参考訳) パーソナライズされたコンテンツレコメンデーションは、ビデオストリーミングからソーシャルネットワークまで、デジタルメディアのコンテンツ体験に重要な要素となっている。しかし、いくつかのドメイン固有の課題は、ニュース出版におけるレコメンデーターシステムの採用を妨げている。これらの課題に対処するために、Ekstra Bladet News Recommendation Dataset (EB-NeRD)を紹介する。このデータセットには、100万人以上のユニークユーザと、Ekstra Bladetの3700万以上のインプレッションログが含まれている。また、125,000以上のデンマークのニュース記事のコレクションが含まれており、タイトル、要約、ボディ、カテゴリなどのメタデータが完備している。 EB-NeRDはRecSys '24 Challengeのベンチマークデータセットとして機能し、このデータセットが、ニュースパブリッシングのために効果的で責任あるレコメンデータシステムを設計する際の技術的および規範的な課題にどのように対処できるかを実証した。データセットは以下の通りである。 Personalized content recommendations have been pivotal to the content experience in digital media from video streaming to social networks. However, several domain specific challenges have held back adoption of recommender systems in news publishing. To address these challenges, we introduce the Ekstra Bladet News Recommendation Dataset (EB-NeRD). The dataset encompasses data from over a million unique users and more than 37 million impression logs from Ekstra Bladet. It also includes a collection of over 125,000 Danish news articles, complete with titles, abstracts, bodies, and metadata, such as categories. EB-NeRD served as the benchmark dataset for the RecSys '24 Challenge, where it was demonstrated how the dataset can be used to address both technical and normative challenges in designing effective and responsible recommender systems for news publishing. The dataset is available at: https://recsys.eb.dk.	翻訳日:2024-11-02 22:19:23 公開日:2024-10-04
# 多点触覚の知覚的重要度予測のための自己教師付き時空間マスクパージング注意ネットワーク Self-supervised Spatio-Temporal Graph Mask-Passing Attention Network for Perceptual Importance Prediction of Multi-point Tactility ( http://arxiv.org/abs/2410.03434v1 ) ライセンス: Link先を確認	Dazhong He, Qian Liu,	(参考訳) 視覚的・聴覚的情報は現代のマルチメディアシステムでは一般的であるが、触覚的相互作用(触覚的・審美的相互作用)は人間の知覚のユニークな形態を提供する。しかし,接触操作のためのマルチメディア技術は,非接触型マルチメディア技術よりも成熟度が低く,さらなる開発が必要である。低レイテンシとビットレートを必要とする特殊な触覚メディア技術は、触覚情報圧縮を必要とする触覚インタラクションを実現するために不可欠である。既存のビブロタクタクタブル信号圧縮法は知覚モデルに基づいて,複数の空間的相互作用点における融合触覚知覚の特性を考慮していない。実際、触覚の重要性の違いは、従来の周波数や時間領域に限らず、触覚に特有の皮膚上の空間的位置の違いも含んでいる。最も頻繁に使用される触覚情報、視覚的テクスチャ知覚のために、自己教師付き学習と時空間グラフニューラルネットワークに基づいて、その知覚的重要性を複数の点で予測するモデルを開発した。現在の実験結果から,多点触覚の知覚シナリオにおいて,様々な点の知覚的重要性を効果的に予測できることが示唆された。 While visual and auditory information are prevalent in modern multimedia systems, haptic interaction, e.g., tactile and kinesthetic interaction, provides a unique form of human perception. However, multimedia technology for contact interaction is less mature than non-contact multimedia technologies and requires further development. Specialized haptic media technologies, requiring low latency and bitrates, are essential to enable haptic interaction, necessitating haptic information compression. Existing vibrotactile signal compression methods, based on the perceptual model, do not consider the characteristics of fused tactile perception at multiple spatially distributed interaction points. In fact, differences in tactile perceptual importance are not limited to conventional frequency and time domains, but also encompass differences in the spatial locations on the skin unique to tactile perception. For the most frequently used tactile information, vibrotactile texture perception, we have developed a model to predict its perceptual importance at multiple points, based on self-supervised learning and Spatio-Temporal Graph Neural Network. 
Current experimental results indicate that this model can effectively predict the perceptual importance of various points in multi-point tactile perception scenarios.	翻訳日:2024-11-02 22:19:23 公開日:2024-10-04
# 解釈可能なセマンティックテキスト埋め込み作成のための汎用フレームワーク A General Framework for Producing Interpretable Semantic Text Embeddings ( http://arxiv.org/abs/2410.03435v1 ) ライセンス: Link先を確認	Yiqun Sun, Qiang Huang, Yixuan Tang, Anthony K. H. Tung, Jun Yu,	(参考訳) セマンティックテキストの埋め込みは自然言語処理(NLP)において多くのタスクに必須である。ブラックボックスモデルは高品質な埋め込みを生成することができるが、解釈可能性の欠如は透明性を必要とするタスクでの使用を制限する。近年のアプローチでは、ドメインエキスパートが作成した質問やLLMが生成した質問を活用することで、解釈可能性の向上が図られているが、これらの手法は専門家の入力や適切なプロンプト設計に大きく依存しており、その一般化性と幅広いタスクにまたがる識別的な質問を生成する能力を制限する。これらの課題に対処するために,さまざまなタスクにまたがる解釈可能なセマンティックテキストの埋め込みを生成するための一般的なフレームワークである,CQG-MBQA (Contrastive Question Generation - Multi-task Binary Question Answering)を紹介した。この枠組みは,高度に識別的かつ低認知的負荷のYes/no質問をCQG法を用いて体系的に生成し,それらをMBQAモデルで効率的に解答することにより,コスト効率のよい解釈可能な埋め込みを実現する。本研究では,多くの高度なブラックボックスモデルに匹敵する埋め込み品質を提供するとともに,本質的な解釈可能性を維持しつつ,より広範な実験とアブレーション研究を通じて,CQG-MBQAの有効性と解釈可能性を検証する。さらに、CQG-MBQAは、様々なダウンストリームタスクにまたがる他の解釈可能なテキスト埋め込みメソッドよりも優れている。 Semantic text embedding is essential to many tasks in Natural Language Processing (NLP). While black-box models are capable of generating high-quality embeddings, their lack of interpretability limits their use in tasks that demand transparency. Recent approaches have improved interpretability by leveraging domain-expert-crafted or LLM-generated questions, but these methods rely heavily on expert input or well-designed prompts, which restricts their generalizability and ability to generate discriminative questions across a wide range of tasks. To address these challenges, we introduce CQG-MBQA (Contrastive Question Generation - Multi-task Binary Question Answering), a general framework for producing interpretable semantic text embeddings across diverse tasks. Our framework systematically generates highly discriminative, low-cognitive-load yes/no questions through the CQG method and answers them efficiently with the MBQA model, resulting in interpretable embeddings in a cost-effective manner. 
We validate the effectiveness and interpretability of CQG-MBQA through extensive experiments and ablation studies, demonstrating that it delivers embedding quality comparable to many advanced black-box models while maintaining inherent interpretability. Additionally, CQG-MBQA outperforms other interpretable text embedding methods across various downstream tasks.	翻訳日:2024-11-02 22:19:23 公開日:2024-10-04
# プレトレーニングにおけるアクティベーション・スパリティのメリットを探る Exploring the Benefit of Activation Sparsity in Pre-training ( http://arxiv.org/abs/2410.03440v1 ) ライセンス: Link先を確認	Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou,	(参考訳) 事前訓練されたトランスフォーマーは本質的にスパース活性化の特徴を持ち、各トークンに対して少数のニューロンのみが活性化される。スパース・アクティベーションはポスト・トレーニング法によって研究されているが、プレ・トレーニングの可能性は未解決のままである。本研究では,まず,事前学習中に活性化特性がどう変化するかを検討する。本研究により,トランスフォーマーは,トレーニングの進行とともに活性化相関が変化し続けながら,トレーニング前プロセスの大部分を通してスパースアクティベーションを示すことが明らかとなった。そこで本研究では,Switchable Sparse-Dense Learning (SSD)を提案する。 SSDは、Mixtures-of-Experts (MoE)ベースのスパーストレーニングと事前トレーニング中の従来の密集トレーニングを適応的に切り替え、スパーストレーニングの効率を活用し、スパーストレーニングの静的アクティベーション相関を回避する。高密度トレーニングと比較して、SSDは同じモデルサイズで同等のパフォーマンスを達成し、事前トレーニングコストを削減します。さらに、SSDでトレーニングされたモデルは、スパース推論のMoEモデルとして直接使用することができ、最大$2\times$高速な推論速度で高密度モデルと同じパフォーマンスを達成することができる。コードはhttps://github.com/thunlp/moefication で入手できる。 Pre-trained Transformers inherently possess the characteristic of sparse activation, where only a small fraction of the neurons are activated for each token. While sparse activation has been explored through post-training methods, its potential in pre-training remains untapped. In this work, we first study how activation properties change during pre-training. Our examination reveals that Transformers exhibit sparse activation throughout the majority of the pre-training process while the activation correlation keeps evolving as training progresses. Leveraging this observation, we propose Switchable Sparse-Dense Learning (SSD). SSD adaptively switches between the Mixtures-of-Experts (MoE) based sparse training and the conventional dense training during the pre-training process, leveraging the efficiency of sparse training and avoiding the static activation correlation of sparse training. Compared to dense training, SSD achieves comparable performance with identical model size and reduces pre-training costs. 
Moreover, the models trained with SSD can be directly used as MoE models for sparse inference and achieve the same performance as dense models with up to $2\times$ faster inference speed. Codes are available at https://github.com/thunlp/moefication.	翻訳日:2024-11-02 22:19:23 公開日:2024-10-04
# CLoSD:マルチタスク文字制御のためのシミュレーションと拡散のループを閉じる CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control ( http://arxiv.org/abs/2410.03441v1 ) ライセンス: Link先を確認	Guy Tevet, Sigal Raab, Setareh Cohan, Daniele Reda, Zhengyi Luo, Xue Bin Peng, Amit H. Bermano, Michiel van de Panne,	(参考訳) 物理シミュレーションのための運動拡散モデルと強化学習(RL)に基づく制御は、人間の運動生成に相補的な強みを持つ。前者はテキストなどの直感的な制御に固執し、後者は物理的にもっともらしい動きと環境との直接的な相互作用を提供する。本研究では,それぞれの強みを組み合わせた手法を提案する。 CLoSDはテキスト駆動のRL物理ベースのコントローラで、様々なタスクの拡散生成によって導かれる。我々の重要な洞察は、動きの拡散がロバストなRLコントローラのためのオンザフライユニバーサルプランナーとして機能するということである。この目的のために、CLoSDは、Diffusion Planner(DiP)とトラッキングコントローラという、2つのモジュール間のクローズドループインタラクションを維持している。 DiPはテキストのプロンプトとターゲット位置によって制御される高速応答型自己回帰拡散モデルであり、コントローラはシンプルで堅牢な動作模倣器であり、DiPからの動作計画を継続的に受信し、環境からのフィードバックを提供する。 CLoSDは、目標地点へのナビゲーション、テキストプロンプトで指定された手や足で物体を打つこと、座ること、立ち上がることなど、さまざまなタスクをシームレスに実行することができる。 https://guytevet.github.io/CLoSD-page/ Motion diffusion models and Reinforcement Learning (RL) based control for physics-based simulations have complementary strengths for human motion generation. The former is capable of generating a wide variety of motions, adhering to intuitive control such as text, while the latter offers physically plausible motion and direct interaction with the environment. In this work, we present a method that combines their respective strengths. CLoSD is a text-driven RL physics-based controller, guided by diffusion generation for various tasks. Our key insight is that motion diffusion can serve as an on-the-fly universal planner for a robust RL controller. To this end, CLoSD maintains a closed-loop interaction between two modules -- a Diffusion Planner (DiP), and a tracking controller. DiP is a fast-responding autoregressive diffusion model, controlled by textual prompts and target locations, and the controller is a simple and robust motion imitator that continuously receives motion plans from DiP and provides feedback from the environment. 
CLoSD is capable of seamlessly performing a sequence of different tasks, including navigation to a goal location, striking an object with a hand or foot as specified in a text prompt, sitting down, and getting up. https://guytevet.github.io/CLoSD-page/	翻訳日:2024-11-02 22:19:23 公開日:2024-10-04
# 不純物をもつ二重ユニタリ量子回路における絡み合い Entanglement in dual unitary quantum circuits with impurities ( http://arxiv.org/abs/2410.03442v1 ) ライセンス: Link先を確認	Shachar Fraenkel, Colin Rylands,	(参考訳) 両部エンタングルメントエントロピーは多体量子系における普遍的性質の最も有用な特徴の1つである。平衡から遠いところでは、その力学、準粒子像と膜像の2つの非常に効果的な理論が存在する。本研究では、不純物に摂食された量子回路モデルにおいて、絡み合いのダイナミクスとこれら2つの相補的アプローチについて検討する。特に、空間的に固定された非双対不純物ゲートを含む双対ユニタリ量子回路を考える。不純物の有限距離における半無限部分系と有限部分系の両方に対する絡み合いエントロピーを計算し、正確な結果を有効理論の予測と比較する。前者の場合、どちらの理論も互いに一致し、正確な計算を行う。しかし後者の場合、両理論は質的に異なり、準粒子像は膜像とは対照的に非単調な成長を予測している。このようなモノトニックな動作は、ランダムなカオス回路でも起こりうることを示す。 Bipartite entanglement entropy is one of the most useful characterizations of universal properties in a many-body quantum system. Far from equilibrium, there exist two highly effective theories describing its dynamics -- the quasiparticle and membrane pictures. In this work we investigate entanglement dynamics, and these two complementary approaches, in a quantum circuit model perturbed by an impurity. In particular, we consider a dual unitary quantum circuit containing a spatially fixed, non-dual-unitary impurity gate, allowing for differing local Hilbert space dimensions to either side. We compute the entanglement entropy for both a semi-infinite and a finite subsystem within a finite distance of the impurity, comparing exact results to predictions of the effective theories. We find that in the former case, both theories agree with each other and the exact calculation. In the latter case, however, both theories qualitatively differ, with the quasiparticle picture predicting a non-monotonic growth in contrast to the membrane picture. We show that such non-monotonic behavior can arise even in random chaotic circuits, pointing to a hitherto unknown shortcoming of the membrane picture in describing such systems.	翻訳日:2024-11-02 22:19:23 公開日:2024-10-04
# 自然言語処理の不確かさについて On Uncertainty In Natural Language Processing ( http://arxiv.org/abs/2410.03446v1 ) ライセンス: Link先を確認	Dennis Ulmer,	(参考訳) ディープラーニングの過去10年で、さまざまなアプリケーションにデプロイされる、ますます有能なシステムが生まれました。自然言語処理において、この分野は大きな言語モデルを含む多くのブレークスルーによって変革され、ますます多くのユーザ向けアプリケーションで使われている。この技術の利点を享受し、潜在的な害を軽減するためには、モデル予測の信頼性と、その開発を妨げた不確実性を定量化することが重要である。この論文は、自然言語処理の不確実性が言語的、統計的、神経的な視点からどのように特徴づけられるか、そして、実験パイプラインの設計を通してそれを減らし、定量化する方法について研究する。さらに,テキスト分類タスクにおける帰納的モデルバイアスの効果を理論的かつ実験的に検討することにより,モデリングにおける不確実性定量化について検討する。対応する実験には、3つの異なる言語(デンマーク語、英語、フィンランド語)とタスクのデータと、異なる不確実性定量化アプローチの大規模なセットが含まれる。さらに,非交換不能な共形予測に基づく自然言語生成における校正サンプリング手法を提案する。最後に、補助予測器を用いて、大規模ブラックボックス言語モデルの信頼度を定量化する手法を開発し、ターゲットモデルの出力テキストへの入力から信頼度を予測する。 The last decade in deep learning has brought on increasingly capable systems that are deployed on a wide variety of applications. In natural language processing, the field has been transformed by a number of breakthroughs including large language models, which are used in increasingly many user-facing applications. In order to reap the benefits of this technology and reduce potential harms, it is important to quantify the reliability of model predictions and the uncertainties that shroud their development. This thesis studies how uncertainty in natural language processing can be characterized from a linguistic, statistical and neural perspective, and how it can be reduced and quantified through the design of the experimental pipeline. We further explore uncertainty quantification in modeling by theoretically and empirically investigating the effect of inductive model biases in text classification tasks. The corresponding experiments include data for three different languages (Danish, English and Finnish) and tasks as well as a large set of different uncertainty quantification approaches. 
Additionally, we propose a method for calibrated sampling in natural language generation based on non-exchangeable conformal prediction, which provides tighter token sets with better coverage of the actual continuation. Lastly, we develop an approach to quantify confidence in large black-box language models using auxiliary predictors, where the confidence is predicted from the input to and generated output text of the target model alone.	翻訳日:2024-11-02 22:19:23 公開日:2024-10-04
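The idea of calibrated token sets via non-exchangeable conformal prediction, mentioned in the entry above, can be sketched as follows. This is only an illustration of the general weighted-quantile construction, not the thesis' exact procedure: the nonconformity score (one minus the token probability), the calibration weights, and the names `weighted_quantile` and `prediction_set` are all assumptions made for the example.

```python
import math


def weighted_quantile(scores, weights, alpha):
    """Weighted (1 - alpha) quantile of calibration scores.

    Each calibration score carries a relevance weight; one extra unit of
    weight is reserved for the unseen test point, placed at +infinity.
    """
    total = sum(weights) + 1.0
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    return math.inf  # not enough calibration mass: keep every token


def prediction_set(token_probs, threshold):
    """Keep tokens whose nonconformity score (1 - p) is within the threshold."""
    return {tok for tok, p in token_probs.items() if 1.0 - p <= threshold}


# Hypothetical calibration scores and relevance weights (e.g. similarity
# of each calibration context to the current decoding context).
cal_scores = [0.4, 0.1, 0.3, 0.2]
cal_weights = [1.0, 0.5, 1.0, 0.5]
q = weighted_quantile(cal_scores, cal_weights, alpha=0.3)

next_token_probs = {"the": 0.7, "a": 0.62, "cat": 0.05}
tokens = prediction_set(next_token_probs, q)
print(q, sorted(tokens))
```

Sampling would then be restricted to the returned token set; down-weighting less relevant calibration points is what lets the set adapt to the current context instead of relying on a single global threshold.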
# 言語モデルはどのように文脈文法的キューを優先するか? How Language Models Prioritize Contextual Grammatical Cues? ( http://arxiv.org/abs/2410.03447v1 ) ライセンス: Link先を確認	Hamidreza Amirzadeh, Afra Alishahi, Hosein Mohebbi,	(参考訳) トランスフォーマーベースの言語モデルは、文脈情報を効果的に捉え、活用する優れた能力を示している。主観的コンセンサスやコア参照解決など,対象タスクに対する単一コンテキストキューの寄与を定量化し,追跡するために,さまざまな分析手法が用いられているが,そのコンテキスト内で複数の関連キューが利用可能となるシナリオはいまだ検討されていない。本稿では,複数のジェンダーキュー語が存在する場合の言語モデルによるジェンダー合意の扱いについて検討し,それぞれが対象のジェンダー代名詞を独立に曖昧にすることができることを示す。我々は、エンコーダベースであるBERTとデコーダベースモデルであるGPT-2の2つの広く使われているトランスフォーマーモデルを分析する。我々の分析では、モデル内の情報の流れを追跡するコンテキスト混合分析と、モデルの予測に対するキューの影響を測定するアクティベーションパッチングという2つの相補的なアプローチを採用している。 GPT-2は最終のキューに依存しているのに対し、BERTはターゲットの単語表現とモデルの予測の両方を形成するために、コンテキストの最初のキューを優先順位付けする傾向にある。この結果から,エンコーダベースのモデルとデコーダベースのモデルでは,予測にコンテキスト情報を優先し,使用する方法に顕著な違いが認められた。 Transformer-based language models have shown an excellent ability to effectively capture and utilize contextual information. Although various analysis techniques have been used to quantify and trace the contribution of single contextual cues to a target task such as subject-verb agreement or coreference resolution, scenarios in which multiple relevant cues are available in the context remain underexplored. In this paper, we investigate how language models handle gender agreement when multiple gender cue words are present, each capable of independently disambiguating a target gender pronoun. We analyze two widely used Transformer-based models: BERT, an encoder-based, and GPT-2, a decoder-based model. Our analysis employs two complementary approaches: context mixing analysis, which tracks information flow within the model, and a variant of activation patching, which measures the impact of cues on the model's prediction. We find that BERT tends to prioritize the first cue in the context to form both the target word representations and the model's prediction, while GPT-2 relies more on the final cue. 
Our findings reveal striking differences in how encoder-based and decoder-based models prioritize and use contextual information for their predictions.	翻訳日:2024-11-02 22:09:37 公開日:2024-10-04
# 毒性分類器とAbleismに応答する大規模言語モデル How Toxicity Classifiers and Large Language Models Respond to Ableism ( http://arxiv.org/abs/2410.03448v1 ) ライセンス: Link先を確認	Mahika Phutane, Ananya Seelam, Aditya Vashistha,	(参考訳) 障害のある人(PwD)は、定期的にネット上の憎悪やマイクロアグレッションに遭遇する。オンラインプラットフォームは、機械学習モデルを使用してオンラインの害を和らげる一方で、これらのモデルが能力主義とどのように相互作用するかを研究する研究はほとんどない。本稿では,PwDをターゲットとした100のソーシャルメディアコメントのデータセットをキュレートし,160人の参加者を募集し,これらのコメントがいかに有毒で有能かを説明する。その後,最先端の毒性分類器 (TCs) と大規模言語モデル (LLMs) を誘導し,その害を評価・説明した。分析の結果, TCsおよびLSMsはPwDよりも毒性が有意に低かったが, LLMsは一般的にPwDと同程度であった。しかし、LLMによる能力主義の説明は感情的な害を見落としており、PwDの説明の重要な側面である文脈の特異性や認識が欠如していた。障害を意識した毒性分類器を設計する上での課題について論じ,能力主義検出から能力主義解釈・説明への転換を提唱する。 People with disabilities (PwD) regularly encounter ableist hate and microaggressions online. While online platforms use machine learning models to moderate online harm, there is little research investigating how these models interact with ableism. In this paper, we curated a dataset of 100 social media comments targeted towards PwD, and recruited 160 participants to rate and explain how toxic and ableist these comments were. We then prompted state-of-the art toxicity classifiers (TCs) and large language models (LLMs) to rate and explain the harm. Our analysis revealed that TCs and LLMs rated toxicity significantly lower than PwD, but LLMs rated ableism generally on par with PwD. However, ableism explanations by LLMs overlooked emotional harm, and lacked specificity and acknowledgement of context, important facets of PwD explanations. Going forward, we discuss challenges in designing disability-aware toxicity classifiers, and advocate for the shift from ableism detection to ableism interpretation and explanation.	翻訳日:2024-11-02 22:09:37 公開日:2024-10-04
# リトリーバーとしてのMLLM: エージェントのマルチモーダル検索を対話的に学習する MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents ( http://arxiv.org/abs/2410.03450v1 ) ライセンス: Link先を確認	Junpeng Yue, Xinru Xu, Börje F. Karlsson, Zongqing Lu,	(参考訳) MLLMエージェントは、マルチモーダルなタスク関連軌道データを取得することで、複雑なエンボディされたタスクの可能性を実証する。しかし,現在の検索手法は,手前の特定のタスクに対する有効性を無視し,テキストや視覚的手がかりの表面レベルでの類似性に重点を置いている。この課題に対処するため,MLLM as ReTriever (MART) という新たな手法を提案し,対話データを利用してMLLMレトリバーを選好学習に基づいて微調整し,トラジェクトリの有効性を十分に考慮し,それらを未知のタスクに優先する手法を提案する。また、MLLMの要約機能を活用して、キー情報を保存しながら少ないトークンでトラジェクトリを表現する機構であるトラジェクトリ抽象化を導入し、エージェントがトラジェクトリのマイルストーンをよりよく理解できるようにする。様々な環境における実験結果から,本手法はベースライン手法と比較して,見えない場面でのタスク成功率を大幅に向上することが示された。本研究は,汎用MLLMをトラジェクタとして微調整し,トラジェクタの有効性を評価することで,エンボディエージェントのマルチモーダル検索のための新しいパラダイムを提案する。アクションと観測空間のすべてのベンチマークタスクセットとシミュレータコード修正がリリースされる。 MLLM agents demonstrate potential for complex embodied tasks by retrieving multimodal task-relevant trajectory data. However, current retrieval methods primarily focus on surface-level similarities of textual or visual cues in trajectories, neglecting their effectiveness for the specific task at hand. To address this issue, we propose a novel method, MLLM as ReTriever (MART), which enhances the performance of embodied agents by utilizing interaction data to fine-tune an MLLM retriever based on preference learning, such that the retriever fully considers the effectiveness of trajectories and prioritize them for unseen tasks. We also introduce Trajectory Abstraction, a mechanism that leverages MLLMs' summarization capabilities to represent trajectories with fewer tokens while preserving key information, enabling agents to better comprehend milestones in the trajectory. Experimental results across various environments demonstrate our method significantly improves task success rates in unseen scenes compared to baseline methods. 
This work presents a new paradigm for multimodal retrieval in embodied agents, by fine-tuning a general-purpose MLLM as the retriever to assess trajectory effectiveness. All benchmark task sets and simulator code modifications for action and observation spaces will be released.	翻訳日:2024-11-02 22:09:37 公開日:2024-10-04
# 周波数偏光超符号化フォトニック量子ビットの決定論的かつ効率的な源 A deterministic and efficient source of frequency-polarization hyper-encoded photonic qubits ( http://arxiv.org/abs/2410.03454v1 ) ライセンス: Link先を確認	N. Coste, D. A. Fioretto, S. E. Thomas, S. C. Wein, H. Ollivier, I. Maillette de Buy Wenniger, A. Henry, N. Belabas, A. Harouri, A. Lemaitre, I. Sagnes, N. Somaschi, O. Krebs, L. Lanco, P. Senellart,	(参考訳) 光子の周波数や色は、長距離にわたって量子情報を符号化し配布する魅力的な自由度である。しかし、周波数符号化されたフォトニック量子ビットの生成は、これまでは確率的な非線形単光子源と非効率ゲートに依存してきた。ここでは, 共振器内の半導体量子ドットに基づいて, 周波数および偏光に超符号化されたフォトニック量子ビットの決定論的生成を示す。我々は中性励起子の二重双極子構造を利用して、ポンプレーザーパルスの偏光によって制御される振幅と位相における量子重ね合わせの発生を実証する。ソースは、第1レンズの28$\pm$2%の生成確率に対応する4MHzの周波数偏光単光子量子ビットを生成し、光子数純度は98%である。光子は、それぞれの双極子に対して91%、両のバランスの取れた量子重ね合わせにおいて88%の区別がつかない。超符号化フォトニック状態の密度行列は時間分解偏光トモグラフィーにより測定され、目標状態に対する忠実度は94$\pm$ 8%、収束度は77$\pm$ 2%と推定される。我々のアプローチは、周波数符号化に基づく量子情報処理の分野に量子ドット源の利点をもたらす。 The frequency or color of photons is an attractive degree of freedom to encode and distribute the quantum information over long distances. However, the generation of frequency-encoded photonic qubits has so far relied on probabilistic non-linear single-photon sources and inefficient gates. Here, we demonstrate the deterministic generation of photonic qubits hyper-encoded in frequency and polarization based on a semiconductor quantum dot in a cavity. We exploit the double dipole structure of a neutral exciton and demonstrate the generation of any quantum superposition in amplitude and phase, controlled by the polarization of the pump laser pulse. The source generates frequency-polarization single-photon qubits at a rate of 4 MHz corresponding to a generation probability at the first lens of 28 $\pm$ 2%, with a photon number purity > 98%. The photons show an indistinguishability > 91% for each dipole and 88% for a balanced quantum superposition of both. 
The density matrix of the hyper-encoded photonic state is measured by time-resolved polarization tomography, evidencing a fidelity to the target state of 94 $\pm$ 8% and concurrence of 77 $\pm$ 2%, here limited by frequency overlap in our device. Our approach brings the advantages of quantum dot sources to the field of quantum information processing based on frequency encoding.	翻訳日:2024-11-02 22:09:37 公開日:2024-10-04
# CoCoLoFa: LLM支援の群衆が書いた共通の論理的誤りを伴うニュースコメントのデータセット CoCoLoFa: A Dataset of News Comments with Common Logical Fallacies Written by LLM-Assisted Crowds ( http://arxiv.org/abs/2410.03457v1 ) ライセンス: Link先を確認	Min-Hsuan Yeh, Ruyuan Wan, Ting-Hao 'Kenneth' Huang,	(参考訳) テキスト中の論理的誤検出は、ユーザが引数の欠陥を見つけるのに役立つが、この検出を自動化するのは容易ではない。大規模な実世界のテキストデータを手動で注釈付けして、検出モデルの開発と検証のためのデータセットを作成するのはコストがかかる。本稿では,648件のニュース記事に対する7,706件のコメントと,それぞれのコメントに誤りの有無とタイプをラベル付けした,既知の最大の論理的誤読データセットであるCoCoLoFaを紹介する。我々は,ニュース記事に反応して,特定の誤字型(例えば,滑りやすい斜面)を具現化したコメントを書くために,143人の群衆労働者を募集した。この作業の複雑さを認識して,作業者のインターフェースにLLMを利用したアシスタントを構築し,コメントの起草と修正を支援した。専門家は、CoCoLoFaの書き込み品質とラベル付けの有効性を高い信頼性と評価した。 CoCoLoFaを使用して微調整されたBERTベースのモデルは、テストセット上で最高の誤検出(F1=0.86)と分類(F1=0.87)を達成し、最先端のLLMよりも優れていた。我々の研究は、クラウドソーシングとLLMを組み合わせることで、より効果的に複雑な言語現象のデータセットを構築することができることを示している。 Detecting logical fallacies in texts can help users spot argument flaws, but automating this detection is not easy. Manually annotating fallacies in large-scale, real-world text data to create datasets for developing and validating detection models is costly. This paper introduces CoCoLoFa, the largest known logical fallacy dataset, containing 7,706 comments for 648 news articles, with each comment labeled for fallacy presence and type. We recruited 143 crowd workers to write comments embodying specific fallacy types (e.g., slippery slope) in response to news articles. Recognizing the complexity of this writing task, we built an LLM-powered assistant into the workers' interface to aid in drafting and refining their comments. Experts rated the writing quality and labeling validity of CoCoLoFa as high and reliable. BERT-based models fine-tuned using CoCoLoFa achieved the highest fallacy detection (F1=0.86) and classification (F1=0.87) performance on its test set, outperforming the state-of-the-art LLMs. 
Our work shows that combining crowdsourcing and LLMs enables us to more effectively construct datasets for complex linguistic phenomena that crowd workers find challenging to produce on their own.	翻訳日:2024-11-02 22:09:37 公開日:2024-10-04
# 多次元ベトナム:タスク、データセット、ベースラインモデル、課題 Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges ( http://arxiv.org/abs/2410.03458v1 ) ライセンス: Link先を確認	Nguyen Van Dinh, Thanh Chi Dang, Luan Thanh Nguyen, Kiet Van Nguyen,	(参考訳) 低資源語であるベトナム語は通常、北ベトナム、中央ベトナム、南ベトナムに属する3つの主要な方言群に分類される。しかし、これらの地域内の各州は独自の発音のバリエーションを持っている。様々な音声認識データセットが存在するにもかかわらず、ベトナムの個々の州に特有の63の方言の詳細な分類を提供していない。このギャップに対処するため、ベトナム全土で話されている63の地方方言の多様性を包括的に分析したベトナム多方言データセット(ViMD)を導入した。我々のデータセットは、約19,000の発話からなる102.56時間の音声からなり、関連するテキストには120万以上の単語が含まれている。ベンチマークを行い、データセットの課題を同時に示すために、(1)識別と(2)音声認識の2つの下流タスクに対して、最先端のトレーニング済みモデルを微調整する。実験結果から,地理的要因が方言に与える影響と,多言語音声データを含む音声認識タスクにおける現在のアプローチの制約の2つが示唆された。私たちのデータセットは研究目的で利用可能です。 Vietnamese, a low-resource language, is typically categorized into three primary dialect groups that belong to Northern, Central, and Southern Vietnam. However, each province within these regions exhibits its own distinct pronunciation variations. Despite the existence of various speech recognition datasets, none of them has provided a fine-grained classification of the 63 dialects specific to individual provinces of Vietnam. To address this gap, we introduce Vietnamese Multi-Dialect (ViMD) dataset, a novel comprehensive dataset capturing the rich diversity of 63 provincial dialects spoken across Vietnam. Our dataset comprises 102.56 hours of audio, consisting of approximately 19,000 utterances, and the associated transcripts contain over 1.2 million words. To provide benchmarks and simultaneously demonstrate the challenges of our dataset, we fine-tune state-of-the-art pre-trained models for two downstream tasks: (1) Dialect identification and (2) Speech recognition. The empirical results suggest two implications including the influence of geographical factors on dialects, and the constraints of current approaches in speech recognition tasks involving multi-dialect speech data. Our dataset is available for research purposes.	翻訳日:2024-11-02 22:09:37 公開日:2024-10-04
# テキスト音声合成のための生成意味コミュニケーション Generative Semantic Communication for Text-to-Speech Synthesis ( http://arxiv.org/abs/2410.03459v1 ) ライセンス: Link先を確認	Jiahao Zheng, Jinke Ren, Peng Xu, Zhihao Yuan, Jie Xu, Fangxin Wang, Gui Gui, Shuguang Cui,	(参考訳) セマンティック通信は、ソースデータのセマンティック情報のみを送信することによって、通信効率を向上させるための有望な技術である。しかし,従来の意味コミュニケーション手法は,テキスト音声合成(TTS)のような新たな生成タスクでは効率が良くないデータ再構成タスクに重点を置いている。この制限に対処するために, 生成人工知能技術を活用した, TTS合成のための新しい生成意味コミュニケーションフレームワークを開発する。まず,WavLMと残留ベクトル量子化法という事前学習された大音声モデルを用いて,送信側と受信側で2つの意味的知識ベース(KB)を構築する。送信機におけるKBは効果的な意味抽出を可能にし、受信機におけるKBは生命に似た音声合成を促進する。そこで我々は,トランスフォーマーエンコーダと拡散モデルを用いて,通信オーバーヘッドを伴わずに効率的なセマンティックコーディングを実現する。最後に, 付加的な白色ガウスノイズ流路とレイリーフェディング流路のいずれにおいても, 生成した音声の忠実度は4つのベースラインよりもはるかに高いことを示した。 Semantic communication is a promising technology to improve communication efficiency by transmitting only the semantic information of the source data. However, traditional semantic communication methods primarily focus on data reconstruction tasks, which may not be efficient for emerging generative tasks such as text-to-speech (TTS) synthesis. To address this limitation, this paper develops a novel generative semantic communication framework for TTS synthesis, leveraging generative artificial intelligence technologies. Firstly, we utilize a pre-trained large speech model called WavLM and the residual vector quantization method to construct two semantic knowledge bases (KBs) at the transmitter and receiver, respectively. The KB at the transmitter enables effective semantic extraction, while the KB at the receiver facilitates lifelike speech synthesis. Then, we employ a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead. Finally, numerical results demonstrate that our framework achieves much higher fidelity for the generated speech than four baselines, in both cases with additive white Gaussian noise channel and Rayleigh fading channel.	翻訳日:2024-11-02 22:09:37 公開日:2024-10-04
# 自動GDA: 検索拡張生成における効率的な接地検証のための自動ドメイン適応 Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation ( http://arxiv.org/abs/2410.03461v1 ) ライセンス: Link先を確認	Tobias Leemann, Periklis Petridis, Giuseppe Vietri, Dionysis Manousakas, Aaron Roth, Sergul Aydore,	(参考訳) 検索拡張生成(RAG)は、大規模言語モデル(LLM)出力の事実性を高めることが示されているが、LLMはまだ幻覚に悩まされており、誤った情報や無関係な情報を生成する。 1つの一般的な検出戦略は、LLMにその応答が得られた証拠に根拠があるかどうかを再度評価させることであるが、このアプローチはコストがかかる。あるいは、効率的な基底検証のための軽量自然言語推論(NLI)モデルも推論時に利用できる。既存の事前学習されたNLIモデルは潜在的な解決策を提供するが、実際のRAG入力のより大きなモデルに比べて性能は低い。 RAG入力は、NLIモデルをトレーニングするために使われるほとんどのデータセットよりも複雑で、基礎となる知識ベースに特有の特徴を持ち、特定のターゲットドメインにNLIモデルを適応する必要がある。さらに、ターゲットドメインにラベル付きインスタンスがないため、例えば、微調整によって、教師付きドメイン適応が不可能になる。これらの課題に対処するために、自動生成ドメイン適応(Auto Generative Domain Adaptation, Auto-GDA)を導入する。我々のフレームワークは、合成データ生成による教師なしドメイン適応を可能にする。従来の手作りフィルタリングや拡張戦略に依存した手法とは異なり、Auto-GDAは、低効率の教師モデルからの弱いラベルと離散最適化を用いて生成したサンプルの品質を継続的に改善し、最も有望な追加サンプルを選択するために反復的なプロセスを採用している。提案手法の有効性を実験的に検証し,Auto-GDAを用いた合成データに微調整したモデルが,教師モデルの性能を上回り,LLMの性能レベルを計算コストの10%にまで達することを示した。 While retrieval augmented generation (RAG) has been shown to enhance factuality of large language model (LLM) outputs, LLMs still suffer from hallucination, generating incorrect or irrelevant information. One common detection strategy involves prompting the LLM again to assess whether its response is grounded in the retrieved evidence, but this approach is costly. Alternatively, lightweight natural language inference (NLI) models for efficient grounding verification can be used at inference time. While existing pre-trained NLI models offer potential solutions, their performance remains subpar compared to larger models on realistic RAG inputs. RAG inputs are more complex than most datasets used for training NLI models and have characteristics specific to the underlying knowledge base, requiring adaptation of the NLI models to a specific target domain. 
Additionally, the lack of labeled instances in the target domain makes supervised domain adaptation, e.g., through fine-tuning, infeasible. To address these challenges, we introduce Automatic Generative Domain Adaptation (Auto-GDA). Our framework enables unsupervised domain adaptation through synthetic data generation. Unlike previous methods that rely on handcrafted filtering and augmentation strategies, Auto-GDA employs an iterative process to continuously improve the quality of generated samples using weak labels from less efficient teacher models and discrete optimization to select the most promising augmented samples. Experimental results demonstrate the effectiveness of our approach, with models fine-tuned on synthetic data using Auto-GDA often surpassing the performance of the teacher model and reaching the performance level of LLMs at 10 % of their computational cost.	翻訳日:2024-11-02 22:09:37 公開日:2024-10-04
# 逆問題に対する拡散状態誘導射影勾配 Diffusion State-Guided Projected Gradient for Inverse Problems ( http://arxiv.org/abs/2410.03463v1 ) ライセンス: Link先を確認	Rayhan Zirvi, Bahareh Tolooshams, Anima Anandkumar,	(参考訳) 拡散モデルの最近の進歩は、逆問題解決のためのデータ事前学習に有効である。拡散サンプリングステップを利用して、各ステップで測定ガイダンス勾配を使用してデータの一貫性を強制する。一般の逆問題では、測定精度が低下し、不正確な後続サンプリングが生じるため、無条件で訓練された拡散モデルを使用する場合、近似が必要である。言い換えれば、それらの近似により、これらの手法は拡散前の拡散によって定義されるデータ多様体上の生成過程を保存できず、画像復元のような応用の成果物に繋がる。拡散過程の中間状態の低ランク近似である部分空間に測定勾配を投影する拡散状態誘導射影勾配(DiffStateGrad)を提案する。 DiffStateGradは、モジュールとして、幅広い拡散ベースの逆解法に付加することができ、以前の多様体上の拡散過程の保存を改善し、アーティファクト誘導コンポーネントをフィルタリングすることができる。 DiffStateGradは、測定手順のステップサイズとノイズの選択による拡散モデルのロバスト性の向上と、最悪の場合の性能向上を両立させる。最後に、DiffStateGradは、線形および非線形画像復元の逆問題に対する最先端技術を改善することを実証する。 Recent advancements in diffusion models have been effective in learning data priors for solving inverse problems. They leverage diffusion sampling steps for inducing a data prior while using a measurement guidance gradient at each step to impose data consistency. For general inverse problems, approximations are needed when an unconditionally trained diffusion model is used since the measurement likelihood is intractable, leading to inaccurate posterior sampling. In other words, due to their approximations, these methods fail to preserve the generation process on the data manifold defined by the diffusion prior, leading to artifacts in applications such as image restoration. To enhance the performance and robustness of diffusion models in solving inverse problems, we propose Diffusion State-Guided Projected Gradient (DiffStateGrad), which projects the measurement gradient onto a subspace that is a low-rank approximation of an intermediate state of the diffusion process. DiffStateGrad, as a module, can be added to a wide range of diffusion-based inverse solvers to improve the preservation of the diffusion process on the prior manifold and filter out artifact-inducing components. 
We highlight that DiffStateGrad improves the robustness of diffusion models in terms of the choice of measurement guidance step size and noise while improving the worst-case performance. Finally, we demonstrate that DiffStateGrad improves upon the state-of-the-art on linear and nonlinear image restoration inverse problems.	翻訳日:2024-11-02 22:09:37 公開日:2024-10-04
# S7: シーケンスモデリングのための選択的で単純化された状態空間層 S7: Selective and Simplified State Space Layers for Sequence Modeling ( http://arxiv.org/abs/2410.03464v1 ) ライセンス: Link先を確認	Taylan Soydan, Nikola Zubić, Nico Messikommer, Siddhartha Mishra, Davide Scaramuzza,	(参考訳) シーケンスモデリングにおける中心的な課題は、拡張されたコンテキストでタスクを効率的に処理することである。最近の状態空間モデル(SSM)はこの分野で大きな進歩を遂げているが、入力依存フィルタリングが欠如している場合が多い。安定なパラメータ化と特定の設計選択を取り入れ、入力内容に基づいて状態遷移を動的に調整し、効率と性能を維持しながら、入力依存を処理できるシンプルで強力なSSMであるS7を導入することで、このギャップに対処する。この再パラメータ化は、時間とともに状態遷移を良好に保ち、長期連続モデリングにおける安定性を保証することを証明している。さらに、グラデーション規範をコントロールし、効率的なトレーニングを可能にし、グラデーションの爆発や消滅といった問題を防止する。 S7は、ニューロモルフィックイベントベースのデータセット、Long Range Arenaベンチマーク、さまざまな物理的および生物学的時系列など、さまざまなシーケンスモデリングタスクにおいて、ベースラインを大幅に上回っている。全体として、S7は、複雑なドメイン固有の帰納的バイアスに頼ることなく、より簡単なシーケンスモデリングアプローチを提供する。 A central challenge in sequence modeling is efficiently handling tasks with extended contexts. While recent state-space models (SSMs) have made significant progress in this area, they often lack input-dependent filtering or require substantial increases in model complexity to handle input variability. We address this gap by introducing S7, a simplified yet powerful SSM that can handle input dependence while incorporating stable reparameterization and specific design choices to dynamically adjust state transitions based on input content, maintaining efficiency and performance. We prove that this reparameterization ensures stability in long-sequence modeling by keeping state transitions well-behaved over time. Additionally, it controls the gradient norm, enabling efficient training and preventing issues like exploding or vanishing gradients. S7 significantly outperforms baselines across various sequence modeling tasks, including neuromorphic event-based datasets, Long Range Arena benchmarks, and various physical and biological time series. 
Overall, S7 offers a more straightforward approach to sequence modeling without relying on complex, domain-specific inductive biases, achieving significant improvements across key benchmarks.	翻訳日:2024-11-02 21:59:46 公開日:2024-10-04
# 安全か? ヘイトスピーチカウンターリングにおけるLLMの断面積強度に及ぼすガードレールの影響 Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering ( http://arxiv.org/abs/2410.03466v1 ) ライセンス: Link先を確認	Helena Bonaldi, Greta Damo, Nicolás Benjamín Ocampo, Elena Cabrio, Serena Villata, Marco Guerini,	(参考訳) ヘイトスピーチ緩和戦略としての反音声の有効性は、NLG研究コミュニティ、特にそれを自動生成するタスクへの関心が高まりつつある。しかし、自動生成された応答は、専門家が生成した反音声を特徴付ける議論的な豊かさを欠くことが多い。本研究では,よりコジェントな応答を生成するために,対音声生成の2つの側面に焦点を当てる。まず, 安全ガードレールの存在が世代品質を損なうかどうかを検証した。第二に、ヘイトスピーチの特定の要素を攻撃することが、オンラインヘイトと戦うためのより効果的な議論戦略をもたらすかどうかを評価する。広範囲な人的・自動的な評価を行うことにより、安全ガードレールの存在がいかに有害であるかを、本質的に肯定的な社会的相互作用を育むことを目的とした課題に示す。さらに, ヘイトスピーチの特定の構成要素, 特に暗黙の否定的ステレオタイプとそのヘイトフルな部分に対する攻撃は, 高品質な世代を生み出すことが示唆された。 The potential effectiveness of counterspeech as a hate speech mitigation strategy is attracting increasing interest in the NLG research community, particularly towards the task of automatically producing it. However, automatically generated responses often lack the argumentative richness which characterises expert-produced counterspeech. In this work, we focus on two aspects of counterspeech generation to produce more cogent responses. First, by investigating the tension between helpfulness and harmlessness of LLMs, we test whether the presence of safety guardrails hinders the quality of the generations. Secondly, we assess whether attacking a specific component of the hate speech results in a more effective argumentative strategy to fight online hate. By conducting an extensive human and automatic evaluation, we show how the presence of safety guardrails can be detrimental also to a task that inherently aims at fostering positive social interactions. Moreover, our results show that attacking a specific component of the hate speech, and in particular its implicit negative stereotype and its hateful parts, leads to higher-quality generations.	翻訳日:2024-11-02 21:59:46 公開日:2024-10-04
# 注意図のトポロジ解析による脆弱性検出 Vulnerability Detection via Topological Analysis of Attention Maps ( http://arxiv.org/abs/2410.03470v1 ) ライセンス: Link先を確認	Pavel Snopov, Andrey Nikolaevich Golubinskiy,	(参考訳) 近年,脆弱性検出に対するディープラーニング(DL)アプローチが注目されている。これらの手法は有望な結果を示し、多くの場合、従来の静的コード解析ツールをはるかに上回っている。本研究では,BERTモデルの注意行列に基づくトポロジカルデータ解析(TDA)のツールを用いた脆弱性検出手法を提案する。従来の機械学習(ML)技術は,これらの注意行列から抽出したトポロジ的特徴に基づいて訓練すると,CodeBERTaのような事前学習言語モデル(LLM)と競合する。これは、永続的ホモロジーを含むTDAツールが、脆弱性を特定するために重要な意味情報を効果的にキャプチャできることを示している。 Recently, deep learning (DL) approaches to vulnerability detection have gained significant traction. These methods demonstrate promising results, often surpassing traditional static code analysis tools in effectiveness. In this study, we explore a novel approach to vulnerability detection utilizing the tools from topological data analysis (TDA) on the attention matrices of the BERT model. Our findings reveal that traditional machine learning (ML) techniques, when trained on the topological features extracted from these attention matrices, can perform competitively with pre-trained language models (LLMs) such as CodeBERTa. This suggests that TDA tools, including persistent homology, are capable of effectively capturing semantic information critical for identifying vulnerabilities.	翻訳日:2024-11-02 21:59:46 公開日:2024-10-04
# ピアレビューにおけるグループフェアネス Group Fairness in Peer Review ( http://arxiv.org/abs/2410.03474v1 ) ライセンス: Link先を確認	Haris Aziz, Evi Micha, Nisarg Shah,	(参考訳) NeurIPSやAAAIといった大規模なカンファレンスは、多数のコミュニティからの応募を惹きつけるため、さまざまなAI分野のクロスロードとして機能している。しかし、一部のコミュニティではレビュー経験が不十分な場合があり、そのコミュニティ以外では資格の低いレビュアーに応募が割り当てられている。しばしば推奨される解決策は、このような大きなカンファレンスを小さなカンファレンスに分割することだが、これはコミュニティの分離と学際的な研究の害につながる可能性がある。我々は、この課題に取り組み、コア(core)と呼ばれるグループフェアネスの概念を導入し、可能なすべてのコミュニティ(研究者のサブセット)を、大きなカンファレンスから撤退することで、一方的に利益を得ることができない方法で扱うことを要求する。我々は、簡単なピアレビューモデルについて研究し、コアに属するレビュー割り当てが常に存在することを証明し、そのような割り当てを見つけるための効率的なアルゴリズムを設計する。 CVPRとICLRのカンファレンスの実際のデータを使って、アルゴリズムと既存のレビュー割り当てアルゴリズムを、さまざまなメトリクスで比較しています。 Large conferences such as NeurIPS and AAAI serve as crossroads of various AI fields, since they attract submissions from a vast number of communities. However, in some cases, this has resulted in a poor reviewing experience for some communities, whose submissions get assigned to less qualified reviewers outside of their communities. An often-advocated solution is to break up any such large conference into smaller conferences, but this can lead to isolation of communities and harm interdisciplinary research. We tackle this challenge by introducing a notion of group fairness, called the core, which requires that every possible community (subset of researchers) be treated in a way that prevents them from unilaterally benefiting by withdrawing from a large conference. We study a simple peer review model, prove that it always admits a reviewing assignment in the core, and design an efficient algorithm to find one such assignment. We use real data from CVPR and ICLR conferences to compare our algorithm to existing reviewing assignment algorithms on a number of metrics.	翻訳日:2024-11-02 21:59:46 公開日:2024-10-04
# 階層型ニューラルネットワークの学習の難しさについて On the Hardness of Learning One Hidden Layer Neural Networks ( http://arxiv.org/abs/2410.03477v1 ) ライセンス: Link先を確認	Shuchen Li, Ilias Zadik, Manolis Zampetakis,	(参考訳) 本研究では,1つの隠れ層ReLUニューラルネットワークを$\mathbb{R}^d$から入力することで学習する問題を考察する。この学習問題は,(1)ニューラルネットワークのサイズが$d$の多項式であり,(2)入力分布が標準ガウスであり,(3)ノイズがガウスで$d$の多項式が小さい場合においても,標準的な暗号的仮定の下では困難であることを示す。我々の硬さは、連続学習エラー(CLWE)問題の硬さに基づいており、特に、最も短いベクトル問題を乗算多項式係数まで解くという、最も難しい硬さに基づいている。 In this work, we consider the problem of learning one hidden layer ReLU neural networks with inputs from $\mathbb{R}^d$. We show that this learning problem is hard under standard cryptographic assumptions even when: (1) the size of the neural network is polynomial in $d$, (2) its input distribution is a standard Gaussian, and (3) the noise is Gaussian and polynomially small in $d$. Our hardness result is based on the hardness of the Continuous Learning with Errors (CLWE) problem, and in particular, is based on the largely believed worst-case hardness of approximately solving the shortest vector problem up to a multiplicative polynomial factor.	翻訳日:2024-11-02 21:59:46 公開日:2024-10-04
# VEDIT:手続き型ビデオ表現学習のための潜在予測アーキテクチャ VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning ( http://arxiv.org/abs/2410.03478v1 ) ライセンス: Link先を確認	Han Lin, Tushar Nagarajan, Nicolas Ballas, Mido Assran, Mojtaba Komeili, Mohit Bansal, Koustuv Sinha,	(参考訳) 手続き型ビデオ表現学習(Procedural video representation learning)は、現在入力されている映像をテキストアノテーションとともに予測し、予測できるエージェントを学習することを目的とした活発な研究分野である。先行研究は、しばしば言語監督を伴う視覚エンコーダや予測モデルの大規模事前学習に依存している。しかし、ノイズの多いテキスト管理を伴うビデオクリップシーケンスを学習するために、計算集中事前学習を拡張する必要性と効果は、これまでの研究でまだ十分に検証されていない。本研究では,厳密な既成の凍結型視覚エンコーダとよく設計された予測モデルを用いて,予測モデルの事前訓練や言語やASRからの追加の監督を必要とせず,予測および手続き計画における最先端(SoTA)のパフォーマンスを実現できることを示す。画素空間から表現を学習する代わりに,一般に公開されている視覚エンコーダの埋め込み空間を利用する。観察されたステップから凍結したクリップレベルの埋め込みを条件付け、未確認ステップの動作を予測することによって、我々の予測モデルは、反復的復調により予測のための堅牢な表現を学習することができる。 4つのデータセット(NIV, CrossTask, COIN, Ego4D-v2)にまたがる5つの手続き的学習タスク(NIV, CrossTask, COIN, Ego4D-v2)に関する実証的研究は、我々のモデルが長方形の行動予測において強いベースライン(+2.6%、Noun ED@20では+3.1%)を前進させ、ステップ予測(+5.0%)、タスク分類(+3.8%)、手順計画タスク(+2.28%、mAccでは+3.39%、mIoUでは+0.90%)においてSoTAを大幅に改善していることを示している。 Procedural video representation learning is an active research area where the objective is to learn an agent which can anticipate and forecast the future given the present video input, typically in conjunction with textual annotations. Prior works often rely on large-scale pretraining of visual encoders and prediction models with language supervision. However, the necessity and effectiveness of extending compute intensive pretraining to learn video clip sequences with noisy text supervision have not yet been fully validated by previous works. In this work, we show that a strong off-the-shelf frozen pretrained visual encoder, along with a well designed prediction model, can achieve state-of-the-art (SoTA) performance in forecasting and procedural planning without the need for pretraining the prediction model, nor requiring additional supervision from language or ASR. 
Instead of learning representations from pixel space, our method utilizes the latent embedding space of publicly available vision encoders. By conditioning on frozen clip-level embeddings from observed steps to predict the actions of unseen steps, our prediction model is able to learn robust representations for forecasting through iterative denoising - leveraging the recent advances in diffusion transformers (Peebles & Xie, 2023). Empirical studies over a total of five procedural learning tasks across four datasets (NIV, CrossTask, COIN and Ego4D-v2) show that our model advances the strong baselines in long-horizon action anticipation (+2.6% in Verb ED@20, +3.1% in Noun ED@20), and significantly improves the SoTA in step forecasting (+5.0%), task classification (+3.8%), and procedure planning tasks (up to +2.28% in success rate, +3.39% in mAcc, and +0.90% in mIoU).	翻訳日:2024-11-02 21:59:46 公開日:2024-10-04
# 開量子系問題としてのパラメトリック近似 Parametric approximation as open quantum systems problem ( http://arxiv.org/abs/2410.03482v1 ) ライセンス: Link先を確認	A. Yu. Karasev, A. E. Teretenkov,	(参考訳) 本研究では、パラメトリック近似のオープン量子系ビューを開発し、それに対する体系的な摂動補正を得る。散逸を伴うJaynes-Cummingsモデルを考えると、この場は減少を伴うパラメトリック近似に近い状態にあると仮定する。パラメトリック近似に対する非単項補正とそれに対する動的ラムシフトの寄与を得る。高階調では、これらの非単位補正は、枯渇する前に非マルコフ的であるように見える。また, 劣化後の初期マルコフ的非マルコフ的挙動は, レーザー誘起密度行列の研磨により動的に寄与することを示した。 In this work we develop an open quantum system view of the parametric approximation, which allows us to obtain systematic perturbative corrections to it. We consider the Jaynes-Cummings model with dissipation, assuming that the field is in the regime close to the parametric approximation with depletion. We obtain non-unitary corrections to the parametric approximation and additional dynamical Lamb-shift contributions to it. For high detuning, these non-unitary corrections appear to be non-Markovian before depletion. And we show that even after depletion, initial non-Markovian behaviour contributes to the dynamics via laser-induced polishing of the density matrix.	翻訳日:2024-11-02 21:59:46 公開日:2024-10-04
# ディープフェイク検出のためのマルチモーダルフレームワーク A Multimodal Framework for Deepfake Detection ( http://arxiv.org/abs/2410.03487v1 ) ライセンス: Link先を確認	Kashish Gandhi, Prutha Kulkarni, Taran Shah, Piyush Chaudhari, Meera Narvekar, Kranti Ghag,	(参考訳) ディープフェイク技術の急速な進歩は、デジタルメディアの完全性に重大な脅威をもたらす。 AIを使って作られる合成メディアであるディープフェイクは、ビデオやオーディオを説得力を持って改変し、現実を誤って伝えることができる。これにより、誤情報、詐欺、および個人のプライバシーとセキュリティに対する深刻な影響のリスクが生じる。本研究は,視覚的要素と聴覚的要素の両方を対象とする,革新的なマルチモーダルアプローチによるディープフェイクの重要課題に対処する。この包括的な戦略は、人間の知覚が複数の感覚入力、特に視覚情報と聴覚情報を統合し、メディアコンテンツを完全に理解することを認識する。視覚分析のために,高度な特徴抽出技術を用いたモデルを開発し,9つの顔の特徴を抽出し,様々な機械学習モデルと深層学習モデルを適用した。聴覚分析では,特徴抽出にメル・スペクトログラム解析を用い,各種機械学習および深層学習モデルを適用した。組み合わせた分析を実現するため、元のデータセットの実際の音声とディープフェイク音声は、テスト目的で交換され、バランスの取れたサンプルが確保された。提案した映像・音声分類モデル,すなわち人工ニューラルネットワークとVGG19を用いて,いずれかの成分がディープフェイクと識別された場合,全体サンプルをディープフェイクとして分類する。我々のマルチモーダル・フレームワークは視覚的・聴覚的分析を組み合わせたもので、精度は94%である。 The rapid advancement of deepfake technology poses a significant threat to digital media integrity. Deepfakes, synthetic media created using AI, can convincingly alter videos and audio to misrepresent reality. This creates risks of misinformation, fraud, and severe implications for personal privacy and security. Our research addresses the critical issue of deepfakes through an innovative multimodal approach, targeting both visual and auditory elements. This comprehensive strategy recognizes that human perception integrates multiple sensory inputs, particularly visual and auditory information, to form a complete understanding of media content. For visual analysis, a model that employs advanced feature extraction techniques was developed, extracting nine distinct facial characteristics and then applying various machine learning and deep learning models. For auditory analysis, our model leverages mel-spectrogram analysis for feature extraction and then applies various machine learning and deep learning models. To achieve a combined analysis, real and deepfake audio in the original dataset were swapped for testing purposes, and balanced samples were ensured. 
Using our proposed models for video and audio classification (i.e., an Artificial Neural Network and VGG19), the overall sample is classified as a deepfake if either component is identified as such. Our multimodal framework combines visual and auditory analyses, yielding an accuracy of 94%.	翻訳日:2024-11-02 21:59:46 公開日:2024-10-04
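The decision rule stated at the end of this abstract (flag the whole sample if either modality is flagged) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the probability inputs, and the 0.5 threshold are all assumptions introduced here.

```python
def classify_sample(video_fake_prob: float,
                    audio_fake_prob: float,
                    threshold: float = 0.5) -> str:
    """Label a sample "deepfake" if either modality is flagged as fake.

    video_fake_prob / audio_fake_prob are hypothetical per-modality scores
    (probability that the visual or audio track is fake); the 0.5 threshold
    is an assumption, not a value taken from the paper.
    """
    video_is_fake = video_fake_prob >= threshold
    audio_is_fake = audio_fake_prob >= threshold
    # OR-fusion: one flagged modality is enough to flag the whole sample.
    return "deepfake" if (video_is_fake or audio_is_fake) else "real"
```

An OR rule of this kind favours recall: a sample with a genuine video track and a swapped fake audio track is still caught, at the cost of inheriting the false positives of both detectors.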
# FAIR原則を計算ワークフローに適用する Applying the FAIR Principles to Computational Workflows ( http://arxiv.org/abs/2410.03490v1 ) ライセンス: Link先を確認	Sean R. Wilkinson, Meznah Aloqalaa, Khalid Belhajjame, Michael R. Crusoe, Bruno de Paula Kinoshita, Luiz Gadelha, Daniel Garijo, Ove Johan Ragnar Gustafsson, Nick Juty, Sehrish Kanwal, Farah Zaib Khan, Johannes Köster, Karsten Peters-von Gehlen, Line Pouchard, Randy K. Rannow, Stian Soiland-Reyes, Nicola Soranzo, Shoaib Sufi, Ziheng Sun, Baiba Vilne, Merridee A. Wouters, Denis Yuen, Carole Goble,	(参考訳) 計算科学とデータ科学の最近のトレンドは、生産性、再現性、プラットフォームへの民主化アクセスとノウハウの処理のためのツールとして、計算ワークフローの認識と採用が増加していることを示している。デジタルオブジェクトを共有、発見、再利用するためには、計算ワークフローはFinderable、Accessible、Interoperable、ReusableのFAIR原則の恩恵を受ける。 Workflows Community InitiativeのFAIR Workflows Working Group (WCI-FW)は、FAIRデータとソフトウェア原則の両方を計算ワークフローに適用する体系的な取り組みを行っている。我々は、私たちの議論を反映し、私たちの選択と適応を正当化するコメンデーションを提示する。それらがベースとするソフトウェアやデータ原則と同様に、これらはワークフローユーザと作者、ワークフロー管理システム開発者、ワークフローサービスのプロバイダに対して、採用のためのガイドレールとして提供され、議論の場となる。ワークフローは、データ分析、データ収集、AIベースの予測、シミュレーションのためのドキュメント化、自動化された機器として、より普及しつつある。本論文で提案するワークフローに対するFAIR勧告は,研究資産としての価値を最大化し,より広範なコミュニティによる採用を促進するものである。 Recent trends within computational and data sciences show an increasing recognition and adoption of computational workflows as tools for productivity, reproducibility, and democratized access to platforms and processing know-how. As digital objects to be shared, discovered, and reused, computational workflows benefit from the FAIR principles, which stand for Findable, Accessible, Interoperable, and Reusable. The Workflows Community Initiative's FAIR Workflows Working Group (WCI-FW), a global and open community of researchers and developers working with computational workflows across disciplines and domains, has systematically addressed the application of both FAIR data and software principles to computational workflows. We present our recommendations with commentary that reflects our discussions and justifies our choices and adaptations. 
Like the software and data principles on which they are based, these are offered to workflow users and authors, workflow management system developers, and providers of workflow services as guide rails for adoption and fodder for discussion. Workflows are becoming more prevalent as documented, automated instruments for data analysis, data collection, AI-based predictions, and simulations. The FAIR recommendations for workflows that we propose in this paper will maximize their value as research assets and facilitate their adoption by the wider community.	翻訳日:2024-11-02 21:59:46 公開日:2024-10-04
# 再現可能なLLM評価に向けて:LLMベンチマークスコアの不確かさの定量化 Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores ( http://arxiv.org/abs/2410.03492v1 ) ライセンス: Link先を確認	Robert E. Blackwell, Jon Barry, Anthony G. Cohn,	(参考訳) 大規模言語モデル(LLM)は確率的であり、固定されたランダムシードで温度を0に設定しても、すべてのモデルが決定論的回答を与えるわけではない。しかしながら、繰り返し実験の時間とコストもあって、不確実性を定量化しようとするベンチマーク研究はほとんどない。基本方位について推論するLLMの能力をテストするために設計されたベンチマークを用いて,実験の繰り返しが平均スコアと予測区間に与える影響を調べる。本稿では,ベンチマークスコアの不確かさを低コストで定量化する簡易な手法を提案し,再現可能なLLM評価について提言する。 Large language models (LLMs) are stochastic, and not all models give deterministic answers, even when setting temperature to zero with a fixed random seed. However, few benchmark studies attempt to quantify uncertainty, partly due to the time and cost of repeated experiments. We use benchmarks designed for testing LLMs' capacity to reason about cardinal directions to explore the impact of experimental repeats on mean score and prediction interval. We suggest a simple method for cost-effectively quantifying the uncertainty of a benchmark score and make recommendations concerning reproducible LLM evaluation.	翻訳日:2024-11-02 21:59:45 公開日:2024-10-04
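To make the idea of a mean score with a prediction interval concrete, here is a small sketch that summarises repeated runs of one benchmark. It is not the method proposed in the paper: the function name is hypothetical, and the interval uses a normal quantile as an approximation (with only a handful of repeats, a Student-t quantile would give a wider and more appropriate interval).

```python
import math
import statistics


def score_summary(scores, confidence=0.95):
    """Return (mean, lower, upper): a prediction interval for one future run.

    scores     -- benchmark scores from repeated runs of the same model
    confidence -- nominal coverage of the interval
    Assumption: runs are independent and roughly normally distributed.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two repeated runs")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    # sqrt(1 + 1/n) accounts for both run-to-run noise and the
    # uncertainty in the estimated mean.
    half_width = z * sd * math.sqrt(1 + 1 / n)
    return mean, mean - half_width, mean + half_width
```

For example, `score_summary([0.70, 0.72, 0.68, 0.71, 0.69])` returns a mean of 0.70 with an interval of roughly plus or minus 0.034, which shows how much a single reported score could move on a rerun.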
# 合成可能な化学空間をナビゲートするための生成人工知能 Generative Artificial Intelligence for Navigating Synthesizable Chemical Space ( http://arxiv.org/abs/2410.03494v1 ) ライセンス: Link先を確認	Wenhao Gao, Shitong Luo, Connor W. Coley,	(参考訳) 合成可能な化学空間を効率的に探索し、ナビゲートするための生成モデリングフレームワークであるSynFormerを紹介する。従来の分子生成手法とは異なり、我々は分子の合成経路を生成し、設計が合成的に実現可能であることを保証する。拡張性のあるトランスフォーマーアーキテクチャとビルディングブロック選択のための拡散モジュールを組み込むことで、SynFormerは合成可能な分子設計において既存のモデルを超えている。本研究では,(1) 参照分子の合成可能な類似体を生成する局所的な化学空間探索,(2) ブラックボックス特性予測オラクルに基づいて最適な分子を同定するグローバルな化学空間探索,という2つの主要な応用において,SynFormerの有効性を実証する。さらに,より計算資源が利用可能になるにつれて,性能の向上を通じて,我々のアプローチのスケーラビリティを実証する。当社のコードとトレーニング済みのモデルを公開することで、SynFormerは、薬物発見や材料科学の分野にまたがって利用できるようになることを期待しています。 We introduce SynFormer, a generative modeling framework designed to efficiently explore and navigate synthesizable chemical space. Unlike traditional molecular generation approaches, we generate synthetic pathways for molecules to ensure that designs are synthetically tractable. By incorporating a scalable transformer architecture and a diffusion module for building block selection, SynFormer surpasses existing models in synthesizable molecular design. We demonstrate SynFormer's effectiveness in two key applications: (1) local chemical space exploration, where the model generates synthesizable analogs of a reference molecule, and (2) global chemical space exploration, where the model aims to identify optimal molecules according to a black-box property prediction oracle. Additionally, we demonstrate the scalability of our approach via the improvement in performance as more computational resources become available. With our code and trained models openly available, we hope that SynFormer will find use across applications in drug discovery and materials science.	翻訳日:2024-11-02 21:59:45 公開日:2024-10-04
# Fourier PINN: 強い境界条件から適応的なFourierベースへ Fourier PINNs: From Strong Boundary Conditions to Adaptive Fourier Bases ( http://arxiv.org/abs/2410.03496v1 ) ライセンス: Link先を確認	Madison Cooley, Varun Shankar, Robert M. Kirby, Shandian Zhe,	(参考訳) 偏微分方程式(PDE)の従来の数値解法に代わるメッシュフリーの代替として、物理情報ニューラルネットワーク(PINN)への関心が高まっている。しかし、PINNは高頻度でマルチスケールなターゲットソリューションを学ぶのに苦労することが多い。この問題に対処するために,我々はまず,ディリクレ BC に対する PINN の強い境界条件 (BC) について検討し,標準 PINN と比較して相対誤差が一貫した減少を観察する。次にフーリエ変換と畳み込み定理に基づく理論的解析を行う。強いBC PINNは、ターゲット溶液の高周波成分の振幅をよりよく学習できることがわかった。しかし、強力なBC PINNのアーキテクチャを構築することは、多くのBCやドメインのジオメトリにとって困難である。理論解析により,Fourier PINN を提案する。Fourier PINN は単純で汎用的で強力な手法で,あらかじめ特定された密度の高いFourier ベースで PINN を増強する。提案アーキテクチャも同様に高周波成分を学習するが、特定のBCや問題領域に制限はない。本研究では,ニューラルネットベース最適化,フーリエとニューラルネットベースベース推定,係数切り抜きによる適応学習とベース選択アルゴリズムを開発した。このスキームは、高い周波数を柔軟に識別し、名目周波数を弱め、ターゲットの溶液のパワースペクトルをよりよく捉えることができる。我々は,一連の系統的な実験を通じて,アプローチの利点を示す。 Interest is rising in Physics-Informed Neural Networks (PINNs) as a mesh-free alternative to traditional numerical solvers for partial differential equations (PDEs). However, PINNs often struggle to learn high-frequency and multi-scale target solutions. To tackle this problem, we first study a strong Boundary Condition (BC) version of PINNs for Dirichlet BCs and observe a consistent decline in relative error compared to the standard PINNs. We then perform a theoretical analysis based on the Fourier transform and convolution theorem. We find that strong BC PINNs can better learn the amplitudes of high-frequency components of the target solutions. However, constructing the architecture for strong BC PINNs is difficult for many BCs and domain geometries. Enlightened by our theoretical analysis, we propose Fourier PINNs -- a simple, general, yet powerful method that augments PINNs with pre-specified, dense Fourier bases. Our proposed architecture likewise learns high-frequency components better but places no restrictions on the particular BCs or problem domains. 
We develop an adaptive learning and basis selection algorithm via alternating neural net basis optimization, Fourier and neural net basis coefficient estimation, and coefficient truncation. This scheme can flexibly identify the significant frequencies while weakening the nominal frequencies to better capture the target solution's power spectrum. We show the advantage of our approach through a set of systematic experiments.	翻訳日:2024-11-02 21:50:00 公開日:2024-10-04
# 適応器の混合による協調的・効率的なパーソナライゼーション Collaborative and Efficient Personalization with Mixtures of Adaptors ( http://arxiv.org/abs/2410.03497v1 ) ライセンス: Link先を確認	Abdulla Jasem Almansoori, Samuel Horváth, Martin Takáč,	(参考訳) 非イドデータは、現実世界のフェデレーション学習問題で広く利用されている。データの不均一性は、分散シフトの点で異なるタイプのものとなる。この研究では、コンセプトシフト、すなわちクライアント間での予測のシフトから生じる異質性に興味を持っています。特に、モデルをクライアントのタスクに適応させたいマルチタスク学習について検討する。この問題に対処するためのパラメータ効率フレームワークを提案し、各クライアントはそのタスクに応じてパラメータ効率のよいアダプタを混在させることを学ぶ。バックボーンとしてLoRA(Lolow-Rank Adaptors)を使用し、そのコンセプトを他のタイプのレイヤに拡張しています。当社のフレームワークをFLoRAL(Federated Low-Rank Adaptive Learning)と呼んでいます。このフレームワークは、アルゴリズムではなく、マルチタスク学習目的のためのモデルパラメータ化であり、文献からの多くのアルゴリズムを含む、この目的を最適化する任意のアルゴリズム上で動作することができる。 FLoRALはメモリ効率が高く、クライアントはアダプタ自体がフェデレーションされるため、小さな状態(例えば、アダプタ毎に1個)でパーソナライズされる。したがって、パーソナライゼーションは、この意味でも----フェデレーションである。クライアントは、アダプタをローカルにトレーニングすることで、より自由にパーソナライズすることができるが、アダプタの協調的かつ効率的なトレーニングが可能であり、パフォーマンスが向上することを示す。また,FLoralは,フェデレートされたパーソナライゼーションのメリットと,FLoralの過度な適合性を示すため,クラスタ割り当てを最適化した完全モデルのアンサンブルよりも優れていることを示す。合成データセット,MNIST, CIFAR-10, CIFAR-100などの実世界のマルチタスク問題について, 有望な実験結果を示す。また, 局所SGDの緩和対象に関する理論的解析を行い, 凝集ミスマッチが収束に及ぼす影響について考察する。 Non-iid data is prevalent in real-world federated learning problems. Data heterogeneity can come in different types in terms of distribution shifts. In this work, we are interested in the heterogeneity that comes from concept shifts, i.e., shifts in the prediction across clients. In particular, we consider multi-task learning, where we want the model to adapt to the task of the client. We propose a parameter-efficient framework to tackle this issue, where each client learns to mix between parameter-efficient adaptors according to its task. We use Low-Rank Adaptors (LoRAs) as the backbone and extend its concept to other types of layers. We call our framework Federated Low-Rank Adaptive Learning (FLoRAL). 
This framework is not an algorithm but rather a model parameterization for a multi-task learning objective, so it can work on top of any algorithm that optimizes this objective, which includes many algorithms from the literature. FLoRAL is memory-efficient, and clients are personalized with small states (e.g., one number per adaptor) as the adaptors themselves are federated. Hence, personalization is, in this sense, federated as well. Even though clients can personalize more freely by training an adaptor locally, we show that collaborative and efficient training of adaptors is possible and performs better. We also show that FLoRAL can outperform an ensemble of full models with optimal cluster assignment, which demonstrates the benefits of federated personalization and the robustness of FLoRAL to overfitting. We show promising experimental results on synthetic datasets and on real-world federated multi-task problems such as MNIST, CIFAR-10, and CIFAR-100. We also provide a theoretical analysis of local SGD on a relaxed objective and discuss the effects of aggregation mismatch on convergence.	翻訳日:2024-11-02 21:50:00 公開日:2024-10-04
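The parameterization described in this abstract (shared low-rank adaptors, with each client holding only one mixing number per adaptor) might look roughly like the sketch below. Everything here is an assumption made for illustration: the helper names are hypothetical, and normalising the per-client numbers with a softmax is a choice made in this sketch, not something the abstract specifies.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(logits):
    """Turn per-adaptor client logits into mixture weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_weight(base, adaptors, client_logits):
    """Return base + sum_k pi_k * (B_k @ A_k), with pi = softmax(client_logits).

    base          -- shared weight matrix (federated)
    adaptors      -- list of (B_k, A_k) low-rank factor pairs (federated)
    client_logits -- the client's small personal state, one number per adaptor
    """
    mix = softmax(client_logits)
    out = [row[:] for row in base]
    for pi_k, (b_k, a_k) in zip(mix, adaptors):
        delta = matmul(b_k, a_k)  # rank-limited update B_k @ A_k
        for i, row in enumerate(delta):
            for j, d in enumerate(row):
                out[i][j] += pi_k * d
    return out
```

In this sketch only `client_logits` stays local, which mirrors the "small states" remark above: the base weights and the adaptor factors can be averaged across clients like any other federated parameters.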
# FedStein: James-Stein Estimatorによるマルチドメインフェデレーション学習の促進 FedStein: Enhancing Multi-Domain Federated Learning Through James-Stein Estimator ( http://arxiv.org/abs/2410.03499v1 ) ライセンス: Link先を確認	Sunny Gupta, Nikita Jangid, Amit Sethi,	(参考訳) Federated Learning (FL)は、分散クライアント間のその場(in-situ)での協調的なトレーニングを可能にすることで、データのプライバシを促進する。その固有の利点にもかかわらず、FLは独立同分布ではない(非i.i.d.)データを扱う際に、パフォーマンスと収束の重大な課題に直面している。従来,クライアント間でスキュードなラベル分散が問題視されてきたが,本研究では,異なる特徴分布を持つ異なるドメインからクライアントデータが派生するマルチドメインFLの課題に焦点をあてた。本稿では,FedStein: Enhancing Multi-Domain Federated Learning through the James-Stein Estimatorを提案する。 FedSteinは、ローカルBNパラメータを維持しながら、クライアント間でバッチ正規化(BN)統計のJames-Stein(JS)推定のみを共有する。非BN層パラメータは標準FL技術で交換される。 3つのデータセットと複数のモデルで実施された大規模な実験は、FedSteinがFedAvgやFedBNといった既存の手法を上回り、特定のドメインで精度が14%以上向上し、ドメインの一般化が促進されたことを示している。コードはhttps://github.com/sunnyinAI/FedSteinで入手できる。 Federated Learning (FL) facilitates data privacy by enabling collaborative in-situ training across decentralized clients. Despite its inherent advantages, FL faces significant challenges of performance and convergence when dealing with data that is not independently and identically distributed (non-i.i.d.). While previous research has primarily addressed the issue of skewed label distribution across clients, this study focuses on the less explored challenge of multi-domain FL, where client data originates from distinct domains with varying feature distributions. We introduce a novel method designed to address these challenges: FedStein (Enhancing Multi-Domain Federated Learning Through the James-Stein Estimator). FedStein uniquely shares only the James-Stein (JS) estimates of batch normalization (BN) statistics across clients, while maintaining local BN parameters. The non-BN layer parameters are exchanged via standard FL techniques. 
Extensive experiments conducted across three datasets and multiple models demonstrate that FedStein surpasses existing methods such as FedAvg and FedBN, with accuracy improvements exceeding 14% in certain domains, leading to enhanced domain generalization. The code is available at https://github.com/sunnyinAI/FedStein.	翻訳日:2024-11-02 21:50:00 公開日:2024-10-04
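For readers unfamiliar with the estimator named in this abstract, the sketch below shows the textbook positive-part James-Stein estimator applied to a vector of noisy statistics, such as per-channel batch-normalization means. It is not taken from the FedStein repository: the function name, the shrinkage target (the origin), and the assumption of a known, common noise variance are all choices made here for illustration.

```python
def james_stein(values, noise_var):
    """Positive-part James-Stein estimate, shrinking toward the origin.

    values    -- noisy estimates (e.g. per-channel BN means), length p >= 3
    noise_var -- variance of each estimate, assumed known and equal
                 (an assumption of this sketch)
    """
    p = len(values)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 dimensions")
    sq_norm = sum(v * v for v in values)
    if sq_norm == 0.0:
        return list(values)  # nothing to shrink
    # Classic factor 1 - (p - 2) * sigma^2 / ||x||^2, clipped at zero so the
    # estimate never flips sign (the "positive-part" variant).
    shrink = max(0.0, 1.0 - (p - 2) * noise_var / sq_norm)
    return [shrink * v for v in values]
```

With `values = [3.0, 4.0, 0.0, 0.0]` and `noise_var = 1.0`, the squared norm is 25 and the shrink factor is 1 - 2/25 = 0.92. Sharing shrunken statistics of this kind, rather than raw ones, pulls noisy per-client estimates toward a common point before they are sent.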
# CliMedBench: 臨床シナリオにおける医学大言語モデル評価のための大規模中国語ベンチマーク CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios ( http://arxiv.org/abs/2410.03502v1 ) ライセンス: Link先を確認	Zetian Ouyang, Yishuai Qiu, Linlin Wang, Gerard de Melo, Ya Zhang, Yanfeng Wang, Liang He,	(参考訳) 多様な領域におけるLarge Language Models (LLMs) の普及に伴い、臨床医療シナリオにおける統一的な評価基準が特に必要となる。 CliMedBenchは、7つの軸にわたってLLMの医学的能力を評価するために特別に設計された、専門家の指導による14の中核的な臨床シナリオを備えた総合的なベンチマークである。上位の三次病院の実際の医療報告と、本物の試験問題から得られた33,735の質問から成り立っている。このベンチマークの信頼性はいくつかの点で確認されている。その後、既存のLLMを用いた実験により、以下の結果が得られた。 (i)中国の医学LLMはこのベンチマークで性能が低く、特に医学的推論と事実的整合性が不可欠である場合に顕著であり、臨床知識と診断精度の向上の必要性を浮き彫りにしている。 (ii)いくつかの一般ドメイン LLM は臨床医療において大きな可能性を示す一方で,多くの医療 LLM の入力容量の制限は,その実用性を妨げている。これらの結果から,臨床シナリオにおけるLLMの強みと限界が明らかとなり,医学研究における重要な知見が得られた。 With the proliferation of Large Language Models (LLMs) in diverse domains, there is a particular need for unified evaluation standards in clinical medical scenarios, where models need to be examined very thoroughly. We present CliMedBench, a comprehensive benchmark with 14 expert-guided core clinical scenarios specifically designed to assess the medical ability of LLMs across 7 pivot dimensions. It comprises 33,735 questions derived from real-world medical reports of top-tier tertiary hospitals and authentic examination exercises. The reliability of this benchmark has been confirmed in several ways. Subsequent experiments with existing LLMs have led to the following findings: (i) Chinese medical LLMs underperform on this benchmark, especially where medical reasoning and factual consistency are vital, underscoring the need for advances in clinical knowledge and diagnostic accuracy. (ii) Several general-domain LLMs demonstrate substantial potential in medical clinics, while the limited input capacity of many medical LLMs hinders their practical use. These findings reveal both the strengths and limitations of LLMs in clinical scenarios and offer critical insights for medical research.	翻訳日:2024-11-02 21:50:00 公開日:2024-10-04
# 医療機器ディジタル双生児の不確実性を考慮した環境シミュレーション Uncertainty-Aware Environment Simulation of Medical Devices Digital Twins ( http://arxiv.org/abs/2410.03504v1 ) ライセンス: Link先を確認	Hassan Sartaj, Shaukat Ali, Julie Marie Gjøby,	(参考訳) スマートメディカルデバイスは、IoT(Health Internet of Things)の不可欠なコンポーネントであり、IoTベースのアプリケーションを通じて、さまざまなヘルスケアサービスを提供する。システムと統合レベルのテストを通じて、そのようなアプリケーションの信頼性を確保することは、多くの医療機器の物理的統合を義務付ける。この文脈では、医療機器のデジタルツインが、自動化テストの促進に不可欠な役割を担っている。医療機器の不確実な環境要因を考慮せずに、デジタルツインでテストすることは、IoTベースの医療アプリケーションに未検証の機能を残している。さらに、環境要因のないディジタル双生児は、実際の環境で機能する対応するデバイスと同期し、未調整のままである。本稿では,医療機器のディジタル双生児の環境をモデル化し,シミュレーションするためのモデルベースアプローチ(EnvDT)を提案する。実世界のIoTベースのヘルスケアアプリケーションに接続された3つの薬品ディスペンサー、Karie, Medido, Pillyを使用して、EnvDTを実証的に評価した。評価対象は,デジタル双生児を対象とした環境モデルと不確実性シナリオの多様性の分析である。その結果,EnvDTは環境モデルの約61%のカバレッジを達成し,複数の環境シミュレーションにおいて,様々な不確実なシナリオ(最大ダイバーシティ値0.62)を生成することがわかった。 Smart medical devices are an integral component of the healthcare Internet of Things (IoT), providing patients with various healthcare services through an IoT-based application. Ensuring the dependability of such applications through system and integration-level testing mandates the physical integration of numerous medical devices, which is costly and impractical. In this context, digital twins of medical devices play an essential role in facilitating testing automation. Testing with digital twins without accounting for uncertain environmental factors of medical devices leaves many functionalities of IoT-based healthcare applications untested. In addition, digital twins operating without environmental factors remain out of sync and uncalibrated with their corresponding devices functioning in the real environment. To deal with these challenges, in this paper, we propose a model-based approach (EnvDT) for modeling and simulating the environment of medical devices' digital twins under uncertainties. We empirically evaluate the EnvDT using three medicine dispensers, Karie, Medido, and Pilly connected to a real-world IoT-based healthcare application. 
Our evaluation targets analyzing the coverage of environment models and the diversity of uncertain scenarios generated for digital twins. Results show that EnvDT achieves approximately 61% coverage of environment models and generates diverse uncertain scenarios (with a near-maximum diversity value of 0.62) during multiple environmental simulations.	翻訳日:2024-11-02 21:50:00 公開日:2024-10-04
# 分類Denoising Networks Classification-Denoising Networks ( http://arxiv.org/abs/2410.03505v1 ) ライセンス: Link先を確認	Louis Thiry, Florentin Guth,	(参考訳) 画像分類と認知は、堅牢性の欠如や条件情報の部分的に無視という相補的な問題に悩まされる。両タスクを(ノイズの多い)画像とクラスラベルの結合確率のモデルで統一することで緩和できると論じる。フォワードパスで分類を行い、コンディショニングを行う。 Tweedie-Miyasawa式を用いて,楽譜を用いた復調関数の評価を行った。トレーニングの目的は、ノイズレベルを総合したクロスエントロピー損失とデノジングスコアマッチング損失の組み合わせである。 CIFAR-10 と ImageNet の数値実験では、参照深層畳み込み分類器/デノワよりも競合的な分類とデノナイジング性能を示し、従来のジョイントアプローチに比べて効率が大幅に向上した。本モデルでは, 標準的な識別型分類器と比較して, 対向的摂動に対する頑健さが向上し, 対向的勾配の新たな解釈が可能となった。 Image classification and denoising suffer from complementary issues of lack of robustness or partially ignoring conditioning information. We argue that they can be alleviated by unifying both tasks through a model of the joint probability of (noisy) images and class labels. Classification is performed with a forward pass followed by conditioning. Using the Tweedie-Miyasawa formula, we evaluate the denoising function with the score, which can be computed by marginalization and back-propagation. The training objective is then a combination of cross-entropy loss and denoising score matching loss integrated over noise levels. Numerical experiments on CIFAR-10 and ImageNet show competitive classification and denoising performance compared to reference deep convolutional classifiers/denoisers, and significantly improves efficiency compared to previous joint approaches. Our model shows an increased robustness to adversarial perturbations compared to a standard discriminative classifier, and allows for a novel interpretation of adversarial gradients as a difference of denoisers.	翻訳日:2024-11-02 21:50:00 公開日:2024-10-04
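For readers unfamiliar with the Tweedie-Miyasawa formula mentioned in the abstract above, the standard statement can be written out as follows. This is a sketch of the textbook identity, assuming additive Gaussian noise $y = x + \sigma z$ with $z \sim \mathcal{N}(0, I)$; the symbols $p_\sigma$, $c$ and $\hat{x}$ are notation chosen here, not necessarily the authors'.

```latex
% Joint model over noisy image y and class label c: p_\sigma(y, c).
% Classification by conditioning (one forward pass gives all p_\sigma(y, c)):
p_\sigma(c \mid y) = \frac{p_\sigma(y, c)}{\sum_{c'} p_\sigma(y, c')}

% Marginalization over classes gives the density of noisy images:
p_\sigma(y) = \sum_{c} p_\sigma(y, c)

% Tweedie-Miyasawa: the optimal (MMSE) denoiser is obtained from the score,
% which back-propagation through \log p_\sigma(y) can compute:
\hat{x}(y) = \mathbb{E}[x \mid y] = y + \sigma^2 \, \nabla_y \log p_\sigma(y)
```

Under these assumptions, a single network that outputs $\log p_\sigma(y, c)$ yields both a classifier and a denoiser, which is the unification the abstract argues for.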
# 水中音響ネットワークにおける位置追跡による認証 Authentication by Location Tracking in Underwater Acoustic Networks ( http://arxiv.org/abs/2410.03511v1 ) ライセンス: Link先を確認	Gianmaria Ventura, Francesco Ardizzon, Stefano Tomasin,	(参考訳) 水中音響ネットワーク(UWAN)における物理層メッセージ認証は,送信装置の指紋として水中音響チャネル(UWAC)の特性を利用する。しかし、デバイスがUWACを変更すると、認証機構はそのようなバリエーションを追跡する必要がある。本稿では,まず,水中デバイスの位置を推定し,それに基づいて将来の位置を推定する,という2つのステップで動作するコンテキストベース認証機構を提案する。送信の真正性を確認するため,推定位置と予測位置を比較した。推定されたUWACのサンプル共分散行列を入力として、畳み込みニューラルネットワークを用いて位置を推定する。この予測は、カルマンフィルタまたはリカレントニューラルネットワーク(RNN)を使用する。予測位置と推定位置の2乗誤差に対して認証チェックを行う。カルマンフィルタに基づく解は、典型的な水中動作を再現する相関したガウス-マルコフ運動モデルに従って、RNN上に構築された解よりも優れる。 Physical layer message authentication in underwater acoustic networks (UWANs) leverages the characteristics of the underwater acoustic channel (UWAC) as a fingerprint of the transmitting device. However, as the device moves its UWAC changes, and the authentication mechanism must track such variations. In this paper, we propose a context-based authentication mechanism operating in two steps: first, we estimate the position of the underwater device, then we predict its future position based on the previously estimated ones. To check the authenticity of the transmission, we compare the estimated and the predicted position. The location is estimated using a convolutional neural network taking as input the sample covariance matrix of the estimated UWACs. The prediction uses either a Kalman filter or a recurrent neural network (RNN). The authentication check is performed on the squared error between the predicted and estimated positions. The solution based on the Kalman filter outperforms that built on the RNN when the device moves according to a correlated Gauss-Markov mobility model, which reproduces a typical underwater motion.	翻訳日:2024-11-02 21:50:00 公開日:2024-10-04
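The predict-then-compare check described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a one-dimensional constant-velocity Kalman filter (the paper uses a Gauss-Markov mobility model), takes the position estimates as given inputs (in the paper they come from a convolutional neural network), and the class `KalmanCV1D`, the function `authenticate`, the noise levels and the threshold are all hypothetical.

```python
# Illustrative sketch only: 1-D constant-velocity Kalman prediction plus a
# squared-error authentication check. All parameters are assumed values.

class KalmanCV1D:
    def __init__(self, pos0, dt=1.0, q=0.01, r=0.25):
        self.dt = dt
        self.q = q                           # process-noise variance (assumed)
        self.r = r                           # measurement-noise variance (assumed)
        self.x = [pos0, 0.0]                 # state: [position, velocity]
        self.P = [[1.0, 0.0], [0.0, 1.0]]    # state covariance

    def predict(self):
        """Propagate the state one step and return the predicted position."""
        dt, P = self.dt, self.P
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]
        return self.x[0]

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        P = self.P
        s = P[0][0] + self.r                 # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        innovation = z - self.x[0]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]


def authenticate(track, estimates, threshold=4.0):
    """Return one accept/reject decision per estimated position."""
    decisions = []
    for z in estimates:
        predicted = track.predict()
        accepted = (z - predicted) ** 2 <= threshold
        decisions.append(accepted)
        if accepted:
            track.update(z)   # only accepted estimates refine the track
    return decisions
```

For example, `authenticate(KalmanCV1D(0.0), [1.0, 2.0, 3.0, 50.0])` accepts the first estimate and rejects the last, since a jump to 50 is far from anything the motion model predicts. Skipping the update on rejected estimates is a design choice of this sketch, so that a spoofed position cannot drag the track towards the attacker.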
# Weisfeiler-Lemanのきめ細かい表現力--準同型カウントの観点から Fine-Grained Expressive Power of Weisfeiler-Leman: A Homomorphism Counting Perspective ( http://arxiv.org/abs/2410.03517v1 ) ライセンス: Link先を確認	Junru Zhou, Muhan Zhang,	(参考訳) グラフニューラルネットワーク(GNN)が準同型を数える能力は、最近、その表現力の実用的できめ細かい尺度として提案されている。いくつかの既存の研究で特定のGNNファミリーの準同型カウント能力について研究されているが、その問題を解析するためのシンプルで統一的な枠組みは欠如している。本稿では,まず,表現力の高いGNNのフレキシブルな設計基盤として、一般化フォークロアWeisfeiler-Leman (GFWL) アルゴリズムを提案する。次に、GFWL設計空間内の任意のGNNクラスの準同型カウント能力をアルゴリズム的に決定する理論的枠組みを提供する。検討されている設計空間は、既知の強力なGNNのほとんどすべてに対応するのに十分の大きさであるので、我々の結果は既存のすべての作業を大幅に拡張し、GNNモデル設計の自動化にその応用を見出すことができるかもしれない。 The ability of graph neural networks (GNNs) to count homomorphisms has recently been proposed as a practical and fine-grained measure of their expressive power. Although several existing works have investigated the homomorphism counting power of certain GNN families, a simple and unified framework for analyzing the problem is absent. In this paper, we first propose generalized folklore Weisfeiler-Leman (GFWL) algorithms as a flexible design basis for expressive GNNs, and then provide a theoretical framework to algorithmically determine the homomorphism counting power of an arbitrary class of GNN within the GFWL design space. As the considered design space is large enough to accommodate almost all known powerful GNNs, our result greatly extends all existing works, and may find its application in the automation of GNN model design.	翻訳日:2024-11-02 21:50:00 公開日:2024-10-04
# 複雑な不均衡データストリームのためのオンラインバギングの改善 Improving Online Bagging for Complex Imbalanced Data Stream ( http://arxiv.org/abs/2410.03519v1 ) ライセンス: Link先を確認	Bartosz Przybyl, Jerzy Stefanowski,	(参考訳) 不均衡とデータストリームのドリフトから分類器を学ぶことは依然として課題である。現在の提案の多くは、グローバル不均衡比の変化を考慮に入れ、少数民族のサブコンセプトへの分解や、安全でない種類の例(国境線や珍しいもの)の存在など、局所的な困難要素を無視することに集中している。ストリームに存在する要因はオンライン分類器の性能を低下させる可能性があるため、安全でないマイノリティ事例の存在を考慮し、オンライン・バッグングの強化、すなわち、近隣アンダーサンプリング(Neighbourhood Undersampling)やオーバーサンプリング・オンライン・バッグング(Oversampling Online Bagging)を提案する。合成複素不均衡データストリームを用いた計算実験は、オンラインバッグ再サンプリングアンサンブルの以前の変種よりも有利であることを示した。 Learning classifiers from imbalanced and concept drifting data streams is still a challenge. Most of the current proposals focus on taking into account changes in the global imbalance ratio only and ignore the local difficulty factors, such as the minority class decomposition into sub-concepts and the presence of unsafe types of examples (borderline or rare ones). As the above factors present in the stream may deteriorate the performance of popular online classifiers, we propose extensions of resampling online bagging, namely Neighbourhood Undersampling or Oversampling Online Bagging to take better account of the presence of unsafe minority examples. The performed computational experiments with synthetic complex imbalanced data streams have shown their advantage over earlier variants of online bagging resampling ensembles.	翻訳日:2024-11-02 21:50:00 公開日:2024-10-04
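As background for the abstract above: in online bagging, each base learner trains on an incoming example k times, with k drawn from a Poisson distribution, and resampling variants change the Poisson rate per class. The sketch below illustrates one plausible way a neighbourhood-based rate could look. The weighting in `bagging_lambda` and the `boost` parameter are assumptions made for illustration and are not the formula from the paper.

```python
import math
import random


def poisson(lam, rng):
    """Sample from Poisson(lam) with Knuth's algorithm (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def unsafe_ratio(label, neighbour_labels):
    """Fraction of recent nearest neighbours that belong to another class."""
    if not neighbour_labels:
        return 0.0
    others = sum(1 for n in neighbour_labels if n != label)
    return others / len(neighbour_labels)


def bagging_lambda(label, neighbour_labels, minority_label,
                   imbalance_ratio, boost=1.0):
    """Hypothetical oversampling rate: majority examples keep rate 1,
    minority examples are amplified, more so in unsafe neighbourhoods."""
    if label != minority_label:
        return 1.0
    return imbalance_ratio * (1.0 + boost * unsafe_ratio(label, neighbour_labels))


def update_counts(n_learners, lam, rng):
    """Number of times each base learner trains on the current example."""
    return [poisson(lam, rng) for _ in range(n_learners)]
```

With an imbalance ratio of 4, a minority example whose neighbours are half majority gets a rate of 6 in this sketch, so borderline minority examples are seen more often than safe ones. An undersampling variant would instead lower the rate for majority examples.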
# コード実行とテキスト推論の間の大規模言語モデルのステアリング Steering Large Language Models between Code Execution and Textual Reasoning ( http://arxiv.org/abs/2410.03524v1 ) ライセンス: Link先を確認	Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma, Chuchu Fan, Chi Wang,	(参考訳) 最近の研究は、多エージェントフレームワークや推論チェーンを最適化することで、LLM(Large Language Models)のテキスト推論能力の向上に重点を置いているが、いくつかのベンチマークタスクは、直接コーディングによって100%の成功率で解決できる。テキスト推論は、数学、論理学、最適化、探索における課題を伴うタスクの解決に固有の制限があり、それは単にモデルとデータサイズをスケールアップするだけでは解決できない。最近リリースされたOpenAI GPT Code InterpreterとAutoGenのようなマルチエージェントフレームワークは、LLMを使って複雑なタスクを解くためにコード生成と実行を統合するのに顕著な能力を示した。しかし、14のタスクと6種類のLLM(新しいO1-previewを含む)を持つシングルターンとマルチターンの両方でコード/テキスト生成をステアリングするための既存の7つの方法の実験に基づいて、現在、必要な時にコードを書き込むのに最適な方法が存在しない。タスクの複雑さとモデルサイズへの進化によって、モデルがコードを使う場合と、テキストによる推論を使用する場合の興味深いパターンを見つけました。また、LLMで書かれたコードの結果が、たとえそのタスクがコードを通して解決できたとしても、テキスト推論よりも必ずしも良いとは限らないことを発見した。上記の問題を緩和するため,LLMのコード/テキスト生成を向上し,顕著な改善を実現するための3つの方法を提案する。トークン長とランタイムのコストは、すべてのメソッドで完全に議論されている。 LLMコード/テキスト生成のステアリング問題は今後の研究にとって重要であり、さらなる改善のための余地があると考えています。 Project Page、Datasets、Codesは https://yongchao98.github.io/CodeSteer/ で入手できる。 While a lot of recent research focuses on enhancing the textual reasoning capabilities of Large Language Models (LLMs) by optimizing the multi-agent framework or reasoning chains, several benchmark tasks can be solved with 100% success through direct coding, which is more scalable and avoids the computational overhead associated with textual iterating and searching. Textual reasoning has inherent limitations in solving tasks with challenges in math, logics, optimization, and searching, which is unlikely to be solved by simply scaling up the model and data size. The recently released OpenAI GPT Code Interpreter and multi-agent frameworks such as AutoGen have demonstrated remarkable proficiency of integrating code generation and execution to solve complex tasks using LLMs. 
However, based on our experiments on 7 existing popular methods for steering code/text generation in both single- and multi-turn settings with 14 tasks and 6 types of LLMs (including the new O1-preview), currently there is no optimal method to correctly steer LLMs to write code when needed. We discover some interesting patterns on when models use code vs. textual reasoning with the evolution to task complexity and model sizes, which even result in an astonishingly inverse scaling law. We also discover that results from LLM written code are not always better than using textual reasoning, even if the task could be solved through code. To mitigate the above issues, we propose three methods to better steer LLM code/text generation and achieve a notable improvement. The costs of token lengths and runtime are thoroughly discussed for all the methods. We believe the problem of steering LLM code/text generation is critical for future research and has much space for further improvement. Project Page, Datasets, and Codes are available at https://yongchao98.github.io/CodeSteer/.	翻訳日:2024-11-02 21:50:00 公開日:2024-10-04
# 話す必要がない: 言語モデルの非同期混合 No Need to Talk: Asynchronous Mixture of Language Models ( http://arxiv.org/abs/2410.03529v1 ) ライセンス: Link先を確認	Anastasiia Filippova, Angelos Katharopoulos, David Grangier, Ronan Collobert,	(参考訳) SmallTalk LMは,言語モデルの混合をほぼ非同期に訓練する革新的な手法である。混合物の各モデルは、各モデルを訓練するノード間での高帯域通信を必要とせず、データ分散の異なる部分に特化している。推論時には、軽量ルータが短いプレフィックスに基づいて、与えられたシーケンスを単一の専門家に振り分ける。この推論スキームは、全混合モデルからパラメータのごく一部を自然に利用する。言語モデリング実験では,SmallTalk LMは,同一の訓練FLOPとほぼ同一の推論コストに対して,高密度モデルベースラインよりもパープレキシティが著しく低いことを実証した。最後に、下流の評価では、タスクの75%で密度の高いベースラインを上回ります。 We introduce SmallTalk LM, an innovative method for training a mixture of language models in an almost asynchronous manner. Each model of the mixture specializes in distinct parts of the data distribution, without the need of high-bandwidth communication between the nodes training each model. At inference, a lightweight router directs a given sequence to a single expert, according to a short prefix. This inference scheme naturally uses a fraction of the parameters from the overall mixture model. Our experiments on language modeling demonstrate that SmallTalk LM achieves significantly lower perplexity than dense model baselines for the same total training FLOPs and an almost identical inference cost. Finally, in our downstream evaluations we outperform the dense baseline on $75\%$ of the tasks.	翻訳日:2024-11-02 21:39:44 公開日:2024-10-04
# MARE: 教師なしRationale抽出におけるマルチアスペクトRationaleエクストラクタ MARE: Multi-Aspect Rationale Extractor on Unsupervised Rationale Extraction ( http://arxiv.org/abs/2410.03531v1 ) ライセンス: Link先を確認	Han Jiang, Junwen Duan, Zhe Qu, Jianxin Wang,	(参考訳) 教師なしの合理性抽出は、明示的な合理性アノテーションなしでモデル予測をサポートするためにテキストスニペットを抽出することを目的としている。研究者はこの課題の解決に多くの努力を払ってきた。従来の作業は各側面を独立してエンコードすることが多く、アスペクト間の有意義な内部相関を捉える能力を制限する可能性がある。突発的相関を緩和する研究は盛んに行われてきたが,本手法では,有益な内部相関を利用して多視点理性抽出を改善することに重点を置いている。本稿では,複数の側面を同時に説明・予測するマルチアスペクト・ライタリー・エクストラクタ(MARE)を提案する。具体的には,複数のテキストチャンクを同時に符号化するハード削除に基づくマルチアスペクトマルチヘッドアテンション(MAMHA)機構を提案する。さらに、テキストの前に複数の特別なトークンをプリプションし、それぞれが1つの特定の側面に対応する。最後に、トレーニングオーバーヘッドを低減するためにマルチタスクトレーニングがデプロイされる。 2つの教師なし理性抽出ベンチマークの実験結果は、MAREが最先端の性能を達成することを示す。アブレーション研究により, 本手法の有効性がさらに示された。私たちのコードはhttps://github.com/CSU-NLP-Group/MAREで公開されています。 Unsupervised rationale extraction aims to extract text snippets to support model predictions without explicit rationale annotation. Researchers have made many efforts to solve this task. Previous works often encode each aspect independently, which may limit their ability to capture meaningful internal correlations between aspects. While there has been significant work on mitigating spurious correlations, our approach focuses on leveraging the beneficial internal correlations to improve multi-aspect rationale extraction. In this paper, we propose a Multi-Aspect Rationale Extractor (MARE) to explain and predict multiple aspects simultaneously. Concretely, we propose a Multi-Aspect Multi-Head Attention (MAMHA) mechanism based on hard deletion to encode multiple text chunks simultaneously. Furthermore, multiple special tokens are prepended in front of the text with each corresponding to one certain aspect. Finally, multi-task training is deployed to reduce the training overhead. Experimental results on two unsupervised rationale extraction benchmarks show that MARE achieves state-of-the-art performance. Ablation studies further demonstrate the effectiveness of our method. 
Our code is available at https://github.com/CSU-NLP-Group/MARE.	翻訳日:2024-11-02 21:39:44 公開日:2024-10-04
# NRGBoost:エネルギーベースで生長する高木 NRGBoost: Energy-Based Generative Boosted Trees ( http://arxiv.org/abs/2410.03535v1 ) ライセンス: Link先を確認	João Bravo,	(参考訳) 非構造データ領域におけるディープラーニングの優位性の高まりにもかかわらず、ランダムフォレスト(RF)やグラディエントブースト決定木(GBDT)のような木に基づく手法は、表層データにおける差別的タスクを扱うための作業場である。我々は、データ密度(正規化定数まで)を明示的にモデル化することに焦点を当て、これらの人気アルゴリズムの生成拡張を検討し、サンプリング以外のアプリケーションを可能にする。本研究の主な貢献として,XGBoost などの人気パッケージに実装された第2次ブースティングに類似したエネルギーベース生成促進アルゴリズムを提案する。提案アルゴリズムは,任意の入力変数に対して推論タスクを処理可能な生成モデルを生成する一方で,実世界のグラフデータセットにおいて,GBDTと類似の識別性能を実現し,代替生成手法よりも優れていることを示す。同時に、サンプリングのためのニューラルネットワークベースのモデルとも競合することを示した。 Despite the rise to dominance of deep learning in unstructured data domains, tree-based methods such as Random Forests (RF) and Gradient Boosted Decision Trees (GBDT) are still the workhorses for handling discriminative tasks on tabular data. We explore generative extensions of these popular algorithms with a focus on explicitly modeling the data density (up to a normalization constant), thus enabling other applications besides sampling. As our main contribution we propose an energy-based generative boosting algorithm that is analogous to the second order boosting implemented in popular packages like XGBoost. We show that, despite producing a generative model capable of handling inference tasks over any input variable, our proposed algorithm can achieve similar discriminative performance to GBDT on a number of real world tabular datasets, outperforming alternative generative approaches. At the same time, we show that it is also competitive with neural network based models for sampling.	翻訳日:2024-11-02 21:39:44 公開日:2024-10-04
# Ward: LLM透かしによる証明可能なRAGデータセット推論 Ward: Provable RAG Dataset Inference via LLM Watermarks ( http://arxiv.org/abs/2410.03537v1 ) ライセンス: Link先を確認	Nikola Jovanović, Robin Staab, Maximilian Baader, Martin Vechev,	(参考訳) Retrieval-Augmented Generation (RAG)は、ジェネレーション中に外部データを組み込むことでLLMを改善する。これにより、RAGシステムにおけるコンテンツの不正使用に対するデータ所有者の懸念が高まる。その重要性にもかかわらず、そのような不正使用を検出するという課題は十分に研究されておらず、近隣の分野からの既存のデータセットや方法論は研究に不適である。この作業では、このギャップを埋めるためにいくつかのステップを踏んでいます。まず、この問題を(ブラックボックス)RAGデータセット推論(RAG-DI)として定式化する。さらに,現実的な条件下でのRAG-DI手法のベンチマークに特化して設計された新しいデータセットを導入し,一連のベースラインアプローチを提案する。この基盤を基盤として,データ所有者がRAGシステムにおけるデータセットの使用に関する厳密な統計的保証を得られるようなLLM透かしに基づくRAG-DI手法であるWardを導入する。実験評価では、Wardは、多くの困難な設定において、全てのベースラインを一貫して上回り、高い精度、優れたクエリ効率、ロバスト性を実現している。我々の研究は今後のRAG-DI研究の基礎を提供し、この問題に対する有望なアプローチとしてLLM透かしを強調します。 Retrieval-Augmented Generation (RAG) improves LLMs by enabling them to incorporate external data during generation. This raises concerns for data owners regarding unauthorized use of their content in RAG systems. Despite its importance, the challenge of detecting such unauthorized usage remains underexplored, with existing datasets and methodologies from adjacent fields being ill-suited for its study. In this work, we take several steps to bridge this gap. First, we formalize this problem as (black-box) RAG Dataset Inference (RAG-DI). To facilitate research on this challenge, we further introduce a novel dataset specifically designed for benchmarking RAG-DI methods under realistic conditions, and propose a set of baseline approaches. Building on this foundation, we introduce Ward, a RAG-DI method based on LLM watermarks that enables data owners to obtain rigorous statistical guarantees regarding the usage of their dataset in a RAG system. In our experimental evaluation, we show that Ward consistently outperforms all baselines across many challenging settings, achieving higher accuracy, superior query efficiency and robustness. 
Our work provides a foundation for future studies of RAG-DI and highlights LLM watermarks as a promising approach to this problem.	翻訳日:2024-11-02 21:39:44 公開日:2024-10-04
# アノテーションによる性差別とミソジニー分類の再検討 Re-examining Sexism and Misogyny Classification with Annotator Attitudes ( http://arxiv.org/abs/2410.03543v1 ) ライセンス: Link先を確認	Aiqi Jiang, Nikolas Vitsakis, Tanvi Dinkar, Gavin Abercrombie, Ioannis Konstas,	(参考訳) Gender-Based Violence(GBV)はオンライン上の問題だが、既存のデータセットは複数の可能なアノテータの視点を捉えたり、影響を受けるグループの表現を確実にすることができない。我々はGBVのモデレーションパイプラインにおいて,(1)手動データラベリング,(2)自動分類の2つの重要な段階を再考する。 1)アノテータのアイデンティティと態度の関係と,それらが2つのGBVラベリングタスクに与える応答について検討する。この目的のために,社会心理学の3つの検証された調査データを用いて,クラウドソーシングアノテータから人口統計情報と人口統計情報を収集した。右翼権威主義のスコアは、テキストをセクシストとしてラベル付けする確率が高いのに対して、社会的支配指向とネオセクシズムの態度では、高いスコアは、それを行う負の傾向に関連している。 2)大規模言語モデルと5つのプロンプト戦略を用いて分類実験を行う。以下に示す。 (i)アノテータの態度は、分類者のラベルの予測能力に影響を及ぼす。二適時情報を含むものは、よく構造化された簡潔な注釈書を用いて、性能を高めることができる。三モデルは、新しいラベルセットの複雑さと不均衡なクラスの増加を反映するのに苦労する。 Gender-Based Violence (GBV) is an increasing problem online, but existing datasets fail to capture the plurality of possible annotator perspectives or ensure the representation of affected groups. We revisit two important stages in the moderation pipeline for GBV: (1) manual data labelling; and (2) automated classification. For (1), we examine two datasets to investigate the relationship between annotator identities and attitudes and the responses they give to two GBV labelling tasks. To this end, we collect demographic and attitudinal information from crowd-sourced annotators using three validated surveys from Social Psychology. We find that higher Right Wing Authoritarianism scores are associated with a higher propensity to label text as sexist, while for Social Dominance Orientation and Neosexist Attitudes, higher scores are associated with a negative tendency to do so. For (2), we conduct classification experiments using Large Language Models and five prompting strategies, including infusing prompts with annotator information. 
We find: (i) annotator attitudes affect the ability of classifiers to predict their labels; (ii) including attitudinal information can boost performance when we use well-structured brief annotator descriptions; and (iii) models struggle to reflect the increased complexity and imbalanced classes of the new label sets.	翻訳日:2024-11-02 21:39:44 公開日:2024-10-04
# 単純重複によるデータ品質向上:責任ある計算社会科学研究をナビゲートする Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research ( http://arxiv.org/abs/2410.03545v1 ) ライセンス: Link先を確認	Yida Mu, Mali Jin, Xingyi Song, Nikolaos Aletras,	(参考訳) 計算社会科学(CSS)のための自然言語処理(NLP)の研究は、ソーシャルメディアプラットフォームからのデータに大きく依存している。このデータは,オンラインコミュニティにおける社会言語現象の分析モデルの開発において重要な役割を担っている。本研究では,NLP for CSSで広く使われている20のデータセットの詳細な調査を行い,データ品質を包括的に調査する。分析の結果、ソーシャルメディアのデータセットは様々なレベルのデータ重複を示すことが明らかとなった。これにより、ラベルの不整合やデータの漏洩といった問題が発生し、モデルの信頼性が損なわれる。我々の研究結果は、データ重複が現在の最先端性能の主張に影響を与え、現実のシナリオにおけるモデルの有効性を過大評価する可能性があることを示唆している。最後に,ソーシャルメディアデータからデータセット開発を改善するための新しいプロトコルとベストプラクティスを提案する。 Research in natural language processing (NLP) for Computational Social Science (CSS) heavily relies on data from social media platforms. This data plays a crucial role in the development of models for analysing socio-linguistic phenomena within online communities. In this work, we conduct an in-depth examination of 20 datasets extensively used in NLP for CSS to comprehensively examine data quality. Our analysis reveals that social media datasets exhibit varying levels of data duplication. Consequently, this gives rise to challenges like label inconsistencies and data leakage, compromising the reliability of models. Our findings also suggest that data duplication has an impact on the current claims of state-of-the-art performance, potentially leading to an overestimation of model effectiveness in real-world scenarios. Finally, we propose new protocols and best practices for improving dataset development from social media data and its usage.	翻訳日:2024-11-02 21:39:44 公開日:2024-10-04
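The kinds of checks the abstract above refers to (duplicates, label inconsistencies among duplicates, and train/test leakage) can be sketched as follows. This is a minimal illustration under assumed normalisation rules (lower-casing, masking URLs and user mentions, collapsing whitespace); the functions `normalise`, `audit` and `deduplicate` are hypothetical and are not the protocol proposed in the paper.

```python
import re
from collections import defaultdict


def normalise(text):
    """Map near-identical posts to the same key (assumed rules)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)   # mask links
    text = re.sub(r"@\w+", "<user>", text)          # mask user mentions
    return re.sub(r"\s+", " ", text).strip()        # collapse whitespace


def audit(train, test):
    """train and test are lists of (text, label) pairs.

    Returns the normalised texts that carry conflicting labels, and the
    test texts whose normalised form also occurs in the training split.
    """
    labels_by_key = defaultdict(set)
    for text, label in train + test:
        labels_by_key[normalise(text)].add(label)
    inconsistent = {key for key, labels in labels_by_key.items() if len(labels) > 1}

    train_keys = {normalise(text) for text, _ in train}
    leaked = [text for text, _ in test if normalise(text) in train_keys]
    return inconsistent, leaked


def deduplicate(rows):
    """Keep the first occurrence of each normalised text."""
    seen, kept = set(), []
    for text, label in rows:
        key = normalise(text)
        if key not in seen:
            seen.add(key)
            kept.append((text, label))
    return kept
```

Running `audit` before splitting or reporting scores gives a quick view of how much of a test set is effectively memorised from training data. Keeping the first occurrence in `deduplicate` is an arbitrary choice of this sketch; conflicting labels may deserve manual review instead.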
# 単一光子LiDARを用いた隠れ物体のイメージングによる自律走行の促進 Enhancing Autonomous Navigation by Imaging Hidden Objects using Single-Photon LiDAR ( http://arxiv.org/abs/2410.03555v1 ) ライセンス: Link先を確認	Aaron Young, Nevindu M. Batagoda, Harry Zhang, Akshat Dave, Adithya Pediredla, Dan Negrut, Ramesh Raskar,	(参考訳) 可視性に制限のある環境でのロバストな自律ナビゲーションは、ロボティクスにおける重要な課題である。単一光子LiDARを用いたNon-Line-of-Sight(NLOS)センシングによる視認性の向上と自律ナビゲーションの向上を目的とした新しいアプローチを提案する。提案手法により,移動ロボットは,マルチバウンス光情報を利用して「角を見回す」ことができ,インフラを付加せずに知覚範囲を効果的に拡張することができる。本研究では,(1)SPADベースのLiDARを用いてマルチバウンスヒストグラムをキャプチャするセンシング,(2)畳み込みニューラルネットワークを用いて隠れた領域の占有マップを推定する知覚,(3)ロボットが推定した占有状況に基づいて安全な経路を辿ることができる制御の3つのモジュールパイプラインを提案する。我々は,L字型廊下を隠れ障害物で航行する移動ロボットのシミュレーションと実世界実験により,我々のアプローチを評価する。我々の研究は、自律ナビゲーションのためのNLOSイメージングの初めての実験的なデモであり、複雑な環境で動くより安全で効率的なロボットシステムを実現するための道を開いた。我々はまた、NLOSシナリオをシミュレートし、この領域における将来の研究を促進するための、新しい動的統合トランジェントレンダリングフレームワークにも貢献する。 Robust autonomous navigation in environments with limited visibility remains a critical challenge in robotics. We present a novel approach that leverages Non-Line-of-Sight (NLOS) sensing using single-photon LiDAR to improve visibility and enhance autonomous navigation. Our method enables mobile robots to "see around corners" by utilizing multi-bounce light information, effectively expanding their perceptual range without additional infrastructure. We propose a three-module pipeline: (1) Sensing, which captures multi-bounce histograms using SPAD-based LiDAR; (2) Perception, which estimates occupancy maps of hidden regions from these histograms using a convolutional neural network; and (3) Control, which allows a robot to follow safe paths based on the estimated occupancy. We evaluate our approach through simulations and real-world experiments on a mobile robot navigating an L-shaped corridor with hidden obstacles. Our work represents the first experimental demonstration of NLOS imaging for autonomous navigation, paving the way for safer and more efficient robotic systems operating in complex environments. 
We also contribute a novel dynamics-integrated transient rendering framework for simulating NLOS scenarios, facilitating future research in this domain.	翻訳日:2024-11-02 21:29:56 公開日:2024-10-04
# færdXel:デンマークの交通法の専門家システム færdXel: An Expert System for Danish Traffic Law ( http://arxiv.org/abs/2410.03560v1 ) ライセンス: Link先を確認	Luís Cruz-Filipe, Jonas Vistrup,	(参考訳) デンマークの交通法分野における象徴的推論ツール færdXel について述べる。 færdXelは、論理プログラミングの技法と新しいインターフェースを組み合わせることで、ユーザーは推論過程をナビゲートし、システムの信頼性を確保することができる。予備的な実証的な評価は、この研究が非常に有望であると見なされ、デンマークの法務部門で専門家を支援する現実世界のAIツールの基礎になる可能性があることを示している。 We present færdXel, a tool for symbolic reasoning in the domain of Danish traffic law. færdXel combines techniques from logic programming with a novel interface that allows users to navigate through its reasoning process, thereby ensuring the system's trustworthiness. A preliminary empirical evaluation indicates that this work is seen as very promising, and has the potential to become a foundation for real-world AI tools supporting professionals in the Danish legal sector.	翻訳日:2024-11-02 21:29:56 公開日:2024-10-04
# 吸収と放出を訂正する符号のクラス Class of codes correcting absorptions and emissions ( http://arxiv.org/abs/2410.03562v1 ) ライセンス: Link先を確認	Arda Aydin, Alexander Barg,	(参考訳) 我々は、任意の固定次数までの全ての放出、吸収、位相緩和、昇降エラーから保護する一般的な量子符号族を構築する。このような符号は、文献では吸収放出符号(AE codes)として知られている。一般的なAE符号に対する簡易な誤り訂正条件を導出し、$\le t$ 個のエラーを訂正する任意の置換不変符号が、次数 $t$ までの遷移を訂正するAE符号にマッピングできることを示す。置換不変符号のパラメータを慎重に調整し、全角運動量が少ないシステムでホストされる効率的なAE符号のいくつかの例を構築した。また、スピン符号をAE符号にマッピングすることで、そのような符号の特定のサブクラスに対する論理演算子を特徴付けることができることを示す。 We construct a general family of quantum codes that protect against all emission, absorption, dephasing, and raising/lowering errors up to an arbitrary fixed order. Such codes are known in the literature as absorption-emission (AE) codes. We derive simplified error correction conditions for a general AE code and show that any permutation-invariant code that corrects $\le t$ errors can be mapped to an AE code that corrects up to order-$t$ transitions. Carefully tuning the parameters of permutationally invariant codes, we construct several examples of efficient AE codes, hosted in systems with low total angular momentum. Our results also imply that spin codes can be mapped to AE codes, enabling us to characterize logical operators for certain subclasses of such codes.	翻訳日:2024-11-02 21:29:56 公開日:2024-10-04
# 強化学習における一般化のためのより到達可能な課題の育成 Training on more Reachable Tasks for Generalisation in Reinforcement Learning ( http://arxiv.org/abs/2410.03565v1 ) ライセンス: Link先を確認	Max Weltevrede, Caroline Horsch, Matthijs T. J. Spaan, Wendelin Böhmer,	(参考訳) マルチタスク強化学習では、エージェントは一定のタスクセットでトレーニングを行い、新しいタスクに一般化する必要がある。近年の研究では、探索の増加がこの一般化を改善することが示されているが、その理由は不明である。本稿では、マルチタスク強化学習における到達可能性の概念を導入し、初期探索フェーズがエージェントが訓練する到達可能なタスクの数を増やすことを示す。これは、探索の増大ではなく、到達不可能なタスクに対しても、一般化の改善の責任がある。そこで本研究では,各エピソードの開始時に,このような探索フェーズを実装する新しい手法であるExplore-Goを提案する。 Explore-Goは、経験の収集方法を変更するだけであり、既存のほとんどのオン・ポリティクスまたはオフ・ポリティクスの強化学習アルゴリズムで使用することができる。いくつかの一般的なアルゴリズムと組み合わせることで,本手法の有効性を実証し,いくつかの環境における一般化性能の向上を示す。 In multi-task reinforcement learning, agents train on a fixed set of tasks and have to generalise to new ones. Recent work has shown that increased exploration improves this generalisation, but it remains unclear why exactly that is. In this paper, we introduce the concept of reachability in multi-task reinforcement learning and show that an initial exploration phase increases the number of reachable tasks the agent is trained on. This, and not the increased exploration, is responsible for the improved generalisation, even to unreachable tasks. Inspired by this, we propose a novel method Explore-Go that implements such an exploration phase at the beginning of each episode. Explore-Go only modifies the way experience is collected and can be used with most existing on-policy or off-policy reinforcement learning algorithms. We demonstrate the effectiveness of our method when combined with some popular algorithms and show an increase in generalisation performance across several environments.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# 言語モデル(LLM)の言語的認識と言語非依存化に向けて Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) ( http://arxiv.org/abs/2410.03568v1 ) ライセンス: Link先を確認	Abrar Rahman, Garry Bowlin, Binit Mohanty, Sean McGunigal,	(参考訳) 本稿では,最先端の大規模言語モデル (LLM) が採用するトークン化技術と,それらが様々な言語,特に低リソース言語におけるサービスのコストと可用性に与える影響を包括的に研究する。この分析では、GPT-4(cl100k_base埋め込み)、GPT-3(p50k_base埋め込み)、DaVinci(r50k_base埋め込み)を含む複数のLCMと、広く使用されているBERTベーストークンーザが検討されている。本研究は,これらのモデル間で観測されるトークン化の多様性を評価し,サブワードトークン化における言語表現の課題について検討する。この研究は、特に伝統的にリソース不足の言語に対して、言語的に認識された開発プラクティスを育むことの重要性を強調している。さらに,電子健康記録(EHR)システムにおけるトークン化選択の現実的意味を強調するケーススタディを紹介する。本研究の目的は、AIアプリケーションで伝統的に表現されていない言語において、この領域以上のAIサービスの開発において、一般化可能な国際化(I18N)の実践を促進することである。 This paper presents a comprehensive study on the tokenization techniques employed by state-of-the-art large language models (LLMs) and their implications on the cost and availability of services across different languages, especially low resource languages. The analysis considers multiple LLMs, including GPT-4 (using cl100k_base embeddings), GPT-3 (with p50k_base embeddings), and DaVinci (employing r50k_base embeddings), as well as the widely used BERT base tokenizer. The study evaluates the tokenization variability observed across these models and investigates the challenges of linguistic representation in subword tokenization. The research underscores the importance of fostering linguistically-aware development practices, especially for languages that are traditionally under-resourced. Moreover, this paper introduces case studies that highlight the real-world implications of tokenization choices, particularly in the context of electronic health record (EHR) systems. This research aims to promote generalizable Internationalization (I18N) practices in the development of AI services in this domain and beyond, with a strong emphasis on inclusivity, particularly for languages traditionally underrepresented in AI applications.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# Transformerに大規模なモジュラー算術を教える Teaching Transformers Modular Arithmetic at Scale ( http://arxiv.org/abs/2410.03569v1 ) ライセンス: Link先を確認	Eshika Saxena, Alberto Alfarano, Emily Wenger, Kristin Lauter,	(参考訳) モジュラー加算は一見単純な演算である:$\mathbb{Z}_q$ の$N$個の要素が与えられたとき、その和を $q$ を法として計算する。しかし、この問題に対するスケーラブルな機械学習ソリューションは、いまだ得られていない: 先行研究は、$q \le 1000$ を法として $N \le 6$ 個の要素を加算するMLモデルを訓練している。暗号解析へのMLモデルの有望な応用は、多くの場合、大きな$N$と$q$によるモジュラー演算を伴うため、この問題の再検討の動機となる。この作業では、より多様なトレーニングデータ、角度埋め込み、カスタム損失関数という、モジュラー加算モデルのトレーニングパイプラインに3つの変更を提案する。これらの変更により、暗号アプリケーションにとって興味深いケースである $N = 256, q = 3329$ で我々のアプローチが成功することを実証した。これは先行研究に比べて$N$と$q$が大幅に増加している。これらの手法は他のモジュラー算術問題にも一般化し、将来の研究を動機付けている。 Modular addition is, on its face, a simple operation: given $N$ elements in $\mathbb{Z}_q$, compute their sum modulo $q$. Yet, scalable machine learning solutions to this problem remain elusive: prior work trains ML models that sum $N \le 6$ elements mod $q \le 1000$. Promising applications of ML models for cryptanalysis, which often involve modular arithmetic with large $N$ and $q$, motivate reconsideration of this problem. This work proposes three changes to the modular addition model training pipeline: more diverse training data, an angular embedding, and a custom loss function. With these changes, we demonstrate success with our approach for $N = 256, q = 3329$, a case which is interesting for cryptographic applications, and a significant increase in $N$ and $q$ over prior work. These techniques also generalize to other modular arithmetic problems, motivating future work.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
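One natural reading of the "angular embedding" in the abstract above is to place each residue modulo $q$ on the unit circle, where modular addition becomes addition of angles. The sketch below illustrates that idea only; the function names are hypothetical, and the paper's actual embedding and custom loss function are not reproduced here.

```python
import math


def embed(a, q):
    """Map the residue a mod q to a point on the unit circle."""
    theta = 2 * math.pi * (a % q) / q
    return (math.cos(theta), math.sin(theta))


def decode(point, q):
    """Recover the nearest residue from a point on (or near) the unit circle."""
    theta = math.atan2(point[1], point[0]) % (2 * math.pi)
    return round(theta * q / (2 * math.pi)) % q


def add_on_circle(p1, p2):
    """Complex multiplication of unit vectors, i.e. addition of their angles."""
    return (p1[0] * p2[0] - p1[1] * p2[1],
            p1[0] * p2[1] + p1[1] * p2[0])


def modular_sum(values, q):
    """Sum values modulo q by rotating around the circle."""
    acc = (1.0, 0.0)   # angle 0 represents the residue 0
    for v in values:
        acc = add_on_circle(acc, embed(v, q))
    return decode(acc, q)
```

The appeal of such an embedding is that residues $0$ and $q-1$, which are far apart as integers, become neighbours on the circle, so the wrap-around of modular arithmetic is built into the representation. For instance, `modular_sum([3000, 3000, 17], 3329)` agrees with `(3000 + 3000 + 17) % 3329`.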
# ソフトウェアエンジニアリング領域における生成AI: 職業的アイデンティティの緊張とアイデンティティ保護のパターン Generative AI in the Software Engineering Domain: Tensions of Occupational Identity and Patterns of Identity Protection ( http://arxiv.org/abs/2410.03571v1 ) ライセンス: Link先を確認	Anuschka Schmitt, Krzysztof Z. Gajos, Osnat Mokryn,	(参考訳) 組織環境におけるジェネレーティブ・人工知能(GAI)の導入は、労働者の役割に疑問を投げかけ、それに関連して、長期的なスキル開発やドメインの専門知識に影響を及ぼす。ソフトウェアエンジニアリング領域における質的研究では、職業的アイデンティティの理論レンズと自己決定理論に基づいて、ソフトウェアエンジニアがどのようにして、なぜ自分たちの仕事にGAIを理解するのかを理解しています。技術者のセンスメイキングは、下級生や上級生が、能力、自律性、関連性に対するニーズが、GAIによって異なる影響を受けていると感じているため、ドメインの専門知識に依存していることが分かりました。我々は、職業的アイデンティティを保護するセンスメイキングに従事するエンジニアとして、暗黙のドメイン知識を保存する上で、個人の役割の重要性を強調した。本稿では、労働者の意識形成過程において組織がどのように積極的な役割を担っているかを説明し、組織やシステムデザイナが労働者の職業的アイデンティティに技術的変化が及ぼす影響をいかに促進するかに関する設計ガイドラインを提案する。 The adoption of generative Artificial Intelligence (GAI) in organizational settings calls into question workers' roles, and relatedly, the implications for their long-term skill development and domain expertise. In our qualitative study in the software engineering domain, we build on the theoretical lenses of occupational identity and self-determination theory to understand how and why software engineers make sense of GAI for their work. We find that engineers' sense-making is contingent on domain expertise, as juniors and seniors felt their needs for competence, autonomy, and relatedness to be differently impacted by GAI. We shed light on the importance of the individual's role in preserving tacit domain knowledge as engineers engaged in sense-making that protected their occupational identity. We illustrate how organizations play an active role in shaping workers' sense-making process and propose design guidelines on how organizations and system designers can facilitate the impact of technological change on workers' occupational identity.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# 木テンソルネットワークによる多変量関数の圧縮 Compressing multivariate functions with tree tensor networks ( http://arxiv.org/abs/2410.03572v1 ) ライセンス: Link先を確認	Joseph Tindall, Miles Stoudenmire, Ryan Levy,	(参考訳) テンソルネットワークは多次元データのための圧縮フォーマットである。 1次元テンソルネットワーク - テンソルトレイン (TT) や行列積状態 (MPS) と呼ばれるは、入力を離散二進数に「量子化」することで連続関数の数値アンザッツとして益々使われている。ここでは、この目的のために、より一般的なツリーテンソルネットワークのパワーを実証する。一般的な木テンソルネットワークとして多くの基本関数を直接構成し、テンソル交叉補間アルゴリズムの一般化によりより複雑な関数に対する補間構造を提供する。多次元の関数に対して、一般的に使用されるテンソルトレインよりもはるかに効率的なアンザッツが、より構造化されたツリーテンソルネットワークがどのように提供されるかを示す。本手法の多次元非線形フレドホルム方程式への応用を実演し、解の階数に厳密な境界を与え、ある問題に対してツリーテンソルネットワークのサイズで指数関数的スケーリング精度を保証する。 Tensor networks are a compressed format for multi-dimensional data. One-dimensional tensor networks -- often referred to as tensor trains (TT) or matrix product states (MPS) -- are increasingly being used as a numerical ansatz for continuum functions by "quantizing" the inputs into discrete binary digits. Here we demonstrate the power of more general tree tensor networks for this purpose. We provide direct constructions of a number of elementary functions as generic tree tensor networks and interpolative constructions for more complicated functions via a generalization of the tensor cross interpolation algorithm. For a range of multi-dimensional functions we show how more structured tree tensor networks offer a significantly more efficient ansatz than the commonly used tensor train. We demonstrate an application of our methods to solving multi-dimensional, non-linear Fredholm equations, providing a rigorous bound on the rank of the solution which, in turn, guarantees exponentially scaling accuracy with the size of the tree tensor network for certain problems.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# HyResPINNs:物理インフォームドモデリングのためのニューラルネットワークとRBFコンポーネントの最適組み合わせ学習のための適応型ハイブリッド残差ネットワーク HyResPINNs: Adaptive Hybrid Residual Networks for Learning Optimal Combinations of Neural and RBF Components for Physics-Informed Modeling ( http://arxiv.org/abs/2410.03573v1 ) ライセンス: Link先を確認	Madison Cooley, Robert M. Kirby, Shandian Zhe, Varun Shankar,	(参考訳) 物理インフォームドニューラルネットワーク(英: Physics-informed Neural Network, PINN)は、PDE(偏微分方程式)の数値解法において、関連するPDE項で正規化された損失関数を用いてニューラルネットワークを訓練し、物理的制約を強制する手法として人気が高まっている。我々はHyResPINNと呼ばれる新しい種類のPINNを提案し、標準ニューラルネットワークと放射基底関数(RBF)ネットワークの出力を組み合わせた適応型ハイブリッド残差ブロックで従来のPINNを拡張した。提案手法の重要な特徴は,ニューラルネットワークとRBFネットワーク出力の寄与度を動的に学習する残差ブロックに適応的な組み合わせパラメータを組み込むことである。さらに、残余ブロック間の適応的な接続は、ネットワーク全体の柔軟な情報フローを可能にする。 HyResPINNは従来のPINNよりも、ポイントロケーションやニューラルネットワークアーキテクチャのトレーニングに堅牢であることを示す。さらに、HyResPINNは特定の問題に対する競合メソッドよりも桁違いに精度が高く、トレーニングコストはわずかに増加している。我々は,Allen-Cahn方程式やDarcy-Flow方程式など,PDEの挑戦に対するアプローチの強みを実証する。この結果から,HyResPINNは従来の数値解法と現代の機械学習に基づく解法とのギャップを効果的に埋めることが示唆された。 Physics-informed neural networks (PINNs) are an increasingly popular class of techniques for the numerical solution of partial differential equations (PDEs), where neural networks are trained using loss functions regularized by relevant PDE terms to enforce physical constraints. We present a new class of PINNs called HyResPINNs, which augment traditional PINNs with adaptive hybrid residual blocks that combine the outputs of a standard neural network and a radial basis function (RBF) network. A key feature of our method is the inclusion of adaptive combination parameters within each residual block, which dynamically learn to weigh the contributions of the neural network and RBF network outputs. Additionally, adaptive connections between residual blocks allow for flexible information flow throughout the network. We show that HyResPINNs are more robust to training point locations and neural network architectures than traditional PINNs. 
Moreover, HyResPINNs offer orders of magnitude greater accuracy than competing methods on certain problems, with only modest increases in training costs. We demonstrate the strengths of our approach on challenging PDEs, including the Allen-Cahn equation and the Darcy-Flow equation. Our results suggest that HyResPINNs effectively bridge the gap between traditional numerical methods and modern machine learning-based solvers.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# 低リソースのインド系言語に対するテーブル質問応答 Table Question Answering for Low-resourced Indic Languages ( http://arxiv.org/abs/2410.03576v1 ) ライセンス: Link先を確認	Vaishali Pal, Evangelos Kanoulas, Andrew Yates, Maarten de Rijke,	(参考訳) TableQAは構造化された情報のテーブル上で質問に答え、個々のセルやテーブルを出力として返すタスクである。 TableQAの研究は、主に高リソース言語に焦点を当てており、注釈付きデータやニューラルモデルが不足しているため、中低リソース言語はほとんど進歩していない。予算が限られている低リソース言語に対して,完全に自動化された大規模TableQAデータ生成プロセスを導入することで,このギャップに対処する。TableQAデータセットやモデルを持たない2つのインド系言語であるベンガル語とヒンディー語にデータ生成手法を適用する。大規模データセットに基づいてトレーニングされたTableQAモデルは、最先端のLLMよりも優れている。さらに、数学的推論能力やゼロショット言語間転移など、異なる側面から訓練されたモデルについて研究する。本研究は、スケーラブルなデータ生成と評価手順に焦点を当てた、低リソースのTableQAに関する最初のものである。提案手法は,Web上に存在する任意の低リソース言語に適用可能である。データセット、モデル、コード(https://github.com/kolk/Low-Resource-TableQA-Indic-languages)をリリースする。 TableQA is the task of answering questions over tables of structured information, returning individual cells or tables as output. TableQA research has focused primarily on high-resource languages, leaving medium- and low-resource languages with little progress due to scarcity of annotated data and neural models. We address this gap by introducing a fully automatic large-scale tableQA data generation process for low-resource languages with limited budget. We apply our data generation method to two Indic languages, Bengali and Hindi, which have no tableQA datasets or models. TableQA models trained on our large-scale datasets outperform state-of-the-art LLMs. We further study the trained models on different aspects, including mathematical reasoning capabilities and zero-shot cross-lingual transfer. Our work is the first on low-resource tableQA focusing on scalable data generation and evaluation procedures. Our proposed data generation method can be applied to any low-resource language with a web presence. We release datasets, models, and code (https://github.com/kolk/Low-Resource-TableQA-Indic-languages).	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# 答える前に二度見る: メモリ空間ビジュアルリトレーシングによるマルチモーダル大言語モデルにおける幻覚の緩和 Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models ( http://arxiv.org/abs/2410.03577v1 ) ライセンス: Link先を確認	Xin Zou, Yizhou Wang, Yibo Yan, Sirui Huang, Kening Zheng, Junkai Chen, Chang Tang, Xuming Hu,	(参考訳) その印象的な能力にもかかわらず、マルチモーダル大言語モデル(MLLM)は幻覚の影響を受けやすく、特に視覚入力に存在しない内容を断定的に捏造しやすい。この課題に対処するために、私たちは共通の認知プロセスに従う。すなわち、重要な視覚的詳細についての最初の記憶が薄れたとき、事実に基づく正確な答えを求めてもう一度それらを見るのは直感的である。そこで我々は,外部知識検索や追加の微調整を必要としない新たな幻覚緩和パラダイムであるメモリ空間ビジュアルリトレーシング(MemVR)を導入する。特に、モデルが質問に関連する視覚記憶について不確かである場合や忘却している場合に、視覚プロンプトを補助的証拠として扱い、キーバリューメモリとしてフィードフォワードネットワーク(FFN)を介してMLLMに再注入する。総合的な実験的評価により、MemVRは様々なMLLMの幻覚問題を著しく軽減し、追加の時間オーバーヘッドを発生させることなく一般的なベンチマークで優れた性能を示すことが確認され、広範な適用可能性が示唆される。 Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) are susceptible to hallucinations, especially assertively fabricating content not present in the visual inputs. To address this challenge, we follow a common cognitive process: when one's initial memory of critical on-sight details fades, it is intuitive to look at them a second time to seek a factual and accurate answer. Therefore, we introduce Memory-space Visual Retracing (MemVR), a novel hallucination mitigation paradigm that requires no external knowledge retrieval or additional fine-tuning. In particular, we treat visual prompts as supplementary evidence to be reinjected into MLLMs via Feed Forward Network (FFN) as key-value memory, when the model is uncertain or even amnesic about question-relevant visual memories. Comprehensive experimental evaluations demonstrate that MemVR significantly mitigates hallucination issues across various MLLMs and excels in general benchmarks without incurring added time overhead, thus emphasizing its potential for widespread applicability.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# 自動運転車開発におけるビデオデータ検索のためのマルチモデルアプローチ A Multi-model Approach for Video Data Retrieval in Autonomous Vehicle Development ( http://arxiv.org/abs/2410.03580v1 ) ライセンス: Link先を確認	Jesper Knapp, Klas Moberg, Yuchuan Jin, Simin Sun, Miroslaw Staron,	(参考訳) 自律運転ソフトウェアは毎秒大量のデータを生成し、それはソフトウェア開発組織が将来の分析とテストのためにログ形式で保存する。しかし、このデータの大きさを考えると、車両ログのコレクション内の特定のシナリオを特定することは困難である。これらのシナリオを見つけるために正しいSQLクエリを書くには、エンジニアがSQLと特定のデータベースに強いバックグラウンドを持つ必要があり、さらに検索プロセスが複雑になる。本稿では,SQLの代わりに自然言語記述を用いて,ログコレクションの特定のシナリオを検索するパイプラインを提示し,評価する。生成した記述は、Zenseactで車両ログを1から5のスケールで作業するエンジニアによって評価された。私たちのアプローチは平均3.3のスコアを獲得し、ソフトウェア開発ワークフローを改善するためにマルチモデルアーキテクチャを使うことの可能性を示しました。また、クエリプロセスを視覚化し、結果を視覚化するインターフェースも提示する。 Autonomous driving software generates enormous amounts of data every second, which software development organizations save for future analysis and testing in the form of logs. However, given the vast size of this data, locating specific scenarios within a collection of vehicle logs can be challenging. Writing the correct SQL queries to find these scenarios requires engineers to have a strong background in SQL and the specific databases in question, further complicating the search process. This paper presents and evaluates a pipeline that allows searching for specific scenarios in log collections using natural language descriptions instead of SQL. The generated descriptions were evaluated by engineers working with vehicle logs at the Zenseact on a scale from 1 to 5. Our approach achieved a mean score of 3.3, demonstrating the potential of using a multi-model architecture to improve the software development workflow. We also present an interface that can visualize the query process and visualize the results.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# 散逸型Landau-Zenerモデルにおける量子軌道の統計的解析 Statistical analysis of quantum trajectories in dissipative Landau-Zener model ( http://arxiv.org/abs/2410.03582v1 ) ライセンス: Link先を確認	Laleh Memarzadeh, Rosario Fazio,	(参考訳) マルコフ過程に従う、ランダウ・ツェナー・ハミルトニアンを持つ2レベル系における量子ジャンプの統計について述べる。断熱・非断熱発展や量子アニーリングのシミュレーションに成功しているランダウ・ツェナーモデルについて, 2種類の散逸を考察する。第1のタイプでは、ジャンプ作用素は状態を $t\to -\infty$ におけるハミルトニアンの初期基底状態と励起状態に射影する。第2のタイプでは、ジャンプ作用素はハミルトニアンの瞬時固有状態に射影する。量子軌道法により、両方のモデルに対する断熱的および非断熱的状態におけるジャンプ数の確率を示す。さらに、時間発展の各時間間隔におけるジャンプの統計を示す。また, 浴槽温度, 環境との結合強度, スピンカップリング方向が量子ジャンプの統計に与える影響を示す。 We present statistics of quantum jumps in a two-level system with a Landau-Zener Hamiltonian that undergoes a Markovian process. For the Landau-Zener model, which is successful in simulating adiabatic/non-adiabatic evolution and quantum annealing, we consider two types of dissipation. In the first one, the jump operators project states to the initial ground state and excited state of the Hamiltonian at $t\to -\infty$. In the second type, the jump operators project to the instantaneous eigenstates of the Hamiltonian. By the quantum trajectories approach, we present the probability of the number of jumps in adiabatic and non-adiabatic regimes for both models. Furthermore, we demonstrate the statistics of jumps in time intervals of the evolutions. Also, we show the role of bath temperature, coupling strength to the environment, and spin-coupling directions on the statistics of quantum jumps.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# MeDeT:少人数のメタラーニングによる医療機器デジタルツインズ作成 MeDeT: Medical Device Digital Twins Creation with Few-shot Meta-learning ( http://arxiv.org/abs/2410.03585v1 ) ライセンス: Link先を確認	Hassan Sartaj, Shaukat Ali, Julie Marie Gjøby,	(参考訳) システムと統合レベルにおける医療用IoT(Internet of Things)アプリケーションをテストするには、さまざまなタイプの医療機器を統合する必要がある。医療機器の導入の課題 (i)その連続的な進化により、すべての装置の変種を含ませることが不可能となり、 (ii) 大規模な厳密なテストには複数のデバイスとそのバリエーションが必要で、それは時間集約的でコストがかかり、実用的ではない。私たちの共同研究者であるOslo City’s Health Departmentは、自動テストインフラストラクチャの開発において、これらの課題に直面しました。本稿では,医療機器のディジタルツイン(DT)を生成し,進化するデバイスにDTを適用するメタラーニングベースアプローチ(MeDeT)を提案する。我々は、現実世界の医療用IoTアプリケーションと統合された5つの広く使われている医療機器を用いて、OsloCityのコンテキストでMeDeTを評価する。評価では,様々なデバイスやバージョンにまたがるDTの生成と適応を行うMeDeTの能力,これらのDTの忠実度,1000個のDTの同時動作のスケーラビリティ,関連する時間的コストを評価した。その結果、MeDeTは96%以上の忠実度でDTを生成し、異なるデバイスや新しいバージョンにDTを適応させ、時間コスト(約1分)を削減し、忠実度レベルを維持しながらスケーラブルな1000のDTを運用でき、テスト用の物理デバイスの代わりに機能することがわかった。 Testing healthcare Internet of Things (IoT) applications at system and integration levels necessitates integrating numerous medical devices of various types. Challenges of incorporating medical devices are: (i) their continuous evolution, making it infeasible to include all device variants, and (ii) rigorous testing at scale requires multiple devices and their variants, which is time-intensive, costly, and impractical. Our collaborator, Oslo City's health department, faced these challenges in developing automated test infrastructure, which our research aims to address. In this context, we propose a meta-learning-based approach (MeDeT) to generate digital twins (DTs) of medical devices and adapt DTs to evolving devices. We evaluate MeDeT in OsloCity's context using five widely-used medical devices integrated with a real-world healthcare IoT application. Our evaluation assesses MeDeT's ability to generate and adapt DTs across various devices and versions using different few-shot methods, the fidelity of these DTs, the scalability of operating 1000 DTs concurrently, and the associated time costs. 
Results show that MeDeT can generate DTs with over 96% fidelity, adapt DTs to different devices and newer versions with reduced time cost (around one minute), and operate 1000 DTs in a scalable manner while maintaining the fidelity level, thus serving in place of physical devices for testing.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# 不均衡分類における性能・適応性向上のためのハイパーパラメータ分布の学習 Training Over a Distribution of Hyperparameters for Enhanced Performance and Adaptability on Imbalanced Classification ( http://arxiv.org/abs/2410.03588v1 ) ライセンス: Link先を確認	Kelsey Lieberman, Swarna Kamlam Ravindran, Shuai Yuan, Carlo Tomasi,	(参考訳) 二項分類はよく研究されている問題であるが、厳密なクラス不均衡の下での信頼性の高い分類器の訓練は依然として課題である。最近の技術は、損失関数や最適化方法を変更することにより、トレーニングにおける不均衡の悪影響を軽減する。これらの損失関数上の異なるハイパーパラメータ値が、異なるリコール値でよりよく機能することを示す。我々は,この事実を,単一値の代わりにハイパーパラメータ値の分布を1つのモデルで学習し,LCT(Loss Conditional Training)を介して活用することを提案する。実験により、ハイパーパラメータの分布に対するトレーニングは、いくつかのモデルのパフォーマンスを近似するだけでなく、CIFARおよびメラノーマや糖尿病網膜症検出などの実際の医療画像アプリケーションにおけるモデル全体のパフォーマンスを実際に改善することが示された。さらに、LCTを用いたトレーニングモデルは、スクラッチから再トレーニングする必要なく、個々のニーズを満たすためにトレーニング後にいくつかのハイパーパラメータチューニングを行うことができるため、より効率的である。 Although binary classification is a well-studied problem, training reliable classifiers under severe class imbalance remains a challenge. Recent techniques mitigate the ill effects of imbalance on training by modifying the loss functions or optimization methods. We observe that different hyperparameter values on these loss functions perform better at different recall values. We propose to exploit this fact by training one model over a distribution of hyperparameter values--instead of a single value--via Loss Conditional Training (LCT). Experiments show that training over a distribution of hyperparameters not only approximates the performance of several models but actually improves the overall performance of models on both CIFAR and real medical imaging applications, such as melanoma and diabetic retinopathy detection. Furthermore, training models with LCT is more efficient because some hyperparameter tuning can be conducted after training to meet individual needs without needing to retrain from scratch.	翻訳日:2024-11-02 21:17:55 公開日:2024-10-04
# 変分ベイズガウススプレイティング Variational Bayes Gaussian Splatting ( http://arxiv.org/abs/2410.03592v1 ) ライセンス: Link先を確認	Toon Van de Maele, Ozan Catal, Alexander Tschantz, Christopher L. Buckley, Tim Verbelen,	(参考訳) 近年,3Dガウシアン・スプラッティングはガウシアンの混合物を用いた3Dシーンのモデリングに有望なアプローチとして出現している。これらのモデルの主な最適化方法は、連続的なデータストリームを扱う際の破滅的な忘れ込みに苦労する、微分可能なレンダリングパイプラインによる勾配のバックプロパゲートに依存している。この制限に対処するために,モデルパラメータに対する変分推論としてガウススプレートをトレーニングするための新しいアプローチである変分ベイズ・ガウススプラッティング(VBGS)を提案する。多変量ガウスの共役特性を利用することで、バッファを再生することなく部分的な逐次観測から効率的な更新を可能にする閉形式変分更新規則を導出する。実験の結果,VBGSは静的データセット上での最先端性能と一致しただけでなく,逐次ストリーミングされた2Dデータや3Dデータからの連続学習も可能であり,この設定における性能を大幅に向上させることができた。 Recently, 3D Gaussian Splatting has emerged as a promising approach for modeling 3D scenes using mixtures of Gaussians. The predominant optimization method for these models relies on backpropagating gradients through a differentiable rendering pipeline, which struggles with catastrophic forgetting when dealing with continuous streams of data. To address this limitation, we propose Variational Bayes Gaussian Splatting (VBGS), a novel approach that frames training a Gaussian splat as variational inference over model parameters. By leveraging the conjugacy properties of multivariate Gaussians, we derive a closed-form variational update rule, allowing efficient updates from partial, sequential observations without the need for replay buffers. Our experiments show that VBGS not only matches state-of-the-art performance on static datasets, but also enables continual learning from sequentially streamed 2D and 3D data, drastically improving performance in this setting.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
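The appeal of conjugacy for streaming data can be seen in a much simpler setting than VBGS itself. The sketch below is a toy under stated assumptions, not the paper's algorithm: it tracks the mean of a one-dimensional Gaussian with known noise precision under a Gaussian prior, and the names `GaussianMeanPosterior`, `update`, and `noise_precision` are illustrative. Because the posterior stays Gaussian, each batch is absorbed in closed form and can then be discarded, and processing the data sequentially gives the same posterior as processing it all at once.

```python
from dataclasses import dataclass


@dataclass
class GaussianMeanPosterior:
    """Gaussian belief over an unknown mean, stored as mean and precision."""
    mean: float
    precision: float

    def update(self, xs, noise_precision):
        """Closed-form conjugate update for observations xs with known noise precision."""
        n = len(xs)
        new_precision = self.precision + n * noise_precision
        new_mean = (self.precision * self.mean + noise_precision * sum(xs)) / new_precision
        return GaussianMeanPosterior(new_mean, new_precision)


# Illustrative data and hyperparameters (assumptions, not from the paper).
data = [0.9, 1.1, 1.3, 0.7, 1.0, 1.2]
prior = GaussianMeanPosterior(mean=0.0, precision=1.0)
noise_precision = 4.0

# One update on the full dataset.
batch = prior.update(data, noise_precision)

# The same data streamed in chunks of two; earlier chunks are never revisited.
streamed = prior
for start in range(0, len(data), 2):
    streamed = streamed.update(data[start:start + 2], noise_precision)

print(batch, streamed)
```

Only the current posterior is carried between chunks, which is the property the abstract refers to when it says that no replay buffer is needed.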
# Explicit, Implicit, Scattered: 複雑な引数のキャプチャへのイベント抽出の再検討 Explicit, Implicit, and Scattered: Revisiting Event Extraction to Capture Complex Arguments ( http://arxiv.org/abs/2410.03594v1 ) ライセンス: Link先を確認	Omar Sharif, Joseph Gatto, Madhusudan Basak, Sarah M. Preum,	(参考訳) 先行研究は、イベント固有の引数の抽出をスパン抽出問題として定式化している。本研究では、既存のEEフレームワークでモデル化できない2つの主要な引数タイプを導入することで、イベント抽出(EE)の定義を再考する。まず、暗黙の引数は、テキストで明示的に言及されていないが、コンテキストを通して推論できるイベント引数である。第二に、散在する引数は、テキスト全体に散在する情報からなるイベント引数である。これら2つの引数タイプは、適切なイベントモデリングに必要な全情報を引き出すために不可欠である。明示的,暗黙的,散在的な議論の抽出を支援するために,オンライン健康談話から7,464の論証アノテーションを含む新しいデータセットであるDiscourseEEを開発した。特に51.2%が暗黙的であり、17.4%が散在しており、DiscourseEEは複雑なイベント抽出のためのユニークなコーパスとなっている。さらに、テキスト生成問題として引数抽出を定式化し、複雑な引数型の抽出を容易にする。我々は、最先端モデルの総合的な評価を行い、生成イベント抽出における重要なオープン課題を明らかにする。私たちのデータとコードベースはhttps://omar-sharif03.github.io/DiscourseEEで公開されています。 Prior works formulate the extraction of event-specific arguments as a span extraction problem, where event arguments are explicit -- i.e. assumed to be contiguous spans of text in a document. In this study, we revisit this definition of Event Extraction (EE) by introducing two key argument types that cannot be modeled by existing EE frameworks. First, implicit arguments are event arguments which are not explicitly mentioned in the text, but can be inferred through context. Second, scattered arguments are event arguments that are composed of information scattered throughout the text. These two argument types are crucial to elicit the full breadth of information required for proper event modeling. To support the extraction of explicit, implicit, and scattered arguments, we develop a novel dataset, DiscourseEE, which includes 7,464 argument annotations from online health discourse. Notably, 51.2% of the arguments are implicit, and 17.4% are scattered, making DiscourseEE a unique corpus for complex event extraction. Additionally, we formulate argument extraction as a text generation problem to facilitate the extraction of complex argument types. 
We provide a comprehensive evaluation of state-of-the-art models and highlight critical open challenges in generative event extraction. Our data and codebase are available at https://omar-sharif03.github.io/DiscourseEE.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# ホップフィールド的視点から理解するChain-of-Thoughtにおける推論 Understanding Reasoning in Chain-of-Thought from the Hopfieldian View ( http://arxiv.org/abs/2410.03595v1 ) ライセンス: Link先を確認	Lijie Hu, Liang Liu, Shu Yang, Xin Chen, Zhen Tan, Muhammad Asif Ali, Mengdi Li, Di Wang,	(参考訳) 大規模言語モデルは様々なタスクにおいて顕著な能力を示しており、Chain-of-Thought(CoT)プロンプティングは推論能力を高める重要なテクニックとして台頭している。しかし、既存の研究は主にパフォーマンスの改善に焦点を当てており、CoTの成功の背景にある基本的な要因を説明し、理解するための包括的なフレームワークが欠如している。このギャップを埋めるために、認知神経科学における認知のホップフィールド的視点に基づく新しい視点を導入する。我々は、CoT推論と刺激、行動、神経集団、表現空間といった重要な認知要素との関係を確立する。我々の見解では、これらの表現空間間の移動として、推論過程を理解することができる。この知見に基づいて,CoTの応答における推論誤差の局所化手法を開発した。さらに、低次元表現空間のロバスト性を利用して、CoTにおける推論過程のロバスト性を高めるRepresentation-of-Thought(RoT)フレームワークを提案する。実験により,RoTは推論過程のきめ細かい制御を行いながら,CoT推論の堅牢性と解釈性を向上させることが示された。 Large Language Models have demonstrated remarkable abilities across various tasks, with Chain-of-Thought (CoT) prompting emerging as a key technique to enhance reasoning capabilities. However, existing research primarily focuses on improving performance, lacking a comprehensive framework to explain and understand the fundamental factors behind CoT's success. To bridge this gap, we introduce a novel perspective grounded in the Hopfieldian view of cognition in cognitive neuroscience. We establish a connection between CoT reasoning and key cognitive elements such as stimuli, actions, neural populations, and representation spaces. From our view, we can understand the reasoning process as the movement between these representation spaces. Building on this insight, we develop a method for localizing reasoning errors in the response of CoTs. Moreover, we propose the Representation-of-Thought (RoT) framework, which leverages the robustness of low-dimensional representation spaces to enhance the robustness of the reasoning process in CoTs. Experimental results demonstrate that RoT improves the robustness and interpretability of CoT reasoning while offering fine-grained control over the reasoning process.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# SiMilarity-Enhanced Homophily for Multi-View Heterophilous Graph Clustering SiMilarity-Enhanced Homophily for Multi-View Heterophilous Graph Clustering ( http://arxiv.org/abs/2410.03596v1 ) ライセンス: Link先を確認	Jianpeng Chen, Yawen Ling, Yazhou Ren, Zichen Wen, Tianyi Wu, Shufei Zhang, Lifang He,	(参考訳) グラフ構造化データの普及に伴い、様々なダウンストリームアプリケーションでマルチビューグラフクラスタリングが広く使われている。既存のアプローチは主に、クラスタリングのパフォーマンスを大幅に向上させる統一メッセージパッシング機構に依存しています。それにもかかわらず、この機構はホモフィリーの仮定に基づいて基本的に述示されるため、その適用性に制限を与える、すなわち連結ノードは同じクラスに属することが多い。実際、この仮定は必ずしも成り立たない; 適度に、あるいは軽度にホモフィル性のあるグラフは、グラフの必然的ヘテロフィル性情報のため、完全ホモフィル性のあるグラフよりも一般的である。本稿では,多視点ヘテロ親和性グラフクラスタリング(SMHGC)のためのSiMilarity-enhanced Homophilyを提案する。類似度とグラフホモフィリーの関係を解析することにより,隣接パターン類似度,ノード特徴類似度,多視点グローバル類似度という3つの類似度項をラベルフリーで導入することにより,ホモフィリィを向上させることを提案する。そして、異なる視点から改良された同好性グラフを融合させ、クラスタリングに利用するために、コンセンサスに基づくビュー内およびビュー内融合パラダイムを提案する。多視点ヘテロ親和性データセットおよびホモ親和性データセットの最先端実験結果は、教師なし多視点ヘテロ親和性グラフ学習における類似性の強い能力を示す。さらに、ホモフィリーのレベルが異なる半合成データセット間の一貫した性能は、SMHGCのヘテロフィリーへの弾力性のさらなる証拠となる。 With the increasing prevalence of graph-structured data, multi-view graph clustering has been widely used in various downstream applications. Existing approaches primarily rely on a unified message passing mechanism, which significantly enhances clustering performance. Nevertheless, this mechanism limits its applicability to heterophilous situations, as it is fundamentally predicated on the assumption of homophily, i.e., the connected nodes often belong to the same class. In reality, this assumption does not always hold; a moderately or even mildly homophilous graph is more common than a fully homophilous one due to inevitable heterophilous information in the graph. To address this issue, in this paper, we propose a novel SiMilarity-enhanced Homophily for Multi-view Heterophilous Graph Clustering (SMHGC) approach. 
By analyzing the relationship between similarity and graph homophily, we propose to enhance the homophily by introducing three similarity terms, i.e., neighbor pattern similarity, node feature similarity, and multi-view global similarity, in a label-free manner. Then, a consensus-based inter- and intra-view fusion paradigm is proposed to fuse the improved homophilous graph from different views and utilize them for clustering. The state-of-the-art experimental results on both multi-view heterophilous and homophilous datasets collectively demonstrate the strong capacity of similarity for unsupervised multi-view heterophilous graph learning. Additionally, the consistent performance across semi-synthetic datasets with varying levels of homophily serves as further evidence of SMHGC's resilience to heterophily.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# 混合音源テキスト中のウォーターマーク付きセグメントを効率よく同定する Efficiently Identifying Watermarked Segments in Mixed-Source Texts ( http://arxiv.org/abs/2410.03600v1 ) ライセンス: Link先を確認	Xuandong Zhao, Chenwen Liao, Yu-Xiang Wang, Lei Li,	(参考訳) 大規模言語モデル(LLM)のテキスト透かしは、偽ニュースや学術的不正といった誤用を軽減し、合成テキストを検出するためにますます使われている。既存の透かし検出技術は主に文書全体を透かしとして分類することに焦点を当てているが、より長い混在する文書の中で個々の透かしセグメントを特定するという一般的なシナリオを無視することが多い。プラジャリズム検出システムからインスピレーションを得て, 部分透かし検出のための2つの新しい手法を提案する。まず,長文に透かしセグメントが存在するかどうかを判定するための幾何被覆検出フレームワークを開発する。第2に,テキスト内の透かしセグメントの正確な位置を特定できる適応型オンライン学習アルゴリズムを提案する。提案手法は,KGW-Watermark,Unigram-Watermark,Gumbel-Watermarkの3つの一般的な透かし技術(KGW-Watermark,Unigram-Watermark,Gumbel-Watermark)で評価され,精度が高く,ベースライン法よりも優れていた。さらに,本フレームワークは他の透かし技術にも適応し,高精度な透かし検出のための新たな洞察を提供する。 Text watermarks in large language models (LLMs) are increasingly used to detect synthetic text, mitigating misuse cases like fake news and academic dishonesty. While existing watermarking detection techniques primarily focus on classifying entire documents as watermarked or not, they often neglect the common scenario of identifying individual watermark segments within longer, mixed-source documents. Drawing inspiration from plagiarism detection systems, we propose two novel methods for partial watermark detection. First, we develop a geometry cover detection framework aimed at determining whether there is a watermark segment in long text. Second, we introduce an adaptive online learning algorithm to pinpoint the precise location of watermark segments within the text. Evaluated on three popular watermarking techniques (KGW-Watermark, Unigram-Watermark, and Gumbel-Watermark), our approach achieves high accuracy, significantly outperforming baseline methods. Moreover, our framework is adaptable to other watermarking techniques, offering new insights for precise watermark detection.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# 離散拡散と連続拡散:確率積分フレームワークによる離散拡散モデルの包括的解析 How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework ( http://arxiv.org/abs/2410.03601v1 ) ライセンス: Link先を確認	Yinuo Ren, Haoxuan Chen, Grant M. Rotskoff, Lexing Ying,	(参考訳) 離散拡散モデルは、抽出可能なサンプリングと推論で複雑な分布をモデル化する能力に注目が集まっている。しかし、離散拡散モデルの誤差解析はよく理解されていない。本研究では,L'evy型確率積分に基づく離散拡散モデルの誤差解析のための包括的フレームワークを提案する。ポアソン確率測度を時間非依存かつ状態依存の強度で一般化することにより、離散拡散モデルの確率的積分定式化を厳格に確立し、それに対応する測度定理を、イット・オ積分やギルサノフの連続的な定理と類似した形で与える。我々のフレームワークは離散拡散モデルにおける現在の理論結果を統一・強化し、KL分散における$\tau$-leapingスキームに対する最初のエラー境界を得る。誤差源の特定により,離散拡散モデルの数学的性質を新たに把握し,実世界の離散拡散モデル応用のための効率的かつ正確なアルゴリズム設計のためのガイダンスを提供する。 Discrete diffusion models have gained increasing attention for their ability to model complex distributions with tractable sampling and inference. However, the error analysis for discrete diffusion models remains less well-understood. In this work, we propose a comprehensive framework for the error analysis of discrete diffusion models based on L\'evy-type stochastic integrals. By generalizing the Poisson random measure to that with a time-independent and state-dependent intensity, we rigorously establish a stochastic integral formulation of discrete diffusion models and provide the corresponding change of measure theorems that are intriguingly analogous to It\^o integrals and Girsanov's theorem for their continuous counterparts. Our framework unifies and strengthens the current theoretical results on discrete diffusion models and obtains the first error bound for the $\tau$-leaping scheme in KL divergence. With error sources clearly identified, our analysis gives new insight into the mathematical properties of discrete diffusion models and offers guidance for the design of efficient and accurate algorithms for real-world discrete diffusion model applications.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# 勾配に基づく最適化によるゲージ固定条件の探索 Exploring gauge-fixing conditions with gradient-based optimization ( http://arxiv.org/abs/2410.03602v1 ) ライセンス: Link先を確認	William Detmold, Gurtej Kanwar, Yin Lin, Phiala E. Shanahan, Michael L. Wagman,	(参考訳) 格子ゲージの固定はゲージ変量を計算するために必要であり、例えば RI-MOM 再正規化スキームやモデル計算の比較対象として用いられる。近年,輪郭変形を用いた信号対雑音の最適化にはゲージ変量の方が有効であることが判明した。これらの応用は、ゲージ固定スキームの体系的なパラメータ化と探索を動機付けている。この研究は、ランダウゲージ、クーロンゲージ、最大木ゲージをカバーするのに十分な広さを持つゲージ固定の微分可能なパラメータ化を導入する。随伴状態法は勾配に基づく最適化を可能にし、任意の目標損失関数を最小化するゲージ固定スキームを選択する。 Lattice gauge fixing is required to compute gauge-variant quantities, for example those used in RI-MOM renormalization schemes or as objects of comparison for model calculations. Recently, gauge-variant quantities have also been found to be more amenable to signal-to-noise optimization using contour deformations. These applications motivate systematic parameterization and exploration of gauge-fixing schemes. This work introduces a differentiable parameterization of gauge fixing which is broad enough to cover Landau gauge, Coulomb gauge, and maximal tree gauges. The adjoint state method allows gradient-based optimization to select gauge-fixing schemes that minimize an arbitrary target loss function.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# 多出力量子パルスゲートを用いた単一光子のプログラム可能な時間周波数モードソート Programmable time-frequency mode-sorting of single photons with a multi-output quantum pulse gate ( http://arxiv.org/abs/2410.03606v1 ) ライセンス: Link先を確認	Laura Serino, Christof Eigner, Benjamin Brecht, Christine Silberhorn,	(参考訳) 我々は、パルスモード、周波数ビン、時間ビン、およびそれらの重ね合わせを含む異なる時間モード符号化をプログラムで切り替えることができる、マルチ出力の量子パルスゲートに基づく単一光子の高次元モードソータを実証する。この装置は、高次元量子鍵分布などの量子情報応用の実用的な実現を容易にし、情報容量を増強したセキュアな通信を可能にする。我々は3次元と5次元の検出器トモグラフィーを用いてモードソータを特徴付け、単一光子レベルで最大 $0.958\pm 0.030$ の忠実度を得る。 We demonstrate a high-dimensional mode-sorter for single photons based on a multi-output quantum pulse gate, which we can program to switch between different temporal-mode encodings including pulse modes, frequency bins, time bins, and their superpositions. This device can facilitate practical realizations of quantum information applications such as high-dimensional quantum key distribution, and thus enables secure communication with enhanced information capacity. We characterize the mode-sorter through a detector tomography in 3 and 5 dimensions and find a fidelity up to $0.958\pm 0.030$ at the single-photon level.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# TICKing all the Boxes: Generated Checklists improveing LLM Evaluation and generation TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation ( http://arxiv.org/abs/2410.03608v1 ) ライセンス: Link先を確認	Jonathan Cook, Tim Rocktäschel, Jakob Foerster, Dennis Aumiller, Alex Wang,	(参考訳) LLM(Large Language Models)の普及と利用が広まる中、命令追従能力を柔軟かつ解釈可能な評価を行うことが不可欠である。複雑な多面的選好を単一ランクに蒸留したにもかかわらず、モデル出力間の選好判断がデファクト評価標準となっている。さらに、人間のアノテーションは遅くてコストがかかるため、信頼性と解釈可能性の犠牲から、LCMはこれらの判断を下すためにますます使われています。本研究では,LLM生成した命令固有チェックリストを用いて評価を構造化する,完全に自動化された解釈可能な評価プロトコルであるTICK(Targeted Instruct-evaluation with ChecKlists)を提案する。まず、命令が与えられた場合、LLMは、命令を一連のYES/NO質問に分解する高品質で調整された評価チェックリストを確実に生成できることを示す。各質問は、候補の応答が命令の特定の要求を満たすかどうかを問う。 LLMの判定と人的嗜好の正確な一致の頻度は、LCMが直接アウトプットを採点するのに対して、TICKを使用すると顕著に増加(46.4%$\to$52.2%)することを示した。次に、STICK(Self-TICK)は、自己精製とベストオブN選択により、複数のベンチマークで生成品質を向上させることができることを示す。 STICKによるLiveBench推論タスクの自己リファインメントは、絶対的な$$7.8%、STICKによるベスト・オブ・Nの選択は、実世界の命令データセットWildBenchに対して$6.3%の絶対的な改善を達成している。これを踏まえ、構造化された多面的自己改善は、LLM機能をさらに向上するための有望な方法であることが示されている。最後に、WildBench命令に対して直接LLM応答をスコアする人間評価者にLLM生成チェックリストを提供することにより、アノテーション間の合意(0.194$\to$ 0.256)を増大させる。 Given the widespread adoption and usage of Large Language Models (LLMs), it is crucial to have flexible and interpretable evaluations of their instruction-following ability. Preference judgments between model outputs have become the de facto evaluation standard, despite distilling complex, multi-faceted preferences into a single ranking. Furthermore, as human annotation is slow and costly, LLMs are increasingly used to make these judgments, at the expense of reliability and interpretability. In this work, we propose TICK (Targeted Instruct-evaluation with ChecKlists), a fully automated, interpretable evaluation protocol that structures evaluations with LLM-generated, instruction-specific checklists. 
We first show that, given an instruction, LLMs can reliably produce high-quality, tailored evaluation checklists that decompose the instruction into a series of YES/NO questions. Each question asks whether a candidate response meets a specific requirement of the instruction. We demonstrate that using TICK leads to a significant increase (46.4% $\to$ 52.2%) in the frequency of exact agreements between LLM judgements and human preferences, as compared to having an LLM directly score an output. We then show that STICK (Self-TICK) can be used to improve generation quality across multiple benchmarks via self-refinement and Best-of-N selection. STICK self-refinement on LiveBench reasoning tasks leads to an absolute gain of $+$7.8%, whilst Best-of-N selection with STICK attains $+$6.3% absolute improvement on the real-world instruction dataset, WildBench. In light of this, structured, multi-faceted self-improvement is shown to be a promising way to further advance LLM capabilities. Finally, by providing LLM-generated checklists to human evaluators tasked with directly scoring LLM responses to WildBench instructions, we notably increase inter-annotator agreement (0.194 $\to$ 0.256).	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
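The checklist idea in the TICK entry above can be illustrated with a small Python sketch. The abstract does not say how the YES/NO answers are aggregated, so the pass-rate scoring below is an assumption, and `checklist_score` and `best_of_n` are hypothetical names rather than the authors' API. In practice the boolean answers would come from an LLM judge; here they are hard-coded.

```python
from typing import List, Sequence


def checklist_score(answers: Sequence[bool]) -> float:
    """Fraction of checklist questions answered YES (assumed aggregation rule)."""
    if not answers:
        raise ValueError("checklist must contain at least one question")
    return sum(answers) / len(answers)


def best_of_n(candidate_answers: List[Sequence[bool]]) -> int:
    """Index of the candidate with the highest checklist score (first wins ties)."""
    scores = [checklist_score(a) for a in candidate_answers]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical YES/NO answers for three candidate responses to one instruction.
answers_per_candidate = [
    [True, False, True],
    [True, True, True],
    [False, False, True],
]
best = best_of_n(answers_per_candidate)  # candidate 1 satisfies every item
```

A per-question breakdown like this is what makes the evaluation interpretable: a low score can be traced to the specific requirements that a response missed.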
# 輸入代替条件下におけるハイテク企業の経営 Management of high-tech companies in conditions of import substitution ( http://arxiv.org/abs/2410.03610v1 ) ライセンス: Link先を確認	S. E. Pyatovsky, N. S. Efimova, E. V. Surkova,	(参考訳) この記事では、輸入代替の文脈で、ロシア経済のハイテクセクターの発展を分析する。優先順位の高いプロジェクトのポートフォリオを管理する機能は考慮されている。航空産業企業のための統合情報空間構築の課題は、経営決定支援の修正OLAP技術の導入という文脈において研究される。ロシア経済のハイテク部門の投資魅力は、プロジェクト製品に付加される総生産価値の係数に基づいて推定される。投資過熱産業が特定され、市場補正やプロジェクト資産のバランスの取れた状態への返還に関する勧告が与えられる。 The article analyzes the development of high-tech sectors of the Russian economy in the context of import substitution. Features of managing priority project portfolios are considered. Issues of creating a unified information space for aviation industry enterprises are studied in the context of introduction of a modified OLAP technology of management decision support. Investment attractiveness of high-tech sectors of the Russian economy is estimated based on the coefficient of gross value added of project products. Investment-overheated industries are identified, and recommendations on market correction and returning project assets to a balanced state are given.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# モバイルプラットフォーム上での大規模言語モデルパフォーマンスベンチマーク:詳細な評価 Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation ( http://arxiv.org/abs/2410.03613v1 ) ライセンス: Link先を確認	Jie Xiao, Qianyi Huang, Xu Chen, Chen Tian,	(参考訳) 大規模言語モデル(LLM)が私たちの仕事や日常生活のあらゆる側面にますます統合されるにつれて、ユーザのプライバシに関する懸念が高まっています。スマートフォン上でローカルに実行できる軽量のLCM(Gemini Nano, LLAMA2 7Bなど)が多数ある。急速に普及しているアプリケーションとして、市販のモバイルデバイスのパフォーマンスを懸念しています。モバイルプラットフォーム上でのLLM展開の現在の状況を理解するため,モバイル機器に関する総合的な計測研究を行った。トークンのスループットやレイテンシ,バッテリ消費といったユーザエクスペリエンスに影響を与える指標に加えて,リソース利用やDVFS戦略,推論エンジンなど,開発者にとって重要な要因も評価します。さらに、これらのハードウェア機能とシステムダイナミクスがデバイス上でのLCMパフォーマンスにどのように影響するかを詳細に分析し、モバイルLCMアプリケーションのボトルネックを特定し対処するのに役立ちます。また、主要なベンダーとモバイルシステムオンチップ(SoC)を総合的に比較し、LLMワークロードの処理におけるパフォーマンスの違いを強調します。本研究は,デバイス上でのLCMの開発と,将来のモバイルシステムアーキテクチャの設計の両面での洞察を得られることを期待する。 As large language models (LLMs) increasingly integrate into every aspect of our work and daily lives, there are growing concerns about user privacy, which push the trend toward local deployment of these models. There are a number of lightweight LLMs (e.g., Gemini Nano, LLAMA2 7B) that can run locally on smartphones, providing users with greater control over their personal data. As a rapidly emerging application, we are concerned about their performance on commercial-off-the-shelf mobile devices. To fully understand the current landscape of LLM deployment on mobile platforms, we conduct a comprehensive measurement study on mobile devices. We evaluate both metrics that affect user experience, including token throughput, latency, and battery consumption, as well as factors critical to developers, such as resource utilization, DVFS strategies, and inference engines. In addition, we provide a detailed analysis of how these hardware capabilities and system dynamics affect on-device LLM performance, which may help developers identify and address bottlenecks for mobile LLM applications. 
We also provide comprehensive comparisons across the mobile system-on-chips (SoCs) from major vendors, highlighting their performance differences in handling LLM workloads. We hope that this study can provide insights for both the development of on-device LLMs and the design for future mobile system architecture.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# スケールでのモデルマージには何が重要か? What Matters for Model Merging at Scale? ( http://arxiv.org/abs/2410.03617v1 ) ライセンス: Link先を確認	Prateek Yadav, Tu Vu, Jonathan Lai, Alexandra Chronopoulou, Manaal Faruqui, Mohit Bansal, Tsendsuren Munkhdalai,	(参考訳) モデルマージは、複数の専門家モデルとより有能な単一モデルを組み合わせることを目的としており、ストレージの削減やサービスコストの削減、一般化の改善、分散モデル開発のサポートなどのメリットを提供する。その約束にもかかわらず、これまでの研究は主にいくつかの小さなモデルをマージすることに焦点を当ててきた。このことは、モデルのサイズをスケールすることの影響や、マージされたモデルの性能に影響を与えるために、ベースモデルの品質やエキスパートモデルの数など、他の重要な要因とどのように相互作用するかについて、多くの未解決の疑問を残している。この研究は、これらの異なる要因の影響を調べながら、大規模にマージするモデルの有用性を体系的に評価する。 Averaging, Task~Arithmetic, Dare, TIESの4つの一般的なマージ手法を使って,1B-64Bパラメータから最大8種類のエキスパートモデルへのマージを行う。筆者らは, 保持タスク, すなわち, 専門家の訓練タスク, 保持タスクのゼロショット一般化について, 統合モデルの評価を行った。我々の実験は、スケールでのモデルマージと異なる要因間の相互作用に関するいくつかの新しい洞察を提供する。まず、強力なベースモデル、すなわちゼロショット性能の優れたモデルから専門家が作成されると、マージがより効果的であることが分かる。第二に、より大型のモデルによりマージが容易になる。第3のマージは、常に一般化機能を改善する。特に、8つの大きなエキスパートモデルをマージする場合、マージされたモデルは、マルチタスクのトレーニングされたモデルと比較して、より一般化されることが多い。第4に,より大きなモデルを扱う場合には,より多くのエキスパートモデルをマージする方がよいのです。第5に、異なるマージ法は大規模で非常に同じように振る舞う。全体としては、モデルマージの興味深い性質に光を当てつつ、いくつかの制限を強調しています。我々は,本研究が今後の研究の大規模統合の基準点になることを期待している。 Model merging aims to combine multiple expert models into a more capable single model, offering benefits such as reduced storage and serving costs, improved generalization, and support for decentralized model development. Despite its promise, previous studies have primarily focused on merging a few small models. This leaves many unanswered questions about the effect of scaling model size and how it interplays with other key factors -- like the base model quality and number of expert models -- , to affect the merged model's performance. This work systematically evaluates the utility of model merging at scale, examining the impact of these different factors. We experiment with merging fully fine-tuned models using 4 popular merging methods -- Averaging, Task~Arithmetic, Dare, and TIES -- across model sizes ranging from 1B-64B parameters and merging up to 8 different expert models. 
We evaluate the merged models on both held-in tasks, i.e., the expert's training tasks, and zero-shot generalization to unseen held-out tasks. Our experiments provide several new insights about model merging at scale and the interplay between different factors. First, we find that merging is more effective when experts are created from strong base models, i.e., models with good zero-shot performance. Second, larger models facilitate easier merging. Third, merging consistently improves generalization capabilities. Notably, when merging 8 large expert models, the merged models often generalize better compared to the multitask trained models. Fourth, we can better merge more expert models when working with larger models. Fifth, different merging methods behave very similarly at larger scales. Overall, our findings shed light on some interesting properties of model merging while also highlighting some limitations. We hope that this study will serve as a reference point on large-scale merging for upcoming research.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
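Two of the merging methods named in the entry above, Averaging and Task Arithmetic, can be sketched on toy parameters as follows. This is a minimal illustration, not the paper's implementation. The `Params` layout (parameter name mapped to a flat list of floats), the function names, and the `scale` argument are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def average_merge(experts: List[Params]) -> Params:
    """Element-wise mean of the expert parameters."""
    merged: Params = {}
    for name in experts[0]:
        columns = zip(*(expert[name] for expert in experts))
        merged[name] = [sum(col) / len(experts) for col in columns]
    return merged


def task_arithmetic_merge(base: Params, experts: List[Params], scale: float = 1.0) -> Params:
    """Add the scaled sum of task vectors (expert - base) to the base weights."""
    merged: Params = {}
    for name, base_w in base.items():
        task_sum = [0.0] * len(base_w)
        for expert in experts:
            for i, (w, b) in enumerate(zip(expert[name], base_w)):
                task_sum[i] += w - b
        merged[name] = [b + scale * t for b, t in zip(base_w, task_sum)]
    return merged


# Toy example: one parameter tensor "w" and two experts fine-tuned from the same base.
base = {"w": [0.0, 1.0]}
experts = [{"w": [1.0, 1.0]}, {"w": [0.0, 3.0]}]
avg = average_merge(experts)
ta = task_arithmetic_merge(base, experts, scale=0.5)
```

With `scale` set to one over the number of experts, task arithmetic reduces to plain averaging, which is why `avg` and `ta` coincide in this toy case. Other values of `scale` move the merged weights away from the mean.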
# 長短期イマジネーションによるオープンワールド強化学習 Open-World Reinforcement Learning over Long Short-Term Imagination ( http://arxiv.org/abs/2410.03618v1 ) ライセンス: Link先を確認	Jiajian Li, Qi Wang, Yunbo Wang, Xin Jin, Yang Li, Wenjun Zeng, Xiaokang Yang,	(参考訳) 高次元オープンワールドにおける視覚的強化学習エージェントの訓練は、大きな課題を呈している。様々なモデルベースの手法は、インタラクティブな世界モデルを学ぶことによってサンプル効率を向上させてきたが、これらのエージェントは、典型的には想像された経験の短い断片に基づいて訓練されるため、「近視眼的」になる傾向がある。オープンワールドの意思決定における主な障害は、広大な状態空間にわたるオフポリシー探索の効率を改善することだと我々は主張する。本稿では,LS-Imagineを提案する。LS-Imagineは,限られた状態遷移ステップ数の中でイマジネーションの地平線を拡大し,有望な長期的フィードバックにつながりうる行動をエージェントが探索できるようにする。我々のアプローチの基盤は、長短期世界モデルを構築することである。これを実現するために,目標条件付きのジャンピーな状態遷移をシミュレートし,単一画像内の特定領域にズームインすることで,対応するアフォーダンスマップを計算する。これにより、直接的な長期的価値を行動学習に統合しやすくなる。我々の手法は、MineDojoにおいて最先端技術を大幅に上回ることを示す。 Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted", as they are typically trained on short snippets of imagined experiences. We argue that the primary obstacle in open-world decision-making is improving the efficiency of off-policy exploration across an extensive state space. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a long short-term world model. To achieve this, we simulate goal-conditioned jumpy state transitions and compute corresponding affordance maps by zooming in on specific areas within single images. This facilitates the integration of direct long-term values into behavior learning. Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# 複素数値軌道を用いたRDMFTとpCCDにおける時間反転対称性 Time-Reversal Symmetry in RDMFT and pCCD with Complex-Valued Orbitals ( http://arxiv.org/abs/2410.03620v1 ) ライセンス: Link先を確認	Mauricio Rodríguez-Mayorga, Pierre-François Loos, Fabien Bruneval, Lucas Visscher,	(参考訳) 縮約密度行列汎関数理論(RDMFT)と、対二重励起に制限した結合クラスター理論(pCCD)は、いわゆる非動的電子相関効果を扱う効率的な手法として台頭しつつある。これまでのところ、分子計算は実数値軌道で行われてきた。しかしながら、これらの方法論の適用範囲をブロッホ状態が用いられる拡張系に広げる前に、複素数値軌道を扱う際の微妙な点と、時間反転対称性を課すことの帰結を慎重に検討しなければならない。本稿では、複素数値の軌道係数を許容する場合に、RDMFTとpCCDで時間反転対称性を採用することの理論的および実践的な意味合いについて述べる。理論的な考察は主に最適化アルゴリズムに影響を与えるが、実践的な意味合いは解の安定性に関する根本的な疑問を提起する。具体的には、非動的電子相関効果が顕著な場合、複素解がエネルギーを低下させることがわかった。これらの不安定性と、N表現可能性の破れによって生じうる問題を例示し議論するために、数値例を示す。 Reduced density matrix functional theory (RDMFT) and coupled cluster theory restricted to paired double excitations (pCCD) are emerging as efficient methodologies for accounting for the so-called non-dynamic electronic correlation effects. Up to now, molecular calculations have been performed with real-valued orbitals. However, before extending the applicability of these methodologies to extended systems, where Bloch states are employed, the subtleties of working with complex-valued orbitals and the consequences of imposing time-reversal symmetry must be carefully addressed. In this work, we describe the theoretical and practical implications of adopting time-reversal symmetry in RDMFT and pCCD when allowing for complex-valued orbital coefficients. The theoretical considerations primarily affect the optimization algorithms, while the practical implications raise fundamental questions about the stability of solutions. Specifically, we find that complex solutions lower the energy when non-dynamic electronic correlation effects are pronounced. We present numerical examples to illustrate and discuss these instabilities and possible problems introduced by N-representability violations.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# 電子医療消費者のためのグローバル医療データセキュリティとプライバシ保護標準識別フレームワーク A Global Medical Data Security and Privacy Preserving Standards Identification Framework for Electronic Healthcare Consumers ( http://arxiv.org/abs/2410.03621v1 ) ライセンス: Link先を確認	Vinaytosh Mishra, Kishu Gupta, Deepika Saxena, Ashutosh Kumar Singh,	(参考訳) 電子健康記録(EHR)はデジタルヘルスケアの成功に不可欠であり、消費者をこの変革の中心に据えることに重点を置いている。しかし、医療記録のデジタル化は個人情報のセキュリティとプライバシーのリスクをもたらす。主な懸念は、異なる国が医療データのセキュリティとプライバシーに関して様々な基準を持っていることである。本稿では,これらのルールをグローバルに標準化するための,新しい包括的枠組みを提案する。この提案を支持するため、本研究では既存の文献をレビューし、この問題に対する研究の関心を理解する。また、セキュリティとプライバシに関する6つの重要な法律と標準を調べ、20のコンセプトを特定している。提案手法はK平均クラスタリングを用いて,これらの概念を分類し,5つの要因を同定した。最後に、EHRの文脈におけるこれらの要因の望ましい実装を決定するために、正規優先アプローチを適用した。本研究は、電子健康記録の文脈において、プライバシとセキュリティを実装するための記述的かつ規範的なフレームワークを提供する。したがって,提案フレームワークの発見は,専門家や政策立案者にとって,EHRに関連するセキュリティとプライバシの改善に有用である。 Electronic Health Records (EHR) are crucial for the success of digital healthcare, with a focus on putting consumers at the center of this transformation. However, the digitalization of healthcare records brings along security and privacy risks for personal data. The major concern is that different countries have varying standards for the security and privacy of medical data. This paper proposed a novel and comprehensive framework to standardize these rules globally, bringing them together on a common platform. To support this proposal, the study reviews existing literature to understand the research interest in this issue. It also examines six key laws and standards related to security and privacy, identifying twenty concepts. The proposed framework utilized K-means clustering to categorize these concepts and identify five key factors. Finally, an Ordinal Priority Approach is applied to determine the preferred implementation of these factors in the context of EHRs. The proposed study provides a descriptive then prescriptive framework for the implementation of privacy and security in the context of electronic health records. 
Therefore, the findings of the proposed framework are useful for professionals and policymakers in improving the security and privacy associated with EHRs.	翻訳日:2024-11-02 21:08:10 公開日:2024-10-04
# HyperCMR: イーグルロスを用いたマルチコントラストCMR再建術 HyperCMR: Enhanced Multi-Contrast CMR Reconstruction with Eagle Loss ( http://arxiv.org/abs/2410.03624v1 ) ライセンス: Link先を確認	Ruru Xu, Caner Özer, Ilkay Oksuz,	(参考訳) 心臓磁気共鳴画像(CMRI)の加速画像取得は重要な課題である。 CMRxRecon2024の課題は、マルチコントラストCMR再構築の最先端の設定である。本稿では,マルチコントラスト心磁気共鳴(CMR)画像の再構成を促進するための新しいフレームワークであるHyperCMRを提案する。 HyperCMR は既存の PromptMR モデルを強化し、特に革新的なイーグルロス (Eagle Loss) を取り入れた。 CMRxRecon2024チャレンジデータセットで実施された大規模な実験は、HyperCMRが複数の評価指標で一貫してベースラインを上回り、優れたSSIMとPSNRスコアを達成していることを示している。 Accelerating image acquisition for cardiac magnetic resonance imaging (CMRI) is a critical task. CMRxRecon2024 challenge aims to set the state of the art for multi-contrast CMR reconstruction. This paper presents HyperCMR, a novel framework designed to accelerate the reconstruction of multi-contrast cardiac magnetic resonance (CMR) images. HyperCMR enhances the existing PromptMR model by incorporating advanced loss functions, notably the innovative Eagle Loss, which is specifically designed to recover missing high-frequency information in undersampled k-space. Extensive experiments conducted on the CMRxRecon2024 challenge dataset demonstrate that HyperCMR consistently outperforms the baseline across multiple evaluation metrics, achieving superior SSIM and PSNR scores.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# 分散補助データを用いたロバストオフライン模倣学習 Robust Offline Imitation Learning from Diverse Auxiliary Data ( http://arxiv.org/abs/2410.03626v1 ) ライセンス: Link先を確認	Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit K. Roy-Chowdhury,	(参考訳) オフラインの模倣学習は、環境の相互作用なしに、専門家によるデモンストレーションのセットからのみポリシーを学ぶことができる。少数の専門家データによる分布シフトの問題を軽減するため、近年の研究では、専門家データと並行して多数の補助的なデモンストレーションが組み込まれている。しかし、これらの手法の性能は補助データの品質と構成に関する仮定に依存している。しかし、これらの仮定が守られなければ成功することは滅多にない。この制限に対処するために、Diverse Auxiliary Data (ROIDA) からのRobust Offline Imitationを提案する。 ROIDAはまず、学習された報酬関数を使用して、補助データセット全体からの高品質な遷移を特定する。これらのハイリワードサンプルは、重み付けされた行動クローニングのための専門家のデモンストレーションと組み合わせられる。低品質のサンプルでは、ROIDAは時間差学習を適用して、高水準の状態に対する政策を操り、長期的なリターンを改善する。この2段階のアプローチにより、私たちのフレームワークは仮定なしで、高品質なデータと低品質のデータの両方を効果的に活用できます。大規模な実験により、ROIDAは専門家と非専門家の多様な比率で複数の補助データセット間で堅牢で一貫したパフォーマンスを達成することが検証された。 ROIDAはラベルなしの補助データを効果的に活用し、特定のデータ仮定に依存する事前の手法より優れている。 Offline imitation learning enables learning a policy solely from a set of expert demonstrations, without any environment interaction. To alleviate the issue of distribution shift arising due to the small amount of expert data, recent works incorporate large numbers of auxiliary demonstrations alongside the expert data. However, the performance of these approaches rely on assumptions about the quality and composition of the auxiliary data. However, they are rarely successful when those assumptions do not hold. To address this limitation, we propose Robust Offline Imitation from Diverse Auxiliary Data (ROIDA). ROIDA first identifies high-quality transitions from the entire auxiliary dataset using a learned reward function. These high-reward samples are combined with the expert demonstrations for weighted behavioral cloning. For lower-quality samples, ROIDA applies temporal difference learning to steer the policy towards high-reward states, improving long-term returns. This two-pronged approach enables our framework to effectively leverage both high and low-quality data without any assumptions. 
Extensive experiments validate that ROIDA achieves robust and consistent performance across multiple auxiliary datasets with diverse ratios of expert and non-expert demonstrations. ROIDA effectively leverages unlabeled auxiliary data, outperforming prior methods reliant on specific data assumptions.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# 量子LDPC符号間のユニバーサルアダプタ Universal adapters between quantum LDPC codes ( http://arxiv.org/abs/2410.03628v1 ) ライセンス: Link先を確認	Esha Swaroop, Tomas Jochym-O'Connor, Theodore J. Yoder,	(参考訳) 本稿では,量子低密度パリティチェック (LDPC) コードブロック内での論理的パウリ測定を行う方法として,繰り返しコードアダプタを提案する。このアダプタは、LDPCのコードとパウリスの測定によらず動作するという意味で普遍的である。この構成は、$O(td\log^2d)$追加のキュービットとチェックと$O(d)$タイムを用いて、$t$weight $O(d)$演算子の合同論理的パウリ測度を達成する。また、固定された$D\ge2$次元の幾何的局所符号についても、代わりに$O(td)$追加キュービットとチェックが必要であることを示す。例えば$t=2$でアダプタを拡張することで、$O(d^2)$追加のqubitとチェックを使用して、Dehnツイストを介して任意のLDPCコード上でターゲット論理CNOTゲートを実行するトーリックコードアダプタを構築します。これらの結果のいくつかを得るため、我々はより弱いグラフエッジ展開法を開発した。 We propose the repetition code adapter as a way to perform joint logical Pauli measurements within a quantum low-density parity check (LDPC) codeblock or between separate such codeblocks. This adapter is universal in the sense that it works regardless of the LDPC codes involved and the Paulis being measured. The construction achieves joint logical Pauli measurement of $t$ weight $O(d)$ operators using $O(td\log^2d)$ additional qubits and checks and $O(d)$ time. We also show for some geometrically-local codes in fixed $D\ge2$ dimensions that only $O(td)$ additional qubits and checks are required instead. By extending the adapter in the case $t=2$, we construct a toric code adapter that uses $O(d^2)$ additional qubits and checks to perform targeted logical CNOT gates on arbitrary LDPC codes via Dehn twists. To obtain some of these results, we develop a novel weaker form of graph edge expansion.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# 光の運動量に関する局所光子視点 A local photon perspective on the momentum of light ( http://arxiv.org/abs/2410.03633v1 ) ライセンス: Link先を確認	Gabriel Waite, Daniel Hodgson, Ben Lang, Varghese Alapatt, Almut Beige,	(参考訳) 最近我々は、位置空間における量子化された電磁場をモデル化するための局所光子アプローチを導入した。本稿では,光波パケットの空間変換のための生成器として,量子力学点粒子の運動量に類似した光の運動量を定義する。その後、空気からより密度の高い誘電体媒体に遷移する際の光の運動量ダイナミクスを解析する。我々の分析は、電磁場の運動量の特徴化に関わる複雑さを浮き彫りにしたアブラハム・ミンコフスキー論争に新たな光を当てている。我々の結果はミンコフスキーの理論や量子電磁力学における光の正準運動量の定義と一致しているが、いくつかの重要な違いもある。 Recently we introduced a local photon approach for modelling the quantised electromagnetic field in position space. Using this approach, in this paper we define the momentum of light, in analogy to the momentum of quantum mechanical point particles, as the generator for the spatial translation of photonic wave packets. Afterwards, we analyse the momentum dynamics of light when transitioning from air into a denser dielectric medium. Our analysis shines new light onto the Abraham-Minkowski controversy which highlights the intricacies involved in the characterisation of the momentum of the electromagnetic field. Although our results align with Minkowski's theory and with the definition of the canonical momentum of light in quantum electrodynamics, there are also some crucial differences.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# 適応型タンパク質言語モデルを用いた条件酵素生成 Conditional Enzyme Generation Using Protein Language Models with Adapters ( http://arxiv.org/abs/2410.03634v1 ) ライセンス: Link先を確認	Jason Yang, Aadyot Bhatnagar, Jeffrey A. Ruffolo, Ali Madani,	(参考訳) 所望の機能および/または性質を持つタンパク質の条件付き生成は、生成モデルの重要な目標である。言語モデルのプロンプトに基づく既存の方法は、所望の酵素ファミリーのような標的機能で条件付けられたタンパク質を生成することができる。しかし、これらの手法は単純でトークン化された条件付けに限定されており、目に見えない関数に一般化することが示されていない。本研究では,タンパク質言語モデルに対するアダプタを用いた条件付きタンパク質生成手法である ProCALM (Protein Conditionally Adapted Language Model) を提案する。 ProCALMの具体的実装は、酵素機能と分類の条件付け表現を組み込むためにProGen2を微調整することを含む。 ProCALMは、標的酵素ファミリーから条件付き配列を生成する既存の方法と一致している。印象的なことに、酵素機能と分類学の合同分布内でも生成でき、希少で目に見えない酵素ファミリーや分類学に一般化することができる。全体として, ProCALMはフレキシブルかつ計算効率のよいアプローチであり, 幅広い生成言語モデルに拡張できることを期待する。 The conditional generation of proteins with desired functions and/or properties is a key goal for generative models. Existing methods based on prompting of language models can generate proteins conditioned on a target functionality, such as a desired enzyme family. However, these methods are limited to simple, tokenized conditioning and have not been shown to generalize to unseen functions. In this study, we propose ProCALM (Protein Conditionally Adapted Language Model), an approach for the conditional generation of proteins using adapters to protein language models. Our specific implementation of ProCALM involves finetuning ProGen2 to incorporate conditioning representations of enzyme function and taxonomy. ProCALM matches existing methods at conditionally generating sequences from target enzyme families. Impressively, it can also generate within the joint distribution of enzymatic function and taxonomy, and it can generalize to rare and unseen enzyme families and taxonomies. Overall, ProCALM is a flexible and computationally efficient approach, and we expect that it can be extended to a wide range of generative language models.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# リアルタイムベンチマークは、拡散モデルでメンバーシップ推論攻撃を失敗させる Real-World Benchmarks Make Membership Inference Attacks Fail on Diffusion Models ( http://arxiv.org/abs/2410.03640v1 ) ライセンス: Link先を確認	Chumeng Liang, Jiaxuan You,	(参考訳) 拡散モデルに対するメンバーシップ推論攻撃(MIA)は、事前訓練された拡散モデルのトレーニングにおいて、不正なデータ使用の潜在的証拠として現れている。これらの攻撃は、拡散モデルのトレーニングデータセットにおける特定の画像の存在を検出することを目的としている。本研究は,拡散モデルにおける最新のMIAの評価について検討し,既存のMIA評価における重大な欠陥と過度に楽観的な性能評価を明らかにした。 CopyMarkはより現実的なMIAベンチマークで、事前訓練された拡散モデル、偏りのないデータセット、公正な評価パイプラインのサポートを通じて自分自身を区別する。実験により, 従来のMIA法の有効性は, より実践的な条件下で著しく低下することが実証された。この結果に基づき、MIAは、現在、事前訓練された拡散モデルにおいて、不正なデータの使用を識別するための信頼性の高いアプローチではないことを警告する。我々の知る限り、拡散モデル上でのMIAの性能過大評価を初めて発見し、より現実的な評価のための統一されたベンチマークを提示する。私たちのコードはGitHubで入手可能です。 Membership inference attacks (MIAs) on diffusion models have emerged as potential evidence of unauthorized data usage in training pre-trained diffusion models. These attacks aim to detect the presence of specific images in training datasets of diffusion models. Our study delves into the evaluation of state-of-the-art MIAs on diffusion models and reveals critical flaws and overly optimistic performance estimates in existing MIA evaluation. We introduce CopyMark, a more realistic MIA benchmark that distinguishes itself through the support for pre-trained diffusion models, unbiased datasets, and fair evaluation pipelines. Through extensive experiments, we demonstrate that the effectiveness of current MIA methods significantly degrades under these more practical conditions. Based on our results, we alert that MIA, in its current state, is not a reliable approach for identifying unauthorized data usage in pre-trained diffusion models. To the best of our knowledge, we are the first to discover the performance overestimation of MIAs on diffusion models and present a unified benchmark for more realistic evaluation. Our code is available on GitHub: \url{https://github.com/caradryanl/CopyMark}.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# 対話による個人選好を考慮したLLMのアライメント Aligning LLMs with Individual Preferences via Interaction ( http://arxiv.org/abs/2410.03642v1 ) ライセンス: Link先を確認	Shujin Wu, May Fung, Cheng Qian, Jeonghwan Kim, Dilek Hakkani-Tur, Heng Ji,	(参考訳) 大規模言語モデル(LLM)は、ますます高度な能力を示すため、その振る舞いを人間の価値観や好みと整合させることが、広く採用するには不可欠である。これまでの研究では、役に立つこと、無害さ、誠実さといった原則への一般的な整合性に焦点が当てられていたが、個人的および多様な嗜好を説明する必要性はほとんど見過ごされ、カスタマイズされた人間の体験を損なう可能性がある。このギャップに対処するため、我々は、LLMのメタスキルを育み、マルチターン会話を通じて現在のユーザのパーソナライズされた嗜好を暗黙的に推測し、次に次の行動や反応を推論された嗜好に動的に調整する「協調する相互作用」を訓練する。当社のアプローチでは、最初はシードサンプルを作成して3,310人の異なるユーザペルソナを多種多様なプールにすることで、反復的な自己生成とフィルタリングを通じて拡張する。異なるユーザペルソナによってガイドされたマルチLLMコラボレーションを利用して,木構造に3K以上のマルチターン会話を含むマルチターン選好データセットを開発する。最後に、教師付き微調整および強化学習を適用し、このデータセットを用いてLCMを強化する。 ALOE(Align With CustOmized PrEferences)ベンチマークは、慎重に選択された100のサンプルと、会話中にカスタマイズされたアライメント性能を測定するためのよく設計されたメトリクスから構成される。実験により,対話による動的,パーソナライズされたアライメントの実現に本手法の有効性が示された。 As large language models (LLMs) demonstrate increasingly advanced capabilities, aligning their behaviors with human values and preferences becomes crucial for their wide adoption. While previous research focuses on general alignment to principles such as helpfulness, harmlessness, and honesty, the need to account for individual and diverse preferences has been largely overlooked, potentially undermining customized human experiences. To address this gap, we train LLMs that can ''interact to align'', essentially cultivating the meta-skill of LLMs to implicitly infer the unspoken personalized preferences of the current user through multi-turn conversations, and then dynamically align their following behaviors and responses to these inferred preferences. Our approach involves establishing a diverse pool of 3,310 distinct user personas by initially creating seed examples, which are then expanded through iterative self-generation and filtering. 
Guided by distinct user personas, we leverage multi-LLM collaboration to develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures. Finally, we apply supervised fine-tuning and reinforcement learning to enhance LLMs using this dataset. For evaluation, we establish the ALOE (ALign With CustOmized PrEferences) benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations. Experimental results demonstrate the effectiveness of our method in enabling dynamic, personalized alignment via interaction.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# 学習不能な3Dポイントクラウド:クラスワイズ変換さえあればよい Unlearnable 3D Point Clouds: Class-wise Transformation Is All You Need ( http://arxiv.org/abs/2410.03644v1 ) ライセンス: Link先を確認	Xianlong Wang, Minghui Li, Wei Liu, Hangtao Zhang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Hai Jin,	(参考訳) 従来の学習不能化戦略は、許可されていないユーザーが2D画像データでトレーニングすることを防ぐために提案されてきた。センシティブな情報を含む3Dポイントクラウドデータが増えるにつれ、この新しいタイプのデータの不正使用も深刻な懸念となっている。これに対処するために,2つのプロセスを含む、3Dポイントクラウドのための最初の統合的な学習不能化フレームワークを提案する。 (i) カテゴリ適応型の割り当て戦略によって確立されたクラスワイズ設定と、サンプルに割り当てられた複数の変換を含む、学習不能データ保護手法を提案する。 (ii) クラスワイズの逆行列変換を利用したデータ復元手法を提案し, 学習不能データに対して認可されたユーザのみが訓練できるようにする。この復元プロセスは、既存の学習不能化に関する文献の多くで見落とされてきた実用上の問題である。すなわち、認可されたユーザでさえ、3Dの学習不能データから知識を得るのに苦労する。理論的および実証的な結果(6つのデータセット、16のモデル、2つのタスクを含む)は、提案した学習不能化フレームワークの有効性を示す。私たちのコードは \url{https://github.com/CGCL-codes/UnlearnablePC} で利用可能です。 Traditional unlearnable strategies have been proposed to prevent unauthorized users from training on the 2D image data. With more 3D point cloud data containing sensitive information, unauthorized usage of this new type of data has also become a serious concern. To address this, we propose the first integral unlearnable framework for 3D point clouds including two processes: (i) we propose an unlearnable data protection scheme, involving a class-wise setting established by a category-adaptive allocation strategy and multi-transformations assigned to samples; (ii) we propose a data restoration scheme that utilizes class-wise inverse matrix transformation, thus enabling authorized-only training for unlearnable data. This restoration process is a practical issue overlooked in most existing unlearnable literature, i.e., even authorized users struggle to gain knowledge from 3D unlearnable data. Both theoretical and empirical results (including 6 datasets, 16 models, and 2 tasks) demonstrate the effectiveness of our proposed unlearnable framework. Our code is available at \url{https://github.com/CGCL-codes/UnlearnablePC}	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# GenSim2:マルチモーダル・推論LDMによるロボットデータ生成のスケーリング GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs ( http://arxiv.org/abs/2410.03645v1 ) ライセンス: Link先を確認	Pu Hua, Minghuan Liu, Annabella Macaluso, Yunfeng Lin, Weinan Zhang, Huazhe Xu, Lirui Wang,	(参考訳) 今やロボットシミュレーションは、多様なシミュレーションタスクやシーンを作るのに必要な人的努力のために、スケールアップが難しいままだ。シミュレーション訓練されたポリシーは、多くのsim-to-realメソッドが単一のタスクに焦点を当てているため、スケーラビリティの問題にも直面する。これらの課題に対処するため、この研究は、多モーダルおよび推論機能を備えたLLMのコーディングを活用するスケーラブルなフレームワークであるGenSim2を提案している。そこで本研究では,これらのタスクを大規模に表現するための実演データを自動的に生成するために,オブジェクトカテゴリ内で一般化する計画とRLソルバを提案する。パイプラインは200のオブジェクトで最大100の調音タスクのデータを生成し、必要な人的労力を減らすことができる。このようなデータを活用するために,プロプリセプティブ・ポイントクラウド・トランスフォーマー (PPT) と呼ばれる,マルチタスク言語による効果的なポリシーアーキテクチャを提案し,その実演から学習し,強力なsim-to-realゼロショット転送を示す。提案したパイプラインとポリシアーキテクチャを組み合わせることで,生成したデータをゼロショット転送や実世界の収集データとの協調訓練に使用できる,GenSim2の有望な利用方法を示す。 Robotic simulation today remains challenging to scale up due to the human efforts required to create diverse simulation tasks and scenes. Simulation-trained policies also face scalability issues as many sim-to-real methods focus on a single task. To address these challenges, this work proposes GenSim2, a scalable framework that leverages coding LLMs with multi-modal and reasoning capabilities for complex and realistic simulation task creation, including long-horizon tasks with articulated objects. To automatically generate demonstration data for these tasks at scale, we propose planning and RL solvers that generalize within object categories. The pipeline can generate data for up to 100 articulated tasks with 200 objects and reduce the required human efforts. To utilize such data, we propose an effective multi-task language-conditioned policy architecture, dubbed proprioceptive point-cloud transformer (PPT), that learns from the generated demonstrations and exhibits strong sim-to-real zero-shot transfer. 
Combining the proposed pipeline and the policy architecture, we show a promising usage of GenSim2 that the generated data can be used for zero-shot transfer or co-train with real-world collected data, which enhances the policy performance by 20% compared with training exclusively on limited real data.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# Minimax-Optimal Trust-Aware Multi-armed bandits Minimax-optimal trust-aware multi-armed bandits ( http://arxiv.org/abs/2410.03651v1 ) ライセンス: Link先を確認	Changxiao Cai, Jiacheng Zhang,	(参考訳) マルチアームバンディット(MAB)アルゴリズムは、人間が推奨ポリシーを完全に実装するという前提のもと、シーケンシャルな意思決定アプリケーションにおいて大きな成功を収めている。しかし、既存の手法はしばしば、学習アルゴリズムにおける人間の信頼の重要な要素を見落としている。信頼が欠如している場合、人間は推奨された方針から逸脱し、望ましくない学習パフォーマンスにつながる。このギャップに起因して、動的信頼モデルを標準のMABフレームワークに統合することにより、信頼を意識したMAB問題を研究する。具体的には、推奨・実際に実施された政策は、人間の信頼によって異なると仮定し、推奨された政策の質とともに進化する。我々は、信頼問題の存在下でのミニマックスの後悔を確立し、上位信頼境界(UCB)アルゴリズムのようなバニラMABアルゴリズムの準最適性を実証する。この制限を克服するために、我々は、ほぼ最適統計保証を確実に達成する、2段階の信頼認識手順を導入する。本研究は,信頼問題に対処する際のアルゴリズムの利点を説明するためのシミュレーション研究である。 Multi-armed bandit (MAB) algorithms have achieved significant success in sequential decision-making applications, under the premise that humans perfectly implement the recommended policy. However, existing methods often overlook the crucial factor of human trust in learning algorithms. When trust is lacking, humans may deviate from the recommended policy, leading to undesired learning performance. Motivated by this gap, we study the trust-aware MAB problem by integrating a dynamic trust model into the standard MAB framework. Specifically, it assumes that the recommended and actually implemented policy differs depending on human trust, which in turn evolves with the quality of the recommended policy. We establish the minimax regret in the presence of the trust issue and demonstrate the suboptimality of vanilla MAB algorithms such as the upper confidence bound (UCB) algorithm. To overcome this limitation, we introduce a novel two-stage trust-aware procedure that provably attains near-optimal statistical guarantees. A simulation study is conducted to illustrate the benefits of our proposed algorithm when dealing with the trust issue.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
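For reference, the vanilla UCB baseline that the entry above describes as suboptimal under trust issues can be sketched in a few lines of Python. This is the textbook UCB1 index rule, not the authors' two-stage trust-aware procedure. The function name `ucb1`, the `pull` callback, and the deterministic toy rewards are assumptions made for the illustration.

```python
import math
from typing import Callable, List


def ucb1(pull: Callable[[int], float], n_arms: int, horizon: int) -> List[int]:
    """Run UCB1 for `horizon` rounds and return how often each arm was played."""
    counts = [0] * n_arms    # number of pulls per arm
    means = [0.0] * n_arms   # running mean reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once before using the index
        else:
            # Empirical mean plus an exploration bonus that shrinks with more pulls.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


# Toy environment with deterministic rewards; arm 1 is the better arm.
counts = ucb1(lambda arm: [0.1, 0.9][arm], n_arms=2, horizon=200)
```

The sketch assumes that every recommended arm is actually pulled. The trust-aware setting in the abstract drops exactly that assumption: the implemented arm may differ from the recommendation depending on the human's current trust.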
# Dorami: RISC-V TEEのセキュリティ監視を分離する秘密 Dorami: Privilege Separating Security Monitor on RISC-V TEEs ( http://arxiv.org/abs/2410.03653v1 ) ライセンス: Link先を確認	Mark Kuhne, Stavros Volos, Shweta Shinde,	(参考訳) RISC-V上のTEE実装は、SM(Security Monitor)と呼ばれる信頼できるコンポーネントを導入することで、エンクレーブの抽象化を提供する。 SMは、物理メモリ保護を強制する特権有能なISA命令を使用することで、エンクレーブをOSから分離するなどの重要なタスクを実行する。しかし、SMは、サイズが大きいだけでなく、プラットフォーム固有のサードパーティのベンダーコードを含むサイドファームウェアとともに、プラットフォーム上の最高特権層(マシンモード)で実行される。本稿では,ファームウェアからSMを分離し,TEEの攻撃面を低減する特権分離手法であるDoramiを提案する。 Doramiは既存のISA機能を再使用してアイソレーションを強制し、大きなオーバーヘッドなしに目標を達成する。 TEE implementations on RISC-V offer an enclave abstraction by introducing a trusted component called the security monitor (SM). The SM performs critical tasks such as isolating enclaves from each other as well as from the OS by using privileged ISA instructions that enforce physical memory protection. However, the SM executes at the highest privilege layer on the platform (machine-mode) alongside firmware that is not only large in size but also includes third-party vendor code specific to the platform. In this paper, we present Dorami, a privilege separation approach that isolates the SM from the firmware, thus reducing the attack surface on TEEs. Dorami re-purposes existing ISA features to enforce its isolation and achieves its goals without large overheads.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# 混在地におけるヒューマノイド移動の学習 Learning Humanoid Locomotion over Challenging Terrain ( http://arxiv.org/abs/2410.03654v1 ) ライセンス: Link先を確認	Ilija Radosavovic, Sarthak Kamat, Trevor Darrell, Jitendra Malik,	(参考訳) 人間型ロボットは、原則として、足を使ってほぼどこにでも行ける。しかし、多様な地形を横断できるコントローラーを開発することは、依然として大きな課題である。古典的なコントローラーは広く一般化するのは難しいが、学習に基づく手法は主に穏やかな地形に焦点を当てている。そこで本研究では,自然と人為的な地形を横断する視覚的ヒューマノイド移動の学習的アプローチを提案する。本手法では, 主観的観察と行動の歴史に基づいて, 次の行動を予測するために, トランスフォーマーモデルを用いる。このモデルは、まず、シーケンスモデリングを伴う平地軌道のデータセット上で事前訓練され、その後、強化学習を用いて不均一な地形で微調整される。本研究では, 荒面, 変形面, 傾斜面など, 様々な地形にまたがる実際のヒューマノイドロボットを用いて, モデルを評価する。このモデルは、堅牢な性能、コンテキスト内適応、創発的な地形表現を示す。現実のケーススタディでは、私たちのヒューマノイドロボットは、バークレーのハイキングコースを4マイル以上越え、サンフランシスコで最も急な道のいくつかを登った。 Humanoid robots can, in principle, use their legs to go almost anywhere. Developing controllers capable of traversing diverse terrains, however, remains a considerable challenge. Classical controllers are hard to generalize broadly while the learning-based methods have primarily focused on gentle terrains. Here, we present a learning-based approach for blind humanoid locomotion capable of traversing challenging natural and man-made terrain. Our method uses a transformer model to predict the next action based on the history of proprioceptive observations and actions. The model is first pre-trained on a dataset of flat-ground trajectories with sequence modeling, and then fine-tuned on uneven terrain using reinforcement learning. We evaluate our model on a real humanoid robot across a variety of terrains, including rough, deformable, and sloped surfaces. The model demonstrates robust performance, in-context adaptation, and emergent terrain representations. In real-world case studies, our humanoid robot successfully traversed over 4 miles of hiking trails in Berkeley and climbed some of the steepest streets in San Francisco.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# 幾何学的表現条件は等変分子生成を改善する Geometric Representation Condition Improves Equivariant Molecule Generation ( http://arxiv.org/abs/2410.03655v1 ) ライセンス: Link先を確認	Zian Li, Cai Zhou, Xiyuan Wang, Xingang Peng, Muhan Zhang,	(参考訳) 分子生成モデルの最近の進歩は、特に薬物設計において、科学的発見を加速させる大きな可能性を示している。しかしながら、これらのモデルはしばしば、特に特定の分子特性を満たさなければならない条件付きシナリオにおいて、高品質な分子を生成する上で困難に直面する。本研究では,幾何表現条件を統合することで,分子生成モデルの性能を高めるための汎用フレームワークであるGeoRCGを紹介する。分子生成過程を2段階に分解し,まず情報的幾何学的表現を生成する。分子を直接生成するのに比べ、第1段階における比較的容易に生成できる表現は、よりゴール指向でより高速な方法で高品質な分子に到達するよう第2段階の世代を導く。ベースジェネレータとしてEDMを活用することで、広く使用されているQM9およびGEOM-DRUGデータセット上での無条件分子生成の大幅な品質改善を観察する。さらに, 従来の手法と同様に, 個々の特性値に対する条件付けよりも, 意味的にリッチな幾何学的表現に対する条件付けの方が優れていることを示す。さらに,このような表現指導により,1000ステップ以上の優れた生成品質を維持しつつ,拡散ステップの数を100に削減し,生成プロセスを大幅に高速化できることを示す。 Recent advancements in molecular generative models have demonstrated substantial potential in accelerating scientific discovery, particularly in drug design. However, these models often face challenges in generating high-quality molecules, especially in conditional scenarios where specific molecular properties must be satisfied. In this work, we introduce GeoRCG, a general framework to enhance the performance of molecular generative models by integrating geometric representation conditions. We decompose the molecule generation process into two stages: first, generating an informative geometric representation; second, generating a molecule conditioned on the representation. Compared to directly generating a molecule, the relatively easy-to-generate representation in the first-stage guides the second-stage generation to reach a high-quality molecule in a more goal-oriented and much faster way. Leveraging EDM as the base generator, we observe significant quality improvements in unconditional molecule generation on the widely-used QM9 and GEOM-DRUG datasets. 
More notably, in the challenging conditional molecular generation task, our framework achieves an average 31% performance improvement over state-of-the-art approaches, highlighting the superiority of conditioning on semantically rich geometric representations over conditioning on individual property values as in previous approaches. Furthermore, we show that, with such representation guidance, the number of diffusion steps can be reduced to as few as 100 while maintaining generation quality superior to that achieved with 1,000 steps, thereby significantly accelerating the generation process.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# RAFT:テキスト・ディテクターへの現実的な攻撃 RAFT: Realistic Attacks to Fool Text Detectors ( http://arxiv.org/abs/2410.03658v1 ) ライセンス: Link先を確認	James Wang, Ran Li, Junfeng Yang, Chengzhi Mao,	(参考訳) 大規模言語モデル(LLM)は、様々なタスクにまたがって顕著な流速を示した。しかし、偽情報の拡散など倫理的でない応用が懸念されている。最近の研究で多くのLSM検出法が提案されているが、その堅牢性と信頼性は未定である。本稿では,既存のLLM検出器に対する文法エラーのないブラックボックス攻撃であるRAFTを提案する。従来の言語モデルに対する攻撃とは対照的に,本手法では,本来のテキスト品質を維持しつつ,単語レベルでのLLM埋め込みの転送性を利用する。我々は、補助的な埋め込みを利用して、ターゲット検出器に対して摂動する候補単語を欲求的に選択する。実験により、我々の攻撃は、さまざまな領域にわたる研究において、すべての検出器を99%まで効果的に妥協し、ソースモデル間で転送可能であることが判明した。手動による人的評価研究は、我々の攻撃が人間の文章と現実的で区別できないことを示している。また、RAFTによって生成された例は、逆向きに頑健な検出器の訓練に利用できることを示す。我々の研究は、現在のLLM検出器は対角的に堅牢ではなく、より弾力性のある検出機構の緊急性の必要性を浮き彫りにしている。 Large language models (LLMs) have exhibited remarkable fluency across various tasks. However, their unethical applications, such as disseminating disinformation, have become a growing concern. Although recent works have proposed a number of LLM detection methods, their robustness and reliability remain unclear. In this paper, we present RAFT: a grammar error-free black-box attack against existing LLM detectors. In contrast to previous attacks for language models, our method exploits the transferability of LLM embeddings at the word-level while preserving the original text quality. We leverage an auxiliary embedding to greedily select candidate words to perturb against the target detector. Experiments reveal that our attack effectively compromises all detectors in the study across various domains by up to 99%, and are transferable across source models. Manual human evaluation studies show our attacks are realistic and indistinguishable from original human-written text. We also show that examples generated by RAFT can be used to train adversarially robust detectors. Our work shows that current LLM detectors are not adversarially robust, underscoring the urgent need for more resilient detection mechanisms.	翻訳日:2024-11-02 20:58:02 公開日:2024-10-04
# ディープラーニングと機械学習 - ビッグデータ分析とデザインパターンによる管理の改善 Deep Learning and Machine Learning: Advancing Big Data Analytics and Management with Design Patterns ( http://arxiv.org/abs/2410.03795v1 ) ライセンス: Link先を確認	Keyu Chen, Ziqian Bi, Tianyang Wang, Yizhu Wen, Pohsun Feng, Qian Niu, Junyu Liu, Benji Peng, Sen Zhang, Ming Li, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Caitlyn Heqi Yin, Ming Liu,	(参考訳) この本“Design Patterns in Machine Learning and Deep Learning: Advancing Big Data Analytics Management”では、大規模な機械学習やディープラーニングアプリケーションに適した、基本的なデザインパターンに関する包括的な研究が紹介されている。この本は、ビッグデータ分析システムの開発、保守、スケーラビリティを最適化するために、古典的なソフトウェアエンジニアリングパターン、創造的、構造的、行動的、並行パターンの応用について説明している。実践的な例と詳細なPython実装を通じて、従来のオブジェクト指向設計パターンと、現代のデータ分析環境のユニークな要求とのギャップを埋める。シングルトン、ファクトリ、オブザーバ、ストラテジーといった主要なデザインパターンは、モデル管理、デプロイメント戦略、チームコラボレーションへの影響を分析し、効率的で再利用可能な、柔軟なシステムのエンジニアリングに関する貴重な洞察を提供する。このボリュームは、開発者、研究者、エンジニアにとって、マシンラーニングとソフトウェア設計の両方における技術的専門知識を強化するために不可欠なリソースである。 This book, Design Patterns in Machine Learning and Deep Learning: Advancing Big Data Analytics Management, presents a comprehensive study of essential design patterns tailored for large-scale machine learning and deep learning applications. The book explores the application of classical software engineering patterns, Creational, Structural, Behavioral, and Concurrency Patterns, to optimize the development, maintenance, and scalability of big data analytics systems. Through practical examples and detailed Python implementations, it bridges the gap between traditional object-oriented design patterns and the unique demands of modern data analytics environments. Key design patterns such as Singleton, Factory, Observer, and Strategy are analyzed for their impact on model management, deployment strategies, and team collaboration, providing invaluable insights into the engineering of efficient, reusable, and flexible systems. 
This volume is an essential resource for developers, researchers, and engineers aiming to enhance their technical expertise in both machine learning and software design.	翻訳日:2024-11-02 16:20:48 公開日:2024-10-04
# 信頼された多視点学習のための動的エビデンスデカップリング Dynamic Evidence Decoupling for Trusted Multi-view Learning ( http://arxiv.org/abs/2410.03796v1 ) ライセンス: Link先を確認	Ying Liu, Lihong Liu, Cai Xu, Xiangyu Song, Ziyu Guan, Wei Zhao,	(参考訳) マルチビュー学習手法は、意思決定の不確実性を無視しながら、意思決定の正確性を改善することに集中し、安全クリティカルなアプリケーションに対する適合性を制限している。これを軽減するために,各インスタンスのクラス分布を学習することにより,分類確率と不確実性を推定する信頼度の高い多視点学習手法を提案する。しかし、これらの手法は、実世界のマルチビューデータにおける意味的曖昧さ現象を無視して、各視点のデータが全てのカテゴリを効果的に区別できると仮定する。以上の結果から,この現象は既存の手法による視点特異的な証拠の学習を著しく抑制することが明らかとなった。本稿では,この問題を解決するために,一貫性と補完性を考慮した信頼多視点学習(CCML)手法を提案する。我々はまず,信念の質量ベクトルと不確実性推定からなる明らかな深層ニューラルネットワークを用いて,見解を構築する。次に、一貫性と相補的な証拠を動的に分離する。一貫性のある証拠は、すべてのビューで共有された部分から導き出され、補完的な証拠は、すべてのビューで異なる部分の平均化によって得られる。一貫性のある証拠から構築された意見が、根本真実のカテゴリーと厳密に一致していることを保証する。補完的な証拠から構築された意見については、証拠の潜在的な曖昧さを許容する。 CCMLと最先端のベースラインを、1つの合成データセットと6つの実世界のデータセットで比較する。その結果, 動的エビデンスデカップリング戦略の有効性を検証し, CCMLが精度と信頼性の基準線を著しく上回ることを示した。コードはhttps://github.com/Lihong-Liu/CCMLで公開されている。 Multi-view learning methods often focus on improving decision accuracy, while neglecting the decision uncertainty, limiting their suitability for safety-critical applications. To mitigate this, researchers propose trusted multi-view learning methods that estimate classification probabilities and uncertainty by learning the class distributions for each instance. However, these methods assume that the data from each view can effectively differentiate all categories, ignoring the semantic vagueness phenomenon in real-world multi-view data. Our findings demonstrate that this phenomenon significantly suppresses the learning of view-specific evidence in existing methods. We propose a Consistent and Complementary-aware trusted Multi-view Learning (CCML) method to solve this problem. We first construct view opinions using evidential deep neural networks, which consist of belief mass vectors and uncertainty estimates. Next, we dynamically decouple the consistent and complementary evidence. 
The consistent evidence is derived from the shared portions across all views, while the complementary evidence is obtained by averaging the differing portions across all views. We ensure that the opinion constructed from the consistent evidence strictly aligns with the ground-truth category. For the opinion constructed from the complementary evidence, we allow for potential vagueness in the evidence. We compare CCML with state-of-the-art baselines on one synthetic and six real-world datasets. The results validate the effectiveness of the dynamic evidence decoupling strategy and show that CCML significantly outperforms baselines on accuracy and reliability. The code is released at https://github.com/Lihong-Liu/CCML.	翻訳日:2024-11-02 16:20:48 公開日:2024-10-04
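The abstract above describes opinions made of belief masses plus an uncertainty estimate, and a split of per-view evidence into consistent and complementary parts. The sketch below uses the standard subjective-logic mapping from evidence to an opinion (Dirichlet parameters `e_k + 1`). The `decouple` function is only one plausible reading of "shared portions" (element-wise minimum) and "averaging the differing portions"; it is an assumption for illustration and may differ from the released CCML code.

```python
def opinion(evidence):
    """Map non-negative class evidence to (belief masses, uncertainty).

    Standard subjective-logic form: alpha_k = e_k + 1, S = sum(alpha),
    b_k = e_k / S, u = K / S, so that sum(b) + u == 1.
    """
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # Dirichlet strength S
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty


def decouple(view_evidence):
    """Hypothetical decoupling (assumption, not the paper's exact rule).

    consistent:    evidence every view agrees on (element-wise minimum)
    complementary: mean over views of what remains after removing it
    """
    num_views = len(view_evidence)
    num_classes = len(view_evidence[0])
    consistent = [
        min(view[k] for view in view_evidence) for k in range(num_classes)
    ]
    complementary = [
        sum(view[k] - consistent[k] for view in view_evidence) / num_views
        for k in range(num_classes)
    ]
    return consistent, complementary
```

For two views with evidence `[8, 1, 1]` and `[6, 3, 1]`, this reading gives consistent evidence `[6, 1, 1]` and complementary evidence `[1.0, 1.0, 0.0]`. The complementary part spreads mass over two classes, which is the kind of vagueness the abstract says is tolerated.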
# 大規模言語モデルを用いた医用転写のベストプラクティスの探索 Searching for Best Practices in Medical Transcription with Large Language Model ( http://arxiv.org/abs/2410.03797v1 ) ライセンス: Link先を確認	Jiafeng Li, Yanda Mu,	(参考訳) 医学的モノローグの転写、特に専門用語の密度が高く、異なるアクセントで提供されるものは、既存の自動化システムにとって重要な課題である。本稿では,大言語モデル(LLM)を応用して,医師のモノローグの音声記録から高精度な医用文字を生成する手法を提案する。提案手法は,単語誤り率(WER)を低くし,重要な医療用語の正確な認識を確保するために,高度な言語モデリング技術を統合する。医療記録の包括的データセットの厳密なテストを通じて,本手法は,全体の転写精度と重要な医療用語の忠実度の両方において,大幅な改善を示す。以上の結果から,本システムは医療機関が高い精度を維持しつつ,転写ニーズの合理化を図るための信頼性の高いツールとして,臨床ドキュメント作成に大いに役立つ可能性が示唆された。 The transcription of medical monologues, especially those containing a high density of specialized terminology and delivered with a distinct accent, presents a significant challenge for existing automated systems. This paper introduces a novel approach leveraging a Large Language Model (LLM) to generate highly accurate medical transcripts from audio recordings of doctors' monologues, specifically focusing on Indian accents. Our methodology integrates advanced language modeling techniques to lower the Word Error Rate (WER) and ensure the precise recognition of critical medical terms. Through rigorous testing on a comprehensive dataset of medical recordings, our approach demonstrates substantial improvements in both overall transcription accuracy and the fidelity of key medical terminologies. These results suggest that our proposed system could significantly aid in clinical documentation processes, offering a reliable tool for healthcare providers to streamline their transcription needs while maintaining high standards of accuracy.	翻訳日:2024-11-02 16:20:48 公開日:2024-10-04
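The abstract above reports its results in terms of Word Error Rate (WER). For readers unfamiliar with the metric, here is a minimal sketch of the usual definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. Whitespace tokenisation and the function name are assumptions; the paper's own scoring pipeline is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

A single misrecognised term in a four-word reference, for example `word_error_rate("patient denies chest pain", "patient denies chess pain")`, yields 0.25. Note that plain WER weighs every word equally, so it does not by itself capture the fidelity of key medical terms.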
# M2AR:拡張現実ワークフローモデリング言語のためのWebベースのモデリング環境 M2AR: A Web-based Modeling Environment for the Augmented Reality Workflow Modeling Language ( http://arxiv.org/abs/2410.03800v1 ) ライセンス: Link先を確認	Fabian Muff, Hans-Georg Fill,	(参考訳) 本稿では,プログラミング知識を必要とせずに拡張現実アプリケーションのモデリングと実行を可能にする,Webベースの2次元および3次元モデリング環境であるM2ARを紹介する。このプラットフォームは、3D JavaScriptライブラリと混合現実没入型Web標準WebXRに基づいている。実現可能性の最初のデモとして、以前導入されたARWFML(Augmented Reality Workflow Modeling Language)がこの環境を使ってうまく実装されている。新しいモデリング環境の有用性は、M2AR上のARWFMLの使用例を示すことによって示される。 This paper introduces M2AR, a new web-based, two- and three-dimensional modeling environment that enables the modeling and execution of augmented reality applications without requiring programming knowledge. The platform is based on a 3D JavaScript library and the mixed reality immersive web standard WebXR. For a first demonstration of its feasibility, the previously introduced Augmented Reality Workflow Modeling Language (ARWFML) has been successfully implemented using this environment. The usefulness of the new modeling environment is demonstrated by showing use cases of the ARWFML on M2AR.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-04
# 大動脈瘤破裂リスク予測のためのメッシュインフォームドリダクションモデル Mesh-Informed Reduced Order Models for Aneurysm Rupture Risk Prediction ( http://arxiv.org/abs/2410.03802v1 ) ライセンス: Link先を確認	Giuseppe Alessio D'Inverno, Saeid Moradizadeh, Sajad Salavatidezfouli, Pasquale Claudio Africa, Gianluigi Rozza,	(参考訳) 心臓血管系の複雑さは、健康状態を迅速に認識するために正確に再現する必要がある。一方、フルオーダーモデル(FOM)は正確な血行動態の評価を行うが、その高い計算要求はリアルタイム臨床応用を妨げる。対照的に、ROMはより効率的で正確なソリューションを提供しており、パーソナライズされたヘルスケアとタイムリーな臨床的意思決定に不可欠である。本研究では,大動脈瘤の進展と破裂のリスクを予測するために,FOMとROMを統合することにより,心血管医学における計算流体力学(CFD)の適用について検討する。腹部大動脈瘤の異なる成長段階で採取された壁せん断応力 (WSS) とOscillatory Shear Index (OSI) をグラフニューラルネットワーク (GNN) を用いて予測する。 GNNは、入力グラフの寸法に関係なく、空間的局所情報を考慮した有限体積(FV)離散化によって得られるメッシュの自然なグラフ構造を利用する。実験的な検証フレームワークは有望な結果をもたらし,その方法が次元の呪いを克服する有効な代替手段であることを確認した。 The complexity of the cardiovascular system needs to be accurately reproduced in order to promptly acknowledge health conditions; to this aim, advanced multifidelity and multiphysics numerical models are crucial. On one side, Full Order Models (FOMs) deliver accurate hemodynamic assessments, but their high computational demands hinder their real-time clinical application. In contrast, ROMs provide more efficient yet accurate solutions, essential for personalized healthcare and timely clinical decision-making. In this work, we explore the application of computational fluid dynamics (CFD) in cardiovascular medicine by integrating FOMs with ROMs for predicting the risk of aortic aneurysm growth and rupture. Wall Shear Stress (WSS) and the Oscillatory Shear Index (OSI), sampled at different growth stages of the abdominal aortic aneurysm, are predicted by means of Graph Neural Networks (GNNs). GNNs exploit the natural graph structure of the mesh obtained by the Finite Volume (FV) discretization, taking into account the spatial local information, regardless of the dimension of the input graph. 
Our experimental validation framework yields promising results, confirming our method as a valid alternative that overcomes the curse of dimensionality.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-04
# 3次元分子生成のためのテキスト誘導拡散モデル Text-guided Diffusion Model for 3D Molecule Generation ( http://arxiv.org/abs/2410.03803v1 ) ライセンス: Link先を確認	Yanchen Luo, Junfeng Fang, Sihang Li, Zhiyuan Liu, Jiancan Wu, An Zhang, Wenjie Du, Xiang Wang,	(参考訳) 標的となる性質を持つ分子のデノボ生成は、生物学、化学、薬物発見において重要である。現在の生成モデルは、特定のプロパティ値を条件として使用することに限定されており、詳細な人間の言語で記述された複雑なカスタマイズに苦慮している。そこで本論文では,テキストガイドと3次元拡散モデルを用いたテキスト誘導小分子生成手法であるTextSMOGを提案する。この方法は、テキスト条件を用いて分子生成を誘導し、安定性と多様性を両立させる。実験の結果,TextSMOGはテキスト記述から情報を取得し,活用する能力を示し,複雑なテキストのカスタマイズに対応する3次元分子構造を生成する強力なツールとなった。 The de novo generation of molecules with targeted properties is crucial in biology, chemistry, and drug discovery. Current generative models are limited to using single property values as conditions, struggling with complex customizations described in detailed human language. To address this, we propose the text guidance instead, and introduce TextSMOG, a new Text-guided Small Molecule Generation Approach via 3D Diffusion Model which integrates language and diffusion models for text-guided small molecule generation. This method uses textual conditions to guide molecule generation, enhancing both stability and diversity. Experimental results show TextSMOG's proficiency in capturing and utilizing information from textual descriptions, making it a powerful tool for generating 3D molecular structures in response to complex textual customizations.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-04
# 投機的復号化のための注意の混合 Mixture of Attentions For Speculative Decoding ( http://arxiv.org/abs/2410.03804v1 ) ライセンス: Link先を確認	Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang,	(参考訳) LLM(Large Language Models)のパラメータ数の増加により、計算要求が大幅に急増し、デプロイが困難でコストがかかるようになった。投機的復号法(SD)はより小さなモデルを利用して将来のトークンを効率的に提案し、LLMによって並列に検証される。 LLMからのアクティベーションを利用する小型モデルは、現在最も高速な復号速度を実現している。しかし,SDモデルには,トレーニング中の政治力の欠如や部分観測可能性の欠如など,いくつかの制限がある。これらの欠点に対処するために,SD用ミキサー・オブ・アテンションを導入することで,小型モデルのより基礎的なアーキテクチャを提案する。我々の新しいアーキテクチャは、従来の単一デバイスデプロイメントと、小型モデルをコンシューマデバイスにホストする新しいクライアントサーバデプロイメントと、サーバ上のLLMという2つのシナリオに適用できる。単一デバイスシナリオでは、EAGLE-2を9.5%改善し、受け入れ期間を25%改善する最先端のスピードアップを実証する。クライアントサーバの設定で、我々の実験は以下のとおりである。 1) ネットワーク条件の異なるサーバへの最小限の呼び出しによる最先端のレイテンシ 2) 完全切断の場合,本手法は他のSD手法と比較して精度が向上し, 生成プロセスの継続が不可能なLCMに対するAPI呼び出しよりも有利であることを示す。 The growth in the number of parameters of Large Language Models (LLMs) has led to a significant surge in computational requirements, making them challenging and costly to deploy. Speculative decoding (SD) leverages smaller models to efficiently propose future tokens, which are then verified by the LLM in parallel. Small models that utilise activations from the LLM currently achieve the fastest decoding speeds. However, we identify several limitations of SD models including the lack of on-policyness during training and partial observability. To address these shortcomings, we propose a more grounded architecture for small models by introducing a Mixture of Attentions for SD. Our novel architecture can be applied in two scenarios: a conventional single device deployment and a novel client-server deployment where the small model is hosted on a consumer device and the LLM on a server. In a single-device scenario, we demonstrate state-of-the-art speedups improving EAGLE-2 by 9.5% and its acceptance length by 25%. 
In a client-server setting, our experiments demonstrate: 1) state-of-the-art latencies with minimal calls to the server for different network conditions, and 2) in the event of a complete disconnection, our approach can maintain higher accuracy compared to other SD methods and demonstrates advantages over API calls to LLMs, which would otherwise be unable to continue the generation process.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-04
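The abstract above relies on the basic speculative decoding loop: a small model proposes several tokens and the LLM verifies them. The following is a minimal sketch of one greedy draft-and-verify step, with both models replaced by toy deterministic functions. All names are hypothetical, and the sketch does not implement the paper's Mixture of Attentions architecture or sampling-based acceptance.

```python
def speculative_step(target_next, draft_next, prefix, k):
    """One greedy speculative decoding step (illustrative).

    Returns (new_tokens, num_draft_tokens_accepted). The output always
    matches what greedy decoding with the target model alone would give.
    """
    # 1) The draft model proposes k tokens autoregressively.
    proposal = []
    context = list(prefix)
    for _ in range(k):
        token = draft_next(context)
        proposal.append(token)
        context.append(token)

    # 2) The target model checks each position. A real system scores all
    #    k positions in one parallel forward pass; here it is a loop.
    new_tokens = []
    context = list(prefix)
    for token in proposal:
        expected = target_next(context)
        if expected != token:
            new_tokens.append(expected)  # replace first mismatch, then stop
            return new_tokens, len(new_tokens) - 1
        new_tokens.append(token)
        context.append(token)

    # 3) Every draft token was accepted: the target adds one bonus token.
    new_tokens.append(target_next(context))
    return new_tokens, k


def toy_target(context):
    """Toy 'LLM': counts upward modulo 10."""
    return (context[-1] + 1) % 10


def toy_draft(context):
    """Toy draft model: agrees with the target except right after a 3."""
    return 0 if context[-1] == 3 else (context[-1] + 1) % 10
```

With `speculative_step(toy_target, toy_draft, [0], 4)` the draft proposes `[1, 2, 3, 0]`, three tokens are accepted, and the target corrects the fourth, giving `[1, 2, 3, 4]`. The number of accepted draft tokens per step is what the abstract calls the acceptance length.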
# 時系列のメタデータ - トランスフォーマーによるインフォーマティブな予測 Metadata Matters for Time Series: Informative Forecasting with Transformers ( http://arxiv.org/abs/2410.03806v1 ) ライセンス: Link先を確認	Jiaxiang Dong, Haixu Wu, Yuxuan Wang, Li Zhang, Jianmin Wang, Mingsheng Long,	(参考訳) 時系列予測は、財務分析やエネルギー計画など、広範囲の現実世界の応用で一般的である。これまでの研究では、時系列に固有の複雑なバリエーションと依存関係を捉えようと努力した、時系列のモダリティに主に焦点が当てられていた。数値時系列データ以外にも、メタデータ(例えばデータセットや変数記述)にも予測に不可欠な貴重な情報があり、これはアプリケーションシナリオを識別し、桁数列よりも解釈可能な知識を提供するのに利用できる。この観測にインスパイアされたMetaTST(Metadata-informed Time Series Transformer)を提案する。メタデータの非構造化の性質に取り組むために、MetaTSTは、事前に設計されたテンプレートによってそれらを自然言語に形式化し、これらのテキストをメタデータトークンにエンコードするために大きな言語モデル(LLM)を活用する。さらに、Transformerエンコーダを使用して、時系列およびメタデータトークンを通信し、メタデータ情報による系列表現を拡張して、より正確な予測を行う。この設計により、モデルは様々なシナリオにまたがるコンテキスト固有のパターンを適応的に学習することができ、特に大規模で多様なシナリオ予測タスクの処理に有効である。実験により,MetaTSTは,単一データセットと複数データセットのジョイントトレーニング設定を対象とし,高度時系列モデルとLLMに基づく短時間・長期予測ベンチマークの手法と比較して,最先端技術を実現している。 Time series forecasting is prevalent in extensive real-world applications, such as financial analysis and energy planning. Previous studies primarily focus on time series modality, endeavoring to capture the intricate variations and dependencies inherent in time series. Beyond numerical time series data, we notice that metadata (e.g.~dataset and variate descriptions) also carries valuable information essential for forecasting, which can be used to identify the application scenario and provide more interpretable knowledge than digit sequences. Inspired by this observation, we propose a Metadata-informed Time Series Transformer (MetaTST), which incorporates multiple levels of context-specific metadata into Transformer forecasting models to enable informative time series forecasting. To tackle the unstructured nature of metadata, MetaTST formalizes them into natural languages by pre-designed templates and leverages large language models (LLMs) to encode these texts into metadata tokens as a supplement to classic series tokens, resulting in an informative embedding. 
Further, a Transformer encoder is employed to let series and metadata tokens interact, which enriches series representations with metadata information for more accurate forecasting. This design also allows the model to adaptively learn context-specific patterns across various scenarios, which is particularly effective in handling large-scale, diverse-scenario forecasting tasks. Experimentally, MetaTST achieves state-of-the-art performance compared to advanced time series models and LLM-based methods on widely acknowledged short- and long-term forecasting benchmarks, covering both single-dataset individual and multi-dataset joint training settings.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-04
# デジタルマンモグラフィーにおけるラジオオパークアーティファクト:下流効果の自動検出と解析 Radio-opaque artefacts in digital mammography: automatic detection and analysis of downstream effects ( http://arxiv.org/abs/2410.03809v1 ) ライセンス: Link先を確認	Amelia Schueppert, Ben Glocker, Mélanie Roschewitz,	(参考訳) 本研究では, 皮膚マーカー, 乳房インプラント, ペースメーカーなどの放射性不透明物がマンモグラフィー分類モデルに及ぼす影響について検討した。 EMBEDデータセットから22,012個のマンモグラムを手動で注釈付けした後、頑健なマルチラベルアーティファクト検出器が開発され、5つの異なるアーティファクトタイプ(円形および三角形の皮膚マーカー、乳房インプラント、支持装置、スポット圧縮構造)を特定した。乳房密度評価とがん検診の2つの臨床的研究の結果、これらの人工物がモデルの性能、分類基準の変更、出力分布の歪曲などに大きな影響を及ぼすことが明らかとなった。これらの結果は,デジタルマンモグラフィーにおける信頼性とロバストな分類モデルの開発において,正確な自動アーティファクト検出の重要性を浮き彫りにした。将来の研究を促進するために、アノテーション、コード、モデル予測が公開されています。 This study investigates the effects of radio-opaque artefacts, such as skin markers, breast implants, and pacemakers, on mammography classification models. After manually annotating 22,012 mammograms from the publicly available EMBED dataset, a robust multi-label artefact detector was developed to identify five distinct artefact types (circular and triangular skin markers, breast implants, support devices and spot compression structures). Subsequent experiments on two clinically relevant tasks (breast density assessment and cancer screening) revealed that these artefacts can significantly affect model performance, alter classification thresholds, and distort output distributions. These findings underscore the importance of accurate automatic artefact detection for developing reliable and robust classification models in digital mammography. To facilitate future research, our annotations, code, and model predictions are made publicly available.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-04
# マンバはいつも「フリーランチ」を楽しめますか? Can Mamba Always Enjoy the "Free Lunch"? ( http://arxiv.org/abs/2410.03810v1 ) ライセンス: Link先を確認	Ruifeng Ren, Zhicong Li, Yong Liu,	(参考訳) トランスフォーマーは、現在のLarge Language Models (LLMs) の基盤となっているが、シーケンス長に関する推論におけるオーバーヘッドの線形増加は、長いシーケンスをモデル化する上での課題となっている。この文脈において、マンバは推論中の一定レベルのサイズから徐々に注目され、既存の経験的結果から、大きな貯蓄を提供しながらシーケンスモデリングにおいてトランスフォーマーと相容れない性能を発揮できることが示されている。しかし、マンバはいつも『フリーランチ』を楽しめますか? 本稿では,マンバの表現能力を理論的観点から分析することに焦点を当てる。まず,マンバと線形注意の関連性に着想を得て,COPY操作時のマンバの潜在的な欠点について検討した。以上の結果から,一定サイズのマンバはCOPY処理時にボトルネックに遭遇する可能性があるが,配列長と線形にスケールする場合には完璧に性能が向上する可能性が示唆された。本研究は,CoT (Chain of Thought) を装着した場合,マンバがDP問題に対処する能力について分析する。この結果から,任意のDP問題を解くために,Mambaの総コストは標準および効率的なトランスフォーマーに匹敵することがわかった。しかし、効率的なトランスフォーマーと同様に、ローカリティなどの良好な特性を持つDP問題に直面した場合、Mambaはオーバーヘッドを節約できる。私たちの結果はマンバの深い理解に寄与します。 Transformers have been the cornerstone of current Large Language Models (LLMs); however, their linear growth in overhead during inference with respect to sequence length poses challenges for modeling long sequences. In this context, Mamba has gradually attracted attention due to its constant-level size during inference, and existing empirical results have shown that it can perform comparably to Transformers in sequence modeling while offering significant savings. However, one may ask: can Mamba always enjoy the "free lunch"? In this paper, we focus on analyzing the expressive ability of Mamba from a theoretical standpoint. First, inspired by the connection between Mamba and linear attention, we investigate potential shortcomings of Mamba when performing the COPY operation. Our results indicate that Mamba with constant size may encounter bottlenecks when handling COPY, while it can achieve perfect performance when the size scales linearly with sequence length. Based on this observation, we analyze Mamba's ability to tackle DP problems when equipped with Chain of Thought (CoT). 
Our findings suggest that to solve arbitrary DP problems, the total cost of Mamba is comparable to standard and efficient Transformers. However, similar to efficient Transformers, when facing DP problems with favorable properties such as locality, Mamba can provide savings in overhead. Our results contribute to a deeper understanding of Mamba.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-04
# EvenNICER-SLAM:イベントベースのニューラルインプリシトエンコーディングSLAM EvenNICER-SLAM: Event-based Neural Implicit Encoding SLAM ( http://arxiv.org/abs/2410.03812v1 ) ライセンス: Link先を確認	Shi Chen, Danda Pani Paudel, Luc Van Gool,	(参考訳) 密集した視覚的同時局在とマッピング(SLAM)の進歩は、神経暗黙表現の出現によって大いに促進された。 NICE-SLAMの典型的な例であるニューラル暗黙符号化SLAMは,近年,大規模屋内シーンにおいて有望な結果を示した。しかし、これらの手法は通常、適切に機能するために、入力として時間的に密度の高いRGB-D画像ストリームに依存している。入力源が高いフレームレートをサポートしていない場合やカメラの動きが速すぎる場合、これらの手法はしばしばクラッシュや追跡とマッピングの精度の大幅な低下を経験する。本稿では,イベントカメラの導入によりこの問題に対処する新しいアプローチである EvenNICER-SLAM を提案する。イベントカメラは、絶対的な明るさではなく、強度の変化に反応するバイオインスパイアされたカメラである。具体的には、イベントロスバックプロパゲーションストリームをNICE-SLAMパイプラインに統合し、RGB-D入力が不十分なカメラトラッキングを強化した。 EvenNICER-SLAM は高周波数のイベント画像入力を含め,RGB-D 入力周波数の低減で NICE-SLAM を著しく上回っていることがわかった。以上の結果から,イベントカメラによる高密度SLAMシステムの高速カメラ動作に対する堅牢性向上の可能性が示唆された。 The advancement of dense visual simultaneous localization and mapping (SLAM) has been greatly facilitated by the emergence of neural implicit representations. Neural implicit encoding SLAM, a typical example of which is NICE-SLAM, has recently demonstrated promising results in large-scale indoor scenes. However, these methods typically rely on temporally dense RGB-D image streams as input in order to function properly. When the input source does not support high frame rates or the camera movement is too fast, these methods often experience crashes or significant degradation in tracking and mapping accuracy. In this paper, we propose EvenNICER-SLAM, a novel approach that addresses this issue through the incorporation of event cameras. Event cameras are bio-inspired cameras that respond to intensity changes instead of absolute brightness. Specifically, we integrated an event loss backpropagation stream into the NICE-SLAM pipeline to enhance camera tracking with insufficient RGB-D input. We found through quantitative evaluation that EvenNICER-SLAM, with an inclusion of higher-frequency event image input, significantly outperforms NICE-SLAM with reduced RGB-D input frequency. 
Our results suggest the potential for event cameras to improve the robustness of dense SLAM systems against fast camera motion in real-world scenarios.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-04
# SOI:モデルの部分状態の推定による計算複雑性のスケールダウン SOI: Scaling Down Computational Complexity by Estimating Partial States of the Model ( http://arxiv.org/abs/2410.03813v1 ) ライセンス: Link先を確認	Grzegorz Stefański, Paweł Daniluk, Artur Szumaczuk, Jakub Tkaczuk,	(参考訳) 消費者電子はムーアの法則によって記述された小型化の傾向に従っていた。マイクロコントローラユニット(MCU)の処理能力は向上しているが、最小限のアプライアンスで使用されるMCUは、特に時間に敏感なシナリオにおいて、さらに大きく、最先端の人工知能ニューラルネットワーク(ANN)を実行することができない。本研究では,ANNの計算複雑性を低減することを目的とした,Scattered Online Inference (SOI) と呼ばれる新しい手法を提案する。 SOIは時系列データとモデル予測の連続性と季節性を活用し、特に深い層において処理速度の改善のための外挿を可能にする。圧縮を適用することで、SOIはANNのより一般的な内部部分状態を生成し、各推論で完全なモデル再計算をスキップすることができる。 Consumer electronics used to follow the miniaturization trend described by Moore's Law. Despite increased processing power in Microcontroller Units (MCUs), MCUs used in the smallest appliances are still not capable of running even moderately big, state-of-the-art artificial neural networks (ANNs) especially in time-sensitive scenarios. In this work, we present a novel method called Scattered Online Inference (SOI) that aims to reduce the computational complexity of ANNs. SOI leverages the continuity and seasonality of time-series data and model predictions, enabling extrapolation for processing speed improvements, particularly in deeper layers. By applying compression, SOI generates more general inner partial states of ANN, allowing skipping full model recalculation at each inference.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-04
# MSTARデータに基づくSAR画像における空間的・時間的土地クラッタ統計のモデル化と解析 Modeling and Analysis of Spatial and Temporal Land Clutter Statistics in SAR Imaging Based on MSTAR Data ( http://arxiv.org/abs/2410.03816v1 ) ライセンス: Link先を確認	Shahrokh Hamidi,	(参考訳) SAR(Synthetic Aperture Radar)イメージングのためのランド・クラッタの統計的解析は、研究や調査においてますます重要になっている。また,ロバストなアルゴリズムを設計し,背景クラッタにおける目標検出のタスクを実行するためには,絶対的に必要である。ランド・クラッタから所望の目標のエネルギーを抽出しようとする試みは、バックグラウンド・クラッタの統計的性質の完全な知識を必要とする。本稿では,ランド・クラッタの空間的および時間的特性について検討する。各画像のデータは異なるアスペクト角に基づいて収集されているため、時間解析はアスペクト角の変化を含む。これにより、時間解析は、データを収集したアスペクト角に関するレーダ断面の特性を含む。統計解析を行うために、Weibull、Log-normal、Gamma、Rayleighといった、よく知られた、関連するいくつかの分布が、土地の乱れをモデル化する主要な候補と見なされている。適合性テストは、KL(Kullback-Leibler)のディバージェンス(Diversergence)測定値に基づいている。本稿では,ワイブル分布が時間-アスペクト角統計解析に適合することを示すとともに,レイリー分布は背景クラッタの空間特性を高精度にモデル化する。最後に、上記の統計分析と、定数False Alarm Rate (CFAR) アルゴリズムを用いて、ランド・クラッタにおける目標検出を行う。解析の総合的な検証は、Xバンドのスポットライトモードで収集された移動目標獲得・認識(MSTAR)データセットを利用して行い、その結果を示す。 The statistical analysis of land clutter for Synthetic Aperture Radar (SAR) imaging has become an increasingly important subject for research and investigation. It is also absolutely necessary for designing robust algorithms capable of performing the task of target detection in the background clutter. Any attempt to extract the energy of the desired targets from the land clutter requires complete knowledge of the statistical properties of the background clutter. In this paper, the spatial as well as the temporal characteristics of the land clutter are studied. Since the data for each image has been collected based on a different aspect angle; therefore, the temporal analysis contains variation in the aspect angle. Consequently, the temporal analysis includes the characteristics of the radar cross section with respect to the aspect angle based on which the data has been collected. 
To perform the statistical analysis, several well-known and relevant distributions, namely Weibull, Log-normal, Gamma, and Rayleigh, are considered as prime candidates to model the land clutter. The goodness-of-fit test is based on the Kullback-Leibler (KL) divergence metric. The detailed analysis presented in this paper demonstrates that the Weibull distribution is a more accurate fit for the temporal-aspect-angle statistical analysis, while the Rayleigh distribution models the spatial characteristics of the background clutter with higher accuracy. Finally, based on the aforementioned statistical analyses and by utilizing the Constant False Alarm Rate (CFAR) algorithm, we perform target detection in land clutter. The overall verification of the analysis is performed using the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset, which was collected in spotlight mode at X-band, and the results are presented.	翻訳日:2024-11-02 16:10:45 公開日:2024-10-04
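The CFAR detection step mentioned in the abstract above can be illustrated with a minimal one-dimensional sketch. The abstract does not say which CFAR variant is used, so this assumes the textbook cell-averaging form (CA-CFAR) with square-law detected power, where Rayleigh-distributed amplitude gives exponentially distributed power. The function name `ca_cfar` and its parameters are illustrative, not taken from the paper.

```python
def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1-D list of power samples.

    Assumption (not from the paper): clutter power is exponentially
    distributed, so the threshold factor for n training cells is
    alpha = n * (pfa ** (-1 / n) - 1).
    """
    n = 2 * num_train                      # training cells on both sides
    alpha = n * (pfa ** (-1.0 / n) - 1.0)  # scaling for the desired false-alarm rate
    half = num_train + num_guard
    detections = []
    for i in range(half, len(power) - half):
        # Training cells exclude the cell under test and its guard cells.
        leading = power[i - half : i - num_guard]
        trailing = power[i + num_guard + 1 : i + half + 1]
        noise = (sum(leading) + sum(trailing)) / n
        if power[i] > alpha * noise:
            detections.append(i)
    return detections


# Usage: a single strong scatterer in flat unit-power clutter.
samples = [1.0] * 50
samples[25] = 100.0
hits = ca_cfar(samples)
```

Because the threshold scales with the local clutter estimate, the false-alarm probability stays roughly constant when the clutter level changes, provided the assumed clutter distribution holds.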
# 特徴展開と類似性マッピングを組み合わせたTLSに基づく新しいフィンガープリント手法 A novel TLS-based Fingerprinting approach that combines feature expansion and similarity mapping ( http://arxiv.org/abs/2410.03817v1 ) ライセンス: Link先を確認	Amanda Thomson, Leandros Maglaras, Naghmeh Moradpoor,	(参考訳) 悪意あるドメインはインターネットのランドスケープの一部だが、企業と個人の両方にとってより普及し、より危険になっている。さまざまなテクノロジをホストして,Malwareやコマンド&コントロール,さらには偽装や公開を意図した複雑なフィッシングサイトなど,さまざまなコンテンツを提供することも可能だ。このようなドメインの追跡、ブロッキング、検出は複雑で、多くの場合、オープンソースのTLSフィンガープリント技術とのSIEM統合やリスト管理が複雑になる。 JARMやJA3のような多くの指紋技術は、ドメイン分類を決定するために脅威ハンターによって使用されているが、特にCDNにおいてTLS類似性の増加に伴い、有用性が低下している。本研究の目的は,オープンソースのTLSフィンガープリント技術を改良して粒度を向上し,これまで未知であったドメインの追跡と検出を可能にする類似性マッピングシステムを開発することである。これは、TLSフィンガープリントをHTTPヘッダデータで強化し、MinHashと局所性鋭敏型ハッシュ(LSH)を使用して高次元データを表現した粒度類似性視覚化を生成する。化学領域からの影響を受け、化学指紋の高次元類似性の問題にしばしば遭遇する。濃縮された指紋が生成され、3つの異なるデータセットで視覚化された。結果は分析され評価され、67の未知の悪意のあるドメインが既知の悪意のあるドメインと類似していることから検出された。類似性マッピング技術は、MalwareドメインとPhishingドメインの早期検出の領域において、明確な可能性を証明している。 Malicious domains are part of the landscape of the internet but are becoming more prevalent and more dangerous to both companies and individuals. They can be hosted on a variety of technologies and serve an array of content, ranging from malware and command and control to complex phishing sites that are designed to deceive and expose. Tracking, blocking and detecting such domains is complex, and very often involves complex allow or deny list management or SIEM integration with open-source TLS fingerprinting techniques. Many fingerprint techniques such as JARM and JA3 are used by threat hunters to determine domain classification, but with the increase in TLS similarity, particularly in CDNs, they are becoming less useful. The aim of this paper is to adapt and evolve open-source TLS fingerprinting techniques with increased features to enhance granularity, and to produce a similarity mapping system that enables the tracking and detection of previously unknown malicious domains. 
This is done by enriching TLS fingerprints with HTTP header data and producing a fine-grained similarity visualisation that represents high-dimensional data using MinHash and locality-sensitive hashing. Influence was taken from the chemistry domain, where the problem of high-dimensional similarity in chemical fingerprints is often encountered. An enriched fingerprint was produced, which was then visualised across three separate datasets. The results were analysed and evaluated, with 67 previously unknown malicious domains being detected based solely on their similarity to known malicious domains. The resulting similarity mapping technique shows clear promise for the early detection of malware and phishing domains.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
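As a rough illustration of the MinHash idea named in the abstract above, the following sketch estimates the Jaccard similarity between two sets of fingerprint features. It is a generic MinHash, not the authors' pipeline. The feature strings are hypothetical placeholders, and the salted SHA-1 hash family is an assumption made for the example.

```python
import hashlib


def minhash_signature(features, num_perm=64):
    """Return a MinHash signature for an iterable of string features.

    Each of the num_perm "permutations" is simulated by salting a SHA-1
    hash with the permutation index (an illustrative choice).
    """
    features = list(features)
    signature = []
    for seed in range(num_perm):
        signature.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{feat}".encode()).digest()[:8], "big"
            )
            for feat in features
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical enriched fingerprints: TLS features plus HTTP header features.
known_bad = {"tls:version=1.3", "tls:cipher=TLS_AES_128_GCM_SHA256",
             "http:server=nginx", "http:x-powered-by=PHP"}
candidate = {"tls:version=1.3", "tls:cipher=TLS_AES_128_GCM_SHA256",
             "http:server=nginx", "http:cache-control=no-store"}

similarity = estimated_jaccard(minhash_signature(known_bad),
                               minhash_signature(candidate))
```

Locality-sensitive hashing would then group signatures into bands so that similar fingerprints land in the same bucket without all-pairs comparison. That step is omitted here.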
# 大規模言語モデルは強力な自己解毒器になり得る Large Language Models can be Strong Self-Detoxifiers ( http://arxiv.org/abs/2410.03818v1 ) ライセンス: Link先を確認	Ching-Yun Ko, Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, Tejaswini Pedapati, Luca Daniel,	(参考訳) 有害で有毒な出力を生成する可能性を減らすことは、大きな言語モデル(LLM)を整列させる上で必須の課題である。既存の手法は主に、外部報酬モデル(すなわち、別の言語モデル)のトレーニングや、自己生成データを用いてLLMを微調整することに依存している。本稿では,LLMが付加的な報酬モデルや再学習を使わずに自己解毒能力を有することを示す。本稿では,LLMの毒性低減のための軽量制御復号アルゴリズムであるSelf-disciplined Autoregressive Sampling (SASA) を提案する。 SASA は LLM の文脈表現を利用して、分析形式で有毒な出力と非有毒な出力を特徴付ける線形部分空間を学習する。応答をトークン単位で自動補完する際、SASAは、自己回帰サンプリング戦略を調整することにより、現在の出力のマージンを動的に追跡し、有毒なサブスペースから退避させる。異なるスケールと性質のLLM(Llama-3.1-Instruct (8B)、Llama-2 (7B)、GPT2-L)をRealToxicityPrompts、BOLD、AttaQベンチマークで評価したところ、SASAは、元のモデルと比較して生成された文の質を著しく向上し、最先端のデトキシ化技術に匹敵する性能を達成し、LLMの内部表現のみを使用することで毒性レベルを著しく低減した。 Reducing the likelihood of generating harmful and toxic output is an essential task when aligning large language models (LLMs). Existing methods mainly rely on training an external reward model (i.e., another language model) or fine-tuning the LLM using self-generated data to influence the outcome. In this paper, we show that LLMs have the capability of self-detoxification without the use of an additional reward model or re-training. We propose Self-disciplined Autoregressive Sampling (SASA), a lightweight controlled decoding algorithm for toxicity reduction of LLMs. SASA leverages the contextual representations from an LLM to learn linear subspaces characterizing toxic vs. non-toxic output in analytical forms. When auto-completing a response token-by-token, SASA dynamically tracks the margin of the current output to steer the generation away from the toxic subspace, by adjusting the autoregressive sampling strategy. 
Evaluated on LLMs of different scales and natures, namely Llama-3.1-Instruct (8B), Llama-2 (7B), and GPT2-L models with the RealToxicityPrompts, BOLD, and AttaQ benchmarks, SASA markedly enhances the quality of the generated sentences relative to the original models and attains comparable performance to state-of-the-art detoxification techniques, significantly reducing the toxicity level using only the LLM's internal representations.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
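To make the idea of margin-guided sampling more concrete, here is a toy sketch of one plausible way to adjust a next-token distribution using a linear margin. This is an assumption for illustration, not the paper's exact algorithm. The exponential re-weighting form, the function names, and the toy values for the weight vector, bias, and embeddings are all hypothetical.

```python
import math


def linear_margin(w, b, h):
    """Signed score of representation h under a linear classifier (w, b).

    Convention assumed here: positive means the non-toxic side.
    """
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def reweight(probs, margins, beta=1.0):
    """Re-weight next-token probabilities by exp(beta * margin), then renormalize.

    beta = 0 leaves the original distribution unchanged. Larger beta pushes
    probability mass toward candidates with a larger (safer) margin.
    """
    weights = [p * math.exp(beta * m) for p, m in zip(probs, margins)]
    total = sum(weights)
    return [x / total for x in weights]


# Toy example: two candidate tokens, each with a hypothetical 2-D context embedding.
w, b = [1.0, -1.0], 0.0
candidate_embeddings = [[0.5, -0.5], [-0.5, 0.5]]
margins = [linear_margin(w, b, h) for h in candidate_embeddings]
adjusted = reweight([0.5, 0.5], margins, beta=1.0)
```

In this toy setting the first candidate has margin +1 and the second -1, so the adjusted distribution favors the first while remaining a valid probability distribution.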
# 結晶チャーン絶縁体における電偏極と境界電荷と角電荷からの離散シフト Electric polarization and discrete shift from boundary and corner charge in crystalline Chern insulators ( http://arxiv.org/abs/2410.03821v1 ) ライセンス: Link先を確認	Yuxuan Zhang, Maissam Barkeshli,	(参考訳) 最近では、結晶対称性を持つ物質の位相位相と$U(1)$の電荷保存が、離散シフト$\mathscr{S}_{\text{o}}$と電気分極$\vec{\mathscr{P}}_{\text{o}}$という、多体不変量の集合によって部分的に特徴づけられることが示されている。重要なことに、これらは非ゼロチャーン数や磁場でも定義することができる。これらの不変量の1つが、格子の微分や転位に近い電荷に対する量子化された分数的寄与である。本稿では,これらの不変量も,システムの境界における全電荷(mod1)の長さとコーナー依存性から抽出できることを示す。系のすべての部分領域の総電荷に対して$\mathscr{S}_{\text{o}}$および$\vec{\mathscr{P}}_{\text{o}}$の項で一般的な公式を提供し、それは、完全な境界やバルク格子欠陥を含むことができ、境界、コーナー、偏差、電荷応答を単一の一般理論に統一することができる。これらの結果はチャーン絶縁体にとって、その隙間のないキラルエッジモードにもかかわらず成り立ち、本質的に2次元電気分極の曖昧な定義が最近まで不明である。我々はまた、我々の理論が四重極絶縁体のトポロジカル応答を完全に特徴づける方法について論じる。 Recently, it has been shown how topological phases of matter with crystalline symmetry and $U(1)$ charge conservation can be partially characterized by a set of many-body invariants, the discrete shift $\mathscr{S}_{\text{o}}$ and electric polarization $\vec{\mathscr{P}}_{\text{o}}$, where $\text{o}$ labels a high symmetry point. Crucially, these can be defined even with non-zero Chern number and/or magnetic field. One manifestation of these invariants is through quantized fractional contributions to the charge in the vicinity of a lattice disclination or dislocation. In this paper, we show that these invariants can also be extracted from the length and corner dependence of the total charge (mod 1) on the boundary of the system. We provide a general formula in terms of $\mathscr{S}_{\text{o}}$ and $\vec{\mathscr{P}}_{\text{o}}$ for the total charge of any subregion of the system which can include full boundaries or bulk lattice defects, unifying boundary, corner, disclination, and dislocation charge responses into a single general theory. 
These results hold for Chern insulators, despite their gapless chiral edge modes, for which an unambiguous definition of an intrinsically two-dimensional electric polarization was unclear until recently. We also discuss how our theory can fully characterize the topological response of quadrupole insulators.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
# MonST3R: 動きの有無で幾何学を推定するための簡単なアプローチ MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion ( http://arxiv.org/abs/2410.03825v1 ) ライセンス: Link先を確認	Junyi Zhang, Charles Herrmann, Junhwa Hur, Varun Jampani, Trevor Darrell, Forrester Cole, Deqing Sun, Ming-Hsuan Yang,	(参考訳) 物体が時間とともに動き、変形する動的なシーンから幾何学を推定することは、コンピュータビジョンにおける中核的な課題である。現在のアプローチは、しばしばマルチステージパイプラインや、深さやフローなどのサブタスクに問題を分解するグローバル最適化に依存しており、複雑なシステムがエラーを起こしやすい。本稿では,動的シーンから時間ステップごとの幾何を直接推定する新しい幾何学的アプローチであるMotion DUSt3R(MonST3R)を提案する。重要な洞察は、各タイムステップのポイントマップを単純に推定することで、静的シーンにのみ使用されるDUST3Rの表現を動的シーンに効果的に適用できるということです。しかし、このアプローチは、適切なトレーニングデータの不足、すなわち、深度ラベル付きポーズ付きビデオの不足という、大きな課題を呈している。それにもかかわらず、この問題を微調整タスクとして処理し、いくつかの適切なデータセットを特定し、この制限されたデータ上でモデルを戦略的に訓練することにより、明示的な動き表現なしでも、モデルを驚くほど動的に扱えることを示す。そこで本研究では,複数のダウンストリームビデオ特化タスクに対する新たな最適化を導入し,映像深度とカメラポーズ推定における高い性能を示し,ロバストさと効率性の観点から先行作業よりも優れた性能を示す。さらに、MonST3Rは、主にフィードフォワード4D再構成の有望な結果を示す。 Estimating geometry from dynamic scenes, where objects move and deform over time, remains a core challenge in computer vision. Current approaches often rely on multi-stage pipelines or global optimizations that decompose the problem into subtasks, like depth and flow, leading to complex systems prone to errors. In this paper, we present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes. Our key insight is that by simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes. However, this approach presents a significant challenge: the scarcity of suitable training data, namely dynamic, posed videos with depth labels. Despite this, we show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics, even without an explicit motion representation. 
Based on this, we introduce new optimizations for several downstream video-specific tasks and demonstrate strong performance on video depth and camera pose estimation, outperforming prior work in terms of robustness and efficiency. Moreover, MonST3R shows promising results for primarily feed-forward 4D reconstruction.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
# ミシン情報(MisLC:misinformation with Legal Consequences) : ミス情報の社会的ハームを和らげるための新しい課題 Misinformation with Legal Consequences (MisLC): A New Task Towards Harnessing Societal Harm of Misinformation ( http://arxiv.org/abs/2410.03829v1 ) ライセンス: Link先を確認	Chu Fei Luo, Radin Shayanfar, Rohan Bhambhoria, Samuel Dahan, Xiaodan Zhu,	(参考訳) 誤った情報や不正確な情報と定義されている誤報は、悪意のある意図や無害な意図によって拡散されるときに、社会に重大な害をもたらす可能性がある。迅速なオンライン情報交換は、誤情報による被害を軽減するために高度な検出機構を必要とする。しかし、既存の研究は、主に虚偽情報の法的含意と社会的帰結を見越して、正確性を評価することに重点を置いている。本研究では,法的な問題を用いた誤情報検出の定義を社会的影響の測定として統合し,誤情報に対処するための学際的努力と結果をもたらすことを目的としている。 Misinformation with Legal Consequence (MisLC) は、ヘイトスピーチ、選挙法、プライバシー規制を含む、より広範な4つの法的トピックと11のきめ細かい法的問題を含む、幅広い法的ドメインの定義を活用する。本研究は,クラウドソース型チェックネスネスと誤情報のエキスパート評価を利用した2段階のデータセットキュレーション手法を提案する。課題定義から実験,専門家の関与に至るまで,実証的な証拠を通じてMisLCタスクに関する洞察を提供する。最新の大規模言語モデルと検索拡張生成はタスクの効果的なベースラインであるが、専門家のパフォーマンスの複製には程遠い。 Misinformation, defined as false or inaccurate information, can result in significant societal harm when it is spread with malicious or even innocuous intent. The rapid online information exchange necessitates advanced detection mechanisms to mitigate misinformation-induced harm. Existing research, however, has predominantly focused on assessing veracity, overlooking the legal implications and social consequences of misinformation. In this work, we take a novel angle to consolidate the definition of misinformation detection using legal issues as a measurement of societal ramifications, aiming to bring interdisciplinary efforts to tackle misinformation and its consequence. We introduce a new task: Misinformation with Legal Consequence (MisLC), which leverages definitions from a wide range of legal domains covering 4 broader legal topics and 11 fine-grained legal issues, including hate speech, election laws, and privacy regulations. For this task, we advocate a two-step dataset curation approach that utilizes crowd-sourced checkworthiness and expert evaluations of misinformation. 
We provide insights about the MisLC task through empirical evidence, from the problem definition to experiments and expert involvement. While the latest large language models and retrieval-augmented generation are effective baselines for the task, we find they are still far from replicating expert performance.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
# 機械学習におけるファインチューニングの課題 : 理論的考察と治療的アプローチ Why Fine-Tuning Struggles with Forgetting in Machine Unlearning? Theoretical Insights and a Remedial Approach ( http://arxiv.org/abs/2410.03833v1 ) ライセンス: Link先を確認	Meng Ding, Jinhui Xu, Kaiyi Ji,	(参考訳) 機械学習は、トレーニングされたモデルからデータの特定のサブセットを「取り除く」ことに焦点を当てた、重要な研究領域として登場した。ファインチューニング(FT)手法は、モデル性能を効果的に維持するため、未学習を近似するための基本的なアプローチの1つとなっている。しかし,本手法は対象データを忘れることに苦慮している。本稿では,線形回帰フレームワーク内でのFT手法の理論的解析を行い,この現象を深く研究する。異なる特徴と重なり合う特徴を持つ2つのシナリオについて検討する。以上の結果から,FTモデルが残余損失をゼロにできるが,黄金モデルと異なり,忘れるデータを忘れることができないことが明らかとなった。この分析結果から,事前学習したモデルでは忘れデータに関する情報が保持され,微調整処理はこの保持情報に影響を与えないことが判明した。この問題に対処するために,我々はまず,事前訓練されたモデルにおけるデータ保存の維持を緩和する理論的アプローチを提案する。分析の結果, 忘れるデータの影響を除去することで, FTモデルが黄金モデルの性能と一致できることが判明した。この知見に基づいて、細調整モデルと黄金モデルの間の未学習損失ギャップを実質的に低減する識別正則化項を導入する。提案手法の有効性を検証し,提案手法の有効性を実証した。 Machine Unlearning has emerged as a significant area of research, focusing on 'removing' specific subsets of data from a trained model. Fine-tuning (FT) methods have become one of the fundamental approaches for approximating unlearning, as they effectively retain model performance. However, it is consistently observed that naive FT methods struggle to forget the targeted data. In this paper, we present the first theoretical analysis of FT methods for machine unlearning within a linear regression framework, providing a deeper exploration of this phenomenon. We investigate two scenarios with distinct features and overlapping features. Our findings reveal that FT models can achieve zero remaining loss yet fail to forget the forgetting data, unlike golden models (trained from scratch without the forgetting data). This analysis reveals that naive FT methods struggle with forgetting because the pretrained model retains information about the forgetting data, and the fine-tuning process has no impact on this retained information. To address this issue, we first propose a theoretical approach to mitigate the retention of forgetting data in the pretrained model. 
Our analysis shows that removing the forgetting data's influence allows FT models to match the performance of the golden model. Building on this insight, we introduce a discriminative regularization term to practically reduce the unlearning loss gap between the fine-tuned model and the golden model. Our experiments on both synthetic and real-world datasets validate these theoretical insights and demonstrate the effectiveness of the proposed regularization method.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
# GraphRouter: LLM選択のためのグラフベースのルータ GraphRouter: A Graph-based Router for LLM Selections ( http://arxiv.org/abs/2410.03834v1 ) ライセンス: Link先を確認	Tao Feng, Yanzhen Shen, Jiaxuan You,	(参考訳) LLM(Large Language Models)の急速な増加と多種多様さは、特に性能と計算コストのトレードオフを考慮すると、与えられたクエリに対して適切なLLMを効率的に選択する上で大きな課題となる。現在のLLM選択法は、タスク、クエリ、LLM間の文脈的相互作用を利用する能力の制限や、トランスダクティブ学習フレームワークへの依存のため、新しいLLMと異なるタスクをまたいだ一般化に苦慮することが多い。これらの欠点に対処するために,タスク,クエリ,LLM間のコンテキスト情報をフル活用してLLM選択プロセスを強化する,GraphRouterという新しいインダクティブグラフフレームワークを導入する。 GraphRouterは、タスク、クエリ、LLMノードからなる異種グラフを構築し、相互作用をエッジとして表現することで、クエリの要求とLLMの機能の間のコンテキスト情報を効率的にキャプチャする。革新的なエッジ予測メカニズムを通じて、GraphRouterは潜在的なエッジの属性(LLM応答の効果とコスト)を予測でき、既存のLLMと新しく導入されたLLMの両方に適応する最適化されたレコメンデーションを、再トレーニングを必要とせずに実現することができる。 3つの異なるエフェクトコストの重みシナリオに関する総合的な実験により、GraphRouterは既存のルータを大幅に上回り、12.3%の最小パフォーマンス向上を実現している。さらに、新しいLLM設定をまたいだ拡張一般化を実現し、少なくとも9.5%の効果向上と計算要求の大幅な削減により、多様なタスクをサポートする。この研究は、LLMの文脈的かつ適応的な選択にグラフベースのアプローチを適用し、現実世界のアプリケーションに対する洞察を提供する。 GraphRouterのコードはまもなくhttps://github.com/ulab-uiuc/GraphRouterで公開される。 The rapidly growing number and variety of Large Language Models (LLMs) present significant challenges in efficiently selecting the appropriate LLM for a given query, especially considering the trade-offs between performance and computational cost. Current LLM selection methods often struggle to generalize across new LLMs and different tasks because of their limited ability to leverage contextual interactions among tasks, queries, and LLMs, as well as their dependence on a transductive learning framework. To address these shortcomings, we introduce a novel inductive graph framework, named GraphRouter, which fully utilizes the contextual information among tasks, queries, and LLMs to enhance the LLM selection process. GraphRouter constructs a heterogeneous graph comprising task, query, and LLM nodes, with interactions represented as edges, which efficiently captures the contextual information between the query's requirements and the LLM's capabilities. 
Through an innovative edge prediction mechanism, GraphRouter is able to predict attributes (the effect and cost of LLM response) of potential edges, allowing for optimized recommendations that adapt to both existing and newly introduced LLMs without requiring retraining. Comprehensive experiments across three distinct effect-cost weight scenarios have shown that GraphRouter substantially surpasses existing routers, delivering a minimum performance improvement of 12.3%. In addition, it achieves enhanced generalization across new LLM settings and supports diverse tasks with at least a 9.5% boost in effect and a significant reduction in computational demands. This work endeavors to apply a graph-based approach for the contextual and adaptive selection of LLMs, offering insights for real-world applications. Our code for GraphRouter will soon be released at https://github.com/ulab-uiuc/GraphRouter.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
# 繰り返し測定による非線形力学系の量子シミュレーション Quantum Simulation of Nonlinear Dynamical Systems Using Repeated Measurement ( http://arxiv.org/abs/2410.03838v1 ) ライセンス: Link先を確認	Joseph Andress, Alexander Engel, Yuan Shi, Scott Parker,	(参考訳) 本稿では, プラズマ物理学における偏微分方程式から生じる非線形常微分方程式(ODE)の初期値問題を解くために, 繰り返し測定に基づく量子アルゴリズムを提案する。力学系をハミルトン形式に写像し、ハミルトン行列は力学変数の関数である。時間を進めるために、前回の時間ステップからの期待値を測定し、古典的にハミルトン関数を評価し、力学に確率性を導入する。次に、評価定数ハミルトン行列を用いて、短時間で標準量子ハミルトンシミュレーションを行う。このアプローチでは、必要な観測可能量を測定するために各ステップで消費される量子状態のアンサンブルを進化させる必要がある。古典ロジスティック系とローレンツ系に、積分可能かつカオス的状態の両方でこのアプローチを適用する。我々の分析により、解の精度は確率的サンプリング率と力学系の性質の両方に影響されることが示された。 We present a quantum algorithm based on repeated measurement to solve initial-value problems for nonlinear ordinary differential equations (ODEs), which may be generated from partial differential equations in plasma physics. We map a dynamical system to a Hamiltonian form, where the Hamiltonian matrix is a function of dynamical variables. To advance in time, we measure expectation values from the previous time step, and evaluate the Hamiltonian function classically, which introduces stochasticity into the dynamics. We then perform standard quantum Hamiltonian simulation over a short time, using the evaluated constant Hamiltonian matrix. This approach requires evolving an ensemble of quantum states, which are consumed each step to measure required observables. We apply this approach to the classic logistic and Lorenz systems, in both integrable and chaotic regimes. Our analysis shows that the solutions' accuracy is influenced by both the stochastic sampling rate and the nature of the dynamical system.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
# FaithCAMERA:広告テキスト生成のための忠実なデータセットの構築 FaithCAMERA: Construction of a Faithful Dataset for Ad Text Generation ( http://arxiv.org/abs/2410.03839v1 ) ライセンス: Link先を確認	Akihiko Kato, Masato Mita, Soichiro Murakami, Ukyo Honda, Sho Hoshino, Peinan Zhang,	(参考訳) 広告テキスト生成(ATG)では、望ましい広告テキストは忠実かつ情報的である。つまり、入力文書に忠実であると同時に、潜在的な顧客にアピールする重要な情報を含んでいるべきである。既存の評価データであるCAMERA(arXiv:2309.12030)は、広告クリエーターによって作成された参照広告テキストからなるので、情報量を評価するのに適している。しかしながら、これらの参照には入力に忠実でない情報がしばしば含まれており、ATG研究の推進において顕著な障害となっている。本研究では、社内の広告制作者と協力して、CAMERA参照を洗練させ、FaithCAMERAと呼ばれる別のATG評価データセットを開発し、参照の忠実性を保証する。 FaithCAMERAを用いて、忠実性を維持しながら、既存の忠実性を改善する方法が、いかに情報的広告テキストを生成できるかを評価することができる。実験の結果,不誠実なエンティティを含むトレーニングデータを削除することで,エンティティレベルでの忠実度や情報性が向上するが,文レベルでは両者を減少させることがわかった。この結果は、将来のATG研究にとって、トレーニングデータをスケールするだけでなく、その忠実性を確保することが不可欠であることを示唆している。私たちのデータセットは公開されます。 In ad text generation (ATG), desirable ad text is both faithful and informative. That is, it should be faithful to the input document, while at the same time containing important information that appeals to potential customers. The existing evaluation data, CAMERA (arXiv:2309.12030), is suitable for evaluating informativeness, as it consists of reference ad texts created by ad creators. However, these references often include information unfaithful to the input, which is a notable obstacle in promoting ATG research. In this study, we collaborate with in-house ad creators to refine the CAMERA references and develop an alternative ATG evaluation dataset called FaithCAMERA, in which the faithfulness of references is guaranteed. Using FaithCAMERA, we can evaluate how well existing methods for improving faithfulness can generate informative ad text while maintaining faithfulness. Our experiments show that removing training data that contains unfaithful entities improves the faithfulness and informativeness at the entity level, but decreases both at the sentence level. 
This result suggests that for future ATG research, it is essential not only to scale the training data but also to ensure their faithfulness. Our dataset will be publicly available.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
# (それほど)自明でないことを説明する:次の関心地点推薦システムSTANのシンプルかつ高速な説明 Explaining the (Not So) Obvious: Simple and Fast Explanation of STAN, a Next Point of Interest Recommendation System ( http://arxiv.org/abs/2410.03841v1 ) ライセンス: Link先を確認	Fajrian Yunus, Talel Abdessalem,	(参考訳) 近年、機械学習システムの説明に多くの努力が費やされている。しかし、いくつかの機械学習手法は本質的に説明可能であるため、完全にブラックボックスではない。これによって開発者は、複雑で高価な説明可能性技術を開発することなく、アウトプットを理解することができる。それに加えて、説明可能性は問題のコンテキストに適合するように調整されるべきである。協調フィルタリングに依存するレコメンデーションシステムでは, 類似ユーザの行動に基づいたレコメンデーションを行うため、説明は現在のユーザに類似した他のユーザを示すべきである。同様に、レコメンデーションシステムがシーケンス予測に基づいている場合、どの入力タイムステップが最も影響のあるかを説明すべきである。我々は,STAN(Spatio-Temporal Attention Network for Next Location Recommendation)におけるこの哲学/パラダイムを,協調フィルタリングとシーケンス予測に基づく次の関心点推薦システムで実証する。また、その説明がアウトプットの“デバッグ”に役立つことも示しています。 A lot of effort in recent years has been expended to explain machine learning systems. However, some machine learning methods are inherently explainable, and thus are not completely black box. This enables the developers to make sense of the output without developing a complex and expensive explainability technique. Besides that, explainability should be tailored to suit the context of the problem. In a recommendation system which relies on collaborative filtering, the recommendation is based on the behaviors of similar users; therefore, the explanation should tell which other users are similar to the current user. Similarly, if the recommendation system is based on sequence prediction, the explanation should also tell which input timesteps are the most influential. We demonstrate this philosophy/paradigm in STAN (Spatio-Temporal Attention Network for Next Location Recommendation), a next Point of Interest recommendation system based on collaborative filtering and sequence prediction. We also show that the explanation helps to "debug" the output.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
# ORAssistant: OpenROADのためのカスタムRAGベースの会話アシスタント ORAssistant: A Custom RAG-based Conversational Assistant for OpenROAD ( http://arxiv.org/abs/2410.03845v1 ) ライセンス: Link先を確認	Aviral Kaintura, Palaniappan R, Shui Song Luar, Indira Iyer Almeida,	(参考訳) オープンソースの電子設計自動化(EDA)ツールは、複雑さやコスト、アクセスといった商用EDAツールの重要な障壁に対処することで、チップ設計を急速に変革している。近年のLLM(Large Language Models)の進歩は,セットアップや意思決定,フロー自動化など,さまざまなタスクにユーザ支援を提供することによって,チップ設計の効率を向上している。本稿では,Retrieval-Augmented Generation (RAG) に基づくOpenROAD の対話型アシスタント ORAssistant について紹介する。 ORAssistant は、インストール、コマンドの使用法、フローのセットアップ、実行など、一般的なユーザクエリに対してコンテキスト固有の応答を文章形式で提供することにより、RTLからGDSIIまでの OpenROAD フローのユーザエクスペリエンスを改善することを目的としている。 ORAssistantは現在、OpenROAD、OpenROAD-flow-scripts、Yosys、OpenSTA、KLayoutを統合している。データモデルは、公開されているドキュメントとGitHubリソースから構築されている。提案されたアーキテクチャはスケーラブルで、他のオープンソースツール、オペレーティングモード、LLMモデルの拡張をサポートする。 ORAssistantの構築とテストには,基本LLMモデルとしてGoogle Geminiを使用します。 RAGモデルの初期評価結果は、非微調整LLMと比較して、性能と精度が顕著に向上したことを示している。 Open-source Electronic Design Automation (EDA) tools are rapidly transforming chip design by addressing key barriers of commercial EDA tools such as complexity, costs, and access. Recent advancements in Large Language Models (LLMs) have further enhanced efficiency in chip design by providing user assistance across a range of tasks like setup, decision-making, and flow automation. This paper introduces ORAssistant, a conversational assistant for OpenROAD, based on Retrieval-Augmented Generation (RAG). ORAssistant aims to improve the user experience for the OpenROAD flow, from RTL to GDSII, by providing context-specific responses to common user queries, including installation, command usage, flow setup, and execution, in prose format. Currently, ORAssistant integrates OpenROAD, OpenROAD-flow-scripts, Yosys, OpenSTA, and KLayout. The data model is built from publicly available documentation and GitHub resources. The proposed architecture is scalable, supporting extensions to other open-source tools, operating modes, and LLM models. We use Google Gemini as the base LLM model to build and test ORAssistant. 
Early evaluation results of the RAG-based model show notable improvements in performance and accuracy compared to non-fine-tuned LLMs.	翻訳日:2024-11-02 16:00:59 公開日:2024-10-04
# 確率環境における逆逆強化学習のためのモデルベース逆整形 Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments ( http://arxiv.org/abs/2410.03847v1 ) ライセンス: Link先を確認	Simon Sinong Zhan, Qingyuan Wu, Philip Wang, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu,	(参考訳) 本稿では,理論的結果が得られず,性能が劣化する確率環境下でのAIRL(Adversarial Inverse Reinforcement Learning)手法の限界に対処することを目的とする。この問題に対処するため,確率的環境における最適政策の理論的保証とともに,力学情報を報酬形成に注入する手法を提案する。本稿では,モデル強化型AIRLフレームワークについて,モデル拡張型AIRLフレームワークを提案する。さらに,提案手法における報酬誤差境界と性能差の包括的理論的解析を行った。 MuJoCoベンチマークによる実験結果から,本手法は確率的環境における優れた性能と決定論的環境における競合性能を達成でき,既存のベースラインと比較して試料効率が大幅に向上することが示された。 In this paper, we aim to tackle the limitation of the Adversarial Inverse Reinforcement Learning (AIRL) method in stochastic environments where theoretical results cannot hold and performance is degraded. To address this issue, we propose a novel method which infuses the dynamics information into the reward shaping with the theoretical guarantee for the induced optimal policy in the stochastic environments. Incorporating our novel model-enhanced rewards, we present a novel Model-Enhanced AIRL framework, which integrates transition model estimation directly into reward shaping. Furthermore, we provide a comprehensive theoretical analysis of the reward error bound and performance difference bound for our method. The experimental results in MuJoCo benchmarks show that our method can achieve superior performance in stochastic environments and competitive performance in deterministic environments, with significant improvement in sample efficiency, compared to existing baselines.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
# プロンプトを用いて実人の言語スタイルを模倣する大規模言語モデル Using Prompts to Guide Large Language Models in Imitating a Real Person's Language Style ( http://arxiv.org/abs/2410.03848v1 ) ライセンス: Link先を確認	Ziyang Chen, Stylios Moscholios,	(参考訳) GPTシリーズやLlamaシリーズのような大規模言語モデル(LLM)は、自然言語処理、文脈理解、テキスト生成において強力な能力を示している。近年、研究者は様々なタスクの実行においてLLMの能力を高めようとしており、よく設計されたプロンプトがこれらのタスクにおけるLLMのパフォーマンスを大幅に改善できることを多くの研究が証明している。本研究では,同じゼロショットプロンプトの指導の下で,3つの異なる大言語モデルの言語スタイルの模倣能力を比較する。また、3つの異なるプロンプトによって個別にガイドされる場合、同じ大きな言語モデルの模倣能力を比較する。さらに、Llama 3にTree-of-Thoughts (ToT) Promptingメソッドを適用することで、実際の人の言語スタイルを持つ会話型AIが作られた。本研究では,LLMとプロンプトの評価に3つの評価手法を用いた。その結果,Llama 3は言語スタイルの模倣に最適であり,ToTプロンプト法が最も効果的であることがわかった。 ToTフレームワークを使用して、Llama 3は、コアパラメータを変更することなく、特定の個人の言語スタイルでユーザと対話するようにガイドされ、それによって、個人の言語スタイルを反映したテキストベースの会話型AIを生成する。 Large language models (LLMs), such as the GPT series and the Llama series, have demonstrated strong capabilities in natural language processing, contextual understanding, and text generation. In recent years, researchers have been trying to enhance the abilities of LLMs in performing various tasks, and numerous studies have shown that well-designed prompts can significantly improve the performance of LLMs on these tasks. This study compares the language style imitation ability of three different large language models under the guidance of the same zero-shot prompt. It also compares the imitation ability of the same large language model when guided by three different prompts individually. Additionally, by applying a Tree-of-Thoughts (ToT) Prompting method to Llama 3, a conversational AI with the language style of a real person was created. In this study, three evaluation methods were used to evaluate LLMs and prompts. The results show that Llama 3 performs best at imitating language styles, and that the ToT prompting method is the most effective to guide it in imitating language styles. 
Using a ToT framework, Llama 3 was guided to interact with users in the language style of a specific individual without altering its core parameters, thereby creating a text-based conversational AI that reflects the language style of the individual.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
# 文脈付き逐次確率割り当て:Minimax Regret、Contextual Shtarkov Sums、Contextual Normalized Maximum Likelihood Sequential Probability Assignment with Contexts: Minimax Regret, Contextual Shtarkov Sums, and Contextual Normalized Maximum Likelihood ( http://arxiv.org/abs/2410.03849v1 ) ライセンス: Link先を確認	Ziyi Liu, Idan Attias, Daniel M. Roy,	(参考訳) 我々は、任意の、おそらくは非パラメトリックな仮説クラスに関して、逐次確率代入の基本的問題(対数損失を伴うオンライン学習)について研究する。我々のゴールは、ミニマックスの後悔を特徴づける仮説クラスに対する複雑さの測定と、一般化されたミニマックスの最適アルゴリズムを決定することである。特に、逐次$\ell_{\infty}$エントロピー(Rakhlin and Sridharan, 2015, Bilodeau et al., 2020, Wu et al., 2023)は、一般にミニマックスリスクを特徴づけないことが示されている。シュタルコフ (1987) とラフリン (Rakhlin)、スリダーラン (Sridharan)、テワリ (Tewari) (2010) のセミナルな研究から着想を得て、多元的文脈木に射影した後のシュタルコフ和に対応する新しい複雑性測度 \emph{contextual Shtarkov sum} を導入し、最悪の場合の文脈的シュタルコフ和の対数がミニマックス後悔と等しいことを示す。文脈シュタルコフ和を用いて、ミニマックス最適戦略を導出し、これを \emph{contextual Normalized Maximum Likelihood} (cNML) と呼ぶ。私たちの結果は、バイナリラベルを超えて、シーケンシャルな専門家に当てはまります。この特徴付けの有用性を説明するために、Bilodeau et al. (2020) と Wu et al. (2023) による最先端の上界を統一し改善する、逐次 $\ell_{\infty}$ エントロピーによる新しい後悔の上界の短い証明を与える。 We study the fundamental problem of sequential probability assignment, also known as online learning with logarithmic loss, with respect to an arbitrary, possibly nonparametric hypothesis class. Our goal is to obtain a complexity measure for the hypothesis class that characterizes the minimax regret and to determine a general, minimax optimal algorithm. Notably, the sequential $\ell_{\infty}$ entropy, extensively studied in the literature (Rakhlin and Sridharan, 2015, Bilodeau et al., 2020, Wu et al., 2023), was shown to not characterize minimax risk in general. Inspired by the seminal work of Shtarkov (1987) and Rakhlin, Sridharan, and Tewari (2010), we introduce a novel complexity measure, the \emph{contextual Shtarkov sum}, corresponding to the Shtarkov sum after projection onto a multiary context tree, and show that the worst case log contextual Shtarkov sum equals the minimax regret. 
Using the contextual Shtarkov sum, we derive the minimax optimal strategy, dubbed \emph{contextual Normalized Maximum Likelihood} (cNML). Our results hold for sequential experts, beyond binary labels, which are settings rarely considered in prior work. To illustrate the utility of this characterization, we provide a short proof of a new regret upper bound in terms of sequential $\ell_{\infty}$ entropy, unifying and sharpening state-of-the-art bounds by Bilodeau et al. (2020) and Wu et al. (2023).	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
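For intuition about the quantities named in the abstract above, the following sketch computes the classical, context-free Shtarkov sum and the normalized maximum likelihood (NML) probabilities for the Bernoulli class on binary sequences of length n. This is only the textbook special case that the paper generalizes. The contextual version over context trees is not reproduced here, and the function names are illustrative.

```python
from math import comb, log


def max_likelihood(k, n):
    """Largest Bernoulli likelihood of any binary sequence of length n with k ones."""
    if n == 0:
        return 1.0
    p = k / n  # maximum-likelihood estimate of the bias
    return (p ** k) * ((1 - p) ** (n - k))  # Python evaluates 0.0 ** 0 as 1.0


def shtarkov_sum(n):
    """Sum over all length-n binary sequences of their maximized likelihood."""
    return sum(comb(n, k) * max_likelihood(k, n) for k in range(n + 1))


def nml_probability(k, n):
    """NML probability assigned to one specific sequence with k ones."""
    return max_likelihood(k, n) / shtarkov_sum(n)


# In the classical setting, the minimax regret under log loss is the log of the Shtarkov sum.
regret_n2 = log(shtarkov_sum(2))
```

For n = 2 the four sequences have maximized likelihoods 1, 0.25, 0.25, and 1, so the Shtarkov sum is 2.5. Dividing each maximized likelihood by this sum gives a valid probability assignment.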
# ハイブリッド量子フォトニックキャビティのスケーラブルな構築 Scalable construction of hybrid quantum photonic cavities ( http://arxiv.org/abs/2410.03851v1 ) ライセンス: Link先を確認	Andrew S. Greenspon, Mark Dong, Ian Christen, Gerald Gilbert, Matt Eichenfield, Dirk Englund,	(参考訳) ナノフォトニック共振器は、効率的なスピン光子インターフェースからレーザー発振器、精密センシングまで、多くの応用の中心となっている。代表的なアプローチは、幅広い誘電体材料で実現されてきたフォトニック結晶(PhC)キャビティである。しかし、概念実証デバイスを機能するシステムへ移行するには多くの追加課題があり、波長スケールの閉じ込めと高い品質係数を持つ共振器、集積回路およびフォトニック回路とのスケーラブルな統合、電気的または機械的なキャビティチューニング、そして多くの場合、スピン光子インターフェースのためのIII-V半導体やダイヤモンドカラーセンターのような機能性材料との異種統合を併せ持つ新しいアプローチが求められている。ここでは、上記の要件を満たす2つの異種光学材料の間に、選択した波長で微調整可能なPhCキャビティを生成する概念を紹介する。このキャビティは、誘電体グレーティングミラーとアクティブチューニング機能を備えた加工しやすい材料の上に、単純な導波路形状を持つ加工の難しい材料をスタンプして形成される。キャビティ結合ダイヤモンドカラーセンターのアレイに基づく多重化量子リピータという特に困難な設計問題に対して本概念をシミュレーションし、理論計算による無負荷品質係数 $10^6$、$1.2(\lambda/n_{eff})^3$ という小さなモード体積、および蛍光光子の60パーセント超の総オンチップ収集効率を達成した。さらに、これらのハイブリッドダイヤモンドキャビティの低消費電力圧電チューニング法を導入し、最大約760GHzの光共鳴シフトと、キャビティチューニングとは独立した5GHzのカラーセンター蛍光チューニングをシミュレーションした。これらの結果は、集積フォトニックキャビティを、より大規模なシステムと互換性のある設計へと進める動機付けとなる。 Nanophotonic resonators are central to numerous applications, from efficient spin-photon interfaces to laser oscillators and precision sensing. A leading approach consists of photonic crystal (PhC) cavities, which have been realized in a wide range of dielectric materials. However, translating proof-of-concept devices into a functional system entails a number of additional challenges, inspiring new approaches that combine: resonators with wavelength-scale confinement and high quality factors; scalable integration with integrated circuits and photonic circuits; electrical or mechanical cavity tuning; and, in many cases, a need for heterogeneous integration with functional materials such as III-V semiconductors or diamond color centers for spin-photon interfaces. Here we introduce a concept that generates a finely tunable PhC cavity at a select wavelength between two heterogeneous optical materials whose properties satisfy the above requirements. 
The cavity is formed by stamping a hard-to-process material with simple waveguide geometries on top of an easy-to-process material consisting of dielectric grating mirrors and active tuning capability. We simulate our concept for the particularly challenging design problem of multiplexed quantum repeaters based on arrays of cavity-coupled diamond color centers, achieving theoretically calculated unloaded quality factors of $10^6$, mode volumes as small as $1.2(\lambda/n_{eff})^3$, and maintaining >60 percent total on-chip collection efficiency of fluorescent photons. We further introduce a method of low-power piezoelectric tuning of these hybrid diamond cavities, simulating optical resonance shifts up to ~760 GHz and color center fluorescence tuning of 5 GHz independent of cavity tuning. These results will motivate integrated photonic cavities toward larger scale systems-compatible designs.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
# 量子近似最適化アルゴリズムと量子拡張マルコフ連鎖モンテカルロ:4DVARにおけるデータ同化へのハイブリッドアプローチ Quantum Approximate Optimization Algorithm and Quantum-enhanced Markov Chain Monte Carlo: A Hybrid Approach to Data Assimilation in 4DVAR ( http://arxiv.org/abs/2410.03853v1 ) ライセンス: Link先を確認	Abhiram Sripat,	(参考訳) 本稿では,量子近似最適化アルゴリズム (QAOA) と量子拡張マルコフ連鎖モンテカルロ (QMCMC) を統合し, 4次元変動データ同化法 (4DVAR) の計算課題に対処するハイブリッド量子古典的フレームワークを提案する。 4DVARは、数値的な天気予報に広く用いられているが、高次元非線形システムの非効率性に悩まされている。提案手法である量子変分粒子フィルタ(QVPF)は,QAOAを用いて粒子提案を最適化し,QMCMCを用いて効率よく粒子重みを計算し,再サンプリングを行い,計算負荷を低減しながら収束を加速する。 QVPFフレームワークは、正確な状態推定に必要な粒子数を最小化し、複雑な力学を持つシステムの効率を向上させることで、次元性の呪いに対処する。ハイブリッドモデルは量子アルゴリズムを変分粒子フィルタに統合することで精度を高め、特に気候モデリング、宇宙天気予報、防衛への応用に適している。予測モデルにおいて前例のない解決を達成できる可能性は、高解像度の予測に依存するセクターを変革する可能性がある。本稿では,アルゴリズムの実装とハードウェア要件に関する議論とともに,このアプローチの数学的基礎について述べる。初期の結果は、このハイブリッドフレームワークがデータ同化を大幅に改善し、短期量子デバイスへの将来の実装は、スケールアップの実践的な経路を提供することを示唆している。この研究は、量子コンピューティングが大規模データ同化におけるより正確で計算可能な方法の必要性にどのように対処できるかを示す。 We propose a novel hybrid quantum-classical framework that integrates the Quantum Approximate Optimization Algorithm (QAOA) and Quantum-enhanced Markov Chain Monte Carlo (QMCMC) with variational particle filters to tackle the computational challenges in Four-Dimensional Variational Data Assimilation (4DVAR). 4DVAR, widely used in numerical weather prediction, suffers from inefficiencies in high-dimensional, non-linear systems. Our approach, the Quantum Variational Particle Filter (QVPF), uses QAOA to optimize particle proposals and QMCMC to efficiently compute particle weights and resample, accelerating convergence while reducing the computational load. The QVPF framework addresses the curse of dimensionality by minimizing the number of particles required for accurate state estimation, thus improving efficiency in systems with complex dynamics. 
The hybrid model offers enhanced accuracy by integrating quantum algorithms into the variational particle filter, making it particularly suited for applications in climate modeling, space weather prediction, and defense. The potential for achieving unprecedented resolution in predictive models could transform sectors that rely on high-resolution forecasting. We present the mathematical foundations of the approach, along with discussions on algorithmic implementation and hardware requirements. Early results suggest that this hybrid framework could significantly improve data assimilation, with future implementations on near-term quantum devices offering a practical pathway for scaling up. This work demonstrates how quantum computing can address the growing need for more accurate and computationally feasible methods in large-scale data assimilation.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
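The abstract above describes a particle filter whose weight computation and resampling are delegated to quantum routines. For orientation only, here is a minimal classical sketch of the two steps being accelerated: measuring weight degeneracy via the effective sample size, and systematic resampling. It contains no QAOA or QMCMC component and is not the authors' QVPF. The function names and the explicit `offset` argument (normally drawn uniformly from [0, 1)) are assumptions made for illustration.

```python
def normalize(weights):
    """Scale non-negative particle weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def effective_sample_size(weights):
    """1 / sum(w_i^2) for normalized weights; equals N when all weights are equal."""
    w = normalize(weights)
    return 1.0 / sum(x * x for x in w)


def systematic_resample(weights, offset):
    """Return indices of the particles kept by systematic resampling.

    `offset` is a single number in [0, 1); one evenly spaced grid of N
    positions, shifted by this offset, is swept over the cumulative weights.
    """
    w = normalize(weights)
    n = len(w)
    indices = []
    i = 0
    cumulative = w[0]
    for k in range(n):
        position = (offset + k) / n
        # Advance to the first particle whose cumulative weight covers the position.
        while position > cumulative and i < n - 1:
            i += 1
            cumulative += w[i]
        indices.append(i)
    return indices
```

A common design choice is to resample only when the effective sample size drops below some fraction of the particle count, which limits the loss of diversity caused by resampling at every step.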
# NISQ量子デバイスのためのオープン量子システムのトロッターレスシミュレーション Trotterless Simulation of Open Quantum Systems for NISQ Quantum Devices ( http://arxiv.org/abs/2410.03854v1 ) ライセンス: Link先を確認	Colin Burdine, Enrique P. Blair,	(参考訳) 量子システムのシミュレーションは、近い将来のNISQ(ノイズのある中規模量子)コンピューティングデバイスの代表的な応用の一つである。オープン量子系の豊かな非ユニタリダイナミクスを効率的にシミュレートすることは、NISQハードウェアでは依然として困難である。オープン量子系の現在のシミュレーション手法では、時間ステップごとのトロッター積公式(「トロッター化」)が使われており、シミュレーション時間やシステム次元に対するスケーリングが悪くなりうる。本稿では,系のKraus演算子級数表現の導出に基づく新しいシミュレーション手法を提案する。我々は,この手法が時間に依存しない深さの回路を生成するオープン量子系のクラスを同定する。これは特にNISQデバイスにおいて、トロッター化の望ましい代替となりうる。 The simulation of quantum systems is one of the flagship applications of near-term NISQ (noisy intermediate-scale quantum) computing devices. Efficiently simulating the rich, non-unitary dynamics of open quantum systems remains challenging on NISQ hardware. Current simulation methods for open quantum systems employ time-stepped Trotter product formulas ("Trotterization") which can scale poorly with respect to the simulation time and system dimension. Here, we propose a new simulation method based on the derivation of a Kraus operator series representation of the system. We identify a class of open quantum systems for which this method produces circuits of time-independent depth, which may serve as a desirable alternative to Trotterization, especially on NISQ devices.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
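To make the idea of a Kraus operator representation concrete, the sketch below uses the textbook single-qubit amplitude-damping channel. This example is an assumption chosen for illustration and is not the series construction derived in the paper. It shows how the evolution up to time `t` can be applied in one step, with `t` entering only through a parameter of the operators rather than through a growing number of time steps. The decay rate `rate` and all function names are hypothetical.

```python
from math import exp, sqrt


def amplitude_damping_kraus(rate, t):
    """Kraus operators of amplitude damping after time t (entries are real)."""
    gamma = 1.0 - exp(-rate * t)  # probability that the excited state has decayed
    k0 = [[1.0, 0.0], [0.0, sqrt(1.0 - gamma)]]
    k1 = [[0.0, sqrt(gamma)], [0.0, 0.0]]
    return [k0, k1]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    """Transpose of a 2x2 matrix; equals the adjoint here because entries are real."""
    return [[a[j][i] for j in range(2)] for i in range(2)]


def apply_channel(rho, kraus_ops):
    """Return sum_k K_k rho K_k^dagger for a 2x2 density matrix rho."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for k in kraus_ops:
        term = matmul(matmul(k, rho), transpose(k))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out


def completeness(kraus_ops):
    """Return sum_k K_k^dagger K_k, which is the identity for a valid channel."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for k in kraus_ops:
        term = matmul(transpose(k), k)
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out
```

Applying the channel to the excited state `[[0, 0], [0, 1]]` moves a fraction `gamma` of the population to the ground state, whatever the value of `t`.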
# フェデレーションラーニングにおけるグループフェアネスに関する調査--課題,解決の分類,今後の研究への方向性 A Survey on Group Fairness in Federated Learning: Challenges, Taxonomy of Solutions and Directions for Future Research ( http://arxiv.org/abs/2410.03855v1 ) ライセンス: Link先を確認	Teresa Salazar, Helder Araújo, Alberto Cano, Pedro Henriques Abreu,	(参考訳) 機械学習におけるグループフェアネス(英: Group Fairness)は、人種や性別などのセンシティブな属性によって定義された異なるグループ間で平等な結果を達成することに焦点を当てた研究分野である。フェデレーション学習(Federated Learning)とは、複数のデバイスや組織にまたがる機械学習モデルを、生データを共有せずにトレーニングするための分散アプローチである。連合学習とグループフェアネスの交わりは大きな関心を集めており、この問題に特化して47の研究研究が進められている。しかし、連合学習におけるグループフェアネスの総合的な調査は行われていない。本研究は、この分野における重要な課題に対処し、関連研究をレビューする、このトピックに関する詳細な調査を提示する。データパーティショニング、ロケーション、適用戦略といった重要な基準に基づいて、これらのアプローチの新しい分類法を作成します。さらに、この問題に関するより広範な懸念について検討し、異なるアプローチが様々なセンシティブなグループとその交点の複雑さをどのように扱うかを検討する。最後に、現在の研究でよく使われるデータセットとアプリケーションについてレビューする。我々は、今後の研究の鍵となる領域を強調し、連合システムにおけるグループフェアネスの実現の複雑さに対処する方法の必要性を強調した。 Group fairness in machine learning is a critical area of research focused on achieving equitable outcomes across different groups defined by sensitive attributes such as race or gender. Federated learning, a decentralized approach to training machine learning models across multiple devices or organizations without sharing raw data, amplifies the need for fairness due to the heterogeneous data distributions across clients, which can exacerbate biases. The intersection of federated learning and group fairness has attracted significant interest, with 47 research works specifically dedicated to addressing this issue. However, no dedicated survey has focused comprehensively on group fairness in federated learning. In this work, we present an in-depth survey on this topic, addressing the critical challenges and reviewing related works in the field. We create a novel taxonomy of these approaches based on key criteria such as data partitioning, location, and applied strategies. 
Additionally, we explore broader concerns related to this problem and investigate how different approaches handle the complexities of various sensitive groups and their intersections. Finally, we review the datasets and applications commonly used in current research. We conclude by highlighting key areas for future research, emphasizing the need for more methods to address the complexities of achieving group fairness in federated systems.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
# 潜在空間変数を用いた機械生成長文コンテンツの検出 Detecting Machine-Generated Long-Form Content with Latent-Space Variables ( http://arxiv.org/abs/2410.03856v1 ) ライセンス: Link先を確認	Yufei Tian, Zeyu Pan, Nanyun Peng,	(参考訳) 大規模言語モデル(LLM)が流暢な長文を生成する能力を高めるにつれ、機械の出力と人間が書いたテキストを区別することに新たな課題が生じており、この区別は表現の真正性と信頼性を確保する上で不可欠である。既存のゼロショット検出器は主にトークンレベルの分布に着目しており、異なるプロンプトやデコード戦略、敵対的攻撃を含む現実世界のドメインシフトに脆弱である。本研究では、人間が書いたテキストから得られるイベントやトピックの系列で潜在空間モデルを訓練することにより、イベント遷移などの抽象的要素を、機械と人間のテキストを判別する主要な決定要因として組み込んだ、より頑健な手法を提案する。 3つの異なる領域において、トークンレベルでは人間のテキストと分離できない機械生成テキストが、我々の潜在空間モデルによってよりよく区別され、DetectGPTのような強力なベースラインに対して31%の改善が得られる。さらに分析により、GPT-4のような現代のLLMは人間とは異なる方法でイベントトリガとその遷移を生成することが明らかになり、この本質的な相違が、本手法による機械生成テキストの頑健な検出に役立っている。 The increasing capability of large language models (LLMs) to generate fluent long-form texts is presenting new challenges in distinguishing machine-generated outputs from human-written ones, which is crucial for ensuring authenticity and trustworthiness of expressions. Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts, including different prompting and decoding strategies, and adversarial attacks. We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts by training a latent-space model on sequences of events or topics derived from human-written texts. In three different domains, machine-generated texts, which are originally inseparable from human texts on the token level, can be better distinguished with our latent-space model, leading to a 31% improvement over strong baselines such as DetectGPT. Our analysis further reveals that, unlike humans, modern LLMs like GPT-4 generate event triggers and their transitions differently, an inherent disparity that helps our method to robustly detect machine-generated texts.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
# 教師なしの事前学習:ビデオからカテゴリの優先順位を発見する Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos ( http://arxiv.org/abs/2410.03858v1 ) ライセンス: Link先を確認	Ziyu Wang, Shuangpeng Han, Mike Zheng Shou, Mengmi Zhang,	(参考訳) 前者は、システムに関する信念や仮定の集まりを表し、推論と意思決定を支援する。本稿では、AIモデルが自己教師型でビデオからオブジェクトをアニメーションする事前ポーズを学習する、ポーズ推定における教師なし事前学習の課題を紹介する。これらのビデオは、さまざまなアクションを実行し、キーポイントと接続性に関する重要な情報を提供する。前者はポーズ推定に有効であるが、それを取得することは困難である。本稿では,PPL(Pose Prior Learner)という新しい手法を提案する。 PPLは階層記憶を用いて、原型的なポーズの合成部分を保存する。これによりテンプレート変換と画像再構成によりポーズ推定精度が向上する。 PPLは、追加の人間のアノテーションや介入なしに有意義なポーズ前処理を学習し、人間と動物のポーズ推定データセットの競争ベースラインを上回っている。特に,本実験の結果から,学習済み先行画像のポーズ推定におけるPPLの有効性が明らかとなった。反復推論により、PPLは推定されたポーズを洗練させ、メモリに格納された任意の原型ポーズに回帰させる。私たちのコード、モデル、データは公開されます。 A prior represents a set of beliefs or assumptions about a system, aiding inference and decision-making. In this work, we introduce the challenge of unsupervised prior learning in pose estimation, where AI models learn pose priors of animate objects from videos in a self-supervised manner. These videos present objects performing various actions, providing crucial information about their keypoints and connectivity. While priors are effective in pose estimation, acquiring them can be difficult. We propose a novel method, named Pose Prior Learner (PPL), to learn general pose priors applicable to any object category. PPL uses a hierarchical memory to store compositional parts of prototypical poses, from which we distill a general pose prior. This prior enhances pose estimation accuracy through template transformation and image reconstruction. PPL learns meaningful pose priors without any additional human annotations or interventions, outperforming competitive baselines on both human and animal pose estimation datasets. Notably, our experimental results reveal the effectiveness of PPL using learnt priors for pose estimation on occluded images. 
Through iterative inference, PPL leverages priors to refine estimated poses, regressing them to any prototypical poses stored in memory. Our code, model, and data will be publicly available.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
# SWE-bench Multimodal: AIシステムはビジュアルソフトウェアドメインに一般化するのか? SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? ( http://arxiv.org/abs/2410.03859v1 ) ライセンス: Link先を確認	John Yang, Carlos E. Jimenez, Alex L. Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik R. Narasimhan, Diyi Yang, Sida I. Wang, Ofir Press,	(参考訳) ソフトウェアエンジニアリングのための自律システムは、バグを修正し、機能を開発することができる。これらのシステムは、GitHubリポジトリからソフトウェア問題を解決する能力を評価するSWE-bench(Jimenez et al , 2024a)で一般的に評価されている。しかし、SWE-benchはPythonリポジトリのみを使用し、主にテキストとして提示される問題ステートメントと、画像のような視覚的要素が欠如している。この限定的なカバレッジは、異なるプログラミング言語とパラダイムを使用する非表現のソフトウェアエンジニアリングドメイン(例えば、フロントエンド、ゲーム開発、DevOps)で、既存のシステムがどのように機能するか、という調査を動機付けます。そこで本稿では,SWE-bench Multimodal (SWE-bench M) を提案する。 SWE-bench Mは、Webインターフェース設計、ダイアグラム、データ視覚化、シンタックスハイライト、インタラクティブマッピングに使用される17のJavaScriptライブラリから収集された617のタスクインスタンスを特徴とする。各SWE-bench Mタスクインスタンスは、その問題ステートメントまたはユニットテストに少なくとも1つのイメージを含む。分析の結果,SWE-benchシステムはSWE-bench Mと競合し,視覚的問題解決や言語間の一般化に限界があることが判明した。最後に、SWE-agentの柔軟な言語非依存機能により、SWE-bench Mにおける代替案を大幅に上回り、タスクインスタンスの12%を次のベストシステムの6%と比較して解決できることを示す。 Autonomous systems for software engineering are now capable of fixing bugs and developing features. These systems are commonly evaluated on SWE-bench (Jimenez et al., 2024a), which assesses their ability to solve software issues from GitHub repositories. However, SWE-bench uses only Python repositories, with problem statements presented predominantly as text and lacking visual elements such as images. This limited coverage motivates our inquiry into how existing systems might perform on unrepresented software engineering domains (e.g., front-end, game development, DevOps), which use different programming languages and paradigms. Therefore, we propose SWE-bench Multimodal (SWE-bench M), to evaluate systems on their ability to fix bugs in visual, user-facing JavaScript software. 
SWE-bench M features 617 task instances collected from 17 JavaScript libraries used for web interface design, diagramming, data visualization, syntax highlighting, and interactive mapping. Each SWE-bench M task instance contains at least one image in its problem statement or unit tests. Our analysis finds that top-performing SWE-bench systems struggle with SWE-bench M, revealing limitations in visual problem-solving and cross-language generalization. Lastly, we show that SWE-agent's flexible language-agnostic features enable it to substantially outperform alternatives on SWE-bench M, resolving 12% of task instances compared to 6% for the next best system.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
# MDMP:不確実性を伴う教師あり動作予測のための多モード拡散 MDMP: Multi-modal Diffusion for supervised Motion Predictions with uncertainty ( http://arxiv.org/abs/2410.03860v1 ) ライセンス: Link先を確認	Leo Bringer, Joey Wilson, Kira Barton, Maani Ghaffari,	(参考訳) 本稿では,骨格データと動作のテキスト記述を統合・同期し,定量化可能な不確実性を伴う精緻な長期動作予測を生成する,動作予測のための多モード拡散モデル(MDMP)を提案する。既存の動作予測や動作生成の手法は,過去の動作かテキストプロンプトのいずれか一方にのみ依存しており,特に長い時間幅において精度や制御性に限界がある。提案手法のマルチモーダルな性質は人間の動きの文脈的理解を向上させ,グラフベースのトランスフォーマーフレームワークは空間的・時間的な動きのダイナミクスを効果的に捉える。その結果,我々のモデルは,長期動作の正確な予測において既存の生成手法を一貫して上回る。さらに,拡散モデルが持つ多様な予測モードを捉える能力を活用して不確実性を推定し,各関節について信頼度の異なる存在領域を組み込むことで,人間とロボットの相互作用における空間認識を大きく向上させる。 This paper introduces a Multi-modal Diffusion model for Motion Prediction (MDMP) that integrates and synchronizes skeletal data and textual descriptions of actions to generate refined long-term motion predictions with quantifiable uncertainty. Existing methods for motion forecasting or motion generation rely solely on either prior motions or text prompts, facing limitations with precision or control, particularly over extended durations. The multi-modal nature of our approach enhances the contextual understanding of human motion, while our graph-based transformer framework effectively captures both spatial and temporal motion dynamics. As a result, our model consistently outperforms existing generative techniques in accurately predicting long-term motions. Additionally, by leveraging diffusion models' ability to capture different modes of prediction, we estimate uncertainty, significantly improving spatial awareness in human-robot interactions by incorporating zones of presence with varying confidence levels for each body joint.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
# 多視点微分レンダリングによる単眼深度マップの微細化 Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering ( http://arxiv.org/abs/2410.03861v1 ) ライセンス: Link先を確認	Laura Fink, Linus Franke, Joachim Keinert, Marc Stamminger,	(参考訳) 画像のピクセルごとの深度を正確に再構成することは、コンピュータグラフィックス、コンピュータビジョン、ロボット工学における多くのタスクにとって不可欠である。本稿では,姿勢付きの複数画像から,ビュー間で一貫した詳細な深度マップを生成する新しい手法を提案する。位相的には完全だが計量的には不正確な深度マップを生成する単眼深度推定の進歩を活用し,微分可能レンダラーに基づく2段階の最適化プロセスでそれらを洗練する。単眼深度マップを入力として,まずStructure-from-Motionに基づいてこのマップを絶対距離にスケーリングし,深度を三角形サーフェスメッシュに変換する。次に,測光的および幾何的な整合性を課す局所最適化によって,この深度メッシュを洗練する。評価の結果,提案手法は困難な屋内シナリオにおいても高密度で詳細かつ高品質な深度マップを生成でき,最先端の深度再構成手法を上回ることが示された。プロジェクトの概要と補足資料は https://lorafib.github.io/ref_depth/ で見ることができる。 The accurate reconstruction of per-pixel depth for an image is vital for many tasks in computer graphics, computer vision, and robotics. In this paper, we present a novel approach to generate view consistent and detailed depth maps from a number of posed images. We leverage advances in monocular depth estimation, which generate topologically complete, but metrically inaccurate depth maps and refine them in a two-stage optimization process based on a differentiable renderer. Taking the monocular depth map as input, we first scale this map to absolute distances based on structure-from-motion and transform the depths to a triangle surface mesh. We then refine this depth mesh in a local optimization, enforcing photometric and geometric consistency. Our evaluation shows that our method is able to generate dense, detailed, high-quality depth maps, also in challenging indoor scenarios, and outperforms state-of-the-art depth reconstruction approaches. Overview and supplemental material of this project can be found at https://lorafib.github.io/ref_depth/.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
# レンズ空間密度による解像度変化によるマッパーのロバスト性向上 Improving Mapper's Robustness by Varying Resolution According to Lens-Space Density ( http://arxiv.org/abs/2410.03862v1 ) ライセンス: Link先を確認	Kaleb D. Ruscitti, Leland McInnes,	(参考訳) 本研究では,セマンティック空間全体で単一の解像度スケールを仮定する必要をなくし,パラメータの変化に対する結果のロバスト性を改善するMapperアルゴリズムの改良を提案する。これによりパラメータの選択が容易になり、特にMapperで使用されるMorse関数 $f$ において局所密度が大きく変動するデータセットで有効である。これは、この密度をMapperの被覆の選択に組み込むことによって達成される。さらに、いくつかの自然な仮定を満たす被覆に対して、Mapperが出力するグラフは依然としてデータのRips複体のReebグラフにボトルネック距離で収束し、かつ通常のMapperの被覆を用いる場合よりも多くの位相的特徴を捉えることを証明する。最後に,実装の詳細と計算実験の結果について述べる。また、付随する参照実装も提供する。 We propose an improvement to the Mapper algorithm that removes the assumption of a single resolution scale across semantic space, and improves the robustness of the results under change of parameters. This eases parameter selection, especially for datasets with highly variable local density in the Morse function $f$ used for Mapper. This is achieved by incorporating this density into the choice of cover for Mapper. Furthermore, we prove that for covers with some natural hypotheses, the graph output by Mapper still converges in bottleneck distance to the Reeb graph of the Rips complex of the data, but captures more topological features than when using the usual Mapper cover. Finally, we discuss implementation details, and include the results of computational experiments. We also provide an accompanying reference implementation.	翻訳日:2024-11-02 15:50:43 公開日:2024-10-04
# 世代別遺伝的アルゴリズムの選択再考による組合せ最適化問題の解法--組換えのための上界に基づく親選択戦略 Rethinking Selection in Generational Genetic Algorithms to Solve Combinatorial Optimization Problems: An Upper Bound-based Parent Selection Strategy for Recombination ( http://arxiv.org/abs/2410.03863v1 ) ライセンス: Link先を確認	Prashant Sankaran, Katie McConky,	(参考訳) 世代別GAにおける親選択の確率的選択戦略は遺伝的多様性の構築と探索の維持に役立つが、親選択のための情報的決定を行うプロセスによって得られる知識を活用する可能性を無視しており、多くの場合、大きな、挑戦的な最適化問題の非効率な探索につながる。本研究では, NP-ハード組合せ最適化問題を解くために, アッパーバウンドに基づくペアレント選択 (UBS) と呼ばれる世代GA設定において, リコンビネーションのための決定論的親選択戦略を提案する。具体的には、UBS戦略の一環として、MABフレームワークと修正 UCB1アルゴリズムを用いて親選択問題を定式化し、探索と利用を管理する。さらに,検索の進歩に関する知識を世代間で伝達し,検索を高速化する,ユニークな類似性に基づくアプローチを提案する。従来の確率的選択戦略と比較して,提案手法の有効性を示すため,チームオリエンテーリングと2次割り当ての2つのNPハード組合せ最適化問題を実験的に検討した。具体的には、まず、UBSの可能性と、関連するすべての選択戦略に最適な構成を決定するために、キャラクタリゼーション研究を行う。次に、これらの最適構成を用いた実験を比較研究の一環として実施する。評価研究の結果、ほとんどの場合、UBSは世代間での個体数に大きなバリエーションを好んでいることが明らかとなった。次に, 実験条件下でのNP-ハード組合せ最適化問題に対して, UBSは従来の確率的選択法よりも高速に, 高品質な解を探索できることを示した。 Existing stochastic selection strategies for parent selection in generational GA help build genetic diversity and sustain exploration; however, it ignores the possibility of exploiting knowledge gained by the process to make informed decisions for parent selection, which can often lead to an inefficient search for large, challenging optimization problems. This work proposes a deterministic parent selection strategy for recombination in a generational GA setting called Upper Bound-based Parent Selection (UBS) to solve NP-hard combinatorial optimization problems. Specifically, as part of the UBS strategy, we formulate the parent selection problem using the MAB framework and a modified UCB1 algorithm to manage exploration and exploitation. Further, we provided a unique similarity-based approach for transferring knowledge of the search progress between generations to accelerate the search. 
To demonstrate the effectiveness of the proposed UBS strategy in comparison to traditional stochastic selection strategies, we conduct experimental studies on two NP-hard combinatorial optimization problems: team orienteering and quadratic assignment. Specifically, we first perform a characterization study to determine the potential of UBS and the best configuration for all the selection strategies involved. Next, we run experiments using these best configurations as part of the comparison study. The results from the characterization studies reveal that UBS, in most cases, favors larger variations among the population between generations. Next, the comparison studies reveal that UBS can effectively search for high-quality solutions faster than traditional stochastic selection strategies on challenging NP-hard combinatorial optimization problems under given experimental conditions.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
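The abstract above frames parent selection as a multi-armed bandit (MAB) problem handled with a modified UCB1 rule. The sketch below shows only the standard UCB1 index, not the authors' modification, which the abstract does not specify. Treating each candidate parent as an arm, and its average observed reward (for example, how much its offspring improved) as the arm's mean, are assumptions made for illustration, as are the function names.

```python
from math import log, sqrt


def ucb1_index(mean_reward, pulls, total_pulls):
    """Standard UCB1 score: empirical mean plus an exploration bonus."""
    return mean_reward + sqrt(2.0 * log(total_pulls) / pulls)


def select_arm(means, pulls):
    """Deterministically pick the arm (candidate parent) to try next.

    means[i] is the average reward observed for arm i and pulls[i] is how
    often it has been selected. Arms never tried are selected first.
    """
    for arm, count in enumerate(pulls):
        if count == 0:
            return arm
    total = sum(pulls)
    scores = [ucb1_index(m, n, total) for m, n in zip(means, pulls)]
    return max(range(len(scores)), key=scores.__getitem__)
```

The bonus term shrinks as an arm is selected more often, so a rarely used parent with a mediocre average can still outrank a heavily used one. This is how the rule trades exploration against exploitation without any random draw.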
# DOTS:最適推論軌道探索によるLLMの動的推論学習 DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search ( http://arxiv.org/abs/2410.03864v1 ) ライセンス: Link先を確認	Murong Yue, Wenlin Yao, Haitao Mi, Dian Yu, Ziyu Yao, Dong Yu,	(参考訳) 近年,大規模言語モデル(LLM)の推論能力の向上が注目されている。これまでの研究では、ステップバイステップ思考、回答前の反映、プログラムの解決、それらの組み合わせなど、LLMの推論(「推論行動」と呼ばれる)を支援する様々な促進戦略の有効性が実証されてきた。しかしながら、これらの手法は、各質問の特徴やタスク解決 LLM の機能を考慮することなく、全ての質問に対して静的で事前定義された推論動作を均一に適用することが多い。本稿では,各質問の具体的特徴やタスク解決 LLM 固有の能力に合わせて,最適推論軌道探索により LLM を動的に推論できるアプローチである DOTS を提案する。私たちのアプローチには3つの重要なステップがあります。一様々な推論行動軌道に構成できる原子推論行動モジュールを定義すること。二特定課題解決 LLM の反復探索及び評価により、各訓練質問に対する最適な行動軌跡を求めること。三収集した最適軌跡を用いて、LCMを訓練し、目に見えない疑問の軌跡を立案すること。特に,タスク解決 LLM を指導するプランナーとして外部 LLM を微調整する,あるいはタスク解決 LLM を直接微調整する,という2つの学習パラダイムを提案する。 8つの推論タスクに対する実験により,静的推論手法とバニラ命令チューニング手法を一貫して上回る結果が得られた。さらなる分析により,LLMは問題複雑性に基づいて計算を調整し,より深い思考と難解な問題への推論を可能にすることが明らかとなった。 Enhancing the capability of large language models (LLMs) in reasoning has gained significant attention in recent years. Previous studies have demonstrated the effectiveness of various prompting strategies in aiding LLMs in reasoning (called "reasoning actions"), such as step-by-step thinking, reflecting before answering, solving with programs, and their combinations. However, these approaches often applied static, predefined reasoning actions uniformly to all questions, without considering the specific characteristics of each question or the capability of the task-solving LLM. In this paper, we propose DOTS, an approach enabling LLMs to reason dynamically via optimal reasoning trajectory search, tailored to the specific characteristics of each question and the inherent capability of the task-solving LLM. 
Our approach involves three key steps: i) defining atomic reasoning action modules that can be composed into various reasoning action trajectories; ii) searching for the optimal action trajectory for each training question through iterative exploration and evaluation for the specific task-solving LLM; and iii) using the collected optimal trajectories to train an LLM to plan for the reasoning trajectories of unseen questions. In particular, we propose two learning paradigms, i.e., fine-tuning an external LLM as a planner to guide the task-solving LLM, or directly fine-tuning the task-solving LLM with an internalized capability for reasoning actions planning. Our experiments across eight reasoning tasks show that our method consistently outperforms static reasoning techniques and the vanilla instruction tuning approach. Further analysis reveals that our method enables LLMs to adjust their computation based on problem complexity, allocating deeper thinking and reasoning to harder problems.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
# グラフ指向データベースによるドメイン特化言語モデルの強化:パフォーマンスとモデルの保守におけるパラダイムシフト Empowering Domain-Specific Language Models with Graph-Oriented Databases: A Paradigm Shift in Performance and Model Maintenance ( http://arxiv.org/abs/2410.03867v1 ) ライセンス: Link先を確認	Ricardo Di Pasquale, Soledad Represa,	(参考訳) データに支配される時代において、ドメイン特化言語の管理と利用は、様々なアプリケーションドメイン、特に業界特化要件を持つ領域において重要な課題として現れてきた。私たちの作業は、特定のアプリケーションドメインに固有の大量の短いテキスト文書を効果的に管理し、処理する必要があることによるものです。ドメイン固有の知識と専門知識を活用することで、これらのドメイン内の事実データを形作り、エンドユーザによる活用と理解を促進することを目的としています。我々の方法論の中心は、グラフ指向データベースとドメイン固有の言語モデルの統合であり、ターゲットとするドメイン内のテキストデータのシームレスな処理、分析、利用を容易にする。我々の研究は、ドメイン固有言語モデルとグラフ指向データベースのパートナーシップの変革の可能性を強調します。この協力は、メトリクスの使用、レイテンシの問題の緩和、説明可能性の向上、デバッグの強化、全体的なモデルパフォーマンスの向上を研究者とエンジニアが支援することを目的としている。今後は、AIエンジニアのガイドとして、グラフ指向のデータベースとともにドメイン固有の言語モデルの実装に関する貴重な洞察を提供し、また、この種のプロダクトのフルライフサイクルのメンテナンスにおいて、貴重な経験を提供することを期待します。 In an era dominated by data, the management and utilization of domain-specific language have emerged as critical challenges in various application domains, particularly those with industry-specific requirements. Our work is driven by the need to effectively manage and process large volumes of short text documents inherent in specific application domains. By leveraging domain-specific knowledge and expertise, our approach aims to shape factual data within these domains, thereby facilitating enhanced utilization and understanding by end-users. Central to our methodology is the integration of domain-specific language models with graph-oriented databases, facilitating seamless processing, analysis, and utilization of textual data within targeted domains. Our work underscores the transformative potential of the partnership of domain-specific language models and graph-oriented databases. This cooperation aims to assist researchers and engineers in metric usage, mitigation of latency issues, boosting explainability, enhancing debug and improving overall model performance. 
Moving forward, we envision our work as a guide for AI engineers, providing valuable insights for the implementation of domain-specific language models in conjunction with graph-oriented databases, and additionally providing valuable experience in the full life-cycle maintenance of this kind of product.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
# 言語モデルは個人主義的人的価値と嗜好について理にかなっているか? Can Language Models Reason about Individualistic Human Values and Preferences? ( http://arxiv.org/abs/2410.03868v1 ) ライセンス: Link先を確認	Liwei Jiang, Taylor Sorensen, Sydney Levine, Yejin Choi,	(参考訳) 近年、多元的アライメントを求める声は、AIシステムはすべての人々の多様なニーズに対処する必要があることを強調している。しかし、この分野の取り組みは、しばしば人々を、あらかじめ特定された多様性を定義する次元(例えば、人口統計、個人性、コミュニケーションスタイル)の固定されたバケツに分類し、スムーズなアウトや、個人主義的なバリエーションの豊富なスペクトルをステレオタイピングするリスクを負う。個人性を尊重する多様性の真正表現を実現するために,個人主義的アライメントを提案する。個人主義的アライメントは様々な形態をとることができるが、本稿では、影響力のある世界価値調査(WVS)から変換されたデータセットであるIndieValueCatalogを導入し、個人主義的価値推論の特定の課題について言語モデル(LM)を研究する。具体的には、個人の値表現ステートメントのサンプルを考えると、モデルは新しいケースにおける価値判断を予測することを任務とする。 IndieValueCatalogでは、アキュラシーを伴う個人主義的人間の価値を推論する、フロンティアLMの能力の限界を55%から65%程度しか明らかにしていない。さらに, 個人主義的価値の正確な記述は, 人口統計情報のみでは近似できないことが明らかとなった。また,提案した値不等式指数({\sigma}INEQUITY)を用いて,大域的個人主義的価値を推定する際のLMの部分性についても検討した。最後に、IndieValueCatalogを使用して、IndieValueReasoner(IndieValueReasoner)のシリーズをトレーニングし、モデルの個人的価値推論能力を向上し、グローバルな人間の価値に新しいパターンとダイナミクスを明らかにする。個人主義的アライメントを進めるための今後の研究課題と機会を概説する。 Recent calls for pluralistic alignment emphasize that AI systems should address the diverse needs of all people. Yet, efforts in this space often require sorting people into fixed buckets of pre-specified diversity-defining dimensions (e.g., demographics, personalities, communication styles), risking smoothing out or even stereotyping the rich spectrum of individualistic variations. To achieve an authentic representation of diversity that respects individuality, we propose individualistic alignment. While individualistic alignment can take various forms, in this paper, we introduce IndieValueCatalog, a dataset transformed from the influential World Values Survey (WVS), to study language models (LMs) on the specific challenge of individualistic value reasoning. Specifically, given a sample of an individual's value-expressing statements, models are tasked with predicting their value judgments in novel cases. 
With IndieValueCatalog, we reveal critical limitations in frontier LMs' abilities to reason about individualistic human values, with accuracies ranging only between 55% and 65%. Moreover, our results highlight that a precise description of individualistic values cannot be approximated only via demographic information. We also identify a partiality of LMs in reasoning about global individualistic values, as measured by our proposed Value Inequity Index ({\sigma}INEQUITY). Finally, we train a series of Individualistic Value Reasoners (IndieValueReasoner) using IndieValueCatalog to enhance models' individualistic value reasoning capability, revealing new patterns and dynamics in global human values. We outline future research challenges and opportunities for advancing individualistic alignment.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
# ステップ別編集による画像生成モデルのチェーン・オブ・ジェイルブレイク攻撃 Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step ( http://arxiv.org/abs/2410.03869v1 ) ライセンス: Link先を確認	Wenxuan Wang, Kuiyi Gao, Zihan Jia, Youliang Yuan, Jen-tse Huang, Qiuzhi Liu, Shuai Wang, Wenxiang Jiao, Zhaopeng Tu,	(参考訳) Stable DiffusionやDALL-E 3といったテキストベースの画像生成モデルは、コンテンツ作成やパブリッシュワークフローにおいて大きな可能性を秘めており、近年は注目されている。多様な鮮明な画像を生成できる優れた能力にもかかわらず、乱用、暴力、ポルノなどの有害なコンテンツの発生を防ぐために、かなりの努力が払われている。既存のモデルの安全性を評価するために、ステップバイステップの編集プロセスを通じて画像生成モデルを侵害する、Chain-of-Jailbreak (CoJ)アタックと呼ばれる新しいジェイルブレイク手法を導入する。具体的には、単一のプロンプトでセーフガードをバイパスできない悪意のあるクエリに対して、クエリを複数のサブクエリに意図的に分解する。画像生成モデルは、これらのサブクエリに基づいて画像を生成し、反復的に編集するように促される。本手法の有効性を評価するため,9つの安全シナリオ,3種類の編集操作,3種類の編集要素を含む包括的データセットであるCoJ-Benchを構築した。 GPT-4V, GPT-4o, Gemini 1.5, Gemini 1.5 Proの4つの広範に利用されている画像生成サービスの実験では、我々のCoJ攻撃手法が60%以上のケースでモデルの保護を回避できることが実証された。さらに,これらのモデルのCoJ攻撃に対する安全性を高めるために,CoJ攻撃の95%以上を効果的に防御できるプロンプトベースの方法であるThink Twice Promptingを提案する。 AI安全研究を促進するために、データセットとコードを公開しています。 Text-based image generation models, such as Stable Diffusion and DALL-E 3, hold significant potential in content creation and publishing workflows, making them the focus in recent years. Despite their remarkable capability to generate diverse and vivid images, considerable efforts are being made to prevent the generation of harmful content, such as abusive, violent, or pornographic material. To assess the safety of existing models, we introduce a novel jailbreaking method called Chain-of-Jailbreak (CoJ) attack, which compromises image generation models through a step-by-step editing process. Specifically, for malicious queries that cannot bypass the safeguards with a single prompt, we intentionally decompose the query into multiple sub-queries. The image generation models are then prompted to generate and iteratively edit images based on these sub-queries. 
To evaluate the effectiveness of our CoJ attack method, we constructed a comprehensive dataset, CoJ-Bench, encompassing nine safety scenarios, three types of editing operations, and three editing elements. Experiments on four widely used image generation services provided by GPT-4V, GPT-4o, Gemini 1.5, and Gemini 1.5 Pro demonstrate that our CoJ attack method can successfully bypass the safeguards of models in over 60% of cases, which significantly outperforms other jailbreaking methods (14%). Further, to enhance these models' safety against our CoJ attack method, we also propose an effective prompting-based method, Think Twice Prompting, that can successfully defend against over 95% of CoJ attacks. We release our dataset and code to facilitate AI safety research.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
# ピクセルからペルソナへ:人間-ロボット対話における自己擬人化の調査とモデル化 From Pixels to Personas: Investigating and Modeling Self-Anthropomorphism in Human-Robot Dialogues ( http://arxiv.org/abs/2410.03870v1 ) ライセンス: Link先を確認	Yu Li, Devamanyu Hazarika, Di Jin, Julia Hirschberg, Yang Liu,	(参考訳) ロボットにおける自己擬人化は、好みや感情を表現するなど、対話における人間のような特徴の表示を通じて現れる。本研究は,対話システムにおける自己擬人化応答と非自己擬人化応答の対比を概説し,対話データセット内の自己擬人化表現を体系的に分析する。これら2種類の応答に有意な差異を示し、あるタイプから別のタイプへの遷移を提案する。 Pix2Personaは、倫理的かつ魅力的なAIシステムを様々な実施形態で開発することを目的とした、新しいデータセットである。このデータセットは、既存のコーパスからのオリジナルの対話を保存し、ペア化された応答で強化する。我々の研究は、これまで未調査だったボット応答の新しいカテゴリを明らかにするだけでなく、AIシステムにおける自己擬人化レベルを動的に調整し、倫理的基準やユーザ期待に合わせるための、将来の研究の基盤となる。 Self-anthropomorphism in robots manifests itself through their display of human-like characteristics in dialogue, such as expressing preferences and emotions. Our study systematically analyzes self-anthropomorphic expression within various dialogue datasets, outlining the contrasts between self-anthropomorphic and non-self-anthropomorphic responses in dialogue systems. We show significant differences in these two types of responses and propose transitioning from one type to the other. We also introduce Pix2Persona, a novel dataset aimed at developing ethical and engaging AI systems in various embodiments. This dataset preserves the original dialogues from existing corpora and enhances them with paired responses: self-anthropomorphic and non-self-anthropomorphic for each original bot response. Our work not only uncovers a new category of bot responses that were previously under-explored but also lays the groundwork for future studies about dynamically adjusting self-anthropomorphism levels in AI systems to align with ethical standards and user expectations.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
# SPARTUN3D:大規模言語モデルにおける3次元世界の空間的理解 SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models ( http://arxiv.org/abs/2410.03878v1 ) ライセンス: Link先を確認	Yue Zhang, Zhiyang Xu, Ying Shen, Parisa Kordjamshidi, Lifu Huang,	(参考訳) 3次元世界を大規模言語モデル(3次元LLM)に統合することは、3次元シーン理解のための有望な研究方向である。しかし、現在の3DベースのLLMは、2つの重要な制限があるため、位置理解に不足している。 1) 既存の3Dデータセットは3Dシーンのグローバルな視点から構築され, 位置するコンテキストが欠如している。 2) 既存の3次元LLMのアーキテクチャは3次元シーンの空間表現と自然言語との明確な整合性を欠いており, 正確な空間推論を必要とするタスクにおける性能を制限している。 Spartun3Dと呼ばれる、様々な位置空間推論タスクを組み込んだスケーラブルな3Dデータセットを導入することで、これらの問題に対処する。さらに,既存の3次元空間アライメントモジュールをベースとしたSpartun3D-LLMを提案する。実験の結果,提案したデータセットとアライメントモジュールは,3次元LLMの位置空間的理解を著しく向上させることがわかった。 Integrating the 3D world into large language models (3D-based LLMs) has been a promising research direction for 3D scene understanding. However, current 3D-based LLMs fall short in situated understanding due to two key limitations: 1) existing 3D datasets are constructed from a global perspective of the 3D scenes and lack situated context. 2) the architectures of existing 3D-based LLMs lack explicit alignment between the spatial representations of 3D scenes and natural language, limiting their performance in tasks requiring precise spatial reasoning. We address these issues by introducing a scalable situated 3D dataset, named Spartun3D, that incorporates various situated spatial reasoning tasks. Furthermore, we propose Spartun3D-LLM, built on an existing 3D-based LLM but integrated with a novel situated spatial alignment module, aiming to enhance the alignment between 3D visual representations and their corresponding textual descriptions. Experimental results demonstrate that both our proposed dataset and alignment module significantly enhance the situated spatial understanding of 3D-based LLMs.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
# DiSK:ノイズ低減のための簡易カルマンフィルタを用いた微分プライベート最適化 DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction ( http://arxiv.org/abs/2410.03883v1 ) ライセンス: Link先を確認	Xinwei Zhang, Zhiqi Bu, Borja Balle, Mingyi Hong, Meisam Razaviyayn, Vahab Mirrokni,	(参考訳) 差別化プライバシ(DP)は、個々のデータのプライバシを保護するための堅牢なフレームワークを提供する。現代の機械学習モデルの訓練にDPを利用するために、近年は微分プライベートオプティマイザが広く使われている。最適化器を民営化するための一般的なアプローチは、個々の勾配をクリップし、クリップされた勾配に十分な大きなノイズを加えることである。このアプローチは、微調整タスクや少数のトレーニングパラメータを持つタスクにおいて、プライベートでないタスクと同等のパフォーマンスを持つDPオプティマイザの開発につながった。しかし、これらのオプティマイザを大規模トレーニングに適用した場合、顕著な性能低下が観察される。この劣化はDPの維持に必要なノイズ注入に起因し、オプティマイザのダイナミクスを損なう。本稿では,DPオプティマイザの性能向上を目的とした新しいフレームワークであるDiSKを紹介する。 DiSKは、制御と信号処理から引き出されたカルマンフィルタリングを用いて、民営化勾配を効果的に識別し、徐々に洗練された勾配推定を生成する。大規模トレーニングの実用性を確保するため,Kalmanフィルタプロセスを簡素化し,メモリと計算要求を最小化する。我々は,DiSKの理論的プライバシー利用トレードオフ保証を確立し,DPSGDのような標準DPオプティマイザよりも,イテレーションの複雑さが上向きであることを示す。 CIFAR-100やImageNet-1kといったビジョンタスクやGLUE、E2E、DARTといった言語微調整タスクなど、さまざまなタスクにわたる広範な実験は、DiSKの有効性を検証する。その結果、DPオプティマイザの性能を大幅に向上する能力を示し、いくつかのベンチマークで同じプライバシー制約の下で最先端の結果を上回った。 Differential privacy (DP) offers a robust framework for safeguarding individual data privacy. To utilize DP in training modern machine learning models, differentially private optimizers have been widely used in recent years. A popular approach to privatize an optimizer is to clip the individual gradients and add sufficiently large noise to the clipped gradient. This approach led to the development of DP optimizers that have comparable performance with their non-private counterparts in fine-tuning tasks or in tasks with a small number of training parameters. However, a significant performance drop is observed when these optimizers are applied to large-scale training. This degradation stems from the substantial noise injection required to maintain DP, which disrupts the optimizer's dynamics. This paper introduces DiSK, a novel framework designed to significantly enhance the performance of DP optimizers. 
DiSK employs Kalman filtering, a technique drawn from control and signal processing, to effectively denoise privatized gradients and generate progressively refined gradient estimations. To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands. We establish theoretical privacy-utility trade-off guarantees for DiSK, and demonstrate provable improvements over standard DP optimizers like DPSGD in terms of iteration complexity upper-bound. Extensive experiments across diverse tasks, including vision tasks such as CIFAR-100 and ImageNet-1k and language fine-tuning tasks such as GLUE, E2E, and DART, validate the effectiveness of DiSK. The results showcase its ability to significantly improve the performance of DP optimizers, surpassing state-of-the-art results under the same privacy constraints on several benchmarks.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
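The clip-and-noise privatization step described in the DiSK abstract above can be sketched as follows. This is a minimal illustration only: the function names and parameters (`clip`, `privatize`, `smooth`, `max_norm`, `noise_multiplier`, `gain`) are assumptions made for this example, and the constant-gain filter in `smooth` is a generic stand-in for denoising, not the simplified Kalman filter that DiSK actually uses.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum them, add Gaussian noise, then average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    n = len(per_sample_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


def smooth(previous, noisy, gain):
    """Blend the previous estimate with the new noisy gradient (constant gain).

    Assumption: a plain constant-gain filter, used here only to illustrate
    the idea of denoising privatized gradients over iterations.
    """
    return [(1.0 - gain) * p + gain * z for p, z in zip(previous, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = [0.0, 0.0]
    for _ in range(5):
        grads = [[3.0, 4.0], [0.0, 0.5], [1.0, -1.0]]
        noisy = privatize(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng)
        estimate = smooth(estimate, noisy, gain=0.3)
    print(estimate)
```

A smaller `gain` averages over more past iterations and suppresses more noise, at the price of reacting more slowly when the true gradient changes.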
# KidLM: 子ども向け言語モデルの改善 - 早期洞察と今後の方向性 KidLM: Advancing Language Models for Children -- Early Insights and Future Directions ( http://arxiv.org/abs/2410.03884v1 ) ライセンス: Link先を確認	Mir Tafseer Nayeem, Davood Rafiei,	(参考訳) 近年の研究では、子ども向けの教育ツール作成における大きな言語モデルの可能性を強調しているが、言語的ニュアンス、認知的ニーズ、安全基準といった、子固有の重要な特性を維持する上で大きな課題が残っている。本稿では,子ども固有の言語モデルの開発に向けた基礎的なステップを探求し,高品質な事前学習データの必要性を強調した。本研究では,子供用,時には子供用のコーパスを収集し,検証する,ユーザ中心のデータ収集パイプラインを提案する。さらに、ドメイン固有の児童言語データに基づいてマスキング確率を動的に調整し、モデルによる語彙や概念の優先順位付けを可能にする、新たなトレーニング目標であるStratified Maskingを提案する。実験により,本モデルは下級テキストの理解に優れ,ステレオタイプを避けて安全性を維持し,子どもの独特な嗜好を捉えていることが示された。さらに,子ども固有の言語モデリングにおける将来の研究・開発に有効な知見を提供する。 Recent studies highlight the potential of large language models in creating educational tools for children, yet significant challenges remain in maintaining key child-specific properties such as linguistic nuances, cognitive needs, and safety standards. In this paper, we explore foundational steps toward the development of child-specific language models, emphasizing the necessity of high-quality pre-training data. We introduce a novel user-centric data collection pipeline that involves gathering and validating a corpus specifically written for and sometimes by children. Additionally, we propose a new training objective, Stratified Masking, which dynamically adjusts masking probabilities based on our domain-specific child language data, enabling models to prioritize vocabulary and concepts more suitable for children. Experimental evaluations demonstrate that our model excels in understanding lower grade-level text, maintains safety by avoiding stereotypes, and captures children's unique preferences. Furthermore, we provide actionable insights for future research and development in child-specific language modeling.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
# 供給モード依存型故障率によるデュアルサーシング問題の解法 Solving Dual Sourcing Problems with Supply Mode Dependent Failure Rates ( http://arxiv.org/abs/2410.03887v1 ) ライセンス: Link先を確認	Fabian Akkerman, Nils Knofius, Matthieu van der Heijden, Martijn Mes,	(参考訳) 本稿では、サプライモード依存故障率による二重ソーシング問題、特に、ダウンタイムクリティカル資産の予備部品管理に関係のある問題について検討する。レジリエンスを高めるために、企業は従来の製造技術と付加的な製造技術の両方を用いて、デュアルソーシング戦略を採用するようになった。本稿では,これらの戦略が,部品特性や故障率の変動に対処してソーシングを最適化する方法について検討する。重要な課題は、これらの手法が生み出す部品の障害特性が、将来の需要に影響を及ぼすことである。そこで本研究では,新たな反復的ヒューリスティックおよび複数の強化学習手法と,内因性パラメータ学習(EPL)アプローチを併用して提案する。このEPLアプローチは、どんな学習方法とも互換性があり、単一のポリシーで複数の項目に対して様々な入力パラメータを処理できます。スタイリングされた環境では、我々の最良のポリシーは平均最適性ギャップを0.4%達成する。エネルギー部門におけるケーススタディでは、私たちの政策は91.1%のインスタンスでベースラインを上回り、平均コストは22.6%まで削減される。 This paper investigates dual sourcing problems with supply mode dependent failure rates, particularly relevant in managing spare parts for downtime-critical assets. To enhance resilience, businesses increasingly adopt dual sourcing strategies using both conventional and additive manufacturing techniques. This paper explores how these strategies can optimise sourcing by addressing variations in part properties and failure rates. A significant challenge is the distinct failure characteristics of parts produced by these methods, which influence future demand. To tackle this, we propose a new iterative heuristic and several reinforcement learning techniques combined with an endogenous parameterised learning (EPL) approach. This EPL approach - compatible with any learning method - allows a single policy to handle various input parameters for multiple items. In a stylised setting, our best policy achieves an average optimality gap of 0.4%. In a case study within the energy sector, our policies outperform the baseline in 91.1% of instances, yielding average cost savings up to 22.6%.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
# コスト感性意思決定に向けて Towards Cost Sensitive Decision Making ( http://arxiv.org/abs/2410.03892v1 ) ライセンス: Link先を確認	Yang Li, Junier Oliva,	(参考訳) 多くの現実世界の状況では、限られたデータや不確実なデータで決定を行う際に追加の関連情報を取得することができる。しかし、従来のRLアプローチでは、事前に取得されるすべての機能(例えば、MDP)を、取得できないデータ(例えば、POMDP)としてみなす必要がある。本研究では,環境から機能を積極的に獲得し,意思決定の質と確実性を向上させるとともに,機能獲得プロセスのコストとタスク決定プロセスの報酬を自動的にバランスさせるRLモデルを考察する。本稿では,Active-Acquisition POMDPを提案する。積極的部分観測環境におけるエージェントの支援と探索・探索ジレンマの緩和を目的として, モデルベースアプローチを開発し, 深層生成モデルを用いて特徴の依存関係を把握し, 未観測特徴をインプットする。命令は本質的にエージェントの信念を表している。動的モデルを用いて,AA-POMDPの両タイプを解く階層的RLアルゴリズムを開発した。実験により,本手法は既存のPOMDP-RLソリューションよりもかなり優れた性能を示すことが示された。 Many real-world situations allow for the acquisition of additional relevant information when making decisions with limited or uncertain data. However, traditional RL approaches either require all features to be acquired beforehand (e.g. in a MDP) or regard part of them as missing data that cannot be acquired (e.g. in a POMDP). In this work, we consider RL models that may actively acquire features from the environment to improve the decision quality and certainty, while automatically balancing the cost of feature acquisition process and the reward of task decision process. We propose the Active-Acquisition POMDP and identify two types of the acquisition process for different application domains. In order to assist the agent in the actively-acquired partially-observed environment and alleviate the exploration-exploitation dilemma, we develop a model-based approach, where a deep generative model is utilized to capture the dependencies of the features and impute the unobserved features. The imputations essentially represent the beliefs of the agent. Equipped with the dynamics model, we develop hierarchical RL algorithms to resolve both types of the AA-POMDPs. Empirical results demonstrate that our approach achieves considerably better performance than existing POMDP-RL solutions.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
# 少しの探索を備えたヒトアライメントチェス Human-aligned Chess with a Bit of Search ( http://arxiv.org/abs/2410.03893v1 ) ライセンス: Link先を確認	Yiming Zhang, Athul Paul Jacob, Vivian Lai, Daniel Fried, Daphne Ippolito,	(参考訳) チェスは長い間、人間の知性とマッチするAIの探求の試行場であり、近年ではチェスAIシステムがゲームで最強の人間を上回っている。しかしながら、これらのシステムは人間と協調するものではなく、全ての人間のパートナーのスキルレベルと一致したり、駒の動きを超えた人間らしい振る舞いをモデル化することができない。本稿では,この古典的なゲームにおいて,人工知能と人間の知能のギャップを埋めるために設計されたチェスをプレイするAIであるAllieを紹介する。 Allieは実際のチェスゲームのログシーケンスに基づいてトレーニングされ、熟考時間や投了などの指し手以外の行動を含め、スキルスペクトル全体にわたって人間のチェス選手の振る舞いをモデル化する。オフライン評価では、Allieは人間のような振る舞いを示す。すなわち、人間のチェスの指し手予測において既存の最先端を上回り、重要な局面で「熟考」する。モデルは各ゲーム状態に対して報酬を確実に割り当てることを学び、これは新たな時間適応型モンテカルロ木探索(MCTS)手順において、推論時に報酬関数として使用できる。この手順では、探索量は人間が同じ局面でどれだけ長く考えるかに依存する。適応探索は,1000から2600Eloのレーティングを持つプレイヤーに対する大規模なオンライン評価において,平均わずか49Eloのスキルギャップを実現し、探索なしおよび標準MCTSのベースラインを大幅に上回る。グランドマスターレベル(2500 Elo)の対戦相手に対して、適応探索を用いたAllieは、人間のみから学習しながら、グランドマスターに匹敵する強さを示す。 Chess has long been a testbed for AI's quest to match human intelligence, and in recent years, chess AI systems have surpassed the strongest humans at the game. However, these systems are not human-aligned; they are unable to match the skill levels of all human partners or model human-like behaviors beyond piece movement. In this paper, we introduce Allie, a chess-playing AI designed to bridge the gap between artificial and human intelligence in this classic game. Allie is trained on log sequences of real chess games to model the behaviors of human chess players across the skill spectrum, including non-move behaviors such as pondering times and resignations. In offline evaluations, we find that Allie exhibits humanlike behavior: it outperforms the existing state-of-the-art in human chess move prediction and "ponders" at critical positions. The model learns to reliably assign reward at each game state, which can be used at inference as a reward function in a novel time-adaptive Monte-Carlo tree search (MCTS) procedure, where the amount of search depends on how long humans would think in the same positions. 
Adaptive search enables remarkable skill calibration; in a large-scale online evaluation against players with ratings from 1000 to 2600 Elo, our adaptive search method leads to a skill gap of only 49 Elo on average, substantially outperforming search-free and standard MCTS baselines. Against grandmaster-level (2500 Elo) opponents, Allie with adaptive search exhibits the strength of a fellow grandmaster, all while learning exclusively from humans.	翻訳日:2024-11-02 15:40:54 公開日:2024-10-04
# 壁紙は醜い:視覚と言語を用いた屋内のローカライゼーション The Wallpaper is Ugly: Indoor Localization using Vision and Language ( http://arxiv.org/abs/2410.03900v1 ) ライセンス: Link先を確認	Seth Pate, Lawson L. S. Wong,	(参考訳) 本研究では,自然言語クエリと環境からのイメージを用いて,マッピングした屋内環境におけるユーザの位置を探索する作業について検討する。近年の事前学習された視覚言語モデルに基づいて,テキスト記述と環境中の位置の画像との類似度スコアを学習する。このスコアは、言語クエリに最もよくマッチする場所を特定し、ユーザの位置を推定します。私たちのアプローチでは、トレーニング中に見られなかった環境、テキスト、イメージをローカライズすることが可能です。 1つのモデル、微調整のCLIPは、評価において人間より優れていた。 We study the task of locating a user in a mapped indoor environment using natural language queries and images from the environment. Building on recent pretrained vision-language models, we learn a similarity score between text descriptions and images of locations in the environment. This score allows us to identify locations that best match the language query, estimating the user's location. Our approach is capable of localizing on environments, text, and images that were not seen during training. One model, finetuned CLIP, outperformed humans in our evaluation.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
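The retrieval step described in the abstract above (score each location against the language query, then pick the best match) might look roughly like the sketch below. The embeddings are hand-written toy vectors and the location names are hypothetical; in the paper the scores come from a learned vision-language model, which is not reproduced here, and plain cosine similarity is an assumption made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def localize(query_embedding, location_embeddings):
    """Return the location whose image embedding best matches the text query."""
    return max(
        location_embeddings,
        key=lambda name: cosine(query_embedding, location_embeddings[name]),
    )


if __name__ == "__main__":
    # Toy, hand-made embeddings (hypothetical values, not model outputs).
    locations = {"kitchen": [0.9, 0.2], "hallway": [0.0, 1.0]}
    query = [1.0, 0.1]  # stands in for an embedded text description
    print(localize(query, locations))
```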
# 目標認識型コントラスト損失の増大によるノード表現の改善 Improving Node Representation by Boosting Target-Aware Contrastive Loss ( http://arxiv.org/abs/2410.03901v1 ) ライセンス: Link先を確認	Ying-Chun Lin, Jennifer Neville,	(参考訳) グラフは、複雑な接続をキャプチャするノードとエッジを持つエンティティ間の複雑な関係をモデル化する。ノード表現学習では、ノードを低次元の埋め込みに変換する。これらの埋め込みは典型的には下流タスクの機能として使用される。そのため、その品質はタスクのパフォーマンスに大きな影響を与えます。ノード表現学習のための既存のアプローチは、半教師付き、非教師なし、自己教師付きパラダイムである。グラフ領域では、(半教師付き学習はクラスラベルに基づくモデルのみを最適化し、他の豊富なグラフ信号を無視し、一般化を制限する。自己教師付き学習や教師なし学習は、基礎となるグラフ信号をよりよくキャプチャする表現を生成するが、これらのキャプチャされた信号が下流のターゲットタスクに有用であることは、様々である。このギャップを埋めるために,目標タスクとノード表現間の相互情報を自己教師型学習プロセスで最大化し,目標タスク性能を向上させることを目的としたターゲット認識コントラスト学習(Target-Aware Contrastive Learning, CL)を導入する。これは、XGBoost Sampler (XGSampler) というサンプリング機能によって実現され、提案されているTarget-Aware Contrastive Loss (XTCL) の適切な正のサンプルをサンプリングする。 XTCLを最小化することにより、ターゲット認識CLはターゲットタスクとノード表現間の相互情報を増大させ、モデルの一般化が向上する。さらに、XGSamplerは適切な正のサンプルをサンプリングするための重みを示すことによって、各信号の解釈可能性を高める。実験により,XTCLはノード分類とリンク予測タスクの2つのタスクにおいて,最先端モデルと比較して性能を著しく向上することを示した。 Graphs model complex relationships between entities, with nodes and edges capturing intricate connections. Node representation learning involves transforming nodes into low-dimensional embeddings. These embeddings are typically used as features for downstream tasks. Therefore, their quality has a significant impact on task performance. Existing approaches for node representation learning span (semi-)supervised, unsupervised, and self-supervised paradigms. In graph domains, (semi-)supervised learning often only optimizes models based on class labels, neglecting other abundant graph signals, which limits generalization. While self-supervised or unsupervised learning produces representations that better capture underlying graph signals, the usefulness of these captured signals for downstream target tasks can vary. 
To bridge this gap, we introduce Target-Aware Contrastive Learning (Target-aware CL), which aims to enhance target task performance by maximizing the mutual information between the target task and node representations with a self-supervised learning process. This is achieved through a sampling function, XGBoost Sampler (XGSampler), to sample proper positive examples for the proposed Target-Aware Contrastive Loss (XTCL). By minimizing XTCL, Target-aware CL increases the mutual information between the target task and node representations, such that model generalization is improved. Additionally, XGSampler enhances the interpretability of each signal by showing the weights for sampling the proper positive examples. We show experimentally that XTCL significantly improves the performance on two target tasks, node classification and link prediction, compared to state-of-the-art models.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
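For readers unfamiliar with contrastive losses, the sketch below shows a generic InfoNCE-style loss for one anchor node, one positive example, and a set of negatives. It is not XTCL itself: the abstract does not give the loss formula, and the positive-example sampling performed by XGSampler is not modeled. The function name `info_nce` and the `temperature` default are assumptions made for this illustration.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss for a single anchor embedding.

    The loss is low when the anchor is closer (by dot product) to the
    positive than to the negatives.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    pos = math.exp(dot(anchor, positive) / temperature)
    neg = sum(math.exp(dot(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    print(info_nce(anchor, [1.0, 0.0], [[0.0, 1.0]]))  # well-chosen positive
    print(info_nce(anchor, [0.0, 1.0], [[1.0, 0.0]]))  # poorly chosen positive
```

Which examples are treated as positives determines what the representation learns, which is why the abstract puts the emphasis on the sampler rather than on the form of the loss.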
# 大域的に駆動される中性原子配列による量子最適化 Quantum optimization with globally driven neutral atom arrays ( http://arxiv.org/abs/2410.03902v1 ) ライセンス: Link先を確認	Martin Lanthaler, Kilian Ender, Clemens Dlaska, Wolfgang Lechner,	(参考訳) 本稿では,大域レーザードライブのみを必要とする中性原子の配列上の高次項を含む任意の接続性を有する組合せ最適化問題のスケーラブルな符号化法を提案する。我々のアプローチは、少数の問題に依存しないガジェットのモジュラー配置に依存している。これらのガジェットは、そのようなデバイスに固有の単位ディスクグラフ上の最大ウェイト独立集合(MWIS)問題を表す。 MWIS重みをサイト依存のレーザーデチューニングでプログラミングする代わりに、このスキームは補助原子の体系的な配置に依存している。これらの補助原子は、問題固有のプログラミングと、長距離相互作用の尾から生じる不要な効果の緩和の両方に同時に使用できることを示す。 We propose a scalable encoding of combinatorial optimization problems with arbitrary connectivity, including higher-order terms, on arrays of trapped neutral atoms requiring only a global laser drive. Our approach relies on modular arrangements of a small number of problem-independent gadgets. These gadgets represent maximum-weight independent set (MWIS) problems on unit-disk graphs, which are native to such devices. Instead of programming MWIS weights with site-dependent laser detunings, the scheme relies on systematic placements of auxiliary atoms. We show that these auxiliary atoms can be simultaneously used for both problem-specific programming and the mitigation of unwanted effects originating from the tails of long-range interactions.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
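To make the underlying classical problem concrete, the sketch below builds a unit-disk graph from point coordinates and solves maximum-weight independent set (MWIS) by exhaustive search. It illustrates the problem definition only, not the gadget encoding or the auxiliary-atom scheme proposed in the paper; the function names and the `radius` parameter are assumptions made for this example, and the search is exponential in the number of vertices.

```python
from itertools import combinations
import math


def unit_disk_edges(points, radius):
    """Connect every pair of points closer than `radius` (a unit-disk graph)."""
    return {
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) < radius
    }


def mwis(weights, edges):
    """Exhaustively search for a maximum-weight independent set.

    Returns (vertex tuple, total weight). Edges must be stored as (i, j)
    with i < j, which is what unit_disk_edges produces.
    """
    n = len(weights)
    best_set, best_weight = (), 0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # Skip subsets that contain two adjacent vertices.
            if any(pair in edges for pair in combinations(subset, 2)):
                continue
            weight = sum(weights[i] for i in subset)
            if weight > best_weight:
                best_set, best_weight = subset, weight
    return best_set, best_weight


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # three atoms on a line
    edges = unit_disk_edges(points, radius=1.5)
    print(mwis([1, 3, 1], edges))
    print(mwis([2, 3, 2], edges))
```

In the example, changing the vertex weights changes which set is optimal even though the geometry stays fixed.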
# 聞いた? AADGの導入: 音声異常検出におけるベンチマークデータ生成フレームワーク Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection ( http://arxiv.org/abs/2410.03904v1 ) ライセンス: Link先を確認	Ksheeraja Raghavan, Samiran Gode, Ankit Shah, Surabhi Raghavan, Wolfram Burgard, Bhiksha Raj, Rita Singh,	(参考訳) 本稿では,異常検出と局所化に特化して設計された,新しい汎用オーディオ生成フレームワークを提案する。産業や機械関連音に主にフォーカスする既存のデータセットとは異なり、我々のフレームワークは幅広い環境に焦点を当てており、特にビデオやテレフォニックオーディオなど、音声データしか利用できない現実世界のシナリオで有用である。このようなデータを生成するために,LLM-Moduloフレームワークにヒントを得た新しい手法を提案する。このツールはモジュール式で、プラグアンドプレイのアプローチを可能にする。まず、LLMを使って、もっともらしい現実世界のシナリオを予測します。 LLMはさらに、これらを結合してコヒーレントな全体を生成するための構成音、順序、方法を取り出す。 LLM-Moduloフレームワークと同様に、各出力ステージの厳密な検証を含み、生成されたデータの信頼性を保証する。このフレームワークで生成されたデータは、異常検出アプリケーションのベンチマークとして機能し、オーディオデータ、特にアウト・オブ・ディストリビューションのケースを扱う際にトレーニングされたモデルの性能を向上させる可能性がある。我々の貢献は、オーディオ異常検出リソースにおける重要な空白を埋め、多様なリアルなオーディオデータを生成するためのスケーラブルなツールを提供する。 We introduce a novel, general-purpose audio generation framework specifically designed for anomaly detection and localization. Unlike existing datasets that predominantly focus on industrial and machine-related sounds, our framework focuses on a broader range of environments, particularly useful in real-world scenarios where only audio data are available, such as in video-derived or telephonic audio. To generate such data, we propose a new method inspired by the LLM-Modulo framework, which leverages large language models (LLMs) as world models to simulate such real-world scenarios. This tool is modular, allowing a plug-and-play approach. It operates by first using LLMs to predict plausible real-world scenarios. An LLM further extracts the constituent sounds, the order and the way in which these should be merged to create coherent wholes. Much like the LLM-Modulo framework, we include rigorous verification of each output stage, ensuring the reliability of the generated data. 
The data produced using the framework serves as a benchmark for anomaly detection applications, potentially enhancing the performance of models trained on audio data, particularly in handling out-of-distribution cases. Our contributions thus fill a critical void in audio anomaly detection resources and provide a scalable tool for generating diverse, realistic audio data.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# PersonalSum: 大規模言語モデルのためのユーザ目的のパーソナライズ・サマリゼーションデータセット PersonalSum: A User-Subjective Guided Personalized Summarization Dataset for Large Language Models ( http://arxiv.org/abs/2410.03905v1 ) ライセンス: Link先を確認	Lemei Zhang, Peng Liu, Marcus Tiedemann Oekland Henriksboe, Even W. Lauvrak, Jon Atle Gulla, Heri Ramampiaro,	(参考訳) 近年の自然言語処理の急速な進歩により、大規模言語モデル(LLM)が生み出す一般的な要約が、ジャーナリストなどの専門家の注釈を上回ることがあることが、人間による評価によって示されている。しかし、これらの一般的な要約が一般人の個人のニーズに合致するかどうかについては、限定的な研究がなされている。最大の障害は、一般大衆からの人間による注釈付きデータセットの欠如である。パーソナライズされた要約に関する既存の研究は、しばしば、一般的な要約データセットから生成された疑似データセットや、特定の名前のエンティティや、アノテータのイニシアチブのない仮説的なタスクから収集された生成された要約の長さと特異性など、他の側面に焦点を当てた制御可能なタスクに依存している。このギャップを埋めるために、PersonalSumと呼ばれる高品質でパーソナライズされた手作業による要約データセットを提案する。このデータセットは、LLMが生成した一般的な要約とパブリック読者の焦点が異なっているかどうかを調査する最初のものである。これには、ユーザープロファイル、与えられた記事のソース文を伴うパーソナライズされた要約、およびソースと共にマシン生成されたジェネリック要約が含まれる。 LLMを用いたパーソナライズド・サマリーの生成に影響を及ぼすいくつかの個人的信号(エンティティ、トピック、プロット、記事の構造)を、数ショットのインコンテキスト学習シナリオで検討する。我々の予備的な結果と分析は、エンティティ/トピックがユーザの多様な嗜好に影響を与える重要な要因の1つであり、パーソナライズされた要約は、既存のLCMにとって重要な課題であることを示している。 With the rapid advancement of Natural Language Processing in recent years, numerous studies have shown that generic summaries generated by Large Language Models (LLMs) can sometimes surpass those annotated by experts, such as journalists, according to human evaluations. However, there is limited research on whether these generic summaries meet the individual needs of ordinary people. The biggest obstacle is the lack of human-annotated datasets from the general public. Existing work on personalized summarization often relies on pseudo datasets created from generic summarization datasets or controllable tasks that focus on specific named entities or other aspects, such as the length and specificity of generated summaries, collected from hypothetical tasks without the annotators' initiative. To bridge this gap, we propose a high-quality, personalized, manually annotated abstractive summarization dataset called PersonalSum. 
This dataset is the first to investigate whether the focus of public readers differs from the generic summaries generated by LLMs. It includes user profiles, personalized summaries accompanied by source sentences from given articles, and machine-generated generic summaries along with their sources. We investigate several personal signals - entities/topics, plot, and structure of articles - that may affect the generation of personalized summaries using LLMs in a few-shot in-context learning scenario. Our preliminary results and analysis indicate that entities/topics are merely one of the key factors that impact the diverse preferences of users, and personalized summarization remains a significant challenge for existing LLMs.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# ゲートセットパウリ雑音の効率的な自己整合学習 Efficient self-consistent learning of gate set Pauli noise ( http://arxiv.org/abs/2410.03906v1 ) ライセンス: Link先を確認	Senrui Chen, Zhihan Zhang, Liang Jiang, Steven T. Flammia,	(参考訳) 量子ノイズを理解することは、実用的な量子情報処理システムを構築するための重要なステップである。パウリノイズは量子ベンチマーク、誤差緩和、誤り訂正に広く応用された有用なモデルである。集中的な研究にもかかわらず、現存するほとんどの研究は、ゲートセット全体を扱うのではなく、特定のゲートに関連するパウリノイズチャネルの学習に重点を置いている。自己整合性,完全性,効率的性を同時に備えた学習アルゴリズムはまだ確立されていない。本研究では,一組の量子ゲート,状態準備,測定を行うゲートセットパウリ雑音学習の課題について検討する。代数グラフ理論のツールを用いて、任意の線形アンザッツを持つパウリ雑音モデルに対する自己整合的に学習可能な自由度を解析的に特徴付け、学習可能な全ての情報を効率的に学習するための設計実験を行う。具体的には, ゲートノイズに関する学習可能な情報はすべて, 相対的精度で学習可能であることを示す。次に, 空間的局所雑音や準局所雑音など) および実験的に関連するゲートセット(平行なCZゲートなど)に対して, 具体的動機付けのアンザッツに適用することにより, 理論の柔軟性を実証する。これらの結果は、量子ノイズ学習の理論的理解を深めるだけでなく、既存および近未来の量子情報処理装置を特徴付けるための実現可能なレシピを提供する。 Understanding quantum noise is an essential step towards building practical quantum information processing systems. Pauli noise is a useful model that has been widely applied in quantum benchmarking, error mitigation, and error correction. Despite intensive study, most existing works focus on learning Pauli noise channels associated with some specific gates rather than treating the gate set as a whole. A learning algorithm that is self-consistent, complete, and efficient at the same time is yet to be established. In this work, we study the task of gate set Pauli noise learning, where a set of quantum gates, state preparation, and measurements all suffer from unknown Pauli noise channels with a customized noise ansatz. Using tools from algebraic graph theory, we analytically characterize the self-consistently learnable degrees of freedom for Pauli noise models with arbitrary linear ansatz, and design experiments to efficiently learn all the learnable information. Specifically, we show that all learnable information about the gate noise can be learned to relative precision, under mild assumptions on the noise ansatz. 
We then demonstrate the flexibility of our theory by applying it to concrete physically motivated ansatzes (such as spatially local or quasi-local noise) and experimentally relevant gate sets (such as parallel CZ gates). These results not only enhance the theoretical understanding of quantum noise learning, but also provide a feasible recipe for characterizing existing and near-future quantum information processing devices.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# ActPlan-1K: 家庭活動における視覚言語モデルの手続き計画能力のベンチマーク ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities ( http://arxiv.org/abs/2410.03907v1 ) ライセンス: Link先を確認	Ying Su, Zhan Ling, Haochen Shi, Jiayang Cheng, Yauwai Yim, Yangqiu Song,	(参考訳) 大規模言語モデル(LLM)は、その強力な推論能力のために、テキストタスク記述の処理と、具体化されたAIタスクの手続き計画を達成するために採用されている。しかし、マルチモーダルタスク入力を考慮した場合、視覚言語モデル(VLM)がどのように振る舞うかについてはまだ研究されていない。代替的なタスク状況に対するモデルの推論能力を評価する反実仮想プランニングも、十分に活用されていない。マルチモーダル面と反実仮想面の両方の計画能力を評価するために,ActPlan-1Kを提案する。 ActPlan-1KはChatGPTと家庭用アクティビティシミュレータiGibson2に基づいて構築されたマルチモーダル計画ベンチマークである。ベンチマークは153のアクティビティと1,187のインスタンスで構成されている。 1つのアクティビティを記述する各インスタンスには、自然言語タスク記述とシミュレータからの複数の環境イメージがある。各インスタンスのゴールドプランは、提供されたシーンのオブジェクトに対するアクションシーケンスである。典型的VLMにおいて,正当性および常識満足度の評価を行った。現在のVLMは、正常な活動と反実仮想的な活動の両方のために、人間レベルの手続き的な計画を作成するのに苦戦していることが判明した。さらに、BLEURTモデルを微調整して自動評価指標を提供し、将来のベンチマーク研究を促進する。 Large language models (LLMs) have been adopted to process textual task descriptions and accomplish procedural planning in embodied AI tasks because of their powerful reasoning ability. However, there is still a lack of study on how vision language models (VLMs) behave when multi-modal task inputs are considered. Counterfactual planning, which evaluates the model's reasoning ability over alternative task situations, is also underexploited. In order to evaluate the planning ability of both multi-modal and counterfactual aspects, we propose ActPlan-1K. ActPlan-1K is a multi-modal planning benchmark constructed based on ChatGPT and household activity simulator iGibson2. The benchmark consists of 153 activities and 1,187 instances. Each instance describing one activity has a natural language task description and multiple environment images from the simulator. The gold plan of each instance is action sequences over the objects in provided scenes. Both the correctness and commonsense satisfaction are evaluated on typical VLMs. 
It turns out that current VLMs still struggle to generate human-level procedural plans for both normal and counterfactual activities. We further provide automatic evaluation metrics by fine-tuning a BLEURT model to facilitate future research on our benchmark.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# まだ十分ではない! 併存するメンタルヘルス診断のための大規模言語モデルの評価 Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis ( http://arxiv.org/abs/2410.03908v1 ) ライセンス: Link先を確認	Amey Hengle, Atharva Kulkarni, Shantanu Patankar, Madhumitha Chandrasekaran, Sneha D'Silva, Jemima Jacob, Rashmi Gupta,	(参考訳) 本研究では,ソーシャルメディア投稿からのうつ病と不安の併存症分類のための,新しい第一種ベンチマークであるANGSTを紹介する。異なるメンタルヘルス障害間の複雑な相互作用を、孤立した状態として扱うことで、しばしば単純化する現代のデータセットとは異なり、ANGSTは複数のラベルの分類を可能にし、各ポストをうつ病や不安を示すものとして同時に識別することができる。専門家心理学者による2876の綿密な注釈付き投稿と7667の銀ラベル付き投稿からなり、ANGSTはオンラインのメンタルヘルス談話のより代表的なサンプルを提示する。さらに、Mental-BERT から GPT-4 まで、様々な最先端言語モデルを用いてANGST をベンチマークする。我々の結果は、複雑な診断シナリオにおけるこれらのモデルの能力と限界に関する重要な洞察を提供する。 GPT-4は一般的に他のモデルよりも優れていますが、マルチクラスの併存症分類でF1スコアが72%を超えるモデルはなく、メンタルヘルス診断に言語モデルを適用する上で進行中の課題を浮き彫りにしています。 In this study, we introduce ANGST, a novel, first-of-its-kind benchmark for depression-anxiety comorbidity classification from social media posts. Unlike contemporary datasets that often oversimplify the intricate interplay between different mental health disorders by treating them as isolated conditions, ANGST enables multi-label classification, allowing each post to be simultaneously identified as indicating depression and/or anxiety. Comprising 2876 posts meticulously annotated by expert psychologists and an additional 7667 silver-labeled posts, ANGST offers a more representative sample of online mental health discourse. Moreover, we benchmark ANGST using various state-of-the-art language models, ranging from Mental-BERT to GPT-4. Our results provide significant insights into the capabilities and limitations of these models in complex diagnostic scenarios. While GPT-4 generally outperforms other models, none achieve an F1 score exceeding 72% in multi-class comorbid classification, underscoring the ongoing challenges in applying language models to mental health diagnostics.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# 収益の株価トレンド予測のための基礎分析の活用 Leveraging Fundamental Analysis for Stock Trend Prediction for Profit ( http://arxiv.org/abs/2410.03913v1 ) ライセンス: Link先を確認	John Phan, Hung-Fu Chang,	(参考訳) 本稿では,機械学習モデル,Long Short-Term Memory (LSTM), 1次元畳み込みニューラルネットワーク (1D CNN) およびロジスティック回帰 (LR) を用いて,基本解析に基づく株価トレンドの予測を行う。技術や感情分析を主に活用する既存の研究とは異なり、当社は企業の財務諸表と本質的な価値をトレンド予測に活用することを強調している。 2019年から2023年にかけて、各分野の上場企業から得られた269件のデータポイントのデータセットを用いて、2つの予測タスク、すなわち年次株価差(ASPD)と現在の株価と内在価値の差(DCSPIV)を定式化するために、主要な金融比率とDCF(Discounted Cash Flow)モデルを用いています。これらのタスクはそれぞれ、年間利益の可能性と現在の収益性を評価する。その結果、LRモデルはCNNやLSTMモデルより優れており、ASPDでは平均テスト精度は74.66%、DCSPIVでは72.85%であることがわかった。本研究は,学術研究と実践的投資戦略の両方に有用な洞察を提供するとともに,ストック予測のための機械学習への基礎解析の統合に関する限られた文献に寄与する。基本データを活用することで、当社のアプローチは長期的な株価トレンド予測の可能性を強調し、意思決定プロセスにおけるポートフォリオマネージャを支援します。 This paper investigates the application of machine learning models, Long Short-Term Memory (LSTM), one-dimensional Convolutional Neural Networks (1D CNN), and Logistic Regression (LR), for predicting stock trends based on fundamental analysis. Unlike most existing studies that predominantly utilize technical or sentiment analysis, we emphasize the use of a company's financial statements and intrinsic value for trend forecasting. Using a dataset of 269 data points from publicly traded companies across various sectors from 2019 to 2023, we employ key financial ratios and the Discounted Cash Flow (DCF) model to formulate two prediction tasks: Annual Stock Price Difference (ASPD) and Difference between Current Stock Price and Intrinsic Value (DCSPIV). These tasks assess the likelihood of annual profit and current profitability, respectively. Our results demonstrate that LR models outperform CNN and LSTM models, achieving an average test accuracy of 74.66% for ASPD and 72.85% for DCSPIV. This study contributes to the limited literature on integrating fundamental analysis into machine learning for stock prediction, offering valuable insights for both academic research and practical investment strategies. 
By leveraging fundamental data, our approach highlights the potential for long-term stock trend prediction, supporting portfolio managers in their decision-making processes.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# 分布誘導型アクティブ特徴獲得 Distribution Guided Active Feature Acquisition ( http://arxiv.org/abs/2410.03915v1 ) ライセンス: Link先を確認	Yang Li, Junier Oliva,	(参考訳) 人間は、不完全で不明瞭なデータしかないインスタンスについても日常的に推論を行う(そして、さらなる特徴を得るコストを考慮する)。対照的に、MLの多くは、すべての特徴が観測済みでインスタンスに関する追加情報が不要となる、非現実的で無菌的な設定に費やされている。ここでは、静的なMLの枠を超えて、環境と対話してオンザフライで新しい情報を得ることのできるアクティブ特徴獲得(AFA)フレームワークを開発する。このフレームワークは、1) 不完全な特徴に直面してもインスタンスについて推論を行い、2) 当該インスタンスについて追加情報を得るための特徴獲得の計画を決定することができる。データに存在する情報と条件依存を理解するバックボーン上に、AFAフレームワークを構築します。まず、任意の特徴サブセットに対する依存関係を捉える生成モデルの構築方法と、これらのモデルを貪欲法による特徴獲得に利用する方法を示す。次に、生成モデルから得られる補助情報(side-information)と補助報酬を用いて、AFAに対するRLエージェントのトレーニングを誘導できることを示す。また,AFAモデルを現実のシナリオに展開する上での2つの重要な要因,すなわち解釈可能性と堅牢性についても検討する。大規模な実験は、AFAフレームワークの最先端のパフォーマンスを実証します。 Human agents routinely reason on instances with incomplete and muddied data (and weigh the cost of obtaining further features). In contrast, much of ML is devoted to the unrealistic, sterile environment where all features are observed and further information on an instance is obviated. Here we extend past static ML and develop an active feature acquisition (AFA) framework that interacts with the environment to obtain new information on-the-fly and can: 1) make inferences on an instance in the face of incomplete features, 2) determine a plan for feature acquisitions to obtain additional information on the instance at hand. We build our AFA framework on a backbone of understanding the information and conditional dependencies that are present in the data. First, we show how to build generative models that can capture dependencies over arbitrary subsets of features and employ these models for acquisitions in a greedy scheme. We then show that it is possible to guide the training of RL agents for AFA via side-information and auxiliary rewards stemming from our generative models. We also examine two important factors for deploying AFA models in real-world scenarios, namely interpretability and robustness. 
Extensive experiments demonstrate the state-of-the-art performance of our AFA framework.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# STONE: アクティブ3次元オブジェクト検出のためのサブモジュール最適化フレームワーク STONE: A Submodular Optimization Framework for Active 3D Object Detection ( http://arxiv.org/abs/2410.03918v1 ) ライセンス: Link先を確認	Ruiyu Mao, Sarthak Kumar Maharana, Rishabh K Iyer, Yunhui Guo,	(参考訳) 3Dオブジェクト検出は、自律運転やロボット工学など、様々な新興アプリケーションにとって基本的に重要である。正確な3Dオブジェクト検出器をトレーニングするための重要な要件は、大量のLiDARベースのポイントクラウドデータが利用可能であることである。残念なことに、潜在的な各オブジェクトに対して正確な3Dバウンディングボックスとセマンティックラベルが必要となるため、ポイントクラウドデータのラベル付けは非常に難しい。本稿では,3Dオブジェクト検出器のトレーニングのラベル付けコストを大幅に削減する,統合されたアクティブな3Dオブジェクト検出フレームワークを提案する。本フレームワークは, アクティブな3次元物体検出の問題に特化して, サブモジュラー最適化の新たな定式化を基礎としている。特に, アクティブな3Dオブジェクト検出に関連する2つの基本的な課題に対処する: データ不均衡と, 様々な難易度を持つLiDARベースのポイントクラウドデータを含むデータの分布をカバーする必要性。大規模実験により,本手法は既存の能動学習法と比較して,高い計算効率で最先端の性能を達成できることが実証された。 3D object detection is fundamentally important for various emerging applications, including autonomous driving and robotics. A key requirement for training an accurate 3D object detector is the availability of a large amount of LiDAR-based point cloud data. Unfortunately, labeling point cloud data is extremely challenging, as accurate 3D bounding boxes and semantic labels are required for each potential object. This paper proposes a unified active 3D object detection framework that greatly reduces the labeling cost of training 3D object detectors. Our framework is based on a novel formulation of submodular optimization, specifically tailored to the problem of active 3D object detection. In particular, we address two fundamental challenges associated with active 3D object detection: data imbalance and the need to cover the distribution of the data, including LiDAR-based point cloud data of varying difficulty levels. Extensive experiments demonstrate that our method achieves state-of-the-art performance with high computational efficiency compared to existing active learning methods.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# 拡散事前分布を用いたオンライン事後サンプリング Online Posterior Sampling with a Diffusion Prior ( http://arxiv.org/abs/2410.03919v1 ) ライセンス: Link先を確認	Branislav Kveton, Boris Oreshkin, Youngsuk Park, Aniket Deshmukh, Rui Song,	(参考訳) ガウス事前分布を持つ文脈付きバンディットにおける事後サンプリングは、厳密に、あるいはラプラス近似を用いて近似的に実装することができる。ガウス事前分布は計算的に効率的であるが、複雑な分布を記述できない。そこで本研究では,拡散モデルを事前分布とする文脈付きバンディットに対する近似的な事後サンプリングアルゴリズムを提案する。鍵となるアイデアは、逆過程の各段階に1つずつ対応する近似的な条件付き事後分布の連鎖からサンプリングすることであり、各事後分布はラプラス近似を用いて閉形式で推定される。我々の近似は、ガウス事前分布による事後サンプリングに動機付けられており、その単純さと効率を継承する。これらは漸近的に一貫性があり、様々な文脈付きバンディット問題で経験的にも良好に機能する。 Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate conditional posteriors, one for each stage of the reverse process, which are estimated in a closed form using the Laplace approximation. Our approximations are motivated by posterior sampling with a Gaussian prior, and inherit its simplicity and efficiency. They are asymptotically consistent and perform well empirically on a variety of contextual bandit problems.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# ロボットと物体の相互作用によるロボットの知覚による物体特性の学習 Learning Object Properties Using Robot Proprioception via Differentiable Robot-Object Interaction ( http://arxiv.org/abs/2410.03920v1 ) ライセンス: Link先を確認	Peter Yichen Chen, Chao Liu, Pingchuan Ma, John Eastman, Daniela Rus, Dylan Randle, Yuri Ivanov, Wojciech Matusik,	(参考訳) 微分可能シミュレーションは、システム識別の強力なツールとなっている。従来の研究では,ロボット固有のデータやオブジェクトのプロパティをオブジェクト固有のデータを用いて識別することに注力していたが,本手法では,オブジェクト自体のデータに頼ることなく,ロボットからの情報を用いてオブジェクトのプロパティを校正する。具体的には,ロボットのジョイントエンコーダ情報を利用する。私たちのキーとなる観察は、操作対象に対するロボットの反応を分析することで、慣性や柔らかさなどの物体の特性を推測できるということです。この知見を生かして,操作対象の性質を逆同定する,ロボットと物体の相互作用の微分可能なシミュレーションを開発した。われわれのアプローチは、ロボットの内部感知能力であるプロプリセプションのみに依存しており、外部計測ツールや視覚ベースのトラッキングシステムを必要としない。本手法は,任意の関節ロボットに適用可能であり,関節位置情報のみを必要とする。低コストなロボットプラットフォーム上での本手法の有効性を実証し,ノートパソコン上で数秒の計算で操作対象の高精度な質量および弾性率推定を実現する。 Differentiable simulation has become a powerful tool for system identification. While prior work has focused on identifying robot properties using robot-specific data or object properties using object-specific data, our approach calibrates object properties by using information from the robot, without relying on data from the object itself. Specifically, we utilize robot joint encoder information, which is commonly available in standard robotic systems. Our key observation is that by analyzing the robot's reactions to manipulated objects, we can infer properties of those objects, such as inertia and softness. Leveraging this insight, we develop differentiable simulations of robot-object interactions to inversely identify the properties of the manipulated objects. Our approach relies solely on proprioception -- the robot's internal sensing capabilities -- and does not require external measurement tools or vision-based tracking systems. This general method is applicable to any articulated robot and requires only joint position information. 
We demonstrate the effectiveness of our method on a low-cost robotic platform, achieving accurate mass and elastic modulus estimations of manipulated objects with just a few seconds of computation on a laptop.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# バングラの質問応答システム:クローズドドメインのための細調整BERT-Bangla Question-Answering System for Bangla: Fine-tuning BERT-Bangla for a Closed Domain ( http://arxiv.org/abs/2410.03923v1 ) ライセンス: Link先を確認	Subal Chandra Roy, Md Motaleb Hossen Manik,	(参考訳) Bengaliの質問応答システムは、特にドメイン固有のアプリケーションにおいて、開発が限られている。本稿では,自然言語処理の進歩を生かして,このギャップに対処するための細調整BERT-Banglaモデルについて検討する。閉領域における細調整BERT-Banglaモデルを用いたベンガル語質問応答システムの開発について述べる。データセットは、Khulna University of Engineering & Technology(KUET)のウェブサイトや他の関連するテキストから得られた。このシステムは、キュレートされたデータから生成した2500の質問応答対を用いて学習・評価された。 Exact Match (EM) スコアと F1 スコアを含む主要な指標が評価に使われ、それぞれ 55.26% と 74.21% のスコアが得られた。この結果は,ドメイン固有のベンガル語質問応答システムに有望な可能性を示す。より複雑なクエリのパフォーマンスを改善するためには、さらなる改善が必要である。 Question-answering systems for Bengali have seen limited development, particularly in domain-specific applications. Leveraging advancements in natural language processing, this paper explores a fine-tuned BERT-Bangla model to address this gap. It presents the development of a question-answering system for Bengali using a fine-tuned BERT-Bangla model in a closed domain. The dataset was sourced from Khulna University of Engineering & Technology's (KUET) website and other relevant texts. The system was trained and evaluated with 2500 question-answer pairs generated from curated data. Key metrics, including the Exact Match (EM) score and F1 score, were used for evaluation, achieving scores of 55.26% and 74.21%, respectively. The results demonstrate promising potential for domain-specific Bengali question-answering systems. Further refinements are needed to improve performance for more complex queries.	翻訳日:2024-11-02 15:31:01 公開日:2024-10-04
# オンライン制御インフォームドラーニング Online Control-Informed Learning ( http://arxiv.org/abs/2410.03924v1 ) ライセンス: Link先を確認	Zihao Liang, Tianyu Zhou, Zehui Lu, Shaoshuai Mou,	(参考訳) 本稿では,オンライン制御・インフォームド・ラーニング(OCIL)フレームワークを提案する。この新たな統合は、ノイズ測定データやオンライン学習、データ効率といった機械学習の実践的な問題に効果的に対処する。任意のロボットを調整可能な最適制御系として考慮し,拡張カルマンフィルタ(EKF)に基づくオンラインパラメータ推定器を提案する。提案手法は,データ中の雑音を効果的に管理することにより,学習の堅牢性も向上する。 OCILの収束と後悔を示す理論的解析が提供される。オンライン模倣学習,オンラインシステム同定,ポリシチューニングオンザフライの3つの学習モードを実験により検討し,その有効性を検証した。 This paper proposes an Online Control-Informed Learning (OCIL) framework, which synthesizes the well-established control theories to solve a broad class of learning and control tasks in real time. This novel integration effectively handles practical issues in machine learning such as noisy measurement data, online learning, and data efficiency. By considering any robot as a tunable optimal control system, we propose an online parameter estimator based on extended Kalman filter (EKF) to incrementally tune the system in real time, enabling it to complete designated learning or control tasks. The proposed method also improves robustness in learning by effectively managing noise in the data. Theoretical analysis is provided to demonstrate the convergence and regret of OCIL. Three learning modes of OCIL, i.e. Online Imitation Learning, Online System Identification, and Policy Tuning On-the-fly, are investigated via experiments, which validate their effectiveness.	翻訳日:2024-11-02 15:21:16 公開日:2024-10-04
# C3PA: スケーラブルな規制コンプライアンス監査を可能にするエキスパートアノテートおよび規制アウェアプライバシポリシのオープンデータセット C3PA: An Open Dataset of Expert-Annotated and Regulation-Aware Privacy Policies to Enable Scalable Regulatory Compliance Audits ( http://arxiv.org/abs/2410.03925v1 ) ライセンス: Link先を確認	Maaz Bin Musa, Steven M. Winston, Garrison Allen, Jacob Schiller, Kevin Moore, Sean Quick, Johnathan Melvin, Padmini Srinivasan, Mihailis E. Diamantis, Rishab Nithyanand,	(参考訳) プライバシポリシから組織のデータの取り扱い慣行を分析し、抽出するツールとテクニックの開発は、スケーラブルな規制コンプライアンス監査にとって非常に重要です。残念なことに、これらのツールは、コンプライアンスの問題や修正を識別する能力において、ますます制限されている。その多くは、EUのGDPRやカリフォルニア州のCCPAのような画期的なプライバシー規制が導入される前の時期に得られた、注釈付きプライバシーポリシーの規制に依存しないデータセットを使用して開発された。本稿では,この課題に対処することを目的とした,専門家が注釈を付けたプライバシポリシの最初のオープンな規制対応データセットであるC3PA(CCPA Privacy Policy Provision Annotations)について述べる。 C3PAには、411の組織の、CCPA固有の開示義務への対応に関連する、専門家ラベル付きプライバシポリシテキストセグメントが48K以上含まれている。我々は,C3PAデータセットがCCPA関連の開示義務に対するコンプライアンスの自動監査を支援するのに一意に適していることを示した。 The development of tools and techniques to analyze and extract organizations' data habits from privacy policies is critical for scalable regulatory compliance audits. Unfortunately, these tools are becoming increasingly limited in their ability to identify compliance issues and fixes. After all, most were developed using regulation-agnostic datasets of annotated privacy policies obtained from a time before the introduction of landmark privacy regulations such as the EU's GDPR and California's CCPA. In this paper, we describe the first open regulation-aware dataset of expert-annotated privacy policies, C3PA (CCPA Privacy Policy Provision Annotations), aimed to address this challenge. C3PA contains over 48K expert-labeled privacy policy text segments associated with responses to CCPA-specific disclosure mandates from 411 unique organizations. We demonstrate that the C3PA dataset is uniquely suited for aiding automated audits of compliance with CCPA-related disclosure mandates.	翻訳日:2024-11-02 15:21:16 公開日:2024-10-04
# 深層学習に基づくVoxel-to-voxel変換によるエンドツーエンド反応場エネルギーモデリング End-to-End Reaction Field Energy Modeling via Deep Learning based Voxel-to-voxel Transform ( http://arxiv.org/abs/2410.03927v1 ) ライセンス: Link先を確認	Yongxian Wu, Qiang Zhu, Ray Luo,	(参考訳) 計算生化学と生物物理学において、静電気相互作用の役割を理解することは、生体分子の構造、力学、機能を明らかにするために重要である。ポアソン・ボルツマン(PB)方程式は、荷電分子とその周辺における静電ポテンシャルを記述することによって、これらの相互作用をモデル化するための基礎的なツールである。しかし、PB方程式の解法は、生体分子表面の複雑さと移動イオンを考慮に入れる必要性により、計算上の大きな課題を生じさせる。 PB方程式を解く従来の数値法は正確であるが、計算コストが高く、システムサイズが大きくなるとスケールが低下する。これらの課題に対処するために,ニューラルネットワークに基づく偏微分方程式解法の最近の進歩に触発された,新しい機械学習手法PBNeFを紹介する。提案手法は,PB方程式の入力および境界静電条件を学習可能なボクセル表現に定式化し,ニューラル場変換器を用いてPB方程式の解を予測し,その後反応場ポテンシャルエネルギーを推定する。 大規模な実験により、PBNeFは従来のPBソルバに比べて100倍以上のスピードアップを実現し、一般化ボルン(GB)モデルに匹敵する精度を維持することが示された。 In computational biochemistry and biophysics, understanding the role of electrostatic interactions is crucial for elucidating the structure, dynamics, and function of biomolecules. The Poisson-Boltzmann (PB) equation is a foundational tool for modeling these interactions by describing the electrostatic potential in and around charged molecules. However, solving the PB equation presents significant computational challenges due to the complexity of biomolecular surfaces and the need to account for mobile ions. While traditional numerical methods for solving the PB equation are accurate, they are computationally expensive and scale poorly with increasing system size. To address these challenges, we introduce PBNeF, a novel machine learning approach inspired by recent advancements in neural network-based partial differential equation solvers. Our method formulates the input and boundary electrostatic conditions of the PB equation into a learnable voxel representation, enabling the use of a neural field transformer to predict the PB solution and, subsequently, the reaction field potential energy. 
Extensive experiments demonstrate that PBNeF achieves over a 100-fold speedup compared to traditional PB solvers, while maintaining accuracy comparable to the Generalized Born (GB) model.	翻訳日:2024-11-02 15:21:16 公開日:2024-10-04
# Reverb: RevからオープンソースASRとダイアリゼーション Reverb: Open-Source ASR and Diarization from Rev ( http://arxiv.org/abs/2410.03930v1 ) ライセンス: Link先を確認	Nishchal Bhandari, Danny Chen, Miguel Ángel del Río Fernández, Natalie Delworth, Jennifer Drexler Fox, Migüel Jetté, Quinten McNamara, Corey Miller, Ondřej Novotný, Ján Profant, Nan Qin, Martin Ratajczak, Jean-Philippe Robichaud,	(参考訳) 今日では、非商用利用のためのコア音声認識およびダイアリゼーションモデルをオープンソース化しています。開発者のためのフルプロダクションパイプラインと、実験用のパースダウンリサーチモデルの両方をリリースしています。 Revは、これらのリリースが音声技術の研究とイノベーションを加速させることを期待している。今日リリースされた音声認識モデルは、様々な長文の音声認識領域で、既存のすべてのオープンソースの音声認識モデルを上回っている。 Today, we are open-sourcing our core speech recognition and diarization models for non-commercial use. We are releasing both a full production pipeline for developers as well as pared-down research models for experimentation. Rev hopes that these releases will spur research and innovation in the fast-moving domain of voice technology. The speech recognition models released today outperform all existing open source speech recognition models across a variety of long-form speech recognition domains.	翻訳日:2024-11-02 15:21:16 公開日:2024-10-04
# GAS-Norm:ディープラーニングにおける非定常時系列予測のためのスコア駆動適応正規化 GAS-Norm: Score-Driven Adaptive Normalization for Non-Stationary Time Series Forecasting in Deep Learning ( http://arxiv.org/abs/2410.03935v1 ) ライセンス: Link先を確認	Edoardo Urettini, Daniele Atzeni, Reshawn J. Ramjattan, Antonio Carta,	(参考訳) その人気にもかかわらず、時系列予測に適用されるディープニューラルネットワーク(DNN)は、単純な統計モデルに勝てないことが多い。この準最適性能の主な原因の1つは、多くのプロセスに存在するデータ非定常性である。特に、入力データの平均と分散の変化は、DNNの予測能力を損なう可能性がある。本稿では,DNN予測モデルが静的でない単純な設定でフェールするかを最初に示す。次に,GASモデルとディープニューラルネットワークを組み合わせた適応時系列正規化と予測の新しい手法であるGAS-Normを紹介する。 GASアプローチは、スコア駆動型のモデルのファミリーを含み、各新しい観測における平均と分散を推定し、ディープモデルの入力データを正規化するための最新の統計を提供する。最終的に、DNNの出力はGASモデルによって予測される統計を用いて非正規化され、その結果、統計モデリングとディープラーニングの両方の長所を利用するハイブリッドアプローチがもたらされる。適応正規化は、非定常設定におけるモデルの性能を改善する。提案手法はモデルに依存しず,任意のDNN予測モデルに適用可能である。我々の提案を実証的に検証するために、まずGAS-Normと他の最先端の正規化手法を比較した。その後、最先端のDNN予測モデルと組み合わせて、Monashオープンアクセス予測レポジトリの実際のデータセットでテストします。その結果,GAS-Normと組み合わせた場合,25項目中21項目の予測モデルでは,他の正規化手法と比較して精度が向上した。 Despite their popularity, deep neural networks (DNNs) applied to time series forecasting often fail to beat simpler statistical models. One of the main causes of this suboptimal performance is the data non-stationarity present in many processes. In particular, changes in the mean and variance of the input data can disrupt the predictive capability of a DNN. In this paper, we first show how DNN forecasting models fail in simple non-stationary settings. We then introduce GAS-Norm, a novel methodology for adaptive time series normalization and forecasting based on the combination of a Generalized Autoregressive Score (GAS) model and a Deep Neural Network. The GAS approach encompasses a score-driven family of models that estimate the mean and variance at each new observation, providing updated statistics to normalize the input data of the deep model. 
The output of the DNN is eventually denormalized using the statistics forecasted by the GAS model, resulting in a hybrid approach that leverages the strengths of both statistical modeling and deep learning. The adaptive normalization improves the performance of the model in non-stationary settings. The proposed approach is model-agnostic and can be applied to any DNN forecasting model. To empirically validate our proposal, we first compare GAS-Norm with other state-of-the-art normalization methods. We then combine it with state-of-the-art DNN forecasting models and test them on real-world datasets from the Monash open-access forecasting repository. Results show that deep forecasting models improve their performance in 21 out of 25 settings when combined with GAS-Norm compared to other normalization methods.	翻訳日:2024-11-02 15:21:16 公開日:2024-10-04
# 類似学習とグラフ拡散によるアルツハイマー病サブタイプのクラスタリング Clustering Alzheimer's Disease Subtypes via Similarity Learning and Graph Diffusion ( http://arxiv.org/abs/2410.03937v1 ) ライセンス: Link先を確認	Tianyi Wei, Shu Yang, Davoud Ataee Tarzanagh, Jingxuan Bao, Jia Xu, Patryk Orzechowski, Joost B. Wagenaar, Qi Long, Li Shen,	(参考訳) アルツハイマー病(英語: Alzheimer's disease、AD)は、世界中の何百万人もの人に影響を及ぼす複雑な神経変性疾患である。 ADの異種性のため、診断と治療は重大な課題となる。その結果、近年、これらの課題に対処する手助けができる同種ADサブタイプを同定することに対する研究の関心が高まっている。本研究では, グラフ拡散と類似性学習を用いた教師なしクラスタリングを用いて, 特有の臨床特徴と根底にある病態を表すADのサブタイプを同定することを目的とする。われわれは多核類似性学習フレームワークであるSIMLRとグラフ拡散を用いて,829名のAD患者と軽度認知障害(MCI, ADの前駆段階)患者を対象に,磁気共鳴画像(MRI)から抽出した大脳皮質の厚さ測定に基づいてクラスタリングを行った。私たちが利用したクラスタリング手法は、これまでADサブタイプ分類のタスクでは検討されていなかったが、いくつかの一般的なクラスタリング手法よりもはるかに優れた性能を示した。具体的には,サブタイプ検出におけるノイズの影響を低減するグラフ拡散の有効性を示した。その結果, バイオマーカー, 認知状態, その他の臨床的特徴が顕著に異なる5つのサブタイプが明らかになった。得られたサブタイプを更に評価するために、遺伝子関連研究を行い、異なるADサブタイプの潜在的な遺伝的基盤を同定した。私たちのソースコードは、https://github.com/PennShenLab/AD-SIMLR で公開されています。 Alzheimer's disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Due to the heterogeneous nature of AD, its diagnosis and treatment pose critical challenges. Consequently, there is a growing research interest in identifying homogeneous AD subtypes that can assist in addressing these challenges in recent years. In this study, we aim to identify subtypes of AD that represent distinctive clinical features and underlying pathology by utilizing unsupervised clustering with graph diffusion and similarity learning. We adopted SIMLR, a multi-kernel similarity learning framework, and graph diffusion to perform clustering on a group of 829 patients with AD and mild cognitive impairment (MCI, a prodromal stage of AD) based on their cortical thickness measurements extracted from magnetic resonance imaging (MRI) scans. 
Although the clustering approach we utilized has not been explored for the task of AD subtyping before, it demonstrated significantly better performance than several commonly used clustering methods. Specifically, we showed the power of graph diffusion in reducing the effects of noise in the subtype detection. Our results revealed five subtypes that differed remarkably in their biomarkers, cognitive status, and some other clinical features. To evaluate the resultant subtypes further, a genetic association study was carried out and successfully identified potential genetic underpinnings of different AD subtypes. Our source code is available at: https://github.com/PennShenLab/AD-SIMLR.	翻訳日:2024-11-02 15:21:16 公開日:2024-10-04
# AutoLoRA: AutoGuidanceは拡散モデルに対する低ランク適応を実現する AutoLoRA: AutoGuidance Meets Low-Rank Adaptation for Diffusion Models ( http://arxiv.org/abs/2410.03941v1 ) ライセンス: Link先を確認	Artur Kasymov, Marcin Sendera, Michał Stypułkowski, Maciej Zięba, Przemysław Spurek,	(参考訳) 低ランク適応(LoRA)は条件付き生成拡散モデルに適用できる微調整技術である。 LoRAは、モデルを特定のドメイン、キャラクタ、スタイル、コンセプトに適応させるために、少数のコンテキストサンプルを使用する。しかし、訓練中に使用される限られたデータのために、微調整されたモデルの性能は、しばしば強い文脈バイアスと、生成された画像の変動度の低さによって特徴づけられる。この問題を解決するために,LoRAアプローチを微調整した拡散モデルのための新しいガイダンス手法であるAutoLoRAを紹介する。他のガイダンス手法にインスパイアされたAutoLoRAは、LoRA重みで表される領域内の一貫性と基本条件拡散モデルからのサンプルの多様性の間のトレードオフを探索する。さらに,LoRA微調整モデルとベースモデルの両方に分類器フリーガイダンスを組み込むことで,より多様性と品質のよいサンプルが生成されることを示す。いくつかの微調整されたLoRAドメインの実験結果は、選択したメトリクスに対する既存のガイダンス技術よりも優れていることを示している。 Low-rank adaptation (LoRA) is a fine-tuning technique that can be applied to conditional generative diffusion models. LoRA utilizes a small number of context examples to adapt the model to a specific domain, character, style, or concept. However, due to the limited data utilized during training, the fine-tuned model performance is often characterized by strong context bias and a low degree of variability in the generated images. To solve this issue, we introduce AutoLoRA, a novel guidance technique for diffusion models fine-tuned with the LoRA approach. Inspired by other guidance techniques, AutoLoRA searches for a trade-off between consistency in the domain represented by LoRA weights and sample diversity from the base conditional diffusion model. Moreover, we show that incorporating classifier-free guidance for both LoRA fine-tuned and base models leads to generating samples with higher diversity and better quality. The experimental results for several fine-tuned LoRA domains show superiority over existing guidance techniques on selected metrics.	翻訳日:2024-11-02 15:21:16 公開日:2024-10-04
# 振動状態空間モデル Oscillatory State-Space Models ( http://arxiv.org/abs/2410.03943v1 ) ライセンス: Link先を確認	T. Konstantin Rusch, Daniela Rus,	(参考訳) 線形振動状態空間モデル(LinOSS)を提案する。生体神経ネットワークの皮質力学にインスパイアされた我々の提案するLinOSSモデルは、強制調和振動子のシステムに基づく。高速な連想並列スキャンを用いて時間とともに統合された安定な離散化により、提案した状態空間モデルが得られる。我々はLinOSSが非負の対角行列のみを必要とする安定な力学を生成することを証明した。これは、制約パラメータ化に大きく依存する多くの従来の状態空間モデルとは対照的である。さらに、LinOSSが普遍であること、すなわち時間変化関数間の連続および因果演算子マッピングを所望の精度で近似できることを厳密に示す。さらに、LinOSSの暗黙的・明示的な離散化は、基礎となる力学の時間可逆性の対称性を完全に保存していることを示す。これらの特性は、安定かつ正確な長距離予測を確実にしながら、長距離相互作用の効率的なモデリングを可能にする。最後に,中距離から超長距離の分類・回帰まで幅広い時系列タスクにまたがる実験結果と,長水平予測を行い,提案したLinOSSモデルが常に最先端のシーケンスモデルより優れていることを示す。特にLinOSSは、長さ50kのシーケンスを持つシーケンスモデリングタスクにおいて、Mambaを約2倍、LRUを2.5倍上回る。 We propose Linear Oscillatory State-Space models (LinOSS) for efficiently learning on long sequences. Inspired by cortical dynamics of biological neural networks, we base our proposed LinOSS model on a system of forced harmonic oscillators. A stable discretization, integrated over time using fast associative parallel scans, yields the proposed state-space model. We prove that LinOSS produces stable dynamics only requiring nonnegative diagonal state matrix. This is in stark contrast to many previous state-space models relying heavily on restrictive parameterizations. Moreover, we rigorously show that LinOSS is universal, i.e., it can approximate any continuous and causal operator mapping between time-varying functions, to desired accuracy. In addition, we show that an implicit-explicit discretization of LinOSS perfectly conserves the symmetry of time reversibility of the underlying dynamics. Together, these properties enable efficient modeling of long-range interactions, while ensuring stable and accurate long-horizon forecasting. 
Finally, our empirical results, spanning a wide range of time-series tasks from mid-range to very long-range classification and regression, as well as long-horizon forecasting, demonstrate that our proposed LinOSS model consistently outperforms state-of-the-art sequence models. Notably, LinOSS outperforms Mamba by nearly 2x and LRU by 2.5x on a sequence modeling task with sequences of length 50k.	翻訳日:2024-11-02 15:21:16 公開日:2024-10-04
# 窒化アルミニウム製圧電高$Q_m$ナノメカニカル共振器の機械的性質の厚さ依存性 Thickness dependence of the mechanical properties of piezoelectric high-$Q_m$ nanomechanical resonators made from aluminium nitride ( http://arxiv.org/abs/2410.03944v1 ) ライセンス: Link先を確認	Anastasiia Ciers, Alexander Jung, Joachim Ciers, Laurentius Radit Nindito, Hannes Pfeifer, Armin Dadgar, Jürgen Bläsing, André Strittmatter, Witlef Wieczorek,	(参考訳) 高い品質係数($Q_m$)を持つナノメカニカル共振器は、メカニカルベースの量子技術、特に量子センシングと量子トランスダクションを可能にする。 kHz〜MHz帯の高$Q_m$ナノメカニカル共振器は、散逸希釈技術を用いることで$Q_m$を大幅に増大させることができる引張歪薄膜で実現可能である。本研究では, 窒化アルミニウム(AlN)から作製した引張ひずみ圧電膜の材料特性について検討した。金属-有機気相エピタキシーによりSi(111)上に直接成長した,厚さ45nmから295nmの結晶AlN膜を特徴付ける。我々は,AlN薄膜の結晶質と表面粗さ,圧電応答,残留応力および放出応力について報告する。重要なことは、高真空下で室温でフィルムの本質的品質係数を決定することである。散逸希釈を利用したAlNナノメカニカル共振器を作製, 特性評価し, フィルムの引張ひずみを利用して本質的品質係数を高める。厚さ200nm以下のAlNナノメカニカル共振器は、$10^{12}$Hzオーダーの最も高い$Q_m\cdot f_m$積を示す。我々は,さらに高い$Q_m\cdot f_m$積に到達するデバイスにつながる,材料成長を最適化するための戦略について議論する。これは、引張歪んだ圧電AlNから作られるオプトエレクトロメカニカル量子デバイスの将来の発展の道を開く。 Nanomechanical resonators with high quality factors ($Q_m$) enable mechanics-based quantum technologies, in particular quantum sensing and quantum transduction. High-$Q_m$ nanomechanical resonators in the kHz to MHz frequency range can be realized in tensile-strained thin films that allow the use of dissipation dilution techniques to drastically increase $Q_m$. In our work, we study the material properties of tensile-strained piezoelectric films made from aluminium nitride (AlN). We characterize crystalline AlN films with a thickness ranging from 45nm to 295nm, which are directly grown on Si(111) by metal-organic vapour-phase epitaxy. We report on the crystal quality and surface roughness, the piezoelectric response, and the residual and released stress of the AlN thin films. Importantly, we determine the intrinsic quality factor of the films at room temperature in high vacuum. 
We fabricate and characterize AlN nanomechanical resonators that exploit dissipation dilution to enhance the intrinsic quality factor by utilizing the tensile strain in the film. We find that AlN nanomechanical resonators below 200nm thickness exhibit the highest $Q_m\cdot f_m$-product, on the order of $10^{12}$Hz. We discuss possible strategies to optimize the material growth that should lead to devices that reach even higher $Q_m\cdot f_m$-products. This will pave the way for future advancements of optoelectromechanical quantum devices made from tensile-strained piezoelectric AlN.	翻訳日:2024-11-02 15:21:16 公開日:2024-10-04
# 複数の領域にまたがる非整合格子上の気象ダウンスケーリングのための補間自由深層学習と風力発電への応用 Interpolation-Free Deep Learning for Meteorological Downscaling on Unaligned Grids Across Multiple Domains with Application to Wind Power ( http://arxiv.org/abs/2410.03945v1 ) ライセンス: Link先を確認	Jean-Sébastien Giroux, Simon-Philippe Breton, Julie Carreau,	(参考訳) 気候変動が増すにつれて、クリーンなエネルギー源へのシフトがますます急激になる。風力エネルギーの生産が加速するためには、その効率的な使用を保証するために信頼性の高い風力確率予測が不可欠である。しかし、数値的な天気予報モデルは計算に高価であるため、全てのメソスケール風の挙動を捉えるには大きすぎる解像度で確率予測が生成される。統計的ダウンスケーリングは、通常、気候モデルシミュレーションの解像度を高めるために適用され、低分解能(LR)変数から高分解能(HR)気象変数へのマッピングを学習することで、より低い計算コストで実行可能なソリューションを提供する。深層学習を活用することで,風速の粗大な確率論的予測からアンサンブル部材に適用した,最先端のU-Netアーキテクチャに基づくダウンスケーリングモデルを評価する。本アーキテクチャは,(1)LR-HRグリッドのミスマッチを解決するための学習グリッドアライメント戦略と,(2)マルチレベル大気予測器の処理モジュールを組み込んだものである。ダウンスケーリングモデルの適用性を固定空間領域からカナダ全域に拡張するために,転送学習アプローチを評価する。以上の結果から,学習グリッドアライメント戦略は従来のプリプロセッシング補間ステップと同様に動作し,複数のレベルにおけるLR風速は予測器として十分であり,よりコンパクトなアーキテクチャを実現することが示唆された。さらに、移動学習を用いた新しい空間領域への拡張は有望であり、風速のダウンスケールは、風力エネルギーにとって重要な現象である風力ランプの検出を改善する可能性を示すことを示唆している。 As climate change intensifies, the shift to cleaner energy sources becomes increasingly urgent. With wind energy production set to accelerate, reliable wind probabilistic forecasts are essential to ensure its efficient use. However, since numerical weather prediction models are computationally expensive, probabilistic forecasts are produced at resolutions too coarse to capture all mesoscale wind behaviors. Statistical downscaling, typically applied to enhance the resolution of climate model simulations, presents a viable solution with lower computational costs by learning a mapping from low-resolution (LR) variables to high-resolution (HR) meteorological variables. Leveraging deep learning, we evaluate a downscaling model based on a state-of-the-art U-Net architecture, applied to an ensemble member from a coarse-scale probabilistic forecast of wind velocity. 
The architecture is modified to incorporate (1) a learned grid alignment strategy to resolve LR-HR grid mismatches and (2) a processing module for multi-level atmospheric predictors. To extend the downscaling model's applicability from fixed spatial domains to the entire Canadian region, we assess a transfer learning approach. Our results show that the learned grid alignment strategy performs as well as conventional pre-processing interpolation steps and that LR wind speed at multiple levels is sufficient as a predictor, enabling a more compact architecture. Additionally, they suggest that extending to new spatial domains using transfer learning is promising, and that downscaled wind velocities demonstrate potential in improving the detection of wind power ramps, a critical phenomenon for wind energy.	翻訳日:2024-11-02 15:21:16 公開日:2024-10-04
# リストを囲む構造的質問応答 Structured List-Grounded Question Answering ( http://arxiv.org/abs/2410.03950v1 ) ライセンス: Link先を確認	Mujeen Sung, Song Feng, James Gung, Raphael Shu, Yi Zhang, Saab Mansour,	(参考訳) 文書対話システムは,外部情報を活用することで,ユーザからの問い合わせに答えることを目的としている。従来の研究は主に自由形式の文書を扱うことに焦点を当てており、しばしばリストのような構造化されたデータを見渡す。 GPT-3.5のような先進言語モデルでさえ、しばしばリストのセマンティックな手がかりを見逃してしまうという観察に触発された本論文は、構造化リストの解釈と使用を改善するための質問応答システム(QA)を強化することを目的としている。この目的のために、リスト情報を用いてQAシステムが効果的に応答する能力を評価するための新しいベンチマークであるLIST2QAデータセットを導入する。このデータセットは、言語モデルとモデルベースのフィルタリングプロセスを使用して、ラベルなしの顧客サービスドキュメントから作成され、データ品質を向上させる。微調整されたモデルによる応答を直接生成することとは別に、リスト項目をユーザ背景と整合させて、人間が応答を生成する前にリスト項目をどのように解釈するかをよりよく反映する、ISL(Intermediate Steps for Lists)の明示的な使用についても検討する。実験結果から,LIST2QAでトレーニングしたモデルとISLアプローチが,様々な指標のベースラインより優れていることが示された。具体的には,Flan-T5-XLモデルでは,ROUGE-Lでは3.1%,精度は4.6%,忠実度は4.5%,完全度は20.6%であった。 Document-grounded dialogue systems aim to answer user queries by leveraging external information. Previous studies have mainly focused on handling free-form documents, often overlooking structured data such as lists, which can represent a range of nuanced semantic relations. Motivated by the observation that even advanced language models like GPT-3.5 often miss semantic cues from lists, this paper aims to enhance question answering (QA) systems for better interpretation and use of structured lists. To this end, we introduce the LIST2QA dataset, a novel benchmark to evaluate the ability of QA systems to respond effectively using list information. This dataset is created from unlabeled customer service documents using language models and model-based filtering processes to enhance data quality, and can be used to fine-tune and evaluate QA models. Apart from directly generating responses through fine-tuned models, we further explore the explicit use of Intermediate Steps for Lists (ISL), aligning list items with user backgrounds to better reflect how humans interpret list items before generating responses. 
Our experimental results demonstrate that models trained on LIST2QA with our ISL approach outperform baselines across various metrics. Specifically, our fine-tuned Flan-T5-XL model shows increases of 3.1% in ROUGE-L, 4.6% in correctness, 4.5% in faithfulness, and 20.6% in completeness compared to models without applying filtering and the proposed ISL method.	翻訳日:2024-11-02 15:10:07 公開日:2024-10-04
# UFLUX v2.0: 地上炭素摂取の効率的かつ説明可能なモデリングのためのプロセスインフォームド機械学習フレームワーク UFLUX v2.0: A Process-Informed Machine Learning Framework for Efficient and Explainable Modelling of Terrestrial Carbon Uptake ( http://arxiv.org/abs/2410.03951v1 ) ライセンス: Link先を確認	Wenquan Dong, Songyan Zhu, Jian Xu, Casey M. Ryan, Man Chen, Jingya Zeng, Hao Yu, Congfeng Cao, Jiancheng Shi,	(参考訳) 光合成によって固定された炭素植物の量であるグロースプライマリ生産性(GPP)は、大域的な炭素循環と生態系の機能を理解する上で重要である。生態学的プロセスの知識に基づいて構築されたプロセスベースのモデルは、それらの仮定と近似から生じるバイアスに影響を受けやすい。これらの制限は、世界のGPP推定にかなりの不確実性をもたらす可能性があり、ネットゼロの目標に重大な課題をもたらす可能性がある。本研究では,プロセスベースモデルとエディ共分散(EC)測定のバイアスを学習することにより,GPP推定の不確実性を低減するため,最先端の生態知識と高度な機械学習技術を統合したプロセスインフォームドモデルであるUFLUX v2.0を提案する。以上の結果から, UFLUX v2.0 では R^2 が 0.79 で, RMSE が 1.60 g C m^-2 d^-1 で, R^2 が 0.51 で RMSE が 3.09 g C m^-2 d^-1 であったのに対し, UFLUX v2.0 では RMSE が 0.9 g C m^-2 d^-1 であった。 UFLUX v2.0 とプロセスベースモデルが同様の全グローバル GPP (137.47 Pg C と 132.23 Pg C ) を達成したのに対し, 空間分布に大きな差が認められた。これらの違いは、プロセスベースのモデルにおける体系的なバイアスと、気候や環境条件に対する感受性の違いに起因する可能性が高い。本研究は, 多様な生態系にまたがるGPPモデリングの適応性を向上し, 地球規模の炭素循環とその環境変化に対する応答の理解を深めるものである。 Gross Primary Productivity (GPP), the amount of carbon plants fixed by photosynthesis, is pivotal for understanding the global carbon cycle and ecosystem functioning. Process-based models built on the knowledge of ecological processes are susceptible to biases stemming from their assumptions and approximations. These limitations potentially result in considerable uncertainties in global GPP estimation, which may pose significant challenges to our Net Zero goals. This study presents UFLUX v2.0, a process-informed model that integrates state-of-art ecological knowledge and advanced machine learning techniques to reduce uncertainties in GPP estimation by learning the biases between process-based models and eddy covariance (EC) measurements. 
In our findings, UFLUX v2.0 demonstrated a substantial improvement in model accuracy, achieving an R^2 of 0.79 with a reduced RMSE of 1.60 g C m^-2 d^-1, compared to the process-based model's R^2 of 0.51 and RMSE of 3.09 g C m^-2 d^-1. Our global GPP distribution analysis indicates that while UFLUX v2.0 and the process-based model achieved similar global total GPP (137.47 Pg C and 132.23 Pg C, respectively), they exhibited large differences in spatial distribution, particularly in latitudinal gradients. These differences are very likely due to systematic biases in the process-based model and differing sensitivities to climate and environmental conditions. This study offers improved adaptability for GPP modelling across diverse ecosystems, and further enhances our understanding of global carbon cycles and their responses to environmental changes.	翻訳日:2024-11-02 15:10:07 公開日:2024-10-04
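The two accuracy figures quoted in this abstract (R^2 and RMSE) can be computed from paired observations and predictions. The following is a minimal sketch, assuming R^2 means the coefficient of determination; the abstract does not say whether it is that or a squared correlation. The function names and toy values are illustrative and do not come from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy daily GPP values (hypothetical numbers, in g C m^-2 d^-1).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

A constant offset of 1 gives an RMSE of 1 while R^2 drops well below 1, which is why the two metrics are usually reported together.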
# LLM-TOPLA:多様性の最大化による効率的なLLMアンサンブル LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity ( http://arxiv.org/abs/2410.03953v1 ) ライセンス: Link先を確認	Selim Furkan Tekin, Fatih Ilhan, Tiansheng Huang, Sihao Hu, Ling Liu,	(参考訳) トレーニングや推論時に大きな言語モデルを組み合わせることで、コンポーネントLLMよりも大幅にパフォーマンスが向上した。本稿では,多様性を最適化したLLMアンサンブル法であるLLM-TOPLAについて述べる。 (i) アンサンブルの成分LLM間の多様性と性能の相関を捉えるために, 焦点多様性指標を導入する。 (II)我々は,N$ベースLLMのプールからトップkサブアンサンブルを選択するために,多様性最適化型アンサンブルプルーニングアルゴリズムを開発した。我々のプルーニング法では、最高性能のLLMサブアンサンブルが$S$で、しばしば$N$よりもずっと小さいことを推奨している。 3) アンサンブルのすべてのコンポーネントLLM間の出力不整合の検出と解決を学習する学習対アンサンブルアプローチを用いて,各プロンプトクエリに対して新たな出力を生成する。 4つのベンチマークを総合的に評価すると、最高のLLMアンサンブル法よりも優れた性能が得られた。 (i) LLM-TOPLA は MMLU において 2.2 % の精度で、GSM8k では 2.1 % の精度で LLM アンサンブル (MoreAgent ) より優れる。 (ii)ジェネレーティブタスクでは、LLM-TOPLAは検索QAの上位2パフォーマー(Llama70b/Mixtral)をF1で$3.9\mathrm{x}$、XSumではROUGE-1で$38ドル以上パフォーマンスする。 4つのベンチマークで8つの現代的なLCMの出力を含むコードとデータセットはhttps://github.com/git-disl/llm-toplaで公開されています。 Combining large language models during training or at inference time has shown substantial performance gain over component LLMs. This paper presents LLM-TOPLA, a diversity-optimized LLM ensemble method with three unique properties: (i) We introduce the focal diversity metric to capture the diversity-performance correlation among component LLMs of an ensemble. (ii) We develop a diversity-optimized ensemble pruning algorithm to select the top-k sub-ensembles from a pool of $N$ base LLMs. Our pruning method recommends top-performing LLM subensembles of size $S$, often much smaller than $N$. (iii) We generate new output for each prompt query by utilizing a learn-to-ensemble approach, which learns to detect and resolve the output inconsistency among all component LLMs of an ensemble. 
Extensive evaluation on four different benchmarks shows good performance gain over the best LLM ensemble methods: (i) In constrained solution set problems, LLM-TOPLA outperforms the best-performing ensemble (Mixtral) by 2.2\% in accuracy on MMLU and the best-performing LLM ensemble (MoreAgent) on GSM8k by 2.1\%. (ii) In generative tasks, LLM-TOPLA outperforms the top-2 performers (Llama70b/Mixtral) on SearchQA by $3.9\mathrm{x}$ in F1, and on XSum by more than $38$ in ROUGE-1. Our code and dataset, which contains outputs of 8 modern LLMs on 4 benchmarks, are available at https://github.com/git-disl/llm-topla	翻訳日:2024-11-02 15:10:07 公開日:2024-10-04
# 適応型時空間多変時系列インプットのためのSDA-GRIN SDA-GRIN for Adaptive Spatial-Temporal Multivariate Time Series Imputation ( http://arxiv.org/abs/2410.03954v1 ) ライセンス: Link先を確認	Amir Eskandari, Aman Anand, Drishti Sharma, Farhana Zulkernine,	(参考訳) 様々な応用において、多変量時系列はしばしば欠落データに悩まされる。この問題は、データに依存するシステムを著しく破壊する可能性がある。空間的および時間的依存関係は、欠落したサンプルを暗示するために利用することができる。既存の計算手法は、しばしば空間依存の動的変化を無視する。 SDA-GRINは,空間依存性の動的変化を捉えることのできる空間動的グラフリカレントインプットネットワーク(SDA-GRIN)を提案する。 SDA-GRINは、時間グラフの列として多変量時系列をモデル化し、計算に繰り返しメッセージパッシングアーキテクチャを使用する。 SDA-GRINは、AQIでは9.51%、AQI-36では9.40%改善する。 PEMS-BAYデータセットでは、MSEが1.94%改善されている。詳細なアブレーション研究は、ウィンドウサイズと欠落したデータが手法の性能に与える影響を実証している。プロジェクトページ:https://ameskandari.github.io/sda-grin/ In various applications, the multivariate time series often suffers from missing data. This issue can significantly disrupt systems that rely on the data. Spatial and temporal dependencies can be leveraged to impute the missing samples. Existing imputation methods often ignore dynamic changes in spatial dependencies. We propose a Spatial Dynamic Aware Graph Recurrent Imputation Network (SDA-GRIN) which is capable of capturing dynamic changes in spatial dependencies. SDA-GRIN leverages a multi-head attention mechanism to adapt graph structures with time. SDA-GRIN models multivariate time series as a sequence of temporal graphs and uses a recurrent message-passing architecture for imputation. We evaluate SDA-GRIN on four real-world datasets: SDA-GRIN improves MSE by 9.51% on AQI and 9.40% on AQI-36. On the PEMS-BAY dataset, it achieves a 1.94% improvement in MSE. A detailed ablation study demonstrates the effect of window sizes and missing data on the performance of the method. Project page: https://ameskandari.github.io/sda-grin/	翻訳日:2024-11-02 15:10:07 公開日:2024-10-04
# アナログ量子プロセッサにおける中性子散乱のシミュレーション Simulating Neutron Scattering on an Analog Quantum Processor ( http://arxiv.org/abs/2410.03958v1 ) ライセンス: Link先を確認	Nora Bauer, Victor Ale, Pontus Laurell, Serena Huang, Seth Watabe, David Alan Tennant, George Siopsis,	(参考訳) 物質の中性子散乱特性は、絡み合いや微細構造の研究を可能にするが、理論モデルや予測と比較して古典的にシミュレートするのは非効率である。しかし、量子プロセッサ、特にアナログ量子シミュレーターは、相転移、動的性質、絡み合いの証人を計算するために、状態をリアルタイムで進化させることにより、前例のない、効率的なハミルトンシミュレーション方法を提供する可能性がある。本稿では,QuEra の Aquila プロセッサ上での中性子散乱を,臨界逆場 Ising chain の原型例に対する動的構造因子 (DSF) の測定によりシミュレーションし,誤差軽減法を提案する。ハードウェア上でのプロシージャの性能を,長さ$L=25$まで数値シミュレーションおよび実験的に評価した。さらに, DSFの結果を用いて量子フィッシャー情報(QFI)の密度を計算し, システム内の二部構造の絡み合いを実験的に確認する。 Neutron scattering characterization of materials allows for the study of entanglement and microscopic structure, but is inefficient to simulate classically for comparison to theoretical models and predictions. However, quantum processors, notably analog quantum simulators, have the potential to offer an unprecedented, efficient method of Hamiltonian simulation by evolving a state in real time to compute phase transitions, dynamical properties, and entanglement witnesses. Here, we present a method for simulating neutron scattering on QuEra's Aquila processor by measuring the dynamic structure factor (DSF) for the prototypical example of the critical transverse field Ising chain, and propose a method for error mitigation. We provide numerical simulations and experimental results for the performance of the procedure on the hardware, up to a chain of length $L=25$. Additionally, the DSF result is used to compute the quantum Fisher information (QFI) density, where we confirm bipartite entanglement in the system experimentally.	翻訳日:2024-11-02 15:10:07 公開日:2024-10-04
# 多視点参照通信における接地言語 Grounding Language in Multi-Perspective Referential Communication ( http://arxiv.org/abs/2410.03959v1 ) ライセンス: Link先を確認	Zineng Tang, Lingjun Mao, Alane Suhr,	(参考訳) マルチエージェント環境における表現生成と理解のためのタスクとデータセットを提案する。このタスクでは、共有シーン内の2つのエージェントは、シーン内のオブジェクトへの参照とそれらの間の空間的関係の両方を生成・理解するために、互いに異なる視覚的視点を考慮に入れなければならない。人間の記述した参照表現2,970のデータセットを収集し、それぞれが人間の理解判断と組み合わせ、自動化されたモデルの性能を、人間のパートナーと組み合わせた話者やリスナーとして評価し、人間のエージェントのペアよりも遅れた参照生成と理解の遅延のモデル性能を見出した。その結果、58.9から69.3%に改善され、最強のプロプライエタリモデルよりも優れています。 We introduce a task and dataset for referring expression generation and comprehension in multi-agent embodied environments. In this task, two agents in a shared scene must take into account one another's visual perspective, which may be different from their own, to both produce and understand references to objects in a scene and the spatial relations between them. We collect a dataset of 2,970 human-written referring expressions, each paired with human comprehension judgments, and evaluate the performance of automated models as speakers and listeners paired with human partners, finding that model performance in both reference generation and comprehension lags behind that of pairs of human agents. Finally, we experiment with training an open-weight speaker model with evidence of communicative success when paired with a listener, resulting in an improvement from 58.9% to 69.3% in communicative success and even outperforming the strongest proprietary model.	翻訳日:2024-11-02 15:10:07 公開日:2024-10-04
# SwiftKV:知識保存モデル変換による高速プリフィル最適化推論 SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation ( http://arxiv.org/abs/2410.03960v1 ) ライセンス: Link先を確認	Aurick Qiao, Zhewei Yao, Samyam Rajbhandari, Yuxiong He,	(参考訳) LLM推論は、要約、RAG、コードジェネレーションなどの一般的なエンタープライズユースケースにおいて、通常、世代長よりも桁長のプロンプト長を観測する。この特性により、プリフィルのコストが高くなり、応答遅延が増加する。本稿では,生成トークンの品質を保ちながら,プロンプトトークン処理の時間とコストを削減することを目的とした,新しいモデル変換・蒸留手法であるSwiftKVを提案する。 SwiftKVは3つの重要なメカニズムを組み合わせたものだ。 i)SingleInputKVは、ずっと前のレイヤの出力を使用して、後のレイヤのKVキャッシュをプリフィルし、プロンプトトークンがモデル計算の多くをスキップできるようにする。二隣接するレイヤのKVキャッシュをマージしてメモリフットプリントを削減し、より高いスループットのためにより大きなバッチサイズをサポートするAcrossKV 三既存のLLMをSwiftKVに適用し、最小限の精度で、計算量及びデータ要件を低くすることができる知識保存蒸留方法。 Llama-3.1-8Bと70Bでは、SwiftKVはプリフィルの計算要求を50%削減し、KVキャッシュのメモリ要求を62.5%削減した。最適化されたvLLM実装を使用したエンドツーエンドの推論では、SwiftKVは最大2倍の集約スループットと出力トークンあたりの60%の時間短縮を実現している。 4x H100 GPUの16ビット精度で、Llama-3.1-70Bの16Kトークン/sに変換される正規化推論スループットの560 TFlops/GPUを実現することができる。 LLM inference for popular enterprise use cases, such as summarization, RAG, and code-generation, typically observes orders of magnitude longer prompt lengths than generation lengths. This characteristic leads to high cost of prefill and increased response latency. In this paper, we present SwiftKV, a novel model transformation and distillation procedure specifically designed to reduce the time and cost of processing prompt tokens while preserving high quality of generated tokens. SwiftKV combines three key mechanisms: i) SingleInputKV, which prefills later layers' KV cache using a much earlier layer's output, allowing prompt tokens to skip much of the model computation, ii) AcrossKV, which merges the KV caches of neighboring layers to reduce the memory footprint and support larger batch size for higher throughput, and iii) a knowledge-preserving distillation procedure that can adapt existing LLMs for SwiftKV with minimal accuracy impact and low compute and data requirement. 
For Llama-3.1-8B and 70B, SwiftKV reduces the compute requirement of prefill by 50% and the memory requirement of the KV cache by 62.5% while incurring minimum quality degradation across a wide range of tasks. In the end-to-end inference serving using an optimized vLLM implementation, SwiftKV realizes up to 2x higher aggregate throughput and 60% lower time per output token. It can achieve a staggering 560 TFlops/GPU of normalized inference throughput, which translates to 16K tokens/s for Llama-3.1-70B in 16-bit precision on 4x H100 GPUs.	翻訳日:2024-11-02 15:10:07 公開日:2024-10-04
# 安定化器状態の局所等価性検証アルゴリズム Algorithm to Verify Local Equivalence of Stabilizer States ( http://arxiv.org/abs/2410.03961v1 ) ライセンス: Link先を確認	Adam Burchardt, Jarn de Jong, Lina Vandré,	(参考訳) グラフと安定化器状態の局所的ユニタリ(LU)等価性を検証するアルゴリズムを提案する。本手法は,モジュラー算術における線形方程式系の解法における問題点を軽減する。さらに、2つのグラフ状態間の任意のLU同値が特定の形式を採り、局所クリフォード(LC)同値のクラスを自然に一般化することを示した。最後に、既存のライブラリーを用いて、最大$n=11$の場合、安定化状態のLU軌道とLC軌道の数は同一であることを確認した。 We present an algorithm for verifying the local unitary (LU) equivalence of graph and stabilizer states. Our approach reduces the problem to solving a system of linear equations in modular arithmetic. Furthermore, we demonstrate that any LU equivalence between two graph states takes a specific form, naturally generalizing the class of local Clifford (LC) equivalences. Lastly, using existing libraries, we verify that for up to $n=11$, the number of LU and LC orbits of stabilizer states is identical.	翻訳日:2024-11-02 15:10:07 公開日:2024-10-04
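The abstract above says the LU-equivalence check reduces to solving a system of linear equations in modular arithmetic. As a rough illustration of that kind of subproblem only, here is a sketch of Gaussian elimination over the integers mod 2. The modulus 2, the function name `solve_mod2`, and the toy system are assumptions made for illustration; the abstract does not specify the modulus or how the system is built from the stabilizer states.

```python
def solve_mod2(A, b):
    """Return one solution x of A x = b (mod 2), or None if none exists."""
    # Build the augmented matrix with all entries reduced mod 2.
    rows = [[v % 2 for v in row] + [bi % 2] for row, bi in zip(A, b)]
    n_cols = len(A[0])
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        if r == len(rows):
            break
        # Find a row at or below r with a 1 in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if p is None:
            continue  # column c has no pivot; x[c] is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        # Clear column c in every other row (addition mod 2 is XOR).
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [(x + y) % 2 for x, y in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    # A zero row with right-hand side 1 means the system is inconsistent.
    if any(row[-1] for row in rows[r:]):
        return None
    x = [0] * n_cols  # free variables are set to 0
    for i, c in enumerate(pivot_cols):
        x[c] = rows[i][-1]
    return x


A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
print(solve_mod2(A, [1, 0, 1]))  # a consistent system
print(solve_mod2(A, [1, 0, 0]))  # an inconsistent system
```

Elimination of this kind runs in time polynomial in the number of unknowns.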
# SpecSAR-Former:Integrated Sentinel-1とSentinel-2を用いたグローバルLULCマッピングのための軽量変換器ベースネットワーク SpecSAR-Former: A Lightweight Transformer-based Network for Global LULC Mapping Using Integrated Sentinel-1 and Sentinel-2 ( http://arxiv.org/abs/2410.03962v1 ) ライセンス: Link先を確認	Hao Yu, Gen Li, Haoyu Liu, Songyan Zhu, Wenquan Dong, Changjian Li,	(参考訳) リモートセンシングの最近のアプローチは、多様な地球観測データセットが利用可能になるにつれて、マルチモーダルデータに注目が集まっている。異なるモダリティから補完的な情報を統合することは、意味的理解を強化する大きな可能性を示している。しかし、既存のグローバルマルチモーダルデータセットには、テクスチャや構造の詳細を捉えるのに優れた合成開口レーダ(SAR)データが含まれていないことが多い。 SARは、他のモダリティと相補的な視点として、地球規模の土地利用と土地被覆(LULC)のための空間情報の利用を促進する。このギャップに対処するため、我々はDynamic World+データセットを導入し、現在の信頼できるマルチスペクトルデータセットであるDynamic WorldをSARデータで拡張した。さらに,マルチスペクトルとSARデータの組み合わせを容易にするために,SpecSAR-Formerと呼ばれる軽量トランスフォーマアーキテクチャを提案する。 Dual Modal Enhancement Module (DMEM) と Mutual Modal Aggregation Module (MMAM) という2つの革新的なモジュールが組み込まれている。これらのモジュールは、スペクトル情報と空間情報を統合するモデルの能力を高め、グローバルLULCセマンティックセマンティックセグメンテーションの全体的な性能を向上させる。さらに,その重要度と情報密度に基づいてパラメータを異なるモダリティに割り当てる不均衡パラメータ割り当て戦略を採用する。大規模な実験により、我々のネットワークは既存のトランスフォーマーやCNNベースのモデルよりも優れており、平均的なユニオンのインターセクション(mIoU)は59.58%、総合的精度(OA)は79.48%、F1スコアは71.68%、パラメータは26.70Mに過ぎなかった。コードはhttps://github.com/Reagan1311/LULC_segmentation.comから入手できる。 Recent approaches in remote sensing have increasingly focused on multimodal data, driven by the growing availability of diverse earth observation datasets. Integrating complementary information from different modalities has shown substantial potential in enhancing semantic understanding. However, existing global multimodal datasets often lack the inclusion of Synthetic Aperture Radar (SAR) data, which excels at capturing texture and structural details. SAR, as a complementary perspective to other modalities, facilitates the utilization of spatial information for global land use and land cover (LULC). To address this gap, we introduce the Dynamic World+ dataset, expanding the current authoritative multispectral dataset, Dynamic World, with aligned SAR data. 
Additionally, to facilitate the combination of multispectral and SAR data, we propose a lightweight transformer architecture termed SpecSAR-Former. It incorporates two innovative modules, Dual Modal Enhancement Module (DMEM) and Mutual Modal Aggregation Module (MMAM), designed to exploit cross-information between the two modalities in a split-fusion manner. These modules enhance the model's ability to integrate spectral and spatial information, thereby improving the overall performance of global LULC semantic segmentation. Furthermore, we adopt an imbalanced parameter allocation strategy that assigns parameters to different modalities based on their importance and information density. Extensive experiments demonstrate that our network outperforms existing transformer and CNN-based models, achieving a mean Intersection over Union (mIoU) of 59.58%, an Overall Accuracy (OA) of 79.48%, and an F1 Score of 71.68% with only 26.70M parameters. The code will be available at https://github.com/Reagan1311/LULC_segmentation.	翻訳日:2024-11-02 15:10:07 公開日:2024-10-04
# 全目的量子強調センシングのためのスピンボソンモデルにおけるハーネスング量子カオス Harnessing quantum chaos in spin-boson models for all-purpose quantum-enhanced sensing ( http://arxiv.org/abs/2410.03965v1 ) ライセンス: Link先を確認	Yicheng Zhang, Juan Zuniga Castro, Robert J. Lewis-Swan,	(参考訳) 多体量子カオスは、絡み合った状態の準備を加速し、デコヒーレンスと技術的ノイズによる課題を克服するツールとして、大きな可能性を秘めている。ここでは、量子ビットのアンサンブルの共通ボソニックモードへの均一結合を記述するパラダイム的ディックモデルのカオスが、システムパラメータや初期条件を微調整することなく、非ガウス交絡スピンボソン状態の迅速な生成を可能にするかを検討する。しかし、これらの状態の複雑さは、標準プロトコルによる量子強調センシングのユーティリティをアンロックするためには、複雑な、または通常アクセス不能な可観測物の測定が必要であることを意味している。この課題に対処するため、スピン測定のみを用いて、大域スピン回転やボソニックディプレッションの準最適量子化メトロジーを実装可能な、インタラクションベースの読み出しに基づくセンシングスキームを開発した。提案手法は, 技術的ノイズや不完全性に対して堅牢であり, トラップイオンやキャビティQED実験などの現在の量子科学プラットフォームにおいて, カオス力学によって生じる複雑な絡み合った状態を利用する新たな機会を開く。 Many-body quantum chaos has immense potential as a tool to accelerate the preparation of entangled states and overcome challenges due to decoherence and technical noise. Here, we study how chaos in the paradigmatic Dicke model, which describes the uniform coupling of an ensemble of qubits to a common bosonic mode, can enable the rapid generation of non-Gaussian entangled spin-boson states without fine tuning of system parameters or initial conditions. However, the complexity of these states means that unlocking their utility for quantum-enhanced sensing with standard protocols would require the measurement of complex or typically inaccessible observables. To address this challenge, we develop a sensing scheme based on interaction-based readout that enables us to implement near-optimal quantum-enhanced metrology of global spin rotations or bosonic displacements using only spin measurements. We show that our approach is robust to technical noise and imperfections and thus opens new opportunities to exploit complex entangled states generated by chaotic dynamics in current quantum science platforms such as trapped-ion and cavity-QED experiments.	翻訳日:2024-11-02 15:00:17 公開日:2024-10-04
# デコードゲーム:ヒューリスティックテキスト生成戦略の最小最適性について Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies ( http://arxiv.org/abs/2410.03968v1 ) ライセンス: Link先を確認	Sijin Chen, Omar Hagrass, Jason M. Klusowski,	(参考訳) 復号化戦略は現代言語モデルにおけるテキスト生成において重要な役割を担っているが、ファズリングのギャップは理論と実践を分ける。意外なことに、例えばMAP(Maximum a Posteriori)のような直感的に最適であるべき戦略は、実際は不十分であることが多い。一方、Top-k$やNucleus sampleのような一般的なヒューリスティックなアプローチでは、条件付き次トーケン確率のトランケーションと正規化が採用されているが、理論的な正当性を欠いている。本稿では,テキスト生成を真の分布に信頼性のあるテキストを生成しようとするストラテジストと,真の分布を逆向きに歪曲するNatureの2プレイヤーゼロサムゲームとして再定義する,包括的な理論フレームワークであるDecoding Gameを提案する。マルチステップ生成の可逆性を議論した後、ワンステップ復号ゲームにおける閉形式の最適戦略を導出する。逆数自然は極大化に対して暗黙の正則化を課し、トランケーション正規化法は、この正則化の下での最適戦略に対する一階近似である。さらに,デコードゲームの目的とパラメータを一般化することにより,グリーディ探索,温度スケーリング,ハイブリッドなどの多種多様な手法を準最適に扱う。理論的解析を補完する数値実験を行った。 Decoding strategies play a pivotal role in text generation for modern language models, yet a puzzling gap divides theory and practice. Surprisingly, strategies that should intuitively be optimal, such as Maximum a Posteriori (MAP), often perform poorly in practice. Meanwhile, popular heuristic approaches like Top-$k$ and Nucleus sampling, which employ truncation and normalization of the conditional next-token probabilities, have achieved great empirical success but lack theoretical justifications. In this paper, we propose Decoding Game, a comprehensive theoretical framework which reimagines text generation as a two-player zero-sum game between Strategist, who seeks to produce text credible in the true distribution, and Nature, who distorts the true distribution adversarially. After discussing the decomposability of multi-step generation, we derive the optimal strategy in closed form for one-step Decoding Game. It is shown that the adversarial Nature imposes an implicit regularization on likelihood maximization, and truncation-normalization methods are first-order approximations to the optimal strategy under this regularization. 
Additionally, by generalizing the objective and parameters of Decoding Game, near-optimal strategies encompass diverse methods such as greedy search, temperature scaling, and hybrids thereof. Numerical experiments are conducted to complement our theoretical analysis.	翻訳日:2024-11-02 15:00:17 公開日:2024-10-04
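The "truncation and normalization" heuristics named in this abstract can be made concrete with a short sketch. It shows the standard Top-k and Nucleus (top-p) rules applied to a single next-token distribution; it is not the closed-form optimal strategy derived in the paper, and the function names and toy probabilities are illustrative assumptions.

```python
def top_k(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


def nucleus(probs, p):
    """Keep the smallest top-probability set whose mass reaches p, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


next_token_probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical distribution
print(top_k(next_token_probs, 2))
print(nucleus(next_token_probs, 0.9))
```

Both rules discard the low-probability tail and rescale what remains so that it sums to one; they differ only in how the cutoff is chosen.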
# エンブレス拒絶:無作為なランダムピボットColeskyによるカーネル行列近似 Embrace rejection: Kernel matrix approximation by accelerated randomly pivoted Cholesky ( http://arxiv.org/abs/2410.03969v1 ) ライセンス: Link先を確認	Ethan N. Epperly, Joel A. Tropp, Robert J. Webber,	(参考訳) ランダムピボットされたコレスキー (RPCholesky) は、少数の列を用いて正半有限行列の低ランク近似を構築するアルゴリズムである。本稿では,ブロック行列計算とリジェクションサンプリングを併用したRPCholeskyの高速化版を開発し,元のアルゴリズムの実行を効率的にシミュレートする。カーネル行列を近似するタスクでは、加速されたアルゴリズムは40\times$より高速に動作することができる。本稿では,実装の詳細,理論的保証,ベンチマークデータセットの実験,計算化学への応用について述べる。 Randomly pivoted Cholesky (RPCholesky) is an algorithm for constructing a low-rank approximation of a positive-semidefinite matrix using a small number of columns. This paper develops an accelerated version of RPCholesky that employs block matrix computations and rejection sampling to efficiently simulate the execution of the original algorithm. For the task of approximating a kernel matrix, the accelerated algorithm can run over $40\times$ faster. The paper contains implementation details, theoretical guarantees, experiments on benchmark data sets, and an application to computational chemistry.	翻訳日:2024-11-02 15:00:17 公開日:2024-10-04
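For orientation, here is a minimal sketch of the basic randomly pivoted Cholesky iteration that the abstract above builds on. It is the plain, unaccelerated algorithm: the block computations and rejection sampling that the paper contributes are not shown. The dense list-of-lists input, the function name `rp_cholesky`, and the toy matrix are assumptions chosen to keep the example self-contained.

```python
import math
import random


def rp_cholesky(A, k, seed=0):
    """Basic randomly pivoted Cholesky.

    Returns (F, pivots), where F is n x k and A is approximated by F F^T.
    A must be a symmetric positive-semidefinite matrix (list of lists).
    """
    rng = random.Random(seed)
    n = len(A)
    F = [[0.0] * k for _ in range(n)]
    d = [A[i][i] for i in range(n)]  # diagonal of the current residual
    pivots = []
    for j in range(k):
        if sum(d) <= 1e-12:
            break  # residual is numerically zero; nothing left to approximate
        # Sample a pivot with probability proportional to the residual diagonal.
        s = rng.choices(range(n), weights=d)[0]
        # Column s of the residual: A[:, s] - F[:, :j] F[s, :j]^T.
        g = [A[i][s] - sum(F[i][t] * F[s][t] for t in range(j)) for i in range(n)]
        if g[s] <= 0.0:
            break  # guard against rounding; should not happen for PSD input
        pivots.append(s)
        scale = math.sqrt(g[s])
        for i in range(n):
            F[i][j] = g[i] / scale
            d[i] = max(d[i] - F[i][j] ** 2, 0.0)  # update residual diagonal
    return F, pivots


# Toy rank-2 PSD matrix A = B B^T (hypothetical data).
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
A = [[sum(x * y for x, y in zip(r, c)) for c in B] for r in B]
F, pivots = rp_cholesky(A, 2)
print(pivots)
```

Each step reads only one column of A, which is why the method suits kernel matrices whose entries are generated on demand.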


# Spectrum-Aware Debiasing: A Modern Inference Framework with Applications to Principal Components Regression ( http://arxiv.org/abs/2309.07810v4 )

License: see link

Yufan Li, Pragya Sur,

(参考訳) 偏見は高次元統計学における基本的な概念である。自由度調整は、高次元線形回帰における最先端技術である一方、これはi.d.サンプルと亜ガウス共変量に限られる。これらの制約は、その広範な実用性を妨げている。本稿では,高次元回帰のための新しい手法であるSpectrum-Aware Debiasingを紹介する。我々のアプローチは、構造化された依存関係、重いテール、低ランク構造に関する問題に適用されます。提案手法は, サンプル共分散行列のスペクトル情報を用いて再スケーリング係数を導出し, 再スケール勾配降下ステップによるデバイアス化を実現する。スペクトルベースのアプローチは、より広い文脈での正確な偏りの除去を可能にする。特徴量とサンプル数が比例的にスケールする共通近代体制を考察する。我々は、共変量体が右回転不変であるとき、様々な収束概念の下で、提案した推定器の漸近正規性(好適に中心化およびスケール化)を確立する。このような設計は、圧縮センシングにおいて重要な役割を担っているため、近年注目を集めている。さらに、その漸近的分散に対する一貫した推定器を考案する。まず、主成分回帰(PCR)のバイアスを補正するためにSpectrum-Aware Debiasingを使用し、高次元における最初の脱バイアスPCR推定器を提供する。第2に、サンプル共分散行列の信号と固有ベクトルとの整合性を確認するための原理的テストを導入する。このテストは、近似メッセージパッシング(英語版)、Leave-one-out(英語版)、凸ガウスのmin-max定理(英語版)を用いて開発された統計手法には独立に有用である。シミュレーションおよび実データ実験により本手法を実証する。技術的には、近似メッセージパッシングアルゴリズムとデバイアスを結合し、ベクトル近似メッセージパッシング(V-AMP)のコーシー性の最初の証明を提供する。

Debiasing is a fundamental concept in high-dimensional statistics. While degrees-of-freedom adjustment is the state-of-the-art technique in high-dimensional linear regression, it is limited to i.i.d. samples and sub-Gaussian covariates. These constraints hinder its broader practical use. Here, we introduce Spectrum-Aware Debiasing--a novel method for high-dimensional regression. Our approach applies to problems with structured dependencies, heavy tails, and low-rank structures. Our method achieves debiasing through a rescaled gradient descent step, deriving the rescaling factor using spectral information of the sample covariance matrix. The spectrum-based approach enables accurate debiasing in much broader contexts. We study the common modern regime where the number of features and samples scale proportionally. We establish asymptotic normality of our proposed estimator (suitably centered and scaled) under various convergence notions when the covariates are right-rotationally invariant. Such designs have garnered recent attention due to their crucial role in compressed sensing. Furthermore, we devise a consistent estimator for its asymptotic variance. Our work has two notable by-products: first, we use Spectrum-Aware Debiasing to correct bias in principal components regression (PCR), providing the first debiased PCR estimator in high dimensions. Second, we introduce a principled test for checking alignment between the signal and the eigenvectors of the sample covariance matrix. This test is independently valuable for statistical methods developed using approximate message passing, leave-one-out, or convex Gaussian min-max theorems. We demonstrate our method through simulated and real data experiments. Technically, we connect approximate message passing algorithms with debiasing and provide the first proof of the Cauchy property of vector approximate message passing (V-AMP).

Translated: 2024-11-09 14:28:50 Published: 2024-10-04

# Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model ( http://arxiv.org/abs/2405.03958v2 )

License: see link

Joo Young Choi, Jaesung R. Park, Inkyu Park, Jaewoong Cho, Albert No, Ernest K. Ryu,

(参考訳) 現在の最先端拡散モデルでは、畳み込み層と(qkv)自己アテンション層を含むU-Netアーキテクチャを採用している。 U-Netは、サンプリングステップ毎にタイム埋め込み入力と、所望の条件生成に対応するクラスまたはキャプション埋め込み入力とに基づいて、条件付きで画像を処理する。このような条件付けは、畳み込み層へのスケール・アンド・シフト操作を含むが、注意層に直接影響しない。これらの標準的なアーキテクチャ選択は確かに有効であるが、注意層を条件付けしないことは任意であり、潜在的に最適であると感じている。本研究では,U-Netアーキテクチャの他の部分を変更・調整することなく,LoRAコンディショニングをアテンション層に追加するだけで画像生成品質が向上することを示す。例えば、EDM拡散モデルにLoRA条件を付加すると、不条件およびクラス条件のCIFAR-10生成に対するFIDスコアが 1.91/1.75 となり、ベースラインが 1.97/1.79 となる。

Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to the EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.

Translated: 2024-11-09 02:52:29 Published: 2024-10-04

# A fixed phase tunable directional coupler based on coupling tuning ( http://arxiv.org/abs/2405.13660v2 )

License: see link

Yang Yang, Tim Weiss, Hamed Arianfard, Akram Youssry, Alberto Peruzzo,

(参考訳) フォトニック集積回路の分野は近年大きく進歩し、高性能な再構成が可能なデバイスへの需要が高まっている。従来の調整可能な指向性カプラ(TDC)が、反射率を調整しながら一定の位相を維持することができないため、大規模な回路構築において、反射率調整のための一次構造ブロックとしてマッハ・ツェンダー干渉計(MZI)が使用される。しかし、MZIは、そのスケーラビリティを妨げる0-1反射率を達成するために、完全なバランスの取れた方向性結合器を必要とするため、製造エラーを起こしやすい。本研究では,薄膜Lithium Niobateプラットフォームにおける結合定数チューニングに基づくTDCの設計と最適化設計を提案する。最適化されたTDC設計は、幅広い動作波長で一貫した位相を確保しつつ、任意の反射率調整を可能にする。さらに、MZIよりも曲げ面積が少なく、MZIと従来のTDCと比べ、導波路形状および結合長の加工誤差に本質的に耐性がある。本研究は,光通信システムや量子情報処理など,様々な分野に影響を及ぼす高性能フォトニック集積回路の開発に寄与する。

The field of photonic integrated circuits has witnessed significant progress in recent years, with a growing demand for devices that offer high-performance reconfigurability. Due to the inability of conventional tunable directional couplers (TDCs) to maintain a fixed phase while tuning the reflectivity, Mach-Zehnder interferometers (MZIs) are employed as the primary building blocks for reflectivity tuning in constructing large-scale circuits. However, MZIs are prone to fabrication errors due to the need for perfectly balanced directional couplers to achieve 0-1 reflectivity, which hinders their scalability. In this study, we introduce a design of a TDC based on coupling constant tuning in the thin-film lithium niobate platform and present an optimized design. Our optimized TDC design enables arbitrary reflectivity tuning while ensuring a consistent phase across a wide range of operating wavelengths. Furthermore, it exhibits fewer bending sections than MZIs and is inherently resilient to fabrication errors in waveguide geometry and coupling length compared to both MZIs and conventional TDCs. Our work contributes to developing high-performance photonic integrated circuits with implications for various fields, including optical communication systems and quantum information processing.

Translated: 2024-11-09 02:18:45 Published: 2024-10-04

# LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech ( http://arxiv.org/abs/2407.04280v2 )

License: see link

Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang, Juho Kim,

(参考訳) 第二言語(L2)学習者による自然発話における非文法的表現と不一致は、自動音声認識(ASR)システムに固有の課題を提起する。しかし、L2学習音声に適したデータセットはほとんどない。我々はLearnerVoiceを公開し、LearnerVoiceは50.04時間の音声とL2学習者の自然発話の書き起こしからなるデータセットである。言語学的分析の結果,L2S(L2学習者の自発音声)の特徴は,非文法的表現と不一致(例えば,充足語,単語繰り返し,自己修復,偽開始)から成り立っていることがわかった。 LearnerVoiceによる微調整のwhisper-small.enのWERは10.26%、バニラのwhisper-small.enよりも44.2%低い。さらに,LearnerVoiceにおけるバニラモデルの誤差の54.2%がL2Sの特徴によるもので,48.1%が微調整モデルで減少している。

Prevalent ungrammatical expressions and disfluencies in spontaneous speech from second language (L2) learners pose unique challenges to Automatic Speech Recognition (ASR) systems. However, few datasets are tailored to L2 learner speech. We publicly release LearnerVoice, a dataset consisting of 50.04 hours of audio and transcriptions of L2 learners' spontaneous speech. Our linguistic analysis reveals that transcriptions in our dataset contain L2S (L2 learner's Spontaneous speech) features, consisting of ungrammatical expressions and disfluencies (e.g., filler words, word repetitions, self-repairs, false starts), significantly more than native speech datasets. Fine-tuning whisper-small.en with LearnerVoice achieves a WER of 10.26%, 44.2% lower than vanilla whisper-small.en. Furthermore, our qualitative analysis indicates that 54.2% of errors from the vanilla model on LearnerVoice are attributable to L2S features, with 48.1% of them being reduced in the fine-tuned model.

Translated: 2024-11-08 23:57:53 Published: 2024-10-04
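The WER figures in this abstract follow the usual word error rate definition: the word-level edit distance between reference and hypothesis, divided by the reference length. The following is a minimal sketch, assuming whitespace tokenization and no text normalization; the paper's exact scoring pipeline is not described in the abstract, and the example sentences are invented.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


# A verbatim reference with a filler word and a repetition, and an ASR
# output that silently dropped both (hypothetical example).
print(wer("i um went to the the store", "i went to the store"))
```

In this example the two dropped disfluencies count as two deletions against a seven-word reference, so an ASR model that "cleans up" learner speech is penalized when the reference is a verbatim transcript.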

# Differentially Private Inductive Miner ( http://arxiv.org/abs/2407.04595v2 )

License: see link

Max Schulze, Yorck Zisgen, Moritz Kirschte, Esfandiar Mohammadi, Agnes Koschmider,

(参考訳) プロセスマイニングにおけるイベントトレースのような個人に関する個人データの保護は、個人が引き起こしたプロセスモデルにおいて、イベントトレースがパスに関する情報を漏らすため、本質的に難しい作業である。しかし、k-匿名性やイベントログの衛生化といったイベントトレースの以前の匿名化手法は、そのようなリークに対して、特に十分な背景知識を持つ敵に対する防御に苦慮していた。本研究では,プライバシ保護方式でプロセスツリーを学習し,センシティブなイベントトレースを要約する手法を提案する。我々は、いわゆる差分プライバシー(DP)プロパティを通して、結果の要約から、イベントトレース内の任意の個人データについて有用な推論ができないことを証明した。技術的には、インダクティブマイナーの微分プライベート近似(DPIM)を導入する。実験により、DPIMとインダクティブマイナーを14の現実世界のイベントトレースで比較し、フィットネス、精度、単純さ、一般化といったよく知られた指標を評価した。実験の結果,DPIMは個人データを保護するだけでなく,インダクティブ・マイナーよりも有効性が低い忠実なプロセスツリーを生成することがわかった。

Protecting personal data about individuals, such as event traces in process mining, is an inherently difficult task since an event trace leaks information about the path in a process model that an individual has triggered. Yet, prior anonymization methods of event traces like k-anonymity or event log sanitization struggled to protect against such leakage, in particular against adversaries with sufficient background knowledge. In this work, we provide a method that tackles the challenge of summarizing sensitive event traces by learning the underlying process tree in a privacy-preserving manner. We prove via the so-called Differential Privacy (DP) property that from the resulting summaries no useful inference can be drawn about any personal data in an event trace. On the technical side, we introduce a differentially private approximation (DPIM) of the Inductive Miner. Experimentally, we compare our DPIM with the Inductive Miner on 14 real-world event traces by evaluating well-known metrics: fitness, precision, simplicity, and generalization. The experiments show that our DPIM not only protects personal data but also generates faithful process trees that exhibit little utility loss relative to the Inductive Miner.

翻訳日:2024-11-08 23:46:45 公開日:2024-10-04

# CopyBench: 言語モデル生成における著作権保護テキストのリテラルと非リテラル再現の測定

CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ( http://arxiv.org/abs/2407.07087v2 )

ライセンス: Link先を確認

Tong Chen, Akari Asai, Niloofar Mireshghallah, Sewon Min, James Grimmelmann, Yejin Choi, Hannaneh Hajishirzi, Luke Zettlemoyer, Pang Wei Koh,

(参考訳) 言語モデル(LM)による著作権保護されたコンテンツの再現の度合いを評価することは、AIと法律のコミュニティにとって重要な関心事である。再現度を評価する際には、リテラルと非リテラルの類似性の両方が裁判所によって検討されているが、先行研究はリテラルの類似性のみに焦点を当てている。このギャップを埋めるために、私たちは、LMの生成文におけるリテラルと非リテラルの両方のコピーを測定するために設計されたベンチマークであるCopyBenchを紹介します。著作権のあるフィクション書籍をテキストソースとして使用し,リテラルおよび非リテラルコピーを評価するための自動評価プロトコルを提供する。その際、著作物から事実を想起する能力と流暢な続きを生成する能力というモデルの有用性とのバランスも評価する。リテラルコピーは比較的稀であるが、イベントコピーとキャラクターコピーという2種類の非リテラルコピーは、7Bパラメータ程度の小さなモデルでも発生する。より大きなモデルはコピーが著しく多く、Llama3-8Bモデルと70Bモデルを比較すると、リテラルコピー率は0.2%から10.5%に、非リテラルコピーは2.3%から5.9%に増加した。さらに,コピーを緩和するための現行の戦略の有効性を評価し、(1) トレーニング時のアライメントはリテラルコピーを削減できるが,非リテラルコピーを増大させる可能性があり,(2) 現行の推論時緩和手法は主にリテラルコピーを減少させるが,非リテラルコピーは減少させないことを示す。

Evaluating the degree of reproduction of copyright-protected content by language models (LMs) is of significant interest to the AI and legal communities. Although both literal and non-literal similarities are considered by courts when assessing the degree of reproduction, prior research has focused only on literal similarities. To bridge this gap, we introduce CopyBench, a benchmark designed to measure both literal and non-literal copying in LM generations. Using copyrighted fiction books as text sources, we provide automatic evaluation protocols to assess literal and non-literal copying, balanced against the model utility in terms of the ability to recall facts from the copyrighted works and generate fluent completions. We find that, although literal copying is relatively rare, two types of non-literal copying -- event copying and character copying -- occur even in models as small as 7B parameters. Larger models demonstrate significantly more copying, with literal copying rates increasing from 0.2% to 10.5% and non-literal copying from 2.3% to 5.9% when comparing Llama3-8B and 70B models, respectively. We further evaluate the effectiveness of current strategies for mitigating copying and show that (1) training-time alignment can reduce literal copying but may increase non-literal copying, and (2) current inference-time mitigation methods primarily reduce literal but not non-literal copying.

翻訳日:2024-11-08 22:51:19 公開日:2024-10-04

# 共形場理論とホログラフィーにおける絡み合い非対称性

Entanglement asymmetry in conformal field theory and holography ( http://arxiv.org/abs/2407.07969v2 )

ライセンス: Link先を確認

Francesco Benini, Victor Godet, Amartya Harsh Singh,

(参考訳) エンタングルメント非対称性(英: Entanglement asymmetry)は、量子情報理論に着想を得た量子サブシステムにおける対称性の破れの尺度であり、特に非平衡状態の研究に適している。 U(1)対称性を持つ共形場の量子論における励起的「コヒーレント状態」のクラスの絡み合い非対称性を、位相的対称性欠陥を持つユークリッド経路積分法とレプリカ形式を用いて研究する。摂動論の主要次数において、平坦な空間、有限体積、正の温度における有限球面部分領域を含む様々なサブシステムの非対称性を計算する。我々はまた、そのローレンツ的時間発展を研究し、熱化による対称性の動的回復と量子ムペンバ効果の存在を示す。我々の結果は普遍的であり、任意の次元に適用できる。また、摂動的エンタングルメント非対称性は、ホランズ=ウォルド正準エネルギーと呼ばれる既知のホログラフィック双対を持つフィッシャー情報計量と関係しており、エンタングルメントウェッジに含まれるAdSバルク電荷によって捉えられることを示す。

Entanglement asymmetry is a measure of symmetry breaking in quantum subsystems, inspired by quantum information theory, particularly suited to study out-of-equilibrium states. We study the entanglement asymmetry of a class of excited "coherent states" in conformal quantum field theories with a U(1) symmetry, employing Euclidean path-integral methods with topological symmetry defects and the replica formalism. We compute, at leading order in perturbation theory, the asymmetry for a variety of subsystems, including finite spherical subregions in flat space, in finite volume, and at positive temperature. We also study its Lorentzian time evolution, showcasing the dynamical restoration of the symmetry due to thermalization, as well as the presence of a quantum Mpemba effect. Our results are universal, and apply in any number of dimensions. We also show that the perturbative entanglement asymmetry is related to the Fisher information metric, which has a known holographic dual called Hollands-Wald canonical energy, and that it is captured by the AdS bulk charge contained in the entanglement wedge.

翻訳日:2024-11-08 22:29:09 公開日:2024-10-04

# コンテキスト拡張による投票アシスタントとしてのLLMの調査:2024年欧州議会選挙に関する事例研究

Investigating LLMs as Voting Assistants via Contextual Augmentation: A Case Study on the European Parliament Elections 2024 ( http://arxiv.org/abs/2407.08495v2 )

ライセンス: Link先を確認

Ilias Chalkidis,

(参考訳) 2024年の欧州議会議員選挙を踏まえ、LLMがVoting Advice Applications (VAA)として利用できるかどうかを調査している。我々は、MISTRALとMIXTRALモデルを監査し、最新の「EU and I」投票支援アンケートに基づいて、政党の姿勢を予測する際の精度を評価する。さらに、Web検索に依拠するRAG(Retrieval-Augmented Generation)による入力コンテキストの拡張と、モデルの内部メモリから関連コンテンツを再収集することを目的とした段階的会話を用いた自己反省(Self-Reflection)により、モデルの性能を改善する方法を検討する。その結果,MIXTRALは平均82%と高い精度を示す一方、政治グループ間で有意な性能差(50～95%)が認められた。入力コンテキストを専門家がキュレートした情報で拡張することで、約9%の大幅な向上につながる可能性がある。これは、キュレートされたコンテンツを考慮しても、自動RAGアプローチにとって未解決の課題であり続けている。

In light of the recent 2024 European Parliament elections, we are investigating if LLMs can be used as Voting Advice Applications (VAAs). We audit MISTRAL and MIXTRAL models and evaluate their accuracy in predicting the stance of political parties based on the latest "EU and I" voting assistance questionnaire. Furthermore, we explore alternatives to improve models' performance by augmenting the input context via Retrieval-Augmented Generation (RAG) relying on web search, and Self-Reflection using staged conversations that aim to re-collect relevant content from the model's internal memory. We find that MIXTRAL is highly accurate with an 82% accuracy on average with a significant performance disparity across different political groups (50-95%). Augmenting the input context with expert-curated information can lead to a significant boost of approx. 9%, which remains an open challenge for automated RAG approaches, even considering curated content.

翻訳日:2024-11-08 22:17:54 公開日:2024-10-04

# 深層学習を用いたシングルイメージシャドウ除去:包括的調査

Single-Image Shadow Removal Using Deep Learning: A Comprehensive Survey ( http://arxiv.org/abs/2407.08865v2 )

ライセンス: Link先を確認

Laniqng Guo, Chong Wang, Yufei Wang, Yi Yu, Siyu Huang, Wenhan Yang, Alex C. Kot, Bihan Wen,

(参考訳) シャドウ除去は、シャドウ領域と非シャドウ領域で一貫した均一な照明分布を追求し、シャドウ領域内の画像内容を復元することを目的としている。他の画像復元タスクと比較して,影除去には2つの固有の課題がある。 1) 影のパターンは任意であり、多様で、しばしば非常に複雑な痕跡構造を持つため、「痕跡のない」画像の回復は困難である。 2) 影による劣化は空間的に不均一であり, 影領域と非影領域の間で照度と色の不整合が生じる。この分野での最近の開発は、主にディープラーニングベースのソリューションによって進められており、様々な学習戦略、ネットワークアーキテクチャ、損失関数、トレーニングデータを利用している。それでも、ディープラーニングに基づくシャドウ除去技術に関する、徹底的で洞察に富んだレビューは、まだ欠落している。本稿では,技術詳細からアプリケーションまで,さまざまな側面をカバーする総合的な調査を初めて実施する。深層学習に基づくシングルイメージシャドウ除去手法の大きな進歩を強調し、様々なカテゴリにわたる過去の研究を徹底的にレビューし、これらの発展の歴史的進展に関する洞察を提供する。さらに,性能比較を定量的かつ質的に要約する。シャドウ除去の技術的側面の他に、この分野の将来的な方向性についても検討する。

Shadow removal aims at restoring the image content within shadow regions, pursuing a uniform distribution of illumination that is consistent between shadow and non-shadow regions. Compared to other image restoration tasks, there are two unique challenges in shadow removal: 1) The patterns of shadows are arbitrary, varied, and often have highly complex trace structures, making "trace-less" image recovery difficult. 2) The degradation caused by shadows is spatially non-uniform, resulting in inconsistencies in illumination and color between shadow and non-shadow areas. Recent developments in this field are primarily driven by deep learning-based solutions, employing a variety of learning strategies, network architectures, loss functions, and training data. Nevertheless, a thorough and insightful review of deep learning-based shadow removal techniques is still lacking. In this paper, we are the first to provide a comprehensive survey to cover various aspects ranging from technical details to applications. We highlight the major advancements in deep learning-based single-image shadow removal methods, thoroughly review previous research across various categories, and provide insights into the historical progression of these developments. Additionally, we summarize performance comparisons both quantitatively and qualitatively. Beyond the technical aspects of shadow removal methods, we also explore potential future directions for this field.

翻訳日:2024-11-08 22:17:54 公開日:2024-10-04

# Rydberg-atom arrayを用いたディジタルアナログ量子遺伝的アルゴリズム

Digital-analog quantum genetic algorithm using Rydberg-atom arrays ( http://arxiv.org/abs/2407.09308v2 )

ライセンス: Link先を確認

Aleix Llenas, Lucas Lamata,

(参考訳) デジタルアナログ量子コンピューティング(DAQC)は、デジタルゲートとアナログ演算を組み合わせて、普遍的な量子計算の代替パラダイムを提供する。このアプローチは、アナログ演算の高忠実度と局所的な単一量子ゲートの柔軟性を活用する。本稿では,Rydberg-atom emulator を用いたDAQCフレームワーク内の量子遺伝的アルゴリズムを提案する。このアルゴリズムは、デジタル領域における単一量子演算とアナログ領域におけるライドバーグ・ハミルトニアンに基づく大域的駆動相互作用を用いる。我々は,ハミルトニアンの基底状態エネルギーを推定してアルゴリズムの性能を評価し,$\rm H_2$,$\rm LiH$,$\rm BeH_2$などの分子に着目した。その結果, 誤差が1%未満で, 状態重なりが1に近づき, 計算時間は, 数分で$\rm H_2$ (2-qubit) から$\rm LiH$ と $\rm BeH_2$ (6-qubit) までの1～2日間であった。グローバルアナログ演算のゲート忠実性は、ノイズの多い中間スケール量子時代における有望な量子コンピューティング戦略としてDAQCをさらに強調する。

Digital-analog quantum computing (DAQC) combines digital gates with analog operations, offering an alternative paradigm for universal quantum computation. This approach leverages the higher fidelities of analog operations and the flexibility of local single-qubit gates. In this paper, we propose a quantum genetic algorithm within the DAQC framework using a Rydberg-atom emulator. The algorithm employs single-qubit operations in the digital domain and a global driving interaction based on the Rydberg Hamiltonian in the analog domain. We evaluate the algorithm performance by estimating the ground-state energy of Hamiltonians, with a focus on molecules such as $\rm H_2$, $\rm LiH$, and $\rm BeH_2$. Our results show energy estimations with less than 1% error and state overlaps nearing 1, with computation times ranging from a few minutes for $\rm H_2$ (2-qubit circuits) to one to two days for $\rm LiH$ and $\rm BeH_2$ (6-qubit circuits). The gate fidelities of global analog operations further underscore DAQC as a promising quantum computing strategy in the noisy intermediate-scale quantum era.

翻訳日:2024-11-08 22:06:29 公開日:2024-10-04

# AIoTにおける組み込みFPGAベースの時系列予測のための整数のみ量子化Transformer

Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT ( http://arxiv.org/abs/2407.11041v3 )

ライセンス: Link先を確認

Tianheng Ling, Chao Qian, Gregor Schiele,

(参考訳) 本稿では,AIoTシステムにおけるデバイス上の時系列予測に最適化されたTransformers用ハードウェアアクセラレータの設計について述べる。整数のみの量子化と量子化アウェアトレーニングを最適化されたハードウェア設計と統合し、6ビットおよび4ビットの量子化トランスフォーマーモデルを実現し、関連する研究の8ビット量子化モデルに匹敵する精度を達成した。組み込みFPGA(Xilinx Spartan-7 XC7S15)上の完全な実装を利用して,組込みIoTデバイスにTransformerモデルをデプロイする可能性を検討する。これには、達成可能な精度、リソース利用、タイミング、電力、デバイス上の推論のためのエネルギー消費の徹底的な分析が含まれる。以上の結果から,十分な性能は達成できるものの,最適化プロセスは簡単ではないことが示唆された。例えば、量子化ビット幅を削減しても、レイテンシやエネルギー消費が一貫して減少するわけではなく、様々な最適化の組み合わせを体系的に探索する必要性が浮き彫りになる。関連する研究の8ビット量子化トランスフォーマーモデルと比較すると、我々の4ビット量子化トランスフォーマーモデルはテスト損失をわずか0.63%増加させるだけで、最大132.33倍速く動作し、エネルギー消費は48.19分の1である。

This paper presents the design of a hardware accelerator for Transformers, optimized for on-device time-series forecasting in AIoT systems. It integrates integer-only quantization and Quantization-Aware Training with optimized hardware designs to realize 6-bit and 4-bit quantized Transformer models, which achieved precision comparable to 8-bit quantized models from related research. Utilizing a complete implementation on an embedded FPGA (Xilinx Spartan-7 XC7S15), we examine the feasibility of deploying Transformer models on embedded IoT devices. This includes a thorough analysis of achievable precision, resource utilization, timing, power, and energy consumption for on-device inference. Our results indicate that while sufficient performance can be attained, the optimization process is not trivial. For instance, reducing the quantization bitwidth does not consistently result in decreased latency or energy consumption, underscoring the necessity of systematically exploring various optimization combinations. Compared to an 8-bit quantized Transformer model in related studies, our 4-bit quantized Transformer model increases test loss by only 0.63%, operates up to 132.33x faster, and consumes 48.19x less energy.

翻訳日:2024-11-08 21:21:36 公開日:2024-10-04

# VLMはチャートを本当に理解しているか? 一貫性とロバストさを深く掘り下げる

Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness ( http://arxiv.org/abs/2407.11229v2 )

ライセンス: Link先を確認

Srija Mukhopadhyay, Adnan Qidwai, Aparna Garimella, Pritika Ramu, Vivek Gupta, Dan Roth,

(参考訳) チャート質問応答(CQA)は、ビジュアル言語理解の重要な領域である。しかし、この分野における現在のVisual Language Models (VLM) の頑健さと一貫性は十分に解明されていない。本稿では,本研究のために特別に開発された、多種多様な質問カテゴリやチャート形式を含む包括的データセット上で最先端VLMの評価を行う。私たちは2つの重要な側面を調査します。 1) チャートと質問の複雑さの様々なレベルをモデルが扱う能力、及び 2)同じ基礎データの異なる視覚的表現にまたがる堅牢性。本分析では,質問の種類とチャートの種類に応じた有意な性能変化を明らかにし、現在のモデルの強みと弱みの両方を浮き彫りにした。さらに,より堅牢で信頼性の高いCQAシステムを構築するために,改善すべき領域を特定し,今後の研究方向性を提案する。この研究は、現在のモデルの限界に光を当て、今後の分野の発展への道を開く。

Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art VLMs on comprehensive datasets, developed specifically for this study, encompassing diverse question categories and chart formats. We investigate two key aspects: 1) the models' ability to handle varying levels of chart and question complexity, and 2) their robustness across different visual representations of the same underlying data. Our analysis reveals significant performance variations based on question and chart types, highlighting both strengths and weaknesses of current models. Additionally, we identify areas for improvement and propose future research directions to build more robust and reliable CQA systems. This study sheds light on the limitations of current models and paves the way for future advancements in the field.

翻訳日:2024-11-08 21:10:26 公開日:2024-10-04

# 表データに対する敵対的攻撃の知覚不能性の検討--経験的分析

Investigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical Analysis ( http://arxiv.org/abs/2407.11463v3 )

ライセンス: Link先を確認

Zhipeng He, Chun Ouyang, Laith Alzubaidi, Alistair Barros, Catarina Moreira,

(参考訳) 敵対的攻撃は、入力データに対する知覚不能な摂動を通じて誤った予測を引き起こすことによって、機械学習モデルに対する潜在的な脅威となる。これらの攻撃は、画像のような非構造化データで広く研究されているが、それらを表データに適用することは、新しい課題をもたらす。これらの課題は、画像データとは異なる表データの固有の不均一性と複雑な特徴相互依存性から生じる。この違いを考慮に入れるには、表データに特化した知覚不能性の基準を確立する必要がある。しかし、現在、表データに対する敵対的攻撃の知覚不能性を評価するための標準化された指標が欠如している。このギャップに対処するために、表データに対する知覚不能な敵対的攻撃を包括的に特徴付けるための、重要な特性とそれに対応するメトリクスのセットを提案する。それらは、元の入力への近接性、変更された特徴のスパース性、元のデータ分布からの逸脱、分布の狭い特徴を摂動する際の感度、変更されるべきでない特定の特徴の不変性、有効な実用的範囲を超えてはならない特定の特徴値の実現可能性、データ属性間の複雑な関係を捉える特徴相互依存性である。提案した知覚不能性メトリクスを用いて,有界攻撃と非有界攻撃の両方を含む5つの敵対的攻撃の知覚不能性を表データ上で評価する。その結果、これらの攻撃の知覚不能性と有効性の間のトレードオフが明らかとなった。この研究はまた、現在の攻撃アルゴリズムの限界を特定し、この分野における将来の研究を導く洞察を提供する。この経験的分析から得られた知見は、敵対的攻撃アルゴリズムの設計を強化する上で貴重な方向性を提供し、それによって表データにおける敵対的機械学習を前進させる。

Adversarial attacks are a potential threat to machine learning models by causing incorrect predictions through imperceptible perturbations to the input data. While these attacks have been extensively studied in unstructured data like images, applying them to tabular data, poses new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies in tabular data, which differ from the image data. To account for this distinction, it is necessary to establish tailored imperceptibility criteria specific to tabular data. However, there is currently a lack of standardised metrics for assessing the imperceptibility of adversarial attacks on tabular data. To address this gap, we propose a set of key properties and corresponding metrics designed to comprehensively characterise imperceptible adversarial attacks on tabular data. These are: proximity to the original input, sparsity of altered features, deviation from the original data distribution, sensitivity in perturbing features with narrow distribution, immutability of certain features that should remain unchanged, feasibility of specific feature values that should not go beyond valid practical ranges, and feature interdependencies capturing complex relationships between data attributes. We evaluate the imperceptibility of five adversarial attacks, including both bounded attacks and unbounded attacks, on tabular data using the proposed imperceptibility metrics. The results reveal a trade-off between the imperceptibility and effectiveness of these attacks. The study also identifies limitations in current attack algorithms, offering insights that can guide future research in the area. The findings gained from this empirical analysis provide valuable direction for enhancing the design of adversarial attack algorithms, thereby advancing adversarial machine learning on tabular data.

翻訳日:2024-11-08 21:10:26 公開日:2024-10-04

# TGIF: テキスト誘導インペインティング偽造データセット

TGIF: Text-Guided Inpainting Forgery Dataset ( http://arxiv.org/abs/2407.11566v2 )

ライセンス: Link先を確認

Hannes Mareen, Dimitrios Karageorgiou, Glenn Van Wallendael, Peter Lambert, Symeon Papadopoulos,

(参考訳) デジタル画像操作は、生成AI技術の出現により、ますますアクセスしやすく、現実的なものになりつつある。近年の進歩により、テキストガイドによるインペインティングが可能となり、最小限の努力で高度な画像編集が可能になった。これはデジタルメディアの法医学に新たな課題をもたらす。例えば、拡散モデルに基づくアプローチは、インペイントされた領域を元の画像にスプライスするか、あるいは画像全体を再生成することができる。後者の場合、従来の画像偽造局在化(IFL)手法は通常失敗する。本稿では,画像偽造局在化と合成画像検出(SID)手法のトレーニングと評価を支援するために設計された画像の包括的コレクションであるText-Guided Inpainting Forgery (TGIF)データセットを紹介する。 TGIFデータセットには、SD2、SDXL、Adobe Fireflyといった人気のあるオープンソースおよび商用メソッドから派生した、約75kの偽造画像が含まれている。我々は、TGIF上で、最先端のIFLとSIDのいくつかの手法をベンチマークする。従来のIFL法はスプライスされた画像を検出できるが、再生成されたインペイント画像は検出できない。さらに、従来のSIDは、再生成されたインペイント画像が偽物であることを検出できるが、インペイントされた領域を局在化することはできない。最後に、IFLとSIDのどちらの手法も強い圧縮にさらされると失敗し、WEBPのような現代の圧縮アルゴリズムに対しては頑健性がより低い。結論として、本研究は現代の生成的アプローチによる局所的な操作に対する最先端検出器の非効率性を実証し、より有能なIFL法とSID法の開発を支援することを目指している。データセットとコードは https://github.com/IDLabMedia/tgif-dataset からダウンロードできる。

Digital image manipulation has become increasingly accessible and realistic with the advent of generative AI technologies. Recent developments allow for text-guided inpainting, making sophisticated image edits possible with minimal effort. This poses new challenges for digital media forensics. For example, diffusion model-based approaches could either splice the inpainted region into the original image, or regenerate the entire image. In the latter case, traditional image forgery localization (IFL) methods typically fail. This paper introduces the Text-Guided Inpainting Forgery (TGIF) dataset, a comprehensive collection of images designed to support the training and evaluation of image forgery localization and synthetic image detection (SID) methods. The TGIF dataset includes approximately 75k forged images, originating from popular open-source and commercial methods, namely SD2, SDXL, and Adobe Firefly. We benchmark several state-of-the-art IFL and SID methods on TGIF. Whereas traditional IFL methods can detect spliced images, they fail to detect regenerated inpainted images. Moreover, traditional SID may detect the regenerated inpainted images to be fake, but cannot localize the inpainted area. Finally, both IFL and SID methods fail when exposed to stronger compression, while they are less robust to modern compression algorithms, such as WEBP. In conclusion, this work demonstrates the inefficiency of state-of-the-art detectors on local manipulations performed by modern generative approaches, and aspires to help with the development of more capable IFL and SID methods. The dataset and code can be downloaded at https://github.com/IDLabMedia/tgif-dataset.

翻訳日:2024-11-08 21:10:26 公開日:2024-10-04

# Historical Ink: LLMによるOCR補正を施した19世紀ラテンアメリカのスペイン語新聞コーパス

Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction ( http://arxiv.org/abs/2407.12838v2 )

ライセンス: Link先を確認

Laura Manrique-Gómez, Tony Montes, Arturo Rodríguez-Herrera, Rubén Manrique,

(参考訳) 本論文は2つの重要な貢献を示す。まず、19世紀のラテンアメリカの新聞テキストの新しいデータセットを導入し、この地域の歴史的・言語学的分析のための特殊なコーパスにおける重要なギャップに対処する。第二に、デジタル化されたコーパスにおけるOCR誤り訂正と言語表層形の検出にLarge Language Modelを利用するフレキシブルなフレームワークを開発する。この半自動フレームワークは、さまざまなコンテキストやデータセットに適応可能であり、新しく作成されたデータセットに適用されている。

This paper presents two significant contributions: First, it introduces a novel dataset of 19th-century Latin American newspaper texts, addressing a critical gap in specialized corpora for historical and linguistic analysis in this region. Second, it develops a flexible framework that utilizes a Large Language Model for OCR error correction and linguistic surface form detection in digitized corpora. This semi-automated framework is adaptable to various contexts and datasets and is applied to the newly created dataset.

翻訳日:2024-11-08 20:25:29 公開日:2024-10-04

# 大規模言語モデルは人間レベルナラティブを生成することができるか?

Are Large Language Models Capable of Generating Human-Level Narratives? ( http://arxiv.org/abs/2407.13248v2 )

ライセンス: Link先を確認

Yufei Tian, Tenghao Huang, Miri Liu, Derek Jiang, Alexander Spangher, Muhao Chen, Jonathan May, Nanyun Peng,

(参考訳) 本稿ではストーリーテリングにおけるLLMの能力について考察し,物語の展開とプロットの進行に着目した。 3つの談話レベルの側面から物語を分析するための新しい計算フレームワークを導入する。すなわち、i) ストーリーアーク、ii) ターニングポイント、iii) 覚醒度(arousal)と感情価(valence)を含む感情的次元である。専門家によるアノテーションと自動アノテーションを活用することで、LLMによる物語と人間による物語の間に大きな相違点が明らかになる。人間による物語はサスペンスがあり、刺激的であり、物語構造において多様であるが、LLMの物語は均質に肯定的であり、緊張を欠いている。次に,ナラティブ推論スキルを生成能力の前駆体として測定し,ほとんどのLLMは談話理解において人間の能力に及ばないと結論付けた。最後に, 上記の談話特徴を明示的に統合することでストーリーテリングを強化できることを示し、多様性, サスペンス, 覚醒度の観点でニューラルストーリーテリングが40%以上改善することを実証する。

This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression. We introduce a novel computational framework to analyze narratives through three discourse-level aspects: i) story arcs, ii) turning points, and iii) affective dimensions, including arousal and valence. By leveraging expert and automatic annotations, we uncover significant discrepancies between the LLM- and human- written stories. While human-written stories are suspenseful, arousing, and diverse in narrative structures, LLM stories are homogeneously positive and lack tension. Next, we measure narrative reasoning skills as a precursor to generative capacities, concluding that most LLMs fall short of human abilities in discourse understanding. Finally, we show that explicit integration of aforementioned discourse features can enhance storytelling, as is demonstrated by over 40% improvement in neural storytelling in terms of diversity, suspense, and arousal.

翻訳日:2024-11-08 20:14:30 公開日:2024-10-04

# SpeciaLex: In-Context Specialized Lexicon Learningのベンチマーク

SpeciaLex: A Benchmark for In-Context Specialized Lexicon Learning ( http://arxiv.org/abs/2407.13297v2 )

ライセンス: Link先を確認

Joseph Marvin Imperial, Harish Tayyar Madabushi,

(参考訳) 特殊レキシコン(英: Specialized lexicons)は、特別な定義、特定の役割、想定される対象読者など、関連する制約を伴う単語の集合である。これらの制約は、テキストコンテンツの曖昧さを減らし、特定の読者層に対する全体的な可読性を高めることを目的とする、コンテンツ生成およびドキュメント作成タスク(例えば、テクニカルマニュアルや子供向けの読み物を書くこと)に必要である。大規模言語モデルがこれらの制約をどのように捉えられるかを理解することで、研究者はNLPコミュニティを超えて広く使われる、より優れた、より影響力のあるツールを構築することができる。この目的に向けて、私たちはSpeciaLexを紹介する。これは言語モデルが特殊レキシコンに基づく制約に従う能力を評価するためのベンチマークであり、チェック、識別、書き換え、オープンジェネレーションのコアタスクをカバーする18の多様なサブタスクと1,785のテストインスタンスからなる。本稿では,15のオープンソースおよびクローズドソースのLLMの実証評価を行い,モデルスケール,オープンネス,セットアップ,新しさ(recency)などの要因が,ベンチマークで評価した場合のパフォーマンスに与える影響について考察する。

Specialized lexicons are collections of words with associated constraints such as special definitions, specific roles, and intended target audiences. These constraints are necessary for content generation and documentation tasks (e.g., writing technical manuals or children's reading materials), where the goal is to reduce the ambiguity of text content and increase its overall readability for a specific group of audience. Understanding how large language models can capture these constraints can help researchers build better, more impactful tools for wider use beyond the NLP community. Towards this end, we introduce SpeciaLex, a benchmark for evaluating a language model's ability to follow specialized lexicon-based constraints across 18 diverse subtasks with 1,785 test instances covering core tasks of Checking, Identification, Rewriting, and Open Generation. We present an empirical evaluation of 15 open and closed-source LLMs and discuss insights on how factors such as model scale, openness, setup, and recency affect performance upon evaluating with the benchmark.

翻訳日:2024-11-08 20:14:30 公開日:2024-10-04

# 逆二乗相互作用を持つ新しい並進不変な超対称鎖:分配関数、熱力学、臨界性

A novel translationally invariant supersymmetric chain with inverse-square interactions: partition function, thermodynamics and criticality ( http://arxiv.org/abs/2407.13827v3 )

ライセンス: Link先を確認

Bireswar Basu-Mallick, Federico Finkel, Artemio González-López,

(参考訳) 我々は、ルート系に直接関連しない長距離相互作用を持つ並進不変なsu$(m|n)$超対称スピン鎖の新しい族を導入する。我々はこれらのモデルの対称性について研究し、特にこの種のシステムに特徴的なボソン-フェルミオン双対性(boson-fermion duality)の存在を確立した。新しい鎖とそれに付随する多体超対称スピン力学モデルの関係を利用して、$m$と$n$のすべての値と任意の数のスピンに対して、それらの分配関数を閉形式で計算することができる。 $m$ と $n$ の両方が偶数であるとき、分配関数は2つの超対称ハルデン-シャストリースピン鎖の分配関数の積として分解され、したがって適切な転送行列のペロン固有値を用いたスピンあたりの熱力学的自由エネルギーの簡単な式が導かれる。この式を用いて、これらの鎖の大きなクラスの熱力学を解析し、特に、比熱が適切な$k$準位モデルとほぼ同じ温度で単一のショットキーピークを示すことを示す。また,新しい鎖の臨界挙動,特に基底状態の縮退と線形なエネルギー-運動量分散関係を持つ低エネルギー励起の存在を解析した。このようにして、臨界となり得る鎖は$m=0,1,2$のものだけであることを示すことができる。さらに、分配関数の明示的な公式を用いて、偶数の$n$ に対する su$(0|n)$ および su$(2|n)$ 鎖の臨界性を確立し、関連する共形場理論の中心電荷を評価することができる。

We introduce a novel family of translationally-invariant su$(m|n)$ supersymmetric spin chains with long-range interaction not directly associated to a root system. We study the symmetries of these models, establishing in particular the existence of a boson-fermion duality characteristic of this type of systems. Taking advantage of the relation of the new chains with an associated many-body supersymmetric spin dynamical model, we are able to compute their partition function in closed form for all values of $m$ and $n$ and for an arbitrary number of spins. When both $m$ and $n$ are even, we show that the partition function factorizes as the product of the partition functions of two supersymmetric Haldane-Shastry spin chains, which in turn leads to a simple expression for the thermodynamic free energy per spin in terms of the Perron eigenvalue of a suitable transfer matrix. We use this expression to study the thermodynamics of a large class of these chains, showing in particular that the specific heat presents a single Schottky peak at approximately the same temperature as a suitable $k$-level model. We also analyze the critical behavior of the new chains, and in particular the ground state degeneracy and the existence of low energy excitations with a linear energy-momentum dispersion relation. In this way we are able to show that the only possible critical chains are the ones with $m=0,1,2$. In addition, using the explicit formula for the partition function we are able to establish the criticality of the su$(0|n)$ and su$(2|n)$ chains with even $n$, and to evaluate the central charge of their associated conformal field theory.

翻訳日:2024-11-08 20:01:00 公開日:2024-10-04

# グリッドパズル解決のためのステップバイステップ推論: LLMはどこでつまずくのか?

Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter? ( http://arxiv.org/abs/2407.14790v2 )

ライセンス: Link先を確認

Nemika Tyagi, Mihir Parmar, Mohith Kulkarni, Aswin RRV, Nisarg Patel, Mutsumi Nakamura, Arindam Mitra, Chitta Baral,

(参考訳) グリッドパズルを解くには、かなりの量の論理的推論が必要となる。したがって、グリッドパズルはモデルの推論能力を評価するのに適したドメインであり、その評価はモデルの推論能力を改善するための指針となる。しかし、既存のほとんどの研究は、LLMの推論連鎖の詳細な分析(例えば、どこでつまずくか)を掘り下げたり、それらを評価するためのより詳細な指標を提供することなく、パズルの最終的な解答のみを評価する。 LLMは単純なヒューリスティックやアーティファクトに頼って最終解答を予測できるため、LLMの推論能力を正確に評価するためには、全体的な正しさの指標を超えて、生成された推論連鎖を評価することが重要である。この目的のために、まずGridPuzzleを開発した。これは、複雑度が異なる274のグリッドベースのパズルからなる評価データセットである。第2に, GPT-4, Claude-3, Gemini, Mistral, Llama-2 などのLLMの推論連鎖を手動で分析して得られた新しい誤り分類法を提案する。次に、大規模主観的評価(すなわち誤りの特定)のためのLLMベースのフレームワークと、推論連鎖の正しさを評価する客観的な指標であるPuzzleEvalを開発する。 LLMの推論連鎖を評価することで、いくつかの興味深い発見が得られる。さらに、モデルの推論能力を向上させるために使われている既存のプロンプト手法は、GridPuzzleの性能を向上しないことを示す。このことは、細粒度の誤りを理解することの重要性を強調し、これらの誤りに対処する手法を開発することによりLLMのパズル解決能力を高めるという、今後の研究課題を示す。データとソースコードは https://github.com/Mihir3009/GridPuzzle で入手できる。

Solving grid puzzles involves a significant amount of logical reasoning. Hence, it is a good domain to evaluate the reasoning capability of a model which can then guide us to improve the reasoning ability of models. However, most existing works evaluate only the final predicted answer of a puzzle, without delving into an in-depth analysis of the LLMs' reasoning chains (such as where they falter) or providing any finer metrics to evaluate them. Since LLMs may rely on simple heuristics or artifacts to predict the final answer, it is crucial to evaluate the generated reasoning chain beyond overall correctness measures, for accurately evaluating the reasoning abilities of LLMs. To this end, we first develop GridPuzzle, an evaluation dataset comprising 274 grid-based puzzles with different complexities. Second, we propose a new error taxonomy derived from manual analysis of reasoning chains from LLMs including GPT-4, Claude-3, Gemini, Mistral, and Llama-2. Then, we develop an LLM-based framework for large-scale subjective evaluation (i.e., identifying errors) and an objective metric, PuzzleEval, to evaluate the correctness of reasoning chains. Evaluating reasoning chains from LLMs leads to several interesting findings. We further show that existing prompting methods used for enhancing models' reasoning abilities do not improve performance on GridPuzzle. This highlights the importance of understanding fine-grained errors and presents a challenge for future research to enhance LLMs' puzzle-solving abilities by developing methods that address these errors. Data and source code are available at https://github.com/Mihir3009/GridPuzzle.

翻訳日:2024-11-08 19:27:32 公開日:2024-10-04

# クロスドメインマニピュレーションインタフェースとしてのフロー

Flow as the Cross-Domain Manipulation Interface ( http://arxiv.org/abs/2407.15208v2 )

ライセンス: Link先を確認

Mengda Xu, Zhenjia Xu, Yinghao Xu, Cheng Chi, Gordon Wetzstein, Manuela Veloso, Shuran Song,

(参考訳) Im2Flow2Actは、ロボットが現実世界のロボットのトレーニングデータを必要とせずに、現実世界の操作スキルを習得できるスケーラブルな学習フレームワークである。 Im2Flow2Actの背景にある重要な考え方は、操作インターフェースとしてオブジェクトフローを使用し、異なる身体性(人間とロボット)とトレーニング環境(現実世界とシミュレーション)の間のドメインギャップを埋めることである。 Im2Flow2Actはフロー生成ネットワークとフロー条件付きポリシーの2つのコンポーネントから構成される。人間のデモビデオに基づいて訓練されたフロー生成ネットワークは、タスク記述を条件として、初期シーン画像からオブジェクトフローを生成する。シミュレーションされたロボットプレイデータに基づいて訓練されたフロー条件付きポリシーは、生成されたオブジェクトフローをロボットアクションにマッピングし、所望のオブジェクトの動きを実現する。フローを入力として使うことで、このポリシーは最小限のsim-to-realギャップで現実世界に直接展開できる。実世界の人間のビデオとシミュレーションされたロボットのプレイデータを活用することで、現実世界での物理的ロボットの遠隔操作という課題を回避し、多様なタスクのためのスケーラブルなシステムを実現する。我々は、剛体、関節物体、変形可能物体の操作を含む様々な実世界のタスクにおいて、Im2Flow2Actの能力を実証する。

We present Im2Flow2Act, a scalable learning framework that enables robots to acquire real-world manipulation skills without the need of real-world robot training data. The key idea behind Im2Flow2Act is to use object flow as the manipulation interface, bridging domain gaps between different embodiments (i.e., human and robot) and training environments (i.e., real-world and simulated). Im2Flow2Act comprises two components: a flow generation network and a flow-conditioned policy. The flow generation network, trained on human demonstration videos, generates object flow from the initial scene image, conditioned on the task description. The flow-conditioned policy, trained on simulated robot play data, maps the generated object flow to robot actions to realize the desired object movements. By using flow as input, this policy can be directly deployed in the real world with a minimal sim-to-real gap. By leveraging real-world human videos and simulated robot play data, we bypass the challenges of teleoperating physical robots in the real world, resulting in a scalable system for diverse tasks. We demonstrate Im2Flow2Act's capabilities in a variety of real-world tasks, including the manipulation of rigid, articulated, and deformable objects.

翻訳日:2024-11-08 15:56:37 公開日:2024-10-04

# PreAlign:多言語アライメントの早期確立による言語間移動の促進

PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment ( http://arxiv.org/abs/2407.16222v2 )

ライセンス: Link先を確認

Jiahuan Li, Shujian Huang, Aarron Ching, Xinyu Dai, Jiajun Chen,

(参考訳) 大規模な言語モデルは、主に英語中心の事前訓練にもかかわらず、合理的な多言語能力を示す。しかし、これらのモデルにおける自発的な多言語アライメントは弱く、不満足な言語間移動と知識共有をもたらすことが示されている。先行研究は、事前訓練の途中または後に多言語アライメント情報を明示的に注入することでこの問題に対処しようとしている。そのため、事前訓練の初期段階では、アライメントが弱く、言語間で情報や知識を共有できない。本稿では,言語モデル事前学習に先立って多言語アライメントを確立するフレームワークであるPreAlignを提案する。 PreAlignは、アライメントされた単語に対して類似した表現を生成するようにモデルを初期化することで多言語アライメントを注入し、事前訓練中にコードスイッチング戦略を用いてこのアライメントを保存する。英語からEnglish-Cloneへの合成設定における広範な実験により、PreAlignは、言語モデリング、ゼロショットの言語間移動、および言語間知識応用において、標準的な多言語共同訓練を著しく上回ることが示された。実世界のシナリオにおけるさらなる実験でも、様々なモデルサイズにわたるPreAlignの有効性が検証された。

Large language models demonstrate reasonable multilingual abilities, despite predominantly English-centric pretraining. However, the spontaneous multilingual alignment in these models is shown to be weak, leading to unsatisfactory cross-lingual transfer and knowledge sharing. Previous works attempt to address this issue by explicitly injecting multilingual alignment information during or after pretraining. Thus for the early stage in pretraining, the alignment is weak for sharing information or knowledge across languages. In this paper, we propose PreAlign, a framework that establishes multilingual alignment prior to language model pretraining. PreAlign injects multilingual alignment by initializing the model to generate similar representations of aligned words and preserves this alignment using a code-switching strategy during pretraining. Extensive experiments in a synthetic English to English-Clone setting demonstrate that PreAlign significantly outperforms standard multilingual joint training in language modeling, zero-shot cross-lingual transfer, and cross-lingual knowledge application. Further experiments in real-world scenarios further validate PreAlign's effectiveness across various model sizes.
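
The code-switching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `code_switch`, the toy dictionary, and the replacement `ratio` are all assumptions made for illustration, and the abstract does not specify how aligned words are chosen or how often they are swapped.

```python
import random

def code_switch(tokens, aligned, ratio, rng):
    """Return a copy of `tokens` in which each token that has an aligned
    counterpart in `aligned` is replaced with probability `ratio`."""
    out = []
    for tok in tokens:
        if tok in aligned and rng.random() < ratio:
            out.append(aligned[tok])   # swap in the aligned word
        else:
            out.append(tok)            # keep the original token
    return out

# Hypothetical toy alignment dictionary (an "English-Clone"-style vocabulary).
ALIGNED = {"cat": "cat*", "sat": "sat*", "mat": "mat*"}

sentence = "the cat sat on the mat".split()
mixed = code_switch(sentence, ALIGNED, ratio=0.5, rng=random.Random(0))
print(" ".join(mixed))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; a real pretraining pipeline would apply such a substitution on the fly to the training stream.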

翻訳日:2024-11-08 15:34:26 公開日:2024-10-04

# OpenHands: ジェネラリストエージェントとしてのAIソフトウェア開発者のためのオープンプラットフォーム

OpenHands: An Open Platform for AI Software Developers as Generalist Agents ( http://arxiv.org/abs/2407.16741v2 )

ライセンス: Link先を確認

Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig,

(参考訳) ソフトウェアは、人間が利用できる最も強力なツールの1つであり、熟練したプログラマが複雑で深い方法で世界と対話することを可能にする。同時に、大規模言語モデル(LLM)の改善により、周囲の環境と相互作用し、変化をもたらすAIエージェントも急速に発展している。本稿では,コードを書き、コマンドラインを操作し、Webを閲覧するという、人間の開発者と同じような方法で世界と対話する、強力で柔軟なAIエージェントを開発するためのプラットフォームであるOpenHands(f.k.a. OpenDevin)を紹介する。このプラットフォームによって,新しいエージェントの実装,コード実行のためのサンドボックス環境との安全なインタラクション,複数エージェント間の調整,評価ベンチマークの組み込みがどのように可能になるかを述べる。現在組み込まれているベンチマークに基づいて、ソフトウェアエンジニアリング(SWE-BENCHなど)やWebブラウジング(WEBARENAなど)を含む15の難易度の高いタスクでエージェントの評価を行う。寛容なMITライセンスの下でリリースされているOpenHandsは、学術と産業にまたがるコミュニティプロジェクトであり、188人を超えるコントリビュータから2.1K以上のコントリビューションがある。

Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, coordination between multiple agents, and incorporation of evaluation benchmarks. Based on our currently incorporated benchmarks, we perform an evaluation of agents over 15 challenging tasks, including software engineering (e.g., SWE-BENCH) and web browsing (e.g., WEBARENA), among others. Released under the permissive MIT license, OpenHands is a community project spanning academia and industry with more than 2.1K contributions from over 188 contributors.

翻訳日:2024-11-08 15:23:20 公開日:2024-10-04

# 知るか知らないか : あいまいさ下における大規模言語モデルの自己整合性の分析

To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity ( http://arxiv.org/abs/2407.17125v3 )

ライセンス: Link先を確認

Anastasiia Sedova, Robert Litschko, Diego Frassinelli, Benjamin Roth, Barbara Plank,

(参考訳) 大規模言語モデル(LLM)の顕著な性能に寄与する主要な側面の1つは、事前学習中に蓄積された膨大な事実知識である。しかし、多くのLLMは自己矛盾(self-inconsistency)に悩まされており、その信頼性に疑問が生じている。本稿では, エンティティ型の曖昧性に着目し, 曖昧なエンティティをプロンプトとして与えた場合の事実知識の適用において, 最先端のLLMの習熟度と一貫性を解析する。そのために,知識を持っていることと知識を適用することを切り分ける評価プロトコルを提案し, 49個の曖昧なエンティティで最先端のLLMをテストした。実験の結果, LLMは正しいエンティティの読み(解釈)を選ぶことに苦労し, 平均精度は85%にとどまり, 指定が不十分なプロンプトでは75%まで低下した。また結果は、LLMの挙動における系統的な不一致を明らかにし、モデルが知識を持っていても、それを一貫して適用することに苦労し、好みの読みへのバイアスを示し、自己矛盾を示すことを示した。これは、より信頼できるLLMのために、今後エンティティの曖昧性に対処する必要があることを浮き彫りにしている。

One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.

翻訳日:2024-11-08 15:23:20 公開日:2024-10-04

# 計算的帰着のツール支援学習

Tool-Assisted Learning of Computational Reductions ( http://arxiv.org/abs/2407.18215v2 )

ライセンス: Link先を確認

Tristan Kneisel, Elias Radtke, Marko Schmellenkamp, Fabian Vehlken, Thomas Zeume,

(参考訳) 計算的帰着(reduction)は計算機科学において重要かつ強力な概念である。しかし、多くの学生にとって理解が難しい。本稿では,帰着の学習を教育支援システムによってどのように支援できるか,というコンセプトを概説する。そのようなシステムにおけるコンセプトの実装と,帰着のための具体的なWebベースの対話型教材を示し,理論計算機科学の大規模な入門講座でその教材を用いた経験を報告する。

Computational reductions are an important and powerful concept in computer science. However, they are difficult for many students to grasp. In this paper, we outline a concept for how the learning of reductions can be supported by educational support systems. We present an implementation of the concept within such a system, concrete web-based and interactive learning material for reductions, and report on our experiences using the material in a large introductory course on theoretical computer science.

翻訳日:2024-11-08 15:01:09 公開日:2024-10-04

# ベイズ並列分岐グラフニューラルネットワークにおけるロバスト学習:狭幅限界

Robust Learning in Bayesian Parallel Branching Graph Neural Networks: The Narrow Width Limit ( http://arxiv.org/abs/2407.18807v2 )

ライセンス: Link先を確認

Zechen Zhang, Haim Sompolinsky,

(参考訳) ランダムニューラルネットワークの無限幅極限は、タスク非依存のカーネルを特徴とするNeural Networks as Gaussian Process (NNGP) (Lee et al. [2018]) をもたらすことが知られている。ネットワーク幅が大きいほど汎化が改善することが広く受け入れられている(Park et al. [2019])。しかし、本研究は、残差ネットワークに類似したアーキテクチャであるベイズ並列分岐グラフニューラルネットワーク(BPB-GNN)の狭幅極限を調査することによって、この概念に異議を唱える。我々は,BPB-GNNの幅がトレーニング例の数に比べて著しく小さい場合,カーネル再正規化における分岐の対称性の破れにより,各分岐がより頑健な学習を示すことを示した。驚いたことに、バイアスが支配的なシナリオでは、狭幅極限におけるBPB-GNNの性能は、広幅極限で達成されるものよりも概して優れているか、同等である。さらに、狭幅極限における各分岐の読み出しノルムは、アーキテクチャのハイパーパラメータにはほとんど依存せず、概してデータの性質を反映している。本結果は,一般の並列分岐ネットワークについて,新たに定義された狭幅領域を特徴付けるものである。

The infinite width limit of random neural networks is known to result in Neural Networks as Gaussian Process (NNGP) (Lee et al. [2018]), characterized by task-independent kernels. It is widely accepted that larger network widths contribute to improved generalization (Park et al. [2019]). However, this work challenges this notion by investigating the narrow width limit of the Bayesian Parallel Branching Graph Neural Network (BPB-GNN), an architecture that resembles residual networks. We demonstrate that when the width of a BPB-GNN is significantly smaller compared to the number of training examples, each branch exhibits more robust learning due to a symmetry breaking of branches in kernel renormalization. Surprisingly, the performance of a BPB-GNN in the narrow width limit is generally superior or comparable to that achieved in the wide width limit in bias-limited scenarios. Furthermore, the readout norms of each branch in the narrow width limit are mostly independent of the architectural hyperparameters but generally reflective of the nature of the data. Our results characterize a newly defined narrow-width regime for parallel branching networks in general.

翻訳日:2024-11-08 14:50:05 公開日:2024-10-04

# 高次解法における異方性p適応と誤差推定のための強化学習

Reinforcement learning for anisotropic p-adaptation and error estimation in high-order solvers ( http://arxiv.org/abs/2407.19000v2 )

ライセンス: Link先を確認

David Huergo, Martín de Frutos, Eduardo Jané, Oscar A. Marino, Gonzalo Rubio, Esteban Ferrer,

(参考訳) 強化学習(Reinforcement Learning, RL)を用いて、高次h/pソルバにおける異方性p適応を自動化・最適化する新しい手法を提案する。動的なRL適応は、時間発展する解を用いて高次多項式を調整する。我々は,主ソルバから切り離されたオフライン訓練手法を開発し、シミュレーション実行時の追加コストが最小限であることを示す。さらに、局所的な離散化誤差の定量化を可能にする、安価なRLベースの誤差推定手法を導出する。提案手法は、計算メッシュにも、解くべき偏微分方程式にも依存しない。 RLのメッシュ適応への応用にはいくつかの利点がある。自動化された適応的なメッシュ細分化が可能になり、手作業による介入の必要性が減る。必要な箇所に高次多項式を動的に割り当て、安定した領域での細分化を最小限に抑えることで、計算資源を最適化する。これにより、解の精度を維持しながら計算コストを削減できる。さらに、RLは従来にないメッシュ適応の探索を可能にし、シミュレーションの精度と頑健性を高める可能性がある。本研究は、より頑健で再現性があり、複雑な3次元問題に適用可能な汎化性の高いアプローチを提供することで、我々の当初の研究を拡張する。提案手法の柔軟性を示すために、円柱、テイラー・グリーン渦、10MWの風力タービンという層流および乱流のケースで検証を行う。

We present a novel approach to automate and optimize anisotropic p-adaptation in high-order h/p solvers using Reinforcement Learning (RL). The dynamic RL adaptation uses the evolving solution to adjust the high-order polynomials. We develop an offline training approach, decoupled from the main solver, which shows minimal overcost when performing simulations. In addition, we derive an inexpensive RL-based error estimation approach that enables the quantification of local discretization errors. The proposed methodology is agnostic to both the computational mesh and the partial differential equation to be solved. The application of RL to mesh adaptation offers several benefits. It enables automated and adaptive mesh refinement, reducing the need for manual intervention. It optimizes computational resources by dynamically allocating high-order polynomials where necessary and minimizing refinement in stable regions. This leads to computational cost savings while maintaining the accuracy of the solution. Furthermore, RL allows for the exploration of unconventional mesh adaptations, potentially enhancing the accuracy and robustness of simulations. This work extends our original research, offering a more robust, reproducible, and generalizable approach applicable to complex three-dimensional problems. We provide validation for laminar and turbulent cases: circular cylinders, Taylor Green Vortex and a 10MW wind turbine to illustrate the flexibility of the proposed approach.

翻訳日:2024-11-08 14:38:53 公開日:2024-10-04

# 低雑音に対するスパースLPNとLSPNのアルゴリズム

Algorithms for Sparse LPN and LSPN Against Low-noise ( http://arxiv.org/abs/2407.19215v2 )

ライセンス: Link先を確認

Xue Chen, Wenxuan Shu, Zhaienhe Zhou,

(参考訳) 本研究では,古典的なノイズ付きパリティ学習(LPN)問題の2つのスパース変種に対する学習アルゴリズムについて検討する。我々は、幅広いパラメータに対して最先端を改善する新しいアルゴリズムフレームワークを提供する。このフレームワークは、従来のアプローチとは異なる単純な構造を持つ。最初のステップはスパース性の知識を用いたドメインの縮小であり、次にガウスの消去法によって部分問題を解く。 $n$ を次元、$k$ をスパース性パラメータ、$\eta$ をノイズ率とし、各ラベルが確率 $\eta$ で反転するものとする。スパースLPN問題(様々なパラメータを持つ)は、暗号に広く応用されている。$\mathbf{F}_2^n$ のランダムベクトルをサンプリングする標準のLPN問題とは異なり、スパースLPNではランダムな $k$ スパースベクトルをサンプリングする。誕生日のパラドックスにより、$m=n^{k/2}$ 個のサンプルが与えられれば自明な識別アルゴリズムが得られる。 $\delta \in (0,1)$ として $m=n^{1+(\frac{k}{2}-1)(1-\delta)}$ の場合、最もよく知られているアルゴリズムの実行時間は $\min\{e^{\eta n}, e^{\tilde{O}(n^{\delta})}\}$ である。我々は、時間計算量 $e^{\tilde{O}(\eta \cdot n^{\frac{1+\delta}{2}})}$、サンプル計算量 $m=\max\{1,\frac{\eta \cdot n^{\frac{1+\delta}{2}}}{k^2}\} \cdot \tilde{O}(n)^{1+(\frac{k-1}{2})(1-\delta)}$ のスパースLPNの学習アルゴリズムを提案する。これは、広い範囲の $\eta$ において、任意の定数または超定数の $k$ に対する従来の結果を改善する。ノイズ付きスパースパリティ学習(LSPN)問題は、隠れたパリティが $k$ スパースであると仮定する。 LSPNは学習理論と暗号の両方で広く研究されている。しかし、最先端の手法は幅広いパラメータに対して ${n \choose k/2} = \Omega(n/k)^{k/2}$ 時間を必要とし、単純な列挙アルゴリズムは ${n \choose k}=O(n/k)^k$ 時間を要する。我々のLSPNアルゴリズムは、任意の $\eta$ と $k$ に対して、時間 $O(\eta \cdot n/k)^k$ で実行される。これにより、幅広いパラメータでスパースパリティを学習するための最先端が改善される。

We study learning algorithms for two sparse variants of the classical learning parity with noise (LPN) problem. We provide a new algorithmic framework that improves the state of the art for a wide range of parameters. This framework has a simple structure different from previous approaches: the first step is a domain reduction via the knowledge of sparsity; then it solves sub-problems by Gaussian elimination. Let $n$ be the dimension, $k$ be the sparsity parameter, and $\eta$ be the noise rate such that each label gets flipped with probability $\eta$. The sparse LPN problem (with various parameters) has wide applications in cryptography. Different from the standard LPN problem that samples random vectors in $\mathbf{F}_2^n$, it samples random $k$-sparse vectors. The birthday paradox implies a trivial distinguishing algorithm given $m=n^{k/2}$ samples. For $m=n^{1+(\frac{k}{2}-1)(1-\delta)}$ with $\delta \in (0,1)$, the best known algorithm has running time $\min\{e^{\eta n}, e^{\tilde{O}(n^{\delta})}\}$. We present a learning algorithm for sparse LPN with time complexity $e^{\tilde{O}(\eta \cdot n^{\frac{1+\delta}{2}})}$ and sample complexity $m=\max\{1,\frac{\eta \cdot n^{\frac{1+\delta}{2}}}{k^2}\} \cdot \tilde{O}(n)^{1+(\frac{k-1}{2})(1-\delta)}$. It improves previous results for any constant or super-constant $k$ with a wide range of $\eta$. The learning sparse parity with noise (LSPN) problem assumes the hidden parity is $k$-sparse. LSPN has been extensively studied in both learning theory and cryptography. However, the state of the art needs ${n \choose k/2} = \Omega(n/k)^{k/2}$ time for a wide range of parameters while the simple enumeration algorithm takes ${n \choose k}=O(n/k)^k$ time. Our LSPN algorithm runs in time $O(\eta \cdot n/k)^k$ for any $\eta$ and $k$. This improves the state-of-the-art for learning sparse parity in a wide range of parameters.
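
To make the problem setup concrete, the following sketch generates sparse LPN samples (random $k$-sparse vectors over $\mathbf{F}_2^n$ with labels flipped with probability $\eta$) and runs plain Gaussian elimination over $\mathbf{F}_2$. It is only an illustration of the sample model and of the elimination subroutine named in the abstract, not the paper's algorithm: the domain-reduction step is omitted, and the function names `sparse_lpn_samples` and `solve_gf2` are assumptions.

```python
import random

def sparse_lpn_samples(secret, k, eta, m, rng):
    """Draw m samples (a, b): a is a random k-sparse 0/1 vector and
    b = <a, secret> mod 2, flipped with probability eta."""
    n = len(secret)
    samples = []
    for _ in range(m):
        support = rng.sample(range(n), k)
        a = [0] * n
        for i in support:
            a[i] = 1
        label = sum(secret[i] for i in support) % 2
        if rng.random() < eta:
            label ^= 1  # noisy label
        samples.append((a, label))
    return samples

def solve_gf2(samples, n):
    """Gaussian elimination over F_2. Returns the unique solution, or None
    if the system is rank deficient or inconsistent (e.g. due to noise)."""
    rows = [a[:] + [b] for a, b in samples]
    pivot_row = 0
    for col in range(n):
        pr = next((r for r in range(pivot_row, len(rows)) if rows[r][col]), None)
        if pr is None:
            return None  # no pivot in this column
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col]:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    if any(row[n] for row in rows[n:]):
        return None  # leftover equation 0 = 1
    return [rows[i][n] for i in range(n)]

# Noiseless toy run; with eta > 0 the elimination usually reports None.
secret = [1, 0, 1, 1, 0, 0, 1, 0]
data = sparse_lpn_samples(secret, k=3, eta=0.0, m=60, rng=random.Random(1))
print(solve_gf2(data, len(secret)))
```

Note that for even $k$ every sample has even Hamming weight, so the sample matrix can never have full rank $n$ and this naive solver always returns `None`; the toy run therefore uses an odd $k$.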

翻訳日:2024-11-08 14:38:53 公開日:2024-10-04

# 低雑音に対するスパースLPNとLSPNのアルゴリズム

Algorithms for Sparse LPN and LSPN Against Low-noise ( http://arxiv.org/abs/2407.19215v3 )

ライセンス: Link先を確認

Xue Chen, Wenxuan Shu, Zhaienhe Zhou,

翻訳日:2024-11-08 14:38:53 公開日:2024-10-04

# 有向グラフのための良い位置エンコーディングとは何か?

What Are Good Positional Encodings for Directed Graphs? ( http://arxiv.org/abs/2407.20912v2 )

ライセンス: Link先を確認

Yinan Huang, Haoyu Wang, Pan Li,

(参考訳) 位置エンコーディング(PE)は、ノード間の相対的な空間関係を効果的に捉えるため、強力で表現力のあるグラフニューラルネットワークとグラフトランスフォーマーを構築するために不可欠である。無向グラフのPEについては広範な研究が行われてきたが、有向グラフのPEは比較的未探索のままである。本研究はこのギャップに対処しようとするものである。まず、ウォークカウント列を有向グラフに一般化したウォークプロファイル(Walk Profile)の概念を導入する。ウォークプロファイルは、プログラム解析や回路性能予測など、有向グラフに関連するアプリケーションに不可欠な多くの構造的特徴を包含している。ウォークプロファイルの表現における既存のPE手法の限界を特定し,複数のポテンシャル因子を組み込むことで,磁気ラプラシアン固有ベクトルに基づくPEを拡張した,新しいMulti-q Magnetic Laplacian PEを提案する。新しいPEは、ウォークプロファイルを証明可能な形で表現できる。さらに,従来の基底不変ニューラルネットワークを一般化し,複素領域における新しいPEの安定した利用を可能にする。数値実験により、提案するPEの表現力を検証し,ソーティングネットワークの充足可能性問題の解決や、一般的な回路ベンチマークにおいて有効であることを示す。私たちのコードはhttps://github.com/Graph-COM/Multi-q-Maglapで利用可能です。

Positional encodings (PEs) are essential for building powerful and expressive graph neural networks and graph transformers, as they effectively capture the relative spatial relationships between nodes. Although extensive research has been devoted to PEs in undirected graphs, PEs for directed graphs remain relatively unexplored. This work seeks to address this gap. We first introduce the notion of Walk Profile, a generalization of walk-counting sequences for directed graphs. A walk profile encompasses numerous structural features crucial for directed graph-relevant applications, such as program analysis and circuit performance prediction. We identify the limitations of existing PE methods in representing walk profiles and propose a novel Multi-q Magnetic Laplacian PE, which extends the Magnetic Laplacian eigenvector-based PE by incorporating multiple potential factors. The new PE can provably express walk profiles. Furthermore, we generalize prior basis-invariant neural networks to enable the stable use of the new PE in the complex domain. Our numerical experiments validate the expressiveness of the proposed PEs and demonstrate their effectiveness in solving sorting network satisfiability and performing well on general circuit benchmarks. Our code is available at https://github.com/Graph-COM/Multi-q-Maglap.
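
As a rough illustration of the object behind the proposed encoding, the sketch below builds the magnetic Laplacian of a small directed graph for several potential values $q$. It assumes the commonly used unnormalised definition $L^{(q)} = D_s - A_s \odot \exp(i\,2\pi q\,(A - A^\top))$ with $A_s = (A + A^\top)/2$; the paper's exact normalisation may differ, and the function name `magnetic_laplacian` is hypothetical. Computing eigenvectors (the actual positional encodings) would need a linear algebra library and is left out.

```python
import cmath
import math

def magnetic_laplacian(adj, q):
    """Unnormalised magnetic Laplacian of a directed graph.
    adj[u][v] = 1 if there is an edge u -> v (no self-loops assumed)."""
    n = len(adj)
    lap = [[0j] * n for _ in range(n)]
    for u in range(n):
        degree = 0.0
        for v in range(n):
            a_sym = 0.5 * (adj[u][v] + adj[v][u])              # symmetrised weight
            theta = 2.0 * math.pi * q * (adj[u][v] - adj[v][u])  # direction phase
            lap[u][v] = -a_sym * cmath.exp(1j * theta)
            degree += a_sym
        lap[u][u] += degree
    return lap

# Directed 3-cycle 0 -> 1 -> 2 -> 0, encoded with several q values ("multi-q").
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
laplacians = {q: magnetic_laplacian(cycle, q) for q in (0.0, 0.125, 0.25)}
print(laplacians[0.25][0][1])
```

Each matrix is Hermitian, so its eigenvalues are real; at $q = 0$ the direction information vanishes and the matrix reduces to the Laplacian of the symmetrised graph.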

翻訳日:2024-11-08 14:05:01 公開日:2024-10-04

# $U(N)$上の量子信号処理と量子特異値変換

Quantum Signal Processing and Quantum Singular Value Transformation on $U(N)$ ( http://arxiv.org/abs/2408.01439v2 )

ライセンス: Link先を確認

Xi Lu, Yuan Liu, Hongwei Lin,

(参考訳) 量子信号処理と量子特異値変換は、ブロック符号化された行列の多項式変換を量子コンピュータ上で実装するための強力なツールであり、多くの著名な量子アルゴリズムにおいて漸近的に最適な計算量を達成している。我々は、元のフレームワークにおける$U(2)$上のものの一般化として、ブロック符号化された入力から複数の多項式を同時に実現する、$U(N)$上の量子信号処理と量子特異値変換のフレームワークを提案する。また、達成可能な多項式の包括的な解析を行い、所望の多項式変換を与える量子回路を構成する再帰的アルゴリズムを与える。2つの応用例として、二変数多項式関数を実現するフレームワークを提案し、漸近的に最適なクエリ計算量を持つ量子振幅推定アルゴリズムについて検討する。

Quantum signal processing and quantum singular value transformation are powerful tools to implement polynomial transformations of block-encoded matrices on quantum computers, and have achieved asymptotically optimal complexity in many prominent quantum algorithms. We propose a framework of quantum signal processing and quantum singular value transformation on $U(N)$, which realizes multiple polynomials simultaneously from a block-encoded input, as a generalization of those on $U(2)$ in the original frameworks. We also perform a comprehensive analysis of achievable polynomials and give a recursive algorithm to construct the quantum circuit that gives the desired polynomial transformation. As two example applications, we propose a framework to realize bi-variate polynomial functions, and study the quantum amplitude estimation algorithm with asymptotically optimal query complexity.

翻訳日:2024-11-08 13:18:17 公開日:2024-10-04

# ランダウアーの原理とブラックホール面積の量子化

Landauer's principle and black hole area quantization ( http://arxiv.org/abs/2408.02077v3 )

ライセンス: Link先を確認

Bijan Bagchi, Aritra Ghosh, Sauvik Sen,

(参考訳) 本稿では、シュワルツシルトブラックホールの面積量子化の文脈において、情報理論のランダウアーの原理を評価する。ホーキング蒸発を面積(または質量)スペクトルの離散状態間の遷移として解釈できる量子力学的な観点のもとで、ブラックホールのミクロ状態の数が $2^n$ のように振る舞うとき、ランダウアーの原理が飽和した形で一貫して成り立つことを示す。ここで $n$ は、半古典領域における面積/質量スペクトルの準位を表す大きな正の整数である。これは、面積間隔 $\Delta A = \alpha l_P^2$(自然単位)において $\alpha = 4 \ln 2$ とすることと等価であり、このときボルツマン単位での連続する準位間のエントロピー間隔は、ちょうど1ビットの情報と一致する。また、文献でよく用いられる他の $\alpha$ の値の場合についてもコメントする。

This article assesses Landauer's principle from information theory in the context of area quantization of the Schwarzschild black hole. Within a quantum-mechanical perspective where Hawking evaporation can be interpreted in terms of transitions between the discrete states of the area (or mass) spectrum, we justify that Landauer's principle holds consistently in the saturated form when the number of microstates of the black hole goes as $2^n$, where $n$ is a large positive integer labeling the levels of the area/mass spectrum in the semiclassical regime. This is equivalent to the area spacing $\Delta A = \alpha l_P^2$ (in natural units), where $\alpha = 4 \ln 2$ for which the entropy spacing between consecutive levels in Boltzmann units coincides exactly with one bit of information. We also comment on the situation for other values of $\alpha$ prevalent in the literature.
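
The link between $\alpha = 4 \ln 2$ and "one bit per level" can be checked in a few lines. The sketch below assumes the Bekenstein-Hawking entropy $S = k_B A / (4 l_P^2)$ and an equally spaced area spectrum $A_n = \alpha\, n\, l_P^2$; both are standard assumptions that the abstract implies but does not write out.

```latex
% Assumptions: Bekenstein-Hawking entropy and an equally spaced area spectrum.
\begin{align}
  S &= \frac{k_B A}{4 l_P^2}, \qquad A_n = \alpha\, n\, l_P^2, \\
  \Delta S &= S_{n+1} - S_n = \frac{k_B\, \Delta A}{4 l_P^2} = \frac{\alpha}{4}\, k_B, \\
  \alpha = 4 \ln 2 \;&\Longrightarrow\; \Delta S = k_B \ln 2, \qquad
  \Omega_n = e^{S_n / k_B} = e^{n \ln 2} = 2^n .
\end{align}
% Landauer's bound for erasing one bit at temperature T:
\begin{equation}
  \Delta E \;\ge\; k_B T \ln 2 .
\end{equation}
```

With $\Delta S = k_B \ln 2$ per level, a transition between neighbouring levels at the Hawking temperature $T_H$ exchanges $T_H\, \Delta S = k_B T_H \ln 2$ (using the first law for large $n$), which is the sense in which the bound is met with equality in this reading.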

翻訳日:2024-11-08 12:55:51 公開日:2024-10-04

# Logistic Regression は小さな LLM を強力かつ説明可能な "tens-of-shot" 分類器にする

Logistic Regression makes small LLMs strong and explainable "tens-of-shot" classifiers ( http://arxiv.org/abs/2408.03414v2 )

ライセンス: Link先を確認

Marcus Buckmann, Edward Hill,

(参考訳) 単純な分類タスクでは,性能のトレードオフや追加のラベル付けコストなしに,大規模な商用モデルの代わりに小規模でローカルな生成言語モデルを使う利点をユーザが享受できることを示す。プライバシ、可用性、コスト、説明可能性などに関するこれらの利点は、商用アプリケーションにおいても、AIのより広範な民主化においても重要である。 17の文分類タスク (2-4クラス) の実験を通して、小さなLLMの埋め込みに対する罰則付きロジスティック回帰が、"tens-of-shot"(数十ショット)領域において大きなLLMの性能に匹敵する(そして通常は上回る)ことを示す。これは、大きなLLMの性能を検証するのに必要な数を超えるラベル付きインスタンスを必要としない。最後に,分類決定に対する安定かつ妥当な説明を抽出する。

For simple classification tasks, we show that users can benefit from the advantages of using small, local, generative language models instead of large commercial models without a trade-off in performance or introducing extra labelling costs. These advantages, including those around privacy, availability, cost, and explainability, are important both in commercial applications and in the broader democratisation of AI. Through experiments on 17 sentence classification tasks (2-4 classes), we show that penalised logistic regression on the embeddings from a small LLM equals (and usually betters) the performance of a large LLM in the "tens-of-shot" regime. This requires no more labelled instances than are needed to validate the performance of the large LLM. Finally, we extract stable and sensible explanations for classification decisions.
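
A minimal sketch of the core recipe, L2-penalised logistic regression fitted on a handful of embedding vectors, is shown below. Everything concrete in it is an assumption: the two-dimensional "embeddings" are made-up toy vectors rather than outputs of a language model, the task is binary although the paper covers 2-4 classes, and plain gradient descent stands in for whatever solver and penalty the authors used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_penalised_logreg(X, y, lam=0.01, lr=0.5, steps=500):
    """Binary logistic regression with an L2 penalty `lam` on the weights,
    fitted by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]   # gradient of the penalty term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Hypothetical toy "embeddings" for a handful of labelled sentences.
X = [(-1.0, -0.8), (-0.9, -1.1), (-1.2, -1.0), (1.0, 0.9), (0.8, 1.1), (1.1, 1.0)]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_penalised_logreg(X, y)
print([predict(w, b, x) for x in X])
```

In practice one would replace `X` with embeddings extracted from a small local model and tune `lam` by cross-validation on the tens of labelled examples available.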

翻訳日:2024-11-08 12:33:46 公開日:2024-10-04

# Mathfish:教育カリキュラムのグラウンド化による言語モデル数学推論の評価

Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula ( http://arxiv.org/abs/2408.04226v3 )

ライセンス: Link先を確認

Li Lucy, Tal August, Rose E. Wang, Luca Soldaini, Courtney Allison, Kyle Lo,

(参考訳) 数学カリキュラムが学年に適しており、教育基準に従って重要なスキルや概念と整合していることを保証するために、教育の専門家は、公開された数学の問題を何ヶ月もかけて慎重にレビューすることがある。このプロセスから着想を得て,本研究は,言語モデル(LM)が数学コンテンツによって身につくスキルや概念を識別できるかどうかを調べることで、LMの数学的能力を評価するための新しい視点を提示する。我々は2つのデータセットを提供する。 1つは、Achieve the Core(ATC)によるK-12の数学のスキルと概念、すなわち標準(standards)に関する385のきめ細かい記述からなり、もう1つは、これらの標準でラベル付けされた9.9Kの数学問題(MathFish)である。数学問題を評価するLMの能力を測るために, (1) 問題が与えられた標準と整合するかどうかを検証するタスクと, (2) 問題に整合するすべての標準をタグ付けするタスクの2つを開発する。経験豊富な教師と協力した結果、LMは問題に関連する標準のタグ付けと検証に苦労し、代わりに、正解に近いが微妙に異なるラベルを予測することがわかった。また, LMは, プロンプトに記述された標準と完全には整合しない問題を生成することが多く, カリキュラム教材の生成にLMを用いるユースケースでは慎重な精査が必要であることが示唆された。最後に、GSM8kの問題を数学の標準を用いて分類し、なぜ一部の問題がモデルにとって他の問題よりも解くのが難しいのかをよりよく理解できるようにする。

To ensure that math curriculum is grade-appropriate and aligns with critical skills or concepts in accordance with educational standards, pedagogical experts can spend months carefully reviewing published math problems. Drawing inspiration from this process, our work presents a novel angle for evaluating language models' (LMs) mathematical abilities, by investigating whether they can discern skills and concepts enabled by math content. We contribute two datasets: one consisting of 385 fine-grained descriptions of K-12 math skills and concepts, or standards, from Achieve the Core (ATC), and another of 9.9K math problems labeled with these standards (MathFish). We develop two tasks for evaluating LMs' abilities to assess math problems: (1) verifying whether a problem aligns with a given standard, and (2) tagging a problem with all aligned standards. Working with experienced teachers, we find that LMs struggle to tag and verify standards linked to problems, and instead predict labels that are close to ground truth, but differ in subtle ways. We also show that LMs often generate problems that do not fully align with standards described in prompts, suggesting the need for careful scrutiny on use cases involving LMs for generating curricular materials. Finally, we categorize problems in GSM8k using math standards, allowing us to better understand why some problems are more difficult to solve for models than others.

翻訳日:2024-11-08 12:22:45 公開日:2024-10-04

# モデル評価のためのクロスバリデーションに基づく品質指標のロバスト性調査

Robustness investigation of cross-validation based quality measures for model assessment ( http://arxiv.org/abs/2408.04391v2 )

ライセンス: Link先を確認

Thomas Most, Lars Gräning, Sebastian Wolff,

(参考訳) 本稿では,機械学習モデルの評価のための品質指標の精度とロバスト性について検討する。機械学習モデルの予測品質は、未知データに対して近似誤差を推定するクロスバリデーションアプローチに基づいて、モデルに依存しない評価を行う。提案手法は,モデル予測における説明された変動量の定量化である。これらの測定の信頼性は、いくつかの数値的な例を用いて評価され、推定された予測誤差の検証のための追加データセットが利用可能である。さらに、提案した品質指標の信頼性境界を推定し、クロスバリデーション手法により得られた予測残差から局所品質指標を導出する。

In this paper the accuracy and robustness of quality measures for the assessment of machine learning models are investigated. The prediction quality of a machine learning model is evaluated model-independent based on a cross-validation approach, where the approximation error is estimated for unknown data. The presented measures quantify the amount of explained variation in the model prediction. The reliability of these measures is assessed by means of several numerical examples, where an additional data set for the verification of the estimated prediction error is available. Furthermore, the confidence bounds of the presented quality measures are estimated and local quality measures are derived from the prediction residuals obtained by the cross-validation approach.
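
One generic way to quantify "explained variation" from cross-validation residuals is an R²-type score computed only from out-of-fold predictions. The sketch below shows that idea with a one-dimensional least-squares line as the stand-in model. The abstract does not give the paper's exact measures, so the formula, the fold assignment, and the function names here are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cv_explained_variation(xs, ys, folds):
    """1 - SS_res / SS_tot, where every residual comes from a model
    that did not see the corresponding point during fitting."""
    n = len(xs)
    preds = [0.0] * n
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = slope * xs[i] + intercept   # out-of-fold prediction
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]          # exactly linear toy data
print(cv_explained_variation(xs, ys, folds=5))
```

Unlike an in-sample R², this score can become negative when the model generalises worse than predicting the mean, which is what makes it usable as a model-independent quality check.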

翻訳日:2024-11-08 12:11:36 公開日:2024-10-04

# DataNarrative: 可視化とテキストによるデータ駆動ストーリテリングの自動化

DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts ( http://arxiv.org/abs/2408.05346v3 )

ライセンス: Link先を確認

Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty,

(参考訳) データ駆動型ストーリーテリングは、物語技法と可視化およびテキストを組み合わせることで洞察を伝える強力な方法である。これらのストーリーには、チャート内でハイライトされた棒や線などの視覚的補助と、洞察を説明するテキスト注釈が組み込まれている。しかし、そのようなストーリーを作るには、データの深い理解と綿密な物語の計画が必要であり、しばしば人間の介入を必要とするため、時間がかかり精神的な負担も大きい。 LLM(Large Language Models)は様々なNLPタスクに優れているが、一貫性のある包括的なデータストーリーを生成する能力は十分に調査されていない。本研究では,データストーリー生成のための新しいタスクと,さまざまなソースから集めた1,449件のストーリーを含むベンチマークを紹介する。一貫性のあるデータストーリーを作成する上での課題に対処するために,人間のストーリーテリングプロセスを再現するよう設計された2つのLLMエージェントを用いるマルチエージェントフレームワークを提案する。一方のエージェントはデータの理解と記述(Reflection)、アウトラインの生成、ナレーションを担当し、もう一方は各中間ステップでの検証を担当する。我々のエージェント型フレームワークは、モデルベースと人手による評価の両方において概して非エージェント型の手法を上回るが、結果はデータストーリー生成に特有の課題も明らかにしている。

Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate visual aids, such as highlighted bars and lines in charts, along with textual annotations explaining insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, often necessitating human intervention, which can be time-consuming and mentally taxing. While Large Language Models (LLMs) excel in various NLP tasks, their ability to generate coherent and comprehensive data stories remains underexplored. In this work, we introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources. To address the challenges of crafting coherent data stories, we propose a multiagent framework employing two LLM agents designed to replicate the human storytelling process: one for understanding and describing the data (Reflection), generating the outline, and narration, and another for verification at each intermediary step. While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.

翻訳日:2024-11-08 12:00:35 公開日:2024-10-04

# Chain of Condition:条件付き質問応答のための条件の構築・検証・解決

Chain of Condition: Construct, Verify and Solve Conditions for Conditional Question Answering ( http://arxiv.org/abs/2408.05442v2 )

ライセンス: Link先を確認

Jiuheng Lin, Yuxuan Lai, Yansong Feng,

(参考訳) 条件付き質問応答(CQA)は、可能性のある回答を見つけ、不足している条件を特定することを目的とした重要なタスクである。既存のアプローチは,(1)必要な条件とその論理的関係を正確に特定すること,(2)条件を検証して不足しているものを検出すること,という2つの課題のために,CQAに苦戦している。本稿では,新しいプロンプト手法であるChain of conditionを提案する。この手法は、まずすべての条件を特定して文書に従ってその論理的関係を明示的に構築し,次にこれらの条件が満たされているかどうかを検証し,最後に論理式を解いて不足している条件を示し,それに応じて回答を生成する。 2つのCQAベンチマークデータセットの実験では、Chain of conditionは既存のプロンプトベースラインを上回り、新しい最先端を確立した。さらに, わずかな例だけで, 本手法によりGPT-3.5-TurboやGPT-4が既存のすべての教師付きモデルを上回ることができる。

Conditional question answering (CQA) is an important task that aims to find probable answers and identify missing conditions. Existing approaches struggle with CQA due to two challenges: (1) precisely identifying necessary conditions and the logical relationship, and (2) verifying conditions to detect any that are missing. In this paper, we propose a novel prompting approach, Chain of condition, by first identifying all conditions and constructing their logical relationships explicitly according to the document, then verifying whether these conditions are satisfied, finally solving the logical expression to indicate any missing conditions and generating the answer accordingly. Experiments on two CQA benchmark datasets show our chain of condition outperforms existing prompting baselines, establishing a new state of the art. Furthermore, with only a few examples, our method can facilitate GPT-3.5-Turbo or GPT-4 to outperform all existing supervised models.
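
The last stage described in the abstract, solving a logical expression over verified conditions to find the missing ones, can be sketched with three-valued logic. This is an illustrative stand-in rather than the authors' method: in the paper the conditions and their relations come from LLM prompting, whereas here the expression, the condition names, and the functions `evaluate` and `missing_conditions` are all hypothetical.

```python
def evaluate(expr, facts):
    """Three-valued evaluation: True, False, or None (not yet known).
    `expr` is a condition name or a tuple ("and" | "or", sub-expressions...)."""
    if isinstance(expr, str):
        return facts.get(expr)            # None if the user never stated it
    op, *args = expr
    values = [evaluate(a, facts) for a in args]
    if op == "and":
        if any(v is False for v in values):
            return False
        return True if all(v is True for v in values) else None
    if op == "or":
        if any(v is True for v in values):
            return True
        return False if all(v is False for v in values) else None
    raise ValueError(f"unknown operator: {op}")

def missing_conditions(expr, facts):
    """Unknown conditions that still matter for the final outcome."""
    if evaluate(expr, facts) is not None:
        return []                         # already decided, nothing missing
    if isinstance(expr, str):
        return [expr]
    _, *args = expr
    found = []
    for a in args:
        for name in missing_conditions(a, facts):
            if name not in found:
                found.append(name)
    return found

# Hypothetical eligibility rule and user scenario.
rule = ("and", "is_resident", ("or", "age_over_65", "has_disability"))
facts = {"is_resident": True, "age_over_65": False}
print(evaluate(rule, facts), missing_conditions(rule, facts))
```

An answer would then be reported as conditional on the returned names; once the user supplies them, the same expression evaluates to a definite yes or no.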

翻訳日:2024-11-08 12:00:35 公開日:2024-10-04

# LLMを用いたタスク計画のための検索拡張型階層的インコンテキスト強化学習と後知恵モジュール型リフレクション

Retrieval-Augmented Hierarchical in-Context Reinforcement Learning and Hindsight Modular Reflections for Task Planning with LLMs ( http://arxiv.org/abs/2408.06520v2 )

ライセンス: Link先を確認

Chuanneng Sun, Songjun Huang, Dario Pompili,

(参考訳) 大規模言語モデル(LLM)は、様々な言語タスクにおいて顕著な能力を示しており、ロボット工学における意思決定の有望な候補となっている。階層強化学習(Hierarchical Reinforcement Learning, HRL)に着想を得て, LLMベースの高レベルポリシーを用いて複雑なタスクをその場でサブタスクに分解する新しいフレームワークであるRetrieval-Augmented in-context reinforcement Learning (RAHL)を提案する。ゴールによって定義されたサブタスクは、低レベルポリシーに割り当てられて実行される。マルチエピソード実行におけるエージェントの性能を向上させるため,HMR(Hindsight Modular Reflection)を提案する。これは、軌跡全体を振り返る代わりに、より短い部分軌跡をエージェントに振り返らせることで、リフレクションの効率を高めるものである。提案するRAHLの意思決定能力を,ALFWorld,Webshop,HotpotQAの3つのベンチマーク環境で評価した。その結果, RAHLは5エピソードの実行で, 強力なベースラインに対してそれぞれ9%, 42%, 10%の性能向上を達成できることが示された。さらに,Boston Dynamics SPOTロボットにもRAHLを実装した。実験の結果、ロボットはLLMポリシーの制御のもとで、環境をスキャンし、入り口を見つけ、新しい部屋へと移動できることがわかった。

Large Language Models (LLMs) have demonstrated remarkable abilities in various language tasks, making them promising candidates for decision-making in robotics. Inspired by Hierarchical Reinforcement Learning (HRL), we propose Retrieval-Augmented in-context reinforcement Learning (RAHL), a novel framework in which an LLM-based high-level policy decomposes a complex task into sub-tasks on the fly. The sub-tasks, defined by goals, are assigned to the low-level policy to complete. To improve the agent's performance in multi-episode execution, we propose Hindsight Modular Reflection (HMR), where, instead of reflecting on the full trajectory, we let the agent reflect on shorter sub-trajectories to improve reflection efficiency. We evaluated the decision-making ability of the proposed RAHL in three benchmark environments: ALFWorld, Webshop, and HotpotQA. The results show that RAHL can improve performance over strong baselines by 9%, 42%, and 10%, respectively, within 5 episodes of execution. Furthermore, we also implemented RAHL on the Boston Dynamics SPOT robot. The experiment shows that the robot, controlled by the LLM policy, can scan the environment, find entrances, and navigate to new rooms.

翻訳日:2024-11-08 11:26:46 公開日:2024-10-04

# リンク予測における知識グラフ埋め込みの予測多重性

Predictive Multiplicity of Knowledge Graph Embeddings in Link Prediction ( http://arxiv.org/abs/2408.08226v2 )

ライセンス: Link先を確認

Yuqicheng Zhu, Nico Potyka, Mojtaba Nayyeri, Bo Xiong, Yunjie He, Evgeny Kharlamov, Steffen Staab,

(参考訳) 知識グラフ埋め込み(KGE)モデルは、知識グラフ(KG)の欠落リンクを予測するためによく使用される。しかし、複数のKG埋め込みは、リンク予測においてほぼ同等の性能を示す一方で、未知のクエリに対して矛盾する予測を与えることがある。この現象は文献において予測多重性(predictive multiplicity)と呼ばれる。これはハイステークスな領域におけるKGEベースのアプリケーションに重大なリスクをもたらすが、KGEの研究では見落とされてきた。我々は、リンク予測における予測多重性を定義し、評価指標を導入し、一般的に用いられるベンチマークデータセット上で代表的なKGE手法の予測多重性を測定する。我々の実証研究は、リンク予測において顕著な予測多重性があることを明らかにし、テストクエリの8%から39%が矛盾する予測を示した。社会的選択理論の投票手法を活用することでこの問題に対処し、我々の実験では矛盾を66%から78%と大幅に軽減した。

Knowledge graph embedding (KGE) models are often used to predict missing links for knowledge graphs (KGs). However, multiple KG embeddings can perform almost equally well for link prediction yet give conflicting predictions for unseen queries. This phenomenon is termed \textit{predictive multiplicity} in the literature. It poses substantial risks for KGE-based applications in high-stake domains but has been overlooked in KGE research. We define predictive multiplicity in link prediction, introduce evaluation metrics and measure predictive multiplicity for representative KGE methods on commonly used benchmark datasets. Our empirical study reveals significant predictive multiplicity in link prediction, with $8\%$ to $39\%$ testing queries exhibiting conflicting predictions. We address this issue by leveraging voting methods from social choice theory, significantly mitigating conflicts by $66\%$ to $78\%$ in our experiments.
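
The two ingredients of the abstract, measuring how often equally good models disagree and resolving the disagreement by voting, can be sketched as follows. This is a simplified top-1 version under stated assumptions: the paper's metrics and voting rules are defined over ranked predictions and are not spelled out in the abstract, and the names `conflict_rate` and `majority_vote` as well as the toy predictions are hypothetical.

```python
from collections import Counter

def conflict_rate(predictions):
    """Fraction of queries on which the models do not all agree.
    `predictions[m][q]` is the top-1 answer of model m for query q."""
    n_queries = len(predictions[0])
    conflicts = sum(
        1 for q in range(n_queries)
        if len({model[q] for model in predictions}) > 1
    )
    return conflicts / n_queries

def majority_vote(predictions):
    """Aggregate the models' answers per query by plurality."""
    n_queries = len(predictions[0])
    return [
        Counter(model[q] for model in predictions).most_common(1)[0][0]
        for q in range(n_queries)
    ]

# Three hypothetical, similarly accurate models answering four link queries.
preds = [
    ["paris", "berlin", "rome", "madrid"],
    ["paris", "berlin", "milan", "madrid"],
    ["paris", "bonn", "rome", "madrid"],
]
print(conflict_rate(preds), majority_vote(preds))
```

Plurality voting is only one option from social choice theory; rank-based rules such as Borda count would use the full ranking each model produces instead of its top answer.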

翻訳日:2024-11-08 07:29:14 公開日:2024-10-04

# 代理的(vicarious)視点から見た評価者の結束性と品質

Rater Cohesion and Quality from a Vicarious Perspective ( http://arxiv.org/abs/2408.08411v2 )

ライセンス: Link先を確認

Deepak Pandita, Tharindu Cyril Weerasooriya, Sujan Dutta, Sarah K. Luger, Tharindu Ranasinghe, Ashiqur R. KhudaBukhsh, Marcos Zampieri, Christopher M. Homan,

(参考訳) 人間のフィードバックは、AIの安全性、コンテンツモデレーション、感情分析など、不一致が頻発する領域において、人間中心のAIシステムを構築するために不可欠である。多くの不一致は、特に政治色の強い状況において、評価者が相反する価値観や信念を持っているために生じる。代理的(vicarious)アノテーションは、他の人がデータにどうアノテーションを付けると思うかを評価者に尋ねることで、不一致を分解する手法である。本稿では、評価者間の不一致を緩和するために、代理的アノテーションと分析手法を併用することを検討する。我々は評価者結束性の指標を用いて、政治的所属や人口統計学的背景が、攻撃的表現に対する評価者の認識に与えうる影響を調べる。さらに、評価者の人口統計を考慮するCrowdTruthの評価者品質指標を用いて、評価者とそのアノテーションをスコアリングする。我々は、評価者品質指標が、個人レベルと代理レベルの双方で、グループ内およびグループ間の評価者結束性にどのように影響するかを調べる。

Human feedback is essential for building human-centered AI systems across domains where disagreement is prevalent, such as AI safety, content moderation, or sentiment analysis. Many disagreements, particularly in politically charged settings, arise because raters have opposing values or beliefs. Vicarious annotation is a method for breaking down disagreement by asking raters how they think others would annotate the data. In this paper, we explore the use of vicarious annotation with analytical methods for moderating rater disagreement. We employ rater cohesion metrics to study the potential influence of political affiliations and demographic backgrounds on raters' perceptions of offense. Additionally, we utilize CrowdTruth's rater quality metrics, which consider the demographics of the raters, to score the raters and their annotations. We study how the rater quality metrics influence the in-group and cross-group rater cohesion across the personal and vicarious levels.

翻訳日:2024-11-08 07:29:14 公開日:2024-10-04

# Soda-Eval:LLM時代のオープンドメイン対話評価

Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs ( http://arxiv.org/abs/2408.10902v2 )

ライセンス: Link先を確認

John Mendonça, Isabel Trancoso, Alon Lavie,

(参考訳) オープンドメイン対話の評価では人間による評価が依然としてゴールドスタンダードであるが、大規模言語モデル(LLM)を用いた自動評価の人気の高まりは対話にも及んでいる。しかし、ほとんどのフレームワークは、流暢さや関連性といった側面で旧来のチャットボットを評価するベンチマークを利用しており、これらは現在のモデルに関わる課題を反映していない。実際、GPT-3.5で生成された対話データセットであるSodaの定性的分析によれば、現在のチャットボットは一貫性や常識的知識に関するいくつかの問題を繰り返し示しうるものの、概して非常に流暢で関連性の高い応答を生成する。こうした限界を踏まえ、本稿ではSodaに基づくアノテーション付きデータセットSoda-Evalを紹介する。これは1万件の対話にわたる12万件以上のターンレベル評価を含み、アノテーションはGPT-4により生成された。Soda-Evalをベンチマークとして、複数のオープンアクセスの命令チューニング済みLLMの性能を調べた結果、対話評価は依然として困難であることがわかった。これらのモデルをファインチューニングすると、相関と説明の両面において、few-shot推論よりも性能が向上する。

Although human evaluation remains the gold standard for open-domain dialogue evaluation, the growing popularity of automated evaluation using Large Language Models (LLMs) has also extended to dialogue. However, most frameworks leverage benchmarks that assess older chatbots on aspects such as fluency and relevance, which are not reflective of the challenges associated with contemporary models. In fact, a qualitative analysis on Soda, a GPT-3.5 generated dialogue dataset, suggests that current chatbots may exhibit several recurring issues related to coherence and commonsense knowledge, but generally produce highly fluent and relevant responses. Noting the aforementioned limitations, this paper introduces Soda-Eval, an annotated dataset based on Soda that covers over 120K turn-level assessments across 10K dialogues, where the annotations were generated by GPT-4. Using Soda-Eval as a benchmark, we then study the performance of several open-access instruction-tuned LLMs, finding that dialogue evaluation remains challenging. Fine-tuning these models improves performance over few-shot inferences, both in terms of correlation and explanation.

翻訳日:2024-11-08 06:22:37 公開日:2024-10-04

# Soda-Eval:LLM時代のオープンドメイン対話評価

Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs ( http://arxiv.org/abs/2408.10902v3 )

ライセンス: Link先を確認

John Mendonça, Isabel Trancoso, Alon Lavie,

翻訳日:2024-11-08 06:22:37 公開日:2024-10-04

# LLMを用いたRAGとFew-Shot In-Context Learningを用いたエビデンス支援Fact Checking

Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs ( http://arxiv.org/abs/2408.12060v2 )

ライセンス: Link先を確認

Ronit Singhal, Pransh Patwa, Parth Patwa, Aman Chadha, Amitava Das,

(参考訳) ソーシャルメディア上で誤情報が広く拡散していることを踏まえると、オンライン上の主張に対するファクトチェック機構の実装は不可欠である。すべての主張を手作業で検証することは非常に困難であり、自動ファクトチェックシステムの必要性が浮き彫りになる。本稿では、この問題に対処するために設計した我々のシステムについて述べる。ファクトチェックシステムの性能評価にはAveritecデータセット(Schlichtkrull et al., 2023)を用いる。本システムは、真偽予測に加えて、データセットから抽出した裏付けとなる証拠を提示する。我々はRetrieve and Generate(RAG)パイプラインを開発し、知識ベースから関連する証拠文を抽出したうえで、主張とともに大規模言語モデル(LLM)に入力して分類を行う。また、複数のLLMのfew-shot In-Context Learning(ICL)能力についても評価する。本システムは「Averitec」スコア0.33を達成しており、これはベースラインに対して22%の絶対的改善である。私たちのコードは https://github.com/ronit-singhal/evidence-backed-fact-checking-using-rag-and-few-shot-in-context-learning-with-llms で公開されている。

Given the widespread dissemination of misinformation on social media, implementing fact-checking mechanisms for online claims is essential. Manually verifying every claim is very challenging, underscoring the need for an automated fact-checking system. This paper presents our system designed to address this issue. We utilize the Averitec dataset (Schlichtkrull et al., 2023) to assess the performance of our fact-checking system. In addition to veracity prediction, our system provides supporting evidence, which is extracted from the dataset. We develop a Retrieve and Generate (RAG) pipeline to extract relevant evidence sentences from a knowledge base, which are then inputted along with the claim into a large language model (LLM) for classification. We also evaluate the few-shot In-Context Learning (ICL) capabilities of multiple LLMs. Our system achieves an 'Averitec' score of 0.33, which is a 22% absolute improvement over the baseline. Our Code is publicly available on https://github.com/ronit-singhal/evidence-backed-fact-checking-using-rag-and-few-shot-in-context-learning-with-llms.

翻訳日:2024-11-08 05:49:00 公開日:2024-10-04

# 言語モデルのアライメントのための選好誘導型反省的サンプリング

Preference-Guided Reflective Sampling for Aligning Language Models ( http://arxiv.org/abs/2408.12163v2 )

ライセンス: Link先を確認

Hai Ye, Hwee Tou Ng,

(参考訳) 反復的なデータ生成とモデルの再訓練により、大規模言語モデル(LLM)を人間の好みに効果的に整合させることができる。データサンプリングのプロセスは、方策改善の成否に大きく影響するため極めて重要である。繰り返しランダムサンプリングは、モデルに独立に複数回クエリして出力を生成する、広く使われている手法である。本研究では、より効果的なサンプリング手法であるPreference-Guided Reflective Sampling(PRS)を提案する。ランダムサンプリングとは異なり、PRSは木ベースの生成フレームワークを用いて、より効率的なサンプリングを可能にする。また、適応的な自己改良(self-refinement)技術を活用して、サンプリング空間をよりよく探索する。自然言語でユーザの好みを指定することで、PRSはその好みに応じて応答生成をさらに最適化できる。その結果、PRSはモデルを多様なユーザの好みに整合させることができる。実験の結果、PRSは報酬が有意に高い、より高品質な応答を生成することが示された。AlpacaEvalとArena-Hardでは、best-of-$N$サンプリングにおいてPRSが繰り返しランダムサンプリングを大幅に上回る。さらに、PRSは反復的なオフラインRL訓練に適用した場合にも高い性能を示す。

Iterative data generation and model re-training can effectively align large language models (LLMs) to human preferences. The process of data sampling is crucial, as it significantly influences the success of policy improvement. Repeated random sampling is a widely used method that independently queries the model multiple times to generate outputs. In this work, we propose a more effective sampling method, named Preference-Guided Reflective Sampling (PRS). Unlike random sampling, PRS employs a tree-based generation framework to enable more efficient sampling. It leverages adaptive self-refinement techniques to better explore the sampling space. By specifying user preferences in natural language, PRS can further optimize response generation according to these preferences. As a result, PRS can align models to diverse user preferences. Our experiments demonstrate that PRS generates higher-quality responses with significantly higher rewards. On AlpacaEval and Arena-Hard, PRS substantially outperforms repeated random sampling in best-of-$N$ sampling. Moreover, PRS shows strong performance when applied in iterative offline RL training.

翻訳日:2024-11-08 05:49:00 公開日:2024-10-04
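
As a point of reference, the baseline that the abstract above compares against (repeated random sampling with best-of-N selection) can be sketched as follows. This is only the baseline, not PRS itself; `toy_generate` and `toy_reward` are hypothetical stand-ins for a language model and a reward model.

```python
import random


def best_of_n(prompt, generate, reward, n, seed=0):
    """Draw n independent candidates and keep the one with the highest reward."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    best = max(candidates, key=lambda c: reward(prompt, c))
    return best, candidates


def toy_generate(prompt, rng):
    # Hypothetical stand-in for sampling one response from a language model.
    endings = ["ok", "sure", "here is a detailed answer", "maybe"]
    return prompt + " -> " + rng.choice(endings)


def toy_reward(prompt, response):
    # Hypothetical stand-in for a reward model: longer responses score higher.
    return len(response)
```

Each candidate is drawn independently of the others, which is the property the abstract contrasts with a tree-based, refinement-driven sampler.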

# 量子鍵配送プロトコルに対する深層学習ベースの連続攻撃

Deep-learning-based continuous attacks on quantum key distribution protocols ( http://arxiv.org/abs/2408.12571v2 )

ライセンス: Link先を確認

Théo Lejeune, François Damanet,

(参考訳) 量子鍵配送(QKD)プロトコルの最も重要な特徴は、第三者による攻撃に対する安全性と、利用可能な潜在的対策である。新しいタイプの攻撃は文献で定期的に開発されているが、弱い連続測定を利用するものは稀である。ここでは、連続測定を、深層リカレントニューラルネットワークの強力なパターン認識能力とともに利用する、$\textit{Deep-learning-based continuous attack}$ (DLCA) と呼ぶ新たな攻撃方式を設計する。BB84プロトコルに適用した場合、我々の攻撃は気づかれにくい一方で、盗聴者は量子通信チャネルで送信された量子ビットの状態に関する有意な情報を抽出できることを示す。最後に、盗聴者が量子フィードバックを利用して痕跡をさらに隠す方法について調べる。我々の攻撃方式は、まだトイモデルの初期段階にあるものの、さまざまなQKDプロトコルに適用でき、多様な形で一般化しうることからも、調査に値する潜在的な脅威である。

The most important characteristic of a Quantum Key Distribution (QKD) protocol is its security against third-party attacks, and the potential countermeasures available. While new types of attacks are regularly developed in the literature, they rarely involve the use of weak continuous measurement. Here, we design a new attack scheme called $\textit{Deep-learning-based continuous attack}$ (DLCA) that exploits continuous measurement together with the powerful pattern recognition capabilities of deep recurrent neural networks. We show that, when applied to the BB84 protocol, our attack can be difficult to notice while still allowing the spy to extract significant information about the states of the qubits sent in the quantum communication channel. Finally, we study how the spy can exploit quantum feedback to further cover their tracks. Our attack scheme, while still at the early stages of a toy model, constitutes a potential threat that is worth investigating, not least because it could be applied to different QKD protocols and generalized in many different ways.

翻訳日:2024-11-08 05:37:29 公開日:2024-10-04

# MobileQuant:オンデバイス言語モデルのためのモバイルフレンドリーな量子化

MobileQuant: Mobile-friendly Quantization for On-device Language Models ( http://arxiv.org/abs/2408.13933v2 )

ライセンス: Link先を確認

Fuwen Tan, Royson Lee, Łukasz Dudziak, Shell Xu Hu, Sourav Bhattacharya, Timothy Hospedales, Georgios Tzimiropoulos, Brais Martinez,

(参考訳) 大規模言語モデル(LLM)は言語処理に革命をもたらし、複数のアプリケーションで優れた結果を出している。しかし、エッジデバイスへのLLMのデプロイは、メモリ、エネルギー、計算コストの面でいくつかの課題があり、携帯電話などのデバイスでの普及を妨げている。有望な解決策は、重みとアクティベーションの表現に用いるビット数を減らすことである。既存の研究は、LLMをより低いビット幅(例えば4ビットの重み)に量子化することに部分的に成功しているが、アクティベーションを16ビットを超えて量子化すると、オンデバイスでの量子化サポートが乏しいために大きな計算オーバーヘッドが生じるか、精度が大幅に低下することが多い。一方で、8ビットのアクティベーションは、LLMがモバイルフレンドリーなハードウェア(例えばNeural Processing Unit、NPU)を十分に活用できるようになるため、オンデバイスでのデプロイにとって非常に魅力的である。本研究では、整数のみの量子化を用いたLLMのオンデバイスデプロイを容易にするための最初の試みを行う。まず、オンデバイスデプロイにおける既存の量子化手法の限界を、特にアクティベーションの量子化に着目して調査する。次に、これらの限界に対処するため、MobileQuantという簡潔な学習後量子化手法を導入する。これは、重み変換とアクティベーション範囲のパラメータをエンドツーエンドで同時に最適化することで、従来の重み等価変換の研究を拡張するものである。MobileQuantは、1) 幅広いLLMベンチマークでほぼ無損失の量子化を達成し、2) 現在のオンデバイス量子化戦略と比較してレイテンシとエネルギー消費を20%〜50%削減し、3) 限られた計算予算しか必要とせず、4) NPUなどのモバイルフレンドリーな計算ユニットと互換性がある、という点で既存手法に対する優位性を示す。

Large language models (LLMs) have revolutionized language processing, delivering outstanding results across multiple applications. However, deploying LLMs on edge devices poses several challenges with respect to memory, energy, and compute costs, limiting their widespread use in devices such as mobile phones. A promising solution is to reduce the number of bits used to represent weights and activations. While existing works have found partial success at quantizing LLMs to lower bitwidths, e.g. 4-bit weights, quantizing activations beyond 16 bits often leads to large computational overheads due to poor on-device quantization support, or a considerable accuracy drop. Yet, 8-bit activations are very attractive for on-device deployment as they would enable LLMs to fully exploit mobile-friendly hardware, e.g. Neural Processing Units (NPUs). In this work, we make a first attempt to facilitate the on-device deployment of LLMs using integer-only quantization. We first investigate the limitations of existing quantization methods for on-device deployment, with a special focus on activation quantization. We then address these limitations by introducing a simple post-training quantization method, named MobileQuant, that extends previous weight equivalent transformation works by jointly optimizing the weight transformation and activation range parameters in an end-to-end manner. MobileQuant demonstrates superior capabilities over existing methods by 1) achieving near-lossless quantization on a wide range of LLM benchmarks, 2) reducing latency and energy consumption by 20\%-50\% compared to current on-device quantization strategies, 3) requiring limited compute budget, 4) being compatible with mobile-friendly compute units, e.g. NPU.

翻訳日:2024-11-08 05:15:13 公開日:2024-10-04
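
To make the notion of an "activation range parameter" in the abstract above concrete, here is a minimal sketch of symmetric 8-bit integer quantization. It assumes a single hand-picked range `max_abs` per tensor; MobileQuant is described as optimizing such ranges jointly with weight transformations, which this sketch does not attempt. All names are illustrative.

```python
def quantize_int8(values, max_abs):
    """Symmetric per-tensor quantization: real value ~= scale * integer code.

    max_abs is the activation range parameter; values outside
    [-max_abs, max_abs] are clipped to the nearest representable code.
    """
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


# Toy activations; 3.0 lies outside the chosen range and will be clipped.
activations = [0.0, 0.5, -1.0, 2.0, 3.0]
codes, scale = quantize_int8(activations, max_abs=2.0)
restored = dequantize(codes, scale)
```

A smaller `max_abs` gives finer resolution for typical values but clips outliers harder, which is why the choice of range matters for accuracy.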

# 物理認識型リプログラミングによる言語モデルを活用した時空間予測

Language Model Empowered Spatio-Temporal Forecasting via Physics-Aware Reprogramming ( http://arxiv.org/abs/2408.14505v2 )

ライセンス: Link先を確認

Hao Wang, Jindong Han, Wei Fan, Hao Liu,

(参考訳) 時空間予測は、交通計画、エネルギー管理、気候モニタリングなど、多くの実世界の応用において極めて重要である。本研究では、事前学習済み言語モデル(PLM)の推論能力と汎化能力を活用し、特にデータが乏しいシナリオにおいて、より効果的な時空間予測を実現することを目指す。しかし最近の研究では、主にテキストデータで訓練されたPLMは、数値時系列における複雑な相関のモデル化を課されるとしばしばうまく機能せず、時空間データの理解における有効性が制限されることが明らかになっている。このギャップを埋めるために、時空間予測に特化した物理認識型のPLMリプログラミングフレームワークRePSTを提案する。具体的には、まず、空間的に相関した時系列を解釈可能なサブコンポーネントへ適応的に分解する物理認識型デコンポーザを提案する。これにより、PLMは分割統治戦略を通じて複雑な時空間ダイナミクスを理解しやすくなる。さらに、拡張された時空間語彙空間を導入して時空間系列を離散表現に射影する、選択的な離散リプログラミング手法を提案する。この手法は、リプログラミング中の情報損失を最小限に抑え、PLMから得られる表現を豊かにする。実世界のデータセットでの広範な実験により、提案するRePSTは、特にデータが乏しいシナリオにおいて12の最先端ベースライン手法を上回ることが示され、時空間予測におけるPLMの有効性と優れた汎化能力が浮き彫りになった。

Spatio-temporal forecasting is pivotal in numerous real-world applications, including transportation planning, energy management, and climate monitoring. In this work, we aim to harness the reasoning and generalization abilities of Pre-trained Language Models (PLMs) for more effective spatio-temporal forecasting, particularly in data-scarce scenarios. However, recent studies uncover that PLMs, which are primarily trained on textual data, often falter when tasked with modeling the intricate correlations in numerical time series, thereby limiting their effectiveness in comprehending spatio-temporal data. To bridge the gap, we propose RePST, a physics-aware PLM reprogramming framework tailored for spatio-temporal forecasting. Specifically, we first propose a physics-aware decomposer that adaptively disentangles spatially correlated time series into interpretable sub-components, which facilitates PLM to understand sophisticated spatio-temporal dynamics via a divide-and-conquer strategy. Moreover, we propose a selective discrete reprogramming scheme, which introduces an expanded spatio-temporal vocabulary space to project spatio-temporal series into discrete representations. This scheme minimizes the information loss during reprogramming and enriches the representations derived by PLMs. Extensive experiments on real-world datasets show that the proposed RePST outperforms twelve state-of-the-art baseline methods, particularly in data-scarce scenarios, highlighting the effectiveness and superior generalization capabilities of PLMs for spatio-temporal forecasting.

翻訳日:2024-11-08 05:04:12 公開日:2024-10-04

# T-FAKE: 顔のランドマークのための熱画像の合成

T-FAKE: Synthesizing Thermal Images for Facial Landmarking ( http://arxiv.org/abs/2408.15127v2 )

ライセンス: Link先を確認

Philipp Flotho, Moritz Piening, Anna Kukleva, Gabriele Steidl,

(参考訳) 顔分析は、セキュリティ、自動運転、エンターテインメント、ヘルスケアなど、幅広いアプリケーションにおける重要な構成要素である。さまざまな顔のRGBデータセットが利用可能であるにもかかわらず、生命科学、医学、バイオメトリクスにおいて重要な役割を果たす熱モダリティはほとんど見過ごされてきた。このギャップに対処するために、疎および密なランドマークを備えた新しい大規模合成熱画像データセットであるT-FAKEデータセットを導入する。データセットの作成を容易にするため、RGBの顔画像への熱スタイルの転写を可能にする新しいRGB2Thermal損失関数を提案する。熱画像パッチとRGBパッチの間のワッサースタイン距離と、顔における臨床的な温度分布の統計解析を利用することで、生成された熱画像が実サンプルによく似たものになるようにする。このRGB2Thermal損失関数に基づくRGB2Thermalスタイル転写を用いて、顔の大規模合成熱画像データセットであるT-FAKEデータセットを作成する。新たなT-FAKEデータセット、確率的ランドマーク予測、ラベル適応ネットワークを活用することで、異なるランドマーク規約にわたって、熱画像上のランドマーク検出手法が大幅に改善されることを示す。我々のモデルは、疎な70点のランドマークと密な478点のランドマークアノテーションの両方で優れた性能を示す。私たちのコードとモデルは https://github.com/phflot/tfake で公開されている。

Facial analysis is a key component in a wide range of applications such as security, autonomous driving, entertainment, and healthcare. Despite the availability of various facial RGB datasets, the thermal modality, which plays a crucial role in life sciences, medicine, and biometrics, has been largely overlooked. To address this gap, we introduce the T-FAKE dataset, a new large-scale synthetic thermal dataset with sparse and dense landmarks. To facilitate the creation of the dataset, we propose a novel RGB2Thermal loss function, which enables the transfer of thermal style to RGB faces. By utilizing the Wasserstein distance between thermal and RGB patches and the statistical analysis of clinical temperature distributions on faces, we ensure that the generated thermal images closely resemble real samples. Using RGB2Thermal style transfer based on our RGB2Thermal loss function, we create the T-FAKE dataset, a large-scale synthetic thermal dataset of faces. Leveraging our novel T-FAKE dataset, probabilistic landmark prediction, and label adaptation networks, we demonstrate significant improvements in landmark detection methods on thermal images across different landmark conventions. Our models show excellent performance with both sparse 70-point landmarks and dense 478-point landmark annotations. Our code and models are available at https://github.com/phflot/tfake.

翻訳日:2024-11-08 04:41:58 公開日:2024-10-04

# 基本エントロピーの不等式から生じる量子エントロピーの連続性境界

Continuity bounds for quantum entropies arising from a fundamental entropic inequality ( http://arxiv.org/abs/2408.15306v2 )

ライセンス: Link先を確認

Koenraad Audenaert, Bjarne Bergh, Nilanjana Datta, Michael G. Jabbour, Ángela Capel, Paul Gondolf,

(参考訳) 我々は、2つの量子状態 $\rho_1$ と $\rho_2$ のフォン・ノイマンエントロピーの差に対するタイトな上界を確立する。この上界は、差作用素 $(\rho_1 - \rho_2)$ のジョルダン=ハーン分解から導かれる、互いに直交する状態のフォン・ノイマンエントロピーによって表される。これにより、よく知られた Audenaert-Fannes (AF) 不等式を含意する新しいエントロピー不等式が得られる。実際、これはAF不等式の改良にもつながる。この不等式を用いて、条件付けを行う系上の周辺状態が一致する2つの状態の量子条件付きエントロピーに対する一様連続性境界を得る。さらに、これを用いて、両方の引数に関する量子相対エントロピーの連続性境界を導出する。我々の証明は、主にマジョライゼーション理論と凸最適化に基づいている。興味深いことに、この基本的なエントロピー不等式は無限次元においても成り立つ。

We establish a tight upper bound for the difference in von Neumann entropies between two quantum states, $\rho_1$ and $\rho_2$. This bound is expressed in terms of the von Neumann entropies of the mutually orthogonal states derived from the Jordan-Hahn decomposition of the difference operator $(\rho_1 - \rho_2)$. This yields a novel entropic inequality that implies the well-known Audenaert-Fannes (AF) inequality. In fact, it also leads to a refinement of the AF inequality. We employ this inequality to obtain a uniform continuity bound for the quantum conditional entropy of two states whose marginals on the conditioning system coincide. We additionally use it to derive a continuity bound for the quantum relative entropy in both variables. Our proofs are largely based on majorization theory and convex optimization. Interestingly, the fundamental entropic inequality is also valid in infinite dimensions.

翻訳日:2024-11-08 04:41:58 公開日:2024-10-04
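
For orientation, the two standard ingredients named in the abstract above can be written out as follows. The symbols $T$, $d$, $h$ and $\Delta_\pm$ are notation chosen here, not taken from the paper, and the paper's new inequality itself is not reproduced, since the abstract does not state it.

```latex
% Audenaert-Fannes inequality for two states \rho, \sigma on a d-dimensional Hilbert space
\[
  |S(\rho) - S(\sigma)| \;\le\; T \log(d-1) + h(T),
  \qquad T = \tfrac{1}{2}\,\|\rho - \sigma\|_1,
  \qquad h(T) = -T\log T - (1-T)\log(1-T).
\]
% Jordan-Hahn decomposition of the difference operator into orthogonal positive parts
\[
  \rho_1 - \rho_2 = \Delta_+ - \Delta_-,
  \qquad \Delta_\pm \ge 0, \quad \Delta_+ \Delta_- = 0, \quad
  \mathrm{Tr}\,\Delta_+ = \mathrm{Tr}\,\Delta_- = \tfrac{1}{2}\,\|\rho_1 - \rho_2\|_1 .
\]
```

Dividing $\Delta_\pm$ by their common trace (when it is nonzero) gives two mutually orthogonal states, which is the kind of object the abstract's bound is expressed in.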

# CyberCortex.AI: 自律ロボットと複雑自動化のためのAIベースのオペレーティングシステム

CyberCortex.AI: An AI-based Operating System for Autonomous Robotics and Complex Automation ( http://arxiv.org/abs/2409.01241v2 )

ライセンス: Link先を確認

Sorin Grigorescu, Mihai Zaha,

(参考訳) 自律ロボットや複雑な自動化アプリケーションを制御するための基盤となるフレームワークは、知覚・制御タスクをスケジューリングし、他のロボットピアやリモートのクラウドコンピュータへのリアルタイムデータ通信を提供できるオペレーティングシステム(OS)である。本稿では、異種のAIベースのロボティクスおよび複雑な自動化アプリケーションを実現するために設計されたロボティクスOSであるCyberCortex.AIを紹介する。CyberCortex.AIは非中央集権型の分散OSであり、ロボット同士の通信や、クラウド上の高性能コンピュータ(HPC)との通信を可能にする。ロボットからのセンサデータと制御データは、AIアルゴリズムの訓練を目的としてHPCシステムへストリーミングされ、訓練されたアルゴリズムはその後ロボットにデプロイされる。ロボットの各機能(例えば、センサデータの取得、経路計画、運動制御など)は、インターネットを介して共有される、いわゆるフィルタのDataBlockの中で実行され、各フィルタはロボット自身の上でローカルに、または別のロボットシステム上でリモートに計算される。データは、各フィルタの入力と出力の間のゲートウェイとして機能する、いわゆるTemporal Addressable Memory(TAM)を介して格納・アクセスされる。CyberCortex.AIには2つの主要コンポーネントがある。i) ロボットの組込みハードウェア上で動作するDataBlockのリアルタイム実装であるCyberCortex.AI推論システム、ii) クラウド上のHPCコンピュータ上で動作し、AIアルゴリズムの設計、訓練、デプロイに使用されるCyberCortex.AI dojoである。本稿では、2つの協調ロボティクスアプリケーションを用いて、提案手法の定量的および定性的な性能解析を示す。i) Unitree A1脚式ロボットとAnafi Parrot 4Kドローンに基づく森林火災予防システム、ii) 協調的な知覚と運動制御にCyberCortex.AIを使用する自動運転システムである。

The underlying framework for controlling autonomous robots and complex automation applications is an Operating System (OS) capable of scheduling perception-and-control tasks, as well as providing real-time data communication to other robotic peers and remote cloud computers. In this paper, we introduce CyberCortex.AI, a robotics OS designed to enable heterogeneous AI-based robotics and complex automation applications. CyberCortex.AI is a decentralized distributed OS which enables robots to talk to each other, as well as to High Performance Computers (HPC) in the cloud. Sensory and control data from the robots is streamed towards HPC systems with the purpose of training AI algorithms, which are afterwards deployed on the robots. Each functionality of a robot (e.g. sensory data acquisition, path planning, motion control, etc.) is executed within a so-called DataBlock of Filters shared through the internet, where each filter is computed either locally on the robot itself, or remotely on a different robotic system. The data is stored and accessed via a so-called Temporal Addressable Memory (TAM), which acts as a gateway between each filter's input and output. CyberCortex.AI has two main components: i) the CyberCortex.AI inference system, which is a real-time implementation of the DataBlock running on the robots' embedded hardware, and ii) the CyberCortex.AI dojo, which runs on an HPC computer in the cloud and is used to design, train and deploy AI algorithms. We present a quantitative and qualitative performance analysis of the proposed approach using two collaborative robotics applications: i) a forest fire prevention system based on a Unitree A1 legged robot and an Anafi Parrot 4K drone, and ii) an autonomous driving system which uses CyberCortex.AI for collaborative perception and motion control.

翻訳日:2024-11-08 03:23:46 公開日:2024-10-04

# 原子干渉計を用いた重力曲率の局所測定方式

Local Measurement Scheme of Gravitational Curvature using Atom Interferometers ( http://arxiv.org/abs/2409.03515v3 )

ライセンス: Link先を確認

Michael Werner, Ali Lezeik, Dennis Schlippert, Ernst Rasel, Naceur Gaaloul, Klemens Hammerer,

(参考訳) 光パルス原子干渉計(AIF)は、空間的不均一性と重力曲率の精緻な量子プローブである。さらに、超長基線原子干渉法(VLBAI)には、詳細な測定と較正が必要な前提条件となる。本稿では、同一位置に配置した2つの干渉計の差分信号から、重力ポテンシャルの曲率に比例する位相シフトを取り出す手法を提示する。スケール係数は、光子の波数、干渉計時間、原子の反跳といった、よく制御された量のみに依存するため、測定された位相から曲率を正確に推定できる。ケーススタディとして、ハノーファーのVLBAI施設を想定して、このような同一位置配置の勾配計型干渉計を数値シミュレーションし、複雑な空間依存性を持つ重力場における位相シフトのロバスト性を示す。非自明な重力場に対する重力曲率の推定量を定義し、空間分解能に関して、信号強度と推定精度の間のトレードオフを計算する。今後の展望として、時間依存する重力場の場合と、それに対応する測定戦略について議論する。

Light pulse atom interferometers (AIFs) are exquisite quantum probes of spatial inhomogeneity and gravitational curvature. Moreover, detailed measurement and calibration are necessary prerequisites for very-long-baseline atom interferometry (VLBAI). Here we present a method in which the differential signal of two co-located interferometers singles out a phase shift proportional to the curvature of the gravitational potential. The scale factor depends only on well controlled quantities, namely the photon wave number, the interferometer time and the atomic recoil, which allows the curvature to be accurately inferred from a measured phase. As a case study, we numerically simulate such a co-located gradiometric interferometer in the context of the Hannover VLBAI facility and prove the robustness of the phase shift in gravitational fields with complex spatial dependence. We define an estimator of the gravitational curvature for non-trivial gravitational fields and calculate the trade-off between signal strength and estimation accuracy with regard to spatial resolution. As a perspective, we discuss the case of a time-dependent gravitational field and corresponding measurement strategies.

翻訳日:2024-11-07 23:23:02 公開日:2024-10-04

# Qihoo-T2X:テキスト・ツー・アニータスクのための効率的なプロキシ・トークン型拡散変換器

Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task ( http://arxiv.org/abs/2409.04005v2 )

ライセンス: Link先を確認

Jing Wang, Ao Ma, Jiasong Feng, Dawei Leng, Yuhui Yin, Xiaodan Liang,

(参考訳) 拡散トランスフォーマーにおけるグローバルな自己注意機構は、視覚情報の疎で冗長な性質のために冗長な計算を伴い、空間ウィンドウ内のトークンの注意マップは顕著な類似性を示す。この冗長性に対処するため、疎な代表トークン注意(代表トークンの数はトークンの総数よりもはるかに少ない)を用いてグローバルな視覚情報を効率的にモデル化する、Proxy-Tokenized Diffusion Transformer(PT-DiT)を提案する。具体的には、各トランスフォーマーブロック内で、各時空間ウィンドウから平均化トークンを計算し、その領域のプロキシトークンとして用いる。グローバルな意味情報は、これらのプロキシトークンの自己注意によって捉えられ、その後、クロスアテンションを介してすべての潜在トークンに注入される。同時に、疎な注意機構に起因する詳細モデリングの限界に対処するために、ウィンドウ注意とシフトウィンドウ注意を導入する。PT-DiTに基づいて、T2I、T2V、T2MVタスク向けのさまざまなモデルを含むQihoo-T2Xファミリーをさらに開発する。実験の結果、PT-DiTは画像生成タスクと映像生成タスクの両方で計算複雑性を削減しつつ(例えばDiTと比べて49%、PixArt-$\alpha$と比べて34%の削減)、競争力のある性能を達成することが示された。Qihoo-T2Xのビジュアルデモとソースコードは https://360cvgroup.github.io/Qihoo-T2X/ で公開されている。

The global self-attention mechanism in diffusion transformers involves redundant computation due to the sparse and redundant nature of visual information, and the attention map of tokens within a spatial window shows significant similarity. To address this redundancy, we propose the Proxy-Tokenized Diffusion Transformer (PT-DiT), which employs sparse representative token attention (where the number of representative tokens is much smaller than the total number of tokens) to model global visual information efficiently. Specifically, within each transformer block, we compute an averaging token from each spatial-temporal window to serve as a proxy token for that region. The global semantics are captured through the self-attention of these proxy tokens and then injected into all latent tokens via cross-attention. Simultaneously, we introduce window and shift window attention to address the limitations in detail modeling caused by the sparse attention mechanism. Building on the well-designed PT-DiT, we further develop the Qihoo-T2X family, which includes a variety of models for T2I, T2V, and T2MV tasks. Experimental results show that PT-DiT achieves competitive performance while reducing the computational complexity in both image and video generation tasks (e.g., a 49% reduction compared to DiT and a 34% reduction compared to PixArt-$\alpha$). The visual exhibition and source code of Qihoo-T2X are available at https://360cvgroup.github.io/Qihoo-T2X/.

翻訳日:2024-11-07 23:11:54 公開日:2024-10-04
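
The proxy-token step described in the abstract above (one averaged token per window) can be sketched as below. This assumes a purely spatial 2-D grid of token vectors and non-overlapping windows; the function name and toy grid are illustrative, and the attention layers that would consume the proxy tokens are omitted.

```python
def proxy_tokens(grid, win_h, win_w):
    """Average the token vectors inside each non-overlapping window.

    grid[r][c] is the feature vector (a list of floats) of the token at row r, column c.
    Returns one averaged "proxy" vector per window, in row-major window order.
    """
    rows, cols = len(grid), len(grid[0])
    dim = len(grid[0][0])
    proxies = []
    for r0 in range(0, rows, win_h):
        for c0 in range(0, cols, win_w):
            window = [
                grid[r][c]
                for r in range(r0, min(r0 + win_h, rows))
                for c in range(c0, min(c0 + win_w, cols))
            ]
            proxies.append(
                [sum(tok[d] for tok in window) / len(window) for d in range(dim)]
            )
    return proxies


# Toy 4x4 grid of 1-dimensional tokens with values 0..15.
toy_grid = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
```

With 2x2 windows the 16 tokens collapse to 4 proxy tokens, so any attention computed among proxies works on far fewer elements than attention over all tokens.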

# GRVFL-MV:マルチビュー学習に基づくグラフランダムベクトル関数リンク

GRVFL-MV: Graph Random Vector Functional Link Based on Multi-View Learning ( http://arxiv.org/abs/2409.04743v2 )

ライセンス: Link先を確認

M. Tanveer, R. K. Sharma, M. Sajid, A. Quadir,

(参考訳) ランダム化ニューラルネットワークであるランダムベクトル関数リンク(RVFL)の分類性能は広く認められている。しかし、浅い学習という性質のため、RVFLはデータセットで利用可能な関連情報のすべてを考慮できないことが多い。さらに、データセットの幾何学的性質も見落としている。これらの制約に対処するために、マルチビュー学習に基づく新しいグラフランダムベクトル関数リンク(GRVFL-MV)モデルを提案する。提案モデルは、マルチビュー学習(MVL)の概念を取り入れて複数のビューで訓練され、さらにグラフ埋め込み(GE)フレームワークを用いてすべてのビューの幾何学的性質を取り込む。RVFLネットワーク、MVL、GEフレームワークの融合により、提案モデルは以下を実現する。i) 効率的な学習: RVFLのトポロジを活用することで、提案モデルはマルチビューデータ内の非線形関係を効率的に捉え、効率的かつ正確な予測を容易にする。ii) 包括的な表現: 多様な視点からの情報を融合することで、データ内の複雑なパターンや関係を捉える能力が高まり、モデル全体の汎化性能が向上する。iii) 構造の考慮: GEフレームワークを用いることで、提案モデルは内在的(intrinsic)およびペナルティ(penalty)の両方の部分空間学習基準を自然に利用し、データセット本来のデータ分布を活用する。27のUCIおよびKEELデータセット、Corel5kの50データセット、AwAの45データセットを含むさまざまなデータセットでの評価により、提案するGRVFL-MVモデルはベースラインモデルを上回る性能を示した。これらの結果は、多様なデータセットにわたるGRVFL-MVモデルの高い汎化能力を浮き彫りにしている。

The classification performance of the random vector functional link (RVFL), a randomized neural network, has been widely acknowledged. However, due to its shallow learning nature, RVFL often fails to consider all the relevant information available in a dataset. Additionally, it overlooks the geometrical properties of the dataset. To address these limitations, a novel graph random vector functional link based on multi-view learning (GRVFL-MV) model is proposed. The proposed model is trained on multiple views, incorporating the concept of multi-view learning (MVL), and it also incorporates the geometrical properties of all the views using the graph embedding (GE) framework. The fusion of RVFL networks, MVL, and GE framework enables our proposed model to achieve the following: i) efficient learning: by leveraging the topology of RVFL, our proposed model can efficiently capture nonlinear relationships within the multi-view data, facilitating efficient and accurate predictions; ii) comprehensive representation: fusing information from diverse perspectives enhances the proposed model's ability to capture complex patterns and relationships within the data, thereby improving the model's overall generalization performance; and iii) structural awareness: by employing the GE framework, our proposed model leverages the original data distribution of the dataset by naturally exploiting both intrinsic and penalty subspace learning criteria. The evaluation of the proposed GRVFL-MV model on various datasets, including 27 UCI and KEEL datasets, 50 datasets from Corel5k, and 45 datasets from AwA, demonstrates its superior performance compared to baseline models. These results highlight the enhanced generalization capabilities of the proposed GRVFL-MV model across a diverse range of datasets.

翻訳日:2024-11-07 22:49:49 公開日:2024-10-04

# データアトリビューションに対する敵対的攻撃

Adversarial Attacks on Data Attribution ( http://arxiv.org/abs/2409.05657v2 )

ライセンス: Link先を確認

Xinhe Wang, Pingbang Hu, Junwei Deng, Jiaqi W. Ma,

(参考訳) データ帰属(data attribution)は、AIモデルの出力に対する個々の訓練データ点の寄与を定量化することを目的としており、訓練データの価値の測定やデータ提供者への報酬支払いに用いられてきた。金融上の意思決定や報酬メカニズムへの影響を考えると、データ帰属手法の敵対的頑健性に関する重大な疑問が生じる。しかし、この問題に取り組む体系的な研究はほとんど行われていない。本研究では、敵対者の目標と能力について明確な仮定を置いた脅威モデルを詳述し、データ帰属に対する原理に基づく敵対的攻撃手法を提案することで、このギャップを埋めることを目指す。我々は、操作されたデータセットを生成して報酬を敵対的に水増しする、Shadow AttackとOutlier Attackという2つの手法を提示する。Shadow Attackは、AIアプリケーションにおけるデータ分布に関する知識を活用し、メンバーシップ推論攻撃で一般的に用いられる技術である「シャドウトレーニング」を通じて敵対的摂動を導出する。対照的に、Outlier Attackはデータ分布に関する知識を一切仮定せず、ターゲットモデルの予測に対するブラックボックスクエリのみに依拠する。これは、多くのデータ帰属手法に存在する帰納バイアス(外れ値のデータ点ほど影響力が大きくなりやすい)を利用し、敵対的サンプルを用いて操作されたデータセットを生成する。実験では、画像分類およびテキスト生成タスクにおいて、Shadow Attackはデータ帰属に基づく報酬を少なくとも200%水増しでき、Outlier Attackは185%から最大643%の報酬の水増しを達成する。

Data attribution aims to quantify the contribution of individual training data points to the outputs of an AI model, which has been used to measure the value of training data and compensate data providers. Given the impact on financial decisions and compensation mechanisms, a critical question arises concerning the adversarial robustness of data attribution methods. However, there has been little to no systematic research addressing this issue. In this work, we aim to bridge this gap by detailing a threat model with clear assumptions about the adversary's goal and capabilities and proposing principled adversarial attack methods on data attribution. We present two methods, Shadow Attack and Outlier Attack, which generate manipulated datasets to inflate the compensation adversarially. The Shadow Attack leverages knowledge about the data distribution in the AI applications, and derives adversarial perturbations through "shadow training", a technique commonly used in membership inference attacks. In contrast, the Outlier Attack does not assume any knowledge about the data distribution and relies solely on black-box queries to the target model's predictions. It exploits an inductive bias present in many data attribution methods - outlier data points are more likely to be influential - and employs adversarial examples to generate manipulated datasets. Empirically, in image classification and text generation tasks, the Shadow Attack can inflate the data-attribution-based compensation by at least 200%, while the Outlier Attack achieves compensation inflation ranging from 185% to as much as 643%.

翻訳日:2024-11-07 22:27:40 公開日:2024-10-04

# One Policy to Run Them All: マルチエンボディメント・ロコモーションへのエンドツーエンド学習アプローチ

One Policy to Run Them All: an End-to-end Learning Approach to Multi-Embodiment Locomotion ( http://arxiv.org/abs/2409.06366v2 )

ライセンス: Link先を確認

Nico Bohlinger, Grzegorz Czechmanowski, Maciej Krupka, Piotr Kicki, Krzysztof Walas, Jan Peters, Davide Tateo,

(参考訳) 深層強化学習技術は、頑健な脚式移動において最先端の結果を達成している。四足歩行ロボット、ヒューマノイド、ヘキサポッドなど多種多様な脚式プラットフォームが存在するが、この分野には、これらの異なるエンボディメントをすべて簡単かつ効果的に制御でき、さらに未知のロボットエンボディメントへゼロショットまたは少数ショットで転移できる可能性のある単一の学習フレームワークがまだ欠けている。本稿では、このギャップを埋めるために、統一ロボット形態アーキテクチャであるURMA(Unified Robot Morphology Architecture)を紹介する。我々のフレームワークは、脚式ロボットの領域にエンドツーエンドのマルチタスク強化学習アプローチを導入し、学習した方策があらゆる種類のロボット形態を制御できるようにする。提案手法の鍵となる考え方は、形態に依存しないエンコーダとデコーダにより、エンボディメント間でシームレスに共有できる抽象的な移動制御器をネットワークが学習できるようにすることである。この柔軟なアーキテクチャは、脚式ロボットの移動のための基盤モデルを構築する第一歩となる可能性がある。実験の結果、URMAは複数のエンボディメントで移動方策を学習でき、その方策はシミュレーションおよび実世界において未知のロボットプラットフォームに容易に転移できることが示された。

Deep Reinforcement Learning techniques are achieving state-of-the-art results in robust legged locomotion. While there exists a wide variety of legged platforms such as quadrupeds, humanoids, and hexapods, the field is still missing a single learning framework that can control all these different embodiments easily and effectively and possibly transfer, zero or few-shot, to unseen robot embodiments. We introduce URMA, the Unified Robot Morphology Architecture, to close this gap. Our framework brings the end-to-end Multi-Task Reinforcement Learning approach to the realm of legged robots, enabling the learned policy to control any type of robot morphology. The key idea of our method is to allow the network to learn an abstract locomotion controller that can be seamlessly shared between embodiments thanks to our morphology-agnostic encoders and decoders. This flexible architecture can be seen as a potential first step in building a foundation model for legged robot locomotion. Our experiments show that URMA can learn a locomotion policy on multiple embodiments that can be easily transferred to unseen robot platforms in simulation and the real world.

翻訳日:2024-11-07 22:16:23 公開日:2024-10-04

# RePlay: 実験と生産のための推奨フレームワーク

RePlay: a Recommendation Framework for Experimentation and Production Use ( http://arxiv.org/abs/2409.07272v3 )

ライセンス: Link先を確認

Alexey Vasilev, Anna Volodkevich, Denis Kulandin, Tatiana Bysheva, Anton Klenitskiy,

(参考訳) 推薦システムの構築と比較に単一のツールを使用すると、新しいモデルの市場投入までの時間が大幅に短縮される。さらに、このようなツールを使用した場合の比較結果は、より一貫性のあるものになる。そのため、推薦分野の研究者向けのさまざまなツールやライブラリが最近登場している。残念なことに、これらのフレームワークのほとんどは主に研究者を対象としており、大規模なデータセットを扱えない、あるいはアーキテクチャが不適切であるために、本番環境で使用するには修正が必要である。このデモでは、オープンソースのツールキットであるRePlayを紹介する。RePlayは、推薦システムを構築するためのエンドツーエンドのパイプラインを含み、本番環境での使用に対応したフレームワークである。RePlayではまた、パイプラインの各ステージに適したスタック(Pandas、Polars、Spark)を使用できる。これにより、ライブラリは計算をスケールさせ、クラスタにデプロイできる。したがって、RePlayにより、データサイエンティストは同じインターフェイスを使って研究モードから本番モードへ容易に移行できる。

Using a single tool to build and compare recommender systems significantly reduces the time to market for new models. In addition, the comparison results when using such tools look more consistent. This is why many different tools and libraries for researchers in the field of recommendations have recently appeared. Unfortunately, most of these frameworks are aimed primarily at researchers and require modification for use in production due to the inability to work on large datasets or an inappropriate architecture. In this demo, we present our open-source toolkit RePlay - a framework containing an end-to-end pipeline for building recommender systems, which is ready for production use. RePlay also allows you to use a suitable stack for the pipeline on each stage: Pandas, Polars, or Spark. This allows the library to scale computations and deploy to a cluster. Thus, RePlay allows data scientists to easily move from research mode to production mode using the same interfaces.

翻訳日:2024-11-07 21:53:46 公開日:2024-10-04

# ソフトペアワイズ精度による自動計量の人的評価における統計的意義の改善

Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy ( http://arxiv.org/abs/2409.09598v2 )

ライセンス: Link先を確認

Brian Thompson, Nitika Mathur, Daniel Deutsch, Huda Khayrallah,

(参考訳) 人間のアノテータを最もよく模倣する自動メトリクスを選択することは、「最もよく模倣する」ことの明確な定義がないため、しばしば自明ではない。人間の判断と自動メトリクスのスコアを比較するにはメタメトリクスが必要であり、メトリクスのランキングはメタメトリクスの選択に依存する。我々は、Pairwise Accuracy(PA)を基礎としつつ、人間の判断とメトリクスのスコアの両方の統計的有意性を取り入れた新しいメタメトリクスであるSoft Pairwise Accuracy(SPA)を提案する。評価に用いるシステム/セグメント数の変化に対して、SPAはPAよりも安定であることを示す。また、PAはメトリクスに対して少数の異なる出力値しか割り当てられず、その結果、多くのメトリクスに全く同じPAスコアが人為的に割り当てられることを示す。SPAがこの問題を解決することを実証する。最後に、SPAはPAよりも識別力が高く、メトリクス間の比較において統計的に有意な結果をより多くもたらすことを示す。SPAは2024 WMT Metrics Shared Taskの公式なシステムレベルメトリクスに選ばれた。

Selecting an automatic metric that best emulates human annotators is often non-trivial, because there is no clear definition of "best emulates." A meta-metric is required to compare the human judgments to the automatic metric scores, and metric rankings depend on the choice of meta-metric. We propose Soft Pairwise Accuracy (SPA), a new meta-metric that builds on Pairwise Accuracy (PA) but incorporates the statistical significance of both the human judgments and the metric scores. We show that SPA is more stable than PA with respect to changes in the number of systems/segments used for evaluation. We also show that PA can only assign a small set of distinct output values to metrics, and this results in many metrics being artificially assigned the exact same PA score. We demonstrate that SPA fixes this issue. Finally, we show that SPA is more discriminative than PA, producing more statistically significant comparisons between metrics. SPA was selected as the official system-level metric for the 2024 WMT Metrics Shared Task.
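
The contrast between PA and SPA can be illustrated with a minimal sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: PA is taken as the fraction of system pairs that a metric orders the same way as humans, and the soft variant is taken as the mean of 1 - |p_human - p_metric| over pairs, with the p-values supplied by the caller. The function names, scores, and p-values are hypothetical.

```python
from itertools import combinations


def pairwise_accuracy(human, metric):
    """Fraction of system pairs that the metric ranks in the same order as humans."""
    pairs = list(combinations(range(len(human)), 2))
    agree = sum(
        1
        for i, j in pairs
        if (human[i] - human[j]) * (metric[i] - metric[j]) > 0
    )
    return agree / len(pairs)


def soft_pairwise_accuracy(p_human, p_metric):
    """Assumed soft variant: mean of 1 - |p_human - p_metric| over system pairs.

    Both arguments map a pair (i, j) to a p-value for "system i is better
    than system j"; how those p-values are computed is out of scope here.
    """
    pairs = sorted(p_human)
    return sum(1.0 - abs(p_human[k] - p_metric[k]) for k in pairs) / len(pairs)


# Hypothetical scores for four systems (illustrative numbers only).
human = [0.9, 0.7, 0.5, 0.2]
metric_a = [0.8, 0.6, 0.65, 0.1]
metric_b = [0.8, 0.6, 0.61, 0.1]

# Four systems give only six pairs, so PA can take at most seven distinct
# values; both metrics receive the same PA although their scores differ.
pa_a = pairwise_accuracy(human, metric_a)
pa_b = pairwise_accuracy(human, metric_b)

# Hypothetical p-values for two system pairs.
p_h = {(0, 1): 0.01, (1, 2): 0.40}
p_m = {(0, 1): 0.03, (1, 2): 0.55}
spa = soft_pairwise_accuracy(p_h, p_m)
```

Because the soft score depends on continuous p-values rather than on a count of agreeing pairs, two metrics with identical rankings can still receive different scores, which is the tie-breaking behaviour the abstract attributes to SPA.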

翻訳日:2024-11-07 20:46:36 公開日:2024-10-04

# GOSt-MT: 機械翻訳における作業関連性バイアスの知識グラフ

GOSt-MT: A Knowledge Graph for Occupation-related Gender Biases in Machine Translation ( http://arxiv.org/abs/2409.10989v2 )

ライセンス: Link先を確認

Orfeas Menis Mastromichalakis, Giorgos Filandrianos, Eva Tsouparopoulou, Dimitris Parsanoglou, Maria Symeonaki, Giorgos Stamou,

(参考訳) 機械翻訳(MT)システムにおけるジェンダーバイアスは、しばしば有害なステレオタイプを補強する重大な課題を引き起こす。特に、職業が特定の性別と不正確な関係にある労働領域では、そのような偏見は伝統的なジェンダーのステレオタイプを持続させ、社会に大きな影響を及ぼす。これらの問題に対処することは、公平かつ正確なMTシステムの確保に不可欠である。本稿では, GOSt-MT (Gender and Occupation Statistics for Machine Translation) Knowledge Graph の作成を通じて, 職業関連性バイアスを研究するための新しい手法を提案する。 GOSt-MTは、MTトレーニングで使用される実世界の労働データとテキストコーパスからの包括的性別統計を統合している。この知識グラフは、英語、フランス語、ギリシア語にまたがる男女バイアスの詳細な分析を可能にし、永続的なステレオタイプと介入を必要とする領域の同定を容易にする。 GOSt-MTは、労働市場とMTシステムの両方でどのように職業がジェンダー化されているかを理解するための構造化された枠組みを提供することによって、MTシステムをより公平にし、自動翻訳における性別バイアスを減らすことを目的とした取り組みに貢献している。

Gender bias in machine translation (MT) systems poses significant challenges that often result in the reinforcement of harmful stereotypes. Especially in the labour domain where frequently occupations are inaccurately associated with specific genders, such biases perpetuate traditional gender stereotypes with a significant impact on society. Addressing these issues is crucial for ensuring equitable and accurate MT systems. This paper introduces a novel approach to studying occupation-related gender bias through the creation of the GOSt-MT (Gender and Occupation Statistics for Machine Translation) Knowledge Graph. GOSt-MT integrates comprehensive gender statistics from real-world labour data and textual corpora used in MT training. This Knowledge Graph allows for a detailed analysis of gender bias across English, French, and Greek, facilitating the identification of persistent stereotypes and areas requiring intervention. By providing a structured framework for understanding how occupations are gendered in both labour markets and MT systems, GOSt-MT contributes to efforts aimed at making MT systems more equitable and reducing gender biases in automated translations.

翻訳日:2024-11-07 20:13:03 公開日:2024-10-04

# 家庭の音:音声除去された音声イベント検出用家庭用オーディオデータセット

The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection ( http://arxiv.org/abs/2409.11262v2 )

ライセンス: Link先を確認

Gabriel Bibbó, Thomas Deacon, Arshdeep Singh, Mark D. Plumbley,

(参考訳) 本稿では,高齢者の幸福感向上を目的としたスマートホームアプリケーションのための音声イベント検出研究を支援する住宅用オーディオデータセットを提案する。このデータセットは、55～80歳の8人の家庭に7日間の音声記録システムを展開することで構築される。音響特性は、詳細なフロアプランと建設材料情報を通して記録され、AIモデル展開のための記録環境の複製を可能にする。事前訓練された音声ニューラルネットワークを用いて、他の音声イベントを含むセグメントを保存しながら、音声を含むセグメントを検出し、除去する、新しい自動音声除去パイプラインを開発する。得られたデータセットは、住宅空間内の日常生活の音環境と活動を正確に把握するプライバシーに準拠したオーディオ記録で構成されている。本稿では,データセット作成手法,カスケードモデルアーキテクチャを利用した音声除去パイプライン,音声ラベル分布の解析を行い,音声除去プロセスの検証を行う。このデータセットは、家庭内アプリケーションに特化した音響イベント検出モデルの開発とベンチマークを可能にする。

This paper presents a residential audio dataset to support sound event detection research for smart home applications aimed at promoting wellbeing for older adults. The dataset is constructed by deploying audio recording systems in the homes of 8 participants aged 55-80 years for a 7-day period. Acoustic characteristics are documented through detailed floor plans and construction material information to enable replication of the recording environments for AI model deployment. A novel automated speech removal pipeline is developed, using pre-trained audio neural networks to detect and remove segments containing spoken voice, while preserving segments containing other sound events. The resulting dataset consists of privacy-compliant audio recordings that accurately capture the soundscapes and activities of daily living within residential spaces. The paper details the dataset creation methodology, the speech removal pipeline utilizing cascaded model architectures, and an analysis of the vocal label distribution to validate the speech removal process. This dataset enables the development and benchmarking of sound event detection models tailored specifically for in-home applications.

翻訳日:2024-11-07 20:13:03 公開日:2024-10-04

# EIA: プライバシー漏洩を狙ったジェネラリストWebエージェントに対する環境注入攻撃

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage ( http://arxiv.org/abs/2409.11295v2 )

ライセンス: Link先を確認

Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun,

(参考訳) ジェネラリストWebエージェントは、実際のウェブサイト上で幅広いタスクを自律的に完了させ、人間の生産性を大きく向上させる驚くべき可能性を示してきた。しかし、フライトの予約のようなウェブタスクは通常ユーザのPIIを伴うため、Webエージェントが侵害されたウェブサイトと誤って対話した場合、PIIがプライバシーリスクにさらされる可能性がある。このシナリオは文献でほとんど検討されていない。本研究では、敵対的環境におけるジェネラリストWebエージェントのプライバシーリスクに関する最初の研究を行うことにより、このギャップを狭める。まず、ウェブサイトに対する攻撃の現実的な脅威モデルを提示し、ユーザの特定のPIIを盗むこと、またはユーザ要求全体を盗むことという2つの敵対的目標を検討する。次に、環境注入攻撃(Environmental Injection Attack, EIA)と呼ぶ新しい攻撃手法を提案する。EIAは、エージェントが動作する環境によく適応するように設計された悪意のあるコンテンツを注入するものであり、本研究ではWeb環境のプライバシーシナリオに特化してEIAを具体化する。我々は、Mind2Webから、現実的なウェブサイト上で多様なPIIカテゴリを含む177のアクションステップを収集し、これまでで最も有能なジェネラリストWebエージェントフレームワークの1つを使用して実験を行う。その結果、EIAは特定のPIIの窃取で最大70%のASRを達成し、ユーザ要求全体の窃取で16%のASRを達成した。さらに、ステルス性を評価し、防御的なシステムプロンプトを試すことにより、EIAは検出と緩和が困難であることを示す。特に、Webページにうまく適応していない攻撃は人間の検査によって検出できるため、セキュリティと自律性の間のトレードオフに関する議論につながる。しかし、攻撃者が追加の労力をかければEIAをシームレスに適応させることができ、そのような監督は無効になる。そこで我々は、人間の監督に頼らない、ウェブサイトのデプロイ前およびデプロイ後の段階での防御についてさらに議論し、より高度な防御戦略を求める。

Generalist web agents have demonstrated remarkable potential in autonomously completing a wide range of tasks on real websites, significantly boosting human productivity. However, web tasks, such as booking flights, usually involve users' PII, which may be exposed to potential privacy risks if web agents accidentally interact with compromised websites, a scenario that remains largely unexplored in the literature. In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request. Then, we propose a novel attack method, termed Environmental Injection Attack (EIA). EIA injects malicious content designed to adapt well to environments where the agents operate, and our work instantiates EIA specifically for privacy scenarios in web environments. We collect 177 action steps that involve diverse PII categories on realistic websites from Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date. The results demonstrate that EIA achieves up to 70% ASR in stealing specific PII and 16% ASR in stealing the full user request. Additionally, by assessing the stealthiness and experimenting with a defensive system prompt, we show that EIA is hard to detect and mitigate. Notably, attacks that are not well adapted to a webpage can be detected via human inspection, leading to our discussion about the trade-off between security and autonomy. However, extra effort from attackers can make EIA seamlessly adapted, rendering such supervision ineffective. Thus, we further discuss defenses at the pre- and post-deployment stages of the websites without relying on human supervision and call for more advanced defense strategies.

翻訳日:2024-11-07 20:13:03 公開日:2024-10-04

# EIA: プライバシー漏洩を狙ったジェネラリストWebエージェントに対する環境注入攻撃

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage ( http://arxiv.org/abs/2409.11295v3 )

ライセンス: Link先を確認

Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun,

翻訳日:2024-11-07 20:13:03 公開日:2024-10-04

# 確率的時系列予測のためのリカレント補間器

Recurrent Interpolants for Probabilistic Time Series Prediction ( http://arxiv.org/abs/2409.11684v2 )

ライセンス: Link先を確認

Yu Chen, Marin Biloš, Sarthak Mittal, Wei Deng, Kashif Rasul, Anderson Schneider,

(参考訳) リカレントニューラルネットワークやトランスフォーマーのような逐次モデルは、様々な領域における確率的多変量時系列予測の標準となっている。その強みにもかかわらず、これらは高次元の分布や特徴間の依存関係を捉えることに苦労している。近年の研究では、拡散モデルやフローベースモデルを用いた生成的アプローチが検討され、時系列の補完や予測へと拡張されている。しかし、スケーラビリティは依然として課題である。本研究は、確率的補間(stochastic interpolants)と制御特徴量を用いた条件付き生成に基づき、リカレントニューラルネットワークの効率性と拡散モデルの確率的モデリングを組み合わせた新しい手法を提案し、この変化の速い分野における今後の発展に向けた知見を提供する。

Sequential models like recurrent neural networks and transformers have become standard for probabilistic multivariate time series forecasting across various domains. Despite their strengths, they struggle with capturing high-dimensional distributions and cross-feature dependencies. Recent work explores generative approaches using diffusion or flow-based models, extending to time series imputation and forecasting. However, scalability remains a challenge. This work proposes a novel method combining recurrent neural networks' efficiency with diffusion models' probabilistic modeling, based on stochastic interpolants and conditional generation with control features, offering insights for future developments in this dynamic field.

翻訳日:2024-11-07 19:50:48 公開日:2024-10-04

# 衛星映像における赤外線小ターゲット検出: 新しいデータセットと新しいリカレント特徴精緻化フレームワーク

Infrared Small Target Detection in Satellite Videos: A New Dataset and A Novel Recurrent Feature Refinement Framework ( http://arxiv.org/abs/2409.12448v1 )

ライセンス: Link先を確認

Xinyi Ying, Li Liu, Zaipin Lin, Yangsi Shi, Yingqian Wang, Ruojing Li, Xu Cao, Boyang Li, Shilin Zhou,

(参考訳) 衛星ビデオにおけるMIRST(Multi-frame infrared small target)検出は、長年にわたる基本的かつ困難な課題であり、その困難は次のように要約できる。第一に、極めて小さなターゲットサイズ、非常に複雑なクラッタとノイズ、様々な衛星の動きが、限られた特徴表現、高い誤警報率、困難な動き解析をもたらす。第二に、衛星ビデオにおける大規模で公開されたMIRSTデータセットの欠如が、アルゴリズムの開発を著しく妨げている。上記の課題に対処するため、我々はまず衛星ビデオにおけるMIRST検出のための大規模データセット(IRSatVideo-LEO)を構築し、次にベースライン手法としてリカレント特徴精緻化(RFR)フレームワークを開発する。具体的には、IRSatVideo-LEOは、衛星の動き、ターゲットの外観、軌道、強度を合成した半シミュレーションのデータセットであり、衛星ビデオ生成のための標準ツールボックスと、アルゴリズム開発を促進する信頼性の高い評価プラットフォームを提供できる。ベースライン手法として、RFRは既存の強力なCNNベースの手法に組み込むことで、長期の時間的依存関係の活用と、動き補償とMIRST検出の統合を行うものとして提案される。具体的には、ピラミッド変形可能アライメント(PDA)モジュールと時間・空間・周波数変調(TSFM)モジュールを提案し、効果的かつ効率的な特徴のアライメント、伝播、集約、精緻化を実現する。提案手法の有効性と優位性を示すため、広範な実験を行った。比較の結果、RFRを組み込んだResUNetは最先端のMIRST検出法よりも優れていた。データセットとコードは https://github.com/XinyiYing/RFR で公開されている。

Multi-frame infrared small target (MIRST) detection in satellite videos is a long-standing, fundamental yet challenging task, and the challenges can be summarized as follows. First, extremely small target size, highly complex clutter and noise, and various satellite motions result in limited feature representation, high false alarm rates, and difficult motion analysis. Second, the lack of a large-scale publicly available MIRST dataset for satellite videos greatly hinders algorithm development. To address these challenges, we first build a large-scale dataset for MIRST detection in satellite videos (namely IRSatVideo-LEO), and then develop a recurrent feature refinement (RFR) framework as the baseline method. Specifically, IRSatVideo-LEO is a semi-simulated dataset with synthesized satellite motion, target appearance, trajectory and intensity, which can provide a standard toolbox for satellite video generation and a reliable evaluation platform to facilitate algorithm development. As the baseline method, RFR is designed to be attached to existing powerful CNN-based methods to exploit long-term temporal dependencies and to integrate motion compensation with MIRST detection. Specifically, a pyramid deformable alignment (PDA) module and a temporal-spatial-frequency modulation (TSFM) module are proposed to achieve effective and efficient feature alignment, propagation, aggregation and refinement. Extensive experiments have been conducted to demonstrate the effectiveness and superiority of our scheme. The comparative results show that ResUNet equipped with RFR outperforms the state-of-the-art MIRST detection methods. Dataset and code are released at https://github.com/XinyiYing/RFR.

翻訳日:2024-11-07 14:52:37 公開日:2024-10-04

# CodePlan: コード形式計画のスケールアップによる大規模言語モデルの推論能力の解放

CodePlan: Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning ( http://arxiv.org/abs/2409.12452v1 )

ライセンス: Link先を確認

Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, Minlie Huang,

(参考訳) 従来の自然言語処理タスクにおける大規模言語モデル(LLM)の顕著な成功にもかかわらず、その計画能力は複雑な多段階推論タスクに取り組む上で重要なボトルネックとなっている。既存のアプローチは主にプロンプトやタスク固有の微調整に依存しており、しばしば弱い堅牢性とクロスタスクの一般化に悩まされている。この制限に対処するため,私たちは,高度で構造化された推論プロセスの概要を概説したコード形式計画の擬似コードの生成と追跡を可能にする,スケーラブルなパラダイムであるCODEPLANを紹介した。 CODEPLANは、構造化され汎用的なコードの性質を活用することで、洗練された推論に固有のリッチなセマンティクスと制御フローを効果的にキャプチャする。重要な点として、CODEPLANは、大規模で広範囲なテキストコーパスから、修正されたタスク固有のデータセットを必要とせずに、コード形式のプランを自動的に抽出することを可能にする。これにより、効率よくスケールアップし、さまざまなシナリオにおける推論機能を改善することができる。 CODEPLANをトレーニングするために,既存のコーパスから標準のプロンプト応答ペアとコード形式計画を統合する2Mサンプルの大規模データセットを構築した。 CODEPLANは、トレーニングと推論の間、計算オーバーヘッドが最小限に抑えられ、直接生成する応答と比較して25.1%の改善を実現し、数学的推論、記号的推論、命令追従、マルチホップQA、意思決定タスクにまたがる13の挑戦的なマルチステップ推論ベンチマークで平均化されている。さらなる分析により、CODEPLANはより複雑な推論タスクの性能向上と、その一般化能力によるデータ効率の向上を明らかにしている。

Despite the remarkable success of large language models (LLMs) on traditional natural language processing tasks, their planning ability remains a critical bottleneck in tackling complex multi-step reasoning tasks. Existing approaches mainly rely on prompting or task-specific fine-tuning, often suffering from weak robustness and cross-task generalization. To address the limitation, we introduce CODEPLAN, a scalable paradigm that empowers LLMs to generate and follow code-form plans: pseudocode that outlines high-level, structured reasoning processes. By leveraging the structured and versatile nature of code, CODEPLAN effectively captures the rich semantics and control flows inherent to sophisticated reasoning. Importantly, CODEPLAN allows the automatic extraction of code-form plans from massive, wide-ranging text corpora without the need for curated, task-specific datasets. This enables it to scale up efficiently and improve reasoning capabilities across diverse scenarios. To train CODEPLAN, we construct a large-scale dataset of 2M examples that integrate code-form plans with standard prompt-response pairs from existing corpora. With minimal computation overhead during both training and inference, CODEPLAN achieves a 25.1% relative improvement compared with directly generating responses, averaged across 13 challenging multi-step reasoning benchmarks, spanning mathematical reasoning, symbolic reasoning, instruction-following, multi-hop QA, and decision-making tasks. Further analysis reveals CODEPLAN's increasing performance gains on more complex reasoning tasks, as well as significant data efficiency thanks to its generalization ability.

翻訳日:2024-11-07 14:52:37 公開日:2024-10-04

# コード形式計画のスケーリングによる大規模言語モデルの推論能力の解放

Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning ( http://arxiv.org/abs/2409.12452v2 )

ライセンス: Link先を確認

Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, Minlie Huang,

(参考訳) 従来の自然言語処理タスクにおける大規模言語モデル(LLM)の顕著な成功にもかかわらず、その計画能力は複雑な多段階推論タスクに取り組む上で重要なボトルネックとなっている。既存のアプローチは主にプロンプトやタスク固有の微調整に依存しており、しばしば頑健性やクロスタスクの汎化の低さに悩まされている。この制限に対処するため、我々は、LLMがコード形式の計画(高レベルで構造化された推論プロセスを概説する擬似コード)を生成し、それに従うことを可能にするスケーラブルなフレームワークであるCodePlanを紹介する。CodePlanは、構造化され汎用的であるというコードの性質を活用することで、洗練された推論タスクに固有の豊かなセマンティクスと制御フローを効果的に捉える。重要な点として、CodePlanは、キュレーションされたタスク固有のデータセットを必要とせずに、大規模で広範なテキストコーパスからコード形式の計画を自動的に抽出できる。これにより、効率よくスケールアップでき、様々なシナリオでLLMの推論能力を改善できる。CodePlanをトレーニングするために、コード形式の計画と既存のコーパスの標準的なプロンプト・応答ペアを統合した200万サンプルの大規模データセットを構築した。トレーニングと推論の両方で計算オーバーヘッドを最小限に抑えつつ、CodePlanは、応答を直接生成する場合と比較して、数学的推論、記号的推論、指示追従、マルチホップQA、意思決定タスクにまたがる13の挑戦的な多段階推論ベンチマークの平均で25.1%の相対的改善を達成する。さらなる分析により、より複雑な推論タスクほどCodePlanによる性能向上が大きくなること、およびその汎化能力によりデータ効率が大幅に向上することが明らかになった。

Despite the remarkable success of large language models (LLMs) on traditional natural language processing tasks, their planning ability remains a critical bottleneck in tackling complex multi-step reasoning tasks. Existing approaches mainly rely on prompting or task-specific fine-tuning, often suffering from poor robustness and cross-task generalization. To address the limitation, we introduce CodePlan, a scalable framework that empowers LLMs to generate and follow code-form plans: pseudocode that outlines high-level, structured reasoning processes. By leveraging the structured and versatile nature of code, CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks. Importantly, CodePlan allows automatic extraction of code-form plans from massive, wide-ranging text corpora without the need for curated, task-specific datasets. This enables it to scale up efficiently and improve LLMs' reasoning capabilities across diverse scenarios. To train CodePlan, we construct a large-scale dataset of 2M examples that integrate code-form plans with standard prompt-response pairs from existing corpora. With minimal computation overhead during both training and inference, CodePlan achieves a 25.1% relative improvement compared with directly generating responses, averaged across 13 challenging multi-step reasoning benchmarks, spanning mathematical reasoning, symbolic reasoning, instruction-following, multi-hop QA, and decision-making tasks. Further analysis reveals CodePlan's increasing performance gains on more complex reasoning tasks, as well as significant data efficiency thanks to its generalization ability.

翻訳日:2024-11-07 14:52:37 公開日:2024-10-04

# 強化学習による自己修正のための言語モデルの訓練

Training Language Models to Self-Correct via Reinforcement Learning ( http://arxiv.org/abs/2409.12917v1 )

ライセンス: Link先を確認

Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, John D Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust,

(参考訳) 自己補正は大規模言語モデル(LLM)において非常に望ましい能力であるが、現代のLLMではほとんど効果がないことが一貫して確認されている。自己補正を訓練するための既存のアプローチは、複数のモデルを必要とするか、より有能なモデルや他の形式の監督に依存している。この目的のために,完全自己生成データを用いたLLMの自己補正能力を大幅に向上させるマルチターンオンライン強化学習(RL)手法であるSCoReを開発した。 SCoReを構築するために、オフラインモデル生成補正トレースにおける教師付き微調整(SFT)の変種が自己補正動作の注入に不十分であることを示す。特に、SFTによるトレーニングは、トレーニングデータとモデル自身の応答の間の分布ミスマッチに苦しむか、あるいはテスト時に有効でない特定の修正行動のみを暗黙的に好むかのどちらかである。 SCoReは、モデル独自の自己生成補正トレースの分布の下でトレーニングを行い、適切な正規化を使用して、与えられたプロンプトに単純にハイリワード応答を適合させるのではなく、テスト時に有効である自己補正戦略を学習する。この正規化は、ベースモデル上でRLの第1フェーズを実行して、崩壊しにくいポリシー初期化を生成し、トレーニング中の自己補正を増幅するために報酬ボーナスを使用する。 Gemini 1.0 Pro と 1.5 Flash モデルに適用すると、SCoRe は最先端の自己補正性能を達成し、それぞれ MATH と HumanEval ベンチマークでベースモデルの自己補正を 15.6% と 9.1% 改善している。

Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing approaches for training self-correction either require multiple models or rely on a more capable model or other forms of supervision. To this end, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data. To build SCoRe, we first show that variants of supervised fine-tuning (SFT) on offline model-generated correction traces are insufficient for instilling self-correction behavior. In particular, we observe that training via SFT either suffers from a distribution mismatch between the training data and the model's own responses or implicitly prefers only a certain mode of correction behavior that is often not effective at test time. SCoRe addresses these challenges by training under the model's own distribution of self-generated correction traces and using appropriate regularization to steer the learning process into learning a self-correction strategy that is effective at test time as opposed to simply fitting high-reward responses for a given prompt. This regularization prescribes running a first phase of RL on a base model to generate a policy initialization that is less susceptible to collapse and then using a reward bonus to amplify self-correction during training. When applied to Gemini 1.0 Pro and 1.5 Flash models, we find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on the MATH and HumanEval benchmarks.

翻訳日:2024-11-07 12:59:09 公開日:2024-10-04

# 強化学習による自己修正のための言語モデルの訓練

Training Language Models to Self-Correct via Reinforcement Learning ( http://arxiv.org/abs/2409.12917v2 )

ライセンス: Link先を確認

(参考訳) 自己補正は大規模言語モデル(LLM)において非常に望ましい能力であるが、現代のLLMではほとんど効果がないことが一貫して確認されている。現在の自己補正の訓練方法は、通常、複数のモデル、より高度なモデル、または追加の監督形式に依存する。これらの欠点に対処するため、完全自己生成データを用いたLLMの自己補正能力を大幅に向上させるマルチターンオンライン強化学習(RL)アプローチであるSCoReを開発した。 SCoReを構築するために、オフラインモデル生成補正トレースにおける教師付き微調整(SFT)の変種は、しばしば自己補正動作の注入に不十分であることを示す。特に、SFTによるトレーニングは、データ収集ポリシーとモデル自身の応答のミスによる分布ミスマッチや、学習が暗黙的に修正行動の特定のモードのみを優先する行動崩壊に陥ることが観察された。 SCoReは、モデル独自の自己生成補正トレースの分布の下でトレーニングを行い、適切な正規化を使用して、与えられたプロンプトに高次応答を適用するのではなく、テスト時に有効である自己補正動作を学ぶ。この正規化プロセスは、基本モデル上のマルチターンRLの初期フェーズを含み、崩壊しにくいポリシー初期化を生成し、その後、報酬ボーナスを使用して自己補正を増幅する。 Gemini 1.0 Pro と 1.5 Flash モデルでは、SCoRe は最先端の自己補正性能を達成し、ベースモデルの自己補正を MATH と HumanEval でそれぞれ 15.6% と 9.1% 改善している。

Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Current methods for training self-correction typically depend on either multiple models, a more advanced model, or additional forms of supervision. To address these shortcomings, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data. To build SCoRe, we first show that variants of supervised fine-tuning (SFT) on offline model-generated correction traces are often insufficient for instilling self-correction behavior. In particular, we observe that training via SFT falls prey to either a distribution mismatch between mistakes made by the data-collection policy and the model's own responses, or to behavior collapse, where learning implicitly prefers only a certain mode of correction behavior that is often not effective at self-correction on test problems. SCoRe addresses these challenges by training under the model's own distribution of self-generated correction traces and using appropriate regularization to steer the learning process into learning a self-correction behavior that is effective at test time as opposed to fitting high-reward responses for a given prompt. This regularization process includes an initial phase of multi-turn RL on a base model to generate a policy initialization that is less susceptible to collapse, followed by using a reward bonus to amplify self-correction. With Gemini 1.0 Pro and 1.5 Flash models, we find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on MATH and HumanEval.

翻訳日:2024-11-07 12:59:09 公開日:2024-10-04

# 最適大域制御によるエンタングルメント強化量子センシング

Entanglement-enhanced quantum sensing via optimal global control ( http://arxiv.org/abs/2409.12932v1 )

ライセンス: Link先を確認

Vineesha Srivastava, Sven Jandura, Gavin K Brennen, Guido Pupillo,

(参考訳) 共通のキャビティモードに結合した$N$個のスピンの対称ディッケ部分空間において、任意のエンタングル状態を準備するための決定論的プロトコルを提案する。このプロトコルは、新しい幾何学的位相ゲート、ノイズのある量子チャネルダイナミクスの解析解、最適制御法を組み合わせることで、量子センシングに有用なエンタングル状態を準備し、キャビティの光子損失、自然放出、位相緩和が存在する場合でも、標準量子限界を大幅に上回る精度を達成する。この研究は、キャビティ内の冷却トラップ原子を用いたエンタングルメント強化センシングへの道を開くものであり、トラップイオンを用いた実験にも直接関係する。

We present a deterministic protocol for the preparation of arbitrary entangled states in the symmetric Dicke subspace of $N$ spins coupled to a common cavity mode. By combining a new geometric phase gate, an analytic solution of the noisy quantum channel dynamics and optimal control methods, the protocol prepares entangled states that are useful for quantum sensing, achieving a precision significantly better than the standard quantum limit in the presence of photon cavity loss, spontaneous emission and dephasing. This work opens the way to entanglement-enhanced sensing with cold trapped atoms in cavities and is also directly relevant for experiments with trapped ions.

翻訳日:2024-11-07 12:48:01 公開日:2024-10-04

# 最適大域制御によるエンタングルメント強化量子センシング

Entanglement-enhanced quantum sensing via optimal global control ( http://arxiv.org/abs/2409.12932v2 )

ライセンス: Link先を確認

Vineesha Srivastava, Sven Jandura, Gavin K Brennen, Guido Pupillo,

翻訳日:2024-11-07 12:48:01 公開日:2024-10-04

# データダイエット: PET/CTデータセットのトリミングは病変セグメンテーションを向上させるか?

Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation? ( http://arxiv.org/abs/2409.13548v1 )

ライセンス: Link先を確認

Alexander Jaus, Simon Reiß, Jens Klesiek, Rainer Stiefelhagen,

(参考訳) 本稿では,AutoPET3データ中心のトラックで競合するアプローチについて述べる。従来の知恵は、より大きなデータセットがより良いモデル性能をもたらすことを示しているが、最近の研究では、特定のトレーニングサンプルを除くと、モデルの精度が向上することを示している。 AutoPETIIIデータセットでは,特にPSMA-PETに対して多数の偽陽性を発生させることにより,データセット全体をトレーニングしたモデルが望ましくない特性を示すことがわかった。我々は、スクラッチから再トレーニングする前に、モデル損失によって測定されたトレーニングデータセットから最も簡単なサンプルを取り除き、これを対処する。提案手法を用いることで, 予備試験セットにおける偽陰体積とダイススコアの両方において, 偽負体積を下げ, ベースラインモデルを改善することができる。コードと事前訓練されたモデルはgithub.com/alexanderjaus/autopet3_datadietで入手できる。

In this work, we describe our approach to compete in the autoPET3 datacentric track. While conventional wisdom suggests that larger datasets lead to better model performance, recent studies indicate that excluding certain training samples can enhance model accuracy. We find that in the autoPETIII dataset, a model that is trained on the entire dataset exhibits undesirable characteristics by producing a large number of false positives particularly for PSMA-PETs. We counteract this by removing the easiest samples from the training dataset as measured by the model loss before retraining from scratch. Using the proposed approach we manage to drive down the false negative volume and improve upon the baseline model in both false negative volume and dice score on the preliminary test set. Code and pre-trained models are available at github.com/alexanderjaus/autopet3_datadiet.
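
The trimming step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: per-sample losses are assumed to be already available from a trained model, and the function name `trim_easiest`, the sample ids, and the `drop_fraction` value are hypothetical (the abstract does not say how many samples are removed).

```python
def trim_easiest(sample_ids, losses, drop_fraction):
    """Return the ids kept after dropping the lowest-loss fraction of samples.

    Low loss is used as a proxy for "easy"; the kept ids preserve input order.
    """
    if len(sample_ids) != len(losses):
        raise ValueError("sample_ids and losses must have the same length")
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    n_drop = int(len(sample_ids) * drop_fraction)
    # Indices sorted from easiest (lowest loss) to hardest (highest loss).
    order = sorted(range(len(sample_ids)), key=lambda i: losses[i])
    dropped = set(order[:n_drop])
    return [sid for i, sid in enumerate(sample_ids) if i not in dropped]


# Hypothetical per-sample losses from a model trained on the full dataset.
ids = ["case_a", "case_b", "case_c", "case_d"]
losses = [0.9, 0.1, 0.5, 0.2]
kept = trim_easiest(ids, losses, drop_fraction=0.5)
```

The reduced list would then be used to retrain a fresh model from scratch, as the abstract describes; the retraining itself is outside this sketch.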

翻訳日:2024-11-07 06:41:58 公開日:2024-10-04

# データダイエット:PET/CTデータセットをトリミングできるか?

Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation? ( http://arxiv.org/abs/2409.13548v2 )

ライセンス: Link先を確認

Alexander Jaus, Simon Reiß, Jens Klesiek, Rainer Stiefelhagen,

翻訳日:2024-11-07 06:41:58 公開日:2024-10-04

# データダイエット:PET/CTデータセットをトリミングできるか?

Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation? ( http://arxiv.org/abs/2409.13548v3 )

ライセンス: Link先を確認

Alexander Jaus, Simon Reiß, Jens Klesiek, Rainer Stiefelhagen,

翻訳日:2024-11-07 06:41:58 公開日:2024-10-04

# ブロックワールドにおける修復: マルチモーダル言語モデルによるユーザ訂正処理のための新しいベンチマーク

Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models ( http://arxiv.org/abs/2409.14247v1 )

ライセンス: Link先を確認

Javier Chiyah-Garcia, Alessandro Suglia, Arash Eshghi,

(参考訳) 対話では、ディレクタはまず話者を誤解し、誤って応答し、しばしば第3の位置修正(TPR)で次のターンで誤解を修正するように促す。このような修復シーケンスを適切に処理し、応答する能力は、会話型AIシステムにおいて重要である。本稿では,まずBlockWorld-Repairsを設計・分析・公開し,指示追従操作タスクにおけるマルチモーダルなTPRシーケンスのデータセットについて述べる。このデータセットを用いて、複数の設定にまたがって複数の最先端のビジョン・アンド・言語モデル(VLM)を評価し、TPRを処理し、正確に応答し、それによって誤通信から回復する能力に焦点を当てる。このタスクでは、人間に比べて、すべてのモデルの性能が著しく劣っていることが分かりました。次に、VLMは、微調整中に関連するトークンをターゲットとした特別な損失の恩恵を受けることができ、より良い性能と生成性を実現することができることを示す。これらのモデルは、修復が一般的であるマルチモーダルな協調環境において、まだ展開する準備が整っていないことを示唆し、インタラクションからの学習を容易にするトレーニング体制や目的を設計する必要性を強調した。

In dialogue, the addressee may initially misunderstand the speaker and respond erroneously, often prompting the speaker to correct the misunderstanding in the next turn with a Third Position Repair (TPR). The ability to process and respond appropriately to such repair sequences is thus crucial in conversational AI systems. In this paper, we first collect, analyse, and publicly release BlockWorld-Repairs: a dataset of multi-modal TPR sequences in an instruction-following manipulation task that is, by design, rife with referential ambiguity. We employ this dataset to evaluate several state-of-the-art Vision and Language Models (VLM) across multiple settings, focusing on their capability to process and accurately respond to TPRs and thus recover from miscommunication. We find that, compared to humans, all models significantly underperform in this task. We then show that VLMs can benefit from specialised losses targeting relevant tokens during fine-tuning, achieving better performance and generalisability. Our results suggest that these models are not yet ready to be deployed in multi-modal collaborative settings where repairs are common, and highlight the need to design training regimes and objectives that facilitate learning from interaction.

翻訳日:2024-11-06 23:37:15 公開日:2024-10-04

Javier Chiyah-Garcia, Alessandro Suglia, Arash Eshghi,

(参考訳) 対話では、ディレクタはまず話者を誤解し、誤って応答し、しばしば第3の位置修正(TPR)で次のターンで誤解を修正するように促す。このような修復シーケンスを適切に処理し、応答する能力は、会話型AIシステムにおいて重要である。本稿では,まずBlockWorld-Repairsを設計・分析・公開し,指示追従操作タスクにおけるマルチモーダルなTPRシーケンスのデータセットについて述べる。このデータセットを用いて、複数の設定にまたがって複数の最先端のビジョン・アンド・言語モデル(VLM)を評価し、TPRを処理し、正確に応答し、それによって誤通信から回復する能力に焦点を当てる。このタスクでは、人間に比べて、すべてのモデルの性能が著しく劣っていることが分かりました。次に、VLMは、微調整中に関連するトークンをターゲットとした特別な損失の恩恵を受けることができ、パフォーマンスが向上し、新しいシナリオに最適化できることを示す。これらのモデルは、修復が一般的であるマルチモーダルな協調環境において、まだ展開する準備が整っていないことを示唆し、インタラクションからの学習を容易にするトレーニング体制や目的を設計する必要性を強調した。私たちのコードとデータはwww.github.com/JChiyah/blockworld-repairsで利用可能です。

In dialogue, the addressee may initially misunderstand the speaker and respond erroneously, often prompting the speaker to correct the misunderstanding in the next turn with a Third Position Repair (TPR). The ability to process and respond appropriately to such repair sequences is thus crucial in conversational AI systems. In this paper, we first collect, analyse, and publicly release BlockWorld-Repairs: a dataset of multi-modal TPR sequences in an instruction-following manipulation task that is, by design, rife with referential ambiguity. We employ this dataset to evaluate several state-of-the-art Vision and Language Models (VLM) across multiple settings, focusing on their capability to process and accurately respond to TPRs and thus recover from miscommunication. We find that, compared to humans, all models significantly underperform in this task. We then show that VLMs can benefit from specialised losses targeting relevant tokens during fine-tuning, achieving better performance and generalising better to new scenarios. Our results suggest that these models are not yet ready to be deployed in multi-modal collaborative settings where repairs are common, and highlight the need to design training regimes and objectives that facilitate learning from interaction. Our code and data are available at www.github.com/JChiyah/blockworld-repairs

翻訳日:2024-11-06 23:37:15 公開日:2024-10-04

# アスペクト感度トリプレット抽出におけるASTE変換器の依存性のモデル化

ASTE Transformer Modelling Dependencies in Aspect-Sentiment Triplet Extraction ( http://arxiv.org/abs/2409.15202v2 )

ライセンス: Link先を確認

Iwo Naglik, Mateusz Lango,

(参考訳) Aspect-Sentiment Triplet extract (ASTE)は、最近提案されたアスペクトベースの感情分析のタスクであり、ある文から三重項(アスペクトフレーズ、意見フレーズ、感情極性)を抽出する。最近の最先端の手法では、まず与えられたテキストから可能なすべてのテキストを抽出し、次に潜在的なアスペクトと意見句を分類器でフィルタリングし、最後にすべてのペアを別の分類器で考慮し、さらに感情の極性を割り当てることによって、このタスクにアプローチしている。上記のスキームのいくつかのバリエーションが提案されているが、一般的な特徴は、最終的な結果が独立した分類器の連続によって構成されることである。これにより、抽出されたフレーズ間の依存関係の活用が妨げられ、分類器間の相互関係に関する知識の使用が防止され、性能が向上する。本稿では,3つのトランスフォーマーにインスパイアされたレイヤからなる新しいASTE手法を提案する。実験結果から,この手法はF1測度において,他のベンチマーク手法よりも高い性能を示すことが示された。さらに,簡単な事前学習手法により,モデルの性能が向上することを示す。

Aspect-Sentiment Triplet Extraction (ASTE) is a recently proposed task of aspect-based sentiment analysis that consists in extracting (aspect phrase, opinion phrase, sentiment polarity) triples from a given sentence. Recent state-of-the-art methods approach this task by first extracting all possible text spans from a given text, then filtering the potential aspect and opinion phrases with a classifier, and finally considering all their pairs with another classifier that additionally assigns sentiment polarity to them. Although several variations of the above scheme have been proposed, the common feature is that the final result is constructed by a sequence of independent classifier decisions. This hinders the exploitation of dependencies between extracted phrases and prevents the use of knowledge about the interrelationships between classifier predictions to improve performance. In this paper, we propose a new ASTE approach consisting of three transformer-inspired layers, which enables the modelling of dependencies both between phrases and between the final classifier decisions. Experimental results show that the method achieves higher performance in terms of F1 measure than other methods studied on popular benchmarks. In addition, we show that a simple pre-training technique further improves the performance of the model.

翻訳日:2024-11-06 20:27:58 公開日:2024-10-04

# 視覚認識におけるパラメータ効率変換学習(PETL)の統一的研究から学んだ教訓

Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition ( http://arxiv.org/abs/2409.16434v2 )

ライセンス: Link先を確認

Zheda Mai, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Li Zhang, Wei-Lun Chao,

(参考訳) 近年, パラメータ効率変換学習 (PETL) が注目されている。これは, 事前学習モデルのサイズが増大し, より優れたダウンストリーム性能を実現するために, それらを微調整 (FT) する必要があるためである。このコミュニティ全体の熱意は、多くのアプローチを引き起こしました。それにもかかわらず、パフォーマンスと適切なアプリケーションシナリオを理解するための体系的な研究には不足があり、PETLをいつ適用するか、どのアプローチを使うかといった疑問がほとんど答えられていない。本稿では,視覚変換器の文脈における代表的PETL手法の統一的な実証的研究を行う。我々は、下流タスクの精度を正確に比較するために、これらのハイパーパラメータを体系的に調整する。私たちの研究は価値あるユーザーガイドを提供するだけでなく、いくつかの新しい洞察も発表しています。まず、慎重に調整すると、異なるPETL法がローショットベンチマークVTAB-1Kで同様の精度が得られる。これにはFTのような単純な方法が含まれており、バイアス項は劣っていると報告されている。第二に、PETL法は類似した精度で異なる誤りと高い信頼率の予測を行う。このような矛盾(あるいは相補性)はアンサンブル手法の機会を開き、予備的な試みを行う。第3に、一般的に使用されるローショットタスクを超えて、PETLは、多くのショットレシエーションでも有用であることが分かりました。最後に,PETLの分散シフトに対する頑健性(例えば,CLIPバックボーン)を維持する能力について検討する。おそらく驚くことではないが、PETL法は完全なFT法よりも優れている。しかし、重量空間のアンサンブルでは、完全な微調整モデルにより、分布と分布シフト性能のバランスが良くなり、PETLの今後の研究方向性が示唆される。

Parameter-efficient transfer learning (PETL) has attracted significant attention lately, due to the increasing size of pre-trained models and the need to fine-tune (FT) them for superior downstream performance. This community-wide enthusiasm has sparked a plethora of approaches. Nevertheless, a systematic study to understand their performance and suitable application scenarios is lacking, leaving questions like when to apply PETL and which approach to use largely unanswered. In this paper, we conduct a unifying empirical study of representative PETL methods in the context of Vision Transformers. We systematically tune their hyper-parameters to fairly compare their accuracy on downstream tasks. Our study not only offers a valuable user guide but also unveils several new insights. First, if tuned carefully, different PETL methods can obtain similar accuracy in the low-shot benchmark VTAB-1K. This includes simple methods, such as fine-tuning only the bias terms, that were previously reported to be inferior. Second, though with similar accuracy, we find that PETL methods make different mistakes and high-confidence predictions, likely due to their different inductive biases. Such an inconsistency (or complementariness) opens up the opportunity for ensemble methods, and we make preliminary attempts at this. Third, going beyond the commonly used low-shot tasks, we find that PETL is also useful in many-shot regimes -- it achieves comparable and sometimes better accuracy than full FT, using far fewer learnable parameters. Last but not least, we investigate PETL's ability to preserve a pre-trained model's robustness to distribution shifts (e.g., a CLIP backbone). Perhaps not surprisingly, PETL methods outperform full FT alone. However, with weight-space ensembles, the fully fine-tuned model can better balance target (i.e., downstream) distribution and distribution shift performance, suggesting a future research direction for PETL.

翻訳日:2024-11-06 17:30:16 公開日:2024-10-04

# オンラインからオフラインへのフードデリバリープラットフォームが健康食品選択に与える影響を調査するサイバーフードスワップ

Cyber Food Swamps: Investigating the Impacts of Online-to-Offline Food Delivery Platforms on Healthy Food Choices ( http://arxiv.org/abs/2409.16601v2 )

ライセンス: Link先を確認

Yunke Zhang, Yiran Fan, Peijie Liu, Fengli Xu, Yong Li,

(参考訳) オンライン・トゥ・オフライン(O2O)フードデリバリープラットフォームは、都市住民の食品選択を大幅に強化し、より便利な食品アウトレットへのアクセスを可能にしている。しかし,O2Oフードデリバリープラットフォームがユーザの健康的な食品選択に与える影響については,特に懸念が残る。本研究は、大手O2Oデリバリープラットフォームからの大規模実証データを利用して、オンライン食品選択行動の包括的分析と、ファーストフードレストランへのオンライン露出、すなわちオンライン食品環境の影響について述べる。分析の結果,人口集団や都市規模において,男性,低所得者,若年者,大都市におけるファストフードの注文は,O2Oプラットフォームを経由する傾向がみられた。さらに、オンラインおよびオフライン環境における食品暴露の違いについて比較分析を行い、O2Oプラットフォームの拡張サービス範囲がより大きな「サイバフード湿地」を創出できることを確認した。さらに、レグレッション分析では、ファーストフードの注文の比率が高いのは、アクセス可能なファーストフードレストランの比率が高いのが特徴の「サイバーフード湿地」と関連していることを示している。このシェアが10%上昇すると、ファーストフードの注文率が22.0%上昇する。さらに、準自然実験は、オンライン食品環境の変化が健康食品選択に長期的な因果効果を裏付けるものである。以上の結果から,O2Oフードデリバリープラットフォームは,オンライン食品選択曝露の健康への影響に対処し,住民の食生活改善に様々な利害関係者の努力を喚起する必要性が示唆された。

Online-to-offline (O2O) food delivery platforms have substantially enriched the food choices of urban residents by allowing them to conveniently access farther food outlets. However, concerns about the healthiness of delivered food persist, especially because the impact of O2O food delivery platforms on users' healthy food choices remains unclear. This study leverages large-scale empirical data from a leading O2O delivery platform to comprehensively analyze online food choice behaviors and how they are influenced by the online exposure to fast food restaurants, i.e., the online food environment. Our analyses reveal significant discrepancies in food preferences across demographic groups and city sizes: male, low-income, and younger users and those located in larger cities are more likely to order fast food via O2O platforms. We also perform a comparative analysis of the food exposure differences between online and offline environments, confirming that the extended service ranges of O2O platforms can create larger "cyber food swamps". Furthermore, regression analysis highlights that a higher ratio of fast food orders is associated with "cyber food swamps", areas characterized by a higher share of accessible fast food restaurants. A 10% increase in this share raises the probability of ordering fast food by 22.0%. Moreover, a quasi-natural experiment substantiates the long-term causal effect of online food environment changes on healthy food choices. Our findings underscore the need for O2O food delivery platforms to address the health implications of online food choice exposure, thereby informing efforts by various stakeholders to improve residents' dietary health.

翻訳日:2024-11-06 17:30:16 公開日:2024-10-04

# SDCL:半教師型医用画像分割のための学生の不一致情報修正学習

SDCL: Students Discrepancy-Informed Correction Learning for Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2409.16728v2 )

ライセンス: Link先を確認

Bentao Song, Qingfeng Wang,

(参考訳) 半教師付き医用画像セグメンテーション(SSMIS)は、限られた医療ラベル付きデータの問題を緩和する可能性を実証している。しかし, 教師と学生に基づく一般的なSSMIS法は, 誤った擬似ラベルにより, 確証バイアスと認知バイアスの影響を受ける可能性がある。この課題に対処するために,我々は平均的教師のアプローチを改善し,2人の学生と1人の非訓練教師を含み,2人の学生の分節差を自己修正学習の指導に利用するSDCL(Students Discrepancy-Informed Correction Learning)フレームワークを提案する。 SDCLの本質は、セグメンテーションの差異の領域を潜在的なバイアス領域として識別し、モデルが正しい認知をレビューし、これらの領域で自身のバイアスを補正することを奨励することである。連続的なレビューと修正によるバイアス補正学習を容易にするために、正しいセグメンテーションボクセル距離を最小化し、誤セグメンテーションボクセルエントロピーを最大化する2つの補正損失関数を用いる。 2つの3次元データセット(CTとMRI)と1つの2次元データセット(MRI)の3つの公開医用画像データセットについて実験を行った。その結果, SDCL は膵臓, LA, ACDCデータセットのDiceスコアにおいて, 現在の State-of-the-Art (SOTA) 法をそれぞれ2.57%, 3.04%, 2.34% 上回っていることがわかった。さらに,本手法の精度は,ACDCデータセットの完全教師付き手法に非常に近く,膵臓およびLAデータセットの完全教師付き手法を超えている。 (コードは https://github.com/pascalcpp/SDCL で入手できる)。

Semi-supervised medical image segmentation (SSMIS) has demonstrated the potential to mitigate the issue of limited labeled medical data. However, confirmation and cognitive biases may affect the prevalent teacher-student based SSMIS methods due to erroneous pseudo-labels. To tackle this challenge, we improve the mean teacher approach and propose the Students Discrepancy-Informed Correction Learning (SDCL) framework, which includes two students and one non-trainable teacher and utilizes the segmentation difference between the two students to guide self-correcting learning. The essence of SDCL is to identify the areas of segmentation discrepancy as the potential bias areas, and then encourage the model to review the correct cognition and rectify its own biases in these areas. To facilitate the bias correction learning with continuous review and rectification, two correction loss functions are employed to minimize the correct segmentation voxel distance and maximize the erroneous segmentation voxel entropy. We conducted experiments on three public medical image datasets: two 3D datasets (CT and MRI) and one 2D dataset (MRI). The results show that our SDCL surpasses the current State-of-the-Art (SOTA) methods by 2.57%, 3.04%, and 2.34% in the Dice score on the Pancreas, LA, and ACDC datasets, respectively. In addition, the accuracy of our method is very close to the fully supervised method on the ACDC dataset, and even exceeds the fully supervised method on the Pancreas and LA datasets. (Code available at https://github.com/pascalcpp/SDCL).
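
The idea of treating voxels where the two students disagree as potential bias areas, and of raising entropy on erroneous voxels inside those areas, can be illustrated with a small sketch. Everything below is an assumption made for illustration: binary foreground/background labels, flattened voxel lists, the function names, and the `reference` labels used to decide which voxels are erroneous (the abstract does not say where the correct labels come from). It is not the SDCL implementation.

```python
import math


def discrepancy_mask(pred_a, pred_b):
    """True at every voxel where the two students predict different labels."""
    return [a != b for a, b in zip(pred_a, pred_b)]


def binary_entropy(p):
    """Entropy (in nats) of a foreground probability p, clamped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def mean_entropy_on_errors(probs, preds, reference, mask):
    """Mean entropy over voxels inside the mask where pred differs from reference.

    A training loop would try to increase this quantity for erroneous voxels;
    `reference` is a hypothetical stand-in for whatever supplies correct labels.
    """
    values = [
        binary_entropy(p)
        for p, y, r, m in zip(probs, preds, reference, mask)
        if m and y != r
    ]
    return sum(values) / len(values) if values else 0.0


mask = discrepancy_mask([0, 1, 1, 0], [0, 1, 0, 1])
print(mask)
```

The companion term that pulls correctly segmented voxels closer together is omitted here, because the abstract does not specify the distance that is used.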

翻訳日:2024-11-06 17:20:02 公開日:2024-10-04

# インフォームド深層階層分類--非標準解析によるアプローチ

Informed deep hierarchical classification: a non-standard analysis inspired approach ( http://arxiv.org/abs/2409.16956v2 )

ライセンス: Link先を確認

Lorenzo Fiaschi, Marco Cococcioni,

(参考訳) 本研究は, 厳密な親子構造に組織された複数のラベルによるデータ分類の問題という, 階層的分類課題に対する新しいアプローチを提案する。出力層の前に配置された特定のプロジェクション演算子を備えた多出力ディープニューラルネットワークで構成されている。辞書型ハイブリッドディープニューラルネットワーク(LH-DNN)と呼ばれるアーキテクチャの設計は、辞書型多目的最適化、非標準分析、ディープラーニングといった、異なる研究分野のツールを組み合わせることで実現されている。このアプローチの有効性を評価するために、結果として得られるネットワークは、階層的な分類タスクに適した畳み込みニューラルネットワークであるB-CNN、CIFAR10、CIFAR100(複数の現実世界のアプリケーションに採用され、調整される前に提案された)、Fashion-MNISTベンチマークと比較される。エビデンスによれば、LH-DNNは、特に階層関係の学習において、アドホック損失関数を重み付けすることなく、学習パラメータの劇的な減少、エポックの訓練、計算時間に直面して、優れた性能を達成できる。

This work proposes a novel approach to the deep hierarchical classification task, i.e., the problem of classifying data according to multiple labels organized in a rigid parent-child structure. It consists in a multi-output deep neural network equipped with specific projection operators placed before each output layer. The design of such an architecture, called lexicographic hybrid deep neural network (LH-DNN), has been possible by combining tools from different and quite distant research fields: lexicographic multi-objective optimization, non-standard analysis, and deep learning. To assess the efficacy of the approach, the resulting network is compared against the B-CNN, a convolutional neural network tailored for hierarchical classification tasks, on the CIFAR10, CIFAR100 (where it has been originally and recently proposed before being adopted and tuned for multiple real-world applications) and Fashion-MNIST benchmarks. Evidence states that an LH-DNN can achieve comparable if not superior performance, especially in the learning of the hierarchical relations, in the face of a drastic reduction of the learning parameters, training epochs, and computational time, without the need for ad-hoc loss functions weighting values.

翻訳日:2024-11-06 17:10:14 公開日:2024-10-04

# シリコン窒化物外部キャビティレーザーの100Hz以下固有線幅852nm

Sub-100 Hz Intrinsic Linewidth 852 nm Silicon Nitride External Cavity Laser ( http://arxiv.org/abs/2409.17382v2 )

ライセンス: Link先を確認

Hani Nejadriahi, Eric Kittlaus, Debapam Bose, Nitesh Chauhan, Jiawei Wang, Mathieu Fradet, Mahmood Bagheri, Andrei Isichenko, David Heim, Siamak Forouhar, Daniel Blumenthal,

(参考訳) レーザー冷却とセシウム原子の操作に関係し, 動作波長852nm付近に100Hz以下の固有線幅を有する外部共振器レーザを試作した。最大CW出力は24mW、波長可変は15nm、サイドモード抑制比は50dBを超える。この性能レベルは、市販の半導体ゲインチップと組み合わせて外部キャビティとして機能する低損失集積窒化ケイ素フォトニック回路を慎重に設計することによる。提案手法は, 半導体ゲイン媒質の選択により, より短い波長に拡張可能な, 超低温原子をベースとした新しいセンサ概念の必要性に着目した, サブkHzライン幅の小型集積レーザの実現可能性を示すものである。

We demonstrate an external cavity laser with intrinsic linewidth below 100 Hz around an operating wavelength of 852 nm, selected for its relevance to laser cooling and manipulation of cesium atoms. This system achieves a maximum CW output power of 24 mW, wavelength tunability over 15 nm, and a side-mode suppression ratio exceeding 50 dB. This performance level is facilitated by careful design of a low-loss integrated silicon nitride photonic circuit serving as the external cavity combined with commercially available semiconductor gain chips. This approach demonstrates the feasibility of compact integrated lasers with sub-kHz linewidth centering on the needs of emerging sensor concepts based on ultracold atoms and can be further extended to shorter wavelengths via selection of suitable semiconductor gain media.

翻訳日:2024-11-06 16:30:51 公開日:2024-10-04

# MoJE: Mixture of Jailbreak Experts - プロンプト攻撃に対するガードとしての単純な表形式分類器

MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks ( http://arxiv.org/abs/2409.17699v3 )

ライセンス: Link先を確認

Giandomenico Cornacchia, Giulio Zizzo, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Mark Purcell,

(参考訳) 多様なアプリケーションにおけるLarge Language Models(LLMs)の普及は、潜在的ジェイルブレイク攻撃を防ぐための堅牢なセキュリティ対策の必要性を浮き彫りにしている。これらの攻撃は、LSM内の脆弱性、データ完全性やユーザのプライバシを危険にさらす。ガードレールはこのような脅威に対して重要な防御機構として機能するが、既存のモデルは検出精度と計算効率の両方の観点から、しばしば不足する。本稿では,LLMに対するジェイルブレイク攻撃防止の重要性を論じ,これらのモデルを保護する上での入力ガードレールの役割を強調した。現状のガードレールの限界を超えるよう設計された新しいガードレールアーキテクチャであるMoJE(Mixture of Jailbreak Expert)を紹介する。単純な言語統計手法を用いることで、MoJEはモデル推論中に最小限の計算オーバーヘッドを維持しながら、ジェイルブレイク攻撃の検出に優れる。厳格な実験を通じて、MoJEは良心的なプロンプトを損なうことなく90%の攻撃を検知できる優れた性能を示し、脱獄攻撃に対するLLMの安全性を高めた。

The proliferation of Large Language Models (LLMs) in diverse applications underscores the pressing need for robust security measures to thwart potential jailbreak attacks. These attacks exploit vulnerabilities within LLMs, endangering data integrity and user privacy. Guardrails serve as crucial protective mechanisms against such threats, but existing models often fall short in terms of both detection accuracy and computational efficiency. This paper argues for the importance of jailbreak attack prevention on LLMs, and emphasises the role of input guardrails in safeguarding these models. We introduce MoJE (Mixture of Jailbreak Experts), a novel guardrail architecture designed to surpass the limitations of existing state-of-the-art guardrails. By employing simple linguistic statistical techniques, MoJE excels in detecting jailbreak attacks while maintaining minimal computational overhead during model inference. Through rigorous experimentation, MoJE demonstrates superior performance, detecting 90% of the attacks without compromising benign prompts and thereby enhancing LLM security against jailbreak attacks.

翻訳日:2024-11-06 16:10:55 公開日:2024-10-04

# Pairwise RankingのためのFew-shot Prompting: 効果的な非パラメトリック検索モデル

Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model ( http://arxiv.org/abs/2409.17745v3 )

ライセンス: Link先を確認

Nilanjan Sinhababu, Andrew Parry, Debasis Ganguly, Debasis Samanta, Pabitra Mitra,

(参考訳) 教師付きランキングモデルは、効果的であることの利点にもかかわらず、通常複雑な処理(通常、タスク固有の事前トレーニングと微調整の複数の段階)を伴います。これによって研究者たちは,ゼロショットで動作可能な大規模言語モデル(LLM)を活用した,シンプルなパイプラインの探索を動機付けている。しかし、ゼロショット推論では、クエリのペアとその関連ドキュメントのトレーニングセットは使用しないため、そのパフォーマンスは、そのようなペアでトレーニングされる教師付きモデルよりも大幅に低下する。トレーニングサンプルが一般的にゼロショットのパフォーマンスを改善するという既存の知見に触発されて、私たちの研究では、これがランキングモデルにも当てはまるかどうか調査している。より具体的には、クエリとドキュメントのペアが与えられた場合、トレーニングセットから類似したクエリの好みの例を増やすことで、好み予測タスクが改善される。提案手法は,インドメイン (TREC DL) とアウトドメイン (BEIR サブセット) の検索ベンチマークにおいて,ゼロショットベースラインに対する一貫した改善を示す。また,複雑なトレーニングパイプラインを必要とせず,教師付きモデルに近い性能を実現する。

A supervised ranking model, despite its advantage of being effective, usually involves complex processing - typically multiple stages of task-specific pre-training and fine-tuning. This has motivated researchers to explore simpler pipelines leveraging large language models (LLMs) that are capable of working in a zero-shot manner. However, since zero-shot inference does not make use of a training set of pairs of queries and their relevant documents, its performance is mostly worse than that of supervised models, which are trained on such example pairs. Motivated by the existing findings that training examples generally improve zero-shot performance, in our work, we explore if this also applies to ranking models. More specifically, given a query and a pair of documents, the preference prediction task is improved by augmenting examples of preferences for similar queries from a training set. Our proposed pairwise few-shot ranker demonstrates consistent improvements over the zero-shot baseline on both in-domain (TREC DL) and out-domain (BEIR subset) retrieval benchmarks. Our method also achieves a close performance to that of a supervised model without requiring any complex training pipeline.
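
The augmentation step in this abstract (add preference examples for similar training queries to the prompt for a new query and document pair) might look roughly like the sketch below. The Jaccard token overlap used for query similarity, the dictionary keys, and the prompt wording are all hypothetical choices made for illustration. The abstract does not specify how similar queries are found or how the prompt is phrased.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two strings (an assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_examples(query, training_set, k):
    """Pick the k training examples whose queries are most similar to `query`.

    Each example is a dict with the hypothetical keys
    'query', 'doc_a', 'doc_b', and 'preferred' ("A" or "B").
    """
    ranked = sorted(
        training_set, key=lambda ex: jaccard(query, ex["query"]), reverse=True
    )
    return ranked[:k]


def build_prompt(query, doc_a, doc_b, examples):
    """Assemble a few-shot pairwise preference prompt as plain text."""
    parts = []
    for ex in examples:
        parts.append(
            f"Query: {ex['query']}\nDocument A: {ex['doc_a']}\n"
            f"Document B: {ex['doc_b']}\nMore relevant: {ex['preferred']}\n"
        )
    # The final block is left open for the language model to complete.
    parts.append(
        f"Query: {query}\nDocument A: {doc_a}\nDocument B: {doc_b}\nMore relevant:"
    )
    return "\n".join(parts)
```

The resulting string would be sent to an LLM, and its answer ("A" or "B") read as the pairwise preference. Turning those preferences into a full ranking is a separate step that this sketch does not cover.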

翻訳日:2024-11-06 16:00:56 公開日:2024-10-04

# MLによる透かしの安全性評価:コピーと除去攻撃

Evaluation of Security of ML-based Watermarking: Copy and Removal Attacks ( http://arxiv.org/abs/2409.18211v1 )

ライセンス: Link先を確認

Vitaliy Kinakh, Brian Pulfer, Yury Belousov, Pierre Fernandez, Teddy Furon, Slava Voloshynovskiy,

(参考訳) 現実世界やAIが生成したメディアから取得した膨大な量のデジタルコンテンツは、著作権保護、トレーサビリティ、データ証明の方法を必要とする。デジタル透かしはこれらの課題に対処するための重要なアプローチである。その進化は、手作り、オートエンコーダベース、基礎モデルベースメソッドの3世代に及ぶ。これらのシステムの堅牢性は十分に文書化されているが、敵の攻撃に対するセキュリティは未解明のままである。本稿では,逆埋め込み技術を用いた基礎モデルの潜時空間デジタル透かしシステムのセキュリティ評価を行う。一連の実験は、コピーと削除攻撃の下でのセキュリティの次元を調査し、これらのシステムの脆弱性に関する実証的な洞察を提供する。すべての実験コードと結果は https://github.com/vkinakh/ssl-watermarking-attacks で公開されている。

The vast amounts of digital content captured from the real world or AI-generated media necessitate methods for copyright protection, traceability, or data provenance verification. Digital watermarking serves as a crucial approach to address these challenges. Its evolution spans three generations: handcrafted, autoencoder-based, and foundation model based methods. While the robustness of these systems is well-documented, the security against adversarial attacks remains underexplored. This paper evaluates the security of foundation models' latent space digital watermarking systems that utilize adversarial embedding techniques. A series of experiments investigate the security dimensions under copy and removal attacks, providing empirical insights into these systems' vulnerabilities. All experimental codes and results are available at https://github.com/vkinakh/ssl-watermarking-attacks .

翻訳日:2024-11-06 15:21:45 公開日:2024-10-04

# MLによる透かしの安全性評価:コピーと除去攻撃

Evaluation of Security of ML-based Watermarking: Copy and Removal Attacks ( http://arxiv.org/abs/2409.18211v2 )

ライセンス: Link先を確認

Vitaliy Kinakh, Brian Pulfer, Yury Belousov, Pierre Fernandez, Teddy Furon, Slava Voloshynovskiy,

(参考訳) 現実世界やAIが生成したメディアから取得した膨大な量のデジタルコンテンツは、著作権保護、トレーサビリティ、データ証明の方法を必要とする。デジタル透かしはこれらの課題に対処するための重要なアプローチである。その進化は、手作り、オートエンコーダベース、基礎モデルベースメソッドの3世代に及ぶ。これらのシステムの堅牢性は十分に文書化されているが、敵の攻撃に対するセキュリティは未解明のままである。本稿では,逆埋め込み技術を用いた基礎モデルの潜時空間デジタル透かしシステムのセキュリティ評価を行う。一連の実験は、コピーと削除攻撃の下でのセキュリティの次元を調査し、これらのシステムの脆弱性に関する実証的な洞察を提供する。実験コードと結果はすべて https://github.com/vkinakh/ssl-watermarking-attacks で公開されている。

The vast amounts of digital content captured from the real world or AI-generated media necessitate methods for copyright protection, traceability, or data provenance verification. Digital watermarking serves as a crucial approach to address these challenges. Its evolution spans three generations: handcrafted, autoencoder-based, and foundation model based methods. While the robustness of these systems is well-documented, the security against adversarial attacks remains underexplored. This paper evaluates the security of foundation models' latent space digital watermarking systems that utilize adversarial embedding techniques. A series of experiments investigate the security dimensions under copy and removal attacks, providing empirical insights into these systems' vulnerabilities. All experimental codes and results are available at https://github.com/vkinakh/ssl-watermarking-attacks .

翻訳日:2024-11-06 15:21:45 公開日:2024-10-04

# 拡散形状事前推定によるアモーダル・インスタンス・セグメンテーション

Amodal Instance Segmentation with Diffusion Shape Prior Estimation ( http://arxiv.org/abs/2409.18256v1 )

ライセンス: Link先を確認

Minh Tran, Khoa Vo, Tri Nguyen, Ngan Le,

(参考訳) Amodal Instance Segmentation (AIS)は、画像内のオブジェクトの可視部分と隠蔽部分の両方のセグメンテーション予測を含む、興味深い課題を提示している。従来は、アモーダルセグメンテーションを強化するために、トレーニングデータから収集した形状の事前情報に頼っていた。しかし、これらのアプローチは対象圏の詳細を過度に適合させ無視する可能性がある。最近の進歩は、潜在空間から画像を生成するために、広範囲なデータセットで事前訓練された条件付き拡散モデルの可能性を強調している。そこで我々は,拡散形状優先推定(DiffSP)モジュールを用いたAISDiffを提案する。 AISDiffは、目に見えるセグメンテーションマスクとオブジェクトカテゴリの予測から始まり、オクルージョンマスクの予測を通じてオクルージョン認識処理を行う。その後、これらの要素はDiffSPモジュールに入力され、オブジェクトの前の形状を推測します。 DiffSPは、広範囲なデータセットで事前訓練された条件付き拡散モデルを使用して、形状事前推定のためのリッチな視覚的特徴を抽出する。さらに,アモーダルセグメンテーションに先立って,その形状から注目に基づく特徴写像を利用する形状優先アモーダル予測器を提案する。様々なAISベンチマークによる実験では、AISDiffの有効性が示されています。

Amodal Instance Segmentation (AIS) presents an intriguing challenge, including the segmentation prediction of both visible and occluded parts of objects within images. Previous methods have often relied on shape prior information gleaned from training data to enhance amodal segmentation. However, these approaches are susceptible to overfitting and disregard object category details. Recent advancements highlight the potential of conditioned diffusion models, pretrained on extensive datasets, to generate images from latent space. Drawing inspiration from this, we propose AISDiff with a Diffusion Shape Prior Estimation (DiffSP) module. AISDiff begins with the prediction of the visible segmentation mask and object category, alongside occlusion-aware processing through the prediction of occluding masks. Subsequently, these elements are inputted into our DiffSP module to infer the shape prior of the object. DiffSP utilizes conditioned diffusion models pretrained on extensive datasets to extract rich visual features for shape prior estimation. Additionally, we introduce the Shape Prior Amodal Predictor, which utilizes attention-based feature maps from the shape prior to refine amodal segmentation. Experiments across various AIS benchmarks demonstrate the effectiveness of our AISDiff.

翻訳日:2024-11-06 15:01:18 公開日:2024-10-04

# 拡散形状事前推定によるアモーダル・インスタンス・セグメンテーション

Amodal Instance Segmentation with Diffusion Shape Prior Estimation ( http://arxiv.org/abs/2409.18256v2 )

ライセンス: Link先を確認

Minh Tran, Khoa Vo, Tri Nguyen, Ngan Le,

翻訳日:2024-11-06 15:01:18 公開日:2024-10-04

# 合成西ブロット源属性のための説明可能なアーティファクト

Explainable Artifacts for Synthetic Western Blot Source Attribution ( http://arxiv.org/abs/2409.18881v2 )

ライセンス: Link先を確認

João Phillipe Cardenuto, Sara Mandelli, Daniel Moreira, Paolo Bestagini, Edward Delp, Anderson Rocha,

(参考訳) 近年の人工知能の進歩により、生成モデルは本物と区別できない合成科学画像を作成できるようになり、そうしたコンテンツを扱い慣れた専門の科学者にとっても課題となっている。不正な論文を体系的に生成するペーパーミル(paper mills)として知られる組織に悪用されると、これらの技術は根拠のない科学に関する誤情報の拡散に大きく寄与し、科学研究への信頼を損なう可能性がある。以前の研究では、合成コンテンツを識別するための畳み込みニューラルネットワークのようなブラックボックスソリューションが検討されてきたが、異なるモデルにまたがる一般化や、検出過程の根拠となる合成画像中のアーティファクトに関する洞察の提供という課題に取り組んだものは一部にとどまる。本研究の目的は、最先端の生成モデル(例えば、Generative Adversarial NetworksやDiffusion Models)によって生成された説明可能なアーティファクトを特定し、それらをオープンセットの識別とソース属性(すなわち、画像を作成したモデルの特定)に活用することである。

Recent advancements in artificial intelligence have enabled generative models to produce synthetic scientific images that are indistinguishable from pristine ones, posing a challenge even for expert scientists habituated to working with such content. When exploited by organizations known as paper mills, which systematically generate fraudulent articles, these technologies can significantly contribute to the spread of misinformation about ungrounded science, potentially undermining trust in scientific research. While previous studies have explored black-box solutions, such as Convolutional Neural Networks, for identifying synthetic content, only some have addressed the challenge of generalizing across different models and providing insight into the artifacts in synthetic images that inform the detection process. This study aims to identify explainable artifacts generated by state-of-the-art generative models (e.g., Generative Adversarial Networks and Diffusion Models) and leverage them for open-set identification and source attribution (i.e., pointing to the model that created the image).

翻訳日:2024-11-06 05:32:49 公開日:2024-10-04

# 半修正対象検出における低バイアス教師モデルの適用

Applying the Lower-Biased Teacher Model in Semi-Supervised Object Detection ( http://arxiv.org/abs/2409.19703v1 )

ライセンス: Link先を確認

Shuang Wang,

(参考訳) 半教師対象検出タスクに適したアンバイアスド教師モデルの強化であるローワーバイアスド教師モデルを提案する。このモデルの主な革新は、教師モデルへのローカライズ損失の統合であり、擬似ラベル生成の精度を大幅に向上させる。クラス不均衡やバウンディングボックスの精度といった重要な問題に対処することにより、ローワーバイアスト・教師・モデルはオブジェクト検出タスクにおいて優れたパフォーマンスを示す。複数の半教師対象検出データセットに対する大規模な実験により、下バイアス教師モデルは、クラス不均衡に起因する擬似ラベルバイアスを低減させるだけでなく、不正な境界ボックスによる誤りを緩和することが示された。その結果,既存の手法と比較して,mAPスコアが向上し,信頼性の高い検出結果が得られることがわかった。本研究は,精度の高い擬似ラベル生成の重要性を浮き彫りにして,半教師あり学習におけるオブジェクト検出のための堅牢なフレームワークを提供する。

I present the Lower Biased Teacher model, an enhancement of the Unbiased Teacher model, specifically tailored for semi-supervised object detection tasks. The primary innovation of this model is the integration of a localization loss into the teacher model, which significantly improves the accuracy of pseudo-label generation. By addressing key issues such as class imbalance and the precision of bounding boxes, the Lower Biased Teacher model demonstrates superior performance in object detection tasks. Extensive experiments on multiple semi-supervised object detection datasets show that the Lower Biased Teacher model not only reduces the pseudo-labeling bias caused by class imbalances but also mitigates errors arising from incorrect bounding boxes. As a result, the model achieves higher mAP scores and more reliable detection outcomes compared to existing methods. This research underscores the importance of accurate pseudo-label generation and provides a robust framework for future advancements in semi-supervised learning for object detection.

翻訳日:2024-11-05 21:29:26 公開日:2024-10-04

# 半監督対象検出における低バイアス教師モデルの適用

Applying the Lower-Biased Teacher Model in Semi-Supervised Object Detection ( http://arxiv.org/abs/2409.19703v2 )

ライセンス: Link先を確認

Shuang Wang,

翻訳日:2024-11-05 21:29:26 公開日:2024-10-04

# Coffee-Gym: 誤ったコードに対する自然言語フィードバックの評価と改善のための環境

Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code ( http://arxiv.org/abs/2409.19715v1 )

ライセンス: Link先を確認

Hyungjoo Chae, Taeyoon Kwon, Seungjun Moon, Yongho Song, Dongjin Kang, Kai Tzu-iunn Ong, Beong-woo Kwak, Seonghyeon Bae, Seung-won Hwang, Jinyoung Yeo,

(参考訳) 本稿では、コード編集に対するフィードバックを提供するモデルを訓練するための包括的なRL環境であるCoffee-Gymを提案する。Coffee-Gymは2つの主要コンポーネントを含む。(1) Coffee: コーディング問題に対する人間のコード編集トレースと、誤ったコードを編集するための機械生成フィードバックを含むデータセット。(2) CoffeeEval: 修正されたコードの性能をユニットテストで評価することで、フィードバックの有用性を忠実に反映する報酬関数。これらにより、Coffee-Gymは、RLでフィードバックモデルを訓練するための高品質なデータセットが利用できないという問題に対処し、SOTA報酬モデル(すなわちGPT-4)よりも正確な報酬を提供する。Coffee-Gymを適用することで、オープンソースのコードLLMのコード編集を強化する点でベースラインを上回るフィードバックモデルが得られ、クローズドソースのLLMに匹敵する性能を実現する。データセットとモデルチェックポイントを公開している。

This paper presents Coffee-Gym, a comprehensive RL environment for training models that provide feedback on code editing. Coffee-Gym includes two major components: (1) Coffee, a dataset containing humans' code edit traces for coding questions and machine-written feedback for editing erroneous code; (2) CoffeeEval, a reward function that faithfully reflects the helpfulness of feedback by assessing the performance of the revised code in unit tests. With them, Coffee-Gym addresses the unavailability of high-quality datasets for training feedback models with RL, and provides more accurate rewards than the SOTA reward model (i.e., GPT-4). By applying Coffee-Gym, we elicit feedback models that outperform baselines in enhancing open-source code LLMs' code editing, making them comparable with closed-source LLMs. We make the dataset and the model checkpoint publicly available.
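
The reward idea described above (scoring feedback by how the revised code performs in unit tests) can be sketched roughly as follows. This is a minimal illustration, not the Coffee-Gym implementation: the function name `unit_test_reward`, the test-case format and the use of a plain pass rate are all assumptions made here.

```python
from typing import Callable, List, Tuple


def unit_test_reward(revised_fn: Callable[[int], int],
                     test_cases: List[Tuple[int, int]]) -> float:
    """Return the fraction of unit tests passed by the revised code.

    Hypothetical stand-in for a CoffeeEval-style reward; the pass-rate
    definition is an assumption, not taken from the paper.
    """
    if not test_cases:
        return 0.0
    passed = 0
    for arg, expected in test_cases:
        try:
            if revised_fn(arg) == expected:
                passed += 1
        except Exception:
            # A crash counts as a failed test.
            pass
    return passed / len(test_cases)


# Hypothetical task: return the square of n.
tests = [(0, 0), (2, 4), (-3, 9), (5, 25)]


def buggy(n: int) -> int:
    return n * 2   # wrong: doubles instead of squaring


def revised(n: int) -> int:
    return n * n   # the edit made after reading the feedback


reward_before = unit_test_reward(buggy, tests)
reward_after = unit_test_reward(revised, tests)
```

Under this sketch, feedback is considered helpful when the code edited in response to it earns a higher reward than the erroneous code it started from.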

翻訳日:2024-11-05 21:29:26 公開日:2024-10-04

# 指紋の質と人口動態に関する大規模運用研究

A large-scale operational study of fingerprint quality and demographics ( http://arxiv.org/abs/2409.19992v1 )

ライセンス: Link先を確認

Javier Galbally, Aleksandrs Cepilovs, Ramon Blanco-Gonzalo, Gillian Ormiston, Oscar Miguel-Hurtado, Istvan Sz. Racz,

(参考訳) いくつかの初期の研究は、小規模なデータセットにおいて、特定の人口統計グループに対する指紋認識技術の性能にある程度の偏りがあることを示しているが、性別、年齢、指の種類といった要因が指紋品質、ひいては指紋照合精度に与える影響を理解するための十分な証拠はまだない。本研究は、約16,000人の被験者の10指の指紋印象を含む大規模な運用データのデータベースを用いて、このいまだ研究が不十分な課題に取り組む。得られた結果は、指紋品質と人口統計との依存関係についてさらなる知見を与え、指紋ベースの認識システムには、集団の異なる層に対して実際にある程度の性能のばらつきが存在することを示している。実験的評価に基づき、本研究はデータに裏付けられた新たな観察結果を指摘し、それらを説明する妥当な仮説を提示し、観察された指紋品質の差を減らすのに役立ちうるフォローアップ行動で締めくくる。このように、本論文は生体認証技術のアルゴリズム的公平性と平等性をさらに高めるための貢献とみなすことができる。

Even though a few initial works have shown, on small sets of data, some level of bias in the performance of fingerprint recognition technology with respect to certain demographic groups, there is still not sufficient evidence to understand the impact that factors such as gender, age or finger-type may have on fingerprint quality and, in turn, on fingerprint matching accuracy. The present work addresses this still under-researched topic on a large-scale database of operational data containing 10-print impressions of almost 16,000 subjects. The results provide further insight into the dependency between fingerprint quality and demographics, and show that there in fact exists a certain degree of performance variability in fingerprint-based recognition systems for different segments of the population. Based on the experimental evaluation, the work points out new observations based on data-driven evidence, provides plausible hypotheses to explain such observations, and concludes with potential follow-up actions that can help to reduce the observed fingerprint quality differences. In this way, the current paper can be considered a contribution to further increasing the algorithmic fairness and equality of biometric technology.

翻訳日:2024-11-05 16:18:02 公開日:2024-10-04

# 指紋の質と人口動態に関する大規模運用研究

A large-scale operational study of fingerprint quality and demographics ( http://arxiv.org/abs/2409.19992v2 )

ライセンス: Link先を確認

Javier Galbally, Aleksandrs Cepilovs, Ramon Blanco-Gonzalo, Gillian Ormiston, Oscar Miguel-Hurtado, Istvan Sz. Racz,

翻訳日:2024-11-05 16:18:02 公開日:2024-10-04

# VideoINSTA: LLMを用いたインフォーマティブ空間時間推論によるゼロショット長ビデオ理解

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs ( http://arxiv.org/abs/2409.20365v2 )

ライセンス: Link先を確認

Ruotong Liao, Max Erler, Huiyu Wang, Guangyao Zhai, Gengyuan Zhang, Yunpu Ma, Volker Tresp,

(参考訳) ビデオ言語領域では、ゼロショットの大規模言語モデル(LLM)ベースの推論をビデオ理解に活用する最近の研究が、従来のエンドツーエンドモデルに対する有力な競合となっている。しかし、長いビデオの理解は、ゼロショットLLMベースのアプローチであっても、長い時間幅にわたる推論の複雑さのために、固有の課題を呈している。長いビデオにおける情報冗長性の課題は、LLMにとってどのような情報が不可欠なのか、そして長時間ビデオ解析における複雑な時空間推論にそれをどのように活用するかという問題を提起する。本稿では、ゼロショット長時間ビデオ理解のためのフレームワークVideoINSTA(INformative Spatial-TemporAl Reasoning)を提案する。VideoINSTAは、(1) LLMを用いた長時間ビデオ理解のためのゼロショットフレームワーク、(2) LLMがビデオ内の時空間情報を推論するためのイベントベースの時間的推論とコンテンツベースの空間的推論アプローチ、(3) 情報充足性と予測信頼度に基づいて時間的要因のバランスをとる自己反射的情報推論スキームを提供する。提案モデルは、EgoSchema、NextQA、IntentQAの3つの長時間ビデオ質問応答ベンチマークと、オープンな質問応答データセットActivityNetQAにおいて、最先端の性能を大幅に向上させる。コードはhttps://github.com/mayhugotong/VideoINSTAで公開されている。

In the video-language domain, recent work leveraging zero-shot Large Language Model-based reasoning for video understanding has become a competitive challenger to previous end-to-end models. However, long video understanding presents unique challenges due to the complexity of reasoning over extended timespans, even for zero-shot LLM-based approaches. The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage it for complex spatial-temporal reasoning in long-form video analysis. We propose a framework, VideoINSTA, i.e. INformative Spatial-TemporAl Reasoning for zero-shot long-form video understanding. VideoINSTA contributes (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach for LLMs to reason over spatial-temporal information in videos; (3) a self-reflective information reasoning scheme balancing temporal factors based on information sufficiency and prediction confidence. Our model significantly improves the state-of-the-art on three long video question-answering benchmarks: EgoSchema, NextQA, and IntentQA, and the open question answering dataset ActivityNetQA. The code is released here: https://github.com/mayhugotong/VideoINSTA.

翻訳日:2024-11-05 15:48:47 公開日:2024-10-04

# M2Distill: 生涯模倣学習のためのマルチモーダル蒸留

M2Distill: Multi-Modal Distillation for Lifelong Imitation Learning ( http://arxiv.org/abs/2410.00064v1 )

ライセンス: Link先を確認

Kaushik Roy, Akila Dissanayake, Brendan Tidd, Peyman Moghadam,

(参考訳) 操作タスクに対する生涯模倣学習は、漸進的な学習ステップで生じる分布シフトのために大きな課題となる。既存の手法は、増え続けるスキルライブラリを構築するための教師なしスキル発見や、複数のポリシーからの蒸留に焦点を当てることが多い。これらは、多様な操作タスクが継続的に導入されるにつれてスケーラビリティの問題を引き起こす可能性があり、また学習プロセスを通して一貫した潜在空間を確保できず、過去に学習したスキルの破滅的忘却につながるおそれがある。本稿では、マルチモーダル蒸留に基づく生涯模倣学習手法であるM2Distillを紹介する。本手法は、学習プロセス全体を通して、視覚、言語、行動分布にわたる一貫した潜在空間を保持することに焦点を当てる。前のステップから現在のステップにかけての各モダリティの潜在表現の変化を抑制し、連続する学習ステップ間のガウス混合モデル(GMM)ポリシーの差異を低減することで、学習されたポリシーは、新しいスキルをシームレスに統合しながら、過去に学習したタスクを実行する能力を維持する。LIBERO-OBJECT、LIBERO-GOAL、LIBERO-SPATIALを含むLIBERO生涯模倣学習ベンチマークスイートでの広範な評価により、本手法がすべての評価指標において従来の最先端手法を一貫して上回ることが示された。

Lifelong imitation learning for manipulation tasks poses significant challenges due to distribution shifts that occur in incremental learning steps. Existing methods often focus on unsupervised skill discovery to construct an ever-growing skill library or distillation from multiple policies, which can lead to scalability issues as diverse manipulation tasks are continually introduced and may fail to ensure a consistent latent space throughout the learning process, leading to catastrophic forgetting of previously learned skills. In this paper, we introduce M2Distill, a multi-modal distillation-based method for lifelong imitation learning focusing on preserving consistent latent space across vision, language, and action distributions throughout the learning process. By regulating the shifts in latent representations across different modalities from previous to current steps, and reducing discrepancies in Gaussian Mixture Model (GMM) policies between consecutive learning steps, we ensure that the learned policy retains its ability to perform previously learned tasks while seamlessly integrating new skills. Extensive evaluations on the LIBERO lifelong imitation learning benchmark suites, including LIBERO-OBJECT, LIBERO-GOAL, and LIBERO-SPATIAL, demonstrate that our method consistently outperforms prior state-of-the-art methods across all evaluated metrics.

翻訳日:2024-11-05 15:09:43 公開日:2024-10-04

# M2Distill: 生涯模倣学習のためのマルチモーダル蒸留

M2Distill: Multi-Modal Distillation for Lifelong Imitation Learning ( http://arxiv.org/abs/2410.00064v2 )

ライセンス: Link先を確認

Kaushik Roy, Akila Dissanayake, Brendan Tidd, Peyman Moghadam,

翻訳日:2024-11-05 15:09:43 公開日:2024-10-04

# 限られたデータによる深層モデル解釈 : コアセットに基づくアプローチ

Deep Model Interpretation with Limited Data : A Coreset-based Approach ( http://arxiv.org/abs/2410.00524v1 )

ライセンス: Link先を確認

Hamed Behzadi-Khormouji, José Oramas,

(参考訳) モデル解釈は、訓練されたモデルの内部から洞察を抽出することを目的としている。この課題に対する一般的なアプローチは、モデルの適切な動作に不可欠な、内部的に符号化された関連特徴を特徴づけることである。これらの手法は近年進歩しているものの、データセットの密な評価を必要とするため、計算コストが高いという弱点がある。その結果、これらの手法の設計に関する研究は、より小さなデータサブセットに焦点を当てており、得られる洞察が減少する可能性がある。これらの計算コストに対処するために、コアセット選択手法を用いて大規模なデータセットの代表的なサブセットを抽出する、コアセットベースの解釈フレームワークを提案する。この目的に向けて、入力として与えるデータ量に対するモデル解釈手法のロバスト性を評価するための、類似性に基づく評価プロトコルを提案する。いくつかの解釈手法、DNNモデル、コアセット選択手法を考慮した実験により、提案フレームワークの有効性を示す。

Model Interpretation aims at the extraction of insights from the internals of a trained model. A common approach to address this task is the characterization of relevant features internally encoded in the model that are critical for its proper operation. Despite recent progress of these methods, they come with the weakness of being computationally expensive due to the dense evaluation of datasets that they require. As a consequence, research on the design of these methods has focused on smaller data subsets, which may lead to reduced insights. To address these computational costs, we propose a coreset-based interpretation framework that utilizes coreset selection methods to sample a representative subset of the large dataset for the interpretation task. Towards this goal, we propose a similarity-based evaluation protocol to assess the robustness of model interpretation methods towards the amount of data they take as input. Experiments considering several interpretation methods, DNN models, and coreset selection methods show the effectiveness of the proposed framework.
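
To make the two ingredients above concrete, here is a small sketch of selecting a representative subset and then comparing interpretation outputs by similarity. The abstract does not say which coreset methods or similarity measure are used, so both choices here are assumptions: k-center greedy is one common coreset heuristic, and Jaccard overlap of "relevant feature" indices is a simple stand-in for the similarity-based protocol.

```python
import math
from typing import List, Sequence, Set


def k_center_greedy(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick up to k indices so that every point is near a selected point.

    Assumed heuristic for illustration, not the paper's selection method.
    """
    selected = [0]  # start from the first point (arbitrary choice)
    # Distance from each point to its nearest selected point so far.
    dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: dist[i])
        selected.append(nxt)
        dist = [min(d, math.dist(p, points[nxt]))
                for d, p in zip(dist, points)]
    return selected


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap between two sets of feature indices (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical 2-D embeddings of five samples.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
coreset = k_center_greedy(features, 3)

# Hypothetical relevant-feature indices found on the full data vs. the coreset.
full_relevant = {1, 4, 7, 9}
coreset_relevant = {1, 4, 9, 12}
similarity = jaccard(full_relevant, coreset_relevant)
```

A similarity close to 1.0 would suggest that the interpretation method yields much the same result on the subset as on the full dataset.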

翻訳日:2024-11-05 05:07:10 公開日:2024-10-04

# 限られたデータによる深層モデル解釈 : コアセットに基づくアプローチ

Deep Model Interpretation with Limited Data : A Coreset-based Approach ( http://arxiv.org/abs/2410.00524v2 )

ライセンス: Link先を確認

Hamed Behzadi-Khormouji, José Oramas,

翻訳日:2024-11-05 05:07:10 公開日:2024-10-04

# VideoCLIP-XL:ビデオCLIPモデルの長文記述理解の改善

VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models ( http://arxiv.org/abs/2410.00741v1 )

ライセンス: Link先を確認

Jiapeng Wang, Chengyu Wang, Kunzhe Huang, Jun Huang, Lianwen Jin,

(参考訳) Contrastive Language-Image Pre-training (CLIP) は広く研究され、多くの応用に用いられている。しかし、事前学習において短い要約テキストに重点が置かれているため、CLIPは長い記述を理解することができない。ビデオは豊富で詳細な内容を含むことが多いため、この問題はビデオにおいて特に深刻である。本稿では、ビデオCLIPモデルの長文記述理解能力を引き出すことを目的としたVideoCLIP-XL(eXtra Length)モデルを提案する。まず、自動データ収集システムを構築し、ビデオと長文記述のペア(VIdeo and Long-Description)からなる大規模なVILD事前学習データセットを収集する。次に、長文記述能力を拡張しながら特徴空間の分布をよりよく学習するために、Text-similarity-guided Primary Component Matching(TPCM)を提案する。また、理解をさらに向上させるために、Detail-aware Description Ranking(DDR)とHallucination-aware Description Ranking(HDR)という2つの新しいタスクを導入する。最後に、長文記述能力をより包括的に評価するためのLong Video Description Ranking(LVDR)ベンチマークを構築する。短文と長文の両方の記述を含む広く使用されているテキストビデオ検索ベンチマークと、LVDRベンチマークにおける広範な実験結果により、本手法の有効性が示された。

Contrastive Language-Image Pre-training (CLIP) has been widely studied and applied in numerous applications. However, the emphasis on brief summary texts during pre-training prevents CLIP from understanding long descriptions. This issue is particularly acute regarding videos given that videos often contain abundant detailed contents. In this paper, we propose the VideoCLIP-XL (eXtra Length) model, which aims to unleash the long-description understanding capability of video CLIP models. Firstly, we establish an automatic data collection system and gather a large-scale VILD pre-training dataset with VIdeo and Long-Description pairs. Then, we propose Text-similarity-guided Primary Component Matching (TPCM) to better learn the distribution of feature space while expanding the long description capability. We also introduce two new tasks namely Detail-aware Description Ranking (DDR) and Hallucination-aware Description Ranking (HDR) for further understanding improvement. Finally, we construct a Long Video Description Ranking (LVDR) benchmark for evaluating the long-description capability more comprehensively. Extensive experimental results on widely-used text-video retrieval benchmarks with both short and long descriptions and our LVDR benchmark can fully demonstrate the effectiveness of our method.

翻訳日:2024-11-05 04:05:39 公開日:2024-10-04

# VideoCLIP-XL:ビデオCLIPモデルの長文記述理解の改善

VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models ( http://arxiv.org/abs/2410.00741v2 )

ライセンス: Link先を確認

Jiapeng Wang, Chengyu Wang, Kunzhe Huang, Jun Huang, Lianwen Jin,

翻訳日:2024-11-05 04:05:39 公開日:2024-10-04

# VHASR:視覚ホットワードを用いたマルチモーダル音声認識システム

VHASR: A Multimodal Speech Recognition System With Vision Hotwords ( http://arxiv.org/abs/2410.00822v1 )

ライセンス: Link先を確認

Jiliang Hu, Zuchao Li, Ping Wang, Haojun Ai, Lefei Zhang, Hai Zhao,

(参考訳) 画像に基づくマルチモーダル自動音声認識(ASR)モデルは、音声に関連する画像を取り込むことで音声認識性能を向上させる。しかし、モデルに画像情報を導入してもASRの性能向上には寄与しないとする研究もある。本稿では、音声関連の画像情報を効果的に活用する新しい手法を提案し、視覚情報をホットワードとして利用してモデルの音声認識能力を強化するマルチモーダル音声認識システムVHASRを構築する。本システムはデュアルストリームアーキテクチャを採用しており、まず2つのストリームで別々にテキストを書き起こし、その後に出力を統合する。提案モデルをFlickr8k、ADE20k、COCO、OpenImagesの4つのデータセットで評価した。実験の結果、VHASRは画像内の重要な情報を効果的に活用し、モデルの音声認識能力を向上できることが示された。その性能は単一モーダルのASRを上回るだけでなく、既存の画像ベースのマルチモーダルASRの中でSOTAを達成している。

The image-based multimodal automatic speech recognition (ASR) model enhances speech recognition performance by incorporating audio-related images. However, some works suggest that introducing image information to the model does not help improve ASR performance. In this paper, we propose a novel approach that effectively utilizes audio-related image information and set up VHASR, a multimodal speech recognition system that uses vision as hotwords to strengthen the model's speech recognition capability. Our system utilizes a dual-stream architecture, which first transcribes the text on the two streams separately, and then combines the outputs. We evaluate the proposed model on four datasets: Flickr8k, ADE20k, COCO, and OpenImages. The experimental results show that VHASR can effectively utilize key information in images to enhance the model's speech recognition ability. Its performance not only surpasses unimodal ASR, but also achieves SOTA among existing image-based multimodal ASR.

翻訳日:2024-11-05 03:55:54 公開日:2024-10-04

# VHASR:視覚ホットワードを用いたマルチモーダル音声認識システム

VHASR: A Multimodal Speech Recognition System With Vision Hotwords ( http://arxiv.org/abs/2410.00822v2 )

ライセンス: Link先を確認

Jiliang Hu, Zuchao Li, Ping Wang, Haojun Ai, Lefei Zhang, Hai Zhao,

翻訳日:2024-11-05 03:55:54 公開日:2024-10-04

# 長期ホライズン操作タスクのための安定力学系のシングルショット学習

Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks ( http://arxiv.org/abs/2410.01033v1 )

ライセンス: Link先を確認

Alexandre St-Aubin, Amin Abyaneh, Hsiu-Chin Lin,

(参考訳) 複雑な逐次タスクの習得は、ロボティクスにおいて依然として重要な課題である。長期ホライズンの操作タスクの学習は進歩してきたが、既存のアプローチのほとんどは、信頼性の高い確実な実行を保証する厳密な数学的保証を欠いている。本稿では、長期ホライズンタスクと安定ポリシーの学習に関する従来の研究を拡張し、必要な訓練データ量を削減しながらタスク成功率を向上させることに焦点を当てる。提案手法は、(1) 長期ホライズンのデモンストレーションを、経由点とサブゴールによって定義される離散的なステップに分割し、(2) センサノイズやランダムな外乱がある場合でもロボットを各サブゴールへ導く、大域的に安定な力学系ポリシーを学習する。シミュレーションと実世界の両方の実験を通して本手法を検証し、シミュレーションから物理ロボットプラットフォームへの効果的な移行を実証した。コードはhttps://github.com/Alestaubin/stable-imitation-policy-with-waypointsで公開されている。

Mastering complex sequential tasks continues to pose a significant challenge in robotics. While there has been progress in learning long-horizon manipulation tasks, most existing approaches lack rigorous mathematical guarantees for ensuring reliable and successful execution. In this paper, we extend previous work on learning long-horizon tasks and stable policies, focusing on improving task success rates while reducing the amount of training data needed. Our approach introduces a novel method that (1) segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals, and (2) learns globally stable dynamical system policies to guide the robot to each subgoal, even in the face of sensory noise and random disturbances. We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms. Code is available at https://github.com/Alestaubin/stable-imitation-policy-with-waypoints
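
The two-step idea above (split a long task into subgoals, then follow a stable dynamical system to each one) can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the authors' learned policy: it uses the hand-written linear system x' = -gain * (x - goal), which is globally stable for gain > 0, and the waypoints are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def step(x: Point, goal: Point, gain: float, dt: float) -> Point:
    """One Euler step of the linear system x' = -gain * (x - goal)."""
    return (x[0] - gain * (x[0] - goal[0]) * dt,
            x[1] - gain * (x[1] - goal[1]) * dt)


def follow_waypoints(start: Point, waypoints: List[Point],
                     gain: float = 2.0, dt: float = 0.05,
                     tol: float = 1e-2,
                     max_steps: int = 10_000) -> List[Point]:
    """Drive the state to each subgoal in turn; return the visited states.

    Each Euler step shrinks the distance to the current goal by the
    factor (1 - gain * dt), so 0 < gain * dt < 2 is needed to converge.
    """
    x = start
    path = [x]
    for goal in waypoints:
        for _ in range(max_steps):
            if math.dist(x, goal) < tol:
                break  # subgoal reached, switch to the next one
            x = step(x, goal, gain, dt)
            path.append(x)
    return path


# Hypothetical subgoals extracted from a single demonstration.
waypoints = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
path = follow_waypoints((0.0, 0.0), waypoints)
final_error = math.dist(path[-1], waypoints[-1])
```

Because the system contracts toward the active subgoal from any state, a perturbed state is simply pulled back toward it, which is the intuition behind asking for stability guarantees.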

翻訳日:2024-11-04 23:40:11 公開日:2024-10-04

# 長期ホライズン操作タスクのための安定力学系のシングルショット学習

Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks ( http://arxiv.org/abs/2410.01033v2 )

ライセンス: Link先を確認

Alexandre St-Aubin, Amin Abyaneh, Hsiu-Chin Lin,

翻訳日:2024-11-04 23:30:27 公開日:2024-10-04

# 必要だったのはRNNだけだったのか?

Were RNNs All We Needed? ( http://arxiv.org/abs/2410.01201v1 )

ライセンス: Link先を確認

Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadegh,

(参考訳) シーケンス長に関するトランスフォーマーのスケーラビリティの限界により、訓練時に並列化可能なリカレントシーケンスモデルへの関心が再び高まっている。その結果、S4、Mamba、Aarenなど、同等の性能を達成する新しいリカレントアーキテクチャが数多く提案されている。本研究では、10年以上前の従来のリカレントニューラルネットワーク(RNN)であるLSTM(1997年)とGRU(2014年)を再検討する。これらのモデルは時間方向の誤差逆伝播(BPTT)を必要とするため低速であったが、入力ゲート、忘却ゲート、更新ゲートから隠れ状態への依存を取り除くことで、LSTMとGRUはBPTTを必要としなくなり、並列に効率よく訓練できることを示す。これに基づいて、(1) 従来版よりもはるかに少ないパラメータを使用し、(2) 訓練時に完全に並列化可能な(長さ512のシーケンスで175倍高速)最小バージョン(minLSTMとminGRU)を導入する。最後に、これら10年前のRNNを簡素化したバージョンが、最近のシーケンスモデルの実験的性能に匹敵することを示す。

The scalability limitations of Transformers regarding sequence length have renewed interest in recurrent sequence models that are parallelizable during training. As a result, many novel recurrent architectures, such as S4, Mamba, and Aaren, have been proposed that achieve comparable performance. In this work, we revisit traditional recurrent neural networks (RNNs) from over a decade ago: LSTMs (1997) and GRUs (2014). While these models were slow due to requiring to backpropagate through time (BPTT), we show that by removing their hidden state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer need to BPTT and can be efficiently trained in parallel. Building on this, we introduce minimal versions (minLSTMs and minGRUs) that (1) use significantly fewer parameters than their traditional counterparts and (2) are fully parallelizable during training (175x faster for a sequence of length 512). Lastly, we show that these stripped-down versions of decade-old RNNs match the empirical performance of recent sequence models.
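
Why removing the hidden-state dependency from the gates enables parallel training can be seen in a scalar toy example. The sketch below assumes a gated update of the form h_t = (1 - z_t) * h_{t-1} + z_t * c_t, where z_t and c_t depend on x_t only; this form, the scalar weights `w_z` and `w_h`, and the function names are illustrative assumptions, and the exact minGRU/minLSTM equations are those given in the paper. Each step is then an affine map of h, and affine maps compose associatively, so a prefix scan gives all hidden states without stepping through time.

```python
import math
from typing import List, Tuple


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gates(xs: List[float], w_z: float, w_h: float) -> List[Tuple[float, float]]:
    """Gate z_t and candidate c_t computed from x_t alone (no h_{t-1})."""
    return [(sigmoid(w_z * x), w_h * x) for x in xs]


def sequential(xs: List[float], w_z: float, w_h: float,
               h0: float = 0.0) -> List[float]:
    """Reference recurrence, one time step after another."""
    h, out = h0, []
    for z, c in gates(xs, w_z, w_h):
        h = (1.0 - z) * h + z * c
        out.append(h)
    return out


def combine(left: Tuple[float, float],
            right: Tuple[float, float]) -> Tuple[float, float]:
    """Compose two affine maps h -> a*h + b, with `left` applied first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def parallel_scan(xs: List[float], w_z: float, w_h: float,
                  h0: float = 0.0) -> List[float]:
    """Inclusive prefix scan (doubling scheme) over the affine maps."""
    elems = [(1.0 - z, z * c) for z, c in gates(xs, w_z, w_h)]
    shift = 1
    while shift < len(elems):
        # Within one round, every position could be updated concurrently.
        elems = [elems[i] if i < shift else combine(elems[i - shift], elems[i])
                 for i in range(len(elems))]
        shift *= 2
    return [a * h0 + b for a, b in elems]


xs = [0.5, -1.0, 2.0, 0.3, -0.7]
seq_states = sequential(xs, w_z=0.8, w_h=1.5, h0=0.2)
par_states = parallel_scan(xs, w_z=0.8, w_h=1.5, h0=0.2)
max_gap = max(abs(a - b) for a, b in zip(seq_states, par_states))
```

The scan needs only about log2(T) rounds for a sequence of length T. If the gates depended on h_{t-1}, the per-step maps would not be known in advance and this trick would not apply.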

翻訳日:2024-11-04 22:50:44 公開日:2024-10-04

# 必要だったのはRNNだけだったのか?

Were RNNs All We Needed? ( http://arxiv.org/abs/2410.01201v2 )

ライセンス: Link先を確認

Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadegh,

翻訳日:2024-11-04 22:40:58 公開日:2024-10-04

# CSIM:画像品質評価のための局所的変化に敏感なコピュラに基づく類似度指数

CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment ( http://arxiv.org/abs/2410.01411v1 )

ライセンス: Link先を確認

Safouane El Ghazouali, Umberto Michelucci, Yassin El Hillali, Hichem Nouira,

(参考訳) 画像類似度指標は、画像処理、コンピュータビジョン、機械学習で使用されるため、コンピュータビジョンの応用において重要な役割を果たす。さらに、これらの指標は画像検索、物体認識、品質評価などのタスクを可能にし、医療、天文学、監視といった分野に不可欠である。PSNR、MSE、SSIM、ISSM、FSIMといった既存の指標は、速度、複雑さ、画像の小さな変化に対する感度のいずれかの面で制約に直面することが多い。これらの課題に対処するために、本稿では、リアルタイム性と画像の微妙な変化に対する感度を両立させる新しい画像類似度指標CSIMについて検討する。この新しい指標は、確率論のガウスコピュラを用いて、画像を局所的な画像パッチに対応する画素分布のベクトルに変換する。これらのベクトルには、強度と画素位置に加えて、画素値間の依存関係に関する情報が含まれ、画像内の構造的関係を捉える。コピュラの特性を活用することで、CSIMは画素強度の同時分布を効果的にモデル化し、画像パッチのより細やかな比較を可能にするため、他の指標と比べて局所的な変化に敏感になる。実験の結果、CSIMは、ノイズ、圧縮アーティファクト、ぼけなど、さまざまな画像歪みのシナリオにおいて、既存の類似度指標を上回ることが示された。微妙な違いを検出できるこの指標は、小さな異常の検出が重要となりうる医用画像など、高い精度を必要とする応用に適している。本研究で得られた結果は、GitHubリポジトリ https://github.com/safouaneelg/copulasimilarity から再現できる。

Image similarity metrics play an important role in computer vision applications, as they are used in image processing, computer vision and machine learning. Furthermore, those metrics enable tasks such as image retrieval, object recognition and quality assessment, essential in fields like healthcare, astronomy and surveillance. Existing metrics, such as PSNR, MSE, SSIM, ISSM and FSIM, often face limitations in terms of either speed, complexity or sensitivity to small changes in images. To address these challenges, a novel image similarity metric, namely CSIM, that combines real-time performance with sensitivity to subtle image variations is investigated in this paper. The novel metric uses the Gaussian copula from probability theory to transform an image into vectors of pixel distribution associated with local image patches. These vectors contain, in addition to intensities and pixel positions, information on the dependencies between pixel values, capturing the structural relationships within the image. By leveraging the properties of copulas, CSIM effectively models the joint distribution of pixel intensities, enabling a more nuanced comparison of image patches and making it more sensitive to local changes compared to other metrics. Experimental results demonstrate that CSIM outperforms existing similarity metrics in various image distortion scenarios, including noise, compression artifacts and blur. The metric's ability to detect subtle differences makes it suitable for applications requiring high precision, such as medical imaging, where the detection of minor anomalies can be of high importance. The results obtained in this work can be reproduced from this GitHub repository: https://github.com/safouaneelg/copulasimilarity.
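
As a rough illustration of what a Gaussian copula captures, the sketch below estimates the copula correlation between two image patches from their normal scores (ranks mapped through the inverse normal CDF). This is not the CSIM algorithm, whose details are in the paper and repository; the function names, the flattened 3x3 patches and the rank-based estimator are assumptions chosen to show the idea of measuring dependence between pixel values separately from their raw intensities.

```python
from statistics import NormalDist
from typing import List, Sequence


def normal_scores(values: Sequence[float]) -> List[float]:
    """Map values to standard-normal quantiles of their average ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank shared by ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    nd = NormalDist()
    # Dividing by n + 1 keeps the argument strictly inside (0, 1).
    return [nd.inv_cdf(r / (n + 1)) for r in ranks]


def copula_correlation(patch_a: Sequence[float],
                       patch_b: Sequence[float]) -> float:
    """Correlation of normal scores: a Gaussian-copula dependence estimate."""
    za, zb = normal_scores(patch_a), normal_scores(patch_b)
    ma, mb = sum(za) / len(za), sum(zb) / len(zb)
    cov = sum((a - ma) * (b - mb) for a, b in zip(za, zb))
    va = sum((a - ma) ** 2 for a in za)
    vb = sum((b - mb) ** 2 for b in zb)
    if va == 0.0 or vb == 0.0:
        return 0.0  # a constant patch carries no dependence information
    return cov / (va * vb) ** 0.5


# Hypothetical flattened 3x3 grayscale patches.
patch = [10, 20, 30, 40, 50, 60, 70, 80, 90]
brighter = [2 * v + 5 for v in patch]   # monotone intensity change
inverted = [255 - v for v in patch]     # reversed ordering

same_structure = copula_correlation(patch, brighter)
opposite_structure = copula_correlation(patch, inverted)
```

Because only ranks enter the estimate, a monotone brightness or contrast change leaves the score unchanged, while a change in the ordering of pixel values does not.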

翻訳日:2024-11-04 21:09:23 公開日:2024-10-04

# Verbalized Graph Representation Learning: 全プロセスを通じて大規模言語モデルに基づく完全に解釈可能なグラフモデル

Verbalized Graph Representation Learning: A Fully Interpretable Graph Model Based on Large Language Models Throughout the Entire Process ( http://arxiv.org/abs/2410.01457v1 )

ライセンス: Link先を確認

Xingyu Ji, Jiale Liu, Lu Li, Maojun Wang, Zeyu Zhang,

(参考訳) テキスト属性グラフ(TAG)上の表現学習は、実世界での幅広い応用により、特にグラフニューラルネットワーク(GNN)を通じて大きな関心を集めている。従来のGNN手法はグラフの構造情報の符号化に重点を置いており、ノードやエッジの属性には浅いテキスト埋め込みを用いることが多い。このため、データ内の豊かな意味情報を理解する能力や、複雑な下流タスクに対する推論能力が制限され、解釈可能性も欠いている。大規模言語モデル(LLM)の台頭に伴い、グラフ表現学習や下流タスクのためにLLMをGNNと組み合わせる研究が増えている。これらのアプローチはTAGデータセットの豊かな意味情報を効果的に活用するが、部分的にしか解釈できないことが主な欠点であり、重要な分野での応用が制限される。本稿では、完全に解釈可能な言語化グラフ表現学習(VGRL)手法を提案する。通常は連続的なパラメータ空間内で最適化される従来のグラフ機械学習モデルとは対照的に、VGRLはこのパラメータ空間をテキスト記述に制約することで、プロセス全体を通して完全な解釈可能性を保証し、ユーザがモデルの決定を理解し信頼しやすくする。VGRLの有効性を実証的に評価するためにいくつかの実験を行い、本手法がグラフ表現学習における足がかりとなると考えている。

Representation learning on text-attributed graphs (TAGs) has attracted significant interest due to its wide-ranging real-world applications, particularly through Graph Neural Networks (GNNs). Traditional GNN methods focus on encoding the structural information of graphs, often using shallow text embeddings for node or edge attributes. This limits the model's ability to understand the rich semantic information in the data and to reason about complex downstream tasks, while also lacking interpretability. With the rise of large language models (LLMs), an increasing number of studies are combining them with GNNs for graph representation learning and downstream tasks. While these approaches effectively leverage the rich semantic information in TAG datasets, their main drawback is that they are only partially interpretable, which limits their application in critical fields. In this paper, we propose a verbalized graph representation learning (VGRL) method which is fully interpretable. In contrast to traditional graph machine learning models, which are usually optimized within a continuous parameter space, VGRL constrains this parameter space to be a text description, which ensures complete interpretability throughout the entire process, making it easier for users to understand and trust the decisions of the model. We conduct several studies to empirically evaluate the effectiveness of VGRL and we believe this method can serve as a stepping stone in graph representation learning.

翻訳日:2024-11-04 17:44:25 公開日:2024-10-04

# フェデレーション学習における低ランク適応のための選択的集約

Selective Aggregation for Low-Rank Adaptation in Federated Learning ( http://arxiv.org/abs/2410.01463v1 )

ライセンス: Link先を確認

Pengxin Guo, Shuang Zeng, Yanran Wang, Huijie Fan, Feifei Wang, Liangqiong Qu,

(参考訳) 我々は、学習された$A$行列と$B$行列の非対称性解析という観点から、連合学習におけるLoRAを調査する。その結果、$A$行列は一般的な知識の学習を担い、$B$行列はクライアント固有の知識の獲得に重点を置いていることが明らかになった。この発見に基づいて、Federated Share-A Low-Rank Adaptation(FedSA-LoRA)を導入する。FedSA-LoRAは、重み更新をモデル化するために2つの低ランクの訓練可能な行列$A$と$B$を用いるが、集約のためにサーバと共有されるのは$A$行列のみである。さらに、rsLoRAやVeRAといった他のLoRA変種における学習済みの$A$行列と$B$行列の関係も調べ、一貫したパターンを明らかにする。これを受けて、FedSA-LoRA法をこれらのLoRA変種に拡張し、FedSA-rsLoRAとFedSA-VeRAを得る。このようにして、LoRAとFLを統合するための一般的なパラダイムを確立し、今後のLoRA変種とFLを組み合わせる研究への指針を提供する。自然言語理解と生成タスクに関する広範な実験結果により、提案手法の有効性が示された。

We investigate LoRA in federated learning through the lens of the asymmetry analysis of the learned $A$ and $B$ matrices. In doing so, we uncover that $A$ matrices are responsible for learning general knowledge, while $B$ matrices focus on capturing client-specific knowledge. Based on this finding, we introduce Federated Share-A Low-Rank Adaptation (FedSA-LoRA), which employs two low-rank trainable matrices $A$ and $B$ to model the weight update, but only $A$ matrices are shared with the server for aggregation. Moreover, we delve into the relationship between the learned $A$ and $B$ matrices in other LoRA variants, such as rsLoRA and VeRA, revealing a consistent pattern. Consequently, we extend our FedSA-LoRA method to these LoRA variants, resulting in FedSA-rsLoRA and FedSA-VeRA. In this way, we establish a general paradigm for integrating LoRA with FL, offering guidance for future work on subsequent LoRA variants combined with FL. Extensive experimental results on natural language understanding and generation tasks demonstrate the effectiveness of the proposed method.
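
The selective aggregation described above, where only the $A$ matrices are shared while each client keeps its own $B$, can be sketched in a few lines. The client dictionaries, the plain unweighted average and the helper names are assumptions for illustration; the authors' actual implementation is in their repository.

```python
from typing import Dict, List

Matrix = List[List[float]]


def average(mats: List[Matrix]) -> Matrix:
    """Element-wise mean of equally shaped matrices (unweighted: an assumption)."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[r][c] for m in mats) / len(mats) for c in range(cols)]
            for r in range(rows)]


def aggregate_round(clients: List[Dict[str, Matrix]]) -> None:
    """Server step: average only the A matrices; every B stays local."""
    global_a = average([c["A"] for c in clients])
    for c in clients:
        c["A"] = [row[:] for row in global_a]  # each client gets its own copy
        # c["B"] is deliberately left untouched.


# Two hypothetical clients with rank-1 adapters (A is 1x2, B is 2x1).
clients = [
    {"A": [[1.0, 3.0]], "B": [[0.5], [0.0]]},
    {"A": [[3.0, 5.0]], "B": [[0.0], [2.0]]},
]
aggregate_round(clients)
```

After the round both clients hold the same $A$, while their $B$ matrices still differ, which matches the split between general and client-specific knowledge reported in the abstract.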

翻訳日:2024-11-04 17:34:40 公開日:2024-10-04

# フェデレーション学習における低ランク適応のための選択的集約

Selective Aggregation for Low-Rank Adaptation in Federated Learning ( http://arxiv.org/abs/2410.01463v2 )

ライセンス: Link先を確認

Pengxin Guo, Shuang Zeng, Yanran Wang, Huijie Fan, Feifei Wang, Liangqiong Qu,

(参考訳) 我々は、学習された$A$行列と$B$行列の非対称性解析という観点から、連合学習におけるLoRAを調査する。その結果、$A$行列は一般的な知識の学習を担い、$B$行列はクライアント固有の知識の獲得に重点を置いていることが明らかになった。この発見に基づいて、Federated Share-A Low-Rank Adaptation(FedSA-LoRA)を導入する。FedSA-LoRAは、重み更新をモデル化するために2つの低ランクの訓練可能な行列$A$と$B$を用いるが、集約のためにサーバと共有されるのは$A$行列のみである。さらに、rsLoRAやVeRAといった他のLoRA変種における学習済みの$A$行列と$B$行列の関係も調べ、一貫したパターンを明らかにする。これを受けて、FedSA-LoRA法をこれらのLoRA変種に拡張し、FedSA-rsLoRAとFedSA-VeRAを得る。このようにして、LoRAとFLを統合するための一般的なパラダイムを確立し、今後のLoRA変種とFLを組み合わせる研究への指針を提供する。自然言語理解と生成タスクに関する広範な実験結果により、提案手法の有効性が示された。コードはhttps://github.com/Pengxin-Guo/FedSA-LoRAで公開されている。

We investigate LoRA in federated learning through the lens of the asymmetry analysis of the learned $A$ and $B$ matrices. In doing so, we uncover that $A$ matrices are responsible for learning general knowledge, while $B$ matrices focus on capturing client-specific knowledge. Based on this finding, we introduce Federated Share-A Low-Rank Adaptation (FedSA-LoRA), which employs two low-rank trainable matrices $A$ and $B$ to model the weight update, but only $A$ matrices are shared with the server for aggregation. Moreover, we delve into the relationship between the learned $A$ and $B$ matrices in other LoRA variants, such as rsLoRA and VeRA, revealing a consistent pattern. Consequently, we extend our FedSA-LoRA method to these LoRA variants, resulting in FedSA-rsLoRA and FedSA-VeRA. In this way, we establish a general paradigm for integrating LoRA with FL, offering guidance for future work on subsequent LoRA variants combined with FL. Extensive experimental results on natural language understanding and generation tasks demonstrate the effectiveness of the proposed method. Our code is available at https://github.com/Pengxin-Guo/FedSA-LoRA.

翻訳日:2024-11-04 17:34:40 公開日:2024-10-04

# HarmAug: 安全ガードモデルの知識蒸留に有効なデータ拡張

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models ( http://arxiv.org/abs/2410.01524v1 )

ライセンス: Link先を確認

Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang,

(参考訳) 大規模言語モデル(LLM)を狙った悪意のあるクエリを検出する安全ガードモデルは、実世界のアプリケーションにおいてLLMを安全かつ責任ある形で展開するために不可欠である。しかし、数十億のパラメータを持つ既存の安全ガードモデルをLLMと並べてモバイル機器に展開することは、大きなメモリ要件とレイテンシのために現実的ではない。このコストを削減するため、二値の有害性ラベルが付いた命令・応答ペアのラベル付きデータセットを用いて、大規模な教師安全ガードモデルをより小さなモデルに蒸留する。既存のラベル付きデータセットでは有害な命令の多様性が限られているため、単純に蒸留したモデルは大規模なモデルに比べて性能が劣る傾向にある。小規模モデルと大規模モデルの間のギャップを埋めるため、LLMをジェイルブレイクして有害な命令を生成させる、単純かつ効果的なデータ拡張手法HarmAugを提案する。"Make a single harmful instruction prompt that would elicit offensive content"のようなプロンプトに対して、LLMの応答に肯定的なプレフィックス(例: "I have an idea for a prompt:")を追加する。これによりLLMは応答の残りを生成し続け、有害な命令がサンプリングされる。別のLLMが有害な命令に対する応答を生成し、教師モデルがその命令・応答ペアにラベルを付ける。HarmAugが他の関連するベースラインを上回ることを実証的に示す。さらに、HarmAugで訓練した4億3500万パラメータの安全ガードモデルは、70億以上のパラメータを持つ大規模モデルに匹敵するF1スコアを達成し、その25%未満の計算コストで動作しながら、AUPRCではそれらを上回る。

Safety guard models that detect malicious queries aimed at large language models (LLMs) are essential for ensuring the secure and responsible deployment of LLMs in real-world applications. However, deploying existing safety guard models with billions of parameters alongside LLMs on mobile devices is impractical due to substantial memory requirements and latency. To reduce this cost, we distill a large teacher safety guard model into a smaller one using a labeled dataset of instruction-response pairs with binary harmfulness labels. Due to the limited diversity of harmful instructions in the existing labeled dataset, naively distilled models tend to underperform compared to larger models. To bridge the gap between small and large models, we propose HarmAug, a simple yet effective data augmentation method that involves jailbreaking an LLM and prompting it to generate harmful instructions. Given a prompt such as, "Make a single harmful instruction prompt that would elicit offensive content", we add an affirmative prefix (e.g., "I have an idea for a prompt:") to the LLM's response. This encourages the LLM to continue generating the rest of the response, leading to sampling harmful instructions. Another LLM generates a response to the harmful instruction, and the teacher model labels the instruction-response pair. We empirically show that our HarmAug outperforms other relevant baselines. Moreover, a 435-million-parameter safety guard model trained with HarmAug achieves an F1 score comparable to larger models with over 7 billion parameters, and even outperforms them in AUPRC, while operating at less than 25% of their computational cost.

翻訳日:2024-11-04 17:14:45 公開日:2024-10-04

# HarmAug: 安全ガードモデルの知識蒸留に有効なデータ拡張

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models ( http://arxiv.org/abs/2410.01524v2 )

ライセンス: Link先を確認

Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang,

翻訳日:2024-11-04 17:14:45 公開日:2024-10-04

# 球面上の打ち切りカーネル確率的勾配降下法

Truncated Kernel Stochastic Gradient Descent on Spheres ( http://arxiv.org/abs/2410.01570v1 )

ライセンス: Link先を確認

JinHui Bai, Lei Shi,

(参考訳) 球面調和関数の構造に着想を得て,球面データフィッティングのための最小二乗損失関数を持つ打ち切りカーネル確率的勾配降下法(T-kernel SGD)アルゴリズムを提案する。 TカーネルSGDは「トランケーション」演算を用いることで、級数ベースのカーネル関数を確率的勾配降下法に適用できるようにし、高次元空間で適切な閉形式カーネル関数を見つける困難を回避する。従来のカーネルSGDとは対照的に、TカーネルSGDは反復中に仮説空間を動的に調整することでバイアスと分散のバランスをより効果的にとる。提案アルゴリズムの最も重要な利点は、カーネルSGDの固有の飽和問題を克服しつつ、一定のステップサイズ(サンプルサイズに依存しない)を用いて理論的に最適な収束率を達成できることである。さらに、球面多項式の構造を利用して等価なTカーネルSGDを導出し、カーネルSGDと比較してストレージと計算コストを大幅に削減する。典型的には、TカーネルSGDは、d次元球面に対して最適な収束率を達成するのに$\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$の計算量と$\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$のストレージしか必要としない。ここで$0<\epsilon<\frac{1}{2}$は、最適なフィッティングまたは基礎となる空間が十分な正則性を持つ場合には任意に小さくできる。この正則性は、目的関数の滑らかさパラメータと、カーネル関数に付随する積分作用素の固有値の減衰率によって決定され、どちらも推定問題の難しさを反映している。本研究の主な成果は,この事前情報がTカーネルSGDの収束にどのように影響するかを定量的に特徴づけることである。数値実験により, 本論文で示された理論的知見をさらに検証した。

Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm with a least-square loss function for spherical data fitting. T-kernel SGD employs a "truncation" operation, enabling the application of a series-based kernel function in stochastic gradient descent, thereby avoiding the difficulties of finding suitable closed-form kernel functions in high-dimensional spaces. In contrast to traditional kernel SGD, T-kernel SGD is more effective in balancing bias and variance by dynamically adjusting the hypothesis space during iterations. The most significant advantage of the proposed algorithm is that it can achieve theoretically optimal convergence rates using a constant step size (independent of the sample size) while overcoming the inherent saturation problem of kernel SGD. Additionally, we leverage the structure of spherical polynomials to derive an equivalent T-kernel SGD, significantly reducing storage and computational costs compared to kernel SGD. Typically, T-kernel SGD requires only $\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$ computational complexity and $\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$ storage to achieve optimal rates for the d-dimensional sphere, where $0<\epsilon<\frac{1}{2}$ can be arbitrarily small if the optimal fitting or the underlying space possesses sufficient regularity. This regularity is determined by the smoothness parameter of the objective function and the decaying rate of the eigenvalues of the integral operator associated with the kernel function, both of which reflect the difficulty of the estimation problem. Our main results quantitatively characterize how this prior information influences the convergence of T-kernel SGD. The numerical experiments further validate the theoretical findings presented in this paper.

翻訳日:2024-11-04 16:54:49 公開日:2024-10-04

# 球面上の打ち切りカーネル確率的勾配降下法

Truncated Kernel Stochastic Gradient Descent on Spheres ( http://arxiv.org/abs/2410.01570v2 )

ライセンス: Link先を確認

JinHui Bai, Lei Shi,

(参考訳) 球面調和関数の構造に着想を得て,球面データフィッティングのための最小二乗損失関数を持つ打ち切りカーネル確率的勾配降下法(T-kernel SGD)アルゴリズムを提案する。 TカーネルSGDは「トランケーション」演算を用いることで、級数ベースのカーネル関数を確率的勾配降下法に適用できるようにし、高次元空間で適切な閉形式カーネル関数を見つける困難を回避する。従来のカーネルSGDとは対照的に、TカーネルSGDは反復中に仮説空間を動的に調整することでバイアスと分散のバランスをより効果的にとる。提案アルゴリズムの最も重要な利点は、カーネルSGDの固有の飽和問題を克服しつつ、一定のステップサイズ(サンプルサイズに依存しない)を用いて理論的に最適な収束率を達成できることである。さらに、球面多項式の構造を利用して等価なTカーネルSGDを導出し、カーネルSGDと比較してストレージと計算コストを大幅に削減する。典型的には、TカーネルSGDは、d次元球面に対して最適な収束率を達成するのに$\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$の計算量と$\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$のストレージしか必要としない。ここで$0<\epsilon<\frac{1}{2}$は、最適なフィッティングまたは基礎となる空間が十分な正則性を持つ場合には任意に小さくできる。この正則性は、目的関数の滑らかさパラメータと、カーネル関数に付随する積分作用素の固有値の減衰率によって決定され、どちらも推定問題の難しさを反映している。本研究の主な成果は,この事前情報がTカーネルSGDの収束にどのように影響するかを定量的に特徴づけることである。数値実験により, 本論文で示された理論的知見をさらに検証した。

Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm with a least-square loss function for spherical data fitting. T-kernel SGD employs a "truncation" operation, enabling the application of a series-based kernel function in stochastic gradient descent, thereby avoiding the difficulties of finding suitable closed-form kernel functions in high-dimensional spaces. In contrast to traditional kernel SGD, T-kernel SGD is more effective in balancing bias and variance by dynamically adjusting the hypothesis space during iterations. The most significant advantage of the proposed algorithm is that it can achieve theoretically optimal convergence rates using a constant step size (independent of the sample size) while overcoming the inherent saturation problem of kernel SGD. Additionally, we leverage the structure of spherical polynomials to derive an equivalent T-kernel SGD, significantly reducing storage and computational costs compared to kernel SGD. Typically, T-kernel SGD requires only $\mathcal{O}(n^{1+\frac{d}{d-1}\epsilon})$ computational complexity and $\mathcal{O}(n^{\frac{d}{d-1}\epsilon})$ storage to achieve optimal rates for the d-dimensional sphere, where $0<\epsilon<\frac{1}{2}$ can be arbitrarily small if the optimal fitting or the underlying space possesses sufficient regularity. This regularity is determined by the smoothness parameter of the objective function and the decaying rate of the eigenvalues of the integral operator associated with the kernel function, both of which reflect the difficulty of the estimation problem. Our main results quantitatively characterize how this prior information influences the convergence of T-kernel SGD. The numerical experiments further validate the theoretical findings presented in this paper.
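
The truncation idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes the unit circle instead of a general sphere, uses the Fourier basis in place of spherical harmonics, and picks a hypothetical kernel weight `k ** -decay` and truncation schedule `level = int(t ** growth)`. The function names are illustrative only. Each step applies the update f ← f − η (f(x_t) − y_t) K_L(x_t, ·) in coefficient form, with a constant step size and a hypothesis space that grows with the iteration count.

```python
import math
import random

def basis(theta, level):
    """Fourier features on the circle up to frequency `level` (1 + 2*level values)."""
    feats = [1.0]
    for k in range(1, level + 1):
        feats += [math.cos(k * theta), math.sin(k * theta)]
    return feats

def kernel_weights(level, decay=2.0):
    """Assumed series-kernel weights: frequency k gets weight k**(-decay)."""
    weights = [1.0]
    for k in range(1, level + 1):
        weights += [k ** -decay, k ** -decay]
    return weights

def truncated_kernel_sgd(samples, step=0.5, growth=0.5):
    """One pass of SGD with a kernel truncated at a level that grows with t."""
    coef = [0.0]
    for t, (theta, y) in enumerate(samples, start=1):
        level = int(t ** growth)              # truncation level at iteration t
        size = 1 + 2 * level
        coef += [0.0] * (size - len(coef))    # enlarge the hypothesis space
        phi = basis(theta, level)
        w = kernel_weights(level)
        residual = sum(c * p for c, p in zip(coef, phi)) - y
        for i in range(size):                 # least-squares gradient step
            coef[i] -= step * residual * w[i] * phi[i]
    return coef

def predict(coef, theta):
    level = (len(coef) - 1) // 2
    return sum(c * p for c, p in zip(coef, basis(theta, level)))

rng = random.Random(0)
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(2000)]
samples = [(a, math.cos(a)) for a in angles]  # noise-free toy target
coef = truncated_kernel_sgd(samples)
print(abs(predict(coef, 1.0) - math.cos(1.0)))
```

In this sketch only the coefficient vector is stored, so memory grows with the truncation level and not with the number of samples seen.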

翻訳日:2024-11-04 16:54:49 公開日:2024-10-04

# HarmoniCa: 拡散トランスフォーマーアクセラレーションにおけるより良い機能キャッシュのためのトレーニングと推論の調和

HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration ( http://arxiv.org/abs/2410.01723v1 )

ライセンス: Link先を確認

Yushi Huang, Zining Wang, Ruihao Gong, Jing Liu, Xinjie Zhang, Jinyang Guo, Xianglong Liu, Jun Zhang,

(参考訳) Diffusion Transformer (DiT) は、生成タスクにおける優れたスケーラビリティと卓越した性能で注目を集めている。しかし、そのかなりの推論コストは実践的な展開を妨げる。タイムステップ間で冗長な計算を保存および検索する特徴キャッシュ機構は、拡散モデルにおけるステップごとの推論時間を削減することが期待される。 DiTの既存のキャッシュ手法のほとんどは手動で設計されている。学習ベースのアプローチは戦略を適応的に最適化しようとするが、トレーニングと推論の相違に悩まされ、性能と加速比の両方を損なう。詳細な分析の結果,これらの相違は主に,(1)トレーニングが早期のタイムステップにおけるキャッシュ使用の影響を無視する「先行タイムステップの無視」,(2)訓練目標(各タイムステップの予測ノイズの整合)が推論の目標(高品質な画像の生成)から逸脱する「目的のミスマッチ」,の2点に起因することを突き止めた。これらの相違を緩和するために,Step-Wise Denoising Training (SDT)とImage Error Proxy-Guided Objective (IEPO)に基づく新しい学習ベースのキャッシングフレームワークにより,トレーニングと推論を調和させる新しい手法であるHarmoniCaを提案する。従来のトレーニングパラダイムと比較すると、新たに提案されたSDTはデノイジングプロセスの連続性を維持し、推論時の動作と同様に、トレーニング中に先行するタイムステップの情報をモデルが活用できるようにする。さらに,キャッシュされた特徴の再利用による最終的な画像誤差を近似するために,効率的なプロキシ機構を統合したIEPOを設計する。したがって、IEPOは最終的な画像品質とキャッシュ利用のバランスをとるのに役立ち、各タイムステップの予測出力に対するキャッシュ使用の影響のみを考慮するというトレーニングの問題を解消する。

Diffusion Transformers (DiTs) have gained prominence for outstanding scalability and extraordinary performance in generative tasks. However, their considerable inference costs impede practical deployment. The feature cache mechanism, which involves storing and retrieving redundant computations across timesteps, holds promise for reducing per-step inference time in diffusion models. Most existing caching methods for DiT are manually designed. Although the learning-based approach attempts to optimize strategies adaptively, it suffers from discrepancies between training and inference, which hampers both the performance and acceleration ratio. Upon detailed analysis, we pinpoint that these discrepancies primarily stem from two aspects: (1) Prior Timestep Disregard, where training ignores the effect of cache usage at earlier timesteps, and (2) Objective Mismatch, where the training target (align predicted noise in each timestep) deviates from the goal of inference (generate the high-quality image). To alleviate these discrepancies, we propose HarmoniCa, a novel method that Harmonizes training and inference with a novel learning-based Caching framework built upon Step-Wise Denoising Training (SDT) and Image Error Proxy-Guided Objective (IEPO). Compared to the traditional training paradigm, the newly proposed SDT maintains the continuity of the denoising process, enabling the model to leverage information from prior timesteps during training, similar to the way it operates during inference. Furthermore, we design IEPO, which integrates an efficient proxy mechanism to approximate the final image error caused by reusing the cached feature. Therefore, IEPO helps balance final image quality and cache utilization, resolving the issue of training that only considers the impact of cache usage on the predicted output at each timestep.

翻訳日:2024-11-04 15:43:48 公開日:2024-10-04

Yushi Huang, Zining Wang, Ruihao Gong, Jing Liu, Xinjie Zhang, Jun Zhang,

翻訳日:2024-11-04 15:43:48 公開日:2024-10-04

# 極端イベントによる説明可能な地球表面の予測

Explainable Earth Surface Forecasting under Extreme Events ( http://arxiv.org/abs/2410.01770v1 )

ライセンス: Link先を確認

Oscar J. Pellicer-Valero, Miguel-Ángel Fernández-Torres, Chaonan Ji, Miguel D. Mahecha, Gustau Camps-Valls,

(参考訳) 気候変動に関連する極端な出来事が増加する中、高次元地球観測データは生態系への影響を予測し理解するためのユニークな機会となる。しかし、これはデータの処理、可視化、モデリング、説明の複雑さによって妨げられている。この課題にどのように対処できるかを示すために、私たちは、新しいDeepExtremeCubesデータセットに基づいて、畳み込み長短期記憶ベースのアーキテクチャをトレーニングする。 DeepExtremeCubesには、極端な気候イベントの影響を受けた場所とその周辺地域からサンプリングされた、世界中の約4万件の長期Sentinel-2ミニキューブ(2016年1月～2022年10月)と、ラベル付けされた極端イベント、気象データ、植生の土地被覆、地形図が含まれている。カーネル正規化差分植生指標を用いて将来の反射率と植生への影響を予測すると、モデルはテストセットでR$^2$スコア0.9055を達成した。説明可能な人工知能は、2020年10月の中央南アメリカの複合熱波・干ばつイベントにおけるモデルの予測を分析するために使用された。反事実としてイベントのちょうど1年前の同じ地域を選んだところ, 平均気温と表面圧力が, 通常の条件下では概して最良の予測因子であることが判明した。対照的に、イベント中は蒸発と表面潜熱フラックスの最小異常が主導的な役割を果たす。イベント前の属性にもレジームの変化が見られ、イベントが発生するまでにどれだけの期間かけて進行していたかを評価するのに役立つかもしれない。この論文のすべての実験と図を再現するコードはhttps://github.com/DeepExtremes/txyXAIで公開されている。

With climate change-related extreme events on the rise, high dimensional Earth observation data presents a unique opportunity for forecasting and understanding impacts on ecosystems. This is, however, impeded by the complexity of processing, visualizing, modeling, and explaining this data. To showcase how this challenge can be met, here we train a convolutional long short-term memory-based architecture on the novel DeepExtremeCubes dataset. DeepExtremeCubes includes around 40,000 long-term Sentinel-2 minicubes (January 2016-October 2022) worldwide, along with labeled extreme events, meteorological data, vegetation land cover, and topography map, sampled from locations affected by extreme climate events and surrounding areas. When predicting future reflectances and vegetation impacts through kernel normalized difference vegetation index, the model achieved an R$^2$ score of 0.9055 in the test set. Explainable artificial intelligence was used to analyze the model's predictions during the October 2020 Central South America compound heatwave and drought event. We chose the same area exactly one year before the event as counterfactual, finding that the average temperature and surface pressure are generally the best predictors under normal conditions. In contrast, minimum anomalies of evaporation and surface latent heat flux take the lead during the event. A change of regime is also observed in the attributions before the event, which might help assess how long the event was brewing before happening. The code to replicate all experiments and figures in this paper is publicly available at https://github.com/DeepExtremes/txyXAI
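
The two quantities named in the abstract can be sketched as follows. The sketch assumes the commonly used simplified form kNDVI = tanh(NDVI²), which corresponds to an RBF kernel whose length scale is the mean of the two bands, and the standard coefficient of determination for R². The reflectance values are made up for illustration, and the paper's own pipeline may compute either quantity differently.

```python
import math

def ndvi(nir, red):
    """Normalized difference vegetation index from near-infrared and red reflectance."""
    return (nir - red) / (nir + red)

def kndvi(nir, red):
    """Kernel NDVI in its simplified form (assumption: sigma = (nir + red) / 2)."""
    return math.tanh(ndvi(nir, red) ** 2)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical reflectances for one pixel.
print(kndvi(nir=0.5, red=0.1))
print(r2_score([0.2, 0.4, 0.6], [0.25, 0.4, 0.55]))
```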

翻訳日:2024-11-04 15:24:19 公開日:2024-10-04

# 極端イベントによる説明可能な地球表面の予測

Explainable Earth Surface Forecasting under Extreme Events ( http://arxiv.org/abs/2410.01770v2 )

ライセンス: Link先を確認

Oscar J. Pellicer-Valero, Miguel-Ángel Fernández-Torres, Chaonan Ji, Miguel D. Mahecha, Gustau Camps-Valls,

翻訳日:2024-11-04 15:24:19 公開日:2024-10-04

# 言語モデルが推論に最適化されているとき、自動回帰のエンバーをまだ示しているのか? OpenAI o1の分析

When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 ( http://arxiv.org/abs/2410.01792v1 )

ライセンス: Link先を確認

R. Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, Thomas L. Griffiths,

(参考訳) 「Embers of Autoregression」(McCoy et al., 2023)において、我々は、いくつかの大規模言語モデル(LLM)が、次単語予測という起源に起因する重要な制限を持つことを示した。ここでは,従来のLLMと異なり推論に最適化されたOpenAIの新しいシステムであるo1について,これらの問題が持続するかどうかを検討する。多くの場合、o1は従来のLLMよりも大幅に優れており、特に一般的なタスクの希少な変種(例えば、最初の文字ではなく、リスト内の各単語の2番目の文字から頭字語を生成する)で大きく改善されている。しかし、これらの定量的改善にもかかわらず、o1は以前のシステムで観測したのと同じ定性的傾向を依然として示している。具体的には、従来のLLMと同様、o1は例やタスクの確率に敏感であり、高確率設定では低確率設定よりも性能が高く、必要な「思考トークン」も少ない。これらの結果は、推論のための言語モデルの最適化は言語モデルの確率感度を緩和できるが、完全には克服できない可能性があることを示している。

In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction. Here we investigate whether these issues persist with o1, a new system from OpenAI that differs from previous LLMs in that it is optimized for reasoning. We find that o1 substantially outperforms previous LLMs in many cases, with particularly large improvements on rare variants of common tasks (e.g., forming acronyms from the second letter of each word in a list, rather than the first letter). Despite these quantitative improvements, however, o1 still displays the same qualitative trends that we observed in previous systems. Specifically, o1 - like previous LLMs - is sensitive to the probability of examples and tasks, performing better and requiring fewer "thinking tokens" in high-probability settings than in low-probability ones. These results show that optimizing a language model for reasoning can mitigate but might not fully overcome the language model's probability sensitivity.
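
The common and rare task variants mentioned in the abstract differ only in which letter is taken from each word. A minimal sketch follows, with a hypothetical helper name and word list that are not taken from the paper:

```python
def acronym(words, position=0):
    """Join the letter at `position` (0 = first, 1 = second) of each word."""
    return "".join(word[position].upper() for word in words if len(word) > position)

words = ["orange", "apple", "kiwi"]
print(acronym(words))      # common variant: first letters
print(acronym(words, 1))   # rare variant: second letters
```

The two variants are equally simple to compute, which is why a performance gap between them points to sensitivity to task probability and not to task difficulty.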

翻訳日:2024-11-04 15:14:33 公開日:2024-10-04

R. Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, Thomas L. Griffiths,

(参考訳) 「Embers of Autoregression」(McCoy et al., 2023)において、我々は、いくつかの大規模言語モデル(LLM)が、次単語予測という起源に起因する重要な制限を持つことを示した。ここでは,従来のLLMと異なり推論に最適化されたOpenAIの新しいシステムであるo1について,これらの問題が持続するかどうかを検討する。多くの場合、o1は従来のLLMよりも大幅に優れており、特に一般的なタスクの希少な変種(例えば、最初の文字ではなく、リスト内の各単語の2番目の文字から頭字語を生成する)で大きく改善されている。しかし、これらの定量的改善にもかかわらず、o1は以前のシステムで観測したのと同じ定性的傾向を依然として示している。具体的には、従来のLLMと同様、o1は例やタスクの確率に敏感であり、高確率設定では低確率設定よりも性能が高く、必要な「思考トークン」も少ない。これらの結果は、推論のための言語モデルの最適化は言語モデルの確率感度を緩和できるが、完全には克服できない可能性があることを示している。

In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction. Here we investigate whether these issues persist with o1, a new system from OpenAI that differs from previous LLMs in that it is optimized for reasoning. We find that o1 substantially outperforms previous LLMs in many cases, with particularly large improvements on rare variants of common tasks (e.g., forming acronyms from the second letter of each word in a list, rather than the first letter). Despite these quantitative improvements, however, o1 still displays the same qualitative trends that we observed in previous systems. Specifically, o1 -- like previous LLMs -- is sensitive to the probability of examples and tasks, performing better and requiring fewer "thinking tokens" in high-probability settings than in low-probability ones. These results show that optimizing a language model for reasoning can mitigate but might not fully overcome the language model's probability sensitivity.

翻訳日:2024-11-04 15:14:33 公開日:2024-10-04

# Bayes-CATSI:医療時系列データ補完のための変分ベイズ的アプローチ

Bayes-CATSI: A variational Bayesian approach for medical time series data imputation ( http://arxiv.org/abs/2410.01847v1 )

ライセンス: Link先を確認

Omkar Kulkarni, Rohitash Chandra,

(参考訳) 医療時系列データセットには欠落値があり、データ補完手法を必要とするが、従来の機械学習モデルは予測における不確実性の定量化を欠くため不十分である。これらのモデルの中で、CATSI(Context-Aware Time Series Imputation)は、各患者のグローバルな依存関係を捉えるコンテキストベクトルを補完プロセスに組み込むことで、その有効性が際立っている。本稿では,変分推論による不確実性の定量化を活用するベイズ的コンテキスト認識時系列補完(Bayes-CATSI)フレームワークを提案する。脳波(EEG)、眼電図(EOG)、筋電図(EMG)、心電図(EKG)から得られる時系列を考察する。変分推論は事後分布の形状を仮定し、Kullback-Leibler(KL)ダイバージェンスを最小化することで、真の事後分布に最も近い変分密度を求める。そこで我々は,変分ベイズ深層学習層をCATSIモデルに統合した。その結果,Bayes-CATSIは不確実性の定量化を提供するだけでなく,CATSIモデルよりも優れた補完性能を達成することがわかった。具体的には、Bayes-CATSIのあるインスタンスはCATSIを9.57%上回っている。Bayes-CATSIを他の医療データ補完問題に適用するためのオープンソースコード実装を提供する。

Medical time series datasets feature missing values that need data imputation methods; however, conventional machine learning models fall short due to a lack of uncertainty quantification in predictions. Among these models, the CATSI (Context-Aware Time Series Imputation) stands out for its effectiveness by incorporating a context vector into the imputation process, capturing the global dependencies of each patient. In this paper, we propose a Bayesian Context-Aware Time Series Imputation (Bayes-CATSI) framework which leverages uncertainty quantification offered by variational inference. We consider the time series derived from electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), and electrocardiology (EKG). Variational inference assumes the shape of the posterior distribution and, through minimization of the Kullback-Leibler (KL) divergence, finds variational densities that are closest to the true posterior distribution. Thus, we integrate the variational Bayesian deep learning layers into the CATSI model. Our results show that Bayes-CATSI not only provides uncertainty quantification but also achieves superior imputation performance compared to the CATSI model. Specifically, an instance of Bayes-CATSI outperforms CATSI by 9.57%. We provide an open-source code implementation for applying Bayes-CATSI to other medical data imputation problems.
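
The KL term that variational inference minimizes has a closed form when both the variational posterior and the prior are diagonal Gaussians, which is a common choice for variational Bayesian layers. The sketch below assumes that Gaussian family; the abstract does not state which family Bayes-CATSI uses, and the function name is illustrative.

```python
import math

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for diagonal Gaussians, summed over independent dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total

# Variational posterior over two weights against a standard normal prior.
print(kl_diag_gaussians([0.3, -0.1], [0.5, 0.8], [0.0, 0.0], [1.0, 1.0]))
```

In a variational layer this quantity is typically added to the data-fit loss, so that the learned weight distributions stay close to the prior unless the data justify moving away from it.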

翻訳日:2024-11-04 14:34:44 公開日:2024-10-04

# Bayes-CATSI:医療時系列データ補完のための変分ベイズ深層学習フレームワーク

Bayes-CATSI: A variational Bayesian deep learning framework for medical time series data imputation ( http://arxiv.org/abs/2410.01847v2 )

ライセンス: Link先を確認

Omkar Kulkarni, Rohitash Chandra,

翻訳日:2024-11-04 14:34:44 公開日:2024-10-04

# コンテンツ・スタイル分解による視覚基盤モデルの半教師付き微調整

Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition ( http://arxiv.org/abs/2410.02069v1 )

ライセンス: Link先を確認

Mariia Drozdova, Vitaliy Kinakh, Yury Belousov, Erica Lastufka, Slava Voloshynovskiy,

(参考訳) 本稿では,限定ラベル付きデータを用いた下流タスクにおける基礎モデルの性能向上を目的とした,半教師付き微調整手法を提案する。情報理論フレームワーク内でのコンテンツスタイルの分解を利用して、事前学習された視覚基盤モデルの潜在表現を強化し、特定のタスク目標とより効果的に整合させ、分散シフトの問題に対処する。我々は、MNIST、その拡張されたバリエーション(黄色と白のストライプ)、CIFAR-10、SVHN、GalaxyMNISTを含む複数のデータセットに対するアプローチを評価した。実験は、純粋な教師付きベースライン、特に低ラベルのデータレギュレーションにおいて、テストされたデータセットの大部分に対して、凍結されたバックボーンとトレーニング可能なバックボーンの両方で改善されていることを示す。

In this paper, we present a semi-supervised fine-tuning approach designed to improve the performance of foundation models on downstream tasks with limited labeled data. By leveraging content-style decomposition within an information-theoretic framework, our method enhances the latent representations of pre-trained vision foundation models, aligning them more effectively with specific task objectives and addressing the problem of distribution shift. We evaluate our approach on multiple datasets, including MNIST, its augmented variations (with yellow and white stripes), CIFAR-10, SVHN, and GalaxyMNIST. The experiments show improvements over purely supervised baselines, particularly in low-labeled data regimes, across both frozen and trainable backbones for the majority of the tested datasets.

翻訳日:2024-11-04 09:05:40 公開日:2024-10-04

# コンテンツ・スタイル分解による視覚基盤モデルの半教師付き微調整

Semi-Supervised Fine-Tuning of Vision Foundation Models with Content-Style Decomposition ( http://arxiv.org/abs/2410.02069v2 )

ライセンス: Link先を確認

Mariia Drozdova, Vitaliy Kinakh, Yury Belousov, Erica Lastufka, Slava Voloshynovskiy,

(参考訳) 本稿では,ラベル付きデータに制限のある下流タスクにおいて,事前学習した基礎モデルの性能向上を目的とした半教師付き微調整手法を提案する。情報理論フレームワーク内でのコンテンツスタイルの分解を利用して、事前学習された視覚基盤モデルの潜在表現を強化し、特定のタスク目標とより効果的に整合させ、分散シフトの問題に対処する。我々は、MNIST、その拡張されたバリエーション(黄色と白のストライプ)、CIFAR-10、SVHN、GalaxyMNISTを含む複数のデータセットに対するアプローチを評価した。実験では、トレーニング済みモデルの教師付き微調整ベースライン、特に低ラベルのデータレギュレーションにおいて、テスト済みデータセットの大部分の凍結およびトレーニング可能なバックボーンに対して改善が示されている。

In this paper, we present a semi-supervised fine-tuning approach designed to improve the performance of pre-trained foundation models on downstream tasks with limited labeled data. By leveraging content-style decomposition within an information-theoretic framework, our method enhances the latent representations of pre-trained vision foundation models, aligning them more effectively with specific task objectives and addressing the problem of distribution shift. We evaluate our approach on multiple datasets, including MNIST, its augmented variations (with yellow and white stripes), CIFAR-10, SVHN, and GalaxyMNIST. The experiments show improvements over a supervised fine-tuning baseline of pre-trained models, particularly in low-labeled data regimes, across both frozen and trainable backbones for the majority of the tested datasets.

翻訳日:2024-11-04 09:05:40 公開日:2024-10-04

# EC-DIT:Adaptive Expert-Choice Routingによる拡散変換器のスケーリング

EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing ( http://arxiv.org/abs/2410.02098v1 )

ライセンス: Link先を確認

Haotian Sun, Bowen Zhang, Yanghao Li, Haoshuo Huang, Tao Lei, Ruoming Pang, Bo Dai, Nan Du,

(参考訳) 拡散トランスフォーマーはテキストから画像への合成に広く採用されている。これらのモデルを数十億のパラメータにスケールすることは有望であるが、現在のサイズを超えてスケールすることの有効性は十分に検討されておらず、依然として困難である。画像生成の計算的不均一性を明示的に活用することにより、エキスパート・チョイス・ルーティングを持つ拡散トランスフォーマーのための新しいMixture-of-Experts(MoE)モデル群(EC-DIT)を開発する。 EC-DITは、入力テキストを理解し、対応する画像パッチを生成するために割り当てられる計算を適応的に最適化することを学習し、テキストと画像のさまざまな複雑さに合わせた不均一な計算を可能にする。この不均一性は、EC-DITを最大970億のパラメータにスケーリングする効率的な方法を提供し、高密度モデルや従来のMoEモデルに比べて、トレーニング収束、テキストと画像のアライメント、および全体的な生成品質を大幅に向上させる。広範なアブレーションを通じて,EC-DITがエンド・ツー・エンドのトレーニングによりテキストの重要度の違いを認識することで,優れたスケーラビリティと適応的な計算割り当てを示すことを示す。特に,テキストと画像のアライメント評価では,我々の最大のモデルが最先端のGenEvalスコア71.68%を達成し,直感的な解釈可能性を備えつつ競争力のある推論速度を維持している。

Diffusion transformers have been widely adopted for text-to-image synthesis. While scaling these models up to billions of parameters shows promise, the effectiveness of scaling beyond current sizes remains underexplored and challenging. By explicitly exploiting the computational heterogeneity of image generations, we develop a new family of Mixture-of-Experts (MoE) models (EC-DIT) for diffusion transformers with expert-choice routing. EC-DIT learns to adaptively optimize the compute allocated to understand the input texts and generate the respective image patches, enabling heterogeneous computation aligned with varying text-image complexities. This heterogeneity provides an efficient way of scaling EC-DIT up to 97 billion parameters and achieving significant improvements in training convergence, text-to-image alignment, and overall generation quality over dense models and conventional MoE models. Through extensive ablations, we show that EC-DIT demonstrates superior scalability and adaptive compute allocation by recognizing varying textual importance through end-to-end training. Notably, in text-to-image alignment evaluation, our largest models achieve a state-of-the-art GenEval score of 71.68% and still maintain competitive inference speed with intuitive interpretability.
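
Expert-choice routing, as generally described in the MoE literature, lets each expert pick its top-scoring tokens, so the number of experts that process a given token can vary. The sketch below illustrates that general idea only; the logits, the capacity, and the function names are hypothetical, and EC-DIT's actual router is not specified in the abstract.

```python
import math

def softmax(row):
    """Numerically stable softmax over one token's expert logits."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def expert_choice_route(logits, capacity):
    """logits[t][e] is the affinity of token t for expert e.

    Returns, for each expert, the sorted indices of the `capacity` tokens it selects.
    """
    scores = [softmax(row) for row in logits]
    num_tokens, num_experts = len(logits), len(logits[0])
    assignment = []
    for e in range(num_experts):
        ranked = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        assignment.append(sorted(ranked[:capacity]))
    return assignment

# Three tokens, two experts, each expert takes two tokens.
logits = [[2.0, 0.0], [0.0, 2.0], [0.3, 0.0]]
print(expert_choice_route(logits, capacity=2))
```

With these made-up logits, token 2 is selected by both experts while tokens 0 and 1 are each selected by one, which is the kind of uneven compute allocation the abstract refers to.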

翻訳日:2024-11-04 08:55:37 公開日:2024-10-04

Haotian Sun, Tao Lei, Bowen Zhang, Yanghao Li, Haoshuo Huang, Ruoming Pang, Bo Dai, Nan Du,

翻訳日:2024-11-04 08:45:48 公開日:2024-10-04

# L-CiteEval: ロングコンテキストモデルは応答するコンテキストを真に活用するのか?

L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? ( http://arxiv.org/abs/2410.02115v1 )

ライセンス: Link先を確認

Zecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, Min Zhang,

(参考訳) 近年、LCM(Long-context Model)は顕著な進歩を遂げ、文書要約などの長いコンテキストを含むタスクを扱ううえで大きな利便性をユーザに提供している。コミュニティが生成結果の忠実さをますます重視するにつれて、極めて長いコンテキストから人間が結果を検証することは非常に困難であるため、LCM出力の正確性を保証するだけでは不十分である。しかし、LCMが真にコンテキストに基づいて応答しているかどうかを評価する取り組みはいくつかあるものの、これらの研究は特定のタスクに限定されているか、GPT-4のような外部評価リソースに大きく依存している。本研究では,LCMの理解能力と忠実さの両方を評価することを目的とした,引用付き長文コンテキスト理解のための包括的なマルチタスクベンチマークであるL-CiteEvalを紹介する。 L-CiteEvalは、さまざまなドメインの11のタスクをカバーし、コンテキスト長は8Kから48Kに及び、完全に自動化された評価スイートを提供する。 11個の最先端のクローズドソースおよびオープンソースLCMを用いてテストした結果、これらのモデルは生成された結果に小さな違いしか示さないものの、オープンソースモデルは引用精度とリコールの点でクローズドソースモデルよりもかなり遅れていることがわかった。これは、現在のオープンソースのLCMは、与えられたコンテキストよりも、その固有の知識に基づいて応答する傾向があり、実用的なアプリケーションにおけるユーザエクスペリエンスに重大なリスクを及ぼすことを示唆している。また,RAGアプローチを評価し,生成品質はわずかに低下するものの,RAGはLCMの忠実さを著しく向上させることができることを観察した。さらに,LCMの注意機構と引用生成過程の相関関係を見いだした。

Long-context models (LCMs) have made remarkable strides in recent years, offering users great convenience for handling tasks that involve long context, such as document summarization. As the community increasingly prioritizes the faithfulness of generated results, merely ensuring the accuracy of LCM outputs is insufficient, as it is quite challenging for humans to verify the results from the extremely lengthy context. Yet, although some efforts have been made to assess whether LCMs respond truly based on the context, these works either are limited to specific tasks or heavily rely on external evaluation resources like GPT-4. In this work, we introduce L-CiteEval, a comprehensive multi-task benchmark for long-context understanding with citations, aiming to evaluate both the understanding capability and faithfulness of LCMs. L-CiteEval covers 11 tasks from diverse domains, spanning context lengths from 8K to 48K, and provides a fully automated evaluation suite. Through testing with 11 cutting-edge closed-source and open-source LCMs, we find that although these models show minor differences in their generated results, open-source models substantially trail behind their closed-source counterparts in terms of citation accuracy and recall. This suggests that current open-source LCMs are prone to responding based on their inherent knowledge rather than the given context, posing a significant risk to the user experience in practical applications. We also evaluate the RAG approach and observe that RAG can significantly improve the faithfulness of LCMs, albeit with a slight decrease in the generation quality. Furthermore, we discover a correlation between the attention mechanisms of LCMs and the citation generation process.

翻訳日:2024-11-04 08:45:48 公開日:2024-10-04

# L-CiteEval: ロングコンテキストモデルは応答するコンテキストを真に活用するのか?

L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? ( http://arxiv.org/abs/2410.02115v2 )

ライセンス: Link先を確認

Zecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, Min Zhang,

(参考訳) 近年、LCM(Long-context Model)は顕著な進歩を遂げ、文書要約などの長いコンテキストを含むタスクを扱ううえで大きな利便性をユーザに提供している。コミュニティが生成結果の忠実さをますます重視するにつれて、極めて長いコンテキストから人間が結果を検証することは非常に困難であるため、LCM出力の正確性を保証するだけでは不十分である。しかし、LCMが真にコンテキストに基づいて応答しているかどうかを評価する取り組みはいくつかあるものの、これらの研究は特定のタスクに限定されているか、GPT-4のような外部評価リソースに大きく依存している。本研究では,LCMの理解能力と忠実さの両方を評価することを目的とした,引用付き長文コンテキスト理解のための包括的なマルチタスクベンチマークであるL-CiteEvalを紹介する。 L-CiteEvalは、さまざまなドメインの11のタスクをカバーし、コンテキスト長は8Kから48Kに及び、完全に自動化された評価スイートを提供する。 11個の最先端のクローズドソースおよびオープンソースLCMを用いてテストした結果、これらのモデルは生成された結果に小さな違いしか示さないものの、オープンソースモデルは引用精度とリコールの点でクローズドソースモデルよりもかなり遅れていることがわかった。これは、現在のオープンソースのLCMは、与えられたコンテキストよりも、その固有の知識に基づいて応答する傾向があり、実用的なアプリケーションにおけるユーザエクスペリエンスに重大なリスクを及ぼすことを示唆している。また,RAGアプローチを評価し,生成品質はわずかに低下するものの,RAGはLCMの忠実さを著しく向上させることができることを観察した。さらに,LCMの注意機構と引用生成過程の相関関係を見いだした。

Long-context models (LCMs) have made remarkable strides in recent years, offering users great convenience for handling tasks that involve long context, such as document summarization. As the community increasingly prioritizes the faithfulness of generated results, merely ensuring the accuracy of LCM outputs is insufficient, as it is quite challenging for humans to verify the results from the extremely lengthy context. Yet, although some efforts have been made to assess whether LCMs respond truly based on the context, these works either are limited to specific tasks or heavily rely on external evaluation resources like GPT-4. In this work, we introduce L-CiteEval, a comprehensive multi-task benchmark for long-context understanding with citations, aiming to evaluate both the understanding capability and faithfulness of LCMs. L-CiteEval covers 11 tasks from diverse domains, spanning context lengths from 8K to 48K, and provides a fully automated evaluation suite. Through testing with 11 cutting-edge closed-source and open-source LCMs, we find that although these models show minor differences in their generated results, open-source models substantially trail behind their closed-source counterparts in terms of citation accuracy and recall. This suggests that current open-source LCMs are prone to responding based on their inherent knowledge rather than the given context, posing a significant risk to the user experience in practical applications. We also evaluate the RAG approach and observe that RAG can significantly improve the faithfulness of LCMs, albeit with a slight decrease in the generation quality. Furthermore, we discover a correlation between the attention mechanisms of LCMs and the citation generation process.
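
To make the citation metrics concrete, one simple reading treats citations as sets of passage identifiers and computes precision and recall against a gold set. This set-based definition is an assumption made for illustration; the abstract does not spell out how L-CiteEval scores citations, and the passage ids below are invented.

```python
def citation_precision_recall(cited, gold):
    """Set-based precision and recall of cited passage ids against gold ids."""
    cited, gold = set(cited), set(gold)
    hits = len(cited & gold)
    precision = hits / len(cited) if cited else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

# A response cites passages 1, 3 and 5; passages 1 to 4 actually support it.
print(citation_precision_recall([1, 3, 5], [1, 2, 3, 4]))
```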

翻訳日:2024-11-04 08:45:48 公開日:2024-10-04

# 構造行列連続空間上の効率的な線形層探索

Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices ( http://arxiv.org/abs/2410.02117v1 )

ライセンス: Link先を確認

Andres Potapczynski, Shikai Qiu, Marc Finzi, Christopher Ferri, Zixi Chen, Micah Goldblum, Bayan Bruss, Christopher De Sa, Andrew Gordon Wilson,

(参考訳) 密な線形層は、大規模ニューラルネットワークにおいて支配的な計算ボトルネックであり、より効率的な代替手段が強く求められている。従来の取り組みは少数の手作りの構造化行列に焦点を当てており、モデルサイズとトレーニング例の両方を最適に割り当てたときに、これらの構造が計算最適スケーリング則の観点で密な層を上回れるかどうかを調査していなかった。本研究では,アインシュタイン和を通じて表現可能なすべての線形作用素の探索を可能にする統一フレームワークを提案する。このフレームワークは、低ランク、クロネッカー、Tensor-Train、Block Tensor-Train(BTT)、Monarchなど、これまでに提案された多くの構造に加え、多くの新しい構造を包含する。この枠組みを解析するために、計算的および代数的特性に基づくこうした作用素すべての分類を開発し、計算最適スケーリング則の違いは、我々が導入する少数の変数によって主に支配されていることを示す。すなわち、小さな$\omega$(パラメータの共有を測る)と大きな$\psi$(ランクを測る)は、確実にスケーリング則の改善につながった。計算単位あたりのパラメータを最大化するフルランク構造が最良の性能を示すという知見に導かれて,BTT構造における計算のスパース化によって得られる新しいMixture-of-Experts (MoE)アーキテクチャであるBTT-MoEを提案する。フィードフォワードネットワーク全体ごとにMoEを適用する標準的なスパースMoEとは対照的に、BTT-MoEは、アテンションブロック内の射影行列を含むモデルのすべての線形層でMoEを学習する。 BTT-MoEは密な層や標準的なMoEに比べて計算効率が大幅に向上することがわかった。

Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are optimally allocated. In this work, we present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses many previously proposed structures, such as low-rank, Kronecker, Tensor-Train, Block Tensor-Train (BTT), and Monarch, along with many novel structures. To analyze the framework, we develop a taxonomy of all such operators based on their computational and algebraic properties and show that differences in the compute-optimal scaling laws are mostly governed by a small number of variables that we introduce. Namely, a small $\omega$ (which measures parameter sharing) and large $\psi$ (which measures the rank) reliably led to better scaling laws. Guided by the insight that full-rank structures that maximize parameters per unit of compute perform the best, we propose BTT-MoE, a novel Mixture-of-Experts (MoE) architecture obtained by sparsifying computation in the BTT structure. In contrast to the standard sparse MoE for each entire feed-forward network, BTT-MoE learns an MoE in every single linear layer of the model, including the projection matrices in the attention blocks. We find BTT-MoE provides a substantial compute-efficiency gain over dense layers and standard MoE.
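
As one concrete instance of a structured linear layer expressible as an Einstein summation, the sketch below applies a Kronecker-structured matrix A ⊗ B to a vector without materializing the dense matrix, which corresponds to the contraction `ij,kl,jl->ik`. It is written in plain Python for clarity and is not taken from the paper; the function names and the small matrices are illustrative.

```python
def kron_matvec(A, B, x):
    """Compute (A kron B) @ x using two small contractions."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    X = [x[j * q:(j + 1) * q] for j in range(n)]  # reshape x to n x q
    # T[j][k] = sum_l B[k][l] * X[j][l]
    T = [[sum(B[k][l] * X[j][l] for l in range(q)) for k in range(p)] for j in range(n)]
    # Y[i][k] = sum_j A[i][j] * T[j][k]
    Y = [[sum(A[i][j] * T[j][k] for j in range(n)) for k in range(p)] for i in range(m)]
    return [Y[i][k] for i in range(m) for k in range(p)]

def dense_kron(A, B):
    """Materialize A kron B, used here only to check the structured version."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
x = [1, 2, 3, 4]
print(kron_matvec(A, B, x))
```

The structured product costs n·p·q + m·n·p multiplications and stores m·n + p·q parameters, compared with m·p·n·q for both in the dense case.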

翻訳日:2024-11-04 08:45:48 公開日:2024-10-04

# 構造行列連続空間上の効率的な線形層探索

Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices ( http://arxiv.org/abs/2410.02117v2 )

ライセンス: Link先を確認

Andres Potapczynski, Shikai Qiu, Marc Finzi, Christopher Ferri, Zixi Chen, Micah Goldblum, Bayan Bruss, Christopher De Sa, Andrew Gordon Wilson,

翻訳日:2024-11-04 08:45:48 公開日:2024-10-04

# C-MELT:ECG-Language事前学習のためのコントラスト強化マスク付きオートエンコーダ

C-MELT: Contrastive Enhanced Masked Auto-Encoders for ECG-Language Pre-Training ( http://arxiv.org/abs/2410.02131v1 )

ライセンス: Link先を確認

Manh Pham, Aaqib Saeed, Dong Ma,

Accurate interpretation of electrocardiogram (ECG) signals is pivotal for diagnosing cardiovascular diseases. Integrating ECG signals with their accompanying textual reports holds immense potential to enhance clinical diagnostics by combining physiological data with qualitative insights. However, this integration faces significant challenges due to inherent modality disparities and the scarcity of labeled data for robust cross-modal learning. To address these obstacles, we propose C-MELT, a novel framework that pre-trains on ECG and text data using a contrastive masked auto-encoder architecture. C-MELT uniquely combines the strengths of generative modeling with enhanced discriminative capabilities to achieve robust cross-modal representations. This is accomplished through masked modality modeling, specialized loss functions, and an improved negative sampling strategy tailored for cross-modal alignment. Extensive experiments on five public datasets across diverse downstream tasks demonstrate that C-MELT significantly outperforms existing methods, achieving 15% and 2% increases in linear probing and zero-shot performance over state-of-the-art models, respectively. These results highlight the effectiveness of C-MELT, underscoring its potential to advance automated clinical diagnostics through multi-modal representations.
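The contrastive side of such cross-modal alignment can be illustrated with a generic symmetric InfoNCE objective. The sketch below is an assumption-laden stand-in, not the specialized loss or negative sampling strategy of C-MELT: the function names, the temperature of 0.1, and the two-dimensional toy embeddings are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean cross-entropy of matching anchors[i] to positives[i] in the batch."""
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

def symmetric_contrastive_loss(ecg, text, temperature=0.1):
    """Average of the ECG-to-text and text-to-ECG InfoNCE terms."""
    return 0.5 * (info_nce(ecg, text, temperature) + info_nce(text, ecg, temperature))

# Hypothetical embeddings: row i of each list describes the same recording.
ecg_emb = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.2]]
text_emb = [[0.9, 0.0], [0.1, 1.1], [-0.8, 0.1]]

aligned = symmetric_contrastive_loss(ecg_emb, text_emb)
shuffled = symmetric_contrastive_loss(ecg_emb, text_emb[1:] + text_emb[:1])
```

The loss is low when each ECG embedding sits closest to its own report and rises when the pairing is shuffled, which is the signal a contrastive pre-training stage optimizes.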

Translated: 2024-11-04 08:35:44 Published: 2024-10-04


# From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities ( http://arxiv.org/abs/2410.02155v1 )

License: see the linked page

Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu,

Multimodal Large Language Models have made significant strides in integrating visual and textual information, yet they often struggle with effectively aligning these modalities. We introduce a novel image tokenizer that bridges this gap by applying the principle of Byte-Pair Encoding (BPE) to visual data. Unlike conventional approaches that rely on separate visual encoders, our method directly incorporates structural prior information into image tokens, mirroring the successful tokenization strategies used in text-only Large Language Models. This innovative approach enables Transformer models to more effectively learn and reason across modalities. Through theoretical analysis and extensive experiments, we demonstrate that our BPE Image Tokenizer significantly enhances MLLMs' multimodal understanding capabilities, even with limited training data. Our method not only improves performance across various benchmarks but also shows promising scalability, potentially paving the way for more efficient and capable multimodal foundation models.
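The BPE principle referred to here can be sketched on a flat sequence of quantized codebook indices. This is a deliberately simplified one-dimensional illustration: the paper works with image tokens, whereas the function names, the toy sequence, and the choice of 100 as the first merged ID below are assumptions.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Return the most common adjacent pair; ties are broken by pair value."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=lambda pair: (pairs[pair], pair))

def merge_pair(tokens, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def learn_bpe(tokens, num_merges, first_new_id):
    """Greedily learn `num_merges` merges; returns (tokens, merge table)."""
    merges = {}
    for step in range(num_merges):
        if len(tokens) < 2:
            break
        pair = most_frequent_pair(tokens)
        new_id = first_new_id + step
        merges[pair] = new_id
        tokens = merge_pair(tokens, pair, new_id)
    return tokens, merges

# Hypothetical quantized codes for a short run of image patches.
codes = [1, 2, 1, 2, 3, 1, 2]
compressed, merges = learn_bpe(codes, num_merges=2, first_new_id=100)
```

Each merge shortens the sequence by fusing a frequently co-occurring pair into one token, so recurring visual patterns end up represented by single IDs.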

Translated: 2024-11-04 08:25:54 Published: 2024-10-04

# From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities ( http://arxiv.org/abs/2410.02155v2 )

License: see the linked page

Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu,

Translated: 2024-11-04 08:25:54 Published: 2024-10-04

# POSIX: A Prompt Sensitivity Index For Large Language Models

POSIX: A Prompt Sensitivity Index For Large Language Models ( http://arxiv.org/abs/2410.02185v1 )

License: see the linked page

Anwoy Chatterjee, H S V N S Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty,

Despite their remarkable capabilities, Large Language Models (LLMs) are found to be surprisingly sensitive to minor variations in prompts, such as spelling errors, altered wording, or a different prompt template, and often generate significantly divergent outputs in response. However, while assessing the quality of an LLM, the focus often tends to be solely on its performance on downstream tasks, while very little to no attention is paid to prompt sensitivity. To fill this gap, we propose POSIX, a novel PrOmpt Sensitivity IndeX, as a reliable measure of prompt sensitivity, thereby offering a more comprehensive evaluation of LLM performance. The key idea behind POSIX is to capture the relative change in log-likelihood of a given response upon replacing the corresponding prompt with a different intent-preserving prompt. We provide thorough empirical evidence demonstrating the efficacy of POSIX in capturing prompt sensitivity and subsequently use it to measure and thereby compare the prompt sensitivity of various open-source LLMs. We find that merely increasing the parameter count or instruction tuning does not necessarily reduce prompt sensitivity, whereas adding few-shot exemplars, even just one, almost always leads to a significant decrease in prompt sensitivity. We also find that alterations to the prompt template lead to the highest sensitivity in the case of MCQ-type tasks, whereas paraphrasing results in the highest sensitivity in open-ended generation tasks. The code for reproducing our results is open-sourced at https://github.com/kowndinyarenduchintala/POSIX.
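One plausible reading of this key idea is sketched below: for a set of intent-preserving prompts, compare the log-likelihood of each response under its own prompt with its log-likelihood under every other prompt. The abstract does not give the exact formula, so the normalisation (by response length and by the number of ordered prompt pairs), the function name, and the toy numbers are assumptions.

```python
def prompt_sensitivity(loglik, response_lengths):
    """Average length-normalised change in response log-likelihood.

    loglik[i][j] is assumed to hold log P(response_j | prompt_i), where
    response_j is the response originally generated from prompt_j.
    """
    n = len(loglik)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # How much does swapping prompt_j for prompt_i move response_j?
            total += abs(loglik[i][j] - loglik[j][j]) / response_lengths[j]
    return total / (n * (n - 1))

# Hypothetical log-likelihoods for two intent-preserving prompt variants.
stable = [[-10.0, -10.0], [-10.0, -10.0]]
sensitive = [[-10.0, -30.0], [-25.0, -10.0]]
lengths = [5, 5]

stable_score = prompt_sensitivity(stable, lengths)
sensitive_score = prompt_sensitivity(sensitive, lengths)
```

A model whose response likelihoods barely move when the prompt is swapped scores near zero, while larger values indicate higher prompt sensitivity.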

Translated: 2024-11-04 08:15:54 Published: 2024-10-04

# POSIX: A Prompt Sensitivity Index For Large Language Models

POSIX: A Prompt Sensitivity Index For Large Language Models ( http://arxiv.org/abs/2410.02185v2 )

License: see the linked page

Anwoy Chatterjee, H S V N S Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty,

Despite their remarkable capabilities, Large Language Models (LLMs) are found to be surprisingly sensitive to minor variations in prompts, such as spelling errors, altered wording, or a different prompt template, and often generate significantly divergent outputs in response. However, while assessing the quality of an LLM, the focus often tends to be solely on its performance on downstream tasks, while very little to no attention is paid to prompt sensitivity. To fill this gap, we propose POSIX, a novel PrOmpt Sensitivity IndeX, as a reliable measure of prompt sensitivity, thereby offering a more comprehensive evaluation of LLM performance. The key idea behind POSIX is to capture the relative change in log-likelihood of a given response upon replacing the corresponding prompt with a different intent-preserving prompt. We provide thorough empirical evidence demonstrating the efficacy of POSIX in capturing prompt sensitivity and subsequently use it to measure and thereby compare the prompt sensitivity of various open-source LLMs. We find that merely increasing the parameter count or instruction tuning does not necessarily reduce prompt sensitivity, whereas adding few-shot exemplars, even just one, almost always leads to a significant decrease in prompt sensitivity. We also find that alterations to the prompt template lead to the highest sensitivity in the case of MCQ-type tasks, whereas paraphrasing results in the highest sensitivity in open-ended generation tasks. The code for reproducing our results is open-sourced at https://github.com/kowndinya-renduchintala/POSIX.

Translated: 2024-11-04 08:15:54 Published: 2024-10-04

# Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation

Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation ( http://arxiv.org/abs/2410.02220v1 )

License: see the linked page

Xiaoqun Liu, Jiacheng Liang, Luoxi Tang, Chenyu You, Muchao Ye, Zhaohan Xi,

Large language models (LLMs) are extensively adapted for downstream applications through a process known as "customization," with fine-tuning being a common method for integrating domain-specific expertise. However, recent studies have revealed a vulnerability: tuning LLMs with malicious samples can compromise their robustness and amplify harmful content, an attack known as "jailbreaking." To mitigate such attacks, we propose an effective defensive framework that uses data curation to revise commonsense texts and enhance their safety implication from the perspective of LLMs. The curated texts can mitigate jailbreaking attacks at every stage of the customization process: before customization to immunize LLMs against future jailbreak attempts, during customization to neutralize jailbreaking risks, or after customization to restore the compromised models. Since the curated data strengthens LLMs through the standard fine-tuning workflow, we do not introduce additional modules during LLM inference, thereby preserving the original customization process. Experimental results demonstrate a substantial reduction in jailbreaking effects, with up to a 100% success rate in generating responsible responses. Notably, our method is effective even with commonsense texts, which are often more readily available than safety-relevant data. With the every-stage defensive framework and supporting experimental performance, this work represents a significant advancement in mitigating jailbreaking risks and ensuring the secure customization of LLMs.

Translated: 2024-11-04 07:55:57 Published: 2024-10-04

# Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation

Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation ( http://arxiv.org/abs/2410.02220v2 )

License: see the linked page

Xiaoqun Liu, Jiacheng Liang, Luoxi Tang, Chenyu You, Muchao Ye, Zhaohan Xi,

Translated: 2024-11-04 07:55:57 Published: 2024-10-04

# Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition

Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition ( http://arxiv.org/abs/2410.02281v1 )

License: see the linked page

Arthur Amalvy, Vincent Labatut,

The Novelties corpus is a collection of novels (and parts of novels) annotated for Named Entity Recognition (NER) among other tasks. This document describes the guidelines applied during its annotation. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels, illustrating expressions that should be marked as entities as well as expressions that should not.

Translated: 2024-11-04 07:36:05 Published: 2024-10-04

# On Lai's Upper Confidence Bound in Multi-Armed Bandits

On Lai's Upper Confidence Bound in Multi-Armed Bandits ( http://arxiv.org/abs/2410.02279v1 )

License: see the linked page

Huachen Ren, Cun-Hui Zhang,

In this memorial paper, we honor Tze Leung Lai's seminal contributions to the topic of multi-armed bandits, with a specific focus on his pioneering work on the upper confidence bound. We establish sharp non-asymptotic regret bounds for an upper confidence bound index with a constant level of exploration for Gaussian rewards. Furthermore, we establish a non-asymptotic regret bound for the upper confidence bound index of Lai (1987), which employs an exploration function that decreases with the sample size of the corresponding arm. The regret bounds have leading constants that match the Lai-Robbins lower bound. Our results highlight an aspect of Lai's seminal works that deserves more attention in the machine learning literature.
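For readers unfamiliar with index policies, the sketch below runs a generic UCB rule with a constant exploration level on Gaussian rewards. It is a textbook UCB1-style index used only as an illustration. It is not the index of Lai (1987) analysed in the paper, and the exploration constant, arm means, horizon, and seed are assumptions.

```python
import math
import random

def ucb_index(mean, pulls, t, exploration=2.0):
    """Empirical mean plus a confidence bonus that shrinks as pulls grow."""
    return mean + math.sqrt(exploration * math.log(t) / pulls)

def run_ucb(true_means, horizon, seed=0):
    """Play a Gaussian bandit (unit variance) and return pulls per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise its mean
        else:
            arm = max(
                range(k),
                key=lambda a: ucb_index(sums[a] / pulls[a], pulls[a], t),
            )
        reward = rng.gauss(true_means[arm], 1.0)
        pulls[arm] += 1
        sums[arm] += reward
    return pulls

pulls = run_ucb([0.0, 1.0], horizon=2000)
```

The bonus term keeps rarely pulled arms under consideration, so the suboptimal arm is sampled only occasionally while most of the budget goes to the better arm. Regret bounds of the kind the abstract describes quantify how small that exploration cost can be.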

Translated: 2024-11-04 04:12:15 Published: 2024-10-04

# On Lai's Upper Confidence Bound in Multi-Armed Bandits

On Lai's Upper Confidence Bound in Multi-Armed Bandits ( http://arxiv.org/abs/2410.02279v2 )

License: see the linked page

Huachen Ren, Cun-Hui Zhang,

In this memorial paper, we honor Tze Leung Lai's seminal contributions to the topic of multi-armed bandits, with a specific focus on his pioneering work on the upper confidence bound. We establish sharp non-asymptotic regret bounds for an upper confidence bound index with a constant level of exploration for Gaussian rewards. Furthermore, we establish a non-asymptotic regret bound for the upper confidence bound index of Lai (1987), which employs an exploration function that decreases with the sample size of the corresponding arm. The regret bounds have leading constants that match the Lai-Robbins lower bound. Our results highlight an aspect of Lai's seminal works that deserves more attention in the machine learning literature.

Translated: 2024-11-04 04:12:15 Published: 2024-10-04

# Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition

Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition ( http://arxiv.org/abs/2410.02281v2 )

License: see the linked page

Arthur Amalvy, Vincent Labatut,

Translated: 2024-11-04 04:12:15 Published: 2024-10-04

# Decoupling Layout from Glyph in Online Chinese Handwriting Generation

Decoupling Layout from Glyph in Online Chinese Handwriting Generation ( http://arxiv.org/abs/2410.02309v1 )

License: see the linked page

Ren-Min Si, Yan-Ming Zhang, Yi Chen,

Text plays a crucial role in the transmission of human civilization, and teaching machines to generate online handwritten text in various styles presents an interesting and significant challenge. However, most prior work has concentrated on generating individual Chinese fonts, leaving complete text line generation largely unexplored. In this paper, we identify that text lines can naturally be divided into two components: layout and glyphs. Based on this division, we designed a text line layout generator coupled with a diffusion-based stylized font synthesizer to address this challenge hierarchically. More concretely, the layout generator performs in-context-like learning based on the text content and the provided style references to generate positions for each glyph autoregressively. Meanwhile, the font synthesizer, which consists of a character embedding dictionary, a multi-scale calligraphy style encoder, and a 1D U-Net based diffusion denoiser, generates each font at its position while imitating the calligraphy style extracted from the given style references. Qualitative and quantitative experiments on the CASIA-OLHWDB demonstrate that our method is capable of generating structurally correct and indistinguishable imitation samples.

Translated: 2024-11-04 04:00:02 Published: 2024-10-04

# Decoupling Layout from Glyph in Online Chinese Handwriting Generation

Decoupling Layout from Glyph in Online Chinese Handwriting Generation ( http://arxiv.org/abs/2410.02309v2 )

License: see the linked page

Min-Si Ren, Yan-Ming Zhang, Yi Chen,

Translated: 2024-11-04 04:00:02 Published: 2024-10-04

# How Much Can RAG Help the Reasoning of LLM?

How Much Can RAG Help the Reasoning of LLM? ( http://arxiv.org/abs/2410.02338v1 )

License: see the linked page

Jingyu Liu, Jiaen Lin, Yong Liu,

Retrieval-Augmented Generation (RAG) has gained significant popularity in modern Large Language Models (LLMs) due to its effectiveness in introducing new knowledge and reducing hallucinations. However, a deep understanding of RAG remains limited: how RAG helps the reasoning process, and whether it can improve reasoning capability, remain open questions. While external documents are typically considered a means of incorporating domain-specific information, they also contain intermediate reasoning results related to the query. This suggests that documents could enhance the reasoning capability of LLMs, which has not been previously explored. In this paper, we investigate this issue in depth and find that while RAG can assist with reasoning, the help is limited. If we conceptualize the reasoning process as a tree with fixed depth, then RAG struggles to assist LLMs in performing deeper reasoning. Additionally, the information in the documents requires preprocessing to filter out noise. We demonstrate that this preprocessing is difficult to achieve by simply fine-tuning the LLM; it often necessitates numerous additional transformer layers to solve the problem. To simplify the problem, we propose DPrompt tuning, which effectively resolves the issue within just a limited number of transformer layers, leading to improved performance.

Translated: 2024-11-04 03:50:17 Published: 2024-10-04

# How Much Can RAG Help the Reasoning of LLM?

How Much Can RAG Help the Reasoning of LLM? ( http://arxiv.org/abs/2410.02338v2 )

License: see the linked page

Jingyu Liu, Jiaen Lin, Yong Liu,

Translated: 2024-11-04 03:50:17 Published: 2024-10-04

# IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models

IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models ( http://arxiv.org/abs/2410.02429v1 )

License: see the linked page

Tuo An, Yunjiao Zhou, Han Zou, Jianfei Yang,

Large Language Models (LLMs) have demonstrated remarkable capabilities across textual and visual domains but often generate outputs that violate physical laws, revealing a gap in their understanding of the physical world. Inspired by human cognition, where perception is fundamental to reasoning, we explore augmenting LLMs with enhanced perception abilities using Internet of Things (IoT) sensor data and pertinent knowledge for IoT task reasoning in the physical world. In this work, we systematically study LLMs' capability to address real-world IoT tasks by augmenting their perception and knowledge base, and then propose a unified framework, IoT-LLM, to enhance such capability. In IoT-LLM, we customize three steps for LLMs: preprocessing IoT data into formats amenable to LLMs, activating their commonsense knowledge through chain-of-thought prompting and specialized role definitions, and expanding their understanding via IoT-oriented retrieval-augmented generation based on in-context learning. To evaluate the performance, we design a new benchmark with five real-world IoT tasks with different data types and reasoning difficulties and provide the benchmarking results on six open-source and closed-source LLMs. Experimental results demonstrate the limitations of existing LLMs with naive textual inputs, which cannot perform these tasks effectively. We show that IoT-LLM significantly enhances the IoT task reasoning performance of LLMs such as GPT-4, achieving an average improvement of 65% across various tasks against previous methods. The results also showcase LLMs' ability to comprehend IoT data and the physical laws behind the data by providing a reasoning process. We state the limitations of our work to inspire future research in this new era.

Translated: 2024-11-04 03:20:51 Published: 2024-10-04

# IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models

IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models ( http://arxiv.org/abs/2410.02429v2 )

License: see the linked page

Tuo An, Yunjiao Zhou, Han Zou, Jianfei Yang,

Translated: 2024-11-04 03:20:51 Published: 2024-10-04

# MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation

MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation ( http://arxiv.org/abs/2410.02458v1 )

License: see the linked page

Gurucharan Marthi Krishna Kumar, Aman Chadha, Janine Mendola, Amir Shmuel,

Large Language Models (LLMs), known for their versatility in textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study explores enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs
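The two overlap metrics quoted here follow standard definitions and can be computed from binary masks as in the sketch below. The flattened toy masks and the function name are assumptions for illustration, and the convention of returning 1.0 when both masks are empty is one common choice rather than something the abstract specifies.

```python
def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two flattened binary masks of equal length."""
    intersection = sum(p * t for p, t in zip(pred, target))
    pred_sum, target_sum = sum(pred), sum(target)
    union = pred_sum + target_sum - intersection
    if union == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    dice = 2.0 * intersection / (pred_sum + target_sum)
    jaccard = intersection / union
    return dice, jaccard

# Hypothetical 4-pixel masks: one pixel overlaps, two pixels disagree.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 1, 0]
dice, jaccard = dice_and_jaccard(pred_mask, true_mask)
```

For a single pair of masks the two scores are linked by J = D / (2 - D), so an improvement in one implies an improvement in the other. Averages over a dataset need not satisfy this identity exactly.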

Translated: 2024-11-04 03:11:05 Published: 2024-10-04


# Diffusion Models are Evolutionary Algorithms

Diffusion Models are Evolutionary Algorithms ( http://arxiv.org/abs/2410.02543v1 )

License: see the linked page

Yanbo Zhang, Benedikt Hartl, Hananel Hazan, Michael Levin,

In a convergence of machine learning and biology, we reveal that diffusion models are evolutionary algorithms. By considering evolution as a denoising process and reversed evolution as diffusion, we mathematically demonstrate that diffusion models inherently perform evolutionary algorithms, naturally encompassing selection, mutation, and reproductive isolation. Building on this equivalence, we propose the Diffusion Evolution method: an evolutionary algorithm utilizing iterative denoising (as originally introduced in the context of diffusion models) to heuristically refine solutions in parameter spaces. Unlike traditional approaches, Diffusion Evolution efficiently identifies multiple optimal solutions and outperforms prominent mainstream evolutionary algorithms. Furthermore, leveraging advanced concepts from diffusion models, namely latent space diffusion and accelerated sampling, we introduce Latent Space Diffusion Evolution, which finds solutions for evolutionary tasks in high-dimensional complex parameter space while significantly reducing computational steps. This parallel between diffusion and evolution not only bridges two different fields but also opens new avenues for mutual enhancement, raising questions about open-ended evolution and potentially utilizing non-Gaussian or discrete diffusion models in the context of Diffusion Evolution.

Translated: 2024-11-04 02:31:52 Published: 2024-10-04

# Diffusion Models are Evolutionary Algorithms

Diffusion Models are Evolutionary Algorithms ( http://arxiv.org/abs/2410.02543v2 )

License: see the linked page

Yanbo Zhang, Benedikt Hartl, Hananel Hazan, Michael Levin,

Translated: 2024-11-04 02:31:52 Published: 2024-10-04

# Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition

Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition ( http://arxiv.org/abs/2410.02560v1 )

License: see the linked page

Olga Yakovenko, Ivan Bondarenko,

For many Automatic Speech Recognition (ASR) tasks, audio features in the form of spectrograms show better results than Mel-frequency Cepstral Coefficients (MFCC), but in practice they are hard to use due to the complex dimensionality of the feature space. The following paper presents an alternative approach to generating a compressed spectrogram representation, based on Convolutional Variational Autoencoders (VAE). A Convolutional VAE model was trained on a subsample of the LibriSpeech dataset to reconstruct short fragments of audio spectrograms (25 ms) from a 13-dimensional embedding. The trained model for a 40-dimensional (300 ms) embedding was used to generate features for a corpus of spoken commands on the GoogleSpeechCommands dataset. Using the generated features, an ASR system was built and compared to the model with MFCC features.
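The latent bottleneck of a VAE such as the one described can be illustrated by its two standard ingredients: the reparameterised sample and the KL regulariser toward a standard normal prior. The sketch below covers only these generic pieces. The convolutional encoder and decoder are omitted, and the function names, seed, and zero-valued encoder outputs are assumptions. Only the embedding size of 13 is taken from the abstract.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

LATENT_DIM = 13  # embedding size quoted for the 25 ms fragments

rng = random.Random(0)
mu = [0.0] * LATENT_DIM       # hypothetical encoder output (means)
log_var = [0.0] * LATENT_DIM  # hypothetical encoder output (log-variances)

z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In training, this KL term would be added to a spectrogram reconstruction loss. The sampled vector `z` plays the role of the compressed feature that a downstream ASR model would consume in place of MFCCs.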

Translated: 2024-11-04 02:31:52 Published: 2024-10-04

Olga Iakovenko, Ivan Bondarenko,

Translated: 2024-11-04 02:31:52 Published: 2024-10-04

# Learning 3D Perception from Others' Predictions

Learning 3D Perception from Others' Predictions ( http://arxiv.org/abs/2410.02646v1 )

License: see the linked page

Jinsu Yoo, Zhenyang Feng, Tai-Yu Pan, Yihong Sun, Cheng Perng Phoo, Xiangyu Chen, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao,

Accurate 3D object detection in real-world environments requires a huge amount of annotated data with high quality. Acquiring such data is tedious and expensive, and often needs repeated effort when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units only need to share the predictions with the ego agent (e.g., car). Naively using the received predictions as ground-truths to train the detector for the ego car, however, leads to inferior performance. We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum, first learning from closer units with similar viewpoints and subsequently improving the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with a handful of annotated data, largely reducing the data quantity necessary to train an object detector. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments including several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions.

Translated: 2024-11-04 01:52:35 Published: 2024-10-04

# 他人の予測から3次元知覚を学ぶ

Learning 3D Perception from Others' Predictions ( http://arxiv.org/abs/2410.02646v2 )

ライセンス: Link先を確認

Jinsu Yoo, Zhenyang Feng, Tai-Yu Pan, Yihong Sun, Cheng Perng Phoo, Xiangyu Chen, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao,

翻訳日:2024-11-04 01:52:35 公開日:2024-10-04

# 合成データを用いたビデオインストラクションチューニング

Video Instruction Tuning With Synthetic Data ( http://arxiv.org/abs/2410.02713v1 )

ライセンス: Link先を確認

Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li,

(参考訳) ビデオ大マルチモーダルモデル(LMM)の開発は,Webから大量の高品質な生データを収集することの難しさによって妨げられている。そこで本稿では,LLaVA-Video-178Kというビデオ命令追従のための高品質な合成データセットを作成する方法を提案する。このデータセットには、詳細なキャプション、オープンエンド質問回答(QA)、複数選択QAといった重要なタスクが含まれている。このデータセットをトレーニングすることにより、既存の視覚的インストラクションチューニングデータと組み合わせて、新しいビデオLMMであるLLaVA-Videoを導入する。実験の結果,LLaVA-Videoは様々なビデオベンチマークで高い性能を示し,データセットの有効性を強調した。データセット、生成パイプライン、モデルチェックポイントをリリースする予定です。

The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web. To address this, we propose an alternative approach by creating a high-quality synthetic dataset specifically for video instruction-following, namely LLaVA-Video-178K. This dataset includes key tasks such as detailed captioning, open-ended question-answering (QA), and multiple-choice QA. By training on this dataset, in combination with existing visual instruction tuning data, we introduce LLaVA-Video, a new video LMM. Our experiments demonstrate that LLaVA-Video achieves strong performance across various video benchmarks, highlighting the effectiveness of our dataset. We plan to release the dataset, its generation pipeline, and the model checkpoints.

翻訳日:2024-11-04 01:23:03 公開日:2024-10-04

# 合成データを用いたビデオインストラクションチューニング

Video Instruction Tuning With Synthetic Data ( http://arxiv.org/abs/2410.02713v2 )

ライセンス: Link先を確認

Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li,

翻訳日:2024-11-04 01:23:03 公開日:2024-10-04

# 正義か偏見か? LLM-as-a-Judgeにおけるバイアスの定量化

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge ( http://arxiv.org/abs/2410.02736v1 )

ライセンス: Link先を確認

Jiayi Ye, Yanbo Wang, Yue Huang, Dongping Chen, Qihui Zhang, Nuno Moniz, Tian Gao, Werner Geyer, Chao Huang, Pin-Yu Chen, Nitesh V Chawla, Xiangliang Zhang,

(参考訳) LLM-as-a-Judgeは様々なベンチマークで評価手法として広く利用されており、モデルトレーニングにおける教師付き報酬として機能している。しかし、多くのドメインで優れているにもかかわらず、潜在的な問題は未調査であり、その信頼性と実用性の範囲を損なう。そこで本研究では,LLM-as-a-Judgeにおける各種類のバイアスを,自動的および原則的修正を用いて体系的に定量化し解析する,新しい自動バイアス量化フレームワークであるCALMを提案する。実験では,複数の人気言語モデルについて検討し,高度なモデルが総合的な性能を達成する一方で,特定のタスクにおいて重要なバイアスが持続することを示した。実験結果から, LLM-as-a-Judgeの信頼性は改善の余地があることが示唆された。さらに,これらのバイアスの明示的および暗黙的な影響についても論じ,LLM-as-a-Judgeの信頼性向上を示唆する。当社の作業は、これらの問題に対処するステークホルダの必要性を強調し、LLM-as-a-Judgeアプリケーションで注意を喚起します。

LLM-as-a-Judge has been widely utilized as an evaluation method in various benchmarks and has served as supervised rewards in model training. However, despite its excellence in many domains, its potential issues are under-explored, undermining its reliability and the scope of its utility. Therefore, we identify 12 key potential biases and propose a new automated bias quantification framework, CALM, which systematically quantifies and analyzes each type of bias in LLM-as-a-Judge by using automated and principle-guided modification. Our experiments cover multiple popular language models, and the results indicate that while advanced models have achieved commendable overall performance, significant biases persist in certain specific tasks. Empirical results suggest that there remains room for improvement in the reliability of LLM-as-a-Judge. Moreover, we also discuss the explicit and implicit influence of these biases and give some suggestions for the reliable application of LLM-as-a-Judge. Our work highlights the need for stakeholders to address these issues and reminds users to exercise caution in LLM-as-a-Judge applications.
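
The idea of quantifying a judge bias by modifying the input and checking whether the verdict survives can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not list the 12 bias types or define CALM's metrics, so position bias (swapping the order of the two answers) is used as a stand-in, and `consistency_rate` and both toy judges are hypothetical names, not part of CALM.

```python
def consistency_rate(judge, pairs):
    """Fraction of pairs whose verdict survives swapping the answer order.

    `judge(question, answer_a, answer_b)` must return "A" or "B".
    This is a hypothetical metric for illustration, not CALM's definition.
    """
    if not pairs:
        raise ValueError("need at least one (question, answer, answer) triple")
    stable = 0
    for question, first, second in pairs:
        original = judge(question, first, second)
        swapped = judge(question, second, first)
        # A consistent judge picks the same underlying answer both times,
        # so "A" before the swap must become "B" after it (and vice versa).
        if (original == "A") == (swapped == "B"):
            stable += 1
    return stable / len(pairs)


def first_position_judge(question, answer_a, answer_b):
    """Toy judge with maximal position bias: always prefers the first slot."""
    return "A"


def longer_answer_judge(question, answer_a, answer_b):
    """Toy judge that is order-independent (but biased toward length)."""
    return "A" if len(answer_a) >= len(answer_b) else "B"


PAIRS = [
    ("What is 2 + 2?", "4", "It is 5."),
    ("Capital of France?", "Paris, the capital city.", "Lyon"),
]

if __name__ == "__main__":
    print(consistency_rate(first_position_judge, PAIRS))
    print(consistency_rate(longer_answer_judge, PAIRS))
```

Note that the second toy judge is perfectly consistent under swapping while still being biased toward longer answers, which is why one perturbation type cannot stand in for a full bias audit.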

翻訳日:2024-11-04 01:13:18 公開日:2024-10-04

# 正義か偏見か? LLM-as-a-Judgeにおけるバイアスの定量化

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge ( http://arxiv.org/abs/2410.02736v2 )

ライセンス: Link先を確認

Jiayi Ye, Yanbo Wang, Yue Huang, Dongping Chen, Qihui Zhang, Nuno Moniz, Tian Gao, Werner Geyer, Chao Huang, Pin-Yu Chen, Nitesh V Chawla, Xiangliang Zhang,

翻訳日:2024-11-04 01:13:18 公開日:2024-10-04

# AVG-LLaVA:適応的な視覚的粒度を持つマルチモーダル大モデル

AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity ( http://arxiv.org/abs/2410.02745v1 )

ライセンス: Link先を確認

Zhibin Lan, Liqiang Niu, Fandong Meng, Wenbo Li, Jie Zhou, Jinsong Su,

(参考訳) 近年、高解像度画像を扱う場合、支配的なLMMは通常、それらを複数のローカル画像と1つのグローバル画像に分割する。本研究では、入力画像と命令に基づいて適切な視覚的粒度を適応的に選択できるLMMであるAVG-LLaVAを紹介する。このアプローチは、ビジュアルトークンの数を減らし、推論を高速化するだけでなく、全体的なモデルパフォーマンスも改善する。具体的には、LLaVA-NeXTに基づく以下のモジュールを紹介する。(a) 異なる粒度の視覚的トークンを得るために複数のプール層を含む視覚的粒度スケーラ、(b) トランスフォーマー層、MLP層、及び投票器層を含む視覚的粒度ルータであって、画像及び指示に基づいて適切な視覚的粒度を選択するために使用される。さらに,ルータが予測する粒度をLMMの好みに合わせることを目的とした新たなトレーニングパラダイムであるRGLFを提案する。大規模な実験と分析の結果、AVG-LLaVAは11のベンチマークで優れたパフォーマンスを達成し、ビジュアルトークンの数を大幅に削減し、推論を高速化している(例えば、ビジュアルトークンの85.3%削減とAI2Dベンチマークでの推論速度の2.53$\times$上昇)。

Recently, when dealing with high-resolution images, dominant LMMs usually divide them into multiple local images and one global image, which leads to a large number of visual tokens. In this work, we introduce AVG-LLaVA, an LMM that can adaptively select the appropriate visual granularity based on the input image and instruction. This approach not only reduces the number of visual tokens and speeds up inference, but also improves the overall model performance. Specifically, we introduce the following modules based on LLaVA-NeXT: (a) a visual granularity scaler that includes multiple pooling layers to obtain visual tokens with different granularities; (b) a visual granularity router, which includes a Transformer layer, an MLP layer, and a voter layer, used to select the appropriate visual granularity based on the image and instruction. Furthermore, we propose RGLF, a novel training paradigm that aims at aligning the granularity predicted by the router with the preferences of the LMM, without the need for additional manually annotated data. Extensive experiments and analysis show that AVG-LLaVA achieves superior performance across 11 benchmarks while significantly reducing the number of visual tokens and speeding up inference (e.g., an 85.3% reduction in visual tokens and a 2.53$\times$ increase in inference speed on the AI2D benchmark).

翻訳日:2024-11-04 01:03:22 公開日:2024-10-04

# AVG-LLaVA:適応的な視覚的粒度を持つ大規模マルチモーダルモデル

AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity ( http://arxiv.org/abs/2410.02745v2 )

ライセンス: Link先を確認

Zhibin Lan, Liqiang Niu, Fandong Meng, Wenbo Li, Jie Zhou, Jinsong Su,

翻訳日:2024-11-04 01:03:22 公開日:2024-10-04

# ソーシャル・アウェア・ダイアログのためのスケーラブルなフレームベースによる社会文化的ノルムベースの構築

Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues ( http://arxiv.org/abs/2410.03049v1 )

ライセンス: Link先を確認

Shilin Qu, Weiqing Wang, Xin Zhou, Haolan Zhan, Zhuang Li, Lizhen Qu, Linhao Luo, Yuan-Fang Li, Gholamreza Haffari,

(参考訳) 社会文化の規範は、社会的相互作用における個人的行為の指針として機能し、尊敬、協力、適切な行動を強調し、会話情報検索、文脈情報検索、検索強化機械学習といったタスクに役立てることができる。本稿では,大規模言語モデル(LLM)を用いた社会文化的ノルム(SCN)ベースを構築するためのスケーラブルなアプローチを提案する。我々は、包括的で広くアクセス可能な中国社会文化ノルムベースを構築した。提案手法は,コンテキストフレームに富んだ社会認識対話を主データ源として利用し,生成過程の制約と幻覚の低減を図る。これにより、状況に関する発話の実践的な意味を生かし、高品質でニュアンスのある自然言語のノルム文を抽出することができる。金フレームを付加した実対話は容易には利用できないため、合成データの利用を提案する。私たちの経験的結果は以下のとおりです。(i) 合成データから得られるSCNの品質は、金フレームで注釈付けされた実際の対話から得られるものに匹敵し、(ii) 銀(予測)または金のフレームで注釈付けされた実データから抽出したSCNの品質は、フレームアノテーションなしの場合を上回る。さらに、複数の下流対話タスクを推論するRAG(Retrieval-Augmented Generation)ベースのモデルにおいて、抽出したSCNの有効性を示す。

Sociocultural norms serve as guiding principles for personal conduct in social interactions, emphasizing respect, cooperation, and appropriate behavior, and can benefit tasks including conversational information retrieval, contextual information retrieval, and retrieval-enhanced machine learning. We propose a scalable approach for constructing a Sociocultural Norm (SCN) Base using Large Language Models (LLMs) for socially aware dialogues. We construct a comprehensive and publicly accessible Chinese Sociocultural NormBase. Our approach utilizes socially aware dialogues, enriched with contextual frames, as the primary data source to constrain the generation process and reduce hallucinations. This enables the extraction of high-quality and nuanced natural-language norm statements, leveraging the pragmatic implications of utterances with respect to the situation. As real dialogues annotated with gold frames are not readily available, we propose using synthetic data. Our empirical results show: (i) the quality of the SCNs derived from synthetic data is comparable to that from real dialogues annotated with gold frames, and (ii) the quality of the SCNs extracted from real data, annotated with either silver (predicted) or gold frames, surpasses that without the frame annotations. We further show the effectiveness of the extracted SCNs in a RAG-based (Retrieval-Augmented Generation) model to reason about multiple downstream dialogue tasks.

翻訳日:2024-11-03 04:16:10 公開日:2024-10-04

# AuroraCap: 効率的でパフォーマンスのよいビデオのキャプションとベンチマーク

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ( http://arxiv.org/abs/2410.03051v1 )

ライセンス: Link先を確認

Wenhao Chai, Enxin Song, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jeng-Neng Hwang, Saining Xie, Christopher D. Manning,

(参考訳) ビデオの詳細なキャプションは、ビデオコンテンツの包括的で一貫性のあるテキスト記述を生成することを目的としており、ビデオの理解と生成の両方に役立っている。本稿では,大規模なマルチモーダルモデルに基づくビデオキャプタであるAuroraCapを提案する。時間的モデリングのためのパラメータを追加せずに、最もシンプルなアーキテクチャ設計に従う。長大なビデオシーケンスによるオーバーヘッドに対処するため、私たちはトークンマージ戦略を実装し、入力されたビジュアルトークンの数を減らす。驚いたことに、この戦略によってパフォーマンスがほとんど損なわれることがわかりました。例えば、Flickr30kで88.9のCIDErを取得し、GPT-4V (55.3)とGemini-1.5 Pro (82.2)を上回った。しかし、既存のビデオキャプションベンチマークには、いくつかの単語からなる単純な記述のみが含まれており、この分野の研究は制限されている。そこで我々は,千以上の注意深い注釈付き字幕を持つビデオ詳細な字幕ベンチマークであるVDCを開発した。さらに,長いキャプション評価を複数の短い質問応答対に変換する分割・問合せ戦略を採用したLCM支援メトリックVDCスコアを提案する。人間のEloランキングの助けを借りて、このベンチマークはビデオのキャプション品質に関する人間の判断と相関していることを示す。

Video detailed captioning is a key task that aims to generate comprehensive and coherent textual descriptions of video content, benefiting both video understanding and generation. In this paper, we propose AuroraCap, a video captioner based on a large multimodal model. We follow the simplest architecture design without additional parameters for temporal modeling. To address the overhead caused by lengthy video sequences, we implement a token merging strategy, reducing the number of input visual tokens. Surprisingly, we found that this strategy results in little performance loss. AuroraCap shows superior performance on various video and image captioning benchmarks, for example, obtaining a CIDEr of 88.9 on Flickr30k, beating GPT-4V (55.3) and Gemini-1.5 Pro (82.2). However, existing video caption benchmarks only include simple descriptions, consisting of a few dozen words, which limits research in this field. Therefore, we develop VDC, a video detailed captioning benchmark with over one thousand carefully annotated structured captions. In addition, we propose a new LLM-assisted metric, VDCscore, for better evaluation, which adopts a divide-and-conquer strategy to transform long caption evaluation into multiple short question-answer pairs. With the help of human Elo ranking, our experiments show that this benchmark better correlates with human judgments of video detailed captioning quality.

翻訳日:2024-11-03 04:16:10 公開日:2024-10-04

# 高速最適輸送を用いたクラス階層の埋め込みによる構造化表現の学習

Learning Structured Representations by Embedding Class Hierarchy with Fast Optimal Transport ( http://arxiv.org/abs/2410.03052v1 )

ライセンス: Link先を確認

Siqi Zeng, Sixian Du, Makoto Yamada, Han Zhao,

(参考訳) ラベル内に構造化された知識を特徴表現に組み込むため、先行研究 (Zeng et al , 2022) では、教師あり学習における正則化としてCophenetic correlation Coefficient (CPCC) を用いることを提案した。この正規化器は、クラス平均のペアワイズユークリッド距離を算出し、ラベル階層木から派生した対応する最短経路距離と整合する。しかし、クラス平均はクラス条件分布のよい代表ではないかもしれない。この制限に対処するため、CPCCフレームワークの下で、特徴空間内のクラス間のペア距離を測定するために、Earth Mover's Distance (EMD) を用いることを提案する。提案手法は従来の手法を一般化し,特徴空間におけるクラス条件分布がガウス分布である場合,既存のアルゴリズムを復元する。提案手法の計算効率をさらに向上するために,4つのEMD近似変種を探索し,最適なトランスポートCPCCファミリーを導入する。最も効率的なOT-CPCC変種は、データセットとタスク間の競合性能を維持しながら、データセットのサイズで線形時間で実行されます。

To embed structured knowledge within labels into feature representations, prior work (Zeng et al., 2022) proposed to use the Cophenetic Correlation Coefficient (CPCC) as a regularizer during supervised learning. This regularizer calculates pairwise Euclidean distances of class means and aligns them with the corresponding shortest path distances derived from the label hierarchy tree. However, class means may not be good representatives of the class conditional distributions, especially when they are multi-mode in nature. To address this limitation, under the CPCC framework, we propose to use the Earth Mover's Distance (EMD) to measure the pairwise distances among classes in the feature space. We show that our exact EMD method generalizes previous work, and recovers the existing algorithm when class-conditional distributions are Gaussian in the feature space. To further improve the computational efficiency of our method, we introduce the Optimal Transport-CPCC family by exploring four EMD approximation variants. Our most efficient OT-CPCC variant runs in linear time in the size of the dataset, while maintaining competitive performance across datasets and tasks.
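
The class-mean form of the CPCC regularizer described above can be sketched in a few lines. This is a minimal illustration, assuming plain Python lists as features and a hand-written `tree_dist` table of label-tree path lengths; the function names and toy data are hypothetical. It covers only the class-mean baseline from prior work, not the EMD or OT-CPCC variants the paper proposes.

```python
import math
from itertools import combinations


def class_means(features, labels):
    """Average the feature vectors belonging to each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cpcc(features, labels, tree_dist):
    """Correlate class-mean Euclidean distances with label-tree distances."""
    means = class_means(features, labels)
    pairs = list(combinations(sorted(means), 2))
    feat_d = [math.dist(means[a], means[b]) for a, b in pairs]
    tree_d = [tree_dist[(a, b)] for a, b in pairs]
    return pearson(feat_d, tree_d)


# Toy hierarchy: "cat" and "dog" share a parent, "car" sits in another subtree.
FEATURES = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0], [10.0, 1.0], [10.0, 1.0]]
LABELS = ["cat", "cat", "dog", "dog", "car", "car"]
TREE_DIST = {("car", "cat"): 4, ("car", "dog"): 4, ("cat", "dog"): 2}

score = cpcc(FEATURES, LABELS, TREE_DIST)

if __name__ == "__main__":
    print(round(score, 3))
```

A score near 1 means the feature geometry mirrors the hierarchy. Replacing `math.dist` between class means with a distance between the full class-conditional sample sets is where an EMD-based variant would differ.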

翻訳日:2024-11-03 04:16:10 公開日:2024-10-04

# CLIP-Clique:オブジェクトベースグローバルローカライゼーションのための視覚言語モデルによるグラフベースの対応マッチング

CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization ( http://arxiv.org/abs/2410.03054v1 )

ライセンス: Link先を確認

Shigemichi Matsuzaki, Kazuhito Tanaka, Kazuhiro Shintani,

(参考訳) 本文では,意味オブジェクトランドマークを持つ地図上でのグローバルなローカライズ手法を提案する。オブジェクトマップ上のローカライズのための最も有望なアプローチの1つは、周囲のオブジェクトの分布から計算されたランドマーク記述子を用いて意味グラフマッチングを使用することである。これらの記述子は誤分類や部分的な観察に弱い。さらに、多くの既存手法はRANSACを用いた不整合抽出に依存しており、これは確率的であり、高い外れ値率に敏感である。従来の問題に対処するために、視覚言語モデル(VLM)を用いた対応マッチングを強化する。ランドマークの識別性は、周囲の物体とは独立なVLM埋め込みによって改善される。さらに、inlierはグラフ理論のアプローチを用いて決定的に推定される。また、対応性や観測完全性を考慮した最小二乗の重み付けによるポーズ計算を導入し、ロバスト性を向上させる。 ScanNetおよびTUMデータセットを用いた実験により,マッチング精度とポーズ推定精度の改善を確認した。

This letter proposes a method of global localization on a map with semantic object landmarks. One of the most promising approaches for localization on object maps is to use semantic graph matching using landmark descriptors calculated from the distribution of surrounding objects. These descriptors are vulnerable to misclassification and partial observations. Moreover, many existing methods rely on inlier extraction using RANSAC, which is stochastic and sensitive to a high outlier rate. To address the former issue, we augment the correspondence matching using Vision Language Models (VLMs). Landmark discriminability is improved by VLM embeddings, which are independent of surrounding objects. In addition, inliers are estimated deterministically using a graph-theoretic approach. We also incorporate pose calculation using the weighted least squares considering correspondence similarity and observation completeness to improve the robustness. We confirmed improvements in matching and pose estimation accuracy through experiments on ScanNet and TUM datasets.
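
One common graph-theoretic way to pick inliers deterministically is to build a consistency graph over candidate correspondences and take its maximum clique. The sketch below assumes that reading, which the method's name suggests but the abstract does not spell out. The pairwise-distance compatibility test, the tolerance, and all function names are illustrative assumptions. The VLM-based similarity and the weighted least-squares pose step are not modelled.

```python
import math
from itertools import combinations


def compatible(c1, c2, tol):
    """Two correspondences agree if they preserve the landmark-to-landmark distance."""
    (map1, obs1), (map2, obs2) = c1, c2
    return abs(math.dist(map1, map2) - math.dist(obs1, obs2)) <= tol


def max_clique(n, adj):
    """Bron-Kerbosch without pivoting; adequate for small graphs only."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = sorted(r)
            return
        for v in sorted(p):  # sorted order keeps the result deterministic
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return best


def inlier_correspondences(corrs, tol=0.1):
    """Indices of the largest mutually consistent set of (map_point, observed_point) pairs."""
    n = len(corrs)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if compatible(corrs[i], corrs[j], tol):
            adj[i].add(j)
            adj[j].add(i)
    return max_clique(n, adj)


# Three correct matches (observation = map point shifted by (2, 3)) and one wrong match.
CORRS = [
    ((0.0, 0.0), (2.0, 3.0)),
    ((1.0, 0.0), (3.0, 3.0)),
    ((0.0, 1.0), (2.0, 4.0)),
    ((5.0, 5.0), (2.5, 3.5)),
]

if __name__ == "__main__":
    print(inlier_correspondences(CORRS))
```

Unlike RANSAC, this involves no random sampling, so the same input always yields the same inlier set. The cost is that exact maximum-clique search is exponential in the worst case.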

翻訳日:2024-11-03 04:16:10 公開日:2024-10-04

# 大規模言語モデルに対する許容情報フロー解析

Permissive Information-Flow Analysis for Large Language Models ( http://arxiv.org/abs/2410.03055v1 )

ライセンス: Link先を確認

Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David Krueger, Andrew Paverd, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Menglin Xia, Santiago Zanella-Béguelin,

(参考訳) 大規模言語モデル(LLM)は、大規模ソフトウェアシステムのコモディティコンポーネントになりつつある。これは自然のセキュリティとプライバシの問題を引き起こす。あるコンポーネントから取得した有毒なデータは、モデルの振る舞いを変更し、システム全体を汚染する可能性がある。 1つの有望なアプローチは、動的情報フロー(別名 taint)トラッキングを通じて、システムレベルでこの問題に取り組むことである。残念ながら、出力に最も制限のある入力ラベルを伝搬する従来のアプローチは、多様なソースから取得した入力に対してLLMが動作するアプリケーションには保守的すぎる。本稿では,LLMクエリを通じて情報フローラベルを伝搬する,新しい,より寛容な手法を提案する。提案手法の背景にある重要な考え方は、モデル出力の生成に影響を及ぼすサンプルのラベルのみを伝播させ、不要な入力のラベルを除去することである。このアプローチの2つのバリエーションの有効性を実装し,検討する。 (i)プロンプトベースの検索強化、及び (ii)$k$-nearest-neighbors言語モデル。本稿では,言語モデルに出力ラベルの予測を直接依頼するイントロスペクションに基づくインフルエンス推定器のベースラインと比較する。その結果, プロンプトベースラベルプロパゲータの優位性が強調され, LLMエージェント設定の85%以上でラベルが改善した。これらの知見は,検索増強のための許容ラベル伝搬の実用性を強調した。

Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems. This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including coercing the model to spread confidential data to untrusted components. One promising approach is to tackle this problem at the system level via dynamic information flow (aka taint) tracking. Unfortunately, the traditional approach of propagating the most restrictive input label to the output is too conservative for applications where LLMs operate on inputs retrieved from diverse sources. In this paper, we propose a novel, more permissive approach to propagate information flow labels through LLM queries. The key idea behind our approach is to propagate only the labels of the samples that were influential in generating the model output and to eliminate the labels of unnecessary input. We implement and investigate the effectiveness of two variations of this approach, based on (i) prompt-based retrieval augmentation, and (ii) a $k$-nearest-neighbors language model. We compare these with the baseline of an introspection-based influence estimator that directly asks the language model to predict the output label. The results obtained highlight the superiority of our prompt-based label propagator, which improves the label in more than 85% of the cases in an LLM agent setting. These findings underscore the practicality of permissive label propagation for retrieval augmentation.
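
The contrast between conservative and permissive label propagation can be sketched as follows. This is a sketch under assumptions: labels form a simple three-level confidentiality order, and the set of influential inputs is given as an argument. Estimating that set is the paper's actual contribution and is not implemented here; `LEVELS`, `Doc`, and both functions are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical totally ordered confidentiality lattice (higher = more restrictive).
LEVELS = {"public": 0, "internal": 1, "secret": 2}


@dataclass(frozen=True)
class Doc:
    text: str
    label: str


def join(labels):
    """Least upper bound of the labels; "public" for an empty collection."""
    return max(labels, key=LEVELS.__getitem__, default="public")


def conservative_label(docs):
    """Traditional taint tracking: the output inherits the most restrictive input label."""
    return join([d.label for d in docs])


def permissive_label(docs, influential):
    """Join only the labels of inputs judged influential for the output.

    `influential` is a set of indices into `docs`; how it is estimated
    is left outside this sketch.
    """
    return join([d.label for i, d in enumerate(docs) if i in influential])


RETRIEVED = [
    Doc("Today's weather is sunny.", "public"),
    Doc("Q3 payroll figures ...", "secret"),
]

if __name__ == "__main__":
    print(conservative_label(RETRIEVED))
    print(permissive_label(RETRIEVED, {0}))
```

If the answer to a weather question depends only on the first document, the permissive variant lets the output stay `public`, whereas the conservative rule marks it `secret` merely because a secret document was in the context.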

翻訳日:2024-11-03 04:16:10 公開日:2024-10-04

# アンタングル表現評価のための改良されたメトリクスの実現に向けて

Towards an Improved Metric for Evaluating Disentangled Representations ( http://arxiv.org/abs/2410.03056v1 )

ライセンス: Link先を確認

Sahib Julka, Yashu Wang, Michael Granitzer,

(参考訳) 切り離された表現学習は、表現を制御可能、解釈可能、転送可能にする上で重要な役割を果たす。領域におけるその重要性にもかかわらず、信頼性と一貫した量的絡み合い計量の探求は依然として大きな課題である。これは、さまざまな特性と、その設計によって導入された潜在的なバイアスを測定する多様なメトリクスの利用に由来する。本研究は,既存のゆがみ評価指標を網羅的に検討し,ゆがみの側面(モジュラリティ,コンパクト性,明示性)を比較し,因子コード関係を検出し,ゆがみの程度を記述した。本稿では,排他性(exclusivity)という直感的な概念と改良された因子コード関係を活用してアドホックな決定を最小化する,EDIと呼ぶ指標を導入し,絡み合いの解消度を定量化する新たな枠組みを提案する。詳細な分析によると、EDIは既存のメトリクスよりも安定して重要な特性を計測し、標準化されたアプローチとして採用することを提唱している。

Disentangled representation learning plays a pivotal role in making representations controllable, interpretable and transferable. Despite its significance in the domain, the quest for a reliable and consistent quantitative disentanglement metric remains a major challenge. This stems from the utilisation of diverse metrics measuring different properties and the potential bias introduced by their design. Our work undertakes a comprehensive examination of existing popular disentanglement evaluation metrics, comparing them in terms of measuring aspects of disentanglement (viz. Modularity, Compactness, and Explicitness), detecting the factor-code relationship, and describing the degree of disentanglement. We propose a new framework for quantifying disentanglement, introducing a metric entitled EDI that leverages the intuitive concept of exclusivity and an improved factor-code relationship to minimize ad-hoc decisions. An in-depth analysis reveals that EDI measures essential properties while offering more stability than existing metrics, advocating for its adoption as a standardised approach.

翻訳日:2024-11-03 04:16:10 公開日:2024-10-04

# DiffKillR: 高密度顕微鏡画像における細胞アノテーションのための微分同相写像の除去と再生成

DiffKillR: Killing and Recreating Diffeomorphisms for Cell Annotation in Dense Microscopy Images ( http://arxiv.org/abs/2410.03058v1 )

ライセンス: Link先を確認

Chen Liu, Danqi Liao, Alejandro Parada-Mayorga, Alejandro Ribeiro, Marcello DiStasio, Smita Krishnaswamy,

(参考訳) 自動スライドスキャンの進歩によって誘導されるデジタル顕微鏡画像の拡散は、生体医学的な研究や臨床診断に重要な機会をもたらす。しかし、これらの画像に密集した情報を正確に注釈付けすることは大きな課題である。 DiffKillRは、アーチェタイプマッチングと画像登録タスクの組み合わせとしてセルアノテーションを再構成する新しいフレームワークである。 DiffKillRは、堅牢な細胞マッチングのために微分同相不変の特徴空間を学習するニューラルネットワークと、アノテーションマッピングのために細胞間の正確なワープフィールドを計算するニューラルネットワークを2つ採用している。注釈付きアーチタイプの小さなセットを使用して、DiffKillRは、大きな顕微鏡画像間でアノテーションを効率よく伝播し、広範囲な手動ラベリングの必要性を減らす。さらに重要なのは、どんな種類のピクセルレベルのアノテーションにも適しています。我々はDiffKillRの理論的性質について論じ、それを3つの顕微鏡タスクで検証し、既存の教師付き・半教師なし・教師なしの手法に対する利点を実証する。

The proliferation of digital microscopy images, driven by advances in automated whole slide scanning, presents significant opportunities for biomedical research and clinical diagnostics. However, accurately annotating densely packed information in these images remains a major challenge. To address this, we introduce DiffKillR, a novel framework that reframes cell annotation as the combination of archetype matching and image registration tasks. DiffKillR employs two complementary neural networks: one that learns a diffeomorphism-invariant feature space for robust cell matching and another that computes the precise warping field between cells for annotation mapping. Using a small set of annotated archetypes, DiffKillR efficiently propagates annotations across large microscopy images, reducing the need for extensive manual labeling. More importantly, it is suitable for any type of pixel-level annotation. We will discuss the theoretical properties of DiffKillR and validate it on three microscopy tasks, demonstrating its advantages over existing supervised, semi-supervised, and unsupervised methods.

翻訳日:2024-11-03 04:06:08 公開日:2024-10-04

# トロッターエラーに対する下界

Lower Bounds for the Trotter Error ( http://arxiv.org/abs/2410.03059v1 )

ライセンス: Link先を確認

Alexander Hahn, Paul Hartung, Daniel Burgarth, Paolo Facchi, Kazuya Yuasa,

(参考訳) 実際に関係する量子系のアナログおよびデジタルシミュレーションでは、ターゲット力学は概ね実装できる。トロッター積公式は、チューニング精度を許容する一般的な方法であるため、最も一般的な近似スキームである。トロッターシミュレーションの精度は常に非可換作用素に対して不正確なものであるが、現在最小誤差が何であるかは分かっていない。トロッター誤差の上限は、しばしば非常に過大評価されることが知られているため、これは重要な量である。ここでは、エラー、ノルムおよび状態の明示的な下限を示し、最小限のリソース要求を導出する。真の誤差との数値的な比較は、我々の境界が正確かつ厳密な推定を与えることを示している。

In analog and digital simulations of practically relevant quantum systems, the target dynamics can only be implemented approximately. The Trotter product formula is the most common approximation scheme, as it is a generic method that allows the accuracy to be tuned. A Trotter simulation will always be inexact for non-commuting operators, but it is currently unknown what the minimum possible error is. This is an important quantity because upper bounds for the Trotter error are known to often be vast overestimates. Here, we present explicit lower bounds on the error, in norm and on states, allowing one to derive minimum resource requirements. Numerical comparison with the true error shows that our bounds offer accurate and tight estimates.
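
To make "Trotter error" concrete, the sketch below compares the exact evolution under H = X + Z (two non-commuting Pauli matrices) with the first-order product formula (e^{-iXt/n} e^{-iZt/n})^n for a single qubit. The Hamiltonian, the Frobenius norm, and all function names are illustrative choices. The code only measures the true error numerically and does not implement the lower bounds derived in the paper.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def rot(p, theta):
    """exp(-i*theta*P) = cos(theta) I - i sin(theta) P, valid when P @ P = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * I2[i][j] - 1j * s * p[i][j] for j in range(2)] for i in range(2)]


def exact(t):
    """exp(-i*t*(X+Z)); (X+Z)/sqrt(2) squares to the identity because X and Z anticommute."""
    r = math.sqrt(2)
    n_hat = [[(X[i][j] + Z[i][j]) / r for j in range(2)] for i in range(2)]
    return rot(n_hat, r * t)


def trotter(t, n):
    """First-order product formula with n steps."""
    step = matmul(rot(X, t / n), rot(Z, t / n))
    out = I2
    for _ in range(n):
        out = matmul(out, step)
    return out


def frobenius_error(t, n):
    """Frobenius norm of the difference between exact and Trotterized evolution."""
    u, v = exact(t), trotter(t, n)
    return math.sqrt(sum(abs(u[i][j] - v[i][j]) ** 2 for i in range(2) for j in range(2)))


if __name__ == "__main__":
    for steps in (1, 10, 100):
        print(steps, frobenius_error(1.0, steps))
```

The error shrinks as the number of steps grows but never reaches zero for non-commuting terms, which is the regime in which a lower bound gives a minimum step count for a target accuracy.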

翻訳日:2024-11-03 04:06:08 公開日:2024-10-04

# DocKD:オープンワールド文書理解モデルのためのLLMからの知識蒸留

DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models ( http://arxiv.org/abs/2410.03061v1 )

ライセンス: Link先を確認

Sungnyun Kim, Haofu Liao, Srikar Appalaraju, Peng Tang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan, Stefano Soatto,

(参考訳) ビジュアル文書理解(VDU)は、様々なモダリティ(テキストや画像)とレイアウト(フォーム、テーブルなど)にわたる文書の理解を伴う、困難なタスクである。本研究の目的は,LLMの知識を蒸留することにより,小型VDUモデルの一般化性を高めることである。 LLMの直接的なプロンプトは、しばしば情報的で有用なデータを生成するのに失敗する。これに対し、外部文書知識を統合することでデータ生成プロセスを充実させる新しいフレームワーク(DocKD)を提案する。具体的には、キーと値のペアやレイアウト、記述など、さまざまなドキュメント要素を備えたLCMを提供し、オープンな回答を導き出します。実験の結果,DocKDは高品質な文書アノテーションを生成し,外部文書知識を活用できない直接知識蒸留手法を超越していることがわかった。さらに、DocKD生成データのみでトレーニングされた学生VDUモデルは、ドメイン内タスクで人間が注釈付けしたデータでトレーニングされたモデルに匹敵するだけでなく、ドメイン外タスクで大幅に最適化されている。

Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.). This study aims to enhance the generalizability of small VDU models by distilling knowledge from LLMs. We identify that directly prompting LLMs often fails to generate informative and useful data. In response, we present a new framework (called DocKD) that enriches the data generation process by integrating external document knowledge. Specifically, we provide an LLM with various document elements like key-value pairs, layouts, and descriptions, to elicit open-ended answers. Our experiments show that DocKD produces high-quality document annotations and surpasses the direct knowledge distillation approach that does not leverage external document knowledge. Moreover, student VDU models trained solely on DocKD-generated data are not only comparable to those trained with human-annotated data on in-domain tasks but also significantly outperform them on out-of-domain tasks.

翻訳日:2024-11-03 04:06:08 公開日:2024-10-04

# 画像ファーストかテキストファーストか? 大規模言語モデルにおけるモーダリティのシークエンシングの最適化と推論

Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks ( http://arxiv.org/abs/2410.03062v1 )

ライセンス: Link先を確認

Grant Wardle, Teo Susnjak,

(参考訳) 本稿では,マルチモーダル内の画像とテキストのシークエンシングが,大規模言語モデル(LLM)の推論性能にどのように影響するかを検討する。 3種類の商用LCMを用いて経験的評価を行った。以上の結果から,モダリティが提示される順序は,特に複雑度が変化するタスクにおいて,性能に大きく影響を及ぼす可能性が示唆された。単一の画像を含む単純なタスクに対して、モダリティシークエンシングは精度に明確な影響を及ぼした。しかし、複数の画像と複雑な推論ステップを含むより複雑なタスクでは、シークエンシングの効果が減少し、おそらくタスクの認知的要求が増大したためである。また,本研究は疑問/急激な構造の重要性も強調した。ネストおよびマルチステップの推論タスクでは、モダリティシークエンシングがモデルパフォーマンスを形成する上で重要な役割を果たした。 LLMは推論の初期の段階では優れていたが、以前の情報の再編成に苦慮し、トランスフォーマーアーキテクチャにおけるマルチホップ推論の課題を浮き彫りにした。このことは、モダリティの列と推論ステップの論理フローとの整合性は、モダリティ順序単独よりも重要であることを示唆している。これらの知見は、教育、医用画像、クロスモーダル・ラーニングといった分野にまたがる幅広い応用によって、マルチモーダル・プロンプト・デザインを改善する上で重要な意味を持つ。

This paper examines how the sequencing of images and text within multi-modal prompts influences the reasoning performance of large language models (LLMs). We performed empirical evaluations using three commercial LLMs. Our results demonstrate that the order in which modalities are presented can significantly affect performance, particularly in tasks of varying complexity. For simpler tasks involving a single image, modality sequencing had a clear impact on accuracy. However, in more complex tasks involving multiple images and intricate reasoning steps, the effect of sequencing diminished, likely due to the increased cognitive demands of the task. Our findings also highlight the importance of question/prompt structure. In nested and multi-step reasoning tasks, modality sequencing played a key role in shaping model performance. While LLMs excelled in the initial stages of reasoning, they struggled to re-incorporate earlier information, underscoring the challenges of multi-hop reasoning within transformer architectures. This suggests that aligning the sequence of modalities with the logical flow of reasoning steps is more critical than modality order alone. These insights offer valuable implications for improving multi-modal prompt design, with broader applications across fields such as education, medical imaging, and cross-modal learning.

翻訳日:2024-11-03 04:06:08 公開日:2024-10-04

# プログラミング入門科目における自然言語プロンプトタスクの統合

Integrating Natural Language Prompting Tasks in Introductory Programming Courses ( http://arxiv.org/abs/2410.03063v1 )

ライセンス: Link先を確認

Chris Kerslake, Paul Denny, David H Smith IV, James Prather, Juho Leinonen, Andrew Luxton-Reilly, Stephen MacNeil,

(参考訳) 入門プログラミングコースは、より複雑で興味深いプログラムに進む前に、マスター構文と基本的な構成を強調することが多い。このボトムアップのアプローチは、初心者にとってはイライラさせる可能性があり、問題の解決から焦点を移し、幅広い学生にとってコンピューティングの魅力を損なう可能性がある。コード生産のための生成AIの台頭は、ハイレベルなプロンプトの構築や自動生成されるコードの評価を含む、AIモデルとのインタラクションを通じて新しいスキルを育むことによって、これらの問題に部分的に対処する可能性がある。本経験報告では,6週間のモジュールで4つの実験室にまたがって実施された,イントロダクトリー・コースにおける2つのアクティベーションに焦点を当てた2つのアクティビティについて検討する。第一に、学生は自然言語のプロンプトを書き、構文上の問題解決を強調することで、計算問題を解く必要がある。 2つ目は、プロンプトとコードの関係を理解するために、提供されたフラグメントに相当するコードを生成するプロンプトを作成することである。コースの学生の多くは、プログラミングを学ぶのが難しいと報告しており、しばしば、構文やデバッグに関する不満を引用している。学習プログラムにおける自己報告の難しさは、期待通り、テストやプロジェクトといった従来のプログラミングアセスメントのパフォーマンスと強い逆関係があることがわかりました。しかし、自然言語タスクのパフォーマンスは、自己報告の難しさとあまり強く関連しておらず、異なるスキルをターゲットにしていることが示唆された。 AIコーディングモデルとコミュニケーションする方法を学ぶことは重要なスキルとなり、自然言語によるタスクの促進は幅広い学生にアピールする可能性がある。

Introductory programming courses often emphasize mastering syntax and basic constructs before progressing to more complex and interesting programs. This bottom-up approach can be frustrating for novices, shifting the focus away from problem solving and potentially making computing less appealing to a broad range of students. The rise of generative AI for code production could partially address these issues by fostering new skills via interaction with AI models, including constructing high-level prompts and evaluating code that is automatically generated. In this experience report, we explore the inclusion of two prompt-focused activities in an introductory course, implemented across four labs in a six-week module. The first requires students to solve computational problems by writing natural language prompts, emphasizing problem-solving over syntax. The second involves students crafting prompts to generate code equivalent to provided fragments, to foster an understanding of the relationship between prompts and code. Most of the students in the course had reported finding programming difficult to learn, often citing frustrations with syntax and debugging. We found that self-reported difficulty with learning programming had a strong inverse relationship with performance on traditional programming assessments such as tests and projects, as expected. However, performance on the natural language tasks was less strongly related to self-reported difficulty, suggesting they may target different skills. Learning how to communicate with AI coding models is becoming an important skill, and natural language prompting tasks may appeal to a broad range of students.

翻訳日:2024-11-03 04:06:08 公開日:2024-10-04

# KVキャッシュは計算するか、ロードするか? 両方ではだめなのか?

Compute Or Load KV Cache? Why Not Both? ( http://arxiv.org/abs/2410.03065v1 )

ライセンス: Link先を確認

Shuowei Jin, Xueshen Liu, Qingzhao Zhang, Z. Morley Mao,

(参考訳) 近年のLLM(Large Language Models)の進歩により、コンテキストウィンドウのサイズが大幅に増加し、高度なアプリケーションを実現するとともに、特にプリフィルステージにおけるキー値(KV)キャッシュの計算といった計算オーバーヘッドも大幅に増大した。このシナリオでは、GPUの計算資源を節約するためにプレフィックスキャッシュが登場し、KVキャッシュをディスクに保存して複数のクエリで再利用する。しかしながら、従来のプレフィックスキャッシュメカニズムは、ディスクからGPUメモリへのKVキャッシュのロード速度がI/Oデバイスのスループットによってボトルネックになるため、大きなレイテンシに悩まされることが多い。長文プリフィルのレイテンシを最適化するために,双方向並列化KVキャッシュ生成戦略を採用した新しいKVキャッシュローダであるCakeを提案する。プリフィルタスクを受信すると、Cakeは保存済みのKVキャッシュをプレフィックスキャッシュの保存場所から動的にロードすると同時に、ローカルGPU上でもKVキャッシュを計算し、利用可能な計算とI/O帯域幅リソースの利用を最大化する。さらに、Cakeは手動でのパラメータチューニングなしに様々なシステム状態に自動的に適応する。様々なプロンプトデータセット、GPU、I/Oデバイスを用いた実験において、Cakeは計算のみの手法と比較して最大68.1%、I/Oのみの手法と比較して94.6%のTTFT(Time To First Token)削減を達成した。

Recent advancements in Large Language Models (LLMs) have significantly increased context window sizes, enabling sophisticated applications but also introducing substantial computational overheads, particularly computing key-value (KV) cache in the prefill stage. Prefix caching has emerged to save GPU power in this scenario by saving the KV cache on disk and reusing it across multiple queries. However, traditional prefix caching mechanisms often suffer from substantial latency because the speed of loading KV cache from disks to GPU memory is bottlenecked by the throughput of I/O devices. To optimize the latency of long-context prefill, we propose Cake, a novel KV cache loader, which employs a bidirectional parallelized KV cache generation strategy. Upon receiving a prefill task, Cake simultaneously and dynamically loads saved KV cache from prefix cache locations and computes KV cache on local GPUs, maximizing the utilization of available computation and I/O bandwidth resources. Additionally, Cake automatically adapts to diverse system statuses without manual parameter tuning. In experiments on various prompt datasets, GPUs, and I/O devices, Cake offers up to 68.1% Time To First Token (TTFT) reduction compared with the compute-only method and 94.6% TTFT reduction compared with the I/O-only method.
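
The benefit of overlapping computation with loading can be illustrated with a toy scheduler. This is a sketch under assumptions that are not in the abstract: the prompt is split into equal chunks, each chunk has a fixed compute cost and a fixed load cost, computation proceeds from the front (later tokens depend on earlier KV entries) while loading fills in from the back, and a greedy rule hands the next chunk to whichever worker would finish it sooner. `bidirectional_prefill` is a hypothetical name, not Cake's API.

```python
def bidirectional_prefill(num_chunks, compute_time, load_time):
    """Simulate two workers meeting in the middle of a chunked prompt.

    Returns (finish_time, sources), where sources[i] records whether
    chunk i was produced by "compute" or by "load".
    """
    sources = [None] * num_chunks
    front, back = 0, num_chunks - 1
    compute_clock = load_clock = 0.0
    while front <= back:
        # Give the next chunk to the worker that would complete it first.
        if compute_clock + compute_time <= load_clock + load_time:
            compute_clock += compute_time
            sources[front] = "compute"
            front += 1
        else:
            load_clock += load_time
            sources[back] = "load"
            back -= 1
    return max(compute_clock, load_clock), sources


if __name__ == "__main__":
    finish, sources = bidirectional_prefill(num_chunks=8, compute_time=1.0, load_time=3.0)
    print(finish, sources)
```

With 8 chunks, compute-only would take 8 time units and load-only 24, while the overlapped schedule in this toy model finishes in 6. Real per-chunk compute cost grows with position in the sequence and I/O throughput fluctuates, which is presumably why the paper describes the split as dynamic.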

翻訳日:2024-11-03 04:06:08 公開日:2024-10-04

# FedCert: フェデレーテッド精度認証

FedCert: Federated Accuracy Certification ( http://arxiv.org/abs/2410.03067v1 )

ライセンス: Link先を確認

Minh Hieu Nguyen, Huu Tien Nguyen, Trung Thanh Nguyen, Manh Duong Nguyen, Trong Nghia Hoang, Truong Thao Nguyen, Phi Le Nguyen,

(参考訳) フェデレートラーニング(FL)は、機械学習モデルを分散的にトレーニングするための強力なパラダイムとして登場し、クライアントにローカルデータを保持することでデータのプライバシを保存する。しかし、これらのモデルのクライアントにおけるデータ摂動に対する堅牢性を評価することは、依然として大きな課題である。従来の研究では、モデルの予測の一定の割合が入力データが摂動しても正しいことを保証し、認証精度に基づいて集中トレーニングにおけるモデルの有効性を評価してきた。しかし、これらの評価をFLに拡張するという課題は、未知のクライアントのローカルデータが原因で未解決のままである。そこで本研究では,FLシステムのロバスト性評価に向けた第一歩として,FedCertという手法を提案する。提案手法は,各クライアントの認証精度とクラス分布に基づいて,グローバルモデルの認証精度を近似する。さらに、実世界のシナリオにおけるデータの非独立分散(Non-IID)特性を考慮し、近似アルゴリズムの集約段階において、信頼性の高い精度を保証するためのクライアントグループ化アルゴリズムを導入する。理論的解析を通じて,FLシステムの堅牢性と信頼性を評価する上で,FedCertの有効性を示す。さらに,様々なシナリオにおけるCIFAR-10とCIFAR-100データセットの実験結果から,FedCertはベースライン法に比べて推定誤差を一貫して減少させることが示された。本研究は,FLシステムの堅牢性を評価するためのソリューションを提供し,分散学習の信頼性を高めるための今後の研究の基盤となる。ソースコードはhttps://github.com/thanhhff/FedCert/で入手できる。

Federated Learning (FL) has emerged as a powerful paradigm for training machine learning models in a decentralized manner, preserving data privacy by keeping local data on clients. However, evaluating the robustness of these models against data perturbations on clients remains a significant challenge. Previous studies have assessed the effectiveness of models in centralized training based on certified accuracy, which guarantees that a certain percentage of the model's predictions will remain correct even if the input data is perturbed. However, the challenge of extending these evaluations to FL remains unresolved because the clients' local data are unknown. To tackle this challenge, this study proposes a method named FedCert to take the first step toward evaluating the robustness of FL systems. The proposed method is designed to approximate the certified accuracy of a global model based on the certified accuracy and class distribution of each client. Additionally, considering the Non-Independent and Identically Distributed (Non-IID) nature of data in real-world scenarios, we introduce a client grouping algorithm to ensure reliable certified accuracy during the aggregation step of the approximation algorithm. Through theoretical analysis, we demonstrate the effectiveness of FedCert in assessing the robustness and reliability of FL systems. Moreover, experimental results on the CIFAR-10 and CIFAR-100 datasets under various scenarios show that FedCert consistently reduces the estimation error compared to baseline methods. This study offers a solution for evaluating the robustness of FL systems and lays the groundwork for future research to enhance the dependability of decentralized learning. The source code is available at https://github.com/thanhhff/FedCert/.
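
The abstract does not give the aggregation formula, so the following is only a sketch of the simplest estimator consistent with its description: a sample-weighted average of client-side certified accuracies. It assumes each client reports per-class sample counts and per-class certified accuracy (the dictionary layout and function name are hypothetical), and it omits the client grouping step entirely.

```python
def approx_global_certified_accuracy(clients):
    """Sample-weighted average of per-class certified accuracies.

    Each client is a dict with:
      "class_counts":  {class_label: number of local samples}
      "certified_acc": {class_label: certified accuracy in [0, 1]}
    No raw data leaves the client; only these summaries are aggregated.
    """
    total = 0
    weighted = 0.0
    for client in clients:
        for label, count in client["class_counts"].items():
            weighted += count * client["certified_acc"][label]
            total += count
    if total == 0:
        raise ValueError("no samples reported by any client")
    return weighted / total


# Made-up numbers for two Non-IID clients, for illustration only.
clients = [
    {"class_counts": {0: 80, 1: 20}, "certified_acc": {0: 0.9, 1: 0.5}},
    {"class_counts": {0: 10, 1: 90}, "certified_acc": {0: 0.6, 1: 0.8}},
]
print(approx_global_certified_accuracy(clients))
```

Weighting by class counts matters under Non-IID data: a plain mean over clients would give a client holding very few samples of a class the same influence as one holding most of them.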

翻訳日:2024-11-03 04:06:08 公開日:2024-10-04

# ソフトウェアアプリケーションのための対話型GDPR互換プライバシポリシ生成

Interactive GDPR-Compliant Privacy Policy Generation for Software Applications ( http://arxiv.org/abs/2410.03069v1 )

ライセンス: Link先を確認

Pattaraporn Sangaroonsilp, Hoa Khanh Dam, Omar Haggag, John Grundy,

(参考訳) ソフトウェアアプリケーションは、幅広いタスクやインタラクションの実行を支援するように設計されている。彼らは普及し、このデジタル時代において人々の生活に不可欠な役割を担っている。これらのソフトウェアアプリケーションを使用するには、ユーザは時々、個人情報を提供するように要求される。プライバシーは重要な関心事となり、世界中で多くのデータ保護規制が存在しているため、ソフトウェアアプリケーションは、ユーザーの個人情報の収集と処理方法を詳述したプライバシーポリシーをユーザーに提供しなければならない。本稿では,多種多様なソフトウェアアプリケーションに対する一般データ保護規則(GDPR)に関して,包括的かつ遵守可能なプライバシポリシを生成するアプローチを提案する。これをサポートするために、我々はまず、既存のプライバシーポリシー分析に基づくプライバシー条項のライブラリを構築した。そして、インタラクティブなルールベースのシステムを開発し、一連の質問をソフトウェア開発者に促し、その回答を使って、特定のソフトウェアアプリケーション用にカスタマイズされたプライバシポリシを生成しました。我々は、我々のアプローチによって生成されたプライバシーポリシーを可読性、完全性、カバレッジの観点から評価し、3つの既存のプライバシーポリシージェネレータと生成AIベースのツールによって生成されたプライバシーポリシーと比較した。評価結果から,我々のアプローチが生み出すプライバシポリシが最も完全かつ包括的であることを示唆した。

Software applications are designed to assist users in conducting a wide range of tasks or interactions. They have become prevalent and play an integral part in people's lives in this digital era. To use those software applications, users are sometimes requested to provide their personal information. As privacy has become a significant concern and many data protection regulations exist worldwide, software applications must provide users with a privacy policy detailing how their personal information is collected and processed. We propose an approach that generates a comprehensive and compliant privacy policy with respect to the General Data Protection Regulation (GDPR) for diverse software applications. To support this, we first built a library of privacy clauses based on existing privacy policy analysis. We then developed an interactive rule-based system that prompts software developers with a series of questions and uses their answers to generate a customised privacy policy for a given software application. We evaluated privacy policies generated by our approach in terms of readability, completeness and coverage and compared them to privacy policies generated by three existing privacy policy generators and a Generative AI-based tool. Our evaluation results show that the privacy policy generated by our approach is the most complete and comprehensive.
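
A rule-based generator of this kind can be pictured as a clause library in which every clause carries a condition over the developer's answers. The sketch below is illustrative only: the clause texts, the question keys (`collects_email`, `shares_with_third_parties`, `retention_days`) and the function name are invented for the example and are not taken from the paper's library.

```python
# Hypothetical clause library: (condition over the answers, clause template).
CLAUSE_LIBRARY = [
    (lambda a: True,
     "{app} processes personal data in line with the GDPR."),
    (lambda a: a.get("collects_email", False),
     "{app} collects your email address to create and manage your account."),
    (lambda a: a.get("shares_with_third_parties", False),
     "{app} shares personal data with third-party processors."),
    (lambda a: a.get("retention_days") is not None,
     "{app} retains personal data for {retention_days} days."),
]


def generate_policy(answers):
    """Select every clause whose rule fires and fill in its template.

    `answers` must contain the key "app"; other keys are optional.
    """
    clauses = [template.format(**answers)
               for applies, template in CLAUSE_LIBRARY
               if applies(answers)]
    return "\n".join(clauses)


answers = {
    "app": "DemoApp",
    "collects_email": True,
    "shares_with_third_parties": False,
    "retention_days": 30,
}
print(generate_policy(answers))
```

Keeping the rules as data rather than as branching code means a new clause can be added without touching the generator, which suits a library that grows as more existing policies are analysed.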

翻訳日:2024-11-03 04:06:08 公開日:2024-10-04

# FedMAC: クロスモーダル・アグリゲーションとコントラスト規則化によるフェデレーション学習における部分モダリティの欠如に対処する

FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization ( http://arxiv.org/abs/2410.03070v1 )

ライセンス: Link先を確認

Manh Duong Nguyen, Trung Thanh Nguyen, Huy Hieu Pham, Trong Nghia Hoang, Phi Le Nguyen, Thanh Trung Huynh,

(参考訳) Federated Learning(FL)は、分散データソースを使用して機械学習モデルをトレーニングする手法である。クライアントがデータをローカルに保持したまま共有グローバルモデルを共同で学習することで、プライバシを保証する。しかしながら、クライアントのデータセットで一部の特徴やモダリティが利用できない、あるいは不完全であるというモダリティ欠落を扱う場合には大きな課題が生じ、データ分布が不均一になる。これまでの研究は完全なモダリティ欠落の問題に対処してきたが、欠落パターンがサンプルごとに大きく異なりうるというインスタンスレベルでのクライアント間の深刻な不均一性のため、部分的なモダリティ欠落には対処できていない。この課題に対処するため、本研究では、FLにおける部分的なモダリティ欠落の条件下で複数モダリティの欠落に対処する新しいフレームワークFedMACを提案する。さらに、マルチモーダル特徴の自明な集約を避けるため、潜在表現空間に追加の制約を課すコントラストベースの正則化を導入する。実験の結果、統計的不均一性を持つ様々なクライアント構成においてFedMACの有効性が示され、深刻な欠落シナリオではベースライン手法を最大26%上回り、フェデレーテッドシステムにおける部分的に欠落したモダリティという課題の解決策としての可能性が示された。

Federated Learning (FL) is a method for training machine learning models using distributed data sources. It ensures privacy by allowing clients to collaboratively learn a shared global model while storing their data locally. However, a significant challenge arises when dealing with missing modalities in clients' datasets, where certain features or modalities are unavailable or incomplete, leading to heterogeneous data distribution. While previous studies have addressed the issue of complete-modality missing, they fail to tackle partial-modality missing on account of severe heterogeneity among clients at an instance level, where the pattern of missing data can vary significantly from one sample to another. To tackle this challenge, this study proposes a novel framework named FedMAC, designed to address multi-modality missing under conditions of partial-modality missing in FL. Additionally, to avoid trivial aggregation of multi-modal features, we introduce contrastive-based regularization to impose additional constraints on the latent representation space. The experimental results demonstrate the effectiveness of FedMAC across various client configurations with statistical heterogeneity, outperforming baseline methods by up to 26% in severe missing scenarios, highlighting its potential as a solution for the challenge of partially missing modalities in federated systems.

翻訳日:2024-11-03 04:06:08 公開日:2024-10-04

# 拡散モデルを用いたマルチロボット動作計画

Multi-Robot Motion Planning with Diffusion Models ( http://arxiv.org/abs/2410.03072v1 )

ライセンス: Link先を確認

Yorai Shaoul, Itamar Mishani, Shivam Vats, Jiaoyang Li, Maxim Likhachev,

(参考訳) 拡散モデルは近年、データから複雑なマルチモーダル動作を学習するために、幅広いロボット工学応用に適用され成功を収めている。しかし、マルチロボット拡散モデルの学習はサンプル複雑性が高いため、従来の研究はほとんどが単一ロボットかつ小規模な環境に限られている。本論文では、単一ロボットのデータのみを用いて、基礎となるデータ分布に適合する衝突のないマルチロボット軌道を生成する手法を提案する。我々のアルゴリズムであるMulti-robot Multi-model planning Diffusion (MMD)は、学習した拡散モデルと古典的な探索ベースの手法を組み合わせることで、衝突制約下でデータ駆動の動作を生成する。さらに規模を拡大し、単一の拡散モデルではうまく一般化できない大規模環境で計画するために、複数の拡散モデルを合成する方法を示す。物流環境に動機付けられた様々なシミュレーションシナリオにおいて、数十台のロボットの計画における本手法の有効性を実証する。ビデオデモは補足資料を、コードは https://github.com/yoraish/mmd を参照。

Diffusion models have recently been successfully applied to a wide range of robotics applications for learning complex multi-modal behaviors from data. However, prior works have mostly been confined to single-robot and small-scale environments due to the high sample complexity of learning multi-robot diffusion models. In this paper, we propose a method for generating collision-free multi-robot trajectories that conform to underlying data distributions while using only single-robot data. Our algorithm, Multi-robot Multi-model planning Diffusion (MMD), does so by combining learned diffusion models with classical search-based techniques -- generating data-driven motions under collision constraints. Scaling further, we show how to compose multiple diffusion models to plan in large environments where a single diffusion model fails to generalize well. We demonstrate the effectiveness of our approach in planning for dozens of robots in a variety of simulated scenarios motivated by logistics environments. View video demonstrations in our supplementary material, and our code at: https://github.com/yoraish/mmd.
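
Search-based multi-robot planners generally need a way to find where two single-robot trajectories collide so that a constraint can be placed there before replanning. The helper below sketches only that ingredient, under stated assumptions: robots are discs of equal radius, trajectories are lists of 2D waypoints sampled at the same timesteps, and collisions are checked only at those timesteps. It is not taken from the MMD code base, and the function name is hypothetical.

```python
import math


def first_conflict(traj_a, traj_b, radius):
    """Return (t, pos_a, pos_b) for the first timestep at which two
    disc-shaped robots of the given radius overlap, or None if the
    trajectories are collision-free at every sampled timestep."""
    for t, (pos_a, pos_b) in enumerate(zip(traj_a, traj_b)):
        # Two discs overlap when their centres are closer than 2 * radius.
        if math.dist(pos_a, pos_b) < 2 * radius:
            return t, pos_a, pos_b
    return None


robot_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
robot_b = [(4.0, 0.0), (3.0, 0.0), (2.5, 0.0)]
print(first_conflict(robot_a, robot_b, radius=0.5))
```

A planner built around such a check would typically forbid one robot from occupying the conflicting region at time `t` and regenerate only that robot's trajectory, which is what makes it possible to reuse a model trained on single-robot data.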

翻訳日:2024-11-03 04:06:08 公開日:2024-10-04

# MetaOOD:OOD検出モデルの自動選択

MetaOOD: Automatic Selection of OOD Detection Models ( http://arxiv.org/abs/2410.03074v1 )

ライセンス: Link先を確認

Yuehan Qin, Yichi Zhang, Yi Nian, Xueying Ding, Yue Zhao,

(参考訳) 様々なタスクに対して、分布外(OOD)検出モデルを自動的に選択するにはどうすればよいのか? これは、特にオンライン取引、自動運転、リアルタイムの患者診断といった重要な領域において、データの分布シフトを特定することでオープンワールドアプリケーションの信頼性を維持するために不可欠である。多くのOOD検出手法が利用可能であるにもかかわらず、多様なタスクに対して最適なモデルを選択するという課題は、特に正解ラベルが存在しないシナリオにおいて、ほとんど未探索のままである。本稿では、メタラーニングを利用してOOD検出モデルを自動的に選択する、最初のゼロショットかつ教師なしのフレームワークであるMetaOODを紹介する。メタラーニング手法として、MetaOODは様々なベンチマークOODデータセットにおける既存手法の過去の性能データを活用することで、テスト時にラベル付きデータを必要とせずに、新しいデータセットに適したモデルを効果的に選択できる。タスクの類似性をより正確に定量化するために、データセットと検出モデルの両方に特有のOOD特性を捉える、言語モデルに基づく埋め込みを導入する。24組のユニークなテストデータセットペアについて11個のOOD検出モデルの中から選択する広範な実験を通じて、MetaOODが既存の手法を大幅に上回り、時間的オーバーヘッドもわずかであることを実証した。ウィルコクソン検定で検証された我々の結果は、MetaOODが、確立されたOOD検出器や高度な教師なし選択法を含む11種類の多様なベースラインを上回ることを示している。

How can we automatically select an out-of-distribution (OOD) detection model for various underlying tasks? This is crucial for maintaining the reliability of open-world applications by identifying data distribution shifts, particularly in critical domains such as online transactions, autonomous driving, and real-time patient diagnosis. Despite the availability of numerous OOD detection methods, the challenge of selecting an optimal model for diverse tasks remains largely underexplored, especially in scenarios lacking ground truth labels. In this work, we introduce MetaOOD, the first zero-shot, unsupervised framework that utilizes meta-learning to automatically select an OOD detection model. As a meta-learning approach, MetaOOD leverages historical performance data of existing methods across various benchmark OOD datasets, enabling the effective selection of a suitable model for new datasets without the need for labeled data at the test time. To quantify task similarities more accurately, we introduce language model-based embeddings that capture the distinctive OOD characteristics of both datasets and detection models. Through extensive experimentation with 24 unique test dataset pairs to choose from among 11 OOD detection models, we demonstrate that MetaOOD significantly outperforms existing methods and only brings marginal time overhead. Our results, validated by Wilcoxon statistical tests, show that MetaOOD surpasses a diverse group of 11 baselines, including established OOD detectors and advanced unsupervised selection methods.
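
The selection principle (reuse historical performance on similar datasets, with no labels for the new one) can be shown with a deliberately simplified stand-in. The sketch below is a nearest-neighbour rule, not MetaOOD's learned meta-predictor: it assumes dataset embeddings are already available as plain vectors, picks the most similar historical dataset by cosine similarity, and returns the detector that scored best there. All names and numbers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_model(new_embedding, historical_embeddings, performance):
    """Pick the detector that performed best on the most similar past dataset.

    historical_embeddings: {dataset_name: embedding vector}
    performance:           {dataset_name: {model_name: score}}
    """
    nearest = max(historical_embeddings,
                  key=lambda name: cosine(new_embedding, historical_embeddings[name]))
    scores = performance[nearest]
    return max(scores, key=scores.get)


# Made-up embeddings and scores, for illustration only.
historical_embeddings = {"dataset_1": [1.0, 0.0], "dataset_2": [0.0, 1.0]}
performance = {
    "dataset_1": {"detector_a": 0.9, "detector_b": 0.7},
    "dataset_2": {"detector_a": 0.6, "detector_b": 0.8},
}
print(select_model([0.9, 0.1], historical_embeddings, performance))
```

The selection step itself only needs an embedding of the new dataset and a lookup, which is consistent with the small time overhead the abstract reports; the quality of the choice then rests on how well the embeddings capture OOD-relevant similarity.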

翻訳日:2024-11-03 03:56:19 公開日:2024-10-04

# Xにおける多言語トピック分類:データセットと解析

Multilingual Topic Classification in X: Dataset and Analysis ( http://arxiv.org/abs/2410.03075v1 )

ライセンス: Link先を確認

Dimosthenis Antypas, Asahi Ushio, Francesco Barbieri, Jose Camacho-Collados,

(参考訳) ソーシャルメディアのダイナミックな領域では、多様なトピックが日々議論され、言語境界を越えている。しかし、様々な言語にまたがる理解と分類の複雑さは、この多言語的多様性に苦しむトピックモデリングのような伝統的な手法において、依然として重要な課題である。本稿では,トピック分類を目的とした4言語(英語,スペイン語,日本語,ギリシャ語)のコンテンツを含む多言語データセットであるX-Topicを紹介する。私たちのデータセットには、ソーシャルメディアコンテンツに適した幅広いトピックが含まれており、クロス言語分析、堅牢な多言語モデルの開発、オンライン対話の研究を行う科学者や専門家にとって貴重なリソースとなっている。最後に、X-Topicを活用し、包括的な言語間および多言語分析を行い、現在の汎用言語モデルとドメイン固有言語モデルの能力を比較する。

In the dynamic realm of social media, diverse topics are discussed daily, transcending linguistic boundaries. However, the complexities of understanding and categorising this content across various languages remain an important challenge with traditional techniques like topic modelling often struggling to accommodate this multilingual diversity. In this paper, we introduce X-Topic, a multilingual dataset featuring content in four distinct languages (English, Spanish, Japanese, and Greek), crafted for the purpose of tweet topic classification. Our dataset includes a wide range of topics, tailored for social media content, making it a valuable resource for scientists and professionals working on cross-linguistic analysis, the development of robust multilingual models, and computational scientists studying online dialogue. Finally, we leverage X-Topic to perform a comprehensive cross-linguistic and multilingual analysis, and compare the capabilities of current general- and domain-specific language models.

翻訳日:2024-11-03 03:56:19 公開日:2024-10-04

# CommonIT: データ分割による大規模言語モデルの共通性を考慮したインストラクションチューニング

CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions ( http://arxiv.org/abs/2410.03077v1 )

ライセンス: Link先を確認

Jun Rao, Xuebo Liu, Lian Lian, Shengjun Cheng, Yunjie Liao, Min Zhang,

(参考訳) 命令チューニングにより、LLM(Large Language Models)は命令に従う能力を高めることができる。データミキシングに焦点を当てた多くの研究とは異なり、本研究はトレーニング中のデータサンプリングの観点からモデルの能力を向上させることに重点を置く。一種類のトピックに集中して練習することで類似のトピックの解法を習得しやすくなるという人間の学習プロセスから着想を得て、CommonIT: Commonality-aware Instruction Tuningと呼ぶ新しい命令チューニング戦略を導入する。具体的には、提案する3つの指標(Task、Embedding、Length)を用いて命令データセットを異なるグループにクラスタリングする。各トレーニングミニバッチ(「パーティション」)が単一のグループのデータのみで構成されるようにすることで、ミニバッチ間のデータのランダム性とバッチ内のデータの類似性の両方を実現する。LLaMaモデルでの厳密なテストにより、ITデータセット(FLAN、CoT、Alpaca)とモデル(LLaMa2-7B、Qwen2-7B、LLaMa 13B、BLOOM 7B)にわたって、CommonITがLLMの命令追従能力を高めるのに有効であることが示された。CommonITは、Length指標を用いた場合に一般ドメイン(知識、推論、多言語性、コーディングの平均スコア)で平均2.1%、Task指標を用いた場合に特殊ドメイン(GSM、Openfunctions、Code)で平均5.2%、Embedding指標を用いた場合に特定タスク(MMLU)で平均3.8%、一貫して性能を向上させる。コードは https://github.com/raojay7/CommonIT で入手できる。

With instruction tuning, Large Language Models (LLMs) can enhance their ability to adhere to commands. Diverging from most works focusing on data mixing, our study concentrates on enhancing the model's capabilities from the perspective of data sampling during training. Drawing inspiration from the human learning process, where it is generally easier to master solutions to similar topics through focused practice on a single type of topic, we introduce a novel instruction tuning strategy termed CommonIT: Commonality-aware Instruction Tuning. Specifically, we cluster instruction datasets into distinct groups with three proposed metrics (Task, Embedding and Length). We ensure each training mini-batch, or "partition", consists solely of data from a single group, which brings about both data randomness across mini-batches and intra-batch data similarity. Rigorous testing on LLaMa models demonstrates CommonIT's effectiveness in enhancing the instruction-following capabilities of LLMs through IT datasets (FLAN, CoT, and Alpaca) and models (LLaMa2-7B, Qwen2-7B, LLaMa 13B, and BLOOM 7B). CommonIT consistently yields an average improvement of 2.1% on the general domain (i.e., the average score of Knowledge, Reasoning, Multilinguality and Coding) with the Length metric, 5.2% on the special domain (i.e., GSM, Openfunctions and Code) with the Task metric, and 3.8% on the specific tasks (i.e., MMLU) with the Embedding metric. Code is available at https://github.com/raojay7/CommonIT.
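
The batching constraint described here (each mini-batch drawn from one group, with random order across mini-batches) is easy to state in code. This is a minimal sketch rather than the released implementation: the grouping function stands in for whichever of the Task, Embedding or Length metrics is used, and the helper name and signature are hypothetical.

```python
import random
from collections import defaultdict


def group_pure_batches(samples, group_of, batch_size, seed=0):
    """Build mini-batches that each contain samples from a single group.

    group_of: function mapping a sample to its group key
              (e.g. a task label or a length bucket).
    Samples are shuffled inside each group, and the finished batches
    are shuffled again so that groups are interleaved during training.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)

    rng = random.Random(seed)
    batches = []
    for members in groups.values():
        rng.shuffle(members)                      # randomness within a group
        for start in range(0, len(members), batch_size):
            batches.append(members[start:start + batch_size])
    rng.shuffle(batches)                          # randomness across mini-batches
    return batches


# Toy instruction data tagged with a task label, for illustration only.
data = [(f"instruction {i}", "math" if i % 3 == 0 else "code") for i in range(10)]
for batch in group_pure_batches(data, group_of=lambda s: s[1], batch_size=4):
    print([task for _, task in batch])
```

The last batch of a group may be smaller than `batch_size`; whether to drop or pad such remainders is a design choice this sketch leaves open.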

翻訳日:2024-11-03 03:56:19 公開日:2024-10-04

# 1RSB-fullRSB転移を持つ厳密に解ける量子スピングラス模型

Exactly solvable quantum spin-glass model with 1RSB-fullRSB transition ( http://arxiv.org/abs/2410.03079v1 )

ライセンス: Link先を確認

Naoto Shiraishi,

(参考訳) 横平均場型ランダム磁石を用いた新しい量子スピングラスモデル,シェリントン・カークパトリックモデルを導入する。パラメータ領域全体の自由エネルギーの正確な表現を厳密に導出する。得られた正確な解は、低温における1RSB-fullRSB転移の存在を示唆する。我々の手法は一般的な古典的スピンモデルに適用でき、任意の可解な古典的スピンモデルがその可解な量子モデルを持つことを示す。

We introduce a novel quantum spin-glass model, a Sherrington-Kirkpatrick model with a transverse mean-field type random magnet. We rigorously derive the exact expression of the free energy of this model at the entire parameter region. The obtained exact solution implies the existence of a 1RSB-fullRSB transition at low temperatures. Our technique can be applied to general classical spin models, telling that any solvable classical spin model has its solvable quantum counterpart.

翻訳日:2024-11-03 03:56:19 公開日:2024-10-04

# Stable Diffusionによる生成的エッジ検出

Generative Edge Detection with Stable Diffusion ( http://arxiv.org/abs/2410.03080v1 )

ライセンス: Link先を確認

Caixia Zhou, Yaping Huang, Mochu Xiang, Jiahui Ren, Haibin Ling, Jing Zhang,

(参考訳) エッジ検出は一般に、主に識別的手法によって扱われるピクセルレベルの分類問題とみなされる。近年、生成的エッジ検出手法、特に拡散モデルに基づく手法がエッジ検出タスクに導入されている。大きな可能性を持つ一方で、タスク固有に設計されたモジュールの再トレーニングと多段階のデノイジング推論が、より広範な応用を制限している。詳しく調べた結果、その理由の一部は、広範に事前学習された大規模モデル(例えばStable Diffusionモデル)に符号化された豊富な識別情報が十分に探索されていないことにあると推測する。これを動機として、事前学習済みStable Diffusionモデルの潜在能力を最大限に活用する、Generative Edge Detector(GED)という新しい手法を提案する。我々のモデルは、事前学習済みStable Diffusionが持つ豊富な高レベルおよび低レベルの事前知識により、特別なネットワーク設計なしで効率的に学習および推論できる。具体的には、潜在画像特徴マップを入力として、デノイジングU-Netをファインチューニングし、潜在エッジマップを直接予測することを提案する。さらに、エッジの主観性と曖昧さを考慮し、エッジの粒度を条件の一つとしてデノイジングU-Netモデルに組み込むことで、制御可能で多様な予測を実現する。加えて、複数の予測間の相対的な粒度関係を保証するために、粒度正則化を考案する。複数のデータセットで広範な実験を行い、競争力のある性能を達成した(例えばBSDSテストデータセットにおいてODSで0.870、OISで0.880)。

Edge detection is typically viewed as a pixel-level classification problem mainly addressed by discriminative methods. Recently, generative edge detection methods, especially diffusion-model-based solutions, have been introduced to the edge detection task. Despite great potential, the retraining of task-specific designed modules and multi-step denoising inference limits their broader applications. Upon closer investigation, we speculate that part of the reason is the under-exploration of the rich discriminative information encoded in extensively pre-trained large models (e.g., stable diffusion models). Thus motivated, we propose a novel approach, named Generative Edge Detector (GED), by fully utilizing the potential of the pre-trained stable diffusion model. Our model can be trained and inferred efficiently without specific network design due to the rich high-level and low-level prior knowledge empowered by the pre-trained stable diffusion. Specifically, we propose to finetune the denoising U-Net and predict latent edge maps directly, by taking the latent image feature maps as input. Additionally, due to the subjectivity and ambiguity of the edges, we also incorporate the granularity of the edges into the denoising U-Net model as one of the conditions to achieve controllable and diverse predictions. Furthermore, we devise a granularity regularization to ensure the relative granularity relationship of the multiple predictions. We conduct extensive experiments on multiple datasets and achieve competitive performance (e.g., 0.870 and 0.880 in terms of ODS and OIS on the BSDS test dataset).

翻訳日:2024-11-03 03:56:19 公開日:2024-10-04

# 品質データを用いたパラメータ制約言語モデルのスケーリング

Scaling Parameter-Constrained Language Models with Quality Data ( http://arxiv.org/abs/2410.03083v1 )

ライセンス: Link先を確認

Ernie Chang, Matteo Paltenghi, Yang Li, Pin-Jie Lin, Changsheng Zhao, Patrick Huber, Zechun Liu, Rastislav Rabatin, Yangyang Shi, Vikas Chandra,

(参考訳) 言語モデリングにおけるスケーリング則は、伝統的にトレーニング損失をデータセットサイズとモデルパラメータ数の関数として定量化し、計算最適な推定を提供するが、データ品質がモデルの汎化に与える影響をしばしば無視している。本稿では、元の定式化の中でデータ品質を微視的に捉えた「効果的なトレーニングトークン」を導入することで、従来のスケーリング則の理解を拡張する。我々はこれを、パラメータ制約のある言語モデルの性能を決定する重要な要因であると考える。具体的には、提案する効果的なトレーニングトークンの項を、容易に計算可能な2つのテキスト指標、(i)テキストの多様性と(ii)教師モデルによって測定される合成性、の組み合わせとして定式化する。サンプリングされた多様な合成データ上で、25Mから1.5Bパラメータのモデルを200以上事前学習し、テキスト品質、モデルサイズ、トレーニングトークン、および8つの推論タスクの精度スコアを関係づける定数を推定した。推定した定数が真の精度と+0.83のピアソン相関を示すことを実証し、データ品質の向上を目的としたデータサンプリングや合成といった広く使われているデータ手法を含むシナリオで分析した。

Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization. In this paper, we extend the conventional understanding of scaling laws by offering a microscopic view of data quality within the original formulation -- effective training tokens -- which we posit to be a critical determinant of performance for parameter-constrained language models. Specifically, we formulate the proposed term of effective training tokens to be a combination of two readily-computed indicators of text: (i) text diversity and (ii) syntheticity as measured by a teacher model. We pretrained over 200 models of 25M to 1.5B parameters on a diverse set of sampled, synthetic data, and estimated the constants that relate text quality, model size, training tokens, and eight reasoning task accuracy scores. We demonstrated that the estimated constants yield +0.83 Pearson correlation with true accuracies, and analyzed them in scenarios involving widely-used data techniques such as data sampling and synthesis which aim to improve data quality.
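
The reported +0.83 figure is a Pearson correlation between predicted and measured accuracies. For readers who want to reproduce that kind of check on their own predictions, here is a small self-contained sketch of the statistic; the numbers in the usage line are invented and have nothing to do with the paper's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented predicted vs. measured accuracies, for illustration only.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.0, 3.0, 2.0, 4.0]
print(pearson(predicted, measured))
```

A high correlation shows that the fitted law ranks and spaces models roughly correctly; it does not by itself show that the absolute accuracy predictions are calibrated.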

翻訳日:2024-11-03 03:56:19 公開日:2024-10-04

# 散逸促進エンタングルメント生成

Dissipation-accelerated entanglement generation ( http://arxiv.org/abs/2410.03084v1 )

ライセンス: Link先を確認

Xiao-Wei Zheng, Jun-Cong Zheng, Xue-Feng Pan, Li-Hua Lin, Pei-Rong Han, Peng-Bo Li,

(参考訳) 散逸は通常、量子効果を観測し、それらを量子技術に利用するための負の因子とみなされる。本稿では,2つの結合量子ビット間の量子絡み合いの発生を,これらの1つの量子ビットに強い散逸チャネルを導入することで高速化する手法を提案する。最大絡み合いは、これらの2つの量子ビットの間に1つの励起を均等に分配することによって条件的に確立される。当初、励起が散逸量子ビットによって保持されるとき、散逸は量子ジャンプなしで量子状態軌道の励起再分配過程を加速させる。以上の結果から,最大エンタングルメントを条件付きで達成するのに要する時間は,散逸速度の増加とともに単調に減少することが示唆された。さらに、このスキームは、2つのエルミート量子ビットに1つのNH量子ビットが対称結合された3量子ビット系に対するW状態の生成を加速するために一般化できることを示す。

Dissipation is usually considered a negative factor for observing quantum effects and for harnessing them for quantum technologies. Here we propose a scheme for speeding up the generation of quantum entanglement between two coupled qubits by introducing a strong dissipation channel to one of these qubits. The maximal entanglement is conditionally established by evenly distributing a single excitation between these two qubits. When the excitation is initially held by the dissipative qubit, the dissipation accelerates the excitation redistribution process for the quantum state trajectory without quantum jumps. Our results show that the time needed to conditionally attain the maximal entanglement decreases monotonically as the dissipation rate increases. We further show that this scheme can be generalized to accelerate the production of the W state for the three-qubit system, where one non-Hermitian (NH) qubit is symmetrically coupled to two Hermitian qubits.
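
The monotonic speed-up can be checked in a textbook toy model. The sketch below rests on assumptions that are not stated in the abstract: two qubits exchange a single excitation at rate `g`, the first qubit decays at rate `kappa`, and the no-jump trajectory follows the effective non-Hermitian Hamiltonian obtained by adding `-i * kappa / 2` to the energy of the state with the excitation on the lossy qubit. Solving the resulting two-level problem for an excitation that starts on the lossy qubit gives the first time at which both amplitudes have equal magnitude: `atan(w / s) / w` with `w = sqrt(g**2 - (kappa / 4)**2)` and `s = g + kappa / 4`, continued to `atanh` when `kappa > 4 * g`.

```python
import math


def equal_split_time(g, kappa):
    """First time at which the no-jump amplitudes on the two qubits have
    equal magnitude, for exchange coupling g and decay rate kappa on the
    qubit that initially holds the excitation (toy model, see lead-in)."""
    s = g + kappa / 4.0
    d = g * g - (kappa / 4.0) ** 2
    if d > 0:                        # underdamped: oscillatory exchange
        w = math.sqrt(d)
        return math.atan(w / s) / w
    if d == 0:                       # critical damping, kappa = 4 g
        return 1.0 / s
    w = math.sqrt(-d)                # overdamped, kappa > 4 g
    return math.atanh(w / s) / w


for kappa in (0.0, 1.0, 2.0, 4.0, 8.0, 16.0):
    print(kappa, equal_split_time(1.0, kappa))
```

Without dissipation the function returns `pi / (4 * g)`, the usual quarter-period of a coherent swap, and the time shrinks steadily as `kappa` grows. In this toy model the price is paid in the probability that no quantum jump occurs, which falls as the dissipation gets stronger; that is the sense in which the entanglement is established "conditionally".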

翻訳日:2024-11-03 03:56:19 公開日:2024-10-04

# 限定ラベル付きデータとトレーニング時間を用いた最適化プロキシ -- 半教師ありベイズニューラルネットワークアプローチ

Optimization Proxies using Limited Labeled Data and Training Time -- A Semi-Supervised Bayesian Neural Network Approach ( http://arxiv.org/abs/2410.03085v1 )

ライセンス: Link先を確認

Parikshit Pareek, Kaarthik Sundar, Deepjyoti Deka, Sidhant Misra,

(参考訳) 制約付き最適化問題は、在庫管理や電力網などの様々なエンジニアリングシステムの運用で生じる。しかし、不確実なパラメータを持つこのような最適化問題を繰り返し解く必要があることは、大きな計算上の課題となる。本研究では、限られたラベル付きデータと限られたモデル学習時間のもとで制約付き最適化問題を解くために、ベイズニューラルネットワーク(BNN)を用いた学習手法を導入する。この実用的だが複雑な状況に対して半教師ありBNNを提案する。学習はサンドイッチ方式で進められ、コストを最小化するための教師あり学習ステップ(ラベル付きデータを使用)と、制約の実行可能性を強制するための教師なし学習ステップ(ラベルなしデータを使用)を交互に行う。教師ありと教師なしの両方のステップはベイズ的アプローチを用い、近似ベイズ推論には確率的変分推論を用いる。提案する半教師あり学習法は、エネルギーネットワーク運用における重要な非凸制約付き最適化問題において、従来のBNNおよびディープニューラルネットワーク(DNN)アーキテクチャを上回り、補正や射影のステップを必要とせずに、期待最大等式ギャップを最大10分の1に削減し、最適性ギャップと不等式(実行可能性)ギャップを半減させることを示す。BNNが最小限の計算コストで事後サンプルを提供できることを活用し、事後分布による選択(Selection via Posterior, SvP)スキームが等式ギャップをさらに10%以上削減できることを実証する。また、少数のラベル付きテストデータから構築でき、他の応用にも容易に適用できる、タイトで実用上意味のある確率的信頼境界も提供する。

Constrained optimization problems arise in various engineering system operations such as inventory management and electric power grids. However, the requirement to repeatedly solve such optimization problems with uncertain parameters poses a significant computational challenge. This work introduces a learning scheme using Bayesian Neural Networks (BNNs) to solve constrained optimization problems under limited labeled data and restricted model training times. We propose a semi-supervised BNN for this practical but complex regime, wherein training commences in a sandwiched fashion, alternating between a supervised learning step (using labeled data) for minimizing cost, and an unsupervised learning step (using unlabeled data) for enforcing constraint feasibility. Both supervised and unsupervised steps use a Bayesian approach, where Stochastic Variational Inference is employed for approximate Bayesian inference. We show that the proposed semi-supervised learning method outperforms conventional BNN and deep neural network (DNN) architectures on important non-convex constrained optimization problems from energy network operations, achieving up to a tenfold reduction in expected maximum equality gap and halving the optimality and inequality (feasibility) gaps, without requiring any correction or projection step. By leveraging the BNN's ability to provide posterior samples at minimal computational cost, we demonstrate that a Selection via Posterior (SvP) scheme can further reduce equality gaps by more than 10%. We also provide tight and practically meaningful probabilistic confidence bounds that can be constructed using a low number of labeled testing data and readily adapted to other applications.
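
The "sandwiched" schedule, in which supervised and unsupervised updates alternate, can be shown on a one-parameter toy problem. This sketch deliberately strips out everything Bayesian: it uses plain gradient descent on a single weight instead of stochastic variational inference over a BNN, and the equality constraint `y = slope * x` is a hypothetical stand-in for the physical constraints of a real problem. It illustrates only the alternation, not the paper's method.

```python
def train_sandwiched(labeled, unlabeled, slope=2.0, rounds=200, lr=0.05):
    """Alternate a supervised and an unsupervised gradient step on y = w * x.

    labeled:   list of (x, y) pairs with known optimal outputs
    unlabeled: list of x values without labels
    slope:     coefficient of the assumed equality constraint y - slope * x = 0
    """
    w = 0.0
    for _ in range(rounds):
        # Supervised step: mean squared error against the labelled outputs.
        grad = sum(2 * (w * x - y) * x for x, y in labeled) / len(labeled)
        w -= lr * grad
        # Unsupervised step: mean squared constraint violation on unlabelled
        # inputs; needs no labels, only the constraint itself.
        grad = sum(2 * (w * x - slope * x) * x for x in unlabeled) / len(unlabeled)
        w -= lr * grad
    return w


w = train_sandwiched(labeled=[(1.0, 2.0), (2.0, 4.0)], unlabeled=[0.5, 1.5, 1.0])
print(w)
```

The appeal of the unsupervised step is that unlabelled inputs are cheap: producing a label means solving the optimization problem, whereas evaluating a constraint residual does not.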

翻訳日:2024-11-03 03:56:19 公開日:2024-10-04

# UNComp: 効率的な大規模言語モデル推論のための不確実性を考慮した長文コンテキスト圧縮器

UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference ( http://arxiv.org/abs/2410.03090v1 )

ライセンス: Link先を確認

Jing Xiong, Jianghan Shen, Fanghua Ye, Chaofan Tao, Zhongwei Wan, Jianqiao Lu, Xun Wu, Chuanyang Zheng, Zhijiang Guo, Lingpeng Kong, Ngai Wong,

(参考訳) 大規模言語モデル(LLM)のデプロイは、特に長文コンテキスト推論において、メモリと計算の要求が高いために困難である。キー値(KV)キャッシュは、以前に計算したキーと値を再利用することで推論を高速化するが、メモリオーバーヘッドも大幅に増加させる。エビクション(削除)やマージといった既存のKVキャッシュ圧縮手法は、通常KVキャッシュが生成された後に圧縮を行い、隠れ状態のエビクションを見落としているため、プリフィル段階の速度を向上できない。さらに、異なるアテンションヘッドに一律の圧縮率を適用すると、過剰な圧縮により、needle-in-a-haystackタスクで重要な検索ヘッドを損なう可能性がある。本論文では、行列エントロピーを利用して、トークン列レベルで層とヘッドにわたるモデルの不確実性を推定する、不確実性を考慮した圧縮手法UNCompを提案する。層とヘッドを不確実性に基づいてグループ化することで、UNCompは隠れ状態とKVキャッシュの両方を適応的に圧縮する。本手法はプリフィル段階で1.6倍の高速化を実現し、KVキャッシュを元のサイズの4.74%に削減することで、わずか1.41%の性能低下でスループットを6.4倍に向上させ、推論を1.4倍高速化する。注目すべきことに、needle-in-a-haystackタスクでは、UNCompは元のサイズの9.38%に圧縮した場合でも、フルサイズのKVキャッシュを上回る。我々のアプローチは、既存のKVキャッシュ手法にシームレスに統合可能な、効率的でトレーニング不要のGrouped-Query Attentionパラダイムを提供する。

Deploying large language models (LLMs) is challenging due to their high memory and computational demands, especially during long-context inference. While key-value (KV) caching accelerates inference by reusing previously computed keys and values, it also introduces significant memory overhead. Existing KV cache compression methods such as eviction and merging typically compress the KV cache after it is generated and overlook the eviction of hidden states, failing to improve the speed of the prefilling stage. Additionally, applying a uniform compression rate across different attention heads can harm crucial retrieval heads in needle-in-a-haystack tasks due to excessive compression. In this paper, we propose UNComp, an uncertainty-aware compression scheme that leverages matrix entropy to estimate model uncertainty across layers and heads at the token sequence level. By grouping layers and heads based on their uncertainty, UNComp adaptively compresses both the hidden states and the KV cache. Our method achieves a 1.6x speedup in the prefilling stage and reduces the KV cache to 4.74% of its original size, resulting in a 6.4x increase in throughput and a 1.4x speedup in inference with only a 1.41% performance loss. Remarkably, in needle-in-a-haystack tasks, UNComp outperforms the full-size KV cache even when compressed to 9.38% of its original size. Our approach offers an efficient, training-free Grouped-Query Attention paradigm that can be seamlessly integrated into existing KV cache schemes.
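
Matrix entropy is commonly defined as the von Neumann entropy of a trace-normalised covariance matrix of the token representations, which requires an eigendecomposition. To keep the example dependency-free, the sketch below swaps in the order-2 Rényi entropy, `-log(trace(rho @ rho))`, which needs only traces. That substitution is a simplification chosen for this example, and nothing here is taken from the UNComp code; the function name is hypothetical and the matrix is left uncentred.

```python
import math


def renyi2_matrix_entropy(hidden):
    """Order-2 Renyi entropy of the trace-normalised matrix S = X^T X,
    where X is a list of token vectors (n_tokens x dim), uncentred.

    Returns 0 when all tokens lie along one direction and log(dim) when
    the representation is spread evenly over all dimensions.
    """
    dim = len(hidden[0])
    # S[j][k] = sum over tokens of x_j * x_k
    s = [[sum(row[j] * row[k] for row in hidden) for k in range(dim)]
         for j in range(dim)]
    trace = sum(s[j][j] for j in range(dim))
    if trace == 0:
        raise ValueError("all hidden states are zero")
    # For symmetric S, trace(S @ S) equals the squared Frobenius norm.
    purity = sum(s[j][k] ** 2 for j in range(dim) for k in range(dim)) / trace ** 2
    return -math.log(purity)


collapsed = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]   # every token identical
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(renyi2_matrix_entropy(collapsed), renyi2_matrix_entropy(spread))
```

A score of this kind, computed per layer or per head, is the sort of signal that could be used to sort heads into groups with different compression budgets: representations concentrated in a few directions carry less spread-out information and are natural candidates for heavier compression.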

翻訳日:2024-11-03 03:56:19 公開日:2024-10-04

# AIレースダイナミクスのシミュレーションゲームからの戦略的洞察

Strategic Insights from Simulation Gaming of AI Race Dynamics ( http://arxiv.org/abs/2410.03092v1 )

ライセンス: Link先を確認

Ross Gruetzemacher, Shahar Avin, James Fox, Alexander K Saeri,

(参考訳) 我々は、AIの将来の可能性に関するシナリオ探索演習である「Intelligence Rising」から得られた洞察を提示する。4年間にわたり43回のゲームを運営してきたファシリテーターの経験に基づき、ゲームプレイ中に観察された繰り返し現れるパターン、戦略、意思決定プロセスを明らかにする。我々の分析は、このシミュレーション環境におけるAI開発の軌跡に関する重要な戦略的考察を明らかにする。具体的には、AI開発競争がもたらす不安定化効果、破滅的リスクの軽減における国際協力の決定的な役割、企業と国家の利益を一致させることの難しさ、AI能力が急速かつ変革的に変化する可能性などである。我々は、AIガバナンスに内在する複雑さと不確実性を参加者に体験させるうえで、このゲームが効果的であったと考える点を強調する。繰り返し現れる主要なゲームプレイのテーマには、国際協定の出現、そのような協定の堅牢性に対する課題、AI開発におけるサイバーセキュリティの重要な役割、予期せぬ危機がAIの軌跡を劇的に変える可能性などがある。これらの洞察を文書化することにより、AI開発とガバナンスの複雑な状況に取り組む政策立案者、業界リーダー、研究者に貴重な先見性を提供することを目指す。

We present insights from "Intelligence Rising", a scenario exploration exercise about possible AI futures. Drawing on the experiences of facilitators who have overseen 43 games over a four-year period, we illuminate recurring patterns, strategies, and decision-making processes observed during gameplay. Our analysis reveals key strategic considerations about AI development trajectories in this simulated environment, including: the destabilising effects of AI races, the crucial role of international cooperation in mitigating catastrophic risks, the challenges of aligning corporate and national interests, and the potential for rapid, transformative change in AI capabilities. We highlight places where we believe the game has been effective in exposing participants to the complexities and uncertainties inherent in AI governance. Key recurring gameplay themes include the emergence of international agreements, challenges to the robustness of such agreements, the critical role of cybersecurity in AI development, and the potential for unexpected crises to dramatically alter AI trajectories. By documenting these insights, we aim to provide valuable foresight for policymakers, industry leaders, and researchers navigating the complex landscape of AI development and governance.

翻訳日:2024-11-03 03:56:19 公開日:2024-10-04

# 絡み合いによる証明可能で堅牢な量子学習の利点

Entanglement-induced provable and robust quantum learning advantages ( http://arxiv.org/abs/2410.03094v1 )

ライセンス: Link先を確認

Haimeng Zhao, Dong-Ling Deng,

(参考訳) 量子コンピューティングは、機械学習を強化、高速化、あるいは革新する比類のない可能性を秘めている。しかし、量子学習の優位性の明確な実証は、これまで達成されていない。ここでは、一般的に用いられる古典的機械学習モデルと比較して、表現力、推論速度、トレーニング効率の面で、ノイズに頑健で無条件の量子学習優位性を厳密に確立する。我々の証明は情報理論的であり、この優位性の起源を特定する。すなわち、量子エンタングルメントを用いることで、非局所的な機械学習タスクに必要な通信量を削減できる。特に、エンタングルメント資源を用いた定数個の変分パラメータを持つ量子モデルによって精度1で解ける完全に古典的なタスクを設計する。一方、一般的に用いられる古典的モデルは、指数関数的に小さい値を超える精度を達成するために、少なくともタスクのサイズに対して線形にスケールしなければならない。さらに、量子モデルは定数時間で、かつ問題サイズに反比例するサンプル数で訓練できることを示す。この優位性が一定の脱分極ノイズに対して頑健であることを証明する。数値シミュレーションにより、古典的モデルはサイズを大きくすると性能が向上しうるものの、過学習に悩まされることを示す。過学習の問題によって補強された定数対線形の分離により、比較的小さなシステムサイズで量子優位性を実証することが可能になる。我々は、数値シミュレーションとIonQ Aria上のトラップイオン実験の両方を通じて、所望の量子・古典間の学習の分離を実証する。この結果は、現在のノイズのある中間規模量子デバイスを用いた実用的な応用において、量子学習の優位性を実証するための貴重な指針を提供する。

Quantum computing holds unparalleled potential to enhance, speed up or innovate machine learning. However, an unambiguous demonstration of quantum learning advantage has not been achieved so far. Here, we rigorously establish a noise-robust, unconditional quantum learning advantage in terms of expressivity, inference speed, and training efficiency, compared to commonly-used classical machine learning models. Our proof is information-theoretic and pinpoints the origin of this advantage: quantum entanglement can be used to reduce the communication required by non-local machine learning tasks. In particular, we design a fully classical task that can be solved with unit accuracy by a quantum model with a constant number of variational parameters using entanglement resources, whereas commonly-used classical models must scale at least linearly with the size of the task to achieve a larger-than-exponentially-small accuracy. We further show that the quantum model can be trained with constant time and a number of samples inversely proportional to the problem size. We prove that this advantage is robust against constant depolarization noise. We show through numerical simulations that even though the classical models can have improved performance as their sizes are increased, they would suffer from overfitting. The constant-versus-linear separation, bolstered by the overfitting problem, makes it possible to demonstrate the quantum advantage with relatively small system sizes. We demonstrate, through both numerical simulations and trapped-ion experiments on IonQ Aria, the desired quantum-classical learning separation. Our results provide a valuable guide for demonstrating quantum learning advantages in practical applications with current noisy intermediate-scale quantum devices.

Translated: 2024-11-03 03:56:19 Published: 2024-10-04

# Combining Text-based and Drag-based Editing for Precise and Flexible Image Editing

Combing Text-based and Drag-based Editing for Precise and Flexible Image Editing ( http://arxiv.org/abs/2410.03097v1 )

License: see the linked page

Ziqi Jiang, Zhen Wang, Long Chen

(参考訳) 正確で柔軟な画像編集は、コンピュータビジョンの基本的な課題である。修正された領域に基づいて、ほとんどの編集方法は、大域的な編集と局所的な編集の2つのタイプに分けられる。本稿では,テキストベースの編集とドラッグベースの編集という2つの一般的な編集手法を選択し,その欠点を解析する。具体的には、テキストベースのメソッドは望まれる修正を正確に記述できないことが多いが、ドラッグベースのメソッドは曖昧さに悩まされている。これらの問題に対処するため, 拡散モデル上でテキストとドラッグ信号を組み合わせて, 正確かつ曖昧さのない操作を行う初めての画像編集法である CLIPDrag を提案する。これら2つの信号を完全に活用するために、テキスト信号をグローバルガイダンスとして扱い、ドラッグポイントをローカル情報として扱う。そこで本研究では,CLIPのような学習済み言語ビジョンモデルを適用することで,テキスト信号を既存のドラッグベース手法に統合する,新たなグローバル・ローカル動作監視手法を提案する。さらに,CLIPDragにおける緩やかな収束の問題にも対処し,ドラッグポイントを正しい方向に移動させる高速な点追跡手法を提案する。大規模な実験では、CLIPDragは既存の単一のドラッグベースのメソッドやテキストベースのメソッドよりも優れています。

Precise and flexible image editing remains a fundamental challenge in computer vision. Based on the modified areas, most editing methods can be divided into two main types: global editing and local editing. In this paper, we choose the two most common editing approaches (i.e., text-based editing and drag-based editing) and analyze their drawbacks. Specifically, text-based methods often fail to describe the desired modifications precisely, while drag-based methods suffer from ambiguity. To address these issues, we propose CLIPDrag, a novel image editing method that is the first to combine text and drag signals for precise and ambiguity-free manipulations on diffusion models. To fully leverage these two signals, we treat text signals as global guidance and drag points as local information. Then we introduce a novel global-local motion supervision method to integrate text signals into existing drag-based methods by adapting a pre-trained language-vision model like CLIP. Furthermore, we also address the problem of slow convergence in CLIPDrag by presenting a fast point-tracking method that forces drag points to move in the correct directions. Extensive experiments demonstrate that CLIPDrag outperforms existing methods that use only drag-based or only text-based editing.

Translated: 2024-11-03 03:56:19 Published: 2024-10-04

# CoCoHD: Congress Committee Hearing Dataset

CoCoHD: Congress Committee Hearing Dataset ( http://arxiv.org/abs/2410.03099v1 )

License: see the linked page

Arnav Hiray, Yunsong Liu, Mingxiao Song, Agam Shah, Sudheer Chava

(参考訳) アメリカ議会の公聴会は国民経済と社会の織物に大きな影響を与え、個人の生活に影響を及ぼした。その重要性にもかかわらず、これらの談話を分析するための包括的なデータセットが欠如している。これに対応するために、1997年から2024年までの86の委員会における聴聞を32,697レコードでカバーする連邦議会聴聞データセット(CoCoHD)を提案する。このデータセットは、医療、LGBTQ+の権利、気候正義といった重要な問題に関する政策言語の研究を可能にする。化石燃料消費に関するエネルギー商務委員会(Energy and Commerce Committee)のスタンスを分析し,1000件のエネルギー関連文に関するケーススタディでその可能性を実証した。事前学習した言語モデルを微調整することにより、各聴力に対するエネルギー関連尺度を作成する。市場分析の結果,CoCoHDを用いた自然言語分析がエネルギーセクターのトレンドを予測・強調できることがわかった。

U.S. congressional hearings significantly influence the national economy and social fabric, impacting individual lives. Despite their importance, there is a lack of comprehensive datasets for analyzing these discourses. To address this, we propose the Congress Committee Hearing Dataset (CoCoHD), covering hearings from 1997 to 2024 across 86 committees, with 32,697 records. This dataset enables researchers to study policy language on critical issues like healthcare, LGBTQ+ rights, and climate justice. We demonstrate its potential with a case study on 1,000 energy-related sentences, analyzing the Energy and Commerce Committee's stance on fossil fuel consumption. By fine-tuning pre-trained language models, we create energy-relevant measures for each hearing. Our market analysis shows that natural language analysis using CoCoHD can predict and highlight trends in the energy sector.

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# Examining Racial Stereotypes in YouTube Autocomplete Suggestions

Examining Racial Stereotypes in YouTube Autocomplete Suggestions ( http://arxiv.org/abs/2410.03102v1 )

License: see the linked page

Eunbin Ha, Haein Kong, Shagun Jhaver

(参考訳) Autocompleteは、ユーザ入力に基づいてクエリを予測し、潜在的に関連性のある提案のセットにユーザーを誘導する人気のある検索機能である。本研究では、YouTubeのオートコンプリートが、人種情報を探究するユーザーのための情報ソースとしてどのように機能するかを検討する。本研究では、4つの人種グループに関する入力クエリに対する自動完全提案のアルゴリズム出力監査を行い、それらが具現化するステレオタイプについて検討する。批判的談話分析を用いて、人種的偏見が現れる5つの主要な社会文化的文脈(外観、能力、文化、社会的平等、マナー)を同定する。以上の結果から,我々の収集したオートコンプリートにおける差別と人種間緊張の集合的証拠が示され,他の人種的マイノリティの潜在的なリスクを浮き彫りにした。我々は、コンテンツモデレーションポリシーの設計と、検索アウトプットにおけるこれらのバイアスに対処するための実施において、緊急のイノベーションを求めている。

Autocomplete is a popular search feature that predicts queries based on user input and guides users to a set of potentially relevant suggestions. In this study, we examine how YouTube autocompletes serve as an information source for users exploring information about race. We perform an algorithm output audit of autocomplete suggestions for input queries about four racial groups and examine the stereotypes they embody. Using critical discourse analysis, we identify five major sociocultural contexts in which racial biases manifest -- Appearance, Ability, Culture, Social Equity, and Manner. Our results show evidence of aggregated discrimination and interracial tensions in the autocompletes we collected and highlight their potential risks in othering racial minorities. We call for urgent innovations in content moderation policy design and enforcement to address these biases in search outputs.

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning

Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning ( http://arxiv.org/abs/2410.03103v1 )

License: see the linked page

Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, Zijian Wang

(参考訳) Fill-in-the-Middle(FIM)は、コード言語モデルに不可欠なものとなり、左と右の両方のコンテキストに与えられた不足コードの生成を可能にした。しかし、現在のFIMトレーニングパラダイムは、元のトレーニングシーケンスをリオーダーし、次に通常の次の学習予測(NTP)を実行することで、しばしば、周囲のコンテキストとスムーズに整合したコンテンツを生成するのに苦労するモデルに繋がる。重要なことは、既存の作業はこの弱点を回避するためにルールベースの後処理に依存しているが、そのような方法は、制約のあるデータセット固有の仮定に依存するため、オープンドメインのコード補完タスクで実際に使用することはできない(例えば、基底真実と同じ数の行を生成する)。さらに、これらの非現実的な仮定なしに、FIMタスクのモデル性能は著しく低下する。 NTPだけでは、コード入力を成功させる重要な要因である、遠くの状況で効率的な計画条件を学習するモデルには不十分である、という仮説を立てる。これを解決するために,各ステップで残るミドルトークンの数(すなわち水平長)をモデルに予測させる新たなトレーニング目標であるHorizon-Length Prediction (HLP)を提案する。 HLPはルックアヘッド計画によってFIMを前進させ、データセット固有の後処理に頼ることなく、任意の左右コンテキストの入力境界を本質的に学習することを可能にする。異なるモデルとサイズで評価したところ、HLPはファイルレベルやリポジトリレベル、非現実的なポストプロセッシング手法を使わずに、様々なベンチマークでFIM性能を最大24%向上させることがわかった。さらに、HLPによる計画能力の向上により、コード推論におけるモデルパフォーマンスが向上する。重要なのは、HLPは無視可能なトレーニングオーバーヘッドと追加の推論コストのみを発生させ、現実のシナリオにおける実用性を保証することだ。

Fill-in-the-Middle (FIM) has become integral to code language models, enabling generation of missing code given both left and right contexts. However, the current FIM training paradigm, which reorders original training sequences and then performs regular next-token prediction (NTP), often leads to models struggling to generate content that aligns smoothly with the surrounding context. Crucially, while existing works rely on rule-based post-processing to circumvent this weakness, such methods are not practically usable in open-domain code completion tasks as they depend on restrictive, dataset-specific assumptions (e.g., generating the same number of lines as in the ground truth). Moreover, model performance on FIM tasks deteriorates significantly without these unrealistic assumptions. We hypothesize that NTP alone is insufficient for models to learn effective planning conditioned on the distant right context, a critical factor for successful code infilling. To overcome this, we propose Horizon-Length Prediction (HLP), a novel training objective that teaches models to predict the number of remaining middle tokens (i.e., horizon length) at each step. HLP advances FIM with lookahead planning, enabling models to inherently learn infilling boundaries for arbitrary left and right contexts without relying on dataset-specific post-processing. Our evaluation across different models and sizes shows that HLP improves FIM performance by up to 24% in relative terms on diverse benchmarks, at both file level and repository level, without resorting to unrealistic post-processing methods. Furthermore, the enhanced planning capability gained through HLP boosts model performance on code reasoning. Importantly, HLP incurs only negligible training overhead and no additional inference cost, ensuring its practicality for real-world scenarios.
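
As a rough illustration of the training signal described in this abstract, the sketch below reorders a token sequence into prefix-suffix-middle form and computes, for each middle position, how many middle tokens remain. The sentinel strings, the function names, and the optional normalization by the middle length are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical sentinel tokens; real FIM models define their own.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def build_fim_example(tokens: List[str], start: int, end: int) -> Tuple[List[str], List[str]]:
    """Reorder a token sequence into prefix-suffix-middle order.

    Returns (context, middle): the model is conditioned on context and
    trained with next-token prediction to emit middle.
    """
    prefix, middle, suffix = tokens[:start], tokens[start:end], tokens[end:]
    context = [PRE] + prefix + [SUF] + suffix + [MID]
    return context, middle


def horizon_targets(middle_len: int, normalize: bool = True) -> List[float]:
    """Remaining-middle-token target at each middle position.

    At step t (0-based) the model still has middle_len - t tokens to produce.
    With normalize=True the count is divided by middle_len, so targets lie in (0, 1].
    """
    if middle_len <= 0:
        return []
    scale = middle_len if normalize else 1
    return [(middle_len - t) / scale for t in range(middle_len)]


tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
context, middle = build_fim_example(tokens, start=8, end=10)
print(context)
print(middle, horizon_targets(len(middle), normalize=False))
```

In a training loop, these targets would typically feed an auxiliary regression head next to the usual next-token loss; that wiring is left out here because the abstract does not describe it.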

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# Mamba in Vision: A Comprehensive Survey of Techniques and Applications

Mamba in Vision: A Comprehensive Survey of Techniques and Applications ( http://arxiv.org/abs/2410.03105v1 )

License: see the linked page

Md Maklachur Rahman, Abdullah Aman Tutul, Ankur Nath, Lamyanba Laishram, Soon Ki Jung, Tracy Hammond

(参考訳) Mambaは、コンピュータビジョンにおいて、畳み込みニューラルネットワーク(CNN)とビジョントランスフォーマー(ViT)が直面する課題を克服するための、新しいアプローチとして登場した。 CNNは局所的な特徴の抽出に長けているが、複雑なアーキテクチャ変更なしに長距離依存関係をキャプチャするのに苦労することが多い。対照的に、ViTはグローバルな関係を効果的にモデル化するが、自己注意機構の二次的な複雑さのために高い計算コストに悩まされる。 MambaはSelective Structured State Space Modelsを活用して、線形計算の複雑さで長距離依存を効果的に捉えることで、これらの制限に対処する。本調査では,Mambaモデルのユニークなコントリビューション,計算的メリット,応用について分析し,課題と今後の研究方向性を明らかにする。コンピュータビジョンにおけるMambaモデルの理解と成長を促進する基盤となるリソースを提供する。この作業の概要は https://github.com/maklachur/Mamba-in-Computer-Vision で確認できる。

Mamba is emerging as a novel approach to overcome the challenges faced by Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in computer vision. While CNNs excel at extracting local features, they often struggle to capture long-range dependencies without complex architectural modifications. In contrast, ViTs effectively model global relationships but suffer from high computational costs due to the quadratic complexity of their self-attention mechanisms. Mamba addresses these limitations by leveraging Selective Structured State Space Models to effectively capture long-range dependencies with linear computational complexity. This survey analyzes the unique contributions, computational benefits, and applications of Mamba models while also identifying challenges and potential future research directions. We provide a foundational resource for advancing the understanding and growth of Mamba models in computer vision. An overview of this work is available at https://github.com/maklachur/Mamba-in-Computer-Vision.

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# MBDS: A Multi-Body Dynamics Simulation Dataset for Graph Network Simulators

MBDS: A Multi-Body Dynamics Simulation Dataset for Graph Networks Simulators ( http://arxiv.org/abs/2410.03107v1 )

License: see the linked page

Sheng Yang, Fengge Wu, Junsuo Zhao

(参考訳) 物理世界の構造と事象をモデル化することは、ニューラルネットワークの基本的な目的である。様々なアプローチの中で、グラフネットワークシミュレータ(GNS)は、計算コストが低く、精度が高いため、物理現象をモデル化する主要な手法として登場した。物理シミュレーション技術のトレーニングと評価に使用されるデータセットは、典型的には研究者自身によって生成され、しばしばデータ量と品質が制限される。これにより,これらの手法の性能を正確に評価する上での課題が生じる。これに対応して、1D、2D、3Dシーンを含む高品質な物理シミュレーションデータセットを構築した。さらに、我々の研究は8つの完全なシーンを開発し、データセットの包括性を大幅に向上させることで、自分自身を区別する。私たちのデータセットの重要な特徴は、物理世界のより現実的なシミュレーションを促進する、正確な多体ダイナミクスを取り入れることである。高品質なデータセットを用いて,既存のGNS手法の体系的評価を行った。私たちのデータセットはhttps://github.com/Sherlocktein/MBDSでダウンロード可能です。

Modeling the structure and events of the physical world constitutes a fundamental objective of neural networks. Among the diverse approaches, Graph Network Simulators (GNS) have emerged as the leading method for modeling physical phenomena, owing to their low computational cost and high accuracy. The datasets employed for training and evaluating physical simulation techniques are typically generated by researchers themselves, often resulting in limited data volume and quality. Consequently, this poses challenges in accurately assessing the performance of these methods. In response to this, we have constructed a high-quality physical simulation dataset encompassing 1D, 2D, and 3D scenes, along with more trajectories and time-steps compared to existing datasets. Furthermore, our work distinguishes itself by developing eight complete scenes, significantly enhancing the dataset's comprehensiveness. A key feature of our dataset is the inclusion of precise multi-body dynamics, facilitating a more realistic simulation of the physical world. Utilizing our high-quality dataset, we conducted a systematic evaluation of various existing GNS methods. Our dataset is accessible for download at https://github.com/Sherlocktein/MBDS, offering a valuable resource for researchers to enhance the training and evaluation of their methodologies.

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# A Training-Free Conditional Diffusion Model for Learning Stochastic Dynamical Systems

A Training-Free Conditional Diffusion Model for Learning Stochastic Dynamical Systems ( http://arxiv.org/abs/2410.03108v1 )

License: see the linked page

Yanfang Liu, Yuan Chen, Dongbin Xiu, Guannan Zhang

(参考訳) 本研究では,未知確率微分方程式(SDE)をデータを用いて学習するための学習自由条件拡散モデルを提案する。提案手法は、スコアベース拡散モデルを用いて確率フローマップを近似することにより、SDEをモデル化するための計算効率と精度の重要な課題に対処する。既存の手法とは異なり、この手法は解析的に導出された閉形式正確なスコア関数に基づいており、これは軌道データを用いてモンテカルロ法によって効率的に推定することができ、スコア関数を学ぶためにニューラルネットワークのトレーニングを不要にする。対応する逆常微分方程式を解くことでラベル付きデータを生成することにより、フローマップの教師あり学習を可能にする。線形系,非線形系,多次元系を含む多種多様なSDE型に対する大規模数値実験により,本手法の汎用性と有効性を示す。学習されたモデルは、未知の確率系の短期的および長期的挙動を予測し、しばしばドリフトと拡散係数を推定する際に、GANのようなベースライン法を上回っている。

This study introduces a training-free conditional diffusion model for learning unknown stochastic differential equations (SDEs) using data. The proposed approach addresses key challenges in computational efficiency and accuracy for modeling SDEs by utilizing a score-based diffusion model to approximate their stochastic flow map. Unlike existing methods, this technique is based on an analytically derived closed-form exact score function, which can be efficiently estimated by the Monte Carlo method using the trajectory data, and eliminates the need for neural network training to learn the score function. By generating labeled data through solving the corresponding reverse ordinary differential equation, the approach enables supervised learning of the flow map. Extensive numerical experiments across various SDE types, including linear, nonlinear, and multi-dimensional systems, demonstrate the versatility and effectiveness of the method. The learned models exhibit significant improvements in predicting both short-term and long-term behaviors of unknown stochastic systems, often surpassing baseline methods like GANs in estimating drift and diffusion coefficients.
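
To make the idea of a closed-form score estimated from samples concrete, here is a minimal one-dimensional sketch. It assumes a forward noising of the form x_t = alpha * x_0 + sigma * eps with standard normal eps, so that the noised density built from a finite set of samples is a Gaussian mixture whose score can be written down exactly. The function name, the scalar state, and the parameters `alpha` and `sigma` are illustrative assumptions; the paper's conditional estimator may be formulated differently.

```python
import math
from typing import Sequence


def empirical_score(x: float, samples: Sequence[float], alpha: float, sigma: float) -> float:
    """Score d/dx log p_t(x) for p_t(x) = mean_i N(x; alpha * x0_i, sigma^2).

    Each sample contributes the Gaussian score -(x - alpha * x0_i) / sigma^2,
    weighted by its normalized likelihood (a softmax over the samples).
    """
    logits = [-((x - alpha * x0) ** 2) / (2.0 * sigma ** 2) for x0 in samples]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(value - shift) for value in logits]
    total = sum(weights)
    weighted = sum(w * (-(x - alpha * x0) / sigma ** 2) for w, x0 in zip(weights, samples))
    return weighted / total


# With a single sample the mixture is one Gaussian, so the score is linear in x.
print(empirical_score(1.0, [0.5], alpha=0.8, sigma=0.5))
```

No network is trained here: the score is evaluated directly from the data, which is the property the abstract emphasizes.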

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# LoRC: Low-Rank Compression for LLM KV Cache with a Progressive Compression Strategy

LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy ( http://arxiv.org/abs/2410.03111v1 )

License: see the linked page

Rongzhi Zhang, Kuang Wang, Liyuan Liu, Shuohang Wang, Hao Cheng, Chao Zhang, Yelong Shen

(参考訳) キーバリュー(KV)キャッシュは、トランスフォーマーベースの自己回帰型大言語モデル(LLM)を提供する上で重要なコンポーネントであり、以前計算されたKVベクトルを格納することでより高速な推論を可能にする。しかし、メモリ消費はシーケンス長とバッチサイズと線形にスケールし、LLMデプロイメントにおいて大きなボトルネックとなる。この問題を軽減するための既存のアプローチとしては、(1) 事前訓練されたLCMには適さない広範囲なパラメータチューニングを必要とするアップサイクリング段階に統合された効率的なアテンションバリアント、(2) テスト時のKVキャッシュ圧縮、主に層間依存関係を見落とし、タスク固有のトークン消去ポリシーがある。本稿では,KVキャッシュ圧縮に対する直交的アプローチを提案する。そこで我々は,KV重量行列の低ランク近似を提案し,モデル再構成なしで既存のトランスフォーマーベースLLMとのプラグイン統合を実現する。重みレベルでKVキャッシュを効果的に圧縮するために、我々は階層的に感度を調整し、深層ネットワークにおける圧縮エラーの蓄積に関する理論的解析によって支持されるプログレッシブ圧縮戦略を導入する。本手法は,テスト段階におけるアップサイクリング段階のモデルチューニングやタスク固有のプロファイリングを伴わずに機能するように設計されている。 LLaMAモデルによる多種多様なタスクにわたる8Bから70Bパラメータの大規模な実験により、我々のアプローチは性能を維持しながらGPUメモリのフットプリントを大幅に削減することを示した。

The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs), enabling faster inference by storing previously computed KV vectors. However, its memory consumption scales linearly with sequence length and batch size, posing a significant bottleneck in LLM deployment. Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages, which require extensive parameter tuning and are thus unsuitable for pre-trained LLMs; (2) KV cache compression at test time, primarily through token eviction policies, which often overlook inter-layer dependencies and can be task-specific. This paper introduces an orthogonal approach to KV cache compression. We propose a low-rank approximation of KV weight matrices, allowing for plug-in integration with existing transformer-based LLMs without model retraining. To effectively compress the KV cache at the weight level, we adjust for layerwise sensitivity and introduce a progressive compression strategy, which is supported by our theoretical analysis of how compression errors accumulate in deep networks. Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages. Extensive experiments with LLaMA models ranging from 8B to 70B parameters across various tasks show that our approach significantly reduces the GPU memory footprint while maintaining performance.
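
The linear scaling mentioned in this abstract is easy to check with back-of-the-envelope arithmetic. The sketch below counts cache bytes for a full-width cache and for a cache in which each layer stores a reduced number of values per token. The model configuration, the per-layer rank schedule, and the reading that a rank-r factorization lets the cache hold r values per token are all assumptions for illustration, not details taken from the paper.

```python
from typing import Sequence


def kv_cache_bytes(num_layers: int, kv_width: int, seq_len: int,
                   batch_size: int, bytes_per_value: int = 2) -> int:
    """Bytes to cache keys and values for all layers (factor 2 = keys + values)."""
    return 2 * num_layers * kv_width * seq_len * batch_size * bytes_per_value


def low_rank_kv_cache_bytes(ranks: Sequence[int], seq_len: int,
                            batch_size: int, bytes_per_value: int = 2) -> int:
    """Same count when layer i caches only ranks[i] values per token for K and for V."""
    return sum(2 * r * seq_len * batch_size * bytes_per_value for r in ranks)


# Hypothetical configuration: 32 layers, key/value width 1024, fp16 values.
full = kv_cache_bytes(num_layers=32, kv_width=1024, seq_len=8192, batch_size=4)

# Hypothetical layerwise schedule: 8 layers at full width, 24 layers at half width.
ranks = [1024] * 8 + [512] * 24
compressed = low_rank_kv_cache_bytes(ranks, seq_len=8192, batch_size=4)

print(full / 2**30, compressed / 2**30)  # sizes in GiB
```

Doubling either `seq_len` or `batch_size` doubles both totals, which is why long-context, large-batch serving is where the cache dominates GPU memory.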

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale

X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale ( http://arxiv.org/abs/2410.03115v1 )

License: see the linked page

Haoran Xu, Kenton Murray, Philipp Koehn, Hieu Hoang, Akiko Eriguchi, Huda Khayrallah

(参考訳) 大規模言語モデル(LLM)は、様々なNLPタスクで顕著な成功を収めてきたが、英語中心の事前学習と限定的な多言語データにより、主に英語に焦点を当てている。一部の多言語 LLM は数百の言語をサポートしていると主張しているが、モデルでは中級言語と低級言語の高品質な応答が得られず、不均衡な性能は英語や中国語のような高水準の言語に大きく依存している。本稿では,多言語機械翻訳タスクに焦点をあてて,言語数よりも品質を優先し,資源レベルに関わらず,50言語にまたがるトップレベルパフォーマンスを保証することを約束するモデルであるX-ALMAを導入する。 X-ALMAは、COMET-22に従って、FLORESおよびWMT'23テストデータセット上の全ての翻訳方向において、Aya-101やAya-23のような最先端のオープンソース多言語LLMを超越している。これは、訓練中の言語競合を防止するためのプラグアンドプレイ言語固有のモジュールアーキテクチャと、翻訳性能を最大化するための新しい最適化手法を備えた、慎重に設計されたトレーニングレギュレーションによって達成される。学習体制の最終段階において,提案した適応的推論優先最適化(ARPO)は,翻訳タスクにおける既存の選好最適化手法を超越している。

Large language models (LLMs) have achieved remarkable success across various NLP tasks, yet their focus has predominantly been on English due to English-centric pre-training and limited multilingual data. While some multilingual LLMs claim to support hundreds of languages, these models often fail to provide high-quality responses for mid- and low-resource languages, leading to imbalanced performance heavily skewed in favor of high-resource languages like English and Chinese. In this paper, we prioritize quality over scaling the number of languages, with a focus on the multilingual machine translation task, and introduce X-ALMA, a model designed with a commitment to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels. X-ALMA surpasses state-of-the-art open-source multilingual LLMs, such as Aya-101 and Aya-23, in every single translation direction on the FLORES and WMT'23 test datasets according to COMET-22. This is achieved by a plug-and-play language-specific module architecture that prevents language conflicts during training and a carefully designed training regimen with novel optimization methods to maximize translation performance. At the final stage of the training regimen, our proposed Adaptive Rejection Preference Optimization (ARPO) surpasses existing preference optimization methods in translation tasks.

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# ProcBench: A Benchmark for Multi-Step Reasoning and Procedure Following

ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure ( http://arxiv.org/abs/2410.03117v1 )

License: see the linked page

Ippei Fujisawa, Sensho Nobe, Hiroki Seto, Rina Onda, Yoshiaki Uchida, Hiroki Ikoma, Pei-Chun Chien, Ryota Kanai

(参考訳) 推論は幅広い知的活動の中心であり、大規模言語モデル(LLM)の能力は進歩を続けているが、推論タスクのパフォーマンスは依然として限られている。推論のプロセスとメカニズムはまだ完全には理解されていないが、重要な要素は経路探索、関連する知識の選択、多段階推論である。問題はこれらの成分の合成によって解決される。本稿では,多段階推論の直接評価という,推論能力の特定の側面に焦点を当てたベンチマークを提案する。この目的のために,経路探索と暗黙的知識利用を大きく排除することで,多段階推論が特に焦点を絞った特別な推論タスクを設計する。我々のデータセットは、明示的な指示とそれに対応する質問のペアで構成されており、質問の解決に必要な手順は、その指示の中で完全に詳細に記述されている。この設定により、モデルは与えられた指示に従うだけで問題を解決することができる。解決に必要なステップ数が異なる問題を構築し、各ステップで応答を評価することにより、最先端のLLMの指示に従う能力の徹底的な評価を可能にする。評価の堅牢性を確保するために、我々は複数の異なるタスクを含む。さらに,タスク間の精度の比較,ステップアウェアなメトリクスの利用,複雑性の別々に定義された尺度の適用により,タスクの推論におけるLLMの機能と限界に関する洞察を提供する実験を行う。本研究は,LLMの発達に重要な意味を持ち,今後の推論能力向上研究の分野に注目する。データセットは https://huggingface.co/datasets/ifujisawa/procbench で、コードも https://github.com/ifujisawa/proc-bench で利用可能です。

Reasoning is central to a wide range of intellectual activities, and while the capabilities of large language models (LLMs) continue to advance, their performance in reasoning tasks remains limited. The processes and mechanisms underlying reasoning are not yet fully understood, but key elements include path exploration, selection of relevant knowledge, and multi-step inference. Problems are solved through the synthesis of these components. In this paper, we propose a benchmark that focuses on a specific aspect of reasoning ability: the direct evaluation of multi-step inference. To this end, we design a special reasoning task that isolates multi-step inference by largely eliminating path exploration and implicit knowledge utilization. Our dataset comprises pairs of explicit instructions and corresponding questions, where the procedures necessary for solving the questions are entirely detailed within the instructions. This setup allows models to solve problems solely by following the provided directives. By constructing problems that require varying numbers of steps to solve and evaluating responses at each step, we enable a thorough assessment of state-of-the-art LLMs' ability to follow instructions. To ensure the robustness of our evaluation, we include multiple distinct tasks. Furthermore, by comparing accuracy across tasks, utilizing step-aware metrics, and applying separately defined measures of complexity, we conduct experiments that offer insights into the capabilities and limitations of LLMs in reasoning tasks. Our findings have significant implications for the development of LLMs and highlight areas for future research in advancing their reasoning abilities. Our dataset is available at https://huggingface.co/datasets/ifujisawa/procbench and code at https://github.com/ifujisawa/proc-bench.

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# Precision, Stability, and Generalization: A Comprehensive Assessment of RNN Learnability for Classifying Counter and Dyck Languages

Precision, Stability, and Generalization: A Comprehensive Assessment of RNNs learnability capability for Classifying Counter and Dyck Languages ( http://arxiv.org/abs/2410.03118v1 )

License: see the linked page

Neisarg Dave, Daniel Kifer, Lee Giles, Ankur Mali

(参考訳) 本研究では,リカレントニューラルネットワーク(RNN)の構造化形式言語分類における学習可能性について検討し,カウンター言語とダイク言語に着目した。伝統的に、一階述語(LSTM)と二階述語(O2RNN)の両方のRNNは、主にチョムスキー階層内の理論的表現性に基づいて、そのようなタスクに有効であると考えられてきた。しかし、我々の研究は、RNNが主にステートマシンとして機能し、その言語能力は、その埋め込みの正確さと、ネガティブな例をサンプリングする戦略に大きく影響されていることを示すことで、この概念に挑戦する。実験の結果, 正例と負例との構造的類似性が増加するにつれて, 性能が著しく低下することがわかった。興味深いことに、RNN埋め込みを用いた基本的な単層分類器でさえ、偶然よりも優れた性能を示した。一般化を評価するため,40本までの弦のモデルを訓練し,41本から500本までの弦の試験を行った。 LSTMモデルとO2RNNモデルの安定性の比較により、O2RNNは一般に様々なシナリオでより安定した安定性を提供することが示された。さらに、我々の仮説が様々なRNNと一致していることを明らかにするために、異なる初期化戦略の影響について検討する。全体として、この研究はRNNの計算能力に関する信念を確立し、言語分類タスクに対するニューラルネットワークの可能性を評価する上で、データ構造とサンプリング技術の重要性を強調した。表現性に対する強い制約は、単なる表現性が学習の本質を捉えないため、真の学習可能性を理解するために不可欠である、と氏は強調する。

This study investigates the learnability of Recurrent Neural Networks (RNNs) in classifying structured formal languages, focusing on counter and Dyck languages. Traditionally, both first-order (LSTM) and second-order (O2RNN) RNNs have been considered effective for such tasks, primarily based on their theoretical expressiveness within the Chomsky hierarchy. However, our research challenges this notion by demonstrating that RNNs primarily operate as state machines, where their linguistic capabilities are heavily influenced by the precision of their embeddings and the strategies used for sampling negative examples. Our experiments revealed that performance declines significantly as the structural similarity between positive and negative examples increases. Remarkably, even a basic single-layer classifier using RNN embeddings performed better than chance. To evaluate generalization, we trained models on strings up to a length of 40 and tested them on strings from lengths 41 to 500, using 10 unique seeds to ensure statistical robustness. Stability comparisons between LSTM and O2RNN models showed that O2RNNs generally offer greater stability across various scenarios. We further explore the impact of different initialization strategies revealing that our hypothesis is consistent with various RNNs. Overall, this research questions established beliefs about RNNs' computational capabilities, highlighting the importance of data structure and sampling techniques in assessing neural networks' potential for language classification tasks. It emphasizes that stronger constraints on expressivity are crucial for understanding true learnability, as mere expressivity does not capture the essence of learning.
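
For readers unfamiliar with the task, the sketch below shows what positive and structurally similar negative examples can look like for the simplest Dyck language (balanced strings over one bracket pair). The membership test uses a single counter, which is why Dyck-1 is also a counter language. The sampler and the single-flip negative construction are illustrative assumptions and are not claimed to match the paper's sampling strategies.

```python
import random
from typing import List


def is_dyck1(s: str) -> bool:
    """Return True if s is a balanced string over '(' and ')'."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing bracket with nothing to close
                return False
        else:
            return False
    return depth == 0


def random_dyck1(n_pairs: int, rng: random.Random) -> str:
    """Sample a balanced string with n_pairs bracket pairs (not uniformly)."""
    out: List[str] = []
    open_left, depth = n_pairs, 0
    while open_left > 0 or depth > 0:
        if open_left > 0 and (depth == 0 or rng.random() < 0.5):
            out.append("(")
            open_left -= 1
            depth += 1
        else:
            out.append(")")
            depth -= 1
    return "".join(out)


def hard_negative(s: str, rng: random.Random) -> str:
    """Flip one bracket of a non-empty balanced string.

    The result has the same length and differs in one symbol, but the bracket
    counts no longer match, so it is never balanced.
    """
    i = rng.randrange(len(s))
    flipped = ")" if s[i] == "(" else "("
    return s[:i] + flipped + s[i + 1:]


rng = random.Random(0)
positive = random_dyck1(20, rng)  # length 40, the training cutoff quoted above
negative = hard_negative(positive, rng)
print(is_dyck1(positive), is_dyck1(negative))
```

Negatives built this way sit one edit away from a positive, which is the kind of structural similarity the abstract reports as hardest for the classifiers.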

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# Spatial-Aware Decision-Making with Ring Attractors in Reinforcement Learning Systems

Spatial-aware decision-making with ring attractors in reinforcement learning systems ( http://arxiv.org/abs/2410.03119v1 )

License: see the linked page

Marcos Negre Saura, Richard Allmendinger, Theodore Papamarkou, Wei Pan

(参考訳) 本稿では、神経回路のダイナミクスにインスパイアされた数学的モデルであるリングアトラクタの強化学習(RL)行動選択プロセスへの統合について検討する。リングアトラクタは、空間情報と不確実性をエンコードする特別な脳にインスパイアされた構造として、学習速度と予測性能を改善する生物学的に妥当なメカニズムを提供する。アクション空間を明示的にエンコードし、神経活動の組織化を容易にし、深いRLの文脈でニューラルネットワーク全体にわたって空間表現の分散を可能にする。 RLアクション選択プロセスにおけるリングアトラクタの応用は、リング上の特定の場所にアクションをマッピングし、神経活動に基づいて選択されたアクションをデコードすることを含む。本研究では,リングアトラクタを外生モデルとして構築し,深層学習ポリシーアルゴリズムの一部として統合することにより,リングアトラクタの適用について検討する。その結果, Atari 100kベンチマークの最先端モデルでは, 大幅な改善が見られた。特に、我々の統合されたアプローチは最先端モデルの性能を約半分向上させ、選択されたベースラインよりも53%向上したことを示す。

This paper explores the integration of ring attractors, a mathematical model inspired by neural circuit dynamics, into the reinforcement learning (RL) action selection process. Ring attractors, as specialized brain-inspired structures that encode spatial information and uncertainty, offer a biologically plausible mechanism to improve learning speed and predictive performance. They do so by explicitly encoding the action space, facilitating the organization of neural activity, and enabling the distribution of spatial representations across the neural network in the context of deep RL. The application of ring attractors in the RL action selection process involves mapping actions to specific locations on the ring and decoding the selected action based on neural activity. We investigate the application of ring attractors by both building them as exogenous models and integrating them as part of a Deep Learning policy algorithm. Our results show a significant improvement in state-of-the-art models for the Atari 100k benchmark. Notably, our integrated approach improves the performance of state-of-the-art models by half, representing a 53% increase over selected baselines.
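
The mapping-and-decoding step described in this abstract can be sketched with a static readout. The code below places actions at evenly spaced angles, lets each action inject a bump of activity proportional to its value, and picks the action nearest the most active neuron. It is a simplification under stated assumptions: there are no recurrent attractor dynamics, action values are assumed non-negative, and the function names, bump width, and neuron count are hypothetical.

```python
import math
from typing import List, Sequence


def ring_angles(n_actions: int) -> List[float]:
    """Place n_actions at evenly spaced angles on the ring."""
    return [2.0 * math.pi * i / n_actions for i in range(n_actions)]


def wrapped_diff(a: float, b: float) -> float:
    """Signed angular difference a - b, wrapped to (-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))


def bump_activity(values: Sequence[float], n_neurons: int, width: float = 0.5) -> List[float]:
    """Each action adds a Gaussian bump centered at its angle, scaled by its value."""
    centers = ring_angles(len(values))
    activity = []
    for j in range(n_neurons):
        theta = 2.0 * math.pi * j / n_neurons
        total = 0.0
        for center, value in zip(centers, values):
            d = wrapped_diff(theta, center)
            total += value * math.exp(-d * d / (2.0 * width ** 2))
        activity.append(total)
    return activity


def decode_action(activity: Sequence[float], n_actions: int) -> int:
    """Choose the action whose ring location is closest to the most active neuron."""
    n_neurons = len(activity)
    peak = max(range(n_neurons), key=lambda j: activity[j])
    theta = 2.0 * math.pi * peak / n_neurons
    centers = ring_angles(n_actions)
    return min(range(n_actions), key=lambda i: abs(wrapped_diff(theta, centers[i])))


values = [0.1, 0.9, 0.2, 0.1]  # hypothetical non-negative action values
activity = bump_activity(values, n_neurons=64)
print(decode_action(activity, n_actions=len(values)))
```

Because neighbouring actions overlap on the ring, the readout depends on the spatial layout of the actions as well as on their individual values.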

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# RIPPLECOT: Amplifying the Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning

RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning ( http://arxiv.org/abs/2410.03122v1 )

License: see the linked page

Zihao Zhao, Yuchen Yang, Yijiang Li, Yinzhi Cao

(参考訳) リップル効果は、大規模言語モデルの知識編集において重要な課題となる。すなわち、単一の事実が編集されると、モデルは関連する事実の連鎖に関連付けられたマルチホップ質問によって評価されるシーケンス内の関連事実を正確に更新するのに苦労する。最近の戦略は、従来のパラメータ更新から、より柔軟で計算集約性の高い方法へと移行し、リップル効果に対処する上でより効果的であることが証明された。インコンテキストラーニング(ICL)の編集では、単純な「Imagine that + new fact」を使ってLLMをガイドするが、新しい事実だけでそのようなシナリオに関わる事実の連鎖を特定できないため、複雑なマルチホップ問題に苦労する。さらに、メモリベースの編集は、すべての編集や関連する事実に対する追加のストレージを保持し、継続的な更新を効果的に維持する必要がある。これらの設計上の制限の結果、Vicuna-7BのMQuAKE-cfベンチマークでは、最も高い精度が33.8%に留まった。そこで我々は,Chain-of-Thought(COT)推論を統合した新しいICL編集手法であるRippleCOTを提案する。 RippleCOTはデモを‘newfact, question, thought, answer’として構成し、質問の中にマルチホップロジックを特定し分解するための思考コンポーネントを組み込む。このアプローチは、関連する事実の連鎖による複雑なマルチホップ質問を通じて、モデルを効果的に導く。総合的な実験により、RippleCOTはリップル効果の最先端を著しく上回り、精度は7.8%から87.1%まで向上した。

The ripple effect poses a significant challenge in knowledge editing for large language models. Namely, when a single fact is edited, the model struggles to accurately update the related facts in a sequence, which is evaluated by multi-hop questions linked to a chain of related facts. Recent strategies have moved away from traditional parameter updates to more flexible, less computation-intensive methods, proven to be more effective in addressing the ripple effect. In-context learning (ICL) editing uses a simple demonstration `Imagine that + new fact` to guide LLMs, but struggles with complex multi-hop questions as the new fact alone fails to specify the chain of facts involved in such scenarios. Besides, memory-based editing maintains additional storage for all edits and related facts, requiring continuous updates to stay effective. As a result of these design limitations, the challenge remains, with the highest accuracy being only 33.8% on the MQuAKE-cf benchmarks for Vicuna-7B. To address this, we propose RippleCOT, a novel ICL editing approach integrating Chain-of-Thought (COT) reasoning. RippleCOT structures demonstrations as `newfact, question, thought, answer`, incorporating a thought component to identify and decompose the multi-hop logic within questions. This approach effectively guides the model through complex multi-hop questions with chains of related facts. Comprehensive experiments demonstrate that RippleCOT significantly outperforms the state-of-the-art on the ripple effect, achieving accuracy gains ranging from 7.8% to 87.1%.

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# Shrinking: Reconstruction of Parameterized Surfaces from Signed Distance Fields

Shrinking: Reconstruction of Parameterized Surfaces from Signed Distance Fields ( http://arxiv.org/abs/2410.03123v1 )

License: see the linked page

Haotian Yin, Przemyslaw Musialski

(参考訳) 本稿では,3次元曲面に対して広く用いられている暗黙的ニューラル表現(INR)であるSigned Distance Fields (SDFs) から,明示的パラメータ化曲面を再構成する手法を提案する。従来のマーチングキューブのような再構成手法では,INRの連続的および微分可能特性を損なう離散メッシュを抽出するが,本手法ではパラメータ化初期球を目標のSDF形状に合わせて反復的に収縮させ,微分可能性と表面パラメータ化を保った。これにより、テクスチャマッピング、幾何学処理、アニメーション、有限要素解析などの下流アプリケーションが可能になる。 ABCデータセットの典型的な幾何学的形状と部分から評価し,高度なコンピュータグラフィックスや幾何学的深層学習アプリケーションに欠かせないスムーズさと差別性を保ちながら,競争力のある再現性を実現する。

We propose a novel method for reconstructing explicit parameterized surfaces from Signed Distance Fields (SDFs), a widely used implicit neural representation (INR) for 3D surfaces. While traditional reconstruction methods like Marching Cubes extract discrete meshes that lose the continuous and differentiable properties of INRs, our approach iteratively contracts a parameterized initial sphere to conform to the target SDF shape, preserving differentiability and surface parameterization throughout. This enables downstream applications such as texture mapping, geometry processing, animation, and finite element analysis. Evaluated on the typical geometric shapes and parts of the ABC dataset, our method achieves competitive reconstruction quality, maintaining smoothness and differentiability crucial for advanced computer graphics and geometric deep learning applications.
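
The contraction idea can be illustrated with a much simpler procedure than the one in the paper: start from points on a unit sphere indexed by (u, v) and repeatedly move each point along the SDF gradient by its signed distance. Each point keeps its (u, v) label, so the result is still a parameterized surface. The analytic target SDF, the finite-difference gradient, and all function names are assumptions chosen for this sketch; the authors' method is not claimed to work this way.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = List[float]
SDF = Callable[[Sequence[float]], float]


def sphere_point(u: float, v: float) -> Point:
    """Unit-sphere parameterization: u is the polar angle, v the azimuth."""
    return [math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u)]


def numeric_gradient(sdf: SDF, p: Sequence[float], eps: float = 1e-5) -> Point:
    """Central-difference gradient of the SDF at p."""
    grad = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += eps
        lo[axis] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    return grad


def shrink(sdf: SDF, uv: Sequence[Tuple[float, float]], steps: int = 10,
           rate: float = 1.0) -> List[Tuple[Tuple[float, float], Point]]:
    """Move each sphere point toward the zero level set, keeping its (u, v) label."""
    result = []
    for u, v in uv:
        p = sphere_point(u, v)
        for _ in range(steps):
            d = sdf(p)
            g = numeric_gradient(sdf, p)
            p = [pi - rate * d * gi for pi, gi in zip(p, g)]
        result.append(((u, v), p))
    return result


def target_sdf(p: Sequence[float]) -> float:
    """Hypothetical target: a sphere of radius 0.5 centered at (0.2, 0, 0)."""
    return math.dist(p, (0.2, 0.0, 0.0)) - 0.5


uv_grid = [(math.pi * (i + 0.5) / 8, 2.0 * math.pi * j / 16)
           for i in range(8) for j in range(16)]
surface = shrink(target_sdf, uv_grid)
print(max(abs(target_sdf(p)) for _, p in surface))
```

Since every output point is a function of its (u, v) coordinates, quantities such as texture coordinates carry over from the initial sphere, which is the practical benefit the abstract points to.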

Translated: 2024-11-03 03:46:34 Published: 2024-10-04

# On Unsupervised Prompt Learning for Classification with Black-box Language Models

On Unsupervised Prompt Learning for Classification with Black-box Language Models ( http://arxiv.org/abs/2410.03124v1 )

License: see the linked page

Zhen-Yu Zhang, Jiandong Zhang, Huaxiu Yao, Gang Niu, Masashi Sugiyama

(参考訳) 大規模言語モデル(LLM)はテキスト形式の学習問題において顕著な成功を収めており、最も人気のあるLLMはブラックボックス方式で展開されている。一方、特定のダウンストリームタスクがより良いパフォーマンスを得るためには、通常、微調整が必要であり、この機能はブラックボックスLLMのオーナーによって提供される。ブラックボックスLSMを微調整するには、モデルパラメータを調整するためにラベル付きデータが必要である。しかし、現実の多くのアプリケーションでは、LLMは熟練した人間のアノテーションよりも高品質なテキストデータセットをラベル付けすることができ、ラベルなしデータで微調整されたブラックボックスLSMの可能性を探る動機となった。本稿では,学習パラメータがプロンプト自身とラベルなしデータの擬似ラベルであるブラックボックスLPMを用いた分類のための教師なしプロンプト学習を提案する。具体的には、プロンプトは離散トークンの列としてモデル化され、各トークンは、それぞれが学習対象のカテゴリ分布を持つ。一方、擬似ラベルを学習するには、まずLLMのテキスト内学習(ICL)機能について検討し、まずLLMを用いて信頼性の高い擬似ラベル付きデータを識別し、そのプロンプトに基づいて擬似ラベル付きデータを他の非ラベル付きデータに割り当てる。以前は、プロンプトがトレーニング中に関与していないときに、プロンプトが予測に使用される場合に関係しているため、トレーニング中にそれらを考慮することで、プロンプト学習とプロンプト利用のステージはより一貫したものになる。ベンチマークデータセットを用いた実験により,提案アルゴリズムの有効性が示された。教師なしの素早い学習の後、擬似ラベル付きデータセットを使用して、ブラックボックスLLMの所有者によるさらなる微調整を行うことができる。

Large language models (LLMs) have achieved impressive success in text-formatted learning problems, and most popular LLMs have been deployed in a black-box fashion. Meanwhile, fine-tuning is usually necessary for a specific downstream task to obtain better performance, and this functionality is provided by the owners of the black-box LLMs. To fine-tune a black-box LLM, labeled data are always required to adjust the model parameters. However, in many real-world applications, LLMs can label textual datasets with even better quality than skilled human annotators, motivating us to explore the possibility of fine-tuning black-box LLMs with unlabeled data. In this paper, we propose unsupervised prompt learning for classification with black-box LLMs, where the learning parameters are the prompt itself and the pseudo labels of unlabeled data. Specifically, the prompt is modeled as a sequence of discrete tokens, and every token has its own to-be-learned categorical distribution. On the other hand, for learning the pseudo labels, we are the first to consider the in-context learning (ICL) capabilities of LLMs: we first identify reliable pseudo-labeled data using the LLM, and then assign pseudo labels to other unlabeled data based on the prompt, allowing the pseudo-labeled data to serve as in-context demonstrations alongside the prompt. Those in-context demonstrations matter: previously, they are involved when the prompt is used for prediction while they are not involved when the prompt is trained; thus, taking them into account during training makes the prompt-learning and prompt-using stages more consistent. Experiments on benchmark datasets show the effectiveness of our proposed algorithm. After unsupervised prompt learning, we can use the pseudo-labeled dataset for further fine-tuning by the owners of the black-box LLMs.

翻訳日:2024-11-03 03:46:34 公開日:2024-10-04

# 可変レンジ相互作用を持つ量子格子モデルにおける相関の拡散

Correlation Spreading in Quantum Lattice Models with Variable-Range Interactions ( http://arxiv.org/abs/2410.03125v1 )

ライセンス: Link先を確認

Julien Despres,

(参考訳) 本論文では、急激な大域的クエンチによって平衡から遠く離れた状態に駆動された、短距離または長距離相互作用を持つ孤立格子モデルにおける量子相関の拡散について検討した。準粒子理論に基づく一般的な理論的アプローチを提示する。これにより、超立方格子上の短距離および長距離相互作用する粒子格子モデルとスピン格子モデルの両方に有効な、等時連結相関関数の一般的な表式を明らかにすることができた。定常位相の議論に基づき、その因果円錐が、時空相関の外部構造と内部構造をそれぞれ定める相関エッジと一連の局所極値からなる、普遍的な二重構造を示すことを示した。短距離相互作用では、各構造の運動は弾道的であり、対応する拡散速度は、クエンチ後のハミルトニアンの準粒子分散関係の群速度および位相速度と関連している。 $1/|R|^{\alpha}$ という形の長距離相互作用では、べき乗則の指数 $\alpha$ を調整すると群速度が発散しうるため、相関の拡散は大きく異なる。群速度が発散する場合、すなわち準局所領域において、因果円錐に対する普遍的な代数的構造の証拠を提示した。相関エッジの運動は常に弾道的な運動よりも遅いことが分かった一方、局所極値は、ギャップレスな量子系では弾道的な運動よりも速く、ギャップのある量子系では弾道的に伝播する。群速度が明確に定義される局所領域では、相関の因果円錐について、短距離の場合と同様のスケーリング則と拡散速度が得られた。

In this thesis, we have investigated the spreading of quantum correlations in isolated lattice models with short- or long-range interactions driven far from equilibrium via sudden global quenches. A general theoretical approach relying on a quasiparticle theory is presented. The latter has made it possible to unveil a generic expression for the equal-time connected correlation functions valid both for short-range and long-range interacting particle and spin lattice models on a hypercubic lattice. Relying on stationary phase arguments, we have shown that its causality cone displays a universal twofold structure consisting of a correlation edge and a series of local extrema defining the outer and inner structure of the space-time correlations. For short-range interactions, the motion of each structure is ballistic and the associated spreading velocities are related to the group and phase velocities of the quasiparticle dispersion relation of the post-quench Hamiltonian. For long-range interactions of the form $1/|R|^{\alpha}$, the correlation spreading is substantially different due to a possible divergence of the group velocity when tuning the power-law exponent $\alpha$. For a divergent group velocity, i.e. the quasi-local regime, we have presented evidence of a universal algebraic structure for the causality cone. While the correlation edge motion has been found to be always slower than ballistic, the local extrema propagate faster than ballistically and ballistically for gapless and gapped quantum systems, respectively. For the local regime implying a well-defined group velocity, we have recovered scaling laws and spreading velocities similar to those of the short-range case for the causality cone of correlations.

翻訳日:2024-11-03 03:46:34 公開日:2024-10-04

# 資格改善の機会がある場合における、意思決定対象者のAIモデルへの関与と公正性認識の理解

Understanding Decision Subjects' Engagement with and Perceived Fairness of AI Models When Opportunities of Qualification Improvement Exist ( http://arxiv.org/abs/2410.03126v1 )

ライセンス: Link先を確認

Meric Altug Gemalmaz, Ming Yin,

(参考訳) 人々がAIモデルの決定の対象となる一方で、その決定に対して繰り返し戦略的に対応できる場合に、モデルの決定の公正性が、モデルへの関与およびモデルの公正性に対する認識にどのように影響するかを検討する。戦略的な対応として、モデルとの対話を継続するかどうか、およびモデルから将来望ましい決定を得る可能性を高めるために自己投資するかどうか、という2種類を考える。3つの被験者実験により、意思決定対象者がAIモデルと戦略的かつ反復的に相互作用する場合、モデルが顕著な保護属性に関して不公正であっても、モデルの決定の公正性は、モデルと対話する意思や自己改善の意思を変化させないことがわかった。しかし、AIモデルが自分の属するグループに対して体系的に偏っている場合、特に、望ましい決定を得るための資格を向上させる難しさが資格の低い人々にとってより大きい場合には、意思決定対象者は依然としてモデルの公正性をより低く認識する。

We explore how an AI model's decision fairness affects people's engagement with and perceived fairness of the model if they are subject to its decisions, but could repeatedly and strategically respond to these decisions. Two types of strategic responses are considered -- people could determine whether to continue interacting with the model, and whether to invest in themselves to improve their chance of future favorable decisions from the model. Via three human-subject experiments, we found that in decision subjects' strategic, repeated interactions with an AI model, the model's decision fairness does not change their willingness to interact with the model or to improve themselves, even when the model exhibits unfairness on salient protected attributes. However, decision subjects still perceive the AI model to be less fair when it systematically biases against their group, especially if the difficulty of improving one's qualification for the favorable decision is larger for the lowly-qualified people.

翻訳日:2024-11-03 03:36:45 公開日:2024-10-04

# von Mises-Fisher分布を用いたベイズ推論による変分量子アルゴリズム

A variational quantum algorithm by Bayesian Inference with von Mises-Fisher distribution ( http://arxiv.org/abs/2410.03130v1 )

ライセンス: Link先を確認

Trung Huynh, Gwangil An, Minsu Kim, Yu-Seong Jeon, Jinhyoung Lee,

(参考訳) 変分量子固有値ソルバーアルゴリズムは、多くの物理・化学的問題において基本的な課題であるハミルトニアンの基底状態と基底エネルギーを求める能力から注目されている。有望な結果を示しているが、様々な種類の測定を用いる必要があることは依然として大きな障害である。近年、この問題を克服するために、主系に結合した補助系(アンシラ)を追加で導入する、量子位相推定アルゴリズムに着想を得た測定手法が提案されている。この測定手法に基づいて、フォン・ミーゼス・フィッシャー分布とともにベイズ推論の原理を取り入れた新しい手法を提案し、様々なランダムなハミルトニアン行列に対して基底状態を確実に特定できることを理論的に示す。これはまた、他の量子情報科学の問題におけるフォン・ミーゼス・フィッシャー分布の可能性を探る新しい道を開く。

The variational quantum eigensolver algorithm has gained attention due to its capability of locating the ground state and ground energy of a Hamiltonian, which is a fundamental task in many physical and chemical problems. Although it has demonstrated promising results, the use of various types of measurements remains a significant obstacle. Recently, a measurement scheme inspired by the quantum phase estimation algorithm has been proposed to overcome this issue by introducing an additional ancilla system that is coupled to the primary system. Based on this measurement scheme, we present a novel approach that employs Bayesian inference principles together with the von Mises-Fisher distribution, and we theoretically demonstrate the new algorithm's capability of identifying the ground state with certainty for various random Hamiltonian matrices. This also opens a new way of exploring the potential of the von Mises-Fisher distribution in other quantum information science problems.

翻訳日:2024-11-03 03:36:45 公開日:2024-10-04

# 残存耐用寿命予測:大規模言語モデルに基づく多次元産業信号処理と効率的な転移学習に関する研究

Remaining Useful Life Prediction: A Study on Multidimensional Industrial Signal Processing and Efficient Transfer Learning Based on Large Language Models ( http://arxiv.org/abs/2410.03134v1 )

ライセンス: Link先を確認

Yan Chen, Cheng Liu,

(参考訳) 残存耐用寿命(RUL: Remaining Useful Life)予測は、機器の信頼性と運用安全性が最重要である現代産業システムの維持に不可欠である。従来の手法は、小規模なディープラーニングや物理・統計モデルに基づいており、複雑な多次元センサーデータや変動する運転条件にうまく対処できないことが多く、一般化能力が制限されている。これらの課題に対処するために、大規模言語モデル(LLM)をRUL予測に用いる革新的な回帰フレームワークを提案する。コーパスデータで事前学習されたLLMのモデリング能力を利用することで、複雑な時間依存性を効果的に捉え、予測精度を向上させることができる。ターボファンエンジンのRUL予測タスクにおける広範な実験により、提案モデルは難易度の高いFD002およびFD004サブセットで最先端(SOTA)手法を上回り、他のサブセットでもSOTAに近い結果を達成することが示された。これまでの研究と異なり、我々のフレームワークは全てのサブセットに対して同じスライディングウィンドウ長と全てのセンサ信号を使用しており、強い一貫性と一般化を示している。さらに、転移学習実験により、微調整に用いるターゲットドメインデータが最小限であっても、提案モデルは完全なターゲットドメインデータで訓練されたSOTA手法を上回ることが明らかになった。本研究は、産業信号処理とRUL予測におけるLLMの大きな可能性を示し、将来のインテリジェント産業システムにおける健全性管理のための先進的なソリューションを提供する。

Remaining useful life (RUL) prediction is crucial for maintaining modern industrial systems, where equipment reliability and operational safety are paramount. Traditional methods, based on small-scale deep learning or physical/statistical models, often struggle with complex, multidimensional sensor data and varying operating conditions, limiting their generalization capabilities. To address these challenges, this paper introduces an innovative regression framework utilizing large language models (LLMs) for RUL prediction. By leveraging the modeling power of LLMs pre-trained on corpus data, the proposed model can effectively capture complex temporal dependencies and improve prediction accuracy. Extensive experiments on the Turbofan engine's RUL prediction task show that the proposed model surpasses state-of-the-art (SOTA) methods on the challenging FD002 and FD004 subsets and achieves near-SOTA results on the other subsets. Notably, different from previous research, our framework uses the same sliding window length and all sensor signals for all subsets, demonstrating strong consistency and generalization. Moreover, transfer learning experiments reveal that with minimal target domain data for fine-tuning, the model outperforms SOTA methods trained on full target domain data. This research highlights the significant potential of LLMs in industrial signal processing and RUL prediction, offering a forward-looking solution for health management in future intelligent industrial systems.

翻訳日:2024-11-03 03:36:45 公開日:2024-10-04

# 正確な世界モデルを用いた構造を考慮した計画としてのLLMの熟慮的推論

Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model ( http://arxiv.org/abs/2410.03136v1 )

ライセンス: Link先を確認

Siheng Xiong, Ali Payani, Yuan Yang, Faramarz Fekri,

(参考訳) 大規模言語モデル(LLM)の推論能力の強化は、特に複雑で多段階の意思決定を必要とするタスクにおいて、依然として重要な課題である。人間は、内的な世界モデルを用いた熟慮的な計画によって様々な行動の潜在的な結果をシミュレートすることで、これらのタスクを得意とする。これに着想を得て、LLMのための新しい多段階推論フレームワークを提案し、これをSWAP(Structure-aware Planning with Accurate World Model)と呼ぶ。自然言語によるChain-of-Thought(CoT)推論のみに依存する従来のアプローチとは異なり、SWAPは構造情報を取り入れ、世界モデルを通じて推論プロセスを導き、各ステップに対するソフトな検証メカニズムを提供する。さらに、SWAPは、より信頼性の高い世界モデリングを可能にするGenerator-Discriminatorアーキテクチャを導入することで、複雑な推論タスクにおける正確な世界状態予測の課題を克服する。具体的には、ジェネレータが次の状態を予測し、ディスクリミネータが問題の文脈で要求される論理的一貫性との整合を保証する。SWAPはまた、早期収束を防ぐために、方策モデルが幅広い潜在的な行動を探索するよう促す。多様性に基づくモデリング(DBM)を用いて行動と状態の両方の生成多様性のボトルネックを解消し、対照的ランキング(CR)によって識別精度を向上させることにより、SWAPはLLMの推論性能を著しく向上させる。SWAPを、数学的推論、論理推論、コーディングタスクなど、多様な推論集約型ベンチマークで評価する。広範な実験により、SWAPはベースラインに対して大幅な改善を達成し、同程度の規模の既存のLLMを一貫して上回ることが示された。

Enhancing the reasoning capabilities of large language models (LLMs) remains a key challenge, especially for tasks that require complex, multi-step decision-making. Humans excel at these tasks by leveraging deliberate planning with an internal world model to simulate the potential outcomes of various actions. Inspired by this, we propose a novel multi-step reasoning framework for LLMs, referred to as Structure-aware Planning with Accurate World Model (SWAP). Unlike previous approaches that rely solely on Chain-of-Thought (CoT) reasoning in natural language, SWAP incorporates structural information to guide the reasoning process via a world model and provides a soft verification mechanism over the steps. Moreover, SWAP overcomes the challenge of accurate world state predictions in complex reasoning tasks by introducing a Generator-Discriminator architecture, which enables more reliable world modeling. Specifically, the generator predicts the next state, and the discriminator ensures alignment with the logical consistency required by the problem context. SWAP also encourages the policy model to explore a broad range of potential actions to prevent premature convergence. By resolving the bottlenecks of generation diversity for both actions and states using diversity-based modeling (DBM) and improving discrimination accuracy through contrastive ranking (CR), SWAP significantly enhances the reasoning performance of LLMs. We evaluate SWAP across diverse reasoning-intensive benchmarks including math reasoning, logical reasoning, and coding tasks. Extensive experiments demonstrate that SWAP achieves substantial improvements over the baselines and consistently outperforms existing LLMs of similar sizes.

翻訳日:2024-11-03 03:36:45 公開日:2024-10-04

# SAG: モデルコラボレーションによるスタイル対応記事生成

SAG: Style-Aligned Article Generation via Model Collaboration ( http://arxiv.org/abs/2410.03137v1 )

ライセンス: Link先を確認

Chenning Xu, Fangxun Shu, Dian Jin, Jinghao Wei, Hao Jiang,

(参考訳) 大規模言語モデル(LLM)は、パーソナライズされ、文体に特徴のあるコンテンツ生成への需要を高めている。しかし、GPT-4のようなクローズドソースモデルは最適化の余地が限られており、一方でQwen-72Bのようなオープンソースの代替モデルは、多大なトレーニングコストと柔軟性の低さが大きな課題となる。他方、小規模言語モデル(SLM)は、複雑な命令の理解や、学習した能力の新しい文脈への転用に苦労し、しばしばより顕著な限界を示す。本稿では、スタイルに沿った記事生成のために、LLMとSLMの双方の長所を活用する新しい協調学習フレームワークを提案し、いずれか単独のモデルの性能を上回ることを示す。LLMは凍結してその堅牢な命令追従能力を利用し、その上でスタイル固有のデータを用いてSLMに教師付き微調整を適用する。さらに、スタイルの一貫性を高める自己改善手法を導入する。新しいベンチマークであるNoteBenchは、スタイルに沿った生成を徹底的に評価する。広範な実験により、提案手法は最先端の性能を達成し、GPT-4と比較してROUGE-Lで0.78、BLEU-4で0.55の改善を示すとともに、事実性と忠実性に関する幻覚率を低く保つことが示された。

Large language models (LLMs) have increased the demand for personalized and stylish content generation. However, closed-source models like GPT-4 present limitations in optimization opportunities, while the substantial training costs and inflexibility of open-source alternatives, such as Qwen-72B, pose considerable challenges. Conversely, small language models (SLMs) struggle with understanding complex instructions and transferring learned capabilities to new contexts, often exhibiting more pronounced limitations. In this paper, we present a novel collaborative training framework that leverages the strengths of both LLMs and SLMs for style article generation, surpassing the performance of either model alone. We freeze the LLMs to harness their robust instruction-following capabilities and subsequently apply supervised fine-tuning on the SLM using style-specific data. Additionally, we introduce a self-improvement method to enhance style consistency. Our new benchmark, NoteBench, thoroughly evaluates style-aligned generation. Extensive experiments show that our approach achieves state-of-the-art performance, with improvements of 0.78 in ROUGE-L and 0.55 in BLEU-4 scores compared to GPT-4, while maintaining a low hallucination rate with respect to factuality and faithfulness.

翻訳日:2024-11-03 03:36:45 公開日:2024-10-04

# LLMは様々な分子を生成することができるか? : 構造的多様性との整合に向けて

Can LLMs Generate Diverse Molecules? Towards Alignment with Structural Diversity ( http://arxiv.org/abs/2410.03138v1 )

ライセンス: Link先を確認

Hyosoon Jang, Yunhui Jang, Jaehyung Kim, Sungsoo Ahn,

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、薬物候補となる分子構造の生成において顕著な性能を示しており、創薬を加速する大きな可能性を秘めている。しかし、現在のLLMは、多様な分子の集合を提案するという、創薬における重要な要件を見落としている。この多様性は、ある分子がウェットラボや臨床での検証に失敗した場合でも成功しうる代替分子を提供するため、有望な薬物を見つける可能性を高めるうえで不可欠である。このような多様性の必要性にもかかわらず、LLMは与えられたプロンプトから構造的に類似した分子を出力することが多い。ビームサーチのような復号方式はテキスト上の多様性を高める可能性があるが、これは分子の構造的な多様性とは一致しないことが多い。そこで本研究では、分子生成LLMを微調整し、構造的に多様な分子の集合を自己回帰的に生成する新しい手法を提案する。ここで各分子は、それ以前に生成された分子を条件として生成される。提案手法は、(1)LLMが分子を系列として自己回帰的に生成できるよう適応させる教師付き微調整と、(2)生成された分子の構造的多様性を最大化する強化学習の2段階からなる。実験により、(1)提案する微調整手法は、既存の復号方式と比較してLLMがより多様な分子を発見できるようにすること、(2)微調整したモデルは、化学ドメインで微調整されたモデルを含む他の代表的なLLMよりも、多様な分子の生成において優れていることを示した。

Recent advancements in large language models (LLMs) have demonstrated impressive performance in generating molecular structures as drug candidates, which offers significant potential to accelerate drug discovery. However, the current LLMs overlook a critical requirement for drug discovery: proposing a diverse set of molecules. This diversity is essential for improving the chances of finding a viable drug, as it provides alternative molecules that may succeed where others fail in wet-lab or clinical validations. Despite such a need for diversity, the LLMs often output structurally similar molecules from a given prompt. While decoding schemes like beam search may enhance textual diversity, this often does not align with molecular structural diversity. In response, we propose a new method for fine-tuning molecular generative LLMs to autoregressively generate a set of structurally diverse molecules, where each molecule is generated by conditioning on the previously generated molecules. Our approach consists of two stages: (1) supervised fine-tuning to adapt LLMs to autoregressively generate molecules in a sequence and (2) reinforcement learning to maximize structural diversity within the generated molecules. Our experiments show that (1) our fine-tuning approach enables the LLMs to better discover diverse molecules compared to existing decoding schemes and (2) our fine-tuned model outperforms other representative LLMs in generating diverse molecules, including the ones fine-tuned on chemical domains.

翻訳日:2024-11-03 03:36:45 公開日:2024-10-04

# 擬似相関の存在下でのインコンテキスト学習

In-context Learning in Presence of Spurious Correlations ( http://arxiv.org/abs/2410.03140v1 )

ライセンス: Link先を確認

Hrayr Harutyunyan, Rafayel Darbinyan, Samvel Karapetyan, Hrant Khachatrian,

(参考訳) 大規模言語モデルは、少数の例からタスクの解き方を学ぶ文脈内学習において顕著な能力を示す。近年の研究では、単純な回帰タスクを文脈内で実行するようにトランスフォーマーを訓練できることが示されている。本研究は、擬似的特徴(spurious features)を含む分類タスクに対して、文脈内学習器を訓練する可能性を探る。従来の文脈内学習器の訓練手法は、擬似的特徴の影響を受けやすいことがわかった。さらに、メタトレーニングデータセットが1つのタスクのインスタンスのみを含む場合、従来のアプローチはタスクの記憶につながり、予測に文脈を活用するモデルを生成できない。これらの観察に基づき、与えられた分類タスクに対してそのような学習器を訓練する新しい手法を提案する。注目すべきことに、この文脈内学習器は、ERMやGroupDROのような強力な手法に匹敵し、時にはそれらを上回る。しかし、これらのアルゴリズムとは異なり、他のタスクにはうまく一般化しない。そこで、合成された文脈内学習インスタンスの多様なデータセットで訓練することにより、未知のタスクに一般化する文脈内学習器を得られることを示す。

Large language models exhibit a remarkable capacity for in-context learning, where they learn to solve tasks given a few examples. Recent work has shown that transformers can be trained to perform simple regression tasks in-context. This work explores the possibility of training an in-context learner for classification tasks involving spurious features. We find that the conventional approach of training in-context learners is susceptible to spurious features. Moreover, when the meta-training dataset includes instances of only one task, the conventional approach leads to task memorization and fails to produce a model that leverages context for predictions. Based on these observations, we propose a novel technique to train such a learner for a given classification task. Remarkably, this in-context learner matches and sometimes outperforms strong methods like ERM and GroupDRO. However, unlike these algorithms, it does not generalize well to other tasks. We show that it is possible to obtain an in-context learner that generalizes to unseen tasks by training on a diverse dataset of synthetic in-context learning instances.

翻訳日:2024-11-03 03:24:16 公開日:2024-10-04

# Margin Matching Preference Optimization: グラニュラーフィードバックによるモデルアライメントの強化

Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback ( http://arxiv.org/abs/2410.03145v1 )

ライセンス: Link先を確認

Kyuyoung Kim, Ah Jeong Seo, Hao Liu, Jinwoo Shin, Kimin Lee,

(参考訳) 人間のフィードバックからの強化学習など、アライメント技術で微調整された大規模言語モデル(LLM)は、これまでで最も有能なAIシステムの開発に役立っている。その成功にもかかわらず、既存の手法は一般に、ペアワイズ選好においてどちらの出力が好まれるかを示すような単純なバイナリラベルに依存しており、ペア間の相対的な品質の微妙な違いを捉えられない。この制限に対処するために、相対的な品質マージンを最適化に組み込むMMPO(Margin Matching Preference Optimization)というアプローチを導入し、LLMポリシーと報酬モデルの改善につなげる。具体的には、ペアワイズ選好における品質マージンが与えられたとき、Bradley-Terryモデルに基づくソフトターゲット確率を設計し、それを用いて標準のクロスエントロピー目的関数でモデルを訓練する。人間とAIの両方のフィードバックデータによる実験では、MMPOはMT-benchやRewardBenchといった一般的なベンチマークにおいて、ベースライン手法を一貫して、しばしば大幅に上回ることが示された。特に、MMPOで訓練された7Bモデルは、2024年6月時点でRewardBenchにおいて最先端の性能を達成しており、同じ規模の他のモデルを上回っている。我々の分析は、MMPOが過剰適合に対してより堅牢であり、より良く校正されたモデルをもたらすことも示している。

Large language models (LLMs) fine-tuned with alignment techniques, such as reinforcement learning from human feedback, have been instrumental in developing some of the most capable AI systems to date. Despite their success, existing methods typically rely on simple binary labels, such as those indicating preferred outputs in pairwise preferences, which fail to capture the subtle differences in relative quality between pairs. To address this limitation, we introduce an approach called Margin Matching Preference Optimization (MMPO), which incorporates relative quality margins into optimization, leading to improved LLM policies and reward models. Specifically, given quality margins in pairwise preferences, we design soft target probabilities based on the Bradley-Terry model, which are then used to train models with the standard cross-entropy objective. Experiments with both human and AI feedback data demonstrate that MMPO consistently outperforms baseline methods, often by a substantial margin, on popular benchmarks including MT-bench and RewardBench. Notably, the 7B model trained with MMPO achieves state-of-the-art performance on RewardBench as of June 2024, outperforming other models of the same scale. Our analysis also shows that MMPO is more robust to overfitting, leading to better-calibrated models.
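
The idea of Bradley-Terry soft targets trained with a cross-entropy objective can be sketched as below. This is only an illustration under assumptions: the abstract does not give the exact mapping from quality margins to probabilities, so the `scale` parameter and the function names `soft_target` and `margin_matching_loss` are hypothetical, and the paper's actual formulation may differ.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_target(quality_a, quality_b, scale=1.0):
    """Bradley-Terry probability that response a is preferred over b.

    `scale` is an assumed temperature-like knob, not taken from the paper.
    """
    return sigmoid(scale * (quality_a - quality_b))


def margin_matching_loss(score_a, score_b, target):
    """Binary cross-entropy between the soft target and the model's
    Bradley-Terry probability sigmoid(score_a - score_b)."""
    p = sigmoid(score_a - score_b)
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(p + eps) + (1.0 - target) * math.log(1.0 - p + eps))


# Example: annotators rated response a at 9 and response b at 7.
target = soft_target(9.0, 7.0)
for model_margin in (0.0, 2.0, 10.0):
    loss = margin_matching_loss(model_margin, 0.0, target)
    print(f"model margin {model_margin:>4}: loss {loss:.4f}")
```

With a hard binary label the loss would keep rewarding ever larger model margins; with a soft target it is smallest when the model's margin matches the annotated one, which is one way to read the "margin matching" in the method's name.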

翻訳日:2024-11-03 03:24:16 公開日:2024-10-04

# 自律型とウィザード・オブ・オズのユーザ行動の違いの分析と検出

Analysis and Detection of Differences in Spoken User Behaviors between Autonomous and Wizard-of-Oz Systems ( http://arxiv.org/abs/2410.03147v1 )

ライセンス: Link先を確認

Mikey Elmers, Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara,

(参考訳) 本研究では、遠隔操作ロボットとの対話と自律対話システムとの対話を比較し、日本語による人間とロボットの対話の大規模コーパスにおけるユーザの行動の違いを調べた。傾聴と就職面接の対話シナリオにおけるユーザの発話行動を分析した。その結果、オペレータ制御条件と自律条件との間で、発話長、発話速度、フィラー、相槌、非流暢性、笑いなどの指標に有意な差が認められた。さらに、オペレータ条件と自律システム条件を区別する予測モデルを開発した。我々のモデルはベースラインモデルと比較して高い正解率と適合率を示し、いくつかのモデルはベースラインよりも高いF1スコアも達成した。

This study examined users' behavioral differences in a large corpus of Japanese human-robot interactions, comparing interactions between a tele-operated robot and an autonomous dialogue system. We analyzed user spoken behaviors in both attentive listening and job interview dialogue scenarios. Results revealed significant differences in metrics such as speech length, speaking rate, fillers, backchannels, disfluencies, and laughter between operator-controlled and autonomous conditions. Furthermore, we developed predictive models to distinguish between operator and autonomous system conditions. Our models demonstrated higher accuracy and precision compared to the baseline model, with several models also achieving a higher F1 score than the baseline.

翻訳日:2024-11-03 03:24:16 公開日:2024-10-04

# イベント中心物語のレンズによるメディアフレーミング

Media Framing through the Lens of Event-Centric Narratives ( http://arxiv.org/abs/2410.03151v1 )

ライセンス: Link先を確認

Rohan Das, Aditya Chandra, I-Ta Lee, Maria Leonor Pacheco,

(参考訳) コミュニケーションの観点から見ると、フレームとは、特定の解釈を促し、他の解釈を抑えるように用いられる言語のパッケージングを定めるものである。例えば、ニュース記事は移民を経済の押し上げ要因としても重荷としてもフレーミングでき、その結果、同じ現象について全く異なる解釈を伝えることができる。本研究では、フレーミングの仕掛けを説明するには、物語がどのように構築されるかに目を向ける必要があると主張する。この方向への第一歩として、イベントおよび他のイベントとの関係を抽出し、それらをニュース記事のフレームの説明に役立つ高レベルの物語にグループ化するフレームワークを提案する。このフレームワークが、移民と銃規制という2つの異なる領域において、米国のニュースにおけるフレーミングの分析に利用できることを示す。

From a communications perspective, a frame defines the packaging of the language used in such a way as to encourage certain interpretations and to discourage others. For example, a news article can frame immigration as either a boost or a drain on the economy, and thus communicate very different interpretations of the same phenomenon. In this work, we argue that to explain framing devices we have to look at the way narratives are constructed. As a first step in this direction, we propose a framework that extracts events and their relations to other events, and groups them into high-level narratives that help explain frames in news articles. We show that our framework can be used to analyze framing in U.S. news for two different domains: immigration and gun control.

翻訳日:2024-11-03 03:24:16 公開日:2024-10-04

# メモリ拡張リカレントニューラルネットワークにおける学習可能性の探索:精度,安定性,経験的考察

Exploring Learnability in Memory-Augmented Recurrent Neural Networks: Precision, Stability, and Empirical Insights ( http://arxiv.org/abs/2410.03154v1 )

ライセンス: Link先を確認

Shrabon Das, Ankur Mali,

(参考訳) 本研究では,Pushdown Automataと理論的に等価であるメモリレスおよびメモリ拡張RNNの学習可能性について検討する。経験的な結果から、これらのモデルは記号文法を習得するよりも精度に頼って、長い列の一般化に失敗することが多い。完全トレーニングおよびコンポーネント凍結モデルの実験により、メモリコンポーネントの凍結はパフォーマンスを著しく向上し、Penn Treebankデータセット(テストパープレキシティを123.5から120.5に削減した)で最先端の結果が得られた。凍結メモリを持つモデルでは、通常のモデルでは60%減少するのに対して、より長いシーケンスで初期性能の90%を保った。理論的解析は、凍結記憶が時間的依存を安定化させ、堅牢な収束をもたらすことを示唆している。これらの知見は、RNNの真の学習可能性限界を理解するために、安定したメモリ設計と長いシーケンス評価の必要性を強調している。

This study explores the learnability of memory-less and memory-augmented RNNs, which are theoretically equivalent to Pushdown Automata. Empirical results show that these models often fail to generalize on longer sequences, relying more on precision than mastering symbolic grammar. Experiments on fully trained and component-frozen models reveal that freezing the memory component significantly improves performance, achieving state-of-the-art results on the Penn Treebank dataset (test perplexity reduced from 123.5 to 120.5). Models with frozen memory retained up to 90% of initial performance on longer sequences, compared to a 60% drop in standard models. Theoretical analysis suggests that freezing memory stabilizes temporal dependencies, leading to robust convergence. These findings stress the need for stable memory designs and long-sequence evaluations to understand RNNs' true learnability limits.

翻訳日:2024-11-03 03:24:16 公開日:2024-10-04

# MELODI: 長期のコンテキストに対するメモリ圧縮の探索

MELODI: Exploring Memory Compression for Long Contexts ( http://arxiv.org/abs/2410.03156v1 )

ライセンス: Link先を確認

Yinpeng Chen, DeLesley Hutchins, Aren Jansen, Andrey Zhmoginov, David Racz, Jesper Andersen,

(参考訳) 本稿では、短いコンテキストウィンドウを用いて長い文書を効率的に処理できる新しいメモリアーキテクチャMELODIを提案する。MELODIの鍵となる原理は、短期記憶と長期記憶を、ネットワーク層とコンテキストウィンドウの両方にわたる階層的な圧縮スキームとして表現することである。具体的には、短期記憶は、複数の層にわたるコンテキストウィンドウの再帰的な圧縮によって実現され、ウィンドウ間のスムーズな遷移を保証する。対照的に、長期記憶は単一の中間層内でさらなる圧縮を行い、コンテキストウィンドウをまたいで情報を集約することで、履歴全体から重要な情報を効果的に統合する。強力なベースラインである、大規模な長期メモリ(64Kのキー・バリューペア)に対して密な注意(dense attention)を用いるMemorizing Transformerと比較して、提案手法は様々な長文コンテキストデータセットにおいて優れた性能を示すと同時に、メモリフットプリントを8分の1に大幅に削減する。

We present MELODI, a novel memory architecture designed to efficiently process long documents using short context windows. The key principle behind MELODI is to represent short-term and long-term memory as a hierarchical compression scheme across both network layers and context windows. Specifically, the short-term memory is achieved through recurrent compression of context windows across multiple layers, ensuring smooth transitions between windows. In contrast, the long-term memory performs further compression within a single middle layer and aggregates information across context windows, effectively consolidating crucial information from the entire history. Compared to a strong baseline - the Memorizing Transformer employing dense attention over a large long-term memory (64K key-value pairs) - our method demonstrates superior performance on various long-context datasets while remarkably reducing the memory footprint by a factor of 8.

翻訳日:2024-11-03 03:24:16 公開日:2024-10-04

# 選択状態空間モデルにおけるメモリ圧縮の数学的形式化

Mathematical Formalism for Memory Compression in Selective State Space Models ( http://arxiv.org/abs/2410.03158v1 )

ライセンス: Link先を確認

Siddhanth Bhat,

(参考訳) 状態空間モデル(SSM)は、シーケンスデータの長距離依存性をモデル化するための強力なフレームワークとして登場した。従来のリカレントニューラルネットワーク(RNN)や畳み込みニューラルネットワーク(CNN)とは異なり、SSMは、制御理論や力学系の原理を利用して、シーケンスモデリングに対する構造的かつ安定したアプローチを提供する。しかし、シーケンスモデリングにおける重要な課題は、重要な情報を失うことなく、長期的依存関係をコンパクトな隠れ状態表現に圧縮することである。本稿では、選択的状態空間モデルにおけるメモリ圧縮を理解するための厳密な数学的枠組みを開発する。入力の関連性に基づいて隠れ状態を動的にフィルタリング・更新する選択的ゲーティング機構を導入し、効率的なメモリ圧縮を実現する。相互情報量やレート歪み理論などの情報理論的ツールを用いて、メモリ効率と情報保持のトレードオフを定式化する。我々の分析は、モデル性能を犠牲にすることなく圧縮できる情報量に関する理論的境界を提供する。また、選択的SSMにおける隠れ状態の安定性と収束性を証明する定理を導出し、信頼性のある長期記憶保持を保証する。計算複雑性解析により、選択的SSMは従来のRNNベースのモデルと比較してメモリ効率と処理速度を大幅に改善することが明らかになった。時系列予測や自然言語処理などのシーケンスモデリングタスクにおける実証的検証を通じて、選択的SSMが、より少ないメモリと計算資源で最先端の性能を達成できることを実証する。

State space models (SSMs) have emerged as a powerful framework for modelling long-range dependencies in sequence data. Unlike traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs), SSMs offer a structured and stable approach to sequence modelling, leveraging principles from control theory and dynamical systems. However, a key challenge in sequence modelling is compressing long-term dependencies into a compact hidden state representation without losing critical information. In this paper, we develop a rigorous mathematical framework for understanding memory compression in selective state space models. We introduce a selective gating mechanism that dynamically filters and updates the hidden state based on input relevance, allowing for efficient memory compression. We formalize the trade-off between memory efficiency and information retention using information-theoretic tools, such as mutual information and rate-distortion theory. Our analysis provides theoretical bounds on the amount of information that can be compressed without sacrificing model performance. We also derive theorems that prove the stability and convergence of the hidden state in selective SSMs, ensuring reliable long-term memory retention. Computational complexity analysis reveals that selective SSMs offer significant improvements in memory efficiency and processing speed compared to traditional RNN-based models. Through empirical validation on sequence modelling tasks such as time-series forecasting and natural language processing, we demonstrate that selective SSMs achieve state-of-the-art performance while using less memory and computational resources.
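
A selective gating mechanism of the kind this abstract describes (the hidden state is updated according to input relevance) might look like the scalar sketch below. Everything specific here is an assumption made for illustration: the relevance measure `abs(x)`, the gate parameters `w_gate` and `b_gate`, and the function name `selective_scan` do not come from the paper, which the abstract presents only at the level of theory.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def selective_scan(inputs, w_gate=4.0, b_gate=-2.0, h0=0.0):
    """Run a scalar gated recurrence over `inputs`.

    The gate g_t in (0, 1) grows with the magnitude of the input (an assumed
    proxy for relevance). The update h_t = (1 - g_t) * h_{t-1} + g_t * x_t is a
    convex combination, so the state stays within the range of the inputs
    and the initial state.
    """
    h = h0
    states = []
    for x in inputs:
        g = sigmoid(w_gate * abs(x) + b_gate)  # small inputs are mostly ignored
        h = (1.0 - g) * h + g * x
        states.append(h)
    return states


states = selective_scan([0.0, 1.0, 0.0, -1.0, 0.05])
print([round(h, 3) for h in states])
```

Because each update is a convex combination, boundedness of the state follows immediately for bounded inputs; this is a much weaker property than the stability and convergence theorems the abstract refers to, and is shown only to make the gating idea concrete.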

翻訳日:2024-11-03 03:24:16 公開日:2024-10-04

# 時系列予測のための自己回帰移動平均アテンション機構

Autoregressive Moving-average Attention Mechanism for Time Series Forecasting ( http://arxiv.org/abs/2410.03159v1 )

ライセンス: Link先を確認

Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang,

(参考訳) 本稿では,様々な線形アテンション機構に適応できる自己回帰(AR)移動平均アテンション構造を提案する。本稿では、時系列予測(TSF)タスクにおいて、予め見落とされたデコーダのみの自己回帰変換モデルを用いて、適切なトークン化とトレーニング手法を適用すると、最適なベースラインに匹敵する結果が得られることを示す。さらに、統計学と最近の線形注意の進歩からARMAモデルに着想を得て、既存の自己回帰的注意機構に完全なARMA構造を導入する。間接MA重み生成法を用いて,MA項を基礎となる効率的な注目モデルの時間的複雑さとパラメータサイズを維持しつつ組み込む。さらに、間接パラメータ生成が局所的時間的影響のモデリング要求に合致する暗黙のMA重みを生成する方法について検討する。実験結果から、ARMA構造を組み込むことで、TSFタスクにおける様々なAR注意の処理性能が向上し、最先端の結果が得られた。

We propose an Autoregressive (AR) Moving-average (MA) attention structure that can adapt to various linear attention mechanisms, enhancing their ability to capture long-range and local temporal patterns in time series. In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines when appropriate tokenization and training methods are applied. Moreover, inspired by the ARMA model from statistics and recent advances in linear attention, we introduce the full ARMA structure into existing autoregressive attention mechanisms. By using an indirect MA weight generation method, we incorporate the MA term while maintaining the time complexity and parameter size of the underlying efficient attention models. We further explore how indirect parameter generation can produce implicit MA weights that align with the modeling requirements for local temporal impacts. Experimental results show that incorporating the ARMA structure consistently improves the performance of various AR attentions on TSF tasks, achieving state-of-the-art results.
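
For reference, the classical ARMA($p$, $q$) model from statistics, which the abstract names as its inspiration, is usually written as below. This is the textbook definition only; how the paper maps the AR and MA terms onto attention weights is not stated in the abstract and is not reproduced here.

```latex
x_t = c + \sum_{i=1}^{p} \phi_i \, x_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
```

Here $c$ is a constant, $\phi_i$ are the autoregressive coefficients acting on past observations, $\theta_j$ are the moving-average coefficients acting on past noise terms, and $\varepsilon_t$ is white noise.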

翻訳日:2024-11-03 03:24:16 公開日:2024-10-04

# ビデオ拡散における時間的モデリングの再定義:ベクトル化された時間ステップアプローチ

Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach ( http://arxiv.org/abs/2410.03160v1 )

ライセンス: Link先を確認

Yaofang Liu, Yumeng Ren, Xiaodong Cun, Aitor Artola, Yang Liu, Tieyong Zeng, Raymond H. Chan, Jean-michel Morel,

(参考訳) 拡散モデルは画像生成に革命をもたらし、ビデオ生成への拡張も有望である。しかしながら、現在のビデオ拡散モデル(VDM)は、クリップレベルで適用されるスカラーのタイムステップ変数に依存しており、画像からビデオへの生成のような様々なタスクに必要な複雑な時間依存性をモデル化する能力が制限されている。この制限に対処するため、新しいベクトル化タイムステップ変数(VTV)を導入したフレーム対応ビデオ拡散モデル(FVDM)を提案する。従来のVDMとは異なり、我々の手法では各フレームが独立したノイズスケジュールに従うことができ、モデルがきめ細かい時間依存性を捉える能力が高まる。FVDMの柔軟性は、標準的なビデオ生成、画像からビデオへの生成、ビデオ補間、長尺ビデオ合成など、複数のタスクで実証されている。多様なVTV構成により、微調整時の破滅的忘却やゼロショット手法における限定的な一般化性といった課題を克服し、より高品質なビデオ生成を実現する。実験的評価により、FVDMはビデオ生成品質において最先端の手法を上回り、拡張タスクにも優れることが示された。既存のVDMの根本的な欠点に対処することで、FVDMはビデオ合成の新しいパラダイムを確立し、生成モデリングやマルチメディアアプリケーションに重要な意味を持つ堅牢なフレームワークを提供する。

Diffusion models have revolutionized image generation, and their extension to video generation has shown promise. However, current video diffusion models (VDMs) rely on a scalar timestep variable applied at the clip level, which limits their ability to model complex temporal dependencies needed for various tasks like image-to-video generation. To address this limitation, we propose a frame-aware video diffusion model (FVDM), which introduces a novel vectorized timestep variable (VTV). Unlike conventional VDMs, our approach allows each frame to follow an independent noise schedule, enhancing the model's capacity to capture fine-grained temporal dependencies. FVDM's flexibility is demonstrated across multiple tasks, including standard video generation, image-to-video generation, video interpolation, and long video synthesis. Through a diverse set of VTV configurations, we achieve superior quality in generated videos, overcoming challenges such as catastrophic forgetting during fine-tuning and limited generalizability in zero-shot methods. Our empirical evaluations show that FVDM outperforms state-of-the-art methods in video generation quality, while also excelling in extended tasks. By addressing fundamental shortcomings in existing VDMs, FVDM sets a new paradigm in video synthesis, offering a robust framework with significant implications for generative modeling and multimedia applications.
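
The difference between a clip-level scalar timestep and a per-frame timestep vector can be illustrated with the standard forward-noising step of a diffusion model. The sketch below is an assumption-laden toy, not FVDM itself: the linear beta schedule, `NUM_STEPS`, the helper names `alpha_bar` and `add_noise`, and the specific "keep the first frame clean" configuration are illustrative choices that the abstract does not spell out.

```python
import math
import random

NUM_STEPS = 1000  # assumed length of the diffusion schedule


def alpha_bar(t, num_steps=NUM_STEPS, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear beta
    schedule; alpha_bar(0) is exactly 1.0, meaning no noise."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(frames, timesteps, rng):
    """Noise every frame according to its own entry in `timesteps`."""
    noisy = []
    for frame, t in zip(frames, timesteps):
        ab = alpha_bar(t)
        noisy.append([
            math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for x in frame
        ])
    return noisy


rng = random.Random(0)
frames = [[0.5, -0.25, 1.0] for _ in range(4)]  # four toy "frames" of three pixels

scalar_t = [600] * len(frames)   # conventional: one timestep shared by the clip
vector_t = [0, 600, 600, 600]    # vectorized: first frame kept clean (assumed config)

noisy_scalar = add_noise(frames, scalar_t, rng)
noisy_vector = add_noise(frames, vector_t, rng)
print(noisy_vector[0])  # first frame is left untouched because its timestep is 0
```

A configuration like `vector_t` is one plausible way a per-frame timestep could express image-to-video conditioning, since the clean first frame acts as the given image; the configurations actually used in the paper may differ.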

翻訳日:2024-11-03 03:24:16 公開日:2024-10-04

# 適応型マスキングは視覚的グラウンド化を促進する

Adaptive Masking Enhances Visual Grounding ( http://arxiv.org/abs/2410.03161v1 )

ライセンス: Link先を確認

Sen Jia, Lei Li,

(参考訳) 近年では、LAION-5BやDataComp-1Bのような大規模データセットでの大規模視覚言語事前学習の成功により、視覚的グラウンディングにおけるゼロショット・少数ショット学習が注目されている。しかし、これらのデータセットの継続的な拡張は、特にデータの可用性と計算オーバーヘッドに関して重要な課題を示し、ローショット学習能力の進歩にボトルネックを生じさせる。本稿では,低ショットの学習シナリオにおいて,データセットサイズの増加を必要とせず,語彙グラウンディングの強化を目的としたIMAGE(Interpretative MAsking with Gaussian radiation modEling)を提案する。認知科学と近年のマスク付きオートエンコーダ(MAE)の成功からインスピレーションを得て,視覚バックボーンが生成する特徴マップの顕著な領域における適応マスキングを活用している。これにより、隠蔽された情報の再構成を通じて、頑健で一般化された表現を学習し、局所的特徴とグローバル的特徴の両方に効果的な注意を向けることができる。我々はCOCOやODinWを含むベンチマークデータセットに対するアプローチの有効性を評価し、ゼロショットタスクや少数ショットタスクにおいて優れた性能を示す。実験結果から、IMAGEはベースラインモデルより一貫して優れ、一般化の向上と低ショットシナリオの性能向上を実現していることがわかった。これらの知見は、ゼロショット学習と少数ショット学習の進歩にデータセットサイズを継続的にスケーリングするアプローチに代わる、注意機構による適応的な特徴操作とガウス的モデリングの可能性を浮き彫りにしている。私たちのコードは https://github.com/git-lenny/IMAGE で公開されています。

In recent years, zero-shot and few-shot learning in visual grounding have garnered considerable attention, largely due to the success of large-scale vision-language pre-training on expansive datasets such as LAION-5B and DataComp-1B. However, the continuous expansion of these datasets presents significant challenges, particularly with respect to data availability and computational overhead, thus creating a bottleneck in the advancement of low-shot learning capabilities. In this paper, we propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, aimed at enhancing vocabulary grounding in low-shot learning scenarios without necessitating an increase in dataset size. Drawing inspiration from cognitive science and the recent success of masked autoencoders (MAE), our method leverages adaptive masking on salient regions of the feature maps generated by the vision backbone. This enables the model to learn robust, generalized representations through the reconstruction of occluded information, thereby facilitating effective attention to both local and global features. We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks. Experimental results consistently show that IMAGE outperforms baseline models, achieving enhanced generalization and improved performance in low-shot scenarios. These findings highlight the potential of adaptive feature manipulation through attention mechanisms and Gaussian modeling as a promising alternative to approaches that rely on the continual scaling of dataset sizes for the advancement of zero-shot and few-shot learning. Our code is publicly available at https://github.com/git-lenny/IMAGE.

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# 透かし付きLLMは、加工プロンプトでユーザによって識別できるか?

Can Watermarked LLMs be Identified by Users via Crafted Prompts? ( http://arxiv.org/abs/2410.03168v1 )

ライセンス: Link先を確認

Aiwei Liu, Sheng Guan, Yiming Liu, Leyi Pan, Yifei Zhang, Liancheng Fang, Lijie Wen, Philip S. Yu, Xuming Hu,

(参考訳) 大規模言語モデル(LLM)のためのテキスト透かしは,LLM出力の検出と誤用防止に大きく進歩している。現在の透かし技術は、高い検出性、テキスト品質への影響の最小化、テキスト編集に対する堅牢性を提供する。しかし、近年の研究はLLMサービスにおける透かし技術の不受容性についての調査を欠いている。 LLMプロバイダは、実際のシナリオにおける透かしの存在を開示したくないかもしれないため、サービスを使用するユーザの意欲を減らし、攻撃に対する透かしをより脆弱にする可能性がある。この研究は、透かしLLMの非受容性を初めて研究したものである。そこで我々は,LLMに適切に設計されたプロンプトによって透かしを検出する,Water-Probeと呼ばれる識別アルゴリズムを設計した。我々の主要な動機は、現在の透かしLLMが同じ透かしキーの下で一貫した偏りを露呈し、異なる透かしキーの下で同様の違いをもたらすことである。実験では、ほぼすべての主流の透かしアルゴリズムが、よく設計されたプロンプトと容易に識別できることが示され、一方、Water-Probeは、非透かしLLMに対して最小の偽陽性率を示す。最後に,透かしLLMの非受容性を高める鍵として,透かしキー選択のランダム性を高めることを提案する。そこで本研究では,複数の透かしキーをマージすることで,透かし不感受性を著しく向上するWater-Bag戦略を提案する。

Text watermarking for Large Language Models (LLMs) has made significant progress in detecting LLM outputs and preventing misuse. Current watermarking techniques offer high detectability, minimal impact on text quality, and robustness to text editing. However, current research lacks investigation into the imperceptibility of watermarking techniques in LLM services. This is crucial as LLM providers may not want to disclose the presence of watermarks in real-world scenarios, as it could reduce user willingness to use the service and make watermarks more vulnerable to attacks. This work is the first to investigate the imperceptibility of watermarked LLMs. We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts to the LLM. Our key motivation is that current watermarked LLMs expose consistent biases under the same watermark key, resulting in similar differences across prompts under different watermark keys. Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts, while Water-Probe demonstrates a minimal false positive rate for non-watermarked LLMs. Finally, we propose that the key to enhancing the imperceptibility of watermarked LLMs is to increase the randomness of watermark key selection. Based on this, we introduce the Water-Bag strategy, which significantly improves watermark imperceptibility by merging multiple watermark keys.

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# 自己回帰型大言語モデルは計算的に普遍的である

Autoregressive Large Language Models are Computationally Universal ( http://arxiv.org/abs/2410.03170v1 )

ライセンス: Link先を確認

Dale Schuurmans, Hanjun Dai, Francesco Zanini,

(参考訳) 変換器をベースとした言語モデルの自己回帰復号化は,外部介入や重みの変更を伴わずに,普遍的な計算を実現することができることを示す。この結果を確立するには、言語モデルが有界なコンテキストを使って任意の長さの入力をどのように処理できるかを理解する必要がある。この目的のために,長い入力が与えられたとき,コンテキストウィンドウが進むにつれて出力されたトークンがシーケンスの最後に付加される自己回帰復号の一般化を検討する。まず、得られるシステムが、計算的に普遍であることが古くから知られている古典的な計算モデルであるLagシステムに対応することを示す。新しい証明を活用することで、2027個の生成規則を持つLagシステムにより、普遍チューリングマシンをシミュレートできることを示す。次に,既存の大規模言語モデルがこのような普遍的なLagシステムの振る舞いをシミュレートできるかどうかを検討する。決定的(貪欲)デコーディングの下で2027個の生成規則のそれぞれを正しく適用するようにモデルを駆動する単一のシステムプロンプトを、gemini-1.5-pro-001に対して開発できることを示し,肯定的な回答を与える。我々は、チャーチ=チューリングのテーゼにより、拡張された自己回帰(貪欲)デコーディングを伴うプロンプト付きのgemini-1.5-pro-001が汎用コンピュータであると結論付けた。

We show that autoregressive decoding of a transformer-based language model can realize universal computation, without external intervention or modification of the model's weights. Establishing this result requires understanding how a language model can process arbitrarily long inputs using a bounded context. For this purpose, we consider a generalization of autoregressive decoding where, given a long input, emitted tokens are appended to the end of the sequence as the context window advances. We first show that the resulting system corresponds to a classical model of computation, a Lag system, that has long been known to be computationally universal. By leveraging a new proof, we show that a universal Turing machine can be simulated by a Lag system with 2027 production rules. We then investigate whether an existing large language model can simulate the behaviour of such a universal Lag system. We give an affirmative answer by showing that a single system-prompt can be developed for gemini-1.5-pro-001 that drives the model, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules. We conclude that, by the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.
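
The generalized decoding loop described above (a bounded window reads the front of the sequence, emitted symbols are appended to the end, and the window advances) can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's construction: the function name `run_lag_system`, the window size of 2, the rule that execution halts when no rule matches, and the tiny rule table are all hypothetical stand-ins for the 2027-rule system.

```python
from collections import deque

def run_lag_system(rules, tape, window=2, max_steps=100):
    """Repeatedly read the first `window` symbols, append the symbols the
    matching rule emits to the end of the tape, then drop the first symbol.

    Halting when no rule matches (or the tape is shorter than the window)
    is an assumption made for this sketch.
    """
    tape = deque(tape)
    for _ in range(max_steps):
        if len(tape) < window:
            break
        key = tuple(list(tape)[:window])
        if key not in rules:
            break
        tape.extend(rules[key])  # emitted symbols go to the end of the sequence
        tape.popleft()           # the window advances by one position
    return list(tape)

# Hypothetical two-rule table, standing in for a model's greedy next-token choice.
toy_rules = {("a", "b"): ["c"], ("b", "c"): []}
result = run_lag_system(toy_rules, ["a", "b"])
```

In the setting of the abstract, the rule lookup would be replaced by a prompted language model decoding greedily; the dictionary here only makes the control flow concrete.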

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# 深層カーネル学習型遺伝的アルゴリズムによる高次元空間の高速最適化

Rapid optimization in high dimensional space by deep kernel learning augmented genetic algorithms ( http://arxiv.org/abs/2410.03173v1 )

ライセンス: Link先を確認

Mani Valleti, Aditya Raghavan, Sergei V. Kalinin,

(参考訳) 複雑な高次元空間の探索は、分子発見、プロセス最適化、サプライチェーン管理といった分野における重要な課題を示す。遺伝的アルゴリズム(GA)は、新しい候補空間を作成するための大きなパワーを提供するが、新しい提案された各ソリューションの評価を必要とするため、しばしば高い計算要求を伴う。一方、Deep Kernel Learning(DKL)は、選択された候補構造の空間を効率的にナビゲートするが、生成能力に欠ける。本研究では,新しい候補空間の挙動を迅速に把握するために,DKLに基づく代理モデルの効率性を持つ新しい候補を生成するために,GAの生成力を両立させるアプローチを提案する。このDKL-GAフレームワークは、ベイズ最適化(BO)ワークフローを構築するためにさらに使用できる。本稿では,フェロSIMモデルの最適化による本手法の有効性を実証し,分子発見や電池充電の最適化など多種多様な課題に適用可能であることを示す。

Exploration of complex high-dimensional spaces presents significant challenges in fields such as molecular discovery, process optimization, and supply chain management. Genetic Algorithms (GAs), while offering significant power for creating new candidate spaces, often entail high computational demands due to the need for evaluation of each new proposed solution. On the other hand, Deep Kernel Learning (DKL) efficiently navigates the spaces of preselected candidate structures but lacks generative capabilities. This study introduces an approach that amalgamates the generative power of GAs to create new candidates with the efficiency of DKL-based surrogate models to rapidly ascertain the behavior of new candidate spaces. This DKL-GA framework can be further used to build Bayesian Optimization (BO) workflows. We demonstrate the effectiveness of this approach through the optimization of the FerroSIM model, showcasing its broad applicability to diverse challenges, including molecular discovery and battery charging optimization.

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# HRVMamba: 密度予測のための高解像度ビジュアルステートスペースモデル

HRVMamba: High-Resolution Visual State Space Model for Dense Prediction ( http://arxiv.org/abs/2410.03174v1 )

ライセンス: Link先を確認

Hao Zhang, Yongqiang Ma, Wenqi Shao, Ping Luo, Nanning Zheng, Kaipeng Zhang,

(参考訳) 近年、効率的なハードウェア対応設計を備えた状態空間モデル(SSM)、すなわちMambaは、トークン長に対する線形計算複雑性とグローバルな受容野により、コンピュータビジョンタスクにおいて大きな可能性を示している。しかし、人間のポーズ推定やセマンティックセグメンテーションを含む密な予測タスクにおけるMambaの性能は、帰納的バイアスの不足、長距離の忘却、低解像度の出力表現の3つの主要な課題によって制約されている。これらの課題に対処するために,動的ビジュアル状態空間(DVSS)ブロックを導入する。このブロックは,マルチスケールの畳み込みカーネルを用いて様々なスケールの局所的特徴を抽出して帰納的バイアスを高めるとともに,変形可能な畳み込みを用いて長距離の忘却問題を緩和しつつ,入力情報とタスク固有情報に基づく適応的な空間的アグリゲーションを実現する。 HRNetで提案されたマルチ解像度並列設計を活用し,DVSSブロックに基づく高解像度ビジュアル状態空間モデル(HRVMamba)を導入する。このモデルは,プロセス全体を通して高解像度表現を保持しながら,効果的なマルチスケール特徴学習を促進する。大規模な実験では、HRVMambaの密な予測タスクにおける優れた性能が示され、余計な工夫なしに既存のベンチマークモデルと競合する結果を達成している。コードは https://github.com/zhanghao5201/HRVMamba で入手できる。

Recently, State Space Models (SSMs) with efficient hardware-aware designs, i.e., Mamba, have demonstrated significant potential in computer vision tasks due to their linear computational complexity with respect to token length and their global receptive field. However, Mamba's performance on dense prediction tasks, including human pose estimation and semantic segmentation, has been constrained by three key challenges: insufficient inductive bias, long-range forgetting, and low-resolution output representation. To address these challenges, we introduce the Dynamic Visual State Space (DVSS) block, which utilizes multi-scale convolutional kernels to extract local features across different scales and enhance inductive bias, and employs deformable convolution to mitigate the long-range forgetting problem while enabling adaptive spatial aggregation based on input and task-specific information. By leveraging the multi-resolution parallel design proposed in HRNet, we introduce High-Resolution Visual State Space Model (HRVMamba) based on the DVSS block, which preserves high-resolution representations throughout the entire process while promoting effective multi-scale feature learning. Extensive experiments highlight HRVMamba's impressive performance on dense prediction tasks, achieving competitive results against existing benchmark models without bells and whistles. Code is available at https://github.com/zhanghao5201/HRVMamba.

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# 事前学習型視覚言語(CLIP)モデルにおける対象幻覚の探索と緩和

Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models ( http://arxiv.org/abs/2410.03176v1 )

ライセンス: Link先を確認

Yufang Liu, Tao Ji, Changzhi Sun, Yuanbin Wu, Aimin Zhou,

(参考訳) LVLM(Large Vision-Language Models)は目覚ましい性能を達成しているが、これらのモデルにおける物体の幻覚に深刻な問題があることが研究で指摘されている。しかし、これらの幻覚の由来については明確な結論は出ていない。本稿では,CLIPモデルにおける物体幻覚問題に関する詳細な研究について述べる。孤立しても、CLIPモデルは対象の幻覚に傾向があり、幻覚問題は単に視覚と言語モダリティの相互作用によるものではないことを示唆する。そこで本研究では,種々の幻覚的問題を伴う負のサンプルを作成することで,対物的データ拡張手法を提案する。提案手法は,CLIPモデルのオブジェクト幻覚を効果的に緩和できることを示し,拡張されたモデルを視覚エンコーダとして使用することにより,LVLMにおけるオブジェクト幻覚の問題を効果的に緩和できることを示す。

Large Vision-Language Models (LVLMs) have achieved impressive performance, yet research has pointed out a serious issue with object hallucinations within these models. However, there is no clear conclusion as to which part of the model these hallucinations originate from. In this paper, we present an in-depth investigation into the object hallucination problem specifically within the CLIP model, which serves as the backbone for many state-of-the-art vision-language systems. We unveil that even in isolation, the CLIP model is prone to object hallucinations, suggesting that the hallucination problem is not solely due to the interaction between vision and language modalities. To address this, we propose a counterfactual data augmentation method by creating negative samples with a variety of hallucination issues. We demonstrate that our method can effectively mitigate object hallucinations for the CLIP model, and we show that the enhanced model can be employed as a visual encoder, effectively alleviating the object hallucination issue in LVLMs.

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# 宇宙粒子生成における動的カシミール効果とアナログ特性の基礎的課題

Foundational Issues in Dynamical Casimir Effect and Analogue Features in Cosmological Particle Creation ( http://arxiv.org/abs/2410.03179v1 )

ライセンス: Link先を確認

Jen-Tsung Hsiang, Bei-Lok Hu,

(参考訳) ブラックホールからのホーキング放射のアナログ源としての移動鏡は広く研究されてきたが、量子場ゆらぎのパラメトリック増幅機構に基づく動的カシミール効果(DCE)と宇宙論的粒子生成(CPC)の類似性も古くから知られているにもかかわらず、CPCとの関連はそれほど研究されていない。この「パースペクティブ」エッセイは、CPCの理論的基礎となる曲がった時空における場の量子論の厳密さと徹底性の一部を、様々な実験的探究が活発に行われているDCEに伝えることを目的としている。一見平坦な空間で実験室実験を行う場合になぜ「曲がった」時空を気にかけるべきなのかといった素朴な問題から、場の環境における有色雑音によって誘起されるシステム力学における非局所散逸の頻繁な出現、移動する原子・鏡あるいは源に対するDCE放出の反作用効果における量子レンツ則と揺動散逸関係の存在、鏡や媒質の動的応答を説明する微視的物理モデルの構築といった基礎的な理論的問題まで、関連する7つの課題を選択した。 DCEの理論的基盤の強化は、概念的明確性の向上だけでなく、DCEの将来の実験設計における概念実証の開発にも必要である。 DCE実験の結果は、アナログ重力の精神においてこれらの基本的な過程を検証するための最良の希望であるため、初期宇宙における量子場効果の理解を深めることになる。

Moving mirrors as analogue sources of Hawking radiation from black holes have been explored extensively, less so with cosmological particle creation (CPC), even though the analogy between dynamical Casimir effect (DCE) and CPC based on the mechanism of parametric amplification of quantum field fluctuations has also been known for a long time. This 'perspective' essay intends to convey some of the rigor and thoroughness of quantum field theory in curved spacetime, which serves as the theoretical foundation of CPC, to DCE, which enjoys a variety of active experimental explorations. We have selected seven issues of relevance to address, starting from the naively simple ones, e.g., why should one be bothered with 'curved' spacetime when performing a laboratory experiment in ostensibly flat space, to foundational theoretical ones, such as the frequent appearance of nonlocal dissipation in the system dynamics induced by colored noises in its field environment, the existence of quantum Lenz law and fluctuation-dissipation relations in the backreaction effects of DCE emission on the moving atom/mirror or the source, and the construction of a microphysics model to account for the dynamical responses of a mirror or medium. The strengthening of theoretical ground for DCE is useful not only for improving conceptual clarity but is also needed for the development of proof-of-concept designs for future DCE experiments. Results from DCE experiments in turn will enrich our understanding of quantum field effects in the early universe because they are, in the spirit of analogue gravity, our best hopes for the verification of these fundamental processes.

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# VDM-SLの仕様スライシング

Specification Slicing for VDM-SL ( http://arxiv.org/abs/2410.03180v1 )

ライセンス: Link先を確認

Tomohiro Oda, Han-Myung Chang,

(参考訳) 実行可能な仕様は、軽量なフォーマルなソフトウェア開発における強力なツールの1つです。 VDM-SLは命令文を通じて内部状態を参照して更新する操作の明示的で実行可能な定義を可能にする。 VDM-SLの広範な実行可能なサブセットは仕様段階での検証とテストを可能にするが、命令型プログラミングのように読み書きやデバッグが困難になる。本稿では,プログラムスライシングに基づくVDM-SLの仕様スライシングを定義する。そして、その応用を提示し、議論する。 VDM-SL のスライサは ViennaTalk で実装されており、ブラウザや VDM-SL 仕様を記述するデバッガで使用することができる。

The executable specification is one of the powerful tools in lightweight formal software development. VDM-SL allows the explicit and executable definition of operations that reference and update internal state through imperative statements. While the extensive executable subset of VDM-SL enables validation and testing in the specification phase, it also brings difficulties in reading and debugging as in imperative programming. In this paper, we define specification slicing for VDM-SL based on program slicing, a technique used for debugging and maintaining program source code in implementation languages. We then present and discuss its applications. The slicer for VDM-SL is implemented on ViennaTalk and can be used on browsers and debuggers describing the VDM-SL specification.

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# Kiss up, Kick down: ビジュアルペルソナを割り当てたマルチモーダル大規模言語モデルの振る舞い変化を探る

Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas ( http://arxiv.org/abs/2410.03181v1 )

ライセンス: Link先を確認

Seungjong Sun, Eungu Lee, Seo Yeon Baek, Seunghyun Hwang, Wonbyung Lee, Dongyan Nan, Bernard J. Jansen, Jang Hyun Kim,

(参考訳) 本研究は,マルチモーダル大規模言語モデル(LLM)が視覚的ペルソナに行動を合わせられるかどうかを初めて検討し,主にテキストに基づくペルソナに焦点を当てた文献における大きなギャップに対処する。我々は,LLMの視覚的ペルソナとして割り当てるための5K架空のアバター画像の新たなデータセットを開発し,これらの画像に表される視覚的特徴に基づいて,アグレッシブ性に着目して,それらの交渉行動を分析した。その結果,LLMは人間に類似した方法で画像の攻撃性を評価し,攻撃的な視覚的ペルソナを与えられるとより攻撃的な交渉行動を出力することがわかった。興味深いことに、LLMは、相手の画像が自分の画像より攻撃的でなく見える場合にはより攻撃的な交渉行動を示し、相手の画像がより攻撃的に見える場合にはより攻撃的でない行動を示した。

This study is the first to explore whether multi-modal large language models (LLMs) can align their behaviors with visual personas, addressing a significant gap in the literature that predominantly focuses on text-based personas. We developed a novel dataset of 5K fictional avatar images for assignment as visual personas to LLMs, and analyzed their negotiation behaviors based on the visual traits depicted in these images, with a particular focus on aggressiveness. The results indicate that LLMs assess the aggressiveness of images in a manner similar to humans and output more aggressive negotiation behaviors when prompted with an aggressive visual persona. Interestingly, the LLM exhibited more aggressive negotiation behaviors when the opponent's image appeared less aggressive than their own, and less aggressive behaviors when the opponent's image appeared more aggressive.

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# 辞書アシスタントとしての大型言語モデルを用いたバイリンガル例文の生成

Generating bilingual example sentences with large language models as lexicography assistants ( http://arxiv.org/abs/2410.03182v1 )

ライセンス: Link先を確認

Raphael Merx, Ekaterina Vylomova, Kemal Kurniawan,

(参考訳) 本稿では,フランス語(高資源),インドネシア語(中資源),テトゥン語(低資源)という資源レベルの異なる言語について,英語を対象言語とするバイリンガル辞書の例文の生成と評価におけるLLMの性能について述べる。 GDEX(Good Dictionary EXample)基準(典型性,情報性,理解しやすさ)に対するLLM生成例の品質評価を行った。この結果から,LLMは十分に良い辞書例を生成できるが,低リソース言語では性能が著しく低下することが明らかとなった。また,低いアノテータ間の合意率に反映されるように,例文の品質に対する人間の嗜好に大きなばらつきがあることも観察した。そこで本研究では,文脈内学習によってLLMを個々のアノテータの好みに合わせることができることを示す。さらに、例文の自動評価に事前訓練された言語モデルを用いることについて検討し、文のパープレキシティが高リソース言語における典型性と理解しやすさの優れたプロキシとなることを発見した。また,LLM生成文対に対する600件の評価からなる新たなデータセットも提供し,特に低リソース言語において,LLMが辞書編纂作業のコスト削減に寄与する可能性について考察した。

We present a study of LLMs' performance in generating and rating example sentences for bilingual dictionaries across languages with varying resource levels: French (high-resource), Indonesian (mid-resource), and Tetun (low-resource), with English as the target language. We evaluate the quality of LLM-generated examples against the GDEX (Good Dictionary EXample) criteria: typicality, informativeness, and intelligibility. Our findings reveal that while LLMs can generate reasonably good dictionary examples, their performance degrades significantly for lower-resourced languages. We also observe high variability in human preferences for example quality, reflected in low inter-annotator agreement rates. To address this, we demonstrate that in-context learning can successfully align LLMs with individual annotator preferences. Additionally, we explore the use of pre-trained language models for automated rating of examples, finding that sentence perplexity serves as a good proxy for typicality and intelligibility in higher-resourced languages. Our study also contributes a novel dataset of 600 ratings for LLM-generated sentence pairs, and provides insights into the potential of LLMs in reducing the cost of lexicographic work, particularly for low-resource languages.

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# EXAQ: LLMの高速化のための指数的アウェア量子化

EXAQ: Exponent Aware Quantization For LLMs Acceleration ( http://arxiv.org/abs/2410.03185v1 )

ライセンス: Link先を確認

Moran Shkolnik, Maxim Fishman, Brian Chmiel, Hilla Ben-Yaacov, Ron Banner, Kfir Yehuda Levy,

(参考訳) 量子化は、LLM(Large Language Models)推論に関連する計算と記憶のコストを削減するための主要なアプローチとして確立されている。現在の研究の大半は、重みとアクティベーションの量子化に重点を置いており、低ビットの汎用行列乗算(GEMM)を可能にする一方、残りの非線形演算は高い精度で実行される。本研究では, これらの手法の適用後, LLMの推論における主要なボトルネックがソフトマックス層にあることを発見した。ソフトマックス演算は, 指数計算, 累積, 正規化の3段階からなり，本研究では最初の2段階の最適化に焦点を当てる。ソフトマックス関数への入力に対して最適なクリッピング値を決定するための解析的手法を提案し，LLM推論における4ビット未満の量子化を可能にする。この方法では、精度の劣化をほとんど，あるいは全く伴わずに$e^x$と$\sum(e^x)$の両方の計算を高速化する。例えば、LLaMA1-30Bでは、よく知られた"Physical Interaction: Question Answering"(PIQA)データセットでの評価において、2ビット量子化でベースライン性能を達成する。この超低ビット量子化は、累積段階において初めて約4倍の加速を可能にする。 $e^x$と$\sum(e^x)$の両方を加速させることで、ソフトマックス演算の36.9%の加速が得られる。

Quantization has established itself as the primary approach for decreasing the computational and storage expenses associated with Large Language Model (LLM) inference. The majority of current research emphasizes quantizing weights and activations to enable low-bit general-matrix-multiply (GEMM) operations, with the remaining non-linear operations executed at higher precision. In our study, we discovered that following the application of these techniques, the primary bottleneck in LLM inference lies in the softmax layer. The softmax operation comprises three phases: exponent calculation, accumulation, and normalization. Our work focuses on optimizing the first two phases. We propose an analytical approach to determine the optimal clipping value for the input to the softmax function, enabling sub-4-bit quantization for LLM inference. This method accelerates the calculations of both $e^x$ and $\sum(e^x)$ with minimal to no accuracy degradation. For example, in LLaMA1-30B, we achieve baseline performance with 2-bit quantization on the well-known "Physical Interaction: Question Answering" (PIQA) dataset evaluation. This ultra-low bit quantization allows, for the first time, an acceleration of approximately 4x in the accumulation phase. The combination of accelerating both $e^x$ and $\sum(e^x)$ results in a 36.9% acceleration in the softmax operation.
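
The three softmax phases named above, and the role a clipping value plays in low-bit quantization of the softmax input, can be made concrete with a short sketch. This is a rough illustration only: the function name `quantized_softmax`, the uniform quantization grid, the lookup table, and the clipping value of -4.0 are assumptions chosen for this example, not the analytically derived clipping value or the kernels from the paper.

```python
import math

def quantized_softmax(logits, clip=-4.0, bits=2):
    """Softmax whose shifted inputs are clipped to [clip, 0] and rounded to
    2**bits levels, so e^x can be read from a small precomputed table."""
    levels = 2 ** bits
    step = -clip / (levels - 1)  # spacing of the grid on [clip, 0]
    lut = [math.exp(-k * step) for k in range(levels)]  # e^x for each grid point
    m = max(logits)
    # Shift so the maximum is 0, clip from below, round to the nearest grid index.
    idx = [round(-max(x - m, clip) / step) for x in logits]
    exps = [lut[k] for k in idx]        # phase 1: exponent calculation (table lookup)
    total = sum(exps)                   # phase 2: accumulation
    return [e / total for e in exps]    # phase 3: normalization

probs = quantized_softmax([1.0, 2.0, 10.0])
```

With only four possible exponent values, the accumulation phase could in principle count occurrences of each level instead of summing floats, which gives some intuition for why the first two phases are the ones that benefit from a low-bit representation.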

翻訳日:2024-11-03 03:14:31 公開日:2024-10-04

# 糖尿病網膜症分類における概念記述法の検討

Looking into Concept Explanation Methods for Diabetic Retinopathy Classification ( http://arxiv.org/abs/2410.03188v1 )

ライセンス: Link先を確認

Andrea M. Storås, Josefine V. Sundgaard,

(参考訳) 糖尿病網膜症は糖尿病の一般的な合併症であり,眼底画像を用いた網膜異常の進行のモニタリングが重要である。画像は医療専門家によって解釈されなければならないため、糖尿病網膜症のために糖尿病患者全員をスクリーニングすることは不可能である。深層学習は、眼底画像の自動解析とグルーピングの素晴らしい結果を示している。しかし、1つの欠点は、解釈可能性の欠如であり、クリニックにおけるそのようなシステムの実装を妨げている。説明可能な人工知能手法は、ディープニューラルネットワークを説明するために応用できる。概念に基づく説明は人間の理解には直感的であるが、糖尿病網膜症のグレーディングについては詳細は明らかにされていない。本研究は、糖尿病網膜症の自動診断のために開発されたディープニューラルネットワークを説明するための概念に基づく2つの説明手法について検討・比較する。いずれの方法にも長所と短所があることに気付き、メソッドの選択は利用可能なデータとエンドユーザの好みを考慮に入れなければなりません。

Diabetic retinopathy is a common complication of diabetes, and monitoring the progression of retinal abnormalities using fundus imaging is crucial. Because the images must be interpreted by a medical expert, it is infeasible to screen all individuals with diabetes for diabetic retinopathy. Deep learning has shown impressive results for automatic analysis and grading of fundus images. One drawback is, however, the lack of interpretability, which hampers the implementation of such systems in the clinic. Explainable artificial intelligence methods can be applied to explain the deep neural networks. Explanations based on concepts have been shown to be intuitive for humans to understand, but have not yet been explored in detail for diabetic retinopathy grading. This work investigates and compares two concept-based explanation techniques for explaining deep neural networks developed for automatic diagnosis of diabetic retinopathy: Quantitative Testing with Concept Activation Vectors and Concept Bottleneck Models. We found that both methods have strengths and weaknesses, and the choice of method should take the available data and the end user's preferences into account.

翻訳日:2024-11-03 03:04:25 公開日:2024-10-04

# ペアワイズサンプル最適化を用いた時間ステップ拡散モデルのチューニング

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization ( http://arxiv.org/abs/2410.03190v1 )

ライセンス: Link先を確認

Zichen Miao, Zhengyuan Yang, Kevin Lin, Ze Wang, Zicheng Liu, Lijuan Wang, Qiang Qiu,

(参考訳) 近年の時間ステップ蒸留拡散モデルの進歩により、非蒸留多段階モデルに匹敵する高品質な画像生成が、大幅に少ない推論ステップで可能になった。このようなモデルは、低推論コストと遅延のためにアプリケーションにとって魅力的であるが、単純な拡散目標でそれらを微調整すると、劣化し、ぼやけた出力が得られる。直感的な代替手段は、微調整された教師モデルで拡散蒸留を繰り返すことであり、良好な結果を生み出すが、煩雑で計算集約的である。蒸留のトレーニングは通常、特定の画像スタイルへの微調整に比べて桁違いに多くの計算量を必要とする。本稿では,任意の時間ステップ蒸留拡散モデルを直接微調整できるペアワイズサンプル最適化(PSO)アルゴリズムを提案する。 PSOは、現在の時間ステップ蒸留モデルからサンプリングされた追加の参照画像を導入し、トレーニング画像と参照画像との相対的な尤度マージンを増大させる。これにより、モデルは出力分布を微調整しながら、数ステップの生成能力を維持できる。また、PSOは、オフラインサンプリングとオンラインサンプリングの両方のペアワイズデータに柔軟に拡張できる一般化された定式化であり、拡散モデル選好最適化の様々な一般的な目的をカバーできることを示した。我々は、選好最適化と、スタイル転送やコンセプトのカスタマイズなど、その他の微調整タスクにおいてPSOを評価する。 PSOは、オフラインとオンラインのペアワイズ選好画像データの両方を用いて、蒸留モデルを人間が好む生成に直接適応させることができることを示す。 PSOはまた、時間ステップ蒸留拡散モデルを直接チューニングすることで、スタイル転送と概念カスタマイズの有効性を示す。

Recent advancements in timestep-distilled diffusion models have enabled high-quality image generation that rivals non-distilled multi-step models, but with significantly fewer inference steps. While such models are attractive for applications due to the low inference cost and latency, fine-tuning them with a naive diffusion objective would result in degraded and blurry outputs. An intuitive alternative is to repeat the diffusion distillation process with a fine-tuned teacher model, which produces good results but is cumbersome and computationally intensive; the distillation training usually requires orders of magnitude more training compute than fine-tuning for specific image styles. In this paper, we present an algorithm named pairwise sample optimization (PSO), which enables the direct fine-tuning of an arbitrary timestep-distilled diffusion model. PSO introduces additional reference images sampled from the current timestep-distilled model, and increases the relative likelihood margin between the training images and reference images. This enables the model to retain its few-step generation ability, while allowing for fine-tuning of its output distribution. We also demonstrate that PSO is a generalized formulation which can be flexibly extended to both offline-sampled and online-sampled pairwise data, covering various popular objectives for diffusion model preference optimization. We evaluate PSO in both preference optimization and other fine-tuning tasks, including style transfer and concept customization. We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data. PSO also demonstrates effectiveness in style transfer and concept customization by directly tuning timestep-distilled diffusion models.

翻訳日:2024-11-03 03:04:25 公開日:2024-10-04

# MultiVerse: 効率的かつ表現力のあるマルチタスクテキスト音声合成

MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech ( http://arxiv.org/abs/2410.03192v1 )

ライセンス: Link先を確認

Taejun Bak, Youngsik Eom, SeungJae Choi, Young-Sun Joo,

(参考訳) 訓練データ量をスケールアップするテキスト音声合成システム(TTS)は、ゼロショット音声合成において大幅に改善されている。しかし、これらのシステムには一定の制限があり、大量のトレーニングデータが必要でコストが増大するうえ、しばしばプロソディの類似性を見落としている。これらの問題に対処するために、ゼロショットおよび言語横断の条件でTTSまたは音声スタイル変換を実行できるゼロショットマルチタスクTTSシステムであるMultiVerseを提案する。 MultiVerseは、従来のデータ駆動型アプローチよりも、必要なトレーニングデータがはるかに少ない。限られたデータであってもゼロショット性能を確保するために,フィルタ関連およびソース関連表現をモデル化するためのプロンプトを利用して,ソースフィルタ理論に基づくアンタングルメントを利用する。さらに,プロソディの類似性をさらに向上するため,プロンプトベースの自己回帰的手法と非自己回帰的手法を組み合わせたプロソディ・モデリング手法を採用した。評価の結果,MultiVerseは優れたゼロショットマルチタスクTTS性能を示し,はるかに少ないデータでデータ駆動型TTSシステムに匹敵するゼロショットTTS性能を達成できるだけでなく,同じ少量のデータで訓練された他のゼロショットTTSシステムよりも大幅に優れることが示された。特に,提案するプロソディ・モデリング技術は,与えられたプロンプトと高いプロソディ類似性を持つ音声を生成するMultiVerseの能力に大きく寄与する。私たちのサンプルは https://nc-ai.github.io/speech/publications/multiverse/index.html で公開されています。

Text-to-speech (TTS) systems that scale up the amount of training data have achieved significant improvements in zero-shot speech synthesis. However, these systems have certain limitations: they require a large amount of training data, which increases costs, and often overlook prosody similarity. To address these issues, we propose MultiVerse, a zero-shot multi-task TTS system that is able to perform TTS or speech style transfer in zero-shot and cross-lingual conditions. MultiVerse requires much less training data than traditional data-driven approaches. To ensure zero-shot performance even with limited data, we leverage source-filter theory-based disentanglement, utilizing the prompt for modeling filter-related and source-related representations. Additionally, to further enhance prosody similarity, we adopt a prosody modeling approach combining prompt-based autoregressive and non-autoregressive methods. Evaluations demonstrate the remarkable zero-shot multi-task TTS performance of MultiVerse and show that MultiVerse not only achieves zero-shot TTS performance comparable to data-driven TTS systems with much less data, but also significantly outperforms other zero-shot TTS systems trained with the same small amount of data. In particular, our novel prosody modeling technique significantly contributes to MultiVerse's ability to generate speech with high prosody similarity to the given prompts. Our samples are available at https://nc-ai.github.io/speech/publications/multiverse/index.html

翻訳日:2024-11-03 03:04:25 公開日:2024-10-04

# マスク言語モデルを用いた並列コーパス拡張

Parallel Corpus Augmentation using Masked Language Models ( http://arxiv.org/abs/2410.03194v1 )

ライセンス: Link先を確認

Vibhuti Kumari, Narayana Murthy Kavi,

(参考訳) 本稿では, 良質なテキストコーパスを並列テキストコーパスに拡張する手法を提案し, 得られたシードコーパスよりも多くの折りたたみ式コーパスを生成できることを示した。追加の単言語コーパスは不要である。我々は、多言語マスク言語モデルを用いて、文脈における代替単語のマスキングと予測を行い、文の組込みを用いて、互いに翻訳される可能性のある文対をチェックし、選択する。 MT品質評価のための指標を用いて手法を横断的に検証する。本手法は,適切なシードコーパスが利用できるすべての言語ペアにおいて,データ不足の問題を大幅に軽減できると考えている。

In this paper, we propose a novel method of augmenting parallel text corpora which promises good quality and is also capable of producing corpora many times larger than the seed corpus we start with. We do not need any additional monolingual corpora. We use a Multi-Lingual Masked Language Model to mask and predict alternative words in context, and we use Sentence Embeddings to check and select sentence pairs which are likely to be translations of each other. We cross-check our method using metrics for MT Quality Estimation. We believe this method can greatly alleviate the data scarcity problem for all language pairs for which a reasonable seed corpus is available.

翻訳日:2024-11-03 03:04:24 公開日:2024-10-04

# 大規模社会技術システムの要求工学における市民プラットフォームの可能性

The Potential of Citizen Platforms for Requirements Engineering of Large Socio-Technical Software Systems ( http://arxiv.org/abs/2410.03195v1 )

ライセンス: Link先を確認

Jukka Ruohonen, Kalle Hjerppe,

(参考訳) 参加型市民プラットフォーム(Participatory citizen platform)は、政策立案と熟考型民主主義に市民をデジタル的により深く関与させる革新的なソリューションである。これらのプラットフォームはエンジニアリングの文脈でも使用されているが、これまでのところ、プラットフォームと要求工学を結びつけるための作業は行われていない。本稿ではこの顕著なギャップを埋める。要件工学とともにプラットフォームについて議論することに加えて、この論文は潜在的な利点とデメリットを詳述し、ソフトウェア工学の文脈における将来のパイロット研究の道を開く。これらの工学的特徴により、この論文は、その実装とガバナンスを含む公共部門における大規模社会技術ソフトウェアシステムの研究にも貢献する。

Participatory citizen platforms are innovative solutions to digitally better engage citizens in policy-making and deliberative democracy in general. Although these platforms have been used also in an engineering context, thus far, there is no existing work for connecting the platforms to requirements engineering. The present paper fills this notable gap. In addition to discussing the platforms in conjunction with requirements engineering, the paper elaborates potential advantages and disadvantages, thus paving the way for a future pilot study in a software engineering context. With these engineering tenets, the paper also contributes to the research of large socio-technical software systems in a public sector context, including their implementation and governance.

翻訳日:2024-11-03 03:04:24 公開日:2024-10-04

# 対象言語における疑問構造の学習による自動質問生成のための言語間転移

Cross-lingual Transfer for Automatic Question Generation by Learning Interrogative Structures in Target Languages ( http://arxiv.org/abs/2410.03197v1 )

ライセンス: Link先を確認

Seonjeong Hwang, Yunsu Kim, Gary Geunbae Lee,

(参考訳) 自動質問生成(QG)は、質問応答(QA)コーパスの拡張、チャットボットシステムの強化、教育教材の開発など、幅広い目的に役立つ。その重要性にもかかわらず、既存のデータセットのほとんどは英語に偏っており、その結果、他の言語で利用可能なデータにはかなりの差がある。QGのための言語間転移(XLT-QG)は、高リソース言語のデータセットでトレーニングされたモデルが低リソース言語で質問を生成できるようにすることで、この制限に対処する。本稿では,小規模言語モデルを用い,対象言語の単言語データ,並列データ,ラベル付きデータを必要としない,単純かつ効率的なXLT-QG手法を提案する。我々のモデルは、英語のQAデータセットのみで訓練され、限られた質問例から疑問構造を学習し、それを適用して対象言語で質問を生成する。実験の結果,提案手法は複数のXLT-QGベースラインを上回り,様々な言語でGPT-3.5-turboに匹敵する性能を達成した。さらに,本モデルが生成した合成データは,多言語QAモデルの学習に有用であることを示す。大規模言語モデルよりもパラメータが大幅に少なく、対象言語に対する追加のトレーニングを必要としないため、本手法は様々な言語でのQGおよびQAタスクに有効なソリューションを提供する。

Automatic question generation (QG) serves a wide range of purposes, such as augmenting question-answering (QA) corpora, enhancing chatbot systems, and developing educational materials. Despite its importance, most existing datasets predominantly focus on English, resulting in a considerable gap in data availability for other languages. Cross-lingual transfer for QG (XLT-QG) addresses this limitation by allowing models trained on high-resource language datasets to generate questions in low-resource languages. In this paper, we propose a simple and efficient XLT-QG method that operates without the need for monolingual, parallel, or labeled data in the target language, utilizing a small language model. Our model, trained solely on English QA datasets, learns interrogative structures from a limited set of question exemplars, which are then applied to generate questions in the target language. Experimental results show that our method outperforms several XLT-QG baselines and achieves performance comparable to GPT-3.5-turbo across different languages. Additionally, the synthetic data generated by our model proves beneficial for training multilingual QA models. With significantly fewer parameters than large language models and without requiring additional training for target languages, our approach offers an effective solution for QG and QA tasks across various languages.

翻訳日:2024-11-03 03:04:24 公開日:2024-10-04

# PersoBench: 大規模言語モデルにおけるパーソナライズされた応答生成のベンチマーク

PersoBench: Benchmarking Personalized Response Generation in Large Language Models ( http://arxiv.org/abs/2410.03198v1 )

ライセンス: Link先を確認

Saleh Afzoon, Usman Naseem, Amin Beheshti, Zahra Jamali,

(参考訳) 大規模言語モデル(LLM)は印象的な会話能力を示しているが、パーソナライズされた応答を提供する能力は依然として明らかでない。近年のベンチマークは、LLMに基づく判定を用いてロールプレイングの文脈におけるペルソナの一貫性を自動的に評価しているが、応答生成におけるパーソナライゼーションの評価は十分に研究されていない。このギャップに対処するため、ゼロショット設定でのペルソナを考慮した対話生成におけるLLMのパーソナライズ能力を評価する新しいベンチマークPersoBenchを提案する。我々は、よく知られたデータセットと様々なメトリクスを用いて、3つのオープンソースLLMと3つのクローズドソースLLMの性能を評価する。3つのよく知られたペルソナ対応データセットを用いた分析では、標準プロンプティングとchain-of-thoughtプロンプティングの両方について、流暢さ、多様性、コヒーレンス、パーソナライゼーションを含む応答品質の複数の側面を評価する。その結果,LLMは流暢で多様な応答の生成には優れるが,会話コンテキストと与えられたペルソナの両方を考慮した、パーソナライズされたコヒーレントな応答の提供は、満足のいく水準には程遠いことが明らかとなった。ベンチマーク実装は https://github.com/salehafzoon/PersoBench で公開している。

While large language models (LLMs) have exhibited impressive conversational capabilities, their proficiency in delivering personalized responses remains unclear. Although recent benchmarks automatically evaluate persona consistency in role-playing contexts using LLM-based judgment, the evaluation of personalization in response generation remains underexplored. To address this gap, we present a new benchmark, PersoBench, to evaluate the personalization ability of LLMs in persona-aware dialogue generation within a zero-shot setting. We assess the performance of three open-source and three closed-source LLMs using well-known datasets and a range of metrics. Our analysis, conducted on three well-known persona-aware datasets, evaluates multiple dimensions of response quality, including fluency, diversity, coherence, and personalization, across both standard and chain-of-thought prompting methods. Our findings reveal that while LLMs excel at generating fluent and diverse responses, they are far from satisfactory in delivering personalized and coherent responses considering both the conversation context and the provided personas. Our benchmark implementation is available at https://github.com/salehafzoon/PersoBench.

翻訳日:2024-11-03 03:04:24 公開日:2024-10-04

# サイバー物理システムのためのテストジェネレータの学習

Learning test generators for cyber-physical systems ( http://arxiv.org/abs/2410.03202v1 )

ライセンス: Link先を確認

Jarkko Peltomäki, Ivan Porres,

(参考訳) サイバー物理システムに対するブラックボックス実行時検証手法は、入力と出力が時間上の信号として表現され、正しさの要件が時相論理で規定されるシステムのエラーを発見するために用いられる。要件の反証(falsification)などの既存手法は、システムの正しさに対する反例となる単一の入力を見つけることに集中することが多い。本稿では,単一の要件に対して多数かつ多様な反例を生成できるテストジェネレータの作成方法について検討する。複数の反例は、異なる入力条件下でのシステム障害を明らかにし、障害の根本原因分析を支援する。本稿では,そのようなテストジェネレータを自動的に作成するWOGANアルゴリズムを提案する。このアルゴリズムは、反例の集合上の一様分布を目標分布としてモデル化するWasserstein敵対的生成ネットワークを反復的に訓練することによって機能する。WOGANは、実行時検証のためのテストジェネレータとして機能する生成モデルを訓練するアルゴリズムである。トレーニングは、事前のモデルやデータセットを必要とせずにオンラインで実行される。また,このようなテストジェネレータの評価基準も提案する。我々は、ARCH-COMPの反証ベンチマークを含む、よく知られたいくつかの問題で訓練済みジェネレータを評価した。実験結果から,WOGANアルゴリズムで訓練されたジェネレータは,最先端の要件反証アルゴリズムと同程度に有効でありながら,一様ランダムサンプリングによるサンプルと同程度に多様なテストを生成することが示唆された。我々は、WOGANはテストジェネレータを自動的に作成するための実用的な方法であり、これらのテストジェネレータはサイバー物理システムの実行時検証のために多数かつ多様な反例を生成できると結論付けた。

Black-box runtime verification methods for cyber-physical systems can be used to discover errors in systems whose inputs and outputs are expressed as signals over time and their correctness requirements are specified in a temporal logic. Existing methods, such as requirement falsification, often focus on finding a single input that is a counterexample to system correctness. In this paper, we study how to create test generators that can produce multiple and diverse counterexamples for a single requirement. Several counterexamples expose system failures in varying input conditions and support the root cause analysis of the faults. We present the WOGAN algorithm to create such test generators automatically. The algorithm works by training iteratively a Wasserstein generative adversarial network that models the target distribution of the uniform distribution on the set of counterexamples. WOGAN is an algorithm that trains generative models that act as test generators for runtime verification. The training is performed online without the need for a previous model or dataset. We also propose criteria to evaluate such test generators. We evaluate the trained generators on several well-known problems including the ARCH-COMP falsification benchmarks. Our experimental results indicate that generators trained by the WOGAN algorithm are as effective as state-of-the-art requirement falsification algorithms while producing tests that are as diverse as a sample from uniform random sampling. We conclude that WOGAN is a viable method to produce test generators automatically and that these test generators can generate multiple and diverse counterexamples for the runtime verification of cyber-physical systems.

翻訳日:2024-11-03 03:04:24 公開日:2024-10-04

# 一階述語論理への翻訳による意味構造の学習

Learning Semantic Structure through First-Order-Logic Translation ( http://arxiv.org/abs/2410.03203v1 )

ライセンス: Link先を確認

Akshay Chaturvedi, Nicholas Asher,

(参考訳) 本論文では,トランスフォーマーに基づく言語モデルが,単純な文から述語項構造を抽出できるかどうかを考察する。まず、どの述語がどの対象に当てはまるかを言語モデルが混同することがあることを示す。これを軽減するために,質問応答(Q/A)と一階述語論理(FOL)翻訳という2つのタスクと,プロンプティングと微調整という2つの設定を検討する。FOL翻訳では、一般化能力を評価するために設計された合成データセット上で、いくつかの大規模言語モデルを微調整する。Q/Aでは、BERTやRoBERTaのようなエンコーダモデルを微調整し、LLMにはプロンプティングを用いる。その結果,LLMによるFOL翻訳は述語項構造の学習により適していることがわかった。

In this paper, we study whether transformer-based language models can extract predicate argument structure from simple sentences. We firstly show that language models sometimes confuse which predicates apply to which objects. To mitigate this, we explore two tasks: question answering (Q/A), and first order logic (FOL) translation, and two regimes, prompting and finetuning. In FOL translation, we finetune several large language models on synthetic datasets designed to gauge their generalization abilities. For Q/A, we finetune encoder models like BERT and RoBERTa and use prompting for LLMs. The results show that FOL translation for LLMs is better suited to learn predicate argument structure.

翻訳日:2024-11-03 03:04:24 公開日:2024-10-04

# メタヒューリスティックアルゴリズムの設計・実験と実世界の最適化問題への応用に関するチュートリアル

A Tutorial on the Design, Experimentation and Application of Metaheuristic Algorithms to Real-World Optimization Problems ( http://arxiv.org/abs/2410.03205v1 )

ライセンス: Link先を確認

Eneko Osaba, Esther Villar-Rodriguez, Javier Del Ser, Antonio J. Nebro, Daniel Molina, Antonio LaTorre, Ponnuthurai N. Suganthan, Carlos A. Coello Coello, Francisco Herrera,

(参考訳) ここ数年、実世界の最適化問題の定式化とメタヒューリスティックアルゴリズムによるその効率的な解法は、数多くの研究の触媒となっている。メタヒューリスティックの設計と使用に関する数十年の歴史的進歩にもかかわらず、新しい技術成果の理解可能性、アルゴリズム設計の正しさ、性能の検証可能性に関しては、依然として大きな困難が残っている。その明確な例が、最適化に用いられるメタヒューリスティックを扱う研究の再現性の乏しさであり、再現すべき手法の提示における曖昧さや詳細の欠如のために、再現がしばしば不可能となっている。さらに、多くの場合、報告された結果の統計的有意性には疑問がある。本研究は、科学的な厳密さ、価値、透明性を確保するために、最適化に用いられるメタヒューリスティック手法の研究を行う際に採用すべき良いプラクティスの提案を読者に提供することを目的としている。この目的のために、我々は、この科学分野に取り組む際に従うべきすべての研究フェーズをカバーするステップバイステップの方法論を紹介する。具体的には、問題の定式化、解の符号化、探索演算子の実装、評価指標、実験の設計、実世界での性能に関する考慮事項などについて、しばしば見過ごされるが重要な側面と有用な推奨事項を論じる。最後に、新しく開発された最適化メタヒューリスティックが実世界のアプリケーション環境での展開と運用において成功するための重要な考慮事項、課題、研究の方向性について概説する。

In the last few years, the formulation of real-world optimization problems and their efficient solution via metaheuristic algorithms has been a catalyst for a myriad of research studies. In spite of decades of historical advancements on the design and use of metaheuristics, large difficulties still remain in regards to the understandability, algorithmic design uprightness, and performance verifiability of new technical achievements. A clear example stems from the scarce replicability of works dealing with metaheuristics used for optimization, which is often infeasible due to ambiguity and lack of detail in the presentation of the methods to be reproduced. Additionally, in many cases, there is a questionable statistical significance of their reported results. This work aims at providing the audience with a proposal of good practices which should be embraced when conducting studies about metaheuristics methods used for optimization in order to provide scientific rigor, value and transparency. To this end, we introduce a step by step methodology covering every research phase that should be followed when addressing this scientific field. Specifically, frequently overlooked yet crucial aspects and useful recommendations will be discussed in regards to the formulation of the problem, solution encoding, implementation of search operators, evaluation metrics, design of experiments, and considerations for real-world performance, among others. Finally, we will outline important considerations, challenges, and research directions for the success of newly developed optimization metaheuristics in their deployment and operation over real-world application environments.

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# SPHINX:ハイパーグラフ推論ネットワークを用いた構造予測

SPHINX: Structural Prediction using Hypergraph Inference Network ( http://arxiv.org/abs/2410.03208v1 )

ライセンス: Link先を確認

Iulia Duta, Pietro Liò,

(参考訳) 高次関係の重要性は、多くの現実世界のシステムにおいて広く認識されている。しかし、それらに注釈を付けるのは退屈で、時には不可能な作業である。その結果、データモデリングの現在のアプローチは、高次相互作用を完全に無視するか、あるいはペアワイズの接続に単純化している。ハイパーグラフ構造が利用できない場合でも高次処理を可能にするため、最終的なノードレベルの信号のみから、教師なしの方法で潜在ハイパーグラフ構造を推論することを学習するモデル、Structural Prediction using Hypergraph Inference Network(SPHINX)を導入する。このモデルは、各ハイパーエッジについてノード上の確率分布を逐次的に予測する、ソフトで微分可能なクラスタリング法と、それらを明示的なハイパーグラフ構造に変換するサンプリングアルゴリズムから構成される。近年のk-サブセットサンプリングの進歩が、離散的なハイパーグラフ構造を生成するのに適したツールであり、先行研究に見られたトレーニングの不安定性の一部に対処できることを示す。結果として得られるモデルは、最新の任意のハイパーグラフニューラルネットワークに必要な高次構造を生成でき、注釈付けが難しいドメインでの高次相互作用の捕捉を容易にする。軌道予測のための2つの難しいデータセットで行った広範なアブレーション研究と実験を通じて、我々のモデルが、解釈可能で最終的な性能を高める適切な潜在ハイパーグラフを推論できることを実証する。

The importance of higher-order relations is widely recognized in a large number of real-world systems. However, annotating them is a tedious and sometimes impossible task. Consequently, current approaches for data modelling either ignore the higher-order interactions altogether or simplify them into pairwise connections. In order to facilitate higher-order processing, even when a hypergraph structure is not available, we introduce Structural Prediction using Hypergraph Inference Network (SPHINX), a model that learns to infer a latent hypergraph structure in an unsupervised way, solely from the final node-level signal. The model consists of a soft, differentiable clustering method used to sequentially predict, for each hyperedge, the probability distribution over the nodes and a sampling algorithm that converts them into an explicit hypergraph structure. We show that the recent advancement in k-subset sampling represents a suitable tool for producing discrete hypergraph structures, addressing some of the training instabilities exhibited by prior works. The resulting model can generate the higher-order structure necessary for any modern hypergraph neural network, facilitating the capture of higher-order interaction in domains where annotating them is difficult. Through extensive ablation studies and experiments conducted on two challenging datasets for trajectory prediction, we demonstrate that our model is capable of inferring suitable latent hypergraphs, that are interpretable and enhance the final performance.
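The sampling step described above, turning per-hyperedge node probabilities into a discrete set of nodes, can be illustrated with a small sketch. This is not the SPHINX implementation: it uses the plain (non-differentiable) Gumbel top-k trick as a stand-in for the differentiable k-subset samplers the abstract refers to, and the names `gumbel_top_k`, `sample_hypergraph`, and the toy `edge_probs` matrix are hypothetical.

```python
import math
import random


def gumbel_top_k(probs, k, rng):
    """Draw k distinct node indices without replacement (Gumbel top-k trick).

    Assumption: `probs` holds non-negative per-node scores for one hyperedge.
    """
    keys = []
    for node, p in enumerate(probs):
        u = max(rng.random(), 1e-12)          # keep u strictly positive
        gumbel = -math.log(-math.log(u))      # standard Gumbel noise
        log_p = math.log(p) if p > 0 else float("-inf")
        keys.append((log_p + gumbel, node))
    keys.sort(reverse=True)                   # largest perturbed score first
    return sorted(node for _, node in keys[:k])


def sample_hypergraph(edge_probs, k, seed=0):
    """Convert one probability row per hyperedge into explicit node sets."""
    rng = random.Random(seed)
    return [gumbel_top_k(row, k, rng) for row in edge_probs]


# Toy input: 2 hyperedges over 4 nodes (hypothetical values).
edge_probs = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.1, 0.6, 0.3],
]
hyperedges = sample_hypergraph(edge_probs, k=2)
```

Each returned list is one hyperedge with exactly `k` member nodes; nodes with zero probability are only selected if fewer than `k` nodes have positive probability.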

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# Tadashi: 保証された正確さでAIベースの自動コード生成を実現する

Tadashi: Enabling AI-Based Automated Code Generation With Guaranteed Correctness ( http://arxiv.org/abs/2410.03210v1 )

ライセンス: Link先を確認

Emil Vatai, Aleksandr Drozd, Ivan R. Ivanov, Yinghao Ren, Mohamed Wahib,

(参考訳) コードを自動生成するフレームワークやDSLは、伝統的に、適用されるコード変換の合法性を保証する厳密な手法を整備することを、それらを開発する人間の専門家に依存してきた。機械学習(ML)は、ハードウェアターゲットに最適化されたコードを自動生成する手段として広く採用されつつある。しかし、MLソリューション、特にブラックボックスのDNNは、合法性に関するそのような保証を提供しない。本稿では,多面体モデルを活用し,コード生成へのML適用に不可欠なデータセットをキュレートしようとする研究者を支援するライブラリTadashiを提案する。Tadashiは、ベースラインの参照コードに適用される多面体スケジュール上で、候補変換の合法性を確実かつ実用的にチェックする機能を提供する。本ライブラリが生成された変換の合法性を保証することの証明を与え、その実用上のコストが軽量であることを実証する。Tadashiは https://github.com/vatai/tadashi/ で入手できる。

Frameworks and DSLs auto-generating code have traditionally relied on human experts developing them to have in place rigorous methods to assure the legality of the applied code transformations. Machine Learning (ML) is gaining wider adoption as a means to auto-generate code optimised for the hardware target. However, ML solutions, and in particular black-box DNNs, provide no such guarantees on legality. In this paper we propose a library, Tadashi, which leverages the polyhedral model to empower researchers seeking to curate datasets crucial for applying ML in code-generation. Tadashi provides the ability to reliably and practically check the legality of candidate transformations on polyhedral schedules applied on a baseline reference code. We provide a proof that our library guarantees the legality of generated transformations, and demonstrate its lightweight practical cost. Tadashi is available at https://github.com/vatai/tadashi/.

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# CUDLE:未管理環境における大麻検出のためのラベルスカルシティ下での学習

CUDLE: Learning Under Label Scarcity to Detect Cannabis Use in Uncontrolled Environments ( http://arxiv.org/abs/2410.03211v1 )

ライセンス: Link先を確認

Reza Rahimi Azghan, Nicholas C. Glodosky, Ramesh Kumar Sah, Carrie Cuttler, Ryan McLaughlin, Michael J. Cleveland, Hassan Ghasemzadeh,

(参考訳) ウェアラブルセンサーシステムは、行動介入を支援するために生理的健康状態をリアルタイムかつ客観的に監視できる大きな可能性を示している。しかし、人による監督が限られ、患者による自己ラベル付けに依存するため、自由生活環境で正確なラベルを取得することは依然として難しく、データ収集と教師付き学習は特に困難である。この問題に対処するため、我々はCUDLE(Cannabis Use Detection with Label Efficiency)を導入する。これは、実世界のウェアラブルセンサーデータによる自己教師型学習を活用して、自由生活環境における大麻消費の自動検出という差し迫った医療上の課題に取り組む新しいフレームワークである。CUDLEは、対照学習フレームワークを通じて、センサー由来のデータから大麻の消費モーメントを特定する。まず、データ拡張を伴う自己教師付きプレテキストタスクを通じて、堅牢な表現を学習する。次に、これらの表現を浅い分類器を用いた下流タスクで微調整することで、CUDLEは、特にラベル付きデータが限られている場合に、従来の教師付き手法を上回る。アプローチを評価するため,大麻使用者20名を対象に臨床研究を実施し,EMA(Ecological Momentary Assessment)手法によって使用者が報告した大麻使用モーメントとともに、500時間以上のウェアラブルセンサーデータを収集した。収集したデータを用いた広範な分析により,CUDLEは73.4%の精度を達成し、教師付きアプローチの71.1%を上回ること、またラベル数が減少するにつれて性能差が拡大することが示された。特に、CUDLEは、75%少ないラベルで教師付きモデルを上回るだけでなく、はるかに少ない被験者数でピーク性能に達する。

Wearable sensor systems have demonstrated a great potential for real-time, objective monitoring of physiological health to support behavioral interventions. However, obtaining accurate labels in free-living environments remains difficult due to limited human supervision and the reliance on self-labeling by patients, making data collection and supervised learning particularly challenging. To address this issue, we introduce CUDLE (Cannabis Use Detection with Label Efficiency), a novel framework that leverages self-supervised learning with real-world wearable sensor data to tackle a pressing healthcare challenge: the automatic detection of cannabis consumption in free-living environments. CUDLE identifies cannabis consumption moments using sensor-derived data through a contrastive learning framework. It first learns robust representations via a self-supervised pretext task with data augmentation. These representations are then fine-tuned in a downstream task with a shallow classifier, enabling CUDLE to outperform traditional supervised methods, especially with limited labeled data. To evaluate our approach, we conducted a clinical study with 20 cannabis users, collecting over 500 hours of wearable sensor data alongside user-reported cannabis use moments through EMA (Ecological Momentary Assessment) methods. Our extensive analysis using the collected data shows that CUDLE achieves a higher accuracy of 73.4%, compared to 71.1% for the supervised approach, with the performance gap widening as the number of labels decreases. Notably, CUDLE not only surpasses the supervised model while using 75% less labels, but also reaches peak performance with far fewer subjects.

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# WMT24 Indic MT共有タスクのためのNLIP_Lab-IITH低リソースMTシステム

NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task ( http://arxiv.org/abs/2410.03215v1 )

ライセンス: Link先を確認

Pramit Sahoo, Maharaj Brahma, Maunendra Sankar Desarkar,

(参考訳) 本稿では,WMT 24の低リソースインド系言語翻訳共有タスクに向けた我々のシステムについて述べる。 eng $\leftrightarrow$ {as, kha, lus, mni} を参加言語ペアとする。この共有タスクでは、インドの22の指定言語を対象に、アライメント拡張によって埋め込みをより近くに整列させるという事前学習目的に着想を得た事前学習モデルの微調整について検討する。我々の主システムは、事前学習モデルに対する言語固有の微調整に基づいている。公式の公開テストセットにおいて、eng$\rightarrow$as, eng$\rightarrow$kha, eng$\rightarrow$lus, eng$\rightarrow$mniについてそれぞれ50.6, 42.3, 54.9, 66.3のchrF2スコアを得た。また、言語グループ化やレイヤー凍結の有無による多言語学習についても検討する。コード、モデル、生成された翻訳は https://github.com/pramitsahoo/WMT2024-LRILT で公開している。

In this paper, we describe our system for the WMT 24 shared task of Low-Resource Indic Language Translation. We consider eng $\leftrightarrow$ {as, kha, lus, mni} as participating language pairs. In this shared task, we explore the finetuning of a pre-trained model motivated by the pre-trained objective of aligning embeddings closer by alignment augmentation \cite{lin-etal-2020-pre} for 22 scheduled Indian languages. Our primary system is based on language-specific finetuning on a pre-trained model. We achieve chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng$\rightarrow$as, eng$\rightarrow$kha, eng$\rightarrow$lus, eng$\rightarrow$mni respectively. We also explore multilingual training with/without language grouping and layer-freezing. Our code, models, and generated translations are available here: https://github.com/pramitsahoo/WMT2024-LRILT.
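For readers unfamiliar with the metric reported above: chrF is a character n-gram F-score, and the "2" in chrF2 denotes the weight $\beta = 2$. The formula below is the standard definition of the metric, not something stated in this abstract, and the symbols $\mathrm{chrP}$ and $\mathrm{chrR}$ (character n-gram precision and recall, averaged over n-gram orders) are the usual notation.

$$
\mathrm{chrF}_{\beta} = (1 + \beta^{2}) \, \frac{\mathrm{chrP} \cdot \mathrm{chrR}}{\beta^{2} \cdot \mathrm{chrP} + \mathrm{chrR}}, \qquad \beta = 2
$$

With $\beta = 2$, recall is weighted more heavily than precision. Scores such as 50.6 are this quantity on a 0 to 100 scale.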

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# 医療データ管理のためのインテリジェントな量子サイバーセキュリティフレームワーク

An Intelligent Quantum Cyber-Security Framework for Healthcare Data Management ( http://arxiv.org/abs/2410.03217v1 )

ライセンス: Link先を確認

Kishu Gupta, Deepika Saxena, Pooja Rani, Jitendra Kumar, Aaisha Makkar, Ashutosh Kumar Singh, Chung-Nan Lee,

(参考訳) デジタル医療は、医療サービスの強化のために、消費者が医療データにアクセスし、配布しやすくするために不可欠である。しかし、医療システム間のデジタル化に関する重要な懸念は、機密性の高いデジタル医療データ共有と悪意あるエンティティの積極的な評価を促進するために、迅速な、生産的で安全な保管施設と活発なコミュニケーション戦略を必要とすることである。本稿では,医療データ管理におけるセキュリティとプライバシの潜在的な問題を克服する,包括的な量子ベースのフレームワークを提案する。量子暗号化を利用することで、セキュアなストレージと共有クラウドプラットフォーム上での医療データの分散に量子暗号化を装備する。また、このフレームワークは、量子フィードフォワードニューラルネットワークユニットを使用して、アクセスを許可する前にデータ要求の背後にある意図を調べ、潜在的なデータ漏洩を積極的に推定する。このようにして、提案したフレームワークは、高度な量子アプローチと機械学習を結合して、悪意あるエンティティを自動で保護し、アクセスし、予測することで、医療データ全体を管理する。このように提案されたIQ-HDMは、より協力的で効果的な医療提供をもたらし、個人の健康データを適切に管理する権限を与える。提案されたIQ-HDMフレームワークと最先端の手法の実験的評価と比較は、医療データセキュリティに関連するサイバー脅威に対処する上で、67.6%の大幅な改善を概説している。

Digital healthcare is essential to facilitate consumers to access and disseminate their medical data easily for enhanced medical care services. However, the significant concern with digitalization across healthcare systems necessitates for a prompt, productive, and secure storage facility along with a vigorous communication strategy, to stimulate sensitive digital healthcare data sharing and proactive estimation of malicious entities. In this context, this paper introduces a comprehensive quantum-based framework to overwhelm the potential security and privacy issues for secure healthcare data management. It equips quantum encryption for the secured storage and dispersal of healthcare data over the shared cloud platform by employing quantum encryption. Also, the framework furnishes a quantum feed-forward neural network unit to examine the intention behind the data request before granting access, for proactive estimation of potential data breach. In this way, the proposed framework delivers overall healthcare data management by coupling the advanced and more competent quantum approach with machine learning to safeguard the data storage, access, and prediction of malicious entities in an automated manner. Thus, the proposed IQ-HDM leads to more cooperative and effective healthcare delivery and empowers individuals with adequate custody of their health data. The experimental evaluation and comparison of the proposed IQ-HDM framework with state-of-the-art methods outline a considerable improvement up to 67.6%, in tackling cyber threats related to healthcare data security.

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# ブラウン雑音の下での操舵の学習

Learning to steer with Brownian noise ( http://arxiv.org/abs/2410.03221v1 )

ライセンス: Link先を確認

Stefan Ankirchner, Sören Christensen, Jan Kallsen, Philip Le Borne, Stefan Perko,

(参考訳) 本稿では,有界速度追従問題(bounded velocity follower problem)のエルゴード版について考察し,意思決定者が基礎となるシステムパラメータを知らず、制御と同時にそれらを学習しなければならないと仮定する。移動経験平均に基づくアルゴリズムを提案し,統計的手法と確率制御理論を統合するための枠組みを構築する。主な結果は、対数オーダーの期待リグレット率である。これを達成するために,基礎となる過程のエルゴード収束率と、検討する推定量のリスクを厳密に解析する。

This paper considers an ergodic version of the bounded velocity follower problem, assuming that the decision maker lacks knowledge of the underlying system parameters and must learn them while simultaneously controlling. We propose algorithms based on moving empirical averages and develop a framework for integrating statistical methods with stochastic control theory. Our primary result is a logarithmic expected regret rate. To achieve this, we conduct a rigorous analysis of the ergodic convergence rates of the underlying processes and the risks of the considered estimators.

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# 大規模言語モデルを用いた産業機械故障の相談

Consultation on Industrial Machine Faults with Large language Models ( http://arxiv.org/abs/2410.03223v1 )

ライセンス: Link先を確認

Apiradee Boonmee, Kritsada Wongsuwan, Pimchanok Sukjai,

(参考訳) 産業機械の故障診断は、製造環境における運用効率と安全性の重要な要素である。従来の手法は専門家の知識と特定の機械学習モデルに大きく依存しており、適応性に限界があり、大量のラベル付きデータを必要とする。本稿では,大規模言語モデル(LLM)を、特に構造化されたマルチラウンドのプロンプティング手法を通じて活用し、故障診断の精度を向上させる新しい手法を提案する。プロンプトを動的に作成することにより,多様なデータソースから情報を統合するモデルの能力が高まり,文脈理解の向上と実行可能な推奨事項につながる。実験の結果,本手法はベースラインモデルを上回り,様々な種類の故障の診断において91%の精度を達成した。この知見は, 産業機械の故障相談の実務を変革するLLMの可能性を示し, 複雑な環境におけるより効果的な保守戦略への道を開くものである。

Industrial machine fault diagnosis is a critical component of operational efficiency and safety in manufacturing environments. Traditional methods rely heavily on expert knowledge and specific machine learning models, which can be limited in their adaptability and require extensive labeled data. This paper introduces a novel approach leveraging Large Language Models (LLMs), specifically through a structured multi-round prompting technique, to improve fault diagnosis accuracy. By dynamically crafting prompts, our method enhances the model's ability to synthesize information from diverse data sources, leading to improved contextual understanding and actionable recommendations. Experimental results demonstrate that our approach outperforms baseline models, achieving an accuracy of 91% in diagnosing various fault types. The findings underscore the potential of LLMs in revolutionizing industrial fault consultation practices, paving the way for more effective maintenance strategies in complex environments.

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# ScriptViz: 大規模な映画データベースに基づくスクリプト作成を支援する可視化ツール

ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database ( http://arxiv.org/abs/2410.03224v1 )

ライセンス: Link先を確認

Anyi Rao, Jean-Peïc Chou, Maneesh Agrawala,

(参考訳) スクリプトライターは通常、心の中での視覚化に頼り、想像力を使って自分が書いているシーンを見て、感じ、体験することで、鮮やかなストーリーを作り出す。心の中での視覚化に加えて、映画内の既存の画像やシーンを参照し、視覚要素を分析して特定のムードや雰囲気を作り出すことも多い。本稿では,脚本執筆プロセスのために、大規模な映画データベースに基づく外部の視覚化を提供するScriptVizを開発する。ScriptVizは、スクリプトのテキストと台詞に基づいて、大規模な映画データベースから参照ビジュアルをその場で取得する。このツールは視覚要素に対する2種類の制御を提供し、書き手は 1) 固定した視覚要素によって望みどおりのものを正確に確認し、 2) 不確定な要素のばらつきを確認できる。15人のスクリプトライターによるユーザ評価から、ScriptVizは、スクリプトと密接に整合し創作を助ける、一貫性がありながら多様な視覚的可能性をスクリプトライターに提示できることが示された。

Scriptwriters usually rely on their mental visualization to create a vivid story by using their imagination to see, feel, and experience the scenes they are writing. Besides mental visualization, they often refer to existing images or scenes in movies and analyze the visual elements to create a certain mood or atmosphere. In this paper, we develop ScriptViz to provide external visualization based on a large movie database for the screenwriting process. It retrieves reference visuals on the fly based on scripts' text and dialogue from a large movie database. The tool provides two types of control on visual elements that enable writers to 1) see exactly what they want with fixed visual elements and 2) see variances in uncertain elements. User evaluation among 15 scriptwriters shows that ScriptViz is able to present scriptwriters with consistent yet diverse visual possibilities, aligning closely with their scripts and helping their creation.

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# ALR$^2$:Long-context Question AnsweringのためのRetrieve-then-Reason Framework

ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering ( http://arxiv.org/abs/2410.03227v1 )

ライセンス: Link先を確認

Huayang Li, Pat Verga, Priyanka Sen, Bowen Yang, Vijay Viswanathan, Patrick Lewis, Taro Watanabe, Yixuan Su,

(参考訳) 近年,大規模言語モデル (LLM) のコンテキストウィンドウは大幅に拡張されている。しかし、LLMが処理できるコンテキスト長が伸びる一方で、そのコンテキスト上で正確に推論するモデルの能力は著しく低下する。これは、現代のLLMがコンテキスト内の膨大な情報に圧倒されがちなためであり、質問に答える際、モデルはテキスト全体にまばらに分散した関連する証拠を特定し、それに基づいて推論しなければならない。長文コンテキスト推論の課題を軽減するため、中間的な検索ステップで収集した関連する証拠に基づいてLLMが推論できるようにする、retrieve-then-reason(検索してから推論する)フレームワークを開発する。我々は、現代のLLMが関連する事実を正確に検索することに苦労し、代わりに「検索された事実」をしばしば幻覚し、その結果、誤った推論と誤った回答を生むことを見出した。これらの問題に対処するため、明示的な2段階の手順、すなわちLLMを検索と推論の両方の目的に整合させることによって、LLMの長文コンテキスト推論能力を強化する手法ALR$^2$を導入する。長文コンテキスト推論タスクにおける性能劣化の軽減について、ALR$^2$の有効性を示す。長文コンテキストQAベンチマークでの広範な実験により、本手法は競争力のあるベースラインを大きく上回り、HotpotQAデータセットとSQuADデータセットの長文コンテキスト版でそれぞれ少なくとも8.4と7.9のEM向上を達成した。

The context window of large language models (LLMs) has been extended significantly in recent years. However, while the context length that the LLM can process has grown, the capability of the model to accurately reason over that context degrades noticeably. This occurs because modern LLMs often become overwhelmed by the vast amount of information in the context; when answering questions, the model must identify and reason over relevant evidence sparsely distributed throughout the text. To alleviate the challenge of long-context reasoning, we develop a retrieve-then-reason framework, enabling LLMs to reason over relevant evidence collected during an intermediate retrieval step. We find that modern LLMs struggle to accurately retrieve relevant facts and instead, often hallucinate "retrieved facts", resulting in flawed reasoning and the production of incorrect answers. To address these issues, we introduce ALR$^2$, a method that augments the long-context reasoning capability of LLMs via an explicit two-stage procedure, i.e., aligning LLMs with the objectives of both retrieval and reasoning. We demonstrate the efficacy of ALR$^2$ for mitigating performance degradation in long-context reasoning tasks. Through extensive experiments on long-context QA benchmarks, we find our method to outperform competitive baselines by large margins, achieving at least 8.4 and 7.9 EM gains on the long-context versions of HotpotQA and SQuAD datasets, respectively.

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# 予測のためのフローマッチングにおける確率パスの設計選択

Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting ( http://arxiv.org/abs/2410.03229v1 )

ライセンス: Link先を確認

Soon Hoe Lim, Yijin Wang, Annan Yu, Emma Hart, Michael W. Mahoney, Xiaoye S. Li, N. Benjamin Erichson,

(参考訳) フローマッチングは、最近、生成モデリングの強力なパラダイムとして現れ、潜在空間における確率的時系列予測にまで拡張されている。しかし,確率経路モデルの選択が予測性能に与える影響は未検討のままである。本研究では,フローマッチングによる時空間データの予測が,確率パスモデルの選択に非常に敏感であることを示す。そこで本研究では,予測性能の向上を目的とした新しい確率パスモデルを提案する。各種力学系ベンチマークにおける実験結果から,本モデルがトレーニング中の収束を高速化し,既存の確率パスモデルと比較して予測性能が向上することが示唆された。重要なことは、我々のアプローチは推論時に効率的であり、ほんの数ステップのサンプリングしか必要としない。これにより,提案手法は実世界の応用に有効であり,確率予測のための新たな道を開くことができる。

Flow matching has recently emerged as a powerful paradigm for generative modeling and has been extended to probabilistic time series forecasting in latent spaces. However, the impact of the specific choice of probability path model on forecasting performance remains under-explored. In this work, we demonstrate that forecasting spatio-temporal data with flow matching is highly sensitive to the selection of the probability path model. Motivated by this insight, we propose a novel probability path model designed to improve forecasting performance. Our empirical results across various dynamical system benchmarks show that our model achieves faster convergence during training and improved predictive performance compared to existing probability path models. Importantly, our approach is efficient during inference, requiring only a few sampling steps. This makes our proposed model practical for real-world applications and opens new avenues for probabilistic forecasting.

翻訳日:2024-11-03 02:54:39 公開日:2024-10-04

# LLMの信頼度に基づくLLM生成コードの選択的表示

Showing LLM-Generated Code Selectively Based on Confidence of LLMs ( http://arxiv.org/abs/2410.03234v1 )

ライセンス: Link先を確認

Jia Li, Yuqi Zhu, Yongmin Li, Ge Li, Zhi Jin,

(参考訳) 大規模言語モデル(LLM)は、コード生成において印象的な能力を示しているが、誤ったプログラムを生成する可能性がある。プログラムを読むことは、書くことの10倍の時間がかかる。このような誤ったプログラムを開発者に示すことは、開発者の労力を無駄にし、ソフトウェアにセキュリティリスクをもたらす。上記の制限に対処するため,新しいLLMベースのコード生成手法であるHonestCoderを提案する。HonestCoderは、LLMの信頼度に基づいて、生成したプログラムを開発者に選択的に表示する。信頼度は、生成されたプログラムの正しさに関する貴重な手がかりを与える。この目的を達成するために,コード生成におけるLLMの信頼度を推定する新しい手法を提案する。この手法は、LLMが生成したプログラム間のマルチモーダル類似度を測定することで信頼度を推定する。我々は、TruthCodeBenchという多言語ベンチマークを収集し公開する。これは2,265のサンプルからなり、2つの人気のあるプログラミング言語(PythonとJava)をカバーする。我々は、HonestCoderを4つの人気のあるLLM(例えば、DeepSeek-CoderとCode Llama)に適用し、TruthCodeBenchで評価する。実験から,以下の知見を得た。 (1) HonestCoderはLLMの信頼度を効果的に推定し,生成されたプログラムの正しさを正確に判定できる。例えば、HonestCoderは、最先端のベースラインをAUROCで27.79%、AUCPRで63.74%上回る。 (2) HonestCoderは、開発者に示される誤ったプログラムの数を減らすことができる。8つのベースラインと比較して、より多くの正しいプログラムとより少ない誤ったプログラムを開発者に示すことができる。 (3) コードを無差別に表示する場合と比較して、HonestCoderが追加する時間オーバーヘッドはわずか(要件あたり約0.4秒)である。 (4) ソフトウェア開発におけるLLMの活用を促進するための今後の方向性について論じる。この研究が、コード関連タスクの実行におけるLLMの出力の信頼性の測定について、幅広い議論のきっかけとなることを願っている。

Large Language Models (LLMs) have shown impressive abilities in code generation, but they may generate erroneous programs. Reading a program takes ten times longer than writing it. Showing these erroneous programs to developers will waste developers' energies and introduce security risks to software. To address the above limitations, we propose HonestCoder, a novel LLM-based code generation approach. HonestCoder selectively shows the generated programs to developers based on LLMs' confidence. The confidence provides valuable insights into the correctness of generated programs. To achieve this goal, we propose a novel approach to estimate LLMs' confidence in code generation. It estimates confidence by measuring the multi-modal similarity between LLMs-generated programs. We collect and release a multilingual benchmark named TruthCodeBench, which consists of 2,265 samples and covers two popular programming languages (i.e., Python and Java). We apply HonestCoder to four popular LLMs (e.g., DeepSeek-Coder and Code Llama) and evaluate it on TruthCodeBench. Based on the experiments, we obtain the following insights. (1) HonestCoder can effectively estimate LLMs' confidence and accurately determine the correctness of generated programs. For example, HonestCoder outperforms the state-of-the-art baseline by 27.79% in AUROC and 63.74% in AUCPR. (2) HonestCoder can decrease the number of erroneous programs shown to developers. Compared to eight baselines, it can show more correct programs and fewer erroneous programs to developers. (3) Compared to showing code indiscriminately, HonestCoder only adds slight time overhead (approximately 0.4 seconds per requirement). (4) We discuss future directions to facilitate the application of LLMs in software development. We hope this work can motivate broad discussions about measuring the reliability of LLMs' outputs in performing code-related tasks.
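The idea of estimating confidence from agreement between several sampled programs, and only showing code when confidence is high, can be sketched as follows. This is an illustration under loud assumptions, not HonestCoder itself: the abstract describes a multi-modal similarity, whereas this sketch substitutes plain text similarity from Python's `difflib`, and the function names and the `threshold` value of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confidence(programs):
    """Mean pairwise text similarity between sampled programs, in [0, 1].

    Assumption: text similarity stands in for the paper's multi-modal similarity.
    """
    if len(programs) < 2:
        return 0.0  # a single sample gives no agreement signal
    sims = [SequenceMatcher(None, a, b).ratio() for a, b in combinations(programs, 2)]
    return sum(sims) / len(sims)


def select_program(programs, threshold=0.8):
    """Return the first sample if the samples agree enough, otherwise None."""
    if confidence(programs) >= threshold:
        return programs[0]
    return None


# Samples that agree are shown; samples that disagree are withheld.
agreeing = ["def add(a, b):\n    return a + b\n"] * 3
disagreeing = ["abc", "xyz"]
shown = select_program(agreeing)
hidden = select_program(disagreeing)
```

Returning `None` models the "do not show" decision; a real system would need a calibrated threshold rather than a fixed constant.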

翻訳日:2024-11-02 23:28:42 公開日:2024-10-04

# 大規模言語モデルを用いたdisjointness(互いに素)公理によるオントロジーの拡充

Enriching Ontologies with Disjointness Axioms using Large Language Models ( http://arxiv.org/abs/2410.03235v1 )

ライセンス: Link先を確認

Elias Crum, Antonio De Santis, Manon Ovide, Jiaxin Pan, Alessia Pisu, Nicolas Lazzari, Sebastian Rudolph,

(参考訳) オントロジーでは、知識グラフにおける高度な推論や一貫性チェックに有用であるにもかかわらず、クラス間が互いに素(disjoint)であることの明示的な宣言が欠けていることが多い。本研究では,クラスのdisjointness公理を同定し,主張することで,オントロジーを充実させるLarge Language Models (LLMs) の可能性を探る。提案手法は,LLMに埋め込まれた暗黙の知識を活用することを目的としており、プロンプトエンジニアリングによってその知識を引き出し、オントロジー上のdisjointnessの分類に用いる。本手法をDBpediaのオントロジーで検証し,オープンソース LLM に着目した。本研究は, LLMが効果的なプロンプト戦略によって導かれると, 互いに素なクラス間の関係を確実に識別でき, 広範な手作業なしにオントロジー補完の過程を合理化できることを示唆する。disjointness公理を包括的に拡充するために,disjointnessとサブクラス文との論理的関係を考慮に入れ,充足可能性を維持しつつLLMへの呼び出し数を減少させるプロセスを提案する。この研究は、自動オントロジー拡張におけるLLMの将来の応用の基礎を提供し、戦略的プロンプト設計によるLLM性能の最適化に関する洞察を提供する。私たちのコードはGitHubのhttps://github.com/n28div/llm-disjointnessで公開されています。

Ontologies often lack explicit disjointness declarations between classes, despite their usefulness for sophisticated reasoning and consistency checking in Knowledge Graphs. In this study, we explore the potential of Large Language Models (LLMs) to enrich ontologies by identifying and asserting class disjointness axioms. Our approach aims at leveraging the implicit knowledge embedded in LLMs, using prompt engineering to elicit this knowledge for classifying ontological disjointness. We validate our methodology on the DBpedia ontology, focusing on open-source LLMs. Our findings suggest that LLMs, when guided by effective prompt strategies, can reliably identify disjoint class relationships, thus streamlining the process of ontology completion without extensive manual input. For comprehensive disjointness enrichment, we propose a process that takes logical relationships between disjointness and subclass statements into account in order to maintain satisfiability and reduce the number of calls to the LLM. This work provides a foundation for future applications of LLMs in automated ontology enhancement and offers insights into optimizing LLM performance through strategic prompt design. Our code is publicly available on GitHub at https://github.com/n28div/llm-disjointness.
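
The abstract mentions using the logical relationship between disjointness and subclass statements to keep the ontology satisfiable and to save LLM calls. A minimal sketch of two standard consequences is given here: asserting that two classes are disjoint also makes all of their subclasses pairwise disjoint, and two classes that share a subclass cannot be declared disjoint without making that subclass unsatisfiable. This is not the authors' procedure; the function names and the toy hierarchy used in the usage note are assumptions.

```python
def descendants(cls, children):
    """All subclasses of cls (including cls itself) in a subclass hierarchy."""
    found, stack = {cls}, [cls]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def implied_disjoint_pairs(a, b, children):
    """Class pairs that become disjoint once a and b are asserted disjoint."""
    return {frozenset((x, y))
            for x in descendants(a, children)
            for y in descendants(b, children)}


def may_be_disjoint(a, b, children):
    """False when a and b share a subclass, which disjointness would make unsatisfiable."""
    return not (descendants(a, children) & descendants(b, children))
```

With a toy hierarchy such as `{"Person": ["Athlete"], "Place": ["City"]}`, one positive answer for `Person` versus `Place` settles four pairs at once, so those pairs need no further queries.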

翻訳日:2024-11-02 23:28:42 公開日:2024-10-04

# 映画の字幕を超えて:YouTubeは音声語彙の最適な近似か?

Beyond Film Subtitles: Is YouTube the Best Approximation of Spoken Vocabulary? ( http://arxiv.org/abs/2410.03240v1 )

ライセンス: Link先を確認

Adam Nohejl, Frederikus Hudi, Eunike Andriani Kardinata, Shintaro Ozaki, Maria Angelica Riera Machin, Hongyu Sun, Justin Vasselli, Taro Watanabe,

(参考訳) 単語頻度は、心理言語学において重要な変数であり、大規模言語モデル(LLM)の時代でさえ、人間の単語親密度をモデル化するのに有用である。映画の字幕における頻度は、日常的な言語接触の特に良い近似であることが示されている。しかし、多くの言語では、映画の字幕は容易には入手できないか、圧倒的多数が英語からの翻訳である。我々は、慎重に処理されたYouTube字幕から抽出された頻度が、現在利用可能な最良のリソースに匹敵し、しばしばそれを上回る近似を提供することを示す。さらに、高品質な字幕や音声コーパスが存在しない言語でも利用できる。我々は,中国語,英語,インドネシア語,日本語,スペイン語という5つの多様な言語に対して,YouTube字幕を用いて頻度ノルムを構築し,語彙判断時間,単語親密度,語彙的複雑性との相関を評価する。 2つの心理言語学的変数と強く相関するのに加えて、新しい頻度に対する単純な線形回帰は、英語と日本語の語彙的複雑性予測タスクにおいて、映画字幕頻度で訓練されたモデルとLLMであるGPT-4の両方を上回り、新たな最高スコアを達成する。私たちのコード、頻度リスト、fastText単語埋め込み、統計的言語モデルはhttps://github.com/naist-nlp/tubelexで無料で利用可能です。

Word frequency is a key variable in psycholinguistics, useful for modeling human familiarity with words even in the era of large language models (LLMs). Frequency in film subtitles has proved to be a particularly good approximation of everyday language exposure. For many languages, however, film subtitles are not easily available, or are overwhelmingly translated from English. We demonstrate that frequencies extracted from carefully processed YouTube subtitles provide an approximation comparable to, and often better than, the best currently available resources. Moreover, they are available for languages for which a high-quality subtitle or speech corpus does not exist. We use YouTube subtitles to construct frequency norms for five diverse languages, Chinese, English, Indonesian, Japanese, and Spanish, and evaluate their correlation with lexical decision time, word familiarity, and lexical complexity. In addition to being strongly correlated with two psycholinguistic variables, a simple linear regression on the new frequencies achieves a new high score on a lexical complexity prediction task in English and Japanese, surpassing both models trained on film subtitle frequencies and the LLM GPT-4. Our code, the frequency lists, fastText word embeddings, and statistical language models are freely available at https://github.com/naist-nlp/tubelex.
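
To make the evaluation idea concrete, the following sketch computes per-million word frequencies from a token list and a Pearson correlation that could relate (log) frequency to a behavioural measure such as lexical decision time. It is a generic illustration only: the abstract does not describe the subtitle cleaning, tokenization, or normalization used for the released norms, and the helper names are assumptions.

```python
import math
from collections import Counter


def frequency_per_million(tokens):
    """Map each word to its number of occurrences per million tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

In practice the frequencies are usually log-transformed before correlating, since raw counts are heavily skewed.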

翻訳日:2024-11-02 23:28:42 公開日:2024-10-04

# オンライン模倣学習のための単一歩行周期の実演から得る潜在行動事前分布

Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning ( http://arxiv.org/abs/2410.03246v1 )

ライセンス: Link先を確認

Oliver Hausdörfer, Alexander von Rohr, Éric Lefort, Angela Schoellig,

(参考訳) シミュレーションにおける深層強化学習(DRL)は、しばしば脆く非現実的な学習結果をもたらす。エージェントをより望ましい解へ導くために、例えば報酬形成、専門家データ、モーションプリミティブを通じて、学習プロセスに事前情報を注入することができる。本稿では,ロボット学習における追加の帰納的バイアスとして,専門家の実演から学習した潜在行動を行動空間の事前分布として用いることを提案する。単純なオートエンコーダを用いて、1つの開ループ歩行周期のみからこれらの行動事前分布を学習できることを示す。 DRLにおいて、これらの潜在行動事前分布を、模倣のための確立されたスタイル報酬と組み合わせることで、専門家の実演を上回る性能が達成され、より望ましい歩容が得られる。さらに、行動事前分布は転移タスクの性能を大幅に改善し、より高い目標速度では歩容遷移さえもたらす。ビデオとコードはhttps://sites.google.com/view/latent-action-priorsで公開されている。

Deep Reinforcement Learning (DRL) in simulation often results in brittle and unrealistic learning outcomes. To push the agent towards more desirable solutions, prior information can be injected in the learning process through, for instance, reward shaping, expert data, or motion primitives. We propose an additional inductive bias for robot learning: latent actions learned from expert demonstration as priors in the action space. We show that these action priors can be learned from only a single open-loop gait cycle using a simple autoencoder. Using these latent action priors combined with established style rewards for imitation in DRL achieves above expert demonstration level of performance and leads to more desirable gaits. Further, action priors substantially improve the performance on transfer tasks, even leading to gait transitions for higher target speeds. Videos and code are available at https://sites.google.com/view/latent-action-priors.

翻訳日:2024-11-02 23:28:42 公開日:2024-10-04

# 多チャンネル情報を用いた神経細胞核の3次元分割と細胞型同定

3D Segmentation of Neuronal Nuclei and Cell-Type Identification using Multi-channel Information ( http://arxiv.org/abs/2410.03248v1 )

ライセンス: Link先を確認

Antonio LaTorre, Lidia Alonso-Nanclares, José María Peña, Javier De Felipe,

(参考訳) 背景: 自動的手法で画像を解析し、脳内の異なる細胞型の数を正確に推定することは、神経科学の主要な目的である。神経細胞の自動的かつ選択的な検出とセグメンテーションは、神経解剖学的研究において重要なステップとなる。新手法: 神経細胞核の3次元再構成を改良し,非神経細胞型の核を除外して神経細胞核をセグメンテーションできる手法を提案する。結果: ラット新皮質の画像スタックを用い,複雑なシナリオ(大きな画像スタック,不均一な染色,異なる細胞マーカーを可視化する3つの異なるチャネル)でアルゴリズムを検証した。本手法は、神経細胞核の良好な識別率と3次元セグメンテーションを提供できた。既存の方法との比較: 多くの自動ツールが現在利用可能であるが、ラベル付けやイメージング技術の違い、および細胞検出アルゴリズムの違いにより、同じ脳領域であっても手法ごとに異なる細胞数の推定結果が得られる。さらに、利用可能な自動化ソフトウェア手法のいくつかは、神経解剖学者による評価の結果、不正確または一貫性がないと報告された細胞数推定を与えている。結論: 神経細胞、グリア細胞、および血管周囲細胞を識別できる自動セグメンテーションのツールを持つことは重要である。そうしたツールは、現在手動で実行されている作業を大幅に高速化し、細胞計数を体系的なものにして人間のバイアスを回避する。さらに、得られた異なる細胞型の3次元再構成は、細胞の空間分布のモデルを生成するために利用できる。

Background: Analyzing images to accurately estimate the number of different cell types in the brain using automatic methods is a major objective in neuroscience. The automatic and selective detection and segmentation of neurons would be an important step in neuroanatomical studies. New method: We present a method to improve the 3D reconstruction of neuronal nuclei that allows their segmentation, excluding the nuclei of non-neuronal cell types. Results: We have tested the algorithm on stacks of images from rat neocortex, in a complex scenario (large stacks of images, uneven staining, and three different channels to visualize different cellular markers). It was able to provide a good identification ratio of neuronal nuclei and a 3D segmentation. Comparison with existing methods: Many automatic tools are in fact currently available, but different methods yield different cell count estimations, even in the same brain regions, due to differences in the labeling and imaging techniques, as well as in the algorithms used to detect cells. Moreover, some of the available automated software methods have provided estimations of cell numbers that have been reported to be inaccurate or inconsistent after evaluation by neuroanatomists. Conclusions: It is critical to have a tool for automatic segmentation that allows discrimination between neurons, glial cells and perivascular cells. It would greatly speed up a task that is currently performed manually and would allow the cell counting to be systematic, avoiding human bias. Furthermore, the resulting 3D reconstructions of different cell types can be used to generate models of the spatial distribution of cells.

翻訳日:2024-11-02 23:28:42 公開日:2024-10-04

# エキスパートレベル言語モデルはエキスパートレベルのアノテータか?

Are Expert-Level Language Models Expert-Level Annotators? ( http://arxiv.org/abs/2410.03254v1 )

ライセンス: Link先を確認

Yu-Min Tseng, Wei-Lin Chen, Chung-Chi Chen, Hsin-Hsi Chen,

(参考訳) データアノテーションとは、テキストデータに関連情報をラベル付けまたはタグ付けすることを指す。 LLMをヒトのアノテータの代替として利用することについて、多くの研究が肯定的な結果を報告している。しかし、既存の研究は古典的なNLPタスクに焦点をあてており、専門知識を必要とする領域において、データアノテータとしてのLLMがどの程度の性能を発揮するかは十分に検討されていない。本研究では,高度に専門的な3つの領域にわたって包括的なアプローチを調査し,費用対効果の観点から実践的な提案を論じる。我々の知る限り、本研究はLLMを専門家レベルのデータアノテータとして体系的に評価した最初の研究である。

Data annotation refers to the labeling or tagging of textual data with relevant information. A large body of works have reported positive results on leveraging LLMs as an alternative to human annotators. However, existing studies focus on classic NLP tasks, and the extent to which LLMs as data annotators perform in domains requiring expert knowledge remains underexplored. In this work, we investigate comprehensive approaches across three highly specialized domains and discuss practical suggestions from a cost-effectiveness perspective. To the best of our knowledge, we present the first systematic evaluation of LLMs as expert-level data annotators.

翻訳日:2024-11-02 23:28:42 公開日:2024-10-04

# 事前学習言語モデルの微調整における語彙適応強化のための適応的BPEトークン化

Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models ( http://arxiv.org/abs/2410.03258v1 )

ライセンス: Link先を確認

Gunjan Balde, Soumyadeep Roy, Mainack Mondal, Niloy Ganguly,

(参考訳) 本研究では, 事前学習言語モデル (PLM) を専門ドメインへ微調整する際に、バイトペア符号化(BPE)トークン化方式を用いる語彙適応手法が抱える根本的な制限を示す。現在のアプローチでは、PLM語彙の末尾にターゲットドメイン固有の語彙を単純に追加している。このアプローチでは追加語彙の優先度スコアが低くなり、マージルールを反復的に用いて与えられたテキストをトークン化するBPEにおいて、準最適なトークン化を引き起こす。この問題を軽減するために,BPEトークン化の初期化フェーズを修正し、文字レベルでトークン化する前に、まず追加された(ターゲット)語彙に対して最長文字列マッチングを行うAdaptBPEを提案する。各種の分類タスクと要約タスクにおいてAdaptBPEと標準BPEを広範囲に比較評価し,AdaptBPEはそれぞれ精度で3.57%,Rouge-Lで1.87%向上した。 MEDVOCに対するAdaptBPEは、参照要約のOOV濃度が高い場合や参照要約が長い場合に、特にうまく機能する。また,人手による評価を実施し、AdaptBPEがMEDVOCと比較して,より関連性が高く忠実な要約を生成することを明らかにする。コードベースはhttps://github.com/gb-kgp/adaptbpeで公開しています。

In this work, we show a fundamental limitation in vocabulary adaptation approaches that use Byte-Pair Encoding (BPE) tokenization scheme for fine-tuning pretrained language models (PLMs) to expert domains. Current approaches trivially append the target domain-specific vocabulary at the end of the PLM vocabulary. This approach leads to a lower priority score and causes sub-optimal tokenization in BPE that iteratively uses merge rules to tokenize a given text. To mitigate this issue, we propose AdaptBPE where the BPE tokenization initialization phase is modified to first perform the longest string matching on the added (target) vocabulary before tokenizing at the character level. We perform an extensive evaluation of AdaptBPE versus the standard BPE over various classification and summarization tasks; AdaptBPE improves by 3.57% (in terms of accuracy) and 1.87% (in terms of Rouge-L), respectively. AdaptBPE for MEDVOC works particularly well when reference summaries have high OOV concentration or are longer in length. We also conduct a human evaluation, revealing that AdaptBPE generates more relevant and more faithful summaries as compared to MEDVOC. We make our codebase publicly available at https://github.com/gb-kgp/adaptbpe.
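
The modified initialization described in the abstract (longest string matching on the added vocabulary before falling back to characters) can be sketched as below. This is an assumption-laden illustration rather than the released AdaptBPE code: it uses greedy left-to-right matching, ignores word-boundary markers and byte-level encoding, and the function name is hypothetical. The usual BPE merge rules would then run on the returned symbols.

```python
def initial_symbols(word, added_vocab):
    """Split a word into the symbols that BPE merging would start from.

    At each position, take the longest entry of added_vocab that matches;
    if none matches, fall back to a single character.
    """
    max_len = max((len(v) for v in added_vocab), default=0)
    symbols, i = [], 0
    while i < len(word):
        match = None
        # Try candidate lengths from longest to shortest (at least 2 characters).
        for length in range(min(max_len, len(word) - i), 1, -1):
            if word[i:i + length] in added_vocab:
                match = word[i:i + length]
                break
        symbols.append(match if match else word[i])
        i += len(symbols[-1])
    return symbols
```

With an added vocabulary containing `hepato` and `megaly`, the word `hepatomegaly` starts as two symbols instead of twelve characters, so the domain terms no longer depend on low-priority merges.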

翻訳日:2024-11-02 23:28:42 公開日:2024-10-04

# 部分空間アライメントによる回帰テスト時間適応

Test-time Adaptation for Regression by Subspace Alignment ( http://arxiv.org/abs/2410.03263v1 )

ライセンス: Link先を確認

Kazuki Adachi, Shin'ya Yamaguchi, Atsutoshi Kumagai, Tomoki Hamagami,

(参考訳) 本稿では、ソース領域で事前訓練された回帰モデルを、ラベルなしのターゲットデータを用いて未知のターゲット分布に適応させる、回帰のためのテスト時間適応(TTA)について検討する。回帰は機械学習の基本的なタスクの1つであるが、既存のTTA手法のほとんどは分類固有の設計を持ち、モデルがクラスカテゴリの予測を出力することを仮定しているのに対し、回帰モデルは典型的には単一のスカラー値のみを出力する。回帰のためにTTAを有効にするために、ソースとターゲットドメイン間の特徴分布を整列させてドメインギャップを緩和する特徴アライメントアプローチを採用する。しかし, 分類向けの既存TTA手法で用いられる単純な特徴アライメントは, 特徴が小さな部分空間に分布し, 生の特徴次元の多くが出力にほとんど寄与しないため, 回帰に対しては効果がないか、かえって性能を悪化させることが判明した。回帰のためのTTAにおける効果的な特徴アライメントとして,SSA(Significant-subspace Alignment)を提案する。 SSAは、部分空間検出と次元重み付けという2つのコンポーネントから構成される。部分空間検出は、代表的で、かつ出力に対して重要な特徴部分空間を見つける。そして、TTA中にその部分空間で特徴アライメントを行う。一方、次元重み付けは、特徴部分空間の次元のうち、出力に対してより重要なものの重みを高める。実世界のデータセットにおいて,SSAが様々なベースラインより優れていることを実験的に示す。

This paper investigates test-time adaptation (TTA) for regression, where a regression model pre-trained in a source domain is adapted to an unknown target distribution with unlabeled target data. Although regression is one of the fundamental tasks in machine learning, most of the existing TTA methods have classification-specific designs, which assume that models output class-categorical predictions, whereas regression models typically output only single scalar values. To enable TTA for regression, we adopt a feature alignment approach, which aligns the feature distributions between the source and target domains to mitigate the domain gap. However, we found that naive feature alignment employed in existing TTA methods for classification is ineffective or even worse for regression because the features are distributed in a small subspace and many of the raw feature dimensions have little significance to the output. For an effective feature alignment in TTA for regression, we propose Significant-subspace Alignment (SSA). SSA consists of two components: subspace detection and dimension weighting. Subspace detection finds the feature subspace that is representative and significant to the output. Then, the feature alignment is performed in the subspace during TTA. Meanwhile, dimension weighting raises the importance of the dimensions of the feature subspace that have greater significance to the output. We experimentally show that SSA outperforms various baselines on real-world datasets.

翻訳日:2024-11-02 23:28:42 公開日:2024-10-04

# $ε$-contaminated Credal Setsの最適輸送

Optimal Transport for $ε$-Contaminated Credal Sets ( http://arxiv.org/abs/2410.03267v1 )

ライセンス: Link先を確認

Michele Caprio,

(参考訳) 我々は,モンジュとカントロビッチの最適輸送問題の下側確率(lower probability)版を提供する。下側確率が$\epsilon$-contaminated setの下側包絡であるとき、我々の版のモンジュ問題と、我々の版のカントロビッチ問題を制限したものは、それぞれの古典版と一致することを示す。また,我々の版のカントロビッチの最適計画が存在するための十分条件と,2つの問題が等価となるための十分条件も与える。副産物として、$\epsilon$-contaminationに対して、モンジュとカントロビッチの最適輸送問題の下側確率版は必ずしも一致しないことを示す。機械学習と人工知能への本研究の応用についても論じる。

We provide a version for lower probabilities of Monge's and Kantorovich's optimal transport problems. We show that, when the lower probabilities are the lower envelopes of $\epsilon$-contaminated sets, then our version of Monge's, and a restricted version of our Kantorovich's problems, coincide with their respective classical versions. We also give sufficient conditions for the existence of our version of Kantorovich's optimal plan, and for the two problems to be equivalent. As a byproduct, we show that for $\epsilon$-contaminations the lower probability versions of Monge's and Kantorovich's optimal transport problems need not coincide. The applications of our results to Machine Learning and Artificial Intelligence are also discussed.

翻訳日:2024-11-02 23:28:42 公開日:2024-10-04

# 空間探索のための量子ウォークに及ぼす二変量ガウスポテンシャルの影響

Impact of Bivariate Gaussian Potentials on Quantum Walks for Spatial Search ( http://arxiv.org/abs/2410.03269v1 )

ライセンス: Link先を確認

Franklin de L. Marquezino, Raqueline A. M. Santos,

(参考訳) 空間探索問題における量子ウォークの力学に対するポテンシャル場,特に二変量ガウス分布関数を用いたポテンシャル場の影響について検討する。二次元格子を探索するためのAmbainis-Kempe-Rivosh(AKR)モデルに基づいてポテンシャル場を組み込み,二変量ガウス関数の標準偏差と正規化の変化が探索アルゴリズムの性能に与える影響を調べる。その結果,標準偏差が小さい場合,量子ウォークはAKRアルゴリズムとほぼ同じ挙動を示すが,標準偏差が大きくなるにつれて成功確率が急速に低下することを示した。この振る舞いは、二変量ガウスがAKRアルゴリズム内のノイズのあるオラクルを効果的にモデル化できることを示す。さらに、AKRベースのモデルを、アダマールコインと標準シフトを用いる別の量子ウォークモデルと比較した。これらの知見は、量子ウォーク探索アルゴリズムの頑健性の理解に寄与し、量子ウォークを最適化アルゴリズムに適用する方法に関する洞察を与える。

We examine the impact of potential fields, particularly utilizing a bivariate Gaussian distribution function, on the dynamics of quantum walks in spatial search problems. Building on the Ambainis-Kempe-Rivosh (AKR) model for searching on a two-dimensional grid, we incorporate potential fields to investigate how changes in standard deviation and normalization of the bivariate Gaussian function impact the performance of the search algorithm. Our results show that the quantum walk closely mirrors the AKR algorithm when the standard deviation is small but exhibits a rapid decay in success probability as the standard deviation increases. This behavior demonstrates how the bivariate Gaussian can effectively model a noisy oracle within the AKR algorithm. Additionally, we compare the AKR-based model with an alternative quantum walk model using a Hadamard coin and standard shift. These findings contribute to understanding the robustness of quantum walk search algorithms, and provide insights into how quantum walks can be applied to optimization algorithms.
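
One way to build intuition for the role of the standard deviation is to tabulate a Gaussian potential on a small grid and check how much of it sits on the central vertex. The sketch below assumes an isotropic, uncorrelated bivariate Gaussian on an open (non-periodic) grid; how the paper couples the potential to the walk operator and how it normalizes it are not stated in the abstract, so none of that is modelled here. All names are hypothetical.

```python
import math


def gaussian_potential(n, center, sigma, amplitude=1.0):
    """n x n grid of values of an isotropic bivariate Gaussian centred on `center`."""
    cx, cy = center
    return [[amplitude * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for y in range(n)]
            for x in range(n)]


def mass_at_center(grid, center):
    """Fraction of the total potential that sits on the centre vertex."""
    total = sum(sum(row) for row in grid)
    return grid[center[0]][center[1]] / total
```

For a small standard deviation almost all of the potential is on one vertex, which resembles an oracle that marks a single vertex; for a large one it spreads over many vertices, consistent with the "noisy oracle" reading in the abstract.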

翻訳日:2024-11-02 23:28:42 公開日:2024-10-04

# 感情を帯びたユーザ生成コンテンツの機械翻訳評価のためのマルチタスク学習フレームワーク

A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content ( http://arxiv.org/abs/2410.03277v1 )

ライセンス: Link先を確認

Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo,

(参考訳) ユーザ生成コンテンツ(UGC)の機械翻訳(MT)には、スラング、感情、そしてアイロニーや皮肉といった修辞技法の扱いなど、固有の課題がある。現在のメトリクスはUGCに遍在するこれらの特徴に焦点を当てていないため、こうした翻訳の品質を評価することは難しい。この問題に対処するために、感情ラベルと、Multi-dimensional Quality Metricsに基づく人手アノテーションによる翻訳誤りを含む既存の感情関連データセットを利用する。これを文レベル評価スコアと単語レベルラベルで拡張し、マルチタスク設定で文レベル・単語レベルの翻訳評価と感情分類に適したデータセットを作成する。我々はこれらのタスクを同時に実行する新しいアーキテクチャを提案し、Nash損失やAligned損失のような異なる損失ヒューリスティックを統合した新しい複合損失関数を用いる。本評価では,既存の微調整手法とマルチタスク学習手法を比較し,複数のデータセットにわたるアブレーション実験によって一般化を評価する。提案手法は最先端性能を達成し,UGCのMT評価に関する包括的な分析を示す。

Machine translation (MT) of user-generated content (UGC) poses unique challenges, including handling slang, emotion, and literary devices like irony and sarcasm. Evaluating the quality of these translations is challenging as current metrics do not focus on these ubiquitous features of UGC. To address this issue, we utilize an existing emotion-related dataset that includes emotion labels and human-annotated translation errors based on Multi-dimensional Quality Metrics. We extend it with sentence-level evaluation scores and word-level labels, leading to a dataset suitable for sentence- and word-level translation evaluation and emotion classification, in a multi-task setting. We propose a new architecture to perform these tasks concurrently, with a novel combined loss function, which integrates different loss heuristics, like the Nash and Aligned losses. Our evaluation compares existing fine-tuning and multi-task learning approaches, assessing generalization with ablative experiments over multiple datasets. Our approach achieves state-of-the-art performance and we present a comprehensive analysis for MT evaluation of UGC.

翻訳日:2024-11-02 23:18:36 公開日:2024-10-04

# デジタル聴診器を用いたマニキン記録心肺音データセット

Manikin-Recorded Cardiopulmonary Sounds Dataset Using Digital Stethoscope ( http://arxiv.org/abs/2410.03280v1 )

ライセンス: Link先を確認

Yasaman Torabi, Shahram Shirani, James P. Reilly,

(参考訳) 心音と肺音は、医療モニタリングに不可欠である。近年の聴診器技術の進歩により、患者の音をより高い精度で捉えられるようになった。本データセットでは,デジタル聴診器を用いて、個別の録音と混合録音を含む心音と肺音の両方を収録した。私たちの知る限り、これは分離された心肺音と混合された心肺音の両方を提供する最初のデータセットである。録音は、ヒトの生理状態を再現するよう設計された患者シミュレータである臨床マニキンから収集され、身体の異なる部位でクリーンな心音と肺音を生成している。このデータセットは、正常な音と様々な異常音(心雑音、心房細動、頻拍、房室ブロック、第3および第4心音、喘鳴、クラックル、ロンカイ、胸膜摩擦音、ゴロゴロ音)を含む。このデータセットは、専門看護師が定めた異なる解剖学的部位で行われた胸部聴診の音声記録を含む。それぞれの録音は、特定の音種を強調するために周波数フィルタを用いて強調処理されている。このデータセットは、自動心肺疾患検出、音分類、教師なし分離技術、音声信号処理に関連するディープラーニングアルゴリズムなど、人工知能の応用に有用である。

Heart and lung sounds are crucial for healthcare monitoring. Recent improvements in stethoscope technology have made it possible to capture patient sounds with enhanced precision. In this dataset, we used a digital stethoscope to capture both heart and lung sounds, including individual and mixed recordings. To our knowledge, this is the first dataset to offer both separate and mixed cardiorespiratory sounds. The recordings were collected from a clinical manikin, a patient simulator designed to replicate human physiological conditions, generating clean heart and lung sounds at different body locations. This dataset includes both normal sounds and various abnormalities (i.e., murmur, atrial fibrillation, tachycardia, atrioventricular block, third and fourth heart sound, wheezing, crackles, rhonchi, pleural rub, and gurgling sounds). The dataset includes audio recordings of chest examinations performed at different anatomical locations, as determined by specialist nurses. Each recording has been enhanced using frequency filters to highlight specific sound types. This dataset is useful for applications in artificial intelligence, such as automated cardiopulmonary disease detection, sound classification, unsupervised separation techniques, and deep learning algorithms related to audio signal processing.

翻訳日:2024-11-02 23:18:36 公開日:2024-10-04

# BN-SCAFFOLD:フェデレートラーニングにおけるバッチ正規化統計のドリフト制御

BN-SCAFFOLD: controlling the drift of Batch Normalization statistics in Federated Learning ( http://arxiv.org/abs/2410.03281v1 )

ライセンス: Link先を確認

Gonzalo Iñaki Quintana, Laurence Vancamberg, Vincent Jugnon, Mathilde Mougeot, Agnès Desolneux,

(参考訳) 機械学習(ML)モデルを分散的にトレーニングするための学習パラダイムとして、フェデレートラーニング(FL)が注目を集めている。バッチ正規化(BN)は、収束と一般化を改善するため、ディープニューラルネットワーク(DNN)において広く使われている。しかし、BNは不均一なFLにおいてDNNの性能を阻害すると報告されている。近年、全クライアントのBN統計と勾配を集約することにより、BNに対する不均一性の影響を軽減するFedTANアルゴリズムが提案されている。しかし、その通信コストは高く、DNNの深さに対して線形に増加する。 SCAFFOLDは分散低減アルゴリズムであり、クライアントドリフトを通信効率のよい方法で推定し、補正する。不均一なFL設定で有望な結果を示すにもかかわらず、BNを持つモデルでは性能が劣ることが報告されている。本研究では、不均一なFLにおいてBNを持つDNNを効率的にトレーニングする方法として、SCAFFOLD、より一般的には分散低減を復活させることを目指す。 Wang et al 2023 の研究に着想を得て、BN-DNN 設定における分散低減アルゴリズムの収束を解析するための統一的な理論フレームワークを導入し,SCAFFOLD が BN によって生じるバイアスを除去できないことを示す。そこで我々は,SCAFFOLDのクライアントドリフト補正をBN統計に拡張するBN-SCAFFOLDアルゴリズムを提案する。上記のフレームワークを用いて収束を証明し、MNISTとCIFAR-10の実験により理論的結果を検証する。 BN-SCAFFOLDは、FedTANのような高い通信コストなしにFedTANと同等の性能を達成し、フェデレート平均化(FedAvg)、SCAFFOLD、およびBNの不均一性を緩和するために設計された他のFLアルゴリズムよりも優れている。

Federated Learning (FL) is gaining traction as a learning paradigm for training Machine Learning (ML) models in a decentralized way. Batch Normalization (BN) is ubiquitous in Deep Neural Networks (DNN), as it improves convergence and generalization. However, BN has been reported to hinder performance of DNNs in heterogeneous FL. Recently, the FedTAN algorithm has been proposed to mitigate the effect of heterogeneity on BN, by aggregating BN statistics and gradients from all the clients. However, it has a high communication cost, which increases linearly with the depth of the DNN. SCAFFOLD is a variance reduction algorithm that estimates and corrects the client drift in a communication-efficient manner. Despite its promising results in heterogeneous FL settings, it has been reported to underperform for models with BN. In this work, we seek to revive SCAFFOLD, and more generally variance reduction, as an efficient way of training DNN with BN in heterogeneous FL. We introduce a unified theoretical framework for analyzing the convergence of variance reduction algorithms in the BN-DNN setting, inspired by the work of Wang et al. 2023, and show that SCAFFOLD is unable to remove the bias introduced by BN. We thus propose the BN-SCAFFOLD algorithm, which extends the client drift correction of SCAFFOLD to BN statistics. We prove convergence using the aforementioned framework and validate the theoretical results with experiments on MNIST and CIFAR-10. BN-SCAFFOLD equals the performance of FedTAN, without its high communication cost, outperforming Federated Averaging (FedAvg), SCAFFOLD, and other FL algorithms designed to mitigate BN heterogeneity.
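
For readers unfamiliar with the client-drift correction mentioned in the abstract, the sketch below shows a single local step of standard SCAFFOLD on model weights, where the local gradient is shifted by the difference between the global and local control variates. It covers only that standard step: how BN-SCAFFOLD extends the correction to BN statistics is not specified in the abstract and is not shown. The function name and the use of plain Python lists are assumptions.

```python
def scaffold_local_step(weights, grad, c_local, c_global, lr):
    """One SCAFFOLD local update: w <- w - lr * (g - c_local + c_global)."""
    return [w - lr * (g - ci + c)
            for w, g, ci, c in zip(weights, grad, c_local, c_global)]
```

When the local and global control variates agree, the step reduces to plain gradient descent; when they differ, the update is nudged back toward the direction the clients agree on.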

翻訳日:2024-11-02 23:18:36 公開日:2024-10-04

# ボルツマン密度からのニューラルサンプリング:ワッサーシュタイン幾何学におけるフィッシャー・ラオ曲線

Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry ( http://arxiv.org/abs/2410.03282v1 )

ライセンス: Link先を確認

Jannis Chemseddine, Christian Wald, Richard Duong, Gabriele Steidl,

(参考訳) 単純な密度$\rho_Z$から出発し、エネルギー$f_t$で与えられるボルツマン曲線を学習することにより、非正規化ボルツマン密度$\rho_D$からサンプリングするタスクを扱う。まず、フィッシャー・ラオ流がワッサーシュタイン幾何学において絶対連続となる条件を検討する。第二に、特定の補間$f_t$と、それに対応する密度/速度対$(\rho_t,v_t)$の学習を扱う。速度場$v_t$のパラメータ化しか必要としない線形補間は、「質量のテレポーテーション」の問題に悩まされることが数値的に観察されている。ワッサーシュタイン幾何学の道具を用いて、速度場の爆発を正確に測定できる解析的な例を与える。 $f_t$と$v_t$の両方をパラメータ化するMátéとFleuretに着想を得て、$f_t$のみをパラメータ化し、適切な$v_t$を固定する補間を提案する。これは、ランジュバン力学に関連するカルバック・ライブラー発散のワッサーシュタイン勾配流に対応する。本モデルが、上記のサンプリング課題をうまく解く良好な振る舞いの流れ場を与えることを、数値例によって示す。

We deal with the task of sampling from an unnormalized Boltzmann density $\rho_D$ by learning a Boltzmann curve given by energies $f_t$ starting in a simple density $\rho_Z$. First, we examine conditions under which Fisher-Rao flows are absolutely continuous in the Wasserstein geometry. Second, we address specific interpolations $f_t$ and the learning of the related density/velocity pairs $(\rho_t,v_t)$. It was numerically observed that the linear interpolation, which requires only a parametrization of the velocity field $v_t$, suffers from a "teleportation-of-mass" issue. Using tools from the Wasserstein geometry, we give an analytical example, where we can precisely measure the explosion of the velocity field. Inspired by Máté and Fleuret, who parametrize both $f_t$ and $v_t$, we propose an interpolation which parametrizes only $f_t$ and fixes an appropriate $v_t$. This corresponds to the Wasserstein gradient flow of the Kullback-Leibler divergence related to Langevin dynamics. We demonstrate by numerical examples that our model provides a well-behaved flow field which successfully solves the above sampling task.

翻訳日:2024-11-02 23:18:36 公開日:2024-10-04

# uniINF:パラメータフリーな重い裾(heavy-tailed)MABのためのBest-of-Both-Worldsアルゴリズム

uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs ( http://arxiv.org/abs/2410.03284v1 )

ライセンス: Link先を確認

Yu Chen, Jiatai Huang, Yan Dai, Longbo Huang,

(参考訳) 本稿では,HTMAB(Heavy-Tailed Multi-Armed Bandits)問題に対する新しいアルゴリズムuniINFを提案し、確率的環境と敵対的環境の両方における頑健性と適応性を示す。損失分布が時間によらず定常である確率的MAB設定とは異なり、本研究は、腕と時間の両方に依存する重い裾の分布から損失が生成される敵対的設定にまで拡張する。我々の新しいアルゴリズム「uniINF」は、いわゆるBest-of-Both-Worlds(BoBW)特性を持ち、正確な環境タイプを知らなくても、確率的環境と敵対的環境の両方で最適に動作する。さらに,本アルゴリズムはパラメータフリーという特徴も備えており,重い裾のパラメータ $(\sigma, \alpha)$ を事前に知る必要なく動作する。正確に言うと、uniINFは確率的環境と敵対的環境の両方でほぼ最適なリグレットを保証し、$(\sigma, \alpha)$が既知の場合の対応する下界と(対数因子を除いて)一致する。我々の知る限り、uniINFは重い裾のMAB問題に対してBoBW特性を達成する最初のパラメータフリーアルゴリズムである。技術的には、パラメータフリーHTMABのBoBW保証を実現するための革新的な技術を開発しており、これにはログバリアのダイナミクスの洗練された解析、自動バランス型の学習率スケジューリングスキーム、適応的なスキップ・クリッピング損失調整技術、および対数リグレットのための停止時間解析が含まれる。

In this paper, we present a novel algorithm, uniINF, for the Heavy-Tailed Multi-Armed Bandits (HTMAB) problem, demonstrating robustness and adaptability in both stochastic and adversarial environments. Unlike the stochastic MAB setting where loss distributions are stationary with time, our study extends to the adversarial setup, where losses are generated from heavy-tailed distributions that depend on both arms and time. Our novel algorithm `uniINF` enjoys the so-called Best-of-Both-Worlds (BoBW) property, performing optimally in both stochastic and adversarial environments without knowing the exact environment type. Moreover, our algorithm also possesses a Parameter-Free feature, i.e., it operates without the need of knowing the heavy-tail parameters $(\sigma, \alpha)$ a-priori. To be precise, uniINF ensures nearly-optimal regret in both stochastic and adversarial environments, matching the corresponding lower bounds when $(\sigma, \alpha)$ is known (up to logarithmic factors). To our knowledge, uniINF is the first parameter-free algorithm to achieve the BoBW property for the heavy-tailed MAB problem. Technically, we develop innovative techniques to achieve BoBW guarantees for Parameter-Free HTMABs, including a refined analysis for the dynamics of log-barrier, an auto-balancing learning rate scheduling scheme, an adaptive skipping-clipping loss tuning technique, and a stopping-time analysis for logarithmic regret.

翻訳日:2024-11-02 23:18:35 公開日:2024-10-04

# 計算外交 : 「善のためのハッカソン」がデジタル時代の多国間主義の参加型の未来にどう貢献するか

Computational Diplomacy: How "hackathons for good" feed a participatory future for multilateralism in the digital age ( http://arxiv.org/abs/2410.03286v1 )

ライセンス: Link先を確認

Thomas Maillart, Lucia Gomez, Ewa Lombard, Alexander Nolte, Francesco Pisano,

(参考訳) この記事では、グローバルなSDGの課題への対処に焦点を当てた、ソフトウェアおよびハードウェア開発者のコミュニティを構築する上での「善のためのハッカソン」の役割を探る。我々は、この動きを計算外交として理論化する。すなわち、主要なグローバル課題に取り組むために集合知を活用する、デジタルガバナンスのための分散的で参加型のプロセスである。 DevpostとGitHubのデータを分析すると、2010年以降のハッカソンの30%がSDGのトピックに取り組み、革新的なソリューションを生み出すために多様な技術を採用していることが分かる。ハッカソンは重要なカイロス(kairos)の瞬間として機能し、即時のプロジェクト成果と長期的な生産の両方を駆動するイノベーションのバーストを引き起こす。これらのイベントは、人間の協力と共感の神経生物学的基盤を活用し、集団としての目的意識を育み、対人的な偏見を減らすと我々は提案する。デジタルガバナンスに対するこのボトムアップアプローチは、ソフトウェア開発、人間の集合知、集団行動を統合し、変革のための動的なモデルを生み出す。カイロスの瞬間を活用することで、計算外交は、将来のデジタル多国間ガバナンスのための、より包括的で効果的なモデルを促進する。

This article explores the role of hackathons for good in building a community of software and hardware developers focused on addressing global SDG challenges. We theorise this movement as computational diplomacy: a decentralised, participatory process for digital governance that leverages collective intelligence to tackle major global issues. Analysing Devpost and GitHub data reveals that 30% of hackathons since 2010 have addressed SDG topics, employing diverse technologies to create innovative solutions. Hackathons serve as crucial kairos moments, sparking innovation bursts that drive both immediate project outcomes and long-term production. We propose that these events harness the neurobiological basis of human cooperation and empathy, fostering a collective sense of purpose and reducing interpersonal prejudice. This bottom-up approach to digital governance integrates software development, human collective intelligence, and collective action, creating a dynamic model for transformative change. By leveraging kairos moments, computational diplomacy promotes a more inclusive and effective model for digital multilateral governance of the future.

翻訳日:2024-11-02 23:18:35 公開日:2024-10-04

# セマンティックセグメンテーションに基づく病理組織ホールスライド画像の品質管理

Semantic Segmentation Based Quality Control of Histopathology Whole Slide Images ( http://arxiv.org/abs/2410.03289v1 )

ライセンス: Link先を確認

Abhijeet Patil, Garima Jain, Harsh Diwakar, Jay Sawant, Tripti Bameta, Swapnil Rane, Amit Sethi,

(参考訳) 我々は, さまざまなレベルのぼけ、組織領域, 組織の折れ, ペンマークなど, さまざまな領域をセグメンテーションする, 病理組織ホールスライド画像(WSI)の品質管理(QC)のためのソフトウェアパイプラインを開発した。 WSIの処理におけるGPUの必要性と利用可能性の高まりを踏まえ、提案したパイプラインは、精度と速度のバランスをとるために、複数の軽量ディープラーニングモデルで構成されている。パイプラインはTCGA全体で評価された。TCGAは、28の臓器から得られた11,000以上の病理組織画像を含む、公開されている最大のWSIデータセットである。ディープラーニングに基づかない先行研究と比較して、臓器を問わずセグメンテーション結果が一貫して改善した。組織およびぼけのセグメンテーションに対するアノテーションの労力を最小限に抑えるため, パッチ分類ツールHistoROIを用いてラベルを同定したさまざまなWSIのパッチ(サブイメージ)をモザイク状に組み合わせることで, 注釈付き画像を自動的に作成した。訓練済みのQCパイプラインの汎用性と、その広範なテストにより、この研究の潜在的な影響は広い。研究と臨床の両方の用途で、大規模な病理組織画像解析の精度と信頼性を高めるために、任意のWSIコホートの自動前処理に使用できる。訓練済みモデル、トレーニングスクリプト、トレーニングデータ、推論結果はhttps://github.com/abhijeetptl5/wsisegqcで公開しており、研究コミュニティがパイプラインをそのまま利用したり、将来、新しいデータセットや応用に合わせてさらにカスタマイズしたりできるようにしている。

We developed a software pipeline for quality control (QC) of histopathology whole slide images (WSIs) that segments various regions, such as blurs of different levels, tissue regions, tissue folds, and pen marks. Given the necessity and increasing availability of GPUs for processing WSIs, the proposed pipeline comprises multiple lightweight deep learning models to strike a balance between accuracy and speed. The pipeline was evaluated on all of TCGA, which is the largest publicly available WSI dataset containing more than 11,000 histopathological images from 28 organs. It was compared to a previous work, which was not based on deep learning, and it showed consistent improvement in segmentation results across organs. To minimize annotation effort for tissue and blur segmentation, annotated images were automatically prepared by mosaicking patches (sub-images) from various WSIs whose labels were identified using a patch classification tool HistoROI. Due to the generality of our trained QC pipeline and its extensive testing, the potential impact of this work is broad. It can be used for automated pre-processing of any WSI cohort to enhance the accuracy and reliability of large-scale histopathology image analysis for both research and clinical use. We have made the trained models, training scripts, training data, and inference results publicly available at https://github.com/abhijeetptl5/wsisegqc, which should enable the research community to use the pipeline right out of the box or further customize it to new datasets and applications in the future.

翻訳日:2024-11-02 23:18:35 公開日:2024-10-04

# グラウンドドビデオLLM:ビデオ大言語モデルにおける微細な時間的グラウンド化

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ( http://arxiv.org/abs/2410.03290v1 )

ライセンス: Link先を確認

Haibo Wang, Zhiyang Xu, Yu Cheng, Shizhe Diao, Yufan Zhou, Yixin Cao, Qifan Wang, Weifeng Ge, Lifu Huang,

(参考訳) ビデオ大言語モデル (Video-LLMs) は、粗粒度ビデオ理解において顕著な能力を示したが、細粒度の時間的接地に苦慮している。本稿では,特定の映像モーメントをきめ細かな方法で知覚・推論できる新しいビデオLLMであるGrounded-VideoLLMを紹介する。実時間モデルやタイムスタンプ表現が欠如しているため,現在のビデオ-LLMでは微細な映像理解に制限がある。そこで我々は,(1)フレーム間の関係を符号化するための時間的ストリームと(2)タイムスタンプを表現するための時間的知識に富んだ離散的時間的トークンを付加することにより,モデルを強化する。 Grounded-VideoLLMのトレーニングを最適化するために、簡単なビデオキャプションタスクから始まり、ビデオ時間的グラウンドニングタスクを段階的に導入し、複雑さを増す。 Grounded-VideoLLMの時間的推論能力をさらに強化するため、自動アノテーションパイプラインにより地上ビデオQAデータセットをキュレートする。広汎な実験により、Grounded-VideoLLMは、時間文の接地、高密度ビデオキャプション、グラウンドドビデオQAといったきめ細かい接地作業に優れるだけでなく、一般的なビデオ理解のための多目的ビデオアシスタントとして大きな可能性を示す。

Video Large Language Models (Video-LLMs) have demonstrated remarkable capabilities in coarse-grained video understanding; however, they struggle with fine-grained temporal grounding. In this paper, we introduce Grounded-VideoLLM, a novel Video-LLM adept at perceiving and reasoning over specific video moments in a fine-grained manner. We identify that current Video-LLMs have limitations for fine-grained video understanding since they lack effective temporal modeling and timestamp representation. In light of this, we sharpen our model by incorporating (1) an additional temporal stream to encode the relationships between frames and (2) discrete temporal tokens enriched with specific time knowledge to represent timestamps. To optimize the training of Grounded-VideoLLM, we employ a multi-stage training scheme, beginning with simple video-captioning tasks and progressively introducing video temporal grounding tasks of increasing complexity. To further enhance Grounded-VideoLLM's temporal reasoning capability, we also curate a grounded VideoQA dataset by an automatic annotation pipeline. Extensive experiments demonstrate that Grounded-VideoLLM not only excels in fine-grained grounding tasks such as temporal sentence grounding, dense video captioning, and grounded VideoQA, but also shows great potential as a versatile video assistant for general video understanding.
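
As a rough illustration of representing timestamps with discrete temporal tokens, the sketch below quantizes a continuous timestamp into one of a fixed number of bins relative to the video duration. The token format `<T_k>`, the default bin count, and both function names are assumptions made for illustration; the abstract does not specify the paper's actual tokenization.

```python
def timestamp_to_token(t_sec: float, duration_sec: float, num_bins: int = 100) -> str:
    """Map a timestamp to a discrete token (hypothetical <T_k> format)."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    # Clamp into [0, duration] so out-of-range inputs map to the first/last bin.
    t = min(max(t_sec, 0.0), duration_sec)
    # Relative position in [0, 1], scaled to a bin index in [0, num_bins - 1].
    k = min(int(t / duration_sec * num_bins), num_bins - 1)
    return f"<T_{k}>"


def token_to_timestamp(token: str, duration_sec: float, num_bins: int = 100) -> float:
    """Invert the mapping, returning the centre of the token's bin in seconds."""
    k = int(token[3:-1])  # strip the "<T_" prefix and the ">" suffix
    return (k + 0.5) / num_bins * duration_sec
```

Because the bins are relative to the video length, the same vocabulary of tokens covers videos of any duration, at the cost of a quantization error of at most half a bin.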

翻訳日:2024-11-02 23:18:35 公開日:2024-10-04

# 動的システムのコンテキスト内学習のための拡張トランスフォーマーアーキテクチャ

Enhanced Transformer architecture for in-context learning of dynamical systems ( http://arxiv.org/abs/2410.03291v1 )

ライセンス: Link先を確認

Matteo Rufolo, Dario Piga, Gabriele Maroni, Marco Forgione,

(参考訳) 著者らによって最近紹介されたインコンテキスト識別パラダイムは、システム全体の振る舞いを記述するメタモデルである合成データに基づいて、推定、オフライン、およびベースとなることを目的としている。訓練後、このメタモデルは実システムによって生成された観測された入出力シーケンス(コンテキスト)で入力され、その振る舞いをゼロショット学習方式で予測する。本稿では、確率的フレームワーク内で学習タスクを定式化すること、非連続的なコンテキストとクエリウィンドウを管理すること、長いコンテキストシーケンスを効果的に扱うために繰り返しパッチを適用すること、の3つの主要な革新を通じて、元のメタモデリングフレームワークを強化する。これらの修正の有効性は、Wiener-Hammersteinシステムクラスに焦点を当てた数値的な例を通して示され、モデルの性能と拡張性を強調している。

Recently introduced by some of the authors, the in-context identification paradigm aims at estimating, offline and based on synthetic data, a meta-model that describes the behavior of a whole class of systems. Once trained, this meta-model is fed with an observed input/output sequence (context) generated by a real system to predict its behavior in a zero-shot learning fashion. In this paper, we enhance the original meta-modeling framework through three key innovations: by formulating the learning task within a probabilistic framework; by managing non-contiguous context and query windows; and by adopting recurrent patching to effectively handle long context sequences. The efficacy of these modifications is demonstrated through a numerical example focusing on the Wiener-Hammerstein system class, highlighting the model's enhanced performance and scalability.
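
For readers unfamiliar with the Wiener-Hammerstein class mentioned above: such a system is a linear dynamic block, followed by a static nonlinearity, followed by a second linear dynamic block. The sketch below simulates one instance. The first-order filters, their coefficients, and the `tanh` nonlinearity are illustrative assumptions, not the systems used in the paper.

```python
import math
from typing import List


def first_order_filter(u: List[float], a: float, b: float) -> List[float]:
    """Discrete-time linear block: y[k] = a * y[k-1] + b * u[k], with y[-1] = 0."""
    y, prev = [], 0.0
    for u_k in u:
        prev = a * prev + b * u_k
        y.append(prev)
    return y


def wiener_hammerstein(u: List[float]) -> List[float]:
    """Linear block -> static nonlinearity -> linear block (hypothetical parameters)."""
    x = first_order_filter(u, a=0.8, b=0.2)      # first linear dynamic block
    w = [math.tanh(2.0 * x_k) for x_k in x]      # static nonlinearity
    return first_order_filter(w, a=0.5, b=0.5)   # second linear dynamic block
```

An input/output pair `(u, wiener_hammerstein(u))` generated this way is the kind of sequence that would serve as context for a meta-model in the in-context identification setting.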

翻訳日:2024-11-02 23:18:35 公開日:2024-10-04

# 深部選択状態空間モデルのトーケンダイナミクスのデミステレーション

Demystifying the Token Dynamics of Deep Selective State Space Models ( http://arxiv.org/abs/2410.03292v1 )

ライセンス: Link先を確認

Thieu N Vo, Tung D. Pham, Xin T. Tong, Tan Minh Nguyen,

(参考訳) Mamba のような選択状態空間モデル (SSM) は、シーケンシャルなデータモデリングの有効性で有名になった。その卓越した経験的性能にもかかわらず、深い選択性を持つSSMの包括的な理論的理解は、高い忠実性を必要とするアプリケーションに対するさらなる開発と採用を妨げるままである。本稿では,事前学習したマンバモデルにおけるトークンの動的特性について検討する。特に,マンバモデルの連続時間限界を規定する力学系を導出し,その解の漸近挙動を特徴づける。一次元の場合、以下の2つのシナリオのうち、すべてのトークンが 0 に収束するか、またはすべてのトークンが無限大に分岐するかのどちらかである。各シナリオがいつ発生するかを決定するために、モデルパラメータに基づいた基準を提供する。収束シナリオに対しては、このシナリオがモデルの性能に悪影響を及ぼすことを実証的に検証する。分岐シナリオでは、異なるトークンが異なるレートで無限大に分岐し、モデルトレーニング中の更新に不平等に寄与することを証明する。これらの調査に基づき,本モデルでは,収束シナリオを除外し,重要なスコアに基づいてトークンを並べ替える2つの改良点を提案する。実世界の応用において,Mambaの有効性を高めるための洞察を提供するとともに,これらの改良を検証した。

Selective state space models (SSM), such as Mamba, have gained prominence for their effectiveness in modeling sequential data. Despite their outstanding empirical performance, a comprehensive theoretical understanding of deep selective SSM remains elusive, hindering their further development and adoption for applications that need high fidelity. In this paper, we investigate the dynamical properties of tokens in a pre-trained Mamba model. In particular, we derive the dynamical system governing the continuous-time limit of the Mamba model and characterize the asymptotic behavior of its solutions. In the one-dimensional case, we prove that only one of the following two scenarios happens: either all tokens converge to zero, or all tokens diverge to infinity. We provide criteria based on model parameters to determine when each scenario occurs. For the convergent scenario, we empirically verify that this scenario negatively impacts the model's performance. For the divergent scenario, we prove that different tokens will diverge to infinity at different rates, thereby contributing unequally to the updates during model training. Based on these investigations, we propose two refinements for the model: excluding the convergent scenario and reordering tokens based on their importance scores, both aimed at improving practical performance. Our experimental results validate these refinements, offering insights into enhancing Mamba's effectiveness in real-world applications.

翻訳日:2024-11-02 23:18:35 公開日:2024-10-04

# 多言語テキスト分類におけるゼロショット自己説明と人間の理性の比較

Comparing zero-shot self-explanations with human rationales in multilingual text classification ( http://arxiv.org/abs/2410.03296v1 )

ライセンス: Link先を確認

Stephanie Brandl, Oliver Eberle,

(参考訳) インストラクションチューニングされたLLMは、勾配計算や複雑なXAIメソッドの適用を必要としない自己説明を生成することで、その出力をユーザに説明することができる。本稿では,この能力が,人間に対する妥当性,モデルに対する忠実性に関して,入力論理の形で自己説明を評価することによって,良好な説明をもたらすかどうかを解析する。そこで本研究では,感情分類と強制労働検出という2つのテキスト分類タスクを適用した。英語の他に、デンマーク語とイタリア語による感情分類タスクの翻訳も含み、全サンプルに対する自己説明と人間のアノテーションを比較する。直接比較を可能にするため,本パイプラインを4LLM(Llama2,Llama3,Mistral,Mixtral)に適用する。以上の結果から,自己説明はLRPよりも人間のアノテーションと密接に一致し,忠実度は同等であることがわかった。

Instruction-tuned LLMs are able to provide an explanation about their output to users by generating self-explanations that do not require gradient computations or the application of possibly complex XAI methods. In this paper, we analyse whether this ability results in a good explanation by evaluating self-explanations in the form of input rationales with respect to their plausibility to humans as well as their faithfulness to models. For this, we apply two text classification tasks: sentiment classification and forced labour detection. Next to English, we further include Danish and Italian translations of the sentiment classification task and compare self-explanations to human annotations for all samples. To allow for direct comparisons, we also compute post-hoc feature attribution, i.e., layer-wise relevance propagation (LRP) and apply this pipeline to 4 LLMs (Llama2, Llama3, Mistral and Mixtral). Our results show that self-explanations align more closely with human annotations compared to LRP, while maintaining a comparable level of faithfulness.

翻訳日:2024-11-02 23:08:51 公開日:2024-10-04

# マルコフ政権と非マルコフ政権における外的運転・散逸・集団的影響の相互作用

Interplay between external driving, dissipation and collective effects in the Markovian and non-Markovian regimes ( http://arxiv.org/abs/2410.03297v1 )

ライセンス: Link先を確認

Roie Dann,

(参考訳) 量子光学系の進化は、周囲環境との相互作用、外部制御レーザー、異なるシステムコンポーネント間の相互作用の3つの重要な要素によって決定される。 3つの動的寄与の間の相互作用を理解することは、非平衡現象や技術応用の研究に不可欠である。本研究では、ボゾン場に同時に結合した駆動光学系における開系現象について検討する。フォトニック結晶に結合したマイクロキャビティの線形系について、環境相互作用と外部制御が適用されたコヒーレントドライブに有意な非マルコフ補正を引き起こすことを解析的に示す。さらに、複数のモードが同じフィールドに結合され、あるモードに印加されたレーザーが他のモードを効果的に駆動するときに、集合的なクロスドライブ効果が生じる。線形解に基づいて、2レベルエミッターに対する非マルコフ的マスター方程式が導出される。注目すべきことに、提案された運動方程式は、ボソニックモードではエミッタを近似できない中程度の駆動強度でも正確である。非線型性の影響を解析し、正確な擬モード解に対してベンチマークし、マルコフ体制の確立されたマスター方程式と比較する。この状態の中では、短時間の非マルコフ効果が環境の帯域幅の逆数を超え、短いレーザーパルスによって引き起こされるメモリ効果を示す。これらの発見は、量子光学系の駆動されたオープンシステムのダイナミクス、固体材料に埋め込まれた不純物、分子システムなどに関する貴重な洞察を与え、量子状態の正確な制御の道を開いた。

The evolution of quantum optical systems is determined by three key factors: their interactions with the surrounding environment, externally controlled lasers, and interactions between the different system components. Understanding the interplay between the three dynamical contributions is essential for the study of out-of-equilibrium phenomena as well as technological applications. The present study investigates open system phenomena in driven optical systems coupled simultaneously to a bosonic field. For a linear system of micro-cavities coupled to a photonic crystal, it is analytically shown that environmental interaction and external control cause significant non-Markovian corrections to the applied coherent drive. Additionally, collective cross-driving effects arise when multiple modes are coupled to the same field, where a laser applied to one mode effectively drives other modes. Based on the linear solution, a non-Markovian master equation for two-level emitters is derived. Remarkably, the proposed equation of motion remains accurate even for moderate driving intensities, where emitters cannot be approximated by bosonic modes. The influence of the non-linearity is analyzed and benchmarked against an exact pseudo-mode solution, and compared with established master equations in the Markovian regime. Within this regime, the comparison demonstrates the presence of short-time non-Markovian effects at times well beyond the inverse of the environment's bandwidth, and memory effects induced by short laser pulses. These findings offer valuable insights into the driven open system dynamics of quantum optical systems, impurities embedded in solid-state materials, molecular systems, and more, paving the way for precise control of their quantum states.

翻訳日:2024-11-02 23:08:51 公開日:2024-10-04

# SELU: 未知の環境下での自己学習型体育館

SELU: Self-Learning Embodied MLLMs in Unknown Environments ( http://arxiv.org/abs/2410.03303v1 )

ライセンス: Link先を確認

Boyu Li, Haobin Jiang, Ziluo Ding, Xinrun Xu, Haoran Li, Dongbin Zhao, Zongqing Lu,

(参考訳) 近年,Multimodal Large Language Model (MLLM) は視覚的理解と意思決定能力を示し,未知の環境でMLLMを自律的に改善する探索を可能にしている。しかし、人間や環境フィードバックのような外部からのフィードバックは、必ずしも利用できない。この課題に対処するため,既存の手法は主に投票・採点機構によるMLLMの意思決定能力の向上に重点を置いているが,未知環境におけるMLLMの環境理解向上にはほとんど努力が払われていない。 MLLMの自己学習の可能性を完全に解き放つために,強化学習におけるアクター-批判的自己学習パラダイムに触発された,SELUと呼ばれる新しいアクター-批判的自己学習パラダイムを提案する。批評家は、アクターが収集したインタラクショントラジェクトリから知識を抽出するために、自己認識と後向きのレバーベリングを採用し、それによって環境理解を増強する。同時に、批評家が提供した自己フィードバックにより、俳優は改善され、意思決定が強化される。筆者らはAI2-THORおよびVirtualHome環境での手法の評価を行い、SELUは約28%と30%の批判的改善と、自己学習による約20%と24%のアクター的改善を実現している。

Recently, multimodal large language models (MLLMs) have demonstrated strong visual understanding and decision-making capabilities, enabling the exploration of autonomously improving MLLMs in unknown environments. However, external feedback like human or environmental feedback is not always available. To address this challenge, existing methods primarily focus on enhancing the decision-making capabilities of MLLMs through voting and scoring mechanisms, while little effort has been devoted to improving the environmental comprehension of MLLMs in unknown environments. To fully unleash the self-learning potential of MLLMs, we propose a novel actor-critic self-learning paradigm, dubbed SELU, inspired by the actor-critic paradigm in reinforcement learning. The critic employs self-asking and hindsight relabeling to extract knowledge from interaction trajectories collected by the actor, thereby augmenting its environmental comprehension. Simultaneously, the actor is improved by the self-feedback provided by the critic, enhancing its decision-making. We evaluate our method in the AI2-THOR and VirtualHome environments, and SELU achieves critic improvements of approximately 28% and 30%, and actor improvements of about 20% and 24% via self-learning.

翻訳日:2024-11-02 23:08:51 公開日:2024-10-04

# ユニタリケイリーグラフにおける隣接行列ハミルトンにより支配される量子分数復元

Quantum fractional revival governed by adjacency matrix Hamiltonian in unitary Cayley graphs ( http://arxiv.org/abs/2410.03310v1 )

ライセンス: Link先を確認

Rachana Soni, Neelam Choudhary, Navneet Pratap Singh,

(参考訳) 本稿では、隣接行列ハミルトンを用いたユニタリケイリーグラフにおける量子分数復元の存在を特徴づける。ユニタリケイリーグラフ $X=(Z_n, S)$ は接続集合 $S \subseteq Z_n$ として特別なグラフである。ユニタリケイリーグラフは積分グラフであり、その隣接行列は循環グラフである。ユニタリケイリーグラフにおける量子分数復活は、頂点の数が偶数である場合にのみ存在することを証明する。量子分数復元を許容するユニタリケイリーグラフに対して、数論的およびスペクトル的特徴付けが与えられる。量子分数復元は量子エンタングルメントに類似している。これは量子情報の伝達に有用な量子ビット状態伝達現象の1つである。

In this article, we characterize the existence of quantum fractional revival in unitary Cayley graphs, using the adjacency matrix as the Hamiltonian. The unitary Cayley graph $X=(Z_n, S)$ is a special graph whose connection set $S \subseteq Z_n$ is the set of elements coprime to $n$. A unitary Cayley graph is an integral graph, and its adjacency matrix is circulant. We prove that quantum fractional revival in unitary Cayley graphs exists only when the number of vertices is even. Number-theoretic and spectral characterizations are given for unitary Cayley graphs admitting quantum fractional revival. Quantum fractional revival is analogous to quantum entanglement. It is one of the qubit state transfer phenomena useful in the communication of quantum information.
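
The structural facts stated in this abstract (circulant adjacency matrix, integer spectrum) can be checked numerically for small $n$. The sketch below is only an illustration with hypothetical function names, assuming $n \ge 2$. It builds the adjacency matrix from the connection set $S = \{k \in Z_n : \gcd(k, n) = 1\}$ and evaluates the eigenvalues with the standard formula for symmetric circulant matrices, $\lambda_j = \sum_{s \in S} \cos(2\pi j s / n)$.

```python
import math
from typing import List


def unitary_cayley_adjacency(n: int) -> List[List[int]]:
    """Adjacency matrix of X = (Z_n, S) with S = {k in Z_n : gcd(k, n) = 1}."""
    return [[1 if math.gcd((j - i) % n, n) == 1 else 0 for j in range(n)]
            for i in range(n)]


def is_circulant(a: List[List[int]]) -> bool:
    """True if every row is the first row cyclically shifted by its row index."""
    n = len(a)
    return all(a[i][j] == a[0][(j - i) % n] for i in range(n) for j in range(n))


def spectrum(n: int) -> List[float]:
    """Eigenvalues lambda_j = sum over s in S of cos(2*pi*j*s/n), for j = 0..n-1."""
    s_set = [k for k in range(1, n) if math.gcd(k, n) == 1]
    return [sum(math.cos(2 * math.pi * j * s / n) for s in s_set) for j in range(n)]
```

For $n = 6$ the connection set is $\{1, 5\}$, so the graph is the 6-cycle, and every value returned by `spectrum(6)` is an integer up to floating-point rounding.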

翻訳日:2024-11-02 23:08:51 公開日:2024-10-04

# Quo Vadis, Motion Generation? 大規模言語モデルから大規模運動モデルへ

Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models ( http://arxiv.org/abs/2410.03311v1 )

ライセンス: Link先を確認

Ye Wang, Sipeng Zheng, Bin Cao, Qianshan Wei, Qin Jin, Zongqing Lu,

(参考訳) 近年のLLMの成功に触発されて、人間の動き理解の分野は、大きな動きモデルの開発へと移りつつある。いくつかの進歩にもかかわらず、現在の最先端の作業は、大規模で高品質なモーションデータがないために、真のジェネラリストモデルを達成するには程遠いままである。これを解決するために、最初の100万レベルのモーション生成ベンチマークであるMotionBaseを紹介し、前回の最大データセットの15倍のデータ量を提供し、階層的な詳細なテキスト記述を備えたマルチモーダルデータを特徴付ける。この膨大なデータセットを活用することで、我々の大きな動きモデルは、目に見えないものを含む幅広い動きの強いパフォーマンスを示す。組織的な調査を通じて、我々は、データ取得コストの軽減に重要な役割を果たす合成データと擬似ラベルを用いて、データサイズとモデルサイズの両方をスケールすることの重要性を強調した。さらに,本研究では,既存の評価指標,特にドメイン外のテキスト命令を扱う際の限界を明らかにする。さらに,動作情報を保存し,コードブックの容量を拡大し,大規模動きモデルの表現能力を向上する,動きトークン化のための新しい2次元ルックアップフリーアプローチを提案する。 MotionBaseのリリースとこの研究から得られた知見は、より強力で汎用的なモーション生成モデルを開発するための道を開くことが期待されている。

Inspired by the recent success of LLMs, the field of human motion understanding has increasingly shifted towards the development of large motion models. Despite some progress, current state-of-the-art works remain far from achieving truly generalist models, largely due to the lack of large-scale, high-quality motion data. To address this, we present MotionBase, the first million-level motion generation benchmark, offering 15 times the data volume of the previous largest dataset, and featuring multimodal data with hierarchically detailed text descriptions. By leveraging this vast dataset, our large motion model demonstrates strong performance across a broad range of motions, including unseen ones. Through systematic investigation, we underscore the importance of scaling both data and model size, with synthetic data and pseudo labels playing a crucial role in mitigating data acquisition costs. Moreover, our research reveals the limitations of existing evaluation metrics, particularly in handling out-of-domain text instructions -- an issue that has long been overlooked. In addition to these, we introduce a novel 2D lookup-free approach for motion tokenization, which preserves motion information and expands codebook capacity, further enhancing the representative ability of large motion models. The release of MotionBase and the insights gained from this study are expected to pave the way for the development of more powerful and versatile motion generation models.

翻訳日:2024-11-02 23:08:51 公開日:2024-10-04

# 大規模言語モデルを用いたASR後感情認識における文脈とシステム融合

Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models ( http://arxiv.org/abs/2410.03312v1 )

ライセンス: Link先を確認

Pavel Stepachev, Pinzhen Chen, Barry Haddow,

(参考訳) 大規模言語モデル(LLM)は、音声とテキストのモデリングにおいて重要な役割を担っている。我々は、文脈と複数のシステムのアウトプットをASR後の音声感情予測に最適に活用するために、GenSEC という最近のタスクに基づいて LLM のプロンプトについて検討する。我々の技術には、ASR transcript ranking, variable conversation context, and system output fusionがある。会話の文脈はリターンを減少させており、予測のための書き起こしを選択するための指標が不可欠であることを示す。最後に、提案するベースラインを絶対精度で20%超えます。

Large language models (LLMs) have started to play a vital role in modelling speech and text. To explore the best use of context and multiple systems' outputs for post-ASR speech emotion prediction, we study LLM prompting on a recent task named GenSEC. Our techniques include ASR transcript ranking, variable conversation context, and system output fusion. We show that the conversation context has diminishing returns and the metric used to select the transcript for prediction is crucial. Finally, our best submission surpasses the provided baseline by 20% in absolute accuracy.

翻訳日:2024-11-02 23:08:51 公開日:2024-10-04

# インパクト指向の個人化フェデレーション学習

Influence-oriented Personalized Federated Learning ( http://arxiv.org/abs/2410.03315v1 )

ライセンス: Link先を確認

Yue Tan, Guodong Long, Jing Jiang, Chengqi Zhang,

(参考訳) 伝統的な連合学習(FL)法は、しばしばパラメータアグリゲーションの固定重み付けに依存し、他者による相互影響を無視している。したがって、不均一なデータコンテキストにおけるそれらの有効性は限られている。この問題に対処するために,各クライアントに対して適応パラメータアグリゲーションを実現するために,クライアントレベルとクラスレベルの影響を定量的に測定する,影響指向のフェデレーション学習フレームワークであるFedC^2Iを提案する。我々の中核となる考え方は、十分に構成された影響ベクトルと影響行列を用いて、FLシステム内のクライアント間影響を明示的にモデル化することである。インフルエンスベクトルは、クライアントレベルの影響を定量化し、クライアントが他者からの知識を選択的に取得し、特徴表現層の集約をガイドする。一方、影響行列は、パーソナライズされた分類器アグリゲーションを達成するために、よりきめ細かな方法でクラスレベルの影響をキャプチャする。非IID環境下での既存のフェデレート学習手法に対するFedC^2Iの性能評価を行い,本手法の優位性を実証した。

Traditional federated learning (FL) methods often rely on fixed weighting for parameter aggregation, neglecting the mutual influence among clients. Hence, their effectiveness in heterogeneous data contexts is limited. To address this problem, we propose an influence-oriented federated learning framework, namely FedC^2I, which quantitatively measures Client-level and Class-level Influence to realize adaptive parameter aggregation for each client. Our core idea is to explicitly model the inter-client influence within an FL system via the well-crafted influence vector and influence matrix. The influence vector quantifies client-level influence, enables clients to selectively acquire knowledge from others, and guides the aggregation of feature representation layers. Meanwhile, the influence matrix captures class-level influence in a more fine-grained manner to achieve personalized classifier aggregation. We evaluate the performance of FedC^2I against existing federated learning methods under non-IID settings and the results demonstrate the superiority of our method.
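
To make the contrast with fixed-weight aggregation concrete, the sketch below shows a generic per-client weighted average in which each target client supplies its own weight vector over all clients. This is not FedC^2I's actual update rule, which the abstract does not spell out. The function name and the `influence` argument are hypothetical, and how the weights would be estimated is left open.

```python
from typing import List


def personalized_aggregate(client_params: List[List[float]],
                           influence: List[float]) -> List[float]:
    """Weighted average of client parameter vectors for one target client.

    `influence[j]` is a hypothetical non-negative weight describing how much
    client j should contribute; weights are normalized to sum to 1.
    """
    total = sum(influence)
    if total <= 0:
        raise ValueError("influence weights must have a positive sum")
    weights = [w / total for w in influence]
    dim = len(client_params[0])
    return [sum(w * p[d] for w, p in zip(weights, client_params)) for d in range(dim)]
```

With equal weights this reduces to plain averaging. A client that assigns zero weight to another client ignores that client's parameters entirely.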

翻訳日:2024-11-02 22:58:38 公開日:2024-10-04

# Visual-O1:マルチモーダル・マルチターン・チェーン・オブ・シンセサイティングによる曖昧な指示を理解する

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning ( http://arxiv.org/abs/2410.03321v1 )

ライセンス: Link先を確認

Minheng Ni, Yutao Fan, Lei Zhang, Wangmeng Zuo,

(参考訳) 大規模モデルが進化するにつれて、言語命令はマルチモーダルタスクでますます活用される。人間の言語の習慣のため、これらの命令はしばしば現実のシナリオにおける曖昧さを含み、正確な解釈のために視覚的文脈や常識の統合を必要とする。しかし、高度にインテリジェントな大規模モデルでさえ、曖昧な命令に対して顕著な性能制限を示し、曖昧さの弱い推論能力は破滅的な誤りを引き起こす可能性がある。本稿では,マルチモーダルなマルチターン・チェーン・オブ・シークレット推論フレームワークであるVisual-O1を提案する。人間のマルチモーダルなマルチターン推論をシミュレートし、高度にインテリジェントなモデルに対する瞬間的な経験や、不明瞭な指示を理解するための一般的なインテリジェントなモデルに対する経験を提供する。長いテキストを理解したり、長い複雑な推論を行うために高知能なモデルを必要とする従来の手法とは異なり、我々のフレームワークは計算オーバーヘッドを著しく増加させておらず、一般的にはインテリジェントなモデルであってもより汎用的で効果的である。実験により,本手法は,曖昧な命令に対して異なるインテリジェンスレベルのモデルの性能を著しく向上するだけでなく,汎用データセット上での性能も向上することが示された。私たちの研究は、不確実性と曖昧さのある現実のシナリオにおいて、人工知能が人間のように機能する可能性を強調します。データとコードを公開します。

As large-scale models evolve, language instructions are increasingly utilized in multi-modal tasks. Due to human language habits, these instructions often contain ambiguities in real-world scenarios, necessitating the integration of visual context or common sense for accurate interpretation. However, even highly intelligent large models exhibit significant performance limitations on ambiguous instructions, where weak reasoning abilities of disambiguation can lead to catastrophic errors. To address this issue, this paper proposes Visual-O1, a multi-modal multi-turn chain-of-thought reasoning framework. It simulates human multi-modal multi-turn reasoning, providing instantial experience for highly intelligent models or empirical experience for generally intelligent models to understand ambiguous instructions. Unlike traditional methods that require models to possess high intelligence to understand long texts or perform lengthy complex reasoning, our framework does not significantly increase computational overhead and is more general and effective, even for generally intelligent models. Experiments show that our method not only significantly enhances the performance of models of different intelligence levels on ambiguous instructions but also improves their performance on general datasets. Our work highlights the potential of artificial intelligence to work like humans in real-world scenarios with uncertainty and ambiguity. We will release our data and code.

翻訳日:2024-11-02 22:58:37 公開日:2024-10-04

# SpatioTemporal Informationは2つのビデオ要約ベンチマークに役立つか?

Does SpatioTemporal information benefit Two video summarization benchmarks? ( http://arxiv.org/abs/2410.03323v1 )

ライセンス: Link先を確認

Aashutosh Ganesh, Mirela Popa, Daan Odijk, Nava Tintarev,

(参考訳) ビデオの要約における重要な側面は、ビデオの各部分の背後にある時間的文脈を理解して、何が重要で何が重要でないかを理解することである。近年、ビデオ要約モデルは、この情報を表現するために時空間関係をモデル化している。これらのモデルは重要なベンチマークデータセットに対して最先端の相関スコアを得た。しかし、レビューされていないのは、時空間関係が最先端の結果を得るために必要であるかどうかである。これまでのアクティビティ認識の研究は、シーンやオブジェクトのような静的なキューを、モーション情報よりも優先することで、バイアスを見つけてきた。本稿では,類似の関係が映像要約の課題に影響を及ぼすかどうかを考察する。そのために、既存のベンチマークデータセットで時間情報が果たす役割を分析します。まず、時間的に不変なモデルでベースラインを推定し、そのようなモデルがベンチマークデータセット(TVSumとSumMe)上でどれだけうまくランクされているかを確認する。次に、ビデオの時間的順序を乱して、既存の最先端モデルに与える影響を調査します。我々の研究結果の1つは、TVSumデータセット上の人間のベースラインに近い競合相関スコアを時間的不変モデルが達成することである。また,既存モデルは時間的摂動の影響を受けないことを示す。さらに、一定の時間セグメントをシャッフルする破壊戦略により、相関スコアを実際に改善することができる。これらの結果から,時空間的関係が微妙な役割を果たしていることが判明し,これらのベンチマークが映像要約のタスクを適切にモデル化するかどうかという疑問が提起された。 https://github.com/AashGan/TemporalPerturbSum

An important aspect of summarizing videos is understanding the temporal context behind each part of the video to grasp what is and is not important. Video summarization models have in recent years modeled spatio-temporal relationships to represent this information. These models achieved state-of-the-art correlation scores on important benchmark datasets. However, what has not been reviewed is whether spatio-temporal relationships are even required to achieve state-of-the-art results. Previous work in activity recognition has found biases that prioritize static cues, such as scenes or objects, over motion information. In this paper we inquire if similar spurious relationships might influence the task of video summarization. To do so, we analyse the role that temporal information plays on existing benchmark datasets. We first estimate a baseline with temporally invariant models to see how well such models rank on benchmark datasets (TVSum and SumMe). We then disrupt the temporal order of the videos to investigate the impact it has on existing state-of-the-art models. One of our findings is that the temporally invariant models achieve competitive correlation scores that are close to the human baselines on the TVSum dataset. We also demonstrate that existing models are not affected by temporal perturbations. Furthermore, with certain disruption strategies that shuffle fixed time segments, we can actually improve their correlation scores. With these results, we find that spatio-temporal relationships play a minor role and we raise the question whether these benchmarks adequately model the task of video summarization. Code available at: https://github.com/AashGan/TemporalPerturbSum
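
One of the perturbations described above, shuffling fixed time segments, can be sketched as follows. This is an assumed minimal version for illustration, not the code from the linked repository. The function name, the segment length, and the seeding scheme are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_segments(frames: Sequence[T], segment_len: int, seed: int = 0) -> List[T]:
    """Shuffle the order of fixed-length segments, keeping frame order inside each one."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    # Cut the sequence into consecutive chunks; the last chunk may be shorter.
    segments = [list(frames[i:i + segment_len])
                for i in range(0, len(frames), segment_len)]
    # A dedicated Random instance makes the perturbation reproducible.
    random.Random(seed).shuffle(segments)
    return [frame for segment in segments for frame in segment]
```

Applying the same permutation to the per-frame importance scores keeps frames and labels aligned, so a model's correlation score on the perturbed video can be compared with its score on the original.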

翻訳日:2024-11-02 22:58:37 公開日:2024-10-04

# デコヒーレンスフリー部分空間を用いたフォトニック絡み合い状態の決定論的生成

Deterministic generation of photonic entangled states using decoherence-free subspaces ( http://arxiv.org/abs/2410.03325v1 )

ライセンス: Link先を確認

Oriol Rubies-Bigorda, Stuart J. Masson, Susanne F. Yelin, Ana Asenjo-Garcia,

(参考訳) 本稿では、量子情報技術の基本となる光の量子状態の決定論的生成のための資源として、物質の集合状態を用いることを提案する。我々の最小限のモデルでは、半導波路に結合された3つのエミッタ、すなわちミラーで終了する1次元導波路から構成される。発光体間の光子による相互作用は、明るい状態と暗い状態の出現をもたらす。ダークステートは、消散から保護された非コヒーレンスな部分空間を形成する。エミッターの局所的な駆動と共振周波数の制御により、デコヒーレンス自由部分空間内で任意の量子ゲートを実行することができる。明るい状態への結合は光子放出を促進するため、光と物質の間の量子ゲートの実現を可能にする。これらのゲートのシーケンシャルな応用は、グリーンベルガー=ホルン=ザイリンガーや1次元および2次元のクラスター状態のようなフォトニックな絡み合った状態を生み出すことを実証する。

We propose the use of collective states of matter as a resource for the deterministic generation of quantum states of light, which are fundamental for quantum information technologies. Our minimal model consists of three emitters coupled to a half-waveguide, i.e., a one-dimensional waveguide terminated by a mirror. Photon-mediated interactions between the emitters result in the emergence of bright and dark states. The dark states form a decoherence-free subspace, protected from dissipation. Local driving of the emitters and control of their resonance frequencies allow arbitrary quantum gates to be performed within the decoherence-free subspace. Coupling to bright states facilitates photon emission, thereby enabling the realization of quantum gates between light and matter. We demonstrate that sequential application of these gates leads to the generation of photonic entangled states, such as Greenberger-Horne-Zeilinger and one- and two-dimensional cluster states.

翻訳日:2024-11-02 22:58:37 公開日:2024-10-04

# EmojiHeroVR:頭部ディスプレイの部分的閉塞下での表情認識に関する研究

EmojiHeroVR: A Study on Facial Expression Recognition under Partial Occlusion from Head-Mounted Displays ( http://arxiv.org/abs/2410.03331v1 )

ライセンス: Link先を確認

Thorben Ortmann, Qi Wang, Larissa Putzar,

(参考訳) 感情認識は、感情フィードバックを提供し、高度なパーソナライゼーションを可能にすることで、バーチャルリアリティ(VR)体験の評価と向上を促進する。しかし、HMD(Head-Mounted Displays)は顔の上半分を遮蔽するので、表情はユーザーの感情を認識するために使われることは滅多にない。この問題に対処するため,新しいVRゲームEmojiHeroVRをプレイした37人の参加者を対象に調査を行った。収集されたデータベースであるEmoHeVRDB(EmojiHeroVR Database)には、3,556のラベル付き顔画像と1,778の再現された感情が含まれている。ラベル付き画像ごとに、ラベル付き画像の前後に直接記録された29のフレームも提供し、動的顔表情認識(FER)を容易にする。さらに、EmoHeVRDBには、フレーム毎にMeta Quest Pro VRヘッドセットを介してキャプチャされた63の表情のアクティベートに関するデータが含まれている。データベースを利用して,静的FER分類タスクのベースライン評価を行い,6つの基本的な感情と,EfficientNet-B0アーキテクチャを用いた中立性について検討した。最良のモデルでは、テストセットで69.84%の精度を達成し、HMD閉塞下のFERは従来のFERよりもはるかに困難であることを示した。

Emotion recognition promotes the evaluation and enhancement of Virtual Reality (VR) experiences by providing emotional feedback and enabling advanced personalization. However, facial expressions are rarely used to recognize users' emotions, as Head-Mounted Displays (HMDs) occlude the upper half of the face. To address this issue, we conducted a study with 37 participants who played our novel affective VR game EmojiHeroVR. The collected database, EmoHeVRDB (EmojiHeroVR Database), includes 3,556 labeled facial images of 1,778 reenacted emotions. For each labeled image, we also provide 29 additional frames recorded directly before and after the labeled image to facilitate dynamic Facial Expression Recognition (FER). Additionally, EmoHeVRDB includes data on the activations of 63 facial expressions captured via the Meta Quest Pro VR headset for each frame. Leveraging our database, we conducted a baseline evaluation on the static FER classification task with six basic emotions and neutral using the EfficientNet-B0 architecture. The best model achieved an accuracy of 69.84% on the test set, indicating that FER under HMD occlusion is feasible but significantly more challenging than conventional FER.

翻訳日:2024-11-02 22:58:37 公開日:2024-10-04

# 乳がん分類における先行CNNアーキテクチャの比較解析とエンサンブルエンハンスメント

Comparative Analysis and Ensemble Enhancement of Leading CNN Architectures for Breast Cancer Classification ( http://arxiv.org/abs/2410.03333v1 )

ライセンス: Link先を確認

Gary Murphy, Raghubir Singh,

(参考訳) 本研究は,病理組織像を用いた乳癌分類への新規かつ正確なアプローチを提案する。さまざまな画像データセットにまたがる主要な畳み込みニューラルネットワーク(CNN)モデルを体系的に比較し、最適なハイパーパラメータを特定し、分類の有効性に基づいてランク付けする。探索する各モデルの分類精度を最大化するために、データ強化、代替の完全接続層、モデルトレーニングハイパーパラメータ設定、および事前トレーニングされた重みの使用に対するモデルの再トレーニングの利点がある。私たちの方法論には、トレーニング実行中に一貫性のあるデータ条件を確保するために生成されたデータセットのシリアライズや、トレーニング期間の大幅な短縮など、いくつかのオリジナルの概念が含まれている。結果の自動キュレーションと組み合わせることで、2,000以上のトレーニング順列の探索が可能になった。本研究は,独立系CNNモデルにおいて,例外的分類精度を達成し,モデルの有効性でランク付けするために必要な設定を確立した。これらの結果に基づき、3つの高性能スタンドアロンCNNモデルと多様な分類器を積み重ねたアンサンブルアーキテクチャを提案し、その結果、分類精度が向上した。非常に多くのモデル置換を体系的に実行して最高の結果を得る能力は、BreakHis x40とBreakHis x200の99.75%、Bachデータセットをトレーニング、検証、テストデータセットに分割する95.18%など、非常に高品質な結果をもたらす。 Bach Onlineのブラインドチャレンジでは89%がこのアプローチを使用していた。本研究は乳癌の病理組織像データセットに基づいているが,他の医用画像データセットにも等しく適用可能である。

This study introduces a novel and accurate approach to breast cancer classification using histopathology images. It systematically compares leading Convolutional Neural Network (CNN) models across varying image datasets, identifies their optimal hyperparameters, and ranks them based on classification efficacy. To maximize classification accuracy for each model, we explore the effects of data augmentation, alternative fully-connected layers, and model training hyperparameter settings, as well as the advantages of retraining models versus using pre-trained weights. Our methodology includes several original concepts, including serializing generated datasets to ensure consistent data conditions across training runs and significantly reducing training duration. Combined with automated curation of results, this enabled the exploration of over 2,000 training permutations -- such a comprehensive comparison is as yet unprecedented. Our findings establish the settings required to achieve exceptional classification accuracy for standalone CNN models and rank them by model efficacy. Based on these results, we propose ensemble architectures that stack three high-performing standalone CNN models together with diverse classifiers, resulting in improved classification accuracy. The ability to systematically run so many model permutations to get the best outcomes gives rise to very high quality results, including 99.75% for BreakHis x40 and BreakHis x200 and 95.18% for the Bach datasets when split into train, validation and test datasets. The Bach Online blind challenge yielded 89% using this approach. Whilst this study is based on breast cancer histopathology image datasets, the methodology is equally applicable to other medical image datasets.

翻訳日:2024-11-02 22:58:37 公開日:2024-10-04

# X線は15の価値がある: 解釈可能な放射線学レポート作成のためのスパースオートエンコーダ

An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation ( http://arxiv.org/abs/2410.03334v1 )

ライセンス: Link先を確認

Ahmed Abdulaal, Hugo Fry, Nina Montaña-Brown, Ayodeji Ijishakin, Jack Gao, Stephanie Hyland, Daniel C. Alexander, Daniel C. Castro,

(参考訳) 放射線サービスは前例のない需要を経験しており、放射線学レポート生成の自動化への関心が高まっている。既存のビジョンランゲージモデル(VLM)は幻覚に悩まされ、解釈性に欠け、高価な微調整を必要とする。我々は,SAE-Radを導入し,スパースオートエンコーダ(SAE)を用いて,事前学習された視覚変換器から人間の解釈可能な特徴へ潜在表現を分解する。我々のハイブリッドアーキテクチャは、最先端のSAEの進歩を組み合わせ、空間性を維持しつつ正確な遅延再構築を実現します。既成の言語モデルを用いて,SAEの各特徴について,地中真実のレポートをラジオロジカルな記述に分解し,各画像の完全なレポートにコンパイルすることで,このタスクのために大規模なモデルを微調整する必要がなくなる。我々の知る限り、SAE-Radは下流のマルチモーダル推論タスクに機械的解釈可能性手法を明示的に用いた最初の例である。 MIMIC-CXRデータセットでは、SAE-Radは、最先端のモデルと比較して競合する放射線学固有のメトリクスを達成し、トレーニングのために計算資源を著しく少なくしている。質的な分析により、SAE-Radは意味のある視覚概念を学び、専門家の解釈と密接に一致したレポートを生成することが明らかになった。以上の結果から,SAEは医療におけるマルチモーダル推論を強化し,既存のVLMの代替となる可能性が示唆された。

Radiological services are experiencing unprecedented demand, leading to increased interest in automating radiology report generation. Existing Vision-Language Models (VLMs) suffer from hallucinations, lack interpretability, and require expensive fine-tuning. We introduce SAE-Rad, which uses sparse autoencoders (SAEs) to decompose latent representations from a pre-trained vision transformer into human-interpretable features. Our hybrid architecture combines state-of-the-art SAE advancements, achieving accurate latent reconstructions while maintaining sparsity. Using an off-the-shelf language model, we distil ground-truth reports into radiological descriptions for each SAE feature, which we then compile into a full report for each image, eliminating the need for fine-tuning large models for this task. To the best of our knowledge, SAE-Rad represents the first instance of using mechanistic interpretability techniques explicitly for a downstream multi-modal reasoning task. On the MIMIC-CXR dataset, SAE-Rad achieves competitive radiology-specific metrics compared to state-of-the-art models while using significantly fewer computational resources for training. Qualitative analysis reveals that SAE-Rad learns meaningful visual concepts and generates reports aligning closely with expert interpretations. Our results suggest that SAEs can enhance multimodal reasoning in healthcare, providing a more interpretable alternative to existing VLMs.
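
For orientation, the sketch below shows the forward pass and loss of a plain sparse autoencoder: a ReLU encoder, a linear decoder, and an L1 penalty on the feature activations. It is a generic textbook form written with hypothetical names. The hybrid architecture used by SAE-Rad is not described in the abstract and is not reproduced here.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for a row-major list-of-lists matrix."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def sae_forward(x: Vector, w_enc: Matrix, b_enc: Vector,
                w_dec: Matrix, b_dec: Vector, l1_coeff: float = 1e-3
                ) -> Tuple[Vector, Vector, float]:
    """Return (features, reconstruction, loss) for one input vector."""
    # Encoder: ReLU keeps feature activations non-negative.
    features = [max(0.0, z + b) for z, b in zip(matvec(w_enc, x), b_enc)]
    # Decoder: linear map from features back to the input space.
    x_hat = [z + b for z, b in zip(matvec(w_dec, features), b_dec)]
    # Loss: squared reconstruction error plus an L1 sparsity penalty.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    loss = recon + l1_coeff * sum(features)  # features >= 0, so the sum is the L1 norm
    return features, x_hat, loss
```

The L1 term pushes most feature activations to zero during training, so each input is explained by a small number of active features; those few features are what can then be given human-readable descriptions.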

翻訳日:2024-11-02 22:58:37 公開日:2024-10-04

# Audio-Agent: オーディオ生成、編集、合成にLLMを活用する

Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition ( http://arxiv.org/abs/2410.03335v1 )

ライセンス: Link先を確認

Zixuan Wang, Yu-Wing Tai, Chi-Keung Tang,

(参考訳) 本稿では,テキストやビデオの入力に基づく音声生成,編集,合成のためのマルチモーダルフレームワークであるAudio-Agentを紹介する。従来のTTA(text-to-audio)タスクのアプローチは、テキスト記述からシングルパス推論を行うことが多い。この設計は単純ではあるが、複雑なテキスト条件が与えられた場合、高品質なオーディオを生成するのに苦労する。本手法では,事前学習したTTA拡散ネットワークを音声生成エージェントとして利用し,GPT-4と連携させる。GPT-4はテキスト条件をアトミックで具体的な命令に分解し,音声生成のためにエージェントを呼び出す。その結果、Audio-Agentは、提供されたテキストやビデオと密に一致した高品質なオーディオを生成し、可変長生成もサポートする。 VTA(Video-to-audio)タスクでは、既存のほとんどの手法では、ビデオイベントと生成されたオーディオを同期させるタイムスタンプ検出器をトレーニングする必要があり、これは面倒で時間のかかる作業になりうる。本稿では,事前学習したLarge Language Model(LLM),例えばGemma2-2B-itを微調整して,ビデオとオーディオのモダリティをブリッジする意味的条件と時間的条件の両方を得る,というよりシンプルなアプローチを提案する。したがって、我々のフレームワークは、トレーニングにおいてかなりの計算オーバーヘッドを伴わずに、TTAタスクとVTAタスクの両方に包括的なソリューションを提供する。

We introduce Audio-Agent, a multimodal framework for audio generation, editing and composition based on text or video inputs. Conventional approaches for text-to-audio (TTA) tasks often make single-pass inferences from text descriptions. While straightforward, this design struggles to produce high-quality audio when given complex text conditions. In our method, we utilize a pre-trained TTA diffusion network as the audio generation agent to work in tandem with GPT-4, which decomposes the text condition into atomic, specific instructions, and calls the agent for audio generation. Consequently, Audio-Agent generates high-quality audio that is closely aligned with the provided text or video while also supporting variable-length generation. For video-to-audio (VTA) tasks, most existing methods require training a timestamp detector to synchronize video events with generated audio, a process that can be tedious and time-consuming. We propose a simpler approach by fine-tuning a pre-trained Large Language Model (LLM), e.g., Gemma2-2B-it, to obtain both semantic and temporal conditions to bridge video and audio modality. Thus our framework provides a comprehensive solution for both TTA and VTA tasks without substantial computational overhead in training.

翻訳日:2024-11-02 22:58:37 公開日:2024-10-04

# 自然論理と大規模言語モデルによるゼロショットファクト検証

Zero-Shot Fact Verification via Natural Logic and Large Language Models ( http://arxiv.org/abs/2410.03341v1 )

ライセンス: Link先を確認

Marek Strong, Rami Aly, Andreas Vlachos,

(参考訳) 近年の自然論理による事実検証システムの発達は、主張を集合論演算子を通して証拠と整合させ、忠実な正当化を提供することによって説明可能性を高めている。これらの進歩にもかかわらず、このようなシステムは自然論理に注釈付けされた大量のトレーニングデータに依存していることが多い。そこで本研究では,命令調整型大規模言語モデルの一般化機能を利用したゼロショット手法を提案する。提案手法と他の事実検証システムのゼロショット能力を総合的に評価するために,多言語データセットを含む,人工的および実世界のクレームに関するすべてのモデルを評価する。また,本手法を他の事実検証システムと比較する。まず、ゼロショットの一般化設定において、本手法は、自然論理データに特化して訓練されていない他のシステムよりも優れており、最高性能のベースラインに対して平均8.96ポイントの精度向上を実現していることを示す。第2に、ゼロショット転送設定において、自然論理データに基づいて訓練された現在のシステムは、他の領域にうまく一般化しないことを示し、本手法は、実世界のクレームを持つ全てのデータセットにおいて、これらのシステムより優れていることを示す。

The recent development of fact verification systems with natural logic has enhanced their explainability by aligning claims with evidence through set-theoretic operators, providing faithful justifications. Despite these advancements, such systems often rely on a large amount of training data annotated with natural logic. To address this issue, we propose a zero-shot method that utilizes the generalization capabilities of instruction-tuned large language models. To comprehensively assess the zero-shot capabilities of our method and other fact verification systems, we evaluate all models on both artificial and real-world claims, including multilingual datasets. We also compare our method against other fact verification systems in two setups. First, in the zero-shot generalization setup, we demonstrate that our approach outperforms other systems that were not specifically trained on natural logic data, achieving an average accuracy improvement of 8.96 points over the best-performing baseline. Second, in the zero-shot transfer setup, we show that current systems trained on natural logic data do not generalize well to other domains, and our method outperforms these systems across all datasets with real-world claims.

翻訳日:2024-11-02 22:58:37 公開日:2024-10-04

# Dolphin: スケーラブルなニューロシンボリックラーニングのためのプログラム可能なフレームワーク

Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning ( http://arxiv.org/abs/2410.03348v1 )

ライセンス: Link先を確認

Aaditya Naik, Jason Liu, Claire Wang, Saikat Dutta, Mayur Naik, Eric Wong,

(参考訳) ニューロシンボリック学習は、シンボリック推論をディープラーニングモデルに組み込むための有望なパラダイムとして登場した。しかし、既存のフレームワークは、トレーニングデータとシンボリックプログラムの複雑さの両方に関してスケーラビリティに制限がある。シンボリックプログラムにおける前向き連鎖と逆方向の勾配伝播の両方をベクトル化計算にマッピングすることにより,ニューロシンボリック学習を基本レベルでスケールするフレームワークであるDolphinを提案する。この目的のためにDolphinは、PyTorchのような高性能なディープラーニングフレームワークの上に直接構築された一連の抽象化とプリミティブを導入し、シンボリックプログラムをPyTorchモジュールとして書けるようにする。これにより、開発者が慣れ親しんだPythonのような言語でニューロシンボリックプログラムを記述し、GPU上でのエンドツーエンドの微分が可能な計算グラフにコンパイルすることが可能になる。我々はDolphinを、テキスト、画像、ビデオ処理のディープラーニングモデルと、マルチホップ推論、再帰、さらにはPython eval()のようなブラックボックス関数を含むシンボリックプログラムを組み合わせた、5つのニューロシンボリックタスクにわたる13のベンチマークで評価した。Dolphinは、各タスクの最大入力でこれらのモデルを訓練するのに、ベースラインのScallop、ISED、IndeCateR+と比べて0.33%-37.17%(平均2.77%)の時間しか要さず、これらのベースラインは大半の入力でタイムアウトする。Dolphinで書かれたモデルは、最大のベンチマークでも最先端の精度を達成する。

Neurosymbolic learning has emerged as a promising paradigm to incorporate symbolic reasoning into deep learning models. However, existing frameworks are limited in scalability with respect to both the training data and the complexity of symbolic programs. We propose Dolphin, a framework to scale neurosymbolic learning at a fundamental level by mapping both forward chaining and backward gradient propagation in symbolic programs to vectorized computations. For this purpose, Dolphin introduces a set of abstractions and primitives built directly on top of a high-performance deep learning framework like PyTorch, effectively enabling symbolic programs to be written as PyTorch modules. It thereby enables neurosymbolic programs to be written in a language like Python that is familiar to developers and compile them to computation graphs that are amenable to end-to-end differentiation on GPUs. We evaluate Dolphin on a suite of 13 benchmarks across 5 neurosymbolic tasks that combine deep learning models for text, image, or video processing with symbolic programs that involve multi-hop reasoning, recursion, and even black-box functions like Python eval(). Dolphin only takes 0.33%-37.17% of the time (and 2.77% on average) to train these models on the largest input per task compared to baselines Scallop, ISED, and IndeCateR+, which time out on most of these inputs. Models written in Dolphin also achieve state-of-the-art accuracies even on the largest benchmarks.
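The abstract describes pushing neural outputs through a symbolic program, including black-box Python functions. The sketch below illustrates that general idea in plain Python: discrete output distributions are combined by enumerating symbol combinations and multiplying their probabilities. The helper name `apply_symbolic` and the product-of-probabilities semantics are assumptions made for illustration; this is not Dolphin's API, and it is deliberately not vectorized, whereas the abstract states that Dolphin maps such computations to batched tensor operations.

```python
from itertools import product

def apply_symbolic(fn, *distributions):
    """Push discrete distributions through a black-box function `fn`.

    Each distribution maps a symbol to its probability. Every combination
    of input symbols is evaluated, and the probability of each result is
    the sum over all combinations that produce it (hypothetical helper).
    """
    out = {}
    for combo in product(*(d.items() for d in distributions)):
        symbols = [symbol for symbol, _ in combo]
        prob = 1.0
        for _, p in combo:
            prob *= p
        result = fn(*symbols)
        out[result] = out.get(result, 0.0) + prob
    return out

# Toy "classifier outputs" for two handwritten digits.
digit_a = {0: 0.1, 1: 0.9}
digit_b = {1: 0.2, 2: 0.8}

# Symbolic program: the sum of the two digits.
sum_dist = apply_symbolic(lambda a, b: a + b, digit_a, digit_b)
```

In a differentiable framework the same products and sums would be tensor operations, so a loss on `sum_dist` could be backpropagated into the networks that produced `digit_a` and `digit_b`.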

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# 自己反省(Self-Reflection)アプローチによるコードの等価表現の生成

Generating Equivalent Representations of Code By A Self-Reflection Approach ( http://arxiv.org/abs/2410.03351v1 )

ライセンス: Link先を確認

Jia Li, Ge Li, Lecheng Wang, Hao Zhu, Zhi Jin,

(参考訳) コードの等価表現(ER)は、コード自体と同じ意味を保持するテキスト表現であり、例えば自然言語のコメントや擬似コードがある。 ERはソフトウェア開発とメンテナンスにおいて重要な役割を担う。しかし、コードのERを自動的に生成する方法は、依然として未解決の課題である。本稿では,コードのERを生成するための自己反省(self-reflection)手法を提案する。この手法は2つの大規模言語モデル(LLM)を協調して動作させ、リフレクションプロセスを通じてERを生成する。 ERに制約を課すかどうかに応じて、我々の手法はオープンな設定と制約付きの設定の両方でERを生成する。 2つの設定でERを生成する実証的研究を行い,8つの知見を得た。 (1)オープンな設定でのER生成。オープンな設定では、LLMが制約なしにコードを表現できるようにし、その結果得られたERを分析して5つの重要な知見を明らかにした。これらの知見は、LLMがコード内の構文構造、API、数値計算をどのように理解しているかに光を当てる。 (2)制約付きの設定でのER生成。制約付きの設定では、自然言語コメント、擬似コード、フローチャートなどの制約をERに課す。これにより、本手法はさまざまなソフトウェアエンジニアリングタスクに対処できる。実験に基づき,本手法が特定の制約に従うERを効果的に生成し,様々なソフトウェア工学タスクを支援できることを示す3つの知見を得た。 (3)今後の方向性。また、コード生成のための中間言語の導出、LLMフレンドリーな要件記述の探索、ソフトウェア工学タスクのさらなる支援など、将来の研究の方向性についても論じる。本論文が研究コミュニティでの議論を喚起し,多くの後続研究を刺激すると考えている。

Equivalent Representations (ERs) of code are textual representations that preserve the same semantics as the code itself, e.g., natural language comments and pseudocode. ERs play a critical role in software development and maintenance. However, how to automatically generate ERs of code remains an open challenge. In this paper, we propose a self-reflection approach to generating ERs of code. It enables two Large Language Models (LLMs) to work together and produce an ER through a reflection process. Depending on whether constraints on ERs are applied, our approach generates ERs in both open and constrained settings. We conduct an empirical study to generate ERs in two settings and obtain eight findings. (1) Generating ERs in the open setting. In the open setting, we allow LLMs to represent code without any constraints, analyzing the resulting ERs and uncovering five key findings. These findings shed light on how LLMs comprehend syntactic structures, APIs, and numerical computations in code. (2) Generating ERs in the constrained setting. In the constrained setting, we impose constraints on ERs, such as natural language comments, pseudocode, and flowcharts. This allows our approach to address a range of software engineering tasks. Based on our experiments, we have three findings demonstrating that our approach can effectively generate ERs that adhere to specific constraints, thus supporting various software engineering tasks. (3) Future directions. We also discuss potential future research directions, such as deriving intermediate languages for code generation, exploring LLM-friendly requirement descriptions, and further supporting software engineering tasks. We believe that this paper will spark discussions in research communities and inspire many follow-up studies.

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# LANTERN:Relaxed Speculative Decodingによる視覚自己回帰モデルの高速化

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding ( http://arxiv.org/abs/2410.03355v1 )

ライセンス: Link先を確認

Doohyuk Jang, Sihwan Park, June Yong Yang, Yeonsung Jung, Jihun Yun, Souvik Kundu, Sung-Yub Kim, Eunho Yang,

(参考訳) オートレグレッシブ(AR)モデルは画像生成において最近注目され、しばしば拡散モデルの性能と一致するか、さらに上回っている。しかし、ARモデルの1つの大きな制限は、そのシーケンシャルな性質であり、トークンを一度に1つずつ処理するため、より効率的に動作するGANや拡散ベースの方法と比較して生成が遅くなる。投機的復号化は、1回のフォワードパスで複数のトークンを生成することでLLMを加速させる効果が証明されているが、視覚ARモデルへの応用はほとんど探索されていない。本稿では,この設定における課題として,視覚ARモデルがトークンに一様に低い確率を割り当てることが多く,投機的復号化の性能を損なうという問題を特定し,これをトークン選択の曖昧性(token selection ambiguity)と呼ぶ。この課題を克服するために、潜在空間におけるトークンの交換可能性を活用する、LANTERNと呼ぶ緩和された受理条件を提案する。この緩和は、本来なら早期に棄却されてしまう候補トークンをより柔軟に利用できるようにすることで、視覚ARモデルにおける投機的復号化の有効性を回復させる。さらに、全変動距離の上界を組み込むことで、画像の品質やセマンティックコヒーレンスを著しく損なうことなく、これらの速度向上を実現する。実験により,提案手法が投機的復号化に対して大幅な高速化をもたらすことを示す。具体的には、最先端の投機的復号法を素朴に適用した場合と比較して、LANTERNは現代の視覚ARモデルであるLlamaGenに適用した際に、greedy復号とランダムサンプリングに対するスピードアップをそれぞれ$\mathbf{1.75}\times$、$\mathbf{1.76}\times$向上させる。

Auto-Regressive (AR) models have recently gained prominence in image generation, often matching or even surpassing the performance of diffusion models. However, one major limitation of AR models is their sequential nature, which processes tokens one at a time, slowing down generation compared to models like GANs or diffusion-based methods that operate more efficiently. While speculative decoding has proven effective for accelerating LLMs by generating multiple tokens in a single forward pass, its application in visual AR models remains largely unexplored. In this work, we identify a challenge in this setting, which we term "token selection ambiguity", wherein visual AR models frequently assign uniformly low probabilities to tokens, hampering the performance of speculative decoding. To overcome this challenge, we propose a relaxed acceptance condition referred to as LANTERN that leverages the interchangeability of tokens in latent space. This relaxation restores the effectiveness of speculative decoding in visual AR models by enabling more flexible use of candidate tokens that would otherwise be prematurely rejected. Furthermore, by incorporating a total variation distance bound, we ensure that these speed gains are achieved without significantly compromising image quality or semantic coherence. Experimental results demonstrate the efficacy of our method in providing a substantial speed-up over speculative decoding. Specifically, compared to a naive application of the state-of-the-art speculative decoding, LANTERN increases speed-ups by $\mathbf{1.75}\times$ and $\mathbf{1.76}\times$, as compared to greedy decoding and random sampling, respectively, when applied to LlamaGen, a contemporary visual AR model.
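To make the relaxed acceptance idea concrete, here is a minimal sketch. It contrasts the usual speculative-decoding acceptance probability, min(1, p(x)/q(x)), with a relaxed variant that lends the drafted token some probability mass from its latent-space neighbours. The neighbour lists, the greedy nearest-first aggregation, and the parameter name `delta` are assumptions for illustration; the abstract only states that the method exploits token interchangeability and enforces a total variation distance bound, so this should not be read as the authors' exact rule.

```python
def accept_prob(p, q, token):
    """Standard speculative decoding: accept with probability min(1, p/q)."""
    return min(1.0, p[token] / q[token])

def relaxed_accept_prob(p, q, token, neighbours, delta):
    """Relaxed acceptance (illustrative): borrow mass from nearby tokens.

    Moving a total mass m from neighbours onto `token` changes the target
    distribution by a total variation distance of exactly m, so capping
    the borrowed mass at `delta` caps the distortion at `delta`.
    """
    boosted = p[token]
    for other in neighbours[token]:          # assumed sorted nearest-first
        borrowed = boosted - p[token] + p[other]
        if borrowed > delta:
            break
        boosted += p[other]
    return min(1.0, boosted / q[token])

# Toy target (p) and draft (q) distributions over four image tokens.
p = {"a": 0.05, "b": 0.04, "c": 0.03, "d": 0.88}
q = {"a": 0.20, "b": 0.10, "c": 0.10, "d": 0.60}
neighbours = {"a": ["b", "c"]}               # hypothetical latent neighbours

strict = accept_prob(p, q, "a")
relaxed = relaxed_accept_prob(p, q, "a", neighbours, delta=0.05)
```

With these toy numbers the strict rule accepts token `a` a quarter of the time, while the relaxed rule borrows the mass of `b` (but not `c`, which would exceed `delta`) and accepts it more often.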

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# 言語横断型AMRパーシングはメタになるべきか? : メタラーニングと共同学習型AMRパーシングの実証評価

Should Cross-Lingual AMR Parsing go Meta? An Empirical Assessment of Meta-Learning and Joint Learning AMR Parsing ( http://arxiv.org/abs/2410.03357v1 )

ライセンス: Link先を確認

Jeongwoo Kang, Maximin Coavoux, Cédric Lopez, Didier Schwab,

(参考訳) 言語間AMR解析は、トレーニングデータがソース言語でのみ利用できる場合、ターゲット言語でAMRグラフを予測するタスクである。 AMRトレーニングデータと評価データのサイズが小さいため、言語間AMR解析は英語、スペイン語、ドイツ語、中国語、イタリア語などの小さな言語でのみ研究されている。言語間構文解析にメタラーニングを適用したLangedijk et al(2022)からインスピレーションを得て,メタラーニングを用いた言語間AMR解析について検討した。我々は,これらのモデルを$k$-shotシナリオ(0-shotを含む)で評価し,クロアチア語,ファルシ語,韓国語,中国語,フランス語での有効性を評価した。特に、韓国とクロアチアのテストセットは、既存のThe Little Prince English AMRコーパスに基づいて、我々の研究の一部として開発され、公開されています。従来のジョイントラーニングと比較し,実証的研究を行った。メタラーニングモデルは, 特定の言語に対する0ショット評価において若干改善されているが, $k$ が 0 よりも高い場合, 性能向上は最小か欠落であることが示唆された。

Cross-lingual AMR parsing is the task of predicting AMR graphs in a target language when training data is available only in a source language. Due to the small size of AMR training data and evaluation data, cross-lingual AMR parsing has only been explored in a small set of languages such as English, Spanish, German, Chinese, and Italian. Taking inspiration from Langedijk et al. (2022), who apply meta-learning to tackle cross-lingual syntactic parsing, we investigate the use of meta-learning for cross-lingual AMR parsing. We evaluate our models in $k$-shot scenarios (including 0-shot) and assess their effectiveness in Croatian, Farsi, Korean, Chinese, and French. Notably, Korean and Croatian test sets are developed as part of our work, based on the existing The Little Prince English AMR corpus, and made publicly available. We empirically study our method by comparing it to classical joint learning. Our findings suggest that while the meta-learning model performs slightly better in 0-shot evaluation for certain languages, the performance gain is minimal or absent when $k$ is higher than 0.

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# 量子コミットと量子片方向のOracle分離

Oracle Separation Between Quantum Commitments and Quantum One-wayness ( http://arxiv.org/abs/2410.03358v1 )

ライセンス: Link先を確認

John Bostanci, Boyang Chen, Barak Nehoran,

(参考訳) 量子コミットメントは存在するが、(効率的に検証可能な)一方向状態生成器は存在しないような、ユニタリな量子オラクルが存在することを示す。どちらも、暗号の最小仮定、すなわち計算量的暗号のすべてから導かれる最も弱い暗号仮定として、一方向関数に代わる候補と広く考えられてきた。最近の研究は、一方向状態生成器からコミットメントを構成できることを示したが、逆方向は未解決のままであった。我々の結果はあらゆるブラックボックス構成を除外し、この重要な未解決問題に決着をつける。これは、量子コミットメント(およびそれと等価なクラスであるEFI対、量子紛失通信、セキュアな量子多者間計算)が、既知のすべての暗号プリミティブの中で厳密に最も弱いと考えられることを示唆している。

We show that there exists a unitary quantum oracle relative to which quantum commitments exist but no (efficiently verifiable) one-way state generators exist. Both have been widely considered candidates for replacing one-way functions as the minimal assumption for cryptography: the weakest cryptographic assumption implied by all of computational cryptography. Recent work has shown that commitments can be constructed from one-way state generators, but the other direction has remained open. Our results rule out any black-box construction, and thus settle this crucial open problem, suggesting that quantum commitments (as well as its equivalency class of EFI pairs, quantum oblivious transfer, and secure quantum multiparty computation) appear to be strictly weakest among all known cryptographic primitives.

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# マルチカラー空間テンソルマージを利用した慢性創傷セグメンテーションのための拡張型高調波密結合ハイブリッドTransformerネットワークアーキテクチャ

An Enhanced Harmonic Densely Connected Hybrid Transformer Network Architecture for Chronic Wound Segmentation Utilising Multi-Colour Space Tensor Merging ( http://arxiv.org/abs/2410.03359v1 )

ライセンス: Link先を確認

Bill Cassidy, Christian Mcbride, Connah Kendrick, Neil D. Reeves, Joseph M. Pappachan, Cornelius J. Fernandez, Elias Chacko, Raphael Brüngel, Christoph M. Friedrich, Metib Alotaibi, Abdullah Abdulaziz AlWabel, Mohammad Alderwish, Kuan-Ying Lai, Moi Hoon Yap,

(参考訳) 慢性創傷とその合併症は、世界中の診療所や病院にとって増大し続ける負担となっている。静脈性、動脈性、糖尿病性、および圧迫性の創傷は、世界的にますます一般的になりつつある。これらの状態は患者に深刻な障害をもたらす可能性があり、感染に起因する四肢切断や死亡リスクの上昇も増えている。したがって、慢性創傷ケアにおいて臨床医を支援する新しい手法は、高品質なケア水準を維持する上で不可欠である。本稿では,ネットワークの初期層にコントラスト除去コンポーネントを統合して特徴学習を強化する,改良型HarDNetセグメンテーションアーキテクチャを提案する。また、マルチカラー空間テンソルマージ処理を利用し、これらの追加特徴を扱えるよう畳み込みブロックのハーモニック形状を調整する。提案モデルは明るい肌色の患者の創傷画像で訓練し、より暗い肌色の症例のみからなる2つのテストセット(1つは正解ラベルあり、もう1つは正解ラベルなし)で評価する。主観的評価は臨床創傷の専門家から取得し、評価者間信頼性の判定にはクラス内相関係数を用いた。正解ラベル付きの暗い肌色のテストセットでは、Dice類似度係数(+0.1221)とIoU(intersection over union、+0.1274)の改善を示した。定性分析では専門家から高い評価が得られ、ベースラインモデルと提案モデルを比較すると3%を超える改善が認められた。本研究は、明るい肌色の創傷画像のみで訓練したモデルを用いて、慢性創傷セグメンテーションにおける暗い肌色に焦点を当てた最初の研究である。糖尿病は患者の肌色が暗い国々で有病率が高く、このような症例により注目する必要性が浮き彫りになる。さらに、慢性創傷セグメンテーションに関するこれまでで最大の定性的研究を実施した。

Chronic wounds and associated complications present ever-growing burdens for clinics and hospitals worldwide. Venous, arterial, diabetic, and pressure wounds are becoming increasingly common globally. These conditions can result in highly debilitating repercussions for those affected, with limb amputations and increased mortality risk resulting from infection becoming more common. New methods to assist clinicians in chronic wound care are therefore vital to maintain high-quality care standards. This paper presents an improved HarDNet segmentation architecture which integrates a contrast-eliminating component in the initial layers of the network to enhance feature learning. We also utilise a multi-colour space tensor merging process and adjust the harmonic shape of the convolution blocks to facilitate these additional features. We train our proposed model using wound images from light-skinned patients and test the model on two test sets (one set with ground truth, and one without) comprising only darker-skinned cases. Subjective ratings are obtained from clinical wound experts, with the intraclass correlation coefficient used to determine inter-rater reliability. For the dark skin tone test set with ground truth, we demonstrate improvements in terms of Dice similarity coefficient (+0.1221) and intersection over union (+0.1274). Qualitative analysis showed high expert ratings, with improvements of >3% demonstrated when comparing the baseline model with the proposed model. This paper presents the first study to focus on darker skin tones for chronic wound segmentation using models trained only on wound images exhibiting lighter skin. Diabetes is highly prevalent in countries where patients have darker skin tones, highlighting the need for a greater focus on such cases. Additionally, we conduct the largest qualitative study to date for chronic wound segmentation.
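The two quantitative metrics named in the abstract have standard definitions: for a predicted mask P and a ground-truth mask T, Dice = 2|P ∩ T| / (|P| + |T|) and IoU = |P ∩ T| / |P ∪ T|. The short sketch below computes both for flattened binary masks. The function name and the convention of returning 1.0 when both masks are empty are choices made for this illustration, not details taken from the paper.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equally sized flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    # Convention (assumed here): two empty masks count as a perfect match.
    dice = 2 * intersection / (pred_area + truth_area) if union else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou

# Six-pixel toy masks: 1 marks wound pixels.
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 1, 1, 0]
dice, iou = dice_and_iou(predicted, reference)
```

Here two of the three predicted wound pixels overlap the reference, giving a Dice score of 2/3 and an IoU of 1/2. Dice is never lower than IoU for the same pair of masks, which is worth keeping in mind when comparing the two reported gains.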

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# スピン-j系のエンタングリングパワー--幾何学的アプローチ

Entangling power of spin-j systems: a geometrical approach ( http://arxiv.org/abs/2410.03361v1 )

ライセンス: Link先を確認

Eduardo Serrano-Ensástiga, Diego Morachis Galindo, Jesús A. Maytorena, Chryssomalis Chryssomalakos,

(参考訳) 高いエンタングリング能力を持つユニタリゲートは、そのエンタングリング能力のためにいくつかの量子強化技術に関係している。スピン状態やボゾン系のような対称多粒子系では、粒子交換対称性はこれらのゲートと非絡み合い状態の集合を制限する。本研究では、SU(2)不変量から得られる成分を持つベクトル間の内積として再構成することで、スピン系の絡み合う力を解析する。このアプローチにより、小さなスピンに対して最大化するユニタリゲートの検出を含む、小さなスピン系に対して、この量を研究することができる。極端ユニタリゲートは、ある状態のフシミ関数の凸結合に結びついているのと同様、高い回転対称性を持つ絡み合い分布を示す。さらに、エンタングリングパワーといくつかのスピン状態部分空間で許容されるシュミット数との関係について検討する。このように、ここで提示される幾何学的アプローチは、量子情報理論における他の概念と結びついた絡み合う力を研究するための新しい経路を示唆している。

Unitary gates with high entangling power are relevant for several quantum-enhanced technologies due to their entangling capabilities. For symmetric multiparticle systems, such as spin states or bosonic systems, the particle exchange symmetry restricts these gates and also the set of not-entangled states. In this work, we analyze the entangling power of spin systems by reformulating it as an inner product between vectors with components given by SU(2) invariants. This approach allows us to study this quantity for small-spin systems including the detection of the unitary gate that maximizes it for small spins. We observe that extremal unitary gates exhibit entanglement distributions with high rotational symmetry, same that are linked to a convex combination of Husimi functions of certain states. Furthermore, we explore the connection between entangling power and the Schmidt numbers admissible in some spin state subspaces. Thus, the geometrical approach presented here suggests new paths for studying entangling power linked to other concepts in quantum information theory.

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# オンライン非パラメトリック回帰のためのMinimax Adaptive Boosting

Minimax Adaptive Boosting for Online Nonparametric Regression ( http://arxiv.org/abs/2410.03363v1 )

ライセンス: Link先を確認

Paul Liautaud, Pierre Gaillard, Olivier Wintenberger,

(参考訳) 一般的な凸損失を伴う敵対的オンラインノンパラメトリック回帰のためのブースティングについて検討する。まず,パラメータフリーなオンライン勾配ブースティング(OGB)アルゴリズムを導入し,これを連鎖木(chaining trees)に適用することで,リプシッツ関数と競合する際にミニマックス最適なリグレットを達成することを示す。ノンパラメトリックな関数クラスと競合することは難しい場合があるが、こうした関数クラスは局所リプシッツ性のような局所的なパターンを示すことが多く、オンラインアルゴリズムはそれを利用して性能を改善できる。連鎖木に基づくコアツリー上でOGBを適用することにより,提案手法は異なるリプシッツプロファイルに整合するすべての枝刈り(pruning)と効果的に競合し,局所的な正則性に対する最適な依存性を示す。その結果,敵対的設定におけるオンライン回帰について,局所適応的な最適レートを持つ最初の計算効率の良いアルゴリズムが得られた。

We study boosting for adversarial online nonparametric regression with general convex losses. We first introduce a parameter-free online gradient boosting (OGB) algorithm and show that its application to chaining trees achieves minimax optimal regret when competing against Lipschitz functions. While competing with nonparametric function classes can be challenging, the latter often exhibit local patterns, such as local Lipschitzness, that online algorithms can exploit to improve performance. By applying OGB over a core tree based on chaining trees, our proposed method effectively competes against all prunings that align with different Lipschitz profiles and demonstrates optimal dependence on the local regularities. As a result, we obtain the first computationally efficient algorithm with locally adaptive optimal rates for online regression in an adversarial setting.

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# 誤り訂正符号変換器:非統一から統一へ

Error Correction Code Transformer: From Non-Unified to Unified ( http://arxiv.org/abs/2410.03364v1 )

ライセンス: Link先を確認

Yongli Yan, Jieao Zhu, Tianyue Zheng, Jiaqi He, Linglong Dai,

(参考訳) チャネル符号化は、現代の無線システムにおいて信頼性の高いデータ伝送に不可欠であり、その重要性は、様々なエラー訂正コードをサポートする必要がある第6世代(6G)ネットワークの出現とともに増大する。しかし、従来のデコーダは特定のデコードアルゴリズムに適した固定ハードウェア回路として設計されることが多く、非効率で柔軟性も限られていた。これらの課題に対処するために,Polar,Low-Density Parity-Check(LDPC),Bose-Chaudhuri-Hocquenghem(BCH)など,複数の線形ブロックコードを単一のフレームワークで扱える,コードに依存しない統一的なトランスフォーマーベースのデコードアーキテクチャを提案する。これを実現するために、標準化されたユニットを用いて様々なコードタイプにまたがるパラメータを調和させ、再設計された統一アテンションモジュールが様々なコードワードの構造情報を圧縮する。さらに、パリティチェック行列のスパース性から導出したスパースマスクを導入し、情報ビットとパリティチェックビットの間に内在する制約を捉えるモデルの能力を高め、復号精度とロバスト性を向上させる。広範にわたる実験結果から,提案する統一トランスフォーマーベースのデコーダは既存の手法に勝るだけでなく,次世代無線通信システムに対して,柔軟で効率的かつ高性能なソリューションを提供することが示された。

Channel coding is vital for reliable data transmission in modern wireless systems, and its significance will increase with the emergence of sixth-generation (6G) networks, which will need to support various error correction codes. However, traditional decoders were typically designed as fixed hardware circuits tailored to specific decoding algorithms, leading to inefficiencies and limited flexibility. To address these challenges, this paper proposes a unified, code-agnostic Transformer-based decoding architecture capable of handling multiple linear block codes, including Polar, Low-Density Parity-Check (LDPC), and Bose-Chaudhuri-Hocquenghem (BCH), within a single framework. To achieve this, standardized units are employed to harmonize parameters across different code types, while the redesigned unified attention module compresses the structural information of various codewords. Additionally, a sparse mask, derived from the sparsity of the parity-check matrix, is introduced to enhance the model's ability to capture inherent constraints between information and parity-check bits, resulting in improved decoding accuracy and robustness. Extensive experimental results demonstrate that the proposed unified Transformer-based decoder not only outperforms existing methods but also provides a flexible, efficient, and high-performance solution for next-generation wireless communication systems.
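As an illustration of how an attention mask can be derived from the sparsity of a parity-check matrix, the sketch below builds a boolean mask for the Hamming(7,4) code. The layout (one position per code bit followed by one per parity check) and the connection rule (two positions may attend to each other when they take part in the same check) are assumptions made for this example; the abstract does not spell out the exact construction used in the paper.

```python
def build_parity_mask(H):
    """Build a symmetric boolean attention mask from parity-check matrix H.

    Positions 0..n-1 are code bits and positions n..n+m-1 are parity checks
    (an assumed layout). True means attention between the two positions is
    allowed; everything else would be masked out.
    """
    m, n = len(H), len(H[0])
    size = n + m
    mask = [[i == j for j in range(size)] for i in range(size)]
    for check, row in enumerate(H):
        members = [bit for bit, value in enumerate(row) if value]
        for i in members:
            # A bit may attend to every check it participates in.
            mask[i][n + check] = True
            mask[n + check][i] = True
            # Bits that share a check may attend to each other.
            for j in members:
                mask[i][j] = True
    return mask

# Parity-check matrix of the Hamming(7,4) code.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
mask = build_parity_mask(H)
allowed = sum(sum(row) for row in mask)
```

Because the mask depends only on H, the same routine applies unchanged to any linear block code, which fits the code-agnostic goal described in the abstract. For long sparse codes such as LDPC, most entries stay False.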

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# 生成拡散モデルにおける潜在抽象化

Latent Abstractions in Generative Diffusion Models ( http://arxiv.org/abs/2410.03368v1 )

ライセンス: Link先を確認

Giulio Franzese, Mattia Martini, Giulio Corallo, Paolo Papotti, Pietro Michiardi,

(参考訳) 本研究では,拡散に基づく生成モデルが画像などの高次元データをどのように生成するかを,低次元の潜在抽象概念の表現に暗黙的に頼って検討し,生成過程を導く。我々は,NLFを拡張した理論的枠組みを提案し,SDEに基づく生成モデルについて一意に考察する。本理論の進展は, 関節(状態および測定)力学の新たな定式化と, システム状態が測定過程に与える影響の情報理論的尺度に依存する。我々の理論によれば、拡散モデルはSDEのシステムとしてキャストすることができ、観測不可能な遅延抽象の進化が観測可能な測定過程(生成経路に対応する)のダイナミクスを操縦する非線形フィルタを記述することができる。さらに、生成過程の異なる段階における潜伏抽象の出現に関する、我々の理論と過去の経験的結果を検証するための実証的研究を行った。

In this work we study how diffusion-based generative models produce high-dimensional data, such as an image, by implicitly relying on a manifestation of a low-dimensional set of latent abstractions that guide the generative process. We present a novel theoretical framework that extends nonlinear filtering (NLF) and offers a unique perspective on SDE-based generative models. The development of our theory relies on a novel formulation of the joint (state and measurement) dynamics, and an information-theoretic measure of the influence of the system state on the measurement process. According to our theory, diffusion models can be cast as a system of SDEs, describing a non-linear filter in which the evolution of unobservable latent abstractions steers the dynamics of an observable measurement process (corresponding to the generative pathways). In addition, we present an empirical study to validate our theory and previous empirical results on the emergence of latent abstractions at different stages of the generative process.

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# インターバル境界の伝播を再び大きくする

Make Interval Bound Propagation great again ( http://arxiv.org/abs/2410.03373v1 )

ライセンス: Link先を確認

Patryk Krukowski, Daniel Wilczak, Jacek Tabor, Anna Bielawska, Przemysław Spurek,

(参考訳) 医療データ分析,自律運転,敵対的訓練など,実生活に動機づけられたさまざまなシナリオにおいて,我々は頑健なディープネットワークに関心を持っている。入力の比較的小さな摂動が出力の劇的な変化(クラスの変更など)を引き起こさない場合、ネットワークは頑健である。これは、NNC(Neural Network Certification)というより広い分野に属する。 NNCにおける2つの重要な問題は、与えられた事前訓練済みネットワークの頑健性を計算する方法と、頑健なネットワークを構築する方法である。頑健なネットワークを構築するための一般的なアプローチは、インターバルバウンド・プロパゲーション (Interval Bound Propagation, IBP) である。本稿では,IBPはラッピング効果(wrapping effect)の影響を受けやすいため,前者の問題に対しては準最適であることを示す。線形活性化の場合でさえ、IBPは著しく準最適な境界を与える。したがって、最適に近い境界を得るには、ラッピング効果の影響を受けない戦略を用いる必要がある。我々は、ニューラルネットワークにおけるラッピング効果を軽減するために、厳密な計算のための2つの古典的なアプローチであるDoubleton ArithmeticとAffine Arithmeticを適用する。これらの手法は線形活性化関数を持つネットワークに対して正確な結果をもたらし、ラッピング効果に耐える。その結果,IBPよりも最適値に大幅に近い境界が得られる。

In various scenarios motivated by real life, such as medical data analysis, autonomous driving, and adversarial training, we are interested in robust deep networks. A network is robust when a relatively small perturbation of the input cannot lead to drastic changes in output (like change of class, etc.). This falls under the broader field of Neural Network Certification (NNC). Two crucial problems in NNC are of profound interest to the scientific community: how to calculate the robustness of a given pre-trained network and how to construct robust networks. The common approach to constructing robust networks is Interval Bound Propagation (IBP). This paper demonstrates that IBP is sub-optimal in the first case due to its susceptibility to the wrapping effect. Even for linear activation, IBP gives strongly sub-optimal bounds. Consequently, one should use strategies immune to the wrapping effect to obtain bounds close to optimal ones. We adapt two classical approaches dedicated to strict computations -- Doubleton Arithmetic and Affine Arithmetic -- to mitigate the wrapping effect in neural networks. These techniques yield precise results for networks with linear activation functions, thus resisting the wrapping effect. As a result, we achieve bounds significantly closer to the optimal level than IBP.
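The wrapping effect mentioned in the abstract is easy to reproduce on a toy linear network. The sketch below pushes the box [-1, 1]^2 through two layers that each rotate by 45 degrees. Interval propagation re-wraps the rotated set in an axis-aligned box after every layer, so the bound grows, whereas an affine form (centre plus generator matrix, with noise symbols in [-1, 1]) stays exact for linear maps. All function names are illustrative, the network is a hypothetical example rather than one from the paper, and doubleton arithmetic is not shown.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def ibp_linear(W, center, radius):
    """Interval propagation through y = W x in centre/radius form."""
    abs_W = [[abs(w) for w in row] for row in W]
    return matvec(W, center), matvec(abs_W, radius)

def affine_linear(W, center, generators):
    """Affine-form propagation: x = center + generators @ eps, eps in [-1, 1]."""
    return matvec(W, center), matmul(W, generators)

def affine_radius(generators):
    """Tightest axis-aligned radius implied by an affine form."""
    return [sum(abs(g) for g in row) for row in generators]

theta = math.pi / 4
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

# Input set: the box [-1, 1]^2 in both representations.
c_ibp, r_ibp = [0.0, 0.0], [1.0, 1.0]
c_aff, G = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]

for _ in range(2):                      # two 45-degree rotation layers
    c_ibp, r_ibp = ibp_linear(R, c_ibp, r_ibp)
    c_aff, G = affine_linear(R, c_aff, G)

r_aff = affine_radius(G)
```

Two 45-degree rotations compose to a 90-degree rotation, which maps the input box onto itself, so the true output radius is 1 in each coordinate. The affine form recovers this, while the interval bound has doubled to 2. With nonlinear activations an affine form needs an additional relaxation step, which this sketch does not cover.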

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# SoundSignature:どんな音楽が好き?

SoundSignature: What Type of Music Do You Like? ( http://arxiv.org/abs/2410.03375v1 )

ライセンス: Link先を確認

Brandon James Carone, Pablo Ripollés,

(参考訳) SoundSignatureは、ユーザーのお気に入りの曲を分析するためにカスタムのOpenAIアシスタントを統合する音楽アプリケーションである。このシステムには最先端の音楽情報検索(MIR)Pythonパッケージが組み込まれており、抽出された音響的・音楽的特徴と、アシスタントのアーティストやバンドに関する広範な知識を組み合わせている。この知識を組み合わせることでSoundSignatureは、新たなIoT of Sounds(IoS)エコシステムのセマンティックオーディオと原則を活用し、MIRとAIを統合して、音楽の音響特性に関するパーソナライズされた洞察をユーザに提供する。ユーザーはチャットボットと対話して、演奏された音響分析と音楽の味との関係についてより深い質問をすることができる。この対話性はアプリケーションを変え、親しみのある曲やお気に入りの曲に関する情報資源としてだけでなく、ユーザーが音楽の特徴、音楽理論、信号処理でよく使われる音響特性、そして音楽の背後にあるアーティストの理解を深めるための教育プラットフォームとしても機能する。一般的なユーザビリティ以外にも、コード認識アルゴリズム(CREMA)、ソース分離アルゴリズム(DEMUCS)、オーディオ・トゥ・MIDIコンバータ(基本ピッチ)など、確立されたオープンソースのミュージシャン固有のツールが組み込まれている。これらの機能は、コーディングスキルのないユーザが、チャットボットと対話することで、高度なオープンソースの音楽処理アルゴリズムにアクセスできるようにする。本稿では,アプリケーションの革新的な特徴と教育的可能性を強調し,その有効性とユーザビリティを評価するパイロットユーザ研究から得られた知見を紹介する。

SoundSignature is a music application that integrates a custom OpenAI Assistant to analyze users' favorite songs. The system incorporates state-of-the-art Music Information Retrieval (MIR) Python packages to combine extracted acoustic/musical features with the assistant's extensive knowledge of the artists and bands. Capitalizing on this combined knowledge, SoundSignature leverages semantic audio and principles from the emerging Internet of Sounds (IoS) ecosystem, integrating MIR with AI to provide users with personalized insights into the acoustic properties of their music, akin to a musical preference personality report. Users can then interact with the chatbot to explore deeper inquiries about the acoustic analyses performed and how they relate to their musical taste. This interactivity transforms the application, acting not only as an informative resource about familiar and/or favorite songs, but also as an educational platform that enables users to deepen their understanding of musical features, music theory, acoustic properties commonly used in signal processing, and the artists behind the music. Beyond general usability, the application also incorporates several well-established open-source musician-specific tools, such as a chord recognition algorithm (CREMA), a source separation algorithm (DEMUCS), and an audio-to-MIDI converter (basic-pitch). These features allow users without coding skills to access advanced, open-source music processing algorithms simply by interacting with the chatbot (e.g., can you give me the stems of this song?). In this paper, we highlight the application's innovative features and educational potential, and present findings from a pilot user study that evaluates its efficacy and usability.

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# ベクトル量子化による深部強化学習における逆方向摂動の緩和

Mitigating Adversarial Perturbations for Deep Reinforcement Learning via Vector Quantization ( http://arxiv.org/abs/2410.03376v1 )

ライセンス: Link先を確認

Tung M. Luu, Thanh Nguyen, Tee Joshua Tian Jin, Sungwoon Kim, Chang D. Yoo,

(参考訳) 近年の研究では、訓練における優れた強化学習(RL)エージェントは、デプロイメント中に敵の摂動に対してレジリエンスを欠いていることが示されている。これは、現実世界にデプロイする前に堅牢なエージェントを構築することの重要性を強調している。従来の作業は、ディープニューラルネットワークコンポーネント自体の堅牢性の向上や、強力な攻撃に対するエージェントの対角的なトレーニングなど、この問題に対処するための堅牢なトレーニングベースの手順の開発に重点を置いていた。そこで本研究では,RLの入力変換に基づくディフェンスについて検討する。具体的には、ベクトル量子化(VQ)の変種を入力観測の変換として使用し、テスト中の敵攻撃の空間を削減し、その結果、変換された観測は攻撃の影響を受けない。本手法は, 計算効率が高く, 対人訓練とシームレスに統合され, 対人攻撃に対するRLエージェントの堅牢性をさらに向上する。複数の環境における広範囲な実験を通して、VQを入力変換として使用すると、エージェントの観察に対する敵の攻撃に対して効果的に防御できることを示した。

Recent studies reveal that well-performing reinforcement learning (RL) agents in training often lack resilience against adversarial perturbations during deployment. This highlights the importance of building a robust agent before deploying it in the real world. Most prior works focus on developing robust training-based procedures to tackle this problem, including enhancing the robustness of the deep neural network component itself or adversarially training the agent on strong attacks. In this work, we instead study an input transformation-based defense for RL. Specifically, we propose using a variant of vector quantization (VQ) as a transformation for input observations, which is then used to reduce the space of adversarial attacks during testing, resulting in the transformed observations being less affected by attacks. Our method is computationally efficient and seamlessly integrates with adversarial training, further enhancing the robustness of RL agents against adversarial attacks. Through extensive experiments in multiple environments, we demonstrate that using VQ as the input transformation effectively defends against adversarial attacks on the agent's observations.
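The intuition behind a quantization-based input defense can be shown in a few lines: each observation is replaced by its nearest codebook vector, so any perturbation that does not push the observation across a cell boundary is removed entirely. The fixed two-entry codebook and the function name below are hypothetical; the abstract says only that a variant of vector quantization is used, and a practical codebook would presumably be fitted to the agent's observations.

```python
def quantize(observation, codebook):
    """Return the codebook vector closest to `observation` (squared L2)."""
    return min(
        codebook,
        key=lambda code: sum((o - c) ** 2 for o, c in zip(observation, code)),
    )

# Hypothetical codebook with two prototype observations.
codebook = [(0.0, 0.0), (1.0, 1.0)]

clean = (0.9, 1.1)
perturbed = (0.8, 1.2)                  # clean observation plus a small attack

clean_q = quantize(clean, codebook)
perturbed_q = quantize(perturbed, codebook)
```

Both inputs map to the same prototype, so the policy downstream sees identical observations with and without the perturbation. The trade-off is resolution: a coarser codebook absorbs larger perturbations but also discards more of the genuine signal.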

翻訳日:2024-11-02 22:48:52 公開日:2024-10-04

# 因果微分ネットワークによる摂動目標予測

Predicting perturbation targets with causal differential networks ( http://arxiv.org/abs/2410.03380v1 )

ライセンス: Link先を確認

Menghua Wu, Umesh Padia, Sean H. Murphy, Regina Barzilay, Tommi Jaakkola,

(参考訳) 生物学的システムの変化の原因となる変数を合理的に同定することで、疾患の理解や細胞工学における無数の応用が可能になる。因果性の観点からは、同じ因果モデルによって生成された2つのデータセット、すなわち観察的(対照)データと介入的(摂動)データが与えられる。目的は、介入の標的となった測定変数(例: 遺伝子)のサブセット、すなわち条件付き独立性が変化した変数を分離することである。因果グラフが分かっていれば探索空間が制限され、これらの変数を効率的に特定できる。しかしながら、未知の介入標的が存在する場合に因果グラフを推定する現在のアルゴリズムは、グラフの組合せ空間と整合的な介入標的の組合せ空間を同時に探索する必要があるため、生物学的データに含まれる数百から数千の変数にはうまくスケールしない。本研究では、この2つの探索ステップを分離する、因果性に着想を得た摂動標的の予測手法を提案する。まず、償却因果探索モデルを用いて、観察データと介入データから別々に因果グラフを推定する。次に、これらのグラフの対を、介入を受けた変数の集合へ写像することを、教師付き学習の枠組みで学習する。このアプローチは、それぞれ数千の測定変数を持つ7つのシングルセルトランスクリプトミクスデータセットにおいて、摂動モデリングのベースラインを一貫して上回る。また、計算上扱いやすいさまざまな合成データセットにおける介入標的の予測において、6つの因果探索アルゴリズムに対する大幅な改善を示す。

Rationally identifying variables responsible for changes to a biological system can enable myriad applications in disease understanding and cell engineering. From a causality perspective, we are given two datasets generated by the same causal model, one observational (control) and one interventional (perturbed). The goal is to isolate the subset of measured variables (e.g. genes) that were the targets of the intervention, i.e. those whose conditional independencies have changed. Knowing the causal graph would limit the search space, allowing us to efficiently pinpoint these variables. However, current algorithms that infer causal graphs in the presence of unknown intervention targets scale poorly to the hundreds or thousands of variables in biological data, as they must jointly search the combinatorial spaces of graphs and consistent intervention targets. In this work, we propose a causality-inspired approach for predicting perturbation targets that decouples the two search steps. First, we use an amortized causal discovery model to separately infer causal graphs from the observational and interventional datasets. Then, we learn to map these paired graphs to the sets of variables that were intervened upon, in a supervised learning framework. This approach consistently outperforms baselines for perturbation modeling on seven single-cell transcriptomics datasets, each with thousands of measured variables. We also demonstrate significant improvements over six causal discovery algorithms in predicting intervention targets across a variety of tractable, synthetic datasets.

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# マシン内のcogs、それがすべきことをする -- WMT24の一般翻訳タスクへのAMIサブミッション

Cogs in a Machine, Doing What They're Meant to Do -- The AMI Submission to the WMT24 General Translation Task ( http://arxiv.org/abs/2410.03381v1 )

ライセンス: Link先を確認

Atli Jasonarson, Hinrik Hafsteinsson, Bjarki Ármannsson, Steinþór Steingrímsson,

(参考訳) 本稿では,WMT24の一般翻訳課題に対する「アルニ・マグヌッソン研究所のチーム」の提出について述べる。我々は、英語からアイスランド語への翻訳の方向について研究している。本システムは4つの翻訳モデルと文法補正モデルから構成される。モデルのトレーニングには、データセットを慎重にキュレートし、システムの出力の品質に有害な可能性のある文ペアを積極的にフィルタリングします。データのいくつかは人間の翻訳から収集され、いくつかは人工的に生成される。合成データの一部がLLMを用いて生成され,システムの翻訳能力が著しく向上することがわかった。

This paper presents the submission of the Árni Magnússon Institute's team to the WMT24 General translation task. We work on the English->Icelandic translation direction. Our system comprises four translation models and a grammar correction model. For training our models we carefully curate our datasets, aggressively filtering out sentence pairs that may detrimentally affect the quality of our system's output. Some of our data are collected from human translations and some are synthetically generated. A part of the synthetic data is generated using an LLM, and we find that it increases the translation capability of our system significantly.

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# ポッツ量子スピン鎖上の閉じ込めと偽真空崩壊

Confinement and false vacuum decay on the Potts quantum spin chain ( http://arxiv.org/abs/2410.03382v1 )

ライセンス: Link先を確認

Octavio Pomponio, Anna Krasznai, Gábor Takács,

(参考訳) 強磁性状態における混合体三状態ポッツ量子鎖における量子クエンチ後の非平衡ダイナミクスを考察する。イジングスピン鎖の類似した設定と比較すると、ポッツモデルはよりリッチな現象論を持ち、これは部分的にスペクトルのバリオン励起から始まり、部分的には初期磁化と長手磁場の様々な相対的アライメントから生じる。半古典的近似と正確な対角化を組み合わせることで励起スペクトルを求め、その結果を用いて観察する様々な動的挙動を説明する。ダイナミックな閉じ込めの回復に加え、イジング連鎖に似たブロッホ振動によるワニエ・スタークの局在は、クエンチ分光におけるバリオン励起の存在を特徴としている。さらに、初期磁化と長手磁場が不一致の場合には、閉じ込めとブロッホ振動の両方が部分的な局所化をもたらすだけであり、いくつかの相関は、エンタングルメントエントロピーの対応する成長とともに、抑制されていない光円錐挙動を保持する。

We consider non-equilibrium dynamics after quantum quenches in the mixed-field three-state Potts quantum chain in the ferromagnetic regime. Compared to the analogous setting for the Ising spin chain, the Potts model has a much richer phenomenology, which originates partly from baryonic excitations in the spectrum and partly from the various possible relative alignments of the initial magnetisation and the longitudinal field. We obtain the excitation spectrum by combining semi-classical approximation and exact diagonalisation, and we use the results to explain the various dynamical behaviours we observe. Besides recovering dynamical confinement, as well as Wannier-Stark localisation due to Bloch oscillations similar to the Ising chain, a novel feature is the presence of baryonic excitations in the quench spectroscopy. In addition, when the initial magnetisation and the longitudinal field are misaligned, both confinement and Bloch oscillations only result in partial localisation, with some correlations retaining an unsuppressed light-cone behaviour together with a corresponding growth of entanglement entropy.

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# てんかん発作の分類から検出へ:脳波信号の深層学習に基づくアプローチ

From Epilepsy Seizures Classification to Detection: A Deep Learning-based Approach for Raw EEG Signals ( http://arxiv.org/abs/2410.03385v1 )

ライセンス: Link先を確認

Davy Darankoum, Manon Villalba, Clelia Allioux, Baptiste Caraballo, Carine Dumont, Eloise Gronlier, Corinne Roucard, Yann Roche, Chloe Habermacher, Sergei Grudinin, Julien Volle,

(参考訳) てんかんは世界で最も有病率の高い神経疾患である。内側側頭葉てんかん(MTLE)患者の3分の1は薬剤耐性を示し、新しい治療法の開発が強く求められている。抗発作薬(ASM)開発の鍵となるのは、脳波(EEG)信号に現れるてんかん発作を検出・定量する能力であり、これは治療効果の評価に不可欠である。本研究では、生の脳波信号に適用した深層学習モデルに基づく発作検出パイプラインを提案する。このパイプラインは、発作活動と非発作活動を事前に区別せずに連続した生の脳波信号をセグメント化する新しい前処理技術、脳波セグメントを再結合して発作の開始/終了の同定を可能にする後処理アルゴリズム、そして予測ラベルと実ラベルの間で発作イベントを厳密に比較する新しい評価手順を統合している。モデルの訓練は、データ漏洩の可能性に対処するデータ分割戦略を用いて行った。発作分類タスクと発作検出タスクの基本的な違いを示し、2つのタスク間の性能差を明らかにした。最後に、畳み込みニューラルネットワークとトランスフォーマーエンコーダを組み合わせた最良のアーキテクチャについて、種を超えた一般化能力を実証した。モデルは動物の脳波で訓練し、ヒトの脳波でテストした結果、バランスの取れたBonnデータセットでF1スコア93%を得た。

Epilepsy is the most prevalent neurological disease in the world. One-third of people suffering from mesial temporal lobe epilepsy (MTLE) exhibit drug resistance, underscoring the need to develop new treatments. A key part of anti-seizure medication (ASM) development is the capability of detecting and quantifying epileptic seizures occurring in electroencephalogram (EEG) signals, which is crucial for evaluating treatment efficacy. In this study, we introduce a seizure detection pipeline based on deep learning models applied to raw EEG signals. This pipeline integrates: a new pre-processing technique that segments continuous raw EEG signals without prior distinction between seizure and seizure-free activities; a post-processing algorithm developed to reassemble EEG segments and allow the identification of seizure start/end; and finally, a new evaluation procedure based on a strict comparison of seizure events between predicted and real labels. Model training was performed using a data splitting strategy that addresses the potential for data leakage. We demonstrate the fundamental differences between a seizure classification task and a seizure detection task and show the differences in performance between the two. Finally, we demonstrate the cross-species generalization capabilities of our best architecture, which combines a Convolutional Neural Network and a Transformer encoder. The model was trained on animal EEGs and tested on human EEGs with an F1-score of 93% on a balanced Bonn dataset.
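
The difference between per-segment classification and event-level detection can be made concrete with a small post-processing sketch. It assumes fixed-length, non-overlapping segments and binary per-segment predictions; the function name `segments_to_events` and the segment length are hypothetical, and the paper's actual reassembly algorithm may differ.

```python
def segments_to_events(labels, segment_seconds):
    """Merge consecutive seizure-labelled segments into (start, end) events.

    labels: per-segment predictions, 1 = seizure, 0 = seizure-free.
    segment_seconds: duration of each segment (assumed constant).
    """
    events = []
    start = None  # index of the first segment of the current event
    for index, label in enumerate(labels):
        if label == 1 and start is None:
            start = index
        elif label == 0 and start is not None:
            events.append((start * segment_seconds, index * segment_seconds))
            start = None
    if start is not None:  # recording ended while a seizure was ongoing
        events.append((start * segment_seconds, len(labels) * segment_seconds))
    return events

# Six 2-second segments: one seizure spans segments 1-2, another segment 5.
events = segments_to_events([0, 1, 1, 0, 0, 1], segment_seconds=2)
```

A classifier is scored on each of the six segments, whereas a detector is scored on whether the two reconstructed events match annotated seizures, which is why the two tasks can report different performance.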

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# 行動データを用いた慢性疾患診断

Chronic Disease Diagnoses Using Behavioral Data ( http://arxiv.org/abs/2410.03386v1 )

ライセンス: Link先を確認

Di Wang, Yidan Hu, Eng Sing Lee, Hui Hwang Teong, Ray Tian Rui Lai, Wai Han Hoi, Chunyan Miao,

(参考訳) 慢性疾患の早期発見は、タイムリーな介入の絶好の機会を提供する点で医療にとって有益である。多くの先行研究は疾患の診断に機械学習(ML)モデルを使うことに成功しているが、それらは医療データに大きく依存しており、慢性疾患の初期段階にあるほとんどの患者ではそうしたデータが乏しい。本稿では、独自に収集した行動データを用いて高血糖症(糖尿病)、高脂血症、高血圧症(総称して3H)を診断し、臨床現場で収集した医療データを用いずに3Hの早期発見を可能にすることを目的とする。具体的には、3ヶ月の調査期間にわたって629人の参加者から日々の行動データを収集し、データの前処理後にさまざまなMLモデルを訓練した。実験の結果、参加者がアップロードした行動データのみを用いて、糖尿病、高脂血症、高血圧についてそれぞれ80.2%、71.3%、81.2%という3H診断を達成できた。さらに、訓練したモデルに対してShapley分析を行い、疾患の種類ごとに最も影響の大きい特徴を特定した。特定された特徴は、文献で報告されている特徴と一致している。

Early detection of chronic diseases is beneficial to healthcare by providing a golden opportunity for timely interventions. Although numerous prior studies have successfully used machine learning (ML) models for disease diagnoses, they rely heavily on medical data, which are scarce for most patients in the early stage of chronic diseases. In this paper, we aim to diagnose hyperglycemia (diabetes), hyperlipidemia, and hypertension (collectively known as 3H) using our own collected behavioral data, thus enabling the early detection of 3H without using medical data collected in clinical settings. Specifically, we collected daily behavioral data from 629 participants over a 3-month study period, and trained various ML models after data preprocessing. Experimental results show that using only the participants' uploaded behavioral data, we can achieve accurate 3H diagnoses: 80.2%, 71.3%, and 81.2% for diabetes, hyperlipidemia, and hypertension, respectively. Furthermore, we conduct Shapley analysis on the trained models to identify the most influential features for each type of disease. The identified influential features are consistent with those reported in the literature.

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# イオン輸送における量子効果:熱力学資源論のアプローチ

Quantum Effects in Ion Transport: A Thermodynamic Resource Theory Approach ( http://arxiv.org/abs/2410.03389v1 )

ライセンス: Link先を確認

Amin Mohammadi, Afshin Shafiee,

(参考訳) 近年、量子状態における熱力学の理解は、ナノスケール物理学や実験技術の発展によって大きな注目を集めている。並行して、成長する証拠は、様々な生物学的プロセスにおける量子効果の重要性を支持し、量子熱力学に関係がますます高まっている。本研究では、熱力学の資源理論を応用し、細胞膜を横断するイオン輸送における量子的性質の役割を解明する。この枠組みの中で、量子的性質は、量子状態における一般化熱力学的制約の下で資源として扱われる。具体的には、イオン輸送力学におけるメモリ効果を反映する非マルコビアン性は、イオン輸送過程の収量と効率を高める重要な量子資源として機能することを示した。対照的に、イオン輸送タンパク質のエネルギー状態の重畳として表される量子コヒーレンスは、これらの指標を減らし、イオンチャネルとイオンポンプを区別する重要な役割を担っている。最後に、余剰コヒーレントシステムを導入することで、コヒーレンスによりイオンポンプのイオンチャネルへの変換が容易になることを示す。

In recent years, understanding thermodynamics in the quantum regime has garnered significant attention, driven by advances in nanoscale physics and experimental techniques. In parallel, growing evidence supports the importance of quantum effects in various biological processes, making them increasingly relevant to quantum thermodynamics. In this study, we apply resource theory formulations of thermodynamics to investigate the role of quantum properties in ion transport across cell membranes. Within this framework, quantum properties are treated as resources under generalized thermodynamic constraints in the quantum regime. Specifically, our findings reveal that non-Markovianity, which reflects memory effects in ion transport dynamics, serves as a key quantum resource that enhances the yield and efficiency of the ion transport process. In contrast, quantum coherence, manifested as the superposition of energy states in ion-transport proteins, reduces these metrics but plays a crucial role in distinguishing between ion channels and ion pumps: two distinct types of ion-transport proteins in cell membranes. Finally, we demonstrate that introducing an additional coherent system allows coherence to facilitate the transformation of an ion pump into an ion channel.

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# Lightning UQ Box: ディープラーニングにおける不確実性定量化のための総合的なフレームワーク

Lightning UQ Box: A Comprehensive Framework for Uncertainty Quantification in Deep Learning ( http://arxiv.org/abs/2410.03390v1 )

ライセンス: Link先を確認

Nils Lehmann, Jakob Gawlikowski, Adam J. Stewart, Vytautas Jancauskas, Stefan Depeweg, Eric Nalisnick, Nina Maria Gottschling,

(参考訳) 不確実性定量化(Uncertainty Quantification, UQ)は、DNNの出力に信頼度を付与するため、ディープニューラルネットワーク(DNN)を現実世界のタスクに適用するうえで不可欠なツールである。しかし、その利点にもかかわらず、既存のUQ手法を適用・評価するには追加の技術知識が必要なため、UQは標準的なDNNワークフローから外されることが多い。したがって、大きなオーバーヘッドなしにUQをモデリングワークフローに統合できる包括的なツールボックスが必要である。本稿では、UQに対するさまざまなアプローチを適用・評価するための統一インターフェースであるLightning UQ Boxを紹介する。本稿では、ツールボックスに実装された幅広い最先端のUQ手法を理論的・定量的に比較する。2つの挑戦的なビジョンタスクに焦点を当てる。(i) 赤外線衛星画像からの熱帯低気圧の風速推定、(ii) 空のRGB画像からの太陽電池パネルの出力推定である。手法間の違いを明らかにすることで、我々の結果は、UQ手法のベンチマークに使用できる、幅広く取り組みやすいUQ実験フレームワークの必要性を示している。ツールボックス、実装例、その他の情報は https://github.com/lightning-uq-box/lightning-uq-box で確認できる。

Uncertainty quantification (UQ) is an essential tool for applying deep neural networks (DNNs) to real-world tasks, as it attaches a degree of confidence to DNN outputs. However, despite its benefits, UQ is often left out of the standard DNN workflow due to the additional technical knowledge required to apply and evaluate existing UQ procedures. Hence there is a need for a comprehensive toolbox that allows the user to integrate UQ into their modelling workflow without significant overhead. We introduce Lightning UQ Box: a unified interface for applying and evaluating various approaches to UQ. In this paper, we provide a theoretical and quantitative comparison of the wide range of state-of-the-art UQ methods implemented in our toolbox. We focus on two challenging vision tasks: (i) estimating tropical cyclone wind speeds from infrared satellite imagery and (ii) estimating the power output of solar panels from RGB images of the sky. By highlighting the differences between methods, our results demonstrate the need for a broad and approachable experimental framework for UQ that can be used for benchmarking UQ methods. The toolbox, example implementations, and further information are available at: https://github.com/lightning-uq-box/lightning-uq-box

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# 線形偏光に対する量子回路の複雑さ

Quantum circuit complexity for linearly polarised light ( http://arxiv.org/abs/2410.03391v1 )

ライセンス: Link先を確認

E. M. F. Curado, S. Faci, J. P. Gazeau, T. Koide, A. C. Maioli, D. Noguera,

(参考訳) 本研究では、開放系に拡張される量子回路複雑性の一形式について検討する。方法論を説明するために、状態の射影ヒルベルト空間がユークリッド平面上の向きの集合で表される基本モデルに焦点を当てる。具体的には、混合量子状態がゲート列と相互作用する際のダイナミクスを調べる。我々のアプローチでは、実 $2\times2$ 密度行列の列を解析する。この数学的モデルは、準単色光線の直線偏光を記述するストークス密度行列と、量子偏光子とみなされるゲート(その状態も実 $2\times2$ 密度行列である)によって物理的に例示される。偏光子と直線偏光との相互作用は、この量子形式の文脈で解釈される。光の各密度行列は、連続するゲートの間の時間間隔において、Gorini-Kossakowski-Lindblad-Sudarshan(GKLS)過程に類似した形で発展する。特に、コスト関数、許容誤差、あるいは精度に上限を設けると、最適なゲート数がべき乗則に従うことが分かる。

In this study, we explore a form of quantum circuit complexity that extends to open systems. To illustrate our methodology, we focus on a basic model where the projective Hilbert space of states is depicted by the set of orientations in the Euclidean plane. Specifically, we investigate the dynamics of mixed quantum states as they undergo interactions with a sequence of gates. Our approach involves the analysis of sequences of real $2\times2$ density matrices. This mathematical model is physically exemplified by the Stokes density matrices, which delineate the linear polarisation of a quasi-monochromatic light beam, and the gates, which are viewed as quantum polarisers, whose states are also real $2\times2$ density matrices. The interaction between a polariser and linearly polarised light is construed within the context of this quantum formalism. Each density matrix for the light evolves in a manner analogous to a Gorini-Kossakowski-Lindblad-Sudarshan (GKLS) process during the time interval between consecutive gates. Notably, when considering an upper limit for the cost function or tolerance or accuracy, we find that the optimal number of gates follows a power-law relationship.

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# 一石で二匹のハエを殺す: 英語→アイスランド語のイディオムと固有名を使ってLLMを壊そうとする試み

Killing Two Flies with One Stone: An Attempt to Break LLMs Using English->Icelandic Idioms and Proper Names ( http://arxiv.org/abs/2410.03394v1 )

ライセンス: Link先を確認

Bjarki Ármannsson, Hinrik Hafsteinsson, Atli Jasonarson, Steinþór Steingrímsson,

(参考訳) 本稿では、WMT24テストスイートのサブタスクへのÁrni Magnússon Instituteチームの提出について述べる。英語からアイスランド語への翻訳方向における慣用表現と固有名に焦点を当てている。直感的にも経験的にも、慣用句と固有名は現代の翻訳モデルにとって大きな課題であることが知られている。我々は2つの異なるテストスイートを作成した。1つ目は、一般的な英語の慣用表現を翻訳するMTシステムの能力を評価するとともに、同じフレーズが文字通りの文脈で使われた場合にシステムがそれらを区別できるかどうかをテストする。2つ目のテストスイートは、アイスランド語のエクソニム(外名)に翻訳され、かつ正しく語形変化されるべき地名と、男性形と女性形で表層形を共有するアイスランド語の名前の対からなり、誤った翻訳は読みやすさだけでなく意味にも影響を及ぼす。報告されたスコアは、特に慣用表現と地名について比較的低く、改善の余地がかなりあることを示している。

This paper presents the submission of the Árni Magnússon Institute's team to the WMT24 test suite subtask, focusing on idiomatic expressions and proper names for the English->Icelandic translation direction. Intuitively and empirically, idioms and proper names are known to be a significant challenge for modern translation models. We create two different test suites. The first evaluates the competency of MT systems in translating common English idiomatic expressions, as well as testing whether systems can distinguish between those expressions and the same phrases when used in a literal context. The second test suite consists of place names that should be translated into their Icelandic exonyms (and correctly inflected) and pairs of Icelandic names that share a surface form between the male and female variants, so that incorrect translations impact meaning as well as readability. The scores reported are relatively low, especially for idiomatic expressions and place names, and indicate considerable room for improvement.

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# GraphCroc: グラフ構造再構築のための相互相関オートエンコーダ

GraphCroc: Cross-Correlation Autoencoder for Graph Structural Reconstruction ( http://arxiv.org/abs/2410.03396v1 )

ライセンス: Link先を確認

Shijin Duan, Ruyi Ding, Jiaxing He, Aidong Adam Ding, Yunsi Fei, Xiaolin Xu,

(参考訳) グラフ構造化データは多くのアプリケーションに不可欠であり、さまざまなグラフ表現法の開発を促している。特にグラフオートエンコーダ(GAE)は、ノード埋め込みからグラフ構造を再構築する。現在のGAEモデルは、主に自己相関を利用してグラフ構造を表現し、ノードレベルのタスクに焦点を当てており、マルチグラフのシナリオを見落としがちである。我々の理論的分析は、自己相関では一般に、島、対称構造、有向エッジといった特定のグラフの特徴を、特に小さなグラフや複数グラフの文脈において正確に表現できないことを示している。これらの制約に対処するために、GAEの表現能力を大幅に向上させる相互相関機構を導入する。さらに、さまざまな下流タスクに合わせた柔軟なエンコーダアーキテクチャをサポートし、ミラー化されたエンコード・デコード処理によって堅牢な構造再構築を保証する新しいGAEであるGraphCrocを提案する。このモデルは、損失バランス戦略を実装することで、最適化中の表現バイアスの課題にも対処する。理論的解析と数値評価の両方により、我々の手法がグラフ構造再構築において既存の自己相関ベースのGAEを大幅に上回ることが示されている。

Graph-structured data is integral to many applications, prompting the development of various graph representation methods. Graph autoencoders (GAEs), in particular, reconstruct graph structures from node embeddings. Current GAE models primarily utilize self-correlation to represent graph structures and focus on node-level tasks, often overlooking multi-graph scenarios. Our theoretical analysis indicates that self-correlation generally falls short in accurately representing specific graph features such as islands, symmetrical structures, and directional edges, particularly in smaller or multiple graph contexts. To address these limitations, we introduce a cross-correlation mechanism that significantly enhances the GAE representational capabilities. Additionally, we propose GraphCroc, a new GAE that supports flexible encoder architectures tailored for various downstream tasks and ensures robust structural reconstruction, through a mirrored encoding-decoding process. This model also tackles the challenge of representation bias during optimization by implementing a loss-balancing strategy. Both theoretical analysis and numerical evaluations demonstrate that our methodology significantly outperforms existing self-correlation-based GAEs in graph structure reconstruction.
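
One of the limitations named in the abstract, directional edges, can be seen in a few lines. The sketch assumes the common inner-product decoder, sigmoid(Z Zᵀ), for self-correlation, and a two-embedding decoder, sigmoid(P Qᵀ), as a stand-in for cross-correlation; the exact decoder in GraphCroc is not specified in the abstract, and the embeddings below are made-up values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(left, right):
    """Edge probabilities sigmoid(left @ right^T) from two embedding lists."""
    return [[sigmoid(sum(a * b for a, b in zip(u, v))) for v in right]
            for u in left]

def is_symmetric(matrix, tol=1e-12):
    n = len(matrix)
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol
               for i in range(n) for j in range(n))

# Hypothetical node embeddings for a 3-node graph.
z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]
q = [[0.0, 2.0], [-2.0, 0.0], [2.0, 2.0]]

self_corr = decode(z, z)   # always symmetric: cannot encode edge direction
cross_corr = decode(p, q)  # entry (i, j) may differ from entry (j, i)
```

Because the dot product is commutative, the self-correlation output is symmetric for any choice of `z`, so an edge 1→0 without 0→1 cannot be represented; with separate `p` and `q` the two directions receive different scores.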

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# EBES: イベントシーケンスのベンチマークを容易にする

EBES: Easy Benchmarking for Event Sequences ( http://arxiv.org/abs/2410.03399v1 )

ライセンス: Link先を確認

Dmitry Osin, Igor Udovichenko, Viktor Moskvoretskii, Egor Shvetsov, Evgeny Burnaev,

(参考訳) イベントシーケンスは、不規則なサンプリング間隔とカテゴリと数値の混合によって特徴づけられ、医療、ファイナンス、ユーザーインタラクションログといった様々な現実世界のドメインで一般的なデータ構造である。時間データモデリング技術の進歩にもかかわらず、イベントシーケンスのパフォーマンスを評価するための標準ベンチマークは存在しない。これは、様々な評価プロトコルによって異なる論文間での結果の比較を複雑にし、この分野の進歩を誤解させる可能性がある。本稿では,標準化された評価シナリオとプロトコルを備えた総合的なベンチマークツールEBESを紹介する。私たちのライブラリは、統一インターフェースによるベンチマーク、データセットの追加、メソッド統合を簡単にします。それは、新しい合成データセットを含み、公開可能な最大の銀行用データセットを含む、前処理された現実世界のデータセットを提供する。この結果から,データセットの詳細な分析を行い,モデル比較に不適なデータセットを同定した。本稿では、時間的およびシーケンシャルなコンポーネントのモデリングの重要性と、モデルの堅牢性とスケーリング特性について考察する。これらの知見は今後の研究の方向性を浮き彫りにしている。本ベンチマークの目的は,再現可能な研究の促進,進歩の迅速化,実環境への影響の増大である。

Event sequences, characterized by irregular sampling intervals and a mix of categorical and numerical features, are common data structures in various real-world domains such as healthcare, finance, and user interaction logs. Despite advances in temporal data modeling techniques, there are no standardized benchmarks for evaluating their performance on event sequences. This complicates the comparison of results across papers because evaluation protocols vary, and it can give a misleading picture of progress in the field. We introduce EBES, a comprehensive benchmarking tool with standardized evaluation scenarios and protocols, focusing on regression and classification problems with sequence-level targets. Our library simplifies benchmarking, dataset addition, and method integration through a unified interface. It includes a novel synthetic dataset and provides preprocessed real-world datasets, including the largest publicly available banking dataset. Our results provide an in-depth analysis of datasets, identifying some as unsuitable for model comparison. We investigate the importance of modeling temporal and sequential components, as well as the robustness and scaling properties of the models. These findings highlight potential directions for future research. Our benchmark aims to facilitate reproducible research, expediting progress and increasing real-world impact.

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# 分散ネットワーク型マルチタスク学習

Distributed Networked Multi-task Learning ( http://arxiv.org/abs/2410.03403v1 )

ライセンス: Link先を確認

Lingzhou Hong, Alfredo Garcia,

(参考訳) 異種および/または相関したデータストリームを含む複数の線形モデル推定タスクを考慮に入れた分散マルチタスク学習方式を提案する。ノードを異なる学習タスクに対応するグループに分割し、有向ネットワークトポロジに従って通信することができると仮定する。各ノードは、線形モデルを非同期に推定し、それぞれ雑音低減と一般化性能の向上を目的とした局所(グループ内)正則化と大域(グループ間)正則化の条件を満たす。本稿では,推定器の収束度とタスク関係を有限時間で評価し,ランダム場温度推定と,異なる学区の学生のパフォーマンスのモデル化という2つの例において,スキームの一般適用性を説明する。

We consider a distributed multi-task learning scheme that accounts for multiple linear model estimation tasks with heterogeneous and/or correlated data streams. We assume that nodes can be partitioned into groups corresponding to different learning tasks and communicate according to a directed network topology. Each node estimates a linear model asynchronously and is subject to local (within-group) regularization and global (across groups) regularization terms targeting noise reduction and generalization performance improvement respectively. We provide a finite-time characterization of convergence of the estimators and task relation and illustrate the scheme's general applicability in two examples: random field temperature estimation and modeling student performance from different academic districts.

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# Camel: 差分プライバシのシャッフルモデルにおける、通信効率が高く悪意ある攻撃者に対して安全なフェデレーション学習

Camel: Communication-Efficient and Maliciously Secure Federated Learning in the Shuffle Model of Differential Privacy ( http://arxiv.org/abs/2410.03407v1 )

ライセンス: Link先を確認

Shuangqing Xu, Yifeng Zheng, Zhongyun Hua,

(参考訳) フェデレーション学習(FL)は、複数のクライアントが、ローカルのプライベートデータを公開することなく、集約のための勾配更新のみを共有することでモデルを共同で訓練できる魅力的なパラダイムとして急速に普及している。それ自体がプライバシーに敏感でありうる勾配更新を保護するため、形式的なプライバシー保証を与えるローカル差分プライバシー(LDP)メカニズムの研究が進められてきた。LDPメカニズムでは、クライアントは集約のために共有する前に、勾配更新に局所的に摂動を加える。しかし、そのような手法は、大きなノイズの付加によりモデルの有用性を著しく低下させることが知られている。より良いプライバシーと有用性のトレードオフを実現するために、最近の傾向として、摂動を加えた勾配更新に対する中間的なシャッフル操作によってプライバシー増幅を実現する、DPのシャッフルモデルをFLに適用する動きがある。この流れを受けて、本稿では、DPのシャッフルモデルにおける、通信効率が高く悪意ある攻撃者に対して安全な新しいFLフレームワークであるCamelを提案する。Camelはまず、シャッフル計算の完全性チェックを意欲的にサポートし、悪意ある攻撃者に対する安全性を達成する点で既存研究と異なる。具体的には、Camelは近年注目されている暗号プリミティブである秘密分散シャッフルを基盤とし、システム全体の通信効率を最適化するための独自技術と、サーバ側計算の安全性を強化する軽量な完全性チェックのための独自技術を開発している。さらに、FLプロセス全体のRenyi差分プライバシー(RDP)を解析することにより、プライバシー損失に対する大幅に厳密な上界を導出する。広範な実験により、Camelは最先端の研究よりも優れたプライバシーと有用性のトレードオフを、有望な性能とともに達成することが示された。

Federated learning (FL) has rapidly become a compelling paradigm that enables multiple clients to jointly train a model by sharing only gradient updates for aggregation, without revealing their local private data. To protect the gradient updates, which could themselves be privacy-sensitive, a line of work has studied local differential privacy (LDP) mechanisms that provide a formal privacy guarantee. With LDP mechanisms, clients locally perturb their gradient updates before sharing them for aggregation. However, such approaches are known to greatly degrade model utility because of the heavy noise they add. To enable a better privacy-utility tradeoff, a recently emerging trend is to apply the shuffle model of DP in FL, which relies on an intermediate shuffling operation on the perturbed gradient updates to achieve privacy amplification. Following this trend, in this paper, we present Camel, a new communication-efficient and maliciously secure FL framework in the shuffle model of DP. Camel first departs from existing works by ambitiously supporting integrity checks for the shuffle computation, achieving security against a malicious adversary. Specifically, Camel builds on the trending cryptographic primitive of secret-shared shuffle, with custom techniques we develop for optimizing system-wide communication efficiency, and for lightweight integrity checks to harden the security of server-side computation. In addition, we derive a significantly tighter bound on the privacy loss by analyzing the Renyi differential privacy (RDP) of the overall FL process. Extensive experiments demonstrate that Camel achieves better privacy-utility trade-offs than the state-of-the-art work, with promising performance.

翻訳日:2024-11-02 22:39:00 公開日:2024-10-04

# 決定変換器の予測符号化

Predictive Coding for Decision Transformer ( http://arxiv.org/abs/2410.03408v1 )

ライセンス: Link先を確認

Tung M. Luu, Donghoon Lee, Chang D. Yoo,

(参考訳) オフライン強化学習(RL)における最近の研究は、リターン条件付き教師付き学習として意思決定を定式化する効果を実証している。特に、決定変換器(DT)アーキテクチャは、様々な領域で約束されている。しかし、初期の成功にもかかわらず、DTはゴール条件付きRLのいくつかの挑戦的なデータセットでは性能が劣っている。この制限は、特に非構造的、最適でないデータセットにおいて、政策学習を導くためのリターン条件付けの非効率性に起因し、DTは時間的構成性を効果的に学習することができない。さらに、この問題は長期のスパース・リワードタスクでさらに悪化する可能性がある。この課題に対処するために、一般化された将来の条件付けを活用してDT手法を強化するPCDT(Predictive Coding for Decision Transformer)フレームワークを提案する。 PCDTはDTフレームワークを拡張し、予測的なコーディングを条件に、過去と未来の両方の要因に基づいた意思決定を可能にし、一般化を改善するアーキテクチャを利用する。提案手法は,AntMaze環境とFrankaKitchen環境の8つのデータセットに対する広範な実験を通じて,オフラインゴール条件RLにおける既存の値ベースおよびトランスフォーマーベースの手法に匹敵する性能を実現する。さらに,本手法を物理ロボットを用いた目標達成作業でも評価する。

Recent work in offline reinforcement learning (RL) has demonstrated the effectiveness of formulating decision-making as return-conditioned supervised learning. Notably, the decision transformer (DT) architecture has shown promise across various domains. However, despite its initial success, DTs have underperformed on several challenging datasets in goal-conditioned RL. This limitation stems from the inefficiency of return conditioning for guiding policy learning, particularly in unstructured and suboptimal datasets, resulting in DTs failing to effectively learn temporal compositionality. Moreover, this problem might be further exacerbated in long-horizon sparse-reward tasks. To address this challenge, we propose the Predictive Coding for Decision Transformer (PCDT) framework, which leverages generalized future conditioning to enhance DT methods. PCDT utilizes an architecture that extends the DT framework, conditioned on predictive codings, enabling decision-making based on both past and future factors, thereby improving generalization. Through extensive experiments on eight datasets from the AntMaze and FrankaKitchen environments, our proposed method achieves performance on par with or surpassing existing popular value-based and transformer-based methods in offline goal-conditioned RL. Furthermore, we also evaluate our method on a goal-reaching task with a physical robot.

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# 代理に基づくヒューリスティック最適化のための回帰対ペアワイズモデルの比較研究

Comparative study of regression vs pairwise models for surrogate-based heuristic optimisation ( http://arxiv.org/abs/2410.03409v1 )

ライセンス: Link先を確認

Pablo S. Naharro, Pablo Toharia, Antonio LaTorre, José-María Peña,

(参考訳) ヒューリスティック最適化アルゴリズムは、解をサンプリングし、その適合性を評価し、有望な解の方向に探索をバイアスすることで探索空間を探索する。しかし、多くの場合、この適合度関数は高価な計算処理を実行し、合理的な評価数を劇的に削減する。この文脈では、シュロゲートモデルはこれらの計算問題を緩和するための優れた代替品として現れている。本稿では,サロゲート問題の定式化を,適合度(表面サロゲートモデル)を近似する回帰モデルと,分類モデル(ペアワイズサロゲートモデル)を結合する新しい方法の両方として扱う。ペアワイズアプローチは、例えば差分進化(differial Evolution)のように、実際に探索を駆動するために適合値を必要としないアルゴリズムによって直接利用することができ、ある解が他の解より優れているかどうかを知るのに十分である。これらのモデリングアプローチに基づいて、異なる機械学習アルゴリズム(正規化回帰、ニューラルネットワーク、決定木、ブースティングメソッド、ランダムフォレスト)、異なる代理戦略(多様性の促進や予測しきい値の緩和)など、異なる構成下で代理モデルを多次元的に分析し、表面および対の代理モデルと比較した。論文の実験的部分には、SOCO2011コンペティションで提案されている連続最適化のベンチマーク問題と、最近のGECCO2021産業課題に含まれるシミュレーション問題が含まれている。本稿では,オンライン機械学習に基づくサロゲートモデルを用いた場合,全体の探索性能は,予測モデルの精度だけでなく,肯定的・否定的事例に対するバイアスの種類や,それらの予測を用いて実際のフィットネス機能を実行するかを決定する方法にも依存することを示す。

Heuristic optimisation algorithms explore the search space by sampling solutions, evaluating their fitness, and biasing the search in the direction of promising solutions. However, in many cases, this fitness function involves executing expensive computational calculations, drastically reducing the reasonable number of evaluations. In this context, surrogate models have emerged as an excellent alternative to alleviate these computational problems. This paper addresses the formulation of surrogate problems as both regression models that approximate fitness (surface surrogate models) and a novel way to connect classification models (pairwise surrogate models). The pairwise approach can be directly exploited by some algorithms, such as Differential Evolution, in which the fitness value is not actually needed to drive the search, and it is sufficient to know whether a solution is better than another one or not. Based on these modelling approaches, we have conducted a multidimensional analysis of surrogate models under different configurations: different machine learning algorithms (regularised regression, neural networks, decision trees, boosting methods, and random forests), different surrogate strategies (encouraging diversity or relaxing prediction thresholds), and compare them for both surface and pairwise surrogate models. The experimental part of the article includes the benchmark problems already proposed for the SOCO2011 competition in continuous optimisation and a simulation problem included in the recent GECCO2021 Industrial Challenge. This paper shows that the performance of the overall search, when using online machine learning-based surrogate models, depends not only on the accuracy of the predictive model but also on both the kind of bias towards positive or negative cases and how the optimisation uses those predictions to decide whether to execute the actual fitness function.
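
The pairwise formulation described above can be sketched as a classifier over difference vectors that answers only "is a better than b?", which is all a Differential Evolution replacement step needs. The sketch assumes minimisation, a toy linear fitness, and a plain perceptron as the classifier; the names `train_pairwise`, `predicts_better`, `select`, and `toy_fitness` are hypothetical, and the paper evaluates other learners and strategies.

```python
from itertools import permutations

def toy_fitness(x):
    """Stand-in for an expensive objective (to be minimised)."""
    return x[0] + x[1]

def train_pairwise(archive, epochs=100):
    """Perceptron on a - b; label +1 when a has the lower (better) fitness."""
    weights = [0.0] * len(archive[0][0])
    pairs = [(a, b, 1 if fa < fb else -1)
             for (a, fa), (b, fb) in permutations(archive, 2) if fa != fb]
    for _ in range(epochs):
        mistakes = 0
        for a, b, label in pairs:
            diff = [ai - bi for ai, bi in zip(a, b)]
            if label * sum(w * d for w, d in zip(weights, diff)) <= 0:
                weights = [w + label * d for w, d in zip(weights, diff)]
                mistakes += 1
        if mistakes == 0:  # every archived pair is ordered correctly
            break
    return weights

def predicts_better(weights, a, b):
    return sum(w * (ai - bi) for w, ai, bi in zip(weights, a, b)) > 0

def select(parent, parent_fit, trial, fitness, weights):
    """Run the true fitness only when the surrogate favours the trial.

    Returns (survivor, survivor_fitness, fitness_was_evaluated).
    """
    if not predicts_better(weights, trial, parent):
        return parent, parent_fit, False
    trial_fit = fitness(trial)
    if trial_fit < parent_fit:
        return trial, trial_fit, True
    return parent, parent_fit, True

points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
archive = [(p, toy_fitness(p)) for p in points]
weights = train_pairwise(archive)
```

In `select`, a trial that the surrogate ranks below its parent is discarded without an evaluation, so a surrogate biased towards "worse" saves evaluations but may reject good trials; this is the kind of bias the abstract says matters alongside accuracy.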

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# 合成関係データの忠実度と実用性のベンチマーク

Benchmarking the Fidelity and Utility of Synthetic Relational Data ( http://arxiv.org/abs/2410.03411v1 )

ライセンス: Link先を確認

Valter Hudovernik, Martin Jurkovič, Erik Štrumbelj,

(参考訳) リレーショナルデータの合成は、研究者、実践者、業界からより多くの注目を集め始めています。このタスクは、テーブル間の関係が複雑になるため、単一のテーブルを合成するよりも難しい。同じ理由から、リレーショナルデータを合成するためのベンチマーク手法は、新しい課題をもたらす。我々の研究は、最先端の手法の実証的な評価の欠如と、そのような評価をどのように行うべきかの理解のギャップによって動機付けられている。我々は、関係データ合成、共通ベンチマークデータセット、および合成データの忠実性と有用性を測定するためのアプローチに関する関連研究についてレビューする。ベストプラクティスと新しい堅牢な検出アプローチをベンチマークツールに組み合わせ、それを2つの商用ツールを含む6つの方法の比較に使用します。一部のメソッドは他のメソッドよりも優れているが、元のデータと区別できないデータセットを合成する手段はない。実用面では、モデル予測性能と特徴量の両方において、実データと合成データの適度な相関が観察されるのが一般的である。

Synthesizing relational data has started to receive more attention from researchers, practitioners, and industry. The task is more difficult than synthesizing a single table due to the added complexity of relationships between tables. For the same reason, benchmarking methods for synthesizing relational data introduces new challenges. Our work is motivated by a lack of an empirical evaluation of state-of-the-art methods and by gaps in the understanding of how such an evaluation should be done. We review related work on relational data synthesis, common benchmarking datasets, and approaches to measuring the fidelity and utility of synthetic data. We combine the best practices and a novel robust detection approach into a benchmarking tool and use it to compare six methods, including two commercial tools. While some methods are better than others, no method is able to synthesize a dataset that is indistinguishable from original data. For utility, we typically observe moderate correlation between real and synthetic data for both model predictive performance and feature importance.

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# Team MTS @ AutoMin 2021: 既存の要約手法の概要と教師なし要約手法との比較

Team MTS @ AutoMin 2021: An Overview of Existing Summarization Approaches and Comparison to Unsupervised Summarization Techniques ( http://arxiv.org/abs/2410.03412v1 )

ライセンス: Link先を確認

Olga Iakovenko, Anna Andreeva, Anna Lapidus, Liana Mikaelyan,

(参考訳) ビデオやオーディオ会議による遠隔コミュニケーションは、世界規模のパンデミックにより、これまで以上に人気が高まっている。これらの出来事は、音声言語の議事録を自動作成するシステムの開発を促し、AutoMin 2021チャレンジにつながった。本論文は、Automatic Minutes チャレンジに参加したチーム MTS が実施した研究の成果について説明する。本稿では,テキストと音声の要約に対する既存のアプローチを解析し,クラスタリングに基づく教師なし要約手法を提案し,実環境の録音で動作するよう適応させた自動音声認識ブロックを含むパイプラインを提供する。提案した教師なし手法は,自動議事録作成タスクにおいて事前学習済みの要約モデルを上回り,開発セットではRouge 1, Rouge 2, Rouge Lがそれぞれ0.21, 0.02, 0.2,テストセットではRouge 1, Rouge 2, Rouge L, Adequacy, Grammatical correctness, Fluencyがそれぞれ0.180, 0.035, 0.098, 1.857, 2.304, 1.911であった。

Remote communication through video or audio conferences has become more popular than ever because of the worldwide pandemic. These events have therefore provoked the development of systems for automatic minuting of spoken language, leading to the AutoMin 2021 challenge. This paper presents the results of the research that team MTS carried out while participating in the Automatic Minutes challenge. In particular, we analyze existing approaches to text and speech summarization, propose an unsupervised summarization technique based on clustering, and provide a pipeline that includes an adapted automatic speech recognition block able to run on real-life recordings. The proposed unsupervised technique outperforms pre-trained summarization models on the automatic minuting task, with Rouge 1, Rouge 2 and Rouge L values of 0.21, 0.02 and 0.2 on the dev set, and with Rouge 1, Rouge 2, Rouge L, Adequacy, Grammatical correctness and Fluency values of 0.180, 0.035, 0.098, 1.857, 2.304 and 1.911 on the test set, respectively.

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# セキュアなキーリースのためのシンプルなフレームワーク

A Simple Framework for Secure Key Leasing ( http://arxiv.org/abs/2410.03413v1 )

ライセンス: Link先を確認

Fuyuki Kitagawa, Tomoyuki Morimae, Takashi Yamakawa,

(参考訳) セキュアな鍵リース(鍵取り消し可能暗号とも呼ばれる)により、暗号鍵を量子状態としてリースし、後でその鍵を検証可能な方法で取り消すことができる。本稿では,BB84状態の認証付き削除(certified deletion)特性を利用して,セキュアな鍵リース付きの暗号プリミティブを構築するための簡単なフレームワークを提案する。このフレームワークに基づき、以下のスキームを得る。 - 任意のIND-CPA安全な公開鍵暗号スキームに基づく、古典的な取り消しを持つセキュアな鍵リース付き公開鍵暗号スキーム。先行研究は、量子的な取り消しか、LWE(learning with errors)問題の量子困難性のようなより強い仮定に依存していた。 - 一方向性関数に基づく、古典的な取り消しを持つセキュアな鍵リース付き擬似ランダム関数。先行研究は、LWE問題の量子困難性のようなより強い仮定に依存していた。 - 短整数解(SIS)問題の量子困難性に基づく、古典的な取り消しを持つセキュアな鍵リース付きデジタル署名スキーム。我々の構成は静的な署名鍵を持つ。つまり、署名鍵の状態は署名の前後でほとんど変化しない。先行する構成は、非静的な署名鍵に依存するか、コピー保護というより強い目標を達成するために識別不可能性難読化に依存していた。さらに、敵が有効な削除証明書を提出した後に取り消し用の検証鍵が漏洩しても、我々のスキームはすべて安全性を保つ。我々の知る限り、この設定では先行する構成はすべて完全に破られる。加えて、我々の見解では、我々の安全性証明は既存のスキームのものよりもはるかに単純である。

Secure key leasing (a.k.a. key-revocable cryptography) enables us to lease a cryptographic key as a quantum state in such a way that the key can be later revoked in a verifiable manner. We propose a simple framework for constructing cryptographic primitives with secure key leasing via the certified deletion property of BB84 states. Based on our framework, we obtain the following schemes. - A public key encryption scheme with secure key leasing that has classical revocation based on any IND-CPA secure public key encryption scheme. Prior works rely on either quantum revocation or stronger assumptions such as the quantum hardness of the learning with errors (LWE) problem. - A pseudorandom function with secure key leasing that has classical revocation based on one-way functions. Prior works rely on stronger assumptions such as the quantum hardness of the LWE problem. - A digital signature scheme with secure key leasing that has classical revocation based on the quantum hardness of the short integer solution (SIS) problem. Our construction has static signing keys, i.e., the state of a signing key almost does not change before and after signing. Prior constructions either rely on non-static signing keys or indistinguishability obfuscation to achieve a stronger goal of copy-protection. In addition, all of our schemes remain secure even if a verification key for revocation is leaked after the adversary submits a valid certificate of deletion. To our knowledge, all prior constructions are totally broken in this setting. Moreover, in our view, our security proofs are much simpler than those for existing schemes.

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# 単一ベクトルアブレーションによる言語モデルにおける偽の拒絶の軽減

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation ( http://arxiv.org/abs/2410.03415v1 )

ライセンス: Link先を確認

Xinpeng Wang, Chengzhi Hu, Paul Röttger, Barbara Plank,

(参考訳) 言語モデルを有用かつ無害になるよう訓練するには、拒絶行動の慎重な調整が必要である。モデルは、悪意のある指示に従うことや有害なアドバイスを与えること(例: 「どうやって人を殺すのか」)は拒否すべきだが、安全でない要求に表面的に似ているだけの安全な要求(例: 「どうやってPythonプロセスをkillするのか」)を拒否すべきではない。このような偽の拒絶を避けることは、先行研究が示すように、高性能な言語モデルでさえ困難である。本稿では,単一ベクトルアブレーションによって言語モデルの偽の拒絶を緩和する,簡易かつ外科的な手法を提案する。与えられたモデルに対して偽の拒絶ベクトルを抽出し、このベクトルをアブレーション(除去)することで、モデルの安全性や一般的な能力に悪影響を及ぼすことなく、偽の拒絶率を低減できることを示す。また,本手法はモデル安全性のきめ細かい調整にも利用できることを示す。提案手法はトレーニング不要かつモデル非依存であり、現在および将来の言語モデルにおける偽の拒絶の問題を軽減するのに有用である。

Training a language model to be both helpful and harmless requires careful calibration of refusal behaviours: Models should refuse to follow malicious instructions or give harmful advice (e.g. "how do I kill someone?"), but they should not refuse safe requests, even if they superficially resemble unsafe ones (e.g. "how do I kill a Python process?"). Avoiding such false refusal, as prior work has shown, is challenging even for highly-capable language models. In this paper, we propose a simple and surgical method for mitigating false refusal in language models via single vector ablation. For a given model, we extract a false refusal vector and show that ablating this vector reduces false refusal rate without negatively impacting model safety and general model capabilities. We also show that our approach can be used for fine-grained calibration of model safety. Our approach is training-free and model-agnostic, making it useful for mitigating the problem of false refusal in current and future language models.
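The abstract does not spell out how the vector is extracted or ablated. As a rough sketch of what ablating a single direction from a hidden state commonly means, the snippet below removes the component of a vector along a given direction, h' = h - (h · v̂) v̂. The names `ablate_direction` and `refusal_dir` and the toy numbers are hypothetical, not the authors' implementation.

```python
import math


def ablate_direction(hidden, direction):
    """Return `hidden` with its component along `direction` projected out.

    Computes h - (h . v_hat) * v_hat, where v_hat is the unit-normalised direction.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    coeff = sum(h * u for h, u in zip(hidden, unit))
    return [h - coeff * u for h, u in zip(hidden, unit)]


# Toy example: a 3-dimensional "hidden state" and a hypothetical refusal direction.
hidden = [2.0, 1.0, -1.0]
refusal_dir = [1.0, 0.0, 0.0]
ablated = ablate_direction(hidden, refusal_dir)
```

After the projection the result has no component along `refusal_dir`, while the components orthogonal to it are unchanged. In a real model the same operation would be applied to activations at chosen layers, which this sketch does not attempt.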

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# Img2CAD:構造的視覚幾何学を用いた単一画像からの3次元CADモデル生成

Img2CAD: Conditioned 3D CAD Model Generation from Single Image with Structured Visual Geometry ( http://arxiv.org/abs/2410.03417v1 )

ライセンス: Link先を確認

Tianrun Chen, Chunan Yu, Yuanqi Hu, Jing Li, Tao Xu, Runlong Cao, Lanyun Zhu, Ying Zang, Yong Zhang, Zejian Li, Linyun Sun,

(参考訳) 本稿では,2次元画像入力から編集可能なパラメータを持つCADモデルを生成する,我々の知る限り初めてのアプローチであるImg2CADを提案する。テキストや画像入力を用いる既存の3Dモデル生成AI手法は、CADツールと互換性がなく、編集性や細かい制御を欠くメッシュベースの表現に依存することが多いが、Img2CADはAIベースの3D再構成とCADソフトウェアのシームレスな統合を可能にする。我々は、オブジェクトから抽出されたベクトル化ワイヤフレームを特徴とする、構造化視覚幾何学(Structured Visual Geometry, SVG)と呼ばれる革新的な中間表現を見出した。この表現は、条件付きCADモデルの生成性能を大幅に向上させる。さらに、この分野の研究を支援するために2つの新しいデータセットを導入する。ABC-monoはレンダリング画像付きの20万以上の3D CADモデルからなる既知の最大のデータセットであり、KOCADは実世界で撮影されたオブジェクトとその正解CADモデルを組み合わせた最初のデータセットである。

In this paper, we propose Img2CAD, the first approach to our knowledge that uses 2D image inputs to generate CAD models with editable parameters. Existing AI methods for 3D model generation from text or image inputs often rely on mesh-based representations, which are incompatible with CAD tools and lack editability and fine control; in contrast, Img2CAD enables seamless integration between AI-based 3D reconstruction and CAD software. We have identified an innovative intermediate representation called Structured Visual Geometry (SVG), characterized by vectorized wireframes extracted from objects. This representation significantly enhances the performance of generating conditioned CAD models. Additionally, we introduce two new datasets to support further research in conditioned CAD model generation: ABC-mono, the largest known dataset comprising over 200,000 3D CAD models with rendered images, and KOCAD, the first dataset featuring real-world captured objects alongside their ground truth CAD models.

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# CNN層のみのデノイジングオートエンコーダによる航空機レーダ高度計の干渉低減

Aircraft Radar Altimeter Interference Mitigation Through a CNN-Layer Only Denoising Autoencoder Architecture ( http://arxiv.org/abs/2410.03423v1 )

ライセンス: Link先を確認

Samuel B. Brown, Stephen Young, Adam Wagenknecht, Daniel Jakubisin, Charles E. Thornton, Aaron Orndorff, William C. Headley,

(参考訳) 信号処理応用向けのデノイジングオートエンコーダは、特にサンプル数が多い領域において、無線周波数通信信号の再構成を学習することが非常に困難であることが示されている。通信システムでは、この課題は主に、本質的に確率性の高い変調データストリームを再構成する必要があることに起因する。本研究では,この制限を逆に利用し、デノイジングオートエンコーダを用いて、高度に構造化されたFMCWレーダ信号を再構成しながら、干渉する無線周波数通信信号を除去する。具体的には、CNN層のみのオートエンコーダアーキテクチャを用いることで、多数の干渉信号からなる厳しい干渉環境においても、レーダ高度計の測距推定精度を向上できることを示す。これは、畳み込み層のみのオートエンコーダを使用した場合と使用しない場合の、エンドツーエンドのFMCWレーダ高度計シミュレーションの包括的な性能解析によって実証される。提案手法は、狭帯域のトーン干渉と広帯域QPSK干渉の両方の存在下で、距離RMS誤差、誤った高度報告の数、および得られる距離プロファイルのピーク対サイドローブ比の観点から干渉緩和を大幅に改善する。最大4万個のIQサンプルからなるFMCWレーダ信号を確実に再構成できる。

Denoising autoencoders for signal processing applications have been shown to experience significant difficulty in learning to reconstruct radio frequency communication signals, particularly in the large sample regime. In communication systems, this challenge is primarily due to the need to reconstruct the modulated data stream which is generally highly stochastic in nature. In this work, we take advantage of this limitation by using the denoising autoencoder to instead remove interfering radio frequency communication signals while reconstructing highly structured FMCW radar signals. More specifically, in this work we show that a CNN-layer only autoencoder architecture can be utilized to improve the accuracy of a radar altimeter's ranging estimate even in severe interference environments consisting of a multitude of interference signals. This is demonstrated through comprehensive performance analysis of an end-to-end FMCW radar altimeter simulation with and without the convolutional layer-only autoencoder. The proposed approach significantly improves interference mitigation in the presence of both narrow-band tone interference as well as wideband QPSK interference in terms of range RMS error, number of false altitude reports, and the peak-to-sidelobe ratio of the resulting range profile. FMCW radar signals of up to 40,000 IQ samples can be reliably reconstructed.

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# Cayley Graph Propagation

Cayley Graph Propagation ( http://arxiv.org/abs/2410.03424v1 )

ライセンス: Link先を確認

JJ Wilson, Maya Bechler-Speicher, Petar Veličković,

(参考訳) グラフ構造化データのモデリングにおいて、グラフニューラルネットワーク(GNN)の成功例は数多くあるが、GNNはオーバースカッシング(over-squashing)に弱いことで知られている。これは、遠く離れたノード対の間で情報を混合する必要があるタスクで生じる。この問題に対処するため、先行研究では、情報フローを改善するためにグラフ構造を再配線することが提案されている。一方で、オーバースカッシングを緩和するために、ボトルネックのないグラフ構造の発見と事前計算に取り組む研究も数多く存在する。数学の分野で高く評価されているボトルネックのないグラフ族の一つがエクスパンダーグラフであり、先行研究のExpander Graph Propagation (EGP)は、よく知られたエクスパンダーグラフ族である特殊線形群 $\mathrm{SL}(2,\mathbb{Z}_n)$ のケイリーグラフを、GNNの計算テンプレートとして用いることを提案している。しかし、EGPでは、使用する計算グラフは与えられた入力グラフに合わせて切り詰められる。本研究では、この切り詰めが、望ましい拡張特性に有害であることを示す。代わりに、完全なケイリーグラフ構造上で情報を伝播する手法であるCGPを提案し、ボトルネックのない構造を保証することでオーバースカッシングをより良く緩和する。実世界の複数のデータセットにおける実験結果は、CGPがEGPと比べて大幅な改善を回復するだけでなく、計算コストの高いグラフ再配線手法と同等かそれを上回ることを示している。

In spite of the plethora of success stories with graph neural networks (GNNs) on modelling graph-structured data, they are notoriously vulnerable to over-squashing, which arises when tasks necessitate the mixing of information between distant pairs of nodes. To address this problem, prior work suggests rewiring the graph structure to improve information flow. Alternatively, a significant body of research has dedicated itself to discovering and precomputing bottleneck-free graph structures to ameliorate over-squashing. One well-regarded family of bottleneck-free graphs within the mathematical community is expander graphs, with prior work, Expander Graph Propagation (EGP), proposing the use of a well-known expander graph family (the Cayley graphs of the $\mathrm{SL}(2,\mathbb{Z}_n)$ special linear group) as a computational template for GNNs. However, in EGP the computational graphs used are truncated to align with a given input graph. In this work, we show that truncation is detrimental to the coveted expansion properties. Instead, we propose CGP, a method to propagate information over a complete Cayley graph structure, thereby ensuring it is bottleneck-free to better alleviate over-squashing. Our empirical evidence across several real-world datasets not only shows that CGP recovers significant improvements as compared to EGP, but also that it is akin to or outperforms computationally complex graph rewiring techniques.
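To make the object concrete, the following sketch builds a complete Cayley graph of $\mathrm{SL}(2,\mathbb{Z}_n)$ by breadth-first search from the identity. It assumes the generating set consisting of the two elementary matrices [[1,1],[0,1]] and [[1,0],[1,1]] together with their inverses; the abstract does not state which generators the papers use, so treat that choice, and the function names, as assumptions.

```python
from collections import deque


def matmul_mod(a, b, n):
    """Multiply two 2x2 matrices stored as (a, b, c, d) tuples, modulo n."""
    return (
        (a[0] * b[0] + a[1] * b[2]) % n,
        (a[0] * b[1] + a[1] * b[3]) % n,
        (a[2] * b[0] + a[3] * b[2]) % n,
        (a[2] * b[1] + a[3] * b[3]) % n,
    )


def cayley_graph_sl2(n):
    """Adjacency dict of the Cayley graph of SL(2, Z_n) for n >= 2.

    Nodes are group elements; g is joined to g*s for each generator s.
    """
    # Two elementary matrices and their inverses (assumed generating set).
    gens = [(1, 1, 0, 1), (1, n - 1, 0, 1), (1, 0, 1, 1), (1, 0, n - 1, 1)]
    identity = (1, 0, 0, 1)
    adj = {}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        if g in adj:
            continue  # already expanded
        adj[g] = [matmul_mod(g, s, n) for s in gens]
        queue.extend(h for h in adj[g] if h not in adj)
    return adj


graph = cayley_graph_sl2(3)
```

For n = 3 the search reaches all 24 elements of the group, and every node has four distinct neighbours. The node count grows roughly as n cubed, so it generally does not match the size of a given input graph; the abstract's point is that cutting the graph down to fit is what CGP avoids.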

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# このテストセットはどれくらい難しいか?トレーニングダイナミクスを利用したNLIの特性評価

How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics ( http://arxiv.org/abs/2410.03429v1 )

ライセンス: Link先を確認

Adrian Cosma, Stefan Ruseti, Mihai Dascalu, Cornelia Caragea,

(参考訳) 自然言語推論(NLI)の評価は言語理解モデルを評価する上で重要であるが、一般的なデータセットには、実際のモデル性能を人為的に押し上げる体系的な疑似相関(spurious correlation)が含まれている。そこで本研究では,人工的で非現実的な例を手作業で構築することなく、難易度の高いテストセットを自動的に作成する手法を提案する。トレーニングダイナミクスを利用する手法を用いて、一般的なNLIデータセットのテストセットを3つの難易度に分類する。この分類により疑似相関の指標は大幅に低下し、最も難易度が高いとされた例では性能が著しく低下するとともに、より現実的で多様な言語現象が含まれる。我々の特徴付け手法をトレーニングセットに適用すると、データの一部のみで訓練したモデルが、フルデータセットで訓練したモデルに匹敵する性能を達成し、他のデータセット特徴付け手法を上回る。本研究は,NLIデータセット構築における制約に対処し,モデル性能のより信頼できる評価を提供するものであり、多様なNLUアプリケーションに示唆を与える。

Natural Language Inference (NLI) evaluation is crucial for assessing language understanding models; however, popular datasets suffer from systematic spurious correlations that artificially inflate actual model performance. To address this, we propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples. We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics. This categorization significantly reduces spurious correlation measures, with examples labeled as having the highest difficulty showing markedly decreased performance and encompassing more realistic and diverse linguistic phenomena. When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset, surpassing other dataset characterization techniques. Our research addresses limitations in NLI dataset construction, providing a more authentic evaluation of model performance with implications for diverse NLU applications.
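The abstract does not say which training-dynamics statistics are used. One widely used choice, shown here purely as an illustration, is the mean probability a model assigns to the gold label across training epochs (confidence) and its standard deviation (variability). The thresholds `easy_at` and `hard_at`, the level names, and the function name are hypothetical.

```python
from statistics import mean, pstdev


def characterize(gold_probs_per_epoch, easy_at=0.75, hard_at=0.25):
    """Bucket one example into a difficulty level from its training dynamics.

    `gold_probs_per_epoch` holds the probability assigned to the gold label
    after each epoch. The thresholds are illustrative, not from the paper.
    """
    confidence = mean(gold_probs_per_epoch)
    variability = pstdev(gold_probs_per_epoch)
    if confidence >= easy_at:
        level = "easy"
    elif confidence <= hard_at:
        level = "hard"
    else:
        level = "medium"
    return level, confidence, variability


# Three toy examples tracked over three epochs.
levels = [
    characterize([0.90, 0.95, 0.99])[0],
    characterize([0.30, 0.60, 0.50])[0],
    characterize([0.10, 0.20, 0.15])[0],
]
```

An example the model consistently gets right lands in the easy bucket, and one it never learns lands in the hard bucket. How the paper transfers such labels to test examples is not described in the abstract and is not attempted here.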

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# Images Speak Volumes: アクセシブルコミュニケーションのための画像生成のユーザ中心評価

Images Speak Volumes: User-Centric Assessment of Image Generation for Accessible Communication ( http://arxiv.org/abs/2410.03430v1 )

ライセンス: Link先を確認

Miriam Anschütz, Tringa Sylaj, Georg Groh,

(参考訳) 説明画像は、アクセシブルで読みやすい(E2R)テキストにおいて重要な役割を果たす。しかし、オンラインデータベースで利用可能な画像は各テキストに合わせて作られておらず、カスタマイズされた画像の作成は高価である。この大規模な研究では,テキストから画像を生成するモデルが、カスタマイズ可能な画像を迅速かつ容易に提供することで、このギャップを埋められるかを検討した。オープンソース4つ、クローズドソース3つの計7つの画像生成モデルをベンチマークし、生成された画像を広範に評価した。また,E2Rの対象グループの人々を対象にユーザスタディを行い,画像が彼らの要件を満たしているかどうかを調べた。いくつかのモデルは優れた性能を示すが、人間の監督なしに大規模に使用できる段階にあるモデルはない。我々の研究は、E2Rの作成者によるアクセシブルな情報の作成を容易にし、対象グループのニーズに合わせてアクセシブルな画像を調整するための重要な一歩である。

Explanatory images play a pivotal role in accessible and easy-to-read (E2R) texts. However, the images available in online databases are not tailored toward the respective texts, and the creation of customized images is expensive. In this large-scale study, we investigated whether text-to-image generation models can close this gap by providing customizable images quickly and easily. We benchmarked seven image generation models (four open-source and three closed-source) and provide an extensive evaluation of the resulting images. In addition, we performed a user study with people from the E2R target group to examine whether the images met their requirements. We find that some of the models show remarkable performance, but none of the models are ready to be used at a larger scale without human supervision. Our research is an important step toward facilitating the creation of accessible information for E2R creators and tailoring accessible images to the target group's needs.

翻訳日:2024-11-02 22:29:14 公開日:2024-10-04

# EB-NeRD:ニュースレコメンデーションのための大規模データセット

EB-NeRD: A Large-Scale Dataset for News Recommendation ( http://arxiv.org/abs/2410.03432v1 )

ライセンス: Link先を確認

Johannes Kruse, Kasper Lindskow, Saikishore Kalloori, Marco Polignano, Claudio Pomo, Abhishek Srivastava, Anshuk Uppal, Michael Riis Andersen, Jes Frellsen,

(参考訳) パーソナライズされたコンテンツレコメンデーションは、ビデオストリーミングからソーシャルネットワークまで、デジタルメディアのコンテンツ体験において重要な役割を果たしてきた。しかし、いくつかのドメイン固有の課題が、ニュース出版におけるレコメンダシステムの採用を妨げている。これらの課題に対処するために、Ekstra Bladet News Recommendation Dataset (EB-NeRD)を紹介する。このデータセットには、100万人以上のユニークユーザのデータと、Ekstra Bladetの3700万件以上のインプレッションログが含まれている。また、125,000件以上のデンマーク語ニュース記事のコレクションも含まれており、タイトル、要約、本文、およびカテゴリなどのメタデータが揃っている。EB-NeRDはRecSys '24 Challengeのベンチマークデータセットとして使用され、ニュース出版向けの効果的で責任あるレコメンダシステムを設計する際の技術的および規範的な課題に、このデータセットをどう活用できるかが示された。データセットは https://recsys.eb.dk で入手できる。

Personalized content recommendations have been pivotal to the content experience in digital media from video streaming to social networks. However, several domain specific challenges have held back adoption of recommender systems in news publishing. To address these challenges, we introduce the Ekstra Bladet News Recommendation Dataset (EB-NeRD). The dataset encompasses data from over a million unique users and more than 37 million impression logs from Ekstra Bladet. It also includes a collection of over 125,000 Danish news articles, complete with titles, abstracts, bodies, and metadata, such as categories. EB-NeRD served as the benchmark dataset for the RecSys '24 Challenge, where it was demonstrated how the dataset can be used to address both technical and normative challenges in designing effective and responsible recommender systems for news publishing. The dataset is available at: https://recsys.eb.dk.

翻訳日:2024-11-02 22:19:23 公開日:2024-10-04

# 多点触覚の知覚的重要度予測のための自己教師付き時空間グラフ・マスクパッシング注意ネットワーク

Self-supervised Spatio-Temporal Graph Mask-Passing Attention Network for Perceptual Importance Prediction of Multi-point Tactility ( http://arxiv.org/abs/2410.03434v1 )

ライセンス: Link先を確認

Dazhong He, Qian Liu,

(参考訳) 視覚・聴覚情報は現代のマルチメディアシステムで広く用いられているが、触覚(tactile)や運動感覚(kinesthetic)などのハプティック相互作用は、人間の知覚のユニークな形態を提供する。しかし,接触型インタラクションのためのマルチメディア技術は,非接触型マルチメディア技術ほど成熟しておらず,さらなる開発が必要である。低レイテンシと低ビットレートを必要とする専用のハプティックメディア技術は、ハプティック相互作用を実現するために不可欠であり、そのためハプティック情報の圧縮が必要となる。知覚モデルに基づく既存の振動触覚信号圧縮法は,空間的に分布する複数の相互作用点における融合された触覚知覚の特性を考慮していない。実際、触覚の知覚的重要度の違いは、従来の周波数領域や時間領域に限らず、触覚に特有の皮膚上の空間的位置の違いにも及ぶ。最も頻繁に使用される触覚情報である振動触覚テクスチャ知覚について、自己教師付き学習と時空間グラフニューラルネットワークに基づき、複数点における知覚的重要度を予測するモデルを開発した。現在の実験結果は,このモデルが多点触覚の知覚シナリオにおいて,各点の知覚的重要度を効果的に予測できることを示している。

While visual and auditory information are prevalent in modern multimedia systems, haptic interaction, e.g., tactile and kinesthetic interaction, provides a unique form of human perception. However, multimedia technology for contact interaction is less mature than non-contact multimedia technologies and requires further development. Specialized haptic media technologies, requiring low latency and bitrates, are essential to enable haptic interaction, necessitating haptic information compression. Existing vibrotactile signal compression methods, based on the perceptual model, do not consider the characteristics of fused tactile perception at multiple spatially distributed interaction points. In fact, differences in tactile perceptual importance are not limited to conventional frequency and time domains, but also encompass differences in the spatial locations on the skin unique to tactile perception. For the most frequently used tactile information, vibrotactile texture perception, we have developed a model to predict its perceptual importance at multiple points, based on self-supervised learning and Spatio-Temporal Graph Neural Network. Current experimental results indicate that this model can effectively predict the perceptual importance of various points in multi-point tactile perception scenarios.

翻訳日:2024-11-02 22:19:23 公開日:2024-10-04

# 解釈可能なセマンティックテキスト埋め込み作成のための汎用フレームワーク

A General Framework for Producing Interpretable Semantic Text Embeddings ( http://arxiv.org/abs/2410.03435v1 )

ライセンス: Link先を確認

Yiqun Sun, Qiang Huang, Yixuan Tang, Anthony K. H. Tung, Jun Yu,

(参考訳) セマンティックテキスト埋め込みは自然言語処理(NLP)の多くのタスクに不可欠である。ブラックボックスモデルは高品質な埋め込みを生成できるが、解釈可能性の欠如により、透明性を必要とするタスクでの利用が制限される。近年のアプローチは、ドメインエキスパートが作成した質問やLLMが生成した質問を活用することで解釈可能性を向上させているが、これらの手法は専門家の入力や入念なプロンプト設計に大きく依存しており、一般化可能性や、幅広いタスクにわたって識別力のある質問を生成する能力が制限される。これらの課題に対処するために,多様なタスクにわたって解釈可能なセマンティックテキスト埋め込みを生成する汎用フレームワークであるCQG-MBQA (Contrastive Question Generation - Multi-task Binary Question Answering)を提案する。このフレームワークは,識別力が高く認知的負荷の低いYes/No質問をCQG法によって体系的に生成し,MBQAモデルによってそれらに効率的に回答することで,コスト効率よく解釈可能な埋め込みを実現する。広範な実験とアブレーション研究を通じてCQG-MBQAの有効性と解釈可能性を検証し、本質的な解釈可能性を維持しつつ、多くの高度なブラックボックスモデルに匹敵する埋め込み品質を提供することを示す。さらに、CQG-MBQAは、様々なダウンストリームタスクにおいて他の解釈可能なテキスト埋め込み手法よりも優れている。

Semantic text embedding is essential to many tasks in Natural Language Processing (NLP). While black-box models are capable of generating high-quality embeddings, their lack of interpretability limits their use in tasks that demand transparency. Recent approaches have improved interpretability by leveraging domain-expert-crafted or LLM-generated questions, but these methods rely heavily on expert input or careful prompt design, which restricts their generalizability and ability to generate discriminative questions across a wide range of tasks. To address these challenges, we introduce CQG-MBQA (Contrastive Question Generation - Multi-task Binary Question Answering), a general framework for producing interpretable semantic text embeddings across diverse tasks. Our framework systematically generates highly discriminative, low cognitive load yes/no questions through the CQG method and answers them efficiently with the MBQA model, resulting in interpretable embeddings in a cost-effective manner. We validate the effectiveness and interpretability of CQG-MBQA through extensive experiments and ablation studies, demonstrating that it delivers embedding quality comparable to many advanced black-box models while maintaining inherent interpretability. Additionally, CQG-MBQA outperforms other interpretable text embedding methods across various downstream tasks.
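The idea of an embedding whose dimensions are answers to yes/no questions can be shown with a toy sketch. The questions and the keyword rule used to answer them are hypothetical placeholders: in the framework described above, questions come from the CQG method and answers from the MBQA model, neither of which is reproduced here.

```python
# Each dimension is one yes/no question; the keyword sets are a crude,
# hypothetical stand-in for a learned question-answering model.
QUESTIONS = [
    ("Is the text about sports?", {"team", "goal", "match"}),
    ("Is the text about finance?", {"stock", "bank", "market"}),
    ("Does the text mention weather?", {"rain", "sunny", "storm"}),
]


def embed(text):
    """Return a binary vector with one yes/no answer per question."""
    words = set(text.lower().split())
    return [1 if words & keywords else 0 for _, keywords in QUESTIONS]


def explain(vector):
    """List the questions answered 'yes', which is what makes the vector readable."""
    return [question for (question, _), bit in zip(QUESTIONS, vector) if bit]


def shared_yes(u, v):
    """Similarity as the number of questions both texts answer 'yes' to."""
    return sum(a & b for a, b in zip(u, v))


a = embed("The team scored a late goal despite the rain")
b = embed("Heavy rain delayed the match")
c = embed("The stock market fell sharply")
```

Here `explain(a)` names the sports and weather questions, so the similarity between `a` and `b` can be traced to the specific questions they share, which a dense black-box vector does not allow.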

翻訳日:2024-11-02 22:19:23 公開日:2024-10-04

# 事前学習におけるアクティベーション・スパーシティの利点を探る

Exploring the Benefit of Activation Sparsity in Pre-training ( http://arxiv.org/abs/2410.03440v1 )

ライセンス: Link先を確認

Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou,

(参考訳) 事前訓練されたトランスフォーマーは本質的にスパース活性化の特徴を持ち、各トークンに対してごく一部のニューロンのみが活性化される。スパース活性化はポストトレーニング手法によって研究されてきたが、事前学習におけるその可能性は未開拓のままである。本研究では,まず,事前学習中に活性化特性がどう変化するかを調べる。その結果,トランスフォーマーは事前学習プロセスの大部分を通してスパース活性化を示す一方で、活性化相関はトレーニングの進行とともに変化し続けることが明らかになった。この観察に基づき,Switchable Sparse-Dense Learning (SSD)を提案する。SSDは、事前学習中にMixtures-of-Experts (MoE)ベースのスパーストレーニングと従来の密なトレーニングを適応的に切り替えることで、スパーストレーニングの効率を活用しつつ、スパーストレーニングの静的な活性化相関を回避する。密なトレーニングと比較して、SSDは同じモデルサイズで同等の性能を達成し、事前学習コストを削減する。さらに、SSDで訓練したモデルは、スパース推論のMoEモデルとして直接使用でき、最大$2\times$の推論速度で密なモデルと同じ性能を達成できる。コードは https://github.com/thunlp/moefication で入手できる。

Pre-trained Transformers inherently possess the characteristic of sparse activation, where only a small fraction of the neurons are activated for each token. While sparse activation has been explored through post-training methods, its potential in pre-training remains untapped. In this work, we first study how activation properties change during pre-training. Our examination reveals that Transformers exhibit sparse activation throughout the majority of the pre-training process while the activation correlation keeps evolving as training progresses. Leveraging this observation, we propose Switchable Sparse-Dense Learning (SSD). SSD adaptively switches between Mixtures-of-Experts (MoE) based sparse training and conventional dense training during the pre-training process, leveraging the efficiency of sparse training while avoiding its static activation correlation. Compared to dense training, SSD achieves comparable performance with identical model size and reduces pre-training costs. Moreover, the models trained with SSD can be directly used as MoE models for sparse inference and achieve the same performance as dense models with up to $2\times$ faster inference speed. Code is available at https://github.com/thunlp/moefication.

翻訳日:2024-11-02 22:19:23 公開日:2024-10-04

# CLoSD:マルチタスク文字制御のためのシミュレーションと拡散のループを閉じる

CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control ( http://arxiv.org/abs/2410.03441v1 )

ライセンス: Link先を確認

Guy Tevet, Sigal Raab, Setareh Cohan, Daniele Reda, Zhengyi Luo, Xue Bin Peng, Amit H. Bermano, Michiel van de Panne,

(参考訳) 運動拡散モデルと、物理シミュレーションのための強化学習(RL)ベースの制御は、人間の動作生成において相補的な強みを持つ。前者は多様な動作を生成でき、テキストなどの直感的な制御に従う。後者は物理的にもっともらしい動作と環境との直接的な相互作用を提供する。本研究では,それぞれの強みを組み合わせた手法を提案する。CLoSDは、様々なタスクに対して拡散生成によって導かれる、テキスト駆動のRL物理ベースコントローラである。我々の重要な洞察は、動作拡散がロバストなRLコントローラのためのオンザフライの汎用プランナーとして機能し得るということである。この目的のために、CLoSDは、Diffusion Planner(DiP)とトラッキングコントローラという2つのモジュール間の閉ループ相互作用を維持する。DiPはテキストプロンプトと目標位置によって制御される高速応答型の自己回帰拡散モデルであり、コントローラはシンプルで堅牢な動作模倣器で、DiPから動作計画を継続的に受け取り、環境からのフィードバックを提供する。CLoSDは、目標地点へのナビゲーション、テキストプロンプトで指定された手または足で物体を打つこと、座ること、立ち上がることなど、一連の異なるタスクをシームレスに実行できる。 https://guytevet.github.io/CLoSD-page/

Motion diffusion models and Reinforcement Learning (RL) based control for physics-based simulations have complementary strengths for human motion generation. The former is capable of generating a wide variety of motions, adhering to intuitive control such as text, while the latter offers physically plausible motion and direct interaction with the environment. In this work, we present a method that combines their respective strengths. CLoSD is a text-driven RL physics-based controller, guided by diffusion generation for various tasks. Our key insight is that motion diffusion can serve as an on-the-fly universal planner for a robust RL controller. To this end, CLoSD maintains a closed-loop interaction between two modules -- a Diffusion Planner (DiP), and a tracking controller. DiP is a fast-responding autoregressive diffusion model, controlled by textual prompts and target locations, and the controller is a simple and robust motion imitator that continuously receives motion plans from DiP and provides feedback from the environment. CLoSD is capable of seamlessly performing a sequence of different tasks, including navigation to a goal location, striking an object with a hand or foot as specified in a text prompt, sitting down, and getting up. https://guytevet.github.io/CLoSD-page/

翻訳日:2024-11-02 22:19:23 公開日:2024-10-04

# 不純物をもつ二重ユニタリ量子回路における絡み合い

Entanglement in dual unitary quantum circuits with impurities ( http://arxiv.org/abs/2410.03442v1 )

ライセンス: Link先を確認

Shachar Fraenkel, Colin Rylands,

(参考訳) 二部エンタングルメントエントロピーは、多体量子系における普遍的性質の最も有用な特徴付けの一つである。平衡から遠く離れた状況では、そのダイナミクスを記述する非常に効果的な理論として、準粒子描像と膜描像の2つが存在する。本研究では、不純物によって摂動を受けた量子回路モデルにおいて、エンタングルメントのダイナミクスとこれら2つの相補的アプローチを調べる。特に、空間的に固定された非双対ユニタリな不純物ゲートを含む双対ユニタリ量子回路を考え、不純物の両側で局所ヒルベルト空間の次元が異なることを許す。不純物から有限の距離にある半無限部分系と有限部分系の両方についてエンタングルメントエントロピーを計算し、厳密な結果を有効理論の予測と比較する。前者の場合、両理論は互いに一致し、厳密計算とも一致する。しかし後者の場合、両理論は定性的に異なり、準粒子描像は膜描像とは対照的に非単調な成長を予測する。このような非単調な挙動はランダムなカオス回路でも生じ得ることを示し、そうした系の記述における膜描像のこれまで知られていなかった欠点を指摘する。

Bipartite entanglement entropy is one of the most useful characterizations of universal properties in a many-body quantum system. Far from equilibrium, there exist two highly effective theories describing its dynamics -- the quasiparticle and membrane pictures. In this work we investigate entanglement dynamics, and these two complementary approaches, in a quantum circuit model perturbed by an impurity. In particular, we consider a dual unitary quantum circuit containing a spatially fixed, non-dual-unitary impurity gate, allowing for differing local Hilbert space dimensions to either side. We compute the entanglement entropy for both a semi-infinite and a finite subsystem within a finite distance of the impurity, comparing exact results to predictions of the effective theories. We find that in the former case, both theories agree with each other and the exact calculation. In the latter case, however, both theories qualitatively differ, with the quasiparticle picture predicting a non-monotonic growth in contrast to the membrane picture. We show that such non-monotonic behavior can arise even in random chaotic circuits, pointing to a hitherto unknown shortcoming of the membrane picture in describing such systems.

翻訳日:2024-11-02 22:19:23 公開日:2024-10-04

# 自然言語処理の不確かさについて

On Uncertainty In Natural Language Processing ( http://arxiv.org/abs/2410.03446v1 )

ライセンス: Link先を確認

Dennis Ulmer,

(参考訳) 過去10年間のディープラーニングは、さまざまなアプリケーションに導入される、ますます有能なシステムを生み出してきた。自然言語処理の分野は、大規模言語モデルを含む多くのブレークスルーによって変革され、ユーザ向けアプリケーションでの利用が拡大している。この技術の利点を享受し、潜在的な害を軽減するためには、モデル予測の信頼性と、その開発に伴う不確実性を定量化することが重要である。本論文は、自然言語処理における不確実性を言語的、統計的、ニューラルの各視点からどのように特徴づけられるか、そして実験パイプラインの設計を通じてそれをどのように低減し定量化できるかを研究する。さらに,テキスト分類タスクにおける帰納的モデルバイアスの効果を理論的かつ実験的に調べることで,モデリングにおける不確実性定量化を探究する。対応する実験には、3つの異なる言語(デンマーク語、英語、フィンランド語)とタスクのデータ、および多数の異なる不確実性定量化アプローチが含まれる。加えて,非交換可能な共形予測(non-exchangeable conformal prediction)に基づく、自然言語生成における較正されたサンプリング手法を提案する。この手法は、実際の続きをよりよくカバーする、より小さなトークン集合を与える。最後に、補助予測器を用いて大規模ブラックボックス言語モデルの信頼度を定量化する手法を開発する。ここでは、ターゲットモデルへの入力と生成された出力テキストのみから信頼度を予測する。

The last decade in deep learning has brought on increasingly capable systems that are deployed on a wide variety of applications. In natural language processing, the field has been transformed by a number of breakthroughs including large language models, which are used in increasingly many user-facing applications. In order to reap the benefits of this technology and reduce potential harms, it is important to quantify the reliability of model predictions and the uncertainties that shroud their development. This thesis studies how uncertainty in natural language processing can be characterized from a linguistic, statistical and neural perspective, and how it can be reduced and quantified through the design of the experimental pipeline. We further explore uncertainty quantification in modeling by theoretically and empirically investigating the effect of inductive model biases in text classification tasks. The corresponding experiments include data for three different languages (Danish, English and Finnish) and tasks as well as a large set of different uncertainty quantification approaches. Additionally, we propose a method for calibrated sampling in natural language generation based on non-exchangeable conformal prediction, which provides tighter token sets with better coverage of the actual continuation. Lastly, we develop an approach to quantify confidence in large black-box language models using auxiliary predictors, where the confidence is predicted from the input to and generated output text of the target model alone.

翻訳日:2024-11-02 22:19:23 公開日:2024-10-04

# 言語モデルはどのように文脈文法的キューを優先するか?

How Language Models Prioritize Contextual Grammatical Cues? ( http://arxiv.org/abs/2410.03447v1 )

ライセンス: Link先を確認

Hamidreza Amirzadeh, Afra Alishahi, Hosein Mohebbi,

(参考訳) トランスフォーマーベースの言語モデルは、文脈情報を効果的に捉え、活用する優れた能力を示している。主語と動詞の一致や共参照解決などの対象タスクに対する単一の文脈的手がかりの寄与を定量化し追跡するために、さまざまな分析手法が用いられてきたが、文脈内で複数の関連する手がかりが利用可能なシナリオは十分に検討されていない。本稿では,複数のジェンダー手がかり語が存在し、それぞれが対象のジェンダー代名詞の曖昧さを独立に解消できる場合に、言語モデルがジェンダーの一致をどのように扱うかを調べる。エンコーダベースのモデルであるBERTと、デコーダベースのモデルであるGPT-2という、広く使われている2つのトランスフォーマーモデルを分析する。分析には、モデル内の情報の流れを追跡するコンテキスト混合分析と、モデルの予測に対する手がかりの影響を測定するアクティベーションパッチングの一種という、2つの相補的なアプローチを用いる。その結果、BERTは対象単語の表現とモデルの予測の両方を形成する際に文脈中の最初の手がかりを優先する傾向があるのに対し、GPT-2は最後の手がかりにより強く依存することがわかった。この結果は,エンコーダベースのモデルとデコーダベースのモデルが、予測のために文脈情報を優先し利用する方法に顕著な違いがあることを示している。

Transformer-based language models have shown an excellent ability to effectively capture and utilize contextual information. Although various analysis techniques have been used to quantify and trace the contribution of single contextual cues to a target task such as subject-verb agreement or coreference resolution, scenarios in which multiple relevant cues are available in the context remain underexplored. In this paper, we investigate how language models handle gender agreement when multiple gender cue words are present, each capable of independently disambiguating a target gender pronoun. We analyze two widely used Transformer-based models: BERT, an encoder-based model, and GPT-2, a decoder-based model. Our analysis employs two complementary approaches: context mixing analysis, which tracks information flow within the model, and a variant of activation patching, which measures the impact of cues on the model's prediction. We find that BERT tends to prioritize the first cue in the context to form both the target word representations and the model's prediction, while GPT-2 relies more on the final cue. Our findings reveal striking differences in how encoder-based and decoder-based models prioritize and use contextual information for their predictions.

翻訳日:2024-11-02 22:09:37 公開日:2024-10-04

# 毒性分類器と大規模言語モデルは能力主義（Ableism）にどう応答するか

How Toxicity Classifiers and Large Language Models Respond to Ableism ( http://arxiv.org/abs/2410.03448v1 )

ライセンス: Link先を確認

Mahika Phutane, Ananya Seelam, Aditya Vashistha,

(参考訳) 障害のある人（PwD）は、オンライン上で能力主義的な憎悪やマイクロアグレッションに日常的に遭遇する。オンラインプラットフォームは機械学習モデルを用いてオンライン上の害をモデレートしているが、これらのモデルが能力主義とどのように相互作用するかを調べた研究はほとんどない。本稿では、PwDに向けられた100件のソーシャルメディアコメントのデータセットをキュレートし、160人の参加者を募集して、これらのコメントがどの程度有毒で能力主義的であるかを評価・説明してもらった。次に、最先端の毒性分類器（TC）と大規模言語モデル（LLM）に対して、その害を評価・説明するよう促した。分析の結果、TCとLLMはPwDよりも毒性を有意に低く評価したが、能力主義についてはLLMの評価は概ねPwDと同程度であった。しかし、LLMによる能力主義の説明は感情的な害を見落としており、PwDの説明の重要な側面である具体性や文脈の認識を欠いていた。今後に向けて、障害を意識した毒性分類器を設計する上での課題について論じ、能力主義の検出から能力主義の解釈・説明への転換を提唱する。

People with disabilities (PwD) regularly encounter ableist hate and microaggressions online. While online platforms use machine learning models to moderate online harm, there is little research investigating how these models interact with ableism. In this paper, we curated a dataset of 100 social media comments targeted towards PwD, and recruited 160 participants to rate and explain how toxic and ableist these comments were. We then prompted state-of-the-art toxicity classifiers (TCs) and large language models (LLMs) to rate and explain the harm. Our analysis revealed that TCs and LLMs rated toxicity significantly lower than PwD, but LLMs rated ableism generally on par with PwD. However, ableism explanations by LLMs overlooked emotional harm, and lacked specificity and acknowledgement of context, important facets of PwD explanations. Going forward, we discuss challenges in designing disability-aware toxicity classifiers, and advocate for the shift from ableism detection to ableism interpretation and explanation.

翻訳日:2024-11-02 22:09:37 公開日:2024-10-04

# リトリーバーとしてのMLLM: エージェントのマルチモーダル検索を対話的に学習する

MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents ( http://arxiv.org/abs/2410.03450v1 )

ライセンス: Link先を確認

Junpeng Yue, Xinru Xu, Börje F. Karlsson, Zongqing Lu,

(参考訳) MLLMエージェントは、マルチモーダルなタスク関連軌道データを検索することで、複雑なエンボディドタスクに対する可能性を示している。しかし、現在の検索手法は主に軌道中のテキストや視覚的手がかりの表面的な類似性に着目しており、目の前の特定のタスクに対する軌道の有効性を考慮していない。この問題に対処するため、MLLM as ReTriever (MART) という新たな手法を提案する。本手法は、インタラクションデータを利用して選好学習に基づきMLLMレトリバーを微調整することでエンボディドエージェントの性能を高め、レトリバーが軌道の有効性を十分に考慮し、未知のタスクに対してそれらを優先順位付けできるようにする。また、MLLMの要約能力を活用し、重要な情報を保持しながらより少ないトークンで軌道を表現する機構であるTrajectory Abstractionを導入し、エージェントが軌道中のマイルストーンをよりよく理解できるようにする。様々な環境における実験結果から、本手法はベースライン手法と比較して、未知のシーンにおけるタスク成功率を大幅に向上させることが示された。本研究は、汎用MLLMをレトリバーとして微調整して軌道の有効性を評価することにより、エンボディドエージェントにおけるマルチモーダル検索の新しいパラダイムを提示する。すべてのベンチマークタスクセットと、行動空間・観測空間に関するシミュレータコードの修正は公開される予定である。

MLLM agents demonstrate potential for complex embodied tasks by retrieving multimodal task-relevant trajectory data. However, current retrieval methods primarily focus on surface-level similarities of textual or visual cues in trajectories, neglecting their effectiveness for the specific task at hand. To address this issue, we propose a novel method, MLLM as ReTriever (MART), which enhances the performance of embodied agents by utilizing interaction data to fine-tune an MLLM retriever based on preference learning, such that the retriever fully considers the effectiveness of trajectories and prioritizes them for unseen tasks. We also introduce Trajectory Abstraction, a mechanism that leverages MLLMs' summarization capabilities to represent trajectories with fewer tokens while preserving key information, enabling agents to better comprehend milestones in the trajectory. Experimental results across various environments demonstrate our method significantly improves task success rates in unseen scenes compared to baseline methods. This work presents a new paradigm for multimodal retrieval in embodied agents, by fine-tuning a general-purpose MLLM as the retriever to assess trajectory effectiveness. All benchmark task sets and simulator code modifications for action and observation spaces will be released.

翻訳日:2024-11-02 22:09:37 公開日:2024-10-04

# 周波数偏光超符号化フォトニック量子ビットの決定論的かつ効率的な源

A deterministic and efficient source of frequency-polarization hyper-encoded photonic qubits ( http://arxiv.org/abs/2410.03454v1 )

ライセンス: Link先を確認

N. Coste, D. A. Fioretto, S. E. Thomas, S. C. Wein, H. Ollivier, I. Maillette de Buy Wenniger, A. Henry, N. Belabas, A. Harouri, A. Lemaitre, I. Sagnes, N. Somaschi, O. Krebs, L. Lanco, P. Senellart,

(参考訳) 光子の周波数、すなわち色は、量子情報を符号化して長距離に配布するための魅力的な自由度である。しかし、周波数符号化されたフォトニック量子ビットの生成は、これまで確率的な非線形単一光子源と非効率なゲートに依存してきた。ここでは、共振器内の半導体量子ドットに基づき、周波数と偏光に超符号化されたフォトニック量子ビットの決定論的生成を実証する。我々は中性励起子の二重双極子構造を利用し、ポンプレーザーパルスの偏光によって制御される、振幅と位相における任意の量子重ね合わせの生成を実証する。この光源は、第1レンズにおける生成確率28 $\pm$ 2%に対応する4 MHzのレートで周波数偏光単一光子量子ビットを生成し、光子数純度は98%を超える。光子の識別不能性は、各双極子について91%を超え、両者の均衡した量子重ね合わせについては88%である。超符号化フォトニック状態の密度行列は時間分解偏光トモグラフィーにより測定され、目標状態に対する忠実度は94 $\pm$ 8%、コンカレンスは77 $\pm$ 2%であり、ここでは本デバイスにおける周波数の重なりによって制限されている。我々のアプローチは、周波数符号化に基づく量子情報処理の分野に量子ドット光源の利点をもたらす。

The frequency or color of photons is an attractive degree of freedom to encode and distribute the quantum information over long distances. However, the generation of frequency-encoded photonic qubits has so far relied on probabilistic non-linear single-photon sources and inefficient gates. Here, we demonstrate the deterministic generation of photonic qubits hyper-encoded in frequency and polarization based on a semiconductor quantum dot in a cavity. We exploit the double dipole structure of a neutral exciton and demonstrate the generation of any quantum superposition in amplitude and phase, controlled by the polarization of the pump laser pulse. The source generates frequency-polarization single-photon qubits at a rate of 4 MHz corresponding to a generation probability at the first lens of 28 $\pm$ 2%, with a photon number purity > 98%. The photons show an indistinguishability > 91% for each dipole and 88% for a balanced quantum superposition of both. The density matrix of the hyper-encoded photonic state is measured by time-resolved polarization tomography, evidencing a fidelity to the target state of 94 $\pm$ 8% and concurrence of 77 $\pm$ 2%, here limited by frequency overlap in our device. Our approach brings the advantages of quantum dot sources to the field of quantum information processing based on frequency encoding.

翻訳日:2024-11-02 22:09:37 公開日:2024-10-04

# CoCoLoFa: LLM支援の群衆が書いた共通の論理的誤りを伴うニュースコメントのデータセット

CoCoLoFa: A Dataset of News Comments with Common Logical Fallacies Written by LLM-Assisted Crowds ( http://arxiv.org/abs/2410.03457v1 )

ライセンス: Link先を確認

Min-Hsuan Yeh, Ruyuan Wan, Ting-Hao 'Kenneth' Huang,

(参考訳) テキスト中の論理的誤謬を検出することは、ユーザが議論の欠陥を見つけるのに役立つが、この検出を自動化するのは容易ではない。検出モデルの開発と検証のためのデータセットを作成する目的で、大規模な実世界のテキストデータに誤謬を手動でアノテーションするのはコストがかかる。本稿では、648件のニュース記事に対する7,706件のコメントを含み、各コメントに誤謬の有無と種類をラベル付けした、既知の中で最大の論理的誤謬データセットであるCoCoLoFaを紹介する。我々は143人のクラウドワーカーを募集し、ニュース記事への反応として、特定の誤謬の種類（例えば、滑りやすい坂論法）を具現化したコメントを書いてもらった。この執筆タスクの複雑さを踏まえ、ワーカーのインターフェースにLLMを利用したアシスタントを組み込み、コメントの起草と推敲を支援した。専門家は、CoCoLoFaの文章品質とラベル付けの妥当性を高く信頼できるものと評価した。CoCoLoFaを用いて微調整したBERTベースのモデルは、そのテストセットにおいて最も高い誤謬検出性能（F1=0.86）と分類性能（F1=0.87）を達成し、最先端のLLMを上回った。我々の研究は、クラウドソーシングとLLMを組み合わせることで、クラウドワーカーが単独では作成しにくい複雑な言語現象のデータセットをより効果的に構築できることを示している。

Detecting logical fallacies in texts can help users spot argument flaws, but automating this detection is not easy. Manually annotating fallacies in large-scale, real-world text data to create datasets for developing and validating detection models is costly. This paper introduces CoCoLoFa, the largest known logical fallacy dataset, containing 7,706 comments for 648 news articles, with each comment labeled for fallacy presence and type. We recruited 143 crowd workers to write comments embodying specific fallacy types (e.g., slippery slope) in response to news articles. Recognizing the complexity of this writing task, we built an LLM-powered assistant into the workers' interface to aid in drafting and refining their comments. Experts rated the writing quality and labeling validity of CoCoLoFa as high and reliable. BERT-based models fine-tuned using CoCoLoFa achieved the highest fallacy detection (F1=0.86) and classification (F1=0.87) performance on its test set, outperforming the state-of-the-art LLMs. Our work shows that combining crowdsourcing and LLMs enables us to more effectively construct datasets for complex linguistic phenomena that crowd workers find challenging to produce on their own.

翻訳日:2024-11-02 22:09:37 公開日:2024-10-04

# 多方言ベトナム語: タスク、データセット、ベースラインモデル、課題

Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges ( http://arxiv.org/abs/2410.03458v1 )

ライセンス: Link先を確認

Nguyen Van Dinh, Thanh Chi Dang, Luan Thanh Nguyen, Kiet Van Nguyen,

(参考訳) 低資源言語であるベトナム語は、通常、北部・中部・南部ベトナムに対応する3つの主要な方言群に分類される。しかし、これらの地域内の各省はそれぞれ独自の発音のバリエーションを持っている。様々な音声認識データセットが存在するにもかかわらず、ベトナムの個々の省に固有の63の方言についてきめ細かな分類を提供するものはない。このギャップに対処するため、ベトナム全土で話されている63の省別方言の豊かな多様性を捉えた新しい包括的なデータセットであるVietnamese Multi-Dialect (ViMD) データセットを導入する。我々のデータセットは、約19,000の発話からなる102.56時間の音声で構成され、対応する書き起こしには120万語以上が含まれている。ベンチマークを提供すると同時にデータセットの課題を示すために、(1)方言識別と(2)音声認識という2つの下流タスクに対して、最先端の事前学習済みモデルを微調整する。実験結果は、地理的要因が方言に与える影響と、多方言音声データを含む音声認識タスクにおける現在のアプローチの限界という2つの示唆を与える。私たちのデータセットは研究目的で利用可能である。

Vietnamese, a low-resource language, is typically categorized into three primary dialect groups that belong to Northern, Central, and Southern Vietnam. However, each province within these regions exhibits its own distinct pronunciation variations. Despite the existence of various speech recognition datasets, none of them has provided a fine-grained classification of the 63 dialects specific to individual provinces of Vietnam. To address this gap, we introduce Vietnamese Multi-Dialect (ViMD) dataset, a novel comprehensive dataset capturing the rich diversity of 63 provincial dialects spoken across Vietnam. Our dataset comprises 102.56 hours of audio, consisting of approximately 19,000 utterances, and the associated transcripts contain over 1.2 million words. To provide benchmarks and simultaneously demonstrate the challenges of our dataset, we fine-tune state-of-the-art pre-trained models for two downstream tasks: (1) Dialect identification and (2) Speech recognition. The empirical results suggest two implications including the influence of geographical factors on dialects, and the constraints of current approaches in speech recognition tasks involving multi-dialect speech data. Our dataset is available for research purposes.

翻訳日:2024-11-02 22:09:37 公開日:2024-10-04

# テキスト音声合成のための生成意味コミュニケーション

Generative Semantic Communication for Text-to-Speech Synthesis ( http://arxiv.org/abs/2410.03459v1 )

ライセンス: Link先を確認

Jiahao Zheng, Jinke Ren, Peng Xu, Zhihao Yuan, Jie Xu, Fangxin Wang, Gui Gui, Shuguang Cui,

(参考訳) セマンティック通信は、ソースデータの意味情報のみを送信することによって通信効率を向上させる有望な技術である。しかし、従来のセマンティック通信手法は主にデータ再構成タスクに焦点を当てており、テキスト音声合成(TTS)のような新たな生成タスクに対しては効率的でない可能性がある。この制限に対処するために、本稿では生成人工知能技術を活用した、TTS合成のための新しい生成的セマンティック通信フレームワークを開発する。まず、WavLMと呼ばれる事前学習済みの大規模音声モデルと残差ベクトル量子化法を用いて、送信機と受信機にそれぞれ意味的知識ベース(KB)を構築する。送信機のKBは効果的な意味抽出を可能にし、受信機のKBは自然で生き生きとした音声合成を促進する。次に、トランスフォーマーエンコーダと拡散モデルを用いて、大きな通信オーバーヘッドを生じさせることなく効率的なセマンティック符号化を実現する。最後に、数値結果から、加法性白色ガウス雑音チャネルとレイリーフェージングチャネルのいずれの場合でも、本フレームワークが生成音声について4つのベースラインよりもはるかに高い忠実度を達成することを示す。

Semantic communication is a promising technology to improve communication efficiency by transmitting only the semantic information of the source data. However, traditional semantic communication methods primarily focus on data reconstruction tasks, which may not be efficient for emerging generative tasks such as text-to-speech (TTS) synthesis. To address this limitation, this paper develops a novel generative semantic communication framework for TTS synthesis, leveraging generative artificial intelligence technologies. Firstly, we utilize a pre-trained large speech model called WavLM and the residual vector quantization method to construct two semantic knowledge bases (KBs) at the transmitter and receiver, respectively. The KB at the transmitter enables effective semantic extraction, while the KB at the receiver facilitates lifelike speech synthesis. Then, we employ a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead. Finally, numerical results demonstrate that our framework achieves much higher fidelity for the generated speech than four baselines, in both cases with additive white Gaussian noise channel and Rayleigh fading channel.

翻訳日:2024-11-02 22:09:37 公開日:2024-10-04

# 自動GDA: 検索拡張生成における効率的な接地検証のための自動ドメイン適応

Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation ( http://arxiv.org/abs/2410.03461v1 )

ライセンス: Link先を確認

Tobias Leemann, Periklis Petridis, Giuseppe Vietri, Dionysis Manousakas, Aaron Roth, Sergul Aydore,

(参考訳) 検索拡張生成(RAG)は、大規模言語モデル(LLM)出力の事実性を高めることが示されているが、LLMはまだ幻覚に悩まされており、誤った情報や無関係な情報を生成する。 1つの一般的な検出戦略は、LLMにその応答が得られた証拠に根拠があるかどうかを再度評価させることであるが、このアプローチはコストがかかる。あるいは、効率的な基底検証のための軽量自然言語推論(NLI)モデルも推論時に利用できる。既存の事前学習されたNLIモデルは潜在的な解決策を提供するが、実際のRAG入力のより大きなモデルに比べて性能は低い。 RAG入力は、NLIモデルをトレーニングするために使われるほとんどのデータセットよりも複雑で、基礎となる知識ベースに特有の特徴を持ち、特定のターゲットドメインにNLIモデルを適応する必要がある。さらに、ターゲットドメインにラベル付きインスタンスがないため、例えば、微調整によって、教師付きドメイン適応が不可能になる。これらの課題に対処するために、自動生成ドメイン適応(Auto Generative Domain Adaptation, Auto-GDA)を導入する。我々のフレームワークは、合成データ生成による教師なしドメイン適応を可能にする。従来の手作りフィルタリングや拡張戦略に依存した手法とは異なり、Auto-GDAは、低効率の教師モデルからの弱いラベルと離散最適化を用いて生成したサンプルの品質を継続的に改善し、最も有望な追加サンプルを選択するために反復的なプロセスを採用している。提案手法の有効性を実験的に検証し,Auto-GDAを用いた合成データに微調整したモデルが,教師モデルの性能を上回り,LLMの性能レベルを計算コストの10%にまで達することを示した。

While retrieval augmented generation (RAG) has been shown to enhance factuality of large language model (LLM) outputs, LLMs still suffer from hallucination, generating incorrect or irrelevant information. One common detection strategy involves prompting the LLM again to assess whether its response is grounded in the retrieved evidence, but this approach is costly. Alternatively, lightweight natural language inference (NLI) models for efficient grounding verification can be used at inference time. While existing pre-trained NLI models offer potential solutions, their performance remains subpar compared to larger models on realistic RAG inputs. RAG inputs are more complex than most datasets used for training NLI models and have characteristics specific to the underlying knowledge base, requiring adaptation of the NLI models to a specific target domain. Additionally, the lack of labeled instances in the target domain makes supervised domain adaptation, e.g., through fine-tuning, infeasible. To address these challenges, we introduce Automatic Generative Domain Adaptation (Auto-GDA). Our framework enables unsupervised domain adaptation through synthetic data generation. Unlike previous methods that rely on handcrafted filtering and augmentation strategies, Auto-GDA employs an iterative process to continuously improve the quality of generated samples using weak labels from less efficient teacher models and discrete optimization to select the most promising augmented samples. Experimental results demonstrate the effectiveness of our approach, with models fine-tuned on synthetic data using Auto-GDA often surpassing the performance of the teacher model and reaching the performance level of LLMs at 10 % of their computational cost.

翻訳日:2024-11-02 22:09:37 公開日:2024-10-04

# 逆問題に対する拡散状態誘導射影勾配

Diffusion State-Guided Projected Gradient for Inverse Problems ( http://arxiv.org/abs/2410.03463v1 )

ライセンス: Link先を確認

Rayhan Zirvi, Bahareh Tolooshams, Anima Anandkumar,

(参考訳) 拡散モデルの最近の進歩は、逆問題を解くためのデータ事前分布の学習に有効である。これらの手法は、拡散サンプリングステップを利用してデータ事前分布を導入し、各ステップで測定ガイダンス勾配を用いてデータの一貫性を課す。一般の逆問題では、測定尤度が計算困難であるため、無条件で訓練された拡散モデルを用いる場合には近似が必要となり、その結果、事後サンプリングが不正確になる。言い換えれば、これらの手法は近似のために、拡散事前分布によって定義されるデータ多様体上の生成過程を保持できず、画像復元のような応用においてアーティファクトを生じさせる。逆問題の解法における拡散モデルの性能と頑健性を高めるために、拡散過程の中間状態の低ランク近似である部分空間に測定勾配を射影する拡散状態誘導射影勾配(DiffStateGrad)を提案する。DiffStateGradはモジュールとして幅広い拡散ベースの逆問題ソルバーに追加でき、事前分布の多様体上での拡散過程の保持を改善し、アーティファクトを誘発する成分を除去する。DiffStateGradは、測定ガイダンスのステップサイズとノイズの選択に対する拡散モデルの頑健性を向上させるとともに、最悪ケースの性能も改善することを強調する。最後に、DiffStateGradが線形および非線形の画像復元逆問題において最先端技術を上回ることを実証する。

Recent advancements in diffusion models have been effective in learning data priors for solving inverse problems. They leverage diffusion sampling steps for inducing a data prior while using a measurement guidance gradient at each step to impose data consistency. For general inverse problems, approximations are needed when an unconditionally trained diffusion model is used since the measurement likelihood is intractable, leading to inaccurate posterior sampling. In other words, due to their approximations, these methods fail to preserve the generation process on the data manifold defined by the diffusion prior, leading to artifacts in applications such as image restoration. To enhance the performance and robustness of diffusion models in solving inverse problems, we propose Diffusion State-Guided Projected Gradient (DiffStateGrad), which projects the measurement gradient onto a subspace that is a low-rank approximation of an intermediate state of the diffusion process. DiffStateGrad, as a module, can be added to a wide range of diffusion-based inverse solvers to improve the preservation of the diffusion process on the prior manifold and filter out artifact-inducing components. We highlight that DiffStateGrad improves the robustness of diffusion models in terms of the choice of measurement guidance step size and noise while improving the worst-case performance. Finally, we demonstrate that DiffStateGrad improves upon the state-of-the-art on linear and nonlinear image restoration inverse problems.
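The projection step described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the abstract does not say how the subspace is constructed, so the sketch assumes a single 2-D intermediate state, takes its top-`rank` left and right singular vectors, and projects the measurement gradient onto their span. The function name `project_onto_state_subspace` and the `rank` parameter are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def project_onto_state_subspace(grad, state, rank):
    """Project `grad` onto the subspace spanned by the leading singular
    vectors of `state` (hypothetical helper, not the authors' API).

    Assumption: `state` and `grad` are 2-D arrays of the same shape.
    """
    # Thin SVD of the intermediate diffusion state.
    U, _, Vt = np.linalg.svd(state, full_matrices=False)
    U_r = U[:, :rank]    # leading left singular vectors
    Vt_r = Vt[:rank, :]  # leading right singular vectors
    # Keep only the gradient components that lie in the low-rank subspace.
    return U_r @ (U_r.T @ grad @ Vt_r.T) @ Vt_r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = rng.standard_normal((8, 6))  # stand-in for an intermediate state
    grad = rng.standard_normal((8, 6))   # stand-in for a measurement gradient
    projected = project_onto_state_subspace(grad, state, rank=2)
    print(np.linalg.norm(grad), np.linalg.norm(projected))
```

Because both factors are orthogonal projectors, the projected gradient never has a larger Frobenius norm than the original, which is one way to read the abstract's claim that artifact-inducing components are filtered out.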

翻訳日:2024-11-02 22:09:37 公開日:2024-10-04

# S7: シーケンスモデリングのための選択的で単純化された状態空間層

S7: Selective and Simplified State Space Layers for Sequence Modeling ( http://arxiv.org/abs/2410.03464v1 )

ライセンス: Link先を確認

Taylan Soydan, Nikola Zubić, Nico Messikommer, Siddhartha Mishra, Davide Scaramuzza,

(参考訳) シーケンスモデリングにおける中心的な課題は、拡張されたコンテキストでタスクを効率的に処理することである。最近の状態空間モデル(SSM)はこの分野で大きな進歩を遂げているが、入力依存フィルタリングが欠如している場合が多い。安定なパラメータ化と特定の設計選択を取り入れ、入力内容に基づいて状態遷移を動的に調整し、効率と性能を維持しながら、入力依存を処理できるシンプルで強力なSSMであるS7を導入することで、このギャップに対処する。この再パラメータ化は、時間とともに状態遷移を良好に保ち、長期連続モデリングにおける安定性を保証することを証明している。さらに、グラデーション規範をコントロールし、効率的なトレーニングを可能にし、グラデーションの爆発や消滅といった問題を防止する。 S7は、ニューロモルフィックイベントベースのデータセット、Long Range Arenaベンチマーク、さまざまな物理的および生物学的時系列など、さまざまなシーケンスモデリングタスクにおいて、ベースラインを大幅に上回っている。全体として、S7は、複雑なドメイン固有の帰納的バイアスに頼ることなく、より簡単なシーケンスモデリングアプローチを提供する。

A central challenge in sequence modeling is efficiently handling tasks with extended contexts. While recent state-space models (SSMs) have made significant progress in this area, they often lack input-dependent filtering or require substantial increases in model complexity to handle input variability. We address this gap by introducing S7, a simplified yet powerful SSM that can handle input dependence while incorporating stable reparameterization and specific design choices to dynamically adjust state transitions based on input content, maintaining efficiency and performance. We prove that this reparameterization ensures stability in long-sequence modeling by keeping state transitions well-behaved over time. Additionally, it controls the gradient norm, enabling efficient training and preventing issues like exploding or vanishing gradients. S7 significantly outperforms baselines across various sequence modeling tasks, including neuromorphic event-based datasets, Long Range Arena benchmarks, and various physical and biological time series. Overall, S7 offers a more straightforward approach to sequence modeling without relying on complex, domain-specific inductive biases, achieving significant improvements across key benchmarks.

翻訳日:2024-11-02 21:59:46 公開日:2024-10-04

# より安全であるほど良いのか? ヘイトスピーチ対抗におけるLLMの議論的強度に対するガードレールの影響

Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering ( http://arxiv.org/abs/2410.03466v1 )

ライセンス: Link先を確認

Helena Bonaldi, Greta Damo, Nicolás Benjamín Ocampo, Elena Cabrio, Serena Villata, Marco Guerini,

(参考訳) ヘイトスピーチ緩和戦略としてのカウンタースピーチの潜在的な有効性は、NLG研究コミュニティにおいて、特にそれを自動生成するタスクに関して、ますます関心を集めている。しかし、自動生成された応答は、専門家が作成したカウンタースピーチを特徴づける議論の豊かさを欠くことが多い。本研究では、より説得力のある応答を生成するために、カウンタースピーチ生成の2つの側面に焦点を当てる。第一に、LLMの有用性と無害性の間の緊張関係を調査することにより、安全ガードレールの存在が生成品質を損なうかどうかを検証する。第二に、ヘイトスピーチの特定の構成要素を攻撃することが、オンライン上のヘイトと戦うためのより効果的な議論戦略につながるかどうかを評価する。広範な人手評価と自動評価を行うことにより、本質的に肯定的な社会的相互作用の促進を目的とするタスクに対しても、安全ガードレールの存在が有害となりうることを示す。さらに、ヘイトスピーチの特定の構成要素、特に暗黙の否定的ステレオタイプと憎悪的な部分を攻撃することが、より高品質な生成につながることが示された。

The potential effectiveness of counterspeech as a hate speech mitigation strategy is attracting increasing interest in the NLG research community, particularly towards the task of automatically producing it. However, automatically generated responses often lack the argumentative richness which characterises expert-produced counterspeech. In this work, we focus on two aspects of counterspeech generation to produce more cogent responses. First, by investigating the tension between helpfulness and harmlessness of LLMs, we test whether the presence of safety guardrails hinders the quality of the generations. Secondly, we assess whether attacking a specific component of the hate speech results in a more effective argumentative strategy to fight online hate. By conducting an extensive human and automatic evaluation, we show how the presence of safety guardrails can be detrimental also to a task that inherently aims at fostering positive social interactions. Moreover, our results show that attacking a specific component of the hate speech, and in particular its implicit negative stereotype and its hateful parts, leads to higher-quality generations.

翻訳日:2024-11-02 21:59:46 公開日:2024-10-04

# 注意図のトポロジ解析による脆弱性検出

Vulnerability Detection via Topological Analysis of Attention Maps ( http://arxiv.org/abs/2410.03470v1 )

ライセンス: Link先を確認

Pavel Snopov, Andrey Nikolaevich Golubinskiy,

(参考訳) 近年,脆弱性検出に対するディープラーニング(DL)アプローチが注目されている。これらの手法は有望な結果を示し、多くの場合、従来の静的コード解析ツールをはるかに上回っている。本研究では,BERTモデルの注意行列に基づくトポロジカルデータ解析(TDA)のツールを用いた脆弱性検出手法を提案する。従来の機械学習(ML)技術は,これらの注意行列から抽出したトポロジ的特徴に基づいて訓練すると,CodeBERTaのような事前学習言語モデル(LLM)と競合する。これは、永続的ホモロジーを含むTDAツールが、脆弱性を特定するために重要な意味情報を効果的にキャプチャできることを示している。

Recently, deep learning (DL) approaches to vulnerability detection have gained significant traction. These methods demonstrate promising results, often surpassing traditional static code analysis tools in effectiveness. In this study, we explore a novel approach to vulnerability detection utilizing the tools from topological data analysis (TDA) on the attention matrices of the BERT model. Our findings reveal that traditional machine learning (ML) techniques, when trained on the topological features extracted from these attention matrices, can perform competitively with pre-trained language models (LLMs) such as CodeBERTa. This suggests that TDA tools, including persistent homology, are capable of effectively capturing semantic information critical for identifying vulnerabilities.

翻訳日:2024-11-02 21:59:46 公開日:2024-10-04

# ピアレビューにおけるグループフェアネス

Group Fairness in Peer Review ( http://arxiv.org/abs/2410.03474v1 )

ライセンス: Link先を確認

Haris Aziz, Evi Micha, Nisarg Shah,

(参考訳) NeurIPSやAAAIといった大規模なカンファレンスは、多数のコミュニティからの応募を惹きつけるため、さまざまなAI分野のクロスロードとして機能している。しかし、一部のコミュニティではレビュー経験が不十分な場合があり、そのコミュニティ以外では資格の低いレビュアーに応募が割り当てられている。しばしば推奨される解決策は、このような大きなカンファレンスを小さなカンファレンスに分割することだが、これはコミュニティの分離と学際的な研究の害につながる可能性がある。我々は、この課題に取り組み、コア(core)と呼ばれるグループフェアネスの概念を導入し、可能なすべてのコミュニティ(研究者のサブセット)を、大きなカンファレンスから撤退することで、一方的に利益を得ることができない方法で扱うことを要求する。我々は、簡単なピアレビューモデルについて研究し、常にコアにレビューの代入が認められることを証明し、そのような代入を見つけるための効率的なアルゴリズムを設計する。 CVPRとICLRのカンファレンスの実際のデータを使って、アルゴリズムと既存のレビュー割り当てアルゴリズムを、さまざまなメトリクスで比較しています。

Large conferences such as NeurIPS and AAAI serve as crossroads of various AI fields, since they attract submissions from a vast number of communities. However, in some cases, this has resulted in a poor reviewing experience for some communities, whose submissions get assigned to less qualified reviewers outside of their communities. An often-advocated solution is to break up any such large conference into smaller conferences, but this can lead to isolation of communities and harm interdisciplinary research. We tackle this challenge by introducing a notion of group fairness, called the core, which requires every possible community (subset of researchers) to be treated in a way that prevents them from unilaterally benefiting by withdrawing from a large conference. We study a simple peer review model, prove that it always admits a reviewing assignment in the core, and design an efficient algorithm to find one such assignment. We use real data from CVPR and ICLR conferences to compare our algorithm to existing reviewing assignment algorithms on a number of metrics.

翻訳日:2024-11-02 21:59:46 公開日:2024-10-04

# 1隠れ層ニューラルネットワークの学習の困難性について

On the Hardness of Learning One Hidden Layer Neural Networks ( http://arxiv.org/abs/2410.03477v1 )

ライセンス: Link先を確認

Shuchen Li, Ilias Zadik, Manolis Zampetakis,

(参考訳) 本研究では、$\mathbb{R}^d$からの入力をもつ1隠れ層ReLUニューラルネットワークを学習する問題を考察する。この学習問題は、(1)ニューラルネットワークのサイズが$d$の多項式であり、(2)入力分布が標準ガウス分布であり、(3)ノイズがガウスで$d$に関して多項式的に小さい場合であっても、標準的な暗号学的仮定の下で困難であることを示す。我々の困難性の結果は、Continuous Learning with Errors (CLWE) 問題の困難性に基づいており、特に、最短ベクトル問題を多項式の乗法的因子の範囲内で近似的に解くことの、広く信じられている最悪ケース困難性に基づいている。

In this work, we consider the problem of learning one hidden layer ReLU neural networks with inputs from $\mathbb{R}^d$. We show that this learning problem is hard under standard cryptographic assumptions even when: (1) the size of the neural network is polynomial in $d$, (2) its input distribution is a standard Gaussian, and (3) the noise is Gaussian and polynomially small in $d$. Our hardness result is based on the hardness of the Continuous Learning with Errors (CLWE) problem, and in particular, is based on the largely believed worst-case hardness of approximately solving the shortest vector problem up to a multiplicative polynomial factor.

翻訳日:2024-11-02 21:59:46 公開日:2024-10-04

# VEDIT:手続き型ビデオ表現学習のための潜在予測アーキテクチャ

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning ( http://arxiv.org/abs/2410.03478v1 )

ライセンス: Link先を確認

Han Lin, Tushar Nagarajan, Nicolas Ballas, Mido Assran, Mojtaba Komeili, Mohit Bansal, Koustuv Sinha,

(参考訳) 手続き型ビデオ表現学習は、現在のビデオ入力から（通常はテキストアノテーションと併せて）将来を予期・予測できるエージェントを学習することを目的とした活発な研究分野である。先行研究は、言語による監督を伴う視覚エンコーダや予測モデルの大規模な事前学習に依存することが多い。しかし、ノイズの多いテキスト監督を用いてビデオクリップ系列を学習するために計算集約的な事前学習を拡張することの必要性と有効性は、これまでの研究で十分に検証されていない。本研究では、強力な既製の凍結済み事前学習視覚エンコーダと適切に設計された予測モデルを組み合わせることで、予測モデルの事前学習も、言語やASRからの追加の監督も必要とせずに、予測および手続き計画において最先端(SoTA)の性能を達成できることを示す。画素空間から表現を学習する代わりに、本手法は一般に公開されている視覚エンコーダの潜在埋め込み空間を利用する。観測されたステップから得られる凍結済みのクリップレベル埋め込みを条件として未観測ステップの行動を予測することにより、我々の予測モデルは、拡散トランスフォーマー(Peebles & Xie, 2023)の最近の進歩を活用した反復的なデノイジングを通じて、予測のための頑健な表現を学習できる。4つのデータセット(NIV, CrossTask, COIN, Ego4D-v2)にまたがる計5つの手続き的学習タスクに関する実証的研究により、我々のモデルは長期的な行動予測において強力なベースラインを上回り(Verb ED@20で+2.6%、Noun ED@20で+3.1%)、ステップ予測(+5.0%)、タスク分類(+3.8%)、手順計画タスク(成功率で最大+2.28%、mAccで+3.39%、mIoUで+0.90%)においてSoTAを大幅に改善することが示された。

Procedural video representation learning is an active research area where the objective is to learn an agent which can anticipate and forecast the future given the present video input, typically in conjunction with textual annotations. Prior works often rely on large-scale pretraining of visual encoders and prediction models with language supervision. However, the necessity and effectiveness of extending compute intensive pretraining to learn video clip sequences with noisy text supervision have not yet been fully validated by previous works. In this work, we show that a strong off-the-shelf frozen pretrained visual encoder, along with a well designed prediction model, can achieve state-of-the-art (SoTA) performance in forecasting and procedural planning without the need for pretraining the prediction model, nor requiring additional supervision from language or ASR. Instead of learning representations from pixel space, our method utilizes the latent embedding space of publicly available vision encoders. By conditioning on frozen clip-level embeddings from observed steps to predict the actions of unseen steps, our prediction model is able to learn robust representations for forecasting through iterative denoising - leveraging the recent advances in diffusion transformers (Peebles & Xie, 2023). Empirical studies over a total of five procedural learning tasks across four datasets (NIV, CrossTask, COIN and Ego4D-v2) show that our model advances the strong baselines in long-horizon action anticipation (+2.6% in Verb ED@20, +3.1% in Noun ED@20), and significantly improves the SoTA in step forecasting (+5.0%), task classification (+3.8%), and procedure planning tasks (up to +2.28% in success rate, +3.39% in mAcc, and +0.90% in mIoU).

翻訳日:2024-11-02 21:59:46 公開日:2024-10-04

# 開量子系問題としてのパラメトリック近似

Parametric approximation as open quantum systems problem ( http://arxiv.org/abs/2410.03482v1 )

ライセンス: Link先を確認

A. Yu. Karasev, A. E. Teretenkov,

(参考訳) 本研究では、パラメトリック近似を開放量子系の観点から捉える見方を展開し、それによってパラメトリック近似に対する系統的な摂動補正を得る。散逸を伴うJaynes-Cummingsモデルを考え、場が枯渇を伴うパラメトリック近似に近い領域にあると仮定する。パラメトリック近似に対する非ユニタリ補正と、それに対する追加の動的ラムシフトの寄与を得る。大きな離調では、これらの非ユニタリ補正は枯渇の前には非マルコフ的であるように見える。さらに、枯渇の後であっても、初期の非マルコフ的挙動が、レーザー誘起による密度行列の研磨を通じてダイナミクスに寄与することを示す。

In this work we develop an open quantum system view of the parametric approximation, which allows us to obtain systematic perturbative corrections to it. We consider the Jaynes-Cummings model with dissipation, assuming that the field is in the regime close to the parametric approximation with depletion. We obtain non-unitary corrections to the parametric approximation and additional dynamical Lamb-shift contributions to it. For high detuning, these non-unitary corrections appear to be non-Markovian before depletion. And we show that even after depletion, initial non-Markovian behaviour contributes to the dynamics via laser-induced polishing of the density matrix.

翻訳日:2024-11-02 21:59:46 公開日:2024-10-04

# ディープフェイク検出のためのマルチモーダルフレームワーク

A Multimodal Framework for Deepfake Detection ( http://arxiv.org/abs/2410.03487v1 )

ライセンス: Link先を確認

Kashish Gandhi, Prutha Kulkarni, Taran Shah, Piyush Chaudhari, Meera Narvekar, Kranti Ghag,

(参考訳) ディープフェイク技術の急速な進歩は、デジタルメディアの完全性に重大な脅威をもたらしている。AIを用いて作成される合成メディアであるディープフェイクは、ビデオや音声を説得力のある形で改変し、現実を歪めて伝えることができる。これにより、誤情報や詐欺のリスク、さらには個人のプライバシーとセキュリティへの深刻な影響が生じる。本研究は、視覚的要素と聴覚的要素の両方を対象とする革新的なマルチモーダルアプローチにより、ディープフェイクという重要な課題に取り組む。この包括的な戦略は、人間の知覚が複数の感覚入力、特に視覚情報と聴覚情報を統合してメディアコンテンツの全体的な理解を形成することを踏まえている。視覚分析のために、高度な特徴抽出技術を用いるモデルを開発し、9つの異なる顔の特徴を抽出したうえで、様々な機械学習モデルと深層学習モデルを適用した。聴覚分析では、特徴抽出にメルスペクトログラム解析を用い、その後、様々な機械学習モデルと深層学習モデルを適用する。統合的な分析を実現するため、元のデータセットの実音声とディープフェイク音声をテスト目的で入れ替え、バランスの取れたサンプルを確保した。提案した映像・音声分類モデル、すなわち人工ニューラルネットワークとVGG19を用い、いずれかの成分がディープフェイクと識別された場合に、サンプル全体をディープフェイクとして分類する。我々のマルチモーダルフレームワークは視覚分析と聴覚分析を組み合わせ、94%の精度を達成する。

The rapid advancement of deepfake technology poses a significant threat to digital media integrity. Deepfakes, synthetic media created using AI, can convincingly alter videos and audio to misrepresent reality. This creates risks of misinformation, fraud, and severe implications for personal privacy and security. Our research addresses the critical issue of deepfakes through an innovative multimodal approach, targeting both visual and auditory elements. This comprehensive strategy recognizes that human perception integrates multiple sensory inputs, particularly visual and auditory information, to form a complete understanding of media content. For visual analysis, a model that employs advanced feature extraction techniques was developed, extracting nine distinct facial characteristics and then applying various machine learning and deep learning models. For auditory analysis, our model leverages mel-spectrogram analysis for feature extraction and then applies various machine learning and deep learning models. To achieve a combined analysis, real and deepfake audio in the original dataset were swapped for testing purposes and ensured balanced samples. Using our proposed models for video and audio classification, i.e., Artificial Neural Network and VGG19, the overall sample is classified as deepfake if either component is identified as such. Our multimodal framework combines visual and auditory analyses, yielding an accuracy of 94%.
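The decision rule stated in the abstract (a sample is a deepfake if either modality is flagged) amounts to a logical OR over the two classifier outputs. The sketch below illustrates only that fusion step. The function names, the score inputs, and the 0.5 threshold are hypothetical choices for illustration, and the two classifiers themselves are assumed to exist elsewhere.

```python
def fuse_predictions(video_is_fake: bool, audio_is_fake: bool) -> bool:
    """Late fusion by logical OR: flag the sample if either modality is flagged."""
    return video_is_fake or audio_is_fake


def classify_sample(video_score: float, audio_score: float,
                    threshold: float = 0.5) -> str:
    """Turn per-modality fake probabilities into one label.

    Assumption: each score is the probability of "fake" produced by a
    separate video or audio classifier; the threshold is illustrative.
    """
    video_is_fake = video_score >= threshold
    audio_is_fake = audio_score >= threshold
    return "deepfake" if fuse_predictions(video_is_fake, audio_is_fake) else "real"


if __name__ == "__main__":
    # A clean video paired with manipulated audio is still flagged.
    print(classify_sample(video_score=0.1, audio_score=0.9))
    print(classify_sample(video_score=0.1, audio_score=0.2))
```

An OR rule favors recall: a manipulation in one modality cannot be masked by a clean second modality, at the cost of inheriting the false positives of both classifiers.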

翻訳日:2024-11-02 21:59:46 公開日:2024-10-04

# FAIR原則を計算ワークフローに適用する

Applying the FAIR Principles to Computational Workflows ( http://arxiv.org/abs/2410.03490v1 )

ライセンス: Link先を確認

Sean R. Wilkinson, Meznah Aloqalaa, Khalid Belhajjame, Michael R. Crusoe, Bruno de Paula Kinoshita, Luiz Gadelha, Daniel Garijo, Ove Johan Ragnar Gustafsson, Nick Juty, Sehrish Kanwal, Farah Zaib Khan, Johannes Köster, Karsten Peters-von Gehlen, Line Pouchard, Randy K. Rannow, Stian Soiland-Reyes, Nicola Soranzo, Shoaib Sufi, Ziheng Sun, Baiba Vilne, Merridee A. Wouters, Denis Yuen, Carole Goble,

(参考訳) 計算科学とデータ科学の最近のトレンドは、生産性、再現性、プラットフォームや処理ノウハウへの民主化されたアクセスのためのツールとして、計算ワークフローの認識と採用が増加していることを示している。共有、発見、再利用されるデジタルオブジェクトとして、計算ワークフローはFindable、Accessible、Interoperable、ReusableのFAIR原則の恩恵を受ける。 Workflows Community InitiativeのFAIR Workflows Working Group (WCI-FW)は、分野や領域を越えて計算ワークフローに取り組む研究者と開発者からなるグローバルでオープンなコミュニティであり、FAIRデータとソフトウェア原則の両方を計算ワークフローに適用する体系的な取り組みを行っている。我々は、私たちの議論を反映し、私たちの選択と適応を正当化する解説とともに勧告を提示する。それらがベースとするソフトウェアやデータ原則と同様に、これらはワークフローユーザと作者、ワークフロー管理システム開発者、ワークフローサービスのプロバイダに対して、採用のためのガイドレールおよび議論の材料として提供される。ワークフローは、データ分析、データ収集、AIベースの予測、シミュレーションのための文書化・自動化された手段として、より普及しつつある。本論文で提案するワークフローに対するFAIR勧告は,研究資産としての価値を最大化し,より広範なコミュニティによる採用を促進するものである。

Recent trends within computational and data sciences show an increasing recognition and adoption of computational workflows as tools for productivity, reproducibility, and democratized access to platforms and processing know-how. As digital objects to be shared, discovered, and reused, computational workflows benefit from the FAIR principles, which stand for Findable, Accessible, Interoperable, and Reusable. The Workflows Community Initiative's FAIR Workflows Working Group (WCI-FW), a global and open community of researchers and developers working with computational workflows across disciplines and domains, has systematically addressed the application of both FAIR data and software principles to computational workflows. We present our recommendations with commentary that reflects our discussions and justifies our choices and adaptations. Like the software and data principles on which they are based, these are offered to workflow users and authors, workflow management system developers, and providers of workflow services as guide rails for adoption and fodder for discussion. Workflows are becoming more prevalent as documented, automated instruments for data analysis, data collection, AI-based predictions, and simulations. The FAIR recommendations for workflows that we propose in this paper will maximize their value as research assets and facilitate their adoption by the wider community.

翻訳日:2024-11-02 21:59:46 公開日:2024-10-04

# 再現可能なLLM評価に向けて:LLMベンチマークスコアの不確かさの定量化

Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores ( http://arxiv.org/abs/2410.03492v1 )

ライセンス: Link先を確認

Robert E. Blackwell, Jon Barry, Anthony G. Cohn,

(参考訳) 大規模言語モデル(LLM)は確率的であり、固定されたランダムシードで温度を0に設定しても、すべてのモデルが決定論的回答を与えるわけではない。しかしながら、繰り返し実験の時間とコストもあって、不確実性を定量化しようとするベンチマーク研究はほとんどない。 基本方位に関するLLMの推論能力をテストするために設計されたベンチマークを用いて,実験の繰り返しが平均スコアと予測区間に与える影響を調べる。本稿では,ベンチマークスコアの不確かさを費用対効果よく定量化する簡易な手法を提案し,再現可能なLLM評価に関する推奨事項を示す。

Large language models (LLMs) are stochastic, and not all models give deterministic answers, even when setting temperature to zero with a fixed random seed. However, few benchmark studies attempt to quantify uncertainty, partly due to the time and cost of repeated experiments. We use benchmarks designed for testing LLMs' capacity to reason about cardinal directions to explore the impact of experimental repeats on mean score and prediction interval. We suggest a simple method for cost-effectively quantifying the uncertainty of a benchmark score and make recommendations concerning reproducible LLM evaluation.
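As a rough illustration of what the abstract above means by a mean score and a prediction interval over experimental repeats, the following sketch computes both from a handful of repeated benchmark scores. It is not the authors' procedure: the function name, the example scores, and the use of a normal quantile (rather than Student's t) are assumptions made here for brevity.

```python
import math
import statistics

def prediction_interval(scores, confidence=0.95):
    """Approximate interval expected to contain the score of one further repeat.

    Assumes the repeats are independent and roughly normal. A normal quantile
    is used instead of Student's t, so the interval is slightly too narrow
    when only a few repeats are available.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two repeats")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd * math.sqrt(1 + 1 / n)
    return mean, mean - half_width, mean + half_width

# Hypothetical accuracies from five repeats of the same benchmark.
repeats = [0.81, 0.79, 0.83, 0.80, 0.82]
mean, low, high = prediction_interval(repeats)
print(f"mean={mean:.3f}, 95% PI=({low:.3f}, {high:.3f})")
```

The `sqrt(1 + 1/n)` factor is what distinguishes a prediction interval for a new run from a confidence interval for the mean, which would use `sqrt(1/n)`.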

翻訳日:2024-11-02 21:59:45 公開日:2024-10-04

# 合成可能な化学空間をナビゲートするための生成人工知能

Generative Artificial Intelligence for Navigating Synthesizable Chemical Space ( http://arxiv.org/abs/2410.03494v1 )

ライセンス: Link先を確認

Wenhao Gao, Shitong Luo, Connor W. Coley,

(参考訳) 合成可能な化学空間を効率的に探索し、ナビゲートするための生成モデリングフレームワークであるSynFormerを紹介する。従来の分子生成手法とは異なり、我々は分子の合成経路を生成し、設計が合成的に実現可能(tractable)であることを保証する。拡張性のあるトランスフォーマーアーキテクチャとビルディングブロック選択のための拡散モジュールを組み込むことで、SynFormerは合成可能な分子設計において既存のモデルを超えている。本研究では,(1) 参照分子の合成可能な類似体を生成する局所的な化学空間探索,(2) ブラックボックスの特性予測オラクルに基づいて最適な分子を同定するグローバルな化学空間探索,という2つの主要な応用において,SynFormerの有効性を実証する。さらに,より多くの計算資源が利用可能になるにつれて性能が向上することを通じて,我々のアプローチのスケーラビリティを実証する。コードとトレーニング済みのモデルを公開しており、SynFormerが薬物発見や材料科学の分野にまたがって利用されることを期待している。

We introduce SynFormer, a generative modeling framework designed to efficiently explore and navigate synthesizable chemical space. Unlike traditional molecular generation approaches, we generate synthetic pathways for molecules to ensure that designs are synthetically tractable. By incorporating a scalable transformer architecture and a diffusion module for building block selection, SynFormer surpasses existing models in synthesizable molecular design. We demonstrate SynFormer's effectiveness in two key applications: (1) local chemical space exploration, where the model generates synthesizable analogs of a reference molecule, and (2) global chemical space exploration, where the model aims to identify optimal molecules according to a black-box property prediction oracle. Additionally, we demonstrate the scalability of our approach via the improvement in performance as more computational resources become available. With our code and trained models openly available, we hope that SynFormer will find use across applications in drug discovery and materials science.

翻訳日:2024-11-02 21:59:45 公開日:2024-10-04

# Fourier PINN: 強い境界条件から適応的なFourier基底へ

Fourier PINNs: From Strong Boundary Conditions to Adaptive Fourier Bases ( http://arxiv.org/abs/2410.03496v1 )

ライセンス: Link先を確認

Madison Cooley, Varun Shankar, Robert M. Kirby, Shandian Zhe,

(参考訳) 偏微分方程式(PDE)の従来の数値解法に代わるメッシュフリーの代替として、物理情報ニューラルネットワーク(PINN)への関心が高まっている。しかし、PINNは高周波でマルチスケールな目標解を学ぶのに苦労することが多い。この問題に対処するために,我々はまず,ディリクレ BC に対する PINN の強い境界条件 (BC) 版について検討し,標準 PINN と比較して相対誤差が一貫して減少することを観察する。次にフーリエ変換と畳み込み定理に基づく理論的解析を行う。強いBC PINNは、目標解の高周波成分の振幅をよりよく学習できることがわかった。しかし、強いBC PINNのアーキテクチャを構築することは、多くのBCやドメインのジオメトリにとって困難である。理論解析に着想を得て,Fourier PINN を提案する。Fourier PINN は単純で汎用的かつ強力な手法で,あらかじめ指定された密なFourier基底で PINN を拡張する。提案アーキテクチャも同様に高周波成分をよりよく学習するが、特定のBCや問題領域に制限はない。本研究では,ニューラルネット基底の最適化,フーリエ基底とニューラルネット基底の係数推定,係数の切り捨てを交互に行う適応学習・基底選択アルゴリズムを開発した。このスキームは、重要な周波数を柔軟に識別し、名目的な周波数を弱めることで、目標解のパワースペクトルをよりよく捉えることができる。我々は,一連の系統的な実験を通じて,アプローチの利点を示す。

Interest is rising in Physics-Informed Neural Networks (PINNs) as a mesh-free alternative to traditional numerical solvers for partial differential equations (PDEs). However, PINNs often struggle to learn high-frequency and multi-scale target solutions. To tackle this problem, we first study a strong Boundary Condition (BC) version of PINNs for Dirichlet BCs and observe a consistent decline in relative error compared to the standard PINNs. We then perform a theoretical analysis based on the Fourier transform and convolution theorem. We find that strong BC PINNs can better learn the amplitudes of high-frequency components of the target solutions. However, constructing the architecture for strong BC PINNs is difficult for many BCs and domain geometries. Enlightened by our theoretical analysis, we propose Fourier PINNs -- a simple, general, yet powerful method that augments PINNs with pre-specified, dense Fourier bases. Our proposed architecture likewise learns high-frequency components better but places no restrictions on the particular BCs or problem domains. We develop an adaptive learning and basis selection algorithm via alternating neural net basis optimization, Fourier and neural net basis coefficient estimation, and coefficient truncation. This scheme can flexibly identify the significant frequencies while weakening the nominal frequencies to better capture the target solution's power spectrum. We show the advantage of our approach through a set of systematic experiments.
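To make the idea of a strong (hard-constrained) Dirichlet boundary condition concrete, here is a minimal 1-D sketch on the interval [0, 1]. It uses a common textbook construction, a boundary lift plus a mask that vanishes on the boundary; this is an assumption for illustration and not necessarily the architecture used in the paper. `toy_network` is a hypothetical stand-in for a trained network.

```python
import math

def hard_bc_solution(network, a, b):
    """Wrap `network` so the result satisfies u(0) = a and u(1) = b exactly."""
    def u(x):
        boundary_lift = a + (b - a) * x      # matches both boundary values
        mask = x * (1.0 - x)                 # vanishes at x = 0 and x = 1
        return boundary_lift + mask * network(x)
    return u

# Stand-in for a trained network: any function of x works for the check.
toy_network = lambda x: math.sin(7.0 * x) + 3.0
u = hard_bc_solution(toy_network, a=2.0, b=-1.0)
print(u(0.0), u(1.0))
```

Because the boundary values hold by construction, no boundary loss term is needed during training. As the abstract notes, finding such a lift and mask is much harder for general boundary conditions and domain geometries.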

翻訳日:2024-11-02 21:50:00 公開日:2024-10-04

# 適応器の混合による協調的・効率的なパーソナライゼーション

Collaborative and Efficient Personalization with Mixtures of Adaptors ( http://arxiv.org/abs/2410.03497v1 )

ライセンス: Link先を確認

Abdulla Jasem Almansoori, Samuel Horváth, Martin Takáč,

(参考訳) 非IIDデータは、現実世界のフェデレーション学習問題で広く見られる。データの不均一性は、分布シフトの点でさまざまな種類がある。この研究では、コンセプトシフト、すなわちクライアント間での予測のシフトから生じる異質性に興味を持っています。特に、モデルをクライアントのタスクに適応させたいマルチタスク学習について検討する。この問題に対処するためのパラメータ効率のよいフレームワークを提案し、各クライアントはそのタスクに応じてパラメータ効率のよいアダプタを混合することを学ぶ。バックボーンとしてLoRA(Low-Rank Adaptors)を使用し、そのコンセプトを他のタイプのレイヤに拡張しています。当社のフレームワークをFLoRAL(Federated Low-Rank Adaptive Learning)と呼んでいます。このフレームワークは、アルゴリズムではなく、マルチタスク学習目的のためのモデルパラメータ化であり、文献にある多くのアルゴリズムを含む、この目的を最適化する任意のアルゴリズム上で動作することができる。 FLoRALはメモリ効率が高く、アダプタ自体がフェデレーションされるため、クライアントは小さな状態(例えば、アダプタ毎に1つの数値)でパーソナライズされる。したがって、パーソナライゼーションは、この意味でフェデレーションされている。クライアントは、アダプタをローカルにトレーニングすることで、より自由にパーソナライズすることができるが、アダプタの協調的かつ効率的なトレーニングが可能であり、パフォーマンスが向上することを示す。また,FLoRALは,最適なクラスタ割り当てを持つ完全モデルのアンサンブルよりも優れうることを示し,これはフェデレートされたパーソナライゼーションの利点と,FLoRALの過学習に対する頑健性を示している。合成データセットと,MNIST, CIFAR-10, CIFAR-100などの実世界のフェデレーションマルチタスク問題について, 有望な実験結果を示す。また, 緩和された目的関数に対する局所SGDの理論的解析を行い, 集約ミスマッチが収束に及ぼす影響について考察する。

Non-iid data is prevalent in real-world federated learning problems. Data heterogeneity can come in different types in terms of distribution shifts. In this work, we are interested in the heterogeneity that comes from concept shifts, i.e., shifts in the prediction across clients. In particular, we consider multi-task learning, where we want the model to adapt to the task of the client. We propose a parameter-efficient framework to tackle this issue, where each client learns to mix between parameter-efficient adaptors according to its task. We use Low-Rank Adaptors (LoRAs) as the backbone and extend their concept to other types of layers. We call our framework Federated Low-Rank Adaptive Learning (FLoRAL). This framework is not an algorithm but rather a model parameterization for a multi-task learning objective, so it can work on top of any algorithm that optimizes this objective, which includes many algorithms from the literature. FLoRAL is memory-efficient, and clients are personalized with small states (e.g., one number per adaptor) as the adaptors themselves are federated. Hence, personalization is--in this sense--federated as well. Even though clients can personalize more freely by training an adaptor locally, we show that collaborative and efficient training of adaptors is possible and performs better. We also show that FLoRAL can outperform an ensemble of full models with optimal cluster assignment, which demonstrates the benefits of federated personalization and the robustness of FLoRAL to overfitting. We show promising experimental results on synthetic datasets and real-world federated multi-task problems such as MNIST, CIFAR-10, and CIFAR-100. We also provide a theoretical analysis of local SGD on a relaxed objective and discuss the effects of aggregation mismatch on convergence.
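The abstract describes each client mixing shared low-rank adaptors with a small personal state. A minimal sketch of that parameterization, under the assumption that the effective weight is the shared base weight plus a weighted sum of low-rank products, might look as follows. The helper names, matrix sizes, and mixing weights are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def mixed_weight(base, adaptors, mix):
    """Return base + sum_c mix[c] * (B_c @ A_c) for low-rank pairs (B_c, A_c)."""
    out = [row[:] for row in base]
    for weight, (b_mat, a_mat) in zip(mix, adaptors):
        delta = matmul(b_mat, a_mat)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                out[i][j] += weight * value
    return out

base = [[1.0, 0.0], [0.0, 1.0]]          # shared (federated) weight
adaptors = [
    ([[1.0], [0.0]], [[0.0, 2.0]]),      # rank-1 adaptor 0
    ([[0.0], [1.0]], [[4.0, 0.0]]),      # rank-1 adaptor 1
]
client_mix = [0.75, 0.25]                # the client's small personal state
print(mixed_weight(base, adaptors, client_mix))
```

In this sketch only `client_mix` is client-specific, which mirrors the "one number per adaptor" remark in the abstract. The base weight and the adaptors would be trained collaboratively.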

翻訳日:2024-11-02 21:50:00 公開日:2024-10-04

# FedStein: James-Stein Estimatorによるマルチドメインフェデレーション学習の促進

FedStein: Enhancing Multi-Domain Federated Learning Through James-Stein Estimator ( http://arxiv.org/abs/2410.03499v1 )

ライセンス: Link先を確認

Sunny Gupta, Nikita Jangid, Amit Sethi,

(参考訳) Federated Learning (FL)は、分散クライアント間でのその場(in-situ)の協調トレーニングを可能にすることで、データのプライバシを促進する。その固有の利点にもかかわらず、FLは独立かつ同一に分布していない(非i.i.d.)データを扱う際に、パフォーマンスと収束の重大な課題に直面している。従来の研究は主にクライアント間で偏ったラベル分布の問題に取り組んできたが,本研究では,異なる特徴分布を持つ異なるドメインからクライアントデータが派生するマルチドメインFLという,あまり研究されていない課題に焦点をあてる。これらの課題に対処するために設計された新しい手法,FedStein: Enhancing Multi-Domain Federated Learning Through the James-Stein Estimatorを提案する。 FedSteinは、ローカルBNパラメータを維持しながら、クライアント間でバッチ正規化(BN)統計のJames-Stein(JS)推定のみを共有する。非BN層パラメータは標準FL技術で交換される。 3つのデータセットと複数のモデルで実施された大規模な実験は、FedSteinがFedAvgやFedBNといった既存の手法を上回り、特定のドメインで精度が14%以上向上し、ドメインの一般化が促進されたことを示している。コードはhttps://github.com/sunnyinAI/FedSteinで入手できる。

Federated Learning (FL) facilitates data privacy by enabling collaborative in-situ training across decentralized clients. Despite its inherent advantages, FL faces significant challenges of performance and convergence when dealing with data that is not independently and identically distributed (non-i.i.d.). While previous research has primarily addressed the issue of skewed label distribution across clients, this study focuses on the less explored challenge of multi-domain FL, where client data originates from distinct domains with varying feature distributions. We introduce a novel method designed to address these challenges, FedStein: Enhancing Multi-Domain Federated Learning Through the James-Stein Estimator. FedStein uniquely shares only the James-Stein (JS) estimates of batch normalization (BN) statistics across clients, while maintaining local BN parameters. The non-BN layer parameters are exchanged via standard FL techniques. Extensive experiments conducted across three datasets and multiple models demonstrate that FedStein surpasses existing methods such as FedAvg and FedBN, with accuracy improvements exceeding 14% in certain domains, leading to enhanced domain generalization. The code is available at https://github.com/sunnyinAI/FedStein
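For readers unfamiliar with the James-Stein estimator named in the abstract, the sketch below shows the classical positive-part form, which shrinks a vector of noisy observations towards the origin. How FedStein applies it to batch-normalization statistics is not specified here, so the shrinkage target, the known-variance assumption, and the example values are all illustrative assumptions.

```python
def james_stein(observations, noise_variance):
    """Positive-part James-Stein estimate, shrinking towards the origin.

    `observations` holds one noisy value per coordinate (at least three),
    each assumed to have the same known `noise_variance`.
    """
    p = len(observations)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 coordinates")
    squared_norm = sum(x * x for x in observations)
    if squared_norm == 0:
        return list(observations)
    shrink = max(0.0, 1.0 - (p - 2) * noise_variance / squared_norm)
    return [shrink * x for x in observations]

noisy_means = [1.0, -2.0, 0.5, 1.5]   # hypothetical per-channel statistics
print(james_stein(noisy_means, noise_variance=0.5))
```

The shrinkage factor moves towards zero when the observations are small relative to the noise, so weakly supported statistics are pulled harder towards the shared target.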

翻訳日:2024-11-02 21:50:00 公開日:2024-10-04

# CliMedBench: 臨床シナリオにおける医学大言語モデル評価のための大規模中国語ベンチマーク

CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios ( http://arxiv.org/abs/2410.03502v1 )

ライセンス: Link先を確認

Zetian Ouyang, Yishuai Qiu, Linlin Wang, Gerard de Melo, Ya Zhang, Yanfeng Wang, Liang He,

(参考訳) 多様な領域におけるLarge Language Models (LLMs) の普及に伴い、モデルを非常に徹底的に検証する必要がある臨床医療シナリオにおいて、統一的な評価基準が特に必要となる。 CliMedBenchは、7つの主要な次元にわたってLLMの医学的能力を評価するために特別に設計された、専門家の指導による14の中核的な臨床シナリオを備えた総合的なベンチマークである。トップレベルの三次病院の実際の医療報告と、実際の試験問題から得られた33,735の質問から成り立っている。このベンチマークの信頼性はいくつかの方法で確認されている。その後、既存のLLMを用いた実験により、以下の結果が得られた。 (i) 中国の医学LLMは、特に医学的推論と事実的整合性が不可欠である場合に、このベンチマークで性能が低く、臨床知識と診断精度の向上の必要性を浮き彫りにしている。 (ii) いくつかの一般ドメイン LLM は臨床現場において大きな可能性を示す一方で,多くの医療 LLM の入力容量の制限は,その実用性を妨げている。これらの結果から,臨床シナリオにおけるLLMの強みと限界が明らかとなり,医学研究に重要な知見が得られた。

With the proliferation of Large Language Models (LLMs) in diverse domains, there is a particular need for unified evaluation standards in clinical medical scenarios, where models need to be examined very thoroughly. We present CliMedBench, a comprehensive benchmark with 14 expert-guided core clinical scenarios specifically designed to assess the medical ability of LLMs across 7 pivot dimensions. It comprises 33,735 questions derived from real-world medical reports of top-tier tertiary hospitals and authentic examination exercises. The reliability of this benchmark has been confirmed in several ways. Subsequent experiments with existing LLMs have led to the following findings: (i) Chinese medical LLMs underperform on this benchmark, especially where medical reasoning and factual consistency are vital, underscoring the need for advances in clinical knowledge and diagnostic accuracy. (ii) Several general-domain LLMs demonstrate substantial potential in medical clinics, while the limited input capacity of many medical LLMs hinders their practical use. These findings reveal both the strengths and limitations of LLMs in clinical scenarios and offer critical insights for medical research.

翻訳日:2024-11-02 21:50:00 公開日:2024-10-04

# 医療機器デジタルツインの不確実性を考慮した環境シミュレーション

Uncertainty-Aware Environment Simulation of Medical Devices Digital Twins ( http://arxiv.org/abs/2410.03504v1 )

ライセンス: Link先を確認

Hassan Sartaj, Shaukat Ali, Julie Marie Gjøby,

(参考訳) スマートメディカルデバイスは、ヘルスケアIoT(Internet of Things)の不可欠なコンポーネントであり、IoTベースのアプリケーションを通じて、患者にさまざまなヘルスケアサービスを提供する。システムレベルおよび統合レベルのテストを通じて、そのようなアプリケーションの信頼性を確保するには、多数の医療機器の物理的統合が必要となり、これはコストが高く非現実的である。この文脈では、医療機器のデジタルツインが、テスト自動化の促進に不可欠な役割を担っている。医療機器の不確実な環境要因を考慮せずにデジタルツインでテストすると、IoTベースの医療アプリケーションの多くの機能がテストされないまま残る。さらに、環境要因なしで動作するデジタルツインは、実際の環境で機能する対応するデバイスと同期せず、未調整のままである。これらの課題に対処するために,本稿では,不確実性の下で医療機器のデジタルツインの環境をモデル化し,シミュレーションするためのモデルベースアプローチ(EnvDT)を提案する。実世界のIoTベースのヘルスケアアプリケーションに接続された3つの薬品ディスペンサー、Karie, Medido, Pillyを使用して、EnvDTを実証的に評価した。評価対象は,環境モデルのカバレッジと,デジタルツイン向けに生成された不確実性シナリオの多様性の分析である。その結果,EnvDTは環境モデルの約61%のカバレッジを達成し,複数の環境シミュレーションにおいて,多様な不確実性シナリオ(最大値に近い多様性値0.62)を生成することがわかった。

Smart medical devices are an integral component of the healthcare Internet of Things (IoT), providing patients with various healthcare services through an IoT-based application. Ensuring the dependability of such applications through system and integration-level testing mandates the physical integration of numerous medical devices, which is costly and impractical. In this context, digital twins of medical devices play an essential role in facilitating testing automation. Testing with digital twins without accounting for uncertain environmental factors of medical devices leaves many functionalities of IoT-based healthcare applications untested. In addition, digital twins operating without environmental factors remain out of sync and uncalibrated with their corresponding devices functioning in the real environment. To deal with these challenges, in this paper, we propose a model-based approach (EnvDT) for modeling and simulating the environment of medical devices' digital twins under uncertainties. We empirically evaluate EnvDT using three medicine dispensers (Karie, Medido, and Pilly) connected to a real-world IoT-based healthcare application. Our evaluation targets analyzing the coverage of environment models and the diversity of uncertain scenarios generated for digital twins. Results show that EnvDT achieves approximately 61% coverage of environment models and generates diverse uncertain scenarios (with a near-maximum diversity value of 0.62) during multiple environmental simulations.

翻訳日:2024-11-02 21:50:00 公開日:2024-10-04

# 分類Denoising Networks

Classification-Denoising Networks ( http://arxiv.org/abs/2410.03505v1 )

ライセンス: Link先を確認

Louis Thiry, Florentin Guth,

(参考訳) 画像分類とデノイジングは、頑健性の欠如や条件付け情報の部分的な無視という相補的な問題に悩まされる。(ノイズの多い)画像とクラスラベルの同時確率のモデルを通じて両タスクを統一することで、これらを緩和できると論じる。分類は、フォワードパスとそれに続く条件付けによって行われる。 Tweedie-Miyasawa式を用いて,周辺化と逆伝播によって計算できるスコアからデノイジング関数を評価する。トレーニングの目的は、クロスエントロピー損失と、ノイズレベルにわたって積分したデノイジングスコアマッチング損失の組み合わせである。 CIFAR-10 と ImageNet の数値実験では、参照となる深層畳み込み分類器/デノイザと比較して競争力のある分類とデノイジング性能を示し、従来のジョイントアプローチに比べて効率が大幅に向上した。本モデルでは, 標準的な識別型分類器と比較して, 敵対的摂動に対する頑健さが向上し, 敵対的勾配をデノイザの差として捉える新たな解釈が可能となった。

Image classification and denoising suffer from complementary issues of lack of robustness or partially ignoring conditioning information. We argue that they can be alleviated by unifying both tasks through a model of the joint probability of (noisy) images and class labels. Classification is performed with a forward pass followed by conditioning. Using the Tweedie-Miyasawa formula, we evaluate the denoising function with the score, which can be computed by marginalization and back-propagation. The training objective is then a combination of cross-entropy loss and denoising score matching loss integrated over noise levels. Numerical experiments on CIFAR-10 and ImageNet show competitive classification and denoising performance compared to reference deep convolutional classifiers/denoisers, and significantly improved efficiency compared to previous joint approaches. Our model shows an increased robustness to adversarial perturbations compared to a standard discriminative classifier, and allows for a novel interpretation of adversarial gradients as a difference of denoisers.
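The Tweedie-Miyasawa formula mentioned above states that, for y = x + sigma * noise with standard Gaussian noise, the optimal denoiser is E[x | y] = y + sigma^2 * grad log p(y). The sketch below checks this on a scalar toy case where the score is known in closed form. The Gaussian prior and the specific values of `tau`, `sigma`, and `y` are assumptions chosen only to make the check verifiable; in the paper the score comes from a learned model.

```python
def tweedie_denoise(y, sigma, score):
    """Posterior mean E[x | y] = y + sigma^2 * score(y) for y = x + sigma * noise."""
    return y + sigma ** 2 * score(y)

# Toy check with x ~ N(0, tau^2): the noisy marginal is N(0, tau^2 + sigma^2),
# whose score is known in closed form.
tau, sigma = 2.0, 1.0
gaussian_score = lambda y: -y / (tau ** 2 + sigma ** 2)

y = 3.0
denoised = tweedie_denoise(y, sigma, gaussian_score)
closed_form = y * tau ** 2 / (tau ** 2 + sigma ** 2)   # Gaussian posterior mean
print(denoised, closed_form)
```

Both printed values agree, which is the scalar Gaussian instance of the identity the abstract relies on.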

翻訳日:2024-11-02 21:50:00 公開日:2024-10-04

# 水中音響ネットワークにおける位置追跡による認証

Authentication by Location Tracking in Underwater Acoustic Networks ( http://arxiv.org/abs/2410.03511v1 )

ライセンス: Link先を確認

Gianmaria Ventura, Francesco Ardizzon, Stefano Tomasin,

(参考訳) 水中音響ネットワーク(UWAN)における物理層メッセージ認証は,送信装置の指紋として水中音響チャネル(UWAC)の特性を利用する。しかし、デバイスが移動するとUWACが変化するため、認証機構はそのような変動を追跡する必要がある。本稿では,まず水中デバイスの位置を推定し,次にそれまでに推定した位置に基づいて将来の位置を予測する,という2つのステップで動作するコンテキストベース認証機構を提案する。送信の真正性を確認するため,推定位置と予測位置を比較する。位置は、推定されたUWACのサンプル共分散行列を入力とする畳み込みニューラルネットワークを用いて推定する。予測には、カルマンフィルタまたはリカレントニューラルネットワーク(RNN)を使用する。認証チェックは、予測位置と推定位置の2乗誤差に対して行う。デバイスが、典型的な水中での動きを再現する相関ガウス-マルコフ移動モデルに従って移動する場合、カルマンフィルタに基づく解は、RNN上に構築された解よりも優れる。

Physical layer message authentication in underwater acoustic networks (UWANs) leverages the characteristics of the underwater acoustic channel (UWAC) as a fingerprint of the transmitting device. However, as the device moves, its UWAC changes, and the authentication mechanism must track such variations. In this paper, we propose a context-based authentication mechanism operating in two steps: first, we estimate the position of the underwater device, then we predict its future position based on the previously estimated ones. To check the authenticity of the transmission, we compare the estimated and the predicted position. The location is estimated using a convolutional neural network taking as input the sample covariance matrix of the estimated UWACs. The prediction uses either a Kalman filter or a recurrent neural network (RNN). The authentication check is performed on the squared error between the predicted and estimated positions. The solution based on the Kalman filter outperforms that built on the RNN when the device moves according to a correlated Gauss-Markov mobility model, which reproduces a typical underwater motion.
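The predict-then-compare check described above can be sketched with a small Kalman filter. This is a simplified illustration, not the authors' system: it tracks a single coordinate with a constant-velocity model, takes the estimated positions as given (the paper obtains them from a convolutional network), and uses noise variances and a threshold chosen arbitrarily for the example.

```python
class ConstantVelocityKalman:
    """1-D Kalman filter with state (position, velocity)."""

    def __init__(self, position, velocity=0.0, q=0.01, r=0.25):
        self.p, self.v = position, velocity
        # Covariance entries: [[c_pp, c_pv], [c_pv, c_vv]]
        self.c_pp, self.c_pv, self.c_vv = 1.0, 0.0, 1.0
        self.q, self.r = q, r   # process / measurement noise variances

    def predict(self, dt=1.0):
        self.p += dt * self.v
        c_pp = self.c_pp + 2 * dt * self.c_pv + dt * dt * self.c_vv + self.q
        c_pv = self.c_pv + dt * self.c_vv
        self.c_pp, self.c_pv, self.c_vv = c_pp, c_pv, self.c_vv + self.q
        return self.p

    def update(self, measured_position):
        innovation = measured_position - self.p
        s = self.c_pp + self.r                      # innovation variance
        k_p, k_v = self.c_pp / s, self.c_pv / s     # Kalman gain
        self.p += k_p * innovation
        self.v += k_v * innovation
        c_pp = (1 - k_p) * self.c_pp
        c_pv = (1 - k_p) * self.c_pv
        c_vv = self.c_vv - k_v * self.c_pv
        self.c_pp, self.c_pv, self.c_vv = c_pp, c_pv, c_vv

def authenticate(track, estimated_positions, threshold):
    """Accept a message if the squared prediction error is within the threshold."""
    decisions = []
    for z in estimated_positions:
        predicted = track.predict()
        accepted = (predicted - z) ** 2 <= threshold
        decisions.append(accepted)
        if accepted:
            track.update(z)   # only trusted estimates refine the track
    return decisions

track = ConstantVelocityKalman(position=0.0, velocity=1.0)
# Positions estimated from received signals; the last one is far off the track.
print(authenticate(track, [1.1, 1.9, 3.0, 9.0], threshold=1.0))
```

Updating the track only with accepted estimates is a design choice made here so that a rejected transmission cannot drag the prediction towards an attacker's position.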

翻訳日:2024-11-02 21:50:00 公開日:2024-10-04

# Weisfeiler-Lemanのきめ細かな表現力--準同型カウントの観点から

Fine-Grained Expressive Power of Weisfeiler-Leman: A Homomorphism Counting Perspective ( http://arxiv.org/abs/2410.03517v1 )

ライセンス: Link先を確認

Junru Zhou, Muhan Zhang,

(参考訳) グラフニューラルネットワーク(GNN)が準同型を数える能力は、最近、その表現力の実用的できめ細かい尺度として提案されている。いくつかの既存の研究で特定のGNNファミリーの準同型カウント能力について研究されているが、その問題を解析するためのシンプルで統一的な枠組みは欠如している。本稿では,まず,表現力の高いGNNのフレキシブルな設計基盤として \emph{generalized folklore Weisfeiler-Leman (GFWL)} アルゴリズムを提案し,次に,GFWL設計空間内の任意のクラスのGNNの準同型カウント能力をアルゴリズム的に決定する理論的枠組みを提供する。検討されている設計空間は、既知の強力なGNNのほとんどすべてに対応するのに十分な大きさであるので、我々の結果は既存のすべての研究を大幅に拡張し、GNNモデル設計の自動化への応用が見込まれる。

The ability of graph neural networks (GNNs) to count homomorphisms has recently been proposed as a practical and fine-grained measure of their expressive power. Although several existing works have investigated the homomorphism counting power of certain GNN families, a simple and unified framework for analyzing the problem is absent. In this paper, we first propose \emph{generalized folklore Weisfeiler-Leman (GFWL)} algorithms as a flexible design basis for expressive GNNs, and then provide a theoretical framework to algorithmically determine the homomorphism counting power of an arbitrary class of GNN within the GFWL design space. As the considered design space is large enough to accommodate almost all known powerful GNNs, our result greatly extends all existing works, and may find its application in the automation of GNN model design.
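To clarify what "counting homomorphisms" means in the abstract above, the following brute-force sketch counts edge-preserving maps from a small pattern graph into a target graph. It illustrates the quantity being studied, not the GFWL framework itself, and the function name and graph encoding are choices made here.

```python
from itertools import product

def count_homomorphisms(pattern_edges, pattern_size, graph_edges, graph_size):
    """Count maps from pattern vertices to graph vertices that preserve edges.

    Vertices are 0..size-1 and edges are undirected pairs. Brute force over
    all graph_size ** pattern_size maps, so only suitable for tiny graphs.
    """
    adjacency = set()
    for u, v in graph_edges:
        adjacency.add((u, v))
        adjacency.add((v, u))
    return sum(
        all((mapping[u], mapping[v]) in adjacency for u, v in pattern_edges)
        for mapping in product(range(graph_size), repeat=pattern_size)
    )

triangle = [(0, 1), (1, 2), (0, 2)]
path3 = [(0, 1), (1, 2)]
print(count_homomorphisms(path3, 3, triangle, 3))
print(count_homomorphisms(triangle, 3, triangle, 3))
```

Unlike subgraph counting, a homomorphism may map two pattern vertices to the same target vertex as long as every pattern edge lands on a target edge. This is why the 3-vertex path has more homomorphisms into the triangle than the triangle itself does.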

翻訳日:2024-11-02 21:50:00 公開日:2024-10-04

# 複雑な不均衡データストリームのためのオンラインバギングの改善

Improving Online Bagging for Complex Imbalanced Data Stream ( http://arxiv.org/abs/2410.03519v1 )

ライセンス: Link先を確認

Bartosz Przybyl, Jerzy Stefanowski,

(参考訳) 不均衡かつ概念ドリフトを伴うデータストリームから分類器を学ぶことは依然として課題である。現在の提案の多くは、グローバルな不均衡比の変化のみを考慮することに集中しており、少数クラスのサブコンセプトへの分解や、安全でないタイプの事例(境界事例や稀な事例)の存在など、局所的な困難要因を無視している。ストリームに存在するこれらの要因は一般的なオンライン分類器の性能を低下させる可能性があるため、安全でない少数クラス事例の存在をよりよく考慮するために、リサンプリング型オンラインバギングの拡張、すなわちNeighbourhood UndersamplingまたはOversampling Online Baggingを提案する。合成された複雑な不均衡データストリームを用いた計算実験は、オンラインバギングのリサンプリングアンサンブルの以前の変種よりも有利であることを示した。

Learning classifiers from imbalanced and concept drifting data streams is still a challenge. Most of the current proposals focus on taking into account changes in the global imbalance ratio only and ignore the local difficulty factors, such as the minority class decomposition into sub-concepts and the presence of unsafe types of examples (borderline or rare ones). As the above factors present in the stream may deteriorate the performance of popular online classifiers, we propose extensions of resampling online bagging, namely Neighbourhood Undersampling or Oversampling Online Bagging to take better account of the presence of unsafe minority examples. The performed computational experiments with synthetic complex imbalanced data streams have shown their advantage over earlier variants of online bagging resampling ensembles.
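Online bagging typically presents each incoming example to every ensemble member k times, with k drawn from a Poisson distribution, and resampling variants change the Poisson rate by class. The sketch below illustrates how a neighbourhood-aware rate could be plugged into that scheme. The rate formula, the `boost` parameter, and the labels are hypothetical assumptions for illustration; the abstract does not give the authors' exact rule.

```python
import math
import random

def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def oversampling_rate(is_minority, neighbour_labels, minority_label, boost=4.0):
    """Hypothetical rule: rate 1 for majority examples; for minority examples,
    grow the rate with the share of majority-class neighbours."""
    if not is_minority or not neighbour_labels:
        return 1.0
    unsafe = sum(lbl != minority_label for lbl in neighbour_labels) / len(neighbour_labels)
    return 1.0 + boost * unsafe

rng = random.Random(0)
rate = oversampling_rate(True, ["maj", "maj", "min", "maj", "maj"], "min")
copies = [poisson(rate, rng) for _ in range(10)]   # one draw per ensemble member
print(rate, copies)
```

A minority example surrounded mostly by majority neighbours (a borderline or rare case) receives a higher rate and is therefore replayed more often to each base learner. An undersampling variant would instead lower the rate for majority examples.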

翻訳日:2024-11-02 21:50:00 公開日:2024-10-04

# コード実行とテキスト推論の間の大規模言語モデルのステアリング

Steering Large Language Models between Code Execution and Textual Reasoning ( http://arxiv.org/abs/2410.03524v1 )

ライセンス: Link先を確認

Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma, Chuchu Fan, Chi Wang,

(参考訳) 最近の多くの研究は、マルチエージェントフレームワークや推論チェーンを最適化することで、LLM(Large Language Models)のテキスト推論能力の向上に重点を置いているが、いくつかのベンチマークタスクは直接コーディングによって100%の成功率で解決でき、これはよりスケーラブルで、テキストによる反復と探索に伴う計算オーバーヘッドを回避できる。テキスト推論は、数学、論理、最適化、探索における課題を伴うタスクの解決に固有の制限があり、それは単にモデルとデータサイズをスケールアップするだけでは解決できそうにない。最近リリースされたOpenAI GPT Code InterpreterとAutoGenのようなマルチエージェントフレームワークは、LLMを使って複雑なタスクを解くためにコード生成と実行を統合するのに顕著な能力を示した。しかし、14のタスクと6種類のLLM(新しいO1-previewを含む)を用い、シングルターンとマルチターンの両方でコード/テキスト生成をステアリングする既存の7つの一般的な手法について行った実験によると、現在、必要な時にLLMにコードを書かせるよう正しくステアリングする最適な方法は存在しない。タスクの複雑さとモデルサイズの変化に伴って、モデルがコードを使う場合とテキスト推論を使う場合に関する興味深いパターンを発見し、驚くべきことに逆スケーリング則さえ生じることがわかった。また、LLMが書いたコードの結果は、たとえそのタスクがコードを通して解決できたとしても、テキスト推論よりも必ずしも良いとは限らないことを発見した。上記の問題を緩和するため,LLMのコード/テキスト生成をよりよくステアリングし,顕著な改善を実現する3つの方法を提案する。トークン長とランタイムのコストは、すべての手法について詳しく議論されている。 LLMコード/テキスト生成のステアリング問題は今後の研究にとって重要であり、さらなる改善の余地が大きいと考えています。プロジェクトページ、データセット、コードは https://yongchao98.github.io/CodeSteer/ で入手できる。

While a lot of recent research focuses on enhancing the textual reasoning capabilities of Large Language Models (LLMs) by optimizing the multi-agent framework or reasoning chains, several benchmark tasks can be solved with 100% success through direct coding, which is more scalable and avoids the computational overhead associated with textual iterating and searching. Textual reasoning has inherent limitations in solving tasks with challenges in math, logic, optimization, and searching, which are unlikely to be solved by simply scaling up the model and data size. The recently released OpenAI GPT Code Interpreter and multi-agent frameworks such as AutoGen have demonstrated remarkable proficiency in integrating code generation and execution to solve complex tasks using LLMs. However, based on our experiments on 7 existing popular methods for steering code/text generation in both single- and multi-turn settings with 14 tasks and 6 types of LLMs (including the new O1-preview), currently there is no optimal method to correctly steer LLMs to write code when needed. We discover some interesting patterns on when models use code vs. textual reasoning as task complexity and model size evolve, which even result in an astonishingly inverse scaling law. We also discover that results from LLM-written code are not always better than using textual reasoning, even if the task could be solved through code. To mitigate the above issues, we propose three methods to better steer LLM code/text generation and achieve a notable improvement. The costs of token lengths and runtime are thoroughly discussed for all the methods. We believe the problem of steering LLM code/text generation is critical for future research and has much space for further improvement. Project Page, Datasets, and Codes are available at https://yongchao98.github.io/CodeSteer/.

翻訳日:2024-11-02 21:50:00 公開日:2024-10-04

# 話す必要がない: 言語モデルの非同期混合

No Need to Talk: Asynchronous Mixture of Language Models ( http://arxiv.org/abs/2410.03529v1 )

ライセンス: Link先を確認

Anastasiia Filippova, Angelos Katharopoulos, David Grangier, Ronan Collobert,

(参考訳) SmallTalk LMは,言語モデルの混合をほぼ非同期に訓練する革新的な手法である。混合物の各モデルは、各モデルを訓練するノード間での高帯域通信を必要とせず、データ分布の異なる部分に特化している。推論時には、軽量ルータが短いプレフィックスに基づいて、与えられたシーケンスを単一のエキスパートに振り分ける。この推論スキームは、混合モデル全体のパラメータのごく一部のみを自然に利用する。言語モデリング実験では,SmallTalk LMは,同一の総訓練FLOPとほぼ同一の推論コストに対して,高密度モデルのベースラインよりもパープレキシティが著しく低いことを実証した。最後に、下流の評価では、タスクの75%で高密度ベースラインを上回る。

We introduce SmallTalk LM, an innovative method for training a mixture of language models in an almost asynchronous manner. Each model of the mixture specializes in distinct parts of the data distribution, without the need of high-bandwidth communication between the nodes training each model. At inference, a lightweight router directs a given sequence to a single expert, according to a short prefix. This inference scheme naturally uses a fraction of the parameters from the overall mixture model. Our experiments on language modeling demonstrate that SmallTalk LM achieves significantly lower perplexity than dense model baselines for the same total training FLOPs and an almost identical inference cost. Finally, in our downstream evaluations we outperform the dense baseline on 75% of the tasks.
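The prefix-based routing described above can be illustrated with a toy example in which each expert has a tiny scoring model and the sequence goes to the expert whose scorer assigns the prefix the highest likelihood. The unigram scorers, the vocabulary, and the prefix length below are hypothetical stand-ins; the abstract only states that a lightweight router picks a single expert from a short prefix.

```python
import math

def prefix_log_likelihood(unigram_probs, prefix, floor=1e-6):
    """Log-likelihood of the prefix tokens under a toy unigram model."""
    return sum(math.log(unigram_probs.get(tok, floor)) for tok in prefix)

def route(routers, tokens, prefix_len=4):
    """Return the index of the expert whose router scores the prefix highest."""
    prefix = tokens[:prefix_len]
    scores = [prefix_log_likelihood(r, prefix) for r in routers]
    return max(range(len(scores)), key=scores.__getitem__)

routers = [
    {"def": 0.3, "return": 0.3, "(": 0.2, ")": 0.2},   # code-like data
    {"the": 0.4, "cat": 0.2, "sat": 0.2, "on": 0.2},   # prose-like data
]
print(route(routers, ["the", "cat", "sat", "on", "the", "mat"]))
print(route(routers, ["def", "(", ")", "return", "x"]))
```

Because only the chosen expert processes the rest of the sequence, inference touches a fraction of the mixture's parameters, which matches the cost argument in the abstract.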

翻訳日:2024-11-02 21:39:44 公開日:2024-10-04

# MARE: 教師なしRationale抽出におけるマルチアスペクトRationaleエクストラクタ

MARE: Multi-Aspect Rationale Extractor on Unsupervised Rationale Extraction ( http://arxiv.org/abs/2410.03531v1 )

ライセンス: Link先を確認

Han Jiang, Junwen Duan, Zhe Qu, Jianxin Wang,

(参考訳) 教師なしの根拠(rationale)抽出は、明示的な根拠アノテーションなしでモデル予測を裏付けるテキストスニペットを抽出することを目的としている。研究者はこの課題の解決に多くの努力を払ってきた。従来の研究は各アスペクトを独立してエンコードすることが多く、アスペクト間の有意義な内部相関を捉える能力を制限する可能性がある。疑似相関を緩和する研究は盛んに行われてきたが,本手法では,有益な内部相関を利用してマルチアスペクトの根拠抽出を改善することに重点を置いている。本稿では,複数のアスペクトを同時に説明・予測するMulti-Aspect Rationale Extractor(MARE)を提案する。具体的には,複数のテキストチャンクを同時に符号化する,ハード削除に基づくMulti-Aspect Multi-Head Attention(MAMHA)機構を提案する。さらに、テキストの前に複数の特殊トークンを付加し、それぞれが1つの特定のアスペクトに対応する。最後に、トレーニングオーバーヘッドを低減するためにマルチタスクトレーニングを採用する。 2つの教師なし根拠抽出ベンチマークの実験結果は、MAREが最先端の性能を達成することを示す。アブレーション研究により, 本手法の有効性がさらに示された。私たちのコードはhttps://github.com/CSU-NLP-Group/MAREで公開されています。

Unsupervised rationale extraction aims to extract text snippets to support model predictions without explicit rationale annotation. Researchers have made many efforts to solve this task. Previous works often encode each aspect independently, which may limit their ability to capture meaningful internal correlations between aspects. While there has been significant work on mitigating spurious correlations, our approach focuses on leveraging the beneficial internal correlations to improve multi-aspect rationale extraction. In this paper, we propose a Multi-Aspect Rationale Extractor (MARE) to explain and predict multiple aspects simultaneously. Concretely, we propose a Multi-Aspect Multi-Head Attention (MAMHA) mechanism based on hard deletion to encode multiple text chunks simultaneously. Furthermore, multiple special tokens are prepended in front of the text with each corresponding to one certain aspect. Finally, multi-task training is deployed to reduce the training overhead. Experimental results on two unsupervised rationale extraction benchmarks show that MARE achieves state-of-the-art performance. Ablation studies further demonstrate the effectiveness of our method. Our code is available at https://github.com/CSU-NLP-Group/MARE.

翻訳日:2024-11-02 21:39:44 公開日:2024-10-04

# NRGBoost:エネルギーベースの生成的ブースティング木

NRGBoost: Energy-Based Generative Boosted Trees ( http://arxiv.org/abs/2410.03535v1 )

ライセンス: Link先を確認

João Bravo,

(参考訳) 非構造データ領域におけるディープラーニングの優位性の高まりにもかかわらず、ランダムフォレスト(RF)や勾配ブースティング決定木(GBDT)のような木に基づく手法は、依然として表形式データにおける識別タスクを扱うための主力手法である。我々は、データ密度(正規化定数を除いて)を明示的にモデル化することに焦点を当てて、これらの人気アルゴリズムの生成的拡張を検討し、サンプリング以外のアプリケーションを可能にする。本研究の主な貢献として,XGBoost などの人気パッケージに実装された2次ブースティングに類似したエネルギーベースの生成的ブースティングアルゴリズムを提案する。提案アルゴリズムは,任意の入力変数に対して推論タスクを処理可能な生成モデルを生成する一方で,多数の実世界の表形式データセットにおいて,GBDTと類似の識別性能を実現し,代替の生成手法よりも優れていることを示す。同時に、サンプリングにおいてニューラルネットワークベースのモデルとも競合することを示した。

Despite the rise to dominance of deep learning in unstructured data domains, tree-based methods such as Random Forests (RF) and Gradient Boosted Decision Trees (GBDT) are still the workhorses for handling discriminative tasks on tabular data. We explore generative extensions of these popular algorithms with a focus on explicitly modeling the data density (up to a normalization constant), thus enabling other applications besides sampling. As our main contribution we propose an energy-based generative boosting algorithm that is analogous to the second order boosting implemented in popular packages like XGBoost. We show that, despite producing a generative model capable of handling inference tasks over any input variable, our proposed algorithm can achieve similar discriminative performance to GBDT on a number of real world tabular datasets, outperforming alternative generative approaches. At the same time, we show that it is also competitive with neural network based models for sampling.

翻訳日:2024-11-02 21:39:44 公開日:2024-10-04

# Ward: LLM透かしによる証明可能なRAGデータセット推論

Ward: Provable RAG Dataset Inference via LLM Watermarks ( http://arxiv.org/abs/2410.03537v1 )

ライセンス: Link先を確認

Nikola Jovanović, Robin Staab, Maximilian Baader, Martin Vechev,

(参考訳) Retrieval-Augmented Generation (RAG)は、ジェネレーション中に外部データを組み込むことでLLMを改善する。これにより、RAGシステムにおけるコンテンツの不正使用に対するデータ所有者の懸念が高まる。その重要性にもかかわらず、そのような不正使用を検出するという課題は十分に研究されておらず、近隣の分野からの既存のデータセットや方法論は研究に不適である。この作業では、このギャップを埋めるためにいくつかのステップを踏んでいます。まず、この問題を(ブラックボックス)RAGデータセット推論(RAG-DI)として定式化する。さらに,現実的な条件下でのRAG-DI手法のベンチマークに特化して設計された新しいデータセットを導入し,一連のベースラインアプローチを提案する。この基盤の上に,データ所有者がRAGシステムにおけるデータセットの使用に関する厳密な統計的保証を得られるようなLLM透かしに基づくRAG-DI手法であるWardを導入する。実験評価では、Wardは、多くの難易度設定において、全てのベースラインを一貫して上回り、高い精度、優れたクエリ効率、ロバスト性を実現している。我々の研究は今後のRAG-DI研究の基礎を提供し、この問題に対する有望なアプローチとしてLLM透かしを強調します。

Retrieval-Augmented Generation (RAG) improves LLMs by enabling them to incorporate external data during generation. This raises concerns for data owners regarding unauthorized use of their content in RAG systems. Despite its importance, the challenge of detecting such unauthorized usage remains underexplored, with existing datasets and methodologies from adjacent fields being ill-suited for its study. In this work, we take several steps to bridge this gap. First, we formalize this problem as (black-box) RAG Dataset Inference (RAG-DI). To facilitate research on this challenge, we further introduce a novel dataset specifically designed for benchmarking RAG-DI methods under realistic conditions, and propose a set of baseline approaches. Building on this foundation, we introduce Ward, a RAG-DI method based on LLM watermarks that enables data owners to obtain rigorous statistical guarantees regarding the usage of their dataset in a RAG system. In our experimental evaluation, we show that Ward consistently outperforms all baselines across many challenging settings, achieving higher accuracy, superior query efficiency and robustness. Our work provides a foundation for future studies of RAG-DI and highlights LLM watermarks as a promising approach to this problem.
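
The "rigorous statistical guarantees" mentioned above usually take the form of a hypothesis test. The abstract does not spell out Ward's test, so the following is only a minimal sketch of the one-sided z-test commonly used by red/green-list LLM watermark detectors. The function names and the green-token fraction `gamma` are illustrative assumptions, not taken from the paper.

```python
import math


def watermark_z_score(green_hits: int, total_tokens: int, gamma: float = 0.25) -> float:
    """z-score under H0: the text is unwatermarked, so each token is
    'green' independently with probability gamma (assumed value)."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_hits - expected) / std


def p_value(z: float) -> float:
    """One-sided p-value using the standard normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

A data owner would reject the "dataset not used" hypothesis only when the p-value over the RAG system's responses falls below a chosen significance level, which is what bounds the false-positive rate.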

翻訳日:2024-11-02 21:39:44 公開日:2024-10-04

# アノテータの態度を踏まえた性差別とミソジニー分類の再検討

Re-examining Sexism and Misogyny Classification with Annotator Attitudes ( http://arxiv.org/abs/2410.03543v1 )

ライセンス: Link先を確認

Aiqi Jiang, Nikolas Vitsakis, Tanvi Dinkar, Gavin Abercrombie, Ioannis Konstas,

(参考訳) Gender-Based Violence(GBV)はオンライン上で拡大している問題だが、既存のデータセットは複数の可能なアノテータの視点を捉えたり、影響を受けるグループの表現を確実にすることができない。我々はGBVのモデレーションパイプラインにおいて,(1)手動データラベリング,(2)自動分類の2つの重要な段階を再考する。 (1)については、2つのデータセットを用いて、アノテータのアイデンティティおよび態度と、2つのGBVラベリングタスクに対する回答との関係を検討する。この目的のために,社会心理学の3つの検証済み調査を用いて,クラウドソーシングアノテータから人口統計情報と態度情報を収集した。右翼権威主義のスコアが高いほどテキストを性差別的とラベル付けする傾向が高いのに対し、社会的支配指向とネオセクシズムの態度では、スコアが高いほどその傾向が低い。 (2)については、大規模言語モデルと、アノテータ情報をプロンプトに組み込むものを含む5つのプロンプト戦略を用いて分類実験を行う。結果は以下のとおりである。 (i)アノテータの態度は、分類器がそのラベルを予測する能力に影響を及ぼす。 (ii)よく構造化された簡潔なアノテータの記述を用いる場合、態度情報を含めることで性能を高めることができる。 (iii)モデルは、新しいラベルセットの複雑さの増大と不均衡なクラスを反映するのに苦労する。

Gender-Based Violence (GBV) is an increasing problem online, but existing datasets fail to capture the plurality of possible annotator perspectives or ensure the representation of affected groups. We revisit two important stages in the moderation pipeline for GBV: (1) manual data labelling; and (2) automated classification. For (1), we examine two datasets to investigate the relationship between annotator identities and attitudes and the responses they give to two GBV labelling tasks. To this end, we collect demographic and attitudinal information from crowd-sourced annotators using three validated surveys from Social Psychology. We find that higher Right Wing Authoritarianism scores are associated with a higher propensity to label text as sexist, while for Social Dominance Orientation and Neosexist Attitudes, higher scores are associated with a negative tendency to do so. For (2), we conduct classification experiments using Large Language Models and five prompting strategies, including infusing prompts with annotator information. We find: (i) annotator attitudes affect the ability of classifiers to predict their labels; (ii) including attitudinal information can boost performance when we use well-structured brief annotator descriptions; and (iii) models struggle to reflect the increased complexity and imbalanced classes of the new label sets.

翻訳日:2024-11-02 21:39:44 公開日:2024-10-04

# 単純な重複除去によるデータ品質向上:責任ある計算社会科学研究をナビゲートする

Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research ( http://arxiv.org/abs/2410.03545v1 )

ライセンス: Link先を確認

Yida Mu, Mali Jin, Xingyi Song, Nikolaos Aletras,

(参考訳) 計算社会科学(CSS)のための自然言語処理(NLP)の研究は、ソーシャルメディアプラットフォームからのデータに大きく依存している。このデータは,オンラインコミュニティにおける社会言語現象の分析モデルの開発において重要な役割を担っている。本研究では,NLP for CSSで広く使われている20のデータセットの詳細な調査を行い,データ品質を包括的に調査する。分析の結果、ソーシャルメディアのデータセットは様々なレベルのデータ重複を示すことが明らかとなった。これにより、ラベルの不整合やデータの漏洩といった問題が発生し、モデルの信頼性が損なわれる。我々の研究結果は、データ重複が現在の最先端性能の主張に影響を与え、現実のシナリオにおけるモデルの有効性を過大評価する可能性があることを示唆している。最後に,ソーシャルメディアデータからデータセット開発を改善するための新しいプロトコルとベストプラクティスを提案する。

Research in natural language processing (NLP) for Computational Social Science (CSS) heavily relies on data from social media platforms. This data plays a crucial role in the development of models for analysing socio-linguistic phenomena within online communities. In this work, we conduct an in-depth examination of 20 datasets extensively used in NLP for CSS to comprehensively examine data quality. Our analysis reveals that social media datasets exhibit varying levels of data duplication. Consequently, this gives rise to challenges like label inconsistencies and data leakage, compromising the reliability of models. Our findings also suggest that data duplication has an impact on the current claims of state-of-the-art performance, potentially leading to an overestimation of model effectiveness in real-world scenarios. Finally, we propose new protocols and best practices for improving dataset development from social media data and its usage.
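
The duplication problems described above (label inconsistencies and train/test leakage) can be checked with a few lines of code. This is a minimal sketch and not the protocol proposed in the paper: the normalization rules (lower-casing, masking URLs and user mentions) and all function names are assumptions chosen for illustration.

```python
import hashlib
import re
import unicodedata


def normalize(text: str) -> str:
    """Canonicalize a post so that trivially different copies compare equal."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"https?://\S+", "<url>", text)  # mask links
    text = re.sub(r"@\w+", "<user>", text)         # mask user mentions
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(text: str) -> str:
    """Stable hash of the normalized text."""
    return hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()


def deduplicate(records):
    """records: iterable of (text, label).
    Returns (kept_records, fingerprints_of_duplicates_with_conflicting_labels)."""
    first_label = {}
    kept, conflicts = [], set()
    for text, label in records:
        key = fingerprint(text)
        if key in first_label:
            if first_label[key] != label:
                conflicts.add(key)  # same content, different label
            continue
        first_label[key] = label
        kept.append((text, label))
    return kept, conflicts


def leaked(train, test):
    """Test texts whose normalized form also appears in the training split."""
    train_keys = {fingerprint(text) for text, _ in train}
    return [text for text, _ in test if fingerprint(text) in train_keys]
```

Exact matching after normalization only catches the simplest duplicates; near-duplicate detection (for example with shingling) would need a different technique.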

翻訳日:2024-11-02 21:39:44 公開日:2024-10-04

# 単一光子LiDARを用いた隠れ物体のイメージングによる自律走行の促進

Enhancing Autonomous Navigation by Imaging Hidden Objects using Single-Photon LiDAR ( http://arxiv.org/abs/2410.03555v1 )

ライセンス: Link先を確認

Aaron Young, Nevindu M. Batagoda, Harry Zhang, Akshat Dave, Adithya Pediredla, Dan Negrut, Ramesh Raskar,

(参考訳) 可視性に制限のある環境でのロバストな自律ナビゲーションは、ロボティクスにおける重要な課題である。単一光子LiDARを用いたNon-Line-of-Sight(NLOS)センシングによる視認性の向上と自律ナビゲーションの向上を目的とした新しいアプローチを提案する。提案手法により,移動ロボットは,マルチバウンス光情報を利用して「角を見回す」ことができ,インフラを付加せずに知覚範囲を効果的に拡張することができる。本研究では,(1)SPADベースのLiDARを用いてマルチバウンスヒストグラムをキャプチャするセンシング,(2)畳み込みニューラルネットワークを用いて隠れた領域の占有マップを推定する知覚,(3)ロボットが推定した占有状況に基づいて安全な経路を辿ることができる制御の3つのモジュールパイプラインを提案する。我々は,L字型廊下を隠れ障害物で航行する移動ロボットのシミュレーションと実世界実験により,我々のアプローチを評価する。我々の研究は、自律ナビゲーションのためのNLOSイメージングの初めての実験的なデモであり、複雑な環境で動くより安全で効率的なロボットシステムを実現するための道を開いた。我々はまた、NLOSシナリオをシミュレートし、この領域における将来の研究を促進するための、新しい動的統合トランジェントレンダリングフレームワークにも貢献する。

Robust autonomous navigation in environments with limited visibility remains a critical challenge in robotics. We present a novel approach that leverages Non-Line-of-Sight (NLOS) sensing using single-photon LiDAR to improve visibility and enhance autonomous navigation. Our method enables mobile robots to "see around corners" by utilizing multi-bounce light information, effectively expanding their perceptual range without additional infrastructure. We propose a three-module pipeline: (1) Sensing, which captures multi-bounce histograms using SPAD-based LiDAR; (2) Perception, which estimates occupancy maps of hidden regions from these histograms using a convolutional neural network; and (3) Control, which allows a robot to follow safe paths based on the estimated occupancy. We evaluate our approach through simulations and real-world experiments on a mobile robot navigating an L-shaped corridor with hidden obstacles. Our work represents the first experimental demonstration of NLOS imaging for autonomous navigation, paving the way for safer and more efficient robotic systems operating in complex environments. We also contribute a novel dynamics-integrated transient rendering framework for simulating NLOS scenarios, facilitating future research in this domain.

翻訳日:2024-11-02 21:29:56 公開日:2024-10-04

# færdXel:デンマークの交通法の専門家システム

færdXel: An Expert System for Danish Traffic Law ( http://arxiv.org/abs/2410.03560v1 )

ライセンス: Link先を確認

Luís Cruz-Filipe, Jonas Vistrup,

(参考訳) デンマークの交通法分野における象徴的推論ツール færdXel について述べる。 færdXelは、論理プログラミングの技法と新しいインターフェースを組み合わせることで、ユーザーは推論過程をナビゲートし、システムの信頼性を確保することができる。予備的な実証的な評価は、この研究が非常に有望であると見なされ、デンマークの法務部門で専門家を支援する現実世界のAIツールの基礎になる可能性があることを示している。

We present færdXel, a tool for symbolic reasoning in the domain of Danish traffic law. færdXel combines techniques from logic programming with a novel interface that allows users to navigate through its reasoning process, thereby ensuring the system's trustworthiness. A preliminary empirical evaluation indicates that this work is seen as very promising, and has the potential to become a foundation for real-world AI tools supporting professionals in the Danish legal sector.

翻訳日:2024-11-02 21:29:56 公開日:2024-10-04

# 吸収と放出を訂正する符号のクラス

Class of codes correcting absorptions and emissions ( http://arxiv.org/abs/2410.03562v1 )

ライセンス: Link先を確認

Arda Aydin, Alexander Barg,

(参考訳) 我々は、任意の固定次数までのすべての放出、吸収、位相緩和、および昇降エラーから保護する一般的な量子符号族を構築する。このような符号は、文献では吸収放出符号(AE codes)として知られている。一般的なAE符号に対する簡易な誤り訂正条件を導出し、$\le t$ 個のエラーを訂正する任意の置換不変符号を、次数 $t$ までの遷移を訂正するAE符号にマッピングできることを示す。置換不変符号のパラメータを慎重に調整し、全角運動量が小さいシステムでホストされる効率的なAE符号のいくつかの例を構築した。また、我々の結果はスピン符号をAE符号にマッピングできることを意味し、そのような符号の特定のサブクラスに対する論理演算子を特徴付けることができる。

We construct a general family of quantum codes that protect against all emission, absorption, dephasing, and raising/lowering errors up to an arbitrary fixed order. Such codes are known in the literature as absorption-emission (AE) codes. We derive simplified error correction conditions for a general AE code and show that any permutation-invariant code that corrects $\le t$ errors can be mapped to an AE code that corrects up to order-$t$ transitions. Carefully tuning the parameters of permutationally invariant codes, we construct several examples of efficient AE codes, hosted in systems with low total angular momentum. Our results also imply that spin codes can be mapped to AE codes, enabling us to characterize logical operators for certain subclasses of such codes.

翻訳日:2024-11-02 21:29:56 公開日:2024-10-04

# 強化学習における汎化のための、より到達可能なタスクでの訓練

Training on more Reachable Tasks for Generalisation in Reinforcement Learning ( http://arxiv.org/abs/2410.03565v1 )

ライセンス: Link先を確認

Max Weltevrede, Caroline Horsch, Matthijs T. J. Spaan, Wendelin Böhmer,

(参考訳) マルチタスク強化学習では、エージェントは一定のタスクセットでトレーニングを行い、新しいタスクに一般化する必要がある。近年の研究では、探索の増加がこの一般化を改善することが示されているが、その理由は不明である。本稿では、マルチタスク強化学習における到達可能性の概念を導入し、初期探索フェーズがエージェントが訓練する到達可能なタスクの数を増やすことを示す。これは、探索の増大ではなく、到達不可能なタスクに対しても、一般化の改善の責任がある。そこで本研究では,各エピソードの開始時に,このような探索フェーズを実装する新しい手法であるExplore-Goを提案する。 Explore-Goは、経験の収集方法を変更するだけであり、既存のほとんどのオン・ポリティクスまたはオフ・ポリティクスの強化学習アルゴリズムで使用することができる。いくつかの一般的なアルゴリズムと組み合わせることで,本手法の有効性を実証し,いくつかの環境における一般化性能の向上を示す。

In multi-task reinforcement learning, agents train on a fixed set of tasks and have to generalise to new ones. Recent work has shown that increased exploration improves this generalisation, but it remains unclear why exactly that is. In this paper, we introduce the concept of reachability in multi-task reinforcement learning and show that an initial exploration phase increases the number of reachable tasks the agent is trained on. This, and not the increased exploration, is responsible for the improved generalisation, even to unreachable tasks. Inspired by this, we propose a novel method Explore-Go that implements such an exploration phase at the beginning of each episode. Explore-Go only modifies the way experience is collected and can be used with most existing on-policy or off-policy reinforcement learning algorithms. We demonstrate the effectiveness of our method when combined with some popular algorithms and show an increase in generalisation performance across several environments.
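
To make the idea of an exploration phase at the beginning of each episode concrete, here is a minimal sketch of a modified experience collector. The abstract only states that Explore-Go changes how experience is collected, so several details are assumptions made for illustration: the toy `ChainEnv`, the uniformly random exploration policy, the random phase length, and the choice to discard the exploration transitions.

```python
import random


class ChainEnv:
    """Toy 1-D chain (hypothetical): state is a position, actions are -1 or +1."""

    def __init__(self, length: int = 10, horizon: int = 20):
        self.length, self.horizon = length, horizon

    def reset(self) -> int:
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action: int):
        self.pos = max(0, min(self.length - 1, self.pos + action))
        self.t += 1
        reward = 1.0 if self.pos == self.length - 1 else 0.0
        return self.pos, reward, self.t >= self.horizon


def collect_episode(env, policy, max_explore_steps: int, rng: random.Random):
    """Run a random-length pure-exploration prefix, then record the agent's own steps."""
    state = env.reset()
    k = rng.randint(0, max_explore_steps)  # length of the exploration phase
    done = False
    for _ in range(k):
        # Exploration phase: random actions, transitions are not stored.
        state, _, done = env.step(rng.choice([-1, 1]))
        if done:
            break
    transitions = []
    while not done:
        # Regular phase: the learning agent acts from wherever exploration ended.
        action = policy(state)
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    return k, transitions
```

Because only the collector changes, the returned transitions can be handed to an on-policy or off-policy learner unchanged, which matches the compatibility claim in the abstract.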

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# 大規模言語モデル(LLM)のための言語学的配慮と言語非依存性を備えたトークン化に向けて

Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) ( http://arxiv.org/abs/2410.03568v1 )

ライセンス: Link先を確認

Abrar Rahman, Garry Bowlin, Binit Mohanty, Sean McGunigal,

(参考訳) 本稿では,最先端の大規模言語モデル (LLM) が採用するトークン化技術と,それらが様々な言語,特に低リソース言語におけるサービスのコストと可用性に与える影響を包括的に研究する。この分析では、GPT-4(cl100k_base埋め込み)、GPT-3(p50k_base埋め込み)、DaVinci(r50k_base埋め込み)を含む複数のLCMと、広く使用されているBERTベーストークンーザが検討されている。本研究は,これらのモデル間で観測されるトークン化の多様性を評価し,サブワードトークン化における言語表現の課題について検討する。この研究は、特に伝統的にリソース不足の言語に対して、言語的に認識された開発プラクティスを育むことの重要性を強調している。さらに,電子健康記録(EHR)システムにおけるトークン化選択の現実的意味を強調するケーススタディを紹介する。本研究の目的は、AIアプリケーションで伝統的に表現されていない言語において、この領域以上のAIサービスの開発において、一般化可能な国際化(I18N)の実践を促進することである。

This paper presents a comprehensive study on the tokenization techniques employed by state-of-the-art large language models (LLMs) and their implications on the cost and availability of services across different languages, especially low resource languages. The analysis considers multiple LLMs, including GPT-4 (using cl100k_base embeddings), GPT-3 (with p50k_base embeddings), and DaVinci (employing r50k_base embeddings), as well as the widely used BERT base tokenizer. The study evaluates the tokenization variability observed across these models and investigates the challenges of linguistic representation in subword tokenization. The research underscores the importance of fostering linguistically-aware development practices, especially for languages that are traditionally under-resourced. Moreover, this paper introduces case studies that highlight the real-world implications of tokenization choices, particularly in the context of electronic health record (EHR) systems. This research aims to promote generalizable Internationalization (I18N) practices in the development of AI services in this domain and beyond, with a strong emphasis on inclusivity, particularly for languages traditionally underrepresented in AI applications.
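
One reason tokenization cost differs across languages can be illustrated without any tokenizer library: byte-level subword tokenizers start from UTF-8 bytes, and many scripts need three bytes per character where English needs one. The sketch below only measures that raw byte cost. It is a rough proxy and an assumption of this example, not the paper's methodology; real token counts depend on the learned merges and should be measured with the actual tokenizer.

```python
def utf8_cost(text: str) -> dict:
    """Characters, UTF-8 bytes, and bytes per character for a string."""
    n_chars = len(text)
    n_bytes = len(text.encode("utf-8"))
    return {
        "chars": n_chars,
        "bytes": n_bytes,
        "bytes_per_char": n_bytes / n_chars if n_chars else 0.0,
    }


english = utf8_cost("hello")
# The Hindi greeting "namaste", written with escapes to keep the source ASCII.
hindi = utf8_cost("\u0928\u092e\u0938\u094d\u0924\u0947")
```

A tokenizer with few learned merges for a script falls back toward this byte-level cost, which is how per-token pricing can end up higher for low-resource languages.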

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# Transformerに大規模なモジュラー算術を教える

Teaching Transformers Modular Arithmetic at Scale ( http://arxiv.org/abs/2410.03569v1 )

ライセンス: Link先を確認

Eshika Saxena, Alberto Alfarano, Emily Wenger, Kristin Lauter,

(参考訳) モジュラー加算は一見単純な演算である:$\mathbb{Z}_q$ の $N$ 個の要素が与えられたとき、その和を $q$ を法として計算する。しかし、この問題に対するスケーラブルな機械学習ソリューションは、いまだ得られていない: 先行研究は、$N \le 6$ 個の要素を mod $q \le 1000$ で加算するMLモデルを訓練している。暗号解析へのMLモデルの有望な応用は、多くの場合、大きな $N$ と $q$ のモジュラー演算を伴うため、この問題の再検討を動機付けている。この作業では、より多様なトレーニングデータ、角度埋め込み、カスタム損失関数という、モジュラー加算モデルのトレーニングパイプラインに対する3つの変更を提案する。これらの変更により、暗号アプリケーションにとって興味深いケースである $N = 256, q = 3329$ で我々のアプローチが成功することを実証し、これは先行研究と比べて $N$ と $q$ の大幅な増加である。これらの手法は他のモジュラー算術問題にも一般化し、将来の研究を動機付けている。

Modular addition is, on its face, a simple operation: given $N$ elements in $\mathbb{Z}_q$, compute their sum modulo $q$. Yet, scalable machine learning solutions to this problem remain elusive: prior work trains ML models that sum $N \le 6$ elements mod $q \le 1000$. Promising applications of ML models for cryptanalysis, which often involve modular arithmetic with large $N$ and $q$, motivate reconsideration of this problem. This work proposes three changes to the modular addition model training pipeline: more diverse training data, an angular embedding, and a custom loss function. With these changes, we demonstrate success with our approach for $N = 256, q = 3329$, a case which is interesting for cryptographic applications, and a significant increase in $N$ and $q$ over prior work. These techniques also generalize to other modular arithmetic problems, motivating future work.
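
The abstract does not define its angular embedding, but the general idea behind such embeddings can be sketched: map each residue to a point on the unit circle so that $0$ and $q-1$ become neighbours and modular addition becomes addition of angles. The functions below are an illustrative assumption of that idea, not the paper's model or loss.

```python
import math


def angular_embed(x: int, q: int) -> tuple:
    """Map a residue mod q to (cos, sin) of the angle 2*pi*x/q."""
    theta = 2.0 * math.pi * (x % q) / q
    return (math.cos(theta), math.sin(theta))


def angular_decode(point: tuple, q: int) -> int:
    """Recover the nearest residue from a point on (or near) the unit circle."""
    theta = math.atan2(point[1], point[0])  # in (-pi, pi]
    return round(theta * q / (2.0 * math.pi)) % q


def modular_sum_via_angles(xs, q: int) -> int:
    """Adding angles on the circle is the same as adding residues mod q."""
    total_angle = sum(2.0 * math.pi * (x % q) / q for x in xs)
    return angular_decode((math.cos(total_angle), math.sin(total_angle)), q)
```

The wrap-around is what makes this representation natural for the task: a prediction that is off by one across the $q-1 \to 0$ boundary is geometrically close, whereas an integer-valued target would treat it as maximally wrong.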

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# ソフトウェアエンジニアリング領域における生成AI: 職業的アイデンティティの緊張とアイデンティティ保護のパターン

Generative AI in the Software Engineering Domain: Tensions of Occupational Identity and Patterns of Identity Protection ( http://arxiv.org/abs/2410.03571v1 )

ライセンス: Link先を確認

Anuschka Schmitt, Krzysztof Z. Gajos, Osnat Mokryn,

(参考訳) 組織環境におけるジェネレーティブ・人工知能(GAI)の導入は、労働者の役割に疑問を投げかけ、それに関連して、長期的なスキル開発やドメインの専門知識に影響を及ぼす。ソフトウェアエンジニアリング領域における質的研究では、職業的アイデンティティの理論レンズと自己決定理論に基づいて、ソフトウェアエンジニアがどのようにして、なぜ自分たちの仕事にGAIを理解するのかを理解しています。技術者のセンスメイキングは、下級生や上級生が、能力、自律性、関連性に対するニーズが、GAIによって異なる影響を受けていると感じているため、ドメインの専門知識に依存していることが分かりました。我々は、職業的アイデンティティを保護するセンスメイキングに従事するエンジニアとして、暗黙のドメイン知識を保存する上で、個人の役割の重要性を強調した。本稿では、労働者の意識形成過程において組織がどのように積極的な役割を担っているかを説明し、組織やシステムデザイナが労働者の職業的アイデンティティに技術的変化が及ぼす影響をいかに促進するかに関する設計ガイドラインを提案する。

The adoption of generative Artificial Intelligence (GAI) in organizational settings calls into question workers' roles, and relatedly, the implications for their long-term skill development and domain expertise. In our qualitative study in the software engineering domain, we build on the theoretical lenses of occupational identity and self-determination theory to understand how and why software engineers make sense of GAI for their work. We find that engineers' sense-making is contingent on domain expertise, as juniors and seniors felt their needs for competence, autonomy, and relatedness to be differently impacted by GAI. We shed light on the importance of the individual's role in preserving tacit domain knowledge as engineers engaged in sense-making that protected their occupational identity. We illustrate how organizations play an active role in shaping workers' sense-making process and propose design guidelines on how organizations and system designers can facilitate the impact of technological change on workers' occupational identity.

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# 木テンソルネットワークによる多変量関数の圧縮

Compressing multivariate functions with tree tensor networks ( http://arxiv.org/abs/2410.03572v1 )

ライセンス: Link先を確認

Joseph Tindall, Miles Stoudenmire, Ryan Levy,

(参考訳) テンソルネットワークは多次元データのための圧縮フォーマットである。 1次元テンソルネットワーク - テンソルトレイン (TT) や行列積状態 (MPS) と呼ばれるは、入力を離散二進数に「量子化」することで連続関数の数値アンザッツとして益々使われている。ここでは、この目的のために、より一般的なツリーテンソルネットワークのパワーを実証する。一般的な木テンソルネットワークとして多くの基本関数を直接構成し、テンソル交叉補間アルゴリズムの一般化によりより複雑な関数に対する補間構造を提供する。多次元の関数に対して、一般的に使用されるテンソルトレインよりもはるかに効率的なアンザッツが、より構造化されたツリーテンソルネットワークがどのように提供されるかを示す。本手法の多次元非線形フレドホルム方程式への応用を実演し、解の階数に厳密な境界を与え、ある問題に対してツリーテンソルネットワークのサイズで指数関数的スケーリング精度を保証する。

Tensor networks are a compressed format for multi-dimensional data. One-dimensional tensor networks -- often referred to as tensor trains (TT) or matrix product states (MPS) -- are increasingly being used as a numerical ansatz for continuum functions by "quantizing" the inputs into discrete binary digits. Here we demonstrate the power of more general tree tensor networks for this purpose. We provide direct constructions of a number of elementary functions as generic tree tensor networks and interpolative constructions for more complicated functions via a generalization of the tensor cross interpolation algorithm. For a range of multi-dimensional functions we show how more structured tree tensor networks offer a significantly more efficient ansatz than the commonly used tensor train. We demonstrate an application of our methods to solving multi-dimensional, non-linear Fredholm equations, providing a rigorous bound on the rank of the solution which, in turn, guarantees exponentially scaling accuracy with the size of the tree tensor network for certain problems.
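
The "quantizing" step mentioned above can be made concrete: a continuous input $x \in [0, 1)$ is replaced by the first few digits of its binary expansion, so that a function of $x$ becomes a tensor with one two-valued index per bit. The sketch below shows only this encoding. The function names and the most-significant-bit-first ordering are assumptions, and no tensor network is built here.

```python
def to_bits(x: float, n_bits: int) -> list:
    """First n_bits binary digits of x in [0, 1), most significant first."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    index = int(x * (1 << n_bits))  # grid index in 0 .. 2**n_bits - 1
    return [(index >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]


def from_bits(bits) -> float:
    """Grid point represented by the bits: sum of b_k * 2**-(k + 1)."""
    return sum(b / (1 << (k + 1)) for k, b in enumerate(bits))
```

With `n_bits` digits the grid spacing is `2**-n_bits`, so resolution grows exponentially with the number of tensor indices. For a multivariate function each input contributes its own set of bit indices, and how those indices are arranged in the network is the degree of freedom that tree structures add over a tensor train.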

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# HyResPINNs:物理インフォームドモデリングのためのニューラルネットワークとRBFコンポーネントの最適組み合わせ学習のための適応型ハイブリッド残差ネットワーク

HyResPINNs: Adaptive Hybrid Residual Networks for Learning Optimal Combinations of Neural and RBF Components for Physics-Informed Modeling ( http://arxiv.org/abs/2410.03573v1 )

ライセンス: Link先を確認

Madison Cooley, Robert M. Kirby, Shandian Zhe, Varun Shankar,

(参考訳) 物理インフォームドニューラルネットワーク(英: Physics-informed Neural Network, PINN)は、PDE(偏微分方程式)の数値解法において、関連するPDE項で正規化された損失関数を用いてニューラルネットワークを訓練し、物理的制約を強制する手法として人気が高まっている。我々はHyResPINNと呼ばれる新しい種類のPINNを提案し、標準ニューラルネットワークと放射基底関数(RBF)ネットワークの出力を組み合わせた適応型ハイブリッド残差ブロックで従来のPINNを拡張した。提案手法の重要な特徴は,ニューラルネットワークとRBFネットワーク出力の寄与度を動的に学習する残差ブロックに適応的な組み合わせパラメータを組み込むことである。さらに、残余ブロック間の適応的な接続は、ネットワーク全体の柔軟な情報フローを可能にする。 HyResPINNは従来のPINNよりも、ポイントロケーションやニューラルネットワークアーキテクチャのトレーニングに堅牢であることを示す。さらに、HyResPINNは特定の問題に対する競合メソッドよりも桁違いに精度が高く、トレーニングコストはわずかに増加している。我々は,Allen-Cahn方程式やDarcy-Flow方程式など,PDEの挑戦に対するアプローチの強みを実証する。この結果から,HyResPINNは従来の数値解法と現代の機械学習に基づく解法とのギャップを効果的に埋めることが示唆された。

Physics-informed neural networks (PINNs) are an increasingly popular class of techniques for the numerical solution of partial differential equations (PDEs), where neural networks are trained using loss functions regularized by relevant PDE terms to enforce physical constraints. We present a new class of PINNs called HyResPINNs, which augment traditional PINNs with adaptive hybrid residual blocks that combine the outputs of a standard neural network and a radial basis function (RBF) network. A key feature of our method is the inclusion of adaptive combination parameters within each residual block, which dynamically learn to weigh the contributions of the neural network and RBF network outputs. Additionally, adaptive connections between residual blocks allow for flexible information flow throughout the network. We show that HyResPINNs are more robust to training point locations and neural network architectures than traditional PINNs. Moreover, HyResPINNs offer orders of magnitude greater accuracy than competing methods on certain problems, with only modest increases in training costs. We demonstrate the strengths of our approach on challenging PDEs, including the Allen-Cahn equation and the Darcy-Flow equation. Our results suggest that HyResPINNs effectively bridge the gap between traditional numerical methods and modern machine learning-based solvers.
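
A block that learns how to weigh a neural branch against an RBF branch could look roughly like the scalar sketch below. The abstract does not give the exact form, so everything here is an assumption for illustration: the sigmoid-gated convex combination, the skip connection, the one-hidden-layer tanh network, the Gaussian kernel, and all parameter names.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def nn_branch(x, hidden_w, hidden_b, out_w):
    """One-hidden-layer tanh network (smooth, global features)."""
    return sum(v * math.tanh(w * x + b) for w, b, v in zip(hidden_w, hidden_b, out_w))


def rbf_branch(x, centers, weights, width):
    """Gaussian RBF expansion (localized features)."""
    return sum(w * math.exp(-((x - c) / width) ** 2) for c, w in zip(centers, weights))


def hybrid_residual_block(x, p):
    """x + alpha * NN(x) + (1 - alpha) * RBF(x), with alpha a trainable gate."""
    alpha = sigmoid(p["alpha_logit"])  # kept in (0, 1) by the sigmoid
    nn_out = nn_branch(x, p["hidden_w"], p["hidden_b"], p["out_w"])
    rbf_out = rbf_branch(x, p["centers"], p["rbf_w"], p["width"])
    return x + alpha * nn_out + (1.0 - alpha) * rbf_out
```

In a real PINN the gate parameter would be trained together with the branch weights against the PDE residual loss, so each block can settle on its own mix of smooth and localized components.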

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# 低リソースのインド系言語に対するテーブル質問応答

Table Question Answering for Low-resourced Indic Languages ( http://arxiv.org/abs/2410.03576v1 )

ライセンス: Link先を確認

Vaishali Pal, Evangelos Kanoulas, Andrew Yates, Maarten de Rijke,

(参考訳) TableQAは構造化された情報のテーブル上で質問に答え、個々のセルやテーブルを出力として返すタスクである。 TableQAの研究は、主に高リソース言語に焦点を当てており、注釈付きデータやニューラルモデルが不足しているため、中低リソース言語はほとんど進歩していない。予算が限られている低リソース言語に対して,完全に自動化された大規模TableQAデータ生成プロセスを導入することで,このギャップに対処する。TableQAデータセットやモデルを持たない2つのIndic言語であるBengaliとHindiにデータ生成手法を適用する。大規模データセットに基づいてトレーニングされたTableQAモデルは、最先端のLLMよりも優れています。さらに、数学的推論能力やゼロショット言語間移動など、異なる側面の訓練されたモデルについて研究する。当社の作業は、スケーラブルなデータ生成と評価手順に焦点を当てた、低リソースのTableQAに関する最初のものです。提案手法は,Web上にコンテンツが存在する任意の低リソース言語に適用可能である。データセット、モデル、コード(https://github.com/kolk/Low-Resource-TableQA-Indic-languages)をリリースします。

TableQA is the task of answering questions over tables of structured information, returning individual cells or tables as output. TableQA research has focused primarily on high-resource languages, leaving medium- and low-resource languages with little progress due to scarcity of annotated data and neural models. We address this gap by introducing a fully automatic large-scale tableQA data generation process for low-resource languages with limited budget. We incorporate our data generation method on two Indic languages, Bengali and Hindi, which have no tableQA datasets or models. TableQA models trained on our large-scale datasets outperform state-of-the-art LLMs. We further study the trained models on different aspects, including mathematical reasoning capabilities and zero-shot cross-lingual transfer. Our work is the first on low-resource tableQA focusing on scalable data generation and evaluation procedures. Our proposed data generation method can be applied to any low-resource language with a web presence. We release datasets, models, and code (https://github.com/kolk/Low-Resource-TableQA-Indic-languages).

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# メモリスペース・ビジュアル・リトラクションによるマルチモーダル大言語モデルにおける幻覚の緩和

Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models ( http://arxiv.org/abs/2410.03577v1 )

ライセンス: Link先を確認

Xin Zou, Yizhou Wang, Yibo Yan, Sirui Huang, Kening Zheng, Junkai Chen, Chang Tang, Xuming Hu,

(参考訳) その印象的な能力にもかかわらず、マルチモーダル大言語モデル(MLLM)は幻覚の影響を受けやすい。上記の課題に対処するために、私たちは共通の認知プロセスに従います - 重要なオンサイトの詳細の初期の記憶が消えると、現実的で正確な答えを求めるために、もう一度彼らを見るのは直感的です。そこで我々は,外部知識検索や追加の微調整を必要とせず,新たな幻覚緩和パラダイムであるメモリスペース・ビジュアル・リトラクション(MemVR)を導入する。特に、モデルが不確かである場合や、質問関連視覚記憶に注意を払っている場合、フィードフォワードネットワーク(FFN)を介してMLLMにリジェクションされる補助的証拠として視覚刺激をキーバリューメモリとして扱う。総合的な実験的評価により、MemVRは様々なMLLMの幻覚問題を著しく軽減し、追加の時間オーバーヘッドを発生させることなく、一般的なベンチマークで優れていることが示される。

Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) are susceptible to hallucinations, especially assertively fabricating content not present in the visual inputs. To address this challenge, we follow a common cognitive process: when one's initial memory of critical on-sight details fades, it is intuitive to look at them a second time to seek a factual and accurate answer. Therefore, we introduce Memory-space Visual Retracing (MemVR), a novel hallucination mitigation paradigm that requires neither external knowledge retrieval nor additional fine-tuning. In particular, we treat visual prompts as supplementary evidence to be reinjected into MLLMs via the Feed Forward Network (FFN) as key-value memory when the model is uncertain or even amnesic about question-relevant visual memories. Comprehensive experimental evaluations demonstrate that MemVR significantly mitigates hallucination issues across various MLLMs and excels in general benchmarks without incurring added time overhead, thus emphasizing its potential for widespread applicability.

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# 自動運転車開発におけるビデオデータ検索のためのマルチモデルアプローチ

A Multi-model Approach for Video Data Retrieval in Autonomous Vehicle Development ( http://arxiv.org/abs/2410.03580v1 )

ライセンス: Link先を確認

Jesper Knapp, Klas Moberg, Yuchuan Jin, Simin Sun, Miroslaw Staron,

(参考訳) 自律運転ソフトウェアは毎秒大量のデータを生成し、それはソフトウェア開発組織が将来の分析とテストのためにログ形式で保存する。しかし、このデータの大きさを考えると、車両ログのコレクション内の特定のシナリオを特定することは困難である。これらのシナリオを見つけるために正しいSQLクエリを書くには、エンジニアがSQLと特定のデータベースに強いバックグラウンドを持つ必要があり、さらに検索プロセスが複雑になる。本稿では,SQLの代わりに自然言語記述を用いて,ログコレクションの特定のシナリオを検索するパイプラインを提示し,評価する。生成した記述は、Zenseactで車両ログを1から5のスケールで作業するエンジニアによって評価された。私たちのアプローチは平均3.3のスコアを獲得し、ソフトウェア開発ワークフローを改善するためにマルチモデルアーキテクチャを使うことの可能性を示しました。また、クエリプロセスを視覚化し、結果を視覚化するインターフェースも提示する。

Autonomous driving software generates enormous amounts of data every second, which software development organizations save for future analysis and testing in the form of logs. However, given the vast size of this data, locating specific scenarios within a collection of vehicle logs can be challenging. Writing the correct SQL queries to find these scenarios requires engineers to have a strong background in SQL and the specific databases in question, further complicating the search process. This paper presents and evaluates a pipeline that allows searching for specific scenarios in log collections using natural language descriptions instead of SQL. The generated descriptions were evaluated by engineers working with vehicle logs at Zenseact on a scale from 1 to 5. Our approach achieved a mean score of 3.3, demonstrating the potential of using a multi-model architecture to improve the software development workflow. We also present an interface that visualizes both the query process and the results.

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# 散逸型Landau-Zenerモデルにおける量子軌道の統計的解析

Statistical analysis of quantum trajectories in dissipative Landau-Zener model ( http://arxiv.org/abs/2410.03582v1 )

ライセンス: Link先を確認

Laleh Memarzadeh, Rosario Fazio,

(参考訳) マルコフ過程を行うランダウ・ツェナー・ハミルトニアンを持つ2レベル系における量子ジャンプの統計について述べる。断熱・非断熱・非断熱のシミュレーションに成功しているランダウ・ツェナーモデルについて, 2種類の散逸を考察する。第一に、ジャンプ作用素のプロジェクトは、初期基底状態とハミルトニアンの励起状態に$t\to -\infty$で記述する。第2のタイプでは、ジャンプ作用素はハミルトニアンの瞬時固有状態に射影する。量子軌道法により、両方のモデルに対する断熱的および非断熱的状態におけるジャンプ数の確率を示す。さらに、進化の時間間隔におけるジャンプの統計を実証する。また, 浴槽温度, 環境との結合強度, スピンカップリング方向が量子ジャンプの統計に与える影響を示す。

We present statistics of quantum jumps in a two-level system with a Landau-Zener Hamiltonian that undergoes a Markovian process. For the Landau-Zener model, which is successful in simulating adiabatic/non-adiabatic evolution and quantum annealing, we consider two types of dissipation. In the first one, the jump operators project states to the initial ground state and excited state of the Hamiltonian at $t\to -\infty$. In the second type, the jump operators project to the instantaneous eigenstates of the Hamiltonian. Using the quantum trajectories approach, we present the probability of the number of jumps in adiabatic and non-adiabatic regimes for both models. Furthermore, we demonstrate the statistics of jumps over time intervals of the evolution. We also show the role of bath temperature, coupling strength to the environment, and spin-coupling directions on the statistics of quantum jumps.
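
For orientation, the textbook objects behind this abstract can be written down in one common convention. The symbols $v$ (sweep rate), $\Delta$ (minimum gap), $\gamma_k$ (jump rates) and $L_k$ (jump operators) are notation assumed here, not taken from the paper, and the transition probability shown is the standard dissipation-free result.

```latex
% Landau-Zener Hamiltonian and the closed-system diabatic transition probability
H(t) = \frac{v t}{2}\,\sigma_z + \frac{\Delta}{2}\,\sigma_x ,
\qquad
P_{\mathrm{LZ}} = \exp\!\left( -\frac{\pi \Delta^{2}}{2 \hbar v} \right)

% Markovian (Lindblad) evolution with jump operators L_k
\dot{\rho} = -\frac{i}{\hbar}\,[H(t), \rho]
  + \sum_k \gamma_k \left( L_k \rho L_k^{\dagger}
  - \tfrac{1}{2} \{ L_k^{\dagger} L_k , \rho \} \right)
```

In the quantum trajectories picture each application of some $L_k$ is one recorded jump, and the two dissipation types in the abstract correspond to two choices of basis for the projectors used as $L_k$: the fixed eigenbasis at $t \to -\infty$ or the instantaneous eigenbasis of $H(t)$.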

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# MeDeT:少数ショットメタラーニングによる医療機器デジタルツインの作成

MeDeT: Medical Device Digital Twins Creation with Few-shot Meta-learning ( http://arxiv.org/abs/2410.03585v1 )

ライセンス: Link先を確認

Hassan Sartaj, Shaukat Ali, Julie Marie Gjøby,

(参考訳) システムと統合レベルにおける医療用IoT(Internet of Things)アプリケーションをテストするには、さまざまなタイプの医療機器を統合する必要がある。医療機器の導入の課題 (i)その連続的な進化により、すべての装置の変種を含ませることが不可能となり、 (ii) 大規模な厳密なテストには複数のデバイスとそのバリエーションが必要で、それは時間集約的でコストがかかり、実用的ではない。私たちの共同研究者であるOslo City’s Health Departmentは、自動テストインフラストラクチャの開発において、これらの課題に直面しました。本稿では,医療機器のディジタルツイン(DT)を生成し,進化するデバイスにDTを適用するメタラーニングベースアプローチ(MeDeT)を提案する。我々は、現実世界の医療用IoTアプリケーションと統合された5つの広く使われている医療機器を用いて、OsloCityのコンテキストでMeDeTを評価する。評価では,様々なデバイスやバージョンにまたがるDTの生成と適応を行うMeDeTの能力,これらのDTの忠実度,1000個のDTの同時動作のスケーラビリティ,関連する時間的コストを評価した。その結果、MeDeTは96%以上の忠実度でDTを生成し、異なるデバイスや新しいバージョンにDTを適応させ、時間コスト(約1分)を削減し、忠実度レベルを維持しながらスケーラブルな1000のDTを運用でき、テスト用の物理デバイスの代わりに機能することがわかった。

Testing healthcare Internet of Things (IoT) applications at system and integration levels necessitates integrating numerous medical devices of various types. Challenges of incorporating medical devices are: (i) their continuous evolution, making it infeasible to include all device variants, and (ii) rigorous testing at scale requires multiple devices and their variants, which is time-intensive, costly, and impractical. Our collaborator, Oslo City's health department, faced these challenges in developing automated test infrastructure, which our research aims to address. In this context, we propose a meta-learning-based approach (MeDeT) to generate digital twins (DTs) of medical devices and adapt DTs to evolving devices. We evaluate MeDeT in OsloCity's context using five widely-used medical devices integrated with a real-world healthcare IoT application. Our evaluation assesses MeDeT's ability to generate and adapt DTs across various devices and versions using different few-shot methods, the fidelity of these DTs, the scalability of operating 1000 DTs concurrently, and the associated time costs. Results show that MeDeT can generate DTs with over 96% fidelity, adapt DTs to different devices and newer versions with reduced time cost (around one minute), and operate 1000 DTs in a scalable manner while maintaining the fidelity level, thus serving in place of physical devices for testing.

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# 不均衡分類における性能・適応性向上のためのハイパーパラメータ分布の学習

Training Over a Distribution of Hyperparameters for Enhanced Performance and Adaptability on Imbalanced Classification ( http://arxiv.org/abs/2410.03588v1 )

ライセンス: Link先を確認

Kelsey Lieberman, Swarna Kamlam Ravindran, Shuai Yuan, Carlo Tomasi,

(参考訳) 二項分類はよく研究されている問題であるが、厳密なクラス不均衡の下での信頼性の高い分類器の訓練は依然として課題である。最近の技術は、損失関数や最適化方法を変更することにより、トレーニングにおける不均衡の悪影響を軽減する。これらの損失関数上の異なるハイパーパラメータ値が、異なるリコール値でよりよく機能することを示す。我々は,この事実を,単一値の代わりにハイパーパラメータ値の分布を1つのモデルで学習し,LCT(Loss Conditional Training)を介して活用することを提案する。実験により、ハイパーパラメータの分布に対するトレーニングは、いくつかのモデルのパフォーマンスを近似するだけでなく、CIFARおよびメラノーマや糖尿病網膜症検出などの実際の医療画像アプリケーションにおけるモデル全体のパフォーマンスを実際に改善することが示された。さらに、LCTを用いたトレーニングモデルは、スクラッチから再トレーニングする必要なく、個々のニーズを満たすためにトレーニング後にいくつかのハイパーパラメータチューニングを行うことができるため、より効率的である。

Although binary classification is a well-studied problem, training reliable classifiers under severe class imbalance remains a challenge. Recent techniques mitigate the ill effects of imbalance on training by modifying the loss functions or optimization methods. We observe that different hyperparameter values on these loss functions perform better at different recall values. We propose to exploit this fact by training one model over a distribution of hyperparameter values--instead of a single value--via Loss Conditional Training (LCT). Experiments show that training over a distribution of hyperparameters not only approximates the performance of several models but actually improves the overall performance of models on both CIFAR and real medical imaging applications, such as melanoma and diabetic retinopathy detection. Furthermore, training models with LCT is more efficient because some hyperparameter tuning can be conducted after training to meet individual needs without needing to retrain from scratch.
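
Training one model over a distribution of loss hyperparameters can be sketched as follows: for every example a hyperparameter value is sampled, passed to the model as an extra input, and used in the loss. The abstract does not say which loss or distribution is used, so the focal loss, the uniform range for `gamma`, and the `model(x, gamma)` interface are assumptions made for this illustration.

```python
import math
import random


def focal_loss(p: float, y: int, gamma: float) -> float:
    """Binary focal loss; gamma = 0 reduces to cross-entropy."""
    p_true = p if y == 1 else 1.0 - p
    p_true = min(max(p_true, 1e-12), 1.0)  # guard against log(0)
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def lct_batch_loss(model, batch, rng: random.Random, gamma_range=(0.0, 3.0)) -> float:
    """Average loss where each example gets its own sampled hyperparameter.

    model(x, gamma) must return P(y = 1 | x) and is conditioned on gamma,
    so a single network covers the whole hyperparameter range.
    """
    total = 0.0
    for x, y in batch:
        gamma = rng.uniform(*gamma_range)  # sample the loss hyperparameter
        total += focal_loss(model(x, gamma), y, gamma)
    return total / len(batch)
```

At inference time the same conditioning input becomes a tuning knob: sweeping `gamma` moves a single trained model along its precision/recall trade-off without retraining, which is the efficiency argument made in the abstract.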

翻訳日:2024-11-02 21:17:55 公開日:2024-10-04

# 変分ベイズガウススプレイティング

Variational Bayes Gaussian Splatting ( http://arxiv.org/abs/2410.03592v1 )

ライセンス: Link先を確認

Toon Van de Maele, Ozan Catal, Alexander Tschantz, Christopher L. Buckley, Tim Verbelen,

(参考訳) 近年,3Dガウシアン・スプラッティングはガウシアンの混合物を用いた3Dシーンのモデリングに有望なアプローチとして出現している。これらのモデルの主な最適化方法は、連続的なデータストリームを扱う際の破滅的な忘れ込みに苦労する、微分可能なレンダリングパイプラインによる勾配のバックプロパゲートに依存している。この制限に対処するために,モデルパラメータに対する変分推論としてガウススプレートをトレーニングするための新しいアプローチである変分ベイズ・ガウススプラッティング(VBGS)を提案する。多変量ガウスの共役特性を利用することで、バッファを再生することなく部分的な逐次観測から効率的な更新を可能にする閉形式変分更新規則を導出する。実験の結果,VBGSは静的データセット上での最先端性能と一致しただけでなく,逐次ストリーミングされた2Dデータや3Dデータからの連続学習も可能であり,この設定における性能を大幅に向上させることができた。

Recently, 3D Gaussian Splatting has emerged as a promising approach for modeling 3D scenes using mixtures of Gaussians. The predominant optimization method for these models relies on backpropagating gradients through a differentiable rendering pipeline, which struggles with catastrophic forgetting when dealing with continuous streams of data. To address this limitation, we propose Variational Bayes Gaussian Splatting (VBGS), a novel approach that frames training a Gaussian splat as variational inference over model parameters. By leveraging the conjugacy properties of multivariate Gaussians, we derive a closed-form variational update rule, allowing efficient updates from partial, sequential observations without the need for replay buffers. Our experiments show that VBGS not only matches state-of-the-art performance on static datasets, but also enables continual learning from sequentially streamed 2D and 3D data, drastically improving performance in this setting.
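
Why conjugacy removes the need for replay buffers can be seen in the simplest conjugate model, a Gaussian mean with known noise variance. This is a deliberately reduced stand-in and not the VBGS update itself, which involves a mixture model with richer priors. The function and parameter names are assumptions.

```python
def update_gaussian_mean(prior_mean, prior_var, observations, noise_var):
    """Closed-form posterior over a Gaussian mean (Normal-Normal conjugacy).

    Precisions add and precision-weighted means add, so the posterior after a
    chunk of data is the only state that has to be carried forward.
    """
    prior_precision = 1.0 / prior_var
    post_precision = prior_precision + len(observations) / noise_var
    post_mean = (prior_precision * prior_mean + sum(observations) / noise_var) / post_precision
    return post_mean, 1.0 / post_precision


data = [1.0, 2.0, 3.0, 4.0]

# All data at once.
batch_mean, batch_var = update_gaussian_mean(0.0, 10.0, data, 1.0)

# Same data in two sequential chunks, feeding the posterior back in as the prior.
m, v = update_gaussian_mean(0.0, 10.0, data[:2], 1.0)
seq_mean, seq_var = update_gaussian_mean(m, v, data[2:], 1.0)
```

The sequential and batch posteriors agree up to floating-point error, so earlier data never has to be revisited. Gradient-based training has no such property, which is where catastrophic forgetting comes from.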