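Fugu-MT: arXiv Paper Translations

This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from use of this site (including all information and translations).

Papers with a publication date of 20240611.

Title	Authors	Abstract	Publication date / translation date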
# 都市音環境のより良い可視化に向けて--インタビューからの考察 Towards better visualizations of urban sound environments: insights from interviews ( http://arxiv.org/abs/2407.16889v1 ) ライセンス: Link先を確認	Modan Tailleur, Pierre Aumond, Vincent Tourre, Mathieu Lagrange,	(参考訳) 都市騒音マップや騒音の可視化は伝統的に都市全体の騒音レベルをマクロ的に表現している。しかし、知覚は関連する音源に大きく依存するため、これらの表現は音環境に関連する音の知覚を正確に捉えることができない。本稿では,そのような表現が重要であると想定される都市の利害関係者を特定し,音源の表現の必要性を分析することを目的とする。様々な都市の利害関係者とのインタビューを通じて, 現状の実践, 既存ツールの強みと弱み, 既存の都市音環境表現に音源を組み込むことの意義について考察した。本研究において,音源表現の3つの異なる利用法が明らかになった。 1) 工業者及び専門市民に対する騒音に関する苦情 2) 市民の音質評価 3) 都市計画者の指導。また、可視化の利用については多様な視点があり、対象とする利用者に適した指標を用い、データへのアクセスを可能にすべきであることが明らかになった。 Urban noise maps and noise visualizations traditionally provide macroscopic representations of noise levels across cities. However, those representations fail to accurately gauge the sound perception associated with these sound environments, as perception highly depends on the sound sources involved. This paper aims at analyzing the need for the representations of sound sources, by identifying the urban stakeholders for whom such representations are assumed to be of importance. Through spoken interviews with various urban stakeholders, we have gained insight into current practices, the strengths and weaknesses of existing tools and the relevance of incorporating sound sources into existing urban sound environment representations. Three distinct uses of sound source representations emerged in this study: 1) noise-related complaints for industrials and specialized citizens, 2) soundscape quality assessment for citizens, and 3) guidance for urban planners. Findings also reveal diverse perspectives for the use of visualizations, which should use indicators adapted to the target audience, and enable data accessibility.	翻訳日:2024-08-05 01:45:45 公開日:2024-06-11
# 赤外線可視画像融合のためのセマンティック・アウェア・マルチガイドネットワーク A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion ( http://arxiv.org/abs/2407.06159v1 ) ライセンス: Link先を確認	Xiaoli Zhang, Liying Wang, Libo Zhao, Xiongfei Li, Siwei Ma,	(参考訳) マルチモダリティ画像融合は、2つのソース画像から特定のモダリティ情報と共有モダリティ情報を融合することを目的としている。複雑な場面における特徴抽出の不十分さと意味認識の欠如に対処するために, 相関型分解特徴をモデル化し, 補足的特徴と多誘導的特徴集合を効率的に抽出することで高レベルのグラフ表現をモデル化する方法に焦点を当てる。本稿では,3分岐エンコーダデコーダアーキテクチャと,それに対応する融合層を融合戦略として提案する。深部畳み込み後の浅部特徴抽出にマルチDconv Transposed Attention と Local-enhanced Feed Forward Network を用いた変圧器を用いる。 3つの並列ブランチエンコーダでは、CAI(Cross Attention and Invertible Block)が局所的な特徴を抽出し、高周波テクスチャの詳細を保存することができる。残った接続を持つベース機能抽出モジュール(BFE)は、長距離依存性をキャプチャし、共有モダリティ表現能力を向上することができる。グラフ推論モジュール(GR)は、高レベルなクロスモダリティ関係を推論し、CAIの特定のモダリティ補完情報として低レベルな詳細特徴を同時に抽出するために導入された。可視・近赤外画像融合と医用画像融合タスクにおける最先端手法と比較して,本手法が競争力のある結果を得たことを示す実験結果を得た。さらに、その後のタスクで他の融合法を上回り、オブジェクト検出では平均9.78% mAP@.5、セマンティックセグメンテーションでは6.46% mIoUと評価した。 Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images. To tackle the problem of insufficient feature extraction and lack of semantic awareness for complex scenes, this paper focuses on how to model correlation-driven decomposing features and reason high-level graph representation by efficiently extracting complementary features and multi-guided feature aggregation. We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy. The transformer with Multi-Dconv Transposed Attention and Local-enhanced Feed Forward network is used to extract shallow features after the depthwise convolution. In the three parallel branches encoder, Cross Attention and Invertible Block (CAI) enables to extract local features and preserve high-frequency texture details. Base feature extraction module (BFE) with residual connections can capture long-range dependency and enhance shared-modality expression capabilities. 
A Graph Reasoning Module (GR) is introduced to reason about high-level cross-modality relations and simultaneously extract low-level detail features as specific-modality complementary information for CAI. Experiments demonstrate that our method obtains competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks. Moreover, we surpass other fusion methods on downstream tasks, scoring on average 9.78% higher mAP@.5 in object detection and 6.46% higher mIoU in semantic segmentation.	翻訳日:2024-07-22 14:07:46 公開日:2024-06-11
# 生成AIを用いたコンテキストパーソナライズされたプログラミング演習の評価 Evaluating Contextually Personalized Programming Exercises Created with Generative AI ( http://arxiv.org/abs/2407.11994v1 ) ライセンス: Link先を確認	Evanfiya Logacheva, Arto Hellas, James Prather, Sami Sarsa, Juho Leinonen,	(参考訳) プログラミングスキルは、様々なハンズオンエクササイズを完了して開発されるのが一般的である。このようなプログラミング問題は、学生の興味や文化的背景に文脈化することができる。教育心理学における先行研究は、運動の文脈的パーソナライゼーションが学習者の状況的関心を刺激し、彼らのエンゲージメントに肯定的な影響を及ぼすことを示した。しかし、学生が実践するための多様な包括的なプログラミング演習を作成することは、コンピュータサイエンス教育者にとって時間と労力のかかる課題である。従来の研究では、大きな言語モデルが概念的および文脈的に関連するプログラミング演習を生成できることが示されている。そのため、学生の興味やニーズに合ったパーソナライズされたプログラミング問題を自動的に生成することが可能になる。本報告では,GPT-4で作成した文脈的にパーソナライズされたプログラミング演習を含む,選択型プログラミングコースで実施されるユーザスタディについて報告する。運動の質は,学生と著者の両方が評価した。さらに,本研究は,創生運動に対する学生の態度とシステムとの関わりについて検討した。その結果, GPT-4で発生する運動の質は概して高かった。さらに、参加者は興味を持ち、役に立ちます。このことは、AIが生成するプログラミング問題は、学生が自分の個人的関心や教育的ニーズに合わせた、事実上無制限の実践資料を提供するため、入門プログラミングコースに価値を付加する可能性があることを示唆している。 Programming skills are typically developed through completing various hands-on exercises. Such programming problems can be contextualized to students' interests and cultural backgrounds. Prior research in educational psychology has demonstrated that context personalization of exercises stimulates learners' situational interests and positively affects their engagement. However, creating a varied and comprehensive set of programming exercises for students to practice on is a time-consuming and laborious task for computer science educators. Previous studies have shown that large language models can generate conceptually and contextually relevant programming exercises. Thus, they offer a possibility to automatically produce personalized programming problems to fit students' interests and needs. This article reports on a user study conducted in an elective introductory programming course that included contextually personalized programming exercises created with GPT-4. The quality of the exercises was evaluated by both the students and the authors. 
Additionally, this work investigated student attitudes towards the created exercises and their engagement with the system. The results demonstrate that the quality of exercises generated with GPT-4 was generally high. What is more, the course participants found them engaging and useful. This suggests that AI-generated programming problems can be a worthwhile addition to introductory programming courses, as they provide students with a practically unlimited pool of practice material tailored to their personal interests and educational needs.	翻訳日:2024-07-22 11:30:12 公開日:2024-06-11
# FoldToken2: コンパクトで不変で生成的タンパク質構造言語を学ぶ FoldToken2: Learning compact, invariant and generative protein structure language ( http://arxiv.org/abs/2407.00050v1 ) ライセンス: Link先を確認	Zhangyang Gao, Cheng Tan, Stan Z. Li,	(参考訳) 3D座標の同変性は、タンパク質構造表現学習、アライメント、生成において長期にわたる課題を提起している。タンパク質構造を等価に表現するコンパクトで不変な言語を作成できるだろうか? この目的に向けて、FoldToken2を提案し、元の構造の復元性を維持しながら、同変構造を離散トークンに転送する。 FoldToken1からFoldToken2へ、(1)不変構造エンコーダ、(2)ベクトル量子化圧縮機、(3)同変構造デコーダの3つのキーコンポーネントを改善した。タンパク質構造再構築タスクにおいてFoldToken2を評価したところ,従来のFoldToken1をTMScoreで20%,RMSDで81%上回ることを示した。 FoldToken2はおそらく、単一鎖と多鎖タンパク質の量子化の両方でうまく機能する最初の方法である。我々はFoldToken2が、タンパク質構造表現学習、構造アライメント、構造生成タスクのさらなる改善をもたらすと考えている。 The equivariant nature of 3D coordinates has posed long-term challenges in protein structure representation learning, alignment, and generation. Can we create a compact and invariant language that equivalently represents protein structures? Towards this goal, we propose FoldToken2 to transfer equivariant structures into discrete tokens, while maintaining the recoverability of the original structures. From FoldToken1 to FoldToken2, we improve three key components: (1) invariant structure encoder, (2) vector-quantized compressor, and (3) equivariant structure decoder. We evaluate FoldToken2 on the protein structure reconstruction task and show that it outperforms the previous FoldToken1 by 20% in TMScore and 81% in RMSD. FoldToken2 is probably the first method that works well on both single-chain and multi-chain protein structure quantization. We believe that FoldToken2 will inspire further improvement in protein structure representation learning, structure alignment, and structure generation tasks.	翻訳日:2024-07-07 13:43:41 公開日:2024-06-11
# PreSto:レコメンデーションモデルのトレーニングのためのストレージ内データ前処理システム PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models ( http://arxiv.org/abs/2406.14571v1 ) ライセンス: Link先を確認	Yunjae Lee, Hyeseong Kim, Minsoo Rhu,	(参考訳) レコメンデーションシステム(RecSys)のトレーニングは、大量の生データを前処理し、それらをGPUにシームレスに供給するために、"データ前処理"ステージを必要とするため、いくつかの課題に直面している。高いトレーニングスループットを維持するために、最先端のソリューションは大量のCPUサーバを前処理のために確保しており、これは大きな導入コストと消費電力を伴う。我々の特性評価により、従来のCPU中心の前処理は、RecSys前処理に豊富に存在する特徴間・特徴内並列性を活用できないため、特徴生成と特徴正規化の操作がボトルネックになっていることが明らかになった。 PreStoは、ISP(In-Storage Processing)を活用するストレージ中心の前処理システムであり、ボトルネックとなる前処理操作をISPユニットにオフロードする。 PreStoは、プロダクションスケールのRecSys前処理において、エンド・ツー・エンドの前処理時間で$9.6\times$の高速化、コスト効率で$4.3\times$の向上、エネルギー効率で$11.3\times$の向上を平均で達成し、ベースラインのCPU中心システムより優れていることを示す。 Training recommendation systems (RecSys) faces several challenges as it requires the "data preprocessing" stage to preprocess an ample amount of raw data and feed them to the GPU for training in a seamless manner. To sustain high training throughput, state-of-the-art solutions reserve a large fleet of CPU servers for preprocessing, which incurs substantial deployment cost and power consumption. Our characterization reveals that prior CPU-centric preprocessing is bottlenecked on feature generation and feature normalization operations as it fails to exploit the abundant inter-/intra-feature parallelism in RecSys preprocessing. PreSto is a storage-centric preprocessing system leveraging In-Storage Processing (ISP), which offloads the bottlenecked preprocessing operations to our ISP units. We show that PreSto outperforms the baseline CPU-centric system with a $9.6\times$ speedup in end-to-end preprocessing time, $4.3\times$ enhancement in cost-efficiency, and $11.3\times$ improvement in energy efficiency on average for production-scale RecSys preprocessing.	翻訳日:2024-07-01 07:21:04 公開日:2024-06-11
# ディープラーニングによる大規模市場均衡計算 Large-Scale Contextual Market Equilibrium Computation through Deep Learning ( http://arxiv.org/abs/2406.15459v1 ) ライセンス: Link先を確認	Yunxuan Ma, Yide Bian, Hao Xu, Weitao Yang, Jingshu Zhao, Zhijian Duan, Feng Wang, Xiaotie Deng,	(参考訳) 市場均衡は、経済学と社会最適化分析における最も基本的な解決策の1つである。市場均衡計算に関する既存の研究は、主に比較的少数の購入者による設定に焦点を当てている。そこで本研究では,購入者と商品がコンテキストによって表される大規模購入者人口のシナリオにおける市場均衡の計算について検討する。この現実的で一般化された市場モデルに基づいて、市場均衡を近似する深層学習に基づく手法であるMarketFCNetを導入する。まず、買い手と買い手のコンテキストにのみ依存するニューラルネットワークを用いて、買い手ごとに各商品の割り当てをパラメータ化することから始める。次に,学習アルゴリズムの損失関数を非バイアスで推定する効率的な手法を提案し,勾配降下によるネットワークパラメータの最適化を可能にする。近似解を評価するために、市場均衡から与えられた割当と価格対の偏差を定量化するナッシュギャップと呼ばれる計量を導入する。実験結果から,MarketFCNetは市場規模が拡大するにつれて,既存の手法に比べて競争性能とランニングタイムを著しく低下させ,大規模市場均衡の近似を加速する深層学習手法の可能性を示した。 Market equilibrium is one of the most fundamental solution concepts in economics and social optimization analysis. Existing works on market equilibrium computation primarily focus on settings with a relatively small number of buyers. Motivated by this, our paper investigates the computation of market equilibrium in scenarios with a large-scale buyer population, where buyers and goods are represented by their contexts. Building on this realistic and generalized contextual market model, we introduce MarketFCNet, a deep learning-based method for approximating market equilibrium. We start by parameterizing the allocation of each good to each buyer using a neural network, which depends solely on the context of the buyer and the good. Next, we propose an efficient method to estimate the loss function of the training algorithm unbiasedly, enabling us to optimize the network parameters through gradient descent. To evaluate the approximated solution, we introduce a metric called Nash Gap, which quantifies the deviation of the given allocation and price pair from the market equilibrium. 
Experimental results indicate that MarketFCNet delivers competitive performance and significantly lower running times compared to existing methods as the market scale expands, demonstrating the potential of deep learning-based methods to accelerate the approximation of large-scale contextual market equilibrium.	翻訳日:2024-07-01 07:01:19 公開日:2024-06-11
# RACon:検索機能強化されたキャラクタロコモーション制御 RACon: Retrieval-Augmented Simulated Character Locomotion Control ( http://arxiv.org/abs/2406.17795v1 ) ライセンス: Link先を確認	Yuxuan Mu, Shihao Zou, Kangning Yin, Zheng Tian, Li Cheng, Weinan Zhang, Jun Wang,	(参考訳) コンピュータアニメーションでは、シミュレートされたキャラクターをライフライクな動きで運転することは困難である。現在の生成モデルは多様な動作に一般化できるが、エンドユーザー制御の応答性に問題を引き起こすことが多い。これらの問題に対処するために, RACon: Retrieval-Augmented Simulated Character Locomotion Controlを提案する。エンドツーエンドの階層的強化学習法は,レトリバーとモーションコントローラを利用する。検索者は、ユーザ指定データベースからタスク指向で動きの専門家を検索し、ユーザの制御に対する応答性を高める。選択された動きの専門家と操作信号は、シミュレートされたキャラクタを駆動するためにコントローラに転送される。さらに、トレーニングプロセスの安定化を図るために、検索強化判別器を設計する。本手法は,実証実験で実証したように,移動制御における品質と量の両方において既存の手法を超越した手法である。さらに、検索用の広範囲なデータベースを切り替えることで、実行時に独特の動作タイプに適応することができる。 In computer animation, driving a simulated character with lifelike motion is challenging. Current generative models, though able to generalize to diverse motions, often pose challenges to the responsiveness of end-user control. To address these issues, we introduce RACon: Retrieval-Augmented Simulated Character Locomotion Control. Our end-to-end hierarchical reinforcement learning method utilizes a retriever and a motion controller. The retriever searches motion experts from a user-specified database in a task-oriented fashion, which boosts the responsiveness to the user's control. The selected motion experts and the manipulation signal are then transferred to the controller to drive the simulated character. In addition, a retrieval-augmented discriminator is designed to stabilize the training process. Our method surpasses existing techniques in both quality and quantity in locomotion control, as demonstrated in our empirical study. Moreover, by switching extensive databases for retrieval, it can adapt to distinctive motion types at run time.	翻訳日:2024-07-01 06:21:45 公開日:2024-06-11
# KROP(Knowledge Return Oriented Prompting) Knowledge Return Oriented Prompting (KROP) ( http://arxiv.org/abs/2406.11880v1 ) ライセンス: Link先を確認	Jason Martin, Kenneth Yeung,	(参考訳) 現在デプロイされている多くのLarge Language Models (LLMs) と LLM ベースのアプリは、ある種のプロンプトフィルタやアライメントを使用して、それらの整合性を保護している。しかし、これらの対策は万全ではない。本稿では、プロンプトインジェクション攻撃を難読化し、これらのセキュリティ対策の大部分から事実上検出不可能にするプロンプトインジェクション手法KROPを紹介する。 Many Large Language Models (LLMs) and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures aren't foolproof. This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks, rendering them virtually undetectable to most of these security measures.	翻訳日:2024-06-23 13:24:48 公開日:2024-06-11
# Meent: 機械学習のための微分可能な電磁シミュレータ Meent: Differentiable Electromagnetic Simulator for Machine Learning ( http://arxiv.org/abs/2406.12904v1 ) ライセンス: Link先を確認	Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chaejin Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, Jinmyoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park,	(参考訳) 電磁法(EM)シミュレーションは、太陽電池、半導体デバイス、イメージセンサー、将来のディスプレイ、集積フォトニックデバイスなどのサブ波長スケール構造を持つデバイスを解析・設計する上で重要な役割を担っている。具体的には、半導体デバイス構造の推定やナノフォトニクスデバイスの設計といった光学的問題によって、遠く離れた現実世界への影響に関する興味深い研究トピックが提供される。このようなタスクの伝統的なアルゴリズムは、アルゴリズムとEMシミュレーションの両方の計算コストが高いため、しばしば準最適結果をもたらすシミュレーションを通じてパラメータを反復的に精錬する必要がある。機械学習(ML)は、これらの課題を軽減するための有望な候補として現れ、光学研究コミュニティは、さまざまなタスクにわたる古典的手法を超える結果を得るために、MLアルゴリズムをますます採用している。光と機械学習のコミュニティ間の相乗的コラボレーションを促進するためには、両方の研究コミュニティに親しみやすいEMシミュレーションソフトウェアを持つことが不可欠である。この目的のために,厳密な結合波解析(RCWA)を用いたEMシミュレーションソフトウェアであるMeentを提案する。 Pythonで開発され、自動微分(AD)機能を備えたMeentは、光学研究にMLを統合するための汎用プラットフォームとして機能し、その逆も可能である。研究プラットフォームとしての実用性を実証するため、Meentの3つの応用を提示する。 1) 神経オペレーターの訓練用データセットの作成 2)ナノフォトニックデバイス最適化の強化学習環境として機能し、 3)勾配型最適化器を用いた逆問題に対する解を提供する。これらの応用は、EMシミュレーションとML方法論の両方を前進させるMeentの可能性を浮き彫りにする。コードはMITライセンスのhttps://github.com/kc-ml2/meentで公開されている。 Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reaching real world impact. Traditional algorithms for such tasks require iteratively refining parameters through simulations, which often yield sub-optimal results due to the high computational cost of both the algorithms and EM simulations. 
Machine learning (ML) has emerged as a promising candidate to mitigate these challenges, and the optics research community has increasingly adopted ML algorithms to obtain results surpassing classical methods across various tasks. To foster a synergistic collaboration between the optics and ML communities, it is essential to have EM simulation software that is user-friendly for both research communities. To this end, we present Meent, EM simulation software that employs rigorous coupled-wave analysis (RCWA). Developed in Python and equipped with automatic differentiation (AD) capabilities, Meent serves as a versatile platform for integrating ML into optics research and vice versa. To demonstrate its utility as a research platform, we present three applications of Meent: 1) generating a dataset for training a neural operator, 2) serving as an environment for the reinforcement learning of nanophotonic device optimization, and 3) providing a solution for inverse problems with gradient-based optimizers. These applications highlight Meent's potential to advance both EM simulation and ML methodologies. The code is available at https://github.com/kc-ml2/meent under the MIT license to promote the cross-pollination of ideas among academic researchers and industry practitioners.	翻訳日:2024-06-23 13:15:04 公開日:2024-06-11
# PufferLib: 強化学習ライブラリと環境の遊び方 PufferLib: Making Reinforcement Learning Libraries and Environments Play Nice ( http://arxiv.org/abs/2406.12905v1 ) ライセンス: Link先を確認	Joseph Suarez,	(参考訳) 環境、モデル、強化学習ライブラリがあり、一緒に動作するように設計されていますが、そうではありません。 PufferLibは、それらをうまく演奏させる。このライブラリは、一般的な互換性問題を排除し、トレーニングを加速するために高速なベクトル化を行うワンライン環境ラッパーを提供する。 PufferLibを使えば、CleanRLやSB3といった慣れ親しんだライブラリを使って、AtariやProcgenといった古典的なベンチマークからNetHackやNeural MMOのような複雑なシミュレータまでスケールすることができる。 pipパッケージとビルド済みのイメージは、数十の環境に依存しています。私たちのコードはすべてMITライセンスの下でフリーでオープンソースで、ベースライン、ドキュメント、pufferai.github.ioでのサポートが完備しています。 You have an environment, a model, and a reinforcement learning library that are designed to work together but don't. PufferLib makes them play nice. The library provides one-line environment wrappers that eliminate common compatibility problems and fast vectorization to accelerate training. With PufferLib, you can use familiar libraries like CleanRL and SB3 to scale from classic benchmarks like Atari and Procgen to complex simulators like NetHack and Neural MMO. We release pip packages and prebuilt images with dependencies for dozens of environments. All of our code is free and open-source software under the MIT license, complete with baselines, documentation, and support at pufferai.github.io.	翻訳日:2024-06-23 13:15:04 公開日:2024-06-11
# Flextron: マルチインワンのフレキシブルな大言語モデル Flextron: Many-in-One Flexible Large Language Model ( http://arxiv.org/abs/2406.10260v1 ) ライセンス: Link先を確認	Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, Pavlo Molchanov,	(参考訳) 現代のLSMのトレーニングは非常にリソース集約的であり、反復的なトレーニングを通じて限られた計算資源とメモリ資源によって特徴づけられる様々な展開シナリオをカスタマイズするのは現実的ではない。本稿では,フレキシブルモデル展開をサポートするネットワークアーキテクチャとポストトレーニングモデル最適化フレームワークであるFlextronを紹介する。 Flextronアーキテクチャはネストされた弾性構造を利用して、追加の微調整を必要とせず、推論中に特定のユーザ定義のレイテンシと精度ターゲットに迅速に適応する。入力適応性も備えており、トークンをサブネットワーク経由で自動的にルーティングすることで、パフォーマンスと効率を向上させることができる。本稿では,既存のLLMをFlextronモデルに体系的に変換する,サンプル効率のよい学習手法と関連するルーティングアルゴリズムを提案する。我々は,LPMのGPT-3およびLLama-2ファミリ上でFlextronを評価し,複数のエンドツーエンドトレーニングされた変種や他の最先端の弾性ネットワークよりも優れた性能を示す。 Training modern LLMs is extremely resource intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical. In this paper, we introduce Flextron, a network architecture and post-training model optimization framework supporting flexible model deployment. The Flextron architecture utilizes a nested elastic structure to rapidly adapt to specific user-defined latency and accuracy targets during inference with no additional fine-tuning required. It is also input-adaptive, and can automatically route tokens through its sub-networks for improved performance and efficiency. We present a sample-efficient training method and associated routing algorithms for systematically transforming an existing trained LLM into a Flextron model. We evaluate Flextron on the GPT-3 and LLama-2 family of LLMs, and demonstrate superior performance over multiple end-to-end trained variants and other state-of-the-art elastic networks, all with a single pretraining run that consumes a mere 7.63% tokens compared to original pretraining.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-11
# FoodSky:シェフとダイエットテストに合格した食品指向の大規模言語モデル FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination ( http://arxiv.org/abs/2406.10261v1 ) ライセンス: Link先を確認	Pengfei Zhou, Weiqing Min, Chaoran Fu, Ying Jin, Mingyu Huang, Xiangyang Li, Shuhuan Mei, Shuqiang Jiang,	(参考訳) 食べ物は人間の生活の基礎であり、栄養源としてだけでなく、文化的アイデンティティや社会的相互作用の基盤としても機能している。グローバルな食生活のニーズと嗜好の複雑さが増大するにつれて、レシピ生成や食事推奨から食生活と食生活の相関関係の発見や理解まで、食品の認識と推論を可能にするために、食品知性が必要である。この目標に向けて,Large Language Models (LLMs) における様々なドメインやタスクにまたがる強力な機能を実現するために,食品指向の LLM FoodSky を導入し,食品データの認識と推論を通じて理解する。中国料理の複雑さと典型性を考慮すると、まず、さまざまな権威ソースから1つの総合的な中華料理コーパス「FoodEarth」を構築し、食品関連データを深く理解するためにFoodSkyが活用する。そこで,我々は,食品の微細なセマンティクスを捕捉し,コンテキスト対応の食品関連テキストを生成する際に,食品Skyを強化するために,トピックベースの選択状態空間モデル(TS3M)と階層的トピック検索拡張生成(HTRAG)機構を提案する。以上の結果から,食生活において,食生活は食生活と食生活の両方において,食生活の汎用的LLMよりも有意に優れており,それぞれ67.2%,66.4%が中国食生活と食生活の総合的LLMよりも優れていたことが示唆された。 FoodSkyは、料理の創造性を高め、健康的な食事パターンを促進するだけでなく、食品分野における複雑な現実世界の問題に対処する、ドメイン固有のLLMの新しい標準も設定している。 FoodSkyのオンラインデモはhttp://222.92.101.211:8200で公開されている。 Food is foundational to human life, serving not only as a source of nourishment but also as a cornerstone of cultural identity and social interaction. As the complexity of global dietary needs and preferences grows, food intelligence is needed to enable food perception and reasoning for various tasks, ranging from recipe generation and dietary recommendation to diet-disease correlation discovery and understanding. Towards this goal, for powerful capabilities across various domains and tasks in Large Language Models (LLMs), we introduce Food-oriented LLM FoodSky to comprehend food data through perception and reasoning. Considering the complexity and typicality of Chinese cuisine, we first construct one comprehensive Chinese food corpus FoodEarth from various authoritative sources, which can be leveraged by FoodSky to achieve deep understanding of food-related data. 
We then propose Topic-based Selective State Space Model (TS3M) and the Hierarchical Topic Retrieval Augmented Generation (HTRAG) mechanism to enhance FoodSky in capturing fine-grained food semantics and generating context-aware food-relevant text, respectively. Our extensive evaluations demonstrate that FoodSky significantly outperforms general-purpose LLMs in both chef and dietetic examinations, with an accuracy of 67.2% and 66.4% on the Chinese National Chef Exam and the National Dietetic Exam, respectively. FoodSky not only promises to enhance culinary creativity and promote healthier eating patterns, but also sets a new standard for domain-specific LLMs that address complex real-world issues in the food domain. An online demonstration of FoodSky is available at http://222.92.101.211:8200.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-11
# Sinkhornアルゴリズムを用いた公正ランキング問題の高速解法 Fast solution to the fair ranking problem using the Sinkhorn algorithm ( http://arxiv.org/abs/2406.10262v1 ) ライセンス: Link先を確認	Yuki Uehara, Shunnosuke Ikeda, Naoki Nishimura, Koya Ohashi, Yilin Li, Jie Yang, Deddy Jobson, Xingxia Zha, Takeshi Matsumoto, Noriyoshi Sukegawa, Yuichi Takano,	(参考訳) オンラインフリーマーケットのような両面のマーケットプレースでは、消費者にパーソナライズされたアイテムランキングを提供するレコメンデーションシステムが、プロバイダとコンシューマ間の取引を促進する上で重要な役割を担っている。一方、両面の市場は、消費者の満足度と公正度をバランスさせ、商品提供者の活動を刺激する問題に直面している。サイトーとヨアヒムズ(2022)は、公正な分割に基づくナッシュ社会福祉を最大化するインパクトに基づく公正格付け法を考案したが、この方法は、大規模に制約された非線形最適化問題を解くことを必要としており、実際的なレコメンデーターシステムに適用することは極めて困難である。そこで本稿では,インパクトに基づく公正ランキング問題に対する高速な解法を提案する。まず、公正ランキング問題を制約のない最適化問題に変換し、シンクホーンアルゴリズムを繰り返し実行する勾配上昇法を設計する。実験の結果,提案アルゴリズムは高品質で,商用最適化ソフトウェアよりも約1000倍高速であることがわかった。 In two-sided marketplaces such as online flea markets, recommender systems for providing consumers with personalized item rankings play a key role in promoting transactions between providers and consumers. Meanwhile, two-sided marketplaces face the problem of balancing consumer satisfaction and fairness among items to stimulate activity of item providers. Saito and Joachims (2022) devised an impact-based fair ranking method for maximizing the Nash social welfare based on fair division; however, this method, which requires solving a large-scale constrained nonlinear optimization problem, is very difficult to apply to practical-scale recommender systems. We thus propose a fast solution to the impact-based fair ranking problem. We first transform the fair ranking problem into an unconstrained optimization problem and then design a gradient ascent method that repeatedly executes the Sinkhorn algorithm. Experimental results demonstrate that our algorithm provides fair rankings of high quality and is about 1000 times faster than application of commercial optimization software.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-11
# 批判モデルによるコード補完における適応検索のための軽量フレームワーク A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model ( http://arxiv.org/abs/2406.10263v1 ) ライセンス: Link先を確認	Wenrui Zhang, Tiehang Fu, Ting Yuan, Ge Zhang, Dong Chen, Jie Wang,	(参考訳) Retrieval-Augmented Generationの最近の進歩は、リポジトリレベルでコード補完を大幅に強化した。 RAGをベースとした様々なコード補完システムが、異なる設計選択に基づいて提案されている。例えば、検索生成プロセスを何度も繰り返すコストと引き換えに、より高い有効性を得るものがある。しかし、現在の手法における検索の無差別な使用は、検索のかなりの部分が不要であり、コード言語モデルに役に立たない、あるいは有害な提案をもたらす可能性があるため、効率と有効性の両面での問題を明らかにする。これらの課題に対処するために,検索の必要性に関する洞察を提供し,複数の予測から最適な回答を選択するための軽量な批判手法であるCARDを紹介する。 CARDは任意のRAGベースのコード補完システムにシームレスに統合できる。評価の結果,CARDは行補完で21%から46%,API補完で14%から40%,関数補完で6%から46.5%の検索回数を削減し,精度も向上した。 CARDはレイテンシを16%から83%削減する。 CARDは異なるLM、レトリバー、プログラミング言語に一般化可能である。軽量で、数秒でトレーニングし、数ミリ秒で推論する。 Recent advancements in Retrieval-Augmented Generation have significantly enhanced code completion at the repository level. Various RAG-based code completion systems have been proposed based on different design choices, for instance gaining effectiveness at the cost of repeating the retrieval-generation process multiple times. However, the indiscriminate use of retrieval in current methods reveals issues in both efficiency and effectiveness, as a considerable portion of retrievals are unnecessary and may introduce unhelpful or even harmful suggestions to code language models. To address these challenges, we introduce CARD, a lightweight critique method designed to provide insights into the necessity of retrievals and select the optimal answer from multiple predictions. CARD can seamlessly integrate into any RAG-based code completion system. Our evaluation shows that CARD saves 21% to 46% of retrievals for line completion, 14% to 40% for API completion, and 6% to 46.5% for function completion, while improving accuracy. CARD reduces latency by 16% to 83%. CARD is generalizable to different LMs, retrievers, and programming languages. 
It is lightweight, with training in a few seconds and inference in a few milliseconds.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-11
# 大規模言語モデルを用いたマルチモーダルひずみセンサシステムによるテンセグリティの形状認識・モニタリング・ヒューマンインタラクション Large Language Model-empowered multimodal strain sensory system for shape recognition, monitoring, and human interaction of tensegrity ( http://arxiv.org/abs/2406.10264v1 ) ライセンス: Link先を確認	Zebing Mao, Ryota Kobayashi, Hiroyuki Nabae, Koichi Suzumori,	(参考訳) テンセグリティに基づくシステムは、不均一で予測不可能な環境、特に宇宙探査を動的に探索する上で有望なアプローチである。しかし、このようなシステムの実装は、状態認識、無線監視、ヒューマンインタラクション、スマート分析とアドバイス機能といった知的側面の観点からの課題を提示している。本稿では,深層学習モデルと大規模言語モデルの両方を活用することで,24個のマルチモーダルひずみセンサを統合した6本ストラットのテンセグリティを導入し,スマートなテンセグリティを実現する。長短期記憶モデルによって補助される導電性フレキシブル腱を用いて、テンセグリティは外部センサを使わずに自己形状の再構成を実現する。Flaskサーバとgpt-3.5-turboモデルを統合することで、テンセグリティは自動でiPhoneにデータを送信してワイヤレス監視を可能にし、意思決定のためにデータ分析、説明、予測、提案を提供する。最後に、テンセグリティのヒューマンインタラクションシステムは、人間が自然言語を通じてテンセグリティの必要な情報を得るのに役立つ。全体として、このインテリジェントなテンセグリティに基づくシステムは、未来の探索の可能性を示しており、現実世界のアプリケーションに汎用的なツールとなっている。 A tensegrity-based system is a promising approach for dynamic exploration of uneven and unpredictable environments, particularly space exploration. However, implementing such systems presents challenges in terms of intelligent aspects: state recognition, wireless monitoring, human interaction, and smart analyzing and advising functions. Here, we introduce a 6-strut tensegrity integrated with 24 multimodal strain sensors, leveraging both a deep learning model and large language models to realize a smart tensegrity. Using conductive flexible tendons assisted by a long short-term memory model, the tensegrity achieves self-shape reconstruction without external sensors. By integrating a Flask server and the gpt-3.5-turbo model, the tensegrity autonomously sends data to an iPhone for wireless monitoring and provides data analysis, explanation, prediction, and suggestions to humans for decision making. Finally, the human interaction system of the tensegrity helps humans obtain necessary information about the tensegrity in human language. 
Overall, this intelligent tensegrity-based system with self-sensing tendons showcases potential for future exploration, making it a versatile tool for real-world applications.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-11
# 感情分析のための言語モデルの改善:認知科学からの洞察 Improving Language Models for Emotion Analysis: Insights from Cognitive Science ( http://arxiv.org/abs/2406.10265v1 ) ライセンス: Link先を確認	Constant Bonard, Gustave Cortal,	(参考訳) 本稿では、感情分析のための言語モデルを改善するために、認知科学研究を感情とコミュニケーションに活用することを提案する。まず,心理学と認知科学の主な感情理論について述べる。次に、自然言語処理における感情アノテーションの主な方法とその心理理論との関係について紹介する。また、認知実用論における感情コミュニケーションの2つの主要な分析方法について述べる。最後に,認知科学研究に基づき,感情分析のための言語モデルを改善するための方向性を提案する。これらの研究は、人間の感情とコミュニケーションの異なる側面を考慮して、新しい注釈体系の構築方法と感情理解のためのベンチマークの道を開くことを示唆している。 We propose leveraging cognitive science research on emotions and communication to improve language models for emotion analysis. First, we present the main emotion theories in psychology and cognitive science. Then, we introduce the main methods of emotion annotation in natural language processing and their connections to psychological theories. We also present the two main types of analyses of emotional communication in cognitive pragmatics. Finally, based on the cognitive science research presented, we propose directions for improving language models for emotion analysis. We suggest that these research efforts pave the way for constructing new annotation schemes and a possible benchmark for emotional understanding, considering different facets of human emotion and communication.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-11
# 格子探索法に基づくハイブリッド深層学習モデルを用いたCOVID-19 Twitterの感性分類 COVID-19 Twitter Sentiment Classification Using Hybrid Deep Learning Model Based on Grid Search Methodology ( http://arxiv.org/abs/2406.10266v1 ) ライセンス: Link先を確認	Jitendra Tembhurne, Anant Agrawal, Kirtan Lakhotia,	(参考訳) 現代では、ソーシャルメディアプラットフォームは、ユーザーが貢献する膨大な量のソーシャルデータを蓄積している。製品やイベントに関する個人の意見や感情的傾向を素早く把握するためには、ユーザ生成コンテンツに対して感情分析を行うことが不可欠となる。マイクロブログのコメントは長いテキストと簡潔なテキストの両方を包含し、複雑なシナリオを提示する。この複雑さは、リッチな内容と短いテキストエントリと比較して複雑な単語の相互関係のため、広範にテキストコンテンツで顕著に発音される。 FacebookやTwitterなどのソーシャルネットワークサイトで共有されている世論の感情分析は進化し、多様なアプリケーションを見つけてきた。しかし、この分野ではいくつかの課題が取り組まれている。ハイブリッド手法は、特に漸進的に複雑なトレーニングデータを扱う場合、感情分析エラーを緩和するための有望なモデルとして現れてきた。本稿では、新型コロナウイルスワクチン接種の難しさを検討するために、感情分類のための8種類のハイブリッドディープラーニングモデルを提案する。感情予測は、Twitter COVID-19データセットへの埋め込み、ディープラーニングモデル、グリッド検索アルゴリズムを使用して達成される。研究によると、新型コロナウイルス(COVID-19)の予防接種に対する大衆の感情は時間とともに改善しているようだ。広範囲な評価により、提案されたモデルでは98.86%の精度が向上し、他のモデルよりも優れていた。具体的には、BERT、CNN、GSの組み合わせが最も正確であり、GloVe、BiLSTM、CNN、GSの組み合わせは98.17%の精度で遅れている。また,2.11%から14.46%の範囲での精度の向上は,既存の研究と比較して提案モデルにより報告されている。 In the contemporary era, social media platforms amass an extensive volume of social data contributed by their users. In order to promptly grasp the opinions and emotional inclinations of individuals regarding a product or event, it becomes imperative to perform sentiment analysis on the user-generated content. Microblog comments often encompass both lengthy and concise text entries, presenting a complex scenario. This complexity is particularly pronounced in extensive textual content due to its rich content and intricate word interrelations compared to shorter text entries. Sentiment analysis of public opinion shared on social networking websites such as Facebook or Twitter has evolved and found diverse applications. However, several challenges remain to be tackled in this field. 
Hybrid methodologies have emerged as promising models for mitigating sentiment analysis errors, particularly when dealing with progressively intricate training data. In this article, to investigate COVID-19 vaccination hesitancy, we propose eight different hybrid deep learning models for sentiment classification with the aim of improving the overall accuracy of the model. The sentiment prediction is achieved using an embedding, a deep learning model and a grid search algorithm on a Twitter COVID-19 dataset. According to the study, public sentiment towards COVID-19 immunization appears to be improving with time, as evidenced by the gradual decline in vaccine reluctance. Through extensive evaluation, the proposed model reported an increased accuracy of 98.86%, outperforming other models. Specifically, the combination of BERT, CNN and grid search (GS) yields the highest accuracy, while the combination of GloVe, BiLSTM, CNN and GS follows closely behind with an accuracy of 98.17%. In addition, an increase in accuracy in the range of 2.11% to 14.46% is reported by the proposed model in comparison with existing works.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-11
# 生成LLMのトークン確率分布における未使用情報:期待値の計算によるLLM読解力の改善 Unused information in token probability distribution of generative LLM: improving LLM reading comprehension through calculation of expected values ( http://arxiv.org/abs/2406.10267v1 ) ライセンス: Link先を確認	Krystian Zawistowski,	(参考訳) LLMテキストデコーディングは、LLMの品質を認識するための重要なコンポーネントである。トークン確率の操作により復号法を改良できることを示す2つの実験を行った。まず,SummEvalの要約スコアリングデータセットを用いていくつかのLLMをテストし,読解力を測定する。貪欲復号によるスコアと、次のトークン分布上の期待値によるスコアを比較する。スコアのエントロピーを高めるために,ロジットを高温でスケールする。これにより SummEval のパフォーマンス(人間の判断との相関)が大きく向上する。 7B Mistralでは6-8%から13-28%,Mixtralでは20%-46%から37%-56%に改善し、2つの指標でGPT 4 0314の結果を上回った。利得の一部は位置バイアスに関係しているようだ。第2に、確率に基づく木サンプリングアルゴリズムを用いて、与えられたプロンプトに対して最も確率の高い生成結果すべてを調べる。 LLM text decoding is a key component for perceived LLM quality. We demonstrate two experiments showing that decoding methods could be improved by manipulation of token probabilities. First, we test a few LLMs on the SummEval summary scoring dataset, to measure reading comprehension. We compare scores from greedy decoding to expected values over the next token distribution. We scale logits by large temperature to increase the entropy of scores. This allows strong improvement of performance on SummEval (in terms of correlations to human judgement). We see improvement from 6-8% to 13-28% for 7B Mistral and from 20%-46% to 37%-56% for Mixtral, beating GPT 4 0314 result on two metrics. Part of the gain seems related to positional bias. Secondly, we use a probability-based tree sampling algorithm to examine all of the most probable generations for a given prompt.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-11
# 自然言語処理による自動数学的帰納証明 Autograding Mathematical Induction Proofs with Natural Language Processing ( http://arxiv.org/abs/2406.10268v1 ) ライセンス: Link先を確認	Chenyan Zhao, Mariana Silva, Seth Poulsen,	(参考訳) 数学の証明教育では、学生が数学の証明を書くことを学ぶのを助ける介入が必要である。研究によると、タイムリーなフィードバックは、新しいスキルを学ぶ学生にとって非常に役に立つ。長年にわたり、自然言語処理モデルは数学的テキストに関連するタスクでうまく機能するのに苦労してきたが、近年の自然言語処理の発展は、学生に数学的証明に対する即時フィードバックを与える機会を生み出している。本稿では,既存の大規模言語モデルや他の機械学習技術を活用して,自由形式の数学的証明を自動分解する訓練手法とモデルを提案する。モデルは、誘導問題によって4つの異なる証明から収集された証明データを用いて訓練される。我々は、4つの異なる頑健な大規模言語モデルを使用してパフォーマンスを比較し、それぞれが満足できるパフォーマンスを様々な程度に達成しています。さらに、トレーニングデータと同じ証明を格付けするために、人間の学級者を募集し、最高の学級モデルがほとんどの学級者よりも正確であることを見出した。これらのグレーティングモデルの開発により,帰納的問題による証明のためのオートグラファーの作成と展開を行い,学生とのユーザスタディを実施する。研究結果は、学生がオートグラファーからのフィードバックを使って証明を大幅に改善できることを示しているが、学生は人間のグレーダーを信頼するほどAIオートグラファーを信頼していない。将来の作業は、オートグラファーのフィードバックを改善し、学生がAIオートグラダーを信頼する方法を見つけることができる。 In mathematical proof education, there remains a need for interventions that help students learn to write mathematical proofs. Research has shown that timely feedback can be very helpful to students learning new skills. While for many years natural language processing models have struggled to perform well on tasks related to mathematical texts, recent developments in natural language processing have created the opportunity to complete the task of giving students instant feedback on their mathematical proofs. In this paper, we present a set of training methods and models capable of autograding freeform mathematical proofs by leveraging existing large language models and other machine learning techniques. The models are trained using proof data collected from four different proof by induction problems. We use four different robust large language models to compare their performances, and all achieve satisfactory performances to various degrees. Additionally, we recruit human graders to grade the same proofs as the training data, and find that the best grading model is also more accurate than most human graders. 
With the development of these grading models, we create and deploy an autograder for proof by induction problems and perform a user study with students. Results from the study show that students are able to make significant improvements to their proofs using the feedback from the autograder, but students still do not trust the AI autograders as much as they trust human graders. Future work can improve on the autograder feedback and figure out ways to help students trust AI autograders.	翻訳日:2024-06-19 01:31:17 公開日:2024-06-11
# 大規模言語モデルサロゲートとしてのマルコフ制約 Markov Constraint as Large Language Model Surrogate ( http://arxiv.org/abs/2406.10269v1 ) ライセンス: Link先を確認	Alexandre Bonlarron, Jean-Charles Régin,	(参考訳) 本稿では,マルコフ制約の変種であるNgramMarkovについて述べる。制約プログラミング(CP)におけるテキスト生成に特化している。これは、大きな言語モデル(LLM)によって与えられる確率に関連する一連のn-gram(すなわちnワードの列)を含む。これは文のn-グラムの確率の積を制限する。この制約のプロパゲータは、n-gram の最大推定ではなく LLM 分布を取り入れた、素マルコフ制約プロパゲータの拡張と見なすことができる。これはグライディングしきい値、すなわち局所確率が低すぎるn-グラムを拒絶し、平衡解を保証する。また、固定長地平線に対して許容される文につながる可能性が極めて低いn-gramを除去する「ルックアヘッド」アプローチと組み合わせることもできる。この考え方はMDDMarkovProcess制約プロパゲータに基づいているが、MDD(Multi-Valued Decision Diagram)を明示的に使用していない。実験の結果, 生成したテキストは, LLMのパープレキシティ関数と同じような方法で評価されることがわかった。この新しい制約を使用することで、生成される候補文の数を劇的に削減し、計算時間を改善し、より大きなコーパスやより小さなn-gramを使用することができる。 5グラムではなく4グラムで現実の問題が初めて解決された。 This paper presents NgramMarkov, a variant of the Markov constraints. It is dedicated to text generation in constraint programming (CP). It involves a set of n-grams (i.e., sequence of n words) associated with probabilities given by a large language model (LLM). It limits the product of the probabilities of the n-gram of a sentence. The propagator of this constraint can be seen as an extension of the ElementaryMarkov constraint propagator, incorporating the LLM distribution instead of the maximum likelihood estimation of n-grams. It uses a gliding threshold, i.e., it rejects n-grams whose local probabilities are too low, to guarantee balanced solutions. It can also be combined with a "look-ahead" approach to remove n-grams that are very unlikely to lead to acceptable sentences for a fixed-length horizon. This idea is based on the MDDMarkovProcess constraint propagator, but without explicitly using an MDD (Multi-Valued Decision Diagram). The experimental results show that the generated text is valued in a similar way to the LLM perplexity function. Using this new constraint dramatically reduces the number of candidate sentences produced, improves computation times, and allows larger corpora or smaller n-grams to be used. 
A real-world problem has been solved for the first time using 4-grams instead of 5-grams.	翻訳日:2024-06-19 01:21:32 公開日:2024-06-11
# Trie-Augmented Neural Networks(TANNS)の概念的フレームワーク A Conceptual Framework For Trie-Augmented Neural Networks (TANNS) ( http://arxiv.org/abs/2406.10270v1 ) ライセンス: Link先を確認	Temitayo Adefemi,	(参考訳) Trie-Augmented Neural Networks (TANN)は、ニューラルネットワークとトライ構造を組み合わせることで、意思決定の透明性と機械学習の効率性を高める階層的な設計を形成する。本稿では,テキストと文書の分類にTANNを用い,RNN(Recurrent Neural Networks)とFNN(Feed forward Neural Networks)を適用した。 20 NewsGroupおよびSMS Spam Collectionデータセット上でTANNを評価し,ドロップアウト正規化の有無それぞれについて従来のRNNおよびFNNネットワークと性能を比較した。その結果, TANNはテキスト分類において, 同等あるいはわずかに優れた性能を達成することがわかった。 TANNの最大の利点は、構造化された意思決定プロセスであり、解釈可能性を向上させる。実装上の課題と実用上の制限について論じる。今後の作業は、より複雑な分類タスクのために、TANNアーキテクチャを洗練することを目的としている。 Trie-Augmented Neural Networks (TANNs) combine trie structures with neural networks, forming a hierarchical design that enhances decision-making transparency and efficiency in machine learning. This paper investigates the use of TANNs for text and document classification, applying Recurrent Neural Networks (RNNs) and Feed forward Neural Networks (FNNs). We evaluated TANNs on the 20 NewsGroup and SMS Spam Collection datasets, comparing their performance with traditional RNN and FNN networks with and without dropout regularization. The results show that TANNs achieve similar or slightly better performance in text classification. The primary advantage of TANNs is their structured decision-making process, which improves interpretability. We discuss implementation challenges and practical limitations. Future work will aim to refine the TANNs architecture for more complex classification tasks.	翻訳日:2024-06-19 01:21:32 公開日:2024-06-11
# Perlによる非Perlバイオインフォマティクス応用の強化: オブジェクト指向, PDL, Alien, FFI, Inline, OpenMP を用いた新しいコンポーネントベースアプリケーションの構築 Enhancing non-Perl bioinformatic applications with Perl: Building novel, component based applications using Object Orientation, PDL, Alien, FFI, Inline and OpenMP ( http://arxiv.org/abs/2406.10271v1 ) ライセンス: Link先を確認	Christos Argyropoulos,	(参考訳) コンポーネントベースのソフトウェアエンジニアリング(CBSE)は、既存の再利用可能なソフトウェアコンポーネントを新しいアプリケーションに組み立てる方法論である。 Perlはこの分野で10年前まで広く使われていたが、最近のアプリケーションはBiioconductor/RまたはPythonを選択している。この傾向は、Perlがコンポジションを容易にするための様々な抽象化を提供しているため、既存のコンポーネントから新しいバイオインフォマティクスアプリケーションを素早く生成する機会が著しく欠落していることを示している。本稿では,オブジェクト指向フレームワーク,Perl Data Language,および外部関数インタフェースによる非Perlコードへのインターフェース,および外部ソースコードのインライン化によるCBSE用Perlの有用性について述べる。そのため、Rで書かれたRNAシークエンシングシミュレータであるPolyesterを拡張し、編集距離に基づいて高速な配列類似性検索ライブラリをedlibする。最初のケーススタディでは、GNU Scientific LibraryとPDLを使って乱数シミュレーションのために、新しい高性能なPerlモジュールをほぼ無作為に作成し、生物学的配列からポリAテールを"トリム"するために使用されるPythonツール cutadaptのPerlとPerl/C代替案を提案する。 edlibの場合、メタクラスプログラミングのパワーを活用して、多コアエンジン(MCE)モジュールとOpenMP(C/C++/Fortran Application Programming Interface for shared memory multithreaded Processing)によるプロセスベースの並列処理、そして粒度の細かい並列処理を実現します。これらのユースケースは、Bio::SeqAlignmentフレームワークのコンセプト実証を提供する。このフレームワークは、複雑なメモリにおける異種コンポーネントを整理し、新しいビオンフォマティクスツールを構築するためのコマンドラインベースのワークフローで、ロングリードシークエンシング、例えばナノポール、シークエンシングプラットフォームからのデータを分析することができる。 Component-Based Software Engineering (CBSE) is a methodology that assembles pre-existing, re-usable software components into new applications, which is particularly relevant for fast moving, data-intensive fields such as bioinformatics. While Perl was used extensively in this field until a decade ago, more recent applications opt for a Bioconductor/R or Python. This trend represents a significantly missed opportunity for the rapid generation of novel bioinformatic applications out of pre-existing components since Perl offers a variety of abstractions that can facilitate composition. 
In this paper, we illustrate the utility of Perl for CBSE through a combination of Object Oriented frameworks, the Perl Data Language and facilities for interfacing with non-Perl code through Foreign Function Interfaces and inlining of foreign source code. To do so, we enhance Polyester, an RNA sequencing simulator written in R, and edlib, a fast sequence similarity search library based on the edit distance. The first case study illustrates the near effortless authoring of new, highly performant Perl modules for the simulation of random numbers using the GNU Scientific Library and PDL, and proposes Perl and Perl/C alternatives to the Python tool cutadapt that is used to "trim" polyA tails from biological sequences. For the edlib case, we leverage the power of metaclass programming to endow edlib with coarse, process based parallelism, through the Many Core Engine (MCE) module and fine grained parallelism through OpenMP, a C/C++/Fortran Application Programming Interface for shared memory multithreaded processing. These use cases provide proof-of-concept for the Bio::SeqAlignment framework, which can organize heterogeneous components in complex memory and command-line based workflows for the construction of novel bioinformatic tools to analyze data from long-read sequencing platforms, e.g. Nanopore.	翻訳日:2024-06-19 01:21:32 公開日:2024-06-11
# 中国語と英語におけるコネクテッド音声に基づく認知評価 Connected Speech-Based Cognitive Assessment in Chinese and English ( http://arxiv.org/abs/2406.10272v1 ) ライセンス: Link先を確認	Saturnino Luz, Sofia De La Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu,	(参考訳) 本稿では,コネクテッド音声の分析による認知機能評価のための新しいベンチマークデータセットと予測タスクを提案する。このデータセットは、認知障害のレベルが異なる話者および正常な認知機能を持つ話者を含む、標準中国語と英語の話者の音声サンプルと臨床情報からなる。これらのデータは、モデルトレーニングにおけるバランスと代表性を確保するために、傾向スコア分析によって年齢と性別について慎重にマッチングされている。予測タスクは、軽度の認知障害診断と認知テストスコア予測を含む。このフレームワークは、言語にまたがって一般化する音声に基づく認知評価手法の開発を促進するために設計された。本稿では,言語に依存しない比較可能な特徴量を用いたベースライン予測モデルにより,診断と認知テストスコア予測を行う。非加重平均リコールは診断で59.2%、スコア予測の二乗平均平方根誤差は2.89であった。 We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age and sex by propensity score analysis to ensure balance and representativity in model training. The prediction tasks encompass mild cognitive impairment diagnosis and cognitive test score prediction. This framework was designed to encourage the development of approaches to speech-based cognitive assessment which generalise across languages. We illustrate it by presenting baseline prediction models that employ language-agnostic and comparable features for diagnosis and cognitive test score prediction. The models achieved an unweighted average recall of 59.2% in diagnosis and a root mean squared error of 2.89 in score prediction.	翻訳日:2024-06-19 01:21:32 公開日:2024-06-11
# 言葉を超えて: ミッションクリティカルリスク分析における大規模言語モデルでの行動可能性 Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis ( http://arxiv.org/abs/2406.10273v1 ) ライセンス: Link先を確認	Matteo Esposito, Francesco Palagiano, Valentina Lenarduzzi,	(参考訳) コンテキスト。リスク分析は特定のシナリオにおける潜在的なリスクを評価する。リスク分析の原則は、コンテキストレスであり、同じ方法論を、健康や情報技術のセキュリティに関連するリスクに適用することができる。リスク分析には、国内外の規制や基準に関する膨大な知識が必要であり、時間と努力が集中している。大きな言語モデルは、人間よりも少ない時間で情報を素早く要約することができ、特定のタスクに微調整することができる。エイム。本研究は,リスク分析における検索・拡張型LLMと微調整型LLMの有効性を検討することを目的とした実証研究である。我々の知る限り、リスク分析の能力について事前の研究は行われていない。方法。我々は過去5年間に産業状況チームによってアーカイブされた50以上のミッションクリティカルな分析結果から,‘totalscenarios’というユニークなシナリオを手作業でキュレートした。基本モデルであるGPT-3.5とGPT-4とRetrieval-Augmented Generationおよび微調整モデルを比較した。我々は、モデルの競合相手として2人の人間専門家と、3人の人間専門家を雇い、モデルと以前の人間専門家の分析をレビューします。審査員は5000のシナリオ分析を行った。結果と結論。 HEsは高い精度を示したが、LSMsはより速く、より実用的な。さらに,RAG支援LSMが最も低い幻覚率を示し,隠れたリスクを効果的に発見し,人間の専門知識を補完することを示した。したがって、モデルの選択は、正確性のためのFTM、隠れたリスク発見のためのRAG、包括性と行動可能性のためのベースモデルなど、特定のニーズに依存する。したがって、専門家はLLMを、凝縮した時間枠内でのリスク分析を効果的に補完するコンパニオンとして活用することができる。また、不当な対策の実施に伴う不要な費用を回避することでコストを削減できる。 Context. Risk analysis assesses potential risks in specific scenarios. Risk analysis principles are context-less; the same methodology can be applied to a risk connected to health and information technology security. Risk analysis requires a vast knowledge of national and international regulations and standards and is time and effort-intensive. A large language model can quickly summarize information in less time than a human and can be fine-tuned to specific tasks. Aim. Our empirical study aims to investigate the effectiveness of Retrieval-Augmented Generation and fine-tuned LLM in Risk analysis. To our knowledge, no prior study has explored its capabilities in risk analysis. Method. We manually curated \totalscenarios unique scenarios leading to \totalsamples representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years. 
We compared the base GPT-3.5 and GPT-4 models versus their Retrieval-Augmented Generation and fine-tuned counterparts. We employ two human experts as competitors of the models and three other human experts to review the analyses of the models and of the former human experts. The reviewers analyzed 5,000 scenario analyses. Results and Conclusions. Human experts (HEs) demonstrated higher accuracy, but LLMs are quicker and more actionable. Moreover, our findings show that RAG-assisted LLMs have the lowest hallucination rates, effectively uncovering hidden risks and complementing human expertise. Thus, the choice of model depends on specific needs, with fine-tuned models (FTMs) for accuracy, RAG for hidden risks discovery, and base models for comprehensiveness and actionability. Therefore, experts can leverage LLMs as an effective complementary companion in risk analysis within a condensed timeframe. They can also save costs by averting unnecessary expenses associated with implementing unwarranted countermeasures.	翻訳日:2024-06-19 01:21:32 公開日:2024-06-11
# 一般大言語モデルを用いた数学的文書の分類 Using General Large Language Models to Classify Mathematical Documents ( http://arxiv.org/abs/2406.10274v1 ) ライセンス: Link先を確認	Patrick D. F. Ion, Stephen M. Watt,	(参考訳) 本稿では,最近公開された汎用大言語モデル (LLM) を用いて数学的文書を分類する可能性を評価するための最初の調査について報告する。自動分類は、文学のナビゲーションを改善するための応用的な視点と、数学的結果間の関係を識別するよりオープンな目標から有用である。 MathSciNet と zbMATH の Mathematical Subject Classification MSC 2020 は広く使われており、公開文学において地中真理資料のかなりのコーパスが存在する。我々は,MSC 2020に基づき,arXiv.orgの事前印刷項目の分類を評価した。実験ではタイトルと抽象のみを使用しましたが、紙全体ではありません。これはチャットボットの利用とAPIの開発の初期段階であったため、ここでは手作業による実行について報告する。もちろん、プロセスの自動化は、一般的に有用であるならば、従わなければなりません。サンプルの約60%において, LLMはarXivで既に報告されている一次分類マッチングを作成した。約半数の症例では、検出されなかった追加の一次分類があった。サンプルの約40%において、LLMは提供されたものとは異なる分類を提案した。しかし, これらの症例の詳細な検査では, LLMを推奨する分類は, 提供された分類よりも, 多くの場合において良好であった。 In this article we report on an initial exploration to assess the viability of using the general large language models (LLMs), recently made public, to classify mathematical documents. Automated classification would be useful from the applied perspective of improving the navigation of the literature and the more open-ended goal of identifying relations among mathematical results. The Mathematical Subject Classification MSC 2020, from MathSciNet and zbMATH, is widely used and there is a significant corpus of ground truth material in the open literature. We have evaluated the classification of preprint articles from arXiv.org according to MSC 2020. The experiment used only the title and abstract alone -- not the entire paper. Since this was early in the use of chatbots and the development of their APIs, we report here on what was carried out by hand. Of course, the automation of the process will have to follow if it is to be generally useful. We found that in about 60% of our sample the LLM produced a primary classification matching that already reported on arXiv. In about half of those instances, there were additional primary classifications that were not detected. 
In about 40% of our sample, the LLM suggested a different classification than what was provided. A detailed examination of these cases, however, showed that the LLM-suggested classifications were in most cases better than those provided.	翻訳日:2024-06-19 01:21:32 公開日:2024-06-11
# ExHuBERT:37の感情データセットのブロック拡張と微調整によるHuBERTの強化 ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets ( http://arxiv.org/abs/2406.10275v1 ) ライセンス: Link先を確認	Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller,	(参考訳) 基礎モデルは、事前訓練された表現を利用して、音声信号の感情パターンをキャプチャすることで、音声感情認識(SER)に大きな可能性を示してきた。様々な言語やドメインのSER性能をさらに向上するために,新しい2段階のアプローチを提案する。まず、37のデータセット、150,907のサンプル、合計119.5時間からなる包括的な多言語、多文化の音声感情コーパスであるEmoSet++を収集する。次に、バックボーン拡張とEmoSet++の微調整によって達成されたHuBERTの拡張バージョンであるExHuBERTを紹介します。それぞれのエンコーダ層とその重みを複製し、最初の複製を凍結し、余分なゼロ初期化線形層とスキップ接続を統合することで、機能を保ちつつ、その後の微調整への適応性を確保する。未知のデータセットに対する評価は、ExHuBERTの有効性を示し、様々なSERタスクに対する新しいベンチマークを設定した。 EmoSet++に関するモデルと詳細: https://huggingface.co/amiriparian/ExHuBERT Foundation models have shown great promise in speech emotion recognition (SER) by leveraging their pre-trained representations to capture emotion patterns in speech signals. To further enhance SER performance across various languages and domains, we propose a novel twofold approach. First, we gather EmoSet++, a comprehensive multi-lingual, multi-cultural speech emotion corpus with 37 datasets, 150,907 samples, and a total duration of 119.5 hours. Second, we introduce ExHuBERT, an enhanced version of HuBERT achieved by backbone extension and fine-tuning on EmoSet++. We duplicate each encoder layer and its weights, then freeze the first duplicate, integrating an extra zero-initialized linear layer and skip connections to preserve functionality and ensure its adaptability for subsequent fine-tuning. Our evaluation on unseen datasets shows the efficacy of ExHuBERT, setting a new benchmark for various SER tasks. Model and details on EmoSet++: https://huggingface.co/amiriparian/ExHuBERT.	翻訳日:2024-06-19 01:21:32 公開日:2024-06-11
# YOLOモデルと転移学習による道路標識検出の高度化 Advancing Roadway Sign Detection with YOLO Models and Transfer Learning ( http://arxiv.org/abs/2406.09437v1 ) ライセンス: Link先を確認	Selvia Nafaa, Hafsa Essam, Karim Ashour, Doaa Emad, Rana Mohamed, Mohammed Elhenawy, Huthaifa I. Ashqar, Abdallah A. Hassan, Taqwa I. Alhadidi,	(参考訳) 道路標識の検出と認識はアドバンスト・ドライビング・アシスタント・システム(ADAS)の重要な要素である。いくつかの人工知能手法が広く使われており、その中にはYOLOv5とYOLOv8がある。本稿では,異なる照明条件下で異なる道路標識を検出し,分類するために,改良されたYOLOv5とYOLOv8を用いた。実験の結果、YOLOv8モデルでは、エポック数やバッチサイズを変えても、テストセット上のMAP50のスコアは94.6%から97.1%の範囲で一貫していることがわかった。 YOLOv5モデルは競合性能を示し、MAP50のスコアは92.4%から96.9%である。これらの結果から, どちらのモデルも異なるトレーニング設定で良好に動作し, YOLOv8は概してMAP50スコアがわずかに高いことが示唆された。これらの結果は、どちらのモデルも異なるトレーニング設定下でもうまく機能し、オブジェクト検出アプリケーションにおいて信頼性と適応性のあるソリューションを求める実践者にとって貴重な洞察を提供することを示唆している。 Roadway sign detection and recognition is an essential element of Advanced Driving Assistant Systems (ADAS). Several artificial intelligence methods have been widely used, among them YOLOv5 and YOLOv8. In this paper, we used a modified YOLOv5 and YOLOv8 to detect and classify different roadway signs under different illumination conditions. Experimental results indicated that for the YOLOv8 model, varying the number of epochs and batch size yields consistent MAP50 scores, ranging from 94.6% to 97.1% on the testing set. The YOLOv5 model demonstrates competitive performance, with MAP50 scores ranging from 92.4% to 96.9%. These results suggest that both models perform well across different training setups, with YOLOv8 generally achieving slightly higher MAP50 scores. These findings suggest that both models can perform well under different training setups, offering valuable insights for practitioners seeking reliable and adaptable solutions in object detection applications.	翻訳日:2024-06-17 17:54:01 公開日:2024-06-11
# テキストマイニング分析を用いたヨルダンにおける交通事故物語の探索 Exploring Traffic Crash Narratives in Jordan Using Text Mining Analytics ( http://arxiv.org/abs/2406.09438v1 ) ライセンス: Link先を確認	Shadi Jaradat, Taqwa I. Alhadidi, Huthaifa I. Ashqar, Ahmed Hossain, Mohammed Elhenawy,	(参考訳) 本研究は,テキストマイニング分析による交通安全政策の効果的な情報提供と強化を目的として,交通事故の物語を考察する。テキストマイニング技術は、物語の中の主要なテーマや傾向を解明するために使われ、交通事故の原因についてより深く理解することを目的としている。この研究は、2018-2022年の7,587件の記録をカバーしたヨルダンの5つの主要高速道路の事故データを収集した。事故データからパターンを学習するために,教師なし学習法を採用した。トピックモデリング、キーワード抽出、Word Co-Occurrence Networkといったテキストマイニング技術も、クラッシュパターンの共起を明らかにするために使用された。その結果,テキストマイニング分析は有望な手法であり,交通事故の多因子的特性を裏付けるものであることがわかった。すべての分析における繰り返しのテーマは、道路安全に対するバランスのとれたアプローチの必要性を強調し、積極的かつ反応性のある手段を融合させる。動物関連の出来事に関するドライバー教育と認識が重要視される。 This study explores traffic crash narratives in an attempt to inform and enhance effective traffic safety policies using text-mining analytics. Text mining techniques are employed to unravel key themes and trends within the narratives, aiming to provide a deeper understanding of the factors contributing to traffic crashes. This study collected crash data from five major freeways in Jordan that cover narratives of 7,587 records from 2018-2022. An unsupervised learning method was adopted to learn the pattern from crash data. Various text mining techniques, such as topic modeling, keyword extraction, and Word Co-Occurrence Network, were also used to reveal the co-occurrence of crash patterns. Results show that text mining analytics is a promising method and underscore the multifactorial nature of traffic crashes, including intertwining human decisions and vehicular conditions. The recurrent themes across all analyses highlight the need for a balanced approach to road safety, merging both proactive and reactive measures. Emphasis on driver education and awareness around animal-related incidents is paramount.	翻訳日:2024-06-17 17:54:01 公開日:2024-06-11
# 論文へのコメント: 位置: 大規模トラベリングセールスマン問題の解決のためのポストホック検索に基づくニューラルアプローチの再考 Comment on paper: Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems ( http://arxiv.org/abs/2406.09441v1 ) ライセンス: Link先を確認	Yimeng Min,	(参考訳) 我々は,SoftDistの論文(Xia et al.)において,(1)異なるベースラインのすべてのステップを同じハードウェア環境で実行していないこと,(2)他のベースラインとの比較において一貫性のない時間測定を使用していること,の2つの主要な問題を指摘する。これらの問題は誤った結論に繋がる。すべてのステップが同じハードウェア環境で実行される場合、SoftDistの主要な主張はもはや支持されない。 We identify two major issues in the SoftDist paper (Xia et al.): (1) the failure to run all steps of different baselines on the same hardware environment, and (2) the use of inconsistent time measurements when comparing to other baselines. These issues lead to flawed conclusions. When all steps are executed in the same hardware environment, the primary claim made in SoftDist is no longer supported.	翻訳日:2024-06-17 17:54:01 公開日:2024-06-11
# Deep Contextualized Transformer を用いた質問分類 Question Classification with Deep Contextualized Transformer ( http://arxiv.org/abs/1910.10492v3 ) ライセンス: Link先を確認	Haozheng Luo, Ningwei Liu, Charles Feng,	(参考訳) 質問と回答に関する最新の作業は、Stanford Parse Treeを使用することだ。我々は,事前の作業に基づいて,Deep Contextualized Transformerを用いて質問・回答問題に対処する新しい手法を開発し,いくつかの異常表現を管理する。また、SQuADおよびSwDAデータセットの広範囲な評価を行い、産業ニーズのQA問題分類よりも大幅に改善されたことを示す。また,問題解の精度と効率性に対する異なるモデルの影響についても検討する。本手法はより高精度でQA問題の解法に有効であることを示す。 The latest work for Question and Answer problems is to use the Stanford Parse Tree. We build on prior work and develop a new method to handle the Question and Answer problem with the Deep Contextualized Transformer to manage some aberrant expressions. We also conduct extensive evaluations on the SQuAD and SwDA datasets and show significant improvement over QA problem classification of industry needs. We also investigate the impact of different models for the accuracy and efficiency of the problem answers. It shows that our new method is more effective for solving QA problems with higher accuracy.	翻訳日:2024-06-16 18:08:02 公開日:2024-06-11
# 単語埋め込みメソッドは安定しているか、それに気を配るべきか? Are Word Embedding Methods Stable and Should We Care About It? ( http://arxiv.org/abs/2104.08433v2 ) ライセンス: Link先を確認	Angana Borah, Manash Pratim Barman, Amit Awekar,	(参考訳) 表現学習法は、複数の実行で与えられたデータの類似した表現を一貫して生成している場合、安定であると考えられる。 Word Embedding Methods (WEM) は、与えられたテキストデータ中の各単語に対して密度の高いベクトル表現を生成する表現学習のクラスである。本研究の中心となる考え方は,単語の類似性に基づく内在的評価を用いたWEMの安定性の測定である。我々は、Word2Vec、GloVe、fastTextの3つの人気のあるWEMを実験した。安定度測定には,これらのモデルのトレーニングに係わる5つのパラメータの影響について検討する。われわれは、ウィキペディア、ニュース、ソング歌詞、欧州議会の議事録の4つの実世界のデータセットを用いて実験を行う。また、WEM安定性が3つの下流タスク(クラスタリング、POSタグ付け、フェアネス評価)に与える影響を観察した。我々の実験は、3つのWEMの中で、fastTextが最も安定しており、GloVeとWord2Vecが続くことを示している。 A representation learning method is considered stable if it consistently generates similar representation of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate dense vector representation for each word in the given text data. The central idea of this paper is to explore the stability measurement of WEMs using intrinsic evaluation based on word similarity. We experiment with three popular WEMs: Word2Vec, GloVe, and fastText. For stability measurement, we investigate the effect of five parameters involved in training these models. We perform experiments using four real-world datasets from different domains: Wikipedia, News, Song lyrics, and European parliament proceedings. We also observe the effect of WEM stability on three downstream tasks: Clustering, POS tagging, and Fairness evaluation. Our experiments indicate that amongst the three WEMs, fastText is the most stable, followed by GloVe and Word2Vec.	翻訳日:2024-06-15 02:48:35 公開日:2024-06-11
# ランダムに相互作用するスピンモデルにおける創発的ユニバーサルクエンチダイナミクス Emergent Universal Quench Dynamics in Randomly Interacting Spin Models ( http://arxiv.org/abs/2406.07625v1 ) ライセンス: Link先を確認	Yuchen Li, Tian-Gang Zhou, Ze Wu, Pai Peng, Shengyu Zhang, Riqiang Fu, Ren Zhang, Wei Zheng, Pengfei Zhang, Hui Zhai, Xinhua Peng, Jiangfeng Du,	(参考訳) 普遍性はしばしば、その微妙な複雑さと多様性にもかかわらず、量子多体系の低エネルギー平衡物理学に現れる。近年、量子多体系の遠方平衡力学の研究への関心が高まっている。このような力学は、通常、伝統的な低エネルギー理論の記述を超える非常に励起的な状態を含む。このような非平衡力学において普遍的挙動がもたらされるかどうかは、量子力学のフロンティアにおける中心的な問題である。本稿では、ランダムに相互作用するスピンのアンサンブルによって記述された固体NMR系におけるスピン脱分極過程を監視することにより、普遍力学の実験的観察を報告する。スピン脱分極は、高温における時間的スピン-スピン相関関数と関連付けられる。これらの相関関数が普遍関数形式に従うという驚くべき現象を発見した。この実験的な事実は、この普遍性につながるスピン脱分極力学における支配的な相互作用過程を特定するのに役立つ。我々の観測は、高温における非平衡力学においても普遍性の存在を示し、低エネルギー物理学において確立された普遍性を補完するものである。 Universality often emerges in low-energy equilibrium physics of quantum many-body systems, despite their microscopic complexity and variety. Recently, there has been a growing interest in studying far-from-equilibrium dynamics of quantum many-body systems. Such dynamics usually involves highly excited states beyond the traditional low-energy theory description. Whether universal behaviors can also emerge in such non-equilibrium dynamics is a central issue at the frontier of quantum dynamics. Here we report the experimental observation of universal dynamics by monitoring the spin depolarization process in a solid-state NMR system described by an ensemble of randomly interacting spins. The spin depolarization can be related to temporal spin-spin correlation functions at high temperatures. We discover a remarkable phenomenon that these correlation functions obey a universal functional form. This experimental fact helps us identify the dominant interacting processes in the spin depolarization dynamics that lead to this universality. 
Our observation demonstrates the existence of universality even in non-equilibrium dynamics at high temperatures, thereby complementing the well-established universality in low-energy physics.	翻訳日:2024-06-14 22:37:00 公開日:2024-06-11
# SAADを用いた自動車システムの異常検出の高速化:統計的異常検出 Enhanced Anomaly Detection in Automotive Systems Using SAAD: Statistical Aggregated Anomaly Detection ( http://arxiv.org/abs/2406.08516v1 ) ライセンス: Link先を確認	Dacian Goina, Eduard Hogea, George Maties,	(参考訳) 本稿では,SAADと呼ばれる新しい異常検出手法を提案する。 SAADアプローチは、高度な統計技術と機械学習を統合し、その有効性は、自動車領域内のハードウェア・イン・ザ・ループ(HIL)環境からの実センサデータを検証することによって実証される。 SAADの重要な革新は、ドロップアウト層によって強化されたFCN(Fully Connected Networks)と組み合わせることで、異常検出の精度と堅牢性を大幅に向上する能力である。総合的な実験的評価では、スタンドアロン統計手法は72.1%の精度を達成し、ディープラーニングモデルは71.5%の精度を達成している。対照的に、集約された手法は88.3%の精度、F1スコア0.921の精度を実現し、個々のモデルよりも優れている。これらの結果はSAADの有効性を浮き彫りにし、自動車システムを含む様々な分野への応用の可能性を示している。 This paper presents a novel anomaly detection methodology termed Statistical Aggregated Anomaly Detection (SAAD). The SAAD approach integrates advanced statistical techniques with machine learning, and its efficacy is demonstrated through validation on real sensor data from a Hardware-in-the-Loop (HIL) environment within the automotive domain. The key innovation of SAAD lies in its ability to significantly enhance the accuracy and robustness of anomaly detection when combined with Fully Connected Networks (FCNs) augmented by dropout layers. Comprehensive experimental evaluations indicate that the standalone statistical method achieves an accuracy of 72.1%, whereas the deep learning model alone attains an accuracy of 71.5%. In contrast, the aggregated method achieves a superior accuracy of 88.3% and an F1 score of 0.921, thereby outperforming the individual models. These results underscore the effectiveness of SAAD, demonstrating its potential for broad application in various domains, including automotive systems.	翻訳日:2024-06-14 22:37:00 公開日:2024-06-11
# アラビア語学習支援のための質問応答(QA)モデル Question-Answering (QA) Model for a Personalized Learning Assistant for Arabic Language ( http://arxiv.org/abs/2406.08519v1 ) ライセンス: Link先を確認	Mohammad Sammoudi, Ahmad Habaybeh, Huthaifa I. Ashqar, Mohammed Elhenawy,	(参考訳) 本稿では,アラビア語用にカスタマイズされたBERTトランスフォーマーを用いたパーソナライズされた学習アシスタントのための質問応答モデルの作成,最適化,評価について述べる。このモデルは特にパレスチナのカリキュラムの科学教科書に微調整された。私たちのアプローチでは、理科教育の分野における質問に対する正しい回答を自動的に生成するためにBERTの素晴らしい能力を使用します。このモデルは、パレスチナのカリキュラムで11年生と12年生の生物学の本を用いて微調整することで、関連する情報を理解し、抽出する能力を向上させる。これにより、啓蒙応答の生成におけるモデルの有効性が向上する。 Exact Match(EM)とF1スコアは、モデルのパフォーマンスを評価するために使用され、結果は、EMスコアが20%、F1スコアが51%である。これらの結果は、このモデルがパレスチナの科学書の文脈で質問を理解し、反応することができることを示している。この結果は、アラビア語の学生の質問を学習し理解するためのBERTベースのQAモデルの可能性を示している。 This paper describes the creation, optimization, and assessment of a question-answering (QA) model for a personalized learning assistant that uses BERT transformers customized for the Arabic language. The model was particularly finetuned on science textbooks in Palestinian curriculum. Our approach uses BERT's brilliant capabilities to automatically produce correct answers to questions in the field of science education. The model's ability to understand and extract pertinent information is improved by finetuning it using 11th and 12th grade biology book in Palestinian curriculum. This increases the model's efficacy in producing enlightening responses. Exact match (EM) and F1 score metrics are used to assess the model's performance; the results show an EM score of 20% and an F1 score of 51%. These findings show that the model can comprehend and react to questions in the context of Palestinian science book. The results demonstrate the potential of BERT-based QA models to support learning and understanding Arabic students questions.	翻訳日:2024-06-14 22:37:00 公開日:2024-06-11
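The Exact Match and F1 figures quoted in this entry are the usual extractive-QA metrics. The following is a minimal sketch of how such scores are commonly computed per question, assuming a simple lowercase-and-whitespace normalization; the function names and the normalization are illustrative assumptions, not the authors' evaluation script, and Arabic text would normally need a language-specific normalizer.

```python
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed normalization)."""
    return text.lower().split()


def exact_match(prediction, reference):
    """Return 1.0 when the normalized token sequences are identical, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def token_f1(prediction, reference):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred, ref = normalize(prediction), normalize(reference)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A prediction that contains the reference plus extra words gets an EM of 0 but a non-zero F1, which is one reason F1 can sit well above EM on the same test set.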
# NLP技術を用いたアラビア語の理科テストのための自動質問生成 Automated Question Generation for Science Tests in Arabic Language Using NLP Techniques ( http://arxiv.org/abs/2406.08520v1 ) ライセンス: Link先を確認	Mohammad Tami, Huthaifa I. Ashqar, Mohammed Elhenawy,	(参考訳) 教育評価のための質問生成は、教育に応用される人工知能における成長分野である。これらの質問生成ツールは、インテリジェント・チュータリングシステムや対話型プラットフォームなど、教育技術分野において重要な役割を担っている。明確な答えを必要とする評価質問の自動生成は、通常、宣言文内の構文的および意味的な手がかりに依存し、宣言文が質問に変換される。最近の研究は、アラビア語における教育評価用質問の生成を探求している。報告された性能は、文解析の不正確さ、固有表現認識の問題、ルールベースの質問変換に起因する誤りなど、固有の誤りによって悪影響を受けている。さらに、長大なアラビア語文の複雑さがこれらの課題に寄与している。本研究は,キーワードとキーフレーズ抽出,質問生成,その後のランク付けという3段階のプロセスに基づいて,アラビア語の革新的な質問生成システムを提案する。本研究の目的は,アラビア語における評価質問の自動生成に関わる課題に対処することである。提案手法は適合率83.50%,再現率78.68%,F1スコア80.95%を達成し,フレームワークの高い効率性を示した。人的評価によりモデルの有効性が確認され、平均評価は84%となった。 Question generation for education assessments is a growing field within artificial intelligence applied to education. These question-generation tools have significant importance in the educational technology domain, such as intelligent tutoring systems and dialogue-based platforms. The automatic generation of assessment questions, which entail clear-cut answers, usually relies on syntactical and semantic indications within declarative sentences, which are then transformed into questions. Recent research has explored the generation of assessment educational questions in Arabic. The reported performance has been adversely affected by inherent errors, including sentence parsing inaccuracies, named entity recognition issues, and errors stemming from rule-based question transformation. Furthermore, the complexity of lengthy Arabic sentences has contributed to these challenges. This research presents an innovative Arabic question-generation system built upon a three-stage process: keywords and key phrases extraction, question generation, and subsequent ranking. The aim is to tackle the difficulties associated with automatically generating assessment questions in the Arabic language. 
The proposed approach achieves a precision of 83.50%, a recall of 78.68%, and an F1 score of 80.95%, indicating the framework's high efficiency. Human evaluation further confirmed the model's efficiency, with an average rating of 84%.	翻訳日:2024-06-14 22:37:00 公開日:2024-06-11
# 汎扁平上皮癌における埋め込みベースのマルチモーダル学習による生存予後の改善 Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes ( http://arxiv.org/abs/2406.08521v1 ) ライセンス: Link先を確認	Asim Waqas, Aakash Tripathi, Paul Stewart, Mia Naeini, Ghulam Rasool,	(参考訳) がんクリニックは、遺伝子から臓器レベルまで、さまざまなスケールで疾患データをキャプチャする。現在のバイオインフォマティクス法は、このデータの不均一な性質、特に欠落したモダリティを扱うのに苦労している。本稿では,マルチモーダルな異種データセットから学習し,臨床結果の予測を改善するグラフニューラルネットワーク(GNN)フレームワークであるPARADIGMを提案する。 PARADIGMは、基礎モデルを使用してマルチ解像度データから埋め込みを生成し、それらを患者レベルの表現に集約し、それらを統一されたグラフに融合し、生存分析のようなタスクのパフォーマンスを向上させる。汎扁平上皮癌(pan-SCC)のデータでGNNを訓練し,Moffitt Cancer Centerの肺SCCデータでアプローチを検証した。マルチモーダルGNNは、患者生存予測において他のモデルより優れている。さまざまなスケールにわたる個々のデータモダリティを統合することで、より洞察に富んだ病気の見方が得られる。我々のソリューションは、患者の状況を包括的に理解することを目的としており、異種データ統合と、できるだけ多くのデータビューを統合する利点についての洞察を提供する。 Cancer clinics capture disease data at various scales, from genetic to organ level. Current bioinformatic methods struggle to handle the heterogeneous nature of this data, especially with missing modalities. We propose PARADIGM, a Graph Neural Network (GNN) framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction. PARADIGM generates embeddings from multi-resolution data using foundation models, aggregates them into patient-level representations, fuses them into a unified graph, and enhances performance for tasks like survival analysis. We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data. Multimodal GNN outperforms other models in patient survival prediction. Converging individual data modalities across varying scales provides a more insightful disease view. Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.	翻訳日:2024-06-14 22:37:00 公開日:2024-06-11
# ExioML:グローバルセクタサステナビリティにおける機械学習のためのエコエコノミクスデータセット ExioML: Eco-economic dataset for Machine Learning in Global Sectoral Sustainability ( http://arxiv.org/abs/2406.09046v1 ) ライセンス: Link先を確認	Yanming Guo, Jin Ma,	(参考訳) 環境拡張多段階インプット・アウトプット分析は、経済活動の環境影響を評価するための生態経済学の主要な枠組みである。本稿では,持続可能性分析のための最初の機械学習ベンチマークデータセットであるExioMLを紹介する。セクターサステナビリティを評価し,データセットのユーザビリティを実証するために,温室効果ガスのレグレッションタスクを実施した。従来の浅層モデルと深層学習モデルを比較し,因子会計表を多用し,分類的・数値的特徴を取り入れた。この結果から,ExioMLはユーザビリティが高く,深層およびアンサンブルモデルによる平均二乗誤差の低減を可能にし,将来の機械学習研究のベースラインを確立した。 ExioMLを通じて、さまざまな機械学習アプリケーションをサポートする基盤データセットを構築し、気候変動対策と持続可能な投資決定を促進することを目指している。 The Environmental Extended Multi-Regional Input-Output analysis is the predominant framework in Ecological Economics for assessing the environmental impact of economic activities. This paper introduces ExioML, the first Machine Learning benchmark dataset designed for sustainability analysis, aimed at lowering barriers and fostering collaboration between Machine Learning and Ecological Economics research. A crucial greenhouse gas emission regression task was conducted to evaluate sectoral sustainability and demonstrate the usability of the dataset. We compared the performance of traditional shallow models with deep learning models, utilizing a diverse Factor Accounting table and incorporating various categorical and numerical features. Our findings reveal that ExioML, with its high usability, enables deep and ensemble models to achieve low mean square errors, establishing a baseline for future Machine Learning research. Through ExioML, we aim to build a foundational dataset supporting various Machine Learning applications and promote climate actions and sustainable investment decisions.	翻訳日:2024-06-14 18:05:18 公開日:2024-06-11
# フィルタされた2モードスクイーズ混合状態における絡み合い, スクイーズおよび非局所性 Entanglement, Squeezing and non-Locality in Filtered Two-Mode Squeezed Mixed States ( http://arxiv.org/abs/2406.09134v1 ) ライセンス: Link先を確認	Souvik Agasti,	(参考訳) 連続可変2モード圧縮混合状態のスペクトル成分間の絡み合いと非局所性について検討し,その限界を同定した。これらのスペクトル成分は、光学系でよく用いられるフィルタを用いて出力モードから選択される。絡み合いと非局所性は、フィルタが同一であるときにピークに達する。しかし、非同一性フィルタを適用しながら入力スキューズする度合いを増大させると、絡み合いと非局所性の両方が乱れ、ベル状のパターンが生まれる。さらに、絡み合いと非局所性のための正確な境界を提供する。さらに,2モードのハイブリッド二次体のスケズングを絡み合いの尺度として評価し,対数ネガティビティとどのように類似しているかを示した。このフィルタと組み合わせて、2モードの加圧熱光の集団は、最大で加圧されたハイブリッド二次構造の角度に影響を及ぼす。 We investigate the entanglement and non-locality between specific spectral components of continuous variable two-mode squeezed mixed states, identifying their limits. These spectral components are selected from output modes using filters commonly employed in optomechanical systems. Both entanglement and non-locality reach their peak when the filters are identical. However, increasing the degree of input squeezing while applying non-identical filters disrupts both entanglement and non-locality, leading to a bell-shaped pattern. Additionally, we provide precise boundaries for entanglement and non-locality. Furthermore, we also evaluate the squeezing of two-mode hybrid quadrature as a measure of entanglement, thereby demonstrating how it remains analogous to logarithmic negativity. Combined with the filter, the population of two-mode squeezed thermal light influences the angle of a maximally squeezed hybrid quadrature.	翻訳日:2024-06-14 17:34:25 公開日:2024-06-11
# 状態準備とユニタリ合成におけるTゲートとダーティ量子ビットのトレードオフ Trading T gates for dirty qubits in state preparation and unitary synthesis ( http://arxiv.org/abs/1812.00954v2 ) ライセンス: Link先を確認	Guang Hao Low, Vadym Kliuchnikov, Luke Schaeffer,	(参考訳) 普遍的なフォールトトレラントゲートセット(例: Clifford+T)からの任意の量子状態とユニタリの効率的な合成は、量子計算における重要なサブルーチンである。大規模な量子アルゴリズムは、コヒーレントな量子情報を符号化しながら計算の一部でアイドル状態にある多くの量子ビットを持つため、全体のゲート数、特に高価なTゲートの数を最小化できるなら、これらを利用すべきである。我々は、$N$個の古典的な数値のリストで指定された任意の$N$次元の純粋量子状態を準備し、空間とTゲートの間のトレードオフを実現する量子アルゴリズムを提案する。我々のスキームは、$\mathcal{O}(\log{(N/\epsilon)})$ clean qubitsと、調整可能な$\sim(\lambda\log{(\frac{\log{N}}{\epsilon})})$ dirty qubitsを使って、Tゲートコストを$\mathcal{O}(\frac{N}{\lambda}+\lambda\log{\frac{N}{\epsilon}}\log{\frac{\log{N}}{\epsilon}})$に下げる。このトレードオフは、無条件のゲート数下界によって証明されるとおり対数因子を除いて最適であり、最良の場合、補助量子ビットを用いない従来のアプローチよりもTカウントを二次的に改善する。状態準備への還元によるユニタリ合成についても同様のことを証明する。我々の構成の基礎となるのは、任意の古典的データに対する量子オラクルのT効率の良い回路実装である。 Efficient synthesis of arbitrary quantum states and unitaries from a universal fault-tolerant gate-set, e.g. Clifford+T, is a key subroutine in quantum computation. As large quantum algorithms feature many qubits that encode coherent quantum information but remain idle for parts of the computation, these should be used if it minimizes overall gate counts, especially that of the expensive T-gates. We present a quantum algorithm for preparing any dimension-$N$ pure quantum state specified by a list of $N$ classical numbers, that realizes a trade-off between space and T-gates. Our scheme uses $\mathcal{O}(\log{(N/\epsilon)})$ clean qubits and a tunable number of $\sim(\lambda\log{(\frac{\log{N}}{\epsilon})})$ dirty qubits, to reduce the T-gate cost to $\mathcal{O}(\frac{N}{\lambda}+\lambda\log{\frac{N}{\epsilon}}\log{\frac{\log{N}}{\epsilon}})$. This trade-off is optimal up to logarithmic factors, proven through an unconditional gate counting lower bound, and is, in the best case, a quadratic improvement in T-count over prior ancillary-free approaches. We prove similar statements for unitary synthesis by reduction to state preparation. 
Underlying our constructions is a T-efficient circuit implementation of a quantum oracle for arbitrary classical data.	翻訳日:2024-06-14 02:02:19 公開日:2024-06-11
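The "quadratic improvement in the best case" follows from balancing the two terms of the stated T-gate cost. The short derivation below is a sketch that ignores the constants hidden in the $\mathcal{O}$ notation and treats $\lambda$ as a continuous parameter; the shorthand $L$ and the function $f$ are introduced here for illustration and do not appear in the abstract.

```latex
\begin{align*}
f(\lambda) &= \frac{N}{\lambda} + \lambda L,
  \qquad L := \log\frac{N}{\epsilon}\,\log\frac{\log N}{\epsilon},\\
f'(\lambda) &= -\frac{N}{\lambda^{2}} + L = 0
  \;\Longrightarrow\; \lambda^{\star} = \sqrt{N/L},\\
f(\lambda^{\star}) &= 2\sqrt{N L}.
\end{align*}
```

With $\lambda = \mathcal{O}(1)$ the cost scales as $N$ up to logarithmic factors, whereas at $\lambda^{\star}$ it scales as $\sqrt{N}$ up to logarithmic factors, at the price of roughly $\lambda^{\star}\log(\log N/\epsilon)$ dirty qubits.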
# コンベックスゲームにおける平衡予測学習のための演算子分割 Operator Splitting for Learning to Predict Equilibria in Convex Games ( http://arxiv.org/abs/2106.00906v4 ) ライセンス: Link先を確認	Daniel McKenzie, Howard Heaton, Qiuwei Li, Samy Wu Fung, Stanley Osher, Wotao Yin,	(参考訳) 競合するエージェントのシステムは、しばしばゲームとしてモデル化される。合理性を仮定すると、最も可能性の高い結果は平衡(例えばナッシュ平衡)によって与えられる。多くの実践的な環境では、ゲームは文脈、すなわちどのエージェントにも制御できない追加データ(例えば交通における天気や市場経済における財政政策)に影響を受けている。正確なゲーム力学は分かっていないことが多いが、(コンテキスト、平衡)ペアからなる膨大な歴史的データが利用可能であり、文脈のみから平衡を予測する解法を学習できる可能性がある。平衡を自然に出力するニューラルネットワークのクラスであるNash Fixed Point Networks (N-FPNs)を紹介する。重要なことに、N-FPNは複雑なエージェントアクションセットを扱うために、高価なプロジェクションを避けながら制約デカップリング方式を採用している。経験的に、N-FPNは暗黙のネットワークをトレーニングするための最近開発されたヤコビアンフリーバックプロパゲーション技術と互換性があり、従来のモデルよりもはるかに高速で訓練が容易である。実験の結果,N-FPNは既存の学習ゲーム解法よりも桁違いに大きい問題にスケール可能であることがわかった。 Systems of competing agents can often be modeled as games. Assuming rationality, the most likely outcomes are given by an equilibrium (e.g. a Nash equilibrium). In many practical settings, games are influenced by context, i.e. additional data beyond the control of any agent (e.g. weather for traffic and fiscal policy for market economies). Often the exact game mechanics are unknown, yet vast amounts of historical data consisting of (context, equilibrium) pairs are available, raising the possibility of learning a solver which predicts the equilibria given only the context. We introduce Nash Fixed Point Networks (N-FPNs), a class of neural networks that naturally output equilibria. Crucially, N-FPNs employ a constraint decoupling scheme to handle complicated agent action sets while avoiding expensive projections. Empirically, we find N-FPNs are compatible with the recently developed Jacobian-Free Backpropagation technique for training implicit networks, making them significantly faster and easier to train than prior models. Our experiments show N-FPNs are capable of scaling to problems orders of magnitude larger than existing learned game solvers.	翻訳日:2024-06-14 02:02:19 公開日:2024-06-11
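To make the idea of "an equilibrium as a fixed point that depends on context" concrete, here is a small sketch using a plain projected-gradient iteration on a two-firm Cournot game. This is a textbook stand-in chosen for illustration: it is not the N-FPN architecture, it contains no learning, and it uses an explicit box projection rather than the constraint decoupling scheme described in the abstract. The game, the function names, and the step size are all assumptions.

```python
def project_box(q, lo, hi):
    """Clip each coordinate to [lo, hi] (the feasible action set of each firm)."""
    return [min(max(v, lo), hi) for v in q]


def game_gradient(q, context):
    """Gradient of each firm's cost with respect to its own quantity.

    Firm i earns q_i * (context - q_1 - q_2), where `context` stands for
    the price intercept minus the unit cost. Its cost is the negative
    profit, so the partial derivative is 2*q_i + q_j - context.
    """
    q1, q2 = q
    return [2 * q1 + q2 - context, 2 * q2 + q1 - context]


def solve_equilibrium(context, step=0.2, iters=200, cap=10.0):
    """Iterate q <- P(q - step * F(q)); a fixed point is a Nash equilibrium."""
    q = [0.0, 0.0]
    for _ in range(iters):
        grad = game_gradient(q, context)
        q = project_box([qi - step * gi for qi, gi in zip(q, grad)], 0.0, cap)
    return q
```

For this game the interior equilibrium is `context / 3` for each firm, so `solve_equilibrium(3.0)` converges to quantities of 1 each, and a negative context pins both quantities at the lower bound. A learned solver in the spirit of the abstract would replace the hand-written `game_gradient` with a trainable operator fitted to (context, equilibrium) pairs.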
# ダイヤモンドの量子欠陥を用いた集積回路活動の三次元イメージング Three-dimensional imaging of integrated-circuit activity using quantum defects in diamond ( http://arxiv.org/abs/2112.12242v2 ) ライセンス: Link先を確認	Marwa Garsi, Rainer Stöhr, Andrej Denisenko, Farida Shagieva, Nils Trautmann, Ulrich Vogl, Badou Sene, Florian Kaiser, Andrea Zappe, Rolf Reuter, Jörg Wrachtrup,	(参考訳) 半導体ベースの技術のミクロンおよびサブミクロンレギュレーションへの継続的なスケーリングにより、デバイス密度は高く、消費電力も低くなった。このようなスケールでは、自己加熱や電流漏れなどの多くの物理的現象が重要となり、これらの特徴を明らかにするために現在の密度をマッピングすることは、現代のエレクトロニクスの発展にとって決定的なことである。しかし、高度な非侵襲技術は、感度が低く、空間分解能が悪く、2次元の空間マッピングに限られる。ここでは, ダイヤモンド中の窒素空孔近傍のセンターを用いて, 多層集積回路内を電流流によって生成したOersted場を予備開発時に探究する。本研究では,電流密度の3次元成分を約$\approx 10 \,\rm \mu A / \mu m^2$,室温でのサブミクロン空間分解能で再現した。また、異なる層内の電流の局在を報告し、電子チップ内の異常な電流の流れを観察する。そこで本手法は,ナノスケール半導体チップの3次元電流マッピングに向けた決定的なステップを提供する。 The continuous scaling of semiconductor-based technologies to micron and sub-micron regimes has resulted in higher device density and lower power dissipation. Many physical phenomena such as self-heating or current leakage become significant at such scales, and mapping current densities to reveal these features is decisive for the development of modern electronics. However, advanced non-invasive technologies either offer low sensitivity or poor spatial resolution and are limited to two-dimensional spatial mapping. Here we use near-surface nitrogen-vacancy centres in diamond to probe Oersted fields created by current flowing within a multi-layered integrated circuit in pre-development. We show the reconstruction of the three-dimensional components of the current density with a magnitude down to about $\approx 10 \,\rm \mu A / \mu m^2$ and sub-micron spatial resolution at room temperature. We also report the localisation of currents in different layers and observe anomalous current flow in an electronic chip. Our method provides, therefore a decisive step toward three-dimensional current mapping in technologically relevant nanoscale electronics chips.	翻訳日:2024-06-14 02:02:19 公開日:2024-06-11
# 顔誤認識システム:単純な重み操作でDNNを特定の人物に対してのみ誤らせる Facial Misrecognition Systems: Simple Weight Manipulations Force DNNs to Err Only on Specific Persons ( http://arxiv.org/abs/2301.03118v2 ) ライセンス: Link先を確認	Irad Zehavi, Roee Nitzan, Adi Shamir,	(参考訳) 本稿では,ディープシャム(Siamese)ニューラルネットワークという一般的なアーキテクチャに基づくあらゆる顔認識モデルに,新しい種類のバックドアを植える方法について述べる。これらのバックドアは、攻撃者が事前に選択した特定の人物の自然な画像に対してのみ、その外見を制御したりトリガーを挿入したりすることなく、システムに誤りを起こさせる。例えば、そのようなバックドアシステムが、特定の人物の任意の2枚の画像を別人として、あるいは特定の2人組の任意の2枚の画像を同一人物として分類し、他の人物に対する判定の正しさにはほとんど影響を与えないことを示す。驚くべきことに、モデルの最後の重み行列に線形変換を適用することで、バックドア対象の人物の画像のみを用いて、追加のトレーニングや最適化を行わずに、両方の種類のバックドアを実装できる。我々の攻撃の特徴は、互いの存在を知らない複数の攻撃者が、ほとんど干渉なく複数のバックドアを同一モデルに独立して設置できることである。我々は,SOTA顔認識システムに対する攻撃を実験的に検証した。 10人の有名人を個別に匿名化しようとしたところ、ネットワークは97.02%から98.31%の割合で、彼らの2枚の画像が同一人物であると認識できなかった。例えば、外見が大きく異なるモーガン・フリーマンとスカーレット・ヨハンソンを混同させようとしたところ、彼らの画像は98.47%の割合で同一人物と判定された。バックドアの種類ごとに、互いの性能への影響を最小限に抑えつつ複数のバックドアを順次設置した(例えば、同じモデルで有名人10人全員を匿名化しても、各有名人に対する成功率の低下は1.01%以下であった)。すべての実験において、他の人物に対するネットワークの良性精度はほとんど低下しなかった(ほとんどの場合、低下は0.05%未満であった)。 In this paper, we describe how to plant novel types of backdoors in any facial recognition model based on the popular architecture of deep Siamese neural networks. These backdoors force the system to err only on natural images of specific persons who are preselected by the attacker, without controlling their appearance or inserting any triggers. For example, we show how such a backdoored system can classify any two images of a particular person as different people, or any two images of a particular pair of persons as the same person, with almost no effect on the correctness of its decisions for other persons. Surprisingly, we show that both types of backdoors can be implemented by applying linear transformations to the model's last weight matrix, with no additional training or optimization, using only images of the backdoor identities. 
A unique property of our attack is that multiple backdoors can be independently installed in the same model by multiple attackers, who may not be aware of each other's existence, with almost no interference. We have experimentally verified the attacks on a SOTA facial recognition system. When we tried to individually anonymize ten celebrities, the network failed to recognize two of their images as being the same person $97.02\%$ to $98.31\%$ of the time. When we tried to make the network confuse the extremely different-looking Morgan Freeman and Scarlett Johansson, for example, their images were declared to be the same person $98.47\%$ of the time. For each type of backdoor, we sequentially installed multiple backdoors with minimal effect on the performance of each other (for example, anonymizing all ten celebrities on the same model reduced the success rate for each celebrity by no more than $1.01\%$). In all of our experiments, the benign accuracy of the network on other persons barely degraded (in most cases, it degraded by less than $0.05\%$).	翻訳日:2024-06-14 01:52:33 公開日:2024-06-11
# 指数型家族雑音を用いたグラフラプラシアン学習 Graph Laplacian Learning with Exponential Family Noise ( http://arxiv.org/abs/2306.08201v2 ) ライセンス: Link先を確認	Changhao Shi, Gal Mishne,	(参考訳) グラフ信号処理(GSP)は、非ユークリッド領域の信号を分析するための重要なフレームワークである。グラフフーリエ変換(GFT)は、組合せグラフラプラシア行列を用いて、グラフ周波数領域における信号のスペクトル分解を明らかにする。しかし、GSP法の適用における一般的な課題は、多くのシナリオにおいてシステムの基盤となるグラフが不明であることである。そのような場合の解決策は、一般にグラフまたはネットワーク推論と呼ばれる、利用可能なデータから観測されていないグラフを構築することである。異なるグラフ推論法が存在するが、これらは滑らかなグラフ信号または単純な加法的ガウスノイズから学ぶことに限定されている。離散数や二進数といった他のノイズの多いデータは、現実のアプリケーションではよく見られるが、グラフ推論では過小評価されている。本稿では,指数関数的ファミリーノイズによって劣化したグラフ信号から学習する汎用グラフ推論フレームワークを提案する。本フレームワークは,連続的なスムーズなグラフ信号から様々なデータタイプまで,従来の手法を一般化する。雑音信号からラプラシアングラフと保存されない滑らかな表現を共同で推定する交互アルゴリズムを提案する。また、我々のアプローチを変分形式に拡張し、潜在滑らかな表現の固有の確率性を考慮した。最後に、実世界のグラフ信号はしばしば非独立で時間的に相関しているので、元の設定を時間頂点の定式化に適応させる。ノイズモデルミスマッチに苦しむ競合するラプラシアン推定法より優れた合成および実世界のデータを示す。 Graph signal processing (GSP) is a prominent framework for analyzing signals on non-Euclidean domains. The graph Fourier transform (GFT) uses the combinatorial graph Laplacian matrix to reveal the spectral decomposition of signals in the graph frequency domain. However, a common challenge in applying GSP methods is that in many scenarios the underlying graph of a system is unknown. A solution in such cases is to construct the unobserved graph from available data, which is commonly referred to as graph or network inference. Although different graph inference methods exist, these are restricted to learning from either smooth graph signals or simple additive Gaussian noise. Other types of noisy data, such as discrete counts or binary digits, are rather common in real-world applications, yet are underexplored in graph inference. In this paper, we propose a versatile graph inference framework for learning from graph signals corrupted by exponential family noise. Our framework generalizes previous methods from continuous smooth graph signals to various data types. 
We propose an alternating algorithm that jointly estimates the graph Laplacian and the unobserved smooth representation from the noisy signals. We also extend our approach to a variational form to account for the inherent stochasticity of the latent smooth representation. Finally, since real-world graph signals are frequently non-independent and temporally correlated, we further adapt our original setting to a time-vertex formulation. We demonstrate on synthetic and real-world data that our new algorithms outperform competing Laplacian estimation methods that suffer from noise model mismatch.	翻訳日:2024-06-14 01:42:49 公開日:2024-06-11
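The combinatorial graph Laplacian mentioned in this entry is $L = D - W$, and the quadratic form $x^\top L x = \sum_{i<j} w_{ij}(x_i - x_j)^2$ is the standard measure of how smooth a signal is on the graph. The sketch below only illustrates that definition in plain Python; it does not implement the paper's estimator for exponential-family noise, and the function names are assumptions.

```python
def combinatorial_laplacian(weights):
    """Build L = D - W from a symmetric weight matrix with zero diagonal."""
    n = len(weights)
    # Off-diagonal entries are the negated edge weights.
    lap = [[-weights[i][j] for j in range(n)] for i in range(n)]
    # Diagonal entries are the weighted degrees.
    for i in range(n):
        lap[i][i] = sum(weights[i][j] for j in range(n) if j != i)
    return lap


def smoothness(lap, signal):
    """Quadratic form x^T L x; small values mean the signal varies little across edges."""
    n = len(signal)
    return sum(signal[i] * lap[i][j] * signal[j]
               for i in range(n) for j in range(n))
```

On a three-node path graph a constant signal has smoothness 0, while the alternating signal `[1, -1, 1]` scores 8, since each of the two edges contributes a squared difference of 4. Graph learning methods of the kind described here typically search for a Laplacian under which the observed (or latent) signals have a small value of this quantity.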
# AViT:小さな皮膚病変セグメンテーションデータセットに対する視覚変換器の適応 AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets ( http://arxiv.org/abs/2307.13897v2 ) ライセンス: Link先を確認	Siyi Du, Nourhan Bayasi, Ghassan Hamarneh, Rafeef Garbi,	(参考訳) 皮膚病変セグメンテーション(SLS)は皮膚病変解析において重要な役割を担っている。視覚トランスフォーマー(ViT)は、SLSにとって有望なソリューションと考えられているが、固有のパラメータ重構造と一部の帰納バイアスの欠如により、畳み込みニューラルネットワーク(CNN)と比較して、より多くのトレーニングデータを必要とする。この問題を軽減するため、現在のアプローチは、事前学習済みのViTバックボーンをSLSデータセット上で微調整することで、より大規模な自然画像から学んだ知識を活用し、必要な皮膚トレーニングデータの量を減らすことを目指している。しかし、大きなバックボーンの全てのパラメータを完全に微調整することは、計算コストが高く、メモリ集約的である。本稿では,任意のトレーニング済みViTをSLSタスクに転送することで,ViTのデータハンガーを緩和する,新しい効率的な戦略であるAViTを提案する。具体的には、トランス層に軽量モジュール(アダプタ)を統合することで、トレーニング済みの重みを更新することなく、ViTの特徴表現を変調する。さらに,浅いCNNをプロンプトジェネレータとして用いて入力画像からプロンプト埋め込みを生成し,細粒度情報とCNNの帰納バイアスを捉えることで,小さなデータセット上でのセグメンテーションタスクを導く。 4つの皮膚病変データセットに関する定量的実験により、AViTはSOTAと同等、時にはそれを上回る性能を、はるかに少ないトレーニング可能なパラメータで達成することが示されている。私たちのコードは https://github.com/siyi-wind/AViT で利用可能です。 Skin lesion segmentation (SLS) plays an important role in skin lesion analysis. Vision transformers (ViTs) are considered an auspicious solution for SLS, but they require more training data compared to convolutional neural networks (CNNs) due to their inherent parameter-heavy structure and lack of some inductive biases. To alleviate this issue, current approaches fine-tune pre-trained ViT backbones on SLS datasets, aiming to leverage the knowledge learned from a larger set of natural images to lower the amount of skin training data needed. However, fully fine-tuning all parameters of large backbones is computationally expensive and memory intensive. In this paper, we propose AViT, a novel efficient strategy to mitigate ViTs' data-hunger by transferring any pre-trained ViTs to the SLS task. Specifically, we integrate lightweight modules (adapters) within the transformer layers, which modulate the feature representation of a ViT without updating its pre-trained weights. 
In addition, we employ a shallow CNN as a prompt generator to create a prompt embedding from the input image, which grasps fine-grained information and CNN's inductive biases to guide the segmentation task on small datasets. Our quantitative experiments on 4 skin lesion datasets demonstrate that AViT achieves competitive, and at times superior, performance to SOTA but with significantly fewer trainable parameters. Our code is available at https://github.com/siyi-wind/AViT.	翻訳日:2024-06-13 23:42:48 公開日:2024-06-11
# ニューラルソースコード要約のための意味的類似性損失 Semantic Similarity Loss for Neural Source Code Summarization ( http://arxiv.org/abs/2308.07429v2 ) ライセンス: Link先を確認	Chia-Yi Su, Collin McMillan,	(参考訳) 本稿では,ニューラルソースコード要約における損失関数として意味的類似度指標を用いる手法とその評価について述べる。コード要約は、ソースコードの自然言語記述を記述するタスクである。ニューラルコード要約(英: Neural code summarization)とは、ニューラルネットワークを用いてこれらの記述を生成する自動化技術である。現在のアプローチのほとんどすべてが、ニューラルネットワークをスタンドアロンモデルとして、あるいはトレーニング済みの大規模言語モデル(例: GPT, Codex, LLaMA)の一部として含んでいる。しかし、そのほとんどすべてが、ネットワーク最適化に分類的クロスエントロピー(CCE)損失関数を使用する。 CCEの2つの問題は 1)全文を評価するのではなく、各単語の予測を1つずつ対象として損失を計算し、 2) 完全な予測が必要であり、同義語に対する部分的信用の余地は残っていない、という点である。本稿では,従来の意味的類似度指標に関する研究を拡張し,これらの課題を軽減するために意味的類似度を損失関数として用いる手法を示し,この手法をメトリクスに基づく研究と人手による研究の両方において複数の設定で評価する。本質的には,各単語の損失だけでなく,学習バッチごとの出力文予測全体の損失を計算するために,意味的類似度尺度を用いることを提案する。また,各単語に対するCCEの損失と組み合わせることで,ベースラインと比較してトレーニングプロセスの合理化を図ることを提案する。我々は,いくつかのベースラインに対するアプローチを評価し,ほとんどの条件で改善を報告した。 This paper presents a procedure for and evaluation of using a semantic similarity metric as a loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural networks as either standalone models or as part of pretrained large language models, e.g., GPT, Codex, LLaMA. Yet almost all also use a categorical cross-entropy (CCE) loss function for network optimization. Two problems with CCE are that 1) it computes loss over each word prediction one-at-a-time, rather than evaluating a whole sentence, and 2) it requires a perfect prediction, leaving no room for partial credit for synonyms. In this paper, we extend our previous work on semantic similarity metrics to show a procedure for using semantic similarity as a loss function to alleviate this problem, and we evaluate this procedure in several settings in both metrics-driven and human studies. 
In essence, we propose to use a semantic similarity metric to calculate loss over the whole output sentence prediction per training batch, rather than just loss for each word. We also propose to combine our loss with CCE for each word, which streamlines the training process compared to baselines. We evaluate our approach over several baselines and report improvement in the vast majority of conditions.	翻訳日:2024-06-13 23:42:48 公開日:2024-06-11
# 完全遺伝性原子性OML Completely hereditarily atomic OMLs ( http://arxiv.org/abs/2308.08508v2 ) ライセンス: Link先を確認	John Harding, Andre Kornell,	(参考訳) 無限高さの既約完全原子型 OML は代数的かつ被覆性を持つことができない。しかし、カルムバッハの構成は代数的で 2-被覆性を持つような OML の例を示し、ケラーの構成は被覆性を持ち、完全に遺伝学的にアトミックであるような OML の例を提供する。完全に遺伝的にアトミックなOMLは、量子述語論理に相応しい代数的OMLを一般化する。 An irreducible complete atomic OML of infinite height cannot both be algebraic and have the covering property. However, Kalmbach's construction provides an example of such an OML that is algebraic and has the 2-covering property, and Keller's construction provides an example of such an OML that has the covering property and is completely hereditarily atomic. Completely hereditarily atomic OMLs generalize algebraic OMLs suitably to quantum predicate logic.	翻訳日:2024-06-13 23:42:48 公開日:2024-06-11
# トランスフォーマーは未知系の最適フィルタリングを学習できるか? Can Transformers Learn Optimal Filtering for Unknown Systems? ( http://arxiv.org/abs/2308.08536v3 ) ライセンス: Link先を確認	Haldun Balim, Zhe Du, Samet Oymak, Necmiye Ozay,	(参考訳) トランスフォーマーモデルは自然言語処理において大きな成功をおさめてきたが、そのポテンシャルは力学系ではほとんど未解明のままである。本研究では,過去のすべての出力を用いて出力予測を生成するトランスフォーマーによる最適出力推定問題について検討する。特に,様々な異なるシステムを用いてトランスフォーマーを訓練し,未知のダイナミクスを持つ未見のシステムでの性能を評価する。経験的に、訓練されたトランスフォーマーは異なる未見の系に非常によく適応し、線形系に対してカルマンフィルタが与える最適性能にさえ匹敵する。非i.i.d.ノイズ、時変ダイナミクス、未知のパラメータを持つクアッドロータ系のような非線形ダイナミクスといったより複雑な設定でも、トランスフォーマーは有望な結果を示す。実験結果を裏付けるため,トランスフォーマーが所望の超過リスクを達成するのに必要なトレーニングデータの量を定量化する統計的保証を提供する。最後に,性能低下につながる2つの問題のクラスを特定することでいくつかの限界を指摘し,制御と推定にトランスフォーマーを使用する際には注意が必要であることを強調する。 Transformer models have shown great success in natural language processing; however, their potential remains mostly unexplored for dynamical systems. In this work, we investigate the optimal output estimation problem using transformers, which generate output predictions using all the past ones. Particularly, we train the transformer using various distinct systems and then evaluate the performance on unseen systems with unknown dynamics. Empirically, the trained transformer adapts exceedingly well to different unseen systems and even matches the optimal performance given by the Kalman filter for linear systems. In more complex settings with non-i.i.d. noise, time-varying dynamics, and nonlinear dynamics like a quadrotor system with unknown parameters, transformers also demonstrate promising results. To support our experimental findings, we provide statistical guarantees that quantify the amount of training data required for the transformer to achieve a desired excess risk. Finally, we point out some limitations by identifying two classes of problems that lead to degraded performance, highlighting the need for caution when using transformers for control and estimation.	翻訳日:2024-06-13 23:42:48 公開日:2024-06-11
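The Kalman filter is the baseline the abstract compares against for linear systems. As a reference point, the following is a minimal scalar Kalman filter, assuming a known model $x_{t+1} = a x_t + w_t$, $y_t = c x_t + v_t$ with noise variances `q` and `r`. The parameter names and the scalar setting are assumptions made for brevity, and nothing here reflects the paper's experimental setup.

```python
def kalman_filter_1d(observations, a, c, q, r, x0=0.0, p0=1.0):
    """Return the filtered state estimates for a scalar linear-Gaussian model."""
    x, p = x0, p0  # current state estimate and its error variance
    estimates = []
    for y in observations:
        # Predict: propagate the estimate and its variance through the dynamics.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update: blend the prediction with the new measurement via the gain k.
        k = p_pred * c / (c * p_pred * c + r)
        x = x_pred + k * (y - c * x_pred)
        p = (1 - k * c) * p_pred
        estimates.append(x)
    return estimates
```

The filter needs `a`, `c`, `q`, and `r` to be known in advance, which is exactly the information the abstract's setting withholds: there, a single trained model is expected to adapt to an unseen system from its past outputs alone.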
# 分類におけるスプーラス相関の測定--英訳における「クレバーハンズ」について Measuring Spurious Correlation in Classification: 'Clever Hans' in Translationese ( http://arxiv.org/abs/2308.13170v2 ) ライセンス: Link先を確認	Angana Borah, Daria Pylypenko, Cristina Espana-Bonet, Josef van Genabith,	(参考訳) 近年の研究では、BERTをベースとした分類器が、真の翻訳信号ではなく、データとターゲット分類ラベルの間の素早い相関、特にトピック情報に乗じている、高性能なニューラル翻訳分類器における「クレバーハンズ」の挙動を示す証拠が示されている。翻訳信号は微妙な(特に専門的な翻訳のために)、ジャンル、スタイル、著者、特にトピックといった他の多くの信号と競合する。このことは、特に微妙なターゲット信号や挑戦的な(リソースの低い)データ設定において、分類器のパフォーマンスが、実際に分類器がターゲットとする信号と、データの急激な相関によるものであるという一般的な疑問を提起する。トピックベースの素早い相関に注目し、質問に2つの方向からアプローチする。一急激な話題情報及びデータにおけるその分布に関する知識がない場合。 (II) 突発的トピック相関の性質について, 若干の指標が得られた。目的 (i)データ中の素早い話題情報の指標として,教師なしトピックと対象分類ラベルとのアライメントを捉えた第一原理から尺度を作成する。本手法はクラスタリングにおける純度と同一であることを示し,分類のための「トピックフロア」(「ノイズフロア」など)を提案する。目的 (II) 既知の話題担体の分類におけるマスキングについて検討する。両方 (i)および (二)定量化及び定量化に寄与する (ii)急激な相関を緩和する。 Recent work has shown evidence of 'Clever Hans' behavior in high-performance neural translationese classifiers, where BERT-based classifiers capitalize on spurious correlations, in particular topic information, between data and target classification labels, rather than genuine translationese signals. Translationese signals are subtle (especially for professional translation) and compete with many other signals in the data such as genre, style, author, and, in particular, topic. This raises the general question of how much of the performance of a classifier is really due to spurious correlations in the data versus the signals actually targeted for by the classifier, especially for subtle target signals and in challenging (low resource) data settings. We focus on topic-based spurious correlation and approach the question from two directions: (i) where we have no knowledge about spurious topic information and its distribution in the data, (ii) where we have some indication about the nature of spurious topic correlations. 
For (i) we develop a measure from first principles capturing alignment of unsupervised topics with target classification labels as an indication of spurious topic information in the data. We show that our measure is the same as purity in clustering and propose a 'topic floor' (as in a 'noise floor') for classification. For (ii) we investigate masking of known spurious topic carriers in classification. Both (i) and (ii) contribute to quantifying and (ii) to mitigating spurious correlations.	翻訳日:2024-06-13 23:42:48 公開日:2024-06-11
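The abstract states that the proposed alignment measure coincides with clustering purity. As an illustration of that quantity, the sketch below computes purity between unsupervised topic assignments and class labels; the function name and the toy labels are assumptions, and the paper's topic model and data are not reproduced.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, labels):
    """Fraction of items that carry the majority label of their own cluster."""
    by_cluster = defaultdict(Counter)
    for cid, label in zip(cluster_ids, labels):
        by_cluster[cid][label] += 1
    # Each cluster contributes the count of its most frequent label.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)
```

Read as a "topic floor", a purity of 1.0 would mean topic alone separates the classes perfectly, so a classifier's accuracy would need to be interpreted relative to that value rather than relative to chance.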
# ニューラルネットワークにおける損失平坦性から圧縮表現への簡単な接続 A simple connection from loss flatness to compressed representations in neural networks ( http://arxiv.org/abs/2310.01770v3 ) ライセンス: Link先を確認	Shirui Chen, Stefano Recanatesi, Eric Shea-Brown,	(参考訳) ディープニューラルネットワークの一般化能力は、パラメータ空間における損失ランドスケープの形状に基づくものと、特徴空間における表現多様体の構造に基づくもの(つまり、単位活動の空間における)という、少なくとも2つの異なるアプローチのカテゴリを含む様々な方法で研究されてきた。これら2つのアプローチは関連しているが、これらは明確に研究されることはめったにない。ここでは、このギャップを埋める分析について述べる。ディープニューラルネットワークにおける学習の最終段階において、ニューラルネットワークの多様体の圧縮は、SGDが探索したミニマのまわりの損失の平坦さと相関することを示す。この相関関係は比較的単純な数学的関係によって予測される: 平坦な損失は、ニューラル表現の圧縮指標上の下限に対応する。本研究は,Ma と Ying による線形安定性の洞察に基づくもので,様々な圧縮測定値と鋭さを含む量の不等式を導出する。実験によって得られた不等式は,複数の実験環境における表現圧縮と損失シャープネスの連続的な正の相関を予測した。全体として、パラメータと特徴空間の両方におけるニューラルネットワークの一般化に関する双対視点を推し進める。 The generalization capacity of deep neural networks has been studied in a variety of ways, including at least two distinct categories of approaches: one based on the shape of the loss landscape in parameter space, and the other based on the structure of the representation manifold in feature space (that is, in the space of unit activities). Although these two approaches are related, they are rarely studied together explicitly. Here, we present an analysis that bridges this gap. We show that in the final phase of learning in deep neural networks, the compression of the manifold of neural representations correlates with the flatness of the loss around the minima explored by SGD. This correlation is predicted by a relatively simple mathematical relationship: a flatter loss corresponds to a lower upper bound on the compression metrics of neural representations. Our work builds upon the linear stability insight by Ma and Ying, deriving inequalities between various compression metrics and quantities involving sharpness. Empirically, our derived inequality predicts a consistently positive correlation between representation compression and loss sharpness in multiple experimental settings. 
Overall, we advance a dual perspective on generalization in neural networks in both parameter and feature space.	翻訳日:2024-06-13 23:33:02 公開日:2024-06-11
# インスタンスにもっと注意が必要だ - ループ収量の改善によるゼロショットパフォーマンス向上のために,LLMを使用したインスタンスのプロンプトを書き換える Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance ( http://arxiv.org/abs/2310.02107v4 ) ライセンス: Link先を確認	Saurabh Srivastava, Chengyue Huang, Weiguo Fan, Ziyu Yao,	(参考訳) 大規模言語モデル(LLM)はゼロショットタスクのパフォーマンスに革命をもたらし、タスク固有のアノテーションの必要性を軽減し、タスクの一般化性を高めている。その進歩にもかかわらず、「ステップ・バイ・ステップ」のようなトリガーフレーズを用いた現在の手法は依然として限られている。 PRomPTedは「ループ内のLLM」というイノベーティブな方法に従って、個々のタスクインスタンスに対してゼロショットプロンプトを最適化する手法である。 GPT-4に基づく13のデータセットと10のタスクタイプにわたる包括的な評価により、PRomPTedは、入力プロンプトの代わりにタスク出力を洗練する、単純なゼロショットアプローチと強力なベースライン(すなわち「出力リファインメント」)の両方を著しく上回っていることが明らかとなった。実験の結果, 比較的弱い GPT-3.5 に対して, この利点が一般化されることが確認された。さらに興味深いことに, GPT-3.5 を用いてより強力な GPT-4 のプロンプトを書き換えるだけでなく, 時折 GPT-4 をプロンプトリライタとして使用する効果を上回ることが判明した。本研究は, ゼロショットLDMの性能向上だけでなく, より弱めのLCMを監視できる可能性も示しており, 最近では注目されている。最後に,Mistral 7B や Mixtral 8x7B などのオープンソース LLM の利点の一般化を確認した。 Large language models (LLMs) have revolutionized zero-shot task performance, mitigating the need for task-specific annotations while enhancing task generalizability. Despite its advancements, current methods using trigger phrases such as "Let's think step by step" remain limited. This study introduces PRomPTed, an approach that optimizes the zero-shot prompts for individual task instances following an innovative manner of "LLMs in the loop". Our comprehensive evaluation across 13 datasets and 10 task types based on GPT-4 reveals that PRomPTed significantly outperforms both the naive zero-shot approaches and a strong baseline (i.e., "Output Refinement") which refines the task output instead of the input prompt. Our experimental results also confirmed the generalization of this advantage to the relatively weaker GPT-3.5. Even more intriguingly, we found that leveraging GPT-3.5 to rewrite prompts for the stronger GPT-4 not only matches but occasionally exceeds the efficacy of using GPT-4 as the prompt rewriter. 
Our research is thus valuable not only for enhancing zero-shot LLM performance but also for potentially enabling the supervision of LLMs by their weaker counterparts, a capability that has recently attracted much interest. Finally, our additional experiments confirm that these advantages generalize to open-source LLMs such as Mistral 7B and Mixtral 8x7B.	翻訳日:2024-06-13 23:33:02 公開日:2024-06-11
# RIR-SF:マルチチャンネルマルチスピーカシナリオにおけるターゲット音声認識のための室内インパルス応答に基づく空間的特徴 RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios ( http://arxiv.org/abs/2311.00146v2 ) ライセンス: Link先を確認	Yiwen Shao, Shi-Xiong Zhang, Dong Yu,	(参考訳) マルチトーカー録音における自動音声認識(ASR)は困難である。マルチチャンネルオーディオとビジュアルキューの3次元空間データを用いた現在の手法は、主にターゲット話者からの直接波に焦点を合わせ、反射波の影響を見落としているため、残響環境における性能が損なわれる。 RIR-SFは, 話者の位置, 室内音響, リフレクションダイナミクスを生かした, 室内インパルス応答(RIR)に基づく空間的特徴である。 RIR-SFは従来の3次元空間特性よりも優れており、理論的および経験的性能が優れている。また、RIR-SFのための最適化されたオールニューラルマルチチャネルASRフレームワークを提案し、マルチチャネル設定におけるターゲット話者ASRに対するCERの相対的に21.3%の削減を実現した。 RIR-SFは認識精度を高め、従来の手法の限界を克服し、高残響シナリオの堅牢性を示す。 Automatic speech recognition (ASR) on multi-talker recordings is challenging. Current methods using 3D spatial data from multi-channel audio and visual cues focus mainly on direct waves from the target speaker, overlooking the impact of reflected waves, which hinders performance in reverberant environments. Our research introduces RIR-SF, a novel spatial feature based on room impulse response (RIR) that leverages the speaker's position, room acoustics, and reflection dynamics. RIR-SF significantly outperforms traditional 3D spatial features, showing superior theoretical and empirical performance. We also propose an optimized all-neural multi-channel ASR framework for RIR-SF, achieving a relative 21.3% reduction in CER for target speaker ASR in multi-channel settings. RIR-SF enhances recognition accuracy and demonstrates robustness in high-reverberation scenarios, overcoming the limitations of previous methods.	翻訳日:2024-06-13 23:33:02 公開日:2024-06-11
# 効率的なファインチューニングのための勾配型パラメータ選択法 Gradient-based Parameter Selection for Efficient Fine-Tuning ( http://arxiv.org/abs/2312.10136v3 ) ライセンス: Link先を確認	Zhi Zhang, Qizhe Zhang, Zijun Gao, Renrui Zhang, Ekaterina Shutova, Shiji Zhou, Shanghang Zhang,	(参考訳) 事前訓練されたモデルのサイズが大きくなるにつれて、さまざまな下流タスクのパラメータをすべて微調整して保存することは、コストがかかり、実現不可能になります。本稿では, パラメータ効率のよいパラメータ選択法, Gradient-based Parameter Selection (GPS) を提案し, 既訓練モデルから選択したパラメータを調整し, 残りのモデルを凍結したままにしておくことで, フルモデルファインチューニング法と比較して, 同様の, あるいは優れた性能が得られることを示した。本手法は,既存のパラメータ・パラメータ・効率的な微調整手法と異なり,学習段階と推論段階の両方で追加のパラメータや計算コストを導入していない。もう1つの利点は、モデルに依存しない非破壊的な性質であり、特定のモデルに固有の他の設計の必要性を排除している。完全な微調整と比較すると、GPSは3.33%(91.78%対88.45%、FGVC)と9.61%(73.1%対65.57%、VTAB)の精度向上を実現し、24の画像分類タスクの平均で、トレーニング済みモデルのパラメータの0.36%しか調整していない。さらに,既存のPEFT法と比較すると,GPSは最先端の性能を実現している。 With the growing size of pre-trained models, full fine-tuning and storing all the parameters for various downstream tasks is costly and infeasible. In this paper, we propose a new parameter-efficient fine-tuning method, Gradient-based Parameter Selection (GPS), demonstrating that only tuning a few selected parameters from the pre-trained model while keeping the remainder of the model frozen can generate similar or better performance compared with the full model fine-tuning method. Unlike existing popular and state-of-the-art parameter-efficient fine-tuning approaches, our method introduces no additional parameters or computational costs during either the training or the inference stage. Another advantage is the model-agnostic and non-destructive property, which eliminates the need for any other design specific to a particular model. Compared with full fine-tuning, GPS achieves 3.33% (91.78% vs. 88.45%, FGVC) and 9.61% (73.1% vs. 65.57%, VTAB) improvement in accuracy while tuning only 0.36% of the parameters of the pre-trained model on average over 24 image classification tasks; it also demonstrates a significant improvement of 17% and 16.8% in mDice and mIoU, respectively, on a medical image segmentation task. Moreover, GPS achieves state-of-the-art performance compared with existing PEFT methods.	翻訳日:2024-06-13 23:13:33 公開日:2024-06-11
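The idea of tuning only a small, gradient-selected subset of parameters while freezing the rest can be sketched as below. This is an illustration under stated assumptions, not the authors' algorithm: the abstract does not say how gradients are turned into a selection, so ranking by absolute gradient magnitude is an assumption, and the names `select_mask` and `masked_sgd_step` and the toy numbers are hypothetical.

```python
def select_mask(grads, fraction):
    """Return a boolean mask marking the `fraction` of parameters with the
    largest absolute gradient (assumed selection criterion)."""
    k = max(1, round(len(grads) * fraction))
    ranked = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(grads))]


def masked_sgd_step(params, grads, mask, lr):
    """Plain SGD update applied only where mask is True; all other
    parameters keep their pre-trained values (they stay frozen)."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]


# Hypothetical toy values: four pre-trained parameters and their gradients.
params = [1.0, 2.0, 3.0, 4.0]
grads = [0.1, -0.9, 0.05, 0.3]
mask = select_mask(grads, fraction=0.25)   # keeps only the single largest |gradient|
updated = masked_sgd_step(params, grads, mask, lr=0.1)
print(mask)
```

Because the update reuses the existing parameter tensor and only masks the step, a scheme of this kind adds no new parameters at training or inference time, which is the property the abstract emphasizes.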
# 流通シフト下における私的移動学習のための公共表現のメリットについて On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift ( http://arxiv.org/abs/2312.15551v3 ) ライセンス: Link先を確認	Pratiksha Thaker, Amrith Setlur, Zhiwei Steven Wu, Virginia Smith,	(参考訳) 公的な事前訓練は、微分プライベートモデルトレーニングを改善するための有望なアプローチである。しかし、近年の研究では、このパラダイムを研究する多くの肯定的な研究成果は、分散タスクのみを考慮しており、事前学習データと微調整データの間に分散シフトがある設定には適用できない可能性がある、と指摘している。本研究では、公開データからのゼロショットのパフォーマンスとプライベートデータによるゼロショットのトレーニングの両方が、不可能なほど弱い結果をもたらすような、大規模な分散シフトの設定においても、3つのタスクを経験的に比較し、パブリック機能は、スクラッチからプライベートトレーニングよりも最大67%、プライベートトレーニングの精度を向上させることができることを示す。この現象の理論的説明として、公開データとプライベートデータが低次元表現を共有している場合、公開データのみからプライベートタスクを学習できない場合でも、公開表現はプライベートトレーニングのサンプル複雑さを改善することができることを示す。いずれにせよ,我々の結果は,公開データによって,極端分布シフトの現実的な設定において,私的なトレーニングを現実的に行うことができることを示すものである。 Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data -- a scenario that is likely when finetuning private tasks due to the sensitive nature of the data. In this work, we show empirically across three tasks that even in settings with large distribution shift, where both zero-shot performance from public data and training from scratch with private data give unusably weak results, public features can in fact improve private training accuracy by up to 67% over private training from scratch. We provide a theoretical explanation for this phenomenon, showing that if the public and private data share a low-dimensional representation, public representations can improve the sample complexity of private training even if it is impossible to learn the private task from the public data alone. 
Altogether, our results provide evidence that public data can indeed make private training practical in realistic settings of extreme distribution shift.	翻訳日:2024-06-13 23:13:33 公開日:2024-06-11
# 理科教育評価の自動化のためのLLMの知識蒸留 Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments ( http://arxiv.org/abs/2312.15842v3 ) ライセンス: Link先を確認	Ehsan Latif, Luyang Fang, Ping Ma, Xiaoming Zhai,	(参考訳) 本研究では, より小さく, より効率的かつ正確なニューラルネットワークへの微調整型大言語モデル(LLM)の知識蒸留(KD)手法を提案する。リソース制約のあるデバイスにこれらのモデルをデプロイするという課題を特にターゲットとしています。本手法は,教師モデルとして機能するLSMの予測確率(ソフトラベル)を用いて,より小さな学生モデル(ニューラルネットワーク)を訓練することを含む。これは、LLMの出力確率から学習するために調整された特殊な損失関数によって達成され、学生モデルが教師のパフォーマンスを忠実に模倣することを保証する。 KD手法の性能を検証するために,6,684名の学生による科学質問に対する回答と,人間の専門家が評価した学生による回答を用いた3つの数学的推論データセットを含む,大規模なデータセット7Tを用いた。我々は,最先端(SOTA)蒸留モデル,TinyBERT,人工ニューラルネットワーク(ANN)モデルと比較した。その結果,KD法はANN法とTinyBERT法に比較して評価精度が3%,TinyBERT法が2%高く,教師モデルに比較して精度が高かった。さらに、生徒モデルのサイズは0.03Mで、パラメータの4000倍小さく、x10は教師モデルとTinyBERTよりも高速である。この研究の意義は、高度なAI技術を一般的な教育環境、特に自動スコアリングで利用できるようにすることにある。 This study proposes a method for knowledge distillation (KD) of fine-tuned Large Language Models (LLMs) into smaller, more efficient, and accurate neural networks. We specifically target the challenge of deploying these models on resource-constrained devices. Our methodology involves training the smaller student model (Neural Network) using the prediction probabilities (as soft labels) of the LLM, which serves as a teacher model. This is achieved through a specialized loss function tailored to learn from the LLM's output probabilities, ensuring that the student model closely mimics the teacher's performance. To validate the performance of the KD approach, we utilized a large dataset, 7T, containing 6,684 student-written responses to science questions and three mathematical reasoning datasets with student-written responses graded by human experts. We compared accuracy with state-of-the-art (SOTA) distilled models, TinyBERT, and artificial neural network (ANN) models. Results have shown that the KD approach has 3% and 2% higher scoring accuracy than ANN and TinyBERT, respectively, and comparable accuracy to the teacher model. 
Furthermore, the student model has only 0.03M parameters, making it 4,000 times smaller than the teacher model and 10x faster at inference than TinyBERT. The significance of this research lies in its potential to make advanced AI technologies accessible in typical educational settings, particularly for automatic scoring.	翻訳日:2024-06-13 23:13:33 公開日:2024-06-11
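Training a student on the teacher's prediction probabilities (soft labels) can be illustrated with a soft-label cross-entropy. This is a minimal sketch of one common form of distillation loss. The abstract describes the paper's loss only as "specialized", so the exact formula, the function names, and the toy probabilities below are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(student_logits, teacher_probs):
    """Cross-entropy H(teacher, student): small when the student's predicted
    distribution is close to the teacher's soft labels."""
    student_probs = softmax(student_logits)
    return -sum(
        p * math.log(q) for p, q in zip(teacher_probs, student_probs) if p > 0
    )


# Hypothetical teacher output over three score categories.
teacher_probs = [0.7, 0.2, 0.1]
matching_student = [math.log(p) for p in teacher_probs]  # reproduces the teacher
mismatched_student = [0.0, 0.0, 5.0]                      # confident and wrong

loss_match = soft_label_cross_entropy(matching_student, teacher_probs)
loss_mismatch = soft_label_cross_entropy(mismatched_student, teacher_probs)
print(loss_match < loss_mismatch)  # True
```

When the student matches the teacher exactly, the loss equals the entropy of the teacher distribution, which is its minimum; any other student distribution gives a larger value.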
# 大型スピン猫符号を用いたフォールトトレラント量子計算 Fault-tolerant quantum computation using large spin cat-codes ( http://arxiv.org/abs/2401.04271v4 ) ライセンス: Link先を確認	Sivaprasad Omanakuttan, Vikas Buchemmavari, Jonathan A. Gross, Ivan H Deutsch, Milad Marvian,	(参考訳) 本研究では、スピンキャット符号を用いて、大きなスピンキューディットに符号化された量子ビットに基づいて、フォールトトレラントな量子誤り訂正プロトコルを構築する。これにより、支配的な誤差源、すなわち角運動量の成分において線型あるいは二次的な誤差演算子として表現できる過程を補正することができる。このような符号は、非構造ノイズモデルのために設計された符号に比べて、優れたしきい値と低いリソースオーバーヘッドを示す。ゲート操作における支配的なエラーを保存するため、適切なユニバーサルゲートセットを同定する。鍵となる構成要素は、球面テンソル作用素のランクを保存するCNOTゲートである。支配的な誤差を位相誤差と振幅誤差と分類し、量子ビットの位相フリップ誤差に類似した位相誤差を効果的に補正できることを示す。さらに,シンドローム測定に頼らずに振幅誤差に対処する計測自由誤差補正手法を提案する。論理的CNOTゲート誤差の詳細な解析により、スピンキャット符号化における誤り訂正の耐故障しきい値が標準量子ビット符号化のそれを超えることが確認される。我々は、量子制御とリュードベリ封鎖を用いて、ランク保存型CNOTゲートを含む普遍ゲートセットを生成する方法を示す。これらの知見は、量子情報処理において、耐障害性、高いしきい値、リソースオーバーヘッドを低減できる可能性を持つ、大きなスピンで量子ビットを符号化する方法を舗装している。 We construct a fault-tolerant quantum error-correcting protocol based on a qubit encoded in a large spin qudit using a spin-cat code, analogous to the continuous variable cat encoding. With this, we can correct the dominant error sources, namely processes that can be expressed as error operators that are linear or quadratic in the components of angular momentum. Such codes tailored to dominant error sources can exhibit superior thresholds and lower resource overheads when compared to those designed for unstructured noise models. To preserve the dominant errors during gate operations, we identify a suitable universal gate set. A key component is the CNOT gate that preserves the rank of spherical tensor operators. Categorizing the dominant errors as phase and amplitude errors, we demonstrate how phase errors, analogous to phase-flip errors for qubits, can be effectively corrected. Furthermore, we propose a measurement-free error correction scheme to address amplitude errors without relying on syndrome measurements. 
Through an in-depth analysis of logical CNOT gate errors, we establish that the fault-tolerant threshold for error correction in the spin-cat encoding surpasses that of standard qubit-based encodings. We consider a specific implementation based on neutral-atom quantum computing, with qudits encoded in the nuclear spin of $^{87}$Sr, and show how to generate the universal gate set, including the rank-preserving CNOT gate, using quantum control and the Rydberg blockade. These findings pave the way for encoding a qubit in a large spin with the potential to achieve fault tolerance, high threshold, and reduced resource overhead in quantum information processing.	翻訳日:2024-06-13 23:13:33 公開日:2024-06-11
# 自由フェルミオン系の絡み合い、信号処理および代数的コンビネータ Entanglement of free-fermion systems, signal processing and algebraic combinatorics ( http://arxiv.org/abs/2401.07150v2 ) ライセンス: Link先を確認	Pierre-Antoine Bernard, Nicolas Crampé, Rafael I. Nepomechie, Gilles Parez, Luc Vinet,	(参考訳) 本稿では,信号処理や代数コンビネータの手法を生かしたグラフ上の自由フェルミオン系の絡み合いに関する最近の研究について述べる。一方、時間と帯域制限の問題と平行して、双スペクトル状態において切断された相関行列と交換する三角行列を求め、他方では、$P$-ポリノミカルなアソシエーションスキームの文脈で生じるテルウィガー代数の既約分解は、単純化された枠組みをもたらす。 This paper offers a review of recent studies on the entanglement of free-fermion systems on graphs that take advantage of methods pertaining to signal processing and algebraic combinatorics. On the one hand, a parallel with time and band limiting problems is used to obtain a tridiagonal matrix commuting with the chopped correlation matrix in bispectral situations and on the other, the irreducible decomposition of the Terwilliger algebra arising in the context of $P$-polynomial association schemes is seen to yield a simplifying framework.	翻訳日:2024-06-13 23:03:49 公開日:2024-06-11
# 2次元量子多体基底状態のバンバン準備--2次元テンソルネットワークを用いたアルゴリズムの最適化 Bang-bang preparation of quantum many-body ground states in two dimensions: optimization of the algorithm with a two-dimensional tensor network ( http://arxiv.org/abs/2401.09158v4 ) ライセンス: Link先を確認	Yintai Zhang, Jacek Dziarmaga,	(参考訳) バンバン(BB)アルゴリズムは、初期積状態が$H_1$と$H_2$の間で交互に変化することによって、2次元(2次元)量子多体ハミルトンの基底状態を作成する。近傍テンソル更新を用いて、BB進化を無限対絡み状態 (iPEPS) でシミュレートする。交代シーケンスは、最終エネルギーをコスト関数として最適化する。エネルギーは、その安定性のために接空間法で計算される。この手法は、iPEPSの変分最適化により得られた基底状態に対して、量子臨界点付近の2次元逆場量子イジングモデルでベンチマークされる。最適BB配列は、基底状態の量子アニールまたは断熱処理(AP)をシミュレートする配列と非摂動的に異なる。最適BBエネルギーは最適APエネルギーよりもはるかに速いバン数と収束する。 A bang-bang (BB) algorithm prepares the ground state of a two-dimensional (2D) quantum many-body Hamiltonian $H=H_1+H_2$ by evolving an initial product state alternating between $H_1$ and $H_2$. We use the neighborhood tensor update to simulate the BB evolution with an infinite pair-entangled projected state (iPEPS). The alternating sequence is optimized with the final energy as a cost function. The energy is calculated with the tangent space methods for the sake of their stability. The method is benchmarked in the 2D transverse field quantum Ising model near its quantum critical point against a ground state obtained by variational optimization of the iPEPS. The optimal BB sequence differs non-perturbatively from a sequence simulating quantum annealing or adiabatic preparation (AP) of the ground state. The optimal BB energy converges with the number of bangs much faster than the optimal AP energy.	翻訳日:2024-06-13 23:03:49 公開日:2024-06-11
# 生成コンテキストによるブラインド: 言語モデルと生成コンテキストのマージは、知識衝突時にどのように行われるか? Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts? ( http://arxiv.org/abs/2401.11911v6 ) ライセンス: Link先を確認	Hexiang Tan, Fei Sun, Wanli Yang, Yuanzhuo Wang, Qi Cao, Xueqi Cheng,	(参考訳) 補助情報は、LLM(Large Language Models)の拡張の鍵となっているが、LLMがこれらのコンテキストをどのように統合するかについては、特にLLMが生成したコンテキストと外部ソースから取得したコンテキストについてはあまり知られていない。そこで本研究では,LLMの応答が生成した文脈と検索した文脈のいずれに起因しているかを特定するための体系的な枠組みを定式化する。応答の起源を容易に追跡するために,各質問は生成したコンテキストと検索したコンテキストの両方にペアリングされるが,その中の1つだけが正解である。実験の結果,複数のLDM (GPT-4/3.5, Llama2) において, 誤った情報を提供する場合でも, 生成コンテキストを優先する有意なバイアスが認められた。さらに、このバイアスに寄与する2つの重要な要因を特定します。 i) LLMが生成する文脈は,通常,質問とより類似し,選択される可能性を高める。二検索した文脈におけるセグメンテーションのプロセスは、その完全性を損なうため、LLMの完全利用を阻害する。我々の分析は,LLMが様々な文脈を融合する方法の理解を深め,現在のLLM拡張法を進展させる上で貴重な洞察を提供し,LLM検索における誤情報の発生リスクを強調している。 While auxiliary information has become a key to enhancing Large Language Models (LLMs), relatively little is known about how LLMs merge these contexts, specifically contexts generated by LLMs and those retrieved from external sources. To investigate this, we formulate a systematic framework to identify whether LLMs' responses are attributed to either generated or retrieved contexts. To easily trace the origin of the response, we construct datasets with conflicting contexts, i.e., each question is paired with both generated and retrieved contexts, yet only one of them contains the correct answer. Our experiments reveal a significant bias in several LLMs (GPT-4/3.5 and Llama2) to favor generated contexts, even when they provide incorrect information. We further identify two key factors contributing to this bias: i) contexts generated by LLMs typically show greater similarity to the questions, increasing their likelihood of being selected; ii) the segmentation process used in retrieved contexts disrupts their completeness, thereby hindering their full utilization in LLMs. 
Our analysis enhances the understanding of how LLMs merge diverse contexts, offers valuable insights for advancing current LLM augmentation methods, and highlights the risk of generated misinformation for retrieval-augmented LLMs.	翻訳日:2024-06-13 23:03:49 公開日:2024-06-11
# 浅部ReLU様ニューラルネットワークのランドスケープ:静止点,サドルエスケープ,ネットワーク埋め込み Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding ( http://arxiv.org/abs/2402.05626v4 ) ライセンス: Link先を確認	Zhengqing Wu, Berfin Simsek, Francois Ged,	(参考訳) 本稿では,経験的二乗損失を学習したReLU様活性化関数を持つ一層ニューラルネットワークの損失状況について検討する。アクティベーション関数は微分不可能であるため、固定点を完全に特徴づける方法は今のところ不明である。非微分可能ケースと微分可能ケースの両方に適用可能な定常条件を提案する。さらに、定常点が一階条件で定義される「エスケープニューロン」を含まない場合、局所最小値でなければならないことを示す。さらに、スカラーアウトプットの場合、エスケープニューロンの存在は、静止点が局所的な最小値でないことを保証している。その結果,浅部ReLU様ネットワークに対する無限小の初期化から始まり,サドルからサドルまでのトレーニングプロセスの記述を洗練し,サドルから脱出したニューロンのパラメータ変化と直接関連付けることができた。さらに、より広いネットワーク内でより狭いネットワークをインスタンス化するネットワーク埋め込みが、静止点を再設定する方法について、十分に議論することができる。 In this paper, we investigate the loss landscape of one-hidden-layer neural networks with ReLU-like activation functions trained with the empirical squared loss. As the activation function is non-differentiable, it is so far unclear how to completely characterize the stationary points. We propose the conditions for stationarity that apply to both non-differentiable and differentiable cases. Additionally, we show that, if a stationary point does not contain "escape neurons", which are defined with first-order conditions, then it must be a local minimum. Moreover, for the scalar-output case, the presence of an escape neuron guarantees that the stationary point is not a local minimum. Our results refine the description of the saddle-to-saddle training process starting from infinitesimally small (vanishing) initialization for shallow ReLU-like networks, linking saddle escaping directly with the parameter changes of escape neurons. Moreover, we are also able to fully discuss how network embedding, which is to instantiate a narrower network within a wider network, reshapes the stationary points.	翻訳日:2024-06-13 22:53:54 公開日:2024-06-11
# チュニジア・アラビアの正規化オルソグラフィー Normalized Orthography for Tunisian Arabic ( http://arxiv.org/abs/2402.12940v2 ) ライセンス: Link先を確認	Houcemeddine Turki, Kawthar Ellouze, Hager Ben Ammar, Mohamed Ali Hadj Taieb, Imed Adel, Mohamed Ben Aouicha, Pier Luigi Farri, Abderrezak Bennour,	(参考訳) チュニジア・アラビア(英語: Tunisian Arabic、ISO 639-3: aeb)は、チュニジア原産で、様々な歴史的影響を受け、アラビア語に由来する。本研究は、チュニジア・アラビア語をアラビア語で翻訳するためのCODAガイドラインの適応である「チュニジア・アラビア語版Normalized Orthography for Tunisian Arabic」(NOTA)を紹介する。ユーザフレンドリさと一貫性を確保することで、言語リソースの開発を強化することを目的としている。改訂された標準は、チュニジアの音韻学と形態学を正確に表現する上での課題に対処し、現代標準アラビア語に基づく転写の問題を修正した。 Tunisian Arabic (ISO 639-3: aeb) is a distinct variety native to Tunisia, derived from Arabic and enriched by various historical influences. This research introduces the "Normalized Orthography for Tunisian Arabic" (NOTA), an adaptation of CODA guidelines for transcribing Tunisian Arabic using Arabic script. The aim is to enhance language resource development by ensuring user-friendliness and consistency. The updated standard addresses challenges in accurately representing Tunisian phonology and morphology, correcting issues from transcriptions based on Modern Standard Arabic.	翻訳日:2024-06-13 22:44:06 公開日:2024-06-11
# ターゲットデータサブセット選択のためのサブモジュール情報対策の理論解析 Theoretical Analysis of Submodular Information Measures for Targeted Data Subset Selection ( http://arxiv.org/abs/2402.13454v2 ) ライセンス: Link先を確認	Nathan Beck, Truong Pham, Rishabh Iyer,	(参考訳) 機械学習タスク全体で使用されているデータの量が増えると、データの特定のサブセットをターゲットする能力がより重要になる。この機能を実現するために、最近提案されたsubmodular Mutual Information (SMI) は、文献の様々なタスクに効果的に適用され、典型的なクエリセットの助けを借りてターゲットサブセットの選択を行う。しかし、これらすべての研究は、サブセットの関連性や対象データのカバレッジに対する感度の観点から、SMIの理論的保証を提供するには不十分である。対象データの関連性やカバレッジに関連する量に関する類似性に基づく境界を導出することで,このような保証を初めて提供する。これらの境界により、複数のアプリケーションで経験的に成功したSMI関数は、理論的には、クエリ関連性およびクエリカバレッジが良好であることを示す。 With the increasing volume of data being used across machine learning tasks, the capability to target specific subsets of data becomes more important. To aid in this capability, the recently proposed Submodular Mutual Information (SMI) has been effectively applied across numerous tasks in the literature to perform targeted subset selection with the aid of an exemplar query set. However, none of these works provides theoretical guarantees for SMI in terms of its sensitivity to a subset's relevance and coverage of the targeted data. For the first time, we provide such guarantees by deriving similarity-based bounds on quantities related to relevance and coverage of the targeted data. With these bounds, we show that the SMI functions, which have empirically shown success in multiple applications, are theoretically sound in achieving good query relevance and query coverage.	翻訳日:2024-06-13 22:44:06 公開日:2024-06-11
# 並列文脈符号化を用いたLong-Context言語モデリング Long-Context Language Modeling with Parallel Context Encoding ( http://arxiv.org/abs/2402.16617v2 ) ライセンス: Link先を確認	Howard Yen, Tianyu Gao, Danqi Chen,	(参考訳) 大きな言語モデル(LLM)を拡張して、より長い入力を処理することは、幅広いアプリケーションにとって不可欠である。しかし、トランスのかなりの計算コストと位置符号化の限定的な一般化により、コンテキストウィンドウのサイズは制限される。既存のデコーダのみのLLMに適用可能なフレームワークであるCEPE(Context Expansion with Parallel Encoding)を導入し、コンテキストウィンドウを拡張する。 CEPEは小さなエンコーダを使用して長い入力チャンクをチャンク単位で処理し、冷凍復号器はクロスアテンションを介して追加のコンテキストを利用することができる。 CEPEは効率的で汎用的で汎用的であり、8Kの文書で訓練され、LLAMA-2のコンテキストウィンドウを128Kのトークンに拡張し、メモリの1/6のスループットを10倍提供する。 CEPEは、言語モデリングとコンテキスト内学習に強いパフォーマンスをもたらす。 CEPEは検索拡張アプリケーションでも優れており、既存の長期コンテキストモデルは検索コンテキストで縮退する。さらに、ラベルなしデータのみを用いて命令調整モデルのコンテキストウィンドウを拡張するCEPE変異を導入し、LLAMA-2-CHAT上での有効性を示し、下流タスクにおいて非常に長いコンテキストを活用できる強力な命令追従モデルを実現する。 Extending large language models (LLMs) to process longer inputs is crucial for a wide range of applications. However, the substantial computational cost of transformers and limited generalization of positional encoding restrict the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLMs to extend their context window. CEPE employs a small encoder to process long inputs chunk by chunk, enabling the frozen decoder to utilize additional contexts via cross-attention. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, it extends the context window of LLAMA-2 to 128K tokens, offering 10x the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. CEPE also excels in retrieval-augmented applications, while existing long-context models degenerate with retrieved contexts. 
We further introduce a CEPE variant that can extend the context window of instruction-tuned models using only unlabeled data, and showcase its effectiveness on LLAMA-2-CHAT, leading to a strong instruction-following model that can leverage very long contexts on downstream tasks.	翻訳日:2024-06-13 22:44:06 公開日:2024-06-11
# Larimar: エピソードメモリ制御を備えた大規模言語モデル Larimar: Large Language Models with Episodic Memory Control ( http://arxiv.org/abs/2403.11901v2 ) ライセンス: Link先を確認	Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen,	(参考訳) LLM(Large Language Models)に格納された知識の効率的かつ正確な更新は、今日の最も急進的な研究課題の1つである。本稿では,Larimarについて述べる。Larimarは,分散エピソードメモリを用いてLLMを拡張するための,脳にインスパイアされた新しいアーキテクチャである。 Larimarのメモリは、計算コストのかかるリトレーニングや微調整を必要とせずに、動的でワンショットの知識更新を可能にする。複数のファクト編集ベンチマークの実験結果から、Larimarは、挑戦的なシーケンシャルな編集セットアップであっても、最も競争力のあるベースラインに匹敵する精度を達成できただけでなく、ベースLLMに依存して8～10倍のスピードアップを実現している。さらに,Larimarを用いた情報漏洩防止,入力コンテキスト長の一般化のメカニズムを提案し,その有効性を示す。私たちのコードはhttps://github.com/IBM/larimarで利用可能です。 Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar, a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar not only attains accuracy comparable to the most competitive baselines, even in the challenging sequential editing setup, but also excels in speed, yielding speed-ups of 8-10x depending on the base LLM, as well as in flexibility, because the proposed architecture is simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar	翻訳日:2024-06-13 22:34:15 公開日:2024-06-11
# ニューラルネットワークによる最適化のための自己改善: 置き換えせずに、改善されたサンプル Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement ( http://arxiv.org/abs/2403.15180v2 ) ライセンス: Link先を確認	Jonathan Pirnay, Dominik G. Grimm,	(参考訳) エンドツーエンド構築型ニューラルネットワーク最適化の現在の手法は、通常、専門家ソリューションからの行動クローニングや強化学習からのポリシー勾配手法を用いてポリシーを訓練する。行動クローニングは単純であるが、高価な専門家のソリューションが必要であり、ポリシー勾配法は計算的に要求され、微調整が複雑であることが多い。本研究では、各エポックにおける現在のモデルを用いてランダムなインスタンスに対する複数のソリューションをサンプリングし、その後、教師付き模倣学習の専門的軌跡として最適解を選択することにより、これら2つを橋渡しし、トレーニングプロセスを簡素化する。最小限のサンプリングで徐々に改善する手法を実現するため,提案手法では,ラウンドワイド・確率的ビームサーチと,証明可能なポリシー改善から得られた更新戦略を組み合わせた手法を提案する。この戦略は、ほとんど計算オーバーヘッドのないサンプルシーケンスの利点を利用して、ラウンド間のポリシーを洗練させる。我々は,トラベリングセールスマン問題とキャパシタントカールーティング問題に対する我々のアプローチを評価する。本手法で訓練したモデルでは,専門家データと同等の性能と一般化を実現している。さらに,この手法をトランスフォーマーアーキテクチャを用いてジョブショップスケジューリング問題に適用し,既存の最先端手法よりも広いマージンで性能を向上する。 Current methods for end-to-end constructive neural combinatorial optimization usually train a policy using behavior cloning from expert solutions or policy gradient methods from reinforcement learning. While behavior cloning is straightforward, it requires expensive expert solutions, and policy gradient methods are often computationally demanding and complex to fine-tune. In this work, we bridge the two and simplify the training process by sampling multiple solutions for random instances using the current model in each epoch and then selecting the best solution as an expert trajectory for supervised imitation learning. To achieve progressively improving solutions with minimal sampling, we introduce a method that combines round-wise Stochastic Beam Search with an update strategy derived from a provable policy improvement. This strategy refines the policy between rounds by utilizing the advantage of the sampled sequences with almost no computational overhead. We evaluate our approach on the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. 
The models trained with our method achieve comparable performance and generalization to those trained with expert data. Additionally, we apply our method to the Job Shop Scheduling Problem using a transformer-based architecture and outperform existing state-of-the-art methods by a wide margin.	翻訳日:2024-06-13 22:34:15 公開日:2024-06-11
# 類似OOD検出パラドックスの幾何学的説明 A Geometric Explanation of the Likelihood OOD Detection Paradox ( http://arxiv.org/abs/2403.18910v2 ) ライセンス: Link先を確認	Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul G. Krishnan, Gabriel Loaiza-Ganem,	(参考訳) Likelihood-based Deep Generative Model (DGM) は一般的に、比較的複雑なデータセットで訓練された場合、より単純なソースからのアウト・オブ・ディストリビューション(OOD)データに高い確率値を割り当てる。謎に加え、OODサンプルは高い可能性にもかかわらずこれらのDGMによって生成されることはない。この2重のパラドックスはまだ決定的に説明されていないため、OOD検出の確率は信頼性が低い。我々の第一の観察は、最小の確率質量を含む場合、高濃度の領域は発生しないということである。このような大きな密度と低い確率質量の矛盾が、低次元多様体に制限されたデータの周りに生じることを示す。また、このシナリオは、局所固有次元(LID)推定により同定できることを示し、事前訓練されたDGMから得られる可能性とLID推定をペアリングするOOD検出法を提案する。提案手法はフローの正規化やスコアベース拡散モデルに適用でき、同じDGMバックボーンを用いて最先端のOOD検出ベンチマークに適合または超越した結果が得られる。私たちのコードはhttps://github.com/layer6ai-labs/dgm_ood_detectionで利用可能です。 Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. 
Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection.	翻訳日:2024-06-13 22:34:15 公開日:2024-06-11
# BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes (英語) BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes ( http://arxiv.org/abs/2404.03022v2 ) ライセンス: Link先を確認	Amirhossein Abaskohi, Amirhossein Dabiriaghdam, Lele Wang, Giuseppe Carenini,	(参考訳) テキストと画像を組み合わせたミームは、しばしばメタファーを使って説得力のあるメッセージを伝え、世論を形成する。そこで本研究チームはSemEval-2024 Task 4という階層型マルチラベル分類タスクに取り組み,その手法をミーム内に組み込んだ修辞的,心理的説得的手法を同定した。この問題に対処するために,画像のモダリティギャップと追加の意味情報の影響を評価するキャプション生成手法を導入し,その結果を改良した。本モデルでは, テキストエンコーダとしてRoBERTa, 画像エンコーダとしてCLIPを微調整するために, GPT-4 生成キャプションとミームテキストを併用した。ベースラインは12のサブタスクすべてにおいて大きなマージンで上回っている。特に、Subtask 2aの全言語でトップ3、Subtask 2bでトップ4にランクインし、定量的に強いパフォーマンスを示した。中間段階の導入によって達成された改善は、視覚エンコーダに挑戦する画像の比喩的本質に起因する可能性が高い。これは抽象的な視覚的セマンティックスエンコーディングを改善する可能性を強調している。 Memes, combining text and images, frequently use metaphors to convey persuasive messages, shaping public opinion. Motivated by this, our team engaged in SemEval-2024 Task 4, a hierarchical multi-label classification task designed to identify rhetorical and psychological persuasion techniques embedded within memes. To tackle this problem, we introduced a caption generation step to assess the modality gap and the impact of additional semantic information from images, which improved our result. Our best model utilizes GPT-4 generated captions alongside meme text to fine-tune RoBERTa as the text encoder and CLIP as the image encoder. It outperforms the baseline by a large margin in all 12 subtasks. In particular, it ranked in top-3 across all languages in Subtask 2a, and top-4 in Subtask 2b, demonstrating quantitatively strong performance. The improvement achieved by the introduced intermediate step is likely attributable to the metaphorical essence of images that challenges visual encoders. This highlights the potential for improving abstract visual semantics encoding.	翻訳日:2024-06-13 22:24:31 公開日:2024-06-11
# 汎用行動エージェントのためのデータ駆動ゴール認識設計 Data-Driven Goal Recognition Design for General Behavioral Agents ( http://arxiv.org/abs/2404.03054v2 ) ライセンス: Link先を確認	Robert Kasumba, Guanghui Yu, Chien-Ju Ho, Sarah Keren, William Yeoh,	(参考訳) 目標認識設計は、意思決定環境への限定的な修正を目標とし、それらの環境内で行動するエージェントの目標の推測を容易にすることを目的としている。目標認識設計において様々な研究努力がなされてきたが、既存のアプローチは計算的に要求されており、エージェントが意思決定において(ほぼ)最適であると仮定することが多い。これらの制約に対処するために、汎用的な行動モデルを持つエージェントを考慮に入れた、ゴール認識設計のためのデータ駆動型アプローチを導入する。既存の文献に従えば、意思決定環境におけるエージェントの目標を推測する難しさの尺度として、最悪のケースの識別性($\textit{wcd}$)を用いる。私たちのアプローチは、与えられた環境とエージェントの振る舞いモデルに対して$\textit{wcd}$を予測するために、機械学習モデルをトレーニングすることから始まります。そこで我々は,目標認識の強化のための意思決定環境を最適化するために,様々な制約を満たす勾配に基づく最適化フレームワークを提案する。シミュレーションにより,既存の手法よりも$\textit{wcd}$を削減し,従来のセットアップにおける実行効率を向上させることが実証された。さらに, フレキシブルな予算制約, より複雑な環境, 最適なエージェント動作など, 既存のアプローチが適用されないような設定にも適応する。最後に,本手法が実世界の人的意思決定者による効率的な目標認識を促進する環境を創出できることを確認した。 Goal recognition design aims to make limited modifications to decision-making environments with the goal of making it easier to infer the goals of agents acting within those environments. Although various research efforts have been made in goal recognition design, existing approaches are computationally demanding and often assume that agents are (near-)optimal in their decision-making. To address these limitations, we introduce a data-driven approach to goal recognition design that can account for agents with general behavioral models. Following existing literature, we use worst-case distinctiveness($\textit{wcd}$) as a measure of the difficulty in inferring the goal of an agent in a decision-making environment. Our approach begins by training a machine learning model to predict the $\textit{wcd}$ for a given environment and the agent behavior model. We then propose a gradient-based optimization framework that accommodates various constraints to optimize decision-making environments for enhanced goal recognition. 
Through extensive simulations, we demonstrate that our approach outperforms existing methods in reducing $\textit{wcd}$ and enhancing runtime efficiency in conventional setups. Moreover, our approach also adapts to settings in which existing approaches do not apply, such as those involving flexible budget constraints, more complex environments, and suboptimal agent behavior. Finally, we have conducted human-subject experiments which confirm that our method can create environments that facilitate efficient goal recognition from real-world human decision-makers.	翻訳日:2024-06-13 22:24:31 公開日:2024-06-11
# ニューラルネットワーク検証のための最小NAP仕様の学習 Learning Minimal NAP Specifications for Neural Network Verification ( http://arxiv.org/abs/2404.04662v2 ) ライセンス: Link先を確認	Chuqin Geng, Zhaoyue Wang, Haolin Ye, Saifei Liao, Xujie Si,	(参考訳) 仕様はニューラルネットワークの検証において重要な役割を果たす。彼らは我々が検証しようとする正確な入力領域を定義し、典型的にはL-無限ノルム球として表される。最近の研究では、未確認のテストデータセットを検証するための仕様として、ニューラルアクティベーションパターン(NAP)を使用することが提案されているが、最も洗練されたNAPの計算に焦点を当てており、しばしば入力空間の非常に小さな領域に限られている。本稿では,ニューラルネットワークが与えられた場合,ネットワークの堅牢性の形式的検証に十分な最小限の(最も粗い)NAPを求める。最小のNAP仕様を見つけることは、検証可能な境界を広げるだけでなく、どのニューロンがモデルの堅牢性に寄与するかの洞察を与える。この問題に対処するために、我々はいくつかの正確で近似的なアプローチを提案する。我々の正確なアプローチは、検証ツールを利用して、決定論的または統計的に最小限のNAP仕様を見つけます。近似手法は, 検証ツールを呼び出すことなく, 逆例と局所勾配を用いて最小NAPを効率的に推定する。これにより、ニューロン間の潜在的な因果関係と、既存の検証フレームワークがスケールできないタスクである最先端のニューラルネットワークの堅牢性を調べることができる。我々の実験結果から、最小のNAP仕様は最も洗練されたNAP仕様よりもはるかに少ない神経細胞を必要とすることが示唆されるが、検証可能な境界を桁違いに大きく拡張することができる。 Specifications play a crucial role in neural network verification. They define the precise input regions we aim to verify, typically represented as L-infinity norm balls. While recent research suggests using neural activation patterns (NAPs) as specifications for verifying unseen test set data, it focuses on computing the most refined NAPs, often limited to very small regions in the input space. In this paper, we study the following problem: Given a neural network, find a minimal (coarsest) NAP that is sufficient for formal verification of the network's robustness. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which neurons contribute to the model's robustness. To address this problem, we propose several exact and approximate approaches. Our exact approaches leverage the verification tool to find minimal NAP specifications in either a deterministic or statistical manner. Whereas the approximate methods efficiently estimate minimal NAPs using adversarial examples and local gradients, without making calls to the verification tool. 
This allows us to inspect potential causal links between neurons and the robustness of state-of-the-art neural networks, a task for which existing verification frameworks fail to scale. Our experimental results suggest that minimal NAP specifications require much smaller fractions of neurons compared to the most refined NAP specifications, yet they can expand the verifiable boundaries by several orders of magnitude.	翻訳日:2024-06-13 22:24:31 公開日:2024-06-11
# PVF (Parameter Vulnerability Factor):モデルパラメータにおけるSDCに対するAI脆弱性を理解するためのスケーラブルなメトリクス PVF (Parameter Vulnerability Factor): A Scalable Metric for Understanding AI Vulnerability Against SDCs in Model Parameters ( http://arxiv.org/abs/2405.01741v3 ) ライセンス: Link先を確認	Xun Jiao, Fred Lin, Harish D. Dixit, Joel Coburn, Abhinav Pandey, Han Wang, Venkat Ramesh, Jianyu Huang, Wang Xu, Daniel Moore, Sriram Sankar,	(参考訳) AIシステムの信頼性は、デプロイメントの成功とAI技術の広範な採用に対する基本的な懸念である。残念なことに、AIハードウェアシステムのエスカレートする複雑さとヘテロジニティは、ハードウェアの欠陥(例えば、サイレントデータ破損(SDC))の影響を受けやすくなり、モデルパラメータを破損させる可能性がある。これがAI推論/サービス中に発生する場合、ユーザにとって誤ったあるいは劣化したモデルアウトプットが発生し、最終的にはAIサービスの品質と信頼性に影響を与える可能性がある。モデル内のさまざまなコンポーネント(モジュール、レイヤなど)が、パラメータの破損に対して、どのようにさまざまな脆弱性を示すのか? この問題を体系的に解決するために,コンピュータアーキテクチャコミュニティにおいて,AIモデル脆弱性のパラメータ破損に対する定量化を目標とした,新しい量的尺度であるパラメータ脆弱性係数(PVF)を提案する。モデルパラメータのPVFを、そのモデルパラメータの破損が誤った出力をもたらす確率として定義する。本稿では,推論中にPVFを3種類のタスク/モデルに適用するためのいくつかのユースケースについて述べる。 PVFは、脆弱なAIパラメータコンポーネントを保護されたハードウェアモジュールにマッピングするなど、フォールトプロテクションとパフォーマンス/効率のトレードオフのバランスにおいて、AIハードウェアデザイナに重要な洞察を提供することができる。 PVFメトリックは任意のAIモデルに適用可能であり、AI脆弱性/レジリエンス評価プラクティスの統合と標準化を支援する可能性がある。 Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults, e.g., silent data corruptions (SDC), that can potentially corrupt model parameters. When this occurs during AI inference/servicing, it can potentially lead to incorrect or degraded model output for users, ultimately affecting the quality and reliability of AI services. In light of the escalating threat, it is crucial to address key questions: How vulnerable are AI models to parameter corruptions, and how do different components (such as modules, layers) of the models exhibit varying vulnerabilities to parameter corruptions? 
To systematically address these questions, we propose a novel quantitative metric, Parameter Vulnerability Factor (PVF), inspired by the architectural vulnerability factor (AVF) in the computer architecture community, aiming to standardize the quantification of AI model vulnerability against parameter corruptions. We define a model parameter's PVF as the probability that a corruption in that particular model parameter will result in an incorrect output. In this paper, we present several use cases on applying PVF to three types of tasks/models during inference -- recommendation (DLRM), vision classification (CNN), and text classification (BERT), while presenting an in-depth vulnerability analysis on DLRM. PVF can provide pivotal insights to AI hardware designers in balancing the tradeoff between fault protection and performance/efficiency, such as mapping vulnerable AI parameter components to well-protected hardware modules. The PVF metric is applicable to any AI model and has the potential to help unify and standardize AI vulnerability/resilience evaluation practice.	翻訳日:2024-06-13 22:14:47 公開日:2024-06-11
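The PVF definition in the entry above (the probability that corrupting a given parameter yields an incorrect output) can be illustrated with a small Monte Carlo estimate. This is only a sketch under stated assumptions, not the authors' procedure: the fault model (a single random bit flip in a float32 value), the toy `predict` classifier, and the choice to count an output as incorrect when it differs from the fault-free output are all hypothetical choices made for illustration.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of `value` (assumed SDC fault model)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def predict(weights, x):
    """Hypothetical toy model: binary decision from the sign of a dot product."""
    return int(sum(w * xi for w, xi in zip(weights, x)) > 0)


def estimate_pvf(weights, index, inputs, trials=1000, seed=0):
    """Estimate the PVF of weights[index] by repeated fault injection.

    Each trial corrupts one random bit of the chosen parameter and compares
    the prediction with the fault-free prediction on a randomly drawn input.
    """
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        x = rng.choice(inputs)
        corrupted = list(weights)
        corrupted[index] = flip_bit(weights[index], rng.randrange(32))
        if predict(corrupted, x) != predict(weights, x):
            mismatches += 1
    return mismatches / trials


weights = [0.5, -0.25]
inputs = [[1.0, 1.0], [2.0, -1.0]]
pvf = estimate_pvf(weights, index=0, inputs=inputs)
```

In this toy setting the estimate is a fraction between 0 and 1; a real study would replace `predict` with the model under test and use a fault model matched to the hardware of interest.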
# 潜伏潜伏実験による運動不自由者に対するハンドジェスチャのウェアラブルセンサベースFew-Shot連続学習 Wearable Sensor-Based Few-Shot Continual Learning on Hand Gestures for Motor-Impaired Individuals via Latent Embedding Exploitation ( http://arxiv.org/abs/2405.08969v2 ) ライセンス: Link先を確認	Riyad Bin Rafiq, Weishi Shi, Mark V. Albert,	(参考訳) ハンドジェスチャは、人間とコンピュータのインタラクションの自然な手段を提供し、会話ができない人でも効率的にコミュニケーションできる。既存のジェスチャー認識法は、事前に定義されたジェスチャーに大きく依存するが、運動障害のある個人は、各個人のジェスチャー動作やスタイルに合わせて、新しいジェスチャーを必要とする。異なる人物から採取したジェスチャーサンプルは、健康状態、障害の重症度、腕の動きパターンなどによって分布の変化がある。本稿では,リプレイベースFew-Shot Continual Learning (FSCL) フレームワークにおけるLatent Embedding Exploitation (LEE) 機構を紹介する。本手法は,2つの追加埋め込みから得られた着地内ばらつきとともに,ジェスチャー先行知識として知られる保存された潜伏埋め込みを活用することにより,多彩な潜伏特徴空間を創出する。このように、モデルは、限られたサンプルで高度に可変なジェスチャーで潜時統計構造をキャプチャすることができる。我々はSmartWatch GestureとMotion Gestureデータセットを用いて実験評価を行う。提案手法は,6種類のジェスチャーに対して,1,3,5サンプルを用いて平均57.0%,64.6%,69.3%の検査精度を示す。本手法は、運動障害者がウェアラブルデバイスを活用するのに役立ち、そのユニークな動作様式を学習し、人間とコンピュータのインタラクションやソーシャルコミュニケーションに適用することができる。 https://github.com/riyadRafiq/wearable-latent-embedding-exploitation Hand gestures can provide a natural means of human-computer interaction and enable people who cannot speak to communicate efficiently. Existing hand gesture recognition methods heavily depend on pre-defined gestures, however, motor-impaired individuals require new gestures tailored to each individual's gesture motion and style. Gesture samples collected from different persons have distribution shifts due to their health conditions, the severity of the disability, motion patterns of the arms, etc. In this paper, we introduce the Latent Embedding Exploitation (LEE) mechanism in our replay-based Few-Shot Continual Learning (FSCL) framework that significantly improves the performance of fine-tuning a model for out-of-distribution data. Our method produces a diversified latent feature space by leveraging a preserved latent embedding known as gesture prior knowledge, along with intra-gesture divergence derived from two additional embeddings. 
Thus, the model can capture latent statistical structure in highly variable gestures with limited samples. We conduct an experimental evaluation using the SmartWatch Gesture and the Motion Gesture datasets. The proposed method achieves an average test accuracy of 57.0%, 64.6%, and 69.3% using one, three, and five samples, respectively, for six different gestures. Our method helps motor-impaired persons leverage wearable devices, and their unique styles of movement can be learned and applied in human-computer interaction and social communication. Code is available at: https://github.com/riyadRafiq/wearable-latent-embedding-exploitation	翻訳日:2024-06-13 22:14:47 公開日:2024-06-11
# LOGO:言語協調とグリフ知覚モデルを用いたビデオテキストスポッティング LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model ( http://arxiv.org/abs/2405.19194v2 ) ライセンス: Link先を確認	Hongen Liu, Di Sun, Jiahao Wang, Yi Liu, Gang Pan,	(参考訳) ビデオテキストスポッティング(VTS)は、ビデオ内のテキストインスタンスを同時にローカライズ、認識、追跡することを目的としている。エンド・ツー・エンド方式の限られた認識能力に対処するため、最新の手法では、最先端画像テキストスポッターのゼロショット結果を直接追跡し、印象的な性能を実現している。しかしながら、異なるデータセット間のドメインギャップのため、これらのメソッドは通常、極端なデータセット上の限られたトラッキングトラジェクトリを取得する。特定のデータセット上の微調整トランスフォーマーベースのテキストスポッターは、かなりのトレーニングリソースを犠牲にして、パフォーマンスの向上をもたらす可能性がある。本稿では,従来のテキストスポッターの性能向上を目的とした革新的なフレームワークであるLOGO(Language Collaboration and Glyph Perception Model)を提案する。この目的を達成するために、認識段階における背景雑音からテキストインスタンスを明示的に識別する言語シナジー分類器(LSC)を設計する。特に、言語シナジー分類器は、テキスト領域の正当性に基づいて、テキストコンテンツまたはバックグラウンドコードを出力できるため、言語スコアを計算できる。その後、検出スコアと言語スコアの平均値を取得して融合スコアを算出し、追跡前に検出結果を再スコアする。再描画機構により,LSCはテキストライクな領域をフィルタリングしながら低解像度テキストインスタンスの検出を容易にする。さらに、ノイズの多いテキスト領域の認識精度を高めるために、グリフ監視を導入する。さらに、位置情報と視覚的特徴を効率よく統合し、より識別的な追跡機能を得る視覚的位置混合モジュールを提案する。提案手法の有効性を,公開ベンチマークで検証した。 Video text spotting (VTS) aims to simultaneously localize, recognize and track text instances in videos. To address the limited recognition capability of end-to-end methods, recent methods track the zero-shot results of state-of-the-art image text spotters directly, and achieve impressive performance. However, owing to the domain gap between different datasets, these methods usually obtain limited tracking trajectories on extreme dataset. Fine-tuning transformer-based text spotters on specific datasets could yield performance enhancements, albeit at the expense of considerable training resources. In this paper, we propose a Language Collaboration and Glyph Perception Model, termed LOGO, an innovative framework designed to enhance the performance of conventional text spotters. To achieve this goal, we design a language synergy classifier (LSC) to explicitly discern text instances from background noise in the recognition stage. 
Specifically, the language synergy classifier can output text content or background code based on the legibility of text regions, thus computing language scores. Subsequently, fusion scores are computed by taking the average of detection scores and language scores, and are utilized to re-score the detection results before tracking. Through this re-scoring mechanism, the proposed LSC facilitates the detection of low-resolution text instances while filtering out text-like regions. Moreover, glyph supervision is introduced to enhance the recognition accuracy of noisy text regions. In addition, we propose the visual position mixture module, which can merge the position information and visual features efficiently, and acquire more discriminative tracking features. Extensive experiments on public benchmarks validate the effectiveness of the proposed method.	翻訳日:2024-06-13 22:05:02 publish:2024-06-11
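The re-scoring step described in the entry above (fusion score = average of detection score and language score, applied before tracking) can be sketched as follows. The averaging comes from the abstract; the `keep_threshold` value and the function names are hypothetical and only illustrate how re-scored detections might be filtered.

```python
def fuse_scores(detection_scores, language_scores):
    """Average each detection score with its language score."""
    return [(d + l) / 2 for d, l in zip(detection_scores, language_scores)]


def rescore_and_filter(detection_scores, language_scores, keep_threshold=0.5):
    """Return indices of detections whose fused score reaches the threshold.

    `keep_threshold` is an assumed value chosen for this illustration.
    """
    fused = fuse_scores(detection_scores, language_scores)
    return [i for i, score in enumerate(fused) if score >= keep_threshold]


# Detection 0: low detector confidence but legible text (e.g. low resolution).
# Detection 1: high detector confidence but illegible, text-like background.
fused = fuse_scores([0.5, 0.75], [0.75, 0.0])
kept = rescore_and_filter([0.5, 0.75], [0.75, 0.0])
```

Here the legible low-confidence instance is kept (fused score 0.625) while the text-like region is dropped (fused score 0.375), which mirrors the behaviour the abstract attributes to the re-scoring mechanism.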
# 最大クリーク問題の解法における量子アニーリングアルゴリズムの解析 An Analysis of Quantum Annealing Algorithms for Solving the Maximum Clique Problem ( http://arxiv.org/abs/2406.07587v1 ) ライセンス: Link先を確認	Alessandro Gherardi, Alberto Leporati,	(参考訳) 量子アニーラは、2次非制約二元最適化(QUBO)問題として、あるいは等価なイジング定式化を用いて定式化することで、多くの(おそらくNP困難な)組合せ最適化問題を解くのに使うことができる。本稿では,QUBO問題として表されるグラフ上の最大クリークを求めるD-Wave量子アニーラの能力を解析する。アニーラが課す164ノードの埋め込み限界のため, インスタンスの埋め込みを可能にするグラフ分解について検討した。そこで本稿では, 相補的な最大独立集合問題に対する分解アルゴリズムと, ノード数, クリーク数, 密度, 連結性指標, および解サイズとその他のノード数との比を制御するグラフ生成アルゴリズムを提案する。そして、これらの変数が量子アニーラによって見つかる解の質にどのように影響するかを統計的に分析した。本研究の結果には, 超えるべきでない比および密度の限界に関する推奨事項と, 最適に近い解を得る確率を最大化するために実施すべき一連の予防策および事前分析が含まれる。 Quantum annealers can be used to solve many (possibly NP-hard) combinatorial optimization problems, by formulating them as quadratic unconstrained binary optimization (QUBO) problems or, equivalently, using the Ising formulation. In this paper we analyse the ability of quantum D-Wave annealers to find the maximum clique on a graph, expressed as a QUBO problem. Due to the embedding limit of 164 nodes imposed by the annealer, we conducted a study on graph decomposition to enable instance embedding. We thus propose a decomposition algorithm for the complementary maximum independent set problem, and a graph generation algorithm to control the number of nodes, the number of cliques, the density, the connectivity indices and the ratio of the solution size to the number of other nodes. We then statistically analysed how these variables affect the quality of the solutions found by the quantum annealer. The results of our investigation include recommendations on ratio and density limits not to be exceeded, as well as a series of precautions and a priori analyses to be carried out in order to maximise the probability of obtaining a solution close to the optimum.	翻訳日:2024-06-13 21:45:26 公開日:2024-06-11
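To make the phrase "maximum clique expressed as a QUBO problem" concrete, the sketch below builds a common textbook QUBO for maximum clique: reward every selected node with -1 and penalise every selected pair of nodes that is not joined by an edge. This is an assumed formulation for illustration, not necessarily the one used in the paper, and the exhaustive search merely stands in for an annealer on a tiny graph; the `penalty` value is also an assumption (any value above 1 works for this construction).

```python
from itertools import combinations, product


def max_clique_qubo(num_nodes, edges, penalty=2.0):
    """Build QUBO coefficients as a dict {(i, j): weight} with i <= j.

    Diagonal terms reward selecting a node; off-diagonal terms penalise
    selecting two nodes that are NOT adjacent, so low-energy states are cliques.
    """
    edge_set = {frozenset(e) for e in edges}
    qubo = {(i, i): -1.0 for i in range(num_nodes)}
    for i, j in combinations(range(num_nodes), 2):
        if frozenset((i, j)) not in edge_set:
            qubo[(i, j)] = penalty
    return qubo


def energy(qubo, assignment):
    """Evaluate the QUBO objective for a 0/1 assignment."""
    return sum(w * assignment[i] * assignment[j] for (i, j), w in qubo.items())


def brute_force_minimum(qubo, num_nodes):
    """Exhaustively search all 2^n assignments (only feasible for tiny graphs)."""
    return min(product((0, 1), repeat=num_nodes), key=lambda x: energy(qubo, x))


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
qubo = max_clique_qubo(4, edges)
best = brute_force_minimum(qubo, 4)
```

For this graph the minimum-energy assignment selects nodes 0, 1 and 2, the unique maximum clique, with energy -3. On hardware, the same coefficient dictionary would be embedded onto the annealer's qubit graph, which is where the node limit mentioned in the abstract becomes relevant.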
# AIM: マルチモーダルな大規模言語モデルにインコンテキスト学習を効果的に実施させる AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning ( http://arxiv.org/abs/2406.07588v1 ) ライセンス: Link先を確認	Jun Gao, Qian Qiao, Ziqiang Cao, Zili Wang, Wenjie Li,	(参考訳) In-context Learning(ICL)は、数十億のパラメータを更新することなく、Large Language Models(LLM)が下流タスクで創発的な能力を示すことを可能にする。しかし、MLLM(Multi-modal Large Language Models)の分野では、2つの問題がマルチモーダルICLの適用を妨げる。 (1) 主要なMLLMの多くは単一画像データセットでのみ訓練されており、マルチモーダルなデモを読み取ることができない。 (2) デモの増加に伴い,数千の視覚トークンがハードウェアに大きな負担をかけ,ICL性能を低下させる。予備的な調査では、内部のLLMは、応答を生成するためのマルチモーダルな実演において、言語的モダリティに重点を置いていることが判明した。そこで本稿では, マルチモーダルなデモの画像情報を対応する言語部分の高密度潜在空間に集約する(\textbf{A}ggregating \textbf{I}mage information of \textbf{M}ultimodal demonstrations)ことで, 上記の問題に対処するための, 汎用的で軽量なフレームワークである \textbf{AIM} を提案する。具体的には、AIMはまず凍結したバックボーンMLLMを使用して各画像テキストのデモを読み出し、テキストの上のベクトル表現を抽出する。これらのベクトルは自然に画像とテキストのペアに関する情報を融合させ、AIMはそれらを訓練可能な投影層を介して内部LLMに許容される融合仮想トークンに変換する。最終的に、これらの融合トークンはマルチモーダルなデモの変種として機能し、MLLMに入力され、通常通り現在のクエリに応答する。これらの融合トークンは、画像とテキストのペアのテキストコンポーネントに由来するため、マルチモーダルなデモはほぼ純粋なテキストによるデモに還元され、任意のMLLMにシームレスに適用される。バックボーンのMLLMを凍結することで、AIMはパラメータ効率が良く、下流のテストタスクとは無関係な公開マルチモーダルウェブコーパスでトレーニングする。 In-context learning (ICL) facilitates Large Language Models (LLMs) exhibiting emergent ability on downstream tasks without updating billions of parameters. However, in the area of multi-modal Large Language Models (MLLMs), two problems hinder the application of multi-modal ICL: (1) Most primary MLLMs are only trained on single-image datasets, making them unable to read multi-modal demonstrations. (2) With the demonstrations increasing, thousands of visual tokens highly challenge hardware and degrade ICL performance. During preliminary explorations, we discovered that the inner LLM tends to focus more on the linguistic modality within multi-modal demonstrations to generate responses. 
Therefore, we propose a general and light-weighted framework \textbf{AIM} to tackle the mentioned problems through \textbf{A}ggregating \textbf{I}mage information of \textbf{M}ultimodal demonstrations to the dense latent space of the corresponding linguistic part. Specifically, AIM first uses the frozen backbone MLLM to read each image-text demonstration and extracts the vector representations on top of the text. These vectors naturally fuse the information of the image-text pair, and AIM transforms them into fused virtual tokens acceptable for the inner LLM via a trainable projection layer. Ultimately, these fused tokens function as variants of multi-modal demonstrations, fed into the MLLM to direct its response to the current query as usual. Because these fused tokens stem from the textual component of the image-text pair, a multi-modal demonstration is nearly reduced to a pure textual demonstration, thus seamlessly applying to any MLLMs. With its de facto MLLM frozen, AIM is parameter-efficient and we train it on public multi-modal web corpora which have nothing to do with downstream test tasks.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# タグと正しい:音声認識誤り訂正のための高精度後編集手法 Tag and correct: high precision post-editing approach to correction of speech recognition errors ( http://arxiv.org/abs/2406.07589v1 ) ライセンス: Link先を確認	Tomasz Ziętkiewicz,	(参考訳) 本稿では,後編集による音声認識誤り訂正問題に対する新しいアプローチを提案する。 ASR(Automatic Speech Recognition)仮説の単語を単語単位で修正する方法を学ぶニューラルネットワークタグと、タグによって返される修正を適用する修正モジュールとから構成される。提案手法はアーキテクチャによらず,任意のASRシステムに適用可能である。これは本番環境では特に重要であり、エラー訂正モデルによる新しいミスの導入を避けることは、全体的な結果の純利よりも重要である可能性がある。その結果,提案モデルの性能は従来の手法に匹敵するが,トレーニングに要するリソースははるかに小さいため,推論遅延とトレーニング時間の両方が他の手法の使用を制限する重要な要因である産業用途に適していることがわかった。 This paper presents a new approach to the problem of correcting speech recognition errors by means of post-editing. It consists of using a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis word by word and a corrector module that applies corrections returned by the tagger. The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected. This is especially crucial in production environments, where avoiding the introduction of new mistakes by the error correction model may be more important than the net gain in overall results. The results show that the performance of the proposed error correction models is comparable with previous approaches while requiring much smaller resources to train, which makes it suitable for industrial applications, where both inference latency and training times are critical factors that limit the use of other techniques.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
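The two-stage design in the entry above (a tagger that labels each hypothesis word, then a corrector that applies the labels) can be sketched with a small corrector. The tag vocabulary used here (`KEEP`, `DELETE`, `REPLACE_<word>`) is hypothetical and chosen only for illustration; the abstract does not specify the tag set, and the neural tagger itself is replaced by hand-written tags.

```python
def apply_tags(words, tags):
    """Apply one edit tag per hypothesis word and return the corrected words.

    Assumed (hypothetical) tag set:
      KEEP            leave the word unchanged
      DELETE          drop the word
      REPLACE_<word>  substitute the given word
    """
    if len(words) != len(tags):
        raise ValueError("expected exactly one tag per word")
    corrected = []
    for word, tag in zip(words, tags):
        if tag == "KEEP":
            corrected.append(word)
        elif tag == "DELETE":
            continue
        elif tag.startswith("REPLACE_"):
            corrected.append(tag[len("REPLACE_"):])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return corrected


hypothesis = ["turn", "of", "the", "the", "light"]
tags = ["KEEP", "REPLACE_off", "KEEP", "DELETE", "KEEP"]
corrected = apply_tags(hypothesis, tags)
```

Because every change is tied to an explicit tag, words tagged `KEEP` are guaranteed to pass through untouched, which is one way to read the abstract's emphasis on high-precision control over which errors get corrected.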
# StreamPrompt: 効率的なストリーム学習のための学習可能なプロンプト誘導データ選択 StreamPrompt: Learnable Prompt-guided Data Selection for Efficient Stream Learning ( http://arxiv.org/abs/2406.07590v1 ) ライセンス: Link先を確認	Tongjun Shi, Shuhao Zhang,	(参考訳) ストリーム学習(SL)は、従来の継続学習(CL)とは別物として、連続したデータストリームに迅速に適応するモデルを必要とする。近年のSL法では、トレーニング用のデータサブセットを選択することで効率性が強調されているが、データの重要性の変化に効果的に適応できない静的なルールベースの選択アルゴリズムに依存しているため、しばしば苦労する。本稿では,動的で学習可能なプロンプトによってデータ選択を強化するStreamPromptを紹介する。これらの動的なプロンプトは、モデル推論を導くこと以上の2つの目的を果たす。 1)データ選択の最適化、及び 2) リハーサルバッファの更新を案内する。このアプローチは、連続データストリームの処理における適応性と計算効率の課題に対処する。さらに、StreamPromptは、プロンプト学習の効率を高めるメカニズムであるPrompt Attunementを導入した。視覚変換器からの注意層を活用し、それらの出力をゲートユニットとソフトに結合することにより、Prompt Attunementは最小限の計算資源でプロンプトを洗練する。総合的な評価では、StreamPromptは最先端よりも優れたパフォーマンスを示し、精度の大幅な向上とトレーニング時間の削減を達成した。これらの結果はStreamPromptの有効性と効率を裏付け、SLの進化する要求に対するスケーラブルで効果的なソリューションとしての可能性を確立した。私たちのコードは https://github.com/intellistream/Efficient-Stream-Learning で公開されています。 Stream Learning (SL) requires models to rapidly adapt to continuous data streams, setting it apart from traditional Continual Learning (CL). Recent SL methods emphasize efficiency by selecting data subsets for training, but they often struggle due to their reliance on static, rule-based selection algorithms that cannot effectively adapt to the changing importance of data. In this work, we introduce StreamPrompt, a method that enhances data selection through dynamic, learnable prompts. These dynamic prompts serve two purposes beyond guiding model inference: 1) optimizing data selection, and 2) guiding updates to the rehearsal buffer. This approach addresses the challenges of adaptability and computational efficiency in processing continuous data streams. Moreover, StreamPrompt introduces Prompt Attunement, a mechanism that enhances the efficiency of prompt learning. By leveraging attention layers from vision transformers and softly combining their outputs with a gate unit, Prompt Attunement refines prompts with minimal computational resources. 
Comprehensive evaluations demonstrate StreamPrompt's superior performance over the state of the art, with significant improvements in accuracy and reductions in training time. These results underscore the efficacy and efficiency of StreamPrompt, establishing its potential as a scalable and effective solution for the evolving demands of SL. Our code is available at https://github.com/intellistream/Efficient-Stream-Learning.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# MambaLRP: Selective State Space Sequence Modelの説明 MambaLRP: Explaining Selective State Space Sequence Models ( http://arxiv.org/abs/2406.07592v1 ) ライセンス: Link先を確認	Farnoush Rezaei Jafari, Grégoire Montavon, Klaus-Robert Müller, Oliver Eberle,	(参考訳) 選択状態空間系列モデル(マンバモデルと呼ばれる)を用いた最近のシーケンスモデリング手法は、関心が高まりつつある。これらのモデルは、線形時間における長いシーケンスの効率的な処理を可能にし、言語モデリングのような幅広いアプリケーションで急速に採用され、有望な性能を示す。現実のシナリオにおける信頼性の高い利用を促進するためには、透明性を高めることが重要です。私たちの研究は、説明可能性、特にLayer-wise Relevance Propagation(LRP)をMambaアーキテクチャにもたらすことで、この重要なギャップを埋めます。関係保存の公理に導かれ、マムバ建築の特定の構成要素を特定し、不誠実な説明を引き起こす。この問題を解決するため,LRP フレームワーク内の新しいアルゴリズムである MambaLRP を提案する。提案手法は理論的に健全であり,多種多様なモデルやデータセットにまたがる最先端の説明性能を実現するのに優れている。さらに、MambaLRPは、Mambaアーキテクチャのより深い検査を促進し、様々なバイアスを明らかにし、それらの重要性を評価する。また、マンバ模型の長距離能力に関する以前の憶測の分析も可能である。 Recent sequence modeling approaches using Selective State Space Sequence Models, referred to as Mamba models, have seen a surge of interest. These models allow efficient processing of long sequences in linear time and are rapidly being adopted in a wide range of applications such as language modeling, demonstrating promising performance. To foster their reliable use in real-world scenarios, it is crucial to augment their transparency. Our work bridges this critical gap by bringing explainability, particularly Layer-wise Relevance Propagation (LRP), to the Mamba architecture. Guided by the axiom of relevance conservation, we identify specific components in the Mamba architecture, which cause unfaithful explanations. To remedy this issue, we propose MambaLRP, a novel algorithm within the LRP framework, which ensures a more stable and reliable relevance propagation through these components. Our proposed method is theoretically sound and excels in achieving state-of-the-art explanation performance across a diverse range of models and datasets. Moreover, MambaLRP facilitates a deeper inspection of Mamba architectures, uncovering various biases and evaluating their significance. 
It also enables the analysis of previous speculations regarding the long-range capabilities of Mamba models.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# アクティブ推論を用いた持続可能な資源管理のモデリング Modeling Sustainable Resource Management using Active Inference ( http://arxiv.org/abs/2406.07593v1 ) ライセンス: Link先を確認	Mahault Albarracin, Ines Hipolito, Maria Raffa, Paul Kinghorn,	(参考訳) 能動推論は,生物および人工エージェントの適応行動と意思決定をシミュレートするのに役立つ。本研究は, アクティブ推論, ウェルビーイング, レジリエンス, 持続可能性の関係を探求する先行研究に基づいて, 静的環境と動的環境の両方において, 持続可能な資源管理戦略を学習するエージェントの計算モデルを提案する。エージェントの行動は、環境力学に関する信念に基づいて、優先的な嗜好によって表される自身の幸福を最適化することから生じる。静的な環境では、エージェントはそのニーズを満たすためにリソースを一貫して消費することを学ぶ。エージェントの動作に基づいてリソースが枯渇し、補給される動的な環境では、エージェントは、その動作に適応して、短期的なリソース可用性と即時的な要求のバランスをとる。これは、環境条件の変化に直面した場合に、アクティブな推論が持続的で回復力のある行動を引き起こすことを示す。我々は,モデルの意味,その限界,さらに複雑なエージェントと環境の相互作用を統合するための今後の方向性について論じる。我々の研究は、持続可能な行動を理解し形成する活動的推論の可能性を強調している。 Active inference helps us simulate adaptive behavior and decision-making in biological and artificial agents. Building on our previous work exploring the relationship between active inference, well-being, resilience, and sustainability, we present a computational model of an agent learning sustainable resource management strategies in both static and dynamic environments. The agent's behavior emerges from optimizing its own well-being, represented by prior preferences, subject to beliefs about environmental dynamics. In a static environment, the agent learns to consistently consume resources to satisfy its needs. In a dynamic environment where resources deplete and replenish based on the agent's actions, the agent adapts its behavior to balance immediate needs with long-term resource availability. This demonstrates how active inference can give rise to sustainable and resilient behaviors in the face of changing environmental conditions. We discuss the implications of our model, its limitations, and suggest future directions for integrating more complex agent-environment interactions. Our work highlights active inference's potential for understanding and shaping sustainable behaviors.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# MLLMGuard:マルチモーダル大言語モデルのための多次元安全評価スイート MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models ( http://arxiv.org/abs/2406.07594v1 ) ライセンス: Link先を確認	Tianle Gu, Zeyang Zhou, Kexin Huang, Dandan Liang, Yixu Wang, Haiquan Zhao, Yuanqi Yao, Xingge Qiao, Keqing Wang, Yujiu Yang, Yan Teng, Yu Qiao, Yingchun Wang,	(参考訳) LLM(Large Language Models)やMLLM(Multimodal Large Language Models)の顕著な進歩によって、多様体のタスクにおける印象的な能力が示される。しかし、MLLMの実践的な応用シナリオは複雑であり、悪意のある命令に晒され、それによって安全性のリスクが生じる。現在のベンチマークには特定の安全性の考慮事項が含まれているが、包括的なカバレッジが欠如しており、必要な厳密さと堅牢性を示すことができないことが多い。例えば、評価対象と評価対象のモデルの両方にGPT-4Vを用いるという一般的な実践は、自分自身の反応に偏りを示す傾向があるため、信頼性に欠ける。本稿では,MLLMの多次元安全性評価スイートであるMLLMGuardについて述べる。 MLLMGuardの評価は、2つの言語(英語と中国語)と5つの重要な安全次元(Privacy, Bias, Toxicity, Truthfulness, Legality)を包括的にカバーしている。これらの次元に着目して、評価データセットは主にソーシャルメディアなどのプラットフォームから作成されており、テキストベースおよび画像ベースのレッドチーム技術と、人間の専門家による巧妙なアノテーションを統合している。これにより、オープンソースのデータセットを使用する際のデータ漏洩による不正確な評価が防止され、ベンチマークの品質と課題の性質が保証される。さらに、完全に自動化された軽量評価器であるGuardRankが開発され、GPT-4よりも高い評価精度を実現している。 13種類の先進モデルに対する評価結果は,MLLMが安全かつ責任を負うことができるまでには,まだかなりの道のりを歩んでいることを示唆している。 Powered by remarkable advancements in Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities in manifold tasks. However, the practical application scenarios of MLLMs are intricate, exposing them to potential malicious instructions and thereby posing safety risks. While current benchmarks do incorporate certain safety considerations, they often lack comprehensive coverage and fail to exhibit the necessary rigor and robustness. For instance, the common practice of employing GPT-4V as both the evaluator and a model to be evaluated lacks credibility, as it tends to exhibit a bias toward its own responses. In this paper, we present MLLMGuard, a multidimensional safety evaluation suite for MLLMs, including a bilingual image-text evaluation dataset, inference utilities, and a lightweight evaluator. 
MLLMGuard's assessment comprehensively covers two languages (English and Chinese) and five important safety dimensions (Privacy, Bias, Toxicity, Truthfulness, and Legality), each with corresponding rich subtasks. Focusing on these dimensions, our evaluation dataset is primarily sourced from platforms such as social media, and it integrates text-based and image-based red teaming techniques with meticulous annotation by human experts. This can prevent inaccurate evaluation caused by data leakage when using open-source datasets and ensures the quality and challenging nature of our benchmark. Additionally, a fully automated lightweight evaluator termed GuardRank is developed, which achieves significantly higher evaluation accuracy than GPT-4. Our evaluation results across 13 advanced models indicate that MLLMs still have a substantial journey ahead before they can be considered safe and responsible.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# VulDetectBench: 大規模言語モデルによる脆弱性検出の深い機能評価 VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models ( http://arxiv.org/abs/2406.07595v1 ) ライセンス: Link先を確認	Yu Liu, Mingxin Yang, Yu Xie, Ping Chen, Xiaojin Zhang, Wei Chen,	(参考訳) 大規模言語モデル(LLM)は、大量のプログラムコードを含むトレーニングコーパスを持ち、モデルのコード理解と生成能力を大幅に改善する。しかし、コードに関するより具体的なタスクであるプログラムの脆弱性の検出と、このより専門的なシナリオにおけるLLMの性能評価に関する包括的な研究は、いまだに不足している。脆弱性分析における一般的な課題に対処するため,本研究では,LLMの脆弱性検出機能を評価するために特別に設計された,新たなベンチマークであるVulDetectBenchを紹介した。このベンチマークは、LLMが脆弱性を特定し、分類し、位置を特定する能力を、難易度が段階的に上がる5つのタスクを通じて総合的に評価している。我々は17モデル(オープンソースとクローズドソースの両方)の性能を評価し、既存のモデルでは脆弱性の識別と分類に関連するタスクにおいて80%以上の精度を達成できるが、その一方で、特定のより詳細な脆弱性分析タスクでは、30%未満の精度で不足しており、プロの脆弱性マイニングに有用な補助情報を提供することは困難である。本ベンチマークでは,脆弱性検出の特定のタスクにおいて,様々なLLMの能力評価を効果的に行うとともに,コードセキュリティの重要領域における今後の研究と改善の基盤となる。 VulDetectBenchは https://github.com/Sweetaroo/VulDetectBench で公開されている。 Large Language Models (LLMs) have training corpora containing large amounts of program code, greatly improving the model's code comprehension and generation capabilities. However, sound comprehensive research on detecting program vulnerabilities, a more specific task related to code, and evaluating the performance of LLMs in this more specialized scenario is still lacking. To address common challenges in vulnerability analysis, our study introduces a new benchmark, VulDetectBench, specifically designed to assess the vulnerability detection capabilities of LLMs. The benchmark comprehensively evaluates LLMs' ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty. 
We evaluate the performance of 17 models (both open- and closed-source) and find that while existing models can achieve over 80% accuracy on tasks related to vulnerability identification and classification, they still fall short on specific, more detailed vulnerability analysis tasks, with less than 30% accuracy, making it difficult to provide valuable auxiliary information for professional vulnerability mining. Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security. VulDetectBench is publicly available at https://github.com/Sweetaroo/VulDetectBench.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# 最小フレーム平均化による同変性:より多くの対称性と効率性に向けて Equivariance via Minimal Frame Averaging for More Symmetries and Efficiency ( http://arxiv.org/abs/2406.07598v1 ) ライセンス: Link先を確認	Yuchao Lin, Jacob Helwig, Shurui Gui, Shuiwang Ji,	(参考訳) フレーム平均化による機械学習システムにおける同変性の実現を検討する。現在のフレーム平均化法は、大きなフレーム上でのコストのかかる和や、近似的な同変性しか得られないサンプリングベースのアプローチに依存している。本稿では,厳密に同変であることが証明可能な最小フレームを構成するための数学的枠組みである最小フレーム平均化(MFA, Minimal Frame Averaging)を提案する。 MFAの一般基盤はまた、時空の対称性を記述するローレンツ群や複素値領域のユニタリ群など、これまで考えられていたよりも多くの群にフレーム平均化を拡張できる。その結果,MFAによる対称性の符号化は,$n$体シミュレーション,衝突型加速器物理におけるトップタグ付け,緩和エネルギー予測など,多種多様なタスクにまたがって効率と効果が示された。私たちのコードはhttps://github.com/divelab/MFAで公開されています。 We consider achieving equivariance in machine learning systems via frame averaging. Current frame averaging methods involve a costly sum over large frames or rely on sampling-based approaches that only yield approximate equivariance. Here, we propose Minimal Frame Averaging (MFA), a mathematical framework for constructing provably minimal frames that are exactly equivariant. The general foundations of MFA also allow us to extend frame averaging to more groups than previously considered, including the Lorentz group for describing symmetries in space-time, and the unitary group for complex-valued domains. Results demonstrate the efficiency and effectiveness of encoding symmetries via MFA across a diverse range of tasks, including $n$-body simulation, top tagging in collider physics, and relaxed energy prediction. Our code is available at https://github.com/divelab/MFA.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# CTIBench:サイバー脅威インテリジェンスにおけるLLMの評価ベンチマーク CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence ( http://arxiv.org/abs/2406.07599v1 ) ライセンス: Link先を確認	Md Tanvirul Alam, Dipkamal Bhushl, Le Nguyen, Nidhi Rastogi,	(参考訳) サイバー脅威インテリジェンス(CTI)は、今日のサイバーセキュリティにおいて重要な存在であり、進化を続けるサイバー脅威を理解し、緩和するための重要な洞察を提供する。近年のLarge Language Models (LLM) の台頭は、この領域における可能性を示しているが、信頼性、正確性、幻覚に関する懸念は続いている。既存のベンチマークはLLMの一般的な評価を提供するが、CTI固有のタスクの実践的および応用的な側面に対処するベンチマークは存在しない。このギャップを埋めるために、我々はCTIアプリケーションにおけるLLMの性能を評価するために設計されたベンチマークであるCTIBenchを紹介する。 CTIBenchには、サイバー脅威の状況においてLLMが取得した知識を評価することに焦点を当てた複数のデータセットが含まれている。これらのタスクに対するいくつかの最先端モデルの評価は、CTIコンテキストにおけるその強みと弱みに関する洞察を与え、CTIにおけるLLM能力のより深い理解に寄与する。 Cyber threat intelligence (CTI) is crucial in today's cybersecurity landscape, providing essential insights to understand and mitigate the ever-evolving cyber threats. The recent rise of Large Language Models (LLMs) has shown potential in this domain, but concerns about their reliability, accuracy, and hallucinations persist. While existing benchmarks provide general evaluations of LLMs, there are no benchmarks that address the practical and applied aspects of CTI-specific tasks. To bridge this gap, we introduce CTIBench, a benchmark designed to assess LLMs' performance in CTI applications. CTIBench includes multiple datasets focused on evaluating knowledge acquired by LLMs in the cyber-threat landscape. Our evaluation of several state-of-the-art models on these tasks provides insights into their strengths and weaknesses in CTI contexts, contributing to a better understanding of LLM capabilities in CTI.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# 中間回路計測における読み出し誤差低減とフィードフォワード Readout Error Mitigation for Mid-Circuit Measurements and Feedforward ( http://arxiv.org/abs/2406.07611v1 ) ライセンス: Link先を確認	Jin Ming Koh, Dax Enshan Koh, Jayne Thompson,	(参考訳) 現在の量子コンピューティングプラットフォームはリードアウトエラーを受けており、デバイスが計測結果の故障を報告している。中間回路測定とフィードフォワードの回路では、読み出しノイズは、不正確な条件量子演算をショット単位で適用することができる。後処理で作用する終端測定のための標準読み出し誤差軽減法は、この文脈では十分ではない。本稿では,回路の任意の層数とフィードフォワードを含む回路上の期待値に対する,回路深さ0と2量子ゲートカウントコストで読み出し誤差を緩和する一般的な手法を提案する。このプロトコルは、エラーチャネルの対称性化とフィードフォワードデータの確率的ビットフリップを量子軌道のアンサンブル上で平均化するためにゲートツイリングの形式を使用する。緩和された推定器は偏りがなく、サンプリングオーバーヘッドは${\sim} 1 / (1 - 2 r)^m$ for $m$ total measured and characteristic readout error rate $r$である。本稿では,動的量子ビットリセット,浅部GHZ状態準備,多段量子状態テレポーテーションなど,実用上の興味のあるフィードフォワード回路の例に対して,超伝導量子プロセッサのエラーを最大60 %削減する手法の有効性を実証する。 Current-day quantum computing platforms are subject to readout errors, in which faulty measurement outcomes are reported by the device. On circuits with mid-circuit measurements and feedforward, readout noise can cause incorrect conditional quantum operations to be applied on a per-shot basis. Standard readout error mitigation methods for terminal measurements which act in post-processing do not suffice in this context. Here we present a general method for readout error mitigation for expectation values on circuits containing an arbitrary number of layers of mid-circuit measurements and feedforward, at zero circuit depth and two-qubit gate count cost. The protocol uses a form of gate twirling for symmetrization of the error channels and probabilistic bit-flips in feedforward data to average over an ensemble of quantum trajectories. The mitigated estimator is unbiased and has a sampling overhead of ${\sim} 1 / (1 - 2 r)^m$ for $m$ total measurements and characteristic readout error rate $r$ per measurement. 
We demonstrate the effectiveness of our method, obtaining up to a ${\sim} 60\%$ reduction in error on superconducting quantum processors for several examples of feedforward circuits of practical interest, including dynamic qubit resets, shallow-depth GHZ state preparation, and multi-stage quantum state teleportation.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# 散逸系における正規およびカオス古典力学の量子的区別の破壊 Breakdown of the quantum distinction of regular and chaotic classical dynamics in dissipative systems ( http://arxiv.org/abs/2406.07616v1 ) ライセンス: Link先を確認	David Villaseñor, Lea F. Santos, Pablo Barberis-Blostein,	(参考訳) グローブ・ヘイク・ソマーズ予想(Grobe-Haake-Sommers、GHS)は、ボヒガス・ジョノニ・シュミット予想を散逸系に一般化し、古典的なカオス系と、ジニブレのアンサンブルによって予測されるレベル反発を示す量子スペクトルを結びつける。ここでは、GHS予想が実験的関心のスピンボソンモデルであるオープンディックモデルに当てはまらないことを示す。驚くべきことに、オープン量子モデルがジニブレ準位統計を示す場合、古典的極限におけるカオス構造の証拠は必ずしも見つからない。この結果は、GHS予想の普遍性に挑戦し、オープン量子系におけるスペクトル相関の源は何かという疑問を提起する。 The Grobe-Haake-Sommers (GHS) conjecture generalizes the Bohigas-Giannoni-Schmit conjecture to dissipative systems, connecting classically chaotic systems with quantum spectra that exhibit level repulsion as predicted by Ginibre ensembles. Here, we show that the GHS conjecture does not hold for the open Dicke model, which is a spin-boson model of experimental interest. Surprisingly, where the open quantum model shows Ginibre level statistics, we do not always find evidence of chaotic structures in the classical limit. This result challenges the universality of the GHS conjecture and raises the question of what is the source of spectral correlations in open quantum systems.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# 2次元サブ波長アレイにおける不純物を用いた協調センシング Cooperative Sensing with Impurities in a Two-Dimensional Subwavelength Array ( http://arxiv.org/abs/2406.07619v1 ) ライセンス: Link先を確認	Oliver August Dall'Alba Sandberg, Stefan Ostermann, Susanne F. Yelin,	(参考訳) 本稿では,2次元サブ波長原子配列に不純物として埋め込まれた2つの散逸結合した遠方原子をベースとした多用途量子センシングプロトコルを提案する。アレイはエミッタ光の導波路として機能し、より効率的な人口移動を可能にする協調的な拡張を生み出す。不純物原子の1つの集団を監視することにより、エミッタの共鳴周波数の周波数シフトを検出することができる。我々は、達成可能な感度と様々なシステムパラメータへの依存性を解析的に推定する。提案プロトコルは, 様々な環境要因や摂動に対して堅牢であり, 実環境における適用性を高めている。 We propose a versatile quantum sensing protocol based on two dissipatively coupled distant atoms embedded as impurities in a two-dimensional sub-wavelength atomic array. The array acts as a waveguide for the emitter light, creating cooperative enhancement that allows for more efficient population transfer. By monitoring the population of one of the impurity atoms, it is possible to detect frequency shifts in the emitters' resonance frequencies. We analytically estimate achievable sensitivities as well as the dependence on various system parameters. The proposed protocol is robust against various environmental factors and perturbations, which enhances its applicability in real-world scenarios.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# 高次元非エルミート系における束縛状態の幾何の調整 Tailoring Bound State Geometry in High-Dimensional Non-Hermitian Systems ( http://arxiv.org/abs/2406.07626v1 ) ライセンス: Link先を確認	Ao Yang, Zixi Fang, Kai Zhang, Chen Fang,	(参考訳) 非エルミート表皮効果(NHSE)はその非相反性のため、不純物束縛状態の出現の障壁を生じると一般に信じられている。本稿では,2次元以上の次元において,幾何依存の表皮効果が存在するとこの障壁が取り除かれ、無限小の不純物ポテンシャルでさえも,このタイプの非エルミート系において束縛状態を閉じ込められることを見出した。ブロッホ・サドル点の周囲の束縛状態を調べることで、非エルミート性は束縛状態の等方性を乱し、凹型のダンベル状束縛状態が生じることが分かる。我々の研究は、高次元非エルミート系における凹と凸の間の束縛状態の幾何学的遷移を明らかにする。 It is generally believed that the non-Hermitian skin effect (NHSE), due to its non-reciprocal nature, creates barriers for the appearance of impurity bound states. In this paper, we find that in two and higher dimensions, the presence of a geometry-dependent skin effect eliminates this barrier, such that even an infinitesimal impurity potential can confine bound states in this type of non-Hermitian system. By examining bound states around Bloch saddle points, we find that non-Hermiticity can disrupt the isotropy of bound states, resulting in concave dumbbell-shaped bound states. Our work reveals a geometric transition of bound states between concavity and convexity in high-dimensional non-Hermitian systems.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# Qureed QuReed ( http://arxiv.org/abs/2406.07638v1 ) ライセンス: Link先を確認	Simon Sekavčnik, Kareem H. El-Safty, Janis Nötzel,	(参考訳) 提案するQuReedは,量子理論と実験コミュニティ,エンジニアリングのギャップを埋めるために設計された,オープンソースの量子シミュレーションフレームワークである。量子力学が成熟し、量子コンピューティング以上の大きな可能性を秘めているため、物理的に正確なシミュレーションの必要性が重要になる。 QuReedはピアレビューシミュレーションモデルを提供し、量子通信プロトコルやアプリケーションを探索するための信頼性の高いツールを研究者やエンジニアに提供する。理論と実験のクロストークを促進することで、QuReedは分野の進歩を加速し、通信業界における量子力学の変換力を解き放つことを目指している。ユーザフレンドリなPythonインターフェースと包括的なドキュメントは、広範囲のアクセシビリティとユーザビリティを保証する。 We present QuReed, an open-source quantum simulation framework designed to bridge gaps between quantum theory, experimental community and engineering. With Quantum Mechanics maturing and holding significant potential beyond quantum computing, the need for physically accurate simulations becomes critical. QuReed offers peer-reviewed simulation models, providing researchers and engineers with reliable tools for exploring quantum communication protocols and applications. By facilitating cross-talk between theory and experiments, QuReed aims to accelerate progress in the field and unlock the transformative power of quantum mechanics in the communications industry. Its user-friendly Python interface and comprehensive documentation ensure widespread accessibility and usability, making QuReed a valuable resource for advancing quantum communication technologies.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# 埋め込みモデルはいつ、他のモデルよりも確率が高いのか? When is an Embedding Model More Promising than Another? ( http://arxiv.org/abs/2406.07640v1 ) ライセンス: Link先を確認	Maxime Darrin, Philippe Formont, Ismail Ben Ayed, Jackie CK Cheung, Pablo Piantanida,	(参考訳) 埋め込みは機械学習において中心的な役割を担い、任意のオブジェクトを数値表現に投影することで、様々な下流タスクの実行に活用することができる。埋め込みモデルの評価は、典型的には、下流タスクを利用したドメイン固有の経験的アプローチに依存している。しかし、これらの評価を行うための適切な大規模で代表的なデータセットを取得することは必ずしも可能ではなく、違法に高価で時間を要することを証明できる。本稿では,組込み装置の評価に統一的なアプローチを提案する。まず, 埋め込みモデルを比較し, 十分性および情報性の概念に基づく理論的基礎を確立する。次に、これらの概念を活用して、抽出可能な比較基準(情報充足性)を考案し、タスクに依存しない自己監督的なランク付け手順を導出する。提案手法は,自然言語処理と分子生物学の両方において,様々な下流作業を容易にするために,モデル埋め込みの能力と密接に一致していることを実験的に実証した。これは、実践者がモデルトライアルを優先順位付けするための貴重なツールを効果的に提供します。 Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately large and representative datasets for conducting these assessments is not always viable and can prove to be prohibitively expensive and time-consuming. In this paper, we present a unified approach to evaluate embedders. First, we establish theoretical foundations for comparing embedding models, drawing upon the concepts of sufficiency and informativeness. We then leverage these concepts to devise a tractable comparison criterion (information sufficiency), leading to a task-agnostic and self-supervised ranking procedure. We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials.	翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
# ノード埋め込みのための人間が理解可能な説明の生成 Generating Human Understandable Explanations for Node Embeddings ( http://arxiv.org/abs/2406.07642v1 ) ライセンス: Link先を確認	Zohair Shafi, Ayan Chatterjee, Tina Eliassi-Rad,	(参考訳) ノード埋め込みアルゴリズムはグラフ内のノードの低次元潜在表現を生成する。これらの埋め込みは、ノード分類やリンク予測といった下流タスクによく使用される。本稿では,次の2つの質問について検討する: (Q1) 各埋め込み次元を人間の理解可能なグラフ特徴(例えば,次数,クラスタリング係数,PageRank)で説明できるか。 (Q2) 既存のノード埋め込みアルゴリズムをどう修正すれば、人間の理解可能なグラフ特徴で簡単に説明できる埋め込みを生成することができるのか? Q1への回答はイエスであり、Q2に答えるためにXM(eXplain eMbeddingの略)と呼ばれる新しいフレームワークを導入する。 XMの重要な側面は、生成された説明の核ノルムを最小化することである。核ノルムを最小化することにより、生成した説明のエントロピーの下界を最小化することを示す。我々は,XMを実世界の様々なグラフ上でテストし,XMが既存のノード埋め込み手法の性能を保っているだけでなく,その説明可能性も向上していることを示す。 Node embedding algorithms produce low-dimensional latent representations of nodes in a graph. These embeddings are often used for downstream tasks, such as node classification and link prediction. In this paper, we investigate the following two questions: (Q1) Can we explain each embedding dimension with human-understandable graph features (e.g., degree, clustering coefficient, and PageRank)? (Q2) How can we modify existing node embedding algorithms to produce embeddings that can be easily explained by human-understandable graph features? We find that the answer to Q1 is yes and introduce a new framework called XM (short for eXplain eMbedding) to answer Q2. A key aspect of XM involves minimizing the nuclear norm of the generated explanations. We show that by minimizing the nuclear norm, we minimize the lower bound on the entropy of the generated explanations. We test XM on a variety of real-world graphs and show that XM not only preserves the performance of existing node embedding methods, but also enhances their explainability.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# SSNVC:暗黙的な時間情報を用いた単一ストリームニューラルビデオ圧縮 SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information ( http://arxiv.org/abs/2406.07645v1 ) ライセンス: Link先を確認	Feng Wang, Haihang Ruan, Zhihuang Xie, Ronggang Wang, Xiangyu Yue,	(参考訳) 近年、ニューラルビデオ圧縮(NVC)技術は、従来の最良の非可逆ビデオコーデックを超える性能を達成している。しかし、既存のNVC手法の多くは、正確なコンテキスト特徴を生成するために、MV(Motion Vector)の送信に大きく依存しており、これには次の欠点がある。 (1) MVの圧縮と送信には,モジュールを冗長にする特殊なMVエンコーダとデコーダが必要である。 (2)MVエンコーダデコーダが存在するため,訓練戦略は複雑である。本稿では,複雑なMVエンコーダ・デコーダ構造を除去し,一段階のトレーニング戦略を用いる,新しいSingle Stream NVCフレームワーク(SSNVC)を提案する。 SSNVCは、現在のエントロピーモデルに以前のエントロピーモデル特徴を追加し、以前の2フレームを使用してデコーダ側で予測された動き情報を生成することで、時間情報を暗黙的に利用する。さらに,フレーム生成装置を改良し,高品質な再構成フレームを生成する。実験により、SSNVCは複数のベンチマークで最先端のパフォーマンスを達成でき、圧縮プロセスとトレーニングプロセスを大幅に単純化できることが示された。 Recently, Neural Video Compression (NVC) techniques have achieved remarkable performance, even surpassing the best traditional lossy video codec. However, most existing NVC methods heavily rely on transmitting Motion Vectors (MVs) to generate accurate contextual features, which has the following drawbacks. (1) Compressing and transmitting MVs requires a specialized MV encoder and decoder, which makes modules redundant. (2) Due to the existence of the MV Encoder-Decoder, the training strategy is complex. In this paper, we present a novel Single Stream NVC framework (SSNVC), which removes the complex MV Encoder-Decoder structure and uses a one-stage training strategy. SSNVC implicitly uses temporal information by adding the previous entropy model feature to the current entropy model and using the previous two frames to generate predicted motion information at the decoder side. Besides, we enhance the frame generator to generate higher quality reconstructed frames. Experiments demonstrate that SSNVC can achieve state-of-the-art performance on multiple benchmarks, and can greatly simplify the compression process as well as the training process.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# 音声強調のための事前学習特徴誘導拡散モデル Pre-training Feature Guided Diffusion Model for Speech Enhancement ( http://arxiv.org/abs/2406.07646v1 ) ライセンス: Link先を確認	Yiyuan Yang, Niki Trigoni, Andrew Markham,	(参考訳) 音声強調は、雑音の多い環境下での音声の明瞭さと明瞭さを著しく改善し、コミュニケーションと聴取経験を向上する。本稿では,既存の識別モデルと生成モデルの限界に対処する,効率的な音声強調に適した,事前学習型特徴誘導拡散モデルを提案する。スペクトル特徴を可変オートエンコーダ (VAE) に統合し, 逆処理の指導に事前学習した特徴を活用することにより, サンプリングステップの合理化に決定論的離散積分法 (DDIM) を併用することにより, 効率と音声強調品質を向上させる。異なるSNRを持つ2つの公開データセットの最先端結果を示すため、我々のモデルは効率とロバスト性において他のベースラインよりも優れている。提案手法は, 性能を最適化するだけでなく, 計算要求を増大させることなく, 実用的な展開能力を向上する。 Speech enhancement significantly improves the clarity and intelligibility of speech in noisy environments, improving communication and listening experiences. In this paper, we introduce a novel pretraining feature-guided diffusion model tailored for efficient speech enhancement, addressing the limitations of existing discriminative and generative models. By integrating spectral features into a variational autoencoder (VAE) and leveraging pre-trained features for guidance during the reverse process, coupled with the utilization of the deterministic discrete integration method (DDIM) to streamline sampling steps, our model improves efficiency and speech enhancement quality. Demonstrating state-of-the-art results on two public datasets with different SNRs, our model outshines other baselines in efficiency and robustness. The proposed method not only optimizes performance but also enhances practical deployment capabilities, without increasing computational demands.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# FP-Inconsistent:Browser Fingerprintの不整合を用いた侵入ボットの検出 FP-Inconsistent: Detecting Evasive Bots using Browser Fingerprint Inconsistencies ( http://arxiv.org/abs/2406.07647v1 ) ライセンス: Link先を確認	Hari Venugopalan, Shaoor Munir, Shuaib Ahmed, Tangbaihe Wang, Samuel T. King, Zubair Shafiq,	(参考訳) ブラウザの指紋認証がますますボット検出に使われている中、ボットは回避のために指紋を変更し始めている。本研究では,回避ボットの大規模な評価を行い,指紋の改ざんが検出の妨げになるかどうかを調査する。回避ボットを体系的に調査するために,2つのアンチボットサービス(DataDomeとBotD)と20種類のボットサービスからのボットトラフィックを取り入れたハニーサイトをデプロイした。ハニーサイトの20のボットサービスからの50万件のリクエストのうち、DataDomeに対する平均回避率は52.93%、BotDに対する平均回避率は44.56%である。ボットサービスとボットサービスの両方を個別に回避するボットサービスによる指紋属性の比較は、ボットサービスが実際に回避のために異なるブラウザ指紋属性を変更していることを示している。さらに,本研究では,回避ボットにおける指紋属性の不整合の存在を明らかにした。回避ボットは, 指紋属性の整合性を確保するのに困難であると考えられるため, 空間的不整合(ブラウザ指紋の2つの属性)と時間的(2つの異なる点における単一の属性)を検出するためのデータ駆動型アプローチを提案する。これらのルールは、アンチボットサービスによって容易にデプロイでき、DataDomeとBotDに対する回避ボットの回避率をそれぞれ48.11%、44.95%削減する。 As browser fingerprinting is increasingly being used for bot detection, bots have started altering their fingerprints for evasion. We conduct the first large-scale evaluation of evasive bots to investigate whether and how altering fingerprints helps bots evade detection. To systematically investigate evasive bots, we deploy a honey site incorporating two anti-bot services (DataDome and BotD) and solicit bot traffic from 20 different bot services that purport to sell "realistic and undetectable traffic". Across half a million requests from 20 different bot services on our honey site, we find an average evasion rate of 52.93% against DataDome and 44.56% evasion rate against BotD. Our comparison of fingerprint attributes from bot services that evade each anti-bot service individually as well as bot services that evade both shows that bot services indeed alter different browser fingerprint attributes for evasion. Further, our analysis reveals the presence of inconsistent fingerprint attributes in evasive bots. 
Given evasive bots seem to have difficulty in ensuring consistency in their fingerprint attributes, we propose a data-driven approach to discover rules to detect such inconsistencies across space (two attributes in a given browser fingerprint) and time (a single attribute at two different points in time). These rules, which can be readily deployed by anti-bot services, reduce the evasion rate of evasive bots against DataDome and BotD by 48.11% and 44.95% respectively.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# M-LRM:多視点大規模再構成モデル M-LRM: Multi-view Large Reconstruction Model ( http://arxiv.org/abs/2406.07648v1 ) ライセンス: Link先を確認	Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Xiaowei Chi, Xingqun Qi, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo,	(参考訳) 大規模再構成モデル(LRM)の最近の進歩にもかかわらず、単一の画像から複数の画像への入力を拡大する際には、非効率性、幾何学的およびテクスチャの質、および予想以上に収束速度が遅くなる。 LRMは、入力画像間の強い3Dコヒーレンスを無視して、3D再構成を自然な画像から3Dへの変換問題として定式化する。本稿では,M-LRM(Multi-view Large Restruction Model)を提案する。具体的には、M-LRMが入力画像から情報を正確にクエリできるマルチビュー整合型クロスアテンション方式を提案する。さらに、入力された多視点画像の3次元先行情報を用いて、三面体トークンを初期化する。 LRMと比較すると、提案したM-LRMは128ドル(約1,800円)の3次元のNeRFを生成し、高忠実度の3次元形状を生成することができる。実験により,本モデルがLRMよりも優れた性能向上と訓練収束を達成できることが実証された。プロジェクトページ:https://murphylmf.github.io/M-LRM/ Despite recent advancements in the Large Reconstruction Model (LRM) demonstrating impressive results, when extending its input from single image to multiple images, it exhibits inefficiencies, subpar geometric and texture quality, as well as slower convergence speed than expected. It is attributed to that, LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the strong 3D coherence among the input images. In this paper, we propose a Multi-view Large Reconstruction Model (M-LRM) designed to efficiently reconstruct high-quality 3D shapes from multi-views in a 3D-aware manner. Specifically, we introduce a multi-view consistent cross-attention scheme to enable M-LRM to accurately query information from the input images. Moreover, we employ the 3D priors of the input multi-view images to initialize the tri-plane tokens. Compared to LRM, the proposed M-LRM can produce a tri-plane NeRF with $128 \times 128$ resolution and generate 3D shapes of high fidelity. Experimental studies demonstrate that our model achieves a significant performance gain and faster training convergence than LRM. Project page: https://murphylmf.github.io/M-LRM/	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# 逐次ノイズ測定による資源の回収 Recovery of resource through sequential noisy measurements ( http://arxiv.org/abs/2406.07652v1 ) ライセンス: Link先を確認	Sudipta Mondal, Pritam Halder, Amit Kumar Pal, Aditi Sen De,	(参考訳) 量子情報プロトコルに組み込まれたノイズの多いアンシャープ測定は、性能を阻害し、量子優位性を低下させる可能性がある。しかし、量子ネットワーク内のノード間の量子相関を完全に破壊する射影測定とは異なり、ノイズ測定の逐次的な適用は、量子情報処理タスクにおける測定装置のノイズの悪影響を軽減できることを示す。量子ネットワークにおいて選択したノードにエンタングルメントを集中させる場合について,補助量子ビットによる雑音測定を用いてこれを実証する。 3つ以上の量子ビットのクラスタを持つネットワークの場合、補助量子ビット上で最適なアンシャープ測定を順次行うと、同じ補助量子ビット上での最適射影測定により得られるものと同様の2ノード間の局在化可能エンタングルメントが得られることを示す。さらに, 連続雑音測定を用いた提案手法は, 特定の量子スキームの資源となる所望の状態の準備に利用できる可能性があることがわかった。また、シャープな測定に基づくプロトコルとは対照的に、アンシャープ測定を用いると補助量子ビットはエンタングルメントを集中させる量子ビットをより強く制御でき、これは安全な量子通信に影響を与える可能性があると主張する。 Noisy unsharp measurements incorporated in quantum information protocols may hinder performance, reducing the quantum advantage. However, we show that, unlike projective measurements which completely destroy quantum correlations between nodes in quantum networks, sequential applications of noisy measurements can mitigate the adverse impact of noise in the measurement device on quantum information processing tasks. We demonstrate this in the case of concentrating entanglement on chosen nodes in quantum networks via noisy measurements performed by assisting qubits. In the case of networks with a cluster of three or more qubits, we show that sequentially performing optimal unsharp measurements on the assisting qubits yields localizable entanglement between two nodes akin to that obtained by optimal projective measurements on the same assisting qubits. Furthermore, we find that the proposed approach using consecutive noisy measurements can potentially be used to prepare desired states that are resources for specific quantum schemes. We also argue that assisting qubits have greater control over the qubits on which entanglement is concentrated via unsharp measurements, in contrast to sharp measurement-based protocols, which may have implications for secure quantum communication.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# OPTune: 効率的なオンライン選好チューニング OPTune: Efficient Online Preference Tuning ( http://arxiv.org/abs/2406.07657v1 ) ライセンス: Link先を確認	Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Tianyi Zhou, Heng Huang,	(参考訳) RLHF(Reinforcement Learning with Human feedback)は、Large Language Models(LLM)を人間の好みに合わせるために重要である。広く研究されているRLHFのオフライン版(例えば direct preference optimization (DPO))と比較して、最近の研究ではオンライン版の方がアライメントがさらに優れていることが示されている。しかし、オンラインアライメントには新たなトレーニングデータをオンザフライで生成する必要があり、これはコストがかかり、並列化が困難で、品質と有用性にばらつきがある。本稿では,オンライン選好チューニング(OPTune)のためのより効率的なデータ探索手法を提案する。この手法は、人手で整備された、あるいは事前に収集された教師応答に依存せず、オンポリシーの選好アライメントのために情報量の多い応答を動的にサンプリングする。データ生成中、OPTuneは(再)生成された応答が既存の応答よりも情報的かつ高品質なトレーニング信号を提供しうるプロンプトのみを選択する。トレーニング目標では、OPTuneは、各生成された応答(ペア)をアライメント改善における有用性によって再重み付けし、学習が最も有用なサンプルに集中できるようにしている。我々の評価を通じて、OPTuneで調整したLLMは、効率的なデータ探索戦略により1.27-1.56倍高速なトレーニング速度を享受しながら、標準的な選好チューニングによって提供される命令追従の利点を維持している。 Reinforcement learning with human feedback (RLHF) is critical for aligning Large Language Models (LLMs) with human preference. Compared to the widely studied offline version of RLHF, e.g., direct preference optimization (DPO), recent works have shown that the online variants achieve even better alignment. However, online alignment requires on-the-fly generation of new training data, which is costly, hard to parallelize, and suffers from varying quality and utility. In this paper, we propose a more efficient data exploration strategy for online preference tuning (OPTune), which does not rely on human-curated or pre-collected teacher responses but dynamically samples informative responses for on-policy preference alignment. During data generation, OPTune only selects prompts whose (re)generated responses can potentially provide more informative and higher-quality training signals than the existing responses. In the training objective, OPTune reweights each generated response (pair) by its utility in improving the alignment so that learning can be focused on the most helpful samples. 
Throughout our evaluations, OPTune'd LLMs maintain the instruction-following benefits provided by standard preference tuning whilst enjoying 1.27-1.56x faster training speed due to the efficient data exploration strategy.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# Treeffuser: 勾配ブースト木を用いた条件拡散による確率予測 Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees ( http://arxiv.org/abs/2406.07658v1 ) ライセンス: Link先を確認	Nicolas Beltran-Velez, Alessandro Antonio Grande, Achille Nazaret, Alp Kucukelbir, David Blei,	(参考訳) 確率予測は単点予測ではなく予測分布を計算することを目的としている。これらの分布により、実践者は不確実性を定量化し、リスクを計算し、外れ値を検出することができる。しかしながら、ほとんどの確率的手法はガウス分布やポアソン分布のようなパラメトリックな応答を仮定する。これらの仮定が成り立たない場合、そのようなモデルは予測が悪く、不確実性のキャリブレーションも不十分になる。本稿では,表形式データに対する使いやすい確率的予測法であるTreeffuserを提案する。勾配ブースト木を用いてスコア関数を推定する条件拡散モデルを学習する。条件付き拡散モデルにより、Treeffuserは柔軟で非パラメトリックになり、勾配ブースト木により、堅牢でCPU上でのトレーニングが容易になる。 Treeffuserはよく校正された予測分布を学習し、多変量、マルチモーダル、歪んだ応答を含む幅広い回帰タスクを処理できる。合成および実データに基づいてTreeffuserを研究した結果, 既存の手法よりも優れ, よりキャリブレーションのよい確率予測が得られた。さらに、Walmartの営業データを用いた不確実性の下での在庫配分への応用について、その汎用性を実証する。 Treeffuser は https://github.com/blei-lab/treeffuser に実装しています。 Probabilistic prediction aims to compute predictive distributions rather than single-point predictions. These distributions enable practitioners to quantify uncertainty, compute risk, and detect outliers. However, most probabilistic methods assume parametric responses, such as Gaussian or Poisson distributions. When these assumptions fail, such models lead to bad predictions and poorly calibrated uncertainty. In this paper, we propose Treeffuser, an easy-to-use method for probabilistic prediction on tabular data. The idea is to learn a conditional diffusion model where the score function is estimated using gradient-boosted trees. The conditional diffusion model makes Treeffuser flexible and non-parametric, while the gradient-boosted trees make it robust and easy to train on CPUs. Treeffuser learns well-calibrated predictive distributions and can handle a wide range of regression tasks -- including those with multivariate, multimodal, and skewed responses. 
We study Treeffuser on synthetic and real data and show that it outperforms existing methods, providing better-calibrated probabilistic predictions. We further demonstrate its versatility with an application to inventory allocation under uncertainty using sales data from Walmart. We implement Treeffuser in https://github.com/blei-lab/treeffuser.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# 量子コンピュータのベンチマークのための多部非局所性の生成 Generating multipartite nonlocality to benchmark quantum computers ( http://arxiv.org/abs/2406.07659v1 ) ライセンス: Link先を確認	Jan Lennart Bönsel, Otfried Gühne, Adán Cabello,	(参考訳) 量子コンピュータは大規模な$n$パーティト非局所性の生成に利用でき、それによって量子コンピュータをベンチマークする方法が得られることを示す。克服すべき主な課題は次のとおりである。 (i)相互作用トポロジーは任意の2量子ビットゲートを許さないかもしれない。 (ii)ノイズがベル不等式の破れを制限する。 (iii)局所測定の組み合わせの数は$n$とともに指数関数的に増加する。 (i)を克服するために、コンピュータの2量子ビット接続と互換性のあるグラフ状態を効率的に準備できることを指摘する。 (ii)を緩和するために、特定のグラフ状態に対して、ホワイトノイズに対する耐性が$n$とともに指数関数的に増加する$n$パーティトベル不等式が存在することに注目する。 (iii)に対処するために、任意の$n$および任意の接続性に対して、ランダムサンプリングに依存する推定器を導入する。その結果,これまでにない大きな$n$で$n$パーティトベル非局所性を生成する方法を提案する。これにより、量子ビットの数や接続性に関わらず、非古典的相関をベンチマークすることができる。我々は、ノイズの多いIBM量子コンピュータのシミュレーションを用いて本手法を検証し、少なくとも$n=24$量子ビットに対して$n$パーティトベル非局所性が予測されることを示す。 We show that quantum computers can be used for producing large $n$-partite nonlocality, thereby providing a method to benchmark them. The main challenges to overcome are: (i) The interaction topology might not allow arbitrary two-qubit gates. (ii) Noise limits the Bell violation. (iii) The number of combinations of local measurements grows exponentially with $n$. To overcome (i), we point out that graph states that are compatible with the two-qubit connectivity of the computer can be efficiently prepared. To mitigate (ii), we note that, for specific graph states, there are $n$-partite Bell inequalities whose resistance to white noise increases exponentially with $n$. To address (iii) for any $n$ and any connectivity, we introduce an estimator that relies on random sampling. As a result, we propose a method for producing $n$-partite Bell nonlocality with unprecedentedly large $n$. This in turn allows benchmarking nonclassical correlations regardless of the number of qubits or the connectivity. We test our approach by using a simulation for a noisy IBM quantum computer, which predicts $n$-partite Bell nonlocality for at least $n=24$ qubits.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# ROADWorkデータセット:ワークゾーンを認識し、観察し、分析し、運転する学習 ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones ( http://arxiv.org/abs/2406.07661v1 ) ライセンス: Link先を確認	Anurag Ghosh, Robert Tamburo, Shen Zheng, Juan R. Alvarez-Padilla, Hailiang Zhu, Michael Cardei, Nicholas Dunn, Christoph Mertz, Srinivasa G. Narasimhan,	(参考訳) 自動運転研究の大きな進歩にもかかわらず、ワークゾーンの認識とナビゲートは困難で、未調査だ。重要な理由は、このロングテールなシナリオに対処する新しいアルゴリズムを開発するためのオープンデータセットがないことである。 ROADWorkデータセットを提案し、ワークゾーンの認識、観察、分析、運転の仕方を学習する。最先端のファンデーションモデルでは、作業ゾーンではパフォーマンスが悪いことが分かりました。本データセットにより,作業ゾーン内の物体検出(+26.2 AP)を改善し,より高い精度(+32.5%)とはるかに高い発見率(12.8倍)で作業ゾーンを発見し,作業ゾーン標識の検出(+23.9 AP)と読み取り(+14.2% 1-NED),および作業ゾーンの記述(+36.7 SPICE)を大幅に改善した。また、作業ゾーンのナビゲーションビデオから走行可能な経路を計算し、目標の53.6%で角誤差 (AE) <0.5度 (+9.9 %)、経路の75.3%でAE <0.5度 (+8.1 %) となるように航法目標や経路を予測できることを示した。 Perceiving and navigating through work zones is challenging and under-explored, even with major strides in self-driving research. An important reason is the lack of open datasets for developing new algorithms to address this long-tailed scenario. We propose the ROADWork dataset to learn how to recognize, observe, analyze, and drive through work zones. We find that state-of-the-art foundation models perform poorly on work zones. With our dataset, we improve upon detecting work zone objects (+26.2 AP), while discovering work zones with higher precision (+32.5%) at a much higher discovery rate (12.8 times), significantly improve detecting (+23.9 AP) and reading (+14.2% 1-NED) work zone signs and describing work zones (+36.7 SPICE). We also compute drivable paths from work zone navigation videos and show that it is possible to predict navigational goals and pathways such that 53.6% of goals have angular error (AE) < 0.5 degrees (+9.9 %) and 75.3% of pathways have AE < 0.5 degrees (+8.1 %).	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# fNIRSによる画像の復号化に向けて Progress Towards Decoding Visual Imagery via fNIRS ( http://arxiv.org/abs/2406.07662v1 ) ライセンス: Link先を確認	Michel Adamic, Wellington Avelino, Anna Brandenberger, Bryan Chiang, Hunter Davis, Stephen Fay, Andrew Gregory, Aayush Gupta, Raphael Hotter, Grace Jiang, Fiona Leng, Stephen Polcyn, Thomas Ribeiro, Paul Scotti, Michelle Wang, Marley Xiong, Jonathan Xu,	(参考訳) 我々は、fNIRS脳活動からの画像再構成の可能性を示し、必要な仕様に適合するプロトタイプの構築に着手する。ダウンサンプリングしたfMRIデータを用いて画像再構成モデルを訓練することにより、cmスケールの空間分解能が画像生成に十分であることがわかった。1cm分解能での検索精度は71%であり、フル解像度fMRIでは93%、2cm分解能では20%であった。シミュレーションと高密度トモグラフィにより、時間領域fNIRSは、連続波fNIRSの2cm分解能に対して1cm分解能を達成できることがわかった。最後に、レーザードライバ、単一光子検出器、時間デジタル変換器システムからなるプロトタイプの時間領域fNIRSデバイスの設計を共有する。 We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 20% with 2-cm resolution. With simulations and high-density tomography, we found that time-domain fNIRS can achieve 1-cm resolution, compared to 2-cm resolution for continuous-wave fNIRS. Lastly, we share designs for a prototype time-domain fNIRS device, consisting of a laser driver, a single photon detector, and a time-to-digital converter system.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# グラフマッチング問題の整数計画定式化のための統一フレームワーク A Unified Framework for Integer Programming Formulation of Graph Matching Problems ( http://arxiv.org/abs/2406.07666v1 ) ライセンス: Link先を確認	Bahram Alidaee, Haibo Wang, Hugh Sloan,	(参考訳) グラフ理論は、あらゆる分野における困難で複雑な問題を解決する強力なツールである。特に、グラフマッチングは、膨大な応用を伴うパターン解析における古典的な問題である。多くのグラフ問題は数学的プログラムとして定式化され、正確な、ヒューリスティックな、あるいは近似された保証された手順を用いて解かれる。一方、グラフ理論は複雑な数学的プログラミング問題、特に整数プログラムを可視化し理解するための強力なツールである。グラフ問題を自然整数プログラム(IP)として定式化することは、しばしば難しい課題である。しかし、IPの定式化には多くの利点がある。数人の研究者が、グラフ理論問題の自然なIP定式化の必要性について言及している。本研究の目的は,グラフマッチング問題のIP定式化のための統一的なフレームワークを提供することである。グラフマッチング問題に関する多くの調査があるが、IPの定式化には関心がない。本稿では,このような問題に対する包括的IP定式化を初めて提供する。このフレームワークには、文献における様々なグラフ最適化の問題が含まれている。しかしながら、これらの問題は異なる研究コミュニティによって研究されてきたが、ここで提示される枠組みは、このような多様で複雑な問題に取り組むために、異なる分野からの取り組みを促進するのに役立っている。本研究は,特にパターン解析において,実際に発生する難題のいくつかを単純化する上で,極めて有効であることを期待する。 Graph theory has been a powerful tool in solving difficult and complex problems arising in all disciplines. In particular, graph matching is a classical problem in pattern analysis with enormous applications. Many graph problems have been formulated as a mathematical program and then solved using exact, heuristic, and/or approximated-guaranteed procedures. On the other hand, graph theory has been a powerful tool in visualizing and understanding complex mathematical programming problems, especially integer programs. Formulating a graph problem as a natural integer program (IP) is often a challenging task. However, an IP formulation of the problem has many advantages. Several researchers have noted the need for natural IP formulation of graph theoretic problems. The present study aims to provide a unified framework for IP formulation of graph-matching problems. Although there are many surveys on graph matching problems, none is concerned with IP formulation. This paper is the first to provide a comprehensive IP formulation for such problems. The framework includes a variety of graph optimization problems in the literature. 
Although these problems have been studied by different research communities, the framework presented here helps bring together efforts from different disciplines to tackle such diverse and complex problems. We hope the present study can significantly help to simplify some of the difficult problems arising in practice, especially in pattern analysis.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# PLT-D3:ステレオ深度とシーンフローのための高忠実度動的運転シミュレーションデータセット PLT-D3: A High-fidelity Dynamic Driving Simulation Dataset for Stereo Depth and Scene Flow ( http://arxiv.org/abs/2406.07667v1 ) ライセンス: Link先を確認	Joshua Tokarsky, Ibrahim Abdulhafiz, Satya Ayyalasomayajula, Mostafa Mohsen, Navya G. Rao, Adam Forbes,	(参考訳) 自律運転は、計算ハードウェアと高度なディープラーニング方法論の革新に支えられ、目覚ましい進歩を遂げてきた。これらの進歩の基盤はデータセットの可用性と品質に依存しており、信頼性と汎用的な自律運転アルゴリズムの開発と改良に不可欠である。自律運転認識技術の進化を支援するために多くのデータセットが開発されているが、様々な気象条件下でシステムの堅牢性を徹底的にテストし強化するために必要な多様性を提供するものはほとんどない。多くの公開データセットは、挑戦的な気象シナリオと詳細な高解像度データに関する包括的なカバレッジを欠いている。本稿では,各種気象条件に対する自律運転システムの適応性向上を目的とした動的天候駆動データセットであるPLT-D3を紹介する。 PLT-D3は、Unreal Engine 5を用いて生成された高忠実度ステレオ深度およびシーンフローグラウンド真理データを提供する。特に、このデータセットには、雨、雪、霧、様々な照明条件を含む幅広い動的気象シナリオを再現する、同期された高解像度ステレオ画像シーケンスが含まれており、シミュレーションベースのテストでは前例のないレベルのリアリズムを提供する。 PLT-D3の主な目的は、現実世界の気象変動をシミュレートできる総合的な訓練と試験資源の不足に対処することである。 PLT-D3を用いたいくつかの重要な自律運転タスクのためのベンチマークが確立されている。 Autonomous driving has experienced remarkable progress, bolstered by innovations in computational hardware and sophisticated deep learning methodologies. The foundation of these advancements rests on the availability and quality of datasets, which are crucial for the development and refinement of dependable and versatile autonomous driving algorithms. While numerous datasets have been developed to support the evolution of autonomous driving perception technologies, few offer the diversity required to thoroughly test and enhance system robustness under varied weather conditions. Many public datasets lack the comprehensive coverage of challenging weather scenarios and detailed, high-resolution data, which are critical for training and validating advanced autonomous-driving perception models. In this paper, we introduce PLT-D3; a Dynamic-weather Driving Dataset, designed specifically to enhance autonomous driving systems' adaptability to diverse weather conditions. PLT-D3 provides high-fidelity stereo depth and scene flow ground truth data generated using Unreal Engine 5. 
In particular, this dataset includes synchronized high-resolution stereo image sequences that replicate a wide array of dynamic weather scenarios including rain, snow, fog, and diverse lighting conditions, offering an unprecedented level of realism in simulation-based testing. The primary aim of PLT-D3 is to address the scarcity of comprehensive training and testing resources that can simulate real-world weather variations. Benchmarks have been established for several critical autonomous driving tasks using PLT-D3, such as depth estimation, optical flow and scene-flow to measure and enhance the performance of state-of-the-art models.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# フェルミオン計数による一般化ゼノ効果と絡み合いダイナミクス Generalized Zeno effect and entanglement dynamics induced by fermion counting ( http://arxiv.org/abs/2406.07673v1 ) ライセンス: Link先を確認	Elias Starchl, Mark H. Fischer, Lukas M. Sieberer,	(参考訳) 本研究では, 粒子をその環境と交換する一般計測プロセスによる自由フェルミオンの1次元格子系について検討するが, それぞれのフェルミオンの離脱・入射はカウントされる。格子サイト占有数の頻繁な測定によるダイナミクスの凍結とは対照的に、フェルミオン数の増加は系の状態の急激な変動を引き起こす。それでも、量子軌道の数値シミュレーションと、複製ケルディシュ場理論に基づく解析的アプローチにより、フェルミオン計数および局所的占有測定による自由フェルミオンの瞬時相関と絡み合い特性が著しく類似していることが分かる。この類似性は、フェルミオンカウントによって誘導される一般化されたゼノ効果と、$\mathrm{SU}(R)$非線形シグマモデルによる普遍長波長記述によって説明される。さらに, 両種類の測定プロセスにおいて, 対数的絡み合いと有限測定速度での共形不変性を有する臨界相の存在に対する強い証拠を示す。代わりに、共形不変量のシグネチャが観測可能な長さスケールの、明確に定義された有限臨界範囲を同定する。面積法的な絡み合いは、測定速度において指数関数的に大きいスケールを超えて確立されるが、臨界範囲の上界は代数的に大きく、したがって数値的にアクセス可能である。 We study a one-dimensional lattice system of free fermions subjected to a generalized measurement process: the system exchanges particles with its environment, but each fermion leaving or entering the system is counted. In contrast to the freezing of dynamics due to frequent measurements of lattice-site occupation numbers, a high rate of fermion counts induces fast fluctuations in the state of the system. Still, through numerical simulations of quantum trajectories and an analytical approach based on replica Keldysh field theory, we find that instantaneous correlations and entanglement properties of free fermions subjected to fermion counting and local occupation measurements are strikingly similar. We explain this similarity through a generalized Zeno effect induced by fermion counting and a universal long-wavelength description in terms of an $\mathrm{SU}(R)$ nonlinear sigma model. Further, for both types of measurement processes, we present strong evidence against the existence of a critical phase with logarithmic entanglement and conformal invariance at finite measurement rates. Instead, we identify a well-defined and finite critical range of length scales on which signatures of conformal invariance are observable. 
While area-law entanglement is established beyond a scale that is exponentially large in the measurement rate, the upper boundary of the critical range is only algebraically large and thus numerically accessible.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# ディープラーニングを用いた自動舗装き裂検出と分類 Automated Pavement Cracks Detection and Classification Using Deep Learning ( http://arxiv.org/abs/2406.07674v1 ) ライセンス: Link先を確認	Selvia Nafaa, Hafsa Essam, Karim Ashour, Doaa Emad, Rana Mohamed, Mohammed Elhenawy, Huthaifa I. Ashqar, Abdallah A. Hassan, Taqwa I. Alhadidi,	(参考訳) 効率的な輸送資産管理を構築する上で、資産状況のモニタリングが重要な要素である。画像処理の進歩により、従来の手動分類はセミオートマチック/オートマチック技術に置き換えられている。その結果,自動資産検出・分類技術が求められた。本稿では, 道路舗装の亀裂の検出と分類を, 有名なYou Only Look Once (YOLO) バージョン5 (YOLOv5) とバージョン8 (YOLOv8) のアルゴリズムを用いて行う手法を提案する。実験結果から, 照明条件と画像サイズが異なる場合, 舗装き裂検出精度は67.3%に達することがわかった。本研究は,異なる照明条件下での資産状況の正確な検出・分類を支援することを目的としている。これにより、手動検査に伴うコストと時間を削減し、ハイウェイ資産維持のコストを大幅に削減することができる。 Monitoring asset conditions is a crucial factor in building efficient transportation asset management. Because of substantial advances in image processing, traditional manual classification has been largely replaced by semi-automatic/automatic techniques. As a result, automated asset detection and classification techniques are required. This paper proposes a methodology to detect and classify roadway pavement cracks using the well-known You Only Look Once (YOLO) version five (YOLOv5) and version 8 (YOLOv8) algorithms. Experimental results indicated that the precision of pavement crack detection reaches up to 67.3% under different illumination conditions and image sizes. The findings of this study can assist highway agencies in accurately detecting and classifying asset conditions under different illumination conditions. This will reduce the cost and time that are associated with manual inspection, which can greatly reduce the cost of highway asset maintenance.	翻訳日:2024-06-13 21:25:46 公開日:2024-06-11
# FastAST:Token Mergingとクロスモデル知識蒸留によるオーディオスペクトログラム変換器の高速化 FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation ( http://arxiv.org/abs/2406.07676v1 ) ライセンス: Link先を確認	Swarup Ranjan Behera, Abhishek Dhiman, Karthik Gowda, Aalekhya Satya Narayani,	(参考訳) 音声分類モデル、特にAudio Spectrogram Transformer(AST)は、効率的な音声分析において重要な役割を果たす。しかし、精度を損なうことなく効率を最適化することは依然として課題である。本稿では,Token Merging(ToMe)をASTフレームワークに統合するフレームワークであるFastASTを紹介する。 FastASTは、オーディオスペクトログラムに類似のトークンをマージすることで、広範な再トレーニングを必要とせずに、推論速度を向上させる。さらに、トレーニング中に、FastASTは大幅なスピード改善をもたらす。実験により、FastASTは精度に最小限の影響を与えることなく、オーディオ分類のスループットを向上できることが示された。精度への影響を軽減するため、Cross-Model Knowledge Distillation (CMKD)をFastASTフレームワークに統合する。 ToMeとCMKDをASTに統合すると、より高速な推論速度を維持しながら、ASTと比較して精度が向上する。 FastASTは、リアルタイムでリソース効率の良いオーディオ分析への一歩である。 Audio classification models, particularly the Audio Spectrogram Transformer (AST), play a crucial role in efficient audio analysis. However, optimizing their efficiency without compromising accuracy remains a challenge. In this paper, we introduce FastAST, a framework that integrates Token Merging (ToMe) into the AST framework. FastAST enhances inference speed without requiring extensive retraining by merging similar tokens in audio spectrograms. Furthermore, during training, FastAST brings about significant speed improvements. The experiments indicate that FastAST can increase audio classification throughput with minimal impact on accuracy. To mitigate the accuracy impact, we integrate Cross-Model Knowledge Distillation (CMKD) into the FastAST framework. Integrating ToMe and CMKD into AST results in improved accuracy compared to AST while maintaining faster inference speeds. FastAST represents a step towards real-time, resource-efficient audio analysis.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# 量子コンピュータ上の多体熱状態--変分的アプローチ Many-body thermal states on a quantum computer: a variational approach ( http://arxiv.org/abs/2406.07677v1 ) ライセンス: Link先を確認	Mirko Consiglio, Tony J. G. Apollaro,	(参考訳) 熱平衡にある多体量子状態は自然界に遍在している。それらの力学的性質を調べることは、ヒルベルト空間の複雑さのために、非常に難しい作業である。量子コンピュータは、対象とする多体状態を効率的なアルゴリズムによって忠実に準備できるならば、量子システムを効果的にシミュレートできる可能性がある。この目的のために、量子$XY$モデルのギブス状態を準備するためのハイブリッド量子古典変分量子アルゴリズムを提案する。本アルゴリズムは、ギブス状態のボルツマン重みを準備するためのGroverとRudolphのパラメータ化量子回路と、固有エネルギー基底をそれぞれのボルツマン重みに割り当てるためのパリティ保存アンザッツに基づいている。典型的な少数体の事例を用いて、多体システムの対称性を利用することで、GroverとRudolphのアルゴリズムで必要となる指数関数的に増加する変分パラメータの数を大幅に削減できることを明示的に示す。最後に、異なるパラメータに対する状態ベクトルシミュレーションによって得られた$XY$モデルのギブス状態の密度行列が、厳密なギブス状態に対して1に近い忠実度を示すことを示し、これは現在の量子コンピュータにおける我々のプロトコルの利用可能性を浮き彫りにしている。 Many-body quantum states at thermal equilibrium are ubiquitous in nature. Investigating their dynamical properties is a formidable task due to the complexity of the Hilbert space they live in. Quantum computers may have the potential to effectively simulate quantum systems, provided that the many-body state under scrutiny can be faithfully prepared via an efficient algorithm. With this aim, we present a hybrid quantum-classical variational quantum algorithm for the preparation of the Gibbs state of the quantum $XY$ model. Our algorithm is based on the Grover and Rudolph parametrized quantum circuit for the preparation of the Boltzmann weights of the Gibbs state, and on a parity-preserving ansatz for the allocation of the eigenenergy basis to their respective Boltzmann weight. We explicitly show, with a paradigmatic few-body case instance, how the symmetries of a many-body system can be exploited to significantly reduce the exponentially increasing number of variational parameters needed in the Grover and Rudolph algorithm.
Finally, we show that the density matrix of the Gibbs state of the $XY$ model, obtained by statevector simulations for different parameters, exhibits a fidelity close to unity with the exact Gibbs state; this highlights the potential use of our protocol on current quantum computers.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# 上空から群れのダイナミクスを観察する:ドローンビデオにおける高度な物体追跡のためのフレームワーク Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos ( http://arxiv.org/abs/2406.07680v1 ) ライセンス: Link先を確認	Duc Pham, Matthew Hansen, Félicie Dhellemmens, Jens Krause, Pia Bideau,	(参考訳) さまざまなセンサーを搭載したドローンのような、簡単にアクセスできるセンサーは、自然環境における動物行動の研究を大幅に拡大した。しかし、しばしば数時間にわたる膨大なラベルのないビデオデータを分析することは、機械学習、特にコンピュータビジョンにとって依然として課題である。既存のアプローチでは、ほんの数フレームしか分析できないことが多い。我々の焦点は、長期的な動物行動分析である。この課題に対処するために、粒子フィルタリングのような古典的確率的手法を用いて状態推定を行う。セマンティックオブジェクトセグメンテーションの最近の進歩を取り入れることで、データ可用性に制限のあるシナリオであっても、急速に変化する物体の隊形の連続的な追跡を可能にする。粒子フィルタは、新しく入ってくる情報を再帰的に追加するための、証明可能な最適性を持つアルゴリズム構造を提供する。本研究では、ドローン映像から外洋の魚群を追跡する新しい手法を提案する。我々のフレームワークは、2Dで古典的な物体追跡を行うだけでなく、ビデオデータとドローンの搭載センサー情報(GPSとIMU)を融合させることで、世界座標における魚群の位置と空間的広がりを追跡する。提示された枠組みにより、研究者は初めて、非侵襲的でスケーラブルな方法で、自然の社会的・環境的な文脈において魚群の集団行動を研究することができる。 Easily accessible sensors, like drones with diverse onboard sensors, have greatly expanded the study of animal behavior in natural environments. Yet, analyzing vast, unlabeled video data, often spanning hours, remains a challenge for machine learning, especially in computer vision. Existing approaches often analyze only a few frames. Our focus is on long-term animal behavior analysis. To address this challenge, we utilize classical probabilistic methods for state estimation, such as particle filtering. By incorporating recent advancements in semantic object segmentation, we enable continuous tracking of rapidly evolving object formations, even in scenarios with limited data availability. Particle filters offer a provably optimal algorithmic structure for recursively adding new incoming information. We propose a novel approach for tracking schools of fish in the open ocean from drone videos. Our framework does not merely perform classical object tracking in 2D; it tracks the position and spatial expansion of the fish school in world coordinates by fusing video data and the drone's onboard sensor information (GPS and IMU).
For the first time, the presented framework allows researchers to study the collective behavior of fish schools in their natural social and environmental context in a non-invasive and scalable way.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# 量子コンピューティングのための最適化QUBO定式化法 Optimized QUBO formulation methods for quantum computing ( http://arxiv.org/abs/2406.07681v1 ) ライセンス: Link先を確認	Dario De Santis, Salvatore Tirone, Stefano Marmi, Vittorio Giovannetti,	(参考訳) NISQデバイスでは、対応する2次非制約バイナリ最適化(QUBO)形式が導出されると、いくつかの組合せ最適化問題を解くことができる。本研究の目的は、これらのQUBO再定式化に必要な変数を劇的に削減し、あるクラスの最適化問題に対する最適解をNISQデバイスで効率よく得られるようにすることである。これは、開始問題を近似することなく、非線形制約を持つ問題に対してもスラック変数の効率的な使用を可能にする新しいツールを導入することで実現される。我々は、新しい手法を2つの独立した部分、すなわち反復二次多項式法とマスター・サテライト法に分割する。次に、現実の金融シナリオに着想を得たMax-Profit Balance Settlementと呼ばれるNP困難な最適化問題に本手法を適用する方法を示す。 2つのD-Wave量子アニーラにこの問題のいくつかのインスタンスを投入し、これらのシナリオで使用される標準手法と新しい手法の性能を比較した。さらに、本研究により、D-Wave AdvantageとAdvantage2量子アニーラのいくつかの性能差を把握できる。 Several combinatorial optimization problems can be solved with NISQ devices once a corresponding quadratic unconstrained binary optimization (QUBO) form is derived. The aim of this work is to drastically reduce the variables needed for these QUBO reformulations in order to unlock the possibility to efficiently obtain optimal solutions for a class of optimization problems with NISQ devices. This is achieved by introducing novel tools that allow an efficient use of slack variables, even for problems with non-linear constraints, without the need to approximate the starting problem. We divide our new techniques into two independent parts, called the iterative quadratic polynomial and the master-satellite methods. We then show how to apply our techniques to an NP-hard optimization problem inspired by a real-world financial scenario called Max-Profit Balance Settlement. We then submit several instances of this problem to two D-Wave quantum annealers, comparing the performance of our novel approach with the standard methods used in these scenarios. Moreover, this study allows us to appreciate several performance differences between the D-Wave Advantage and Advantage2 quantum annealers.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# AIツールがエンジニアリングワークスペースに与える影響 Impact of AI-tooling on the Engineering Workspace ( http://arxiv.org/abs/2406.07683v1 ) ライセンス: Link先を確認	Lena Chretien, Nikolas Albarran,	(参考訳) AI駆動のコーディングツールがエンジニアのワークフローや作業環境に与える影響を理解するために、私たちはJellyfishプラットフォームを使用して変化の指標を分析します。主な指標は、Allocations、Coding Fraction vs. PR Fraction、Lifecycle Phases、Cycle Time、Jiraチケットサイズ、PRピックアップ時間、PRコメント、PRコメント数、インタラクション、コーディング言語から導かれる。 Copilot利用者のコーディング時間の割合に有意な変化がみられ、平均3%の減少と、個人では最大15%の減少がみられた。チケットサイズは4社で平均16%減少し、それに伴いサイクルタイムも8%減少したが、コントロールグループでは変化が見られなかった。さらに、PRプロセスはCopilotの使用とともに進化し、週あたりのレビュー済みPR数が一定であるにもかかわらず、より長く包括的なコメントが特徴となった。仮説として立てた変化のすべてが、すべての参加企業で観測されたわけではない。しかし、いくつかの企業はPRのピックアップ時間を最大33%減少させ、ワークフローのボトルネックが減ったことを示しており、ある企業では最大17%の作業がメンテナンスやサポートから製品の成長イニシアチブへ移行した。この研究は、複数の企業のデータを初めて利用した研究であり、実際のエンジニアリング環境を考慮することで、単純な生産性と満足度の測定を超えている。これにより、一部の企業はCopilotの使用によって他の企業よりも恩恵を受けているように見えること、そしてエンジニアリング作業やワークフローの特定の側面ではなく集約値を調査する場合には変化が微妙になり得ることを強調する。この点は今後さらに調査する。 To understand the impacts of AI-driven coding tools on engineers' workflow and work environment, we utilize the Jellyfish platform to analyze indicators of change. Key indicators are derived from Allocations, Coding Fraction vs. PR Fraction, Lifecycle Phases, Cycle Time, Jira ticket size, PR pickup time, PR comments, PR comment count, interactions, and coding languages. Significant changes were observed in coding time fractions among Copilot users, with an average decrease of 3%, with individual decreases as large as 15%. Ticket sizes decreased by an average of 16% across four companies, accompanied by an 8% decrease in cycle times, whereas the control group showed no change. Additionally, the PR process evolved with Copilot usage, featuring longer and more comprehensive comments, despite the weekly number of PRs reviewed remaining constant. Not all hypothesized changes were observed across all participating companies.
However, some companies experienced a decrease in PR pickup times by up to 33%, indicating reduced workflow bottlenecks, and one company experienced a shift of up to 17% of effort from maintenance and support work towards product growth initiatives. This study is the first to utilize data from more than one company and goes beyond simple productivity and satisfaction measures, considering real-world engineering settings instead. By doing so, we highlight that some companies seem to benefit more than others from the use of Copilot and that changes can be subtle when investigating aggregates rather than specific aspects of engineering work and workflows - something that will be further investigated in the future.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# 大規模言語モデル予測におけるアウト・オブ・コンテキスト・プロンプティングの公正性とロバスト性向上 Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions ( http://arxiv.org/abs/2406.07685v1 ) ライセンス: Link先を確認	Leonardo Cotta, Chris J. Maddison,	(参考訳) フロンティア大規模言語モデル(LLM)は、重大な結果を伴う意思決定のためにますますデプロイされている。一方で、これらのモデルは、幻覚や差別など、ユーザや社会の期待に反する予測を依然として繰り返している。したがって、信頼性を向上させるためのテストタイム戦略を開発することが重要である。先行研究に着想を得て、我々は因果関係をツールとして活用し、LLMにおける信頼性の2つの側面、すなわち公正性と堅牢性を形式的にエンコードする。この観点では、モデルに公正または堅牢であるよう明示的に指示する既存のテストタイムソリューションは、暗黙的にLLMの因果推論能力に依存している。この研究では、反対のアプローチを探求する。 LLMに信頼性を明示的に求める代わりに、我々は、構成上より信頼性の高い予測をもたらす根底にある因果推論アルゴリズムを符号化するプロンプトを設計する。具体的には、LLMの公平性と堅牢性を促進するテストタイムソリューションとして、アウト・オブ・コンテキスト・プロンプティングを提案する。アウト・オブ・コンテキスト・プロンプティング(Out-of-context prompting)は、タスクの因果モデルに関するユーザの事前の知識を活用して、(ランダムな)反事実変換を適用し、モデルの信頼性を向上させる。経験的に、アウト・オブ・コンテキスト・プロンプティングは、追加のデータや微調整や事前学習を必要とせずに、5つのベンチマークデータセットにわたるフロンティアLLMの公平性と堅牢性を一貫して改善することを示す。 Frontier Large Language Models (LLMs) are increasingly being deployed for high-stakes decision-making. On the other hand, these models are still consistently making predictions that contradict users' or society's expectations, e.g., hallucinating, or discriminating. Thus, it is important that we develop test-time strategies to improve their trustworthiness. Inspired by prior work, we leverage causality as a tool to formally encode two aspects of trustworthiness in LLMs: fairness and robustness. Under this perspective, existing test-time solutions explicitly instructing the model to be fair or robust implicitly depend on the LLM's causal reasoning capabilities. In this work, we explore the opposite approach. Instead of explicitly asking the LLM for trustworthiness, we design prompts to encode the underlying causal inference algorithm that will, by construction, result in more trustworthy predictions. Concretely, we propose out-of-context prompting as a test-time solution to encourage fairness and robustness in LLMs.
Out-of-context prompting leverages the user's prior knowledge of the task's causal model to apply (random) counterfactual transformations and improve the model's trustworthiness. Empirically, we show that out-of-context prompting consistently improves the fairness and robustness of frontier LLMs across five different benchmark datasets without requiring additional data, finetuning or pre-training.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# AV-DiT:ジョイントオーディオ・ビデオ生成のための高能率オーディオ・ビジュアル・ディフュージョン変換器 AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation ( http://arxiv.org/abs/2406.07686v1 ) ライセンス: Link先を確認	Kai Wang, Shijian Deng, Jing Shi, Dimitrios Hatzinakos, Yapeng Tian,	(参考訳) 最近のDiffusion Transformers (DiTs)は、画像、ビデオ、オーディオを含む高品質な単一モダリティコンテンツを生成する素晴らしい能力を示している。しかし, 変圧器をベースとしたディフューザがガウス雑音を効率よくマルチモーダルコンテンツ生成に分解できるかどうかはまだ未定である。このギャップを埋めるために、視覚とオーディオの両方で高品質でリアルなビデオを生成するために設計された、新しく効率的なオーディオ-視覚拡散変換器であるAV-DiTを導入する。モデルの複雑さと計算コストを最小限に抑えるため、AV-DiTは画像のみのデータで事前訓練された共有のDiTバックボーンを使用し、新しく挿入されたアダプタのみをトレーニングできる。この共有バックボーンは、オーディオとビデオの両方を生成する。具体的には、トレーニング可能な時間的注意層を凍結したトレーニング済みのDiTブロックに組み込んで、時間的一貫性を実現する。さらに、少数のトレーニング可能なパラメータが画像ベースのDiTブロックに適応してオーディオを生成する。軽量なパラメータを備えた追加の共有DiTブロックは、オーディオと視覚のモダリティ間の特徴的相互作用を促進し、アライメントを確保する。 AIST++とLandscapeデータセットの大規模な実験により、AV-DiTは可変パラメータが大幅に少ない共同オーディオ・ビジュアル生成において最先端のパフォーマンスを達成することが示された。さらに, 単一の共有画像生成バックボーンをモダリティに適応させることで, 共同オーディオ映像生成装置を構築するのに十分であることを示した。ソースコードと事前訓練されたモデルがリリースされます。 Recent Diffusion Transformers (DiTs) have shown impressive capabilities in generating high-quality single-modality content, including images, videos, and audio. However, it is still under-explored whether the transformer-based diffuser can efficiently denoise the Gaussian noises towards superb multimodal content creation. To bridge this gap, we introduce AV-DiT, a novel and efficient audio-visual diffusion transformer designed to generate high-quality, realistic videos with both visual and audio tracks. To minimize model complexity and computational costs, AV-DiT utilizes a shared DiT backbone pre-trained on image-only data, with only lightweight, newly inserted adapters being trainable. This shared backbone facilitates both audio and video generation. Specifically, the video branch incorporates a trainable temporal attention layer into a frozen pre-trained DiT block for temporal consistency. 
Additionally, a small number of trainable parameters adapt the image-based DiT block for audio generation. An extra shared DiT block, equipped with lightweight parameters, facilitates feature interaction between audio and visual modalities, ensuring alignment. Extensive experiments on the AIST++ and Landscape datasets demonstrate that AV-DiT achieves state-of-the-art performance in joint audio-visual generation with significantly fewer tunable parameters. Furthermore, our results highlight that a single shared image generative backbone with modality-specific adaptations is sufficient for constructing a joint audio-video generator. Our source code and pre-trained models will be released.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# 敵対的マシンアンラーニング Adversarial Machine Unlearning ( http://arxiv.org/abs/2406.07687v1 ) ライセンス: Link先を確認	Zonglin Di, Sixie Yu, Yevgeniy Vorobeychik, Yang Liu,	(参考訳) 本稿では、機械学習モデルに対する特定のトレーニングデータの影響を取り除くことを目的とした、マシンアンラーニングの課題に焦点を当てる。従来、アンラーニングアルゴリズムの開発は、データインスタンスがトレーニングに使用されたかどうかを判断するプライバシーの脅威であるメンバシップ推論攻撃(MIA)の開発と並行して進められてきた。しかし、この2つの流れは密接に結びついており、削除されたデータに対するMIAの成功というレンズを通してマシンアンラーニングを見ることができる。この関係を認識し、アンラーニングアルゴリズムの設計にMIAを統合するゲーム理論フレームワークを提案する。具体的には、アンラーニング問題をStackelbergゲームとしてモデル化する。このゲームでは、アンラーナーがモデルから特定のトレーニングデータをアンラーニングしようとし、監査者はMIAを用いて、表向きは除去されたデータの痕跡を検出する。この敵対的な観点を採用することで、新たな攻撃の進展を利用でき、アンラーニングアルゴリズムの設計が容易になる。私たちのフレームワークは2つの点で際立っている。第一に、敵対的なアプローチをとり、攻撃をアンラーニングアルゴリズムの設計に積極的に組み込む。第二に、暗黙的微分を用いて攻撃者の成功を制限する勾配を得ることで、アンラーニングのプロセスに役立てる。マシンアンラーニングにおける提案手法の有効性を示す実験結果を示す。 This paper focuses on the challenge of machine unlearning, aiming to remove the influence of specific training data on machine learning models. Traditionally, the development of unlearning algorithms runs parallel with that of membership inference attacks (MIA), a type of privacy threat to determine whether a data instance was used for training. However, the two strands are intimately connected: one can view machine unlearning through the lens of MIA success with respect to removed data. Recognizing this connection, we propose a game-theoretic framework that integrates MIAs into the design of unlearning algorithms. Specifically, we model the unlearning problem as a Stackelberg game in which an unlearner strives to unlearn specific training data from a model, while an auditor employs MIAs to detect the traces of the ostensibly removed data. Adopting this adversarial perspective allows the utilization of new attack advancements, facilitating the design of unlearning algorithms. Our framework stands out in two ways. First, it takes an adversarial approach and proactively incorporates the attacks into the design of unlearning algorithms.
Secondly, it uses implicit differentiation to obtain the gradients that limit the attacker's success, thus benefiting the process of unlearning. We present empirical results to demonstrate the effectiveness of the proposed approach for machine unlearning.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# AIラジオロジスト:畳み込みニューラルネットワークと臨床用GUIによる肝組織分節の革命 AI Radiologist: Revolutionizing Liver Tissue Segmentation with Convolutional Neural Networks and a Clinician-Friendly GUI ( http://arxiv.org/abs/2406.07688v1 ) ライセンス: Link先を確認	Ayman Al-Kababji, Faycal Bensaali, Sarada Prasad Dakua, Yassine Himeur,	(参考訳) 人工知能(AI)は、様々な分野や応用に浸透する幅広い研究トピックである。本研究では、肝組織のセグメンテーションのためにAI、特に畳み込みニューラルネットワーク(ConvNets)のパワーを利用する。また、臨床医が異なる肝組織(実質、腫瘍、血管)を効果的に描出できるようにするユーザフレンドリーなグラフィカルユーザインタフェース(GUI)ツール"AI Radiologist"の開発にも重点を置いている。この取り組みは、学術研究と実践的、産業的応用のギャップを埋めるものである。 GUIはシングルページアプリケーションであり、PyQt5 Pythonフレームワークを使って設計されている。オフラインで利用できるAI Radiologistは、すべての肝組織をセグメンテーションするためにトレーニングされた3つのConvNetモデルを利用している。 Dice指標では、最良の肝臓ConvNetのスコアは98.16%、最良の腫瘍ConvNetのスコアは65.95%、最良の血管ConvNetのスコアは51.94%である。肝臓、腫瘍、血管の2Dスライスに加えて、.obj および .mtl 形式の3D補間を出力し、これらは任意の3D互換ソフトウェアで視覚化/印刷できる。したがって、AI Radiologistは、臨床医が最先端の組織セグメンテーションモデルを用いて肝組織セグメンテーションと3D補間を行うのに便利なツールを提供する。ボリュームと事前訓練済みモデルを選択する機能が提供されるため、臨床医は残りをAI Radiologistに委ねることができる。 Artificial Intelligence (AI) is a pervasive research topic, permeating various sectors and applications. In this study, we harness the power of AI, specifically convolutional neural networks (ConvNets), for segmenting liver tissues. We also focus on developing a user-friendly graphical user interface (GUI) tool, "AI Radiologist", enabling clinicians to effectively delineate different liver tissues (parenchyma, tumors, and vessels), thereby saving lives. This endeavor bridges the gap between academic research and practical, industrial applications. The GUI is a single-page application and is designed using the PyQt5 Python framework. The offline-available AI Radiologist resorts to three ConvNet models trained to segment all liver tissues. With respect to the Dice metric, the best liver ConvNet scores 98.16%, the best tumor ConvNet scores 65.95%, and the best vessel ConvNet scores 51.94%.
It outputs 2D slices of the liver, tumors, and vessels, along with 3D interpolations in .obj and .mtl formats, which can be visualized/printed using any 3D-compatible software. Thus, the AI Radiologist offers a convenient tool for clinicians to perform liver tissue segmentation and 3D interpolation employing state-of-the-art models for tissues segmentation. With the provided capacity to select the volumes and pre-trained models, the clinicians can leave the rest to the AI Radiologist.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# 教育におけるトランスフォーマーモデル:AraBART、MT5、AraT5、mBARTによるサイエンス教科書の要約 Transformer Models in Education: Summarizing Science Textbooks with AraBART, MT5, AraT5, and mBART ( http://arxiv.org/abs/2406.07692v1 ) ライセンス: Link先を確認	Sari Masri, Yaqeen Raddad, Fidaa Khandaqji, Huthaifa I. Ashqar, Mohammed Elhenawy,	(参考訳) 近年、技術分野の急速な発展と、インターネット上で利用できるテキストの量の増加により、情報の本質を損なうことなく内容を要約する形でテキストを処理・理解するための効果的なツールの開発が急務となっている。この課題を踏まえ、アラビア語の教科書を対象とする高度なテキスト要約システムを開発した。 MT5、AraBART、AraT5、mBART50などの現代の自然言語処理モデルに基づいて、本システムはパレスチナのカリキュラムにおける11年生および12年生の生物学教科書に見られる最も重要な文を評価・抽出し、学生や教師が内容を容易に理解するための正確で有用な要約を得られるようにする。トレーニングされたモデルの性能を評価するために、ROUGE指標を用いた。さらに、教育用教科書執筆の専門家が、訓練されたモデルの出力を評価する。このアプローチは、最良のソリューションを特定し、改善が必要な領域を明確にすることを目的としている。この研究はアラビア語のテキストを要約するための解決策を提供する。アラビア語の理解と生成のための技術において、研究と開発のための新たな地平線を開くことができる結果を提供することによって、この分野を豊かにする。さらに、教科書のテキストを作成・編集し、データセットを構築することで、アラビア語のテキスト資源の面でもこの分野に貢献する。 Recently, with the rapid development in the fields of technology and the increasing amount of text available on the internet, it has become urgent to develop effective tools for processing and understanding texts in a way that summarizes the content without losing the fundamental essence of the information. Given this challenge, we have developed an advanced text summarization system targeting Arabic textbooks. Relying on modern natural language processing models such as MT5, AraBART, AraT5, and mBART50, this system evaluates and extracts the most important sentences found in biology textbooks for the 11th and 12th grades in the Palestinian curriculum, which enables students and teachers to obtain accurate and useful summaries that help them easily understand the content. We utilized the ROUGE metric to evaluate the performance of the trained models. Moreover, experts in educational textbook authoring assess the output of the trained models. This approach aims to identify the best solutions and clarify areas needing improvement. This research provides a solution for summarizing Arabic text.
It enriches the field by offering results that can open new horizons for research and development in the technologies for understanding and generating the Arabic language. Additionally, it contributes to the field with Arabic texts through creating and compiling schoolbook texts and building a dataset.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
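The abstract above reports evaluation with ROUGE but gives no further detail. As a rough illustration only, ROUGE-1 can be sketched as unigram overlap between a candidate summary and a reference. The function name `rouge_1`, the whitespace tokenization, and the sample sentences are assumptions made here for illustration; they are not taken from the paper, and real evaluations of Arabic text would normally use an established ROUGE package with proper tokenization.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Illustrative ROUGE-1: unigram overlap with naive whitespace tokens."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical sentences, used only to exercise the function.
    print(rouge_1("the cell divides", "the cell divides quickly"))
```

Recall-oriented scores reward summaries that cover the reference, while precision penalizes padding, which is why the F1 combination is commonly reported.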
# YouTube、TikTok、その他2024年の麻疹のアウトブレイクに関する動画の感情分析のためのラベル付きデータセット A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles ( http://arxiv.org/abs/2406.07693v1 ) ライセンス: Link先を確認	Nirmalya Thakur, Vanessa Su, Mingchen Shao, Kesha A. Patel, Hongseok Jeong, Victoria Knieling, Andrew Brian,	(参考訳) 本稿では,2024年1月1日から5月31日までにインターネット上の264のウェブサイトで公表された麻疹の流行に関する4011件のビデオデータを含むデータセットを提案する。データセットはhttps://dx.doi.org/10.21227/40s8-xf63で公開されている。これらのウェブサイトにはYouTubeとTikTokが含まれるが、これはそれぞれ48.6%と15.2%である。残りのWebサイトは、InstagramとFacebookだけでなく、さまざまなグローバルおよびローカルなニュース組織のWebサイトも含んでいる。これらのビデオのそれぞれについて、ビデオのURL、投稿のタイトル、投稿の説明、およびビデオの公開日をデータセット内の別の属性として提示する。このデータセットを開発した後、ビデオタイトルとビデオ記述の感情分析(VADERを用いた)、主観的分析(TextBlobを用いた)、微粒な感情分析(DistilRoBERTaベースを用いた)を行った。これには、各ビデオタイトルとビデオ記述を分類することが含まれる。 (i)肯定的、否定的、中立的な感情階級の1つ (二)主観的階級の1つ、即ち、高い意見、中立的な意見、または、最小の意見 (三)恐怖、驚き、喜び、悲しみ、怒り、嫌悪、中立という微粒な感情のクラスの一つ。これらの結果は、この分野での感情分析や主観分析を行う機械学習アルゴリズムのトレーニングとテストのためのデータセットと、他のアプリケーションのためのデータセットの別属性として提示される。最後に,本データセットを用いて検討することのできるオープンリサーチ質問のリストも提示する。 The work of this paper presents a dataset that contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. The dataset is available at https://dx.doi.org/10.21227/40s8-xf63. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. 
This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. Finally, this paper also presents a list of open research questions that may be investigated using this dataset.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
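To make the three-way sentiment labelling concrete, the sketch below maps a VADER-style compound score in [-1, 1] to positive, negative, or neutral. The ±0.05 cut-off is the convention suggested in VADER's own documentation; the abstract does not say which thresholds the authors applied, so the threshold and the function name `sentiment_class` are assumptions.

```python
def sentiment_class(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a coarse sentiment class.

    The default threshold follows the commonly cited VADER convention;
    it is an assumption here, not a value confirmed by the dataset paper.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical scores, used only to exercise the function.
    for score in (0.62, -0.31, 0.0):
        print(score, sentiment_class(score))
```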
# 産業欠陥検出のためのベンチマークとモデル開発のための公開データセットのPRISMA駆動型システムレビュー A PRISMA Driven Systematic Review of Publicly Available Datasets for Benchmark and Model Developments for Industrial Defect Detection ( http://arxiv.org/abs/2406.07694v1 ) ライセンス: Link先を確認	Can Akbas, Irem Su Arin, Sinan Onal,	(参考訳) 近年, 様々な産業における品質管理の進歩は, ビデオカメラと画像処理を統合し, 効果的な欠陥検出を実現している。進歩にとって重要な障壁は、注釈付き欠陥を含む包括的なデータセットの不足であり、自動欠陥検出モデルの開発と修正に不可欠である。この体系的なレビューは、2015年から2023年にかけて、15の公開データセットを特定し、ベンチマークとモデル開発の有効性と適用性を評価するために、それらを批判的に検証する。 NEU-CLS, NEU-DET, DAGM, KolektorSDD, PCB Defect Dataset, Hollow Cylindrical Defect Detection Datasetなどのデータセットには,画像品質, 欠陥型表現, 実世界の適用性など,それぞれ独自の長所と制限がある。この体系的なレビューの目的は、これらのデータセットを単一の場所にまとめることであり、そのような公開リソースを包括的な参照で探す研究者に提供することである。 Recent advancements in quality control across various industries have increasingly utilized the integration of video cameras and image processing for effective defect detection. A critical barrier to progress is the scarcity of comprehensive datasets featuring annotated defects, which are essential for developing and refining automated defect detection models. This systematic review, spanning from 2015 to 2023, identifies 15 publicly available datasets and critically examines them to assess their effectiveness and applicability for benchmarking and model development. Our findings reveal a diverse landscape of datasets, such as NEU-CLS, NEU-DET, DAGM, KolektorSDD, PCB Defect Dataset, and the Hollow Cylindrical Defect Detection Dataset, each with unique strengths and limitations in terms of image quality, defect type representation, and real-world applicability. The goal of this systematic review is to consolidate these datasets in a single location, providing researchers who seek such publicly available resources with a comprehensive reference.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# 音声表現のための持続的自己教師型学習 Sustainable self-supervised learning for speech representations ( http://arxiv.org/abs/2406.07696v1 ) ライセンス: Link先を確認	Luis Lugo, Valentin Vielzeuf,	(参考訳) 持続可能な人工知能は、データ、ハードウェア、アルゴリズムに焦点を当て、機械学習モデルをより環境に責任を持つものにする。特に、音声表現のための機械学習モデルは計算コストが高く、高エネルギー消費のため環境問題が発生する。そこで本稿では,音声表現学習のための持続的自己教師モデルを提案する。提案モデルでは,資源効率のよいベースラインを改良し,メモリ使用量と計算コストの見積を削減した。 1日以内で1つのGPUを使用して事前トレーニングを行う。それに加えて、下流タスク評価におけるベースラインのエラー率パフォーマンスを向上させる。大規模な音声表現アプローチと比較すると、メモリ使用量の桁違いの削減が見られ、計算コストの削減は、ほぼ3桁の桁違いの改善を示している。 Sustainable artificial intelligence focuses on data, hardware, and algorithms to make machine learning models more environmentally responsible. In particular, machine learning models for speech representations are computationally expensive, generating environmental concerns because of their high energy consumption. Thus, we propose a sustainable self-supervised model to learn speech representation, combining optimizations in neural layers and training to reduce computing costs. The proposed model improves over a resource-efficient baseline, reducing both memory usage and computing cost estimations. It pretrains using a single GPU in less than a day. On top of that, it improves the error rate performance of the baseline in downstream task evaluations. When comparing it to large speech representation approaches, there is an order of magnitude reduction in memory usage, while computing cost reductions represent almost three orders of magnitude improvement.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# Label Smoothingがマシン・アンラーニングを改善 Label Smoothing Improves Machine Unlearning ( http://arxiv.org/abs/2406.07698v1 ) ライセンス: Link先を確認	Zonglin Di, Zhaowei Zhu, Jinghan Jia, Jiancheng Liu, Zafar Takhirov, Bo Jiang, Yuanshun Yao, Sijia Liu, Yang Liu,	(参考訳) マシン・アンラーニング(MU)の目的は、以前に学習したデータをモデルから排除することである。しかし、既存のMU技術を使用する場合、計算コストと性能のバランスをとることは困難である。ラベル平滑化がモデル信頼性と差分プライバシーに与える影響から着想を得て,ラベル平滑化の逆プロセスを用いた単純な勾配に基づくMUアプローチを提案する。この研究は、スムーズなラベルを使用するシンプルなプラグアンドプレイMUアプローチであるUGradSLを導入している。ラベルのスムース化を適切に導入することでMU性能が向上する理由を理論的に分析する。提案手法の有効性とロバスト性を実証し,様々なサイズと異なるモードの6つのデータセットについて広範な実験を行った。 MU性能の一貫した改善は、わずかな追加計算コストで得られる。例えば、UGradSLは、アンラーニング効率を犠牲にすることなく、勾配上昇MUベースラインをアンラーニング精度で66%改善する。 The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of label smoothing. This work introduces UGradSL, a simple, plug-and-play MU approach that uses smoothed labels. We provide theoretical analyses demonstrating why properly introducing label smoothing improves MU performance. We conducted extensive experiments on six datasets of various sizes and different modalities, demonstrating the effectiveness and robustness of our proposed method. The consistent improvement in MU performance is only at a marginal cost of additional computations. For instance, UGradSL improves over the gradient ascent MU baseline by 66% unlearning accuracy without sacrificing unlearning efficiency.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
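For readers unfamiliar with label smoothing, the sketch below shows the standard formulation, in which a one-hot target over K classes is mixed with the uniform distribution using a rate alpha. The abstract only says that UGradSL uses "an inverse process of label smoothing"; reading that as a negative alpha is an assumption made here for illustration and is not confirmed by the abstract. The function name `smooth_labels` is likewise hypothetical.

```python
def smooth_labels(one_hot, alpha):
    """Standard label smoothing: (1 - alpha) * y + alpha / K for each class.

    A negative alpha sharpens the target instead of flattening it. Treating
    that as the 'inverse process' mentioned in the abstract is an assumption.
    """
    num_classes = len(one_hot)
    return [(1.0 - alpha) * y + alpha / num_classes for y in one_hot]


if __name__ == "__main__":
    target = [0.0, 0.0, 1.0, 0.0]
    print(smooth_labels(target, 0.1))   # flattened target
    print(smooth_labels(target, -0.1))  # sharpened target (assumed 'inverse')
```

In both cases the entries still sum to one, so the result can be plugged into a cross-entropy loss in place of the hard label.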
# CUPID: プロンプト条件付き画像分布の文脈的理解 CUPID: Contextual Understanding of Prompt-conditioned Image Distributions ( http://arxiv.org/abs/2406.07699v1 ) ライセンス: Link先を確認	Yayan Zhao, Mingwei Li, Matthew Berger,	(参考訳) 本稿では,プロンプト条件付き画像分布の文脈的理解のための可視化手法CUPIDを提案する。 CUPIDは、ユーザが自然言語でシーンを指定できる現代のテキスト・画像生成モデルによって生成された分布の視覚的解析を目標とし、そのモデルがユーザの記述を満足する一連の画像を生成する。 CUPIDは、結果の分布を理解するために設計されており、文脈的手がかりを用いて分析を容易にする。 CUPIDの中心は高次元分布を可視化する新しい手法であり、画像内の物体の文脈的埋め込みは密度に基づく埋め込みによって低次元空間にマッピングされる。このような埋め込みによって、分布内のオブジェクトの健全なスタイルを発見できるだけでなく、異常なオブジェクトスタイルやまれなオブジェクトスタイルを識別できることを示す。さらに、条件密度埋め込みを導入し、与えられたオブジェクトの条件付けにより、分布内のオブジェクトの依存関係を比較することができる。大規模拡散モデルにより生成された画像の分布解析にCUPIDを用いており、実験結果から、そのようなモデルからの言語誤解やオブジェクト構成のバイアスについての洞察が得られ、また、典型的あるいは稀な合成シーンの発見のためのインターフェースを提供する。 We present CUPID: a visualization method for the contextual understanding of prompt-conditioned image distributions. CUPID targets the visual analysis of distributions produced by modern text-to-image generative models, wherein a user can specify a scene via natural language, and the model generates a set of images, each intended to satisfy the user's description. CUPID is designed to help understand the resulting distribution, using contextual cues to facilitate analysis: objects mentioned in the prompt, novel, synthesized objects not explicitly mentioned, and their potential relationships. Central to CUPID is a novel method for visualizing high-dimensional distributions, wherein contextualized embeddings of objects, those found within images, are mapped to a low-dimensional space via density-based embeddings. We show how such embeddings allows one to discover salient styles of objects within a distribution, as well as identify anomalous, or rare, object styles. Moreover, we introduce conditional density embeddings, whereby conditioning on a given object allows one to compare object dependencies within the distribution. 
We employ CUPID for analyzing image distributions produced by large-scale diffusion models, where our experimental results offer insights on language misunderstanding from such models and biases in object composition, while also providing an interface for discovery of typical, or rare, synthesized scenes.	翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
# 微細分散状態によるスケーラブルUTXOスマートコントラクト Scalable UTXO Smart Contracts via Fine-Grained Distributed State ( http://arxiv.org/abs/2406.07700v1 ) ライセンス: Link先を確認	Massimo Bartoletti, Riccardo Marchesin, Roberto Zunino,	(参考訳) 現在のUTXOベースのスマートコントラクトは効率上のボトルネックに直面しており、更新されたコントラクト状態全体を特定するために、コントラクトに送信されるすべてのトランザクションが必要になる。この要件は、契約状態がマップのような動的なデータ構造を含んでいる場合、特に負担になる。一方、トランザクションにおける大きな状態は、大きなトランザクション手数料を意味する。一方、大きな中央集中状態は、トランザクションの並列化に有害であり、これは、アカウントベースのものと比較してUTXOベースのブロックチェーンの主要なセールスポイントの1つである。本稿では,拡張UTXOブロックチェーン上でのスマートコントラクトの効率的な実行手法を提案する。このようにして、トランザクションはアクセスする必要のある状態の一部のみを指定し、サイズ(および料金)を削減します。また、マルチコアCPU上でのトランザクションの検証を並列化するために、我々のモデルを利用する方法を示す。我々は,本手法を実装し,その有効性を実証的に検証する。 Current UTXO-based smart contracts face an efficiency bottleneck, requiring any transaction sent to a contract to specify the entire updated contract state. This requirement becomes particularly burdensome when the contract state contains dynamic data structures, such as maps, which are needed in many use cases for tracking users interactions with the contract. The problem is twofold: on the one hand, a large state in transactions implies a large transaction fee; on the other hand, a large centralized state is detrimental to the parallelization of transactions, which should be one of the main selling points of UTXO-based blockchains compared to account-based ones. We propose a technique to efficiently execute smart contracts on an extended UTXO blockchain, which allows the contract state to be distributed across multiple UTXOs. In this way, transactions only need to specify the part of the state they need to access, reducing their size (and fees). We also show how to exploit our model to parallelize the validation of transactions on multi-core CPUs. We implement our technique and provide an empirical validation of its effectiveness.	翻訳日:2024-06-13 21:06:17 公開日:2024-06-11
# 正当性に基づくモデル説明のグラフ的知覚 Graphical Perception of Saliency-based Model Explanations ( http://arxiv.org/abs/2406.07702v1 ) ライセンス: Link先を確認	Yayan Zhao, Mingwei Li, Matthew Berger,	(参考訳) 近年、予測的、深層学習に基づくモデルの説明に多くの研究が注がれてきた。評価手法の重要なクラスは人間中心のものであり、可視化を通して説明の伝達を必要とするのが普通である。ビジュアライゼーションは、モデル説明の知覚と理解において重要な役割を担っているが、ビジュアライゼーションデザインが人間の説明に対する認識にどのように影響するかは、まだよく分かっていない。本研究では,モデル説明のグラフィカルな知覚,特に視覚的認識モデルに対するサリエンシに基づく説明について検討する。本研究では,人間の知覚が視覚的デザインにどのように影響するかを実験的に検討し,アライメントアセスメントの課題や,画像中の物体とサリエンシマップが一致しているかを考察する。以上の結果から, 可視化設計決定やアライメントの種類, サリエンシマップの質に関連する要因が, 人間がサリエンシに基づく視覚的説明を知覚する上で重要な役割を担っていることが明らかとなった。 In recent years, considerable work has been devoted to explaining predictive, deep learning-based models, and in turn how to evaluate explanations. An important class of evaluation methods are ones that are human-centered, which typically require the communication of explanations through visualizations. And while visualization plays a critical role in perceiving and understanding model explanations, how visualization design impacts human perception of explanations remains poorly understood. In this work, we study the graphical perception of model explanations, specifically, saliency-based explanations for visual recognition models. We propose an experimental design to investigate how human perception is influenced by visualization design, wherein we study the task of alignment assessment, or whether a saliency map aligns with an object in an image. Our findings show that factors related to visualization design decisions, the type of alignment, and qualities of the saliency map all play important roles in how humans perceive saliency-based visual explanations.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# オブジェクトレベルのシーンデクルージョン Object-level Scene Deocclusion ( http://arxiv.org/abs/2406.07706v1 ) ライセンス: Link先を確認	Zhengzhe Liu, Qing Liu, Chirui Chang, Jianming Zhang, Daniil Pakhomov, Haitian Zheng, Zhe Lin, Daniel Cohen-Or, Chi-Wing Fu,	(参考訳) シーン内のオブジェクトの隠された部分を取り除くことは、特に現実世界のシーンに対処する場合、非常に恐ろしい作業である。本稿では,オブジェクトレベルのシーン・デクルージョンの基礎モデルであるPACOという,自己制御型PArallel可視・コミュールト拡散フレームワークを提案する。事前訓練されたモデルのリッチな事前処理を活用して、複数の完全オブジェクトを同時に符号化するフルビュー特徴マップを生成する並列変分オートエンコーダと、部分ビュー特徴マップから全ビュー特徴マップを暗黙的に予測し、入力画像中の不完全オブジェクトから抽出したテキストプロンプトを学習する可視から完全ラテント生成器を設計する。 PACOをトレーニングするために、500kサンプルによる大規模データセットを作成し、アモーダルマスクや隠蔽領域の退屈なアノテーションを回避し、自己教師付き学習を可能にする。提案手法では,非閉塞性を維持しつつ,効率向上を図るため,層単位の非閉塞性戦略を考案する。 COCOAと様々な現実世界のシーンに対する大規模な実験は、PACOがシーンの排除に優れた能力を示し、最先端の技術をはるかに上回っていることを示している。また,本手法は,トレーニングセットがカバーしていないクロスドメインシーンや新しいカテゴリにも拡張可能である。さらに,単視点3次元シーン再構成とオブジェクト再構成におけるPACOの非閉塞性を示す。 Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a foundation model for object-level scene deocclusion. Leveraging the rich prior of pre-trained models, we first design the parallel variational autoencoder, which produces a full-view feature map that simultaneously encodes multiple complete objects, and the visible-to-complete latent generator, which learns to implicitly predict the full-view feature map from partial-view feature map and text prompts extracted from the incomplete objects in the input image. To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning, avoiding tedious annotations of the amodal masks and occluded regions. At inference, we devise a layer-wise deocclusion strategy to improve efficiency while maintaining the deocclusion quality. 
Extensive experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the arts by a large margin. Our method can also be extended to cross-domain scenes and novel categories that are not covered by the training set. Further, we demonstrate the deocclusion applicability of PACO in single-view 3D scene reconstruction and object recomposition.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# YOLOv7に基づく全安全機器検出のための深層学習手法 A Deep Learning Approach to Detect Complete Safety Equipment For Construction Workers Based On YOLOv7 ( http://arxiv.org/abs/2406.07707v1 ) ライセンス: Link先を確認	Md. Shariful Islam, SM Shaqib, Shahriar Sultan Ramit, Shahrun Akter Khushbu, Mr. Abdus Sattar, Dr. Sheak Rashed Haider Noor,	(参考訳) 建設部門では、労働者の安全を確保することが最も重要である。本研究では, ヘルメット, ゴーグル, ジャケット, 手袋, 履物など, 建設作業員が着用する安全装備を同定するための深層学習技術を提案する。推奨されるアプローチは、YOLO v7(You Only Look Once)オブジェクト検出アルゴリズムを使用して、これらの安全アイテムを正確に検出する。この作業で使用されるデータセットは、トレーニング、テスト、検証セットに分割されたラベル付きイメージで構成されている。各画像には、画像内の安全装置の位置を示すバウンディングボックスラベルがある。モデルは、反復的なトレーニングアプローチを通じてラベル付きデータセットに基づいて安全装置を識別し、分類するように訓練されている。このモデルをトレーニングするためにカスタムデータセットを使用しました。トレーニングされたモデルでは,安全機器認識のための精度,リコール,F1スコアが良好に動作した。また、モデルの評価は、mAP@0.5スコア87.7%の有望な結果を生み出した。モデルは効果的に動作し、建設現場における安全装置の違反を迅速に識別することができる。結果の徹底的な評価は、モデルの利点を明らかにし、開発の潜在的な領域を指摘します。本研究は,自動かつ信頼性の高い安全機器検出手法を提供することにより,コンピュータビジョンと職場安全の分野に貢献する。深層学習に基づくアプローチは、安全コンプライアンスを高め、建設業界における事故リスクを低減する。 In the construction sector, ensuring worker safety is of the utmost significance. In this study, a deep learning-based technique is presented for identifying safety gear worn by construction workers, such as helmets, goggles, jackets, gloves, and footwear. The recommended approach uses the YOLO v7 (You Only Look Once) object detection algorithm to precisely locate these safety items. The dataset utilized in this work consists of labeled images split into training, testing and validation sets. Each image has bounding box labels that indicate where the safety equipment is located within the image. The model is trained to identify and categorize the safety equipment based on the labeled dataset through an iterative training approach. We used a custom dataset to train this model. Our trained model performed admirably well, with good precision, recall, and F1-score for safety equipment recognition. Also, the model's evaluation produced encouraging results, with a mAP@0.5 score of 87.7%. 
The model performs effectively, making it possible to quickly identify safety equipment violations on building sites. A thorough evaluation of the outcomes reveals the model's advantages and points up potential areas for development. By offering an automatic and trustworthy method for safety equipment detection, this research makes a contribution to the fields of computer vision and workplace safety. The proposed deep learning-based approach will increase safety compliance and reduce the risk of accidents in the construction industry.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
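The mAP@0.5 figure quoted above depends on intersection-over-union (IoU): a predicted box is usually counted as a correct detection only when its IoU with a ground-truth box is at least 0.5. The sketch below computes IoU for two axis-aligned boxes. The `(x1, y1, x2, y2)` box layout and the sample coordinates are assumptions for illustration; the paper's own evaluation code is not shown in the abstract.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical helmet boxes: ground truth vs. prediction.
    truth = (0, 0, 10, 10)
    prediction = (5, 0, 15, 10)
    score = iou(truth, prediction)
    print(score, "match at 0.5" if score >= 0.5 else "no match at 0.5")
```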
# 分子設計のためのベイズ最適化における共通問題の診断と修正 Diagnosing and fixing common problems in Bayesian optimization for molecule design ( http://arxiv.org/abs/2406.07709v1 ) ライセンス: Link先を確認	Austin Tripp, José Miguel Hernández-Lobato,	(参考訳) ベイズ最適化(英: Bayesian Optimization、BO)は、分子設計の課題に対する原理的なアプローチである。本稿では,不正確な先行幅,過度な平滑化,不適切な獲得関数の最大化という,経験的性能の低下を引き起こすBOの落とし穴を3つ説明する。これらの課題に対処して,分子設計のためのPMOベンチマーク(Gao et al, 2022)において,基本的なBO設定でも高い総合的な性能を達成可能であることを示す。これらの結果から,BOは分子群集における機械学習のさらなる注目の恩恵を受ける可能性が示唆された。 Bayesian optimization (BO) is a principled approach to molecular design tasks. In this paper we explain three pitfalls of BO which can cause poor empirical performance: an incorrect prior width, over-smoothing, and inadequate acquisition function maximization. We show that with these issues addressed, even a basic BO setup is able to achieve the highest overall performance on the PMO benchmark for molecule design (Gao et al, 2022). These results suggest that BO may benefit from more attention in the machine learning for molecules community.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# YOLOv8を利用した大都市圏の道路安全・交通管理向上のための車両速度検出システム Vehicle Speed Detection System Utilizing YOLOv8: Enhancing Road Safety and Traffic Management for Metropolitan Areas ( http://arxiv.org/abs/2406.07710v1 ) ライセンス: Link先を確認	SM Shaqib, Alaya Parvin Alo, Shahriar Sultan Ramit, Afraz Ul Haque Rupak, Sadman Sadik Khan, Mr. Md. Sadekur Rahman,	(参考訳) 死者や事故の減少による交通安全を確保するためには,車両の速度検出が不可欠である。自動車の速度の正確なモニタリングによって可能となる速度制限の実施によって、寛大な運転慣行は避けられる。バングラデシュでは道路事故が主要な死因の1つとなっている。バングラデシュ旅客福祉協会は2023年に、年間7,902人が交通事故で命を落としたと発表した。交通安全維持には効率的な車両速度検出が不可欠である。信頼性の高い速度検出は重要なトラフィックデータ収集にも役立ち、トラフィックフローを最適化し、より安全な道路インフラを提供する。 YOLOv8モデルは、密接な監督の下で訓練されたときに、より高速で精度の高いビデオ中の車を認識、追跡することができる。バングラデシュにおける車両の速度推定における物体識別への教師あり学習の適用と、特定の交通状況と安全上の懸念に焦点を当てた知見を提供することにより、この研究は、この地域に注目すべき貢献である。 MAEは3.5,RMSEは4.22であり,提案手法は従来の手法に代えて経済的に有効な代替手段である。 In order to ensure traffic safety through a reduction in fatalities and accidents, vehicle speed detection is essential. Relentless driving practices are discouraged by the enforcement of speed restrictions, which are made possible by accurate monitoring of vehicle speeds. Road accidents remain one of the leading causes of death in Bangladesh. The Bangladesh Passenger Welfare Association stated in 2023 that 7,902 individuals lost their lives in traffic accidents during the course of the year. Efficient vehicle speed detection is essential to maintaining traffic safety. Reliable speed detection can also help gather important traffic data, which makes it easier to optimize traffic flow and provide safer road infrastructure. The YOLOv8 model can recognize and track cars in videos with greater speed and accuracy when trained under close supervision. By providing insights into the application of supervised learning in object identification for vehicle speed estimation and concentrating on the particular traffic conditions and safety concerns in Bangladesh, this work represents a noteworthy contribution to the area. 
The MAE was 3.5 and RMSE was 4.22 between the predicted speed of our model and the actual speed or the ground truth measured by the speedometer. Promising increased efficiency and wider applicability in a variety of traffic conditions, the suggested solution offers a financially viable substitute for conventional approaches.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
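The two error figures reported above are related: MAE averages absolute errors, while RMSE squares them first, so RMSE is never smaller than MAE and grows faster when a few predictions are far off. A minimal sketch, with hypothetical speed values that are not from the paper:

```python
import math


def mae(predicted, actual):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    mean_sq = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    return math.sqrt(mean_sq)


if __name__ == "__main__":
    # Hypothetical speeds (same unit for both lists), for illustration only.
    predicted_speeds = [52.0, 61.0, 47.0]
    speedometer_speeds = [50.0, 60.0, 50.0]
    print("MAE:", mae(predicted_speeds, speedometer_speeds))
    print("RMSE:", rmse(predicted_speeds, speedometer_speeds))
```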
# 損失勾配ガウス幅に基づく一般化と最適化保証 Loss Gradient Gaussian Width based Generalization and Optimization Guarantees ( http://arxiv.org/abs/2406.07712v1 ) ライセンス: Link先を確認	Arindam Banerjee, Qiaobo Li, Yingxue Zhou,	(参考訳) 機械学習における集団損失の一般化と最適化は、しばしば一様収束に基づく解析に頼っている。現代のモデルの豊かな表現力は、このアプローチに対する懸念につながっている。本稿では,Loss Gradient Gaussian Width (LGGW)によって測定された勾配の複雑さの観点から,一般化と最適化の保証を示す。まず,LGGWのフレキシブルな勾配支配条件下での一般化保証を導入する。第二に, 有限和最適化におけるサンプル再利用は, LGGWが小さい限り, 集団勾配から経験的勾配を逸脱させるものではないことを示す。第3に、ディープネットワークに着目し、軽度な仮定の下でLGGWをバインドする方法を示す。特に,LGGWは有界であることを示す。 (a) 損失ヘッセン固有値の$L_2$-normにより、一般に使用されるディープモデルに対して$\tilde{O}(1)$と実証的に示されている。 (b) プロデューサのガウス幅、すなわち、最後のただし1層の出力の点で。我々の知る限り、LGGWによる一般化と最適化の保証は、その種の第一の結果であり、予測器ラデマッハの複雑性に基づく解析の落とし穴を回避し、深層モデルの量的に厳密な境界に対するかなりの保証を保っている。 Generalization and optimization guarantees on the population loss in machine learning often rely on uniform convergence based analysis, typically based on the Rademacher complexity of the predictors. The rich representation power of modern models has led to concerns about this approach. In this paper, we present generalization and optimization guarantees in terms of the complexity of the gradients, as measured by the Loss Gradient Gaussian Width (LGGW). First, we introduce generalization guarantees directly in terms of the LGGW under a flexible gradient domination condition, which we demonstrate to hold empirically for deep models. Second, we show that sample reuse in finite sum (stochastic) optimization does not make the empirical gradient deviate from the population gradient as long as the LGGW is small. Third, focusing on deep networks, we present results showing how to bound their LGGW under mild assumptions. In particular, we show that their LGGW can be bounded (a) by the $L_2$-norm of the loss Hessian eigenvalues, which has been empirically shown to be $\tilde{O}(1)$ for commonly used deep models; and (b) in terms of the Gaussian width of the featurizer, i.e., the output of the last-but-one layer. 
To our knowledge, our generalization and optimization guarantees in terms of LGGW are the first results of its kind, avoid the pitfalls of predictor Rademacher complexity based analysis, and hold considerable promise towards quantitatively tight bounds for deep models.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# LLAMAFUZZ: 大規模言語モデルによるGreybox Fuzzingの拡張 LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing ( http://arxiv.org/abs/2406.07714v1 ) ライセンス: Link先を確認	Hongxiang Zhang, Yuyang Rong, Yifeng He, Hao Chen,	(参考訳) Greyboxのファジィは、プログラムのバグや脆弱性を明らかにすることに成功している。しかし、ランダム化された突然変異戦略は、構造データに対するファジィザの性能を制限している。特殊なファジィザは複雑な構造化データを扱うことができるが、文法にさらなる努力が必要であり、低スループットに悩まされる。本稿では,構造化データに対するグレーボックスファジングを強化するために,Large Language Modelを活用する可能性について検討する。我々は、データ変換とフォーマットに関するLLMの事前学習知識を利用して、新しい有効な入力を生成する。さらに、組換え突然変異種を用いて微調整を行い、構造化形式と突然変異戦略を効果的に学習した。 LLMベースのファザであるLLAMAFUZZは、LLMのパワーを統合して、構造化データをファザリングに理解し、変更する。我々は,標準的なバグベースのベンチマークMagmaと,さまざまな実世界のプログラムで実験を行う。 LLAMAFUZZは、平均して41のバグでトップのライバルより優れています。また、すべてのトライアルで47のユニークなバグを特定しました。さらに、LLAMAFUZはバグトリガとバグ到達の両方で一貫したパフォーマンスを示した。 AFL++と比較すると、LLAMAFUZは現実世界のプログラムセットで平均27.19%以上の分岐を達成した。また、コードカバレッジの観点からLLMがファジィ処理をどのように強化するかを説明するためのケーススタディも紹介する。 Greybox fuzzing has achieved success in revealing bugs and vulnerabilities in programs. However, randomized mutation strategies have limited the fuzzer's performance on structured data. Specialized fuzzers can handle complex structured data, but require additional efforts in grammar and suffer from low throughput. In this paper, we explore the potential of utilizing the Large Language Model to enhance greybox fuzzing for structured data. We utilize the pre-trained knowledge of LLM about data conversion and format to generate new valid inputs. We further fine-tuned it with paired mutation seeds to learn structured format and mutation strategies effectively. Our LLM-based fuzzer, LLAMAFUZZ, integrates the power of LLM to understand and mutate structured data to fuzzing. We conduct experiments on the standard bug-based benchmark Magma and a wide variety of real-world programs. LLAMAFUZZ outperforms our top competitor by 41 bugs on average. We also identified 47 unique bugs across all trials. Moreover, LLAMAFUZZ demonstrated consistent performance on both bug trigger and bug reached. 
Compared to AFL++, LLAMAFUZZ achieved 27.19% more branches in real-world program sets on average. We also demonstrate a case study to explain how LLMs enhance the fuzzing process in terms of code coverage.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# 脳のコインフリップ:神経集合を用いた統計的学習 Coin-Flipping In The Brain: Statistical Learning with Neuronal Assemblies ( http://arxiv.org/abs/2406.07715v1 ) ライセンス: Link先を確認	Max Dabagia, Daniel Mitropolsky, Christos H. Papadimitriou, Santosh S. Vempala,	(参考訳) 脳からインテリジェンスが発生するかは科学の中心的な問題である。知性の重要な側面は、不確実性に対処することである -- 環境に関する優れた予測を発達させ、これらの予測を判断に転換すること。脳自身は、発達と神経活動を駆動する化学プロセスから刺激に対する反応のバラツキを試すまでの多くのレベルでうるさい。一つの仮説は、脳のメカニズムに固有のノイズが、世界のモデルからサンプリングされ、予測を発生させることである。この仮説をテストするために、我々は、スタイリングされたニューロンやシナプス、可塑性、抑制に基づく、脳の生物学的に妥当な計算モデルであるNEMOにおける統計的学習の出現について研究し、調整された発火が位置、概念、記憶、その他の認知の項目を思い出させるために、組み立てられたニューロンのグループであるアセンブリを発生させる。理論とシミュレーションにおいて、アセンブリ間の接続が統計を記録し、周囲ノイズを利用してアセンブリ間の確率的選択を行うことが示されている。これによりNEMOは、マルコフ連鎖のような内部モデルを作成することができる。本研究は, 生物学的に妥当な確率的計算の基礎を提供し, 雑音が脳の認知メカニズムの有用な構成要素であるという仮説を理論的に裏付けるものである。 How intelligence arises from the brain is a central problem in science. A crucial aspect of intelligence is dealing with uncertainty -- developing good predictions about one's environment, and converting these predictions into decisions. The brain itself seems to be noisy at many levels, from chemical processes which drive development and neuronal activity to trial variability of responses to stimuli. One hypothesis is that the noise inherent to the brain's mechanisms is used to sample from a model of the world and generate predictions. To test this hypothesis, we study the emergence of statistical learning in NEMO, a biologically plausible computational model of the brain based on stylized neurons and synapses, plasticity, and inhibition, and giving rise to assemblies -- a group of neurons whose coordinated firing is tantamount to recalling a location, concept, memory, or other primitive item of cognition. We show in theory and simulation that connections between assemblies record statistics, and ambient noise can be harnessed to make probabilistic choices between assemblies. This allows NEMO to create internal models such as Markov chains entirely from the presentation of sequences of stimuli. 
Our results provide a foundation for biologically plausible probabilistic computation, and add theoretical support to the hypothesis that noise is a useful component of the brain's mechanism for cognition.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# 高度化昆虫検出のための移動学習モデルのパワーを開放する:進化的昆虫分類 Unleashing the Power of Transfer Learning Model for Sophisticated Insect Detection: Revolutionizing Insect Classification ( http://arxiv.org/abs/2406.07716v1 ) ライセンス: Link先を確認	Md. Mahmudul Hasan, SM Shaqib, Ms. Sharmin Akter, Rabiul Alam, Afraz Ul Haque, Shahrun akter khushbu,	(参考訳) 作物・植物健康のための昆虫検出システムの目的は、農業地帯における昆虫の発生を監視し、特定することである。コンピュータービジョンや機械学習などの最先端技術を活用して、有害昆虫を迅速かつ正確に識別する。これにより、作物を救い、最適な植物の健康を維持することができる。本研究は,データ取得,前処理,データ分割,モデル実装,モデル評価を含む。この研究ではMobileNetV2、ResNet152V2、Xception、Custom CNNといった異なるモデルが使用された。昆虫の写真を分類するために,ResNet152V2アーキテクチャに基づく畳み込みニューラルネットワーク(CNN)を構築し,評価した。 ResNet152V2は、99%のトレーニング精度と97%のテスト精度を達成した。この結果は、昆虫の分類と昆虫学研究における現実世界の応用の可能性を強調し、効率と精度を強調した。食料の安全を確保し、世界の農業生産を維持するためには、昆虫の発見が不可欠である。 ResNet152V2モデルのような最先端技術は、昆虫の識別の自動化と精度の向上に大きな影響を与えている。効率的な昆虫検出は作物の損失を最小限に抑えるだけでなく、農業の生産性を高め、持続可能な食料生産に寄与する。このことは、グローバルな食料安全保障に関わる課題に対処する上で、テクノロジーが重要な役割を担っていることを裏付けている。 The purpose of the Insect Detection System for Crop and Plant Health is to keep an eye out for and identify insect infestations in farming areas. By utilizing cutting-edge technology like computer vision and machine learning, the system seeks to identify hazardous insects early and accurately. This would enable prompt response to save crops and maintain optimal plant health. The method of this study includes data acquisition, preprocessing, data splitting, model implementation and model evaluation. Different models like MobileNetV2, ResNet152V2, Xception, and a custom CNN were used in this study. In order to categorize insect photos, a Convolutional Neural Network (CNN) based on the ResNet152V2 architecture is constructed and evaluated in this work. Achieving 99% training accuracy and 97% testing accuracy, ResNet152V2 demonstrates superior performance among four implemented models. The results highlight its potential for real-world applications in insect classification and entomology studies, emphasizing efficiency and accuracy. 
To ensure food security and sustain agricultural output globally, finding insects is crucial. Cutting-edge technology, such as ResNet152V2 models, greatly influence automating and improving the accuracy of insect identification. Efficient insect detection not only minimizes crop losses but also enhances agricultural productivity, contributing to sustainable food production. This underscores the pivotal role of technology in addressing challenges related to global food security.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# 離散時間におけるアクティブ推論の簡潔な数学的記述 A Concise Mathematical Description of Active Inference in Discrete Time ( http://arxiv.org/abs/2406.07726v1 ) ライセンス: Link先を確認	Jesse van Oostrum, Carlotta Langer, Nihat Ay,	(参考訳) 本稿では,離散時間における能動推論の簡潔な数学的記述について述べる。本論文の主部は,行動選択理論の具体例を含む,このトピックの一般的な紹介として機能する。付録では、より微妙な数学的詳細が議論されている。この部分は、既に活発な推論文学を研究しているが、数学的詳細や導出を理解するのに苦労している読者を対象としている。写本全体を通して、標準的な数学的テキストと正確かつ一致した表記法を採用することに特に注意が払われている。すべての方程式と導出は、トピック上の他の人気のあるテキストの特定の方程式数に関連付けられている。さらに,本論文で記述したアクション選択機構を実装し,pymdp環境と互換性を持つPythonコードも提供される。 In this paper we present a concise mathematical description of active inference in discrete time. The main part of the paper serves as a general introduction to the topic, including an example illustrating the theory on action selection. In the appendix the more subtle mathematical details are discussed. This part is aimed at readers who have already studied the active inference literature but struggle to make sense of the mathematical details and derivations. Throughout the whole manuscript, special attention has been paid to adopting notation that is both precise and in line with standard mathematical texts. All equations and derivations are linked to specific equation numbers in other popular text on the topic. Furthermore, Python code is provided that implements the action selection mechanism described in this paper and is compatible with pymdp environments.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# 効率的な並列マルチホップ推論:知識グラフ解析のためのスケーラブルなアプローチ Efficient Parallel Multi-Hop Reasoning: A Scalable Approach for Knowledge Graph Analysis ( http://arxiv.org/abs/2406.07727v1 ) ライセンス: Link先を確認	Jesmin Jahan Tithi, Fabio Checconi, Fabrizio Petrini,	(参考訳) マルチホップ推論(MHR、Multi-hop reasoning)は、人工知能と自然言語処理におけるプロセスであり、システムは結論または答えに到達するために複数の推論ステップを行う必要がある。知識グラフやデータベースのコンテキストでは、複雑なクエリを理解したり、より深い理解を必要とするタスクを実行するために、複数のリンクされたエンティティや関係をトラバースする。マルチホップ推論は、質問応答、知識ベース補完、リンク予測など、様々なアプリケーションにおいて重要な機能である。人工知能、機械学習、グラフ分析に多大な関心を寄せている。本稿では,大規模グラフ上での時間効率の最適化に焦点をあて,直交目標である精度の従来の重視から逸脱する。本稿では,知識グラフ内の頂点間の上位K経路を効率よく識別し,3つのホップクエリの最適解を求めるために,ドメイン固有の学習埋め込みを利用する並列アルゴリズムを提案する。 1) MHRの性能, スケーラビリティ, 効率を向上させるための新しい並列アルゴリズムを提案する。 2) 先進的なIntelおよびAMDアーキテクチャにおけるアルゴリズムの優れた性能を実証実験により示す。本稿では,深層学習におけるチューリング賞の学術的関連性を特定するためのケーススタディを通じて,アルゴリズムの実践性を実証し,複雑な実体関係を扱う能力を強調した。これは、現代の知識グラフの複雑さの増大をナビゲートするのに有用な、高性能なMHRを実現するための我々のアプローチの可能性を示すものである。 Multi-hop reasoning (MHR) is a process in artificial intelligence and natural language processing where a system needs to make multiple inferential steps to arrive at a conclusion or answer. In the context of knowledge graphs or databases, it involves traversing multiple linked entities and relationships to understand complex queries or perform tasks requiring a deeper understanding. Multi-hop reasoning is a critical function in various applications, including question answering, knowledge base completion, and link prediction. It has garnered significant interest in artificial intelligence, machine learning, and graph analytics. This paper focuses on optimizing MHR for time efficiency on large-scale graphs, diverging from the traditional emphasis on accuracy which is an orthogonal goal. We introduce a novel parallel algorithm that harnesses domain-specific learned embeddings to efficiently identify the top K paths between vertices in a knowledge graph to find the best answers to a three-hop query. 
Our contributions are: (1) We present a new parallel algorithm to enhance MHR performance, scalability, and efficiency. (2) We demonstrate the algorithm's superior performance on leading-edge Intel and AMD architectures through empirical results. We showcase the algorithm's practicality through a case study on identifying academic affiliations of potential Turing Award laureates in Deep Learning, highlighting its capability to handle intricate entity relationships. This demonstrates the potential of our approach to enable high-performance MHR, which is useful for navigating the growing complexity of modern knowledge graphs.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# 素因数分解問題に対するD波量子アニールの実験 Experimenting with D-Wave Quantum Annealers on Prime Factorization problems ( http://arxiv.org/abs/2406.07732v1 ) ライセンス: Link先を確認	Jingwen Ding, Giuseppe Spallitta, Roberto Sebastiani,	(参考訳) この論文は、我々が最近発表した論文の上に構築されており、量子アニールによる素因数分解(PF)に対する新しいアプローチを提案しており、そこでは8,219,999=32,749x251が分解可能な最高素数である。しかし、これらの結果に繋がる一連のアニール実験は、直線的な経路をたどるものではなく、失敗または部分的に失敗する試みとバックトラックに満ちた、複雑な試行錯誤プロセスに関係していたため、最終的に成功したアニール戦略を見つけることができたのです。本稿では、実験的な意思決定の背後にある理由を掘り下げ、その結果を達成できる最終戦略を思いつく前に、私たちが行った試みのいくつかについて説明します。これはまた、私たちが調査した多くのアイデア、テクニック、戦略を含んでいます。最終的に私たちが採用したものは、D-Waveのユーザや実践者の、より専門化されたオーディエンスに洞察を与えることができます。 i$) 異なる初期化技術がパフォーマンスに影響を与えることを示し、そのうちの1つは、局所的な構造的埋め込みをターゲットとする場合のフラックスバイアスが効果的であること、(ii$) 連鎖強度は、グローバルな埋め込みに依存する問題よりも局所的な構造的埋め込みに低い影響を持つこと、(iii$) 壊れたチェーンと励起されたCFAの間にトレードオフがあること、そして、単一のキュービットの代わりにモジュールをベースとした漸進的なオフセット救済アプローチが提案されている。このように、私たちの経験の詳細を共有することで、進化を続ける量子アニールの風景についての洞察を提供し、人々がD-Wave量子アニールにアクセスし、効果的に利用することを目指している。 This paper builds on top of a paper we have published very recently, in which we have proposed a novel approach to prime factorization (PF) by quantum annealing, where 8,219,999=32,749x251 was the highest prime product we were able to factorize -- which, to the best of our knowledge is the largest number which was ever factorized by means of a quantum device. The series of annealing experiments which led us to these results, however, did not follow a straight-line path; rather, they involved a convoluted trial-and-error process, full of failed or partially-failed attempts and backtracks, which only in the end drove us to find the successful annealing strategies. In this paper, we delve into the reasoning behind our experimental decisions and provide an account of some of the attempts we have taken before conceiving the final strategies that allowed us to achieve the results. This involves also a bunch of ideas, techniques, and strategies we investigated which, although turned out to be inferior wrt. 
those we adopted in the end, may still provide insights to a more specialized audience of D-Wave users and practitioners. In particular, we report the following insights: ($i$) different initialization techniques affect performance, and among them flux biases are effective when targeting locally-structured embeddings; ($ii$) chain strengths have a lower impact on locally-structured embeddings than on problems relying on global embeddings; ($iii$) there is a trade-off between broken chains and excited CFAs, suggesting an incremental annealing-offset remedy based on modules instead of single qubits. Thus, by sharing the details of our experiences, we aim to provide insights into the evolving landscape of quantum annealing, and help people access and effectively use D-Wave quantum annealers.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# REALサンプリング:漸近エントロピーによるオープンエンデッドジェネレーションの現実性と多様性を高める REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy ( http://arxiv.org/abs/2406.07735v1 ) ライセンス: Link先を確認	Haw-Shiuan Chang, Nanyun Peng, Mohit Bansal, Anil Ramakrishna, Tagyoung Chung,	(参考訳) 大規模言語モデル(LLM)の復号法は通常、事実性の確保と多様性の維持のトレードオフに苦慮する。例えば、核内の高pしきい値(トップp)のサンプリングは多様性を増すが、事実性を低下させる。本稿では,適応しきい値の$p$を予測して,実効性と核サンプリングの多様性を向上させる復号法であるREAL(Residual Entropy from Asymptotic Line)サンプリングを提案する。具体的には、REAL サンプリングは LLM が幻覚するステップワイドな確率を予測し、 LLM が幻覚する確率の p 閾値を下げる。そうでなければ、REALサンプリングは多様性を高めるためにpしきい値を増加させる。本研究では,次のトークンの漸近エントロピー(すなわち固有の不確実性)を,異なる大きさのLCMから次トーケンエントロピーを外挿することによって予測する,トークンレベルの幻覚予測(THF)モデルを構築した。 LLMのエントロピーが漸近エントロピーよりも高い場合、THFモデルは高い幻覚障害を予測し、REALサンプリングではp閾値が低い。 FactualityPromptsベンチマークでは,70M THFモデルに基づくREALサンプリングが,検索基準と人的評価の両方から,7B LLMの事実と多様性を同時に改善できることが示されている。対照的な復号法と組み合わせて、REALサンプリングは9つのサンプリング方法より優れ、グリーディサンプリングよりも現実的で、$p=0.5$の核サンプリングよりも多種多様であるテキストを生成する。さらに、予測された漸近性エントロピーは幻覚検出タスクに有用な教師なし信号である。 Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity. For example, a higher p threshold in the nucleus (top-p) sampling increases the diversity but decreases the factuality, and vice versa. In this paper, we propose REAL (Residual Entropy from Asymptotic Line) sampling, a decoding method that achieves improved factuality and diversity over nucleus sampling by predicting an adaptive threshold of $p$. Specifically, REAL sampling predicts the step-wise likelihood of an LLM to hallucinate, and lowers the p threshold when an LLM is likely to hallucinate. Otherwise, REAL sampling increases the p threshold to boost the diversity. 
To predict the step-wise hallucination likelihood without supervision, we construct a Token-level Hallucination Forecasting (THF) model to predict the asymptotic entropy (i.e., inherent uncertainty) of the next token by extrapolating the next-token entropies from a series of LLMs with different sizes. If an LLM's entropy is higher than the asymptotic entropy (i.e., the LLM is more uncertain than it should be), the THF model predicts a high hallucination hazard, which leads to a lower $p$ threshold in REAL sampling. On the FactualityPrompts benchmark, we demonstrate that REAL sampling based on a 70M THF model can substantially improve the factuality and diversity of 7B LLMs simultaneously, as judged by both retrieval-based metrics and human evaluation. When combined with contrastive decoding, REAL sampling outperforms 9 sampling methods, and generates texts that are more factual than greedy sampling and more diverse than nucleus sampling with $p=0.5$. Furthermore, the predicted asymptotic entropy is also a useful unsupervised signal for hallucination detection tasks.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# MultiPragEval:大規模言語モデルの多言語語用論的評価 MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models ( http://arxiv.org/abs/2406.07736v1 ) ライセンス: Link先を確認	Dojun Park, Jiwoo Lee, Seohyun Park, Hyeyun Jeong, Youngeun Koo, Soonha Hwang, Seonwoo Park, Sungeun Lee,	(参考訳) LLMの能力が拡大するにつれて、より高度な言語理解に焦点をあてて、基本的な知識評価以上の評価を行うことがますます重要になる。本研究は,英語,ドイツ語,韓国語,中国語におけるLLMの多言語語用論的評価を目的とした頑健なテストスイートであるMultiPragEvalを紹介する。 Griceの協調の原理と4つの会話の格率に基づいて分類された1200の質問ユニットで構成されるMultiPragEvalは、LLMの文脈認識と含意された意味を推測する能力の詳細な評価を可能にする。以上の結果から,Claude3-Opusはすべてのテスト言語で他のモデルよりも大幅に優れており,この分野における最先端を確立していることが示された。オープンソースのモデルでは、Solar-10.7BとQwen1.5-14Bが強力なライバルとして登場している。この研究は、語用論的推論におけるLLMの多言語評価を先導するだけでなく、AIシステムにおける高度な言語理解に必要な繊細な能力に関する貴重な洞察を提供する。 As the capabilities of LLMs expand, it becomes increasingly important to evaluate them beyond basic knowledge assessment, focusing on higher-level language understanding. This study introduces MultiPragEval, a robust test suite designed for the multilingual pragmatic evaluation of LLMs across English, German, Korean, and Chinese. Comprising 1200 question units categorized according to Grice's Cooperative Principle and its four conversational maxims, MultiPragEval enables an in-depth assessment of LLMs' contextual awareness and their ability to infer implied meanings. Our findings demonstrate that Claude3-Opus significantly outperforms other models in all tested languages, establishing the state of the art in the field. Among open-source models, Solar-10.7B and Qwen1.5-14B emerge as strong competitors. This study not only leads the way in the multilingual evaluation of LLMs in pragmatic inference but also provides valuable insights into the nuanced capabilities necessary for advanced language comprehension in AI systems.	翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
# AI駆動の世界におけるソフトウェア工学の未来 The Future of Software Engineering in an AI-Driven World ( http://arxiv.org/abs/2406.07737v1 ) ライセンス: Link先を確認	Valerio Terragni, Partha Roop, Kelly Blincoe,	(参考訳) ソフトウェアエンジニアリングではパラダイムシフトが進行中であり、LLMのようなAIシステムがソフトウェア開発の生産性向上に重要性を増している。この傾向は続くと予測されている。今後5年間では、人間開発者とAIの共生的なパートナーシップが増加するだろう。私たちは、AIをソフトウェア開発プロセスに統合することによって引き起こされる重要な研究課題に対処する必要があります。本稿では、AI駆動の世界におけるソフトウェア開発の将来についてのビジョンを示し、このビジョンを実現するために研究コミュニティが取り組むべき重要な課題について考察する。 A paradigm shift is underway in Software Engineering, with AI systems such as LLMs gaining increasing importance for improving software development productivity. This trend is anticipated to persist. In the next five years, we will likely see an increasing symbiotic partnership between human developers and AI. The Software Engineering research community cannot afford to overlook this trend; we must address the key research challenges posed by the integration of AI into the software development process. In this paper, we present our vision of the future of software development in an AI-Driven world and explore the key challenges that our research community should address to realize this vision.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# エゴセントリックコンピュータビジョンの産業シナリオへの応用について On the Application of Egocentric Computer Vision to Industrial Scenarios ( http://arxiv.org/abs/2406.07738v1 ) ライセンス: Link先を確認	Vivek Chavan, Oliver Heimann, Jörg Krüger,	(参考訳) エゴセントリックなビジョンは、一人称視点から世界を捉え、分析することを目的としている。我々は、エゴセントリックなウェアラブルデバイスが、データ収集、アノテーション、ラベル付け、下流アプリケーションといった産業用ユースケースを改善し、強化する可能性を探る。これにより、データ収集が容易になり、ユーザが追加のコンテキストを提供できるようになる。このアプローチは、従来の産業用機械学習ワークフローの補足として役立つと期待しています。コード、データセットおよび関連するリソースは、https://github.com/Vivek9Chavan/EgoVis24で利用可能になる。 Egocentric vision aims to capture and analyse the world from the first-person perspective. We explore the possibilities for egocentric wearable devices to improve and enhance industrial use cases w.r.t. data collection, annotation, labelling and downstream applications. This would contribute to easier data collection and allow users to provide additional context. We envision that this approach could serve as a supplement to the traditional industrial Machine Vision workflow. Code, Dataset and related resources will be available at: https://github.com/Vivek9Chavan/EgoVis24	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# UICoder: フィードバック自動生成によるユーザインタフェースコード生成のための大規模な言語モデルを微調整 UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback ( http://arxiv.org/abs/2406.07739v1 ) ライセンス: Link先を確認	Jason Wu, Eldon Schoop, Alan Leung, Titus Barik, Jeffrey P. Bigham, Jeffrey Nichols,	(参考訳) 大規模言語モデル(LLM)は、視覚的に関連する設計をコンパイルし、生成するUIコードを生成するのに苦労する。生成を改善するための既存のアプローチは、高価な人間のフィードバックやプロプライエタリなモデルを蒸留することに依存している。本稿では,LLMを誘導し,高品質なUIコードを生成するための自動フィードバック(コンパイラとマルチモーダルモデル)の利用について検討する。提案手法は,既存のLLMから始まり,原モデルを用いて大規模合成データセットを自己生成することで,改良された高品質なデータセットにデータを積極的にフィルタし,スコア付けし,デ複製する自動ツールを適用することで,改良されたモデルを反復的に生成する。オリジナルのLLMは、この洗練されたデータセットを微調整することで改善されている。提案手法をいくつかのオープンソース LLM に適用し,結果のパフォーマンスをベースラインモデルと比較した。評価の結果, ダウンロード可能なベースラインを全て上回り, より大きなプロプライエタリモデルの性能に近づいた。 Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model. In this paper, we explore the use of automated feedback (compilers and multi-modal models) to guide LLMs to generate high-quality UI code. Our method starts with an existing LLM and iteratively produces improved models by self-generating a large synthetic dataset using an original model, applying automated tools to aggressively filter, score, and de-duplicate the data into a refined higher quality dataset. The original LLM is improved by finetuning on this refined dataset. We applied our approach to several open-source LLMs and compared the resulting performance to baseline models with both automated metrics and human preferences. Our evaluation shows the resulting models outperform all other downloadable baselines and approach the performance of larger proprietary models.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# バック・トゥ・ザ・カラー:教師なし深度推定のための特定の色変換への深度学習 Back to the Color: Learning Depth to Specific Color Transformation for Unsupervised Depth Estimation ( http://arxiv.org/abs/2406.07741v1 ) ライセンス: Link先を確認	Yufan Zhu, Chongzhi Ran, Mingtao Feng, Weisheng Dong, Antonio M. López, Guangming Shi,	(参考訳) 仮想エンジンは様々な合成シーンの深度マップを生成する能力を有しており、深度推定モデルの訓練には有用ではない。しかし、合成色は実世界の色に比べて大きな相違が見られることが多く、特に教師なしの単分子深度推定タスクで発生する複雑で不確実な環境において、現実世界のシーンにおける深度推定の課題を提起する。この問題に対処するために,実世界のデータに基づいて訓練されたモデルを用いて,奥行きからリアルな色を予測するフレームワークBack2Colorを提案する。さらに,Syn-Real CutMix法を実世界の非教師付きおよび合成教師付き深度サンプルの併用訓練に利用することにより,実世界のシーンにおける単眼深度推定の性能向上を実現する。さらに,非厳密な動きが深度推定に与える影響を包括的に解決するために,時間次元と空間次元の両方で教師なし学習の利点を統合する自動学習不確実性時空間融合法(Auto-UTSF)を提案する。さらに,視覚注意ネットワークに基づく深度推定ネットワーク(VADepth)を設計する。私たちのBack2Colorフレームワークは、パフォーマンス指標の改善と予測における詳細な詳細生成、特に教師なし深度推定のためのCityscapesのような挑戦的なデータセットによって実証された、最先端のパフォーマンスを実証しています。 Virtual engines have the capability to generate dense depth maps for various synthetic scenes, making them invaluable for training depth estimation models. However, synthetic colors often exhibit significant discrepancies compared to real-world colors, thereby posing challenges for depth estimation in real-world scenes, particularly in complex and uncertain environments encountered in unsupervised monocular depth estimation tasks. To address this issue, we propose Back2Color, a framework that predicts realistic colors from depth utilizing a model trained on real-world data, thus facilitating the transformation of synthetic colors into real-world counterparts. Additionally, by employing the Syn-Real CutMix method for joint training with both real-world unsupervised and synthetic supervised depth samples, we achieve improved performance in monocular depth estimation for real-world scenes. 
Moreover, to comprehensively address the impact of non-rigid motions on depth estimation, we propose an auto-learning uncertainty temporal-spatial fusion method (Auto-UTSF), which integrates the benefits of unsupervised learning in both temporal and spatial dimensions. Furthermore, we design a depth estimation network (VADepth) based on the Vision Attention Network. Our Back2Color framework demonstrates state-of-the-art performance, as evidenced by improvements in performance metrics and the production of fine-grained details in our predictions, particularly on challenging datasets such as Cityscapes for unsupervised depth estimation.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# C3DAG:3次元ポーズガイダンスを用いた3次元動物生成制御 C3DAG: Controlled 3D Animal Generation using 3D pose guidance ( http://arxiv.org/abs/2406.07742v1 ) ライセンス: Link先を確認	Sandeep Mishra, Oindrila Saha, Alan C. Bovik,	(参考訳) テキスト・ツー・3D生成の最近の進歩は、高品質な3Dアセットを生成する能力を示している。しかし、動物を生成する場合、これらの手法は性能が低く、しばしば不正確な解剖学的構造と形状を描写する。この欠陥を改善するために,提案するC3DAGは,与えられたポーズに整合した高品質な3D動物を生成する,ポーズ制御型テキスト・ツー・3D動物生成フレームワークである。また、Webベースのツールによる動的ポーズ生成と修正を可能にする自動3D形状作成ツールを導入し、簡単なジオメトリを用いて3Dバルーン動物を生成する。そして、深度制御SDSを用いて、この3次元形状からNeRFを初期化する。次の段階では、事前訓練されたNeRFを四足動物ポーズ制御SDSを用いて微調整する。私たちが開発したパイプラインは、幾何学的および解剖学的に一貫した結果を生成するだけでなく、精密なポーズ制御を許さない従来の方法とは異なり、高度に制御された3D動物をレンダリングする。 Recent advancements in text-to-3D generation have demonstrated the ability to generate high-quality 3D assets. However, these methods underperform when generating animals, often portraying inaccurate anatomy and geometry. Towards ameliorating this defect, we present C3DAG, a novel pose-Controlled text-to-3D Animal Generation framework which generates a high-quality 3D animal consistent with a given pose. We also introduce an automatic 3D shape creator tool that allows dynamic pose generation and modification via a web-based tool, and that generates a 3D balloon animal using simple geometries. A NeRF is then initialized from this 3D shape using depth-controlled SDS. In the next stage, the pre-trained NeRF is fine-tuned using quadruped-pose-controlled SDS. The pipeline that we have developed not only produces geometrically and anatomically consistent results, but also renders highly controlled 3D animals, unlike prior methods which do not allow fine-grained pose control.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# 線形二次系制御のための完全適応レギュレット保証アルゴリズム Fully Adaptive Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems ( http://arxiv.org/abs/2406.07746v1 ) ライセンス: Link先を確認	Jafar Abbaszadeh Chekan, Cedric Langbort,	(参考訳) 未知の系モデルを持つ線形二次(LQ)制御問題に対する最初のアルゴリズムは、Abbasi-Yadkori と Szepesv\'ari (2011) によって導入された$\mathcal{O}(\sqrt{T})$の後悔を特徴とするものである。このアルゴリズムの計算複雑性を認識して、その後の取り組み(Cohen et al (2019)、Mania et al (2019)、Faradonbeh et al (2020a)、Kargin et al (2022))は、この後悔の順序を保ちながら計算的に抽出可能なアルゴリズムの提案に費やされている。文献における既存の研究は、完全に適応的な探索・探索のトレードオフ調整を欠き、ユーザ定義の値が必要であり、いくつかの要因で全体的な後悔と結びついた成長につながる可能性がある。本研究は,このギャップに気付き,ポリシー更新数(すなわち探索・探索トレードオフの調整)を制御し,後悔の上限を適応的に最適化する,最初の完全適応アルゴリズムを提案する。提案アルゴリズムは、Cohen et al (2019) の SDP に基づくアプローチに基づいており、正規化パラメータを適切に調整し、適応的な入力摂動を追加することにより、水平依存ウォームアップフェーズの必要性を緩和する。さらに、慎重な探索・探索トレードオフ調整により、厳密なシーケンシャル安定性という概念にコミットする必要がなく、初期化の複雑さを生じさせる可能性があることを示す。 The first algorithm for the Linear Quadratic (LQ) control problem with an unknown system model, featuring a regret of $\mathcal{O}(\sqrt{T})$, was introduced by Abbasi-Yadkori and Szepesv\'ari (2011). Recognizing the computational complexity of this algorithm, subsequent efforts (see Cohen et al. (2019), Mania et al. (2019), Faradonbeh et al. (2020a), and Kargin et al.(2022)) have been dedicated to proposing algorithms that are computationally tractable while preserving this order of regret. Although successful, the existing works in the literature lack a fully adaptive exploration-exploitation trade-off adjustment and require a user-defined value, which can lead to overall regret bound growth with some factors. In this work, noticing this gap, we propose the first fully adaptive algorithm that controls the number of policy updates (i.e., tunes the exploration-exploitation trade-off) and optimizes the upper-bound of regret adaptively. Our proposed algorithm builds on the SDP-based approach of Cohen et al. 
(2019) and relaxes its need for a horizon-dependent warm-up phase by appropriately tuning the regularization parameter and adding an adaptive input perturbation. We further show that, through careful adjustment of the exploration-exploitation trade-off, there is no need to commit to the widely-used notion of strong sequential stability, which is restrictive and can introduce complexities in initialization.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# MuSe 2024 マルチモーダル感性分析の課題:社会的知覚と覚醒の認識 The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition ( http://arxiv.org/abs/2406.07753v1 ) ライセンス: Link先を確認	Shahin Amiriparian, Lukas Christ, Alexander Kathan, Maurice Gerczuk, Niklas Müller, Steffen Klug, Lukas Stappen, Andreas König, Erik Cambria, Björn Schuller, Simone Eulitz,	(参考訳) マルチモーダル・センティメント・アナリティクス・チャレンジ (MuSe) 2024は、現代の2つのマルチモーダルな感情と感情分析の問題に対処する。ソーシャル・パーセプション・サブ・チャレンジ (MuSe-Perception) では、参加者は、提供された音声視覚データに基づいて、主張性、支配性、責任性、誠実さなどの、16の異なる個人の社会的属性を予測する。クロスカルチャー・ヒューム検出サブチャレンジ(MuSe-Humor)データセットは、相互言語的・異文化的な環境で自発的なユーモアの検出に焦点を当てた、パストー・スポンタンス・フットボール・コーチ・ヒューム(Passau Sponous Football Coach Humor, Passau-SFCH)データセット上に拡張される。 MuSe 2024の主な目的は、マルチモーダル感情分析、オーディオ・視覚的感情コンピューティング、連続信号処理、自然言語処理など、様々な研究領域から幅広い聴衆を集結させることである。これらの分野の専門家間のコラボレーションと交流を促進することで、MuSe 2024は感情分析と感情計算の理解と適用を複数のモダリティにわたって進めようとしている。本論文は,各サブチャレンジとその対応するデータセットの詳細,各データモダリティから抽出した特徴,課題ベースラインについて述べる。ベースラインシステムでは,トランスフォーマーと専門家が設計した機能を活用し,GRU(Gated Recurrent Unit)-Recurrent Neural Network(RNN)モデルをトレーニングすることで,競争力のあるベースラインシステムを実現する。各サブチャレンジの目に見えないテストデータセットでは、平均ピアソンの相関係数は MuSe-Perception が 0.3573 であり、AUC は MuSe-Humor が 0.8682 である。 The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems: In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals such as assertiveness, dominance, likability, and sincerity based on the provided audio-visual data. The Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor) dataset expands upon the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset, focusing on the detection of spontaneous humor in a cross-lingual and cross-cultural setting. 
The main objective of MuSe 2024 is to unite a broad audience from various research domains, including multimodal sentiment analysis, audio-visual affective computing, continuous signal processing, and natural language processing. By fostering collaboration and exchange among experts in these fields, MuSe 2024 endeavors to advance the understanding and application of sentiment analysis and affective computing across multiple modalities. This baseline paper details each sub-challenge and its corresponding dataset, describes the features extracted from each data modality, and discusses the challenge baselines. For our baseline system, we make use of a range of Transformers and expert-designed features and train Gated Recurrent Unit (GRU)-Recurrent Neural Network (RNN) models on them, resulting in a competitive baseline system. On the unseen test datasets of the respective sub-challenges, it achieves a mean Pearson's Correlation Coefficient ($\rho$) of 0.3573 for MuSe-Perception and an Area Under the Curve (AUC) value of 0.8682 for MuSe-Humor.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# HOI-Swap:手動インタラクションを意識したビデオにおけるオブジェクトのスワップ HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness ( http://arxiv.org/abs/2406.07754v1 ) ライセンス: Link先を確認	Zihui Xue, Mi Luo, Changan Chen, Kristen Grauman,	(参考訳) ユーザが提供する参照オブジェクト画像から,手動で対話するオブジェクトに焦点をあてて,ビデオ内のオブジェクトを正確に交換する問題について検討する。最近のビデオ編集における拡散モデルの大きな進歩にもかかわらず、これらのモデルは手動オブジェクトの相互作用(HOI)の複雑さを扱うのに不足することが多く、特にオブジェクトの交換がオブジェクトの形や機能の変化をもたらすと、現実的な編集が得られない。このギャップを埋めるために, HOI-Swapを提案する。モデルでは、オブジェクトの特性の変化に基づいて、手つかみなどのインタラクションパターンを調整することを学ぶ。第2段階は, 単一フレーム編集を全シーケンスにわたって拡張し, 1) サンプリングされた動き点に基づいて, 1) 編集されたフレームから新しいシーケンスをワープし, (2) ワープされた動画生成を行うことにより, 元のビデオと制御可能な動きアライメントを実現する。包括的質的,定量的評価により,HOI-Swapは既存の手法よりも優れ,リアルなHOIで高品質な映像編集を実現することが示された。 We study the problem of precisely swapping objects in videos, with a focus on those interacted with by hands, given one user-provided reference object image. Despite the great advancements that diffusion models have made in video editing recently, these models often fall short in handling the intricacies of hand-object interactions (HOI), failing to produce realistic edits -- especially when object swapping results in object shape or functionality changes. To bridge this gap, we present HOI-Swap, a novel diffusion-based video editing framework trained in a self-supervised manner. Designed in two stages, the first stage focuses on object swapping in a single frame with HOI awareness; the model learns to adjust the interaction patterns, such as the hand grasp, based on changes in the object's properties. The second stage extends the single-frame edit across the entire sequence; we achieve controllable motion alignment with the original video by: (1) warping a new sequence from the stage-I edited frame based on sampled motion points and (2) conditioning video generation on the warped sequence. Comprehensive qualitative and quantitative evaluations demonstrate that HOI-Swap significantly outperforms existing methods, delivering high-quality video edits with realistic HOIs.	
翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# LT4SG@SMM4H24: 事前学習言語モデルを用いた小児健康状態のデジタル疫学のつぶやき分類 LT4SG@SMM4H24: Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models ( http://arxiv.org/abs/2406.07759v1 ) ライセンス: Link先を確認	Dasun Athukoralage, Thushari Atapattu, Menasha Thilakaratne, Katrina Falkner,	(参考訳) 本稿では,子どもの医学的障害を報告した英語ツイートのバイナリ分類について,SMM4H24共有タスク5に対するアプローチを提案する。最初のアプローチでは、1つのRoBERTa-largeモデルを微調整しますが、第2のアプローチでは3つの細調整BERTweet-largeモデルの結果を要約します。両手法は検証データに同一の性能を示すが,BERTweet-largeアンサンブルはテストデータに優れることを示した。テストデータに対するF1スコアの0.938を達成し,ベンチマーク分類器を1.18%上回った。 This paper presents our approaches for the SMM4H24 Shared Task 5 on the binary classification of English tweets reporting children's medical disorders. Our first approach involves fine-tuning a single RoBERTa-large model, while the second approach entails ensembling the results of three fine-tuned BERTweet-large models. We demonstrate that although both approaches exhibit identical performance on validation data, the BERTweet-large ensemble excels on test data. Our best-performing system achieves an F1-score of 0.938 on test data, outperforming the benchmark classifier by 1.18%.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# コロイドナノ結晶ホストにおけるコヒーレントエルビウムスピン欠陥 Coherent Erbium Spin Defects in Colloidal Nanocrystal Hosts ( http://arxiv.org/abs/2406.07762v1 ) ライセンス: Link先を確認	Joeson Wong, Mykyta Onizhuk, Jonah Nagura, Arashdeep S. Thind, Jasleen K. Bindra, Christina Wicker, Gregory D. Grant, Yuxuan Zhang, Jens Niklas, Oleg G. Poluektov, Robert F. Klie, Jiefei Zhang, Giulia Galli, F. Joseph Heremans, David D. Awschalom, A. Paul Alivisatos,	(参考訳) 本研究では,Er3+イオンを二酸化セリウムナノ結晶ホストにドープした状態でのスピンコヒーレンスの約1マイクロ秒間におけるスピンコヒーレンスを示す。核スピンフリーホスト材料において、ドパント密度を即時拡散限界以下に低減し、ナノ結晶当たりの1つのエルビウムスピン欠陥の限界に達することにより、長いスピンコヒーレンスを実現する。高度に対称な立方体において大きなオルバッハエネルギーを観測し、それ以外は急速にデコヒーレとなるクビット内のコヒーレンスを保護する。空間的に相関した電子分光測定により、ナノ結晶表面におけるCe3+の存在が明らかになる。これらの要因にもかかわらず、欠陥を埋め込んだナノ結晶ホストは、コアシェルの製作、酸素空孔の酸化還元調整、有機界面活性剤の改質など、将来スピンコヒーレンスと機能を強化するために利用可能な複数の方法を含む、量子センシングと量子通信のアプリケーションにとって、非常に有望である。 We demonstrate nearly a microsecond of spin coherence in Er3+ ions doped in cerium dioxide nanocrystal hosts, despite a large gyromagnetic ratio and nanometric proximity of the spin defect to the nanocrystal surface. The long spin coherence is enabled by reducing the dopant density below the instantaneous diffusion limit in a nuclear spin-free host material, reaching the limit of a single erbium spin defect per nanocrystal. We observe a large Orbach energy in a highly symmetric cubic site, further protecting the coherence in a qubit that would otherwise rapidly decohere. Spatially correlated electron spectroscopy measurements reveal the presence of Ce3+ at the nanocrystal surface that likely acts as extraneous paramagnetic spin noise. Even with these factors, defect-embedded nanocrystal hosts show tremendous promise for quantum sensing and quantum communication applications, with multiple avenues, including core-shell fabrication, redox tuning of oxygen vacancies, and organic surfactant modification, available to further enhance their spin coherence and functionality in the future.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# 光ポーリングスクリーニングにおけるインターベンショナルスタイル転送による遺伝子レベル表現学習 Gene-Level Representation Learning via Interventional Style Transfer in Optical Pooled Screening ( http://arxiv.org/abs/2406.07763v1 ) ライセンス: Link先を確認	Mahtab Bigverdi, Burkhard Hockendorf, Heming Yao, Phil Hanslovsky, Romain Lopez, David Richmond,	(参考訳) 光プールスクリーニング(OPS)は、自動顕微鏡と遺伝的摂動を組み合わせて、スケーラブルで費用対効果の高い方法で遺伝子機能を体系的に研究する。得られたデータを活用するには、画像から細胞内摂動表現型の生物学的に有益な表現を抽出する必要がある。我々は、OPSを用いて得られた遺伝的摂動細胞の画像から、遺伝子レベルの特徴表現を学習するために、スタイル-トランスファーアプローチを採用する。本手法は,遺伝子機能に応じた遺伝子表現のクラスタリングにおける工学的特徴よりも優れ,潜伏する生物学的関係を明らかにするために有用であることを示す。このアプローチは、健康と病気における遺伝子の役割を調べるための有望な代替手段を提供する。 Optical pooled screening (OPS) combines automated microscopy and genetic perturbations to systematically study gene function in a scalable and cost-effective way. Leveraging the resulting data requires extracting biologically informative representations of cellular perturbation phenotypes from images. We employ a style-transfer approach to learn gene-level feature representations from images of genetically perturbed cells obtained via OPS. Our method outperforms widely used engineered features in clustering gene representations according to gene function, demonstrating its utility for uncovering latent biological relationships. This approach offers a promising alternative to investigate the role of genes in health and disease.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# AIベースのコーディングアシスタントの実践 - 現状, 知覚, 今後の展開 Using AI-Based Coding Assistants in Practice: State of Affairs, Perceptions, and Ways Forward ( http://arxiv.org/abs/2406.07765v1 ) ライセンス: Link先を確認	Agnia Sergeyuk, Yaroslav Golubev, Timofey Bryksin, Iftekhar Ahmed,	(参考訳) ここ数年、コードのためのAIアシスタント -- ソフトウェアエンジニアリングにおける多目的AIベースのヘルパー -- が出現した。彼らの迅速な開発は、開発者がどのようにそれを使っているのか、なぜ開発ワークフローの特定の部分でそれを使用していないのか、何を改善する必要があるのかをよりよく理解する必要がある。本研究では,AIアシスタントの利用状況に関する大規模調査を行い,特定のソフトウェア開発活動とステージに着目した。我々は5つの幅広い活動について481人のプログラマの意見を集めた。 (a)新機能の実装 b) テストを書くこと (c)バグトリアージ (d)リファクタリング,及び (e)自然言語のアーティファクトや個々のステージを書くこと。その結果,AIアシスタントの利用状況は,活動やステージによって異なることがわかった。例えば、開発者は、テストや自然言語のアーティファクトを最も楽しいアクティビティとして記述し、最も多くを委譲したいと考えています。これは、現在開発者を支援する機能にとって、良い焦点になるかもしれません。開発者がアシスタントを使用しない理由については、信頼や企業の方針といった一般的なことに加えて、さらなる研究のガイドとなる固定可能な問題、例えば、プロジェクトサイズのコンテキストの欠如、アシスタントに対する認識の欠如などがあります。私たちは、ユーザーが実際にAIアシスタントを必要としている場所について、アクティブな研究を行うために、包括的で具体的な結果が特に必要であると考えています。 The last several years saw the emergence of AI assistants for code -- multi-purpose AI-based helpers in software engineering. Their quick development makes it necessary to better understand how specifically developers are using them, why they are not using them in certain parts of their development workflow, and what needs to be improved. In this work, we carried out a large-scale survey aimed at how AI assistants are used, focusing on specific software development activities and stages. We collected opinions of 481 programmers on five broad activities: (a) implementing new features, (b) writing tests, (c) bug triaging, (d) refactoring, and (e) writing natural-language artifacts, as well as their individual stages. Our results show that usage of AI assistants varies depending on activity and stage. For instance, developers find writing tests and natural-language artifacts to be the least enjoyable activities and want to delegate them the most, currently using AI assistants to generate tests and test data, as well as generating comments and docstrings most of all. 
This can be a good focus for features aimed at helping developers right now. As for why developers do not use assistants, in addition to general concerns such as trust and company policies, there are fixable issues that can guide further research, e.g., the lack of project-size context and a lack of awareness about assistants. We believe that our comprehensive and specific results are especially needed now to steer active research toward where users actually need AI assistants.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# コンフォーマル化された遠隔操作:人間の入力を高次元ロボット行動に忠実にマッピングする Conformalized Teleoperation: Confidently Mapping Human Inputs to High-Dimensional Robot Actions ( http://arxiv.org/abs/2406.07767v1 ) ライセンス: Link先を確認	Michelle Zhao, Reid Simmons, Henny Admoni, Andrea Bajcsy,	(参考訳) 補助ロボットアームは、人間の遠隔操作者がジョイスティックのように低次元の入力で制御できるよりも、自由度が高いことが多い。この課題を克服するために、既存のアプローチでは、低次元の人間の入力から高次元のロボット動作へのマッピングを学ぶために、データ駆動方式を使用している。しかし、そのようなブラックボックスマッピングが低次元入力からユーザの意図した高次元動作を確実に推測できるかどうかを判断することは、未解決の問題である。我々のキーとなる考え方は、訓練時に補助写像を適用して、高次元のアクション量子化を付加的に推定し、厳密な不確実性定量法によってこれらの量子化を校正することである。具体的には、時間とともに間隔を調整し、マッピングの実行時の不確実性境界を減らし、マッピングが常に誤予測した場合のバウンダリを増大させる適応整合予測を利用する。さらに,不確実なユーザ入力やロボットの状態を検出する不確かさに基づくメカニズムを提案する。補助カップの把握とゴールリーチを含む2次元補助ナビゲーションタスクと2つの7DOF Kinova Jacoタスクにおける提案手法の有効性を評価した。本研究は, 適応型補助遠隔操作が, 多様な嗜好によって引き起こされ, 地図の訓練データセットにおける低精度軌跡によって引き起こされる高い不確実性を検出する(しかし, 区別はしない)ことを実証した。全体として、この作業は、ロボットが自身の不確実性を定量化し、必要に応じて積極的に介入を求めることを可能にするための重要なステップだと考えています。 Assistive robotic arms often have more degrees-of-freedom than a human teleoperator can control with a low-dimensional input, like a joystick. To overcome this challenge, existing approaches use data-driven methods to learn a mapping from low-dimensional human inputs to high-dimensional robot actions. However, determining if such a black-box mapping can confidently infer a user's intended high-dimensional action from low-dimensional inputs remains an open problem. Our key idea is to adapt the assistive map at training time to additionally estimate high-dimensional action quantiles, and then calibrate these quantiles via rigorous uncertainty quantification methods. Specifically, we leverage adaptive conformal prediction which adjusts the intervals over time, reducing the uncertainty bounds when the mapping is performant and increasing the bounds when the mapping consistently mis-predicts. Furthermore, we propose an uncertainty-interval-based mechanism for detecting high-uncertainty user inputs and robot states. 
We evaluate the efficacy of our proposed approach in a 2D assistive navigation task and two 7DOF Kinova Jaco tasks involving assistive cup grasping and goal reaching. Our findings demonstrate that conformalized assistive teleoperation manages to detect (but not differentiate between) high uncertainty induced by diverse preferences and induced by low-precision trajectories in the mapping's training dataset. On the whole, we see this work as a key step towards enabling robots to quantify their own uncertainty and proactively seek intervention when needed.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
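As a rough illustration of the adaptive conformal prediction step described in the abstract above, the following minimal Python sketch shows the standard online update: the miscoverage level is lowered after a miss (widening the next interval) and raised after a hit (narrowing it). It is not the authors' implementation. The function names `aci_update` and `conformal_halfwidth`, the step size `gamma`, and the use of a plain list of past nonconformity scores are all assumptions made for illustration.

```python
import math


def aci_update(alpha_t, target_alpha, covered, gamma=0.05):
    """One adaptive conformal step (hypothetical helper, not from the paper).

    alpha_t      -- current working miscoverage level
    target_alpha -- desired long-run miscoverage level
    covered      -- True if the last interval contained the true value
    gamma        -- step size (assumed value)
    """
    err = 0.0 if covered else 1.0
    # A miss (err = 1) lowers alpha, so the next interval gets wider;
    # a hit (err = 0) raises alpha, so the next interval gets narrower.
    return alpha_t + gamma * (target_alpha - err)


def conformal_halfwidth(scores, alpha_t):
    """Interval half-width as a conformal quantile of past scores."""
    if alpha_t <= 0.0:
        return math.inf  # no miscoverage allowed: the interval is unbounded
    if alpha_t >= 1.0:
        return 0.0       # any miscoverage allowed: the interval collapses
    ranked = sorted(scores)
    n = len(ranked)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1.0 - alpha_t))
    if k > n:
        return math.inf  # too few scores to certify this level
    return ranked[k - 1]
```

In a teleoperation loop one would, under these assumptions, call `conformal_halfwidth` before each prediction and `aci_update` after the true action is observed. A half-width that keeps growing is the kind of signal the abstract describes for flagging high-uncertainty inputs.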
# リアルタイム3次元知覚とベイジアンペイオフ推定を用いたパーソナライズされた商品品揃え Personalized Product Assortment with Real-time 3D Perception and Bayesian Payoff Estimation ( http://arxiv.org/abs/2406.07769v1 ) ライセンス: Link先を確認	Porter Jenkins, Michael Selander, J. Stockton Jenkins, Andrew Merrill, Kyle Armstrong,	(参考訳) 品揃えの選択は、物理的な小売業者にとって重要な課題だ。在庫と買い物客の好みを効果的に合わせることは、売上を増やし、欠品を減らせる。しかし、現実の環境では、品揃えの候補が組合せ爆発を起こすため、この問題は困難である。消費者の嗜好は、通常、空間と時間にわたって異質であり、在庫と嗜好の整合を困難にしている。さらに、既存の戦略は、集約されがちで解像度が低く、レイテンシの高いシンジケートデータに依存している。これらの課題を解決するために、リアルタイムレコメンデーションシステムを導入します。本システムは,3次元コンピュータビジョンの最近の進歩を,認識と自動的,きめ細かな販売推定に活用する。これらの知覚的コンポーネントはネットワークの端で動作し、リアルタイムの報酬信号を促進する。さらに,3次元LIDARデータからのノイズを含む推定値を考慮したベイズペイオフモデルを構築した。我々は,異種消費者の嗜好に適応するための空間クラスタリングと,組合せ探索問題に対処するためのグラフベースの候補生成アルゴリズムを利用する。飲料製品を用いた6～8週間のA/Bテストを実店舗で2回実施し,それぞれ35%,27%の売り上げ増を示した。最後に、導入したシステムを28週間にわたり観察研究で監視し、売上の9.4%の増加を示した。 Product assortment selection is a critical challenge facing physical retailers. Effectively aligning inventory with the preferences of shoppers can increase sales and decrease out-of-stocks. However, in real-world settings the problem is challenging due to the combinatorial explosion of product assortment possibilities. Consumer preferences are typically heterogeneous across space and time, making inventory-preference alignment challenging. Additionally, existing strategies rely on syndicated data, which tends to be aggregated, low-resolution, and subject to high latency. To solve these challenges we introduce a real-time recommendation system. Our system utilizes recent advances in 3D computer vision for perception and automatic, fine-grained sales estimation. These perceptual components run on the edge of the network and facilitate real-time reward signals. Additionally, we develop a Bayesian payoff model to account for noisy estimates from 3D LIDAR data. 
We rely on spatial clustering to allow the system to adapt to heterogeneous consumer preferences, and a graph-based candidate generation algorithm to address the combinatorial search problem. We test our system in real-world stores across two 6-8 week A/B tests with beverage products and demonstrate a 35% and 27% increase in sales, respectively. Finally, we monitor the deployed system for a period of 28 weeks with an observational study and show a 9.4% increase in sales.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# DualBind:タンパク質-リガンド結合親和性予測のためのデュアルロスフレームワーク DualBind: A Dual-Loss Framework for Protein-Ligand Binding Affinity Prediction ( http://arxiv.org/abs/2406.07770v1 ) ライセンス: Link先を確認	Meng Liu, Saee Gopal Paliwal,	(参考訳) タンパク質-リガンド結合親和性の正確な予測は、薬物開発に不可欠である。機械学習の最近の進歩は、この課題に対して有望な結果を示している。しかし、これらの手法は一般にラベル付きデータに大きく依存しており、それは少ないか信頼できないか、またはボルツマンが分散したデータのように実際には当てはまらない仮定に依存している。本稿では、教師付き平均二乗誤差(MSE)と教師なし復調スコアマッチング(DSM)を統合して、結合エネルギー関数を正確に学習する新しいフレームワークであるDualBindを提案する。 DualBindは、より正確な絶対親和性予測を提供することでDSMのみのモデルの限界に対処するだけでなく、一般化性を改善し、MSEのみのモデルと比較してラベル付きデータへの依存を減らす。実験の結果,DualBindは結合親和性の予測に優れ,ラベル付きデータとラベルなしデータの両方を有効利用して性能を向上させることができることがわかった。 Accurate prediction of protein-ligand binding affinities is crucial for drug development. Recent advances in machine learning show promising results on this task. However, these methods typically rely heavily on labeled data, which can be scarce or unreliable, or they rely on assumptions like Boltzmann-distributed data that may not hold true in practice. Here, we present DualBind, a novel framework that integrates supervised mean squared error (MSE) with unsupervised denoising score matching (DSM) to accurately learn the binding energy function. DualBind not only addresses the limitations of DSM-only models by providing more accurate absolute affinity predictions but also improves generalizability and reduces reliance on labeled data compared to MSE-only models. Our experimental results demonstrate that DualBind excels in predicting binding affinities and can effectively utilize both labeled and unlabeled data to enhance performance.	翻訳日:2024-06-13 20:56:21 公開日:2024-06-11
# 動的光ファイバー伝送行列のコンパクト潜在空間モデリングのための自己アテンションに基づく非線形基底変換 Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices ( http://arxiv.org/abs/2406.07775v1 ) ライセンス: Link先を確認	Yijie Zheng, Robert J. Kilpatrick, David B. Phillips, George S. D. Gordon,	(参考訳) マルチモード光ファイバー(英: Multimode optical fibres)は、光を効率的に輸送するガラスの毛細いストランドである。彼らは、体内の奥深くで前例のないサブセル画像の解像度を提供する次世代の医療内視鏡を約束する。しかし、そのようなファイバーに光を閉じ込めることによって、画像は本質的にトランジット中にスクランブルされる。従来、このスクランブルは特定の繊維がどのように光を散乱するかを事前に計算し、ファイバーの物理モデルを表す定常線形行列方程式を解くことで補償されてきた。しかし、技術が現実世界の展開に向かって発展するにつれて、運動や温度変化などの要因による光に対するファイバーの効果を表すマトリックスの動的変化や、体内でのファイバー先端の到達不能に起因する非線形性を考慮する必要がある。このような複雑で動的で非線形な振る舞いはニューラルネットワークによる近似に適しているが、ほとんどの画像再構成ネットワークは、隣接するピクセル間の強い相関を仮定する畳み込み層に依存している。我々は、自己アテンション層を用いて、様々なファイバー行列の座標表現を、さらなる処理に適したコンパクトで低次元の表現を許容する基底に動的に変換する新しい概念を導入する。本手法の有効性を,多種多様なファイバー・マトリックス・データセットに示す。また,本モデルでは,0.01～0.11の相乗比,pの相乗比で,変換基部の繊維基の疎度を有意に向上させることを示した。さらに、変換された表現は、元の行列を10%の再構成誤差で再構成することを許容し、その可逆性を証明している。 Multimode optical fibres are hair-thin strands of glass that efficiently transport light. They promise next-generation medical endoscopes that provide unprecedented sub-cellular image resolution deep inside the body. However, confining light to such fibres means that images are inherently scrambled in transit. Conventionally, this scrambling has been compensated by pre-calibrating how a specific fibre scrambles light and solving a stationary linear matrix equation that represents a physical model of the fibre. However, as the technology develops towards real-world deployment, the unscrambling process must account for dynamic changes in the matrix representing the fibre's effect on light, due to factors such as movement and temperature shifts, and non-linearities resulting from the inaccessibility of the fibre tip when inside the body. 
Such complex, dynamic and nonlinear behaviour is well-suited to approximation by neural networks, but most leading image reconstruction networks rely on convolutional layers, which assume strong correlations between adjacent pixels, a strong inductive bias that is inappropriate for fibre matrices which may be expressed in a range of arbitrary coordinate representations with long-range correlations. We introduce a new concept that uses self-attention layers to dynamically transform the coordinate representations of varying fibre matrices to a basis that admits compact, low-dimensional representations suitable for further processing. We demonstrate the effectiveness of this approach on diverse fibre matrix datasets. We show our models significantly improve the sparsity of fibre bases in their transformed bases with a participation ratio, p, as a measure of sparsity, of between 0.01 and 0.11. Further, we show that these transformed representations admit reconstruction of the original matrices with < 10% reconstruction error, demonstrating the invertibility.	翻訳日:2024-06-13 20:46:21 公開日:2024-06-11
# アルツハイマー病の進行予測のための解釈可能性と説明可能性の統合 Unifying Interpretability and Explainability for Alzheimer's Disease Progression Prediction ( http://arxiv.org/abs/2406.07777v1 ) ライセンス: Link先を確認	Raja Farrukh Ali, Stephanie Milani, John Woods, Emmanuel Adenij, Ayesha Farooq, Clayton Mansel, Jeffrey Burns, William Hsu,	(参考訳) 強化学習(Reinforcement Learning, RL)は、ドメイン知識をモデル化できる独自の能力により、アルツハイマー病(AD)の進行予測において最近有望視されている。しかし、どのRLアルゴリズムがこの課題に適しているかは明らかでない。さらに、これらの手法は本質的には説明不可能であり、実際の臨床シナリオにおける適用性を制限している。私たちの仕事はこれらの2つの重要な問題に対処する。 ADの因果的、解釈可能なモデルを用いて、ベースライン(0年目)データのみから10年間にわたる脳の認知を予測する4つの現代RLアルゴリズムの性能を比較した。次に、SHAP (SHapley Additive exPlanations) を適用し、モデル内の各アルゴリズムによる決定について説明する。当社のアプローチは、解釈可能性と説明可能性を組み合わせることで、AD進行に影響を及ぼす重要な要因についての洞察を与え、グローバルな分析と患者個人レベルの分析の両方を提供する。以上の結果より,病状進行を十分にモデル化できるRL法は1つのみであり,事後説明からは,いずれの方法もアルツハイマー病の病理学的特徴の一つであるアミロイド蓄積の重要性を適切に捉えられていないことが示唆された。我々の研究は、予測精度と透明性を融合し、臨床医や研究者が情報に基づく医療上の意思決定のための疾患進行モデリングを強化することを支援することを目的としている。コードは https://github.com/rfali/xrlad で入手できる。 Reinforcement learning (RL) has recently shown promise in predicting Alzheimer's disease (AD) progression due to its unique ability to model domain knowledge. However, it is not clear which RL algorithms are well-suited for this task. Furthermore, these methods are not inherently explainable, limiting their applicability in real-world clinical scenarios. Our work addresses these two important questions. Using a causal, interpretable model of AD, we first compare the performance of four contemporary RL algorithms in predicting brain cognition over 10 years using only baseline (year 0) data. We then apply SHAP (SHapley Additive exPlanations) to explain the decisions made by each algorithm in the model. Our approach combines interpretability with explainability to provide insights into the key factors influencing AD progression, offering both global and individual, patient-level analysis. 
Our findings show that only one of the RL methods is able to satisfactorily model disease progression, but the post-hoc explanations indicate that all methods fail to properly capture the importance of amyloid accumulation, one of the pathological hallmarks of Alzheimer's disease. Our work aims to merge predictive accuracy with transparency, assisting clinicians and researchers in enhancing disease progression modeling for informed healthcare decisions. Code is available at https://github.com/rfali/xrlad.	翻訳日:2024-06-13 20:46:21 公開日:2024-06-11
# バイシミュレーション距離は最適輸送距離であり、効率的に計算できる Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently ( http://arxiv.org/abs/2406.04056v2 ) ライセンス: Link先を確認	Sergio Calo, Anders Jonsson, Gergely Neu, Ludovic Schwartz, Javier Segovia,	(参考訳) マルコフ連鎖間の最適な輸送距離を定式化するための新しい枠組みを提案する。これまで知られていた定式化は、連鎖によって誘導される同時分布全体の間のカップリングを研究し、適切に定義されたマルコフ決定過程における動的計画法(DP)への還元によって解を導出していた。しかし、この定式化は、関連するDP演算子を計算する際に静的な最適輸送問題を完全に解く必要があり、かつ最適化全体でこの演算子を何度も適用する必要があるため、これまでは特に効率的なアルゴリズムを導いていない。本研究では, 割引占有カップリングと呼ばれる同時分布の平坦化バージョン間のカップリングを考慮し, この縮小された空間における線形計画(LP)の解法として, 同時分布の全空間における最適輸送距離を等価に定式化できることを示す。このLP定式化により、最適輸送理論の他の領域からいくつかのアルゴリズム的アイデアを移植することができる。具体的には,最適化問題にエントロピー正則化の適切な概念を導入し,Sinkhorn Value Iteration (SVI) と呼ぶSinkhornライクな手法を用いて,最適な輸送距離を直接計算することができる。本手法は,通常のSinkhornを各状態対で実行するのと本質的に同じ計算コストで,最適カップリングに迅速に収束することを理論的・実験的に示す。その過程で, この最適輸送距離はマルコフ連鎖間のバイシミュレーション距離の一般的な概念と正確に一致することを指摘する。したがって, 本結果はそのような距離の計算にも適用でき, 実際, 提案アルゴリズムはこの目的のためにこれまで開発された最良の既知手法よりもはるかに効率的であることが判明した。 We propose a new framework for formulating optimal transport distances between Markov chains. Previously known formulations studied couplings between the entire joint distribution induced by the chains, and derived solutions via a reduction to dynamic programming (DP) in an appropriately defined Markov decision process. This formulation has, however, not led to particularly efficient algorithms so far, since computing the associated DP operators requires fully solving a static optimal transport problem, and these operators need to be applied numerous times during the overall optimization process. In this work, we develop an alternative perspective by considering couplings between a flattened version of the joint distributions that we call discounted occupancy couplings, and show that calculating optimal transport distances in the full space of joint distributions can be equivalently formulated as solving a linear program (LP) in this reduced space. 
This LP formulation allows us to port several algorithmic ideas from other areas of optimal transport theory. In particular, our formulation makes it possible to introduce an appropriate notion of entropy regularization into the optimization problem, which in turn enables us to directly calculate optimal transport distances via a Sinkhorn-like method we call Sinkhorn Value Iteration (SVI). We show both theoretically and empirically that this method converges quickly to an optimal coupling, essentially at the same computational cost of running vanilla Sinkhorn in each pair of states. Along the way, we point out that our optimal transport distance exactly matches the common notion of bisimulation metrics between Markov chains, and thus our results also apply to computing such metrics, and in fact our algorithm turns out to be significantly more efficient than the best known methods developed so far for this purpose.	翻訳日:2024-06-13 11:28:49 公開日:2024-06-11
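The Sinkhorn Value Iteration method mentioned above builds on the classical Sinkhorn iteration for entropy-regularized optimal transport. The sketch below shows only that classical building block for two small discrete distributions, not SVI itself and not the authors' code. The function name `sinkhorn_plan`, the regularization strength `eps`, the iteration count, and the toy cost matrix in the usage note are assumptions chosen for illustration.

```python
import math


def sinkhorn_plan(a, b, cost, eps=0.5, n_iters=500):
    """Entropy-regularized transport plan between histograms a and b.

    a, b  -- lists of non-negative weights, each summing to 1
    cost  -- cost[i][j] is the cost of moving mass from i to j
    eps   -- entropic regularization strength (assumed value)
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The plan is the kernel rescaled by the two scaling vectors.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

For example, `sinkhorn_plan([0.7, 0.3], [0.4, 0.6], [[0.0, 1.0], [1.0, 0.0]])` returns a 2x2 plan whose row sums approximate the first histogram and whose column sums match the second. Per the abstract, SVI has essentially the same per-iteration cost as running this kind of update for each pair of states.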
# パラメータアンダーレジームにおけるフェデレーション表現学習 Federated Representation Learning in the Under-Parameterized Regime ( http://arxiv.org/abs/2406.04596v3 ) ライセンス: Link先を確認	Renpu Liu, Cong Shen, Jing Yang,	(参考訳) フェデレーション表現学習(FRL)は、クライアントが共通の表現をトレーニングし、パーソナライズされた頭を維持しながら協調する、パーソナライズされたフェデレーション学習(FL)フレームワークである。しかし、既存の研究は主に過度にパラメータ化された体制に焦点を当てている。本稿では, フラックスモデルがすべての地中構造モデルの変動を表現するのに不十分な, パラメータ下条件下でのFRLについて検討する。我々は新しいFRLアルゴリズムFLUTEを提案し、パラメータ下状態における線形モデルに対する標本の複雑さと収束率を理論的に特徴づける。我々の知る限りでは、この方式で証明可能な性能保証を備えたFRLアルゴリズムは初めてである。 FLUTEは、データ非依存のランダム初期化と、不整合局所表現から大域的最適表現に代表される部分空間の蒸留を支援する、慎重に設計された目的関数を備えている。技術的には、FL解析による低ランク行列近似手法を橋渡しする。また、FLUTEを線形表現を超えて拡張する。実験により、FLUTEは、合成タスクと実世界のタスクの両方において、最先端のFRLソリューションよりも優れていることが示された。 Federated representation learning (FRL) is a popular personalized federated learning (FL) framework where clients work together to train a common representation while retaining their personalized heads. Existing studies, however, largely focus on the over-parameterized regime. In this paper, we make the initial efforts to investigate FRL in the under-parameterized regime, where the FL model is insufficient to express the variations in all ground-truth models. We propose a novel FRL algorithm FLUTE, and theoretically characterize its sample complexity and convergence rate for linear models in the under-parameterized regime. To the best of our knowledge, this is the first FRL algorithm with provable performance guarantees in this regime. FLUTE features a data-independent random initialization and a carefully designed objective function that aids the distillation of subspace spanned by the global optimal representation from the misaligned local representations. On the technical side, we bridge low-rank matrix approximation techniques with the FL analysis, which may be of broad interest. We also extend FLUTE beyond linear representations. Experimental results demonstrate that FLUTE outperforms state-of-the-art FRL solutions in both synthetic and real-world tasks.	
翻訳日:2024-06-13 11:28:49 公開日:2024-06-11
# ディスエンタングル表現学習によるグラフニューラルネットワークにおけるサイズ一般化の促進 Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning ( http://arxiv.org/abs/2406.04601v3 ) ライセンス: Link先を確認	Zheng Huang, Qihui Yang, Dawei Zhou, Yujun Yan,	(参考訳) ほとんどのグラフニューラルネットワーク(GNN)は、任意のサイズのグラフで操作できるが、その分類性能は、トレーニング中に遭遇したグラフよりも大きいグラフで低下することが多い。既存の手法では、グラフ表現からサイズ情報の除去が不十分であり、その結果、サブ最適性能とバックボーンモデルへの依存が生じる。そこで我々は,グラフ表現からサイズ因子を分離する新しい,モデルに依存しないフレームワークである DISGEN を提案する。 DISGENはサイズとタスク不変の拡張を採用し、デカップリングロスを導入し、隠れた表現における共有情報を最小化し、その効果を理論的に保証する。実験の結果, DISGENは実世界のデータセットにおいて最先端のモデルを最大6%上回っており, GNNのサイズ一般化性の向上に有効であることが示唆された。私たちのコードは https://github.com/GraphmindDartmouth/DISGEN で利用可能です。 Although most graph neural networks (GNNs) can operate on graphs of any size, their classification performance often declines on graphs larger than those encountered during training. Existing methods insufficiently address the removal of size information from graph representations, resulting in sub-optimal performance and reliance on backbone models. In response, we propose DISGEN, a novel and model-agnostic framework designed to disentangle size factors from graph representations. DISGEN employs size- and task-invariant augmentations and introduces a decoupling loss that minimizes shared information in hidden representations, with theoretical guarantees for its effectiveness. Our empirical results show that DISGEN outperforms the state-of-the-art models by up to 6% on real-world datasets, underscoring its effectiveness in enhancing the size generalizability of GNNs. Our codes are available at: https://github.com/GraphmindDartmouth/DISGEN.	翻訳日:2024-06-13 11:28:49 公開日:2024-06-11
# VTrans: 変分情報ボトルネックに基づくプルーニングによる変圧器圧縮の高速化 VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning ( http://arxiv.org/abs/2406.05276v2 ) ライセンス: Link先を確認	Oshin Dutta, Ritvik Gupta, Sumeet Agarwal,	(参考訳) 近年,資源制約のあるデバイスに対して,大規模な事前学習型トランスフォーマーモデルを圧縮することの重要性が高まっている。しかし、伝統的なプルーニング法は、しばしば埋め込み層を無傷で残し、過パラメータ化のモデルに繋がる。さらに、プルーニングされたモデルのパフォーマンスを維持するために、大規模なデータセットによる広範な圧縮時間が必要となる。これらの課題に対処するために,変分情報ボトルネック(VIB)の原理で導かれる反復的プルーニングフレームワークであるVTransを提案する。提案手法は,VIBで学習したマスクを用いて,埋め込み,アテンションヘッド,層など,すべての構造成分を圧縮する。このアプローチは各レイヤに必須の重みしか保持せず、特定のモデルサイズや計算上の制約に準拠することを保証する。特に,本手法は,タスク非依存とタスク特化の両面において,従来の最先端手法よりも最大70%高い圧縮を実現している。さらに高速な派生手法として,データの3%のみを用いるFast-VTransと,VIBマスクのみを微調整する時間効率の高いFaster-VTransを提案する。後者は従来手法と比べて最小限の性能低下で圧縮を最大25倍高速化する。 BERT, RoBERTa, GPT-2モデルに対する広範囲な実験により, 本法の有効性が確認された。さらに,LLaMA-2-7Bのような大型モデルの圧縮におけるスケーラビリティを実証し,従来のプルーニング法と比較して優れた性能を実現する。さらに、注意に基づく探索を用いて、モデルの冗長性を質的に評価し、アプローチの効率性を解釈する。特に,本手法は,プルーニング前のモデルで特殊トークンや現在のトークンに強く注意を向けるヘッドを最優先のプルーニング候補とみなし,保持されたヘッドはタスク上重要なキーワードにより多く注意を向けることが観察された。 In recent years, there has been a growing emphasis on compressing large pre-trained transformer models for resource-constrained devices. However, traditional pruning methods often leave the embedding layer untouched, leading to model over-parameterization. Additionally, they require extensive compression time with large datasets to maintain performance in pruned models. To address these challenges, we propose VTrans, an iterative pruning framework guided by the Variational Information Bottleneck (VIB) principle. Our method compresses all structural components, including embeddings, attention heads, and layers using VIB-trained masks. This approach retains only essential weights in each layer, ensuring compliance with specified model size or computational constraints. Notably, our method achieves up to 70% more compression than prior state-of-the-art approaches, both task-agnostic and task-specific. 
We further propose faster variants of our method: Fast-VTrans utilizing only 3% of the data and Faster-VTrans, a time-efficient alternative that involves exclusive finetuning of VIB masks, accelerating compression by up to 25 times with minimal performance loss compared to previous methods. Extensive experiments on BERT, RoBERTa, and GPT-2 models substantiate the efficacy of our method. Moreover, our method demonstrates scalability in compressing large models such as LLaMA-2-7B, achieving superior performance compared to previous pruning methods. Additionally, we use attention-based probing to qualitatively assess model redundancy and interpret the efficiency of our approach. Notably, our method considers heads with high attention to special and current tokens in the unpruned model as foremost candidates for pruning, while retained heads are observed to attend more to task-critical keywords.	翻訳日:2024-06-13 11:28:49 公開日:2024-06-11
# 表現編集による大規模言語モデルの調整:制御の観点から Aligning Large Language Models with Representation Editing: A Control Perspective ( http://arxiv.org/abs/2406.05954v2 ) ライセンス: Link先を確認	Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang,	(参考訳) 大規模言語モデル(LLM)を人間の目的に合わせることは、現実世界のアプリケーションには不可欠である。しかし、アライメントのための微調整 LLM は不安定なトレーニングに悩まされ、かなりの計算資源を必要とする。プロンプトやガイドデコーディングのようなテスト時のアライメント技術は、基礎となるモデルを変更せず、その性能は元のモデルの性能に依存している。これらの課題に対処するために,表現編集によるLLMの整合性を提案する。本手法の核となるのは,事前学習した自己回帰型LDMを離散時間確率力学系として見ることである。この言語力学系の状態空間に外部制御信号を導入する。我々はベルマン方程式に従って隠蔽状態の値関数を直接訓練し、勾配に基づく最適化によりテスト時に最適な制御信号が得られるようにした。実験の結果,本手法は既存のテスト時間アライメント手法より優れており,微調整法に比べて資源の削減が著しく少ないことがわかった。 Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabilities. To address these challenges, we propose aligning LLMs through representation editing. The core of our method is to view a pre-trained autoregressive LLM as a discrete-time stochastic dynamical system. To achieve alignment for specific objectives, we introduce external control signals into the state space of this language dynamical system. We train a value function directly on the hidden states according to the Bellman equation, enabling gradient-based optimization to obtain the optimal control signals at test time. Our experiments demonstrate that our method outperforms existing test-time alignment techniques while requiring significantly fewer resources compared to fine-tuning methods.	翻訳日:2024-06-13 11:18:52 公開日:2024-06-11
# EARS:音声強調と残響をベンチマークした無響全帯域音声データセット EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation ( http://arxiv.org/abs/2406.06185v2 ) ライセンス: Link先を確認	Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann,	(参考訳) 我々は、さまざまな背景から107人の話者からなる高品質な音声データセットEARS(Expressive Anechoic Recordings of Speech)データセットをリリースした。データセットには、感情的なスピーチ、異なる読み方、非言語音、会話の自由なスピーチなど、幅広い種類の話し方が含まれている。提案手法は,データセット上での音声強調とデバーベレーションのための様々な手法をベンチマークし,その性能を測定値を用いて評価する。また、音声強調タスクの参加者20名による聴取テストを行い、生成方法が好まれる。我々は、アップロードされたデータのオンライン自動評価を可能にするブラインドテストセットを導入する。データセットダウンロードリンクと自動評価サーバはオンラインで見つけることができる。 We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling in 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, where a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. Dataset download links and automatic evaluation server can be found online.	翻訳日:2024-06-13 11:18:52 公開日:2024-06-11
# 自然言語プロンプトによるテキスト音声の感情制御 Controlling Emotion in Text-to-Speech with Natural Language Prompts ( http://arxiv.org/abs/2406.06406v2 ) ライセンス: Link先を確認	Thomas Bott, Florian Lux, Ngoc Thang Vu,	(参考訳) 近年、自然言語の直感的な使用により、プロンプトは、生成機械学習モデルの出力を制御するための標準的な方法の1つになってきた。そこで本研究では,感情に富んだテキストからの埋め込みを前提としたシステムを提案する。これにより、変換器をベースとしたアーキテクチャにおいて、話者と即時埋め込みの合同表現がいくつかの点で統合される。提案手法は感情的な音声とテキストを融合したデータセットに基づいて訓練され,モデルの一般化能力を高めるため,各トレーニングイテレーションのプロンプトが変化する。主観的および主観的評価の結果は、条件付き合成システムの音声へのプロンプトに存在する感情を正確に伝達する能力を示している。同時に、話者のアイデンティティの正確なトラクタビリティと、全体的な高い音声品質とインテリジェンスを維持する。 In recent years, prompting has quickly become one of the standard ways of steering the outputs of generative machine learning models, due to its intuitive use of natural language. In this work, we propose a system conditioned on embeddings derived from an emotionally rich text that serves as prompt. Thereby, a joint representation of speaker and prompt embeddings is integrated at several points within a transformer-based architecture. Our approach is trained on merged emotional speech and text datasets and varies prompts in each training iteration to increase the generalization capabilities of the model. Objective and subjective evaluation results demonstrate the ability of the conditioned synthesis system to accurately transfer the emotions present in a prompt to speech. At the same time, precise tractability of speaker identities as well as overall high speech quality and intelligibility are maintained.	翻訳日:2024-06-13 11:18:52 公開日:2024-06-11
# 時間ビンスピン絡み合いプロトコルにおけるスペクトル拡散の強調 Rephasing spectral diffusion in time-bin spin-spin entanglement protocols ( http://arxiv.org/abs/2406.06497v2 ) ライセンス: Link先を確認	Mehmet T. Uysal, Jeff D. Thompson,	(参考訳) 高忠実度スピンスピン絡み合わせの生成は、長距離にわたって量子情報の分散を行うための量子リピータネットワークの重要な課題である。固体ベースのスピン光子インタフェースは量子ネットワークのノードを実現するための候補となるが、しばしば光転移のスペクトル拡散によって制限され、絡み合った状態の位相誤差が生じる。ここでは、励起状態のエミッタをシェルプして未知の位相を再焦点化することにより、絡み合った状態が生じた後、準定常周波数変動から位相誤差を補正する手法を提案する。準定常周波数変動の場合、その忠実度はシェルビングに使用される励起状態の寿命によってのみ決定されるため、特にスペクトル拡散が相関する長寿命シェルビング状態のシステムに適している。このようなシェルヴィング状態は、ケイ素またはSiCの希土類エミッタや色中心などのクラマースダブルト系において、強い周波数依存性のパーセル増強を伴うナノフォトニックキャビティと相互作用する。このプロトコルは、絡み合いの発生率を低下させることなく、高忠実な絡み合いのスピンペアを生成するために使用できる。 Generating high fidelity spin-spin entanglement is an essential task of quantum repeater networks for the distribution of quantum information across long distances. Solid-state based spin-photon interfaces are promising candidates to realize nodes of a quantum network, but are often limited by spectral diffusion of the optical transition, which results in phase errors on the entangled states. Here, we introduce a method to correct phase errors from quasi-static frequency fluctuations after the entangled state is generated, by shelving the emitters in the excited state to refocus the unknown phase. For quasi-static frequency fluctuations, the fidelity is determined only by the lifetime of the excited state used for shelving, making it particularly suitable for systems with a long-lived shelving state with correlated spectral diffusion. Such a shelving state may be found in Kramers doublet systems such as rare-earth emitters and color centers in Si or SiC interfaced with nanophotonic cavities with a strongly frequency-dependent Purcell enhancement. The protocol can be used to generate high-fidelity entangled spin pairs without reducing the rate of entanglement generation.	翻訳日:2024-06-13 11:18:52 公開日:2024-06-11
# テレコムバンドにおける単一Er$^{3+}$イオンのスピン光子エンタングルメント Spin-photon entanglement of a single Er$^{3+}$ ion in the telecom band ( http://arxiv.org/abs/2406.06515v2 ) ライセンス: Link先を確認	Mehmet T. Uysal, Łukasz Dusanowski, Haitong Xu, Sebastian P. Horvath, Salim Ourari, Robert J. Cava, Nathalie P. de Leon, Jeff D. Thompson,	(参考訳) 量子リピータを用いた長距離量子通信は、セキュアな通信、分散量子コンピューティング、量子エンハンスドセンシングおよびメトロジーを可能にする技術である。量子リピータの構成要素として、スピン光子絡み合いは原子と固体の量子ビットの両方で実証されている。しかし、以前に実証された長いスピンコヒーレンスを持つ量子ビットは、長距離通信に必要な低損失の通信帯域に直接光子を放出しない。ここでは, シリコンナノフォトニック回路に集積された固体結晶中の1つのEr$^{3+}$イオンを用いたスピン光子エンタングルメントを実演する。テレコムバンドへの直接放出は、15.6kmの光ファイバー上で1.48Hzの絡み合い速度を可能にし、忠実度は73(3)%である。これにより、スケーラブルなナノフォトニクスデバイスと多くのスペクトル多重Er$^{3+}$イオンに基づく大規模量子ネットワークへの扉が開く。 Long-distance quantum communication using quantum repeaters is an enabling technology for secure communication, distributed quantum computing and quantum-enhanced sensing and metrology. As a building block of quantum repeaters, spin-photon entanglement has been demonstrated with both atomic and solid-state qubits. However, previously demonstrated qubits with long spin coherence do not directly emit photons into the low-loss telecom band that is needed for long-distance communication. Here, we demonstrate spin-photon entanglement using a single Er$^{3+}$ ion in a solid-state crystal, integrated into a silicon nanophotonic circuit. Direct emission into the telecom band enables an entanglement rate of 1.48 Hz over 15.6 km of optical fiber, with a fidelity of 73(3)%. This opens the door to large-scale quantum networks based on scalable nanophotonic devices and many spectrally multiplexed Er$^{3+}$ ions.	翻訳日:2024-06-13 11:18:52 公開日:2024-06-11
# $k$-NNレグレッションにおける$k$の選択のための最小不一致原理戦略 Minimum discrepancy principle strategy for choosing $k$ in $k$-NN regression ( http://arxiv.org/abs/2008.08718v6 ) ライセンス: Link先を確認	Yaroslav Averyanov, Alain Celisse,	(参考訳) ホールドアウトデータを使わずに、$k$-NN回帰推定器でハイパーパラメータ$k$を選択するための新しいデータ駆動戦略を提案する。我々は,ハイパーパラメータを反復的手順 ($k$以上) として選択する問題を扱い,早期停止の考え方と最小差分原理に基づく実践的戦略を用いて提案する。このモデル選択戦略は、いくつかの滑らかな函数クラス、例えば有界領域上のリプシッツ函数クラスに対する共変量に対する固定設計の仮定の下で、ミニマックス最適であることが証明されている。この手法は、ホールドアウト法や5倍のクロスバリデーション、AIC基準など、他のモデル選択手法と比較して、人工的および実世界のデータセットの統計性能を向上することが多い。戦略の新規性は、モデル選択手順の計算時間を減少させ、結果の推定器の統計的(最小限)最適性を保存することから生じる。より正確には、サイズ$n$のサンプルとして$k$を$\left\{ 1, \ldots, n \right\}$と$\left\{ f^1, \ldots, f^n \right\}$の中から選ぶ必要があるとすると、最小の離散性原理は回帰関数の推定器である。 We present a novel data-driven strategy to choose the hyperparameter $k$ in the $k$-NN regression estimator without using any hold-out data. We treat the problem of choosing the hyperparameter as an iterative procedure (over $k$) and propose using an easily implemented in practice strategy based on the idea of early stopping and the minimum discrepancy principle. This model selection strategy is proven to be minimax-optimal, under the fixed-design assumption on covariates, over some smoothness function classes, for instance, the Lipschitz functions class on a bounded domain. The novel method often improves statistical performance on artificial and real-world data sets in comparison to other model selection strategies, such as the Hold-out method, 5-fold cross-validation, and AIC criterion. The novelty of the strategy comes from reducing the computational time of the model selection procedure while preserving the statistical (minimax) optimality of the resulting estimator. 
More precisely, given a sample of size $n$, if one should choose $k$ among $\left\{ 1, \ldots, n \right\}$, and $\left\{ f^1, \ldots, f^n \right\}$ are the estimators of the regression function, the minimum discrepancy principle requires calculation of a fraction of the estimators, while this is not the case for the generalized cross-validation, Akaike's AIC criteria or Lepskii principle.	翻訳日:2024-06-13 01:45:51 公開日:2024-06-11
# ビデオ質問応答のためのオープンエンド型マルチモーダルリレーショナル推論 Open-Ended Multi-Modal Relational Reasoning for Video Question Answering ( http://arxiv.org/abs/2012.00822v4 ) ライセンス: Link先を確認	Haozheng Luo, Ruiyang Qin, Chenwei Xu, Guo Ye, Zening Luo,	(参考訳) 本稿では,外部環境を分析し,参加者の質問に答えるためのロボットエージェントを提案する。このエージェントの主な焦点は、ビデオベースのシーン内で言語ベースのインタラクションを使用する個人を支援することである。提案手法は,ロボットエージェント内にビデオ認識技術と自然言語処理モデルを統合する。本研究では,ロボットエージェントと参加者間の関連する問題を調べることによって,人間とロボットの相互作用に影響を及ぼす重要な要因について検討する。実験結果から, 信頼と相互作用効率の間に正の関係があることが明らかとなった。さらに,本モデルでは,他のベンチマーク手法と比較して2%から3%の性能向上を示す。 In this paper, we introduce a robotic agent specifically designed to analyze external environments and address participants' questions. The primary focus of this agent is to assist individuals using language-based interactions within video-based scenes. Our proposed method integrates video recognition technology and natural language processing models within the robotic agent. We investigate the crucial factors affecting human-robot interactions by examining pertinent issues arising between participants and robot agents. Methodologically, our experimental findings reveal a positive relationship between trust and interaction efficiency. Furthermore, our model demonstrates a 2% to 3% performance enhancement in comparison to other benchmark methods.	翻訳日:2024-06-13 01:45:51 公開日:2024-06-11
# 深層学習に基づく航空機検出のためのベンチマークデータセット:HRPlanes A benchmark dataset for deep learning-based airplane detection: HRPlanes ( http://arxiv.org/abs/2204.10959v2 ) ライセンス: Link先を確認	Tolga Bakirman, Elif Sertel,	(参考訳) 衛星画像からの航空機検出は、画像内の複雑な背景や、センサーの幾何学的条件と大気効果に起因するデータ取得条件の違いのため、難しい課題である。深層学習は、航空機の自動検出のための信頼性の高い高精度な手法を提供するが、有望な結果を得るためには大量のトレーニングデータが必要である。本研究では、Google Earth(GE)の画像を用い、画像上の各航空機のバウンディングボックスにラベル付けすることで、High Resolution Planes (HRPlanes)と呼ばれる新しい航空機検出データセットを作成する。 HRPlanesは、様々な衛星から得られた多様な景観、季節、衛星の幾何学的条件を表すために、世界各地の複数の空港のGE画像を含む。我々は、YOLOv4とFaster R-CNNという広く使われている2つの物体検出手法を用いてデータセットを評価した。予備的な結果から、提案したデータセットは将来のアプリケーションにとって有用なデータソースおよびベンチマークデータセットとなりうることが示された。さらに、本研究で提案したアーキテクチャと結果は、航空機検出のための異なるデータセットやモデルの転移学習に利用することができる。 Airplane detection from satellite imagery is a challenging task due to the complex backgrounds in the images and differences in data acquisition conditions caused by the sensor geometry and atmospheric effects. Deep learning methods provide reliable and accurate solutions for automatic detection of airplanes; however, a huge amount of training data is required to obtain promising results. In this study, we create a novel airplane detection dataset called High Resolution Planes (HRPlanes) by using images from Google Earth (GE) and labeling the bounding box of each plane on the images. HRPlanes includes GE images of several different airports across the world to represent a variety of landscape, seasonal and satellite geometry conditions obtained from different satellites. We evaluated our dataset with two widely used object detection methods, namely YOLOv4 and Faster R-CNN. Our preliminary results show that the proposed dataset can be a valuable data source and benchmark data set for future applications. Moreover, the proposed architectures and results of this study could be used for transfer learning of different datasets and models for airplane detection.	翻訳日:2024-06-13 01:45:51 公開日:2024-06-11
# 商用レーザーを用いたマクロな遅延選択量子消去器 A macroscopic delayed-choice quantum eraser using a commercial laser ( http://arxiv.org/abs/2205.14353v4 ) ライセンス: Link先を確認	Byoung S. Ham,	(参考訳) 量子力学の核心は、単一粒子の直交基底の間の量子重ね合わせである。量子力学の粒子的性質において、量子重ね合わせは、直交偏光基底のような相互排他的な性質の間の確率振幅で表される。遅延選択量子消去器は光子の性質を事後的に決定するものであり、因果関係の問題を提起する。過去数十年にわたり、量子消去器はほぼあらゆる種類の光子を用いて集中的に研究されてきた。ここでは、連続波レーザーを用いてマクロな遅延選択量子消去器を実験的に実証し、マクロな領域における量子重ね合わせについて論じる。このために、2つの偏光ビームスプリッターからなる非干渉型マッハ・ツェンダー干渉計を選択し、光の偏光基底を操作して、偏光基底射影により遅延選択方式で測定する。 The heart of quantum mechanics is quantum superposition between orthogonal bases of a single particle. In the particle nature of quantum mechanics, quantum superposition is represented by probability amplitudes between mutually exclusive natures such as orthogonal polarization bases. The delayed-choice quantum eraser is for the post-determination of the photon nature, raising the cause-effect relation issue. Over the last several decades, quantum erasers have been intensively studied using nearly all kinds of photons. Here, the macroscopic delayed-choice quantum eraser is experimentally demonstrated using a continuous wave laser and discussed for quantum superposition in a macroscopic regime. For this, a noninterfering Mach-Zehnder interferometer composed of two polarizing beam splitters is chosen to manipulate polarization bases of light and to measure them in a delayed-choice manner via polarization-basis projection.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# メトリジングフェアネス Metrizing Fairness ( http://arxiv.org/abs/2205.15049v5 ) ライセンス: Link先を確認	Yves Rychener, Bahar Taskesen, Daniel Kuhn,	(参考訳) 本研究では、2つの人口集団に属する個人に重大な影響を及ぼす教師付き学習問題について検討し、統計パリティ(SP)などのグループフェアネス基準に関して公平な予測器を求める。予測器は、2つの集団内の予測の分布がコルモゴロフ距離で近ければSPフェアであり、学習問題の目的関数においてこれら2つの分布の相違にペナルティを課すことで公平性が達成される。本稿では、厳密なSP制約が予測精度の向上を保証する条件を特定する。また、コルモゴロフ距離以外の積分確率計量(IPM)を用いて不公平さを測定することの概念的および計算的な利点を示す。概念的には、任意のIPMの生成元は効用関数の族として解釈でき、このIPMに関する不公平さは、2つの人口集団の個人の期待効用が乖離する場合に生じることを示す。また、不公平さが$\mathcal L^2$距離の2乗、または最大平均不一致の2乗で測定される場合、不公平性正則化予測損失は、トレーニングサンプルのランダムなミニバッチから構成される不偏勾配推定量を許容することを証明する。この場合、公平学習問題には効率的な確率的勾配降下(SGD)アルゴリズムを適用できる。合成データと実データに関する数値実験により、これらのSGDアルゴリズムは、精度と不公平さのトレードオフにおいて公平学習の最先端手法を上回り、場合によっては桁違いに高速であることが示される。 We study supervised learning problems that have significant effects on individuals from two demographic groups, and we seek predictors that are fair with respect to a group fairness criterion such as statistical parity (SP). A predictor is SP-fair if the distributions of predictions within the two groups are close in Kolmogorov distance, and fairness is achieved by penalizing the dissimilarity of these two distributions in the objective function of the learning problem. In this paper, we identify conditions under which hard SP constraints are guaranteed to improve predictive accuracy. We also showcase conceptual and computational benefits of measuring unfairness with integral probability metrics (IPMs) other than the Kolmogorov distance. Conceptually, we show that the generator of any IPM can be interpreted as a family of utility functions and that unfairness with respect to this IPM arises if individuals in the two demographic groups have diverging expected utilities. 
We also prove that the unfairness-regularized prediction loss admits unbiased gradient estimators, which are constructed from random mini-batches of training samples, if unfairness is measured by the squared $\mathcal L^2$-distance or by a squared maximum mean discrepancy. In this case, the fair learning problem is susceptible to efficient stochastic gradient descent (SGD) algorithms. Numerical experiments on synthetic and real data show that these SGD algorithms outperform state-of-the-art methods for fair learning in that they achieve superior accuracy-unfairness trade-offs -- sometimes orders of magnitude faster.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
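The mini-batch property mentioned in the abstract above can be illustrated with the standard unbiased U-statistic estimate of a squared maximum mean discrepancy between the predictions of two groups. This is a minimal sketch, not the authors' estimator: the Gaussian kernel, the `bandwidth` value, the function names, and the toy predictions are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D arrays of predictions."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))


def mmd2_unbiased(pred_a, pred_b, bandwidth=1.0):
    """Unbiased U-statistic estimate of the squared MMD between the
    prediction samples of group A and group B (each needs >= 2 points)."""
    m, n = len(pred_a), len(pred_b)
    k_aa = gaussian_kernel(pred_a, pred_a, bandwidth)
    k_bb = gaussian_kernel(pred_b, pred_b, bandwidth)
    k_ab = gaussian_kernel(pred_a, pred_b, bandwidth)
    # Within-group terms exclude the diagonal (i == j) to remove the bias.
    term_a = (k_aa.sum() - np.trace(k_aa)) / (m * (m - 1))
    term_b = (k_bb.sum() - np.trace(k_bb)) / (n * (n - 1))
    return term_a + term_b - 2.0 * k_ab.mean()


# Toy predictions: one pair of groups with matching predictions,
# one pair where the second group's predictions are shifted.
group_a = np.linspace(0.0, 1.0, 200)
penalty_same = mmd2_unbiased(group_a, group_a.copy())
penalty_shifted = mmd2_unbiased(group_a, group_a + 2.0)
print(penalty_same, penalty_shifted)
```

In a training loop, such a penalty would be evaluated on the predictions of a random mini-batch and added to the prediction loss with a weight. Since the estimate is unbiased, it can be slightly negative when the two groups have nearly identical predictions.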
# Muffliato: 分散最適化と平均化のためのピアツーピアプライバシの増幅 Muffliato: Peer-to-Peer Privacy Amplification for Decentralized Optimization and Averaging ( http://arxiv.org/abs/2206.05091v3 ) ライセンス: Link先を確認	Edwige Cyffers, Mathieu Even, Aurélien Bellet, Laurent Massoulié,	(参考訳) 分散最適化は、そのスケーラビリティと効率性のために、機械学習でますます人気がある。ネットワークグラフ内の隣人が送ったメッセージのみをノードが監視するので、直感的には、より優れたプライバシー保証を提供する必要がある。しかし、この利益を形式化し、定量化するのは難しい。既存の結果は通常、分散化の利点を見落としているローカル微分プライバシー(LDP)の保証に制限される。本研究では、ノード$u$からノード$v$へのプライバシリークが、グラフ内の相対的な位置に依存する可能性があるという事実を捉えた、LDPの緩和であるペアワイズネットワーク差分プライバシーを導入する。次に、固定およびランダムな通信グラフ上で、局所ノイズ注入と(単純またはランダム化された)ゴシップ平均化プロトコルの組み合わせを分析する。また、局所勾配降下ステップとゴシップ平均化を交互に交互に行う、微分プライベートな分散最適化アルゴリズムを導出する。我々のアルゴリズムは,グラフのノード間距離の関数としてプライバシ保証を増幅し,信頼されたキュレータのプライバシユーティリティトレードオフをグラフトポロジに明示的に依存する要因にマッチさせることを示した。最後に、合成および実世界のデータセットに関する実験で、プライバシーの向上について説明する。 Decentralized optimization is increasingly popular in machine learning for its scalability and efficiency. Intuitively, it should also provide better privacy guarantees, as nodes only observe the messages sent by their neighbors in the network graph. But formalizing and quantifying this gain is challenging: existing results are typically limited to Local Differential Privacy (LDP) guarantees that overlook the advantages of decentralization. In this work, we introduce pairwise network differential privacy, a relaxation of LDP that captures the fact that the privacy leakage from a node $u$ to a node $v$ may depend on their relative position in the graph. We then analyze the combination of local noise injection with (simple or randomized) gossip averaging protocols on fixed and random communication graphs. We also derive a differentially private decentralized optimization algorithm that alternates between local gradient descent steps and gossip averaging. 
Our results show that our algorithms amplify privacy guarantees as a function of the distance between nodes in the graph, matching the privacy-utility trade-off of the trusted curator, up to factors that explicitly depend on the graph topology. Finally, we illustrate our privacy gains with experiments on synthetic and real-world datasets.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
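The combination of local noise injection and gossip averaging described in the abstract above can be sketched as follows. This is a simplified illustration, not the Muffliato algorithm itself: it assumes synchronous rounds, a fixed ring graph with uniform weights, and an arbitrary `noise_std` that is not calibrated to any privacy budget. All function names are hypothetical.

```python
import numpy as np


def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic gossip matrix for a ring of n >= 3 nodes:
    each node averages itself with its two neighbours."""
    w = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i, j % n] = 1.0 / 3.0
    return w


def noisy_gossip_average(values, gossip_matrix, noise_std, num_rounds, seed=0):
    """Each node perturbs its private value once with Gaussian noise,
    then all nodes repeatedly average with their neighbours."""
    rng = np.random.default_rng(seed)
    state = values + rng.normal(0.0, noise_std, size=values.shape)
    noisy_mean = state.mean()  # the value every node should converge to
    for _ in range(num_rounds):
        state = gossip_matrix @ state
    return state, noisy_mean


private_values = np.arange(8, dtype=float)
final_state, target = noisy_gossip_average(
    private_values, ring_gossip_matrix(8), noise_std=0.1, num_rounds=200
)
print(final_state, target)
```

Because the gossip matrix is doubly stochastic, each round preserves the average of the noisy values, so every node converges to the same noisy mean. In this picture a node only ever sees the running values of its neighbours, which is the intuition behind privacy guarantees that depend on graph distance.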
# 変圧器を用いた変圧オートエンコーダの形式的意味幾何学 Formal Semantic Geometry over Transformer-based Variational AutoEncoder ( http://arxiv.org/abs/2210.06230v2 ) ライセンス: Link先を確認	Yingji Zhang, Danilo S. Carvalho, Ian Pratt-Hartmann, André Freitas,	(参考訳) 形式的/記号的意味論は、その \textit{localisation} や \textit{composition} プロパティによって、標準的で厳密な制御性と文表現への解釈性を提供することができる。言語モデル(LM)の生成を制御・解釈するために、そのような特性を現在の分散文表現にどのように提供できるか。本研究では, 文意味論を<textit{semantic role - word content}特徴の合成として理論的にフレーム化し, 形式的意味幾何学を提案する。このような幾何学をトランスフォーマーベースのLM(すなわちGPT2)に注入するために、トランスフォーマーベースの変分オートエンコーダを監督的アプローチで展開し、低次元の潜在ガウス空間上で文生成を操作・説明することができる。さらに,このような幾何学上の文ベクトルの移動を誘導する新しい探索アルゴリズムを提案する。実験結果から,形式的意味幾何学は文生成により良い制御と解釈をもたらす可能性が示唆された。 Formal/symbolic semantics can provide canonical, rigid controllability and interpretability to sentence representations due to their \textit{localisation} or \textit{composition} property. How can we deliver such property to the current distributional sentence representations to control and interpret the generation of language models (LMs)? In this work, we theoretically frame the sentence semantics as the composition of \textit{semantic role - word content} features and propose the formal semantic geometry. To inject such geometry into Transformer-based LMs (i.e. GPT2), we deploy Transformer-based Variational AutoEncoder with a supervision approach, where the sentence generation can be manipulated and explained over low-dimensional latent Gaussian space. In addition, we propose a new probing algorithm to guide the movement of sentence vectors over such geometry. Experimental results reveal that the formal semantic geometry can potentially deliver better control and interpretation to sentence generation.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# グラフの非現実的説明に関する調査:定義・方法・評価・研究課題 A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges ( http://arxiv.org/abs/2210.12089v3 ) ライセンス: Link先を確認	Mario Alfonso Prado-Romero, Bardh Prenkaj, Giovanni Stilo, Fosca Giannotti,	(参考訳) グラフニューラルネットワーク(GNN)は、コミュニティ検出と分子分類においてよく機能する。 Counterfactual Explanations (CE) はブラックボックスモデルの透明性の限界を克服するための反例を提供する。グラフ学習の関心が高まっているため、我々はGNNにおけるCEの概念に注目している。我々は、分類学、統一表記法、ベンチマークデータセットと評価指標を提供するためにSoAを分析した。本稿では,14の手法,評価プロトコル,22のデータセット,19のメトリクスについて論じる。提案手法の大半をGRETELライブラリに統合し,その強度と落とし穴を理解する実験的な評価を行った。オープンな課題と今後の作業を強調します。 Graph Neural Networks (GNNs) perform well in community detection and molecule classification. Counterfactual Explanations (CE) provide counter-examples to overcome the transparency limitations of black-box models. Due to the growing attention in graph learning, we focus on the concepts of CE for GNNs. We analysed the SoA to provide a taxonomy, a uniform notation, and the benchmarking datasets and evaluation metrics. We discuss fourteen methods, their evaluation protocols, twenty-two datasets, and nineteen metrics. We integrated the majority of methods into the GRETEL library to conduct an empirical evaluation to understand their strengths and pitfalls. We highlight open challenges and future work.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# QMAとQCMAの間の分布検定オラクル分離 A distribution testing oracle separation between QMA and QCMA ( http://arxiv.org/abs/2210.15380v5 ) ライセンス: Link先を確認	Anand Natarajan, Chinmay Nirkhe,	(参考訳) 量子複雑性理論では、$\textit{non-deterministic}$な量子計算の定義に量子証拠$(\textsf{QMA})$が必要なのか、それとも古典的な証拠$(\textsf{QCMA})$で十分なのかは、長年の未解決問題である。本稿では、それぞれの計算複雑性クラスを分離するランダム化古典オラクルを構築することにより、この問題を前進させる。以前の分離[Aaronson-Kuperberg (CCC'07), Fefferman-Kimmel (MFCS'18)]は量子ユニタリオラクルを必要とした。分離問題は、正則無向グラフ上に台を持つ分布が、複数の連結成分からなる(yesインスタンス)か、1つの拡張的な連結成分からなる(noインスタンス)かを判定することであり、グラフはオラクルによって隣接リスト形式で与えられる。したがって、オラクルは$n$ビットブール関数上の分布である。 It is a long-standing open question in quantum complexity theory whether the definition of $\textit{non-deterministic}$ quantum computation requires quantum witnesses $(\textsf{QMA})$ or if classical witnesses suffice $(\textsf{QCMA})$. We make progress on this question by constructing a randomized classical oracle separating the respective computational complexity classes. Previous separations [Aaronson-Kuperberg (CCC'07), Fefferman-Kimmel (MFCS'18)] required a quantum unitary oracle. The separating problem is deciding whether a distribution supported on regular undirected graphs either consists of multiple connected components (yes instances) or consists of one expanding connected component (no instances) where the graph is given in an adjacency-list format by the oracle. Therefore, the oracle is a distribution over $n$-bit boolean functions.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# 条件付き独立グラフの復元方法:調査 Methods for Recovering Conditional Independence Graphs: A Survey ( http://arxiv.org/abs/2211.06829v2 ) ライセンス: Link先を確認	Harsh Shrivastava, Urszula Chajewska,	(参考訳) 条件付き独立グラフ(CIグラフ)は、主に特徴関係についての洞察を得るために使用される確率的グラフィカルモデルの一種である。各エッジは、直接依存に関する情報を提供する接続された特徴間の部分的相関を表す。本調査では,CIグラフを復元する技術について,さまざまな手法をリストアップし,その進歩について検討する。従来の最適化手法に加えて,最近開発されたディープラーニングアーキテクチャや推奨実装についても取り上げる。より広範な採用を容易にするために、関連する操作を集約するプリミナリ、例えば混合データ型のための共分散行列を得る技術を含める。 Conditional Independence (CI) graphs are a type of probabilistic graphical models that are primarily used to gain insights about feature relationships. Each edge represents the partial correlation between the connected features which gives information about their direct dependence. In this survey, we list out different methods and study the advances in techniques developed to recover CI graphs. We cover traditional optimization methods as well as recently developed deep learning architectures along with their recommended implementations. To facilitate wider adoption, we include preliminaries that consolidate associated operations, for example techniques to obtain covariance matrix for mixed datatypes.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# 分散シフトのためのラベルアライメント規則化 Label Alignment Regularization for Distribution Shift ( http://arxiv.org/abs/2211.14960v4 ) ライセンス: Link先を確認	Ehsan Imani, Guojun Zhang, Runjia Li, Jun Luo, Pascal Poupart, Philip H. S. Torr, Yangchen Pan,	(参考訳) 最近の研究は、教師あり学習におけるラベルアライメント特性(LAP)を強調している。この観測からインスピレーションを得て、対象領域の予測とその頂点特異ベクトルとの整合性を促進する教師なし領域適応の正規化法を提案する。表現の正規化に重点を置く従来のドメイン適応アプローチとは異なり、ソースドメインとターゲットドメインの両方でLAPによって導かれる教師なしのターゲットデータと整合するように分類器を正規化する。理論的解析により、ある仮定の下では、我々の解は対象の領域データの右上特異ベクトルの範囲内にあり、最適解と整合することを示した。古典的領域適応理論で見られる最適結合リスク仮定を除去することにより,従来の領域適応手法が高い結合誤差のためにしばしば不足する問題に対処する上で,本手法の有効性を示す。さらに、MNIST-USPSドメイン適応や言語間感情分析などのよく知られたタスクにおいて、ドメイン適応ベースラインよりもパフォーマンスが向上したことを報告した。 Recent work has highlighted the label alignment property (LAP) in supervised learning, where the vector of all labels in the dataset is mostly in the span of the top few singular vectors of the data matrix. Drawing inspiration from this observation, we propose a regularization method for unsupervised domain adaptation that encourages alignment between the predictions in the target domain and its top singular vectors. Unlike conventional domain adaptation approaches that focus on regularizing representations, we instead regularize the classifier to align with the unsupervised target data, guided by the LAP in both the source and target domains. Theoretical analysis demonstrates that, under certain assumptions, our solution resides within the span of the top right singular vectors of the target domain data and aligns with the optimal solution. By removing the reliance on the commonly used optimal joint risk assumption found in classic domain adaptation theory, we showcase the effectiveness of our method on addressing problems where traditional domain adaptation methods often fall short due to high joint error. Additionally, we report improved performance over domain adaptation baselines in well-known tasks such as MNIST-USPS domain adaptation and cross-lingual sentiment analysis.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# 結晶のディラック・フォックモデルに対する最小化元の存在 Existence of minimizers for the Dirac-Fock model of crystals ( http://arxiv.org/abs/2212.01142v5 ) ライセンス: Link先を確認	Isabelle Catto, Long Meng, Eric Paturel, Eric Séré,	(参考訳) 非相対論的結晶の基底状態については、数学および物理学の文献に多くの異なるモデルが存在するが、相対論的な場合はあまり研究されておらず、結晶の完全に相対論的な取り扱いに関する数学的結果を我々は知らない。本稿では、周期密度行列を用いて結晶の平均場相対論的エネルギーを導入する。このモデルは、著者の一人による原子と分子のディラック・フォック基底状態の最近の定義と、結晶の非相対論的ハートリー・フォックモデルの両方から着想を得ている。単位胞あたりの電子数が大きすぎない場合に、基底状態が存在することを証明する。 Whereas many different models exist in the mathematical and physics literature for ground states of non-relativistic crystals, the relativistic case has been much less studied and we are not aware of any mathematical result on a fully relativistic treatment of crystals. In this paper, we introduce a mean-field relativistic energy for crystals in terms of periodic density matrices. This model is inspired both by a recent definition of the Dirac-Fock ground state for atoms and molecules, due to one of us, and by the non-relativistic Hartree-Fock model for crystals. We prove the existence of a ground state when the number of electrons per cell is not too large.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# 有限和ミニマックス問題に対する分散型確率的勾配降下上昇法 Decentralized Stochastic Gradient Descent Ascent for Finite-Sum Minimax Problems ( http://arxiv.org/abs/2212.02724v3 ) ライセンス: Link先を確認	Hongchang Gao,	(参考訳) ミニマックス最適化問題は、多くの機械学習モデルに広く応用されているため、近年大きな注目を集めている。ミニマックス問題を解くために、様々な確率的最適化手法が提案されている。しかし、そのほとんどは、トレーニングデータが複数のワーカーに分散される分散設定を考慮していない。本稿では、有限和ミニマックス問題に対する新しい分散型確率的勾配降下上昇法を開発する。特に、分散低減(variance-reduced)勾配を用いることで、本手法は非凸強凹(nonconvex-strongly-concave)ミニマックス問題に対して、$O(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2})$のサンプル複雑性と$O(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2})$の通信複雑性を達成できる。我々の知る限り、この種のミニマックス問題に対してこのような理論的複雑性を達成したのは本研究が初めてである。最後に、本手法をAUC最大化に適用し、実験結果から本手法の有効性を確認した。 Minimax optimization problems have attracted significant attention in recent years due to their widespread application in numerous machine learning models. To solve the minimax problem, a wide variety of stochastic optimization methods have been proposed. However, most of them ignore the distributed setting where the training data is distributed on multiple workers. In this paper, we develop a novel decentralized stochastic gradient descent ascent method for the finite-sum minimax problem. In particular, by employing the variance-reduced gradient, our method can achieve $O(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2})$ sample complexity and $O(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2})$ communication complexity for the nonconvex-strongly-concave minimax problem. As far as we know, our work is the first one to achieve such theoretical complexities for this kind of minimax problem. Finally, we apply our method to AUC maximization, and the experimental results confirm the effectiveness of our method.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# 確率的パラメータ摂動に対する変分量子アルゴリズムのロバスト性 Robustness of Variational Quantum Algorithms against stochastic parameter perturbation ( http://arxiv.org/abs/2301.00048v3 ) ライセンス: Link先を確認	Daniil Rabinovich, Ernesto Campos, Soumik Adhikary, Ekaterina Pankovets, Dmitry Vinichenko, Jacob Biamonte,	(参考訳) 変分量子アルゴリズムは、現在の量子デバイスの制約内で動作するように調整されているが、性能を劣化させるエラーによって制限される。本研究では、変分量子アルゴリズムに固有の現実的なゲート誤差を反映したノイズモデルを考察する。このノイズモデルによる、変分的に生成された量子状態のデコヒーレンスを調べる。このデコヒーレンスは、変分アプローチにおけるエネルギー推定からのずれを引き起こす。最適化された回路の摂動解析により、安定性補題によって定められた基準が満たされるノイズ閾値を決定する。最大14量子ビットの様々な問題に対して、変分量子固有値ソルバーと量子近似最適化アルゴリズムを用いてこの結果を検証した。さらに、特定のゲートエラーは状態のコヒーレンスに与える影響が著しく小さいことを示し、これにより性能を損なうことなく実行時間を短縮できる。 Variational quantum algorithms are tailored to perform within the constraints of current quantum devices, yet they are limited by performance-degrading errors. In this study, we consider a noise model that reflects realistic gate errors inherent to variational quantum algorithms. We investigate the decoherence of a variationally prepared quantum state due to this noise model, which causes a deviation from the energy estimation in the variational approach. By performing a perturbative analysis of optimized circuits, we determine the noise threshold at which the criteria set by the stability lemma are met. We assess our findings against the variational quantum eigensolver and quantum approximate optimization algorithm for various problems with up to 14 qubits. Moreover, we show that certain gate errors have a significantly smaller impact on the coherence of the state, allowing us to reduce the execution time without compromising performance.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# 機械学習ノートにおける理解のための静的解析駆動型強化 Static Analysis Driven Enhancements for Comprehension in Machine Learning Notebooks ( http://arxiv.org/abs/2301.04419v4 ) ライセンス: Link先を確認	Ashwin Prasad Shivarpatna Venkatesh, Samkutty Sabu, Mouli Chekkapalli, Jiawei Wang, Li Li, Eric Bodden,	(参考訳) Jupyterノートブックを使えば、開発者はリッチテキストとインラインビジュアライゼーションでコードスニペットをインターリーブできる。データサイエンティストはJupyterノートブックを、主にPythonで書かれた機械学習ベースのソリューションの作成と共有のためのデファクトスタンダードとして使っている。しかし、最近の研究では、公共のプラットフォームで利用可能なJupyterノートの大部分が文書化されておらず、物語構造が欠けていることが示されている。これにより、これらのノートの読みやすさが低下する。本稿では,ML操作の分類に基づく分類的マークダウンヘッダーをコードセルに自動アノテートし,その分類に従って関数呼び出しを分類・表示する,新しいツールベースのアプローチであるHeaderGenを提案する。この機能を実現するために、HeaderGenはPyCGの既存のコールグラフ分析を強化した。精度を向上させるため、HeaderGenはPyCGの分析を拡張し、外部ライブラリコードの処理とフロー感度をサポートする。前者は関数戻り型の解決を容易にすることで実現される。 Kaggleによる15個の実世界のJupyterノートの評価は、HeaderGenの基盤となるコールグラフ解析は高い精度(95.6%の精度と95.3%のリコール)が得られることを示している。これは、HeaderGenがpytype(Google)、pyright(Microsoft)、Jediのような既存の型推論ツールが不足している外部ライブラリの戻り型を解決できるためである。ヘッダ生成の精度は85.7%、リコールレートは92.8%である。ユーザスタディでは、HeaderGenが参加者の理解とナビゲーション作業の高速化を支援する。ツールの型推論機能をさらに評価するために,154のコードスニペットと845の型アノテーションを含むマイクロベンチマークを備えた型推論ツール評価フレームワークであるTypeEvalPyを紹介した。 4つのツールの比較分析により、HeaderGenは他のツールよりも優れていることが分かりました。 Jupyter notebooks enable developers to interleave code snippets with rich-text and in-line visualizations. Data scientists use Jupyter notebook as the de-facto standard for creating and sharing machine-learning based solutions, primarily written in Python. Recent studies have demonstrated, however, that a large portion of Jupyter notebooks available on public platforms are undocumented and lacks a narrative structure. This reduces the readability of these notebooks. To address this shortcoming, this paper presents HeaderGen, a novel tool-based approach that automatically annotates code cells with categorical markdown headers based on a taxonomy of ML operations, and classifies and displays function calls according to this taxonomy. 
For this functionality to be realized, HeaderGen enhances an existing call graph analysis in PyCG. To improve precision, HeaderGen extends PyCG's analysis with support for handling external library code and flow-sensitivity. The former is realized by facilitating the resolution of function return-types. The evaluation on 15 real-world Jupyter notebooks from Kaggle shows that HeaderGen's underlying call graph analysis yields high accuracy (95.6% precision and 95.3% recall). This is because HeaderGen can resolve return-types of external libraries where existing type inference tools such as pytype (by Google), pyright (by Microsoft), and Jedi fall short. The header generation has a precision of 85.7% and a recall rate of 92.8%. In a user study, HeaderGen helps participants finish comprehension and navigation tasks faster. To further evaluate the type inference capability of tools, we introduce TypeEvalPy, a framework for evaluating type inference tools with a micro-benchmark containing 154 code snippets and 845 type annotations. Our comparative analysis on four tools revealed that HeaderGen outperforms other tools in exact matches with the ground truth.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# StreamingFlow: ニューラル正規微分方程式による非同期マルチモーダルデータストリームによる実行予測 StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation ( http://arxiv.org/abs/2302.09585v2 ) ライセンス: Link先を確認	Yining Shi, Kun Jiang, Ke Wang, Jiusi Li, Yunlong Wang, Mengmeng Yang, Diange Yang,	(参考訳) 周囲環境の将来の占有状態を予測することは、自動運転にとって重要な課題である。しかし、現在の最高の単一モダリティ法や多モード融合認識法は、将来の占有状態の均一なスナップショットしか予測できず、センサ融合のために厳密に同期された感覚データを必要とする。本稿では,これらの制約を緩和する新しいフレームワーク,StreamingFlowを提案する。 StreamingFlowは、非同期なマルチセンサーデータストリームを融合させ、将来のタイムスタンプで将来の占有マップのストリーミング予測を実行する、新しいBEV占有予測器である。ニューラル常微分方程式(N-ODE)をリカレントニューラルネットワークに統合することにより、StreamingFlowは時相水平線上のBEV特徴の微分を学習し、融合プロセスの一部として暗黙センサのBEV特徴を更新し、BEV状態を望ましい将来時点に伝播させる。予測のゼロショット一般化能力は良好で、観測された予測時間地平線の補間や、観測されていない将来の期間の合理的な推測に反映される。 nuScenesとLyft L5という2つの大規模なデータセットに対する大規模な実験は、StreamingFlowが従来のビジョンベースのLiDARベースのメソッドよりも大幅に優れており、最先端のフュージョンベースのメソッドよりも優れたパフォーマンスを示していることを示している。 Predicting the future occupancy states of the surrounding environment is a vital task for autonomous driving. However, current best-performing single-modality methods or multi-modality fusion perception methods are only able to predict uniform snapshots of future occupancy states and require strictly synchronized sensory data for sensor fusion. We propose a novel framework, StreamingFlow, to lift these strong limitations. StreamingFlow is a novel BEV occupancy predictor that ingests asynchronous multi-sensor data streams for fusion and performs streaming forecasting of the future occupancy map at any future timestamps. By integrating neural ordinary differential equations (N-ODE) into recurrent neural networks, StreamingFlow learns derivatives of BEV features over temporal horizons, updates the implicit sensor's BEV features as part of the fusion process, and propagates BEV states to the desired future time point. 
It shows good zero-shot generalization ability of prediction, reflected in the interpolation of the observed prediction time horizon and the reasonable inference of the unseen farther future period. Extensive experiments on two large-scale datasets, nuScenes and Lyft L5, demonstrate that StreamingFlow significantly outperforms previous vision-based, LiDAR-based methods, and shows superior performance compared to state-of-the-art fusion-based methods.	翻訳日:2024-06-13 01:37:54 公開日:2024-06-11
# 可変サイズ圧縮性によるデータ依存一般化境界 Data-dependent Generalization Bounds via Variable-Size Compressibility ( http://arxiv.org/abs/2303.05369v3 ) ライセンス: Link先を確認	Milad Sefidgaran, Abdellatif Zaidi,	(参考訳) 本稿では、新たに導入する「可変サイズ圧縮性」フレームワークの観点から、一般化誤差に関する新しいデータ依存の上界を確立する。このフレームワークでは、アルゴリズムの一般化誤差を、入力データの可変サイズの「圧縮率」に結び付ける。これにより、未知の分布ではなく、手元にある入力データの経験的測度に依存する境界が得られる。我々が確立する新しい一般化境界は、テール境界、期待値に関するテール境界、および期待値境界である。さらに、本フレームワークにより、入力データと出力仮説の確率変数の任意の関数に対する一般的な境界も導出できることを示す。特に、これらの一般的な境界は、既存のいくつかのPAC-Bayes境界およびデータ依存の内在次元に基づく境界を特殊ケースとして包含し、場合によっては改善することが示され、我々のアプローチの統一的な性質が明らかになる。例えば、一般化誤差を最適化軌跡に結び付け、プロセスのレート歪み次元、プロセスのRényi情報次元、および計量平均次元との様々な興味深い関係を明らかにする、新しいデータ依存の内在次元に基づく境界が確立される。 In this paper, we establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework that we introduce here. In this framework, the generalization error of an algorithm is linked to a variable-size 'compression rate' of its input data. This is shown to yield bounds that depend on the empirical measure of the given input data at hand, rather than its unknown distribution. The new generalization bounds that we establish are tail bounds, tail bounds on the expectation, and in-expectation bounds. Moreover, it is shown that our framework also allows deriving general bounds on any function of the input data and output hypothesis random variables. In particular, these general bounds are shown to subsume and possibly improve over several existing PAC-Bayes and data-dependent intrinsic dimension-based bounds that are recovered as special cases, thus unveiling a unifying character of our approach. For instance, a new data-dependent intrinsic dimension-based bound is established, which connects the generalization error to the optimization trajectories and reveals various interesting connections with the rate-distortion dimension of a process, the Rényi information dimension of a process, and the metric mean dimension.	翻訳日:2024-06-13 01:28:06 公開日:2024-06-11
# マルチスケールサーフェス・ビジョン・トランス The Multiscale Surface Vision Transformer ( http://arxiv.org/abs/2303.11909v3 ) ライセンス: Link先を確認	Simon Dahan, Logan Z. J. Williams, Daniel Rueckert, Emma C. Robinson,	(参考訳) 表面メッシュは、ヒト大脳皮質の構造的および機能的情報を表現するのに好まれる領域であるが、その複雑なトポロジと幾何学は、ディープラーニング解析に重大な課題をもたらす。トランスフォーマーはシーケンス・ツー・シーケンス・ラーニングのドメインに依存しないアーキテクチャとして優れているが、自己注意操作の二次コストは多くの密集予測タスクの障害となっている。視覚変換器を用いた階層型モデリングの最近の進歩に触発されて,表面深層学習のためのバックボーンアーキテクチャとして,Multiscale Surface Vision Transformer (MS-SiT)を導入した。自己保持機構は局所的なメッシュウインドウ内で適用され、基礎となるデータの高精細なサンプリングを可能にし、シフトウインドウ戦略はウィンドウ間の情報の共有を改善する。隣接パッチは順次マージされ、MS-SiTは任意の予測タスクに適した階層表現を学習できる。以上の結果から,MS-SiTは,発達型Human Connectome Project(dHCP)データセットを用いて,新生児の表現型予測タスクにおいて,既存の表面深層学習法よりも優れていた。さらに、表面セグメンテーションのためのU字型アーキテクチャにMS-SiTバックボーンを組み込むことで、UK Biobank(UKB)と手動で注釈付けされたMindBoggleデータセットを使用した皮質パーセル化の競合結果が示される。コードとトレーニングされたモデルはhttps://github.com/metrics-lab/ surface-vision-transformersで公開されている。 Surface meshes are a favoured domain for representing structural and functional information on the human cortex, but their complex topology and geometry pose significant challenges for deep learning analysis. While Transformers have excelled as domain-agnostic architectures for sequence-to-sequence learning, the quadratic cost of the self-attention operation remains an obstacle for many dense prediction tasks. Inspired by some of the latest advances in hierarchical modelling with vision transformers, we introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. The self-attention mechanism is applied within local-mesh-windows to allow for high-resolution sampling of the underlying data, while a shifted-window strategy improves the sharing of information between windows. Neighbouring patches are successively merged, allowing the MS-SiT to learn hierarchical representations suitable for any prediction task. 
Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks using the Developing Human Connectome Project (dHCP) dataset. Furthermore, building the MS-SiT backbone into a U-shaped architecture for surface segmentation demonstrates competitive results on cortical parcellation using the UK Biobank (UKB) and manually-annotated MindBoggle datasets. Code and trained models are publicly available at https://github.com/metrics-lab/surface-vision-transformers.	翻訳日:2024-06-13 01:28:06 公開日:2024-06-11
# 慣性幾何量子論理ゲート Inertial geometric quantum logic gates ( http://arxiv.org/abs/2303.13674v4 ) ライセンス: Link先を確認	Daniel Turyansky, Oded Ovdat, Roie Dann, Ziv Aqua, Ronnie Kosloff, Barak Dayan, Adi Pick,	(参考訳) 我々は、STIRAPと量子論理ゲートのための高速かつ堅牢なプロトコルを提案する。我々のゲートは、ゆっくりと加速する慣性ハミルトニアンの瞬時固有状態が獲得する幾何学的位相に基づいている。まず慣性発展の基準を確立し、次にこれらの条件を満たすパルス形状を設計する。こうして調整したパルスを用いて、幾何学的論理ゲートを最適化する。$^{87}$Rb原子を用いた本プロトコルの実現を解析した結果、ゲート忠実度は現在の最先端に迫り、ロバスト性は著しく向上した。 We present rapid and robust protocols for STIRAP and quantum logic gates. Our gates are based on geometric phases acquired by instantaneous eigenstates of a slowly accelerating inertial Hamiltonian. To begin, we establish the criteria for inertial evolution and subsequently engineer pulse shapes that fulfill these conditions. These tailored pulses are then used to optimize geometric logic gates. We analyze a realization of our protocols with $^{87}$Rb atoms, resulting in gate fidelity that approaches the current state-of-the-art, with marked improvements in robustness.	翻訳日:2024-06-13 01:28:06 公開日:2024-06-11
# 文脈化セマンティックシフト検出に関する調査 A Survey on Contextualised Semantic Shift Detection ( http://arxiv.org/abs/2304.01666v2 ) ライセンス: Link先を確認	Stefano Montanelli, Francesco Periti,	(参考訳) セマンティックシフト検出(Semantic Shift Detection, SSD)は、対象語の意味の時間的変化を識別し、解釈し、評価するタスクである。伝統的に、SSDは言語学者や社会科学者によって、手作業による時間のかかる作業を通じて扱われてきた。近年、SSDを可能な限り自動化するために、自然言語処理と単語埋め込みに基づく計算的手法が注目を集めている。特に過去3年間の大きな進歩は、単語の複数の用法・意味を扱い、関連するセマンティックシフトをより適切に捉えられる文脈化単語埋め込みモデルにほぼ全面的に基づいている。本稿では、SSDのための文脈化埋め込みに基づく手法(すなわちCSSDetection)を調査し、意味表現、時間認識、学習モダリティの各次元によって特徴付けられる分類フレームワークを提案する。このフレームワークを用いて、 i) シフト評価の尺度をレビューし、 ii) 各手法の性能を比較し、 iii) スケーラビリティ、解釈可能性、堅牢性の観点から現在の課題を議論する。最後に、CSSDetectionに関するオープンな課題と今後の研究方向性を概説する。 Semantic Shift Detection (SSD) is the task of identifying, interpreting, and assessing the possible change over time in the meanings of a target word. Traditionally, SSD has been addressed by linguists and social scientists through manual and time-consuming activities. In recent years, computational approaches based on Natural Language Processing and word embeddings gained increasing attention to automate SSD as much as possible. In particular, over the past three years, significant advancements have been made almost exclusively based on word contextualised embedding models, which can handle the multiple usages/meanings of the words and better capture the related semantic shifts. In this paper, we survey the approaches based on contextualised embeddings for SSD (i.e., CSSDetection) and we propose a classification framework characterised by meaning representation, time-awareness, and learning modality dimensions. The framework is exploited i) to review the measures for shift assessment, ii) to compare the approaches on performance, and iii) to discuss the current issues in terms of scalability, interpretability, and robustness. Open challenges and future research directions about CSSDetection are finally outlined.	翻訳日:2024-06-13 01:28:06 公開日:2024-06-11
# Sparsity in neural networks can improve their privacy ( http://arxiv.org/abs/2304.10553v2 ) License: see link	Antoine Gonon, Léon Zheng, Clément Lalanne, Quoc-Tung Le, Guillaume Lauga, Can Pouliquen,	This article measures how sparsity can make neural networks more robust to membership inference attacks. The obtained empirical results show that sparsity improves the privacy of the network, while preserving comparable performance on the task at hand. This empirical study completes and extends the existing literature.	Translated: 2024-06-13 01:28:06 Published: 2024-06-11
# Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks ( http://arxiv.org/abs/2305.01713v3 ) License: see link	Yingji Zhang, Danilo S. Carvalho, André Freitas,	Disentangled latent spaces usually have better semantic separability and geometrical properties, which leads to better interpretability and more controllable data generation. While this has been well investigated in Computer Vision, in tasks such as image disentanglement, sentence disentanglement in the NLP domain is still comparatively under-investigated. Most previous work has concentrated on disentangling task-specific generative factors, such as sentiment, within the context of style transfer. In this work, we focus on a more general form of sentence disentanglement, targeting the localised modification and control of more general sentence semantic features. To achieve this, we contribute a novel notion of sentence semantic disentanglement and introduce a flow-based invertible neural network (INN) mechanism integrated with a transformer-based language Autoencoder (AE) in order to deliver latent spaces with better separability properties. Experimental results demonstrate that the model can conform the distributed latent space into a better semantically disentangled sentence space, leading to improved language interpretability and controlled generation when compared to the recent state-of-the-art language VAE models.	Translated: 2024-06-13 01:28:06 Published: 2024-06-11
# Visual Transformation Telling ( http://arxiv.org/abs/2305.01928v2 ) License: see link	Wanqing Cui, Xin Hong, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng,	Humans can naturally reason from superficial state differences (e.g. ground wetness) to transformation descriptions (e.g. raining) according to their life experience. In this paper, we propose a new visual reasoning task to test this transformation reasoning ability in real-world scenarios, called Visual Transformation Telling (VTT). Given a series of states (i.e. images), VTT requires describing the transformation occurring between every two adjacent states. Different from existing visual reasoning tasks that focus on surface state reasoning, the advantage of VTT is that it captures the underlying causes, e.g. actions or events, behind the differences among states. We collect a novel dataset to support the study of transformation reasoning from two existing instructional video datasets, CrossTask and COIN, comprising 13,547 samples. Each sample involves the key state images along with their transformation descriptions. Our dataset covers diverse real-world activities, providing a rich resource for training and evaluation. To construct an initial benchmark for VTT, we test several models, including traditional visual storytelling methods (CST, GLACNet, Densecap) and advanced multimodal large language models (LLaVA v1.5-7B, Qwen-VL-chat, Gemini Pro Vision, GPT-4o, and GPT-4). Experimental results reveal that even state-of-the-art models still face challenges in VTT, highlighting substantial areas for improvement.	Translated: 2024-06-13 01:28:06 Published: 2024-06-11
# OntoType: Ontology-Guided and Pre-Trained Language Model Assisted Fine-Grained Entity Typing ( http://arxiv.org/abs/2305.12307v3 ) License: see link	Tanay Komarlu, Minhao Jiang, Xuan Wang, Jiawei Han,	Fine-grained entity typing (FET), which assigns entities in text with context-sensitive, fine-grained semantic types, is a basic but important task for knowledge extraction from unstructured text. FET has been studied extensively in natural language processing and typically relies on human-annotated corpora for training, which is costly and difficult to scale. Recent studies explore the utilization of pre-trained language models (PLMs) as a knowledge base to generate rich and context-aware weak supervision for FET. However, a PLM still requires direction and guidance to serve as a knowledge base, as it often generates a mixture of rough and fine-grained types, or tokens unsuitable for typing. In this study, we envision that an ontology provides a semantics-rich, hierarchical structure, which will help select the best results generated by multiple PLM models and head words. Specifically, we propose a novel annotation-free, ontology-guided FET method, OntoType, which follows a type ontological structure, from coarse to fine, ensembles multiple PLM prompting results to generate a set of type candidates, and refines its type resolution under the local context with a natural language inference model. Our experiments on the Ontonotes, FIGER, and NYT datasets using their associated ontological structures demonstrate that our method outperforms the state-of-the-art zero-shot fine-grained entity typing methods as well as a typical LLM method, ChatGPT. Our error analysis shows that refinement of the existing ontology structures will further improve fine-grained entity typing.	Translated: 2024-06-13 01:28:06 Published: 2024-06-11
# Consistent Optimal Transport with Empirical Conditional Measures ( http://arxiv.org/abs/2305.15901v6 ) License: see link	Piyushi Manupriya, Rachit Keerti Das, Sayantan Biswas, Saketha Nath Jagarlapudi,	Given samples from two joint distributions, we consider the problem of Optimal Transportation (OT) between them when conditioned on a common variable. We focus on the general setting where the conditioned variable may be continuous, and the marginals of this variable in the two joint distributions may not be the same. In such settings, standard OT variants cannot be employed, and novel estimation techniques are necessary. Since the main challenge is that the conditional distributions are not explicitly available, the key idea in our OT formulation is to employ kernelized-least-squares terms computed over the joint samples, which implicitly match the transport plan's marginals with the empirical conditionals. Under mild conditions, we prove that our estimated transport plans, as a function of the conditioned variable, are asymptotically optimal. For finite samples, we show that the deviation in terms of our regularized objective is bounded by $O(1/m^{1/4})$, where $m$ is the number of samples. We also discuss how the conditional transport plan could be modelled using explicit probabilistic models as well as using implicit generative ones. We empirically verify the consistency of our estimator on synthetic datasets, where the optimal plan is analytically known. When employed in applications like prompt learning for few-shot classification and conditional generation in the context of predicting cell responses to treatment, our methodology improves upon state-of-the-art methods.	Translated: 2024-06-13 01:28:06 Published: 2024-06-11
# Learning from Integral Losses in Physics Informed Neural Networks ( http://arxiv.org/abs/2305.17387v2 ) License: see link	Ehsan Saleh, Saba Ghaffari, Timothy Bretl, Luke Olson, Matthew West,	This work proposes a solution for the problem of training physics-informed networks under partial integro-differential equations. These equations require an infinite or a large number of neural evaluations to construct a single residual for training. As a result, accurate evaluation may be impractical, and we show that naive approximations that replace these integrals with unbiased estimates lead to biased loss functions and solutions. To overcome this bias, we investigate three types of potential solutions: the deterministic sampling approaches, the double-sampling trick, and the delayed target method. We consider three classes of PDEs for benchmarking: one defining Poisson problems with singular charges and weak solutions of up to 10 dimensions, another involving weak solutions on electro-magnetic fields and a Maxwell equation, and a third one defining a Smoluchowski coagulation problem. Our numerical results confirm the existence of the aforementioned bias in practice and also show that our proposed delayed target approach can lead to accurate solutions with comparable quality to ones estimated with a large sample size integral. Our implementation is open-source and available at https://github.com/ehsansaleh/btspinn.	Translated: 2024-06-13 01:28:06 Published: 2024-06-11
# ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer ( http://arxiv.org/abs/2306.06446v5 ) License: see link	Haoran You, Huihong Shi, Yipin Guo, Yingyan Lin,	Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. However, both the attention mechanism and multi-layer perceptrons (MLPs) in ViTs are not sufficiently efficient due to dense multiplications, leading to costly training and inference. To this end, we propose to reparameterize pre-trained ViTs with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed $\textbf{ShiftAddViT}$, which aims to achieve end-to-end inference speedups on GPUs without requiring training from scratch. Specifically, all $\texttt{MatMuls}$ among queries, keys, and values are reparameterized using additive kernels, after mapping queries and keys to binary codes in Hamming space. The remaining MLPs or linear layers are then reparameterized with shift kernels. We utilize TVM to implement and optimize those customized kernels for practical hardware deployment on GPUs. We find that such a reparameterization on attention maintains model accuracy, while inevitably leading to accuracy drops when applied to MLPs. To marry the best of both worlds, we further propose a new mixture of experts (MoE) framework to reparameterize MLPs by taking multiplication or its primitives as experts, e.g., multiplication and shift, and designing a new latency-aware load-balancing loss. Such a loss helps to train a generic router for assigning a dynamic amount of input tokens to different experts according to their latency. Extensive experiments on various 2D/3D Transformer-based vision tasks consistently validate the effectiveness of our proposed ShiftAddViT, achieving up to $\mathbf{5.18\times}$ latency reductions on GPUs and $\mathbf{42.9}$% energy savings, while maintaining accuracy comparable to the original or efficient ViTs.	Translated: 2024-06-13 01:28:06 Published: 2024-06-11
# FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods ( http://arxiv.org/abs/2306.09468v2 ) License: see link	Xiaotian Han, Jianfeng Chi, Yu Chen, Qifan Wang, Han Zhao, Na Zou, Xia Hu,	This paper introduces the Fair Fairness Benchmark (FFB), a benchmarking framework for in-processing group fairness methods. Ensuring fairness in machine learning is important for ethical compliance. However, there exist challenges in comparing and developing fairness methods due to inconsistencies in experimental settings, lack of accessible algorithmic implementations, and limited extensibility of current fairness packages and tools. To address these issues, we introduce an open-source standardized benchmark for evaluating in-processing group fairness methods and provide a comprehensive analysis of state-of-the-art methods to ensure different notions of group fairness. This work offers the following key contributions: the provision of flexible, extensible, minimalistic, and research-oriented open-source code; the establishment of unified fairness method benchmarking pipelines; and extensive benchmarking, which yields key insights from $\mathbf{45,079}$ experiments and $\mathbf{14,428}$ GPU hours. We believe that our work will significantly facilitate the growth and development of the fairness research community.	Translated: 2024-06-13 01:18:21 Published: 2024-06-11
# Post-train Black-box Defense via Bayesian Boundary Correction ( http://arxiv.org/abs/2306.16979v3 ) License: see link	He Wang, Yunfeng Diao,	Classifiers based on deep neural networks are susceptible to adversarial attacks, and this widespread vulnerability has prompted research into defending them from potential threats. Given a vulnerable classifier, existing defense methods are mostly white-box and often require re-training the victim under modified loss functions/training regimes. While the model/data/training specifics of the victim are usually unavailable to the user, re-training is unappealing, if not impossible, for reasons such as limited computational resources. To this end, we propose a new post-train black-box defense framework. It can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics. This is achieved by new joint Bayesian treatments on the clean data, the adversarial examples and the classifier, for maximizing their joint probability. It is further equipped with a new post-train strategy which keeps the victim intact, avoiding re-training. We name our framework Bayesian Boundary Correction (BBC). BBC is a general and flexible framework that can easily adapt to different data types. We instantiate BBC for image classification and skeleton-based human activity recognition, for both static and dynamic data. Exhaustive evaluation shows that BBC has superior robustness and can enhance robustness without severely hurting the clean accuracy, compared with existing defense methods.	Translated: 2024-06-13 01:18:21 Published: 2024-06-11
# S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction ( http://arxiv.org/abs/2307.06701v2 ) License: see link	Mohammad Adiban, Kalin Stefanov, Sabato Marco Siniscalchi, Giampiero Salvi,	We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE at modeling still images with a parsimonious representation, combined with the ST-PixelCNN's ability at handling spatiotemporal information, S-HR-VQVAE can better deal with chief challenges in video prediction. These include learning spatiotemporal information, handling high dimensional data, combating blurry prediction, and implicit modeling of physical characteristics. Extensive experimental results on the KTH Human Action and Moving-MNIST tasks demonstrate that our model compares favorably against top video prediction techniques both in quantitative and qualitative evaluations despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and ST-PixelCNN parameters.	Translated: 2024-06-13 01:18:21 Published: 2024-06-11
# CalibNet: Dual-branch Cross-modal Calibration for RGB-D Salient Instance Segmentation ( http://arxiv.org/abs/2307.08098v2 ) License: see link	Jialun Pei, Tao Jiang, He Tang, Nian Liu, Yueming Jin, Deng-Ping Fan, Pheng-Ann Heng,	We propose a novel approach for RGB-D salient instance segmentation using a dual-branch cross-modal feature calibration architecture called CalibNet. Our method simultaneously calibrates depth and RGB features in the kernel and mask branches to generate instance-aware kernels and mask features. CalibNet consists of three simple modules, a dynamic interactive kernel (DIK) and a weight-sharing fusion (WSF), which work together to generate effective instance-aware kernels and integrate cross-modal features. To improve the quality of depth features, we incorporate a depth similarity assessment (DSA) module prior to DIK and WSF. In addition, we further contribute a new DSIS dataset, which contains 1,940 images with elaborate instance-level annotations. Extensive experiments on three challenging benchmarks show that CalibNet yields a promising result, i.e., 58.0% AP with 320×480 input size on the COME15K-N test set, which significantly surpasses the alternative frameworks. Our code and dataset are available at: https://github.com/PJLallen/CalibNet.	Translated: 2024-06-13 01:18:21 Published: 2024-06-11
# Using Reinforcement Learning for the Three-Dimensional Loading Capacitated Vehicle Routing Problem ( http://arxiv.org/abs/2307.12136v2 ) License: see link	Stefan Schoepf, Stephen Mak, Julian Senoner, Liming Xu, Netland Torbjörn, Alexandra Brintrup,	Heavy goods vehicles are vital backbones of the supply chain delivery system but also contribute significantly to carbon emissions with only 60% loading efficiency in the United Kingdom. Collaborative vehicle routing has been proposed as a solution to increase efficiency, but challenges remain to make this a possibility. One key challenge is the efficient computation of viable solutions for co-loading and routing. Current operations research methods suffer from non-linear scaling with increasing problem size and are therefore bound to limited geographic areas to compute results in time for day-to-day operations. This only allows for local optima in routing and leaves global optimisation potential untouched. We develop a reinforcement learning model to solve the three-dimensional loading capacitated vehicle routing problem in approximately linear time. While this problem has been studied extensively in operations research, no publications on solving it with reinforcement learning exist. We demonstrate the favourable scaling of our reinforcement learning model and benchmark our routing performance against state-of-the-art methods. The model performs within an average gap of 3.83% to 8.10% compared to established methods. Our model not only represents a promising first step towards large-scale logistics optimisation with reinforcement learning but also lays the foundation for this research stream. GitHub: https://github.com/if-loops/3L-CVRP	Translated: 2024-06-13 01:18:21 Published: 2024-06-11
# Automotive Object Detection via Learning Sparse Events by Spiking Neurons ( http://arxiv.org/abs/2307.12900v5 ) License: see link	Hu Zhang, Yanchen Li, Luziwei Leng, Kaiwei Che, Qian Liu, Qinghai Guo, Jianxing Liao, Ran Cheng,	Event-based sensors, distinguished by their high temporal resolution of 1 $\mu\text{s}$ and a dynamic range of 120 dB, stand out as ideal tools for deployment in fast-paced settings like vehicles and drones. Traditional object detection techniques that utilize Artificial Neural Networks (ANNs) face challenges due to the sparse and asynchronous nature of the events these sensors capture. In contrast, Spiking Neural Networks (SNNs) offer a promising alternative, providing a temporal representation that is inherently aligned with event-based data. This paper explores the unique membrane potential dynamics of SNNs and their ability to modulate sparse events. We introduce an innovative spike-triggered adaptive threshold mechanism designed for stable training. Building on these insights, we present a specialized spiking feature pyramid network (SpikeFPN) optimized for automotive event-based object detection. Comprehensive evaluations demonstrate that SpikeFPN surpasses both traditional SNNs and advanced ANNs enhanced with attention mechanisms. Notably, SpikeFPN achieves a mean Average Precision (mAP) of 0.477 on the GEN1 Automotive Detection (GAD) benchmark dataset, marking significant increases over the selected SNN baselines. Moreover, the efficient design of SpikeFPN ensures robust performance while optimizing computational resources, attributed to its innate sparse computation capabilities. Source codes are publicly accessible at https://github.com/EMI-Group/spikefpn.	Translated: 2024-06-13 01:18:21 Published: 2024-06-11
# UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming ( http://arxiv.org/abs/2307.16375v4 ) License: see link	Hao Lin, Ke Wu, Jie Li, Jun Li, Wu-Jun Li,	Distributed learning is commonly used for training deep learning models, especially large models. In distributed learning, manual parallelism (MP) methods demand considerable human effort and have limited flexibility. Hence, automatic parallelism (AP) methods have recently been proposed for automating the parallel strategy optimization process. Existing AP methods suffer from sub-optimal solutions because they do not jointly optimize the two categories of parallel strategies (i.e., inter-layer parallelism and intra-layer parallelism). In this paper, we propose a novel AP method called UniAP, which unifies inter- and intra-layer automatic parallelism by mixed integer quadratic programming. To the best of our knowledge, UniAP is the first parallel method that can jointly optimize the two categories of parallel strategies to find an optimal solution. Experimental results show that UniAP outperforms state-of-the-art methods by up to 3.80$\times$ in throughput and reduces strategy optimization time by up to 107$\times$ across five Transformer-based models.	Translated: 2024-06-13 01:18:21 Published: 2024-06-11
# Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks ( http://arxiv.org/abs/2307.16630v2 ) License: see link	Xinyu Zhang, Hanbin Hong, Yuan Hong, Peng Huang, Binghui Wang, Zhongjie Ba, Kui Ren,	Language models, especially basic text classification models, have been shown to be susceptible to textual adversarial attacks such as synonym substitution and word insertion attacks. To defend against such attacks, a growing body of research has been devoted to improving model robustness. However, providing provable robustness guarantees instead of empirical robustness is still widely unexplored. In this paper, we propose Text-CRS, a generalized certified robustness framework for natural language processing (NLP) based on randomized smoothing. To the best of our knowledge, existing certified schemes for NLP can only certify the robustness against $\ell_0$ perturbations in synonym substitution attacks. Representing each word-level adversarial operation (i.e., synonym substitution, word reordering, insertion, and deletion) as a combination of permutation and embedding transformation, we propose novel smoothing theorems to derive robustness bounds in both permutation and embedding space against such adversarial operations. To further improve certified accuracy and radius, we consider the numerical relationships between discrete words and select proper noise distributions for the randomized smoothing. Finally, we conduct substantial experiments on multiple language models and datasets. Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement. We also provide the first benchmark on certified accuracy and radius of four word-level operations, besides outperforming the state-of-the-art certification against synonym substitution attacks.	Translated: 2024-06-13 01:18:21 Published: 2024-06-11
# LLaMA-E: オブジェクトインターリーブインストラクションによるEコマースオーサリングの強化 LLaMA-E: Empowering E-commerce Authoring with Object-Interleaved Instruction Following ( http://arxiv.org/abs/2308.04913v2 ) ライセンス: Link先を確認	Kaize Shi, Xueyao Sun, Dingxian Wang, Yinlin Fu, Guandong Xu, Qing Li,	(参考訳) Eコマースのオーサリングは、嗜好の誘惑と検索体験を強化するために、エンゲージメント、多様性、ターゲットとなるコンテンツを作成することを必要とする。 LLM(Large Language Models)はコンテンツ生成に革命をもたらしたが、ドメイン固有の機能の記憶が限られているため、eコマースアプリケーションでは不足することが多い。本稿では、顧客、販売者、プラットフォームの文脈的嗜好に対処する統合eコマースオーサリングモデルであるLLaMA-Eを提案する。広告生成,クエリ強化製品タイトル書き換え,製品分類,購入意図の推測,一般的なeコマースQ&Aといったタスクから導かれる命令セットを設計する。命令の定式化により、提示および要求対象機能のインターリーブ被覆が保証され、ベースモデルのアライメントにより、Eコマースの知識を包括的にパラメータ化することができる。提案したLLaMA-Eモデルは、最先端評価性能を達成し、ゼロショット実用的な応用において優位性を示す。私たちの知る限り、このLLMは、参加オブジェクトに焦点を絞った機能を統合することで、包括的なシナリオ理解を備えたオーサリングアプリケーションを強化するために作られた最初のLLMです。 E-commerce authoring entails creating engaging, diverse, and targeted content to enhance preference elicitation and retrieval experience. While Large Language Models (LLMs) have revolutionized content generation, they often fall short in e-commerce applications due to their limited memorization of domain-specific features. This paper proposes LLaMA-E, the unified e-commerce authoring models that address the contextual preferences of customers, sellers, and platforms, the essential objects in e-commerce operation. We design the instruction set derived from tasks of ads generation, query-enhanced product title rewriting, product classification, purchase intent speculation, and general e-commerce Q&A. The instruction formulation ensures the interleaved cover of the presented and required object features, allowing the alignment of base models to parameterise e-commerce knowledge comprehensively. The proposed LLaMA-E models achieve state-of-the-art evaluation performance and exhibit the advantage in zero-shot practical applications. 
To our knowledge, this is the first LLM tailored to empower authoring applications with comprehensive scenario understanding by integrating features focused on the participating objects.	翻訳日:2024-06-13 01:18:21 公開日:2024-06-11
# 表面マスク型オートエンコーダを用いた脳ダイナミクスの時空間符号化 Spatio-Temporal Encoding of Brain Dynamics with Surface Masked Autoencoders ( http://arxiv.org/abs/2308.05474v3 ) ライセンス: Link先を確認	Simon Dahan, Logan Z. J. Williams, Yourong Guo, Daniel Rueckert, Emma C. Robinson,	(参考訳) 人間の脳活動の時空間的ダイナミクスを符号化する堅牢で一般的なモデルの開発は、神経科学的な発見を進める上で不可欠である。しかし、ヒト大脳皮質の組織における顕著な個体差は、これらのシグナルの集団レベルの傾向を特定するのを困難にしている。最近、Surface Vision Transformer (SiTs) は皮質信号のモデリングに有望なアプローチとして登場したが、アーキテクチャに帰納バイアスがないため、低データシナリオではいくつかの制限に直面している。これらの課題に対処するため,本研究では,正二十面体格子上での皮質信号の多変量および時空間事前学習のための表面Masked AutoEncoder (sMAE) とビデオ表面Masked AutoEncoder (vsMAE) を提案する。これらのモデルは、皮質構造と機能の強い潜在表現を学習することにより、入力のマスクされたバージョンから皮質特徴写像を再構築するように訓練されている。このような表現は、個々の表現型のより良いモデリングに変換され、下流タスクのパフォーマンスが向上する。提案手法は, 若年成人Human Connectome Project(HCP)とdeveloping HCP(dHCP)のデータを用いて, 皮質表現型回帰の評価を行った。その結果、(v)sMAE事前学習モデルでは、複数のタスクにおける表現型予測性能が$\ge 26\%$向上し、スクラッチからトレーニングしたモデルと比較してより高速に収束することが示された。最後に、英国バイオバンク(UKB)のような大規模データセット上の事前学習型ビジョントランスフォーマーが、低データレギュレーションへのトランスファー学習をサポートすることを示す。私たちのコードと事前訓練されたモデルは、https://github.com/metrics-lab/surface-masked-autoencoders で公開されています。 The development of robust and generalisable models for encoding the spatio-temporal dynamics of human brain activity is crucial for advancing neuroscientific discoveries. However, significant individual variation in the organisation of the human cerebral cortex makes it difficult to identify population-level trends in these signals. Recently, Surface Vision Transformers (SiTs) have emerged as a promising approach for modelling cortical signals, yet they face some limitations in low-data scenarios due to the lack of inductive biases in their architecture. To address these challenges, this paper proposes the surface Masked AutoEncoder (sMAE) and video surface Masked AutoEncoder (vsMAE) - for multivariate and spatio-temporal pre-training of cortical signals over regular icosahedral grids.
These models are trained to reconstruct cortical feature maps from masked versions of the input by learning strong latent representations of cortical structure and function. Such representations translate into better modelling of individual phenotypes and enhanced performance in downstream tasks. The proposed approach was evaluated on cortical phenotype regression using data from the young adult Human Connectome Project (HCP) and developing HCP (dHCP). Results show that (v)sMAE pre-trained models improve phenotyping prediction performance on multiple tasks by $\ge 26\%$, and offer faster convergence relative to models trained from scratch. Finally, we show that pre-training vision transformers on large datasets, such as the UK Biobank (UKB), supports transfer learning to low-data regimes. Our code and pre-trained models are publicly available at https://github.com/metrics-lab/surface-masked-autoencoders .	翻訳日:2024-06-13 01:18:21 公開日:2024-06-11
# 神経量子状態における平均場理論の単純性 Simplicity of mean-field theories in neural quantum states ( http://arxiv.org/abs/2308.10934v2 ) ライセンス: Link先を確認	Fabian Ballar Trigueros, Tiago Mendes-Santos, Markus Heyl,	(参考訳) 量子多体波動関数を表現するための人工ニューラルネットワークの利用は、基底状態と非平衡ダイナミクスの両方において、近年大きな進歩を遂げている。しかしながら、このニューラルネットワーク量子状態フレームワーク内の状態複雑性の定量化は、いまだ解明されていない。本研究では、この鍵となるオープンな疑問を、補完的な観点から解決する:どの状態が神経量子状態で表すのが簡単か? 具体的には、置換対称性を持つ平均場理論の基底状態は、限られた数の独立したニューラルネットワークパラメータしか必要としないことを示す。熱力学的限界において, 平均場イジングモデルである完全連結横フィールドイジングモデル(TFIM)の基底状態への収束は, 1つのパラメータだけで達成できることを解析的に証明した。解析を拡大し、置換対称性の破れの下で1パラメータアンサッツの挙動を探索する。その目的のために、TFIMは可変長範囲の相互作用を持ち、相互作用指数$\alpha$を特徴とする。神経量子状態に対する1パラメータのアンサッツは、0ドル以上の値の全ての基底状態を正確にキャプチャし、この状態におけるモデルの平均場記述を暗示する。 The utilization of artificial neural networks for representing quantum many-body wave functions has garnered significant attention, with enormous recent progress for both ground states and non-equilibrium dynamics. However, quantifying state complexity within this neural quantum states framework remains elusive. In this study, we address this key open question from the complementary point of view: Which states are simple to represent with neural quantum states? Concretely, we show on a general level that ground states of mean-field theories with permutation symmetry only require a limited number of independent neural network parameters. We analytically establish that, in the thermodynamic limit, convergence to the ground state of the fully-connected transverse-field Ising model (TFIM), the mean-field Ising model, can be achieved with just one single parameter. Expanding our analysis, we explore the behavior of the 1-parameter ansatz under breaking of the permutation symmetry. For that purpose, we consider the TFIM with tunable long-range interactions, characterized by an interaction exponent $\alpha$. 
We show analytically that the 1-parameter ansatz for the neural quantum state still accurately captures the ground state for a whole range of values for $0\le \alpha \le 1$, implying a mean-field description of the model in this regime.	翻訳日:2024-06-13 01:18:21 公開日:2024-06-11
# 近似メッセージパッシングによる構造一般化線形モデルのスペクトル推定 Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing ( http://arxiv.org/abs/2308.14507v2 ) ライセンス: Link先を確認	Yihan Zhang, Hong Chang Ji, Ramji Venkataramanan, Marco Mondelli,	(参考訳) 本研究では,高次元一般化線形モデルにおけるパラメータ推定の問題について考察する。適切なデータ依存行列の主固有ベクトルを介して得られるスペクトル法は、単純だが驚くほど効果的な解を与える。しかし、その広範囲な使用にもかかわらず、厳密な性能特性とデータ前処理の原理的な方法が、非構造化(すなわちガウスおよびハール直交)設計でのみ利用可能である。対照的に、実世界のデータ行列は高度に構造化されており、非自明な相関を示す。この問題に対処するために、共分散行列$\Sigma$を介して特徴の異方性を取り込む相関ガウス設計を考える。本研究の主な成果は,スペクトル推定器の性能の高精度な漸近的評価である。これにより、パラメータ推定に必要なサンプルの数を最小化する最適な前処理を特定できる。驚くべきことに、そのような前処理は幅広い統計モデルの集合で普遍的であり、部分的には回転不変な設計のための最適なスペクトル推定器の予想に対処する。我々の原理的アプローチは、計算画像や遺伝学に共通する設計を含む、過去のヒューリスティックな手法を大幅に改善する。提案手法は, 近似メッセージパッシングを基礎として, スパイクされた行列の精密な評価と, 対応するスペクトル手法の様々な設定への道を開くものである。 We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured (i.i.d.\ Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are highly structured and exhibit non-trivial correlations. To address the problem, we consider correlated Gaussian designs capturing the anisotropic nature of the features via a covariance matrix $\Sigma$. Our main result is a precise asymptotic characterization of the performance of spectral estimators. This allows us to identify the optimal preprocessing that minimizes the number of samples needed for parameter estimation. Surprisingly, such preprocessing is universal across a broad set of statistical models, which partly addresses a conjecture on optimal spectral estimators for rotationally invariant designs. 
Our principled approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics. The proposed methodology, based on approximate message passing, is broadly applicable and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.	翻訳日:2024-06-13 01:18:21 公開日:2024-06-11
# 顔解析のための均質タン変換による閉塞型深部畳み込みニューラルネットワーク Occlusion-Aware Deep Convolutional Neural Network via Homogeneous Tanh-transforms for Face Parsing ( http://arxiv.org/abs/2308.15323v2 ) ライセンス: Link先を確認	Jianhua Qiua, Weihua Liu, Chaochao Lin, Jiaojiao Li, Haoping Yu, Said Boumaraf,	(参考訳) 顔解析は、各意味的顔成分に対して画素単位のラベルマップを推論する。しかし、特に新型コロナウイルス(COVID-19)の流行で顔の閉塞が一般的な状況になった場合、顔の閉塞を見逃し、単一の顔の外のいくつかの文脈的領域を無視する。 4つの異なるランプからの照明が1つの中心光源よりも均一な分布をもたらす日常生活の照明現象に着想を得て, 4つのタン変換からなる画像前処理のための新しい均一タン変換を提案する。これらの変換は中心視と周辺視を融合させる。提案手法は,隠蔽下での顔解析のジレンマに対処し,周囲の文脈からより多くの情報を圧縮する。均質なtanh-transformsに基づいて,隠蔽顔解析のためのオクルージョン対応畳み込みニューラルネットワークを提案する。タン・ポーラ空間とタン・カルテシアン空間の両方の情報を組み合わせて、受容場を拡張できる。さらに,隠蔽領域の境界に焦点を合わせるために,隠蔽意識の喪失を導入する。ネットワークはシンプルで柔軟性があり、エンドツーエンドでトレーニングすることができる。隠蔽顔解析の今後の研究を容易にするため,新しい顔解析データセットも提供した。このデータセットは、CelebAMask-HQ、Short-video Face Parsing、Helenデータセットなど、いくつかの学術的あるいは産業的なデータセットから手動で精製され、公開される。実験により,本手法は隠蔽下での顔解析において最先端の手法を超越していることが示された。 Face parsing infers a pixel-wise label map for each semantic facial component. Previous methods generally work well for uncovered faces, however, they overlook facial occlusion and ignore some contextual areas outside a single face, especially when facial occlusion has become a common situation during the COVID-19 epidemic. Inspired by the lighting phenomena in everyday life, where illumination from four distinct lamps provides a more uniform distribution than a single central light source, we propose a novel homogeneous tanh-transform for image preprocessing, which is made up of four tanh-transforms. These transforms fuse the central vision and the peripheral vision together. Our proposed method addresses the dilemma of face parsing under occlusion and compresses more information from the surrounding context. Based on homogeneous tanh-transforms, we propose an occlusion-aware convolutional neural network for occluded face parsing. It combines information in both Tanh-polar space and Tanh-Cartesian space, capable of enhancing receptive fields. 
Furthermore, we introduce an occlusion-aware loss to focus on the boundaries of occluded regions. The network is simple, flexible, and can be trained end-to-end. To facilitate future research of occluded face parsing, we also contribute a new cleaned face parsing dataset. This dataset is manually purified from several academic or industrial datasets, including CelebAMask-HQ, Short-video Face Parsing, and the Helen dataset, and will be made public. Experiments demonstrate that our method surpasses state-of-the-art methods in face parsing under occlusion.	翻訳日:2024-06-13 01:08:37 公開日:2024-06-11
# 教育データマイニングにおけるディープラーニング技術に関する総合的研究 A Comprehensive Survey on Deep Learning Techniques in Educational Data Mining ( http://arxiv.org/abs/2309.04761v4 ) ライセンス: Link先を確認	Yuanguo Lin, Hong Chen, Wei Xia, Fan Lin, Zongyue Wang, Yong Liu,	(参考訳) 教育データマイニング(EDM: Educational Data Mining)は、計算技術の力を利用して教育データを分析する研究分野として発展してきた。教育データの複雑さと多様性の増大に伴い、ディープラーニング技術は、これらのデータを分析してモデル化する際の課題に対処する上で、大きな利点を示してきた。この調査は、Deep LearningによるEDMの最先端を体系的にレビューすることを目的としている。まず、EDMとDeep Learningの簡単な紹介から始め、現代の教育の文脈におけるそれらの関連性を強調します。次に、知識追跡、学生の行動検出、パフォーマンス予測、パーソナライズドレコメンデーションを含む4つの典型的な教育シナリオに適用されるディープラーニング技術について、詳細なレビューを行う。さらに、EDMのための公開データセットと処理ツールの概要を概観する。次に、EDMの実践的課題を分析し、対象とするソリューションを提案する。最後に,本研究領域における新たな動向と今後の方向性を指摘する。 Educational Data Mining (EDM) has emerged as a vital field of research, which harnesses the power of computational techniques to analyze educational data. With the increasing complexity and diversity of educational data, Deep Learning techniques have shown significant advantages in addressing the challenges associated with analyzing and modeling this data. This survey aims to systematically review the state-of-the-art in EDM with Deep Learning. We begin by providing a brief introduction to EDM and Deep Learning, highlighting their relevance in the context of modern education. Next, we present a detailed review of Deep Learning techniques applied in four typical educational scenarios, including knowledge tracing, student behavior detection, performance prediction, and personalized recommendation. Furthermore, a comprehensive overview of public datasets and processing tools for EDM is provided. We then analyze the practical challenges in EDM and propose targeted solutions. Finally, we point out emerging trends and future directions in this research area.	翻訳日:2024-06-13 01:08:37 公開日:2024-06-11
# SSWPのコントラスト前処理によるマルチモーダル自動韻律アノテーション Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP ( http://arxiv.org/abs/2309.05423v2 ) ライセンス: Link先を確認	Jinzuomu Zhong, Yang Li, Hui Huang, Korin Richmond, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu,	(参考訳) 表現的・制御可能なテキスト音声(TTS)では、明瞭な韻律的特徴は合成音声の自然性と制御性を著しく改善する。しかし、手動韻律アノテーションは労働集約的で矛盾する。本稿では,2段階自動アノテーションパイプラインを提案する。第1段階では、音声文と単語句読解(SSWP)ペアのコントラスト事前学習を用いて、潜在表現における韻律情報を強化する。第2段階では,事前訓練されたエンコーダ,テキスト合成方式,シーケンス分類器からなるマルチモーダルな韻律アノテータを構築した。英語の韻律境界に対する実験により, 韻律語と韻律句境界に対する0.72と0.93f1のスコアが得られた。 In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly improve the naturalness and controllability of synthesised speech. However, manual prosody annotation is labor-intensive and inconsistent. To address this issue, a two-stage automatic annotation pipeline is novelly proposed in this paper. In the first stage, we use contrastive pretraining of Speech-Silence and Word-Punctuation (SSWP) pairs to enhance prosodic information in latent representations. In the second stage, we build a multi-modal prosody annotator, comprising pretrained encoders, a text-speech fusing scheme, and a sequence classifier. Experiments on English prosodic boundaries demonstrate that our method achieves state-of-the-art (SOTA) performance with 0.72 and 0.93 f1 score for Prosodic Word and Prosodic Phrase boundary respectively, while bearing remarkable robustness to data scarcity.	翻訳日:2024-06-13 01:08:37 公開日:2024-06-11
# 累積的知識プロセスにおけるエラーのロバストな評価 Errors are Robustly Tamed in Cumulative Knowledge Processes ( http://arxiv.org/abs/2309.05638v3 ) ライセンス: Link先を確認	Anna Brandenberger, Cassandra Marcussen, Elchanan Mossel, Madhu Sudan,	(参考訳) 我々は,新たな知識単位の妥当性が,その導出の正しさとそれに依存する単位の正しさに左右される社会知識蓄積の過程を研究する。この設定における根本的な疑問は、もし新しい導出の一定割合が間違っているなら、社会における一定の知識の一定割合が有効であることを確実にするために、一定割合を1つから切り離して投資することができるか、というものである。 Ben-Eliezer氏、Mikulincer氏、Mossel氏、Sudan氏(ITCS 2023)は、そのような質問を分析するための具体的な確率モデルを導入し、この問題に対する肯定的な回答を示した。しかしながら、彼らの研究は、各新しいユニットが1つの既存のユニットに依存し、ユニットが$\textit{preferential attachment rule}$に従ってアタッチされる単純なケースに焦点を当てている。本研究は, 累積的知識プロセスの一般的なファミリーについて考察するものであり, 新しいユニットは, 様々なアタッチメント機構に従ってアタッチメントし, 既存の複数のユニットに依存することができる。また、敵ノードの挿入(ランダム)を許す。これらのモデルの$\textit{all}$に対して、多くの単位が依存する単位の有界数をチェックするための単純なヒューリスティックに従う限り、全てのエラーは最終的に排除される。以上の結果から,新たな単位が導出・提示される際には,十分な注意を要するが,コストのかかるチェックを行っていない限り,知識単位の大規模な相互依存コレクションの品質維持が可能であることが示唆された。 We study processes of societal knowledge accumulation, where the validity of a new unit of knowledge depends both on the correctness of its derivation and on the validity of the units it depends on. A fundamental question in this setting is: If a constant fraction of the new derivations is wrong, can investing a constant fraction, bounded away from one, of effort ensure that a constant fraction of knowledge in society is valid? Ben-Eliezer, Mikulincer, Mossel, and Sudan (ITCS 2023) introduced a concrete probabilistic model to analyze such questions and showed an affirmative answer to this question. Their study, however, focuses on the simple case where each new unit depends on just one existing unit, and units attach according to a $\textit{preferential attachment rule}$. In this work, we consider much more general families of cumulative knowledge processes, where new units may attach according to varied attachment mechanisms and depend on multiple existing units. We also allow a (random) fraction of insertions of adversarial nodes. 
We give a robust affirmative answer to the above question by showing that for $\textit{all}$ of these models, as long as many of the units follow simple heuristics for checking a bounded number of units they depend on, all errors will be eventually eliminated. Our results indicate that preserving the quality of large interdependent collections of units of knowledge is feasible, as long as careful but not too costly checks are performed when new units are derived/deposited.	翻訳日:2024-06-13 01:08:37 公開日:2024-06-11
# 三角格子上の量子ループモデルから生じる立方体臨界 Cubic criticality emerging from a quantum loop model on triangular lattice ( http://arxiv.org/abs/2309.05715v3 ) ライセンス: Link先を確認	Xiaoxue Ran, Zheng Yan, Yan-Cheng Wang, Junchen Rong, Yang Qi, Zi Yang Meng,	(参考訳) 量子ループと二量体モデルは、局所的な制約を伴う相関系の典型的な例である。これらのモデルに対する一般的な解を得ることは、熱力学の限界においてそれらを解決するための制御方法が欠如しているため困難である。しかしながら、これらの解は、統計場理論と量子場理論、およびライドバーグ原子配列と量子モアレ材料における急速に成長する実験に直ちに関係しており、相関と局所的な制約の間の相互作用は、多くの新しい現象を引き起こす。最近の研究 [X. Ran, Z. Yan, Y.-C. Wang, et al, arXiv:2205.04472 (2022)] において、三角格子量子ループモデル(QLM)が格子ネマティック、ビゾンプラケット(VP)結晶とロクサー・キヴェルソン点に近い量子スピン液体(QSL)の豊富な基底状態のダイアグラムをホストしていることが判明した。ここでは、VPとQSL相を分離する連続量子臨界点に注目し、QMCシミュレーションにおいて静的および動的プローブの両方を通して、この遷移が(2+1)D立方の普遍性であることを実証する。この遷移において、QSLの分数化ビゾンは凝縮して結晶化VP相を生じさせるが、その痕跡として、従来の立方晶またはO(3)量子臨界点のものと比べて、異常に大きな異常次元指数と、二量体およびビゾンスペクトルにおける顕著な連続スペクトルを残している。 Quantum loop and dimer models are archetypal examples of correlated systems with local constraints. Obtaining generic solutions for these models is difficult due to the lack of controlled methods to solve them in the thermodynamic limit. Nevertheless, these solutions are of immediate relevance to both statistical and quantum field theories, as well as the rapidly growing experiments in Rydberg atom arrays and quantum moir\'e materials, where the interplay between correlation and local constraints gives rise to a plethora of novel phenomena. In a recent work [X. Ran, Z. Yan, Y.-C. Wang, et al, arXiv:2205.04472 (2022)], it was found through sweeping cluster quantum Monte Carlo (QMC) simulations and field theory analysis that the triangular lattice quantum loop model (QLM) hosts a rich ground state phase diagram with lattice nematic, vison plaquette (VP) crystals, and the $\mathbb{Z}_2$ quantum spin liquid (QSL) close to the Rokhsar-Kivelson point.
Here, we focus on the continuous quantum critical point separating the VP and QSL phases and demonstrate via both static and dynamic probes in QMC simulations that this transition is of the (2+1)D cubic universality. In this transition, the fractionalized visons in QSL condense to give rise to the crystalline VP phase, while leaving their trace in the anomalously large anomalous dimension exponent and pronounced continua in the dimer and vison spectra compared with those at the conventional cubic or O(3) quantum critical points.	翻訳日:2024-06-13 01:08:37 公開日:2024-06-11
# 音声によるゼロショットバード分類のためのメタ情報探索 Exploring Meta Information for Audio-based Zero-shot Bird Classification ( http://arxiv.org/abs/2309.08398v2 ) ライセンス: Link先を確認	Alexander Gebhard, Andreas Triantafyllopoulos, Teresa Bez, Lukas Christ, Alexander Kathan, Björn W. Schuller,	(参考訳) 受動的音響モニタリングと機械学習の進歩は、計算バイオ音響研究のための膨大なデータセットの調達につながった。それでも、データ不足は希少で表現不足の種にとって依然として問題である。本研究では,多種多様なメタデータが利用可能であることから,鳥種を事例として,メタ情報を用いてゼロショット音声分類を改善する方法について検討した。本稿では, (S)BERTで符号化されたテキストによる鳥の音響記述, 機能的特徴 (AVONET) , 鳥の生活史 (BLH) の特徴の3つの異なるメタデータ源について検討する。音声の特徴として、オーディオ・スペクトログラム・トランスフォーマー(AST)の埋め込みを抽出し、単一の線形層を用いて補助情報の次元に投影する。次に,ドット積を互換性関数とし,標準ゼロショット学習ランキングヒンジ損失を用いて正しいクラスを決定する。 AVONETとBLHの機能は8から10のクラスを持つ5つのテストセットに対して平均未重み付きF1スコアが.233である。 Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research. Nevertheless, data scarcity is still an issue for rare and underrepresented species. This study investigates how meta-information can improve zero-shot audio classification, utilising bird species as an example case study due to the availability of rich and diverse meta-data. We investigate three different sources of metadata: textual bird sound descriptions encoded via (S)BERT, functional traits (AVONET), and bird life-history (BLH) characteristics. As audio features, we extract audio spectrogram transformer (AST) embeddings and project them to the dimension of the auxiliary information by adopting a single linear layer. Then, we employ the dot product as compatibility function and a standard zero-shot learning ranking hinge loss to determine the correct class. The best results are achieved by concatenating the AVONET and BLH features attaining a mean unweighted F1-score of .233 over five different test sets with 8 to 10 classes.	翻訳日:2024-06-13 01:08:37 公開日:2024-06-11
# 連合学習におけるエッジノードの効率的な資源利用に向けて Toward efficient resource utilization at edge nodes in federated learning ( http://arxiv.org/abs/2309.10367v2 ) ライセンス: Link先を確認	Sadi Alawadi, Addi Ait-Mlouk, Salman Toor, Andreas Hellander,	(参考訳) フェデレートラーニング(FL)は、エッジノードがデータを共有することなく、グローバルモデルの構築に協力的に貢献することを可能にする。これはローカルでプライベートなモデル更新を計算し、サーバによって集約されるデバイスによって実現される。しかし、計算資源の制約やネットワーク通信は、ディープラーニングアプリケーションに典型的なより大きなモデルサイズにとって深刻なボトルネックとなる可能性がある。エッジノードは、限られたハードウェアリソース(RAM、CPU)を持つ傾向があり、エッジにおけるネットワーク帯域幅と信頼性は、フェデレートされたフリートアプリケーションのスケーリングに関する問題である。本稿では,転送学習にインスパイアされたFL戦略を提案し,各グローバルトレーニングラウンドにおけるサーバとネットワークの負荷を低減させる。ローカルモデルの更新毎に、トレーニング対象のレイヤをランダムに選択し、モデルの残りの部分を凍結します。これにより、トレーニングされていないすべてのレイヤ重みをサーバに転送することを排除することで、1ラウンドあたりのサーバ負荷と通信コストを削減できます。本研究の目的は,デバイス上での資源利用と,提案した戦略の下でのグローバルモデル収束とのトレードオフを実証的に検討することである。フェデレート学習フレームワークFEDnを用いて,本手法を実装した。異なるデータセット(CIFAR-10、CASA、IMDB)で多数の実験を行い、異なるディープラーニングモデルアーキテクチャを使用して異なるタスクを実行した。実験の結果,トレーニングの過程を部分的に加速し,デバイス上で資源を効率的に利用し,25%のトレーニングを行うと約75%,53%のデータ伝送量を削減でき,その結果のグローバルモデル精度を損なうことなく,モデル層全体の50%をトレーニングできることがわかった。 Federated learning (FL) enables edge nodes to collaboratively contribute to constructing a global model without sharing their data. This is accomplished by devices computing local, private model updates that are then aggregated by a server. However, computational resource constraints and network communication can become a severe bottleneck for larger model sizes typical for deep learning applications. Edge nodes tend to have limited hardware resources (RAM, CPU), and the network bandwidth and reliability at the edge is a concern for scaling federated fleet applications. In this paper, we propose and evaluate a FL strategy inspired by transfer learning in order to reduce resource utilization on devices, as well as the load on the server and network in each global training round. For each local model update, we randomly select layers to train, freezing the remaining part of the model. 
In doing so, we can reduce both server load and communication costs per round by excluding all untrained layer weights from being transferred to the server. The goal of this study is to empirically explore the potential trade-off between resource utilization on devices and global model convergence under the proposed strategy. We implement the approach using the federated learning framework FEDn. A number of experiments were carried out over different datasets (CIFAR-10, CASA, and IMDB), performing different tasks using different deep-learning model architectures. Our results show that training the model partially can accelerate the training process, efficiently utilize resources on-device, and reduce the data transmission by around 75% and 53% when we train 25% and 50% of the model layers, respectively, without harming the resulting global model accuracy.	翻訳日:2024-06-13 01:08:37 公開日:2024-06-11
# 説明可能なAIによる分類からセグメンテーションへ:き裂検出と成長モニタリングに関する研究 From Classification to Segmentation with Explainable AI: A Study on Crack Detection and Growth Monitoring ( http://arxiv.org/abs/2309.11267v2 ) ライセンス: Link先を確認	Florent Forest, Hugo Porta, Devis Tuia, Olga Fink,	(参考訳) インフラの表面ひび割れのモニタリングは、構造的健康モニタリングに不可欠である。自動視覚検査は、特に難解な領域において、効果的な解決策を提供する。機械学習アプローチはその効果を証明しているが、典型的には教師付きトレーニングには大きな注釈付きデータセットが必要である。ひび割れが検出されると、その重症度を監視するために、損傷の正確なセグメンテーションが要求される。しかし、セグメンテーションのための画像のピクセルレベルのアノテーションは、労働集約的である。このコストを軽減するために、説明可能な人工知能(XAI)を利用して分類器の説明からセグメンテーションを導き、画像レベルの監督が弱いだけを必要とする。本稿では,この手法を表面ひび割れの分断とモニタリングに応用することを提案する。各種XAI法の性能評価を行い,本手法が重度定量化と成長モニタリングをいかに促進するかを検討する。その結果, 得られたセグメンテーションマスクは, 教師付き手法よりも品質が低いが, 意味を保ち, 重度モニタリングが可能であり, 実質的なラベリングコストを低減できることがわかった。 Monitoring surface cracks in infrastructure is crucial for structural health monitoring. Automatic visual inspection offers an effective solution, especially in hard-to-reach areas. Machine learning approaches have proven their effectiveness but typically require large annotated datasets for supervised training. Once a crack is detected, monitoring its severity often demands precise segmentation of the damage. However, pixel-level annotation of images for segmentation is labor-intensive. To mitigate this cost, one can leverage explainable artificial intelligence (XAI) to derive segmentations from the explanations of a classifier, requiring only weak image-level supervision. This paper proposes applying this methodology to segment and monitor surface cracks. We evaluate the performance of various XAI methods and examine how this approach facilitates severity quantification and growth monitoring. Results reveal that while the resulting segmentation masks may exhibit lower quality than those produced by supervised methods, they remain meaningful and enable severity monitoring, thus reducing substantial labeling costs.	翻訳日:2024-06-13 01:08:37 公開日:2024-06-11
# Memory Gym: エージェントのメモリ能力をベンチマークする無限タスクに向けて Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents ( http://arxiv.org/abs/2309.17207v4 ) ライセンス: Link先を確認	Marco Pleines, Matthias Pallasch, Frank Zimmer, Mike Preuss,	(参考訳) Memory GymはMortar Mayhem、Mystery Path、Searing Spotlightsという、意思決定エージェントのメモリ能力をベンチマークするために設計された2D部分観測可能な環境のスイートを提供する。これらの環境はもともと有限なタスクを持ち、「I packed my bag」のような累積記憶ゲームにおけるエスカレーションの課題を反映して、革新的で無限の形式に拡張されている。タスク設計におけるこの進歩は、単にサンプル効率を評価することから、動的で長期のシナリオにおけるメモリの有効性のレベルを探ることへと焦点を移す。利用可能なメモリベースのDeep Reinforcement Learningベースラインのギャップを解決するために,Transformer-XL (TrXL) とProximal Policy Optimizationを統合した実装を導入する。このアプローチでは、TrXLをエピソードメモリの形式として使用し、スライディングウインドウ技術を用いている。 Gated Recurrent Unit (GRU) と TrXL の比較では,異なる設定で異なる性能を示す。 TrXLは有限環境において、Mystery PathやMortar Mayhemにおいて優れたサンプル効率を示す。しかし、Searing SpotlightsではGRUの方が効率的である。最も注目すべきは、すべての無限のタスクにおいて、GRUは顕著な復活を見せ、TrXLを著しく上回っていることである。 Website and Source Code: https://github.com/MarcoMeter/endless-memory-gym/ Memory Gym presents a suite of 2D partially observable environments, namely Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark memory capabilities in decision-making agents. These environments, originally with finite tasks, are expanded into innovative, endless formats, mirroring the escalating challenges of cumulative memory games such as ``I packed my bag''. This progression in task design shifts the focus from merely assessing sample efficiency to also probing the levels of memory effectiveness in dynamic, prolonged scenarios. To address the gap in available memory-based Deep Reinforcement Learning baselines, we introduce an implementation that integrates Transformer-XL (TrXL) with Proximal Policy Optimization. This approach utilizes TrXL as a form of episodic memory, employing a sliding window technique. Our comparative study between the Gated Recurrent Unit (GRU) and TrXL reveals varied performances across different settings.
On the finite environments, TrXL demonstrates superior sample efficiency in Mystery Path and outperforms GRU in Mortar Mayhem. However, GRU is more efficient on Searing Spotlights. Most notably, in all endless tasks, GRU makes a remarkable resurgence, consistently outperforming TrXL by significant margins. Website and Source Code: https://github.com/MarcoMeter/endless-memory-gym/	翻訳日:2024-06-13 01:08:37 公開日:2024-06-11
# 再び質問して、そして失敗する: 大きな言語モデルの判断における揺らぎ Ask Again, Then Fail: Large Language Models' Vacillations in Judgment ( http://arxiv.org/abs/2310.02174v5 ) ライセンス: Link先を確認	Qiming Xie, Zengzhi Wang, Yi Feng, Rui Xia,	(参考訳) 現在の会話言語モデルは、たとえ元の判断が正しいとしても、フォローアップの質問に直面すると、しばしば判断を揺るがす。この揺らぎは、信頼性の高い応答を生成し、ユーザ信頼を構築する上で大きな課題となる。この問題を包括的に評価するために,この不整合を定量化するための2つの指標とともに \textsc{Follow-up Questioning Mechanism} を導入し,この問題が現状の言語モデルに広く存在することを確認する。この問題を緩和するために、我々はクローズドソースモデルの様々なプロンプト戦略を探求し、さらに、合成された高品質な嗜好データを通して、言語モデルに元の正しい判断を維持させるトレーニングベースのフレームワークである \textsc{Unwavering-FQ} を開発した。実験により,我々のフレームワークの有効性と,モデルの汎用能力を向上する能力を確認した。 We observe that current conversational language models often waver in their judgments when faced with follow-up questions, even if the original judgment was correct. This wavering presents a significant challenge for generating reliable responses and building user trust. To comprehensively assess this issue, we introduce a \textsc{Follow-up Questioning Mechanism} along with two metrics to quantify this inconsistency, confirming its widespread presence in current language models. To mitigate this issue, we explore various prompting strategies for closed-source models; moreover, we develop a training-based framework \textsc{Unwavering-FQ} that teaches language models to maintain their originally correct judgments through synthesized high-quality preference data. Our experimental results confirm the effectiveness of our framework and its ability to enhance the general capabilities of models.	翻訳日:2024-06-13 01:08:37 公開日:2024-06-11
# WeatherGNN:局所数値気象予測バイアス補正のための爆発的気象・空間的依存性 WeatherGNN: Exploiting Meteo- and Spatial-Dependencies for Local Numerical Weather Prediction Bias-Correction ( http://arxiv.org/abs/2310.05517v2 ) ライセンス: Link先を確認	Binqing Wu, Weiqi Chen, Wengwei Wang, Bingqing Peng, Liang Sun, Ling Chen,	(参考訳) 局所的な情報不足のため、数値天気予報(NWP)は特定の領域に偏りをもたらす可能性がある。これまでの研究では、主に手作りの特徴を用いたり、直感的にデータ駆動手法を適用して、気象要因と地域間の複雑な依存関係を見極めることでバイアスを補正していた。そこで本稿では,グラフニューラルネットワーク(GNN)を用いた局所的なNWPバイアス補正手法であるWeatherGNNを提案する。具体的には,空間的不均一性に基づく地域固有の気象依存性を適応的に捉える因子GNNと,トブラーの第1法則と第2法則で導かれる動的空間依存性を効率的に捉えるための高速階層GNNを導入する。 2つの実世界のデータセットに対する実験結果から、WeatherGNNは最先端のパフォーマンスを達成し、RMSEの平均4.75 %で最高のベースラインを上回りました。 Due to insufficient local area information, numerical weather prediction (NWP) may yield biases for specific areas. Previous studies correct biases mainly by employing handcrafted features or applying data-driven methods intuitively, overlooking the complicated dependencies between weather factors and between areas. To address this issue, we propose WeatherGNN, a local NWP bias-correction method that utilizes Graph Neural Networks (GNNs) to exploit meteorological dependencies and spatial dependencies under the guidance of domain knowledge. Specifically, we introduce a factor GNN to capture area-specific meteorological dependencies adaptively based on spatial heterogeneity and a fast hierarchical GNN to capture dynamic spatial dependencies efficiently guided by Tobler's first and second laws of geography. Our experimental results on two real-world datasets demonstrate that WeatherGNN achieves the state-of-the-art performance, outperforming the best baseline with an average of 4.75 \% on RMSE.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
# 医療用大規模言語モデルに関する調査--データ・技術・応用から説明責任・倫理まで A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics ( http://arxiv.org/abs/2310.05694v2 ) ライセンス: Link先を確認	Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, Erik Cambria,	(参考訳) 医療分野における大規模言語モデル(LLM)の利用は、特定の専門知識を持つフリーテキストクエリに効果的に対応する能力によって、興奮と懸念の両方を引き起こしている。この調査は、現在開発されている医療向けLLMの機能を概説し、その開発プロセスを詳述することで、従来のPLM(Pretrained Language Models)からLLMへの開発ロードマップの概要を提供することを目的としている。具体的には、まずLLMが様々なヘルスケアアプリケーションの効率と有効性を高める可能性を探求し、強みと限界の両方を強調した。第2に,従来のPLMと最新のLLMの比較を行い,さらに様々なLLM同士の比較を行った。次に、関連するヘルスケアトレーニングデータ、トレーニング方法、最適化戦略、使用法について要約する。最後に、医療環境におけるLLMの展開に関するユニークな懸念、特に公平性、説明責任、透明性、倫理について検討する。本調査は,コンピュータ科学と医療の両分野の観点から総合的な調査を行っている。ヘルスケアに関する議論の他に、アクセス可能なデータセット、最新の方法論、コード実装、評価ベンチマークなどのオープンソースリソースのコレクションをGitHubにまとめることで、コンピュータサイエンスコミュニティを支援しています。要約すると、PLMからLLMへの移行という重要なパラダイムシフトが進行中である、と私たちは主張する。このシフトには、識別的なAIアプローチから生成的なAIアプローチへの移行、モデル中心の方法論からデータ中心の方法論への移行が含まれる。また、医療におけるLLMの使用の最大の障害は、公正性、説明責任、透明性、倫理であると判断した。 The utilization of large language models (LLMs) in the Healthcare domain has generated both excitement and concern due to their ability to effectively respond to free-text queries with certain professional knowledge. This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process, with the aim of providing an overview of the development roadmap from traditional Pretrained Language Models (PLMs) to LLMs. Specifically, we first explore the potential of LLMs to enhance the efficiency and effectiveness of various Healthcare applications highlighting both the strengths and limitations. Secondly, we conduct a comparison between the previous PLMs and the latest LLMs, as well as comparing various LLMs with each other. Then we summarize related Healthcare training data, training methods, optimization strategies, and usage. 
Finally, the unique concerns associated with deploying LLMs in Healthcare settings are investigated, particularly regarding fairness, accountability, transparency and ethics. Our survey provides a comprehensive investigation from the perspectives of both computer science and the Healthcare specialty. Besides the discussion about Healthcare concerns, we support the computer science community by compiling a collection of open source resources, such as accessible datasets, the latest methodologies, code implementations, and evaluation benchmarks, on GitHub. In summary, we contend that a significant paradigm shift is underway, transitioning from PLMs to LLMs. This shift encompasses a move from discriminative AI approaches to generative AI approaches, as well as a shift from model-centered methodologies to data-centered methodologies. Also, we determine that the biggest obstacles to using LLMs in Healthcare are fairness, accountability, transparency and ethics.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
# 生成モデルの分散評価について On the Distributed Evaluation of Generative Models ( http://arxiv.org/abs/2310.11714v4 ) ライセンス: Link先を確認	Zixiao Wang, Farzan Farnia, Zhenghao Lin, Yunheng Shen, Bei Yu,	(参考訳) 深部生成モデルの評価は、単一の確率分布から参照データが引き出される集中的な環境で広範囲に研究されている。一方、生成モデルのいくつかの応用は、ネットワーク内の複数のクライアントによって評価を行うための参照データを提供するフェデレーション学習設定など、分散設定に関するものである。本稿では,クライアント間での不均一なデータ分布を持つ分散コンテキストにおける生成モデルの評価について検討する。 FID(Fréchet Inception Distance)とKID(Kernel Inception Distance)に焦点をあてた。 KID測定の場合、クライアントの平均KIDスコアを用いた生成モデルのスコアは、すべてのクライアントのデータを含む集合参照セットに対して集中的なKID評価と同じランキングとなることが証明される。対照的に、FIDに基づく評価には、同じ結果が当てはまらない。分散環境では、各クライアントが2つの生成モデルに同じFIDスコアを割り当てるが、2つのモデルの集中的なFIDスコアは著しく異なる。我々は、FIDとKIDスコアを用いた生成モデルの分散評価に関する理論的結果を支援するために、標準画像データセットと生成モデルに関する数値実験を行った。 The evaluation of deep generative models has been extensively studied in the centralized setting, where the reference data are drawn from a single probability distribution. On the other hand, several applications of generative models concern distributed settings, e.g. the federated learning setting, where the reference data for conducting evaluation are provided by several clients in a network. In this paper, we study the evaluation of generative models in such distributed contexts with potentially heterogeneous data distributions across clients. We focus on the widely-used distance-based evaluation metrics, Fréchet Inception Distance (FID) and Kernel Inception Distance (KID). In the case of KID metric, we prove that scoring a group of generative models using the clients' averaged KID score will result in the same ranking as that of a centralized KID evaluation over a collective reference set containing all the clients' data. In contrast, we show the same result does not apply to the FID-based evaluation. We provide examples in which two generative models are assigned the same FID score by each client in a distributed setting, while the centralized FID scores of the two models are significantly different. 
We perform several numerical experiments on standard image datasets and generative models to support our theoretical results on the distributed evaluation of generative models using FID and KID scores.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
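The KID ranking claim above can be checked numerically in a toy setting. The following is a minimal sketch, not the authors' code: it assumes equal-sized clients, a biased (V-statistic) squared-MMD estimator, the cubic polynomial kernel commonly used for KID, and synthetic Gaussian "features" in place of Inception embeddings. All names (`clients`, `generators`, `mmd2_biased`) are hypothetical. Under these assumptions the gap between the client-averaged score and the centralized score does not depend on the generator, which is why both scores rank generators identically.

```python
import numpy as np

def poly_kernel(a, b):
    """Cubic polynomial kernel often used for KID: (a.b / d + 1)^3."""
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def mmd2_biased(x, y):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (poly_kernel(x, x).mean()
            + poly_kernel(y, y).mean()
            - 2.0 * poly_kernel(x, y).mean())

rng = np.random.default_rng(0)
dim, n = 8, 200

# Three hypothetical clients with heterogeneous reference features (equal sizes).
clients = [rng.normal(loc=mu, scale=1.0, size=(n, dim)) for mu in (-1.0, 0.0, 2.0)]
pooled = np.concatenate(clients, axis=0)

# Two hypothetical generators, represented by samples of their features.
generators = {
    "model_a": rng.normal(0.3, 1.0, size=(n, dim)),
    "model_b": rng.normal(1.5, 0.5, size=(n, dim)),
}

gaps = {}
for name, g in generators.items():
    averaged = float(np.mean([mmd2_biased(c, g) for c in clients]))
    centralized = float(mmd2_biased(pooled, g))
    # The gap only involves reference-vs-reference kernel terms, so it is
    # identical for every generator and cannot change the ranking.
    gaps[name] = averaged - centralized
    print(f"{name}: averaged={averaged:.4f} centralized={centralized:.4f}")
```

No analogous cancellation is available for FID in general, which is consistent with the counterexamples mentioned in the abstract.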
# 機械学習モデルにおけるメンバーシップ推論攻撃の基本的限界 Fundamental Limits of Membership Inference Attacks on Machine Learning Models ( http://arxiv.org/abs/2310.13786v4 ) ライセンス: Link先を確認	Eric Aubinais, Elisabeth Gassiat, Pablo Piantanida,	(参考訳) メンバーシップ推論攻撃(MIA)は、特定のデータポイントがトレーニングデータセットの一部であったかどうかを明らかにすることができる。本稿では、機械学習モデルにおけるMIAに関する基本的な統計的制限を探索することによって理論的保証を提供する。より正確には、このような攻撃の有効性と成功を左右する統計量を導出する。理論的には、オーバーフィッティングアルゴリズムを用いた非線形回帰設定では、攻撃が成功する確率が高いことが証明される。最後に、この量の利害関係を示すいくつかの状況について検討する。興味深いことに、我々の発見はデータの識別がアルゴリズムのセキュリティを高める可能性を示唆している。具体的には、基礎となるデータ分布の多様性を定量化する定数によって制限されることが示されている。それらの結果を2つの簡単なシミュレーションで説明する。 Membership inference attacks (MIA) can reveal whether a particular data point was part of the training dataset, potentially exposing sensitive information about individuals. This article provides theoretical guarantees by exploring the fundamental statistical limitations associated with MIAs on machine learning models. More precisely, we first derive the statistical quantity that governs the effectiveness and success of such attacks. We then theoretically prove that in a non-linear regression setting with overfitting algorithms, attacks may have a high probability of success. Finally, we investigate several situations for which we provide bounds on this quantity of interest. Interestingly, our findings indicate that discretizing the data might enhance the algorithm's security. Specifically, it is demonstrated to be limited by a constant, which quantifies the diversity of the underlying data distribution. We illustrate those results through two simple simulations.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
# 時系列予測の校正:コンテキスト駆動分布シフトの検出と適応 Calibration of Time-Series Forecasting: Detecting and Adapting Context-Driven Distribution Shift ( http://arxiv.org/abs/2310.14838v2 ) ライセンス: Link先を確認	Mouxiang Chen, Lefei Shen, Han Fu, Zhuo Li, Jianling Sun, Chenghao Liu,	(参考訳) 近年では、時系列予測にディープラーニングモデルを導入することに成功した。データ生成の観点から、既存のモデルは、観測されたか観測されていないかにかかわらず、時間的文脈によって駆動される分布シフトに影響を受けやすいことを示す。このような文脈駆動型分布シフト(CDS)は、特定の文脈内での予測のバイアスを導入し、従来のトレーニングパラダイムに課題を提起する。本稿では,CDSの検出と適応を訓練されたモデルで行うためのユニバーサルキャリブレーション手法を提案する。そこで本研究では,CDS検出装置を「残留型CDS検出器」あるいは「リコンディショナー」と呼び,予測残差とそれに対応するコンテキスト間の相互情報を評価することにより,CDSに対するモデルの脆弱性を定量化する。高いリコンディショナースコアは、重度の感受性を示し、したがってモデル適応を必要とする。このような状況下では、モデルキャリブレーションのための単純かつ強力なアダプタフレームワークを「サンプルレベルのコンテキスト適応(sample-level contextualized adapter)」あるいは「SOLID」と呼ぶ。このフレームワークは、提供されたテストサンプルとコンテキスト的に類似したデータセットのキュレーションと、その後に限られたステップ数でモデルの予測層の微調整を含む。我々の理論的分析は、この適応戦略が最適なバイアス分散トレードオフを達成することを実証している。特に,提案したReconditionorとSOLIDはモデルに依存しず,幅広いモデルに容易に適応可能である。大規模な実験により,SOLIDは実世界のデータセット,特に提案したリコンディショナーによって検出されたCDSのケースにおいて,現在の予測モデルの性能を一貫して向上し,キャリブレーション手法の有効性を検証した。 Recent years have witnessed the success of introducing deep learning models to time series forecasting. From a data generation perspective, we illustrate that existing models are susceptible to distribution shifts driven by temporal contexts, whether observed or unobserved. Such context-driven distribution shift (CDS) introduces biases in predictions within specific contexts and poses challenges for conventional training paradigms. In this paper, we introduce a universal calibration methodology for the detection and adaptation of CDS with a trained model. To this end, we propose a novel CDS detector, termed the "residual-based CDS detector" or "Reconditionor", which quantifies the model's vulnerability to CDS by evaluating the mutual information between prediction residuals and their corresponding contexts. A high Reconditionor score indicates a severe susceptibility, thereby necessitating model adaptation. 
In this circumstance, we put forth a straightforward yet potent adapter framework for model calibration, termed the "sample-level contextualized adapter" or "SOLID". This framework involves the curation of a contextually similar dataset to the provided test sample and the subsequent fine-tuning of the model's prediction layer with a limited number of steps. Our theoretical analysis demonstrates that this adaptation strategy can achieve an optimal bias-variance trade-off. Notably, our proposed Reconditionor and SOLID are model-agnostic and readily adaptable to a wide range of models. Extensive experiments show that SOLID consistently enhances the performance of current forecasting models on real-world datasets, especially on cases with substantial CDS detected by the proposed Reconditionor, thus validating the effectiveness of the calibration approach.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
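The detection idea above, scoring how much prediction residuals depend on context, can be illustrated with a simple plug-in estimate of mutual information. This is only a sketch under assumptions not stated in the abstract: contexts are discrete integer labels, residuals are binned by quantiles, and the function name `mutual_information` is hypothetical. It is not claimed to be the Reconditionor estimator itself.

```python
import numpy as np

def mutual_information(residuals, contexts, n_bins=10):
    """Plug-in MI estimate (in nats) between binned residuals and discrete contexts."""
    # Interior quantiles give n_bins roughly equally populated residual bins.
    edges = np.quantile(residuals, np.linspace(0, 1, n_bins + 1)[1:-1])
    r_bins = np.digitize(residuals, edges)  # integers in 0..n_bins-1
    joint = np.zeros((n_bins, int(contexts.max()) + 1))
    np.add.at(joint, (r_bins, contexts), 1.0)
    joint /= joint.sum()
    pr = joint.sum(axis=1, keepdims=True)   # marginal over residual bins
    pc = joint.sum(axis=0, keepdims=True)   # marginal over contexts
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (pr @ pc)[mask])).sum())

rng = np.random.default_rng(0)
n = 5000
contexts = rng.integers(0, 4, size=n)  # hypothetical context label, e.g. season

# Residuals whose mean shifts with context (context-driven bias) ...
residuals_shifted = rng.normal(size=n) + 0.8 * contexts
# ... versus residuals that are independent of context.
residuals_iid = rng.normal(size=n)

mi_shifted = mutual_information(residuals_shifted, contexts)
mi_iid = mutual_information(residuals_iid, contexts)
print(f"MI with context-driven shift: {mi_shifted:.3f}")
print(f"MI without shift:             {mi_iid:.3f}")
```

A clearly larger score in the first case would, in the spirit of the abstract, flag the model as a candidate for adaptation.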
# 生成拡散モデルの統計熱力学:相転移、対称性の破れ、臨界不安定性 The statistical thermodynamics of generative diffusion models: Phase transitions, symmetry breaking and critical instability ( http://arxiv.org/abs/2310.17467v3 ) ライセンス: Link先を確認	Luca Ambrogioni,	(参考訳) 生成拡散モデルは、機械学習と生成モデリングの多くの領域において、目覚ましい性能を達成した。これらのモデルの背後にある基本的な考え方は、非平衡物理学、変分推論、確率計算であるが、この記事では、これらのモデルの多くの側面が平衡統計力学のツールを用いて理解可能であることを示す。この再構成を用いて、生成拡散モデルが対称性の破れ現象に対応する2次相転移を行うことを示す。これらの相転移は常に平均場普遍性クラスであり、生成力学における自己整合状態の結果であることを示す。相転移から生じる臨界不安定性は、その生成能力の中心にあり、これは平均場臨界指数によって特徴づけられる。最後に、生成過程の動的方程式は、系を熱平衡に保ちながら自由エネルギーを最小化する確率的断熱変換と解釈できることを示す。 Generative diffusion models have achieved spectacular performance in many areas of machine learning and generative modeling. While the fundamental ideas behind these models come from non-equilibrium physics, variational inference and stochastic calculus, in this paper we show that many aspects of these models can be understood using the tools of equilibrium statistical mechanics. Using this reformulation, we show that generative diffusion models undergo second-order phase transitions corresponding to symmetry breaking phenomena. We show that these phase-transitions are always in a mean-field universality class, as they are the result of a self-consistency condition in the generative dynamics. We argue that the critical instability that arises from the phase transitions lies at the heart of their generative capabilities, which are characterized by a set of mean-field critical exponents. Finally, we show that the dynamic equation of the generative process can be interpreted as a stochastic adiabatic transformation that minimizes the free energy while keeping the system in thermal equilibrium.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
# MADGF:マルチエージェントデータ生成フレームワーク MADGF: Multi-Agent Data Generation Framework ( http://arxiv.org/abs/2310.17953v3 ) ライセンス: Link先を確認	Peng Xie, Kani Chen,	(参考訳) 自動音声認識(ASR)システムは、主にモノリンガル入力に対応しており、混合言語音声によってもたらされる複雑さには苦戦する。本稿では,この課題に対処する新しいマルチエージェントデータ生成フレームワーク(MADGF)を提案する。我々は、生成したMixed Cantonese and English (MCE)オーディオデータセットを利用してオープンソースの多言語ASRモデルWhisperを微調整し、元のモデルより35.13%低い14.28%のMER(Mix Error Rate)を達成した。一方、単一言語認識能力は影響を受けておらず、Common Voice zh-HKの文字誤り率(CER)が12.6%、Common Voice enの単語誤り率(WER)が14.8%である。しかし、これらの指標はASRシステムにとって重要な全ての側面を包含していない。そこで本研究では,FAL(Fidelity to the Original Audio, Accuracy, and Latency)と呼ばれる新しい評価指標を提案する。 Automatic Speech Recognition (ASR) systems predominantly cater to monolingual inputs and struggle with the complexity introduced by mixed language audio. In this paper, we present a novel Multi-Agent Data Generation Framework (MADGF) to address this challenge. We finetune the open-source multilingual ASR model, Whisper, utilizing our generated Mixed Cantonese and English (MCE) audio dataset, which achieved an impressive Mix Error Rate (MER) of 14.28%, 35.13% lower than the original model. Meanwhile, single language recognition ability is not affected: 12.6% Character Error Rate (CER) on Common Voice zh-HK and 14.8% Word Error Rate (WER) on Common Voice en. However, these metrics do not encompass all aspects critical to the ASR systems. Hence, we propose a novel evaluation metric called Fidelity to the Original Audio, Accuracy, and Latency (FAL).	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
# CXRインベディングにおける保護された特徴バイアスの緩和のためのポストホック直交化法 Post-hoc Orthogonalization for Mitigation of Protected Feature Bias in CXR Embeddings ( http://arxiv.org/abs/2311.01349v2 ) ライセンス: Link先を確認	Tobias Weber, Michael Ingrisch, Bernd Bischl, David Rügamer,	(参考訳) 目的: 深層学習モデルの胸部X線写真埋め込みにおける保護された特徴効果を分析し, 除去すること。方法: 直交化は、CXR埋め込みにおける保護された特徴(例えば、年齢、性別、人種)の影響を除去し、特徴に依存しない結果を保証するために使用される。提案手法の有効性を検証するため,MIMICおよびCheXpertデータセットを3つの事前学習モデル,すなわち教師付きコントラストモデル,自己監督型コントラストモデル,ベースライン分類器モデルを用いて遡及的に検討した。我々の統計分析では、保護された特徴の影響を推定し、その2種類の埋め込みを用いて、人種、年齢、性別を予測する能力を評価することによって、オリジナルと直交した埋め込みを比較した。結果: 本実験により, 保護された特徴が病態の予測に有意な影響を及ぼすことが明らかとなった。直交化を適用することで、これらの特徴が取り除かれる。病理学分類へのいかなる影響も取り除きながら、競争的な予測性能を維持しながら、直交した埋め込みにより、保護された属性を直接予測し、サブグループの格差を軽減することは不可能である。結論: 胸部X線画像分類における直交化手法の応用と評価について検討した。 Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in CXR embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trained models, namely a supervised contrastive, a self-supervised contrastive, and a baseline classifier model. Our statistical analysis involves comparing the original versus the orthogonalized embeddings by estimating protected feature influences and evaluating the ability to predict race, age, or sex using the two types of embeddings. Results: Our experiments reveal a significant influence of protected features on predictions of pathologies. Applying orthogonalization removes these feature effects. Apart from removing any influence on pathology classification, while maintaining competitive predictive performance, orthogonalized embeddings further make it infeasible to directly predict protected attributes and mitigate subgroup disparities. 
Conclusion: The presented work demonstrates the successful application and evaluation of the orthogonalization technique in the domain of chest X-ray image classification.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
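The Methods sentence above describes removing protected-feature influence from embeddings by orthogonalization. One common way to do this, shown here as a sketch under assumptions rather than as the authors' exact procedure, is to regress every embedding dimension on the protected features (plus an intercept) and keep only the residuals. The data below are synthetic, and the names `orthogonalize`, `protected`, and `embeddings` are hypothetical. Note that this removes linear dependence only.

```python
import numpy as np

def orthogonalize(embeddings, protected):
    """Project embeddings onto the orthogonal complement of the protected-feature span."""
    n = protected.shape[0]
    # An intercept column removes mean effects together with the features.
    design = np.column_stack([np.ones(n), protected])
    # Least-squares fit of each embedding dimension on the protected features.
    coef, *_ = np.linalg.lstsq(design, embeddings, rcond=None)
    # Residuals are orthogonal to every column of the design matrix.
    return embeddings - design @ coef

rng = np.random.default_rng(0)
n, emb_dim = 500, 16

# Synthetic protected features: an age-like column and a binary sex-like column.
protected = np.column_stack([
    rng.normal(60, 15, n),
    rng.integers(0, 2, n),
]).astype(float)

# Synthetic embeddings that leak a linear function of the protected features.
leak = (protected @ rng.normal(size=(2, emb_dim))) * 0.05
embeddings = rng.normal(size=(n, emb_dim)) + leak

clean = orthogonalize(embeddings, protected)

# Largest absolute correlation between any protected feature and any clean dimension.
corr = np.corrcoef(np.column_stack([protected, clean]), rowvar=False)
max_corr = float(np.abs(corr[:2, 2:]).max())
print(f"max |correlation| after orthogonalization: {max_corr:.2e}")
```

After this step a linear probe cannot recover the protected features from `clean`; non-linear leakage would need a separate check.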
# キャリブレーションのためのデータに合わせたMixupの調整 Tailoring Mixup to Data for Calibration ( http://arxiv.org/abs/2311.01434v2 ) ライセンス: Link先を確認	Quentin Bouniot, Pavlo Mozharovskyi, Florence d'Alché-Buc,	(参考訳) これまで提案されてきたすべてのデータ拡張技術の中で、Mixupと呼ばれるトレーニングサンプルの線形補間は、幅広いアプリケーションに有効であることが判明した。パフォーマンスの改善に加えて、Mixupはキャリブレーションと予測の不確実性を改善するための優れたテクニックでもある。しかし、データを不注意に混合すると、多様体の侵入、すなわち、割り当てられた合成ラベルと真のラベル分布との衝突が生じ、キャリブレーションが悪化する。この研究では、混合するデータ間の距離が大きくなるにつれて、多様体の侵入の可能性が増加することを論じる。そこで本研究では,混合する試料間の類似度に応じて補間係数の基底分布を動的に変化させることを提案し,多様性を損なうことなく適用可能なフレキシブルな枠組みを定義する。分類と回帰のタスクに関する広範な実験により、提案手法はモデルの性能とキャリブレーションを改善するとともに,より効率的であることを示す。私たちの作業のコードは https://github.com/qbouniot/sim_kernel_mixup で公開されています。 Among all data augmentation techniques proposed so far, linear interpolation of training samples, also called Mixup, has been found to be effective for a large panel of applications. Along with improved performance, Mixup is also a good technique for improving calibration and predictive uncertainty. However, mixing data carelessly can lead to manifold intrusion, i.e., conflicts between the synthetic labels assigned and the true label distributions, which can deteriorate calibration. In this work, we argue that the likelihood of manifold intrusion increases with the distance between data to mix. To this end, we propose to dynamically change the underlying distributions of interpolation coefficients depending on the similarity between samples to mix, and define a flexible framework to do so without losing in diversity. We provide extensive experiments for classification and regression tasks, showing that our proposed method improves performance and calibration of models, while being much more efficient. The code for our work is available at https://github.com/qbouniot/sim_kernel_mixup.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
# 量子エンタングルメントの超微細構造 Hyperfine Structure of Quantum Entanglement ( http://arxiv.org/abs/2311.01997v2 ) ライセンス: Link先を確認	Liang-Hong Mo, Yao Zhou, Jia-Rui Sun, Peng Ye,	(参考訳) 量子多体系や量子重力を理解するのに不可欠な量子絡み合いは、フォン・ノイマンのエントロピー、相互情報、絡み合いの輪郭などの様々な測度を通じて、それぞれ固有の制限とともに評価される。本研究では,微細構造として知られる絡み合いの輪郭を粒子数累積に分解する「絡み合いの超微細構造」を導入する。この測度は、量子情報科学においてその重要性を持つ普遍的な性質の集合を示す。フェルミ気体では、相互情報との接続を確立し、共形場理論と相互作用する、AdS$_3$/CFT$_2$ホログラフィック双対性、より微細なサブリージョン-サブリージョン双対性を明らかにし、バルク再構成を拡大する、チャーン絶縁体では異なる量子相を区別する。我々の研究結果は、物理的システム間の量子絡み合いに関する新たな洞察を提供する、実験的アクセシビリティーを示唆している。 Quantum entanglement, crucial for understanding quantum many-body systems and quantum gravity, is commonly assessed through various measures such as von Neumann entropy, mutual information, and entanglement contour, each with its inherent limitations. In this work, we introduce the hyperfine structure of entanglement, which dissects entanglement contours known as the fine structure into particle-number cumulants. This measure exhibits a set of universal properties with its significance in quantum information science. We apply it across diverse contexts: in Fermi gases, establishing connections to mutual information and interacting conformal field theory; in AdS$_3$/CFT$_2$ holographic duality, unveiling finer subregion-subregion duality and extending bulk reconstruction; and in Chern insulators, distinguishing between different quantum phases. Our findings suggest experimental accessibility, offering fresh insights into quantum entanglement across physical systems.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
# PcLast: 計画可能な継続的遅延状態を発見する PcLast: Discovering Plannable Continuous Latent States ( http://arxiv.org/abs/2311.03534v2 ) ライセンス: Link先を確認	Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan Molu, Miro Dudik, John Langford, Alex Lamb,	(参考訳) 目標条件付プランニングは、豊富な観測の学習された低次元表現から恩恵を受ける。可変オートエンコーダや逆ダイナミクスから学習されるコンパクトな潜在表現は、ゴール条件付き意思決定を可能にするが、状態到達性を無視し、パフォーマンスを阻害する。本稿では,有効な計画立案と目標条件付き政策学習のために,到達可能な状態を関連付ける表現を学習する。まず、多段階の逆ダイナミクスを持つ潜在表現を学習し、次にこの表現を$\ell_2$空間で到達可能な状態に関連付けるように変換する。提案手法は各種シミュレーションテストベッドで厳密に検証されている。報酬に基づく設定の数値計算の結果、サンプリング効率が大幅に向上した。さらに、報酬のない設定では、このアプローチは計算効率のよい階層的計画を可能にする階層化された状態抽象化が得られ、追加のサンプルはゼロとなる。 Goal-conditioned planning benefits from learned low-dimensional representations of rich observations. While compact latent representations typically learned from variational autoencoders or inverse dynamics enable goal-conditioned decision making, they ignore state reachability, hampering their performance. In this paper, we learn a representation that associates reachable states together for effective planning and goal-conditioned policy learning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information), and then transform this representation to associate reachable states together in $\ell_2$ space. Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based settings show significant improvements in sampling efficiency. Further, in reward-free settings this approach yields layered state abstractions that enable computationally efficient hierarchical planning for reaching ad hoc goals with zero additional samples.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
# プレトレーニングトランスにおけるマルチモーダルニューロンの発見と編集 Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers ( http://arxiv.org/abs/2311.07470v2 ) ライセンス: Link先を確認	Haowen Pan, Yixin Cao, Xiaozhi Wang, Xun Yang, Meng Wang,	(参考訳) マルチモーダル大言語モデル(LLM)が異なるモダリティを解釈し、相互モーダル表現を統合する内部メカニズムを理解することは、アカデミックと産業の両方において継続的な改善のためにますます重要になっている。本稿では,多モードLLMが視覚的およびテキスト的概念をどう橋渡しするかを解釈し,重要なニューロンを識別する新しい手法を提案する。本手法は,コストのかかる勾配計算の必要性を取り除き,効率と適用範囲の従来の作業を改善する。同定されたニューロンに基づいて, センシティブな単語や幻覚を軽減できる多モーダルな知識編集手法を設計する。設計の合理性については、理論的な仮定を提供する。実験的な評価のために、我々は広範囲にわたる定量的および定性的な実験を行った。この結果は,本手法の有効性を検証するだけでなく,マルチモーダルニューロンの感度,特異性,因果効果の3つの重要な特徴を浮き彫りにし,今後の研究に光を当てることにも寄与する。 Understanding the internal mechanisms by which multi-modal large language models (LLMs) interpret different modalities and integrate cross-modal representations is becoming increasingly critical for continuous improvements in both academia and industry. In this paper, we propose a novel method to identify key neurons for interpretability -- how multi-modal LLMs bridge visual and textual concepts for captioning. Our method improves conventional works upon efficiency and applied range by removing needs of costly gradient computation. Based on those identified neurons, we further design a multi-modal knowledge editing method, beneficial to mitigate sensitive words or hallucination. For rationale of our design, we provide theoretical assumption. For empirical evaluation, we have conducted extensive quantitative and qualitative experiments. The results not only validate the effectiveness of our methods, but also offer insightful findings that highlight three key properties of multi-modal neurons: sensitivity, specificity and causal-effect, to shed light for future research.	翻訳日:2024-06-13 00:58:30 公開日:2024-06-11
# 正規化フローは指数メカニズムの解錠の鍵か? Are Normalizing Flows the Key to Unlocking the Exponential Mechanism? ( http://arxiv.org/abs/2311.09200v4 ) ライセンス: Link先を確認	Robert A. Bridges, Vandy J. Tombs, Christopher B. Stanley,	(参考訳) プライベート最適化のために設計された指数メカニズム(ExpM, Exponential Mechanism)は、一般に難解な密度からサンプリングする必要があり、目的関数の感度を抑えるために、歴史的に連続的なサンプル空間での使用から傍観されてきた。任意の差分プライバシー(DP)メカニズムはExpMとしてインスタンス化することができ、ExpMはDPSGD固有の非効率性を回避する、プライベート機械学習(ML)のためのエレガントなソリューションを提供する。本稿では,密度学習のための表現型深層ネットワークである正規化フロー (NF) を用いて,プライベート最適化のためのExpMとMLの操作を,ExpM密度からおよそのサンプルに適用することを目的とする。 ExpM+NF法はモデルトレーニングのためのSGD法に代わる方法である。サンプリング法でExpMの使用を許可する$\ell^2$損失に対する感度境界を証明した。そこで本研究では,MIMIC-IIIの健康データを用いて,SGD,DPSGD,ExpM+NF訓練法の精度とトレーニング時間を比較した。 ExpM+NFのサンプルモデルでは,非プライベートなSGDと同程度の精度でDPSGDよりも精度が高く,ExpM+NFはOpacusのDPSGDよりも高速であることがわかった。 NF近似のプライバシー証明が得られないため、カルリーニらによるLiRAメンバーシップ推論攻撃や、Steinkeらによる最近のプライバシー監査下限法などを含むプライバシーを調査するための実証的な結果が得られ、ExpM+NFは非プライベートなSGDよりもプライバシーを提供するが、DPSGDほどではないことが示唆されている。この研究の補助的な利点は、MIMIC-IIIの医療データにSOTAのプライバシーと精度をプッシュすること、ベイズ推論にExpM+NFを使うこと、実証的なプライバシ監査の限界を示すこと、分散学習に適用可能ないくつかのプライバシー定理を提供することである。 The Exponential Mechanism (ExpM), designed for private optimization, has been historically sidelined from use on continuous sample spaces, as it requires sampling from a generally intractable density, and, to a lesser extent, bounding the sensitivity of the objective function. Any differential privacy (DP) mechanism can be instantiated as ExpM, and ExpM poses an elegant solution for private machine learning (ML) that bypasses inherent inefficiencies of DPSGD. This paper seeks to operationalize ExpM for private optimization and ML by using an auxiliary Normalizing Flow (NF), an expressive deep network for density learning, to approximately sample from ExpM density. The method, ExpM+NF is an alternative to SGD methods for model training. We prove a sensitivity bound for the $\ell^2$ loss permitting ExpM use with any sampling method. 
To test feasibility, we present results on MIMIC-III health data comparing (non-private) SGD, DPSGD, and ExpM+NF training methods' accuracy and training time. We find that a model sampled from ExpM+NF is nearly as accurate as non-private SGD, more accurate than DPSGD, and ExpM+NF trains faster than Opacus' DPSGD implementation. Unable to provide a privacy proof for the NF approximation, we present empirical results to investigate privacy including the LiRA membership inference attack of Carlini et al. and the recent privacy auditing lower bound method of Steinke et al. Our findings suggest ExpM+NF provides more privacy than non-private SGD, but not as much as DPSGD, although many attacks are impotent against any model. Ancillary benefits of this work include pushing the SOTA of privacy and accuracy on MIMIC-III healthcare data, exhibiting the use of ExpM+NF for Bayesian inference, showing the limitations of empirical privacy auditing in practice, and providing several privacy theorems applicable to distribution learning.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
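For readers unfamiliar with the mechanism itself, the textbook Exponential Mechanism over a finite candidate set can be sketched as follows. This is deliberately not the ExpM+NF method from the abstract, which targets continuous spaces with a normalizing flow. Here the candidate grid `thetas`, the clipped synthetic data, and the sensitivity bound of 9 (squared loss with data in [-1, 1] and candidates in [-2, 2]) are all illustrative assumptions.

```python
import numpy as np

def exponential_mechanism(utilities, epsilon, sensitivity, rng):
    """Sample an index with probability proportional to exp(eps * u / (2 * sensitivity))."""
    scores = epsilon * np.asarray(utilities, dtype=float) / (2.0 * sensitivity)
    scores -= scores.max()  # stabilise the exponentials numerically
    probs = np.exp(scores)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs)), probs

rng = np.random.default_rng(0)

# Hypothetical 1-D problem: privately pick theta minimising a summed squared loss.
thetas = np.linspace(-2.0, 2.0, 41)
data = np.clip(rng.normal(0.5, 0.3, size=100), -1.0, 1.0)

# Utility is the negative total loss. Each record contributes a loss in [0, 9],
# so replacing one record changes the utility by at most 9.
utilities = -((thetas[:, None] - data[None, :]) ** 2).sum(axis=1)

idx, probs = exponential_mechanism(utilities, epsilon=1.0, sensitivity=9.0, rng=rng)
print(f"sampled theta = {thetas[idx]:.2f}, most likely theta = {thetas[probs.argmax()]:.2f}")
```

On a continuous parameter space the normalising sum becomes an intractable integral, which is the gap the abstract proposes to bridge with an auxiliary normalizing flow.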
# NNG-Mix:擬似異常発生による半教師付き異常検出の改善 NNG-Mix: Improving Semi-supervised Anomaly Detection with Pseudo-anomaly Generation ( http://arxiv.org/abs/2311.11961v2 ) ライセンス: Link先を確認	Hao Dong, Gaëtan Frusque, Yue Zhao, Eleni Chatzi, Olga Fink,	(参考訳) 異常検出(AD)は、複雑なシステムにおいて稀かつしばしば重要な事象を識別し、ネットワーク侵入検出、金融詐欺検出、インフラや産業システムにおける故障検出などの分野での応用を見つけるのに不可欠である。 ADは通常、ラベルアノテーションのコストが高いため教師なしの学習タスクとして扱われるが、半教師付き異常検出のように、ドメインの専門家によるラベル付き異常サンプルの小さなセットにアクセスすることはより現実的である。半教師付きおよび教師付きアプローチは、そのようなラベル付きデータを活用することができ、パフォーマンスが向上する。本稿では,ADのための新たな半教師付きあるいは教師付きアプローチを提案する代わりに,ラベル付き異常とラベルなしデータ量に基づいて擬似異常を新たに生成するアルゴリズムを提案する。これは新しい異常の検出を容易にするための拡張として機能する。提案アルゴリズムはNearest Neighbor Gaussian Mixup (NNG-Mix) と名付けられ,ラベル付きデータとラベルなしデータの両方から情報を効率よく統合して擬似アノマリーを生成する。本稿では,このアルゴリズムの性能を,MixupやCutoutといった一般的な拡張手法と比較する。我々は,NNG-Mixの評価を,既存の半教師付きおよび教師付き異常検出アルゴリズムを,生成された擬似異常とともに元のトレーニングデータ上でトレーニングすることで行う。 ADBenchの57のベンチマークデータセットに関する広範な実験を通じて、異なるデータ型を反映し、NNG-Mixが他のデータ拡張手法より優れていることを示す。オリジナルのトレーニングデータにのみトレーニングされたベースラインと比較して、パフォーマンスが大幅に向上する。特に、NNG-MixはADBenchのClassical、CV、NLPデータセットを最大16.4%、8.8%、そして8.0%改善する。ソースコードは https://github.com/donghao51/NNG-Mix で公開されています。 Anomaly detection (AD) is essential in identifying rare and often critical events in complex systems, finding applications in fields such as network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised anomaly detection. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this paper, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. 
This serves as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named Nearest Neighbor Gaussian Mixup (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this novel algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised anomaly detection algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on 57 benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods. It yields significant performance improvements compared to the baselines trained exclusively on the original training data. Notably, NNG-Mix yields up to 16.4%, 8.8%, and 8.0% improvements on Classical, CV, and NLP datasets in ADBench. Our source code is available at https://github.com/donghao51/NNG-Mix.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# アンダーサンプルMRI再構成のための高速可制御拡散モデル Fast Controllable Diffusion Models for Undersampled MRI Reconstruction ( http://arxiv.org/abs/2311.12078v3 ) ライセンス: Link先を確認	Wei Jiang, Zhuang Xiong, Feng Liu, Nan Ye, Hongfu Sun,	(参考訳) 改良された深層学習法はMRI(MRI)のアンダーサンプル再構成において有望であるが、そのペア化データに対する要求は、MRIの様々な取得パラメータに対する一般化性を制限している。近年、異なるMRI取得のためのペアデータやモデル再構成なしに、アンサンプされたMRI再構成に制御不能な生成拡散モデルが適用されている。しかし、拡散モデルはサンプリングにおいて一般的に遅いため、制御可能な生成プロセスに直接適用した場合、最先端の加速技術は準最適結果をもたらす可能性がある。本研究では,MRI再構成のための拡散モデルの制御可能生成を促進・促進するPredictor-Projector-Noisor (PPN)と呼ばれる新しいアルゴリズムを提案する。以上の結果から, PPNは, 他の制御可能なサンプリング法に比べて, 再構成時間を大幅に短縮した, アンサンプ付きk空間計測に適合した高忠実MR画像を生成することがわかった。さらに、教師なしPPN加速拡散モデルが異なるMRI取得パラメータに適応可能であり、教師付き学習技術よりも臨床的に有用である。 Supervised deep learning methods have shown promise in undersampled Magnetic Resonance Imaging (MRI) reconstruction, but their requirement for paired data limits their generalizability to the diverse MRI acquisition parameters. Recently, unsupervised controllable generative diffusion models have been applied to undersampled MRI reconstruction, without paired data or model retraining for different MRI acquisitions. However, diffusion models are generally slow in sampling and state-of-the-art acceleration techniques can lead to sub-optimal results when directly applied to the controllable generation process. This study introduces a new algorithm called Predictor-Projector-Noisor (PPN), which enhances and accelerates controllable generation of diffusion models for undersampled MRI reconstruction. Our results demonstrate that PPN produces high-fidelity MR images that conform to undersampled k-space measurements with significantly shorter reconstruction time than other controllable sampling methods. In addition, the unsupervised PPN accelerated diffusion models are adaptable to different MRI acquisition parameters, making them more practical for clinical use than supervised learning techniques.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# 部分的可観測強化学習のための効率的な計画付き確率表現 Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning ( http://arxiv.org/abs/2311.12244v3 ) ライセンス: Link先を確認	Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai,	(参考訳) ほとんどの実世界の強化学習アプリケーションでは、状態情報は部分的にしか観測できないため、マルコフ決定プロセスの仮定を破り、状態と観測を分割するアルゴリズムの性能は劣る。一方、部分的に観測可能なマルコフ決定プロセス(POMDPs)は、学習、探索、計画において部分観測可能性を説明するための一般的なフレームワークを提供するが、重要な計算および統計上の課題を提示する。これらの課題に対処するため、部分的な観測から実践的な強化学習のためのコヒーレントな枠組みとトラクタブルなアルゴリズムアプローチへと導く表現に基づく視点を開発する。我々は,提案アルゴリズムの統計的効率を正当化するための理論的解析を行い,提案アルゴリズムが様々なベンチマークで部分的な観測を行い,より実用的な応用に向けて信頼性の高い強化学習を進展させることができることを実証的に証明する。 In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows for partial observability to be accounted for in learning, exploration and planning, but presents significant computational and statistical challenges. To address these difficulties, we develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations. We provide a theoretical analysis for justifying the statistical efficiency of the proposed algorithm, and also empirically demonstrate the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks, advancing reliable reinforcement learning towards more practical applications.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# Trainwreck:イメージ分類器に対する損害を与える敵対的攻撃 Trainwreck: A damaging adversarial attack on image classifiers ( http://arxiv.org/abs/2311.14772v2 ) ライセンス: Link先を確認	Jan Zahálka,	(参考訳) 敵対的攻撃はコンピュータビジョン(CV)にとって重要なセキュリティ上の問題である。 CVモデルは応用実践においてますます価値ある資産となりつつあり、その破壊は経済的な妨害行為の一形態として現れつつある。本稿では,標的CVモデルに損傷を与えることを目的とした敵対的攻撃 (DAA) の探索を行う。 DAAは、脅威モデルとDAAが最大化するコスト関数を定義し、成功のための3つの要件(有効性、ステルス性、カスタマイズ性)を設定することで形式化される。先駆的なDAAとして、本稿ではTrainwreckを提案する。これは、サロゲートモデルから得られるステルス性の高い($\epsilon \leq 8/255$)クラスペア普遍的摂動を用いて、訓練データ内の類似クラスのデータを混同させる訓練時攻撃である。 Trainwreckはブラックボックスで転送可能な攻撃で、ターゲットアーキテクチャの知識を必要とせず、単一の汚染データセットが、それを用いてトレーニングされたあらゆるモデルのパフォーマンスを劣化させる。 CIFAR-10とCIFAR-100および様々なモデルアーキテクチャ(EfficientNetV2、ResNeXt-101、微調整されたViT-L-16)に関する実験的評価は、Trainwreckの有効性を示している。 Trainwreckは、最先端のデータポイズニング手法と比較して、同様の、あるいはより良い有効性を実現し、汚染率パラメータによって完全にカスタマイズできる。最後に、ハッシュによるデータの冗長性は、Trainwreckや同様のDAAに対する信頼性の高い防御として識別される。コードは https://github.com/JanZahalka/trainwreck で公開されている。 Adversarial attacks are an important security concern for computer vision (CV). As CV models are becoming increasingly valuable assets in applied practice, disrupting them is emerging as a form of economic sabotage. This paper opens up the exploration of damaging adversarial attacks (DAAs) that seek to damage target CV models. DAAs are formalized by defining the threat model, the cost function DAAs maximize, and setting three requirements for success: potency, stealth, and customizability. As a pioneer DAA, this paper proposes Trainwreck, a train-time attack that conflates the data of similar classes in the training data using stealthy ($\epsilon \leq 8/255$) class-pair universal perturbations obtained from a surrogate model. Trainwreck is a black-box, transferable attack: it requires no knowledge of the target architecture, and a single poisoned dataset degrades the performance of any model trained on it. The experimental evaluation on CIFAR-10 and CIFAR-100 and various model architectures (EfficientNetV2, ResNeXt-101, and a finetuned ViT-L-16) demonstrates Trainwreck's efficiency. 
Trainwreck achieves similar or better potency compared to the data poisoning state of the art and is fully customizable by the poison rate parameter. Finally, data redundancy with hashing is identified as a reliable defense against Trainwreck or similar DAAs. The code is available at https://github.com/JanZahalka/trainwreck.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# シミュレーションによるアルゴリズムによる説得 Algorithmic Persuasion Through Simulation ( http://arxiv.org/abs/2311.18138v4 ) ライセンス: Link先を確認	Keegan Harris, Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins,	(参考訳) 本研究では,受取人に製品購入などの二元的行動を取るよう説得するベイズ説得ゲームについて検討する。送信者は、製品の品質が高いか低いかなどの世界の(バイナリ)状態について通知されるが、受信者の信念やユーティリティに関する情報は限られている。顧客の調査やユーザスタディ、最近のAIの進歩によって動機づけられた私たちは、受信者の振る舞いをシミュレートする託宣をクエリすることで、送信側が受信者についてより深く学ぶことを可能にする。一定の数のクエリの後、送信側はメッセージポリシーにコミットし、受信側は受信したメッセージに対して期待するユーティリティを最大化するアクションを取る。我々は受信側が受信側タイプにまたがる分散を考慮すれば,送信側が最適なメッセージポリシーを特徴付ける。次に,このゲームにおいて,送信者の期待するユーティリティを最適化する多項式時間クエリアルゴリズムを設計する。また、近似オラクル、より一般的なクエリ構造、高価なクエリについても検討しています。 We study a Bayesian persuasion game where a sender wants to persuade a receiver to take a binary action, such as purchasing a product. The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities. Motivated by customer surveys, user studies, and recent advances in AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior. After a fixed number of queries, the sender commits to a messaging policy and the receiver takes the action that maximizes her expected utility given the message she receives. We characterize the sender's optimal messaging policy given any distribution over receiver types. We then design a polynomial-time querying algorithm that optimizes the sender's expected utility in this game. We also consider approximate oracles, more general query structures, and costly queries.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# Compact3D:ベクトル量子化によるガウススプラッティングの小型化と高速化 Compact3D: Smaller and Faster Gaussian Splatting with Vector Quantization ( http://arxiv.org/abs/2311.18159v2 ) ライセンス: Link先を確認	KL Navaneet, Kossar Pourahmadi Meibodi, Soroush Abbasi Koohpayegani, Hamed Pirsiavash,	(参考訳) 3D Gaussian Splatting (3DGS)は,SOTA NeRF法と比較して学習時間とレンダリング時間を高速化する3Dラディアンス場をモデリング・レンダリングする新しい手法である。しかし、何百万もの3Dガウスのパラメータを保存する必要があるため、NeRF法に比べてはるかに大きなストレージ需要の欠点がある。我々は、ガウスの大きなグループが類似したパラメータを共有していることに着目し、K平均アルゴリズムに基づく単純なベクトル量子化法を導入してガウスのパラメータを量子化する。次に、小さなコードブックと各ガウスのコードのインデックスを格納する。我々は、それらをソートし、ランレングス符号化に類似した手法を用いることで、インデックスをさらに圧縮する。さらに、ゼロ不透明性(見えないガウス)を奨励する単純な正規化器を用いてガウスの数を減らし、モデルを圧縮し、レンダリングを高速化する。我々は、標準ベンチマークと、この分野で使用されている標準ベンチマークよりも桁違いに大きい既存の3Dデータセットに関する広範な実験を行っている。我々の単純かつ効果的な手法は、レンダリング画像の品質をごくわずかに低下させるだけで、3DGSのストレージコストを40倍から50倍に、レンダリング時間を2倍から3倍に削減できることを示す。 3D Gaussian Splatting (3DGS) is a new method for modeling and rendering 3D radiance fields that achieves much faster learning and rendering time compared to SOTA NeRF methods. However, it comes with the drawback of a much larger storage demand compared to NeRF methods since it needs to store the parameters for millions of 3D Gaussians. We notice that large groups of Gaussians share similar parameters and introduce a simple vector quantization method based on K-means algorithm to quantize the Gaussian parameters. Then, we store the small codebook along with the index of the code for each Gaussian. We compress the indices further by sorting them and using a method similar to run-length encoding. Moreover, we use a simple regularizer that encourages zero opacity (invisible Gaussians) to reduce the number of Gaussians, thereby compressing the model and speeding up the rendering. We do extensive experiments on standard benchmarks as well as an existing 3D dataset that is an order of magnitude larger than the standard benchmarks used in this field.
We show that our simple yet effective method can reduce the storage costs for 3DGS by 40 to 50x and rendering time by 2 to 3x with a very small drop in the quality of rendered images.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# 胸部X線撮影のための解剖学的に一貫性のある埋め込みの学習 Learning Anatomically Consistent Embedding for Chest Radiography ( http://arxiv.org/abs/2312.00335v2 ) ライセンス: Link先を確認	Ziyu Zhou, Haozhe Luo, Jiaxuan Pang, Xiaowei Ding, Michael Gotway, Jianming Liang,	(参考訳) 自己教師付き学習(SSL)アプローチは、最近、注釈のない画像から視覚表現を学ぶことに大きな成功を示している。画像と比較すると、同じ画像プロトコルで取得した医用画像は解剖学的に高い一貫性を示す。この解剖学的整合性を利用するために, PEAC (patch embedded of anatomical consistency) と呼ばれる新しいSSLアプローチを導入し, 医用画像解析を行った。具体的には, 安定したグリッドベースマッチング, トレーニング済みPEACモデルを様々な下流タスクに移行し, 1) PEACが既存の最先端の完全・自己管理手法よりもはるかに優れた性能を達成し, (2) PEACは同一患者の視点, 異なる性別, 体重, 健康状態の患者間での解剖学的構造的整合性を把握し, 医療画像解析の手法の解釈可能性を高めることを提案する。 Self-supervised learning (SSL) approaches have recently shown substantial success in learning visual representations from unannotated images. Compared with photographic images, medical images acquired with the same imaging protocol exhibit high consistency in anatomy. To exploit this anatomical consistency, this paper introduces a novel SSL approach, called PEAC (patch embedding of anatomical consistency), for medical image analysis. Specifically, in this paper, we propose to learn global and local consistencies via stable grid-based matching, transfer pre-trained PEAC models to diverse downstream tasks, and extensively demonstrate that (1) PEAC achieves significantly better performance than the existing state-of-the-art fully/self-supervised methods, and (2) PEAC captures the anatomical structure consistency across views of the same patient and across patients of different genders, weights, and healthy statuses, which enhances the interpretability of our method for medical image analysis.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# OpenStereo: ステレオマッチングと強力なベースラインのための総合ベンチマーク OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline ( http://arxiv.org/abs/2312.00343v6 ) ライセンス: Link先を確認	Xianda Guo, Chenming Zhang, Juntao Lu, Yiqi Wang, Yiqun Duan, Tian Yang, Zheng Zhu, Long Chen,	(参考訳) ステレオマッチングは、ロボット工学、自律運転、その他のコンピュータビジョンタスクにおいて重要であるステレオ画像ペア内のマッチングピクセル間の格差を推定することを目的としている。近年、数多くの印象的な手法が開発されているにもかかわらず、実用アプリケーションに最も適したアーキテクチャを決定することは依然として困難である。このギャップに対処するため,本研究では,最適化性能のための個別モデルにのみ焦点をあてるのではなく,実用性を重視した総合的なベンチマークを提案する。具体的には,OpenStereoと呼ばれる,柔軟で効率的なステレオマッチングコードベースを開発する。 OpenStereoには10以上のネットワークモデルのトレーニングと推論コードが含まれています。 OpenStereoに基づいて実験を行い、元の論文で報告されたパフォーマンス指標を達成または超えた。さらに, 立体マッチングにおける最近の展開の総合的分析とデコンストラクションを, 包括的アブレーション実験を通じて実施する。これらの調査により、強力なベースラインモデルであるStereoBaseが誕生した。私たちのStereoBaseは、SceneFlow、KITTI 2015、2012(Reflective)で第1位であり、すべてのメトリクスで最高のパフォーマンスを実現しています。さらに、StereoBaseは強力なクロスデータセットの一般化を持っている。コードは \url{https://github.com/XiandaGuo/OpenStereo} で公開されている。 Stereo matching aims to estimate the disparity between matching pixels in a stereo image pair, which is important to robotics, autonomous driving, and other computer vision tasks. Despite the development of numerous impressive methods in recent years, determining the most suitable architecture for practical application remains challenging. Addressing this gap, our paper introduces a comprehensive benchmark focusing on practical applicability rather than solely on individual models for optimized performance. Specifically, we develop a flexible and efficient stereo matching codebase, called OpenStereo. OpenStereo includes training and inference codes of more than 10 network models, making it, to our knowledge, the most complete stereo matching toolbox available. Based on OpenStereo, we conducted experiments and have achieved or surpassed the performance metrics reported in the original paper. 
Additionally, we conduct an exhaustive analysis and deconstruction of recent developments in stereo matching through comprehensive ablative experiments. These investigations inspired the creation of StereoBase, a strong baseline model. Our StereoBase ranks 1st on SceneFlow, KITTI 2015, 2012 (Reflective) among published methods and achieves the best performance across all metrics. In addition, StereoBase has strong cross-dataset generalization. Code is available at \url{https://github.com/XiandaGuo/OpenStereo}.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# 人間からのフィードバックからナッシュラーニング Nash Learning from Human Feedback ( http://arxiv.org/abs/2312.00886v4 ) ライセンス: Link先を確認	Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot,	(参考訳) 人間からのフィードバックからの強化学習(RLHF)は、大規模言語モデル(LLM)と人間の嗜好を整合させる主要なパラダイムとして現れている。典型的には、RLHFは人間のフィードバックから報酬モデルを学ぶ最初のステップであり、しばしば事前訓練されたLLMによって生成されるテキスト世代間の好みとして表現される。その後、LLMのポリシーは強化学習アルゴリズムにより報酬モデルを最大化するよう最適化することで微調整される。しかし、現在の報酬モデルの本質的な制限は、人間の嗜好の豊かさとサンプリング分布への依存を完全に表現できないことである。本研究では,LLMの微調整のための代替パイプラインを提案する。提案手法は,提案する2つの入力に条件付けされた嗜好モデルの初期学習を伴い,その後に,競合する政策よりも好まれる応答を一貫して生成するポリシーを追求し,この選好モデルのナッシュ均衡を定義する。このアプローチを人間のフィードバック(NLHF)からナッシュラーニング(Nash Learning)と呼ぶ。表形式のポリシー表現の文脈では、ミラー降下原理に基づく新しいアルゴリズム的解であるナッシュ-MDを提示する。このアルゴリズムは一連のポリシーを生成し、最後の繰り返しは正規化されたナッシュ平衡に収束する。さらに、ポリシーのパラメトリック表現について検討し、ディープラーニングアーキテクチャのための勾配降下アルゴリズムを導入する。提案手法の有効性を示すために,テキスト要約タスクにおけるLLMの微調整を含む実験結果を提案する。我々はNLHFが、LLMと人間の嗜好を整合させる分野を前進させる可能性を秘め、嗜好学習と政策最適化のための魅力的な道を提供すると考えている。 Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, often expressed as preferences between pairs of text generations produced by a pre-trained LLM. Subsequently, the LLM's policy is fine-tuned by optimizing it to maximize the reward model through a reinforcement learning algorithm. However, an inherent limitation of current reward models is their inability to fully represent the richness of human preferences and their dependency on the sampling distribution. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. 
Our approach entails the initial learning of a preference model, which is conditioned on two inputs given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. To demonstrate the effectiveness of our approach, we present experimental results involving the fine-tuning of a LLM for a text summarization task. We believe NLHF offers a compelling avenue for preference learning and policy optimization with the potential of advancing the field of aligning LLMs with human preferences.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# 非線形計測モデルを用いた拡散後サンプリングによるCT再構成 CT Reconstruction using Diffusion Posterior Sampling conditioned on a Nonlinear Measurement Model ( http://arxiv.org/abs/2312.01464v2 ) ライセンス: Link先を確認	Shudong Li, Xiao Jiang, Matthew Tivnan, Grace J. Gang, Yuan Shen, J. Webster Stayman,	(参考訳) 拡散モデルはCTの再構成と復元における画像生成のための強力なディープラーニングツールとして実証されてきた。近年,高画質CT画像の高画質化のために,スコアベース拡散前の拡散を確率モデルと組み合わせた拡散後サンプリングが用いられている。この技術は、1回で教師なしのCT事前トレーニングを可能にするので魅力的であり、任意のデータモデルに組み込むことができる。しかし、現在の手法は画像の再構成や復元にX線CT物理の線形モデルに依存している。伝送トモグラフィー再構成問題を線形化することは一般的であるが、これは真および本質的に非線形フォワードモデルに対する近似である。本研究では,拡散後サンプリングによる非線形CT画像再構成の逆問題を解決する手法を提案する。本研究では, 従来の非条件拡散モデルを用いて, 事前スコア関数推定器を訓練し, ベイズ則を適用して, 非線形物理モデルから導出した測度スコア関数と組み合わせて, 逆時間拡散過程のサンプリングに使用できる後方スコア関数に到達させる。このプラグ・アンド・プレイ法は, 一般化された非線形CT画像再構成を, 追加の訓練を必要とせず, 異なる前方モデルで複数のCTシステム設計に組み込むことができる。本研究では, 高速化処理のための順序付きサブセット変種を含むこの再構成を行うアルゴリズムを開発し, 事前の教師なしトレーニングを用いて, 完全サンプル化低線量データとスパースビュージオメトリの両方でその手法を実証する。 Diffusion models have been demonstrated as powerful deep learning tools for image generation in CT reconstruction and restoration. Recently, diffusion posterior sampling, where a score-based diffusion prior is combined with a likelihood model, has been used to produce high quality CT images given low-quality measurements. This technique is attractive since it permits a one-time, unsupervised training of a CT prior; which can then be incorporated with an arbitrary data model. However, current methods rely on a linear model of x-ray CT physics to reconstruct or restore images. While it is common to linearize the transmission tomography reconstruction problem, this is an approximation to the true and inherently nonlinear forward model. We propose a new method that solves the inverse problem of nonlinear CT image reconstruction via diffusion posterior sampling. 
We implement a traditional unconditional diffusion model by training a prior score function estimator, and apply Bayes rule to combine this prior with a measurement likelihood score function derived from the nonlinear physical model to arrive at a posterior score function that can be used to sample the reverse-time diffusion process. This plug-and-play method allows incorporation of a diffusion-based prior with generalized nonlinear CT image reconstruction into multiple CT system designs with different forward models, without the need for any additional training. We develop the algorithm that performs this reconstruction, including an ordered-subsets variant for accelerated processing and demonstrate the technique in both fully sampled low dose data and sparse-view geometries using a single unsupervised training of the prior.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# マトリックスの不具合? Fakepediaによる言語モデル検索と検出 A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia ( http://arxiv.org/abs/2312.02073v3 ) ライセンス: Link先を確認	Giovanni Monea, Maxime Peyrard, Martin Josifoski, Vishrav Chaudhary, Jason Eisner, Emre Kıcıman, Hamid Palangi, Barun Patra, Robert West,	(参考訳) 大規模言語モデル(LLM)は、そのコンテキストで提供される新しい情報を引き出すという印象的な能力を持つ。しかし、特に文脈情報がパラメータに格納されている事実的知識と矛盾する状況において、この文脈的基盤のメカニズムは依然として不明であり、LLMはリコール時にも優れている。検索強化された生成手法では、コンテキストを最新の情報で豊かにすることで、グラウンドディングが古い記憶された知識を正し、ノイズを生じさせる可能性があることを期待する。本稿では,モデルの内部パラメトリック知識と衝突するために構築された,対物文の新たなデータセットであるFakepediaを用いて,接地能力を研究する新しい手法を提案する。本研究では,内的パラメトリック知識が文脈情報と衝突した場合の接地能力を評価するために設計された対物データセットであるFakepediaを紹介する。我々は,Fakepedia を用いて様々な LLM をベンチマークし,Musked Grouped Causal Tracing (MGCT) 法に基づいて Fakepedia クエリに応答する際の LLM 成分の因果媒介分析を行う。この分析により, 接地応答と非接地応答の異なる計算パターンを同定する。最後に, 地下応答と接地応答の区別が, 計算解析のみで達成できることを実証した。本研究は, 現実的リコール機構に関する既存の知見とともに, 現実的リコール機構と接地的リコール機構がLLM内でどのように作用するかについて, 一貫性のある物語を提供する。 Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context. Yet the mechanisms underlying this contextual grounding remain unknown, especially in situations where contextual information contradicts factual knowledge stored in the parameters, which LLMs also excel at recalling. Favoring the contextual information is critical for retrieval-augmented generation methods, which enrich the context with up-to-date information, hoping that grounding can rectify outdated or noisy stored knowledge. We present a novel method to study grounding abilities using Fakepedia, a novel dataset of counterfactual texts constructed to clash with a model's internal parametric knowledge. In this study, we introduce Fakepedia, a counterfactual dataset designed to evaluate grounding abilities when the internal parametric knowledge clashes with the contextual information. 
We benchmark various LLMs with Fakepedia and conduct a causal mediation analysis of LLM components when answering Fakepedia queries, based on our Masked Grouped Causal Tracing (MGCT) method. Through this analysis, we identify distinct computational patterns between grounded and ungrounded responses. We finally demonstrate that distinguishing grounded from ungrounded responses is achievable through computational analysis alone. Our results, together with existing findings about factual recall mechanisms, provide a coherent narrative of how grounding and factual recall mechanisms interact within LLMs.	翻訳日:2024-06-13 00:48:47 公開日:2024-06-11
# QMGeo: 確率量子化による異なる私的フェデレーション学習と混合縮尺幾何分布 QMGeo: Differentially Private Federated Learning via Stochastic Quantization with Mixed Truncated Geometric Distribution ( http://arxiv.org/abs/2312.05761v2 ) ライセンス: Link先を確認	Zixi Wang, M. Cenk Gursoy,	(参考訳) フェデレートラーニング(FL)は、複数のユーザがパラメータサーバの調整の下でのみモデル更新を送信し、データセットをローカルに保つことで、グローバル機械学習(ML)モデルを共同でトレーニングすることを可能にするフレームワークである。このような分散フレームワークの重要な動機の1つは、ユーザにプライバシ保証を提供することである。しかしながら、ユーザのデータセットをローカルに保存することは、プライバシに十分なものではないことが示されている。フレームワークにランダム性を導入することで、証明可能なプライバシー保証を提供するために、いくつかの差分プライバシー(DP)機構が提案されている。 FLフレームワークは、特に機械学習モデルが複雑さとサイズを増すにつれて、通信効率の課題にも直面する。量子化は一般的に利用される手法であり、基礎となる情報の圧縮表現を伝送することで通信コストを削減する。 FLにおけるDPと量子化の研究はいくつかあるが、プライバシ保証の提供における量子化手法の潜在的貢献は、まだ広く分析されていない。本稿では、混合幾何分布を利用して、付加雑音を伴わずにDPを提供するのに必要なランダム性を導入する、新しい確率量子化法を提案する。我々は,フレームワークの収束解析を行い,その性能を実証研究する。 Federated learning (FL) is a framework which allows multiple users to jointly train a global machine learning (ML) model by transmitting only model updates under the coordination of a parameter server, while being able to keep their datasets local. One key motivation of such distributed frameworks is to provide privacy guarantees to the users. However, preserving the users' datasets locally is shown to be not sufficient for privacy. Several differential privacy (DP) mechanisms have been proposed to provide provable privacy guarantees by introducing randomness into the framework, and majority of these mechanisms rely on injecting additive noise. FL frameworks also face the challenge of communication efficiency, especially as machine learning models grow in complexity and size. Quantization is a commonly utilized method, reducing the communication cost by transmitting compressed representation of the underlying information. Although there have been several studies on DP and quantization in FL, the potential contribution of the quantization method alone in providing privacy guarantees has not been extensively analyzed yet. 
In this paper, we present a novel stochastic quantization method, utilizing a mixed geometric distribution to introduce the randomness needed to provide DP, without any additive noise. We provide convergence analysis for our framework and empirically study its performance.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# 空調シナリオ分布下での半教師付きクロスドメインクレーター検出のための2段階適応ネットワーク Two-Stage Adaptive Network for Semi-Supervised Cross-Domain Crater Detection under Varying Scenario Distributions ( http://arxiv.org/abs/2312.06169v2 ) ライセンス: Link先を確認	Yifan Liu, Tiecheng Song, Chengye Xian, Ruiyuan Chen, Yi Zhao, Rui Li, Tan Guo,	(参考訳) クレーター検出は、地球外惑星の歴史を人類が探索し理解するための貴重な情報を提供することができる。非常に異なるシナリオ分布のため、既知のラベル付きクレーターのデータセットで訓練された既存の検出モデルは、新しい未衝突惑星に適用すると、ほとんど効果がない。この問題に対処するために,半教師付きクロスクレーター検出のための2段階適応ネットワーク(TAN)を提案する。我々のネットワークはYOLOv5検出器上に構築されており、そこではクロスドメインの一般化能力を高めるために一連の戦略が採用されている。まず,注意に基づくスケール適応融合(ASAF)戦略を提案する。さらに, ハードサンプルのオーバーフィット問題に対処するスムーズなハードサンプルマイニング (SHEM) 機能を提案する。第2段階では、ソースとターゲットドメイン間の分布差を軽減するために、半教師付き学習のためのソートベースの擬似学習(SPF)戦略を提案する。どちらの段階でも、異なるクロスドメインタスクに適合するために、弱いまたは強い画像拡張を用いる。ベンチマークデータセットによる実験結果から,提案するネットワークは,様々なシナリオ分布下でのクレーター検出の領域適応性を向上できることが示された。 Crater detection can provide valuable information for humans to explore the topography and understand the history of extraterrestrial planets. Due to the significantly varying scenario distributions, existing detection models trained on known labelled crater datasets are hardly effective when applied to new unlabelled planets. To address this issue, we propose a two-stage adaptive network (TAN) for semi-supervised cross-domain crater detection. Our network is built on the YOLOv5 detector, where a series of strategies are employed to enhance its cross-domain generalisation ability. In the first stage, we propose an attention-based scale-adaptive fusion (ASAF) strategy to handle objects with significant scale variances. Furthermore, we propose a smoothing hard example mining (SHEM) loss function to address the issue of overfitting on hard examples. In the second stage, we propose a sort-based pseudo-labelling fine-tuning (SPF) strategy for semi-supervised learning to mitigate the distributional differences between source and target domains. For both stages, we employ weak or strong image augmentation to suit different cross-domain tasks. 
Experimental results on benchmark datasets demonstrate that the proposed network can enhance domain adaptation ability for crater detection under varying scenario distributions.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# EgoPlan-Bench: ヒューマン・レベル・プランニングのためのマルチモーダル・大規模言語モデルのベンチマーク EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning ( http://arxiv.org/abs/2312.06722v3 ) ライセンス: Link先を確認	Yi Chen, Yuying Ge, Yixiao Ge, Mingyu Ding, Bohao Li, Rui Wang, Ruifeng Xu, Ying Shan, Xihui Liu,	(参考訳) 人工知能(AGI)の追求はマルチモーダル大言語モデル(MLLM)によって加速され、多モーダル入力の処理において優れた推論能力、一般化能力、熟練度を示す。 AGIの進化における重要なマイルストーンは、人間レベルの計画の達成、複雑な環境で情報的決定を行う基本的な能力、および幅広い現実世界の問題を解決することである。 MLLMの目覚ましい進歩にもかかわらず、疑問が残る。現在のMLLMは、人間レベルの計画を達成するのにどれくらい時間がかかるのか? 本稿では,現実のシナリオにおけるMLLMの計画能力を評価するための総合的なベンチマークであるEgoPlan-Benchを紹介する。 EgoPlan-Bench氏はMLLMの計画能力の評価を強調し、現実的なタスク、多様なアクション計画、複雑な視覚的観察を特徴としている。幅広いMLLMを厳格に評価した結果,EgoPlan-Benchは人間レベルのタスクプランニングを実現するためのMLLMの改善のかなりの範囲を浮き彫りにした。この進歩を容易にするために,EgoPlan-Bench上でのモデル性能を効果的に向上する特別な命令チューニングデータセットであるEgoPlan-ITを提案する。将来の研究を進めるために、すべてのコード、データ、および維持されたベンチマークのリーダーボードを利用可能にしました。 The pursuit of artificial general intelligence (AGI) has been accelerated by Multimodal Large Language Models (MLLMs), which exhibit superior reasoning, generalization capabilities, and proficiency in processing multimodal inputs. A crucial milestone in the evolution of AGI is the attainment of human-level planning, a fundamental ability for making informed decisions in complex environments, and solving a wide range of real-world problems. Despite the impressive advancements in MLLMs, a question remains: How far are current MLLMs from achieving human-level planning? To shed light on this question, we introduce EgoPlan-Bench, a comprehensive benchmark to evaluate the planning abilities of MLLMs in real-world scenarios from an egocentric perspective, mirroring human perception. EgoPlan-Bench emphasizes the evaluation of planning capabilities of MLLMs, featuring realistic tasks, diverse action plans, and intricate visual observations. 
Our rigorous evaluation of a wide range of MLLMs reveals that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning. To facilitate this advancement, we further present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench. We have made all codes, data, and a maintained benchmark leaderboard available to advance future research.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# ニューラル言語エージェントのためのdiff履歴 diff History for Neural Language Agents ( http://arxiv.org/abs/2312.07540v3 ) ライセンス: Link先を確認	Ulyana Piterbarg, Lerrel Pinto, Rob Fergus,	(参考訳) ニューラル言語モデル(LM)は、汎用的なエンボディドコントロールのためのエキサイティングなソリューションを提供する。しかし、LMベースのコントローラを使用する場合、重要な技術的問題が発生する。環境観測はテキストに変換されなければならず、これが履歴と相まって、長く冗長なテキストプロンプトとなる。その結果、LMエージェントの先行研究は、観測サイズが小さく、インタラクション履歴や命令チューニングの必要性が最小限である限定的な領域に限られている。本稿では,これらの問題に対するシンプルで効果的な解法であるdiff履歴を導入する。 LMポリシーのプロンプトに用いるインタラクション履歴中の連続するテキスト観測にUnixのdiffコマンドを適用することで、冗長な情報を抽象化するとともに、テキスト入力の内容を環境の顕著な変化に集中させることができる。意思決定のための長期的推論を必要とする未解決のビデオゲームであるNetHackでは、diff履歴でチューニングしたLMが、ニューラルエージェントの最先端のパフォーマンスと一致し、以前の研究と比べて必要なトレーニング例が1800倍少ない。簡潔なテキスト観測を持つより単純なBabyAI-Text環境においても、diff履歴はプロンプトの長さを増大させるが、その表現は低サンプル命令チューニングの効率を25%向上させる。さらに、diff履歴は異なるチューニングデータセットサイズで好適にスケール可能であることを示す。私たちはコードとデータを https://diffhistory.github.io で公開しています。 Neural Language Models (LMs) offer an exciting solution for general-purpose embodied control. However, a key technical issue arises when using an LM-based controller: environment observations must be converted to text, which coupled with history, results in long and verbose textual prompts. As a result, prior work in LM agents is limited to restricted domains with small observation size as well as minimal needs for interaction history or instruction tuning. In this paper, we introduce diff history, a simple and highly effective solution to these issues. By applying the Unix diff command on consecutive text observations in the interaction histories used to prompt LM policies, we can both abstract away redundant information and focus the content of textual inputs on the salient changes in the environment. On NetHack, an unsolved video game that requires long-horizon reasoning for decision-making, LMs tuned with diff history match state-of-the-art performance for neural agents while needing 1800x fewer training examples compared to prior work.
Even on the simpler BabyAI-Text environment with concise text observations, we find that although diff history increases the length of prompts, the representation it provides offers a 25% improvement in the efficiency of low-sample instruction tuning. Further, we show that diff history scales favorably across different tuning dataset sizes. Our code and data are open-sourced at https://diffhistory.github.io.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# アイドルレベルをもつ量子オットーサイクルにおける仕事と効率の変動 Work and efficiency fluctuations in a quantum Otto cycle with idle levels ( http://arxiv.org/abs/2312.12350v2 ) ライセンス: Link先を確認	Maron F. Anka, Thiago R. de Oliveira, Daniel Jonathan,	(参考訳) ハイゼンベルク相互作用で結合した2つのスピンからなる量子オットー熱機関の性能を、仕事と効率の平均値だけでなく、それらのゆらぎも考慮して研究する。まず, この系では, 出力仕事とそのゆらぎが, いずれかの熱浴と平衡にある系の磁化および磁化率と直接関係していることを示す。所定温度域において相対ゆらぎを低く抑えながら仕事の抽出が可能であり, かつ単一スピン系の熱機関よりも高い効率を達成できる領域を解析する。特に「アイドル」レベルが存在するため、スピン間結合の増加は、他のパラメータに応じて、ゆらぎを増大させることも減少させることもある。しかし、いずれの場合も、仕事や効率の相対ゆらぎは大きいままであり、これはこの微視的なエンジンが仕事の源としてあまり信頼性がないことを意味している。 We study the performance of a quantum Otto heat engine with two spins coupled by a Heisenberg interaction, taking into account not only the mean values of work and efficiency but also their fluctuations. We first show that, for this system, the output work and its fluctuations are directly related to the magnetization and magnetic susceptibility of the system at equilibrium with either heat bath. We analyze the regions where the work extraction can be done with low relative fluctuation for a given range of temperatures, while still achieving an efficiency higher than that of a single spin system heat engine. In particular, we find that, due to the presence of `idle' levels, an increase in the inter-spin coupling can either increase or decrease fluctuations, depending on the other parameters. In all cases, however, we find that the relative fluctuations in work or efficiency remain large, implying that this microscopic engine is not very reliable as a source of work.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# 効率的な忠実度推定:オルタナティブな導出とその応用 Efficient fidelity estimation: Alternative derivation and related applications ( http://arxiv.org/abs/2312.12438v3 ) ライセンス: Link先を確認	Diego S. Starke, Marcos L. W. Basso, Jonas Maziero,	(参考訳) A. J. Baldwin と J. A. Jones は、[Phys. Rev. A 107, 012427 (2023)] において、二つの量子状態 $\rho$ と $\sigma$ の間の Uhlmann-Jozsa の忠実度 $F(\rho,\sigma) := (Tr\sqrt{\sqrt{\rho}\sigma\sqrt{\rho}})^2$ が、$F(\rho,\sigma) = (Tr\sqrt{\rho\sigma})^2$ という単純な形で書けることを証明した。本稿では、関数のべき級数展開とトレース関数の性質を用いて、この結果の代替的証明を与える。我々のアプローチは、単純化された式の有効性を補強するだけでなく、量子状態に対する新しい非類似度関数や密度作用素のより複雑なトレース関数の探索も促進する。 In [Phys. Rev. A 107, 012427 (2023)], A. J. Baldwin and J. A. Jones proved that Uhlmann-Jozsa's fidelity between two quantum states $\rho$ and $\sigma$, i.e., $F(\rho,\sigma)~:=~(Tr\sqrt{\sqrt{\rho}\sigma\sqrt{\rho}})^2$, can be written in a simplified form as $F(\rho,\sigma) = (Tr\sqrt{\rho\sigma})^2$. In this article, we give an alternative proof of this result, using a function power series expansion and the properties of the trace function. Our approach not only reinforces the validity of the simplified expression but also facilitates the exploration of novel dissimilarity functions for quantum states and more complex trace functions of a density operator.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# NeRFをベースとした色とオパクティを持つガウススメッティング Gaussian Splatting with NeRF-based Color and Opacity ( http://arxiv.org/abs/2312.13729v4 ) ライセンス: Link先を確認	Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek,	(参考訳) ニューラル・レージアンス・フィールド(Neural Radiance Fields、NeRF)は、ニューラルネットワークが3次元物体の複雑な位置を捉えていることを示す。ニューラルネットワークの重みの中に形状と色情報をエンコードすることで、NeRFは3Dオブジェクトの驚くほどシャープな新しいビューを生み出すのに優れています。近年, 生成モデルを用いた多くのNeRFの一般化が出現し, その汎用性が高まっている。対照的に、Gaussian Splatting(GS)は、ニューラルネットワークの動作を必要としないため、トレーニングと推論の高速化により、同様のレンダリング品質を提供する。ガウス分布の集合に3Dオブジェクトに関する情報をエンコードし、古典的なメッシュと同様に3Dでレンダリングできる。残念なことに、GSは通常数十万のガウス成分を必要とするため、条件付けが難しい。両モデルの欠点を軽減するために,3Dオブジェクト形状のGS表現とNeRFによる色と不透明度の符号化を用いたビューイングディビジョン・ガウシアン・スプレイティング(VDGS)のハイブリッドモデルを提案する。我々のモデルは、トレーニング可能な位置(すなわちガウスの手段)、形状(すなわちガウスの共分散)、色と不透明度、およびその色と不透明度の変化をもたらすためにガウスのパラメータと視方向を捉えるニューラルネットワークを用いている。その結果,3次元オブジェクトの影や光の反射,透過性を,テクスチャや光の成分を加えることなくよりうまく表現することができた。 Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding its versatility. In contrast, Gaussian Splatting (GS) offers a similar render quality with faster training and inference as it does not need neural networks to work. It encodes information about the 3D objects in the set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS are difficult to condition since they usually require circa hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model Viewing Direction Gaussian Splatting (VDGS) that uses GS representation of the 3D object's shape and NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e. means of Gaussian), shape (i.e. 
covariance of Gaussian), color and opacity, and a neural network that takes Gaussian parameters and viewing direction to produce changes in the said color and opacity. As a result, our model better describes shadows, light reflections, and the transparency of 3D objects without adding additional texture and light components.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# 損失と不確実性に基づく能動学習アルゴリズムの収束性について On the Convergence of Loss and Uncertainty-based Active Learning Algorithms ( http://arxiv.org/abs/2312.13927v3 ) ライセンス: Link先を確認	Daniel Haimovich, Dima Karamshuk, Fridolin Linder, Niek Tax, Milan Vojnovic,	(参考訳) 確率勾配降下法(SGD)アルゴリズムを用いて、機械学習モデルの学習に必要な収束率とデータサンプルサイズについて検討し、損失値または不確実値に基づいてデータポイントをサンプリングする。これらの学習方法は、アクティブな学習とデータサブセット選択の問題に特に関係している。一定のステップサイズを更新したSGDに対して,2乗ヒンジ損失と類似のトレーニング損失関数を用いた線形分類器と線形分離可能なデータセットの収束結果を示す。さらに、より一般的な分類器やデータセットに分析を拡張し、広い範囲の損失に基づくサンプリング戦略と滑らかな凸トレーニング損失関数を考慮に入れた。本稿では,期待値において確率的Polyakステップサイズを実現する適応的なステップサイズのSGDを利用する新しいアルゴリズムであるAdaptive-Weight Sampling(AWS)を提案する。滑らかな凸トレーニング損失関数に対して,AWSの収束率の結果を確立する。我々の数値実験は、正確な損失値または推定損失値を用いて、さまざまなデータセット上でAWSの効率を実証する。 We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm, where data points are sampled based on either their loss value or uncertainty value. These training methods are particularly relevant for active learning and data subset selection problems. For SGD with a constant step size update, we present convergence results for linear classifiers and linearly separable datasets using squared hinge loss and similar training loss functions. Additionally, we extend our analysis to more general classifiers and datasets, considering a wide range of loss-based sampling strategies and smooth convex training loss functions. We propose a novel algorithm called Adaptive-Weight Sampling (AWS) that utilizes SGD with an adaptive step size that achieves stochastic Polyak's step size in expectation. We establish convergence rate results for AWS for smooth convex training loss functions. Our numerical experiments demonstrate the efficiency of AWS on various datasets by using either exact or estimated loss values.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# $\mathtt{RNN}$の再スケーリング、離散化、線形化について About rescaling, discretisation and linearisation of $\mathtt{RNN}$ ( http://arxiv.org/abs/2312.15974v3 ) ライセンス: Link先を確認	Mariano Caruso, Cecilia Jarne,	(参考訳) 我々は、リカレントニューラルネットワーク($\mathtt{RNN}$s)の数学的基礎と、時間的再スケーリング、離散化、線形化の3つの基本的な手順について検討した。これらの技術は、時相力学、実用的な計算実装、解析のための線形近似に関する洞察を可能にするために、$\mathtt{RNN}$sの振る舞いを特徴づけるための必須のツールを提供する。我々はこれらの手順の柔軟な適用順序について議論し、神経科学および機械学習応用のための$\mathtt{RNN}$sをモデル化し分析することの重要性を強調した。これらの手順がどのような条件で交換可能かは、ここで明確に記述する。 We explored the mathematical foundations of Recurrent Neural Networks ($\mathtt{RNN}$s) and three fundamental procedures: temporal rescaling, discretisation and linearisation. These techniques provide essential tools for characterizing $\mathtt{RNN}$s behaviour, enabling insights into temporal dynamics, practical computational implementation, and linear approximations for analysis. We discuss the flexible order of application of these procedures, emphasizing their significance in modelling and analyzing $\mathtt{RNN}$s for neuroscience and machine learning applications. We explicitly describe here under what conditions these procedures can be interchangeable.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# Oceanship: 水中オーディオターゲット認識のための大規模データセット Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition ( http://arxiv.org/abs/2401.02099v3 ) ライセンス: Link先を確認	Zeyu Li, Suncheng Xiang, Tong Yu, Jingsheng Gao, Jiacheng Ruan, Yanping Hu, Ting Liu, Yuzhuo Fu,	(参考訳) 水中オーディオの認識は、船が動いている間を識別する上で重要な役割を担っている。水中目標認識タスクは、海洋環境保護、船舶放射音の検出、水中騒音の制御、沿岸船舶の派遣など、幅広い用途に応用されている。従来のUATRタスクでは、オーディオデータから特徴を抽出し、船舶のタイプを予測するためにネットワークをトレーニングする。現在のUATRデータセットは、持続時間とサンプル量の両方の欠点を示す。本論文では,大規模かつ多様な水中オーディオデータセットであるOceanshipを提案する。このデータセットは15のカテゴリで構成され、総期間は121時間であり、座標、速度、船舶タイプ、タイムスタンプといった包括的なアノテーション情報を含んでいる。我々は2021年から2022年にかけて,Ocean Communication Network(ONC)データベースからオリジナルの通信データをクロールして整理してデータセットをコンパイルした。音声検索タスクは一般的な音声分類では確立されているが,水中音声認識の文脈では検討されていない。 Oceanshipデータセットを活用することで、水中オーディオ検索のためのOceannetというベースラインモデルを導入する。このモデルは1(R@1)の精度67.11%、リコール精度5(R@5)の精度99.13%をDeepshipデータセットで達成している。 The recognition of underwater audio plays a significant role in identifying a vessel while it is in motion. Underwater target recognition tasks have a wide range of applications in areas such as marine environmental protection, detection of ship radiated noise, underwater noise control, and coastal vessel dispatch. The traditional UATR task involves training a network to extract features from audio data and predict the vessel type. The current UATR dataset exhibits shortcomings in both duration and sample quantity. In this paper, we propose Oceanship, a large-scale and diverse underwater audio dataset. This dataset comprises 15 categories, spans a total duration of 121 hours, and includes comprehensive annotation information such as coordinates, velocity, vessel types, and timestamps. We compiled the dataset by crawling and organizing original communication data from the Ocean Communication Network (ONC) database between 2021 and 2022. While audio retrieval tasks are well-established in general audio classification, they have not been explored in the context of underwater audio recognition. 
Leveraging the Oceanship dataset, we introduce a baseline model named Oceannet for underwater audio retrieval. This model achieves a recall at 1 (R@1) accuracy of 67.11% and a recall at 5 (R@5) accuracy of 99.13% on the Deepship dataset.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
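The R@1 and R@5 figures quoted above are recall-at-k retrieval metrics. As a minimal sketch of how such a metric is typically computed, assuming one correct candidate per query and a precomputed query-by-candidate similarity table (the function name and the toy scores are illustrative and not taken from the Oceanship paper):

```python
def recall_at_k(similarity, targets, k):
    """Fraction of queries whose correct candidate is among the k most similar."""
    hits = 0
    for scores, target in zip(similarity, targets):
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)

# Hypothetical similarity scores: rows are queries, columns are candidates.
sim = [[0.9, 0.1, 0.3],
       [0.2, 0.4, 0.8],
       [0.5, 0.6, 0.1]]
targets = [0, 1, 0]  # index of the correct candidate for each query

r1 = recall_at_k(sim, targets, 1)
r2 = recall_at_k(sim, targets, 2)
```

Recall can only grow as k increases, which is why R@5 is never lower than R@1.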
# 病理画像解析のためのベンチマークパスCLIP Benchmarking PathCLIP for Pathology Image Analysis ( http://arxiv.org/abs/2401.02651v2 ) ライセンス: Link先を確認	Sunyi Zheng, Xiaonan Cui, Yuxuan Sun, Jingxiong Li, Honglin Li, Yunlong Zhang, Pingyi Chen, Xueping Jing, Zhaoxiang Ye, Lin Yang,	(参考訳) 正確な画像分類と検索は、臨床診断と治療決定にとって重要である。最近のコントラスト言語画像事前学習(CLIP)モデルは、自然画像の理解に顕著な習熟性を示している。 CLIPからインスピレーションを得たPathCLIPは、20万以上の画像とテキストペアをトレーニングに利用して、病理画像解析用に特別に設計されている。 PathCLIPのパフォーマンスは印象的だが、その頑丈さは幅広い画像の破損の下では未だに不明である。そこで我々は,骨肉腫とWSSS4LUADのデータセットから,多彩な画像に対するPathCLIPの性能評価を行った。実験では, 明るさ, コントラスト, ガウスのぼかし, 解像度, 彩度, 色調, マークアップの7種類の破損を4つの重度レベルで導入した。実験の結果,PathCLIPは画像の破損に対して比較的堅牢であり,ゼロショット分類ではOpenAI-CLIPとPLIPを上回っていることがわかった。 7つの破損のうち、ぼかしと解像度がPathCLIPの性能を著しく劣化させる可能性がある。これは、臨床検査を行う前に画像の品質を確保することが重要であることを示している。また,画像画像検索作業におけるPathCLIPのロバスト性を評価し,骨肉腫に対するPLIPよりもPathCLIPの有効性は低いが,WSSS4LUADは多彩な腐敗下では良好であることを明らかにした。全体として、PathCLIPは、画像に対して印象的なゼロショット分類と検索性能を示すが、それを使用するには適切な注意が必要である。この研究がPathCLIPの質的な印象を与え、他のCLIPモデルとの違いを理解するのに役立ちたい。 Accurate image classification and retrieval are of importance for clinical diagnosis and treatment decision-making. The recent contrastive language-image pretraining (CLIP) model has shown remarkable proficiency in understanding natural images. Drawing inspiration from CLIP, PathCLIP is specifically designed for pathology image analysis, utilizing over 200,000 image and text pairs in training. While the performance of PathCLIP is impressive, its robustness under a wide range of image corruptions remains unknown. Therefore, we conduct an extensive evaluation to analyze the performance of PathCLIP on various corrupted images from the datasets of Osteosarcoma and WSSS4LUAD. In our experiments, we introduce seven corruption types including brightness, contrast, Gaussian blur, resolution, saturation, hue, and markup at four severity levels. Through experiments, we find that PathCLIP is relatively robust to image corruptions and surpasses OpenAI-CLIP and PLIP in zero-shot classification. 
Among the seven corruptions, blur and resolution can cause severe performance degradation of PathCLIP. This indicates that ensuring the quality of images is crucial before conducting a clinical test. Additionally, we assess the robustness of PathCLIP in the task of image-image retrieval, revealing that PathCLIP performs less effectively than PLIP on Osteosarcoma but performs better on WSSS4LUAD under diverse corruptions. Overall, PathCLIP presents impressive zero-shot classification and retrieval performance for pathology images, but appropriate care needs to be taken when using it. We hope this study provides a qualitative impression of PathCLIP and helps understand its differences from other CLIP models.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# 臨界インフラにおけるサイバーセキュリティ : ポスト量子暗号の視点から Cybersecurity in Critical Infrastructures: A Post-Quantum Cryptography Perspective ( http://arxiv.org/abs/2401.03780v2 ) ライセンス: Link先を確認	Javier Oliva del Moral, Antonio deMarti iOlius, Gerard Vidal, Pedro M. Crespo, Josu Etxezarreta Martinez,	(参考訳) 産業環境の機械は、数年前にインターネットに接続され、その性能が向上した。しかし、この変更により、サイバー攻撃に対する脆弱な環境が、彼らの正しい機能を損なう恐れがあり、経済や社会問題を引き起こした。さらに,オペレーショナル・テクノロジー (OT) デバイス間の通信に暗号システムを実装することは,情報技術 (IT) 環境よりも難しい課題である。このため、産業用通信ネットワークにおける暗号システムの実装は、通信のセキュリティと産業用インフラの償却との間のトレードオフに直面している。クリティカル・インフラストラクチャー(Critical Infrastructure、CI)とは、例えば電気など、日々の社会・経済の発展に重要な資源を提供する産業を指す。さらに、サイバーセキュリティに対する新たな脅威は、RSAやECCのような最先端の暗号プロトコルを破る可能性から、量子コンピュータの理論的な提案によってもたらされた。多くのグローバルエージェントは、セキュアな通信を量子セキュアなパラダイムに移行することが、フォールトトレランスの到来前に確立されるべき優先事項であることを認識するようになった。本稿では,CI環境にポスト量子暗号(PQC)を実装する際の問題点について述べる。そのために、これらのシナリオの要件と、それらがITとどのように異なるのかを説明します。また、古典暗号や、量子コンピュータがこのようなセキュリティプロトコルにどのように脅威をもたらすかについても紹介する。さらに,PQCプロトコルの現状と特徴について述べる。産業環境におけるPQCの統合の問題点について論じる。 The machinery of industrial environments was connected to the Internet years ago with the scope of increasing their performance. However, this change made such environments vulnerable against cyber-attacks that can compromise their correct functioning resulting in economic or social problems. Moreover, implementing cryptosystems in the communications between operational technology (OT) devices is a more challenging task than for information technology (IT) environments since the OT networks are generally composed of legacy elements, characterized by low-computational capabilities. Consequently, implementing cryptosystems in industrial communication networks faces a trade-off between the security of the communications and the amortization of the industrial infrastructure. Critical Infrastructure (CI) refers to the industries which provide key resources for the daily social and economical development, e.g. electricity. 
Furthermore, a new threat to cybersecurity has arisen with the theoretical proposal of quantum computers, due to their potential ability to break state-of-the-art cryptography protocols, such as RSA or ECC. Many global agents have become aware that transitioning their secure communications to a quantum-secure paradigm is a priority that should be established before the arrival of fault-tolerant quantum computing. In this paper, we aim to describe the problems of implementing post-quantum cryptography (PQC) in CI environments. To do so, we describe the requirements for these scenarios and how they differ from those of IT. We also introduce classical cryptography and how quantum computers pose a threat to such security protocols. Furthermore, we introduce state-of-the-art proposals of PQC protocols and present their characteristics. We conclude by discussing the problems of integrating PQC in industrial environments.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# フラッグで楽しむ:フラッグマニフォールドによるロバストな主要方向 Fun with Flags: Robust Principal Directions via Flag Manifolds ( http://arxiv.org/abs/2401.04071v3 ) ライセンス: Link先を確認	Nathan Mankovich, Gustau Camps-Valls, Tolga Birdal,	(参考訳) 主成分分析(PCA)は、多様体の拡張や外層汚染データとともに、コンピュータビジョンや機械学習では不可欠である。そこで本研究では,PCAとその変種に対する統一形式を提示し,線形部分空間のフラグに基づくフレームワークを導入する。分散を最大化するか、再構成誤差を最小化する従来のPCA手法を一般化することから始める。我々はこれらの解釈を拡張して、外れ値とデータ多様体を考慮し、新しい次元削減アルゴリズムを広範囲に開発する。共通の計算手法を考案するために、フラグ多様体の最適化問題として、頑健で双対なPCAを再放送する。次に、このフラグベースのフレームワークに主測地線解析(Tangent-PCA)の接空間近似を組み込み、新しいロバストかつ双対測地線PCAのバリエーションを作成する。ここで導入された"フラグ化(flagification)"によって提供される顕著な柔軟性は、特定のフラグタイプによって識別される、さらにアルゴリズム的なバリエーションを可能にします。最後に、Stiefel多様体を用いたこれらのフラグ形式に対する効果的な収束解法を提案する。実世界のシナリオと合成シナリオの両方に関する実証的な結果から、新しいアルゴリズムの優位性、特に多様体上の外れ値に対するロバスト性を示す。 Principal component analysis (PCA), along with its extensions to manifolds and outlier contaminated data, have been indispensable in computer vision and machine learning. In this work, we present a unifying formalism for PCA and its variants, and introduce a framework based on the flags of linear subspaces, ie a hierarchy of nested linear subspaces of increasing dimension, which not only allows for a common implementation but also yields novel variants, not explored previously. We begin by generalizing traditional PCA methods that either maximize variance or minimize reconstruction error. We expand these interpretations to develop a wide array of new dimensionality reduction algorithms by accounting for outliers and the data manifold. To devise a common computational approach, we recast robust and dual forms of PCA as optimization problems on flag manifolds. We then integrate tangent space approximations of principal geodesic analysis (tangent-PCA) into this flag-based framework, creating novel robust and dual geodesic PCA variations. The remarkable flexibility offered by the 'flagification' introduced here enables even more algorithmic variants identified by specific flag types. 
Last but not least, we propose an effective convergent solver for these flag-formulations employing the Stiefel manifold. Our empirical results on both real-world and synthetic scenarios demonstrate the superiority of our novel algorithms, especially in terms of robustness to outliers on manifolds.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# RudolfV:病理学者のための基礎モデル RudolfV: A Foundation Model by Pathologists for Pathologists ( http://arxiv.org/abs/2401.04079v4 ) ライセンス: Link先を確認	Jonas Dippel, Barbara Feulner, Tobias Winterhoff, Timo Milbich, Stephan Tietz, Simon Schallenberg, Gabriel Dernbach, Andreas Kunft, Simon Heinke, Marie-Lisa Eich, Julika Ribbat-Idel, Rosemarie Krupar, Philipp Anders, Niklas Prenißl, Philipp Jurmeister, David Horst, Lukas Ruff, Klaus-Robert Müller, Frederick Klauschen, Maximilian Alber,	(参考訳) 人工知能は、臨床診断や生物医学の研究に影響を与える病理学を変革し始めている。しかし、多くの計算病理学的アプローチが提案されているが、現在のAIモデルは、一般化、応用の多様性、希少疾患の扱いに関して制限されている。近年の取り組みはこれらの課題に対処するために自己監督的基礎モデルを導入しているが、既存のアプローチでは設計による病理学的な知識を活用できない。本研究では, 組織タイプ58種を含む15以上の実験室から, 病理学の専門知識, 半自動データキュレーション, および多種多様なデータセットを取り入れ, 組織化学的, 免疫組織化学的染色モードを129種類含む, 計算病理学の基礎モデルを設計するための新しいアプローチを提案する。我々は,腫瘍のマイクロ環境プロファイリング,バイオマーカー評価,参照事例探索に重点を置くベンチマークにおいて,我々のモデル「RudolfV」が既存の最先端基盤モデルを上回っ,良好なロバスト性を示した。本研究は、ドメイン固有の知識が、病理基盤モデルの効率性と性能を向上し、新しい応用領域を実現する方法を示す。 Artificial intelligence has started to transform histopathology impacting clinical diagnostics and biomedical research. However, while many computational pathology approaches have been proposed, most current AI models are limited with respect to generalization, application variety, and handling rare diseases. Recent efforts introduced self-supervised foundation models to address these challenges, yet existing approaches do not leverage pathologist knowledge by design. In this study, we present a novel approach to designing foundation models for computational pathology, incorporating pathologist expertise, semi-automated data curation, and a diverse dataset from over 15 laboratories, including 58 tissue types, and encompassing 129 different histochemical and immunohistochemical staining modalities. 
We demonstrate that our model "RudolfV" surpasses existing state-of-the-art foundation models across different benchmarks focused on tumor microenvironment profiling, biomarker evaluation, and reference case search while exhibiting favorable robustness properties. Our study shows how domain-specific knowledge can increase the efficiency and performance of pathology foundation models and enable novel application areas.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# 知識グラフ埋め込みのためのブロック対角直交関係と行列エンティティ Block-Diagonal Orthogonal Relation and Matrix Entity for Knowledge Graph Embedding ( http://arxiv.org/abs/2401.05967v2 ) ライセンス: Link先を確認	Yihua Zhu, Hidetoshi Shimodaira,	(参考訳) 知識グラフ埋め込み(KGE)の主な目的は、実体と関係性の低次元表現を学習し、行方不明な事実を予測することである。 RotatE や QuatE のような回転法は KGE ではよく機能するが、それらは2つの課題に直面している。これらの問題に対処するために、OrthogonalEという新しいKGEモデルを導入する。このアプローチにより、KGEモデルの汎用性と柔軟性が向上する。実験結果から,我々の新しいKGEモデルOrthogonalEは汎用的かつ柔軟であり,最先端のKGEモデルよりも優れており,関係パラメータの大幅な削減が期待できる。 The primary aim of Knowledge Graph embeddings (KGE) is to learn low-dimensional representations of entities and relations for predicting missing facts. While rotation-based methods like RotatE and QuatE perform well in KGE, they face two challenges: limited model flexibility requiring proportional increases in relation size with entity dimension, and difficulties in generalizing the model for higher-dimensional rotations. To address these issues, we introduce OrthogonalE, a novel KGE model employing matrices for entities and block-diagonal orthogonal matrices with Riemannian optimization for relations. This approach enhances the generality and flexibility of KGE models. The experimental results indicate that our new KGE model, OrthogonalE, is both general and flexible, significantly outperforming state-of-the-art KGE models while substantially reducing the number of relation parameters.	翻訳日:2024-06-13 00:39:03 公開日:2024-06-11
# 大規模言語モデルは時間的推論を学習できる Large Language Models Can Learn Temporal Reasoning ( http://arxiv.org/abs/2401.06853v5 ) ライセンス: Link先を確認	Siheng Xiong, Ali Payani, Ramana Kompella, Faramarz Fekri,	(参考訳) 大きな言語モデル(LLM)は顕著な推論能力を示しているが、欠陥や不正確さがないわけではない。近年の研究では、これらの制限を緩和する様々な方法が紹介されている。特に、時間的推論(TR)は、多種多様な時間的概念と複雑な時間的論理に依存しているため、LLMにとって重要な課題である。本稿では,言語ベースTRに向けた新しいフレームワークであるTG-LLMを提案する。元の文脈を推論する代わりに、TRの学習を促進する潜在表現である時間グラフ(TG)を採用する。完全制御可能で、最小限の監視を必要とする合成データセット(TGQA)は、このテキストからTGへの翻訳タスクにおいて、微調整のLLMのために構築される。実験では,データセット上で学習したTG翻訳の能力が,他のTRタスクやベンチマークに転送可能であることを確認した。それに加えて、私たちはLLMにChain-of-Thought(CoT)ブートストラップとグラフデータ拡張を通じて、意図的にTGを推論するように教えています。有用性と多様性のバランスを保っているこれらの戦略は,バニラのCoT蒸留よりも信頼性が高く,最終結果が得られた。 While large language models (LLMs) have demonstrated remarkable reasoning capabilities, they are not without their flaws and inaccuracies. Recent studies have introduced various methods to mitigate these limitations. Temporal reasoning (TR), in particular, presents a significant challenge for LLMs due to its reliance on diverse temporal concepts and intricate temporal logic. In this paper, we propose TG-LLM, a novel framework towards language-based TR. Instead of reasoning over the original context, we adopt a latent representation, temporal graph (TG) that enhances the learning of TR. A synthetic dataset (TGQA), which is fully controllable and requires minimal supervision, is constructed for fine-tuning LLMs on this text-to-TG translation task. We confirmed in experiments that the capability of TG translation learned on our dataset can be transferred to other TR tasks and benchmarks. On top of that, we teach LLM to perform deliberate reasoning over the TGs via Chain-of-Thought (CoT) bootstrapping and graph data augmentation. We observed that those strategies, which maintain a balance between usefulness and diversity, bring more reliable CoTs and final results than the vanilla CoT distillation.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# 112量子ビットを用いたシュウィンガーモデルにおけるハドロンダイナミクスの量子シミュレーション Quantum Simulations of Hadron Dynamics in the Schwinger Model using 112 Qubits ( http://arxiv.org/abs/2401.08044v2 ) ライセンス: Link先を確認	Roland C. Farrell, Marc Illa, Anthony N. Ciavarella, Martin J. Savage,	(参考訳) ハドロン波束は、IBMの133量子ビットHeron量子コンピュータibm_torinoの112キュービットを使用して、Schwingerモデルで準備され、時間的に進化する。ハドロンウェーブパレットの初期化は2つのステップで行われる。まず、最近開発されたSC-ADAPT-VQEアルゴリズムとワークフローを用いて、格子全体にわたって真空を調製する。その後、SC-ADAPT-VQEは局所状態の準備に拡張され、真空上にハドロン波束を確立するのに使用される。これは、断熱的に作製されたハドロン波束との重なりを最大化する低深度回路を適応的に構築することによる。ウェーブパケットの局所的な性質のため、これらの回路は古典的コンピュータを用いて小さな格子の列上で決定され、その後、量子コンピュータを用いてシミュレーションのために大きな格子上にウェーブパケットを作成するために頑強にスケールされる。時間進化は2階のトロッター化によって実現される。必要な量子ビット接続と回路深度の両方を低減するために、近似準局所相互作用を導入する。この近似は、遠距離における閉じ込めの出現によって可能となり、相互作用の増大する距離と指数関数的に収束する。複数のエラー軽減戦略を用いて、13,858個の2ビットゲート(CNOT深さ370)を用いて、最大14個の時間進化のトロッターステップを実行する。ハドロンの伝播は明らかであり, マトリックス製品状態シミュレーションと比較した結果が得られた。ハドロン散乱シミュレーションにおける短期量子優位性の可能性について論じる。 Hadron wavepackets are prepared and time evolved in the Schwinger model using 112 qubits of IBM's 133-qubit Heron quantum computer ibm_torino. The initialization of the hadron wavepacket is performed in two steps. First, the vacuum is prepared across the whole lattice using the recently developed SC-ADAPT-VQE algorithm and workflow. SC-ADAPT-VQE is then extended to the preparation of localized states, and used to establish a hadron wavepacket on top of the vacuum. This is done by adaptively constructing low-depth circuits that maximize the overlap with an adiabatically prepared hadron wavepacket. Due to the localized nature of the wavepacket, these circuits can be determined on a sequence of small lattices using classical computers, and then robustly scaled to prepare wavepackets on large lattices for simulations using quantum computers. Time evolution is implemented with a second-order Trotterization. To reduce both the required qubit connectivity and circuit depth, an approximate quasi-local interaction is introduced. 
This approximation is made possible by the emergence of confinement at long distances, and converges exponentially with increasing distance of the interactions. Using multiple error-mitigation strategies, up to 14 Trotter steps of time evolution are performed, employing 13,858 two-qubit gates (with a CNOT depth of 370). The propagation of hadrons is clearly identified, with results that compare favorably with Matrix Product State simulations. Prospects for a near-term quantum advantage in simulations of hadron scattering are discussed.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# リアルタイム組込みシステム故障インジェクタを意識したマイクロアーキテクチャイベント A Micro Architectural Events Aware Real-Time Embedded System Fault Injector ( http://arxiv.org/abs/2401.08397v2 ) ライセンス: Link先を確認	Enrico Magliano, Alessio Carpegna, Alessadro Savino, Stefano Di Carlo,	(参考訳) 現代では、システムの複雑さが増大し、SACRESの信頼性、信頼性、セキュリティに重大な課題が生じる。主な問題として、瞬時電圧スパイク、電磁干渉、中性子衝突、外気温といった現象への感受性がある。これらの要因はトランジスタのスイッチ状態の変化を誘発し、ビットフリッピング、ソフトエラー、メモリに格納されたデータの過渡的破壊をもたらす。ソフトエラーの発生はシステム障害を招き、システムに有害な状態をもたらす可能性がある。特に自動車、航空工学、航空宇宙などの重要な分野において、そのような機能不全は現実世界に影響を及ぼし、個人に害を与える可能性がある。本稿では,マイクロアーキテクチャイベントの監視,集約,検査を容易にする新しい故障インジェクタを提案する。これはマイクロプロセッサのPMUとデバッグインターフェースを活用することで実現され、特に障害注入の再現性を保証することに焦点を当てている。フォールトインジェクション手法は、メモリシステム内のビットフリップをターゲットとし、CPUレジスタとRAMに影響を与える。これらの断層注入の結果、ソフトエラーの影響を徹底的に解析し、同定された断層とSACRESが要求する本質的なタイミング予測可能性との間に堅牢な相関関係を確立することができる。 In contemporary times, the increasing complexity of the system poses significant challenges to the reliability, trustworthiness, and security of the SACRES. Key issues include the susceptibility to phenomena such as instantaneous voltage spikes, electromagnetic interference, neutron strikes, and out-of-range temperatures. These factors can induce switch state changes in transistors, resulting in bit-flipping, soft errors, and transient corruption of stored data in memory. The occurrence of soft errors, in turn, may lead to system faults that can propel the system into a hazardous state. Particularly in critical sectors like automotive, avionics, or aerospace, such malfunctions can have real-world implications, potentially causing harm to individuals. This paper introduces a novel fault injector designed to facilitate the monitoring, aggregation, and examination of micro-architectural events. This is achieved by harnessing the microprocessor's PMU and the debugging interface, specifically focusing on ensuring the repeatability of fault injections. The fault injection methodology targets bit-flipping within the memory system, affecting CPU registers and RAM. 
The outcomes of these fault injections enable a thorough analysis of the impact of soft errors and establish a robust correlation between the identified faults and the essential timing predictability demanded by SACRES.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
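The bit-flip fault model described above can be sketched in a few lines. This is only an illustration of the fault model in software: the injector in the paper acts on real hardware through the PMU and the debug interface, which this sketch does not touch. The function names, the 32-bit register width, and the seed are assumptions made for the example.

```python
import random

def flip_bit(value, bit, width=32):
    """Return value with one bit inverted, emulating a single-event upset."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside the register width")
    return (value ^ (1 << bit)) & ((1 << width) - 1)

def plan_injections(n_faults, width=32, seed=1234):
    """Seeded plan of bit positions, so that a campaign can be repeated exactly."""
    rng = random.Random(seed)
    return [rng.randrange(width) for _ in range(n_faults)]

# Corrupt a hypothetical register value at each planned bit position.
register = 0xDEADBEEF
corrupted = [flip_bit(register, bit) for bit in plan_injections(5)]
```

Fixing the seed is one simple way to obtain the repeatability that the abstract emphasizes: the same plan yields the same sequence of injected faults on every run.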
# AIが生物の脅威景観に及ぼす影響のリスク分析に向けて Towards Risk Analysis of the Impact of AI on the Deliberate Biological Threat Landscape ( http://arxiv.org/abs/2401.12755v3 ) ライセンス: Link先を確認	Matthew E. Walsh,	(参考訳) 近年,生物工学と人工知能(AI)の融合によって生物リスクが増大し,バイオテクノロジーと人工知能のガバナンスに注目が集まっている。 2023年、人工知能の安全、安全、信頼に足る開発と利用に関する執行命令は、人工知能がバイオリスクをいかに高めるかを評価する必要がある。この観点から、バイオリスクを評価するための量的および質的な枠組みが提示される。どちらのフレームワークも、記法的なシナリオを使用して実行され、そのメリットと制限が議論される。最後に、視点は、評価と評価の方法論が、生命科学におけるAIの進歩に追随しなければならないことに注意して結論付ける。 The perception that the convergence of biological engineering and artificial intelligence (AI) could enable increased biorisk has recently drawn attention to the governance of biotechnology and artificial intelligence. The 2023 Executive Order, Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, requires an assessment of how artificial intelligence can increase biorisk. Within this perspective, quantitative and qualitative frameworks for evaluating biorisk are presented. Both frameworks are exercised using notional scenarios and their benefits and limitations are then discussed. Finally, the perspective concludes by noting that assessment and evaluation methodologies must keep pace with advances of AI in the life sciences.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# SciMMIR: 科学的マルチモーダル情報検索のベンチマーク SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval ( http://arxiv.org/abs/2401.13478v2 ) ライセンス: Link先を確認	Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen, Wenhao Huang, Noura Al Moubayed, Jie Fu, Chenghua Lin,	(参考訳) マルチモーダル情報検索(MMIR)は、特に画像とテキストのペアリングにおいて、高度な表現学習と相互モーダルアライメント研究を通じて大きな進歩を遂げた、急速に発展する分野である。しかしながら、科学領域内の画像テキストペアリングにおけるMMIR性能を評価するための現在のベンチマークでは、学術言語で記述されたチャートや表のイメージが通常重要な役割を果たさない、顕著なギャップが示されている。このギャップを埋めるために、オープンアクセス紙コレクションを活用して、科学領域に関連するデータを抽出する特別科学的MMIR(SciMMIR)ベンチマークを開発する。このベンチマークは、科学的文書に詳細なキャプションのある数字や表から抽出された、530Kの精巧にキュレートされた画像テキストペアからなる。さらに,2レベルサブセットサブカテゴリ階層アノテーションを用いて画像テキストペアに注釈を付け,ベースラインのより包括的な評価を容易にする。 CLIP や BLIP などの視覚言語モデルを用いて,マルチモーダル画像キャプションにおけるゼロショットおよび微調整の評価を行った。我々の分析は、事前学習と微調整の影響、視覚およびテキストエンコーダの影響など、科学領域におけるMMIRの重要な洞察を提供する。データとチェックポイントはすべてhttps://github.com/Wusiwei0410/SciMMIR.comで公開されています。 Multi-modal information retrieval (MMIR) is a rapidly evolving field, where significant progress, particularly in image-text pairing, has been made through advanced representation learning and cross-modality alignment research. However, current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap, where chart and table images described in scholarly language usually do not play a significant role. To bridge this gap, we develop a specialised scientific MMIR (SciMMIR) benchmark by leveraging open-access paper collections to extract data relevant to the scientific domain. This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents. We further annotate the image-text pairs with two-level subset-subcategory hierarchy annotations to facilitate a more comprehensive evaluation of the baselines. 
We conducted zero-shot and fine-tuning evaluations on prominent multi-modal image-captioning and visual language models, such as CLIP and BLIP. Our analysis offers critical insights for MMIR in the scientific domain, including the impact of pre-training and fine-tuning settings and the influence of the visual and textual encoders. All our data and checkpoints are publicly available at https://github.com/Wusiwei0410/SciMMIR.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# 効率的かつスケーラブルなモデル予測制御のためのニューロモルフィック二次計画法 Neuromorphic quadratic programming for efficient and scalable model predictive control ( http://arxiv.org/abs/2401.14885v2 ) ライセンス: Link先を確認	Ashish Rao Mangalore, Gabriel Andreas Fonseca Guerra, Sumedh R. Risbud, Philipp Stratmann, Andreas Wild,	(参考訳) ロボット工学や他のサイズ、重量、電力に制約のある自律システムのエッジでの応用は、大規模な最適化問題に対するリアルタイムおよび低エネルギーのソリューションを必要とすることが多い。イベントベースおよびメモリ統合ニューロモルフィックアーキテクチャは、従来のフォン・ノイマンアーキテクチャと比較してエネルギー効率と性能に優れた最適化問題を解くことを約束する。本稿では,Intelのスケーラブルなニューロモルフィック研究チップLoihi 2における2次コスト関数と線形制約を用いた凸連続最適化問題の解法を提案する。四足歩行ロボットプラットフォームANYmalのモデル予測制御(MPC)問題に適用すると、様々な問題サイズに対して10ミリ秒未満の解時間を持つCPUとGPU上で、最先端のOSQPと比較して2桁以上のエネルギー遅延積の2桁の削減が達成される。これらの結果は、ロボット制御アプリケーションにおける非ヴォン・ノイマンアーキテクチャの利点を示している。 Applications in robotics or other size-, weight- and power-constrained autonomous systems at the edge often require real-time and low-energy solutions to large optimization problems. Event-based and memory-integrated neuromorphic architectures promise to solve such optimization problems with superior energy efficiency and performance compared to conventional von Neumann architectures. Here, we present a method to solve convex continuous optimization problems with quadratic cost functions and linear constraints on Intel's scalable neuromorphic research chip Loihi 2. When applied to model predictive control (MPC) problems for the quadruped robotic platform ANYmal, this method achieves over two orders of magnitude reduction in combined energy-delay product compared to the state-of-the-art solver, OSQP, on (edge) CPUs and GPUs with solution times under ten milliseconds for various problem sizes. These results demonstrate the benefit of non-von-Neumann architectures for robotic control applications.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# 地球観測データにおける予測信頼度向上のための潜時空間距離 A Latent Space Metric for Enhancing Prediction Confidence in Earth Observation Data ( http://arxiv.org/abs/2401.17342v2 ) ライセンス: Link先を確認	Ioannis Pitsiorlas, Argyro Tsantalidou, George Arvanitakis, Marios Kountouris, Charalambos Kontoes,	(参考訳) 本研究では,地球観測(EO)データを用いた回帰作業において,機械学習モデル予測の信頼性を推定するための新しいアプローチを提案する。変動型オートエンコーダアーキテクチャを利用して、EOデータセットの潜在空間表現による信頼度を導出する。この手法は、潜在表現におけるユークリッド距離と個々のMA予測における絶対誤差(AE)との相関を確立する上で重要である。本研究は,イタリア・ヴェネト地方とドイツのアッパーライン・バレーのEOデータセットに焦点をあて,蚊の集団に大きく影響された地域を対象としている。重要な発見は、MA予測のAEと提案された信頼度との0.46の顕著な相関である。この相関は、EOデータ分析と蚊量研究の両方の文脈において、AIモデルの信頼性を定量化し、AIモデルの予測の信頼性を高めるための、堅牢で新しい指標を示す。 This study presents a new approach for estimating confidence in machine learning model predictions, specifically in regression tasks utilizing Earth Observation (EO) data, with a particular focus on mosquito abundance (MA) estimation. We take advantage of a Variational AutoEncoder architecture, to derive a confidence metric by the latent space representations of EO datasets. This methodology is pivotal in establishing a correlation between the Euclidean distance in latent representations and the Absolute Error (AE) in individual MA predictions. Our research focuses on EO datasets from the Veneto region in Italy and the Upper Rhine Valley in Germany, targeting areas significantly affected by mosquito populations. A key finding is a notable correlation of 0.46 between the AE of MA predictions and the proposed confidence metric. This correlation signifies a robust, new metric for quantifying the reliability and enhancing the trustworthiness of the AI model's predictions in the context of both EO data analysis and mosquito abundance studies.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
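The relationship reported above, between Euclidean distance in latent space and absolute prediction error, can be sketched as follows. The abstract does not state which reference point the distance is measured against, so this example assumes the distance to the nearest training latent vector; the latent vectors and error values are made-up toy numbers, and the helper names are hypothetical.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_distance(z, reference):
    """Distance from latent vector z to its closest reference latent vector."""
    return min(euclidean(z, r) for r in reference)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy latent vectors (assumed to come from an encoder) and per-sample absolute errors.
train_latents = [(0.0, 0.0), (1.0, 0.0)]
test_latents = [(0.0, 1.0), (0.0, 2.0), (4.0, 4.0)]
abs_errors = [0.5, 1.0, 2.5]

distances = [nearest_distance(z, train_latents) for z in test_latents]
r = pearson(distances, abs_errors)
```

In this toy case the errors are exactly proportional to the distances, so the correlation is perfect; the 0.46 reported in the abstract reflects a much noisier real-world relationship.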
# 確率的推論によるロバスト逆グラフ Robust Inverse Graphics via Probabilistic Inference ( http://arxiv.org/abs/2402.01915v2 ) ライセンス: Link先を確認	Tuan Anh Le, Pavel Sountsov, Matthew D. Hoffman, Ben Lee, Brian Patton, Rif A. Saurous,	(参考訳) 雨や雪、霧といった汚職の存在下で、1枚の画像から3Dシーンを推測するにはどうすればよいのか? ストレートフォワード領域のランダム化は、前もって汚職の家族を知ることに依存する。本稿では,前向きの強いシーンと,前向きの非形式的一様腐敗に依存した,ベイズ的アプローチによる頑健な逆グラフ(RIG)を提案する。 1つの画像が与えられた後、RIGはシーンと汚職に対して共同で後部推論を行う。我々は、このアイデアを、前もって神経放射野(NeRF)シーンを訓練し、二次的なNeRFを用いて、非形式的な前置詞を表わす腐敗を表現することで実証する。クリーンなデータのみに基づいてトレーニングされたRIGは、完全な推論の代わりにポイント推定を行うディープ推定器や代替のNeRFアプローチより優れている。その結果、フローの正規化と拡散モデルに基づく、多くのシーン先行アーキテクチャが得られた。後者では,補助潜伏変数 (ReGAL) を用いた拡散条件付き再構成誘導法を, 汚職などの補助潜伏変数の存在下で適用する。 RIGは、シーンプリエントが生成タスクを超えてどのように使用できるかをデモしている。 How do we infer a 3D scene from a single image in the presence of corruptions like rain, snow or fog? Straightforward domain randomization relies on knowing the family of corruptions ahead of time. Here, we propose a Bayesian approach-dubbed robust inverse graphics (RIG)-that relies on a strong scene prior and an uninformative uniform corruption prior, making it applicable to a wide range of corruptions. Given a single image, RIG performs posterior inference jointly over the scene and the corruption. We demonstrate this idea by training a neural radiance field (NeRF) scene prior and using a secondary NeRF to represent the corruptions over which we place an uninformative prior. RIG, trained only on clean data, outperforms depth estimators and alternative NeRF approaches that perform point estimation instead of full inference. The results hold for a number of scene prior architectures based on normalizing flows and diffusion models. For the latter, we develop reconstruction-guidance with auxiliary latents (ReGAL)-a diffusion conditioning algorithm that is applicable in the presence of auxiliary latent variables such as the corruption. RIG demonstrates how scene priors can be used beyond generation tasks.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# 大規模言語モデルはインコンテキストをどうやって学習するか? インコンテキストヘッドのクエリとキーマトリクスは、メトリック学習のための2つの塔である How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning ( http://arxiv.org/abs/2402.02872v2 ) ライセンス: Link先を確認	Zeping Yu, Sophia Ananiadou,	(参考訳) 文分類作業における文脈内学習(ICL)のメカニズムを意味的に無関係なラベル(foo/bar)を用いて検討した。 ICL の精度は 87.6% から 24.4% に大きく影響している。この現象を理解するために、これらのヘッド内の値出力ベクトルを分析し、各ラベル位置のベクトルが対応するラベルに関する実質的な情報を含んでいることを発見する。さらに,「foo」から「bar」への予測シフトは,「foo」と「bar」の位置におけるこれらの頭部の注意点の減少と増加によるものと考えられた。そこで本研究では,テキスト内ヘッドにおいて,値出力行列がラベル特徴を抽出し,問合せキー行列が最終位置と各ラベル位置の類似性を演算する,という仮説を提案する。クエリとキー行列は、最後の位置の特徴とラベル位置でのそれぞれのデモンストレーションの類似度を学習する2つのタワーと見なすことができる。この仮説を用いて、ICLにおける多数ラベルバイアスと回帰バイアスを説明し、これらのバイアスをそれぞれ22%と17%削減する2つの方法を提案する。 We investigate the mechanism of in-context learning (ICL) on sentence classification tasks with semantically-unrelated labels ("foo"/"bar"). We find intervening in only 1% heads (named "in-context heads") significantly affects ICL accuracy from 87.6% to 24.4%. To understand this phenomenon, we analyze the value-output vectors in these heads and discover that the vectors at each label position contain substantial information about the corresponding labels. Furthermore, we observe that the prediction shift from "foo" to "bar" is due to the respective reduction and increase in these heads' attention scores at "foo" and "bar" positions. Therefore, we propose a hypothesis for ICL: in in-context heads, the value-output matrices extract label features, while the query-key matrices compute the similarity between the features at the last position and those at each label position. The query and key matrices can be considered as two towers that learn the similarity metric between the last position's features and each demonstration at label positions. Using this hypothesis, we explain the majority label bias and recency bias in ICL and propose two methods to reduce these biases by 22% and 17%, respectively.	
翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# 協調・競争同時ゲームにおけるゼロショットインタラクションのマスタリング Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games ( http://arxiv.org/abs/2402.03136v2 ) ライセンス: Link先を確認	Yannik Mahlau, Frederik Schubert, Bodo Rosenhahn,	(参考訳) セルフプレイとプランニングの組み合わせは,例えばChessやGoなど,逐次手番のゲームで大きな成功を収めてきた。しかし、AlphaZeroのようなアルゴリズムを同時手番ゲームに適用することは新たな課題となる。これらのゲームでは、他のエージェントが異なるナッシュ均衡を選択したり、そもそも最適にプレイしなかったりする可能性があるため、他のエージェントの同時行動に関する情報が欠けていることが制限要因となる。したがって、同時手番ゲームで他のエージェントと相互作用する際には、他のエージェントの振る舞いをモデル化することが不可欠である。そこで我々はAlbatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-playを提案する。 AlbatrossはSBRLE(Smooth Best Response Logit Equilibrium)という新しい均衡概念に従ってプレイすることを学び、任意の強さのエージェントとの協調と競争を可能にする。我々は,協調的および競争的な同時手番の完全情報ゲーム群で,Albatrossの広範な評価を行う。 AlphaZeroとは対照的に、Albatrossは競争ゲームBattlesnakeにおいて弱いエージェントにつけ込むことができる。さらに、協調ゲームのOvercookedベンチマークでは、従来の最先端と比べて37.6%の改善が得られた。 The combination of self-play and planning has achieved great successes in sequential games, for instance in Chess and Go. However, adapting algorithms such as AlphaZero to simultaneous games poses a new challenge. In these games, missing information about concurrent actions of other agents is a limiting factor as they may select different Nash equilibria or do not play optimally at all. Thus, it is vital to model the behavior of the other agents when interacting with them in simultaneous games. To this end, we propose Albatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-play. Albatross learns to play the novel equilibrium concept of a Smooth Best Response Logit Equilibrium (SBRLE), which enables cooperation and competition with agents of any playing strength. We perform an extensive evaluation of Albatross on a set of cooperative and competitive simultaneous perfect-information games. In contrast to AlphaZero, Albatross is able to exploit weak agents in the competitive game of Battlesnake. 
Additionally, it yields an improvement of 37.6% compared to previous state of the art in the cooperative Overcooked benchmark.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# MLLM-as-a-Judge:ビジョンランゲージベンチマークによるマルチモーダルLLM-as-a-Judgeの評価 MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark ( http://arxiv.org/abs/2402.04788v3 ) ライセンス: Link先を確認	Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Yao Wan, Pan Zhou, Lichao Sun,	(参考訳) 近年,MLLM (Multimodal Large Language Models) が大きな注目を集めており,汎用人工知能における顕著な可能性を示している。しかし、MLLMの有用性を評価することは、主に人間の嗜好に合致するマルチモーダルベンチマークが欠如していることから、かなりの課題を呈している。本稿では, LLM における LLM-as-a-Judge の概念から着想を得て, MLLM-as-a-Judge と呼ばれる新しいベンチマークを導入し,Scoring Evaluation,Pair Comparison,Batch Rankingという3つの異なるタスクを通じて,多様なモダリティにわたって審査を支援するMLLMの能力を評価する。本研究は, MLLMがPair Comparisonにおいて顕著な人間らしい識別力を示す一方で, Scoring EvaluationとBatch Rankingにおいては人間の嗜好と大きく乖離することを明らかにした。さらに, 詳細な検討により,GPT-4Vのような先進モデルにおいても, 多様なバイアス, 幻覚応答, 判断の不整合など, LLMの判定能力に持続的な課題があることが明らかになった。これらの知見は,MLLMを十分に信頼できる評価器とみなす前に、改善とさらなる研究努力が急務であることを強調している。これを踏まえ、審査員として機能するMLLMの領域における継続的な開発を支援するための追加的な取り組みを提唱する。コードとデータセットはプロジェクトのホームページで公開されています: \url{https://mllm-judge.github.io/}。 Multimodal Large Language Models (MLLMs) have gained significant attention recently, showing remarkable potential in artificial general intelligence. However, assessing the utility of MLLMs presents considerable challenges, primarily due to the absence of multimodal benchmarks that align with human preferences. Drawing inspiration from the concept of LLM-as-a-Judge within LLMs, this paper introduces a novel benchmark, termed MLLM-as-a-Judge, to assess the ability of MLLMs in assisting judges across diverse modalities, encompassing three distinct tasks: Scoring Evaluation, Pair Comparison, and Batch Ranking. Our study reveals that, while MLLMs demonstrate remarkable human-like discernment in Pair Comparison, there is a significant divergence from human preferences in Scoring Evaluation and Batch Ranking. Furthermore, a closer examination reveals persistent challenges in the judgment capacities of LLMs, including diverse biases, hallucinatory responses, and inconsistencies in judgment, even in advanced models such as GPT-4V. 
These findings emphasize the pressing need for enhancements and further research efforts to be undertaken before regarding MLLMs as fully reliable evaluators. In light of this, we advocate for additional efforts dedicated to supporting the continuous development within the domain of MLLM functioning as judges. The code and dataset are publicly available at our project homepage: \url{https://mllm-judge.github.io/}.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# 複素数値ニューラルネットワークと不規則分散マイクロホンを用いた室内伝達関数再構成 Room Transfer Function Reconstruction Using Complex-valued Neural Networks and Irregularly Distributed Microphones ( http://arxiv.org/abs/2402.04866v3 ) ライセンス: Link先を確認	Francesca Ronchini, Luca Comanducci, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti,	(参考訳) 室内の複素音場を計算するのに必要な室内伝達関数の再構成には、いくつかの重要な実世界の応用がある。しかし、多くの場合、非現実的な数のマイクロホンが必要となる。近年, 従来の信号処理法に加えて, 室内に散在する点での非常に限られた測定から室内伝達関数を再構成するために、深層学習技術が適用されている。本稿では,少数の不規則に配置されたマイクロホンを用いて,最初の室内共鳴の周波数範囲における室内伝達関数を推定するために,複素数値ニューラルネットワークを用いる。私たちの知る限りでは、複素数値ニューラルネットワークが室内伝達関数の推定に使用されるのは、これが初めてである。複素数値最適化の利点を分析するため,提案手法を最先端のカーネルベース信号処理手法と比較し, 提案手法が位相精度と再構成音場の全体的な品質の面で有意な利点を示すことを示す。参考として、このモデルを、同様の構造を持つが実数値ニューラルネットワークを適用して音場の振幅のみを再構成するデータ駆動型アプローチとも比較する。 Reconstructing the room transfer functions needed to calculate the complex sound field in a room has several important real-world applications. However, an impractical number of microphones is often required. Recently, in addition to classical signal processing methods, deep learning techniques have been applied to reconstruct the room transfer function starting from a very limited set of measurements at scattered points in the room. In this paper, we employ complex-valued neural networks to estimate room transfer functions in the frequency range of the first room resonances, using a few irregularly distributed microphones. To the best of our knowledge, this is the first time that complex-valued neural networks are used to estimate room transfer functions. To analyze the benefits of applying complex-valued optimization to the considered task, we compare the proposed technique with a state-of-the-art kernel-based signal processing approach for sound field reconstruction, showing that the proposed technique exhibits relevant advantages in terms of phase accuracy and overall quality of the reconstructed sound field. 
For informative purposes, we also compare the model with a similarly-structured data-driven approach that, however, applies a real-valued neural network to reconstruct only the magnitude of the sound field.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# 知識蒸留におけるグラフニューラルネットワークを用いた大規模言語モデル Large Language Model Meets Graph Neural Network in Knowledge Distillation ( http://arxiv.org/abs/2402.05894v4 ) ライセンス: Link先を確認	Shengxiang Hu, Guobing Zou, Song Yang, Yanglan Gan, Bofeng Zhang, Yixin Chen,	(参考訳) サービス指向アーキテクチャでは、信頼性を維持し、ユーザの満足度を高めるために、QoS(Quality of Service)を正確に予測することが重要である。しかし、既存の手法はユーザとサービス間の高次の潜在的な協調関係を見落としがちであり、個々のユーザ・サービス呼び出しごとに特徴学習を動的に調整することもできない。これらはいずれも正確な特徴を学習する上で重要であるため、大きな課題が残っている。さらに、QoSの変遷を捉えるためにRNNに依存すると、長距離依存関係の扱いが難しいため、長期的なトレンドを検出するモデルの能力が損なわれる。これらの課題に対処するために、時間を考慮したQoS予測のための Target-Prompt Online Graph Collaborative Learning (TOGCL) フレームワークを提案する。 TOGCLは、動的なユーザ・サービス呼び出しグラフを利用して過去のインタラクションをモデル化し、ユーザ・サービス間の関係を包括的に表現する。このグラフに基づいて、ターゲットのユーザ/サービスとその近傍との間の暗黙的な協調関係と、関連する過去のQoS値とを同時に考慮しながら、各時間スライスにおけるユーザとサービスの深い潜在特徴をオンラインで抽出するターゲットプロンプトグラフアテンションネットワークを開発する。さらに、ユーザやサービスの時間的な特徴変化パターンを明らかにするために多層Transformerエンコーダを用い、時間を考慮したQoS予測を実現する。 WS-DREAMデータセットで実施した大規模な実験により、提案したTOGCLフレームワークは、複数の指標にわたって最先端手法を大幅に上回り、最大38.80\%の改善を達成した。これらの結果は、正確な時間的QoS予測におけるTOGCLフレームワークの有効性を裏付けるものである。 In service-oriented architectures, accurately predicting the Quality of Service (QoS) is crucial for maintaining reliability and enhancing user satisfaction. However, significant challenges remain due to existing methods always overlooking high-order latent collaborative relationships between users and services and failing to dynamically adjust feature learning for every specific user-service invocation, which are critical for learning accurate features. Additionally, reliance on RNNs for capturing QoS evolution hampers models' ability to detect long-term trends due to difficulties in managing long-range dependencies. To address these challenges, we propose the Target-Prompt Online Graph Collaborative Learning (TOGCL) framework for temporal-aware QoS prediction. 
TOGCL leverages a dynamic user-service invocation graph to model historical interactions, providing a comprehensive representation of user-service relationships. Building on this graph, it develops a target-prompt graph attention network to extract online deep latent features of users and services at each time slice, simultaneously considering implicit collaborative relationships between target users/services and their neighbors, as well as relevant historical QoS values. Additionally, a multi-layer Transformer encoder is employed to uncover temporal feature evolution patterns of users and services, leading to temporal-aware QoS prediction. Extensive experiments conducted on the WS-DREAM dataset demonstrate that our proposed TOGCL framework significantly outperforms state-of-the-art methods across multiple metrics, achieving improvements of up to 38.80\%. These results underscore the effectiveness of the TOGCL framework for precise temporal QoS prediction.	翻訳日:2024-06-12 22:42:29 公開日:2024-06-11
# 光子数分解量子貯留層計算 Photon Number-Resolving Quantum Reservoir Computing ( http://arxiv.org/abs/2402.06339v2 ) ライセンス: Link先を確認	Sam Nerenberg, Oliver Neill, Giulia Marcucci, Daniele Faccio,	(参考訳) ニューロモルフィックプロセッサは、物理人工ニューロンの実装を通じて機械学習アルゴリズムの効率を改善し、計算を行う。しかし、効率的な古典的ニューロモルフィックプロセッサが様々な形で実証されている一方で、実用的な量子ニューロモルフィックプラットフォームはまだ開発の初期段階にある。本稿では、光子数分解された出力状態の検出によって可能となるフォトニック量子貯水池計算のための固定光ネットワークを提案する。これは、高次元ヒルベルト空間にアクセスしながら入力量子状態に必要な複雑さを著しく減少させる。このアプローチは、現在利用可能なテクノロジで実装可能であり、量子機械学習への参入障壁を低くする。 Neuromorphic processors improve the efficiency of machine learning algorithms through the implementation of physical artificial neurons to perform computations. However, whilst efficient classical neuromorphic processors have been demonstrated in various forms, practical quantum neuromorphic platforms are still in the early stages of development. Here we propose a fixed optical network for photonic quantum reservoir computing that is enabled by photon number-resolved detection of the output states. This significantly reduces the required complexity of the input quantum states while still accessing a high-dimensional Hilbert space. The approach is implementable with currently available technology and lowers the barrier to entry to quantum machine learning.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# 独立線形関数近似を持つマルコフゲームに対する改良されたサンプル複雑性 Refined Sample Complexity for Markov Games with Independent Linear Function Approximation ( http://arxiv.org/abs/2402.07082v2 ) ライセンス: Link先を確認	Yan Dai, Qiwen Cui, Simon S. Du,	(参考訳) マルコフゲーム(MG)はマルチエージェント強化学習(MARL)の重要なモデルである。「マルチエージェントの呪い」(アルゴリズムの性能がエージェント数に対して指数関数的に低下すること)は、いくつかの最近の研究(Daskalakis et al., 2023; Cui et al., 2023; Wang et al., 2023)が現れるまで、長らく避けられないと信じられてきた。これらの研究によってマルチエージェントの呪いは解決されたが、状態空間が極めて大きく(線形)関数近似を用いる場合には、$O(T^{-1/4})$という遅い収束率になるか、あるいは行動数 $A_{\max}$ への多項式依存が生じるかのいずれかであった。後者は、損失関数が時間とともに任意に変化しうる場合でも、単一エージェントの場合には回避可能である。本稿ではまず,Wang et al. (2023) による AVLPR フレームワークを改良し,準最適性ギャップのデータ依存(すなわち確率的)な悲観的推定を設計するという着想により,プラグインアルゴリズムのより幅広い選択を可能にする。独立線形関数近似を持つMGに特化した場合には、時折生じる極端な推定誤差をカバーするために、新しい行動依存ボーナスを提案する。単一エージェントRLの文献における最先端の技術を用いて,マルチエージェントの呪いに対処し,最適な$O(T^{-1/2})$収束率を達成し,かつ$\text{poly}(A_{\max})$依存性を回避することを同時に満たす初めてのアルゴリズムを与える。 Markov Games (MG) is an important model for Multi-Agent Reinforcement Learning (MARL). It was long believed that the "curse of multi-agents" (i.e., the algorithmic performance drops exponentially with the number of agents) is unavoidable until several recent works (Daskalakis et al., 2023; Cui et al., 2023; Wang et al., 2023). While these works resolved the curse of multi-agents, when the state spaces are prohibitively large and (linear) function approximations are deployed, they either had a slower convergence rate of $O(T^{-1/4})$ or brought a polynomial dependency on the number of actions $A_{\max}$ -- which is avoidable in single-agent cases even when the loss functions can arbitrarily vary with time. This paper first refines the AVLPR framework by Wang et al. (2023), with an insight of designing data-dependent (i.e., stochastic) pessimistic estimation of the sub-optimality gap, allowing a broader choice of plug-in algorithms. When specialized to MGs with independent linear function approximations, we propose novel action-dependent bonuses to cover occasionally extreme estimation errors. 
With the help of state-of-the-art techniques from the single-agent RL literature, we give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency simultaneously.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# 量子相転移の量子リザバープロービング Quantum reservoir probing of quantum phase transitions ( http://arxiv.org/abs/2402.07097v2 ) ライセンス: Link先を確認	Kaito Kobayashi, Yukitoshi Motome,	(参考訳) 量子相転移は、量子多体系に現れる非常に顕著な現象である。しかし、平衡系においてそれらを正確に同定することは、理論的にも実験的にも大きな課題である。これまでに、グローバルな量子クエンチを用いた動的検出プロトコルが提案されており、そこでは大域的な非平衡励起から転移が識別される。本研究では,局所量子クエンチによって誘起される局所的な非平衡励起を通じて,量子相転移が検出可能であることを示す。クエンチ後のダイナミクスは、局所クエンチ操作と量子系の固有のダイナミクスの両方に影響されるが、前者の効果のみが、量子リザバープロービング(QRP)と呼ばれる最先端のフレームワークによって抽出される。 QRPを通じて、局所クエンチの影響は量子相ごとに異なり、量子臨界点付近で増幅される量子ゆらぎによって著しく抑制されることがわかり、これにより相境界を正確に描き出せる。我々は、QRPが、典型的な可積分量子系および非可積分量子系における量子相転移に加え、トポロジカル量子相転移までも、単一サイトの観測量を用いる同一のフレームワーク内で検出できることを実証した。 Quantum phase transitions are highly remarkable phenomena manifesting in quantum many-body systems. However, their precise identifications in equilibrium systems pose significant theoretical and experimental challenges. Thus far, dynamical detection protocols employing global quantum quenches have been proposed, wherein transitions are discerned from global nonequilibrium excitations. In this work, we demonstrate that quantum phase transitions can be detected through localized out-of-equilibrium excitations induced by local quantum quenches. While the resulting dynamics after the quench are influenced by both the local quench operation and the intrinsic dynamics of the quantum system, the effects of the former are exclusively extracted through the cutting-edge framework called quantum reservoir probing (QRP). Through the QRP, we find that the impacts of the local quenches vary across different quantum phases and are significantly suppressed by quantum fluctuations amplified near quantum critical points, thereby precisely delineating phase boundaries. We demonstrate that the QRP can detect quantum phase transitions in the paradigmatic integrable and nonintegrable quantum systems, and even topological quantum phase transitions, all within the identical framework employing single-site observables.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# GALA3D:Layout-guided Generative Gaussian Splattingによるテキストから3D複合シーン生成に向けて GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting ( http://arxiv.org/abs/2402.07207v2 ) ライセンス: Link先を確認	Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang,	(参考訳) 本稿では,効果的な合成的テキストから3D生成のための,レイアウト誘導制御を備えた生成的3Dガウス(generative 3D GAussians with LAyout-guided control)であるGALA3Dを提案する。まず,大規模言語モデル(LLM)を用いて初期レイアウトを生成し,適応的な幾何学的制約を伴う3次元コンテンツ生成のためのレイアウト誘導型3次元ガウス表現を導入する。次に、条件付き拡散を用いたインスタンス・シーン合成最適化機構を提案し、複数のオブジェクト間で一貫した幾何、テクスチャ、スケール、および正確な相互作用を持つリアルな3Dシーンを協調的に生成すると同時に、LLMから抽出した粗いレイアウト事前情報を生成されたシーンと整合するように調整する。実験の結果,GALA3Dは最先端のシーンレベルの3Dコンテンツ生成と制御可能な編集のためのユーザフレンドリーなエンドツーエンドのフレームワークであり,シーン内のオブジェクトレベルのエンティティの高い忠実度も確保していることがわかった。ソースコードとモデルはgala3d.github.ioで公開される予定である。 We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an instance-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene. The source codes and models will be available at gala3d.github.io.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# Mercury: コード向け大規模言語モデルのためのコード効率ベンチマーク Mercury: A Code Efficiency Benchmark for Code Large Language Models ( http://arxiv.org/abs/2402.07844v4 ) ライセンス: Link先を確認	Mingzhe Du, Anh Tuan Luu, Bin Ji, Qian Liu, See-Kiong Ng,	(参考訳) コード向け大規模言語モデル(Code LLM)の評価における最近の進展の中で、既存のベンチマークは主に生成されたコードの機能的正しさに焦点を合わせており、計算効率の重要性を見過ごしている。このギャップを埋めるために、Code LLMのための最初のコード効率ベンチマークであるMercuryを提示する。 Mercuryは1,889のPythonタスクで構成され、各タスクには実世界の効率ベースラインとなる十分な数の解答が付属しており、実行時間分布の包括的な分析を可能にする。この分布に基づいて,機能的正しさとコード効率を同時に反映するために,実行時間のパーセンタイルで重み付けしたPassスコアを計算する新たな指標Beyondを導入する。 Mercuryでは、主要なCode LLMはPassで65%を達成できる一方、Beyondでは50%未満にとどまる。理想的なBeyondスコアはPassスコアと一致することを考えると、これは、Code LLMが機能的に正しいコードを生成する優れた能力を示す一方で、その効率には顕著なギャップが残ることを示している。最後に、我々の実験により、コード効率を高めるうえで、DPO(Direct Preference Optimization)がSupervised Fine Tuning(SFT)と比較して堅牢なベースラインとなることがわかった。私たちのコードとデータはGitHub( https://github.com/Elfsong/Mercury )で入手可能である。 Amidst the recent strides in evaluating Large Language Models for Code (Code LLMs), existing benchmarks have mainly focused on the functional correctness of generated code, neglecting the importance of their computational efficiency. To fill the gap, we present Mercury, the first code efficiency benchmark for Code LLMs. It comprises 1,889 Python tasks, each accompanied by adequate solutions that serve as real-world efficiency baselines, enabling a comprehensive analysis of the runtime distribution. Based on the distribution, we introduce a new metric Beyond, which computes a runtime-percentile-weighted Pass score to reflect functional correctness and code efficiency simultaneously. On Mercury, leading Code LLMs can achieve 65% on Pass, while less than 50% on Beyond. Given that an ideal Beyond score would be aligned with the Pass score, it indicates that while Code LLMs exhibit impressive capabilities in generating functionally correct code, there remains a notable gap in their efficiency. 
Finally, our empirical experiments reveal that Direct Preference Optimization (DPO) serves as a robust baseline for enhancing code efficiency compared with Supervised Fine Tuning (SFT), which paves a promising avenue for future exploration of efficient code generation. Our code and data are available on GitHub: https://github.com/Elfsong/Mercury.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# Fenchel--Young損失によるオンライン構造化予測と,ロジスティック損失を用いたオンライン多クラス分類におけるサロゲート後悔の改善 Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss ( http://arxiv.org/abs/2402.08180v2 ) ライセンス: Link先を確認	Shinsaku Sakaue, Han Bao, Taira Tsuchiya, Taihei Oki,	(参考訳) 本稿では,全情報フィードバックを用いたオンライン構造化予測について検討する。オンライン多クラス分類において、Van der Hoeven (2020) は、洗練された \emph{exploit-the-surrogate-gap} フレームワークを導入することで、時間ホライズンに依存しない \emph{有限の} サロゲート後悔境界を確立した。しかし、このフレームワークは、推定スコアを出力に変換するための分類固有の手続きに依存するため、主に多クラス分類に限られてきた。我々は,多クラス分類のロジスティック損失を特別な場合として含む大規模なサロゲート損失の族である \emph{Fenchel--Young損失} を用いたオンライン構造化予測に exploit-the-surrogate-gap フレームワークを拡張し,種々の構造化予測問題において有限のサロゲート後悔境界を得る。この目的のために、推定スコアを一般の構造化出力に変換する \emph{ランダム化復号} を提案し、解析する。さらに、ロジスティック損失を用いたオンライン多クラス分類にこの復号を適用することで、$O(\| \mathbf{U} \|_\mathrm{F}^2)$ のサロゲート後悔境界を得る。ここで $\mathbf{U}$ は最良のオフライン線形推定器であり、$\| \cdot \|_\mathrm{F}$ はフロベニウスノルムを表す。この境界は対数因子を除いてタイトであり、Van der Hoeven (2020) による従来の境界 $O(d\| \mathbf{U} \|_\mathrm{F}^2)$ を、クラス数 $d$ の因子だけ改善する。 This paper studies online structured prediction with full-information feedback. For online multiclass classification, Van der Hoeven (2020) established \emph{finite} surrogate regret bounds, which are independent of the time horizon, by introducing an elegant \emph{exploit-the-surrogate-gap} framework. However, this framework has been limited to multiclass classification primarily because it relies on a classification-specific procedure for converting estimated scores to outputs. We extend the exploit-the-surrogate-gap framework to online structured prediction with \emph{Fenchel--Young losses}, a large family of surrogate losses that includes the logistic loss for multiclass classification as a special case, obtaining finite surrogate regret bounds in various structured prediction problems. To this end, we propose and analyze \emph{randomized decoding}, which converts estimated scores to general structured outputs. 
Moreover, by applying our decoding to online multiclass classification with the logistic loss, we obtain a surrogate regret bound of $O(\| \mathbf{U} \|_\mathrm{F}^2)$, where $\mathbf{U}$ is the best offline linear estimator and $\| \cdot \|_\mathrm{F}$ denotes the Frobenius norm. This bound is tight up to logarithmic factors and improves the previous bound of $O(d\| \mathbf{U} \|_\mathrm{F}^2)$ due to Van der Hoeven (2020) by a factor of $d$, the number of classes.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# 非ペアのマスク・テキスト監督によるオープンボキャブラリセグメンテーション Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision ( http://arxiv.org/abs/2402.08960v2 ) ライセンス: Link先を確認	Zhaoqing Wang, Xiaobo Xia, Ziye Chen, Xiao He, Yandong Guo, Mingming Gong, Tongliang Liu,	(参考訳) 現在の最先端のオープンボキャブラリセグメンテーション手法は、一般に画像・マスク・テキストの三つ組アノテーションによる監督に依存している。しかし、そのような詳細なアノテーションの取得は労働集約的であり、複雑な現実世界のシナリオではスケーラビリティの課題となる。既存の弱教師付きアプローチは画像とテキストのペアを利用して高価なアノテーションコストを削減しているが、マスクの監督がないため、モデルが複数のインスタンスを見つけ出し、類似の意味を持つピクセルを正確にグループ化することが難しく、汎用性と性能が著しく損なわれる。本稿では、独立かつ効率的に収集できる非ペアの画像・マスクペアと画像・テキストペアから学習する、新しい弱教師付きオープンボキャブラリセグメンテーションフレームワークUnpair-Segを紹介する。 Unpair-Segはまず一連のバイナリマスクを予測し、信頼度の高いマスクとテキストエンティティのペアを特定して擬似ラベルを生成する。次に、これらの擬似ラベルに基づいて、領域埋め込みとテキスト埋め込みを整列させる特徴アダプタを訓練し、オープンボキャブラリセグメンテーションを実現する。しかし、マスクとエンティティの対応に内在するノイズにより、信頼できるペアを得ることは難しい。そこで我々は、大規模視覚言語モデルを用いて入力画像を再キャプション化し、正確なエンティティを抽出するとともに、ノイズの多いマスクとエンティティのペアを減らすためのマルチスケールマッチング戦略を設計する。我々のUnpair-Segフレームワークは、ADE-847とPASCAL Context-459データセットでそれぞれ14.6\%と19.5\%のmIoUを達成し、完全教師付き手法と弱教師付き手法のギャップを大幅に縮めている。 Current state-of-the-art open-vocabulary segmentation methods typically rely on image-mask-text triplet annotations for supervision. However, acquiring such detailed annotations is labour-intensive and poses scalability challenges in complex real-world scenarios. While existing weakly-supervised approaches leverage image-text pairs to reduce the expensive annotation cost, the lack of mask supervision makes it difficult for the model to locate multiple instances and accurately group pixels with similar semantics, significantly hampering versatility and performance. In this paper, we introduce Unpair-Seg, a novel weakly-supervised open-vocabulary segmentation framework that learns from unpaired image-mask and image-text pairs, which can be independently and efficiently collected. Unpair-Seg initially predicts a set of binary masks and generates pseudo labels by identifying confident pairs of masks and text entities. 
We then train a feature adapter to align region embeddings with text embeddings based on these pseudo labels, achieving open-vocabulary segmentation. However, the inherent noise in the mask-entity correspondence poses a challenge to obtaining reliable pairs. To address this, we employ a vision-language large model to re-caption the input images and extract precise entities, and we design a multi-scale matching strategy to reduce noisy mask-entity pairs. Our Unpair-Seg framework demonstrates impressive performance, achieving 14.6\% and 19.5\% mIoU on the ADE-847 and PASCAL Context-459 datasets, significantly narrowing the gap between fully-supervised and weakly-supervised methods.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# 自己アライメント・フォー・ファクチュアリティ:自己評価によるLLMの幻覚の軽減 Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation ( http://arxiv.org/abs/2402.09267v2 ) ライセンス: Link先を確認	Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng,	(参考訳) 人間的な能力の増大にもかかわらず、大きな言語モデル(LLM)は、たとえ関連する知識を持っていたとしても、事実的不正確さ、すなわち「幻覚」に苦しむことが多い。これらの幻覚に対処するためには、現在のアプローチは通常、高品質な人間の事実性アノテーションを必要とする。本研究では, LLMの自己評価能力を活用し, モデルが現実性に向かうためのトレーニング信号を提供する自己アライメント・フォー・ファクチュアリティについて検討する。具体的には、自己評価コンポーネントであるSelf-Evalを組み込んで、内部知識のみに基づいて、LLMが生成した応答の事実性を検証する。さらに,モデルの信頼性評価とキャリブレーションを改善し,LLMの自己評価能力を高めるために,自己知識チューニング(SK-Tuning)を設計する。次に、これらの自己アノテートされた応答を利用して、直接選好最適化アルゴリズムを用いてモデルを微調整する。提案手法は,TruthfulQAとBioGENの3つの重要な知識集約タスクにおいて,Llamaファミリーモデルに対する現実的精度を大幅に向上させることを示す。 Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e. "hallucinations", even when they hold relevant knowledge. To address these hallucinations, current approaches typically necessitate high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses solely based on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM's self-evaluation ability by improving the model's confidence estimation and calibration. We then utilize these self-annotated responses to fine-tune the model via Direct Preference Optimization algorithm. We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks on TruthfulQA and BioGEN.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# ランダム幾何グラフの幾何学的再構成 Reconstructing the Geometry of Random Geometric Graphs ( http://arxiv.org/abs/2402.09591v2 ) ライセンス: Link先を確認	Han Huang, Pakawut Jiradilok, Elchanan Mossel,	(参考訳) ランダム幾何学グラフは、距離空間上で定義されたランダムグラフモデルである。そのようなモデルは、計量空間からの最初のサンプリングポイントによって定義され、その後、各ペアのサンプリングポイントと、その距離に依存する確率を、ペア間で独立に連結する。この研究では、多様体の仮定の下でサンプリングされたグラフから基底空間の幾何を効率的に再構成する方法、すなわち、基底空間が低次元多様体であり、接続確率が$\mathbb{R}^N$ の与えられた多様体の埋め込みにおける点間のユークリッド距離の厳密な減少関数であることを仮定する。我々の研究は、多様体学習における大きな研究を補完するものであり、そこでは、その(近似的な)距離とともに、多様体でサンプリングされた点から多様体を復元することを目的としている。 Random geometric graphs are random graph models defined on metric spaces. Such a model is defined by first sampling points from a metric space and then connecting each pair of sampled points with probability that depends on their distance, independently among pairs. In this work, we show how to efficiently reconstruct the geometry of the underlying space from the sampled graph under the manifold assumption, i.e., assuming that the underlying space is a low dimensional manifold and that the connection probability is a strictly decreasing function of the Euclidean distance between the points in a given embedding of the manifold in $\mathbb{R}^N$. Our work complements a large body of work on manifold learning, where the goal is to recover a manifold from sampled points sampled in the manifold along with their (approximate) distances.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# 学習エージェントを伴う一般化プリンシパル・エージェント問題 Generalized Principal-Agent Problem with a Learning Agent ( http://arxiv.org/abs/2402.09721v3 ) ライセンス: Link先を確認	Tao Lin, Yiling Chen,	(参考訳) Stackelbergゲーム、契約設計、ベイズ的説得(Bayesian persuasion)を含む一般化プリンシパル・エージェント問題は、エージェントがプリンシパルのコミットした戦略に最適応答する経済問題のクラスである。本研究では,プリンシパルがコミットメント能力を持たず,エージェントがアルゴリズムを用いてプリンシパルへの応答を学習するという仮定の下で,繰り返し一般化プリンシパル・エージェント問題を検討する。この問題を, 近似的に最適応答するエージェントを伴うワンショットの一般化プリンシパル・エージェント問題に帰着させる。この帰着を用いて,次のことを示す。(1) エージェントが文脈付きノーリグレット学習アルゴリズムを使用する場合,プリンシパルは,古典的な非学習モデルにおける最適効用からエージェントのリグレットの平方根を引いた値以上の効用を保証できる。(2) エージェントが文脈付きノースワップリグレット学習アルゴリズムを使用する場合,プリンシパルは,非学習モデルにおける最適効用にエージェントのスワップリグレットを加えた値を超える効用を得ることができない。しかし(3) エージェントが平均ベースの学習アルゴリズム(ノーリグレットではありうるがノースワップリグレットではない)を使用する場合、プリンシパルは非学習モデルよりも大幅に良い結果を得られる。これらの一般的な結果は、学習エージェントを伴うStackelbergゲームおよび契約設計における従来の結果を精緻化するだけでなく、学習エージェントを伴うベイズ的説得に関する新しい結果をもたらす。 Generalized principal-agent problems, including Stackelberg games, contract design, and Bayesian persuasion, are a class of economic problems where an agent best responds to a principal's committed strategy. We study repeated generalized principal-agent problems under the assumption that the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal. We reduce this problem to a one-shot generalized principal-agent problem with an approximately-best-responding agent. Using this reduction, we show that: (1) if the agent uses contextual no-regret learning algorithms, then the principal can guarantee a utility that is at least the principal's optimal utility in the classic non-learning model minus the square root of the agent's regret; (2) if the agent uses contextual no-swap-regret learning algorithms, then the principal cannot obtain any utility more than the optimal utility in the non-learning model plus the agent's swap regret. But (3) if the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can do significantly better than the non-learning model. 
These general results not only refine previous results in Stackelberg games and contract design with learning agents but also lead to new results for Bayesian persuasion with a learning agent.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# 二次元構造環境における巨大原子との脱コヒーレンスを回避する Avoiding decoherence with giant atoms in a two-dimensional structured environment ( http://arxiv.org/abs/2402.10879v2 ) ライセンス: Link先を確認	Emil Raaholt Ingelsten, Anton Frisk Kockum, Ariadna Soro,	(参考訳) 巨大原子は、複数の離散点で光にカップリングできる量子エミッタである。そのような原子は1次元の導波路を介してデコヒーリングすることなく相互作用することが示されている。本稿では,有限エネルギーバンドとバンドギャップを特徴とする2次元2乗格子に結合した巨大原子がどのように振る舞うかを考察する。特に、巨大原子が脱コヒーレンスを避けるために、連続体(BIC)における境界状態が果たす役割について述べる。数値計算法を開発することにより、系の力学を解明し、1つの巨大原子内での干渉BICの出現と、多くの巨大原子間での振動BICの出現を示すことができる。このようにして、2次元格子のデコヒーレンスから保護される原子結合点の幾何学的配置を求める。これらの工学的な結果から、光と物質の間の相互作用は量子シミュレーションや量子情報処理に応用できる可能性がある。 Giant atoms are quantum emitters that can couple to light at multiple discrete points. Such atoms have been shown to interact without decohering via a one-dimensional waveguide. Here, we study how giant atoms behave when coupled to a two-dimensional square lattice of coupled cavities, an environment characterized by a finite energy band and band gaps. In particular, we describe the role that bound states in the continuum (BICs) play in how giant atoms avoid decoherence. By developing numerical methods, we are able to investigate the dynamics of the system and show the appearance of interfering BICs within a single giant atom, as well as oscillating BICs between many giant atoms. In this way, we find the geometric arrangements of atomic coupling points that yield protection from decoherence in the two-dimensional lattice. These results on engineering the interaction between light and matter may find applications in quantum simulation and quantum information processing.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# セントロイドに基づく効率的な最小ベイズリスク復号 Centroid-Based Efficient Minimum Bayes Risk Decoding ( http://arxiv.org/abs/2402.11197v2 ) ライセンス: Link先を確認	Hiroyuki Deguchi, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe, Hideki Tanaka, Masao Utiyama,	(参考訳) 最小ベイズリスク(MBR)復号は,人間の評価と高い相関を持つニューラル評価指標であるCOMETを用いることで,最先端の翻訳性能を達成した。しかし、MBR復号は、翻訳仮説とすべての参照翻訳との間の期待スコアを計算するため、2乗オーダーの時間を必要とする。我々は,MBR復号を高速化するために,セントロイドに基づくMBR(CBMBR)復号を提案する。提案手法は特徴空間内で参照翻訳をクラスタリングし,各クラスタのセントロイドを用いてスコアを計算する。実験の結果,我々のCBMBRは期待スコア計算の復号速度を5.7倍に向上させただけでなく,WMT'22 En$\leftrightarrow$Ja, En$\leftrightarrow$De, En$\leftrightarrow$Zh, およびWMT'23 En$\leftrightarrow$Jaの翻訳タスクにおいて,翻訳品質でバニラMBR復号を最大0.5 COMET上回った。 Minimum Bayes risk (MBR) decoding achieved state-of-the-art translation performance by using COMET, a neural metric that has a high correlation with human evaluation. However, MBR decoding requires quadratic time since it computes the expected score between a translation hypothesis and all reference translations. We propose centroid-based MBR (CBMBR) decoding to improve the speed of MBR decoding. Our method clusters the reference translations in the feature space, and then calculates the score using the centroids of each cluster. The experimental results show that our CBMBR not only improved the decoding speed of the expected score calculation 5.7 times, but also outperformed vanilla MBR decoding in translation quality by up to 0.5 COMET in the WMT'22 En$\leftrightarrow$Ja, En$\leftrightarrow$De, En$\leftrightarrow$Zh, and WMT'23 En$\leftrightarrow$Ja translation tasks.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# LLMが検索拡張を必要とするのはいつか? LLMの過信の緩和は検索拡張に役立つ When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation ( http://arxiv.org/abs/2402.11457v2 ) ライセンス: Link先を確認	Shiyu Ni, Keping Bi, Jiafeng Guo, Xueqi Cheng,	(参考訳) 大規模言語モデル(LLM)は、自身が特定の知識を持っていないことを認識するのが苦手で、そのような場合にもっともらしいが誤った答えを返す傾向があることが分かっている。 Retrieval Augmentation (RA)はLLMの幻覚を緩和するために広く研究されている。しかし、余分なオーバーヘッドがあり検索品質も保証されないため、RAを常に実行するのは最適ではないかもしれない。素直な考え方は、LLMが質問に対して不確実である場合にのみ検索を行うことである。このことが、LLMが自身の知識境界を知覚する能力を高めてRAを支援する動機となる。本稿ではまず、LLMのそのような能力を定量的に測定し、その過信を確認する。次に、質問に対するLLMの確信度が、外部から検索された情報への依存とどのように相関するかを調べる。LLMの知識境界に対する認識を高めるためのいくつかの手法を提案し、それらが過信の低減に有効であることを示す。さらに、これらの手法を備えたLLMは、はるかに少ない検索呼び出しでRAと同等またはそれ以上の性能を達成できる。 Large Language Models (LLMs) have been found to have difficulty knowing they do not possess certain knowledge and tend to provide specious answers in such cases. Retrieval Augmentation (RA) has been extensively studied to mitigate LLMs' hallucinations. However, due to the extra overhead and unassured quality of retrieval, it may not be optimal to conduct RA all the time. A straightforward idea is to only conduct retrieval when LLMs are uncertain about a question. This motivates us to enhance the LLMs' ability to perceive their knowledge boundaries to help RA. In this paper, we first quantitatively measure LLMs' such ability and confirm their overconfidence. Then, we study how LLMs' certainty about a question correlates with their dependence on external retrieved information. We propose several methods to enhance LLMs' perception of knowledge boundaries and show that they are effective in reducing overconfidence. Additionally, equipped with these methods, LLMs can achieve comparable or even better performance of RA with much fewer retrieval calls.	翻訳日:2024-06-12 22:32:43 公開日:2024-06-11
# 雑音のある相関に支援された量子ワイヤタップチャネル符号化 Quantum Wiretap Channel Coding Assisted by Noisy Correlation ( http://arxiv.org/abs/2402.13194v2 ) ライセンス: Link先を確認	Minglai Cai, Andreas Winter,	(参考訳) 量子ワイヤタップチャネルのプライベート古典容量を考える。ここでユーザ(送信者アリス、受信者ボブ、盗聴者イブ)は、チャネルの入力と出力に加えて、共有量子状態というリソースにアクセスできる。極端なケースの一つは、アリスとボブの間の最大エンタングルメントまたは秘密鍵であり、どちらもメッセージのワンタイムパッド化を可能にする。しかし、ここではワイヤタップチャネルと共有状態の両方が一般的である。状態が自明であるというもう一方の極端なケースでは、ワイヤタップチャネルとそのプライベート容量[N. Cai, A. Winter and R. W. Yeung, Probl. Inform. Transm. 40(4):318-336, 2004]が回復される。与えられたリソース状態を使って、秘密の古典通信のための符号を構築する方法を示す。主な成果は、支援付きプライベート容量の下界であり、これは漸近的にマルチレターの逆定理と一致し、これまでのさまざまな結果を特別な場合として包含する。 We consider the private classical capacity of a quantum wiretap channel, where the users (sender Alice, receiver Bob, and eavesdropper Eve) have access to the resource of a shared quantum state, additionally to their channel inputs and outputs. An extreme case is maximal entanglement or a secret key between Alice and Bob, both of which would allow for onetime padding the message. But here both the wiretap channel and the shared state are general. In the other extreme case that the state is trivial, we recover the wiretap channel and its private capacity [N. Cai, A. Winter and R. W. Yeung, Probl. Inform. Transm. 40(4):318-336, 2004]. We show how to use the given resource state to build a code for secret classical communication. Our main result is a lower bound on the assisted private capacity, which asymptotically meets the multi-letter converse and which encompasses all sorts of previous results as special cases.	翻訳日:2024-06-12 22:22:49 公開日:2024-06-11
# 中間子振動物理におけるマクロリアリズムのテスト Tests of macrorealism in meson oscillation physics ( http://arxiv.org/abs/2402.13299v2 ) ライセンス: Link先を確認	Massimo Blasone, Fabrizio Illuminati, Luciano Petruzziello, Kyrylo Simonov, Luca Smaldone,	(参考訳) マクロリアリズム(Macrorealism)は、量子力学の原理とは対照的に、系は任意の時点で確定した状態を占め、系の時間発展はそれに対して行われる測定とは独立である、という直感的な概念を定式化したものである。本研究では、中間子振動の文脈において、3時刻レゲット-ガーグ型不等式と、マクロリアリズムに対する非シグナリング・イン・タイム条件およびアロー・オブ・タイム条件との比較分析を行う。その結果、与えられた初期条件の下ではレゲット-ガーグ不等式の破れは観測されないことが示された。しかし、非シグナリング・イン・タイム条件は破れることが分かり、中間子物理の解析にマクロ実在論的な記述を適用することは不可能であることが明らかになった。 Macrorealism formalizes the intuitive notion that at any given time the system occupies a definite state and that the evolution of the system is independent of the measurements performed on it, in contrast to the principles of quantum mechanics. In this study, we carry out a comparative analysis between three-time Leggett--Garg-type inequalities and the conditions of no-signaling-in-time and arrow-of-time for macrorealism within the context of meson oscillations. Our findings indicate that, under given initial conditions, no violations of Leggett--Garg inequalities are observed. However, no-signaling-in-time conditions are found to be violated, thereby revealing the impossibility of applying a macrorealistic description to the analysis of meson physics.	翻訳日:2024-06-12 22:22:49 公開日:2024-06-11
# コード生成のためのテスト駆動開発 Test-Driven Development for Code Generation ( http://arxiv.org/abs/2402.13521v2 ) ライセンス: Link先を確認	Noble Saji Mathews, Meiyappan Nagappan,	(参考訳) 最近のLarge Language Models (LLM)は、問題ステートメントから直接コードスニペットを生成する重要な機能を示している。この自動化プロセスは、要求に応じてコードを記述することの多い、従来のヒューマン主導のソフトウェア開発を反映している。歴史的に、テスト駆動開発(TDD)はそのメリットを証明し、開発者は機能コードの前にテストを書かなければなりません。 LLMベースのコード生成にTDD原則を適用することで、開発者は事前に定義されたテストに対して生成されたコードの有効性を検証することができる。本稿では,TDDをAI支援コード生成プロセスに組み込む方法について検討する。 GPT-4 や Llama 3 のような LLM に問題文に加えてテストを提供することでコード生成結果が向上するという仮説を実験的に評価した。 MBPPやHumanEvalといった関数レベルのコード生成ベンチマークを実験した。私たちの結果は、テストケースを含むことが、プログラミングの課題を解決する上で、より高い成功をもたらすことを一貫して示しています。 TDDはLLMが生成したコードが要求を効果的に捉えるのに役立つ、有望なパラダイムである、と私たちは主張しています。 Recent Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements. This increasingly automated process mirrors traditional human-led software development, where code is often written in response to a requirement. Historically, Test-Driven Development (TDD) has proven its merit, requiring developers to write tests before the functional code, ensuring alignment with the initial problem statements. Applying TDD principles to LLM-based code generation offers one distinct benefit: it enables developers to verify the correctness of generated code against predefined tests. This paper investigates if and how TDD can be incorporated into AI-assisted code-generation processes. We experimentally evaluate our hypothesis that providing LLMs like GPT-4 and Llama 3 with tests in addition to the problem statements enhances code generation outcomes. We experimented with established function-level code generation benchmarks such as MBPP and HumanEval. Our results consistently demonstrate that including test cases leads to higher success in solving programming challenges. We assert that TDD is a promising paradigm for helping ensure that the code generated by LLMs effectively captures the requirements.	翻訳日:2024-06-12 22:22:49 公開日:2024-06-11
# Clifford-Steerable Convolutional Neural Networks Clifford-Steerable Convolutional Neural Networks ( http://arxiv.org/abs/2402.14730v2 ) ライセンス: Link先を確認	Maksim Zhdanov, David Ruhe, Maurice Weiler, Ana Lucic, Johannes Brandstetter, Patrick Forré,	(参考訳) Clifford-Steerable Convolutional Neural Networks (CS-CNNs) は、$\mathrm{E}(p, q)$同変CNNの新しいクラスである。 CS-CNNは擬ユークリッド空間 $\mathbb{R}^{p,q}$ 上の多重ベクトル場を処理する。例えば、$\mathbb{R}^3$ 上の $\mathrm{E}(3)$ 同変性や、ミンコフスキー時空 $\mathbb{R}^{1,3}$ 上のポアンカレ同変性をカバーする。我々のアプローチは、クリフォード群同変ニューラルネットワークによる $\mathrm{O}(p,q)$-steerableカーネルの暗黙的なパラメータ化に基づいている。流体力学および相対論的電磁力学の予測タスクにおいて、ベースライン手法を大幅かつ一貫して上回る。 We present Clifford-Steerable Convolutional Neural Networks (CS-CNNs), a novel class of $\mathrm{E}(p, q)$-equivariant CNNs. CS-CNNs process multivector fields on pseudo-Euclidean spaces $\mathbb{R}^{p,q}$. They cover, for instance, $\mathrm{E}(3)$-equivariance on $\mathbb{R}^3$ and Poincaré-equivariance on Minkowski spacetime $\mathbb{R}^{1,3}$. Our approach is based on an implicit parametrization of $\mathrm{O}(p,q)$-steerable kernels via Clifford group equivariant neural networks. We significantly and consistently outperform baseline methods on fluid dynamics as well as relativistic electrodynamics forecasting tasks.	翻訳日:2024-06-12 22:22:49 公開日:2024-06-11
# 計画による予算制約ツール学習 Budget-Constrained Tool Learning with Planning ( http://arxiv.org/abs/2402.15960v2 ) ライセンス: Link先を確認	Yuanhang Zheng, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu,	(参考訳) ツール学習への集中的な取り組みにもかかわらず、特定の予算制約の中でユーザクエリを解決することに焦点を当てた、予算制約のツール学習の問題は、広く見過ごされてきた。本稿では,予算制約ツール学習のための新しい手法を提案する。当社のアプローチでは、ツールを利用する前に、予算制約の下で望ましいプランを作成します。この計画では、実現可能なツールの概要と、採用可能な最大回数を概説し、大規模言語モデルのツール学習プロセスの概要を概説する。これにより、より広い視点から予算を割り当てることができます。余分なコストを伴わずに計画を作成するためには、まず、過去の経験に基づいて候補ツールの有用性を見積もることを提案する。その後、計画の定式化に動的プログラミングを用いる。実験により,本手法を各種ツール学習手法に統合し,厳格な予算制約下での有効性を著しく向上できることを示した。 Despite intensive efforts devoted to tool learning, the problem of budget-constrained tool learning, which focuses on resolving user queries within a specific budget constraint, has been widely overlooked. This paper proposes a novel method for budget-constrained tool learning. Our approach involves creating a preferable plan under the budget constraint before utilizing the tools. This plan outlines the feasible tools and the maximum number of times they can be employed, offering a comprehensive overview of the tool learning process for large language models. This allows them to allocate the budget from a broader perspective. To devise the plan without incurring significant extra costs, we suggest initially estimating the usefulness of the candidate tools based on past experience. Subsequently, we employ dynamic programming to formulate the plan. Experimental results demonstrate that our method can be integrated with various tool learning methods, significantly enhancing their effectiveness under strict budget constraints.	翻訳日:2024-06-12 22:22:49 公開日:2024-06-11
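The planning step described in the abstract above (estimate each tool's usefulness, then use dynamic programming to decide how many calls each tool may receive within the budget) resembles a grouped knapsack problem. The sketch below is only an illustration under that reading, not the paper's formulation. The function name, the integer per-call costs, and the per-call gain lists are all hypothetical.

```python
def plan_tool_calls(tools, budget):
    """Choose how many times each tool may be called within an integer budget.

    tools:  {name: (cost_per_call, [gain of 1st call, gain of 2nd call, ...])}
            Costs are positive integers; gains are estimated usefulness values
            (hypothetical numbers, e.g. derived from past experience).
    Returns (best_total_gain, {name: allowed_calls}); unused tools are omitted.
    """
    # best[b] = (gain, plan) achievable with budget b using the tools seen so far
    best = [(0.0, {}) for _ in range(budget + 1)]
    for name, (cost, gains) in tools.items():
        new_best = []
        for b in range(budget + 1):
            choice = best[b]  # option: do not call this tool at all
            total = 0.0
            for calls, gain in enumerate(gains, start=1):
                spent = calls * cost
                if spent > b:
                    break
                total += gain
                prev_gain, prev_plan = best[b - spent]
                if prev_gain + total > choice[0]:
                    choice = (prev_gain + total, {**prev_plan, name: calls})
            new_best.append(choice)
        best = new_best
    return best[budget]
```

For example, with `{"search": (3, [5.0, 2.0]), "calculator": (1, [1.5, 1.0, 0.5]), "translator": (4, [4.0])}` and a budget of 7, the plan allows one `search` call and one `translator` call. The resulting call counts can then be handed to whichever tool-learning method is in use as per-tool upper limits.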
# Hybrid$^2$ Neural ODE Causal Modeling と血糖応答への応用 Hybrid$^2$ Neural ODE Causal Modeling and an Application to Glycemic Response ( http://arxiv.org/abs/2402.17233v2 ) ライセンス: Link先を確認	Bob Junyi Zou, Matthew E. Levine, Dessi P. Zaharieva, Ramesh Johari, Emily B. Fox,	(参考訳) メカニスティックなODEベースのダイナミクスと、柔軟で表現力の高いニューラルネットワーク成分を組み合わせたハイブリッドモデルは、特にそのようなODEベースのモデリングが重要な解釈可能性と検証済みの因果的基盤(例えば反実仮想推論のため)を提供する科学領域において、急速に人気が高まっている。メカニスティックモデルの導入は、標準的なブラックボックスモデリング手法に帰納バイアスも与え、これは小さなデータセットや部分的にしか観測されない複雑なシステムから学習する際に重要である。残念なことに、ハイブリッドモデルがより柔軟になるにつれて、メカニスティックモデルが提供する因果的基盤は急速に失われうる。我々は、正確な治療効果が未知であっても利用できる、もう一つの一般的なドメイン知識源、すなわち一連の介入に対する治療効果の「順位付け」を活用することで、この問題に対処する。この情報を「因果損失」にエンコードし、標準的な予測損失と組み合わせることで、因果的に妥当なハイブリッドモデルへと学習を偏らせる「ハイブリッド損失」を得る。 1型糖尿病患者の運動後の血糖動態をモデル化するという難しい課題において、最先端の予測性能と因果的妥当性を両立できることを実証する。 Hybrid models composing mechanistic ODE-based dynamics with flexible and expressive neural network components have grown rapidly in popularity, especially in scientific domains where such ODE-based modeling offers important interpretability and validated causal grounding (e.g., for counterfactual reasoning). The incorporation of mechanistic models also provides inductive bias in standard blackbox modeling approaches, critical when learning from small datasets or partially observed, complex systems. Unfortunately, as the hybrid models become more flexible, the causal grounding provided by the mechanistic model can quickly be lost. We address this problem by leveraging another common source of domain knowledge: \emph{ranking} of treatment effects for a set of interventions, even if the precise treatment effect is unknown. We encode this information in a \emph{causal loss} that we combine with the standard predictive loss to arrive at a \emph{hybrid loss} that biases our learning towards causally valid hybrid models. We demonstrate our ability to achieve a win-win, state-of-the-art predictive performance \emph{and} causal validity, in the challenging task of modeling glucose dynamics post-exercise in individuals with type 1 diabetes.	翻訳日:2024-06-12 22:22:49 公開日:2024-06-11
# reBandit:大麻使用を減らすためのランダム効果ベースのオンラインRLアルゴリズム reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use ( http://arxiv.org/abs/2402.17739v2 ) ライセンス: Link先を確認	Susobhan Ghosh, Yongyi Guo, Pei-Yao Hung, Lara Coughlin, Erin Bonar, Inbal Nahum-Shani, Maureen Walton, Susan Murphy,	(参考訳) 大麻の使用とそれに関連する大麻使用障害(CUD)の有病率の上昇は、世界的に公衆衛生上の大きな課題となっている。特に新成人(EA:18～25歳)では治療ギャップが著しく大きく、大麻使用とCUDへの対処は、国連の2030年持続可能な開発目標(SDG)アジェンダにおいて依然として重要な目標である。本研究では、reBanditと呼ばれるオンライン強化学習(RL)アルゴリズムを開発する。これはモバイルヘルス研究で利用され、EAの大麻使用を減らすことを目的とした個別化モバイルヘルス介入を提供する。 reBanditはランダム効果と情報を持つベイズ事前分布を利用して、ノイズの多いモバイルヘルス環境で迅速かつ効率的に学習する。さらに、reBanditは経験ベイズと最適化手法を用いて、ハイパーパラメータをオンラインで自律的に更新する。提案アルゴリズムの性能を評価するため、先行研究のデータを用いてシミュレーションテストベッドを構築し、モバイルヘルス研究でよく用いられるアルゴリズムと比較する。 reBanditはすべてのベースラインアルゴリズムと同等以上の性能を示し、シミュレーション環境における集団の不均一性が増すにつれて性能差が広がることを示す。これは、多様な研究参加者集団に適応する能力を示している。 The escalating prevalence of cannabis use, and associated cannabis-use disorder (CUD), poses a significant public health challenge globally. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDG). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit which will be utilized in a mobile health study to deliver personalized mobile health interventions aimed at reducing cannabis use among EAs. reBandit utilizes random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments. Moreover, reBandit employs Empirical Bayes and optimization techniques to autonomously update its hyper-parameters online. To evaluate the performance of our algorithm, we construct a simulation testbed using data from a prior study, and compare against commonly used algorithms in mobile health studies. We show that reBandit performs equally well or better than all the baseline algorithms, and the performance gap widens as population heterogeneity increases in the simulation environment, proving its adeptness to adapt to diverse population of study participants.	翻訳日:2024-06-12 22:22:49 公開日:2024-06-11
# 確率過程の因果発見におけるシグネチャカーネル条件付き独立性検定 Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes ( http://arxiv.org/abs/2402.18477v2 ) ライセンス: Link先を確認	Georg Manten, Cecilia Casolo, Emilio Ferrucci, Søren Wengel Mogensen, Cristopher Salvi, Niki Kilbertus,	(参考訳) 観測データから確率力学系の根底にある因果構造を推定することは、科学や健康、金融といった分野において大きな可能性を秘めている。このような過程は確率微分方程式(SDE)を通して正確にモデル化できることが多く、「どの変数が他のどの変数の微分に入るか」によって因果関係が自然に示唆される。本稿では、近年のシグネチャカーネルの進歩を活用して、「パス空間」(例えばSDEの解だが、それに限らない)上でのカーネルベースの条件付き独立性(CI)検定を開発する。提案するCI検定が、パス空間上の既存手法と比べて厳密に優れた性能を持つことを示し、理論的な一致性の結果を与える。次に、非巡回確率力学系(自己ループを許容)に対する制約ベースの因果発見アルゴリズムを開発する。これは時間情報を利用して有向非巡回グラフ全体を復元する。忠実性とCIオラクルを仮定すると、我々のアルゴリズムは健全かつ完全であることを示す。開発したCI検定と因果発見アルゴリズムの組み合わせが、さまざまな設定でベースラインを上回ることを実証的に検証する。 Inferring the causal structure underlying stochastic dynamical systems from observational data holds great promise in domains ranging from science and health to finance. Such processes can often be accurately modeled via stochastic differential equations (SDEs), which naturally imply causal relationships via "which variables enter the differential of which other variables". In this paper, we develop a kernel-based test of conditional independence (CI) on "path-space" -- e.g., solutions to SDEs, but applicable beyond that -- by leveraging recent advances in signature kernels. We demonstrate strictly superior performance of our proposed CI test compared to existing approaches on path-space and provide theoretical consistency results. Then, we develop constraint-based causal discovery algorithms for acyclic stochastic dynamical systems (allowing for self-loops) that leverage temporal information to recover the entire directed acyclic graph. Assuming faithfulness and a CI oracle, we show that our algorithms are sound and complete. We empirically verify that our developed CI test in conjunction with the causal discovery algorithms outperform baselines across a range of settings.	翻訳日:2024-06-12 22:22:49 公開日:2024-06-11
# Log Neural Controlled Differential Equations: リー括弧が違いを生む Log Neural Controlled Differential Equations: The Lie Brackets Make a Difference ( http://arxiv.org/abs/2402.18512v2 ) ライセンス: Link先を確認	Benjamin Walker, Andrew D. McLeod, Tiexin Qin, Yichuan Cheng, Haoliang Li, Terry Lyons,	(参考訳) 制御微分方程式(CDE)のベクトル場は、制御経路と解経路の発展の関係を記述する。ニューラルCDE(NCDE)は、時系列データを制御経路からの観測として扱い、ニューラルネットワークを使用してCDEのベクトル場をパラメータ化し、解経路を連続的に発展する隠れ状態として使用する。その定式化により不規則なサンプリングレートに対して頑健であるため、NCDEは実世界のデータをモデル化するための強力なアプローチである。ニューラル粗微分方程式(NRDE)に基づいて、NCDEを学習するための新しく、効果的かつ効率的な手法であるLog-NCDEを導入する。 Log-NCDEのコアコンポーネントはLog-ODE法であり、これはCDEの解を近似するための、ラフパスの研究に由来する道具である。 Log-NCDEは、最大$50{,}000$個の観測を持つさまざまな多変量時系列データセットにおいて、NCDE、NRDE、線形リカレントユニット、S5、MAMBAを上回ることが示された。 The vector field of a controlled differential equation (CDE) describes the relationship between a control path and the evolution of a solution path. Neural CDEs (NCDEs) treat time series data as observations from a control path, parameterise a CDE's vector field using a neural network, and use the solution path as a continuously evolving hidden state. As their formulation makes them robust to irregular sampling rates, NCDEs are a powerful approach for modelling real-world data. Building on neural rough differential equations (NRDEs), we introduce Log-NCDEs, a novel, effective, and efficient method for training NCDEs. The core component of Log-NCDEs is the Log-ODE method, a tool from the study of rough paths for approximating a CDE's solution. Log-NCDEs are shown to outperform NCDEs, NRDEs, the linear recurrent unit, S5, and MAMBA on a range of multivariate time series datasets with up to $50{,}000$ observations.	翻訳日:2024-06-12 22:22:49 公開日:2024-06-11
# デジタル法科学調査の効率向上に向けた大規模言語モデルの可能性を探る Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency ( http://arxiv.org/abs/2402.19366v2 ) ライセンス: Link先を確認	Akila Wickramasekara, Frank Breitinger, Mark Scanlon,	(参考訳) デジタル法科学分析を必要とする事件が増えていることで、法執行機関が迅速に捜査を行う能力について懸念が高まっている。そこで本稿では、これらの課題に対処するために、Large Language Models (LLMs) をデジタル法科学調査に統合する可能性と有効性を掘り下げる。既存のデジタル法科学モデル、ツール、LLM、ディープラーニング技術、および調査におけるLLMの利用を対象とする包括的な文献レビューを実施する。このレビューでは、既存のデジタル法科学プロセスにおける現在の課題を特定し、LLM導入の障害と可能性の両方を検討する。結論として、適切な制約の下でデジタル法科学にLLMを採用することは、調査効率を向上させ、トレーサビリティを改善し、法執行機関が直面する技術的および司法的障壁を軽減する可能性があると主張する。 The growing number of cases that require digital forensic analysis raises concerns about the ability of law enforcement to conduct investigations promptly. Consequently, this paper delves into the potential and effectiveness of integrating Large Language Models (LLMs) into digital forensic investigation to address these challenges. A comprehensive literature review is carried out, encompassing existing digital forensic models, tools, LLMs, deep learning techniques, and the use of LLMs in investigations. The review identifies current challenges within existing digital forensic processes and explores both the obstacles and possibilities of incorporating LLMs. In conclusion, the study asserts that the adoption of LLMs in digital forensics, with appropriate constraints, has the potential to improve investigation efficiency, improve traceability, and alleviate technical and judicial barriers faced by law enforcement entities.	翻訳日:2024-06-12 22:22:49 公開日:2024-06-11
# ICC:マルチモーダルデータセットキュレーションのための画像キャプション具体性の定量化 ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation ( http://arxiv.org/abs/2403.01306v3 ) ライセンス: Link先を確認	Moran Yanuka, Morris Alper, Hadar Averbuch-Elor, Raja Giryes,	(参考訳) ペア化されたテキスト・画像データに対するWebスケールの学習は、マルチモーダル学習においてますます中心的になっているが、実世界のデータセットが持つ非常にノイズの多い性質が課題となっている。標準的なデータフィルタリング手法は、不一致なテキスト・画像ペアの除去には成功するが、意味的には関連するものの非常に抽象的または主観的なテキストを通してしまう。これらの手法には、ノイズの多いデータセットにおいて学習に最も強い信号を与える、最も具体的なサンプルを分離するきめ細かな能力がない。そこで本研究では、画像を参照せずにキャプションテキストを評価し、マルチモーダル学習で用いるための具体性と関連性を測る新しい指標、画像キャプション具体性(ICC)を提案する。提案手法は、強力な基盤モデルを活用して、マルチモーダル表現における視覚・意味情報の損失を測定する。この指標が、単語レベルと文レベルの両方のテキストについて、人間による具体性の評価と強く相関することを示す。さらに、ICCを用いたキュレーションは既存手法を補完することを示す。すなわち、マルチモーダルなWebスケールデータセットから最高品質のサンプルを選択することに成功し、リソース制約のある設定での効率的な学習を可能にする。 Web-scale training on paired text-image data is becoming increasingly central to multimodal learning, but is challenged by the highly noisy nature of datasets in the wild. Standard data filtering approaches succeed in removing mismatched text-image pairs, but permit semantically related but highly abstract or subjective text. These approaches lack the fine-grained ability to isolate the most concrete samples that provide the strongest signal for learning in a noisy dataset. In this work, we propose a new metric, image caption concreteness, that evaluates caption text without an image reference to measure its concreteness and relevancy for use in multimodal learning. Our approach leverages strong foundation models for measuring visual-semantic information loss in multimodal representations. We demonstrate that this strongly correlates with human evaluation of concreteness in both single-word and sentence-level texts. Moreover, we show that curation using ICC complements existing approaches: It succeeds in selecting the highest quality samples from multimodal web-scale datasets to allow for efficient training in resource-constrained settings.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
# 3DGStream:フォトリアリスティックな自由視点映像の効率的なストリーミングのための3Dガウスのオンザフライ学習 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos ( http://arxiv.org/abs/2403.01444v4 ) ライセンス: Link先を確認	Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, Wei Xing,	(参考訳) マルチビュー映像からダイナミックなシーンのフォトリアリスティックな自由視点映像(FVV)を構築することは、依然として困難な課題である。現在のニューラルレンダリング技術は顕著な進歩を遂げているものの、これらの手法は一般にオフライン学習のために完全な映像シーケンスを必要とし、リアルタイムレンダリングができない。これらの制約に対処するために、実世界のダイナミックシーンの効率的なFVVストリーミングのために設計された手法である3DGStreamを提案する。提案手法は、12秒以内のフレーム毎のオンザフライ高速再構成と、200FPSでのリアルタイムレンダリングを実現する。具体的には、3Dガウス(3DG)を用いてシーンを表現する。フレーム毎に3DGを直接最適化するナイーブなアプローチの代わりに、コンパクトなNeural Transformation Cache(NTC)を使用して3DGの並進と回転をモデル化し、各FVVフレームに必要な学習時間とストレージを大幅に削減する。さらに、動的シーンに新たに出現するオブジェクトを扱うための適応的な3DG追加戦略を提案する。実験により、3DGStreamは、最先端の手法と比較してレンダリング速度、画質、学習時間、モデルストレージにおいて競争力のある性能を達成することが示された。 Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
# 不均衡な処置割り当ての下での二重ロバスト推定器の較正 Calibrating doubly-robust estimators with unbalanced treatment assignment ( http://arxiv.org/abs/2403.01585v2 ) ライセンス: Link先を確認	Daniele Ballinari,	(参考訳) 機械学習手法、特にDouble Machine Learning (DML) 推定器(Chernozhukov et al., 2018)は、平均処置効果(ATE)の推定でますます普及している。しかし、データセットでは、ごく少数の観測だけが処置を受ける不均衡な処置割り当てがしばしば見られ、傾向スコアの推定が不安定になる。本稿では、傾向スコアのモデリングのためにデータをアンダーサンプリングし、元の分布に合うようにスコアを較正する、DML推定器の簡単な拡張を提案する。本論文は、この推定器がDML推定器の漸近特性を保持することを示す理論的結果を与える。シミュレーション研究により、推定器の有限標本性能を例示する。 Machine learning methods, particularly the double machine learning (DML) estimator (Chernozhukov et al., 2018), are increasingly popular for the estimation of the average treatment effect (ATE). However, datasets often exhibit unbalanced treatment assignments where only a few observations are treated, leading to unstable propensity score estimations. We propose a simple extension of the DML estimator which undersamples data for propensity score modeling and calibrates scores to match the original distribution. The paper provides theoretical results showing that the estimator retains the DML estimator's asymptotic properties. A simulation study illustrates the finite sample performance of the estimator.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
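To make the "undersample, then calibrate" step in the abstract above concrete, here is a small sketch of the standard prior-shift correction for scores fitted on undersampled data. It assumes that the control (majority) group is undersampled by keeping a fraction `beta` of its observations while all treated observations are kept. Whether the paper uses exactly this mapping is an assumption, and both function names are hypothetical.

```python
def undersampled_score(p, beta):
    """Propensity implied on the undersampled data.

    p:    true propensity P(treated | x) in the original data.
    beta: fraction of control observations kept (0 < beta <= 1).
    """
    return p / (p + beta * (1.0 - p))

def calibrate_propensity(p_s, beta):
    """Map a score fitted on undersampled data back to the original balance.

    This is the algebraic inverse of undersampled_score:
        p = beta * p_s / (beta * p_s - p_s + 1)
    """
    return beta * p_s / (beta * p_s - p_s + 1.0)
```

As a worked case: if 5% of units are treated and controls are thinned with `beta = 0.05 / 0.95`, the undersampled data are balanced and a unit with true propensity 0.05 gets a fitted score near 0.5. The calibration maps 0.5 back to 0.05, so the corrected scores can then be plugged into the doubly-robust score on the full sample.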
# 冷却による量子計算 Quantum Computation by Cooling ( http://arxiv.org/abs/2403.01760v4 ) ライセンス: Link先を確認	Jaeyoon Cho,	(参考訳) 断熱量子計算は、解を内包する多体基底状態を見つけることによって計算問題を解くことを目指す典型的なモデルである。しかし、複雑な多体ハミルトニアンのスペクトルギャップに依存する断熱発展を用いるため、その解析は困難である。代わりに、断熱発展の最終的なギャップのある系を直接冷却することも考えられるが、そのようなスキームの一般的な立場からの解析は欠けている。ここでは、この目的のための具体的なハミルトニアンモデルを提案する。このスキームは共振器冷却に着想を得ており、ゼロ温度リザーバのエミュレーションを含む。補助リザーバ量子ビットを繰り返し破棄することで系のエントロピーが抽出され、系は基底状態へと駆動される。同時に、破棄された量子ビットの測定は、見返りとして系のエネルギー準位構造の手がかりを与える。この冷却手続きに基づく量子計算は、計算能力において量子回路に基づく計算と等価であることを示す。次に、組合せ最適化問題に対するいくつかの例を用いて、このスキームを例示する。最初の例では、冷却は局所エネルギー極小を一切持たず、スキームはいくつかの改良を加えたグローバーの探索アルゴリズムに帰着する。第2の例では、冷却は多数の局所エネルギー極小に悩まされる。これを回避するために、局所極小に捕捉されたポピュレーションが高次遷移によってトンネルして抜け出せるような機構をハミルトニアンに組み込む。このアイデアを、特定の組合せ最適化問題に対する数値シミュレーションで裏付ける。また、量子多体基底状態の準備への応用についても議論し、スペクトルギャップが冷却の時間スケールを決定する重要な要因であると論じる。 Adiabatic quantum computation is a paradigmatic model aiming to solve a computational problem by finding the many-body ground state encapsulating the solution. However, its use of an adiabatic evolution depending on the spectral gap of an intricate many-body Hamiltonian makes its analysis daunting. While it is plausible to directly cool the final gapped system of the adiabatic evolution instead, the analysis of such a scheme on a general ground is missing. Here, we propose a specific Hamiltonian model for this purpose. The scheme is inspired by cavity cooling, involving the emulation of a zero-temperature reservoir. Repeated discarding of ancilla reservoir qubits extracts the entropy of the system, driving the system toward its ground state. At the same time, the measurement of the discarded qubits hints at the energy level structure of the system as a return. We show that quantum computation based on this cooling procedure is equivalent in its computational power to the one based on quantum circuits. We then exemplify the scheme with a few illustrative use cases for combinatorial optimization problems. In the first example, the cooling is free from any local energy minima, reducing the scheme to Grover's search algorithm with a few improvements. In the second example, the cooling suffers from abundant local energy minima. To circumvent this, we implant a mechanism in the Hamiltonian so that the population trapped in the local minima can tunnel out by high-order transitions. We support this idea with a numerical simulation for a particular combinatorial optimization problem. We also discuss its application to preparing quantum many-body ground states, arguing that the spectral gap is a crucial factor in determining the time scale of the cooling.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
# LLMにおけるアンラーニングのためのガードレールベースライン Guardrail Baselines for Unlearning in LLMs ( http://arxiv.org/abs/2403.03329v3 ) ライセンス: Link先を確認	Pratiksha Thaker, Yash Maurya, Shengyuan Hu, Zhiwei Steven Wu, Virginia Smith,	(参考訳) 最近の研究は、ファインチューニングが大規模言語モデルから概念を「アンラーニング」するための有望なアプローチであることを実証している。しかし、ファインチューニングは、一連の例の生成と、モデルを更新するためのファインチューニングの反復実行の両方を必要とするため、コストがかかる可能性がある。本研究では、プロンプトやフィルタリングといった単純なガードレールベースのアプローチが、ファインチューニングに匹敵するアンラーニング結果を達成できることを示す。より計算集約的なファインチューニング手法の性能を評価する際には、これらの軽量なベースラインを調べることを研究者に推奨する。プロンプトやフィルタリングといった手法がアンラーニング問題の普遍的な解決策であると主張するものではないが、本研究は、ガードレールとファインチューニングの効果をより適切に切り分けられる評価指標の必要性を示唆し、既存の指標やベンチマークにおける意図しない挙動の可能性をガードレールが露呈させるシナリオを浮き彫りにする。 Recent work has demonstrated that finetuning is a promising approach to 'unlearn' concepts from large language models. However, finetuning can be expensive, as it requires both generating a set of examples and running iterations of finetuning to update the model. In this work, we show that simple guardrail-based approaches such as prompting and filtering can achieve unlearning results comparable to finetuning. We recommend that researchers investigate these lightweight baselines when evaluating the performance of more computationally intensive finetuning methods. While we do not claim that methods such as prompting or filtering are universal solutions to the problem of unlearning, our work suggests the need for evaluation metrics that can better separate the power of guardrails vs. finetuning, and highlights scenarios where guardrails expose possible unintended behavior in existing metrics and benchmarks.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
# Black-Box $k$-to-$1$-PCA還元:理論と応用 Black-Box $k$-to-$1$-PCA Reductions: Theory and Applications ( http://arxiv.org/abs/2403.03905v3 ) ライセンス: Link先を確認	Arun Jambulapati, Syamantak Kumar, Jerry Li, Shourya Pandey, Ankit Pensia, Kevin Tian,	(参考訳) $k$主成分分析($k$-PCA)問題は基本的なアルゴリズムプリミティブであり、データ解析や次元削減の応用で広く利用されている。統計的設定では、$k$-PCAの目標は、サンプルを通じたブラックボックスアクセスしか持たない分布の共分散行列の上位固有空間を特定することである。これらの設定に動機づけられ、$k$-PCAアルゴリズムを設計するための枠組みとしてブラックボックスデフレーション法を解析する。ここでは、2つの一般的な近似概念の下で、近似的な上位固有ベクトルを返すブラックボックスの$1$-PCAオラクルを介した未知のターゲット行列へのアクセスをモデル化する。 $k$-PCAアルゴリズム設計に対するおそらく最も自然な還元ベースのアプローチであるにもかかわらず、$1$-PCAオラクルを再帰的に$k$回呼び出すこのようなブラックボックス法は、これまで十分に理解されていなかった。我々の主な貢献は、$k$-PCAのデフレーション法における近似パラメータの劣化に対する、大幅にシャープな上界である。 ePCA (energy PCA) と呼ぶ二次形式による近似概念については、デフレーション法がパラメータ損失を被らないことを示す。 cPCA (correlation PCA) と呼ぶ、よく研究された別の近似概念については、デフレーション法が実行可能なパラメータ領域を厳密に特徴づける。さらに、すべての実行可能な領域において、$k$-cPCAデフレーションアルゴリズムは任意の定数$k$に対して漸近的なパラメータ損失を被らないことを示す。この枠組みを適用して、データセットの汚染に対して頑健な最先端の$k$-PCAアルゴリズムを得て、サンプル複雑度を先行研究より$\mathsf{poly}(k)$倍改善する。 The $k$-principal component analysis ($k$-PCA) problem is a fundamental algorithmic primitive that is widely-used in data analysis and dimensionality reduction applications. In statistical settings, the goal of $k$-PCA is to identify a top eigenspace of the covariance matrix of a distribution, which we only have black-box access to via samples. Motivated by these settings, we analyze black-box deflation methods as a framework for designing $k$-PCA algorithms, where we model access to the unknown target matrix via a black-box $1$-PCA oracle which returns an approximate top eigenvector, under two popular notions of approximation. Despite being arguably the most natural reduction-based approach to $k$-PCA algorithm design, such black-box methods, which recursively call a $1$-PCA oracle $k$ times, were previously poorly-understood. Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for $k$-PCA. For a quadratic form notion of approximation we term ePCA (energy PCA), we show deflation methods suffer no parameter loss. For an alternative well-studied approximation notion we term cPCA (correlation PCA), we tightly characterize the parameter regimes where deflation methods are feasible. Moreover, we show that in all feasible regimes, $k$-cPCA deflation algorithms suffer no asymptotic parameter loss for any constant $k$. We apply our framework to obtain state-of-the-art $k$-PCA algorithms robust to dataset contamination, improving prior work in sample complexity by a $\mathsf{poly}(k)$ factor.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
# 最近の大規模視覚言語モデルの有効性評価 Effectiveness Assessment of Recent Large Vision-Language Models ( http://arxiv.org/abs/2403.04306v4 ) ライセンス: Link先を確認	Yao Jiang, Xinyu Yan, Ge-Peng Ji, Keren Fu, Meijun Sun, Huan Xiong, Deng-Ping Fan, Fahad Shahbaz Khan,	(参考訳) 大規模視覚言語モデル(LVLM)の出現は、汎用人工知能の探求における顕著な進歩を示している。しかし、特殊タスクと一般タスクの両方におけるモデルの有効性は、さらなる調査を必要とする。本論文は、これらの新しいモデルを包括的に理解することを目的として、代表的なLVLMの特殊タスクと一般タスクそれぞれにおける能力の評価を試みる。特殊タスクにおける有効性を評価するために、自然、医療、産業という3つの異なる応用シナリオで、6つの挑戦的なタスクを採用する。これら6つのタスクには、顕著/カモフラージュ/透明物体検出、ポリープ検出、皮膚病変検出、工業的異常検出が含まれる。これらのタスクにおいて、MiniGPT-v2、LLaVA-1.5、Shikraを含む最近の3つのオープンソースLVLMの視覚認識および位置特定の性能を調べる。さらに、前述のLVLMとGPT-4Vを用いて実証的な調査を行い、オブジェクトカウント、不条理な質問への応答、アフォーダンス推論、属性認識、空間関係推論を含む一般タスクにおけるマルチモーダル理解能力を評価する。調査の結果、これらのLVLMは、特殊タスクだけでなく一般タスクにおいても限られた習熟度しか示さないことが明らかになった。この不十分さを深く掘り下げ、特殊タスクにおける認知の限界、物体幻覚、テキストから画像への干渉、複雑な問題における頑健性の低下など、いくつかの潜在的な要因を明らかにする。本研究がLVLMの今後の発展に有用な知見を提供し、研究者が一般用途と専門用途の両方でLVLMを改善する助けとなることを期待する。 The advent of large vision-language models (LVLMs) represents a remarkable advance in the quest for artificial general intelligence. However, the model's effectiveness in both specialized and general tasks warrants further investigation. This paper endeavors to evaluate the competency of popular LVLMs in specialized and general tasks, respectively, aiming to offer a comprehensive understanding of these novel models. To gauge their effectiveness in specialized tasks, we employ six challenging tasks in three different application scenarios: natural, healthcare, and industrial. These six tasks include salient/camouflaged/transparent object detection, as well as polyp detection, skin lesion detection, and industrial anomaly detection. We examine the performance of three recent open-source LVLMs, including MiniGPT-v2, LLaVA-1.5, and Shikra, on both visual recognition and localization in these tasks. Moreover, we conduct empirical investigations utilizing the aforementioned LVLMs together with GPT-4V, assessing their multi-modal understanding capabilities in general tasks including object counting, absurd question answering, affordance reasoning, attribute recognition, and spatial relation reasoning. Our investigations reveal that these LVLMs demonstrate limited proficiency not only in specialized tasks but also in general tasks. We delve deep into this inadequacy and uncover several potential factors, including limited cognition in specialized tasks, object hallucination, text-to-image interference, and decreased robustness in complex problems. We hope that this study can provide useful insights for the future development of LVLMs, helping researchers improve LVLMs for both general and specialized applications.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
# 真実を意識した文脈選択:非現実的な文脈で誤解される大規模言語モデルの幻覚を緩和する Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts ( http://arxiv.org/abs/2403.07556v3 ) ライセンス: Link先を確認	Tian Yu, Shaolei Zhang, Yang Feng,	(参考訳) LLM(Large Language Models)は、印象的なテキスト生成機能を示しているが、ユーザや知識増強ツールが提供する非現実的なコンテキストによって容易に誤解され、幻覚に繋がる。本研究では,LLMが非現実的コンテキストによって誤解されるのを防止し,知識の増大を活かすために,入力から非現実的コンテキストを適応的に認識しマスクする軽量な手法であるTruth-Aware Context Selection (TACS)を提案する。 TACSは、LLM内のパラメータ化された知識を活用して、入力コンテキスト上で真理検出を行うことから始まる。その後、各位置の真偽に基づいて対応する注目マスクを構築し、真偽のコンテキストを選択し、非真実のコンテキストを破棄する。さらに,新たな評価基準である外乱適応率を導入し,LLMが真理情報を受け入れ,非真理情報に抵抗する能力をさらに研究する。実験結果から,TACSは非現実的文脈を効果的にフィルタリングし,誤解を招く情報を提示した場合のLLMの応答の全体的な品質を著しく向上させることができることがわかった。 Although Large Language Models (LLMs) have demonstrated impressive text generation capabilities, they are easily misled by untruthful contexts provided by users or knowledge augmentation tools, leading to hallucinations. To alleviate LLMs from being misled by untruthful context and take advantage of knowledge augmentation, we propose Truth-Aware Context Selection (TACS), a lightweight method to adaptively recognize and mask untruthful context from the inputs. TACS begins by performing truth detection on the input context, leveraging the parameterized knowledge within the LLM. Subsequently, it constructs a corresponding attention mask based on the truthfulness of each position, selecting the truthful context and discarding the untruthful context. Additionally, we introduce a new evaluation metric, Disturbance Adaption Rate, to further study the LLMs' ability to accept truthful information and resist untruthful information. Experimental results indicate that TACS can effectively filter untruthful context and significantly improve the overall quality of LLMs' responses when presented with misleading information.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
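The masking step described in this abstract can be sketched as follows. This is only a minimal illustration, assuming per-position truthfulness scores in [0, 1] are already available from some truth detector; the function names, the 0.5 threshold, and the example scores are hypothetical and are not taken from the paper.

```python
def truth_attention_mask(truth_scores, threshold=0.5):
    # 1 keeps a context position visible to attention, 0 masks it out.
    # The threshold value is an assumption made for this sketch.
    return [1 if score >= threshold else 0 for score in truth_scores]


def select_context(tokens, truth_scores, threshold=0.5):
    # Keep only the tokens whose position is judged truthful.
    mask = truth_attention_mask(truth_scores, threshold)
    return [tok for tok, keep in zip(tokens, mask) if keep]
```

In a real model the 0/1 mask would be passed to the attention layers rather than used to drop tokens; dropping them here simply makes the effect visible.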
# 任意の2量子状態の幾何学的量子不一致:正確な値と一般上界 Geometric quantum discord of an arbitrary two-qudit state: the exact value and general upper bounds ( http://arxiv.org/abs/2403.09342v3 ) ライセンス: Link先を確認	Elena R. Loubenets, Louis Hanotel,	(参考訳) 2量子状態の幾何学的な量子不協和は、多くの論文で研究されているが、その明示的な形の正確な解析値は、一般的な2量子状態、一般的な2量子状態、いくつかの2量子状態の特別な族についてのみ知られている。一般的なブロッホベクトル形式主義 (J. Phys. A: Math. Theor. 54 195301 (2021)) に基づいて、その相関行列のパラメータとその縮小状態のブロッホベクトルを通じて、任意の次元の一般2量子状態に対する幾何量子不協和の明確な正確な解析値を求める。この新たな解析結果は、[Phys. A. 85, 204102 (2012)] で発見された幾何学的量子不協和の低い境界が、各2量子状態で達成され、また、幾何学的不協和の既知の正確な結果が、特定の場合のみ含まれていることを示している。さらに、この状態のヒルベルト空間特性と純粋な2量子状態の場合には、純あるいは混合の任意の2量子状態を見つけることができる。 The geometric quantum discord of a two-qudit state has been studied in many papers, however, its exact analytical value in the explicit form is known only for a general two-qubit state, a general qubit-qudit state and some special families of two-qudit states. Based on the general Bloch vectors formalism [J. Phys. A: Math. Theor. 54 195301 (2021)], we find the explicit exact analytical value of the geometric quantum discord for a general two-qudit state of an arbitrary dimension via the parameters of its correlation matrix and the Bloch vectors of its reduced states. This new general analytical result indicates that the lower bound on the geometric quantum discord found in [Phys. Rev. A. 85, 204102 (2012)] is attained on each two-qudit state and also, includes all the known exact results on the geometric discord only as particular cases. Moreover, it allows us to find for an arbitrary two-qudit state, pure or mixed, the new general upper bounds on its geometric quantum discord, expressed via the Hilbert space characteristics of this state and in case of a pure two-qudit state -- in terms of its concurrence.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
# ヘテロ親和性グラフに対するGNNの逆過程による過平滑化の軽減 Mitigating Oversmoothing Through Reverse Process of GNNs for Heterophilic Graphs ( http://arxiv.org/abs/2403.10543v2 ) ライセンス: Link先を確認	MoonJeong Park, Jaeseung Heo, Dongwoo Kim,	(参考訳) グラフニューラルネットワーク(GNN)は拡散過程に似ており、多くの層を積み重ねる際の学習表現の過度な平滑化につながる。したがって、メッセージパッシングの逆プロセスは、フォワードメッセージの伝搬を反転させることで区別可能なノード表現を生成することができる。この区別可能な表現は、異種グラフのような異なるラベルで近隣ノードをよりよく分類するのに役立ちます。本稿では, 逆過程の設計原理をGNNの3つの変種に適用する。異種グラフデータに対する実験により, 隣接ノードは, 分類を成功させるために異なる表現を持つ必要があるため, 逆処理が多くの場合において予測性能を著しく向上することを示した。さらなる分析により、逆のメカニズムが数百層にわたるオーバー・スムースを緩和できることが判明した。私たちのコードは https://github.com/ml-postech/reverse-gnn で利用可能です。 Graph Neural Networks (GNNs) resemble a diffusion process, which leads to over-smoothing of the learned representations when many layers are stacked. Hence, the reverse process of message passing can produce distinguishable node representations by inverting the forward message propagation. These distinguishable representations help to better classify neighboring nodes with different labels, as in heterophilic graphs. In this work, we apply the design principle of the reverse process to three variants of GNNs. Through experiments on heterophilic graph data, where adjacent nodes need to have different representations for successful classification, we show that the reverse process significantly improves the prediction performance in many cases. Additional analysis reveals that the reverse mechanism can mitigate over-smoothing over hundreds of layers. Our code is available at https://github.com/ml-postech/reverse-gnn.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
# CRS-Diff:制御可能な生成型リモートセンシング基礎モデル CRS-Diff: Controllable Generative Remote Sensing Foundation Model ( http://arxiv.org/abs/2403.11614v3 ) ライセンス: Link先を確認	Datao Tang, Xiangyong Cao, Xingsong Hou, Zhongyuan Jiang, Deyu Meng,	(参考訳) 生成モデルの出現は、リモートセンシング(RS)画像生成の分野に革命をもたらした。高品質な画像を生成するにもかかわらず、既存の手法は主にテキスト制御条件に依存しているため、常に正確かつ安定した画像を生成するとは限らない。本稿では,RS画像生成に適した新しいRS生成基盤フレームワークであるCRS-Diffを提案する。具体的には、CRS-Diffはテキスト条件、メタデータ条件、画像条件制御入力を同時にサポートし、より正確な制御により生成プロセスを洗練できる。複数条件制御情報を効果的に統合するために,マルチスケール特徴融合を実現するための新しい条件制御機構を導入し,制御条件の誘導効果を高める。我々の知る限り、CRS-Diffは、最初の多重条件制御可能な生成RS基盤モデルである。 CRS-Diffは, 従来法と比較して, 定量的かつ定性的にRS画像を生成する能力に優れていた。さらに、当社のCRS-Diffは、下流タスク、例えば道路抽出のための高品質なトレーニングデータを生成するデータエンジンとして機能する。コードはhttps://github.com/Sonettoo/CRS-Diffで公開されている。 The emergence of generative models has revolutionized the field of remote sensing (RS) image generation. Despite generating high-quality images, existing methods rely mainly on text control conditions and thus do not always generate images accurately and stably. In this paper, we propose CRS-Diff, a new RS generative foundation framework specifically tailored for RS image generation, leveraging the inherent advantages of diffusion models while integrating more advanced control mechanisms. Specifically, CRS-Diff can simultaneously support text-condition, metadata-condition, and image-condition control inputs, thus enabling more precise control to refine the generation process. To effectively integrate multiple condition control information, we introduce a new conditional control mechanism to achieve multi-scale feature fusion, thus enhancing the guiding effect of control conditions. To our knowledge, CRS-Diff is the first multiple-condition controllable generative RS foundation model. Experimental results in single-condition and multiple-condition cases have demonstrated the superior ability of our CRS-Diff to generate RS images both quantitatively and qualitatively compared with previous methods. 
Additionally, our CRS-Diff can serve as a data engine that generates high-quality training data for downstream tasks, e.g., road extraction. The code is available at https://github.com/Sonettoo/CRS-Diff.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
# ChatGPTはディープフェイクを検出できるか? : メディアフォサイシクスにおける多モーダル大言語モデルを用いた検討 Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics ( http://arxiv.org/abs/2403.14077v4 ) ライセンス: Link先を確認	Shan Jia, Reilin Lyu, Kangran Zhao, Yize Chen, Zhiyuan Yan, Yan Ju, Chuanbo Hu, Xin Li, Baoyuan Wu, Siwei Lyu,	(参考訳) AI生成メディアコンテンツを指すDeepFakesは、偽情報の手段としての利用が懸念されている。 DeepFakesの検出は現在、プログラムされた機械学習アルゴリズムで解決されている。本研究では,DeepFake検出におけるマルチモーダル大言語モデル(LLM)の機能について検討する。我々は,マルチモーダルLLMを実証するために定性的かつ定量的な実験を行い,慎重に設計し,迅速な技術によってAI生成画像を公開できることを実証した。 LLMは本質的にはメディアの法医学的タスクに向いておらず、そのプロセスはプログラミングを必要としないことを考慮すれば興味深い。本稿では,これらのタスクに対するマルチモーダル LLM の限界について論じ,改善の可能性を提案する。 DeepFakes, which refer to AI-generated media content, have become an increasing concern due to their use as a means for disinformation. Detecting DeepFakes is currently solved with programmed machine learning algorithms. In this work, we investigate the capabilities of multimodal large language models (LLMs) in DeepFake detection. We conducted qualitative and quantitative experiments to demonstrate multimodal LLMs and show that they can expose AI-generated images through careful experimental design and prompt engineering. This is interesting, considering that LLMs are not inherently tailored for media forensic tasks, and the process does not require programming. We discuss the limitations of multimodal LLMs for these tasks and suggest possible improvements.	翻訳日:2024-06-12 22:13:02 公開日:2024-06-11
# Feshbach共鳴の閉チャネルパラメータ Closed-channel parameters of Feshbach resonances ( http://arxiv.org/abs/2403.14962v2 ) ライセンス: Link先を確認	Pascal Naidon,	(参考訳) この研究は、フェシュバッハ共鳴の閉チャネルが実験的な可観測物によってどのように特徴づけられるかを研究する。驚いたことに、フェシュバッハ共鳴に付随する2体観測器は閉路の性質に敏感であることが判明した。特に、この状況では、通常の実験データから共鳴を引き起こす境界状態のエネルギーを決定することは不可能である。これは、その深い2体相互作用ポテンシャルのため、超低温原子中の全ての磁気フェシュバッハ共鳴のケースである。この感度は、ハドロン共鳴のような浅い相互作用ポテンシャルを含むフェシュバッハ共鳴との大きな違いを浮き彫りにする。しかし、短距離2体相関と3体オブザーバブルは「閉チャネル散乱長」と呼ばれる閉チャネルのパラメータの影響を受けているようである。超低温原子系におけるこのパラメータを測定するための光解離実験が提案されている。 This work investigates how the closed channel of a Feshbach resonance is characterised by experimental observables. Surprisingly, it is found that the two-body observables associated with the Feshbach resonance can be insensitive to the properties of the closed channel. In particular, it is impossible in this situation to determine the energy of the bound state causing the resonance from the usual experimental data. This is the case for all magnetic Feshbach resonances in ultracold atoms, due to their deep two-body interaction potentials. This insensitivity highlights a major difference with Feshbach resonances that involve shallow interaction potentials, such as hadron resonances. It appears however that short-range two-body correlations and three-body observables are affected by a parameter of the closed channel called the "closed-channel scattering length". A photoassociation experiment is proposed to measure this parameter in ultracold atom systems.	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
# 中程度の強度のレーザー場における双極子-双極子相互作用する2準位エミッタ Dipole-dipole interacting two-level emitters in a moderately intense laser field ( http://arxiv.org/abs/2403.15193v2 ) ライセンス: Link先を確認	Profirie Bardetski, Mihai A. Macovei,	(参考訳) 密に配置され、中程度の強度のレーザーで共鳴励起された2準位エミッタの小さな集団について、その共鳴蛍光の特性を調べる。任意の2レベルラジエーター間の平均距離は対応する発光波長よりも小さく、双極子-双極子相互作用は無視できない。世俗近似の下では、集団共鳴蛍光スペクトルは2N+1スペクトル線から構成されており、Nは試料中の放射体の数である。 2Nスペクトル側バンドは、レーザー周波数の中心線に対して、一般化されたラビ周波数の周囲に対称に位置し、双極子-双極子結合強度が集団自発崩壊速度よりも大きい場合、識別可能である。このようにして、自然に散乱した集団共鳴蛍光スペクトルを測定することで、アンサンブル内の放射体数を推定することができる。対照的に、双極子-双極子カップリングが協同自発崩壊速度と同程度かそれ以下であるが、それでも無視できない場合、スペクトルはモローのような蛍光スペクトルとなり、2つの側バンドのスペクトル線は双極子-双極子カップリング強度に比例して広がる。 We investigate the resonance fluorescence features of a small ensemble of closely packed, moderately laser-pumped two-level emitters at resonance. The mean distance between the two-level radiators is smaller than the corresponding emission wavelength, such that the dipole-dipole interactions are not negligible. We find that, under the secular approximation, the collective resonance fluorescence spectrum consists of 2N+1 spectral lines, where N is the number of emitters in the sample. The 2N spectral sidebands, symmetrically located around the generalized Rabi frequency with respect to the central line at the laser frequency, are distinguishable if the dipole-dipole coupling strength is larger than the collective spontaneous decay rate. In this way, one can estimate the number of radiators within the ensemble by measuring the spontaneously scattered collective resonance fluorescence spectrum. Conversely, if the dipole-dipole coupling is of the order of or smaller than the cooperative spontaneous decay rate, but still non-negligible, the spectrum turns into a Mollow-like fluorescence spectrum, where the two sideband spectral lines broaden in proportion to the dipole-dipole coupling strength.	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
# 脳電図を用いた対話教育におけるChatGPTの適用効果の検討 Investigation of the effectiveness of applying ChatGPT in Dialogic Teaching Using Electroencephalography ( http://arxiv.org/abs/2403.16687v5 ) ライセンス: Link先を確認	Jiayue Zhang, Yiheng Liu, Wenqi Cai, Lanlan Wu, Yali Peng, Jingjing Yu, Senqing Qi, Taotao Long, Bao Ge,	(参考訳) 近年、人工知能技術の急速な発展、特にChatGPTのような大規模言語モデル(LLM)の出現は、教育分野への応用に大きな可能性を示している。 LLMは、知識を解釈し、質問に答え、文脈を考慮し、学生に対話的な教えを支援する能力を持っている。したがって,LLMの指導的役割を効果的に果たす能力について検討し,対話型教育シナリオにおける人間教育者に似た学習を促進することは,非常に貴重な研究課題である。この研究は、34人の大学生を参加者として募集し、ランダムに2つのグループに分けられた。実験群はChatGPTを用いて対話型指導を行い,コントロール群は人間教師と対話した。両グループは情報関連コースであるDigital Image Processingでヒストグラム等化単位を学習した。調査の結果,保持試験における両群間に比較スコアが認められた。しかし,ChatGPTとの対話に携わる学生は,移行試験において低い成績を示した。脳波データによると、ChatGPTと相互作用する学生は認知活動のレベルが高く、ChatGPTが知識基盤の確立と認知活動の促進に役立つことが示唆された。しかし、学生の育成に力を入れている。知識の応用と創造性は重要ではありません研究結果から,ChatGPTは情報関連科目における対話指導における教科の遂行に全力を尽くすことができないことが明らかとなった。 ChatGPTと従来の人間の教師を組み合わせることが、より理想的なアプローチかもしれない。両者のシナジスティックな利用は、生徒により包括的な学習支援を提供し、教育の質の向上に寄与する。 In recent years, the rapid development of artificial intelligence technology, especially the emergence of large language models (LLMs) such as ChatGPT, has presented significant prospects for application in the field of education. LLMs possess the capability to interpret knowledge, answer questions, and consider context, thus providing support for dialogic teaching to students. Therefore, an examination of the capacity of LLMs to effectively fulfill instructional roles, thereby facilitating student learning akin to human educators within dialogic teaching scenarios, is an exceptionally valuable research topic. This research recruited 34 undergraduate students as participants, who were randomly divided into two groups. The experimental group engaged in dialogic teaching using ChatGPT, while the control group interacted with human teachers. Both groups learned the histogram equalization unit in the information-related course "Digital Image Processing". 
The research findings show comparable scores between the two groups on the retention test. However, students who engaged in dialogue with ChatGPT exhibited lower performance on the transfer test. Electroencephalography data revealed that students who interacted with ChatGPT exhibited higher levels of cognitive activity, suggesting that ChatGPT could help students establish a knowledge foundation and stimulate cognitive activity. However, its strength in promoting students' knowledge application and creativity was insignificant. Based on the research findings, ChatGPT cannot fully take over teaching tasks in dialogic teaching for information-related courses. Combining ChatGPT with traditional human teachers might be a more suitable approach. The synergistic use of both can provide students with more comprehensive learning support, thus contributing to enhancing the quality of teaching.	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
# コンパイラフィードバックによる精密コード生成のためのプロジェクトレベルコードコンテキストの反復的リファインメント Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback ( http://arxiv.org/abs/2403.16792v3 ) ライセンス: Link先を確認	Zhangqian Bi, Yao Wan, Zheng Wang, Hongyu Zhang, Batu Guan, Fangxin Lu, Zili Zhang, Yulei Sui, Hai Jin, Xuanhua Shi,	(参考訳) 大規模言語モデル(LLM)は、コードの自動生成において顕著な進歩を示している。しかし、LLM生成コードには、API使用率、クラス、データ構造、プロジェクト固有の情報が欠落しているエラーが含まれている可能性がある。プロジェクト固有のコンテキストの多くはLCMのプロンプトに適合しないので、モデルがプロジェクトレベルのコードコンテキストを探索できるようにする方法を見つけなければなりません。我々は,コンパイラフィードバックを用いてLLM生成コードを改善する新しいコード生成手法であるCoCoGenを提案する。 CoCoGenは、まず静的解析を利用して、生成されたコードとプロジェクトのコンテキストのミスマッチを特定する。その後、コードリポジトリから抽出された情報を使用して、識別されたエラーを反復的に調整し、修正する。我々は CoCoGen を GPT-3.5-Turbo と Code Llama (13B) の2つの代表的な LLM と統合し,Python コード生成に適用する。実験の結果,CoCoGenはプロジェクトコンテキストに依存したコード生成において,バニラLLMを80%以上改善し,既存の検索ベースコード生成ベースラインを一貫して上回っていることがわかった。 Large Language Models (LLMs) have shown remarkable progress in automated code generation. Yet, LLM-generated code may contain errors in API usage, class, data structure, or missing project-specific information. As much of this project-specific context cannot fit into the prompts of LLMs, we must find ways to allow the model to explore the project-level code context. We present CoCoGen, a new code generation approach that uses compiler feedback to improve the LLM-generated code. CoCoGen first leverages static analysis to identify mismatches between the generated code and the project's context. It then iteratively aligns and fixes the identified errors using information extracted from the code repository. We integrate CoCoGen with two representative LLMs, i.e., GPT-3.5-Turbo and Code Llama (13B), and apply it to Python code generation. Experimental results show that CoCoGen significantly improves the vanilla LLMs by over 80% in generating code dependent on the project context and consistently outperforms the existing retrieval-based code generation baselines.	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
# 複数の専門家のLLMをジェネラリストとして、エキスパートのToken Routingを通じてシンジケートする An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing ( http://arxiv.org/abs/2403.16854v3 ) ライセンス: Link先を確認	Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, Jingjing Xu, Jianbo Yuan, Hongxia Yang, Fei Wu, Yang Yang,	(参考訳) 本稿では,複数の専門家LLMのシームレスな統合を支援する汎用フレームワークであるExpert-Token-Routingを紹介する。我々のフレームワークは,メタLLMの語彙内の特別な専門家トークンとして,専門家LLMを表現している。メタLSMは、新しいトークンを生成するように、専門家のLSMにルーティングすることができる。 Expert-Token-Routingは、既存の命令データセットから専門家のLLMの暗黙の専門知識を学ぶことをサポートするだけでなく、プラグイン・アンド・プレイで新しい専門家のLLMを動的に拡張することを可能にする。また、ユーザの視点からは詳細なコラボレーションプロセスを隠蔽し、独特なLLMのように対話を容易にする。本フレームワークは,6つの異なる専門家ドメインを組み込んだベンチマークにおいて,複数の専門家LLMを相乗化して汎用LLMシステムを構築する上での有効性と堅牢性を示すため,既存の複数LLMコラボレーションパラダイムよりも優れていた。 We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM like generating new tokens. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instruction dataset but also allows for dynamic extension of new expert LLMs in a plug-and-play manner. It also conceals the detailed collaboration process from the user's perspective, facilitating interaction as though it were a singular LLM. Our framework outperforms various existing multi-LLM collaboration paradigms across benchmarks that incorporate six diverse expert domains, demonstrating effectiveness and robustness in building generalist LLM system via synergizing multiple expert LLMs.	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
# パラメータ化メモリインジェクションを用いたパーソナライズLLM応答生成 Personalized LLM Response Generation with Parameterized Memory Injection ( http://arxiv.org/abs/2404.03565v2 ) ライセンス: Link先を確認	Kai Zhang, Lizhi Qing, Yangyang Kang, Xiaozhong Liu,	(参考訳) 大規模言語モデル(LLM)は、自然言語の理解と生成に優れた能力を発揮している。一方、パーソナライズされたLLM応答生成は、医療などの重要な分野の個人に多大な利益をもたらす可能性がある。既存の研究では、新しいクエリに対して、パーソナライズされた応答生成のために予め蓄積したユーザー固有の知識をLLMにプロンプトとして与えるメモリ拡張手法が検討されている。このようなパラダイムは、微粒な粒度情報を知覚できない、と我々は主張する。本研究では,パラメータ効率の良いファインチューニング(PEFT)とベイズ最適化探索戦略を併用して,LLMのパーソナライズを実現する新しいメモリ注入型アプローチ MiLP (Memory-injected LLM Personalization) を提案する。 Large Language Models (LLMs) have exhibited remarkable proficiency in comprehending and generating natural language. On the other hand, personalized LLM response generation holds the potential to offer substantial benefits for individuals in critical areas such as medicine. Existing research has explored memory-augmented methods that prompt the LLM with pre-stored user-specific knowledge to generate personalized responses to new queries. We contend that such a paradigm is unable to capture fine-grained information. In this study, we propose a novel Memory-injected approach that uses parameter-efficient fine-tuning (PEFT) together with a Bayesian Optimisation search strategy to achieve LLM Personalization (MiLP).	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
# 言語間のアライメントを理解する - サーベイ Understanding Cross-Lingual Alignment -- A Survey ( http://arxiv.org/abs/2404.06228v2 ) ライセンス: Link先を確認	Katharina Hämmerl, Jindřich Libovický, Alexander Fraser,	(参考訳) 多言語言語モデルにおける言語間の表現の有意義な類似性である言語間アライメントは、近年、活発な研究分野となっている。我々は,言語間のアライメントを改善する手法の文献を調査し,手法の分類を提供し,各分野の洞察を要約する。我々は、言語間のアライメントとその制限について、異なる理解を提示する。多数の調査論文から得られた結果の質的な要約を提供する。最後に、この知見をエンコーダモデルだけでなく、エンコーダデコーダやデコーダのみのモデルにも適用し、言語ニュートラルと言語固有の情報の効果的なトレードオフが重要であると論じる。 Cross-lingual alignment, the meaningful similarity of representations across languages in multilingual language models, has been an active field of research in recent years. We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field. We present different understandings of cross-lingual alignment and their limitations. We provide a qualitative summary of results from a large number of surveyed papers. Finally, we discuss how these insights may be applied not only to encoder models, where this topic has been heavily studied, but also to encoder-decoder or even decoder-only models, and argue that an effective trade-off between language-neutral and language-specific information is key.	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
# ToNER: 生成言語モデルを用いた型指向名前付きエンティティ認識 ToNER: Type-oriented Named Entity Recognition with Generative Language Model ( http://arxiv.org/abs/2404.09145v2 ) ライセンス: Link先を確認	Guochao Jiang, Ziqin Luo, Yuchen Shi, Dixuan Wang, Jiaqing Liang, Deqing Yang,	(参考訳) 近年、微調整された生成モデルは、名前付きエンティティ認識(NER)タスクにおける以前のタグ付けベースまたはスパンベースモデルよりも強力であることが証明されている。また、エンティティタイプのようなエンティティに関連する情報は、モデルにNERをより良く達成するよう促すことも見出されている。しかし、与えられた文の中に実際に存在するエンティティタイプを事前に判断するのは簡単ではなく、潜在的なエンティティタイプを多すぎると、必然的にモデルを混乱させてしまう。本稿では,NERタスクの促進におけるエンティティタイプのメリットを活用するために,生成モデルに基づく新しいNERフレームワーク,すなわちToNERを提案する。 ToNERでは、文中に最も現れる可能性が最も高いエンティティタイプを特定するために、最初は型マッチングモデルが提案されている。次に、生成モデルのエンコーダを微調整するために複数のバイナリ分類タスクを追加し、入力文の洗練された表現を生成する。さらに、モデルがより正確な結果を出力するために、モデルをさらに微調整するエンティティタイプを見つけるための補助的なタスクを追加します。いくつかのNERベンチマークに関する広範な実験により、エンティティタイプの利用を指向したToNERにおける提案した戦略の有効性が検証された。 In recent years, the fine-tuned generative models have been proven more powerful than the previous tagging-based or span-based models on named entity recognition (NER) task. It has also been found that the information related to entities, such as entity types, can prompt a model to achieve NER better. However, it is not easy to determine the entity types indeed existing in the given sentence in advance, and inputting too many potential entity types would distract the model inevitably. To exploit entity types' merit on promoting NER task, in this paper we propose a novel NER framework, namely ToNER based on a generative model. In ToNER, a type matching model is proposed at first to identify the entity types most likely to appear in the sentence. Then, we append a multiple binary classification task to fine-tune the generative model's encoder, so as to generate the refined representation of the input sentence. Moreover, we add an auxiliary task for the model to discover the entity types which further fine-tunes the model to output more accurate results. 
Our extensive experiments on some NER benchmarks verify the effectiveness of our proposed strategies in ToNER that are oriented towards entity types' exploitation.	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
# マーケティングチャネルを分位変換に統合し、ガウス過程モデルによる販売予測のためのアンサンブルカーネルのベイズ最適化 Integrating Marketing Channels into Quantile Transformation and Bayesian Optimization of Ensemble Kernels for Sales Prediction with Gaussian Process Models ( http://arxiv.org/abs/2404.09386v2 ) ライセンス: Link先を確認	Shahin Mirshekari, Negin Hayeri Motedayen, Mohammad Ensaf,	(参考訳) 本研究では,製品販売予測のために,Radial Basis Function (RBF), Rational Quadratic, Matérn カーネルを統合したアンサンブルカーネルを用いた革新的なガウスプロセス(GP)モデルを提案する。ベイズ最適化を適用することで、各カーネルの最適な重み付けを効率的に見つけることができ、複雑な販売データパターンを扱うモデルの能力を高めることができる。提案手法は従来のGPモデルよりも優れており,98%の精度を達成するとともに,Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Coefficient of Determination ($R^2$) といった主要な指標に対して優れたパフォーマンスを実現している。この進歩は、予測精度を改善するためのアンサンブルカーネルとベイズ最適化の有効性を強調し、セールス予測における機械学習アプリケーションに深い影響をもたらす。 This study introduces an innovative Gaussian Process (GP) model utilizing an ensemble kernel that integrates Radial Basis Function (RBF), Rational Quadratic, and Matérn kernels for product sales forecasting. By applying Bayesian optimization, we efficiently find the optimal weights for each kernel, enhancing the model's ability to handle complex sales data patterns. Our approach significantly outperforms traditional GP models, achieving a notable 98% accuracy and superior performance across key metrics including Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Coefficient of Determination ($R^2$). This advancement underscores the effectiveness of ensemble kernels and Bayesian optimization in improving predictive accuracy, offering profound implications for machine learning applications in sales forecasting.	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
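As a rough illustration of the ensemble kernel named in this abstract, the sketch below forms a weighted sum of RBF, Rational Quadratic, and Matérn kernels on scalar inputs. The weights, the length scale, and the smoothness choice (nu = 3/2) are assumptions made for the example; the abstract does not state them, and the paper finds the weights by Bayesian optimization rather than fixing them by hand.

```python
import math


def rbf(x, y, length=1.0):
    # Squared-exponential (RBF) kernel.
    return math.exp(-((x - y) ** 2) / (2.0 * length ** 2))


def rational_quadratic(x, y, length=1.0, alpha=1.0):
    # Rational Quadratic kernel with scale-mixture parameter alpha.
    return (1.0 + (x - y) ** 2 / (2.0 * alpha * length ** 2)) ** (-alpha)


def matern32(x, y, length=1.0):
    # Matern kernel with smoothness nu = 3/2 (an assumed choice).
    r = math.sqrt(3.0) * abs(x - y) / length
    return (1.0 + r) * math.exp(-r)


def ensemble_kernel(x, y, weights=(0.5, 0.3, 0.2)):
    # Weighted sum of the three kernels; non-negative weights keep the
    # result a valid (positive semi-definite) kernel.
    w_rbf, w_rq, w_mat = weights
    return (w_rbf * rbf(x, y)
            + w_rq * rational_quadratic(x, y)
            + w_mat * matern32(x, y))


def gram_matrix(xs, weights=(0.5, 0.3, 0.2)):
    # Kernel (Gram) matrix over a list of scalar inputs.
    return [[ensemble_kernel(a, b, weights) for b in xs] for a in xs]
```

A weight search would treat `weights` as the quantity to tune against a validation loss; that step is omitted here.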
# 多数目的進化アルゴリズムのためのほぼタイトなランタイム保証 Near-Tight Runtime Guarantees for Many-Objective Evolutionary Algorithms ( http://arxiv.org/abs/2404.12746v2 ) ライセンス: Link先を確認	Simon Wietheger, Benjamin Doerr,	(参考訳) 多目的進化アルゴリズム(MOEA)の数学的ランタイム解析の分野では大きな進歩があったが、離散多数目的問題におけるMOEAの性能はほとんど理解されていない。特に、古典的なベンチマークにおけるSEMO、グローバルSEMO、SMS-EMOAアルゴリズムの数少ない既存の上界は、すべてParetoフロントのサイズに対してほぼ2次である。本研究では,最も一般的な4つのベンチマーク問題であるOneMinMax, CountingOnesCountingZeros, LeadingOnesTrailingZeros, OneJumpZeroJumpにおいて,任意の目的数に対して,これらの3つのアルゴリズムのほぼタイトなランタイム保証を証明した。私たちの上界はParetoフロントのサイズに線形にのみ依存しており、これらのベンチマーク上のMOEAは、以前の研究が示唆していたよりも、多くの目的にずっとうまく対応していることを示している。我々の境界は、目的数とビットストリングの長さに関する小さな多項式因子を除いてタイトである。このようなタイトな境界がこれらのMOEAの多数目的での利用に対して証明されたのはこれが初めてである。このような結果はNSGA-IIでは成り立たないことが知られているが、我々の境界は最近の構造的結果を通じてNSGA-IIIアルゴリズムにも移行できることを示す。 Despite significant progress in the field of mathematical runtime analysis of multi-objective evolutionary algorithms (MOEAs), the performance of MOEAs on discrete many-objective problems is poorly understood. In particular, the few existing bounds for the SEMO, global SEMO, and SMS-EMOA algorithms on classic benchmarks are all roughly quadratic in the size of the Pareto front. In this work, we prove near-tight runtime guarantees for these three algorithms on the four most common benchmark problems OneMinMax, CountingOnesCountingZeros, LeadingOnesTrailingZeros, and OneJumpZeroJump, for arbitrary numbers of objectives. Our bounds depend only linearly on the Pareto front size, showing that these MOEAs on these benchmarks cope much better with many objectives than what previous works suggested. Our bounds are tight apart from small polynomial factors in the number of objectives and length of bitstrings. This is the first time that such tight bounds are proven for many-objective uses of these MOEAs. While it is known that such results cannot hold for the NSGA-II, we do show that our bounds, via a recent structural result, transfer to the NSGA-III algorithm.	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
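To make the setting of this abstract concrete, the sketch below runs one common variant of SEMO on the bi-objective form of OneMinMax, where every bit string is Pareto-optimal and the front has n + 1 points. It is a minimal illustration only: the paper analyses many-objective versions of the benchmarks, and the seed, the iteration cap, and the tie-handling rule (a child with an already-present objective vector is discarded) are assumptions of this example.

```python
import random


def one_min_max(x):
    # Bi-objective OneMinMax: maximise both the number of zeros and of ones.
    ones = sum(x)
    return (len(x) - ones, ones)


def weakly_dominates(f, g):
    # f weakly dominates g if it is at least as good in every objective.
    return all(a >= b for a, b in zip(f, g))


def semo(n, seed=0, max_iters=100_000):
    rng = random.Random(seed)
    start = tuple(rng.randint(0, 1) for _ in range(n))
    population = {start: one_min_max(start)}
    for _ in range(max_iters):
        if len(population) == n + 1:  # whole Pareto front covered
            break
        parent = rng.choice(sorted(population))
        i = rng.randrange(n)  # SEMO flips exactly one uniformly chosen bit
        child = parent[:i] + (1 - parent[i],) + parent[i + 1:]
        fc = one_min_max(child)
        # Discard the child if an existing solution weakly dominates it.
        if any(weakly_dominates(f, fc) for f in population.values()):
            continue
        # Otherwise drop the solutions it dominates and keep the child.
        population = {x: f for x, f in population.items()
                      if not weakly_dominates(fc, f)}
        population[child] = fc
    return population
```

Counting the loop iterations until the break gives an empirical runtime that can be compared against the Pareto front size n + 1.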
# EMの返却:QA評価のためのエンティティ駆動型回答セットの拡張 Return of EM: Entity-driven Answer Set Expansion for QA Evaluation ( http://arxiv.org/abs/2404.15650v2 ) ライセンス: Link先を確認	Dongryeol Lee, Minwoo Lee, Kyungmin Min, Joonsuk Park, Kyomin Jung,	(参考訳) 近年,大規模言語モデル(LLM)を直接使用することが,QAモデルを評価する上で最も信頼性の高い手法であることが示されている。しかし、限定的な解釈可能性、高いコスト、環境被害に悩まされている。そこで本研究では,エンティティ駆動型回答セット拡張を用いたソフトEMを提案する。本手法は, 表面形状が実体の種類によっては特定のパターンに従うことがしばしばあるという観察に基づいて, 多様な表面形状を含むように金の解集合を拡張する。実験結果から,本手法は従来の評価手法よりも高い性能を示した。さらに,評価手法の信頼性はLLM法と同等であり,高い解釈可能性と環境負荷の低減の利点も提供する。 Recently, directly using large language models (LLMs) has been shown to be the most reliable method to evaluate QA models. However, it suffers from limited interpretability, high cost, and environmental harm. To address these, we propose to use soft EM with entity-driven answer set expansion. Our approach expands the gold answer set to include diverse surface forms, based on the observation that the surface forms often follow particular patterns depending on the entity type. The experimental results show that our method outperforms traditional evaluation methods by a large margin. Moreover, the reliability of our evaluation method is comparable to that of LLM-based ones, while offering the benefits of high interpretability and reduced environmental harm.	翻訳日:2024-06-12 22:03:14 公開日:2024-06-11
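The idea of matching a prediction against an expanded gold answer set can be sketched as below. This is a hedged illustration, not the paper's procedure: "soft" matching is taken here to mean that some gold surface form occurs as a whole token span inside the normalized prediction, and `expand_person` is a hypothetical hand-written rule for person names standing in for the entity-type-driven expansion the abstract describes.

```python
import re
import string


def normalize(text):
    # Lowercase, strip punctuation and articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def expand_person(name):
    # Hypothetical rule for PERSON entities: also accept "first last"
    # without middle names, and the surname alone.
    parts = name.split()
    forms = {name}
    if len(parts) >= 2:
        forms.add(f"{parts[0]} {parts[-1]}")
        forms.add(parts[-1])
    return forms


def soft_em(prediction, gold_forms):
    # Correct if any gold surface form occurs as a whole token span
    # inside the normalised prediction.
    pred = f" {normalize(prediction)} "
    return any(f" {normalize(g)} " in pred for g in gold_forms)
```

Accepting a bare surname is deliberately lenient; a stricter rule set would be needed wherever surnames are ambiguous.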
# 時系列・時空間データの拡散モデルに関する調査 A Survey on Diffusion Models for Time Series and Spatio-Temporal Data ( http://arxiv.org/abs/2404.18886v3 ) ライセンス: Link先を確認	Yiyuan Yang, Ming Jin, Haomin Wen, Chaoli Zhang, Yuxuan Liang, Lintao Ma, Yi Wang, Chenghao Liu, Bin Yang, Zenglin Xu, Jiang Bian, Shirui Pan, Qingsong Wen,	(参考訳) 時系列の研究は、時間とともに傾向や異常を理解するために不可欠であり、様々な分野の予測的な洞察を可能にする。一方、時空間データは空間と時間の両方の現象を解析するのに不可欠であり、複雑なシステム相互作用のダイナミックな視点を提供する。近年,拡散モデルが時系列や時空間データマイニングに広く応用されている。シーケンシャルなデータや時間的なデータの生成能力や推論能力を向上するだけでなく、他の下流タスクにも拡張する。本研究では,時系列および時空間データにおける拡散モデルの使用状況について,モデルカテゴリ,タスクタイプ,データモダリティ,実践的アプリケーション領域で分類し,包括的かつ徹底的にレビューする。本稿では,拡散モデルを無条件型と条件付き型に分類し,時系列データと時空間データを別々に検討する。教師なしモデル(unconditioned model)は確率ベースモデルとスコアベースモデルに分けられ、予測、異常検出、分類、計算などの予測および生成タスクを提供する。一方、条件付きモデルでは、余分な情報を利用して性能を向上し、予測的タスクと生成的タスクの両方で同様に分割される。本調査では,医療,レコメンデーション,気候,エネルギー,オーディオ,交通など,さまざまな分野の応用を幅広く取り上げ,これらのモデルがどのようにデータを分析し,生成するかの基本的な理解を提供する。この構造的概要を通じて,従来の課題に対処し,拡散モデルフレームワーク内で革新的なソリューションを探求することによって,将来的なイノベーションと応用を導くことを目的として,時系列および時空間データ分析のための拡散モデルに関する包括的理解を研究者や実践者に提供することを目的とする。 The study of time series is crucial for understanding trends and anomalies over time, enabling predictive insights across various sectors. Spatio-temporal data, on the other hand, is vital for analyzing phenomena in both space and time, providing a dynamic perspective on complex system interactions. Recently, diffusion models have seen widespread application in time series and spatio-temporal data mining. Not only do they enhance the generative and inferential capabilities for sequential and temporal data, but they also extend to other downstream tasks. In this survey, we comprehensively and thoroughly review the use of diffusion models in time series and spatio-temporal data, categorizing them by model category, task type, data modality, and practical application domain. In detail, we categorize diffusion models into unconditioned and conditioned types and discuss time series and spatio-temporal data separately. 
Unconditioned models, which operate unsupervised, are subdivided into probability-based and score-based models, serving predictive and generative tasks such as forecasting, anomaly detection, classification, and imputation. Conditioned models, on the other hand, utilize extra information to enhance performance and are similarly divided for both predictive and generative tasks. Our survey extensively covers their application in various fields, including healthcare, recommendation, climate, energy, audio, and transportation, providing a foundational understanding of how these models analyze and generate data. Through this structured overview, we aim to provide researchers and practitioners with a comprehensive understanding of diffusion models for time series and spatio-temporal data analysis, aiming to direct future innovations and applications by addressing traditional challenges and exploring innovative solutions within the diffusion model framework.	翻訳日:2024-06-12 21:53:26 公開日:2024-06-11
# Time-To-First-Spikeニューラルネットワークにおける効率的な連続学習を可能にするアクティブデンドライト Active Dendrites Enable Efficient Continual Learning in Time-To-First-Spike Neural Networks ( http://arxiv.org/abs/2404.19419v2 ) ライセンス: Link先を確認	Lorenzo Pes, Rick Luiken, Federico Corradi, Charlotte Frenkel,	(参考訳) 人間の脳は、連続した情報のストリームから新しいタスクに効率的に適応するが、ニューラルネットワークモデルは、これまで学んだタスクを破滅的に忘れずに、シーケンシャルな情報から学ぶのに苦労する。この制限は、情報が本質的にシーケンシャルな方法で提示される現実のシナリオにおいて、エッジデバイスをデプロイする上で大きなハードルとなる。錐体ニューロンの活発な樹状突起は、新しいタスクを段階的に学習する脳の能力において重要な役割を担っている。そこで本研究では, Time-To-First-Spike符号化の重要な特性とその高い疎性を活用し, アクティブデンドライトで強化した新しいスパイキングニューラルネットワークモデルを提案する。我々のモデルは、時間的に符号化されたSNNにおける破滅的な忘れを効果的に軽減し、これをSplit MNISTデータセットのテストセットにおけるタスク全体で88.3%の学習終了時精度により実証する。さらに、エッジデバイスでの現実的なデプロイメントを実現するための、新しいデジタルハードウェアアーキテクチャも提供しています。 Xilinx Zynq-7020 SoC FPGAを用いて、量子化されたソフトウェアモデルと100%の一致を示し、平均推論時間は37.3ms、精度は80.0%である。 While the human brain efficiently adapts to new tasks from a continuous stream of information, neural network models struggle to learn from sequential information without catastrophically forgetting previously learned tasks. This limitation presents a significant hurdle in deploying edge devices in real-world scenarios where information is presented in an inherently sequential manner. Active dendrites of pyramidal neurons play an important role in the brain's ability to learn new tasks incrementally. By exploiting key properties of time-to-first-spike encoding and leveraging its high sparsity, we present a novel spiking neural network model enhanced with active dendrites. Our model can efficiently mitigate catastrophic forgetting in temporally-encoded SNNs, which we demonstrate with an end-of-training accuracy across tasks of 88.3% on the test set using the Split MNIST dataset. Furthermore, we provide a novel digital hardware architecture that paves the way for real-world deployment in edge devices. Using a Xilinx Zynq-7020 SoC FPGA, we demonstrate a 100% match with our quantized software model, achieving an average inference time of 37.3 ms and an 80.0% accuracy.	翻訳日:2024-06-12 21:53:26 公開日:2024-06-11
# 複雑さから明瞭さへ:AIが科学者の知覚と科学に対する大衆の理解をいかに高めるか From Complexity to Clarity: How AI Enhances Perceptions of Scientists and the Public's Understanding of Science ( http://arxiv.org/abs/2405.00706v2 ) ライセンス: Link先を確認	David M. Markowitz,	(参考訳) 本稿では, 科学コミュニケーションを簡素化し, 一般の科学理解を高めるために, 生成型AIの有効性を評価した。 PNASの論文をAIが生成したものと比較することで、この研究はまずこのような要約や大衆の認識を言語学的にシンプルに評価した。研究1aは,PNAS要約(科学要約)と重要文(レイ要約)の単純さを解析し,レイ要約が言語学的にシンプルであるが,効果サイズの違いは小さいことを示した。研究1bでは,大規模言語モデル GPT-4 を用いて,論文の要約に基づく意味表現を作成した。研究2は、単純なGPT要約が、より複雑に書かれた人間のPNAS要約よりも、科学者(彼らはより信頼でき、信頼できるが、知的ではないと見なされた)の良好な認識を促進することを実験的に実証した。実験3では,複雑なPNASサマリーと比較して,単純なGPTサマリーを読めば,科学的文章の理解が向上することが実験的に示された。参加者はGPT要約を同記事のPNAS要約と比較し,より詳細かつ具体的な方法で科学論文を要約した。 AIは、単純な言語ヒューリスティックを通じて科学コミュニティと一般市民を巻き込む可能性があり、より情報のある社会のための科学的普及への統合を提唱している。 This paper evaluated the effectiveness of using generative AI to simplify science communication and enhance the public's understanding of science. By comparing lay summaries of journal articles from PNAS, yoked to those generated by AI, this work first assessed linguistic simplicity across such summaries and public perceptions. Study 1a analyzed simplicity features of PNAS abstracts (scientific summaries) and significance statements (lay summaries), observing that lay summaries were indeed linguistically simpler, but effect size differences were small. Study 1b used a large language model, GPT-4, to create significance statements based on paper abstracts and this more than doubled the average effect size without fine-tuning. Study 2 experimentally demonstrated that simply-written GPT summaries facilitated more favorable perceptions of scientists (they were perceived as more credible and trustworthy, but less intelligent) than more complexly-written human PNAS summaries. Crucially, Study 3 experimentally demonstrated that participants comprehended scientific writing better after reading simple GPT summaries compared to complex PNAS summaries. 
In their own words, participants also summarized scientific papers in a more detailed and concrete manner after reading GPT summaries compared to PNAS summaries of the same article. AI has the potential to engage scientific communities and the public via a simple language heuristic, advocating for its integration into scientific dissemination for a more informed society.	翻訳日:2024-06-12 21:53:26 公開日:2024-06-11
# ボルン・レッドフィールド・マスター方程式を超えた非マルコビアン性のスペクトルシグネチャの定量化 Quantifying spectral signatures of non-Markovianity beyond the Born-Redfield master equation ( http://arxiv.org/abs/2405.01722v2 ) ライセンス: Link先を確認	A. Keefe, N. Agarwal, A. Kamal,	(参考訳) オープン量子力学における記憶あるいは時間非局所効果は、ノイズ量子系の理解と制御において理論的および実践的な課題をもたらす。非マルコフ力学の診断の開発には包括的で協力的な取り組みがあったが、既存の測定基準はすべて時間領域の測定に頼っている。本研究では,システム定常状態における非マルコビアン性の検出が可能な非マルコビアン性の分光測度を提案する。実験可能なことに加えて,提案手法には直接情報理論的解釈があり,マルコフ近似を行う際の単位帯域当たりの情報損失が大きい。同じ静脈では、周波数領域量子マスター方程式(FD-QME)がボルン・レッドフィールドの標準的な記述を超え、還元されたシステムの状態の完全なメモリを保持する。 FD-QMEと提案手法を用いて, 環境相関や遅延効果を含む複数のシステム環境環境で, 非マルコビアン性を確実に診断し, 定量化することができる。 Memory or time-non-local effects in open quantum dynamics pose theoretical as well as practical challenges in the understanding and control of noisy quantum systems. While there has been a comprehensive and concerted effort towards developing diagnostics for non-Markovian dynamics, all existing measures rely on time-domain measurements which are typically slow and expensive as they require averaging several runs to resolve small transient features on a broad background, and scale unfavorably with system size and complexity. In this work, we propose a spectroscopic measure of non-Markovianity which can detect persistent non-Markovianity in the system steady state. In addition to being experimentally viable, the proposed measure has a direct information theoretic interpretation: a large value indicates the information loss per unit bandwidth of making the Markov approximation. In the same vein, we derive a frequency-domain quantum master equation (FD-QME) that goes beyond the standard Born-Redfield description and retains the full memory of the state of the reduced system. Using the FD-QME and the proposed measure, we are able to reliably diagnose and quantify non-Markovianity in several system-environment settings including those with environmental correlations and retardation effects.	翻訳日:2024-06-12 21:53:26 公開日:2024-06-11
# ギャップの閉鎖:ニューラル・ネットワーク・パラメトリゼーションによるマルコフサンプリング下でのアクター・クリティックのグローバル・コンバージェンス(Last Iterate)の実現 Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization ( http://arxiv.org/abs/2405.01843v3 ) ライセンス: Link先を確認	Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal,	(参考訳) Actor-Critic(AC)アルゴリズムの現在最先端の理論解析は、AC実装の実践的な側面に対処する上で著しく遅れている。この重要なギャップは、ACの実践的な実装に合わせて分析を行うために橋渡しが必要である。そこで本論文では,MMCLG基準,すなわちアクター/クリティックの多層(\textbf{M}ulti-layer)ニューラルネットワークパラメトリゼーション,マルコフ(\textbf{M}arkovian)サンプリング,連続(\textbf{C}ontinuous)状態-行動空間,最終反復(\textbf{L}ast iterate)の性能,および大域的(\textbf{G}lobal)最適性を考慮することを提案する。これらの側面は実質的に重要であり、既存のACアルゴリズムの理論解析ではほとんど見過ごされてきた。本研究は,5つの重要な実践的側面(MMCLG基準)をすべて包含する初めての包括的なACアルゴリズムの理論解析を提供することにより,これらのギャップに対処する。我々は、大域収束サンプル複雑性境界を$\tilde{\mathcal{O}}\left({\epsilon^{-3}}\right)$とする。我々は,MDPの弱勾配支配特性と,クリティック推定における誤差の独自の解析を用いて,この結果を実現する。 The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: \textbf{M}ulti-layer neural network parametrization for actor/critic, \textbf{M}arkovian sampling, \textbf{C}ontinuous state-action spaces, the performance of the \textbf{L}ast iterate, and \textbf{G}lobal optimality. These aspects are practically significant and have been largely overlooked in existing theoretical analyses of AC algorithms. In this work, we address these gaps by providing the first comprehensive theoretical analysis of AC algorithms that encompasses all five crucial practical aspects (covers MMCLG criteria). We establish global convergence sample complexity bounds of $\tilde{\mathcal{O}}\left({\epsilon^{-3}}\right)$. We achieve this result through our novel use of the weak gradient domination property of MDPs and our unique analysis of the error in critic estimation.	翻訳日:2024-06-12 21:53:26 公開日:2024-06-11
# 生成コンテンツ豊か化 Generated Contents Enrichment ( http://arxiv.org/abs/2405.03650v2 ) ライセンス: Link先を確認	Mahdi Naseri, Jiayan Qiu, Zhou Wang,	(参考訳) 本稿では,生成コンテンツエンリッチメント(GCE)と呼ばれる新しい人工知能生成タスクについて検討する。視覚的にリアルなコンテンツを生成するための限定的な意味論によって、与えられたテキスト記述を暗黙的に豊かにする従来の人工知能コンテンツ生成タスクとは違って、提案したGCEは、視覚的、構造的に合理的で、意味的に豊富である視覚的およびテキスト的領域において、コンテンツリッチ化を明確に実行しようと試みている。本稿では, GCE の解決に向けて, エンリッチメントにおける意味論と意味間関係を明確に探求するディープ・エンド・ツー・エンド手法を提案する。具体的には、まず入力記述を意味グラフとしてモデル化し、各ノードはオブジェクトを表し、各エッジはオブジェクト間の関係に対応する。次に、入力シーン記述の上にグラフ畳み込みネットワークを導入し、リッチオブジェクトとその入力オブジェクトとの関係を予測する。最後に、リッチな記述を画像合成モデルに入力し、視覚コンテンツ生成を行う。 Visual Genomeデータセットを用いた実験では、有望かつ視覚的に妥当な結果が得られた。 In this paper, we investigate a novel artificial intelligence generation task, termed generated contents enrichment (GCE). Unlike conventional artificial intelligence content generation tasks, which enrich the given textual description implicitly with limited semantics for generating visually real content, our proposed GCE strives to perform content enrichment explicitly on both the visual and textual domain, from which the enriched contents are visually real, structurally reasonable, and semantically abundant. To solve GCE, we propose a deep end-to-end method that explicitly explores the semantics and inter-semantic relationships during the enrichment. Specifically, we first model the input description as a semantic graph, wherein each node represents an object and each edge corresponds to the inter-object relationship. We then adopt Graph Convolutional Networks on top of the input scene description to predict the enriching objects and their relationships with the input objects. Finally, the enriched description is fed into an image synthesis model to carry out the visual contents generation. Our experiments conducted on the Visual Genome dataset exhibit promising and visually plausible results.	翻訳日:2024-06-12 21:53:26 公開日:2024-06-11
# HC-Mamba:医療画像分割のためのハイブリッド畳み込み技術を用いたビジョンMAMBA HC-Mamba: Vision MAMBA with Hybrid Convolutional Techniques for Medical Image Segmentation ( http://arxiv.org/abs/2405.05007v3 ) ライセンス: Link先を確認	Jiashu Xu,	(参考訳) 自動医用画像分割技術は、病理診断を迅速化し、患者医療の効率を向上する可能性がある。しかし、医療画像は複雑なテクスチャや構造を持つことが多く、ダウンサンプリングによる画像解像度の低下や情報損失といった問題に直面していることが多い。この問題に対処するため,現代空間モデルMambaに基づく新しい医用画像分割モデルHC-Mambaを提案する。具体的には、HC-Mambaモデルにおける拡張畳み込み手法を導入し、畳み込みカーネルの知覚場を拡張して計算コストを増大させることなく、より広い範囲の文脈情報をキャプチャする。さらに、HC-Mambaモデルでは、深度的に分離可能な畳み込みを採用し、パラメータの数とモデルの計算能力を大幅に削減する。拡張畳み込みと深度的に分離可能な畳み込みを組み合わせることで、HC-Mambaは高レベルの性能を維持しながら、より低い計算コストで大規模医療画像データを処理できる。臓器の分節や皮膚病変などの分節作業に関する包括的実験を行い,Synapse,ISIC17,ISIC18について広範な実験を行い,HC-Mambaモデルの有用性について検討した。実験の結果,HC-Mambaはこれらのデータセットの競合性能を示し,医用画像のセグメンテーションの有効性と有用性を示した。 Automatic medical image segmentation technology has the potential to expedite pathological diagnoses, thereby enhancing the efficiency of patient care. However, medical images often have complex textures and structures, and the models often face the problem of reduced image resolution and information loss due to downsampling. To address this issue, we propose HC-Mamba, a new medical image segmentation model based on the modern state space model Mamba. Specifically, we introduce the technique of dilated convolution in the HC-Mamba model to capture a more extensive range of contextual information without increasing the computational cost by extending the perceptual field of the convolution kernel. In addition, the HC-Mamba model employs depthwise separable convolutions, significantly reducing the number of parameters and the computational power of the model. By combining dilated convolution and depthwise separable convolutions, HC-Mamba is able to process large-scale medical image data at a much lower computational cost while maintaining a high level of performance. 
We conduct comprehensive experiments on segmentation tasks including organ segmentation and skin lesion, and conduct extensive experiments on Synapse, ISIC17 and ISIC18 to demonstrate the potential of the HC-Mamba model in medical image segmentation. The experimental results show that HC-Mamba exhibits competitive performance on all these datasets, thereby proving its effectiveness and usefulness in medical image segmentation.	翻訳日:2024-06-12 21:53:26 公開日:2024-06-11
# 連続的ブラウン橋拡散によるフレーム補間 Frame Interpolation with Consecutive Brownian Bridge Diffusion ( http://arxiv.org/abs/2405.05953v2 ) ライセンス: Link先を確認	Zonglin Lyu, Ming Li, Jianbo Jiao, Chen Chen,	(参考訳) ビデオフレーム補間(VFI)における最近の研究は、拡散に基づく条件付き画像生成問題としてVFIを定式化しようと試み、ランダムなノイズと隣接するフレームを与えられた中間フレームを合成している。ビデオの解像度が比較的高いため、LDM(Latent Diffusion Models)が条件生成モデルとして使われ、オートエンコーダは画像をラテント表現に圧縮し、これらのラテント表現からイメージを再構成する。このような定式化は重要な課題である: VFI は出力が決定論的に基底真理中間フレームに等しいことを期待するが、LCM はモデルが複数回実行されると、ランダムに異なる画像の集合を生成する。多様な生成の理由は、LDMにおける生成された潜在表現の累積分散(生成の各ステップで蓄積される分散)が大きいからである。これによりサンプリング軌道はランダムになり、決定論的世代よりも多様になる。この問題に対処するため,我々は,Branian Bridge Diffusionを用いたフレーム補間法を提案する。具体的には、決定論的初期値を入力とし、生成した潜在表現の累積分散をはるかに小さくする、連続的なブラウン橋拡散を提案する。実験の結果,本手法はオートエンコーダの改良とともに改良され,VFIの最先端性能が向上し,さらなる向上の可能性が残っていることが示唆された。 Recent work in Video Frame Interpolation (VFI) tries to formulate VFI as a diffusion-based conditional image generation problem, synthesizing the intermediate frame given a random noise and neighboring frames. Due to the relatively high resolution of videos, Latent Diffusion Models (LDMs) are employed as the conditional generation model, where the autoencoder compresses images into latent representations for diffusion and then reconstructs images from these latent representations. Such a formulation poses a crucial challenge: VFI expects that the output is deterministically equal to the ground truth intermediate frame, but LDMs randomly generate a diverse set of different images when the model runs multiple times. The reason for the diverse generation is that the cumulative variance (variance accumulated at each step of generation) of generated latent representations in LDMs is large. This makes the sampling trajectory random, resulting in diverse rather than deterministic generations. To address this problem, we propose our unique solution: Frame Interpolation with Consecutive Brownian Bridge Diffusion. 
Specifically, we propose consecutive Brownian Bridge diffusion that takes a deterministic initial value as input, resulting in a much smaller cumulative variance of generated latent representations. Our experiments suggest that our method can improve together with the improvement of the autoencoder and achieve state-of-the-art performance in VFI, leaving strong potential for further enhancement.	翻訳日:2024-06-12 21:53:26 公開日:2024-06-11
# 重ね合わせパラメータの解析接続による擬エントロピー和則 Pseudoentropy sum rule by analytical continuation of the superposition parameter ( http://arxiv.org/abs/2405.09745v2 ) ライセンス: Link先を確認	Wu-zhong Guo, Yao-zong Jiang, Jin Xu,	(参考訳) 本稿では,重ね合わせ状態の擬エントロピーと絡み合いエントロピーを接続する和則を確立する。重ね合わせパラメータの解析接続により、重ね合わせ状態の遷移行列と密度行列を統一的に扱うことができることを示す。この枠組みの中では、(縮約)遷移行列、擬R\'enyiエントロピー、擬エントロピーの和則を自然に導出する。さらに、擬エントロピーの和則と解析接続後の重ね合わせ状態のエントロピー関数の特異性構造との密接な関係を示す。また、非エルミート遷移行列の重力双対を理解することと擬エントロピーの絶対値の上限を確立することとの関連性を含む和則の潜在的な応用についても検討する。 In this paper, we establish a sum rule that connects the pseudoentropy and entanglement entropy of a superposition state. Through analytical continuation of the superposition parameter, we demonstrate that the transition matrix and density matrix of the superposition state can be treated in a unified manner. Within this framework, we naturally derive sum rules for the (reduced) transition matrix, pseudo R\'enyi entropy, and pseudoentropy. Furthermore, we demonstrate the close relationship between the sum rule for pseudoentropy and the singularity structure of the entropy function for the superposition state after analytical continuation. We also explore potential applications of the sum rule, including its relevance to understanding the gravity dual of non-Hermitian transition matrices and establishing upper bounds for the absolute value of pseudoentropy.	翻訳日:2024-06-12 21:53:26 公開日:2024-06-11
# 分散量子コンピューティングによる限定接続下における量子コンピューティングのスケーラビリティ向上 Scalability enhancement of quantum computing under limited connectivity through distributed quantum computing ( http://arxiv.org/abs/2405.10942v2 ) ライセンス: Link先を確認	Shao-Hua Hu, George Biswas, Jun-Yi Wu,	(参考訳) 本稿では、量子体積ランダム回路サンプリングを用いて、2QPUエンタングルメント支援分散量子コンピューティング(DQC)をベンチマークし、単一QPU量子コンピューティングと比較する。まず、ランダム回路において、単一キュービットの非偏極ノイズモデルを指定する。この誤差モデルに基づいて、平均ゲート忠実度、重出力確率、線形クロスエントロピーの3つの図形の1対1対応を示す。本研究では,特定雑音モデルに基づく平均ゲート忠実度の解析的近似を導出し,数値シミュレーションと整合性を示す。近似は、DQC装置の拡張接続グラフから得られるノイズ伝搬行列に基づいて算出される。数値シミュレーションでは,接続性に制限のあるQPUに対するDQCのスケーラビリティ向上について紹介する。さらに,DQCにおける拡張性を評価するためのヒューリスティックな手法と,DQC構成の構造を最適化するためのガイドも提供する。 We employ quantum-volume random-circuit sampling to benchmark the two-QPU entanglement-assisted distributed quantum computing (DQC) and compare it with single-QPU quantum computing. We first specify a single-qubit depolarizing noise model in the random circuit. Based on this error model, we show the one-to-one correspondence of three figures of merits, namely average gate fidelity, heavy output probability, and linear cross-entropy. We derive an analytical approximation of the average gate fidelity under the specified noise model, which is shown to align with numerical simulations. The approximation is calculated based on a noise propagation matrix obtained from the extended connectivity graph of a DQC device. In numerical simulation, we unveil the scalability enhancement in DQC for the QPUs with limited connectivity. Furthermore, we provide a simple formula to estimate the average gate fidelity, which also provides us with a heuristic method to evaluate the scalability enhancement in DQC, and a guide to optimize the structure of a DQC configuration.	翻訳日:2024-06-12 21:53:25 公開日:2024-06-11
# 点クラウドデータセットへの量子ニューラルネットワークの適用における正確な置換と回転対称性の強制 Enforcing exact permutation and rotational symmetries in the application of quantum neural network on point cloud datasets ( http://arxiv.org/abs/2405.11150v2 ) ライセンス: Link先を確認	Zhelun Li, Lento Nagano, Koji Terashi,	(参考訳) 量子機械学習の分野での最近の進歩は、量子回路の構造に物理対称性を取り入れるというアイデアを推進してきた。この領域における重要なマイルストーンは、入力オブジェクトの置換の下で同変である$S_{n}$置換同変量子ニューラルネットワーク(QNN)の実現である。本稿では,ポイントクラウドデータセットの回転対称性をQNNに符号化することに焦点を当てる。このアプローチのキーとなる洞察は、ベクトル入力を持つすべての回転不変関数は、ベクトルの内積を入力とする関数と等価であるということである。回転と置換の両方に対して厳密に不変なQNNの新しい構造を提案し,2次元画像分類と,$SO(1,3)$ローレンツ対称性を持つ陽子-陽子衝突で生じる高エネルギー粒子崩壊の同定という問題において,その有効性を数値的に示す。 Recent developments in the field of quantum machine learning have promoted the idea of incorporating physical symmetries in the structure of quantum circuits. A crucial milestone in this area is the realization of $S_{n}$-permutation equivariant quantum neural networks (QNN) that are equivariant under permutations of input objects. In this work, we focus on encoding the rotational symmetry of point cloud datasets into the QNN. The key insight of the approach is that all rotationally invariant functions with vector inputs are equivalent to a function with inputs of vector inner products. We provide a novel structure of QNN that is exactly invariant to both rotations and permutations, with its efficacy demonstrated numerically in the problems of two-dimensional image classifications and identifying high-energy particle decays, produced by proton-proton collisions, with the $SO(1,3)$ Lorentz symmetry.	翻訳日:2024-06-12 21:53:25 公開日:2024-06-11
# MTVQA:多言語テキスト中心ビジュアル質問応答のベンチマーク MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering ( http://arxiv.org/abs/2405.11985v2 ) ライセンス: Link先を確認	Jingqun Tang, Qi Liu, Yongjie Ye, Jinghui Lu, Shu Wei, Chunhui Lin, Wanqing Li, Mohamad Fitri Faiz Bin Mahmood, Hao Feng, Zhen Zhao, Yanjie Wang, Yuliang Liu, Hao Liu, Xiang Bai, Can Huang,	(参考訳) Text-Centric Visual Question Answering (TEC-VQA)は、テキスト中心の視覚環境における人間と機械の相互作用を促進するだけでなく、テキスト中心のシーン理解の領域におけるAIモデルを評価するデファクトゴールドプロキシとしても機能する。それでも、既存のTEC-VQAベンチマークのほとんどは、英語や中国語のような高リソース言語に焦点を当てている。翻訳エンジンによる非テキスト中心のVQAデータセットにおける多言語QAペアの拡張という先駆的な取り組みにもかかわらず、翻訳ベースのプロトコルは、TEC-VQAに適用した場合、かなりの「視覚的・テキスト的誤り」問題に遭遇する。具体的には、画像に存在する視覚的テキストを無視しながら、質問対のテキストを優先する。さらに、ニュアンス付き意味、文脈歪み、言語バイアス、質問型多様性に関連する複雑さに対処できない。 MTVQAは9つの言語にまたがる高品質なヒューマンエキスパートアノテーションを特徴付けるベンチマークであり、2,116枚の画像からなる6,778対の質問応答対で構成されている。さらに, GPT-4o, GPT-4V, Claude3, Gemini など数多くの最先端マルチモーダル言語モデル (MLLM) を MTVQA データセット上で総合的に評価することにより, MTVQA の価値を裏付ける性能改善の余地がまだ大きいことが明らかとなった。さらに、MTVQAデータセット内に多言語学習データを提供し、このデータによる簡単な微調整により、多言語TEC-VQAの性能を大幅に向上させることができることを示す。我々は,MTVQAが研究コミュニティに新たな洞察を与え,多言語視覚テキスト理解のさらなる探求を促すことを願っている。プロジェクトのホームページはhttps://bytedance.github.io/MTVQA/で公開されている。 Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to evaluate AI models in the domain of text-centric scene understanding. Nonetheless, most existing TEC-VQA benchmarks have focused on high-resource languages like English and Chinese. Despite pioneering works to expand multilingual QA pairs in non-text-centric VQA datasets through translation engines, the translation-based protocol encounters a substantial "visual-textual misalignment" problem when applied to TEC-VQA. Specifically, it prioritizes the text in question-answer pairs while disregarding the visual text present in images. 
Moreover, it fails to address complexities related to nuanced meaning, contextual distortion, language bias, and question-type diversity. In this work, we tackle multilingual TEC-VQA by introducing MTVQA, the first benchmark featuring high-quality human expert annotations across 9 diverse languages, consisting of 6,778 question-answer pairs across 2,116 images. Further, by comprehensively evaluating numerous state-of-the-art Multimodal Large Language Models (MLLMs), including GPT-4o, GPT-4V, Claude3, and Gemini, on the MTVQA dataset, it is evident that there is still a large room for performance improvement, underscoring the value of MTVQA. Additionally, we supply multilingual training data within the MTVQA dataset, demonstrating that straightforward fine-tuning with this data can substantially enhance multilingual TEC-VQA performance. We aspire that MTVQA will offer the research community fresh insights and stimulate further exploration in multilingual visual text comprehension. The project homepage is available at https://bytedance.github.io/MTVQA/.	翻訳日:2024-06-12 21:53:25 公開日:2024-06-11
# MOSS:単眼ビデオからのモーションベース3D着衣人体合成 MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video ( http://arxiv.org/abs/2405.12806v2 ) ライセンス: Link先を確認	Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Zhanyun Tang, Shengyu Zhang, Feng Lin, Fei Wu,	(参考訳) 単一視点の人間の再構築は、仮想現実の応用、特に複雑な人間の動きを含む文脈において中心的な位置を占める。これは、現実的な衣服の変形を達成する上での顕著な課題である。現在の手法は、運動が表面の変形に与える影響をしばしば見落とし、その結果、表面は大域的な動きによって課される制約を欠いている。これらの制約を克服するために,動作対応のガウス分割を実現するために,運動情報を利用したMotion-Based 3D Clothed Humans Synthesis (MOSS)という革新的なフレームワークを導入する。本フレームワークは,KGAS (Kinematic Gaussian Locating Splatting) とUID (Surface Deformation Detector) の2つのモジュールから構成される。 KGASは、体表面を横切る大域的な運動を伝播するためにマトリックス・フィッシャー分布を包含する。この分布の密度と回転係数はガウスを明示的に制御し、再構成された表面の現実性を高める。さらに,KGASに基づく単一視点での局所閉塞に対処するため,UIDは重要な表面を同定し,これらの変形を補うために幾何的再構成を行う。実験により,MOSSは単眼ビデオからの3次元着衣人体合成において,最先端の視覚的品質を実現することが示された。特に,LPIPSにおいてHuman NeRFとGaussian Splattingをそれぞれ33.94%,16.75%改善した。コードはhttps://wanghongsheng01.github.io/MOSS/で公開されている。 Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcome these limitations, we introduce an innovative framework, Motion-Based 3D Clothed Humans Synthesis (MOSS), which employs kinematic information to achieve motion-aware Gaussian split on the human surface. Our framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. 
Additionally, to address local occlusions in single-view, based on KGAS, UID identifies significant surfaces, and geometric reconstruction is performed to compensate for these deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve the Human NeRF and the Gaussian Splatting by 33.94% and 16.75% in LPIPS respectively. Codes are available at https://wanghongsheng01.github.io/MOSS/.	翻訳日:2024-06-12 21:43:40 公開日:2024-06-11
# 公正データを信頼する - 公平性駆動型データ削除技術における品質の活用 Trusting Fair Data: Leveraging Quality in Fairness-Driven Data Removal Techniques ( http://arxiv.org/abs/2405.12926v2 ) ライセンス: Link先を確認	Manh Khoi Duong, Stefan Conrad,	(参考訳) 本稿では,特定のデータポイントをトレーニングセットから除去し,その集合内の個体群を公平に表現することを目的としたバイアス軽減手法について述べる。機械学習モデルは、これらの前処理データセットに基づいてトレーニングされており、その予測は公正であると期待されている。しかし、そのようなアプローチは関連するデータを除外し、到達したサブセットはさらなる使用にはあまり信頼できない。先行手法の信頼性を高めるために,(1)グループカバレッジ,(2)データ損失の最小化に加えて,サブセットが満たさなければならない追加要件と目的を提案する。グループ全体の除去は、測定された公正性を改善する可能性があるが、すべてのグループを表現できないことは公平とは考えられないため、このプラクティスは非常に問題である。第2の懸念として、差別を最小限にしながらデータの保持を提唱する。公平性とデータ損失を考慮した多目的最適化問題を導入することにより,これらの目的のバランスをとるパレート最適解を求める手法を提案する。このようなソリューションを識別することで、公正性とデータ品質のトレードオフに関する情報的な決定を下し、アプリケーションに最も適したサブセットを選択することができる。 In this paper, we deal with bias mitigation techniques that remove specific data points from the training set to aim for a fair representation of the population in that set. Machine learning models are trained on these pre-processed datasets, and their predictions are expected to be fair. However, such approaches may exclude relevant data, making the attained subsets less trustworthy for further usage. To enhance the trustworthiness of prior methods, we propose additional requirements and objectives that the subsets must fulfill in addition to fairness: (1) group coverage, and (2) minimal data loss. While removing entire groups may improve the measured fairness, this practice is very problematic as failing to represent every group cannot be considered fair. In our second concern, we advocate for the retention of data while minimizing discrimination. By introducing a multi-objective optimization problem that considers fairness and data loss, we propose a methodology to find Pareto-optimal solutions that balance these objectives. By identifying such solutions, users can make informed decisions about the trade-off between fairness and data quality and select the most suitable subset for their application.	翻訳日:2024-06-12 21:43:40 公開日:2024-06-11
# SemEval-2024 Task 3: 会話におけるマルチモーダル感情原因分析 SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations ( http://arxiv.org/abs/2405.13049v2 ) ライセンス: Link先を確認	Fanfan Wang, Heqing Ma, Jianfei Yu, Rui Xia, Erik Cambria,	(参考訳) 感情を理解する能力は人間のような人工知能の重要な要素であり、感情は人間の認知、意思決定、社会的相互作用に大きな影響を及ぼす。会話における感情認識に加えて、会話における個人の感情状態の背後にある潜在的な原因を特定するタスクは、多くのアプリケーションシナリオにおいて非常に重要である。会話におけるマルチモーダル感情原因分析(Multimodal Emotion Cause Analysis in Conversations)と名付けられたSemEval-2024タスク3を編成する。異なるモダリティ設定の下では、2つのサブタスクで構成されている: テキスト感情因果ペア抽出 (TECPE) とマルチモーダル感情因果ペア抽出 (MECPE) である。共有タスクには143件の登録があり、216件の応募が成功した。本稿では,タスク,データセット,評価設定について紹介し,トップチームのシステムを要約し,参加者の知見について議論する。 The ability to understand emotions is an essential component of human-like artificial intelligence, as emotions greatly influence human cognition, decision making, and social interactions. In addition to emotion recognition in conversations, the task of identifying the potential causes behind an individual's emotional state in conversations, is of great importance in many application scenarios. We organize SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, which aims at extracting all pairs of emotions and their corresponding causes from conversations. Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE). The shared task has attracted 143 registrations and 216 successful submissions. In this paper, we introduce the task, dataset and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.	翻訳日:2024-06-12 21:43:40 公開日:2024-06-11
# LM4LV:低レベル視覚タスクのための凍結型大規模言語モデル LM4LV: A Frozen Large Language Model for Low-level Vision Tasks ( http://arxiv.org/abs/2405.15734v2 ) ライセンス: Link先を確認	Boyang Zheng, Jinjin Gu, Shijun Li, Chao Dong,	(参考訳) 大規模言語モデル(LLMs)の成功は、コンピュータビジョンにおける様々な分野のパラダイムを変える多モード大規模言語モデル(MLLMs)の新たな研究トレンドを生み出している。 MLLMは、VQAやtext-to-imageのような多くの高レベルな視覚および視覚言語タスクにおいて有望な結果を示してきたが、低レベルな視覚タスクがMLLMの利点を如何に発揮できるかを示す研究は行われていない。その結果,ほとんどのMLLMは視覚モジュールの設計上,低レベルな特徴に欠けており,低レベルな視覚タスクを解くには本質的に不可能であることが判明した。本研究では、FROZEN LLMがマルチモーダルデータや事前知識なしで様々な低レベル視覚タスクを解決できるようにするフレームワーク$\textbf{LM4LV}$を提案する。これは低レベルのビジョンにおけるLLMの強い可能性を示し、MLLMと低レベルのビジョンタスクの間のギャップを埋める。この研究がLLMの新たな視点を刺激し、そのメカニズムをより深く理解することを願っている。コードはhttps://github.com/bytetriper/LM4LVで入手できる。 The success of large language models (LLMs) has fostered a new research trend of multi-modality large language models (MLLMs), which changes the paradigm of various fields in computer vision. Though MLLMs have shown promising results in numerous high-level vision and vision-language tasks such as VQA and text-to-image, no works have demonstrated how low-level vision tasks can benefit from MLLMs. We find that most current MLLMs are blind to low-level features due to their design of vision modules, thus are inherently incapable of solving low-level vision tasks. In this work, we propose $\textbf{LM4LV}$, a framework that enables a FROZEN LLM to solve a range of low-level vision tasks without any multi-modal data or prior. This showcases the LLM's strong potential in low-level vision and bridges the gap between MLLMs and low-level vision tasks. We hope this work can inspire new perspectives on LLMs and deeper understanding of their mechanisms. Code is available at https://github.com/bytetriper/LM4LV.	翻訳日:2024-06-12 21:43:40 公開日:2024-06-11
# 衝突回避型マルチタスク強化学習のための有限時間解析 Finite-Time Analysis for Conflict-Avoidant Multi-Task Reinforcement Learning ( http://arxiv.org/abs/2405.16077v2 ) ライセンス: Link先を確認	Yudan Wang, Peiyao Xiao, Hao Ban, Kaiyi Ji, Shaofeng Zou,	(参考訳) MTRL (Multi-task reinforcement learning) は,多くの実世界の応用において大きな期待を抱いている。既存のMTRLアルゴリズムは、個々の目的関数と与えられたタスクの優先順位(または重み)を同時に最適化するポリシーを学ぶことを目的としている。しかしながら、これらのメソッドは、大きな勾配を持つタスクが更新方向を支配し、結果として他のタスクのパフォーマンスが低下する、という\textit{gradient conflict}の問題に悩まされることが多い。本稿では,タスク重み更新におけるCAとFCという2つのサブプロシージャの選択肢に基づいて,新しい動的重み付けマルチタスク・アクター・クリティック・アルゴリズム(MTAC)を開発する。 MTAC-CAは、タスク間の最小の価値改善を最大化するコンフリクト回避(CA)更新方向を見つけることを目的とし、MTAC-FCははるかに高速な収束速度を目標とする。両アルゴリズムを包括的に有限時間収束解析する。 MTAC-CAは、$\mathcal{O}({\epsilon^{-5}})$個のサンプルで$\epsilon+\epsilon_{\text{app}}$精度のパレート定常方策を見つけ、同時に$\epsilon+\sqrt{\epsilon_{\text{app}}}$レベルの小さなCA距離(CA方向までの距離として定義)を保証することを示す。ここで$\epsilon_{\text{app}}$は関数近似誤差である。 MTAC-FCはサンプル複雑性を$\mathcal{O}(\epsilon^{-3})$に改善するが、CA距離は定数レベルとなる。 MT10における実験により,固定された優先度を用いる既存のMTRL法よりも提案アルゴリズムの性能が向上することを示す。 Multi-task reinforcement learning (MTRL) has shown great promise in many real-world applications. Existing MTRL algorithms often aim to learn a policy that optimizes individual objective functions simultaneously with a given prior preference (or weights) on different tasks. However, these methods often suffer from the issue of \textit{gradient conflict} such that the tasks with larger gradients dominate the update direction, resulting in a performance degradation on other tasks. In this paper, we develop a novel dynamic weighting multi-task actor-critic algorithm (MTAC) under two options of sub-procedures named CA and FC in task weight updates. MTAC-CA aims to find a conflict-avoidant (CA) update direction that maximizes the minimum value improvement among tasks, and MTAC-FC targets a much faster convergence rate. We provide a comprehensive finite-time convergence analysis for both algorithms. We show that MTAC-CA can find an $\epsilon+\epsilon_{\text{app}}$-accurate Pareto stationary policy using $\mathcal{O}({\epsilon^{-5}})$ samples, while ensuring a small $\epsilon+\sqrt{\epsilon_{\text{app}}}$-level CA distance (defined as the distance to the CA direction), where $\epsilon_{\text{app}}$ is the function approximation error. The analysis also shows that MTAC-FC improves the sample complexity to $\mathcal{O}(\epsilon^{-3})$, but with a constant-level CA distance. Our experiments on MT10 demonstrate the improved performance of our algorithms over existing MTRL methods with fixed preference.	翻訳日:2024-06-12 21:43:40 公開日:2024-06-11
# AIGB:拡散モデリングによる生成的自動入札 AIGB: Generative Auto-bidding via Diffusion Modeling ( http://arxiv.org/abs/2405.16141v2 ) ライセンス: Link先を確認	Jiayan Guo, Yusen Huo, Zhilin Zhang, Tianyu Wang, Chuan Yu, Jian Xu, Yan Zhang, Bo Zheng,	(参考訳) 自動入札は、広告主に自動入札を提供することによって、オンライン広告を促進する上で重要な役割を担っている。強化学習(RL)は自動入札で人気を集めている。しかし、現在のRL自動入札法のほとんどはマルコフ状態遷移を前提としたマルコフ決定過程(MDP)によってモデル化されている。この仮定は、長い地平線シナリオで実行できることを制限し、高度にランダムなオンライン広告環境を扱う際にモデルを不安定にする。本稿では,AIGB(AI-Generated Bidding)を提案する。このパラダイムでは、入札生成のための条件付き拡散モデルであるDiffBidを提案する。 DiffBidはリターンとトラジェクトリ全体の相関を直接モデル化し、長い地平線におけるタイムステップ間のエラー伝播を効果的に回避する。さらにDiffBidは、特定の制約に固執しながら、与えられた目標を最大化するトラジェクトリを生成するための汎用的なアプローチを提供する。 Alibabaの広告プラットフォーム上での実際のデータセットとオンラインA/Bテストで実施された大規模な実験は、DiffBidの有効性を示し、GMVが2.81%、ROIが3.36%増加した。 Auto-bidding plays a crucial role in facilitating online advertising by automatically providing bids for advertisers. Reinforcement learning (RL) has gained popularity for auto-bidding. However, most current RL auto-bidding methods are modeled through the Markovian Decision Process (MDP), which assumes the Markovian state transition. This assumption restricts the ability to perform in long horizon scenarios and makes the model unstable when dealing with highly random online advertising environments. To tackle this issue, this paper introduces AI-Generated Bidding (AIGB), a novel paradigm for auto-bidding through generative modeling. In this paradigm, we propose DiffBid, a conditional diffusion modeling approach for bid generation. DiffBid directly models the correlation between the return and the entire trajectory, effectively avoiding error propagation across time steps in long horizons. Additionally, DiffBid offers a versatile approach for generating trajectories that maximize given targets while adhering to specific constraints. Extensive experiments conducted on the real-world dataset and online A/B test on Alibaba advertising platform demonstrate the effectiveness of DiffBid, achieving 2.81% increase in GMV and 3.36% increase in ROI.	
翻訳日:2024-06-12 21:43:40 公開日:2024-06-11
# PatchScaler:超解像のための効率的なパッチ非依存拡散モデル PatchScaler: An Efficient Patch-Independent Diffusion Model for Super-Resolution ( http://arxiv.org/abs/2405.17158v3 ) ライセンス: Link先を確認	Yong Liu, Hang Dong, Jinshan Pan, Qingji Dong, Kai Chen, Rongxiang Zhang, Lean Fu, Fei Wang,	(参考訳) 拡散モデルは、その優れたコンテンツ生成能力により、超解像画像の品質を大幅に向上させる。しかし、膨大な計算コストがこれらの手法の応用を制限している。近年の研究ではサンプリングステップ数を減らす推論高速化が検討されているが、各ステップが画像全体に対して実行されるため、計算コストは依然として高い。本稿では、推論プロセスの効率を高めるために設計された、パッチ非依存の拡散ベース単一画像超解像(SR)手法であるPatchScalerを提案する。提案手法は、画像内のすべてのパッチが高解像度画像の再構成に同じサンプリングステップを必要とするわけではないという観察に基づいている。この観察に基づき、パッチレベルの再構成難易度に応じて特徴パッチを異なるグループに分割し、各グループに適切なサンプリング設定を動的に割り当てるパッチ適応型グループサンプリング(PGS)を開発し、推論速度をより高速化する。さらに、サンプリングの各ステップにおけるデノイジング能力を向上させるため、パッチ非依存の参照テクスチャメモリから高品質なテクスチャ事前情報を検索して拡散モデルの推定を導くテクスチャプロンプトを開発する。実験の結果、PatchScalerは高速な推論速度で、定量的および定性的評価の両方において良好な性能を達成することが示された。コードとモデルはhttps://github.com/yongliuy/PatchScalerで入手できる。 Diffusion models significantly improve the quality of super-resolved images with their impressive content generation capabilities. However, the huge computational costs limit the applications of these methods. Recent efforts have explored reasonable inference acceleration to reduce the number of sampling steps, but the computational cost remains high as each step is performed on the entire image. This paper introduces PatchScaler, a patch-independent diffusion-based single image super-resolution (SR) method, designed to enhance the efficiency of the inference process. The proposed method is motivated by the observation that not all the image patches within an image need the same sampling steps for reconstructing high-resolution images. Based on this observation, we thus develop a Patch-adaptive Group Sampling (PGS) to divide feature patches into different groups according to the patch-level reconstruction difficulty and dynamically assign an appropriate sampling configuration for each group so that the inference speed can be better accelerated. In addition, to improve the denoising ability at each step of the sampling, we develop a texture prompt to guide the estimations of the diffusion model by retrieving high-quality texture priors from a patch-independent reference texture memory. Experiments show that our PatchScaler achieves favorable performance in both quantitative and qualitative evaluations with fast inference speed. Our code and model are available at https://github.com/yongliuy/PatchScaler.	翻訳日:2024-06-12 21:43:40 公開日:2024-06-11
# クリックスルーレート予測のための統一低ランク圧縮フレームワーク Unified Low-rank Compression Framework for Click-through Rate Prediction ( http://arxiv.org/abs/2405.18146v2 ) ライセンス: Link先を確認	Hao Yu, Minghao Fu, Jiandong Ding, Yusheng Zhou, Jianxin Wu,	(参考訳) Deep Click-Through Rate (CTR)予測モデルは、現代の産業レコメンデーションシナリオにおいて重要な役割を果たす。しかし、高いメモリオーバーヘッドと計算コストは、リソース制約のある環境へのデプロイメントを制限する。低ランク近似はコンピュータビジョンや自然言語処理モデルに有効な手法であるが、CTR予測モデルの圧縮への応用はあまり検討されていない。メモリと計算資源が限られているため、CTR予測モデルの圧縮はしばしば3つの根本的な課題、すなわち(1)に直面している。エッジデバイスに適応するためのモデルサイズをどうやって削減するか? (2)。 CTR予測モデル推論の高速化 (3)。圧縮後のオリジナルのモデルの能力を維持するには? 従来の低ランク圧縮研究は主にテンソル分解を用いており、高いパラメータ圧縮比が得られるが、AUCの劣化と計算オーバーヘッドが増大する。これらの課題に対処するために,CTR予測モデルを圧縮する低ランク分解フレームワークを提案する。最も古典的な行列分解SVD法であっても、我々のフレームワークは元のモデルよりも優れた性能を実現することができる。本フレームワークの有効性をさらに向上するため,モデル重みを圧縮するのではなく,出力特性を局所的に圧縮する。我々の統合低ランク圧縮フレームワークは、様々なCTR予測モデルにおける埋め込みテーブルやMLP層に適用できる。 2つの学術データセットと1つの実産業ベンチマークによる大規模な実験により、3-5倍のモデルサイズ削減により、圧縮されたモデルは、圧縮されていないオリジナルのモデルよりも高速な推論と高いAUCを達成できることが示された。私たちのコードはhttps://github.com/yuhao318/Atomic_Feature_Mimickingにあります。 Deep Click-Through Rate (CTR) prediction models play an important role in modern industrial recommendation scenarios. However, high memory overhead and computational costs limit their deployment in resource-constrained environments. Low-rank approximation is an effective method for computer vision and natural language processing models, but its application in compressing CTR prediction models has been less explored. Due to the limited memory and computing resources, compression of CTR prediction models often confronts three fundamental challenges, i.e., (1). How to reduce the model sizes to adapt to edge devices? (2). How to speed up CTR prediction model inference? (3). How to retain the capabilities of original models after compression? Previous low-rank compression research mostly uses tensor decomposition, which can achieve a high parameter compression ratio, but brings in AUC degradation and additional computing overhead. 
To address these challenges, we propose a unified low-rank decomposition framework for compressing CTR prediction models. We find that even with the most classic matrix decomposition SVD method, our framework can achieve better performance than the original model. To further improve the effectiveness of our framework, we locally compress the output features instead of compressing the model weights. Our unified low-rank compression framework can be applied to embedding tables and MLP layers in various CTR prediction models. Extensive experiments on two academic datasets and one real industrial benchmark demonstrate that, with 3-5x model size reduction, our compressed models can achieve both faster inference and higher AUC than the uncompressed original models. Our code is at https://github.com/yuhao318/Atomic_Feature_Mimicking.	翻訳日:2024-06-12 21:43:40 公開日:2024-06-11
# シンボリック・チェーン・オブ・ソートによる忠実な論理的推論 Faithful Logical Reasoning via Symbolic Chain-of-Thought ( http://arxiv.org/abs/2405.18357v2 ) ライセンス: Link先を確認	Jundong Xu, Hao Fei, Liangming Pan, Qian Liu, Mong-Li Lee, Wynne Hsu,	(参考訳) 最近のChain-of-Thought(CoT)技術は、大規模言語モデル(LLM)の推論能力を心の理論で強化するが、それでも記号表現や厳格な演繹規則に大きく依存する論理的推論を扱うのに苦労することがある。 LLMの論理的推論能力を強化するために,記号表現と論理規則をCoTプロンプトと統合した完全LLMベースのフレームワークであるSymbCoTを提案する。技術的には、SymbCoTはLLMを基盤として、1) まず自然言語の文脈を記号形式に変換し、2) 次に記号論理規則で問題を解くためのステップバイステップの計画を導出し、3) 最後に検証器で翻訳と推論連鎖をチェックする。 First-Order LogicとConstraint Optimizationの記号表現を用いた5つの標準データセットでの徹底的な評価により、SymbCoTはCoT法に対して一貫して顕著な改善を示し、現在の最先端性能を更新した。さらに、我々のシステムがより忠実で、柔軟で、説明可能な論理的推論を実現することを実証する。我々の知る限りでは、LLMの論理的推論のために記号表現と規則をCoTに結合するのはこれが初めてである。コードはhttps://github.com/Aiden0526/SymbCoTで公開されている。 While the recent Chain-of-Thought (CoT) technique enhances the reasoning ability of large language models (LLMs) with the theory of mind, it might still struggle in handling logical reasoning that relies much on symbolic expressions and rigid deducing rules. To strengthen the logical reasoning capability of LLMs, we propose a novel Symbolic Chain-of-Thought, namely SymbCoT, a fully LLM-based framework that integrates symbolic expressions and logic rules with CoT prompting. Technically, building upon an LLM, SymbCoT 1) first translates the natural language context into the symbolic format, and then 2) derives a step-by-step plan to solve the problem with symbolic logical rules, 3) followed by a verifier to check the translation and reasoning chain. Via thorough evaluations on 5 standard datasets with both First-Order Logic and Constraint Optimization symbolic expressions, SymbCoT shows striking improvements over the CoT method consistently, meanwhile refreshing the current state-of-the-art performances. We further demonstrate that our system advances in more faithful, flexible, and explainable logical reasoning. 
To our knowledge, this is the first to combine symbolic expressions and rules into CoT for logical reasoning with LLMs. Code is open at https://github.com/Aiden0526/SymbCoT.	翻訳日:2024-06-12 21:43:40 公開日:2024-06-11
# 人気バイアスの緩和のためのアライメントとコントラスト Popularity-Aware Alignment and Contrast for Mitigating Popularity Bias ( http://arxiv.org/abs/2405.20718v2 ) ライセンス: Link先を確認	Miaomiao Cai, Lei Chen, Yifan Wang, Haoyue Bai, Peijie Sun, Le Wu, Min Zhang, Meng Wang,	(参考訳) 協調フィルタリング(CF)は一般的に、現実のデータセットにおけるアイテムの不均一な分布のため、人気バイアスの重大な問題に悩まされる。このバイアスは、人気アイテムと不人気アイテムの間にかなりの精度のギャップをもたらす。ユーザの好みの正確な理解を妨げるだけでなく、リコメンデーションシステムにおけるMatthew効果を悪化させる。人気バイアスを軽減するため、既存の取り組みは不人気アイテムの強調や、アイテム表現と人気との相関関係の分離に重点を置いている。効果にもかかわらず,既存の作品では,(1)人気項目からの共通監視信号を抽出し,不人気項目の表現を改善する方法,(2)人気バイアスによる表現分離を緩和する方法の2つの課題に直面している。本研究では,人気バイアスの実証分析を行い,2つの課題に対処するために,大衆意識アライメントとコントラスト(PAAC)を提案する。具体的には、一般的なアイテム表現でモデル化された共通スーパーバイザリー信号を使用し、不人気なアイテム表現を学習するために、新しい人気を意識した教師付きアライメントモジュールを提案する。さらに,コントラスト学習の損失を再重み付けすることで,表現の分離を人気中心の視点から緩和することを提案する。最後に,3つの実世界のデータセットに対する広範な実験を通じて,人気バイアスを緩和するPAACの有効性と理論的根拠を検証する。私たちのコードはhttps://github.com/miaomiao-cai2/KDD2024-PAACで公開されています。 Collaborative Filtering (CF) typically suffers from the significant challenge of popularity bias due to the uneven distribution of items in real-world datasets. This bias leads to a significant accuracy gap between popular and unpopular items. It not only hinders accurate user preference understanding but also exacerbates the Matthew effect in recommendation systems. To alleviate popularity bias, existing efforts focus on emphasizing unpopular items or separating the correlation between item representations and their popularity. Despite the effectiveness, existing works still face two persistent challenges: (1) how to extract common supervision signals from popular items to improve the unpopular item representations, and (2) how to alleviate the representation separation caused by popularity bias. In this work, we conduct an empirical analysis of popularity bias and propose Popularity-Aware Alignment and Contrast (PAAC) to address two challenges. 
Specifically, we use the common supervisory signals modeled in popular item representations and propose a novel popularity-aware supervised alignment module to learn unpopular item representations. Additionally, we suggest re-weighting the contrastive learning loss to mitigate the representation separation from a popularity-centric perspective. Finally, we validate the effectiveness and rationale of PAAC in mitigating popularity bias through extensive experiments on three real-world datasets. Our code is available at https://github.com/miaomiao-cai2/KDD2024-PAAC.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# 一般化可能な目標認識フェアネスによるヘイトスピーチ検出 Hate Speech Detection with Generalizable Target-aware Fairness ( http://arxiv.org/abs/2406.00046v2 ) ライセンス: Link先を確認	Tong Chen, Danny Wang, Xurong Liang, Marten Risius, Gianluca Demartini, Hongzhi Yin,	(参考訳) ソーシャルメディアプラットフォームの普及による副作用に対抗するため、ヘイトスピーチ検出(HSD)は、早期に有害なオンライン投稿の拡散を阻止する重要な役割を担っている。しかし、ソーシャルメディア上で広く普及している話題コミュニティを考えると、訓練されたHSD分類器は特定の対象グループ(例えば、女性や黒人)に偏りやすくなり、偽陽性/陰性の結果が、コンテンツモデレーション機構の公正性に対する公衆の信頼を著しく損なうことになり、最終的にはオンライン社会の多様性を損なうことになる。既存のフェアネスを意識したHSD法は、対象とするグループ間でのいくつかの相違を緩和することができるが、それらは主に、既知の、固定されたと思われるターゲットの狭い選択に特化している。これにより、新たなターゲットグループが常に時間とともに出現する現実世界のユースケースへの一般化が必然的に防止される。この欠陥に対処するために、我々は、推論中に多様で見えざるターゲットを含む各ポストを適切に分類する新しい方法であるGeneralizable target-aware Fairness (GetFair)を提案する。ターゲット関連の機能に対するHSD分類器の急激な依存を取り除くため、GetFairは、フィルタされたポスト埋め込みからターゲットグループを回復する識別器を欺くために、対向パイプラインで一連のフィルタ関数を訓練する。拡張性と一般化性を維持するため、ターゲット間のセマンティック親和性によって正規化されるハイパーネットワークを用いて、全てのフィルタ関数を革新的にパラメータ化する。ターゲットの事前訓練された単語を入力として埋め込み、ハイパーネットワークは専用のフィルタパラメータを格納することなく、各ターゲット固有のフィルタがオンザフライで使用する重みを生成する。最後に、2つのHSDデータセットの比較実験では、サンプル外のターゲットでGetFairのパフォーマンスが有利であることが示されている。 To counter the side effect brought by the proliferation of social media platforms, hate speech detection (HSD) plays a vital role in halting the dissemination of toxic online posts at an early stage. However, given the ubiquitous topical communities on social media, a trained HSD classifier easily becomes biased towards specific targeted groups (e.g., female and black people), where a high rate of false positive/negative results can significantly impair public trust in the fairness of content moderation mechanisms, and eventually harm the diversity of online society. Although existing fairness-aware HSD methods can smooth out some discrepancies across targeted groups, they are mostly specific to a narrow selection of targets that are assumed to be known and fixed. This inevitably prevents those methods from generalizing to real-world use cases where new targeted groups constantly emerge over time. 
To tackle this defect, we propose Generalizable target-aware Fairness (GetFair), a new method for fairly classifying each post that contains diverse and even unseen targets during inference. To remove the HSD classifier's spurious dependence on target-related features, GetFair trains a series of filter functions in an adversarial pipeline, so as to deceive the discriminator that recovers the targeted group from filtered post embeddings. To maintain scalability and generalizability, we innovatively parameterize all filter functions via a hypernetwork that is regularized by the semantic affinity among targets. Taking a target's pretrained word embedding as input, the hypernetwork generates the weights used by each target-specific filter on-the-fly without storing dedicated filter parameters. Finally, comparative experiments on two HSD datasets have shown advantageous performance of GetFair on out-of-sample targets.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# LanEvil: 環境錯覚に対するレーン検出のロバスト性のベンチマーク LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions ( http://arxiv.org/abs/2406.00934v3 ) ライセンス: Link先を確認	Tianyuan Zhang, Lu Wang, Hainan Li, Yisong Xiao, Siyuan Liang, Aishan Liu, Xianglong Liu, Dacheng Tao,	(参考訳) レーン検出(LD)は自律走行システムにおいて不可欠な要素であり、適応型クルーズ制御や自動車線中央維持などの基本的な機能を提供している。既存のLDベンチマークは主に一般的なケースの評価に焦点を当てており、道路上の影やタイヤ痕のような環境錯覚に対するLDモデルのロバスト性を見落としている。これらの錯覚は現実世界の交通状況に自然に存在するため、この研究のギャップは重大な安全上の課題を生じさせる。本稿では,これらの環境錯覚がLDにもたらす潜在的脅威を初めて研究し,この自然な劣化に対するLDのロバスト性を評価するための初の包括的ベンチマークLanEvilを確立する。 LDタスクにおける実世界の影響要因を幅広くカバーする,14種類の一般的かつ重大な環境錯覚(例えば,影,反射)を体系的に設計する。実世界の環境をベースとして、広く使われているCARLAシミュレータを用いて、94の現実的でカスタマイズ可能な3Dケースを作成し、90,292枚のサンプル画像からなるデータセットを構築する。大規模な実験を通じて、LanEvilを用いて一般的なLD手法のロバスト性をベンチマークし、大幅な性能劣化(平均で精度-5.37%、F1スコア-10.70%)を明らかにした。なかでも影の効果が最もリスクが高い(精度-7.39%)。さらに、協調シミュレーションにより商用自動運転システムOpenPilotとApolloの性能を評価し、提案した環境錯覚が誤った判断や交通事故につながりうることを実証する。環境錯覚に対する防御として,困難な事例(hard examples)を用いたAttention Area Mixing(AAM)手法を提案し、照明効果のもとで顕著なロバスト性向上(+3.76%)を確認した。本論文が今後、より堅牢な自動運転システムの発展に貢献できることを願っている。ウェブサイト: https://lanevil.github.io/ Lane detection (LD) is an essential component of autonomous driving systems, providing fundamental functionalities like adaptive cruise control and automated lane centering. Existing LD benchmarks primarily focus on evaluating common cases, neglecting the robustness of LD models against environmental illusions such as shadows and tire marks on the road. This research gap poses significant safety challenges since these illusions exist naturally in real-world traffic situations. For the first time, this paper studies the potential threats caused by these environmental illusions to LD and establishes the first comprehensive benchmark LanEvil for evaluating the robustness of LD against this natural corruption. We systematically design 14 prevalent yet critical types of environmental illusions (e.g., shadow, reflection) that cover a wide spectrum of real-world influencing factors in LD tasks. 
Based on real-world environments, we create 94 realistic and customizable 3D cases using the widely used CARLA simulator, resulting in a dataset comprising 90,292 sampled images. Through extensive experiments, we benchmark the robustness of popular LD methods using LanEvil, revealing substantial performance degradation (-5.37% Accuracy and -10.70% F1-Score on average), with shadow effects posing the greatest risk (-7.39% Accuracy). Additionally, we assess the performance of commercial auto-driving systems OpenPilot and Apollo through collaborative simulations, demonstrating that proposed environmental illusions can lead to incorrect decisions and potential traffic accidents. To defend against environmental illusions, we propose the Attention Area Mixing (AAM) approach using hard examples, which witness significant robustness improvement (+3.76%) under illumination effects. We hope our paper can contribute to advancing more robust auto-driving systems in the future. Website: https://lanevil.github.io/.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# SceneTextGen:拡散モデルを用いたレイアウト非依存のシーンテキスト画像合成 SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models ( http://arxiv.org/abs/2406.01062v2 ) ライセンス: Link先を確認	Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan,	(参考訳) 拡散モデルは画像生成の質を大幅に向上させてきたが、これらの画像内のテキストを正確かつコヒーレントにレンダリングする能力は依然として大きな課題である。従来の拡散に基づくシーンテキスト生成法は、中間レイアウト出力に依存して制限されるのが一般的である。この依存はしばしば、レイアウト生成フェーズの決定論的性質から生じる固有の制限である、テキストスタイルとフォントの制限された多様性をもたらす。これらの課題に対処するために,本稿では,事前定義されたレイアウトステージの必要性を回避するために設計された,新しい拡散ベースモデルであるSceneTextGenを紹介する。そうすることで、SceneTextGenはテキストのより自然で多様な表現を促進する。 SceneTextGenの斬新さは、3つの重要なコンポーネントの統合にある: 詳細なタイポグラフィ特性をキャプチャする文字レベルエンコーダと、文字レベルのインスタンスセグメンテーションモデルと、不要なテキスト生成とマイナーな文字不正確な問題に対処するワードレベルスポッティングモデルである。本手法の有効性は,標準拡散法とテキスト固有法を比較検討し,異なる公開視覚テキストデータセット間で生成した画像に対する文字認識率の向上を示すことで検証した。 While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge. Conventional diffusion-based methods for scene text generation are typically limited by their reliance on an intermediate layout output. This dependency often results in a constrained diversity of text styles and fonts, an inherent limitation stemming from the deterministic nature of the layout generation phase. To address these challenges, this paper introduces SceneTextGen, a novel diffusion-based model specifically designed to circumvent the need for a predefined layout stage. By doing so, SceneTextGen facilitates a more natural and varied representation of text. The novelty of SceneTextGen lies in its integration of three key components: a character-level encoder for capturing detailed typographic properties, coupled with a character-level instance segmentation model and a word-level spotting model to address the issues of unwanted text generation and minor character inaccuracies. 
We validate the performance of our method by demonstrating improved character recognition rates on generated images across different public visual text datasets in comparison to both standard diffusion based methods and text specific methods.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# AutoStudio:マルチターンインタラクティブ画像生成における一貫性のある被写体の作成 AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation ( http://arxiv.org/abs/2406.01388v2 ) ライセンス: Link先を確認	Junhao Cheng, Xi Lu, Hanhui Li, Khun Loun Zai, Baiqiao Yin, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang,	(参考訳) 最先端のテキスト・ツー・イメージ(T2I)生成モデルは、すでに優れた単一画像の生成に長けており、さらに難しい課題であるマルチターン・インタラクティブ画像生成が、関連研究コミュニティの注目を集め始めている。このタスクでは、モデルが複数のターンにわたってユーザーと対話し、一貫性のある画像列を生成する必要がある。しかし、ユーザが頻繁に被写体を切り替える可能性があるため、現在の手法は多様な画像を生成しながら被写体の一貫性を維持するのに苦労している。この問題に対処するために、AutoStudioと呼ばれるトレーニング不要のマルチエージェントフレームワークを導入する。 AutoStudioは、対話を処理するために大規模言語モデル(LLM)に基づく3つのエージェントと、高品質な画像を生成するためのStable Diffusion(SD)ベースのエージェントを使用する。具体的には、AutoStudioは (i) 対話を解釈し各被写体の文脈を管理する被写体マネージャ、 (ii) 被写体の位置を制御するためのきめ細かいバウンディングボックスを生成するレイアウト生成器、 (iii) レイアウト改良の提案を行うスーパーバイザ、 (iv) 画像生成を完了させるドロワーから構成される。さらに,ドロワー内の元のUNetを置き換えるParallel-UNetを導入する。これは被写体を考慮した特徴を活用するために2つの並列クロスアテンションモジュールを用いる。また,小さな被写体をよりよく保持するための被写体初期化生成手法も導入した。これによりAutoStudioは,複数被写体の画像列を対話的かつ一貫して生成できる。公開されているCMIGBenchベンチマークと人間による評価での大規模な実験により、AutoStudioは複数のターンにわたって複数被写体の一貫性をよく維持し、最先端性能をFrechet Inception Distanceの平均で13.65%、キャラクター間類似度の平均で2.83%向上させることが示された。 As cutting-edge Text-to-Image (T2I) generation models already excel at producing remarkable single images, an even more challenging task, i.e., multi-turn interactive image generation begins to attract the attention of related research communities. This task requires models to interact with users over multiple turns to generate a coherent sequence of images. However, since users may switch subjects frequently, current efforts struggle to maintain subject consistency while generating diverse images. To address this issue, we introduce a training-free multi-agent framework called AutoStudio. AutoStudio employs three agents based on large language models (LLMs) to handle interactions, along with a stable diffusion (SD) based agent for generating high-quality images. 
Specifically, AutoStudio consists of (i) a subject manager to interpret interaction dialogues and manage the context of each subject, (ii) a layout generator to generate fine-grained bounding boxes to control subject locations, (iii) a supervisor to provide suggestions for layout refinements, and (iv) a drawer to complete image generation. Furthermore, we introduce a Parallel-UNet to replace the original UNet in the drawer, which employs two parallel cross-attention modules for exploiting subject-aware features. We also introduce a subject-initialized generation method to better preserve small subjects. Our AutoStudio hereby can generate a sequence of multi-subject images interactively and consistently. Extensive experiments on the public CMIGBench benchmark and human evaluations show that AutoStudio maintains multi-subject consistency across multiple turns well, and it also raises the state-of-the-art performance by 13.65% in average Frechet Inception Distance and 2.83% in average character-character similarity.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# WEIRD ICWSM: ソーシャルコンピューティング研究はどの程度、西洋的で、教育水準が高く、工業化され、豊かで、民主的か? WEIRD ICWSM: How Western, Educated, Industrialized, Rich, and Democratic is Social Computing Research? ( http://arxiv.org/abs/2406.02090v2 ) ライセンス: Link先を確認	Ali Akbar Septiandri, Marios Constantinides, Daniele Quercia,	(参考訳) ソーシャルコンピューティングの研究の多くは、本質的にバイアスを伴いうるソーシャルメディアプラットフォームのデータを分析している。このようなバイアスの見過ごされた原因の一つは、WEIRD (Western, Educated, Industrialized, Rich, Democratic) な集団の過剰な代表であり、これは世界の人口構成の多様性を正確に反映していない可能性がある。我々は、論文集全体がソーシャルコンピューティング研究に充てられている唯一の会議であるAAAI ICWSMで発表された研究について、WEIRD集団への依存度を評価した。そのために、2018年から2022年にかけて発表された494本の論文(フルリサーチペーパー、データセットペーパー、ポスターを含む)を分析した。合成データセットを解析する論文や、対象国が明確でない論文を除外した後に残った420本の論文から、188人の参加者によるクラウドソーシング調査(完全な手動検証付き)でWEIRDスコア計算のためのデータを抽出した。このデータを用いて、既存のWEIRD指標をソーシャルメディアデータに適用できるように調整した。その結果、これらの論文の37%が西洋諸国のデータのみに焦点を当てていることがわかった。この割合は、CHI (76%) やFAccT (84%) の会議で観測された割合よりも著しく低く、ICWSMではデータセットの出所がより多様であることを示唆している。しかし、ICWSMの研究は依然として、FAccTと比較して、より教育水準が高く、工業化され、豊かな国の集団を主に調べており、政治的自由と権利を反映する「民主的」変数については特筆すべき点がある。このことは、政治的自由が制限された国に関する知見を明らかにするうえでのソーシャルメディアデータの有用性を示している。これらの知見に基づき、現在の「論文チェックリスト」を拡張してWEIRDバイアスに関する考慮事項を含めることを推奨するとともに、過小代表の地域からの多様なデータセットの使用を奨励することで研究の包摂性を広げるようコミュニティに呼びかける。 Much of the research in social computing analyzes data from social media platforms, which may inherently carry biases. An overlooked source of such bias is the over-representation of WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations, which might not accurately mirror the global demographic diversity. We evaluated the dependence on WEIRD populations in research presented at the AAAI ICWSM conference; the only venue whose proceedings are fully dedicated to social computing research. We did so by analyzing 494 papers published from 2018 to 2022, which included full research papers, dataset papers and posters. After filtering out papers that analyze synthetic datasets or those lacking clear country of origin, we were left with 420 papers from which 188 participants in a crowdsourcing study with full manual validation extracted data for the WEIRD scores computation. 
This data was then used to adapt existing WEIRD metrics to be applicable for social media data. We found that 37% of these papers focused solely on data from Western countries. This percentage is significantly less than the percentages observed in research from CHI (76%) and FAccT (84%) conferences, suggesting a greater diversity of dataset origins within ICWSM. However, the studies at ICWSM still predominantly examine populations from countries that are more Educated, Industrialized, and Rich in comparison to those in FAccT, with a special note on the 'Democratic' variable reflecting political freedoms and rights. This points out the utility of social media data in shedding light on findings from countries with restricted political freedoms. Based on these insights, we recommend extensions of current "paper checklists" to include considerations about the WEIRD bias and call for the community to broaden research inclusivity by encouraging the use of diverse datasets from underrepresented regions.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# CondTSF: 時系列予測のためのデータセット凝縮の一行プラグイン CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting ( http://arxiv.org/abs/2406.02131v3 ) ライセンス: Link先を確認	Jianrong Ding, Zhanyu Liu, Guanjie Zheng, Haiming Jin, Linghe Kong,	(参考訳) Dataset Condensationは、ディープラーニングのトレーニングに使用できる小さなデータセットを生成して、トレーニングコストを削減できる、生まれたばかりのテクニックだ。データセット凝縮の目的は、合成データセットでトレーニングされたモデルが、完全なデータセットでトレーニングされたモデルと互換性を持って動作できることを保証することである。しかし、既存の手法は主に分類タスクに集中しており、時系列予測(TS予測)への適応に挑戦している。この課題は、合成データの評価における相違から生じる。分類において、合成データは、全データセットで訓練されたモデルと、合成データセットで訓練されたモデルが、出力ロジット分布のばらつきにかかわらず、同一のラベルを同じ入力のために生成した場合、よく蒸留されると考えられる。逆に, TS予測において, 合成データ蒸留の有効性は, モデル間の距離によって決定される。合成データは、予測内のすべてのデータポイントが類似している場合にのみよく蒸留される。その結果,TS予測は分類よりも厳密な評価手法が得られた。このギャップを緩和するため,TS予測のためのデータセット凝縮の最適化目標を理論的に分析し,時系列予測のためのデータセット凝縮(CondTSF)として指定されたデータセット凝縮の1行プラグインを提案する。 CondTSFを以前のデータセット凝縮法にプラグインすることで、完全なデータセットでトレーニングされたモデルの予測と合成データセットでトレーニングされたモデルとの距離の短縮が容易になり、パフォーマンスが向上する。一般的に用いられている8つの時系列データセットについて広範な実験を行う。 CondTSFは、すべてのデータセット、特に低凝縮率において、以前のデータセット凝縮メソッドのパフォーマンスを一貫して改善する。 Dataset condensation is a newborn technique that generates a small dataset that can be used in training deep neural networks to lower training costs. The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing challenges in their adaptation to time series forecasting (TS-forecasting). This challenge arises from disparities in the evaluation of synthetic data. In classification, the synthetic data is considered well-distilled if the model trained with the full dataset and the model trained with the synthetic dataset yield identical labels for the same input, regardless of variations in output logits distribution. Conversely, in TS-forecasting, the effectiveness of synthetic data distillation is determined by the distance between predictions of the two models. 
The synthetic data is deemed well-distilled only when all data points within the predictions are similar. Consequently, TS-forecasting has a more rigorous evaluation methodology compared to classification. To mitigate this gap, we theoretically analyze the optimization objective of dataset condensation for TS-forecasting and propose a new one-line plugin of dataset condensation designated as Dataset Condensation for Time Series Forecasting (CondTSF) based on our analysis. Plugging CondTSF into previous dataset condensation methods facilitates a reduction in the distance between the predictions of the model trained with the full dataset and the model trained with the synthetic dataset, thereby enhancing performance. We conduct extensive experiments on eight commonly used time series datasets. CondTSF consistently improves the performance of all previous dataset condensation methods across all datasets, particularly at low condensing ratios.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# 時間的に集約されたI.I.D.データからの因果関係の復元可能性について On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data ( http://arxiv.org/abs/2406.02191v2 ) ライセンス: Link先を確認	Shunxing Fan, Mingming Gong, Kun Zhang,	(参考訳) 本研究では,一般的な設定において,時間的集約が瞬時的(非時間的)因果発見に及ぼす影響を考察する。これは、真の因果的な時間ラグが観測間隔よりもかなり短いことが多いという観察に動機づけられている。この不一致は高度な集約をもたらし、時間遅れの因果関係が消失して瞬時的な依存関係が現れる原因となる。発見結果を意味のあるものにするには、このような瞬時的依存が何らかの意味で真の因果関係と整合していることが期待されるが、どのような整合性が必要で、いつその整合性が満たされるのかは不明である。我々は、機能的因果モデルに基づく手法と条件付き独立性に基づく手法にそれぞれ対応する、機能的整合性と条件付き独立性の整合性を形式的に提案し、これらの整合性が成り立つ条件を与える。本研究では,特に完全に非線形な場合において,因果発見の結果が集約によって著しく歪む可能性があること,また,部分的な線形性や適切な事前知識がある場合には,集約データから因果関係を依然として復元できることを理論的,実験的に示す。これらの結果は、コミュニティがこのようなデータから得られた因果発見結果を解釈する際に慎重かつ綿密なアプローチをとるべきであることを示唆し、集約がなぜ、いつ因果発見法の性能を歪めるのかを示す。 We consider the effect of temporal aggregation on instantaneous (non-temporal) causal discovery in general setting. This is motivated by the observation that the true causal time lag is often considerably shorter than the observational interval. This discrepancy leads to high aggregation, causing time-delay causality to vanish and instantaneous dependence to manifest. Although we expect such instantaneous dependence has consistency with the true causal relation in certain sense to make the discovery results meaningful, it remains unclear what type of consistency we need and when will such consistency be satisfied. We proposed functional consistency and conditional independence consistency in formal way correspond functional causal model-based methods and conditional independence-based methods respectively and provide the conditions under which these consistencies will hold. We show theoretically and experimentally that causal discovery results may be seriously distorted by aggregation especially in complete nonlinear case and we also find causal relationship still recoverable from aggregated data if we have partial linearity or appropriate prior. 
Our findings suggest community should take a cautious and meticulous approach when interpreting causal discovery results from such data and show why and when aggregation will distort the performance of causal discovery methods.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# 量子コンピュータにおける正準第二量子化の動的実装 A dynamical implementation of canonical second quantization on a quantum computer ( http://arxiv.org/abs/2406.03147v2 ) ライセンス: Link先を確認	Juan José Gálvez-Viruet, Felipe J. Llanes-Estrada,	(参考訳) 量子コンピュータの個別のレジスタに生成・消滅演算子を実装するための理論的手法を開発し、粒子数が変化する問題において、第二量子化における粒子モードの透過的かつ動的な生成と消滅を可能にする。有限メモリバンク上の交換(反交換)関係に関する定理を確立し、必要となる対称化および反対称化演算子を与える。最後に、従来の2体および4体のハミルトニアン項、ならびに粒子数を変化させる項のもとでのユニタリ発展を、これらの演算子を用いて表す式を与える。この形式では、それぞれ$N_p$個のモードを持つ$n$個の粒子を符号化するのに必要な量子ビット数は$n\log_2 N_p$のオーダーである。このスケーリングは、それぞれが多数の状態を取りうる少数の粒子が存在する場合には、$O(N_p)$量子ビットを必要とするジョーダン・ウィグナー変換よりも効率的である(各粒子が取りうる状態が少なく粒子数が多い場合には有利でなくなる)。また、コンパクトエンコーディングよりは効率が低いものの、扱いはより簡単である。 We develop theoretical methods for the implementation of creation and destruction operators in separate registers of a quantum computer, allowing for a transparent and dynamical creation and destruction of particle modes in second quantization in problems with variable particle number. We establish theorems for the commutation (anticommutation) relations on a finite memory bank and provide the needed symmetrizing and antisymmetrizing operators. Finally, we provide formulae in terms of these operators for unitary evolution under conventional two- and four-body Hamiltonian terms, as well as terms varying the particle number. In this formalism, the number of qubits needed to codify $n$ particles with $N_p$ modes each is of order $n\log_2 N_p$. Such scaling is more efficient than the Jordan-Wigner transformation which requires $O(N_p)$ qubits, whenever there are a modest number of particles with a large number of states available to each (and less advantageous for a large number of particles with few states available to each). And although less efficient, it is also less cumbersome than compact encoding.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# グローバルクリッパー:トランスフォーマーを用いた物体検出モデルの安全性と信頼性を高める Global Clipper: Enhancing Safety and Reliability of Transformer-based Object Detection Models ( http://arxiv.org/abs/2406.03229v2 ) ライセンス: Link先を確認	Qutub Syed Sha, Michael Paulitsch, Karthik Pattabiraman, Korbinian Hagn, Fabian Oboril, Cornelius Buerkle, Kay-Ulrich Scholl, Gereon Hinz, Alois Knoll,	(参考訳) トランスフォーマーをベースとした物体検出モデルが進むにつれ、自動運転車や航空といった重要な分野への影響が拡大すると予想されている。推論中にビットフリップを引き起こすソフトエラーは、DNNの性能に大きく影響し、予測を変化させてきた。 CNN向けの従来の範囲制限手法は、トランスフォーマーには不十分である。本研究は,トランスフォーマーベースのモデルに特化して設計された効果的な緩和戦略であるGlobal ClipperとGlobal Hybrid Clipperを紹介する。これらはソフトエラーに対するレジリエンスを大幅に向上させ、誤った推論を ~0% に削減する。また、モデルのロバスト性を包括的に評価するために、3つのデータセットを用いて2つのトランスフォーマーモデル(DINO-DETRとLite-DETR)と2つのCNNモデル(YOLOv3とSSD)を対象に、64以上のシナリオ、合計約330万回の推論にわたる広範なテストについて詳述する。さらに、トランスフォーマーにおけるアテンションブロックの固有の側面と、CNNとの動作上の違いについて検討する。 As transformer-based object detection models progress, their impact in critical sectors like autonomous vehicles and aviation is expected to grow. Soft errors causing bit flips during inference have significantly impacted DNN performance, altering predictions. Traditional range restriction solutions for CNNs fall short for transformers. This study introduces the Global Clipper and Global Hybrid Clipper, effective mitigation strategies specifically designed for transformer-based models. They significantly enhance the models' resilience to soft errors and reduce faulty inferences to ~0%. We also detail extensive testing across over 64 scenarios involving two transformer models (DINO-DETR and Lite-DETR) and two CNN models (YOLOv3 and SSD) using three datasets, totalling approximately 3.3 million inferences, to assess model robustness comprehensively. Moreover, the paper explores unique aspects of attention blocks in transformers and their operational differences from CNNs.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# 多様なモデリング単位を用いたCTCに基づく音声認識の強化 Enhancing CTC-based speech recognition with diverse modeling units ( http://arxiv.org/abs/2406.03274v2 ) ライセンス: Link先を確認	Shiyi Han, Zhihong Lei, Mingbin Xu, Xingyu Na, Zhen Huang,	(参考訳) 近年,トランスフォーマーなどのディープラーニングアーキテクチャの進歩により,エンド・ツー・エンド(E2E)自動音声認識(ASR)モデルの進化が目覚ましい。 E2Eシステムに加えて、研究者はE2EモデルのN-best仮説を音素ベースのモデルでリスコアリングすることで、かなりの精度向上を実現してきた。このことは、システムの組み合わせ効果以外に、改善がどこから来るのかという興味深い疑問を提起する。本稿では,これらの改善をもたらす根本的なメカニズムを検証し,E2Eモデルを多様なモデリング単位と共同で訓練する効率的な共同訓練手法を提案する。この手法は音素ベースのモデルと書記素ベースのモデルの両方の長所を整合させるだけでなく、これらの多様なモデリング単位を相乗的に使用することでモデルの精度を大幅に向上できることを明らかにする。我々の研究は、より堅牢で正確なASRシステムの開発において、異種モデリング単位の最適な統合に関する新たな知見を提供する。 In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures like transformer. On top of E2E systems, researchers have achieved substantial accuracy improvement by rescoring E2E model's N-best hypotheses with a phoneme-based model. This raises an interesting question about where the improvements come from other than the system combination effect. We examine the underlying mechanisms driving these gains and propose an efficient joint training approach, where E2E models are trained jointly with diverse modeling units. This methodology does not only align the strengths of both phoneme and grapheme-based models but also reveals that using these diverse modeling units in a synergistic way can significantly enhance model accuracy. Our findings offer new insights into the optimal integration of heterogeneous modeling units in the development of more robust and accurate ASR systems.	翻訳日:2024-06-12 21:33:54 公開日:2024-06-11
# 共時的定義と意味関係を用いた意味変化タイプの分類 Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types ( http://arxiv.org/abs/2406.03452v3 ) ライセンス: Link先を確認	Pierluigi Cassotti, Stefano De Pascale, Nina Tahmasebi,	(参考訳) 語が意味を変える仕方は、古い意味と新しい意味の関係(一般化、特殊化、共下位語への転移など)に着目して、異なるタイプの変化に分類できるという豊富な証拠がある。本稿では,共時的な語彙関係と語義の定義の両方から得られる情報を利用するモデルを構築し,このような変化のタイプを検出する手法を提案する。具体的には,WordNet のsynset定義と階層情報を用い,Blank (1997) の意味変化タイプのデータセットをデジタル化したものでテストする。最後に,語義間の関係が,意味的関連性に関する人間の判断の近似と,二値の語彙意味変化検出の両方のモデルをどのように改善できるかを示す。 There is abundant evidence of the fact that the way words change their meaning can be classified in different types of change, highlighting the relationship between the old and new meanings (among which generalization, specialization and co-hyponymy transfer). In this paper, we present a way of detecting these types of change by constructing a model that leverages information both from synchronic lexical relations and definitions of word meanings. Specifically, we use synset definitions and hierarchy information from WordNet and test it on a digitized version of Blank's (1997) dataset of semantic change types. Finally, we show how the sense relationships can improve models for both approximation of human judgments of semantic relatedness as well as binary Lexical Semantic Change Detection.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-11
# データスケールがコンピュータ制御エージェントに及ぼす影響について On the Effects of Data Scale on Computer Control Agents ( http://arxiv.org/abs/2406.03679v2 ) ライセンス: Link先を確認	Wei Li, William Bishop, Alice Li, Chris Rawles, Folawiyo Campbell-Ajala, Divya Tyamagundlu, Oriana Riva,	(参考訳) 人間のタスクを達成するためにコンピュータインターフェースを制御する自律エージェントが登場している。 LLMをこのようなエージェントに利用することは特に関心を集めているが、人間が収集したタスクのデモで微調整しない限り、性能は依然として比較的低い。本研究では,ファインチューニング単独が現実のコンピュータ制御エージェント構築に有効なアプローチであるかどうかを考察する。特に、ドメイン内およびドメイン外のハイレベルタスクとローレベルタスクの両方で測定された性能が、収集するトレーニングデータの増加に伴ってどのようにスケールするかを検討する。この目的のために、Androidアプリでの日常的なタスクのデモ15,283件からなる新しいデータセット、AndroidControlを収集し、公開する。既存のデータセットと比較して、各AndroidControlタスクインスタンスには、ハイレベルとローレベルの両方の人間が作成した指示が含まれており、エージェントが扱えるタスクの複雑さのレベルを調べることができる。さらに、AndroidControlは833のAndroidアプリにわたる15,283のユニークなタスクを含む、これまでで最も多様なコンピュータ制御データセットである。データセットを用いた結果、ドメイン内でテストした場合、微調整したモデルはゼロショットおよび数ショットのベースラインを上回り、単により多くのデータを収集するだけで堅牢な性能が得られる可能性があるようにスケールすることがわかった。ドメイン外では、性能のスケーリングは大幅に遅く、特にハイレベルなタスクでは、より多くのデータで微調整するだけでは、ドメイン外での堅牢な性能を達成するには不十分である可能性が示唆される。 Autonomous agents that control computer interfaces to accomplish human tasks are emerging. Leveraging LLMs to power such agents has been of special interest, but unless fine-tuned on human-collected task demonstrations, performance is still relatively low. In this work we study whether fine-tuning alone is a viable approach for building real-world computer control agents. In particular, we investigate how performance measured on both high and low-level tasks in domain and out of domain scales as more training data is collected. To this end we collect and release a new dataset, AndroidControl, consisting of 15,283 demonstrations of everyday tasks with Android apps. Compared to existing datasets, each AndroidControl task instance includes both high and low-level human-generated instructions, allowing us to explore the level of task complexity an agent can handle. 
Moreover, AndroidControl is the most diverse computer control dataset to date, including 15,283 unique tasks over 833 Android apps, thus allowing us to conduct in-depth analysis of the model performance in and out of the domain of the training data. Using the dataset, we find that, when tested in domain, fine-tuned models outperform zero- and few-shot baselines and scale in such a way that robust performance might feasibly be obtained simply by collecting more data. Out of domain, performance scales significantly more slowly and suggests that, in particular for high-level tasks, fine-tuning on more data alone may be insufficient for achieving robust out-of-domain performance.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-11
# Bench2Drive: 閉ループエンドツーエンド自動運転の多機能ベンチマークを目指して Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving ( http://arxiv.org/abs/2406.03877v2 ) ライセンス: Link先を確認	Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, Junchi Yan,	(参考訳) ファンデーションモデルの急速なスケーリングに特徴付けられる時代において、自動運転技術は、データ駆動方式でスケールアップできる可能性から、エンドツーエンドの自動運転(E2E-AD)が出現する変革的なしきい値に近づいている。しかし、既存のE2E-AD手法の多くは、L2誤差と衝突率を指標として、オープンループのログ再生方式で評価されており(例えば、nuScenes)、最近コミュニティで認められたように、アルゴリズムの運転性能を完全には反映できない。閉ループ方式で評価されるE2E-AD手法は、運転スコアを指標として固定経路(例えば、CARLAのTown05LongやLongest6)で試験されるが、この指標は平滑化されていない評価関数と長い経路における大きなランダム性のために分散が大きいことが知られている。さらに、これらの手法は通常、トレーニングのために独自のデータを収集するので、アルゴリズムレベルの公正な比較は不可能である。完全自動運転(FSD)のための包括的で現実的で公正なテスト環境の必要性を満たすため、E2E-ADシステムの複数の能力をクローズドループで評価するための最初のベンチマークであるBench2Driveを提示する。 Bench2Driveの公式トレーニングデータは200万の完全な注釈付きフレームで構成され、CARLA v2の44のインタラクティブシナリオ(カットイン、追い越し、迂回など)、23の天候(晴れ、霧、雨など)、12の町(都市、村、大学など)に均一に分布する10000のショートクリップから収集されている。評価プロトコルでは、E2E-ADモデルは、異なる場所と天候の下で44のインタラクティブシナリオ、合計220のルートをパスする必要があり、これにより異なる状況下での運転能力について、包括的かつ能力ごとに切り分けられた評価が得られる。我々は最先端のE2E-ADモデルを実装し、Bench2Driveで評価し、現状と今後の方向性について洞察を提供する。 In an era marked by the rapid scaling of foundation models, autonomous driving technologies are approaching a transformative threshold where end-to-end autonomous driving (E2E-AD) emerges due to its potential of scaling up in the data-driven manner. However, existing E2E-AD methods are mostly evaluated under the open-loop log-replay manner with L2 errors and collision rate as metrics (e.g., in nuScenes), which could not fully reflect the driving performance of algorithms as recently acknowledged in the community. For those E2E-AD methods evaluated under the closed-loop protocol, they are tested in fixed routes (e.g., Town05Long and Longest6 in CARLA) with the driving score as metrics, which is known for high variance due to the unsmoothed metric function and large randomness in the long route. Besides, these methods usually collect their own data for training, which makes algorithm-level fair comparison infeasible. 
To fulfill the paramount need of comprehensive, realistic, and fair testing environments for Full Self-Driving (FSD), we present Bench2Drive, the first benchmark for evaluating E2E-AD systems' multiple abilities in a closed-loop manner. Bench2Drive's official training data consists of 2 million fully annotated frames, collected from 10000 short clips uniformly distributed under 44 interactive scenarios (cut-in, overtaking, detour, etc), 23 weathers (sunny, foggy, rainy, etc), and 12 towns (urban, village, university, etc) in CARLA v2. Its evaluation protocol requires E2E-AD models to pass 44 interactive scenarios under different locations and weathers which sums up to 220 routes and thus provides a comprehensive and disentangled assessment about their driving capability under different situations. We implement state-of-the-art E2E-AD models and evaluate them in Bench2Drive, providing insights regarding current status and future directions.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-11
# 多目的強化学習に基づく時空間早期予測 Spatio-temporal Early Prediction based on Multi-objective Reinforcement Learning ( http://arxiv.org/abs/2406.04035v2 ) ライセンス: Link先を確認	Wei Shao, Yufan Kang, Ziyan Peng, Xiao Xiao, Lei Wang, Yuhui Yang, Flora D Salim,	(参考訳) 正確さと適時性は、予測タスクにおいてしばしば相反する目標である。早すぎる予測は誤報の頻度が高くなりうる一方、より多くの情報を集めるために予測を遅らせると、手遅れになって役に立たなくなることがある。森林火災、犯罪、交通渋滞などの応用において、タイムリーな予測は人命と財産を守るのに不可欠である。したがって、精度と適時性のバランスを見つけることが重要である。本稿では,多目的強化学習に基づく時空間的早期予測モデルを提案する。このモデルは、選好が与えられた場合に最適な方策を実行することも、少数のサンプルから選好を推定することもできる。モデルは2つの主要な課題に対処する。 1) 早期予測の精度の向上 2) 地域ごとに最も適切な予測時刻を決定するための最適な方策の提供。提案手法は,3つの大規模実世界データセットにおいて,時空間早期予測タスクで既存手法を上回る性能を示す。 Accuracy and timeliness are indeed often conflicting goals in prediction tasks. Premature predictions may yield a higher rate of false alarms, whereas delaying predictions to gather more information can render them too late to be useful. In applications such as wildfires, crimes, and traffic jams, timely predictions are vital for safeguarding human life and property. Consequently, finding a balance between accuracy and timeliness is crucial. In this paper, we propose a spatio-temporal early prediction model based on Multi-Objective reinforcement learning that can either implement an optimal policy given a preference or infer the preference based on a small number of samples. The model addresses two primary challenges: 1) enhancing the accuracy of early predictions and 2) providing the optimal policy for determining the most suitable prediction time for each area. Our method demonstrates superior performance on three large-scale real-world datasets, surpassing existing methods in early spatio-temporal prediction tasks.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-11
# Small-E: 効率的な音声合成のための線形注意付き小規模言語モデル Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis ( http://arxiv.org/abs/2406.04467v2 ) ライセンス: Link先を確認	Théodor Lemerle, Nicolas Obin, Axel Roebel,	(参考訳) 言語モデルを用いたテキスト音声合成(TTS)の最近の進歩は、自然性やゼロショット音声クローニングの実現において顕著な能力を示した。特に、デコーダのみのトランスフォーマーが、この領域で主流のアーキテクチャとなっている。しかし、トランスフォーマーは、シーケンス長に対する2次の計算量から生じる課題に直面し、長いシーケンスでのトレーニングやリソース制約のあるハードウェアでのトレーニングが妨げられる。さらに、TTSアライメントの単調性に関する特定の帰納的バイアスを欠いている。そこで本研究では,トランスフォーマーを新興のリカレントアーキテクチャに置き換え,繰り返しやスキップの問題を軽減する特別なクロスアテンション機構を導入することを提案する。その結果、我々のアーキテクチャは、長いサンプルで効率的に訓練でき、同等の大きさのベースラインに対して最先端のゼロショット音声クローニングを実現することができる。私たちの実装とデモは https://github.com/theodorblackbird/lina-speech で公開されています。 Recent advancements in text-to-speech (TTS) powered by language models have showcased remarkable capabilities in achieving naturalness and zero-shot voice cloning. Notably, the decoder-only transformer is the prominent architecture in this domain. However, transformers face challenges stemming from their quadratic complexity in sequence length, impeding training on lengthy sequences and resource-constrained hardware. Moreover, they lack specific inductive bias with regards to the monotonic nature of TTS alignments. In response, we propose to replace transformers with emerging recurrent architectures and introduce specialized cross-attention mechanisms for reducing repeating and skipping issues. Consequently, our architecture can be efficiently trained on long samples and achieve state-of-the-art zero-shot voice cloning against baselines of comparable size. Our implementation and demos are available at https://github.com/theodorblackbird/lina-speech.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-11
# ブラックボックスLLMからのロジットを用いないロジットベース検出器の改良 Improving Logits-based Detector without Logits from Black-box LLMs ( http://arxiv.org/abs/2406.05232v2 ) ライセンス: Link先を確認	Cong Zeng, Shengkun Tang, Xianjun Yang, Yuanzhou Chen, Yiyou Sun, Zhiqiang Xu, Yao Li, Haifeng Chen, Wei Cheng, Dongkuan Xu,	(参考訳) LLM(Large Language Models)の出現はテキスト生成に革命をもたらし、人間の文章によく似た出力を生成するようになった。機械が書いた文章と人間が書いた文章の境界が曖昧になったことで、両者を区別することが新たな課題となっており、主要なプロプライエタリLLMの頻繁な更新とクローズドな性質がこの作業をさらに複雑にしている。従来のロジットベースの検出手法は、ブラックボックスのLLMから正確なロジットが得られない場合に、サロゲートモデルを用いてLLM生成コンテンツを識別する。しかし、これらの手法は、サロゲートモデルの分布と、多くの場合非公開であるターゲットモデルの分布との不整合に苦しんでおり、特に新しいクローズドソースモデルが導入されると性能が劣化する。さらに、現在の手法は、ソースモデルが特定されている場合には概ね有効であるが、モデルのバージョンが不明な場合や、テストセットが様々なソースモデルの出力で構成されている場合にはうまく機能しない。これらの制約に対処するため、我々は、ソースLLMからのロジットがなくても、ブラックボックステキスト検出における最先端性能を塗り替える革新的なフレームワークであるDistribution-Aligned LLMs Detection (DALD)を提案する。 DALDは、サロゲートモデルの分布を未知のターゲットLLMの分布と整合させ、最小限のトレーニングコストで検出能力と高速なモデル更新に対するレジリエンスを高めるように設計されている。ChatGPT, GPT-4, Claude-3などの先進モデルの公開されている出力から得たコーパスサンプルを活用することにより、DALDはサロゲートモデルを微調整し、未知のソースモデルの分布と効果的に同期させる。 The advent of Large Language Models (LLMs) has revolutionized text generation, producing outputs that closely mimic human writing. This blurring of lines between machine- and human-written text presents new challenges in distinguishing one from the other, a task further complicated by the frequent updates and closed nature of leading proprietary LLMs. Traditional logits-based detection methods leverage surrogate models for identifying LLM-generated content when the exact logits are unavailable from black-box LLMs. However, these methods grapple with the misalignment between the distributions of the surrogate and the often undisclosed target models, leading to performance degradation, particularly with the introduction of new, closed-source models. Furthermore, while current methodologies are generally effective when the source model is identified, they falter in scenarios where the model version remains unknown, or the test set comprises outputs from various source models. 
To address these limitations, we present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection even without logits from source LLMs. DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations with minimal training investment. By leveraging corpus samples from publicly accessible outputs of advanced models such as ChatGPT, GPT-4 and Claude-3, DALD fine-tunes surrogate models to synchronize with unknown source model distributions effectively.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-11
# ShiftAddLLM: トレーニング後の乗算不要な再パラメータ化による事前学習LLMの高速化 ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ( http://arxiv.org/abs/2406.05981v2 ) ライセンス: Link先を確認	Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan (Celine) Lin,	(参考訳) 大規模言語モデル(LLM)は、言語タスクにおいて顕著な性能を示しているが、膨大なパラメータと密な乗算への依存のために、リソース制約のあるデバイスにデプロイする際には、高いメモリ要求と遅延ボトルネックという課題に直面する。 Shift-and-add再パラメータ化は、LLMのアテンション層と多層パーセプトロン(MLP)層の両方において、コストのかかる乗算をハードウェアフレンドリなプリミティブに置き換えることで、有望なソリューションを提供する。しかし、現在の再パラメータ化技術では、精度を回復するためにスクラッチからのトレーニングやフルパラメータの微調整が必要であり、これはLLMにとってリソース集約的である。そこで本研究では,トレーニング後のshift-and-add再パラメータ化によって事前学習済みLLMを高速化し,ShiftAddLLMと呼ぶ効率的な乗算不要モデルを作成することを提案する。具体的には,各重み行列を,グループ単位のスケーリング係数と組み合わせた二値行列に量子化する。関連する乗算は、(1)アクティベーションとスケーリング係数の間のシフト、(2)二値行列に従ったクエリと加算、に再パラメータ化される。精度損失を低減するため,重みと出力アクティベーションの両方の再パラメータ化誤差を最小化する多目的最適化手法を提案する。さらに、再パラメータ化に対する感度が層によって異なることに基づいて、メモリ使用量とレイテンシをさらに削減する自動ビット割り当て戦略を開発する。 5つのLLMファミリーと8つのタスクによる実験は、ShiftAddLLMの有効性を一貫して検証し、3ビットおよび2ビットにおいて最も競争力のある量子化LLMと比較して、同等以下のレイテンシでそれぞれ平均5.6ポイントと22.7ポイントのパープレキシティ改善を実現し、元のLLMに対して80%以上のメモリとエネルギーの削減を実現した。コードとモデルは https://github.com/GATECH-EIC/ShiftAddLLM で公開されている。 Large language models (LLMs) have shown impressive performance on language tasks but face challenges when deployed on resource-constrained devices due to their extensive parameters and reliance on dense multiplications, resulting in high memory demands and latency bottlenecks. Shift-and-add reparameterization offers a promising solution by replacing costly multiplications with hardware-friendly primitives in both the attention and multi-layer perceptron (MLP) layers of an LLM. However, current reparameterization techniques require training from scratch or full parameter fine-tuning to restore accuracy, which is resource-intensive for LLMs. To address this, we propose accelerating pretrained LLMs through post-training shift-and-add reparameterization, creating efficient multiplication-free models, dubbed ShiftAddLLM. 
Specifically, we quantize each weight matrix into binary matrices paired with group-wise scaling factors. The associated multiplications are reparameterized into (1) shifts between activations and scaling factors and (2) queries and adds according to the binary matrices. To reduce accuracy loss, we present a multi-objective optimization method to minimize both weight and output activation reparameterization errors. Additionally, based on varying sensitivity across layers to reparameterization, we develop an automated bit allocation strategy to further reduce memory usage and latency. Experiments on five LLM families and eight tasks consistently validate the effectiveness of ShiftAddLLM, achieving average perplexity improvements of 5.6 and 22.7 points at comparable or lower latency compared to the most competitive quantized LLMs at 3 and 2 bits, respectively, and more than 80% memory and energy reductions over the original LLMs. Codes and models are available at https://github.com/GATECH-EIC/ShiftAddLLM.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-11
# 量子コンピュータにおける大規模・高精度流体シミュレーションの実現 Enabling Large-Scale and High-Precision Fluid Simulations on Near-Term Quantum Computers ( http://arxiv.org/abs/2406.06063v2 ) ライセンス: Link先を確認	Zhao-Yun Chen, Teng-Yang Ma, Chuang-Chao Ye, Liang Xu, Ming-Yang Tan, Xi-Ning Zhuang, Xiao-Fan Xu, Yun-Jie Wang, Tai-Ping Sun, Yong Chen, Lei Du, Liang-Liang Guo, Hai-Feng Zhang, Hao-Ran Tao, Tian-Le Wang, Xiao-Yan Yang, Ze-An Zhao, Peng Wang, Sheng Zhang, Chi Zhang, Ren-Ze Zhao, Zhi-Long Jia, Wei-Cheng Kong, Meng-Han Dou, Jun-Chao Wang, Huan-Yu Liu, Cheng Xue, Peng-Jun-Yi Zhang, Sheng-Hong Huang, Peng Duan, Yu-Chun Wu, Guo-Ping Guo,	(参考訳) 量子計算流体力学(QCFD)は、量子アルゴリズムを高効率に活用することにより、古典計算流体力学(CFD)に代わる有望な代替手段を提供する。本稿では, 超伝導量子コンピュータ上に実装された総合QCFD法を提案し, 定常ポアゼイユ流と非定常音波伝搬のシミュレーションに成功した。ポワゼイユ流シミュレーションは相対誤差が0.2 %以下に達し、非定常音響波シミュレーションは5043次元行列を解き、これまでで最大の流体シミュレーションを量子コンピュータ上で達成した。我々のアプローチは量子コンピューティングと古典コンピューティングを橋渡しし、量子ハードウェアの制約に適応し、大規模CFD問題に対するスケーラブルなソリューションを提供する。 Quantum computational fluid dynamics (QCFD) offers a promising alternative to classical computational fluid dynamics (CFD) by leveraging quantum algorithms for higher efficiency. This paper introduces a comprehensive QCFD method implemented on a superconducting quantum computer, demonstrating successful simulations of steady Poiseuille flow and unsteady acoustic wave propagation. The Poiseuille flow simulation achieved a relative error of less than $0.2\%$, and the unsteady acoustic wave simulation solved a 5043-dimension matrix, marking the largest fluid simulation on a quantum computer to date. Our approach bridges quantum and classical computing, adapting to quantum hardware constraints and offering scalable solutions for large-scale CFD problems, which paves the way for practical applications of near-term quantum computers in computational science.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-11
# セルフチューニング: 自己学習を通じて新たな知識を効果的に獲得するLLMの指導 Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching ( http://arxiv.org/abs/2406.06326v2 ) ライセンス: Link先を確認	Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Yipeng Zhang, Haitao Mi, Helen Meng,	(参考訳) 大規模言語モデル(LLM)は、一度のトレーニングと常に進化する世界の性質のために、最新の情報の提供に苦慮することが多い。 LLMの現在の状態を維持するために、既存のアプローチは、通常、新しいドキュメントの事前トレーニングを継続する。しかし、それらは記憶された知識の抽出にしばしば困難に直面している。効率的なヒューマンラーニングにおけるFeynman Techniqueの顕著な成功に感銘を受けて,LLMが生文書から新たな知識を効果的に獲得する能力を向上させるための学習フレームワークであるSelf-Tuningを紹介した。具体的には、記憶、理解、自己反省という3つの重要な側面に焦点をあて、自己監督的な方法で作成された知識集約的なタスクのセットで文書を増強する自己学習戦略を開発する。さらに,3つのWiki-Newpages-2023-QAデータセットを導入し,記憶,抽出,推論に関するLLMの知識獲得能力を詳細に分析する。 Llama2ファミリーモデルに対する大規模な実験結果から、自己チューニングはすべての知識獲得タスクに対して一貫して優れた性能を示し、過去の知識の保存に優れることが明らかになった。 Large language models (LLMs) often struggle to provide up-to-date information due to their one-time training and the constantly evolving nature of the world. To keep LLMs current, existing approaches typically involve continued pre-training on new documents. However, they frequently face difficulties in extracting stored knowledge. Motivated by the remarkable success of the Feynman Technique in efficient human learning, we introduce Self-Tuning, a learning framework aimed at improving an LLM's ability to effectively acquire new knowledge from raw documents through self-teaching. Specifically, we develop a Self-Teaching strategy that augments the documents with a set of knowledge-intensive tasks created in a self-supervised manner, focusing on three crucial aspects: memorization, comprehension, and self-reflection. Additionally, we introduce three Wiki-Newpages-2023-QA datasets to facilitate an in-depth analysis of an LLM's knowledge acquisition ability concerning memorization, extraction, and reasoning. 
Extensive experimental results on Llama2 family models reveal that Self-Tuning consistently exhibits superior performance across all knowledge acquisition tasks and excels in preserving previous knowledge.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-11
# 大規模言語モデルエージェントを用いたウェアラブルデータのヘルスインサイトへの変換 Transforming Wearable Data into Health Insights using Large Language Model Agents ( http://arxiv.org/abs/2406.06464v2 ) ライセンス: Link先を確認	Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y. McLean, Mark Malhotra, Shwetak Patel, Jiening Zhan, Tim Althoff, Daniel McDuff, Xin Liu,	(参考訳) ウェアラブルヘルストラッカーの普及と、睡眠と運動の重要性にもかかわらず、ウェアラブルデータから実用的なパーソナライズされた洞察を導出することは、これらのデータの非自明なオープンエンド分析を必要とするため、依然として課題である。近年の大規模言語モデル(LLM)エージェントの台頭は,世界に対する推論や対話にツールを利用することで,このようなパーソナライズされた分析を大規模に実現する,有望な機会を提供する。しかし、LLMエージェントの個人の健康分析への応用は、いまだに未解決のままである。本稿では,現在最先端のコード生成と情報検索ツールを活用し,ウェアラブルからの行動健康データを解析・解釈するエージェントシステムであるPersonal Health Insights Agent(PHIA)を紹介する。 4000以上の健康意識の質問をベンチマークで回答するデータセットを2つ評価する。 650時間の人間と専門家による評価に基づいて、PHIAは事実の数値的な質問の84%以上と、クラウドソーシングによるオープンエンドな質問の83%以上に正確に対処できることがわかった。この研究は、集団全体の行動の健康を向上させ、個人が自身のウェアラブルデータを解釈し、データ駆動の洞察によって知らされる、アクセス可能でパーソナライズされたウェルネスの新たな時代への道を歩む可能性がある。 Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising opportunity to enable such personalized analysis at scale. Yet, the application of LLM agents in analyzing personal health is still largely untapped. In this paper, we introduce the Personal Health Insights Agent (PHIA), an agent system that leverages state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data from wearables. We curate two benchmark question-answering datasets of over 4000 health insights questions. 
Based on 650 hours of human and expert evaluation we find that PHIA can accurately address over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work has implications for advancing behavioral health across the population, potentially enabling individuals to interpret their own wearable data, and paving the way for a new era of accessible, personalized wellness regimens that are informed by data-driven insights.	翻訳日:2024-06-12 21:24:05 公開日:2024-06-11
# AIに基づく待ち行列システムの設計とスケジューリング Design and Scheduling of an AI-based Queueing System ( http://arxiv.org/abs/2406.06855v1 ) ライセンス: Link先を確認	Jiung Lee, Hongseok Namkoong, Yibo Zeng,	(参考訳) サービスシステムにおける予測モデルを利用して最適なスケジューリング決定を行うためには,外部性による混雑の予測誤差が他のジョブの遅延に与える影響を理解する必要がある。予測モデルがヒューマンサーバと相互作用するアプリケーション(例えば、コンテンツモデレーション)によって動機づけられた本研究では、ジョブのクラスを予測モデルを用いて推定する多数の単一サーバキューからなる大規模キューシステムについて考察する。交通渋滞における誤予測が混雑コストに与える影響を特徴付けることにより,予測されたクラス情報をほぼ最適に組み込んだインデックスベースのポリシーを設計する。我々の理論的結果は、下流の待ち行列性能を中心とする単純なモデル選択手順を提供することで予測モデルの設計をガイドし、AIベースのトリアージを用いた待ち行列システムの設計方法に関する新たな洞察を提供する。実際のオンラインコメントをベースとしたコンテンツモデレーションタスクにおいて,大規模言語モデルを微調整して毒性分類器を構築する。 To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising of many single server queues where the class of a job is estimated using a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure with downstream queueing performance as a central concern, and offer novel insights on how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by finetuning large language models.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# 表形式強化学習における方策差分推定によるサンプル複雑性の低減 Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning ( http://arxiv.org/abs/2406.06856v1 ) ライセンス: Link先を確認	Adhyyan Narang, Andrew Wagenmaker, Lillian Ratliff, Kevin Jamieson,	(参考訳) 本稿では,文脈付きバンディットおよび表形式強化学習(RL)における純粋探索問題、すなわち方策の集合から高い確率でε最適な方策を特定する問題について、非漸近的なサンプル複雑性を検討する。バンディットにおける既存の研究は、個々の方策の振る舞いの差のみを推定することで最良の方策を特定でき、これは各方策の振る舞いを直接推定するよりもはるかに安価になりうることを示した。しかし、RLにおける既知の最良の複雑性はこの利点を生かせず、代わりに各方策の振る舞いを直接推定している。 RLにおいて方策の振る舞いの差だけを推定すれば十分だろうか? 我々は、この問いに文脈付きバンディットについては肯定的に、表形式RLについては否定的に答え、文脈付きバンディットとRLの間の分離を示す。しかし、これに着想を得て、RLでも差のみを推定することでほぼ十分であることを示す。すなわち、単一の参照方策の振る舞いを推定できれば、他の任意の方策がこの参照方策からどう逸脱するかを推定するだけで十分である。我々は,この原理を具体化するアルゴリズムを開発し,我々の知る限り,表形式RLのサンプル複雑性について既知の中で最もタイトな上界を得た。 In this paper, we study the non-asymptotic sample complexity for the pure exploration problem in contextual bandits and tabular reinforcement learning (RL): identifying an epsilon-optimal policy from a set of policies with high probability. Existing work in bandits has shown that it is possible to identify the best policy by estimating only the difference between the behaviors of individual policies, which can be substantially cheaper than estimating the behavior of each policy directly. However, the best-known complexities in RL fail to take advantage of this and instead estimate the behavior of each policy directly. Does it suffice to estimate only the differences in the behaviors of policies in RL? We answer this question positively for contextual bandits but in the negative for tabular RL, showing a separation between contextual bandits and RL. However, inspired by this, we show that it almost suffices to estimate only the differences in RL: if we can estimate the behavior of a single reference policy, it suffices to only estimate how any other policy deviates from this reference policy. We develop an algorithm which instantiates this principle and obtains, to the best of our knowledge, the tightest known bound on the sample complexity of tabular RL.	
翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# FLUX:カーネルフュージョンによるGPU上での高速ソフトウェアベースの通信オーバーラップ FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion ( http://arxiv.org/abs/2406.06858v1 ) ライセンス: Link先を確認	Liwen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Xuanrun Zhang, Zuquan Song, Ziheng Jiang, Haibin Lin, Xin Liu,	(参考訳) 大規模なディープラーニングモデルは、広範囲のアプリケーションで多くのタスクを解く強力な能力を示している。これらの大規模モデルでは一般に、トレーニングと推論を分散して行う必要がある。テンソル並列性(Tensor parallelism)は、単一プロセッサのメモリ容量の制限を克服するため、あるいは特定のレイテンシ要求を満たすよう計算を高速化するために、演算やレイヤの計算をデバイス間で分割する一般的な手法である。しかし、この種の並列処理は、ランタイム全体のかなりの部分を占めうる追加の通信をもたらす。このため、この手法のスケーラビリティは、ノード内でNVLinkにより接続されたGPUのような、高速な相互接続を持つデバイス群の範囲内に制限される。本稿では,GPU上で、依存関係のある計算によって通信遅延を大幅に隠蔽する新しい手法Fluxを提案する。 Fluxは通信処理と計算処理をより細かな粒度の演算に過分割し、さらにそれらをより大きなカーネルに融合することで、カーネル効率を損なうことなく効果的に通信を隠蔽する。 Fluxは、融合カーネルによって最大96%の通信をオーバーラップできる可能性がある。全体として、様々なGPU世代と相互接続を持つ128GPUのクラスタ上でのトレーニングにおいてMegatron-LMに対して最大1.24倍のスピードアップを実現し、様々なGPU世代と相互接続を持つ8GPUのクラスタ上でのプリフィルおよびデコードの推論においてvLLMに対してそれぞれ最大1.66倍と1.30倍のスピードアップを実現している。 Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique partitioning computation of an operation or layer across devices to overcome the memory capacity limitation of a single processor, and/or to accelerate computation to meet a certain latency requirement. However, this kind of parallelism introduces additional communication that might contribute a significant portion of overall runtime. This limits the scalability of this technique within a group of devices with high speed interconnects, such as GPUs with NVLinks in a node. This paper proposes a novel method, Flux, to significantly hide communication latencies with dependent computations for GPUs. Flux over-decomposes communication and computation operations into much finer-grained operations and further fuses them into a larger kernel to effectively hide communication without compromising kernel efficiency. 
Flux can potentially overlap up to 96% of communication given a fused kernel. Overall, it can achieve up to 1.24x speedups for training over Megatron-LM on a cluster of 128 GPUs with various GPU generations and interconnects, and up to 1.66x and 1.30x speedups for prefill and decoding inference over vLLM on a cluster with 8 GPUs with various GPU generations and interconnects.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# プロカ量子電磁力学における2つの原子間の分散相互作用 Dispersive interaction between two atoms in Proca Quantum Electrodynamics ( http://arxiv.org/abs/2406.06862v1 ) ライセンス: Link先を確認	Gabriel Camacho de Pinho, Carlos Augusto Domingues Zarro, Carlos Farina, Reinaldo de Melo e Souza, Maurício Hippert,	(参考訳) 基底状態にある2つの原子間の分散相互作用に対する、質量を持つ光子の影響を解析する。プロカ量子電磁力学の枠組みで議論する。光子質量は、新しい長さスケールを導入するだけでなく、電磁場に縦偏極を生じさせる。任意の距離領域における原子間の相互作用エネルギーを明示的に求め、いくつかの特定の場合を考察する。与えられた原子間距離に対して、光子質量が大きいほど非遅延近似が良くなることを示す。 We analyze the influence of a massive photon in the dispersive interaction between two atoms in their fundamental states. We work in the context of Proca Quantum Electrodynamics. The photon mass not only introduces a new length scale but also gives rise to a longitudinal polarization for the electromagnetic field. We obtain explicitly the interaction energy between the atoms for any distance regime and consider several particular cases. We show that, for a given interatomic distance, the greater the photon mass, the better the non-retarded approximation is.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# Ollabench氏:人間中心の相互依存サイバーセキュリティに対するLLMの推論の評価 Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity ( http://arxiv.org/abs/2406.06863v1 ) ライセンス: Link先を確認	Tam n. Nguyen,	(参考訳) 大規模言語モデル(LLM)は、複雑な相互依存型サイバーセキュリティシステムを表現することによってエージェントベースモデリングを強化する可能性があり、サイバーセキュリティ脅威モデリングとリスク管理を改善する。しかし、この文脈でのLCMの評価は、法的なコンプライアンスと効果的なアプリケーション開発に不可欠である。既存のLCM評価フレームワークは、しばしば、相互依存型サイバーセキュリティに不可欠なヒューマンファクターと認知コンピューティング能力を見落としている。このギャップに対処するために,シナリオベースの情報セキュリティコンプライアンスや非コンプライアンス問題に答える上で,LLMの正確性,無駄性,一貫性を評価する新しい評価フレームワークであるOllaBenchを提案する。 OllaBenchは、24の認知行動理論と38の査読論文の実証的証拠の基礎の上に構築されている。 OllaBench は OpenAI, Anthropic, Google, Microsoft, Meta など,21の LLM の評価に使用された。その結果,商業用LLMは総合的精度が最も高いが,改善の余地は大きいことがわかった。より小型の低解像度オープンウェイトLCMは性能に劣らず, 評価モデル間でトークン効率と整合性に有意な差がある。 OllaBenchはユーザフレンドリーなインターフェースを提供し、幅広いLLMプラットフォームをサポートしている。 Large Language Models (LLMs) have the potential to enhance Agent-Based Modeling by better representing complex interdependent cybersecurity systems, improving cybersecurity threat modeling and risk management. However, evaluating LLMs in this context is crucial for legal compliance and effective application development. Existing LLM evaluation frameworks often overlook the human factor and cognitive computing capabilities essential for interdependent cybersecurity. To address this gap, I propose OllaBench, a novel evaluation framework that assesses LLMs' accuracy, wastefulness, and consistency in answering scenario-based information security compliance and non-compliance questions. OllaBench is built on a foundation of 24 cognitive behavioral theories and empirical evidence from 38 peer-reviewed papers. OllaBench was used to evaluate 21 LLMs, including both open-weight and commercial models from OpenAI, Anthropic, Google, Microsoft, Meta and so on. The results reveal that while commercial LLMs have the highest overall accuracy scores, there is significant room for improvement. 
Smaller low-resolution open-weight LLMs are not far behind in performance, and there are significant differences in token efficiency and consistency among the evaluated models. OllaBench provides a user-friendly interface and supports a wide range of LLM platforms, making it a valuable tool for researchers and solution developers in the field of human-centric interdependent cybersecurity and beyond.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# メタモルフィックプロンプトテストによるLCM生成プログラムの検証 Validating LLM-Generated Programs with Metamorphic Prompt Testing ( http://arxiv.org/abs/2406.06864v1 ) ライセンス: Link先を確認	Xiaoyin Wang, Dakai Zhu,	(参考訳) ソフトウェア開発における最新のパラダイムシフトは、GPT(Generative Pre-trained Transformer)によって紹介された、Large Language Models (LLMs)によるイノベーションと自動化をもたらす。 LLMの生成するコードの潜在的な利点は、特に効率性と迅速なプロトタイピングにおいて大きく、LCMがソフトウェア開発ライフサイクルにますます統合されるにつれて、これらの言語モデルから生成されたコードが品質と正確性について深い疑問を呈するサプライチェーン、複雑で多面的な課題が発生する。 LLM生成コードを取り巻くこれらの重要な懸念を包括的に調査するためには、研究が必要である。本稿では,これらの課題に対処するため,メタモルフィック・プロンプト・テストと呼ばれる新しい手法を提案する。直感的な観察では、本質的な一貫性は常に正しいコード片の間に存在しますが、欠陥のあるコード片には存在しません。したがって、パラフレーズで複数のプロンプトに与えられたプロンプトを変更でき、LLMに生成したコードの複数バージョンを取得するよう依頼することができるので、クロスバリデーションにより、セマンティックリレーションが取得したコードにまだ保持されているかどうかを検証できる。我々のHumanEvalに対する評価は,GPT-4が生成する誤プログラムの75%を,偽陽性率8.6%で検出できることを示す。 The latest paradigm shift in software development brings in the innovation and automation afforded by Large Language Models (LLMs), showcased by Generative Pre-trained Transformer (GPT), which has shown remarkable capacity to generate code autonomously, significantly reducing the manual effort required for various programming tasks. Although, the potential benefits of LLM-generated code are vast, most notably in efficiency and rapid prototyping, as LLMs become increasingly integrated into the software development lifecycle and hence the supply chain, complex and multifaceted challenges arise as the code generated from these language models carry profound questions on quality and correctness. Research is required to comprehensively explore these critical concerns surrounding LLM-generated code. In this paper, we propose a novel solution called metamorphic prompt testing to address these challenges. Our intuitive observation is that intrinsic consistency always exists among correct code pieces but may not exist among flawed code pieces, so we can detect flaws in the code by detecting inconsistencies. 
Therefore, we can vary a given prompt to multiple prompts with paraphrasing, and to ask the LLM to acquire multiple versions of generated code, so that we can validate whether the semantic relations still hold in the acquired code through cross-validation. Our evaluation on HumanEval shows that metamorphic prompt testing is able to detect 75 percent of the erroneous programs generated by GPT-4, with a false positive rate of 8.6 percent.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
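The consistency check described in this abstract can be sketched roughly as follows. This is only a minimal illustration, assuming the programs generated from paraphrased prompts are already available as Python callables and that a set of test inputs exists; the names `safe_run` and `flag_inconsistent` are hypothetical, and the paper's actual paraphrasing, input generation, and decision rule may differ.

```python
from collections import Counter
from typing import Any, Callable, Sequence


def safe_run(program: Callable[[Any], Any], x: Any) -> str:
    """Run one candidate program; a crash is recorded as a distinct output."""
    try:
        return repr(program(x))
    except Exception as exc:
        return f"<error:{type(exc).__name__}>"


def flag_inconsistent(
    target: Callable[[Any], Any],
    variants: Sequence[Callable[[Any], Any]],
    test_inputs: Sequence[Any],
) -> bool:
    """Return True if `target` disagrees with the majority output of the
    variant programs on at least one test input (hypothetical decision rule)."""
    for x in test_inputs:
        votes = Counter(safe_run(v, x) for v in variants)
        majority_output, _ = votes.most_common(1)[0]
        if safe_run(target, x) != majority_output:
            return True
    return False


# Variants stand in for programs generated from paraphrased prompts.
variants = [lambda x: x + x, lambda x: 2 * x, lambda x: x << 1]
correct_target = lambda x: x * 2
buggy_target = lambda x: x ** 2  # agrees on 0 and 2, differs elsewhere
```

A program flagged this way is only suspected of being wrong: if most variants share the same bug, the check stays silent, which is consistent with the abstract reporting both a detection rate and a false positive rate.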
# 組合せ問題の目測:多モーダル大言語モデルを用いたトラベリングセールスマン問題の解法 Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems ( http://arxiv.org/abs/2406.06865v1 ) ライセンス: Link先を確認	Mohammed Elhenawy, Ahmed Abdelhay, Taqwa I. Alhadidi, Huthaifa I Ashqar, Shadi Jaradat, Ahmed Jaber, Sebastien Glaser, Andry Rakotonirainy,	(参考訳) MLLM (Multimodal Large Language Models) は、テキスト、画像、オーディオなど、多様なモダリティを処理する能力を示した。これらのモデルは、既存の知識を幅広く活用することで、少数のショットとゼロショットのインコンテキスト学習シナリオで証明されているように、特定のトレーニング例を最小限あるいは全く含まない複雑な問題に対処することができる。本稿では,2次元平面上の点分布の画像解析により,旅行セールスマン問題(TSP)の解を「目測」するためのMLLMの視覚機能の利用について検討する。本実験は,MLLMが有効なTSP経路を効果的に「目測」できるという仮説を検証することを目的とした。ゼロショット、少数ショット、自己アンサンブル、自己修正ゼロショット評価の結果は、有望な結果を示している。これらの知見がMLLMの視覚的推論能力の他の組み合わせ問題に対処するためのさらなる探究を促すことを期待する。 Multimodal Large Language Models (MLLMs) have demonstrated proficiency in processing diverse modalities, including text, images, and audio. These models leverage extensive pre-existing knowledge, enabling them to address complex problems with minimal to no specific training examples, as evidenced in few-shot and zero-shot in-context learning scenarios. This paper investigates the use of MLLMs' visual capabilities to 'eyeball' solutions for the Traveling Salesman Problem (TSP) by analyzing images of point distributions on a two-dimensional plane. Our experiments aimed to validate the hypothesis that MLLMs can effectively 'eyeball' viable TSP routes. The results from zero-shot, few-shot, self-ensemble, and self-refine zero-shot evaluations show promising outcomes. We anticipate that these findings will inspire further exploration into MLLMs' visual reasoning abilities to tackle other combinatorial problems.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
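The abstract does not state how a model-proposed route is checked or scored. As a minimal sketch, assuming points are 2D coordinates and a route is a list of point indices, a proposed TSP tour could be validated and measured as below; `tour_length` is an illustrative name, not something taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(points: Sequence[Point], tour: List[int]) -> float:
    """Return the length of a closed tour, rejecting invalid routes.

    A valid tour visits every point index exactly once; the salesman then
    returns from the last point to the first.
    """
    if sorted(tour) != list(range(len(points))):
        raise ValueError("tour must visit every point exactly once")
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
perimeter_route = [0, 1, 2, 3]   # walks around the square
crossing_route = [0, 2, 1, 3]    # uses both diagonals
```

Comparing the length of an "eyeballed" route with that of a reference solver's route gives a simple optimality gap, which is one plausible way to quantify how good such routes are.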
# 埋め込みには何が入っていますか。埋め込みの匂いは甘いでしょうか? What's in an embedding? Would a rose by any embedding smell as sweet? ( http://arxiv.org/abs/2406.06870v1 ) ライセンス: Link先を確認	Venkat Venkatasubramanian,	(参考訳) LLM(Large Language Models)はしばしば、真の「理解」が欠如しており、その知識を「理解する」能力が欠如しているとして批判されている。私たちはこの視点が重要な洞察を欠いていると信じています。我々はLSMが「幾何学的」のような経験的な「下地」を開発しており、NLP、コンピュータビジョン、コーディング支援など様々な応用に十分と思われることを示唆している。しかし、この「幾何学的」理解は、不完全でノイズの多いデータから構築され、数十年前にヒューリスティックスベースのエキスパートシステムによって直面した課題と同様に、信頼できない、一般化が難しい、推論能力や説明が欠如している。これらの制限を克服するために、私たちはLLMをエキスパートシステムで使用されるシンボリックAI要素を含む知識の「代数的」表現に統合すべきだと提案する。この統合の目的は、第一原理に根ざした「深い」知識を持つだけでなく、人間専門家の能力を模倣し、説明し、説明する能力を持つ、大きな知識モデル(LKM)を作ることである。生成AIの潜在能力を安全かつ効果的に活用するためには、LLMからより包括的なLKMへのパラダイムシフトが必要である。 Large Language Models (LLMs) are often criticized for lacking true "understanding" and an ability to "reason" with their knowledge, being seen merely as advanced autocomplete systems. We believe that this perspective might be missing an important insight. We suggest that LLMs do develop a kind of empirical "understanding" that is "geometry"-like, which seems quite sufficient for a range of applications in NLP, computer vision, coding assistance, etc. However, this "geometric" understanding, built from incomplete and noisy data, makes them unreliable, difficult to generalize, and lacking in inference capabilities and explanations, similar to the challenges faced by heuristics-based expert systems decades ago. To overcome these limitations, we suggest that LLMs should be integrated with an "algebraic" representation of knowledge that includes symbolic AI elements used in expert systems. This integration aims to create large knowledge models (LKMs) that not only possess "deep" knowledge grounded in first principles, but also have the ability to reason and explain, mimicking human expert capabilities. To harness the full potential of generative AI safely and effectively, a paradigm shift from LLMs to the more comprehensive LKMs is needed.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# ヒューマンフィードバックによる政策整合性向上のための共同実証と選好学習 Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback ( http://arxiv.org/abs/2406.06874v1 ) ライセンス: Link先を確認	Chenliang Li, Siliang Zeng, Zeyi Liao, Jiaxiang Li, Dongyeop Kang, Alfredo Garcia, Mingyi Hong,	(参考訳) 人間の好みと価値を調整することは、現代の基礎モデルの構築とAIの具体化にとって重要な要件である。しかし、人間フィードバックによる強化学習(RLHF)のような一般的なアプローチでは、教師付き微調整(SFT)、報酬モデリング(RM)、強化学習(RL)のように、タスクを連続的に分割し、1つの特定の学習タスクを実行する。このようなシーケンシャルなアプローチは、データの利用不足や学習された報酬モデルと生成されたポリシーの間の分散ミスマッチといった深刻な問題を引き起こし、最終的にはアライメント性能が低下する。そこで我々は,AIHF(Alignment with Integrated Human Feedback)と呼ばれる単一段階のアプローチを開発し,人間の嗜好と実演を統合し,報酬モデルとポリシーを訓練する。提案手法では,RLHF や Directly Policy Optimization (DPO) などの一般的なアライメントアルゴリズムの削減と活用が容易であり,既存のアライメントパイプラインに小さな変更を加えるだけでよい。本研究では,LLMにおけるアライメント問題と,MuJoCoにおけるロボット制御問題を含む広範な実験により,提案手法の有効性を実証する。提案手法はRLHFやDPOといった既存のアライメントアルゴリズムを,特に高品質な嗜好データが比較的限定されている場合,大きなマージンで上回っている。 Aligning human preference and value is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning with human feedback (RLHF) break down the task into successive stages, such as supervised fine-tuning (SFT), reward modeling (RM), and reinforcement learning (RL), each performing one specific learning task. Such a sequential approach results in serious issues such as significant under-utilization of data and distribution mismatch between the learned reward model and generated policy, which eventually lead to poor alignment performance. We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF), capable of integrating both human preference and demonstration to train reward models and the policy. The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms such as RLHF and Directly Policy Optimization (DPO), and only requires minor changes to the existing alignment pipelines. 
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo. We observe that the proposed solutions outperform the existing alignment algorithms such as RLHF and DPO by large margins, especially when the amount of high-quality preference data is relatively limited.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# 有限ミンコフスキー時空相関関数からの包含反応 Inclusive reactions from finite Minkowski spacetime correlation functions ( http://arxiv.org/abs/2406.06877v1 ) ライセンス: Link先を確認	Marco A. Carrillo, Raúl A. Briceño, Alexandru M. Sturzu,	(参考訳) 任意のキネマティックスのための少数のハドロン系の散乱振幅を決定する必要性は、現代の原子核とハドロン物理学の幅広いサブフィールドを拡張する。本研究では,量子コンピューティングやテンソルネットワークなどのリアルタイム手法による散乱振幅の最小値の決定について,これまでの研究をさらに進める。このような計算は、散乱振幅が十分に定義されていない有限ミンコフスキー時空で行う必要がある。前報では,有限体積相関関数から構築した散乱振幅の系統的即効性推定器の推算を行った。ここでは、この処方薬が以前検討したよりも大きな運動領域に作用することを示すとともに、より広範な散乱振幅のクラスを示す。最後に、そのような計算に必要な有限時間分離に伴う誤差の大きさの順序を推定する新しい手法を考案する。理論の最も軽い質量の単位において、$\mathcal{O}(10\%)$内の実時間法を用いて振幅を制約するためには、時空体積は$mL \sim \mathcal{O}(10-10^2)$および$mT\sim \mathcal{O}(10^2-10^4)$を満たす必要がある。 The need to determine scattering amplitudes of few-hadron systems for arbitrary kinematics expands a broad set of subfields of modern-day nuclear and hadronic physics. In this work, we expand upon previous explorations on the use of real-time methods, like quantum computing or tensor networks, to determine few-body scattering amplitudes. Such calculations must be performed in a finite Minkowski spacetime, where scattering amplitudes are not well defined. Our previous work presented a conjecture of a systematically improvable estimator for scattering amplitudes constructed from finite-volume correlation functions. Here we provide further evidence that the prescription works for larger kinematic regions than previously explored as well as a broader class of scattering amplitudes. Finally, we devise a new method for estimating the order of magnitude of the error associated with finite time separations needed for such calculations. In units of the lightest mass of the theory, we find that to constrain amplitudes using real-time methods within $\mathcal{O}(10\%)$, the spacetime volumes must satisfy $mL \sim \mathcal{O}(10-10^2)$ and $ mT\sim \mathcal{O}(10^2-10^4)$.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# 反復学習モデルを用いた言語接触のモデル化 Modeling language contact with the Iterated Learning Model ( http://arxiv.org/abs/2406.06878v1 ) ライセンス: Link先を確認	Seth Bullock, Conor Houghton,	(参考訳) 言語間の接触は語彙やその他の言語特徴を伝達する可能性があるが、これは必ずしも起こらない。ここでは,反復学習モデルを用いて,言語接触時の言語抵抗を簡易に検証する。反復学習モデルは言語変化のエージェントベースモデルであり、言語伝達ボトルネックの結果、表現的で構成的な言語が自然に発生することを示す。最近導入された反復学習モデルであるSemi-Supervised ILMは、言語接触をシミュレートするために使われている。これらのシミュレーションには、言語接触に関わる複雑な要素の多くが含まれておらず、話者の集団をモデル化していないが、モデルでは、モデル内の言語を自発的に表現的かつ構成的に導くダイナミクスが、他の言語と混同しても言語がその中核的な特徴を維持することを示している。 Contact between languages has the potential to transmit vocabulary and other language features; however, this does not always happen. Here, an iterated learning model is used to examine, in a simple way, the resistance of languages to change during language contact. Iterated learning models are agent-based models of language change, they demonstrate that languages that are expressive and compositional arise spontaneously as a consequence of a language transmission bottleneck. A recently introduced type of iterated learning model, the Semi-Supervised ILM is used to simulate language contact. These simulations do not include many of the complex factors involved in language contact and do not model a population of speakers; nonetheless the model demonstrates that the dynamics which lead languages in the model to spontaneously become expressive and compositional, also cause a language to maintain its core traits even after mixing with another language.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# SpikePipe: 階層間パイプライニングとマルチプロセッサスケジューリングによるスパイクニューラルネットワークの高速化トレーニング SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling ( http://arxiv.org/abs/2406.06879v1 ) ライセンス: Link先を確認	Sai Sanjeet, Bibhu Datta Sahoo, Keshab K. Parhi,	(参考訳) スパイキングニューラルネットワーク(SNN)はその高エネルギー効率のために人気を博している。先行研究では、バックプロパゲーションに基づく手法を含む、SNNの訓練方法が提案されている。 SNNのトレーニングは従来のものに比べて計算コストが高く、マルチプロセッサハードウェアアクセラレーションの恩恵を受けるだろう。本稿では,シストリックアレイベースのプロセッサとマルチプロセッサスケジューリングを用いて,SNNのトレーニングを高速化するための層間パイプライニングを提案する。遅延勾配を用いたトレーニングの効果は、3つのネットワークで異なるデータセットでトレーニングし、小さなネットワークでは劣化せず、大きなネットワークでは10%も劣化しないことを示した。 SNNの各種トレーニングタスクをシストリックアレイにマッピングし,提案手法を3つのネットワーク上で評価する。結果は、標準的なパイプラインアルゴリズムと比較される。提案手法は,標準的なパイプライン化アルゴリズムと比較して平均1.6倍の高速化を実現し,場合によっては2倍の高速化を実現している。提案手法による通信オーバーヘッドは,訓練に必要な通信量の0.5%以下である。 Spiking Neural Networks (SNNs) have gained popularity due to their high energy efficiency. Prior works have proposed various methods for training SNNs, including backpropagation-based methods. Training SNNs is computationally expensive compared to their conventional counterparts and would benefit from multiprocessor hardware acceleration. This is the first paper to propose inter-layer pipelining to accelerate training in SNNs using systolic array-based processors and multiprocessor scheduling. The impact of training using delayed gradients is observed using three networks training on different datasets, showing no degradation for small networks and < 10% degradation for large networks. The mapping of various training tasks of the SNN onto systolic arrays is formulated, and the proposed scheduling method is evaluated on the three networks. The results are compared against standard pipelining algorithms. The results show that the proposed method achieves an average speedup of 1.6X compared to standard pipelining algorithms, with an upwards of 2X improvement in some cases. 
The incurred communication overhead due to the proposed method is less than 0.5% of the total required communication of training.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# EFIペアに必要な擬似エンタングルメント Pseudo-Entanglement is Necessary for EFI Pairs ( http://arxiv.org/abs/2406.06881v1 ) ライセンス: Link先を確認	Manuel Goulão, David Elkouss,	(参考訳) 最小限の仮定については、古典暗号の多くはワンウェイ関数(OWF)の存在に依存していることが知られている。しかし、近年の証拠は、量子資源を考える際にはそうではないことを示している。量子鍵分布のよく知られた非条件のセキュリティに加えて、計算暗号はOWF、例えば擬似ランダム状態[JLS18]、一方通行状態ジェネレータ[MY23]、またはEFIペアの状態[BCQ23]よりも弱いプリミティブ上に構築されることが知られている。我々は、新しい量子リソース、擬似絡み合いを考察し、暗号の最も弱い計算仮定(コミットメント、曖昧な転送、セキュアなマルチパーティ計算、計算ゼロ知識証明)の候補であるEFIペアの存在が、いくつかの合理的適応の下で [ABF+24, ABV23] によって定義された擬似絡み合いの存在を示唆していることを示す。 EFIペアのみを与えられた疑似絡み合った量子状態の族を構築することでこれを証明している。この結果は,計算暗号の分野において重要な意味を持つ。これは、疑似絡み合いが存在しない場合、ほとんどの暗号は存在しないことを示している。さらに、他の仮定を単一のプリミティブに統一する道を開くかもしれない、ほとんどの計算暗号において、新しい最小の仮定として擬似絡み合いを確立している。最後に、擬似絡み合いは物理現象と効率的な計算を結びつけ、その結果、暗号と物理世界の関連性を強化する。 Regarding minimal assumptions, most of classical cryptography is known to depend on the existence of One-Way Functions (OWFs). However, recent evidence has shown that this is not the case when considering quantum resources. Besides the well known unconditional security of Quantum Key Distribution, it is now known that computational cryptography may be built on weaker primitives than OWFs, e.g., pseudo-random states [JLS18], one-way state generators [MY23], or EFI pairs of states [BCQ23]. We consider a new quantum resource, pseudo-entanglement, and show that the existence of EFI pairs, one of the current main candidates for the weakest computational assumption for cryptography (necessary for commitments, oblivious transfer, secure multi-party computation, computational zero-knowledge proofs), implies the existence of pseudo-entanglement, as defined by [ABF+24, ABV23] under some reasonable adaptations. We prove this by constructing a new family of pseudo-entangled quantum states given only EFI pairs. Our result has important implications for the field of computational cryptography. It shows that if pseudo-entanglement does not exist, then most of cryptography cannot exist either. 
Moreover, it establishes pseudo-entanglement as a new minimal assumption for most of computational cryptography, which may pave the way for the unification of other assumptions into a single primitive. Finally, pseudo-entanglement connects physical phenomena and efficient computation, thus, our result strengthens the connection between cryptography and the physical world.	翻訳日:2024-06-12 19:46:28 公開日:2024-06-11
# PLUM: 選好学習とテストケースの組み合わせがより良いコード言語モデルを生む PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models ( http://arxiv.org/abs/2406.06887v1 ) ライセンス: Link先を確認	Dylan Zhang, Shizhe Diao, Xueyan Zou, Hao Peng,	(参考訳) 命令で微調整されたコード言語モデル(LM)は、様々なプログラミングタスクにおいて有望であることを示している。自然言語命令とゴールドコードスニペットペアに基づいて、言語モデリングの目的を使ってトレーニングされている。最近の証拠は、これらのモデルはトレーニング中に間違った解に晒されることがなく、しばしば正しい解と間違った解を区別するのに苦労していることを示唆している。不正なソリューションよりも正しいソリューションを好むようにモデルを訓練する選好学習は、コードLMの境界をさらに推し進めるのに役立ちますか? PLUMは、コードLMに適したテストケースで拡張された、新規な選好学習フレームワークである。 PLUMは、(1)自然言語命令に対するテストケースの生成、(2)ポリシーから候補ソリューションをサンプリングし、テストケースで評価して選好データセットを作成、(3)選好学習アルゴリズムによるポリシーの訓練、の三段階からなる。 PLUMは、最先端のオープンソース言語モデルであるCodeQwen-1.5-7B-Chatであっても、HumanEval (+)やMBPP (+)のような既存のコード生成ベンチマークにおける既存のコードLMの性能を大幅に改善することを示した。 PLUMは教師付き微調整(SFT)段階を補完し、相乗効果を示す。 Instruction-finetuned code language models (LMs) have shown promise in various programming tasks. They are trained, using a language modeling objective, on natural language instructions and gold code snippet pairs. Recent evidence suggests that these models, never exposed to incorrect solutions during training, often struggle to distinguish between correct and incorrect solutions. This observation raises our inquiry: Can preference learning, which trains models to prefer correct solutions over incorrect ones, help push the boundaries of code LMs even further? We propose PLUM, a novel preference learning framework augmented with test cases tailored for code LMs. PLUM aims to investigate the key success factors and potential benefits of preference learning in code LMs, which remain elusive despite its success in aligning LMs with human values. 
PLUM consists of three stages: (1) Generating test cases for natural language instructions, (2) sampling candidate solutions from the policy and evaluating them against the test cases to create a preference dataset, which is then used to (3) train the policy with a preference learning algorithm. Experiments demonstrate that PLUM substantially improves the performance of existing code LMs on established code generation benchmarks such as HumanEval (+) and MBPP (+), even for the state-of-the-art open-source language model CodeQwen-1.5-7B-Chat. PLUM complements the supervised fine-tuning (SFT) stage, demonstrating synergistic effects.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
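Stage (2) of the pipeline described above, turning test-case results into a preference dataset, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: candidate solutions are represented as in-process Python callables (a real system would presumably run generated code in a sandbox), and the helper names `passes_all` and `build_preference_pairs` are hypothetical.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

TestCase = Tuple[tuple, Any]  # (positional arguments, expected result)


def passes_all(solution: Callable[..., Any], tests: List[TestCase]) -> bool:
    """A candidate passes only if every test returns the expected value."""
    for args, expected in tests:
        try:
            if solution(*args) != expected:
                return False
        except Exception:
            return False  # a crashing candidate counts as failing
    return True


def build_preference_pairs(
    instruction: str,
    candidates: Dict[str, Callable[..., Any]],
    tests: List[TestCase],
) -> List[Dict[str, str]]:
    """Pair every passing candidate (chosen) with every failing one (rejected)."""
    passed = [name for name, fn in candidates.items() if passes_all(fn, tests)]
    failed = [name for name in candidates if name not in passed]
    return [
        {"prompt": instruction, "chosen": good, "rejected": bad}
        for good, bad in product(passed, failed)
    ]


tests = [((1, 2), 3), ((0, 0), 0)]
candidates = {
    "good": lambda a, b: a + b,
    "bad": lambda a, b: a - b,
    "crash": lambda a, b: a / 0,
}
pairs = build_preference_pairs("Add two integers.", candidates, tests)
```

The resulting `prompt` / `chosen` / `rejected` records are in the shape that pairwise preference-learning objectives typically consume; which algorithm is used in stage (3) is not specified in this abstract.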
# 運動整合性モデル: 遠方運動提示蒸留による映像拡散の加速 Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation ( http://arxiv.org/abs/2406.06890v1 ) ライセンス: Link先を確認	Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang,	(参考訳) 画像拡散蒸留は, 非常に少ないサンプリングステップで高忠実度生成を実現する。しかし、これらの手法をビデオ拡散に直接適用すると、公開ビデオデータセットの視覚的品質が制限されるため、フレーム品質が不満足になることが多い。これは教師と生徒のビデオ拡散モデルの両方のパフォーマンスに影響を与える。本研究の目的は,高画質の画像データを用いて,フレームの外観を改善しながらビデオ拡散蒸留を改善することである。動きと外観学習を両立させる一段ビデオ拡散蒸留法である動き整合モデル(MCM)を提案する。具体的には、ビデオ教師モデルから動きを蒸留するビデオ一貫性モデルと、高品質な画像データに合うようにフレームの外観を向上する画像識別装置とを含む。この組み合わせは,(1)低品質の映像フレームからビデオ蒸留が学習する際のフレーム学習目標の相違,(2)トレーニングや推論で使用されるビデオサンプルの品質の違いによるトレーニングと推論の相違,の2つの課題を提示する。これらの課題に対処するために, 遠絡型運動蒸留と混合軌跡蒸留を導入する。前者は運動表現のみに蒸留目標を適用し、後者は低品質ビデオドメインと高画質ビデオドメインの両方から蒸留軌跡を混合することによりトレーニング推論の相違を緩和する。大規模な実験により,MCMは最先端のビデオ拡散蒸留性能を達成できた。さらに,本手法は映像拡散モデルのフレーム品質を向上させることができ,高い美的スコアや特定のスタイルのフレームを対応するビデオデータなしで生成することができる。 Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation while improving frame appearance using abundant high-quality image data. We propose motion consistency model (MCM), a single-stage video diffusion distillation method that disentangles motion and appearance learning. Specifically, MCM includes a video consistency model that distills motion from the video teacher model, and an image discriminator that enhances frame appearance to match high-quality image data. 
This combination presents two challenges: (1) conflicting frame learning objectives, as video distillation learns from low-quality video frames while the image discriminator targets high-quality images; and (2) training-inference discrepancies due to the differing quality of video samples used during training and inference. To address these challenges, we introduce disentangled motion distillation and mixed trajectory distillation. The former applies the distillation objective solely to the motion representation, while the latter mitigates training-inference discrepancies by mixing distillation trajectories from both the low- and high-quality video domains. Extensive experiments show that our MCM achieves the state-of-the-art video diffusion distillation performance. Additionally, our method can enhance frame quality in video diffusion models, producing frames with high aesthetic scores or specific styles without corresponding video data.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# 特徴をトークン化し、表を強化する:表形式分類のためのFT-TABPFNモデル Tokenize features, enhancing tables: the FT-TABPFN model for tabular classification ( http://arxiv.org/abs/2406.06891v1 ) ライセンス: Link先を確認	Quangao Liu, Wei Yang, Chen Liang, Longlong Pang, Zhuozhang Zou,	(参考訳) 従来の表型分類法は、通常、スクラッチからの教師付き学習に依存しており、モデルパラメータを決定するために広範囲なトレーニングデータを必要とする。しかし、このパラダイムは、Prior-Data Fitted Networks (TabPFN)と呼ばれる新しいアプローチによって変更されている。 TabPFNは、大規模な合成データセットに基づいて訓練された12層トランスフォーマーを使用して、普遍的な表形式表現を学習する。この方法は、1つのフォワードパスで新しいタスクの高速かつ正確な予測を可能にし、追加のトレーニングは不要である。 TabPFNは小さなデータセットで成功したが、一般的には分類的特徴を扱う際のパフォーマンスが低下している。この制限を克服するため,分類的特徴をより適切に扱うための新しいFeature Tokenization層を備えた,TabPFNの強化版であるFT-TabPFNを提案する。ダウンストリームタスク用に微調整することで、FT-TabPFNはオリジナルのモデルの機能を拡大するだけでなく、表の分類における適用性と精度を大幅に改善する。私たちの完全なソースコードは、コミュニティの利用と開発に利用可能です。 Traditional methods for tabular classification usually rely on supervised learning from scratch, which requires extensive training data to determine model parameters. However, a novel approach called Prior-Data Fitted Networks (TabPFN) has changed this paradigm. TabPFN uses a 12-layer transformer trained on large synthetic datasets to learn universal tabular representations. This method enables fast and accurate predictions on new tasks with a single forward pass and no need for additional training. Although TabPFN has been successful on small datasets, it generally shows weaker performance when dealing with categorical features. To overcome this limitation, we propose FT-TabPFN, which is an enhanced version of TabPFN that includes a novel Feature Tokenization layer to better handle classification features. By fine-tuning it for downstream tasks, FT-TabPFN not only expands the functionality of the original model but also significantly improves its applicability and accuracy in tabular classification. Our full source code is available for community use and development.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# 完全接続ネットはできないが、トランスフォーマーはおそらくスパーストークン選択を学習する Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot ( http://arxiv.org/abs/2406.06893v1 ) ライセンス: Link先を確認	Zixuan Wang, Stanley Wei, Daniel Hsu, Jason D. Lee,	(参考訳) トランスフォーマーアーキテクチャは、構造情報の選択と構成に特有な能力があるため、様々なディープラーニング環境で普及している。これらの能力に触発され、サンフォードらはスパーストークン選択タスクを提案し、トランスフォーマーは完全接続ネットワーク(FCN)が最悪の場合フェールする。その上で, 平均ケース設定に対するFCNの低境界を強化し, FCN上での変圧器のアルゴリズム的分離を確立する。具体的には、勾配降下で訓練された一層変圧器は、スパーストークン選択タスクを確実に学習し、驚くべきことに、分配長の強い一般化を示す。理論的知見を正当化するための実験シミュレーションを提供する。 The transformer architecture has prevailed in various deep learning settings due to its exceptional capabilities to select and compose structural information. Motivated by these capabilities, Sanford et al. proposed the sparse token selection task, in which transformers excel while fully-connected networks (FCNs) fail in the worst case. Building upon that, we strengthen the FCN lower bound to an average-case setting and establish an algorithmic separation of transformers over FCNs. Specifically, a one-layer transformer trained with gradient descent provably learns the sparse token selection task and, surprisingly, exhibits strong out-of-distribution length generalization. We provide empirical simulations to justify our theoretical findings.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# 単調変分不等式による非線形時系列埋め込み Nonlinear time-series embedding by monotone variational inequality ( http://arxiv.org/abs/2406.06894v1 ) ライセンス: Link先を確認	Jonathan Y. Zhou, Yao Xie,	(参考訳) 野生では、心電図、モーションキャプチャー、ゲノム、自然言語などのシーケンシャルなデータに遭遇することが多い。非線形時系列の低次元表現を教師なしで学習する新しい手法を導入し,再現可能な回復保証を実現する。学習された表現は、クラスタリングや分類といった下流の機械学習タスクに使用することができる。この方法は、観測されたシーケンスが共通の領域から生じるという仮定に基づいているが、各シーケンスはローランク正則化を通じて互いに関連付けられた自己回帰モデルに従う。本研究では,単調な変分不等式を用いた計算効率の良い凸行列パラメータ回復問題として,学習領域全体の幾何学を学習できる低ランク制約による共通領域仮定を符号化した。本稿では,実世界の時系列データに対する本手法の競合性能をベースラインで示すとともに,シンボリックテキストモデリングとRNAシークエンスクラスタリングの有効性を示す。 In the wild, we often encounter collections of sequential data such as electrocardiograms, motion capture, genomes, and natural language, and sequences may be multichannel or symbolic with nonlinear dynamics. We introduce a new method to learn low-dimensional representations of nonlinear time series without supervision and can have provable recovery guarantees. The learned representation can be used for downstream machine-learning tasks such as clustering and classification. The method is based on the assumption that the observed sequences arise from a common domain, but each sequence obeys its own autoregressive models that are related to each other through low-rank regularization. We cast the problem as a computationally efficient convex matrix parameter recovery problem using monotone Variational Inequality and encode the common domain assumption via low-rank constraint across the learned representations, which can learn the geometry for the entire domain as well as faithful representations for the dynamics of each individual sequence using the domain information in totality. We show the competitive performance of our method on real-world time-series data with the baselines and demonstrate its effectiveness for symbolic text modeling and RNA sequence clustering.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# CodeScore-R:コード合成の機能的正当性を評価するための自動ロバストネスメトリック CodeScore-R: An Automated Robustness Metric for Assessing the Functional Correctness of Code Synthesis ( http://arxiv.org/abs/2406.06902v1 ) ライセンス: Link先を確認	Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang,	(参考訳) コード合成の分野では、評価指標が不可欠です。一般的に使用されるコード評価メトリクスは、マッチベース、セマンティックベース、実行ベースという3つのタイプに分類される。中でも、実行ベースのPass@kメトリックは、テストケースを実行することで、予測されたコードの機能を正確に評価する。しかし、このメトリクスを計算するにはかなりのオーバーヘッドが必要であり、テストケースを必要とせずに予測されたコードの機能を評価する自動評価指標の設計が必要である。さらに、優れた評価指標は堅牢であるべきであり、予測されたコードに小さな変更が加わっても精度を維持できなければならない。これらの課題に対処するために、コード合成の機能を評価するために、UniXcoderとContrastive Learningをベースにした、CodeScore-Rと呼ばれる自動化された堅牢なメトリクスを提案する。 CodeScore-Rは、スケッチベースの処理、構文等価変換、突然変異テストなどの技術を用いて、識別子、構文構造、演算子による評価結果への干渉を効果的に軽減する。実験結果によると、JavaとPythonのコード生成とマイグレーションのタスクでは、CodeScore-Rは、他の評価指標よりも優れており、Pass@kメトリックとより密に一致しており、より強い堅牢性を示している。 Evaluation metrics are crucial in the field of code synthesis. Commonly used code evaluation metrics can be classified into three types: match-based, semantic-based, and execution-based. Among them, the execution-based Pass@k metric accurately assesses the functionality of predicted code by executing test cases. However, calculating this metric requires a significant amount of overhead, necessitating the design of an automated evaluation metric that can assess the functionality of predicted code without the need for test cases. Additionally, a good evaluation metric should be robust; that is, the metric can maintain its accuracy even when the predicted code undergoes minor changes. To address these challenges, we propose an automated robust metric, called CodeScore-R, based on UniXcoder and contrastive learning, for evaluating the functionality of code synthesis. CodeScore-R employs techniques such as sketch-based processing, syntactic-equivalent transformations, and mutation testing to effectively mitigate the interference caused by identifiers, syntax structures, and operators on evaluation results. 
Experimental results demonstrate that in the tasks of code generation and migration in Java and Python, CodeScore-R outperforms other evaluation metrics and is more closely aligned with the Pass@k metric, while exhibiting stronger robustness.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
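The abstract refers to the execution-based Pass@k metric without defining it. A commonly used unbiased estimator, for n sampled programs of which c pass all test cases, is 1 - C(n-c, k) / C(n, k). The sketch below implements that formula; whether the CodeScore-R evaluation uses exactly this estimator is an assumption, and `pass_at_k` is an illustrative name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: number of programs sampled for a problem
    c: number of those programs that pass every test case
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Each of the n programs has to be executed against the tests to obtain c, which is the overhead the abstract cites as motivation for a test-free metric.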
# 特徴選択のためのカーネル依存度最大化の限界について On the Limitation of Kernel Dependence Maximization for Feature Selection ( http://arxiv.org/abs/2406.06903v1 ) ライセンス: Link先を確認	Keli Liu, Feng Ruan,	(参考訳) 特徴選択のための単純で直感的な方法は、応答と特徴の間の依存性の非パラメトリック尺度を最大化する特徴サブセットを選択することである。文献からの一般的な提案は、非パラメトリック依存尺度としてヒルベルト・シュミット独立基準(HSIC)を使用している。機能選択に対するこのアプローチの背景にある理論的根拠は、重要な機能が応答に高い依存を示し、選択した機能のセットに含めることでHSICが増加することである。反例を通して、この根拠に欠陥があり、HSICの最大化による特徴選択が重要な特徴を見逃すことを実証する。 A simple and intuitive method for feature selection consists of choosing the feature subset that maximizes a nonparametric measure of dependence between the response and the features. A popular proposal from the literature uses the Hilbert-Schmidt Independence Criterion (HSIC) as the nonparametric dependence measure. The rationale behind this approach to feature selection is that important features will exhibit a high dependence with the response and their inclusion in the set of selected features will increase the HSIC. Through counterexamples, we demonstrate that this rationale is flawed and that feature selection via HSIC maximization can miss critical features.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
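For readers unfamiliar with the dependence measure named in this abstract, a biased empirical HSIC estimate can be written as tr(KHLH) / (n-1)^2, where K and L are kernel Gram matrices of the feature and the response and H is the centering matrix. The sketch below is a plain-Python illustration for scalar variables; the Gaussian kernel and the bandwidth `sigma` are assumptions chosen for the example, and it does not reproduce the paper's counterexamples.

```python
import math
from typing import List, Sequence


def gaussian_gram(xs: Sequence[float], sigma: float = 1.0) -> List[List[float]]:
    """Gram matrix of a Gaussian kernel on scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]


def center(K: List[List[float]]) -> List[List[float]]:
    """Return H K H with H = I - (1/n) * ones, i.e. double centering."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs: Sequence[float], ys: Sequence[float], sigma: float = 1.0) -> float:
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    # tr(K H L H) = tr((H K H) L) by cyclicity of the trace.
    trace = sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


xs = [0.1 * i for i in range(20)]
constant = [1.0] * 20
```

A feature-selection procedure of the kind the abstract critiques would score candidate feature subsets with such an estimate and keep the subset with the largest value.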
# SignMusketeers: 大規模手話翻訳のための効率的なマルチストリームアプローチ SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale ( http://arxiv.org/abs/2406.06907v1 ) ライセンス: Link先を確認	Shester Gueuwou, Xiaodan Du, Greg Shakhnarovich, Karen Livescu,	(参考訳) 手話から書記言語への翻訳作業を含む手話ビデオ処理における永続的な課題は、手話の表現を効果的かつ効率的な方法で学習する方法である。提案手法は,手話言語の性質と言語学に基づき,手話ビデオの最も関連性の高い部分,すなわち手話話者の顔,手,身体の姿勢のみに焦点を当てる。しかし,手と顔に対する性能が一貫しない既製のポーズ追跡モデルからポーズ推定座標を用いる代わりに,手話の複雑な手の形と豊かな表情を自己教師付きで学習することを提案する。我々のアプローチは、個々のフレームから(ビデオシーケンスではなく)学習することに基づいており、手話事前学習に関する先行研究よりもずっと効率的である。 How2Signデータセット上の手話翻訳の最先端性を確立した最近のモデルと比較して,本手法は計算量の3%未満を用いて同等の翻訳性能が得られる。 A persistent challenge in sign language video processing, including the task of sign language to written language translation, is how we learn representations of sign language in an effective and efficient way that can preserve the important attributes of these languages, while remaining invariant to irrelevant visual differences. Informed by the nature and linguistics of signed languages, our proposed method focuses on just the most relevant parts in a signing video: the face, hands and body posture of the signer. However, instead of using pose estimation coordinates from off-the-shelf pose tracking models, which have inconsistent performance for hands and faces, we propose to learn the complex handshapes and rich facial expressions of sign languages in a self-supervised fashion. Our approach is based on learning from individual frames (rather than video sequences) and is therefore much more efficient than prior work on sign language pre-training. Compared to a recent model that established a new state of the art in sign language translation on the How2Sign dataset, our approach yields similar translation performance, using less than 3% of the compute.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# UVIS: 教師なしのビデオインスタンスセグメンテーション UVIS: Unsupervised Video Instance Segmentation ( http://arxiv.org/abs/2406.06908v1 ) ライセンス: Link先を確認	Shuaiyi Huang, Saksham Suri, Kamal Gupta, Sai Saketh Rambhatla, Ser-nam Lim, Abhinav Shrivastava,	(参考訳) ビデオインスタンスのセグメンテーションには、ビデオフレームをまたいだすべてのオブジェクトの分類、セグメンテーション、追跡が必要である。マスクやボックス,あるいはカテゴリラベルに依存する既存のアプローチとは違って,ビデオアノテーションや濃密なラベルベースの事前トレーニングを使わずにビデオインスタンスセグメンテーションを実行できる,新しいビデオインスタンスセグメンテーション(Unsupervised Video Instance Segmentation, UVIS)フレームワークであるUVISを提案する。我々の重要な洞察は、自己監督型視覚基礎モデルDINOの前の密な形状と、画像キャプチャ型視覚言語モデルCLIPのオープンセット認識能力を活用することにある。 UVISフレームワークは,フレームレベルの擬似ラベル生成,トランスフォーマーベースのVISモデルトレーニング,クエリベースのトラッキングという3つの重要なステップで構成されている。教師なしセットアップにおけるVIS予測の品質向上のために,デュアルメモリ設計を導入する。この設計は、正確な擬似ラベルを生成するセマンティックメモリバンクと、オブジェクトトラックの時間的一貫性を維持するトラッキングメモリバンクとを含む。提案手法を,YoutubeVIS-2019,YoutubeVIS-2021,Occluded VISの3つの標準VISベンチマークで評価した。 UVISはYouTubeVIS-2019で21.1 APを達成したが、ビデオアノテーションや密集事前学習は行わず、教師なしVISフレームワークの可能性を示している。 Video instance segmentation requires classifying, segmenting, and tracking every object across video frames. Unlike existing approaches that rely on masks, boxes, or category labels, we propose UVIS, a novel Unsupervised Video Instance Segmentation (UVIS) framework that can perform video instance segmentation without any video annotations or dense label-based pretraining. Our key insight comes from leveraging the dense shape prior from the self-supervised vision foundation model DINO and the openset recognition ability from the image-caption supervised vision-language model CLIP. Our UVIS framework consists of three essential steps: frame-level pseudo-label generation, transformer-based VIS model training, and query-based tracking. To improve the quality of VIS predictions in the unsupervised setup, we introduce a dual-memory design. This design includes a semantic memory bank for generating accurate pseudo-labels and a tracking memory bank for maintaining temporal consistency in object tracks. 
We evaluate our approach on three standard VIS benchmarks, namely YoutubeVIS-2019, YoutubeVIS-2021, and Occluded VIS. Our UVIS achieves 21.1 AP on YoutubeVIS-2019 without any video annotations or dense pretraining, demonstrating the potential of our unsupervised VIS framework.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# 高次元極限における非線形コントラスト学習モデルの学習ダイナミクス Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit ( http://arxiv.org/abs/2406.06909v1 ) ライセンス: Link先を確認	Lineghuan Meng, Chuang Wang,	(参考訳) 本論文は, 単層非線形コントラスト学習モデルにおける学習力学の高次元的解析について述べる。モデル重みの実験的分布は、マッキーン・ブラソフ非線形偏微分方程式(PDE)によって支配される決定論的尺度に収束する。 L2正規化の下では、このPDEは訓練過程におけるモデル性能の進化を反映して、低次元常微分方程式(ODE)の閉集合に還元される。 ODEの固定点位置とその安定性を解析し,いくつかの興味深い結果を示した。まず、隠された変数の2番目のモーメントだけが、非形式的初期化を伴う状態における機能の学習性に影響を与える。第二に、高次モーメントは局所安定性に影響を与えるのではなく、アトラクション領域を制御することによって特徴選択の確率に影響を与える。最後に、データ拡張で付加される独立ノイズは性能を低下させるが、負に相関するノイズは、勾配推定のばらつきを低減し、性能が向上する。解析モデルの単純さにもかかわらず、これは訓練力学の豊富な現象を示し、実用的な大規模モデルの背後にあるより複雑なメカニズムを理解する方法を確立している。 This letter presents a high-dimensional analysis of the training dynamics for a single-layer nonlinear contrastive learning model. The empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE). Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs), reflecting the evolution of the model performance during the training process. We analyze the locations and stability of the fixed points of the ODEs, unveiling several interesting findings. First, only the hidden variable's second moment affects feature learnability at the state with uninformative initialization. Second, higher moments influence the probability of feature selection by controlling the attraction region, rather than affecting local stability. Finally, independent noises added in the data augmentation degrade performance, but negatively correlated noise can reduce the variance of gradient estimation, yielding better performance. Despite the simplicity of the analyzed model, it exhibits rich phenomena of training dynamics, paving a way to understand more complex mechanisms behind practical large models.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# Agent-SiMT:大規模言語モデルを用いたエージェント支援同時機械翻訳 Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models ( http://arxiv.org/abs/2406.06910v1 ) ライセンス: Link先を確認	Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng,	(参考訳) 同時機械翻訳(SiMT)は、原文を読みながらターゲット翻訳を生成する。これは、文章を読み、翻訳を生成するのに最適なタイミングを決定するためのポリシーに依存している。既存の SiMT メソッドは一般的に、ポリシーを同時に決定し、翻訳を生成する、従来の Transformer アーキテクチャを採用している。ポリシーの決定には優れていますが、翻訳性能は最適以下です。逆に、広範囲のコーパスで訓練されたLarge Language Models (LLMs) は、優れた生成能力を有するが、SiMTの訓練方法による翻訳ポリシーの取得は困難である。そこで本研究では,従来のSiMT手法とLLMの強度を組み合わせたフレームワークであるAgent-SiMTを紹介する。エージェント−SiMTは、ポリシー決定剤及び翻訳剤を含む。ポリシー決定エージェントは、部分的ソース文と翻訳を用いて翻訳ポリシーを決定するSiMTモデルにより管理される。 LLMを利用した翻訳エージェントは、部分的ソース文に基づいて翻訳を生成する。 2人のエージェントは、SiMTを達成するために協力します。実験により、Agent-SiMTは最先端の性能を発揮することが示された。 Simultaneous Machine Translation (SiMT) generates target translations while reading the source sentence. It relies on a policy to determine the optimal timing for reading sentences and generating translations. Existing SiMT methods generally adopt the traditional Transformer architecture, which concurrently determines the policy and generates translations. While they excel at determining policies, their translation performance is suboptimal. Conversely, Large Language Models (LLMs), trained on extensive corpora, possess superior generation capabilities, but it is difficult for them to acquire translation policy through the training methods of SiMT. Therefore, we introduce Agent-SiMT, a framework combining the strengths of LLMs and traditional SiMT methods. Agent-SiMT contains the policy-decision agent and the translation agent. The policy-decision agent is managed by a SiMT model, which determines the translation policy using partial source sentence and translation. The translation agent, leveraging an LLM, generates translation based on the partial source sentence. The two agents collaborate to accomplish SiMT. Experiments demonstrate that Agent-SiMT attains state-of-the-art performance.	
翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# AsyncDiff: Asynchronous Denoisingによる拡散モデルの並列化 AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising ( http://arxiv.org/abs/2406.06911v1 ) ライセンス: Link先を確認	Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang,	(参考訳) 拡散モデルは、様々なアプリケーションにまたがる優れた生成能力に対して、コミュニティから大きな関心を集めてきた。しかし、その典型的な多重ステップのシーケンシャルデノジング特性は、高い累積遅延を生じさせ、それによって並列計算の可能性が排除される。そこで本研究では,複数のデバイスにまたがるモデル並列化を実現する,汎用的でプラグアンドプレイなアクセラレーション方式であるAsyncDiffを紹介する。提案手法では、ノイズ予測モデルを複数のコンポーネントに分割し、それぞれが異なるデバイスに割り当てる。これらのコンポーネント間の依存関係連鎖を断ち切るために、連続拡散ステップにおいて隠蔽状態間の高い類似性を利用して、従来のシーケンシャルなdenoisingを非同期プロセスに変換する。その結果、各コンポーネントは別々のデバイス上で並列に計算される。提案手法は、生成品質に最小限の影響を与えながら、推論遅延を著しく低減する。具体的には、Stable Diffusion v2.1 では、AsyncDiff は 4 台の NVIDIA A5000 GPU 上で、無視できる劣化で 2.7 倍のスピードアップを、CLIP Score がわずか 0.38 低下するだけで 4.0 倍のスピードアップを達成する。我々の実験は、AsyncDiffがビデオ拡散モデルにも容易に適用でき、有望な性能が得られることを示した。コードはhttps://github.com/czg1225/AsyncDiffで公開されている。 Diffusion models have garnered significant interest from the community for their great generative ability across various applications. However, their typical multi-step sequential-denoising nature gives rise to high cumulative latency, thereby precluding the possibilities of parallel computation. To address this, we introduce AsyncDiff, a universal and plug-and-play acceleration scheme that enables model parallelism across multiple devices. Our approach divides the cumbersome noise prediction model into multiple components, assigning each to a different device. To break the dependency chain between these components, it transforms the conventional sequential denoising into an asynchronous process by exploiting the high similarity between hidden states in consecutive diffusion steps. Consequently, each component is facilitated to compute in parallel on separate devices. The proposed strategy significantly reduces inference latency while minimally impacting the generative quality. 
Specifically, for the Stable Diffusion v2.1, AsyncDiff achieves a 2.7x speedup with negligible degradation and a 4.0x speedup with only a slight reduction of 0.38 in CLIP Score, on four NVIDIA A5000 GPUs. Our experiments also demonstrate that AsyncDiff can be readily applied to video diffusion models with encouraging performances. The code is available at https://github.com/czg1225/AsyncDiff.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# 中断を伴う安全なマルチパーティ計算における通信複雑度について On the Communication Complexity of Secure Multi-Party Computation With Aborts ( http://arxiv.org/abs/2406.06914v1 ) ライセンス: Link先を確認	James Bartusek, Thiago Bergamaschi, Seri Khoury, Saachi Mutreja, Orr Paradise,	(参考訳) 暗号の主な目的は、セキュアなマルチパーティ計算(MPC)である。残念なことに、MPCがすべての当事者にアウトプット配信を保証することは、大多数の当事者が悪意を持っていれば実現不可能であることが知られている。実際、ポイント・ツー・ポイント・ネットワーク(すなわち、ブロードキャスト・チャンネルにアクセスできない)で運営している当事者は、当事者の3分の1を超える数が悪意を持つ場合(Lamport, Shostak, Pease, JACM 1980)、アウトプットに関する合意さえ得られない。ポイント・ツー・ポイント・モデルのこの実現不可能性に触発され、ゴールドワッサーとリンデル(J. Cryptol 2005)は、合意を必要としないMPCの定義を導入し、これは選択的中断を伴うMPCと呼ばれる。この定義の下では、悪意のある振る舞いを検知した場合、任意のパーティがプロトコルを中断する可能性がある。彼らは,選択的中断を伴うMPCが,中断を伴う放送機能を実装することで,任意の数の悪意ある当事者に対して実現可能であることを示した。中断を伴うMPCのモデルは、長年にわたって多くの注目を集めてきたが、ポイント・ツー・ポイント・ネットワーク上の通信の複雑さについてはほとんど知られていない。本研究では, 中断を伴うMPC の通信複雑性について検討し, このモデルでほぼ最適な通信効率のプロトコルを考案する。すなわち、正直な当事者数$h$、通信の複雑さ、プロトコルの局所性の間のトレードオフを証明します。ここでは、局所性は各当事者が通信しなければならないピアの数に対する上界である。 A central goal of cryptography is Secure Multi-party Computation (MPC), where $n$ parties desire to compute a function of their joint inputs without letting any party learn about the inputs of its peers. Unfortunately, it is well-known that MPC guaranteeing output delivery to every party is infeasible when a majority of the parties are malicious. In fact, parties operating over a point-to-point network (i.e. without access to a broadcast channel) cannot even reach an agreement on the output when more than one third of the parties are malicious (Lamport, Shostak, and Pease, JACM 1980). Motivated by this infeasibility in the point-to-point model, Goldwasser and Lindell (J. Cryptol 2005) introduced a definition of MPC that does not require agreement, referred to as MPC with selective abort. Under this definition, any party may abort the protocol if they detect malicious behavior. They showed that MPC with selective abort is feasible for any number of malicious parties by implementing a broadcast functionality with abort. 
While the model of MPC with abort has attracted much attention over the years, little is known about its communication complexity over point-to-point networks. In this work, we study the communication complexity of MPC with abort and devise nearly-optimal communication efficient protocols in this model. Namely, we prove trade-offs between the number of honest parties $h$, the communication complexity, and the locality of the protocols. Here, locality is a bound on the number of peers with which each party must communicate.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# ワークフォースマネジメントにおけるスケーラビリティ - スケーラビリティ原則を適用して4日間の作業週間を創出する Scalability in Workforce Management: Applying Scalability Principles to Foster a Four-Day Work Week ( http://arxiv.org/abs/2406.06915v1 ) ライセンス: Link先を確認	Sunkanmi Oluwadare, Ebubechukwu Edokwe, Olatunde Ayeomoni,	(参考訳) 従来の5日間のワークウィークでは課題が増加し、4日間のワークウィークのような代替モデルの探索が進められている。本研究は,4日間の労働週間の労働管理を再定義する上で,クラウドコンピューティングとITから派生したスケーラビリティ原則の変革的可能性について検討する。この研究は、グレー文学と体系的なレビューアプローチを組み合わせた多言語リテラシー研究手法を用いている。関連する作業の包括的なレビューを通じて,4日間のワークウィークへの移行の課題とメリットについて検討する。パイロットプログラム、明確なコミュニケーション、アジリティが重要な成功要因として認識されます。労働管理におけるスケーラビリティの原則の合成は、4日間の作業週間へのスムーズな移行のための強力なフレームワークとして役立ちます。適応性、動的リソース割り当て、データ駆動の洞察を優先順位付けすることで、企業は圧縮された作業スケジュールの完全な可能性を解き放ちます。この研究は、近代的な労働構造の進化する景観を育み、従業員の幸福を優先しようとする組織に貴重な洞察を与えている。 The traditional five-day workweek faces mounting challenges, prompting exploration of alternative models like the four-day workweek. This research explores the transformative potential of scalability principles derived from cloud computing and IT in redefining workforce management for a four-day workweek. The study employs a Multivocal Literacy Research methodology, combining grey literature and systematic review approaches. Through a comprehensive review of related work, the challenges, and benefits of transitioning to a four-day workweek are explored. Pilot programs, clear communication, and agility are identified as critical success factors. The synthesis of scalability principles in workforce management serves as a powerful framework for a smooth transition towards a four-day workweek. By prioritizing adaptability, dynamic resource allocation, and data-driven insights, organizations can unlock the full potential of a compressed work schedule. This research contributes valuable insights for organizations seeking to thrive in the evolving landscape of modern work structures and prioritizing employee well-being.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# モナディック正則性:完備性と双対性 Monadic ortholattices: completions and duality ( http://arxiv.org/abs/2406.06917v1 ) ライセンス: Link先を確認	John Harding, Joseph McDonald, Miguel Peinado,	(参考訳) モナディック正則多様体の多様体は、マクニールと正準完備化の下で閉じていることを示す。いずれの場合も、$L$の完備化は、モナディック直交フレームである関連する双対空間$X$を形成することによって得られる。これは直交関係と、ある条件を満たす追加の二項関係を持つ集合である。 MacNeille 補完の場合、$X$ は$L$ のゼロでない要素から作られ、標準補完の場合、$X$ は$L$ の適切なフィルタから形成される。対応する$L$の完備化は、$X$の双直交閉部分集合の直交集合として得られる。ゴールドブラット (Goldblatt) とビンビオ (Bimbó) が行ったように、直交フレームに適切な位相を導入することにより、モナディック直交の圏とモナディック直交空間の間の双対の随伴が得られる。この双対随伴の制限は双対同値を与える。 We show that the variety of monadic ortholattices is closed under MacNeille and canonical completions. In each case, the completion of $L$ is obtained by forming an associated dual space $X$ that is a monadic orthoframe. This is a set with an orthogonality relation and an additional binary relation satisfying certain conditions. For the MacNeille completion, $X$ is formed from the non-zero elements of $L$, and for the canonical completion, $X$ is formed from the proper filters of $L$. The corresponding completion of $L$ is then obtained as the ortholattice of bi-orthogonally closed subsets of $X$ with an additional operation defined through the binary relation of $X$. With the introduction of a suitable topology on an orthoframe, as was done by Goldblatt and Bimbó, we obtain a dual adjunction between the categories of monadic ortholattices and monadic orthospaces. A restriction of this dual adjunction provides a dual equivalence.	翻訳日:2024-06-12 19:36:38 公開日:2024-06-11
# LLMに基づくコード生成のより現実的な評価に向けて--実験研究以降 Towards more realistic evaluation of LLM-based code generation: an experimental study and beyond ( http://arxiv.org/abs/2406.06918v1 ) ライセンス: Link先を確認	Dewu Zheng, Yanlin Wang, Ensheng Shi, Ruikai Zhang, Yuchi Ma, Hongyu Zhang, Zibin Zheng,	(参考訳) 複雑な実世界のソフトウェア開発シナリオにおいて、LLM(Large Language Models)のコード生成能力を評価するために、多くの評価手法が開発されている。通常、プロジェクトの最新バージョンからのコンテキストコードを活用して、所望の関数を正確に生成するLLMを促進する。しかし、このような評価手法は、時間とともにソフトウェアプロジェクトの動的進化を考慮しておらず(これを進化無視の状況と呼ぶ)、将来のコンテキストリークと有用なコンテキスト欠如の問題を招いている。その結果,LLMの性能評価が不正確になる。本稿では,LLMのコード生成性能を,ソフトウェア開発の進化する性質を反映した設定内で深く理解するための実証的研究を行う。そこで我々はまず,自動実行に基づく評価ツールを備えた,進化型リポジトリレベルのコード生成データセットであるHumanEvoを構築した。次に、HumanEvoを依存性レベルに応じて手動で分類し、依存関係レベルが異なる関数を生成する際のモデルの性能をより包括的に分析する。第3に,提案したベンチマークの有効性を検証するため,HumanEvoの7つの代表および多種多様なLCMを用いて広範な実験を行った。我々は実験を通して多くの重要な知見を得た。例えば、従来の進化を無視した評価手法は、LLMの性能を10.0%から61.1%の範囲で過大評価させることがわかった。この結果に基づいて,コード生成におけるLCMのより現実的な評価について,実用的な提案を行う。私たちはまた、将来の研究を促進するために、進化を意識したコード生成ツールボックスも作っています。ソースコード、データセット、付録を含むレプリケーションパッケージはhttps://github.com/DeepSoftwareAnalytics/EvoEvalで入手できる。 To evaluate the code generation capabilities of Large Language Models (LLMs) in complex real-world software development scenarios, many evaluation approaches have been developed. They typically leverage contextual code from the latest version of a project to facilitate LLMs in accurately generating the desired function. However, such evaluation approaches fail to consider the dynamic evolution of software projects over time, which we refer to as evolving-ignored situation, leading to issues of future context leakage and useful context missing. This in turn results in inaccurate evaluation of LLMs' performance. In this paper, we conduct an empirical study to deeply understand LLMs' code generation performance within settings that reflect the evolving nature of software development. 
To achieve this, we first construct an evolving-aware repository-level code generation dataset, namely HumanEvo, equipped with an automated execution-based evaluation tool. Second, we manually categorize HumanEvo according to dependency levels to more comprehensively analyze the model's performance in generating functions with different dependency levels. Third, we conduct extensive experiments on HumanEvo with seven representative and diverse LLMs to verify the effectiveness of the proposed benchmark. We obtain many important findings through our experimental study. For example, we find that previous evolving-ignored evaluation approaches lead to inflated performance of the LLMs, ranging from 10.0% to 61.1%. Based on the findings, we give actionable suggestions on more realistic evaluation of LLMs on code generation. We also build a shared evolving-aware code generation toolbox to facilitate future research. Replication package including source code, datasets and appendix is available at https://github.com/DeepSoftwareAnalytics/EvoEval.	翻訳日:2024-06-12 17:35:04 公開日:2024-06-11
# 非自己回帰型パーソナライズドバンドル生成 Non-autoregressive Personalized Bundle Generation ( http://arxiv.org/abs/2406.06925v1 ) ライセンス: Link先を確認	Wenchuan Yang, Cheng Yang, Jichao Li, Yuejin Tan, Xin Lu, Chuan Shi,	(参考訳) 多数の候補項目からユーザの好みのバンドルを作成することを目的としたパーソナライズされたバンドル生成問題は、推奨事項に注目が集まる。しかし、既存の研究はバンドルの順序不変性を無視し、逐次モデリング手法をソリューションとして採用している。そこで本研究では,非自己回帰機構を用いてバンドル生成を行い,BundleNATという新しいエンコーダ・デコーダ・フレームワークを設計する。具体的には、逐次依存を学習する代わりに、ユーザの嗜好とアイテムベースの互換性情報を埋め込むために事前学習技術とグラフニューラルネットワークを採用し、自己注意に基づくエンコーダを用いてグローバル依存パターンを抽出することを提案する。次に、所望のバンドルを直接ワンショットで出力できる置換同変復号アーキテクチャを設計する。 YoushuとNeteaseの3つの実世界のデータセットの実験では、提案された BundleNAT は、それぞれ精度、精度+、リコールの絶対的な改善を35.92%、平均で10.97%、23.67%で上回っている。 The personalized bundle generation problem, which aims to create a preferred bundle for user from numerous candidate items, receives increasing attention in recommendation. However, existing works ignore the order-invariant nature of the bundle and adopt sequential modeling methods as the solution, which might introduce inductive bias and cause a large latency in prediction. To address this problem, we propose to perform the bundle generation via non-autoregressive mechanism and design a novel encoder-decoder framework named BundleNAT, which can effectively output the targeted bundle in one-shot without relying on any inherent order. In detail, instead of learning sequential dependency, we propose to adopt pre-training techniques and graph neural network to fully embed user-based preference and item-based compatibility information, and use a self-attention based encoder to further extract global dependency pattern. We then design a permutation-equivariant decoding architecture that is able to directly output the desired bundle in a one-shot manner. Experiments on three real-world datasets from Youshu and Netease show the proposed BundleNAT significantly outperforms the current state-of-the-art methods in average by up to 35.92%, 10.97% and 23.67% absolute improvements in Precision, Precision+, and Recall, respectively.	
翻訳日:2024-06-12 17:35:04 公開日:2024-06-11
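The abstract above stresses that a bundle is order-invariant. As a small illustration of what that means for evaluation, the sketch below computes plain set-based precision and recall between a generated bundle and a target bundle. The function name and the exact metric definitions are assumptions made for illustration; the paper's own Precision, Precision+, and Recall may be defined differently.

```python
def bundle_precision_recall(predicted, target):
    """Set-based precision and recall of a predicted bundle against a target bundle.

    Hypothetical helper for illustration. Both bundles are treated as sets,
    so the order in which items are generated does not affect the score.
    """
    pred, gold = set(predicted), set(target)
    hits = len(pred & gold)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    target = ["b", "c", "d", "e"]
    print(bundle_precision_recall(["a", "b", "c"], target))
    # The same items in a different order give the same score.
    print(bundle_precision_recall(["c", "a", "b"], target))
```

Because the score ignores order, a decoder that emits the whole bundle in one shot is not penalized relative to one that emits items sequentially.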
# 知覚コンポーネントを用いた表現学習の解説 Explaining Representation Learning with Perceptual Components ( http://arxiv.org/abs/2406.06930v1 ) ライセンス: Link先を確認	Yavuz Yarici, Kiran Kokilepersaud, Mohit Prabhushankar, Ghassan AlRegib,	(参考訳) 自己教師付きモデルは明確な意味を持たない表現空間を作成する。この表現の解釈可能性問題は、従来の説明可能性法をこの文脈では非効率にする。本稿では,色,形状,テクスチャという3つの重要な知覚成分を用いて表現空間を解析する新しい手法を提案する。我々はこれらの成分の選択的マスキングを用いて表現の変化を観察し、それぞれの重要なマップを区別する。ラベルが存在しないシナリオでは、これらの重要地図は人間の視覚システムに不可欠なので、より直感的な説明を提供する。我々のアプローチは表現空間の解釈可能性を高め、人間の視覚的知覚に共鳴する説明を提供する。我々は,異なる学習対象が知覚的成分を用いて異なる表現空間をいかに作るかを分析する。さらに,様々な画像領域にまたがる画像の表現について検討し,異なる文脈におけるこれらの構成要素の役割についての洞察を提供する。 Self-supervised models create representation spaces that lack clear semantic meaning. This interpretability problem of representations makes traditional explainability methods ineffective in this context. In this paper, we introduce a novel method to analyze representation spaces using three key perceptual components: color, shape, and texture. We employ selective masking of these components to observe changes in representations, resulting in distinct importance maps for each. In scenarios, where labels are absent, these importance maps provide more intuitive explanations as they are integral to the human visual system. Our approach enhances the interpretability of the representation space, offering explanations that resonate with human visual perception. We analyze how different training objectives create distinct representation spaces using perceptual components. Additionally, we examine the representation of images across diverse image domains, providing insights into the role of these components in different contexts.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# ソーシャルネットワークの分散化と言論の自由のオンライン化 Decentralized Social Networks and the Future of Free Speech Online ( http://arxiv.org/abs/2406.06934v1 ) ライセンス: Link先を確認	Tao Huang,	(参考訳) MastodonやBlueSkyのような分散ソーシャルネットワークは近年注目を集め、議論の的となっている。中央ノードからエンドユーザへ権限を委譲することで、分散化されたソーシャルネットワークは、集中型プラットフォーム上の既存の病理を修復することを目的としており、インターネットの未来として多くの人々が見てきた。本稿では,分散化プロジェクトのオンラインコミュニケーションへの展望を批判的かつ体系的に評価する。フリースピーチの規範的理論を用いて、分散化設計がユーザの表現の自由をオンラインで促進するかどうかを検証している。この分析は、この領域における価値に基づく設計の重要性を強調しながら、約束と落とし穴の両方が存在することを示している。分散化されたネットワークの設計において最も顕著な2つの問題は、分散化の理想と、ネットワーク上の中央集権化の絶え間ないニーズのバランスをとる方法と、ユーザが真のコントロールを行使できるようにする方法である。この記事では、共有ブロックリストやオプトイン検索関数など、いくつかの設計例を使用して、設計選択の根底にある価値の考慮事項を説明する。新たなネットワークの設計を容易にするために、法と政策の介入に関する暫定的な提案が提案されている。明確な回答を提供するのではなく、この記事では、設計選択の価値含意をマッピングし、利害関係を強調し、将来の研究の方向性を示す。 Decentralized social networks like Mastodon and BlueSky are trending topics that have drawn much attention and discussion in recent years. By devolving powers from the central node to the end users, decentralized social networks aim to cure existing pathologies on the centralized platforms and have been viewed by many as the future of the Internet. This article critically and systematically assesses the decentralization project's prospect for communications online. It uses normative theories of free speech to examine whether and how the decentralization design could facilitate users' freedom of expression online. The analysis shows that both promises and pitfalls exist, highlighting the importance of value-based design in this area. Two most salient issues for the design of the decentralized networks are: how to balance the decentralization ideal with constant needs of centralization on the network, and how to empower users to make them truly capable of exercising their control. The article then uses some design examples, such as the shared blocklist and the opt-in search function, to illustrate the value considerations underlying the design choices. 
Some tentative proposals for law and policy interventions are offered to better facilitate the design of the new network. Rather than providing clear answers, the article seeks to map the value implications of the design choices, highlight the stakes, and point directions for future research.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# 古典的データを最小損失の行列積状態表現に符号化する最適クビットマッピング探索 Optimal Qubit Mapping Search for Encoding Classical Data into Matrix Product State Representation with Minimal Loss ( http://arxiv.org/abs/2406.06935v1 ) ライセンス: Link先を確認	Hyeongjun Jeon, Kyungmin Lee, Dongkyu Lee, Bongsang Kim, Taehyun Kim,	(参考訳) Matrix Product State(MPS)は、古典的なデータを量子状態にエンコードするフレームワークを提供する。本研究では,古典データの符号化に特化して設計されたMPS表現の効率性と精度を向上させる手法について検討する。提案手法は,MPSトランケーション誤差が古典データのパターンに依存するという観測に基づいて,与えられた古典データに対して最適な量子ビットマッピングを求めるアルゴリズムを考案し,MPS表現の効率と忠実性を向上させる。さらに、量子分類器の文脈における最適化MPSの影響を評価し、従来のマッピングと比較して性能が向上したことを示す。この改良により、古典的データを量子状態に符号化するための提案手法の有効性が確かめられる。 MPS表現と最適量子ビットマッピングを組み合わせることで、より効率的で正確な量子データ表現と処理のための新しい方法を開拓することができる。 Matrix product state (MPS) offers a framework for encoding classical data into quantum states, enabling the efficient utilization of quantum resources for data representation and processing. This research paper investigates techniques to enhance the efficiency and accuracy of MPS representations specifically designed for encoding classical data. Based on the observations that MPS truncation error depends on the pattern of the classical data, we devised an algorithm that finds optimal qubit mapping for given classical data, thereby improving the efficiency and fidelity of the MPS representation. Furthermore, we evaluate the impact of the optimized MPS in the context of quantum classifiers, demonstrating their enhanced performance compared to the conventional mapping. This improvement confirms the efficacy of the proposed techniques for encoding classical data into quantum states. MPS representation combined with optimal qubit mapping can pave a new way for more efficient and accurate quantum data representation and processing.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# エンドツーエンド同時音声-音声翻訳のための非自己回帰生成フレームワーク A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation ( http://arxiv.org/abs/2406.06937v1 ) ライセンス: Link先を確認	Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang,	(参考訳) 同時翻訳モデルは、コミュニケーションを促進する上で重要な役割を果たす。しかし、既存の研究は主にテキスト・トゥ・テキスト・モデルや音声・トゥ・テキスト・モデルに焦点を当てており、音声・音声翻訳を実現するために追加のカスケード・コンポーネントを必要とする。これらのパイプライン手法は、各カスケードコンポーネントにエラーの伝搬と遅延の蓄積に悩まされ、話者とリスナーの同期が減少する。これらの課題を克服するために,音声・テキスト・音声・音声の同時翻訳のための非自己回帰生成フレームワーク(NAST-S2X)を提案する。固定長音声チャンクの受信時に複数のテキストや音響単位トークンを同時に生成できる非自己回帰デコーダを開発する。デコーダは空白または繰り返しトークンを生成し、CTCデコードを使用して遅延を動的に調整することができる。実験結果から,NAST-S2Xは音声・テキスト・音声の両タスクにおいて,最先端のモデルよりも優れていた。 3秒未満の遅延で高品質な同時解釈を実現し、オフライン生成において28倍のデコードスピードアップを提供する。 Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translation. These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization between the speaker and listener. To overcome these challenges, we propose a novel non-autoregressive generation framework for simultaneous speech translation (NAST-S2X), which integrates speech-to-text and speech-to-speech tasks into a unified end-to-end framework. We develop a non-autoregressive decoder capable of concurrently generating multiple text or acoustic unit tokens upon receiving fixed-length speech chunks. The decoder can generate blank or repeated tokens and employ CTC decoding to dynamically adjust its latency. Experimental results show that NAST-S2X outperforms state-of-the-art models in both speech-to-text and speech-to-speech tasks. It achieves high-quality simultaneous interpretation within a delay of less than 3 seconds and provides a 28 times decoding speedup in offline generation.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
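The abstract above says the decoder may emit blank or repeated tokens that are resolved by CTC decoding. A minimal sketch of the standard CTC collapse rule (merge consecutive repeats, then drop blanks) is shown below. The function name and the `_` blank symbol are assumptions made for illustration; this is not the NAST-S2X implementation.

```python
from itertools import groupby


def ctc_collapse(tokens, blank="_"):
    """Apply the CTC collapse rule to a sequence of per-step tokens.

    Hypothetical helper for illustration: first merge runs of identical
    consecutive tokens, then remove the blank symbol.
    """
    merged = [tok for tok, _ in groupby(tokens)]
    return [tok for tok in merged if tok != blank]


if __name__ == "__main__":
    # A blank between two identical tokens keeps both of them.
    print(ctc_collapse(list("aa_ab__b")))
```

Under this rule a chunk whose outputs are all blanks contributes nothing to the final sequence, which is one way a model can postpone output until more speech has arrived.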
# 埋蔵・信頼性の高い長期文書理解への回答 : 課題・洞察・課題 Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges ( http://arxiv.org/abs/2406.06938v1 ) ライセンス: Link先を確認	Abhilasha Sancheti, Koustava Goswami, Balaji Vasan Srinivasan,	(参考訳) 信頼できる、信頼性があり、説明可能なシステムを構築するためには、情報検索に関する質問に対する回答テキストのソースドキュメントへの投稿が不可欠である。我々は,長期文書理解(LDC)のためのポストホック応答帰属の新たな課題を定式化する。長文の抽象的および情報探索的LCCデータセットが欠如していることから,既存の検索ベースと提案した回答分解とテキスト・エントリメントに基づく最適選択属性システムの有効性と弱点を評価するために,既存のデータセットをリファクタリングした。私たちは、既存のデータセットの制限と、このタスクにおけるシステムの実際のパフォーマンスを評価するデータセットの必要性に光を当てています。 Attributing answer text to its source document for information-seeking questions is crucial for building trustworthy, reliable, and accountable systems. We formulate a new task of post-hoc answer attribution for long document comprehension (LDC). Owing to the lack of long-form abstractive and information-seeking LDC datasets, we refactor existing datasets to assess the strengths and weaknesses of existing retrieval-based and proposed answer decomposition and textual entailment-based optimal selection attribution systems for this task. We throw light on the limitations of existing datasets and the need for datasets to assess the actual performance of systems on this task.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# 可変射影による最適マトリックスミメティックテンソル代数 Optimal Matrix-Mimetic Tensor Algebras via Variable Projection ( http://arxiv.org/abs/2406.06942v1 ) ライセンス: Link先を確認	Elizabeth Newman, Katherine Keegan,	(参考訳) 近年のmatrix-mimeticテンソルフレームワークの進歩により、多線形データ解析のための線形代数特性の保存が可能となり、その結果、マルチウェイデータの最適な表現が得られるようになった。行列模倣性(Matrix mimeticity)は、テンソルを、行列に類似した乗算、分解、解析が可能な作用素として解釈することから生じる。テンソル演算の基礎には、可逆線型変換によってパラメータ化される代数的フレームワークがある。線形写像の選択は、表現品質にとって不可欠であり、実際には、データ内の期待される相関に基づいてヒューリスティックに行われる。しかし、多くの場合、これらの相関関係は未知であり、一般的なヒューリスティックスは最適以下の性能をもたらす。本研究では,データの事前知識に頼ることなく,最適線形写像と対応するテンソル表現を同時に学習する。我々の新しいフレームワークは、変数プロジェクションを使用して変換と表現の結合を明示的にキャプチャします。我々はリーマン最適化を用いて直交変換を学習することで線型写像の可逆性を保っている。変換の一意性に関する独自の理論と、可変射影型アルゴリズムの収束解析を提供する。金融指標追跡,画像圧縮,縮小順序モデリングなど,幅広い応用の数値実験を通じて,我々のフレームワークの汎用性を実証する。この作業に関連するすべてのコードは、https://github.com/elizabethnewman/star-M-optで公開しています。 Recent advances in matrix-mimetic tensor frameworks have made it possible to preserve linear algebraic properties for multilinear data analysis and, as a result, to obtain optimal representations of multiway data. Matrix mimeticity arises from interpreting tensors as operators that can be multiplied, factorized, and analyzed analogous to matrices. Underlying the tensor operation is an algebraic framework parameterized by an invertible linear transformation. The choice of linear mapping is crucial to representation quality and, in practice, is made heuristically based on expected correlations in the data. However, in many cases, these correlations are unknown and common heuristics lead to suboptimal performance. In this work, we simultaneously learn optimal linear mappings and corresponding tensor representations without relying on prior knowledge of the data. Our new framework explicitly captures the coupling between the transformation and representation using variable projection. We preserve the invertibility of the linear mapping by learning orthogonal transformations with Riemannian optimization. 
We provide original theory of uniqueness of the transformation and convergence analysis of our variable-projection-based algorithm. We demonstrate the generality of our framework through numerical experiments on a wide range of applications, including financial index tracking, image compression, and reduced order modeling. We have published all the code related to this work at https://github.com/elizabethnewman/star-M-opt.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# FAULT+PROBE: ジェネリックなRowhammerベースのビットリカバリ攻撃 FAULT+PROBE: A Generic Rowhammer-based Bit Recovery Attack ( http://arxiv.org/abs/2406.06943v1 ) ライセンス: Link先を確認	Kemal Derya, M. Caner Tol, Berk Sunar,	(参考訳) Rowhammerは、不正な攻撃者がDRAMセル内でエラーを誘発するセキュリティ脆弱性である。障害注入が攻撃を成功させるのを防ぐために、広く受け入れられている緩和は、命令とデータに対する障害チェックを実装している。障害が被害者の機能に与える影響を調べることで,この仮定の有効性に挑戦する。具体的には、攻撃者がビットフリップの方向パターンに基づいて、被害者のメモリのプロファイルを構築することができることを示す。このプロファイルを使用して、DRAM行内の最も感受性の高いビット位置を識別する。その後、これらの位置は、被害者の行動の変化から観察された側面情報を用いて、オンラインアタックフェーズ中に利用され、機密性の高いビット値が導出される。本研究の主な目的は,ローハンマーをプローブとして使用し,被害者の記憶の完全性から,被害者の操作行動に基づく統計的故障解析(SFA)へと重点を移すことである。 FAULT+PROBEは,機密情報を漏洩する誤ったシグネチャの発生を防止するために,確認後の故障チェック機構を回避するために用いられる可能性があることを示す。メモリプロファイリングの段階で識別されたキー位置に方向障害を注入する。攻撃者は署名生成率を観察し、それに応じて秘密ビット値を復号する。この回避は、被害者の監視可能なチャネルによって実現される。 FAULT+PROBEは、署名された犠牲者に限らず、障害注入の試みの結果を漏らす可観測チャネルが存在する任意のシステム上で秘密のビットを探索するのに使うことができる。この攻撃を実証するために、我々は、TLS 1.3ハンドシェイクのSSL実装において、フォールトプロテクトされたECDSAをターゲットにしている。 256ビットセッションキーを平均回復率22ビット/時間,100%の成功率で回収する。 Rowhammer is a security vulnerability that allows unauthorized attackers to induce errors within DRAM cells. To prevent fault injections from escalating to successful attacks, a widely accepted mitigation is implementing fault checks on instructions and data. We challenge the validity of this assumption by examining the impact of the fault on the victim's functionality. Specifically, we illustrate that an attacker can construct a profile of the victim's memory based on the directional patterns of bit flips. This profile is then utilized to identify the most susceptible bit locations within DRAM rows. These locations are then subsequently leveraged during an online attack phase with side information observed from the change in the victim's behavior to deduce sensitive bit values. 
Consequently, the primary objective of this study is to utilize Rowhammer as a probe, shifting the emphasis away from the victim's memory integrity and toward statistical fault analysis (SFA) based on the victim's operational behavior. We show FAULT+PROBE may be used to circumvent the verify-after-sign fault check mechanism, which is designed to prevent the generation of erroneous signatures that leak sensitive information. It does so by injecting directional faults into key positions identified during a memory profiling stage. The attacker observes the signature generation rate and decodes the secret bit value accordingly. This circumvention is enabled by an observable channel in the victim. FAULT+PROBE is not limited to signing victims and can be used to probe secret bits on arbitrary systems where an observable channel is present that leaks the result of the fault injection attempt. To demonstrate the attack, we target the fault-protected ECDSA in wolfSSL's implementation of the TLS 1.3 handshake. We recover 256-bit session keys with an average recovery rate of 22 key bits/hour and a 100% success rate.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# スパースベイズネットワーク:医用画像解析における効率的な不確実性定量化 Sparse Bayesian Networks: Efficient Uncertainty Quantification in Medical Image Analysis ( http://arxiv.org/abs/2406.06946v1 ) ライセンス: Link先を確認	Zeinab Abboud, Herve Lombaert, Samuel Kadoury,	(参考訳) 医用画像の予測不確実性を効果的に定量化することは依然として課題である。ベイズニューラルネットワーク(BNN)は予測の不確実性を提供するが、訓練にはかなりの計算資源が必要である。アンサンブルのようなベイズ近似は将来性を示しているが、それでも高い訓練と推論コストに悩まされている。既存のアプローチは、トレーニング後のBNN推論のコストに主に対処するが、トレーニング効率の改善とパラメータの複雑さの低減にはほとんど重点を置いていない。本研究では,疎(部分)ベイズネットワークのトレーニング手順を紹介する。本手法は,パラメータのサブセットを勾配感度解析により決定論的サリエンシを評価することでベイズ的パラメータとして選択的に割り当てる。結果として得られるネットワークは決定論的パラメータとベイズ的パラメータを結合し、両方の表現の利点を利用して高いタスク固有の性能を達成し、予測の不確実性を最小化する。セグメンテーションのための多ラベルChestMNIST,ISIC,LIDC-IDRI,セグメンテーションのためのLIDC-IDRIを用いて,ベイズパラメータを95%以上削減し,完全ベイズおよびアンサンブル法と比較して計算コストを大幅に削減することで,競合性能と予測不確実性の推定を実現した。 Efficiently quantifying predictive uncertainty in medical images remains a challenge. While Bayesian neural networks (BNN) offer predictive uncertainty, they require substantial computational resources to train. Although Bayesian approximations such as ensembles have shown promise, they still suffer from high training and inference costs. Existing approaches mainly address the costs of BNN inference post-training, with little focus on improving training efficiency and reducing parameter complexity. This study introduces a training procedure for a sparse (partial) Bayesian network. Our method selectively assigns a subset of parameters as Bayesian by assessing their deterministic saliency through gradient sensitivity analysis. The resulting network combines deterministic and Bayesian parameters, exploiting the advantages of both representations to achieve high task-specific performance and minimize predictive uncertainty. 
Demonstrated on multi-label ChestMNIST for classification and ISIC, LIDC-IDRI for segmentation, our approach achieves competitive performance and predictive uncertainty estimation by reducing Bayesian parameters by over 95%, significantly reducing computational expenses compared to fully Bayesian and ensemble methods.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# CAAP: フロントエンドUIのみでコンピュータタスクを解決するためのコンテキスト対応アクションプランニング CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only ( http://arxiv.org/abs/2406.06947v1 ) ライセンス: Link先を確認	Junhee Cho, Jihoon Kim, Daseul Bae, Jinho Choo, Youngjune Gwon, Yeong-Dae Kwon,	(参考訳) ソフトウェアロボットは、日常的かつ反復的なコンピュータタスクを自動化するために、長い間ロボット処理自動化(Robotic Process Automation, RPA)にデプロイされてきた。高度な推論能力を持つLarge Language Models(LLMs)の出現は、これらのエージェントがより複雑で、以前は目に見えなかったタスクをこなすステージを固めている。しかし、最近の文献におけるLLMベースの自動化技術は、しばしば入力のためのHTMLソースコードに依存しており、アプリケーションをWeb環境に制限している。さらに、HTMLコードに含まれる情報は、しばしば不正確または不完全であり、エージェントは実用的なアプリケーションでは信頼性が低い。本研究では,環境認識のためのスクリーンショットのみに基づいて機能するLLMエージェントを提案する。我々の戦略は、コンテキスト認識行動計画(CAAP)と呼ばれ、エージェントが様々な角度でコンテキストを注意深くレビューするよう促す。提案手法により,67種類のMiniWoB++問題に対して94.4%の成功率を達成した。提案手法は,特にコンピュータやスマートフォン上でのアプリケーション間協調を必要とするタスクに対して,より広範な応用の可能性を提供し,自動化エージェントの分野での大きな進歩を示す。コードとモデルはhttps://github.com/caap-agent/caap-agentでアクセスできる。 Software robots have long been deployed in Robotic Process Automation (RPA) to automate mundane and repetitive computer tasks. The advent of Large Language Models (LLMs) with advanced reasoning capabilities has set the stage for these agents to now undertake more complex and even previously unseen tasks. However, the LLM-based automation techniques in recent literature frequently rely on HTML source codes for input, limiting their application to web environments. Moreover, the information contained in HTML codes is often inaccurate or incomplete, making the agent less reliable for practical applications. We propose an LLM-based agent that functions solely on the basis of screenshots for recognizing environments, while leveraging in-context learning to eliminate the need for collecting large datasets of human demonstration. Our strategy, named Context-Aware Action Planning (CAAP) prompting, encourages the agent to meticulously review the context in various angles. 
Through our proposed methodology, we achieve a success rate of 94.4% on 67 types of MiniWoB++ problems, utilizing only 1.48 demonstrations per problem type. Our method offers the potential for broader applications, especially for tasks that require inter-application coordination on computers or smartphones, showcasing a significant advancement in the field of automation agents. Codes and models are accessible at https://github.com/caap-agent/caap-agent.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# 不確実性駆動型アクティブマッピングのためのニューラル可視界 Neural Visibility Field for Uncertainty-Driven Active Mapping ( http://arxiv.org/abs/2406.06948v1 ) ライセンス: Link先を確認	Shangjie Xue, Jesse Dill, Pranay Mathur, Frank Dellaert, Panagiotis Tsiotra, Danfei Xu,	(参考訳) 本稿では,ニューラル・ヴィジビリティ・フィールド(NVF, Neural Visibility Field)について述べる。我々の重要な洞察は、トレーニングビューで見えない領域は、この領域におけるNeRFによる本質的に信頼性の低い色予測につながり、合成ビューでは不確実性が増大するということである。これを解決するために,ベイジアンネットワークを用いて位置ベースフィールドの不確かさをカメラ観測におけるレイベース不確実性に合成することを提案する。その結果、NVFは自然に、観測されていない領域に高い不確実性を割り当て、ロボットが最も有益な次の視点を選択するのを助ける。大規模な評価では,NVFは不確実な定量化だけでなく,能動的マッピングのためのシーン再構成においても優れており,既存の手法よりも優れていた。 This paper presents Neural Visibility Field (NVF), a novel uncertainty quantification method for Neural Radiance Fields (NeRF) applied to active mapping. Our key insight is that regions not visible in the training views lead to inherently unreliable color predictions by NeRF at this region, resulting in increased uncertainty in the synthesized views. To address this, we propose to use Bayesian Networks to composite position-based field uncertainty into ray-based uncertainty in camera observations. Consequently, NVF naturally assigns higher uncertainty to unobserved regions, aiding robots to select the most informative next viewpoints. Extensive evaluations show that NVF excels not only in uncertainty quantification but also in scene reconstruction for active mapping, outperforming existing methods.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# 移動赤外小ターゲット検出のための周波数認識メモリ拡張を用いた三領域特徴学習 Triple-domain Feature Learning with Frequency-aware Memory Enhancement for Moving Infrared Small Target Detection ( http://arxiv.org/abs/2406.06949v1 ) ライセンス: Link先を確認	Weiwei Duan, Luping Ji, Shengjia Chen, Sicheng Zhu, Mao Ye,	(参考訳) 移動赤外線小ターゲットの検出は、小さなターゲットサイズと背景とのコントラストが低いため、大きな課題となる。現在存在する方法は、主に空間時間領域からのみターゲット特徴を抽出することに焦点を当てている。特徴表現をさらに強化するために、周波数のようなより多くの情報領域が潜在的に有用であると考えられている。ターゲット特徴学習を拡張するために,空間時間領域に周波数認識メモリを付加した新しいトリプルドメイン戦略(Tridos)を提案する。提案手法では,フーリエ変換を用いた局所的な周波数認識モジュールにより,周波数特性を効果的に分解し,拡張する。人間の視覚システムにインスパイアされた記憶強調は,映像フレーム間の空間的関係を捉えることを目的としている。さらに,時間的運動特徴を差分学習と残差増強によって符号化する。さらに,ドメイン間の特徴的ミスマッチを再現する残差補償ユニットを設計する。我々の知る限りでは、Tridosは空間時間周波数領域におけるターゲット特徴学習を包括的に探求する最初の試みである。 3つのデータセット(DAUB, ITSDT-15K, IRDST)の広範な実験により、我々のトリプルドメイン学習方式は明らかに最先端のものよりも優れていることが検証された。ソースコードはhttps://github.com/UESTC-nnLab/Tridos で入手できる。 Moving infrared small target detection presents significant challenges due to tiny target sizes and low contrast against backgrounds. Currently-existing methods primarily focus on extracting target features only from the spatial-temporal domain. For further enhancing feature representation, more information domains such as frequency are believed to be potentially valuable. To extend target feature learning, we propose a new Triple-domain Strategy (Tridos) with the frequency-aware memory enhancement on the spatial-temporal domain. In our scheme, it effectively detaches and enhances frequency features by a local-global frequency-aware module with Fourier transform. Inspired by the human visual system, our memory enhancement aims to capture the target spatial relations between video frames. Furthermore, it encodes temporal dynamics motion features via differential learning and residual enhancing. Additionally, we further design a residual compensation unit to reconcile possible cross-domain feature mismatches. 
To our best knowledge, our Tridos is the first work to explore target feature learning comprehensively in spatial-temporal-frequency domains. The extensive experiments on three datasets (DAUB, ITSDT-15K, and IRDST) validate that our triple-domain learning scheme could be obviously superior to state-of-the-art ones. Source codes are available at https://github.com/UESTC-nnLab/Tridos.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# リーフ木伝播によるLLM幻覚検出のための確率的枠組み A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation ( http://arxiv.org/abs/2406.06950v1 ) ライセンス: Link先を確認	Bairu Hou, Yang Zhang, Jacob Andreas, Shiyu Chang,	(参考訳) 本稿では,LLM生成文の真偽判定を目的とした幻覚検出の課題に焦点を当てた。この問題に対処するために,LLM の自己整合性を LLM が論理的に関連づけた拡張文の集合として用い,外部の知識データベースを必要とせず,ホワイトボックスとブラックボックスの LLM の両方で動作する方法が一般的である。しかし、既存の多くのアプローチでは、拡張ステートメントは非常に単調で構造化されていないため、これらのステートメントにLLMの信念から意味のある情報を統合することは困難である。また、LLMの信念のバイナライズされたバージョンでは、連続的なバージョンではなく多くのメソッドが動作し、情報が著しく失われる。本稿では,LLM幻覚検出のための確率的フレームワークであるBelief Tree Propagation (BTProp)を提案する。 BTPropは、親ステートメントを3つの分解戦略で子ステートメントに再帰的に分解することで論理関連ステートメントの信念ツリーを導入し、これらのステートメントにLLMの信念スコアを統合するために隠れマルコフツリーモデルを構築した。実験の結果,複数の幻覚検出ベンチマークにおいて,AUROCおよびAUC-PRにより評価された基準値の3%-9%の改善が得られた。コードはhttps://github.com/UCSB-NLP-Chang/BTPropで入手できる。 This paper focuses on the task of hallucination detection, which aims to determine the truthfulness of LLM-generated statements. To address this problem, a popular class of methods utilize the LLM's self-consistencies in its beliefs in a set of logically related augmented statements generated by the LLM, which does not require external knowledge databases and can work with both white-box and black-box LLMs. However, in many existing approaches, the augmented statements tend to be very monotone and unstructured, which makes it difficult to integrate meaningful information from the LLM beliefs in these statements. Also, many methods work with the binarized version of the LLM's belief, instead of the continuous version, which significantly loses information. To overcome these limitations, in this paper, we propose Belief Tree Propagation (BTProp), a probabilistic framework for LLM hallucination detection. 
BTProp introduces a belief tree of logically related statements by recursively decomposing a parent statement into child statements with three decomposition strategies, and builds a hidden Markov tree model to integrate the LLM's belief scores in these statements in a principled way. Experiment results show that our method improves baselines by 3%-9% (evaluated by AUROC and AUC-PR) on multiple hallucination detection benchmarks. Code is available at https://github.com/UCSB-NLP-Chang/BTProp.	翻訳日:2024-06-12 17:35:03 公開日:2024-06-11
# ロバストステレオマッチングのためのステップワイズ回帰と事前訓練エッジ Stepwise Regression and Pre-trained Edge for Robust Stereo Matching ( http://arxiv.org/abs/2406.06953v1 ) ライセンス: Link先を確認	Weiqing Xiao, Wei Zhao,	(参考訳) 実検体と地上の真理を得るのが難しいため、実世界のアプリケーションにおけるステレオマッチング手法の実現には、一般化性能と微調整性能が不可欠である。しかし、異なるデータセット間での実質的な格差分布と密度の変動の存在は、モデルの一般化と微調整に重大な課題をもたらす。本稿では, SR-Stereoと呼ばれる新しいステレオマッチング手法を提案する。この手法は, 差分クリップの予測により, 異なるデータセット間の分布差を緩和し, 差分クリップの精度を向上させるために, 回帰目標スケールに関連する損失重みを用いる。さらに、この段階的な回帰アーキテクチャは、構造を変更することなく、既存のイテレーションベースのメソッドに容易に拡張でき、パフォーマンスを向上させることができる。さらに, 未熟な土台真実に基づく微調整モデルのエッジぼかしを軽減するために, 事前学習エッジに基づくドメイン適応を提案する。具体的には、予測不一致とRGB画像を用いて、対象領域画像のエッジマップを推定する。エッジマップをフィルタリングしてエッジマップ背景の擬似ラベルを生成し、対象領域におけるスパース基底の真相の相違とともに、事前訓練されたステレオマッチングモデルを協調的に微調整する監督を行う。これらの手法は,SceneFlow,KITTI,Middbury 2014,ETH3Dで広く評価されている。 SR-Stereoは、競争格差推定性能と最先端のクロスドメイン一般化性能を達成する。一方,DAPEは,特にテクスチャレス領域とディテール領域において,微調整モデルの分散度推定性能を著しく向上させる。 Due to the difficulty in obtaining real samples and ground truth, the generalization performance and the fine-tuned performance are critical for the feasibility of stereo matching methods in real-world applications. However, the presence of substantial disparity distributions and density variations across different datasets presents significant challenges for the generalization and fine-tuning of the model. In this paper, we propose a novel stereo matching method, called SR-Stereo, which mitigates the distributional differences across different datasets by predicting the disparity clips and uses a loss weight related to the regression target scale to improve the accuracy of the disparity clips. Moreover, this stepwise regression architecture can be easily extended to existing iteration-based methods to improve the performance without changing the structure. In addition, to mitigate the edge blurring of the fine-tuned model on sparse ground truth, we propose Domain Adaptation Based on Pre-trained Edges (DAPE). 
Specifically, we use the predicted disparity and RGB image to estimate the edge map of the target domain image. The edge map is filtered to generate edge map background pseudo-labels, which together with the sparse ground truth disparity on the target domain are used as a supervision to jointly fine-tune the pre-trained stereo matching model. These proposed methods are extensively evaluated on SceneFlow, KITTI, Middlebury 2014 and ETH3D. The SR-Stereo achieves competitive disparity estimation performance and state-of-the-art cross-domain generalisation performance. Meanwhile, the proposed DAPE significantly improves the disparity estimation performance of fine-tuned models, especially in the textureless and detail regions.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# 分散MIPLIB:ML-Guided MILP法によるマルチドメインライブラリ Distributional MIPLIB: a Multi-Domain Library for Advancing ML-Guided MILP Methods ( http://arxiv.org/abs/2406.06954v1 ) ライセンス: Link先を確認	Weimin Huang, Taoan Huang, Aaron M Ferber, Bistra Dilkina,	(参考訳) 混合整数線形計画法(MILP)は組合せ最適化問題をモデル化するための基本的なツールである。近年、機械学習を使ってMILP問題解決を加速する研究が増えている。このアプローチの人気は高まっているが、異なるドメイン、異なる硬度レベルで異なるMILPインスタンスの分散を標準化されたテストセットで提供する共通のリポジトリが欠如している。本稿では,ML誘導MILP法を進化させるための問題分散のマルチドメインライブラリであるDistributedal MIPLIBを紹介する。この領域の既存の作業と、未使用の現実世界の問題からMILP分布をキュレートし、それらを異なる硬度レベルに分類する。多様な領域と現実的な領域の総合的な評価を可能にすることで、この分野の研究を促進する。配電型MIPLIBを研究車両として使用することの利点を実証的に説明する。 ML誘導変数分岐の性能を未使用の分布上で評価し,改善のための潜在的な領域を特定する。さらに,混合分布から分岐ポリシーを学習し,データに制限がある場合の同種分布と比較して,混合分布の方が優れた性能を示し,より大きなインスタンスによく一般化することを示す。 Mixed Integer Linear Programming (MILP) is a fundamental tool for modeling combinatorial optimization problems. Recently, a growing body of research has used machine learning to accelerate MILP solving. Despite the increasing popularity of this approach, there is a lack of a common repository that provides distributions of similar MILP instances across different domains, at different hardness levels, with standardized test sets. In this paper, we introduce Distributional MIPLIB, a multi-domain library of problem distributions for advancing ML-guided MILP methods. We curate MILP distributions from existing work in this area as well as real-world problems that have not been used, and classify them into different hardness levels. It will facilitate research in this area by enabling comprehensive evaluation on diverse and realistic domains. We empirically illustrate the benefits of using Distributional MIPLIB as a research vehicle in two ways. We evaluate the performance of ML-guided variable branching on previously unused distributions to identify potential areas for improvement. 
Moreover, we propose to learn branching policies from a mix of distributions, demonstrating that mixed distributions achieve better performance compared to homogeneous distributions when there is limited data and generalize well to larger instances.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# ElasticRec: 推奨モデルのためのエラスティックリソーススケーリングを実現するマイクロサービスベースのモデルサービングアーキテクチャ ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models ( http://arxiv.org/abs/2406.06955v1 ) ライセンス: Link先を確認	Yujeong Choi, Jiin Kim, Minsoo Rhu,	(参考訳) レコメンデーションシステム(RecSys)の普及に伴い、データセンタにおける計算リソースの需要が急増している。しかし、現在のRecSysモデルサービスアーキテクチャで採用されているモデルワイドリソース割り当ては、リソースを効果的に活用するに足りず、最適以下の総所有コストにつながる。本稿では,リソースの弾力性と高いメモリ効率を実現するRecSysのモデルであるElasticRecを提案する。 ElasticRecは、RecSysの異種リソース要求に合わせて、きめ細かいリソース割り当てのためのマイクロサービスベースのソフトウェアアーキテクチャに基づいている。さらにElasticRecは,ユーティリティベースのリソースアロケーションを通じて,高いメモリ効率を実現しています。全体として、ElasticRecはメモリ割り当てサイズの平均3.3倍、メモリユーティリティの8.1倍の削減を実現している。 With the increasing popularity of recommendation systems (RecSys), the demand for compute resources in datacenters has surged. However, the model-wise resource allocation employed in current RecSys model serving architectures falls short in effectively utilizing resources, leading to sub-optimal total cost of ownership. We propose ElasticRec, a model serving architecture for RecSys providing resource elasticity and high memory efficiency. ElasticRec is based on a microservice-based software architecture for fine-grained resource allocation, tailored to the heterogeneous resource demands of RecSys. Additionally, ElasticRec achieves high memory efficiency via our utility-based resource allocation. Overall, ElasticRec achieves an average 3.3x reduction in memory allocation size and 8.1x increase in memory utility, resulting in an average 1.6x reduction in deployment cost compared to state-of-the-art RecSys inference serving system.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# ダークプールの潮流 : アドテックサプライチェーンにおけるマルチステークホルダー脆弱性通知に向けて Turning the Tide on Dark Pools? Towards Multi-Stakeholder Vulnerability Notifications in the Ad-Tech Supply Chain ( http://arxiv.org/abs/2406.06958v1 ) ライセンス: Link先を確認	Yash Vekaria, Rishab Nithyanand, Zubair Shafiq,	(参考訳) オンライン広告は、広告主、パブリッシャー、広告ネットワークを含む複数の利害関係者を含む複雑で不透明なサプライチェーンに依存している。最近の研究は、ダークプールのような広告技術サプライチェーンの脆弱性の存在を実証している。ダークプールの緩和を目的とした脆弱性通知キャンペーンの有効性について検討した。脆弱性通知に関する以前の研究は、主に単一ステークホルダーのシナリオに焦点を当てており、脆弱性通知がマルチステークホルダーのアドテックサプライチェーンに有効であるかどうかは不明だ。我々は,出版者や広告ネットワーク,広告主など,さまざまな利害関係者の,学術者や活動家による脆弱性通知に対する応答性を体系的に評価する,自動脆弱性通知パイプラインを実装している。当社の9ヶ月にわたるマルチステークホルダー通知調査は、特に広告ネットワークをターゲットとするオンライン広告エコシステムにおいて、通知がダークプールの脆弱性を減らす効果的な方法であることを示している。さらに、送信者の評判は、統計的に異なる方法で活動家や学者からの通知に対する反応に影響しない。オンライン広告エコシステムをターゲットとする最初の通知研究であるだけでなく、脆弱性通知において、マルチステークホルダーのコンテキストを初めて研究しました。 Online advertising relies on a complex and opaque supply chain that involves multiple stakeholders, including advertisers, publishers, and ad-networks, each with distinct and sometimes conflicting incentives. Recent research has demonstrated the existence of ad-tech supply chain vulnerabilities such as dark pooling, where low-quality publishers bundle their ad inventory with higher-quality ones to mislead advertisers. We investigate the effectiveness of vulnerability notification campaigns aimed at mitigating dark pooling. Prior research on vulnerability notifications has primarily focused on single-stakeholder scenarios, and it is unclear whether vulnerability notifications can be effective in the multi-stakeholder ad-tech supply chain. We implement an automated vulnerability notification pipeline to systematically evaluate the responsiveness of various stakeholders, including publishers, ad-networks, and advertisers to vulnerability notifications by academics and activists. 
Our nine-month long multi-stakeholder notification study shows that notifications are an effective method for reducing dark pooling vulnerabilities in the online advertising ecosystem, especially when targeted towards ad-networks. Further, the sender reputation does not impact responses to notifications from activists and academics in a statistically different way. In addition to being the first notification study targeting the online advertising ecosystem, we are also the first to study multi-stakeholder context in vulnerability notifications.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# 逆問題の解法に先立つ拡散の解法能力の解法 Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems ( http://arxiv.org/abs/2406.06959v1 ) ライセンス: Link先を確認	Jiawei Zhang, Jiaxin Zhuang, Cheng Jin, Gen Li, Yuantao Gu,	(参考訳) 近年の拡散モデルの出現により、学習可能な事前の精度が大幅に向上し、逆問題に対処する革新的な方法が提示された。逆問題には本質的に極大な後続推定が伴うため、従来の研究は拡散先行を最適化フレームワークに統合する努力をしてきた。しかし、最適化に基づく逆アルゴリズムは、主に拡散モデル内の事前情報を利用しており、デノイジング能力は無視されている。このギャップを埋めるために、この研究は拡散過程を利用して、2変数の制約付き最適化タスクとしてノイズの多い逆問題を再設定し、補助最適化変数を導入する。勾配トランケーションを用いることで、プロジェクション勾配降下法を効率よく利用し、対応する最適化問題を解く。 ProjDiffと呼ばれる提案アルゴリズムは、最適化フレームワーク内で事前学習した拡散モデルの事前情報と復調能力を効果的に活用する。画像復元タスクとソース分離および部分生成タスクに関する大規模な実験により、ProjDiffは様々な線形および非線形の逆問題に対して優れた性能を示し、実用的な応用の可能性を強調している。コードはhttps://github.com/weigerzan/ProjDiff/ で入手できる。 The recent emergence of diffusion models has significantly advanced the precision of learnable priors, presenting innovative avenues for addressing inverse problems. Since inverse problems inherently entail maximum a posteriori estimation, previous works have endeavored to integrate diffusion priors into the optimization frameworks. However, prevailing optimization-based inverse algorithms primarily exploit the prior information within the diffusion models while neglecting their denoising capability. To bridge this gap, this work leverages the diffusion process to reframe noisy inverse problems as a two-variable constrained optimization task by introducing an auxiliary optimization variable. By employing gradient truncation, the projection gradient descent method is efficiently utilized to solve the corresponding optimization problem. The proposed algorithm, termed ProjDiff, effectively harnesses the prior information and the denoising capability of a pre-trained diffusion model within the optimization framework. 
Extensive experiments on the image restoration tasks and source separation and partial generation tasks demonstrate that ProjDiff exhibits superior performance across various linear and nonlinear inverse problems, highlighting its potential for practical applications. Code is available at https://github.com/weigerzan/ProjDiff/.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# スケールにおける低ランク多辞書選択 Low Rank Multi-Dictionary Selection at Scale ( http://arxiv.org/abs/2406.06960v1 ) ライセンス: Link先を確認	Boya Ma, Maxwell McNeil, Abram Magner, Petko Bogdanov,	(参考訳) スパース辞書符号化フレームワークは、いくつかの予め定義された辞書原子の線形結合として信号を表現している。画像、時系列、グラフ信号、最近では2方向(または2次元)の時空間データに使われ、時空間辞書と時空間辞書が併用されている。大規模かつオーバーコンプリートな辞書は高品質なモデルを可能にするが、複数辞書設定でさらに悪化するスケーラビリティの課題も生んでいる。したがって、本稿で論じる重要な問題は、大規模な辞書やデータセットに対して、どのようにマルチ辞書コーディングをスケールするかである。 LRMDSという低ランクスパース符号化のための多次元原子選択手法を提案する。大規模辞書やデータセットへのスケーラビリティを実現するため、データとの整合性に基づいて列列原子対の群を段階的に選択し、対応するサブ辞書を介して凸緩和符号化を行う。理論的および実験的に、データは原子のスパース部分集合で低ランクの符号化を行う場合、軽度な仮定の下で強い保証でそれらを選択することができることを示した。さらに、合成と実世界の両方のデータセットおよび様々な符号化辞書において、LRMDSのスケーラビリティと品質を実証する。ベースラインに比べて3倍から10倍のスピードアップを実現し、固定されたターゲット原子数を持つ実世界のデータセットの表示品質を最大2桁改善する。 The sparse dictionary coding framework represents signals as a linear combination of a few predefined dictionary atoms. It has been employed for images, time series, graph signals and recently for 2-way (or 2D) spatio-temporal data employing jointly temporal and spatial dictionaries. Large and over-complete dictionaries enable high-quality models, but also pose scalability challenges which are exacerbated in multi-dictionary settings. Hence, an important problem that we address in this paper is: How to scale multi-dictionary coding for large dictionaries and datasets? We propose a multi-dictionary atom selection technique for low-rank sparse coding named LRMDS. To enable scalability to large dictionaries and datasets, it progressively selects groups of row-column atom pairs based on their alignment with the data and performs convex relaxation coding via the corresponding sub-dictionaries. We demonstrate both theoretically and experimentally that when the data has a low-rank encoding with a sparse subset of the atoms, LRMDS is able to select them with strong guarantees under mild assumptions. 
Furthermore, we demonstrate the scalability and quality of LRMDS in both synthetic and real-world datasets and for a range of coding dictionaries. It achieves 3X to 10X speed-up compared to baselines, while obtaining up to two orders of magnitude improvement in representation quality on some of the real world datasets given a fixed target number of atoms.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# 非エルミート時空と一般化熱場二重形式論 Non-Hermitian spacetime and generalized thermofield double formalism ( http://arxiv.org/abs/2406.06961v1 ) ライセンス: Link先を確認	Wu-zhong Guo, Tao Liu,	(参考訳) 本稿では,非エルミート遷移行列とその重力双対について考察する。場の量子論や重力理論の状態は通常ユークリッド経路積分を用いて準備される。我々は、ユークリッド場の量子論において異なる内積を用いる場合、状態を記述するために非エルミート遷移を導入することは自然かつ必要であることを示した。正定値であるような$\eta$-pseudo-Hermitianの遷移行列は密度行列と同じ役割を果たすが、作用素 $\eta$ は内積の定義と密接に関係している。さらに、これらの遷移行列と密度行列との間には1対1の対応が存在する。 AdS/CFT対応の文脈では、境界場理論におけるユークリッド経路積分はバルク重力経路積分に変換できる。非エルミート時空の構成と解釈について概説する。具体的には、一般的な場合における熱場の概念の実現と、永遠のブラックホールに双対する重力状態の理解において、非エルミート遷移行列が果たす重要な役割を実証する。この文脈では、遷移行列の擬エントロピーはブラックホールエントロピーと解釈できる。最後に、擬エントロピーの強い部分付加性特性と、非エルミート遷移行列と複素計量との接続を強調した。 In this paper, we explore the non-Hermitian transition matrix and its gravity dual. States in quantum field theories or gravity theories are typically prepared using Euclidean path integrals. We demonstrate that it is both natural and necessary to introduce non-Hermitian transitions to describe the state when employing different inner products in Euclidean quantum field theories. Transition matrices that are $\eta$-pseudo-Hermitian, with $\eta$ being positive-definite, play the same role as density matrices, where the operator $\eta$ is closely related to the definition of the inner product. Moreover, there exists a one-to-one correspondence between these transition matrices and density matrices. In the context of AdS/CFT correspondence, the Euclidean path integral in the boundary field theory can be translated to the bulk gravitational path integral. We provide an overview of the construction and interpretation of non-Hermitian spacetime. Specifically, we demonstrate the crucial role of the non-Hermitian transition matrix in realizing the thermofield concept in general cases and in understanding the gravity states dual to the eternal black hole. In this context, the pseudoentropy of the transition matrix can also be interpreted as black hole entropy. 
Finally, we highlight the strong subadditivity property of pseudoentropy, and the connection between non-Hermitian transition matrices and complex metrics.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# 大規模言語モデルのためのサブネットワーク学習の展開 Evolving Subnetwork Training for Large Language Models ( http://arxiv.org/abs/2406.06962v1 ) ライセンス: Link先を確認	Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu,	(参考訳) 大規模な言語モデルは、人工知能研究の新しい時代を支えてきた。しかし、そのかなりの訓練費は、さらなる開発と広く採用を妨げている。本稿では,大規模言語モデルのパラメータの冗長性に着想を得て,新しい訓練パラダイムであるEvolving Subnetwork Training (EST)を提案する。 ESTは、大規模な言語モデルのレイヤと、各レイヤで一般的に使用されるモジュール、MHA(Multi-Head Attention)とMLP(Multi-Layer Perceptron)からサブネットワークをサンプリングする。トレーニングプロセス中のサブネットワークのサイズを徐々に増加させることで、ESTはトレーニングコストを削減できる。 GPT2モデルとTinyLlamaモデルのトレーニングにESTを適用すると、事前トレーニングデータセットの損失が増大することなく、GPT2では26.7%のFLOPを、TinyLlamaでは25.0%のFLOPを削減できる。さらに、ESTは下流タスクのパフォーマンス改善につながります。さらに、トレーニング力学とドロップアウト理論に基づく直感的な理論的研究を行い、ESTの実現可能性を保証する。私たちのコードはhttps://github.com/OpenDFM/ESTで公開されています。 Large language models have ushered in a new era of artificial intelligence research. However, their substantial training costs hinder further development and widespread adoption. In this paper, inspired by the redundancy in the parameters of large language models, we propose a novel training paradigm: Evolving Subnetwork Training (EST). EST samples subnetworks from the layers of the large language model and from commonly used modules within each layer, Multi-Head Attention (MHA) and Multi-Layer Perceptron (MLP). By gradually increasing the size of the subnetworks during the training process, EST can save the cost of training. We apply EST to train GPT2 model and TinyLlama model, resulting in 26.7% FLOPs saving for GPT2 and 25.0% for TinyLlama without an increase in loss on the pre-training dataset. Moreover, EST leads to performance improvements in downstream tasks, indicating that it benefits generalization. Additionally, we provide intuitive theoretical studies based on training dynamics and Dropout theory to ensure the feasibility of EST. Our code is available at https://github.com/OpenDFM/EST.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# ビデオ強化多重モード拡散検出の欠如 Missingness-resilient Video-enhanced Multimodal Disfluency Detection ( http://arxiv.org/abs/2406.06964v1 ) ライセンス: Link先を確認	Payal Mohapatra, Shamika Likhite, Subrata Biswas, Bashima Islam, Qi Zhu,	(参考訳) 既存の音声拡散検出技術の多くは音響データのみに依存している。本研究では,利用可能な映像データと音声を併用した実用的なマルチモーダル・ディフルエンシ検出手法を提案する。本稿では,時間的・意味的な文脈を学習するために,重み付けモダリティ非依存エンコーダを統一した新しい融合手法を提案する。私たちのレジリエントなデザインは、推論中にビデオのモダリティが欠落することがある現実世界のシナリオに対応しています。また、両モードが完成することが保証された場合の代替核融合戦略も提示する。 5つのディフルエンシ検出タスクにわたる実験では、統合マルチモーダルアプローチがオーディオのみのアンモダル法よりも優れており、ビデオとオーディオの両モードが常に利用できる場合、平均10%(つまり10ポイント増加)、ビデオのモダリティが欠如している場合でも7%の絶対的な改善が得られている。 Most existing speech disfluency detection techniques only rely upon acoustic data. In this work, we present a practical multimodal disfluency detection approach that leverages available video data together with audio. We curate an audiovisual dataset and propose a novel fusion technique with unified weight-sharing modality-agnostic encoders to learn the temporal and semantic context. Our resilient design accommodates real-world scenarios where the video modality may sometimes be missing during inference. We also present alternative fusion strategies when both modalities are assured to be complete. In experiments across five disfluency-detection tasks, our unified multimodal approach significantly outperforms Audio-only unimodal methods, yielding an average absolute improvement of 10% (i.e., 10 percentage point increase) when both video and audio modalities are always available, and 7% even when video modality is missing in half of the samples.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# 単一モーダルからマルチモーダルへの顔ディープフェイク検出の進化:サーベイ Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey ( http://arxiv.org/abs/2406.06965v1 ) ライセンス: Link先を確認	Ping Liu, Qiqi Tao, Joey Tianyi Zhou,	(参考訳) このサーベイは、人工知能の急速な進歩の中で、ディープフェイク検出という重要な課題を扱う。ビデオ、音声、テキストを含むAI生成メディアがより現実的になるにつれて、誤情報の拡散やなりすまし詐欺に悪用されるリスクが高まる。顔中心のディープフェイクに焦点を当てたこの研究は、従来の単一モダリティ手法から、音声-視覚およびテキスト-視覚のシナリオを扱う高度なマルチモーダルアプローチへの進化を辿る。本稿では,検出手法の包括的分類法を提供し,オートエンコーダやGANから拡散モデルへの生成手法の進化を論じ,それらの特性によってこれらの技術を分類する。私たちの知る限りでは、この種のサーベイはこれが初めてである。また、新しい生成モデルに検出手法を適応させることや、ディープフェイク検出器の信頼性と堅牢性を高めることの課題を探り、今後の研究の方向性を提案する。このサーベイは研究者に詳細なロードマップを提供し、メディア生成、特に顔の偽造におけるAIの欺瞞的な利用に対抗する技術の開発を支援する。すべての関連論文のキュレートされたリストは \href{https://github.com/qiqitao77/Comprehensive-Advances-in-Deepfake-Detection-Spanning-Diverse-Modalities}{https://github.com/qiqitao77/Awesome-Comprehensive-Deepfake-Detection} にある。 This survey addresses the critical challenge of deepfake detection amidst the rapid advancements in artificial intelligence. As AI-generated media, including video, audio and text, become more realistic, the risk of misuse to spread misinformation and commit identity fraud increases. Focused on face-centric deepfakes, this work traces the evolution from traditional single-modality methods to sophisticated multi-modal approaches that handle audio-visual and text-visual scenarios. We provide comprehensive taxonomies of detection techniques, discuss the evolution of generative methods from auto-encoders and GANs to diffusion models, and categorize these technologies by their unique attributes. To our knowledge, this is the first survey of its kind. We also explore the challenges of adapting detection methods to new generative models and enhancing the reliability and robustness of deepfake detectors, proposing directions for future research. This survey offers a detailed roadmap for researchers, supporting the development of technologies to counter the deceptive use of AI in media creation, particularly facial forgery. 
A curated list of all related papers can be found at \href{https://github.com/qiqitao77/Comprehensive-Advances-in-Deepfake-Detection-Spanning-Diverse-Modalities}{https://github.com/qiqitao77/Awesome-Comprehensive-Deepfake-Detection}.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# 量子バスとのカップリングによる工学的不純物ベル状態 Engineering impurity Bell states through coupling with a quantum bath ( http://arxiv.org/abs/2406.06966v1 ) ライセンス: Link先を確認	Tran Duong Anh-Tai, Thomás Fogarty, Sergi de María-García, Thomas Busch, Miguel A. García-March,	(参考訳) 理論的には、フェシュバッハ共鳴による粒子間相互作用を制御できる能力のみを用いて、多成分超低温原子ガス中でベル状態を生成する可能性を示す。このために、2つの区別可能な不純物がいくつかのボソンの原子背景雲に浸漬され、システム全体が1次元のハーモニックトラップに閉じ込められていると考える。数値解析により,2つの不純物がボゾン浴からの相互作用により空間的に絡み合ったバイポーラロン状態を形成することを示す。本分析は, トラップの左右の粒子位置を計測することにより, 両不純物間の相関関係を2モードで計算する。種間相互作用は強い絡み合った不純物状態を作り出すために重要であるが、不純物と3体不純物-バス相関の順序によって相関を阻害することもある。これらの欠点は, 浴槽の大きさ, 質量, 種内相互作用の操作によって緩和され, 不純物-不純物相互作用の広範囲にわたって不純物ベル状態が生成できることを示す。 We theoretically demonstrate the feasibility of creating Bell states in multi-component ultra-cold atomic gases by solely using the ability to control the inter-particle interactions via Feshbach resonances. For this we consider two distinguishable impurities immersed in an atomic background cloud of a few bosons, with the entire system being confined in a one-dimensional harmonic trap. By analyzing the numerically obtained ground states we demonstrate that the two impurities can form spatially entangled bipolaron states due to mediated interactions from the bosonic bath. Our analysis is based on calculating the correlations between the two impurities in a two-mode basis, which is experimentally accessible by measuring the particles positions in the left or right sides of the trap. While interspecies interactions are crucial in order to create the strongly entangled impurity states, it can also inhibit correlations depending on the ordering of the impurities and three-body impurity-bath correlations. We show how these drawbacks can be mitigated by manipulating the properties of the bath, namely its size, mass and intraspecies interactions, allowing to create impurity Bell states over a wide range of impurity-impurity interactions.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# 対人的事例を用いた深層学習モデルの二重思考と知覚分析 Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples ( http://arxiv.org/abs/2406.06967v1 ) ライセンス: Link先を確認	Kailas Dayanandan, Anand Sinha, Brejesh Lall,	(参考訳) 二重思考フレームワークは、高速で直感的な処理と遅くて論理的な処理を考慮に入れている。視覚における双対思考の知覚は、直感的および論理的処理からの推論が異なるイメージを必要とする。本稿では、人間の視覚における二重思考の枠組みを実証するために、敵対的データセットを導入し、深層学習モデルの定性的行動の研究にも役立てる。本研究は,物体を局所化するインスタンスセグメンテーションモデルを用いて,人間の視覚の計算モデルとして分類モデルを用いた場合の重大な批判にも対処する。この証拠は、人間の視覚のインスタンスを識別する際の形状の重要性を強調し、深層学習モデルではサブコンポーネントの位置と数に関する誤りによって示されるように、サブ構造に対する理解が欠如していることを示している。さらに、モデルによるエラーと直感的なヒューマン処理の類似性は、モデルが人間の視覚における直感的な思考にのみ対応していることを示している。 The dual thinking framework considers fast, intuitive processing and slower, logical processing. The perception of dual thinking in vision requires images where inferences from intuitive and logical processing differ. We introduce an adversarial dataset to provide evidence for the dual thinking framework in human vision, which also aids in studying the qualitative behavior of deep learning models. Our study also addresses a major criticism of using classification models as computational models of human vision by using instance segmentation models that localize objects. The evidence underscores the importance of shape in identifying instances in human vision and shows that deep learning models lack an understanding of sub-structures, as indicated by errors related to the position and number of sub-components. Additionally, the similarity in errors made by models and intuitive human processing indicates that models only address intuitive thinking in human vision.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# ノルムを越えて:回帰モデルにおける予測誤差の検出 Beyond the Norms: Detecting Prediction Errors in Regression Models ( http://arxiv.org/abs/2406.06968v1 ) ライセンス: Link先を確認	Andres Altieri, Marco Romanelli, Georg Pichler, Florence Alberge, Pablo Piantanida,	(参考訳) 本稿では, 回帰アルゴリズムにおける信頼できない振る舞いを検出するという課題に取り組む。こうした振る舞いは、内在的変動(例えば, 偶然的不確実性)やモデル化誤差(例えば, モデル不確実性)から生じうる。まず、回帰における非信頼性の概念、すなわち回帰器の出力が所定の不一致(または誤差)を超える場合を正式に導入する。そして,確率的モデリングのための強力なツールを用いて不一致の密度を推定し,提案する統計的非類似度の指標を用いてその統計的多様性を測定する。これにより、回帰結果の不確実性を表すデータ駆動スコアを導出できる。複数の回帰タスクにおいてエラー検出の実証的な改善を示し、一般的なベースライン手法を一貫して上回り、不確実性定量化と安全な機械学習システムという幅広い分野に寄与する。私たちのコードはhttps://zenodo.org/records/11281964で公開されています。 This paper tackles the challenge of detecting unreliable behavior in regression algorithms, which may arise from intrinsic variability (e.g., aleatoric uncertainty) or modeling errors (e.g., model uncertainty). First, we formally introduce the notion of unreliability in regression, i.e., when the output of the regressor exceeds a specified discrepancy (or error). Then, using powerful tools for probabilistic modeling, we estimate the discrepancy density, and we measure its statistical diversity using our proposed metric for statistical dissimilarity. In turn, this allows us to derive a data-driven score that expresses the uncertainty of the regression outcome. We show empirical improvements in error detection for multiple regression tasks, consistently outperforming popular baseline approaches, and contributing to the broader field of uncertainty quantification and safe machine learning systems. Our code is available at https://zenodo.org/records/11281964.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# 未知のポーズからのマルチビューの3Dへの生成的リフティング:拡散モデル内にNeRFをラップする Generative Lifting of Multiview to 3D from Unknown Pose: Wrapping NeRF inside Diffusion ( http://arxiv.org/abs/2406.06972v1 ) ライセンス: Link先を確認	Xin Yuan, Rana Hanocka, Michael Maire,	(参考訳) 未知のポーズからの多視点再構成を生成モデリング問題として定式化する。シーンのアノテーションなし2次元画像の集合から,2次元画像入力からカメラポーズを予測するネットワークと,3次元シーンに対するNeural Radiance Field(NeRF)のパラメータの両方を同時に学習する。学習を駆動するために, ポーズ予測ネットワークとNeRFの両方をDDPM(Denoising Diffusion Probabilistic Model)内にラップし, 標準的なデノイジング目的関数を用いてシステムを訓練する。本フレームワークでは,入力された2D画像のポーズを予測し,そのポーズからNeRFをレンダリングすることで,画像をデノイズするタスクをシステムに課す。これにより,デノイジングの学習を通じて、3次元NeRF表現と画像からカメラ外部パラメータへのマッピングを同時に学習せざるを得なくなる。後者を容易にするために、ポーズを分布として表現するカスタムネットワークアーキテクチャを設計し、デノイジングのみを目的としてエンドツーエンドで訓練した場合に、ビュー間の対応を発見する暗黙的な能力を与える。この手法により,競合手法が失敗する困難なシーンに対しても,ポーズの知識なしにNeRFを構築できる。トレーニング終了後、学習したNeRFを3次元シーンモデルとして抽出して使用できるほか、システム全体を用いて新しいカメラポーズをサンプリングし、新規視点画像を生成できる。 We cast multiview reconstruction from unknown pose as a generative modeling problem. From a collection of unannotated 2D images of a scene, our approach simultaneously learns both a network to predict camera pose from 2D image input, as well as the parameters of a Neural Radiance Field (NeRF) for the 3D scene. To drive learning, we wrap both the pose prediction network and NeRF inside a Denoising Diffusion Probabilistic Model (DDPM) and train the system via the standard denoising objective. Our framework requires the system accomplish the task of denoising an input 2D image by predicting its pose and rendering the NeRF from that pose. Learning to denoise thus forces the system to concurrently learn the underlying 3D NeRF representation and a mapping from images to camera extrinsic parameters. To facilitate the latter, we design a custom network architecture to represent pose as a distribution, granting implicit capacity for discovering view correspondences when trained end-to-end for denoising alone. This technique allows our system to successfully build NeRFs, without pose knowledge, for challenging scenes where competing methods fail. 
At the conclusion of training, our learned NeRF can be extracted and used as a 3D scene model; our full system can be used to sample novel camera poses and generate novel-view images.	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# RWKV-CLIP:ロバストな視覚言語表現学習者 RWKV-CLIP: A Robust Vision-Language Representation Learner ( http://arxiv.org/abs/2406.06973v1 ) ライセンス: Link先を確認	Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng,	(参考訳) コントラスト言語-画像事前学習(CLIP)は、Webサイトから取得した画像テキストペアでデータセットを拡張することにより、様々な視覚言語タスクのパフォーマンスを著しく向上させた。本稿では、データとモデルアーキテクチャの観点からCLIPをさらに探求する。インターネットからクロールした大規模画像テキストデータの質を高めるため,Web ベースのテキスト,合成キャプション,検出タグからコンテンツを合成・洗練する大規模言語モデル (LLM) を利用した多種多様な記述生成フレームワークを導入する。さらに,変換器の効果的な並列学習とRNNの効率的な推論を組み合わせた最初のRWKV駆動型視覚言語表現学習モデルであるRWKV-CLIPを提案する。 RWKV-CLIPは、線形プローブ、ゼロショット分類、ゼロショット画像テキスト検索など、複数の下流タスクにおいて最先端のパフォーマンスを達成する。将来の研究を容易にするため、コードと事前訓練されたモデルはhttps://github.com/deepglint/RWKV-CLIPでリリースされる。 Contrastive Language-Image Pre-training (CLIP) has significantly improved performance in various vision-language tasks by expanding the dataset with image-text pairs obtained from websites. This paper further explores CLIP from the perspectives of data and model architecture. To address the prevalence of noisy data and enhance the quality of large-scale image-text data crawled from the internet, we introduce a diverse description generation framework that can leverage Large Language Models (LLMs) to synthesize and refine content from web-based texts, synthetic captions, and detection tags. Furthermore, we propose RWKV-CLIP, the first RWKV-driven vision-language representation learning model that combines the effective parallel training of transformers with the efficient inference of RNNs. Comprehensive experiments across various model scales and pre-training datasets demonstrate that RWKV-CLIP is a robust and efficient vision-language representation learner, it achieves state-of-the-art performance in several downstream tasks, including linear probe, zero-shot classification, and zero-shot image-text retrieval. To facilitate future research, the code and pre-trained models are released at https://github.com/deepglint/RWKV-CLIP	翻訳日:2024-06-12 17:25:19 公開日:2024-06-11
# TraceMesh: 分散トレースのためのスケーラブルでストリーミングのサンプリング TraceMesh: Scalable and Streaming Sampling for Distributed Traces ( http://arxiv.org/abs/2406.06975v1 ) ライセンス: Link先を確認	Zhuangbin Chen, Zhihan Jiang, Yuxin Su, Michael R. Lyu, Zibin Zheng,	(参考訳) 分散トレースは、クラウドベースおよびデータセンタシステムの監視において、基本的な要素として機能する。これは、システム依存関係とパフォーマンスボトルネックを理解するために不可欠である、複数のサービスにわたる要求やオペレーションの完全なライフサイクルを可視化する。計算とストレージのオーバーヘッドを軽減するため、ほとんどのトレースフレームワークでは、重複や冗長な情報を必然的にキャプチャする一様サンプリング戦略を採用している。より高度な手法では、より情報的なトレースに対してサンプリングをバイアスする学習ベースのアプローチを採用している。しかし, 既存の手法では, トレースデータの高次元的, 動的性質を考慮せず, トレースサンプリングの量産に不可欠である。本稿では,分散トレースのためのスケーラブルでストリーミングなサンプリングツールであるTraceMeshについて述べる。 TraceMeshはLocality-Sensitivity Hashing (LSH)を使用して、トレースを低次元空間に投影し、類似性を保ちながらサンプリング効率を向上させる。このプロセスでは、TraceMeshは、統一的で合理化された方法で、これまで見つからなかったトレース機能に対応している。その後、TraceMeshはクラスタリングを進化させ、サンプリング決定を動的に調整し、繰り返しトレースのオーバーサンプリングを避ける。提案手法は,オープンソースのマイクロサービスベンチマークと実運用サービスシステムから収集したトレースデータを用いて評価する。実験結果から,TraceMeshはサンプリング精度と効率の両面で,最先端の手法よりも優れた性能を示した。 Distributed tracing serves as a fundamental element in the monitoring of cloud-based and datacenter systems. It provides visibility into the full lifecycle of a request or operation across multiple services, which is essential for understanding system dependencies and performance bottlenecks. To mitigate computational and storage overheads, most tracing frameworks adopt a uniform sampling strategy, which inevitably captures overlapping and redundant information. More advanced methods employ learning-based approaches to bias the sampling toward more informative traces. However, existing methods fall short of considering the high-dimensional and dynamic nature of trace data, which is essential for the production deployment of trace sampling. To address these practical challenges, in this paper we present TraceMesh, a scalable and streaming sampler for distributed traces. 
TraceMesh employs Locality-Sensitivity Hashing (LSH) to improve sampling efficiency by projecting traces into a low-dimensional space while preserving their similarity. In this process, TraceMesh accommodates previously unseen trace features in a unified and streamlined way. Subsequently, TraceMesh samples traces through evolving clustering, which dynamically adjusts the sampling decision to avoid over-sampling of recurring traces. The proposed method is evaluated with trace data collected from both open-source microservice benchmarks and production service systems. Experimental results demonstrate that TraceMesh outperforms state-of-the-art methods by a significant margin in both sampling accuracy and efficiency.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# 離散辞書に基づく構造化表現学習のための分解層 Discrete Dictionary-based Decomposition Layer for Structured Representation Learning ( http://arxiv.org/abs/2406.06976v1 ) ライセンス: Link先を確認	Taewon Park, Hyun-Chul Kim, Minho Lee,	(参考訳) ニューロシンボリックニューラルネットワークは、ニューラルネットワークとシンボリック操作を統合するために広範囲に研究され、体系的な一般化が改善されている。具体的には、Tensor Product Representation (TPR)フレームワークは、ベクトル空間内のデータのシンボル構造を符号化することにより、ニューラルネットワークが識別可能なシンボル操作を実行できる。しかしながら、TPRベースのニューラルネットワークは、目に見えないデータを構造化されたTPR表現に分解するのに苦労し、その象徴的な操作を損なうことも多い。この分解問題に対処するため、TPRモデルにおける分解能力を高めるために、離散辞書に基づく分解層(D3)を提案する。 D3は離散的に学習可能なキー値辞書を使用して、分解操作に不可欠な象徴的特徴をキャプチャする。トレーニング中に得られた知識を活用して、入力データをこれらの辞書内の事前学習されたシンボル特徴にマッピングすることで構造化されたTPR表現を生成する。 D3は単純なドロップイン層であり、変更することなく任意のTPRベースのモデルにシームレスに統合できる。実験の結果,D3 は様々な TPR ベースモデルの体系的一般化を著しく改善し,追加パラメータを少なくすることを示した。特に、D3は、目に見えない組合せデータの体系的な分解を要求する合成タスクのベースラインモデルより優れている。 Neuro-symbolic neural networks have been extensively studied to integrate symbolic operations with neural networks, thereby improving systematic generalization. Specifically, Tensor Product Representation (TPR) framework enables neural networks to perform differentiable symbolic operations by encoding the symbolic structure of data within vector spaces. However, TPR-based neural networks often struggle to decompose unseen data into structured TPR representations, undermining their symbolic operations. To address this decomposition problem, we propose a Discrete Dictionary-based Decomposition (D3) layer designed to enhance the decomposition capabilities of TPR-based models. D3 employs discrete, learnable key-value dictionaries trained to capture symbolic features essential for decomposition operations. It leverages the prior knowledge acquired during training to generate structured TPR representations by mapping input data to pre-learned symbolic features within these dictionaries. D3 is a straightforward drop-in layer that can be seamlessly integrated into any TPR-based model without modifications. 
Our experimental results demonstrate that D3 significantly improves the systematic generalization of various TPR-based models while requiring fewer additional parameters. Notably, D3 outperforms baseline models on the synthetic task that demands the systematic decomposition of unseen combinatorial data.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# クラウドソーシングアノテーションのトレーニングによるクロスドメイン対応作業者選択 Cross-domain-aware Worker Selection with Training for Crowdsourced Annotation ( http://arxiv.org/abs/2406.06977v1 ) ライセンス: Link先を確認	Yushi Sun, Jiachuan Wang, Peng Cheng, Libin Zheng, Lei Chen, Jian Yin,	(参考訳) クラウドソーシングによる注釈は、労働者のプールが与えられた場合の効果的な選択方式に依存する、漸進的な注意を惹きつける。既存の手法では,2つの重要なポイントが欠落する一方,真理に基づくタスクのパフォーマンスに基づいて作業者を選択する方法が提案されている。 1)他の業務における労働者の歴史的パフォーマンス。実世界のシナリオでは、労働者はトレーニング前に知られていなかった、クロスドメインと呼ばれる以前のタスクと相関する新しいタスクを解決する必要がある。 2)労働者としてのダイナミックワーカーのパフォーマンスは,真理から学ぶ。本稿では、クロスドメイン・アウェア・ワーカー選択と呼ばれるアロケーション・スキームをトレーニング・アプローチで設計する際の2つの要因について考察する。本手法では,クロスドメイン相関を統計的に解析し,作業者の学習成果を動的にシミュレートする2つの推定モジュールを提案する。労働者の排除過程を理論的に分析した枠組みが与えられる。提案手法の有効性を検証するため,2つの新しい実世界のデータセットを収集し,合成データセットを生成する。実験の結果,本手法は実世界のデータセットと合成データセットのベースラインよりも優れていた。 Annotation through crowdsourcing draws incremental attention, which relies on an effective selection scheme given a pool of workers. Existing methods propose to select workers based on their performance on tasks with ground truth, while two important points are missed. 1) The historical performances of workers in other tasks. In real-world scenarios, workers need to solve a new task whose correlation with previous tasks is not well-known before the training, which is called cross-domain. 2) The dynamic worker performance as workers will learn from the ground truth. In this paper, we consider both factors in designing an allocation scheme named cross-domain-aware worker selection with training approach. Our approach proposes two estimation modules to both statistically analyze the cross-domain correlation and simulate the learning gain of workers dynamically. A framework with a theoretical analysis of the worker elimination process is given. To validate the effectiveness of our methods, we collect two novel real-world datasets and generate synthetic datasets. The experiment results show that our method outperforms the baselines on both real-world and synthetic datasets.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# Hydra-MDP:マルチターゲットハイドラ蒸留によるエンドツーエンドマルチモーダルプランニング Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation ( http://arxiv.org/abs/2406.06978v1 ) ライセンス: Link先を確認	Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez,	(参考訳) 教師-学生モデルに複数の教師を取り入れた新しいパラダイムであるHydra-MDPを提案する。このアプローチでは、人間とルールベースの教師の両方から知識を蒸留して学生モデルを訓練する。学生モデルは、様々な評価指標に合わせた多様な軌道候補を学習するマルチヘッドデコーダを特徴とする。ルールベースの教師の知識により、Hydra-MDPは、微分不可能な後処理に頼るのではなく、エンド・ツー・エンドの方法で環境がプランニングにどのように影響するかを学ぶ。この手法はNavsimチャレンジで1位を獲得し、様々な運転環境や条件における汎化の大幅な改善を示す。コードは \url{https://github.com/woxihuanjiangguo/Hydra-MDP} で公開予定である。 We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. With the knowledge of rule-based teachers, Hydra-MDP learns how the environment influences the planning in an end-to-end manner instead of resorting to non-differentiable post-processing. This method achieves the 1st place in the Navsim challenge, demonstrating significant improvements in generalization across diverse driving environments and conditions. Code will be available at \url{https://github.com/woxihuanjiangguo/Hydra-MDP}	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# AudioMarkBench: オーディオ透かしのロバストさのベンチマーク AudioMarkBench: Benchmarking Robustness of Audio Watermarking ( http://arxiv.org/abs/2406.06979v1 ) ライセンス: Link先を確認	Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, Neil Zhenqiang Gong,	(参考訳) 合成音声のリアリズムの増大は、音声合成モデルの進歩によって引き起こされ、偽造や偽情報に関する倫理的懸念が高まる。オーディオ透かしは、人間の知覚できない透かしをAI生成オーディオに埋め込むことで、有望なソリューションを提供する。しかし,音声透かしの対外的摂動に対する頑健性はいまだに実証されていない。本稿では,透かし除去と透かし偽造に対する音響透かしの堅牢性を評価するための最初の体系的ベンチマークであるAudioMarkBenchを紹介する。 AudioMarkBenchには、言語、生物学的性、年齢、最先端の3つの透かし方法、そして15種類の摂動を含むCommon-Voiceから作成された新しいデータセットが含まれている。ノーボックス,ブラックボックス,ホワイトボックスの設定における摂動に対して,これらの手法の頑健さをベンチマークする。以上の結果から,従来の透かし手法の脆弱性を強調し,より堅牢で公正な透かしソリューションの必要性を強調した。私たちのデータセットとコードは、 \url{https://github.com/moyangkuo/AudioMarkBench}で公開されています。 The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios. However, the robustness of audio watermarking against common/adversarial perturbations remains understudied. We present AudioMarkBench, the first systematic benchmark for evaluating the robustness of audio watermarking against watermark removal and watermark forgery. AudioMarkBench includes a new dataset created from Common-Voice across languages, biological sexes, and ages, 3 state-of-the-art watermarking methods, and 15 types of perturbations. We benchmark the robustness of these methods against the perturbations in no-box, black-box, and white-box settings. Our findings highlight the vulnerabilities of current watermarking techniques and emphasize the need for more robust and fair audio watermarking solutions. Our dataset and code are publicly available at \url{https://github.com/moyangkuo/AudioMarkBench}.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# マルチセット・グラフニューラルネットワークのヘルダー安定性について On the Hölder Stability of Multiset and Graph Neural Networks ( http://arxiv.org/abs/2406.06984v1 ) ライセンス: Link先を確認	Yair Davidson, Nadav Dym,	(参考訳) よく知られているように、和プーリングに基づくマルチセットニューラルネットワークは、すべての異なるマルチセットを分離することができ、その結果、メッセージパッシングニューラルネットワーク(MPNN)は、1-WLグラフ同型テストで分離可能なすべてのグラフのペアを分離することができる。しかし、この分離の質は非常に弱い可能性があり、固定された有限精度では「分離可能な」マルチセットやグラフの埋め込みが同一とみなされてしまうほどである。本研究では,リプシッツ連続性とHölder連続性をパラメトリック関数に適応させた新たな枠組みにより,マルチセットモデルとMPNNの分離品質を解析することを提案する。一般的な和ベースのモデルは下側Hölder連続であり、そのHölder指数はネットワークの深さとともに急速に減衰することを証明する。解析の結果,3回の1-WL反復で分離できるにもかかわらず,標準的な最大表現力のMPNNでは実際には分離できないグラフの敵対的例が得られた。これを改善するために,分離品質を向上した2つの新しいMPNNを提案する。そのうちの1つは下側リプシッツ連続である。これらのMPNNは、上記の敵対的例を容易に分類でき、標準的なグラフ学習タスクにおいて標準MPNNと比べて良好な結果を示す。 Famously, multiset neural networks based on sum-pooling can separate all distinct multisets, and as a result can be used by message passing neural networks (MPNNs) to separate all pairs of graphs that can be separated by the 1-WL graph isomorphism test. However, the quality of this separation may be very weak, to the extent that the embeddings of "separable" multisets and graphs might even be considered identical when using fixed finite precision. In this work, we propose to fully analyze the separation quality of multiset models and MPNNs via a novel adaptation of Lipschitz and Hölder continuity to parametric functions. We prove that common sum-based models are lower-Hölder continuous, with a Hölder exponent that decays rapidly with the network's depth. Our analysis leads to adversarial examples of graphs which can be separated by three 1-WL iterations, but cannot be separated in practice by standard maximally powerful MPNNs. To remedy this, we propose two novel MPNNs with improved separation quality, one of which is lower Lipschitz continuous. We show these MPNNs can easily classify our adversarial examples, and compare favorably with standard MPNNs on standard graph learning tasks.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# 動的車両ネットワークにおけるDNN分割,タスクオフロード,資源配分:リアプノフ誘導拡散型強化学習アプローチ DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach ( http://arxiv.org/abs/2406.06986v1 ) ライセンス: Link先を確認	Zhang Liu, Hongyang Du, Junzhe Lin, Zhibin Gao, Lianfen Huang, Seyyedali Hosseinalipour, Dusit Niyato,	(参考訳) 人工知能(AI)の急速な進歩は、車載ネットワークのエコシステムにディープニューラルネットワーク(DNN)ベースのタスクを導入した。これらのタスクは計算集約的であり、1台の車両の能力を超える相当な計算資源を必要とすることが多い。この課題に対処するため、Vehicular Edge Computing(VEC)がソリューションとして登場し、V2V/V2I通信によるリソースプールを通じてDNNベースのタスクのコンピューティングサービスを提供する。本稿では,VECにおける共同DNNパーティショニング,タスクオフロード,リソース割り当ての問題を,動的長期最適化として定式化する。我々の目標は、時間とともにシステムの安定性を保証しながら、DNNベースのタスク完了時間を最小化することである。この目的のために、我々はまずLyapunov最適化手法を利用して、安定性の制約による元の長期最適化をスロットごとの決定論的問題に分離する。その後,マルチエージェント拡散に基づく深層強化学習(MAD2RL)アルゴリズムを提案する。さらに,コンベックス最適化手法をサブルーチンとしてMAD2RLに統合し,計算資源の割り当てを行い,学習効率を向上させる。実世界の車両移動軌跡のシミュレーションを通じて,提案アルゴリズムの既存のベンチマーク手法と比較して優れた性能を示す。 The rapid advancement of Artificial Intelligence (AI) has introduced Deep Neural Network (DNN)-based tasks to the ecosystem of vehicular networks. These tasks are often computation-intensive, requiring substantial computation resources, which are beyond the capability of a single vehicle. To address this challenge, Vehicular Edge Computing (VEC) has emerged as a solution, offering computing services for DNN-based tasks through resource pooling via Vehicle-to-Vehicle/Infrastructure (V2V/V2I) communications. In this paper, we formulate the problem of joint DNN partitioning, task offloading, and resource allocation in VEC as a dynamic long-term optimization. Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time. To this end, we first leverage a Lyapunov optimization technique to decouple the original long-term optimization with stability constraints into a per-slot deterministic problem. 
Afterwards, we propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models to determine the optimal DNN partitioning and task offloading decisions. Furthermore, we integrate convex optimization techniques into MAD2RL as a subroutine to allocate computation resources, enhancing the learning efficiency. Through simulations under real-world movement traces of vehicles, we demonstrate the superior performance of our proposed algorithm compared to existing benchmark solutions.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# ポジションペーパー:効果的なAIガバナンスには技術研究とタレントが必要だ Position Paper: Technical Research and Talent is Needed for Effective AI Governance ( http://arxiv.org/abs/2406.06987v1 ) ライセンス: Link先を確認	Anka Reuel, Lisa Soder, Ben Bucknall, Trond Arne Undheim,	(参考訳) AI能力の最近の進歩と、AIシステムの社会への広範な統合により、世界中の政府は、規制やその他のガバナンスツールを通じて、これらの技術に関連する潜在的な害とリスクを軽減しようとしている。しかしながら、ガバナンスの願望と、その実現に必要な技術ツールの現在の状態の間には、大きなギャップがあります。本稿では、EU、米国、中国の公共機関が発行する政策文書を調査し、提案された政策行動の実施に必要な技術的要件と現在の技術状況との間の特定分野の切り離しを明らかにする。我々の分析は、AIガバナンスにおけるAI/ML研究コミュニティのより緊密な統合を求める動機となっている。一規制措置の現在と想定される技術的基盤のギャップを埋めることを目的とした技術研究の触媒となること。二管理機関内の技術専門知識のレベルを高めて、AIの効果的なガバナンスを通知し、指導すること。 In light of recent advancements in AI capabilities and the increasingly widespread integration of AI systems into society, governments worldwide are actively seeking to mitigate the potential harms and risks associated with these technologies through regulation and other governance tools. However, there exist significant gaps between governance aspirations and the current state of the technical tooling necessary for their realisation. In this position paper, we survey policy documents published by public-sector institutions in the EU, US, and China to highlight specific areas of disconnect between the technical requirements necessary for enacting proposed policy actions, and the current technical state of the art. Our analysis motivates a call for tighter integration of the AI/ML research community within AI governance in order to i) catalyse technical research aimed at bridging the gap between current and supposed technical underpinnings of regulatory action, as well as ii) increase the level of technical expertise within governing institutions so as to inform and guide effective governance of AI.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# 有界集合における正則化量子運動:ヒルベルト的側面 Regularized quantum motion in a bounded set: Hilbertian aspects ( http://arxiv.org/abs/2406.06989v1 ) ライセンス: Link先を確認	Fabio Bagarello, Jean-Pierre Gazeau, Camillo Trapani,	(参考訳) 直線上の有界区間を(ディリクレ境界条件付きで)運動する粒子について、位置作用素に正準共役な運動量作用素は本質的に自己共役ではなく、自己共役拡張の連続的な族を持つことが知られている。我々は, 考慮する区間の指示関数を近似する正の有界関数で運動量作用素を対称的に重み付けすることにより, 本質的自己共役性を回復できることを証明する。この重み付き運動量作用素は、関数や超関数のいわゆるワイル=ハイゼンベルク共変積分量子化を通じて、同様に重み付けされた古典運動量から整合的に得られる。 It is known that the momentum operator canonically conjugated to the position operator for a particle moving in some bounded interval of the line (with Dirichlet boundary conditions) is not essentially self-adjoint: it has a continuous set of self-adjoint extensions. We prove that essential self-adjointness can be recovered by symmetrically weighting the momentum operator with a positive bounded function approximating the indicator function of the considered interval. This weighted momentum operator is consistently obtained from a similarly weighted classical momentum through the so-called Weyl-Heisenberg covariant integral quantization of functions or distributions.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# 不確かさの教育 : 物体検出における知識蒸留の可能性 Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection ( http://arxiv.org/abs/2406.06999v1 ) ライセンス: Link先を確認	Junfei Yi, Jianxu Mao, Tengfei Liu, Mingjie Li, Hanyu Gu, Hui Zhang, Xiaojun Chang, Yaonan Wang,	(参考訳) 知識蒸留(KD)は、オブジェクト検出タスクにおけるモデル圧縮に広く採用され、有効な方法である。特に特徴に基づく蒸留法は顕著な性能を示した。既存のアプローチは、データノイズと不完全なトレーニングに由来する教師モデルの知識の不確実さを無視することが多い。これは、教師の不完全な指導に過度に依存する可能性があるため、学生モデルが潜伏した知識を学ぶ能力を制限する。本稿では,既存の蒸留法とシームレスに統合可能な,不確実性推定・識別的知識抽出・知識伝達(UET)と呼ばれる,物体検出の知識の不確実性を有する特徴量に基づく新しい蒸留パラダイムを提案する。モンテカルロのドロップアウト技術を活用することで,学生モデルのトレーニングプロセスに知識の不確実性を導入し,潜伏した知識のより深い探索を容易にする。本手法は,複雑な構造や計算資源を必要とせずに,KDプロセス中に効果的に機能する。大規模実験により, 種々の蒸留方法, 検出器, バックボーンアーキテクチャにまたがる提案手法の有効性が検証された。具体的には、提案したパラダイムに従って、既存のFGD法は最先端(SoTA)性能を実現し、ResNet50ベースのGFLは、COCOデータセット上で44.1%のmAPを達成し、ベースラインを3.9%上回る。 Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowledge, as it may overly rely on the teacher's imperfect guidance. In this paper, we propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection, termed "Uncertainty Estimation-Discriminative Knowledge Extraction-Knowledge Transfer (UET)", which can seamlessly integrate with existing distillation methods. By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model, facilitating deeper exploration of latent knowledge. Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources. 
Extensive experiments validate the effectiveness of our proposed approach across various distillation strategies, detectors, and backbone architectures. Specifically, following our proposed paradigm, the existing FGD method achieves state-of-the-art (SoTA) performance, with ResNet50-based GFL achieving 44.1% mAP on the COCO dataset, surpassing the baselines by 3.9%.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# 大規模言語モデルにおけるテキスト分類における境界曖昧さと継承バイアスの緩和 Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models ( http://arxiv.org/abs/2406.07001v1 ) ライセンス: Link先を確認	Zhenyi Lu, Jie Tian, Wei Wei, Xiaoye Qu, Yu Cheng, Wenfeng xie, Dangyang Chen,	(参考訳) テキスト分類は実践的なシナリオで頻繁に発生する重要な課題であるが、大きな言語モデル(LLM)の時代にはまだ解明されていない。本研究は,LLMがテキスト分類におけるオプションの数や配置の変化に対して脆弱であることを示す。我々の広範な実証分析により、重要なボトルネックは不明瞭な決定境界と特定のトークンや位置に対する固有のバイアスから生じることが明らかになった。これらの問題を緩和するため,LLMのための新しい2段階分類フレームワークを提案する。我々のアプローチは、ペア比較が境界のあいまいさと固有のバイアスを効果的に緩和できるという経験的観察に基づいている。具体的には、多数の選択肢を効率的に絞り込む自己還元手法から始め、決定空間の削減とより高速な比較プロセスに寄与する。その後、相互に対照的な比較がチェーン・オブ・シントで行われ、ニュアンスを引き出し、不確定な選択肢を区別し、曖昧な決定境界を洗練させる。 4つのデータセット(Banking77, HWU64, LIU54, クリニック150)の大規模な実験により, 本フレームワークの有効性が検証された。さらに、我々のフレームワークの利点は、様々なLLMが一貫した改善を達成できることです。我々のコードとデータは \url{https://github.com/Chuge0335/PC-CoT} で利用可能です。 Text classification is a crucial task encountered frequently in practical scenarios, yet it is still under-explored in the era of large language models (LLMs). This study shows that LLMs are vulnerable to changes in the number and arrangement of options in text classification. Our extensive empirical analyses reveal that the key bottleneck arises from ambiguous decision boundaries and inherent biases towards specific tokens and positions. To mitigate these issues, we make the first attempt and propose a novel two-stage classification framework for LLMs. Our approach is grounded in the empirical observation that pairwise comparisons can effectively alleviate boundary ambiguity and inherent bias. Specifically, we begin with a self-reduction technique to efficiently narrow down numerous options, which contributes to reduced decision space and a faster comparison process. Subsequently, pairwise contrastive comparisons are employed in a chain-of-thought manner to draw out nuances and distinguish confusable options, thus refining the ambiguous decision boundary. 
Extensive experiments on four datasets (Banking77, HWU64, LIU54, and Clinic150) verify the effectiveness of our framework. Furthermore, benefitting from our framework, various LLMs can achieve consistent improvements. Our code and data are available at \url{https://github.com/Chuge0335/PC-CoT}.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
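The two-stage idea in the abstract above (first narrow the label set, then decide by pairwise comparison) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: `toy_score` and `toy_prefer` are hypothetical word-overlap stand-ins for LLM calls, and the shortlist size `k` is an assumed parameter.

```python
def self_reduce(text, options, score, k=3):
    """Stage 1: keep the k options that the scorer rates highest."""
    ranked = sorted(options, key=lambda o: score(text, o), reverse=True)
    return ranked[:k]


def pairwise_select(text, candidates, prefer):
    """Stage 2: compare candidates two at a time, keeping each winner."""
    winner = candidates[0]
    for challenger in candidates[1:]:
        winner = prefer(text, winner, challenger)
    return winner


def toy_score(text, option):
    """Hypothetical stand-in for an LLM relevance score: shared words."""
    words = set(text.lower().split())
    return len(words & set(option.lower().replace("_", " ").split()))


def toy_prefer(text, a, b):
    """Hypothetical stand-in for an LLM pairwise comparison prompt."""
    return a if toy_score(text, a) >= toy_score(text, b) else b


def classify(text, options, k=3):
    shortlist = self_reduce(text, options, toy_score, k)
    return pairwise_select(text, shortlist, toy_prefer)


labels = ["card_arrival", "lost_card", "exchange_rate", "top_up_failed"]
print(classify("my card is lost", labels))
```

In a real system both stand-ins would be replaced by prompts to the model; the point of the structure is that stage 2 only ever asks the model to choose between two options at a time.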
# GraphCoder: コードコンテキストグラフベースの検索と言語モデルによるリポジトリレベルのコード補完の強化 GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model ( http://arxiv.org/abs/2406.07003v1 ) ライセンス: Link先を確認	Wei Liu, Ailun Yu, Daoguang Zan, Bo Shen, Wei Zhang, Haiyan Zhao, Zhi Jin, Qianxiang Wang,	(参考訳) リポジトリレベルのコード補完のパフォーマンスは、一般的な知識とリポジトリ固有の知識の両方を効果的に活用することに依存する。一般的なコード補完タスクにおけるLLMの印象的な能力にもかかわらず、レポジトリ固有の知識が欠如しているため、レポジトリレベルのコンプリートではパフォーマンスが不十分であることが多い。この問題に対処するため,グラフベースの検索生成プロセスを通じてLLMの一般的なコード知識とリポジトリ固有の知識を活用する検索拡張コード補完フレームワークであるGraphCoderを提案する。特に、GraphCoderは、コードステートメント間の制御フロー、データ、制御依存性で構成されるコードコンテキストグラフ(CCG)を通じて、補完対象のコンテキストをより正確にキャプチャする。既存の検索拡張アプローチで使用されるシーケンスベースのコンテキストよりも、補完対象のコンテキストをキャプチャする構造的な方法である。 GraphCoderは、ベースライン検索で拡張されたメソッドと比較して、コードマッチングでは+6.06、識別子マッチでは+6.23、時間と空間では+6.23という高い精度のマッチング(EM)を達成する。 The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion due to the lack of repository-specific knowledge in these LLMs. To address this problem, we propose GraphCoder, a retrieval-augmented code completion framework that leverages LLMs' general code knowledge and the repository-specific knowledge via a graph-based retrieval-generation process. In particular, GraphCoder captures the context of completion target more accurately through code context graph (CCG) that consists of control-flow, data- and control-dependence between code statements, a more structured way to capture the completion target context than the sequence-based context used in existing retrieval-augmented approaches; based on CCG, GraphCoder further employs a coarse-to-fine retrieval process to locate context-similar code snippets with the completion target from the current repository. 
Experimental results demonstrate both the effectiveness and efficiency of GraphCoder: Compared to baseline retrieval-augmented methods, GraphCoder achieves higher exact match (EM) on average, with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# DecoR:ロバスト回帰による時系列の分離 DecoR: Deconfounding Time Series with Robust Regression ( http://arxiv.org/abs/2406.07005v1 ) ライセンス: Link先を確認	Felix Schur, Jonas Peters,	(参考訳) 時系列データに対する因果推論は、特に保存されていない共同創設者の存在において、難しい問題である。この研究は、第3の観測されていない時系列によって構成される2つの時系列間の因果効果を推定することに焦点を当てている。共同創設者のスペクトル空間を仮定すると、周波数領域においてこの問題が対向外乱問題としてフレーム化されることを示す。本稿では、周波数領域における頑健な線形回帰を用いて因果効果を推定する新しいアプローチである、ロバスト回帰(DecoR)によるデコンウンディングを導入する。 2つの異なるロバスト回帰手法を考慮し、まず、そのような手法の推定誤差に関する既存の境界を改良する。重要なことに、我々の結果は共変量に対する分布的な仮定を必要としない。そのため、時系列設定で使用できます。これらの結果をDecoRに適用し、適切な仮定の下で、一貫性を示唆するDecoRの推定誤差の上限を証明した。合成データを用いた実験により,DecoRの有効性を示す。さらに,本手法はモデル不特定性に対して頑健であることが示唆された。 Causal inference on time series data is a challenging problem, especially in the presence of unobserved confounders. This work focuses on estimating the causal effect between two time series, which are confounded by a third, unobserved time series. Assuming spectral sparsity of the confounder, we show how in the frequency domain this problem can be framed as an adversarial outlier problem. We introduce Deconfounding by Robust regression (DecoR), a novel approach that estimates the causal effect using robust linear regression in the frequency domain. Considering two different robust regression techniques, we first improve existing bounds on the estimation error for such techniques. Crucially, our results do not require distributional assumptions on the covariates. We can therefore use them in time series settings. Applying these results to DecoR, we prove, under suitable assumptions, upper bounds for the estimation error of DecoR that imply consistency. We show DecoR's effectiveness through experiments on synthetic data. Our experiments furthermore suggest that our method is robust with respect to model misspecification.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# MIPI 2024 RAW画像デノナイズにおける課題:方法と結果 MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results ( http://arxiv.org/abs/2406.07006v1 ) ライセンス: Link先を確認	Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, Wenhan Luo, Zikun Liu, Mingde Qiao, Junjun Jiang, Kui Jiang, Yao Xiao, Chuyang Sun, Jinhui Hu, Weijian Ruan, Yubo Dong, Kai Chen, Hyejeong Jo, Jiahao Qin, Bingjie Han, Pinle Qin, Rui Chai, Pengyuan Wang,	(参考訳) モバイルプラットフォームでの計算写真や画像の需要が増大し、カメラシステムにおける高度な画像センサと新しいアルゴリズムの広範な開発と統合がもたらされた。しかし、研究のための高品質なデータの不足と、産業や学界からの深い見解交換の機会は、モバイル・インテリジェント・フォトグラフィー・イメージング(MIPI)の開発を妨げている。我々は,ECCV 2022とCVPR 2023で行われたMIPIワークショップの成果に基づいて,新しい画像センサと撮像アルゴリズムに着目した3つのトラックを含む第3回MIPIチャレンジを紹介した。本稿では,MIPI 2024のFew-shot RAW Image Denoising Trackについて概説する。合計で165人の参加者が登録され、7チームが最終テストフェーズで結果を提出した。この課題で開発されたソリューションは、Few-shot RAW Image Denoisingにおける最先端の性能を達成した。この課題の詳細とデータセットへのリンクはhttps://mipichallenge.org/MIPI2024.comで見ることができる。 The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Few-shot RAW Image Denoising track on MIPI 2024. 
In total, 165 participants were successfully registered, and 7 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Few-shot RAW Image Denoising. More details of this challenge and the link to the dataset can be found at https://mipichallenge.org/MIPI2024.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# Crayon: Instant Adapter BlendingとEdge-Server Hybrid Inferenceによるオンデバイス LLM のカスタマイズ Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference ( http://arxiv.org/abs/2406.07007v1 ) ライセンス: Link先を確認	Jihwan Bang, Juntae Lee, Kyuhong Shim, Seunghan Yang, Simyung Chang,	(参考訳) ユーザ指定タスクに対する大規模言語モデル(LLM)のカスタマイズが重要になる。しかしながら、クラウドサーバ上でカスタマイズされたLLMをすべて維持することは、メモリと計算上のオーバーヘッドを大幅に増加させ、ユーザデータをアップロードしてもプライバシー上の懸念につながる可能性がある。オンデバイスLLMは、これらの問題を緩和することで、有望なソリューションを提供することができる。しかし、オンデバイスLLMの性能は、小規模モデルの限界によって本質的に制限されている。これらの制約を克服するために、私たちはまず、デバイス上でのLLMカスタマイズのための新しいアプローチであるCrayonを提案する。 Crayonはまず、多様なベースアダプタのプールを構築し、その後すぐにそれを、余分なトレーニングなしでカスタマイズされたアダプタにブレンドします。さらに、より要求の多いクエリや非カスタマイズタスクをサーバ上のより大きな、より有能なLLMに確実に割り当てるデバイスサーバハイブリッド推論戦略を開発する。これにより、デバイス上のカスタマイズのメリットを犠牲にすることなく、最適なパフォーマンスが保証される。複数の質問応答データセットから新しいベンチマークを慎重に作成し,LLMのカスタマイズにおける手法の有効性を示す。 The customization of large language models (LLMs) for user-specified tasks is becoming important. However, maintaining all the customized LLMs on cloud servers incurs substantial memory and computational overheads, and uploading user data can also lead to privacy concerns. On-device LLMs can offer a promising solution by mitigating these issues. Yet, the performance of on-device LLMs is inherently constrained by the limitations of small-scaled models. To overcome these restrictions, we first propose Crayon, a novel approach for on-device LLM customization. Crayon begins by constructing a pool of diverse base adapters, and then we instantly blend them into a customized adapter without extra training. In addition, we develop a device-server hybrid inference strategy, which deftly allocates more demanding queries or non-customized tasks to a larger, more capable LLM on a server. This ensures optimal performance without sacrificing the benefits of on-device customization. We carefully craft a novel benchmark from multiple question-answer datasets, and show the efficacy of our method in the LLM customization.	翻訳日:2024-06-12 17:13:54 公開日:2024-06-11
# 眼の目:拡散モデルにおける意味的対応による出現伝達 Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models ( http://arxiv.org/abs/2406.07008v1 ) ライセンス: Link先を確認	Sooyeon Go, Kyungmook Choi, Minjung Shin, Youngjung Uh,	(参考訳) 事前訓練されたテキスト・画像拡散モデルが画像合成の有用なツールとなったため、様々な方法で結果の特定が望まれる。本稿では,対象画像の同じ構造を持つ結果を生成する手法を提案する。例えば、結果翼は基準主翼から色を取り、基準主翼ではない。既存のメソッドは、自己アテンション層内のクエリキーの類似性に依存し、通常は欠陥のある結果を生成する。そこで本研究では,意味的対応を見つけ,意味的対応に従って特徴を明示的に並べ替えることを提案する。対象の構造を保存し、2つの画像が整列していない場合でも、意味的対応に従って参照から色を反映するなど、様々な面で本手法の優位性を示す。 As pretrained text-to-image diffusion models have become a useful tool for image synthesis, people want to specify the results in various ways. In this paper, we introduce a method to produce results with the same structure of a target image but painted with colors from a reference image, i.e., appearance transfer, especially following the semantic correspondence between the result and the reference. E.g., the result wing takes color from the reference wing, not the reference head. Existing methods rely on the query-key similarity within self-attention layer, usually producing defective results. To this end, we propose to find semantic correspondences and explicitly rearrange the features according to the semantic correspondences. Extensive experiments show the superiority of our method in various aspects: preserving the structure of the target and reflecting the color from the reference according to the semantic correspondences, even when the two images are not aligned.	翻訳日:2024-06-12 17:04:10 公開日:2024-06-11
# ゆらぎのある虚ゲージ場における位相相転移 Topological phase transition in fluctuating imaginary gauge fields ( http://arxiv.org/abs/2406.07009v1 ) ライセンス: Link先を確認	Bikashkali Midya,	(参考訳) 非エルミート格子モデルにおける正確な可解性と点ギャップ位相遷移について検討する。これらのモデルは、サイト依存の非相互ホッピング$J e^{\pm g_n}$を含み、空間的に変動する虚ゲージ場$ig_n \hat{x}$によって促進される。適切な虚ゲージ変換を用いることで、任意の$g_n$で特徴づけられる格子は、開境界条件下で、場の格子のない格子とスペクトル的に等価であることが明らかとなる。さらに、閉境界を持つシステムは、一様平均場 $i\bar{g}\hat{x}$ を特徴とするスペクトル的に等価な格子に単純化することができる。この枠組みは、結合非周期格子に対するスペクトルトポロジカル不変性と関連する境界局在化現象を解析的に予測する包括的手法を提供する。これらの予測はゲージ変換したアイソスペクトル周期格子を解析することによって行われる。特に、準周期的な$g_n= \ln |\lambda \cos 2\pi \alpha n|$ と不合理な$\alpha$ を持つ格子に対して、以前は知られていなかった位相位相遷移が明らかにされる。トポロジカルスペクトル指数$W$は、$-N$または$+N$の値を仮定し、すべての$N$開境界固有状態が右端か左端に局在し、ゲージ場の強度にのみ依存する。位相遷移は、すべての固有状態が非局在化される臨界点$\lambda\approx2$で特定される。この理論は長距離ホッピングモデルや高次元に関係があることが示されている。 We investigate the exact solvability and point-gap topological phase transitions in non-Hermitian lattice models. These models incorporate site-dependent nonreciprocal hoppings $J e^{\pm g_n}$, facilitated by a spatially fluctuating imaginary gauge field $ig_n \hat{x}$ that disrupts translational symmetry. By employing suitable imaginary gauge transformations, it is revealed that a lattice characterized by any given $g_n$ is spectrally equivalent to a lattice devoid of fields, under open boundary conditions. Furthermore, a system with closed boundaries can be simplified to a spectrally equivalent lattice featuring a uniform mean field $i\bar{g}\hat{x}$. This framework offers a comprehensive method for analytically predicting spectral topological invariance and associated boundary localization phenomena for bond-disordered nonperiodic lattices. These predictions are made by analyzing gauge-transformed isospectral periodic lattices. Notably, for a lattice with quasiperiodic $g_n= \ln |\lambda \cos 2\pi \alpha n|$ and an irrational $\alpha$, a previously unknown topological phase transition is unveiled. 
It is observed that the topological spectral index $W$ assumes values of $-N$ or $+N$, leading to all $N$ open-boundary eigenstates localizing either at the right or left edge, solely dependent on the strength of the gauge field, where $\lambda<2$ or $\lambda>2$. A phase transition is identified at the critical point $\lambda\approx2$, at which all eigenstates undergo delocalization. The theory has been shown to be relevant for long-range hopping models and for higher dimensions.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
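The gauge-transformation argument in the abstract above can be made explicit for a nearest-neighbour chain of $N$ sites. The following is a sketch under an assumed sign convention (amplitude $J e^{g_n}$ for hopping from site $n$ to $n+1$ and $J e^{-g_n}$ for the reverse), which the abstract itself does not fix; the symbols $S$ and $\phi_n$ are introduced here for illustration only.

```latex
% Open boundaries: a diagonal similarity transformation removes the field.
H = \sum_{n=1}^{N-1} J \left( e^{g_n}\, |n+1\rangle\langle n| + e^{-g_n}\, |n\rangle\langle n+1| \right),
\qquad
S = \mathrm{diag}\left( e^{\phi_1}, \dots, e^{\phi_N} \right),
\quad
\phi_n = \sum_{m<n} g_m .

\left( S^{-1} H S \right)_{n+1,n} = J\, e^{\,g_n - (\phi_{n+1} - \phi_n)} = J,
\qquad
\left( S^{-1} H S \right)_{n,n+1} = J\, e^{\,-g_n + (\phi_{n+1} - \phi_n)} = J .

% Closed boundaries (extra bond between sites N and 1): the total field
% cannot be removed, but it can be spread uniformly over all bonds.
\bar{g} = \frac{1}{N} \sum_{m=1}^{N} g_m,
\qquad
\phi_n = \sum_{m<n} \left( g_m - \bar{g} \right)
\;\Longrightarrow\;
\left( S^{-1} H S \right)_{n+1,n} = J\, e^{\bar{g}},
\quad
\left( S^{-1} H S \right)_{n,n+1} = J\, e^{-\bar{g}}
\quad \text{for every bond, including } N \to 1 .
```

Because $S$ is invertible, $H$ and $S^{-1} H S$ share the same spectrum in both cases, which is the sense in which the disordered lattice is spectrally equivalent to a field-free chain (open boundaries) or to a chain with the uniform mean field $\bar{g}$ (closed boundaries).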
# 自由を破る:非協力的仮定なしで効率的な多人数のプライベート・セット・ユニオン Breaking Free: Efficient Multi-Party Private Set Union Without Non-Collusion Assumptions ( http://arxiv.org/abs/2406.07011v1 ) ライセンス: Link先を確認	Minglang Dong, Yu Chen, Cong Zhang, Yujie Bai,	(参考訳) マルチパーティ・プライベート・セット・ユニオン(MPSU)プロトコルでは、$m$$(m > 2)$パーティがそれぞれセットを持っていて、他のパーティに追加情報を公開することなく、セットのユニオンをまとめて計算することができる。 MPSUプロトコルには2つの主要なカテゴリがある。このカテゴリの既存のすべての作業は、超直線的な公開鍵操作を含み、結果として実用的効率が低下する。 2つ目は、暗黙の転送と対称キー技術に基づくものである。このカテゴリにおける唯一の既存の研究は、Liu and Gao (ASIACRYPT 2023) によって提案されている。残念なことに、これは通常の半正直なセキュリティを達成しない。したがって、標準的な半真性モデルにおいて、暗黙の転送と対称鍵技術に基づく実用的なMPSUプロトコルを構築するという問題は未解決のままである。さらに,線形計算と線形通信の複雑さを両立させるMPSUプロトコルは存在しない。本稿では、これらの2つの未解決問題を解決する。本稿では,標準半高次モデルにおいて,暗黙の転送と対称鍵技術に基づく最初のMPSUプロトコルを提案する。このプロトコルは、LAN設定でLiuやGaoよりも高速な4.9-9.3 \timesである。具体的には、当社のプロトコルはオンラインフェーズでわずか3.6ドル秒で、それぞれ2〜20ドルのアイテムがセットされている。公開鍵演算に基づく線形計算と線形通信の複雑さを両立させる最初のMPSUプロトコルを提案する。このプロトコルは通信コストが低く、Liu や Gao と比較すると、通信コストが3.0-36.5 倍になる。 Multi-party private set union (MPSU) protocol enables $m$ $(m > 2)$ parties, each holding a set, to collectively compute the union of their sets without revealing any additional information to other parties. There are two main categories of MPSU protocols: The first builds on public-key techniques. All existing works in this category involve a super-linear number of public-key operations, resulting in poor practical efficiency. The second builds on oblivious transfer and symmetric-key techniques. The only existing work in this category is proposed by Liu and Gao (ASIACRYPT 2023), which features the best concrete performance among all existing protocols, despite its super-linear computation and communication. Unfortunately, it does not achieve the standard semi-honest security, as it inherently relies on a non-collusion assumption, which is unlikely to hold in practice. Therefore, the problem of constructing a practical MPSU protocol based on oblivious transfer and symmetric-key techniques in standard semi-honest model remains open. 
Furthermore, there is no MPSU protocol achieving both linear computation and linear communication complexity, which leaves another unresolved problem. In this work, we resolve these two open problems. We propose the first MPSU protocol based on oblivious transfer and symmetric-key techniques in the standard semi-honest model. This protocol is $4.9-9.3 \times$ faster than Liu and Gao in the LAN setting. Concretely, our protocol requires only $3.6$ seconds in online phase for 3 parties with sets of $2^{20}$ items each. We propose the first MPSU protocol achieving both linear computation and linear communication complexity, based on public-key operations. This protocol has the lowest overall communication costs and shows a factor of $3.0-36.5\times$ improvement in terms of overall communication compared to Liu and Gao.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# 音声テキスト検索におけるブリッジング言語ギャップ Bridging Language Gaps in Audio-Text Retrieval ( http://arxiv.org/abs/2406.07012v1 ) ライセンス: Link先を確認	Zhiyong Yan, Heinrich Dinkel, Yongqing Wang, Jizhong Liu, Junbo Zhang, Yujun Wang, Bin Wang,	(参考訳) 音声テキスト検索は難しい作業であり、データベース内で音声クリップやテキストキャプションを検索する必要がある。英語記述に関する既存の研究の主な焦点は、実世界のデータに非英語コンテンツが豊富に存在することを考えると、そのようなモデルの適用性に制限を課している。これらの言語格差に対処するため,多言語テキストエンコーダ(SONAR)を用いて言語固有の情報でテキストデータを符号化する言語拡張(LE)を提案する。さらに、一貫したアンサンブル蒸留(CED)を適用してオーディオエンコーダを最適化し、可変長音声テキスト検索のサポートを強化する。提案手法は,AudioCaps や Clotho などの一般的なデータセット上でのSOTA (State-of-the-art) の性能を示す,英語の音声テキスト検索に優れている。同時に、この手法は、追加の言語強化トレーニングデータの10%しか持たない、他の7つの言語でのコンテンツ検索の習熟度を示し、有望な結果をもたらす。ソースコードはhttps://github.com/zyyan4/ml-clap.comで公開されている。 Audio-text retrieval is a challenging task, requiring the search for an audio clip or a text caption within a database. The predominant focus of existing research on English descriptions poses a limitation on the applicability of such models, given the abundance of non-English content in real-world data. To address these linguistic disparities, we propose a language enhancement (LE), using a multilingual text encoder (SONAR) to encode the text data with language-specific information. Additionally, we optimize the audio encoder through the application of consistent ensemble distillation (CED), enhancing support for variable-length audio-text retrieval. Our methodology excels in English audio-text retrieval, demonstrating state-of-the-art (SOTA) performance on commonly used datasets such as AudioCaps and Clotho. Simultaneously, the approach exhibits proficiency in retrieving content in seven other languages with only 10% of additional language-enhanced training data, yielding promising results. The source code is publicly available https://github.com/zyyan4/ml-clap.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# 過剰語彙による書字におけるChatGPTの活用 Delving into ChatGPT usage in academic writing through excess vocabulary ( http://arxiv.org/abs/2406.07016v1 ) ライセンス: Link先を確認	Dmitry Kobak, Rita González Márquez, Emőke-Ágnes Horvát, Jan Lause,	(参考訳) 最近の大規模言語モデル(LLM)は、人間レベルのパフォーマンスでテキストを生成・修正することができ、ChatGPTのようなシステムで広く商業化されている。これらのモデルには明確な制限があり、不正確な情報を生成し、既存のバイアスを強化し、簡単に誤用できる。しかし、多くの科学者が学術的な執筆を支援するためにそれを使ってきた。学術文献におけるLLMの利用状況についてこの問いに答えるために、学術的なLLMの使用に関する仮定を含まない、偏見のない大規模アプローチを用いる。 2010年から2024年までの1400万のPubMed抽象語の語彙変化について検討し、LLMの出現がある種の単語の出現頻度の急激な増加につながったことを示す。以上の結果から,2024の抽象語のうち少なくとも10%はLLMで処理されていたことが示唆された。この下限は分野、国、雑誌によって異なり、PubMedサブコーポラの30%にも達した。我々は,LLMをベースとした筆記助手の出現が,コビッドパンデミックなどの世界大イベントの影響を超越し,科学文献に前例のない影響を与えていることを示す。 Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How wide-spread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess words usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact in the scientific literature, surpassing the effect of major world events such as the Covid pandemic.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
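To make the notion of "excess" word usage in the abstract above concrete, here is a minimal sketch that compares a word's observed frequency with a counterfactual extrapolated from earlier years. The linear extrapolation, the function name `excess_usage`, and all numbers are assumptions for illustration; they are not taken from the paper and its exact estimator may differ.

```python
def excess_usage(freq_by_year, base_years, target_year):
    """Compare a word's observed frequency with a linear extrapolation.

    freq_by_year maps a year to the fraction of abstracts containing the word.
    Returns (expected, gap, ratio) for target_year.
    """
    first, last = base_years
    slope = (freq_by_year[last] - freq_by_year[first]) / (last - first)
    # Counterfactual: continue the earlier trend, never below zero.
    expected = max(freq_by_year[last] + slope * (target_year - last), 0.0)
    observed = freq_by_year[target_year]
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return expected, gap, ratio


# Made-up frequencies for a single hypothetical style word.
freq = {2021: 0.010, 2022: 0.012, 2024: 0.050}
expected, gap, ratio = excess_usage(freq, base_years=(2021, 2022), target_year=2024)
print(f"expected={expected:.4f} gap={gap:.4f} ratio={ratio:.3f}")
```

Under such a counterfactual, a gap of `gap` for one word means that at least roughly that fraction of abstracts contains the word beyond what the trend predicts, which is why excess frequencies can serve as a lower bound on affected abstracts.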
# MoreauPruner: 重度摂動に対する大規模言語モデルのロバストプルーニング MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations ( http://arxiv.org/abs/2406.07017v1 ) ライセンス: Link先を確認	Zixiao Wang, Jingwei Zhang, Wenqian Zhao, Farzan Farnia, Bei Yu,	(参考訳) モデルの重みは静的な値と見なされ、潜在的な重みの摂動の影響は考慮されない。しかし、広く使われている大規模言語モデル(LLM)は、数十億のモデルパラメータを持ち、数発の勾配プルーニングの脆弱性を高める可能性がある。本研究では, モデル重みに対する摂動下での一発勾配解析アルゴリズムが不安定な結果をもたらす可能性を実験的に示す。そして、データフォーマットbfloat16とfloat16を切り替えるという小さなエラーは、大きく異なる結果をもたらす可能性がある。このような不安定性に対処するために、最適化解析を活用し、重量摂動に対する頑健性を示す、MoreauPrunerと呼ばれるLCM構造解析手法を提案する。 MoreauPrunerでは、ニューラルネットワークのMoreauエンベロープに基づいてモデルの重み重みを推定する。我々は、LLaMA-7B、LLaMA-13B、LLaMA3-8B、Vicuna-7Bなど、よく知られたLLM上でMoreauPrunerアルゴリズムを広範囲に評価した。以上の結果から,MoreauPrunerの重量摂動に対する頑健さが示唆され,MoreauPrunerの精度に基づくスコアが既存プルーニング法と比較された。私たちは、コードを \url{https://github.com/ShiningSord/MoreauPruner} でリリースしました。 Few-shot gradient methods have been extensively utilized in existing model pruning methods, where the model weights are regarded as static values and the effects of potential weight perturbations are not considered. However, the widely used large language models (LLMs) have several billion model parameters, which could increase the fragility of few-shot gradient pruning. In this work, we experimentally show that one-shot gradient pruning algorithms could lead to unstable results under perturbations to model weights. And the minor error of switching between data formats bfloat16 and float16 could result in drastically different outcomes. To address such instabilities, we leverage optimization analysis and propose an LLM structural pruning method, called MoreauPruner, with provable robustness against weight perturbations. In MoreauPruner, the model weight importance is estimated based on the neural network's Moreau envelope, which can be flexibly combined with $\ell_1$-norm regularization techniques to induce the sparsity required in the pruning task. 
We extensively evaluate the MoreauPruner algorithm on several well-known LLMs, including LLaMA-7B, LLaMA-13B, LLaMA3-8B, and Vicuna-7B. Our numerical results suggest that MoreauPruner is robust against weight perturbations and indicate that it attains successful accuracy-based scores in comparison to several existing pruning methods. We have released the code at \url{https://github.com/ShiningSord/MoreauPruner}.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
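For readers unfamiliar with the Moreau envelope mentioned in the abstract above, its textbook definition for a function $f$ of the weights $w$ with smoothing parameter $\rho > 0$ is given below. The notation is assumed here for illustration; the abstract does not specify the paper's parameterization, how the non-convex network loss is handled, or how the $\ell_1$ term enters.

```latex
M_{\rho} f(w) = \min_{v} \; f(v) + \frac{1}{2\rho} \lVert v - w \rVert_2^2,
\qquad
\operatorname{prox}_{\rho f}(w) = \arg\min_{v} \; f(v) + \frac{1}{2\rho} \lVert v - w \rVert_2^2,

% For convex f the envelope is differentiable, with gradient
\nabla M_{\rho} f(w) = \frac{1}{\rho} \left( w - \operatorname{prox}_{\rho f}(w) \right).
```

Intuitively, the envelope evaluates $f$ at the best nearby point rather than at $w$ itself, so small perturbations of $w$ (such as a change of floating-point format) move its value and gradient less than they move those of $f$.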
# テンソルランク条件付き離散潜在変数構造を学習する Learning Discrete Latent Variable Structures with Tensor Rank Conditions ( http://arxiv.org/abs/2406.07020v1 ) ライセンス: Link先を確認	Zhengming Chen, Ruichu Cai, Feng Xie, Jie Qiao, Anpeng Wu, Zijian Li, Zhifeng Hao, Kun Zhang,	(参考訳) 観測されていない離散データは、多くの科学分野においてユビキタスであり、これらの潜伏変数の因果構造を学習する方法は、データパターンを明らかにするために不可欠である。ほとんどの研究は線形潜在変数モデルに焦点を当てたり、非線型関係や複素潜在構造を含む離散データにおけるケースに対処できない潜在構造に厳密な制約を課す。これを達成するために、観測変数集合 $\mathbf{X}_p$ に対してテンソル階数条件を探索し、階数が $\mathbf{X}_p$ 内のすべての変数を d-分離する特定の条件集合 ($\mathbf{X}_p$ では不要) の最小サポートによって決定されることを示す。これにより、異なる観測変数集合上のランクを探索することで潜伏変数を特定でき、さらにいくつかの構造仮定の下で潜伏因果構造を特定できる。本手法の有効性を検証するため, 対応する同定アルゴリズムを提案し, シミュレーション実験を行った。一般に,本研究の結果は,個別潜伏変数による因果発見のための識別境界をエレガントに拡張し,潜伏変数による因果発見の適用範囲を拡大する。 Unobserved discrete data are ubiquitous in many scientific disciplines, and how to learn the causal structure of these latent variables is crucial for uncovering data patterns. Most studies focus on the linear latent variable model or impose strict constraints on latent structures, which fail to address cases in discrete data involving non-linear relationships or complex latent structures. To achieve this, we explore a tensor rank condition on contingency tables for an observed variable set $\mathbf{X}_p$, showing that the rank is determined by the minimum support of a specific conditional set (not necessary in $\mathbf{X}_p$) that d-separates all variables in $\mathbf{X}_p$. By this, one can locate the latent variable through probing the rank on different observed variables set, and further identify the latent causal structure under some structure assumptions. We present the corresponding identification algorithm and conduct simulated experiments to verify the effectiveness of our method. In general, our results elegantly extend the identification boundary for causal discovery with discrete latent variables and expand the application scope of causal discovery with latent variables.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# 大規模言語モデルを用いたテストケースシナリオ生成ツール A Tool for Test Case Scenarios Generation Using Large Language Models ( http://arxiv.org/abs/2406.07021v1 ) ライセンス: Link先を確認	Abdul Malik Sami, Zeeshan Rasheed, Muhammad Waseem, Zheying Zhang, Herda Tomas, Pekka Abrahamsson,	(参考訳) 大規模言語モデル(LLM)は、コードの生成、ソフトウェアの設計と文書化、コードコメントの追加、コードレビュー、テストスクリプトの記述など、様々なタスクでソフトウェア工学(SE)で広く使われている。しかし、テストスクリプトの作成やテストケースの自動化には、機能要件を包括的にカバーするテストスイートのドキュメントが必要である。このようなドキュメントは、特に要求とユーザ要求が進化するにつれて、制約されたスコープとタイムフレーム内で徹底的なテストを可能にする必要があります。この記事では、エピックやハイレベルなユーザストーリーとしてユーザ要求を生成し、これらのストーリーに基づいてテストケースシナリオを作成することに焦点を当てます。 LLMベースのエージェントを採用し、ユーザ要求に対するテストケースシナリオの自動生成をエンジニアリングに促す、Webベースのソフトウェアツールを紹介している。 Large Language Models (LLMs) are widely used in Software Engineering (SE) for various tasks, including generating code, designing and documenting software, adding code comments, reviewing code, and writing test scripts. However, creating test scripts or automating test cases demands test suite documentation that comprehensively covers functional requirements. Such documentation must enable thorough testing within a constrained scope and timeframe, particularly as requirements and user demands evolve. This article centers on generating user requirements as epics and high-level user stories and crafting test case scenarios based on these stories. It introduces a web-based software tool that employs an LLM-based agent and prompt engineering to automate the generation of test case scenarios against user requirements.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# LiSD:LiDARセグメンテーションと検出のための効率的なマルチタスク学習フレームワーク LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection ( http://arxiv.org/abs/2406.07023v1 ) ライセンス: Link先を確認	Jiahua Xu, Si Zuo, Chenfeng Wei, Wei Zhou,	(参考訳) 自動運転の急速な普及に伴い、ライダーベースの3Dセマンティックセグメンテーションとオブジェクト検出手法の研究に焦点が当てられ、交通参加者の安全確保が図られている。近年、学習に基づくアプローチが出現し、従来のアルゴリズムと比較して顕著なパフォーマンス向上が見られた。しかし、分割と検出のタスクは、伝統的に最高の精度を達成するために、分離して検討されてきた。そこで本研究では,分割処理と検出処理の両方に対応可能なLiSDというマルチタスク学習フレームワークを提案する。提案するLiSDはボクセルベースのエンコーダデコーダフレームワークである。セグメンテーションにおける空間性を維持するために異なる統合手法が採用され、検出時のクエリ初期化のための機能を強化している。さらに、クロスタスク情報をインスタンス対応リファインメントモジュールで利用して、より正確な予測を得る。 nuScenesデータセットとWaymo Open Datasetの実験結果から,提案モデルの有効性が示された。 LiSDは、lidar-onlyメソッドのnuScenesセグメンテーションベンチマークにおいて、83.3% mIoUの最先端のパフォーマンスを達成することに注意する必要がある。 With the rapid proliferation of autonomous driving, there has been a heightened focus on the research of lidar-based 3D semantic segmentation and object detection methodologies, aiming to ensure the safety of traffic participants. In recent decades, learning-based approaches have emerged, demonstrating remarkable performance gains in comparison to conventional algorithms. However, the segmentation and detection tasks have traditionally been examined in isolation to achieve the best precision. To this end, we propose an efficient multi-task learning framework named LiSD which can address both segmentation and detection tasks, aiming to optimize the overall performance. Our proposed LiSD is a voxel-based encoder-decoder framework that contains a hierarchical feature collaboration module and a holistic information aggregation module. Different integration methods are adopted to keep sparsity in segmentation while densifying features for query initialization in detection. Besides, cross-task information is utilized in an instance-aware refinement module to obtain more accurate predictions. Experimental results on the nuScenes dataset and Waymo Open Dataset demonstrate the effectiveness of our proposed model. 
It is worth noting that LiSD achieves the state-of-the-art performance of 83.3% mIoU on the nuScenes segmentation benchmark for lidar-only methods.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# 薬物発見のための大規模言語モデルを用いたエントロピー強化計画 Entropy-Reinforced Planning with Large Language Models for Drug Discovery ( http://arxiv.org/abs/2406.07025v1 ) ライセンス: Link先を確認	Xuefeng Liu, Chih-chan Tien, Peng Ding, Songhao Jiang, Rick L. Stevens,	(参考訳) 薬物発見の目的は、特定の医薬特性を有する化合物を結合標的に向けて同定することである。既存の大規模言語モデル(LLMs)は、分子生成の可能性の観点から高いトークンマッチングスコアを得ることができる。しかし、LLMの復号化のみに依存すると、単一の誤用トークンによる無効な分子の生成や、LLMの以前の経験による不均衡な探索とエクスプロイトによる準最適分子の生成が生じることが多い。本稿では, エントロピー強化型トランスフォーマーデコーディングのためのERP, Entropy-Reinforced Planning for Transformer Decodingを提案する。 ERPはTransformerから直接サンプリングするよりも、複数のプロパティの改善を目指している。我々はSARS-CoV-2ウイルス (3CLPro) とヒト癌細胞標的タンパク質 (RTCB) のベンチマークでERPを評価し,両ベンチマークとも,ERPは現状のアルゴリズムを1～5%,ベースラインを5～10パーセント上回っていることを示した。さらに、この改善は、異なる目的でトレーニングされたTransformerモデル間で堅牢である。最後に、ERPの機能をさらに説明するために、私たちはアルゴリズムを3つのコード生成ベンチマークでテストし、現在の最先端アプローチよりも優れています。私たちのコードは、https://github.com/xuefeng-cs/ERP で公開されています。 The objective of drug discovery is to identify chemical compounds that possess specific pharmaceutical properties toward a binding target. Existing large language models (LLMs) can achieve high token matching scores in terms of likelihood for molecule generation. However, relying solely on LLM decoding often results in the generation of molecules that are either invalid due to a single misused token, or suboptimal due to unbalanced exploration and exploitation as a consequence of the LLMs' prior experience. Here we propose ERP, Entropy-Reinforced Planning for Transformer Decoding, which employs an entropy-reinforced planning algorithm to enhance the Transformer decoding process and strike a balance between exploitation and exploration. ERP aims to achieve improvements in multiple properties compared to direct sampling from the Transformer. 
We evaluated ERP on the SARS-CoV-2 virus (3CLPro) and human cancer cell target protein (RTCB) benchmarks and demonstrated that, in both benchmarks, ERP consistently outperforms the current state-of-the-art algorithm by 1-5 percent, and baselines by 5-10 percent, respectively. Moreover, such improvement is robust across Transformer models trained with different objectives. Finally, to further illustrate the capabilities of ERP, we tested our algorithm on three code generation benchmarks and outperformed the current state-of-the-art approach as well. Our code is publicly available at: https://github.com/xuefeng-cs/ERP.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# 長期データセットを用いたニューラルネットワーク探索のための不均一学習率スケジューリング Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets ( http://arxiv.org/abs/2406.07028v1 ) ライセンス: Link先を確認	Chenxia Tang,	(参考訳) 本稿では,ニューラルネットワーク探索(NAS)アルゴリズム,特に微分可能なアーキテクチャ探索(DARTS)を,クラス分布が高度に不均衡な長いデータセットに適用することの課題に対処する。従来の再サンプリングおよび再重み付け技術は,標準分類タスクに有効であり,DARTSと組み合わせることで性能劣化を招いた。そこで本研究では,不均衡なデータセットを扱うために,バイラテラル分岐ネットワーク(BBN)と統合されたDARTSのアーキテクチャパラメータに適した適応型学習率スケジューリング手法を提案する。提案手法は,訓練の後期において,訓練の時期に応じて,アーキテクチャパラメータの学習率を動的に調整し,よく訓練された表現の破壊を防止する。さらに,アルゴリズムの性能に及ぼす分岐混合因子の影響についても検討する。 CIFAR-10データセットの長期分布を用いた広範囲な実験により,本手法がDARTSに匹敵する精度を達成できることを実証した。実験結果から,再サンプリング法はDARTSアルゴリズムの性能を本質的に損なうことが示唆された。そこで本研究では,DNASを不均衡な学習シナリオに適用する際の注意的データ拡張の重要性を強調した。 In this paper, we attempt to address the challenge of applying Neural Architecture Search (NAS) algorithms, specifically the Differentiable Architecture Search (DARTS), to long-tailed datasets where class distribution is highly imbalanced. We observe that traditional re-sampling and re-weighting techniques, which are effective in standard classification tasks, lead to performance degradation when combined with DARTS. To mitigate this, we propose a novel adaptive learning rate scheduling strategy tailored for the architecture parameters of DARTS when integrated with the Bilateral Branch Network (BBN) for handling imbalanced datasets. Our approach dynamically adjusts the learning rate of the architecture parameters based on the training epoch, preventing the disruption of well-trained representations in the later stages of training. Additionally, we explore the impact of branch mixing factors on the algorithm's performance. Through extensive experiments on the CIFAR-10 dataset with an artificially induced long-tailed distribution, we demonstrate that our method achieves comparable accuracy to using DARTS alone. And the experiment results suggest that re-sampling methods inherently harm the performance of the DARTS algorithm. 
Our findings highlight the importance of careful data augmentation when applying DNAS to imbalanced learning scenarios.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# ナッシュバーゲティングによるフェアネスを考慮したメタラーニング Fairness-Aware Meta-Learning via Nash Bargaining ( http://arxiv.org/abs/2406.07029v1 ) ライセンス: Link先を確認	Yi Zeng, Xuelin Yang, Li Chen, Cristian Canton Ferrer, Ming Jin, Michael I. Jordan, Ruoxi Jia,	(参考訳) 機械学習におけるグループレベルの公平性の問題に対処するために、特定の公平性目標に基づいてモデルパラメータを調整することが自然である。このような調整手順をメタラーニングフレームワーク内にキャストすることができる。しかし、メタラーニングによる公平性目標の自然な統合は、サブグループの過度な対立を引き起こし、不安定な収束とモデル性能と公正性の妥協をもたらす。この問題をナビゲートするために,マルチプレイヤー協調交渉ゲームとして,過度な競合の解消を図った。 2段階のメタラーニングフレームワークを導入し、第1段階は、過度な矛盾を解消し、パレートフロントに向けてモデルを操り、第2段階は特定の公正性目標に関して最適化する、ナッシュバーゲティングソリューション(NBS)を使用する。提案手法は理論的な結果,特に線形独立仮定から解放された勾配凝集のNBSの証明,パレート改善の証明,検証損失の単調改善の証明によって支持される。また、6つのキーフェアネスデータセットと2つの画像分類タスクにおいて、様々なフェアネス目標に対して経験的効果を示す。 To address issues of group-level fairness in machine learning, it is natural to adjust model parameters based on specific fairness objectives over a sensitive-attributed validation set. Such an adjustment procedure can be cast within a meta-learning framework. However, naive integration of fairness goals via meta-learning can cause hypergradient conflicts for subgroups, resulting in unstable convergence and compromising model performance and fairness. To navigate this issue, we frame the resolution of hypergradient conflicts as a multi-player cooperative bargaining game. We introduce a two-stage meta-learning framework in which the first stage involves the use of a Nash Bargaining Solution (NBS) to resolve hypergradient conflicts and steer the model toward the Pareto front, and the second stage optimizes with respect to specific fairness goals. Our method is supported by theoretical results, notably a proof of the NBS for gradient aggregation free from linear independence assumptions, a proof of Pareto improvement, and a proof of monotonic improvement in validation loss. We also show empirical effects across various fairness objectives in six key fairness datasets and two image classification tasks.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# RS-DFM: 下流タスクをリモートセンシングする分散ファンデーションモデル RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks ( http://arxiv.org/abs/2406.07032v1 ) ライセンス: Link先を確認	Zhechao Wang, Peirui Cheng, Pengju Tian, Yuchao Wang, Mingxin Chen, Shujing Duan, Zhirui Wang, Xinming Li, Xian Sun,	(参考訳) リモートセンシング軽量基盤モデルは、リモートセンシングにおけるオンライン認識において顕著な成功を収めた。しかし、それらの能力は、自身の観測とモデルのみに基づいてオンライン推論を実行することに限定されており、大規模なリモートセンシングシナリオの包括的理解が欠如している。この制限を克服するために,汎用情報マッピングとインタラクションに基づくリモートセンシング分散ファンデーションモデル(RS-DFM)を提案する。このモデルは、観測結果を統一された空間にマッピングし、タスクに依存しない情報インタラクション戦略を実装することで、複数のプラットフォームおよび様々な下流タスク間での協調認識を実現することができる。具体的には、リモートセンシング斜め観測の地表面の幾何学的先行を利用して、特徴マッピングを絶対深度推定から相対深度推定に変換し、様々な高さと視点で一般化された特徴を抽出する能力を向上させる。さらに,高頻度・低周波の特徴情報を分離し,重要なタスク非依存の詳細を保存しつつ,特徴レベルの圧縮を実現するためのデュアルブランチ情報圧縮モジュールを提案する。本研究では,マルチUAV共同観測のためのマルチタスクシミュレーションデータセットAirCo-MultiTasksを開発した。また,3次元物体検出,インスタンスセグメンテーション,軌道予測など,広範囲にわたる実験を行った。多数の結果から,我々のRS-DFMは,様々なダウンストリームタスクにおける最先端性能を実現していることが示された。 Remote sensing lightweight foundation models have achieved notable success in online perception within remote sensing. However, their capabilities are restricted to performing online inference solely based on their own observations and models, thus lacking a comprehensive understanding of large-scale remote sensing scenarios. To overcome this limitation, we propose a Remote Sensing Distributed Foundation Model (RS-DFM) based on generalized information mapping and interaction. This model can realize online collaborative perception across multiple platforms and various downstream tasks by mapping observations into a unified space and implementing a task-agnostic information interaction strategy. Specifically, we leverage the ground-based geometric prior of remote sensing oblique observations to transform the feature mapping from absolute depth estimation to relative depth estimation, thereby enhancing the model's ability to extract generalized features across diverse heights and perspectives. 
Additionally, we present a dual-branch information compression module to decouple high-frequency and low-frequency feature information, achieving feature-level compression while preserving essential task-agnostic details. In support of our research, we create a multi-task simulation dataset named AirCo-MultiTasks for multi-UAV collaborative observation. We also conduct extensive experiments, including 3D object detection, instance segmentation, and trajectory prediction. The numerous results demonstrate that our RS-DFM achieves state-of-the-art performance across various downstream tasks.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# 文脈認識型クエリ表現学習による知識グラフのマルチホップ論理推論の改善 Improving Multi-hop Logical Reasoning in Knowledge Graphs with Context-Aware Query Representation Learning ( http://arxiv.org/abs/2406.07034v1 ) ライセンス: Link先を確認	Jeonghoon Kim, Heesoo Jung, Hyeju Jang, Hogun Park,	(参考訳) 知識グラフに対するマルチホップ論理的推論は自然言語処理において重要な課題であり、FOL(First-Order Logic)クエリに答えるための多くのアプローチがある。最近の幾何学(例えば、ボックス、コーン)と確率(例えば、ベータ分布)に基づく方法論は、複雑なFOLクエリに効果的に対処している。しかし、これらの手法に共通する課題は、これらのクエリの正確な幾何学的境界や確率パラメータを決定することである。この課題は、既存の手法が計算グラフ内の線形なシーケンシャルな操作に依存しており、クエリの論理構造と、クエリの関係から得られる関係誘導情報(これをクエリのコンテキストと呼ぶ)を見落としているために生じる。この問題を解決するために、FOLクエリグラフのコンテキストを完全に統合することにより、既存のマルチホップ論理推論手法の有効性を高めるモデルに依存しない手法を提案する。提案手法は,(1)クエリ構造に固有の構造的コンテキスト,(2)クエリグラフの各ノードに固有の関係的コンテキストを,対応する知識グラフに記述したものとして識別する。このデュアルコンテキストパラダイムは、クエリグラフ内のノードがマルチホップ推論ステップ全体を通して洗練された内部表現を実現するのに役立つ。 2つのデータセットの実験を通じて、我々の手法は3つのマルチホップ推論基盤モデルを一貫して強化し、最大19.5%の性能向上を実現した。私たちのコードは https://github.com/kjh9503/caqr から入手可能です。 Multi-hop logical reasoning on knowledge graphs is a pivotal task in natural language processing, with numerous approaches aiming to answer First-Order Logic (FOL) queries. Recent geometry (e.g., box, cone) and probability (e.g., beta distribution)-based methodologies have effectively addressed complex FOL queries. However, a common challenge across these methods lies in determining accurate geometric bounds or probability parameters for these queries. The challenge arises because existing methods rely on linear sequential operations within their computation graphs, overlooking the logical structure of the query and the relation-induced information that can be gleaned from the relations of the query, which we call the context of the query. To address the problem, we propose a model-agnostic methodology that enhances the effectiveness of existing multi-hop logical reasoning approaches by fully integrating the context of the FOL query graph. 
Our approach distinctively discerns (1) the structural context inherent to the query structure and (2) the relation-induced context unique to each node in the query graph as delineated in the corresponding knowledge graph. This dual-context paradigm helps nodes within a query graph attain refined internal representations throughout the multi-hop reasoning steps. Through experiments on two datasets, our method consistently enhances the three multi-hop reasoning foundation models, achieving performance improvements of up to 19.5%. Our code is available at https://github.com/kjh9503/caqr.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# ソースコンテキストにもっと注意を払う - 大規模言語モデルからの不誠実な翻訳の軽減 Paying More Attention to Source Context: Mitigating Unfaithful Translations from Large Language Model ( http://arxiv.org/abs/2406.07036v1 ) ライセンス: Link先を確認	Hongbin Zhang, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang,	(参考訳) 大規模言語モデル(LLM)は、印象的な多言語機械翻訳能力を示した。しかし、エンコーダ-デコーダスタイルのモデルとは異なり、デコーダのみのLLMはソースとターゲットのコンテキストの間に明確なアライメントを欠いている。生成過程におけるコントリビューションスコアの分析により、LLMは対応するソーストークンよりも以前に生成されたトークンに偏る場合があることが判明し、これが不誠実な翻訳につながっている。この問題に対処するために、ゼロショットプロンプトにおけるソースとターゲットの両方の観点から、LLMがソースコンテキストにもっと注意を払うよう促すことを提案する。 1) ソースコンテキストの注意重みを調整する。 2)無関係なターゲットプレフィックスの影響を抑える。さらに, 3)命令チューニングにおいて、ターゲットプレフィックスへの過度な依存を避けることを提案する。 LLMが生成した不誠実な翻訳に焦点をあてた人手収集の不誠実性テストセットと、一般的なテストセットの両方による実験結果から、複数の言語対にわたる手法の有効性を検証した。さらに人的評価は,幻覚翻訳の削減と忠実な翻訳生成の促進に有効であることを示す。 Large language models (LLMs) have showcased impressive multilingual machine translation ability. However, unlike encoder-decoder style models, decoder-only LLMs lack an explicit alignment between source and target contexts. Analyzing contribution scores during generation processes revealed that LLMs can be biased towards previously generated tokens over corresponding source tokens, leading to unfaithful translations. To address this issue, we propose to encourage LLMs to pay more attention to the source context from both source and target perspectives in zero-shot prompting: 1) adjust source context attention weights; 2) suppress irrelevant target prefix influence. Additionally, we propose 3) avoiding over-reliance on the target prefix in instruction tuning. Experimental results from both human-collected unfaithfulness test sets focusing on LLM-generated unfaithful translations and general test sets verify our methods' effectiveness across multiple language pairs. Further human evaluation shows our method's efficacy in reducing hallucinatory translations and facilitating faithful translation generation.	翻訳日:2024-06-12 17:04:09 公開日:2024-06-11
# PanoSSC: 自律走行のための単眼パノプティック3Dシーンの再構築 PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving ( http://arxiv.org/abs/2406.07037v1 ) ライセンス: Link先を確認	Yining Shi, Jiusi Li, Kun Jiang, Ke Wang, Yunlong Wang, Mengmeng Yang, Diange Yang,	(参考訳) 視覚中心の占有ネットワークは、周囲の環境を均一なボクセルとセマンティクスで表現しており、形状や遮蔽によらず障害物を検出できるため、カメラのみの自律運転認識システムの安全運転の新たなトレンドとなっている。現代の占有ネットワークは主に、ボクセルのセマンティックな予測によって、物体表面から見えるボクセルを再構築することに焦点を当てている。通常、1つの物体の矛盾した予測と、隣接する物体の混合予測に悩まされる。これらの混乱は下流計画モジュールの安全性を損なう可能性がある。そこで本研究では,3次元ボクセルシナリオにおけるパノプティックセグメンテーションについて検討し,PanoSSCというインスタンス対応の占有ネットワークを提案する。我々は、フォアグラウンドオブジェクトとバックグラウンドを別々に予測し、後処理で両方をマージする。前景のインスタンスグループ化のために,個々のオブジェクトを効率的に抽出できる新しい3Dインスタンスマスクデコーダを提案する。幾何学的再構成,3次元セマンティックセグメンテーション,および3次元インスタンスセグメンテーションをPanoSSCフレームワークに統合し,パノプティックボクセルの評価のための新しい指標を提案する。大規模な実験により,SemanticKITTIセマンティックシーン補完ベンチマークで競争力のある結果が得られることを示した。 Vision-centric occupancy networks, which represent the surrounding environment with uniform voxels with semantics, have become a new trend for safe driving of camera-only autonomous driving perception systems, as they are able to detect obstacles regardless of their shape and occlusion. Modern occupancy networks mainly focus on reconstructing visible voxels from object surfaces with voxel-wise semantic prediction. Usually, they suffer from inconsistent predictions of one object and mixed predictions for adjacent objects. These confusions may harm the safety of downstream planning modules. To this end, we investigate panoptic segmentation on 3D voxel scenarios and propose an instance-aware occupancy network, PanoSSC. We predict foreground objects and backgrounds separately and merge both in post-processing. For foreground instance grouping, we propose a novel 3D instance mask decoder that can efficiently extract individual objects. We unify geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the PanoSSC framework and propose new metrics for evaluating panoptic voxels. 
Extensive experiments show that our method achieves competitive results on the SemanticKITTI semantic scene completion benchmark.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# オフラインRLにおける限定データ処理のためのドメイン知識の統合 Integrating Domain Knowledge for handling Limited Data in Offline RL ( http://arxiv.org/abs/2406.07041v1 ) ライセンス: Link先を確認	Briti Gangopadhyay, Zhao Wang, Jia-Fong Yeh, Shingo Takamatsu,	(参考訳) 静的データセットから学習する機能により、オフライン強化学習(RL)が現実のアプリケーションにとって魅力的な道として登場した。しかし、最先端のオフラインRLアルゴリズムは、状態空間内の特定の領域に限定された限られたデータに直面した場合に、準最適に実行する。性能劣化は、オフラインのRLアルゴリズムが希少または未確認の観測に対して適切な動作を学習できないことに起因する。本稿では,新しいドメイン知識に基づく正規化手法を提案し,初期ドメイン知識を適応的に改良し,部分省略状態を持つ限られたデータの性能を著しく向上させる。重要な洞察は、正規化という用語が、スパースサンプルとドメイン知識によってカバーされた観測されていない状態に対する誤ったアクションを緩和するということである。標準的な離散環境データセットに対する実証的な評価は、制限されたデータで運用されている既存のオフラインRLアルゴリズムと比較して、少なくとも27%の平均的なパフォーマンス向上を示している。 With the ability to learn from static datasets, Offline Reinforcement Learning (RL) emerges as a compelling avenue for real-world applications. However, state-of-the-art offline RL algorithms perform sub-optimally when confronted with limited data confined to specific regions within the state space. The performance degradation is attributed to the inability of offline RL algorithms to learn appropriate actions for rare or unseen observations. This paper proposes a novel domain knowledge-based regularization technique and adaptively refines the initial domain knowledge to considerably boost performance in limited data with partially omitted states. The key insight is that the regularization term mitigates erroneous actions for sparse samples and unobserved states covered by domain knowledge. Empirical evaluations on standard discrete environment datasets demonstrate a substantial average performance increase of at least 27% compared to existing offline RL algorithms operating on limited data.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# EFFOcc:Efficient Fusionベースの3D Occupancy Networkのための最小ベースライン EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network ( http://arxiv.org/abs/2406.07042v1 ) ライセンス: Link先を確認	Yining Shi, Kun Jiang, Ke Wang, Kangan Qian, Yunlong Wang, Jiusi Li, Tuopu Wen, Mengmeng Yang, Yiliang Xu, Diange Yang,	(参考訳) 3D占有予測(Occ)は、運転シーンを意味を持った一様分割された3Dボクセルグリッドとして表現する、自律運転分野で急速に注目を集めている困難な認識課題である。 3次元物体検出と比較して、格子知覚は不規則な形状、未知のカテゴリー、あるいは部分的に隠蔽された一般物体をよりよく認識する利点がある。しかし、既存の3D占有ネットワーク(occnets)は計算的に重く、大量のラベルを必要とする。モデル複雑性の観点では、occnetは一般に重いConv3Dモジュールまたはボクセルレベルのトランスフォーマーで構成されている。ラベルアノテーションの要件に関しては、occnetは大規模で高価な高密度のボクセルラベルで教師あり学習されている。過剰なネットワークパラメータとラベルアノテーションの要求によって引き起こされるモデルとデータの非効率は、occnetのオンボード展開を著しく妨げている。本稿では,最先端の精度を達成しつつ,ネットワークの複雑さとラベル要件を最小限に抑える,効率的な3D占有ネットワーク(EFFOcc)を提案する。 EFFOccは単純な2D演算子のみを使用し、Occの精度をOcc3D-nuScenes、Occ3D-Waymo、OpenOccupancy-nuScenesといった大規模ベンチマークの最先端に改善する。 Occ3D-nuScenesベンチマークでは、EFFOccは18.4Mのパラメータしか持たず、平均IoU(mIoU)で50.46を達成し、我々の知る限り、関連するoccnetの中でパラメータ数が最小である。さらに,ラベル付きデータの要求量を削減するための2段階のアクティブラーニング戦略を提案する。 6%のラベル付きボクセルでトレーニングされたアクティブEFFOccは47.19 mIoUを達成し、これは完全教師ありの性能の95.7%に相当する。提案したEFFOccは、領域分解蒸留の助けを借りて、視覚のみの占有率予測の改善もサポートしている。コードとデモビデオは https://github.com/synsin0/EFFOcc で公開予定である。 3D occupancy prediction (Occ) is a rapidly rising challenging perception task in the field of autonomous driving which represents the driving scene as uniformly partitioned 3D voxel grids with semantics. Compared to 3D object detection, grid perception has the great advantage of better recognizing irregularly shaped, unknown category, or partially occluded general objects. However, existing 3D occupancy networks (occnets) are both computationally heavy and label-hungry. In terms of model complexity, occnets are commonly composed of heavy Conv3D modules or transformers on the voxel level. In terms of label annotation requirements, occnets are supervised with large-scale expensive dense voxel labels. 
Model and data inefficiency, caused by excessive network parameters and label annotation requirements, severely hinders the onboard deployment of occnets. This paper proposes an efficient 3D occupancy network (EFFOcc) that targets the minimal network complexity and label requirement while achieving state-of-the-art accuracy. EFFOcc only uses simple 2D operators, and improves Occ accuracy to the state-of-the-art on multiple large-scale benchmarks: Occ3D-nuScenes, Occ3D-Waymo, and OpenOccupancy-nuScenes. On the Occ3D-nuScenes benchmark, EFFOcc has only 18.4M parameters and achieves 50.46 mean IoU (mIoU); to our knowledge, it is the occnet with minimal parameters compared with related occnets. Moreover, we propose a two-stage active learning strategy to reduce the requirements of labelled data. Active EFFOcc trained with 6% labelled voxels achieves 47.19 mIoU, which is 95.7% of the fully supervised performance. The proposed EFFOcc also supports improved vision-only occupancy prediction with the aid of region-decomposed distillation. Code and demo videos will be available at https://github.com/synsin0/EFFOcc.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# CVPR 2024 PVUW WorkshopのMeViSトラック1位解法: Motion Expression Guided Video Segmentation 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation ( http://arxiv.org/abs/2406.07043v1 ) ライセンス: Link先を確認	Mingqi Gao, Jingnan Luo, Jinyu Yang, Jungong Han, Feng Zheng,	(参考訳) Motion Expression Guided Video Segmentation (MeViS)は、新しいタスクとして、参照ビデオオブジェクトセグメンテーション(RVOS)の分野に多くの新しい課題をもたらす。本技術報告では,この困難な設定に対して,静的支配データとフレームサンプリングの有効性について検討し,検証した。本手法は,競技段階でのJ&Fスコア0.5447を達成し,PVUWチャレンジのMeViSトラックで1位となった。コードは以下で公開されている。 https://github.com/Tapall-AI/MeViS_Track_Solution_2024 Motion Expression guided Video Segmentation (MeViS), as an emerging task, poses many new challenges to the field of referring video object segmentation (RVOS). In this technical report, we investigated and validated the effectiveness of static-dominant data and frame sampling on this challenging setting. Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge. The code is available at: https://github.com/Tapall-AI/MeViS_Track_Solution_2024.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# 連続および離散量子バスを用いた光誘起ダイナミクス Photo-induced dynamics with continuous and discrete quantum baths ( http://arxiv.org/abs/2406.07047v1 ) ライセンス: Link先を確認	Zhaoxuan Xie, Mattia Moroder, Ulrich Schollwöck, Sebastian Paeckel,	(参考訳) 複雑な分子における光物理過程の超高速量子力学は、量子化学と生物学における様々な興味深い応用を持つ、非常に難しい計算問題である。オープン量子系の最近の発展に触発されて、マルコフ埋め込みを用いて、離散的で有効なボゾン自由度の集合を通して連続環境を記述する、純粋状態にアンラベリングされたハイブリッドバス法を導入する。本手法は, 連続スペクトル密度とそこに埋め込まれた鋭いピークの双方を記述できる。これにより、離散振動モードの集合のユニタリダイナミクスを用いて長期記憶効果を捉えるか、リンドブラッドやレッドフィールドのマスター方程式によるメモリレスなマルコフ環境を用いるかのいずれかであった、従来の手法の限界を克服する。量子化学と生物学の2つのパラダイム的問題に対して,本手法をベンチマークする。ユニタリな記述と比較して、はるかに少ない数のボソニックモードでエキシトンダイナミクスを正確に記述でき、計算速度がほぼ1桁向上することを示した。さらに、光ハーベスティング複合体のスペクトル密度における$\delta$-peakの効果を明示的に考慮し、環境の長期記憶がダイナミクスに強い影響を与えることを示す。 The ultrafast quantum dynamics of photophysical processes in complex molecules is an extremely challenging computational problem with a wide variety of fascinating applications in quantum chemistry and biology. Inspired by recent developments in open quantum systems, we introduce a pure-state unraveled hybrid-bath method that describes a continuous environment via a set of discrete, effective bosonic degrees of freedom using a Markovian embedding. Our method is capable of describing both a continuous spectral density and sharp peaks embedded into it. Thereby, we overcome the limitations of previous methods, which either capture long-time memory effects using the unitary dynamics of a set of discrete vibrational modes or use memoryless Markovian environments employing a Lindblad or Redfield master equation. We benchmark our method against two paradigmatic problems from quantum chemistry and biology. We demonstrate that compared to unitary descriptions, a significantly smaller number of bosonic modes suffices to describe the excitonic dynamics accurately, yielding a computational speed-up of nearly an order of magnitude. 
Furthermore, we take into account explicitly the effect of a $\delta$-peak in the spectral density of a light-harvesting complex, demonstrating the strong impact of the long-time memory of the environment on the dynamics.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# GridPE: グリッドセルにインスパイアされたフレームワークでトランスフォーマーの位置エンコーディングを統合する GridPE: Unifying Positional Encoding in Transformers with a Grid Cell-Inspired Framework ( http://arxiv.org/abs/2406.07049v1 ) ライセンス: Link先を確認	Boyang Li, Yulin Wu, Nuoxian Huang,	(参考訳) 空間的位置と関係を理解することは、現代の人工知能システムの基本的能力である。人間の空間認知からの洞察は、この領域で貴重なガイダンスを提供する。最近の神経科学的な発見は、距離計算、経路積分、スケール識別を含む空間表現の基本的な神経成分としてのグリッド細胞の役割を強調している。本稿では,フーリエ解析と、グリッドセルに関する計算神経科学の最新知見にインスパイアされた新しい位置符号化方式を紹介する。格子セルがフーリエ基底関数の和を通じて空間位置を符号化すると仮定し、内積計算における格子表現の並進不変性を示す。さらに,生物効率の原理に基づく多次元ユークリッド空間に対する最適格子スケール比を導出する。これらの計算原理を利用して、高次元空間内の位置を符号化するGrid-cellインスパイアされたPositional Encoding(GridPE)技術を開発した。 GridPEをPyramid Vision Transformerアーキテクチャに統合しました。我々の理論解析は、GridPEが任意の高次元空間における位置符号化のための統一的なフレームワークを提供することを示している。実験により、GridPEはトランスフォーマーの性能を著しく向上させ、人工知能システムの設計に神経科学的な洞察を取り入れることの重要性を強調した。 Understanding spatial location and relationships is a fundamental capability for modern artificial intelligence systems. Insights from human spatial cognition provide valuable guidance in this domain. Recent neuroscientific discoveries have highlighted the role of grid cells as a fundamental neural component for spatial representation, including distance computation, path integration, and scale discernment. In this paper, we introduce a novel positional encoding scheme inspired by Fourier analysis and the latest findings in computational neuroscience regarding grid cells. Assuming that grid cells encode spatial position through a summation of Fourier basis functions, we demonstrate the translational invariance of the grid representation during inner product calculations. Additionally, we derive an optimal grid scale ratio for multi-dimensional Euclidean spaces based on principles of biological efficiency. Utilizing these computational principles, we have developed a Grid-cell inspired Positional Encoding technique, termed GridPE, for encoding locations within high-dimensional spaces. We integrated GridPE into the Pyramid Vision Transformer architecture. 
Our theoretical analysis shows that GridPE provides a unifying framework for positional encoding in arbitrary high-dimensional spaces. Experimental results demonstrate that GridPE significantly enhances the performance of transformers, underscoring the importance of incorporating neuroscientific insights into the design of artificial intelligence systems.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# DualMamba:ハイパースペクトル画像分類のための軽量分光・空間マンバ畳み込みネットワーク DualMamba: A Lightweight Spectral-Spatial Mamba-Convolution Network for Hyperspectral Image Classification ( http://arxiv.org/abs/2406.07050v1 ) ライセンス: Link先を確認	Jiamu Sheng, Jingyi Zhou, Jiong Wang, Peng Ye, Jiayuan Fan,	(参考訳) 複合スペクトル-空間関係のモデル化の有効性と効率性は、ハイパースペクトル画像(HSI)分類において重要である。 CNNやトランスフォーマーをベースとした既存の手法の多くは、依然として計算上の重荷に悩まされており、グローバル・ローカルなスペクトル空間的特徴表現を捉えるための改善の余地がある。そこで本研究では,HSI分類のための軽量なデュアルストリームマンバ畳み込みネットワーク(DualMamba)を提案する。具体的には,グローバルおよび局所スペクトル空間の特徴を抽出するために,並列軽量なMambaブロックとCNNブロックを開発した。まず、クロスアテンションスペクトル-空間マンバ加群は、線形複雑度におけるマンバの大域的モデリングを活用するために提案される。このモジュール内の動的位置埋め込みは、視覚的シーケンスの空間的位置情報を強化するように設計されている。軽量なスペクトル/空間的マンバブロックは、効率的な走査戦略と、グローバルなスペクトル/空間的特徴を効率的に抽出する軽量なマンバ設計からなる。また、クロスアテンションスペクトル-空間融合は、クロス相関を学習し、スペクトル-空間的特徴を融合するように設計されている。第二に、光スペクトル空間残差畳み込みモジュールは、残差学習を通して局所スペクトル空間特徴を抽出するために、光スペクトルおよび空間枝を用いて提案される。最後に, 局所スペクトル空間表現のためのグローバルマンバ特徴と局所畳み込み特徴を動的に結合する適応的グローバル局所融合を提案する。現状のHSI分類法と比較して,DualMambaが3つの公開HSIデータセットに対して有意な分類精度を実現し,モデルパラメータと浮動小数点演算(FLOP)の精度が向上したことを示す実験結果が得られた。 The effectiveness and efficiency of modeling complex spectral-spatial relations are both crucial for Hyperspectral image (HSI) classification. Most existing methods based on CNNs and transformers still suffer from heavy computational burdens and have room for improvement in capturing the global-local spectral-spatial feature representation. To this end, we propose a novel lightweight parallel design called lightweight dual-stream Mamba-convolution network (DualMamba) for HSI classification. Specifically, a parallel lightweight Mamba and CNN block are first developed to extract global and local spectral-spatial features. First, the cross-attention spectral-spatial Mamba module is proposed to leverage the global modeling of Mamba at linear complexity. Within this module, dynamic positional embedding is designed to enhance the spatial location information of visual sequences. 
The lightweight spectral/spatial Mamba blocks comprise an efficient scanning strategy and a lightweight Mamba design to efficiently extract global spectral-spatial features. And the cross-attention spectral-spatial fusion is designed to learn cross-correlation and fuse spectral-spatial features. Second, the lightweight spectral-spatial residual convolution module is proposed with lightweight spectral and spatial branches to extract local spectral-spatial features through residual learning. Finally, the adaptive global-local fusion is proposed to dynamically combine global Mamba features and local convolution features for a global-local spectral-spatial representation. Compared with state-of-the-art HSI classification methods, experimental results demonstrate that DualMamba achieves significant classification accuracy on three public HSI datasets and a superior reduction in model parameters and floating point operations (FLOPs).	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# MPSDynamics.jl:有限温度(非マルコフ)開量子系のテンソルネットワークシミュレーション MPSDynamics.jl: Tensor network simulations for finite-temperature (non-Markovian) open quantum system dynamics ( http://arxiv.org/abs/2406.07052v1 ) ライセンス: Link先を確認	Thibaut Lacroix, Brieuc Le Dé, Angela Riva, Angus J. Dunnett, Alex W. Chin,	(参考訳) MPSDynamics.jlパッケージは、ゼロ温度と有限温度でオープン量子システムシミュレーションを実行するための使いやすいインターフェースを提供する。このパッケージは、環境連鎖マッピングに基づくオルソノーマル多項式アルゴリズム(T-TEDOPA)を用いた、最先端の数値的高精度熱化時間進化密度演算子を用いて、非マルコフ開系力学の研究を目的として開発されている。シミュレーションは、行列積状態 (MPS) とツリーテンソルネットワーク (TTN) 状態として量子状態のテンソルネットワーク表現に依存している。 Juliaプログラミング言語で書かれたMPSDynamics.jlは、時間進化のためのTDVP(Time-Dependent Variational Principle)のいくつかの変種を選択できる汎用的なオープンソースパッケージである。このパッケージは、シングル・サイト・オブザーバブルとマルチ・サイト・オブザーバブルの測定、データの保存とロギングの強力なサポートも提供しており、多体物理学の研究に有用なツールとなっている。現在、長距離の相互作用、時間依存のハミルトン、複数の環境、ボソニックおよびフェルミオン環境、および連星系環境観測装置を扱っている。 The MPSDynamics.jl package provides an easy to use interface for performing open quantum systems simulations at zero and finite temperatures. The package has been developed with the aim of studying non-Markovian open system dynamics using the state-of-the-art numerically exact Thermalized-Time Evolving Density operator with Orthonormal Polynomials Algorithm (T-TEDOPA) based on environment chain mapping. The simulations rely on a tensor network representation of the quantum states as matrix product states (MPS) and tree tensor network (TTN) states. Written in the Julia programming language, MPSDynamics.jl is a versatile open-source package providing a choice of several variants of the Time-Dependent Variational Principle (TDVP) method for time evolution (including novel bond-adaptive one-site algorithms). The package also provides strong support for the measurement of single and multi-site observables, as well as the storing and logging of data, which makes it a useful tool for the study of many-body physics. 
It currently handles long-range interactions, time-dependent Hamiltonians, multiple environments, bosonic and fermionic environments, and joint system-environment observables.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# TelecomRAG: 検索拡張生成とLLMでテレコム標準を使いこなす TelecomRAG: Taming Telecom Standards with Retrieval Augmented Generation and LLMs ( http://arxiv.org/abs/2406.07053v1 ) ライセンス: Link先を確認	Girma M. Yilma, Jose A. Ayala-Romero, Andres Garcia-Saavedra, Xavier Costa-Perez,	(参考訳) 大規模言語モデル(LLM)は通信産業を変革する大きな可能性を秘めている。プロフェッショナルが複雑な標準を理解し、コードを生成し、開発を加速するのに役立つ可能性がある。しかし、従来のLLMは、通信業務に不可欠な精度と情報源の検証に苦慮している。これを解決するには、通信規格に適合した特殊なLLMベースのソリューションが必要である。 Retrieval-augmented Generation (RAG)は、正確な事実に基づく回答を生成する方法を提供する。本稿では,TelecomRAGを提案する。TelecomRAGは,正確な,詳細な,検証可能な応答を提供する通信標準アシスタントのフレームワークである。本実装では,3GPPリリース16およびリリース18仕様文書から構築した知識ベースを用いて,このアシスタントが汎用LLMを超越し,高い精度,技術的深度,検証性を提供し,通信分野にとって重要な価値を提供することを示す。 Large Language Models (LLMs) have immense potential to transform the telecommunications industry. They could help professionals understand complex standards, generate code, and accelerate development. However, traditional LLMs struggle with the precision and source verification essential for telecom work. To address this, specialized LLM-based solutions tailored to telecommunication standards are needed. Retrieval-augmented generation (RAG) offers a way to create precise, fact-based answers. This paper proposes TelecomRAG, a framework for a Telecommunication Standards Assistant that provides accurate, detailed, and verifiable responses. Our implementation, using a knowledge base built from 3GPP Release 16 and Release 18 specification documents, demonstrates how this assistant surpasses generic LLMs, offering superior accuracy, technical depth, and verifiability, and thus significant value to the telecommunications field.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# CoEvol: マルチエージェント連携によるインストラクションファインチューニングのためのより良い応答の構築 CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation ( http://arxiv.org/abs/2406.07054v1 ) ライセンス: Link先を確認	Renhao Li, Minghuan Tan, Derek F. Wong, Min Yang,	(参考訳) 近年,未知のタスクに対するモデル性能を高めるために,大規模言語モデル (LLM) における命令微調整 (IFT) が注目されている。 IFTデータの自動構築と効率的な選択の試みがなされている。しかし,従来の手法ではデータ品質向上にLLMの可能性を十分に活用できていない。 IFTデータ内の応答は、LLM自体の能力を活用することでさらに強化される可能性がある。本稿では,命令に対する応答を改善するためのLLMベースのマルチエージェント協調フレームワークであるCoEvolを提案する。応答を効果的に洗練するために,議論-助言-編集-ジャッジのパラダイムに従って反復的な枠組みを開発する。フレームワーク内での編集提案の多様性と信頼性を確保するために、2段階のマルチエージェントの議論戦略がさらに考案されている。 CoEvolを用いたモデルは,MT-BenchとAlpacaEvalによる評価で競合するベースラインを上回り,LLMの命令追従能力の向上に有効であることが実証された。 In recent years, instruction fine-tuning (IFT) on large language models (LLMs) has garnered considerable attention to enhance model performance on unseen tasks. Attempts have been made on automatic construction and effective selection for IFT data. However, we posit that previous methods have not fully harnessed the potential of LLMs for enhancing data quality. The responses within IFT data could be further enhanced by leveraging the capabilities of LLMs themselves. In this paper, we propose CoEvol, an LLM-based multi-agent cooperation framework for the improvement of responses to instructions. To effectively refine the responses, we develop an iterative framework following a debate-advise-edit-judge paradigm. A two-stage multi-agent debate strategy is further devised to ensure the diversity and reliability of editing suggestions within the framework. Empirically, models equipped with CoEvol outperform competitive baselines evaluated by MT-Bench and AlpacaEval, demonstrating its effectiveness in enhancing instruction-following capabilities for LLMs.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# プログラム可能な原子空洞系に対する適応量子最適化アルゴリズム Adaptive quantum optimization algorithms for programmable atom-cavity systems ( http://arxiv.org/abs/2406.07055v1 ) ライセンス: Link先を確認	Yuchen Luo, Xiaopeng Li, Jian Lin,	(参考訳) 短期デバイスの特定の制約に適応した量子アルゴリズムの開発は、実用的な量子優位性に向けた重要なステップである。最近の研究 (Phys. Rev. Lett. 131, 103601(2023)) において、光学キャビティ内の冷たい原子は、プログラム可能な全対全相互作用を持つ普遍量子オプティマイザとして構築でき、原子に対する効果的なハミルトニアンは、数分割問題(NPP)を直接コードする。本稿では,量子アニール法 (QA) と量子近似最適化法 (QAOA) の性能を数値的に検討し,原子量子ビットの基底状態に符号化されたNPPの解を求める。標準QAの成功確率は問題の大きさとともに急速に低下する。最適化されたアニーリングパスや不均一な駆動フィールドは、成功確率を緩やかに改善するだけである。同様に、標準QAOAは常に偽の局所最小値に閉じ込められ、量子回路の深さを増大させるほど大きな性能改善はない。反断熱駆動にインスパイアされたQAOAの適応アンサッツを提案し,NPPハミルトンのパラメータ自由度を高次反断熱項に一致させる。数値シミュレーションにより、我々の適応QAOAは、非常に小さな回路深さで最適解が得られることがわかった。したがって、QAOAパフォーマンスを改善するための追加パラメータの追加最適化コストを支払う価値がある。したがって、我々の適応QAOAは、プログラム可能な原子空洞系に対して、その量子コヒーレンス時間内での競合計算力を実証する有望な選択肢を提供する。 Developing quantum algorithms adaptive to specific constraints of near-term devices is an essential step towards practical quantum advantage. In a recent work [Phys. Rev. Lett. 131, 103601(2023)], we show cold atoms in an optical cavity can be built as a universal quantum optimizer with programmable all-to-all interactions, and the effective Hamiltonian for atoms directly encodes number partitioning problems (NPPs). Here, we numerically investigate the performance of quantum annealing (QA) and quantum approximate optimization algorithm (QAOA) to find the solution of NPP that is encoded in the ground state of atomic qubits. We find the success probability of the standard QA decays rapidly with the problem size. The optimized annealing path or inhomogeneous driving fields only lead to mild improvement on the success probability. Similarly, the standard QAOA always gets trapped in a false local minimum, and there is no significant performance improvement as we increase the depth of the quantum circuit. 
Inspired by the counterdiabatic driving, we propose an adaptive ansatz of QAOA which releases the parameter freedom of the NPP Hamiltonian to match higher-order counterdiabatic terms. Through numerical simulations, we find that our adaptive QAOA can achieve the optimal solution within very small circuit depth. It is thus worth paying the extra optimization cost of additional parameters for improving QAOA performance. Therefore, our adaptive QAOA provides a promising choice for programmable atom-cavity systems to demonstrate competitive computational power within its quantum coherence time.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# LLM用KVヘッドを効果的に圧縮する Effectively Compress KV Heads for LLM ( http://arxiv.org/abs/2406.07056v1 ) ライセンス: Link先を確認	Hao Yu, Zelan Yang, Shen Li, Yong Li, Jianxin Wu,	(参考訳) 事前訓練された大規模言語モデル(LLM)の出現は、様々な自然言語処理タスクに革命をもたらした。これらのモデルは、キーバリュー(KV)キャッシュを使用して、以前のトークンの冗長な計算を排除する自動回帰復号機構を主に採用している。それでも、コンテキスト長とバッチサイズが大きくなるにつれて、KVキャッシュのメモリフットプリントの線形拡張はLLMデプロイメントの鍵となるボトルネックとなり、生成速度が大幅に低下する。この問題を緩和するために、KVヘッドを減らし、マルチヘッドアテンション(MHA)に匹敵する精度で推論を高速化するため、MQA(Multi-query attention)やGQA(Grouped-query attention)といった従来の技術が開発されている。その効果にもかかわらず、MHAを圧縮するための既存の戦略は、しばしばKVキャッシュの固有の特性を見落としている。そこで本研究では,KVキャッシュの低ランク特性について検討し,KVヘッドを圧縮するための新しい手法を提案する。特に、圧縮誤差を最小限に抑えるためにMHA-to-GQA変換を慎重に最適化し、ロータリー位置埋め込み(RoPE)との互換性を保つために、RoPEを用いたキーキャッシュ向けの特別な戦略も導入する。提案手法は, 元のLLMに匹敵する性能を維持しつつ, KVヘッドの半分あるいは4分の3を圧縮できることを実証し, 資源制約環境下でのより効率的なLLMデプロイメントに向けた有望な方向性を示す。 The advent of pre-trained large language models (LLMs) has revolutionized various natural language processing tasks. These models predominantly employ an auto-regressive decoding mechanism that utilizes Key-Value (KV) caches to eliminate redundant calculations for previous tokens. Nevertheless, as context lengths and batch sizes increase, the linear expansion in memory footprint of KV caches becomes a key bottleneck of LLM deployment, which decreases generation speeds significantly. To mitigate this issue, previous techniques like multi-query attention (MQA) and grouped-query attention (GQA) have been developed, in order to reduce KV heads to accelerate inference with comparable accuracy to multi-head attention (MHA). Despite their effectiveness, existing strategies for compressing MHA often overlook the intrinsic properties of the KV caches. In this work, we explore the low-rank characteristics of the KV caches and propose a novel approach for compressing KV heads. 
In particular, we carefully optimize the MHA-to-GQA transformation to minimize compression error, and to remain compatible with rotary position embeddings (RoPE), we also introduce specialized strategies for key caches with RoPE. We demonstrate that our method can compress half or even three-quarters of KV heads while maintaining performance comparable to the original LLMs, which presents a promising direction for more efficient LLM deployment in resource-constrained environments.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# マルチモーダル大言語モデルの信頼性ベンチマーク : 総合的研究 Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study ( http://arxiv.org/abs/2406.07057v1 ) ライセンス: Link先を確認	Yichi Zhang, Yao Huang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Yifan Wang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu,	(参考訳) MLLM(Multimodal Large Language Models)の様々なタスクにまたがる優れた能力にもかかわらず、それらは依然として重大な信頼性の課題に直面している。しかし、信頼性の高いMLLMの評価に関する現在の文献は限定的であり、今後の改善に関する詳細な洞察を提供するための総合的な評価が欠如している。本研究では,MLLMの信頼性に関する最初の総合的かつ統一的なベンチマークであるMultiTrustを,真実性,安全性,堅牢性,公正性,プライバシの5つの面で確立する。我々のベンチマークでは、マルチモーダルリスクとクロスモーダルインパクトの両方に対処する厳格な評価戦略を採用しており、自己キュレーションしたデータセットによる32の多様なタスクをカバーしている。 21の近代MLLMによる大規模な実験は、これまで未調査だった信頼性の問題とリスクを明らかにし、マルチモーダリティによって導入された複雑さを強調し、その信頼性を高めるための高度な方法論の必要性を強調している。例えば、典型的なプロプライエタリなモデルは、視覚的に紛らわしいイメージの認識に依然として苦慮し、マルチモーダルなジェイルブレイクや敵対的攻撃に弱い。また、MLLMは、推論時に無関係なイメージと組み合わせた場合でも、テキストでプライバシを開示し、イデオロギーや文化的バイアスを示す傾向が強く、マルチモーダリティがベースLLMの内部リスクを増幅することを示している。さらに、我々は、この重要な分野における将来の進歩を促進することを目的として、標準化された信頼性研究のためのスケーラブルなツールボックスをリリースする。コードとリソースは、https://multi-trust.github.io/ で公開されている。 Despite the superior capabilities of Multimodal Large Language Models (MLLMs) across diverse tasks, they still face significant trustworthiness challenges. Yet, current literature on the assessment of trustworthy MLLMs remains limited, lacking a holistic evaluation to offer thorough insights into future improvements. In this work, we establish MultiTrust, the first comprehensive and unified benchmark on the trustworthiness of MLLMs across five primary aspects: truthfulness, safety, robustness, fairness, and privacy. Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts, encompassing 32 diverse tasks with self-curated datasets. 
Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks, highlighting the complexities introduced by the multimodality and underscoring the necessity for advanced methodologies to enhance their reliability. For instance, typical proprietary models still struggle with the perception of visually confusing images and are vulnerable to multimodal jailbreaking and adversarial attacks; MLLMs are more inclined to disclose privacy in text and reveal ideological and cultural biases even when paired with irrelevant images in inference, indicating that the multimodality amplifies the internal risks from base LLMs. Additionally, we release a scalable toolbox for standardized trustworthiness research, aiming to facilitate future advancements in this important field. Code and resources are publicly available at: https://multi-trust.github.io/.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# 自動音声認識による小学校での読み誤り検出 Reading Miscue Detection in Primary School through Automatic Speech Recognition ( http://arxiv.org/abs/2406.07060v1 ) ライセンス: Link先を確認	Lingyun Gao, Cristian Tejedor-Garcia, Helmer Strik, Catia Cucchiarini,	(参考訳) 自動読み診断システムは,読みの練習をより効率的に採点したい教師と,フィードバック付きの読みの練習により容易にアクセスしたい生徒の両方に有用である。しかし、英語以外の言語における子どもの音声を対象とした自動音声認識(ASR)の研究は限られており、ASRに基づく読み診断システムの研究も限られている。本研究は、最先端(SOTA)の事前学習済みASRモデルが、オランダ語を母語とする子どもの音声をどの程度効率的に認識し、読み誤りを検出できるかを調査する。オランダ語音声で微調整したHubert Largeは音素レベルの子ども音声認識でSOTA(PER 23.1%)を達成し、Whisper(Faster Whisper Large-v2)は単語レベルでSOTAの性能(WER 9.8%)を達成することがわかった。以上の結果から, Wav2Vec2 Large と Whisper は読み誤り検出に最適な2つの ASR モデルであることが示唆された。特に、Wav2Vec2 Largeは再現率が0.83と最も高く、Whisperは適合率が0.52と最も高く、F1スコアは0.52である。 Automatic reading diagnosis systems can benefit both teachers for more efficient scoring of reading exercises and students for accessing reading exercises with feedback more easily. However, there are limited studies on Automatic Speech Recognition (ASR) for child speech in languages other than English, and limited research on ASR-based reading diagnosis systems. This study investigates how efficiently state-of-the-art (SOTA) pretrained ASR models recognize Dutch native children speech and manage to detect reading miscues. We found that Hubert Large finetuned on Dutch speech achieves SOTA phoneme-level child speech recognition (PER at 23.1%), while Whisper (Faster Whisper Large-v2) achieves SOTA word-level performance (WER at 9.8%). Our findings suggest that Wav2Vec2 Large and Whisper are the two best ASR models for reading miscue detection. Specifically, Wav2Vec2 Large shows the highest recall at 0.83, whereas Whisper exhibits the highest precision at 0.52 and an F1 score of 0.52.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# 2.5D多重インスタンス学習による3次元病理データのトリアージと病理医による評価の支援 Triage of 3D pathology data via 2.5D multiple-instance learning to guide pathologist assessments ( http://arxiv.org/abs/2406.07061v1 ) ライセンス: Link先を確認	Gan Gao, Andrew H. Song, Fiona Wang, David Brenes, Rui Wang, Sarah S. L. Chow, Kevin W. Bishop, Lawrence D. True, Faisal Mahmood, Jonathan T. C. Liu,	(参考訳) ヒトの組織生検に基づく正確な患者の診断は、病理医が3D体積組織から切り出した薄い2D組織スライスを限られた数だけ評価する、現在の臨床実践によって妨げられている。オープントップ光シート顕微鏡のような非破壊的な3D病理の最近の進歩は、空間的に不均一な組織形態の包括的イメージングを可能にし、診断精度を向上させることができる。 3D病理の臨床導入に向けた初期の道筋として、3Dデータセットから得られる見慣れた2D H&E様の画像断面を病理医が観察し、最終診断を下すことが考えられる。しかし, 大規模3次元病理データセットの手作業による検査は不可能である。そこで本研究では,3次元生検において最もリスクの高い2Dスライスを自動的に同定し,病理医による時間効率のレビューを可能にする深層学習トリアージアプローチであるCARP3Dを提案する。生検のスライスについて,各スライス内の2Dパッチの注意に基づくアグリゲーションを行い,次に隣接するスライスをプールし,コンテキスト認識2.5Dリスクスコアを算出することにより,そのリスクを推定する。前立腺がんのリスク層化では、CARP3Dはスライスのトリアージにおいて曲線下面積(AUC)90.4%を達成し、2D断面を独立に解析する手法(AUC=81.3%)を上回る。これらの結果は、追加の深度コンテキストを統合することでモデルの識別能力を高めることを示唆している。結論として,CARP3Dは高リスクスライスを高精度にトリアージすることで,病理診断を改善する可能性を秘めている。 Accurate patient diagnoses based on human tissue biopsies are hindered by current clinical practice, where pathologists assess only a limited number of thin 2D tissue slices sectioned from 3D volumetric tissue. Recent advances in non-destructive 3D pathology, such as open-top light-sheet microscopy, enable comprehensive imaging of spatially heterogeneous tissue morphologies, offering the feasibility to improve diagnostic determinations. A potential early route towards clinical adoption for 3D pathology is to rely on pathologists for final diagnosis based on viewing familiar 2D H&E-like image sections from the 3D datasets. However, manual examination of the massive 3D pathology datasets is infeasible. To address this, we present CARP3D, a deep learning triage approach that automatically identifies the highest-risk 2D slices within 3D volumetric biopsy, enabling time-efficient review by pathologists. 
For a given slice in the biopsy, we estimate its risk by performing attention-based aggregation of 2D patches within each slice, followed by pooling of the neighboring slices to compute a context-aware 2.5D risk score. For prostate cancer risk stratification, CARP3D achieves an area under the curve (AUC) of 90.4% for triaging slices, outperforming methods relying on independent analysis of 2D sections (AUC=81.3%). These results suggest that integrating additional depth context enhances the model's discriminative capabilities. In conclusion, CARP3D has the potential to improve pathologist diagnosis via accurate triage of high-risk slices within large-volume 3D pathology datasets.	翻訳日:2024-06-12 16:54:22 公開日:2024-06-11
# 深層学習モデルを用いたオンラインデータ同化による熱帯太平洋上層海洋の再構築 Reconstructing the Tropical Pacific Upper Ocean using Online Data Assimilation with a Deep Learning model ( http://arxiv.org/abs/2406.07063v1 ) ライセンス: Link先を確認	Zilu Meng, Gregory J. Hakim,	(参考訳) トランスフォーマーアーキテクチャに基づくディープラーニング(DL)モデルは、気候モデルデータセットで訓練され、熱帯太平洋の標準線形逆モデル(LIM)と比較される。 DLモデルでは, 再解析データセットで検証した場合, LIMよりも精度の高い予測が得られた。次に,既存のサンゴのプロキシ測定を模倣するよう設計された、ノイズを含む24点の海面水温観測から,月平均の上層海洋を再構成するアンサンブルカルマンフィルタの能力を評価し,DLモデルとLIMの結果を比較した。 DLモデルでは信号が減衰するため,ハインドキャスト実験から得たノイズを付加する新しいインフレーション手法を実装した。その結果, 観測の平均化期間が1か月から1年の範囲で, DLモデルを用いて観測を同化するとLIMよりも優れた再構成が得られることがわかった。改善された再構成は、過去の観測の記憶を将来の同化時刻へ写像するDLモデルの予測能力の向上によるものである。 A deep learning (DL) model, based on a transformer architecture, is trained on a climate-model dataset and compared with a standard linear inverse model (LIM) in the tropical Pacific. We show that the DL model produces more accurate forecasts compared to the LIM when tested on a reanalysis dataset. We then assess the ability of an ensemble Kalman filter to reconstruct the monthly-averaged upper ocean from a noisy set of 24 sea-surface temperature observations designed to mimic existing coral proxy measurements, and compare results for the DL model and LIM. Due to signal damping in the DL model, we implement a novel inflation technique by adding noise from hindcast experiments. Results show that assimilating observations with the DL model yields better reconstructions than the LIM for observation averaging times ranging from one month to one year. The improved reconstruction is due to the enhanced predictive capabilities of the DL model, which map the memory of past observations to future assimilation times.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# TIM:通知システムにおける時間的相互作用モデル TIM: Temporal Interaction Model in Notification System ( http://arxiv.org/abs/2406.07067v1 ) ライセンス: Link先を確認	Huxiao Ji, Haitao Yang, Linchuan Li, Shunyu Zhang, Cunyi Zhang, Xuanping Li, Wenwu Ou,	(参考訳) 現代のモバイルアプリケーションは、日々のアクティブユーザーを獲得し、ユーザーエンゲージメントを高めるために通知システムに大きく依存している。ユーザーに積極的にリーチできるシステムでは、いつユーザーに通知を送るかを決める必要がある。多くの研究者が通知のタイミングを最適化するために研究してきたが、ユーザーの行動パターンをモデル化することなく、ユーザのコンテキストの特徴のみを利用した。さらに、これらの取り組みは個々の通知のみに焦点を当てており、複数の通知の全体的タイミングを一定期間内に最適化する研究は不足している。これらのギャップを埋めるために、短いビデオアプリケーションKuaishouにおいて、1日毎にCTRを推定することで、ユーザの行動パターンをモデル化するTIM(Temporal Interaction Model)を提案する。 TIMは、通知レシート、クリック、ウォッチタイム、効果的なビューなどの長期的なユーザ履歴のインタラクションシーケンス機能を活用し、時間的注意ユニット(TAU)を使用してユーザーの行動パターンを抽出する。さらに,混乱を最小限に抑えつつ,ユーザのエンゲージメントを向上させるために,全体的通知の時間制御を行うエレガントな戦略を提供する。オフライン実験とオンラインA/BテストによるTIMの有効性を評価する。以上の結果から,TIMはユーザの行動を予測する信頼性の高いツールであり,不適切な乱れを生じさせることなく,ユーザのエンゲージメントを著しく向上させることが示唆された。 Modern mobile applications heavily rely on the notification system to acquire daily active users and enhance user engagement. Being able to proactively reach users, the system has to decide when to send notifications to users. Although many researchers have studied optimizing the timing of sending notifications, they only utilized users' contextual features, without modeling users' behavior patterns. Additionally, these efforts only focus on individual notifications, and there is a lack of studies on optimizing the holistic timing of multiple notifications within a period. To bridge these gaps, we propose the Temporal Interaction Model (TIM), which models users' behavior patterns by estimating CTR in every time slot over a day in our short video application Kuaishou. TIM leverages long-term user historical interaction sequence features such as notification receipts, clicks, watch time and effective views, and employs a temporal attention unit (TAU) to extract user behavior patterns. 
Moreover, we provide an elegant strategy of holistic notifications send time control to improve user engagement while minimizing disruption. We evaluate the effectiveness of TIM through offline experiments and online A/B tests. The results indicate that TIM is a reliable tool for forecasting user behavior, leading to a remarkable enhancement in user engagement without causing undue disturbance.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# HalluDial: 対話レベル自動幻覚評価のための大規模ベンチマーク HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation ( http://arxiv.org/abs/2406.07070v1 ) ライセンス: Link先を確認	Wen Luo, Tianshu Shen, Wei Li, Guangyue Peng, Richeng Xuan, Houfeng Wang, Xi Yang,	(参考訳) 大規模言語モデル(LLM)は、自然言語処理(NLP)の分野を著しく進歩させ、多様なタスクにまたがる顕著なパフォーマンスを達成し、幅広い現実世界のアプリケーションを実現する。しかし、LLMは幻覚を起こす傾向があり、確立した知識と矛盾するコンテンツを生成するか、元の情報源に忠実でないコンテンツを生成する。既存の幻覚ベンチマークは、主に文レベルまたは段落レベルの幻覚検出に焦点を当てており、対話レベルの評価、幻覚の局所化、根拠の提示を無視している。また、主に事実性の幻覚を標的にする一方で忠実性の幻覚を過小評価しており、労働集約的あるいは非専門的な評価者に依存することが多い。これらの制約に対処するため,我々は,対話レベルの幻覚自動評価のための初の総合的な大規模ベンチマークであるHalluDialを提案する。 HalluDialは自発的な幻覚と誘発された幻覚の両方のシナリオを包含し、事実性と忠実性の幻覚をカバーしている。ベンチマークには4,094の対話があり、合計146,856のサンプルが含まれている。 HalluDialを活用することで、情報探索対話におけるLLMの幻覚評価能力を包括的にメタ評価し、特殊な判断言語モデルである HalluJudge を導入する。 HalluDialの高いデータ品質により、HalluJudgeは幻覚評価において優れた、あるいは競争力のあるパフォーマンスを達成でき、LLMにおける対話レベルの幻覚の自動評価を容易にし、この現象に関する貴重な洞察を提供することができる。データセットとコードは https://github.com/FlagOpen/HalluDial で公開されている。 Large Language Models (LLMs) have significantly advanced the field of Natural Language Processing (NLP), achieving remarkable performance across diverse tasks and enabling widespread real-world applications. However, LLMs are prone to hallucination, generating content that either conflicts with established knowledge or is unfaithful to the original sources. Existing hallucination benchmarks primarily focus on sentence- or passage-level hallucination detection, neglecting dialogue-level evaluation, hallucination localization, and rationale provision. They also predominantly target factuality hallucinations while underestimating faithfulness hallucinations, often relying on labor-intensive or non-specialized evaluators. To address these limitations, we propose HalluDial, the first comprehensive large-scale benchmark for automatic dialogue-level hallucination evaluation. HalluDial encompasses both spontaneous and induced hallucination scenarios, covering factuality and faithfulness hallucinations. 
The benchmark includes 4,094 dialogues with a total of 146,856 samples. Leveraging HalluDial, we conduct a comprehensive meta-evaluation of LLMs' hallucination evaluation capabilities in information-seeking dialogues and introduce a specialized judge language model, HalluJudge. The high data quality of HalluDial enables HalluJudge to achieve superior or competitive performance in hallucination evaluation, facilitating the automatic assessment of dialogue-level hallucinations in LLMs and providing valuable insights into this phenomenon. The dataset and the code are available at https://github.com/FlagOpen/HalluDial.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# ステートフルファジングは本当に難しいのか? Is Stateful Fuzzing Really Challenging? ( http://arxiv.org/abs/2406.07071v1 ) ライセンス: Link先を確認	Cristian Daniele,	(参考訳) ファジングはソフトウェアの脆弱性を見つけるのに非常に効果的であることが証明されている。ステートレスなシステムのファジングに関しては、アナリストが選択に迷うことはない。実際、過去20年間に考案された数多くのステートレスファザーの中で、AFL(およびその派生であるAFL++とLibAFL)は、その有効性、速度、バグ発見能力で際立っている。一方、ステートフルなシステムを扱う場合、どのツールが最適なのかは明らかではない。実際、研究コミュニティは、効果的で汎用的なステートフルファザーを考案(そしてベンチマーク)するのに苦労している。本稿では,ステートフルファザーの考案とベンチマークが難しい理由について論じる。 Fuzzing has been proven extremely effective in finding vulnerabilities in software. When it comes to fuzzing stateless systems, analysts have no doubts about the choice to make. In fact, among the plethora of stateless fuzzers devised in the last 20 years, AFL (with its descendants AFL++ and LibAFL) stood out for its effectiveness, speed and ability to find bugs. On the other hand, when dealing with stateful systems, it is not clear what is the best tool to use. In fact, the research community struggles to devise (and benchmark) effective and generic stateful fuzzers. In this short paper, we discuss the reasons that make stateful fuzzers difficult to devise and benchmark.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# 変分量子学習モデルの訓練可能性と脱量子化の関係について On the relation between trainability and dequantization of variational quantum learning models ( http://arxiv.org/abs/2406.07072v1 ) ライセンス: Link先を確認	Elies Gil-Fuster, Casper Gyurik, Adrián Pérez-Salinas, Vedran Dunjko,	(参考訳) 変分量子機械学習(QML)の成功の探求は、古典的な機械学習におけるニューラルネットワークの類似として、適切なパラメタライズド量子回路(PQC)の設計に依存している。成功するQMLモデルは、とりわけ訓練可能性と非脱量子化可能性の特性を満たさなければならない。最近の研究は、そのようなモデルの訓練可能性と脱量子化の間の複雑な相互作用を強調しているが、これは未解決のままである。本研究は、機械学習の観点からこの議論に寄与し、訓練可能性と非脱量子化可能性が相互に排他的でない場合の特定など、多くの結果を証明する。我々はまず、他の文献に見られるものよりやや広い、関連概念の新しい定義を与える。これらの定義は操作的な動機付けを持ち、先行研究と整合する。これらの正確な定義と動機付けのもとで、変分QMLの訓練可能性と脱量子化の関係について検討する。次に、QMLモデルの「変分性」の度合いについても論じ、ハードウェア効率の良いアンザッツや量子カーネル法のようなモデルを区別する。最後に,訓練可能かつ脱量子化不可能で、さまざまな変分性の度合いに対応するPQCベースのQMLモデルを構築するためのレシピを紹介する。このようなモデルの実用性には対処しない。しかしながら、我々の研究は、応用の発見が現実的になりうる、より一般的な構成を見つけるための道筋を示している。 The quest for successful variational quantum machine learning (QML) relies on the design of suitable parametrized quantum circuits (PQCs), as analogues to neural networks in classical machine learning. Successful QML models must fulfill the properties of trainability and non-dequantization, among others. Recent works have highlighted an intricate interplay between trainability and dequantization of such models, which is still unresolved. In this work we contribute to this debate from the perspective of machine learning, proving a number of results identifying, among others, when trainability and non-dequantization are not mutually exclusive. We begin by providing a number of new somewhat broader definitions of the relevant concepts, compared to what is found in other literature, which are operationally motivated, and consistent with prior art. With these precise definitions given and motivated, we then study the relation between trainability and dequantization of variational QML. Next, we also discuss the degrees of "variationalness" of QML models, where we distinguish between models like the hardware efficient ansatz and quantum kernel methods. 
Finally, we introduce recipes for building PQC-based QML models which are both trainable and nondequantizable, and corresponding to different degrees of variationalness. We do not address the practical utility for such models. Our work however does point toward a way forward for finding more general constructions, for which finding applications may become feasible.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# 高精度神経オンコロジーのための統合モデリングによるマルチモーダル学習 Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology ( http://arxiv.org/abs/2406.07078v1 ) ライセンス: Link先を確認	Huahui Yi, Xiaofei Wang, Kang Li, Chao Li,	(参考訳) マルチモーダル学習は、組織像とゲノム学を統合することで、顕微鏡や分子レベルでの包括的な視点で精度の高い腫瘍学を強化することを約束している。しかし、既存の手法では、より効果的な統合のために共有情報や補完情報を十分にモデル化することはできない。本研究では,階層型アテンション構造を用いた統一モデリング強化マルチモーダルラーニング(UMEML)フレームワークを導入し,ヒストロジーとゲノミクスの両モードの共通性と相補的特徴を効果的に活用する。具体的には、モダリティの不均衡から一様バイアスを緩和するために、クエリベースのクロスアテンション機構を用いて、病理エンコーダのプロトタイプクラスタリングを行う。プロトタイプの割り当てとモジュラリティ戦略は,共有機能の整合とモダリティギャップの最小化のために設計されている。学習可能なトークンを付加した登録機構を導入し、マルチモーダル統一モデリングにおけるクロスモーダルな特徴統合とロバスト性を高める。実験により、本手法はグリオーマ診断および予後タスクにおいて従来の最先端手法を上回ることが示され、精度の高い神経腫瘍学における優位性が裏付けられた。 Multimodal learning, integrating histology images and genomics, promises to enhance precision oncology with comprehensive views at microscopic and molecular levels. However, existing methods may not sufficiently model the shared or complementary information for more effective integration. In this study, we introduce a Unified Modeling Enhanced Multimodal Learning (UMEML) framework that employs a hierarchical attention structure to effectively leverage shared and complementary features of both modalities of histology and genomics. Specifically, to mitigate unimodal bias from modality imbalance, we utilize a query-based cross-attention mechanism for prototype clustering in the pathology encoder. Our prototype assignment and modularity strategy are designed to align shared features and minimize modality gaps. An additional registration mechanism with learnable tokens is introduced to enhance cross-modal feature integration and robustness in multimodal unified modeling. Our experiments demonstrate that our method surpasses previous state-of-the-art approaches in glioma diagnosis and prognosis tasks, underscoring its superiority in precision neuro-oncology.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# DARA:知識グラフに対する質問応答のための分解アライメント推論型自律言語エージェント DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs ( http://arxiv.org/abs/2406.07080v1 ) ライセンス: Link先を確認	Haishuo Fang, Xiaodan Zhu, Iryna Gurevych,	(参考訳) KGQA(Answering Questions over Knowledge Graphs)は、さまざまな実生活のアプリケーションにおいて、十分に機能する自律言語エージェントの鍵である。 KGQAにおけるLarge Language Models (LLM) を利用した言語エージェントのニューラルシンボリック推論能力を改善するために,DARA(Decomposition-Alignment-Reasoning Agent)フレームワークを提案する。 DARAは、高レベルの反復的タスク分解と低レベルのタスク基底という2つのメカニズムを通じて、質問を形式的なクエリに効果的に解析する。重要な点として、DARAは少数の高品質な推論軌道で効率的に訓練することができる。実験結果から, LLM(Llama-2-7B、Mistralなど)上で微調整したDARAは、ゼロショット評価においてさまざまなベンチマークで、GPT-4を用いた文脈内学習ベースのエージェントと他の微調整エージェントの両方を上回り、そのようなモデルを現実のアプリケーションでより利用しやすくすることがわかった。また、DARAはKGQAの最先端の列挙およびランク付けに基づく手法に匹敵する性能が得られることを示す。 Answering Questions over Knowledge Graphs (KGQA) is key to well-functioning autonomous language agents in various real-life applications. To improve the neural-symbolic reasoning capabilities of language agents powered by Large Language Models (LLMs) in KGQA, we propose the Decomposition-Alignment-Reasoning Agent (DARA) framework. DARA effectively parses questions into formal queries through a dual mechanism: high-level iterative task decomposition and low-level task grounding. Importantly, DARA can be efficiently trained with a small number of high-quality reasoning trajectories. Our experimental results demonstrate that DARA fine-tuned on LLMs (e.g. Llama-2-7B, Mistral) outperforms both in-context learning-based agents with GPT-4 and alternative fine-tuned agents, across different benchmarks in zero-shot evaluation, making such models more accessible for real-life applications. We also show that DARA attains performance comparable to state-of-the-art enumerating-and-ranking-based methods for KGQA.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# 文脈内学習を用いた文書レベル機械翻訳のための大規模言語モデルの効率的な探索 Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning ( http://arxiv.org/abs/2406.07081v1 ) ライセンス: Link先を確認	Menglong Cui, Jiangcun Du, Shaolin Zhu, Deyi Xiong,	(参考訳) 大規模言語モデル(LLM)は、文脈内学習による機械翻訳において優れた性能を示す。文レベルの翻訳とは対照的に、LLMによる文書レベルの翻訳(DOCMT)は、文脈内学習に基づく2つの大きな課題に直面している。これらの問題に対処するために,LLM がコンテキスト内学習を通じてより正確で凝集的で一貫性のある翻訳を生成できる Context-Aware Prompting Method (CAP) を提案する。 CAPは多段階の注意を考慮に入れ、関連のある文を現在の文にコンテキストとして選択し、収集した文から要約を生成する。その後、要約に最もよく似た文は、データストアからデモとして検索され、LLMを効果的にガイドし、凝集性およびコヒーレントな翻訳を生成する。我々は様々なDOCMTタスクにわたる広範囲な実験を行い、その結果、特にゼロ代名詞翻訳(ZPT)や文学翻訳タスクにおいて、我々のアプローチの有効性を実証した。 Large language models (LLMs) exhibit outstanding performance in machine translation via in-context learning. In contrast to sentence-level translation, document-level translation (DOCMT) by LLMs based on in-context learning faces two major challenges: firstly, document translations generated by LLMs are often incoherent; secondly, the length of demonstration for in-context learning is usually limited. To address these issues, we propose a Context-Aware Prompting method (CAP), which enables LLMs to generate more accurate, cohesive, and coherent translations via in-context learning. CAP takes into account multi-level attention, selects the most relevant sentences to the current one as context, and then generates a summary from these collected sentences. Subsequently, sentences most similar to the summary are retrieved from the datastore as demonstrations, which effectively guide LLMs in generating cohesive and coherent translations. We conduct extensive experiments across various DOCMT tasks, and the results demonstrate the effectiveness of our approach, particularly in zero pronoun translation (ZPT) and literary translation tasks.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# ブラックボックス変分推論における効率的な混合学習 Efficient Mixture Learning in Black-Box Variational Inference ( http://arxiv.org/abs/2406.07083v1 ) ライセンス: Link先を確認	Alexandra Hotti, Oskar Kviman, Ricky Molén, Víctor Elvira, Jens Lagergren,	(参考訳) ブラックボックス変分推論 (BBVI) における混合変分分布は, 困難な密度推定タスクにおいて顕著な結果を示した。しかし、現在、混合成分の数を拡大すると、学習可能なパラメータの数が線形に増加し、エビデンス下界(ELBO)の評価により推論時間が2次的に増加する可能性がある。私たちの2つの重要なコントリビューションは、これらの制限に対処しています。まず、入力から混合パラメータ空間へのマッピングを1ホット符号化を用いて償却する、新しいMISVAE(Multiple Importance Sampling Variational Autoencoder)を紹介する。幸いなことに、MISVAEでは、混合成分を1つ追加するごとに生じるネットワークパラメータの増加は無視できる程度である。第2に,BBVIにおける混合分布のためのELBOの新しい推定器を2つ構築し,性能への影響をわずかに抑えつつ、場合によっては性能を改善しながら、推論時間を大幅に短縮できるようにした。まとめると、我々のコントリビューションは、何百もの混合コンポーネントへのスケーラビリティを可能にし、以前のMixture VAEと比較してネットワークパラメータが少なく、短時間で優れた推定性能を提供する。 MNIST上でMISVAE実験を行い, 驚くべきSOTA結果を得た。さらに、ベイジアン系統推定を含む他のBBVI設定における推定器を実証的に検証し、8つのデータセット上でのSOTA混合モデルの推論時間を改善する。 Mixture variational distributions in black box variational inference (BBVI) have demonstrated impressive results in challenging density estimation tasks. However, currently scaling the number of mixture components can lead to a linear increase in the number of learnable parameters and a quadratic increase in inference time due to the evaluation of the evidence lower bound (ELBO). Our two key contributions address these limitations. First, we introduce the novel Multiple Importance Sampling Variational Autoencoder (MISVAE), which amortizes the mapping from input to mixture-parameter space using one-hot encodings. Fortunately, with MISVAE, each additional mixture component incurs a negligible increase in network parameters. Second, we construct two new estimators of the ELBO for mixtures in BBVI, enabling a tremendous reduction in inference time with marginal or even improved impact on performance. Collectively, our contributions enable scalability to hundreds of mixture components and provide superior estimation performance in shorter time, with fewer network parameters compared to previous Mixture VAEs. 
Experimenting with MISVAE, we achieve astonishing, SOTA results on MNIST. Furthermore, we empirically validate our estimators in other BBVI settings, including Bayesian phylogenetic inference, where we improve inference times for the SOTA mixture model on eight data sets.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# ゲーム開発における失敗解析のための大規模言語モデルの活用 Leveraging Large Language Models for Efficient Failure Analysis in Game Development ( http://arxiv.org/abs/2406.07084v1 ) ライセンス: Link先を確認	Leonardo Marini, Linus Gisslén, Alessandro Sestini,	(参考訳) ゲーム、特にソフトウェア開発の分野では、バグの早期発見が最終製品の品質を維持する上で不可欠です。自動テストは、定期的に実行することで、開発の早い段階で問題に対処できる強力なツールです。例えば、新しいコードがコードベースに提出されると、新しい自動テストがこれらの変更を検証する。しかし、テストの失敗の原因となる特定の変更を特定することは、変更のバッチを扱う場合、特にAAAゲームのような大規模なプロジェクトでは、何千人もの人々が単一のコードベースに貢献する場合には、難しくなります。本稿では,テストの失敗の原因となるコードの変更を自動的に識別する手法を提案する。このメソッドは、LLM(Large Language Models)を利用して、エラーメッセージと対応するコード変更を関連付ける。定量的および定性的な評価によるアプローチの有効性について検討する。当社のアプローチは新たに作成したデータセットで71%の精度に達しています。当社は、開発者の観点からツールの有用性とユーザビリティを評価するために、ユーザスタディを通じてモデルをさらに評価し、その結果、問題の調査に費やした時間(最大60%)を大幅に削減しました。 In games, and more generally in the field of software development, early detection of bugs is vital to maintain a high quality of the final product. Automated tests are a powerful tool that can catch a problem earlier in development by executing periodically. As an example, when new code is submitted to the code base, a new automated test verifies these changes. However, identifying the specific change responsible for a test failure becomes harder when dealing with batches of changes -- especially in the case of a large-scale project such as a AAA game, where thousands of people contribute to a single code base. This paper proposes a new approach to automatically identify which change in the code caused a test to fail. The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure. We investigate the effectiveness of our approach with quantitative and qualitative evaluations. Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year. 
We further evaluated our model through a user study to assess the utility and usability of the tool from a developer perspective, resulting in a significant reduction in time -- up to 60% -- spent investigating issues.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# CAT : 多臓器・腫瘍セグメンテーションのための解剖学的プロンプトとテキストプロンプトの協調 CAT: Coordinating Anatomical-Textual Prompts for Multi-Organ and Tumor Segmentation ( http://arxiv.org/abs/2406.07085v1 ) ライセンス: Link先を確認	Zhongzhen Huang, Yankai Jiang, Rongzhao Zhang, Shaoting Zhang, Xiaofan Zhang,	(参考訳) 既存の医用画像領域におけるプロンプト可能なセグメンテーション手法は、主にテキストまたは視覚的プロンプトで関連オブジェクトを分割するが、形状、大きさ、外観が大きく異なりうる腫瘍のような医学画像の異常に対処する際には、しばしば不足する。医学的シナリオの複雑さとテキストまたは視覚的プロンプトの限界を認識し, 視覚的およびテキスト的プロンプトの相補的強度を利用して, 様々な臓器や腫瘍を分節する新しい二重プロンプトスキーマを提案する。具体的には、3Dクロップ画像から得られる解剖学的プロンプトと、医学領域の知識で強化されたテキストプロンプトを協調させる革新的なモデルであるCATを紹介する。モデルアーキテクチャは一般的なクエリベースの設計を採用しており、プロンプトクエリはマスク予測のためのセグメンテーションクエリを容易にする。統合されたフレームワーク内で2つのタイプのプロンプトを相乗化するために,2つのタイプのプロンプトを分離しながらセグメンテーションとプロンプトクエリの両方を洗練するShareRefinerを実装した。 10のパブリックCTデータセットからなるコンソーシアムでトレーニングされたCATは、複数のセグメンテーションタスクにおいて優れたパフォーマンスを示している。特別な社内データセットのさらなる検証により、複数のがんステージにまたがる腫瘍のセグメンテーション能力が明らかになる。このアプローチは、マルチモーダルプロンプトのコーディネートが、医療領域における複雑なシナリオに対処するための有望な道であることを確認した。 Existing promptable segmentation methods in the medical imaging field primarily consider either textual or visual prompts to segment relevant objects, yet they often fall short when addressing anomalies in medical images, like tumors, which may vary greatly in shape, size, and appearance. Recognizing the complexity of medical scenarios and the limitations of textual or visual prompts, we propose a novel dual-prompt schema that leverages the complementary strengths of visual and textual prompts for segmenting various organs and tumors. Specifically, we introduce CAT, an innovative model that Coordinates Anatomical prompts derived from 3D cropped images with Textual prompts enriched by medical domain knowledge. The model architecture adopts a general query-based design, where prompt queries facilitate segmentation queries for mask prediction. To synergize two types of prompts within a unified framework, we implement a ShareRefiner, which refines both segmentation and prompt queries while disentangling the two types of prompts. 
Trained on a consortium of 10 public CT datasets, CAT demonstrates superior performance in multiple segmentation tasks. Further validation on a specialized in-house dataset reveals the remarkable capacity of segmenting tumors across multiple cancer stages. This approach confirms that coordinating multimodal prompts is a promising avenue for addressing complex scenarios in the medical domain.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# 相対論的QED計算のための2粒子基底集合上の一粒子作用素表現 One-particle operator representation over two-particle basis sets for relativistic QED computations ( http://arxiv.org/abs/2406.07086v1 ) ライセンス: Link先を確認	Péter Hollósy, Péter Jeszenszki, Edit Mátyus,	(参考訳) この研究は 2-スピン-1/2-フェルミオン相対論的量子力学に関係しており、相互作用エネルギーの良好な数値収束に必要な2つの(マニー)-粒子の「特殊相関」基底表現を用いた一粒子プロジェクターの構築に関するものである。一粒子作用素の忠実な表現は、中間的ではあるが本質的な計算段階に現れるが、物理的に関係する反対称部分空間を超えてヒルベルト空間全体を考慮し、多粒子基底上に構築できることが示されている。この発展の応用は、相関相対論的基準状態に対する量子-電気力学補正の計算と、他の2粒子投影技術が信頼できない中-高-高-Z$ヘリウム系の高精度相対論的計算を予見することができる。 This work is concerned with two-spin-1/2-fermion relativistic quantum mechanics, and it is about the construction of one-particle projectors using an inherently two(many)-particle, `explicitly correlated' basis representation, necessary for good numerical convergence of the interaction energy. It is demonstrated that a faithful representation of the one-particle operators, which appear in intermediate but essential computational steps, can be constructed over a many-particle basis set by accounting for the full Hilbert space beyond the physically relevant anti-symmetric subspace. Applications of this development can be foreseen for the computation of quantum-electrodynamics corrections for a correlated relativistic reference state and high-precision relativistic computations of medium-to-high-$Z$ helium-like systems, for which other two-particle projection techniques are unreliable.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# RS-Agent:インテリジェントエージェントによるリモートセンシングタスクの自動化 RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents ( http://arxiv.org/abs/2406.07089v1 ) ライセンス: Link先を確認	Wenjia Xu, Zijian Yu, Yixu Wang, Jiuniu Wang, Mugen Peng,	(参考訳) 近年のLarge Language Models (LLM) と Visual Language Models (VLM) の開発により,リモートセンシングタスクにおいて多くのモデルが大きなパフォーマンスを実現している。しかしながら、これらのモデルは、基本的なビジョンと言語命令チューニングタスクに制約されており、複雑なリモートセンシングアプリケーションにおいて困難に直面している。さらに、これらのモデルは専門分野の専門知識を欠いている。これらの制約に対処するため, RS-Agent という LLM 駆動の知的エージェントを提案する。まず、RS-Agentは"Central Controller"として機能する大規模言語モデル(LLM)を使用しており、様々な問題を理解し、対応することができる。第2に、RS-Agentは多くの高性能リモートセンシング画像処理ツールを統合し、マルチツールとマルチターン会話を容易にする。第3に,我々のRS-Agentは,堅牢な知識文書を活用することで,専門的な質問に答えることができる。我々は,複数のデータセット,例えば RSSDIVCS, RSVQA, DOTAv1 を用いて実験を行った。実験の結果,我々のRS-Agentは,シーン分類,視覚的質問応答,オブジェクトカウントタスクなど,多くのタスクにおいて優れたパフォーマンスを実現していることがわかった。 An increasing number of models have achieved great performance in remote sensing tasks with the recent development of Large Language Models (LLMs) and Visual Language Models (VLMs). However, these models are constrained to basic vision and language instruction-tuning tasks, facing challenges in complex remote sensing applications. Additionally, these models lack specialized expertise in professional domains. To address these limitations, we propose an LLM-driven remote sensing intelligent agent named RS-Agent. Firstly, RS-Agent is powered by a large language model (LLM) that acts as its "Central Controller," enabling it to understand and respond to various problems intelligently. Secondly, our RS-Agent integrates many high-performance remote sensing image processing tools, facilitating multi-tool and multi-turn conversations. Thirdly, our RS-Agent can answer professional questions by leveraging robust knowledge documents. We conducted experiments using several datasets, e.g., RSSDIVCS, RSVQA, and DOTAv1. 
The experimental results demonstrate that our RS-Agent delivers outstanding performance in many tasks, i.e., scene classification, visual question answering, and object counting tasks.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# AutoTVG: 時間的ビデオグラウンドのための新しいビジョン言語事前学習パラダイム AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding ( http://arxiv.org/abs/2406.07091v1 ) ライセンス: Link先を確認	Xing Zhang, Jiaxi Gu, Haoyu Zhao, Shicong Wang, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu, Yu-Gang Jiang,	(参考訳) テンポラルビデオグラウンディング(TVG)は、言語記述が与えられた未トリミングビデオから瞬間をローカライズすることを目的としている。TVGのアノテーションは労働集約的であるため,近年,限られた教師信号の下でのTVGが注目されている。視覚言語による事前学習の大きな成功は、TVGに従来の「事前学習+微調整」パラダイムに従うように誘導するが、事前学習プロセスは、事前訓練とテストの間のデータの性質の違いにより、時間的モデリングときめ細かなアライメントの欠如に悩まされる。さらに、プレテキストとダウンストリームタスクの間に大きなギャップがあるため、事前訓練されたモデルではゼロショットテストは不可能である。従来のパラダイムの欠点を回避するため,自動アノテートされた未トリミングビデオからセマンティックアライメントと境界回帰を学習できるTVGのための新しいビジョン言語事前学習パラダイムであるAutoTVGを提案する。具体的に言うと、AutoTVGは、未トリミングビデオからキャプションされた瞬間を生成する新しいCaptioned Moment Generation (CMG)モジュールと、ローカライゼーション結果を予測するリグレッションヘッドを備えたTVGNetで構成されている。 Charades-STAとActivityNet Captionsの実験結果によると、ゼロショットの時間的ビデオグラウンドに関して、AutoTVGは、アウト・オブ・ディストリビューション・テストの下でもイン・ディストリビューション手法に匹敵する高い性能を達成し、はるかに少ないトレーニングデータで既存の事前学習フレームワークを上回る。 Temporal Video Grounding (TVG) aims to localize a moment from an untrimmed video given the language description. Since the annotation of TVG is labor-intensive, TVG under limited supervision has accepted attention in recent years. The great success of vision-language pre-training guides TVG to follow the traditional "pre-training + fine-tuning" paradigm, however, the pre-training process would suffer from a lack of temporal modeling and fine-grained alignment due to the difference of data nature between pre-train and test. Besides, the large gap between pretext and downstream tasks makes zero-shot testing impossible for the pre-trained model. To avoid the drawbacks of the traditional paradigm, we propose AutoTVG, a new vision-language pre-training paradigm for TVG that enables the model to learn semantic alignment and boundary regression from automatically annotated untrimmed videos. To be specific, AutoTVG consists of a novel Captioned Moment Generation (CMG) module to generate captioned moments from untrimmed videos, and TVGNet with a regression head to predict localization results. Experimental results on Charades-STA and ActivityNet Captions show that, regarding zero-shot temporal video grounding, AutoTVG achieves highly competitive performance with in-distribution methods under out-of-distribution testing, and is superior to existing pre-training frameworks with much less training data.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# 経路表現を伴う表現的記述論理におけるデータ複雑度 Data Complexity in Expressive Description Logics With Path Expressions ( http://arxiv.org/abs/2406.07095v1 ) ライセンス: Link先を確認	Bartosz Bednarczyk,	(参考訳) 準森林(quasi-forest)上の非常に表現力の高い記述論理ZOIQ(別名 ALCHb Self reg OIQ)に対する充足可能性問題のデータ複雑性について検討し、NP完全性を確立する。これにより、ZOIQの決定可能なフラグメントに対するデータ複雑性の全体像が完成し、OWL2(SRファミリー)の決定可能なフラグメントに関する既知の結果が再証明される。同じ手法を用いて、ZIQ における根付きクエリの含意問題の(複合複雑性に関する) coNEXPTIME 完全性を確立する。 We investigate the data complexity of the satisfiability problem for the very expressive description logic ZOIQ (a.k.a. ALCHb Self reg OIQ) over quasi-forests and establish its NP-completeness. This completes the data complexity landscape for decidable fragments of ZOIQ, and reproves known results on decidable fragments of OWL2 (SR family). Using the same technique, we establish coNEXPTIME-completeness (w.r.t. the combined complexity) of the entailment problem of rooted queries in ZIQ.	翻訳日:2024-06-12 16:44:39 公開日:2024-06-11
# CTCに基づく単語スポッターを用いたCTCおよびTransducer ASRモデルの高速コンテキストバイアス Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter ( http://arxiv.org/abs/2406.07096v1 ) ライセンス: Link先を確認	Andrei Andrusenko, Aleksandr Laptev, Vladimir Bataev, Vitaly Lavrukhin, Boris Ginsburg,	(参考訳) 文脈的自動音声認識(ASR)システムでは,稀な単語と新単語の正確な認識が問題となっている。ほとんどの文脈バイアス法は、ASRモデルやビーム探索復号アルゴリズムの変更、モデルの再利用の複雑化、推論の減速を含む。本研究は,CTC と Transducer (RNN-T) ASR モデルのための CTC ベースの Word Spotter (CTC-WS) を用いた,文脈バイアスの高速化手法を提案する。提案手法は,CTCログ確率をコンパクトなコンテキストグラフと比較し,潜在的なコンテキストバイアス候補を検出する。有効な候補は、それに対応するフレーム間隔で、グリーディ認識の候補を置き換える。ハイブリッドトランスデューサ-CTCモデルはトランスデューサモデルのCTC-WSアプリケーションを可能にする。その結果,FスコアとWERの同時改善による文脈バイアス認識の高速化が,ベースライン法と比較して有意な向上を示した。提案手法はNVIDIA NeMoツールキットで公開されている。 Accurate recognition of rare and new words remains a pressing problem for contextualized Automatic Speech Recognition (ASR) systems. Most context-biasing methods involve modification of the ASR model or the beam-search decoding algorithm, complicating model reuse and slowing down inference. This work presents a new approach to fast context-biasing with CTC-based Word Spotter (CTC-WS) for CTC and Transducer (RNN-T) ASR models. The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates. The valid candidates then replace their greedy recognition counterparts in corresponding frame intervals. A Hybrid Transducer-CTC model enables the CTC-WS application for the Transducer model. The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER compared to baseline methods. The proposed method is publicly available in the NVIDIA NeMo toolkit.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# フォノンアシスト励起による二層膜WSe$_2$の高純度・安定単光子発光 High-purity and stable single-photon emission in bilayer WSe$_2$ via phonon-assisted excitation ( http://arxiv.org/abs/2406.07097v1 ) ライセンス: Link先を確認	Claudia Piccinini, Athanasios Paralikis, José Ferreira Neto, Abdulmalik Abdulkadir Madigawa, Paweł Wyborski, Vikas Remesh, Luca Vannucci, Niels Gregersen, Battulga Munkhbat,	(参考訳) 励起スキームは、励起子状態を作成し、崩壊ダイナミクスを定義し、放出された単一光子のスペクトル拡散に影響を与えるため、単一光子源に必須である。本稿では,2層型 WSe$_2$ 量子エミッタの単一光子放出特性に及ぼす異なる光励起方略の影響について検討する。フォノンアシスト励起下では、狭く安定な単一光子放出が得られ、純度は $0.94 \pm 0.02$ に達する。さらに、崩壊時間は、バンド上励起の $(16.65 \pm 2.39)\,$ns からフォノンアシスト励起の $(1.33 \pm 0.04)\,$ns へと1桁以上短縮される。最後に、スペクトル線幅が2分の1に減少するとともに、スペクトルの揺らぎが抑制されることを観測する。本研究は, WSe$_2$ベースの量子エミッタの性能を最適化する上で, 励起法が果たす重要な役割を明らかにするものである。 The excitation scheme is essential for single-photon sources as it prepares the exciton state, defines the decay dynamics, and influences the spectral diffusion of the emitted single photons. Here, we investigate the impact of different optical excitation strategies on the single-photon emission characteristics of bilayer WSe$_2$ quantum emitters. Under phonon-assisted excitation, we achieve narrow and stable single-photon emission with an excellent purity reaching $ 0.94\pm 0.02\,$. Furthermore, the decay time is reduced by more than an order of magnitude from $(16.65 \pm 2.39)\,$ns for above-band excitation to $(1.33 \pm 0.04)\,$ns for phonon-assisted excitation. Finally, we observe a suppressed spectral wandering along with a two-fold reduction of the spectral linewidth. Our comprehensive investigation highlights the critical role of the excitation method in optimizing the performance of WSe$_2$-based quantum emitters.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# ユーザクエリによるカタログ強化のガイド Guiding Catalogue Enrichment with User Queries ( http://arxiv.org/abs/2406.07098v1 ) ライセンス: Link先を確認	Yupei Du, Jacek Golebiowski, Philipp Schmidt, Ziawasch Abedjan,	(参考訳) 知識グラフ(KG)のエンリッチメント技術は、進化する製品カタログに依存する商業アプリケーションにとってますます重要になっている。しかし、潜在的なエンリッチメントの探索空間が巨大であるため、KG補完(KGC)手法の予測は精度が低く、現実世界のカタログでは信頼性が低い。さらに、エンリッチメントの候補となる事実は、ユーザにとっての関連性も様々である。 KGにおける不完全な三つ組の正確な予測はKGC手法の主要な焦点であるが、そのような予測をいつ適用するかという関連性は無視されてきた。製品検索のユースケースに着想を得て、ユーザの検索行動と、ユーザが製品に関連付けるプロパティを用いて、カタログに関連性の高い補完を生成するという観点に取り組む。本稿では,エンリッチ可能なデータポイントを識別するための直観を示し,汎用KGを用いて性能上の利点を示す。特に,正確かつ関連性が高い可能性の高いエンティティ-述語ペアをユーザクエリから抽出し,そのペアを用いてKGC手法の予測を導く。本手法を2つの人気のある百科事典的KG, DBPedia と YAGO 4 で評価した。自動評価と人的評価の両方の結果から,クエリによるガイダンスは予測の正確性と関連性を著しく向上させることが示された。 Techniques for knowledge graph (KGs) enrichment have been increasingly crucial for commercial applications that rely on evolving product catalogues. However, because of the huge search space of potential enrichment, predictions from KG completion (KGC) methods suffer from low precision, making them unreliable for real-world catalogues. Moreover, candidate facts for enrichment have varied relevance to users. While making correct predictions for incomplete triplets in KGs has been the main focus of KGC method, the relevance of when to apply such predictions has been neglected. Motivated by the product search use case, we address the angle of generating relevant completion for a catalogue using user search behaviour and the users property association with a product. In this paper, we present our intuition for identifying enrichable data points and use general-purpose KGs to show-case the performance benefits. In particular, we extract entity-predicate pairs from user queries, which are more likely to be correct and relevant, and use these pairs to guide the prediction of KGC methods. We assess our method on two popular encyclopedia KGs, DBPedia and YAGO 4. Our results from both automatic and human evaluations show that query guidance can significantly improve the correctness and relevance of prediction.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# D-GRIL: 2-parameter Persistence を用いたエンド・ツー・エンドトポロジカルラーニング D-GRIL: End-to-End Topological Learning with 2-parameter Persistence ( http://arxiv.org/abs/2406.07100v1 ) ライセンス: Link先を確認	Soham Mukherjee, Shreyas N. Samaga, Cheng Xin, Steve Oudot, Tamal K. Dey,	(参考訳) 1パラメータ永続性を用いたエンドツーエンドのトポロジ学習はよく知られている。 GRILと呼ばれる最近導入された2パラメータ永続性に基づくベクトル化手法を採用することで,2パラメータ永続性を用いてこのフレームワークを強化できることを示す。我々は,GRILを微分し,D-GRILを得るための理論的基盤を確立する。 D-GRILは,標準ベンチマークグラフデータセット上での二重フィルトレーション(bifiltration)関数の学習に利用できることを示す。さらに, この枠組みは, 薬物発見における生物活性予測の文脈において適用可能であることを示す。 End-to-end topological learning using 1-parameter persistence is well-known. We show that the framework can be enhanced using 2-parameter persistence by adopting a recently introduced 2-parameter persistence based vectorization technique called GRIL. We establish a theoretical foundation of differentiating GRIL producing D-GRIL. We show that D-GRIL can be used to learn a bifiltration function on standard benchmark graph datasets. Further, we exhibit that this framework can be applied in the context of bio-activity prediction in drug discovery.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# MR-RawNet:原波形を用いた可変時間発話のための複数時間分解能話者検証システム MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms ( http://arxiv.org/abs/2406.07103v1 ) ライセンス: Link先を確認	Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu,	(参考訳) 話者認証システムでは、短い発話の利用が持続的な課題となり、主に話者を特徴づける音声情報が不十分なため、性能が低下する。この障害を克服するために、原波形を用いた可変長発声に対する話者検証システムの堅牢性を高めるために、新しい構造であるMR-RawNetを提案する。 MR-RawNetは、時間分解能とスペクトル分解能の両方を最適に調整する多分解能特徴抽出器を介して生波形から時間周波数表現を抽出する。さらに,多種多様な時間的文脈に着目し,発話長の変化に対する頑健性を確保するマルチレゾリューション・アテンション・ブロックを適用した。 VoxCeleb1データセットを用いて行った実験結果から,MR-RawNetは,他の生波形ベースシステムと比較して,可変長の発話に優れた性能を示すことが示された。 In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw waveforms. The MR-RawNet extracts time-frequency representations from raw waveforms via a multi-resolution feature extractor that optimally adjusts both temporal and spectral resolutions simultaneously. Furthermore, we apply a multi-resolution attention block that focuses on diverse and extensive temporal contexts, ensuring robustness against changes in utterance length. The experimental results, conducted on VoxCeleb1 dataset, demonstrate that the MR-RawNet exhibits superior performance in handling utterances of variable duration compared to other raw waveform-based systems.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# アグノスティックシャープネスの最小化 Agnostic Sharpness-Aware Minimization ( http://arxiv.org/abs/2406.07107v1 ) ライセンス: Link先を確認	Van-Anh Nguyen, Quyen Tran, Tuan Truong, Thanh-Toan Do, Dinh Phung, Trung Le,	(参考訳) シャープネスを意識した最小化(SAM)は、トレーニング損失とロスランドスケープのシャープネスの両方を最小化し、より良い汎化特性と関連する、より平坦なミニマへとモデルを導くことで、ディープニューラルネットワークトレーニングの改善に役立っている。別の側面として、モデルに依存しないメタラーニング(MAML)はモデルの適応性を改善するために設計されたフレームワークである。 MAMLは、最小限の微調整ステップで複数のタスクへの迅速な適応に適したメタモデルセットを最適化し、限られたデータでうまく一般化できる。本研究では,SAMとMAMLの関連性,特にモデル一般化の強化について検討する。我々はSAMとMAMLの両方の原則を組み合わせた新しいアプローチであるAgnostic-SAMを紹介する。 Agnostic-SAMは、トレーニングデータを使用してモデルをより広い局所最小値に向けて最適化し、検証データに対する損失値を同時に低く保つことで、SAMの中核的な考え方を応用する。これにより、小さな摂動に頑健なだけでなく、データ分布シフトの問題にも影響を受けにくい、より平坦なミニマを求める。実験の結果,Agnostic-SAMは,ノイズラベルやデータ制限といった困難な条件下で,さまざまなデータセットのベースラインに対する一般化を著しく改善することが示された。 Sharpness-aware minimization (SAM) has been instrumental in improving deep neural network training by minimizing both the training loss and the sharpness of the loss landscape, leading the model into flatter minima that are associated with better generalization properties. In another aspect, Model-Agnostic Meta-Learning (MAML) is a framework designed to improve the adaptability of models. MAML optimizes a set of meta-models that are specifically tailored for quick adaptation to multiple tasks with minimal fine-tuning steps and can generalize well with limited data. In this work, we explore the connection between SAM and MAML, particularly in terms of enhancing model generalization. We introduce Agnostic-SAM, a novel approach that combines the principles of both SAM and MAML. Agnostic-SAM adapts the core idea of SAM by optimizing the model towards wider local minima using training data, while concurrently maintaining low loss values on validation data. By doing so, it seeks flatter minima that are not only robust to small perturbations but also less vulnerable to data distributional shift problems. Our experimental results demonstrate that Agnostic-SAM significantly improves generalization over baselines across a range of datasets and under challenging conditions such as noisy labels and data limitation.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# ルビー格子上のくりこまれた古典スピン液体 The renormalized classical spin liquid on the ruby lattice ( http://arxiv.org/abs/2406.07110v1 ) ライセンス: Link先を確認	Zhenjiu Wang, Lode Pollet,	(参考訳) ルビー格子上で動的に調製された、ギャップのある $Z_2$ 量子スピン液体の発現が最近実験的に検出されたことにより、フラストレート磁性と格子ゲージ理論の物理が Rydberg ツイーザーアレイにもたらされた(Semeghini et al, Science 374, 1242 (2021))。このようなモデルの熱力学的性質は依然として十分に調べられていないが、大規模で堅牢かつ長寿命の量子スピン液体を準備するには、その知識が不可欠である。大規模な量子モンテカルロシミュレーションを用いて、PXPモデルにおいて、くりこまれた古典スピン液体を見出す。そのエントロピー密度 $S/N$ は一定で、熱力学極限において $\ln(2)/6$ に近づく。これは、デチューニング $\delta$ の中程度から大きな値すべてについて、$T/\Omega \sim 0.5$ (ラビ周波数 $\Omega$ を単位とする)から、シミュレーション可能な最低温度 $T/\Omega \sim 0.01$ まで成り立つ。ファン・デル・ワールス相互作用がある場合も、一定エントロピーのプラトーは見られるが、その値は $\delta$ によって変化する。電気的自由度に対する動的ランプの断熱近似についても論じる。 The recent experimental detection of the onset of a dynamically prepared, gapped $Z_2$ quantum spin liquid on the ruby lattice brought the physics of frustrated magnetism and lattice gauge theory to Rydberg tweezer arrays (Semeghini et al, Science 374, 1242 (2021)). The thermodynamic properties of such models remain inadequately addressed, yet knowledge thereof is indispensable if one wants to prepare large, robust, and long-lived quantum spin liquids. Using large scale quantum Monte Carlo simulations we find in the PXP model a renormalized classical spin liquid with constant entropy density $S/N$ approaching $\ln(2)/6$ in the thermodynamic limit for all moderate and large values of the detuning $\delta$ and starting from $T/\Omega \sim 0.5$ (in units of the Rabi frequency $\Omega$) down to the lowest temperatures we could simulate, $T/\Omega \sim 0.01$. With Van der Waals interactions, constant entropy plateaus are still found but its value shifts with $\delta$. We comment the adiabatic approximation to the dynamical ramps for the electric degrees of freedom.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# NeRSP: スパース偏光画像を用いた反射物体のニューラル3次元再構成 NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images ( http://arxiv.org/abs/2406.07111v1 ) ライセンス: Link先を確認	Yufei Han, Heng Guo, Koki Fukai, Hiroaki Santo, Boxin Shi, Fumio Okura, Zhanyu Ma, Yunpeng Jia,	(参考訳) スパース偏光画像を用いた反射面のニューラル3次元再構成手法NeRSPを提案する。反射面再構成は、鏡面反射がビューに依存しているため非常に難しいため、マルチビューステレオのマルチビュー整合性に反する。一方、スパース画像入力は、実際のキャプチャ設定として、対応マッチングの欠如により、通常不完全または歪んだ結果を引き起こす。本稿では,偏光画像を利用して,スパース入力と反射面の課題を共同処理する。我々は、暗黙のニューラル表現によってモデル化された表面形状を協調的に最適化する、偏光画像形成モデルと多視点方位整合性から、測光的および幾何学的手がかりを導出する。合成および実データを用いた実験により,6つのビューのみを入力として,最先端の表面再構成結果が得られた。 We present NeRSP, a Neural 3D reconstruction technique for Reflective surfaces with Sparse Polarized images. Reflective surface reconstruction is extremely challenging as specular reflections are view-dependent and thus violate the multiview consistency for multiview stereo. On the other hand, sparse image inputs, as a practical capture setting, commonly cause incomplete or distorted results due to the lack of correspondence matching. This paper jointly handles the challenges from sparse inputs and reflective surfaces by leveraging polarized images. We derive photometric and geometric cues from the polarimetric image formation model and multiview azimuth consistency, which jointly optimize the surface geometry modeled via implicit neural representation. Based on the experiments on our synthetic and real datasets, we achieve the state-of-the-art surface reconstruction results with only 6 views as input.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# Beyond Bare Queries: 3D Scene Graphによるオープン語彙オブジェクト検索 Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph ( http://arxiv.org/abs/2406.07113v1 ) ライセンス: Link先を確認	Sergey Linok, Tatiana Zemskova, Svetlana Ladanova, Roman Titkov, Dmitry Yudin,	(参考訳) 自然言語で言及されたオブジェクトの位置特定は、自律的なエージェントにとって大きな課題となる。既存のCLIPベースのオープンボキャブラリ手法は,単純なクエリによる3次元オブジェクトの検索に成功しているが,オブジェクト関係の理解を求める曖昧な記述には対応できない。そこで,この問題を解決するためにBBQ (Beyond Bare Queries) と呼ばれるモジュラー手法を提案する。この手法は計量エッジを持つ3次元シーン空間グラフ表現を構築し,提案する演繹的シーン推論アルゴリズムを通じて大規模言語モデルを人対エージェントインタフェースとして利用する。 BBQは、3Dオブジェクトを形成するためにDINOを使ったロバストなアソシエーション、それらを2Dに投影する高度なレイキャストアルゴリズム、グラフノードとして記述するビジョン言語モデルを採用している。 Replica と ScanNet のデータセットでは,設計手法が3次元オブジェクト中心の地図を正確に構築できることを示す。オープン語彙の3次元セマンティックセグメンテーションにおいて,その品質が他のゼロショット手法に対して上位に位置することを実証した。また,同じ意味クラスの複数の実体を含む場面において,空間的関係の活用が特に有効であることを示す。 Sr3D と Nr3D のベンチマークでは、提案する演繹的アプローチは大幅な改善を示し、他の最先端手法と比較して、複雑なクエリによるオブジェクトの検索を可能にした。設計上の工夫により、最も近い類似手法の約3倍の処理速度を達成した。この有望なパフォーマンスは、応用インテリジェントロボティクスプロジェクトにおける私たちのアプローチの活用を可能にします。コードをlinukc.github.io/bbq/で公開しています。 Locating objects referred to in natural language poses a significant challenge for autonomous agents. Existing CLIP-based open-vocabulary methods successfully perform 3D object retrieval with simple (bare) queries but cannot cope with ambiguous descriptions that demand an understanding of object relations. To tackle this problem, we propose a modular approach called BBQ (Beyond Bare Queries), which constructs 3D scene spatial graph representation with metric edges and utilizes a large language model as a human-to-agent interface through our deductive scene reasoning algorithm. BBQ employs robust DINO-powered associations to form 3D objects, an advanced raycasting algorithm to project them to 2D, and a vision-language model to describe them as graph nodes. On Replica and ScanNet datasets, we show that the designed method accurately constructs 3D object-centric maps. We have demonstrated that their quality takes a leading place for open-vocabulary 3D semantic segmentation against other zero-shot methods. Also, we show that leveraging spatial relations is especially effective for scenes containing multiple entities of the same semantic class. On Sr3D and Nr3D benchmarks, our deductive approach demonstrates a significant improvement, enabling retrieving objects by complex queries compared to other state-of-the-art methods. Considering our design solutions, we achieved a processing speed approximately x3 times faster than the closest analog. This promising performance enables our approach for usage in applied intelligent robotics projects. We make the code publicly available at linukc.github.io/bbq/.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# 革新的・没入的デジタルケアにおけるメタバースの可能性 Unlocking the Potential of the Metaverse for Innovative and Immersive Digital Care ( http://arxiv.org/abs/2406.07114v1 ) ライセンス: Link先を確認	Fatemeh Ebrahimzadeh, Ramin Safa,	(参考訳) 永続的で没入的な仮想環境であるMetaverseは、患者のケア、医療教育、研究を変革することで、医療に革命をもたらす大きな可能性を秘めている。本稿では,このトランスフォーメーション技術にかかわる応用,メリット,課題について考察し,患者のエンゲージメント,コミュニケーション,情報へのアクセス,健康状態を改善する能力を強調した。また、機械学習技術を用いたMetaverseデータの解析によって、洞察を解き明かし、医療アプリケーションをさらに強化する方法について検討する。この議論は、重要な知見を要約し、メタバース統合の重要性と実践的意味を分析し、今後の研究領域を特定する。それは、Metaverseベースのソリューションの開発における大手テック企業の役割と、医療におけるこの技術の変革的ポテンシャルを解き放つための新たな機会と課題に対処することの重要性を強調している。論文は、これらの技術の倫理的かつ効果的な実装を保証するためにステークホルダー間の協力の必要性を強調し、最終的にはよりアクセスしやすく、パーソナライズされ、効率的な医療システムへと繋がる、と結論付けている。 The Metaverse, a persistent, immersive virtual environment, has the immense potential to revolutionize healthcare by transforming patient care, medical education, and research. This paper explores the applications, benefits, and challenges associated with this transformative technology, highlighting its ability to improve patient engagement, communication, access to information, and health outcomes. The paper also examines how the analysis of Metaverse data using machine learning techniques can unlock insights to further enhance healthcare applications. The discussion summarizes key findings, analyzes the significance and practical implications of Metaverse integration, and identifies areas for future research. It underscores the role of major tech companies in developing Metaverse-based solutions and the importance of addressing emerging opportunities and challenges to unlock the transformative potential of this technology in healthcare. The paper concludes by emphasizing the need for collaboration between stakeholders to ensure the ethical and effective implementation of these technologies, ultimately leading to a more accessible, personalized, and efficient healthcare system.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# ツール強化された大規模言語モデルの拡張:推論ツリーのエラーからの洞察を統合する Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees ( http://arxiv.org/abs/2406.07115v1 ) ライセンス: Link先を確認	Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang,	(参考訳) ツール強化された大規模言語モデル(LLM)は、しばしばAPIの形でツールを活用して、複雑なタスクに対する推論能力を高め、現実世界と対話するインテリジェントエージェントの役割を担う。 Qinら[2024]が最近導入したToolLLaMAモデルは、16000以上の実世界のAPIを用いた推論に深さ優先探索ベースの決定木(DFSDT)手法を使用しており、従来のチェーン推論手法と比較して、ツール拡張LLMの計画と推論の性能を効果的に改善している。しかし、彼らのアプローチは、訓練中の教師付き微調整(SFT)のために、決定木(推論木とも呼ばれる)から成功した経路のみを用いており、思考の木(tree of thought)の利点を十分に活用していない。本研究では,この制限に対処するため,決定木から抽出した選好データに基づく推論軌道最適化フレームワークを提案する。まず,従来見過ごされていた木の中の失敗した探索を活用して,思考の木から選好データを構築する新しい手法を提案する。具体的には、ToolBenchデータセットに基づいたツール使用のための効果的なステップワイズ選好データセットであるToolPreferenceを生成する。その後のトレーニング段階では、まずツール使用の専門家軌跡を用いてLLMを微調整し、次にこれらのステップワイズ選好ペアを直接選好最適化(DPO)に使用してLLMのポリシーを更新し、ToolPrefer-LLaMA(TP-LLaMA)モデルを得る。実験の結果, TP-LLaMAは, 推論木の誤りから洞察を得ることで, ほぼすべてのテストシナリオにおいてベースラインを大きく上回り, 未知のAPIに対してより優れた一般化能力を示すことがわかった。同時にTP-LLaMAはベースラインよりも優れた推論効率を示しており、複雑なツール使用推論タスクにより適している。 Tool-augmented large language models (LLMs) leverage tools, often in the form of APIs, to enhance their reasoning capabilities on complex tasks, thus taking on the role of intelligent agents interacting with the real world. The recently introduced ToolLLaMA model by Qin et al. [2024] utilizes the depth-first search-based decision tree (DFSDT) method for reasoning with $16000+$ real-world APIs, which effectively improves the planning and inferencing performance of tool-augmented LLMs compared to traditional chain reasoning approaches. However, their approach only employs successful paths from decision trees (also called inference trees) for supervised fine-tuning (SFT) during training, which does not fully exploit the advantages of the tree of thought. In this study, we propose an inference trajectory optimization framework based on the preference data extracted from decision trees to address this limitation. We first introduce a novel method for constructing preference data from the tree of thought, capitalizing on the failed explorations previously overlooked in the trees. Specifically, we generate an effective step-wise preference dataset, named ToolPreference, for tool use based on the ToolBench dataset. In the subsequent training phase, we first fine-tune the LLM with tool-usage expert trajectories and then use these step-wise preference pairs for direct preference optimization (DPO) to update the policy of the LLM, resulting in our ToolPrefer-LLaMA (TP-LLaMA) model. Our experiments demonstrate that by obtaining insights from errors in inference trees, TP-LLaMA significantly outperforms the baselines across almost all test scenarios by a large margin and exhibits better generalization capabilities with unseen APIs. At the same time, TP-LLaMA has also demonstrated superior reasoning efficiency compared to the baselines, making it more suitable for complex tool-usage reasoning tasks.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# ラベルなしデータによるオフラインRLの拡張 Augmenting Offline RL with Unlabeled Data ( http://arxiv.org/abs/2406.07117v1 ) ライセンス: Link先を確認	Zhao Wang, Briti Gangopadhyay, Jia-Fong Yeh, Shingo Takamatsu,	(参考訳) オフライン強化学習(Offline RL)の最近の進歩により、アウト・オブ・ディストリビューション(OOD)問題に対処するための保守的な方策更新に基づく手法への注目が高まっている。これらの手法は通常、行動正則化の追加やクリティックの学習目的の変更を伴い、主にデータセットによる十分な裏付けのある状態やアクションに焦点を当てる。しかし、我々は、データセットにアクションや状態が存在しないことが必ずしもその準最適性を意味するとは限らないと主張することで、この一般的な概念に異議を唱える。本稿では,OOD問題に対する新しいアプローチを提案する。方策類似度尺度で補完された、オフラインRLの教師・生徒フレームワークを導入する。このフレームワークにより、生徒方策は、オフラインRLデータセットだけでなく、教師方策によって伝達される知識からも洞察を得ることができる。教師方策は、状態-行動ペアからなる別のデータセットを用いて訓練され、このデータセットは環境と直接対話することなく得られた実践的なドメイン知識とみなすことができる。我々は、この追加知識がOOD問題を効果的に解決する鍵だと考えている。本研究は,教師・生徒ネットワークをアクター・クリティックの枠組みに統合する上での大きな前進であり,オフラインRLにおける知識伝達研究の新たな道を開き,OOD課題に効果的に対処する。 Recent advancements in offline Reinforcement Learning (Offline RL) have led to an increased focus on methods based on conservative policy updates to address the Out-of-Distribution (OOD) issue. These methods typically involve adding behavior regularization or modifying the critic learning objective, focusing primarily on states or actions with substantial dataset support. However, we challenge this prevailing notion by asserting that the absence of an action or state from a dataset does not necessarily imply its suboptimality. In this paper, we propose a novel approach to tackle the OOD problem. We introduce an offline RL teacher-student framework, complemented by a policy similarity measure. This framework enables the student policy to gain insights not only from the offline RL dataset but also from the knowledge transferred by a teacher policy. The teacher policy is trained using another dataset consisting of state-action pairs, which can be viewed as practical domain knowledge acquired without direct interaction with the environment. We believe this additional knowledge is key to effectively solving the OOD issue. This research represents a significant advancement in integrating a teacher-student network into the actor-critic framework, opening new avenues for studies on knowledge transfer in offline RL and effectively addressing the OOD challenge.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# T2S-GPT:テキストからの自己回帰手話生成のための動的ベクトル量子化 T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text ( http://arxiv.org/abs/2406.07119v1 ) ライセンス: Link先を確認	Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang,	(参考訳) 本研究では,まず手話列を離散コードに符号化し,学習したコードブックに基づいてテキストから手話を自己回帰的に生成する2段階手話生成(SLP)パラダイムを提案する。しかし、既存のベクトル量子化(VQ)法は固定長符号化であり、手話における不均一な情報密度を見落としているため、重要な領域の符号化不足と重要でない領域の過剰符号化をもたらす。この問題に対処するために,手話における情報密度に基づいて符号化長を動的に調整し,正確かつコンパクトな符号化を実現する,新しい動的ベクトル量子化(DVA-VAE)モデルを提案する。そして、GPTに似たモデルが、音声言語テキストからコードシーケンスとその対応する持続時間を生成することを学習する。 PHOENIX14Tデータセットを用いて大規模な実験を行い,提案手法の有効性を実証した。手話研究を促進するために,486時間の手話ビデオ,音声,文字起こしテキストを含む新しい大規模ドイツ語手話データセットPHOENIX-Newsを提案する。PHOENIX-Newsを用いた実験分析により,トレーニングデータのサイズを増やすことでモデルの性能をさらに向上できることが示された。私たちのプロジェクトのホームページはhttps://t2sgpt-demo.yinaoxiong.cnです。 In this work, we propose a two-stage sign language production (SLP) paradigm that first encodes sign language sequences into discrete codes and then autoregressively generates sign language from text based on the learned codebook. However, existing vector quantization (VQ) methods are fixed-length encodings, overlooking the uneven information density in sign language, which leads to under-encoding of important regions and over-encoding of unimportant regions. To address this issue, we propose a novel dynamic vector quantization (DVA-VAE) model that can dynamically adjust the encoding length based on the information density in sign language to achieve accurate and compact encoding. Then, a GPT-like model learns to generate code sequences and their corresponding durations from spoken language text. Extensive experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method. To promote sign language research, we propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts. Experimental analysis on PHOENIX-News shows that the performance of our model can be further improved by increasing the size of the training data. Our project homepage is https://t2sgpt-demo.yinaoxiong.cn.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# 非線形結晶における非臨界複屈折と準位相マッチングを併用した小型偏光もつれ光子源 Compact Polarization-Entangled Photon Source Based on Coexisting Noncritically Birefringent and Quasi Phase Matching in a Nonlinear Crystal ( http://arxiv.org/abs/2406.07122v1 ) ライセンス: Link先を確認	C. -Y. Yang, C. -Y. Wang, K. -H. Lin, T. -Y. Tsai, C. -C. Lin, C. Canalias, L. -B. Wang, A. Yabushita, C. -S. Chuu,	(参考訳) 偏光もつれ光子は、多くの量子技術や基礎研究に欠かせない。本稿では,2mmという大きな分極反転周期を持つ周期分極反転非線形結晶において,2種類の位相整合条件(非臨界複屈折位相整合および準位相整合)を同時に達成し,コリニアな偏光もつれ光子を生成する新しい光源を提案し,実証する。光子対は偏光もつれ状態で生成され、忠実度とコンカレンスはそれぞれ0.998と0.935であり、Clauser-Horne-Shimony-Holt(CHSH)不等式を84標準偏差で破る。このコンパクトな光源は干渉計、繊細なドメイン構造、ポストセレクションを必要とせず、多くの複製やチップスケールデバイスを必要とするスケーラブルな量子コンピューティングと通信に有利である。 Polarization-entangled photons are indispensable to numerous quantum technologies and fundamental studies. In this paper, we propose and demonstrate a novel source that generates collinear polarization-entangled photons by simultaneously achieving two distinct types of phase-matching conditions (noncritically birefringent and quasi phase matching) in a periodically poled nonlinear crystal with a large poling period of 2 mm. The photon pairs are generated in a polarization-entangled state with a fidelity and concurrence of 0.998 and 0.935, respectively, and violate the Clauser-Horne-Shimony-Holt inequality by 84 standard deviations. The compact source does not require interferometer, delicate domain structures, or post selection, and is advantageous for scalable quantum computing and communication, where many replicas or chip-scale devices are needed.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# CHARME:小さな埋め込み問題に対する連鎖型強化学習アプローチ CHARME: A chain-based reinforcement learning approach for the minor embedding problem ( http://arxiv.org/abs/2406.07124v1 ) ライセンス: Link先を確認	Hoang M. Ngo, Nguyen H K. Do, Minh N. Vu, Tamer Kahveci, My T. Thai,	(参考訳) 量子アニーリング(QA)は組合せ最適化問題を効率的に解く大きな可能性を秘めている。しかし、QAアルゴリズムの有効性は、論理グラフとして表される問題インスタンスを量子単位処理(QPU)に埋め込むことに大きく依存している。組込み問題に対する既存の手法は、より大きな問題サイズに直面した場合、スケーラビリティの問題に悩まされる。本稿では,Reinforcement Learning(RL)技術を用いて,CHARMEという小さな埋め込み問題に対処する手法を提案する。 CHARMEには、ポリシーモデリングのためのグラフニューラルネットワーク(GNN)アーキテクチャ、ソリューションの有効性を保証する状態遷移アルゴリズム、効果的なトレーニングのための順序探索戦略の3つの重要なコンポーネントが含まれている。合成および実世界のインスタンスに関する総合的な実験を通して、提案した順序探索戦略と提案したRLフレームワークであるCHARMEの有効性を実証した。詳細では、CHARME は Minorminer や ATOM のような高速な埋め込み法に比べて優れた解が得られる。さらに,本手法は,実行が遅いが高品質なソリューションで知られているOCTベースのアプローチを,いくつかのケースで超越している。さらに,提案手法は,欲求戦略に比較して優れたソリューションを提供することにより,CHARMEフレームワークのトレーニング効率を向上させる。 Quantum Annealing (QA) holds great potential for solving combinatorial optimization problems efficiently. However, the effectiveness of QA algorithms heavily relies on the embedding of problem instances, represented as logical graphs, into the quantum unit processing (QPU) whose topology is in form of a limited connectivity graph, known as the minor embedding Problem. Existing methods for the minor embedding problem suffer from scalability issues when confronted with larger problem sizes. In this paper, we propose a novel approach utilizing Reinforcement Learning (RL) techniques to address the minor embedding problem, named CHARME. CHARME includes three key components: a Graph Neural Network (GNN) architecture for policy modeling, a state transition algorithm ensuring solution validity, and an order exploration strategy for effective training. Through comprehensive experiments on synthetic and real-world instances, we demonstrate that the efficiency of our proposed order exploration strategy as well as our proposed RL framework, CHARME. 
In detail, CHARME yields superior solutions compared to fast embedding methods such as Minorminer and ATOM. Moreover, our method surpasses the OCT-based approach, known for its slower runtime but high-quality solutions, in several cases. In addition, our proposed exploration enhances the efficiency of the training of the CHARME framework by providing better solutions compared to the greedy strategy.	翻訳日:2024-06-12 16:34:54 公開日:2024-06-11
# CARACAS: vehiCular ArchitectuRe for detAiled Can Attacks Simulation CARACAS: vehiCular ArchitectuRe for detAiled Can Attacks Simulation ( http://arxiv.org/abs/2406.07125v1 ) ライセンス: Link先を確認	Sadek Misto Kirdi, Nicola Scarano, Franco Oberti, Luca Mannella, Stefano Di Carlo, Alessandro Savino,	(参考訳) 現代の車両は、ネットワークインフラストラクチャ、特にコントローラエリアネットワーク(CAN)ネットワークを利用する攻撃に対して、ますます脆弱になっている。データ分析と分類に基づいて、IDS(Intrusion Detection Systems)のような現代のツールを使用して、このような脅威を効果的に対処するために、CANメッセージの大規模なデータセットは必須となる。本稿では,CANメッセージによるコンポーネント制御や攻撃インジェクション機能を含む車両モデルであるCARACASを提示するために,Simulinkなどのシミュレーションフレームワークのモデリング機能とアタックモデルの堅牢な表現を組み合わせることで,合成データセットの生成の可能性を検討する。 CARACASは、バッテリ・エレクトリック・ビークル(BEV)モデルを含むこの手法の有効性を示し、2つの異なるシナリオでトルク制御をターゲットにした攻撃に焦点を当てている。 Modern vehicles are increasingly vulnerable to attacks that exploit network infrastructures, particularly the Controller Area Network (CAN) networks. To effectively counter such threats using contemporary tools like Intrusion Detection Systems (IDSs) based on data analysis and classification, large datasets of CAN messages become imperative. This paper delves into the feasibility of generating synthetic datasets by harnessing the modeling capabilities of simulation frameworks such as Simulink coupled with a robust representation of attack models to present CARACAS, a vehicular model, including component control via CAN messages and attack injection capabilities. CARACAS showcases the efficacy of this methodology, including a Battery Electric Vehicle (BEV) model, and focuses on attacks targeting torque control in two distinct scenarios.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# グラフニューラルネットワークの論理蒸留 Logical Distillation of Graph Neural Networks ( http://arxiv.org/abs/2406.07126v1 ) ライセンス: Link先を確認	Alexander Pluska, Pascal Welke, Thomas Gärtner, Sagar Malhotra,	(参考訳) 本稿では,グラフを学習するための論理ベースの解釈可能なモデルと,このモデルをグラフニューラルネットワーク(GNN)から抽出するアルゴリズムを提案する。近年、GNNの表現率と数量化器(C2)を用いた一階述語論理の2変数の断片との関係が示されている。本稿では、C2の拡張を利用して、GNNから解釈可能な論理分類器を抽出する決定木モデルを提案する。我々は,複数のGNNアーキテクチャに対するアプローチを検証した。蒸留されたモデルは解釈可能で簡潔であり、基礎となるGNNと同等の精度が得られる。さらに、C2 で基底真理が表現可能である場合、我々のアプローチは GNN よりも優れている。 We present a logic based interpretable model for learning on graphs and an algorithm to distill this model from a Graph Neural Network (GNN). Recent results have shown connections between the expressivity of GNNs and the two-variable fragment of first-order logic with counting quantifiers (C2). We introduce a decision-tree based model which leverages an extension of C2 to distill interpretable logical classifiers from GNNs. We test our approach on multiple GNN architectures. The distilled models are interpretable, succinct, and attain similar accuracy to the underlying GNN. Furthermore, when the ground truth is expressible in C2, our approach outperforms the GNN.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# 概念モデルにおける周波数構造のマイニング Mining Frequent Structures in Conceptual Models ( http://arxiv.org/abs/2406.07129v1 ) ライセンス: Link先を確認	Mattia Fumagalli, Tiago Prince Sales, Pedro Paulo F. Barcelos, Giovanni Micale, Vadim Zaytsev, Diego Calvanese, Giancarlo Guizzardi,	(参考訳) 構造的手法を用いて知識を表現するという問題は概念モデリングにおいてよく知られており、長年研究されてきた。モデリングパターンの採用は効果的な構造的手法であることが証明されている。パターンは実際、設計問題の解決策として活用できる一般化可能な再帰構造である。モデル作成プロセスの理解と改善を支援する。概念モデリングにおけるパターンの使用の不可能な価値は、いくつかの実験的研究で実証された。しかし、概念モデルにおけるパターン発見は、非常に複雑なタスクとして広く認識されており、パターン識別の体系的な解決策が現在不足している。本稿では,概念モデリング言語で発生する頻繁な構造発見問題に対する一般的なアプローチを提案する。科学的貢献の実証として、UMLクラス図、特にオントUMLモデルに焦点を当てて、アプローチの実装を提供します。この実装は、頻繁なサブグラフマイニングアルゴリズムとグラフ操作技術を組み合わせることで、複数の概念モデルを処理し、複数の基準に従って再帰的な構造を発見することができる探索ツールを含む。主な目的は、言語エンジニアのためのサポート施設を提供することである。これは、良いモデリングプラクティスと悪いモデリングプラクティスの両方を活用するために、概念モデリング言語を進化させ、維持するために、そして与えられた言語でより良いモデルを設計する際のエンコードされた経験の再利用を促進するために使われる。 The problem of using structured methods to represent knowledge is well-known in conceptual modeling and has been studied for many years. It has been proven that adopting modeling patterns represents an effective structural method. Patterns are, indeed, generalizable recurrent structures that can be exploited as solutions to design problems. They aid in understanding and improving the process of creating models. The undeniable value of using patterns in conceptual modeling was demonstrated in several experimental studies. However, discovering patterns in conceptual models is widely recognized as a highly complex task and a systematic solution to pattern identification is currently lacking. In this paper, we propose a general approach to the problem of discovering frequent structures, as they occur in conceptual modeling languages. As proof of concept for our scientific contribution, we provide an implementation of the approach, by focusing on UML class diagrams, in particular OntoUML models. 
This implementation comprises an exploratory tool, which, through the combination of a frequent subgraph mining algorithm and graph manipulation techniques, can process multiple conceptual models and discover recurrent structures according to multiple criteria. The primary objective is to offer a support facility for language engineers. This can be employed to leverage both good and bad modeling practices, to evolve and maintain the conceptual modeling language, and to promote the reuse of encoded experience in designing better models with the given language.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# 単なる画像による音声の翻訳 Translating speech with just images ( http://arxiv.org/abs/2406.07133v1 ) ライセンス: Link先を確認	Dan Oneata, Herman Kamper,	(参考訳) 視覚的に接地された音声モデルは、音声と画像をリンクする。我々は、既存の画像キャプションシステムを介して、画像とテキストをリンクすることで、この接続を拡張し、その結果、音声を直接テキストにマッピングする能力を得る。このアプローチは、生成されたキャプションと異なる言語で音声を付加することにより、画像のみを用いた音声翻訳に使用できる。本稿では, 実際の低リソース言語であるYorùbáについて検討し, 低リソース体制で学習できるように事前学習されたコンポーネントを活用するYorùbá-to-English音声翻訳モデルを提案する。オーバーフィッティングを制限するためには,多様な画像キャプションを生成するデコード方式を用いることが不可欠である。その結果、予測された翻訳は、よりシンプルで短い形式で音声の主意味を捉えていることがわかった。 Visually grounded speech models link speech to images. We extend this connection by linking images to text via an existing image captioning system, and as a result gain the ability to map speech audio directly to text. This approach can be used for speech translation with just images by having the audio in a different language from the generated captions. We investigate such a system on a real low-resource language, Yorùbá, and propose a Yorùbá-to-English speech translation model that leverages pretrained components in order to be able to learn in the low-resource regime. To limit overfitting, we find that it is essential to use a decoding scheme that produces diverse image captions for training. Results show that the predicted translations capture the main semantics of the spoken audio, albeit in a simpler and shorter form.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# Never Miss a Beat: 一貫性のある"ミドル"拡張付き大規模言語モデルのコンテキストウィンドウ拡張のための効率的なレシピ Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement ( http://arxiv.org/abs/2406.07138v1 ) ライセンス: Link先を確認	Tong Wu, Yanpeng Zhao, Zilong Zheng,	(参考訳) 近年,事前訓練された大規模言語モデル (LLM) の文脈長を拡張する手法が数多く開発されているが,ターゲット長(≫4K)での微調整や,コンテキストの中央からの情報を有効に活用するのに苦労することが多い。これらの問題に対処するために、Continuity-Relativity indExing with gAussian Middle (CREAM)を提案する。単純なこと以外は、CREAMはトレーニング効率が良く、事前訓練されたコンテキストウィンドウ(Llama 2-4Kなど)での微調整しか必要とせず、LLMをもっと長いターゲットコンテキスト長(例えば256K)まで拡張できる。モデルが中間の情報をより重視することを保証するため、細調整中のコンテキストの中間部分からのサンプリングを促進するために、長いコンテキストのLLMに直面する「Lost-in-the-Middle」問題を緩和するために、切り詰められたガウス的手法を導入する。実験の結果,CREAM は「Never Miss A Beat」で Llama2-7B の Base 版と Chat 版の LLM をターゲット長に拡張することができた。私たちのコードはまもなく公開されます。 Recently, many methods have been developed to extend the context length of pre-trained large language models (LLMs), but they often require fine-tuning at the target length (≫4K) and struggle to effectively utilize information from the middle part of the context. To address these issues, we propose Continuity-Relativity indExing with gAussian Middle (CREAM), which interpolates positional encodings by manipulating position indices. Apart from being simple, CREAM is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K). To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the "Lost-in-the-Middle" problem faced by long-context LLMs. 
Experimental results show that CREAM successfully extends LLMs to the target length for both Base and Chat versions of Llama2-7B with "Never Miss A Beat". Our code will be publicly available soon.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# 確率的スロット注意による物体中心表現学習 Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention ( http://arxiv.org/abs/2406.07141v1 ) ライセンス: Link先を確認	Avinash Kori, Francesco Locatello, Ainkaran Santhirasekaram, Francesca Toni, Ben Glocker, Fabio De Sousa Ribeiro,	(参考訳) モジュラーオブジェクト中心表現の学習は、体系的な一般化に不可欠である。既存の手法は、有望なオブジェクト結合能力を実証的に示すが、理論的な識別可能性の保証は比較的未発達のままである。理論上、対象中心の表現がいつ特定できるかを理解することは、スロットベースの手法を正確性を保証する高次元画像に拡張するために重要である。そこで本研究では,オブジェクト中心のスロット表現に先行して集合混合を課す確率論的スロットアテンションアルゴリズムを提案する。簡単な2次元データと高分解能画像データの両方を用いた理論的識別可能性の実証検証を行った。 Learning modular object-centric representations is crucial for systematic generalization. Existing methods show promising object-binding capabilities empirically, but theoretical identifiability guarantees remain relatively underdeveloped. Understanding when object-centric representations can theoretically be identified is crucial for scaling slot-based methods to high-dimensional images with correctness guarantees. To that end, we propose a probabilistic slot-attention algorithm that imposes an aggregate mixture prior over object-centric slot representations, thereby providing slot identifiability guarantees without supervision, up to an equivalence relation. We provide empirical verification of our theoretical identifiability result using both simple 2-dimensional data and high-resolution imaging datasets.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# 失敗は失敗に終わる - 大規模ビジョンと言語モデルにおける不要な振る舞いの特性と緩和 Failures Are Fated, But Can Be Faded: Characterizing and Mitigating Unwanted Behaviors in Large-Scale Vision and Language Models ( http://arxiv.org/abs/2406.07145v1 ) ライセンス: Link先を確認	Som Sagar, Aditya Taparia, Ransalu Senanayake,	(参考訳) 多くのタスクで驚くほどうまく機能しているように見える大きなディープニューラルネットワークでは、正確性や社会的バイアス、人的価値との整合性に関連するいくつかの障害も観察しています。したがって、これらのモデルをデプロイする前には、エンジニアがモデルをデバッグし、立法機関がモデルを監査する上で、この失敗の状況を特徴付けることが重要です。それでも、モデルの失敗につながる可能性のあるすべての要因の組み合わせを徹底的にテストすることは不可能である。本稿では,deep reinforcement learning を用いて,事前学習による識別・生成モデルにおいて,障害モードのランドスケープを探索・構築するポストホック手法を提案する。限られた人間のフィードバックの助けを借りて、発見された障害モードから離れることで、障害状況の再構築をより望ましいものにする方法を実証します。提案手法の有効性を,コンピュータビジョン,自然言語処理,視覚言語タスクで実証的に示す。 In large deep neural networks that seem to perform surprisingly well on many tasks, we also observe a few failures related to accuracy, social biases, and alignment with human values, among others. Therefore, before deploying these models, it is crucial to characterize this failure landscape for engineers to debug and legislative bodies to audit models. Nevertheless, it is infeasible to exhaustively test for all possible combinations of factors that could lead to a model's failure. In this paper, we introduce a post-hoc method that utilizes deep reinforcement learning to explore and construct the landscape of failure modes in pre-trained discriminative and generative models. With the aid of limited human feedback, we then demonstrate how to restructure the failure landscape to be more desirable by moving away from the discovered failure modes. We empirically show the effectiveness of the proposed method across common Computer Vision, Natural Language Processing, and Vision-Language tasks.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# 3次元高分解能医用画像のベンチマークと放射線診断レポートの作成 Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images ( http://arxiv.org/abs/2406.07146v1 ) ライセンス: Link先を確認	Che Liu, Zhongwei Wan, Yuqi Wang, Hui Shen, Haozhe Wang, Kangyu Zheng, Mi Zhang, Rossella Arcucci,	(参考訳) 自動放射線学レポート生成は、放射線医によるレポート作成の労働集約的なプロセス、特にCTスキャンなどの3Dラジオグラフィーにおいて有益である。既存の手法では、現在のGPUメモリの制限のため、スライスワイズやアグレッシブなダウンサンプリングによって3Dボリュームを処理することが多い。これらの問題を解決するために,大規模言語モデル(LLM)に基づく高分解能(HR)3Dボリュームの放射線学レポートを効率的かつ効果的に生成する新しいフレームワークを提案する。具体的には、低解像度(LR)視覚トークンをクエリとして利用してHRトークンから情報をマイニングし、詳細なHR情報を保存し、HR情報LR視覚クエリのみを処理することで計算コストを削減している。さらに,5,328 HR 3Dボリュームとペアレポートを備えた新たなデータセットである BIMCV-RG をキュレートしてリリースし,3D HR 医療画像からレポート生成のための最初のベンチマークを確立した。提案手法は,標準解像度,高解像度入力,ゼロショットドメイン転送という3つの異なる設定で,A100-80Gでトレーニング可能な計算コストで,このベンチマークの既存手法を常に上回っている。 Automatic radiology report generation can significantly benefit the labor-intensive process of report writing by radiologists, especially for 3D radiographs like CT scans, which are crucial for broad clinical diagnostics yet underexplored compared to 2D radiographs. Existing methods often handle 3D volumes either slice-wise or with aggressive downsampling due to current GPU memory limitations, which results in a loss of the inherent 3D nature and critical details. To overcome these issues, we introduce a novel framework that efficiently and effectively generates radiology reports for high-resolution (HR) 3D volumes, based on large language models (LLMs). Specifically, our framework utilizes low-resolution (LR) visual tokens as queries to mine information from HR tokens, preserving detailed HR information while reducing computational costs by only processing HR informed LR visual queries. Further benefiting the field, we curate and release BIMCV-RG, a new dataset with 5,328 HR 3D volumes and paired reports, establishing the first benchmarks for report generation from 3D HR medical images. 
Our method consistently surpasses existing methods on this benchmark across three different settings: normal-resolution, high-resolution inputs, and zero-shot domain transfer, all at an acceptable computational cost, trainable on a single A100-80G.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# ウェアラブルデバイスを用いた生理信号モニタリング:タスク間の認知負荷の評価 Wearable Device-Based Physiological Signal Monitoring: An Assessment Study of Cognitive Load Across Tasks ( http://arxiv.org/abs/2406.07147v1 ) ライセンス: Link先を確認	Ling He, Yanxin Chen, Wenqi Wang, Shuting He, Xiaoqiang Hu,	(参考訳) 本研究では、最先端ウェアラブルモニタリング技術を用いて、FP1チャネルからの脳波データと二次職業学生(SVS)の心拍変動(HRV)データに基づいて、高精度で時間分解能の高い認知負荷評価を行う。これら2つの重要な生理指標を共同で分析することにより、SVS学生の認知負荷の評価と様々なタスクにおける有用性について、その応用価値を考察する。当初、N-BACKタスクを用いて開発されたランダム森林分類モデルにより、認知負荷の異なるSVS学生の生理的信号特性の正確な復号化が可能となり、分類精度は97%に達した。その後、この分類モデルは、国家コンピュータランク試験を含むクロスタスク実験に応用され、多様な学習文脈における方法の有意な適用性とクロスタスクの伝達性を実証した。高可搬性で実施される本研究は、二次職業教育における資源配分の指導を最適化するための理論的・実践的意義と、認知的負荷評価方法とモニタリングのための意義を有している。研究成果は、現在、同校で試行中である。 This study employs cutting-edge wearable monitoring technology to conduct high-precision, high-temporal-resolution cognitive load assessment on EEG data from the FP1 channel and heart rate variability (HRV) data of secondary vocational students(SVS). By jointly analyzing these two critical physiological indicators, the research delves into their application value in assessing cognitive load among SVS students and their utility across various tasks. The study designed two experiments to validate the efficacy of the proposed approach: Initially, a random forest classification model, developed using the N-BACK task, enabled the precise decoding of physiological signal characteristics in SVS students under different levels of cognitive load, achieving a classification accuracy of 97%. Subsequently, this classification model was applied in a cross-task experiment involving the National Computer Rank Examination, demonstrating the method's significant applicability and cross-task transferability in diverse learning contexts. Conducted with high portability, this research holds substantial theoretical and practical significance for optimizing teaching resource allocation in secondary vocational education, as well as for cognitive load assessment methods and monitoring. 
Currently, the research findings are undergoing trial implementation in the school.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# 等方スピン鎖におけるカルダル・パリ・張類の部分的かつ明確な出現 Partial yet definite emergence of the Kardar-Parisi-Zhang class in isotropic spin chains ( http://arxiv.org/abs/2406.07150v1 ) ライセンス: Link先を確認	Kazumasa A. Takeuchi, Jacopo De Nardis, Ofer Busani, Patrik L. Ferrari, Romain Vasseur,	(参考訳) 一次元等方的ハイゼンベルクモデルのような連続非アベリア対称性を持つ可積分スピン鎖は、理論的な理解がほとんどない超拡散輸送を示す。最近の研究では、カルダル=パリ=チャン(KPZ)の普遍性クラスとの驚くべき関係が報告されているが、この見解は、完全な数え上げ統計における不一致によって最も最近疑問視されている。ここでは、ランダウ・リフシッツ一次元磁石の広範な数値シミュレーションと、KPZ級の正確な研究によって開発された枠組みを組み合わせることで、スピン鎖で探索されていない様々な2点量を特徴付け、KPZスケーリング法則と完全に一致している。これにより、等方性スピン鎖におけるKPZクラスの部分的な出現が確立される。さらに、KPZスケーリング法則は、時空相関の伝播によって要求される適切なガリレオ昇圧の下で、エネルギー電流の存在下ではそのままであることを明らかにした。 Integrable spin chains with a continuous non-Abelian symmetry, such as the one-dimensional isotropic Heisenberg model, show superdiffusive transport with little theoretical understanding. Although recent studies reported a surprising connection to the Kardar-Parisi-Zhang (KPZ) universality class in that case, this view was most recently questioned by discrepancies in full counting statistics. Here, by combining extensive numerical simulations of the Landau-Lifshitz one-dimensional magnet, with a framework developed by exact studies of the KPZ class, we characterize various two-point quantities that remain hitherto unexplored in spin chains, and find full agreement with KPZ scaling laws. This establishes the partial emergence of the KPZ class in isotropic spin chains. Moreover, we reveal that the KPZ scaling laws are intact in the presence of an energy current, under the appropriate Galilean boost required by the propagation of spacetime correlation.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# EEG-ImageNet:脳波データセットと多粒性ラベルの視覚刺激によるベンチマーク EEG-ImageNet: An Electroencephalogram Dataset and Benchmarks with Image Visual Stimuli of Multi-Granularity Labels ( http://arxiv.org/abs/2406.07151v1 ) ライセンス: Link先を確認	Shuqi Zhu, Ziyi Ye, Qingyao Ai, Yiqun Liu,	(参考訳) 脳の活動から見えるものを同定し、再構築することで、生物の視覚システムが世界をどのように表現しているかを調査する特別な洞察を得ることができます。近年,fMRI (Function Magnetic Resonance Imaging) やMEG (Magneticencephalogram) によって収集された脳信号から高画質な画像分類と高画質な画像再構成が達成されているが,これらの装置の高価さとバルク性は,応用の一般化を困難にしている。一方、脳波検査(EEG)は、使いやすさ、コスト効率、高時間分解能、非侵襲性といった利点があるにもかかわらず、包括的なデータセットが欠如しているため、関連する研究では十分に研究されていない。このギャップに対処するために、画像Netデータセットから選択した4000枚の画像に露出した16人の被験者からの録音を含む新しいEEGデータセットであるEEG-ImageNetを紹介した。 EEG-ImageNetは、既存の類似のEEGベンチマークの5倍のEEGイメージペアで構成されている。 EEG-ImageNetは、多粒度ラベルのイメージ刺激、すなわち、粗粒度ラベルの40画像、細粒度ラベルの40画像で収集される。そこで我々は,オブジェクト分類と画像再構成のためのベンチマークを構築した。いくつかの一般的なモデルを用いた実験では、最良のモデルは60%の精度でオブジェクト分類を達成でき、画像再構成は64%程度である。これらの結果は、データセットが脳波ベースの視覚脳-コンピュータインターフェースを進化させ、生物学的システムの視覚的知覚を理解し、マシン視覚モデルを改善する潜在的応用を提供する可能性を実証している。 Identifying and reconstructing what we see from brain activity gives us a special insight into investigating how the biological visual system represents the world. While recent efforts have achieved high-performance image classification and high-quality image reconstruction from brain signals collected by Functional Magnetic Resonance Imaging (fMRI) or magnetoencephalogram (MEG), the expensiveness and bulkiness of these devices make relevant applications difficult to generalize to practical applications. On the other hand, Electroencephalography (EEG), despite its advantages of ease of use, cost-efficiency, high temporal resolution, and non-invasive nature, has not been fully explored in relevant studies due to the lack of comprehensive datasets. To address this gap, we introduce EEG-ImageNet, a novel EEG dataset comprising recordings from 16 subjects exposed to 4000 images selected from the ImageNet dataset. 
EEG-ImageNet contains five times as many EEG-image pairs as existing similar EEG benchmarks. EEG-ImageNet is collected with image stimuli of multi-granularity labels, i.e., 40 images with coarse-grained labels and 40 with fine-grained labels. Based on it, we establish benchmarks for object classification and image reconstruction. Experiments with several commonly used models show that the best models can achieve object classification with accuracy around 60% and image reconstruction with two-way identification around 64%. These results demonstrate the dataset's potential to advance EEG-based visual brain-computer interfaces, understand the visual perception of biological systems, and provide potential applications in improving machine visual models.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# ペニングトラップを用いた量子光学実験のための高性能真空中光学系 High-performance in-vacuum optical system for quantum optics experiments in a Penning-trap ( http://arxiv.org/abs/2406.07152v1 ) ライセンス: Link先を確認	Joaquín Berrocal, Daniel Rodríguez,	(参考訳) 物理学の多くの分野に影響を及ぼす正確な測定は、ペニングトラップや、荷電粒子の各固有運動が古典的な高調波発振器である温度状態の従来の技術を用いて行われている。直接または間接的にレーザーで粒子を冷却することで、それぞれの発振器の量子状態に到達することができ、電流の代わりに光子を検出することによって精密フロンティアの微妙な効果を制御することができる。本稿では,7-Tペニングトラップにおける個々のカルシウムイオンおよびクーロン結晶から397nmの蛍光光子を検出するための新しい真空中光学系を提案する。計算機シミュレーションの結果に基づいて,回折制限性能を示す。このシステムは単一レーザー冷却したイオンを点状源として利用し、収差補正後のトラップの軸方向と半径方向の最終的な分解能は3.69(3)μmと2.75(3)μmに達する。 Accurate measurements with implications in many branches of physics have been performed using Penning traps and conventional techniques within a temperature regime where each eigenmotion of a charged particle is still a classical harmonic oscillator. Cooling the particle directly or indirectly with lasers allows reaching the quantum regime of each oscillator, controlling subtle effects in the precision frontier by detecting photons instead of electric current. In this paper, we present a new in-vacuum optical system designed to detect 397-nm fluorescence photons from individual calcium ions and Coulomb crystals in a 7-T Penning trap. Based on the outcome of computer simulations, our design shows diffraction-limited performance. The system has been characterized using a single laser-cooled ion as a point-like source, reaching a final resolution of 3.69(3) μm and 2.75(3) μm for the trap's axial and radial directions, respectively, after correcting aberrations.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# 大規模言語モデルに基づくマルチエージェントコラボレーションのスケールアップ Scaling Large-Language-Model-based Multi-Agent Collaboration ( http://arxiv.org/abs/2406.07155v1 ) ライセンス: Link先を確認	Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun,	(参考訳) 大規模言語モデルによるエージェントのパイオニア化は、多エージェントコラボレーションの設計パターンを強調し、集団知能が個々の能力を上回ることを実証している。ニューロンの増加が創発的能力をもたらすことを示唆する神経スケーリング法に触発された本研究では,マルチエージェント協調におけるエージェントの増加に同様の原理が適用されるかを検討する。提案するマルチエージェント協調ネットワーク(MacNet)は,エージェントを整理し,対話的推論をトポロジカルな順序付けによって効率化する。大規模な実験により、MacNetはベースラインモデルより一貫して優れており、様々なネットワークトポロジにまたがる効果的なエージェントコラボレーションを可能にし、1000以上のエージェント間の協力を支援している。特に,小世界特性に類似したトポロジが優れた性能を発揮する,小世界協調現象が観察された。さらに、我々は、正規化されたソリューションの品質がスケーリングエージェントとしてロジスティックな成長パターンに従うことを示し、これまで観察された神経発生の事例よりもはるかに早く、協調的な出現が生じることを示唆する協調スケーリング法則を特定した。コードとデータはhttps://github.com/OpenBMB/ChatDevで入手できる。 Pioneering advancements in large language model-powered agents have underscored the design pattern of multi-agent collaboration, demonstrating that collective intelligence can surpass the capabilities of each individual. Inspired by the neural scaling law, which posits that increasing neurons leads to emergent abilities, this study investigates whether a similar principle applies to increasing agents in multi-agent collaboration. Technically, we propose multi-agent collaboration networks (MacNet), which utilize directed acyclic graphs to organize agents and streamline their interactive reasoning via topological ordering, with solutions derived from their dialogues. Extensive experiments show that MacNet consistently outperforms baseline models, enabling effective agent collaboration across various network topologies and supporting cooperation among more than a thousand agents. Notably, we observed a small-world collaboration phenomenon, where topologies resembling small-world properties achieved superior performance. 
Additionally, we identified a collaborative scaling law, indicating that normalized solution quality follows a logistic growth pattern as the number of agents scales, with collaborative emergence occurring much earlier than previously observed instances of neural emergence. The code and data will be available at https://github.com/OpenBMB/ChatDev.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# 公検証量子乱数に対する4量子フォトニクスシステムと公開鍵と秘密鍵の生成 Four-qubit photonic system for publicly verifiable quantum random numbers and generation of public and private key ( http://arxiv.org/abs/2406.07156v1 ) ライセンス: Link先を確認	Mayalakshmi Kolangatt, Anirudh Verma, Sujai Matta, Kanad Sengupta, C. M. Chandrashekar,	(参考訳) 理論的には、4量子ビットフォトニクスシステムを用いて、公に検証可能な量子乱数を生成し、絡み合い検証を行い、セキュアな公開鍵とプライベート鍵を生成する。所望の4量子状態を生成する量子回路とそのフォトニックアーキテクチャにおける実験的実現は、偏光と経路自由度に絡み合った光子対を用いて行う。 370kbpsの速さで, 4ビットシステムの測定を行い, 公開検証のために4ビット状態の部分情報にアクセスすることにより, 公証かつ純粋に確保されたランダムビットを生成する。システムがパブリックキーとプライベートキーの生成に使用される場合、同じ数のパブリックキーとプライベートキーが同時に生成される。また、4量子状態からのサンプルビットの97.9%が絡み合い検証に合格している。 4量子状態における雑音の理論モデルとその検証および確保されたビットの生成速度への影響は実験結果と完全に一致している。これは、量子システムのセキュリティ特性をリアルタイムに検証するオプションを提供することにより、量子セーフなアプリケーションに小型のマルチキュービットフォトニクスシステムの実用性を実証するものである。 We theoretically propose and experimentally demonstrate the use of a configurable four-qubit photonic system to generate publicly verifiable quantum random numbers, to perform entanglement verification, and to generate secure public and private keys. The quantum circuits that generate the desired four-qubit states are realized experimentally in the photonic architecture using photon pairs entangled in the polarization and path degrees of freedom. By performing measurements on the four-qubit system and accessing partial information of the four-qubit state for public verification, we generate publicly verified and purely secured random bits at the rate of 370 kbps. When the system is used for generating public and private keys, an equal number of public and private keys are generated simultaneously. We also record about 97.9% of sampled bits from four-qubit states passing entanglement verification. The theoretical model of noise on the four-qubit state and its effect on the generation rate of verified and secured bits are in perfect agreement with the experimental results. 
This demonstrates the practical use of the small-scale multi-qubit photonic system for quantum-safe applications by providing the option for real-time verification of the security feature of the quantum system.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# 定常Gottesman-Kitaev-Preskill量子ビットに基づく量子リピータ Quantum repeaters based on stationary Gottesman-Kitaev-Preskill qubits ( http://arxiv.org/abs/2406.07158v1 ) ライセンス: Link先を確認	Stefan Häussler, Peter van Loock,	(参考訳) 量子誤り訂正符号を組み込んだ量子リピータは、遠隔リピータ局上の古典的な通信に依存する確率的量子エラー検出に依存する元の量子リピータと比較して有望な代替であることが示されている。量子ビットを誤り訂正符号に符号化する特に効率的な方法は、単一の発振器モードでさえ十分に大きな物理系として機能するボソニック符号である。ここでは、ボソニックなゴッテマン・キタエフ・プレスキル(GKP)符号を損失補正に基づく量子リピータの自然な選択とみなす。しかし、既存の処理とは違って、局所的な定常記憶量子ビットで発生する励起損失は、例えば集団原子スピンモードで表される。我々は、GKPベースの量子リピータの性能を解析し、評価し、他のメモリベースの量子リピータ方式とは対照的に、初期状態世代や分布とは別に、決定論的線形モード変換によって全ての操作を実行できる。 Quantum repeaters that incorporate quantum error correction codes have been shown to be a promising alternative compared with the original quantum repeaters that rely upon probabilistic quantum error detection depending on classical communication over remote repeater stations. A particularly efficient way of encoding qubits into an error correction code is through bosonic codes where even a single oscillator mode serves as a sufficiently large, physical system. Here we consider the bosonic Gottesman-Kitaev-Preskill (GKP) code as a natural choice for a loss-correction-based quantum repeater. However, unlike existing treatments, we focus on the excitation loss that occurs in the local, stationary memory qubits as represented by, for instance, collective atomic spin modes. We analyze and assess the performance of such a GKP-based quantum repeater where, apart from the initial state generations and distributions, all operations can be performed via deterministic linear mode transformations, as opposed to other existing memory-based quantum repeater schemes.	翻訳日:2024-06-12 16:25:09 公開日:2024-06-11
# セルフリーMIMOにおける無作為ランダムアクセスを用いた深層学習によるユーザアクティビティ検出 Deep Learning-Based Approach for User Activity Detection with Grant-Free Random Access in Cell-Free Massive MIMO ( http://arxiv.org/abs/2406.07160v1 ) ライセンス: Link先を確認	Ali Elkeshawy, Haïfa Farès, Amor Nafkha,	(参考訳) 現代の無線ネットワークは幅広い接続要求を確実にサポートし、多様なシナリオにまたがる様々なユーザニーズを包含する必要がある。大規模マシンタイプ通信(mMTC)はこれらのネットワークにおいて重要な役割を担っている。従来の許諾ベースのランダムアクセス(GB-RA)プロトコルは、制約のある直交プリアンブルリソースのために制限に直面している。これに対し、GF-RAプロトコルの採用は有望な解決策となる。本稿では,非直交プリアンブル設計を考慮した場合のアクティビティ検出問題に対する教師付き機械学習モデルの適用について検討する。本稿では,GF-RAプロトコルの下で動作しているセルフリーのMassive Multiple-Input Multiple-Output (CF-mMIMO) ネットワーク上でのユーザアクティビティ検出に特化して設計されたデータ駆動アルゴリズムを提案する。さらに,本研究では, 動作検出の精度を簡易化・向上し, 入力摂動に対するアルゴリズムのレジリエンスを評価し, 浮動小数点変換がアルゴリズム性能に与える影響について検討する。シミュレーションは3GPP標準に準拠し、正確なチャネルモデリングを保証し、mMTC GF-RAデバイスの検出能力を向上するためにディープラーニングアプローチを採用した。アルゴリズムは99%の精度を達成し、実世界のアプリケーションでその有効性を確認する。 Modern wireless networks must reliably support a wide array of connectivity demands, encompassing various user needs across diverse scenarios. Massive Machine-Type Communication (mMTC) is pivotal in these networks, particularly given the challenges posed by massive connectivity and sporadic device activation patterns. Traditional grant-based random access (GB-RA) protocols face limitations due to constrained orthogonal preamble resources. In response, the adoption of grant-free random access (GF-RA) protocols offers a promising solution. This paper explores the application of supervised machine learning models to tackle activity detection issues in scenarios where non-orthogonal preamble design is considered. We introduce a data-driven algorithm specifically designed for user activity detection in Cell-Free Massive Multiple-Input Multiple-Output (CF-mMIMO) networks operating under GF-RA protocols. 
Additionally, this study presents a novel clustering strategy that simplifies and enhances activity detection accuracy, assesses the resilience of the algorithm to input perturbations, and investigates the effects of adopting floating-to-fixed-point conversion on algorithm performance. Simulations conducted adhere to 3GPP standards, ensuring accurate channel modeling, and employ a deep learning approach to boost the detection capabilities of mMTC GF-RA devices. The results are compelling: the algorithm achieves an exceptional 99% accuracy rate, confirming its efficacy in real-world applications.	翻訳日:2024-06-12 16:13:39 公開日:2024-06-11
# EmoBox:多言語多言語音声感情認識ツールキットとベンチマーク EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark ( http://arxiv.org/abs/2406.07162v1 ) ライセンス: Link先を確認	Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain,	(参考訳) 音声感情認識(SER)は、人間とコンピュータの相互作用において重要な部分であり、産業と学術の両方から広く注目を集めている。しかし、SERの現在の研究分野は、長い間、以下の問題に悩まされてきた。 1) データセットの合理的かつ普遍的な分割はほとんどなく, 異なるモデルや手法を比較するのが困難である。 2) 研究者が参照する多数のコーパスや言語を網羅するベンチマークは行われておらず, 再生が負担となる。本稿では,多言語マルチコーパス音声感情認識ツールキットであるEmoBoxと,コーパス内およびクロスコーパス間設定のベンチマークを提案する。企業内設定のために、異なるデータセットに対するデータパーティショニングを慎重に設計しました。クロスコーパス設定では、アノテーションエラーを軽減し、話者と感情分布で完全にバランスのとれたテストセットを得るため、基礎的なSERモデルである感情2vecを用いる。 EmoBoxをベースとして,14言語を含む32の感情データセットに対して,事前学習した10の音声モデルを用いた企業内SER結果と,完全にバランスの取れたテストセットを持つ4つのデータセットを用いた企業間SER結果を示す。私たちの知る限りでは、これは言語の範囲と量スケールにわたる、最大のSERベンチマークです。当社のツールキットとベンチマークによって,コミュニティにおけるSERの研究が促進されることを願っています。 Speech emotion recognition (SER) is an important part of human-computer interaction, receiving extensive attention from both industry and academia. However, the current research field of SER has long suffered from the following problems: 1) There are few reasonable and universal splits of the datasets, making comparing different models and methods difficult. 2) No commonly used benchmark covers numerous corpus and languages for researchers to refer to, making reproduction a burden. In this paper, we propose EmoBox, an out-of-the-box multilingual multi-corpus speech emotion recognition toolkit, along with a benchmark for both intra-corpus and cross-corpus settings. For intra-corpus settings, we carefully designed the data partitioning for different datasets. For cross-corpus settings, we employ a foundation SER model, emotion2vec, to mitigate annotation errors and obtain a test set that is fully balanced in speakers and emotions distributions. 
Based on EmoBox, we present the intra-corpus SER results of 10 pre-trained speech models on 32 emotion datasets with 14 languages, and the cross-corpus SER results on 4 datasets with the fully balanced test sets. To the best of our knowledge, this is the largest SER benchmark, across language scopes and quantity scales. We hope that our toolkit and benchmark can facilitate the research of SER in the community.	翻訳日:2024-06-12 16:13:39 公開日:2024-06-11
# FaceGPT:3D人間の顔をチャットする自己教師型学習 FaceGPT: Self-supervised Learning to Chat about 3D Human Faces ( http://arxiv.org/abs/2406.07163v1 ) ライセンス: Link先を確認	Haoran Wang, Mohit Mendiratta, Christian Theobalt, Adam Kortylewski,	(参考訳) 我々は、画像やテキストから3次元の人間の顔を推論するために、VLM(Large Vision-Language Models)のための自己教師型学習フレームワークFaceGPTを紹介した。典型的な3D顔再構成法は、意味論的推論能力に欠ける特殊なアルゴリズムである。 FaceGPTはこの制限を克服し、VLMのトークン空間に3Dフォーマブルフェイスモデル(3DMM)のパラメータを埋め込むことで、テキスト入力と視覚入力の両方から3Dフェイスを生成することができる。 FaceGPTは、アプリ内画像からモデルベースのオートエンコーダとして、自己教師型で訓練される。特に、LLMの隠れ状態は3次元MMパラメータに投影され、その後2次元顔画像として描画され、画像ベース再構成による自己教師あり学習プロセスのガイドとなる。人間の顔の高価な3Dアノテーションを頼らずに、FaceGPTは一般的なユーザー指示を理解する能力を保持しながら、人間の顔の詳細な理解を得る。実験の結果,FaceGPTは高品質な3次元顔再構成を実現するだけでなく,汎用的な視覚指導の能力も維持できることがわかった。さらに、FaceGPTは完全に自己教師され、複雑なテキスト入力に基づいて3D顔を生成する。 We introduce FaceGPT, a self-supervised learning framework for Large Vision-Language Models (VLMs) to reason about 3D human faces from images and text. Typical 3D face reconstruction methods are specialized algorithms that lack semantic reasoning capabilities. FaceGPT overcomes this limitation by embedding the parameters of a 3D morphable face model (3DMM) into the token space of a VLM, enabling the generation of 3D faces from both textual and visual inputs. FaceGPT is trained in a self-supervised manner as a model-based autoencoder from in-the-wild images. In particular, the hidden state of LLM is projected into 3DMM parameters and subsequently rendered as 2D face image to guide the self-supervised learning process via image-based reconstruction. Without relying on expensive 3D annotations of human faces, FaceGPT obtains a detailed understanding about 3D human faces, while preserving the capacity to understand general user instructions. Our experiments demonstrate that FaceGPT not only achieves high-quality 3D face reconstructions but also retains the ability for general-purpose visual instruction following. 
Furthermore, FaceGPT learns, in a fully self-supervised manner, to generate 3D faces based on complex textual inputs, which opens a new direction in human face analysis.	翻訳日:2024-06-12 16:13:39 公開日:2024-06-11
# 言語フィードバックからの学習による自己改善のための言語モデル指導 Teaching Language Models to Self-Improve by Learning from Language Feedback ( http://arxiv.org/abs/2406.07168v1 ) ライセンス: Link先を確認	Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, Jingbo Zhu,	(参考訳) 大きな言語モデル(LLM)を人間の意図と価値で調整することは、非常に難しい。現在の手法は主に人間の好みに依存しており、自然言語で表現されたニュアンスフィードバックを捉えるのに費用がかかり、不十分である。本稿では、モデルフィードバックをアライメントに活用し、人間のアノテーションへの依存を減らす手法であるセルフリファインメント・チューニング(SRT)を提案する。 SRTはベース言語モデル(例:Tulu2)を使用して初期応答を生成し、より高度なモデル(例:GPT-4-Turbo)によって批判され洗練される。このプロセスにより、ベースモデルはアウトプットを自己評価し、改善し、継続的な学習を促進することができる。 SRTはさらに、自己生成したフィードバックと改善から学び、モデル改善を促進するフィードバックループを作成することで、モデルを最適化する。我々の経験的評価は、SRTが様々なタスクやモデルサイズで強いベースラインを著しく上回っていることを示している。 70Bパラメータモデルに適用すると、SRTはAlpacaEval 2.0ベンチマークで9.6\%から25.8\%に増加し、GPT-4-0314、Claude 2、Geminiなどの確立したシステムを上回る。分析の結果,SRTの成功における言語フィードバックの重要性が強調され,今後の研究の可能性が示唆された。 Aligning Large Language Models (LLMs) with human intentions and values is crucial yet challenging. Current methods primarily rely on human preferences, which are costly and insufficient in capturing nuanced feedback expressed in natural language. In this paper, we present Self-Refinement Tuning (SRT), a method that leverages model feedback for alignment, thereby reducing reliance on human annotations. SRT uses a base language model (e.g., Tulu2) to generate initial responses, which are critiqued and refined by a more advanced model (e.g., GPT-4-Turbo). This process enables the base model to self-evaluate and improve its outputs, facilitating continuous learning. SRT further optimizes the model by learning from its self-generated feedback and refinements, creating a feedback loop that promotes model improvement. Our empirical evaluations demonstrate that SRT significantly outperforms strong baselines across diverse tasks and model sizes. When applied to a 70B parameter model, SRT increases the win rate from 9.6\% to 25.8\% on the AlpacaEval 2.0 benchmark, surpassing well-established systems such as GPT-4-0314, Claude 2, and Gemini. 
Our analysis highlights the crucial role of language feedback in the success of SRT, suggesting potential for further exploration in this direction.	翻訳日:2024-06-12 16:13:39 公開日:2024-06-11
# RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation ( http://arxiv.org/abs/2406.07169v1 ) ライセンス: Link先を確認	Mirgahney Mohamed, Harry Jake Cunningham, Marc P. Deisenroth, Lourdes Agapito,	(参考訳) 人の動き生成はコンピュータアニメーションにおいて最重要視されている。人間の動きの膨大な可能性、動きのコヒーレンスに対する高い人間の感受性、きめ細かい動きを正確に生成することの難しさにより、これは困難な時間的モデリング課題である。近年,人間の動作生成のための拡散法が提案されている。しかし、生成したシーケンスは依然として動きの不整合に悩まされており、短い時間に制限され、より単純な動きに制限され、推論中にかなりの時間を要する。これらの制約に対処するため、時間モデルのための新しい再帰拡散式である「textit{RecMoDiffuse: Recurrent Flow Diffusion}」を提案する。時間的依存のない配列全体に拡散を与える以前の研究とは異なり、時間的一貫性を本質的に達成し難いアプローチである。本手法は,拡散過程における流れモデルの正規化により時間的制約を明示的に適用し,時間的次元への拡散を拡大する。人間の動作の時間的モデリングにおけるRecMoDiffuseの有効性を実証する。実験の結果、RecMoDiffuse はコヒーレントな動き列を生成し、推論段階における計算オーバーヘッドを低減しつつ、最先端の手法で同等の結果が得られることがわかった。 Human motion generation has paramount importance in computer animation. It is a challenging generative temporal modelling task due to the vast possibilities of human motion, high human sensitivity to motion coherence and the difficulty of accurately generating fine-grained motions. Recently, diffusion methods have been proposed for human motion generation due to their high sample quality and expressiveness. However, generated sequences still suffer from motion incoherence, and are limited to short duration, and simpler motion and take considerable time during inference. To address these limitations, we propose \textit{RecMoDiffuse: Recurrent Flow Diffusion}, a new recurrent diffusion formulation for temporal modelling. Unlike previous work, which applies diffusion to the whole sequence without any temporal dependency, an approach that inherently makes temporal consistency hard to achieve. Our method explicitly enforces temporal constraints with the means of normalizing flow models in the diffusion process and thereby extends diffusion to the temporal dimension. We demonstrate the effectiveness of RecMoDiffuse in the temporal modelling of human motion. 
Our experiments show that RecMoDiffuse achieves comparable results with state-of-the-art methods while generating coherent motion sequences and reducing the computational overhead in the inference stage.	翻訳日:2024-06-12 16:13:39 公開日:2024-06-11
# VoxNeuS: 勾配補間によるVoxel-based Neural Surfaceの再構築 VoxNeuS: Enhancing Voxel-Based Neural Surface Reconstruction via Gradient Interpolation ( http://arxiv.org/abs/2406.07170v1 ) ライセンス: Link先を確認	Sidun Liu, Peng Qiao, Zongxin Ye, Wenyu Li, Yong Dou,	(参考訳) ニューラルサーフェス・コンストラクションは、多視点画像から3次元モデルを再構成するために、符号付き距離場~(SDF)を学習する。以前の研究では、効率を改善するためにボクセルに基づく明示的な表現を採用していた。しかし、彼らはボクセル格子における補間の勾配不安定さを無視し、収束と滑らかさの低下につながった。さらに、以前の研究は幾何と放射率の最適化を絡み合わせることで、放射率を説明する幾何学の変形を引き起こし、テクスチャ化された平面を再構築する際に人工物を引き起こす。本研究では, 線形補間における勾配の不連続性から勾配の不安定性が生じることを明らかにするとともに, その不連続性を排除するために, オリジナルの解析勾配の代わりに補間勾配を用いることを提案する。勾配補間に基づく計算およびメモリ効率の良いニューラルサーフェス再構成のための軽量表面再構成法であるVoxNeuSを提案する。明示的な表現により、正規化項の勾配、すなわち等角線と曲率損失を直接解き、計算やメモリアクセスオーバーヘッドを回避できる。さらに、VoxNeuSは、放射率最適化による幾何学的変形を処理するために、幾何学的放射分散アーキテクチャを採用している。実験結果から,VoxNeuSは従来よりも再現性が高いことがわかった。トレーニングプロセス全体は15分で、1つの2080ti GPU上で3GB未満のメモリを必要とする。 Neural Surface Reconstruction learns a Signed Distance Field~(SDF) to reconstruct the 3D model from multi-view images. Previous works adopt voxel-based explicit representation to improve efficiency. However, they ignored the gradient instability of interpolation in the voxel grid, leading to degradation on convergence and smoothness. Besides, previous works entangled the optimization of geometry and radiance, which leads to the deformation of geometry to explain radiance, causing artifacts when reconstructing textured planes. In this work, we reveal that the instability of gradient comes from its discontinuity during trilinear interpolation, and propose to use the interpolated gradient instead of the original analytical gradient to eliminate the discontinuity. Based on gradient interpolation, we propose VoxNeuS, a lightweight surface reconstruction method for computational and memory efficient neural surface reconstruction. Thanks to the explicit representation, the gradient of regularization terms, i.e. Eikonal and curvature loss, are directly solved, avoiding computation and memory-access overhead. 
Further, VoxNeuS adopts a geometry-radiance disentangled architecture to handle the geometry deformation from radiance optimization. The experimental results show that VoxNeuS achieves better reconstruction quality than previous works. The entire training process takes 15 minutes and less than 3 GB of memory on a single 2080ti GPU.	翻訳日:2024-06-12 16:13:39 公開日:2024-06-11
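
The VoxNeuS abstract above attributes gradient instability to the discontinuity of the analytical gradient under trilinear interpolation. The sketch below is a 1D analogue only, not the authors' implementation: it assumes unit grid spacing and finite-difference per-vertex gradients, and all function names are hypothetical. It shows that the analytical gradient of a linearly interpolated field jumps at grid vertices, while interpolating per-vertex gradients varies continuously.

```python
import math


def interp(values, x):
    """Linearly interpolate `values` (unit grid spacing) at position x."""
    i = min(int(math.floor(x)), len(values) - 2)
    t = x - i
    return (1.0 - t) * values[i] + t * values[i + 1]


def analytic_grad(grid, x):
    """Derivative of the interpolant itself: constant within each cell."""
    i = min(int(math.floor(x)), len(grid) - 2)
    return grid[i + 1] - grid[i]


def vertex_grads(grid):
    """Finite-difference gradient stored at every grid vertex (an assumption)."""
    n = len(grid)
    grads = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grads.append((grid[hi] - grid[lo]) / (hi - lo))
    return grads


def interpolated_grad(grid, x):
    """Interpolate the per-vertex gradients instead of differentiating."""
    return interp(vertex_grads(grid), x)


grid = [0.0, 1.0, 4.0, 9.0]  # samples of x**2 at x = 0, 1, 2, 3
jump_analytic = abs(analytic_grad(grid, 1.001) - analytic_grad(grid, 0.999))
jump_interp = abs(interpolated_grad(grid, 1.001) - interpolated_grad(grid, 0.999))
```

Crossing the vertex at x = 1 changes the analytical gradient abruptly, whereas the interpolated gradient changes only slightly; the same effect appears per axis in a 3D voxel grid.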

# 都市音環境のより良い可視化に向けて--インタビューからの考察

Towards better visualizations of urban sound environments: insights from interviews ( http://arxiv.org/abs/2407.16889v1 )

ライセンス: Link先を確認

Modan Tailleur, Pierre Aumond, Vincent Tourre, Mathieu Lagrange,

(参考訳) 都市騒音マップや騒音の可視化は伝統的に都市全体の騒音レベルをマクロ的に表現している。しかし、これらの表現は、これらの音環境に関連する音の知覚を正確に捉えることができない。知覚は関与する音源に大きく依存するためである。本稿では,そのような表現が重要であると想定される都市の利害関係者を特定することで,音源の表現の必要性を分析することを目的とする。様々な都市の利害関係者へのインタビューを通じて, 現状の実践, 既存ツールの強みと弱み, 既存の都市音環境表現に音源を組み込むことの意義について知見を得た。本研究では,音源表現の3つの異なる用途が明らかになった。 1) 産業関係者や専門知識を持つ市民のための騒音関連の苦情対応 2) 市民のためのサウンドスケープ品質評価 3) 都市計画者への指針。また,可視化の利用については多様な視点があり,可視化は対象者に適した指標を用い,データへのアクセスを可能にすべきであることが示された。

Urban noise maps and noise visualizations traditionally provide macroscopic representations of noise levels across cities. However, those representations fail to accurately gauge the sound perception associated with these sound environments, as perception highly depends on the sound sources involved. This paper aims to analyze the need for representations of sound sources by identifying the urban stakeholders for whom such representations are assumed to be of importance. Through spoken interviews with various urban stakeholders, we have gained insight into current practices, the strengths and weaknesses of existing tools, and the relevance of incorporating sound sources into existing urban sound environment representations. Three distinct uses of sound source representations emerged in this study: 1) noise-related complaints for industrials and specialized citizens, 2) soundscape quality assessment for citizens, and 3) guidance for urban planners. Findings also reveal diverse perspectives on the use of visualizations, which should use indicators adapted to the target audience and enable data accessibility.

翻訳日:2024-08-05 01:45:45 公開日:2024-06-11

# 赤外線可視画像融合のためのセマンティック・アウェア・マルチガイドネットワーク

A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion ( http://arxiv.org/abs/2407.06159v1 )

ライセンス: Link先を確認

Xiaoli Zhang, Liying Wang, Libo Zhao, Xiongfei Li, Siwei Ma,

(参考訳) マルチモダリティ画像融合は、2つのソース画像から特定のモダリティ情報と共有モダリティ情報を融合することを目的としている。複雑な場面における特徴抽出の不十分さと意味認識の欠如に対処するために, 相関型分解特徴をモデル化し, 補足的特徴と多誘導的特徴集合を効率的に抽出することで高レベルのグラフ表現をモデル化する方法に焦点を当てる。本稿では,3分岐エンコーダデコーダアーキテクチャと,それに対応する融合層を融合戦略として提案する。深部畳み込み後の浅部特徴抽出にマルチDconv Transposed Attention と Local-enhanced Feed Forward Network を用いた変圧器を用いる。 3つの並列ブランチエンコーダでは、CAI(Cross Attention and Invertible Block)が局所的な特徴を抽出し、高周波テクスチャの詳細を保存することができる。残った接続を持つベース機能抽出モジュール(BFE)は、長距離依存性をキャプチャし、共有モダリティ表現能力を向上することができる。グラフ推論モジュール(GR)は、高レベルなクロスモダリティ関係を推論し、CAIの特定のモダリティ補完情報として低レベルな詳細特徴を同時に抽出するために導入された。可視・近赤外画像融合と医用画像融合タスクにおける最先端手法と比較して,本手法が競争力のある結果を得たことを示す実験結果を得た。さらに、その後のタスクで他の融合法を上回り、オブジェクト検出では平均9.78% mAP@.5、セマンティックセグメンテーションでは6.46% mIoUと評価した。

Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images. To tackle the problem of insufficient feature extraction and lack of semantic awareness for complex scenes, this paper focuses on how to model correlation-driven decomposed features and reason over high-level graph representations by efficiently extracting complementary features and performing multi-guided feature aggregation. We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy. A transformer with Multi-Dconv Transposed Attention and a Local-enhanced Feed Forward network is used to extract shallow features after the depthwise convolution. In the encoder's three parallel branches, the Cross Attention and Invertible Block (CAI) extracts local features and preserves high-frequency texture details. The Base Feature Extraction module (BFE), with residual connections, captures long-range dependencies and enhances shared-modality expression capabilities. The Graph Reasoning Module (GR) is introduced to reason about high-level cross-modality relations while simultaneously extracting low-level detail features as specific-modality information complementary to CAI. Experiments demonstrate that our method obtains competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks. Moreover, we surpass other fusion methods on subsequent tasks, scoring on average 9.78% mAP@.5 higher in object detection and 6.46% mIoU higher in semantic segmentation.

翻訳日:2024-07-22 14:07:46 公開日:2024-06-11

# 生成AIを用いたコンテキストパーソナライズされたプログラミング演習の評価

Evaluating Contextually Personalized Programming Exercises Created with Generative AI ( http://arxiv.org/abs/2407.11994v1 )

ライセンス: Link先を確認

Evanfiya Logacheva, Arto Hellas, James Prather, Sami Sarsa, Juho Leinonen,

(参考訳) プログラミングスキルは、様々なハンズオンエクササイズを完了して開発されるのが一般的である。このようなプログラミング問題は、学生の興味や文化的背景に文脈化することができる。教育心理学における先行研究は、運動の文脈的パーソナライゼーションが学習者の状況的関心を刺激し、彼らのエンゲージメントに肯定的な影響を及ぼすことを示した。しかし、学生が実践するための多様な包括的なプログラミング演習を作成することは、コンピュータサイエンス教育者にとって時間と労力のかかる課題である。従来の研究では、大きな言語モデルが概念的および文脈的に関連するプログラミング演習を生成できることが示されている。そのため、学生の興味やニーズに合ったパーソナライズされたプログラミング問題を自動的に生成することが可能になる。本報告では,GPT-4で作成した文脈的にパーソナライズされたプログラミング演習を含む,選択型プログラミングコースで実施されるユーザスタディについて報告する。運動の質は,学生と著者の両方が評価した。さらに,本研究は,創生運動に対する学生の態度とシステムとの関わりについて検討した。その結果, GPT-4で発生する運動の質は概して高かった。さらに、参加者は興味を持ち、役に立ちます。このことは、AIが生成するプログラミング問題は、学生が自分の個人的関心や教育的ニーズに合わせた、事実上無制限の実践資料を提供するため、入門プログラミングコースに価値を付加する可能性があることを示唆している。

Programming skills are typically developed through completing various hands-on exercises. Such programming problems can be contextualized to students' interests and cultural backgrounds. Prior research in educational psychology has demonstrated that context personalization of exercises stimulates learners' situational interests and positively affects their engagement. However, creating a varied and comprehensive set of programming exercises for students to practice on is a time-consuming and laborious task for computer science educators. Previous studies have shown that large language models can generate conceptually and contextually relevant programming exercises. Thus, they offer a possibility to automatically produce personalized programming problems to fit students' interests and needs. This article reports on a user study conducted in an elective introductory programming course that included contextually personalized programming exercises created with GPT-4. The quality of the exercises was evaluated by both the students and the authors. Additionally, this work investigated student attitudes towards the created exercises and their engagement with the system. The results demonstrate that the quality of exercises generated with GPT-4 was generally high. What is more, the course participants found them engaging and useful. This suggests that AI-generated programming problems can be a worthwhile addition to introductory programming courses, as they provide students with a practically unlimited pool of practice material tailored to their personal interests and educational needs.

翻訳日:2024-07-22 11:30:12 公開日:2024-06-11

# FoldToken2: コンパクトで不変で生成的タンパク質構造言語を学ぶ

FoldToken2: Learning compact, invariant and generative protein structure language ( http://arxiv.org/abs/2407.00050v1 )

ライセンス: Link先を確認

Zhangyang Gao, Cheng Tan, Stan Z. Li,

(参考訳) 3D座標の同変性は、タンパク質構造の表現学習、アライメント、生成において長年の課題となっている。タンパク質構造を等価に表現する、コンパクトで不変な言語を作成できるだろうか? この目標に向けて、FoldToken2を提案し、元の構造の復元可能性を維持しながら、同変な構造を離散トークンに変換する。 FoldToken1からFoldToken2へ、(1)不変構造エンコーダ、(2)ベクトル量子化圧縮器、(3)同変構造デコーダの3つの主要コンポーネントを改善した。タンパク質構造再構築タスクにおいてFoldToken2を評価したところ,従来のFoldToken1をTMScoreで20%,RMSDで81%上回った。 FoldToken2はおそらく、単一鎖と多鎖の両方のタンパク質構造の量子化でうまく機能する最初の方法である。我々はFoldToken2が、タンパク質構造表現学習、構造アライメント、構造生成タスクのさらなる改善をもたらすと考えている。

The equivariant nature of 3D coordinates has posed long-term challenges in protein structure representation learning, alignment, and generation. Can we create a compact and invariant language that equivalently represents protein structures? Towards this goal, we propose FoldToken2 to transfer equivariant structures into discrete tokens, while maintaining the recoverability of the original structures. From FoldToken1 to FoldToken2, we improve three key components: (1) invariant structure encoder, (2) vector-quantized compressor, and (3) equivariant structure decoder. We evaluate FoldToken2 on the protein structure reconstruction task and show that it outperforms the previous FoldToken1 by 20% in TMScore and 81% in RMSD. FoldToken2 is probably the first method that works well on the quantization of both single-chain and multi-chain protein structures. We believe that FoldToken2 will inspire further improvement in protein structure representation learning, structure alignment, and structure generation tasks.

翻訳日:2024-07-07 13:43:41 公開日:2024-06-11
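
The FoldToken2 abstract above lists a vector-quantized compressor that maps continuous features to discrete tokens. As a minimal sketch of generic vector quantization (a nearest-neighbour codebook lookup, not FoldToken2's actual compressor), the following assumes a small hand-written codebook; the function name, the codebook, and the feature vectors are all illustrative.

```python
def quantize(vec, codebook):
    """Return (index, code) of the codebook entry nearest to `vec`
    under squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    idx = min(range(len(codebook)), key=lambda k: sqdist(vec, codebook[k]))
    return idx, codebook[idx]


# Hypothetical 2-D codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]

# Continuous per-residue features (made up) become discrete token ids.
features = [[0.9, 0.8], [0.1, -0.2], [1.8, 0.3]]
tokens = [quantize(f, codebook)[0] for f in features]
```

Decoding replaces each token id with its codebook vector, so reconstruction quality depends on how well the codebook covers the feature space.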

# PreSto:レコメンデーションモデルのトレーニングのためのストレージ内データ前処理システム

PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models ( http://arxiv.org/abs/2406.14571v1 )

ライセンス: Link先を確認

Yunjae Lee, Hyeseong Kim, Minsoo Rhu,

(参考訳) レコメンデーションシステム(RecSys)のトレーニングは、大量の生データを前処理し、それらをGPUにシームレスに供給する「データ前処理」ステージを必要とするため、いくつかの課題に直面している。高いトレーニングスループットを維持するために、最先端のソリューションは前処理用に大量のCPUサーバを確保しており、これが多大な導入コストと消費電力を招いている。我々の特性評価により、従来のCPU中心の前処理は、RecSys前処理に豊富に存在する特徴間・特徴内の並列性を活用できないため、特徴生成と特徴正規化の操作がボトルネックとなっていることが明らかになった。 PreStoは、ISP(In-Storage Processing)を活用するストレージ中心の前処理システムであり、ボトルネックとなる前処理操作をISPユニットにオフロードする。プロダクション規模のRecSys前処理において、PreStoはベースラインのCPU中心システムに対し、エンドツーエンドの前処理時間で$9.6\times$の高速化、コスト効率で$4.3\times$の向上、エネルギー効率で$11.3\times$の向上を平均して達成することを示す。

Training recommendation systems (RecSys) faces several challenges as it requires the "data preprocessing" stage to preprocess an ample amount of raw data and feed them to the GPU for training in a seamless manner. To sustain high training throughput, state-of-the-art solutions reserve a large fleet of CPU servers for preprocessing, which incurs substantial deployment cost and power consumption. Our characterization reveals that prior CPU-centric preprocessing is bottlenecked on feature generation and feature normalization operations as it fails to exploit the abundant inter-/intra-feature parallelism in RecSys preprocessing. PreSto is a storage-centric preprocessing system leveraging In-Storage Processing (ISP), which offloads the bottlenecked preprocessing operations to our ISP units. We show that PreSto outperforms the baseline CPU-centric system with a $9.6\times$ speedup in end-to-end preprocessing time, $4.3\times$ enhancement in cost-efficiency, and $11.3\times$ improvement in energy efficiency on average for production-scale RecSys preprocessing.

翻訳日:2024-07-01 07:21:04 公開日:2024-06-11

# ディープラーニングによる大規模市場均衡計算

Large-Scale Contextual Market Equilibrium Computation through Deep Learning ( http://arxiv.org/abs/2406.15459v1 )

ライセンス: Link先を確認

Yunxuan Ma, Yide Bian, Hao Xu, Weitao Yang, Jingshu Zhao, Zhijian Duan, Feng Wang, Xiaotie Deng,

(参考訳) 市場均衡は、経済学と社会最適化分析における最も基本的な解概念の1つである。市場均衡計算に関する既存の研究は、主に比較的少数の買い手による設定に焦点を当てている。そこで本研究では,買い手と商品がそれぞれのコンテキストによって表される,大規模な買い手集団のシナリオにおける市場均衡の計算について検討する。この現実的で一般化されたコンテキスト付き市場モデルに基づいて、市場均衡を近似する深層学習に基づく手法であるMarketFCNetを導入する。まず、買い手と商品のコンテキストにのみ依存するニューラルネットワークを用いて、各商品の各買い手への割り当てをパラメータ化する。次に,学習アルゴリズムの損失関数を不偏に推定する効率的な手法を提案し,勾配降下によるネットワークパラメータの最適化を可能にする。近似解を評価するために、与えられた割り当てと価格の組が市場均衡からどれだけ乖離しているかを定量化するNash Gapと呼ばれる指標を導入する。実験結果から,MarketFCNetは市場規模が拡大するにつれて,既存の手法と比べて競争力のある性能と大幅に短い実行時間を達成し,大規模なコンテキスト付き市場均衡の近似を加速する深層学習手法の可能性を示した。

Market equilibrium is one of the most fundamental solution concepts in economics and social optimization analysis. Existing works on market equilibrium computation primarily focus on settings with a relatively small number of buyers. Motivated by this, our paper investigates the computation of market equilibrium in scenarios with a large-scale buyer population, where buyers and goods are represented by their contexts. Building on this realistic and generalized contextual market model, we introduce MarketFCNet, a deep learning-based method for approximating market equilibrium. We start by parameterizing the allocation of each good to each buyer using a neural network, which depends solely on the context of the buyer and the good. Next, we propose an efficient method to estimate the loss function of the training algorithm unbiasedly, enabling us to optimize the network parameters through gradient descent. To evaluate the approximated solution, we introduce a metric called Nash Gap, which quantifies the deviation of the given allocation and price pair from the market equilibrium. Experimental results indicate that MarketFCNet delivers competitive performance and significantly lower running times compared to existing methods as the market scale expands, demonstrating the potential of deep learning-based methods to accelerate the approximation of large-scale contextual market equilibrium.

翻訳日:2024-07-01 07:01:19 公開日:2024-06-11

# RACon:検索機能強化されたキャラクタロコモーション制御

RACon: Retrieval-Augmented Simulated Character Locomotion Control ( http://arxiv.org/abs/2406.17795v1 )

ライセンス: Link先を確認

Yuxuan Mu, Shihao Zou, Kangning Yin, Zheng Tian, Li Cheng, Weinan Zhang, Jun Wang,

(参考訳) コンピュータアニメーションでは、シミュレートされたキャラクターをライフライクな動きで運転することは困難である。現在の生成モデルは多様な動作に一般化できるが、エンドユーザー制御の応答性に問題を引き起こすことが多い。これらの問題に対処するために, RACon: Retrieval-Augmented Simulated Character Locomotion Controlを提案する。エンドツーエンドの階層的強化学習法は,レトリバーとモーションコントローラを利用する。検索者は、ユーザ指定データベースからタスク指向で動きの専門家を検索し、ユーザの制御に対する応答性を高める。選択された動きの専門家と操作信号は、シミュレートされたキャラクタを駆動するためにコントローラに転送される。さらに、トレーニングプロセスの安定化を図るために、検索強化判別器を設計する。本手法は,実証実験で実証したように,移動制御における品質と量の両方において既存の手法を超越した手法である。さらに、検索用の広範囲なデータベースを切り替えることで、実行時に独特の動作タイプに適応することができる。

In computer animation, driving a simulated character with lifelike motion is challenging. Current generative models, though able to generalize to diverse motions, often pose challenges to the responsiveness of end-user control. To address these issues, we introduce RACon: Retrieval-Augmented Simulated Character Locomotion Control. Our end-to-end hierarchical reinforcement learning method utilizes a retriever and a motion controller. The retriever searches motion experts from a user-specified database in a task-oriented fashion, which boosts the responsiveness to the user's control. The selected motion experts and the manipulation signal are then transferred to the controller to drive the simulated character. In addition, a retrieval-augmented discriminator is designed to stabilize the training process. Our method surpasses existing techniques in both quality and quantity in locomotion control, as demonstrated in our empirical study. Moreover, by switching extensive databases for retrieval, it can adapt to distinctive motion types at run time.

翻訳日:2024-07-01 06:21:45 公開日:2024-06-11

# KROP(Knowledge Return Oriented Prompting)

Knowledge Return Oriented Prompting (KROP) ( http://arxiv.org/abs/2406.11880v1 )

ライセンス: Link先を確認

Jason Martin, Kenneth Yeung,

(参考訳) 現在デプロイされている多くのLarge Language Models (LLMs) やLLMベースのアプリは、その完全性を保護するために、何らかのプロンプトフィルタやアライメントを使用している。しかし、これらの対策は万全ではない。本稿では,プロンプトインジェクション攻撃を難読化し,これらのセキュリティ対策のほとんどで事実上検出不可能にするプロンプトインジェクション手法であるKROPを紹介する。

Many Large Language Models (LLMs) and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures aren't foolproof. This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks, rendering them virtually undetectable to most of these security measures.

翻訳日:2024-06-23 13:24:48 公開日:2024-06-11

# Meent: 機械学習のための微分可能な電磁シミュレータ

Meent: Differentiable Electromagnetic Simulator for Machine Learning ( http://arxiv.org/abs/2406.12904v1 )

ライセンス: Link先を確認

Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chaejin Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, Jinmyoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park,

(参考訳) 電磁法(EM)シミュレーションは、太陽電池、半導体デバイス、イメージセンサー、将来のディスプレイ、集積フォトニックデバイスなどのサブ波長スケール構造を持つデバイスを解析・設計する上で重要な役割を担っている。具体的には、半導体デバイス構造の推定やナノフォトニクスデバイスの設計といった光学的問題によって、遠く離れた現実世界への影響に関する興味深い研究トピックが提供される。このようなタスクの伝統的なアルゴリズムは、アルゴリズムとEMシミュレーションの両方の計算コストが高いため、しばしば準最適結果をもたらすシミュレーションを通じてパラメータを反復的に精錬する必要がある。機械学習(ML)は、これらの課題を軽減するための有望な候補として現れ、光学研究コミュニティは、さまざまなタスクにわたる古典的手法を超える結果を得るために、MLアルゴリズムをますます採用している。光と機械学習のコミュニティ間の相乗的コラボレーションを促進するためには、両方の研究コミュニティに親しみやすいEMシミュレーションソフトウェアを持つことが不可欠である。この目的のために,厳密な結合波解析(RCWA)を用いたEMシミュレーションソフトウェアであるMeentを提案する。 Pythonで開発され、自動微分(AD)機能を備えたMeentは、光学研究にMLを統合するための汎用プラットフォームとして機能し、その逆も可能である。研究プラットフォームとしての実用性を実証するため、Meentの3つの応用を提示する。 1) 神経オペレーターの訓練用データセットの作成 2)ナノフォトニックデバイス最適化の強化学習環境として機能し、 3)勾配型最適化器を用いた逆問題に対する解を提供する。これらの応用は、EMシミュレーションとML方法論の両方を前進させるMeentの可能性を浮き彫りにする。コードはMITライセンスのhttps://github.com/kc-ml2/meentで公開されている。

Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reaching real-world impact. Traditional algorithms for such tasks require iteratively refining parameters through simulations, which often yield sub-optimal results due to the high computational cost of both the algorithms and EM simulations. Machine learning (ML) has emerged as a promising candidate to mitigate these challenges, and the optics research community has increasingly adopted ML algorithms to obtain results surpassing classical methods across various tasks. To foster a synergistic collaboration between the optics and ML communities, it is essential to have EM simulation software that is user-friendly for both research communities. To this end, we present Meent, an EM simulation software that employs rigorous coupled-wave analysis (RCWA). Developed in Python and equipped with automatic differentiation (AD) capabilities, Meent serves as a versatile platform for integrating ML into optics research and vice versa. To demonstrate its utility as a research platform, we present three applications of Meent: 1) generating a dataset for training a neural operator, 2) serving as an environment for the reinforcement learning of nanophotonic device optimization, and 3) providing a solution for inverse problems with gradient-based optimizers. These applications highlight Meent's potential to advance both EM simulation and ML methodologies. The code is available at https://github.com/kc-ml2/meent with the MIT license to promote the cross-pollination of ideas among academic researchers and industry practitioners.

翻訳日:2024-06-23 13:15:04 公開日:2024-06-11

# PufferLib: 強化学習ライブラリと環境をうまく連携させる

PufferLib: Making Reinforcement Learning Libraries and Environments Play Nice ( http://arxiv.org/abs/2406.12905v1 )

ライセンス: Link先を確認

Joseph Suarez,

(参考訳) 環境、モデル、強化学習ライブラリがあり、一緒に動作するように設計されているはずなのに、うまく動かない。 PufferLibは、それらをうまく連携させる。このライブラリは、一般的な互換性問題を解消するワンライン環境ラッパーと、トレーニングを加速する高速なベクトル化を提供する。 PufferLibを使えば、CleanRLやSB3といった慣れ親しんだライブラリを使って、AtariやProcgenといった古典的なベンチマークからNetHackやNeural MMOのような複雑なシミュレータまでスケールすることができる。数十の環境向けの依存関係を含むpipパッケージとビルド済みイメージを公開している。私たちのコードはすべてMITライセンスの下でフリーかつオープンソースであり、ベースライン、ドキュメント、サポートがpufferai.github.ioで提供されている。

You have an environment, a model, and a reinforcement learning library that are designed to work together but don't. PufferLib makes them play nice. The library provides one-line environment wrappers that eliminate common compatibility problems and fast vectorization to accelerate training. With PufferLib, you can use familiar libraries like CleanRL and SB3 to scale from classic benchmarks like Atari and Procgen to complex simulators like NetHack and Neural MMO. We release pip packages and prebuilt images with dependencies for dozens of environments. All of our code is free and open-source software under the MIT license, complete with baselines, documentation, and support at pufferai.github.io.

翻訳日:2024-06-23 13:15:04 公開日:2024-06-11

# Flextron: マルチインワンのフレキシブルな大言語モデル

Flextron: Many-in-One Flexible Large Language Model ( http://arxiv.org/abs/2406.10260v1 )

ライセンス: Link先を確認

Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, Pavlo Molchanov,

(参考訳) 現代のLLMのトレーニングは非常にリソース集約的であり、計算資源やメモリ資源が限られた様々な展開シナリオに合わせて、トレーニングを繰り返してカスタマイズするのは現実的ではない。本稿では,フレキシブルなモデル展開をサポートするネットワークアーキテクチャとポストトレーニングモデル最適化フレームワークであるFlextronを紹介する。 Flextronアーキテクチャはネストされた弾性構造を利用して、追加の微調整を必要とせず、推論中にユーザ定義のレイテンシと精度の目標に迅速に適応する。入力適応性も備えており、トークンをサブネットワーク経由で自動的にルーティングすることで、性能と効率を向上させることができる。本稿では,既存の学習済みLLMをFlextronモデルに体系的に変換する,サンプル効率のよい学習手法と関連するルーティングアルゴリズムを提案する。我々は,GPT-3およびLLama-2ファミリのLLM上でFlextronを評価し,複数のエンドツーエンドで訓練された変種や他の最先端の弾性ネットワークよりも優れた性能を示す。これらはすべて、元の事前学習と比べてわずか7.63%のトークンしか消費しない単一の事前学習の実行で達成される。

Training modern LLMs is extremely resource intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical. In this paper, we introduce Flextron, a network architecture and post-training model optimization framework supporting flexible model deployment. The Flextron architecture utilizes a nested elastic structure to rapidly adapt to specific user-defined latency and accuracy targets during inference with no additional fine-tuning required. It is also input-adaptive, and can automatically route tokens through its sub-networks for improved performance and efficiency. We present a sample-efficient training method and associated routing algorithms for systematically transforming an existing trained LLM into a Flextron model. We evaluate Flextron on the GPT-3 and LLama-2 family of LLMs, and demonstrate superior performance over multiple end-to-end trained variants and other state-of-the-art elastic networks, all with a single pretraining run that consumes a mere 7.63% of the tokens used in the original pretraining.

翻訳日:2024-06-19 01:31:17 公開日:2024-06-11
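
The Flextron abstract above mentions a nested elastic structure whose sub-networks can be selected at inference time. The abstract does not give the exact mechanism, so the sketch below only illustrates one common way to nest sub-networks: a smaller model reuses a prefix of the hidden units of a larger one. The function name, the weights, and the width values are hypothetical.

```python
def elastic_mlp(x, w_in, w_out, width):
    """Two-layer ReLU MLP that uses only the first `width` hidden units.

    Smaller widths reuse a prefix of the same weights, so every
    sub-network is nested inside the full network.
    """
    hidden = [
        max(0.0, sum(w_in[h][i] * x[i] for i in range(len(x))))
        for h in range(width)
    ]
    return [
        sum(w_out[o][h] * hidden[h] for h in range(width))
        for o in range(len(w_out))
    ]


# Hypothetical weights: 2 inputs, 3 hidden units, 1 output.
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[1.0, 1.0, 1.0]]
x = [1.0, 2.0]

full = elastic_mlp(x, w_in, w_out, width=3)   # all hidden units
small = elastic_mlp(x, w_in, w_out, width=2)  # cheaper nested sub-network
```

A router, as described in the abstract, would choose `width` per input or per latency target; that selection logic is not sketched here.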

# FoodSky:シェフとダイエットテストに合格した食品指向の大規模言語モデル

FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination ( http://arxiv.org/abs/2406.10261v1 )

ライセンス: Link先を確認

Pengfei Zhou, Weiqing Min, Chaoran Fu, Ying Jin, Mingyu Huang, Xiangyang Li, Shuhuan Mei, Shuqiang Jiang,

(参考訳) 食べ物は人間の生活の基礎であり、栄養源としてだけでなく、文化的アイデンティティや社会的相互作用の基盤としても機能している。世界の食事ニーズと嗜好の複雑さが増大するにつれて、レシピ生成や食事推薦から食事と疾患の相関の発見・理解まで、様々なタスクのための食品の認識と推論を可能にする食品知能が必要である。この目標に向けて,様々なドメインやタスクにまたがるLarge Language Models (LLMs) の強力な能力を踏まえ,認識と推論を通じて食品データを理解する食品指向のLLMであるFoodSkyを導入する。中国料理の複雑さと典型性を考慮し、まず、さまざまな権威ある情報源から包括的な中国食品コーパスFoodEarthを構築し、FoodSkyが食品関連データを深く理解するために活用する。次に,FoodSkyが食品の細粒度のセマンティクスを捉え,コンテキストを考慮した食品関連テキストを生成する能力をそれぞれ強化するために,トピックベースの選択的状態空間モデル(TS3M)と階層的トピック検索拡張生成(HTRAG)機構を提案する。広範な評価の結果,FoodSkyはシェフ試験と栄養士試験の両方で汎用LLMを大きく上回り,中国の国家シェフ試験と国家栄養士試験でそれぞれ67.2%,66.4%の精度を達成した。 FoodSkyは、料理の創造性を高め、より健康的な食事パターンを促進することが期待されるだけでなく、食品分野における複雑な現実世界の問題に対処するドメイン固有LLMの新しい標準も打ち立てる。 FoodSkyのオンラインデモはhttp://222.92.101.211:8200で公開されている。

Food is foundational to human life, serving not only as a source of nourishment but also as a cornerstone of cultural identity and social interaction. As the complexity of global dietary needs and preferences grows, food intelligence is needed to enable food perception and reasoning for various tasks, ranging from recipe generation and dietary recommendation to diet-disease correlation discovery and understanding. Towards this goal, and building on the powerful capabilities of Large Language Models (LLMs) across various domains and tasks, we introduce the food-oriented LLM FoodSky to comprehend food data through perception and reasoning. Considering the complexity and typicality of Chinese cuisine, we first construct a comprehensive Chinese food corpus, FoodEarth, from various authoritative sources, which FoodSky can leverage to achieve a deep understanding of food-related data. We then propose the Topic-based Selective State Space Model (TS3M) and the Hierarchical Topic Retrieval Augmented Generation (HTRAG) mechanism to enhance FoodSky in capturing fine-grained food semantics and generating context-aware food-relevant text, respectively. Our extensive evaluations demonstrate that FoodSky significantly outperforms general-purpose LLMs in both chef and dietetic examinations, with an accuracy of 67.2% and 66.4% on the Chinese National Chef Exam and the National Dietetic Exam, respectively. FoodSky not only promises to enhance culinary creativity and promote healthier eating patterns, but also sets a new standard for domain-specific LLMs that address complex real-world issues in the food domain. An online demonstration of FoodSky is available at http://222.92.101.211:8200.

Translated: 2024-06-19 01:31:17 Published: 2024-06-11

# Fast solution to the fair ranking problem using the Sinkhorn algorithm ( http://arxiv.org/abs/2406.10262v1 )

License: see the linked page

Yuki Uehara, Shunnosuke Ikeda, Naoki Nishimura, Koya Ohashi, Yilin Li, Jie Yang, Deddy Jobson, Xingxia Zha, Takeshi Matsumoto, Noriyoshi Sukegawa, Yuichi Takano

In two-sided marketplaces such as online flea markets, recommender systems for providing consumers with personalized item rankings play a key role in promoting transactions between providers and consumers. Meanwhile, two-sided marketplaces face the problem of balancing consumer satisfaction and fairness among items to stimulate the activity of item providers. Saito and Joachims (2022) devised an impact-based fair ranking method for maximizing the Nash social welfare based on fair division; however, this method, which requires solving a large-scale constrained nonlinear optimization problem, is very difficult to apply to practical-scale recommender systems. We thus propose a fast solution to the impact-based fair ranking problem. We first transform the fair ranking problem into an unconstrained optimization problem and then design a gradient ascent method that repeatedly executes the Sinkhorn algorithm. Experimental results demonstrate that our algorithm provides fair rankings of high quality and is about 1000 times faster than applying commercial optimization software.
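
The abstract names the Sinkhorn algorithm without spelling it out. As a rough illustration only, the sketch below shows the textbook Sinkhorn-Knopp iteration, which alternately rescales the rows and columns of a positive matrix until it is approximately doubly stochastic. It is not the authors' fair-ranking solver; the function name `sinkhorn`, the iteration count, and the `scores` matrix are assumptions made for this example.

```python
def sinkhorn(matrix, n_iters=200):
    """Scale a square matrix with positive entries toward a doubly stochastic one.

    Illustrative only: textbook Sinkhorn-Knopp, not the paper's method.
    """
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    for _ in range(n_iters):
        # Normalize each row so it sums to 1.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        # Normalize each column so it sums to 1.
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m


# Hypothetical item-by-position affinity scores (made up for illustration).
scores = [[3.0, 1.0, 0.5], [1.0, 2.0, 1.0], [0.5, 1.0, 4.0]]
soft_ranking = sinkhorn(scores)
```

Each row of `soft_ranking` can be read as a probability distribution over positions for one item. A method such as the one in the abstract would call a routine like this inside every gradient step, which is why its speed matters.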

Translated: 2024-06-19 01:31:17 Published: 2024-06-11

# A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model ( http://arxiv.org/abs/2406.10263v1 )

License: see the linked page

Wenrui Zhang, Tiehang Fu, Ting Yuan, Ge Zhang, Dong Chen, Jie Wang

Recent advancements in Retrieval-Augmented Generation (RAG) have significantly enhanced code completion at the repository level. Various RAG-based code completion systems have been proposed based on different design choices, for instance gaining more effectiveness at the cost of repeating the retrieval-generation process multiple times. However, the indiscriminate use of retrieval in current methods reveals issues in both efficiency and effectiveness, as a considerable portion of retrievals are unnecessary and may introduce unhelpful or even harmful suggestions to code language models. To address these challenges, we introduce CARD, a lightweight critique method designed to provide insights into the necessity of retrievals and to select the optimal answer from multiple predictions. CARD can seamlessly integrate into any RAG-based code completion system. Our evaluation shows that CARD saves 21% to 46% of retrievals for line completion, 14% to 40% for API completion, and 6% to 46.5% for function completion, while improving accuracy. CARD reduces latency by 16% to 83%. CARD is generalizable to different LMs, retrievers, and programming languages. It is lightweight, training in a few seconds and running inference in a few milliseconds.

Translated: 2024-06-19 01:31:17 Published: 2024-06-11

# Large Language Model-empowered multimodal strain sensory system for shape recognition, monitoring, and human interaction of tensegrity ( http://arxiv.org/abs/2406.10264v1 )

License: see the linked page

Zebing Mao, Ryota Kobayashi, Hiroyuki Nabae, Koichi Suzumori

A tensegrity-based system is a promising approach for dynamic exploration of uneven and unpredictable environments, particularly in space exploration. However, implementing such systems presents challenges in terms of intelligent aspects: state recognition, wireless monitoring, human interaction, and smart analysis and advising functions. Here, we introduce a 6-strut tensegrity integrated with 24 multimodal strain sensors, leveraging both a deep learning model and large language models to realize a smart tensegrity. Using conductive flexible tendons assisted by a long short-term memory model, the tensegrity achieves self-shape reconstruction without external sensors. By integrating a Flask server and the gpt-3.5-turbo model, the tensegrity autonomously sends data to an iPhone for wireless monitoring and provides data analysis, explanation, prediction, and suggestions to humans for decision making. Finally, the human interaction system of the tensegrity helps humans obtain necessary information about the tensegrity through human language. Overall, this intelligent tensegrity-based system with self-sensing tendons showcases potential for future exploration, making it a versatile tool for real-world applications.

Translated: 2024-06-19 01:31:17 Published: 2024-06-11

# Improving Language Models for Emotion Analysis: Insights from Cognitive Science ( http://arxiv.org/abs/2406.10265v1 )

License: see the linked page

Constant Bonard, Gustave Cortal

We propose leveraging cognitive science research on emotions and communication to improve language models for emotion analysis. First, we present the main emotion theories in psychology and cognitive science. Then, we introduce the main methods of emotion annotation in natural language processing and their connections to psychological theories. We also present the two main types of analyses of emotional communication in cognitive pragmatics. Finally, based on the cognitive science research presented, we propose directions for improving language models for emotion analysis. We suggest that these research efforts pave the way for constructing new annotation schemes and a possible benchmark for emotional understanding, considering different facets of human emotion and communication.

Translated: 2024-06-19 01:31:17 Published: 2024-06-11

# COVID-19 Twitter Sentiment Classification Using Hybrid Deep Learning Model Based on Grid Search Methodology ( http://arxiv.org/abs/2406.10266v1 )

License: see the linked page

Jitendra Tembhurne, Anant Agrawal, Kirtan Lakhotia

In the contemporary era, social media platforms amass an extensive volume of social data contributed by their users. To promptly grasp the opinions and emotional inclinations of individuals regarding a product or event, it becomes imperative to perform sentiment analysis on user-generated content. Microblog comments often encompass both lengthy and concise text entries, presenting a complex scenario. This complexity is particularly pronounced in extensive textual content, which has richer content and more intricate word interrelations than shorter text entries. Sentiment analysis of public opinion shared on social networking websites such as Facebook or Twitter has evolved and found diverse applications. However, several challenges remain to be tackled in this field. Hybrid methodologies have emerged as promising models for mitigating sentiment analysis errors, particularly when dealing with progressively intricate training data. In this article, to investigate COVID-19 vaccination hesitancy, we propose eight different hybrid deep learning models for sentiment classification with the aim of improving the overall accuracy of the model. Sentiment prediction is achieved using an embedding, a deep learning model, and a grid search (GS) algorithm on a Twitter COVID-19 dataset. According to the study, public sentiment towards COVID-19 immunization appears to be improving with time, as evidenced by the gradual decline in vaccine reluctance. Through extensive evaluation, the proposed model reported an accuracy of 98.86%, outperforming other models. Specifically, the combination of BERT, CNN, and GS yields the highest accuracy, while the combination of GloVe, BiLSTM, CNN, and GS follows closely behind with an accuracy of 98.17%. In addition, the proposed model reports an increase in accuracy in the range of 2.11% to 14.46% in comparison with existing works.

Translated: 2024-06-19 01:31:17 Published: 2024-06-11

# Unused information in token probability distribution of generative LLM: improving LLM reading comprehension through calculation of expected values ( http://arxiv.org/abs/2406.10267v1 )

License: see the linked page

Krystian Zawistowski

LLM text decoding is a key component of perceived LLM quality. We present two experiments showing that decoding methods could be improved by manipulating token probabilities. First, we test a few LLMs on the SummEval summary scoring dataset to measure reading comprehension. We compare scores from greedy decoding to expected values over the next-token distribution. We scale logits by a large temperature to increase the entropy of scores. This yields a strong improvement in performance on SummEval (in terms of correlation with human judgement). We see an improvement from 6-8% to 13-28% for 7B Mistral and from 20%-46% to 37%-56% for Mixtral, beating the GPT-4 0314 result on two metrics. Part of the gain seems related to positional bias. Second, we use a probability-based tree sampling algorithm to examine all of the most probable generations for a given prompt.
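
The contrast between greedy decoding and an expected value over the next-token distribution can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's code: the helper names, the 1-to-5 score scale, and the `logits` values are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a larger temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_score(scores, logits):
    """Return the score whose token has the highest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return scores[best]


def expected_score(scores, logits, temperature=1.0):
    """Return the probability-weighted mean of the candidate scores."""
    probs = softmax(logits, temperature)
    return sum(s * p for s, p in zip(scores, probs))


# Hypothetical logits for the score tokens "1".."5" (made up for illustration).
candidate_scores = [1, 2, 3, 4, 5]
logits = [0.1, 0.5, 2.0, 1.9, 0.2]

greedy = greedy_score(candidate_scores, logits)
expected = expected_score(candidate_scores, logits, temperature=2.0)
```

Greedy decoding discards the near-tie between the tokens for 3 and 4, while the expected value keeps that information as a fractional score. That finer-grained signal is the kind of unused information the abstract refers to.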

Translated: 2024-06-19 01:31:17 Published: 2024-06-11

# Autograding Mathematical Induction Proofs with Natural Language Processing ( http://arxiv.org/abs/2406.10268v1 )

License: see the linked page

Chenyan Zhao, Mariana Silva, Seth Poulsen

In mathematical proof education, there remains a need for interventions that help students learn to write mathematical proofs. Research has shown that timely feedback can be very helpful to students learning new skills. While for many years natural language processing models have struggled to perform well on tasks related to mathematical texts, recent developments in natural language processing have created the opportunity to give students instant feedback on their mathematical proofs. In this paper, we present a set of training methods and models capable of autograding freeform mathematical proofs by leveraging existing large language models and other machine learning techniques. The models are trained using proof data collected from four different proof-by-induction problems. We use four different robust large language models to compare their performance, and all achieve satisfactory performance to various degrees. Additionally, we recruit human graders to grade the same proofs as the training data, and find that the best grading model is also more accurate than most human graders. With the development of these grading models, we create and deploy an autograder for proof-by-induction problems and perform a user study with students. Results from the study show that students are able to make significant improvements to their proofs using the feedback from the autograder, but students still do not trust the AI autograders as much as they trust human graders. Future work can improve on the autograder feedback and figure out ways to help students trust AI autograders.

Translated: 2024-06-19 01:31:17 Published: 2024-06-11

# Markov Constraint as Large Language Model Surrogate ( http://arxiv.org/abs/2406.10269v1 )

License: see the linked page

Alexandre Bonlarron, Jean-Charles Régin

This paper presents NgramMarkov, a variant of the Markov constraints. It is dedicated to text generation in constraint programming (CP). It involves a set of n-grams (i.e., sequences of n words) associated with probabilities given by a large language model (LLM). It limits the product of the probabilities of the n-grams of a sentence. The propagator of this constraint can be seen as an extension of the ElementaryMarkov constraint propagator, incorporating the LLM distribution instead of the maximum likelihood estimation of n-grams. It uses a gliding threshold, i.e., it rejects n-grams whose local probabilities are too low, to guarantee balanced solutions. It can also be combined with a "look-ahead" approach to remove n-grams that are very unlikely to lead to acceptable sentences for a fixed-length horizon. This idea is based on the MDDMarkovProcess constraint propagator, but without explicitly using an MDD (Multi-Valued Decision Diagram). The experimental results show that the generated text is valued in a similar way to the LLM perplexity function. Using this new constraint dramatically reduces the number of candidate sentences produced, improves computation times, and allows larger corpora or smaller n-grams to be used. A real-world problem has been solved for the first time using 4-grams instead of 5-grams.
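
The two conditions described above, a bound on the product of n-gram probabilities and a local threshold on each n-gram, can be illustrated as a plain check on a finished sentence. This is only a sketch under assumptions: a real CP propagator prunes domains during search rather than testing complete sentences, the abstract does not state the direction of the bound (a lower bound is assumed here), and the names `accept`, `local_min`, `product_min` and the probability table are hypothetical.

```python
import math


def ngrams(words, n):
    """Return the consecutive n-grams of a word list as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def accept(words, probs, n, local_min, product_min):
    """Check a sentence against a local threshold and a bound on the product.

    Assumption: the product of n-gram probabilities must stay above
    product_min. Logs are summed to avoid underflow on long sentences.
    """
    log_product = 0.0
    for gram in ngrams(words, n):
        p = probs.get(gram, 0.0)
        if p <= 0.0 or p < local_min:
            return False  # local threshold: reject a too-unlikely n-gram
        log_product += math.log(p)
    return log_product >= math.log(product_min)


# Hypothetical LLM-derived 3-gram probabilities (made up for illustration).
probs = {
    ("the", "cat", "sat"): 0.5,
    ("cat", "sat", "down"): 0.4,
    ("cat", "sat", "up"): 0.01,
}

ok = accept(["the", "cat", "sat", "down"], probs, 3, 0.05, 0.1)
rejected_locally = accept(["the", "cat", "sat", "up"], probs, 3, 0.05, 0.0001)
rejected_globally = accept(["the", "cat", "sat", "down"], probs, 3, 0.05, 0.3)
```

The second sentence fails on a single weak n-gram even though its overall product would pass. That is the balancing effect the abstract attributes to the local threshold.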

Translated: 2024-06-19 01:21:32 Published: 2024-06-11

# A Conceptual Framework For Trie-Augmented Neural Networks (TANNS) ( http://arxiv.org/abs/2406.10270v1 )

License: see the linked page

Temitayo Adefemi

Trie-Augmented Neural Networks (TANNs) combine trie structures with neural networks, forming a hierarchical design that enhances decision-making transparency and efficiency in machine learning. This paper investigates the use of TANNs for text and document classification, applying Recurrent Neural Networks (RNNs) and Feedforward Neural Networks (FNNs). We evaluated TANNs on the 20 NewsGroup and SMS Spam Collection datasets, comparing their performance with traditional RNN and FNN networks with and without dropout regularization. The results show that TANNs achieve similar or slightly better performance in text classification. The primary advantage of TANNs is their structured decision-making process, which improves interpretability. We discuss implementation challenges and practical limitations. Future work will aim to refine the TANN architecture for more complex classification tasks.

Translated: 2024-06-19 01:21:32 Published: 2024-06-11

# Enhancing non-Perl bioinformatic applications with Perl: Building novel, component based applications using Object Orientation, PDL, Alien, FFI, Inline and OpenMP ( http://arxiv.org/abs/2406.10271v1 )

License: see the linked page

Christos Argyropoulos

Component-Based Software Engineering (CBSE) is a methodology that assembles pre-existing, re-usable software components into new applications, which is particularly relevant for fast-moving, data-intensive fields such as bioinformatics. While Perl was used extensively in this field until a decade ago, more recent applications opt for Bioconductor/R or Python. This trend represents a significant missed opportunity for the rapid generation of novel bioinformatic applications out of pre-existing components, since Perl offers a variety of abstractions that can facilitate composition. In this paper, we illustrate the utility of Perl for CBSE through a combination of Object Oriented frameworks, the Perl Data Language (PDL), and facilities for interfacing with non-Perl code through Foreign Function Interfaces and inlining of foreign source code. To do so, we enhance Polyester, an RNA sequencing simulator written in R, and edlib, a fast sequence similarity search library based on the edit distance. The first case study illustrates the near-effortless authoring of new, highly performant Perl modules for the simulation of random numbers using the GNU Scientific Library and PDL, and proposes Perl and Perl/C alternatives to the Python tool cutadapt that is used to "trim" polyA tails from biological sequences. For the edlib case, we leverage the power of metaclass programming to endow edlib with coarse, process-based parallelism through the Many Core Engine (MCE) module and fine-grained parallelism through OpenMP, a C/C++/Fortran Application Programming Interface for shared-memory multithreaded processing. These use cases provide proof-of-concept for the Bio::SeqAlignment framework, which can organize heterogeneous components in complex memory and command-line based workflows for the construction of novel bioinformatic tools to analyze data from long-read sequencing platforms, e.g. Nanopore.

Translated: 2024-06-19 01:21:32 Published: 2024-06-11

# Connected Speech-Based Cognitive Assessment in Chinese and English ( http://arxiv.org/abs/2406.10272v1 )

License: see the linked page

Saturnino Luz, Sofia De La Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu

We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age and sex using propensity score analysis to ensure balance and representativity in model training. The prediction tasks encompass mild cognitive impairment diagnosis and cognitive test score prediction. This framework was designed to encourage the development of approaches to speech-based cognitive assessment which generalise across languages. We illustrate it by presenting baseline prediction models that employ language-agnostic and comparable features for diagnosis and cognitive test score prediction. The models achieved an unweighted average recall of 59.2% in diagnosis and a root mean squared error of 2.89 in score prediction.
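
The two metrics reported above have standard definitions that are easy to state in code. The sketch below is illustrative only: the function names are chosen for this example, and the labels and scores are invented, not taken from the benchmark.

```python
import math
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Invented diagnosis labels and test scores, for illustration only.
labels_true = ["mci", "mci", "mci", "control"]
labels_pred = ["mci", "mci", "control", "control"]
uar = unweighted_average_recall(labels_true, labels_pred)

scores_true = [28, 24, 30]
scores_pred = [27, 26, 30]
error = rmse(scores_true, scores_pred)
```

Unweighted average recall is a sensible choice for diagnosis tasks because a classifier that always predicts the majority class scores only 1 divided by the number of classes, however imbalanced the data are.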

Translated: 2024-06-19 01:21:32 Published: 2024-06-11

# Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis ( http://arxiv.org/abs/2406.10273v1 )

License: see the linked page

Matteo Esposito, Francesco Palagiano, Valentina Lenarduzzi

Context. Risk analysis assesses potential risks in specific scenarios. Risk analysis principles are context-less; the same methodology can be applied to a risk connected to health and to information technology security. Risk analysis requires a vast knowledge of national and international regulations and standards and is time- and effort-intensive. A large language model (LLM) can summarize information in less time than a human and can be fine-tuned to specific tasks. Aim. Our empirical study aims to investigate the effectiveness of Retrieval-Augmented Generation (RAG) and fine-tuned LLMs in risk analysis. To our knowledge, no prior study has explored their capabilities in risk analysis. Method. We manually curated \totalscenarios unique scenarios leading to \totalsamples representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years. We compared the base GPT-3.5 and GPT-4 models with their Retrieval-Augmented Generation and fine-tuned counterparts. We employed two human experts as competitors of the models and three other human experts to review the analyses of the models and of the former human experts. The reviewers analyzed 5,000 scenario analyses. Results and Conclusions. Human experts (HEs) demonstrated higher accuracy, but LLMs are quicker and more actionable. Moreover, our findings show that RAG-assisted LLMs have the lowest hallucination rates, effectively uncovering hidden risks and complementing human expertise. Thus, the choice of model depends on specific needs, with fine-tuned models (FTMs) for accuracy, RAG for hidden risk discovery, and base models for comprehensiveness and actionability. Therefore, experts can leverage LLMs as an effective complementary companion in risk analysis within a condensed timeframe. They can also save costs by averting unnecessary expenses associated with implementing unwarranted countermeasures.

Translated: 2024-06-19 01:21:32 Published: 2024-06-11

# Using General Large Language Models to Classify Mathematical Documents ( http://arxiv.org/abs/2406.10274v1 )

License: see the linked page

Patrick D. F. Ion, Stephen M. Watt

In this article we report on an initial exploration to assess the viability of using the general large language models (LLMs), recently made public, to classify mathematical documents. Automated classification would be useful from the applied perspective of improving the navigation of the literature and for the more open-ended goal of identifying relations among mathematical results. The Mathematics Subject Classification (MSC 2020), from MathSciNet and zbMATH, is widely used, and there is a significant corpus of ground-truth material in the open literature. We have evaluated the classification of preprint articles from arXiv.org according to MSC 2020. The experiment used only the title and abstract, not the entire paper. Since this was early in the use of chatbots and the development of their APIs, we report here on what was carried out by hand. Of course, the automation of the process will have to follow if it is to be generally useful. We found that in about 60% of our sample the LLM produced a primary classification matching that already reported on arXiv. In about half of those instances, there were additional primary classifications that were not detected. In about 40% of our sample, the LLM suggested a different classification than what was provided. A detailed examination of these cases, however, showed that the LLM-suggested classifications were in most cases better than those provided.

Translated: 2024-06-19 01:21:32 Published: 2024-06-11

# ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets ( http://arxiv.org/abs/2406.10275v1 )

License: see the linked page

Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller

Foundation models have shown great promise in speech emotion recognition (SER) by leveraging their pre-trained representations to capture emotion patterns in speech signals. To further enhance SER performance across various languages and domains, we propose a novel twofold approach. First, we gather EmoSet++, a comprehensive multi-lingual, multi-cultural speech emotion corpus with 37 datasets, 150,907 samples, and a total duration of 119.5 hours. Second, we introduce ExHuBERT, an enhanced version of HuBERT achieved by backbone extension and fine-tuning on EmoSet++. We duplicate each encoder layer and its weights, then freeze the first duplicate, integrating an extra zero-initialized linear layer and skip connections to preserve functionality and ensure its adaptability for subsequent fine-tuning. Our evaluation on unseen datasets shows the efficacy of ExHuBERT, setting a new benchmark for various SER tasks. Model and details on EmoSet++: https://huggingface.co/amiriparian/ExHuBERT.

Translated: 2024-06-19 01:21:32 Published: 2024-06-11

# Advancing Roadway Sign Detection with YOLO Models and Transfer Learning ( http://arxiv.org/abs/2406.09437v1 )

License: see the linked page

Selvia Nafaa, Hafsa Essam, Karim Ashour, Doaa Emad, Rana Mohamed, Mohammed Elhenawy, Huthaifa I. Ashqar, Abdallah A. Hassan, Taqwa I. Alhadidi

Roadway sign detection and recognition is an essential element of Advanced Driving Assistant Systems (ADAS). Several artificial intelligence methods have been widely used for this task, among them YOLOv5 and YOLOv8. In this paper, we used modified YOLOv5 and YOLOv8 models to detect and classify different roadway signs under different illumination conditions. Experimental results indicated that for the YOLOv8 model, varying the number of epochs and the batch size yields consistent MAP50 scores, ranging from 94.6% to 97.1% on the testing set. The YOLOv5 model demonstrates competitive performance, with MAP50 scores ranging from 92.4% to 96.9%. These results suggest that both models perform well across different training setups, with YOLOv8 generally achieving slightly higher MAP50 scores, and they offer valuable insights for practitioners seeking reliable and adaptable solutions in object detection applications.

翻訳日:2024-06-17 17:54:01 公開日:2024-06-11

# テキストマイニング分析を用いたヨルダンにおける交通事故物語の探索

Exploring Traffic Crash Narratives in Jordan Using Text Mining Analytics ( http://arxiv.org/abs/2406.09438v1 )

ライセンス: Link先を確認

Shadi Jaradat, Taqwa I. Alhadidi, Huthaifa I. Ashqar, Ahmed Hossain, Mohammed Elhenawy,

(参考訳) 本研究は,テキストマイニング分析による交通安全政策の効果的な情報提供と強化を目的として,交通事故の物語を考察する。テキストマイニング技術は、物語の中の主要なテーマや傾向を解明するために使われ、交通事故の原因についてより深く理解することを目的としている。この研究は、2018-2022年の7,587件の記録をカバーしたヨルダンの5つの主要高速道路の事故データを収集した。事故データからパターンを学習するために,教師なし学習法を採用した。トピックモデリング、キーワード抽出、Word Co-Occurrence Networkといったテキストマイニング技術も、クラッシュパターンの共起を明らかにするために使用された。その結果,テキストマイニング分析は有望な手法であり,交通事故の多因子的特性を裏付けるものであることがわかった。すべての分析における繰り返しのテーマは、道路安全に対するバランスのとれたアプローチの必要性を強調し、積極的かつ反応性のある手段を融合させる。動物関連の出来事に関するドライバー教育と認識が重要視される。

This study explores traffic crash narratives in an attempt to inform and enhance effective traffic safety policies using text-mining analytics. Text mining techniques are employed to unravel key themes and trends within the narratives, aiming to provide a deeper understanding of the factors contributing to traffic crashes. This study collected crash data from five major freeways in Jordan that cover narratives of 7,587 records from 2018-2022. An unsupervised learning method was adopted to learn the pattern from crash data. Various text mining techniques, such as topic modeling, keyword extraction, and Word Co-Occurrence Network, were also used to reveal the co-occurrence of crash patterns. Results show that text mining analytics is a promising method and underscore the multifactorial nature of traffic crashes, including intertwining human decisions and vehicular conditions. The recurrent themes across all analyses highlight the need for a balanced approach to road safety, merging both proactive and reactive measures. Emphasis on driver education and awareness around animal-related incidents is paramount.
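
One of the techniques named above, the word co-occurrence network, can be sketched in a few lines. This is an illustrative outline only: the three narratives are invented placeholders rather than records from the Jordanian crash data, and whitespace tokenization stands in for whatever preprocessing the study actually used.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(narratives):
    """Count how many narratives contain each unordered pair of distinct words."""
    counts = Counter()
    for text in narratives:
        # Sorting the unique tokens makes every pair come out in one canonical order.
        tokens = sorted(set(text.lower().split()))
        counts.update(combinations(tokens, 2))
    return counts


# Hypothetical narratives, for illustration only.
narratives = [
    "driver lost control wet road",
    "animal crossing road driver braked",
    "driver speeding wet road",
]
edges = cooccurrence_counts(narratives)
```

Each key of `edges` is an edge of the network and its count is the edge weight; frequent pairs such as `("driver", "road")` would appear as the thickest links.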

翻訳日:2024-06-17 17:54:01 公開日:2024-06-11

# 論文へのコメント: 位置: 大規模トラベリングセールスマン問題の解決のためのポストホック検索に基づくニューラルアプローチの再考

Comment on paper: Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems ( http://arxiv.org/abs/2406.09441v1 )

ライセンス: Link先を確認

Yimeng Min,

(参考訳) 我々は,SoftDistの論文(Xia et al.)において,(1)異なるベースラインのすべてのステップを同じハードウェア環境で実行していないこと,(2)他のベースラインとの比較において一貫性のない時間測定を使用していること,の2つの主要な問題を指摘する。これらの問題は誤った結論に繋がる。すべてのステップが同じハードウェア環境で実行される場合、SoftDistの主要な主張はもはや支持されない。

We identify two major issues in the SoftDist paper (Xia et al.): (1) the failure to run all steps of different baselines on the same hardware environment, and (2) the use of inconsistent time measurements when comparing to other baselines. These issues lead to flawed conclusions. When all steps are executed in the same hardware environment, the primary claim made in SoftDist is no longer supported.

翻訳日:2024-06-17 17:54:01 公開日:2024-06-11

# Deep Contextualized Transformer を用いた質問分類

Question Classification with Deep Contextualized Transformer ( http://arxiv.org/abs/1910.10492v3 )

ライセンス: Link先を確認

Haozheng Luo, Ningwei Liu, Charles Feng,

(参考訳) 質問と回答に関する最新の作業は、Stanford Parse Treeを使用することだ。我々は,事前の作業に基づいて,Deep Contextualized Transformerを用いて質問・回答問題に対処する新しい手法を開発し,いくつかの異常表現を管理する。また、SQuADおよびSwDAデータセットの広範囲な評価を行い、産業ニーズのQA問題分類よりも大幅に改善されたことを示す。また,問題解の精度と効率性に対する異なるモデルの影響についても検討する。本手法はより高精度でQA問題の解法に有効であることを示す。

The latest work on Question and Answer problems uses the Stanford Parse Tree. We build on prior work and develop a new method that handles the Question and Answer problem with a Deep Contextualized Transformer in order to manage some aberrant expressions. We also conduct extensive evaluations on the SQuAD and SwDA datasets and show significant improvement in QA problem classification for industry needs. We also investigate the impact of different models on the accuracy and efficiency of the problem answers. The results show that our new method is more effective for solving QA problems, with higher accuracy.

翻訳日:2024-06-16 18:08:02 公開日:2024-06-11

# 単語埋め込みメソッドは安定しているか、それに気を配るべきか?

Are Word Embedding Methods Stable and Should We Care About It? ( http://arxiv.org/abs/2104.08433v2 )

ライセンス: Link先を確認

Angana Borah, Manash Pratim Barman, Amit Awekar,

(参考訳) 表現学習法は、複数の実行で与えられたデータの類似した表現を一貫して生成している場合、安定であると考えられる。 Word Embedding Methods (WEM) は、与えられたテキストデータ中の各単語に対して密度の高いベクトル表現を生成する表現学習のクラスである。本研究の中心となる考え方は,単語の類似性に基づく内在的評価を用いたWEMの安定性の測定である。我々は、Word2Vec、GloVe、fastTextの3つの人気のあるWEMを実験した。安定度測定には,これらのモデルのトレーニングに係わる5つのパラメータの影響について検討する。われわれは、ウィキペディア、ニュース、ソング歌詞、欧州議会の議事録の4つの実世界のデータセットを用いて実験を行う。また、WEM安定性が3つの下流タスク(クラスタリング、POSタグ付け、フェアネス評価)に与える影響を観察した。我々の実験は、3つのWEMの中で、fastTextが最も安定しており、GloVeとWord2Vecが続くことを示している。

A representation learning method is considered stable if it consistently generates similar representation of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate dense vector representation for each word in the given text data. The central idea of this paper is to explore the stability measurement of WEMs using intrinsic evaluation based on word similarity. We experiment with three popular WEMs: Word2Vec, GloVe, and fastText. For stability measurement, we investigate the effect of five parameters involved in training these models. We perform experiments using four real-world datasets from different domains: Wikipedia, News, Song lyrics, and European parliament proceedings. We also observe the effect of WEM stability on three downstream tasks: Clustering, POS tagging, and Fairness evaluation. Our experiments indicate that amongst the three WEMs, fastText is the most stable, followed by GloVe and Word2Vec.
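
The abstract does not give the exact stability formula, so the sketch below uses one plausible similarity-based measure as an assumption: the average Jaccard overlap of each word's top-k nearest neighbours across two training runs. The toy 2-D embeddings are invented; `run_b` is `run_a` rotated by 90 degrees, which changes every coordinate but leaves the neighbour structure intact.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_neighbors(word, emb, k):
    """The k words most similar to `word` within one embedding space."""
    others = [w for w in emb if w != word]
    others.sort(key=lambda w: cosine(emb[word], emb[w]), reverse=True)
    return set(others[:k])


def stability(emb_a, emb_b, k=1):
    """Mean Jaccard overlap of top-k neighbour sets over the shared vocabulary."""
    shared = [w for w in emb_a if w in emb_b]
    scores = []
    for w in shared:
        na = top_k_neighbors(w, emb_a, k)
        nb = top_k_neighbors(w, emb_b, k)
        union = na | nb
        scores.append(len(na & nb) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Toy embeddings from two hypothetical runs (run_b is run_a rotated by 90 degrees).
run_a = {"cat": (1.0, 0.0), "dog": (0.9, 0.1), "car": (0.0, 1.0), "bus": (0.1, 0.9)}
run_b = {"cat": (0.0, 1.0), "dog": (-0.1, 0.9), "car": (-1.0, 0.0), "bus": (-0.9, 0.1)}
```

Comparing neighbour sets rather than raw vectors is a deliberate choice: two runs can differ by an arbitrary rotation and still encode the same word similarities.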

翻訳日:2024-06-15 02:48:35 公開日:2024-06-11

# ランダムに相互作用するスピンモデルにおける創発的ユニバーサルクエンチダイナミクス

Emergent Universal Quench Dynamics in Randomly Interacting Spin Models ( http://arxiv.org/abs/2406.07625v1 )

ライセンス: Link先を確認

Yuchen Li, Tian-Gang Zhou, Ze Wu, Pai Peng, Shengyu Zhang, Riqiang Fu, Ren Zhang, Wei Zheng, Pengfei Zhang, Hui Zhai, Xinhua Peng, Jiangfeng Du,

(参考訳) 普遍性はしばしば、その微妙な複雑さと多様性にもかかわらず、量子多体系の低エネルギー平衡物理学に現れる。近年、量子多体系の遠方平衡力学の研究への関心が高まっている。このような力学は、通常、伝統的な低エネルギー理論の記述を超える非常に励起的な状態を含む。このような非平衡力学において普遍的挙動がもたらされるかどうかは、量子力学のフロンティアにおける中心的な問題である。本稿では、ランダムに相互作用するスピンのアンサンブルによって記述された固体NMR系におけるスピン脱分極過程を監視することにより、普遍力学の実験的観察を報告する。スピン脱分極は、高温における時間的スピン-スピン相関関数と関連付けられる。これらの相関関数が普遍関数形式に従うという驚くべき現象を発見した。この実験的な事実は、この普遍性につながるスピン脱分極力学における支配的な相互作用過程を特定するのに役立つ。我々の観測は、高温における非平衡力学においても普遍性の存在を示し、低エネルギー物理学において確立された普遍性を補完するものである。

Universality often emerges in low-energy equilibrium physics of quantum many-body systems, despite their microscopic complexity and variety. Recently, there has been a growing interest in studying far-from-equilibrium dynamics of quantum many-body systems. Such dynamics usually involves highly excited states beyond the traditional low-energy theory description. Whether universal behaviors can also emerge in such non-equilibrium dynamics is a central issue at the frontier of quantum dynamics. Here we report the experimental observation of universal dynamics by monitoring the spin depolarization process in a solid-state NMR system described by an ensemble of randomly interacting spins. The spin depolarization can be related to temporal spin-spin correlation functions at high temperatures. We discover a remarkable phenomenon that these correlation functions obey a universal functional form. This experimental fact helps us identify the dominant interacting processes in the spin depolarization dynamics that lead to this universality. Our observation demonstrates the existence of universality even in non-equilibrium dynamics at high temperatures, thereby complementing the well-established universality in low-energy physics.

翻訳日:2024-06-14 22:37:00 公開日:2024-06-11

# SAADを用いた自動車システムにおける異常検出の強化: 統計的集約異常検出

Enhanced Anomaly Detection in Automotive Systems Using SAAD: Statistical Aggregated Anomaly Detection ( http://arxiv.org/abs/2406.08516v1 )

ライセンス: Link先を確認

Dacian Goina, Eduard Hogea, George Maties,

(参考訳) 本稿では,SAADと呼ばれる新しい異常検出手法を提案する。 SAADアプローチは、高度な統計技術と機械学習を統合し、その有効性は、自動車領域内のハードウェア・イン・ザ・ループ(HIL)環境からの実センサデータを検証することによって実証される。 SAADの重要な革新は、ドロップアウト層によって強化されたFCN(Fully Connected Networks)と組み合わせることで、異常検出の精度と堅牢性を大幅に向上する能力である。総合的な実験的評価では、スタンドアロン統計手法は72.1%の精度を達成し、ディープラーニングモデルは71.5%の精度を達成している。対照的に、集約された手法は88.3%の精度、F1スコア0.921の精度を実現し、個々のモデルよりも優れている。これらの結果はSAADの有効性を浮き彫りにし、自動車システムを含む様々な分野への応用の可能性を示している。

This paper presents a novel anomaly detection methodology termed Statistical Aggregated Anomaly Detection (SAAD). The SAAD approach integrates advanced statistical techniques with machine learning, and its efficacy is demonstrated through validation on real sensor data from a Hardware-in-the-Loop (HIL) environment within the automotive domain. The key innovation of SAAD lies in its ability to significantly enhance the accuracy and robustness of anomaly detection when combined with Fully Connected Networks (FCNs) augmented by dropout layers. Comprehensive experimental evaluations indicate that the standalone statistical method achieves an accuracy of 72.1%, whereas the deep learning model alone attains an accuracy of 71.5%. In contrast, the aggregated method achieves a superior accuracy of 88.3% and an F1 score of 0.921, thereby outperforming the individual models. These results underscore the effectiveness of SAAD, demonstrating its potential for broad application in various domains, including automotive systems.

翻訳日:2024-06-14 22:37:00 公開日:2024-06-11

# アラビア語学習支援のための質問応答(QA)モデル

Question-Answering (QA) Model for a Personalized Learning Assistant for Arabic Language ( http://arxiv.org/abs/2406.08519v1 )

ライセンス: Link先を確認

Mohammad Sammoudi, Ahmad Habaybeh, Huthaifa I. Ashqar, Mohammed Elhenawy,

(参考訳) 本稿では,アラビア語用にカスタマイズされたBERTトランスフォーマーを用いたパーソナライズされた学習アシスタントのための質問応答モデルの作成,最適化,評価について述べる。このモデルは特にパレスチナのカリキュラムの科学教科書に微調整された。私たちのアプローチでは、理科教育の分野における質問に対する正しい回答を自動的に生成するためにBERTの素晴らしい能力を使用します。このモデルは、パレスチナのカリキュラムで11年生と12年生の生物学の本を用いて微調整することで、関連する情報を理解し、抽出する能力を向上させる。これにより、啓蒙応答の生成におけるモデルの有効性が向上する。 Exact Match(EM)とF1スコアは、モデルのパフォーマンスを評価するために使用され、結果は、EMスコアが20%、F1スコアが51%である。これらの結果は、このモデルがパレスチナの科学書の文脈で質問を理解し、反応することができることを示している。この結果は、アラビア語の学生の質問を学習し理解するためのBERTベースのQAモデルの可能性を示している。

This paper describes the creation, optimization, and assessment of a question-answering (QA) model for a personalized learning assistant that uses BERT transformers customized for the Arabic language. The model was fine-tuned specifically on science textbooks in the Palestinian curriculum. Our approach uses BERT's capabilities to automatically produce correct answers to questions in the field of science education. The model's ability to understand and extract pertinent information is improved by fine-tuning it on the 11th and 12th grade biology book in the Palestinian curriculum. This increases the model's efficacy in producing informative responses. Exact match (EM) and F1 score metrics are used to assess the model's performance; the results show an EM score of 20% and an F1 score of 51%. These findings show that the model can comprehend and respond to questions in the context of the Palestinian science book. The results demonstrate the potential of BERT-based QA models to support learning and to understand Arabic students' questions.
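
The gap between the EM and F1 figures is easier to read with the two metrics side by side. The sketch below follows the common extractive-QA convention of exact string match and token-overlap F1; it is an assumption that the paper scored answers this way, and the text normalization a real evaluation would apply (punctuation, Arabic diacritics and so on) is deliberately left out.

```python
from collections import Counter


def exact_match(prediction, reference):
    """1.0 when the whitespace-tokenized answers are identical, else 0.0."""
    return float(prediction.split() == reference.split())


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall between two answers."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A prediction such as `"the cell membrane"` against the reference `"cell membrane"` scores zero on exact match but 0.8 on token F1, which is how a system can post a low EM together with a much higher F1.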

翻訳日:2024-06-14 22:37:00 公開日:2024-06-11

# NLP技術を用いたアラビア語の科学実験のための自動質問生成

Automated Question Generation for Science Tests in Arabic Language Using NLP Techniques ( http://arxiv.org/abs/2406.08520v1 )

ライセンス: Link先を確認

Mohammad Tami, Huthaifa I. Ashqar, Mohammed Elhenawy,

(参考訳) 教育評価のための質問生成は、教育に応用される人工知能における成長分野である。これらの質問生成ツールは、インテリジェント・チュータリングシステムや対話型プラットフォームなど、教育技術分野において重要な役割を担っている。明確な答えを必要とする評価質問の自動生成は、通常、宣言文内の構文的および意味的な指示に依存し、質問に変換される。最近の研究は、アラビア語における評価教育問題の発生を探求している。報告された性能は、文解析の不正確さ、名前の認識の問題、ルールベースの質問変換に起因する誤りなど、固有の誤りによって悪影響を受けている。さらに、長大なアラビア語文の複雑さがこれらの課題に寄与している。本研究は,キーワードとキーフレーズ抽出,質問生成,その後のランク付けという3段階のプロセスに基づいて,アラビア語の革新的な質問生成システムを提案する。本研究の目的は,アラビア語における評価質問の自動生成に関わる課題に対処することである。提案手法と結果から,83.50%の精度,78.68%のリコール,80.95%のF1スコアが得られた。人的評価によりモデルの有効性が確認され、平均評価は84%となった。

Question generation for educational assessments is a growing field within artificial intelligence applied to education. These question-generation tools have significant importance in the educational technology domain, for example in intelligent tutoring systems and dialogue-based platforms. The automatic generation of assessment questions, which entail clear-cut answers, usually relies on syntactic and semantic cues within declarative sentences, which are then transformed into questions. Recent research has explored the generation of educational assessment questions in Arabic. The reported performance has been adversely affected by inherent errors, including sentence parsing inaccuracies, named entity recognition issues, and errors stemming from rule-based question transformation. Furthermore, the complexity of lengthy Arabic sentences has contributed to these challenges. This research presents an innovative Arabic question-generation system built upon a three-stage process: keyword and key phrase extraction, question generation, and subsequent ranking. The aim is to tackle the difficulties associated with automatically generating assessment questions in the Arabic language. The proposed approach shows a precision of 83.50%, a recall of 78.68%, and an F1 score of 80.95%, indicating the framework's high efficiency. Human evaluation further confirmed the model's efficiency, with an average rating of 84%.
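
As a quick consistency check on the figures above, the F1 score can be recomputed as the harmonic mean of the reported precision and recall. The abstract does not say how its F1 was aggregated, so treating it as a single harmonic mean of the two headline numbers is an assumption.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.8350 \times 0.7868}{0.8350 + 0.7868}
    = \frac{1.313956}{1.6218}
    \approx 0.8102
```

This gives roughly 81.0%, close to the reported 80.95%; a gap of this size could arise, for instance, if F1 was averaged per question or per document rather than computed from the pooled precision and recall.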

翻訳日:2024-06-14 22:37:00 公開日:2024-06-11

# 生存予測の改善に向けた汎扁平上皮癌における埋め込みベースのマルチモーダル学習

Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes ( http://arxiv.org/abs/2406.08521v1 )

ライセンス: Link先を確認

Asim Waqas, Aakash Tripathi, Paul Stewart, Mia Naeini, Ghulam Rasool,

(参考訳) がんクリニックは、遺伝子から臓器レベルまで、さまざまなスケールで疾患データをキャプチャする。現在のバイオインフォマティクス法は、このデータの不均一な性質、特に欠落したモダリティを扱うのに苦労している。本稿では,多モーダルな異種データセットから学習し,臨床結果の予測を改善するグラフニューラルネットワーク(GNN)フレームワークであるPARADIGMを提案する。 PARADIGMは、基礎モデルを使用してマルチ解像度データから埋め込みを生成し、それらを患者レベルの表現に集約し、それらを統一されたグラフに融合し、生存分析のようなタスクのパフォーマンスを向上させる。汎扁平上皮癌でGNNを訓練し,Moffitt Cancer Center肺SCCデータに対するアプローチを検証した。マルチモーダルGNNは、患者生存予測において他のモデルより優れている。さまざまなスケールにわたる個々のデータモダリティの収束は、より洞察に富んだ病気の見方を提供する。我々のソリューションは、患者の状況を包括的に理解することを目的としており、異種データ統合と最大データビューの収束の利点についての洞察を提供する。

Cancer clinics capture disease data at various scales, from genetic to organ level. Current bioinformatic methods struggle to handle the heterogeneous nature of this data, especially with missing modalities. We propose PARADIGM, a Graph Neural Network (GNN) framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction. PARADIGM generates embeddings from multi-resolution data using foundation models, aggregates them into patient-level representations, fuses them into a unified graph, and enhances performance for tasks like survival analysis. We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data. Multimodal GNN outperforms other models in patient survival prediction. Converging individual data modalities across varying scales provides a more insightful disease view. Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.

翻訳日:2024-06-14 22:37:00 公開日:2024-06-11

# ExioML:グローバルセクタサステナビリティにおける機械学習のためのエコエコノミクスデータセット

ExioML: Eco-economic dataset for Machine Learning in Global Sectoral Sustainability ( http://arxiv.org/abs/2406.09046v1 )

ライセンス: Link先を確認

Yanming Guo, Jin Ma,

(参考訳) 環境拡張多段階インプット・アウトプット分析は、経済活動の環境影響を評価するための生態経済学の主要な枠組みである。本稿では,持続可能性分析のための最初の機械学習ベンチマークデータセットであるExioMLを紹介する。セクターサステナビリティを評価し,データセットのユーザビリティを実証するために,温室効果ガスのレグレッションタスクを実施した。従来の浅層モデルと深層学習モデルを比較し,因子会計表を多用し,分類的・数値的特徴を取り入れた。この結果から,ExioMLはユーザビリティが高く,深層およびアンサンブルモデルによる平均二乗誤差の低減を可能にし,将来の機械学習研究のベースラインを確立した。 ExioMLを通じて、さまざまな機械学習アプリケーションをサポートする基盤データセットを構築し、気候変動対策と持続可能な投資決定を促進することを目指している。

The Environmental Extended Multi-Regional Input-Output analysis is the predominant framework in Ecological Economics for assessing the environmental impact of economic activities. This paper introduces ExioML, the first Machine Learning benchmark dataset designed for sustainability analysis, aimed at lowering barriers and fostering collaboration between Machine Learning and Ecological Economics research. A crucial greenhouse gas emission regression task was conducted to evaluate sectoral sustainability and demonstrate the usability of the dataset. We compared the performance of traditional shallow models with deep learning models, utilizing a diverse Factor Accounting table and incorporating various categorical and numerical features. Our findings reveal that ExioML, with its high usability, enables deep and ensemble models to achieve low mean square errors, establishing a baseline for future Machine Learning research. Through ExioML, we aim to build a foundational dataset supporting various Machine Learning applications and promote climate actions and sustainable investment decisions.

翻訳日:2024-06-14 18:05:18 公開日:2024-06-11

# フィルタされた2モードスクイーズ混合状態における絡み合い, スクイーズおよび非局所性

Entanglement, Squeezing and non-Locality in Filtered Two-Mode Squeezed Mixed States ( http://arxiv.org/abs/2406.09134v1 )

ライセンス: Link先を確認

Souvik Agasti,

(参考訳) 連続可変2モード圧縮混合状態のスペクトル成分間の絡み合いと非局所性について検討し,その限界を同定した。これらのスペクトル成分は、光学系でよく用いられるフィルタを用いて出力モードから選択される。絡み合いと非局所性は、フィルタが同一であるときにピークに達する。しかし、非同一性フィルタを適用しながら入力スキューズする度合いを増大させると、絡み合いと非局所性の両方が乱れ、ベル状のパターンが生まれる。さらに、絡み合いと非局所性のための正確な境界を提供する。さらに,2モードのハイブリッド二次体のスケズングを絡み合いの尺度として評価し,対数ネガティビティとどのように類似しているかを示した。このフィルタと組み合わせて、2モードの加圧熱光の集団は、最大で加圧されたハイブリッド二次構造の角度に影響を及ぼす。

We investigate the entanglement and non-locality between specific spectral components of continuous variable two-mode squeezed mixed states, identifying their limits. These spectral components are selected from output modes using filters commonly employed in optomechanical systems. Both entanglement and non-locality reach their peak when the filters are identical. However, increasing the degree of input squeezing while applying non-identical filters disrupts both entanglement and non-locality, leading to a bell-shaped pattern. Additionally, we provide precise boundaries for entanglement and non-locality. Furthermore, we also evaluate the squeezing of two-mode hybrid quadrature as a measure of entanglement, thereby demonstrating how it remains analogous to logarithmic negativity. Combined with the filter, the population of two-mode squeezed thermal light influences the angle of a maximally squeezed hybrid quadrature.

翻訳日:2024-06-14 17:34:25 公開日:2024-06-11

# 状態準備とユニタリ合成におけるTゲートと汚れた量子ビットのトレードオフ

Trading T gates for dirty qubits in state preparation and unitary synthesis ( http://arxiv.org/abs/1812.00954v2 )

ライセンス: Link先を確認

Guang Hao Low, Vadym Kliuchnikov, Luke Schaeffer,

(参考訳) 普遍的なフォールトトレラントゲートセット(例: Clifford+T)からの任意の量子状態とユニタリの効率的な合成は、量子計算における重要なサブルーチンである。大規模な量子アルゴリズムには、コヒーレントな量子情報を符号化しつつ計算の一部でアイドル状態にある量子ビットが多数存在するため、全体のゲート数、特に高価なTゲートの数を最小化できるのであれば、これらを利用すべきである。我々は、$N$個の古典的な数値のリストで指定された任意の$N$次元純量子状態を準備し、空間とTゲートの間のトレードオフを実現する量子アルゴリズムを提案する。我々のスキームは、$\mathcal{O}(\log{(N/\epsilon)})$個のクリーンな量子ビットと、調整可能な$\sim(\lambda\log{(\frac{\log{N}}{\epsilon})})$個の汚れた量子ビットを使って、Tゲートコストを$\mathcal{O}(\frac{N}{\lambda}+\lambda\log{\frac{N}{\epsilon}}\log{\frac{\log{N}}{\epsilon}})$に下げる。このトレードオフは、無条件のゲート数下界によって証明されるとおり対数因子を除いて最適であり、最良の場合、補助量子ビットを用いない従来のアプローチに対してTカウントの二次的な改善となる。状態準備への帰着によるユニタリ合成についても同様の主張を証明する。我々の構成の基礎となるのは、任意の古典的データに対する量子オラクルのT効率的な回路実装である。

Efficient synthesis of arbitrary quantum states and unitaries from a universal fault-tolerant gate-set e.g. Clifford+T is a key subroutine in quantum computation. As large quantum algorithms feature many qubits that encode coherent quantum information but remain idle for parts of the computation, these should be used if it minimizes overall gate counts, especially that of the expensive T-gates. We present a quantum algorithm for preparing any dimension-$N$ pure quantum state specified by a list of $N$ classical numbers, that realizes a trade-off between space and T-gates. Our scheme uses $\mathcal{O}(\log{(N/\epsilon)})$ clean qubits and a tunable number of $\sim(\lambda\log{(\frac{\log{N}}{\epsilon})})$ dirty qubits, to reduce the T-gate cost to $\mathcal{O}(\frac{N}{\lambda}+\lambda\log{\frac{N}{\epsilon}}\log{\frac{\log{N}}{\epsilon}})$. This trade-off is optimal up to logarithmic factors, proven through an unconditional gate counting lower bound, and is, in the best case, a quadratic improvement in T-count over prior ancillary-free approaches. We prove similar statements for unitary synthesis by reduction to state preparation. Underlying our constructions is a T-efficient circuit implementation of a quantum oracle for arbitrary classical data.
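
The "quadratic improvement" follows from balancing the two terms of the stated T-gate cost. The derivation below is a heuristic sketch that drops the logarithmic factors and constants, so it only recovers the scaling, not the precise bound.

```latex
C(\lambda) \sim \frac{N}{\lambda} + \lambda,
\qquad
\frac{dC}{d\lambda} = -\frac{N}{\lambda^{2}} + 1 = 0
\;\Longrightarrow\;
\lambda^{*} = \sqrt{N},
\qquad
C(\lambda^{*}) \sim 2\sqrt{N}
```

With $\lambda \sim 1$ the cost scales as $N$, whereas spending about $\sqrt{N}$ dirty qubits (up to logarithmic factors) brings the T-count down to about $\sqrt{N}$, consistent with the quadratic gap the abstract describes.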

翻訳日:2024-06-14 02:02:19 公開日:2024-06-11

# コンベックスゲームにおける平衡予測学習のための演算子分割

Operator Splitting for Learning to Predict Equilibria in Convex Games ( http://arxiv.org/abs/2106.00906v4 )

ライセンス: Link先を確認

Daniel McKenzie, Howard Heaton, Qiuwei Li, Samy Wu Fung, Stanley Osher, Wotao Yin,

(参考訳) 競合するエージェントのシステムは、しばしばゲームとしてモデル化される。合理性を仮定すると、最も可能性の高い結果は平衡(例えばナッシュ平衡)によって与えられる。多くの実践的な環境では、ゲームは文脈、すなわちいかなるエージェントの制御以外の追加データ(例えば交通の天気や市場経済の財政政策)に影響を受けている。正確なゲーム力学は分かっていないが、(コンテキスト、平衡)ペアからなる膨大な歴史的データが利用可能であり、文脈のみに与えられる平衡を予測できる解法を学ぶ可能性を高める。平衡を自然に出力するニューラルネットワークのクラスであるNash Fixed Point Networks (N-FPNs)を紹介する。重要なことに、N-FPNは複雑なエージェントアクションセットを扱うために、高価なプロジェクションを避けながら制約デカップリング方式を採用している。経験的に、N-FPNは暗黙のネットワークをトレーニングするための最近開発されたヤコビアンフリーバックプロパゲーション技術と互換性があり、従来のモデルよりもはるかに高速で訓練が容易である。実験の結果,N-FPNは既存の学習ゲーム解法よりも桁違いに大きい問題にスケール可能であることがわかった。

Systems of competing agents can often be modeled as games. Assuming rationality, the most likely outcomes are given by an equilibrium (e.g. a Nash equilibrium). In many practical settings, games are influenced by context, i.e. additional data beyond the control of any agent (e.g. weather for traffic and fiscal policy for market economies). Often the exact game mechanics are unknown, yet vast amounts of historical data consisting of (context, equilibrium) pairs are available, raising the possibility of learning a solver which predicts the equilibria given only the context. We introduce Nash Fixed Point Networks (N-FPNs), a class of neural networks that naturally output equilibria. Crucially, N-FPNs employ a constraint decoupling scheme to handle complicated agent action sets while avoiding expensive projections. Empirically, we find N-FPNs are compatible with the recently developed Jacobian-Free Backpropagation technique for training implicit networks, making them significantly faster and easier to train than prior models. Our experiments show N-FPNs are capable of scaling to problems orders of magnitude larger than existing learned game solvers.
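
The idea of an equilibrium as the fixed point of an operator that depends on context can be shown on a toy problem. The sketch below is not an N-FPN: the operator is hand-written rather than learned, and it uses a plain projection onto a box, which is exactly the kind of step the paper's constraint decoupling is meant to avoid for complicated action sets. The two-player quadratic game and all names are assumptions chosen for illustration.

```python
def clip01(v):
    """Projection of a scalar onto the action interval [0, 1]."""
    return min(1.0, max(0.0, v))


def game_operator(x, context):
    """Each player's cost gradient with respect to its own action.

    Player i minimizes 0.5 * x_i**2 + 0.5 * x_i * x_j - c_i * x_i over [0, 1].
    """
    x1, x2 = x
    c1, c2 = context
    return (x1 + 0.5 * x2 - c1, x2 + 0.5 * x1 - c2)


def solve_equilibrium(context, step=0.5, iters=200):
    """Iterate x <- P(x - step * F(x; context)); the fixed point is the Nash equilibrium."""
    x = (0.0, 0.0)
    for _ in range(iters):
        g = game_operator(x, context)
        x = (clip01(x[0] - step * g[0]), clip01(x[1] - step * g[1]))
    return x
```

For the context `(0.9, 0.6)` the iteration settles at an interior equilibrium near `(0.8, 0.2)`; for `(2.0, 0.6)` the first player's action is pinned at the upper bound and the equilibrium is near `(1.0, 0.1)`. A learned solver would replace `game_operator` with a network trained on (context, equilibrium) pairs.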

翻訳日:2024-06-14 02:02:19 公開日:2024-06-11

# ダイヤモンドの量子欠陥を用いた集積回路活動の三次元イメージング

Three-dimensional imaging of integrated-circuit activity using quantum defects in diamond ( http://arxiv.org/abs/2112.12242v2 )

ライセンス: Link先を確認

Marwa Garsi, Rainer Stöhr, Andrej Denisenko, Farida Shagieva, Nils Trautmann, Ulrich Vogl, Badou Sene, Florian Kaiser, Andrea Zappe, Rolf Reuter, Jörg Wrachtrup,

(参考訳) 半導体ベースの技術のミクロンおよびサブミクロン領域への継続的なスケーリングにより、デバイス密度は高く、消費電力は低くなった。このようなスケールでは、自己発熱や電流リークなどの多くの物理現象が顕著になり、これらの特徴を明らかにするために電流密度をマッピングすることは、現代のエレクトロニクスの発展にとって決定的に重要である。しかし、既存の高度な非侵襲技術は、感度が低いか空間分解能が低く、2次元の空間マッピングに限られている。ここでは、ダイヤモンド中の表面近傍の窒素空孔中心を用いて、開発前段階の多層集積回路内を流れる電流が生成するエルステッド磁場を測定する。本研究では,電流密度の3次元成分を,約$10 \,\rm \mu A / \mu m^2$までの大きさで,室温でサブミクロンの空間分解能で再構成できることを示す。また、異なる層内の電流の位置特定を報告し、電子チップ内の異常な電流の流れを観察する。したがって本手法は,技術的に重要なナノスケール電子チップの3次元電流マッピングに向けた決定的な一歩となる。

The continuous scaling of semiconductor-based technologies to micron and sub-micron regimes has resulted in higher device density and lower power dissipation. Many physical phenomena such as self-heating or current leakage become significant at such scales, and mapping current densities to reveal these features is decisive for the development of modern electronics. However, advanced non-invasive technologies offer either low sensitivity or poor spatial resolution and are limited to two-dimensional spatial mapping. Here we use near-surface nitrogen-vacancy centres in diamond to probe Oersted fields created by current flowing within a multi-layered integrated circuit in pre-development. We show the reconstruction of the three-dimensional components of the current density with a magnitude down to about $10 \,\rm \mu A / \mu m^2$ and sub-micron spatial resolution at room temperature. We also report the localisation of currents in different layers and observe anomalous current flow in an electronic chip. Our method therefore provides a decisive step toward three-dimensional current mapping in technologically relevant nanoscale electronics chips.

翻訳日:2024-06-14 02:02:19 公開日:2024-06-11

# 顔誤認識システム: 単純な重み操作でDNNに特定の人物だけを誤認識させる

Facial Misrecognition Systems: Simple Weight Manipulations Force DNNs to Err Only on Specific Persons ( http://arxiv.org/abs/2301.03118v2 )

ライセンス: Link先を確認

Irad Zehavi, Roee Nitzan, Adi Shamir,

(参考訳) 本稿では,ディープシャムニューラルネットワークという一般的なアーキテクチャに基づくあらゆる顔認識モデルに,新しい種類のバックドアを植える方法について述べる。これらのバックドアは、攻撃者が事前に選択した特定の人物の自然な画像に対してのみ、その外観を制御したりトリガーを挿入したりすることなく、システムに誤りを起こさせる。例えば、そのようなバックドア付きシステムが、特定の人物の任意の2枚の画像を別人と分類したり、特定の2人組の任意の2枚の画像を同一人物と分類したりでき、他の人物に対する判定の正しさにはほとんど影響しないことを示す。驚くべきことに、どちらの種類のバックドアも、バックドア対象人物の画像のみを用いて、追加のトレーニングや最適化なしに、モデルの最後の重み行列に線形変換を適用することで実装できることを示す。我々の攻撃の特徴は、互いの存在を知らない複数の攻撃者が、ほとんど干渉なく同一モデルに複数のバックドアを独立して設置できることである。我々は,SOTA顔認識システムに対する攻撃を実験的に検証した。 10人の有名人を個別に匿名化しようとしたところ、ネットワークは$97.02\%$から$98.31\%$の割合で、その2枚の画像が同一人物であると認識できなかった。例えば、外見が非常に異なるモーガン・フリーマンとスカーレット・ヨハンソンを混同させようとしたとき、彼らの画像は$98.47\%$の割合で同一人物であると判定された。各種類のバックドアについて、互いの性能への影響を最小限に抑えながら複数のバックドアを順次設置した(例えば、同じモデルで有名人10人全員を匿名化しても、各有名人の成功率の低下は$1.01\%$以下であった)。すべての実験において、他の人物に対するネットワークの正常時精度はほとんど低下しなかった(ほとんどの場合、低下は$0.05\%$未満であった)。

In this paper, we describe how to plant novel types of backdoors in any facial recognition model based on the popular architecture of deep Siamese neural networks. These backdoors force the system to err only on natural images of specific persons who are preselected by the attacker, without controlling their appearance or inserting any triggers. For example, we show how such a backdoored system can classify any two images of a particular person as different people, or any two images of a particular pair of persons as the same person, with almost no effect on the correctness of its decisions for other persons. Surprisingly, we show that both types of backdoors can be implemented by applying linear transformations to the model's last weight matrix, with no additional training or optimization, using only images of the backdoor identities. A unique property of our attack is that multiple backdoors can be independently installed in the same model by multiple attackers, who may not be aware of each other's existence, with almost no interference. We have experimentally verified the attacks on a SOTA facial recognition system. When we tried to individually anonymize ten celebrities, the network failed to recognize two of their images as being the same person in $97.02\%$ to $98.31\%$ of the time. When we tried to confuse between the extremely different-looking Morgan Freeman and Scarlett Johansson, for example, their images were declared to be the same person in $98.47 \%$ of the time. For each type of backdoor, we sequentially installed multiple backdoors with minimal effect on the performance of each other (for example, anonymizing all ten celebrities on the same model reduced the success rate for each celebrity by no more than $1.01\%$). In all of our experiments, the benign accuracy of the network on other persons barely degraded (in most cases, it degraded by less than $0.05\%$).

翻訳日:2024-06-14 01:52:33 公開日:2024-06-11

# 指数型家族雑音を用いたグラフラプラシアン学習

Graph Laplacian Learning with Exponential Family Noise ( http://arxiv.org/abs/2306.08201v2 )

ライセンス: Link先を確認

Changhao Shi, Gal Mishne,

(参考訳) グラフ信号処理(GSP)は、非ユークリッド領域の信号を分析するための重要なフレームワークである。グラフフーリエ変換(GFT)は、組合せグラフラプラシア行列を用いて、グラフ周波数領域における信号のスペクトル分解を明らかにする。しかし、GSP法の適用における一般的な課題は、多くのシナリオにおいてシステムの基盤となるグラフが不明であることである。そのような場合の解決策は、一般にグラフまたはネットワーク推論と呼ばれる、利用可能なデータから観測されていないグラフを構築することである。異なるグラフ推論法が存在するが、これらは滑らかなグラフ信号または単純な加法的ガウスノイズから学ぶことに限定されている。離散数や二進数といった他のノイズの多いデータは、現実のアプリケーションではよく見られるが、グラフ推論では過小評価されている。本稿では,指数関数的ファミリーノイズによって劣化したグラフ信号から学習する汎用グラフ推論フレームワークを提案する。本フレームワークは,連続的なスムーズなグラフ信号から様々なデータタイプまで,従来の手法を一般化する。雑音信号からラプラシアングラフと保存されない滑らかな表現を共同で推定する交互アルゴリズムを提案する。また、我々のアプローチを変分形式に拡張し、潜在滑らかな表現の固有の確率性を考慮した。最後に、実世界のグラフ信号はしばしば非独立で時間的に相関しているので、元の設定を時間頂点の定式化に適応させる。ノイズモデルミスマッチに苦しむ競合するラプラシアン推定法より優れた合成および実世界のデータを示す。

Graph signal processing (GSP) is a prominent framework for analyzing signals on non-Euclidean domains. The graph Fourier transform (GFT) uses the combinatorial graph Laplacian matrix to reveal the spectral decomposition of signals in the graph frequency domain. However, a common challenge in applying GSP methods is that in many scenarios the underlying graph of a system is unknown. A solution in such cases is to construct the unobserved graph from available data, which is commonly referred to as graph or network inference. Although different graph inference methods exist, these are restricted to learning from either smooth graph signals or simple additive Gaussian noise. Other types of noisy data, such as discrete counts or binary digits, are rather common in real-world applications, yet are underexplored in graph inference. In this paper, we propose a versatile graph inference framework for learning from graph signals corrupted by exponential family noise. Our framework generalizes previous methods from continuous smooth graph signals to various data types. We propose an alternating algorithm that jointly estimates the graph Laplacian and the unobserved smooth representation from the noisy signals. We also extend our approach to a variational form to account for the inherent stochasticity of the latent smooth representation. Finally, since real-world graph signals are frequently non-independent and temporally correlated, we further adapt our original setting to a time-vertex formulation. We demonstrate on synthetic and real-world data that our new algorithms outperform competing Laplacian estimation methods that suffer from noise model mismatch.
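
The combinatorial graph Laplacian mentioned above is $L = D - W$, and the quadratic form $x^\top L x$ measures how smooth a signal is on the graph. The following is a minimal pure-Python sketch of those two standard definitions on a three-node path graph; it illustrates the background concept only and is not the paper's inference algorithm.

```python
def laplacian(weights):
    """Combinatorial Laplacian L = D - W for a symmetric weight matrix with zero diagonal."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness(lap, signal):
    """Quadratic form x^T L x, equal to the sum over edges of w_ij * (x_i - x_j)**2."""
    n = len(signal)
    return sum(signal[i] * lap[i][j] * signal[j] for i in range(n) for j in range(n))


# Path graph 0 - 1 - 2 with unit edge weights.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = laplacian(W)
```

A constant signal gives a smoothness of zero, while an alternating signal such as `[1, -1, 1]` gives 8 on this graph. Graph inference runs the logic in reverse: it looks for the Laplacian under which the observed signals are smooth.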

翻訳日:2024-06-14 01:42:49 公開日:2024-06-11

# AViT:小さな皮膚病変セグメンテーションデータセットに対する視覚変換器の適応

AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets ( http://arxiv.org/abs/2307.13897v2 )

ライセンス: Link先を確認

Siyi Du, Nourhan Bayasi, Ghassan Hamarneh, Rafeef Garbi,

(参考訳) 皮膚病変セグメンテーション(SLS)は皮膚病変解析において重要な役割を担っている。視覚トランスフォーマー(ViT)はSLSにとって有望なソリューションと考えられているが、パラメータ数の多い構造と一部の帰納バイアスの欠如により、畳み込みニューラルネットワーク(CNN)と比較して、より多くのトレーニングデータを必要とする。この問題を軽減するため、現在のアプローチでは事前学習済みのViTバックボーンをSLSデータセット上で微調整し、より大規模な自然画像から学んだ知識を活用して、必要な皮膚トレーニングデータの量を減らすことを目指している。しかし、大きなバックボーンの全てのパラメータを完全に微調整することは、計算コストが高く、メモリ集約的である。本稿では,任意の事前学習済みViTをSLSタスクに転移することで,ViTのデータ不足への弱さを緩和する,新しい効率的な戦略であるAViTを提案する。具体的には、トランスフォーマー層に軽量モジュール(アダプタ)を統合し、事前学習済みの重みを更新することなく、ViTの特徴表現を変調する。さらに,浅いCNNをプロンプト生成器として用いて入力画像からプロンプト埋め込みを生成し,細粒度情報とCNNの帰納バイアスを捉えて,小さなデータセット上のセグメンテーションタスクを導く。 4つの皮膚病変データセットに関する定量的実験により、AViTは、トレーニング可能なパラメータが大幅に少ないにもかかわらず、SOTAに匹敵し、時にはそれを上回る性能を達成することが示された。私たちのコードは https://github.com/siyi-wind/AViT で利用可能です。

Skin lesion segmentation (SLS) plays an important role in skin lesion analysis. Vision transformers (ViTs) are considered an auspicious solution for SLS, but they require more training data compared to convolutional neural networks (CNNs) due to their inherent parameter-heavy structure and lack of some inductive biases. To alleviate this issue, current approaches fine-tune pre-trained ViT backbones on SLS datasets, aiming to leverage the knowledge learned from a larger set of natural images to lower the amount of skin training data needed. However, fully fine-tuning all parameters of large backbones is computationally expensive and memory intensive. In this paper, we propose AViT, a novel efficient strategy to mitigate ViTs' data-hunger by transferring any pre-trained ViTs to the SLS task. Specifically, we integrate lightweight modules (adapters) within the transformer layers, which modulate the feature representation of a ViT without updating its pre-trained weights. In addition, we employ a shallow CNN as a prompt generator to create a prompt embedding from the input image, which grasps fine-grained information and CNN's inductive biases to guide the segmentation task on small datasets. Our quantitative experiments on 4 skin lesion datasets demonstrate that AViT achieves competitive, and at times superior, performance to SOTA but with significantly fewer trainable parameters. Our code is available at https://github.com/siyi-wind/AViT.

翻訳日:2024-06-13 23:42:48 公開日:2024-06-11

# ニューラルソースコード要約のための意味的類似性損失

Semantic Similarity Loss for Neural Source Code Summarization ( http://arxiv.org/abs/2308.07429v2 )

ライセンス: Link先を確認

Chia-Yi Su, Collin McMillan,

(参考訳) 本稿では,ニューラルソースコード要約における損失関数として意味的類似度指標を用いる手法とその評価について述べる。コード要約は、ソースコードの自然言語記述を書くタスクである。ニューラルコード要約(Neural code summarization)とは、ニューラルネットワークを用いてこれらの記述を生成する自動化技術である。現在のアプローチのほとんどすべてが、ニューラルネットワークをスタンドアロンモデルとして、あるいは事前学習済みの大規模言語モデル(例: GPT, Codex, LLaMA)の一部として含んでいる。しかし、そのほとんどすべてが、ネットワーク最適化に分類的クロスエントロピー(CCE)損失関数を使用する。 CCEの2つの問題は 1)文全体を評価するのではなく、各単語の予測ごとに1つずつ損失を計算すること、 2) 完全な予測が必要であり、同義語に対する部分点の余地がないことである。本稿では,意味的類似度指標に関する我々の先行研究を拡張し,この問題を軽減するために意味的類似度を損失関数として用いる手法を示し,この手法をメトリクス駆動型と人手による両方の研究においていくつかの設定で評価する。本質的には,各単語の損失だけでなく,学習バッチごとの出力文予測全体の損失を計算するために,意味的類似度指標を用いることを提案する。また,我々の損失を各単語に対するCCEと組み合わせることで,ベースラインと比較してトレーニングプロセスを合理化することを提案する。我々は,いくつかのベースラインに対してアプローチを評価し,大多数の条件で改善を報告する。

This paper presents a procedure for and evaluation of using a semantic similarity metric as a loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural networks either as standalone models or as part of a pretrained large language model (e.g., GPT, Codex, LLaMA). Yet almost all also use a categorical cross-entropy (CCE) loss function for network optimization. Two problems with CCE are that 1) it computes loss over each word prediction one-at-a-time, rather than evaluating a whole sentence, and 2) it requires a perfect prediction, leaving no room for partial credit for synonyms. In this paper, we extend our previous work on semantic similarity metrics to show a procedure for using semantic similarity as a loss function to alleviate this problem, and we evaluate this procedure in several settings in both metrics-driven and human studies. In essence, we propose to use a semantic similarity metric to calculate loss over the whole output sentence prediction per training batch, rather than just loss for each word. We also propose to combine our loss with CCE for each word, which streamlines the training process compared to baselines. We evaluate our approach over several baselines and report improvement in the vast majority of conditions.
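The idea of adding a sentence-level similarity term to per-word CCE can be illustrated with a small numeric sketch. This is not the authors' loss: the toy word vectors, the averaging "sentence embedding", the `weight` parameter, and the additive combination are all assumptions made for illustration, and the argmax step makes this a value computation only, not a differentiable training objective.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def word_cce(pred_probs, target_ids):
    """Mean per-word categorical cross-entropy: -log p(target word)."""
    total = sum(-math.log(probs[t]) for probs, t in zip(pred_probs, target_ids))
    return total / len(target_ids)

def sentence_embedding(token_ids, word_vectors):
    """Hypothetical sentence embedding: the mean of toy word vectors."""
    dim = len(next(iter(word_vectors.values())))
    summed = [0.0] * dim
    for token in token_ids:
        for i, x in enumerate(word_vectors[token]):
            summed[i] += x
    return [x / len(token_ids) for x in summed]

def combined_loss(pred_probs, target_ids, word_vectors, weight=1.0):
    """Per-word CCE plus a sentence-level (1 - similarity) penalty (assumed form)."""
    predicted_ids = [max(range(len(p)), key=p.__getitem__) for p in pred_probs]
    similarity = cosine(
        sentence_embedding(predicted_ids, word_vectors),
        sentence_embedding(target_ids, word_vectors),
    )
    return word_cce(pred_probs, target_ids) + weight * (1.0 - similarity)

# Toy vocabulary: 0 = "returns", 1 = "gives" (near-synonym), 2 = "deletes".
vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
target = [0]
synonym_pred = [[0.2, 0.7, 0.1]]    # most probable word is the near-synonym
unrelated_pred = [[0.2, 0.1, 0.7]]  # most probable word is unrelated

print(combined_loss(synonym_pred, target, vectors))
print(combined_loss(unrelated_pred, target, vectors))
```

Both predictions assign the same probability to the reference word, so their CCE is identical; only the similarity term separates them, which is the "partial credit for synonyms" that plain CCE cannot give.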

翻訳日:2024-06-13 23:42:48 公開日:2024-06-11

# 完全遺伝性原子性OML

Completely hereditarily atomic OMLs ( http://arxiv.org/abs/2308.08508v2 )

ライセンス: Link先を確認

John Harding, Andre Kornell,

(参考訳) 無限の高さを持つ既約な完備原子的OMLは、代数的であることと被覆性を持つことを両立できない。しかし、カルムバッハ(Kalmbach)の構成は代数的で2-被覆性を持つそのようなOMLの例を与え、ケラー(Keller)の構成は被覆性を持ち完全遺伝的に原子的であるそのようなOMLの例を与える。完全遺伝的に原子的なOMLは、代数的OMLを量子述語論理に適した形で一般化する。

An irreducible complete atomic OML of infinite height cannot both be algebraic and have the covering property. However, Kalmbach's construction provides an example of such an OML that is algebraic and has the 2-covering property, and Keller's construction provides an example of such an OML that has the covering property and is completely hereditarily atomic. Completely hereditarily atomic OMLs generalize algebraic OMLs suitably to quantum predicate logic.

翻訳日:2024-06-13 23:42:48 公開日:2024-06-11

# トランスフォーマーは未知システムの最適フィルタリングを学習できるか?

Can Transformers Learn Optimal Filtering for Unknown Systems? ( http://arxiv.org/abs/2308.08536v3 )

ライセンス: Link先を確認

Haldun Balim, Zhe Du, Samet Oymak, Necmiye Ozay,

(参考訳) トランスフォーマーモデルは自然言語処理において大きな成功をおさめてきたが、力学系に対するそのポテンシャルはほとんど未解明のままである。本研究では,過去の全ての出力を用いて出力予測を生成するトランスフォーマーを用いた最適出力推定問題について検討する。特に,様々な異なるシステムを用いてトランスフォーマーを訓練し,未知のダイナミクスを持つ未見のシステムで性能を評価する。経験的に、訓練されたトランスフォーマーは異なる未見のシステムに非常によく適応し、線形系に対してカルマンフィルタが与える最適性能にさえ匹敵する。非i.i.d.ノイズ、時変ダイナミクス、未知のパラメータを持つクアッドロータ系のような非線形ダイナミクスといったより複雑な設定でも、トランスフォーマーは有望な結果を示す。実験結果を裏付けるため,トランスフォーマーが所望の過剰リスクを達成するのに必要なトレーニングデータの量を定量化する統計的保証を与える。最後に,性能低下につながる2つの問題のクラスを特定することでいくつかの限界を指摘し,制御と推定にトランスフォーマーを使用する場合の注意の必要性を強調する。

Transformer models have shown great success in natural language processing; however, their potential remains mostly unexplored for dynamical systems. In this work, we investigate the optimal output estimation problem using transformers, which generate output predictions using all the past ones. Particularly, we train the transformer using various distinct systems and then evaluate the performance on unseen systems with unknown dynamics. Empirically, the trained transformer adapts exceedingly well to different unseen systems and even matches the optimal performance given by the Kalman filter for linear systems. In more complex settings with non-i.i.d. noise, time-varying dynamics, and nonlinear dynamics like a quadrotor system with unknown parameters, transformers also demonstrate promising results. To support our experimental findings, we provide statistical guarantees that quantify the amount of training data required for the transformer to achieve a desired excess risk. Finally, we point out some limitations by identifying two classes of problems that lead to degraded performance, highlighting the need for caution when using transformers for control and estimation.
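For context, the Kalman filter baseline mentioned above can be sketched for a scalar linear system. This is a textbook one-step output predictor for a system with known parameters, not the paper's transformer or its experimental setup; the model x_{t+1} = a x_t + w_t, y_t = c x_t + v_t and the names `a`, `c`, `q`, `r`, `x0`, `p0` are assumptions chosen for illustration.

```python
def kalman_predict_outputs(ys, a, c, q, r, x0=0.0, p0=1.0):
    """Predict each output y_t from the outputs observed before time t.

    Assumed scalar model: x_{t+1} = a * x_t + w_t,  y_t = c * x_t + v_t,
    with process noise variance q and measurement noise variance r.
    """
    x_pred, p_pred = x0, p0  # prior state mean and variance
    predictions = []
    for y in ys:
        # Output prediction made before y is observed.
        predictions.append(c * x_pred)
        # Measurement update with the newly observed output.
        innovation_var = c * p_pred * c + r
        gain = p_pred * c / innovation_var
        x_filt = x_pred + gain * (y - c * x_pred)
        p_filt = (1.0 - gain * c) * p_pred
        # Time update to the next step.
        x_pred = a * x_filt
        p_pred = a * p_filt * a + q
    return predictions

# Nearly noise-free decaying system: after one observation the predictor
# tracks the outputs closely.
preds = kalman_predict_outputs([1.0, 0.5, 0.25], a=0.5, c=1.0, q=0.0, r=1e-9)
print(preds)
```

The filter needs `a`, `c`, `q`, and `r` up front; the setting studied in the abstract is the harder one where these are unknown and the predictor must adapt from past outputs alone.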

翻訳日:2024-06-13 23:42:48 公開日:2024-06-11

# 分類におけるスプリアス相関の測定--翻訳調(Translationese)における「クレバー・ハンス」

Measuring Spurious Correlation in Classification: 'Clever Hans' in Translationese ( http://arxiv.org/abs/2308.13170v2 )

ライセンス: Link先を確認

Angana Borah, Daria Pylypenko, Cristina Espana-Bonet, Josef van Genabith,

(参考訳) 近年の研究では、高性能なニューラル翻訳調(translationese)分類器における「クレバー・ハンス」的な挙動の証拠が示されており、BERTベースの分類器は、真の翻訳調の信号ではなく、データとターゲット分類ラベルの間のスプリアス相関、特にトピック情報を利用している。翻訳調の信号は微妙であり(特にプロの翻訳の場合)、ジャンル、スタイル、著者、そして特にトピックといった、データ中の他の多くの信号と競合する。このことは、特に微妙なターゲット信号や困難な(低リソースの)データ設定において、分類器の性能のうちどれだけが、分類器が実際にターゲットとする信号ではなく、データ中のスプリアス相関によるものなのかという一般的な疑問を提起する。我々はトピックに基づくスプリアス相関に注目し、この問いに2つの方向からアプローチする。 (i) スプリアスなトピック情報およびデータにおけるその分布に関する知識がない場合、 (ii) スプリアスなトピック相関の性質について何らかの手がかりがある場合である。 (i) については、データ中のスプリアスなトピック情報の指標として,教師なしトピックとターゲット分類ラベルとのアライメントを捉える尺度を第一原理から構築する。この尺度はクラスタリングにおける純度(purity)と同一であることを示し,分類のための「トピックフロア」(「ノイズフロア」にならったもの)を提案する。 (ii) については、既知のスプリアスなトピック担体を分類においてマスクすることを検討する。 (i) と (ii) はともにスプリアス相関の定量化に寄与し、(ii) はその緩和にも寄与する。

Recent work has shown evidence of 'Clever Hans' behavior in high-performance neural translationese classifiers, where BERT-based classifiers capitalize on spurious correlations, in particular topic information, between data and target classification labels, rather than genuine translationese signals. Translationese signals are subtle (especially for professional translation) and compete with many other signals in the data such as genre, style, author, and, in particular, topic. This raises the general question of how much of the performance of a classifier is really due to spurious correlations in the data versus the signals actually targeted by the classifier, especially for subtle target signals and in challenging (low resource) data settings. We focus on topic-based spurious correlation and approach the question from two directions: (i) where we have no knowledge about spurious topic information and its distribution in the data, (ii) where we have some indication about the nature of spurious topic correlations. For (i) we develop a measure from first principles capturing alignment of unsupervised topics with target classification labels as an indication of spurious topic information in the data. We show that our measure is the same as purity in clustering and propose a 'topic floor' (as in a 'noise floor') for classification. For (ii) we investigate masking of known spurious topic carriers in classification. Both (i) and (ii) contribute to quantifying and (ii) to mitigating spurious correlations.
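Since the abstract states that the proposed alignment measure coincides with clustering purity, a short sketch of purity may help. The standard definition is used here: each cluster (an unsupervised topic) is credited with its majority class label. The example data and the reading of the result as an accuracy reachable from topic alone are illustrative assumptions, not results from the paper.

```python
from collections import Counter

def purity(cluster_ids, labels):
    """Fraction of items whose label matches the majority label of their cluster."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    if not labels:
        raise ValueError("purity is undefined for an empty data set")
    label_counts = {}
    for cluster, label in zip(cluster_ids, labels):
        label_counts.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(labels)

# Hypothetical data: topic assignments from an unsupervised topic model and
# the target labels (original vs. translated text).
topics = [0, 0, 0, 1, 1, 1]
labels = ["original", "original", "translated",
          "translated", "translated", "original"]

print(purity(topics, labels))
```

In this toy case a predictor that only sees the topic and outputs the majority label per topic would be right on 4 of 6 items, which is one way to picture a floor that a classifier must clearly exceed before its accuracy can be credited to the target signal.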

翻訳日:2024-06-13 23:42:48 公開日:2024-06-11

# ニューラルネットワークにおける損失平坦性から圧縮表現への簡単な接続

A simple connection from loss flatness to compressed representations in neural networks ( http://arxiv.org/abs/2310.01770v3 )

ライセンス: Link先を確認

Shirui Chen, Stefano Recanatesi, Eric Shea-Brown,

(参考訳) ディープニューラルネットワークの汎化能力は様々な方法で研究されてきたが、その中には少なくとも2つの異なるカテゴリのアプローチがある。1つはパラメータ空間における損失ランドスケープの形状に基づくもの、もう1つは特徴空間(つまりユニット活動の空間)における表現多様体の構造に基づくものである。これら2つのアプローチは関連しているが、両者が明示的に一緒に研究されることはまれである。ここでは、このギャップを埋める分析を示す。ディープニューラルネットワークにおける学習の最終段階において、ニューラル表現の多様体の圧縮が、SGDが探索する極小点のまわりの損失の平坦さと相関することを示す。この相関は比較的単純な数学的関係によって予測される。すなわち、より平坦な損失は、ニューラル表現の圧縮指標に対するより低い上界に対応する。本研究は,MaとYingによる線形安定性の洞察に基づき,様々な圧縮指標とシャープネスを含む量との間の不等式を導出する。経験的に、導出した不等式は,複数の実験設定において表現圧縮と損失シャープネスの間に一貫して正の相関があることを予測する。全体として、パラメータ空間と特徴空間の両方におけるニューラルネットワークの汎化に関する双対的な視点を推し進める。

The generalization capacity of deep neural networks has been studied in a variety of ways, including at least two distinct categories of approaches: one based on the shape of the loss landscape in parameter space, and the other based on the structure of the representation manifold in feature space (that is, in the space of unit activities). Although these two approaches are related, they are rarely studied together explicitly. Here, we present an analysis that bridges this gap. We show that in the final phase of learning in deep neural networks, the compression of the manifold of neural representations correlates with the flatness of the loss around the minima explored by SGD. This correlation is predicted by a relatively simple mathematical relationship: a flatter loss corresponds to a lower upper bound on the compression metrics of neural representations. Our work builds upon the linear stability insight by Ma and Ying, deriving inequalities between various compression metrics and quantities involving sharpness. Empirically, our derived inequality predicts a consistently positive correlation between representation compression and loss sharpness in multiple experimental settings. Overall, we advance a dual perspective on generalization in neural networks in both parameter and feature space.

翻訳日:2024-06-13 23:33:02 公開日:2024-06-11

# インスタンスにはもっと注意が必要だ - LLMをループに組み込んでインスタンスごとにプロンプトを書き換えることでゼロショット性能が向上する

Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance ( http://arxiv.org/abs/2310.02107v4 )

ライセンス: Link先を確認

Saurabh Srivastava, Chengyue Huang, Weiguo Fan, Ziyu Yao,

(参考訳) 大規模言語モデル(LLM)はゼロショットタスクの性能に革命をもたらし、タスク固有のアノテーションの必要性を軽減するとともに、タスクの一般化性を高めている。その進歩にもかかわらず、"Let's think step by step" のようなトリガーフレーズを用いる現在の手法は依然として限定的である。本研究では、「ループ内のLLM」という革新的な方法に従って、個々のタスクインスタンスに対してゼロショットプロンプトを最適化する手法であるPRomPTedを紹介する。 GPT-4に基づく13のデータセットと10のタスクタイプにわたる包括的な評価により、PRomPTedは、素朴なゼロショットアプローチと、入力プロンプトの代わりにタスク出力を洗練する強力なベースライン(すなわち「Output Refinement」)の両方を有意に上回ることが明らかとなった。実験の結果, この利点が比較的弱い GPT-3.5 にも一般化することが確認された。さらに興味深いことに, GPT-3.5 を用いてより強力な GPT-4 のプロンプトを書き換えると, GPT-4 をプロンプトリライタとして使用する場合の効果に匹敵するだけでなく, 時にはそれを上回ることが判明した。本研究は, ゼロショットLLMの性能向上だけでなく, より弱いLLMによってLLMを監督できる可能性も示しており, これは最近大きな関心を集めている能力である。最後に, 追加実験により, Mistral 7B や Mixtral 8x7B などのオープンソース LLM にもこの利点が一般化することを確認した。

Large language models (LLMs) have revolutionized zero-shot task performance, mitigating the need for task-specific annotations while enhancing task generalizability. Despite its advancements, current methods using trigger phrases such as "Let's think step by step" remain limited. This study introduces PRomPTed, an approach that optimizes the zero-shot prompts for individual task instances following an innovative manner of "LLMs in the loop". Our comprehensive evaluation across 13 datasets and 10 task types based on GPT-4 reveals that PRomPTed significantly outperforms both the naive zero-shot approaches and a strong baseline (i.e., "Output Refinement") which refines the task output instead of the input prompt. Our experimental results also confirmed the generalization of this advantage to the relatively weaker GPT-3.5. Even more intriguingly, we found that leveraging GPT-3.5 to rewrite prompts for the stronger GPT-4 not only matches but occasionally exceeds the efficacy of using GPT-4 as the prompt rewriter. Our research thus presents a huge value in not only enhancing zero-shot LLM performance but also potentially enabling supervising LLMs with their weaker counterparts, a capability attracting much interest recently. Finally, our additional experiments confirm the generalization of the advantages to open-source LLMs such as Mistral 7B and Mixtral 8x7B.

翻訳日:2024-06-13 23:33:02 公開日:2024-06-11

# RIR-SF:マルチチャンネルマルチスピーカシナリオにおけるターゲット音声認識のための室内インパルス応答に基づく空間的特徴

RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios ( http://arxiv.org/abs/2311.00146v2 )

ライセンス: Link先を確認

Yiwen Shao, Shi-Xiong Zhang, Dong Yu,

(参考訳) 複数話者の録音における自動音声認識(ASR)は困難である。マルチチャンネルオーディオと視覚的手がかりから得られる3次元空間データを用いる現在の手法は、主にターゲット話者からの直接波に焦点を合わせ、反射波の影響を見落としているため、残響環境における性能が阻害される。本研究では、話者の位置、室内音響、反射のダイナミクスを活用する、室内インパルス応答(RIR)に基づく新しい空間的特徴であるRIR-SFを導入する。 RIR-SFは従来の3次元空間的特徴を有意に上回り、理論的にも経験的にも優れた性能を示す。また、RIR-SFのために最適化されたオールニューラルのマルチチャネルASRフレームワークを提案し、マルチチャネル設定におけるターゲット話者ASRにおいてCERの相対21.3%の削減を達成した。 RIR-SFは認識精度を高め、高残響シナリオでの頑健性を示し、従来の手法の限界を克服する。

Automatic speech recognition (ASR) on multi-talker recordings is challenging. Current methods using 3D spatial data from multi-channel audio and visual cues focus mainly on direct waves from the target speaker, overlooking reflection wave impacts, which hinders performance in reverberant environments. Our research introduces RIR-SF, a novel spatial feature based on room impulse response (RIR) that leverages the speaker's position, room acoustics, and reflection dynamics. RIR-SF significantly outperforms traditional 3D spatial features, showing superior theoretical and empirical performance. We also propose an optimized all-neural multi-channel ASR framework for RIR-SF, achieving a relative 21.3% reduction in CER for target speaker ASR in multi-channel settings. RIR-SF enhances recognition accuracy and demonstrates robustness in high-reverberation scenarios, overcoming the limitations of previous methods.

翻訳日:2024-06-13 23:33:02 公開日:2024-06-11

# 効率的なファインチューニングのための勾配型パラメータ選択法

Gradient-based Parameter Selection for Efficient Fine-Tuning ( http://arxiv.org/abs/2312.10136v3 )

ライセンス: Link先を確認

Zhi Zhang, Qizhe Zhang, Zijun Gao, Renrui Zhang, Ekaterina Shutova, Shiji Zhou, Shanghang Zhang,

(参考訳) 事前学習済みモデルのサイズが大きくなるにつれて、さまざまな下流タスクのためにすべてのパラメータを完全に微調整して保存することは、コストがかかり、実現不可能になる。本稿では, 新しいパラメータ効率的な微調整法である Gradient-based Parameter Selection (GPS) を提案し, 事前学習済みモデルから選択した少数のパラメータのみを調整し, モデルの残りを凍結したままにしておくことで, フルモデル微調整法と比較して同等以上の性能が得られることを示す。既存の一般的かつ最先端のパラメータ効率的な微調整手法と異なり,本手法は学習段階と推論段階のいずれにおいても追加のパラメータや計算コストを導入しない。もう1つの利点は、モデルに依存しない非破壊的な性質であり、特定のモデルに固有の他の設計を必要としない。完全な微調整と比較すると、GPSは24の画像分類タスクの平均で、事前学習済みモデルのパラメータの0.36%しか調整せずに、3.33%(91.78%対88.45%、FGVC)と9.61%(73.1%対65.57%、VTAB)の精度向上を達成する。また、医用画像セグメンテーションタスクにおいて、mDiceとmIoUでそれぞれ17%と16.8%の有意な改善を示す。さらに,既存のPEFT法と比較して,GPSは最先端の性能を達成している。

With the growing size of pre-trained models, full fine-tuning and storing all the parameters for various downstream tasks is costly and infeasible. In this paper, we propose a new parameter-efficient fine-tuning method, Gradient-based Parameter Selection (GPS), demonstrating that only tuning a few selected parameters from the pre-trained model while keeping the remainder of the model frozen can generate similar or better performance compared with the full model fine-tuning method. Different from the existing popular and state-of-the-art parameter-efficient fine-tuning approaches, our method does not introduce any additional parameters and computational costs during both the training and inference stages. Another advantage is the model-agnostic and non-destructive property, which eliminates the need for any other design specific to a particular model. Compared with the full fine-tuning, GPS achieves 3.33% (91.78% vs. 88.45%, FGVC) and 9.61% (73.1% vs. 65.57%, VTAB) improvement of the accuracy with tuning only 0.36% parameters of the pre-trained model on average over 24 image classification tasks; it also demonstrates a significant improvement of 17% and 16.8% in mDice and mIoU, respectively, on medical image segmentation task. Moreover, GPS achieves state-of-the-art performance compared with existing PEFT methods.
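The general pattern of tuning only a gradient-selected subset of parameters while freezing the rest can be sketched as follows. The abstract does not give the exact selection rule, so ranking by absolute gradient and keeping a fixed fraction is an assumption here; `select_by_gradient`, `masked_sgd_step`, `fraction`, and `lr` are hypothetical names, and a flat list of floats stands in for a real model's parameters.

```python
def select_by_gradient(grads, fraction):
    """Return a boolean mask marking the top `fraction` of parameters by |gradient|.

    Assumed criterion for illustration; at least one parameter is always kept.
    """
    keep = max(1, int(len(grads) * fraction))
    ranked = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    chosen = set(ranked[:keep])
    return [i in chosen for i in range(len(grads))]

def masked_sgd_step(params, grads, mask, lr):
    """Apply a plain SGD update to selected parameters; frozen ones are unchanged."""
    return [p - lr * g if selected else p
            for p, g, selected in zip(params, grads, mask)]

# Toy example: four parameters, only the one with the largest |gradient| is tuned.
params = [1.0, 1.0, 1.0, 1.0]
grads = [0.1, -2.0, 0.5, 0.0]
mask = select_by_gradient(grads, fraction=0.25)
updated = masked_sgd_step(params, grads, mask, lr=0.1)

print(mask)
print(updated)
```

Because the update touches existing weights in place and adds no new modules, a scheme of this shape introduces no extra parameters at inference time, which matches the property the abstract emphasizes.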

翻訳日:2024-06-13 23:13:33 公開日:2024-06-11

# 分布シフト下におけるプライベート転移学習のための公開表現の利点について

On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift ( http://arxiv.org/abs/2312.15551v3 )

ライセンス: Link先を確認

Pratiksha Thaker, Amrith Setlur, Zhiwei Steven Wu, Virginia Smith,

(参考訳) 公開データによる事前学習は、差分プライベートなモデル訓練を改善するための有望なアプローチである。しかし、近年の研究では、このパラダイムを研究する多くの肯定的な研究成果は分布内タスクのみを考慮しており、事前学習データと微調整データの間に分布シフトがある設定には当てはまらない可能性があると指摘されている。データの機微な性質を考えると、プライベートなタスクを微調整する際にはそのようなシナリオが起こりやすい。本研究では、公開データからのゼロショット性能とプライベートデータによるスクラッチからの訓練の両方が使いものにならないほど弱い結果しか与えないような、大きな分布シフトのある設定においても、公開特徴がスクラッチからのプライベート訓練に比べてプライベート訓練の精度を最大67%向上できることを、3つのタスクにわたって経験的に示す。この現象の理論的説明として、公開データとプライベートデータが低次元表現を共有している場合、公開データのみからプライベートタスクを学習することが不可能であっても、公開表現がプライベート訓練のサンプル複雑度を改善できることを示す。全体として,我々の結果は,極端な分布シフトという現実的な設定において,公開データが実際にプライベート訓練を実用的なものにできるという証拠を与える。

Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data -- a scenario that is likely when finetuning private tasks due to the sensitive nature of the data. In this work, we show empirically across three tasks that even in settings with large distribution shift, where both zero-shot performance from public data and training from scratch with private data give unusably weak results, public features can in fact improve private training accuracy by up to 67% over private training from scratch. We provide a theoretical explanation for this phenomenon, showing that if the public and private data share a low-dimensional representation, public representations can improve the sample complexity of private training even if it is impossible to learn the private task from the public data alone. Altogether, our results provide evidence that public data can indeed make private training practical in realistic settings of extreme distribution shift.

翻訳日:2024-06-13 23:13:33 公開日:2024-06-11

# 理科教育評価の自動化のためのLLMの知識蒸留

Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments ( http://arxiv.org/abs/2312.15842v3 )

ライセンス: Link先を確認

Ehsan Latif, Luyang Fang, Ping Ma, Xiaoming Zhai,

(参考訳) 本研究では, 微調整済みの大規模言語モデル(LLM)を、より小さく、より効率的かつ正確なニューラルネットワークへと知識蒸留(KD)する手法を提案する。リソース制約のあるデバイスにこれらのモデルをデプロイするという課題を特にターゲットとしている。本手法では,教師モデルとして機能するLLMの予測確率を(ソフトラベルとして)用いて,より小さな生徒モデル(ニューラルネットワーク)を訓練する。これは、LLMの出力確率から学習するために調整された特殊な損失関数によって達成され、生徒モデルが教師の性能を忠実に模倣することを保証する。 KD手法の性能を検証するために,科学の質問に対する生徒の記述回答6,684件を含む大規模データセット7Tと,人間の専門家が採点した生徒の記述回答を含む3つの数学的推論データセットを用いた。精度を,最先端(SOTA)の蒸留モデルであるTinyBERT,および人工ニューラルネットワーク(ANN)モデルと比較した。その結果,KD手法はANNおよびTinyBERTよりも採点精度がそれぞれ3%および2%高く,教師モデルと同等の精度であることが示された。さらに、生徒モデルのサイズは0.03Mであり、教師モデルと比べてパラメータ数が4,000分の1で、TinyBERTと比べて推論が10倍高速である。この研究の意義は、高度なAI技術を典型的な教育環境、特に自動採点で利用可能にする可能性にある。

This study proposes a method for knowledge distillation (KD) of fine-tuned Large Language Models (LLMs) into smaller, more efficient, and accurate neural networks. We specifically target the challenge of deploying these models on resource-constrained devices. Our methodology involves training the smaller student model (Neural Network) using the prediction probabilities (as soft labels) of the LLM, which serves as a teacher model. This is achieved through a specialized loss function tailored to learn from the LLM's output probabilities, ensuring that the student model closely mimics the teacher's performance. To validate the performance of the KD approach, we utilized a large dataset, 7T, containing 6,684 student-written responses to science questions and three mathematical reasoning datasets with student-written responses graded by human experts. We compared accuracy with state-of-the-art (SOTA) distilled models, TinyBERT, and artificial neural network (ANN) models. Results have shown that the KD approach has 3% and 2% higher scoring accuracy than ANN and TinyBERT, respectively, and comparable accuracy to the teacher model. Furthermore, the student model size is 0.03M, 4,000 times smaller in parameters and 10x faster at inference than the teacher model and TinyBERT, respectively. The significance of this research lies in its potential to make advanced AI technologies accessible in typical educational settings, particularly for automatic scoring.
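Training a student on a teacher's soft labels is commonly done with a cross-entropy between the teacher's probability vector and the student's softmax output. The sketch below shows that generic form only; the paper's "specialized loss function" is not specified in the abstract, so the exact formula, the `temperature` parameter, and the function names are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(student_logits, teacher_probs, temperature=1.0):
    """Cross-entropy of the student's distribution against teacher soft labels."""
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q)
                for p, q in zip(teacher_probs, student_probs) if p > 0.0)

# Hypothetical three-class scoring task (e.g. scores 0, 1, 2 for a response).
teacher_probs = softmax([2.0, 0.0, -1.0])
loss_agree = soft_label_loss([2.0, 0.0, -1.0], teacher_probs)
loss_disagree = soft_label_loss([-1.0, 0.0, 2.0], teacher_probs)

print(loss_agree, loss_disagree)
```

The loss is smallest when the student's distribution equals the teacher's, so minimizing it pushes the small model to reproduce the teacher's graded uncertainty rather than only its top prediction.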

翻訳日:2024-06-13 23:13:33 公開日:2024-06-11

# 大型スピン猫符号を用いたフォールトトレラント量子計算

Fault-tolerant quantum computation using large spin cat-codes ( http://arxiv.org/abs/2401.04271v4 )

ライセンス: Link先を確認

Sivaprasad Omanakuttan, Vikas Buchemmavari, Jonathan A. Gross, Ivan H Deutsch, Milad Marvian,

(参考訳) 本研究では、連続変数のキャット符号化に類似したスピンキャット符号を用いて、大きなスピンのキューディットに符号化された量子ビットに基づく、フォールトトレラントな量子誤り訂正プロトコルを構築する。これにより、支配的な誤差源、すなわち角運動量の成分について線形または二次の誤差演算子として表現できる過程を訂正することができる。支配的な誤差源に合わせて設計されたこのような符号は、非構造的なノイズモデル向けに設計された符号に比べて、優れたしきい値と低いリソースオーバーヘッドを示しうる。ゲート操作中に支配的な誤差を保存するため、適切なユニバーサルゲートセットを同定する。鍵となる構成要素は、球面テンソル演算子のランクを保存するCNOTゲートである。支配的な誤差を位相誤差と振幅誤差に分類し、量子ビットの位相フリップ誤差に類似した位相誤差を効果的に訂正できることを示す。さらに,シンドローム測定に頼らずに振幅誤差に対処する、測定を用いない誤り訂正手法を提案する。論理CNOTゲート誤差の詳細な解析により、スピンキャット符号化における誤り訂正のフォールトトレラントしきい値が標準的な量子ビットベースの符号化のそれを上回ることを立証する。中性原子量子コンピューティングに基づく具体的な実装として、$^{87}$Srの核スピンに符号化されたキューディットを考え、量子制御とリュードベリ・ブロッケードを用いて、ランク保存CNOTゲートを含むユニバーサルゲートセットを生成する方法を示す。これらの知見は、量子情報処理において、フォールトトレランス、高いしきい値、リソースオーバーヘッドの低減を達成する可能性を持つ、大きなスピンへの量子ビットの符号化への道を開く。

We construct a fault-tolerant quantum error-correcting protocol based on a qubit encoded in a large spin qudit using a spin-cat code, analogous to the continuous variable cat encoding. With this, we can correct the dominant error sources, namely processes that can be expressed as error operators that are linear or quadratic in the components of angular momentum. Such codes tailored to dominant error sources can exhibit superior thresholds and lower resource overheads when compared to those designed for unstructured noise models. To preserve the dominant errors during gate operations, we identify a suitable universal gate set. A key component is the CNOT gate that preserves the rank of spherical tensor operators. Categorizing the dominant errors as phase and amplitude errors, we demonstrate how phase errors, analogous to phase-flip errors for qubits, can be effectively corrected. Furthermore, we propose a measurement-free error correction scheme to address amplitude errors without relying on syndrome measurements. Through an in-depth analysis of logical CNOT gate errors, we establish that the fault-tolerant threshold for error correction in the spin-cat encoding surpasses that of standard qubit-based encodings. We consider a specific implementation based on neutral-atom quantum computing, with qudits encoded in the nuclear spin of $^{87}$Sr, and show how to generate the universal gate set, including the rank-preserving CNOT gate, using quantum control and the Rydberg blockade. These findings pave the way for encoding a qubit in a large spin with the potential to achieve fault tolerance, high threshold, and reduced resource overhead in quantum information processing.

翻訳日:2024-06-13 23:13:33 公開日:2024-06-11

# 自由フェルミオン系の絡み合い、信号処理および代数的組合せ論

Entanglement of free-fermion systems, signal processing and algebraic combinatorics ( http://arxiv.org/abs/2401.07150v2 )

ライセンス: Link先を確認

Pierre-Antoine Bernard, Nicolas Crampé, Rafael I. Nepomechie, Gilles Parez, Luc Vinet,

(参考訳) 本稿では,信号処理や代数的組合せ論に関する手法を活用した、グラフ上の自由フェルミオン系の絡み合いに関する最近の研究をレビューする。一方では、時間・帯域制限問題との類似を用いて、双スペクトル的な状況において、切り詰められた相関行列と可換な三重対角行列を得る。他方では、$P$多項式アソシエーションスキームの文脈で現れるテルウィリガー代数の既約分解が、簡単化のための枠組みを与えることが分かる。

This paper offers a review of recent studies on the entanglement of free-fermion systems on graphs that take advantage of methods pertaining to signal processing and algebraic combinatorics. On the one hand, a parallel with time and band limiting problems is used to obtain a tridiagonal matrix commuting with the chopped correlation matrix in bispectral situations and on the other, the irreducible decomposition of the Terwilliger algebra arising in the context of $P$-polynomial association schemes is seen to yield a simplifying framework.

翻訳日:2024-06-13 23:03:49 公開日:2024-06-11

# 2次元量子多体基底状態のバンバン準備--2次元テンソルネットワークを用いたアルゴリズムの最適化

Bang-bang preparation of quantum many-body ground states in two dimensions: optimization of the algorithm with a two-dimensional tensor network ( http://arxiv.org/abs/2401.09158v4 )

ライセンス: Link先を確認

Yintai Zhang, Jacek Dziarmaga,

(参考訳) バンバン(BB)アルゴリズムは、初期積状態を$H_1$と$H_2$で交互に時間発展させることによって、2次元(2D)量子多体ハミルトニアン$H=H_1+H_2$の基底状態を準備する。近傍テンソル更新(neighborhood tensor update)を用いて、無限のpair-entangled projected state(iPEPS)でBB時間発展をシミュレートする。交互シーケンスは、最終エネルギーをコスト関数として最適化する。エネルギーは、安定性のために接空間法で計算する。この手法は、量子臨界点付近の2次元横磁場量子イジングモデルにおいて、iPEPSの変分最適化により得られた基底状態と比較してベンチマークされる。最適なBBシーケンスは、基底状態の量子アニーリングあるいは断熱的準備(AP)をシミュレートするシーケンスとは非摂動的に異なる。最適なBBエネルギーは、最適なAPエネルギーよりも、バンの数に対してはるかに速く収束する。

A bang-bang (BB) algorithm prepares the ground state of a two-dimensional (2D) quantum many-body Hamiltonian $H=H_1+H_2$ by evolving an initial product state alternating between $H_1$ and $H_2$. We use the neighborhood tensor update to simulate the BB evolution with an infinite pair-entangled projected state (iPEPS). The alternating sequence is optimized with the final energy as a cost function. The energy is calculated with the tangent space methods for the sake of their stability. The method is benchmarked in the 2D transverse field quantum Ising model near its quantum critical point against a ground state obtained by variational optimization of the iPEPS. The optimal BB sequence differs non-perturbatively from a sequence simulating quantum annealing or adiabatic preparation (AP) of the ground state. The optimal BB energy converges with the number of bangs much faster than the optimal AP energy.

翻訳日:2024-06-13 23:03:49 公開日:2024-06-11

# 生成コンテキストに目がくらむ: 知識が衝突するとき、言語モデルは生成コンテキストと検索コンテキストをどのように統合するか?

Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts? ( http://arxiv.org/abs/2401.11911v6 )

ライセンス: Link先を確認

Hexiang Tan, Fei Sun, Wanli Yang, Yuanzhuo Wang, Qi Cao, Xueqi Cheng,

(参考訳) 補助情報は大規模言語モデル(LLM)を強化する鍵となっているが、LLMがこれらのコンテキスト、特にLLMが生成したコンテキストと外部ソースから検索したコンテキストをどのように統合するかについては、比較的知られていない。これを調べるために,LLMの応答が生成コンテキストと検索コンテキストのいずれに起因するかを特定するための体系的な枠組みを定式化する。応答の起源を容易に追跡するために,矛盾するコンテキストを持つデータセットを構築する。すなわち,各質問は生成コンテキストと検索コンテキストの両方と対にされるが,正解を含むのはそのうちの1つだけである。実験の結果,複数のLLM (GPT-4/3.5 および Llama2) において, 誤った情報を提供する場合でも生成コンテキストを優先する有意なバイアスが明らかになった。さらに、このバイアスに寄与する2つの重要な要因を特定する。 i) LLMが生成するコンテキストは,通常,質問との類似度がより高く,選択される可能性が高まる。 ii) 検索コンテキストに用いられるセグメンテーション処理がその完全性を損ない,LLMにおける十分な活用を妨げる。我々の分析は,LLMが多様なコンテキストをどのように統合するかについての理解を深め,現在のLLM拡張手法を進展させる上で貴重な洞察を提供し,検索拡張LLMにとっての生成された誤情報のリスクを浮き彫りにする。

While auxiliary information has become a key to enhancing Large Language Models (LLMs), relatively little is known about how LLMs merge these contexts, specifically contexts generated by LLMs and those retrieved from external sources. To investigate this, we formulate a systematic framework to identify whether LLMs' responses are attributed to either generated or retrieved contexts. To easily trace the origin of the response, we construct datasets with conflicting contexts, i.e., each question is paired with both generated and retrieved contexts, yet only one of them contains the correct answer. Our experiments reveal a significant bias in several LLMs (GPT-4/3.5 and Llama2) to favor generated contexts, even when they provide incorrect information. We further identify two key factors contributing to this bias: i) contexts generated by LLMs typically show greater similarity to the questions, increasing their likelihood of being selected; ii) the segmentation process used in retrieved contexts disrupts their completeness, thereby hindering their full utilization in LLMs. Our analysis enhances the understanding of how LLMs merge diverse contexts, offers valuable insights for advancing current LLM augmentation methods, and highlights the risk of generated misinformation for retrieval-augmented LLMs.

翻訳日:2024-06-13 23:03:49 公開日:2024-06-11

# 浅いReLU様ニューラルネットワークの損失ランドスケープ: 定常点,サドルからの脱出,ネットワーク埋め込み

Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding ( http://arxiv.org/abs/2402.05626v4 )

ライセンス: Link先を確認

Zhengqing Wu, Berfin Simsek, Francois Ged,

(参考訳) 本稿では,経験的二乗損失で訓練された、ReLU様活性化関数を持つ1隠れ層ニューラルネットワークの損失ランドスケープについて検討する。活性化関数は微分不可能であるため、定常点を完全に特徴づける方法はこれまで不明であった。微分不可能な場合と微分可能な場合の両方に適用可能な定常性の条件を提案する。さらに、定常点が、1階の条件で定義される「エスケープニューロン」を含まない場合、それは局所最小点でなければならないことを示す。また、スカラー出力の場合、エスケープニューロンの存在は、その定常点が局所最小点でないことを保証する。これらの結果は,浅いReLU様ネットワークに対して無限小の(消失する)初期化から始まるサドルからサドルへの訓練過程の記述を精緻化し,サドルからの脱出をエスケープニューロンのパラメータ変化と直接結びつける。さらに、より広いネットワークの中でより狭いネットワークを実現するネットワーク埋め込みが、定常点をどのように作り変えるかについても、完全に論じることができる。

In this paper, we investigate the loss landscape of one-hidden-layer neural networks with ReLU-like activation functions trained with the empirical squared loss. As the activation function is non-differentiable, it is so far unclear how to completely characterize the stationary points. We propose the conditions for stationarity that apply to both non-differentiable and differentiable cases. Additionally, we show that, if a stationary point does not contain "escape neurons", which are defined with first-order conditions, then it must be a local minimum. Moreover, for the scalar-output case, the presence of an escape neuron guarantees that the stationary point is not a local minimum. Our results refine the description of the saddle-to-saddle training process starting from infinitesimally small (vanishing) initialization for shallow ReLU-like networks, linking saddle escaping directly with the parameter changes of escape neurons. Moreover, we are also able to fully discuss how network embedding, which is to instantiate a narrower network within a wider network, reshapes the stationary points.

翻訳日:2024-06-13 22:53:54 公開日:2024-06-11

# チュニジア・アラビア語の正規化正書法

Normalized Orthography for Tunisian Arabic ( http://arxiv.org/abs/2402.12940v2 )

ライセンス: Link先を確認

Houcemeddine Turki, Kawthar Ellouze, Hager Ben Ammar, Mohamed Ali Hadj Taieb, Imed Adel, Mohamed Ben Aouicha, Pier Luigi Farri, Abderrezak Bennour,

(参考訳) チュニジア・アラビア語(Tunisian Arabic、ISO 639-3: aeb)は、アラビア語に由来し、様々な歴史的影響によって豊かになった、チュニジア固有の独自の変種である。本研究は、アラビア文字を用いてチュニジア・アラビア語を転写するためのCODA*ガイドラインを適応させた「Normalized Orthography for Tunisian Arabic」(NOTA)を紹介する。その目的は、使いやすさと一貫性を確保することで、言語リソースの開発を促進することである。更新された標準は、チュニジアの音韻と形態を正確に表現する上での課題に対処し、現代標準アラビア語に基づく転写に起因する問題を修正する。

Tunisian Arabic (ISO 639-3: aeb) is a distinct variety native to Tunisia, derived from Arabic and enriched by various historical influences. This research introduces the "Normalized Orthography for Tunisian Arabic" (NOTA), an adaptation of CODA* guidelines for transcribing Tunisian Arabic using Arabic script. The aim is to enhance language resource development by ensuring user-friendliness and consistency. The updated standard addresses challenges in accurately representing Tunisian phonology and morphology, correcting issues from transcriptions based on Modern Standard Arabic.

翻訳日:2024-06-13 22:44:06 公開日:2024-06-11

# ターゲットデータサブセット選択のためのサブモジュール情報対策の理論解析

Theoretical Analysis of Submodular Information Measures for Targeted Data Subset Selection ( http://arxiv.org/abs/2402.13454v2 )

ライセンス: Link先を確認

Nathan Beck, Truong Pham, Rishabh Iyer,

(参考訳) 機械学習タスク全体で使用されているデータの量が増えると、データの特定のサブセットをターゲットする能力がより重要になる。この機能を実現するために、最近提案されたsubmodular Mutual Information (SMI) は、文献の様々なタスクに効果的に適用され、典型的なクエリセットの助けを借りてターゲットサブセットの選択を行う。しかし、これらすべての研究は、サブセットの関連性や対象データのカバレッジに対する感度の観点から、SMIの理論的保証を提供するには不十分である。対象データの関連性やカバレッジに関連する量に関する類似性に基づく境界を導出することで,このような保証を初めて提供する。これらの境界により、複数のアプリケーションで経験的に成功したSMI関数は、理論的には、クエリ関連性およびクエリカバレッジが良好であることを示す。

With the increasing volume of data being used across machine learning tasks, the capability to target specific subsets of data becomes more important. To aid in this capability, the recently proposed Submodular Mutual Information (SMI) has been effectively applied across numerous tasks in the literature to perform targeted subset selection with the aid of an exemplar query set. However, all such works are deficient in providing theoretical guarantees for SMI in terms of its sensitivity to a subset's relevance and coverage of the targeted data. For the first time, we provide such guarantees by deriving similarity-based bounds on quantities related to relevance and coverage of the targeted data. With these bounds, we show that the SMI functions, which have empirically shown success in multiple applications, are theoretically sound in achieving good query relevance and query coverage.
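Targeted selection with SMI can be sketched using the general definition I_f(A; Q) = f(A) + f(Q) - f(A ∪ Q) for a submodular function f. The sketch below picks a facility-location f over a single similarity matrix that covers both candidates and the query, and maximizes the SMI greedily; that choice of f, the toy similarity values, and the function names are assumptions for illustration and are not taken from the paper's specific SMI instantiations or bounds.

```python
def facility_location(subset, sim):
    """f(A) = sum over all items i of max_{j in A} sim[i][j]; f(empty) = 0."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)

def smi(selected, query, sim):
    """Submodular mutual information I_f(A; Q) = f(A) + f(Q) - f(A union Q)."""
    return (facility_location(selected, sim)
            + facility_location(query, sim)
            - facility_location(selected | query, sim))

def greedy_select(candidates, query, sim, budget):
    """Greedily add the candidate that yields the largest SMI with the query."""
    chosen = set()
    for _ in range(min(budget, len(candidates))):
        best = max(sorted(candidates - chosen),
                   key=lambda j: smi(chosen | {j}, query, sim))
        chosen.add(best)
    return chosen

# Items 0-2 are candidates and item 3 is the query exemplar (toy similarities).
sim = [
    [1.0, 0.1, 0.0, 0.9],
    [0.1, 1.0, 0.1, 0.2],
    [0.0, 0.1, 1.0, 0.0],
    [0.9, 0.2, 0.0, 1.0],
]
picked = greedy_select({0, 1, 2}, {3}, sim, budget=1)

print(picked)
```

Item 0, the candidate most similar to the query, receives the highest SMI and is selected first, which is the query-relevance behavior the abstract's guarantees are concerned with.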

翻訳日:2024-06-13 22:44:06 公開日:2024-06-11

# 並列文脈符号化を用いたLong-Context言語モデリング

Long-Context Language Modeling with Parallel Context Encoding ( http://arxiv.org/abs/2402.16617v2 )

ライセンス: Link先を確認

Howard Yen, Tianyu Gao, Danqi Chen,

(参考訳) 大きな言語モデル(LLM)を拡張して、より長い入力を処理することは、幅広いアプリケーションにとって不可欠である。しかし、トランスのかなりの計算コストと位置符号化の限定的な一般化により、コンテキストウィンドウのサイズは制限される。既存のデコーダのみのLLMに適用可能なフレームワークであるCEPE(Context Expansion with Parallel Encoding)を導入し、コンテキストウィンドウを拡張する。 CEPEは小さなエンコーダを使用して長い入力チャンクをチャンク単位で処理し、冷凍復号器はクロスアテンションを介して追加のコンテキストを利用することができる。 CEPEは効率的で汎用的で汎用的であり、8Kの文書で訓練され、LLAMA-2のコンテキストウィンドウを128Kのトークンに拡張し、メモリの1/6のスループットを10倍提供する。 CEPEは、言語モデリングとコンテキスト内学習に強いパフォーマンスをもたらす。 CEPEは検索拡張アプリケーションでも優れており、既存の長期コンテキストモデルは検索コンテキストで縮退する。さらに、ラベルなしデータのみを用いて命令調整モデルのコンテキストウィンドウを拡張するCEPE変異を導入し、LLAMA-2-CHAT上での有効性を示し、下流タスクにおいて非常に長いコンテキストを活用できる強力な命令追従モデルを実現する。

Extending large language models (LLMs) to process longer inputs is crucial for a wide range of applications. However, the substantial computational cost of transformers and limited generalization of positional encoding restrict the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLMs to extend their context window. CEPE employs a small encoder to process long inputs chunk by chunk, enabling the frozen decoder to utilize additional contexts via cross-attention. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, it extends the context window of LLAMA-2 to 128K tokens, offering 10x the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. CEPE also excels in retrieval-augmented applications, while existing long-context models degenerate with retrieved contexts. We further introduce a CEPE variant that can extend the context window of instruction-tuned models using only unlabeled data, and showcase its effectiveness on LLAMA-2-CHAT, leading to a strong instruction-following model that can leverage very long contexts on downstream tasks.

翻訳日:2024-06-13 22:44:06 公開日:2024-06-11

# Larimar: エピソードメモリ制御を備えた大規模言語モデル

Larimar: Large Language Models with Episodic Memory Control ( http://arxiv.org/abs/2403.11901v2 )

ライセンス: Link先を確認

Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen,

(参考訳) LLM(Large Language Models)に格納された知識の効率的かつ正確な更新は、今日の最も急進的な研究課題の1つである。本稿では,Larimarについて述べる。Larimarは,分散エピソードメモリを用いてLLMを拡張するための,脳にインスパイアされた新しいアーキテクチャである。 Larimarのメモリは、計算コストのかかるリトレーニングや微調整を必要とせずに、動的でワンショットの知識更新を可能にする。複数のファクト編集ベンチマークの実験結果から、Larimarは、挑戦的なシーケンシャルな編集セットアップであっても、最も競争力のあるベースラインに匹敵する精度を達成できただけでなく、ベースLLMに依存して8～10倍のスピードアップを実現している。さらに,Larimarを用いた情報漏洩防止,入力コンテキスト長の一般化のメカニズムを提案し,その有効性を示す。私たちのコードはhttps://github.com/IBM/larimarで利用可能です。

Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar, a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar not only attains accuracy comparable to the most competitive baselines, even in the challenging sequential editing setup, but also excels in speed, yielding speed-ups of 8-10x depending on the base LLM, as well as in flexibility, because the proposed architecture is simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar

翻訳日:2024-06-13 22:34:15 公開日:2024-06-11

# ニューラルネットワークによる最適化のための自己改善: 置き換えせずに、改善されたサンプル

Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement ( http://arxiv.org/abs/2403.15180v2 )

ライセンス: Link先を確認

Jonathan Pirnay, Dominik G. Grimm,

(参考訳) エンドツーエンド構築型ニューラルネットワーク最適化の現在の手法は、通常、専門家ソリューションからの行動クローニングや強化学習からのポリシー勾配手法を用いてポリシーを訓練する。行動クローニングは単純であるが、高価な専門家のソリューションが必要であり、ポリシー勾配法は計算的に要求され、微調整が複雑であることが多い。本研究では、各エポックにおける現在のモデルを用いてランダムなインスタンスに対する複数のソリューションをサンプリングし、その後、教師付き模倣学習の専門的軌跡として最適解を選択することにより、これら2つを橋渡しし、トレーニングプロセスを簡素化する。最小限のサンプリングで徐々に改善する手法を実現するため,提案手法では,ラウンドワイド・確率的ビームサーチと,証明可能なポリシー改善から得られた更新戦略を組み合わせた手法を提案する。この戦略は、ほとんど計算オーバーヘッドのないサンプルシーケンスの利点を利用して、ラウンド間のポリシーを洗練させる。我々は,トラベリングセールスマン問題とキャパシタントカールーティング問題に対する我々のアプローチを評価する。本手法で訓練したモデルでは,専門家データと同等の性能と一般化を実現している。さらに,この手法をトランスフォーマーアーキテクチャを用いてジョブショップスケジューリング問題に適用し,既存の最先端手法よりも広いマージンで性能を向上する。

Current methods for end-to-end constructive neural combinatorial optimization usually train a policy using behavior cloning from expert solutions or policy gradient methods from reinforcement learning. While behavior cloning is straightforward, it requires expensive expert solutions, and policy gradient methods are often computationally demanding and complex to fine-tune. In this work, we bridge the two and simplify the training process by sampling multiple solutions for random instances using the current model in each epoch and then selecting the best solution as an expert trajectory for supervised imitation learning. To achieve progressively improving solutions with minimal sampling, we introduce a method that combines round-wise Stochastic Beam Search with an update strategy derived from a provable policy improvement. This strategy refines the policy between rounds by utilizing the advantage of the sampled sequences with almost no computational overhead. We evaluate our approach on the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. The models trained with our method achieve comparable performance and generalization to those trained with expert data. Additionally, we apply our method to the Job Shop Scheduling Problem using a transformer-based architecture and outperform existing state-of-the-art methods by a wide margin.

翻訳日:2024-06-13 22:34:15 公開日:2024-06-11

# 尤度に基づくOOD検出パラドックスの幾何学的説明

A Geometric Explanation of the Likelihood OOD Detection Paradox ( http://arxiv.org/abs/2403.18910v2 )

ライセンス: Link先を確認

Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul G. Krishnan, Gabriel Loaiza-Ganem,

(参考訳) Likelihood-based Deep Generative Model (DGM) は一般的に、比較的複雑なデータセットで訓練された場合、より単純なソースからのアウト・オブ・ディストリビューション(OOD)データに高い確率値を割り当てる。謎に加え、OODサンプルは高い可能性にもかかわらずこれらのDGMによって生成されることはない。この2重のパラドックスはまだ決定的に説明されていないため、OOD検出の確率は信頼性が低い。我々の第一の観察は、最小の確率質量を含む場合、高濃度の領域は発生しないということである。このような大きな密度と低い確率質量の矛盾が、低次元多様体に制限されたデータの周りに生じることを示す。また、このシナリオは、局所固有次元(LID)推定により同定できることを示し、事前訓練されたDGMから得られる可能性とLID推定をペアリングするOOD検出法を提案する。提案手法はフローの正規化やスコアベース拡散モデルに適用でき、同じDGMバックボーンを用いて最先端のOOD検出ベンチマークに適合または超越した結果が得られる。私たちのコードはhttps://github.com/layer6ai-labs/dgm_ood_detectionで利用可能です。

Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection.

翻訳日:2024-06-13 22:34:15 公開日:2024-06-11
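The gap between high density and low probability mass mentioned in the abstract above can be illustrated with a toy case. The following is a minimal sketch, not the paper's LID-based method: it assumes a standard Gaussian in 100 dimensions (an illustrative choice), where the origin has the highest density yet essentially no samples land near it. The function names `log_density` and `mass_near_mode` are hypothetical.

```python
import math
import random


def log_density(x):
    """Log-density of a standard Gaussian at point x (illustrative model)."""
    d = len(x)
    return -0.5 * sum(v * v for v in x) - 0.5 * d * math.log(2 * math.pi)


def mass_near_mode(dim, radius, n_samples=2000, seed=0):
    """Monte Carlo estimate of the probability mass within `radius` of the mode.

    Also returns the highest log-density observed among the samples.
    """
    rng = random.Random(seed)
    inside = 0
    best_sample_logp = -math.inf
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if math.sqrt(sum(v * v for v in x)) <= radius:
            inside += 1
        best_sample_logp = max(best_sample_logp, log_density(x))
    return inside / n_samples, best_sample_logp


if __name__ == "__main__":
    fraction, best = mass_near_mode(dim=100, radius=5.0)
    print("fraction of samples near the mode:", fraction)
    print("log-density at the mode:", log_density([0.0] * 100))
    print("best log-density among samples:", best)
```

The mode scores higher than every sample, but the model never generates points there, which is the same density-versus-mass distinction the abstract relies on.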

# BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes (英語)

BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes ( http://arxiv.org/abs/2404.03022v2 )

ライセンス: Link先を確認

Amirhossein Abaskohi, Amirhossein Dabiriaghdam, Lele Wang, Giuseppe Carenini,

(参考訳) テキストと画像を組み合わせたミームは、しばしばメタファーを使って説得力のあるメッセージを伝え、世論を形成する。そこで本研究チームはSemEval-2024 Task 4という階層型マルチラベル分類タスクに取り組み,その手法をミーム内に組み込んだ修辞的,心理的説得的手法を同定した。この問題に対処するために,画像のモダリティギャップと追加の意味情報の影響を評価するキャプション生成手法を導入し,その結果を改良した。本モデルでは, テキストエンコーダとしてRoBERTa, 画像エンコーダとしてCLIPを微調整するために, GPT-4 生成キャプションとミームテキストを併用した。ベースラインは12のサブタスクすべてにおいて大きなマージンで上回っている。特に、Subtask 2aの全言語でトップ3、Subtask 2bでトップ4にランクインし、定量的に強いパフォーマンスを示した。中間段階の導入によって達成された改善は、視覚エンコーダに挑戦する画像の比喩的本質に起因する可能性が高い。これは抽象的な視覚的セマンティックスエンコーディングを改善する可能性を強調している。

Memes, combining text and images, frequently use metaphors to convey persuasive messages, shaping public opinion. Motivated by this, our team engaged in SemEval-2024 Task 4, a hierarchical multi-label classification task designed to identify rhetorical and psychological persuasion techniques embedded within memes. To tackle this problem, we introduced a caption generation step to assess the modality gap and the impact of additional semantic information from images, which improved our result. Our best model utilizes GPT-4 generated captions alongside meme text to fine-tune RoBERTa as the text encoder and CLIP as the image encoder. It outperforms the baseline by a large margin in all 12 subtasks. In particular, it ranked in top-3 across all languages in Subtask 2a, and top-4 in Subtask 2b, demonstrating quantitatively strong performance. The improvement achieved by the introduced intermediate step is likely attributable to the metaphorical essence of images that challenges visual encoders. This highlights the potential for improving abstract visual semantics encoding.

翻訳日:2024-06-13 22:24:31 公開日:2024-06-11

# 汎用行動エージェントのためのデータ駆動ゴール認識設計

Data-Driven Goal Recognition Design for General Behavioral Agents ( http://arxiv.org/abs/2404.03054v2 )

ライセンス: Link先を確認

Robert Kasumba, Guanghui Yu, Chien-Ju Ho, Sarah Keren, William Yeoh,

(参考訳) 目標認識設計は、意思決定環境への限定的な修正を目標とし、それらの環境内で行動するエージェントの目標の推測を容易にすることを目的としている。目標認識設計において様々な研究努力がなされてきたが、既存のアプローチは計算的に要求されており、エージェントが意思決定において(ほぼ)最適であると仮定することが多い。これらの制約に対処するために、汎用的な行動モデルを持つエージェントを考慮に入れた、ゴール認識設計のためのデータ駆動型アプローチを導入する。既存の文献に従えば、意思決定環境におけるエージェントの目標を推測する難しさの尺度として、最悪のケースの識別性($\textit{wcd}$)を用いる。私たちのアプローチは、与えられた環境とエージェントの振る舞いモデルに対して$\textit{wcd}$を予測するために、機械学習モデルをトレーニングすることから始まります。そこで我々は,目標認識の強化のための意思決定環境を最適化するために,様々な制約を満たす勾配に基づく最適化フレームワークを提案する。シミュレーションにより,既存の手法よりも$\textit{wcd}$を削減し,従来のセットアップにおける実行効率を向上させることが実証された。さらに, フレキシブルな予算制約, より複雑な環境, 準最適なエージェント動作など, 既存のアプローチが適用されないような設定にも適応する。最後に,本手法が実世界の人的意思決定者による効率的な目標認識を促進する環境を創出できることを確認した。

Goal recognition design aims to make limited modifications to decision-making environments with the goal of making it easier to infer the goals of agents acting within those environments. Although various research efforts have been made in goal recognition design, existing approaches are computationally demanding and often assume that agents are (near-)optimal in their decision-making. To address these limitations, we introduce a data-driven approach to goal recognition design that can account for agents with general behavioral models. Following existing literature, we use worst-case distinctiveness ($\textit{wcd}$) as a measure of the difficulty in inferring the goal of an agent in a decision-making environment. Our approach begins by training a machine learning model to predict the $\textit{wcd}$ for a given environment and the agent behavior model. We then propose a gradient-based optimization framework that accommodates various constraints to optimize decision-making environments for enhanced goal recognition. Through extensive simulations, we demonstrate that our approach outperforms existing methods in reducing $\textit{wcd}$ and enhancing runtime efficiency in conventional setups. Moreover, our approach also adapts to settings in which existing approaches do not apply, such as those involving flexible budget constraints, more complex environments, and suboptimal agent behavior. Finally, we have conducted human-subject experiments which confirm that our method can create environments that facilitate efficient goal recognition from real-world human decision-makers.

翻訳日:2024-06-13 22:24:31 公開日:2024-06-11

# ニューラルネットワーク検証のための最小NAP仕様の学習

Learning Minimal NAP Specifications for Neural Network Verification ( http://arxiv.org/abs/2404.04662v2 )

ライセンス: Link先を確認

Chuqin Geng, Zhaoyue Wang, Haolin Ye, Saifei Liao, Xujie Si,

(参考訳) 仕様はニューラルネットワークの検証において重要な役割を果たす。彼らは我々が検証しようとする正確な入力領域を定義し、典型的にはL-無限ノルム球として表される。最近の研究では、未確認のテストデータセットを検証するための仕様として、ニューラルアクティベーションパターン(NAP)を使用することが提案されているが、最も洗練されたNAPの計算に焦点を当てており、しばしば入力空間の非常に小さな領域に限られている。本稿では,ニューラルネットワークが与えられた場合,ネットワークの堅牢性の形式的検証に十分な最小限の(最も粗い)NAPを求める。最小のNAP仕様を見つけることは、検証可能な境界を広げるだけでなく、どのニューロンがモデルの堅牢性に寄与するかの洞察を与える。この問題に対処するために、我々はいくつかの正確で近似的なアプローチを提案する。我々の正確なアプローチは、検証ツールを利用して、決定論的または統計的に最小限のNAP仕様を見つけます。近似手法は, 検証ツールを呼び出すことなく, 逆例と局所勾配を用いて最小NAPを効率的に推定する。これにより、ニューロン間の潜在的な因果関係と、既存の検証フレームワークがスケールできないタスクである最先端のニューラルネットワークの堅牢性を調べることができる。我々の実験結果から、最小のNAP仕様は最も洗練されたNAP仕様よりもはるかに少ない神経細胞を必要とすることが示唆されるが、検証可能な境界を桁違いに大きく拡張することができる。

Specifications play a crucial role in neural network verification. They define the precise input regions we aim to verify, typically represented as L-infinity norm balls. While recent research suggests using neural activation patterns (NAPs) as specifications for verifying unseen test set data, it focuses on computing the most refined NAPs, often limited to very small regions in the input space. In this paper, we study the following problem: Given a neural network, find a minimal (coarsest) NAP that is sufficient for formal verification of the network's robustness. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which neurons contribute to the model's robustness. To address this problem, we propose several exact and approximate approaches. Our exact approaches leverage the verification tool to find minimal NAP specifications in either a deterministic or statistical manner. In contrast, the approximate methods efficiently estimate minimal NAPs using adversarial examples and local gradients, without making calls to the verification tool. This allows us to inspect potential causal links between neurons and the robustness of state-of-the-art neural networks, a task for which existing verification frameworks fail to scale. Our experimental results suggest that minimal NAP specifications require much smaller fractions of neurons compared to the most refined NAP specifications, yet they can expand the verifiable boundaries by several orders of magnitude.

翻訳日:2024-06-13 22:24:31 公開日:2024-06-11

# PVF (Parameter Vulnerability Factor):モデルパラメータにおけるSDCに対するAI脆弱性を理解するためのスケーラブルなメトリクス

PVF (Parameter Vulnerability Factor): A Scalable Metric for Understanding AI Vulnerability Against SDCs in Model Parameters ( http://arxiv.org/abs/2405.01741v3 )

ライセンス: Link先を確認

Xun Jiao, Fred Lin, Harish D. Dixit, Joel Coburn, Abhinav Pandey, Han Wang, Venkat Ramesh, Jianyu Huang, Wang Xu, Daniel Moore, Sriram Sankar,

(参考訳) AIシステムの信頼性は、デプロイメントの成功とAI技術の広範な採用に対する基本的な懸念である。残念なことに、AIハードウェアシステムのエスカレートする複雑さとヘテロジニティは、ハードウェアの欠陥(例えば、サイレントデータ破損(SDC))の影響を受けやすくなり、モデルパラメータを破損させる可能性がある。これがAI推論/サービス中に発生する場合、ユーザにとって誤ったあるいは劣化したモデルアウトプットが発生し、最終的にはAIサービスの品質と信頼性に影響を与える可能性がある。モデル内のさまざまなコンポーネント(モジュール、レイヤなど)が、パラメータの破損に対して、どのようにさまざまな脆弱性を示すのか? この問題を体系的に解決するために,コンピュータアーキテクチャコミュニティにおいて,AIモデル脆弱性のパラメータ破損に対する定量化を目標とした,新しい量的尺度であるパラメータ脆弱性係数(PVF)を提案する。モデルパラメータのPVFを、そのモデルパラメータの破損が誤った出力をもたらす確率として定義する。本稿では,推論中にPVFを3種類のタスク/モデルに適用するためのいくつかのユースケースについて述べる。 PVFは、脆弱なAIパラメータコンポーネントを保護されたハードウェアモジュールにマッピングするなど、フォールトプロテクションとパフォーマンス/効率のトレードオフのバランスにおいて、AIハードウェアデザイナに重要な洞察を提供することができる。 PVFメトリックは任意のAIモデルに適用可能であり、AI脆弱性/レジリエンス評価プラクティスの統合と標準化を支援する可能性がある。

Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults, e.g., silent data corruptions (SDC), that can potentially corrupt model parameters. When this occurs during AI inference/servicing, it can potentially lead to incorrect or degraded model output for users, ultimately affecting the quality and reliability of AI services. In light of the escalating threat, it is crucial to address key questions: How vulnerable are AI models to parameter corruptions, and how do different components (such as modules, layers) of the models exhibit varying vulnerabilities to parameter corruptions? To systematically address these questions, we propose a novel quantitative metric, Parameter Vulnerability Factor (PVF), inspired by the architectural vulnerability factor (AVF) in the computer architecture community, aiming to standardize the quantification of AI model vulnerability against parameter corruptions. We define a model parameter's PVF as the probability that a corruption in that particular model parameter will result in an incorrect output. In this paper, we present several use cases on applying PVF to three types of tasks/models during inference: recommendation (DLRM), vision classification (CNN), and text classification (BERT), while presenting an in-depth vulnerability analysis on DLRM. PVF can provide pivotal insights to AI hardware designers in balancing the tradeoff between fault protection and performance/efficiency, such as mapping vulnerable AI parameter components to well-protected hardware modules. The PVF metric is applicable to any AI model and has the potential to help unify and standardize AI vulnerability/resilience evaluation practice.

翻訳日:2024-06-13 22:14:47 公開日:2024-06-11
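The PVF definition in the abstract above (the probability that a corruption in a given parameter yields an incorrect output) lends itself to a Monte Carlo estimate. The sketch below is an assumption-laden illustration, not the authors' tooling: the fault model is a single random bit flip in a float32 value, the "model" is a toy linear classifier, and "incorrect" is taken to mean "differs from the fault-free prediction". All function names are hypothetical.

```python
import random
import struct


def flip_bit(value, bit):
    """Flip one bit of the float32 encoding of `value` (assumed fault model)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def predict(weights, x):
    """Toy binary linear classifier standing in for a real model."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0


def estimate_pvf(weights, index, inputs, trials=1000, seed=0):
    """Fraction of trials where corrupting weights[index] changes the output."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        x = rng.choice(inputs)
        corrupted = list(weights)
        corrupted[index] = flip_bit(weights[index], rng.randrange(32))
        # "Incorrect" here means: differs from the fault-free prediction.
        if predict(corrupted, x) != predict(weights, x):
            wrong += 1
    return wrong / trials


if __name__ == "__main__":
    weights = [0.5, 0.5]
    inputs = [[1.0, -0.5], [0.2, 0.3], [-1.0, 0.4]]
    for i in range(len(weights)):
        print(f"estimated PVF of parameter {i}:", estimate_pvf(weights, i, inputs))
```

Comparing the per-parameter estimates is one way to see which parameters would benefit most from stronger protection, in the spirit of the use cases the abstract lists.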

# 潜伏潜伏実験による運動不自由者に対するハンドジェスチャのウェアラブルセンサベースFew-Shot連続学習

Wearable Sensor-Based Few-Shot Continual Learning on Hand Gestures for Motor-Impaired Individuals via Latent Embedding Exploitation ( http://arxiv.org/abs/2405.08969v2 )

ライセンス: Link先を確認

Riyad Bin Rafiq, Weishi Shi, Mark V. Albert,

(参考訳) ハンドジェスチャは、人間とコンピュータのインタラクションの自然な手段を提供し、会話ができない人でも効率的にコミュニケーションできる。既存のジェスチャー認識法は、事前に定義されたジェスチャーに大きく依存するが、運動障害のある個人は、各個人のジェスチャー動作やスタイルに合わせて、新しいジェスチャーを必要とする。異なる人物から採取したジェスチャーサンプルは、健康状態、障害の重症度、腕の動きパターンなどによって分布の変化がある。本稿では,リプレイベースFew-Shot Continual Learning (FSCL) フレームワークにおけるLatent Embedding Exploitation (LEE) 機構を紹介する。本手法は,2つの追加埋め込みから得られた着地内ばらつきとともに,ジェスチャー先行知識として知られる保存された潜伏埋め込みを活用することにより,多彩な潜伏特徴空間を創出する。このように、モデルは、限られたサンプルで高度に可変なジェスチャーで潜時統計構造をキャプチャすることができる。我々はSmartWatch GestureとMotion Gestureデータセットを用いて実験評価を行う。提案手法は,6種類のジェスチャーに対して,1,3,5サンプルを用いて平均57.0%,64.6%,69.3%の検査精度を示す。本手法は、運動障害者がウェアラブルデバイスを活用するのに役立ち、そのユニークな動作様式を学習し、人間とコンピュータのインタラクションやソーシャルコミュニケーションに適用することができる。 https://github.com/riyadRafiq/wearable-latent-embedding-exploitation

Hand gestures can provide a natural means of human-computer interaction and enable people who cannot speak to communicate efficiently. Existing hand gesture recognition methods heavily depend on pre-defined gestures; however, motor-impaired individuals require new gestures tailored to each individual's gesture motion and style. Gesture samples collected from different persons have distribution shifts due to their health conditions, the severity of the disability, motion patterns of the arms, etc. In this paper, we introduce the Latent Embedding Exploitation (LEE) mechanism in our replay-based Few-Shot Continual Learning (FSCL) framework that significantly improves the performance of fine-tuning a model for out-of-distribution data. Our method produces a diversified latent feature space by leveraging a preserved latent embedding known as gesture prior knowledge, along with intra-gesture divergence derived from two additional embeddings. Thus, the model can capture latent statistical structure in highly variable gestures with limited samples. We conduct an experimental evaluation using the SmartWatch Gesture and the Motion Gesture datasets. The proposed method results in an average test accuracy of 57.0%, 64.6%, and 69.3% by using one, three, and five samples for six different gestures. Our method helps motor-impaired persons leverage wearable devices, and their unique styles of movement can be learned and applied in human-computer interaction and social communication. Code is available at: https://github.com/riyadRafiq/wearable-latent-embedding-exploitation

翻訳日:2024-06-13 22:14:47 公開日:2024-06-11

# LOGO:言語協調とグリフ知覚モデルを用いたビデオテキストスポッティング

LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model ( http://arxiv.org/abs/2405.19194v2 )

ライセンス: Link先を確認

Hongen Liu, Di Sun, Jiahao Wang, Yi Liu, Gang Pan,

(参考訳) ビデオテキストスポッティング(VTS)は、ビデオ内のテキストインスタンスを同時にローカライズ、認識、追跡することを目的としている。エンド・ツー・エンド方式の限られた認識能力に対処するため、最新の手法では、最先端画像テキストスポッターのゼロショット結果を直接追跡し、印象的な性能を実現している。しかしながら、異なるデータセット間のドメインギャップのため、これらのメソッドは通常、極端なデータセット上の限られたトラッキングトラジェクトリを取得する。特定のデータセット上の微調整トランスフォーマーベースのテキストスポッターは、かなりのトレーニングリソースを犠牲にして、パフォーマンスの向上をもたらす可能性がある。本稿では,従来のテキストスポッターの性能向上を目的とした革新的なフレームワークであるLOGO(Language Collaboration and Glyph Perception Model)を提案する。この目的を達成するために、認識段階における背景雑音からテキストインスタンスを明示的に識別する言語シナジー分類器(LSC)を設計する。特に、言語シナジー分類器は、テキスト領域の正当性に基づいて、テキストコンテンツまたはバックグラウンドコードを出力できるため、言語スコアを計算できる。その後、検出スコアと言語スコアの平均値を取得して融合スコアを算出し、追跡前に検出結果を再スコアする。再描画機構により,LSCはテキストライクな領域をフィルタリングしながら低解像度テキストインスタンスの検出を容易にする。さらに、ノイズの多いテキスト領域の認識精度を高めるために、グリフ監視を導入する。さらに、位置情報と視覚的特徴を効率よく統合し、より識別的な追跡機能を得る視覚的位置混合モジュールを提案する。提案手法の有効性を,公開ベンチマークで検証した。

Video text spotting (VTS) aims to simultaneously localize, recognize and track text instances in videos. To address the limited recognition capability of end-to-end methods, recent methods track the zero-shot results of state-of-the-art image text spotters directly, and achieve impressive performance. However, owing to the domain gap between different datasets, these methods usually obtain limited tracking trajectories on extreme datasets. Fine-tuning transformer-based text spotters on specific datasets could yield performance enhancements, albeit at the expense of considerable training resources. In this paper, we propose a Language Collaboration and Glyph Perception Model, termed LOGO, an innovative framework designed to enhance the performance of conventional text spotters. To achieve this goal, we design a language synergy classifier (LSC) to explicitly discern text instances from background noise in the recognition stage. Specifically, the language synergy classifier can output text content or background code based on the legibility of text regions, thus computing language scores. Subsequently, fusion scores are computed by taking the average of detection scores and language scores, and are utilized to re-score the detection results before tracking. Through this re-scoring mechanism, the proposed LSC facilitates the detection of low-resolution text instances while filtering out text-like regions. Moreover, glyph supervision is introduced to enhance the recognition accuracy of noisy text regions. In addition, we propose the visual position mixture module, which can merge the position information and visual features efficiently, and acquire more discriminative tracking features. Extensive experiments on public benchmarks validate the effectiveness of the proposed method.

翻訳日:2024-06-13 22:05:02 公開日:2024-06-11
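The re-scoring step in the LOGO abstract above (fusion score = average of detection score and language score, applied before tracking) can be sketched as follows. This is only an illustration under stated assumptions: the dictionary keys `det_score` and `lang_score`, the function names, and the filtering threshold of 0.5 are hypothetical and are not taken from the paper.

```python
def fuse_scores(detection_score, language_score):
    """Fusion score as the plain average of the two scores."""
    return 0.5 * (detection_score + language_score)


def rescore(detections, threshold=0.5):
    """Re-score detections and keep those at or above an assumed threshold."""
    kept = []
    for det in detections:
        fused = fuse_scores(det["det_score"], det["lang_score"])
        if fused >= threshold:
            kept.append({**det, "score": fused})
    return kept


if __name__ == "__main__":
    detections = [
        # Low-resolution but legible text: weak detection, strong language score.
        {"id": "a", "det_score": 0.25, "lang_score": 0.75},
        # Text-like background pattern: strong detection, weak language score.
        {"id": "b", "det_score": 0.5, "lang_score": 0.0},
    ]
    for det in rescore(detections):
        print(det["id"], det["score"])
```

In this toy input the legible low-resolution instance survives while the text-like background region is dropped, matching the behavior the abstract attributes to the re-scoring mechanism.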

# 最大クリーク問題の解法における量子アニーリングアルゴリズムの解析

An Analysis of Quantum Annealing Algorithms for Solving the Maximum Clique Problem ( http://arxiv.org/abs/2406.07587v1 )

ライセンス: Link先を確認

Alessandro Gherardi, Alberto Leporati,

(参考訳) 量子アニーラは、2次非制約二元最適化(QUBO)問題として定式化したり、等しくイジングの定式化を用いて多くの(おそらくNP-ハード)組合せ最適化問題を解くのに使うことができる。本稿では,QUBO問題として表されるグラフ上の最大クリークを求めるD-Wave量子アニーラの能力を解析する。アニーラが課す164ノードの埋め込み限界のため, インスタンスの埋め込みを可能にするグラフ分解について検討した。そこで本稿では, 相補的な最大独立集合問題に対する分解アルゴリズムと, ノード数, クリーク数, 密度, 接続性指標, および解サイズと他のノード数の比を制御するグラフ生成アルゴリズムを提案する。そして、これらの変数が量子アニーラによって見つかる解の質にどのように影響するかを統計的に分析した。本研究の結果には, 超えてはならない比および密度の限界に関する推奨事項に加え, 最適に近い解を得る確率を最大化するために実施すべき一連の予防策と事前分析が含まれる。

Quantum annealers can be used to solve many (possibly NP-hard) combinatorial optimization problems, by formulating them as quadratic unconstrained binary optimization (QUBO) problems or, equivalently, using the Ising formulation. In this paper we analyse the ability of D-Wave quantum annealers to find the maximum clique on a graph, expressed as a QUBO problem. Due to the embedding limit of 164 nodes imposed by the annealer, we conducted a study on graph decomposition to enable instance embedding. We thus propose a decomposition algorithm for the complementary maximum independent set problem, and a graph generation algorithm to control the number of nodes, the number of cliques, the density, the connectivity indices and the ratio of the solution size to the number of other nodes. We then statistically analysed how these variables affect the quality of the solutions found by the quantum annealer. The results of our investigation include recommendations on ratio and density limits not to be exceeded, as well as a series of precautions and a priori analyses to be carried out in order to maximise the probability of obtaining a solution close to the optimum.

翻訳日:2024-06-13 21:45:26 公開日:2024-06-11
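The abstract above refers to expressing maximum clique as a QUBO problem without giving the formulation. A common textbook encoding, which may differ from the one the authors use, rewards each selected node with -1 and penalizes every selected pair of nodes that is not joined by an edge. The sketch below builds that QUBO and solves a tiny instance by brute force in place of an annealer; the penalty value of 2.0 and all function names are assumptions for illustration.

```python
from itertools import combinations, product


def max_clique_qubo(n, edges, penalty=2.0):
    """Build QUBO coefficients: -1 per node, +penalty per non-adjacent pair."""
    edge_set = {frozenset(e) for e in edges}
    Q = {(i, i): -1.0 for i in range(n)}
    for i, j in combinations(range(n), 2):
        if frozenset((i, j)) not in edge_set:
            Q[(i, j)] = penalty
    return Q


def energy(Q, x):
    """QUBO objective value for a binary assignment x."""
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())


def brute_force_minimum(Q, n):
    """Exhaustive search over all 2^n assignments (small n only)."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(Q, x))


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 2.
    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
    Q = max_clique_qubo(4, edges)
    best = brute_force_minimum(Q, 4)
    print("selected nodes:", [i for i, bit in enumerate(best) if bit])
    print("energy:", energy(Q, best))
```

With a penalty greater than 1, dropping a node from a non-clique selection always lowers the energy, so the minimizer corresponds to a maximum clique. On hardware, the same coefficient dictionary would be handed to an annealer-backed sampler rather than enumerated.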

# AIM: マルチモーダルな大規模言語モデルにインコンテキスト学習を効果的に実施させる

AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning ( http://arxiv.org/abs/2406.07588v1 )

ライセンス: Link先を確認

Jun Gao, Qian Qiao, Ziqiang Cao, Zili Wang, Wenjie Li,

(参考訳) In-context Learning(ICL)は、数十億のパラメータを更新することなく、下流タスクに創発的な能力を示すLarge Language Models(LLM)を容易にする。しかし、MLLM(Multi-modal Large Language Models)の分野では、2つの問題がマルチモーダルICLの適用を妨げる。 1)ほとんどの主要なMLLMは単一画像データセットでのみ訓練されており,マルチモーダルなデモを読むことができない。 2)デモの増加に伴い,数千の視覚トークンがハードウェアに挑戦し,ICL性能を低下させた。予備的な調査では、内部のLLMは、応答を生成するためのマルチモーダルな実演において、言語的モダリティに重点を置いていることが判明した。そこで本稿では, マルチモーダルなデモの画像情報を対応する言語部分の高密度潜在空間に集約する(\textbf{A}ggregating \textbf{I}mage information of \textbf{M}ultimodal demonstrations)ことで, 上記の問題に対処するための, 汎用的で軽量なフレームワークである \textbf{AIM} を提案する。具体的には、AIMはまず凍結したバックボーンMLLMを使用して各画像テキストのデモを読み出し、テキストの上のベクトル表現を抽出する。これらのベクトルは自然に画像とテキストのペアに関する情報を融合させ、AIMはそれらを訓練可能な投影層を介して内部LLMに許容される融合仮想トークンに変換する。最終的に、これらの融合トークンはマルチモーダルなデモの変種として機能し、MLLMに入力され、通常通り現在のクエリに応答する。これらの融合トークンは、画像とテキストのペアのテキストコンポーネントに由来するため、マルチモーダルなデモはほぼ純粋なテキストによるデモに還元され、任意のMLLMにシームレスに適用される。バックボーンのMLLMを凍結することで、AIMはパラメータ効率が良く、下流のテストタスクとは無関係な公開マルチモーダルウェブコーパスでトレーニングする。

In-context learning (ICL) facilitates Large Language Models (LLMs) exhibiting emergent ability on downstream tasks without updating billions of parameters. However, in the area of multi-modal Large Language Models (MLLMs), two problems hinder the application of multi-modal ICL: (1) Most primary MLLMs are only trained on single-image datasets, making them unable to read multi-modal demonstrations. (2) As the number of demonstrations increases, thousands of visual tokens severely strain hardware and degrade ICL performance. During preliminary explorations, we discovered that the inner LLM tends to focus more on the linguistic modality within multi-modal demonstrations to generate responses. Therefore, we propose a general and lightweight framework \textbf{AIM} to tackle these problems through \textbf{A}ggregating \textbf{I}mage information of \textbf{M}ultimodal demonstrations to the dense latent space of the corresponding linguistic part. Specifically, AIM first uses the frozen backbone MLLM to read each image-text demonstration and extracts the vector representations on top of the text. These vectors naturally fuse the information of the image-text pair, and AIM transforms them into fused virtual tokens acceptable for the inner LLM via a trainable projection layer. Ultimately, these fused tokens function as variants of multi-modal demonstrations, fed into the MLLM to direct its response to the current query as usual. Because these fused tokens stem from the textual component of the image-text pair, a multi-modal demonstration is nearly reduced to a pure textual demonstration, thus seamlessly applying to any MLLMs. With its de facto MLLM frozen, AIM is parameter-efficient and we train it on public multi-modal web corpora which have nothing to do with downstream test tasks.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# タグと正しい:音声認識誤り訂正のための高精度後編集手法

Tag and correct: high precision post-editing approach to correction of speech recognition errors ( http://arxiv.org/abs/2406.07589v1 )

ライセンス: Link先を確認

Tomasz Ziętkiewicz,

(参考訳) 本稿では,後編集による音声認識誤り訂正問題に対する新しいアプローチを提案する。 ASR(Automatic Speech Recognition)仮説の単語を単語単位で修正する方法を学ぶニューラルネットワークタグと、タグによって返される修正を適用する修正モジュールとから構成される。提案手法はアーキテクチャによらず,任意のASRシステムに適用可能である。これは本番環境では特に重要であり、エラー訂正モデルによる新しいミスの導入を避けることは、全体的な結果の純利よりも重要である可能性がある。その結果,提案モデルの性能は従来の手法に匹敵するが,トレーニングに要するリソースははるかに小さいため,推論遅延とトレーニング時間の両方が他の手法の使用を制限する重要な要因である産業用途に適していることがわかった。

This paper presents a new approach to the problem of correcting speech recognition errors by means of post-editing. It consists of using a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis word by word and a corrector module that applies corrections returned by the tagger. The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected. This is especially crucial in production environments, where avoiding the introduction of new mistakes by the error correction model may be more important than the net gain in overall results. The results show that the performance of the proposed error correction models is comparable with previous approaches while requiring far fewer resources to train, which makes it suitable for industrial applications, where both inference latency and training times are critical factors that limit the use of other techniques.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11
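The two-stage design in the abstract above (a tagger that labels each hypothesis word, then a corrector that applies the labels) can be sketched from the corrector's side. The tag vocabulary used here (`KEEP`, `DELETE`, `REPLACE:<word>`, `APPEND:<word>`) is a hypothetical one chosen for illustration; the abstract does not specify the actual tag set, and the tagger itself is replaced by hand-written tags.

```python
def apply_tags(words, tags):
    """Apply one edit tag per hypothesis word and return the corrected words."""
    if len(words) != len(tags):
        raise ValueError("expected exactly one tag per word")
    corrected = []
    for word, tag in zip(words, tags):
        if tag == "KEEP":
            corrected.append(word)
        elif tag == "DELETE":
            continue
        elif tag.startswith("REPLACE:"):
            corrected.append(tag[len("REPLACE:"):])
        elif tag.startswith("APPEND:"):
            corrected.extend([word, tag[len("APPEND:"):]])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return corrected


if __name__ == "__main__":
    hypothesis = "i want too go go home".split()
    # In a real system these tags would come from the neural sequence tagger.
    tags = ["KEEP", "KEEP", "REPLACE:to", "KEEP", "DELETE", "KEEP"]
    print(" ".join(apply_tags(hypothesis, tags)))
```

Because every change is an explicit per-word tag, a deployment could restrict or threshold individual tag types, which is one plausible reading of the "high-precision control" the abstract emphasizes.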

# StreamPrompt: 効率的なストリーム学習のための学習可能なプロンプト誘導データ選択

StreamPrompt: Learnable Prompt-guided Data Selection for Efficient Stream Learning ( http://arxiv.org/abs/2406.07590v1 )

ライセンス: Link先を確認

Tongjun Shi, Shuhao Zhang,

(参考訳) ストリーム学習(SL)は、従来の継続学習(CL)とは別物として、連続したデータストリームに迅速に適応するモデルを必要とする。近年のSL法では、トレーニング用のデータサブセットを選択することで効率性が強調されているが、データの重要性の変化に効果的に適応できない静的なルールベースの選択アルゴリズムに依存しているため、しばしば苦労する。本稿では,動的で学習可能なプロンプトによってデータ選択を強化するStreamPromptを紹介する。これらの動的なプロンプトは、モデル推論を導くこと以上の2つの目的を果たす。 1)データ選択の最適化、及び 2) リハーサルバッファの更新を案内する。このアプローチは、連続データストリームの処理における適応性と計算効率の課題に対処する。さらに、StreamPromptは、プロンプト学習の効率を高めるメカニズムであるPrompt Attunementを導入した。視覚変換器からの注意層を活用し、それらの出力をゲートユニットとソフトに結合することにより、Prompt Attunementは最小の計算資源でプロンプトを洗練する。総合的な評価では、StreamPromptは最先端よりも優れたパフォーマンスを示し、精度の大幅な向上とトレーニング時間の削減を達成した。これらの結果はStreamPromptの有効性と効率を裏付け、SLの進化する要求に対するスケーラブルで効果的なソリューションとしての可能性を確立した。私たちのコードはhttps://github.com/intellistream/Efficient-Stream-Learningで公開されています。

Stream Learning (SL) requires models to rapidly adapt to continuous data streams, setting it apart from traditional Continual Learning (CL). Recent SL methods emphasize efficiency by selecting data subsets for training, but they often struggle due to their reliance on static, rule-based selection algorithms that cannot effectively adapt to the changing importance of data. In this work, we introduce StreamPrompt, a method that enhances data selection through dynamic, learnable prompts. These dynamic prompts serve two purposes beyond guiding model inference: 1) optimizing data selection, and 2) guiding updates to the rehearsal buffer. This approach addresses the challenges of adaptability and computational efficiency in processing continuous data streams. Moreover, StreamPrompt introduces Prompt Attunement, a mechanism that enhances the efficiency of prompt learning. By leveraging attention layers from vision transformers and softly combining their outputs with a gate unit, Prompt Attunement refines prompts with minimal computational resources. Comprehensive evaluations demonstrate StreamPrompt's superior performance over the state of the art, with significant improvements in accuracy and reductions in training time. These results underscore the efficacy and efficiency of StreamPrompt, establishing its potential as a scalable and effective solution for the evolving demands of SL. Our code is available at https://github.com/intellistream/Efficient-Stream-Learning.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# MambaLRP: Selective State Space Sequence Modelの説明

MambaLRP: Explaining Selective State Space Sequence Models ( http://arxiv.org/abs/2406.07592v1 )

ライセンス: Link先を確認

Farnoush Rezaei Jafari, Grégoire Montavon, Klaus-Robert Müller, Oliver Eberle,

(参考訳) 選択的状態空間系列モデル(Mambaモデルと呼ばれる)を用いた最近のシーケンスモデリング手法は、関心が急速に高まっている。これらのモデルは、長いシーケンスを線形時間で効率的に処理でき、言語モデリングのような幅広いアプリケーションで急速に採用され、有望な性能を示している。現実のシナリオにおける信頼性の高い利用を促進するためには、透明性を高めることが重要である。私たちの研究は、説明可能性、特にLayer-wise Relevance Propagation(LRP)をMambaアーキテクチャにもたらすことで、この重要なギャップを埋める。関連性保存の公理に基づき、Mambaアーキテクチャのうち忠実でない説明を引き起こす特定の構成要素を特定する。この問題を解決するため,LRPフレームワーク内の新しいアルゴリズムであるMambaLRPを提案する。これは、それらの構成要素を通じた、より安定で信頼性の高い関連性伝播を保証する。提案手法は理論的に健全であり,多種多様なモデルやデータセットにわたって最先端の説明性能を達成する。さらに、MambaLRPは、Mambaアーキテクチャのより深い検査を促進し、様々なバイアスを明らかにし、それらの重要性を評価する。また、Mambaモデルの長距離能力に関するこれまでの推測の分析も可能にする。

Recent sequence modeling approaches using Selective State Space Sequence Models, referred to as Mamba models, have seen a surge of interest. These models allow efficient processing of long sequences in linear time and are rapidly being adopted in a wide range of applications such as language modeling, demonstrating promising performance. To foster their reliable use in real-world scenarios, it is crucial to augment their transparency. Our work bridges this critical gap by bringing explainability, particularly Layer-wise Relevance Propagation (LRP), to the Mamba architecture. Guided by the axiom of relevance conservation, we identify specific components in the Mamba architecture, which cause unfaithful explanations. To remedy this issue, we propose MambaLRP, a novel algorithm within the LRP framework, which ensures a more stable and reliable relevance propagation through these components. Our proposed method is theoretically sound and excels in achieving state-of-the-art explanation performance across a diverse range of models and datasets. Moreover, MambaLRP facilitates a deeper inspection of Mamba architectures, uncovering various biases and evaluating their significance. It also enables the analysis of previous speculations regarding the long-range capabilities of Mamba models.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# アクティブ推論を用いた持続可能な資源管理のモデリング

Modeling Sustainable Resource Management using Active Inference ( http://arxiv.org/abs/2406.07593v1 )

ライセンス: Link先を確認

Mahault Albarracin, Ines Hipolito, Maria Raffa, Paul Kinghorn,

(参考訳) 能動推論(active inference)は,生物および人工エージェントの適応行動と意思決定をシミュレートするのに役立つ。本研究は, 能動推論, ウェルビーイング, レジリエンス, 持続可能性の関係を探求した我々の先行研究に基づいて, 静的環境と動的環境の両方において, 持続可能な資源管理戦略を学習するエージェントの計算モデルを提案する。エージェントの行動は、環境ダイナミクスに関する信念のもとで、事前選好(prior preferences)によって表される自身のウェルビーイングを最適化することから生じる。静的な環境では、エージェントはニーズを満たすために資源を一貫して消費することを学ぶ。エージェントの行動に応じて資源が枯渇し補充される動的な環境では、エージェントは行動を適応させ、即時的なニーズと長期的な資源の可用性とのバランスをとる。これは、環境条件の変化に直面した場合に、能動推論が持続可能で回復力のある行動を生み出しうることを示す。我々は,モデルの意味,その限界,さらに複雑なエージェントと環境の相互作用を統合するための今後の方向性について論じる。我々の研究は、持続可能な行動を理解し形成するうえでの能動推論の可能性を強調している。

Active inference helps us simulate adaptive behavior and decision-making in biological and artificial agents. Building on our previous work exploring the relationship between active inference, well-being, resilience, and sustainability, we present a computational model of an agent learning sustainable resource management strategies in both static and dynamic environments. The agent's behavior emerges from optimizing its own well-being, represented by prior preferences, subject to beliefs about environmental dynamics. In a static environment, the agent learns to consistently consume resources to satisfy its needs. In a dynamic environment where resources deplete and replenish based on the agent's actions, the agent adapts its behavior to balance immediate needs with long-term resource availability. This demonstrates how active inference can give rise to sustainable and resilient behaviors in the face of changing environmental conditions. We discuss the implications of our model, its limitations, and suggest future directions for integrating more complex agent-environment interactions. Our work highlights active inference's potential for understanding and shaping sustainable behaviors.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# MLLMGuard:マルチモーダル大言語モデルのための多次元安全評価スイート

MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models ( http://arxiv.org/abs/2406.07594v1 )

ライセンス: Link先を確認

Tianle Gu, Zeyang Zhou, Kexin Huang, Dandan Liang, Yixu Wang, Haiquan Zhao, Yuanqi Yao, Xingge Qiao, Keqing Wang, Yujiu Yang, Yan Teng, Yu Qiao, Yingchun Wang,

(参考訳) LLM(Large Language Models)の顕著な進歩に支えられ、MLLM(Multimodal Large Language Models)は多様なタスクにおいて印象的な能力を示す。しかし、MLLMの実践的な応用シナリオは複雑であり、悪意のある命令に晒される可能性があるため、安全性のリスクが生じる。現在のベンチマークには一定の安全性の考慮が含まれているが、包括的なカバレッジが欠如しており、必要な厳密さと堅牢性を示せていないことが多い。例えば、評価者と評価対象モデルの両方にGPT-4Vを用いるという一般的な実践は、GPT-4Vが自身の応答に有利なバイアスを示す傾向があるため、信頼性に欠ける。本稿では,MLLMの多次元安全性評価スイートであるMLLMGuardを提示する。これは、バイリンガルの画像・テキスト評価データセット、推論ユーティリティ、軽量評価器を含む。 MLLMGuardの評価は、2つの言語(英語と中国語)と5つの重要な安全次元(Privacy, Bias, Toxicity, Truthfulness, Legality)を包括的にカバーしており、各次元には対応する豊富なサブタスクがある。これらの次元に着目して、評価データセットは主にソーシャルメディアなどのプラットフォームから収集されており、テキストベースおよび画像ベースのレッドチーミング技術と、人間の専門家による入念なアノテーションを統合している。これにより、オープンソースのデータセットを使用する際のデータ漏洩による不正確な評価を防ぎ、ベンチマークの品質と難度を確保できる。さらに、完全に自動化された軽量評価器であるGuardRankを開発し、GPT-4よりも大幅に高い評価精度を実現している。 13種類の先進モデルに対する評価結果は,MLLMが安全かつ責任あるものと見なされるまでには,まだかなりの道のりがあることを示している。

Powered by remarkable advancements in Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities in manifold tasks. However, the practical application scenarios of MLLMs are intricate, exposing them to potential malicious instructions and thereby posing safety risks. While current benchmarks do incorporate certain safety considerations, they often lack comprehensive coverage and fail to exhibit the necessary rigor and robustness. For instance, the common practice of employing GPT-4V as both the evaluator and a model to be evaluated lacks credibility, as it tends to exhibit a bias toward its own responses. In this paper, we present MLLMGuard, a multidimensional safety evaluation suite for MLLMs, including a bilingual image-text evaluation dataset, inference utilities, and a lightweight evaluator. MLLMGuard's assessment comprehensively covers two languages (English and Chinese) and five important safety dimensions (Privacy, Bias, Toxicity, Truthfulness, and Legality), each with corresponding rich subtasks. Focusing on these dimensions, our evaluation dataset is primarily sourced from platforms such as social media, and it integrates text-based and image-based red teaming techniques with meticulous annotation by human experts. This can prevent inaccurate evaluation caused by data leakage when using open-source datasets and ensures the quality and challenging nature of our benchmark. Additionally, a fully automated lightweight evaluator termed GuardRank is developed, which achieves significantly higher evaluation accuracy than GPT-4. Our evaluation results across 13 advanced models indicate that MLLMs still have a substantial journey ahead before they can be considered safe and responsible.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# VulDetectBench: 大規模言語モデルによる脆弱性検出の深い機能評価

VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models ( http://arxiv.org/abs/2406.07595v1 )

ライセンス: Link先を確認

Yu Liu, Mingxin Yang, Yu Xie, Ping Chen, Xiaojin Zhang, Wei Chen,

(参考訳) 大規模言語モデル(LLM)は、大量のプログラムコードを含むトレーニングコーパスを持ち、モデルのコード理解と生成能力を大幅に向上させている。しかし、コードに関するより具体的なタスクであるプログラムの脆弱性検出や、このより専門的なシナリオにおけるLLMの性能評価に関する、健全で包括的な研究はいまだに不足している。脆弱性分析における一般的な課題に対処するため,本研究では,LLMの脆弱性検出能力を評価するために特別に設計された新たなベンチマークであるVulDetectBenchを紹介する。このベンチマークは、LLMが脆弱性を特定し、分類し、その位置を特定する能力を、難易度が上がっていく5つのタスクを通じて総合的に評価する。我々は17モデル(オープンソースとクローズドソースの両方)の性能を評価し、既存のモデルは脆弱性の識別と分類に関連するタスクでは80%以上の精度を達成できる一方で、より詳細な特定の脆弱性分析タスクでは精度が30%未満にとどまり、プロの脆弱性マイニングに有用な補助情報を提供することは難しいことを見出した。本ベンチマークは,脆弱性検出という特定のタスクにおいて,様々なLLMの能力を異なるレベルで効果的に評価し,コードセキュリティという重要領域における今後の研究と改善の基盤を提供する。 VulDetectBenchは https://github.com/Sweetaroo/VulDetectBench で公開されている。

Large Language Models (LLMs) have training corpora containing large amounts of program code, greatly improving the models' code comprehension and generation capabilities. However, sound comprehensive research on detecting program vulnerabilities, a more specific task related to code, and evaluating the performance of LLMs in this more specialized scenario is still lacking. To address common challenges in vulnerability analysis, our study introduces a new benchmark, VulDetectBench, specifically designed to assess the vulnerability detection capabilities of LLMs. The benchmark comprehensively evaluates LLMs' ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty. We evaluate the performance of 17 models (both open- and closed-source) and find that while existing models can achieve over 80% accuracy on tasks related to vulnerability identification and classification, they still fall short on specific, more detailed vulnerability analysis tasks, with less than 30% accuracy, making it difficult to provide valuable auxiliary information for professional vulnerability mining. Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security. VulDetectBench is publicly available at https://github.com/Sweetaroo/VulDetectBench.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# 最小フレーム平均化による同変性: より多くの対称性と効率のために

Equivariance via Minimal Frame Averaging for More Symmetries and Efficiency ( http://arxiv.org/abs/2406.07598v1 )

ライセンス: Link先を確認

Yuchao Lin, Jacob Helwig, Shurui Gui, Shuiwang Ji,

(参考訳) フレーム平均化による機械学習システムにおける同変性の実現を検討する。現在のフレーム平均化法は、大きなフレーム上でのコストのかかる和を伴うか、近似的な同変性しか得られないサンプリングベースのアプローチに依存している。本稿では,厳密に同変であり,かつ最小であることが証明可能なフレームを構築するための数学的枠組みである最小フレーム平均化(MFA, Minimal Frame Averaging)を提案する。 MFAの一般的な基盤により、時空の対称性を記述するローレンツ群や複素数値領域のユニタリ群など、これまで考えられていたよりも多くの群にフレーム平均化を拡張できる。実験結果は,MFAによる対称性の符号化が,$n$体シミュレーション,コライダー物理におけるトップタギング,緩和エネルギー予測など,多種多様なタスクにわたって効率的かつ効果的であることを示している。私たちのコードは https://github.com/divelab/MFA で公開されています。

We consider achieving equivariance in machine learning systems via frame averaging. Current frame averaging methods involve a costly sum over large frames or rely on sampling-based approaches that only yield approximate equivariance. Here, we propose Minimal Frame Averaging (MFA), a mathematical framework for constructing provably minimal frames that are exactly equivariant. The general foundations of MFA also allow us to extend frame averaging to more groups than previously considered, including the Lorentz group for describing symmetries in space-time, and the unitary group for complex-valued domains. Results demonstrate the efficiency and effectiveness of encoding symmetries via MFA across a diverse range of tasks, including $n$-body simulation, top tagging in collider physics, and relaxed energy prediction. Our code is available at https://github.com/divelab/MFA.
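
To make the idea of obtaining equivariance by averaging concrete, the sketch below symmetrizes an arbitrary map over the four planar rotations by multiples of 90 degrees. It is deliberately the plain sum over the whole group, not the minimal-frame construction of MFA; the group choice, the map `arbitrary_map`, and all function names are assumptions made for illustration only.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def rotate90(p: Point, k: int) -> Point:
    """Rotate p counter-clockwise by k quarter turns (exact, no trigonometry)."""
    x, y = p
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)


def arbitrary_map(p: Point) -> Point:
    """A map with no built-in rotational symmetry (hypothetical example)."""
    x, y = p
    return (x * x + y, 3.0 * x)


def c4_averaged(f: Callable[[Point], Point], p: Point) -> Point:
    """Return (1/|G|) * sum over g in G of g . f(g^{-1} . p), with G the four quarter turns."""
    sum_x = 0.0
    sum_y = 0.0
    for k in range(4):
        out = rotate90(f(rotate90(p, -k)), k)
        sum_x += out[0]
        sum_y += out[1]
    return (sum_x / 4.0, sum_y / 4.0)


p = (1.0, 2.0)
# Does rotating the input commute with the map? First for the raw map, then for its average.
print(arbitrary_map(rotate90(p, 1)) == rotate90(arbitrary_map(p), 1))
print(c4_averaged(arbitrary_map, rotate90(p, 1)) == rotate90(c4_averaged(arbitrary_map, p), 1))
```

The averaged map commutes with every rotation in the group because substituting g = h g' in the sum factors out h. The cost is one evaluation of the map per group element, which is the expense that motivates smaller frames.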

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# CTIBench:サイバー脅威インテリジェンスにおけるLLMの評価ベンチマーク

CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence ( http://arxiv.org/abs/2406.07599v1 )

ライセンス: Link先を確認

Md Tanvirul Alam, Dipkamal Bhushl, Le Nguyen, Nidhi Rastogi,

(参考訳) サイバー脅威インテリジェンス(CTI)は、今日のサイバーセキュリティの状況において極めて重要であり、進化を続けるサイバー脅威を理解し、緩和するための重要な洞察を提供する。近年のLarge Language Models (LLM) の台頭は、この領域における可能性を示しているが、信頼性、正確性、幻覚に関する懸念は残っている。既存のベンチマークはLLMの一般的な評価を提供するが、CTI固有のタスクの実践的および応用的な側面に対処するベンチマークは存在しない。このギャップを埋めるために、我々はCTIアプリケーションにおけるLLMの性能を評価するために設計されたベンチマークであるCTIBenchを紹介する。 CTIBenchには、サイバー脅威の状況においてLLMが獲得した知識を評価することに焦点を当てた複数のデータセットが含まれている。これらのタスクに対するいくつかの最先端モデルの評価は、CTIコンテキストにおけるその強みと弱みに関する洞察を与え、CTIにおけるLLM能力のより深い理解に寄与する。

Cyber threat intelligence (CTI) is crucial in today's cybersecurity landscape, providing essential insights to understand and mitigate the ever-evolving cyber threats. The recent rise of Large Language Models (LLMs) has shown potential in this domain, but concerns about their reliability, accuracy, and hallucinations persist. While existing benchmarks provide general evaluations of LLMs, there are no benchmarks that address the practical and applied aspects of CTI-specific tasks. To bridge this gap, we introduce CTIBench, a benchmark designed to assess LLMs' performance in CTI applications. CTIBench includes multiple datasets focused on evaluating knowledge acquired by LLMs in the cyber-threat landscape. Our evaluation of several state-of-the-art models on these tasks provides insights into their strengths and weaknesses in CTI contexts, contributing to a better understanding of LLM capabilities in CTI.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# 中間回路測定とフィードフォワードのための読み出し誤差緩和

Readout Error Mitigation for Mid-Circuit Measurements and Feedforward ( http://arxiv.org/abs/2406.07611v1 )

ライセンス: Link先を確認

Jin Ming Koh, Dax Enshan Koh, Jayne Thompson,

(参考訳) 現在の量子コンピューティングプラットフォームは読み出し誤差の影響を受けており、デバイスが誤った測定結果を報告する。中間回路測定とフィードフォワードを含む回路では、読み出しノイズにより、誤った条件付き量子演算がショット単位で適用されうる。後処理で作用する終端測定向けの標準的な読み出し誤差緩和法は、この文脈では十分ではない。本稿では,任意の層数の中間回路測定とフィードフォワードを含む回路上の期待値に対して,回路深さと2量子ビットゲート数の追加コストなしに読み出し誤差を緩和する一般的な手法を提案する。このプロトコルは、誤差チャネルを対称化するためのゲートツワーリングの一形態と、フィードフォワードデータの確率的ビット反転を用いて、量子軌道のアンサンブル上で平均化を行う。緩和された推定量は不偏であり、測定の総数を$m$、測定1回あたりの特性読み出し誤り率を$r$として、サンプリングオーバーヘッドは${\sim} 1 / (1 - 2 r)^m$である。動的量子ビットリセット,浅い深さのGHZ状態準備,多段量子状態テレポーテーションなど,実用上の興味のあるフィードフォワード回路のいくつかの例に対して,超伝導量子プロセッサ上で誤差を最大${\sim} 60\%$削減し,本手法の有効性を実証する。

Current-day quantum computing platforms are subject to readout errors, in which faulty measurement outcomes are reported by the device. On circuits with mid-circuit measurements and feedforward, readout noise can cause incorrect conditional quantum operations to be applied on a per-shot basis. Standard readout error mitigation methods for terminal measurements which act in post-processing do not suffice in this context. Here we present a general method for readout error mitigation for expectation values on circuits containing an arbitrary number of layers of mid-circuit measurements and feedforward, at zero circuit depth and two-qubit gate count cost. The protocol uses a form of gate twirling for symmetrization of the error channels and probabilistic bit-flips in feedforward data to average over an ensemble of quantum trajectories. The mitigated estimator is unbiased and has a sampling overhead of ${\sim} 1 / (1 - 2 r)^m$ for $m$ total measurements and characteristic readout error rate $r$ per measurement. We demonstrate the effectiveness of our method, obtaining up to a ${\sim} 60\%$ reduction in error on superconducting quantum processors for several examples of feedforward circuits of practical interest, including dynamic qubit resets, shallow-depth GHZ state preparation, and multi-stage quantum state teleportation.
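
The quoted scaling ${\sim} 1 / (1 - 2 r)^m$ is easy to tabulate. The helper below simply evaluates that expression for a few values of $m$; the function name and the example error rate of 2% are illustrative assumptions, and the abstract gives the scaling only up to the "${\sim}$" it states.

```python
def sampling_overhead(r: float, m: int) -> float:
    """Evaluate the scaling factor 1 / (1 - 2r)^m quoted in the abstract.

    r: characteristic readout error rate per measurement (must satisfy 0 <= r < 0.5)
    m: total number of measurements (non-negative)
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    if m < 0:
        raise ValueError("m must be non-negative")
    return 1.0 / (1.0 - 2.0 * r) ** m


# Growth of the factor with the number of measurements at an assumed error rate of 2%.
for m in (1, 5, 10, 20):
    print(m, sampling_overhead(0.02, m))
```

The factor grows exponentially in $m$ for fixed $r$, so lowering the per-measurement error rate matters most for circuits with many mid-circuit measurements.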

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# 散逸系における正規およびカオス古典力学の量子的区別の破壊

Breakdown of the quantum distinction of regular and chaotic classical dynamics in dissipative systems ( http://arxiv.org/abs/2406.07616v1 )

ライセンス: Link先を確認

David Villaseñor, Lea F. Santos, Pablo Barberis-Blostein,

(参考訳) Grobe-Haake-Sommers(GHS)予想は、Bohigas-Giannoni-Schmit予想を散逸系に一般化し、古典的にカオス的な系と、Ginibreアンサンブルによって予測される準位反発を示す量子スペクトルとを結びつける。ここでは、実験的に関心のあるスピン・ボソンモデルである開放Dickeモデルに対して、GHS予想が成り立たないことを示す。驚くべきことに、開放量子モデルがGinibre準位統計を示す場合でも、古典的極限におけるカオス構造の証拠が必ずしも見つかるとは限らない。この結果は、GHS予想の普遍性に疑問を投げかけ、開放量子系におけるスペクトル相関の源は何かという問いを提起する。

The Grobe-Haake-Sommers (GHS) conjecture generalizes the Bohigas-Giannoni-Schmit conjecture to dissipative systems, connecting classically chaotic systems with quantum spectra that exhibit level repulsion as predicted by Ginibre ensembles. Here, we show that the GHS conjecture does not hold for the open Dicke model, which is a spin-boson model of experimental interest. Surprisingly, where the open quantum model shows Ginibre level statistics, we do not always find evidence of chaotic structures in the classical limit. This result challenges the universality of the GHS conjecture and raises the question of what is the source of spectral correlations in open quantum systems.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# 2次元サブ波長アレイにおける不純物を用いた協調センシング

Cooperative Sensing with Impurities in a Two-Dimensional Subwavelength Array ( http://arxiv.org/abs/2406.07619v1 )

ライセンス: Link先を確認

Oliver August Dall'Alba Sandberg, Stefan Ostermann, Susanne F. Yelin,

(参考訳) 本稿では,2次元サブ波長原子配列に不純物として埋め込まれた2つの散逸結合した遠方原子をベースとした多用途量子センシングプロトコルを提案する。アレイはエミッタ光の導波路として機能し、より効率的な人口移動を可能にする協調的な拡張を生み出す。不純物原子の1つの集団を監視することにより、エミッタの共鳴周波数の周波数シフトを検出することができる。我々は、達成可能な感度と様々なシステムパラメータへの依存性を解析的に推定する。提案プロトコルは, 様々な環境要因や摂動に対して堅牢であり, 実環境における適用性を高めている。

We propose a versatile quantum sensing protocol based on two dissipatively coupled distant atoms embedded as impurities in a two-dimensional sub-wavelength atomic array. The array acts as a waveguide for the emitter light, creating cooperative enhancement that allows for more efficient population transfer. By monitoring the population of one of the impurity atoms, it is possible to detect frequency shifts in the emitters' resonance frequencies. We analytically estimate achievable sensitivities as well as the dependence on various system parameters. The proposed protocol is robust against various environmental factors and perturbations, which enhances its applicability in real-world scenarios.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# 高次元非エルミート系における束縛状態の形状の調整

Tailoring Bound State Geometry in High-Dimensional Non-Hermitian Systems ( http://arxiv.org/abs/2406.07626v1 )

ライセンス: Link先を確認

Ao Yang, Zixi Fang, Kai Zhang, Chen Fang,

(参考訳) 非エルミート表皮効果(NHSE)は、その非相反性のため、不純物束縛状態の出現に対する障壁を生じると一般に考えられている。本稿では,2次元以上では,幾何依存の表皮効果の存在がこの障壁を取り除き,この種の非エルミート系では無限小の不純物ポテンシャルでさえ束縛状態を閉じ込められることを見出した。ブロッホ鞍点の周りの束縛状態を調べることで、非エルミート性が束縛状態の等方性を乱し、凹状のダンベル型束縛状態が生じることが分かる。我々の研究は、高次元非エルミート系における束縛状態の凹性と凸性の間の幾何学的遷移を明らかにする。

It is generally believed that the non-Hermitian skin effect (NHSE), due to its non-reciprocal nature, creates barriers to the appearance of impurity bound states. In this paper, we find that in two and higher dimensions, the presence of the geometry-dependent skin effect eliminates this barrier such that even an infinitesimal impurity potential can confine bound states in this type of non-Hermitian system. By examining bound states around Bloch saddle points, we find that non-Hermiticity can disrupt the isotropy of bound states, resulting in concave dumbbell-shaped bound states. Our work reveals a geometric transition of bound states between concavity and convexity in high-dimensional non-Hermitian systems.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# QuReed

QuReed ( http://arxiv.org/abs/2406.07638v1 )

ライセンス: Link先を確認

Simon Sekavčnik, Kareem H. El-Safty, Janis Nötzel,

(参考訳) 量子理論,実験コミュニティ,エンジニアリングの間のギャップを埋めるために設計された,オープンソースの量子シミュレーションフレームワークQuReedを提示する。量子力学が成熟し、量子コンピューティング以外にも大きな可能性を秘めているため、物理的に正確なシミュレーションの必要性が重要になっている。 QuReedは査読済みのシミュレーションモデルを提供し、量子通信プロトコルやアプリケーションを探索するための信頼性の高いツールを研究者やエンジニアに提供する。理論と実験の相互交流を促進することで、QuReedは分野の進歩を加速し、通信業界における量子力学の変革的な力を解き放つことを目指している。ユーザフレンドリなPythonインターフェースと包括的なドキュメントにより、幅広いアクセシビリティとユーザビリティが確保され、QuReedは量子通信技術を発展させるための貴重なリソースとなる。

We present QuReed, an open-source quantum simulation framework designed to bridge gaps between quantum theory, experimental community and engineering. With Quantum Mechanics maturing and holding significant potential beyond quantum computing, the need for physically accurate simulations becomes critical. QuReed offers peer-reviewed simulation models, providing researchers and engineers with reliable tools for exploring quantum communication protocols and applications. By facilitating cross-talk between theory and experiments, QuReed aims to accelerate progress in the field and unlock the transformative power of quantum mechanics in the communications industry. Its user-friendly Python interface and comprehensive documentation ensure widespread accessibility and usability, making QuReed a valuable resource for advancing quantum communication technologies.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# 埋め込みモデルが他のモデルよりも有望なのはどのような場合か?

When is an Embedding Model More Promising than Another? ( http://arxiv.org/abs/2406.07640v1 )

ライセンス: Link先を確認

Maxime Darrin, Philippe Formont, Ismail Ben Ayed, Jackie CK Cheung, Pablo Piantanida,

(参考訳) 埋め込みモデル(embedder)は機械学習において中心的な役割を担い、任意のオブジェクトを数値表現に射影し、その表現は様々な下流タスクの実行に活用できる。埋め込みモデルの評価は、主に比較のための標準化された枠組みがないために、下流タスクを利用したドメイン固有の経験的アプローチに依存するのが一般的である。しかし、これらの評価を行うために十分に大規模で代表的なデータセットを取得することは必ずしも可能ではなく、法外に高価で時間がかかる場合がある。本稿では,埋め込みモデルを評価するための統一的なアプローチを提案する。まず, 十分性(sufficiency)と情報性(informativeness)の概念に基づいて,埋め込みモデルを比較するための理論的基礎を確立する。次に、これらの概念を活用して、計算上扱いやすい比較基準(情報十分性, information sufficiency)を考案し、タスクに依存しない自己教師ありのランク付け手順を導出する。提案手法が,自然言語処理と分子生物学の両方において,様々な下流タスクを支援する埋め込みモデルの能力と密接に一致することを実験的に実証した。これは、実践者がモデルの試行に優先順位を付けるための貴重なツールを提供する。

Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately large and representative datasets for conducting these assessments is not always viable and can prove to be prohibitively expensive and time-consuming. In this paper, we present a unified approach to evaluate embedders. First, we establish theoretical foundations for comparing embedding models, drawing upon the concepts of sufficiency and informativeness. We then leverage these concepts to devise a tractable comparison criterion (information sufficiency), leading to a task-agnostic and self-supervised ranking procedure. We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials.

翻訳日:2024-06-13 21:35:30 公開日:2024-06-11

# ノード埋め込みのための人間が理解できる説明の生成

Generating Human Understandable Explanations for Node Embeddings ( http://arxiv.org/abs/2406.07642v1 )

ライセンス: Link先を確認

Zohair Shafi, Ayan Chatterjee, Tina Eliassi-Rad,

(参考訳) ノード埋め込みアルゴリズムはグラフ内のノードの低次元潜在表現を生成する。これらの埋め込みは、ノード分類やリンク予測といった下流タスクによく使用される。本稿では,次の2つの問いについて検討する。 (Q1) 各埋め込み次元を人間が理解できるグラフ特徴(例えば,次数,クラスタリング係数,PageRank)で説明できるか。 (Q2) 既存のノード埋め込みアルゴリズムをどう修正すれば、人間が理解できるグラフ特徴で容易に説明できる埋め込みを生成できるか。 Q1への答えはイエスであることを見出し、Q2に答えるためにXM(eXplain eMbeddingの略)と呼ばれる新しいフレームワークを導入する。 XMの重要な側面は、生成された説明の核ノルムを最小化することである。核ノルムを最小化することにより、生成された説明のエントロピーの下界を最小化することを示す。我々は,XMを実世界の様々なグラフ上でテストし,XMが既存のノード埋め込み手法の性能を保つだけでなく,その説明可能性も向上させることを示す。

Node embedding algorithms produce low-dimensional latent representations of nodes in a graph. These embeddings are often used for downstream tasks, such as node classification and link prediction. In this paper, we investigate the following two questions: (Q1) Can we explain each embedding dimension with human-understandable graph features (e.g., degree, clustering coefficient, and PageRank)? (Q2) How can we modify existing node embedding algorithms to produce embeddings that can be easily explained by human-understandable graph features? We find that the answer to Q1 is yes and introduce a new framework called XM (short for eXplain eMbedding) to answer Q2. A key aspect of XM involves minimizing the nuclear norm of the generated explanations. We show that by minimizing the nuclear norm, we minimize the lower bound on the entropy of the generated explanations. We test XM on a variety of real-world graphs and show that XM not only preserves the performance of existing node embedding methods, but also enhances their explainability.

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# SSNVC:暗黙的な時間情報を用いた単一ストリームニューラルビデオ圧縮

SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information ( http://arxiv.org/abs/2406.07645v1 )

ライセンス: Link先を確認

Feng Wang, Haihang Ruan, Zhihuang Xie, Ronggang Wang, Xiangyu Yue,

(参考訳) 近年、ニューラルビデオ圧縮(NVC)技術は顕著な性能を達成しており、従来の最良の非可逆ビデオコーデックをも上回っている。しかし、既存のNVC手法の多くは、正確なコンテキスト特徴を生成するために、動きベクトル(Motion Vector, MV)の送信に大きく依存しており、これには次の欠点がある。 1) MVの圧縮と送信には専用のMVエンコーダとデコーダが必要であり,モジュールが冗長になる。 2) MVエンコーダ・デコーダが存在するため,訓練戦略が複雑になる。本稿では,複雑なMVエンコーダ・デコーダ構造を取り除き,一段階のトレーニング戦略を用いる,新しいSingle Stream NVCフレームワーク(SSNVC)を提案する。 SSNVCは、現在のエントロピーモデルに以前のエントロピーモデルの特徴を追加し、直前の2フレームを使用してデコーダ側で予測動き情報を生成することで、時間情報を暗黙的に利用する。さらに,フレーム生成器を強化し,より高品質な再構成フレームを生成する。実験により、SSNVCは複数のベンチマークで最先端の性能を達成でき、圧縮プロセスとトレーニングプロセスを大幅に単純化できることが示された。

Recently, Neural Video Compression (NVC) techniques have achieved remarkable performance, even surpassing the best traditional lossy video codec. However, most existing NVC methods rely heavily on transmitting Motion Vectors (MVs) to generate accurate contextual features, which has the following drawbacks. (1) Compressing and transmitting MVs requires a specialized MV encoder and decoder, which makes the modules redundant. (2) Due to the existence of the MV Encoder-Decoder, the training strategy is complex. In this paper, we present a novel Single Stream NVC framework (SSNVC), which removes the complex MV Encoder-Decoder structure and uses a one-stage training strategy. SSNVC implicitly uses temporal information by adding the previous entropy model feature to the current entropy model and using the previous two frames to generate predicted motion information at the decoder side. Besides, we enhance the frame generator to generate higher-quality reconstructed frames. Experiments demonstrate that SSNVC can achieve state-of-the-art performance on multiple benchmarks, and can greatly simplify the compression process as well as the training process.

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# 音声強調のための事前学習特徴誘導拡散モデル

Pre-training Feature Guided Diffusion Model for Speech Enhancement ( http://arxiv.org/abs/2406.07646v1 )

ライセンス: Link先を確認

Yiyuan Yang, Niki Trigoni, Andrew Markham,

(参考訳) 音声強調は、雑音の多い環境下での音声の明瞭さと了解度を大幅に改善し、コミュニケーションと聴取体験を向上させる。本稿では,既存の識別モデルと生成モデルの限界に対処する,効率的な音声強調に適した,新しい事前学習特徴誘導拡散モデルを提案する。スペクトル特徴を変分オートエンコーダ(VAE)に統合し, 逆過程の誘導に事前学習した特徴を活用し, さらにサンプリングステップを削減するために決定論的離散積分法(DDIM)を利用することで, 効率と音声強調品質を向上させる。異なるSNRを持つ2つの公開データセットで最先端の結果を示し、我々のモデルは効率とロバスト性において他のベースラインを上回る。提案手法は, 性能を最適化するだけでなく, 計算要求を増大させることなく, 実用的な展開能力を向上させる。

Speech enhancement significantly improves the clarity and intelligibility of speech in noisy environments, improving communication and listening experiences. In this paper, we introduce a novel pretraining feature-guided diffusion model tailored for efficient speech enhancement, addressing the limitations of existing discriminative and generative models. By integrating spectral features into a variational autoencoder (VAE) and leveraging pre-trained features for guidance during the reverse process, coupled with the utilization of the deterministic discrete integration method (DDIM) to streamline sampling steps, our model improves efficiency and speech enhancement quality. Demonstrating state-of-the-art results on two public datasets with different SNRs, our model outshines other baselines in efficiency and robustness. The proposed method not only optimizes performance but also enhances practical deployment capabilities, without increasing computational demands.

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# FP-Inconsistent:ブラウザフィンガープリントの不整合を用いた回避ボットの検出

FP-Inconsistent: Detecting Evasive Bots using Browser Fingerprint Inconsistencies ( http://arxiv.org/abs/2406.07647v1 )

ライセンス: Link先を確認

Hari Venugopalan, Shaoor Munir, Shuaib Ahmed, Tangbaihe Wang, Samuel T. King, Zubair Shafiq,

(参考訳) ブラウザフィンガープリンティングがボット検出にますます使われるようになるにつれ、ボットは回避のためにフィンガープリントを改変し始めている。本研究では,回避ボットに関する初の大規模な評価を行い,フィンガープリントの改変がボットの検出回避に役立つかどうか,またどのように役立つかを調査する。回避ボットを体系的に調査するために,2つのアンチボットサービス(DataDomeとBotD)を組み込んだハニーサイトをデプロイし、「現実的で検出不可能なトラフィック」を販売すると称する20種類のボットサービスからボットトラフィックを呼び込んだ。ハニーサイトに対する20のボットサービスからの50万件のリクエスト全体で、DataDomeに対する平均回避率は52.93%、BotDに対する回避率は44.56%であった。各アンチボットサービスを個別に回避するボットサービスと、両方を回避するボットサービスのフィンガープリント属性を比較すると、ボットサービスが実際に回避のために異なるブラウザフィンガープリント属性を改変していることが分かる。さらに,本分析により,回避ボットにおいてフィンガープリント属性の不整合が存在することが明らかになった。回避ボットはフィンガープリント属性の整合性を確保するのが難しいと考えられるため, 空間的な不整合(1つのブラウザフィンガープリント内の2つの属性の間)と時間的な不整合(2つの異なる時点における単一の属性)を検出するルールを発見するデータ駆動型アプローチを提案する。これらのルールは、アンチボットサービスによって容易にデプロイでき、DataDomeとBotDに対する回避ボットの回避率をそれぞれ48.11%、44.95%削減する。

As browser fingerprinting is increasingly being used for bot detection, bots have started altering their fingerprints for evasion. We conduct the first large-scale evaluation of evasive bots to investigate whether and how altering fingerprints helps bots evade detection. To systematically investigate evasive bots, we deploy a honey site incorporating two anti-bot services (DataDome and BotD) and solicit bot traffic from 20 different bot services that purport to sell "realistic and undetectable traffic". Across half a million requests from 20 different bot services on our honey site, we find an average evasion rate of 52.93% against DataDome and 44.56% evasion rate against BotD. Our comparison of fingerprint attributes from bot services that evade each anti-bot service individually as well as bot services that evade both shows that bot services indeed alter different browser fingerprint attributes for evasion. Further, our analysis reveals the presence of inconsistent fingerprint attributes in evasive bots. Given evasive bots seem to have difficulty in ensuring consistency in their fingerprint attributes, we propose a data-driven approach to discover rules to detect such inconsistencies across space (two attributes in a given browser fingerprint) and time (a single attribute at two different points in time). These rules, which can be readily deployed by anti-bot services, reduce the evasion rate of evasive bots against DataDome and BotD by 48.11% and 44.95% respectively.
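
The two kinds of inconsistency named above, across space and across time, can be illustrated with a toy checker. This is a sketch under stated assumptions, not the paper's method: the paper discovers its rules from data, whereas the rules below are written by hand, and the attribute names (`user_agent`, `platform`, `hardware_concurrency`) and the specific checks are hypothetical examples.

```python
from typing import Dict, List, Sequence

Fingerprint = Dict[str, str]


def spatial_inconsistencies(fp: Fingerprint) -> List[str]:
    """Flag pairs of attributes within ONE fingerprint that contradict each other."""
    issues: List[str] = []
    user_agent = fp.get("user_agent", "")
    platform = fp.get("platform", "")
    # Hand-written example rule: an iPhone user agent paired with a non-iPhone platform.
    if "iPhone" in user_agent and platform != "iPhone":
        issues.append("user_agent claims iPhone but platform is " + repr(platform))
    # Hand-written example rule: a Windows user agent paired with a non-Windows platform.
    if "Windows" in user_agent and not platform.startswith("Win"):
        issues.append("user_agent claims Windows but platform is " + repr(platform))
    return issues


def temporal_inconsistencies(
    earlier: Fingerprint,
    later: Fingerprint,
    stable_keys: Sequence[str] = ("platform", "hardware_concurrency"),
) -> List[str]:
    """Flag attributes assumed stable for a device that changed between two observations."""
    return [
        key
        for key in stable_keys
        if key in earlier and key in later and earlier[key] != later[key]
    ]


suspicious = {
    "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)",
    "platform": "Win32",
}
print(spatial_inconsistencies(suspicious))
```

A real deployment would need rules derived from observed traffic and a way to link two requests to the same client before applying the temporal check; both are outside this sketch.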

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# M-LRM:多視点大規模再構成モデル

M-LRM: Multi-view Large Reconstruction Model ( http://arxiv.org/abs/2406.07648v1 )

ライセンス: Link先を確認

Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Xiaowei Chi, Xingqun Qi, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo,

(参考訳) 大規模再構成モデル(LRM)の最近の進歩は印象的な結果を示しているが、入力を単一の画像から複数の画像に拡張すると、非効率性、幾何とテクスチャの品質の低さ、および予想よりも遅い収束速度を示す。これは、LRMが入力画像間の強い3Dコヒーレンスを無視して、3D再構成を素朴な画像から3Dへの変換問題として定式化していることに起因する。本稿では,3Dを意識した方法で多視点画像から高品質な3D形状を効率的に再構成するよう設計されたM-LRM(Multi-view Large Reconstruction Model)を提案する。具体的には、M-LRMが入力画像から情報を正確にクエリできるようにする、多視点整合なクロスアテンション方式を導入する。さらに、入力された多視点画像の3D事前情報を用いて、tri-planeトークンを初期化する。 LRMと比較すると、提案するM-LRMは解像度$128 \times 128$のtri-plane NeRFを生成し、高忠実度の3D形状を生成できる。実験により,本モデルがLRMよりも大幅な性能向上と高速な訓練収束を達成することが実証された。プロジェクトページ:https://murphylmf.github.io/M-LRM/

Despite recent advancements in the Large Reconstruction Model (LRM) demonstrating impressive results, when its input is extended from a single image to multiple images, it exhibits inefficiencies, subpar geometric and texture quality, and slower convergence than expected. This is attributed to the fact that LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the strong 3D coherence among the input images. In this paper, we propose a Multi-view Large Reconstruction Model (M-LRM) designed to efficiently reconstruct high-quality 3D shapes from multiple views in a 3D-aware manner. Specifically, we introduce a multi-view consistent cross-attention scheme to enable M-LRM to accurately query information from the input images. Moreover, we employ the 3D priors of the input multi-view images to initialize the tri-plane tokens. Compared to LRM, the proposed M-LRM can produce a tri-plane NeRF with $128 \times 128$ resolution and generate 3D shapes of high fidelity. Experimental studies demonstrate that our model achieves a significant performance gain and faster training convergence than LRM. Project page: https://murphylmf.github.io/M-LRM/

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# 逐次ノイズ測定による資源の回収

Recovery of resource through sequential noisy measurements ( http://arxiv.org/abs/2406.07652v1 )

ライセンス: Link先を確認

Sudipta Mondal, Pritam Halder, Amit Kumar Pal, Aditi Sen De,

(参考訳) 量子情報プロトコルに組み込まれたノイズのあるアンシャープ測定は、性能を阻害し、量子優位性を低下させる可能性がある。しかし、量子ネットワーク内のノード間の量子相関を完全に破壊する射影測定とは異なり、ノイズのある測定を逐次的に適用することで、量子情報処理タスクにおける測定装置のノイズの悪影響を軽減できることを示す。量子ネットワークにおいて、補助量子ビットに対するノイズのある測定によって、選択したノードにエンタングルメントを集中させる場合について、これを実証する。 3つ以上の量子ビットのクラスタを持つネットワークの場合、補助量子ビット上で最適なアンシャープ測定を逐次的に行うと、同じ補助量子ビット上での最適な射影測定によって得られるものと同等の、2つのノード間の局在化可能エンタングルメントが得られることを示す。さらに, 連続したノイズのある測定を用いる提案手法は, 特定の量子スキームの資源となる所望の状態の準備に利用できる可能性がある。また、シャープな測定に基づくプロトコルとは対照的に、アンシャープ測定によってエンタングルメントが集中する量子ビットに対して補助量子ビットがより大きな制御を持つと主張する。これは安全な量子通信に影響を与える可能性がある。

Noisy unsharp measurements incorporated in quantum information protocols may hinder performance, reducing the quantum advantage. However, we show that, unlike projective measurements which completely destroy quantum correlations between nodes in quantum networks, sequential applications of noisy measurements can mitigate the adverse impact of noise in the measurement device on quantum information processing tasks. We demonstrate this in the case of concentrating entanglement on chosen nodes in quantum networks via noisy measurements performed by assisting qubits. In the case of networks with a cluster of three or higher number of qubits, we exhibit that sequentially performing optimal unsharp measurements on the assisting qubits yields localizable entanglement between two nodes akin to that obtained by optimal projective measurements on the same assisting qubits. Furthermore, we find that the proposed approach using consecutive noisy measurements can potentially be used to prepare desired states that are resource for specific quantum schemes. We also argue that assisting qubits have greater control over the qubits on which entanglement is concentrated via unsharp measurements, in contrast to sharp measurement-based protocols, which may have implications for secure quantum communication.

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# OPTune: 効率的なオンライン選好チューニング

OPTune: Efficient Online Preference Tuning ( http://arxiv.org/abs/2406.07657v1 )

ライセンス: Link先を確認

Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Tianyi Zhou, Heng Huang,

(参考訳) 人間のフィードバックによる強化学習(RLHF)は、大規模言語モデル(LLM)を人間の選好に整合させるために重要である。広く研究されているRLHFのオフライン版、例えばDirect Preference Optimization(DPO)と比較して、オンライン版の方がさらに優れたアライメントを達成することが最近の研究で示されている。しかし、オンラインアライメントでは新しい訓練データをオンザフライで生成する必要があり、これはコストが高く、並列化が難しく、品質と有用性にばらつきがある。本稿では、オンライン選好チューニングのためのより効率的なデータ探索戦略(OPTune)を提案する。この手法は、人手で整備された、あるいは事前に収集された教師応答に依存せず、オンポリシーの選好アライメントのために有益な応答を動的にサンプリングする。データ生成時には、OPTuneは(再)生成された応答が既存の応答よりも有益で高品質な訓練信号を与えうるプロンプトのみを選択する。訓練目的関数では、OPTuneは生成された各応答(ペア)をアライメント改善における有用性によって再重み付けし、最も有用なサンプルに学習を集中させる。評価全体を通じて、OPTuneで調整したLLMは、標準的な選好チューニングがもたらす指示追従の利点を維持しつつ、効率的なデータ探索戦略により1.27〜1.56倍の訓練高速化を達成した。

Reinforcement learning with human feedback (RLHF) is critical for aligning Large Language Models (LLMs) with human preference. Compared to the widely studied offline version of RLHF, e.g., direct preference optimization (DPO), recent works have shown that the online variants achieve even better alignment. However, online alignment requires on-the-fly generation of new training data, which is costly, hard to parallelize, and suffers from varying quality and utility. In this paper, we propose a more efficient data exploration strategy for online preference tuning (OPTune), which does not rely on human-curated or pre-collected teacher responses but dynamically samples informative responses for on-policy preference alignment. During data generation, OPTune only selects prompts whose (re)generated responses can potentially provide more informative and higher-quality training signals than the existing responses. In the training objective, OPTune reweights each generated response (pair) by its utility in improving the alignment so that learning can be focused on the most helpful samples. Throughout our evaluations, OPTune'd LLMs maintain the instruction-following benefits provided by standard preference tuning whilst enjoying 1.27-1.56x faster training speed due to the efficient data exploration strategy.
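The reweighting idea in the training objective can be illustrated with a minimal sketch. The abstract does not say how the utility is computed or how it enters the loss, so everything here is an assumption: the function names are hypothetical, the per-pair loss is the standard DPO loss, and the utilities are simply normalized into weights.

```python
import math

def dpo_pair_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair, given sequence log-probs."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)); the guard avoids overflow in exp() for very negative margins.
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin

def utility_weighted_loss(pairs, utilities, beta=0.1):
    """Weighted average of DPO losses; weights are utilities normalized to sum to 1.

    `utilities` is a hypothetical per-pair score (for example a reward gap);
    it is an illustration, not the weighting used in the paper.
    """
    total = sum(utilities)
    weights = [u / total for u in utilities]
    return sum(w * dpo_pair_loss(*p, beta=beta) for w, p in zip(weights, pairs))
```

With equal utilities this reduces to the plain mean DPO loss; setting a pair's utility to zero removes it from the objective.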

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# Treeffuser: 勾配ブースト木を用いた条件拡散による確率予測

Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees ( http://arxiv.org/abs/2406.07658v1 )

ライセンス: Link先を確認

Nicolas Beltran-Velez, Alessandro Antonio Grande, Achille Nazaret, Alp Kucukelbir, David Blei,

(参考訳) 確率予測は、単一の点予測ではなく予測分布を計算することを目的とする。これらの分布により、実務者は不確実性の定量化、リスクの計算、外れ値の検出を行える。しかし、ほとんどの確率的手法はガウス分布やポアソン分布のようなパラメトリックな応答を仮定している。これらの仮定が成り立たない場合、そのようなモデルは予測精度が低く、不確実性の較正も不十分になる。本稿では、表形式データに対する使いやすい確率予測手法であるTreeffuserを提案する。その考え方は、スコア関数を勾配ブースト木で推定する条件付き拡散モデルを学習することである。条件付き拡散モデルによりTreeffuserは柔軟かつノンパラメトリックになり、勾配ブースト木により頑健でCPU上でも容易に訓練できる。 Treeffuserはよく較正された予測分布を学習し、多変量、多峰性、歪んだ応答を含む幅広い回帰タスクを扱える。合成データと実データでTreeffuserを検証し、既存手法を上回り、より較正された確率予測を与えることを示す。さらに、Walmartの販売データを用いた不確実性下の在庫配分への応用により、その汎用性を示す。 Treeffuserの実装は https://github.com/blei-lab/treeffuser で公開している。

Probabilistic prediction aims to compute predictive distributions rather than single-point predictions. These distributions enable practitioners to quantify uncertainty, compute risk, and detect outliers. However, most probabilistic methods assume parametric responses, such as Gaussian or Poisson distributions. When these assumptions fail, such models lead to bad predictions and poorly calibrated uncertainty. In this paper, we propose Treeffuser, an easy-to-use method for probabilistic prediction on tabular data. The idea is to learn a conditional diffusion model where the score function is estimated using gradient-boosted trees. The conditional diffusion model makes Treeffuser flexible and non-parametric, while the gradient-boosted trees make it robust and easy to train on CPUs. Treeffuser learns well-calibrated predictive distributions and can handle a wide range of regression tasks -- including those with multivariate, multimodal, and skewed responses. We study Treeffuser on synthetic and real data and show that it outperforms existing methods, providing better-calibrated probabilistic predictions. We further demonstrate its versatility with an application to inventory allocation under uncertainty using sales data from Walmart. We implement Treeffuser in https://github.com/blei-lab/treeffuser.

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# 量子コンピュータのベンチマークのための多部非局所性の生成

Generating multipartite nonlocality to benchmark quantum computers ( http://arxiv.org/abs/2406.07659v1 )

ライセンス: Link先を確認

Jan Lennart Bönsel, Otfried Gühne, Adán Cabello,

(参考訳) 量子コンピュータを用いて大きな$n$に対する$n$者間非局所性を生成できることを示し、それにより量子コンピュータをベンチマークする方法を提供する。克服すべき主な課題は次のとおりである。 (i) 相互作用トポロジーが任意の2量子ビットゲートを許さない場合がある。 (ii) ノイズがベル不等式の破れを制限する。 (iii) 局所測定の組み合わせの数が$n$とともに指数関数的に増加する。 (i)を克服するために、コンピュータの2量子ビット接続性と両立するグラフ状態は効率的に準備できることを指摘する。 (ii)を緩和するために、特定のグラフ状態に対しては、白色ノイズへの耐性が$n$とともに指数関数的に増加する$n$者間ベル不等式が存在することに注目する。 (iii)に対処するために、任意の$n$と任意の接続性に対して、ランダムサンプリングに基づく推定量を導入する。その結果、これまでにない大きな$n$で$n$者間ベル非局所性を生成する方法を提案する。これにより、量子ビット数や接続性によらず非古典相関をベンチマークできる。ノイズのあるIBM量子コンピュータのシミュレーションを用いて本手法を検証し、少なくとも$n=24$量子ビットで$n$者間ベル非局所性が予測されることを確認した。

We show that quantum computers can be used for producing large $n$-partite nonlocality, thereby providing a method to benchmark them. The main challenges to overcome are: (i) The interaction topology might not allow arbitrary two-qubit gates. (ii) Noise limits the Bell violation. (iii) The number of combinations of local measurements grows exponentially with $n$. To overcome (i), we point out that graph states that are compatible with the two-qubit connectivity of the computer can be efficiently prepared. To mitigate (ii), we note that, for specific graph states, there are $n$-partite Bell inequalities whose resistance to white noise increases exponentially with $n$. To address (iii) for any $n$ and any connectivity, we introduce an estimator that relies on random sampling. As a result, we propose a method for producing $n$-partite Bell nonlocality with unprecedented large $n$. This allows in return to benchmark nonclassical correlations regardless of the number of qubits or the connectivity. We test our approach by using a simulation for a noisy IBM quantum computer, which predicts $n$-partite Bell nonlocality for at least $n=24$ qubits.

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# ROADWorkデータセット:ワークゾーンを認識し、観察し、分析し、運転する学習

ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones ( http://arxiv.org/abs/2406.07661v1 )

ライセンス: Link先を確認

Anurag Ghosh, Robert Tamburo, Shen Zheng, Juan R. Alvarez-Padilla, Hailiang Zhu, Michael Cardei, Nicholas Dunn, Christoph Mertz, Srinivasa G. Narasimhan,

(参考訳) 自動運転研究が大きく進歩しているにもかかわらず、工事区間(ワークゾーン)の認識と走行は依然として難しく、十分に研究されていない。その重要な理由は、このロングテールなシナリオに対処する新しいアルゴリズムを開発するためのオープンデータセットが存在しないことである。本稿では、ワークゾーンの認識、観察、分析、走行を学習するためのROADWorkデータセットを提案する。最先端の基盤モデルはワークゾーンでは性能が低いことがわかった。本データセットを用いることで、ワークゾーン内の物体の検出を改善し(+26.2 AP)、ワークゾーンの発見ではより高い精度(+32.5%)とはるかに高い発見率(12.8倍)を達成し、ワークゾーン標識の検出(+23.9 AP)と読み取り(+14.2% 1-NED)、およびワークゾーンの記述(+36.7 SPICE)を大幅に改善した。また、ワークゾーンを走行する動画から走行可能な経路を計算し、53.6%の目標で角度誤差(AE)が0.5度未満(+9.9%)、75.3%の経路でAEが0.5度未満(+8.1%)となるように走行目標と経路を予測できることを示した。

Perceiving and navigating through work zones is challenging and under-explored, even with major strides in self-driving research. An important reason is the lack of open datasets for developing new algorithms to address this long-tailed scenario. We propose the ROADWork dataset to learn how to recognize, observe and analyze and drive through work zones. We find that state-of-the-art foundation models perform poorly on work zones. With our dataset, we improve upon detecting work zone objects (+26.2 AP), while discovering work zones with higher precision (+32.5%) at a much higher discovery rate (12.8 times), significantly improve detecting (+23.9 AP) and reading (+14.2% 1-NED) work zone signs and describing work zones (+36.7 SPICE). We also compute drivable paths from work zone navigation videos and show that it is possible to predict navigational goals and pathways such that 53.6% goals have angular error (AE) < 0.5 degrees (+9.9 %) and 75.3% pathways have AE < 0.5 degrees (+8.1 %).

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# fNIRSによる画像の復号化に向けて

Progress Towards Decoding Visual Imagery via fNIRS ( http://arxiv.org/abs/2406.07662v1 )

ライセンス: Link先を確認

Michel Adamic, Wellington Avelino, Anna Brandenberger, Bryan Chiang, Hunter Davis, Stephen Fay, Andrew Gregory, Aayush Gupta, Raphael Hotter, Grace Jiang, Fiona Leng, Stephen Polcyn, Thomas Ribeiro, Paul Scotti, Michelle Wang, Marley Xiong, Jonathan Xu,

(参考訳) fNIRSによる脳活動から画像を再構成できる可能性を示し、必要な仕様を満たすプロトタイプの構築に着手する。ダウンサンプリングしたfMRIデータで画像再構成モデルを訓練した結果、cmスケールの空間分解能があれば画像生成に十分であることがわかった。検索精度は、1cm分解能で71%であり、フル解像度fMRIでは93%、2cm分解能では20%であった。シミュレーションと高密度トモグラフィにより、連続波fNIRSの分解能が2cmであるのに対し、時間領域fNIRSは1cmの分解能を達成できることがわかった。最後に、レーザードライバ、単一光子検出器、時間デジタル変換器(TDC)システムからなる時間領域fNIRS装置のプロトタイプ設計を共有する。

We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 20% with 2-cm resolution. With simulations and high-density tomography, we found that time-domain fNIRS can achieve 1-cm resolution, compared to 2-cm resolution for continuous-wave fNIRS. Lastly, we share designs for a prototype time-domain fNIRS device, consisting of a laser driver, a single photon detector, and a time-to-digital converter system.

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# グラフマッチング問題の整数計画定式化のための統一フレームワーク

A Unified Framework for Integer Programming Formulation of Graph Matching Problems ( http://arxiv.org/abs/2406.07666v1 )

ライセンス: Link先を確認

Bahram Alidaee, Haibo Wang, Hugh Sloan,

(参考訳) グラフ理論は、あらゆる分野における困難で複雑な問題を解決する強力なツールである。特に、グラフマッチングは、膨大な応用を伴うパターン解析における古典的な問題である。多くのグラフ問題は数学的プログラムとして定式化され、正確な、ヒューリスティックな、あるいは近似された保証された手順を用いて解かれる。一方、グラフ理論は複雑な数学的プログラミング問題、特に整数プログラムを可視化し理解するための強力なツールである。グラフ問題を自然整数プログラム(IP)として定式化することは、しばしば難しい課題である。しかし、IPの定式化には多くの利点がある。数人の研究者が、グラフ理論問題の自然なIP定式化の必要性について言及している。本研究の目的は,グラフマッチング問題のIP定式化のための統一的なフレームワークを提供することである。グラフマッチング問題に関する多くの調査があるが、IPの定式化には関心がない。本稿では,このような問題に対する包括的IP定式化を初めて提供する。このフレームワークには、文献における様々なグラフ最適化の問題が含まれている。しかしながら、これらの問題は異なる研究コミュニティによって研究されてきたが、ここで提示される枠組みは、このような多様で複雑な問題に取り組むために、異なる分野からの取り組みを促進するのに役立っている。本研究は,特にパターン解析において,実際に発生する難題のいくつかを単純化する上で,極めて有効であることを期待する。

Graph theory has been a powerful tool in solving difficult and complex problems arising in all disciplines. In particular, graph matching is a classical problem in pattern analysis with enormous applications. Many graph problems have been formulated as a mathematical program and then solved using exact, heuristic, and/or approximated-guaranteed procedures. On the other hand, graph theory has been a powerful tool in visualizing and understanding complex mathematical programming problems, especially integer programs. Formulating a graph problem as a natural integer program (IP) is often a challenging task. However, an IP formulation of the problem has many advantages. Several researchers have noted the need for natural IP formulation of graph theoretic problems. The present study aims to provide a unified framework for IP formulation of graph-matching problems. Although there are many surveys on graph matching problems, none is concerned with IP formulation. This paper is the first to provide a comprehensive IP formulation for such problems. The framework includes a variety of graph optimization problems in the literature. While these problems have been studied by different research communities, however, the framework presented here helps to bring efforts from different disciplines to tackle such diverse and complex problems. We hope the present study can significantly help to simplify some of the difficult problems arising in practice, especially in pattern analysis.
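As a concrete reference point for what an IP formulation of a matching problem looks like, here is a minimal sketch of the textbook maximum-weight matching program: maximize $\sum_e w_e x_e$ subject to $\sum_{e \ni v} x_e \le 1$ for every vertex $v$, with $x_e \in \{0,1\}$. This is the classical edge-matching formulation and may differ from the formulations developed in the paper. The function name is hypothetical, and the brute-force enumeration stands in for a real IP solver, so it only suits tiny graphs.

```python
from itertools import product

def max_weight_matching_ip(num_vertices, edges):
    """Solve max sum(w_e * x_e) s.t. every vertex touches at most one chosen edge.

    `edges` is a list of (u, v, weight) tuples. All binary vectors x are
    enumerated, so this is only practical for a handful of edges.
    """
    best_value, best_x = 0, tuple(0 for _ in edges)
    for x in product((0, 1), repeat=len(edges)):
        degree = [0] * num_vertices
        for chosen, (u, v, _w) in zip(x, edges):
            degree[u] += chosen
            degree[v] += chosen
        if any(d > 1 for d in degree):  # a vertex constraint is violated
            continue
        value = sum(chosen * w for chosen, (_u, _v, w) in zip(x, edges))
        if value > best_value:
            best_value, best_x = value, x
    return best_value, best_x
```

On the path 0-1-2-3 with edge weights 2, 3, 2, the two outer edges are selected (total weight 4) rather than the heavier middle edge alone, which shows why the vertex constraints must be handled jointly.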

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# PLT-D3:ステレオ深度とシーンフローのための高忠実度動的運転シミュレーションデータセット

PLT-D3: A High-fidelity Dynamic Driving Simulation Dataset for Stereo Depth and Scene Flow ( http://arxiv.org/abs/2406.07667v1 )

ライセンス: Link先を確認

Joshua Tokarsky, Ibrahim Abdulhafiz, Satya Ayyalasomayajula, Mostafa Mohsen, Navya G. Rao, Adam Forbes,

(参考訳) 自律運転は、計算ハードウェアと高度なディープラーニング方法論の革新に支えられ、目覚ましい進歩を遂げてきた。これらの進歩の基盤はデータセットの可用性と品質に依存しており、信頼性と汎用的な自律運転アルゴリズムの開発と改良に不可欠である。自律運転認識技術の進化を支援するために多くのデータセットが開発されているが、様々な気象条件下でシステムの堅牢性を徹底的にテストし強化するために必要な多様性を提供するものはほとんどない。多くの公開データセットは、挑戦的な気象シナリオと詳細な高解像度データに関する包括的なカバレッジを欠いている。本稿では,各種気象条件に対する自律運転システムの適応性向上を目的とした動的天候駆動データセットであるPLT-D3を紹介する。 PLT-D3は、Unreal Engine 5を用いて生成された高忠実度ステレオ深度およびシーンフローグラウンド真理データを提供する。特に、このデータセットには、雨、雪、霧、様々な照明条件を含む幅広い動的気象シナリオを再現する、同期された高解像度ステレオ画像シーケンスが含まれており、シミュレーションベースのテストでは前例のないレベルのリアリズムを提供する。 PLT-D3の主な目的は、現実世界の気象変動をシミュレートできる総合的な訓練と試験資源の不足に対処することである。 PLT-D3を用いたいくつかの重要な自律運転タスクのためのベンチマークが確立されている。

Autonomous driving has experienced remarkable progress, bolstered by innovations in computational hardware and sophisticated deep learning methodologies. The foundation of these advancements rests on the availability and quality of datasets, which are crucial for the development and refinement of dependable and versatile autonomous driving algorithms. While numerous datasets have been developed to support the evolution of autonomous driving perception technologies, few offer the diversity required to thoroughly test and enhance system robustness under varied weather conditions. Many public datasets lack the comprehensive coverage of challenging weather scenarios and detailed, high-resolution data, which are critical for training and validating advanced autonomous-driving perception models. In this paper, we introduce PLT-D3; a Dynamic-weather Driving Dataset, designed specifically to enhance autonomous driving systems' adaptability to diverse weather conditions. PLT-D3 provides high-fidelity stereo depth and scene flow ground truth data generated using Unreal Engine 5. In particular, this dataset includes synchronized high-resolution stereo image sequences that replicate a wide array of dynamic weather scenarios including rain, snow, fog, and diverse lighting conditions, offering an unprecedented level of realism in simulation-based testing. The primary aim of PLT-D3 is to address the scarcity of comprehensive training and testing resources that can simulate real-world weather variations. Benchmarks have been established for several critical autonomous driving tasks using PLT-D3, such as depth estimation, optical flow and scene-flow to measure and enhance the performance of state-of-the-art models.

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# フェルミオン計数による一般化ゼノ効果と絡み合いダイナミクス

Generalized Zeno effect and entanglement dynamics induced by fermion counting ( http://arxiv.org/abs/2406.07673v1 )

ライセンス: Link先を確認

Elias Starchl, Mark H. Fischer, Lukas M. Sieberer,

(参考訳) 一般化された測定過程にさらされた自由フェルミオンの1次元格子系を研究する。この系は環境と粒子を交換するが、系を出入りするフェルミオンは一つひとつ計数される。格子サイト占有数の頻繁な測定がダイナミクスを凍結させるのとは対照的に、フェルミオン計数の頻度が高いと系の状態に速い揺らぎが誘起される。それでも、量子軌道の数値シミュレーションとレプリカ・ケルディッシュ場の理論に基づく解析的アプローチにより、フェルミオン計数を受ける自由フェルミオンと局所占有数測定を受ける自由フェルミオンとで、瞬時相関とエンタングルメントの性質が驚くほど類似していることを見いだした。この類似性を、フェルミオン計数によって誘起される一般化ゼノ効果と、$\mathrm{SU}(R)$非線形シグマ模型による普遍的な長波長記述によって説明する。さらに、どちらの測定過程についても、有限の測定レートにおいて対数的エンタングルメントと共形不変性をもつ臨界相が存在することに反する強い証拠を示す。その代わりに、共形不変性の兆候が観測できる、明確に定義された有限の臨界的な長さスケールの範囲を特定する。面積則エンタングルメントは測定レートに関して指数関数的に大きいスケールを超えて初めて確立されるが、臨界範囲の上限は代数的に大きいだけであり、したがって数値的に到達可能である。

We study a one-dimensional lattice system of free fermions subjected to a generalized measurement process: the system exchanges particles with its environment, but each fermion leaving or entering the system is counted. In contrast to the freezing of dynamics due to frequent measurements of lattice-site occupation numbers, a high rate of fermion counts induces fast fluctuations in the state of the system. Still, through numerical simulations of quantum trajectories and an analytical approach based on replica Keldysh field theory, we find that instantaneous correlations and entanglement properties of free fermions subjected to fermion counting and local occupation measurements are strikingly similar. We explain this similarity through a generalized Zeno effect induced by fermion counting and a universal long-wavelength description in terms of an $\mathrm{SU}(R)$ nonlinear sigma model. Further, for both types of measurement processes, we present strong evidence against the existence of a critical phase with logarithmic entanglement and conformal invariance at finite measurement rates. Instead, we identify a well-defined and finite critical range of length scales on which signatures of conformal invariance are observable. While area-law entanglement is established beyond a scale that is exponentially large in the measurement rate, the upper boundary of the critical range is only algebraically large and thus numerically accessible.

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# ディープラーニングを用いた自動舗装き裂検出と分類

Automated Pavement Cracks Detection and Classification Using Deep Learning ( http://arxiv.org/abs/2406.07674v1 )

ライセンス: Link先を確認

Selvia Nafaa, Hafsa Essam, Karim Ashour, Doaa Emad, Rana Mohamed, Mohammed Elhenawy, Huthaifa I. Ashqar, Abdallah A. Hassan, Taqwa I. Alhadidi,

(参考訳) 効率的な輸送資産管理を構築する上で、資産状況のモニタリングが重要な要素である。画像処理の進歩により、従来の手動分類はセミオートマチック/オートマチック技術に置き換えられている。その結果,自動資産検出・分類技術が求められた。本稿では, 道路舗装の亀裂の検出と分類を, 有名なYou Only Look Once (YOLO) バージョン5 (YOLOv5) とバージョン8 (YOLOv8) のアルゴリズムを用いて行う手法を提案する。実験結果から, 照明条件と画像サイズが異なる場合, 舗装き裂検出精度は67.3%に達することがわかった。本研究は,異なる照明条件下での資産状況の正確な検出・分類を支援することを目的としている。これにより、手動検査に伴うコストと時間を削減し、ハイウェイ資産維持のコストを大幅に削減することができる。

Monitoring asset conditions is a crucial factor in building efficient transportation asset management. Because of substantial advances in image processing, traditional manual classification has been largely replaced by semi-automatic/automatic techniques. As a result, automated asset detection and classification techniques are required. This paper proposes a methodology to detect and classify roadway pavement cracks using the well-known You Only Look Once (YOLO) version five (YOLOv5) and version 8 (YOLOv8) algorithms. Experimental results indicated that the precision of pavement crack detection reaches up to 67.3% under different illumination conditions and image sizes. The findings of this study can assist highway agencies in accurately detecting and classifying asset conditions under different illumination conditions. This will reduce the cost and time that are associated with manual inspection, which can greatly reduce the cost of highway asset maintenance.

翻訳日:2024-06-13 21:25:46 公開日:2024-06-11

# FastAST:Token Mergingとクロスモデル知識蒸留によるオーディオスペクトログラム変換器の高速化

FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation ( http://arxiv.org/abs/2406.07676v1 )

ライセンス: Link先を確認

Swarup Ranjan Behera, Abhishek Dhiman, Karthik Gowda, Aalekhya Satya Narayani,

(参考訳) 音声分類モデル、特にAudio Spectrogram Transformer(AST)は、効率的な音声分析において重要な役割を果たす。しかし、精度を損なうことなく効率を最適化することは依然として課題である。本稿では,Token Merging(ToMe)をASTフレームワークに統合するフレームワークであるFastASTを紹介する。 FastASTは、オーディオスペクトログラムに類似のトークンをマージすることで、広範な再トレーニングを必要とせずに、推論速度を向上させる。さらに、トレーニング中に、FastASTは大幅なスピード改善をもたらす。実験により、FastASTは精度に最小限の影響を与えることなく、オーディオ分類のスループットを向上できることが示された。精度への影響を軽減するため、Cross-Model Knowledge Distillation (CMKD)をFastASTフレームワークに統合する。 ToMeとCMKDをASTに統合すると、より高速な推論速度を維持しながら、ASTと比較して精度が向上する。 FastASTは、リアルタイムでリソース効率の良いオーディオ分析への一歩である。

Audio classification models, particularly the Audio Spectrogram Transformer (AST), play a crucial role in efficient audio analysis. However, optimizing their efficiency without compromising accuracy remains a challenge. In this paper, we introduce FastAST, a framework that integrates Token Merging (ToMe) into the AST framework. FastAST enhances inference speed without requiring extensive retraining by merging similar tokens in audio spectrograms. Furthermore, during training, FastAST brings about significant speed improvements. The experiments indicate that FastAST can increase audio classification throughput with minimal impact on accuracy. To mitigate the accuracy impact, we integrate Cross-Model Knowledge Distillation (CMKD) into the FastAST framework. Integrating ToMe and CMKD into AST results in improved accuracy compared to AST while maintaining faster inference speeds. FastAST represents a step towards real-time, resource-efficient audio analysis.
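The token-merging step can be sketched in a few lines. This is a simplified, dependency-free illustration of bipartite matching in the spirit of ToMe, not the FastAST implementation: the function names are hypothetical, similarity is computed on the token vectors themselves, and merged tokens are averaged without any further weighting.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def merge_tokens(tokens, r):
    """Merge the r most similar (A, B) pairs, where A and B are alternating tokens.

    Returns the unmerged A tokens followed by the (possibly averaged) B tokens,
    so the original token order is not preserved.
    """
    if r <= 0 or len(tokens) < 2:
        return [list(t) for t in tokens]
    set_a, set_b = tokens[0::2], tokens[1::2]
    # For every token in A, find its most similar partner in B.
    matches = []
    for i, a in enumerate(set_a):
        sims = [cosine(a, b) for b in set_b]
        j = max(range(len(set_b)), key=sims.__getitem__)
        matches.append((sims[j], i, j))
    matches.sort(reverse=True)
    merged_from_a = {i: j for _sim, i, j in matches[:r]}
    # Accumulate merged A tokens into their B partner, then average.
    sums = [list(b) for b in set_b]
    counts = [1] * len(set_b)
    for i, j in merged_from_a.items():
        sums[j] = [s + x for s, x in zip(sums[j], set_a[i])]
        counts[j] += 1
    kept_a = [list(a) for i, a in enumerate(set_a) if i not in merged_from_a]
    merged_b = [[s / c for s in vec] for vec, c in zip(sums, counts)]
    return kept_a + merged_b
```

Each call removes at most `r` tokens, so applying it inside every transformer block shrinks the sequence progressively. That shrinkage is where a throughput gain of the kind the abstract reports would come from.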

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# 量子コンピュータ上の多体熱状態--変分的アプローチ

Many-body thermal states on a quantum computer: a variational approach ( http://arxiv.org/abs/2406.07677v1 )

ライセンス: Link先を確認

Mirko Consiglio, Tony J. G. Apollaro,

(参考訳) 熱平衡にある多体量子状態は自然界に遍在する。その動的性質を調べることは、それらが属するヒルベルト空間の複雑さのために非常に困難である。対象とする多体状態を効率的なアルゴリズムで忠実に準備できるならば、量子コンピュータは量子系を効果的にシミュレートできる可能性がある。この目的のため、量子$XY$モデルのギブス状態を準備するハイブリッド量子古典変分量子アルゴリズムを提案する。本アルゴリズムは、ギブス状態のボルツマン重みを準備するためのGroverとRudolphのパラメータ化量子回路と、固有エネルギー基底をそれぞれのボルツマン重みに割り当てるためのパリティ保存アンザッツに基づく。典型的な少数体の事例を用いて、多体系の対称性を利用することで、GroverとRudolphのアルゴリズムに必要な、指数関数的に増加する変分パラメータの数を大幅に削減できることを明示的に示す。最後に、さまざまなパラメータに対する状態ベクトルシミュレーションで得られた$XY$モデルのギブス状態の密度行列が、厳密なギブス状態に対して1に近い忠実度を示すことを示す。これは、本プロトコルが現在の量子コンピュータで利用できる可能性を示している。

Many-body quantum states at thermal equilibrium are ubiquitous in nature. Investigating their dynamical properties is a formidable task due to the complexity of the Hilbert space they live in. Quantum computers may have the potential to effectively simulate quantum systems, provided that the many-body state under scrutiny can be faithfully prepared via an efficient algorithm. With this aim, we present a hybrid quantum--classical variational quantum algorithm for the preparation of the Gibbs state of the quantum $XY$ model. Our algorithm is based on the Grover and Rudolph parametrized quantum circuit for the preparation of the Boltzmann weights of the Gibbs state, and on a parity-preserving ansatz for the allocation of the eigenenergy basis to their respective Boltzmann weight. We explicitly show, with a paradigmatic few-body case instance, how the symmetries of a many-body system can be exploited to significantly reduce the exponentially increasing number of variational parameters needed in the Grover and Rudolph algorithm. Finally, we show that the density matrix, of the Gibbs state of the $XY$ model, obtained by statevector simulations for different parameters, exhibits a fidelity close to unity with the exact Gibbs state; this highlights the potential use of our protocol on current quantum computers.
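For readers unfamiliar with the objects named in the abstract, the standard textbook definitions are collected below. The symbols $\beta$, $Z$, $E_i$, and $p_i$ are notation chosen here, not necessarily the paper's. The split into Boltzmann weights $p_i$ and eigenstates $|E_i\rangle$ mirrors the two circuit components the abstract describes.

```latex
% Gibbs state of a Hamiltonian H at inverse temperature \beta
\rho_\beta = \frac{e^{-\beta H}}{Z}, \qquad Z = \operatorname{Tr}\, e^{-\beta H}

% Spectral form: Boltzmann weights p_i on the energy eigenbasis |E_i>
\rho_\beta = \sum_i p_i \, |E_i\rangle\langle E_i|, \qquad p_i = \frac{e^{-\beta E_i}}{Z}

% Uhlmann fidelity between a prepared state \sigma and the exact Gibbs state
F(\rho_\beta, \sigma) = \left( \operatorname{Tr} \sqrt{ \sqrt{\rho_\beta}\, \sigma \sqrt{\rho_\beta} } \right)^{2}
```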

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# 上空から群れのダイナミクスを観察する: ドローン映像における高度な物体追跡のためのフレームワーク

Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos ( http://arxiv.org/abs/2406.07680v1 )

ライセンス: Link先を確認

Duc Pham, Matthew Hansen, Félicie Dhellemmens, Jens Krause, Pia Bideau,

(参考訳) 多様なセンサーを搭載したドローンのように手軽に利用できるセンサーは、自然環境における動物行動の研究を大きく広げた。しかし、しばしば数時間に及ぶ膨大なラベルなし映像データの解析は、機械学習、特にコンピュータビジョンにとって依然として課題である。既存のアプローチは数フレームしか解析しないことが多い。我々は長期的な動物行動解析に焦点を当てる。この課題に対処するため、粒子フィルタのような古典的な確率的状態推定手法を用いる。セマンティックな物体セグメンテーションの最近の進歩を取り入れることで、利用できるデータが限られた状況でも、急速に変化する物体の隊形を連続的に追跡できる。粒子フィルタは、新たに入ってくる情報を再帰的に取り込むための、証明可能な意味で最適なアルゴリズム構造を提供する。本研究では、ドローン映像から外洋の魚群を追跡する新しい手法を提案する。我々のフレームワークは、2次元での古典的な物体追跡を行うだけではなく、映像データとドローン搭載センサーの情報(GPSとIMU)を融合することで、魚群の位置と空間的広がりを世界座標で追跡する。提示するフレームワークにより、研究者は初めて、非侵襲的かつスケーラブルな方法で、自然な社会的・環境的文脈における魚群の集団行動を研究できるようになる。

Easily accessible sensors, like drones with diverse onboard sensors, have greatly expanded studying animal behavior in natural environments. Yet, analyzing vast, unlabeled video data, often spanning hours, remains a challenge for machine learning, especially in computer vision. Existing approaches often analyze only a few frames. Our focus is on long-term animal behavior analysis. To address this challenge, we utilize classical probabilistic methods for state estimation, such as particle filtering. By incorporating recent advancements in semantic object segmentation, we enable continuous tracking of rapidly evolving object formations, even in scenarios with limited data availability. Particle filters offer a provably optimal algorithmic structure for recursively adding new incoming information. We propose a novel approach for tracking schools of fish in the open ocean from drone videos. Our framework not only performs classical object tracking in 2D, instead it tracks the position and spatial expansion of the fish school in world coordinates by fusing video data and the drone's on board sensor information (GPS and IMU). The presented framework for the first time allows researchers to study collective behavior of fish schools in its natural social and environmental context in a non-invasive and scalable way.
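The recursive structure of a particle filter is easiest to see in code. The sketch below is a generic one-dimensional bootstrap filter with a random-walk motion model and Gaussian observation noise. It is a toy under those assumptions, not the paper's tracker, whose state also covers the school's spatial extent in world coordinates. All names and default parameters are hypothetical.

```python
import math
import random

def particle_filter_step(particles, observation, process_std, obs_std, rng):
    """One predict / weight / resample cycle of a bootstrap particle filter (1D state)."""
    # Predict: propagate each particle through a random-walk motion model.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    # Weight: Gaussian likelihood of the observation given each particle.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2) for p in predicted]
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))

def track(observations, num_particles=500, process_std=0.1, obs_std=0.5, seed=0):
    """Return the posterior-mean estimate after each observation."""
    rng = random.Random(seed)
    particles = [rng.uniform(-10.0, 10.0) for _ in range(num_particles)]
    estimates = []
    for z in observations:
        particles = particle_filter_step(particles, z, process_std, obs_std, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates
```

Each new observation only touches the current particle set, which is what makes the method suitable for videos spanning hours: memory and per-frame cost stay constant regardless of sequence length.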

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# 量子コンピューティングのための最適化QUBO定式化法

Optimized QUBO formulation methods for quantum computing ( http://arxiv.org/abs/2406.07681v1 )

ライセンス: Link先を確認

Dario De Santis, Salvatore Tirone, Stefano Marmi, Vittorio Giovannetti,

(参考訳) いくつかの組合せ最適化問題は、対応する二次制約なし二値最適化(QUBO)形式が導出されれば、NISQデバイスで解くことができる。本研究の目的は、これらのQUBO再定式化に必要な変数の数を大幅に削減し、ある種の最適化問題についてNISQデバイスで最適解を効率的に得られるようにすることである。これは、元の問題を近似することなく、非線形制約をもつ問題に対してもスラック変数を効率的に利用できる新しいツールを導入することで実現される。新しい手法は、反復二次多項式法とマスター・サテライト法と呼ぶ2つの独立した部分に分けられる。次に、Max-Profit Balance Settlementと呼ばれる現実の金融シナリオに着想を得たNP困難な最適化問題に、本手法を適用する方法を示す。続いて、この問題のいくつかのインスタンスを2台のD-Wave量子アニーラに投入し、これらのシナリオで用いられる標準的な手法と新しい手法の性能を比較する。さらに、本研究によりD-Wave AdvantageとAdvantage2量子アニーラの間のいくつかの性能差を評価できる。

Several combinatorial optimization problems can be solved with NISQ devices once that a corresponding quadratic unconstrained binary optimization (QUBO) form is derived. The aim of this work is to drastically reduce the variables needed for these QUBO reformulations in order to unlock the possibility to efficiently obtain optimal solutions for a class of optimization problems with NISQ devices. This is achieved by introducing novel tools that allow an efficient use of slack variables, even for problems with non-linear constraints, without the need to approximate the starting problem. We divide our new techniques in two independent parts, called the iterative quadratic polynomial and the master-satellite methods. Hence, we show how to apply our techniques in case of an NP-hard optimization problem inspired by a real-world financial scenario called Max-Profit Balance Settlement. We follow by submitting several instances of this problem to two D-wave quantum annealers, comparing the performances of our novel approach with the standard methods used in these scenarios. Moreover, this study allows to appreciate several performance differences between the D-wave Advantage and Advantage2 quantum annealers.
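To show what slack variables cost in a QUBO, here is a minimal sketch of the standard approach that the paper aims to improve on: an inequality $\sum_i w_i x_i \le C$ is turned into the penalty $P\,(\sum_i w_i x_i + s - C)^2$ with a binary-encoded slack $s$. The knapsack instance, function names, and penalty value are illustrative assumptions. The iterative quadratic polynomial and master-satellite methods from the abstract are not reproduced here.

```python
from itertools import product

def knapsack_qubo_energy(x, s_bits, values, weights, capacity, penalty):
    """Energy -sum(v_i x_i) + penalty * (sum(w_i x_i) + s - capacity)^2.

    The slack s is binary-encoded by s_bits, least significant bit first.
    Expanding the square gives a quadratic polynomial in binary variables.
    """
    slack = sum(bit << k for k, bit in enumerate(s_bits))
    load = sum(w * xi for w, xi in zip(weights, x))
    gain = sum(v * xi for v, xi in zip(values, x))
    return -gain + penalty * (load + slack - capacity) ** 2

def solve_by_enumeration(values, weights, capacity, penalty):
    """Exhaustively minimize the energy; a stand-in for an annealer on tiny instances."""
    n_slack = capacity.bit_length()  # enough bits to represent 0..capacity
    best = None
    for bits in product((0, 1), repeat=len(values) + n_slack):
        x, s_bits = bits[:len(values)], bits[len(values):]
        energy = knapsack_qubo_energy(x, s_bits, values, weights, capacity, penalty)
        if best is None or energy < best[0]:
            best = (energy, x, s_bits)
    return best
```

Even this three-item example needs three extra slack bits, doubling the number of binary variables. Overhead of this kind is what motivates reducing the variable count, since qubit numbers on NISQ devices are limited.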

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# AIツールがエンジニアリングワークスペースに与える影響

Impact of AI-tooling on the Engineering Workspace ( http://arxiv.org/abs/2406.07683v1 )

ライセンス: Link先を確認

Lena Chretien, Nikolas Albarran,

(参考訳) AI駆動のコーディングツールがエンジニアのワークフローと作業環境に与える影響を理解するため、Jellyfishプラットフォームを用いて変化の指標を分析する。主な指標は、Allocations、コーディング比率とPR比率の対比、ライフサイクルフェーズ、サイクルタイム、Jiraチケットサイズ、PRピックアップ時間、PRコメント、PRコメント数、インタラクション、コーディング言語から導出する。 Copilot利用者のコーディング時間の比率には有意な変化が見られ、平均で3%減少し、個人では最大15%減少した。チケットサイズは4社全体で平均16%減少し、あわせてサイクルタイムも8%減少したが、対照群には変化が見られなかった。さらに、Copilotの利用に伴ってPRプロセスも変化し、週あたりのレビュー済みPR数は一定のままだったにもかかわらず、コメントはより長く包括的になった。仮説として立てた変化のすべてが、全参加企業で観測されたわけではない。しかし、一部の企業ではPRピックアップ時間が最大33%短縮してワークフローのボトルネックが減少したことが示され、ある企業では最大17%の工数が保守・サポート業務から製品成長の取り組みへ移行した。本研究は、複数企業のデータを利用した初めての研究であり、単純な生産性や満足度の尺度を超えて、実際のエンジニアリング環境を考慮している。これにより、Copilotの利用から他社より大きな恩恵を受けている企業があること、また、エンジニアリング業務やワークフローの特定の側面ではなく集計値を調べる場合には変化が微妙になりうることを明らかにする。後者については今後さらに調査する。

To understand the impacts of AI-driven coding tools on engineers' workflow and work environment, we utilize the Jellyfish platform to analyze indicators of change. Key indicators are derived from Allocations, Coding Fraction vs. PR Fraction, Lifecycle Phases, Cycle Time, Jira ticket size, PR pickup time, PR comments, PR comment count, interactions, and coding languages. Significant changes were observed in coding time fractions among Copilot users, with an average decrease of 3% with individual decreases as large as 15%. Ticket sizes decreased by an average of 16% across four companies, accompanied by an 8% decrease in cycle times, whereas the control group showed no change. Additionally, the PR process evolved with Copilot usage, featuring longer and more comprehensive comments, despite the weekly number of PRs reviewed remaining constant. Not all hypothesized changes were observed across all participating companies. However, some companies experienced a decrease in PR pickup times by up to 33%, indicating reduced workflow bottlenecks, and one company experienced a shift of up to 17% of effort from maintenance and support work towards product growth initiatives. This study is the first to utilize data from more than one company and goes beyond simple productivity and satisfaction measures, considering real-world engineering settings instead. By doing so, we highlight that some companies seem to benefit more than others from the use of Copilot and that changes can be subtle when investigating aggregates rather than specific aspects of engineering work and workflows - something that will be further investigated in the future.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# アウトオブコンテキスト・プロンプティングは大規模言語モデルの予測における公平性と頑健性を高める

Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions ( http://arxiv.org/abs/2406.07685v1 )

ライセンス: Link先を確認

Leonardo Cotta, Chris J. Maddison,

(参考訳) フロンティア大規模言語モデル(LLM)は、重大な影響を伴う意思決定にますます利用されている。その一方で、これらのモデルは、幻覚や差別など、ユーザや社会の期待に反する予測を依然として繰り返し行っている。したがって、信頼性を向上させるテスト時の戦略を開発することが重要である。先行研究に着想を得て、我々は因果性を道具として用い、LLMの信頼性の2つの側面である公平性と頑健性を形式的に符号化する。この観点に立つと、モデルに公平または頑健であるよう明示的に指示する既存のテスト時の解決策は、暗黙のうちにLLMの因果推論能力に依存している。本研究では逆のアプローチを探る。 LLMに信頼性を明示的に求める代わりに、構成上より信頼性の高い予測をもたらす因果推論アルゴリズムを符号化するようにプロンプトを設計する。具体的には、LLMの公平性と頑健性を促すテスト時の解決策として、アウトオブコンテキスト・プロンプティングを提案する。アウトオブコンテキスト・プロンプティングは、タスクの因果モデルに関するユーザの事前知識を利用して(ランダムな)反事実的変換を適用し、モデルの信頼性を向上させる。実験的に、アウトオブコンテキスト・プロンプティングは、追加のデータ、ファインチューニング、事前学習を必要とせずに、5つの異なるベンチマークデータセットでフロンティアLLMの公平性と頑健性を一貫して改善することを示す。

Frontier Large Language Models (LLMs) are increasingly being deployed for high-stakes decision-making. On the other hand, these models are still consistently making predictions that contradict users' or society's expectations, e.g., hallucinating, or discriminating. Thus, it is important that we develop test-time strategies to improve their trustworthiness. Inspired by prior work, we leverage causality as a tool to formally encode two aspects of trustworthiness in LLMs: fairness and robustness. Under this perspective, existing test-time solutions explicitly instructing the model to be fair or robust implicitly depend on the LLM's causal reasoning capabilities. In this work, we explore the opposite approach. Instead of explicitly asking the LLM for trustworthiness, we design prompts to encode the underlying causal inference algorithm that will, by construction, result in more trustworthy predictions. Concretely, we propose out-of-context prompting as a test-time solution to encourage fairness and robustness in LLMs. Out-of-context prompting leverages the user's prior knowledge of the task's causal model to apply (random) counterfactual transformations and improve the model's trustworthiness. Empirically, we show that out-of-context prompting consistently improves the fairness and robustness of frontier LLMs across five different benchmark datasets without requiring additional data, finetuning or pre-training.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# AV-DiT:ジョイントオーディオ・ビデオ生成のための高能率オーディオ・ビジュアル・ディフュージョン変換器

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation ( http://arxiv.org/abs/2406.07686v1 )

ライセンス: Link先を確認

Kai Wang, Shijian Deng, Jing Shi, Dimitrios Hatzinakos, Yapeng Tian,

(参考訳) 最近のDiffusion Transformers (DiTs)は、画像、ビデオ、オーディオを含む高品質な単一モダリティコンテンツを生成する優れた能力を示している。しかし、Transformerベースの拡散モデルが、優れたマルチモーダルコンテンツ生成に向けてガウス雑音を効率的にデノイズできるかどうかは、まだ十分に検討されていない。このギャップを埋めるために、映像と音声の両トラックを備えた高品質でリアルなビデオを生成するために設計された、新しく効率的なオーディオ・ビジュアル拡散TransformerであるAV-DiTを導入する。モデルの複雑さと計算コストを最小限に抑えるため、AV-DiTは画像のみのデータで事前訓練された共有DiTバックボーンを使用し、新しく挿入された軽量なアダプタのみを訓練可能とする。この共有バックボーンは、オーディオとビデオの両方の生成を担う。具体的には、ビデオブランチは、凍結した事前訓練済みDiTブロックに訓練可能な時間的注意層を組み込み、時間的一貫性を実現する。さらに、少数の訓練可能なパラメータにより、画像ベースのDiTブロックをオーディオ生成に適応させる。軽量なパラメータを備えた追加の共有DiTブロックは、オーディオと視覚のモダリティ間の特徴相互作用を促進し、アライメントを確保する。 AIST++とLandscapeデータセットでの広範な実験により、AV-DiTは調整可能なパラメータが大幅に少ないにもかかわらず、オーディオ・ビジュアル同時生成において最先端の性能を達成することが示された。さらに、単一の共有画像生成バックボーンにモダリティ固有の適応を加えるだけで、オーディオ・ビデオ同時生成器を構築するのに十分であることを示した。ソースコードと事前訓練済みモデルは公開予定である。

Recent Diffusion Transformers (DiTs) have shown impressive capabilities in generating high-quality single-modality content, including images, videos, and audio. However, it is still under-explored whether the transformer-based diffuser can efficiently denoise the Gaussian noises towards superb multimodal content creation. To bridge this gap, we introduce AV-DiT, a novel and efficient audio-visual diffusion transformer designed to generate high-quality, realistic videos with both visual and audio tracks. To minimize model complexity and computational costs, AV-DiT utilizes a shared DiT backbone pre-trained on image-only data, with only lightweight, newly inserted adapters being trainable. This shared backbone facilitates both audio and video generation. Specifically, the video branch incorporates a trainable temporal attention layer into a frozen pre-trained DiT block for temporal consistency. Additionally, a small number of trainable parameters adapt the image-based DiT block for audio generation. An extra shared DiT block, equipped with lightweight parameters, facilitates feature interaction between audio and visual modalities, ensuring alignment. Extensive experiments on the AIST++ and Landscape datasets demonstrate that AV-DiT achieves state-of-the-art performance in joint audio-visual generation with significantly fewer tunable parameters. Furthermore, our results highlight that a single shared image generative backbone with modality-specific adaptations is sufficient for constructing a joint audio-video generator. Our source code and pre-trained models will be released.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# 敵対的マシンアンラーニング

Adversarial Machine Unlearning ( http://arxiv.org/abs/2406.07687v1 )

ライセンス: Link先を確認

Zonglin Di, Sixie Yu, Yevgeniy Vorobeychik, Yang Liu,

(参考訳) 本稿では、機械学習モデルから特定の訓練データの影響を取り除くことを目的とした、マシンアンラーニングの課題に焦点を当てる。従来、アンラーニングアルゴリズムの開発は、あるデータインスタンスが訓練に使用されたかどうかを判定するプライバシー上の脅威であるメンバシップ推論攻撃(MIA)の開発と並行して進められてきた。しかし、この2つの流れは密接に結びついており、削除されたデータに対するMIAの成功という観点からマシンアンラーニングを捉えることができる。この関係を踏まえ、アンラーニングアルゴリズムの設計にMIAを統合するゲーム理論的フレームワークを提案する。具体的には、アンラーニング問題をStackelbergゲームとしてモデル化する。このゲームでは、アンラーナーがモデルから特定の訓練データを忘却させようとする一方で、監査者がMIAを用いて、表向きは削除されたデータの痕跡を検出する。この敵対的な観点を採用することで、新たな攻撃手法の進展を活用でき、アンラーニングアルゴリズムの設計が容易になる。我々のフレームワークは2つの点で際立っている。第一に、敵対的なアプローチをとり、攻撃をアンラーニングアルゴリズムの設計に積極的に組み込む。第二に、陰関数微分を用いて攻撃者の成功を制限する勾配を得ることで、アンラーニングのプロセスに役立てる。マシンアンラーニングにおける提案手法の有効性を示す実験結果を示す。

This paper focuses on the challenge of machine unlearning, aiming to remove the influence of specific training data on machine learning models. Traditionally, the development of unlearning algorithms runs parallel with that of membership inference attacks (MIA), a type of privacy threat to determine whether a data instance was used for training. However, the two strands are intimately connected: one can view machine unlearning through the lens of MIA success with respect to removed data. Recognizing this connection, we propose a game-theoretic framework that integrates MIAs into the design of unlearning algorithms. Specifically, we model the unlearning problem as a Stackelberg game in which an unlearner strives to unlearn specific training data from a model, while an auditor employs MIAs to detect the traces of the ostensibly removed data. Adopting this adversarial perspective allows the utilization of new attack advancements, facilitating the design of unlearning algorithms. Our framework stands out in two ways. First, it takes an adversarial approach and proactively incorporates the attacks into the design of unlearning algorithms. Secondly, it uses implicit differentiation to obtain the gradients that limit the attacker's success, thus benefiting the process of unlearning. We present empirical results to demonstrate the effectiveness of the proposed approach for machine unlearning.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# AIラジオロジスト:畳み込みニューラルネットワークと臨床用GUIによる肝組織分節の革命

AI Radiologist: Revolutionizing Liver Tissue Segmentation with Convolutional Neural Networks and a Clinician-Friendly GUI ( http://arxiv.org/abs/2406.07688v1 )

ライセンス: Link先を確認

Ayman Al-Kababji, Faycal Bensaali, Sarada Prasad Dakua, Yassine Himeur,

(参考訳) 人工知能(AI)は、様々な分野や応用に浸透する幅広い研究トピックである。本研究では、肝組織のセグメンテーションにAI、特に畳み込みニューラルネットワーク(ConvNets)を活用する。また、ユーザフレンドリーなグラフィカルユーザインタフェース(GUI)ツールである「AI Radiologist」の開発にも重点を置いており、臨床医がさまざまな肝組織(実質、腫瘍、血管)を効果的に描出できるようにし、ひいては人命の救助につなげる。この取り組みは、学術研究と実践的・産業的応用のギャップを埋めるものである。 GUIはシングルページアプリケーションであり、PyQt5 Pythonフレームワークを使って設計されている。オフラインで利用できるAI Radiologistは、すべての肝組織をセグメンテーションするために訓練された3つのConvNetモデルを利用する。 Dice指標では、最良の肝臓ConvNetのスコアは98.16%、最良の腫瘍ConvNetのスコアは65.95%、最良の血管ConvNetのスコアは51.94%である。肝臓、腫瘍、血管の2Dスライスと、.objおよび.mtl形式の3D補間を出力し、これらは任意の3D対応ソフトウェアで可視化・印刷できる。したがって、AI Radiologistは、臨床医が組織セグメンテーションの最先端モデルを用いて肝組織のセグメンテーションと3D補間を行うための便利なツールを提供する。ボリュームと事前訓練済みモデルを選択する機能が提供されるため、臨床医は残りをAI Radiologistに委ねることができる。

Artificial Intelligence (AI) is a pervasive research topic, permeating various sectors and applications. In this study, we harness the power of AI, specifically convolutional neural networks (ConvNets), for segmenting liver tissues. We also focus on developing a user-friendly graphical user interface (GUI) tool, "AI Radiologist", enabling clinicians to effectively delineate different liver tissues (parenchyma, tumors, and vessels), thereby saving lives. This endeavor bridges the gap between academic research and practical, industrial applications. The GUI is a single-page application and is designed using the PyQt5 Python framework. The offline-available AI Radiologist resorts to three ConvNet models trained to segment all liver tissues. With respect to the Dice metric, the best liver ConvNet scores 98.16%, the best tumor ConvNet scores 65.95%, and the best vessel ConvNet scores 51.94%. It outputs 2D slices of the liver, tumors, and vessels, along with 3D interpolations in .obj and .mtl formats, which can be visualized/printed using any 3D-compatible software. Thus, the AI Radiologist offers a convenient tool for clinicians to perform liver tissue segmentation and 3D interpolation employing state-of-the-art models for tissue segmentation. With the provided capacity to select the volumes and pre-trained models, the clinicians can leave the rest to the AI Radiologist.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
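The Dice scores quoted in the abstract above compare a predicted segmentation mask with a ground-truth mask. As a minimal sketch (not the authors' implementation), the standard definition Dice = 2|A∩B| / (|A| + |B|) can be computed as follows. The function name `dice_coefficient`, the flat 0/1 mask representation, and the toy masks are assumptions made for illustration only.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A∩B| / (|A| + |B|) for two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement (a convention)
    return 2.0 * intersection / total


# Hypothetical toy masks: two of the three foreground voxels in each mask overlap.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(f"Dice: {dice_coefficient(pred, target):.4f}")
```

For the toy masks the overlap is 2 voxels out of 3 + 3 foreground voxels, so the score is 4/6, roughly 0.667. Small or thin structures lose a larger share of their voxels to boundary errors, which is one common reason vessel scores tend to sit below organ scores.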

# 教育におけるトランスフォーマーモデル:AraBART、MT5、AraT5、mBARTによるサイエンス教科書の要約

Transformer Models in Education: Summarizing Science Textbooks with AraBART, MT5, AraT5, and mBART ( http://arxiv.org/abs/2406.07692v1 )

ライセンス: Link先を確認

Sari Masri, Yaqeen Raddad, Fidaa Khandaqji, Huthaifa I. Ashqar, Mohammed Elhenawy,

(参考訳) 近年、技術分野の急速な発展と、インターネット上で利用できるテキスト量の増加により、情報の本質を損なうことなく内容を要約する形でテキストを処理・理解するための効果的なツールの開発が急務となっている。この課題を踏まえ、アラビア語の教科書を対象とする高度なテキスト要約システムを開発した。 MT5, AraBART, AraT5, mBART50などの現代的な自然言語処理モデルに基づき、本システムはパレスチナのカリキュラムにおける11年生および12年生の生物学教科書に含まれる最も重要な文を評価・抽出し、学生や教師が内容を容易に理解するための正確で有用な要約を得られるようにする。訓練されたモデルの性能評価にはROUGE指標を用いた。さらに、教育用教科書の執筆の専門家が、訓練されたモデルの出力を評価する。このアプローチは、最良の解決策を特定し、改善が必要な領域を明確にすることを目的としている。本研究はアラビア語テキストを要約するための解決策を提供する。アラビア語の理解と生成のための技術において、研究開発の新たな地平を開きうる結果を提供することで、この分野を豊かにする。さらに、教科書のテキストを作成・収集してデータセットを構築することで、アラビア語テキストに関する分野に貢献する。

Recently, with the rapid development in the fields of technology and the increasing amount of text available on the internet, it has become urgent to develop effective tools for processing and understanding texts in a way that summarizes the content without losing the fundamental essence of the information. Given this challenge, we have developed an advanced text summarization system targeting Arabic textbooks. Relying on modern natural language processing models such as MT5, AraBART, AraT5, and mBART50, this system evaluates and extracts the most important sentences found in biology textbooks for the 11th and 12th grades in the Palestinian curriculum, which enables students and teachers to obtain accurate and useful summaries that help them easily understand the content. We utilized the ROUGE metric to evaluate the performance of the trained models. Moreover, experts in educational textbook authoring assess the output of the trained models. This approach aims to identify the best solutions and clarify areas needing improvement. This research provides a solution for summarizing Arabic text. It enriches the field by offering results that can open new horizons for research and development in the technologies for understanding and generating the Arabic language. Additionally, it contributes to the field with Arabic texts through creating and compiling schoolbook texts and building a dataset.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
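The abstract above evaluates summaries with ROUGE. The sketch below shows only the simplest variant, ROUGE-1, as clipped unigram overlap between a candidate summary and a reference. It is an illustration under stated assumptions, not the evaluation code used in the paper: the function name `rouge1`, the whitespace tokenization (which ignores Arabic-specific normalization), and the English example sentences are all hypothetical.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the candidate drops one word ("basic") from the reference.
reference = "the cell is the basic unit of life"
candidate = "the cell is the unit of life"
p, r, f = rouge1(candidate, reference)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here all 7 candidate tokens appear in the 8-token reference, so precision is 1.0 and recall is 0.875. Real evaluations normally rely on an established ROUGE package with language-appropriate tokenization and also report ROUGE-2 and ROUGE-L.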

# YouTube、TikTok、その他2024年の麻疹のアウトブレイクに関する動画の感情分析のためのラベル付きデータセット

A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles ( http://arxiv.org/abs/2406.07693v1 )

ライセンス: Link先を確認

Nirmalya Thakur, Vanessa Su, Mingchen Shao, Kesha A. Patel, Hongseok Jeong, Victoria Knieling, Andrew Brian,

(参考訳) 本稿では、2024年1月1日から5月31日までにインターネット上の264のウェブサイトで公開された、進行中の麻疹の流行に関する4011本の動画のデータを含むデータセットを提示する。データセットはhttps://dx.doi.org/10.21227/40s8-xf63で公開されている。これらのウェブサイトは主にYouTubeとTikTokであり、それぞれ動画の48.6%と15.2%を占める。残りのウェブサイトには、InstagramとFacebookのほか、さまざまなグローバルおよびローカルな報道機関のウェブサイトが含まれる。各動画について、動画のURL、投稿のタイトル、投稿の説明、および動画の公開日が、データセット内の個別の属性として提示される。このデータセットを構築した後、動画タイトルと動画説明の感情分析(VADERを使用)、主観性分析(TextBlobを使用)、細粒度感情分析(DistilRoBERTa-baseを使用)を行った。これには、各動画タイトルと動画説明を、(i)感情クラス、すなわち肯定的、否定的、中立のいずれか、(ii)主観性クラス、すなわち意見性が高い(highly opinionated)、中立的(neutral opinionated)、最も低い(least opinionated)のいずれか、(iii)細粒度感情クラス、すなわち恐怖、驚き、喜び、悲しみ、怒り、嫌悪、中立のいずれかに分類することが含まれる。これらの結果は、この分野での感情分析や主観性分析を行う機械学習アルゴリズムの訓練とテスト、およびその他の応用のために、データセット内の個別の属性として提示される。最後に、本データセットを用いて調査しうる未解決の研究課題の一覧も提示する。

The work of this paper presents a dataset that contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. The dataset is available at https://dx.doi.org/10.21227/40s8-xf63. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. Finally, this paper also presents a list of open research questions that may be investigated using this dataset.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# 産業欠陥検出のためのベンチマークとモデル開発のための公開データセットのPRISMA駆動型システマティックレビュー

A PRISMA Driven Systematic Review of Publicly Available Datasets for Benchmark and Model Developments for Industrial Defect Detection ( http://arxiv.org/abs/2406.07694v1 )

ライセンス: Link先を確認

Can Akbas, Irem Su Arin, Sinan Onal,

(参考訳) 近年, 様々な産業における品質管理の進歩は, ビデオカメラと画像処理を統合し, 効果的な欠陥検出を実現している。進歩にとって重要な障壁は、注釈付き欠陥を含む包括的なデータセットの不足であり、自動欠陥検出モデルの開発と修正に不可欠である。この体系的なレビューは、2015年から2023年にかけて、15の公開データセットを特定し、ベンチマークとモデル開発の有効性と適用性を評価するために、それらを批判的に検証する。 NEU-CLS, NEU-DET, DAGM, KolektorSDD, PCB Defect Dataset, Hollow Cylindrical Defect Detection Datasetなどのデータセットには,画像品質, 欠陥型表現, 実世界の適用性など,それぞれ独自の長所と制限がある。この体系的なレビューの目的は、これらのデータセットを単一の場所にまとめることであり、そのような公開リソースを包括的な参照で探す研究者に提供することである。

Recent advancements in quality control across various industries have increasingly utilized the integration of video cameras and image processing for effective defect detection. A critical barrier to progress is the scarcity of comprehensive datasets featuring annotated defects, which are essential for developing and refining automated defect detection models. This systematic review, spanning from 2015 to 2023, identifies 15 publicly available datasets and critically examines them to assess their effectiveness and applicability for benchmarking and model development. Our findings reveal a diverse landscape of datasets, such as NEU-CLS, NEU-DET, DAGM, KolektorSDD, PCB Defect Dataset, and the Hollow Cylindrical Defect Detection Dataset, each with unique strengths and limitations in terms of image quality, defect type representation, and real-world applicability. The goal of this systematic review is to consolidate these datasets in a single location, providing researchers who seek such publicly available resources with a comprehensive reference.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# 音声表現のための持続的自己教師型学習

Sustainable self-supervised learning for speech representations ( http://arxiv.org/abs/2406.07696v1 )

ライセンス: Link先を確認

Luis Lugo, Valentin Vielzeuf,

(参考訳) 持続可能な人工知能は、データ、ハードウェア、アルゴリズムに焦点を当て、機械学習モデルを環境面でより責任あるものにする。特に、音声表現のための機械学習モデルは計算コストが高く、その高いエネルギー消費のために環境面の懸念を生じさせる。そこで本稿では、ニューラル層と学習の両面での最適化を組み合わせて計算コストを削減する、音声表現学習のための持続可能な自己教師ありモデルを提案する。提案モデルは、資源効率のよいベースラインを改善し、メモリ使用量と計算コストの推定値の両方を削減する。 1つのGPUを用いて1日未満で事前学習できる。それに加えて、下流タスクの評価においてベースラインの誤り率を改善する。大規模な音声表現アプローチと比較すると、メモリ使用量は1桁削減され、計算コストの削減はほぼ3桁の改善に相当する。

Sustainable artificial intelligence focuses on data, hardware, and algorithms to make machine learning models more environmentally responsible. In particular, machine learning models for speech representations are computationally expensive, generating environmental concerns because of their high energy consumption. Thus, we propose a sustainable self-supervised model to learn speech representation, combining optimizations in neural layers and training to reduce computing costs. The proposed model improves over a resource-efficient baseline, reducing both memory usage and computing cost estimations. It pretrains using a single GPU in less than a day. On top of that, it improves the error rate performance of the baseline in downstream task evaluations. When comparing it to large speech representation approaches, there is an order of magnitude reduction in memory usage, while computing cost reductions represent almost three orders of magnitude improvement.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# Label Smoothingがマシンアンラーニングを改善

Label Smoothing Improves Machine Unlearning ( http://arxiv.org/abs/2406.07698v1 )

ライセンス: Link先を確認

Zonglin Di, Zhaowei Zhu, Jinghan Jia, Jiancheng Liu, Zafar Takhirov, Bo Jiang, Yuanshun Yao, Sijia Liu, Yang Liu,

(参考訳) マシンアンラーニング(MU)の目的は、以前に学習したデータをモデルから除去することである。しかし、既存のMU技術を使用する場合、計算コストと性能のバランスをとることは困難である。ラベル平滑化がモデルの確信度と差分プライバシーに与える影響から着想を得て、ラベル平滑化の逆プロセスを用いた単純な勾配ベースのMUアプローチを提案する。本研究では、平滑化されたラベルを使用するシンプルなプラグアンドプレイ型MUアプローチであるUGradSLを導入する。ラベル平滑化を適切に導入することでMU性能が向上する理由を示す理論的分析を提供する。サイズやモダリティの異なる6つのデータセットで広範な実験を行い、提案手法の有効性と堅牢性を実証した。 MU性能の一貫した改善は、わずかな追加計算コストで得られる。例えば、UGradSLは、アンラーニング効率を犠牲にすることなく、勾配上昇法によるMUベースラインに対してアンラーニング精度を66%改善する。

The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of label smoothing. This work introduces UGradSL, a simple, plug-and-play MU approach that uses smoothed labels. We provide theoretical analyses demonstrating why properly introducing label smoothing improves MU performance. We conducted extensive experiments on six datasets of various sizes and different modalities, demonstrating the effectiveness and robustness of our proposed method. The consistent improvement in MU performance is only at a marginal cost of additional computations. For instance, UGradSL improves over the gradient ascent MU baseline by 66% unlearning accuracy without sacrificing unlearning efficiency.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11
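For readers unfamiliar with the building block named in the abstract above, standard label smoothing replaces a one-hot target y over K classes with (1 - α)·y + α/K. The sketch below shows only this standard operation. It does not reproduce UGradSL or the "inverse process" the abstract mentions, whose details are not given here. The function name `smooth_labels` and the value α = 0.1 are assumptions chosen for illustration.

```python
def smooth_labels(one_hot, alpha):
    """Standard label smoothing: (1 - alpha) * y + alpha / K for a one-hot vector y."""
    k = len(one_hot)
    return [(1.0 - alpha) * y + alpha / k for y in one_hot]


# Hypothetical 4-class example with the true class at index 2.
target = [0.0, 0.0, 1.0, 0.0]
smoothed = smooth_labels(target, alpha=0.1)
print(smoothed)
print(sum(smoothed))
```

With α = 0.1 and K = 4, the true class keeps probability 0.925 and each other class receives 0.025, so the smoothed target still sums to 1 while discouraging over-confident predictions.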

# CUPID: プロンプト条件付き画像分布の文脈的理解

CUPID: Contextual Understanding of Prompt-conditioned Image Distributions ( http://arxiv.org/abs/2406.07699v1 )

ライセンス: Link先を確認

Yayan Zhao, Mingwei Li, Matthew Berger,

(参考訳) 本稿では、プロンプト条件付き画像分布の文脈的理解のための可視化手法CUPIDを提案する。 CUPIDは、現代のテキスト・画像生成モデルが生成する分布の視覚的分析を対象とする。こうしたモデルでは、ユーザが自然言語でシーンを指定すると、それぞれがユーザの記述を満たすことを意図した一連の画像が生成される。 CUPIDは、結果として得られる分布の理解を助けるように設計されており、分析を容易にするために文脈的手がかり、すなわちプロンプトで言及されたオブジェクト、明示的には言及されていない新たに合成されたオブジェクト、およびそれらの潜在的な関係を用いる。 CUPIDの中心は高次元分布を可視化する新しい手法であり、画像内に見つかったオブジェクトの文脈化された埋め込みを、密度ベースの埋め込みによって低次元空間に写像する。このような埋め込みによって、分布内のオブジェクトの顕著なスタイルを発見できるだけでなく、異常なオブジェクトスタイルやまれなオブジェクトスタイルも識別できることを示す。さらに、条件付き密度埋め込みを導入し、与えられたオブジェクトで条件付けることで、分布内のオブジェクト間の依存関係を比較できるようにする。大規模拡散モデルが生成した画像分布の分析にCUPIDを用い、実験結果から、そのようなモデルにおける言語の誤解やオブジェクト構成のバイアスについての洞察が得られるとともに、典型的またはまれな合成シーンを発見するためのインターフェースも提供される。

We present CUPID: a visualization method for the contextual understanding of prompt-conditioned image distributions. CUPID targets the visual analysis of distributions produced by modern text-to-image generative models, wherein a user can specify a scene via natural language, and the model generates a set of images, each intended to satisfy the user's description. CUPID is designed to help understand the resulting distribution, using contextual cues to facilitate analysis: objects mentioned in the prompt, novel, synthesized objects not explicitly mentioned, and their potential relationships. Central to CUPID is a novel method for visualizing high-dimensional distributions, wherein contextualized embeddings of objects, those found within images, are mapped to a low-dimensional space via density-based embeddings. We show how such embeddings allow one to discover salient styles of objects within a distribution, as well as identify anomalous, or rare, object styles. Moreover, we introduce conditional density embeddings, whereby conditioning on a given object allows one to compare object dependencies within the distribution. We employ CUPID for analyzing image distributions produced by large-scale diffusion models, where our experimental results offer insights on language misunderstanding from such models and biases in object composition, while also providing an interface for discovery of typical, or rare, synthesized scenes.

翻訳日:2024-06-13 21:16:01 公開日:2024-06-11

# 細粒度分散状態によるスケーラブルなUTXOスマートコントラクト

Scalable UTXO Smart Contracts via Fine-Grained Distributed State ( http://arxiv.org/abs/2406.07700v1 )

ライセンス: Link先を確認

Massimo Bartoletti, Riccardo Marchesin, Roberto Zunino,

(参考訳) 現在のUTXOベースのスマートコントラクトは効率上のボトルネックに直面しており、コントラクトに送信されるすべてのトランザクションが、更新後のコントラクト状態全体を指定する必要がある。この要件は、コントラクト状態がマップのような動的なデータ構造を含む場合に特に負担となる。こうしたデータ構造は、ユーザとコントラクトのやり取りを追跡するために多くのユースケースで必要とされる。問題は2つある。一方では、トランザクション内の状態が大きいと、トランザクション手数料も大きくなる。他方では、大きな集中化された状態はトランザクションの並列化を妨げる。並列化は、アカウントベースのブロックチェーンと比較した場合のUTXOベースのブロックチェーンの主要なセールスポイントの1つであるはずである。本稿では、コントラクト状態を複数のUTXOに分散できるようにする、拡張UTXOブロックチェーン上でスマートコントラクトを効率的に実行する手法を提案する。これにより、トランザクションはアクセスする必要のある状態の一部のみを指定すればよくなり、サイズ(および手数料)が削減される。また、マルチコアCPU上でトランザクションの検証を並列化するために、我々のモデルを利用する方法を示す。本手法を実装し、その有効性を実証的に検証する。

Current UTXO-based smart contracts face an efficiency bottleneck, requiring any transaction sent to a contract to specify the entire updated contract state. This requirement becomes particularly burdensome when the contract state contains dynamic data structures, such as maps, which are needed in many use cases for tracking users' interactions with the contract. The problem is twofold: on the one hand, a large state in transactions implies a large transaction fee; on the other hand, a large centralized state is detrimental to the parallelization of transactions, which should be one of the main selling points of UTXO-based blockchains compared to account-based ones. We propose a technique to efficiently execute smart contracts on an extended UTXO blockchain, which allows the contract state to be distributed across multiple UTXOs. In this way, transactions only need to specify the part of the state they need to access, reducing their size (and fees). We also show how to exploit our model to parallelize the validation of transactions on multi-core CPUs. We implement our technique and provide an empirical validation of its effectiveness.

翻訳日:2024-06-13 21:06:17 公開日:2024-06-11

# サリエンシーに基づくモデル説明のグラフィカルな知覚

Graphical Perception of Saliency-based Model Explanations ( http://arxiv.org/abs/2406.07702v1 )

ライセンス: Link先を確認

Yayan Zhao, Mingwei Li, Matthew Berger,

(参考訳) 近年、深層学習に基づく予測モデルの説明、ひいては説明の評価方法に多くの研究が注がれてきた。評価手法の重要なクラスは人間中心のものであり、通常は可視化を通じて説明を伝えることを必要とする。可視化はモデル説明の知覚と理解において重要な役割を担っているが、可視化デザインが人間による説明の知覚にどのように影響するかは、まだよく分かっていない。本研究では、モデル説明のグラフィカルな知覚、特に視覚認識モデルに対するサリエンシーに基づく説明について検討する。可視化デザインが人間の知覚にどのように影響するかを調べるための実験デザインを提案し、その中でアライメント評価、すなわちサリエンシーマップが画像中のオブジェクトと一致しているかどうかを判断するタスクを扱う。結果から、可視化デザインの決定、アライメントの種類、サリエンシーマップの質に関連する要因がいずれも、人間がサリエンシーに基づく視覚的説明を知覚する上で重要な役割を担っていることが明らかとなった。

In recent years, considerable work has been devoted to explaining predictive, deep learning-based models, and in turn how to evaluate explanations. An important class of evaluation methods are ones that are human-centered, which typically require the communication of explanations through visualizations. And while visualization plays a critical role in perceiving and understanding model explanations, how visualization design impacts human perception of explanations remains poorly understood. In this work, we study the graphical perception of model explanations, specifically, saliency-based explanations for visual recognition models. We propose an experimental design to investigate how human perception is influenced by visualization design, wherein we study the task of alignment assessment, or whether a saliency map aligns with an object in an image. Our findings show that factors related to visualization design decisions, the type of alignment, and qualities of the saliency map all play important roles in how humans perceive saliency-based visual explanations.

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# オブジェクトレベルのシーンデオクルージョン

Object-level Scene Deocclusion ( http://arxiv.org/abs/2406.07706v1 )

ライセンス: Link先を確認

Zhengzhe Liu, Qing Liu, Chirui Chang, Jianming Zhang, Daniil Pakhomov, Haitian Zheng, Zhe Lin, Daniel Cohen-Or, Chi-Wing Fu,

(参考訳) シーン内のオブジェクトの隠れた部分を復元するデオクルージョンは、特に実世界のシーンを扱う場合、非常に困難なタスクである。本稿では、オブジェクトレベルのシーンデオクルージョンのための基盤モデルである、PACOと名付けた新しい自己教師ありのPArallel visible-to-COmplete拡散フレームワークを提案する。事前訓練済みモデルの豊富な事前知識を活用し、まず、複数の完全なオブジェクトを同時に符号化するフルビュー特徴マップを生成する並列変分オートエンコーダと、部分ビュー特徴マップおよび入力画像中の不完全なオブジェクトから抽出したテキストプロンプトからフルビュー特徴マップを暗黙的に予測することを学習するvisible-to-complete潜在生成器を設計する。 PACOを訓練するために、50万サンプルからなる大規模データセットを作成して自己教師あり学習を可能にし、アモーダルマスクや遮蔽領域の煩雑なアノテーションを回避する。推論時には、デオクルージョンの品質を維持しつつ効率を向上させるため、層単位のデオクルージョン戦略を考案する。 COCOAと様々な実世界のシーンでの広範な実験により、PACOがシーンデオクルージョンに優れた能力を持ち、最先端技術を大きく上回ることが示された。また、本手法は、訓練セットに含まれないクロスドメインのシーンや新規カテゴリにも拡張可能である。さらに、単視点3次元シーン再構成とオブジェクト再合成におけるPACOのデオクルージョンの適用可能性を示す。

Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a foundation model for object-level scene deocclusion. Leveraging the rich prior of pre-trained models, we first design the parallel variational autoencoder, which produces a full-view feature map that simultaneously encodes multiple complete objects, and the visible-to-complete latent generator, which learns to implicitly predict the full-view feature map from the partial-view feature map and text prompts extracted from the incomplete objects in the input image. To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning, avoiding tedious annotations of the amodal masks and occluded regions. At inference, we devise a layer-wise deocclusion strategy to improve efficiency while maintaining the deocclusion quality. Extensive experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the art by a large margin. Our method can also be extended to cross-domain scenes and novel categories that are not covered by the training set. Further, we demonstrate the deocclusion applicability of PACO in single-view 3D scene reconstruction and object recomposition.

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# YOLOv7に基づく建設作業員の完全な安全装備を検出するための深層学習手法

A Deep Learning Approach to Detect Complete Safety Equipment For Construction Workers Based On YOLOv7 ( http://arxiv.org/abs/2406.07707v1 )

ライセンス: Link先を確認

Md. Shariful Islam, SM Shaqib, Shahriar Sultan Ramit, Shahrun Akter Khushbu, Mr. Abdus Sattar, Dr. Sheak Rashed Haider Noor,

(参考訳) 建設部門では、作業員の安全確保が最も重要である。本研究では、ヘルメット、ゴーグル、ジャケット、手袋、履物など、建設作業員が着用する安全装備を識別するための深層学習に基づく手法を提示する。提案するアプローチは、YOLO v7(You Only Look Once)オブジェクト検出アルゴリズムを使用して、これらの安全装備の位置を正確に特定する。本研究で使用するデータセットは、訓練、テスト、検証セットに分割されたラベル付き画像で構成されている。各画像には、画像内の安全装備の位置を示すバウンディングボックスラベルがある。モデルは、反復的な訓練アプローチを通じて、ラベル付きデータセットに基づいて安全装備を識別し分類するように訓練される。このモデルの訓練にはカスタムデータセットを使用した。訓練されたモデルは、安全装備の認識において良好な適合率、再現率、F1スコアを示した。また、モデルの評価では、mAP@0.5スコア87.7%という有望な結果が得られた。モデルは効果的に動作し、建設現場における安全装備の違反を迅速に識別できる。結果の徹底的な評価により、モデルの利点が明らかになり、今後改善しうる領域が示される。本研究は、自動的かつ信頼性の高い安全装備検出手法を提供することで、コンピュータビジョンと職場安全の分野に貢献する。提案する深層学習に基づくアプローチは、安全コンプライアンスを高め、建設業界における事故リスクを低減する。

In the construction sector, ensuring worker safety is of the utmost significance. In this study, a deep learning-based technique is presented for identifying safety gear worn by construction workers, such as helmets, goggles, jackets, gloves, and footwear. The recommended approach uses the YOLO v7 (You Only Look Once) object detection algorithm to precisely locate these safety items. The dataset utilized in this work consists of labeled images split into training, testing, and validation sets. Each image has bounding box labels that indicate where the safety equipment is located within the image. The model is trained to identify and categorize the safety equipment based on the labeled dataset through an iterative training approach. We used a custom dataset to train this model. Our trained model performed admirably well, with good precision, recall, and F1-score for safety equipment recognition. Also, the model's evaluation produced encouraging results, with a mAP@0.5 score of 87.7%. The model performs effectively, making it possible to quickly identify safety equipment violations on building sites. A thorough evaluation of the outcomes reveals the model's advantages and points to potential areas for development. By offering an automatic and trustworthy method for safety equipment detection, this research makes a contribution to the fields of computer vision and workplace safety. The proposed deep learning-based approach will increase safety compliance and reduce the risk of accidents in the construction industry.

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
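The mAP@0.5 figure in the abstract above rests on intersection over union (IoU): a predicted bounding box is counted as a correct detection only if its IoU with a ground-truth box of the same class is at least 0.5. The sketch below computes IoU for two axis-aligned boxes. It is illustrative only and not taken from the paper: the `(x1, y1, x2, y2)` corner format, the function name `iou`, and the example boxes are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical helmet boxes: the prediction is shifted right by half the box width.
ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)
score = iou(ground_truth, prediction)
print(f"IoU = {score:.3f}, counts as a match at 0.5: {score >= 0.5}")
```

In this example the overlap is 50 square units and the union is 150, so the IoU is 1/3 and the prediction would not count as a match at the 0.5 threshold.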

# 分子設計のためのベイズ最適化における共通問題の診断と修正

Diagnosing and fixing common problems in Bayesian optimization for molecule design ( http://arxiv.org/abs/2406.07709v1 )

ライセンス: Link先を確認

Austin Tripp, José Miguel Hernández-Lobato,

(参考訳) ベイズ最適化(BO)は、分子設計タスクに対する原理的なアプローチである。本稿では、経験的性能の低下を引き起こしうるBOの3つの落とし穴、すなわち不適切な事前分布の幅、過度な平滑化、獲得関数の不十分な最大化について説明する。これらの問題に対処すれば、基本的なBOの設定でも、分子設計のためのPMOベンチマーク(Gao et al, 2022)で最高の総合性能を達成できることを示す。これらの結果は、分子のための機械学習コミュニティにおいて、BOがより多くの注目を集めることで恩恵を受ける可能性があることを示唆している。

Bayesian optimization (BO) is a principled approach to molecular design tasks. In this paper we explain three pitfalls of BO which can cause poor empirical performance: an incorrect prior width, over-smoothing, and inadequate acquisition function maximization. We show that with these issues addressed, even a basic BO setup is able to achieve the highest overall performance on the PMO benchmark for molecule design (Gao et al, 2022). These results suggest that BO may benefit from more attention in the machine learning for molecules community.

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# YOLOv8を利用した大都市圏の道路安全・交通管理向上のための車両速度検出システム

Vehicle Speed Detection System Utilizing YOLOv8: Enhancing Road Safety and Traffic Management for Metropolitan Areas ( http://arxiv.org/abs/2406.07710v1 )

ライセンス: Link先を確認

SM Shaqib, Alaya Parvin Alo, Shahriar Sultan Ramit, Afraz Ul Haque Rupak, Sadman Sadik Khan, Mr. Md. Sadekur Rahman,

(参考訳) 死亡者数と事故を減らして交通安全を確保するためには、車両の速度検出が不可欠である。車両速度の正確な監視によって可能となる速度制限の執行により、無謀な運転行為が抑止される。バングラデシュでは、交通事故が依然として主要な死因の1つである。バングラデシュ旅客福祉協会は2023年に、同年中に7,902人が交通事故で命を落としたと発表した。交通安全の維持には効率的な車両速度検出が不可欠である。信頼性の高い速度検出は重要な交通データの収集にも役立ち、交通流の最適化やより安全な道路インフラの提供を容易にする。 YOLOv8モデルは、綿密な監督のもとで訓練すると、動画中の車をより高速かつ高精度に認識・追跡できる。車両速度推定のための物体識別への教師あり学習の適用に関する知見を提供し、バングラデシュ特有の交通状況と安全上の懸念に焦点を当てることで、本研究はこの分野に注目すべき貢献をなす。モデルの予測速度と、スピードメーターで計測した実際の速度(グラウンドトゥルース)との間のMAEは3.5、RMSEは4.22であった。提案手法は、さまざまな交通状況における効率の向上とより広い適用性が期待でき、従来の手法に代わる経済的に実行可能な選択肢を提供する。

In order to ensure traffic safety through a reduction in fatalities and accidents, vehicle speed detection is essential. Relentless driving practices are discouraged by the enforcement of speed restrictions, which are made possible by accurate monitoring of vehicle speeds. Road accidents remain one of the leading causes of death in Bangladesh. The Bangladesh Passenger Welfare Association stated in 2023 that 7,902 individuals lost their lives in traffic accidents during the course of the year. Efficient vehicle speed detection is essential to maintaining traffic safety. Reliable speed detection can also help gather important traffic data, which makes it easier to optimize traffic flow and provide safer road infrastructure. The YOLOv8 model can recognize and track cars in videos with greater speed and accuracy when trained under close supervision. By providing insights into the application of supervised learning in object identification for vehicle speed estimation and concentrating on the particular traffic conditions and safety concerns in Bangladesh, this work represents a noteworthy contribution to the area. The MAE was 3.5 and the RMSE was 4.22 between the predicted speed of our model and the actual speed, i.e., the ground truth measured by the speedometer. Promising increased efficiency and wider applicability in a variety of traffic conditions, the suggested solution offers a financially viable substitute for conventional approaches.

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11
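The MAE and RMSE values reported in the abstract above summarize how far predicted speeds fall from speedometer readings. The following sketch shows how the two metrics are defined. The speed values are invented for illustration and are not data from the paper, and the function names `mae` and `rmse` are assumptions.

```python
import math


def mae(predicted, actual):
    """Mean absolute error: average of |prediction - ground truth|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical speeds (same unit for both lists); errors are 2, -3, 1, -4.
predicted_speed = [42.0, 55.0, 61.0, 38.0]
speedometer = [40.0, 58.0, 60.0, 42.0]
print(f"MAE  = {mae(predicted_speed, speedometer):.3f}")
print(f"RMSE = {rmse(predicted_speed, speedometer):.3f}")
```

For these toy values the MAE is 2.5 and the RMSE is about 2.739. RMSE is never smaller than MAE, and a wide gap between the two indicates that a few large errors dominate.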

# 損失勾配ガウス幅に基づく一般化と最適化保証

Loss Gradient Gaussian Width based Generalization and Optimization Guarantees ( http://arxiv.org/abs/2406.07712v1 )

ライセンス: Link先を確認

Arindam Banerjee, Qiaobo Li, Yingxue Zhou,

(参考訳) 機械学習における母集団損失に対する汎化と最適化の保証は、しばしば一様収束に基づく解析に依拠しており、典型的には予測器のラデマッハ複雑度に基づく。現代のモデルの豊かな表現力は、このアプローチに対する懸念を招いている。本稿では、Loss Gradient Gaussian Width (LGGW)によって測定される勾配の複雑さの観点から、汎化と最適化の保証を示す。第一に、柔軟な勾配支配条件の下で、LGGWを直接用いた汎化保証を導入する。この条件が深層モデルで成り立つことを実証的に示す。第二に、有限和(確率的)最適化におけるサンプルの再利用は、LGGWが小さい限り、経験的勾配を母集団勾配から逸脱させないことを示す。第三に、深層ネットワークに着目し、緩やかな仮定の下でLGGWを上から抑える方法を示す。特に、LGGWは、(a) 損失ヘッセ行列の固有値の$L_2$ノルムによって抑えられること(これは一般的に使用される深層モデルに対して$\tilde{O}(1)$であることが実証的に示されている)、および (b) 特徴抽出器、すなわち最後から2番目の層の出力のガウス幅によって抑えられることを示す。我々の知る限り、LGGWによる汎化と最適化の保証はこの種の最初の結果であり、予測器のラデマッハ複雑度に基づく解析の落とし穴を回避し、深層モデルに対する定量的にタイトな境界に向けて大きな可能性を持つ。

Generalization and optimization guarantees on the population loss in machine learning often rely on uniform convergence based analysis, typically based on the Rademacher complexity of the predictors. The rich representation power of modern models has led to concerns about this approach. In this paper, we present generalization and optimization guarantees in terms of the complexity of the gradients, as measured by the Loss Gradient Gaussian Width (LGGW). First, we introduce generalization guarantees directly in terms of the LGGW under a flexible gradient domination condition, which we demonstrate to hold empirically for deep models. Second, we show that sample reuse in finite sum (stochastic) optimization does not make the empirical gradient deviate from the population gradient as long as the LGGW is small. Third, focusing on deep networks, we present results showing how to bound their LGGW under mild assumptions. In particular, we show that their LGGW can be bounded (a) by the $L_2$-norm of the loss Hessian eigenvalues, which has been empirically shown to be $\tilde{O}(1)$ for commonly used deep models; and (b) in terms of the Gaussian width of the featurizer, i.e., the output of the last-but-one layer. To our knowledge, our generalization and optimization guarantees in terms of LGGW are the first results of their kind, avoid the pitfalls of predictor Rademacher complexity based analysis, and hold considerable promise towards quantitatively tight bounds for deep models.
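For readers unfamiliar with the underlying quantity: the Gaussian width of a set $A \subset \mathbb{R}^d$ is $w(A) = \mathbb{E}_{g}\left[\sup_{a \in A} \langle a, g \rangle\right]$ with $g \sim \mathcal{N}(0, I_d)$, and the LGGW applies this measure to a set of loss gradients. The sketch below estimates the width of a small finite set by Monte Carlo. The toy "gradient" vectors, the function name, and the sample count are assumptions for illustration only; the paper's actual gradient sets and bounds are not reproduced here.

```python
import random

def gaussian_width(vectors, num_samples=20000, seed=0):
    """Monte Carlo estimate of E_g[max_a <a, g>] over a finite set of vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    total = 0.0
    for _ in range(num_samples):
        # Draw one standard Gaussian direction g in R^dim.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # The supremum over a finite set is just a maximum of inner products.
        total += max(sum(a_i * g_i for a_i, g_i in zip(a, g)) for a in vectors)
    return total / num_samples

# Toy stand-in for a set of loss gradients (hypothetical values).
toy_gradients = [[1.0, 0.0], [-1.0, 0.0]]
print("estimated width:", gaussian_width(toy_gradients))
```

For this symmetric two-point set the exact width is $\mathbb{E}|g_1| = \sqrt{2/\pi} \approx 0.80$, which gives a quick sanity check on the estimator; a single-point set has width zero.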

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# LLAMAFUZZ: 大規模言語モデルによるGreybox Fuzzingの拡張

LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing ( http://arxiv.org/abs/2406.07714v1 )

ライセンス: Link先を確認

Hongxiang Zhang, Yuyang Rong, Yifeng He, Hao Chen,

(参考訳) Greyboxのファジィは、プログラムのバグや脆弱性を明らかにすることに成功している。しかし、ランダム化された突然変異戦略は、構造データに対するファジィザの性能を制限している。特殊なファジィザは複雑な構造化データを扱うことができるが、文法にさらなる努力が必要であり、低スループットに悩まされる。本稿では,構造化データに対するグレーボックスファジングを強化するために,Large Language Modelを活用する可能性について検討する。我々は、データ変換とフォーマットに関するLLMの事前学習知識を利用して、新しい有効な入力を生成する。さらに、ペアになった変異シードを用いて微調整を行い、構造化形式と突然変異戦略を効果的に学習した。 LLMベースのファザーであるLLAMAFUZZは、LLMのパワーをファジングに統合して、構造化データを理解し、変異させる。我々は,標準的なバグベースのベンチマークMagmaと,さまざまな実世界のプログラムで実験を行う。 LLAMAFUZZは、平均して41のバグでトップのライバルより優れています。また、すべてのトライアルで47のユニークなバグを特定しました。さらに、LLAMAFUZZはバグトリガとバグ到達の両方で一貫したパフォーマンスを示した。 AFL++と比較すると、LLAMAFUZZは現実世界のプログラムセットで平均27.19%多くの分岐を達成した。また、コードカバレッジの観点からLLMがファジィ処理をどのように強化するかを説明するためのケーススタディも紹介する。

Greybox fuzzing has achieved success in revealing bugs and vulnerabilities in programs. However, randomized mutation strategies have limited the fuzzer's performance on structured data. Specialized fuzzers can handle complex structured data, but require additional effort to write grammars and suffer from low throughput. In this paper, we explore the potential of utilizing a Large Language Model (LLM) to enhance greybox fuzzing for structured data. We utilize the LLM's pre-trained knowledge of data conversion and formats to generate new valid inputs. We further fine-tune it with paired mutation seeds to learn structured formats and mutation strategies effectively. Our LLM-based fuzzer, LLAMAFUZZ, integrates the power of LLMs to understand and mutate structured data into fuzzing. We conduct experiments on the standard bug-based benchmark Magma and a wide variety of real-world programs. LLAMAFUZZ outperforms our top competitor by 41 bugs on average. We also identified 47 unique bugs across all trials. Moreover, LLAMAFUZZ demonstrated consistent performance on both bugs triggered and bugs reached. Compared to AFL++, LLAMAFUZZ achieved 27.19% more branches in real-world program sets on average. We also present a case study to explain how LLMs enhance the fuzzing process in terms of code coverage.
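The coverage-guided loop that greybox fuzzers build on can be sketched as follows. This is a generic illustration, not LLAMAFUZZ itself: `toy_target`, its branch names, and `random_byte_mutation` are hypothetical. The `mutate` argument marks the place where, according to the abstract, an LLM-based mutator for structured data would replace purely random mutation.

```python
import random

def toy_target(data: bytes) -> set:
    """Hypothetical program under test: returns the IDs of the branches it executed."""
    covered = {"entry"}
    if len(data) >= 1 and data[0] == ord("F"):
        covered.add("magic_1")
        if len(data) >= 2 and data[1] == ord("Z"):
            covered.add("magic_2")
    return covered

def random_byte_mutation(seed: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    if not seed:
        return bytes([rng.randrange(256)])
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def greybox_fuzz(target, seeds, mutate, iterations, rng):
    """Keep every mutated input that reaches a branch not seen before."""
    queue = list(seeds)
    global_coverage = set()
    for seed in queue:
        global_coverage |= target(seed)
    for _ in range(iterations):
        parent = rng.choice(queue)
        child = mutate(parent, rng)
        coverage = target(child)
        if not coverage <= global_coverage:   # child exercised something new
            global_coverage |= coverage
            queue.append(child)
    return queue, global_coverage

queue, coverage = greybox_fuzz(toy_target, [b"AA"], random_byte_mutation,
                               iterations=20000, rng=random.Random(0))
print(sorted(coverage), len(queue))
```

Random byte flips work on this two-byte toy format, but each additional format constraint multiplies the number of attempts needed, which is the limitation on structured data that the abstract describes.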

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# 脳のコインフリップ:神経集合を用いた統計的学習

Coin-Flipping In The Brain: Statistical Learning with Neuronal Assemblies ( http://arxiv.org/abs/2406.07715v1 )

ライセンス: Link先を確認

Max Dabagia, Daniel Mitropolsky, Christos H. Papadimitriou, Santosh S. Vempala,

(参考訳) 脳からインテリジェンスが発生するかは科学の中心的な問題である。知性の重要な側面は、不確実性に対処することである -- 環境に関する優れた予測を発達させ、これらの予測を判断に転換すること。脳自身は、発達と神経活動を駆動する化学プロセスから刺激に対する反応のバラツキを試すまでの多くのレベルでうるさい。一つの仮説は、脳のメカニズムに固有のノイズが、世界のモデルからサンプリングされ、予測を発生させることである。この仮説をテストするために、我々は、スタイリングされたニューロンやシナプス、可塑性、抑制に基づく、脳の生物学的に妥当な計算モデルであるNEMOにおける統計的学習の出現について研究し、調整された発火が位置、概念、記憶、その他の認知の項目を思い出させるために、組み立てられたニューロンのグループであるアセンブリを発生させる。理論とシミュレーションにおいて、アセンブリ間の接続が統計を記録し、周囲ノイズを利用してアセンブリ間の確率的選択を行うことが示されている。これによりNEMOは、マルコフ連鎖のような内部モデルを作成することができる。本研究は, 生物学的に妥当な確率的計算の基礎を提供し, 雑音が脳の認知メカニズムの有用な構成要素であるという仮説を理論的に裏付けるものである。

How intelligence arises from the brain is a central problem in science. A crucial aspect of intelligence is dealing with uncertainty -- developing good predictions about one's environment, and converting these predictions into decisions. The brain itself seems to be noisy at many levels, from chemical processes which drive development and neuronal activity to trial variability of responses to stimuli. One hypothesis is that the noise inherent to the brain's mechanisms is used to sample from a model of the world and generate predictions. To test this hypothesis, we study the emergence of statistical learning in NEMO, a biologically plausible computational model of the brain based on stylized neurons and synapses, plasticity, and inhibition, and giving rise to assemblies -- a group of neurons whose coordinated firing is tantamount to recalling a location, concept, memory, or other primitive item of cognition. We show in theory and simulation that connections between assemblies record statistics, and ambient noise can be harnessed to make probabilistic choices between assemblies. This allows NEMO to create internal models such as Markov chains entirely from the presentation of sequences of stimuli. Our results provide a foundation for biologically plausible probabilistic computation, and add theoretical support to the hypothesis that noise is a useful component of the brain's mechanism for cognition.
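The idea of learning a Markov chain purely from presented sequences, then using noise to choose between successors, can be illustrated with plain transition counts. This is a conventional statistical sketch and not the NEMO model: the counts only play the role that the abstract attributes to connections between assemblies, and the stimulus sequences and function names are hypothetical.

```python
import random
from collections import defaultdict

def learn_transition_counts(sequences):
    """Record how often each symbol is followed by each other symbol."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts

def transition_probabilities(counts, state):
    """Normalize the counts out of `state` into a probability distribution."""
    total = sum(counts[state].values())
    return {nxt: c / total for nxt, c in counts[state].items()}

def sample_next(counts, state, rng):
    """Pick a successor of a previously seen `state` at random, weighted by counts."""
    successors = list(counts[state].keys())
    weights = [counts[state][s] for s in successors]
    return rng.choices(successors, weights=weights, k=1)[0]

# Hypothetical stimulus sequences.
stimuli = [["A", "B", "A", "C"], ["A", "B", "A", "B"]]
counts = learn_transition_counts(stimuli)
print(transition_probabilities(counts, "A"))   # B follows A in 3 of 4 cases
print(sample_next(counts, "A", random.Random(0)))
```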

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# 高度化昆虫検出のための移動学習モデルのパワーを開放する:進化的昆虫分類

Unleashing the Power of Transfer Learning Model for Sophisticated Insect Detection: Revolutionizing Insect Classification ( http://arxiv.org/abs/2406.07716v1 )

ライセンス: Link先を確認

Md. Mahmudul Hasan, SM Shaqib, Ms. Sharmin Akter, Rabiul Alam, Afraz Ul Haque, Shahrun akter khushbu,

(参考訳) 作物・植物健康のための昆虫検出システムの目的は、農業地帯における昆虫の寄生虫の発見と発見を目立たせることである。コンピュータービジョンや機械学習などの最先端技術を活用して、有害昆虫を迅速かつ正確に識別する。これにより、作物を救い、最適な植物の健康を維持することができる。本研究は,データ取得,前処理,データ分割,モデル実装,モデル評価を含む。この研究ではMobileNetV2、ResNet152V2、Xception、Custom CNNといった異なるモデルが使用された。昆虫の写真を分類するために,ResNet152V2アーキテクチャに基づく畳み込みニューラルネットワーク(CNN)を構築し,評価した。 ResNet152V2は、99%のトレーニング精度と97%のテスト精度を達成した。この結果は、昆虫の分類と昆虫学研究における現実世界の応用の可能性を強調し、効率と精度を強調した。食料の安全を確保し、世界の農業生産を維持するためには、昆虫の発見が不可欠である。 ResNet152V2モデルのようなカットエッジ技術は、昆虫の識別の自動化と精度の向上に大きな影響を与えている。効率的な昆虫検出は作物の損失を最小限に抑えるだけでなく、農業の生産性を高め、持続可能な食料生産に寄与する。このことは、グローバルな食料安全保障に関わる課題に対処する上で、テクノロジーが重要な役割を担っていることを裏付けている。

The purpose of the Insect Detection System for Crop and Plant Health is to monitor farming areas for insect infestations and identify them. By utilizing cutting-edge technology like computer vision and machine learning, the system seeks to identify hazardous insects early and accurately. This would enable prompt response to save crops and maintain optimal plant health. The method of this study includes data acquisition, preprocessing, data splitting, model implementation, and model evaluation. Four models were used in this study: MobileNetV2, ResNet152V2, Xception, and a custom CNN. In order to categorize insect photos, a Convolutional Neural Network (CNN) based on the ResNet152V2 architecture is constructed and evaluated in this work. Achieving 99% training accuracy and 97% testing accuracy, ResNet152V2 demonstrates superior performance among the four implemented models. The results highlight its potential for real-world applications in insect classification and entomology studies, emphasizing efficiency and accuracy. To ensure food security and sustain agricultural output globally, finding insects is crucial. Cutting-edge technology, such as ResNet152V2 models, greatly influences automating and improving the accuracy of insect identification. Efficient insect detection not only minimizes crop losses but also enhances agricultural productivity, contributing to sustainable food production. This underscores the pivotal role of technology in addressing challenges related to global food security.

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# 離散時間におけるアクティブ推論の簡潔な数学的記述

A Concise Mathematical Description of Active Inference in Discrete Time ( http://arxiv.org/abs/2406.07726v1 )

ライセンス: Link先を確認

Jesse van Oostrum, Carlotta Langer, Nihat Ay,

(参考訳) 本稿では,離散時間における能動推論の簡潔な数学的記述について述べる。本論文の主部は,行動選択理論の具体例を含む,このトピックの一般的な紹介として機能する。付録では、より微妙な数学的詳細が議論されている。この部分は、既に活発な推論文学を研究しているが、数学的詳細や導出を理解するのに苦労している読者を対象としている。写本全体を通して、標準的な数学的テキストと正確かつ一致した表記法を採用することに特に注意が払われている。すべての方程式と導出は、トピック上の他の人気のあるテキストの特定の方程式数に関連付けられている。さらに,本論文で記述したアクション選択機構を実装し,pymdp環境と互換性を持つPythonコードも提供される。

In this paper we present a concise mathematical description of active inference in discrete time. The main part of the paper serves as a general introduction to the topic, including an example illustrating the theory on action selection. In the appendix the more subtle mathematical details are discussed. This part is aimed at readers who have already studied the active inference literature but struggle to make sense of the mathematical details and derivations. Throughout the whole manuscript, special attention has been paid to adopting notation that is both precise and in line with standard mathematical texts. All equations and derivations are linked to specific equation numbers in other popular texts on the topic. Furthermore, Python code is provided that implements the action selection mechanism described in this paper and is compatible with pymdp environments.

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# 効率的な並列マルチホップ推論:知識グラフ解析のためのスケーラブルなアプローチ

Efficient Parallel Multi-Hop Reasoning: A Scalable Approach for Knowledge Graph Analysis ( http://arxiv.org/abs/2406.07727v1 )

ライセンス: Link先を確認

Jesmin Jahan Tithi, Fabio Checconi, Fabrizio Petrini,

(参考訳) マルチホップ推論(MHR、Multi-hop reasoning)は、人工知能と自然言語処理におけるプロセスであり、システムは結論または答えに到達するために複数の推論ステップを行う必要がある。知識グラフやデータベースのコンテキストでは、複雑なクエリを理解したり、より深い理解を必要とするタスクを実行するために、複数のリンクされたエンティティや関係をトラバースする。マルチホップ推論は、質問応答、知識ベース補完、リンク予測など、様々なアプリケーションにおいて重要な機能である。人工知能、機械学習、グラフ分析に多大な関心を寄せている。本稿では,大規模グラフ上での時間効率の最適化に焦点をあて,直交目標である精度の従来の重視から逸脱する。本稿では,知識グラフ内の頂点間の上位K経路を効率よく識別し,3つのホップクエリの最適解を求めるために,ドメイン固有の学習埋め込みを利用する並列アルゴリズムを提案する。 1) MHRの性能, スケーラビリティ, 効率を向上させるための新しい並列アルゴリズムを提案する。 2) 先進的なIntelおよびAMDアーキテクチャにおけるアルゴリズムの優れた性能を実証実験により示す。本稿では,深層学習におけるチューリング賞の学術的関連性を特定するためのケーススタディを通じて,アルゴリズムの実践性を実証し,複雑な実体関係を扱う能力を強調した。これは、現代の知識グラフの複雑さの増大をナビゲートするのに有用な、高性能なMHRを実現するための我々のアプローチの可能性を示すものである。

Multi-hop reasoning (MHR) is a process in artificial intelligence and natural language processing where a system needs to make multiple inferential steps to arrive at a conclusion or answer. In the context of knowledge graphs or databases, it involves traversing multiple linked entities and relationships to understand complex queries or perform tasks requiring a deeper understanding. Multi-hop reasoning is a critical function in various applications, including question answering, knowledge base completion, and link prediction. It has garnered significant interest in artificial intelligence, machine learning, and graph analytics. This paper focuses on optimizing MHR for time efficiency on large-scale graphs, diverging from the traditional emphasis on accuracy which is an orthogonal goal. We introduce a novel parallel algorithm that harnesses domain-specific learned embeddings to efficiently identify the top K paths between vertices in a knowledge graph to find the best answers to a three-hop query. Our contributions are: (1) We present a new parallel algorithm to enhance MHR performance, scalability and efficiency. (2) We demonstrate the algorithm's superior performance on leading-edge Intel and AMD architectures through empirical results. We showcase the algorithm's practicality through a case study on identifying academic affiliations of potential Turing Award laureates in Deep Learning, highlighting its capability to handle intricate entity relationships. This demonstrates the potential of our approach to enabling high-performance MHR, useful to navigate the growing complexity of modern knowledge graphs.

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# 素因数分解問題に対するD波量子アニールの実験

Experimenting with D-Wave Quantum Annealers on Prime Factorization problems ( http://arxiv.org/abs/2406.07732v1 )

ライセンス: Link先を確認

Jingwen Ding, Giuseppe Spallitta, Roberto Sebastiani,

(参考訳) この論文は、我々が最近発表した論文の上に構築されており、量子アニールによる素因数分解(PF)に対する新しいアプローチを提案しており、そこでは8,219,999=32,749x251が分解可能な最大の素数積である。しかし、これらの結果に繋がる一連のアニール実験は、直線的な経路をたどるものではなく、失敗または部分的に失敗する試みとバックトラックに満ちた、複雑な試行錯誤プロセスに関係していたため、最終的に成功したアニール戦略を見つけることができたのです。本稿では、実験的な意思決定の背後にある理由を掘り下げ、その結果を達成できる最終戦略を思いつく前に、私たちが行った試みのいくつかについて説明します。これはまた、私たちが調査した多くのアイデア、テクニック、戦略を含んでいます。最終的に私たちが採用したものは、D-Waveのユーザや実践者の、より専門化されたオーディエンスに洞察を与えることができます。 ($i$) 異なる初期化技術がパフォーマンスに影響を与えることを示し、そのうちの1つは、局所的な構造的埋め込みをターゲットとする場合のフラックスバイアスが効果的であること、($ii$) 連鎖強度は、グローバルな埋め込みに依存する問題よりも局所的な構造的埋め込みに低い影響を持つこと、($iii$) 壊れたチェーンと励起されたCFAの間にトレードオフがあること、そして、単一のキュービットの代わりにモジュールをベースとした漸進的なオフセット救済アプローチが提案されている。このように、私たちの経験の詳細を共有することで、進化を続ける量子アニールの風景についての洞察を提供し、人々がD-Wave量子アニールにアクセスし、効果的に利用することを目指している。

This paper builds on a paper we published very recently, in which we proposed a novel approach to prime factorization (PF) by quantum annealing, where 8,219,999=32,749x251 was the highest prime product we were able to factorize -- which, to the best of our knowledge, is the largest number ever factorized by means of a quantum device. The series of annealing experiments which led us to these results, however, did not follow a straight-line path; rather, they involved a convoluted trial-and-error process, full of failed or partially-failed attempts and backtracks, which only in the end drove us to find the successful annealing strategies. In this paper, we delve into the reasoning behind our experimental decisions and provide an account of some of the attempts we made before conceiving the final strategies that allowed us to achieve the results. This also involves a number of ideas, techniques, and strategies we investigated which, although they turned out to be inferior with respect to those we adopted in the end, may still provide insights to a more specialized audience of D-Wave users and practitioners. In particular, we show the following insights: ($i$) different initialization techniques affect performance, among which flux biases are effective when targeting locally-structured embeddings; ($ii$) chain strengths have a lower impact in locally-structured embeddings compared to problems relying on global embeddings; ($iii$) there is a trade-off between broken chains and excited CFAs, suggesting an incremental annealing offset remedy approach based on the modules instead of single qubits. Thus, by sharing the details of our experiences, we aim to provide insights into the evolving landscape of quantum annealing, and help people access and effectively use D-Wave quantum annealers.
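The factorization quoted in this abstract is easy to check classically, and the check also shows how factoring can be phrased as a minimization problem. The squared-residual cost below is a common textbook formulation and is used here only as an assumption for illustration; the abstract does not describe the authors' actual encoding, so this should not be read as their method.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for numbers of this size."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def factorization_cost(n: int, p: int, q: int) -> int:
    """Zero exactly when p * q == n; an optimizer would search for such a minimum."""
    return (n - p * q) ** 2

N = 8_219_999          # number reported in the abstract
P, Q = 32_749, 251     # factors reported in the abstract

print(P * Q == N, is_prime(P), is_prime(Q))
print(factorization_cost(N, P, Q), factorization_cost(N, P, Q + 2))
```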

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# REALサンプリング:漸近エントロピーによるオープンエンデッドジェネレーションの現実性と多様性を高める

REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy ( http://arxiv.org/abs/2406.07735v1 )

ライセンス: Link先を確認

Haw-Shiuan Chang, Nanyun Peng, Mohit Bansal, Anil Ramakrishna, Tagyoung Chung,

(参考訳) 大規模言語モデル(LLM)の復号法は通常、事実性の確保と多様性の維持のトレードオフに苦慮する。例えば、核内の高pしきい値(トップp)のサンプリングは多様性を増すが、事実性を低下させる。本稿では,適応しきい値の$p$を予測して,実効性と核サンプリングの多様性を向上させる復号法であるREAL(Residual Entropy from Asymptotic Line)サンプリングを提案する。具体的には、REAL サンプリングは LLM が幻覚するステップワイドな確率を予測し、 LLM が幻覚する確率の p 閾値を下げる。そうでなければ、REALサンプリングは多様性を高めるためにpしきい値を増加させる。本研究では,次のトークンの漸近エントロピー(すなわち固有の不確実性)を,異なる大きさのLCMから次トーケンエントロピーを外挿することによって予測する,トークンレベルの幻覚予測(THF)モデルを構築した。 LLMのエントロピーが漸近エントロピーよりも高い場合、THFモデルは高い幻覚障害を予測し、REALサンプリングではp閾値が低い。 FactualityPromptsベンチマークでは,70M THFモデルに基づくREALサンプリングが,検索基準と人的評価の両方から,7B LLMの事実と多様性を同時に改善できることが示されている。対照的な復号法と組み合わせて、REALサンプリングは9つのサンプリング方法より優れ、グリーディサンプリングよりも現実的で、$p=0.5$の核サンプリングよりも多種多様であるテキストを生成する。さらに、予測された漸近性エントロピーは幻覚検出タスクに有用な教師なし信号である。

Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity. For example, a higher p threshold in nucleus (top-p) sampling increases the diversity but decreases the factuality, and vice versa. In this paper, we propose REAL (Residual Entropy from Asymptotic Line) sampling, a decoding method that achieves improved factuality and diversity over nucleus sampling by predicting an adaptive threshold of $p$. Specifically, REAL sampling predicts the step-wise likelihood of an LLM to hallucinate, and lowers the p threshold when an LLM is likely to hallucinate. Otherwise, REAL sampling increases the p threshold to boost the diversity. To predict the step-wise hallucination likelihood without supervision, we construct a Token-level Hallucination Forecasting (THF) model to predict the asymptotic entropy (i.e., inherent uncertainty) of the next token by extrapolating the next-token entropies from a series of LLMs with different sizes. If an LLM's entropy is higher than the asymptotic entropy (i.e., the LLM is more uncertain than it should be), the THF model predicts a high hallucination hazard, which leads to a lower p threshold in REAL sampling. In the FactualityPrompts benchmark, we demonstrate that REAL sampling based on a 70M THF model can substantially improve the factuality and diversity of 7B LLMs simultaneously, judged by both retrieval-based metrics and human evaluation. When combined with contrastive decoding, REAL sampling outperforms 9 sampling methods, and generates texts that are more factual than greedy sampling and more diverse than nucleus sampling with $p=0.5$. Furthermore, the predicted asymptotic entropy is also a useful unsupervised signal for hallucination detection tasks.
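The mechanism described above, nucleus filtering with a threshold that shrinks when the model is more uncertain than predicted, can be sketched as follows. `top_p_filter` is standard nucleus filtering. `adaptive_p`, its exponential form, and the `temperature` parameter are hypothetical choices made for this illustration; the abstract does not give the exact mapping from residual entropy to $p$, and the THF model that would supply `asymptotic_entropy` is not implemented here.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose mass reaches p, renormalized."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in kept}

def adaptive_p(llm_entropy, asymptotic_entropy, temperature=1.0):
    """Hypothetical mapping: p drops as the LLM's entropy exceeds the predicted one."""
    residual = max(0.0, llm_entropy - asymptotic_entropy)
    return math.exp(-residual / temperature)

# Hypothetical next-token distribution over four tokens.
next_token_probs = [0.5, 0.3, 0.15, 0.05]
observed = entropy(next_token_probs)

# Pretend a forecasting model predicted a lower inherent uncertainty (assumed value).
predicted_asymptotic = observed - 0.4
p = adaptive_p(observed, predicted_asymptotic)
print("threshold p:", p)
print("kept tokens:", top_p_filter(next_token_probs, p))
```

When the observed entropy does not exceed the predicted asymptotic entropy, this mapping returns $p = 1$, so the full distribution is kept; larger gaps truncate the distribution more aggressively.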

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# MultiPragEval:大規模言語モデルの多言語プラグマティック評価

MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models ( http://arxiv.org/abs/2406.07736v1 )

ライセンス: Link先を確認

Dojun Park, Jiwoo Lee, Seohyun Park, Hyeyun Jeong, Youngeun Koo, Soonha Hwang, Seonwoo Park, Sungeun Lee,

(参考訳) LLMの能力が拡大するにつれて、より高度な言語理解に焦点をあてて、基本的な知識評価以上の評価を行うことがますます重要になる。本研究は,英語,ドイツ語,韓国語,中国語におけるLLMの多言語的評価を目的とした頑健なテストスイートであるMultiPragEvalを紹介する。 Griceの協力原理と4つの会話の最大値に基づいて分類された1200の質問ユニットを補完するMultiPragEvalは、LLMの文脈認識とインプリケートされた意味を推測する能力の詳細な評価を可能にする。以上の結果から,Claude3-Opusはすべてのテスト言語で他のモデルよりも優れており,この分野における最先端の確立が期待できる。オープンソースのモデルでは、Solar-10.7BとQwen1.5-14Bが強力なライバルとして登場している。この研究は、実用的推論におけるLLMの多言語評価の道のりを導くだけでなく、AIシステムにおける高度な言語理解に必要なニュアンスド能力に関する貴重な洞察を提供する。

As the capabilities of LLMs expand, it becomes increasingly important to evaluate them beyond basic knowledge assessment, focusing on higher-level language understanding. This study introduces MultiPragEval, a robust test suite designed for the multilingual pragmatic evaluation of LLMs across English, German, Korean, and Chinese. Comprising 1200 question units categorized according to Grice's Cooperative Principle and its four conversational maxims, MultiPragEval enables an in-depth assessment of LLMs' contextual awareness and their ability to infer implied meanings. Our findings demonstrate that Claude3-Opus significantly outperforms other models in all tested languages, establishing a state-of-the-art in the field. Among open-source models, Solar-10.7B and Qwen1.5-14B emerge as strong competitors. This study not only leads the way in the multilingual evaluation of LLMs in pragmatic inference but also provides valuable insights into the nuanced capabilities necessary for advanced language comprehension in AI systems.

翻訳日:2024-06-13 21:06:16 公開日:2024-06-11

# AI駆動の世界におけるソフトウェア工学の未来

The Future of Software Engineering in an AI-Driven World ( http://arxiv.org/abs/2406.07737v1 )

ライセンス: Link先を確認

Valerio Terragni, Partha Roop, Kelly Blincoe,

(参考訳) ソフトウェアエンジニアリングではパラダイムシフトが進行中であり、LLMのようなAIシステムがソフトウェア開発の生産性向上に重要性を増している。この傾向は続くと予測されている。今後5年間では、人間開発者とAIの共生的なパートナーシップが増加するだろう。私たちは、AIをソフトウェア開発プロセスに統合することによって引き起こされる重要な研究課題に対処する必要があります。本稿では、AI駆動の世界におけるソフトウェア開発の将来についてのビジョンを示し、このビジョンを実現するために研究コミュニティが取り組むべき重要な課題について考察する。

A paradigm shift is underway in Software Engineering, with AI systems such as LLMs gaining increasing importance for improving software development productivity. This trend is anticipated to persist. In the next five years, we will likely see an increasing symbiotic partnership between human developers and AI. The Software Engineering research community cannot afford to overlook this trend; we must address the key research challenges posed by the integration of AI into the software development process. In this paper, we present our vision of the future of software development in an AI-Driven world and explore the key challenges that our research community should address to realize this vision.